Planning for Focused Modularity in Open-Source Software Development

Module by: Warren Myers

Summary: Many Open-Source Software projects have codebases that are not clean. The goal of this module is to get developers to think more about modularity when creating solutions.

Most programmers, when faced with a challenge, come up with a quick solution that just 'works'. Unfortunately, these solutions are often written as one-off answers to the problem at hand, and little, if any, thought is given to making them portable and reusable. When programmers do put some effort and time into ensuring reuse, they generally end up with an overly complicated piece of code that no one would want to reuse. The dilemma comes when they need to update their code, or need a little chunk of it in another program. Even if they briefly thought about reusing the code, they were so excited about getting the answer that they didn't follow through on their musings.

The solution to this quandary is to spend more energy on planning and on determining exactly what the new piece of code has to do, and then to implement only what has been deemed absolutely necessary. For example, a friend asked me for a way of ensuring he did not run past the end of his string array in a C++ program he was writing. I spent some time thinking about what he meant (he was new to programming at the time), and came up with a templated, memory-safe object that acts as an array, with normal array notation, but that can also be accessed like a list. You can see the direct fruits of this effort here: arrlist.h. I then took the work I had done on this generalized piece of code and partially optimized it for use with the C++ string type: strlist.h. Notice that I did not simply make strlist an instantiation of arrlist; I rewrote the class to work specifically with strings.
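To make the idea of a templated, memory-safe, array-like object concrete, here is a minimal sketch. It is not the contents of arrlist.h: the name SafeArray, the use of std::vector for storage, and the choice to throw on a bad index are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical sketch only; NOT the author's arrlist class.
// An array-like container that refuses to read or write past its end.
template <typename T>
class SafeArray {
public:
    // List-style access: append a new item to the end.
    void push_back(const T& value) { items_.push_back(value); }

    // Array-style access with normal [] notation, but bounds-checked.
    T& operator[](std::size_t index) {
        if (index >= items_.size()) {
            throw std::out_of_range("SafeArray: index past end");
        }
        return items_[index];
    }

    const T& operator[](std::size_t index) const {
        if (index >= items_.size()) {
            throw std::out_of_range("SafeArray: index past end");
        }
        return items_[index];
    }

    // The container tracks its own size, so callers never have to.
    std::size_t size() const { return items_.size(); }

private:
    std::vector<T> items_;  // assumed storage; the real class may differ
};
```

With something like this, a caller can write names[i] exactly as with a plain array, and an index that runs past the end raises an error instead of silently touching memory it does not own.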

Two questions come to mind: "how did you determine what needed to be done?" and "why didn't you solve the exact problem first, and then, if at all, generalize?" I will handle these questions in order. First, I determined what needed to be done from several brief conversations with my friend, and from some hands-on experience working with the code he was writing. Since he was new to programming, I was his in-house C++ guru. Having helped him with basic syntax, the overall logic of his program, and a few of the myriad oddities of the MS Win32 API, I had a pretty good idea of what he needed my class to do. He was maintaining several lists of strings, concatenating each listed extension with each possible prefix. This required tracking the exact size of each array, and any time he added items to either array, he had to update the size by hand.
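The prefix-and-extension task described here can be sketched as follows. This is an illustration under assumptions, not my friend's code: the function name combineAll is hypothetical, and it uses std::vector and C++11 range-based loops so that the containers track their own sizes and no separate counts need updating.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: pairs every prefix with every extension.
// Because each vector knows its own length, adding an item to either
// list requires no separate size constant to be kept in sync.
std::vector<std::string> combineAll(const std::vector<std::string>& prefixes,
                                    const std::vector<std::string>& extensions) {
    std::vector<std::string> result;
    result.reserve(prefixes.size() * extensions.size());
    for (const std::string& prefix : prefixes) {
        for (const std::string& extension : extensions) {
            result.push_back(prefix + extension);
        }
    }
    return result;
}
```

Called with the prefixes "report" and "notes" and the extensions ".txt" and ".bak", this yields four strings, starting with "report.txt" and ending with "notes.bak".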

Due to certain security requirements of the program, the list has to be updated in the main source, and any updates to the program require recompilation instead of just reading a resource file of some sort. With all of these things in mind, I began working on the problem of creating a memory-safe, array-like object that he could use with an almost zero learning curve. My initial ideas led me down a path of creating a very complicated object that would automatically allocate new memory for itself when you added items, but would let you access it like an array. I tried adapting some previously written code I had from when I was a student, but it turned into a far more complicated activity than I had anticipated. So I discarded that work, sat back and doodled some other ideas in a notebook.


Even if you never read it again, writing ideas down in a notebook is fantastically helpful.

While doodling and letting my mind just stew on the matter at hand, I perused some other code I had written. During my mental wanderings, I happened across the FiXed StaCK class I wrote near the end of 2001. As I went through that code, I realized how much of it could be directly used, and how close it was to what I wanted to accomplish with my arrlist class.

This leads me to the second question: "why didn't you solve the exact problem first, and then, if at all, generalize?" If you have taken any advanced computer science classes, you will, no doubt, recall that your professor told you at one point, "Solve the general case. The general case is easier than thinking about all the special conditions this problem asks." (Joel Hollingsworth). My professor certainly told me that, many times. As you may have guessed, the general case was the easier one to solve here, too. The specific demands of making the class work with strings added several considerations to the base tool. So, bearing in mind the advice of my professor, I figured out how to solve the general case first.

As you will notice when reading the code of the arrlist class, I made several assumptions about how the class would be used. I assumed that you could compare two of whatever you decide to hold in the list. I assumed that you wouldn't normally reach the default size limit of 10000 items (though there is no technical reason why you can't on most systems). And I assumed that you were more worried about the class just 'working' than about the hows of what I did. A serious C++ programmer, or anyone who wants to learn some of the more interesting things you can do with templates and overloading, may find a great deal of interest in what I wrote. The average programmer who just needs the code to work correctly the first time can copy it and use it knowing only the interface described in the heading comment.
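Two of these assumptions, comparable elements and a default size limit of 10000, can be shown in a short sketch. Only the figure 10000 comes from the text above. The class name BoundedList, its member names, and the choice to report a full list by returning false are hypothetical and may not match what arrlist does.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch of a size-limited list; NOT the author's arrlist.
template <typename T>
class BoundedList {
public:
    // Default limit of 10000 items, taken from the text; callers may override it.
    explicit BoundedList(std::size_t limit = 10000) : limit_(limit) {}

    // Returns false instead of growing past the limit (an assumed policy).
    bool add(const T& value) {
        if (items_.size() >= limit_) {
            return false;
        }
        items_.push_back(value);
        return true;
    }

    // Relies on T supporting operator==, the "comparable" assumption.
    bool contains(const T& value) const {
        for (std::size_t i = 0; i < items_.size(); ++i) {
            if (items_[i] == value) {
                return true;
            }
        }
        return false;
    }

    std::size_t size() const { return items_.size(); }
    std::size_t limit() const { return limit_; }

private:
    std::size_t limit_;
    std::vector<T> items_;
};
```

A type without operator== would fail to compile only if contains is actually called on it, which is one reason a templated class can state such an assumption in its heading comment rather than enforce it everywhere.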

The general-case solution also works for my friend's specific problem. Because arrlist is templated, you can have an array list of anything: int, char, string, another arrlist, and so on. By building the general case first, I incidentally solved the specific problem I was asked to solve. It also gave me the opportunity to roll out a solution that would work, and would work now, rather than making my friend wait for me to complete the specific fix. When I moved on to adapting the class to work specifically with C++ strings, I started by removing all of the template and generalization code from the class. I also built a couple of other ways to get data into the list. For example, the strlist class allows you to copy part or all of an existing (normal) array of strings into itself. There are a couple of rules you have to follow when copying, but they are explained in the heading comment.
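Copying part or all of a plain array of strings might look like the sketch below. The rules strlist actually imposes live in its heading comment and are not reproduced here, so the function name copyRange, its parameters, and the rule that an out-of-range request throws are all assumptions for illustration.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: copies `count` strings starting at index `first`
// from a plain array of `sourceLength` strings.
// Assumed rule: the requested range must lie entirely inside the source.
std::vector<std::string> copyRange(const std::string* source,
                                   std::size_t sourceLength,
                                   std::size_t first,
                                   std::size_t count) {
    // Written to avoid overflow: check `first`, then compare `count`
    // against the number of items remaining after `first`.
    if (first > sourceLength || count > sourceLength - first) {
        throw std::out_of_range("copyRange: range exceeds source array");
    }
    return std::vector<std::string>(source + first, source + first + count);
}
```

Passing first as 0 and count as the full length copies the whole array, while a smaller count copies only part of it.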

The tailored solution took me only another couple of days to write and test, and while I was doing that, my friend was able to continue working on his project.

So, how does all of this relate to the topic at hand, "Focused Modularity in Open-Source Software Development"? It relates quite simply. I was presented with a problem, and focused my efforts on solving that problem, but in such a manner that the solution can very easily be pulled into another project. In this case, I wrote a class to support an existing project. But this is how all software development, especially Open-Source, should proceed. For any given problem, there are multiple ways to arrive at the solution. The example I gave above is small, but the approach scales easily. Imagine that, instead of just needing a simple class, I needed to write a content management tool. Should I sit down and just one-off a tool that 'works'? Or should I take some extra time, break the problem down into little subproblems, decide how many of those can be made into compact, self-contained modules, and then write them as I have time, making sure each module does just what it is supposed to?

I trust that you would go with choice two. If the entire Open-Source community were to approach every project like this, everyone's development time could be shortened. I'm sure that classes such as the one I wrote exist, and that I was not the first to come up with the idea. But the ones that do exist are either locked up in corporate code libraries, not published to the web, or hidden in larger projects.

There is an enormous wealth of source code available in thousands of Open-Source projects. Unfortunately, there is no way of finding what has been written without examining each project for the tools it uses. Beyond just getting OSS developers to think about focusing their development energies on modularity, I hope to get developers to publish lists of the classes, functions, or sub-tools they have written and included in their projects. Much of my development effort goes into writing 'gut' code: the code that no one ever sees and that doesn't handle display or anything 'exciting', but that needs to be there to support all of the 'exciting' stuff the user interacts with.

I maintain lists of everything I've written, though they sit on a few different websites. I would like to see other Open-Source programmers take a little time and catalogue the work they have done and then post lists of the libraries they have developed. Except when you're first learning how to program, there is no need to constantly reinvent the wheel. The wheel, or the class, or the routine, has probably already been written. Open-Source developers need to take some time and make sure that other programmers can find those wheels.
