How to Structure a Large Project

From Jonathan Gardner's Tech Wiki

Introduction

Beginning (and sometimes not-so-beginning) programmers often think they want to write a huge software project. They wonder how to structure and manage a codebase of millions of lines of code.

I have my own ideas, and my idea is simply this: "Don't."

What Happens in Big Projects

The Network Effect is a relatively recently understood phenomenon. The basic mathematical principle is this:

  • Suppose you have some things.
  • Suppose each of those things communicates or has a relationship with all of the other things.
  • Notice how the number of connections increases dramatically as you increase the number of things? That's generally a bad idea. (See [1].)

The formula for counting the number of connections is:

n(n-1)/2

Or, in computer programmer math:

O(n^2)

Now, why is this relevant? Because every node can be a line of code or a file or an object or a function or whatever. If each of those things has to deal with all of the others, the complexity grows roughly as the square of the number of things. That means having to worry about 100 things is about 100 times more complicated than worrying about 10 things.
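To make the arithmetic concrete, here is a minimal sketch that applies the formula above to 10 and 100 things. The function name count_connections is made up for this illustration.

```python
def count_connections(n):
    """Number of pairwise connections among n things: n(n-1)/2."""
    return n * (n - 1) // 2


for n in (10, 100):
    print(n, "things ->", count_connections(n), "connections")

# 4950 connections versus 45: a factor of 110, or "about 100 times".
ratio = count_connections(100) / count_connections(10)
print("ratio:", ratio)
```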

Every new file or line of code or function or object class or whatever has to be reconciled with everything that is already there, so each addition costs you more effort than the one before it.

In practice, this is roughly true... unless!

You keep things separated from each other.

This is why I worry about things like abstraction and simplifying the interface. These practices reduce the number of connections between things by keeping things separate and limiting the interactions between them. Adopting these practices means adding new things stays cheap for me, while it gets expensive for those who don't adopt them.
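As a rough sketch of how much separation can save, compare 100 things that all talk to each other with the same 100 things grouped into 10 components. The model is an assumption made for illustration: parts inside a component are fully connected, and each pair of components shares exactly one connection through their interfaces. The function names are hypothetical.

```python
def fully_connected(n):
    """Connections when every one of n things talks to every other."""
    return n * (n - 1) // 2


def componentized(components, parts_per_component):
    """Connections under the assumed model: full connectivity inside each
    component, plus one interface connection per pair of components."""
    inside = components * fully_connected(parts_per_component)
    between = fully_connected(components)
    return inside + between


print("everything connected:", fully_connected(100))
print("10 components of 10: ", componentized(10, 10))
```

Under these assumptions the count drops from 4950 to 495, and adding an eleventh component adds only 45 internal connections plus 10 interface connections.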

On a big project, you don't have a lot of incentive to separate out your components. And if you don't design it that way from the beginning, you will not have separated components in the end. And that means your productivity will gradually decrease until it's cheaper to rewrite the whole thing than to add one more feature.

What I Recommend

The first step in a big project is to break it up into smaller projects. This is understood by everyone who works on big projects, be it in construction or engineering or politics or whatever.

Choosing how to break it up is really hard. The first idea is that the parts inside a component depend heavily on the other parts inside that component and only lightly on the parts outside it. The second idea is that you try to keep the number of parts inside each component small.
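One way to compare two candidate splits is to count how many dependencies stay inside a component and how many cross a boundary. The sketch below does only that; the part names, the dependency list, and the function split_dependencies are all hypothetical.

```python
def split_dependencies(dependencies, component_of):
    """Return (internal, external) dependency counts for a given split."""
    internal = 0
    external = 0
    for a, b in dependencies:
        if component_of[a] == component_of[b]:
            internal += 1
        else:
            external += 1
    return internal, external


# Hypothetical parts and the dependencies between them.
dependencies = [
    ("parser", "lexer"),
    ("parser", "ast"),
    ("lexer", "ast"),
    ("renderer", "theme"),
    ("renderer", "ast"),
]

# Two candidate ways to group the same parts into components.
good_split = {"parser": "frontend", "lexer": "frontend", "ast": "frontend",
              "renderer": "output", "theme": "output"}
bad_split = {"parser": "a", "renderer": "a",
             "lexer": "b", "theme": "b", "ast": "b"}

print("good split:", split_dependencies(dependencies, good_split))
print("bad split: ", split_dependencies(dependencies, bad_split))
```

The first split keeps four of the five dependencies inside a component; the second pushes four of them across the boundary, which suggests the boundary is in the wrong place.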

Take, for instance, an automobile. There are many parts to an automobile, and all of them are loosely integrated. They generally rely on certain holes being at exact positions and that's about it. That means I can rip out the front seats and replace them with any other kind of front seat as long as the posts match the place where they go.

Inside the automobile, we can break the car down into these big components.

  1. The frame. This holds everything together, but is only a skeleton. The frame usually includes the exterior walls of the car and the floor and firewall.
  2. The drive train. This is generally everything from the transmission to the wheels, including the wheels. It also includes the steering wheel and the brake system.
  3. The engine. The engine is all the systems needed to provide energy to the drive train. Note that other systems rely on the engine as well, such as power for the stereo or power for the air conditioner. Even the brakes and the steering system may draw power from it.
  4. The interior. This includes the seats and the nice things that make cars livable.
  5. The electrical system. This is all the lights and such.

Each of these components is loosely integrated with the others, to the point where you can take one part out and put it in a completely different model of car.
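In software, the "holes at exact positions" are the interface. Here is a minimal sketch of the seat-swapping idea, assuming Python 3.8 or later for typing.Protocol. The classes Seat, ClothSeat, LeatherSeat, and Car are invented for this example.

```python
from typing import Protocol


class Seat(Protocol):
    """The "mounting holes": the only thing the car relies on."""

    def describe(self) -> str: ...


class ClothSeat:
    def describe(self) -> str:
        return "cloth seat"


class LeatherSeat:
    def describe(self) -> str:
        return "leather seat"


class Car:
    def __init__(self, front_seat: Seat) -> None:
        self.front_seat = front_seat

    def summary(self) -> str:
        return "car with a " + self.front_seat.describe()


car = Car(ClothSeat())
print(car.summary())

# Swap the component; the Car class does not change.
car.front_seat = LeatherSeat()
print(car.summary())
```

Car knows nothing about either seat beyond the describe method, so any seat that fits that one "hole" can be dropped in.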

Within the engine itself, there are simple components made up only of a few parts. For example:

  1. The pistons go up and down.
  2. The valves go up and down.
  3. The timing system drives the valves and coordinates them with the pistons.
  4. The crankshaft transfers the work from the pistons to the transmission. It also connects, by pulleys and chains, to the other systems that require power.
  5. The carburetor only mixes air with fuel.
  6. The fuel injector, which takes the place of the carburetor in newer engines, only sprays fuel into the incoming air.
  7. The exhaust simply carries the hot gases away from the engine.

If you build a system composed of parts as simple as a carburetor and pistons and such, you are all but guaranteed to succeed. Why? Because even an idiot (and we are all idiots from time to time) can build the component parts, put them together, and have it work.

How to Componentize Software

With Free Software, componentizing software is not only easy, it's by far the best way to do things. You take your big project and break it up into small, easily understood components.

The next step is to go shopping on the Free Software "marketplace" for off-the-shelf components that will fit your needs. When you find something that doesn't quite fit, you will have to adapt your design to the component or adapt the component to your requirements.

This will only get you so far. Eventually, you'll have to build some components for your specific system. But I strongly recommend you stick to building small components, not large, tightly-coupled projects. These components should be useful in their own right. You should share them with the Free Software marketplace so that you can benefit from anyone who might want to use and adapt your components.

In the end, you'll get something that looks more like Pylons and less like Django. Or more like KDE and less like Linux.

Scaffolding

Scaffolding is a temporary structure used to build or work on houses and buildings. If you first build the temporary structure, it's easier to work on the real building because you can get around.

In a software project, scaffolding may look like this:

def standard_deviation(series):
    return 1

or even:

def standard_deviation(series):
    raise NotImplementedError

Having functions that run, but don't do quite the right thing, allows you to quickly write all the parts of the system and quickly get something working. From there, it's a matter of writing tests that fail against the scaffolding, and then making the tests succeed by fixing the code.
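The test-then-fix step might look like the sketch below. It keeps the stub and the finished function side by side so the same check can be run against both. One assumption to flag: the essay never says which standard deviation is meant, so this uses the population form (divide by N). The names stub_standard_deviation, real_standard_deviation, and check are made up for the example.

```python
import math


def stub_standard_deviation(series):
    # The scaffolding: runs, but returns a placeholder.
    return 1


def real_standard_deviation(series):
    # The finished code. Assumption: population standard deviation.
    mean = sum(series) / len(series)
    variance = sum((x - mean) ** 2 for x in series) / len(series)
    return math.sqrt(variance)


def check(standard_deviation):
    # The test: this series has mean 5 and population variance 4.
    return math.isclose(standard_deviation([2, 4, 4, 4, 5, 5, 7, 9]), 2.0)


print("stub passes:", check(stub_standard_deviation))
print("real passes:", check(real_standard_deviation))
```

The check fails against the stub and passes against the real function, which is exactly the failing-then-passing cycle described above.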

Get it to Work and Keep it Working

If you get your entire software stack to "work" (in the scaffolding sense), then you are well on your way to completing the project. Once the project works, you can see how a feature is added to each component, and can easily verify that the components are communicating with one another as you planned. You'll also quickly identify when you have an interface conflict, that is, when one component and another component disagree on what their interface should be. Identifying and resolving these is your primary duty because the interface dictates the function. If the interface is broken, then it doesn't matter whether the code works or not: you still have to rewrite it.
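A stack that "works" in the scaffolding sense can be as small as the sketch below: every component is a stub, but they are wired together end to end. All of the names here (load_series, format_report, main, and the file name data.csv) are hypothetical.

```python
def load_series(path):
    # Scaffolding: ignores the path and returns fixed data.
    return [1.0, 2.0, 3.0]


def standard_deviation(series):
    # Scaffolding: returns a placeholder value.
    return 1


def format_report(value):
    # Scaffolding: minimal formatting.
    return "standard deviation: " + str(value)


def main(path):
    # The wiring is real even though every component is a stub.
    series = load_series(path)
    value = standard_deviation(series)
    return format_report(value)


print(main("data.csv"))
```

If load_series were later changed to return, say, a dictionary while standard_deviation still expected a list, the disagreement would show up the first time main ran, long before either function was finished.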

Sticking to Standards

Your components should stick to whatever standards of management exist for components like yours. Either the programming language has a community that adheres to a standard, or the component itself belongs to a different community with its own standards. This solves a lot of problems of "How do I structure my code?" because you are really asking, "How do others structure their code?"

If the standards suck, then it's not impossible to start a new standard. A-A-P by Bram Moolenaar of Vim fame is a good example. You should note, however, that A-A-P is still not nearly as standard as make or other systems out there, so the caveat of starting a new standard is that the chances of your standard being accepted are almost nil.

Conclusion

By the end of this essay, I should have convinced you not to structure a large project. Instead, build small components that are loosely integrated.