Recently I tried to explain to a coworker my thoughts on code design. This isn’t something I’ve really done before. I shared a video that I thought summed up my thoughts. To ensure this was the case, I rewatched it and remembered another video that summed up another good portion of my thoughts. I couldn’t find that video. So I suppose this is my continued attempt to communicate my thoughts on code design.
1 big picture
1.1 goal of well-designed code
Like most goals, our goal is to save time. In our case, time is consumed when we need to create a new feature, and we do so by modifying code.
1.2 when to make well-designed code
Not all code needs to be gold. If you’re writing code to do something once, you can write sketch code to get it done quickly, and then rub it out after. If you’re writing a library that will evolve over time, you probably want to spend some time up front designing so that you save time in the long run.
Predicting when you need gold code or just sketch code, and how to ensure sketch code is rubbed out or made into gold is beyond the scope of this post. All I want to cover is how to make better-designed code given that you’ve judged that your code needs to be better-designed.
2 well-designed code
We know that well-designed code should fulfill our goal of minimizing time to modify code. But we need to know how to go about that. Because I like fundamental laws, I propose the following fundamental law of code design:
“Well-designed code is minimally dependent.”
For brevity, I’ll call minimally-dependent things simple. Like any good fundamental law, the implications should be many, and we’ll go through some of them. The 2.x subsections will justify simplicity as a means to achieve our goal, and the remaining sections will show how simplicity underlies many coding strategies (and therefore how to create simple code).
I’d like to illustrate simplicity by contrasting it with easiness, which is colloquially hard to distinguish from it. It is easy to say “1, 2, 3, 4, 5…” but it is simple to say “n[i+1]=n[i]+1, n[0]=1”. In the first example, we take advantage of a familiarity with counting by 1 to give meaning. Without that familiarity, who’s to say the next number isn’t 7, or -1? Further, we probably don’t even need to say the 5 and maybe the 4 to get the same meaning across. It’s unclear exactly how many numbers we need to specify before an acceptable number of people understand what we mean. The second example is much less familiar. But it says exactly what we mean with no ambiguity and no redundancy. It’s unclear how many people will understand what we mean, but it is what we mean and anyone interested in what we mean will make it their business to understand. That’s not to say ease doesn’t have any place in the world. The symbols n and i, the mathematical symbols I’ve used, every character you’re reading: they are all useful just because they are familiar. But simplicity must be appreciated separately from ease; ease fits better with sketch code, and simplicity fits better with gold code.
2.1 understanding code
Understanding is a prerequisite for modifying, and there is usually more code to be understood than there is to be modified. Working by yourself, you need to understand your code to create a new feature. Working with others, everyone who didn’t write the code should have some interest in understanding it, to leverage it. Code is one of those all-too-rare media where someone can lay down objective meaning and broadcast it to whoever is interested. Making this efficient pays off.
Simplicity makes understanding efficient. If something is independent, then it can be read and understood immediately. If something is dependent, you have to understand its dependencies before you can understand it.
I like to refer to the amount of stuff you have to understand at once to modify code as your context. Understandable code allows for low-context modification. Being able to reduce your context is a benefit of simplifying your code, or more generally, your workflow. Low-context code is understandable code, and understandable code is simple code.
2.2 modifying code
Assuming you could instantly understand any piece of code, simple code is still easiest to modify because any single change is meaningful. Simple code is like a genotype, and all you have to do is pick the right one to get the phenotype you want. In this case, your job is making sure your code describes reality. With unsimple code, you’d also have to make sure any change you make is consistent with the rest of your code. Code that depends on reality is useful; code that depends on itself is pointless.
3 simplicity underlies many code design strategies
We’ll be using a C-like language for examples.
3.1 modular code
A simple interface to a complicated dependency is like cheating; if you understand the interface, you effectively understand the entire dependency. You understand more for less, and your context stays smaller. Modular code is simple code.
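As a rough sketch of what I mean, here is a one-line interface over the standard library’s qsort. The names sort_ints and compare_ints are hypothetical; the point is only that a caller needs to understand the single declaration, not the comparator plumbing behind it.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical interface: the only thing a caller has to understand. */
void sort_ints(int *values, size_t count);

/* Implementation detail: a comparator in the shape qsort expects. */
static int compare_ints(const void *a, const void *b)
{
    int lhs = *(const int *)a;
    int rhs = *(const int *)b;
    return (lhs > rhs) - (lhs < rhs);  /* avoids the overflow of lhs - rhs */
}

/* Sorts values[0..count) in ascending order. */
void sort_ints(int *values, size_t count)
{
    qsort(values, count, sizeof values[0], compare_ints);
}
```

A caller writes sort_ints(v, n) and never has to think about void pointers or comparator return values, so their context stays small.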
3.2 don’t repeat yourself
There’s a common rule of thumb that goes something like this:
“If you write the same literal in two places, use a constant.”
So, say we want to calculate the area of a circle, so we write 3.14 somewhere. And then later we want to calculate the circumference of a circle, so we write 3.14 again. At this point, it’s a good idea to instead use a constant, because some day we might want to increase our precision and use 3.1416 as our value of pi. Because we haven’t repeated ourselves, our code is simple, and can be modified quickly in one place. Additionally, our code is understandable, because the symbol pi has simple meaning whereas 3.14 doesn’t have quite as much. Does it matter that it’s only 2 decimal places? Does the type matter?
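A minimal sketch of the circle example, assuming doubles are precise enough for our purposes. The function names are hypothetical; the value of PI is the one place we’d touch to change precision.

```c
/* Single definition of pi; change the precision here and nowhere else. */
static const double PI = 3.1416;

/* Area of a circle with the given radius. */
double circle_area(double radius)
{
    return PI * radius * radius;
}

/* Circumference of a circle with the given radius. */
double circle_circumference(double radius)
{
    return 2.0 * PI * radius;
}
```

The declaration also answers the questions a bare 3.14 leaves open: the type is stated once, and the precision is visibly a choice.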
The strategy itself applies to more than just constants; it applies to functions, inheritance, and generally any repeated meaning. It’s like LZ compression, where any repeated pattern is replaced with a reference to the original pattern. The end result should have high entropy.
There are corner cases where the stated rule is at odds with the spirit of the strategy. Imagine a program that draws rectangles. In your rectangle structure, you make three arrays to store the color of each edge (r, g, b). Do you put 4 or SIDES_PER_RECTANGLE for the size of each array? I would put 4 because 4 is the number of sides per rectangle. If I saw SIDES_PER_RECTANGLE, I’d be paranoid that SIDES_PER_RECTANGLE wasn’t 4, which seems like a definite possibility if someone bothered making a constant for it. So I’d check it, and that would increase my context.
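The rectangle structure I have in mind might look like the sketch below. The struct name and the choice of unsigned char per channel are assumptions for illustration.

```c
/* Hypothetical rectangle with one color per edge, stored as three channel arrays. */
struct rectangle {
    unsigned char r[4];  /* 4 because a rectangle has 4 sides */
    unsigned char g[4];
    unsigned char b[4];
};
```

Nothing here sends a reader off to check a definition elsewhere.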
I think most people will think this one is stupid:
#define TWO 2
A constant is defined as exactly what its value means. This decreases simplicity because now instead of a single concept for 2, you have two. The meaning of if(x==TWO) depends on the definition of TWO.
These statements all mean the same thing:
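As a sketch, assuming x is a bool, four such statements could look like the ones below. The exact forms and their order are an assumption on my part, and the wrapper function names are hypothetical; they exist only so the forms can be compared.

```c
#include <stdbool.h>

/* Each function returns 1 when its if-branch is taken, 0 otherwise. */
int top(bool x)    { if (x == true)  return 1; return 0; }
int second(bool x) { if (x != false) return 1; return 0; }
int third(bool x)  { if (x != 0)     return 1; return 0; }
int bottom(bool x) { if (x)          return 1; return 0; }
```
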
The bottom is the simplest. The 2nd and 3rd may be easier if x is an integer or pointer, and in real life it may be hard to make a call about which to prefer. The top, however, decreases simplicity with no added benefit. A predicate in an if statement is a boolean expression; there is no escaping that. If if(x) isn’t good enough then why is if(x==true)? Shouldn’t it be if(x==true==true)?
It’s also important not to mix up your constants just because they have the same value. Let’s imagine we’re writing software for a robot that plays a piano. In the initial design of the robot, it is given 88 fingers, one for each key. We write software for the robot that models each of its fingers and each key of the piano. We make an array of fingers, give it a size of 88, and then make an array of keys, give it a size of 88, notice the number 88 written in two places, and create a constant KEYS=88. We proceed to make lots of code with lots of references to KEYS. Then the price point of the robot has to be cut down and it’s decided that it will have only 60 fingers (but still play 88 keys). At this point we look at all our references to KEYS and it feels as useful as the constant EIGHTY_EIGHT. But anticipating and avoiding this pain is beside the point. The finger array whose size is KEYS is not simple in that it depends on the idea of one-finger-per-key for someone to understand why it isn’t nonsense. KEYS and FINGERS should have been two constants from the get-go, and a reader would appreciate that someone chose KEYS and FINGERS to be equal. If there’s some piece of code that can take advantage of this equality (say, the code that models fingers when they contact keys) and it seems like a good bet to take advantage of that equality, then a compile-time check can be added to that code so that if the equality breaks, the code fails in a clear way.
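Here is one way the two constants and the compile-time check might look, assuming a C11 compiler for _Static_assert. The struct layouts and the press_all_keys routine are hypothetical stand-ins for the real modeling code.

```c
#define KEYS    88  /* keys on the piano */
#define FINGERS 88  /* fingers on the robot; equal to KEYS by choice, not by nature */

struct key    { int pressed; };
struct finger { int extended; };

struct key    keys[KEYS];
struct finger fingers[FINGERS];

/* Hypothetical routine that relies on one finger per key. */
void press_all_keys(void)
{
    /* If FINGERS ever drops to 60, the build stops here with this message. */
    _Static_assert(FINGERS == KEYS, "press_all_keys assumes one finger per key");

    for (int i = 0; i < KEYS; i++) {
        fingers[i].extended = 1;
        keys[i].pressed = 1;
    }
}
```

Code that doesn’t rely on the equality just uses whichever constant it means, and survives the redesign untouched.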
3.3 global variables
Global variables are a fact of life. A common rule of thumb is to try to avoid them. This is in line with simplicity; minimizing the scope of each variable means dependency is minimized, and it’s also quicker to understand that certain dependencies can’t exist. For example, a global variable used in only one place makes a reader paranoid about that global variable. On the other hand, if there’s some reality that is global, ferrying it between n function calls or splitting it up into n different facets of the same thing is just going to increase the amount of thought required to understand how these pieces possibly describe the single global piece of reality.
Let’s use mutexing as an example. Mutexes guard things. There are two extreme strategies to use: have one mutex for all things, or have one mutex per thing. Let’s ignore performance concerns (which may be entirely realistic). The former strategy introduces a global variable. The latter strategy will quickly create scads of dependencies as an ordering to the mutexes is established. In the end, the mutexes must all be ordered and therefore dependent, creating a global behavior that isn’t immediately obvious because the mutexes aren’t global.
3.4 pasta code
3.4.1 spaghetti code
A common name for code full of long functions is spaghetti code. Long functions have high context, because later code depends on earlier code. So it’s nice to have small functions. This doesn’t mean you should enforce some arbitrary length on your functions; you can’t escape complexity in the reality you’re trying to describe. Splitting up a function that shouldn’t be split up leads to lasagna code. You want to make functions with nice interfaces. Usually as you go through a long function, you’ll notice points where much context can be eliminated. Splitting the function at this point makes explicit that conclusion. When splitting a function adds no benefit, you’ve hit its optimal length.
3.4.2 lasagna code
A not-so-common name for code with many layers of abstraction is lasagna code. Most of the layers will have vague meanings, because they are not designed to model reality; and inefficient interfaces, because the interfaces were cut at random places to optimize some arbitrary metric. As a result, to understand the code, you’ll have to open your editor to multiple locations in the project. This is a physical manifestation of the intradependency in the code. Simple code requires you to only look at one place at a time. The same strategy as with spaghetti code is correct; find separations that have small amounts of dependencies and turn them into interfaces to make them explicit. When cutting this way is no longer beneficial, you’ve hit the optimal number of abstraction layers.
3.5 coding style
Consistent coding style makes simple code. If you name one class Platinum_refinery and another ptSmelter then you have to remember these two strings individually. One starts with a capital, has underscores separating words, and platinum is written out in full; the other starts with a lowercase letter, uses camelCase, and has the symbol for platinum. In other words, they share nothing. Say instead they were named platinum_refinery and platinum_smelter; then you can remember that in this project, classes start with a lowercase letter, words are separated by underscores, and platinum is always written out in full. In terms of the individual classes, all you need to remember is one’s a platinum refinery, and the other’s a platinum smelter. And if you think your IDE will save you, I challenge you to find one that knows pt and platinum are the same thing, and how to change that to ag and silver at once.
4 not-so-obvious simplicity
Simplicity applies outside of code. It applies to the tools with which you make code.
4.1 compilers, warnings, and macros
Any optimizing compiler for a common language is an incredible piece of work. The team who created it had to navigate a language spec and get the compiler to create correct performant code quickly. Don’t assume your compiler checks that your code conforms to the language spec. When you see an error, it is simply because the compiler cannot proceed. When you see a warning, the compiler is (usually) generously informing you of something you probably didn’t intend to do.
Get rid of warnings. If you have a set of warnings that are “OK”, how do you expect a reader to know or care? The difference between 38 and 39 warnings is easy to miss and easy to judge as fine. The difference between 0 and 1 warnings makes people uneasy. I think warnings are generally treated with less gravity than they should be. Yeah, sure, a few Microsoft ones seem to be designed to create Windows-only code, and some are just telling you you are doing something kind of uncommon. But C/C++ signed/unsigned conversions, for example, are important to consider. C/C++ signed arithmetic has undefined behavior associated with it (overflow, for example), and according to the language spec, if there’s any undefined behavior, the whole program is undefined. That’s very unsimple! Reasonably speaking, a bad line won’t completely transform your program (the mess will most likely remain somewhat local), but it does mean debugging can be a hell of a job.
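To show why a signed/unsigned warning deserves attention, here is a small sketch. The function names are hypothetical. GCC and Clang can flag the first comparison with -Wsign-compare, and the second is one way to say what was probably meant.

```c
/* Returns 1 if a < b under C's usual arithmetic conversions. */
int less_than_mixed(int a, unsigned int b)
{
    /* a is converted to unsigned before comparing, so -1 becomes UINT_MAX. */
    return a < b;
}

/* Same question, with the sign handled explicitly. */
int less_than_checked(int a, unsigned int b)
{
    return a < 0 || (unsigned int)a < b;
}
```

With these definitions, less_than_mixed(-1, 1u) is 0 while less_than_checked(-1, 1u) is 1. The warning is pointing at exactly that gap.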
A common saying simply states, “Macros are evil.” Macros don’t have type checking, and the errors that come up are ugly and hard to understand. You have to read the macro and the place it’s used at the same time, or dig up the preprocessor results, but good luck integrating those with the rest of your tools. So it’s not simple to fix a break that involves a macro. But that doesn’t mean you should never use a macro. Macros are another tool in your box. One must judge whether the inevitable macro errors or the problem that could be solved with macros is the bigger evil.
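A classic illustration of the kind of break I mean, as a sketch with hypothetical names. The macro pastes its argument in as text, while the function evaluates it once and checks its type.

```c
/* Naive macro: the argument is substituted textually, without parentheses. */
#define SQUARE_MACRO(x) x * x

/* Function alternative: the argument is evaluated once and type-checked. */
static inline int square(int x)
{
    return x * x;
}

/* Expands to 1 + 2 * 1 + 2, which is 5 rather than 9. */
int macro_result(void)
{
    return SQUARE_MACRO(1 + 2);
}

/* Evaluates 1 + 2 first, then squares it, giving 9. */
int function_result(void)
{
    return square(1 + 2);
}
```

To see why macro_result is wrong you have to read the macro and its use site together, which is the extra context the saying warns about.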
4.2 version control and testing
Say you’re working on some feature, and then realize you’re in so deep that you can’t verify whether you’ve regressed, and you can’t see the light at the end of the tunnel. If you’re using version control, you can back out of the tunnel, establish the testing you need to verify your work, and then split up the feature into appropriate chunks, each one validated by the testing framework. Escaping from a highly dependent mess of changes and creating a framework that allows you to create independent changes is the power of version control systems. More generally, they allow you to externally manage your context of changes.
4.3 reinventing the wheel
Reinventing the wheel is simple. The wheels out there aren’t exactly what you’re looking for. It’s typical to find a library with a superset of the features you want, because, being a successful library, it has more users than just you. A needle-focused library is simplest for your project. My most common reason for avoiding 3rd party code is to keep my build simple. Setting up a build is probably less common than modifying code. But to me this is what makes it insidious — I often forget how to set up complicated builds.
Simplicity is at odds with the common rule, which is “don’t reinvent the wheel”. 3rd party code can represent huge amounts of work, possibly dwarfing the amount of work you’ll ever do on your project. In this case, it’s hard to argue for long-term time benefits because you’ll never be able to do the amount of work needed to create the library you want to use.
So I suppose here we have to defer to the bigger picture. If using a 3rd party wheel seems to be the quickest overall, do it. If inventing a new wheel seems to be the quickest overall, do it.
More so than the other posts on this blog, I plan to give this post life by updating it as my experience grows and as I think of more interesting points.