How to Program

This advice assumes you already are a programmer – but that you want to be a much better one! It reflects my own personal methods, and may not work for everyone. Still, I hope you find it helpful. I know that I myself am a much better programmer now than I was when I started, and this is what I have learned.

My advice reflects decades of experience in multiple languages, multiple problem domains, and multiple applications that are used all over the world. I have written this in the order of the steps that you would take in developing a new software project; but I have also prioritized the importance of the critical steps.

In general, I am trying to impart a scientific approach to programming.

Before anything else, start a written log for your project. The purpose of the log is to clarify your thinking about the project, to record test results, and to save you from having to solve the same problem more than once. I prefer LaTeX for maintaining such logs, because LaTeX turns out to be easier to use, and to produce more beautiful results, than word processors. But using a spreadsheet or a word processor would also work. The log should be used on conjunction with comments that go into your source code control system whenever you check in a revision to a file (see below). In a sense, the log and the checkin comments are part of the same system. The present paragraph is the second most important paragraph in this advice.

At the beginning of the project log, write down as clearly and concisely as possible what the new software should do. This is the “Program Specification” (also called “Design Specification”) for your project. At first, you will not be able to be precise enough or complete enough. But be as clear and concise as you can be. As the project proceeds, you will clarify and re-clarify the “Program Specification.” It will become the Introduction to the “User’s Guide” for your software.

Use only general-purpose, high-level, widely used programming languages. These are (roughly in order of actual usage for critical software such as operating systems, or high-performance applications in warcraft, automobiles, medicine, finance): first C++ and C; then Java, C#, and Python; possibly FORTRAN, possibly Lua. Use only one language if at all possible; but in some cases, you will need to use more than one. For example, if your software needs to be user-programmable, then often it is a good idea to write the user-accessible parts in Python or Lua and the internals in C++. For another example, you may want to write some numerical routines in FORTRAN. The main reasons for using widely-used languages are (a) they are better maintained, and (b) when (not if) you run into difficult problems in using the language, you will probably be able to google for solutions because (remarkably often!) other people will already have encountered your problems and solved them for you.

Use the optimal language (or languages). Identify your problem domain (e.g. in my case computer music, trading systems, or e-commerce) and look at the leading software systems in that domain to see what languages they use. But if you think some other language would be better, go ahead and add it to the list. Make a scatter plot. The X dimension is frequency of usage in your problem domain (starting with the most used language). Rank the following in order of importance for your project: runtime efficiency, ease of writing code, ease of reading code, and size of user base. The weighted average of these features is the Y dimension of your plot (starting with the largest value). Plot each language as one point on both dimensions, and simply use the language that is closest to the origin. Use that language even if you have to learn it first. Do not use a language because you know it, or like it, or because it is fashionable – use the language that looks most likely to solve your actual problems. However, trust me – you can’t go wrong with C++.

If existing programs or libraries solve your problem, or part of it, then use them. Do not re-write software that already works. Prefer open source software to proprietary software – it is cheaper, it is easier to understand, and you can try fixing it yourself when (not if) you run into problems. Prefer third-party software with larger user bases, since such software is likely to be better maintained and to last longer. Also prefer software that runs on multiple platforms. Do not distract yourself from the difficulty of your real problems by playing with code for no good reason. The present paragraph is the third most important paragraph in this advice.

The choice of build system is almost as important as the choice of language. I prefer SCons, others CMake. Do not use autotools, or makefiles, or Visual C++ project files. Everything goes into one build file – libraries, programs, tests, documentation, installation, you name it. The present paragraph is the fifth most important paragraph in this advice.

You also need, in order of decreasing importance, a source code control system, a good visual source-level debugger, a good code editor, and a GUI form builder. There is no reason not to use an open source system, such as Git or Subversion, for source control unless your institution requires otherwise. In many cases Emacs is a good editor, in other cases Eclipse, in yet others Visual Studio.

Once you have got this far, create an empty source repository and empty build system for your project, write a stub application (or library, or whatever it will be), and compile it. The reason for doing a stub first is, that way you don’t begin by writing loads of code that turn out not to be correct for your platform or compiler. Every time you write some code, rebuild the project. Get rid of all compiler warnings if at all possible.

Now you are writing code that compiles without warnings. If your editor has an automatic code reformatting feature, use it constantly instead of manually formatting your code. If you are just getting started, consider using the default formatting of your editor. If your company has a policy on this or you want consistency with existing code, change the reformatter settings as required but do use the auto-reformatting.  Don’t put blank lines in your code, except between class and function declarations and definitions; because the more code there is on a page, the easier it is to read, and your idea of where to put a blank line inside a function is not the same as mine.

Do not comment code; instead, use understandable names for functions, variables, and classes – including understandable names for loop indexes and such. One reason is that you will change the code and forget to change the comments – therefore, comments can actually end up making code harder, rather than easier, to read. An even more important reason is that doing this also helps you to clarify the code, which helps to make sure that it really solves your problems (see below).

But there are important exceptions. You need a “Reference Manual” for your software. Whenever you need to document something for the “Reference,” do it only using Doxygen comments in the code. Also, if you are not sure whether you yourself have clearly understood an algorithm that you are about to code, or you think somebody else reading your code might have trouble understanding how a function or algorithm works, put in comments that explain what is happening as clearly and concisely as possible – always using complete sentences that start with capital letters and end with periods. Whenever you clarify the code, either remove or clarify these comments as well.

Do not abbreviate, either in code or in comments. Spell out everything, even if it takes more typing. The reason is that you will search for names (often), and if you use abbreviations you will not remember which abbreviation you used.

As for checkin comments, keep them as short as possible; their real purpose is to help someone looking at the source code tree to find changes, who will then look at the source code and comments in source code for more complete information.

Do not use the C preprocessor if at all possible (or any other similar facility), except to prevent multiple inclusions of header file contents. The preprocessor not only makes code harder to debug, it also makes code harder to read.

In C++, put as much code as possible into your header files. Put member function definitions inline, in the class bodies, in the header files. Either put each class into its own header file, or put all related classes into a single header file. In many libraries, you can put all the code into header files, or even one header file; and if you can, you should. If you are writing a plugin that does not expose a programming interface for compile-time linkage, then put all the code into one regular source file, including all declarations (i.e. no header file at all). The idea is to have the smallest possible number of files. Bigger files are easier to search and to read, and fewer files make the build system easier to maintain. Modern compilers won’t complain. Similarly, you want to have the smallest possible number of declarations, hence it is better to define functions inline in the class bodies in the header files.

Now you are ready to design. Start with the basic ideas. They will vary from project to project. For a library, that might be the API declaration; for a program, it might be the basic data structures and algorithms, or a database schema; for a plugin, it might be a data flow graph. Finish the declarations, implement stubs for all functions, declare instances of all objects in the test code, and get it to compile without warnings. Now start actually implementing functions…

All of the above is preliminary. The following is the core of my advice. The last numbered item is the most important, but the first item is the one to which you should pay the most constant attention.

  1. Do not write fancy, tricky code. Write plain, straightforward code. If at all possible, use only the facilities of the language itself. Never use a platform-specific system call or a function in a third-party library, if the C runtime library or the standard C++ library contains a function that does the same job. Indeed, if you are using C++, use the Standard C++ Library of collections and algorithms as much as possible, this is fantastic software, roughly as fast as assembler, usually does not require debugging, and is easier to read than what you would come up with. If your code is multi-threaded, prefer OpenMP, which is a part of the C++ and FORTRAN languages, to pthreads or Intel Threading Building Blocks, and implement multi-threading using task constructs at the highest possible level of granularity. Use only the core, stable features of the language. Use standard algorithms if at all possible. Write code that kiddies can read and understand. Your code should look just like the short examples in a beginner’s text for your language. Believe me, such plain code will run faster and be easier to maintain than fancy code. Yet clear code has an even more important role in testing the correctness of your ideas (as explained below).
  2. As you write your software, also write a program (or programs) to test it. Keep adding tests to this program or programs until your project is finished.  If you write code without testing all important use cases, your code will not be correct.
  3. If your code has a client-server structure, or a protocol stack, make completing a round trip through the stack the priority, before adding any other functionality to your project.
  4. Re-work the design of your code as often as necessary. This includes re-writing the “Program Specification.” Code expresses ideas. These ideas are like a scientific theory or set of engineering principles. Or rather, they are not like a theory; they are a theory – and a formal theory, to boot, since a programming language is a formal language. The ideas/code should capture the problem domain completely, without contradiction, and as concisely as possible. You must re-work your code until it produces the scientific or artistic “Ah-HAH!” sensation that tells you, both logically and intuitively, that you have captured the problem domain in this complete, consistent, and concise way. You can get the “Ah-HAH!” from a half-baked idea; but you cannot have a correct idea without the “Ah-HAH!”  If anything seems not quite right, or is bothering you even a little, you must rework the code. Such intuitions are the foundation of success in programming. Some examples of software that elicit this “Ah-HAH!” are the Standard Template Library in C++, the Jack system for Linux audio, the C programming language, the Scheme programming language, the Lua programming language, the SQL language for databases, the Swing user interface toolkit in Java, and the first spreadsheet programs. Examples of software that does not do this are the Java language itself, the C++ programming language, Microsoft Office, and all operating systems.

The reason for putting clarity and testing ahead of formulating ideas is that, in reality, you will not be able to write clear, concise code that runs without error until you have grounded your code in consistent, concise ideas that completely capture your problem domain. In other words, the tests are the engineering experiments that tell you whether your code/theory about the problem domain is correct.

Yet I have put the clarity of the code ahead even of passing all tests – because although it definitely is possible to write code that passes all tests, but does not express a complete solution to the problem, it is all but impossible to write such code clearly. In other words, the more clearly the code is written, the easier it is to spot what parts of the problem you have missed. The present paragraph is the most important paragraph in this advice.

To get the utmost runtime performance from your code, then in order of decreasing importance:

  1. If your code can be analyzed into tasks that can run at the same time without affecting each other, do this at the highest possible level of granularity and run them in parallel, e.g. using OpenMP.
  2. Make sure you are using the highest-performance libraries, e.g. for matrix operations use the Intel Math Kernel Library or Eigen.
  3. Make sure you have the optimal build options, e.g. in-line as much as possible, use auto-vectorization.
  4. Then cut to the chase. Do not try to figure out what code is fast and what is slow, unless you are a chip designer or a compiler engineer you cannot know enough about what the compiler is doing or will do. Use a profiler to measure the actual speed of your code. Basically, start with the block of code that eats the most time and speed it up. If you have been following my general advice here, this is not likely to do much good unless it enables you to select or design a more suitable algorithm to use for that block. Then go on to the block of code that eats the next most time and speed that up; and keep going until you are not significantly speeding up the project. “Significant” in this context is plus or minus 5 or 10 percent. Write up all results in the project log.

Sometimes the debugger is essential, but for the most part, you should avoid it like the plague. As a project grows, the software quickly becomes complex, and then tracing through the debugger can become unbelievably time-consuming. Rather than debug, re-read the code line by line and try to figure out what it is actually doing (thanks, Vipin!). Formulate a theory of what went wrong, and put in print statements to test your theory. You can narrow down where the problem by trying this in different places, working from higher up in the code on down to lower levels. However, if your problem crashes or throws an exception, use the debugger right away, because it will automatically show the stack trace at the time of the crash or exception. Write up all theories and test results in the project log. Note that on platforms with something like DTrace, you do not need to explicitly code print statements. On such platforms you should maintain a library of trace scripts for the project, perhaps in the body of the project log (another reason for using LaTeX, which uses plain text files). The present paragraph is the fourth most important paragraph in this advice.

If you work in an environment where code review is possible or required, by all means, use it. Don’t use it automatically; but do use it the instant you get into a bind, or any time you think someone else  is more likely than you are to solve your problems.

If you follow this advice, and especially if you internalize the philosophy of writing the clearest possible code and testing all use cases, you will be surprised at how little debugging you need to do. You will spend a very short time setting up your build system and stub project; what seems like forever fussing around, with your head hurting, while you re-work and re-work the ideas and clarify and re-clarify the initial code; then the “Ah-HAH!” will come, and you will finish coding all functionality and test cases in what seems like a remarkably short time – often, coding as fast as you can type.

When you are done you will also have a test suite, a “User’s Guide,” and a “Reference Manual;” and your code will probably port very easily to other operating systems.

Comments are closed.