by Amy Farrell

If you’re like me, you use a compiler (or the compiler’s cousin, an interpreter) fairly regularly without giving it much thought. Even more likely, you use the output of a compiler, giving it no thought at all. This is actually a wonderful state of affairs!

Before compilers, programming a computer involved specifying a sequence of extremely simple instructions. For example, if you wanted to add two numbers, you would instruct the computer to:

  • store one of the numbers in a particular register (location),
  • then store the other number in another register,
  • then add the contents of the two registers, as in the sketch below.
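
Here’s a rough simulation of those three steps in Python, purely for illustration (the register numbers are arbitrary; real machine code would address actual hardware registers):

  registers = [0] * 8    # a small bank of numbered storage locations

  registers[1] = 2       # store one number in register 1
  registers[2] = 3       # store the other number in another register
  registers[6] = registers[1] + registers[2]    # add the contents of the two registers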

It’s as if you had a teenager, and instead of saying “take out the trash,” you had to say “go to the garage, then find the bin in the front right corner. Open the bin, tie off the bag in the bin, close the bin again, then open the garage door, then take the bin to the curb.” 1 2

To make this even more difficult, the low-level instructions for a computer are fairly cryptic. Computer hardware is designed to accept instructions in the form of ones and zeros, corresponding to high and low voltages in electronic components. Early programs were written directly as these sequences of bits. Each instruction was coded as a fixed-length “word,” with the first bits indicating an operation and the rest indicating values or storage locations.

Today’s computers still take instructions in sequences of ones and zeros, known as “machine code.” But hardly anyone writes programs that way anymore. Machine code is tedious to write and even more difficult to read. For example, here’s what a single instruction in MIPS machine code looks like:
000000 00001 00010 00110 00000 100000
This adds the contents of registers 1 and 2 and places the result in register 6. Example from http://en.wikipedia.org/wiki/Machine_code
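
Decoded field by field (following the standard MIPS R-type instruction layout), those bits break down as:

  000000   operation code: an R-type arithmetic instruction
  00001    first source register (register 1)
  00010    second source register (register 2)
  00110    destination register (register 6)
  00000    shift amount (unused here)
  100000   function code: add

In MIPS assembly mnemonics, the same instruction is written add $6, $1, $2.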

In a higher-level language, an equivalent might be:
$result = $a + $b;
Easier!

Soon, mnemonics were introduced, so that instructions might use “A” for add and “D” for decrement. The programmer still needed to specify all the low-level steps, though.

Enter the compiler! A compiler takes code written in one computer language and generates code that performs the same task, usually in a lower-level language.

In 1949, Grace Hopper began work on a more symbolic system of instructions for the UNIVAC I. This first compiler, the A-0, provided a couple of major advancements for programmers. It translated symbolic mathematical code into machine code, and it allowed the programmer to call previously-defined routines (sequences of instructions) simply by specifying a call number for them. As Hopper put it, the computer would “find them on the tape, bring them over and do the additions.” 3

This brought a much-needed level of abstraction to computer programming. Constructing a computer program always involves some mix of understanding the problem to be solved and understanding how the computer works. All of those stored routines on tape represented work to solve a particular problem on a particular computer. That work could now be summarized in a single line of the A-0 program to be compiled.

The more the details of the computer can be abstracted away, and the more succinctly common operations can be expressed, the easier it becomes for a human to reason about the logic of a program.

The first really successful high-level language was FORTRAN I, developed between 1954 and 1957 by a team at IBM led by John Backus. IBM was motivated to take on this project to reduce the cost of developing software, which it had discovered was greater than the cost of the hardware the software ran on.

The FORTRAN compiler had a structure similar to that of most compilers in use today: 4

  1. Analyze the source code, breaking it into tokens and parsing its syntax,
  2. Perform semantic analysis (assigning meaning to the tokens and possibly identifying errors),
  3. Optimize an intermediate representation of the code, usually for speed or memory efficiency, and finally
  4. Generate code. (A toy sketch of these phases follows.)
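
To make the phases concrete, here is a toy compiler written in Python, purely for illustration; the tiny expression language, the function names, and the stack-machine instruction set are all invented for this sketch:

  import re

  def tokenize(src):
      # Phase 1a: lexical analysis, breaking the source into tokens.
      return re.findall(r"\d+|[a-z]+|[+*()]", src)

  def parse(tokens):
      # Phase 1b: syntax analysis, building a tree of nested tuples.
      def expr(i):                          # expr := term ('+' term)*
          node, i = term(i)
          while i < len(tokens) and tokens[i] == "+":
              right, i = term(i + 1)
              node = ("+", node, right)
          return node, i
      def term(i):                          # term := atom ('*' atom)*
          node, i = atom(i)
          while i < len(tokens) and tokens[i] == "*":
              right, i = atom(i + 1)
              node = ("*", node, right)
          return node, i
      def atom(i):                          # atom := number | variable | '(' expr ')'
          if tokens[i] == "(":
              node, i = expr(i + 1)
              return node, i + 1            # skip the closing ")"
          tok = tokens[i]
          return (("num", int(tok)) if tok.isdigit() else ("var", tok)), i + 1
      node, _ = expr(0)
      return node

  def check(node, known_vars):
      # Phase 2: semantic analysis, e.g. rejecting undefined variables.
      if node[0] == "var" and node[1] not in known_vars:
          raise NameError("undefined variable: " + node[1])
      if node[0] in ("+", "*"):
          check(node[1], known_vars)
          check(node[2], known_vars)

  def fold(node):
      # Phase 3: one tiny optimization, folding constant subexpressions.
      if node[0] in ("+", "*"):
          op, l, r = node[0], fold(node[1]), fold(node[2])
          if l[0] == "num" and r[0] == "num":
              return ("num", l[1] + r[1] if op == "+" else l[1] * r[1])
          return (op, l, r)
      return node

  def generate(node, out):
      # Phase 4: code generation, here for an imaginary stack machine.
      if node[0] == "num":
          out.append("PUSH " + str(node[1]))
      elif node[0] == "var":
          out.append("LOAD " + node[1])
      else:
          generate(node[1], out)
          generate(node[2], out)
          out.append("ADD" if node[0] == "+" else "MUL")
      return out

  tree = parse(tokenize("2 + 3 * 4 + x"))
  check(tree, known_vars={"x"})
  print(generate(fold(tree), []))   # ['PUSH 14', 'LOAD x', 'ADD']

Notice that constant folding turns 2 + 3 * 4 into a single PUSH 14 before any code is generated — a small taste of what the optimization phase buys.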

I tend to think that the major breakthrough here is that the programmer can specify work in a higher-level language, something that is more expressive and less cumbersome than machine code. But the analysis and optimization features have proven so valuable that there are compilers that take code in one computer language and generate output in the same language. For example, Google offers the Closure Compiler, which takes JavaScript and produces smaller, faster JavaScript.

Speaking of compilers that output code in the same language they take in, the term “recompiler” is occasionally used to refer to a compiler that operates on already-compiled code, usually to make it work better on the specific hardware or in the specific environment where it will run.

In software programming, as in any technological endeavor, when we’re building something new, we build upon work by others. The development of the compiler and high-level programming languages is one example of how even the solo developer is part of a group effort that’s been going on for decades. What are we doing today that might change the way people work in the future?

1  You’ll notice I forgot to tell the teenager to close the door and to come back. See how hard these low-level instructions can be?
2  Thanks to my work colleagues for this example, which I have exaggerated only slightly.
3  http://www.cs.yale.edu/homes/tap/Files/hopper-story.html
4  The “Compilers” course on Coursera, by Alex Aiken of Stanford, provides a nice overview of this topic in its introductory video lectures: https://class.coursera.org/compilers/lecture/preview

Amy Farrell is a full-stack web developer at Lucid Meetings. She’s been programming (among other things) since the 1980s and continues to enjoy figuring things out.