by Sonia Connolly
In my first programming job out of school, I was assigned to debug an MS-DOS executable written in C that crashed intermittently. It was overlaid,[1] meaning it needed more than 640K of memory to run and paged itself out in sections. It took me weeks to track down the uninitialized array that was causing the problem. Always initialize your newly allocated memory!
Aside from one memorable lecture where my C professor handed out six versions of a student’s homework assignment and showed us methodical debugging techniques, no one at school talked about working with existing codebases, let alone ones that were too big to fit in memory. That was for lowly “software engineers,” and we were lofty “computer scientists.”
The second 90%
It turns out that I like software engineering. It’s like that old aphorism, “There’s the first 90% of a project, and then the second 90%.”[2] Not just proving that something can work,[3] but following up on all the practical details and special cases to make it work for actual customers and keep it functioning over time. I have developed a few greenfield (started from scratch) projects in the 25 years since that first full-time job, but most of my work has been on sprawling, messy, awkward legacy codebases written by other people.
Just as some computer scientists look down on software engineers, some software engineers look down on those who work on legacy systems and/or maintain production code rather than creating shiny new applications. “Hotshot” programmers work with the latest technologies on highly visible projects with lots of scheduling drama. “Maintenance” programmers work with older technologies on lumbering projects that are held in oddly low regard even though they usually provide a major income stream for the company.
Diverse legacy teams
Typically, cis straight white men jostle each other for “hotshot” positions, while women and people from other under-represented groups get shunted to the “less desirable” jobs on legacy teams. This means that legacy teams tend to be more diverse, and can be more relaxed, pleasant places to work. There tend to be more assumptions of competence, and less mansplaining, whitesplaining, etc. More camaraderie, and less of an attitude of, “What are you doing here?”
Maintaining legacy code could be seen as “women’s work,” cleaning up someone else’s mess. At the same time, it can be fun to track down bugs in an unfamiliar codebase. It can be even more satisfying to gradually build an internal model of a sprawling codebase over time and use hard-earned intuition to go directly to the source of an issue.
Messy because it works
Programmers are prone to declaring that legacy systems need a full rewrite, without recognizing that by the time the new system handles everything the old system does, it will be just as scraggly.[4] There is a hidden assumption that those past programmers did not know what they were doing, and of course we would do it better. Possibly true. More likely, they were well-intentioned, competent people doing their best with the resources available at the time, just like us. Also, they might still be sitting just over the cubicle wall, or a few rows over in the open workspace, or in the same remote chat room. Best to be courteous.
Sprawling, messy software systems get that way for a reason, usually many reasons. They are often written and maintained by more than one programmer, with varying styles and idiosyncrasies, under time pressure, with requirements and features that change over time. Even if a system starts with pristine, carefully structured code, it quickly acquires baroque special cases and exceptions to accommodate customer needs, performance tuning, and workarounds for bugs in other software. Last year’s best practices look outdated and hard to maintain this year.
Weighing tradeoffs
More than once, I dove in to refactor a gnarly “nonsensical” section of code, only to learn more about the tradeoffs in play and understand why the past programmers made the decisions they did. I might still prefer a different solution, but the old one does make sense. Perhaps multiple processes are adding to a queue of jobs, requiring repeated checks for whether a queue is empty. Perhaps the scheduling rules for this business process really are that complicated. Perhaps that simpler-looking piece of code triggers a core language bug, and no one has had time (or interest) to dig into the mysterious details.
As I weigh tradeoffs myself, I have probably left behind sections of code that puzzled or dismayed my successors. Looking back, I probably did not need bit flags to keep track of that one set of dialog options, although it did make it fast to check whether they were all set to false.
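For readers who have not met the technique, here is a minimal Python sketch of bit flags for a set of dialog options. The option names and helper functions are hypothetical, not taken from the original code. It only illustrates why the "all false" check is cheap: every option lives in one integer, so one comparison against zero covers them all.

```python
# Hypothetical dialog options, one bit each.
SHOW_TOOLTIPS = 1 << 0
AUTO_SAVE = 1 << 1
CONFIRM_ON_EXIT = 1 << 2


def set_option(options: int, flag: int) -> int:
    # Bitwise OR turns the flag on without touching the others.
    return options | flag


def clear_option(options: int, flag: int) -> int:
    # AND with the complement turns the flag off.
    return options & ~flag


def is_set(options: int, flag: int) -> bool:
    return (options & flag) != 0


def all_options_off(options: int) -> bool:
    # A single comparison checks every flag at once.
    return options == 0
```

The cost is readability: a successor has to decode `options & ~flag`, where a handful of named booleans would have been self-explanatory.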
Sometimes, the best course is to leave code unchanged, no matter how messy it is. I spent a week refactoring a long procedural section of code studded with if-else clauses that had accreted over time. I created small classes and methods with single responsibilities, and confirmed that the tests still passed. In the end, we did not merge in my changes, because the complex code was a mission-critical interface to an outside vendor, and we decided the costs of destabilizing that workflow outweighed the benefits of cleaner code. The time was not wasted, because I gained an in-depth understanding of the existing code.
Eliminate issues in advance
While some mysterious sections of code are still needed for obscure reasons, other features become obsolete over time. Tugging on a thread might result in deleting 1000 lines of unused code. Of course, you do not want to discover whether it is used or unused by trying it in production. Thorough test coverage is a legacy maintainer’s friend, in addition to careful code path analysis and consultation with product or project managers.
Like a lot of code maintenance work, deleting code might not feel as productive as implementing a new feature. At the same time, no one will ever have to read, debug or upgrade that code again. Overall, maintenance contributes to future productivity by eliminating problems before they happen.
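Before deleting a function that looks dead, one cheap first step in code path analysis is a whole-word search for its name across the codebase. The sketch below is one possible way to do that in Python. The function name `find_references` and the default of searching only `.py` files are assumptions made for illustration. A plain text search cannot see calls made through reflection, configuration, or other systems, so an empty result is a hint, not proof that the code is unused.

```python
import re
from pathlib import Path


def find_references(root, name, suffixes=(".py",)):
    """Return (path, line_number, line) for each whole-word match of name."""
    # \b keeps "old_export" from matching "old_export_v2".
    pattern = re.compile(r"\b" + re.escape(name) + r"\b")
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((str(path), number, line.strip()))
    return hits
```

If the only hit is the definition itself, that is a good moment to check test coverage and ask a product or project manager before deleting anything.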
Legacy codebase spelunking tips
- Start small. If you get a choice, make your first task small in scope. Expect it to go slowly as you get oriented in the codebase.
- Expect to be puzzled. At first nothing will make sense. How the code fits together will become clear over time. Have patience with your learning process.
- Pair program. If there are developers available who are familiar with the codebase, work alongside them to pick up their insider knowledge. Work with someone else even if they do not already know the code. Hard problems are best investigated in teams.
- Read documentation. Read whatever you can find. Add to it as you learn. Your successors, possibly including future-you, will appreciate it.
- Read the tests. Tests can give you hints about what the code is meant to do, and where past trouble-spots have been. If there are no tests, start by writing some.
- Run the tests. Make a change, and see what tests break. See if you can write a test that reproduces the bug you are fixing.
- Search the code. Get a fast command line search utility like ack[5] or ag[6], especially if your preferred editor does not have a fast multi-file search. Sometimes tests for a single piece of code are scattered across several files. Sometimes a method is reused in unexpected places.
- Search the web. Read the whole error stack output, and search on key phrases. You are probably not the first person to encounter this problem.
- Take notes. Run experiments like a scientist, where you make one change at a time and take note of the results.
- Leave a trail. Commit to your local branch early and often. You never know when that “one little change” will break everything. Be prepared to back out and start over.
- Add print and log statements. This is a time-honored debugging technique, especially when you are not sure what parts of the code are being executed. (Except in an overlaid executable, where printf causes an immediate, opaque crash because it is no longer in memory.) Do not forget to take them out again before you commit your code.
- Look at the logs, including database logs, test output, and production logs.
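Two of the tips above, writing a test that reproduces the bug and adding log statements, can be sketched together in a few lines of Python. Everything here is hypothetical: `average_order_total` stands in for a legacy function, and the imagined bug report is a crash on an empty list. The test is written first, fails against the old code, and then stays in the suite as a guard after the fix.

```python
import logging
import unittest

logger = logging.getLogger(__name__)


def average_order_total(totals):
    """Hypothetical legacy function; the reported bug was a crash on an empty list."""
    # Temporary log statement to confirm this code path is actually executed.
    logger.debug("average_order_total called with %d totals", len(totals))
    if not totals:
        # The fix: return 0.0 instead of dividing by zero.
        return 0.0
    return sum(totals) / len(totals)


class AverageOrderTotalTest(unittest.TestCase):
    def test_empty_list_does_not_crash(self):
        # Written first to reproduce the bug report. Without the guard
        # above, this call raises ZeroDivisionError.
        self.assertEqual(average_order_total([]), 0.0)

    def test_typical_totals(self):
        self.assertEqual(average_order_total([10.0, 20.0]), 15.0)
```

Assuming the code is saved in a module, running `python -m unittest` on that module executes both tests.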
Respect, collaborate, improve
Over time, most engineers who maintain legacy systems acquire a long view, or maybe there is a personality type that tends toward maintenance work. It takes respect for a wide variety of code styles and solutions to problems, and at the same time careful attention to consistent, clear code. An eye toward incremental improvement. Willingness to do tedious repetitive work when needed, as well as stubborn, creative investigation of unsolved issues. Collaboration within and among teams. A focus on user satisfaction as the ultimate goal.
Unfortunately, legacy projects and teams are often valued less than more recently launched projects and their developers. Career advancement might be slower, especially for already marginalized engineers. Legacy teams might need to do ongoing advocacy for recognition and resources.
While it might be worrisome not to be learning the latest technology to list on one’s resume, the tools and techniques for maintaining legacy code will always be needed. Respect for colleagues, joy in solving puzzles, commitment to improvement, and patience with inevitable roadblocks in legacy code are skills needed in any company with an existing codebase.
What you can do: Acknowledge maintainers
Take a look at your roles as an engineer, and in your life in general. Where do you forge ahead, creating new things? Where do you maintain what already exists, keeping it running and gradually improving it? Do you notice the maintainers in your world? Everything needs maintenance, from physical structures to roads to relationships. Where can you acknowledge yourself or someone else for the art and labor of maintenance?
Thanks to fellow legacy software engineers Sam Livingston-Gray and Tara Scherner de la Fuente for valuable conversations on these topics.
Sonia Connolly has been programming since the Internet was born. In her other career she offers bodywork for trauma and writes at TraumaHealed.com.
Illustration by Victoria Wang.
1. Use of the Overlay Technique in MS-DOS to Circumvent the 640K Conventional Memory Barrier, by Andrew Vogan
2. “The first 90 percent of the code accounts for the first 90 percent of the development time. The remaining 10 percent of the code accounts for the other 90 percent of the development time.” Tom Cargill, Bell Labs (via Wikipedia)
3. Why It’s Not Academia’s Job to Produce Code That Ships, by Jean Yang
4. Software Rewrite: The Chase, by Erik Dietrich
5. http://beyondgrep.com/install/
6. http://geoff.greer.fm/ag/