Refactoring Legacy Code
My recent talk at GoGaRuCo was about refactoring legacy code. The talk was based on the Gilded Rose code kata, which is a really fun kata to try. I highly recommend setting aside a few hours to play with it. There are variants of the kata in a number of different languages.
This post reviews the main points I made in the talk, expanding on some of them a bit more than time allowed in the talk itself.
First of all, if you work with legacy code, you need to read Michael Feathers’ book, Working Effectively With Legacy Code. This book gives all kinds of techniques for understanding legacy code and getting tests around it so that you can safely refactor.
The Gilded Rose kata comes with a spec. I ignored it initially because with legacy code, it’s hard to tell how accurate the spec is. It might be out-of-date, or the code might be buggy and not match the spec. I did come back to the spec at the end once I had refactored the code and understood it better.
Every time I’ve seen someone tackle the Gilded Rose kata, they’ve chosen to rewrite the code. It might be an incremental rewrite like Sandi Metz does in her awesome All The Little Things talk, or it might be a full-on, throw-all-the-code-away-and-start-over rewrite. I’ve never seen anyone try to refactor the code into shape. And yet, rewrites are almost never a good idea. It’s really important to know how to refactor legacy code, so that’s what I demonstrated.
In order to make progress on necessary features while making the code better, I tried to balance the following two guidelines:
- The Boy Scout Rule: “Leave the campground cleaner than you found it”. Every time you have to work in a piece of legacy code, do a bit of extra refactoring to make things better. The more you touch that code, the cleaner it will get over time. Code that you never have to touch will stay messy, but that’s OK because you never have to touch it. That way, the code that benefits most from extra cleanup is the code that gets it.
- Don’t Boil the Ocean: Stay focused on the task you’re working on. Yes, make the extra investment in cleaning up the code you have to touch for the new feature, but don’t go down rabbit trails. You can clean up other code the next time you work on a feature that touches it.
Be Safe
If there aren’t tests, write some “characterization tests” as Feathers calls them. You really want to be sure that you’re not changing the visible behavior of the code when you refactor.
J.B. Rainsberger’s recent post, Surviving Legacy Code With Golden Master and Sampling, gives some techniques for getting tests around legacy code. Katrina Owen’s Approvals gem is a handy tool for implementing the Golden Master technique that J.B. talks about.
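To make the Golden Master idea concrete, here is a minimal hand-rolled sketch using only the Ruby standard library. It is not the Approvals gem’s API and not code from the talk. The method `legacy_quality_after_a_day` is a hypothetical stand-in for whatever legacy code you need to pin down, and `snapshot` and `matches_golden_master?` are names I made up for illustration.

```ruby
require "tmpdir"

# Hypothetical stand-in for the legacy code under test.
def legacy_quality_after_a_day(quality, sell_in)
  quality -= 1 if quality > 0
  quality -= 1 if sell_in <= 0 && quality > 0
  quality
end

# Run the legacy code over a sample of inputs and render the results as text.
def snapshot
  lines = []
  (-1..2).each do |sell_in|
    (0..3).each do |quality|
      result = legacy_quality_after_a_day(quality, sell_in)
      lines << "sell_in=#{sell_in} quality=#{quality} => #{result}"
    end
  end
  lines.join("\n")
end

# The first run records the master file; later runs compare against it.
def matches_golden_master?(path, actual)
  unless File.exist?(path)
    File.write(path, actual)
    return true
  end
  File.read(path) == actual
end

# Usage: a temp directory keeps this sketch from leaving files behind.
# In a real project the master file would be checked in next to the tests.
Dir.mktmpdir do |dir|
  master = File.join(dir, "golden_master.txt")
  matches_golden_master?(master, snapshot) # records the current behavior
  matches_golden_master?(master, snapshot) # compares against the record
end
```

The test knows nothing about what the output should be. It only knows what the output was, which is exactly what you want before you start refactoring.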
Work in very small steps. Do lots of simple micro-refactorings one step at a time. These baby steps will add up to big changes over time. Even if you choose to take bigger steps most of the time, it is important to be able to fall back to really tiny steps when things get complicated.
In my talk, I performed 80 baby steps to go from the initial state to a significantly cleaner final state that also implemented the new feature that was the goal of the kata.
Start Simple
Start with really basic refactorings, preferably ones that can be performed automatically. I used RubyMine to prepare the talk, and was able to use its built-in refactoring support quite a bit, especially early on.
Do simple things like reducing noise and removing duplication by extracting variables and methods, using common language idioms to make the code cleaner and more expressive, etc.
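As a rough illustration of this kind of mechanical cleanup (a made-up example, not code from the kata or the talk), here is a small method before and after extracting variables and switching to common Ruby idioms. The method names, the integer-cents prices, and the discount threshold are all assumptions chosen for the sketch.

```ruby
# Before: manual indexing, explicit nil checks, and a repeated expression.
def order_total_before(prices_in_cents, tax_percent)
  total = 0
  for i in 0..(prices_in_cents.length - 1)
    if prices_in_cents[i] != nil
      total = total + prices_in_cents[i]
    end
  end
  if total + total * tax_percent / 100 > 10000
    return total + total * tax_percent / 100 - 500
  else
    return total + total * tax_percent / 100
  end
end

# After: idiomatic enumeration, plus extracted variables that name
# the repeated expression. Behavior is unchanged.
def order_total_after(prices_in_cents, tax_percent)
  subtotal = prices_in_cents.compact.sum
  total_with_tax = subtotal + subtotal * tax_percent / 100
  total_with_tax > 10000 ? total_with_tax - 500 : total_with_tax
end

order_total_after([1000, nil, 2500], 8) # => 3780
```

Neither step required understanding why the discount exists. That is the point: the changes are safe to make early, and the code is easier to read afterwards.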
When you’re first starting on some new legacy code, you know less about the code than you ever will. Using simple, mechanical changes allows you to get your hands on the code and begin to understand it.
Capture Learning
As you learn something about part of the code, try to capture that new learning in the code. Extract a variable or method and give it a name that says what the code does. That way you can get that knowledge out of your head, freeing up bandwidth for later cleanups.
Make sure you express important ideas and domain concepts in the code. This makes it easier for the next person to understand the code when they come back to it.
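Here is a small sketch of what capturing learning can look like, loosely in the flavor of the kata. The names `StockItem`, `MAX_QUALITY`, `expired?`, and `below_max_quality?` are ones I picked for illustration. They are assumptions, not the kata’s code and not necessarily the names from the talk.

```ruby
StockItem = Struct.new(:sell_in, :quality)

# Before: the reader has to work out what 50 and "sell_in < 0" mean.
def improve_before(item)
  item.quality += 1 if item.quality < 50
  item.quality += 1 if item.sell_in < 0 && item.quality < 50
  item
end

# After: what we learned is recorded as names in the code.
MAX_QUALITY = 50 # assumed cap; the name explains the magic number

# Learned: a negative sell_in means the sell-by date has passed.
def expired?(item)
  item.sell_in < 0
end

def below_max_quality?(item)
  item.quality < MAX_QUALITY
end

def improve_after(item)
  item.quality += 1 if below_max_quality?(item)
  item.quality += 1 if expired?(item) && below_max_quality?(item)
  item
end
```

The logic is identical in both versions. The second one just no longer depends on anyone remembering what the comparisons mean.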
Don’t Blindly Fix Bugs
As you clean up legacy code, you will almost certainly discover bugs. Don’t blindly fix them. Many times, the bug is actually masked or worked around by other code or systems. Or people have come to expect the output they’re getting, and fixing the bug will mess them up.
You should absolutely make a note of the bug and do some research to figure out if it should be fixed. Talk to others, look at the code that consumes the buggy result, etc. But don’t just fix the bug without doing some due diligence.
Until you have a better handle on things, you want to give priority to preserving the existing behavior of the system, even if it seems wrong.
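One way to preserve a suspicious behavior while you investigate is to pin it in a characterization check and leave a note beside it. The sketch below is entirely hypothetical: `bulk_discount` stands in for a legacy method, and I am assuming a spec that says “10 or more” earns the discount while the code uses a strict comparison.

```ruby
# Hypothetical legacy method. Assume the spec says "10 or more" earns the
# discount, but the code compares with > instead of >=.
def bulk_discount(quantity)
  quantity > 10 ? 0.10 : 0.0
end

# Characterization checks: each entry records what the code does today,
# not what we think it should do.
PINNED_BEHAVIOR = {
  9  => 0.0,
  10 => 0.0, # Suspected bug: research before changing; callers may rely on it.
  11 => 0.10
}.freeze

def pinned_behavior_intact?
  PINNED_BEHAVIOR.all? do |quantity, expected|
    bulk_discount(quantity) == expected
  end
end

pinned_behavior_intact? # => true
```

If the research says the bug should be fixed, changing the pinned value for 10 becomes a deliberate, reviewable decision rather than a side effect of a cleanup.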
Practice and Learn
Practice refactoring whenever you can. Try some code katas that emphasize refactoring. Try refactoring some code in your current codebase. Even if you choose not to commit the changes, the practice will help.
Read Martin Fowler’s Refactoring book (or the Ruby edition).
Almost all of us will encounter legacy code at some point. The more tools you have for working with it, the better off you will be.