All systems eventually reach a point where they are either no longer needed or the cost of maintenance outweighs their benefit to the business. If the system was originally built using hardware and tools that are now unavailable, what choice do you have but to rewrite from scratch? Hopefully you’ve seen this position coming and have instead made provision to migrate to newer tools and technologies whilst at least maintaining the usefulness of the system. I’ve heard this kind of change referred to as an Architectural Refactoring. The term refactoring carries the same meaning as you’d expect - there is no change in behaviour, only in the underlying mechanism.
In one sense refactoring is rewriting; after all, the code has changed. The difference is that the changed code is still part of the same feature, and therefore still provides the same business value it did before, but hopefully in a “better” way. What rewriting seems to have come to mean is developing the same feature again, but to one side of the production codebase. In essence there is a switchover from the old to the new, and probably a change in behaviour too, as it’s likely that new features are what drove the rewrite in the first place.
I’ve seen plenty written about the fallacy of rewriting, but usually from the perspective of whether it actually delivers the business benefits it was expected to. One commonly cited problem is that the same mistakes are often repeated, or the new system ends up doing exactly what the old one did, which is probably not what the business wanted in the first place!
However, my beef here is with rewriting internal parts of the system, especially when that rewrite does not involve a major change in tool or technology. If you already have a feature providing value, then the desire should be to evolve the implementation of that feature whilst continuing to draw on its value. By evolving through refactoring you get to continue extracting value whilst at the same time being in a position to put your spade down and switch focus to more important matters when necessary. If you choose the rewriting path you gain no value out of the change until you have switched over to it in production and decommissioned the original code.
Some time ago, at a previous client, I wrote a lengthy diatribe about how COM was not the problem in itself, but the way COM was used within the system was. I declared that one of the barriers to delivering new features was the inability to do unit testing, because the excessive use of COM within the components made it intrinsically hard[1]. To try and turn the tide I wrote a couple of simple helper functions[2] that allowed us to reach into the COM component and extract the native C++ object inside, as we owned both sides of the interface. This gave us a clear path for the future where we would push the logic from the COM component down into a native C++ class, which would then be unit testable. Initially we would use the dirty helper functions to overcome the COM boundary, and then start fixing up the call graphs by taking the native type instead of the COM interface as the parameter. This would then push COM back out to the module boundary where it belongs.
More recently I’ve been hit a couple of times by the worst of both worlds - a rewrite that is only partially wired in - so we are currently maintaining two versions of the same feature. Worse still, the end state almost certainly means ripping out the temporary code used to partially wire it in, so there is yet more disruption ahead of us. One obvious question is whether refactoring would have just led to the same outcome. Personally, I don’t believe so, because refactoring keeps you in permanent contact with all the current consumers of your feature. When you rewrite you have the opportunity to focus on a single use case and then try and evolve it as you reel in the remainder; with refactoring you are always dealing with real use cases[3].
Aside from the mental gymnastics required to juggle two implementations, there is another cost which is commonly missed - source code continuity. As is probably already apparent from my other posts, I’m a huge fan of using Software Configuration Management (SCM) tools to perform Software Archaeology when I need to track down why a change was made. Writing new code makes that task much harder, because you have to trace back through the current implementation whilst keeping an eye out for where the previous one was removed, just so that you can then trace back through that too. Although refactoring moves code around, it generally remains within the same set of source files and commits, and so you stand a better chance of piecing together the history.
In a modern project every commit should be expected to deliver some value today. Refactoring may take longer in the long run, but at least every commit is a step forward. Rewriting stores up value in the hope that it will deliver in the future, but it stores up risk too, along with creating a disconnect between past and present.
[1] Registration-free COM was yet to come. Solutions involving registering every component before running any tests are brittle at best and disallow running parallel builds on the same machine. Registering the components using relative paths is possible, but you have to do a little work yourself.
[2] Essentially I created a marker COM interface called ISupportDynamicCast that any natively implemented COM class would “implement”. The presence of this interface means the caller knows it’s safe to attempt a dynamic_cast<> on the object from ISupportDynamicCast to the underlying C++ type. There are plenty of caveats to doing this, but if you implement both the caller and callee, and are using the same compiler for both, it just may dig you out of a hole.
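To make that concrete, here’s a minimal sketch of how such a marker interface and extraction helper might look, assuming MSVC with RTTI enabled and the standard COM headers; the GUID, the GetNativeObject name and the error handling are placeholders of mine, not the original code.

```cpp
#include <windows.h>
#include <unknwn.h>
#include <stdexcept>

// Marker interface: it adds no methods of its own, it merely signals that
// the object behind the COM interface is a native C++ class built in the
// same module, so a dynamic_cast across it is safe. The GUID is a placeholder.
struct __declspec(uuid("DEADBEEF-0000-0000-0000-000000000001"))
ISupportDynamicCast : public IUnknown
{
};

// Reach inside a COM object and pull out the underlying C++ implementation.
// Only sensible when caller and callee are built with the same compiler and
// live in the same process - see the caveats above. Note that no reference
// is held on behalf of the returned pointer; the caller keeps the COM object
// alive for as long as the native pointer is in use.
template<typename T>
T* GetNativeObject(IUnknown* object)
{
    ISupportDynamicCast* marker = nullptr;

    if (FAILED(object->QueryInterface(__uuidof(ISupportDynamicCast),
                                      reinterpret_cast<void**>(&marker))))
        throw std::runtime_error("object does not support dynamic casting");

    T* native = dynamic_cast<T*>(marker);
    marker->Release();

    if (native == nullptr)
        throw std::runtime_error("object is not the expected native type");

    return native;
}
```

A natively implemented class just derives from ISupportDynamicCast alongside its real interfaces and answers for it in QueryInterface; a unit test can then call GetNativeObject<SomeConcreteClass>(obj) - with SomeConcreteClass being whatever native type sits behind the interface - to get at the testable C++ object underneath.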
[3] Yes, it’s possible that one or more of those use cases is no longer required; if so, you get to rip it out there and then, rather than discover later that you’ve been living with dead code all along.