I read an interesting article recently from Reginald Braithwaite titled “Defactoring”. Sadly the article was published in a format where no comments can be added directly, so I’m going to publish mine here instead.
The premise of the article was about performing the reverse action of “factoring” on a codebase - essentially manually inlining methods again. The reason the author cited for doing this was largely that the promise of reuse, for which the original factoring had apparently taken place, had never materialised. My problem with this is the underlying assumption that code is factored out solely for the purpose of reuse [1], which is rarely the reason I do it. For me, if I later happen to need to reuse code I factored out earlier, then it’s merely a fortuitous side-effect that it has already been simplified.
Introduce Explaining Variable
In Martin Fowler’s book on Refactoring, one of the items (Ch. 6, p124) is about breaking up long expressions into smaller ones and giving the result a name, i.e. using a variable (or constant [2]), e.g. instead of writing:-
var pt = new Point(((right - left) / 2) + left,
                   ((bottom - top) / 2) + top);
...you could break the expression down and name different parts:-
var width = right - left;
var height = bottom - top;
var centreX = left + (width / 2);
var centreY = top + (height / 2);
var centrePoint = new Point(centreX, centreY);
OK, so I’ve gone to the extreme here to make a point (pun intended) and in the real world the expressions are usually longer and have a few nested uses of the ternary operator thrown in for good measure.
The main reason I would do this decomposition is to explain my thinking. If there is a bug in the original example it might not be obvious what that bug is because you can’t see how I derived the expression. In the latter case, by trying to name the elements of the expression I’m cementing in my own mind exactly how it is I’ve arrived at the solution.
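For example, suppose I had transposed the axes in the one-liner (a contrived bug of my own making, not one from the original article):-

var pt = new Point(((bottom - top) / 2) + left,  // height used for X!
                   ((right - left) / 2) + top);  // width used for Y!

Squashed into a single expression the mistake is easy to skim over, whereas in the decomposed version a line like “var centreX = left + (height / 2);” practically shouts that something is wrong.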
Show Your Workings
With my eldest child just about to go through his exams, I’m reminded about how important it is to “show your workings”. At school it’s all very well knowing you know something, but in an exam it’s important to show the examiner that you know it too. Sometimes we have to learn it the hard way by staring at a maths problem we know we’ve done wrong but can’t see where we went astray because all we wrote down was the final answer.
Magic Numbers
If you take an expression and reduce it down to its smallest elements you eventually arrive at creating names for literal values, or as they are often referred to: magic numbers. The most primitive of these are boolean values and anyone who has come across code that uses the CreateEvent function in the Windows API will have met this kind of code:-
CreateEvent(..., true, false, ...);
You have to look up the documentation to see what the arguments refer to; some simple local constants can make this much more readable at the call site:-
const bool manualEvent = true;
const bool notSignalled = false;
CreateEvent(..., manualEvent, notSignalled, ...);
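The same trick works just as well in C#. As a minimal sketch, imagine a hypothetical Connect method that takes two boolean flags:-

// Hypothetical API: Connect(string host, bool useTls, bool retryOnFailure).
const bool useTls = true;
const bool retryOnFailure = false;

Connect("example.com", useTls, retryOnFailure);

In C# you can also get the same readability without the temporaries by using named arguments, e.g. Connect("example.com", useTls: true, retryOnFailure: false); which documents the intent right at the call site.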
Introduce Explaining Method
The alternative to introducing explaining variables is to introduce explaining methods instead, wrapping the functionality up into lots of small, well-named methods. Going back to our earlier example we could refactor our expression this way:-
public int CentreX(int left, int right)
{
    return ((right - left) / 2) + left;
}

public int CentreY(int top, int bottom)
{
    return ((bottom - top) / 2) + top;
}
Then we can use the methods directly:-
var centrePoint = new Point(CentreX(left, right),
                            CentreY(top, bottom));
On such a simple and obvious example as this it probably seems like overkill, but I’ve often come across other small expressions that just seem to take way too much brain power to understand - both what they represent and, more importantly, whether they’re correct.
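As a made-up illustration of the kind of expression I mean (the names and rates here are pure invention):-

var rate = (age < 18) ? 0.0 : (age >= 65) ? 0.05 : isMember ? 0.1 : 0.02;

...which I would much rather read as:-

private static double DiscountRate(int age, bool isMember)
{
    if (age < 18)
        return 0.0;   // minors pay full price

    if (age >= 65)
        return 0.05;  // pensioner discount

    return isMember ? 0.1    // loyalty discount
                    : 0.02;  // standard rate
}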
Turning once again to the Refactoring book, and the Extract Method item in particular (Ch. 6, p110), we find the following advice in the Motivation section:-
“If extracting improves clarity, do it, even if the name is longer than the code you have extracted”.
Too Many Layers of Abstraction?
Once again, a criticism I hear of this degree of breakdown is that it adds too many layers to the code. I suspect the people who say this are hell-bent on trying to squeeze as much code as possible into the smallest area of the screen. Layering only really becomes a problem when the calls span multiple classes; what I’m talking about is lots of little private methods within the same source file.
Additionally, the point of a name is to convey something to the reader, and naming is notoriously hard to do well in programming, so much so that I can understand why it seems easier to just not do it.
Decades ago there might also have been an argument against doing this on the grounds that it affects performance, and it may well do in dynamic languages, but modern statically compiled languages can easily inline this stuff.
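In .NET, for instance, the JIT will usually inline a trivial method like CentreX of its own accord, and if you’re really worried you can hint it explicitly (a sketch; I’ve rarely found it necessary):-

using System.Runtime.CompilerServices;

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static int CentreX(int left, int right)
{
    return ((right - left) / 2) + left;
}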
There is a famous quote by Sir Tony Hoare that goes:-
“There are two ways of constructing a software design: one way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies”.
To me, breaking a problem down as much as possible helps to create a program that approaches the former state rather than the latter.
[1] Of course it’s entirely plausible that the author has chosen to focus solely on this particular use case for factoring out methods and is perfectly aware of others. Either way, I’m going to add my own £0.02 to the debate.
[2] See the Single Assignment Pattern.