Sunday, 14 February 2016

Get Considered Harmful

[I’ve tried really hard to come up with a better title than this, e.g. Thesaurus Driven Development, but none of them quite conveyed my feelings. Surely we must all be entitled to use the “??? Considered Harmful” template at least once in our careers? Well, this is mine.]

They say that one of the hardest problems in computer science is naming things [1]. We struggle over what to call our functions, classes, variables, filenames, servers, etc. in the hope that they adequately convey what it is that they do or contain.

In my own programming endeavours I’ve found the use of the Thesaurus a vital ingredient in staving off the boring repetition that so often litters the names in a codebase. In fact one of the latest additions to my “In The Toolbox” column for C Vu touched on this very topic (see “Dictionary & Thesaurus”).

Bueller, Bueller, Anyone…

Whilst the travesty that is appending the word “Service” as a suffix to some noun in the hope of making it sound “behavioural” ( and therefore also suitable for repurposing as an equivalently named interface) is a common blight, but it’s not the one I’m interested in this time.

No, in this instance the word that is the target of my ire is “get”. It’s often as if no other word exists in the English language for helping to name what a function (or method) returning a value does. I sometimes wonder if the all the effort is expended by programmers trying to decide what it is that they’re trying to obtain that they have no time or energy left to describe how they will be obtaining it.

The word has become so utterly bland and devoid of any usefulness that, by definition, it could be used for absolutely anything and therefore it is a de-facto choice that could never be wrong. Of course unless a complete lack of descriptiveness is one of your criterion for a good name. Sadly it is one of mine.

A Smorgasbord of Alternatives

As I mentioned in that C Vu article, and more recently on Twitter, I can think of a whole bunch of words that could almost certainly describe more accurately what it is that “some function” might be trying to achieve. Here once again is a similar, off-the-cuff list of choices:

build, create, allocate, make, acquire, format, locate, request, fetch, retrieve, find, calculate, derive, pull, process, format, transform, generate.

Within that simple list of words there are clearly a number of categories under which we could group our choices. For example if I’ve got a factory function then I might choose: create, build, make or allocate.

If the factory is some kind of object pool I might not want to use the create/destroy or allocate/free pair as they potentially suggest too much and so I might go for the slightly looser acquire/release pair.

If the creation is for a computed sequence of items rather than a single value or entity, then generate might be more applicable.

Another group centres around the kind of calls made on an external (or remote) service, e.g. request, fetch, retrieve, locate, find, query and search. Some of these words are equally fitting for searching a local collection too and therefore the degree of transparency may have a bearing on the choice. There is an old programming adage “Don’t hide the network”, and naming is one clue we can use to do that.

The third obvious group I can make out are words for describing the more common notion of what a traditional function is: the transformation of one value into another. Typically we put one value in and get the same value out, but changed in an interesting way, e.g. a different representation. Or maybe we do get a different value out, but with the same representation. Or even a change in both, but nevertheless one that is still similar in more ways than not [2].

For this style of (pure) function I would naturally look to words like: transform, convert, format, parse, calculate, derive, etc. Words like execute or process, which might also convey the notion of some transformation come (in my mind) with thoughts of side-effects too, rather than the purity of a mathematical function.

Accessor or Factory?

If you take a very rigid stance that says every function can either return a value it already holds, or can create a new one, then every function must either be an accessor or a factory. It’s easy to see how this kind of myopia leads to a codebase littered with “factory” classes when they’re basing their naming on a simplistic view of the function’s implementation.

The flipside to being overly generic is being overly specific. When you have a concrete method it’s easy to name it after what it does. But when the method is part of an interface which can, by definition, be implemented in many different ways we run the risk of potentially sending the reader down the wrong path.

Which is worse though: forcing the reader to look inside the box because it has no meaningful label, or asking them to suspend their disbelief when the contents eventually turn out to be blander than advertised? Personally I’d rather be surprised to find a simple calculation when I wasn’t expecting it rather than a network hop to a remote machine.

Blandness Over Configuration

I do wonder if part of the problem might not lie with modern tooling. In our desire to embrace the notion of Convention over Configuration we restrict ourselves to using only those words that the tool can recognise. Those words are very likely to be the ones that have the broadest stroke exactly because we expect them to cater for a wide degree of diversity – it’s the same issue as with interfaces, but on an even wider scale.

Another source of inspiration for such vocabulary are books on Design Patterns, and that usually means the seminal one from the Gang of Four. One might argue that the raison d'ĂȘtre of the Design Patterns movement was to create a common vocabulary, and it was, but that doesn’t mean you have to use the pattern names and examples in your own types and functions.

Haven’t We Been Here Before?

If any of this coming from me sounds at all familiar it might be because I’ve trodden a similar path before right back in the infancy of this very blog, see “Standard Method Name Verb Semantics”. Back then I was particularly interested in trying to overcome the horribly overloaded nature of certain words, of which “get” happened to be just one.

In a sense this blog post is a step backwards. Rather than trying to overcome the subtitles of whether Get or Find is more suitable for a method that can also return the absence of a value [3], or Allocate versus Acquire for an object pool, I’m just trying to convince programmers to use almost anything [4] other than “get” once again.

[1] The other one of course is cache invalidation (and off-by-one errors).

[2] For example I once saw an F# example that “mapped” a URL into a JSON document. While a lawyer might argue that it’s a perfectly valid “function”, I’d contend that it is hiding a huge amount of complexity. This is both a blessing and a curse depending on whether or not it’s your turn with the support pager.

[3] The answer of course is probably “either”, but return an Option instead of a value / null reference; i.e. make the return type the defining characteristic.

[4] By applying Cunningham’s Law the function stands a chance of being renamed when the mistake is noticed, whereas giving it a dreary pretty much guarantees it’ll never be changed.