Thursday, 15 June 2017

Refactoring – Before or After?

I recently worked on a codebase where I had a new feature to implement but found myself struggling to understand the existing structure. Despite paring a considerable amount I realised that without other people to easily guide me I still got lost trying to find where I needed to make the change. I felt like I was walking through a familiar wood but the exact route eluded me without my usual guides.

I reverted the changes I had made and proposed that now might be a good point to do a little reorganisation. The response was met with a brief and light-hearted game of “Ken Beck Quote Tennis” - some suggested we do the refactoring before the feature whilst others preferred after. I felt there was a somewhat superficial conflict here that I hadn’t really noticed before and wondered what the drivers might be to taking one approach over the other.

Refactor After

If you’re into Test Driven Development (TDD) then you’ll have the mantra “Red, Green, Refactor” firmly lodged in your psyche. When practicing TDD you first write the test, then make it pass, and finally finish up by refactoring the code to remove duplication or otherwise simplify it. Ken Beck’s Test Driven Development: By Example is probably the de facto read for adopting this practice.

The approach here can be seen as one where the refactoring comes after you have the functionality working. From a value perspective most of it comes from having the functionality itself – the refactoring step is an investment in the codebase to allow future value to be added more easily later.

Just after adding a feature is the point where you’ve probably learned the most about the problem at hand and so ensuring the design best represents your current understanding is a worthwhile aid to future comprehension.

Refactor Before

Another saying from Kent Beck that I’m particularly fond of is “make the change easy, then make the easy change” [1]. Here he is alluding to a dose of refactoring up-front to mould the codebase into a shape that is more amenable to allowing you to add the feature you really want.

At this point we are not adding anything new but are leaning on all the existing tests, and maybe improving them too, to ensure that we make no functional changes. The value here is about reducing the risk of the new feature by showing that the codebase can safely evolve towards supporting it. More importantly It also gives the earliest visibility to others about the new direction the code will take [2].

We know the least amount about what it will take to implement the new feature at this point but we also have a working product that we can leverage to see how it’s likely to be impacted.

Refactor Before, During & After

Taken at face value it might appear to be contradictory about when the best time to refactor is. Of course this is really a straw man argument as the best time is in fact “all the time” – we should continually keep the code in good shape [3].

That said the act of refactoring should not occur within a vacuum, it should be driven by a need to make a more valuable change. If the code never needed to change we wouldn’t be doing it in the first place and this should be borne in mind when working on a large codebase where there might be a temptation to refactor purely for the sake of it. Seeing stories or tasks go on the backlog which solely amount to a refactoring are a smell and should be heavily scrutinised.

Emergent Design

That said, there are no absolutes and whilst I would view any isolated refactoring task with suspicion, that is effectively what I was proposing back at the beginning of this post. One of the side-effects of emergent design is that you can get yourself into quite a state before a cohesive design finally emerges.

Whilst on paper we had a number of potential designs all vying for a place in the architecture we had gone with the simplest possible thing for as long as possible in the hope that more complex features would arrive on the backlog and we would then have the forces we needed to evaluate one design over another.

Hence the refactoring decision became one between digging ourselves into an even deeper hole first, and then refactoring heavily once we had made the functional change, or doing some up-front preparation to solidify some of the emerging concepts first. There is the potential for waste if you go too far down the up-front route but if you’ve been watching how the design and feature list have been emerging over time it’s likely you already know where you are heading when the time comes to put the design into action.


[1] I tend to elide the warning from the original quote about the first part potentially being hard when saying it out loud because the audience is usually well aware of that :o).

[2] See “The Cost of Long-Lived Feature Branches” for a cautionary tale about storing up changes.

[3] See “Relentless Refactoring” for the changes in attitude towards this practice.

Monday, 12 June 2017

Stack Overflow With Custom JsonConverter

[There is a Gist on GitHub that contains a minimal working example and summary of this post.]

We recently needed to change our data model so that what was originally a list of one type, became a list of objects of different types with a common base, i.e. our JSON deserialization now needed to deal with polymorphic types.

Naturally we googled the problem to see what support, if any, Newtonsoft’s JSON.Net had. Although it has some built-in support, like many built-in solutions it stores fully qualified type names which we didn’t want in our JSON, we just wanted simple technology-agnostic type names like “cat” or “dog” that we would be happy to map manually somewhere in our code. We didn’t want to write all the deserialization logic manually, but was happy to give the library a leg-up with the mapping of types.


Our searching quickly led to the following question on Stack Overflow: “Deserializing polymorphic json classes without type information using”. The lack of type information mentioned in the question meant the exact .Net type (i.e. name, assembly, version, etc.), and so the answer describes how to do it where you can infer the resulting type from one or more attributes in the data itself. In our case it was a field unsurprisingly called “type” that held a simplified name as described earlier.

The crux of the solution involves creating a JsonConverter and implementing the two methods CanConvert and ReadJson. If we follow that Stack Overflow post’s top answer we end up with an implementation something like this:

public class CustomJsonConverter : JsonConverter
  public override bool CanConvert(Type objectType)
    return typeof(BaseType).

  public override object ReadJson(JsonReader reader,
           Type objectType, object existingValue,
           JsonSerializer serializer)
    JObject item = JObject.Load(reader);

    if (item.Value<string>(“type”) == “Derived”)
      return item.ToObject<DerivedType>();
    . . .

This all made perfect sense and even agreed with a couple of other blog posts on the topic we unearthed. However when we plugged it in we ended up with an infinite loop in the ReadJson method that resulted in a StackOverflowException. Doing some more googling and checking the Newtonsoft JSON.Net documentation didn’t point out our “obvious” mistake and so we resorted to the time honoured technique of fumbling around with the code to see if we could get this (seemingly promising) solution working.

A Blind Alley

One avenue that appeared to fix the problem was manually adding the JsonConverter to the list of Converters in the JsonSerializerSettings object instead of using the [JsonConverter] attribute on the base class. We went back and forth with some unit tests to prove that this was indeed the solution and even committed this fix to our codebase.

However I was never really satisfied with this outcome and so decided to write this incident up. I started to work through the simplest possible example to illustrate the behaviour but when I came to repro it I found that neither approach worked – attribute or serializer settings - I always got into an infinite loop.

Hence I questioned our original diagnosis and continued to see if there was a more satisfactory answer.

ToObject vs Populate

I went back and re-read the various hits we got with those additional keywords (recursion, infinite loop and stack overflow) to see if we’d missed something along the way. The two main candidates were “Polymorphic JSON Deserialization failing using Json.Net” and “Custom inheritance JsonConverter fails when JsonConverterAttribute is used”. Neither of these explicitly references the answer we initially found and what might be wrong with it – they give a different answer to a slightly different question.

However in these answers they suggest de-serializing the object in a different way, instead of using ToObject<DerivedType>() to do all the heavy lifting, they suggest creating the uninitialized object yourself and then using Populate() to fill in the details, like this:

  JObject item = JObject.Load(reader);

  if (item.Value<string>(“type”) == “Derived”)
    var @object = new DerivedType();
    serializer.Populate(item.CreateReader(), @object);
    return @object;
    . . .

Plugging this approach into my minimal example worked, and for both the converter techniques too: attribute and serializer settings.

Unanswered Questions

So I’ve found another technique that works, which is great, but I still lack closure around the whole affair. For example, how come the answer in the the original Stack Overflow question “Deserializing polymorphic json classes” didn’t work for us? That answer has plenty of up-votes and so should be considered pretty reliable. Has there been a change to Newtonsoft’s JSON.Net library that has somehow caused this answer to now break for others? Is there a new bug that we’ve literally only just discovered (we’re using v10)? Why don’t the JSON.Net docs warn against this if it really is an issue, or are we looking in the wrong part of the docs?

As described right at the beginning I’ve published a Gist with my minimal example and added a comment to the Stack Overflow answer with that link so that anyone else on the same journey has some other pieces of the jigsaw to work with. Perhaps over time my comment will also acquire up-votes to help indicate that it’s not so cut-and-dried. Or maybe someone who knows the right answer will spot it and point out where we went wrong.

Ultimately though this is probably a case of not seeing the wood for the trees. It’s so easy when you’re trying to solve one problem to get lost in the accidental complexity and not take a step back. Answers on Stack Overflow generally carry a large degree of gravitas, but they should not be assumed to be infallible. All documentation can go out of date even if there are (seemingly) many eyes watching over it.

When your mind-set is one that always assumes the bugs are of your own making, unless the evidence is overwhelming, then those times when you might actually not be entirely at fault seem to feel all the more embarrassing when you realise the answer was probably there all along but you discounted it too early because your train of thought was elsewhere.

Tuesday, 23 May 2017

Are Refactoring Tools Less Effective Overall?

Prior to the addition of automatic refactoring tools to modern IDEs refactoring was essentially a manual affair. You would make a code change, hit build, and then fix all the compiler errors (at least for statically typed languages). This technique is commonly known as “leaning on the compiler”. Naturally the operation could be fraught with danger if you were far too ambitious about the change, but knowing when you could lean on the compiler was part of the art of refactoring safely back then.

A Hypothesis

Having lived through both eras (manual and automatic) and paired with developers far more skilled with the automatic approach I’ve come up with a totally non-scientific hypothesis that suggests automatic refactoring tools are actually less effective than the manual approach, overall.

I guess the basis of this hypothesis pretty much hinges on what I mean by “effective”. Here I’m suggesting that automatic tools help you easily refactor to a local minima but not to a global minima [1]; consequently the codebase as a whole ends up in a less coherent state.

Shallow vs Deep Refactoring

The goal of an automatic refactoring tool appears to be to not break your code – it will only allow you to use it to perform a simple refactoring that can be done safely, i.e. if the tool can’t fix up all the code it can see [2] it won’t allow you to do it in the first place. The consequence of this is that the tool constantly limits you to taking very small steps. Watching someone refactor with a tool can sometimes seem tortuous as they may need to use so many little refactoring steps to get the code into the desired state because you cannot make the leaps you want in one go unless you switch to manual mode.

This by itself isn’t a bad thing, after all making a safe change is clearly A Good Thing. No, where I see the problem is that by fixing up all the call sites automatically you don’t get to see the wider effects of the refactoring you’re attempting.

For example the reason you’d choose to rename a class or method is because the existing one is no longer appropriate. This is probably because you’re learned something new about the problem domain. However that class or method does not exist in a vacuum, it has dependencies in the guise of variable names and related types. It’s entirely likely that some of these may now be inappropriate too, however you won’t easily see them because the tool has likely hidden them from you.

Hence one of the “benefits” of the old manual refactoring approach was that as you visited each broken call site you got to reflect on your change in the context of where it’s used. This often led to further refactorings as you began to comprehend the full nature of what you had just discovered.

Blue or Red Pill?

Of course what I’ve just described could easily be interpreted as the kind of “black hole” that many, myself included, would see as an unbounded unit of work. It’s one of those nasty rabbit holes where you enter and, before you know it, you’re burrowing close to the Earth’s core and have edited nearly every file in the entire workspace.

Yes, like any change, it takes discipline to stick to the scope of the original problem. Just because you keep unearthing more and more code that no longer appears to fit the new model it does not mean you have to tackle it right now. Noticing the disparity is the first step towards fixing it.

Commit Review

It’s not entirely true that you won’t see the entire outcome of the refactoring – at the very least the impact will be visible when you review the complete change before committing. (For a fairly comprehensive list of the things I go through at the point I commit see my C Vu article “Commit Checklist”.)

This assumes of course that you do a thorough review of your commits before pushing them. However by this point, just as writing tests after the fact are considerably less attractive, so is finishing off any refactoring; perhaps even more so because the code is not broken per-se, it just might not be the best way of representing the solution.

It’s all too easy to justify the reasons why it’s okay to go ahead and push the change as-is because there are more important things to do. Even if you think you’re aware of technical debt it often takes a fresh pair of eyes to see how you’re living in a codebase riddled with inconsistencies that make it hard to see it’s true structure. One is then never quite sure without reviewing the commit logs what is the legacy and what is the new direction.

Blinded by Tools

Clearly this is not the fault of the tool or their vendors. What they offer now is far more favourable than not having them at all. However once again we need to be reminded that we should not be slaves to our tools but that we are the masters. This is a common theme which is regularly echoed in the software development community and something I myself tackled in the past with “Don’t Let Your Tools Pwn You”.

The Boy Scout Rule (popularised by Uncle Bob) says that we should always leave the camp site cleaner than we found it. While picking up a handful of somebody else’s rubbish and putting it in the bin might meet the goal in a literal sense, it’s no good if the site is acquiring rubbish faster than it’s being collected.

Refactoring is a technique for improving the quality of a software design in a piecewise fashion; just be careful you don’t spend so long on your hands and knees cleaning small areas that you fail to spot the resulting detritus building up around you.


[1] I wasn’t sure whether to say minima or maxima but I felt that refactoring was about lowering entropy in some way so went with the reduction metaphor.

[2] Clearly there are limits around published APIs which it just has to ignore.

Thursday, 18 May 2017

Excel-style DDE Requests

Despite being over 2 decades old Microsoft’s Dynamic Data Exchange (DDE) in Windows still seems to be in use for Windows IPC by a not insignificant number of companies. At least, if the frequency of DDE questions in my inbox is anything to go by [1][2].

Earlier this year I got a question from someone who was trying to use my DDE Command tool (a command line tool for querying DDE servers) to get data out of the MetaTrader 4 platform. Finance is the area I first came across DDE in anger and it still seems to be a popular choice there even to this day.

Curious Behaviour

The problem was that when they used the ddecmdrequest” verb to send an XTYP_REQUEST message to the MetaTrader 4 DDE Server (MT4) for a symbol they always got an immediate result of “N/A”. As a workaround they tried using the “advise” verb, which sends an XTYP_ADVSTART, to listen for updates for a short period instead. This worked for symbols which changed frequently but missed those that didn’t change during the interval. Plus this was a dirty hack as they had to find a way to send a CTRL+C to my tool to stop it after this short interval.

Clearly the MetaTrader DDE server couldn’t be this broken, and the proof was that it worked fine with Microsoft Excel – the other stalwart of the finance industry. Hence the question posed to me was why Excel appeared to work, but sending a request from my tool didn’t, i.e. was there a bug in my tool?

Reproducing the Problem

Given the popularity of the MetaTrader platform and Microsoft Excel the application of Occam’s Razor would suggest a bug in my tool was clearly the most likely answer, so I investigated…

Luckily MetaTrader 4 is a free download and they will even give you a demo account to play with which is super welcome for people like me who only want to use the platform to fix interop problems in their own tools and don’t actually want to use it to trade.

I quickly reproduced the problem by sending a DDE request for a common symbol:

> DDECmd.exe request -s MT4 -t QUOTE -i COPPER

And then I used the DDE advise command to see it working for background updates:

> DDECmd.exe advise -s MT4 -t QUOTE -i COPPER
2.5525 2.5590
2.5520 2.5590
. . . 

I also tried it in Excel too to see that it was successfully managing to request the current value, even for slow ticking symbols.

How Excel Requests Data Via DDE

My DDE Command tool has a nice feature where it can also act as a DDE server and logs the different requests sent to it. This was originally added by me to help diagnose problems in my own DDE client code but it’s also been useful to see how other DDE clients behave.

As you can see below, when Excel opens a DDE link (=TEST|TEST!X) it actually sends a number of different XTYP_ADVSTART messages as it tries to find the highest fidelity format to receive the data in:

> DDECmd.exe listen –s TEST –t TEST
XTYP_ADVSTART: 'T…', 'T…', 'StdDocumentName', '49157'
XTYP_ADVSTART: 'TEST', 'TEST', 'X', '50018'
. . .
XTYP_REQUEST: 'TEST', 'TEST', 'X', '50018'

After it manages to set-up the initial advise loop it then goes on to send a one-off XTYP_REQUEST to retrieve the initial value. So, apart from the funky data formats it asks for, there is nothing unusual about the DDE request Excel seems to make.

Advise Before Request

And then it dawned on me, what if the MetaTrader DDE server required an advise loop to be established on a symbol before you’re allowed to request it? Sure enough, I hacked a bit of code into my request command to start an advise loop first and the subsequent DDE request succeeded.

I don’t know if this is a bug in the MetaTrader 4 DDE server or the intended behaviour. I suspect the fact that it works with Excel covers the vast majority of users so maybe it’s never been a priority to support one-off data requests. The various other financial DDE servers I coded against circa 2000 never exhibited this kind of requirement – you could make one-off requests for data with a standalone XTYP_REQUEST message.

The New Fetch Command

The original intent of my DDE Command tool was to provide a tool that allows each XTYP_* message to be sent to a DDE server in isolation, mostly for testing purposes. As such the tools’ verbs pretty much have a one-to-one correspondence with the DDE messages you might send yourself.

To allow people to use my tool against the MetaTrader 4 platform to snapshot data would therefore mean making some kind of small change. I did consider adding various special switches to the existing request and advise verbs, either to force an advise first or to force a request if no immediate update was received but that seemed to go against the ethos a bit.

In the end I decided to add a new verb called “fetch” which acts just like “request”, but starts an advise loop for every item first, then sends a request message for the latest value, thereby directly mimicking Excel.

> DDECmd.exe fetch -s MT4 -t QUOTE -i COPPER -i SILVER
COPPER|2.6075 2.6145
SILVER|16.771 16.821

Hey presto it now works!

This feature was released in DDE Command v1.6.


[1] This is a bit of artistic licence :o), they are not a daily occurrence but once every couple of months wouldn’t be far off. So yes, “DDE Is Still Alive & Kicking”.

[2] Most recently it seems quite a few people are beginning to discover that Microsoft dropped NetDDE support way back in Windows Vista.

Tuesday, 16 May 2017

My Dislike of GOPATH

[This post was written in response to a tweet from Matt Aimonetti which asked “did you ever try #golang? If not, can you tell me why? If yes, what was the first blocker/annoyance you encountered?”]

I’m a dabbler in Go [1], by which I mean I know enough to be considered dangerous but not enough to be proficient. I’ve done a number of katas and even paired on some simple tools my team has built in Go. There is so much to like about it that I’ve had cause to prefer looking for 3rd party tools written in it in the faint hope that I might at some point be able to contribute one day. But, every time I pull the source code I end up wasting so much time trying to get the thing to build because Go has its own opinions about how the source is laid out and tools are built that don’t match the way I (or most other tools I’ve used) work.

Hello, World!

Go is really easy to get into, on Windows it’s as simple as:

> choco install golang

This installs the compiler and standard libraries and you’re all ready to get started. The obligatory first program is as simple as doing:

> pushd \Dev
> mkdir HelloWorld
> notepad HelloWorld.go
> go build HelloWorld.go
> HelloWorld

So far, so good. It’s pretty much the same as if you were writing in any other compiled language – create folder, create source file, build code, run it.

Along the way you may have run into some complaint about the variable GOPATH not being set. It’s easily fixed by simply doing:

> set GOPATH=%CD%

You might have bothered to read up on all the brouhaha, got side-tracked and discovered that it’s easy to silence the complaint without any loss of functionality by setting it to point to anywhere. After all the goal early on is just to get your first program built and running not get bogged down in tooling esoterica.

When I reached this point I did a few katas and used a locally installed copy of the excellent multi-language exercise tool to have a play with the test framework and some of the built-in libraries. Using a tool like cyber-dojo meant that GOPATH problems didn’t rear their ugly head again as it was already handled by the tool and the katas only needed standard library stuff.

The first non-kata program my team wrote in Go (which I paired on) was a simple HTTP smoke test tool that also just lives in the same repo as the service it tests. Once again their was nary a whiff of GOPATH issues here either – still simple.

Git Client, But not Quite

The problems eventually started for me when I tried to download a 3rd party tool off the internet and build it [2]. Normally, getting the source code for a tool from an online repository, like GitHub, and then building it is as simple as:

> git clone https://…/tool.git
> pushd tool
> build

Even if there is no Windows build script, with the little you know about Go at this point you’d hope it to be something like this:

> go build

What usually happens now is that you start getting funny errors about not being able to find some code which you can readily deduce is some missing package dependency.

This is the point where you start to haemorrhage time as you seek in vain to fix the missing package dependency without realising that the real mistake you made was right back at the beginning when you used “git clone” instead of “go get”. Not only that but you also forgot that you should have been doing all this inside the “%GOPATH%\src” folder and not in some arbitrary TEMP folder you’d created just to play around in.

The goal was likely to just build & run some 3rd party tool in isolation but that’s not the way Go wants you to see the world.

The Folder as a Sandbox

The most basic form of isolation, and therefore version control, in software development is the humble file-system folder [3]. If you want to monkey with something on the side just make a copy of it in another folder and you know your original is safe. This style of isolation (along with other less favourable forms) is something I’ve written about in depth before in my C Vu In the Toolbox column, see “The Developer’s Sandbox”.

Unfortunately for me this is how I hope (expect) all tools to work out of the box. Interestingly Go is a highly opinionated language (which is a good thing in many cases) that wants you to do all your coding under one folder, identified by the GOPATH variable. The rationale for this is that it reduces friction from versioning problems and helps ensure everyone, and everything, is always using a consistent set of dependencies – ideally the latest.

Tool User, Not Developer

That policy makes sense for Google’s developers working on their company’s tools, but I’m not a Google developer, I’m a just a user of the tool. My goal is to be able to build and run the tool. If there happens to be a simple bug that I can fix, then great, I’d like to do that, but what I do not have the time for is getting bogged down in library versioning issues because the language believes everyone, everywhere should be singing from the same hymn sheet. I’m more used to a world where dependencies move forward at a pace dictated by the author not by the toolchain itself. Things can still move quickly without having to continually live on the bleeding edge.


As a Go outsider I can see that the situation is definitely improving. The use of a vendor subtree to house a snapshot of the dependencies in source form makes life much simpler for people like me who just want to use the tool hassle free and dabble occasionally by fixing things here and there.

In the early days when I first ran into this problem I naturally assumed it was my problem and that I just needed to set the GOPATH variable to the root of the repo I had just cloned. I soon learned that this was a fools errand as the repo also has to buy into this and structure their source code accordingly. However a variant of this has got some traction with the gb tool which has (IMHO) got the right idea about isolation but sadly is not the sanctioned approach and so you’re dicing with the potential for future impedance mismatches. Ironically to build and install this tool requires GOPATH to still be working the proper way.

The latest version of Go (1.8) will assume a default location for GOPATH (a “go” folder under your profile) if it’s not set but that does not fix the fundamental issue for me which is that you need to understand that any code you pull down may be somewhere unrelated on your file-system if you don’t understand how all this works.

Embracing GOPATH

Ultimately if I am going to properly embrace Go as a language, and I would like to do more, I know that I need to stop fighting the philosophy and just “get with the programme”. This is hard when you have a couple of decades of inertia to overcome but I’m sure I will eventually. It’s happened enough times now that I know what the warnings signs are and what I need to Google to do it “the Go way”.

Okay, so I don’t like or agree with all of (that I know) the choices the Go language has taken but then I’m always aware of the popular quote by Bjarne Stroustrup:

“There are only two kinds of languages: the ones people complain about and the ones nobody uses.”

This post is but one tiny data point which covers one of the very few complaints I have about Go, and even then it’s just really a bit of early friction that occurs at the start of the journey if you’re not a regular. Still, better that than yet another language nobody uses.


[1] Or golang if you want to appease the SEO crowd :o).

[2] It was winrm-cli as I was trying to put together a bug report for Terraform and there was something going wrong with running a Windows script remotely via WinRM.

[3] Let’s put old fashioned technologies like COM to one side for the moment and assume we have learned from past mistakes.

Wednesday, 1 March 2017

Manual Mutation Testing

One of the problems when making code changes is knowing whether there is good test coverage around the area you’re going to touch. In theory, if a rigorous test-first approach is taken no production code should be written without first being backed by a failing test. Of course we all know the old adage about how theory turns out in practice [1]. Even so, just because a test has been written, you don’t know what the quality of it and any related ones are.

Mutation Testing

The practice of mutation testing is one way to answer the perennial question: how do you test the tests? How do you know if the tests which have been written adequately cover the behaviours the code should exhibit? Alternatively, as a described recently in “Overly Prescriptive Tests”, are the tests too brittle because they require too exacting a behaviour?

There are tools out there which will perform mutation testing automatically that you can include as part of your build pipeline. However I tend to use them in a more manual way to help me verify the tests around the small area of functionality I’m currently concerned with [2].

The principle is actually very simple, you just tweak the production code in a small way that would likely mimic a real change and you see what tests fail. If no tests fail at all then you probably have a gap in your spec that needs filling.

Naturally the changes you make on the production code should be sensible and functional in behaviour; there’s no point in randomly setting a reference to null if that scenario is impossible to achieve through the normal course of events. What we’re aiming for here is the simulation of an accidental breaking change by a developer. By tweaking the boundaries of any logic we can also check that our edge cases have adequate coverage too.

There is of course the possibility that this will also unearth some dead code paths too, or at least lead you to further simplify the production code to achieve the same expected behaviour.


Imagine you’re working on a service and you spy some code that appears to format a DateTime value using the default formatter. You have a hunch this might be wrong but there is no obvious unit test for the formatting behaviour. It’s possible the value is observed and checked in an integration or acceptance test elsewhere but you can’t obviously [3] find one.

Naturally if you break the production code a corresponding test should break. But how badly do you break it? If you go too far all your tests might fail because you broke something fundamental, so you need to do it in varying degrees and observe what happens at each step.

If you tweak the date format, say, from the US to the UK format nothing may happen. That might be because the tests use a value like 1st January which is 01/01 in both schemes. Changing from a local time format to an ISO format may provoke something new to fail. If the test date is particularly well chosen and loosely verified this could well still be inside whatever specification was chosen.

Moving away from a purely numeric form to a more natural, wordy one should change the length and value composition even further. If we reach this point and no tests have failed it’s a good chance nothing will. We can then try an empty string, nonsense strings and even a null string reference to see if someone only cares that some arbitrary value is provided.

But what if after all that effort still no lights start flashing and the klaxon continues to remain silent?

What Does a Test Pass or Fail Really Mean?

In the ideal scenario as you slowly make more and more severe changes you would eventually hope for one or maybe a couple of tests to start failing. When you inspect them it should be obvious from their name and structure what was being expected, and why. If the test name and assertion clearly specifies that some arbitrary value is required then its probably intentional. Of course It may still be undesirable for other reasons [4] but the test might express its intent well (to document and to verify).

If we only make a very small change and a lot of tests go red we’ve probably got some brittle tests that are highly dependent on some unrelated behaviour, or are duplicating behaviours already expressed (probably better) elsewhere.

If the tests stay green this does not necessary mean we’re still operating within the expected behaviour. It’s entirely possible that the behaviour has been left completely unspecified because it was overlooked or forgotten about. It might be that not enough was known at the time and the author expected someone else to “fill in the blanks” at a later date. Or maybe the author just didn’t think a test was needed because the intent was so obvious.

Plugging the Gaps

Depending on the testing culture in the team and your own appetite for well defined executable specifications you may find mutation testing leaves you with more questions than you are willing to take on. You only have so much time and so need to find a way to plug any holes in the most effective way you can. Ideally you’ll follow the Boy Scout Rule and at least leave the codebase in a better state than you found it, even if that isn’t entirely to your own satisfaction.

The main thing I get out of using mutation testing is a better understanding of what it means to write good tests. Seeing how breaks are detected and reasoned about from the resulting evidence gives you a different perspective on how to express your intent. My tests definitely aren’t perfect but by purposefully breaking code up front you get a better feel for how to write less brittle tests than you might by using TDD alone.

With TDD you are the author of both the tests and production code and so are highly familiar with both from the start. Making a change to existing code by starting with mutation testing gives you a better orientation of where the existing tests are and how they perform before you write your own first new failing test.

Refactoring Tests

Refactoring is about changing the code without changing the behaviour. This can also apply to tests too in which case mutation testing can provide the technique for which you start by creating failing production code that you “fix” when the test is changed and the bar goes green again. You can then commit the refactored tests before starting on the change you originally intended to make.


[1] “In theory there is no difference between theory and practice; in practice there is.” –- Jan L. A. van de Snepscheut.

[2] Just as with techniques like static code analysis you really need to adopt this from the beginning if you want to keep the noise level down and avoid exploring too large a rabbit hole.

[3] How you organise your tests is a subject in its own right but suffice to say that it’s usually easier to find a unit test than an acceptance test that depends on any given behaviour.

[4] The author may have misunderstood the requirement or the requirement was never clear originally and so the the behaviour was left loosely specified in the short term.

Tuesday, 28 February 2017

Automate Only What You Need To

The meme tells us to “automate all the things” and it’s a noble cause which has sprung up as a backlash against the ridiculous amount of manual work we’ve often had to do in the past. However in our endeavour to embrace the meme we should not go overboard and lose sight of what we’re automating and why.

The Value

The main reason we tend to automate things is to save ourselves time (and by extension, money) by leveraging tools that can perform tasks quicker than we can, but also with more determinism and reliability, thereby saving even more time. For example, pasting a complex set of steps off a wiki page into a command prompt to perform a task is slower than an interpreter running a script and is fraught with danger as we might screw up at various points along the way and so end up not doing exactly what we’d intended. Ultimately computers are good at boring repetitive tasks whilst we humans are not.

However if we only do this operation once every six months and there are too many potential points of failure we might spend far longer trying to automate it than it actually takes to do carefully, manually. It’s a classic trade-off and like most things in IT there are some XKCD’s for that – “Automation” and “Is It Worth the Time”. They make sobering reading when you’re trying to work out how much time you might save automating something and therefore also gives a good indication of the maximum amount of time you should spend on achieving that.

Orchestration First, Actor Later

Where I think the meme starts to break down is when we get this balance wrong and begin to lose sight of where the real value is, thereby wasting time trying to automate not only all the steps but also wire it into some job scheduling system (e.g. CI server) so that once in a blue moon we can push a button and the whole task from start to finish is executed for us without further intervention.

The dream suggests at that point we can go off and do something else more valuable instead. Whilst this notion of autonomy is idyllic it can also come with a considerable extra up-front cost and any shortcuts are likely to buy us false security (i.e. it silently fails and we lose time investigating downstream failures instead).

For example there are many crude command prompt one-liners I’ve written in the past to pick up common mistakes that are trivial for me to run because they’ve been written to automate the expensive bit, not the entire problem. I often rely on my own visual system to filter out the noise and compensate for the impurities within the process. Removing these wrinkles is often where the proverbial “last 10% that takes 90% of the time” goes [1].

It’s all too easy to get seduced by the meme and believe that no automation task is truly complete until it’s fully automated.

An Example

In .Net when you publish shared libraries as NuGet packages you have a .nuspec file which lists the package dependencies. The library .csproj build file also has project dependencies for use with compilation. However these two sets of dependencies should be kept in sync [2].

Initially with only a couple of NuGet packages it was easy to do manually as I knew it was unlikely to change. However once the monolithic library got split it up the dependencies started to grow and manually comparing the relevant sections got harder and more laborious.

Given the text based nature of the two files (XML) it was pretty easy to write a simple shell one-liner to grep the values from the two sets of relevant XML tags, dump them in a file, and then use diff to show a side-by-side comparison. Then it just needed wrapping in a for loop to traverse the solution workspace.

Because the one-liner was mine I got to take various shortcuts like hardcoding the input path and temporary files along with “knowing” that a certain project was always misreported. At this point a previously manual process has largely been automated and as long as I run it regularly will catch any mistakes.

Of course it’s nice to share things like this so that others can take advantage after I’m gone, and it might be even better if the process can be added as a build step so that it’s caught the moment the problem surfaces rather than later in response to a more obscure issue. Now things begin to get tricky and we start to see diminishing returns.

First, the Gnu on Windows (GoW) toolset I used isn’t standard on Windows so now I need to make the one-liner portable or make everyone else match my tooling choice [3]. I also need to fix the hard coded paths and start adding a bit of error handling. I also need to find a way to remove the noise caused by the one “awkward” project.

None of this is onerous, but this all takes time and whilst I’m doing it I’m not doing something (probably) more valuable. The majority of the value was in being able to scale out this safety check, there is (probably) far less value in making it portable and making it run reliably as part of an automated build. This is because essentially it only needs to be run whenever the project dependencies change and that was incredibly rare once the initial split was done. Additionally the risk of not finding an impedance mismatch was small and should be caught by other automated aspects of the development process, i.e. the deployment and test suite.

Knowing When to Automate More

This scenario of cobbling something together and then finding you need to do it more often is the bread and butter of build & deployment pipelines. You often start out with a bunch of hacked together scripts which do just enough to allow the team to bootstrap itself in to an initial fluid state of delivery. This is commonly referred to as a walking skeleton because it forms the basis for the entire development process.

The point of starting with the walking skeleton rather than just diving headlong into features is to try and tackle some of the problems that historically got left until it was too late, such as packaging and deployment. In the modern era of continuous delivery we look to deliver a thin slice of functionality quickly and then build upon it piecemeal.

However it’s all too easy to get bogged down early on in a project and spend lots of time just getting the build pipeline up and running and have nothing functional to show for it. This has always made me feel a little uncomfortable as it feels as though we should be able to get away with far less than perhaps we think we need to.

In “Building the Pipeline - Process Led or Automation Led” and my even earlier post “Layered Builds” I’ve tried to promote a more organic approach that focuses on what I think really matters most which is a consistent and extensible approach. In essence we focus first on producing a simple, repeatable process that can be used locally to enable the application skeleton to safely evolve and then balance the need for automating this further along with the other features. If quality or speed of delivery drops and more automation looks to be the answer then it can be added with the knowledge that it’s being done for deliberate reasons, rather than because we’ve got carried away gold plating the build system based on what other people think it should do (i.e. a cargo cult mentality).

Technical Risk

The one caveat to being leaner about your automation is that you may (accidentally) put off addressing one or more technical risks because you don’t perceive them as risks. This leads us back to why the meme exists in the first place – failing to address certain aspects of software delivery until it’s too late. If there is a technical concern, address it, but only to the extent that the risk is understood, you may not need to do anything about it now.

With a team of juniors there is likely to be far more unknowns [4] than with a team of experienced programmers, therefore the set of perceived risks will be higher. Whilst you might not know the most elegant approach to solving a problem, knowing an approach already reduces the risk because you know that you can trade technical debt in the short term for something else more valuable if necessary.

Everything is Negotiable

The thing I like most about an agile development process is that every trade-off gets put front-and-centre, everything is now negotiable [5]. Every task now comes with an implicit question: is this the most valuable thing we could be doing?

Whilst manually building a private cloud for your production system using a UI is almost certainly not the most scalable approach, neither is starting day one of a project by diving into, say, Terraform when you don’t even know what you’re supposed to be building. There is nothing wrong with starting off manually, you just need to be diligent and ensure that your decision to only automate “enough of the things” is always working in your favour.


[1] See “The Curse of NTLM Based HTTP Proxies”.

[2] I’m not aware of Visual Studio doing this yet although there may now be extensions and tools written by others I’m not aware of.

[3] Yes, the Unix command line tools should be ubiquitous and maybe finally they will be with Bash on Windows.

[4] See “Turning Unconscious Incompetence to Conscious Incompetence”.

[5] See “Estimating is Liberating”.