Thursday 15 June 2017

Refactoring – Before or After?

I recently worked on a codebase where I had a new feature to implement but found myself struggling to understand the existing structure. Despite pairing a considerable amount I realised that without other people to guide me I still got lost trying to find where I needed to make the change. I felt like I was walking through a familiar wood, but without my usual guides the exact route eluded me.

I reverted the changes I had made and proposed that now might be a good point to do a little reorganisation. The proposal was met with a brief and light-hearted game of “Kent Beck Quote Tennis” – some suggested we do the refactoring before the feature whilst others preferred after. I felt there was a somewhat superficial conflict here that I hadn’t really noticed before and wondered what the drivers might be for taking one approach over the other.

Refactor After

If you’re into Test Driven Development (TDD) then you’ll have the mantra “Red, Green, Refactor” firmly lodged in your psyche. When practicing TDD you first write the test, then make it pass, and finally finish up by refactoring the code to remove duplication or otherwise simplify it. Kent Beck’s Test Driven Development: By Example is probably the de facto read for adopting this practice.

The approach here can be seen as one where the refactoring comes after you have the functionality working. From a value perspective most of it comes from having the functionality itself – the refactoring step is an investment in the codebase to allow future value to be added more easily later.

Just after adding a feature is the point where you’ve probably learned the most about the problem at hand and so ensuring the design best represents your current understanding is a worthwhile aid to future comprehension.

Refactor Before

Another saying from Kent Beck that I’m particularly fond of is “make the change easy, then make the easy change” [1]. Here he is alluding to a dose of refactoring up-front to mould the codebase into a shape that is more amenable to allowing you to add the feature you really want.

At this point we are not adding anything new but are leaning on all the existing tests, and maybe improving them too, to ensure that we make no functional changes. The value here is about reducing the risk of the new feature by showing that the codebase can safely evolve towards supporting it. More importantly, it also gives the earliest visibility to others about the new direction the code will take [2].

We know the least about what it will take to implement the new feature at this point, but we also have a working product that we can leverage to see how it’s likely to be impacted.

Refactor Before, During & After

Taken at face value the two approaches might appear to contradict each other about when the best time to refactor is. Of course this is really a straw man argument as the best time is in fact “all the time” – we should continually keep the code in good shape [3].

That said, the act of refactoring should not occur in a vacuum; it should be driven by a need to make a more valuable change. If the code never needed to change we wouldn’t be doing it in the first place, and this should be borne in mind when working on a large codebase where there might be a temptation to refactor purely for the sake of it. Stories or tasks on the backlog which amount solely to a refactoring are a smell and should be heavily scrutinised.

Emergent Design

That said, there are no absolutes and whilst I would view any isolated refactoring task with suspicion, that is effectively what I was proposing back at the beginning of this post. One of the side-effects of emergent design is that you can get yourself into quite a state before a cohesive design finally emerges.

Whilst on paper we had a number of potential designs all vying for a place in the architecture, we had gone with the simplest possible thing for as long as possible, in the hope that more complex features would arrive on the backlog and we would then have the forces we needed to evaluate one design over another.

Hence the refactoring decision came down to either digging ourselves into an even deeper hole and then refactoring heavily once we had made the functional change, or doing some up-front preparation to solidify some of the emerging concepts first. There is the potential for waste if you go too far down the up-front route, but if you’ve been watching how the design and feature list have been emerging over time it’s likely you already know where you are heading when the time comes to put the design into action.

 

[1] I tend to elide the warning from the original quote about the first part potentially being hard when saying it out loud because the audience is usually well aware of that :o).

[2] See “The Cost of Long-Lived Feature Branches” for a cautionary tale about storing up changes.

[3] See “Relentless Refactoring” for the changes in attitude towards this practice.

Monday 12 June 2017

Stack Overflow With Custom JsonConverter

[There is a Gist on GitHub that contains a minimal working example and summary of this post.]

We recently needed to change our data model so that what was originally a list of one type became a list of objects of different types with a common base, i.e. our JSON deserialization now needed to deal with polymorphic types.

Naturally we googled the problem to see what support, if any, Newtonsoft’s JSON.Net had. Although it has some built-in support, like many built-in solutions it stores fully qualified type names, which we didn’t want in our JSON; we just wanted simple technology-agnostic type names like “cat” or “dog” that we would be happy to map manually somewhere in our code. We didn’t want to write all the deserialization logic by hand, but were happy to give the library a leg-up with the mapping of types.
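
By way of illustration only (the real model was different and the class and property names here are invented), the shape of the problem was along these lines, with the JSON carrying a simple discriminator rather than a .Net type name:

// Illustrative model only: a common base class plus a couple of
// derived types, identified in the JSON by a simple "type" field,
// e.g.
//
//   [ { "type": "cat", "name": "Felix", "livesLeft": 9 },
//     { "type": "dog", "name": "Rex", "chasesSticks": true } ]

public abstract class BaseType
{
  public string Name { get; set; }
}

public class Cat : BaseType
{
  public int LivesLeft { get; set; }
}

public class Dog : BaseType
{
  public bool ChasesSticks { get; set; }
}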

JsonConverter

Our searching quickly led to the following question on Stack Overflow: “Deserializing polymorphic json classes without type information using json.net”. The lack of type information mentioned in the question refers to the exact .Net type (i.e. name, assembly, version, etc.), and so the answer describes how to do it when you can infer the resulting type from one or more attributes in the data itself. In our case it was a field unsurprisingly called “type” that held a simplified name as described earlier.

The crux of the solution involves creating a JsonConverter and implementing the two methods CanConvert and ReadJson. If we follow that Stack Overflow post’s top answer we end up with an implementation something like this:

public class CustomJsonConverter : JsonConverter
{
  public override bool CanConvert(Type objectType)
  {
    // Handle the base type and anything derived from it.
    return typeof(BaseType).IsAssignableFrom(objectType);
  }

  public override object ReadJson(JsonReader reader,
                                  Type objectType,
                                  object existingValue,
                                  JsonSerializer serializer)
  {
    // Use the "type" field in the JSON to pick the concrete type.
    JObject item = JObject.Load(reader);

    if (item.Value<string>("type") == "Derived")
    {
      return item.ToObject<DerivedType>();
    }
    else
    . . .
  }
}

This all made perfect sense and even agreed with a couple of other blog posts on the topic that we unearthed. However, when we plugged it in we ended up with an infinite loop in the ReadJson method that resulted in a StackOverflowException. Doing some more googling and checking the Newtonsoft JSON.Net documentation didn’t point out our “obvious” mistake, and so we resorted to the time-honoured technique of fumbling around with the code to see if we could get this (seemingly promising) solution working.

A Blind Alley

One avenue that appeared to fix the problem was manually adding the JsonConverter to the list of Converters in the JsonSerializerSettings object instead of using the [JsonConverter] attribute on the base class. We went back and forth with some unit tests to prove that this was indeed the solution and even committed this fix to our codebase.
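
For anyone unfamiliar with the two wiring techniques, they look roughly like this (a sketch only, reusing the illustrative BaseType from earlier; in practice you would use one registration route or the other):

using System.Collections.Generic;
using Newtonsoft.Json;

// Technique 1: decorate the base class with [JsonConverter] so the
// converter is picked up automatically wherever a BaseType appears.
[JsonConverter(typeof(CustomJsonConverter))]
public abstract class BaseType
{
  public string Name { get; set; }
}

public static class Example
{
  // Technique 2: leave the types undecorated and register the
  // converter explicitly through the serializer settings instead.
  public static List<BaseType> Deserialize(string json)
  {
    var settings = new JsonSerializerSettings();
    settings.Converters.Add(new CustomJsonConverter());

    return JsonConvert.DeserializeObject<List<BaseType>>(json, settings);
  }
}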

However I was never really satisfied with this outcome and so decided to write this incident up. I started to work through the simplest possible example to illustrate the behaviour, but when I came to repro it I found that neither approach worked – attribute or serializer settings – I always got into an infinite loop.

Hence I questioned our original diagnosis and continued to see if there was a more satisfactory answer.

ToObject vs Populate

I went back and re-read the various hits we got with those additional keywords (recursion, infinite loop and stack overflow) to see if we’d missed something along the way. The two main candidates were “Polymorphic JSON Deserialization failing using Json.Net” and “Custom inheritance JsonConverter fails when JsonConverterAttribute is used”. Neither of these explicitly references the answer we initially found, or what might be wrong with it – they give a different answer to a slightly different question.

However these answers suggest de-serializing the object in a different way: instead of using ToObject<DerivedType>() to do all the heavy lifting, you create the uninitialized object yourself and then use Populate() to fill in the details, like this:

{
  JObject item = JObject.Load(reader);

  if (item.Value<string>("type") == "Derived")
  {
    // Create the instance ourselves and let the serializer fill it in.
    var @object = new DerivedType();
    serializer.Populate(item.CreateReader(), @object);
    return @object;
  }
  else
    . . .
}

Plugging this approach into my minimal example worked, and for both the converter techniques too: attribute and serializer settings.

Unanswered Questions

So I’ve found another technique that works, which is great, but I still lack closure around the whole affair. For example, how come the answer in the original Stack Overflow question “Deserializing polymorphic json classes” didn’t work for us? That answer has plenty of up-votes and so should be considered pretty reliable. Has there been a change to Newtonsoft’s JSON.Net library that has somehow caused this answer to now break for others? Is there a new bug that we’ve literally only just discovered (we’re using v10)? Why don’t the JSON.Net docs warn against this if it really is an issue, or are we looking in the wrong part of the docs?

As described right at the beginning I’ve published a Gist with my minimal example and added a comment to the Stack Overflow answer with that link so that anyone else on the same journey has some other pieces of the jigsaw to work with. Perhaps over time my comment will also acquire up-votes to help indicate that it’s not so cut-and-dried. Or maybe someone who knows the right answer will spot it and point out where we went wrong.

Ultimately though this is probably a case of not seeing the wood for the trees. It’s so easy when you’re trying to solve one problem to get lost in the accidental complexity and not take a step back. Answers on Stack Overflow generally carry a large degree of gravitas, but they should not be assumed to be infallible. All documentation can go out of date even if there are (seemingly) many eyes watching over it.

When your mind-set is one that always assumes the bugs are of your own making unless the evidence is overwhelming, those times when you might not actually be entirely at fault feel all the more embarrassing when you realise the answer was probably there all along, but you discounted it too early because your train of thought was elsewhere.