Friday 18 October 2013

Start C# Enumerations at 1

Coming from a background in C/C++ I was brought up with the problems of uninitialised variables, which I managed to keep under control through better compiler warnings and static code analysis tools. Sadly, uninitialised member variables are still the domain of the more specialist tools, as far as I know. With C# the problem still exists, to some extent. Whilst C#, nay .Net, guarantees that variables and class fields will be initialised to a known value (0 or its moral equivalent, e.g. false, null, etc.) that does not always help in the fight against bugs caused by uninitialised variables [1].

Reflection

In C#, unlike C++, you have Reflection, which pretty much allows you to do whatever you like and ignore whatever access controls and constraints the class designer put in place by poking data directly into class fields. When you create an object via reflection it only initialises fields to their defaults; some other protocol must be in place if a post-initialisation method is to be called, e.g. the [OnDeserialized] attribute.
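To illustrate, here's a minimal sketch (the Order type is purely hypothetical) of what happens when an object is conjured up the way some serialisers do it, via FormatterServices.GetUninitializedObject(), which bypasses every constructor and leaves each field at its CLR default:-

using System;
using System.Runtime.Serialization;

public class Order
{
  public bool Validated;

  public Order()
  {
    Validated = true;             // the constructor sets a sensible value...
  }

  [OnDeserialized]
  private void OnDeserialized(StreamingContext context)
  {
    // ...so a serialiser needs a hook like this to do it instead
  }
}
. . .
// No constructor runs; every field gets its default bit pattern.
var order = (Order)FormatterServices.GetUninitializedObject(typeof(Order));
Console.WriteLine(order.Validated);   // False, not True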

Enums

With the enum type essentially being implemented as a primitive, an integer, it gets the value 0 after construction by reflection. Whereas a real integer always has the value 0 as part of its range, an enum may not - it’s usually constrained to a small subset of the entire range of integral values. What this means in practice is that the bit pattern for 0 may or may not fall onto one of the values in the enumeration depending on how you defined it.

enum Colour
{
  Red,   // first is 0 by default
  White,
  Blue,
}

The C# documentation suggests that you make the first value of your enumeration “the default value”. What they are really saying, though, is that you should try to ensure that an uninitialised variable of your enumeration type happens to coincide with a valid value, and therefore a value that will cause the code to behave sensibly.
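With the Colour enumeration above that advice holds by construction - the default bit pattern lands on Red:-

Colour colour = default(Colour);

Console.WriteLine(colour);                                  // Red
Console.WriteLine(Enum.IsDefined(typeof(Colour), colour));  // True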

Maybe I’ve been doing C++ for too long because, frankly, that advice disturbs me.

Special Values

I’m sure there are cases where a default value makes sense for some enumerations, but I’ve rarely come across one [2]. Also, I thought we’d got past the whole “special” value problem and put the likes of -1, NaN and NULL behind us? If the value is optional, then use an optional-style type, like Nullable<T>. Only it’s not actually optional; it’s just optional when you want to create the object via reflection [3].

Don’t Mask The Error

So, what’s the alternative? Well you could start your enumeration from 1 instead of 0, by being explicit on the first value:-

enum Colour
{
  Red = 1,
  White,
  Blue,
}
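A quick check (using the enumeration above) shows the default bit pattern no longer maps onto a defined value:-

Colour colour = default(Colour);                            // i.e. (Colour)0

Console.WriteLine(Enum.IsDefined(typeof(Colour), colour));  // False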

At least now you can detect an uninitialised value when you come to process a value from the enumeration because both the static variation through a switch statement:-

switch (colour)
{
  case Colour.Red:   . . . break;
  case Colour.White: . . . break;
  case Colour.Blue:  . . . break;
  default: throw new ArgumentOutOfRangeException();
}

…and the dynamic variation via a dictionary:-

var map = new Dictionary<Colour, Method>();
. . .
Method method;    // Method standing in for whatever delegate type the map holds
if (!map.TryGetValue(colour, out method))
  throw new ArgumentOutOfRangeException();

…allow a check to be made and an invalid value to be detected. Also don’t forget that you can use the Enum.IsDefined() static method to validate an integer value before casting it to its enum equivalent. In fact this kind of thing is just perfect for an extension method:-

public static T AsEnum<T>(this int value)
  where T : struct
{
  if (!typeof(T).IsEnum)
    throw new ArgumentException();

  if (!Enum.IsDefined(typeof(T), value))
    throw new ArgumentOutOfRangeException();

  return (T)Enum.ToObject(typeof(T), value);
}
. . .
int value = 1;
Colour colour = value.AsEnum<Colour>();
Assert.That(colour, Is.EqualTo(Colour.Red));

Sadly C# generics aren’t quite good enough to constrain the type specifically to an enum; a value type is the best we can do at compile time, hence the runtime check at the start of the method. I’d also be wary of the performance of Enum.ToObject() as it seems a very ham-fisted way of performing what is essentially a (T)value style cast. As always there’s plenty more depth to this topic on StackOverflow.
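If that conversion ever showed up in a profile, one alternative sketch (the AsEnumUnchecked name is hypothetical, it assumes the enum’s underlying type is int, and the validation from above is elided) is to box the value and rely on the runtime’s unboxing conversion instead:-

public static T AsEnumUnchecked<T>(this int value)
  where T : struct
{
  // Unboxing a boxed int to an enum whose underlying type is int is
  // permitted by the runtime, which sidesteps Enum.ToObject() entirely.
  return (T)(object)value;
}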

Tests - Simples

I know the stock answer to my paranoia is just to write some unit tests and make sure the correct behaviour is covered, but it still feels like there is a subtlety here that is just waiting to trip up the unwary maintenance programmer down the line if you go with the party line.

 

[1] My definition of “uninitialised” is based on a variable not having the value it was intended to have; that means it has to be a conscious decision to allow a variable to take on the default value, not an unconscious one.

[2] Using an enumeration to define a set of bitfield flags is possibly one case. But that in itself is a special type of enumeration that is hardly different from an integer and a class of constants. An enumerated type is commonly used as a type identifier which is a different sort of beast to a set of boolean flags.

[3] I’m currently working with a web API and MongoDB back-end and all the marshalling is being handled automatically via reflection, which is all new ground for me.

Thursday 17 October 2013

Codebase Stability With Feature Toggles

Whilst at Agile Cambridge a few weeks ago I got involved in a brief discussion about using Feature Toggles instead of Feature Branches. The scepticism around using them seemed to focus on the instability they potentially create within the codebase.
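For context, the kind of toggle under discussion is usually nothing more exotic than a flag read from configuration, along these lines (a minimal sketch; the FeatureToggles class and the UseNewPricingEngine setting are entirely hypothetical):-

using System.Configuration;

public static class FeatureToggles
{
  // Anything missing or unparsable counts as "off", so the existing
  // code path remains the default until the toggle is deliberately flipped.
  public static bool IsEnabled(string feature)
  {
    bool enabled;
    return bool.TryParse(ConfigurationManager.AppSettings[feature], out enabled)
           && enabled;
  }
}
. . .
if (FeatureToggles.IsEnabled("UseNewPricingEngine"))
{
  . . . // new, still-stabilising code path
}
else
{
  . . . // existing behaviour
}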

Development Branch

To me a “development” branch can either be a private (nay, feature) branch or an integration branch where complex change is allowed to take place. If the change is potentially disruptive and cannot be broken down, just maybe a private branch is needed to avoid screwing up “the build” and blocking the team. However, even with Continuous Integration and Continuous Deployment any non-trivial change has the potential to destabilise the product by introducing bugs; that is not a side-effect of feature branches or toggles in particular - just change.

Release Branch

If you want stability then that is what a release branch is for. The policy of change on a release branch is radically different from that of a development branch - notably more care is taken with any change to ensure that the potential for new bugs is minimised. In short, I would not expect any formal development on a release branch at all, only surgical changes to fix any late problems. Consequently I wouldn’t expect any work on unreleased features to continue once the branch had been cut, therefore all feature toggles would be “off” and the unused code sits in stasis, unless it has any unintended side-effects on the release itself.

Bedding In Changes

Before the days of branching, a software product would often go through the process of a “code freeze” where all new development would halt and the focus would switch to finding and squashing any outstanding bugs before release. Release branches help avoid that to a large degree by allowing the stability work to be done in parallel and in isolation whilst the product continues to evolve in the background on the main integration (development) branch.

Whilst in theory there is no reason not to put a potentially dangerous change in just before you cut your release branch, I would question whether you will give yourself enough time to root out any subtle problems, e.g. around performance. Whilst continuously deploying is great for pushing new features out ready for formal QA, it destroys any stability of the deployment itself. Unless you have automated stress testing on every deployment you risk not exposing changes on the non-functional side, such as resource leaks, badly performing caches, error recovery, etc.

As I mentioned in “Feature Branch or Feature Toggle?” I try to break down my changes into the smallest chunks possible [1]. This has the added benefit that I can then try to get the most dangerous part of my changes into the development branch as soon as possible to give them as long as possible to “bed in”. The same goes for any other nasty changes that might have unanticipated side-effects, like upgrading 3rd party components. As time marches on the development branch moves ever closer to another release and so I would have a preference for ever lower risk changes. As long as the release cycle is frequent enough, it will not be long before the riskier changes can be picked up again.

Cherry Picking

One time when this “anything goes” attitude might bite you is if you need to cherry pick changes from your development branch and apply them to an impending release. As a rule of thumb you should only merge branches in the direction of stable to volatile, i.e. release to development. But sometimes a feature has to get dragged into a release late and if you’re pulling code from a volatile branch you run the risk of needing to drag in more than you bargained for. If the feature you need depends on some earlier refactoring that didn’t make it to the release branch originally you’re in trouble - you either have to drop the feature, pull the refactoring in too (and any changes that it depends on, and so on and so forth) or you have to re-write the change to only depend on the release version of the code. This is not really what you want to be dealing with late in the day.

So, whilst feature toggles are one way to evolve software with the benefits of early integration, there is a time and a place for them, and that is where the other riskier changes are - a development branch.

 

[1] But not so small that I would appear hypocritical and fall foul of my own advice - “What’s the Check-In Frequency, Kenneth?”.

Tuesday 15 October 2013

Concrete Interfaces

Back at the end of last year I wrote a blog post titled “Interfaces Are Not the Only Abstract Fruit”. At the time I was bemoaning the use of interfaces that mirror the public interface of classes that implement them, and in particular where those types were value types. More recently I’ve seen the flip side of this, where a concrete class has had an interface extracted from it, albeit again one which just mirrors the existing public interface of the class. The reason, it seems, is that the concrete class must now become a seam for unit testing…

I’ve written before in “Don’t Let Your Tools Pwn You” about the dangers of letting your tools dictate the code you produce. In this case, whilst the human is seemingly in control, it’s all too easy to perform this refactoring without seeing it through to its logical conclusion. So what is that logical conclusion I’m alluding to?

Implementation != Abstraction

Let’s start with the simpler case. Say you have a proxy for a remote service that might be implemented in a concrete class like this:-

public class RemoteProductServiceProxy
{
  public Product[] FetchProducts(string startDate,
                                 string endDate)
  { . . . }
}

The refactoring tool, which only knows about the concrete class and not what it represents, will probably lead you into creating an interface like this, i.e. sticking an I on the front of the class name and exposing all the public methods:-

public interface IRemoteProductServiceProxy
{
  Product[] FetchProducts(string startDate,
                          string endDate);
}

But an IRemoteProductServiceProxy is not an abstraction; it is the implementation. From the name I’m going to guess that the thing your client really wants to interact with is a service for querying about products. The fact that it’s hosted in or out of process is almost certainly of little consequence to the client [1].

Secondly, the types of the parameters for the FetchProducts() method are also leaky, in the abstraction sense, if you’re pushing the burden of marshalling any input parameters onto the client, e.g. if it’s a web API call where the parameters will ultimately be passed as string values. The interface should traffic in the domain types of the abstraction, not the implementation; after all, if you did host the service in-process you’d be marshalling the parameters unnecessarily.

With that all said, I’d be looking at something more like this [2]:-

public interface IProductService
{
  Product[] FetchProducts(Date start,
                          Date end);
}

Hopefully it should be obvious now why I called the original refactoring a “Concrete Interface”. I did canvass Twitter a few weeks back to see if there was a more formal (or at least polite!) term for this anti-pattern, but it didn’t come to anything. Perhaps the comments to this post will shed some light?

Mock the Abstraction, Not the Technology

There is another variation of this story where, instead of creating an interface for the service, which then allows the client to be entirely isolated from the implementation, the underlying technology is abstracted instead. The premise seems to be that to get a piece of code under test you mock out “the external dependencies”. In this instance the dependency would be the various classes that allow an HTTP request to be made.

However, before creating interfaces and factories that allow you to mock a technology at a low level, ask yourself what this will actually buy you. Personally I see very little value in sensing all the minuscule interactions between your code and a 3rd party library, because all the complexity is in their library, so it often ends up being like writing tests for getters and setters. For example, mocking all the database classes just to sense that you’ve put the right value in the right parameter seems less valuable to me than actually making a database call that tests the integration between the two (see “C#/SQL Integration Testing With NUnit”).

The reason why the underlying 3rd party technology is often complicated is exactly because it does many different things. For instance a web based API has to consider proxies, authentication, content of different types, different protocols, etc. But I bet you are probably only using a very small part of it. This is especially true in the enterprise arena where you have considerable control over the infrastructure.

Assuming I still want to write unit tests for some aspects of this code, e.g. verifying the URL format for a web API call, I’ll try and create some kind of simple facade that wraps up all that complexity so I’m dealing with an interface tailored to my needs:-

public interface IHttpRequester
{
  string RequestData(string url);
}
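A unit test for something like the URL format then only needs a hand-rolled stub of that facade rather than a nest of low-level mocks (a sketch; the ProductServiceClient class, its constructor and its URL scheme are all hypothetical):-

public class FakeHttpRequester : IHttpRequester
{
  public string LastUrl;
  public string Response = "[]";

  public string RequestData(string url)
  {
    LastUrl = url;      // capture the URL so the test can inspect it
    return Response;
  }
}
. . .
var requester = new FakeHttpRequester();
var client = new ProductServiceClient(requester);

client.FetchProducts(start, end);   // start/end created earlier, elided

Assert.That(requester.LastUrl, Is.StringContaining("/products"));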

Oftentimes much of the grunge of dealing with technology like this is not in the actual execution of the task in hand, but in the dance required to create and configure all the various objects that are needed to make it happen. For me that complexity is best handled once, elsewhere, and tested through separate integration tests.
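For example, the production implementation might be little more than a thin adapter over WebClient, with all that creation and configuration confined to one spot (a sketch only, not battle-hardened code):-

using System.Net;

public class HttpRequester : IHttpRequester
{
  public string RequestData(string url)
  {
    // The creation and configuration dance lives here, exercised by
    // integration tests rather than sensed through low-level mocks.
    using (var client = new WebClient())
    {
      client.Proxy = WebRequest.DefaultWebProxy;   // or whatever the infrastructure demands
      return client.DownloadString(url);
    }
  }
}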

The Smell

So, if you find yourself creating an interface which matches a single concrete class in both name and public methods, ask yourself whether there isn’t a more formal abstraction trying to get out.

 

[1] Yes, the network is evil, but the interface will likely take that into account by not being too chatty. If some sort of error recovery is possible even in the absence of the service then that might also be the case, but these are generally the exception in my experience.

[2] I still don’t like the “I” prefix or the “weak” Service suffix, but it’s a common convention. I really like “Requester”, “Locator” or “Finder” for query based services as that sits nicely with the suggestions I made in “Standard Method Name Verb Semantics”.