Showing posts with label c#. Show all posts
Showing posts with label c#. Show all posts

Friday, 8 March 2019

The Perils of Multi-Phase Construction

I’ve never really been a fan of C#’s object initializer syntax. Yes, it’s a little more convenient to write but it has a big downside which is it forces you to make your types mutable by default. Okay, that’s a bit strong, it doesn’t force you to do anything, but it does promote that way of thinking and allows people to take advantage of mutability outside the initialisation block [1].

This post is inspired by some buggy code I encountered where my suspicion is that the subtleties of the object initialisation syntax got lost along the way and partially constructed objects eventually found their way into the wild.

No Dragons Yet

The method, which was to get the next message from a message queue, was originally written something like this:

Message result = null;
RawMessage message = queue.Receive();

if (message != null)
{
  result = new Message
  {
    Priority = message.Priority,
    Type = GetHeader(message, “MessageType”),
    Body = message.Body, 
  };
}

return result;

This was effectively correct. I say “effectively correct” because it doesn’t contain the bug which came later but still relies on mutability which we know can be dangerous.

For example, what would happen if the GetHeader() method threw an exception? At the moment there is no error handling and so the exception propagates out the method and back up the stack. Because we make no effort to recover we let the caller decide what happens when a duff message comes in.

The Dragons Begin Circling

Presumably the behaviour when a malformed message arrived was undesirable because the method was changed slightly to include some recovery fairly soon after:

Message result = null;
RawMessage message = queue.Receive();

if (message != null)
{
  try
  {
    result = new Message
    {
      Priority = message.Priority,
      Type = GetHeader(message, “MessageType”),
      Body = message.Body,  
    };
  }
  catch (Exception e)
  {
    Log.Error(“Invalid message. Skipping.”);
  }
}

return result;

Still no bug yet, but that catch handler falling through to the return at the bottom is somewhat questionable; we are making the reader work hard to track what happens to result under the happy / sad paths to ensure it remains correct under further change.

Object Initialisation Syntax

Before showing the bug, here’s a brief refresher on how the object initialisation syntax works under the covers [2] in the context of our example code. Essentially it invokes the default constructor first and then performs assignments on the various other properties, e.g.

var __m = new Message();
__m.Priority = message.Priority;
__m.Type = GetHeader(message, “MessageType”);
__m.Body = message.Body,  
result = __m;

Notice how the compiler introduces a hidden temporary variable during the construction which it then assigns to the target at the end? This ensures that any exceptions during construction won’t create partially constructed objects that are bound to variables by accident. (This assumes you don’t use the constructor or property setter to attach itself to any global variables either.)

Hence, with respect to our example, if any part of the initialization fails then result will be left as null and therefore the message is indeed discarded and the caller gets a null reference back.

The Dragons Surface

Time passes and the code is then updated to support a new property which is also passed via a header. And then another, and another. However, being more complicated than a simple string value the logic to parse it is placed outside the object initialisation block, like this:

Message result = null;
RawMessage message = queue.Receive();

if (message != null)
{
  try
  {
    result = new Message
    {
      Priority = message.Priority,
      Type = GetHeader(message, “MessageType”),
      Body = message.Body,  
    };

    var str = GetHeader(message, “SomeIntValue”);
    if (str != null && TryParseInt(str, out var value))
      result.IntValue = value;

    // ... more of the same ...
  }
  catch (Exception e)
  {
    Log.Error(“Invalid message. Skipping.”);
  }
}

return result;

Now the problems start. With the latter header parsing code outside the initialisation block result is assigned a partially constructed object while the remaining parsing code runs. Any exceptions that occur [3] mean that result will be left only partially constructed and the caller will be returned the duff object because the exception handler falls out the bottom.

+1 for Tests

The reason I spotted the bug was because I was writing some tests around the code for a new header which also temporarily needed to be optional, like the others, to decouple the deployments. When running the tests there was an error displayed on the console output [4] telling me the message was being discarded, which I didn’t twig at first. It was when I added a retrospective test for the previous optional fields and I found my new one wasn’t be parsed correctly that I realised something funky was going on.

Alternatives

So, what’s the answer? Well, I can think of a number of approaches that would fix this particular code, ranging from small to large in terms of the amount of code that needs changing and our appetite for it.

Firstly we could avoid falling through in the exception handler and make it easier on the reader to comprehend what would be returned in the face of a parsing error:

catch (Exception e)  
{  
  Log.Error(“Invalid message. Skipping.”);
  return null;
}

Secondly we could reduce the scope of the result variable and return that at the end of the parsing block so it’s also clearer about what the happy path returns:

var result = new Message  
{  
  // . . .  
};

var str = GetHeader(message, “SomeIntValue”);
if (str != null && TryParseInt(str, out var value)
  result.IntValue = value;

return result;

We could also short circuit the original check too and remove the longer lived result variable altogether with:

RawMessage message = queue.Receive();

if (message == null)
    return null;

These are all quite simple changes which are also safe going forward should someone add more header values in the same way. Of course, if we were truly perverse and wanted to show how clever we were, we could fold the extra values back into the initialisation block by doing an Extract Function on the logic instead and leave the original dragons in place, e.g.

try
{  
  result = new Message  
  {  
    Priority = message.Priority,  
    Type = GetHeader(message, “MessageType”),  
    Body = message.Body,
    IntValue = GetIntHeader(message, “SomeIntValue”),
    // ... more of the same ...  
  };
}  
catch (Exception e)  
{  
  Log.Error(“Invalid message. Skipping.”);  
}

But we would never do that because the aim is to write code that helps stop people making these kinds of mistakes in the first place. If we want to be clever we should make it easier for the maintainers to fall into The Pit of Success.

Other Alternatives 

I said at the beginning that I was not a fan of mutability by default and therefore it would be remiss of me not to suggest that the entire Message type be made immutable and all properties set via the constructor instead:

result = new Message  
(  
  priority: message.Priority,  
  type: GetHeader(message, “MessageType”),  
  body: message.Body,
  IntValue: GetIntHeader(message, “SomeIntValue”),
  // ... more of the same ...  
);

Yes, adding a new property is a little more work but, as always, writing the tests to make sure it all works correctly will dominate here.

I would also prefer to see use of an Optional<> type instead of a null reference for signalling “no message” but that’s a different discussion.

Epilogue

While this bug was merely “theoretical” at the time I discovered it [5] it quickly came back to bite. A bug fix I made on the sending side got deployed before the receiving end and so the misleading error popped up in the logs after all.

Although the system appeared to be functioning correctly it had slowed down noticeably which we quickly discovered was down to the receiving process continually restarting. What I hadn’t twigged just from reading this nugget of code was that due to the catch handler falling through and passing the message on it was being acknowledged on the queue twice –– once in that catch handler, and again after processing it. This second acknowledgment attempt generated a fatal error that caused the process to restart. Deploying the fixed receiver code as well sorted the issue out.

Ironically the impetus for my blog post “Black Hole - The Fail Fast Anti-Pattern” way back in 2012 was also triggered by two-phase construction problems that caused a process to go into a nasty failure mode, but that time it processed messages much too quickly and stayed alive failing them all.

 

[1] Generally speaking the setting of multiple properties implies it’s multi-phase construction. The more common term Two-Phase Construction comes (I presume) from explicit constructor methods names like Initialise() or Create() which take multiple arguments, like the constructor, rather than setting properties one-by-one.

[2] This is based on my copy of The C# Programming Language: The Annotated Edition.

[3] When the header was missing it was passing a null byte[] reference into a UTF8 decoder which caused it to throw an ArgumentNullException.

[4] Internally it created a logger on-the-fly so it wasn’t an obvious dependency that initially needed mocking.

[5] It’s old, so possibly it did bite in the past but nobody knew why or it magically fixed itself when both ends where upgraded close enough together.

Monday, 12 June 2017

Stack Overflow With Custom JsonConverter

[There is a Gist on GitHub that contains a minimal working example and summary of this post.]

We recently needed to change our data model so that what was originally a list of one type, became a list of objects of different types with a common base, i.e. our JSON deserialization now needed to deal with polymorphic types.

Naturally we googled the problem to see what support, if any, Newtonsoft’s JSON.Net had. Although it has some built-in support, like many built-in solutions it stores fully qualified type names which we didn’t want in our JSON, we just wanted simple technology-agnostic type names like “cat” or “dog” that we would be happy to map manually somewhere in our code. We didn’t want to write all the deserialization logic manually, but was happy to give the library a leg-up with the mapping of types.

JsonConverter

Our searching quickly led to the following question on Stack Overflow: “Deserializing polymorphic json classes without type information using json.net”. The lack of type information mentioned in the question meant the exact .Net type (i.e. name, assembly, version, etc.), and so the answer describes how to do it where you can infer the resulting type from one or more attributes in the data itself. In our case it was a field unsurprisingly called “type” that held a simplified name as described earlier.

The crux of the solution involves creating a JsonConverter and implementing the two methods CanConvert and ReadJson. If we follow that Stack Overflow post’s top answer we end up with an implementation something like this:

public class CustomJsonConverter : JsonConverter
{
  public override bool CanConvert(Type objectType)
  {
    return typeof(BaseType).
                       IsAssignableFrom(objectType);
  }

  public override object ReadJson(JsonReader reader,
           Type objectType, object existingValue,
           JsonSerializer serializer)
  {
    JObject item = JObject.Load(reader);

    if (item.Value<string>(“type”) == “Derived”)
    {
      return item.ToObject<DerivedType>();
    }
    else
    . . .
  }
}

This all made perfect sense and even agreed with a couple of other blog posts on the topic we unearthed. However when we plugged it in we ended up with an infinite loop in the ReadJson method that resulted in a StackOverflowException. Doing some more googling and checking the Newtonsoft JSON.Net documentation didn’t point out our “obvious” mistake and so we resorted to the time honoured technique of fumbling around with the code to see if we could get this (seemingly promising) solution working.

A Blind Alley

One avenue that appeared to fix the problem was manually adding the JsonConverter to the list of Converters in the JsonSerializerSettings object instead of using the [JsonConverter] attribute on the base class. We went back and forth with some unit tests to prove that this was indeed the solution and even committed this fix to our codebase.

However I was never really satisfied with this outcome and so decided to write this incident up. I started to work through the simplest possible example to illustrate the behaviour but when I came to repro it I found that neither approach worked – attribute or serializer settings - I always got into an infinite loop.

Hence I questioned our original diagnosis and continued to see if there was a more satisfactory answer.

ToObject vs Populate

I went back and re-read the various hits we got with those additional keywords (recursion, infinite loop and stack overflow) to see if we’d missed something along the way. The two main candidates were “Polymorphic JSON Deserialization failing using Json.Net” and “Custom inheritance JsonConverter fails when JsonConverterAttribute is used”. Neither of these explicitly references the answer we initially found and what might be wrong with it – they give a different answer to a slightly different question.

However in these answers they suggest de-serializing the object in a different way, instead of using ToObject<DerivedType>() to do all the heavy lifting, they suggest creating the uninitialized object yourself and then using Populate() to fill in the details, like this:

{
  JObject item = JObject.Load(reader);

  if (item.Value<string>(“type”) == “Derived”)
  {
    var @object = new DerivedType();
    serializer.Populate(item.CreateReader(), @object);
    return @object;
  }
  else
    . . .
}

Plugging this approach into my minimal example worked, and for both the converter techniques too: attribute and serializer settings.

Unanswered Questions

So I’ve found another technique that works, which is great, but I still lack closure around the whole affair. For example, how come the answer in the the original Stack Overflow question “Deserializing polymorphic json classes” didn’t work for us? That answer has plenty of up-votes and so should be considered pretty reliable. Has there been a change to Newtonsoft’s JSON.Net library that has somehow caused this answer to now break for others? Is there a new bug that we’ve literally only just discovered (we’re using v10)? Why don’t the JSON.Net docs warn against this if it really is an issue, or are we looking in the wrong part of the docs?

As described right at the beginning I’ve published a Gist with my minimal example and added a comment to the Stack Overflow answer with that link so that anyone else on the same journey has some other pieces of the jigsaw to work with. Perhaps over time my comment will also acquire up-votes to help indicate that it’s not so cut-and-dried. Or maybe someone who knows the right answer will spot it and point out where we went wrong.

Ultimately though this is probably a case of not seeing the wood for the trees. It’s so easy when you’re trying to solve one problem to get lost in the accidental complexity and not take a step back. Answers on Stack Overflow generally carry a large degree of gravitas, but they should not be assumed to be infallible. All documentation can go out of date even if there are (seemingly) many eyes watching over it.

When your mind-set is one that always assumes the bugs are of your own making, unless the evidence is overwhelming, then those times when you might actually not be entirely at fault seem to feel all the more embarrassing when you realise the answer was probably there all along but you discounted it too early because your train of thought was elsewhere.

Tuesday, 28 February 2017

LINQ: Did You Mean First(), or Really Single()?

TL;DR: if you see someone using the LINQ method First() without a comparator it’s probably a bug and they should have used Single().

I often see code where the author “knows” that a sequence (i.e. an Enumerable<T>) will result in just one element and so they use the LINQ method First() to retrieve the value, e.g.

var value = sequence.First();

However there is also the Single() method which could be used to achieve a similar outcome:

var value = sequence.Single();

So what’s the difference and why do I think it’s probably a bug if you used First?

Both First and Single have the same semantics for the case where the the sequence is empty (they throw) and similarly when the sequence contains only a single element (they return it). The difference however is when the sequence contains more than one element – First discards the extra values and Single will throw an exception.

If you’re used to SQL it’s the difference between using “top” to filter and trying extract a single scalar value from a subquery:

select top 1 x as [value] from . . .

and

select a, (select x from . . .) as [value] from . . .

(The latter tends to complain loudly if the result set from the subquery is not just a single scalar value or null.)

While you might argue that in the face of a single-value sequence both methods could be interchangeable, to me they say different things with Single begin the only “correct” choice.

Seeing First says to me that the author knows the sequence might contain multiple values and they have expressed an ordering which ensures the right value will remain after the others have been consciously discarded.

Whereas Single suggests to me that the author knows this sequence contains one (and only one) element and that any other number of elements is wrong.

Hence another big clue that the use of First is probably incorrect is the absence of a comparator function used to order the sequence. Obviously it’s no guarantee as the sequence might be being returned from a remote service or function which will do the sorting instead but I’d generally expect to see the two used together or some other clue (method or variable name, or parameter) nearby which defines the order.

The consequence of getting this wrong is that you don’t detect a break in your expectations (a multi-element sequence). If you’re lucky it will just be a test that starts failing for a strange reason, which is where I mostly see this problem showing up. If you’re unlucky then it will silently fail and you’ll be using the wrong data which will only manifest itself somewhere further down the road where it’s harder to trace back.

Monday, 12 December 2016

Surprising Defaults – HttpClient ExpectContinue

One of the things you quickly discover when moving from building services on-premise to “the cloud” is quite how many more bits of wire and kit suddenly sit between you and your consumer. Performance-wise this already elongated network path can then be further compounded when the framework you’re using invokes unintuitive behaviour by default [1].

The Symptoms

The system was a new REST API built in C# on the .Net framework (4.6) and hosted in the cloud with AWS. This AWS endpoint was then further fronted by Akamai for various reasons. The initial consumer was an on-premise adaptor (also written in C#) which itself had to go through an enterprise grade web proxy to reach the outside world.

Naturally monitoring was added in fairly early on so that we could start to get a feel for how much added latency moving to the cloud would bring. Our first order approximation to instrumentation allowed us to tell how long the HTTP requests took to handle along with a breakdown of the major functions, e.g. database queries and 3rd party requests. Outside the service we had some remote monitoring too that could tell us the performance from a more customer-like position.

When we integrated with the 3rd party service some poor performance stats caused us to look closer into our metrics. The vast majority of big delays were outside our control, but it also raised some other questions as the numbers didn’t quite add up. We had expected the following simple formula to account for virtually all the time:

HTTP Request Time ~= 3rd Party Time + Database Time

However we were seeing a 300 ms discrepancy in many (but not all) cases. It was not our immediate concern as there was bigger fish to fry but some extra instrumentation was added to the OWIN pipeline and we did a couple of quick local profile runs to look out for anything obviously out of place. The finger seemed to point to time lost somewhere in the Nancy part of the pipeline, but that didn’t entirely make sense at the time so it was mentally filed away and we moved on.

Serendipity Strikes

Whilst talking to the 3rd party about our performance woes with their service they came back to us and asked if we could stop sending them a “Expect: 100-Continue” header in our HTTP requests.

This wasn’t something anyone in the team was aware of and as far as we could see from the various RFCs and blog posts it was something “naturally occurring” on the internet. We also didn’t know if it was us adding it or one of the many proxies in between us and them.

We discovered how to turn it off, and did, but it made little difference to the performance problems we had with them, which were in the order of seconds, not milliseconds. Feeling uncomfortable about blindly switching settings off without really understanding them we reverted the change.

The mention of this header also cropped up when we started investigating some errors we were getting from Akamai that seemed to be more related to a disparity in idle connection timeouts.

Eventually, as we learned more about this mysterious header someone in the team put two-and-two together and realised this was possibly where our missing time was going too.

The Cause

Our REST API uses PUT requests to add resources and it appears that the default behaviour of the .Net HttpClient class is to enable the sending of this “Expect: 100-Continue” header for those types of requests. Its purpose is to tell the server that the headers have been sent but that it will delay sending the body until it receives a 100-Continue style response. At that point the client sends the body, the server can then process the entire request and the response is handled by the client as per normal.

Yes, that’s right, it splits the request up so that it takes two round trips instead of one!

Now you can probably begin to understand why our request handling time appeared elongated and why it also appeared to be consumed somewhere within the Nancy framework. The request processing is started and handled by the OWN middleware as that only depends on the headers, it then enters Nancy which finds a handler, and so requests the body in the background (asynchronously). When it finally arrives the whole request is then passed to our Nancy handler just as if it had been sent all as a single chunk.

The Cure

When you google this problem with relation to .Net you’ll see that there are a couple of options here. We were slightly nervous about choosing the nuclear option (setting it globally on the ServicePointManager) and instead added an extra line into our HttpClient factory so that it was localised:

var client = new HttpClient(...);
...
client.DefaultRequestHeaders.ExpectContinue = false;

We re-deployed our services, checked our logs to ensure the header was no longer being sent, and then checked the various metrics to see if the time was now all accounted for, and it was.

Epilogue

In hindsight this all seems fairly obvious, at least, once you know what this header is supposed to do, and yet none of the people in my team (who are all pretty smart) joined up the dots right away. When something like this goes astray I like to try and make sense of why we didn’t pick it up as quickly as perhaps we should have.

In the beginning there were so many new things for the team to grasp. The difference in behaviour between our remote monitoring and on-premise adaptor was assumed to be one of infrastructure especially when we had already battled the on-premise web proxy a few times [2]. We saw so many other headers in our requests that we never added so why would we assume this one was any different (given none of us had run across it before)?

Given the popularity and maturity of the Nancy framework we surmised that no one would use it if there was the kind of performance problems we were seeing, so once again were confused as to how the time could appear to be lost inside it. Although we were all aware of what the async/await construct does none of us had really spent any serious time trying to track down performance anomalies in code that used it so liberally and so once again we had difficulties understanding perhaps what the tool was really telling us.

Ultimately though the default behaviour just seems so utterly wrong that none of use could imagine the out-of-the-box settings would cause the HttpClient to behave this way. By choosing this default we are in essence optimising PUT requests for the scenario where the body does not need sending, which we all felt is definitely the exception not the norm. Aside from large file uploads or massive write contention we were struggling to come up with a plausible use case.

I don’t know what forces caused this decision to be made as I clearly wasn’t there and I can’t find any obvious sources that might explain it either. The internet and HTTP has evolved so much over the years that it’s possible this behaviour provides the best compatibility with web servers out-of-the-box. My own HTTP experience only covers the last few years along with few more around the turn of the millennium, but my colleagues easily cover the decades I’m missing so I don’t feel I’m missing anything obvious.

Hopefully some kind soul will use the comments section to link to the rationale so we can all get a little closure on the issue.

 

[1] Violating The Principle of Least Astonishment for configuration settings was something I covered more generally before in “Sensible Defaults”.

[2] See “The Curse of NTLM Based HTTP Proxies”.

Monday, 14 November 2016

Automated Integration Testing with TIBCO

In the past few years I’ve worked on a few projects where TIBCO has been the message queuing product of choice within the company. Naturally being a test-oriented kind of guy I’ve used unit and component tests for much of the donkey work, but initially had to shy away from writing any automated integration tests due to the inherent difficulties of getting the system into a known state in isolation.

Organisational Barriers

For any automated integration tests to run reliably we need to control the whole environment, which ideally is our development workstations but also our CI build environment (see “The Developer’s Sandbox”). The main barriers to this with a commercial product like TIBCO are often technological, but also more often than not, organisational too.

In my experience middleware like this tends to be proprietary, very expensive, and owned within the organisation by a dedicated team. They will configure the staging and production queues and manage the fault-tolerant servers, which is probably what you’d expect as you near production. A more modern DevOps friendly company would recognise the need to allow teams to test internally first and would help them get access to the product and tools so they can build their test scaffolding that provides the initial feedback loop.

Hence just being given the client access libraries to the product is not enough, we need a way to bring up and tear down the service endpoint, in isolation, so that we can test connectivity and failover scenarios and message interoperability. We also need to be able develop and test our logic around poisoned messages and dead-letter queues. And all this needs to be automatable so that as we develop and refactor we can be sure that we’ve not broken anything; manually testing this stuff is not just not scalable in a shared test environment at the pace modern software is developed.

That said, the TIBCO EMS SDK I’ve been working with (v6.3.0) has all the parts I needed to do this stuff, albeit with some workarounds to avoid needing to run the tests with administrator rights which we’ll look into later.

The only other thorny issue is licensing. You would hope that software product companies would do their utmost to get developers on their side and make it easy for them to build and test their wares, but it is often hard to get clarity around how the product can be used outside of the final production environment. For example trying to find out if the TIBCO service can be run on a developer’s workstation or in a cloud hosted VM solely for the purposes of running some automated tests has been a somewhat arduous task.

This may not be solely the fault of the underlying product company, although the old fashioned licensing agreements often do little to distinguish production and modern development use [1]. No, the real difficulty is finding the right person within the client’s company to talk to about such matters. Unless they are au fait with the role modern automated integrated testing takes place in the development process you will struggle to convince them your intended use is in the interests of the 3rd party product, not stealing revenue from them.

Okay, time to step down from the soap box and focus on the problems we can solve…

Hosting TIBEMSD as a Windows Service

From an automated testing perspective what we need access to is the TIBEMSD.EXE console application. This provides us with one or more TIBCO message queues that we can host on our local machine. Owning thing process means we can therefore create, publish to and delete queues on demand and therefore tightly control the environment.

If you only want to do basic integration testing around the sending and receiving of messages you can configure it as a Windows service and just leave it running in the background. Then your tests can just rely on it always being there like a local database or the file-system. The build machine can be configured this way too.

Unfortunately because it’s a console application and not written to be hosted as a service (at least v6.3 isn’t), you need to use a shim like SRVANY.EXE from the Windows 2003 Resource Kit or something more modern like NSSM. These tools act as an adaptor to the console application so that the Windows SCM can control them.

One thing to be careful of when running TIBEMSD in this way is that it will stick its data files in the CWD (Current Working Directory), which for a service is %SystemRoot%\System32, unless you configure the shim to change it. Putting them in a separate folder makes them a little more obvious and easier to delete when having a clear out [2].

Running TIBEMSD On Demand

Running the TIBCO server as a service makes certain kinds of tests easier to write as you don’t have to worry about starting and stopping it, unless that’s exactly the kinds of test you want to write.

I’ve found it’s all too easy when adding new code or during a refactoring to accidentally break the service so that it doesn’t behave as intended when the network goes up and down, especially when you’re trying to handle poisoned messages.

Hence I prefer to have the TIBEMSD.EXE binary included in the source code repository, in a known place so that it can be started and stopped on demand to verify the connectivity side is working properly. For those classes of integration tests where you just need it to be running you can add it to your fixture-level setup and even keep it running across fixtures to ensure the tests running at an adequate pace.

If, like me, you don’t run as an Administrator all the time (or use elevated command prompts by default) you will find that TIBEMSD doesn’t run out-of-the-box in this way. Fortunately it’s easy to overcome these two issues and run in a LUA (Limited User Account).

Only Bind to the Localhost

One of the problems is that by default the server will try and listen for remote connections from anywhere which means it wants a hole in the firewall for its default port. This of course means you’ll get that firewall popup dialog which is annoying when trying to automate stuff. Whilst you could grant it permission with a one-off NETSH ADVFIREWALL command I prefer components in test mode to not need any special configuration if at all possible.

Windows will allow sockets that only listen for connections from the local host to avoid generating the annoying firewall popup dialog (and this was finally extended to include HTTP too). However we need to tell the TIBCO server to do just that, which we can achieve by creating a trivial configuration file (e.g. localhost.conf) with the following entry:

listen=tcp://127.0.0.1:7222

Now we just need to start it with the –conf switch:
> tibemsd.exe -config localhost.conf

Suppressing the Need For Elevation

So far so good but our other problem is that when you start TIBEMSD it wants you to elevate its permissions. I presume this is a legacy thing and there may be some feature that really needs it but so far in my automated tests I haven’t hit it.

There are a number of ways to control elevation for legacy software that doesn’t have a manifest, like using an external one, but TIBEMSD does and that takes priority. Luckily for us there is a solution in the form of the __COMPAT_LAYER environment variable [3]. Setting this, either through a batch file or within our test code, supresses the need to elevate the server and it runs happily in the background as a normal user, e.g.

> set __COMPAT_LAYER=RunAsInvoker
> tibemsd.exe -config localhost.conf


Spawning TIBEMSD From Within a Test

Once we know how to run TIBEMSD without it causing any popups we are in a position to do that from within an automated test running as any user (LUA), e.g. a developer or the build machine.
In C#, the language where I have been doing this most recently, we can either hard-code a relative path [4] to where TIBEMSD.EXE resides within the repo, or read it from the test assembly’s app.config file to give us a little more flexibility.

<appSettings>
  <add key=”tibemsd.exe”
       value=”..\..\tools\TIBCO\tibemsd.exe” />
  <add key=”conf_file”
       value=”..\..\tools\TIBCO\localhost.conf” />
</appSettings>


We can also add our special .conf file to the same folder and therefore find it in the same way. Whilst we could generate it on-the-fly it never changes so I see little point in doing this extra work.

Something to be wary of if you’re using, say, NUnit to write your integration tests is that it (and ReSharper) can copy the test assemblies to a random location to aid in insuring your tests have no accidental dependencies. In this instance we do, and a rather large one at that, so we need the relative distance between where the test assemblies are built and run (XxxIntTests\bin\Debug) and the TIBEMSD.EXE binary to remain fixed. Hence we need to disable this copying behaviour with the /noshadow switch (or “Tools | Unit Testing | Shadow-copy assemblies being tested” in ReSharper).

Given that we know where our test assembly resides we can use Assembly.GetExecutingAssembly() to create a fully qualified path from the relative one like so:

private static string GetExecutingFolder()
{
  var codebase = Assembly.GetExecutingAssembly()
                         .CodeBase;
  var folder = Path.GetDirectoryName(codebase);
  return new Uri(folder).LocalPath;
}
. . .
var thisFolder = GetExecutingFolder();
var tibcoFolder = “..\..\tools\TIBCO”;
var serverPath = Path.Combine(
            thisFolder, tibcoFolder, “tibemsd.exe”);
var configPath = Path.Combine(
            thisFolder, tibcoFolder, “localhost.conf”);


Now that we know where the binary and config lives we just need to stop the elevation by setting the right environment variable:

Environment.SetEnvironmentVariable("__COMPAT_LAYER", "RunAsInvoker");

Finally we can start the TIBEMSD.EXE console application in the background (i.e. no distracting console window) using Diagnostics.Process:

var process = new System.Diagnostics.Process
{
  StartInfo = new ProcessStartInfo(path, args)
  {
    UseShellExecute = false,
    CreateNoWindow = true,
  }
};
process.Start();


Stopping the daemon involves calling Kill(). There are more graceful ways of remotely stopping a console application which you can try first, but Kill() is always the fall-back approach and of course the TIBCO server has been designed to survive such abuse.
Naturally you can wrap this up with the Dispose pattern so that your test code can be self-contained:

// Arrange
using (RunTibcoServer())
{
  // Act
}

// Assert


Or if you want to amortise the cost of starting it across your tests you can use the fixture-level set-up and tear down:

private IDisposable _server;

[FixtureSetUp]
public void GivenMessageQueueIsAvailable()
{
  _server = RunTibcoServer();
}

[FixtureTearDown]
public void StopMessageQueue()
{
  _server?.Dispose();
  _server = null;
}


One final issue to be aware of, and it’s a common one with integration tests like this which start a process on demand, is that the server might still be running unintentionally across test runs. This can happen when you’re debugging a test and you kill the debugger whilst still inside the test body. The solution is to ensure that the server definitely isn’t already running before you spawn it, and that can be done by killing any existing instances of it:

Process.GetProcessesByName(“tibemsd”)
       .ForEach(p => p.Kill());


Naturally this is a sledgehammer approach and assumes you aren’t using separate ports to run multiple disparate instances, or anything like that.

Other Gottchas

This gets us over the biggest hurdle, control of the server process, but there are a few other little things worth noting.

Due to the asynchronous nature and potential for residual state I’ve found it’s better to drop and re-create any queues at the start of each test to flush them. I also use the Assume.That construct in the arrangement to make it doubly clear I expect the test to start with empty queues.

Also if you’re writing tests that cover background connect and failover be aware that the TIBCO reconnection logic doesn’t trigger unless you have multiple servers configured. Luckily you can specify the same server twice, e.g.

var connection= “tcp://localhost,tcp://localhost”;

If you expect your server to shutdown gracefully, even in the face of having no connection to the queue, you might find that calling Close() on the session and/or connection blocks whilst it’s trying to reconnect (at least in EMS v6.3 it does). This might not be an expected production scenario, but it can hang your tests if something goes awry, hence I’ve used a slightly distasteful workaround where the call to Close() happens on a separate thread with a timeout:

Task.Run(() => _connection.Close()).Wait(1000);

Conclusion

Writing automated integration tests against a middleware product like TIBCO is often an uphill battle that I suspect many don’t have the appetite or patience for. Whilst this post tackles the technical challenges, as they are at least surmountable, the somewhat harder problem of tackling the organisation is sadly still left as an exercise for the reader.


[1] The modern NoSQL database vendors appear to have a much simpler model – use it as much as you like outside production.
[2] If the data files get really large because you leave test messages in them by accident they can cause your machine to really grind after a restart as the service goes through recovery.
[3] How to Run Applications Manifested as Highest Available With a Logon Script Without Elevation for Members of the Administrators Group
[4] A relative path means the repo can then exist anywhere on the developer’s file-system and also means the code and tools are then always self-consistent across revisions.

Monday, 22 August 2016

Sharing Code with Git Subtree

The codebase I currently work on is split into a number of repositories. For example the infrastructure and deployment scripts are in separate repos as are each service-style “component”.

Manual Syncing

To keep things moving along the team decided that the handful of bits of code that were shared between the two services could easily be managed by a spot of manual copying. By keeping the shared code in a separate namespace it was also partitioned off to help make it apparent that this code was at some point going to be elevated to a more formal “shared” status.

This approach was clearly not sustainable but sufficed whilst the team was still working out what to build. Eventually we reached a point where we needed to bring the logging and monitoring stuff in-sync and I also wanted to share some other useful code like an Optional<T> type. It also became apparent that the shared code was missing quite a few unit tests as well.

Share Source or Binaries?

The gut reaction to such a problem in a language like C# would probably be to hive off the shared code into a separate repo and create another build pipeline for it that would result in publishing a package via a NuGet feed. And that is certainly what we expected to do. However the problem was where to publish the packages to as this was closed source. The organisation had its own license for an Enterprise-scale product but it wasn’t initially reachable from outside the premises where our codebase lay. Also there were some problems with getting NuGet to publish to it with an API key that seemed to lay with the way the product’s permissions were configured.

Hence to keep the ball rolling we decided to share the code at the source level by pulling the shared repo into each component’s solution. There are two common ways of doing this with Git – subtrees and submodules.

Git Submodules

It seemed logical that we should adopt the more modern submodule approach as it felt easier to attach, update and detach later. It also appeared to have support in the Jenkins 1.x plugin for doing a recursive clone so we wouldn’t have to frig it with some manual Git voodoo.

As always there is a difference between theory and practice. Whilst I suspect the submodule feature in the Jenkins plugin works great with publicly accessible open-source repos it’s not quite up to scratch when it comes to private repos that require credentials. After much gnashing of teeth trying to convince the Jenkins plugin to recursively clone the submodules, we conceded defeat assuming that we’re another victim of JENKINS-20941.

Git Subtree

Given that our long term goal was to move to publishing a NuGet feed we decided to try using a Git subtree instead so that we could at least move forward and share code. This turned out (initially) to be much simpler because for tooling like Jenkins it appears no different to a single repo.

Our source tree looked (unsurprisingly) like this:

<solution>
  +- src
     +- app
     +- shared-lib
        +- .csproj
        +- *.cs

All we needed to do was replace the shared-lib folder with the contents of the new Shared repository.

First we needed to set up a Git remote. Just as the remote main branch of a cloned repo goes by the name origin/master, so we set up a remote for the Shared repository’s main branch:

> git remote add shared https://github/org/Shared.git

Next we removed the old shared library folder:

> git rm src\shared-lib

…and grafted the new one in from the remote branch:

> git subtree add --prefix src/shared shared master --squash

This effectively takes the shared/master branch and links it further down the repo source tree to src/shared which is where we had it before.

However the organisation of the new Shared repo is not exactly the same as the old shared-lib project folder. A single child project usually sits in it’s own folder, but a full-on repo has it’s own src folder and build scripts and so the source tree now looked like this:

<solution>
  +- src
     +- app
     +- shared
        +- src
           +- shared-lib
              +- .csproj
              +- *.cs

There is now two extra levels of indirection. First there is the shared folder which corresponds to the external repo, plus there is that repo’s src folder.

At this point all that was left to do was to fix up the build, i.e. fix up the path to the shared-lib project in the Visual Studio solution file (.sln) and push the changes.

We chose to use the --squash flag when creating the subtree as we weren’t interested in seeing the entire history of the shared library in the solution’s repository.

Updating the Subtree

Flowing changes from the parent repo down into the subtree of the child repo is as simple as a fetch & pull:

> git fetch shared master
> git subtree pull --prefix src/shared shared master --squash

The latter command is almost the same as the one we used earlier but we pull rather than add. Once again we’re squashing the entire history as we’re not interested in it.

Pushing Changes Back

Naturally you might want to make a change in the subtree in the context of the entire solution and then push it back up to the parent repo. This is doable but involves using git subtree push to normalise the change back into the folder structure of the parent repo.

Personally we decided just to make the changes test-first in the parent and always flow down to the child. In the few cases the child solution helped in debugging we decided to work on the fix in the child solution workspace and then simply manually copy the change over to the shared workspace and push it out through the normal route. It’s by no means optimal but a NuGet feed was always our end game so we tolerated the little bit of friction in the short term.

The End of the Road

If we were only sucking in libraries that had no external dependencies themselves (up to that point our small shared code only relied on the .Net BCL) we might have got away with this technique for longer. But in the end the need to pull in 3rd party dependencies via NuGet in the shared project pushed it over the edge.

The problem is that NuGet packages are on a per-solution basis and the <HintPath> element in the project file assumes a relative path (essentially) from the solution file. When working in the real repo as part of the shared solution it was “..\..\packages\Xxx”, but when it’s part of the subtree based solution it needed to be two levels further up as “..\..\..\..\packages\Xxx”.

Although I didn’t spend long looking I couldn’t find a simple way to easily overcome this problem and so we decided it was time to bite-the-bullet and fix the real issue which was publishing the shared library as a NuGet feed.

Partial Success

This clearly is not anything like what you’d call an extensive use of git subtree to share code, but it certainly gave me a feel for it can do and I think it was relatively painless. What caused us to abandon it was tooling specific (the relationship between the enclosing solution’s NuGet packages folder and the shared assembly project itself) and so a different toolchain may well fair much better if build configuration is only passed down from parent to subtree.

I suspect the main force that might deter you from this technique is how much you know, or feel you need to know, about how git works. When you’re inside a tool like Visual Studio it’s very easy to make a change in the subtree folder and check it in and not necessarily realise you’re modifying what is essentially read-only code. When you next update the subtree things get sticky. Hence you really need to be diligent about your changes and pay extra attention when you commit to ensure you don’t accidentally include edits within the subtree (if you’re not planning on pushing back that way). Depending on how experienced your team are this kind of tip-toeing around the codebase might be just one more thing you’re not willing to take on.

Friday, 24 June 2016

Confusing Assert and Throw

One of the earliest programming books I read that I still have a habit of citing fairly often is Steve Maguire’s Writing Solid Code. This introduced me to the idea of using “asserts” to find bugs where programming contracts had been violated and to the wider concept of Design by Contract. I thought I had understood “when and where” to use asserts (as opposed to signalling errors, e.g by throwing an exception) but one day many years back a production outage brought home to me how I had been lured into misusing asserts when I should have generated errors instead.

The Wake Up Call

I was one of a small team developing a new online trading system around the turn of the millennium. The back end was written in C++ and the front-end was a hybrid Java applet/HTML concoction. In between was a custom HTTP tunnelling mechanism that helped transport the binary messages over the Internet.

One morning, not long after the system had gone live, the server hung. I went in to the server room to investigate and found an assert dialog on the screen telling me that a “time_t value < 0” had been encountered. The failed assert had caused the server to stop itself to avoid any potential further corruption and wait in case we needed to attach a remote debugger. However all the server messages were logged and it was not difficult to see that the erroneous value had come in via a client request. Happy that nothing really bad had happened we switched the server back on, apologised to our customers for the unexpected outage, and I contacted the user to find out more about what browser, version of Java, etc. they were using.

Inside the HTML web page we had a comma-separated list of known Java versions that appeared to have broken implementations of the Date class. The applet would refuse to run if it found one of these. This particular user had an even more broken JVM version that was returning -1 for the resulting date/time value represented in milliseconds. Of course the problem within the JVM should never have managed to “jump the air gap” and cause the server to hang; hence the more important bug was how the bad request was managing to affect the server.

Whilst we had used asserts extensively in the C++ codebase to ensure programmatic errors would show up during development we had forgotten to put validation into certain parts of the service - namely the message deserialization. The reason the assert halted the server instead of silently continuing with a bad value was because we were still running a debug build as performance at that point didn’t demand we switch to a release build.

Non-validated Input

In a sense the assert was correct in that it told me there had been a programmatic error; the error being that we had forgotten to add validation on the message which had come from an untrusted source. Whereas something like JSON coming in over HTTP on a public channel is naturally untrusted, when you own the server, client, serialization and encryption code it’s possible to get lured into a false sense of security that you own the entire chain. But what we didn’t control was the JVM and that proved to be our undoing [1].

Asserts in C++ versus C#

In my move from C++ towards C# I discovered there was a different mentality between the two cultures. In C++ the philosophy is that you don’t pay for what you don’t use and so anything slightly dubious can be denounced as “undefined behaviour” [2]. There is also a different attitude towards debug and release builds where the former often performs poorly at the expense of more rigorous checking around contract violations. In a (naïve) sense you shouldn’t have to pay for the extra checks in a release build because you don’t need them due to you having removed (in theory) any contract violations via testing with the debug build.

Hence it is common for C++ libraries such as STLport to highlight entering into undefined behaviour using an assert. For example iterators in the debug build of STLport are implemented as actual objects that track their underlying container and can therefore detect if they have been invalidated due to modifications. In a release build they might be nothing more than an alias for the built-in pointer type [3].

However in C# the contract tends be validated in a build agnostic manner through the use of checks that result in throwing an explicit ArgumentException / ArgumentNullException type exception. If no formal error checking is used up front then you’ll probably encounter a more terse NullPointerException or IndexOutOfBoundsException when you try and dereference / index on the value sometime later.

Historically many of the assertions that you see in C++ code are validations around the passing of null pointers or index bounds. In C# these are catered for directly by the CLR and so it’s probably unsurprising that there is a difference in culture. Given that it’s (virtually) impossible to dereference in invalid pointer in the CLR and dereferencing a null one has well defined behaviour there is not the same need to be so cautious about memory corruption caused by dodgy pointer arithmetic.

Design by Contract

Even after all this time the ideas behind Design by Contract (DbC) still do not appear to be very well known or considered useful. C# 4 added support for Code Contracts but I have only seen one codebase so far which has used them, and even then its use was patchy. Even the use of Debug.Assert as a very simple DbC mechanism never appears to be used, presumably because there is so little use made of debug builds in the first place (so they would never fire anyway).

Consequently I have tended to use my own home-brew version which is really just a bunch of static methods that I use wherever necessary:

Constraint.MustNotBeNull(anArgument, “what it is”);

When these trigger they throw a ContraintViolationException that includes a suitable message. When I first adopted this in C# I wasn’t sure whether to raise an exception or not, and so I made the behaviour configurable. It wasn’t long before I stripped that out along with the Debug.Assert() calls and just made the exception throwing behaviour the default. I decided that if performance was ever going to be an issue I would deal with it when I had a real example to optimise for. So far that hasn’t happened.

Always Enabled

If I was in any doubt as to their value I got another reminder a while back when developing an in house compute grid. I added a whole bunch of asserts inside the engine management code, which strictly speaking weren’t essential as long as the relationship between the node manager, engine and task didn’t change.

One day someone added a bit of retry logic to allow the engine to reconnect back to its parent manager, which made perfect sense in insolation. But now the invariant was broken as the node manager didn’t know about engines it had never spawned. When the orphaned task finally completed it re-established the broken connection to its parent and tried to signal completion of its work. Only it wasn’t the same work dished out by this instance of the manager. Luckily an internal assert detected the impedance mismatch and screamed loudly.

This is another example where you can easily get lulled into a false sense of security because you own all the pieces and the change you’re making seems obvious (retrying a broken connection). However there may be invariants you don’t even realise exist that will break because the cause-and-effect are so far apart.

Confused About the Problem

The title for this post has been under consideration for the entire time I’ve been trying to write it. I’ve definitely thought about changing it twice before to “Confusing Assert and Validation” as essentially the decision to throw or not is about how to handle the failure of the scenario. But I’ve stuck with the title because it highlights my confused thinking.

I suspect that the reason for this confusion is because the distinction between a contract violation and input validation is quite subtle. In my mind the validation of inputs happens at the boundary of a component, and it only needs to be done once. When you are inside that component you should be able to assume that validation has already occurred. But as I’ve described earlier, a bug can allow something to slip through the net and so input validation happens at the external boundary, but contract enforcement happens at an internal boundary, e.g. a public versus internal class or method [4].

If the input validation triggers its significance is probably less because the check is doing the job it was intended to do, but if a contract violation fires it suggests there is a genuine bug. Also by throwing different exception types it’s easier to distinguish the two cases and then any Big Outer Try Blocks can recover in different ways if necessary, e.g. continue vs shutdown.

Another way to look at the difference is that input validation is a formal part of the type’s contract and therefore I’d write tests to ensure it is enforced correctly. However I would not write tests to ensure any internal contracts are enforced (any more than I would write tests for private methods).

Epilogue

I started writing this post a few years ago and the fact that it has taken this long shows that even now I’m not 100% sure I really know what I’m doing :o). While I knew I had made mistakes in the past and I thought I knew what I was doing wrong, I wasn’t convinced I could collate my thoughts into a coherent piece. Then my switch to a largely C# based world had another disruptive effect on my thinking and succeeded in showing that I really wasn’t ready to put pen to paper.

Now that I’ve finally finished I’m happy where it landed. I’ve been bitten enough times to know that shortcuts get taken, whether consciously or not, and that garbage will end up getting propagated. I can still see value in the distinction of the two approaches but am happy to acknowledge that it’s subtle enough to not get overly hung up about. Better error handling is something we can always do more of and doing it one way or another is what really matters.

[1] Back then the common joke about Java was “write once, test everywhere”, and I found out first-hand why.

[2] There is also the slightly more “constrained” variant that is implementation-defined behaviour. It’s a little less vague, if you know what your implementation will do, but in a general sense you still probably want to assume very little.

[3] See “My Kingdom for a C# Typedef” for why this kind of technique isn’t used.

[4] The transformation from primitive values to domain types is another example, see “Primitive Domain Types - Too Much Like Hard Work?”.

Wednesday, 15 June 2016

Mocking Framework Expectations

I’ve never really got on with mocking frameworks as I’ve generally preferred to write my own mocks, when I do need to use them. This may be because of my C++ background where mocking frameworks are a much newer invention as they’re harder to write due to the lack of reflection.

I suspect that much of my distaste comes from ploughing through reams of test code that massively over specifies exactly how a piece of code works. This is something I’ve covered in the past in “Mock To Test the Outcome, Not the Implementation” but my recent experiences unearthed another reason to dislike them.

It’s always a tricky question when deciding at what level to write tests. Taking an outside-in approach and writing small, focused services means that high-level acceptance tests can cover most of the functionality. I might still use lower-level tests to cover certain cases that are hard to test from the outside, but as tooling and design practices improve I’ve felt less need to unit test everything at the lowest level.

The trade-off here is an ability to triangulate when a test fails. Ideally when one or more tests fail you should be able home in on the problem quickly. It’s likely to be the last thing you changed but sometimes the failure is slightly further back but you’ve only just unearthed it with you latest change. Or, if you’re writing your first test for a new piece of code and it doesn’t go green, then perhaps something in the test code itself is wrong.

This latter case is what recently hit the pair of us when working on a simple web API. We had been fleshing out the API by mocking the back-end and some other dependent services with NSubstitute. One test failure in particular eluded our initial reasoning and so we fired up the debugger to see what was going on.

The chain of events that lead to the test failure ultimately were the result of a missing test method implementation on a mock. The mocking framework, instead of doing something sensible like throwing an exception such as NotImplementedException, chose to return the equivalent of default(T) which for a reference type is null.

While a spurious null reference might normally result in a NullReferenceException quickly after, the value was passed straight into JArray.Parse() which helpfully transformed the bad value into a good one – an empty JSON array. This was a valid scenario (an empty set of things) and so a valid code path was then taken all the back out again to the caller.

Once the missing method on the mock was added the code started behaving as expected.

This behaviour of the mock feels completely wrong to me. And it’s not just NSubstitute that does this, I’ve seen it at least once before with one of the older C# mocking frameworks. In my mind any dependency in the code must be explicitly catered for in the test. The framework cannot know what a “safe” default value is and therefore should not attempt to hide the missing dependency.

I suspect that it’s seen as a “convenience” to help people write tests for code which has many dependencies. If so, then that to me is a design smell. If you have a dependency that needs to be satisfied but plays no significant role in the scenario under test, e.g. validating a security token, then create a NOP style mock that can be reused, e.g. an AlwaysValidIdentityProvider and configure it in the test fixture SetUp so that it’s visible, but not dominant.

A tangential lesson here is one that Sir Tony Hoare is personally burdened with, which is the blatant overloading of a null string reference as a synonym for “no value”. One can easily see how this can happen because of the way that String.IsNullOrEmpty() is used without scant regard for what a null reference should symbolise. My tale of woe in “Extension Methods Should Behave Like Real Methods” is another illustration of this malaise.

I did wonder if we’d shot ourselves in the foot by trying to test at too a high a level, but I don’t believe so because you wouldn’t normally write unit tests for 3rd party components like a JSON parser. I might write characterisation tests for a 3rd party dependency such as a database API if I wanted to be sure that we really knew how it would handle certain types of data conversion or failure, or if we rely on some behaviour we might suspect is not clearly defined and could therefore silently change in a future upgrade. There did not feel like there should have been anything surprising in the code we had written and our dependencies were all very mature components.

Of course perhaps this behaviour is entirely intentional and I just don’t fit the use case, in which case I’d be interested to know what it is. If it’s configurable then I believe it’s a poor choice of default behaviour and I’ll add it to those already noted in “Sensible Defaults”.

Thursday, 2 June 2016

My Kingdom for a C# Typedef

Two of the things I’ve missed most from C++ when writing C# are const and typedef. Okay, so there are a number of other things I’ve missed too (and vice-versa) but these are the ones I miss most regularly.

Generic Types

It’s quite common to see a C# codebase littered with long type names when generics are involved, e.g.

private Dictionary<string, string> headers;

public Constructor()
{
  headers = new Dictionary<string, string>();
}

Dictionary<string, string> FilterHeaders(…)
{
  return something.Where(…).ToDictionary(…);
}

The overreliance on fundamental types in a codebase is a common smell known as Primitive Obsession. Like all code smells it’s just a hint that something may be out of line, but often it’s good enough to serve a purpose until we learn more about the problem we’re trying to solve (See “Refactoring Or Re-Factoring”).

The problem with type declarations like Dictionary<string, string> is that, as their very name suggests, they are too generic. There are many different ways that we can use a dictionary that maps one string to another and so it’s not as obvious which of the many uses applies in this particular instance.

Naturally there are other clues to guide us, such as the names of the class members or variables (e.g. headers) and the noun used in the method name (e.g. FilterHeaders), but it’s not always so clear cut.

The larger problem is usually the amount of duplication that can occur both in the production code and tests as a result of using a generic type. When the time comes to finally introduce the real abstraction you were probably thinking of, the generic type declaration is then littered all over the place. Duplication like this is becoming less of an issue these days with liberal use of var and good refactoring tools like ReSharper, but there is still an excess of visual noise in the code that distracts the eye.

Type Aliases

In C++ and Go (amongst others) you can help with this by creating an alias for the type that can then be used in it’s place:

typedef std::map<std::string, std::string> headers_t;

headers_t headers; 

There is a special form of using declaration in C# that actually comes quite close to this:

using Headers = Dictionary<string, string>;

private Headers headers;

public Constructor()
{
  headers = new Headers();
}

Headers FilterHeaders(…)
{
  return something.Where(…).ToDictionary(…);
}

However it suffers from a number of shortcomings. The first is that it’s clearly not a very well known construct – I hardly ever see C# developers using it. But that’s just an education problem and the reason I’m writing this, right?

The second is that by leveraging the “using” keyword it seems to mess with tools like ReSharper that want to help manage all your namespace imports for you. Instead of keeping all the imports at the top whilst keeping alias style using’s separately below it just lumps them all together, or sees the manual ones later and just appends new imports after them. Essentially they don’t see a difference between these two different use cases for “using”.

A third problem with this case of “using” though is that you cannot compose them. This means you can’t use one alias inside another. For example, imagine that you wanted to provide a cache of a user’s full name (first & last) against some arbitrary user ID, you wouldn’t be able to do this:

using FullName = Pair<string, string>;
using NameCache = Dictionary<int, FullName>;

It’s not unusual to want to compose dictionaries with other dictionaries or lists and that’s the point at which the complexity of the type declaration really takes off.

Finally, and sadly the most disappointing aspect, is that the scope of the alias is limited to just the source file where it’s defined. This means that you have to redefine the alias wherever it is used if you want to be truly consistent. Consequently I’ve tended to stick with only using this technique for the implementation details of a class where I know the scope is limited.

Inheritance

One alternative to the above technique that I’ve seen used is to create a new class that inherits from the generic type:

public class Headers : Dictionary<string, string>
{ }

This is usually used with collections and can be a friction-free approach in C# due to the way that the object-initialization syntax relies on the mutable Add() method:

var header = new Headers
{
  { “location”, “http://chrisoldwood.com” },
};

However if you wanted to use the traditional constructor syntax you’d have to expose all the base type constructors yourself manually on the derived type so that they can chain through to the base.

What makes me most uncomfortable about this approach though is that it uses inheritance, and that has a habit of leading some developers down the wrong path. By appearing to create a new type it’s tempting to start adding new behaviours to it. Sometimes these behaviours will require invariants which can then be violated by using the type through it’s base class (i.e. violate the Liskov Substitution Principle).

Domain Types

Earlier on I suggested that this is often just a stepping stone on the way to creating a proper domain type. The example I gave above was using a Pair to store a person’s first and last name:

using FullName = Pair<string, string>;

Whilst this might be palatable for an implementation detail it’s unlikely to be the kind of thing you want to expose in a published interface as the abstraction soon leaks:

var firstName = fullName.First;
var lastName = fullName.Second;

Personally I find it doesn’t take long for some generics like Pair and Tuple to leave a nasty taste and so I quickly move on to creating a custom type for the job (See “Primitive Domain Types - Too Much Like Hard Work?”):

public class FullName
{
  public string FirstName { get; }
  public string LastName { get; }
}

This is another one of those areas where C# is playing catch-up with the likes of F#. Creating simple immutable data types like these are the bread-and-butter of F#’s Record type; you can do it with anonymous types in C# but not easily with named types.

Strongly-Typed Aliases

One other aside before I sign off about typedefs is that in C, C++, and the “using” approach in C#, they are not strongly-typed types, they are just simple aliases. Consequently you can freely mix-and-match the alias with the underlying type:

var headers = new Headers();
Dictionary<string, string> dictionary = headers;

In C++ you can use templates to workaround this [1] but Go goes one step further and makes the aliases more strongly-typed. This means that there is no implicit conversion even when the fundamental type is the same, which makes them just a little bit more like a real type. Once again F# with its notion of units of measure also scores well on this front.

[1] The first article I remember about this was Matthew Wilson’s “True Typedefs”, but the state of the art has possibly moved on since then. The Boost template metaprogramming book also provides a more extensive discussion on the topic.