Tuesday, 30 March 2010

Integration Testing with NUnit

Back near the start of my current project I came across a number of tests in one of our unit test assemblies that touched the file-system. My gut reaction was to cry foul and point out that unit tests are not allowed to have any external dependencies. In fact just the other day on the accu-general mailing list someone was trying to locate the definition of a Unit Test that Kevlin Henney used in his ACCU 2008 Conference session:-

A unit test is a test of behaviour (normally functional) whose success or failure is wholly determined by the correctness of the test and the correctness of the unit under test. Thus the unit under test has no external dependencies across boundaries of trust or control, which could introduce other sources of failure. Such external dependencies include networks, databases, files, registries, clocks and user interfaces.

Test Code Structure

The author of the tests agreed that they weren’t really unit tests, but defended his position by saying that he just wanted an easy way to invoke his code. He was using Resharper*, which lets you run a unit test incredibly easily because it has its own NUnit-compatible test runner. Being new to NUnit, I also didn’t know that you can categorise tests with string labels that can be passed to the test runner to ensure only a subset of the tests are run:-

[TestFixture, Category("MyCategory")]
public class MyClassTests
{
. . .

C:\> nunit-console /include="MyCategory" MyAssembly.Tests.dll

He had created a number of categories including “UnitTest” and “IntegrationTest”. It felt wrong to include Integration Tests inside what had historically been our Unit Test assemblies so we agreed to split them out at a later date. This meant we could use the assembly itself as a coarse grained grouping mechanism and save NUnit categories for a more functional grouping.

It didn’t take long before it became apparent that we were heading into assembly hell: we had already decided to partition the code so that production logic sat in one assembly, the unit tests in another, and a third held the related hand-crafted mocks and stubs used by any test utilities. Adding an integration tests assembly as well would mean a ratio of 1 production assembly to 3 test assemblies. The Visual Studio solution was already creaking, even with Solution Folders used to group components.

Instead we decided to keep a 1:1 ratio of production to test assemblies, use his original test categories, and adjust the build scripts to run only a certain type of test from each test assembly. This also didn’t upset Resharper, as it already provides support for running tests of different categories.
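For example, the build script can make two passes over the same test assembly, one per category (these are the same category names mentioned above):-

C:\> nunit-console /include="UnitTest" MyAssembly.Tests.dll
C:\> nunit-console /include="IntegrationTest" MyAssembly.Tests.dll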

NUnit allows you to categorise tests at either the Test or the TestFixture level. I didn’t like seeing the two mixed together in the same source file, and I found the Category attribute added quite a bit of extra noise to the individual tests, so we elected to use separate fixtures for each category. This also means that, should we decide to go back to using separate assemblies for unit and integration tests, we can. So our fixture headings currently look like this:-

In MyClassUnitTests.cs

[TestFixture, Category(Test.Category.UnitTest)]
public class MyClassUnitTests
{
. . .
}

In MyClassIntegrationTests.cs

[TestFixture, Category(Test.Category.IntegrationTest)]
public class MyClassIntegrationTests
{
. . .
}
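The Test.Category names are just string constants kept in one place so the labels can’t drift between fixtures. A minimal sketch might look like this (the exact class layout is illustrative; NUnit’s Category attribute only requires a string):-

namespace Test
{
    public static class Category
    {
        public const string UnitTest = "UnitTest";
        public const string IntegrationTest = "IntegrationTest";
    }
}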

The nature of integration tests is such that you should be testing your interaction with some external dependency, and so it is that dependency that more naturally lends itself to the fixture name, avoiding the kind of class-based naming clash shown above:-

[TestFixture, Category(Test.Category.IntegrationTest)]
public class TheirBackEndIntegrationTests
{
. . .

Of course this may work for a simple Gateway-style component, like a web service with a couple of methods, but when you write tests for your Data Access Layer classes you may well end up with two sets of tests that differ only in whether they use a mock or a real database connection. Mind you, perhaps that’s a smell indicating your testing efforts are not optimally placed?

Test Configuration

By Kevlin’s definition there must be an external dependency and in the initial cases there was – the file-system. However looking forward we also had plenty of database code to come and it would be neat if we could test that the same way. We now do and I’ll describe that in a follow-up post.

Fortunately the nature of the components we were initially testing was such that they serve up data from the file-system in a read-only manner. This meant that we could create a cut-down copy of the service data repository inside the VCS structure and always keep the folder layout and data format in step with the code. It also means that multiple simultaneous builds will not interfere with each other.

The build process uses some pretty run-of-the-mill batch files, so the question was how to pass the path of the test data folder via nunit-console.exe through to the test code. I didn’t find any obvious mechanism during my searches – mostly people seemed to be trying to use the standard .Net .config mechanism and running into issues with AppDomains and suchlike. I decided that given the core folder structure was pretty much settled, and it’s already part of the VCS repository, what’s wrong with hard-coding a relative path?

const string REPOSITORY_ROOT = @"..\..\..\IntegrationTests\Data\MyService";

Yes, I know it feels unclean, but is it any different to hard-coding the same path in the .bat file that invokes nunit-console? At least it’s a relative path, and I would hope that we would find out pretty quickly if it breaks – if not, our Continuous Integration process is worthless.
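The tests themselves then just compose paths off that root, e.g. (the data file name here is made up):-

string dataFile = System.IO.Path.Combine(REPOSITORY_ROOT, "SomeDataSet.xml");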

This is all fine and dandy for read-only integration tests (which the vast majority of ours are at the moment), but what if they also have to write to the file-system? For this we are using the Temp directory. The .Net API has methods for generating a temporary file with a unique name, or you can use the Process ID and Thread ID if you want something a little more user-friendly, as this will still be safe with multiple simultaneous builds running. In retrospect I guess you could create a Temp folder within the VCS working copy and treat it like the ‘obj’ and ‘bin’ output folders. In Subversion you can use the ‘svn:ignore’ property to stop an unversioned file or folder showing up in your status and keep your “SVN Commit…” window clean.
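As a sketch, the user-friendly variant might look something like this (the folder name format is purely illustrative):-

// Build a per-process/per-thread scratch folder under %TEMP% so that
// simultaneous builds (and parallel tests) don't trample on each other.
string scratchFolder = System.IO.Path.Combine(
    System.IO.Path.GetTempPath(),
    String.Format("MyService.{0}.{1}",
        System.Diagnostics.Process.GetCurrentProcess().Id,
        System.Threading.Thread.CurrentThread.ManagedThreadId));

System.IO.Directory.CreateDirectory(scratchFolder);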

Test Results

You still write your tests in exactly the same way as unit tests; it’s just that you may have to be a little more creative about what it is you are asserting. If you’re not round-tripping the data and cannot use Assert.AreEqual(), you may instead just be satisfied with checking for a file’s existence and/or length:-

Assert.IsTrue(System.IO.File.Exists(filename));
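If you know the output should be non-empty you can go a little further and check the length too:-

Assert.IsTrue(new System.IO.FileInfo(filename).Length > 0);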

Alternatively you can use the absence of exceptions to indicate success by finishing the test with Assert.Pass()** or Assert.IsTrue(true). Sometimes it’s just not possible to completely verify the results of this kind of test, and the ROI of doing so may be small; it may be worth putting that effort into System Tests instead. At the very least you’ll end up with an easy way to invoke the code under a debugger should the need arise.

Component Level Tests

Integration testing appears to be quite a loose term with the definition of ‘external’ being applied to everything from another class, to another team, to another organisation. In his book Code Craft, Pete Goodliffe describes Component Testing (p138) as something that sits between Unit and Integration Testing. Having not made this distinction in the past I’m not sure quite where the ‘external dependency’ line is drawn, but my guess is that it’s probably closer to Unit than Integration. Hopefully someone will add a comment that points out a really good book/article on testing as Code Complete’s chapter on Integration/System Testing talks a lot about methodology and little about the testing mechanics.


[*The reason why I’m procrastinating over installing Resharper will have to wait for another day, but Thomas Guest touched nicely on the subject just recently.]

[**It seems that Resharper doesn’t like Assert.Pass() - maybe I’m using it incorrectly? The uglier Assert.IsTrue(true) satisfies my need to always finish a test with an Assert.]

Friday, 12 March 2010

Cleaning up svn:mergeinfo Droppings

I’ve been using Subversion for about 6 months now and the daily grind of Update/Edit/Commit is a breeze with TortoiseSVN and Ankh. Admittedly my team only has a small codebase at the moment but performance is generally very good. However one area that Subversion is already struggling with is merging. I don’t think this is news to anyone who has used Subversion for any period of time (or the developers for that matter) but it does taint the experience a little. It’s also obviously unfair to compare a free VCS like Subversion with something Enterprisey like ClearCase (that comes with an Enterprise price tag to match), but I’ll do it anyway because my only other real yardstick is Visual SourceSafe…

The Merging Strategy

We are 3 iterations into a new system and although we have made no formal delivery yet, I have nonetheless treated the end of each iteration as a formal release so that I can get used to the branching, merging and labelling techniques in Subversion. I was already aware that Subversion does not treat branches and labels as first-class concepts like ClearCase does, so I was keen to explore Subversion’s model and discover its limitations as early as possible. We are following a run-of-the-mill development model, with continuous integration into the trunk and branching for release, but due to the odd mishap we have needed to cherry-pick changes off the trunk for the release and also cherry-pick changes off the release back to the trunk. The former is just an education problem as some of the developers adjust to working with branches, whilst the latter is a necessity because releases are often not 100% correct at the point of branching and need nursing to get them production ready. In the past with ClearCase cherry-picking changes caused no side effects, but with Subversion it creates an ever growing trail that affects subsequent merges in an unpleasant manner.

The svn:mergeinfo Property in Subversion

Subversion uses a special property on files and folders, called “svn:mergeinfo”, to record when a merge has taken place. The value of the property is a list of the branches (and associated ranges of revisions) that have been merged to date – irrespective of whether a physical change has been made to the item itself, e.g.

/branches/releases/1.0:1-1000
/branches/feature/cool_stuff:900-910
/branches/releases/2.0:1100-2000
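You can see exactly what Subversion thinks has been merged into your working copy by querying the property directly:-

C:\> svn propget svn:mergeinfo -R .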

Generally speaking it’s a good idea to merge using the root of a branch to ensure that you don’t miss anything (unless you’re cherry-picking changes, of course). In Subversion this means that only the branch roots then need to have their svn:mergeinfo properties updated, which keeps things clean and tidy. But if you cherry-pick a change, Subversion then needs to maintain a merge list on that file for ever more – and, more importantly, for every future merge where that file is a child and therefore a potential candidate. The net outcome of this architecture is that you end up with a very noisy commit every time you merge, because it is full of svn:mergeinfo property updates that obscure the real code changes.

ClearCase uses a similar technique (called hyperlinks) to achieve the same goal, but the difference is that it only records a link back to the single source version that contributed the changes – subsequent merges have no effect. Of course ClearCase is renowned for its slowness*, and it’s possible that the Subversion architecture may improve the speed of merging at the cost of fidelity. The Revision Graph in TortoiseSVN doesn’t appear to have the wonderful merge arrows that the equivalent feature in ClearCase has, and you can’t instigate merges from the TortoiseSVN Revision Graph – a feature I used heavily on a previous project – but maybe that says more about the quality of the Development Process than the tool…

Deleting the MergeInfo Properties

Now, if I interpret the post “Subversion merge reintegrate” correctly, then the point of the mergeinfo property is so that Subversion knows what contributions have already been taken from other branches. The fact that Subversion keeps updating the upper revision in the property after each branch is merged reinforces my belief that this is an optimisation of some sort. In theory, then, you could remove those entries in the mergeinfo properties which reference dead branches. The only fly-in-the-ointment is that you probably won’t find a mergeinfo property with a single reference, precisely because of the behaviour outlined above. So what about deleting the whole mergeinfo property on each file and folder?

C:\> svn propdel svn:mergeinfo -R

Obviously you can’t do this on the source branch, as it’s the target of the merge that gets updated. So it would really have to be done on the trunk. If you know that you are never going to do a large scale merge again from any release branches (i.e. trunk is the only active branch) then I think wiping out all the svn:mergeinfo properties could be the quick solution. And you still have the ability to cherry-pick changes from an old branch should the need arise.

When I say all properties, there is one folder you should leave untouched – the root folder of the branch, e.g. /trunk or /branches/Release/0.1 or whatever. I believe this advice is sound as long as you have not deleted any properties from files or sub-folders whose merged revisions fall outside the range specified at the root folder. The svn:mergeinfo property appears to be ‘inherited’ by its children, so deleting the property from the children doesn’t stop Subversion from correctly inferring the merge set, as long as a parent folder back up the tree has the right information.
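Putting those two observations together, one way to do the clean-up (try it on a scratch working copy first) is to delete the property recursively and then revert only the root folder, since svn revert is non-recursive by default:-

C:\> svn propdel svn:mergeinfo -R .
C:\> svn revert .
C:\> svn commit -m "Removed redundant svn:mergeinfo properties"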

Faking a Merge

One of the features I’ve seen in the TortoiseSVN UI that I’ve not yet used is the ability to record a merge without making any actual content changes. I’ve used similar features in the past when you have files you want to reconcile but the target version doesn’t physically need to change. The canonical example of this is a file that just contains the version number – it is different on every release branch but fixed to, say, 0.0.0 on the trunk. Merging these files will always cause a conflict so you only want to record the fact that the file has been ‘logically’ merged to ensure the entire merge is clean, i.e. no changes are left unaccounted for on the source branch.
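Although I’ve only seen this in the TortoiseSVN UI, the command line appears to offer the same thing via the --record-only switch, e.g. (the repository URL here is made up):-

C:\> svn merge --record-only http://server/repo/branches/Release/0.1 .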

I reckon this feature gives us a clean(ish) way to resolve the issue at strategic points in our development cycle – as long as we don’t perform long term development outside of the trunk. Here’s how I think it goes:-

  1. Feature branches are created, then merged (reintegrated) at their roots and discarded. This means only the root folder needs to have its svn:mergeinfo up-to-date. Once discarded the merge information in the root serves no further purpose, but also causes no problems.
  2. Release branches on the other hand may have changes cherry-picked off the trunk and other temporary branches and the reverse may also occur when the release is patched. This is where we start accumulating svn:mergeinfo properties on the child files and folders.
  3. Once we reach a stable point where we no longer expect to take old contributions from our previous release branches we delete all the child svn:mergeinfo properties on the trunk (leaving one at the root) and record merges at the trunk root for each branch that we cherry-picked from. The net result should be the trunk root having merge records accurate up to today. Fixes made after today on the release branches should still show up in future merges as candidates.

I was going to add an additional condition on point 3 that there must be no open feature branches. My reservation is that Subversion may undo all this hard work when it comes to reintegrate a feature branch as there will be svn:mergeinfo properties on one branch but not the other. I haven’t done any experiments yet to see how Subversion handles this kind of property merging, but I will when a suitable opportunity arises.

It’s Easy To Google The Answer Once You Know It

When I first started looking into this issue, I didn’t really know what to search for and so didn’t find much. I then read bits of the SVN Book and started writing this post based on what I thought might be the answer. Once I’d finished it I went back and Googled “mergeinfo propdel” and what do you know? Yup, more posts about this issue than you can shake a stick at! I’m still publishing it though because I’ve not really seen one that explains why you can use svn propdel and many of them don’t mention keeping the mergeinfo intact at the branch root. In fact this is the most succinct answer I came across…


[*Personally I’ve never found the time spent waiting for ClearCase to generate the merge set on some 30,000 file views that excessive. At least not compared to the time you actually spend trying to decipher what code people have written to verify whether the merge is valid.]

Monday, 8 March 2010

WCF Service Refusing Connections

I spent the majority of the day looking into a WCF issue with our system, whereby the same request submitted in bulk against the same WCF service would fail intermittently:-

System.ServiceModel.EndpointNotFoundException: Could not connect to http://localhost:8000/MyService. TCP error code 10061: No connection could be made because the target machine actively refused it 127.0.0.1:8000

I had noticed by log inspection that about 10 of the requests would succeed and the majority of the others would fail, which led me to believe that there was a default setting somewhere inside the WCF configuration that was having a throttling effect. When I finally got to the bottom of the issue I was somewhat annoyed with myself, as it was a classic sockets issue – not a WCF-specific one. Therein, I guess, lies the problem with dealing with ever higher layers of abstraction – you begin to forget about the low-level details that underpin them (and which eventually leak through). Fortunately my copy of Programming WCF Services by Juval Lowy arrived the other day, so I had a chance to do some reading to see what possible setting could be having an effect.

ServiceThrottling "maxConcurrent*"

Lowy dedicates an entire chapter to service request throttling. There are 3 settings described for the Service Throttling Behaviour – maxConcurrentSessions, maxConcurrentCalls and maxConcurrentInstances – that allow the maximum load on each service to be controlled. The effects of these settings depend on the transport and concurrency mode of the service, but it’s all spelt out clearly in the book. The most interesting of the lot for me was the number of concurrent sessions, which defaults to 10 under the Per-Session instancing model that we are using.
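We set ours via the .config file, but for illustration the same knobs can be turned in code through the ServiceThrottlingBehavior class (a sketch; MyService and the values are placeholders):-

using System.ServiceModel;
using System.ServiceModel.Description;

// Sketch: raising the throttle programmatically before opening the host.
var host = new ServiceHost(typeof(MyService));

var throttle = new ServiceThrottlingBehavior
{
    MaxConcurrentCalls = 1000,      // simultaneous calls being processed
    MaxConcurrentSessions = 1000,   // simultaneous sessions (defaults to 10)
    MaxConcurrentInstances = 1000   // simultaneous service instances
};

host.Description.Behaviors.Add(throttle);
host.Open();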

I bumped the setting up to 1000 (way over the 100 needed to service my test harness load), but it had no effect. To humour myself I dropped all three settings to 1 to ensure it had some impact. It did. Oh.

NetTcpBinding "MaxConnections"

Back to the drawing board, or more accurately Google and the book. So convinced was I that I had discovered the cause that I never bothered to read on to the end of the chapter, where I would have discovered another setting that limits the maximum number of TCP connections for the binding: maxConnections. This setting, which also has a default of 10, goes hand-in-hand with the previous settings, as the smaller of it and maxConcurrentSessions becomes the effective throttle.

Once again I changed my service config only to become immediately disappointed as it also failed to have the desired result.

NetTcpBinding "listenBacklog"

Switching back to Google, I read various posts about firewall issues, which didn’t apply as I was running on the same machine. However I did start to pay closer attention to some of the settings that were being included in the App.config files. One in particular caught my eye – listenBacklog. Anyone who has done any raw sockets programming will know that when you create a server-side socket and start listening on it, you need to specify how many pending connections you’re willing to buffer. I quickly Googled again to see what the default value was – yup, another 10!
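Again the change went into the .config file, but in code the two binding-level settings sit alongside each other (values illustrative):-

using System.ServiceModel;

// Sketch: the two binding-level limits, both of which default to 10.
var binding = new NetTcpBinding();
binding.MaxConnections = 1000;   // maximum TCP connections for the binding
binding.ListenBacklog = 1000;    // pending connections the listening socket will queue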

I quickly plugged my ridiculous value into the .config file and bingo! This time it worked. So, a new twist on a very old problem…