Thursday, 17 November 2016

Overly Prescriptive Tests

In my recent post “Tautologies in Tests” I adapted one of Einstein’s apocryphal sayings and suggested that tests should be “as precise as possible, but not too precise”. But what did I mean by that? How can you be too precise, in fact isn’t that the point?

Mocking

One way is to be overly specific when tracking the interactions with mocks. It’s very easy when using a mocking framework to go overboard with your expectations, just because you can. My personal preference (detailed before in “Mock To Test the Outcome, Not the Implementation”) is to keep the details of any interactions loose, but be specific about the outcomes. In other words what matters most is (usually) the observable behaviour, not necessarily how it’s achieved.

For example, rather than set-up detailed instructions on a mock that cover all the expected parameters and call counts I’ll mostly use simple hand-crafted mocks [1] where the method maps to a delegate where I’ll capture only the salient details. Then in the assertions at the end I verify whatever I need to in the same style as the rest of the test. Usually though the canned response is test case specific and so rarely needs any actual logic.
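
To make that concrete, here’s a minimal sketch of the kind of hand-crafted mock I mean. The interface and types are purely illustrative, but the shape is typical: capture the salient argument in a field and expose an optional delegate for any test case specific behaviour.

public interface IOrderStore
{
  void Save(Order order);
}

public class MockOrderStore : IOrderStore
{
  // The salient detail the test will assert on at the end.
  public Order LastSaved;

  // Optional hook for a test case specific canned response.
  public Action<Order> OnSave = o => { };

  public void Save(Order order)
  {
    LastSaved = order;
    OnSave(order);
  }
}

The test then asserts on LastSaved alongside its other assertions, rather than verifying call counts and argument matchers against a framework-generated mock.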

In essence what I’m creating is what some people prefer to call stubs, as they reserve the term “mocks” for meatier test fakes that record interactions for you. I’d argue that using the more complex form of mock is largely unnecessary and will hurt in the long run. To date (anecdotally speaking) I’ve wasted too much time “fixing” broken tests that overused mocks by specifying every little detail and were never written to give the implementation room to manoeuvre, e.g. during refactoring. In fact an automated refactoring tool is mandatory on code like this because the methods are referenced in so many tests it would take forever to fix them up manually.

I often feel that some of the interactions with dependencies I’ve seen in the past have felt analogous to testing private methods. Another of my previous posts that was inspired by mocking hell is “Don’t Pass Factories, Pass Workers”. Naturally there is a fine line here and maybe I’ve just not seen enough of it done well to appreciate how this particular tool can be used effectively.

White-Box Testing 

The other form of overly specific test I’ve seen comes from what I believe is an over-reliance on a white-box testing approach, such that the tests spell out the expected output exactly.

The problem with example based tests is that they are often taken literally, which I guess is kind of the point, but as software engineers we should try and see past the rigid examples and verify the underlying behaviour instead, which is what we’re really after.

For example, consider a pool of numbers [2] up to some predefined limit, say, 10. A naïve approach to the problem might test the pool by asserting a very specific sequence, i.e. the starting one:

[Test]
public void returns_sequence_up_to_limit()
{
  var pool = new NumberPool(10);
  var expected = new[] { 1, 2, 3, ... , 9, 10 };

  foreach (var number in expected)
    Assert.That(pool.Acquire(), Is.EqualTo(number));
}

From a white-box testing approach we can look inside the NumberPool and probably see that it’s initially generating numbers using the ++ operator. The implementation might eagerly generate that sequence in the constructor, add them to the end of a queue, and then divvy out the front of the queue.
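
A first cut along those lines might look something like this (my sketch of a plausible implementation, not the actual code under test):

public class NumberPool
{
  private readonly Queue<int> _free = new Queue<int>();

  public NumberPool(int limit)
  {
    // Eagerly generate the sequence 1..limit and queue it up.
    for (var i = 1; i <= limit; ++i)
      _free.Enqueue(i);
  }

  public int Acquire()
  {
    // Divvy out the front of the queue.
    return _free.Dequeue();
  }

  public void Release(int number)
  {
    _free.Enqueue(number);
  }
}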

From a “programmer’s test” point of view (aka unit test) it does indeed verify that, if my expectation is that the implementation should return the exact sequence 1..10, then it will. But how useful is that for the maintainer of this code? I’d argue that we’ve over-specified the way this unit should be allowed to behave.

Verify Behaviours

And that, I think, lies at the heart of the problem. For tests to be truly effective they should not describe exactly what the code does, but how it needs to behave. Going back to our example above the NumberPool class does not need to return the exact sequence 1..10, it needs to satisfy some looser constraints, such as not returning a duplicate value (until re-acquired), and limiting the range of numbers to between 1 and 10.

[Test]
public void sequence_will_be_unique()
{
  var pool = new NumberPool(10);
  var sequence = new List<int>();

  foreach (var i in Enumerable.Range(1, 10))
    sequence.Add(pool.Acquire());

  Assert.That(sequence.Distinct().Count(),
              Is.EqualTo(10)); 
}

[Test]
public void sequence_only_contains_one_to_limit()
{
  var pool = new NumberPool(10);
  var sequence = new List<int>();

  foreach (var i in Enumerable.Range(1, 10))
    sequence.Add(pool.Acquire());

  Assert.That(sequence.Where(n => (n < 1) || (n > 10)),
              Is.Empty);
}

With these two tests we are free to change the implementation to generate a random sequence in the constructor instead if we wanted, and they would still pass, because it conforms to the looser, albeit still well defined, behaviour. (It may have unpredictable performance characteristics but that is a different matter.)
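
For instance, building on the earlier sketch, the constructor could just as easily shuffle the numbers and the two tests would be none the wiser:

public NumberPool(int limit)
{
  var random = new Random();

  // Queue the numbers 1..limit in an arbitrary order instead of ascending.
  foreach (var number in Enumerable.Range(1, limit)
                                   .OrderBy(n => random.Next()))
  {
    _free.Enqueue(number);
  }
}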

Once again we are beginning to enter the realm of property based testing which forces us to think harder about what behaviours our code exhibits rather than what it should do in one single scenario.
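
Even without a dedicated framework we can lean in that direction by hand, e.g. by checking that the uniqueness property holds for a spread of arbitrary limits rather than one hard-coded example. This is only a crude approximation of real property based testing, but it illustrates the shift in mindset:

[Test]
public void sequence_is_unique_for_any_limit()
{
  var random = new Random();

  for (var attempt = 0; attempt != 100; ++attempt)
  {
    var limit = random.Next(1, 1000);
    var pool = new NumberPool(limit);

    var sequence = Enumerable.Range(1, limit)
                             .Select(i => pool.Acquire())
                             .ToList();

    Assert.That(sequence.Distinct().Count(), Is.EqualTo(limit));
  }
}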

This does not mean there is no place for tests that take a specific set of inputs and validate the result against a known set of outputs. On the contrary they are an excellent starting point for thinking about what the real test should do. They are also important in scenarios where you need some smoke tests that “kick the tyres” or you are naturally handling a very specific scenario.

Indicative Inputs

Sometimes we don’t intend to make our test look specific but it just turns out that way to the future reader. For example in our NumberPool tests above what is the significance of the number “10”? Hopefully in this example it is fairly obvious that it is an arbitrary value as the test names only talk about “a limit”. But what about a test for code that handles, say, an HTTP error?

[Test]
public void client_throws_when_service_unavailable()
{
  using (FakeServer.Returns(InternalServerError))
  {
    var client = new RestClient(. . .);

    Assert.That(client.SendRequest(. . .),
                Throws.InstanceOf<RequestException>());
  }
}

In this test we have a mock (nay stub) HTTP server that will return a non-2XX style result code. Now, what is the significance of the InternalServerError result code returned by the stub? Is it a specific result code we’re handling here, or an indicative one in the 5XX range? The test name uses the term “service unavailable” which maps to the more specific HTTP code 503, so is this in fact a bug in the code or test?

Unless the original author is around to ask (and even remembers) we don’t know. We can surmise what they probably meant by inspecting the production code and seeing how it processes the result code (e.g. a direct comparison or a range based one). From there we might choose to see how we can avoid the ambiguity by refactoring the test. In the case where InternalServerError is merely indicative we can use a suitably named constant instead, e.g.

[Test]
public void throws_when_service_returns_5xx_code()
{
  const int CodeIn5xxRange = InternalServerError;

  using (FakeServer.Returns(CodeIn5xxRange))
  {
    var client = new RestClient(. . .);

    Assert.That(client.SendRequest(. . .),
                Throws.InstanceOf<RequestException>());
  }
}

A clue that there is a disconnect is when the language used in the test name isn’t correctly reflected in the test body itself. So if the name isn’t specific then nor should the test be, and vice-versa: if the name is specific then expect the test to be too. A corollary to this is that if your test name is vague, don’t be surprised when the test itself turns out to be equally vague.

Effective Tests

For a suite of tests to be truly effective you need them to remain quietly in the background until you change the code in a way that raises your awareness around some behaviour you didn’t anticipate. The fact that you didn’t anticipate it means that you’ll be relying heavily on the test rather than the code you just changed to make sense of the original intended behaviour.

When it comes under the spotlight (fails) a test needs to convince you that it was well thought out and worthy of your consideration. To be effective a guard dog has to learn the difference between friend and foe and when we write tests we need to learn how to leave enough room for safe manoeuvring without forgetting to bark loudly when we exceed our remit.

 

[1] When you keep your interfaces simple and focused this is pretty easy given how much a modern IDE can generate for you when using a statically typed language.

[2] This example comes from a real one where the numbers were identifiers used to distinguish compute engines in a grid.

Tuesday, 15 November 2016

In The Toolbox – Season Two

As I pointed out in my blog post that collates Season One of my In The Toolbox C Vu column I never intended to write more than a couple of introductory articles before handing it over for others to share their experiences. Yet now, three years later, I’m still plugging away at it and Season Three is already in the making, with a couple of episodes under my belt.

Just as before I also strongly advise you to become a member of the ACCU so you can get this, plus loads of much better content, which may or may not be published online by their respective authors. As I write this post it’s still only a measly £45 per year and is one of the last remaining printed journals about programming.

Anyway, here are links and summaries for episodes 7 through 12.

7: Feature Tracking

We have so many ideas for our products but only so many hours in the day to develop them. Sometimes all it takes to keep track of them is a simple text file in the repo, whilst bigger projects seem to demand an enterprise-grade solution like JIRA.

8: Taming the Inbox

Email is still the predominant means of [a]synchronous communication for many organisations and the barrage of messages needs to be triaged if we stand any hope of separating the wheat from the chaff.

9: The Developer’s Sandbox

As programmers we need a safe environment in which to write and test our code, free from the distractions going on around us. When running the tests we should not be at the mercy of other developers running theirs at the same time; first and foremost we start in isolation, if we can.

10: Dictionary & Thesaurus

One of the hardest problems in computer science is naming and yet two of the oldest tools used to solve this problem often lie dormant on the programmer’s bookshelf.

11: Finding Text

It’s a simple question: how do you find a piece of text? And yet there is a dizzying array of choices available that depend heavily on what’s accessible at the time and where and how that elusive text is stored.

12: Whiteboards

In the move to go digital the humble whiteboard has been pushed aside, which is disappointing as it’s still probably the best design tool available. It also has many uses other than drawing pictures of boxes, drums and cylinders.

Monday, 14 November 2016

Automated Integration Testing with TIBCO

In the past few years I’ve worked on a few projects where TIBCO has been the message queuing product of choice within the company. Naturally being a test-oriented kind of guy I’ve used unit and component tests for much of the donkey work, but initially had to shy away from writing any automated integration tests due to the inherent difficulties of getting the system into a known state in isolation.

Organisational Barriers

For any automated integration tests to run reliably we need to control the whole environment, which ideally means our development workstations but also our CI build environment (see “The Developer’s Sandbox”). The main barriers to this with a commercial product like TIBCO are often technological, but more often than not organisational too.

In my experience middleware like this tends to be proprietary, very expensive, and owned within the organisation by a dedicated team. They will configure the staging and production queues and manage the fault-tolerant servers, which is probably what you’d expect as you near production. A more modern DevOps friendly company would recognise the need to allow teams to test internally first and would help them get access to the product and tools so they can build their test scaffolding that provides the initial feedback loop.

Hence just being given the client access libraries to the product is not enough, we need a way to bring up and tear down the service endpoint, in isolation, so that we can test connectivity, failover scenarios and message interoperability. We also need to be able to develop and test our logic around poisoned messages and dead-letter queues. And all this needs to be automatable so that as we develop and refactor we can be sure that we’ve not broken anything; manually testing this stuff is just not scalable in a shared test environment at the pace modern software is developed.

That said, the TIBCO EMS SDK I’ve been working with (v6.3.0) has all the parts I needed to do this stuff, albeit with some workarounds to avoid needing to run the tests with administrator rights which we’ll look into later.

The only other thorny issue is licensing. You would hope that software product companies would do their utmost to get developers on their side and make it easy for them to build and test their wares, but it is often hard to get clarity around how the product can be used outside of the final production environment. For example trying to find out if the TIBCO service can be run on a developer’s workstation or in a cloud hosted VM solely for the purposes of running some automated tests has been a somewhat arduous task.

This may not be solely the fault of the underlying product company, although the old fashioned licensing agreements often do little to distinguish production from modern development use [1]. No, the real difficulty is finding the right person within the client’s company to talk to about such matters. Unless they are au fait with the role modern automated integration testing plays in the development process you will struggle to convince them your intended use is in the interests of the 3rd party product, not stealing revenue from them.

Okay, time to step down from the soap box and focus on the problems we can solve…

Hosting TIBEMSD as a Windows Service

From an automated testing perspective what we need access to is the TIBEMSD.EXE console application. This provides us with one or more TIBCO message queues that we can host on our local machine. Owning this process means we can create, publish to and delete queues on demand and therefore tightly control the environment.

If you only want to do basic integration testing around the sending and receiving of messages you can configure it as a Windows service and just leave it running in the background. Then your tests can just rely on it always being there like a local database or the file-system. The build machine can be configured this way too.

Unfortunately because it’s a console application and not written to be hosted as a service (at least v6.3 isn’t), you need to use a shim like SRVANY.EXE from the Windows 2003 Resource Kit or something more modern like NSSM. These tools act as an adaptor to the console application so that the Windows SCM can control them.

One thing to be careful of when running TIBEMSD in this way is that it will stick its data files in the CWD (Current Working Directory), which for a service is %SystemRoot%\System32, unless you configure the shim to change it. Putting them in a separate folder makes them a little more obvious and easier to delete when having a clear out [2].
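
By way of illustration, with NSSM the installation might look something like this; the service name and paths are purely illustrative and the AppDirectory setting is what moves those data files somewhere more obvious:

> nssm install TibcoEms C:\Tools\TIBCO\tibemsd.exe -config C:\Tools\TIBCO\localhost.conf
> nssm set TibcoEms AppDirectory C:\Tools\TIBCO\data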

Running TIBEMSD On Demand

Running the TIBCO server as a service makes certain kinds of tests easier to write as you don’t have to worry about starting and stopping it, unless that’s exactly the kind of test you want to write.

I’ve found it’s all too easy when adding new code or during a refactoring to accidentally break the service so that it doesn’t behave as intended when the network goes up and down, especially when you’re trying to handle poisoned messages.

Hence I prefer to have the TIBEMSD.EXE binary included in the source code repository, in a known place, so that it can be started and stopped on demand to verify the connectivity side is working properly. For those classes of integration tests where you just need it to be running you can add it to your fixture-level setup and even keep it running across fixtures to ensure the tests run at an adequate pace.

If, like me, you don’t run as an Administrator all the time (or use elevated command prompts by default) you will find that TIBEMSD doesn’t run out-of-the-box in this way. Fortunately it’s easy to overcome these two issues and run in a LUA (Limited User Account).

Only Bind to the Localhost

One of the problems is that by default the server will try and listen for remote connections from anywhere which means it wants a hole in the firewall for its default port. This of course means you’ll get that firewall popup dialog which is annoying when trying to automate stuff. Whilst you could grant it permission with a one-off NETSH ADVFIREWALL command I prefer components in test mode to not need any special configuration if at all possible.

Windows will allow sockets that only listen for connections from the local host without generating the annoying firewall popup dialog (and this was finally extended to include HTTP too). However we need to tell the TIBCO server to do just that, which we can achieve by creating a trivial configuration file (e.g. localhost.conf) with the following entry:

listen=tcp://127.0.0.1:7222

Now we just need to start it with the -config switch:
> tibemsd.exe -config localhost.conf

Suppressing the Need For Elevation

So far so good but our other problem is that when you start TIBEMSD it wants you to elevate its permissions. I presume this is a legacy thing and there may be some feature that really needs it but so far in my automated tests I haven’t hit it.

There are a number of ways to control elevation for legacy software that doesn’t have a manifest, such as providing an external one, but TIBEMSD has its own embedded manifest and that takes priority. Luckily for us there is a solution in the form of the __COMPAT_LAYER environment variable [3]. Setting this, either through a batch file or within our test code, suppresses the need to elevate the server and it runs happily in the background as a normal user, e.g.

> set __COMPAT_LAYER=RunAsInvoker
> tibemsd.exe -config localhost.conf


Spawning TIBEMSD From Within a Test

Once we know how to run TIBEMSD without it causing any popups we are in a position to do that from within an automated test running as any user (LUA), e.g. a developer or the build machine.
In C#, the language where I have been doing this most recently, we can either hard-code a relative path [4] to where TIBEMSD.EXE resides within the repo, or read it from the test assembly’s app.config file to give us a little more flexibility.

<appSettings>
  <add key="tibemsd.exe"
       value="..\..\tools\TIBCO\tibemsd.exe" />
  <add key="conf_file"
       value="..\..\tools\TIBCO\localhost.conf" />
</appSettings>
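
Reading those settings back is then only a couple of lines, assuming a reference to System.Configuration. The values are still relative paths which need resolving against the test assembly’s folder in the same way as the hard-coded variant shown further below:

var serverPath = ConfigurationManager.AppSettings["tibemsd.exe"];
var configPath = ConfigurationManager.AppSettings["conf_file"];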


We can also add our special .conf file to the same folder and therefore find it in the same way. Whilst we could generate it on-the-fly it never changes so I see little point in doing this extra work.

Something to be wary of if you’re using, say, NUnit to write your integration tests is that it (and ReSharper) can copy the test assemblies to a random location to aid in ensuring your tests have no accidental dependencies. In this instance we do have one, and a rather large one at that, so we need the relative distance between where the test assemblies are built and run (XxxIntTests\bin\Debug) and the TIBEMSD.EXE binary to remain fixed. Hence we need to disable this copying behaviour with the /noshadow switch (or “Tools | Unit Testing | Shadow-copy assemblies being tested” in ReSharper).

Given that we know where our test assembly resides we can use Assembly.GetExecutingAssembly() to create a fully qualified path from the relative one like so:

private static string GetExecutingFolder()
{
  var codebase = Assembly.GetExecutingAssembly()
                         .CodeBase;
  var folder = Path.GetDirectoryName(codebase);
  return new Uri(folder).LocalPath;
}
. . .
var thisFolder = GetExecutingFolder();
var tibcoFolder = @"..\..\tools\TIBCO";
var serverPath = Path.Combine(
            thisFolder, tibcoFolder, "tibemsd.exe");
var configPath = Path.Combine(
            thisFolder, tibcoFolder, "localhost.conf");


Now that we know where the binary and config lives we just need to stop the elevation by setting the right environment variable:

Environment.SetEnvironmentVariable("__COMPAT_LAYER", "RunAsInvoker");

Finally we can start the TIBEMSD.EXE console application in the background (i.e. no distracting console window) using System.Diagnostics.Process:

var process = new System.Diagnostics.Process
{
  StartInfo = new ProcessStartInfo(path, args)
  {
    UseShellExecute = false,
    CreateNoWindow = true,
  }
};
process.Start();


Stopping the daemon involves calling Kill(). There are more graceful ways of remotely stopping a console application which you can try first, but Kill() is always the fall-back approach and of course the TIBCO server has been designed to survive such abuse.
Naturally you can wrap this up with the Dispose pattern so that your test code can be self-contained:

// Arrange
using (RunTibcoServer())
{
  // Act
}

// Assert
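
For completeness, here’s a minimal sketch of what that wrapper might look like, pulling the pieces above together. TibcoServerHandle is my own illustrative name, not something from the EMS SDK, and the path resolution reuses the GetExecutingFolder() helper from earlier:

private static IDisposable RunTibcoServer()
{
  var thisFolder = GetExecutingFolder();
  var tibcoFolder = @"..\..\tools\TIBCO";

  return new TibcoServerHandle(
    Path.Combine(thisFolder, tibcoFolder, "tibemsd.exe"),
    Path.Combine(thisFolder, tibcoFolder, "localhost.conf"));
}

private sealed class TibcoServerHandle : IDisposable
{
  private readonly Process _process;

  public TibcoServerHandle(string serverPath, string configPath)
  {
    // Suppress the elevation prompt before spawning the daemon.
    Environment.SetEnvironmentVariable("__COMPAT_LAYER", "RunAsInvoker");

    _process = new Process
    {
      // Quote configPath if the repo can live in a path with spaces.
      StartInfo = new ProcessStartInfo(serverPath, "-config " + configPath)
      {
        UseShellExecute = false,
        CreateNoWindow = true,
      }
    };
    _process.Start();
  }

  public void Dispose()
  {
    if (!_process.HasExited)
      _process.Kill();

    _process.Dispose();
  }
}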


Or if you want to amortise the cost of starting it across your tests you can use the fixture-level set-up and tear down:

private IDisposable _server;

[FixtureSetUp]
public void GivenMessageQueueIsAvailable()
{
  _server = RunTibcoServer();
}

[FixtureTearDown]
public void StopMessageQueue()
{
  _server?.Dispose();
  _server = null;
}


One final issue to be aware of, and it’s a common one with integration tests like this which start a process on demand, is that the server might still be running unintentionally across test runs. This can happen when you’re debugging a test and you kill the debugger whilst still inside the test body. The solution is to ensure that the server definitely isn’t already running before you spawn it, and that can be done by killing any existing instances of it:

Process.GetProcessesByName("tibemsd")
       .ToList()
       .ForEach(p => p.Kill());


Naturally this is a sledgehammer approach and assumes you aren’t using separate ports to run multiple disparate instances, or anything like that.

Other Gotchas

This gets us over the biggest hurdle, control of the server process, but there are a few other little things worth noting.

Due to the asynchronous nature and potential for residual state I’ve found it’s better to drop and re-create any queues at the start of each test to flush them. I also use the Assume.That construct in the arrangement to make it doubly clear I expect the test to start with empty queues.
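
In code the arrangement ends up along these lines, where RecreateQueue() and CountMessages() are purely hypothetical helpers wrapping whatever admin calls you have available:

// Arrange
RecreateQueue(queueName);

Assume.That(CountMessages(queueName), Is.EqualTo(0));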

Also if you’re writing tests that cover background connect and failover be aware that the TIBCO reconnection logic doesn’t trigger unless you have multiple servers configured. Luckily you can specify the same server twice, e.g.

var connection = "tcp://localhost,tcp://localhost";

If you expect your server to shutdown gracefully, even in the face of having no connection to the queue, you might find that calling Close() on the session and/or connection blocks whilst it’s trying to reconnect (at least in EMS v6.3 it does). This might not be an expected production scenario, but it can hang your tests if something goes awry, hence I’ve used a slightly distasteful workaround where the call to Close() happens on a separate thread with a timeout:

Task.Run(() => _connection.Close()).Wait(1000);

Conclusion

Writing automated integration tests against a middleware product like TIBCO is often an uphill battle that I suspect many don’t have the appetite or patience for. Whilst this post tackles the technical challenges, as they are at least surmountable, the somewhat harder problem of tackling the organisation is sadly still left as an exercise for the reader.


[1] The modern NoSQL database vendors appear to have a much simpler model – use it as much as you like outside production.
[2] If the data files get really large because you leave test messages in them by accident they can cause your machine to really grind after a restart as the service goes through recovery.
[3] How to Run Applications Manifested as Highest Available With a Logon Script Without Elevation for Members of the Administrators Group
[4] A relative path means the repo can then exist anywhere on the developer’s file-system and also means the code and tools are then always self-consistent across revisions.

Tuesday, 1 November 2016

Tautologies in Tests

Imagine you’re writing a test for a simple function like abs(). You would probably write something like this:

[Test]
public void abs_returns_the_magnitude_of_the_value()
{
  Assert.That(Math.Abs(-1), Is.EqualTo(1));
}

It’s a simple function, we can calculate the expected output in our head and just plug the expectation (+1) directly in. But what if I said I’ve seen this kind of thing written:

[Test]
public void abs_returns_the_magnitude_of_the_value()
{
  Assert.That(Math.Abs(-1), Is.EqualTo(Math.Abs(-1)));
}

Of course in real life it’s not nearly as obvious as this, the data is lifted out into variables and there is more distance between the action and the way the expectation is derived:

[Test]
public void abs_returns_the_magnitude_of_the_value()
{
  const int negativeValue = -1;

  var expectedValue = Math.Abs(negativeValue);

  Assert.That(Math.Abs(negativeValue),
              Is.EqualTo(expectedValue));
}

I still doubt anyone would actually write this and a simple function like abs() is not what’s usually under test when this crops up. A more realistic scenario would need much more distance between the test and production code, say, a component-level test:

[Test]
public void processed_message_contains_the_request_time()
{
  var requestTime = new DateTime(. . .);
  var input = BuildTestMessage(requestTime, . . . );
  var expectedTime = Processor.FormatTime(requestTime);

  var output = Processor.Process(input, . . .);

  Assert.That(output.RequestTime,
              Is.EqualTo(expectedTime));
}

What Does the Test Say?

If we mentally inline the derivation of the expected value what the test is saying is “When a message is processed the output contains a request time which is formatted by the processor”. This is essentially a tautology because the test is describing its behaviour in terms of the thing under test; it’s self-reinforcing [1].

Applying the advice from Antoine de Saint-Exupéry [2] about perfection being achieved when there is nothing left to take away, let’s implement FormatTime() like this:

public string FormatTime(DateTime value)
{
  return null;
}

The test will still pass. I know this change is perverse and nobody would ever make that ridiculous a mistake, but the point is that the test is not really doing its job. Also as we rely more heavily on refactoring tools we have to work harder to verify that we have not silently broken a different test that was also inadvertently relying on some aspect of the original behaviour.

Good Duplication

We duplicate work in the test for a reason, as a cross-check that we’ve got it right. This “duplication” is often just performed mentally, e.g. formatting a string, but for a more complex behaviour could be done in code using an alternate algorithm [3]. In fact one of the advantages of a practice like TDD is that you have to work it out beforehand and therefore are not tempted to paste the output from running the test on the basis that you’re sure it’s already correct.
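
In the earlier component-level test that would mean deriving the expected string by hand rather than delegating to the code under test; the exact format shown here is only an assumption for illustration:

var requestTime = new DateTime(2016, 11, 22, 10, 33, 44);

// Derived independently, by inspection of the spec, rather than by
// calling Processor.FormatTime() and trusting it to be right.
var expectedTime = "2016-11-22T10:33:44";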

If we had duplicated the work of deriving the output in the example above my little simplification would not have worked as the test would then have failed. Once again, adopting the TDD practice of starting with a failing test and transitioning to green by putting the right implementation in proves that the test will fail if the implementation changes unexpectedly.

This is a sign to watch out for – if you’re not changing the key part of the implementation to make the test pass you might have overly-coupled the test and production code.

What is the Test Really Saying?

The problem with not being the person that wrote the test in the first place is that it may not be telling you what you think it is. For example the tautology may be there because what I just described is not what the author intended the reader to deduce.

The test name only says that the output will contain the time value, the formatting of that value may well be the responsibility of another unit test somewhere else. This is a component level test after all and so I would need to drill into the tests further to see if that were true. A better approach might be to make the breaking change above and see what actually fails. Essentially I would be doing a manual form of Mutation Testing to verify the test coverage.

Alternatively the author may be trying to avoid creating a brittle test which would fail if the formatting was tweaked and so decided the best way to do that would be to reuse the internal code. The question is whether the format matters or not (is it a published API?), and with no other test to specifically answer that question one has to work on an assumption.

This is a noble cause (not writing brittle tests) but there is a balance between the test telling you about a fault in the code and it just being overly specific and annoying by failing on unimportant changes. Sometimes we just need to work a little harder to express the true specification in looser terms. For example maybe we only need to assert that a constituent part of the date is included, say, the year as that is usually the full 4 digits these days:

Assert.That(output.RequestTime,
            Is.StringContaining("2016"));

If we are careful about the values we choose we can ensure that multiple formats can still conform to a looser contract. For example 10:33:44 on 22/11/2016 contains no individual fields that could naturally be formatted in a way where a simple substring search could give a false positive (e.g. the hour being mistaken for the day of the month).

A Balancing Act

Like everything in software engineering there is a trade-off. Whilst we’d probably prefer to be working with a watertight specification that leaves as little room for ambiguity as possible, we often have details that are pretty loose. When that happens we have to decide how we want to trigger a review of this lack of clarity in the future. If we make the test overly restrictive it runs the risk of becoming brittle, whilst making it overly vague could allow breaking changes to go unnoticed until too late.

Borrowing (apocryphally) from Einstein we should strive to make our tests as precise as possible, but not overly precise. In the process we need to ensure we do not accidentally reuse production code in the test such that we find ourselves defining the behaviour of it, with itself.

 

[1] I’ve looked at the self-reinforcing nature of unit tests before in “Man Cannot Live by Unit Testing Alone”.

[2] See “My Favourite Quotes” for some of the other programming related quotes I find particularly inspiring.

[3] Often one that is slower as correctness generally takes centre stage over performance.