Friday 25 April 2014

Leaky Abstractions: Querying a TIBCO Queue’s Pending Message Count

My team recently had a stark reminder about the Law of Leaky Abstractions. I wasn’t directly involved but feel compelled to pass the details on given the time we haemorrhaged [1] tracking it down...

Pending Messages

If you google how to find the number of messages that are pending in a TIBCO queue you’ll probably see a snippet of code like this:-

var admin = new Admin(hostname, login, password);
var queueInfo = admin.GetQueue(queueName);
var count = queueInfo.PendingMessageCount;

In fact I’m sure that’s how we came by this code ourselves, as we didn’t have any formal documentation to work from at the very start and this was only going into some test code. In case you’re curious, we were writing some acceptance tests and needed to wait for the queue’s pending message count to drop to zero as a sign that the messages had been processed and we could safely perform our other asserts to verify the behaviour.

Queue Monitoring

So far so good. The test code had been working flawlessly for months but then we started writing a new component. This time we needed to batch the message handling up, so instead of processing the stream as fast as possible we needed to wait for “a large enough batch”, then pull the messages and aggregate them. We could have buffered the messages in our own process but it felt safer to leave them in the durable storage for as long as possible as there was no two-phase commit going on.

The new acceptance tests were written in a similar style to before but the production code didn’t seem to be working properly - the pending message count always seemed to return 0. The pending message count check was just one of a number of triggers that could start the batch processing and so the code lived in a class something like this:-

public class QueueCountTrigger : IBatchTrigger
{
  public QueueCountTrigger(...)
  {
    _admin = new Admin(hostname, login, password);
    _queueInfo = _admin.GetQueue(queueName);
    _triggerLevel = triggerLevel;
  }

  public bool IsTriggered()
  {
    return (_queueInfo.PendingMessageCount >=
            _triggerLevel);
  }

  private Admin _admin;
  private QueueInfo _queueInfo;
  private int _triggerLevel;
}

Granted, polling is less than ideal, but it would serve our purpose initially. This behaviour was most curious because, as far as we could see, similar code had been working fine in the old acceptance tests for months.

The Leaky Abstraction

Eventually one of the team started poking around under the hood with a disassembler and everything began to fall into place. The QueueInfo object returned from the Admin type was just a snapshot of the queue. When subsequent attempts were made to query the PendingMessageCount property it was just returning a cached value. Because the service always started first, after the queue had been purged, it never saw the count change.
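
To make the difference concrete, here is a minimal sketch that runs without a broker. IQueueAdmin, FakeQueueAdmin, StaleTrigger and FreshTrigger are all hypothetical names invented for illustration and are not part of the TIBCO API; the fake simply stands in for a call that returns a point-in-time copy of the count.

```csharp
using System;

// Hypothetical stand-in for the admin connection; NOT part of the TIBCO API.
public interface IQueueAdmin
{
    // Returns a point-in-time copy of the pending message count.
    long FetchPendingMessageCount(string queueName);
}

// In-memory fake so the sketch can run without a broker.
public sealed class FakeQueueAdmin : IQueueAdmin
{
    public long Pending { get; set; }

    public long FetchPendingMessageCount(string queueName) => Pending;
}

// Mirrors the failing shape: the count is captured once, in the constructor.
public sealed class StaleTrigger
{
    private readonly long _countAtConstruction;
    private readonly long _triggerLevel;

    public StaleTrigger(IQueueAdmin admin, string queueName, long triggerLevel)
    {
        _countAtConstruction = admin.FetchPendingMessageCount(queueName);
        _triggerLevel = triggerLevel;
    }

    // Always compares the value seen at construction time.
    public bool IsTriggered() => _countAtConstruction >= _triggerLevel;
}

// Asks for a fresh snapshot on every check.
public sealed class FreshTrigger
{
    private readonly IQueueAdmin _admin;
    private readonly string _queueName;
    private readonly long _triggerLevel;

    public FreshTrigger(IQueueAdmin admin, string queueName, long triggerLevel)
    {
        _admin = admin;
        _queueName = queueName;
        _triggerLevel = triggerLevel;
    }

    public bool IsTriggered() =>
        _admin.FetchPendingMessageCount(_queueName) >= _triggerLevel;
}
```

If both triggers are built while the fake reports zero and the count is then raised above the trigger level, only FreshTrigger fires. With the real Admin type the equivalent change would presumably be to call GetQueue inside IsTriggered rather than in the constructor, at the cost of one admin round trip per poll.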

Looking at the TIBCO documentation for the classes (and method) in question you’d struggle to find anything that suggests you can’t hold on to the QueueInfo object and get real-time updates of the various queue attributes. Perhaps the “Info” suffix is supposed to be a clue? In retrospect perhaps QueueSnapshot would be a better name? Or maybe it’s documented clearly elsewhere in some kind of design rationale that you’re supposed to read up front?

I can’t remember if it was Steve Maguire in Writing Solid Code or Steve McConnell in Code Complete, but I’m sure one of them suggested that there is no such thing as a black box. Whilst the documentation may describe the interface there are often implementation details, such as performance or caching effects, that are left unspecified and eventually you’ll need to open the box to find out what’s really going on when it doesn’t behave as you would like in your scenario.

Back to the Tests

Looking more closely at the original code for the acceptance tests, it turned out to make a sub-optimal call which opened a new connection every time the message count was requested (just as the example code at the top would do):-

public int GetPendingMessageCount(...)
{
  var admin = new Admin(hostname, login, password);
  var queueInfo = admin.GetQueue(queueName);

  return queueInfo.PendingMessageCount;
}

This was later changed to an inline call, but one that re-fetched the QueueInfo snapshot at the end of every loop iteration. Sadly the author of this change is no longer around, so it’s hard to say whether this was done after going through the same discovery exercise as above, by basing it on a better example from somewhere else, through prior knowledge of the problem, or by just out-and-out luck.

public void WaitForQueueToEmpty(...)
{
  var admin = new Admin(hostname, login, password);
  var queueInfo = admin.GetQueue(queueName);

  while (queueInfo.PendingMessageCount > 0)
  {
    // Other timeout handling code
    . . .
    queueInfo = admin.GetQueue(queueName);
  }
}
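
The timeout handling in that loop is elided, so the following is only a guess at the general shape, not the code we actually had. It is a small sketch of a polling wait that gives up after a deadline; QueueWaiter, fetchPendingCount, timeout and pollInterval are hypothetical names, and the callback is assumed to fetch a fresh snapshot every time it is invoked.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class QueueWaiter
{
    // fetchPendingCount is a hypothetical callback. Against the real API it
    // would presumably wrap admin.GetQueue(queueName).PendingMessageCount so
    // that every call takes a new snapshot.
    public static bool WaitForQueueToEmpty(
        Func<long> fetchPendingCount, TimeSpan timeout, TimeSpan pollInterval)
    {
        var stopwatch = Stopwatch.StartNew();

        while (fetchPendingCount() > 0)
        {
            if (stopwatch.Elapsed >= timeout)
                return false; // gave up: messages were still pending

            Thread.Sleep(pollInterval);
        }

        return true; // the queue drained within the timeout
    }
}
```

Passing the fetch in as a delegate keeps the re-fetch visible at the call site, which makes it harder to cache a QueueInfo by accident, and it lets the loop be exercised in a test without a broker.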

 

[1] Exaggeration used for dramatic effect.
