Friday 25 April 2014

Leaky Abstractions: Querying a TIBCO Queue’s Pending Message Count

My team recently had a stark reminder of the Law of Leaky Abstractions. I wasn’t directly involved but feel compelled to pass the details on given the time we haemorrhaged [1] tracking it down...

Pending Messages

If you google how to find the number of messages that are pending in a TIBCO queue you’ll probably see a snippet of code like this:-

var admin = new Admin(hostname, login, password);
var queueInfo = admin.GetQueue(queueName);
var count = queueInfo.PendingMessageCount;

In fact I’m sure that’s how we came by this code ourselves, as we had no formal documentation to work from at the very start and the snippet was only going into some test code. In case you’re curious, we were writing some acceptance tests, and in the test code we needed to wait for the queue’s pending message count to drop to zero as a sign that the messages had been processed and we could safely perform our other asserts to verify the behaviour.

Queue Monitoring

So far so good. The test code had been working flawlessly for months, but then we started writing a new component. This time we needed to batch up the message handling, so instead of processing the stream as fast as possible we had to wait for “a large enough batch”, then pull the messages and aggregate them. We could have buffered the messages in our own process, but it felt safer to leave them in durable storage for as long as possible as there was no two-phase commit going on.

The new acceptance tests were written in a similar style to before, but the production code didn’t seem to be working properly - the pending message count always came back as 0. The pending message count check was just one of a number of triggers that could start the batch processing and so the code lived in a class something like this:-

public class QueueCountTrigger : IBatchTrigger
{
  public QueueCountTrigger(...)
  {
    _admin = new Admin(hostname, login, password);
    _queueInfo = _admin.GetQueue(queueName);
    _triggerLevel = triggerLevel;
  }

  public bool IsTriggered()
  {
    return (_queueInfo.PendingMessageCount >= _triggerLevel);
  }

  private Admin _admin;
  private QueueInfo _queueInfo;
  private int _triggerLevel;
}

Granted, polling is less than ideal but it would serve our purpose initially. The behaviour was most curious though because, as far as we could see, similar code had been working fine in the old acceptance tests for months.

The Leaky Abstraction

Eventually one of the team started poking around under the hood with a disassembler and everything began to fall into place. The QueueInfo object returned from the Admin type was just a snapshot of the queue. Subsequent queries of the PendingMessageCount property simply returned the cached value. Because the service always started first, after the queue had been purged, it never saw the count change.
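Armed with that knowledge, the obvious change is to ask the Admin object for a fresh QueueInfo on every check rather than caching the first one. A minimal sketch of that shape, assuming the same Admin / GetQueue API used above (and with the constructor arguments spelled out purely for illustration), would be something like this:-

public class QueueCountTrigger : IBatchTrigger
{
  public QueueCountTrigger(string hostname, string login, string password,
                           string queueName, int triggerLevel)
  {
    _admin = new Admin(hostname, login, password);
    _queueName = queueName;
    _triggerLevel = triggerLevel;
  }

  public bool IsTriggered()
  {
    // Fetch a fresh snapshot each time; holding on to the previous
    // QueueInfo would only ever return the same cached count.
    var queueInfo = _admin.GetQueue(_queueName);

    return (queueInfo.PendingMessageCount >= _triggerLevel);
  }

  private readonly Admin _admin;
  private readonly string _queueName;
  private readonly int _triggerLevel;
}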

Looking at the TIBCO documentation for the classes (and method) in question you’d struggle to find anything that suggests you can’t hold on to the QueueInfo object and get real-time updates of the various queue attributes. Perhaps the “Info” suffix is supposed to be a clue? In retrospect perhaps QueueSnapshot would be a better name? Or maybe it’s documented clearly elsewhere in some kind of design rationale that you’re supposed to read up front?

I can’t remember if it was Steve Maguire in Writing Solid Code or Steve McConnell in Code Complete, but I’m sure one of them suggested that there is no such thing as a black box. Whilst the documentation may describe the interface, there are often implementation details, such as performance or caching effects, that are left unspecified, and eventually you’ll need to open the box to find out what’s really going on when it doesn’t behave as you’d like in your scenario.

Back to the Tests

Looking more closely at the original code for the acceptance tests, we saw it made a sub-optimal call that opened a new connection every time the message count was requested (just as the example code at the top does):-

public int GetPendingMessageCount(...)
{
  var admin = new Admin(hostname, login, password);
  var queueInfo = admin.GetQueue(queueName);

  return queueInfo.PendingMessageCount;
}

This was later changed to an inline call, but one that re-fetched the QueueInfo snapshot at the end of every loop iteration. Sadly the author of this change is no longer around, so it’s hard to say whether it was done after going through the same discovery exercise as above, by basing it on a better example from somewhere else, from prior knowledge of the problem, or just out-and-out luck.

public void WaitForQueueToEmpty(...)
{
  var admin = new Admin(hostname, login, password);
  var queueInfo = admin.GetQueue(queueName);

  while (queueInfo.PendingMessageCount > 0)
  {
    // Other timeout handling code
    . . .
    queueInfo = admin.GetQueue(queueName);
  }
}

 

[1] Exaggeration used for dramatic effect.

Thursday 24 April 2014

What’s the Right Size for a Commit?


At the end of my ACCU 2014 Conference talk this year on Version Control - Patterns & Practices an interesting question came up:-

What’s the right size for a commit?

My answer, which I’ll admit was mostly based on instinct and not through any prior reasoning, was this:-

Something that can be cherry picked.

So that was the short answer; this post is the longer one that explains how I arrived at that conclusion…

Single Responsibility Principle

I’ve written before (What’s the Check-In Frequency, Kenneth?) about the effects of fine-grained commits on an integration branch; suffice to say that they cause a lot of noise when you come back to the history to do a spot of software archaeology. In my opinion, if you want to check-in after you get every unit test passing then you should use some kind of private branch. The converse would be to bundle up so much into a single commit that it appears as the proverbial Big Ball of Mud.

In essence the advice I gave was nothing more than the application of the age old Single Responsibility Principle as applied to the problem of committing changes to a version control system.

Cherry Picking

The practice of cherry picking changes from one branch to another (often from development to release to squeeze one more feature in) has a bad reputation. I suspect the reason for this is largely down to the breakages and stress caused by incorrectly trying to divorce a single “feature” from the other changes made at the same time, or by not merging all the commits that make up the “logical set” for that feature.

Don’t get me wrong, cherry picking is an ugly business that should be avoided if at all possible, but it has its uses and so my approach has always been to ensure that my commits create small, consistent units of change. Of course I break the build too sometimes, and consequently I might have the odd “stray” commit that fixes up the build, but by-and-large each commit should stand alone and add some “value”.

Feature Branches

I very rarely use feature branches because I dislike all the constant merging to and fro, but when I do use one the merge at the end usually becomes a single commit to the integration branch. The exception is when I’ve also made a few other changes as a by-product. Whilst the main goal is to implement a specific feature (or fix a bug, etc.), when you’re working on a legacy codebase that lacks any automated test coverage it can save a fair bit of time if the cost of testing can be amortised across a number of changes. This means that during the final merge I initially need to cherry pick a number of other fixes as individual commits, then merge the remainder (the core feature) as a single commit.

In the integration branch this translates to a series of commits, where each one corresponds to a single feature; it just so happens that they all came from the same source branch. Hence my view is that as long as the “observable outcome” is the same - small, feature-focused commits on the integration branch - it doesn’t really matter too much how they got there. Granted, it makes reading the private branch a little more awkward in the future, but I feel the saving in development time is often worth the cost.

Feature Toggles

My preference has generally been for continuous integration through the use of feature toggles. This makes integration easier but cherry-picking harder, because the entire feature might be spaced out across a number of discrete commits. I often break a feature down into many smaller tasks that can be delivered to production as soon as possible. This means I generally start with any refactoring because, by definition, I cannot have made any observable changes. Next up is the new code, which, as long as I don’t provide a means to access it, can also be deployed to production as a silent passenger. That just leaves the changed or deleted code, which will start to have some noticeable impact, at least when it’s toggled on. Each one of these steps may be one or more commits depending on how big the feature is, but each one should still be a cohesive unit of work.
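The toggle itself needn’t be anything clever either; a boolean read from configuration that guards the entry point to the new code is usually enough. Something like the sketch below, where the type and setting names are purely illustrative:-

public class MessageProcessor
{
  public MessageProcessor(Settings settings)
  {
    // The new behaviour stays dormant in production until the toggle is
    // flipped on; "Settings" is a stand-in for whatever configuration
    // mechanism is already in place.
    _batchingEnabled = settings.IsEnabled("BatchProcessing");
  }

  public void Process(IEnumerable<Message> messages)
  {
    if (_batchingEnabled)
      ProcessAsBatch(messages);      // the new code - the silent passenger
    else
      ProcessIndividually(messages); // the existing behaviour
  }

  // . . . ProcessAsBatch / ProcessIndividually elided . . .

  private readonly bool _batchingEnabled;
}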

From a cherry-picking perspective what we need to know is all the individual commits that make up the entire feature. This is where I’ve found the practice of tagging each commit with the “change request number” invaluable. If you’re using something like JIRA then each feature will likely be represented in that system by a unique id, e.g. PROJ-123. Even Trello cards have a unique number that can be used. I usually prefix (rather than append) each commit message with the change code as it makes them easy to see when scanning the VCS change log.
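For example, with a JIRA id of PROJ-123 the history might contain entries like “PROJ-123: Refactor pricing lookup” and “PROJ-123: Add bulk pricing endpoint” (the messages are made up, the prefix is the point). Rounding up the whole feature later is then just a search of the log, e.g. with Git:-

# List every commit for the feature, assuming each message was
# prefixed with the change request number.
git log --oneline --grep="PROJ-123"

# Then replay the relevant commits onto the release branch.
git cherry-pick <first-sha> <second-sha>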

Theory and Practice

It might sound as though cherry-picking features is a regular affair because I pay more than lip-service to them. That’s really not the case; I avoid them like everyone else, even when they appear to be easy. Unless a change is relatively trivial, e.g. a simple bug fix, it’s quite possible that some refactoring on the integration branch will have muddied the waters enough to make skipping the cherry-pick a no-brainer anyway.

It’s hard to quantify how much value there would be in any arbitrary commit. If the entire change is a one-line bug fix it’s easier to determine how big the commit should be. When it’s a new feature that will involve the full suite of modifications - refactoring, adding, removing, updating - it can be harder to see how to break it down into smaller units. This is where I find the notion of software archaeology comes in handy because I project forward 12 months to my future self and look back at the commit and ask myself whether it makes sense.

 

Photo by Thaddaeus Frogley (@codemonkey_uk)

Keeping the Faith Under Pressure

I walked into my client’s offices one day and, almost before I had got my coat off, I was greeted by one of my client’s architects and told to meet them in a room once I’d got myself composed. Luckily I’d bought my coffee on the way in. In the meeting was my current project’s manager and an enterprise-level architect, and he started by saying “Ignore what you’re doing at the moment, we have something far more important for you”...

They had a performance problem with one of their new services that was waiting to go into production, but the problem was proving stubborn to diagnose, so they wanted to know if we could build a simple tactical workaround that would help them move the project forward in the meantime. The service only needed to do a simple lookup and the data set was around 3 GB. Given that the performance problem was in the infrastructure, we came up with a simple in-memory solution that used a binary search, so we should easily be able to cope with the necessary 10 ms response time.
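The heart of the service really was that simple: load the reference data into memory at start-up, sort it, and answer each lookup with a binary search. Purely as an illustration of that shape (the string key/value types are made up and this is not the actual production code):-

using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative only: an in-memory lookup table built once at start-up
// and then queried with a binary search over the sorted keys.
public class LookupTable
{
  public LookupTable(IEnumerable<KeyValuePair<string, string>> data)
  {
    var sorted = data.OrderBy(pair => pair.Key, StringComparer.Ordinal)
                     .ToArray();

    _keys = sorted.Select(pair => pair.Key).ToArray();
    _values = sorted.Select(pair => pair.Value).ToArray();
  }

  public string Find(string key)
  {
    // Array.BinarySearch returns a negative index when the key is absent.
    var index = Array.BinarySearch(_keys, key, StringComparer.Ordinal);

    return (index >= 0) ? _values[index] : null;
  }

  private readonly string[] _keys;
  private readonly string[] _values;
}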

All through the meeting the time-sensitive nature of the delivery of this tactical solution was a continuous theme. Basically we had 2 weeks to get it into production, which, given the speed of enterprise-scale software projects, felt really tight, even for something apparently “so simple”. I began to wonder if the mention of the short time-scale was a subtle hint to get us to drop (or at least reduce) some of our practices, like TDD or refactoring. After all it had taken us two weeks on our current project just to get the continuous integration pipeline up and running! Also the service implementation was far from rocket science as there was virtually no “business logic” to speak of.

Fortunately the answer was a firm “no”. My client also believed that the fastest way to deliver this project was to do it properly, just as we had already been doing on the other one. Past experience suggested that when time is of the essence people sometimes regress to old habits in the belief that they can cut some corners and save time. Tactical solutions, which we all know have a habit of living way longer than anticipated, are particularly susceptible to this kind of thinking.

After 3 days (including the day of the initial meeting) we had the core build pipeline and web API up and running so that we could demo it and, more importantly, so that the downstream team could start coding against it. And we had a decent set of unit and acceptance tests in place, written in a test-first manner. By the end of the week we were good to go as far as the web API aspect was concerned and a few days after that we had written the other tools and scripts we needed to handle the upstream data.

Sadly we didn’t quite make the deadline. But that wasn’t down to the software being late; it was all the other enterprise-y problems, such as getting the staging and production environments built and getting the integration testing done between the dependent services. This had a knock-on effect on the deployment scripts as the infrastructure was reconfigured late in the day.

It’s great that we kept the faith and delivered a time-critical project using all the same practices we used for more lengthy projects, as it showed that techniques like TDD and refactoring do not slow you down. Of course, what was different this time around, as opposed to the start of our main project, was that the team knew each other well and we managed to leverage a lot of the knowledge and tools we had gained earlier, such as a shared library of common infrastructure code. We also had logins and permissions that we could reuse rather than having to procure new ones, at least for the development side.

Where we did slow down was in the final deployment and testing in the staging environment. But this should be no surprise because none of us had any experience with this aspect of the development process with that particular client. Consequently many of the “non-functional” requirements such as deployment, logging and performance monitoring needed heavy tweaking to fit in with their operations model. We had prepared as much as we could from experience but every enterprise is slightly different.

It feels silly now to have even doubted my client’s loyalty to our development process. They may still have many other classic enterprise traits but at least a lack of faith in agility is not one of them.

Wednesday 23 April 2014

Terminology Overdose

I’ve never been a fan of dependency injection frameworks, but what bugs me most is the use of the term “injection”. It grates on me when I hear programmers talking about using “constructor injection” or “setter injection”. Sorry, but what you’re talking about is “passing parameters”, or if you prefer, “passing arguments”. It’s bad enough that we already have two terms for the things we provide to a function to vary how it behaves; why do we need a third?

Just think about it for a moment: what exactly does “injection” mean? It’s a finely-controlled way of forcing a substance into some other material. Does passing a parameter sound like anything out of the ordinary? No. Now, if the framework used reflection to set a private member variable because there was no natural way to do it, then I’d be happier with the term - but only for that use case. Maybe that’s how these frameworks started out, but it sure isn’t what seems to be going on these days.

So let’s be clear: designing a class that depends on another which is typed via an interface rather than a concrete class is nothing new. It’s just the age old advice about Programming to an Interface, Not an Implementation, taken quite literally. The fact that you pass the dependent object via the constructor [1] is just Programming 101; it doesn’t need any sort of fanfare.
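To labour the point, here is everything that “constructor injection” amounts to in practice (the types are invented for the example):-

// The dependency is typed via an interface...
public interface IPriceFeed
{
  decimal GetPrice(string symbol);
}

// ...and the class that needs it simply takes it as a constructor argument.
public class PricingService
{
  public PricingService(IPriceFeed feed)
  {
    _feed = feed;
  }

  public decimal Quote(string symbol)
  {
    return _feed.GetPrice(symbol);
  }

  private readonly IPriceFeed _feed;
}

// A concrete implementation, here a trivial one for the sake of the example.
public class FixedPriceFeed : IPriceFeed
{
  public decimal GetPrice(string symbol)
  {
    return 42m;
  }
}

Construction at the call site is then just new PricingService(new FixedPriceFeed()) - a plain old argument, no syringe required.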

[1] Although you could use a setter, don’t.