Monday, 20 October 2014

What’s the Price of Confidence?

I recently had one of those conversations about testing that comes up every now and then. It usually starts with someone in the team, probably the project manager, conveying deep concerns about a particular change or new feature, getting twitchy about whether it’s been tested well enough or not. When the concern comes from the management side, that fear can be projected onto the team, but in a way that attempts to use the fear itself as a tool to somehow “magically” ensure there are no bugs or performance problems (e.g. by implying that lots of late nights running manual tests will do the trick). This is all in contrast to the famous quote from Edsger Dijkstra:

Program testing can be used to show the presence of bugs, but never to show their absence

Financial Loss

The first time this situation came up I was working at an investment bank, and the conversation meandered around until a manager started to get anxious and suggested that if we screwed up it could cost the company upwards of 10-20 million quid. Okay, so that got my colleague’s attention and mine, but neither of us could see anything obvious we could do that would ensure we were “100%” bug free. We were already doing unit testing and some informal code reviewing, and we were delivering as often as we could to our development system-test environment, which ran in lock-step with production but on a reduced data set and at a lower calculation resolution.

In fact the crux of the argument was really that our UAT environment was woefully underpowered - it had been promoted to become the production environment on the first release. If we had parity with production we could also do the kind of regression testing that would get us pretty close to 100% confidence that nothing, either functionally or performance-wise, was likely to appear after releasing.

My argument was that, knowing what we do from Dijkstra, if the company stands to lose so much money from us making a mistake, then surely the risk is worth the investment by the company to help us minimise the chances of a problem slipping through; us being human beings and all (even if we are experienced ones). Bear in mind that this was an investment bank, the team was made up of 6 skilled contractors, and we were only asking for a handful of beefy app servers and a few dozen blades to go in the compute grid. I posited that the cost of the hardware, which was likely to be far less than 100K, was orders of magnitude lower than the cost of failure, and amounted to only a month or two of what the entire team cost. That outlay did not seem unrealistic to me given all the other project costs.
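To make the scale of that argument concrete, here is the back-of-envelope arithmetic behind it. The figures are only the rough ones quoted above (hardware well under 100K, a potential loss of 10-20 million); the per-contractor monthly rate is a guess for illustration, not a real number from the project.

```python
# Back-of-envelope cost comparison from the argument above.
# All figures are rough; the contractor rate is an assumed number.

hardware_cost = 100_000           # upper bound on the kit being asked for
potential_loss = 10_000_000       # lower end of the quoted 10-20 million
team_cost_per_month = 6 * 10_000  # 6 contractors; 10K/month each is a guess

# The potential loss dwarfs the hardware cost by two orders of magnitude.
loss_ratio = potential_loss / hardware_cost
print(loss_ratio)                               # → 100.0

# The hardware amounts to roughly "a month or two" of team cost.
months_of_team_cost = hardware_cost / team_cost_per_month
print(round(months_of_team_cost, 1))            # → 1.7
```

Even if the assumed rate is off by a factor of two, the conclusion - that the hardware is a rounding error next to the cost of failure - does not change.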

Loss of Reputation

The more recent conversation was once again about parity between the pre-production and production environments, but this time about the database. The same “fear” was there again - that the behaviour of a new maintenance service might screw up - but this time the cost was more likely to be expressed directly as a tarnished reputation. That could of course lead to the loss of future business, both from the affected customers and from anyone else unhappy about a similar prospect happening to them, so indirectly it could still lead to some financial loss.

My response was once again a sense of dismay that we could not just get the database restored to the test environment and get on with it. I could understand if the data was sensitive, i.e. real customer data needing masking, or if it was huge (hundreds of TBs, not a couple of hundred GBs), although that would give me more cause for concern, not less. But it wasn’t, and I don’t know why this couldn’t just be done, either as a one-off, which is possibly more valid in this scenario, or, better yet, established as a practice going forward.

The cultural difference at play here is that the databases appear to be closely guarded, and so there are more hoops to jump through to gain access to both a server and the backups.


Maybe I’m being naive here, but I thought one of the benefits of all the effort going into cloud computing was that provisioning servers, at least for the run-of-the-mill roles, becomes trivial. I accept that for the bigger server roles, such as databases, the effort and cost may be higher, but given how prone they are to becoming the bottleneck we should put more effort into ensuring they are made available whenever the chances of a performance problem showing up are heightened. At the very least it must be possible to temporarily tailor any test environment so that it can be used to adequately test the changes that are a cause for concern.

Continuous Delivery

This all sounds decidedly old school though, i.e. doing development followed by a big bang release where you can focus on some final testing. In his talk at Agile on the Beach 2014 Steve Smith described Release Testing as Risk Management Theatre. A more modern approach is to focus on delivering “little and often”, which means you’re constantly pushing changes through your pre-production environment; so it has to be agile too. If you cannot, or will not, invest in what it takes to continuously scale your test environment(s) to meet those demands, then I find it difficult to see how you will ever gain the level of confidence that appears to be sought.

One thing Steve simplified in his talk [1] was the way features are pushed through the pipeline. In his model features go through in their entirety, and only their entirety, which is not necessarily the case when using practices such as Feature Toggles, which force integration to happen as early as possible. A side-effect of this technique is that partially finished features can go out into production sooner, which is potentially desirable for pure refactorings so that you begin to reap the return on investment (ROI) sooner. But at the same time you need to be careful that the refactoring does not have some adverse impact on performance. Consequently Continuous Delivery comes with its own set of risks, but the general consensus is that these are manageable and worth taking to establish an earlier ROI - as long as you are geared up for it.
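For anyone unfamiliar with the pattern, the essence of a Feature Toggle can be sketched in a few lines. This is a minimal illustration only - the toggle name, the pricing functions, and the dictionary-based configuration are all hypothetical, and real toggle frameworks usually read their state from configuration rather than code.

```python
# Minimal sketch of the Feature Toggle pattern; the toggle name and the
# pricing functions are hypothetical, purely for illustration.

FEATURE_TOGGLES = {
    # Partially finished work: integrated into the codebase early, but off.
    "new_pricing_engine": False,
}

def is_enabled(feature):
    """Look up a toggle; unknown features default to off."""
    return FEATURE_TOGGLES.get(feature, False)

def price_trade_v1(trade):
    return trade["qty"] * trade["price"]  # existing, battle-tested path

def price_trade_v2(trade):
    return trade["qty"] * trade["price"]  # new path; identical here for brevity

def price_trade(trade):
    # The toggle lets the new code ship to production while still dormant,
    # so integration happens early without exposing unfinished behaviour.
    if is_enabled("new_pricing_engine"):
        return price_trade_v2(trade)
    return price_trade_v1(trade)

print(price_trade({"qty": 10, "price": 2.5}))  # → 25.0
```

The performance caveat in the text shows up here too: the toggle check sits on the pricing path, so in a latency-sensitive system you would want it evaluated once at startup rather than per call.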

One of the questions Steve asked in his talk was “how long does it take to get a code change into production?” [2]. Personally I like to think there are really two questions here: “how long does it take to reap the ROI on a new feature or change?” and “how long does it take to roll a fix out?”. A factor in both is how confident you are that your development process delivers quality code and does adequate (automated) testing to root out any unintended side-effects. This confidence comes at a price that includes direct costs, such as infrastructure & tooling, but also indirect costs, such as the time spent by the team writing & running tests and reviewing the design & code. If you decide to save money on infrastructure and tooling, or work in an environment that makes it difficult to get what you need, how are you going to compensate for that? And will it cost you more in time and energy in the long run?


[1] I asked him about this after his talk and he agreed that it was a simplified model used to keep the underlying message clear and simple.

[2] This question more famously comes from “Lean Software Development: An Agile Toolkit” by Mary and Tom Poppendieck.

1 comment:

  1. Hi Chris

    You're entirely right that I simplify the implications of releasing more frequently. As with Trunk Based Development and Continuous Integration, Continuous Delivery requires more of an investment in the codebase if it is to be always releasable. Increased complexity due to Feature Toggles, Branch By Abstraction is certainly possible... and choosing where to put the toggle can indeed be tricky if you are writing a high performance system.

    With Dual Value Streams, the aspiration is for the codebase to always be releasable and to have the same answer for features and fixes i.e. "as quickly as you like". For the majority of organisations that is a long way off.