Thursday, 26 November 2015

The Cost of Not Starting

The idea of emergent design is uncomfortable to those at the top and it’s pretty easy to see why. Whilst there are no real physical barriers to overcome if the software architecture goes astray, there is the potential for some significant costs if the rework is extensive (think change of platform / paradigm / language). In times gone by there was a desire to analyse the problem to death in an attempt to try and ensure the “correct” design choices were made early and would therefore (theoretically) minimise rework.

In a modern agile world however we see the fallacy in that thinking and are beginning to rely more on emergent design as we make better provision for adapting to change. It’s relatively easy to see how this works for the small-scale stuff, but ultimately there have to be some up-front architectural choices that will shape the future of the system. Trying to minimise the number of up-front choices to remain lean, whilst also deciding enough to make progress and learn more, is a balancing act. But the cost of not actually starting the work and even beginning to learn can definitely be dear if the desire is to move to a newer platform.

A Chance to Learn

I recently had some minor involvement in a project that was to build a new, simple, lookup-style data service. Whilst the organisation had built some of these in the past they had been on a much older platform, and given the loose timescale at the project’s inception it was felt to be a great opportunity to try and build this one on a newer, more sustainable platform.

Essentially the only major decision to make up-front was about the platform itself. There had already been some inroads into both Java and .Net, with the former already being used to provide more modern service endpoints. So it seemed eminently sensible to go ahead and use it again to build a more SOA style service where it owns the data too. (Up to that point the system was a monolith where data was shared through the database.)

Due to there being an existing team familiar with the platform they already knew plenty about how to build Java-based services, so there was little risk there, aside from perhaps choosing a RESTful approach over SOAP. Where there would be an opportunity to learn was in the data storage area as a document-oriented database seemed like a good fit and it was something the department hadn’t used before.

Also as a result of the adapter-style nature of the work the team had done before they had never developed a truly “independent” service, so they had a great opportunity to try building something more original in an ATDD/BDD manner. And then there was the chance to make it independently serviceable too which would give them an initial data point on moving away from a tightly-coupled monolithic architecture to something looser [1].

Just Enough Design

In my mind there was absolutely no reason why the project could not be started based on the knowledge and decisions already made up to that point. The basic platform had been chosen and therefore the delivery team was known and so it would be possible to begin scheduling the work.

The choice of protocol and database were yet to be finalised, but in both cases they would be relying heavily on integrating a 3rd party product or library – there was little they had to write themselves. As such the risk was just in evaluating and choosing an approach, and they already had experience with SOAP and their existing database to fall back on if things didn’t pan out.

Admittedly the protocol was a choice that would affect the consumer, but the service was a simple data access affair and therefore there was very little complexity at this stage. The database was going to be purely an implementation detail and therefore any change in direction here would be of no interest to the consumers.

The only other design work might be around what is needed to support the various types of automated tests, such as test APIs. This would all just come out “in the wash”.

Deferring the Decision to Start

The main reason for choosing the project as a point of learning was its simplicity. Pretty much everything about it allowed for work in isolation (i.e. minimal integration) so that directions could be explored without fear of breaking the existing system, or development process.

What happened was that some details surrounding the data format of the 3rd party service were still up in the air. In a tightly-coupled system where the data is assumed to be handled almost verbatim, not knowing this kind of detail has the potential to cause rework and so it is seen as preferable to defer any decision it affects. But in a loosely-coupled system where we decide on a formal service contract between the consumer and producer that is independent of the underlying implementation [2], we have less reason to defer any decisions as the impact will be minimal.
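
As a rough illustration of that kind of contract (a sketch only, not code from the project), the consumer below depends on a small, stable result type while the producer keeps the external format to itself. Every name here, including `LookupResult`, `ThirdPartyBackedService` and the `DESC_TXT` field, is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Protocol


@dataclass(frozen=True)
class LookupResult:
    """The formal contract the consumer sees (hypothetical fields)."""
    key: str
    description: str


class LookupService(Protocol):
    """What any producer must offer, whatever sits behind it."""

    def lookup(self, key: str) -> LookupResult: ...


class ThirdPartyBackedService:
    """One producer; the only place that knows the external data format."""

    def __init__(self, raw_records: Dict[str, dict]):
        self._raw = raw_records

    def lookup(self, key: str) -> LookupResult:
        raw = self._raw[key]
        # Assumed external layout. If the 3rd party format changes,
        # only this mapping changes, not the contract.
        return LookupResult(key=key, description=raw["DESC_TXT"].strip())


def describe(service: LookupService, key: str) -> str:
    """Consumer code: written against the contract alone."""
    return service.lookup(key).description
```

With this shape, an undecided external format only delays the few lines of mapping inside the producer; the contract and the consumer can be built and tested in the meantime.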

As a consequence of delaying any actual development on the service, the project reached a point well past the Last Responsible Moment and as such a decision was implicitly made for it. The looming deadline meant that there was neither the time nor the resources to confidently deliver the project on time and so it was decided that it would be done the old way instead.

Cost versus Value

One of the reasons I feel that the decision to do it the old way was so easy to make was down to the cost-based view of the project. Based solely on the amount of manpower required, it likely appears to be much cheaper to deliver when you’ve done similar work before and have a supply of people readily available. But that only takes the short-term cost into account; the longer-term picture is different.

For a start it’s highly likely that the service will have to be rewritten on a newer platform at some point in the future. That means some of the cost to build it will be duplicated. It’s possible many of the same learnings could be gained on another project and then leveraged in the rebuild, but what are the chances they’ll have the same deadline luxuries next time?

In the meantime it will be running on a platform that is more costly to run. It may only add a small overhead, but when you’re already getting close to the ceiling it has the potential to affect the reliability of the entire monolithic system. Being done on the old platform also opens the door to any maintenance being done using the “culture” of that platform, which is to tightly-couple things. This means that when the time finally comes to apply The Strangler Pattern it won’t just be a simple lift-and-shift.

Whilst it might be easy to gauge and compare the short-term costs of the two approaches it’s pretty hard to put a tangible value on them. Even so it feels as though you could make a judgment call as to whether doing it on a newer platform was “worth” twice or three times the cost if you knew you were going to be gaining a significant amount of knowledge about how to build a more sustainable system that can also be continuously delivered.

Using Uncertainty as a Driver

One of Kevlin Henney’s contributions to the book “97 Things Every Software Architect Should Know” discusses how we can factor uncertainty into our architecture and design so that we can minimise the disruption caused when the facts finally come to light.

In this particular case I see the uncertainty around the external data format as being a driver for ensuring we encapsulate the behaviour behind a service and instead formalise a contract with the consumer to shield them from the indecision. Whilst Kevlin might have largely been alluding to design decisions the notion “use uncertainty as a driver” is also an allegory for “agile” itself.

Eliminating Waste

There is undoubtedly an element of poetic justice in this tale. The reason we have historically put more effort into our analysis is to try and avoid wasting time and money on building the wrong thing. In this instance all the delays waiting for the analysis and design phases to finish meant that there was no time left to do it “right” and so we will in all likelihood end up generating more waste by doing it twice instead.

Also, instead of moving forward our knowledge of building a more sustainable platform, we know no more than we did before, which means maintenance will continue to be more costly too, both in terms of time & money and, potentially more importantly, morale.

[1] Whilst a monolithic architecture is very likely to be tightly-coupled, it doesn’t have to be. The problem was not being monolithic per-se, but being tightly-coupled.

[2] Yes, it’s possible that such a change could cause a major re-evaluation of the tech stack, but if that happens and we had no way of foreseeing it I’m not sure what else we could have done.

Wednesday, 25 November 2015

Don’t Fail Fast, Learn Cheaply

The term “failing fast” has been around for a long time and is one that I’ve used since the early days of my career. When talking to other developers I’ve never had a problem with it, but using it with business folk has had a different reaction on occasion.

Defensive Programming

I first came across the term (I believe) when reading Steve Maguire’s excellent book “Writing Solid Code”. In it he describes how letting a process crash at the moment something really bad happens is often more desirable than trying to code defensively, as that just masks the underlying issue. Whilst it sucks for the user they stand less chance of even worse things happening, e.g. silent data corruption. I wrote about my own experiences with this type of coding in “The Cost of Defensive Programming”.
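
To make the contrast concrete, here is a minimal sketch of the two styles. It is not an example from the book, and both function names are made up for illustration.

```python
def parse_quantity_defensively(text: str) -> int:
    """Defensive style: swallow the problem and carry on with a guess."""
    try:
        return int(text)
    except ValueError:
        # Masks the bad input; the caller cannot tell anything went wrong,
        # so a bogus zero may end up stored as if it were real data.
        return 0


def parse_quantity_fail_fast(text: str) -> int:
    """Fail-fast style: stop at the point the bad value is detected."""
    quantity = int(text)  # raises ValueError on garbage input
    if quantity < 0:
        raise ValueError(f"quantity must not be negative: {quantity}")
    return quantity
```

Given the mistyped input "1O" (letter O rather than zero), the defensive version quietly returns 0, whereas the fail-fast version raises at the call site where the mistake is still easy to diagnose.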

Resource Efficiency

The second context under which I met the “failing fast” term was when reading Michael Nygard’s fabulous book “Release It!” Here he was talking about avoiding queuing or doing work which was ultimately going to be wasteful. For example if you can’t acquire a resource because it is unavailable, it’s better to discover that early and fail then instead of waiting until the end at which point you need to throw work away. Once again I’ve told my own tale around this in “Service Providers Are Interested In Your Timeouts Too”.
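
A small sketch of that idea, under the assumption that the scarce resource can be acquired before the expensive part begins. The names `publish_report`, `acquire_sink` and `render` are hypothetical, and a plain list stands in for a real connection.

```python
class ResourceUnavailable(Exception):
    """Raised when a required dependency cannot be acquired."""


def publish_report(rows, acquire_sink, render):
    """Acquire the sink first so no rendering effort is thrown away."""
    sink = acquire_sink()  # fail here, before any expensive work is done
    if sink is None:
        raise ResourceUnavailable("no sink available")
    for row in rows:
        # 'sink' is anything with append(); a list stands in for a connection.
        sink.append(render(row))
    return len(rows)
```

Had the rows been rendered first and the sink acquired last, an unavailable sink would have meant discarding all of that work.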


The most recent use of “fail fast” I’ve encountered has appeared in relation to the delivery of software projects. In this guise we are talking about how to do just enough work to either prove or disprove that the project is in fact viable. At a smaller scale you could apply the same idea to a spike, which is often one part of a project and used to validate, say, a technical approach.

In essence what we’re saying is that if you’re going to fail, make sure you do it as quickly as possible. Delaying the work that will allow you to decide whether the idea is actually viable runs the risk of so much being done that you fall foul of the Sunk Cost Fallacy. Sander Hoogendoorn has a post titled “Failing fast” that talks about this idea in more detail.

Negative Connotations

As you can see the term has many uses and so I’ve found it quite natural to say it when talking to fellow developers in any of these three contexts, as they’ve always understood the real meaning. However when talking to less technical people, such as business folk and higher-level managers, I’ve found that you can get a different reaction. Instead of latching onto the second word “fast”, they focus on the first word “fail”. What then happens is that the discussion turns into one about “why would we do something where we might fail?” and “isn’t that a backwards step?”

At this point you’ve now got to explain that failing quickly is really not failing per-se, but actually a successful outcome from a business cost perspective. In essence you’ve already put yourself on the back foot and you’ve potentially lost your audience as they try and work out why this “agile” stuff is beneficial if it means you’re going to fail! Why wouldn’t you just do more analysis up front and avoid failing in the first place?

Learning Cheaply

A more positive sounding way of promoting the idea is instead to focus on the point of the exercise, which is to learn more about the problem. And if we flip the notion of “fast” around and turn it into something the business really understands, money, we can talk about saving it by spending less to get an answer to our question. Also, where failing feels like a backwards step, we generally consider learning to be a cumulative process and therefore it sounds like we’re always making some sort of progress instead.

It’s all just smoke-and-mirrors of course, but in a world where everyone is vying for the company’s money being a little careful with your language may just tip the scales in your favour.

Tuesday, 24 November 2015

Missing the Daily Commute by Train

I started programming professionally just over 20 years ago and in all that time I have mostly commuted to my place of work either by car or train. My first role, straight out of university, was at the height of the recession in 1992 and so I pretty much moved to wherever it was going to be. After that I started contracting and did a couple of stints where I commuted by car for an hour each way, which pretty much convinced me that commuting by car any distance was less than desirable. Whilst I had been car-sharing and enjoyed some very interesting chats with my fellow programmer passenger it was still tiring and no fun when stuck in traffic (which eventually became a regular occurrence).

During that time my wife had tried commuting into London by train for her first job and found it was quite palatable. Hence it felt as though I either took a contract on my doorstep (which was unlikely), we moved house, or I headed into London by train [1]. And so I spent the better part of the next 20 years commuting into London and its suburbs.

All Quiet on the Writing Front

You may have noticed that my writing activities have taken a serious nosedive over the last 6 months and that’s almost entirely due to me taking a contract that was once again almost on my doorstep. A chance to commute by car for only 20 minutes a day, and in the opposite direction to all the traffic (which was heading into Cambridge) felt like too good an opportunity to pass up. It wasn’t a hands-on development role like I’ve been doing for the past 2 decades but, frankly, given the short commute I was happy to try my hand at some pure consulting for a change. And I’m glad I did as I learned heaps of stuff in the process [2].

Having such a short commute by car has been an absolute delight. I’ve left the house in the morning, and the office in the evening, when I felt like it rather than to meet a timetable. And the short drive time has meant me spending more time at both ends of the day with the wife and kids [3]. I never quite made it home to enjoy dinner every day with them as getting out of the business park’s car park was a nightmare at 5 pm; but it was close.

That said it seems a little churlish to complain about the lack of time I’ve had to write either for this blog or the ACCU journals. Clearly I have had the time, in the evening for example, but I’ve (implicitly) chosen to spend it differently. As I look back over my career I now begin to understand some of the comments that my colleagues have made in the past around how “lucky” I was to have a regular train-based commute.

A Time to Learn

My journey into London consists initially of 45 minutes solid travel followed by “an amount” of time on the underground rail network which has been anywhere from around 10 to 30 minutes. That first solid block of time has been great for really getting my teeth into a spot of reading, gaming or writing (articles or code) as I nearly always get to sit down, even if it’s on the carriage floor. The underground stretch is “standing room only” but still perfectly fine for reading a paper journal, like MSDN or one of the ACCU publications, as they are easy to hold and manipulate even on a very crowded train.

In the early days when a development capable laptop cost in the region of “thousands of pounds” I spent most of my time reading books about C++, Design Patterns, Windows internals, etc. I also read a variety of journals that have long since gone out of print such as C++ Report, MSJ, CUJ, Application Development Advisor and Dr. Dobb’s. Pretty much the only one left from this era that’s still in print is MSJ, but now under the name of MSDN Magazine. Luckily the ACCU journals, which I only discovered when CUJ disappeared (circa 2005), are also still in printed form.

Deliberate Practice

There is a saying that goes:

In theory there is no difference between theory and practice. In practice there is.

And so I’ve spent plenty of time coding on the train too. The train line I travel on has never even had decent mobile phone reception and so the idea of using the Internet is somewhat laughable. But given that my background has mostly been writing applications and services in C++ this has hardly been a problem to date, and is, in my mind, even highly desirable (See “The Developer’s Sandbox”). Most of what you see on my web site and GitHub page is code that has been written in these little 45 minute stints to-and-from work. Occasionally I’ve done a little bit in the evenings or in the office when the tool has been used to actually help me with my day job, but I’ve never worked anywhere that provides “20% time” - even for permanent staff (and I’d never expect to as a freelancer either).

Habitual Behaviour

It shouldn’t have come as any real surprise that my non-work activities would fall by the wayside the moment that my commute disappeared. After all I’ve taken a few lengthy periods of time off between contracts in the past and despite my best efforts to get motivated and spend it productively I’ve instead found it easy to fritter it away (but in a really nice way, e.g. having time with the family).

It took me a long time to realise just how much structure I need in my life to get “other” things done. Whilst I’d like to believe that I don’t need this kind of formality I’m really just kidding myself. Just as I need my notebook (paper based, of course) by my side to make notes and not forget things, so I need some semblance of order throughout my day to help guide me.

As I write this I’m beginning to wonder how much of what I said in “Code by Day, Design by Night” actually describes my behaviour outside “work time”? I guess my commute means I’ve always had “20%” time, it’s just that it’s had to be on top of my 100% working day. Either way I now realise how valuable to my career that time actually is.

[As if to prove a point I’m only just proof-reading this and hitting the “publish” button now that I’m back commuting again…]

[1] Another choice would have been to go permanent again but I had just started to enjoy the freedom of freelancing and was reluctant to give that up again so quickly.

[2] Hopefully I’ll be filling these very pages with musings from that gig in the coming months.

[3] Let’s put aside for a moment the fact that working from home means a zero-minute commute. Also, given the disruption I caused just by being around when the kids are supposed to be getting ready for school, I’m not convinced my wife always saw it as a bonus :o).