Friday, 25 September 2015

Choosing a Supplier: The Hackathon

One would hope in this day and age that when hiring a candidate for a developer role the interview process would include some element of actually doing what it is we do - programming. This could be as simple as submitting an offline coding test, or better yet a face-to-face session where you pair with one of the people you’re likely to work with. Either way, just as an artist would expect to show their portfolio, we should expect to do something similar – not just write a CV and then talk about it.

If that’s the case for hiring a single developer, why shouldn’t choosing a software supplier follow a similar process? After all, there is probably far more money at stake. And that’s really what the whole interview process is about – it’s a form of risk management – you’re trying to find a supplier that you’re confident is going to deliver what you really need, in a timely manner and for an acceptable cost.

Show Your Workings

The Hackathon, which probably sounds a bit too hipster for some people’s taste, is the embodiment of this idea – get the potential suppliers to show how they actually produce software, albeit on a very small scale. My last two days have just been spent on one of these and it was great fun, if a little nerve-wracking.

The consultancy firm I’m currently working through (Equal Experts) has been involved in one of these before, quite recently, and they relish the chance to do this kind of selection process more as it plays to their strengths – delivering working software incrementally using modern development techniques [1].

By all accounts this particular one was slightly different to the previous one, although it still followed a similar outline. There were a number of very small teams (3), each representing a different supplier. Each team had a small number of people (~4) covering whatever skills they thought they’d need based on the brief. In our case we had 3 devs and a UX person, but obviously we could also act in a lesser capacity as a BA, QA, Ops, etc. to cover all the bases.

The overall structure was that we would spend a short period learning a bit about the company we were pitching to and about the problem they wanted us to tackle. Each team was then given a separate “war” room in which they could work for roughly two days before presenting back at the end.

Where in the previous Hackathon they got more freedom about what they wanted to tackle (i.e. a higher-level problem statement) this one had a more specific problem to be solved. The problem was also directly related to the actual problem the chosen supplier would eventually be asked to work on, which makes sense.

During the two days various people involved in the selection process would come round and visit us to see what we’d been up to and that would also give us an opportunity to ask any questions we had about the problem. If we really needed to we could have called upon The Business at any time to come and help us but their visits were frequent enough that it meant we never needed to go that far.

Our Approach

Naturally I can’t say anything about the problem itself, but suffice to say that it was far bigger than anything we could expect to build in just two days. However that didn’t stop us tackling it in the same way we would a real project. We did some up-front analysis to explore the problem domain a fair bit, made some key architectural decisions about what we were going to build, created a backlog with some stories on it and then started coding. We also littered the walls with sheets of magic whiteboard [2], post-it notes and index cards that covered our questions, answers, architecture, backlog, etc.

We were able to do a little bit of work beforehand, such as creating a GitHub repo that we could all check we had access to, along with a dummy build in AppVeyor to cover the CI part. The guest Wi-Fi wasn’t brilliant [3] which meant pushing and pulling to/from GitHub was a bit laggy, but it was usable and the cloud-based CI monitor was stable enough.

Despite it being unofficially described as a “Hackathon” we decided we would do the best we could to show how we worked in practice, rather than try and push out as much code as possible. Whilst we no doubt could have got more code working if we had cut corners on some of the other parts of the process (e.g. analysis or testing) we would not have given a fair representation of what we do (IMHO). I’ve touched on this before in “Keeping the Faith Under Pressure” and I’m pleased with what we produced. I’m perfectly happy to call the code I wrote over the two days “production code”.

Day One

After doing our early analysis we started putting together the walking skeleton which was a text-book example of Conway’s Law as we initially kept out of each other’s way whilst we got the boilerplate stuff together. Of course this separation didn’t last long as by the end of the day the temporary assemblies and folders had all gone in a simple refactoring exercise so that we had a coherent codebase to build from.

We finished the day with our technology stack up and running (browser-based client + REST API), watched over by AppVeyor, and a very thin slice of functionality in play which we demoed to a couple of the stakeholders.

Day Two

The following day was essentially a few hours shorter to allow time for each team to present to the panel, which also meant we’d need to factor in some time to put the presentation together. In essence we only had ~5 hours to implement the next set of features, so we got one of the stakeholders in first thing to make a priority call so that we knew where to focus our efforts during the morning.

While we three devs were building a working product our UX expert was building a clickable prototype to help explore the problem further ahead. This had the benefit of letting us see the context in which the data was used and therefore better understand the bigger picture. In such a short timeframe it was perhaps difficult to see what beneficial effect it had on our design and thinking, but what it did show clearly was how we work and how important we believe UX to be to the development process in general.

We stopped coding an hour before our presentation slot to give ourselves plenty of time to think about what we needed to say and show. Our UX expert had surreptitiously taken plenty of photos of us engaging with the stakeholders along with the evolving boards and backlog so that we could put together a compelling story about our approach.

We talked for 30 minutes about the process and architecture, and gave a demo of what we’d built to date. We then had about 15 minutes to answer questions from the panel, both about our approach and how we might tackle some of the questions we hadn’t yet answered in the code.

Each team got to present to the panel in isolation, but we all hung around until the end at which point we each did a very brief version of the presentation to the other teams. This was really interesting as up to that point we had no idea what the others were up to. For example we chose a thin-client whereas the other two chose thick-clients. We used post-its and a makeshift whiteboard for our notes and product backlog whilst another used an online tool and the third didn’t mention their approach.

Wrapping Up

Did the exercise achieve what it set out to do? As I’m not the client I have no idea what their final decision is or whether the eventual product has met their needs, because clearly it hasn’t been built yet. But I believe their behaviour suggested that they were pretty pleased with what they had seen from everyone. I think they seemed surprised that the three teams had behaved quite differently and so they probably got a lot more out of the exercise than anticipated as each team would probably have asked different questions and explored different avenues. Given that this was a real problem we were working on I’m sure there is a lot of value in that alone.

Personally I went into the process somewhat nervous. My current role is a real departure for me - it’s not a hands-on development role. As such I thought I might not be “match fit”, even with 20 years of coding experience behind me. What I forgot though was that the point of the process was to be ourselves, and once I started writing code as I would for real it all came back naturally. Hopefully I added an equal amount of value to the team and gave as good an account of myself as the others appeared to.

So, is this the way I think tenders should be done? Yes, but I’m just a developer :o). I did ask someone at Equal Experts how it compared cost-wise for them, given that the RFP (Request for Proposal) process can be pretty time consuming too, and he suggested it wasn’t that far off. Not having done any others I can’t say whether the client had a disproportionate number of people involved at their end, but given the potential sums at stake I’m sure they saw it as a good investment. I certainly hope more companies do it in the future.

[1] Apologies if that sounded like a thinly-veiled sales pitch, it wasn’t meant to be. I was just trying to describe how they see themselves; I’m an associate, not an employee.

[2] Somewhat unfortunately the room we were in had frosted glass and the sheets of magic whiteboard didn’t really stick to it. Without any blu-tack or sellotape we had to use even more post-it notes to hold up the whiteboards!

[3] The Wi-Fi dropped out every now and then and had limited bandwidth which meant that any image-heavy web pages could take a while to load.

Monday, 7 September 2015

Stand-Up on the Beach

Back in April I performed a stand-up comedy routine as a lightning talk on an unsuspecting audience at the ACCU Conference (see my previous blog post The Daily Stand-Up). At this year's Agile on the Beach I got to have another go at it, but not as a lightning talk, this time it was going to form part of the pre-conference evening event (aka “pasty night”).

Whereas the ACCU conference is almost entirely about software craftsmanship, Agile on the Beach mostly has other streams covering all aspects of the software development process. As such I needed to come up with a different routine that catered for a much wider audience. Given its nature I swapped out many of the programming-specific puns and replaced them with something (hopefully) more appropriate, i.e. more process related. There was also a bonus track on continuous delivery this year, which meant I could throw in some relevant content there too.

Once again it seemed to go down pretty well, by which I mean the audience groaned appropriately :o). So for those of you unfortunate enough to have missed it, here is the set:

“Was the Tower of Pisa built using lean manufacturing?”

“Agile methods might be all the rage these days but I reckon the writing’s on the wall for Kanban.”

“Are cross-functional teams just a bunch of grumpy LISP programmers?”

“Some say Scrum’s sprint goals are necessary for motivation, but Kanban is also about cracking the WIP.”

“If you want to adopt Agile Release Trains do your developers need to use Ruby on Rails?”

“The last census had a box labelled ‘Religion’, so I put ‘TDD’.”

“When I’m working from home I like to get the kids involved in my coding; I call this ‘au pair programming’.”

“My team’s not really got the hang of this agile stuff – our successes are just stories and our fails are all epics.”

“If you have too many information radiators do you get scrumburnt?”

“The other day the product owner asked me why all our acceptance tests only covered happy paths. I said they’re rose tinted specs.”

“Agile’s a lot older than many people think – Dick Turpin was doing stand-up and deliver years ago.”

“If poor quality code results in technical debt, does that make bad programmers loan sharks?”

“Some say C# and Java programmers overuse reflection, but given the quality of their code I’d say they aren’t reflecting enough.”

“Is it me or are C# and Java so similar these days they’re harder to tell apart than The Munsters and The Addams Family?”

“Our system has five-nines reliability, it’s usually working about 45% of the time.”

“When working for New Scotland Yard do developers have to work on a special branch?”

“I really dig software archaeology.”

“As a baby I was brought up on Farley’s, and I’m bringing my children up on Farley’s too – at bedtime I read them a chapter from Continuous Delivery.”

“Are modern developers obese because they rely so heavily on syntactic sugar?”

“Is the removal of a dependency injection framework from a codebase known as ‘spring cleaning‘?”

“If you think keeping up with the Kardashians is hard, try JavaScript frameworks.”

“When it comes to drawing diagrams of micro-service architectures, I never know where to draw the line.”

“The other day I went to the dentist and he told me I had a scaling problem. I said that’s awkward as I’ve no room for any more teeth.”

“Our DR strategy is not so much active/passive as passive/aggressive – when it fails we sit around and tut loudly until someone fixes it.”

“Don’t upgrade your database as the SQL is never as good as the original.”

“I blame Facebook for the quality of modern SQL – young developers are so obsessed with LIKEs.”

“When Sherlock Holmes talks of a three-pipe problem does he mean it needs grep, sed, awk and sort?”

“I don’t understand why the police are profiling criminals – surely they shouldn’t be helping them to commit crimes more efficiently?”

“C++ has complexity guarantees – if you write C++, it’s guaranteed to be complex.”

“Some people are like ‘chars’ in C, they get promoted for no apparent reason.”

“Would our codebase be healthier if we only used natural numbers?”

“One of the hardest problems in computer science is dealing with nans – they always want you to fix their machines.”

“In my thirties I blew loads of cash on a monster gaming rig, I think I was suffering from a Half-Life crisis.”

“I forgot my password the other day so I tried a dictionary attack – I just kept hitting the administrator over the head with a copy of the OED until he let me in.”

“My wife and I have been together for many years so I thought I’d get a token ring, but it seems you can only get Gigabit Ethernet these days.”

“My son was being hassled at school to share his music on the internet with BitTorrent. I told him not to give in to peer-to-peer pressure.”

“When flying here I had to put my phone into airplane mode, at which point it assumed the crash position.”

“Are electronic cigarettes just vapourware?”

“I spent all day trying to upload a picture of Marcel Marceau but the server kept responding with ‘415 Unsupported Mime Type’.”

Wednesday, 29 April 2015

The Daily Stand-Up

At the ACCU Conference this year I managed to fulfil a lifelong ambition - do a stand-up comedy routine. OK, so it was only 5 minutes long (I did it as a lightning talk) and the audience and content was heavily programmer centric, but even so it seemed to go down a treat. A few people asked me to post the “set” that I did, so here it is.

The content all comes from my own tweets between 2009 and 2013 (except for a couple of more recent ones). Whilst that covers the majority of the time I’ve been on Twitter, it only accounts for half of my tweets so far. This is because I was tweeting less than 100 times per month at the beginning whereas I’m well over 400 per month these days. Who says I’m an addict? It also means there should be enough content to do another routine next year (you have been warned).

I’d like to believe that it’s all my own work but the phenomenon that is Cryptomnesia suggests this likely won’t be the case. Even if many are my own original thoughts it’s very likely others have been there well before me. Still, I hope you find some originality somewhere in this batch of 30 or so quips and puns.

If you’re trying to recreate the atmosphere of the original event then you’ll need to deliver these in a deadpan manner.

“My wife bought me some jigsaws of famous computer scientists for Christmas. She asked how I was getting on. I said it’s Turing complete.”

“I think Turing was a wicked mathematician and computer scientist. In fact I’d say he was da bombe.”

“Speaking of mathematicians, when they work-out do they focus on their abs?”

“Was the Tower of Pisa built using lean manufacturing?”

“I know agile is all the rage but I think the writing is on the wall for kanban boards.”

“Are cross-functional teams just a bunch of grumpy LISP programmers?”

“I’ve been writing more functional code lately. I recently tried a few numerical recipes with currying functions, but all I got was a NaN.”

“The recent census had a box marked ‘Religion’. So I put ‘TDD’.”

“Why are we surprised that Git developers are so fanatical when everything they do is dictated by a SHA?”

“I’m getting the hang of mocking. During the last code review I said ‘this is rubbish!’, ‘what were you thinking?’ and ‘are you an idiot?’”

“When I work from home I like to get the kids involved with my coding. I call it Au Pair Programming.”

“Just because it’s called code doesn’t mean it has to be cryptic.”

“If technical debt is taking shortcuts, does that make bad programmers loan sharks?”

“Our code is so bad we don’t say ‘here be dragons’, we say ‘here be weeping angels’ because every time you blink the code looks worse.”

“I blame Facebook for the quality of SQL produced by young developers. They’re obsessed with likes.”

“When I see ‘static’ used in multi-threaded code it makes my hair stand on end.”

“I like the actor model for handling concurrency, but I find it suboptimal due to the constant need to talk to their agents.”

“I’ve been refactoring my code towards a more single-entry/single-exit style. It worked out well in the beginning but now there’s less impact due to the law of diminishing returns.”

“Singletons are like onions, the moment you touch one you end up in tears.”

“Inexperienced programmers are like inexperienced swimmers, they rely too heavily on floats.”

“I’m not saying the C programming standard is loose but I reckon it should be called Subjective-C.”

“C++ comes with complexity guarantees. If you write C++ it’s guaranteed to be complex.”

“Our system has five-nines reliability. It usually works about 45% of the time.”

“When Sherlock Holmes talks about a three-pipe problem does he mean it requires grep, sed, awk and sort?”

“My daughter asked what ABBA stands for. I said 43,962.”

“Once I pulled into an overflow car park and found myself right back at the entrance.”

“I don’t understand the point of graph databases. There must be an easier way to draw a bar chart.”

“Isn’t it depressing that everything we create ends up in a folder called ‘bin’.”

“If a team mate tells you to use more lynx they mean you have a problem with body odour, not that you need to test your web site with a text-only browser.”

“If you’re going to write a résumé for a C++ job don’t forget to add your cv-qualifiers.”

“I’ve enlisted the help of Simon Cowell to help me manage my passwords. It’s called X-Factor authentication.”

“Our interfaces are less cohesive and more adhesive. Developers keep sticking new methods on the end.”

“Our codebase has so many try/catch blocks that it often feels like a game of throwns.”

“When Amazon start using drones to deliver parcels I’m going to mess with them by sticking a robots.txt on the front door.”

Monday, 26 January 2015

IIS Hosted Web API Not Caching Internal HTTP Responses

Executive Summary: Set “Load User Profile” to true for the IIS application pool.

I had a fun afternoon last week trying to help a colleague work out why the responses to the internal HTTP requests from one of my team’s web APIs weren’t being cached by the .Net HTTP component. It appeared that this was all working fine in the pre-production environment, but not production itself.

Caching HTTP Responses

The code (which I’m not overly familiar with) uses one of the standard .Net HTTP request classes (WebRequest I believe) to make authentication calls to another internal, RESTful service. It was anticipated by the owner of this service that responses could be cached by the .Net framework thereby alleviating the need for our own service to manually implement a caching layer. Naturally the point of (briefly) caching the HTTP responses was to reduce load on the underlying authentication servers and they had made sure that their response headers were correctly configured to make this “all just work”.
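
I can’t reproduce the actual code, but the mechanism in question is the framework’s request-level cache policy, which under the covers is backed by the WinInet cache. A hedged sketch of the sort of thing involved (the URL is obviously made up):

// Opt the request into the framework's default HTTP caching behaviour, which
// honours the Cache-Control headers sent back by the service being called.
var request = WebRequest.Create("http://authservice/lookup");
request.CachePolicy = new RequestCachePolicy(RequestCacheLevel.Default);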

In theory this was true, but in practice it turned out to be a little more complicated once the code was running under the real IIS worker process (w3wp.exe).

Administrator Rights

My colleague had already done a fair bit of analysis and experimentation with a spare production server and the wonderful Process Monitor tool from Sysinternals. By watching the w3wp.exe process whilst he sent requests with CURL he could see that the process was getting Access Denied errors when accessing parts of the Registry and file-system.

He had also tried configuring the IIS application pool to run under another non-service account, i.e. a support user’s account, and that appeared to work. Comparing the behaviours with Process Monitor for the two accounts he could verify that the problem was one of permissions, or so it seemed.

A natural response when seeing Access Denied problems is to grant the underprivileged account admin rights, and that’s what he did; lo and behold, things started working. However further experimentation suggested that this didn’t always fix the problem, which was mightily confusing.

At this point I joined the party...

Verifying Admin Rights

Past experience has taught me that just adding an account to the Administrators group is not always enough - it’s best to actually check the security attributes of the process in question to double-check that it really has become elevated. Conversely when removing rights I make the same check to ensure they’ve been revoked correctly.

For this check I used another of Sysinternals tools - Process Explorer. We hit an initial “Access Denied” for some process information [1] so I used the technique of starting it with “psexec -s -i” which was probably a bit of overkill (See “Process Explorer Failed to Display DLLs or Handles” for when this might be necessary). After right-clicking on w3wp.exe to show the process’s Properties and then switching to the Security tab I could see that the process was indeed a member of BUILTIN\Administrators.

It later transpired that the w3wp.exe process not correctly acquiring or relinquishing its power might have been the source of some of the earlier confusion. I too witnessed the removal of the account from the local Administrators group and after recycling the web site application pool found the w3wp.exe process still appeared to be a member. The process start time strongly suggested it had been restarted too when we recycled the app pool.

The Default Profile

We then switched back to Process Monitor and looked at the Registry and file-system access. The HTTP client was accessing various “Internet Settings” registry keys under the HKEY_USERS tree - HKU\.Default in particular. It was then reading and writing to an internet cache that was stored under “C:\Windows\Temp\Temporary Internet Files\...”.

I had always suspected that user profiles might have a role to play in this problem because I remembered back in the dim-and-distant past that the native WinInet library couldn’t (or perhaps shouldn’t) be used for making HTTP calls when running as a service. Microsoft later added another native component called WinHttp that could be used by services. IIRC the main problem back then was around configuring proxy settings, and whilst we weren’t suffering from that specific ailment this time, the registry keys it appeared to be accessing were pointing in the same direction.

We then revoked the admin rights from the service account, checked they had been revoked with Process Explorer, and then executed the test HTTP requests again so I could see whether the same registry and file-system access patterns occurred, and they did. A brief play with the REG command-line tool and DIR helped prove that, although the keys and folders existed, the unprivileged service account wouldn’t have access to them.


This suggested to me that the problem was likely down to the fact that the service account was being used by the w3wp.exe process without a user profile being loaded. Lacking a profile of its own, it was trying to do the same thing it would with just the default profile on hand, but it didn’t have the necessary rights to update the cache in the “C:\Windows\Temp\Temporary Internet Files” folders.

Changing the application pool’s “Load User Profile” setting to “true” seemed to fix the problem. We then applied this to another spare production server (after first proving that it was not behaving correctly) and that worked too.
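
For anyone wanting to script the fix, the setting lives under the application pool’s process model (it’s in the Advanced Settings dialog in the IIS manager) and can be flipped with APPCMD, something along these lines (the pool name is a placeholder):

appcmd set apppool "MyAppPool" /processModel.loadUserProfile:true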

Admin Rights Not Required

I think this tale highlights why the answer to so many deployment issues appears to be making the service account a local administrator. The huge gain in power that right confers allows the process to read, and more importantly write, to so much of the system. Even if the correct answer is an entirely different setting (as in this case) taking the sledgehammer approach can get you running, but at a potentially disastrous cost [2].

Anyone doing service style deployments on Windows should know how to wield tools like NTRIGHTS and CACLS (or their more modern equivalents) to grant just the right amount of permissions for their processes.


[1] I’ve spent so many years supporting pre-Windows Vista systems that I still forget occasionally that I need to elevate admin tools. I usually start with a non-elevated command prompt for safety and then wonder later why a tool doesn’t work properly because I’ve forgotten that I’ve gone for safety first. Which of course is the whole point :-).

[2] If the service is compromised it might grant an intruder further access. Even if it’s an internal service putting it in a sandbox helps ensure it does what you think it should.

Thursday, 22 January 2015

Configuration Can Be Simpler

Every time a developer creates an XML-based IoC container config file I’m sure not just one, but an entire litter of kittens dies. Not just them, but any build managers [1] and support staff too. I’ll admit that I have yet to work on a project where I’ve seen any clear reason to even use an IoC container. I’m sure they exist, it’s just that where I’ve seen them used it’s as a workaround for some other problem, e.g. a weak build and deployment pipeline [2].

Configuration Files are a User Interface

In “Diagnostic & Support User Interfaces” and “Testing Drives the Need for Flexible Configuration” I tried to show how system configuration files are a form of Human Computer Interface (HCI). This is a view that is also promoted in Michael Nygard’s wonderful book “Release It! Design and Deploy Production-Ready Software”.

If that is the case, then files with hundreds or thousands of lines of XML are not my idea of a pleasant UI to work with. One knee-jerk reaction might be to manage the complexity by using an extra level of indirection and generating the XML config file(s) from a much simpler DSL written with, say, a scripting language. To me this just appears to be an attempt to overcome the XML “angle-bracket tax” instead of getting to the heart of the problem, which is using an overly-flexible and complex tool to solve what might be a much simpler problem.

Example: Configuring a Chain of Responsibility

To try and explain my position further I’m going to use a real-world example that has come up. Within a web API I worked on there was a component that handles the authentication aspects of an internal system. This component used the Chain of Responsibility design pattern to achieve its goals, so that a client request could be fired into the head of the chain and somewhere along it one of the providers would answer the question of who the principal was, even if the eventual answer was “unknown”.

The section of an IoC container XML config file needed to describe this chain was not pretty - it was littered with fully qualified type names (assembly, namespace and class/interface). Whilst I grant you that it is possible to configure a Directed Acyclic Graph (DAG) of objects using the tool and its config file, I would suggest that it is probably not desirable to. The example - a chain of responsibility - does not call for that amount of power (an arbitrary DAG) and I think we can do something much simpler and with very little coding effort.

Key/Value Configuration

I’ve always found the old fashioned .ini file style of configuration file one of the easiest to read and maintain (from a multi-environment perspective). One of the reasons for this I believe is that the configuration is generally flattened to a simple set of key/value pairs. Having one entry per line makes them really easy to process with traditional tools, and the constraint of one-setting-per-line forces you to think a bit harder about how best to break down and represent each “setting”. If the configuration value cannot be distilled down to a simple value then perhaps a richer structure might be needed (indicated with a simple setting like a filename), but just for that one setting; it does not invite unnecessary complexity elsewhere.

Consequently I would look to try and express the structure of our example “chain” in a single key/value entry, albeit in this case as a .Net style “App Setting”. This is how I would express it:

  . . .
  <add key="Security:AuthenticationChain"
       value="Token,IP,Default" />
  <!-- component specific settings below it -->
  . . .

You’ll immediately notice that the name of the setting describes the structure we are trying to create - a chain - you don’t have to infer it from the structure of the XML or read an XML comment. You’ll also see that we haven’t included any implementation details in the setting’s value - each term is a shorthand (an abstraction if you will) for the component that rests in that position within the chain. This makes code refactoring a breeze as no implementation details have leaked outside the code and could therefore be hidden from a refactoring tool.

The Factory Method

If you’re thinking that all the complexity can’t have vanished, then you’d be right. But much of the accidental complexity has gone away simply by choosing to use a simpler and more restrictive representation. Parsing this representation only requires a handful of lines of code:

public static class Factory
{
    public static IProvider CreateChain(IConfiguration configuration)
    {
        const string key = "Security:AuthenticationChain";
        var setting = configuration[key];

        // Build from the tail forwards so each provider can wrap its successor.
        var chain = setting.Split(',').Reverse();

        IProvider root = null;
        foreach (var type in chain)
            root = CreateProvider(type, configuration, root);

        return root;
    }

    private static IProvider CreateProvider(string type, IConfiguration configuration, IProvider nextProvider)
    {
        switch (type)
        {
            case "Token":
                return new TokenProvider(..., nextProvider);
            case "IP":
                return new IpProvider(..., nextProvider);
            case "Default":
                return new DefaultProvider(..., nextProvider);
        }

        throw new ArgumentException(String.Format("Unknown provider type '{0}'", type));
    }
}

If all the dependencies in this chain are purely functional (and they almost certainly should be for a Chain of Responsibility) then the factory and chain can be easily tested using component-level tests [3]. The only mock it requires is for the configuration object which is trivial to knock-up, even as a manual mock.
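
To show quite how trivial that manual mock can be, here is a hedged sketch of such a component-level test. It assumes IConfiguration amounts to little more than a read-only string indexer; the stub and test names are mine:

// A hand-rolled mock: just enough of IConfiguration for the factory to read.
internal class StubConfiguration : IConfiguration
{
    private readonly Dictionary<string, string> settings =
        new Dictionary<string, string>();

    public string this[string key]
    {
        get { return settings[key]; }
    }

    public void Add(string key, string value)
    {
        settings.Add(key, value);
    }
}

[Test]
public void factory_builds_chain_with_first_term_at_the_head()
{
    var configuration = new StubConfiguration();
    configuration.Add("Security:AuthenticationChain", "Token,IP,Default");

    var head = Factory.CreateChain(configuration);

    // "Token" is the first term in the setting and so should be the head.
    Assert.That(head, Is.InstanceOf<TokenProvider>());
}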

What About Adding New Providers?

One of the allures of IoC containers is the apparent ease with which you can “re-configure” your system and add support for a new component. Whilst that may be true for the deployment configuration step, I know that the development and testing cost of any new component is easily going to dwarf the minuscule amount of change required to the Factory Method shown above to support it. Unless the new component is going to be implemented in an entirely new assembly and there is a strict requirement not to touch the existing codebase, then you’ve already got a code change and re-deployment on your hands anyway.
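
For example, supporting a hypothetical “Certificate” provider would amount to one extra case in the factory (plus, of course, all the real work of writing and testing the provider itself):

case "Certificate":
    return new CertificateProvider(..., nextProvider);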

Configuration Validation

One of the things in particular I like about manual factory methods is that they provide a suitable place to put validation code for your configuration. For example, in our authentication chain the final provider is the default one. This provider is “a provider of last resort”: no matter what happens before it, it can always answer our query and return an answer, even if that answer is “no-one knows who they are”. The fact that this provider should always appear at the end of the chain means that there is the possibility of mis-configuring the system by accident. Imagine what would happen if it was the first in the chain - every other provider would essentially be ignored.

One approach to this problem is to punt and let the documentation (if any exists) provide a warning; if the sysadmins then mess it up that’s their problem. Personally I’d like to give them a fighting chance, especially when I know that I can add some trivial validation (and even unit test its behaviour):

case "Default":
  if (nextProvider != null)
    throw new ConfigurationException("The 'Default' provider can only be used at the end of the chain");
  return new DefaultProvider(...);

If you look closely you’ll see that I’ve also removed the final argument from the call to the DefaultProvider constructor. The great thing about expressing your design in code (for a statically compiled language [4]) is that once I decided to remove the final parameter from the DefaultProvider class ctor I could lean on the compiler and fix-up the call sites as appropriate. If one of the call sites wouldn’t have been usable this way I’d have spotted it immediately rather than having to debug a curious runtime issue some time later.

Test-Driven Design

I’ve heard it said that TDD done properly leads you naturally towards using an IoC container. Frankly I’m bemused by that statement, because when I do TDD it leads me exactly towards what I describe above. I genuinely cannot see how the simplest step can be to go from a hardcoded implementation (i.e. the DefaultProvider in our example) to using a highly-configurable tool to define an arbitrary DAG - there is always something much simpler in between.

Even if some of the steps below are not actually coded up and just become a thought exercise, I’d still not end up reaching for the big guns by the end:

  1. Identify a need to do authentication
  2. Create the default provider with TDD
  3. Hard-code it as the only implementation
  4. Identify a need for other authentication mechanisms
  5. Create them with TDD
  6. Hard-code the initial chain
  7. Identify a need to make the chain order configurable
  8. Introduce a factory (method) to house the creation logic, using TDD

As I said right back at the beginning, with a fluid development process and one eye on good design principles, the term “hard-coded” no longer remains quite the pejorative it once was.


[1] Not a role I had come across until recently as the build pipeline and deployment process has always been maintained by the developers.

[2] When you can make a change in isolation, build your artefacts, run an extensive test suite and deploy changes to production quickly, many needs for components to be configurable through config files just evaporate.

[3] The difference between unit-level tests and component-level tests is really just the size of your “unit”. I chose not to say “unit tests” to make it clear that I am aware there would be more than one class under test in certain scenarios.

[4] Or a non-statically compiled language with some form of type annotations.

Friday, 16 January 2015

Terse Exception Messages - Part II

Quite unintentionally it looks like my post back in October last year “Terse Exception Messages” needs a second part. The initial post was about dealing with terse error messages generated by external code; however I’d forgotten that terse exception messages are not just the domain of 3rd party libraries, they can appear in “our code” too. Take this example from some real world production code:

throw new NotFoundException("Invalid ID");

Sadly this will likely appear in any diagnostic log file with only the usual thread context information to back it up, e.g.

<date, time, thread, etc.> ERROR: Invalid ID

If you also have a stack trace to work from, that could give you a clue as to what type the ID refers to. You might also be able to hunt around and find an earlier diagnostic message that could give you the value too, but you’re probably going to have to work for it.

Clearer Exception Messages

At a bare minimum I would also include the “logical” type and, if possible (and it usually is), then the offending value too. As such what I would have thrown is something more like this:

throw new NotFoundException("Invalid customer ID '{0}'", id);

This message contains the two key pieces of information I know are available at the throw site and so would generate a more support-friendly message like this:

<date, time, thread, etc.> ERROR: Invalid customer ID ‘1234-5678’

Discerning Empty Values

You’ll notice that the value, which in this case was a string value, is surrounded by quotes. This is important in a message where it might not be obvious that the value is empty. For example if I had not added the quotes and the customer ID was blank (a good indication of what the problem might be) then it would have looked thus:

ERROR: Invalid customer ID

Unless you know intimately what all messages look like, you could easily be mistaken for believing that the error message contains little more than our first example, when in fact the value is there too, albeit invisibly. In the improved case you would see this:

ERROR: Invalid customer ID ‘’

For strings there is another possible source of problems that can be hard to spot when there are no delimiters around the value, and that is leading and trailing whitespace. Even when using a fixed width font for viewing log files it can be tricky to spot a single erroneous extra space - adding the quotes makes that a little easier to see:

ERROR: Invalid customer ID ‘1234-5678 ’

As an aside my personal preference for using single quotes around the value probably stems from many years working with SQL. I’ve lifted values out of log files only to paste them into a variety of SQL queries countless times and so automatically including the quotes seemed a natural thing to do.

Discerning Incorrect Values

It’s probably obvious but another reason to include the value is to discern the difference between a value that is badly formed, and could therefore never work (i.e. a logic error), and one that is only temporarily unusable (i.e. a runtime error). For example, if I saw the following error I would initially assume that the client has passed the wrong value in the wrong argument or that there is an internal logic bug which has caused the value to become “corrupted”:

ERROR: Invalid customer ID ‘10p off baked beans’

If I wanted to get a little more nit-picky about this error message I would stay clear of the word “invalid” as it is a somewhat overloaded term, much like “error”. Unless you have no validation at all for a type, there are usually two kinds of errors you could generate - one during parsing and another during range validation. This is most commonly seen with datetimes where you might only support ISO-8601 formatted values which, once parsed, could be checked to ensure an end date does not precede a start date.

To me the kind of error below would still be a terse message; better than the original one, but could be even better.

ERROR: Invalid datetime ‘01/01/2001 10:12 am’

My preferred term for values that fail “structural validation” was adopted from XML parsing - malformed. A value that parses correctly is considered “well formed” whereas the converse would be “malformed”. Hence I would have thrown:

ERROR: Malformed datetime ‘01/01/2001 10:12 am’

For types like a datetime where there are many conventions I would also try and include the common name for the format or perhaps a template if the format is dynamically configurable:

ERROR: Malformed datetime ‘01/01/2001 10:12 am’ (Only ISO-8601 supported)

ERROR: Malformed datetime ‘01/01/2001 10:12 am’ (Must conform to ‘YYYY-MM-DD HH:MM:SS’)

Once a value has passed structural validation if it then fails “semantic validation” I would report that using an appropriate term, such as “out of range” or explicitly describe the comparison. With semantic validation you often know more about the role the value is playing (e.g. it’s a ‘start’ or ‘end’ date) and so you can be more explicit in your message about what the problem is:

ERROR: The end date ‘2010-01-01’ precedes the start date ‘2014-02-03’
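
As a hedged sketch of those two stages (using framework exception types for brevity rather than the custom ones I’d normally throw, and assuming the usual System.Globalization import):

public static void ValidateDateRange(string startValue, string endValue)
{
    const string format = "yyyy-MM-dd";
    DateTime start, end;

    // Structural ("malformed") validation first...
    if (!DateTime.TryParseExact(startValue, format, CultureInfo.InvariantCulture,
                                DateTimeStyles.None, out start))
        throw new FormatException(
            String.Format("Malformed start date '{0}' (Must conform to '{1}')", startValue, format));

    if (!DateTime.TryParseExact(endValue, format, CultureInfo.InvariantCulture,
                                DateTimeStyles.None, out end))
        throw new FormatException(
            String.Format("Malformed end date '{0}' (Must conform to '{1}')", endValue, format));

    // ...then semantic ("out of range") validation once both values are well formed.
    if (end < start)
        throw new ArgumentException(
            String.Format("The end date '{0:yyyy-MM-dd}' precedes the start date '{1:yyyy-MM-dd}'",
                          end, start));
}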

Custom Parsing Methods

As we saw back in my original post, the language framework parsing methods are often horribly terse, even when failing to parse a simple value like an integer. The reason is that, as with all general purpose functions, there is a balance to strike between performance, security, etc., which means including the value in the exception might be undesirable in some classes of application.

However in the systems on which I work this is not a problem, and I’d happily trade a little extra garbage for better diagnostics, especially given that it’s only produced in (hopefully) exceptional code paths.

My solution to this problem is the proverbial “extra level of indirection” and so I wrap the underlying TryParse() framework methods with something that produces a more satisfying diagnostic. In C# extension methods are a wonderful mechanism for doing that. For example I’ve wrapped the standard configuration mechanism with little methods that read a string value, try to parse it, and if it fails then include the setting name and value in the message. Consequently instead of the usual terse FormatException affair you get when parsing an int with code like this:
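
// A sketch of the sort of code in question - parse the raw setting value directly:
int timeoutInMs = Int32.Parse(configuration["timeoutInMs"]);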


You invoke an extension method like so:
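
// Again a sketch - ReadInt is an assumed name for the wrapper extension method:
int timeoutInMs = configuration.ReadInt("timeoutInMs");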


If this fails you’ll get a much more pleasing error message along these lines:

ERROR: The ‘timeoutInMs’ configuration setting value ‘3.1415’ was not a well formed integer value
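
A minimal sketch of such a wrapper, assuming ReadInt as the method name, the IConfiguration indexer from earlier, and a ConfigurationException sporting the format-style constructor discussed in the next section:

public static class ConfigurationExtensions
{
    public static int ReadInt(this IConfiguration configuration, string settingName)
    {
        var value = configuration[settingName];

        int result;
        if (!Int32.TryParse(value, out result))
            throw new ConfigurationException(
                "The '{0}' configuration setting value '{1}' was not a well formed integer value",
                settingName, value);

        return result;
    }
}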

Formatting Exceptions in C#

Whenever I create a custom exception type in C# I generally add a variety of overloads so that you can create one directly with a message format and selection of arguments without forcing the caller to use String.Format(). This is how the earlier example worked:

throw new NotFoundException("Invalid customer ID '{0}'", id);

All the class needs for this kind of use is a signature akin to String.Format’s:

public NotFoundException(string format,
                         params object[] args)
    : base(String.Format(format, args))
{ }

However this is not enough by itself; it’s dangerous. If we pass a raw message that happens to contain formatting instructions but no arguments (e.g. “Invalid list {0, 1, 2}”) it will throw during the internal call to String.Format(), so we need to add a simple string overload as well:

public NotFoundException(string message)
    : base(message)
{ }

As an aside I never add a public default constructor because I’m not convinced you should be allowed to throw an exception without providing some further information.

Due to the framework classes not having a varargs style ctor you might already be happy putting the formatting call at the throw site, either with String.Format() directly or via a simple extension method like FormatWith():

throw new NotFoundException(String.Format("Invalid customer ID '{0}'", id));

throw new NotFoundException("Invalid customer ID '{0}'".FormatWith(id));

One scenario where the varargs style falls down in C# is when you start adding overloads to capture any inner exception. You can’t just add it on the end of the signature due to the final params-based args parameter, so you have to add it to the beginning instead:

public NotFoundException(Exception inner,
                         string format,
                         params object[] args)
    : base(String.Format(format, args), inner)
{ }

Sadly, this is the exact opposite of what the framework does, it adds them on the end.

In my experience very little effort is put in by developers to simulate faults and validate that exception handling is behaving correctly, e.g. verifying what is reported and how. Consequently this makes the varargs overload potentially dangerous, as adding an exception argument on the end by following the framework style would cause it to be silently swallowed since no placeholder would reference it. This has never been a problem for me in practice because I always throw custom types and always try and format a string, but that’s easy when you’re the one who discovers and instigates the pattern - others are more likely to follow the pattern they’re used to, which is probably how the framework does it.
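
To make that trap concrete, here is what the mistake would look like (ex being some caught inner exception):

// Follows the framework convention of passing the inner exception last, but
// here 'ex' just binds to the params array as an unreferenced format argument
// and is silently swallowed rather than recorded as the cause.
throw new NotFoundException("Invalid customer ID '{0}'", id, ex);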

Testing Error Messages

At the tail end of the previous section I touched briefly on what I think is at the heart of why (server-side) error messages are often so poor - a lack of testing to make sure that they satisfy the main “requirement”, which is to be helpful in diagnosing a problem. When writing unit tests for error scenarios it’s all too common to see something like this:

Assert.That(() => o.DoIt(. . .), Throws.Exception);

This verifies and documents very little about what is expected to happen. In fact if “o” is a null reference the test will still pass and not even invoke the behaviour we want to exercise! Our expectations should be higher; but of course at the same time we don’t want to over-specify and make the test brittle.

If a caller is going to attempt recovery of the failure then they need to know what the type of the exception is, lest they be forced to resort to “message scraping” to infer it. Consequently it’s beneficial to verify that the exception is of at least a certain type (i.e. it may also be a derived type). Where an obvious value, such as an input argument, is a factor in the error we can loosely test that the value is contained within the message somewhere too:

const string id = "malformed-value";

Assert.That(() => o.DoIt(id),
            Throws.InstanceOf<NotFoundException>()
                  .With.Message.Contains(id));

If there are many arguments to be validated it is important to try and match on the message too, or some other attribute, to discern which of the many arguments caused the exception to be thrown. It’s all too easy when making a change to a method to find that one or more of your tests are passing because they are detecting a different error to the one they were intended to, as the null reference example above highlighted.

Whilst that takes care of our code-based contractual obligations we also have an obligation to those who will be consuming these messages to make sure they form a useful narrative within any diagnostic output. I tackled this side of the issue some years ago in “Diagnostic & Support User Interfaces” and so if you’ve experienced a sense of déjà vu reading this post then that may be a contributing factor.

Tuesday, 13 January 2015

Text Editor Outlining Modes Hide Complexity

I have this theory, but I don’t have a statistically large enough data set [1] with which to publish anything other than a random musing. My theory is that developers who use a text editor with its “Outlining Mode” [2] switched on have a tendency to write more “complex” code.

Of course tarring every developer with the same brush is not exactly going to endear me to my readership, so I fully expect there to be many who use this feature with great aplomb and are aware of its apparent shortcomings. However for those who are unsure as to why I’m proposing this theory, let me explain what it is that I’ve seen when working with developers who have this feature switched on.

Duplicate Code

The most obvious thing I’ve noticed is that developers who don’t see all the code on screen at once don’t notice all the duplication. If I’ve spotted a problem in a method then, as a natural by-product of seeing all the code for the class, I’ve also noticed the same pattern appearing in a number of other places. In contrast, when pairing with someone who can only see the method names, except for the one currently being edited, they do not automatically notice the duplication and therefore never go looking for it.

Of course the real problem is the duplication itself. I suspect that because I always see all the code I tend to spot duplication as it builds up and so will naturally refactor it as soon as the test goes green. When all you can see is method names I’m not sure how you expect to notice duplication unless you literally cut-and-pasted some other code. At least cut-and-pasted code looks identical; when you’ve written code that only appears logically similar to that in another method it can be harder to reason away the duplication and do the necessary refactoring.

Accidental Complexity

In a more general sense what appears to be happening is that, paradoxically, the tool that tries to reduce the noise and clutter whilst editing code has succeeded in creating further complexity by constantly hiding it. If you do not “feel the pain” caused by working with overly complex code you do not feel the need to refactor it, until it becomes a problem, by which point you’ve probably already lost the battle. By having all the code in a source file visible you get to see the big picture so that even if you’re making edits to one method your subconscious can still go to work on the surrounding code and nudge you when it thinks something smelly is appearing.

Debugging is another opportunity where you get to take a walk through code you’ve perhaps not worked on yourself. In effect you have the opportunity to review that code as you’re debugging it - not formally, but you can certainly make notes and raise any issues later (See “In The Toolbox - Pen & Paper”).

C# Regions

The use of regions in C# is another variation on this problem and it becomes yet another switch to turn off in the IDE to ensure all code is visible all the time. In theory being able to hide auto-generated code from developers when editing is a laudable goal, but the way I’ve seen it used is to hide big chunks of code, which just appears to act as a catalyst for the creation of behemoth source files.


Does this “theory” hold water? When I tweeted about it there were a few murmurs that at least supported vague anecdotal evidence, and nobody came straight back and cited a paper, so I don’t think it’s already been debunked. I was pretty sure it was old news too, and amusingly I did get a reply from Len Holgate with a link to a blog post he had written almost exactly 10 years earlier that touched on part of this topic: “VS.Net #Region”.

Perhaps a passer-by will leave a comment in future that provides evidence of a more scientific study of this feature. Or alternatively someone else will point out how the tool is being used incorrectly, so that I don’t feel the need to turn it off the moment I install Visual Studio.


[1] You can read that as “a handful of cases”.

[2] That’s the name used in Visual Studio, but I have also heard the term “folding editor” used for the same feature.