Wednesday, 8 September 2010

What’s Exceptional Depends on Context

The latest edition of Overload (one of the journals from the ACCU) has another episode of Matthew Wilson’s excellent Quality Matters column. The latest topic is exception handling in which he makes a reference to The Practice of Programming by Kernighan & Pike – a book which I’ve only ever skimmed so far. Hence I thought I should to read the relevant chapter and somewhat surprisingly I found the following statement:-

“Exceptions are often overused. Because they distort the flow of control, they can lead to convoluted constructions that are prone to bugs. It is hardly exceptional to fail to open a file; generating an exception in this case strikes us as over-engineering. Exceptions are best reserved for truly unexpected events, such as file systems filling up or floating-point errors.”

My gut reaction to that was “No it’s not – it depends on context”…

External Resources Always Fail

Messer’s Kernighan & Pike are obviously far more knowledgeable and experienced than me so why did they make such a bold statement? Generalising somewhat, I presume that they were suggesting that any interaction with an external resource should be handled with due care as you should expect it to fail – after all we always encounter problems with files, sockets, databases etc. And as a general rule this makes perfect sense.

Full Environmental Control

But… In the world of in-house server-side software you are often completely in control of the entire environment in which your code runs. Even for in-house desktop software the environment may be a little more hostile but you can still ensure certain invariants. Taking the “opening a file” example again and given the following conditions, would failing to open a file still be considered unexceptional?

  1. The file exists (say we just created it and control its lifetime)
  2. The permissions on the folder are correct (our server installation script set them)
  3. The server has plenty of internal resources (memory, handles etc)
  4. The process has not suffered any unrecoverable errors

In this situation I would be on the side of it being exceptional. In my experience failures under these kinds of scenarios usually point to a system configuration error [1 or 2], which means it would never have worked, or the server/process itself has become unstable [3 or 4] due to some earlier (and hence unexpected) issue.


Of course the issue is perhaps moot as all errors fall into two categories – recoverable and unrecoverable. In the scenario described above the recovery is effectively manual[*] – fix the system configuration or restart the process/server. In the real world I can’t see how you could justify the cost of designing a system to automatically cope with some/all of those potential edge cases[+]. Even the last one (process instability) can be hard to handle fully via process recycling because too much badly written code masks nasty errors like Out Of Memory or Access Violations by sinking everything so you don’t get to react to it.

In contrast the times where I can see you might be able to recover from failing to open a file are when the choice of file is controlled by the user, such as in a desktop application, or when the contents of the file can be considered optional, such as user settings.

Applying the issue wider afield to databases and remote services we end up at clustering/replication as the obvious recovery option. Even so, given the general reliability of in-house networks and (non-blade) hardware these days, I would still consider it pretty exceptional for a failure to occur. By-and-large misconfiguration is the most common reason I see for failures involving resources.

Do Or Do Not (Or There May Be a Try)

The chapter of The Practice of Programming in question was about designing interfaces so do we assume that this message is aimed more at the writers of libraries rather than applications? Given that you don’t know the context in which your code is being called do you have to choose between error codes or exceptions? Would your interfaces be better or worse if you somehow supported both so that the caller could decide? I’m sure there are many developers out there right now wincing as they stare at their legacy COM heavy codebases that have a mixture of wrappers generated by Visual C++’s #import feature where some methods throw and others return HRESULTs, some have a Chk prefix and others don’t. What I’m suggesting has more moderation to it…

In the .Net world there is a common pattern where methods by default succeed or throw an exception and a parallel set of methods named TryXxx that return a bool and use an output parameter instead. The latter ‘overloads’ allow you to avoid the cost[#] of handling an exception such as when parsing numbers and DateTimes that you can expect to be iffy (format wise) but they will still throw other exceptions which are orthogonal to the task such as Out of Memory.

Was It Just a Poke At Java?

Or maybe I’m just being too literal? The book was written over 10 years ago (back when exceptions where still not exactly mainstream) and it follows an example in Java so perhaps they were just having a pop at the choice Java had made. After all, at that time Java was still popular as a client-side technology – even outside the browser.

The fact that people like Matthew Wilson are still regurgitating this argument a decade later means there is still much to learn. I think the TryXxx pattern is a nice compromise as you get the flexibility of handling a domain failure without resorting to an unnecessary try/catch block and you also don’t accidentally mask orthogonal failures by ignoring the return code or catching an exception of too generic a type. I wonder if this also satisfies Kernighan & Pike or do they find it muddies the waters even further and still stand by their original statement?


[*] I say manual because what normally seems to happen is that the system monitoring software gets bombarded with errors and someone has to make sense of them. Once that’s done it usually involves bouncing something…

[+] In the embedded world, where lives may be at risk, there are forces way outside the normal risk/reward trade offs of an in-house corporate system.

[#] I’m referring to both the performance and maintenance costs of using exceptions.

No comments:

Post a Comment