Friday 16 January 2015

Terse Exception Messages - Part II

Quite unintentionally it looks like my post back in October last year “Terse Exception Messages” needs a second part. The initial post was about dealing with terse error messages generated by external code, however I’d forgot that terse exception messages are not just the domain of 3rd party libraries, they can appear in “our code” too. Take this example from some real world production code:

throw new NotFoundException(“Invalid ID”);

Sadly this will likely appear in any diagnostic log file with only the usual thread context information to back it up, e.g.

<date, time, thread, etc.> ERROR: Invalid ID

If you also have a stack trace to work from, that could give you a clue as to what type the ID refers to. You might also be able to hunt around and find an earlier diagnostic message that could give you the value too, but you’re probably going to have work for it.

Clearer Exception Messages

At a bare minimum I would also include the “logical” type and, if possible (and it usually is), then the offending value too. As such what I would have thrown is something more like this:

throw new NotFoundException(“Invalid customer ID ‘{0}’”, id);

This message contains the two key pieces of information I know are available at the throw site and so would generate a more support-friendly message like this:

<date, time, thread, etc.> ERROR: Invalid customer ID ‘1234-5678’

Discerning Empty Values

You’ll notice that the value, which in this case was a string value, is surrounded by quotes. This is important in a message where it might not be obvious that the value is empty. For example if I had not added the quotes and the customer ID was blank (a good indication of what the problem might be) then it would have looked thus:

ERROR: Invalid customer ID

Unless you know intimately what all messages look like, you could easily be mistaken for believing that the error message contains little more than our first example, when in fact the value is there too, albeit invisibly. In the improved case you would see this:

ERROR: Invalid customer ID ‘’

For strings there is another possible source of problems that can be hard to spot when there are no delimiters around the value, and that is leading and trailing whitespace. Even when using a fixed width font for viewing log files it can be tricky to spot a single erroneous extra space - adding the quotes makes that a little easier to see:

ERROR: Invalid customer ID ‘1234-5678 ’

As an aside my personal preference for using single quotes around the value probably stems from many years working with SQL. I’ve lifted values out of log files only to paste them into a variety of SQL queries countless times and so automatically including the quotes seemed a natural thing to do.

Discerning Incorrect Values

It’s probably obvious but another reason to include the value is to discern the difference between a value that is badly formed, and could therefore never work (i.e. a logic error), and one that is only temporarily unusable (i.e. a runtime error). For example, if I saw the following error I would initially assume that the client has passed the wrong value in the wrong argument or that there is an internal logic bug which has caused the value to become “corrupted”:

ERROR: Invalid customer ID ‘10p off baked beans’

If I wanted to get a little more nit-picky about this error message I would stay clear of the word “invalid” as it is a somewhat overloaded term, much like “error”. Unless you have no validation at all for a type, there are usually two kinds of errors you could generate - one during parsing and another during range validation. This is most commonly seen with datetimes where you might only support ISO-8601 formatted values which, once parsed, could be checked to ensure an end date does not precede a start date.

To me the kind of error below would still be a terse message; better than the original one, but could be even better.

ERROR: Invalid datetime ‘01/01/2001 10:12 am’

My preferred term for values that fail “structural validation” was adopted from XML parsing - malformed. A value that parses correctly is considered “well formed” whereas the converse would be “malformed”. Hence I would have thrown:

ERROR: Malformed datetime ‘01/01/2001 10:12 am’

For types like a datetime where there are many conventions I would also try and include the common name for the format or perhaps a template if the format is dynamically configurable:

ERROR: Malformed datetime ‘01/01/2001 10:12 am’ (Only ISO-8601 supported)

ERROR: Malformed datetime ‘01/01/2001 10:12 am’ (Must conform to ‘YYYY-MM-DD HH:MM:SS’)

Once a value has passed structural validation if it then fails “semantic validation” I would report that using an appropriate term, such as “out of range” or explicitly describe the comparison. With semantic validation you often know more about the role the value is playing (e.g. it’s a ‘start’ or ‘end’ date) and so you can be more explicit in your message about what the problem is:

ERROR: The end date ‘2010-01-01’ precedes the start date ‘2014-02-03’

Custom Parsing Methods

As we saw back in my original post the language framework parsing methods are often horribly terse, even when failing to parse a simple value like an integer. The reason is that, like all general purpose functions, there is a balance between performance, security, etc. and so that means including the value in the exception might be undesirable in some classes of application.

However in the systems on which I work this is not a problem and I’d happily trade-off better diagnostics for whatever little extra garbage this might produce, especially given that it’s only in (hopefully) exceptional code paths.

My solution to this problem is the proverbial “extra level of indirection” and so I wrap the underlying TryParse() framework methods with something that produces a more satisfying diagnostic. In C# extension methods are a wonderful mechanism for doing that. For example I’ve wrapped the standard configuration mechanism with little methods that read a string value, try to parse it, and if fails then include the setting name and value in the message. Consequently instead of the usual terse FormatException affair you get when parsing an int with code like this:

int.Parse(settings[“timeoutInMs”]);

You invoke an extension method like so:

settings.ReadInt(“timeoutInMs”);

If this fails you’ll get a much more pleasing error message along these lines:

ERROR: The ‘timeoutInMs’ configuration setting value ‘3.1415’ was not a well formed integer value

Formatting Exceptions in C#

Whenever I create a custom exception type in C# I generally add a variety of overloads so that you can create one directly with a message format and selection of arguments without forcing the caller to use String.Format(). This is how the earlier example worked:

throw new NotFoundException(“Invalid customer ID ‘{0}’”, id);

All the class needs for this kind of use is a signature akin to String.Format’s:

public NotFoundException(string format,
                         params object[] args)
    : base(String.Format(format, args))
{ }

However this is not enough by itself, it’s dangerous. If we pass a raw message that happens to contain formatting instructions but no arguments (e.g. “Invalid list {0, 1, 2}”) it will throw during the internal call to String.Format(), so we need to add a simple string overload as well:

public NotFoundException(string message)
    : base(message)
{ }

As an aside I never add a public default constructor because I’m not convinced you should be allowed to throw an exception without providing some further information.

Due to the framework classes not having a varargs style ctor you might already be happy putting formatting calls in the callee, either with String.Format() directly or via a simple extension method like FormatWith():

throw new NotFoundException(String.Format(“Invalid customer ID ‘{0}’”, id));

throw new NotFoundException(“Invalid customer ID ‘{0}’”.FormatWith(id));

One scenario where the varargs style falls down in C# is when you start adding overloads to capture any inner exception. You can’t just add it on the end of the signature due to the final params-based args parameter, so you have to add it to the beginning instead:

public NotFoundException(Exception inner,
                         string format,
                         params object[] args)
    : base(String.Format(format, args), inner)
{ }

Sadly, this is the exact opposite of what the framework does, it adds them on the end.

In my experience very little effort is put in by developers to simulate faults and validate that exception handling is behaviour correctly, e.g. verifying what is reported and how. Consequently this makes the varargs overload potentially dangerous as adding an exception argument on the end by following the framework style would cause it to be silently swallowed as no placeholder would reference it. This has never been a problem for me in practice because I always throw custom types and always try and format a string, but that’s easy when you’re the one who discovers and instigates the pattern - others are more likely to follow the pattern they’re used to, which is probably how the framework does it.

Testing Error Messages

At the tail end of the previous section I touched briefly on what I think is at the heart of why (server-side) error messages are often so poor - a lack of testing to make sure that they satisfy the main “requirement”, which is to be helpful in diagnosing a problem. When writing unit tests for error scenarios it’s all too common to see something like this:

Assert.That(() => o.DoIt(. . .), Throws.Exception);

This verifies and documents very little about what will is expected to happen. In fact if “o” is a null reference the test will still pass and not even invoke the behaviour we want to exercise! Our expectations should be higher; but of course at the same time we don’t want to over specify and make the test brittle.

If a caller is going to attempt recovery of the failure then they need to know what the type of the exception is, lest they be forced to resort to “message scraping” to infer its type. Consequentially it’s beneficial to verify that the exception is of at least a certain type (i.e. it may also be a derived type). Where an obvious value, such as an input argument, is a factor in the error we can loosely test that the value is contained within the message somewhere too:

const string id = “malformed-value”;

Assert.That(() => o.DoIt(id),
            Throws.InstanceOf<ValidationException>()
                  .And.Message.Contains(is));

If there are many arguments to be validated it is important to try and match on the message too, or some other attribute, to discern which of the many arguments caused the exception to be thrown. It’s all to easy when making a change to a method that you end up finding one or more of your tests are passing because they are detecting a different error to the one they were intended to, as the null reference example above highlighted.

Whilst that takes care of our code-based contractual obligations we also have an obligation to those who will be consuming these messages to make sure they form a useful narrative within any diagnostic output. I tackled this side of the issue some years ago in “Diagnostic & Support User Interfaces” and so if you’ve experienced a sense of déjà vu reading this post then that may be a contributing factor.

No comments:

Post a Comment