Tuesday, 26 November 2013

Logging Stack Traces Should be Unnecessary

I like a nice clean log file. The left hand margin should be a fixed width and easy on the eye so that my built-in pattern recognition gets to work as effectively as possible. It should also be easy to parse with the age old UNIX command line tools (and LogParser) without me having to pre-process the file first to get shot of the noise.

What really messes with this is when I see stack traces in the log file. They are always huge and contain far too much duplication because modern software design principles suggest we write short, simple methods, with overloads and reuse by forwarding calls instead of cutting-and-pasting functionality:-

. . .
void DoSomething(string[], int, int)
void DoSomething(string[])
void DoSomething(IEnumerable<string>)
. . .

So, seriously, has that level of detail ever enabled you to solve a problem? Without knowing what the parameter values are how much do stack traces even tell you? Agreed, if all you’ve got is a crash dump then a stack trace is invaluable, but I’m talking about logging stack traces which by definition means that you’re already writing other diagnostic information too.

Design Smell?

I’ve always favoured keeping stack traces out of log files on the basis that they are of little value in comparison to other techniques, and so far I’ve found that I don’t miss them. In my experience, if the design of the code is right and the error message (e.g. exception message) is well written it should be fairly easy to reason about where in the code the problem is, which is effectively what a stack traces tells you. In short that means a simple GREP on the source code to find where the message is generated.

You might argue that a stack trace tells you that up front so why make more effort than necessary, which is of course true, but you’re also going to need the context, which a stack trace will not tell you unless it logs all its parameter values too. And for that we need to visit the log file, and if we’re going to do that how much time and effort are we really saving at the cost of extra background noise? More importantly this is the moment when the quality of our log message entries will begin to shine or we find ourselves lost in the trees looking for the wood. Hopefully during development you’ve already been dog-fooding your own logs to get a feel for how well you can solve real support problems when using them.

Test Infrastructure

The second part of the puzzle of avoiding needing all this gratuitous text is an ability to reproduce the problem easily within a debugger. Hopefully from the context you should be able to explore the problem in isolation - varying different inputs to see how the code is reacting. If the design is simple you should easily be able to step through an existing test case and see where the points of trouble might be, e.g. some missing or dodgy error handling.

At this stage, while the overarching goal is to fix the problem at hand, the fact that a bug has crept in means that the development process has failed to some degree and therefore I’d be taking this as an opportunity to compensate by doing a review. It’s likely I won’t action anything there-and-then, instead favouring to make some notes so that any eventual action can be triaged and prioritised separately.

Depending on the complexity of the system, this is the point at which I might rely on any custom tooling I’ve built to help isolate certain aspects of the system so that they can be exercised in a tightly controlled and deterministic environment, e.g. console app test harness that hosts the various services in-process.

Minimal Traces

What I despise most about many stack traces I see in forum posts is the sheer volume of noise. There is clearly some value in them, more so for “generic” unhandled exceptions like a NullReferenceException that have no message, but do they have to extend to reams and reams of text? When I log an exception I write the entire exception chain with both the type and the message; all on a single line. This is done using an extension method for the Exception type in C#. The same could easily be done for the stack trace, the only reason I haven’t done it is because I haven’t needed to, yet. But if I did write one what would it do?

Firstly I’d strip away all the arguments as they are fairly meaningless without their values. I’d also collapse all overloads into a single method as forwarding calls are uninteresting too. The bottom part of any stack trace is a load of boilerplate system code, so I’d chuck that away and make the entry point to my code the first interesting point of reference, which I should be able to find because assembly names and namespaces tend to follow a corporate convention. The same is probably true for the top of the stack, but the very first frame may be meaningful so perhaps I’d keep that, although if I had to keep just one method name it would be the last method of my code I’d keep as that is the first point that has any real meaning. Finally, I’d rotate what’s left and stitch it together with pipes, probably something like this (ignoring the unfortunate word-wrap):-

2001-01-01 ERR Unhandled exception - OrderController.GetOrder|>ProductService.FindProduct {NullReferenceException}

I definitely don’t need the full namespace names, just the class and method, although I’d argue that with decent method names even the classes might easily be inferred from just the method name and context. Doesn’t that look a whole lot less noisy?

Whilst I might not convince you to drop stack traces entirely from your logs, at least entertain the idea that you can represent them in a far more support-friendly fashion than what the runtime throws out by default.

No comments:

Post a Comment