Friday 9 November 2012

Include the Units in Configuration Data

[I have no right to chastise anyone else about this because I do it myself - it just always seems so “unnecessary” at the time… Hence this post is an attempt to keep me honest by giving my colleagues the opportunity to shove it back in my face every time I do it in the future.]

Say I asked you to change a “timeout” setting to “3 minutes” what value would you instinctively use?

  1. 0.05
  2. 3
  3. 180
  4. 180000

Although I’m tempted to say that milliseconds are still the de-facto resolution in most low-level APIs I deal with these days, I have seen nanoseconds creeping in, even if the actual resolution of the implementation is actually less than that. However that’s what the internal API deals with. These kinds of numbers are often way too big and error prone for humans to have to manipulate; if you miss a single zero off that last example it could make a rather important difference.

If you don’t need anywhere near millisecond precision, such as dealing with a run-of-the-mill database query, you can reduce the precision to seconds and still have ample wiggle room. Dealing with more sane numbers, such as “3”, feels so much easier than “0.05” (hours) and is probably less likely to result in a configuration error.

There is of course a downside to not using a uniform set of units for all your configuration data - it now becomes harder to know what the value “3” actually represents when looking at an arbitrary setting. You could look in the documentation if you (a) have some and (b) it’s up to date. If you’re privileged enough to be a developer you can look in the source code, but even then it may not have a comment or be obvious.

One answer is to include the units in the name of the setting, much like you probably already do when naming the same variable in your code. Oh, wait, you don’t do it there either? No, it always seems so obvious there, doesn’t it? And this I suspect is where I go wrong - taking the variable and turning it into an externally configurable item. The variable name gets copied verbatim, but of course the rest of the source code context never travels with it and consequently disappears.

So, the first solution might be to name the setting more carefully and include any units in it. Going back to my original example I would call it “TimeoutInMinutes”. Of course, one always feels the urge to abbreviate and so “TimeoutInMins” may be acceptable too, but that’s a whole different debate. Applying the same idea to another common source of settings - file sizes - you might have “MaxLogFileSizeInKB”. The abbreviation of Kilobytes to KB though is entering even murkier waters because KB and Kb are different units that only differ by case. If you’re dealing with case-insensitive key names you better watch out[1].

Another alternative is to look at adding units to the value instead. So, in my timeout example the value might be “3mins”, or you could say “180 secs” if you preferred. If you’re used to doing web stuff (i.e. I’m thinking CSS) then this way of working might seem more natural. At its expense it adds a burden to the consuming code which must now invoke a more complicated parser than just strtoul() or int.Parse()[2]. It’s not hard to add this bit of code to your framework but it’s still extra work that needs thinking about, writing and testing[3].

For timeouts in particular .Net has the TimeSpan type which has a pretty flexible string representation that takes hours, minutes, seconds, etc into account. For the TimeSpan type our example would be “00:03:00”. But would you actually call it “TimeoutInTimeSpan”? It sounds pretty weird though. Better sounding would be “TimeoutAsTimeSpan”. I guess you could drop the “InTimeSpan” suffix if you were to use it consistently for timeouts, but then we’re back to where we started because I’m not aware of any similar scheme for representing bytes, kilobytes, megabytes, etc.

 

[1] This is one of those situations where case-sensitivity is often not formally defined either. Explicit use of the == operator or implicit use via a Dictionary means no one cares until someone else sticks a different-cased duplicate in by accident and things don’t quite work as expected.

[2] Personally I prefer to wrap this kind of parsing because a raw “invalid format” style exception often tells you pretty much nothing about which value it was that choked. Extracting related settings into a richly-typed “Settings” class up-front means you don’t end up with the parsing exception being thrown later at some horribly inconvenient time. And it reduces the need to mock the configuration mechanism because it’s a simple value-like type.

[3] Not necessarily in that order. It’s perfect TDD/unit test fodder though, so what’s not to like…

2 comments:

  1. Great subject, nice post!

    I think the solution is to let the code carry the quantity (value with unit) all the way and convert it to the API at hand at the 'last responsible moment'.

    So I'd say: preferably one of:
    - "3 min"
    - "180 s"

    Or if surrounding values are in [ks] may be:
    - "0.18 ks"

    See for example:
    - http://martin-moene.blogspot.com/2012/07/light-on-whole-value.html
    - https://github.com/martinmoene/PhysUnits-RT

    Just to have it mentioned: The SI name for 1024 is kibi Ki (pending adoption), see http://physics.nist.gov/cuu/pdf/sp811.pdf pages 7 and 74.

    ReplyDelete
  2. Thanks. Credit where it's due your recent musings around whole value types goes some way to what I've been reflecting on recently. I have another post on primitive domain types already queued up :-).

    ReplyDelete