Tuesday, 20 December 2011

Diagnostic & Support User Interfaces

When people talk about user interfaces they are invariably talking about the face of a product or system that the end-users see and/or interact with. But within a complex system there are other user interfaces too that are not seen or interacted with by end-users at all; instead it is fellow developers and, more importantly, support staff that get to see this less salubrious side.

In a small well-established team of developers there will naturally be a lot of cross fertilisation meaning that they will be well versed in significant portions of the system. In contrast, a larger team will have skills spread across different disciplines and so from a support perspective it starts to approach the knowledge levels of a support team - far less. In my experience support teams do not get the luxury of devoting themselves to learning one system instead out; instead they invariably have to support a whole bunch of in-house systems written by different teams using different technologies and having different ideas about what is required to make a product or system “supportable”.

So what kinds of interfaces am I referring too? Well, the most traditional are status monitoring pages and custom admin tools; the kind of utilities built specifically for the purpose of administering and supporting the system. Due to the time constraints imposed and the lack of direct business value they are usually not given any real TLC but are just thrown together, presumably under the misguided assumption that it will never really be used except by them or their close colleagues. But these scripts and command line tools are not the only faces of the system you’ll see when doing support duties; before that you’ll probably be faced with hunting through the custom format log files, which may or may not be easily consumable. Stretching the analogy somewhat further you could encompass configuration files and the database schema as interfaces of sorts. They’re not graphical in themselves, but careful though about their structure can ensure that the off-the-shelf tools stand more chance of making interaction a more productive experience.

Anyway, here is my list of things to think about. I’ve not done any “production” UI programming for a few years and it’s been even longer since I read About Face (the 1st edition no less) so I’ll no doubt contradict some modern wisdom along the way. Your target audience may not be novice computer users, but they do have the clock against them and they won’t thank you for making an already difficult situation any harder.

Command-Line Tools

Server-side developers aren’t renowned for their user interface programming. After all, that’s why you have UX designers and developers whose major skills lie in the area of UI programming. But that doesn’t mean we have be to sloppy about the interfaces for the scripts and tools we do create. I’m not suggesting for a moment that we spend hours gold-plating our utilities, but I am proposing that you adhere to some of the common conventions that have been established over the years:-

Provide a help switch (-?, -h, --help) so that the user can see what the options are. Documentation, if you can even find it for a utility, can easily get out of step with the tool usage and so you need to be able to see which switch it is that you’re apparently missing, or that is no longer relevant for the task at hand. Even if you’re au fait with the tool you might just want to double-check its usage before going ahead and wiping out last years data.

Provide a version switch (-v, --version) so the user can check its compatibility, or if it’s become out of date. Many tools are installed on the application servers or alongside client binaries, but sometimes those tools are just “passed around” and then become available on some desktop share without you realising. You can even dump the version out on every invocation as the first line; this can be particularly useful if you’re capturing stdout when running it as a scheduled task.

Provide long switch names (--switch-name) by default and use them in documentation so it is obvious what the example command is doing. Short switch names are for advanced users and so should be considered the optional case. Positional parameters may save the user some typing but when you use concepts like integer based surrogate keys it’s not exactly obvious what an example may be suggesting:-

C:\> DC 1234 2001-02-03 1

compared to:-

C:\> CustomerAdmin delete --customer-id 1234
  --date 2001-02-03 --force

Use a consistent switch format. I’ve had to use utilities where one developer has made the switch names case-sensitive and another case-insensitive. Sometimes they use ‘/’ as the switch marker and other times it’s a ‘-‘. Often the case-sensitivity issue is down to lazy programming - using == or Equals() instead of the correct method. If you’re on Windows try and support both styles as even Microsoft’s developers aren’t consistent. Better yet download a 3rd party library that lets you use either style as there are plenty to choose from[#].

Return a meaningful result code. The convention (on Windows) is that zero means success and non-zero means failure or partial success. Some tools do provide more detailed result codes, but fundamentally you want assume that anyone trying to automate your tool on Windows is going to write something along these lines as a first-order approximation:-

CustomerAdmin.exe --do-something-funky
if errorlevel 1 exit /b 1

Sadly, 0 is also the default result code that most frameworks will return for an unhandled exception which really makes the whole process rather messy and means that you need to get a Big Outer Try Block in place in main() as part of your bootstrap code.

Support a read-only/test/what-if mode. Most of my support tools have grown out of test harnesses and other development tools and so I’ve naturally added switches that allow you to run them against production in a benign mode so that you can see what would happen. PowerShell has taken this idea forward formally with its common -WhatIf switch. When you’re not 100% sure of what the expected action is or how many items it will act upon this can be a life saver.

Most of the same comments apply to scripts, but they have the added advantage that they are deployed in source code form and so you have the ability to see the logic first hand. Of course when you’re in the thick of it you need to be able to trust the tool to do what it says it will.

Logging

When faced with a production issue, log files will often become the first port of call after you’ve established which process it is that has failed. But log files[+] often serve two different masters and the latter one can get forgotten once the development work has finished. During development it is probably the developer that spends the most time pouring over the output and therefore they have a habit of dumping every last bit of detail out. After the feature is complete though the main consumer then becomes the support team. What they don’t want to see is reams of output so that the proverbial wood is lost behind the trees.

Make sure your log messages make sense. This may seem obvious but spelling mistakes, whilst cute in a time of stability, are a distraction when you’re up to your eyes in it. Log messages should form a narrative that lead you up to the moment of interest, and that narrative should be written with the reader in mind - mostly likely a support engineer - someone who doesn’t know the code intimately. By all means be succinct, but ensure you’re not being ambiguous either.

Use the console output for high-level commentary, warnings and errors only. If extra output is required it should be enabled by an extra --verbose switch (which can easily be added temporarily under automation) or by having a companion log file with more detail. I tend to favour the second approach so that the detail is always there if needed but so that it’s not in your face the majority of the time. Remember we’re talking about in-house stuff here and so you have the luxury of controlling the environment in ways a 3rd party can often only dream of.

Keep to a low number of severity levels. If you have a large number of severity levels developers will not be able to log messages consistently. I recently saw a tweet that suggested you use only the following levels - FYI, WFT & OMG. If you put yourself in the support person’s shoes that’s actually a pretty good scale. Once the issue is triaged and the development team are involved the TRACE or DEBUG level output suddenly becomes of more use, but until then keep it simple.

Write the messages as text. A simple lined-based text format may seem very old-fashioned, but there are a plethora of tools out there designed to slice-and-dice text based output any way you want, e.g. grep, awk & sed. Plus you have modern GUI based Tail programs that support highlighting to really make those important messages shine, e.g. BareTail & LogExpert. Layer on top something like LogParser which supports a SQL like syntax and support for many formats out of the box and you’re in good shape to triage quickly. Also don’t underestimate the importance of NotePad as a tool of last resort.

Make the log message format simple. If you’re writing a separate log file include at least the date & time, and preferably the process ID (PID). If the tool is multi-threaded then the thread ID (TID) will be invaluable as well, along with the highest precision[*] of time (e.g. milliseconds or better) to spot potential race conditions. I’ve found the following format to be very easy to scan by eye and parse by tool as the left-hand side is of fixed width (the PID & TID are padded appropriately):-

<ISO date & time> <PID> <TID> <severity> <message...>

2001-01-01 01:02:03.456 1234 3456 INF Starting...
2001-01-01 01:02:04.123 1234 5678 ERR Stuff broke...

Enclose string values in some form of quotes or braces to make empty strings more visible. As I posted very recently, empty strings are a form of Null Object and so they have a habit of being silently propagated around before finally causing odd behaviour. Personally I like to enclose strings in single quotes because I often have to use those values in SQL queries and so then it’s already quoted for me. As an example I once spent far longer than necessary investigating an issue because the log message said this:-

Valuing trade ...

instead of:-

Valuing trade ''...

Include all relevant context, preferably automatically. Messages that just tell you something bad happened are useless without the context in which the operation is taking place. There is nearly always some higher-level context, such as a request ID or customer name that you can associate with the PID or TID to help you narrow down the data set. Debugging a complex or lengthy process just so you can find out the set of inputs to find that the data is dodgy is not a good use of time and costs the business money, especially if the relevant variables are in scope at the point of writing the error message![^]

Don’t spam the log, write a summary. If you have a process that deals with particularly dirty data, or you have a service that is temperamental try not to log every single indiscretion as a warning or error. Instead try and batch them up, perhaps by item count or time window and write a single message that gives a summary of the issue. At a high-level you’ll just be interested in seeing if the problem is getting worse and so using a tolerance is even better as once you have an idea of what “normal operation” is you can save the scary messages for when they’re really needed.

One of the reviews I do during system testing is to look at the log output. I try and put myself in the position of the support engineer and read the log messages to see if they make sense to me (I often only have a vague appreciation of the process in question). The thing I’m most looking out for is log spam which is usually solved by downgrading messages from, say, INFO to TRACE so that the main output is all about the big picture. It should contain virtually no warnings or errors to avoid support staff becoming desensitised by noise and when they are present they should hopefully be backed by more detailed diagnostics somewhere else.

 

[*] Don’t be fooled into thinking that because a datetime value can be formatted with a milliseconds part that it is actually that accurate. IIRC 16-bit Windows used to have a resolution of 50 ms whilst the NT lineage was better at 10 ms. If you wanted better accuracy than that you needed to resort to the Performance Counters API.

[#] My own free C++ Core library has a reasonably decent parser should you want to tap into the 3rd party option.

[+] I have mixed sentiments about the use of log files. On the one hand I feel they should largely be unnecessary because we should be able to anticipate, test for and handle most types of failure gracefully. But I’m also aware that in-house systems rarely have that level of robustness as a requirement and so we’re often forced into using it as a defensive measure against inadequate test environments and the other in-house external systems.

[^] I clearly remember spending a few hours at the weekend debugging a financial process that was failing with the message “Failed to correlate pairs”. What annoyed me most was that the developer had the two failing currency names in variables in scope but had not bothered to include them in the message! I fixed the error message first thing Monday morning...

Sunday, 11 December 2011

New SQL Tools - SS-Unit, SS-Cop & sql2doxygen

At my current client I’ve spent far more time working with SQL (and specifically SQL Server) then ever before; but that is probably quite apparent from the bias of blog posts over the last year or so. With any new venture you soon find yourself drawing on your past experiences and in this case I’ve been looking for some of the equivalent tools in the SQL world that I’d use normally use with C# & C++. But I’ve found them hard to come by. It’s possible I’m looking in the wrong places but it’s not as if there are a stream of vendor adverts either trying to distract me and that gives me a feeling it’s not worthy of a prolonged hunt.

There is another reason why I may have not searched overly hard, and that’s because I find it more fun to build something myself - it’s a great learning exercise. Mind you that’s not the right answer for my client where we have both money and far more important things to be doing than building tooling - assuming this is we can find the right alternatives[*]. What I’m releasing below are my clean-room[+] versions of some of the tools we knocked-up to get ourselves going.

Naturally as these begin to surpass what we lashed up they become 3rd party candidates themselves to take over the same duties and my client wins too because it no longer has to consider supporting the code internally. I’ve always made the source code available for my stuff so that others have a chance to tinker or fix bugs and these are no exception. Of course given that they are T-SQL based (or PowerShell in the case of sql2doxygen), it would be pretty hard for me to keep it secret anyway!

SS-Unit - SQL Server Unit

This is a pure T-SQL based unit testing framework. You write your unit tests, in T-SQL, in the familiar xUnit style and then execute them by invoking the SS-Unit test runner (a stored procedure). The framework has support for both the test and fixture level SetUp and TearDown helpers which are more prevalent in SQL tests because of the static data dependencies you often need to satisfy. It also distinguishes between being run interactively, in something like SQL Server Management Studio (SSMS), and being run in batch mode with, say, SQLCMD so that it can be integrated into your Continuous Integration system for automated SQL unit testing.

The .zip file package, which includes a trivial example database and the unit tests for the framework itself is available on my web site here on the SQL section page. There is also an online copy of the manual if you just want to have a nosy.

SS-Cop - SQL Server Schema Cop

Earlier this year I posted about “DbCop”, a StyleCop/FxCop like tool that a team-mate had put together for use on the project. That generated quite a bit of interest and a few people asked if there was any chance of it being released. As I point out in a footnote[+] that isn’t really an option so instead here is my take on the idea. Clearly FxCop does some pretty deep analysis and I’m not suggesting this is going to provide anything of that magnitude. But there are a number of checks that we can do that ensure style conventions are being adhered to and common mistakes, like forgetting a primary key, are avoided. I have a few other new ideas too, such as trying to detect when a foreign key might be missing.

The .zip package is available from the same SQL page as SS-Unit and also contains an online copy of the manual. Not unsurprisingly the unit tests for SS-Cop are written using SS-Unit and so you could consider it a more realistic example of how to use SS-Unit - SS-Cop was written using a Test-Driven Development (TDD) approach.

sql2doxygen - SQL Doxygen Input Filter

Funnily enough the chap that wrote DbCop was the same person that introduced me to Doxygen - a tool for generating API style documentation from source code. What attracted me was the fact that it could generate useful documentation even when no effort had been made by the developer! Better still was that with only a small change in your comment style Doxygen could handle so much more - no horrendous tags or mark up like you get with JavaDoc or the standard C# offering. There are still a bunch of special tags if you need them but they don’t detract from the comments which is what you need when you’re actually trying to read them as you understand the source code.

Anyway, Doxygen supports many programming languages out of the box, but sadly not SQL. However it does provide you with a mechanism for supporting any language as long as you can transform the source files into something Doxygen understands; these are called input filters and are configured with the INPUT_FILTER setting. This is the command line of a process or batch file to execute for each source file and in the case of sql2doxygen it’s a PowerShell script that turns T-SQL code into C.

The sql2doxygen .zip package contains the Doxygen filter script and a manual describing what T-SQL styles the script understands. It is of course available from the same page I’ve mentioned twice already above. Both SS-Unit and SS-Cop are documented with Doxygen using the sql2doxygen filter and so examples of the output are available online here and here.

 

[*] These tools are not deployed into production and so if we picked, say, an open source project that went stale and became unusable, or bought a tool for a lesser known vendor that went bust the direct impact on the business is none. There is of course an indirect impact because we use the tools to help us deliver faster and with better quality and that would be compromised.

[+] Although the stuff we knocked up at my client’s is not the kind of thing I could imagine they would ever try and turn into a product (IT is not their business for starters) the code is clearly their property. I also wouldn’t know where to start trying to get permission to release it. Then there is the whole Fred Brooks approach which is to just throw it away anyway and learn from your mistakes instead. And this is the approach I have decided to take here.

Monday, 28 November 2011

Null String Reference vs Empty String Value

One of the first things I found myself in two minds about when making the switch from C++ to C# was what use to represent an empty string - a null reference or an empty string (i.e. String.Empty).

C++ - Value Type Heaven

In C & C++ everything is a value type by default and you tend to avoid dynamically allocated memory as a matter of course, especially for primitive types like an int. This leads to the common technique of using a special value to represent NULL instead:-

int pos = -1;

Because of the way strings in C are represented (a pointer to an array of characters) you can use a NULL pointer here instead, which chalks one up for the null reference choice:-

int pos = -1;
const char* name = NULL;

But in C++ you have a proper value type for strings - std::string - so you don’t have to delve into the murky waters of memory management[*]. Internally it still uses dynamic memory management[+] to some degree and so taking the hit twice (i.e. const std::string*) just so you can continue to use a NULL pointer seems criminal when you can just use an empty string instead:-

int pos = -1;
std::string name(“”);

I guess that evens things up and makes the use of a special value consistent across all value types once again; just so long as you don’t need to distinguish between an empty string value and “no” string value. But hey, that’s no different to the same problem with -1 being a valid integer value in the problem domain. The c++ standard uses this technique too (e.g. string::npos) so it can’t be all bad…

If you’re less concerned about performance, or special values are a problem, you can happily adopt the use of the Swiss-army knife that is shared_ptr<T>, or the slightly more discoverable boost::optional<T> to handle the null-ness consistently as an attribute of the type:-

boost::shared_ptr<int> pos;
boost::optional<std::string> name;

if (name)
{
  std::string value = *name;
  . . .

C# - Reference Type Heaven

At this point in my transition from C++ to C# I’m still firmly in the empty string camp. My introduction to reference-based garbage-collected languages where the String type is implemented as a reference-type, but has value-type semantics does nothing to make the question any easier to answer. Throw the idea of “boxing values” into the equation (in C# at least) and about the only thing that you can say is that performance is probably not going to matter as much as you’re used to and so you should just let go. But, you’ll want something a little less raw than doing this though for primitive types:-

object pos = null;
string name = null;

if (pos != null)
{
  int value = (int)pos;
  . . .

If you’re using anything other than an archaic[$] version of C# than you have the generic type Nullable<T> at your disposal. This, with its syntactic sugar provided by the C# compiler, provides a very nice way of making all value types support null:-

int? pos;
string name;

if (pos.HasValue())
{
  int value = pos.Value;
  . . .

So that pretty much suggests we rejoin the “null reference” brigade. Surely that’s case closed, right?

Null References Are Evil

Passing null objects around is a nasty habit. It forces you to write null-checks everywhere and in nearly all cases the de-facto position of not passing null’s can be safely observed with the runtime picking up any unexpected violations. If you’re into ASSERTs and Code Contracts you can annotate your code to fail faster so that the source of the erroneous input is highlighted at the interface boundary rather than as some ugly NullReferenceException later. In those cases where you feel forcing someone to handle a null reference is inappropriate, or even impossible, you you can always adopt the Null Object pattern to keep their code simple.

In a sense an empty string is the embodiment of the Null Object pattern. The very name of the static string method IsNullOrEmpty() suggests that no one is quite sure what they should do and so they’ll just cover all bases instead. One of the first extension methods I wrote was IsEmpty() because I wanted to show my successor that I was nailing my colours to the mast and not “just allowing nulls because I implemented using a method that happens to accept either”.

Sorry, No Answers Here… Move Along

Sadly the only consistency I’ve managed to achieve so far is in and around the Data Access Layer where it seems sensible to map the notion of NULL in the database sense to a null reference. The Nullable<T> generic fits nicely in with that model too.

But in the rest of the codebase I find myself sticking to the empty string pattern. This is probably because most of the APIs I use that return string data also tend to return empty strings rather than null references. Is this just Defensive Programming in action? Do library vendors return empty strings in preference to null references because their clients have a habit of not checking properly? If so, perhaps my indecision is just making it harder for them to come to a definitive answer…

 

[*] Not that you would anyway when you have the scoped/shared_ptr<T> family of templates to do the heavy lifting and RAII runs through your veins.

[+] Short string optimisations notwithstanding.

[$] I know I should know better than to say that given there are people still knee deep in Visual C++ 6 (aka VC98)!

Wednesday, 23 November 2011

I Know the Cost but Not the Value

[I wrote the vast majority of this post “the morning after the night before”. I hope that 9 months later I’ve managed to finish it in a more coherent manner than I started it…]

Back in February I made a rare appearance at the eXtreme Tuesday Club and got to talk to some of the movers and shakers in the Agile world. Professionally speaking I’ve mostly been on the outside of the shop looking in through the window wondering what all the commotion is inside. By attending XTC I have been hoping to try and piece together what I’ve seen and read about the whole “movement” and how that it works in practice. As I documented in an earlier post “Refactoring – Do You Tell Your Boss?” I’ve long been struggling with the notion of Technical Debt and how I can help it get sold correctly to those who get to decide when we draw on it and when it’s paid back.

That night, after probably a few too many Staropramen, I finally got to enter into a discussion with Chris Matts on the subject and what I can’t decide is whether we were actually in agreement or, more likely, that I didn’t manage to explain correctly what I believe my relationship is with regards to identifying Cost and Value; in short, I can the former (Cost), but not the latter (Value). So maybe my understanding of what cost and value are is wrong and that’s why I didn’t quite get what he was saying (although I can’t discount the effects of the alcohol either). This post is therefore an opportunity for me to put out what I (and I’m sure some of my colleagues agree) perceive to be our place in the pecking order and the way this seems to work. I guess I’m hoping this will either provide food for thought, a safe place for other misguided individuals or a forum in which those in the know can educate the rest of us journeymen...

From the Trenches

The following example is based on a real problem, and at the time I tried to focus on what I perceived to be the value in the fixes so that the costs could be presented in a fair way and therefore an informed[*] choice would then be made.

The problem arose because the semantics of the data from an upstream system changed such that we were not processing as much data as we should. The problem was not immediately identified because the system was relatively new and so many upstream changes had been experienced that it wasn’t until the dust started to settle that the smaller issues were investigated fully.

The right thing to do would be to fix the root problem and lean on the test infrastructure to ensure no regressions occurred in the process. As always time was perceived to be a factor (along with a sprinkling of politics) and so a solution that involved virtually no existing code changes was also proposed (aka a workaround). From a functional point of view they both provide the same outcome, but the latter would clearly incur debt that would need to be repaid at some point in the future.

Cost vs Value

It’s conceivable that the latter could be pushed into production sooner because there is notionally less chance of another break occurring. In terms of development time there wasn’t much to choose between them and so the raw costs were pretty similar, but in my mind there was a significant difference in value.

Fixing the actual problem clearly has direct value to the business, but what is the difference in value between it being fixed tomorrow, whilst incurring a small amount of debt, and it being fixed in 3 days time with no additional debt? That is not something I can answer. I can explain that taking on the debt increases the risk of potential disruption to subsequent deliveries but only they are in a position to quantify what a slippage in schedule would cost to them.

Perhaps I’m expected to be an expert in both the technology and the business? Good luck with that. It feels to me that I provide more value to my customer by trying to excel at being a developer so that I can provide more accurate estimates, which are very tangible, than trying to understand the business too in the hope of aiding them to quantify the more woolly notion of value. But that’s just the age old argument of Generalists vs Specialists isn’t it? Or maybe that’s the point I’m missing - that the act of trying to quantify the value has value in itself? If so, am I still the right person to be involved in doing that?

I’m clearly just starting out on the agile journey and have so much more to read, mull over and discuss. This week saw XP Day which would have been the perfect opportunity to further my understanding but I guess I’ll have to settle for smaller bites at the eXtreme Tuesday Club instead - if I could only just remember to go!

Epilogue

The workaround cited in the example above is finally going to be removed some 10 months later because it is hopelessly incompatible with another change going in. Although I can’t think of a single production issue caused by the mental disconnect this workaround created, I do know of a few important test runs the business explicitly requested that were spoilt because the workaround was never correctly invoked and so the test results were useless. Each one of these runs took a day to set up and execute and so the costs to the development team has definitely been higher. I wonder how the business costs have balanced out?

 

[*] Just writing the word “informed” makes me smile. In the world of security there is the Dancing Pigs problem which highlights human nature and our desire for “shiny things”. Why should I expect my customer to ever choose “have it done well” over “have it done tomorrow” when there are Dancing Pigs constantly on offer?

Tuesday, 22 November 2011

Cookbook Style Programming

Changing a tyre on a car is reasonably simple. In fact if I wasn’t sure how to do it I could probably find a video on YouTube to help me work through it. Does that now make me a mechanic though? Clearly not. If I follow the right example I might even do it safely and have many happy hours of motoring whilst I get my tyre fixed. But, if I follow an overly simplistic tutorial I might not put the spare on properly and cause an imbalance, which, at best will cause unnecessary wear on the tyre and at worst an accident. Either way the time lapse between cause and effect could be considerable.

If you’re already thinking this is a rehash of my earlier post “The Dying Art of RTFM”, it’s not intended to be. In some respects it’s the middle ground between fumbling in the dark and conscious incompetence...

The Crowbar Effect

The StackOverflow web site has provided us with a means to search for answers to many reoccurring problems, and that’s A Good Thing. But is it also fostering a culture that believes the answer to every problem can be achieved just by stitching together bunches of answers? If the answer to your question always has a green tick and lots of votes then surely it’s backed by the gurus and so is AAA grade; what can possibly be wrong with advice like that?

while (!finished)
{
  var answer = Google_problem(); 
  answer.Paste();
}

At what point can you not take an answer at face value and instead need to delve into it to understand what you’re really getting yourself into?

For example, say I need to do some database type stuff and although I’ve dabbled a bit and have a basic understanding of tables and queries I don’t have any experience with an ORM (Object Relational Mapper). From what I keep reading an ORM (alongside an IoC container) is a must-have these days for any “enterprise” sized project and so I’ll choose “Super Duper ORM” because it’s free and has some good press.

So I plumb it in and start developing features backed by lots of automated unit & integration tests. Life is good. Then we hit UAT and data volumes start to grow and things start creaking here and there. A quick bit of Googling suggests we’re probably pulling way too much data over the wire and should be using Lazy Loading instead. So we turn it on. Life is good again. But is it? Really?

Are You Qualified to Make That Decision?

Programmers are problem solvers and therefore it’s natural for them to want to see a problem through to the end, hopefully being the one to fix it on the way. But fixing the problem at hand should be done in a considered manner that weighs up any trade-offs; lest the proverbial “free-lunch” rears its ugly head. I’m not suggesting that you have to analyse every possible angle because quite often you can’t, in which case one of the trade-offs may well be the very fact that you cannot know what they are; but at least that should be a conscious decision. Where possible there should also be some record of that decision such as a comment in the code or configuration file, or a maybe a task in the bug database to revisit the decision at a later date when more facts are available. If you hit a problem and spent any time investigating it you can be pretty sure that your successor will too; but at least you’ll have given them a head start[*].

One common source of irritation is time-outs. When a piece of code fails due to a timeout the natural answer seems to be to just bump the timeout up to some ridiculous value without regard to the consequences. If the code is cut-and-pasted from an example it may well have an INFINITE timeout which could make things really interesting further down the line. For some reason configuration changes do not appear to require the same degree of consideration as a code change and yet they can cause just as much head-scratching as you try to work out why one part of the system ended up dealing with an issue that should have surfaced elsewhere.

Lifting the Bonnet

I’ll freely admit that I’m the kind of person who likes to know what’s going on behind the scenes. If I’m going to buy into some 3rd party product I want to know exactly how much control I have over it because when, not if, a problem surfaces I’m going to need to fix it or work around it. And that is unlikely to be achievable without giving up something in return, even if its psychological[+] rather than financial.

However I accept that not everyone is like that and some prefer to learn only enough to get them through the current problem and onto the next one; junior programmers and part-time developers don’t know any better so they’re [partly] excused. In some environments where time-to-market is everything that attitude is probably even highly desirable, but that’s not the environment I’ve ended up working in.

It’s Just a Lack of Testing...

It’s true that the side-effects of many of the problems caused by this kind of mentality could probably be observed during system testing - but only so long as your system tests actively invoke some of the more exceptional behaviour and/or are able to generate the kinds of excessive loads you’re hoping to cope with. If you don’t have the kind of infrastructure available in DEV and/or UAT for serious performance testing, and let’s face it most bean counters would see it as the perfect excuse to eliminate “waste”, you need to put even more thought into what you’re doing - not less.

 

[*] To some people this is the very definition of job security. Personally I just try hard not to end up on the The Daily WTF.

[+] No one likes working on a basket-case system where you don’t actually do any development because you spend your entire time doing support. I have a mental note of various systems my colleagues have assumed me I should avoid working on at all costs...

Saturday, 19 November 2011

PowerShell & .Net - Building Systems as Toolkits

I first came across COM back in the mid ‘90s when it was more commonly known by the moniker OLE2[*]. I was doing this in C, which along with the gazillion of interfaces you had to implement for even a simple UI control, just made the whole thing hard. I had 3 attempts at reading Kraig Brockschmidt’s mighty tome Inside OLE 2 and even then I still completely missed the underlying simplicity of the Component Object Model itself! In fact it took me the better part of 10 years before I really started to appreciate what it was all about[+]. Not unsurprisingly that mostly came about with a change in jobs where it wasn’t C++ and native code across the board.

COM & VBScript

Working in a bigger team with varied jobs and skillsets taught me to appreciate the different roles languages and technologies play. The particular system I was working on made heavy use of COM internally, purely to allow the VB based front-end and C++ based back-ends communicate. Sadly the architecture seemed upside down in many parts of the native code and so instead of writing C++ that used STL algorithms against STL containers with COM providing a layer on top for interop, you ended up seeing manual ‘for’ loops that iterated over COM collections of COM objects that just wrapped the underlying C++ type. COM had somehow managed to reach into the heart of the system.

Buried inside that architecture though was a bunch of small components just waiting to be given another life - a life where the programmer interacting with it isn’t necessarily one of the main development team but perhaps a tester, administrator or support analyst trying to diagnose a problem. You would think that automated testing would be a given in such a “component rich” environment, but unfortunately no. The apparent free-lunch that is the ability to turn COM into DCOM meant the external services were woven tightly into the client code - breaking these dependencies was going to be hard[$].

One example of where componentisation via COM would have been beneficial was when I produced a simple tool to read the compressed file of trades (it was in a custom format) and filter it based on various criteria, such as trade ID, counterparty, TOP 10, etc. The tool was written in C++, just like the underlying file API, and so the only team members that could work on it were effectively C++ developers even though the changes were nearly always about changing the filtering options; due to its role as a testing/support tool any change requests would be well down the priority list.

What we needed was a component that maintains the abstraction - a container of trades - but could be exposed, via COM, so that the consumption of the container contents could just as easily be scripted; such as with VBScript which is a more familiar tool to technical non-development staff. By providing the non-developers with building blocks we could free up the developers to concentrate on the core functionality. Sadly, the additional cost of exposing that functionality via COM purely for the purposes of non-production reasons is probably seen as too high, even if the indirect benefits may be an architecture that lends itself better to automated testing which is a far more laudable cause.

PowerShell & .Net

If you swap C#/.Net for C++/COM and PowerShell for VBScript in the example above you find a far more compelling prospect. The fact that .Net underpins PowerShell means that it has full access to every abstraction we write in C#, F#, etc. On my current project this has caused me to question the motives for even providing a C# based .exe stub because all it does is parse the command line and invoke a bootstrap method for the logging, configuration etc. All of this can, and in fact has been done in some of the tactical fixes that have deployed.

The knock-on effects of automated testing right up from the unit, through integration to system level means that you naturally write small loosely-coupled components that can be invoked by a test runner. You then compose these components into ever bigger ones, but the invoking stub, whether a test harness or production process rarely changes because all it does is invoke factories and stitch together a bunch of service objects before calling the “logical” equivalent of main.

Systems as Toolkits

Another way of building a system then is not so much as a bunch of discrete processes and services but more a toolkit from which you can stitch together the final behaviour in a more dynamic fashion. It is this new level of dynamism that excites me most because much of the work in the kind of batch processing systems I’ve worked on recently has focused on the ETL (Extract/Transform/Load) and reporting parts of the picture.

The systems have various data stores that are based around the database and file-system that need to be abstracted behind a facade to protect the internals. This is much harder in the file-system case because it’s too easy to go round the outside and manipulate files and folders directly. By making it easier to invoke the facade you remove the objections around not using it and so retain more control on the implementation. I see a common trend of moving from the raw file-system to NoSQL style stores that would always require any ad-hoc workarounds to be cleaned up. This approach provides you with a route to refactoring your way out of any technical debt because you can just replace blobs of script code with shiny new unit-tested components that are then slotted in place.

On the system I’m currently working with the majority of issues seem to have related to problems caused by upstream data. I plan to say more about this issue in a separate post, but it strikes me that the place where you need the utmost flexibility is in the validation and handling of external data. My current project manager likens this to “corrective optics” ala the Hubble Space Telescope. In an ideal world these issues would never arise or come out in testing, but the corporate world of software development is far from ideal.

Maybe I’m the one wearing rose tinted spectacles though. Visually I’m seeing a dirty great pipeline, much like a shell command line with lots of greps, sorts, seds & awks. The core developers are focused on delivering a robust toolkit whilst the ancillary developers & support staff plug them together to meet the daily needs of the business. There is clearly a fine line between the two worlds and refactoring is an absolute must if the system is to maintain firm foundations so that the temptation to continually build on the new layers of sand can be avoided. Sadly this is where politics steps in and I step out.

s/PowerShell/IronXxx/g

Although I’ve discussed using PowerShell on the scripting language side the same applies to any of the dynamic .Net based languages such as IronPython and IronRuby. As I wrote back in “Where’s the PowerShell/Python/IYFSLH*?” I have reservations about using the latter for production code because there doesn’t appear to be the long-term commitment and critical mass of developers to give you confidence that it’s a good bet for the next 10 years. Even PowerShell is a relative newcomer on the corporate stage and probably has more penetration into the sysadmin space than the development community which still makes it a tricky call.

The one I’m really keeping my eye on though is F#. It’s another language whose blog I’ve followed since its early days and even went to a BCS talk by its inventor Don Syme back in 2009. It provides some clear advantages to PowerShell such as its async behaviour and now that Microsoft has put its weight behind F# and shipped it as part of the Visual Studio suite you feel it has staying power. Sadly its functional nature may keep it out of those hands we’re most interested in freeing.

I’ve already done various bits of refactoring on my current system to make it more amenable for use within PowerShell and I intend to investigate using the language to replace some of the system-level test consoles which are nothing but glue anyway. What I suspect will be the driver for a move to a more hybrid model will be the need automate further system-level tests, particularly of the regression variety. The experience gained here can then feed back into the main system development cycle to act as living examples.

 

[*] In some literature it stood for something - Object Linking & Embedding, and in others it just was the letters OLE. What was the final outcome, acronym or word?

[+] Of course by then .Net had started to take over the world and [D]COM was in decline. This always happens. Just as I really start to get a grip on a technology and finally begin to understand it the rest of the world has moved on to bigger and better things…

[$] One of the final emails I wrote (although some would call it a diatribe) was how the way COM had been used within the system was the biggest barrier to moving to a modern test driven development process. Unit testing was certainly going to be impossible until the business logic was separated from the interop code. Yes, you can automate tests involving COM, but why force your developers to be problem domain experts and COM experts? Especially when the pool of talent for this technology is shrinking fast.

Tuesday, 15 November 2011

Merging Visual Studio Setup Projects - At My Wix End

When the Windows Installer first appeared around a decade ago (or even longer?) there was very little tooling around. Microsoft did the usual thing and added some simple support to Visual Studio for it (the .vdproj project type); presumably keeping it simple for fear of raising the ire of another established bunch of software vendors - the InstallShield crowd. In the intervening years it appears to have gained a few extra features, but nothing outstanding. As a server-side guy who has very modest requirements I probably wouldn’t notice where the bells and whistles have been added anyway. All I know is that none of the issues I have come across since day one have ever gone away...

Detected Dependencies Only Acknowledges .DLLs

In a modern project where you have a few .exes and a whole raft of .dlls it’s useful that you only have to add the .exes and Visual Studio will find and add all those .dll dependencies for you. But who doesn’t ship the .pdb files as well*? This means half the files are added as dependencies and the other half I have to go and add manually. In fact I find it easier to just exclude all the detected dependencies and just manually add both the .dll and .pdb; at least then I can see them listed together as a pair in the UI and know I haven’t forgotten anything.

Debug & Release Builds

In the native world there has long been a tradition of having at least two build types - one with debug code in and one optimised for performance. The first is only shipped to testers or may be used by a customer to help diagnose a problem, whereas the latter is what you normally ship to customers on release. The debug build is not just the .exe, but also the entire chain of dependencies as there may be debug/release specific link-time requirements. But the Setup Project doesn’t understand this unless you make it part of your solution and tie it right into your build. Even then any third party dependencies you manually add causes much gnashing of teeth as you try and make the “exclude” flag and any subsequent detected dependencies play nicely together with the build specific settings.

Merging .vdproj Files

But, by far my biggest gripe has always been how merge-unfriendly .vdproj files are. On the face of it, being a text file, you would expect it to be easy to merge changes after branching. That would be so if Visual Studio stopped re-generating the component GUID’s for apparently unknown reasons, or kept the order of things in the file consistent. Even with “move block detection” enabled in WinMerge often all you can see is a sea of moved blocks. One common (and very understandable) mistake developers make is to open the project without the binaries built and add a new file which can cause all the detected dependencies to go AWOL. None of this seems logical and yet time and time again I find myself manually merging the changes back to trunk because the check-in history is impenetrable. Thanks goodness we put effort into our check-in comments.

WiX to the Rescue

WiX itself has been around for many years now and I’ve been keen to try it out and see if it allows me to solve the common problems listed above. Once again for expediency my current project started out with a VS Setup Project, but this time with a definite eye on trying out WiX the moment we got some breathing space or the merging just got too painful. That moment finally arrived and I’m glad we switched; I’m just gutted that I didn’t do the research earlier because it’s an absolute doddle to use! For server-side use where you’re just plonking a bunch of files into a folder and adding a few registry keys it almost couldn’t be easier:-

<?xml version="1.0"?>
<Wix xmlns="
http://schemas.microsoft.com/wix/2006/wi"> 
  <Product Id="12345678-1234-. . ." 
           Name="vdproj2wix" 
           Language="1033" 
           Version="1.0.0" 
           Manufacturer="Chris Oldwood" 
           UpgradeCode="87654321-4321-. . .">

    <Package Compressed="yes"/>

    <Media Id="1" Cabinet="product.cab" 
                                    
EmbedCab="yes"/>

    <Directory Name="SourceDir" Id="TARGETDIR"> 
      <Directory Name="ProgramFilesFolder"
                             Id="ProgramFilesFolder"> 
        <Directory Name="Chris Oldwood" Id="_1"> 
          <Directory Name="vdproj2wix" Id="_2"> 
            <Component Id="_1" Guid="12341234-. . ."> 
              <File Source="vdproj2wix.ps1"/> 
              <File Source="vdproj2wix.html"/> 
            </Component> 
          </Directory> 
        </Directory> 
      </Directory> 
    </Directory>

   
<Feature Id="_1" Level="1"> 
      <ComponentRef Id="_1"/> 
    </Feature>

  </Product>
</Wix>

The format of the .wxs file is beautifully succinct, easy to diff and merge, and yet it supports many advanced features such as #includes and variables (for injecting build numbers and controlling build types). I’ve only been using it recently and so can’t say what versions 1 & 2 were like but I can’t believe it’s changed that radically. Either way I reckon they’ve got it right now.

Turning a .wxs file into an .msi couldn’t be simpler either which also makes it trivial to integrate into your build process:-

candle vdproj2wix.wxs
if errorlevel 1 exit /b 1

light vdproj2wix.wixobj
if errorlevel 1 exit /b 1

My only gripe (and it is very minor) is with the tool naming. Yes it’s called WiX and so calling the (c)ompiler Candle and the (l)inker Light is cute the first few times but now the add-ons feel the need to carry on the joke which just makes you go WTF? instead.

So does it fix my earlier complaints? Well, so far, yes. I’m doing no more manual shenanigans than I used to and I have something that diffs & merges very nicely. It’s also trivial to inject the build number compared with the ugly VBScript hack I’ve used in the past to modify the actual .msi because the VS way only supports a 3-part version number.

Converting Existing .vdproj Files to .wxs Files

Although as I said earlier the requirements of my current project were very modest I decided to see if I could write a simple script to transform the essence (i.e. the GUID’s & file list) of our .vdproj files into a boiler-plate .wxs file. Not unsurprisingly this turned out to be pretty simple and so I thought I would put together a clean-room version on my web site for others to use in future - vdproj2wix. Naturally I learnt a lot more about the .wxs file format in the process** and it also gave me another excuse to learn more about PowerShell.

If you’re now thinking that this entire post was just an excuse for a shameless plug of a tiny inconsequential script you’d be right - sort of. I also got to vent some long standing anger too which is a bonus.

So long .vdproj file, I can’t say I’m going to miss you...

* OK, in a .Net project there is less need to ship the .pdbs but in a native application they (or some other debug equivalent such as .dbg files) are pretty essential if you ever need to deal with a production issue.

** Curiously the examples in the tutorial that is linked to from the WiX web site have <File> elements with many redundant attributes. You actually only need the Source attribute as a bare minimum.