Monday, 1 April 2013

My ACCU 2013 Conference Session - Robust Software

I once again have the pleasure of being accepted to speak at this year’s ACCU conference in Bristol. My session is on the Thursday (11th April) and is titled “Robust Software – Dotting the I’s and Crossing the T’s”. Here is the full conference schedule and below that the abstract for my talk.

http://accu.org/index.php/conferences/accu_conference_2013/accu2013_schedule

It’s been said that the first 90% of a project consumes 90% of the time, whereas the remaining 10% accounts for the other 90% of the time. One reason might be that elevating software from “mostly works” to robust and supportable requires an attention to detail in the parts of a system that are usually mocked out during unit testing. It’s all too easy to focus on testing the happy paths and gloss over the more tricky design problems such as how to handle a full disk or a Cheshire cat style network.

This session delves into those less glamorous non-functional requirements that crop up the moment you start talking to hard disks, networks, databases, etc. Unsurprisingly it will have a fair bit to say about detecting and recovering from errors; starting with ensuring that you generate them correctly in the first place. This will undoubtedly lead on to the aforementioned subject of testing systemic effects. Finally there will also be diversions into the realms of monitoring and configuration as we look into the operational side of the code once it’s running.

At the end you will hopefully have smiled at the misfortune of others (mostly me) and added a few more items to the ever growing list of “stuff I might have to think about when developing software”.

Friday, 29 March 2013

[DataMember] Hides Dead Code From ReSharper

18 months ago I wrote a post describing how unit tests obscure dead code. The solution I found was to unload all the test projects and see what ReSharper then threw up. I’ve since discovered another case where ReSharper will not warn you about unused members - the [DataMember] attribute.

If you’ve done any serialization in C# I’m sure you’ve come across the pair of attributes DataContract and DataMember. You annotate your classes [1] with them like so to allow them to be serialized (e.g. with WCF):-

[DataContract]
public class MyType
{
  [DataMember]
  public string MyValue { get; private set; }
}

Whilst doing the refactoring I mentioned in my last post one of the changes I was going to make was to remove the [DataMember] attribute from a property that did not need to be serialized because it was a cached value. A few seconds after removing the attribute, the name went dark grey and ReSharper pointed out it was not used. Somewhat bemused I undid my change and lo and behold ReSharper stopped pointing out it was redundant.

I can only imagine (or I could read the manual) that this behaviour is because ReSharper has to assume the property is accessed via reflection and so in some codebases it’s likely to report a false positive. To a degree we already have this problem with the classes we use with the Plossum command line parser because they are only poked by reflection, although I set a default value in the constructor in this case to placate ReSharper/FxCop.

For the moment, until I follow my own advice and RTFM, I’m just temporarily commenting out the [DataMember] attributes on all properties and waiting a few seconds to give ReSharper time to think. Then I put the attributes back again where R# thinks they are in use. Of course they may only be in use because of the unit tests, which is quite likely because I always write a unit test to verify that a class correctly supports serialization [2]. However I’m hopeful that those clever JetBrains developers will one day allow me to have my cake and eat it too.

 

[1] I’ve never been entirely comfortable with this approach to serialization as it feels wrong to be adding the responsibility for serialization to the class itself. In C++ your serialization code was often orthogonal to the class itself as you might read and write multiple formats. These two attributes don’t seem overly invasive but the XML serialization attributes certainly appear to be.

[2] All it does is create an instance of the class, serialize and deserialize with DataContractSerializer and then compare the properties against the values used to construct the original instance. Any derived properties are also verified to ensure that an internal OnDeserialized() method has been added where necessary.
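As a rough illustration of the kind of round-trip test described in [2], here is a minimal sketch. The MyType constructor and the RoundTrip helper are assumptions added for the example; only the two attributes and DataContractSerializer come from the post itself.

```csharp
using System.IO;
using System.Runtime.Serialization;

[DataContract]
public class MyType
{
    // Hypothetical constructor; the private setter needs some way in.
    public MyType(string myValue)
    {
        MyValue = myValue;
    }

    [DataMember]
    public string MyValue { get; private set; }
}

public static class SerializationRoundTrip
{
    // Serializes the object to an in-memory stream and reads it straight back.
    public static T RoundTrip<T>(T original)
    {
        var serializer = new DataContractSerializer(typeof(T));

        using (var stream = new MemoryStream())
        {
            serializer.WriteObject(stream, original);
            stream.Position = 0;
            return (T)serializer.ReadObject(stream);
        }
    }
}
```

A test would then construct a MyType, pass it through RoundTrip() and compare each property of the copy against the value used to build the original. Note that it is exactly this kind of test that keeps a [DataMember] property looking "in use".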

Thursday, 28 March 2013

Property vs Method - Ownership Semantics

I was recently doing a spot of refactoring and came across some code that used an object that was backed by a native resource in a buggy way. I can see how the situation came about but have not so far come across any description of a convention that suggests how it should be done. Perhaps it’s blindingly obvious to everyone else, but in case it’s not or my argument is flawed, this is my take on the matter...

The Bug

Imagine you have a C# managed type that is a wrapper around a native type which, because it can be very large and is native, needs managing carefully in the C# world using the Dispose pattern. If you want something more concrete, imagine you have a managed library that wraps a native library to allow you to load web pages and manipulate the Document Object Model (DOM).

public class DOM : IDisposable
{
  public static DOM LoadFile(string filename)
  {
    // Invoke native code to load file.
  }

  public void Dispose()
  {
    // Invoke native code to cleanup.
  }
}

So, given that this is a resource that generally needs careful management [1] the default stance will be to apply a Using block to ensure it’s correctly cleaned up at the end of use, whether that be through normal or exceptional circumstances:-

using (var dom = DOM.LoadFile(filename))
{
  // Manipulate the DOM to your heart’s content…
}

Now, what about when you don’t create this resource directly but instead you acquire it from another object, perhaps via a property or a method. Absent any documentation or looking at the property/method implementation would you expect to take ownership or not? For the sake of the example code I’m going to suggest that the type from which we acquire this DOM is another C# managed class called WebPage [2].

var page = new WebPage(. . .);

// Access DOM from a property - taking ownership?
var dom = page.DOM;

// Access DOM from a method - taking ownership?
var dom = page.ToDOM();

The code I came across was written like this:-

using (var dom = page.DOM)
{
  // Manipulate the DOM and directly dispose it
}

But the implementation of the WebPage’s DOM property looked like this:-

public class WebPage
{
  private DOM m_dom = null;

  public DOM DOM
  {
    get
    { 
      if (m_dom == null) 
        m_dom = CreateDom(); 

      return m_dom;
    }
  }
}

Naturally, being legacy code, the container class didn’t implement the Dispose pattern itself and so you could never have managed the resource properly even if you wanted to. But I’m in the middle of a refactoring and wondering what to do next. Plus, if possible, I want to lay down (or better yet promote an existing one) a convention about when ownership is likely to be transferred. And, when it isn’t, what the containing class needs to do instead.
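To make the failure mode concrete, here is a minimal sketch of the bug as I read it: the caller disposes an object that the container has cached, so the next access hands back a dead one. The DOM class below is a stand-in that only records disposal (the IsDisposed flag and the OwnershipBugDemo helper are assumptions for illustration, not part of the real library).

```csharp
using System;

// Stand-in for the native-backed DOM type; it only records disposal.
public class DOM : IDisposable
{
    public bool IsDisposed { get; private set; }

    public void Dispose()
    {
        IsDisposed = true;
    }
}

public class WebPage
{
    private DOM m_dom = null;

    // Lazily creates and caches the DOM, so the container still owns it.
    public DOM DOM
    {
        get
        {
            if (m_dom == null)
                m_dom = new DOM();

            return m_dom;
        }
    }
}

public static class OwnershipBugDemo
{
    // Returns true if a later access sees an already disposed object.
    public static bool SecondAccessSeesDisposedDom()
    {
        var page = new WebPage();

        using (var dom = page.DOM)
        {
            // First caller manipulates the DOM and then disposes it.
        }

        // The property returns the same cached (now disposed) instance.
        return page.DOM.IsDisposed;
    }
}
```

With a real native wrapper the second access would more likely fail with an ObjectDisposedException, or something nastier, rather than politely reporting a flag.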

ToXxx() Method Passes Ownership

In the particular refactoring I was looking at the class was really acting like a Builder [2] and so I changed the property to a method and renamed it ToXxx(), where Xxx was the type. Essentially it was a type conversion, like ToString(), and that also meant ownership should be passed to the caller. This also meant I didn’t have to fix the code structure of the buggy caller; I only had to do the renaming.

Going back to my example above this leads to the following implementation and style of invocation:-

public class WebPage
{
  public DOM ToDOM()
  {
    return CreateDom();
  }
}

// In the caller
using (var dom = page.ToDOM())
{
  // Manipulate the DOM and directly dispose it
}

Naturally this is the simplest implementation for both the container and caller.

Xxx Property Retains Ownership

In the past I’ve also found myself taking the property approach. A property is usually a class attribute held by containment and the obvious question might be why something this complex was being exposed directly in the first place. In the places I’ve used it, it’s because the containing class is essentially a Facade over an underlying complex object. In an ideal world it would remain purely an implementation detail but legacy code (until refactored) often demands access to the underlying object in the meantime.

In these cases I have decided to retain ownership for performance reasons and expose the legacy object through a property. This is what the WebPage class in the example above was alluding to. But what it missed was the necessity of implementing the Dispose pattern itself so that it could delegate cleanup to the underlying object at the right time. So it should have looked like this:-

public class WebPage : IDisposable
{
  private DOM m_dom = null;

  public DOM DOM
  {
    get
    { 
      if (m_dom == null) 
        m_dom = CreateDom(); 

      return m_dom;
    }
  }

  public void Dispose()
  {
    if (m_dom != null)
      m_dom.Dispose();
  }
}

// In the caller
using (var page = new WebPage(. . .))
{
  var dom = page.DOM;

  // Manipulate the DOM and indirectly dispose it
}

This is a little more effort for the caller and a fair bit more for the containing class. It probably seems a lot more effort to C# developers who are only used to managing simple resources like files and database connections in a limited scope. Annoyingly in C++ you have types like std::shared_ptr<> that just make this a non-issue; in fact, after const, it’s probably the second biggest thing I miss.

I would have hoped that a static analysis tool like ReSharper would have pointed out the need to implement IDisposable originally, but it doesn’t. I don’t remember FxCop picking it up either but I’ve not looked at FxCop recently and I suspect both would generate way too much noise for most normal users. Perhaps there is a setting I need to turn on instead? What I’m after (should a more knowledgeable person be reading this) is a warning when one type aggregates another (IDisposable) type and doesn’t implement IDisposable itself. In library and server-side code I assume that any type implementing IDisposable always requires its lifetime to be tightly controlled as you have no idea how tight or lax the constraints are for the hosting process.
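In the absence of such a warning, a crude approximation can be cobbled together with reflection. The sketch below is only that: every name in it is made up for the example, it is no substitute for a real static analysis rule, and it cannot tell an owned object from a borrowed reference, so expect false positives.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Reflection;

public static class DisposableOwnershipCheck
{
    // Finds classes that hold an IDisposable in an instance field
    // but do not implement IDisposable themselves.
    public static IEnumerable<Type> FindSuspects(IEnumerable<Type> types)
    {
        const BindingFlags flags = BindingFlags.Instance | BindingFlags.Public |
                                   BindingFlags.NonPublic | BindingFlags.DeclaredOnly;

        return types
            .Where(t => t.IsClass && !typeof(IDisposable).IsAssignableFrom(t))
            .Where(t => t.GetFields(flags)
                         .Any(f => typeof(IDisposable).IsAssignableFrom(f.FieldType)));
    }
}

// Hypothetical sample types: one that leaks and one that cleans up.
public class Leaky
{
    private readonly MemoryStream m_stream = new MemoryStream();

    public long Length { get { return m_stream.Length; } }
}

public class Careful : IDisposable
{
    private readonly MemoryStream m_stream = new MemoryStream();

    public void Dispose()
    {
        m_stream.Dispose();
    }
}
```

Feeding it the result of Assembly.GetTypes() from a unit test would give a list to review by hand. Compiler-generated types, such as closures, may well show up in that list too.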

 

[1] Yes, in many cases where the objects are quite small we can let the garbage collector weave “its magic” and be blissfully unaware of the memory pressures created by the native goings on. But in many cases these babies are multi-megabyte monsters that have a serious impact on the (32-bit) process’s footprint.

[2] Try and ignore for the moment the possibly obvious (to you) thought that it looks like the Builder pattern and therefore ownership semantics are probably obvious. This is legacy code here, these rules don’t apply until after I’ve refactored it.

Thursday, 21 March 2013

You Want Windows 98 Support in 2013?

A few weeks back I got an email from a chap in Australia who wanted to know if I could fix one of my DDE tools (DDE Command) to work under Windows 98 SE. After a quick check of the calendar to make sure I hadn’t entered a time warp I couldn’t help but be a little curious about where this mysterious Windows box was running and why it couldn’t be upgraded…

It turned out that this machine was fitted with a proprietary ISA card (remember those?) for which support was discontinued in 1996! This chap seemed to be going through heroic efforts to keep it running by using Excel to grab data from the card via DDE. He then stumbled upon my command line tool which worked fine on Windows 7 (as it does right back to Windows 2000), but was failing on Windows 98 like this:-

C:\> ddecmd servers
ERROR: Failed to query DDE servers: DMLERR_DLL_NOT_INITIALIZED

I had not seen that kind of error before and it’s pretty fundamental too - the underlying DDEML library was apparently not initialised. All ddecmd commands seemed to report the same problem. Luckily I had more than an inkling of what it might be because the tool is pretty simple and my DDE classes have barely changed in over a decade. Also I didn’t really fancy trying to cobble together a VM and source a Windows 98 licence just to help this chap out. One of the reasons I provide the source code to all my tools is exactly to allow someone to find their own (paid) support if I can’t accommodate them myself.

One of the key differences between the Windows 95/98/ME lineage and the NT/2K/XP/Vista/etc one is that the former is ANSI internally whereas the latter is Unicode. Naturally for backwards compatibility reasons the NT line can also run ANSI binaries, with a slight overhead as it translates back and forth as required. One of the decisions I made back in February 2008 was to switch to Unicode builds by default; however the Windows build targets were still left as they were when I first ported my libraries to Win32 back in the mid ‘90s!

#define WINVER         0x0400   //! Windows 95+
#define _WIN32_WINNT   0x0400   //! Windows NT 4.0+
#define _WIN32_WINDOWS 0x0400   //! Windows 95+
#define _WIN32_IE      0x0400   //! IE 4.0+

Luckily I have no need for anything more fancy, especially with my command line tools, so it was a simple matter of flicking the switch in Visual C++ (7.1, aka VS2003, is still my favoured version) and out popped an ANSI build. In the meantime, because of the time lag between the UK and Australia, I suggested to the chap that he download my GUI based DDE tool (DDEQuery) and try that. I’d remembered that this tool hasn’t been touched since 2005 and so the binary would have been an ANSI build anyway. Unsurprisingly, it worked. So I shipped off an ANSI build of DDE Command and it was “case closed”.

Every time I think it’s time to leave the past behind and get-with-the-program a question like this pops up and it feels great to still be able to support such an old OS. Although I still use VC++ 7.1 (on XP) for my personal C++ code, I still compile it with modern versions of VC++ and GCC to gain access to all that extra static analysis so that someone can just pick it up if needs be. My day job now consists of writing C# and although it’s currently .Net 3.5 I have my sights firmly set on .Net 4.5 and C# 5 as I’m no luddite.

Even though the corporate desktop standard is still Windows XP in many organisations and we now have to suffer all those annoying “you’re using an outdated browser version” banners, it feels good to know that there are others out there that have it so much worse.

Wednesday, 6 February 2013

Code by Day, Design by Night

There is this wonderful anecdote in the book Peopleware by Tom DeMarco and Tim Lister:-

One day, while Wendl was staring into space pondering problems of extreme complexity with his feet propped up on the desk, their boss came in and asked, "Wendl! What are you doing?" Wendl said, "I'm thinking." And the boss said, "Can't you do that at home?"

When I read it, it reminded me of another remark made by a project manager I once had. I can’t remember the ins and outs but essentially he remarked about how funny it was that someone he knew (of) was paid for his “thinking time”. Naturally, being just a mere “resource” I followed suit and indicated my surprise too. But not because I couldn’t imagine such a practice, but because at that point I began to realise how I’d been partitioning my own day to avoid such a confrontation.

At the time I hadn’t read Peopleware and so when I got home I asked my wife, who is a producer that spends a fair bit of time dealing with “creative” people, whether the people she worked with got paid “thinking time” too. I seem to remember her reaction was one of slight bewilderment due to the notion that people who do creative work can somehow perform it without doing any “thinking”.

There is a long standing debate[1] around whether (or how much of) programming is Science, Maths, Engineering, Art, etc. For me personally, given the type of work I do, I’d probably be in the “mostly engineering” camp, but am aware that “creativity” also plays a significant role at times. And that is probably why, when I read that anecdote in Peopleware, I smiled because I can see the same reaction from some of the people I’ve worked with. To them, what we do is pure Engineering. We just follow the various best practices and established Design Patterns (along with cut and pasting solutions from StackOverflow) and turn the handle on the meat grinder to churn out another system that solves the customer’s problem. In the supposedly uber-safe corporate environment you don’t take risks, and therefore there is obviously very little reason to ever “get creative”.

For me perhaps that perception is one of my own making. After all, during the hours of 9 and 5 I spend the day “writing code”, answering questions and generally getting stuff[2] done. I can’t imagine sitting at my desk gazing into outer space whilst I contemplate how I’m going to solve some tricky problem - it’s just not the done thing. Techniques like TDD are great at helping you triangulate to a solution, but only once you have some vague idea where you’re heading. And that’s a large part of the battle - having some idea how you’re going to tackle a particular problem. Often it involves drawing on your Google Fu to find the answer, a spark that leads you in a new direction, or recognising the similarity elsewhere in the code that brings the problem into focus.

Sometimes though you just don’t have an obvious direction, and at those times I’ve found that rather than get blocked (and therefore appear to be doing nothing) I’d prefer to find some tangential work to do instead. Then I can use a forthcoming break, either for coffee, lunch, or going home as the distraction I need to take myself away from “being busy” to allowing my conscious and subconscious to get to work on the gnarly problem I’ve been procrastinating over.

Working this way draws on the popular concept of The Lazy Programmer. This term is not meant to be a derogatory one to describe those within the profession that clearly cut corners to get their work done, but more an attitude that tries to suggest how we need to work smarter - not harder. The alternative is to “go dark” and plug away relentlessly at the keyboard, somewhat akin to The Infinite Array of Monkeys, until “the solution” eventually drops out the end. That’s really not my style - I prefer inspiration to perspiration.

If you’re driving a car and you get lost, assuming you’re not “a stereotypical man”, you get the map out or flag down a passerby to help orientate yourself or get new directions. So it’s natural therefore to talk to your teammates and discuss problems with them, no? And yet that’s something I’ve found myself doing much less of, not because I don’t want to, but because in the corporate environment there is the (perceived) expectation of autonomy[3] that almost forces you to solve everything by yourself. After all, if you didn’t, what on Earth are they paying you for? Or maybe, and perhaps this is a very British thing, we do not always feel comfortable interrupting our colleagues?

Over the years I’ve learnt to break my workload down into such small tasks that I find any interruption easy to cope with. If an email pops up or a message via IRC or someone swings by the desk I can usually deal with it there and then. The notion of Flow, another concept that Peopleware discusses in detail, is something I just do without during the day. The way I’ve come to see it is that if someone else is blocked and I can help them out, then it’s the team that wins. Okay, so I get less work done personally, and that has been hard to justify a couple of times in the past, but I just achieve the necessary state of consciousness at another time - whilst cleaning the kitchen or making the tea.

Ultimately it now feels inefficient to use the time sat at my desk to “think about ‘heavy’ stuff” when I could multi-task and do it later. Sadly I’ve not yet learnt to multi-task to the extent that I can do thinking, washing up and hold a (meaningful) conversation with my wife. I suspect it’s also a downside of your job being your hobby too - when exactly do you “switch off”?

 

[1] I’ve lost count of the number of times this has shown up on the ACCU’s main mailing list - accu-general. But I do remember it being a more interesting debate last time as people got right down into the nitty-gritty of what they do and how they work.

[2] Yes, by “stuff” I mostly mean doing support. I might be answering the same question for the umpteenth time, or doing some sysadmin style cleanup. Alternatively it might be a dash of analysis, either of a “new” problem or the results from a recent change. Or writing some documentation, looking at a broken build, etc. Sometimes when things are really quiet I’m known to write some actual code too...

[3] This, I suspect, is the reason why pair programming struggles to find ground within The Enterprise. The sheer thought of two people doing one job, despite old adages like “two heads are better than one”, is just incomprehensible to some. After all, isn’t Pair Programming really just a synonym for Job Sharing?

Friday, 1 February 2013

Top 10 Posts So Far

With my blog approaching its 4th anniversary I feel it’s reached the critical mass of posts and time needed before I can look at what the most popular posts have been to date. I know there is a side-bar on the right at the moment displaying this, but I thought it would be useful to capture it so that I can look back in another 4 years and see whether I managed to add anything new or whether it just went downhill from then on.

One site I worked at introduced a somewhat draconian web content filtering policy. One particular category of web sites they blocked were “blogs and personal pages”. I pointed out to the security team that blogs are the modern day Knowledge Base; they are relied on heavily by IT to get their work done. Yes, you have your MSDNs and StackOverflows, but they just aren’t enough. And anyway some people only post answers that contain a link to a blog post they found that answers the original query.

So, looking at the list of most popular posts of mine to date, it’s clearly not the diatribes or random musings that people come across, but those that solve people’s everyday problems. Well, all except for number 10 that is. That anomaly has almost certainly come about as a consequence of generous publicity via Twitter by some of the ACCU’s more followed members[*].

  1. PowerShell, Throwing Exceptions & Exit Codes
  2. Cleaning up svn:mergeinfo Droppings
  3. Simple Database Build & Deployment With SQLCMD
  4. Integration Testing with NUnit
  5. WCF Weirdy – NetTcpBinding, EndpointIdentity and Programmatic Configuration
  6. WCF Service Refusing Connections
  7. SQL Server Cursors - Avoiding the Duplicate FETCH NEXT
  8. Using Visual C++ Together With GCC - Some Alternatives
  9. C#/SQL Integration Testing With NUnit
  10. TODO or TODO Not - There Is No Later

 

[*] Pleasing though it was to get the re-tweets of my original posting announcement, I suspect the 2 additional plugs from @KevlinHenney didn’t do any harm either :-).

Monday, 21 January 2013

The Perils of Interactive Service Account Sessions

It’s common within the Enterprise to run the services (or daemon processes if you prefer) under a special “service account”. This account is often a domain account that has very special privileges and as such no one is generally expected to use it for anything other than running processes that are part of The System. Sometimes you might need to elevate to that account to diagnose a permissions problem, but those occasions should be very rare.

What you want to avoid doing is logging on interactively to a Windows machine using that account[1], such as remotely via MSTSC. What you should do is log on with your own credentials, or better yet those of a “break glass account”, and then elevate yourself using, say, the RUNAS tool. This allows you to open a separate command prompt, or run another process under a separate set of credentials - usually the service account, e.g.

C:\> runas /user:chris@domain cmd.exe

There are various switches to control the loading of the user profile, etc. but that is the basic form. Once you have the command prompt or process open you can do your thing in a limited kind of sandbox.

The first reason for not logging in interactively is that by default Terminal Services will only let you have 2 connections open. Given that some developers (and admins) have a habit of leaving themselves logged in for extended periods, you invariably have to hunt down who has the connections open and ask one of them to drop out. If one or other user is logged in interactively using the service account it becomes a much harder job of finding out who “owns” that session and, as we’ll see, just toasting their session can be dangerous.

The main problem I’ve come across with logging in is down to the way scheduled tasks that are configured to run using separate credentials (in this case the service account) end up running in the interactive session (even without the “interactive” box checked). If you’ve ever had seemingly random console windows popping up whilst you’re logged in, this could be what they are. If you’re lucky the keyboard focus won’t be stolen, but if it is or you’re clicking with the mouse at the wrong time you can block the I/O to the process by accidentally enabling Quick Edit mode. Or worse yet you hit the close box as it pops into life.

You might notice those effects, but the more deadly one is logging off. If one of these scheduled tasks is running at the time you log off, it will be killed. You might not notice it at first (especially if it gets scheduled again soon after) but the scheduled task will have a failed status and the very curious error code of 0xC000013A (application terminated by Ctrl+C).
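If you are trawling logs for this, bear in mind that the same code may appear as a negative decimal number wherever the exit code has been treated as a signed 32-bit integer. A small sketch (the TaskExitCodes class and its member names are made up for illustration; the value is the NTSTATUS code STATUS_CONTROL_C_EXIT):

```csharp
public static class TaskExitCodes
{
    // NTSTATUS STATUS_CONTROL_C_EXIT viewed as a signed 32-bit exit code.
    public const int ControlCExit = unchecked((int)0xC000013A);

    // True if the process exit code says it was terminated by Ctrl+C.
    public static bool WasTerminatedByControlC(int exitCode)
    {
        return exitCode == ControlCExit;
    }

    // Formats an exit code the way the Task Scheduler shows it, e.g. 0xC000013A.
    public static string ToHex(int exitCode)
    {
        return "0x" + exitCode.ToString("X8");
    }
}
```

So a monitoring script that sees -1073741510 as the last result of a task is looking at exactly this failure.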

The second issue I’ve seen relates to the service account not picking up changes in Windows AD group membership. I’ve read various chapters from Keith Brown’s excellent book Programming Windows Security (which is admittedly getting a bit long in the tooth) but can’t see why this would happen. Basically the account was removed from an AD group, and then later reinstated. However at the time the account was re-added to the group there was an interactive session on just one machine in the farm.

The other machines in the environment picked up the change, but the one with the interactive session didn’t. I could understand that an existing process might cache the group membership, but even starting new command prompts didn’t help. The scheduled tasks that were running, which were also new processes each time, didn’t pick the change up either. After logging the session off and logging straight back on again everything was fine.

Maybe it was a one off, or perhaps it’s a known problem. Either way my Google Fu was clearly letting me down - that and the fact that the key words to describe the problem are about as vague as you can get in IT (group, windows, cache, etc). Hopefully some kind soul will leave a comment one day which explains my experience and brings me closure.

 

[1] I’m sure there are some edge cases to that advice, but I can’t personally remember a time when I needed to actually logon to a machine with a service account. Well, I can, but that’s because the software used to hide the passwords forced me to do it when all I needed was an elevated command prompt. That aside I haven’t.