Wednesday, 19 May 2010

The Dying Art of RTFM

The classic stereotype of men is that they don’t read manuals and won’t ask for directions when lost. I’m pleased to say that I exhibit neither of these tendencies – I’ll take all the help I can get, thank you very much. In fact I find it incredibly frustrating when I see other developers (and let’s face it, rightly or wrongly, they’re mostly men) thrashing around trying to fix a bug or get some code to compile while ignoring the obvious sources of information – the manual (e.g. books/local MSDN) and Google/Bing/Whatever.

Here’s a recent example. A colleague was trying to work out why, at runtime, a particular C# method in a system class was throwing an exception that neither of us had encountered before but which implied that the behaviour wasn’t implemented. After doing a quick rebuild to make sure the obvious was taken care of, I put the caret on the method name and hit F1. The MSDN help page for the method clearly stated that it was not supported in the scenario it was being used in. Problem solved. Total time to resolve it: about 3 minutes.

Obviously not all cases are as simple as this, but there have been so many times when a fellow developer has been stuck, I’ve done a quick Google and pointed out a page in the top 3 results that provides the solution. OK, so maybe I don’t get there first time because I don’t know the optimum search terms, but a quick read of a related page or two (especially if it’s a Stack Overflow page) and I can usually concoct the required magic incantation to find the relevant answer. Of course we’re all a little bit lazy sometimes and find it easier to ask the person next to us – I’ve done that too. But that doesn’t excuse not taking the most basic steps to solve your own problems.

I believe ‘Google-ability’ is one of the most important skills a developer can have. This sounds like a classic Grumpy Old Man moment, but I didn’t have the Internet as a resource when I started out programming. Fortunately the company I worked for at the time soon gained access to the MSDN on CD and had subscriptions to MSJ, DDJ etc. and access to other online services like CiX. Mining the MSDN quickly became a very useful skill as it contained not only manuals, but also white papers and previous articles from MSJ that may not have been directly relevant but did point you in new directions. Naturally this skill transferred directly to searching via AltaVista/Lycos/etc. (remember them…) and carries through to this day with Google.

But how do you discover a candidate’s ‘Google-ability’ at interview time? I usually ask interviewees about what they’ve read, as I find it’s a good indication of how passionate they are about the job and whether they have that desire for learning. The same goes for blogs and major online resources like Dr Dobbs. Not being able to name any is an ‘interview smell’ – I’ve yet to come across a good programmer who doesn’t read around the subject.

Books are a funny thing these days. I’ve probably bought more in the last couple of years than in the decade before. AbeBooks and Amazon’s second-hand marketplace mean I can easily pick up a copy of a book that I’m only vaguely interested in for just a few quid – and that’s including shipping it across the Atlantic! I really can’t get on with e-books and reading large volumes of text on screen; blogs are about the limit of my screen-based reading. But then I’m a self-confessed toilet reader – my technical library is in the downstairs toilet, much to my wife’s chagrin. It’s like vinyl records: there is a ceremony involved in selecting a book from the bookshelf and flicking through the pages, just like removing the vinyl from the sleeve and carefully depositing it on the turntable.

Perhaps this is where that investment in moving the Unconscious Incompetence to mere Conscious Incompetence that I blogged about recently pays off. Is my ability to easily find stuff in the MSDN and via Google just because I have lots of vague terms and concepts floating around in my noggin so that I am more aware of what information is actually out there – whether that be in book form, magazine or on the web? Or is it just that I’m not a real bloke? Should I in fact be feeling inferior because I don’t ignore assistance and just slog on relentlessly instead…

Tuesday, 11 May 2010

Refactoring – Do You Tell Your Boss?

[This is one of the posts that I’ve thought long and hard about whether I should publish. I’ve edited it a few times to try and ensure that it comes across in the spirit that was intended – to illustrate that it’s a complex issue that often fosters an “Us and Them” attitude, which in turn leads to resentment on the part of the programmer. Stack Overflow once again comes to my rescue with two beautifully titled questions: “How do you justify Refactoring work to your penny-pinching boss?” and “How can I convince skeptical management and colleagues to allow refactoring of awful code?”]

I want to be as transparent as possible about my work; if I’m doing something I want it to happen with my boss’s approval. I hope that they trust me to do the right thing, but will also challenge me to ensure that I’m still being as effective as possible. Although I like to think of myself as being pretty pragmatic, I’m only human and so can get sidetracked or incorrectly prioritise tasks like anyone else.

When it comes to refactoring I’ve found this to be a double-edged sword. Those managers who have a technical background and/or a modern outlook on software development will probably already be aware of concepts like Technical Debt and be willing to allocate time for it in the schedule. But there are also those managers who are less technical, often described as ‘more aligned with the business’, and for them it’s a much harder sell. To a less technical manager refactoring either sounds like ‘rewriting’ or ‘gold-plating’ and may be dismissed as unnecessary because you are essentially taking something that works and making it continue to work – but in a different way. Given a choice between ‘New Feature X’ and a ‘Better Implemented Feature Y’ they’ll always pick the former – that’s only natural. I’m sure if we software developers were salesmen (much like the business people we develop for) we’d have no trouble getting it accepted as a necessary practice, but most of us aren’t and that means we often resort to one of the following choices:-

Don’t do it at all

This isn’t really a choice. If your codebase is already getting you down, doing nothing will only cause you more pain if you care. In the end it’ll win and you’ll be forced to leave and find another job. Hopefully with a more understanding team.

Do it in Your Own Time

Using your own time to improve the codebase might be viable if it’s quite small, and could let you show your boss what difference it makes in the long run to adding new features. But even then they may still not ‘get it’. In a larger team and with a larger project this is just an arms race – can you clean up faster than your colleagues can make more mess? If you can’t then you’ll just resent the fact that it’s your own time you’re wasting and in the end be forced to leave and find another job. Hopefully with a more understanding team.

Do it By Stealth

That just leaves doing it by stealth as part of your normal working day. But that means inflating your timescales and lying about what you’re actually doing. This may have the desired effect on the codebase but won’t make you feel at ease inside. You’ll also have to be careful what you email and say to avoid embarrassing questions. Without your teammates behind you, you can only survive like this for so long, eventually it’ll wear you down as you’ll never get the time to tackle the really big issues. In the end you’ll be forced to leave and find another job. Hopefully with…

I’ve been fortunate enough to mostly work for managers that have trusted me to do the right thing; but I’ve also had occasions where I’ve had to hide some refactoring to do what I felt was fundamentally important but was always getting sidelined. Unsurprisingly this seems to be more symptomatic of Big Corporations, at least according to the developers I’ve spoken to about this, but that’s not exactly a statistically valid sample.

One person I asked at the recent ACCU Conference was Roy Osherove, because he was speaking about team leading; he suggested that refactoring is a natural part of the development process, so why do I consider it such an issue? In principle I agree with this sentiment, but I think it misses a key point, which is the effect it can have on the project schedule – something I do not own. When implementing a feature there is often a quicker way that may incur Technical Debt, and a slower way that avoids any new debt or maybe even allows it to be reduced, plus other variations in between. This implies a sliding timescale for the answer to “How long will it take?” and so that is not something I can always answer with a single figure. I can advise and provide options, but ultimately the choice must surely lie with the customer. Failing that (perhaps the customer isn’t in the right position to make such a decision or doesn’t care) it must be my boss or his boss etc. as they are the ones juggling the budget, schedule and resources balls.

So what is a poor programmer to do? Those people who I feel should make a judgement call on when the opportunity is ripe for refactoring (instead of always picking the ‘crowd pleasing’ items) aren’t inclined to do so; and although I’m probably empowered to do so I feel duty bound to give them first refusal on how I go about my job to ensure I’m acting responsibly and making pragmatic choices to the benefit of the customer.

Saturday, 8 May 2010

My ACCU Conference 2010 Lightning Talk

At this year’s ACCU 2010 Conference I decided to be brave and enter for a Lightning Talk. These are 5 minute presentations on pretty much anything you like – although it helps to have some relevance to the hosting event :-) If you so desire you can provide a few background slides, but with only 5 minutes of talking time you probably won’t need many.

The premise of my talk was a homage to the BBC 2 TV comedy show Room 101. Each week a different celebrity tries to convince the host (Paul Merton) that some things they dislike (physical, emotional or whatever) are truly appalling and should be banished from this Earth. Think “Crazy Frog” and you should get the general gist. For my talk I picked three of Microsoft’s products that have somehow managed to attain immortal status in the development world when they should have been retired years ago. The fact that they are all products from Microsoft should suggest nothing about my opinions of Microsoft per se (after all I work solely on their platform through my own choice) but more about the mentality of the companies that have managed to tie themselves so inextricably to the products in question. Funnily enough Udi Dahan sort of covered similar ground himself a few weeks ago – though in a far more intelligent fashion of course.

Given that the slides are going to be up on the ACCU web site I decided that it might be worth adding a little prose to go alongside them and to try and defuse any flames that might arise from my last slide – should the comment about jQuery be taken at face value. So, if you want to play along at home you can pull up the slides on the ACCU website here, or you can just imagine these bullet points are some slick PowerPoint presentation. Obviously reading the text below won’t have anywhere near the same impact as me standing in front of an audience because what little intonation and comedy timing there was will be completely lost :-) I’m not sure exactly what I said but it went something like this…

Visual SourceSafe

  • 10 Analyze.exe –F, 20 GOTO 10
  • More chatty than Alan Carr
  • Working Folder randomiser

My first choice is the proverbial “Punch Bag” of Version Control Systems. If you Google “Visual SourceSafe” you’ll quickly discover that the most common complaint is about how it has the ability to corrupt your source code database with alarming regularity. One of the first automated tasks you learn to create is one to run analyze.exe to find and fix any errors - that assumes of course you can persuade Visual Studio to leave the repository alone long enough to actually run the tool.

One sure-fire way to corrupt your database is to use VSS over a VPN. If you don’t manage to kill your network with the excessive SMB traffic caused by the file-system based architecture you’ll eventually be greeted with the legendary “Invalid DOS handle” error message. At this point you’ll be executing the “20 GOTO 10” part of this slide as your entire team “downs tools” once more.

But not all weirdness is the result of repository corruption; sometimes tools like Visual Studio will try and “help you out” by explicitly setting the “Working Folder” for a project which you later moved to another, more suitable location. Now your code lives in one part of the hierarchy in VSS but in the working copy it’s back where it started! So you start deleting all those .scc files, along with the usual .ncb and .suo files, in the hope that you’ll remove whatever it is that seems to link the project with its past life.

Visual C++ 6

  • Visual Studio 1998 (10 is the new 6)
  • #if _MSC_VER < 1300
  • It’s Visual C++

My second choice is Microsoft’s flagship C++ compiler. As you can see this was also known as Visual Studio 1998. Yes, that’s a year from another millennium! So, how many of you bought a copy of Andrei Alexandrescu’s seminal book “Modern C++ Design”? And how many of you put it straight back on the shelf when you discovered that you couldn’t even compile the first example because your compiler’s template support was so broken? This compiler was released before the current standard was finalised and it’s interesting to note that some Microsoft bloggers used the phrase “10 is the new 6” when writing about the upcoming Visual Studio 2010. This was a reference to the supposed performance enhancements going into making VS2010 as nippy as VC6, but it seems more amusing to point out that they are going to make the same mistake again and release a major C++ compiler version just before the next C++ standard is finalised.

Many developers think that Boost is incredibly complex because it uses lots of clever Template Meta-Programming. It’s not. It’s because every class has to be implemented twice. One inside an “#if _MSC_VER < 1300” guard for Visual C++ 6 users and another for every other compiler to use.

It’s interesting to note that many recruitment agencies ask for Visual C++ experience, instead of just plain C++, as if somehow it’s a different language? Mind you, given the number of supposedly experienced C++ developers I’ve interviewed in recent years that still believe the default container type is CArray and not std::vector it makes me wonder if the recruitment consultants aren’t actually right on this one.

Internet Explorer 6

  • The Corporate Standard
  • MDI is so 90’s
  • jQuery is fanning the flames

My final choice is Internet Explorer 6. Despite what the cool kids think, this is still the standard web browser in many large corporations. Putting popup iframes on a web site to tell me I’m using a lame browser doesn’t help me feel any better either – if I could do something about it I would. In one organisation you get a virus warning if you try to install a more modern browser like Firefox.

I’ve come to believe that Tabbed Browsing is a right and not a privilege. How am I supposed to adhere to the company’s clean desk policy when IE litters my desktop with windows and my taskbar with little ‘e’ icons?

I partially lay the blame for the continued existence of IE6 at the feet of jQuery. It used to be hard to support because you needed to know how to code around all its little quirks and foibles. jQuery now makes it easy to support again. Please stop doing that! We want it to be hard so that we have a reason to get off this merry-go-round.

Epilogue

Sadly all three products look like they’re going to be with us for some time yet. So, would anyone like to wager a bet on which we’ll see first - the death of one of these atrocities or the first Liberal Democrat government? [*]

 

[*] Apparently politics was out of bounds, but I felt this incredibly cheap shot was far too good to pass up :-)

Friday, 30 April 2010

Happy Birthday, Blog

Today sees the anniversary of my inaugural post (An apology to Raymond Chen) on this blog. That seems a highly appropriate moment to reflect and see if it’s turned out the way I hoped…

As I mentioned back in July, when my first two reviews were published in the ACCU Journal, I’ve found writing difficult, largely I guess because I’m out of practice. Writing this blog has certainly made a dramatic improvement to the speed at which I write. For example, last year my review of the ACCU 2009 Conference took me days to write – and I was on my sabbatical at the time! This year I did it on the train during my commute in a matter of hours. Ok, so I’ve not been one of those prolific bloggers that rattles out a piece every day, but I have tried to write a post a week. I realise that many of my posts are quite lengthy and given that I only use my daily commute as the source of time for it I reckon that’s not too shabby a rate. My commute time already has competition from reading, gaming and maintenance of my freeware codebase so it’s already a tight squeeze.

The one area I certainly didn’t expect to be writing about was C#. I was still a die hard C++ aficionado back in April last year and naturally assumed I’d be writing about C++ issues (if there are any left). It’s funny, but with all that time on my hands during my sabbatical I found it harder to know what to write about, whereas now I’m back working full-time the ideas keep flooding in. Once again I expect that part of it is down to the blogging experience, but I also suspect that I feel more confident about the topics I’m covering. I definitely expected to be sticking to very technical issues such as the recent ones involving WCF, but my current project is both greenfield and using an agile methodology and that has highlighted some very interesting dynamics which in turn has led to a new degree of consciousness about a number of software development issues.

Without a doubt the single biggest contribution blogging has made to me has been the clarity of thought that comes from the fear of “publishing and being damned”. Knowing that the moment I hit the ‘publish’ button my words will be broadcast out onto the Internet for all eternity where potential future employers will be able to see them ensures that I try to remain objective. In the last year there have been two posts that I started to write and ended up canning because I realised they were straw-man arguments. Conversely the mere act of documenting my experiences also leads to new questions that I’ve not considered in any real depth before. I have one post on Unit Test Naming Guidelines that I thought was all done-and-dusted until I met Steve Freeman and Nat Pryce and discovered that I was barking up the wrong tree. No doubt when I come to revise that post at a later date more questions will emerge…

The one thing I haven’t done is look at the stats in Google Analytics. I added a hit counter back at the start, mostly because it was easy, but I never expected anyone to actually read this stuff. The fact that there have been comments submitted (that aren’t just link spam) means that at least a couple of people have bothered to read my musings which is pretty satisfying. Now that a whole year has passed I feel tempted to take a peek and see if the number of hits has reached double figures yet.

Being Author and Editor means that I don’t have that fear of rejection you get with a ‘real’ publication, but I still have that fear of embarrassment to keep me on the straight and narrow. I’m quite contented at present to continue to build up a portfolio of posts that hopefully helps give me that edge we all need to ensure our own survival in the fast changing world of Software Development.

Friday, 9 April 2010

Object Finalized Whilst Invoking a Method

The JIT Compiler & Garbage Collector are wonders of the modern computer age. But if you’re an ex-C++ developer you should put aside what you think you might know about how your C# code runs because it appears their view of the world lacks some artificial barriers we’re used to. Deterministic Destruction such as that used in C++ has a double meaning, sort of. On the one hand it means that an object will be deleted when it goes out of scope, and yet on the other hand it also means that an object is only destroyed by an external influence[*]. Essentially this means that the lifetime of a root object or temporary is defined by the exiting of a scope which keeps life really simple. In the C# world scopes affect some similar aspects to C++ such as value type lifetimes and the visibility of local variables, but not the lifetime of reference types…
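To illustrate the difference here’s a contrived sketch (nothing to do with the bug below, just the lifetime rule in isolation):-

void Example()
{
    var buffer = new byte[1024];
    Console.WriteLine(buffer.Length);

    // In C++ a local would live until the closing brace; here
    // 'buffer' is never read again, so the GC is free to reclaim
    // it at this point, long before the method (and scope) ends.
    Thread.Sleep(5000);
}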

The Scenario

The C# bug that I’ve just been looking into involved an Access Violation caused by the finalizer thread trying to destroy an object whilst it was still executing an instance method. Effectively the object’s finalizer destroyed the native memory it was managing whilst the object was also trying to persist that very same block of memory to disk. On the face of it that sounds insane. How can the CLR reach the conclusion that an object has no roots and is therefore garbage when you’re inside a method? Surely at least the stack frame that is invoking the method has a reference to ‘this’?

Here is a bare bones snippet of the code:-

public class ResourceWrapper : IDisposable
{
    // Thin wrapper around a native DLL; the constructor
    // and Dispose() are elided here.
    public void Save(string filename) { /* . . . */ }
}

. . .

public class MyType
{
    public ResourceWrapper Data
    {
        get { return new ResourceWrapper(m_data); }
    }

    private byte[] m_data; // Serialized data in
                           // managed buffer.
}

. . .

public void WriteStuff(string filename)
{
    m_myType.Data.Save(filename);
}

Now, reduced to this simple form there are some glaring omissions relating to the ownership of the temporary ResourceWrapper instance. But that should only cause the process to be inefficient with its use of memory; I don’t believe there should be any other surprises in a simple single-threaded application. I certainly wouldn’t expect it to randomly crash on this line with an Access Violation:-

m_myType.Data.Save(filename);

The Object’s Lifetime

Once again, putting aside the blatant disregard for correct application of the Dispose Pattern, how can the object, whilst inside the Save() method, be garbage collected? I was actually already aware of this issue after having read the blog post “Lifetime, GC.KeepAlive, handle recycling” by Chris Brumme a while back but found it hard to imagine at the time how it could really affect me as it appeared somewhat academic. In my case I didn’t know how the Save() method was implemented, but I did know it was an incredibly thin wrapper around a native DLL, so I’ve guessed that it probably fitted Chris Brumme’s scenario nicely. If that’s the case then we can inline both the Data property access and the Save call so that in pseudo code it looks something like this:-

public void WriteStuff(string filename)
{
    ResourceWrapper tmp = new ResourceWrapper
                              (m_myType.data);
    IntPtr handle = tmp.m_handle;

    // tmp can now be garbage collected because we
    // have a copy of m_handle. 
    ResourceWrapper.NativeSaveFunction(handle,
                                       filename);
}

What a C++ programmer would need to get out of their head is that ‘tmp’ is more like a ResourceWrapper* than a shared_ptr<ResourceWrapper> - the difference being that its lifetime ends way before the end of the scope.

The Minimal Fix

So, if I’ve understood Chris Brumme’s article correctly, then the code above is the JIT Compiler & Garbage Collector’s view. The moment we take a copy of the m_handle member from tmp it can be considered garbage because an IntPtr is a value type, not a reference type, even though we know it actually represents a reference to a resource. In native code you manage handles and pointers using some sort of RAII class like shared_ptr with a custom deleter, as each copy of a handle represents another reference, but within C# it seems that the answer is to use GC.KeepAlive() to force an object’s lifetime to extend past the use of the resource handle. In my case, because we don’t own the ResourceWrapper type, we have to keep the temporary object alive ourselves, which leads to this solution:-

public void WriteStuff(string filename)
{
    ResourceWrapper tmp = m_myType.Data;
    tmp.Save(filename);
    GC.KeepAlive(tmp);
}

From a robustness point of view I believe the KeepAlive() call should still be added to the Save() method to ensure correctness even when Dispose() has accidentally not been invoked - as in this case. Don’t get me wrong, I’m a big fan of the Fail Fast (and Loud) approach during development, but this kind of issue can easily evade testing and bite you in Production. To me this is where a Debug build comes into play and warns you that the finalizer performed cleanup at the last minute because you forgot to invoke Dispose(). But you don’t seem to hear anything about Debug builds in the C#/.Net world…
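For the sake of argument, here’s a sketch of what I mean. This isn’t the real ResourceWrapper (which isn’t ours to change) and the Native* methods are hypothetical stand-ins for its P/Invoke calls:-

public sealed class RobustResourceWrapper : IDisposable
{
    private IntPtr m_handle; // native resource handle

    public void Save(string filename)
    {
        NativeSaveFunction(m_handle, filename);

        // Keep 'this' reachable until the native call
        // has finished with the handle.
        GC.KeepAlive(this);
    }

    public void Dispose()
    {
        NativeReleaseFunction(m_handle);
        m_handle = IntPtr.Zero;
        GC.SuppressFinalize(this);
    }

    ~RobustResourceWrapper()
    {
        // Last-chance cleanup; Debug.Fail only bites in a Debug
        // build and tells you that Dispose() was forgotten.
        System.Diagnostics.Debug.Fail("RobustResourceWrapper finalized without Dispose()");

        if (m_handle != IntPtr.Zero)
            NativeReleaseFunction(m_handle);
    }

    // Hypothetical stand-ins for the real native imports.
    private static void NativeSaveFunction(IntPtr handle, string filename) { }
    private static void NativeReleaseFunction(IntPtr handle) { }
}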

The Right Answer

The more impatient among you will have no doubt been shouting “Dispose - You idiot!” for the last few paragraphs. But when I find a bug I like to know that I’ve really understood the beast. Yes I realised immediately that Dispose() was not being called, but that should not cause an Access Violation in this scenario so I felt there were other forces at work. If I had gone ahead and added the respective using() statement that would likely have fixed my issue, but not have diagnosed the root cause. This way I get to inform the relevant team responsible for the component of a nasty edge case and we both get to sleep soundly.
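For completeness, here’s roughly what the using-based fix would have looked like (same types as the earlier snippet):-

public void WriteStuff(string filename)
{
    // Dispose() now runs deterministically at the closing
    // brace, and because the temporary is still referenced
    // by the using block it cannot be collected mid-call.
    using (ResourceWrapper tmp = m_myType.Data)
    {
        tmp.Save(filename);
    }
}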

[*I’m ignoring invoking delete, or destructors or calling Release() or any other manual method of releasing a resource. It’s 2010 and RAII, whether through scoped_handle or shared_ptr or whatever, has been the idiom of choice for managing resources in an exception safe way for well over a decade]

Wednesday, 7 April 2010

Turning Unconscious Incompetence to Conscious Incompetence

I’m sure that I must have come across the Four Stages of Competence before, but it was when Luke Hohmann quoted the more humorous interpretation that it actually registered with me. Pete Goodliffe also brought this topic up in his recent C Vu column and it’s got me thinking about how I turn Unconscious Incompetence into mere Conscious Incompetence. Oh yeah, and the fact that I’m currently sitting in a garage whilst they drain the unleaded petrol from my diesel-powered MPV brings the subject of incompetence to the forefront of my mind…

Luke Hohmann was quoting the “Known Knowns” statement from Donald Rumsfeld during an eXtreme Tuesday Club (XTC) meeting back in November 2009. He sketched a pie-chart with a tiny wedge to represent “what we know”, a slightly larger wedge for “what we know we don’t know” and the rest for “what we don’t know we don’t know”. His talk was about Innovation Games and his point was about where innovation occurs. As always it was an insightful session, not least because it got me thinking about how I’d go about reducing the size of the “what I don’t know I don’t know” slice[*].

I’d always thought that not knowing something in detail was not much better than not knowing it at all. But clearly there is value in it. After the better part of 15 years of C++ I don’t think it would be too modest of me to say that C++ was by-and-large in the “do know” region. My recent move to C# meant that I suddenly found myself drawing considerably on that somewhat larger “don’t know” section. Although for C++ developers the usefulness of MSDN Magazine vanished years ago as Microsoft drove headlong into promoting the New World Order that was .Net and C#, I kept subscribing and reading a significant portion of each issue as I felt that there was still an underlying message in there worth listening to. These days Jeremy Miller has a column dedicated to Patterns and Practices and James McCaffrey covers testing mechanisms, so some of those concepts are more readily digestible.

Still, the background noise of the articles has sat there at the back of my mind so that within the last 6 months I have easily been able to move to the C# world. Topics like Generics, Extension Methods, Lambdas and LINQ were high up on the reading list as somehow I knew these were important. For GUI work I know there is ASP.Net, WinForms and WPF all vying for attention, and on the IPC front Sockets, DCOM and Named Pipes are all passé and Indigo/WCF is the one-stop-shop. My recent WCF posts probably illustrate how I’m starting to cover the WCF angle, but the GUI stuff will just have to wait as I’ve no need to absorb that yet.

Clearly I knew more than just the names of many of these concepts, so is it really fair to categorise them as Conscious Incompetence? Probably not. So how about a list of things that I know virtually nothing about at all except the name and a truly vague context:-

Hadoop – Something from Google related to Map/Reduce?
Scala/Groovy – Languages somehow related to Java and/or the JVM?
Maven/CMake – Stuff to do with building code?
Recursive Descent Parser – Compiler theory/technique?

Does it make sense to know this little about each of these topics? What if I’m actually wrong about the way I’ve categorized them? Does this fulfil the notion of Conscious Incompetence, or do I already know too much?

As each year passes it seems that keeping up with what’s going on in the world of Software Development is becoming more and more of an uphill struggle. There are so many languages and technologies that tuning into the relevant stuff is nigh on impossible (I could have said filtering out the noise, but so much of it appears interesting in one way or another that the term ‘noise’ feels disingenuous). My RSS reader is overflowing with blogs and articles demanding my attention, so after I’ve read the important stuff like The Old New Thing and The Daily WTF I start skimming the general purpose feeds like Slashdot Developers and Dr Dobbs. I say skim, but it’s pretty hard not to get drawn into reading the entire piece, and then of course you’ll follow a few links and have lost another hour or two of your life. That assumes the wife and kids have left you alone long enough for that to happen…

Therein lies the problem for me. I find it hard to leave many topics in the state of Conscious Incompetence. I want to know enough so that when the time comes my unconscious awareness ensures that I know where I need to go to discover more, but at the same time know little enough that I can still effectively filter out that which is most relevant from the daily bombardment of new innovations and revised practices. I guess I’m in need of a more Agile approach to learning.

 

[*I realised after having read the Wikipedia entries for these two topics more closely that I’m stretching the definition of the Four Stages of Competence to its limit by applying it to the acquisition of knowledge, but as Rumsfeld has shown, repeatedly using the words Known and Unknown would only make the prose more indecipherable]

Thursday, 1 April 2010

Lured Into the foreach + lambda Trap

I had read Eric Lippert’s blog on the subject, I had commented on it, I had even mentioned it in passing to my colleagues the very same morning, but I still ended up writing:-

foreach(var task in tasks)
{
    . . .
    ThreadPool.QueueUserWorkItem(o => ExecuteTask(task));
}

Then I scratched my head as I tried to work out why the same task was suddenly being processed more than once, which was only an issue because the database was chucking seemingly random “Primary Key Violation” errors at me. Luckily I had only changed a few lines of code so I was pretty sure where the problem must lie, but it still wasn’t obvious to me and I had to fire up the debugger just to humour myself that the ‘tasks’ collection didn’t contain duplicates on entry to the loop, as I had changed that logic slightly too.
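For the record, here’s a minimal, self-contained repro of the trap (the task names are obviously made up). Each lambda captures the loop variable itself, not a copy of its value, so by the time the thread pool gets round to running them the variable has usually moved on:-

using System;
using System.Threading;

class ForeachLambdaTrap
{
    static void Main()
    {
        var tasks = new[] { "A", "B", "C" };

        foreach (var task in tasks)
        {
            // All three lambdas share the single 'task' variable.
            ThreadPool.QueueUserWorkItem(o => Console.WriteLine(task));
        }

        // Crude wait for the pool; the output may well be
        // "C C C" rather than "A B C".
        Thread.Sleep(1000);
    }
}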

Mentally the foreach construct looks to me like this:-

foreach(collection)
{
    var element = collection.MoveNext();
    . . .
}

The loop variable is scoped, just like in C++, and also immutable from the programmer’s perspective. Yes, the variable is written before the opening ‘{‘ just like a traditional for loop, and that I guess is where I shall have to look for guidance until the compiler starts warning me of my folly. Back in the days before all C++ compilers* had the correct for-loop scoping rules a common trick to simulate it was to place an extra pair of { } ‘s around the loop:-

{for(int i = 0; i != size; ++i)
{
    . . .
}}

This is the mental picture I think I need to have in mind when using foreach in the future:-

{var element; foreach (collection)
{
    element = collection.MoveNext();
    . . .
}}

That’s all fine and dandy, but how should I rewrite my existing loop today to account for this behaviour? I feel I want to name the loop variable something obscure so that I’ll not accidentally use it within the loop. Names like ‘tmp’ are out because too many people do that to be lazy. A much longer name like ‘dontUseInLambdas’ is more descriptive but does shout a little too much for my taste. The most visually appealing solution (to me at least) has come from PowerShell’s $_ pipeline variable:-

foreach(var _ in tasks)
{
    var task = _;
    . . .
    ThreadPool.QueueUserWorkItem(o => ExecuteTask(task));
}

It’s so non-descript you can’t use it by accident in the loop and the code still visually lines up the ‘var task’ and ‘in collection’ to a reasonable degree so it’s not wholly unnatural. I think I need to live with it a while (and more importantly my colleagues too) before drawing any conclusions. Maybe I’ll try a few different styles and see what works best.

So what about not falling into the trap in the first place? This is going to be much harder because I’ve got a misaligned mental model to correct, but questioning if a lambda variable is physically declared before the opening brace (instead of the statement) is probably a good start. Mind you this presupposes I don’t elide the braces on single-line loops - which I do :-)

foreach(var task in tasks)
    ThreadPool.QueueUserWorkItem(o => ExecuteTask(task));

One final question on my mind is whether or not to try and write some unit tests for this behaviour. The entire piece of code is highly dependent on threads and processes, and after you mock all that out there’s virtually nothing left but this foreach loop. It also got picked up immediately by my system tests. Still, I’ve always known the current code was a tactical choice, so as I implement the strategic version perhaps I can amortise the cost of the refactoring then.

 

[*Yes I’m pointing the finger squarely at Visual C++]