The OldWood Thing: Fallibility

I’ve generally been pretty fortunate with the people I’ve found myself working with. For the most part they’ve all been continuous learners and there has always been some give and take on both sides so that we’ve learned different things from each other. Many years ago on one particular contract I had the misfortune to be thrown a curveball twice, by two different teammates. This post is a reflection on both theirs and my behaviour.

The Unsolicited Review

The first incident occurred when I had only been working on the project for a few weeks. Whilst adding some new behaviour to one of the support command-line tools I spotted some C++ code similar to this:

std::vector<string*> hosts;

for (. . .)
hosts.push_back(new string(. . .));

Having been used to using values, the RAII idiom and smart pointers for so long in C++ I was genuinely surprised by it. Naturally I flicked back through the commit log to see who wrote it and whether they could shed any light on it. This was also out of place given the rest of the code I’d seen. I discovered not only who the author was, but realised they were sitting but a few feet away and so decided to tap them up if they weren’t busy to find out a little more.

Although I cannot be sure, I believe that I approached them in a friendly manner and enquired why this particular piece of code used raw pointers instead of one of the more usual resource management techniques [1]. What I expected was the usual kind of “Doh!” reply that we often give when we noticed we’ve done something silly. What I absolutely wasn’t prepared for was the look of anger on their face followed by them barking “Are you reviewing my code? Have I asked you to do that?”

In somewhat of a daze I apologised for interrupting them and left the code as-was for the time being until I had due cause to fix it – I didn’t want to be seen to be going behind someone’s back either at this point as that might only cause even more friction.

Not long after this episode I had to work more closely with them on the build and deployment scripts. They would make code changes but then make no effort to test them, so even when I knew they were wrong I felt I should wait for the build to fail (a 2 hour process!) rather than be seen to “review” it.

Luckily the person left soon after, but I had already been given the remit to fix as many memory leaks as possible so could close out my original issue before that point.

Whose Bug?

The second incident features someone I actually referred to very briefly in a post over 5 years ago (“Can Code Be Too Simple?”), but that was for a different reason a little while after the following one.

I got pulled into a support conversation after some compute nodes appeared to be failing to load the cache file for a newly developed cache mechanism. For some reason the cache file appeared to be corrupted and so every time the compute process started, it choked on loading it. The file was copied from a UNC share on-demand and so the assumption was that this was when the corruption was happening.

What I quickly discovered was that the focus of the investigation was around the Windows API call CopyFile(). The hypothesis was that there was a bug in this function which was causing the file to become truncated.

Personally I found this hypothesis somewhat curious. I suggested to the author that the chances of there being a bug in such a core Windows API call in a version of Windows Server that was five years old was incredibly slim – not impossible of course, but highly unlikely. Their response was that “my code works” and therefore the bug must be in the Windows call. Try as I might to get them to entertain other possibilities and to investigate other avenues – that our code elsewhere might have a problem – they simply refused to accept it.

Feeling their analysis was somewhat lacklustre I took a look at the log files myself for both the compute and nanny processes and quickly discovered the source of the corruption. (The network contention copying the file was causing it to exceed the process start-up timeout and it was getting killed by the nanny during the lengthy CopyFile() call [2].)

Even when I showed them the log messages which backed up my own hypothesis they were still somewhat unconvinced until the fix went in and the problem went away.

Failure is Always an Option

Although I hadn’t heard it back then, this quote from Jeffrey Snover really sums up the attitude I’ve always tried to adopt with my team mates:

“When confronted by conflict respond with curiosity.”

Hence whenever someone has found a fault in my code or I might have done the same with theirs I do not just assume I’m right. In the first example I was 99% sure I knew how to fix the code but that wasn’t enough, I wanted to know if I was missing something I didn’t know about C++ or the codebase, or if the same was true for the author. In short I wanted to fix the root cause not just the symptoms.

In the second example there was clearly a conflict in our approaches. I’m willing to accept that any bug is almost certainly of my own making and that I’ll spend as much time as possible working on that basis until the only option left is it for to be in someone else’s code. Although I was okay to entertain their hypothesis, I also wanted to understand why they felt so sure of their own work as Windows API bugs are, in my experience, pretty rare and well documented [3].

Everyone has their off days and I’m no exception. If these had been one of those I’d not be writing about them. On the contrary these were just the beginning of some further unfortunate experiences. Both people continued to display tendencies that showed they were overconfident in their approach whilst also making it difficult for anyone else to critique their work. For (supposedly) experienced professionals I would have expected a little more personal reflection and openness.

The consequence of being such a closed book is that it is hard for others who may be able to provide valuable insights and learning to want to do so. When you work with people who are naturally reflective and inquisitive you get a buzz from helping them grow, and likewise when they teach you something new in return. With junior programmers you can allow for a certain amount of arrogance [4] and that’s a challenge worth taking on, but with much older programmers the view that “an old dog can’t learn new tricks” makes the prospect far less rewarding.

As an “old dog” myself I know that I probably have to work a little harder these days to appear open and attentive to change and I believe that process starts by accepting I’m far from infallible.

[1] In this instance simply using string values directly was more than adequate.

[2] The immediate fix of course was simply to copy to a temporary filename and then rename on completion, see “Copy & Rename (Like Copy & Swap But For File-Systems)”.

[3] The “Intriguing SCHTASKS Bug” that I found back in 2011 was certainly unusual, but a little googling turned up an answer reasonably quickly.

[4] See “The Downs and Ups of Being an ACCU Member” for my own watershed moment about how high the bar really goes.

The OldWood Thing

Saturday, 2 December 2017

Fallibility

No comments:

Post a Comment

About Me

Popular Posts

Blog Archive

Labels

Links

Stats [Monthly]