Friday 9 February 2024

Our Star Baker

Just over 14 years ago I posted the eulogy I wrote for my father on this blog (So Long and Thanks For All the Onions) mostly because I had just started writing and this blog gave me the confidence to write. Sadly, a month ago my mother passed away too and yesterday I got to present my eulogy for her as well. The writing practice from the intervening 14 years undoubtedly made the mechanics easier and afforded me more time for reflection and allowed me to better translate my thoughts and feelings to the page. So, thank you blog, but far more importantly “thank you Mum” you too will be with me always…

I was visiting Mum in hospital a few years ago, when she was having one of her knees or hips or something replaced, and a nurse came in and addressed her as “Jennifer”. For a brief moment I wondered who they were talking to, I’m not even sure Mum acknowledged her at first either. Mum hasn’t been a Jennifer for very long time – she was always Jenny, or Jen to her friends. Of course, to myself and Jo she was “mum”, and to our various offspring she was Grandma, or Grandma Jenny. Millie and Ella came up with The Jenster but it never really gained any traction outside our family, for obvious reasons.

She was only ever Jennifer in an official capacity, such as when she returned to work part-time, initially doing market research for the BBC. One of my earliest memories was of tagging along with Mum to some local parade of shops where she would interview the public about their viewing and listening habits. We got to play on the slopes and the railings while she quizzed the public, which was a lot more fun than it might sound today. From there she switched to clerical work, most notably (for me at least) at Bowater-Scott where one of her colleagues offered to build Jo and I a wooden sledge, which he did! As I only remember visiting that office maybe once or twice, I’ve always assumed Mum must have made quite an impression there.

Eventually she ended up in the Audiology Department at West Hill Hospital in Dartford treating people that had lost their hearing. It was only supposed to be a two-week placement, but she ended up staying for 21 years in the end (after cross-training as a Student Technician a year later). It was fairly apparent even to us kids that not every patient was easy to deal with, but she always put their needs first, to the extent that she would often bring her paperwork home to allow her to prioritize her time with the patients instead. If anyone was ever in any doubt that Mum was a people-person, her career at the hospital would surely stand as testament, backed up by the many cards of thanks she received over the years from the people she helped. Perhaps the one aspect of her work we regret her bringing home was the need to talk so much louder all the time.

The hospital wasn’t the most enjoyable place to hang around during the school holidays, but I soon discovered a tiny little computer shop at the top of West Hill which then made the trips to her workplace an absolute joy. When it came to buying me that first computer, I know Dad wasn’t so convinced, and it is Mum that I must thank for taking that leap of faith. Even forty years later Mum would still joke about whether it was the right thing to have done, as it could still just be a passing fad.

When I wrote the eulogy for my father, I suggested that the genes which have probably contributed most to my career in computer programming probably came from him, but in this more recent time for reflection I am beginning to question if it wasn’t more from my mother’s side instead. For her generation Mum was very good with technology – the proverbial Silver Surfer. Although she might occasionally ask for technical advice, she often sorted out her own problems, along with those of her friends! She always wrote a good email and picked-up WhatsApp with similar ease, if occasionally being a little over-zealous with the emojis. We had several different family WhatsApp groups with which she was very active and helped ensure she remained in constant contact with her grandchildren and could easily find out what they were up to. She took a genuine interest in their lives and they were always keen to share. It wasn’t unusual for Charlotte or me to mention what Mum was up to only to be met with a chorus of “yes, we know!” because they had been conversely directly with her about it.

This need to adapt to the ever-changing world was something which Mum embraced, not only on a technical level but also from a social perspective. Rather than dismiss young people because they haven’t faced the same struggles or because their viewpoint didn’t match hers, she would instead engage with them to try and understand how and why the world was changing the way it was. Her grandchildren helped her move with the times and in effect helped her to remain young at heart. She very much believed the old saying about only being as old as you feel. Her body may have shown some signs of wear and tear as she reached her eighties, but her mind was still razor-sharp, along with her wit.

We probably shouldn’t be surprised that some of her joints needed replacing later in life because she was always such an active person! Her diary always seemed to be full – from dawn until dusk – whether that be out with friends and family, or abroad visiting another new country and making even more new friends, and not just for the duration of the trip, they often became lifelong friends which speaks volumes about the kind of impression she left on everyone she met.

As a family we often joked when we visited somewhere new that grandma had probably already been there. If she had, you knew you could rely on her suggestions to fill your itinerary as they would include a range of beautiful vistas, buildings, galleries, restaurants, etc.. Over the years she visited a whole variety of different places from Alaska to Moscow with Canada, Croatia, China, and India to name a few in between. We were always fascinated to see the photo album she would put together on her return and listen to the stories she told about the people she met.

When our children were much smaller we managed to convince her to come away with us on a couple of more relaxing beach holidays. Much as she enjoyed reading at home she wasn’t the sort of person to curl up on a sun lounger with a book, not when there were places she could explore, and the grandkids also made sure it wasn’t going to be a relaxing holiday for any of us, least of all Grandma. I’m still not sure what possessed Mum and I to go paragliding in Turkey! Running, and then Jumping off a cliff while strapped to a stranger felt courageous enough for me, let alone mum who was in her mid-sixties by then. At the end she remarked the scariest part was the jeep ride up the mountain, not coming back down again by parachute!

What all her travelling proved was that she had a great sense of adventure and that was epitomised by her walk along the Inca Trail to Machu Picchu. This multi-day hike was essentially a marathon at high elevation, so a challenge even to the younger trekkers. Beforehand Mum was a little concerned about her age and fitness but she put the training in and need not have worried as she found herself at or near the front the entire time. In fact the porters nicknamed her “the nanny goat” on account of how well she acquitted herself. For once the trip didn’t just conclude with a photo album but ended up becoming a PowerPoint presentation too which she gave twice at Isaac and Millie’s school as Machu Picchu was on their curriculum. That wasn’t the only lesson they got from her there either, as Mum and Dad also put us all to shame at the school’s 1940’s night with a wonderful display of swing dancing.

I don’t think mum was ever a passive bystander in anything she got involved in, she was always there to lend a hand and ultimately would get drawn in to fill whatever role needed her talents. While I’m sure she enjoyed watching us swim I think she preferred it when she could also be an active participant – initially helping-out by decorating the float for the parade, to becoming a committee member, then club secretary, and then officiating at galas in various capacities, even after we’d flown the nest. (Her long-standing service to the Kent County Executive was recognised in 2002 when she received the Edward Maples trophy.) She even got to rekindle her netball skills in a couple of Mother & Daughter swimming club socials and we discovered along the way that she had briefly appeared on TV in her youth playing netball.

This wasn’t the only time she has featured on TV, more recently her left arm made a guest appearance on the BBC during The Proms. Mum was a huge fan of art, both in the literal sense and the wider movement. Although we lived well over an hour away London was middle ground for us both and that gave us the perfect opportunity to meet up and take in a West End show or a trip to the Royal Albert Hall. I was already well versed in musicals long before meeting Charlotte and have Mum to thank for knowing so much of The Sound of Music off-by-heart. While always a favourite in our house too, Mamma Mia became the musical of choice when the kids went to stay at Grandma’s after seeing the show together in London.

Even though she didn’t live on our doorstep that didn’t stop her from attending so many of the concerts and productions that her grandchildren featured in. She was always a big supporter of their talents and watched them whether it was a bit-part in the school Nativity or a paid-concert in Ely Cathedral or Huddersfield Town Hall. (Or for that matter a freezing cold football pitch or rugby pitch, which is probably why she nudged Jo and I towards the warmth of a swimming pool.) For some of you this will be old news as she was keen to share their endeavours with her close friends as any doting grandmother would. She attended so many events in and around Godmanchester over the years that people were always surprised to learn that she actually lived 90 miles away!

During the pandemic this distance made it a little harder to meet up in person, but it didn’t stop her from socialising and even doing activities with the kids. Like everyone else we used Zoom to keep in touch and Ella and Mum created their own virtual Great British Bake Off to ensure those legendary cooking skills were still put to good use. I was never a big fan of baking per-se, but I did enjoy squishing the sausage meat between my fingers when we made sausage rolls for Christmas. Likewise making mince pies was something I enjoyed too, and this Christmas baking tradition was passed-down to my children while I took on the more important role of keeping Mum’s glass of Prosecco topped up. Her puddings definitely were legendary, but for The Oldwood family it’s undoubtedly her coffee flavoured birthday cake that she will be most sorely missed for, baking-wise.

I’m now two-thousand words in and have barely scratched the surface of memories I could talk about. At some point I need to stop and give you the opportunity to share your favourite memories with us, and with the other people here. And share them we must, because that’s how we keep her memory alive. Every time we plan a trip, or open a packet of biscuits, or play a game of Rummy, or use her baking tray, or pour a glass of red wine, or whatever, there will be another opportunity to share our love for the person we once knew as Jenny, or Mum, or Grandma.

Wednesday 4 October 2023

Unpacking Code Ownership

This post was prompted by a document I read which was presented as a development guide. While the rest of it was about style, the section that particularly piqued my interest was one involving code ownership. For those of us who’ve been around the block, the term “code ownership” can bring with it connotations of protectionism. If you’ve never worked with people who are incredibly guarded about the code they write may I recommend my 2017 blog post Fallibility which contains two examples of work colleagues that erected a wall around themselves and their code.

While I initially assumed the use of the term was a proxy for accountability, some comments to my suggestion that Relentless Refactoring was an established practice in many teams hinted that there might be more to it than that. What came out of an online meeting of the team was that the term was carrying the weight of two different characteristics.

(I should point out that I’m always wary of this kind of discussion verging into bike-shedding territory. I like to try and ensure that language is only as precise as necessary, so when I suspect there may be confusion or suboptimal behaviour as a consequence, do I feel it’s worth digging deeper. In this instance I think “ownership” was referring to the following attributes and not about gatekeeping or protectionism for selfish reasons, e.g. job security.)

Accountability / Responsibility

When people talk about “owning your mistakes” what they’re referring to is effectively being accountable, nay responsible, for them. While there might be a legal aspect in the cases of Machiavellian behaviour, for the most part what we’re really after is some indication that changes were not made simply because “you felt like it”. Any code change should be justifiable which implies that there is an air of objectivity around your rationale.

For example, reformatting the code simply because you personally prefer a different brace placement is not objective [1]. In contrast, reformatting to meet the pre-agreed in-house style is. Likewise applying any refactoring that brings old code back in line with the team’s preferred idioms is inherently sound. Moreover, neither of these should even require any debate as the guide automatically confers agreement [2].

Where it might get more contentious is when it appears to be superfluous, but as long as you can justify your actions with a sense of objectivity I think the team should err on the side of acceptance [3]. The reason I think this kind of change can end up being rejected by default is when there is nothing in the development process to allow the status quo to be challenged. A healthy development process should include time for retrospection (e.g. a formal retrospective) and this is probably the place for further debate if it cannot quickly be resolved. (You should not build “inventory” in the form of open PRs simply because of unresolved conflict [4].)

One scenario where this can be less objective is when trying to introduce new idioms, i.e. experimental changes that may or may not set a new precedent. I would expect this to solicit at least some up-front discussion or proactive reviewing / pairing to weed out the obvious chaff. Throwing “weird” code into the codebase without consulting your teammates is disrespectful and can lead to unnecessary turf wars.

Being accountable also implies that you are mature enough to deal with the consequences if the decision doesn’t go your way, aka Egoless Programming [5]. That may involve seeing your work rejected or rewritten, either immediately or in the future which can feel like a personal attack, but shouldn’t.

Experience / Expertise

While accountability looks at ownership from the perspective of the person wanting to change the code, the flipside of ownership is about those people best placed to evaluate change. When we look for someone to act as a reviewer we look for those who have the most experience either directly with the code itself, or from working on similar problems. There may also be different people that can provide a technical or business focused viewpoint if there are both elements at play which deserve special attention, for example when touching code where the previous authors have left and you need help validating your assumptions.

In this instance what we’re talking about are Subject Matter Experts. These people are no more “owners” of the code in question than we are but that doesn’t mean they can’t provide useful insights. If anything having people unrelated to the code reviewing it can be more useful because you know they will have no emotional attachment to it. If the change makes sense feature-wise, and does it in a simple, easy to understand way, does anything else really matter?

These days we have modern tooling like version control products which, assuming we put the right level of metadata in, allow us to see the evolution of the codebase along with the rationale even when the authors have long gone. Ownership doesn’t have to be conferred simply because you’re the only one that remembers how and why the code ended up the way it did. This leads into territory around fear of change which is not a sustainable approach to software delivery. In this day-and-age “consulting the elders” should really be a last resort for times when the record of events is lost in the sands of time. Approval should be a function based on knowledge of the subject matter rather than simply years of service [6].

Shepherds, Not Owners

Ultimately what I find slightly distasteful about the term “shared ownership” is that it still conveys a sense of protectionism, especially for those currently “outside the team”.

From a metaphorical point of view what I think I described above is more a sense of shepherding. The desire should be to nurture contributors to understand the culture of the codebase and product to the extent that the conversations can focus on the essential, rather than accidental complexity.

I wonder if “shared mentorship” would work as a substitute?

 

[1] This is a good argument for using a standard code formatting tool as it can make these debates moot. 

[2] If the code is that performance sensitive it should not be touched without consultation then there should either be some performance tests or at a minimum some comments to make that obvious.

[3] The late Pieter Hintjens makes a compelling case in Why Optimistic Merging Works Better.

[4] There is where I favour the optimism of Trust, but Verify as an approach, or pairing / ensemble programming to reach early consensus.

[5] The Psychology of Computer Programming – Gerry Weinberg, 1971.

[6] One needs to be mindful of not falling into the Meritocracy trap though.

Monday 31 October 2022

WMI Performance Anomaly: Querying the Number of CPU Cores

As one of the few devs that both likes and is reasonably well-versed in PowerShell I became the point of contact for a colleague that was bemused by a performance oddity when querying the number of cores on a host. He was introducing Ninja into the build and needed to throttle its expectations around how many actual cores there were because hyperthreading was enabled and our compilation intensive build was being slowed by its bad guesswork [1].

The PowerShell query for the number of cores (rather than logical processors) was pulled straight from the Internet and seemed fairly simple:

(Get-WmiObject Win32_Processor |
  measure -Property NumberOfCores -Sum).Sum

However when he ran this it took a whopping 4 seconds! Although he was using a Windows VM running on QEMU/KVM, I knew from benchmarking a while back this setup added very little overhead, i.e. only a percentage or two, and even on my work PC I observed a similar tardy performance. Here’s how we measured it:

Measure-Command {
  (Get-WmiObject Win32_Processor |
  measure -Property NumberOfCores -Sum).Sum
} | % TotalSeconds
4.0867539

(As I write my HP laptop running Windows 11 is still showing well over a second to run this command.)

My first instinct was that this was some weird overhead with PowerShell, what with it being .Net based so I tried the classic native wmic tool under the Git Bash to see how that behaved:

$ time WMIC CPU Get //Format:List | grep NumberOfCores  | cut -d '=' -f 2 | awk '{ sum += $1 } END{ print sum }'
4

real    0m4.138s

As you can see there was no real difference so that discounted the .Net theory. For kicks I tried lscpu under the WSL based Ubuntu 20.04 and that returned a far more sane time:

$ time lscpu > /dev/null

real    0m0.064s

I presume that lscpu will do some direct spelunking but even so the added machinery of WMI should not be adding the kind of ridiculous overhead that we were seeing. I even tried my own C++ based WMICmd tool as I knew that was talking directly to WMI with no extra cleverness going on behind the scenes, but I got a similar outcome.

On a whim I decided to try pushing more work onto WMI by passing a custom query instead so that it only needed to return the one value I cared about:

Measure-Command {
  (Get-WmiObject -Query 'select NumberOfCores from Win32_Processor' |
  measure -Property NumberOfCores -Sum).Sum
} | % TotalSeconds
0.0481644

Lo-and-behold that gave a timing in the tens of milliseconds range which was far closer to lscpu and definitely more like what we were expecting.

While my office machine has some “industrial strength” [2] anti-virus software that could easily be to blame, my colleague’s VM didn’t, only the default of MS Defender. So at this point I’m none the wiser about what was going on although my personal laptop suggests that the native tools of wmic and wmicmd are both returning times more in-line with lscpu so something funky is going on somewhere.

 

[1] Hyper-threaded cores meant Ninja was scheduling too much concurrent work.

[2] Read that as “massively interfering”!

 

Monday 1 November 2021

Chaining IF and && with CMD

An interesting bug cropped up the other day in a dub configuration file which made me realise I wasn’t consciously aware of the precedence of && when used in an IF statement with cmd.exe.

Batch File Idioms

I’ve written a ton of batch files over the years and, with error handling being a manual affair, the usual pattern is to alternate pairs of statement + error check, e.g.

mkdir folder
if %errorlevel% neq 0 exit /b %errorlevel%

It’s not uncommon for people to explicitly leave off the error check in this particular scenario so that (hopefully) the folder will exist whether not it already does. However it then masks a (not uncommon) failure where the folder can’t be created due to permissions and so I tend to go for the more verbose option:

if not exist "folder" (
  mkdir folder
  if !errorlevel! neq 0 exit /b !errorlevel!
)

Note the switch from %errorlevel% to !errorlevel!. I tend to use setlocal EnableDelayedExpansion at the beginning of every batch file and use !var! everywhere by convention to avoid forgetting this transformation as it’s an easy mistake to make in batch files.

Chaining Statements

In cmd you can chain commands with & (much like ; in bash) with && being used when the previous command succeeds and || for when it fails. This is useful with tools like dub which allow you to define “one liners” that will be executed during a build by “shelling out”. For example you might write this:

mkdir bin\media && copy media\*.* bin\media

This works fine first time but it’s not idempotent which might be okay for automated builds where the workspace is always clean but it’s annoying when running the build repeatedly, locally. Hence you might be inclined to fix this by changing it to:

if not exist "bin\media" mkdir bin\media && copy media\*.* bin\media

Sadly this doesn’t do what the author intended because the && is part of the IF statement “then” block – the copy is only executed if the folder doesn’t exist. Hence this was the aforementioned bug which wasn’t spotted at first as it worked fine for the automated builds but failed locally.

Here is a canonical example:

> if exist "C:\" echo A && echo B
A
B

> if not exist "C:\" echo A && echo B

As you can see, in the second case B is not printed so is part of the IF statement happy path.

Parenthesis to the Rescue

Naturally the solution to problems involving ordering or precedence is to introduce parenthesis to be more explicit.

If you look at how parenthesis were used in the second example right back at the beginning you might be inclined to write this thinking that the parenthesis create a scope somewhat akin to {} in C style languages:

> if not exist "C:\" (echo A) && echo B

But it won’t work as the parenthesis are still part of the “then” statement. (They are useful to control evaluation when mixing compound conditional commands that use, say, || and & [1].)

Hence the correct solution is to use parenthesis around the entire IF statement:

> (if not exist "C:\" echo A) && echo B
B

Applying this to the original problem, it’s:

(if not exist "bin\media" mkdir bin\media) && copy media\*.* bin\media

 

[1] Single line with multiple commands using Windows batch file

Thursday 30 September 2021

Transient Expand-Archive Failures

[I’m sure there is something else going on here but on the off-chance someone else is also observing this and also lost at least they’ll know they’re not alone.]

We have a GitLab project pipeline that started out as a monolithic job but over the last 9 months has slowly been parallelized and now runs as over 150 jobs spread out across a cluster of 4 fairly decent [1] machines with 8 to 10 concurrent jobs per host. More recently we’ve started seeing the PowerShell Expand-Archive cmdlet failing randomly up to 5% of the time with the following error:

Remove-Item : Cannot find path {...} because it does not exist.

The line of code highlighted in the error is:

$expandedItems | % { Remove-Item $_ -Force -Recurse }

If you google this message it suggests this probably isn’t the real error but a problem with the cmdlet trying to clean-up after failing to extract the contents of the .zip file. Sadly the reason why the extraction might have failed in the first place is now lost.

Investigation

While investigating this error message I ran across two main hits – one from Stack Overflow and the other on the PowerShell GitHub project – both about hitting the classic long path problem in Windows. In our case the extracted paths, even including the build agent root, is still only 100 characters so well within the limit as the archive only has one subfolder and the filenames are short.

Also the archive is built with it’s companion cmdlet Compress-Archive so I doubt it’s an impedance mismatch in our choice of tools.

My gut reaction to anything spurious like this is that it’s the virus scanner (AV) [2]. Sadly I have no direct control over the virus scanner product choice or its configuration. In this instance the machines have Trend Micro whereas the other build agents I’ve built are VMs and have Windows Defender [3], but their load is also much lower. I managed to get the build folder excluded temporarily but that appears to have had no effect and nothing was logged in the AV to say it had blocked anything. (The “behaviour monitoring” in modern AV products often gets triggered by build tools which is annoying.)

After discounting the obvious and checking that memory exhaustion also wasn’t a factor as the memory load for the jobs is variable and the worst case loading can cause the page-file to be used, I wondered if there the problem lay with the GitLab runner cache somehow.

Corrupt Runner Cache?

To avoid downloading the .zip file artefact for every job run we utilise the GitLab runner local cache. This is effectively a .zip file of a packages folder in the project working copy that gets packed up and re-used in the other jobs on the same machine which, given our level of concurrency, means it’s constantly in use. Hence I wondered if our archive was being corrupted when the cache was being unpacked as I’ve seen embedded .zip files cause problems in the past for AV tools (even though it supposedly shouldn’t have been touching the folder). So I added a step to test our archive’s integrity before unpacking it by using 7-Zip as there doesn’t appear to be a companion cmdlet Test-Archive. I immediately saw the integrity test pass but the Expand-Archive step fail a few times so I’m pretty sure the problem is not archive corruption.

Workaround

The workaround which I’ve employed is to use 7-Zip for the unpacking step too and so far we’ve seen no errors at all but I’m left wondering why Expand-Archive was intermittently failing. Taking an extra dependency on a popular tool like 7-Zip is hardly onerous but it bumps the complexity up very slightly and needs to be accounted for in the docs / scripts.

In my 2017 post Fallibility I mentioned how I once worked with someone who was more content to accept they’d found an undocumented bug in the Windows CopyFile() function than believe there was a flaw in their code or analysis [4]. Hence I feel something as ubiquitous as Expand-Archive is unlikely to have a decompression bug and that there is some piece of the puzzle here that I’m missing. Maybe the AV is still interfering in some way that isn’t triggered by 7-Zip or the transient memory pressure caused by the heavier jobs is having an impact?

Given the low cost of the workaround (use 7-Zip instead) the time, effort and disruption needed to run further experiments to explore this problem further is sadly too high. For the time being annecdata is the best I can do.

 

[1] 8 /16 cores, 64 / 128 GB RAM, and NVMe based disks.

[2] I once did some Windows kernel debugging to help prove an anti-virus product update was the reason our engine processes where not terminating correctly under low memory conditions.

[3] Ideally servers shouldn’t need anti-virus tools at all but the principle of Defence in Depth suggests the minor performance impact is worth it to potentially help slow lateral movement.

[4] TL;DR: I quickly showed it was the latter at fault not the Windows API.

 

Monday 27 September 2021

Lose the Source Luke?

We were writing a new service to distribute financial pricing data around the trading floor as a companion to our new desktop pricing tool. The plugin architecture allowed us to write modular components that could tap into the event streams for various reasons, e.g. provide gateways to 3rd party data streams.

Linking New to Old

One of the first plugins we wrote allowed us to publish pricing data to a much older in-house data service which had been sat running in the server room for some years as part of the contributions system. This meant we could eventually phase that out and switch over to the new platform once we had parity with it.

The plugin was a doddle to write and we quickly had pricing data flowing from the new service out to a test instance of the old service which we intended to leave running in the background for soak testing. As it was an in-house tool there was no installer and my colleague had a copy of the binaries lying around on his machine [1]. Also he was one of the original developers so knew exactly what he was doing to set it up.

A Curious Error Message

Everything seemed to be working fine at first but as the data volumes grew we suddenly noticed that the data feed would eventually hang after a few days. In the beginning we were developing the core of the new service so quickly it was constantly being upgraded but now the pace was slowing down the new service was alive for much longer. Given how mature the old service was we assumed the issue was with the new one. Also there was a curious message in the log for the old service about “an invalid transaction ID” before the feed stopped.

While debugging the new plugin code my colleague remembered that the Transaction ID meant the message sequence number that goes in every message to allow for ordering and re-transmission when running over UDP. The data type for that was a 16-bit unsigned integer so it dawned on us that we had probably messed up handling the wrapping of the Transaction ID.

Use the Source Luke

Given how long ago he last worked on the old service he couldn’t quite remember what the protocol was for resetting the Transaction ID so we decided to go and look at the old service source code to see how it handled it. Despite being at the company for a few years myself this all pre-dated me so I left my colleague to do the rummaging.

Not long after my colleague came back over to my desk and asked if I might know where the source code was. Like so many programmers in a small company I was a part-time sysadmin and generally looked after some of servers we used for development duties, such as the one where our Visual SourceSafe repository lived that contained all the projects we’d ever worked on since I joined.

The VCS Upgrade

When I first started at the company there were only a couple of programmers not working on the mainframe and they wrote their own version control system. It was very Heath Robinson and used exclusive file locks to side-step the problem of concurrent changes. Having been used to a few VCS tools by then such as PVCS, Star Versions, and Visual SourceSafe I suggested that we move to a 3rd party VCS product as we needed more optimistic concurrency controls as more people were going to join the team. Given the MSDN licenses we already had along with my own experience Visual SourceSafe (VSS) seemed like a natural choice back then [2].

Around the same time the existing development server was getting a bit long in the tooth so the company forked out for a brand new server and so I set-up the new VSS repository on that and all my code went in there along with all the subsequent projects we started. None of the people that joined after me ever touched any of the old codebase or VCS as it was so mature it hadn’t needed changing in some time and anyway the two original devs where still there to look after it.

The Office Move

A couple of years after I joined, the owners of the lovely building the company had been renting for the last few decades decided they wanted to gut and renovate it as the area in London where we were based was getting a big makeover. Hence we were forced to move to new premises about half a mile away. The new premises were nice and modern and I no longer had the vent from the portable air-conditioning machine from one of the small server rooms pumping out hot air right behind my desk [3].

When moving day came I made sure the new server with all our stuff on got safely transported to the new office’s server room so that we ready to go again on Monday morning. As we stood staring around the empty office floor my colleague pointed to the old development server which had lay dormant in the corner and asked me (rhetorically) whether we should even bother taking it with us. As far as I was concerned everything I’d ever needed had always been on the new server and so I didn’t know what was left that we’d still need.

My colleague agreed and so we left the server to be chucked in the skip when the bulldozers came.

Dormant, But Not Redundant

It turned out their original home-grown version control system had a few projects in it, including the old data service. Luckily one of the original developers who worked on the contributions side still had an up-to-date copy of that and my colleague found a local copy of the code for one of the other services but had no idea how up-to-date it was. Sadly nobody had even a partial copy of the source to the data service we were interested in but we were going to replace that anyway so in the end the loss was far less significant than we originally feared.

In retrospect I can’t believe we didn’t even just take the hard disk with us. The server was a classic tower so took up a far bit of room which was still somewhat at a premium in the new office whereas the disk could probably have sit in a desk drawer or even been fitted as an extra drive in the new midi sized development server.

 

[1] +1 for xcopy deployment which made setting up development and test instances a piece of cake.

[2] There are a lot of stories of file corruption issues with VSS but in the 7 years I’d used it with small teams, even over a VPN, we only had one file corruption issue that we quickly restored from a backup.

[3] We were on the opposite side from the windows too so didn’t even get a cool breeze from those either.

 

Thursday 23 September 2021

The Case of the Curious Commit Message

I had taken a new contract at an investment bank and started working on a very mature codebase which was stored in ClearCase. As a long-time user [1] of version control systems one of the things that bugged me about the codebase were empty commit messages. On a mature codebase where people have come and gone it’s hard enough to work out what was going on just from the code, decent commit messages should be there to give you that extra context around the “why”.

Rallying the Troops

After attempting to sell the virtues of commit messages to my colleagues a couple of times during team meetings there were still the odd one or two that consistently avoided doing so. So I decided to try a name-and-shame approach [2] by emailing a table of names along with their percentage of non-empty commit messages hoping those that appeared at the bottom would consider changing their ways.

At the time I was just getting my head around ClearCase and there were a couple of complaints from people who felt unduly chastised because they didn’t have 100% when they felt they should. It turned out their accounts were used for some automated check-ins which had no message which I didn’t know about, so I excluded those commits and published a revised table.

Progress?

On the plus side this got people discussing what a good commit message looked like and it brought up some question marks around certain practices that others had done. For example a few team members wouldn’t write a formal message but simply paste the ID of the issue from ClearQuest [3]. Naturally this passed my “not empty” test but it raised a question about overly terse commit messages. Given where we were coming from I felt this was definitely acceptable (for the time being) as they were still using the commit message to provide more details, albeit in the form of a link to the underlying business request [5].

However, it got me thinking about whether people were not really playing ball and might be gaming the system so I started looking into overly terse commit messages and I’m glad to say everyone was entering into the spirit of things [4]. Everyone except one person who had never even been on the initial radar but who had a sizable number of commits with the simple message:

    nt

That’s right, just the two letters ‘n’ and ‘t’. (There were others but this was the most prevalent and memorable.)

A Curious Message

Looking at the diffs that went with these messages it wasn’t obvious what “nt” meant. My initial instinct was that it was an abbreviation of some sort, perhaps a business term I was unaware of as the developer was involved in the more maths heavy side of the project. They were far more common before my “shake-up” so I was pleased that whatever this term was it was being replaced by more useful comments now but I was still intrigued. Naturally I walked across the room to the very pleasant developer in question and asked him what “nt” meant.

It turned out it didn’t mean anything, and the developer was largely unaware they even existed! So where did they come from?

The Mist Clears

Luckily while we were chatting he started making a new change and I saw the ClearCase check-out dialog appear and the initial message was a few letters of garbage. I looked at what he intended to type in the editor and it dawned on me what was happening – the “nt” was the latter part of the word “int”.

Just as with Visual SourceSafe, the ClearCase Visual Studio plugin would trigger when you started editing a file and nothing else was checked out at that point. It would pop-up a dialog so you could configure how the check-out was done. For example you might want to put an exclusive lock on the file [6] or you could provide a message so others could see what files were being edited concurrently. By default the focus in this dialog was on the OK button so it was possible to dismiss this dialog without even really seeing it…

Hence this is what was going on:

  1. The dev typed “int” to start a declaration as part of a new set of code changes.
  2. The “i” keypress triggered the ClearCase plugin which noticed this was the start of a new check-out and promptly threw up a dialog with the remaining letters “nt” in the message field.
  3. By then the dev had already pressed “space” at the end of the type name which, due to the default button focus, caused the dialog to immediately disappear.
  4. When he committed the changes at the end he never edited the message anyway, he would just click the commit button and move on.

Case closed. From a UI perspective it probably falls into the same category (although with less disastrous consequences) as those unexpected popups that ask if you want to reboot your machine, NOW. Ouch!

 

[1] I was introduced to them on my very first job and have been fortunate enough to use one on virtually every job since, even if I ended up setting one up :o).

[2] In retrospect I probably didn’t try hard enough to sell it and should have taken a more personal approach for the laggards as maybe there were good reasons why they weren’t doing it, e.g. tooling.

[3] Yes, an enterprise level defect tracking tool with all the pain you’d expect from such a product.

[4] For non-trivial things that is, the message “typo” still appeared for some of those but that raised a whole different set of questions around not compiling or testing changes before committing them!

[5] Including the ticket number at the start of a commit message is something I promote in my Commit Checklist.

[6] This was useful for non-mergeable files like DTS packages and media assets but often ended up creating more harm than good as they got left locked and you had to get an admin to unlock them, and they were in another team.