Tuesday 3 September 2024

Cambridge Software Crafters Meetup

Despite living pretty close to Cambridge I’ve never actually worked there. For the past 25 years I’ve commuted to London because it was easier and faster than getting to Cambridge [1]. The daily rates for contract programmers in the finance industry might also have had an influence too :o).

Anyway, one consequence of working in London meant that I had access to all the London meetups in the evening. But I also had a wife and four young children, so wanted to keep the balance right and consequently I largely limited myself to the ACCU London and eXtreme Tuesday Club (XTC) events, only dabbling in others if there was a particularly interesting talk / topic.

Then the pandemic happened, and like many others I found myself working remotely full-time. I had worked from home before on previous contracts, and in some cases that was to facilitate being able to attend a meetup in Cambridge, most notably around .Net and Go, but I only attended a handful or so. Mark Dalgarno (via Software Acumen – organiser of many Agile related conferences, such as Agile Cambridge) was in Cambridge back then too, and through Software East also arranged various enjoyable gatherings.

Eventually some semblance of normality resumed but I still found myself working remotely and now Cambridge was my nearest point of social contact (IT wise) in the evenings. The pandemic meant that meetups had to stop for some considerable time (years!) and with people nervous to venture back out again the choice was naturally limited. Platform focused meetups like Cambridge .Net reappeared in-person sooner, as I guess they had a captive audience, but I wanted something that covered the wider topic of programming at large, not just one specific language / stack. (There was an ACCU Cambridge meetup once-upon-a-time but sadly that folded a long time ago.)

Tangentially, my eldest son had also recently entered a career in programming and he asked me about any local meetups I knew of as he was based around the same area. It didn’t take long to discover that the Cambridge Software Crafters group on Meetup was holding its first in-person gathering since November 2019 (a hiatus of four years). The topic was learning TDD and it was being held in the office where my son was working at the time so I suggested he had no excuse not to attend either :o).

I met some old faces at that event and, more importantly, some new faces too. (I even bumped into a young chap who lived a few houses down my street. I didn’t know he’d also started a career as a programmer.) A diversity of ages, experience, industries, culture, etc. is an important requirement for me in a more general craft-oriented meetup because “one size never fits all” and I like to be aware of what forces and constraints drive other people’s work.

For that first meetup we did a leap-year kata in pairs. I chose to work with someone young who wanted to use TypeScript as I’d never used that before. Having practiced TDD for almost 20 years I clearly wasn’t there as a beginner but I felt my experience might be useful and I always learn something new when pairing with people. (After pairing / mobbing heavily for the 6 years prior to my current contract I realised how much I had missed the sociability of ensemble programming.)

Sadly I missed the next few events but my son kept me informed on what they had got up to and it continued to sound interesting, with practicing TDD clearly a popular topic. I eventually attended the Architecture Kata, which I found interesting and intend to write up some particular observations in a separate post. Then we did an evening of lightning talks (10 mins) which is something I’ve always enjoyed at the ACCU and Python conferences as you always get a pretty diverse set of topics. This was no exception. (It also gave me an opportunity to resurrect a humorous short talk I did at ACCU 2014 – The Art of Code.)

The most recent meetup was another TDD kata, this time around the game of Noughts & Crosses. I ended up pairing with someone that was only just getting into programming and was attending the meetup to learn more about the craft as he had a background in construction. In the end we hardly did any programming per-se and I mostly walked him through TDD but more as a backdrop on talking about what sustainable delivery really entails. It wasn’t at all what I was expecting to do that evening but I was glad I could share some knowledge & experience, and it meant that everybody then got a suitable partner they could work with.

While the sociability factor is high on my list it would be remiss of me not to mention that the event has sponsorship from Codurance which means that the evening comes with free pizza and drinks. The venue, The Bradfield Centre, is a nice modern building and has an auditorium which was used for the lightning talks. Most events take place on the open plan ground floor area as we either pair or work in small groups and then share with everyone what we’ve done / learned.

I really hope this meetup continues to gain traction as I think we often focus too much on specific technologies and not enough on the craft as a whole. There are clearly a few regulars but also new faces each time which provides a nice mix and everyone has been really friendly. It also gives me an opportunity to “talk shop” with my son as that never goes down well at family gatherings :o).

 

[1] Things are much improved with the guided buses and upgraded A14 but what I wrote in “Missing the Daily Commute by Train” still holds as far as using the train goes.

 

Monday 22 April 2024

Naming Functions: When Intent and Implementation Differ

Most of the time these days when I get into a conversation about naming it tends to be about tweaking the language, perhaps because I think there is a much better term available, or the author is a non-native speaker and they’ve transliterated the name and it’s ended being quite jarring, or worse, ambiguous. I’m also on somewhat of a personal crusade to try and vastly reduce the lazy use of “get” in codebases by replacing it with far more descriptive names. (For a more humorous take on the overuse of “get” try my Overload back-pager “Acquire Carter”. Alternatively my C Vu article “In The Toolbox – Dictionary & Thesaurus” tackles the issue a little more traditionally.)

However, more recently I ended up in a conversation where I felt the the whole point of “naming to reveal intent” had gone out of the window and the author and other reviewer were far too fixated on reflecting the implementation instead. Normally the intent and implementation align very closely, but in this instance they didn’t because it was the first step towards formalising a new feature, which was implemented temporarily [1] using a different one.

To anyone who’s read Arlo Belshee’s writings on Naming as a Process I think you’ll recognise what follows as a case of “Honest Naming” instead of “Naming for Intent”, and that comes at a cost to the reader.

Background

I currently work on a system used to generate a variety of different reports. When some of the reporting code runs it generates log messages which warn about missing data, which means it’s had to substitute something else instead. When the code was initially written it was being run interactively and so the missing data would get addressed quickly. However, now that it’s automated nobody sees these messages anymore, except when the code fails and anything written to stdout gets included in the failure alert.

Naturally, there should really be some better way of surfacing these anomalies, even if the report generation succeeds, and there will be, eventually. However, in the meantime we discovered there is a way of hooking into the report scheduler so that those warnings could at least be surfaced in a completion status email instead. This mechanism involved a bit of abuse though – using the “critical” log level instead of merely logging the missing data as a “warning”, because serious errors during runtime were still highlighted at completion.

First Attempt: Confusion Through Dishonesty

The simplest thing that could possibly work, and the initial change that was presented, was to directly change any log lines like this:

print(“WARNING: Missing data for ‘XYZ’”)

to use an alternative logging function like this:

logCritical(“Missing data for ‘XYZ’”)

My immediate reaction to this was one of “No, now the code is lying to us!”. The code previously made it clear that this condition was just an inconvenience but now it appears to be far more serious.

While the elevation in severity was unfortunate my real problem was that it was missing the reason why the logging call was being made in the first place. Even in the original code the use of print() was a means to another end, and ultimately we should make that end more obvious.

While I’m a fan of doing the simplest thing by default, that does not also mean doing something overly naïve. The cost of adding an extra level of indirection at this point in time was not onerous and would very likely minimise rework in the future as we evolved the real implementation.

Second Attempt: Too Honest

I thought I had expressed my opinion fairly clearly and there appeared to be agreement on the way forward so I was somewhat surprised when the code came back like this:

enrichEmailWithWarning(“Missing data for ‘XYZ’”)

At this point I pulled out Arlo Belshee’s classic naming infographic to try and show how we were still a way off getting to the point where the code was saying what was really going on at a business level.

What makes this particularly confusing is the code where this appears runs in a child process and so has no knowledge of the parent scheduler. Hence the function being invoked is talking about some obscure mechanism totally outside itself. The name of the function in this instance is too honest, it says what is going under the covers but not what it really means. If you run this interactively again what email is it even talking about?

On the plus side though we have at least encapsulated the underlying mechanism to stop lying about the severity, and we can reimplement this later to use a separate file, database, or whatever without touching all the call sites. Or can we?

Well, the problem with honest naming is that when the implementation changes, then so does the name. This means all the call sites will have to be updated yet again each time we reimplement it. Every time we update the code we run the risk of breaking it [2], and it adds unnecessary noise to the file’s history because the real intent hasn’t changed.

Third Attempt: Revealing Intent

Let’s take a step back and ask ourselves what this line of code is really trying to achieve:

print(“WARNING: Missing data for ‘XYZ’”)

While you might say that it’s printing or logging, that is just the implementation bleeding through. A more general way of saying that might be “recording”. You could argue that “notifying” might be better because the information is included in a status email but I’d suggest that is being too specific at this point because in the future it may only get written somewhere and the retrieval mechanism is totally up for grabs and could easily involve not sending yet more emails.

Okay, so we’re recording something, but that’s not much better than print() or log() if we are just going to pass an arbitrary text string. We need to think more about what we’re recording to see if that can be included in the name too, or broken down into separate arguments.

The whole reason this conversation started was because the messages about missing data were not being seen and therefore we needed to surface them in some way. Hence what we’re recording is “missing data” and that record ideally needs to be different from the diagnostic trace messages and other more general logging output. Hence the function name / API I proposed was:

recordMissingData(“XYZ”)

To me this now expresses far better what the original author intended all along. The implementation has the ability to vary quite widely from simply printing to stdout, logging to a file, writing to a queue / database, and yet the same name will continue to reflect all those possibilities. By elevating the text “missing data” from the log message to the function name we are also being more explicit about the fact that handling missing data is formally recognised as a feature and can be given special treatment, unlike the general logging output.

Anyone not involved in this conversation who runs across the code and has a similar requirement stands a better chance of using it correctly too, meaning less rework. When it’s missing, future readers may try to “second guess” what’s going on and then apply the same mechanism for different, inappropriate reasons and now, as they say, you have two problems.

I can’t say for sure we won’t need to revisit this again as we learn more about the nature of what data is missing and whether with more context we can automatically triage and notify the right people, but for now it feels like the cost / benefit ratio of “talking versus doing” is about right.

 

[1] Describing anything as “temporary” is a fools errand, we all know it will live far longer than planned :o).

[2] This is code written in a dynamic language so even a simple typo can go undetected unless it’s tested properly.

 

Friday 9 February 2024

Our Star Baker

Just over 14 years ago I posted the eulogy I wrote for my father on this blog (So Long and Thanks For All the Onions) mostly because I had just started writing and this blog gave me the confidence to write. Sadly, a month ago my mother passed away too and yesterday I got to present my eulogy for her as well. The writing practice from the intervening 14 years undoubtedly made the mechanics easier and afforded me more time for reflection and allowed me to better translate my thoughts and feelings to the page. So, thank you blog, but far more importantly “thank you Mum” you too will be with me always…

I was visiting Mum in hospital a few years ago, when she was having one of her knees or hips or something replaced, and a nurse came in and addressed her as “Jennifer”. For a brief moment I wondered who they were talking to, I’m not even sure Mum acknowledged her at first either. Mum hasn’t been a Jennifer for very long time – she was always Jenny, or Jen to her friends. Of course, to myself and Jo she was “mum”, and to our various offspring she was Grandma, or Grandma Jenny. Millie and Ella came up with The Jenster but it never really gained any traction outside our family, for obvious reasons.

She was only ever Jennifer in an official capacity, such as when she returned to work part-time, initially doing market research for the BBC. One of my earliest memories was of tagging along with Mum to some local parade of shops where she would interview the public about their viewing and listening habits. We got to play on the slopes and the railings while she quizzed the public, which was a lot more fun than it might sound today. From there she switched to clerical work, most notably (for me at least) at Bowater-Scott where one of her colleagues offered to build Jo and I a wooden sledge, which he did! As I only remember visiting that office maybe once or twice, I’ve always assumed Mum must have made quite an impression there.

Eventually she ended up in the Audiology Department at West Hill Hospital in Dartford treating people that had lost their hearing. It was only supposed to be a two-week placement, but she ended up staying for 21 years in the end (after cross-training as a Student Technician a year later). It was fairly apparent even to us kids that not every patient was easy to deal with, but she always put their needs first, to the extent that she would often bring her paperwork home to allow her to prioritize her time with the patients instead. If anyone was ever in any doubt that Mum was a people-person, her career at the hospital would surely stand as testament, backed up by the many cards of thanks she received over the years from the people she helped. Perhaps the one aspect of her work we regret her bringing home was the need to talk so much louder all the time.

The hospital wasn’t the most enjoyable place to hang around during the school holidays, but I soon discovered a tiny little computer shop at the top of West Hill which then made the trips to her workplace an absolute joy. When it came to buying me that first computer, I know Dad wasn’t so convinced, and it is Mum that I must thank for taking that leap of faith. Even forty years later Mum would still joke about whether it was the right thing to have done, as it could still just be a passing fad.

When I wrote the eulogy for my father, I suggested that the genes which have probably contributed most to my career in computer programming probably came from him, but in this more recent time for reflection I am beginning to question if it wasn’t more from my mother’s side instead. For her generation Mum was very good with technology – the proverbial Silver Surfer. Although she might occasionally ask for technical advice, she often sorted out her own problems, along with those of her friends! She always wrote a good email and picked-up WhatsApp with similar ease, if occasionally being a little over-zealous with the emojis. We had several different family WhatsApp groups with which she was very active and helped ensure she remained in constant contact with her grandchildren and could easily find out what they were up to. She took a genuine interest in their lives and they were always keen to share. It wasn’t unusual for Charlotte or me to mention what Mum was up to only to be met with a chorus of “yes, we know!” because they had been conversely directly with her about it.

This need to adapt to the ever-changing world was something which Mum embraced, not only on a technical level but also from a social perspective. Rather than dismiss young people because they haven’t faced the same struggles or because their viewpoint didn’t match hers, she would instead engage with them to try and understand how and why the world was changing the way it was. Her grandchildren helped her move with the times and in effect helped her to remain young at heart. She very much believed the old saying about only being as old as you feel. Her body may have shown some signs of wear and tear as she reached her eighties, but her mind was still razor-sharp, along with her wit.

We probably shouldn’t be surprised that some of her joints needed replacing later in life because she was always such an active person! Her diary always seemed to be full – from dawn until dusk – whether that be out with friends and family, or abroad visiting another new country and making even more new friends, and not just for the duration of the trip, they often became lifelong friends which speaks volumes about the kind of impression she left on everyone she met.

As a family we often joked when we visited somewhere new that grandma had probably already been there. If she had, you knew you could rely on her suggestions to fill your itinerary as they would include a range of beautiful vistas, buildings, galleries, restaurants, etc.. Over the years she visited a whole variety of different places from Alaska to Moscow with Canada, Croatia, China, and India to name a few in between. We were always fascinated to see the photo album she would put together on her return and listen to the stories she told about the people she met.

When our children were much smaller we managed to convince her to come away with us on a couple of more relaxing beach holidays. Much as she enjoyed reading at home she wasn’t the sort of person to curl up on a sun lounger with a book, not when there were places she could explore, and the grandkids also made sure it wasn’t going to be a relaxing holiday for any of us, least of all Grandma. I’m still not sure what possessed Mum and I to go paragliding in Turkey! Running, and then Jumping off a cliff while strapped to a stranger felt courageous enough for me, let alone mum who was in her mid-sixties by then. At the end she remarked the scariest part was the jeep ride up the mountain, not coming back down again by parachute!

What all her travelling proved was that she had a great sense of adventure and that was epitomised by her walk along the Inca Trail to Machu Picchu. This multi-day hike was essentially a marathon at high elevation, so a challenge even to the younger trekkers. Beforehand Mum was a little concerned about her age and fitness but she put the training in and need not have worried as she found herself at or near the front the entire time. In fact the porters nicknamed her “the nanny goat” on account of how well she acquitted herself. For once the trip didn’t just conclude with a photo album but ended up becoming a PowerPoint presentation too which she gave twice at Isaac and Millie’s school as Machu Picchu was on their curriculum. That wasn’t the only lesson they got from her there either, as Mum and Dad also put us all to shame at the school’s 1940’s night with a wonderful display of swing dancing.

I don’t think mum was ever a passive bystander in anything she got involved in, she was always there to lend a hand and ultimately would get drawn in to fill whatever role needed her talents. While I’m sure she enjoyed watching us swim I think she preferred it when she could also be an active participant – initially helping-out by decorating the float for the parade, to becoming a committee member, then club secretary, and then officiating at galas in various capacities, even after we’d flown the nest. (Her long-standing service to the Kent County Executive was recognised in 2002 when she received the Edward Maples trophy.) She even got to rekindle her netball skills in a couple of Mother & Daughter swimming club socials and we discovered along the way that she had briefly appeared on TV in her youth playing netball.

This wasn’t the only time she has featured on TV, more recently her left arm made a guest appearance on the BBC during The Proms. Mum was a huge fan of art, both in the literal sense and the wider movement. Although we lived well over an hour away London was middle ground for us both and that gave us the perfect opportunity to meet up and take in a West End show or a trip to the Royal Albert Hall. I was already well versed in musicals long before meeting Charlotte and have Mum to thank for knowing so much of The Sound of Music off-by-heart. While always a favourite in our house too, Mamma Mia became the musical of choice when the kids went to stay at Grandma’s after seeing the show together in London.

Even though she didn’t live on our doorstep that didn’t stop her from attending so many of the concerts and productions that her grandchildren featured in. She was always a big supporter of their talents and watched them whether it was a bit-part in the school Nativity or a paid-concert in Ely Cathedral or Huddersfield Town Hall. (Or for that matter a freezing cold football pitch or rugby pitch, which is probably why she nudged Jo and I towards the warmth of a swimming pool.) For some of you this will be old news as she was keen to share their endeavours with her close friends as any doting grandmother would. She attended so many events in and around Godmanchester over the years that people were always surprised to learn that she actually lived 90 miles away!

During the pandemic this distance made it a little harder to meet up in person, but it didn’t stop her from socialising and even doing activities with the kids. Like everyone else we used Zoom to keep in touch and Ella and Mum created their own virtual Great British Bake Off to ensure those legendary cooking skills were still put to good use. I was never a big fan of baking per-se, but I did enjoy squishing the sausage meat between my fingers when we made sausage rolls for Christmas. Likewise making mince pies was something I enjoyed too, and this Christmas baking tradition was passed-down to my children while I took on the more important role of keeping Mum’s glass of Prosecco topped up. Her puddings definitely were legendary, but for The Oldwood family it’s undoubtedly her coffee flavoured birthday cake that she will be most sorely missed for, baking-wise.

I’m now two-thousand words in and have barely scratched the surface of memories I could talk about. At some point I need to stop and give you the opportunity to share your favourite memories with us, and with the other people here. And share them we must, because that’s how we keep her memory alive. Every time we plan a trip, or open a packet of biscuits, or play a game of Rummy, or use her baking tray, or pour a glass of red wine, or whatever, there will be another opportunity to share our love for the person we once knew as Jenny, or Mum, or Grandma.

Wednesday 4 October 2023

Unpacking Code Ownership

This post was prompted by a document I read which was presented as a development guide. While the rest of it was about style, the section that particularly piqued my interest was one involving code ownership. For those of us who’ve been around the block, the term “code ownership” can bring with it connotations of protectionism. If you’ve never worked with people who are incredibly guarded about the code they write may I recommend my 2017 blog post Fallibility which contains two examples of work colleagues that erected a wall around themselves and their code.

While I initially assumed the use of the term was a proxy for accountability, some comments to my suggestion that Relentless Refactoring was an established practice in many teams hinted that there might be more to it than that. What came out of an online meeting of the team was that the term was carrying the weight of two different characteristics.

(I should point out that I’m always wary of this kind of discussion verging into bike-shedding territory. I like to try and ensure that language is only as precise as necessary, so when I suspect there may be confusion or suboptimal behaviour as a consequence, do I feel it’s worth digging deeper. In this instance I think “ownership” was referring to the following attributes and not about gatekeeping or protectionism for selfish reasons, e.g. job security.)

Accountability / Responsibility

When people talk about “owning your mistakes” what they’re referring to is effectively being accountable, nay responsible, for them. While there might be a legal aspect in the cases of Machiavellian behaviour, for the most part what we’re really after is some indication that changes were not made simply because “you felt like it”. Any code change should be justifiable which implies that there is an air of objectivity around your rationale.

For example, reformatting the code simply because you personally prefer a different brace placement is not objective [1]. In contrast, reformatting to meet the pre-agreed in-house style is. Likewise applying any refactoring that brings old code back in line with the team’s preferred idioms is inherently sound. Moreover, neither of these should even require any debate as the guide automatically confers agreement [2].

Where it might get more contentious is when it appears to be superfluous, but as long as you can justify your actions with a sense of objectivity I think the team should err on the side of acceptance [3]. The reason I think this kind of change can end up being rejected by default is when there is nothing in the development process to allow the status quo to be challenged. A healthy development process should include time for retrospection (e.g. a formal retrospective) and this is probably the place for further debate if it cannot quickly be resolved. (You should not build “inventory” in the form of open PRs simply because of unresolved conflict [4].)

One scenario where this can be less objective is when trying to introduce new idioms, i.e. experimental changes that may or may not set a new precedent. I would expect this to solicit at least some up-front discussion or proactive reviewing / pairing to weed out the obvious chaff. Throwing “weird” code into the codebase without consulting your teammates is disrespectful and can lead to unnecessary turf wars.

Being accountable also implies that you are mature enough to deal with the consequences if the decision doesn’t go your way, aka Egoless Programming [5]. That may involve seeing your work rejected or rewritten, either immediately or in the future which can feel like a personal attack, but shouldn’t.

Experience / Expertise

While accountability looks at ownership from the perspective of the person wanting to change the code, the flipside of ownership is about those people best placed to evaluate change. When we look for someone to act as a reviewer we look for those who have the most experience either directly with the code itself, or from working on similar problems. There may also be different people that can provide a technical or business focused viewpoint if there are both elements at play which deserve special attention, for example when touching code where the previous authors have left and you need help validating your assumptions.

In this instance what we’re talking about are Subject Matter Experts. These people are no more “owners” of the code in question than we are but that doesn’t mean they can’t provide useful insights. If anything having people unrelated to the code reviewing it can be more useful because you know they will have no emotional attachment to it. If the change makes sense feature-wise, and does it in a simple, easy to understand way, does anything else really matter?

These days we have modern tooling like version control products which, assuming we put the right level of metadata in, allow us to see the evolution of the codebase along with the rationale even when the authors have long gone. Ownership doesn’t have to be conferred simply because you’re the only one that remembers how and why the code ended up the way it did. This leads into territory around fear of change which is not a sustainable approach to software delivery. In this day-and-age “consulting the elders” should really be a last resort for times when the record of events is lost in the sands of time. Approval should be a function based on knowledge of the subject matter rather than simply years of service [6].

Shepherds, Not Owners

Ultimately what I find slightly distasteful about the term “shared ownership” is that it still conveys a sense of protectionism, especially for those currently “outside the team”.

From a metaphorical point of view what I think I described above is more a sense of shepherding. The desire should be to nurture contributors to understand the culture of the codebase and product to the extent that the conversations can focus on the essential, rather than accidental complexity.

I wonder if “shared mentorship” would work as a substitute?

 

[1] This is a good argument for using a standard code formatting tool as it can make these debates moot. 

[2] If the code is that performance sensitive it should not be touched without consultation then there should either be some performance tests or at a minimum some comments to make that obvious.

[3] The late Pieter Hintjens makes a compelling case in Why Optimistic Merging Works Better.

[4] There is where I favour the optimism of Trust, but Verify as an approach, or pairing / ensemble programming to reach early consensus.

[5] The Psychology of Computer Programming – Gerry Weinberg, 1971.

[6] One needs to be mindful of not falling into the Meritocracy trap though.

Monday 31 October 2022

WMI Performance Anomaly: Querying the Number of CPU Cores

As one of the few devs that both likes and is reasonably well-versed in PowerShell I became the point of contact for a colleague that was bemused by a performance oddity when querying the number of cores on a host. He was introducing Ninja into the build and needed to throttle its expectations around how many actual cores there were because hyperthreading was enabled and our compilation intensive build was being slowed by its bad guesswork [1].

The PowerShell query for the number of cores (rather than logical processors) was pulled straight from the Internet and seemed fairly simple:

(Get-WmiObject Win32_Processor |
  measure -Property NumberOfCores -Sum).Sum

However when he ran this it took a whopping 4 seconds! Although he was using a Windows VM running on QEMU/KVM, I knew from benchmarking a while back this setup added very little overhead, i.e. only a percentage or two, and even on my work PC I observed a similar tardy performance. Here’s how we measured it:

Measure-Command {
  (Get-WmiObject Win32_Processor |
  measure -Property NumberOfCores -Sum).Sum
} | % TotalSeconds
4.0867539

(As I write my HP laptop running Windows 11 is still showing well over a second to run this command.)

My first instinct was that this was some weird overhead with PowerShell, what with it being .Net based so I tried the classic native wmic tool under the Git Bash to see how that behaved:

$ time WMIC CPU Get //Format:List | grep NumberOfCores  | cut -d '=' -f 2 | awk '{ sum += $1 } END{ print sum }'
4

real    0m4.138s

As you can see there was no real difference so that discounted the .Net theory. For kicks I tried lscpu under the WSL based Ubuntu 20.04 and that returned a far more sane time:

$ time lscpu > /dev/null

real    0m0.064s

I presume that lscpu will do some direct spelunking but even so the added machinery of WMI should not be adding the kind of ridiculous overhead that we were seeing. I even tried my own C++ based WMICmd tool as I knew that was talking directly to WMI with no extra cleverness going on behind the scenes, but I got a similar outcome.

On a whim I decided to try pushing more work onto WMI by passing a custom query instead so that it only needed to return the one value I cared about:

Measure-Command {
  (Get-WmiObject -Query 'select NumberOfCores from Win32_Processor' |
  measure -Property NumberOfCores -Sum).Sum
} | % TotalSeconds
0.0481644

Lo-and-behold that gave a timing in the tens of milliseconds range which was far closer to lscpu and definitely more like what we were expecting.

While my office machine has some “industrial strength” [2] anti-virus software that could easily be to blame, my colleague’s VM didn’t, only the default of MS Defender. So at this point I’m none the wiser about what was going on although my personal laptop suggests that the native tools of wmic and wmicmd are both returning times more in-line with lscpu so something funky is going on somewhere.

 

[1] Hyper-threaded cores meant Ninja was scheduling too much concurrent work.

[2] Read that as “massively interfering”!

 

Monday 1 November 2021

Chaining IF and && with CMD

An interesting bug cropped up the other day in a dub configuration file which made me realise I wasn’t consciously aware of the precedence of && when used in an IF statement with cmd.exe.

Batch File Idioms

I’ve written a ton of batch files over the years and, with error handling being a manual affair, the usual pattern is to alternate pairs of statement + error check, e.g.

mkdir folder
if %errorlevel% neq 0 exit /b %errorlevel%

It’s not uncommon for people to explicitly leave off the error check in this particular scenario so that (hopefully) the folder will exist whether not it already does. However it then masks a (not uncommon) failure where the folder can’t be created due to permissions and so I tend to go for the more verbose option:

if not exist "folder" (
  mkdir folder
  if !errorlevel! neq 0 exit /b !errorlevel!
)

Note the switch from %errorlevel% to !errorlevel!. I tend to use setlocal EnableDelayedExpansion at the beginning of every batch file and use !var! everywhere by convention to avoid forgetting this transformation as it’s an easy mistake to make in batch files.

Chaining Statements

In cmd you can chain commands with & (much like ; in bash) with && being used when the previous command succeeds and || for when it fails. This is useful with tools like dub which allow you to define “one liners” that will be executed during a build by “shelling out”. For example you might write this:

mkdir bin\media && copy media\*.* bin\media

This works fine first time but it’s not idempotent which might be okay for automated builds where the workspace is always clean but it’s annoying when running the build repeatedly, locally. Hence you might be inclined to fix this by changing it to:

if not exist "bin\media" mkdir bin\media && copy media\*.* bin\media

Sadly this doesn’t do what the author intended because the && is part of the IF statement “then” block – the copy is only executed if the folder doesn’t exist. Hence this was the aforementioned bug which wasn’t spotted at first as it worked fine for the automated builds but failed locally.

Here is a canonical example:

> if exist "C:\" echo A && echo B
A
B

> if not exist "C:\" echo A && echo B

As you can see, in the second case B is not printed so is part of the IF statement happy path.

Parenthesis to the Rescue

Naturally the solution to problems involving ordering or precedence is to introduce parenthesis to be more explicit.

If you look at how parenthesis were used in the second example right back at the beginning you might be inclined to write this thinking that the parenthesis create a scope somewhat akin to {} in C style languages:

> if not exist "C:\" (echo A) && echo B

But it won’t work as the parenthesis are still part of the “then” statement. (They are useful to control evaluation when mixing compound conditional commands that use, say, || and & [1].)

Hence the correct solution is to use parenthesis around the entire IF statement:

> (if not exist "C:\" echo A) && echo B
B

Applying this to the original problem, it’s:

(if not exist "bin\media" mkdir bin\media) && copy media\*.* bin\media

 

[1] Single line with multiple commands using Windows batch file

Thursday 30 September 2021

Transient Expand-Archive Failures

[I’m sure there is something else going on here but on the off-chance someone else is also observing this and also lost at least they’ll know they’re not alone.]

We have a GitLab project pipeline that started out as a monolithic job but over the last 9 months has slowly been parallelized and now runs as over 150 jobs spread out across a cluster of 4 fairly decent [1] machines with 8 to 10 concurrent jobs per host. More recently we’ve started seeing the PowerShell Expand-Archive cmdlet failing randomly up to 5% of the time with the following error:

Remove-Item : Cannot find path {...} because it does not exist.

The line of code highlighted in the error is:

$expandedItems | % { Remove-Item $_ -Force -Recurse }

If you google this message it suggests this probably isn’t the real error but a problem with the cmdlet trying to clean-up after failing to extract the contents of the .zip file. Sadly the reason why the extraction might have failed in the first place is now lost.

Investigation

While investigating this error message I ran across two main hits – one from Stack Overflow and the other on the PowerShell GitHub project – both about hitting the classic long path problem in Windows. In our case the extracted paths, even including the build agent root, is still only 100 characters so well within the limit as the archive only has one subfolder and the filenames are short.

Also the archive is built with it’s companion cmdlet Compress-Archive so I doubt it’s an impedance mismatch in our choice of tools.

My gut reaction to anything spurious like this is that it’s the virus scanner (AV) [2]. Sadly I have no direct control over the virus scanner product choice or its configuration. In this instance the machines have Trend Micro whereas the other build agents I’ve built are VMs and have Windows Defender [3], but their load is also much lower. I managed to get the build folder excluded temporarily but that appears to have had no effect and nothing was logged in the AV to say it had blocked anything. (The “behaviour monitoring” in modern AV products often gets triggered by build tools which is annoying.)

After discounting the obvious and checking that memory exhaustion also wasn’t a factor as the memory load for the jobs is variable and the worst case loading can cause the page-file to be used, I wondered if there the problem lay with the GitLab runner cache somehow.

Corrupt Runner Cache?

To avoid downloading the .zip file artefact for every job run we utilise the GitLab runner local cache. This is effectively a .zip file of a packages folder in the project working copy that gets packed up and re-used in the other jobs on the same machine which, given our level of concurrency, means it’s constantly in use. Hence I wondered if our archive was being corrupted when the cache was being unpacked as I’ve seen embedded .zip files cause problems in the past for AV tools (even though it supposedly shouldn’t have been touching the folder). So I added a step to test our archive’s integrity before unpacking it by using 7-Zip as there doesn’t appear to be a companion cmdlet Test-Archive. I immediately saw the integrity test pass but the Expand-Archive step fail a few times so I’m pretty sure the problem is not archive corruption.

Workaround

The workaround which I’ve employed is to use 7-Zip for the unpacking step too and so far we’ve seen no errors at all but I’m left wondering why Expand-Archive was intermittently failing. Taking an extra dependency on a popular tool like 7-Zip is hardly onerous but it bumps the complexity up very slightly and needs to be accounted for in the docs / scripts.

In my 2017 post Fallibility I mentioned how I once worked with someone who was more content to accept they’d found an undocumented bug in the Windows CopyFile() function than believe there was a flaw in their code or analysis [4]. Hence I feel something as ubiquitous as Expand-Archive is unlikely to have a decompression bug and that there is some piece of the puzzle here that I’m missing. Maybe the AV is still interfering in some way that isn’t triggered by 7-Zip or the transient memory pressure caused by the heavier jobs is having an impact?

Given the low cost of the workaround (use 7-Zip instead) the time, effort and disruption needed to run further experiments to explore this problem further is sadly too high. For the time being annecdata is the best I can do.

 

[1] 8 /16 cores, 64 / 128 GB RAM, and NVMe based disks.

[2] I once did some Windows kernel debugging to help prove an anti-virus product update was the reason our engine processes where not terminating correctly under low memory conditions.

[3] Ideally servers shouldn’t need anti-virus tools at all but the principle of Defence in Depth suggests the minor performance impact is worth it to potentially help slow lateral movement.

[4] TL;DR: I quickly showed it was the latter at fault not the Windows API.