tag:blogger.com,1999:blog-66289850225318661932024-03-18T09:38:41.837+00:00The OldWood Thingblog = reinterpret_cast<>(life);Chris Oldwoodhttp://www.blogger.com/profile/18183909440298909448noreply@blogger.comBlogger312125tag:blogger.com,1999:blog-6628985022531866193.post-59826508087121728282024-02-09T10:33:00.001+00:002024-02-09T10:33:25.197+00:00Our Star Baker<p><font face="Georgia">Just over 14 years ago I posted the eulogy I wrote for my father on this blog (</font><a href="https://chrisoldwood.blogspot.com/2010/01/so-long-and-thanks-for-all-onions.html"><font face="Georgia">So Long and Thanks For All the Onions</font></a><font face="Georgia">) mostly because I had just started writing and this blog gave me the confidence to write. Sadly, a month ago my mother passed away too and yesterday I got to present my eulogy for her as well. The writing practice from the intervening 14 years undoubtedly made the mechanics easier and afforded me more time for reflection and allowed me to better translate my thoughts and feelings to the page. So, thank you blog, but far more importantly “<em>thank you Mum”</em> you too will be with me always…</font></p> <blockquote> <p><font face="Georgia">I was visiting Mum in hospital a few years ago, when she was having one of her knees or hips or something replaced, and a nurse came in and addressed her as “Jennifer”. For a brief moment I wondered who they were talking to, I’m not even sure Mum acknowledged her at first either. Mum hasn’t been a Jennifer for very long time – she was always Jenny, or Jen to her friends. Of course, to myself and Jo she was “mum”, and to our various offspring she was Grandma, or Grandma Jenny. Millie and Ella came up with The Jenster but it never really gained any traction outside our family, for obvious reasons.</font></p> <font face="Georgia"></font> <p><font face="Georgia">She was only ever Jennifer in an official capacity, such as when she returned to work part-time, initially doing market research for the BBC. One of my earliest memories was of tagging along with Mum to some local parade of shops where she would interview the public about their viewing and listening habits. We got to play on the slopes and the railings while she quizzed the public, which was a lot more fun than it might sound today. From there she switched to clerical work, most notably (for me at least) at Bowater-Scott where one of her colleagues offered to build Jo and I a wooden sledge, which he did! As I only remember visiting that office maybe once or twice, I’ve always assumed Mum must have made quite an impression there.</font></p> <font face="Georgia"></font> <p><font face="Georgia">Eventually she ended up in the Audiology Department at West Hill Hospital in Dartford treating people that had lost their hearing. It was only supposed to be a two-week placement, but she ended up staying for 21 years in the end (after cross-training as a Student Technician a year later). It was fairly apparent even to us kids that not every patient was easy to deal with, but she always put their needs first, to the extent that she would often bring her paperwork home to allow her to prioritize her time with the patients instead. If anyone was ever in any doubt that Mum was a people-person, her career at the hospital would surely stand as testament, backed up by the many cards of thanks she received over the years from the people she helped. 
Perhaps the one aspect of her work we regret her bringing home was the need to talk so much louder all the time.</font></p> <font face="Georgia"></font> <p><font face="Georgia">The hospital wasn’t the most enjoyable place to hang around during the school holidays, but I soon discovered a tiny little computer shop at the top of West Hill which then made the trips to her workplace an absolute joy. When it came to buying me that first computer, I know Dad wasn’t so convinced, and it is Mum that I must thank for taking that leap of faith. Even forty years later Mum would still joke about whether it was the right thing to have done, as it could still just be a passing fad.</font></p> <font face="Georgia"></font> <p><font face="Georgia">When I wrote the eulogy for my father, I suggested that the genes which have probably contributed most to my career in computer programming probably came from him, but in this more recent time for reflection I am beginning to question if it wasn’t more from my mother’s side instead. For her generation Mum was very good with technology – the proverbial Silver Surfer. Although she might occasionally ask for technical advice, she often sorted out her own problems, along with those of her friends! She always wrote a good email and picked-up WhatsApp with similar ease, if occasionally being a little over-zealous with the emojis. We had several different family WhatsApp groups with which she was very active and helped ensure she remained in constant contact with her grandchildren and could easily find out what they were up to. She took a genuine interest in their lives and they were always keen to share. It wasn’t unusual for Charlotte or me to mention what Mum was up to only to be met with a chorus of “<em>yes, we know!</em>” because they had been conversely directly with her about it.</font></p> <font face="Georgia"></font> <p><font face="Georgia">This need to adapt to the ever-changing world was something which Mum embraced, not only on a technical level but also from a social perspective. Rather than dismiss young people because they haven’t faced the same struggles or because their viewpoint didn’t match hers, she would instead engage with them to try and understand how and why the world was changing the way it was. Her grandchildren helped her move with the times and in effect helped her to remain young at heart. She very much believed the old saying about only being as old as you feel. Her body may have shown some signs of wear and tear as she reached her eighties, but her mind was still razor-sharp, along with her wit.</font></p> <font face="Georgia"></font> <p><font face="Georgia">We probably shouldn’t be surprised that some of her joints needed replacing later in life because she was always such an active person! Her diary always seemed to be full – from dawn until dusk – whether that be out with friends and family, or abroad visiting another new country and making even <em>more</em> new friends, and not just for the duration of the trip, they often became lifelong friends which speaks volumes about the kind of impression she left on everyone she met.</font></p> <font face="Georgia"></font> <p><font face="Georgia">As a family we often joked when we visited somewhere new that grandma had probably already been there. If she had, you knew you could rely on her suggestions to fill your itinerary as they would include a range of beautiful vistas, buildings, galleries, restaurants, etc.. 
Over the years she visited a whole variety of different places from Alaska to Moscow with Canada, Croatia, China, and India to name a few in between. We were always fascinated to see the photo album she would put together on her return and listen to the stories she told about the people she met.</font></p> <font face="Georgia"></font> <p><font face="Georgia">When our children were much smaller we managed to convince her to come away with us on a couple of more relaxing beach holidays. Much as she enjoyed reading at home she wasn’t the sort of person to curl up on a sun lounger with a book, not when there were places she could explore, and the grandkids also made sure it wasn’t going to be a relaxing holiday for any of us, least of all Grandma. I’m still not sure what possessed Mum and I to go paragliding in Turkey! Running, and then Jumping off a cliff while strapped to a stranger felt courageous enough for me, let alone mum who was in her mid-sixties by then. At the end she remarked the scariest part was the jeep ride <em>up</em> the mountain, not coming back down again by parachute!</font></p> <font face="Georgia"></font> <p><font face="Georgia">What all her travelling proved was that she had a great sense of adventure and that was epitomised by her walk along the Inca Trail to Machu Picchu. This multi-day hike was essentially a marathon at high elevation, so a challenge even to the younger trekkers. Beforehand Mum was a little concerned about her age and fitness but she put the training in and need not have worried as she found herself at or near the front the entire time. In fact the porters nicknamed her “the nanny goat” on account of how well she acquitted herself. For once the trip didn’t just conclude with a photo album but ended up becoming a PowerPoint presentation too which she gave twice at Isaac and Millie’s school as Machu Picchu was on their curriculum. That wasn’t the only lesson they got from her there either, as Mum and Dad also put us all to shame at the school’s 1940’s night with a wonderful display of swing dancing.</font></p> <font face="Georgia"></font> <p><font face="Georgia">I don’t think mum was ever a passive bystander in anything she got involved in, she was always there to lend a hand and ultimately would get drawn in to fill whatever role needed her talents. While I’m sure she enjoyed watching us swim I think she preferred it when she could also be an active participant – initially helping-out by decorating the float for the parade, to becoming a committee member, then club secretary, and then officiating at galas in various capacities, even after we’d flown the nest. (Her long-standing service to the Kent County Executive was recognised in 2002 when she received the Edward Maples trophy.) She even got to rekindle her netball skills in a couple of Mother & Daughter swimming club socials and we discovered along the way that she had briefly appeared on TV in her youth playing netball.</font></p> <font face="Georgia"></font> <p><font face="Georgia">This wasn’t the only time she has featured on TV, more recently her left arm made a guest appearance on the BBC during The Proms. Mum was a huge fan of art, both in the literal sense and the wider movement. Although we lived well over an hour away London was middle ground for us both and that gave us the perfect opportunity to meet up and take in a West End show or a trip to the Royal Albert Hall. 
I was already well versed in musicals long before meeting Charlotte and have Mum to thank for knowing so much of The Sound of Music off-by-heart. While always a favourite in our house too, Mamma Mia became the musical of choice when the kids went to stay at Grandma’s after seeing the show together in London.</font></p> <font face="Georgia"></font> <p><font face="Georgia">Even though she didn’t live on our doorstep that didn’t stop her from attending so many of the concerts and productions that her grandchildren featured in. She was always a big supporter of their talents and watched them whether it was a bit-part in the school Nativity or a paid-concert in Ely Cathedral or Huddersfield Town Hall. (Or for that matter a freezing cold football pitch or rugby pitch, which is probably why she nudged Jo and I towards the warmth of a swimming pool.) For some of you this will be old news as she was keen to share their endeavours with her close friends as any doting grandmother would. She attended so many events in and around Godmanchester over the years that people were always surprised to learn that she actually lived 90 miles away!</font></p> <font face="Georgia"></font> <p><font face="Georgia">During the pandemic this distance made it a little harder to meet up in person, but it didn’t stop her from socialising and even doing activities with the kids. Like everyone else we used Zoom to keep in touch and Ella and Mum created their own virtual Great British Bake Off to ensure those legendary cooking skills were still put to good use. I was never a big fan of baking per-se, but I did enjoy squishing the sausage meat between my fingers when we made sausage rolls for Christmas. Likewise making mince pies was something I enjoyed too, and this Christmas baking tradition was passed-down to my children while I took on the more important role of keeping Mum’s glass of Prosecco topped up. Her puddings definitely <em>were</em> legendary, but for The Oldwood family it’s undoubtedly her coffee flavoured birthday cake that she will be most sorely missed for, baking-wise.</font></p> <font face="Georgia"></font> <p><font face="Georgia">I’m now two-thousand words in and have barely scratched the surface of memories I could talk about. At some point I need to stop and give <em>you</em> the opportunity to share your favourite memories with <em>us</em>, and with the other people here. And share them we must, because that’s how we keep her memory alive. Every time we plan a trip, or open a packet of biscuits, or play a game of Rummy, or use her baking tray, or pour a glass of red wine, or whatever, there will be another opportunity to share our love for the person we once knew as Jenny, or Mum, or Grandma.</font></p> <font face="Georgia"></font></blockquote>Chris Oldwoodhttp://www.blogger.com/profile/18183909440298909448noreply@blogger.com0tag:blogger.com,1999:blog-6628985022531866193.post-57131305016193670612023-10-04T10:53:00.001+01:002023-10-04T10:53:54.902+01:00Unpacking Code Ownership<p><font face="Georgia">This post was prompted by a document I read which was presented as a development guide. While the rest of it was about style, the section that particularly piqued my interest was one involving code ownership. For those of us who’ve been around the block, the term “code ownership” can bring with it connotations of protectionism. 
If you’ve never worked with people who are incredibly guarded about the code they write may I recommend my 2017 blog post </font><a href="https://chrisoldwood.blogspot.com/2017/12/fallibility.html"><em><font face="Georgia">Fallibility</font></em></a><font face="Georgia"> which contains two examples of work colleagues that erected a wall around themselves and their code.</font></p> <font face="Georgia"></font> <p><font face="Georgia">While I initially assumed the use of the term was a proxy for accountability, some comments to my suggestion that </font><a href="https://chrisoldwood.blogspot.com/2014/11/relentless-refactoring.html"><em><font face="Georgia">Relentless Refactoring</font></em></a><font face="Georgia"> was an established practice in many teams hinted that there might be more to it than that. What came out of an online meeting of the team was that the term was carrying the weight of two different characteristics.</font></p> <font face="Georgia"></font> <p><font face="Georgia">(I should point out that I’m always wary of this kind of discussion verging into </font><a href="https://en.wikipedia.org/wiki/Law_of_triviality"><font face="Georgia">bike-shedding</font></a><font face="Georgia"> territory. I like to try and ensure that language is only as precise as necessary, so when I suspect there may be confusion or suboptimal behaviour as a consequence, do I feel it’s worth digging deeper. In this instance I think “ownership” was referring to the following attributes and not about gatekeeping or protectionism for selfish reasons, e.g. job security.)</font></p> <font face="Georgia"></font> <p><strong><font face="Georgia">Accountability / Responsibility</font></strong></p> <font face="Georgia"></font> <p><font face="Georgia">When people talk about “owning your mistakes” what they’re referring to is effectively being accountable, nay responsible, for them. While there might be a legal aspect in the cases of Machiavellian behaviour, for the most part what we’re really after is some indication that changes were not made simply because “you felt like it”. Any code change should be justifiable which implies that there is an air of objectivity around your rationale.</font></p> <font face="Georgia"></font> <p><font face="Georgia">For example, reformatting the code simply because you <em>personally</em> prefer a different brace placement is not objective [1]. In contrast, reformatting to meet the pre-agreed in-house style is. Likewise applying any refactoring that brings old code back in line with the team’s preferred idioms is inherently sound. Moreover, neither of these should even require any debate as the guide automatically confers agreement [2].</font></p> <font face="Georgia"></font> <p><font face="Georgia">Where it might get more contentious is when it <em>appears</em> to be superfluous, but as long as you can justify your actions with a sense of objectivity I think the team should err on the side of acceptance [3]. The reason I think this kind of change can end up being rejected by default is when there is nothing in the development process to allow the status quo to be challenged. A healthy development process should include time for retrospection (e.g. a formal retrospective) and this is probably the place for further debate if it cannot quickly be resolved. 
(You should not build “inventory” in the form of open PRs simply because of unresolved conflict [4].)</font></p> <font face="Georgia"></font> <p><font face="Georgia">One scenario where this can be less objective is when trying to introduce new idioms, i.e. experimental changes that may or may not set a <em>new</em> precedent. I would expect this to solicit at least some up-front discussion or proactive reviewing / pairing to weed out the obvious chaff. Throwing “weird” code into the codebase without consulting your teammates is disrespectful and can lead to unnecessary turf wars.</font></p> <font face="Georgia"></font> <p><font face="Georgia">Being accountable also implies that you are mature enough to deal with the consequences if the decision doesn’t go your way, aka Egoless Programming [5]. That may involve seeing your work rejected or rewritten, either immediately or in the future which can feel like a personal attack, but shouldn’t.</font></p> <font face="Georgia"></font> <p><strong><font face="Georgia">Experience / Expertise</font></strong></p> <font face="Georgia"></font> <p><font face="Georgia">While accountability looks at ownership from the perspective of the person wanting to change the code, the flipside of ownership is about those people best placed to evaluate change. When we look for someone to act as a reviewer we look for those who have the most experience either directly with the code itself, or from working on similar problems. There may also be different people that can provide a technical or business focused viewpoint if there are both elements at play which deserve special attention, for example when touching code where the previous authors have left and you need help validating your assumptions.</font></p> <font face="Georgia"></font> <p><font face="Georgia">In this instance what we’re talking about are Subject Matter Experts. These people are no more “owners” of the code in question than we are but that doesn’t mean they can’t provide useful insights. If anything having people unrelated to the code reviewing it can be more useful because you know they will have no emotional attachment to it. If the change makes sense feature-wise, and does it in a simple, easy to understand way, does anything else really matter?</font></p> <font face="Georgia"></font> <p><font face="Georgia">These days we have modern tooling like version control products which, assuming we put the right level of metadata in, allow us to see the evolution of the codebase along with the rationale even when the authors have long gone. Ownership doesn’t have to be conferred simply because you’re the only one that remembers how and why the code ended up the way it did. This leads into territory around fear of change which is not a sustainable approach to software delivery. In this day-and-age “consulting the elders” should really be a last resort for times when the record of events is lost in the sands of time. Approval should be a function based on knowledge of the subject matter rather than simply years of service [6].</font></p> <p> <p><strong><font face="Georgia">Shepherds, Not Owners</font></strong></p> <p><font face="Georgia">Ultimately what I find slightly distasteful about the term “shared ownership” is that it still conveys a sense of protectionism, especially for those currently “outside the team”.</font></p> <p><font face="Georgia">From a metaphorical point of view what I think I described above is more a sense of shepherding. 
The desire should be to nurture contributors to understand the culture of the codebase and product to the extent that the conversations can focus on the essential, rather than accidental complexity. </font></p> <p><font face="Georgia">I wonder if “shared mentorship” would work as a substitute?</font></p> <font face="Georgia"> <p> <p><font face="Georgia"> <p><font face="Georgia"> </font></p> </font><font face="Georgia"><font face="Georgia">[1] This is a good argument for using a standard code formatting tool as it can make these debates moot.  </font></font></p> </p> </font> <p><font face="Georgia">[2] If the code is <em>that</em> performance sensitive it should not be touched without consultation then there should either be some performance tests or at a minimum some comments to make that obvious.</font></p> </p> <font face="Georgia"></font> <p><font face="Georgia">[3] The late Pieter Hintjens makes a compelling case in </font><a href="http://hintjens.com/blog:106"><font face="Georgia">Why Optimistic Merging Works Better</font></a><font face="Georgia">.</font></p> <font face="Georgia"></font> <p><font face="Georgia">[4] There is where I favour the optimism of </font><a href="http://www.chrisoldwood.com/articles/afterwood-trust-but-verify.html"><font face="Georgia">Trust, but Verify</font></a><font face="Georgia"> as an approach, or pairing / ensemble programming to reach early consensus.</font></p> <font face="Georgia"></font> <p><font face="Georgia">[5] The Psychology of Computer Programming – Gerry Weinberg, 1971.</font></p> <p><font face="Georgia">[6] One needs to be mindful of not falling into the </font><a href="https://en.wikipedia.org/wiki/Meritocracy"><font face="Georgia">Meritocracy</font></a><font face="Georgia"> trap though.</font></p>Chris Oldwoodhttp://www.blogger.com/profile/18183909440298909448noreply@blogger.com0tag:blogger.com,1999:blog-6628985022531866193.post-49481327127812302952022-10-31T00:12:00.001+00:002022-10-31T00:12:29.061+00:00WMI Performance Anomaly: Querying the Number of CPU Cores<p><font face="Georgia">As one of the few devs that both likes and is reasonably well-versed in PowerShell I became the point of contact for a colleague that was bemused by a performance oddity when querying the number of cores on a host. He was introducing Ninja into the build and needed to throttle its expectations around how many actual cores there were because hyperthreading was enabled and our compilation intensive build was being slowed by its bad guesswork [1].</font></p> <p><font face="Georgia">The PowerShell query for the number of cores (rather than logical processors) was pulled straight from the Internet and seemed fairly simple:</font></p> <p><font face="Courier New">(Get-WmiObject Win32_Processor | <br />  measure -Property NumberOfCores -Sum).Sum</font></p> <p><font face="Georgia">However when he ran this it took a whopping 4 seconds! Although he was using a Windows VM running on QEMU/KVM, I knew from benchmarking a while back this setup added very little overhead, i.e. only a percentage or two, and even on my work PC I observed a similar tardy performance. 
Here’s how we measured it:</font></p> <p><font face="Courier New">Measure-Command { <br />  (Get-WmiObject Win32_Processor | <br />  measure -Property NumberOfCores -Sum).Sum <br />} | % TotalSeconds <br style="box-sizing: inherit;" />4.0867539</font></p> <p><font face="Georgia">(As I write my HP laptop running Windows 11 is still showing well over a second to run this command.)</font></p> <p><font face="Georgia">My first instinct was that this was some weird overhead with PowerShell, what with it being .Net based so I tried the classic native <font face="Courier New">wmic</font> tool under the Git Bash to see how that behaved:</font></p> <p><font face="Courier New">$ time WMIC CPU Get //Format:List | grep NumberOfCores  <wbr style="box-sizing: inherit;"></wbr>| cut -d '=' -f 2 | awk '{ sum += $1 } END{ print sum }' <br style="box-sizing: inherit;" />4 <br style="box-sizing: inherit;" /> <br style="box-sizing: inherit;" />real  <wbr style="box-sizing: inherit;"></wbr>  <wbr style="box-sizing: inherit;"></wbr>0m4.138s</font></p> <p><font face="Georgia">As you can see there was no real difference so that discounted the .Net theory. For kicks I tried <font face="Courier New">lscpu</font> under the WSL based Ubuntu 20.04 and that returned a far more sane time:</font></p> <p><font face="Courier New">$ time lscpu > /dev/null</font></p> <p><font face="Courier New">real    0m0.064s</font></p> <p><font face="Georgia">I presume that <font face="Courier New">lscpu</font> will do some direct spelunking but even so the added machinery of WMI should not be adding the kind of ridiculous overhead that we were seeing. I even tried my own <a href="http://www.chrisoldwood.com/win32.htm#wmicmd">C++ based WMICmd</a> tool as I knew that was talking directly to WMI with no extra cleverness going on behind the scenes, but I got a similar outcome.</font></p> <p><font face="Georgia">On a whim I decided to try pushing more work onto WMI by passing a custom query instead so that it only needed to return the one value I cared about:</font></p> <p><font face="Courier New">Measure-Command { <br />  (Get-WmiObject -Query 'select NumberOfCores from Win32_Processor' | <br />  measure -Property NumberOfCores -Sum).Sum <br />} | % TotalSeconds <br />0.0481644</font></p> <p><font face="Georgia">Lo-and-behold that gave a timing in the <em>tens of milliseconds </em>range which was far closer to <font face="Courier New">lscpu</font> and definitely more like what we were expecting.</font></p> <p><font face="Georgia">While my office machine has some “industrial strength” [2] anti-virus software that could easily be to blame, my colleague’s VM didn’t, only the default of MS Defender. 
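</font></p> <p><font face="Georgia">For completeness, the same two measurements can also be run against the newer CIM cmdlets which eventually superseded <font face="Courier New">Get-WmiObject</font>. The following is just a sketch to try on your own machine – I’m not claiming <font face="Courier New">Get-CimInstance</font> sidesteps whatever the slow path is here:</font></p> <p><font face="Courier New"># Class-based form, mirroring the original query <br />Measure-Command { <br />  (Get-CimInstance -ClassName Win32_Processor | <br />  measure -Property NumberOfCores -Sum).Sum <br />} | % TotalSeconds <br /> <br /># Query-based form, mirroring the fix above <br />Measure-Command { <br />  (Get-CimInstance -Query 'select NumberOfCores from Win32_Processor' | <br />  measure -Property NumberOfCores -Sum).Sum <br />} | % TotalSeconds</font></p> <p><font face="Georgia">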
So at this point I’m none the wiser about what was going on although my personal laptop suggests that the native tools of <font face="Courier New">wmic</font> and <font face="Courier New">wmicmd</font> are both returning times more in-line with <font face="Courier New">lscpu</font> so something funky is going on somewhere.</font></p> <p><font face="Georgia"> </font></p> <p><font face="Georgia">[1] Hyper-threaded cores meant Ninja was scheduling too much concurrent work.</font></p> <p><font face="Georgia">[2] Read that as “massively interfering”!</font></p> <p><font face="Georgia"> </font></p>Chris Oldwoodhttp://www.blogger.com/profile/18183909440298909448noreply@blogger.com0tag:blogger.com,1999:blog-6628985022531866193.post-53799763227378497572021-11-01T10:25:00.001+00:002021-11-01T10:25:56.259+00:00Chaining IF and && with CMD<p><font face="Georgia">An interesting bug cropped up the other day in a <a href="https://dub.pm/package-format-sdl">dub configuration file</a> which made me realise I wasn’t consciously aware of the precedence of <font face="Courier New">&&</font> when used in an <font face="Courier New">IF</font> statement with <font face="Courier New">cmd.exe</font>.</font></p> <p><strong><font face="Georgia">Batch File Idioms</font></strong></p> <p><font face="Georgia">I’ve written a ton of batch files over the years and, with error handling being a manual affair, the usual pattern is to alternate pairs of statement + error check, e.g.</font></p> <p><font face="Courier New">mkdir folder <br />if %errorlevel% neq 0 exit /b %errorlevel%</font></p> <p><font face="Georgia">It’s not uncommon for people to explicitly leave off the error check in this particular scenario so that (hopefully) the folder will exist whether not it already does. However it then masks a (not uncommon) failure where the folder can’t be created due to permissions and so I tend to go for the more verbose option:</font></p> <p><font face="Courier New">if not exist "folder" ( <br />  mkdir folder <br />  if !errorlevel! neq 0 exit /b !errorlevel! <br />)</font></p> <p><font face="Georgia">Note the switch from <font face="Courier New">%errorlevel%</font> to <font face="Courier New">!errorlevel!</font>. I tend to use <font face="Courier New">setlocal EnableDelayedExpansion</font> at the beginning of every batch file and use <font face="Courier New">!var!</font> everywhere by convention to avoid forgetting this transformation as it’s an easy mistake to make in batch files.</font></p> <p><strong><font face="Georgia">Chaining Statements</font></strong></p> <p><font face="Georgia">In <font face="Courier New">cmd</font> you can chain commands with <font face="Courier New">&</font> (much like <font face="Courier New">;</font> in <font face="Courier New">bash</font>) with <font face="Courier New">&&</font> being used when the previous command succeeds and <font face="Courier New">||</font> for when it fails. This is useful with tools like dub which allow you to define “one liners” that will be executed during a build by “<a href="https://stackoverflow.com/questions/28628985/what-does-shell-out-or-shelling-out-mean">shelling out</a>”. For example you might write this:</font></p> <p><font face="Courier New">mkdir bin\media && copy media\*.* bin\media</font></p> <p><font face="Georgia">This works fine first time but it’s not idempotent which might be okay for automated builds where the workspace is always clean but it’s annoying when running the build repeatedly, locally. 
Hence you might be inclined to fix this by changing it to:</font></p> <p><font face="Courier New">if not exist "bin\media" mkdir bin\media && copy media\*.* bin\media</font></p> <p><font face="Georgia">Sadly this doesn’t do what the author intended because the <font face="Courier New">&&</font> is part of the IF statement “then” block – the <font face="Courier New">copy</font> is only executed if the folder doesn’t exist. Hence this was the aforementioned bug which wasn’t spotted at first as it worked fine for the automated builds but failed locally.</font></p> <p><font face="Georgia">Here is a canonical example:</font></p> <p><font face="Courier New">> if exist "C:\" echo A && echo B <br />A <br />B</font></p> <p><font face="Georgia"><font face="Courier New">> if not exist "C:\" echo A && echo B <br /></font> <br />As you can see, in the second case B is not printed so is part of the IF statement happy path.</font></p> <p><strong><font face="Georgia">Parenthesis to the Rescue</font></strong></p> <p><font face="Georgia">Naturally the solution to problems involving ordering or precedence is to introduce parenthesis to be more explicit.</font></p> <p><font face="Georgia">If you look at how parenthesis were used in the second example right back at the beginning you might be inclined to write this thinking that the parenthesis create a scope somewhat akin to <font face="Courier New">{}</font> in C style languages:</font></p> <p><font face="Courier New">> if not exist "C:\" (echo A) && echo B <br /></font></p> <p><font face="Georgia">But it won’t work as the parenthesis are still part of the “then” statement. (They <em>are</em> useful to control evaluation when mixing compound conditional commands that use, say, <font face="Courier New">||</font> and <font face="Courier New">&</font> [1].)</font></p> <p><font face="Georgia">Hence the correct solution is to use parenthesis around the entire IF statement:</font></p> <p><font face="Courier New">> (if not exist "C:\" echo A) && echo B <br />B</font></p> <p><font face="Georgia">Applying this to the original problem, it’s:</font></p> <p><font face="Courier New">(if not exist "bin\media" mkdir bin\media) && copy media\*.* bin\media</font></p> <p> </p> <p><font face="Georgia">[1] </font><a href="https://stackoverflow.com/questions/25343351/single-line-with-multiple-commands-using-windows-batch-file"><font face="Georgia">Single line with multiple commands using Windows batch file<font face="Georgia"> <br /></font></font></a></p>Chris Oldwoodhttp://www.blogger.com/profile/18183909440298909448noreply@blogger.com1tag:blogger.com,1999:blog-6628985022531866193.post-54994685962994308662021-09-30T10:50:00.001+01:002021-09-30T10:50:33.209+01:00Transient Expand-Archive Failures<p><em><font face="Georgia">[I’m sure there is something else going on here but on the off-chance someone else is also observing this and also lost at least they’ll know they’re not alone.]</font></em></p> <p><font face="Georgia">We have a GitLab project pipeline that started out as a monolithic job but over the last 9 months has slowly been parallelized and now runs as over 150 jobs spread out across a cluster of 4 fairly decent [1] machines with 8 to 10 concurrent jobs per host. 
More recently we’ve started seeing the PowerShell <font face="Courier New">Expand-Archive</font> cmdlet failing randomly up to 5% of the time with the following error:</font></p> <p><font face="Courier New">Remove-Item : Cannot find path {...} because it does not exist.</font></p> <p><font face="Georgia">The line of code highlighted in the error is:</font></p> <p><font face="Courier New">$expandedItems | % { Remove-Item $_ -Force -Recurse }</font></p> <p><font face="Georgia">If you google this message it suggests this probably isn’t the real error but a problem with the cmdlet trying to clean up after failing to extract the contents of the .zip file. Sadly the reason why the extraction might have failed in the first place is now lost.</font></p> <p><strong><font face="Georgia">Investigation</font></strong></p> <p><font face="Georgia">While investigating this error message I ran across two main hits – </font><a href="https://stackoverflow.com/questions/50106917/expand-archive-in-powershell-is-failing-to-extract-nested-folders-and-files"><font face="Georgia">one from Stack Overflow</font></a><font face="Georgia"> and the other on </font><a href="https://github.com/PowerShell/Microsoft.PowerShell.Archive/issues/69"><font face="Georgia">the PowerShell GitHub project</font></a><font face="Georgia"> – both about hitting the classic long path problem in Windows. In our case the extracted paths, even including the build agent root, are still only 100 characters so well within the limit as the archive only has one subfolder and the filenames are short.</font></p> <p><font face="Georgia">Also the archive is built with its companion cmdlet <font face="Courier New">Compress-Archive</font> so I doubt it’s an impedance mismatch in our choice of tools.</font></p> <p><font face="Georgia">My gut reaction to anything spurious like this is that it’s the virus scanner (AV) [2]. Sadly I have no <em>direct</em> control over the virus scanner product choice or its configuration. In this instance the machines have Trend Micro whereas the other build agents I’ve built are VMs and have Windows Defender [3], but their load is also much lower. I managed to get the build folder excluded temporarily but that appears to have had no effect and nothing was logged in the AV to say it had blocked anything. (The “behaviour monitoring” in modern AV products often gets triggered by build tools which is annoying.)</font></p> <p><font face="Georgia">After discounting the obvious and checking that memory exhaustion also wasn’t a factor as the memory load for the jobs is variable and the worst case loading can cause the page-file to be used, I wondered if the problem lay with the GitLab runner cache somehow.</font></p> <p><font face="Georgia"><strong>Corrupt Runner Cache?</strong></font></p> <p><font face="Georgia">To avoid downloading the .zip file artefact for every job run we utilise the GitLab runner local cache. This is effectively a .zip file of a <font face="Courier New">packages</font> folder in the project working copy that gets packed up and re-used in the other jobs on the same machine which, given our level of concurrency, means it’s constantly in use. Hence I wondered if our archive was being corrupted when the cache was being unpacked as I’ve seen embedded .zip files cause problems in the past for AV tools (even though it supposedly shouldn’t have been touching the folder).
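</font></p> <p><font face="Georgia">To rule that out – and, as it turned out, to provide the eventual workaround too – the 7-Zip equivalents of testing and expanding an archive boil down to something like the sketch below. The paths here are placeholders rather than our real ones and <font face="Courier New">7z.exe</font> is assumed to be on the PATH.</font></p> <p><font face="Courier New"># Integrity check, in lieu of a Test-Archive cmdlet <br />& 7z t packages.zip <br />if ($LASTEXITCODE -ne 0) { throw "archive failed the integrity check" } <br /> <br /># Extraction, in place of Expand-Archive <br />& 7z x packages.zip -opackages -y <br />if ($LASTEXITCODE -ne 0) { throw "archive failed to extract" }</font></p> <p><font face="Georgia">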
So I added a step to test our archive’s integrity before unpacking it by using 7-Zip as there doesn’t appear to be a companion cmdlet <font face="Courier New">Test-Archive</font>. I immediately saw the integrity test pass but the <font face="Courier New">Expand-Archive</font> step fail a few times so I’m pretty sure the problem is not archive corruption.</font></p> <p><font face="Georgia"><strong>Workaround</strong></font></p> <p><font face="Georgia">The workaround which I’ve employed is to use 7-Zip for the unpacking step too and so far we’ve seen no errors at all but I’m left wondering why <font face="Courier New">Expand-Archive</font> was intermittently failing. Taking an extra dependency on a popular tool like 7-Zip is hardly onerous but it bumps the complexity up very slightly and needs to be accounted for in the docs / scripts.</font></p> <p><font face="Georgia">In my 2017 post </font><a href="https://chrisoldwood.blogspot.com/2017/12/fallibility.html"><font face="Georgia">Fallibility</font></a><font face="Georgia"> I mentioned how I once worked with someone who was more content to accept they’d found an undocumented bug in the Windows <font face="Courier New">CopyFile()</font> function than believe there was a flaw in their code or analysis [4]. Hence I feel something as ubiquitous as <font face="Courier New">Expand-Archive</font> is unlikely to have a decompression bug and that there is some piece of the puzzle here that I’m missing. Maybe the AV is still interfering in some way that isn’t triggered by 7-Zip or the transient memory pressure caused by the heavier jobs is having an impact?</font></p> <p><font face="Georgia">Given the low cost of the workaround (use 7-Zip instead) the time, effort and disruption needed to run further experiments to explore this problem further is sadly too high. For the time being <a href="https://en.wikipedia.org/wiki/Anecdotal_evidence">annecdata</a> is the best I can do.</font></p> <p><font face="Georgia"> </font></p> <p><font face="Georgia">[1] 8 /16 cores, 64 / 128 GB RAM, and NVMe based disks.</font></p> <p><font face="Georgia">[2] I once did some Windows kernel debugging to help prove an anti-virus product update was the reason our engine processes where not terminating correctly under low memory conditions.</font></p> <p><font face="Georgia">[3] Ideally servers shouldn’t need anti-virus tools at all but the principle of Defence in Depth suggests the minor performance impact is worth it to potentially help slow lateral movement.</font></p> <p><font face="Georgia">[4] TL;DR: I quickly showed it was the latter at fault not the Windows API.</font></p> <p><font face="Georgia"> </font></p>Chris Oldwoodhttp://www.blogger.com/profile/18183909440298909448noreply@blogger.com0tag:blogger.com,1999:blog-6628985022531866193.post-9289356194898571922021-09-27T09:40:00.001+01:002021-09-27T09:40:21.695+01:00Lose the Source Luke?<p><font face="Georgia">We were writing a new service to distribute financial pricing data around the trading floor as a companion to our new desktop pricing tool. The plugin architecture allowed us to write modular components that could tap into the event streams for various reasons, e.g. provide gateways to 3rd party data streams.</font></p> <p><strong>Linking New to Old</strong></p> <p><font face="Georgia">One of the first plugins we wrote allowed us to publish pricing data to a much older in-house data service which had been sat running in the server room for some years as part of the contributions system. 
This meant we could eventually phase that out and switch over to the new platform once we had parity with it.</font></p> <p><font face="Georgia">The plugin was a doddle to write and we quickly had pricing data flowing from the new service out to a test instance of the old service which we intended to leave running in the background for soak testing. As it was an in-house tool there was no installer and my colleague had a copy of the binaries lying around on his machine [1]. Also he was one of the original developers so knew exactly what he was doing to set it up.</font></p> <p><font face="Georgia"><strong>A Curious Error Message</strong></font></p> <p><font face="Georgia">Everything seemed to be working fine at first but as the data volumes grew we suddenly noticed that the data feed would eventually hang after a few days. In the beginning we were developing the core of the new service so quickly it was constantly being upgraded but now the pace was slowing down the new service was alive for much longer. Given how mature the old service was we assumed the issue was with the new one. Also there was a curious message in the log for the old service about “an invalid transaction ID” before the feed stopped.</font></p> <p><font face="Georgia">While debugging the new plugin code my colleague remembered that the Transaction ID meant the message sequence number that goes in every message to allow for ordering and re-transmission when running over UDP. The data type for that was a 16-bit unsigned integer so it dawned on us that we had probably messed up handling the wrapping of the Transaction ID.</font></p> <p><font face="Georgia"><strong>Use the Source Luke</strong></font></p> <p><font face="Georgia">Given how long ago he last worked on the old service he couldn’t quite remember what the protocol was for resetting the Transaction ID so we decided to go and look at the old service source code to see how it handled it. Despite being at the company for a few years myself this all pre-dated me so I left my colleague to do the rummaging.</font></p> <p><font face="Georgia">Not long after my colleague came back over to my desk and asked if I might know where the source code was. Like so many programmers in a small company I was a part-time sysadmin and generally looked after some of servers we used for development duties, such as the one where our Visual SourceSafe repository lived that contained all the projects we’d ever worked on since I joined.</font></p> <p><font face="Georgia"><strong>The VCS Upgrade</strong></font></p> <p><font face="Georgia">When I first started at the company there were only a couple of programmers not working on the mainframe and they wrote their own version control system. It was very <a href="https://en.wikipedia.org/wiki/W._Heath_Robinson">Heath Robinson</a> and used exclusive file locks to side-step the problem of concurrent changes. Having been used to a few VCS tools by then such as PVCS, Star Versions, and Visual SourceSafe I suggested that we move to a 3rd party VCS product as we needed more optimistic concurrency controls as more people were going to join the team. 
Given the MSDN licenses we already had along with my own experience Visual SourceSafe (VSS) seemed like a natural choice back then [2].</font></p> <p><font face="Georgia">Around the same time the existing development server was getting a bit long in the tooth so the company forked out for a brand new server and so I set up the new VSS repository on <em>that</em> and all my code went in there along with all the subsequent projects we started. None of the people that joined after me ever touched any of the old codebase or VCS as it was so mature it hadn’t needed changing in some time and anyway the two original devs were still there to look after it.</font></p> <p><font face="Georgia"><strong>The Office Move</strong></font></p> <p><font face="Georgia">A couple of years after I joined, the owners of the lovely building the company had been renting for the last few decades decided they wanted to gut and renovate it as the area in London where we were based was getting a big makeover. Hence we were forced to move to new premises about half a mile away. The new premises were nice and modern and I no longer had the vent from the portable air-conditioning machine from one of the small server rooms pumping out hot air right behind my desk [3].</font></p> <p><font face="Georgia">When moving day came I made sure the new server with all our stuff on got safely transported to the new office’s server room so that we were ready to go again on Monday morning. As we stood staring around the empty office floor my colleague pointed to the old development server which had lain dormant in the corner and asked me (rhetorically) whether we should even bother taking it with us. As far as I was concerned everything I’d ever needed had always been on the new server and so I didn’t know what was left that we’d still need.</font></p> <p><font face="Georgia">My colleague agreed and so we left the server to be chucked in the skip when the bulldozers came.</font></p> <font face="Georgia"> <p><strong>Dormant, But Not Redundant</strong></p> </font> <p><font face="Georgia"><font face="Georgia">It turned out their original home-grown version control system had a few projects in it, including the old data service. Luckily one of the original developers who worked on the contributions side still had an up-to-date copy of that and my colleague found a local copy of the code for one of the other services but had no idea how up-to-date it was. Sadly nobody had even a partial copy of the source to the data service we were interested in but we were going to replace that anyway so in the end the loss was far less significant than we originally feared.</font></font></p> <p><font face="Georgia">In retrospect I can’t believe we didn’t even just take the hard disk with us. The server was a classic tower so took up a fair bit of room which was still somewhat at a premium in the new office whereas the disk could probably have sat in a desk drawer or even been fitted as an extra drive in the new midi sized development server.
<p><font face="Georgia"> </font></p> <font face="Georgia">[1] +1 for <font face="Courier New">xcopy</font> deployment which made setting up development and test instances a piece of cake.</font></font></p> <p><font face="Georgia">[2] There are a lot of stories of file corruption issues with VSS but in the 7 years I’d used it with small teams, even over a VPN, we only had one file corruption issue that we quickly restored from a backup.</font></p> <p><font face="Georgia">[3] We were on the opposite side from the windows too so didn’t even get a cool breeze from those either.</font></p> <p><font face="Georgia"> </font></p>Chris Oldwoodhttp://www.blogger.com/profile/18183909440298909448noreply@blogger.com0tag:blogger.com,1999:blog-6628985022531866193.post-19715466136581754662021-09-23T09:33:00.001+01:002021-09-23T09:33:01.393+01:00The Case of the Curious Commit Message<p><font face="Georgia">I had taken a new contract at an investment bank and started working on a very mature codebase which was stored in ClearCase. As a long-time user [1] of version control systems one of the things that bugged me about the codebase were empty commit messages. On a mature codebase where people have come and gone it’s hard enough to work out what was going on just from the code, decent commit messages should be there to give you that extra context around the “why”.</font></p> <p><font face="Georgia"><strong>Rallying the Troops</strong></font></p> <p><font face="Georgia">After attempting to sell the virtues of commit messages to my colleagues a couple of times during team meetings there were still the odd one or two that consistently avoided doing so. So I decided to try a name-and-shame approach [2] by emailing a table of names along with their percentage of non-empty commit messages hoping those that appeared at the bottom would consider changing their ways.</font></p> <p><font face="Georgia">At the time I was just getting my head around ClearCase and there were a couple of complaints from people who felt unduly chastised because they didn’t have 100% when they felt they should. It turned out their accounts were used for some automated check-ins which had no message which I didn’t know about, so I excluded those commits and published a revised table.</font></p> <p><font face="Georgia"><strong>Progress?</strong></font></p> <p><font face="Georgia">On the plus side this got people discussing what a good commit message looked like and it brought up some question marks around certain practices that others had done. For example a few team members wouldn’t write a formal message but simply paste the ID of the issue from ClearQuest [3]. Naturally this passed my “not empty” test but it raised a question about overly terse commit messages. Given where we were coming from I felt this was definitely acceptable (for the time being) as they were still using the commit message to provide more details, albeit in the form of a link to the underlying business request [5].</font></p> <p><font face="Georgia">However, it got me thinking about whether people were not really playing ball and might be gaming the system so I started looking into overly terse commit messages and I’m glad to say everyone was entering into the spirit of things [4]. Everyone except one person who had never even been on the initial radar but who had a sizable number of commits with the simple message:</font></p> <p><font face="Georgia">    nt</font></p> <p><font face="Georgia">That’s right, just the two letters ‘n’ and ‘t’. 
(There were others but this was the most prevalent and memorable.)</font></p> <p><font face="Georgia"><strong>A Curious Message</strong></font></p> <p><font face="Georgia">Looking at the diffs that went with these messages it wasn’t obvious what “nt” meant. My initial instinct was that it was an abbreviation of some sort, perhaps a business term I was unaware of as the developer was involved in the more maths heavy side of the project. They were far more common before my “shake-up” so I was pleased that whatever this term was it was being replaced by more useful comments now but I was still intrigued. Naturally I walked across the room to the very pleasant developer in question and asked him what “nt” meant.</font></p> <p><font face="Georgia">It turned out it didn’t mean anything, and the developer was largely unaware they even existed! So where did they come from?</font></p> <p><font face="Georgia"><strong>The Mist Clears</strong></font></p> <p><font face="Georgia">Luckily while we were chatting he started making a new change and I saw the ClearCase check-out dialog appear and the initial message was a few letters of garbage. I looked at what he intended to type in the editor and it dawned on me what was happening – the “nt” was the latter part of the word “int”.</font></p> <p><font face="Georgia">Just as with Visual SourceSafe, the ClearCase Visual Studio plugin would trigger when you started editing a file and nothing else was checked out at that point. It would pop-up a dialog so you could configure how the check-out was done. For example you might want to put an exclusive lock on the file [6] or you could provide a message so others could see what files were being edited concurrently. By default the focus in this dialog was on the OK button so it was possible to dismiss this dialog without even really seeing it…</font></p> <p><font face="Georgia">Hence this is what was going on:</font></p> <ol> <li><font face="Georgia">The dev typed “int” to start a declaration as part of a new set of code changes.</font></li> <li><font face="Georgia">The “i” keypress triggered the ClearCase plugin which noticed this was the start of a new check-out and promptly threw up a dialog with the remaining letters “nt” in the message field.</font></li> <li><font face="Georgia">By then the dev had already pressed “space” at the end of the type name which, due to the default button focus, caused the dialog to immediately disappear.</font></li> <li><font face="Georgia">When he committed the changes at the end he never edited the message anyway, he would just click the commit button and move on.</font></li> </ol> <p><font face="Georgia">Case closed. From a UI perspective it probably falls into the same category (although with less disastrous consequences) as those unexpected popups that ask if you want to reboot your machine, NOW. Ouch!</font></p> <p><font face="Georgia"> </font></p> <p><font face="Georgia">[1] I was introduced to them on my very first job and have been fortunate enough to use one on virtually every job since, even if I ended up setting one up :o).</font></p> <p><font face="Georgia">[2] In retrospect I probably didn’t try hard enough to sell it and should have taken a more personal approach for the laggards as maybe there were good reasons why they weren’t doing it, e.g. 
tooling.</font></p> <p><font face="Georgia">[3] Yes, an enterprise level defect tracking tool with all the pain you’d expect from such a product.</font></p> <p><font face="Georgia">[4] For non-trivial things that is, the message “typo” still appeared for some of those but that raised a whole different set of questions around not compiling or testing changes before committing them!</font></p> <p><font face="Georgia">[5] Including the ticket number at the start of a commit message is something I promote in my <a href="http://www.chrisoldwood.com/articles/in-the-toolbox-commit-checklist.html">Commit Checklist</a>.</font></p> <p><font face="Georgia">[6] This was useful for non-mergeable files like DTS packages and media assets but often ended up creating more harm than good as they got left locked and you had to get an admin to unlock them, and they were in another team.</font></p>Chris Oldwoodhttp://www.blogger.com/profile/18183909440298909448noreply@blogger.com2tag:blogger.com,1999:blog-6628985022531866193.post-79301563081811737642020-11-19T00:42:00.001+00:002020-11-19T00:42:20.470+00:00Planning is Inevitable<p><font face="Georgia">Like most programmers I’ve generally tried to steer well clear of getting involved in management duties. The trouble is that as you get older I think this becomes harder and harder to avoid. Once you get the mechanics of programming under control you might find you have more time to ponder about some of those other duties which go into delivering software because they begin to frustrate you.</font></p><p><font face="Georgia"><strong>The Price of Success</strong></font></p><p><font face="Georgia">Around the turn of the millennium I was working in a small team for a small financial organisation. The management structure was flat and we had the blessing of the owner to deliver what we thought the users needed and when. With a small but experienced team of programmers we could adapt to the ever-growing list of feature requests from our users. Much of what we were doing at the time was trying to work out how certain financial markets were being priced so there was plenty of experimentation which led to the writing and rewriting of the pricing engine as we learned more.</font></p><p><font face="Georgia">The trouble with the team being successful and managing to reproduce prices from other more expensive 3rd party pricing software was that we were then able to replace it. But of course it also had some other less important features that users then decided they needed too. Being in-house and responsive to their changes just means the backlog grows and grows and grows…</font></p><p><strong><font face="Georgia">The Honeymoon Must End</font></strong></p><p><font face="Georgia">While those users at the front of the queue are happy their needs are being met you’ll end up pushing others further down the queue and then they start asking when you’re going to get around to them. If you’re lucky the highs from the wins can outweigh the lows from those you have to disappoint.</font></p><p><font face="Georgia">The trouble for me was that I didn’t like having to keep disappointing people by telling them they weren’t even on the horizon, let alone next on the list.
The team was doing well at delivering features and reacting to change but we effectively had no idea where we stood in terms of delivering all those other features that weren’t being worked on.</font></p><p><strong><font face="Georgia">MS Project Crash Course</font></strong></p><p><font face="Georgia">The company had one of those MSDN Universal licenses which included a ton of other Microsoft software that we never used, including Microsoft Project. I had a vague idea of how to use it after seeing some plans produced by previous project managers and set about ploughing through our “backlog” [1] estimating every request with a wild guess. I then added the five of us programmers in the team as the “resources” [2] and got the tool to help distribute the work amongst ourselves as best as possible.</font></p><p><font face="Georgia">I don’t remember how long this took but I suspect it was spread over a few days while I did other stuff, but at the end I had a lovely </font><a href="https://en.wikipedia.org/wiki/Gantt_chart"><font face="Georgia">Gantt Chart</font></a><font face="Georgia"> that told us everything we needed to know – we had far too much and not enough people to do it in any meaningful timeframe. If I remember correctly we had something like a year’s worth of work even if nothing else was added to the “TODO list” from now on, which of course is ridiculous – software is never done until it’s decommissioned.</font></p><p><font face="Georgia">For a brief moment I almost felt compelled to publish the plan and even try and keep it up-to-date, after all I’d spent all that effort creating it, why wouldn’t I? Fortunately I fairly quickly realised that the true value in the plan was knowing that we had too much work and therefore something had to change. Maybe we needed more people, whether that was actual programmers or some form of manager to streamline the workload. Or maybe we just needed to accept the reality that some stuff was never going to get done and we should ditch it. Product backlogs are like the garage or attic where “stuff” just ends up, forgotten about but taking up space in the faint hope that one day it’ll be useful.</font></p><p><strong><font face="Georgia">Saying No</font></strong></p><p><font face="Georgia">The truth was uncomfortable and I remember it led to some very awkward conversations between the development team and the users for a while [3]. There is only so long that you can keep telling people “it’s on the list” and “we’ll get to it eventually” before their patience wears out. It was unfair to string people along when we pretty much knew in our hearts we’d likely never have the time to accommodate them, but being the eternal optimists we hoped for the best all the same.</font></p><p><font face="Georgia">During that period of turmoil having the plan was a useful aid because it allowed us to have those awkward conversations about what happens if we take on new work. Long before we knew anything about “agility” we were doing our best to respond to change but didn’t really know how to handle the conflict caused by competing choices. There was definitely an element of “he who shouts loudest” that had a bearing on what made its way to the top of the pile rather than a quantitative approach to prioritisation.</font></p><p><font face="Georgia">Even today, some 20 years on, it’s hard to convince teams to throw away old backlog items on the premise that if they are important enough they’ll bubble up again.
Every time I see an issue on GitHub that has been automatically closed because of inactivity it makes me a little bit sad, but I know it’s for the best; you simply cannot have a never-ending list of bugs and features – at some point you just have to let go of the past.</font></p><p><font face="Georgia">On the flipside, while I began to appreciate the futility of tracking so much work, I also think going through the backlog and producing a plan made me more tolerant of estimates. Being that person in the awkward situation of trying to manage someone’s expectations has helped me get a glimpse of what questions some people are trying to answer by creating their own plans and how our schedule might have a knock-on effect on them. I’m in no way saying that I’d gladly sit through sessions of planning poker simply for someone to update some arbitrary project plan because it’s expected of the team, but I feel more confident asking the question about what decisions are likely to be affected by the information I’m being asked to provide.</font></p><p><font face="Georgia"><strong>Self-Organising Teams</strong></font></p><p><font face="Georgia">Naturally I’d have preferred someone else to be the one to start thinking about the feature list and work out how we were going to organise ourselves to deal with the deluge of work, but that’s the beauty of a self-organising team. In a solid team people will pick up stuff that needs doing, even if it isn’t the most glamorous task, because ultimately what they want to see is the <em>team</em> succeed [4], because then they get to be part of that shared success.</font></p><p><font face="Georgia"> </font></p><p><font face="Georgia">[1] B.O.R.I.S (aka Back Office Request Information System) was a simple bug tracking database written with Microsoft Access. I’m not proud of it but it worked for our small team in the early days :o).</font></p><p><font face="Georgia">[2] Yes, the air quotes are for irony :o).</font></p><p><font face="Georgia">[3] A downside of being close to the customer is that you feel their pain. (This is of course a good thing from a process point of view because you can factor this into your planning.)</font></p><p><font face="Georgia">[4] See “<a href="http://www.chrisoldwood.com/articles/afterwood-the-centre-half.html">Afterwood – The Centre Half</a>” for more thoughts on the kind of role I seem to end up carving out for myself in a team.</font></p>Chris Oldwoodhttp://www.blogger.com/profile/18183909440298909448noreply@blogger.com0tag:blogger.com,1999:blog-6628985022531866193.post-74609153237113897902020-11-16T00:24:00.001+00:002020-11-16T00:24:51.976+00:00Pair Programming Interviews<p><font face="Georgia">Let’s be honest, hiring people is hard and there are no perfect approaches. However it feels somewhat logical that if you’re hiring someone who will spend a significant amount of their time solving problems by writing software, then you should probably at least <em>try</em> and validate that they are up to the task. That doesn’t mean you don’t also look for ways to assess their suitability for the <em>other</em> aspects of software development that don’t involve programming, only that being able to solve a problem with code will encompass a fair part of what they’ll be doing on a day-to-day basis [1].</font></p><p><strong><font face="Georgia">Early Computer Based Tests</font></strong></p><p><font face="Georgia">The first time I was ever asked to write code on a computer as part of an interview was way back in the late ‘90s. 
Back then pair programming wasn’t much of a thing in the Enterprise circles I moved in and so the exercise was very hands-off. They left me in the boardroom with a computer (but no internet access) and gave me a choice of exercises. Someone popped in halfway through to make sure I was alright but other than that I had no contact with anyone. At the end I chatted briefly with the interviewer about the task but it felt more like a box-ticking affair than any real attempt to gain much of an insight into how I actually behaved as a programmer. (An exercise in separating “the wheat from the chaff”.)</font></p><p><font face="Georgia">I got the job and then watched from the other side of the table as other people went through the same process. In retrospect being asked to write code on an actual computer was still quite novel back then and therefore we probably didn’t explore it as much as we should have.</font></p><p><font face="Georgia">It was almost 15 years before I was asked to write code on a computer again as part of an interview. In between I had gone through the traditional pencil & paper exercises which I was struggling with more and more [2] as I adopted TDD and refactoring as my “stepwise refinement” process of choice.</font></p><p><strong><font face="Georgia">My First Pair Programming Interview</font></strong></p><p><font face="Georgia">Around 2013 an old friend in the <a href="https://accu.org/">ACCU</a>, <a href="https://twitter.com/edsykes">Ed Sykes</a>, told me about a consultancy firm called <a href="https://twitter.com/EqualExperts">Equal Experts</a> who were looking to hire experienced freelance software developers. Part of their interview process was a simple kata done in a pair programming style. While I had done no <em>formal</em> pair programming up to that time [3] it was a core technique within the firm and so any candidates were expected to be comfortable adopting this practice where preferable.</font></p><p><font face="Georgia">I was interviewed by Ed Sykes, who played a kind of Product Owner role, and <a href="https://twitter.com/nearlyadam">Adam Straughan</a>, who was more hands-on during the exercise. They gave me the Roman Numerals kata (decimal to roman conversion), which I hadn’t done before, and an hour to solve it. I took a pretty conventional approach but didn’t quite solve the whole thing in the allotted time as I didn’t manage to get the special cases to fall out as naturally as I’d hoped. Still, the interviewers must have got what they were after as once again I got the job. Naturally I got involved in the hiring process at Equal Experts too because I really liked the process I had gone through and I wanted to see what it was like on the other side of the keyboard. 
It seemed so natural that I <font face="Georgia">wondered why more companies didn’t adopt something similar, irrespective of whether or not any pair programming was involved in the role.</font></font></p><p><font face="Georgia">Whenever I got involved in hiring for the end client I also used the same technique, although I tended to be a lone “technical” interviewer rather than having the luxury of the PO + Dev approach that I was first exposed to; even so it remained my preferred format by a wide margin.</font></p><font face="Georgia"><p><font face="Georgia"><strong><font face="Georgia">Pairing – Interactive Interviewing</font></strong></font></p></font><p><font face="Georgia">On reflection what <em>I</em> liked most about this approach as a candidate, compared to the traditional one, is that it felt less like an exam, which I generally suck at, and more like what you’d really do on the job. Putting aside the current climate of living in a pandemic where many people are working at home by themselves, what stood out was that I had access to other people and was <em>encouraged</em> to ask questions rather than solve the problem entirely by myself. To wit, it felt like I was interviewing to be part of a team of people, not stuck in a booth and expected to work autonomously [4]. Instead of just leaving you to flounder, the interviewers would actively nudge you to help unblock the situation, just like they (hopefully) would do in the real world. Not everyone notices the same things and as long as they aren’t holding the candidate’s hand the whole time that little nudge should be seen as a positive sign about taking feedback on board rather than failing to solve the problem. It’s another small, but I feel hugely important, part of making the candidate feel comfortable.</font><p><font face="Georgia"><strong><font face="Georgia">The Pit of Success</font></strong></font></p><p><font face="Georgia">We’ve all heard about those interviews where it’s less about the candidate and more about the interviewer trying to show how clever they are. It almost feels like the interviewer is going out of their way to make the interview as far removed from normal operating conditions as possible, as if the pressure of an interview is somehow akin to a production outage. If your goal is to get the best from the candidate, and it should be if you want the best chance of evaluating them fairly, then you need to make them feel as comfortable as possible. You only have a short period of time with them so getting <em>them</em> into the right frame of mind should be uppermost in <em>your</em> mind.</font></p><p><font face="Georgia">One of the problems I faced in that early programming test was an unfamiliar computer. You have a choice of whether to try and adapt to the keyboard shortcuts you’re given or reconfigure the IDE to make it more natural. You might wonder if that’s part of the test, which wastes yet more time and adds to the artificial nature of the setting. What about the toolset – can you use your preferred unit testing framework or shell? Even in the classic homogeneous environment that is The Windows Enterprise there is often still room for personal preference, despite what some organisations might have you believe [5].</font></p><p><font face="Georgia">Asking the candidate to bring their own laptop overcomes all of these hurdles and gives them the opportunity to use their own choice of tools, thereby allowing them to focus more on the problem and interaction with you and less on yak shaving. 
They should also have access to the Internet so they can google whatever they need to. It’s important to make this perfectly clear so they won’t feel penalised for “looking up the answer” to even simple things because we all do that for real, let alone under the pressure of an interview. Letting them get flustered because they can’t remember something seemingly trivial and then also worrying about how it’ll look if they google it won’t work in <em>your</em> favour. (Twitter is awash with senior developers pointing out that even <em>they</em> google the simple things sometimes and that you’re not expected to remember everything all the time.)</font></p><p><font face="Georgia">Unfortunately, simply because there are people out there that insist on interviewing in a way designed to trip up the candidate, I find I have to go overboard when discussing the setup to reassure them that there really are no tricks – that the whole point of the exercise is to get an insight into how they work in practice. Similarly reassuring the candidate that the problem is open-ended and that solving it in the allotted time is not expected also helps to relax them so they can concentrate more on enjoying the process and feel comfortable with you stopping to discuss, say, their design choices instead of feeling the need to race to the end before yet another artificial deadline.</font></p><font face="Georgia"><p><font face="Georgia"><strong>The Exercise</strong></font></p><p>I guess it’s to be expected that if you set a programming exercise you’d want the candidate to complete it; but for me the exercise is a means to a different end. I’m not interested in the problem itself; it’s the conversation we have that provides me with the confidence I need to decide if the candidate has potential. This implies that the problem cannot be overly cerebral as the intention is to code and chat at the same time.</p><p>While there are a number of popular katas out there, like the Roman Numerals conversion, I never really liked any of them. Consequently I came up with my own little problem based around command line parsing. For starters I felt this was a problem domain that was likely to be familiar to almost any candidate even if they’re more GUI oriented in practice. It’s also a problem that can be solved in a procedural, functional, or object-oriented way and may even, as the design evolves, be refactored from one style to the other, or even encompass aspects of multiple paradigms. (Many of the classic katas are very functional in nature.) There is also the potential to touch on I/O with the program usage and this allows the thorny subject of mocking and testability to be broached, which I’ve found to be a rich seam of discussion with plenty of opinions.</p><p>(Even though the first iteration of the problem only requires supporting “-v” to print a version string I’ve had candidates create complex class hierarchies based around the Command design pattern despite making it clear that we’ll introduce new features in subsequent iterations.)</p><p><strong>Mechanics</strong></p><p>Aside from how a candidate solves a problem from a design standpoint I’m also interested in the actual mechanics of how they program. I don’t mean whether they can touch type or not – I personally can’t so that would be a poor indicator :o) – no, I mean how they use the tools. 
For example I find it interesting what they use the keyboard or mouse for, what keyboard shortcuts they use, how they select and move text, whether they use snippets or prefer the editor not to interfere. While I don’t think any of the candidate’s choices says anything significant about their ability to solve the problem, it does provide an interesting avenue for conversation.</p><p>It’s probably a very weak indicator but programmers are often an opinionated bunch and one area they can be highly opinionated about is the tools they use. Some people love to talk about what things they find useful, in essence what they feel improves or hinders their productivity. This in turn raises the question of what they believe “productivity” is in a software development context.</p><p><strong>Reflection</strong></p><p>What much of this observation and conversation boils down to is not about whether they do things the same way I do – on the contrary I really hope they don’t as diversity is important – it’s about the “reflective” nature of the person. How much of what they do is through conscious choice and how much is simply the result of doing things by rote.</p><p>In my experience the better programmers I have worked with tend to be more aware of how they work. While many actions may fall into the realm of unconscious competence when “in the zone” they can likely explain their rationale because they are still (subconsciously) evaluating it in the background in case a better approach is suitable.</p><p>(Naturally this implies the people I tend to interview are, or purport to be, experienced programmers where that level of experience is assumed to be over 10 years. I’m not sure what you can expect to take away from this post when hiring those just starting out on their journey.)</p><font face="Georgia"><p><strong>An Imperfect Process</strong></p><p>Right back at the start I said that interviewing is an imperfect process and while I think pairing with someone is an excellent way to get a window into their character and abilities, so much still comes down to a gut feeling and therefore a subjective assessment.</p><p>I once paired with someone in an interview and while I felt they were probably technically competent there was just a tinge of uneasiness about them personally. Ultimately the final question was “would I be happy to work with this person?” and so I said “yes” because I felt I would be nit-picking to say “no”. As it happens I did end up working with this person and a couple of months into the contract I had to have an awkward conversation with my other two colleagues to see if they felt the same way I did about this team mate. They did and the team mate was “swapped out” after a long conversation with the account manager.</p><p>What caused us to find working with this person unpleasant wasn’t something we felt could easily and quickly be rectified. They had a general air of negativity about them and had a habit of making disparaging, sweeping remarks which showed they looked down on database administrators and other non-programming roles. They also lacked attention to detail, causing the rest of us to dot their i’s and cross their t’s. Even after bringing this up directly it didn’t get any better; they really just wanted to get on and write new code and leave the other tasks like reviewing, documenting, deploying, etc. 
to other people.</p><p>I doubt there is anything you can do in an hour of pairing to unearth these kinds of undesirable traits [6] to a level that you can adequately assess, which is why the gut still has a role to play. (I suspect it was my many years of experience in the industry working with different people that originally set my spider sense tingling.)</p><p><strong>Epilogue</strong></p><p>The hiring question I may find myself putting to the client is whether they would prefer to accidentally let a good candidate slip away because the interview let them (the candidate) down or accidentally hire a less suitable candidate that appeared to “walk the walk” as well as “talk the talk” and potentially become a liability. Since doing pairing interviews this question has come up very rarely with a candidate as it’s been much clearer from the pairing experience what their abilities and attitude are.</p><p> </p></font></font><p><font face="Georgia">[1] This doesn’t just apply to hiring individuals but can also work for whole teams, see “<a href="https://chrisoldwood.blogspot.com/2015/09/choosing-supplier-hackathon.html">Choosing a Supplier: The Hackathon</a>”.</font></p><p><font face="Georgia">[2] See “<a href="http://www.chrisoldwood.com/articles/afterwood-the-interview.html">Afterwood – The Interview</a>” for more on how much I dislike the pen & paper approach to coding interviews.</font></p><p><font face="Georgia">[3] My first experience was in a Cyber Dojo evening back in September 2010 that <a href="https://twitter.com/JonJagger">Jon Jagger</a> ran at Skills Matter in London. I wrote it up for the ACCU: “<a href="http://www.chrisoldwood.com/articles/accu-london-september-2010.html">Jon Jagger’s Coding Dojo</a>”.</font></p><p><font face="Georgia">[4] Being a long-time freelancer this mode of operation is not unexpected as you are often hired into an organisation specifically for your expertise; your contributions outside of “coding” are far less clear. Some like the feedback on how the delivery process is working while others do not and just want you to write code.</font></p><p><font face="Georgia">[5] My <em>In The Toolbox</em> article “<a href="http://www.chrisoldwood.com/articles/in-the-toolbox-getting-personal.html">Getting Personal</a>” takes a look at the boundary between team conventions and personal freedom for choices in tooling and approach.</font></p><p><font face="Georgia">[6] I’m not saying this person could not have improved if given the right guidance – they probably could have and I hope they actually have by now; they just weren’t right for this particular environment which needed a little more sensitivity and rigour.</font></p><p><font face="Georgia"><br></font></p>Chris Oldwoodhttp://www.blogger.com/profile/18183909440298909448noreply@blogger.com1tag:blogger.com,1999:blog-6628985022531866193.post-32926764530022915372020-10-18T00:46:00.001+01:002020-10-18T00:46:14.450+01:00Fast Hardware Hides Many Sins<p><font face="Georgia">Way back at the beginning of my professional programming career I worked for a small software house that wrote graphics software. 
Although it had a desktop publisher and line-art based graphics package in its suite it didn’t have a bitmap editor and so they decided to outsource that to another local company.</font></p><p><strong><font face="Georgia">A Different User Base</font></strong></p><p><font face="Georgia">The company they chose to outsource to had a very high-end bitmap editing product and so the deal – to produce a cut-down version – suited both parties. In principle they would take their high-end product, strip out the features aimed at the more sophisticated market (professional photographers) and throw in a few others that the lower end of the market would find beneficial instead. For example their current product only supported 24-bit video cards, which were pretty unusual in the early to mid ‘90s due to their high price, and so supporting 8-bit paletted images was new to them. Due to the large images their high-end product could handle using its own virtual memory system they also demanded a large, fast hard disk.</font></p><p><font face="Georgia">Even though I was only a year or two into my career at that point I was asked to look after the project and so I would get the first drop of each version as they delivered it so that I could evaluate their progress and also keep an eye on quality. The very first drop I got contained various issues that in retrospect did not bode well for the project, which ultimately fell through, although that was not until much later. (Naturally I didn’t have the experience I have now that would probably cause me to pull the alarm cord much sooner.)</font></p><p><strong><font face="Georgia">Hard Disk Disco</font></strong></p><p><font face="Georgia">One of the features that they partially supported but we wanted to make a little more prominent was the ability to see what the RGB value of the pixel under the cursor was – often referred to now as a colour dropper or eye dropper. When I first used the feature on my 486DX PC I noticed that it was somewhat laggy; this surprised me as I had implemented algorithms like Floyd-Steinberg dithering so knew a fair bit about image manipulation and what algorithms were expensive and this definitely wasn’t one! I had also noticed that the hard disk light on my PC was pretty busy too, which made no sense but was probably worth mentioning to them as an aside.</font></p><p><font face="Georgia">After feeding back to them about this and various other things I’d noticed they made some suggestions that their virtual memory system was probably overly aggressive as the product was designed for beefier hardware. That kind of made sense and I waited for the next drop.</font></p><p><font face="Georgia">On the next drop they had apparently made various changes to their virtual memory system which helped it cope much better with smaller images so they didn’t page unnecessarily, but I still found the feature laggy, and as I played with it some more I noticed that the hard disk light was definitely flashing lots when I moved the mouse although it didn’t stop flashing entirely when I stopped moving it. For our QA department, who only had somewhat smaller 386SX machines, it was even more noticeable.</font></p><p><font face="Georgia"><strong>DBWIN – Airing Dirty Laundry</strong></font></p><p><font face="Georgia">At our company all the developers ran <a href="https://jeffpar.github.io/kbarchive/kb/086/Q86263/">the debug version of Windows 3.1 
enhanced mode</a> with a second mono monitor to display messages from the Windows APIs to point out bugs in our software, but it was also very interesting to see what errors other software generated too [1]. You probably won’t be surprised to discover that the bitmap editor generated a lot of warnings. For example Windows complained about the amount of extra (custom) data it was storing against a window handle (hundreds of bytes) which I later discovered was caused by them constantly copying image attribute data back-and-forth as <em>individual</em> values instead of allocating a single struct with the data and copying that single pointer around.</font></p><p><font face="Georgia"><strong>Unearthing The Truth</strong></font></p><p><font face="Georgia">Anyway, back to the performance problem. Part of the deal enabled our company to gain access to the bitmap editor source code which they gave to us earlier than originally planned so that I could help them by debugging some of their gnarlier crashes [2]. Naturally the first issue I looked into was the colour dropper and I quickly discovered the root cause of the dreadful performance – they were reading the application’s <font face="Courier New">.ini</font> file <em>every</em> time [3] the mouse moved! They also had a timer which simulated a <font face="Courier New">WM_MOUSEMOVE</font> message for other reasons which was why it still flashed the hard disk light even when the mouse wasn’t actually moving.</font></p><p><font face="Georgia">When I spoke to them about it they explained that once upon a time they ran into a Targa video card where the driver returned the <font face="Courier New">RGB</font> values as <font face="Courier New">BGR</font> when calling <font face="Courier New">GetPixel()</font>. Hence what they were doing was checking the <font face="Courier New">.ini</font> file to see if there was an application setting there to tell them to swap the <font face="Courier New">GetPixel()</font> result. Naturally I asked them why they didn’t just read this setting <em>once</em> at application start-up and cache the value given that the user can’t swap the video card whilst the machine (let alone the application) was running. Their response was simply a shrug, which wasn’t surprising by that time as it was becoming ever more apparent that the quality of the code was making it hard to implement the features we wanted and our QA team was turning up other issues which the mostly one-man team was never going to cope with in a reasonable time frame.</font></p><p><font face="Georgia"><strong>Epilogue</strong></font></p><p><font face="Georgia">I don’t think it’s hard to see how this feature ended up this way. It wasn’t a prominent part of their high-end product and given the kit their users ran on and the kind of images they were dealing with it probably never even registered with all the other swapping going on. While I’d like to think it was just an oversight – and one should never optimise until they have measured and prioritised – there were too many other signs in the codebase that suggested they were relying heavily on the hardware to compensate for poor design choices. The other factor is that with pretty much only one full-time developer [5] the pressure was surely on to focus on new features first and quality was further down the list.</font></p><p><font face="Georgia">The project was eventually canned and with the company I was working for struggling too due to the huge growth of Microsoft Publisher and CorelDraw I only just missed the chop myself. 
Sadly neither company is around today, despite quality playing a major part at the company I worked for and its software being significantly better than many of the competing products.</font></p><p><font face="Georgia"> </font></p><p><font face="Georgia">[1] One of the first pieces of open source software I ever published (on CiX) was a <a href="http://www.chrisoldwood.com/win16.htm#mdalib">Mono Display Adapter Library</a>.</font></p><p><font face="Georgia">[2] One involved taking Windows “out at the knees” – not even CodeView or BoundsChecker would trap it – the machine would just restart. Using <a href="https://en.wikipedia.org/wiki/SoftICE">SoftICE</a> I eventually found the cause – calling <font face="Courier New">EndDialog()</font> instead of <font face="Courier New">DestroyWindow()</font> to close a modeless dialog.</font></p><p><font face="Georgia">[3] Although Windows cached the contents of the <font face="Courier New">.ini</font> file it still needed to <font face="Courier New">stat()</font> the file on <em>every</em> read access to see if it had changed and disk caching wasn’t exactly stellar back then [4].</font></p><p><font face="Georgia">[4] See <a href="https://twitter.com/chrisoldwood/status/934871786692505600?s=20">this tweet</a> of mine about how I used to <font face="Courier New">grep</font> my hard disk under Windows 3.1 :o).</font></p><p><font face="Georgia">[5] I ended up moonlighting for them in my spare time by writing them a scanner driver for one of their clients while they concentrated on getting the cut-down bitmap editor done for my company.</font></p>Chris Oldwoodhttp://www.blogger.com/profile/18183909440298909448noreply@blogger.com0tag:blogger.com,1999:blog-6628985022531866193.post-80782599186854129052020-08-09T00:54:00.001+01:002020-08-09T00:55:56.110+01:00Simple Tables From JSON Data With JQ and Column<p><font face="Georgia">My current role is more of a DevOps one and I’m spending more time than usual monitoring and administering various services, such as the GitLab instance we use for source control, build pipelines, issue management, etc. While the GitLab UI is very useful for certain kinds of tasks the rich RESTful API allows you to easily build your own custom tools to monitor, analyse, and investigate the things you’re particularly interested in.</font></p><p><font face="Georgia">For example one of the first views I wanted was an alphabetical list of all runners with their current status so that I could quickly see if any had gone AWOL during the night. The alphabetical sorting requirement is not something the standard UI view provides, hence I needed to use the REST API or hope that someone had already done something similar first.</font></p><p><font face="Georgia"><strong>GitLab Clients</strong></font></p><p><font face="Georgia">I quickly found two candidates: </font><a href="https://github.com/python-gitlab/python-gitlab"><font face="Courier New">python-gitlab</font></a><font face="Georgia"> and </font><a href="https://github.com/plouc/go-gitlab-client"><font face="Courier New">go-gitlab-client</font></a><font face="Georgia"> which looked promising but they only really wrap the API – I’d still need to do some heavy lifting myself and understand what the </font><a href="https://docs.gitlab.com/ee/api/"><font face="Georgia">GitLab API</font></a><font face="Georgia"> does. 
Given how simple the examples were, even with <font face="Courier New">curl</font>, it felt like I wasn’t really saving myself anything at this point, e.g.</font></p><p><font face="Courier New">curl --header "PRIVATE-TOKEN: $token" "</font><a href="https://gitlab.example.com/api/v4/runners"><font face="Courier New">https://gitlab.example.com/api/v4/runners</font></a><font face="Courier New">"</font></p><p><font face="Georgia">So I decided to go with a wrapper script [1] approach instead and find a way to prettify the JSON output so that the script encapsulated a shell one-liner that would request the data and format the output in a simple table. Here is the kind of JSON the GitLab API would return for the list of runners:</font></p><p><font face="Courier New">[<br> {<br> "id": 6,<br> "status": "online"<br> . . .<br> }<br>
,<br> {<br> "id": 8,<br> "status": "offline"<br> . . .<br> }<br>
]</font></p><p><strong><font face="Georgia">JQ – <em>The</em> JSON Tool</font></strong></p><p><font face="Georgia">I’d come across the excellent </font><a href="https://stedolan.github.io/jq/"><font face="Georgia">JQ tool</font></a><font face="Georgia"> for querying JSON payloads many years ago so that was my first thought for at least cutting the payloads down to just the fields I was interested in. However on further reading I found it could do some simple formatting too. At first I thought the <em>compact output </em>using the <font face="Courier New">-c</font> option was what I needed (perhaps along with some <font face="Courier New">tr</font> magic to strip the punctuation), e.g.</font></p><p><font face="Courier New">$ echo '[{"id":1, "status":"online"}]' |\<br>  jq -c<br>
[{"id":1,"status":"online"}]<br>
</font></p><p><font face="Georgia">but later I discovered the <font face="Courier New">-r</font> option provided <em>raw output</em> which formatted the values as simple text and removed all the JSON punctuation, e.g.</font></p><p><font face="Courier New">$ echo '[{"id":1, "status":"online"}]' |\<br>  jq -r '( .[] | "\(.id) \(.status)" )'<br>
1 online<br>
</font></p><p><font face="Georgia">Naturally my first thought for the column headings was to use a couple of <font face="Courier New">echo</font> statements before the <font face="Courier New">curl</font> pipeline but I also discovered that you can mix and match string literals with the output from the incoming JSON stream, e.g.</font></p><p><font face="Courier New">$ echo '[{"id":1, "status":"online"}]' |\<br>  jq -r '"ID Status",<br>         "-- ------",<br>         ( .[] | "\(.id) \(.status)" )'<br>
ID Status<br>
-- ------<br>
1 online</font></p><p><font face="Georgia">This way the headings were only output if the command succeeded.</font></p><p><strong><font face="Georgia">Neater Tables with Column</font></strong></p><p><font face="Georgia">While these crude tables were readable and simple enough for further processing with <font face="Courier New">grep</font> and <font face="Courier New">awk</font> they were still pretty unsightly when the values of a column were too varied in length, such as a branch name or description field. Putting them on the right-hand side kind of worked but I wondered if I could create fixed-width fields à la <font face="Courier New">printf</font> via <font face="Courier New">jq</font>.</font></p><p><font face="Georgia">At this point I stumbled across the StackOverflow question </font><a href="https://stackoverflow.com/questions/39139107/how-to-format-a-json-string-as-a-table-using-jq"><font face="Georgia">How to format a JSON string as a table using jq?</font></a><font face="Georgia"> where one of the later answers mentioned a command line tool called “<font face="Courier New">column</font>” which takes rows of text values and arranges them as columns of similar width by adjusting the spacing between elements.</font></p><p><font face="Georgia">This almost worked except for the fact that some fields had spaces in their input and <font face="Courier New">column</font> would treat them by default as separate elements. A simple change of field separator from a space to a tab meant that I could have my cake and eat it, e.g.</font></p><p><font face="Courier New">$ echo '[ {"id":1, "status":"online"},<br>          {"id":2, "status":"offline"} ]' |\<br>  jq -r '"ID\tStatus",<br>         "--\t-------",<br>         ( .[] | "\(.id)\t\(.status)" )' |\<br>  column -t -s $'\t'<br>ID  Status<br>--  -------<br>1   online<br>2   offline</font></p><p><font face="Georgia"><strong>Sorting and Limiting</strong></font></p><p><font face="Georgia">While for many of the views I was happy to order by ID, which is often the default for the API, or in the case of jobs and pipelines was a proxy for “start time”, there were cases where I needed to control the sorting. 
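</font></p><p><font face="Georgia">(As an aside, by this point the complete “runners” view had grown into a small wrapper script that glued the <font face="Courier New">curl</font>, <font face="Courier New">jq</font>, and <font face="Courier New">column</font> pieces together. The sketch below is a simplified reconstruction rather than the exact script – the instance URL is made up and the token handling is purely illustrative – and its <font face="Courier New">sort_by</font> is jumping ahead slightly to the subject of this section.)</font></p><p><font face="Courier New">#!/bin/bash<br># runners.sh – list the GitLab runners as a simple table (sketch only).<br># Assumes a GITLAB_TOKEN environment variable; the URL below is hypothetical.<br>set -eu<br><br>gitlab_url="https://gitlab.example.com"<br>token="${GITLAB_TOKEN:?set GITLAB_TOKEN first}"<br><br># Fetch the runners, sort by description, and format as a tab-separated table.<br>curl --silent --fail --header "PRIVATE-TOKEN: $token" \<br>     "$gitlab_url/api/v4/runners" |<br>jq -r 'sort_by(.description|ascii_downcase)<br>       | "ID\tDescription\tStatus",<br>         "--\t-----------\t------",<br>         ( .[] | "\(.id)\t\(.description)\t\(.status)" )' |<br>column -t -s $'\t'</font></p><p><font face="Georgia">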
For example we used the runner description to store the hostname (or host + container name) so it made sense to order by that, e.g.</font></p><p><font face="Georgia"><font face="Courier New">jq 'sort_by(.description|ascii_downcase)'</font></font></p><p><font face="Georgia">For the runner’s jobs the job ID ordering wasn’t that useful as the IDs were allocated up front but the job might <em>start </em>much later if it’s a later part of the pipeline, so I chose to order by the job start time instead, in descending order, so the most recent jobs were listed first, e.g.</font></p><p><font face="Courier New">jq 'sort_by(.started_at) | reverse'</font></p><p><font face="Georgia">One final trick that proved useful occasionally when there was no limiting in the API was to do it with <font face="Courier New">jq</font> instead, e.g.</font></p><p><font face="Courier New">jq "sort_by(.name) | [limit($max; .[])]"</font></p><p><font face="Georgia"> </font></p><p><font face="Georgia">[1] See my 2013 article <span lang="EN-GB">“<a href="http://www.chrisoldwood.com/articles/in-the-toolbox-wrapper-scripts.html">In The Toolbox – Wrapper Scripts</a>” for more about this common technique of simplifying tools.</span></font></p>Chris Oldwoodhttp://www.blogger.com/profile/18183909440298909448noreply@blogger.com0tag:blogger.com,1999:blog-6628985022531866193.post-49289468764758940422020-08-08T01:59:00.001+01:002020-08-08T01:59:59.599+01:00Weekend Maintenance as Chaos Engineering<p><font face="Georgia">I was working on a new system – a grid-based calculation engine for an investment bank – and I was beginning to read about some crazy ideas by Netflix around how they would kill off actual production servers to test their resilience to failure. I really liked this idea as it had that “put your money where your mouth is” feel to it and I felt we were designing a system that <em>should</em> cope with this kind of failure, and if it didn’t, then we had learned something and needed to fix it.</font></p><p><font face="Georgia"><strong>Failure is Expected</strong></font></p><p><font face="Georgia">We had already had a few minor incidents during its early operation caused by dodgy data flowing down from upstream systems and had tackled that by temporarily remediating the data to get the system working, but then immediately fixed the code so that the same kind of problem would not cause an issue in future. The project manager, who had also worked on a sister legacy system to one I’d worked on before, had made it clear from the start that he didn’t want another “support nightmare” like we’d both seen before [1] and pushed the “self-healing” angle which was a joy to hear. Consequently reliability was always foremost in our minds.</font></p><p><font face="Georgia">Once the system went live and the business began to rely on it the idea of randomly killing off services and servers <em>in production</em> was a hard prospect to sell. While the project manager had fought to help us get a UAT environment that almost brought us parity with production and was okay with us using that for testing the system’s reliability he was less happy about going the whole hog and adopting the Netflix approach. 
(The organisation was already very reserved and despite <em>our</em> impeccable record some other teams had some nasty failures that caused the organisation to become <em>more </em>risk averse rather than address the root problems.)</font></p><p><font face="Georgia"><strong>Planned Disruption is Good!</strong></font></p><p><font face="Georgia">Some months after we had gone live I drew the short straw and was involved with a large-scale DR test. We were already running active/active by making use of the DR facilities during the day and rotated the database cluster nodes every weekend [2] to avoid a node getting stale, hence we had a high degree of confidence that we would cope admirably with the test. Unfortunately there was a problem with one of the bank’s main trade systems – it wouldn’t start after failover to DR – which meant we never really got to do a full test and show that it was a no-brainer for us.</font></p><p><font face="Georgia">While the day was largely wasted for me as I sat around waiting for our turn it did give me time to think a bit more about how we would show that the system was working correctly and also, once the DR test was finished and we had failed back over again, that it had recovered properly. At that point I realised we didn’t need to implement any form of Chaos Engineering <em>ourselves</em> as the Infrastructure team were already providing it, every weekend!</font></p><p><font face="Georgia">It’s common for large enterprises to only perform emergency maintenance during the week and then make much more disruptive changes at the weekend, e.g. tearing parts of the network up, patching and rebooting servers, etc. At that time it was common for support teams to shut systems down and carefully bring them back up after the maintenance window to ensure they were operating correctly when the eastern markets opened late Sunday evening [3]. This was the perfect opportunity to do the complete opposite – drive the system hard over the weekend and see what state it was in after the maintenance had finished – if it wasn’t still operating normally we’d missed some failure modes.</font></p><p><font face="Georgia"><strong>An Aria of Canaries</strong></font></p><p><font face="Georgia">We were already pushing through a simple canary request every few minutes which allowed us to spot when things had unexpectedly gone south but we wanted something heavier that might drive out subtler problems, so we started pushing through heavy loads during the weekend too and then looked at what state they were in when it was over. These loads always had a lower priority than any real work so we could happily leave them to finish in the background rather than need to kill them off before the working week started. (This is a nice example of using the existing features of the system to avoid it disrupting the normal workload.)</font></p><p><font face="Georgia">This proved to be a fruitful idea as it unearthed a couple of places where the system wasn’t quite as reliable as we’d thought. For example we were leaking temporary files when the network was glitching and the calculation was restarted. Also the load pushed the app servers over the edge memory-wise and highlighted a bug in the nanny process when the machine was short of memory. 
There was also a bug in some exponential back-off code that backed off a little too far as it never expected an outage to last most of the weekend :o).</font></p><p><font face="Georgia"><strong>Order From Chaos</strong></font></p><p><font face="Georgia">When they finally scheduled a repeat DR test some months later after supposedly ironing out the wrinkles with their key trade capture systems our test was a doddle as it just carried on after being brought back to life in the DR environment and similarly after reverting back to PROD it just picked up where it had left off and retried those jobs that had failed when the switchover started. Rather than shying away from the weekend disruption we had used it to our advantage to help improve its reliability.</font></p><p><font face="Georgia"><font face="Georgia"> </font></font></p><p><font face="Georgia">[1] Eventually the team spends so much time fire-fighting there is no time left to actually fix the system and it turns into an endless soul-destroying job.</font></p><p><font face="Georgia">[2] Rotating the database cluster primary causes the database to work with an empty cache which is a great way to discover how much your common queries rely on heavily cached data. In one instance a 45-second reporting query took over 15 minutes when faced with no cached pages!</font></p><p><font face="Georgia">[3] See <a href="https://chrisoldwood.blogspot.com/2019/11/arbitrary-cache-timeouts.html">Arbitrary Cache Timeouts</a> for an example where constant rebooting masked a bug.</font></p>Chris Oldwoodhttp://www.blogger.com/profile/18183909440298909448noreply@blogger.com0tag:blogger.com,1999:blog-6628985022531866193.post-62501185983790564002020-02-03T13:25:00.000+00:002020-02-03T13:29:43.690+00:00Blog Post #300<span style="font-family: "georgia";">I signed off <a href="https://chrisoldwood.blogspot.com/2014/11/my-200th-blog-post.html">My 200th Blog Post</a> in November 2014 with the following words:</span><br />
<blockquote>
<span style="font-family: "georgia";">See you again in a few years.</span></blockquote>
<span style="font-family: "georgia";">At the time I didn’t think it would take me over 5 years to write another 100 blog posts, but it has. Does this mean I’ve stopped writing and gone back to coding, reading, and gaming more on my daily commute? No, the clue is also in that blog post:</span><br />
<blockquote>
<span style="font-family: "georgia";">My main aspiration was that writing this blog would help me sharpen my writing skills and give me the confidence to go and write something more detailed that might then be formally published.</span></blockquote>
<span style="font-family: "georgia";">No, I haven’t stopped writing; on the contrary, since <a href="http://www.chrisoldwood.com/articles/utilising-more-than-4gb.html">my first</a> “proper” [1] article for <a href="https://accu.org/">ACCU</a> in late 2013 I’ve spent far more of my time <a href="http://www.chrisoldwood.com/articles.htm">writing further articles</a>, somewhere around the 60 mark at the last count. These have often been longer and also required more care and attention but I’ve probably still written a similar amount of words in the last five years to the previous five.</span><br />
<span style="font-family: "georgia";"><br /></span>
<span style="font-family: "georgia";"><strong>Columnist</strong></span><br />
<span style="font-family: "georgia";"><br /></span>
<span style="font-family: "georgia";">My “<a href="http://chrisoldwood.blogspot.com/2014/08/in-toolbox-season-one.html">In The Toolbox</a>” column for C Vu was a regular feature from 2013 to 2016 but that has tailed off for now and been replaced by a column on the final page of ACCU’s <a href="https://en.wikipedia.org/wiki/Overload_(magazine)">Overload</a>. After its editor <a href="https://twitter.com/fbuontempo?lang=en">Frances Buontempo</a> suggested the title “Afterwood” in the pub one evening, how could I not accept? </span><br />
<span style="font-family: "georgia";"><br /></span>
<span style="font-family: "georgia";">In <a href="http://www.chrisoldwood.com/articles/afterwood-the-final-page.html">my very first</a> Afterwood, where I set out my stall, I described how the final page of a programming journal has often played host to some entertaining writers in the past (when printed journals were still all the rage) and, while perhaps a little late to the party given the demise of the printed page, I’m still glad to have a stab at such a role.</span><br />
<span style="font-family: "georgia";"><br /></span>
<span style="font-family: "georgia";">This 300th blog post almost coincided with the blog’s <a href="http://chrisoldwood.blogspot.com/2009/04/apology-to-raymond-chen.html">10th anniversary</a> 9 months ago but I had a remote working contract at the time so my long anticipated “<a href="http://www.chrisoldwood.com/articles/afterwood-a-decade-of-writing.html">decade of writing</a>” blog post was elevated to an Afterwood instead due to the latter having some semblance of moral obligation unlike the former [2]. That piece, together with this one which focuses more on this blog, probably forms the whole picture.</span><br />
<span style="font-family: "georgia";"><strong><br /></strong></span>
<span style="font-family: "georgia";"><strong>Statistics</strong></span><br />
<span style="font-family: "georgia";"><br /></span>
<span style="font-family: "georgia";">I did wonder if I’d ever get bored of seeing my words appear in print and so far I haven’t; it still feels just that little bit more special to have to get your content past some reviewers, something you don’t have with your own blog. Being author <em>and </em>editor for my blog was something I called out as a big plus in my first anniversary post, “<a href="https://chrisoldwood.blogspot.com/2010/04/happy-birthday-blog.html">Happy Birthday, Blog</a>”. </span><br />
<span style="font-family: "georgia";"><br /></span>
<span style="font-family: "georgia";">Many of us programmers aren’t as blessed in the confidence department as people in some other disciplines so we often have to find other ways to give ourselves that little boost every now and then. The blog wins out here as you can usually see some metrics and even occasionally the odd link back from other people’s blogs or Stack Overflow, which is a nice surprise. (Metrics only tell you someone downloaded the page, whereas a link back is a good indication they actually read it too :o). They may also have agreed, which would be even more satisfying!)</span><br />
<span style="font-family: "georgia";"><br /></span>
<span style="font-family: "georgia";">While we’re on the subject of “vanity” metrics I’ve remained fairly steadfast and ignored them. I did include a monthly “page views” counter on the sidebar just to make sure that it hadn’t got lost in the ether, search-engine wise. It’s never been easy searching for my own content; I usually have to add “<span style="font-family: "courier new";">site:chrisoldwood.blogspot.com</span>” into the query, but it’s not <em>that</em> big an issue as first and foremost it’s notes for myself; other readers are always a bonus. For a long time my posts about <a href="http://chrisoldwood.blogspot.com/2011/05/powershell-throwing-exceptions-exit.html">PowerShell exit codes (2011)</a> and <a href="https://chrisoldwood.blogspot.com/2010/03/cleaning-up-svnmergeinfo-droppings.html">Subversion mergeinfo records (2010)</a> held the top spots but for some totally unknown reason my slightly ranty post around <a href="http://chrisoldwood.blogspot.com/2016/05/the-curse-of-ntlm-based-http-proxies.html">NTLM HTTP proxies (2016)</a> is now dominating and will likely take over the top spot. Given there are no links to it (that I can find) I can only imagine it turns up in search engine queries and it’s not what people are really looking for. Sorry about that! Maybe there are devs and sysadmins out there looking for NTLM HTTP proxy therapy and this page is it? :o) Anyway, here are the top posts as of today:</span><br />
<span style="font-family: "georgia";"><br /></span>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg5Y4fSL8qEwBUQL-L_P_bKqe8pg1vmPTRLh1FdXh7PKUbWmwXAWdYt9o4BBSlGluQRkucQy4KIhJ3Uq4ZXb_x6IZejkpYBLt9AQz-5cyzvomwP8TNfd1l3noIO-y3U_AYC8ezlgX_77Yc/s1600/TopPosts.GIF" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="520" data-original-width="947" height="175" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg5Y4fSL8qEwBUQL-L_P_bKqe8pg1vmPTRLh1FdXh7PKUbWmwXAWdYt9o4BBSlGluQRkucQy4KIhJ3Uq4ZXb_x6IZejkpYBLt9AQz-5cyzvomwP8TNfd1l3noIO-y3U_AYC8ezlgX_77Yc/s320/TopPosts.GIF" width="320" /></a></div>
<span style="font-family: "georgia";"><br /></span>
<span style="font-family: "georgia";">Somewhat amusingly the stats graph on <a href="https://chrisoldwood.blogspot.com/2014/11/my-200th-blog-post.html">my 200th blog post</a> shows a sudden meteoric rise in page views. Was I suddenly propelled to stardom? </span><span style="font-family: "georgia";">Of course not. It just so happened that my most recent post at the time got some extra views after the link was retweeted by a few people whose follower count is measured in the thousands. It happened again a couple of years later, but in between it’s sat at around 4,500 views / month from what I can tell.</span><br />
<span style="font-family: "georgia";"><br /></span>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhfii5vpkztL0ItPj5Lg81T-mNxe8wI_xSFggOG8WmeTUZIfczIE82CBJklgOZUGlv44YxMvh2pwlq434tFXRSKUTskREi1Gdu9vAL5lT-xCTiquOs9Wzaoc17QVsyx2K6B7MlvfDTrRVM/s1600/Blog-Stats.GIF" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="205" data-original-width="359" height="182" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhfii5vpkztL0ItPj5Lg81T-mNxe8wI_xSFggOG8WmeTUZIfczIE82CBJklgOZUGlv44YxMvh2pwlq434tFXRSKUTskREi1Gdu9vAL5lT-xCTiquOs9Wzaoc17QVsyx2K6B7MlvfDTrRVM/s320/Blog-Stats.GIF" width="320" /></a></div>
<span style="font-family: "georgia";"><br /></span>
<span style="font-family: "georgia";"><br /></span>
<span style="font-family: "georgia";">The 1 million views mark is still some way off, probably another 2.5 years, unless I manage to write something incredibly profound before then. (I won’t hold my breath though as 10 years of sample data must be statistically valid and it hasn’t happened so far.)</span><br />
<span style="font-family: "georgia";"><strong><br /></strong></span>
<span style="font-family: "georgia";"><strong>The Future</strong></span><br />
<span style="font-family: "georgia";"><br /></span>
<span style="font-family: "georgia";">So, what for the future? Hopefully I’m going to keep plodding along with both my blog and any other outlets that will accept my written word. I have 113 topics in my blog drafts folder so I’m not out of ideas just yet. Naturally many of those should probably be junked as my opinion has undoubtedly changed in the meantime, although that in itself is something to write about which is why I can’t bring myself to bin them just yet – there is still value there, somewhere.</span><br />
<span style="font-family: "georgia";"><br /></span>
<span style="font-family: "georgia";">Two things I have realised I’ve missed, due to spending more time writing, are reading books (both technical and fiction) and writing code outside of work, i.e. <a href="https://github.com/chrisoldwood">my free tools</a>. However, while I’ve sorely missed both of these pursuits I have in no way regretted spending more time writing as software development is all about communication and therefore it was a skill that I felt I definitely needed to improve. My time can hardly be considered wasted.</span><br />
<span style="font-family: "georgia";"><br /></span>
<span style="font-family: "georgia";">Now that I feel I’ve reached an acceptable level of competency in my technical writing I’m left wondering whether I’m comfortable sticking with that or whether I should try and be more adventurous. Books like <a href="https://www.amazon.co.uk/Goal-Process-Ongoing-Improvement/dp/0566086654">The Goal</a> show that technical subjects can be presented in more entertaining ways and I’m well aware that my writing is still far too dry. My suspicion is that I need to get back to reading more fiction, and with a more critical eye, before I’ll truly feel confident enough to branch out more regularly into other styles [3].</span><br />
<span style="font-family: "georgia";"><br /></span>
<span style="font-family: "georgia";">Where I signed off my 200th post with a genuine expectation that I’d be back again for my 300th I’m less sure about the future. Not that I’ll have given up writing, more that I’m less sure <em>this blog</em> will continue to be the place where I express myself most. Here’s to the next 100 posts.</span><br />
<span style="font-family: "georgia";"></span><br />
<br />
<span style="font-family: "georgia";">[1] I wrote a few <a href="http://www.chrisoldwood.com/articles.htm#accu-london-reviews">reviews of branch meetings</a> and <a href="http://www.chrisoldwood.com/articles.htm#book-reviews">book reviews</a> before then, but that didn’t feel quite the same <em>to me</em> as writing about technical aspects of the craft itself. The latter felt like you were exposing more of your <em>own</em> thoughts rather than “simply” recording the opinions of others.</span><br />
<span style="font-family: "georgia";"><br /></span>
<span style="font-family: "georgia";">[2] See “<a href="https://chrisoldwood.blogspot.com/2015/11/missing-daily-commute-by-train.html">Missing the Daily Commute by Train</a>” about why my volume of writing is highly correlated with where I’m working at the time.</span><br />
<span style="font-family: "georgia";"><br /></span>
<span style="font-family: "georgia";">[3] To date my efforts to be more adventurous have been limited to my Afterwood left-pad spoof “<a href="http://www.chrisoldwood.com/articles/afterwood-knocked-for-six.html">Knocked for Six</a>” and the short poem “<a href="http://www.chrisoldwood.com/articles/afterwood-risk-a-verse.html">Risk-a-Verse</a>”.</span><br />
<span style="font-family: "georgia";"></span><br />Chris Oldwoodhttp://www.blogger.com/profile/18183909440298909448noreply@blogger.com0tag:blogger.com,1999:blog-6628985022531866193.post-75080832407437870372020-01-22T19:05:00.001+00:002020-01-22T19:05:54.928+00:00Cargo Culting GitFlow<p><font face="Georgia">A few years back I got to spend a couple of weeks consulting at a small company involved in the production of smart cards. My team had been brought in by the company’s management to cast our critical eye over their software development process and provide a report on what we found along with any recommendations on how it could be improved.</font></p><p><font face="Georgia">The company only had a few developers and while the hardware side of the business seemed to be running pretty smoothly the software side was seriously lacking. To give you some indication of how bad things used to be, they weren’t even using version control for their source code. Effectively when a new customer came on board they would find the most recent and relevant existing customer’s version (stored in a .zip file), copy their version of the system, and then start hacking out a new one just for the new customer.</font></p><p><font face="Georgia">As you can imagine in a set-up like this, if a bug was found it would need to be fixed in <em>every</em> version and therefore it only got fixed if a customer noticed and reported it. This led to more divergence. Also as the software usually went in a kiosk the hardware and OS out in the wild was often <em>ancient</em> (Windows 2000 in some cases) [1].</font></p><p><font face="Georgia">When I say “how bad things <em>used to be</em>” this was some months before we started <em>our</em> investigation. The company had already brought in a previous consultant to do an “Agile Transformation” and they had recognised these issues and made a number of very sensible recommendations, like introducing version control, automated builds, unit testing, more collaboration with the business, etc.</font></p><p><font face="Georgia">However, we didn’t think they looked too hard at the way the team were actually working and only addressed the low hanging fruit by using whatever they found in their copy of The Agile Transformation Playbook™, e.g. Scrum. Naturally we weren’t there at the time but through the course of our conversations with the team it became apparent that a cookie-cutter approach had been prescribed despite it being (in our opinion) far too heavyweight for the handful of people in the team.</font></p><p><font face="Georgia">As the title of this post suggests, the one choice I found particularly amusing was the introduction of VSTS (Visual Studio Team Services; rebranded <a href="https://azure.microsoft.com/en-us/services/devops/">Azure DevOps</a>) and a <a href="https://nvie.com/posts/a-successful-git-branching-model/">GitFlow</a> style workflow for the development team. While I applaud the introduction of version control and isolated, repeatable builds to the company, this feels like another heavyweight choice. The fact that they were already using Visual Studio and writing their web service in C# probably means it’s not <em>that </em>surprising a choice if you wanted to pick a Big Iron product.</font></p><p><font face="Georgia">The real kicker though was the choice of a GitFlow style workflow for the new product team where there were <em>only two developers</em> – one for the front-end and another for the back-end. 
They were using feature branches and pull requests despite the fact that they were <em>the only people</em> working in their codebase. While the company might have hired another developer at some point in the future they had no immediate plans to grow the team to any significant size [2] so there would never be any merge conflicts to resolve in the short to medium term! Their project was a greenfield one to create a configurable product instead of the many one-offs to date, so they had no regressions to worry about at this point either – it was all about learning and building a prototype.</font></p><p><font face="Georgia">It’s entirely possible the previous consultant was working on different information to us but there was nothing in our conversations with the team or management that suggested they previously had different goals to what they were asking from us now. Sadly this is all too common an occurrence – a company hires an agile coach or consultant who may know how to handle the transformation from the business end [3] but they don’t <em>really</em> know the technical side. Adopting an agile mindset requires the <a href="https://en.wikipedia.org/wiki/Extreme_programming_practices">XP technical practices</a> too in order to be successful and so, unless the transformation team really knows its development onions, the practices are going to be rolled out and applied with a cargo cult mentality instead of being taught in a way that the team understands which practices are most pertinent to their situation and why.</font></p><p><font face="Georgia">In contrast, the plan we put forward was to strip out much of the fat and focus on making it easy to develop something which could be easily demo’d to the stakeholders for rapid feedback. We also proposed putting someone who was “more developer than scrum master” into the team for a short period so they could really grok the XP practices and see why they matter. (This was something I personally pushed quite hard for because I’ve seen how this has played out before when you’re not hands-on, see “<a href="http://chrisoldwood.blogspot.com/2015/12/the-importance-of-leading-by-example.html">The Importance of Leading by Example</a>”.)</font></p><p><font face="Georgia"> </font></p><p><font face="Georgia">[1] Luckily these kiosks weren’t connected to a network; upgrades were a site visit with a USB stick.</font></p><p><font face="Georgia">[2] Sadly there were cultural reasons for this – a topic for another day.</font></p><p><font face="Georgia">[3] This is debatable but I’m trying to be generous here as my expertise is mostly on the technical side of the fence.</font></p>Chris Oldwoodhttp://www.blogger.com/profile/18183909440298909448noreply@blogger.com0tag:blogger.com,1999:blog-6628985022531866193.post-42569418807206018832019-12-18T10:39:00.001+00:002019-12-18T10:39:45.915+00:00Branching 0 – Git 1<p><font face="Georgia">My recent tirade against unnecessary branching – “<a href="https://chrisoldwood.blogspot.com/2019/12/git-is-not-problem.html">Git is Not the Problem</a>” – might have given the impression that I don’t appreciate the power that git provides. 
That’s not true and hopefully the following example highlights the appreciation I have for the power git provides but also why I dislike being put in that position in the first place.</font></p><p><font face="Georgia"><strong>The Branching Strategy</strong></font></p><p><font face="Georgia">I was working in a small team with a handful of experienced developers making an old C++/ATL based GUI more accessible for users with disabilities. Given the codebase was <em>very </em>mature and maintenance was minimal, our remit only extended so far as making the minimal changes we needed to both the code and resource files. Hence this effectively meant no refactoring – a strictly surgical approach.</font></p><p><font face="Georgia">The set-up involved an integration branch per-project with us on one and the client’s team on another – <font face="Courier New">master</font> was reserved for releases. However, as they were using Stash for their repos they also wanted us to make use of its ability to create separate pull requests (PR) for every feature. This meant we needed to create independent branches for every single feature as we didn’t have permission to push directly to the integration branch even if we wanted to.</font></p><p><font face="Georgia"><strong>The Bottleneck</strong></font></p><p><font face="Georgia">For those who haven’t had the pleasure of working with Visual Studio and C++/ATL on a native GUI with other people, there are certain files which tend to be a bottleneck, most notably <font face="Courier New">resource.h</font>. This file contains the mapping for the symbols (nay <font face="Courier New">#define</font>s) to the resource file IDs. Whenever you add a new resource, such as a localizable string, you add a new symbol and bump the two “next ID” counters at the bottom. This project ended up with us adding <em>a lot</em> of new resource strings for the various (localizable) annotations we used to make the various dialog controls more accessible [1].</font></p><p><font face="Georgia">Aside from the more obvious bottleneck this <font face="Courier New">resource.h</font> file creates, in terms of editing it in a team scenario, it also has one other undesirable effect – project rebuilds. Being a header file, and also one that has a habit of being used across most of the codebase (whether intentionally or not) if it changes then most of the codebase needs re-building. On a GUI of the size we were working on, using the development VMs we had been provided, this amounted to 45 minutes of thumb twiddling every time it changed. As an aside we couldn’t use the built-in Visual Studio editor either as the file had been edited by hand for so long that when it was saved by the editor you ended up with the diff from hell [2].</font></p><p><font face="Georgia"><strong>The Side-Effects</strong></font></p><p><font face="Georgia">Consequently we ran into two big problems working on this codebase that were essentially linked to that one file. The first was that adding new resources meant updating the file in a way that was undoubtedly going to generate a merge conflict with every other branch because most tasks meant adding new resources. 
Even though we tried to coordinate ourselves by introducing padding into the file and artificially bumping the IDs we still ended up causing merge conflicts most of the time.</font></p><p><font face="Georgia">In hindsight we probably could have made this idea work if we added a <em>huge</em> amount of padding up front and reserved a large range of IDs but we knew there was another team adding GUI stuff on another branch and we expected to integrate with them more often than we did. (We had no real contact with them and the plethora of open branches made it difficult to see what code they were touching.)</font></p><p><font face="Georgia">The second issue was around the rebuilds. While you can <font face="Courier New">git checkout -b <branch></font> to create your feature branch without touching <font face="Courier New">resource.h</font> again, the moment you <font face="Courier New">git pull</font> the integration branch and merge you’re going to have to take the hit [3]. Once your changes are integrated and you push your feature branch to the git server it does the integration branch merge for you and moves it forward.</font></p><p><font face="Georgia">Back on your own machine you want to re-sync by switching back to the integration branch, which I’d normally do with:</font></p><p><font face="Courier New">> git checkout <branch><br>> git pull --ff-only</font></p><p><font face="Georgia">…except the first step restores the <em>old</em> <font face="Courier New">resource.h</font> before updating it again in the second step back to where you just were! So now we’ve got another 45 minute rebuild on our hands [3].</font></p><p><font face="Georgia"><strong>Git to the Rescue</strong></font></p><p><font face="Georgia">It had been some years since any of us had used Visual Studio on such a large GUI and therefore it took us a while to work out why the codebase always seemed to want rebuilding so much. Consequently I looked to the Internet to see if there was a way of going from my feature branch back to the integration branch (which should be identical from a working copy perspective) without any files being touched. It’s git, of course there was a way, and “<a href="https://samuelgruetter.net/blog/2018/08/31/git-ffwd-without-checkout/">Fast-forwarding a branch without checking it out</a>” provided the answer [4]:</font></p><p><font face="Courier New">> git fetch origin <branch>:<branch><br>> git checkout <branch></font></p><p><font face="Georgia">The trick is to fetch the branch changes from upstream and point the local copy of that branch to its tip. Then, when you do checkout, only the branch metadata needs to change as the versions of the files are identical and nothing gets touched (assuming no other upstream changes have occurred in the meantime).</font></p><p><font face="Georgia"><strong>Discontinuous Integration</strong></font></p><p><font face="Georgia">In a modern software development world where we strive to integrate as frequently as possible with our colleagues it’s issues like these that remind us what some of the barriers are for some teams. Visual C++ has been around a long time (since 1993) so this problem is not new. It is possible to break up a GUI project – it doesn’t need to have a monolithic resource file – but that requires time & effort to fix and needs to be done in a timely fashion to reap the rewards. 
In a product this old which is effectively on life-support this is never going to happen now.</font></p><p><font face="Georgia">As Gerry Weinberg once said “Things are the way they are because they got that way” which is little consolation when the clock is ticking and you’re watching the codebase compile, again.<br></font></p><p><font face="Georgia"> </font></p><p><font face="Georgia">[1] I hope to write up more on this later as the information around this whole area for native apps was pretty sparse and hugely diluted by the same information for web apps.</font></p><p><font face="Georgia">[2] Luckily it’s a fairly easy format but laying out controls by calculating every window rectangle is pretty tedious. We eventually took a hybrid approach for more complex dialogs where we used the resource editor first, saved our code snippet, reverted all changes, and then manually pasted our snippet back in thereby keeping the diff minimal.</font></p><p><font face="Georgia">[3] Yes, you can use <font face="Courier New">touch</font> to tweak the file’s timestamp but you need to be sure you can get away with that by working out what the effects might be.</font></p><p><font face="Georgia">[4] As with any “googling” knowing what the right terms are, to ask the right question, is the majority of the battle.</font></p>Chris Oldwoodhttp://www.blogger.com/profile/18183909440298909448noreply@blogger.com0tag:blogger.com,1999:blog-6628985022531866193.post-35405248015788892482019-12-16T21:30:00.001+00:002019-12-16T21:30:23.875+00:00Git is Not the Problem<p><font face="Georgia">Git comes in for a lot of stick for being a complicated tool that’s hard to learn, and they’re right, git <em>is</em> a complicated tool. But it’s a tool designed to solve a difficult problem – many disparate people collaborating on a single product in a totally decentralized fashion. However, many of us don’t need to work that way, so why are we using the tool in a way that makes our lives more difficult?</font></p><p><strong><font face="Georgia">KISS</font></strong></p><p><font face="Georgia">For my entire professional programming career, which now spans over 25 years, and my personal endeavours, I have used a version control tool (VCS) to manage the source code. In that time, for the most part, I have worked in a trunk-based development fashion [1]. That means all development goes on in one integration branch and the general philosophy for <em>every</em> commit is “<a href="https://wiki.c2.com/?AlwaysBeReadyToShip">always be ready to ship</a>” [2]. 
As you might guess <a href="https://chrisoldwood.blogspot.com/2013/10/codebase-stability-with-feature-toggles.html">feature toggles</a> (in many different guises) play a significant part in achieving that.</font></p><p><font face="Georgia">A consequence of this simplistic way of working is that my development cycle, and therefore my use of git, boils down to these few steps [3]:</font></p><ul><li><div><font face="Georgia">clone</font></div></li><li><div><em><font face="Georgia">edit / build / test</font></em></div></li><li><div><font face="Georgia">diff</font></div></li><li><div><font face="Georgia">add / commit</font></div></li><li><div><font face="Georgia">pull</font></div></li><li><div><font face="Georgia">push</font></div></li></ul><p><font face="Georgia">There may occasionally be a short inner loop where a merge conflict shows up during the pull (integration) phase which causes me to go through the edit / diff / commit cycle again, but by-and-large conflicts are rare due to close collaboration and very short change cycles. Ultimately though, from the gazillions of commands that git supports, I <em>mostly</em> use just those 6. As you can probably guess, despite using git for nearly 7 years, I actually know very little about it (command wise). [4]</font></p><p><strong><font face="Georgia">Isolation</font></strong></p><p><font face="Georgia">Where I see people getting into trouble and subsequently venting their anger is when branches are involved. This is not a problem which is specific to git though, you see this crop up with any VCS that supports branches whether it be ClearCase, Perforce, Subversion, etc. Hence, <em>the tool is not the problem, the workflow is</em>. And that commonly stems from a delivery process mandated by the organisation, meaning that ultimately the issue is one of an organisational nature, not the tooling per se.</font></p><p><font face="Georgia">An organisation which seeks to reduce risk by isolating work (and by extension its people) onto branches is increasing the delay in feedback thereby paradoxically increasing the risk of integration, or so-called “<a href="http://www.chrisoldwood.com/articles/branching-strategies.html">merge debt</a>”. A natural side-effect of making it harder to push through changes is that people will start batching up work in an attempt to boost “efficiency”. The trick is to go in the opposite direction and break things down into smaller units of work that are easier to produce and quicker to improve. Balancing production code changes with a solid investment in test coverage and automation reduces that risk further along with collaboration boosting techniques like <a href="http://www.chrisoldwood.com/articles/to-mob-pair-or-fly-solo.html">pair and mob programming</a>.</font></p><p><strong><font face="Georgia">Less is More</font></strong></p><p><font face="Georgia">Instead of enforcing a complicated workflow and employing complex tools in the hope that we can remain in control of our process we should instead seek to keep the workflow simple so that our tools remain easy to use. Git was written to solve a problem most teams don’t have as they have neither the volume of distributed people nor the complexity of product to deal with. 
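To put the simplicity argument in concrete terms, the day-to-day cycle listed above rarely amounts to more than a handful of commands. The following is only a sketch of a typical change, with the repository URL and commit message invented for illustration:</font></p><p><font face="Courier New">> git clone https://example.com/acme/product.git<br>> cd product<br># edit / build / test<br>> git diff<br>> git add --all<br>> git commit -m "Add widget export behind a feature toggle"<br>> git pull<br>> git push</font></p><p><font face="Georgia">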
Organisations that do have complex codebases cannot expect to dig themselves out of their hole simply by introducing a more powerful version control tool, it will only increase the cost of delay while bringing a false sense of security as programmers work in the dark for longer.</font></p><p><font face="Georgia"> </font></p><p><font face="Georgia">[1] My “<a href="http://www.chrisoldwood.com/articles/branching-strategies.html">Branching Strategies</a>” article in ACCU’s Overload covers this topic if you’re looking for a summary.</font></p><p><font face="Georgia">[2] This does not preclude the use of private branches for spikes and/or release branches for hotfix engineering when absolutely needed. #NoAbsolutes.</font></p><p><font face="Georgia">[3] See “<a href="http://www.chrisoldwood.com/articles/in-the-toolbox-commit-checklist.html">In The Toolbox - Commit Checklist</a>” for some deeper discussion about what goes through <em>my</em> head during the diff / commit phase.</font></p><p><font face="Georgia">[4] I pondered including “log” in the list for when doing a spot of <a href="http://www.chrisoldwood.com/articles/in-the-toolbox-software-archaeology.html">software archaeology</a> but that is becoming much rarer these days. I also only use “fetch” when I have to work with feature branches.</font></p><p><font face="Georgia"></font></p>Chris Oldwoodhttp://www.blogger.com/profile/18183909440298909448noreply@blogger.com0tag:blogger.com,1999:blog-6628985022531866193.post-5488699835170836232019-12-13T21:26:00.001+00:002019-12-13T21:26:34.643+00:00Choosing “a” Database, not “the” Database<p><font face="Georgia">One thing I’ve run across a few times over the years is the notion that an application or system has one, and only one, database product. It’s as if the answer to the question about where we should store our data must be about where we store “all” our data.</font></p><p><strong><font face="Georgia">Horses for Courses</font></strong></p><p><font face="Georgia">I’ve actually touched on this topic before in “<a href="https://chrisoldwood.blogspot.com/2014/09/deferring-database-choice.html">Deferring the Database Choice</a>” where our team tried to put off the question as long as possible because of a previous myopic mindset and there was a really strong possibility that we might even have a need for two different styles of database – relational and document-oriented – because we had two different types of data to store with very different constraints.</font></p><p><font face="Georgia">In that instance, after eventually working out what we really needed, we decided to look at a traditional relational database for the transactional data [1], while we looked towards the blossoming NoSQL crowd for the higher-volume non-transactional data. 
While one <em>might</em> have sufficed for both purposes the organisational structure and lack of operational experience at the time meant we didn’t feel comfortable putting all our eggs in that one NoSQL basket up front.</font></p><p><font face="Georgia">As an aside the Solution Architect [2] who was assigned to our team by the client definitely seemed out of their comfort zone with the notion that we might want to use different products for different purposes.</font></p><p><strong><font face="Georgia">Platform Investment</font></strong></p><p><font face="Georgia">My more recent example of this line of reasoning around “the one size fits all” misnomer was while doing some consulting at a firm in the insurance sector, an area where mainframes and legacy systems pervade the landscape.</font></p><p><font face="Georgia">In this particular case I had been asked to help advise on the architecture of a few new internal services they were planning. Two were really just caches of upstream data designed to reduce the per-cost call of 3rd party services while the third would serve up flood related data which was due to be incorporated into insurance pricing.</font></p><p><font face="Georgia">To me they all seemed like no-brainers. Even the flood data service just felt like it was probably a simple web service (maybe REST) that looks up the data in a document oriented database based on the postcode key. The volume of requests and size of the dataset did not seem remarkable in any way, nor the other caches. The only thing that I felt deserved any real thought was around the versioning of the data, if that was even a genuine consideration. (I was mostly trying to think of <em>any</em> potential risks that might vaguely add to the apparent lack of complexity.)</font></p><p><font face="Georgia">Given the company already called out from its mainframe to other web services they had built, this was a solved problem, and therefore I felt there was no reason not to start knocking up the flood data service which, given its simplicity, could be done outside-in so that they’d have their first microservice built TDD-style (an approach they wanted to try out anyway). They could even plug it in pretty quickly and just ignore the responses back to the mainframe in the short term so that they could start getting a feel for the operational aspects. In essence it seemed the perfect learning opportunity for many new skills within the department.</font></p><p><strong><font face="Georgia">An Undercurrent</font></strong></p><p><font face="Georgia">However, while <em>I</em> saw this as a low-risk venture there were questions from further up effectively about choosing the database. I suspected there were concerns about the cost but some rudimentary calculations based around a three-node cluster with redundant disks versus storage for the mainframe showed that they weren’t even in the same ballpark and we’re not even talking SSDs here either. (This also ignores the fact that they were close to maxing out the mainframe anyway.)</font></p><p><font face="Georgia">One of the great things about databases in these modern times is that you can download the binaries and just fire one up and get playing. 
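For instance, assuming Docker is available, standing up a throwaway document store and poking a made-up record into it takes only a couple of commands. This is purely a hypothetical sketch using MongoDB; the collection name and fields are invented:</font></p><p><font face="Courier New">$ docker run -d --name flood-db -p 27017:27017 mongo:4.0<br>$ docker exec flood-db mongo --quiet --eval 'db.flood.insertOne({ postcode: "SW1A 1AA", riskBand: 3 })'<br>$ docker exec flood-db mongo --quiet --eval 'db.flood.findOne({ postcode: "SW1A 1AA" })'</font></p><p><font face="Georgia">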
Given the dataset fitted the document-oriented paradigm and there were no transactions to speak of I suggested they pick either MongoDB or Couchbase and just get started as it was the paradigm they most needed to get acquainted with, the specific vendor (to me) was less of a concern in the shorter term as the data model was simple.</font></p><p><font face="Georgia">Nevertheless, rather than build something first and get a feel for what makes most sense, they wanted to invite the various big NoSQL vendors in and discuss contracts and products up-front. So I arranged for the three main contenders at the time to visit the company’s offices and give a pitch, followed by some Q&A time for the management to ask any burning questions. It was during the first of these three pitches that I began to realise where the disconnect lay between my vision and theirs.</font></p><p><font face="Georgia">While I had always been working on the assumption that the company was most comfortable with mainframes and relational databases and that they wanted to step outside that and move to a less monolithic architecture, perhaps using the <a href="https://martinfowler.com/bliki/StranglerFigApplication.html">Strangler Pattern</a> to break out the peripheral services into independent self-contained ones, they still saw a single database product sitting at the heart. Yes, the services might be built separately, and the data may well be partitioned via namespaces or collections or whatever, but fundamentally the assumption was that the data storage was still effectively monolithic.</font></p><p><strong><font face="Georgia">A False Economy</font></strong></p><p><font face="Georgia">In retrospect I shouldn’t really have been that surprised. The reason the mainframe had probably survived for so long was that the data was seen as the crown jewels and the problems of redundancy and backup had been solved long ago and were pretty robust. In fact if anything went wrong the vendor could helicopter some experts in (which they had done in the past). This was not the level of service offered by the new kids on the block and the company was still far from getting comfortable with cloud hosting and managed service providers which were are starting to spring up.</font></p><p><font face="Georgia">Hence, where I was looking at the somewhat disposable nature of the <em>new</em> services purely as an opportunity for learning, others higher up were looking at it as a stepping stone to moving <em>all</em> their data across to another platform. Coupled with this was the old-fashioned view that the decision needed to be made up-front and needed to be the right one from the off [3].</font></p><p><strong><font face="Georgia">A Different Investment</font></strong></p><p><font face="Georgia">Even with this misconception acknowledged and the shining cost savings to be had there was still a heavy reluctance to go with something new. 
I believe that in the end they put their investment into more mainframe storage instead of investing in their people and the organisation’s longer term future.</font></p><p><font face="Georgia"> </font></p><p><font face="Georgia">[1] There was definitely an element of “<a href="https://en.wikipedia.org/wiki/Availability_heuristic">availability bias</a>” here as the organisation had a volume licensing agreement with a relational database vendor.</font></p><p><font face="Georgia">[2] A role which highlighted their <a href="https://en.wikipedia.org/wiki/Ivory_tower">Ivory Tower</a> approach at the time but has since fallen away as architecture has thankfully started leaning more towards shared ownership.</font></p><p><font face="Georgia">[3] Some of the impetus for “<a href="https://chrisoldwood.blogspot.com/2015/11/dont-fail-fast-learn-cheaply.html">Don’t Fail Fast, Learn Cheaply</a>” came from conversations I had with this organisation about their approach to career development.</font></p><p><font face="Georgia"></font></p>Chris Oldwoodhttp://www.blogger.com/profile/18183909440298909448noreply@blogger.com0tag:blogger.com,1999:blog-6628985022531866193.post-26766031644449072962019-12-09T22:21:00.001+00:002019-12-09T22:21:04.438+00:00Automating Windows VM Creation on Ubuntu<p><em><font face="Georgia">TL;DR you can find my resulting Oz and Packer configuration files in <a href="https://gist.github.com/chrisoldwood/39e454f8bbce5858d8d634d12f500082">this Oz gist</a> and <a href="https://gist.github.com/chrisoldwood/aeec1e6876dadcc407109896d8d8aac7">this Packer gist</a> on my GitHub account.</font></em></p><p><font face="Georgia">As someone who has worked almost exclusively on Windows for the last 25 years I was somewhat surprised to find myself needing to create Windows VMs on Linux. Ultimately these were to be build server agents and therefore I needed to automate everything from creating the VM image, to installing Windows, and eventually the build toolchain. This post looks at the first two aspects of this process.</font></p><p><font face="Georgia">I did have a little prior experience with <a href="https://packer.io/">Packer</a>, but that was on AWS where the base AMIs you’re provided have already got you over the initial OS install hurdle and you can focus on baking in your chosen toolchain and application. This time I was working on-premise and so needed to unpick the Linux virtualization world too.</font></p><p><font face="Georgia">In the end I managed to get two approaches working – Oz and Packer – on the Ubuntu 18.04 machine I was using. (You may find these instructions useful for other distributions but I have no idea how portable this information is.)</font></p><p><strong><font face="Georgia">QEMU/KVM/libvirt</font></strong></p><p><font face="Georgia">On the Windows-as-host side (until fairly recently) virtualization boiled down to a few classic options, such as <a href="https://en.wikipedia.org/wiki/Hyper-V">Hyper-V</a> and <a href="https://www.virtualbox.org/">Virtual Box</a>. 
The addition of Docker-style <a href="https://www.docker.com/products/windows-containers">Windows containers</a>, along with <a href="https://blogs.msdn.microsoft.com/msgulfcommunity/2015/06/20/what-is-windows-server-containers-and-hyper-v-containers/">Hyper-V containers</a> has padded things out a bit more but to me it’s still fairly manageable.</font></p><p><font face="Georgia">In contrast on the Linux front, where this technology has been maturing for much longer, we have far more choice, and ultimately, for a Linux n00b like me [1], this means far more noise to wade through on top of the usual “which distribution are you running” type questions. In particular the fact that any documentation on “virtualization” could be referring to containers or hypervisors (or something in-between), when you’re only concerned with <em>hypervisors</em> for running Windows VMs, doesn’t exactly aid comprehension.</font></p><p><font face="Georgia">Luckily I was pointed towards <a href="https://en.wikipedia.org/wiki/Kernel-based_Virtual_Machine">KVM</a> as a good starting point on the Linux hypervisor front. <a href="https://www.qemu.org/">QEMU</a> is one of those minor distractions as it <em>can</em> provide full emulation, but it also provides the other bit KVM needs to be useful in practice – device emulation. (If you’re feeling nostalgic you can fire up an MS-DOS recovery boot-disk from “<a href="https://www.allbootdisks.com/">All Boot Disks</a>” under QMEU/KVM with minimal effort which gives you a quick sense of achievement.)</font></p><p><font face="Georgia">What I also found mentioned in the same breath as these two was a virtualization “add-on layer” called <a href="https://libvirt.org/">libvirt</a><em></em> which provides a layer on top of the underlying technology so that you can use more technology agnostic tools. Confusingly you might notice that Packer doesn’t mention libvirt, presumably because it already has providers that work directly with the lower layer.</font></p><p><font face="Georgia">In summary, using <font face="Courier New">apt</font>, we can install this lot with:</font></p><p><font face="Courier New">$ sudo apt install qemu qemu-kvm libvirt-bin bridge-utils virt-manager -y</font></p><p><strong><font face="Georgia">Windows ISO & Product Key</font></strong></p><p><font face="Georgia">We’re going to need a Windows ISO along with a related product key to make this work. While in the end you’ll need a proper license key I found the Windows 10 Evaluation Edition was perfect for experimentation as the VM only lasts for a few minutes before you bin it and start all over again.</font></p><p><font face="Georgia">You can download the latest Windows image from the <a href="https://www.microsoft.com/en-gb/software-download/windows10">MS downloads page</a> which, if you’ve configured your browser’s <font face="Courier New">User-Agent</font> string to appear to be from a non-Windows OS, will avoid <a href="https://www.howtogeek.com/427223/how-to-download-a-windows-10-iso-without-the-media-creation-tool/">all the sign-up nonsense</a>. 
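The same trick works from the command line if you want to script that step, e.g. with curl and a spoofed User-Agent string; this is only a sketch and you would need to substitute the URL the download page actually hands out:</font></p><p><font face="Courier New">$ curl -L -A "Mozilla/5.0 (X11; Linux x86_64)" -o win10-eval.iso "<iso-url-from-the-downloads-page>"</font></p><p><font face="Georgia">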
Alternatively <a href="https://www.google.com/search?q=care.dlservice.microsoft.com">google for “care.dlservice.microsoft.com”</a> and you’ll find plenty of public build scripts that have direct download URLs which are beneficial for automation.</font></p><p><font face="Georgia">Although the Windows 10 evaluation edition doesn’t need a specific license key you will need a product key to stick in the <font face="Courier New">autounattend.xml</font> file when we get to that point. Luckily you can easily get that from the <a href="https://docs.microsoft.com/en-us/windows-server/get-started/kmsclientkeys">MS KMS client keys page</a>.</font></p><p><strong><font face="Georgia">Windows Answer File</font></strong></p><p><font face="Georgia">By default Windows presents a GUI to configure the OS installation, but if you give it a special XML file known as <font face="Courier New">autounattend.xml</font> (in a special location, which we’ll get to later) all the configuration settings can go in there and the OS installation will be hands-free.</font></p><p><font face="Georgia">There is a <a href="https://docs.microsoft.com/en-us/previous-versions/windows/it-pro/windows-vista/cc722301(v=ws.10)?redirectedfrom=MSDN">specific Windows tool</a> you can use to generate this file, but an online version in the guise of the <a href="https://www.windowsafg.com/">Windows Answer File Generator</a> produced a working file with fairly minimal questions. You can also generate one for different versions of the Windows OS which is important as there are many examples that appear on the Internet but it feels like pot-luck as to whether it would work or not as the format changes slightly between releases and it’s not easy to discover where the impedance mismatch lies.</font></p><p><font face="Georgia">So, at this point we have our Linux hypervisor installed, and downloaded a Windows installation <font face="Courier New">.iso</font> along with a generated <font face="Courier New">autounattend.xml</font> file to drive the Windows install. Now we can get onto building the VM, which I managed to do with two different tools – Oz and Packer.</font></p><p><strong><font face="Georgia">Oz</font></strong></p><p><font face="Georgia">I was flicking through a copy of <a href="https://www.packtpub.com/gb/networking-and-servers/mastering-kvm-virtualization">Mastering KVM Virtualization</a><em></em> and it mentioned a tool called <a href="https://github.com/clalancette/oz">Oz</a> which was designed to make it easy to build a VM along with installing an OS. More importantly it listed having support for most Windows editions too! Plus it’s been around for a fairly long time so is relatively mature. You can install it with <font face="Courier New">apt</font>:</font></p><p><font face="Courier New">$ sudo apt install oz -y</font></p><p><font face="Georgia">To use it you create a simple configuration file (<font face="Courier New">.tdl</font>) with the basic VM details such as CPU count, memory, disk size, etc. 
along with the OS details, <font face="Courier New">.iso</font> filename, and product key (for Windows), and then run the tool:</font></p><p><font face="Courier New">$ oz-install -d2 -p windows.tdl -x windows.libvirt.xml</font></p><p><font face="Georgia">If everything goes according to plan you end up with a QEMU disk image and an <font face="Courier New">.xml</font> file for the VM (called a “domain”) that you can then register with libvirt:</font></p><p><font face="Courier New">$ virsh define windows.libvirt.xml</font></p><p><font face="Georgia">Finally you can start the VM via libvirt with:</font></p><p><font face="Courier New">$ virsh start windows-vm</font></p><p><font face="Georgia">I initially tried this with the Windows 8 RTM evaluation <font face="Courier New">.iso</font> and it worked right out of the box with the Oz built-in template! However, when it came to <em>Windows 10 </em>the Windows installer complained about there being no product key, despite the Windows 10 template having a placeholder for it and the key being defined in the <font face="Courier New">.tdl</font> configuration file.</font></p><p><font face="Georgia">It turns out, as you can see from <a href="https://github.com/clalancette/oz/issues/268">Issue #268</a> (which I raised in the Oz GitHub repo), that the Windows 10 template is broken. The <font face="Courier New">autounattend.xml</font> file also wants the key in the <font face="Courier New"><UserData></font> section too it seems. Luckily for me oz-install can accept a custom <font face="Courier New">autounattend.xml</font> file via the <font face="Courier New">-a</font> option as long as we fill in any details manually, like the <font face="Courier New"><AutoLogin></font> account username / password, product key, and machine name.</font></p><p><font face="Courier New">$ oz-install -d2 -p windows.tdl -x windows.libvirt.xml -a autounattend.xml</font></p><p><font face="Georgia">That Oz GitHub issue only contains my suggestions as to what I <em>think</em> needs fixing in the <font face="Courier New">autounattend.xml</font> file; I also have <a href="https://gist.github.com/chrisoldwood/39e454f8bbce5858d8d634d12f500082">a personal gist</a> on GitHub that contains both the <font face="Courier New">.tdl</font> and <font face="Courier New">.xml</font> files that I successfully used. (Hopefully I’ll get a chance to submit a formal PR at some point so we can get it properly fixed; it also needs a tweak to the Python code as well I believe.)</font></p><p><em><font face="Georgia">Note: while I managed to build the basic VM I didn’t try to do any post-processing, e.g. using WinRM to drive the installation of applications and tools from the outside.</font></em></p><p><strong><font face="Georgia">Packer</font></strong></p><p><font face="Georgia">I had originally put Packer to one side because of difficulties getting anything working under Hyper-V on Windows but with my new-found knowledge I decided to try again on Linux. What I hadn’t appreciated was quite how much Oz was actually doing for me under the covers.</font></p><p><font face="Georgia">If you use the <a href="https://www.packer.io/docs/builders/qemu.html">Packer documentation</a> [2] [3] and online examples you should happily get the disk image allocated and the VM to fire up in VNC and sit there waiting for you to configure the Windows install. However, after selecting your locale and keyboard you’ll probably find the disk partitioning step stumps you. 
Even if you follow some examples and put an <font face="Courier New">autounattend.xml</font> on a floppy drive you’ll still likely hit a <font face="Courier New"><DiskConfiguration></font> error during set-up. The reason is probably because you don’t have the right <em>Windows</em> driver available for it to talk to the underlying virtual disk device (unless you’re lucky enough to pick an IDE based example).</font></p><p><font face="Georgia">One of the really cool things Oz appears to do is handle this nonsense along with the <font face="Courier New">autounattend.xml</font> file which it also slips into the <font face="Courier New">.iso</font> that it builds on-the-fly. With Packer you have to be more aware and <a href="https://www.linux-kvm.org/page/WindowsGuestDrivers">fetch the drivers yourself</a> (which come as part of <a href="https://docs.fedoraproject.org/en-US/quick-docs/creating-windows-virtual-machines-using-virtio-drivers/index.html">another .iso</a><font face="Courier New"></font>) and then mount that explicitly as <em>another</em> CD-ROM drive by using the <font face="Courier New">qemuargs</font> section of the Packer builder config. (In my example it’s mapped as drive <font face="Courier New">E:</font> inside Windows.)</font></p><p><font face="Courier New">[ "-drive", "file=./virtio-win.iso,media=cdrom,index=3" ]</font></p><p><font face="Georgia">Luckily you can download the VirtIO drivers <font face="Courier New">.iso</font> from <a href="https://docs.fedoraproject.org/en-US/quick-docs/creating-windows-virtual-machines-using-virtio-drivers/index.html">a Fedora page</a> and stick it alongside the Windows <font face="Courier New">.iso</font>. That’s still not quite enough though, we also need to tell the Windows installer where our drivers are located; we do that with a special section in the <font face="Courier New">autounattend.xml file</font>.</font></p><p><font face="Courier New"><DriverPaths><br> <PathAndCredentials wcm:action="add" wcm:keyValue="1"><br> <Path>E:\NetKVM\w10\amd64\</Path></font><p><font face="Georgia">Finally, in case you’ve not already discovered it, the <font face="Courier New">autounattend.xml</font> file is presented by Packer to the Windows installer as a file in the root of a floppy drive. (The floppy drive and extra CD-ROM drives both fall away once Windows has bootstrapped itself.)</font></p><p><font face="Courier New">"floppy_files":<br>[<br> "autounattend.xml",</font><p><font face="Georgia">Once again, as mentioned right at the top, I have <a href="https://gist.github.com/chrisoldwood/aeec1e6876dadcc407109896d8d8aac7">a personal gist</a> on GitHub that contains the files I eventually got working.</font></p><p><font face="Georgia">With the QEMU/KVM image built we can then register it with libvirt by using <font face="Courier New">virt-install</font>. I thought the <font face="Courier New">--import </font>switch would be enough here as we now have a runnable image, but that option appears to be for a different scenario [4], instead we have to take two steps – generate the libvirt XML config file using the <font face="Courier New">--print-xml</font> option, and then apply it:</font></p><p><font face="Courier New">$ virt-install --vcpus ... --disk ... 
--print-xml > windows.libvert.xml<br>$ virsh define windows.libvert.xml</font></p><p><font face="Georgia">Once again you can start the finalised VM via libvirt with:</font></p><p><font face="Courier New">$ virsh start windows-vm</font></p><p><strong><font face="Georgia">Epilogue</font></strong></p><p><font face="Georgia">While having lots of documentation is generally A Good Thing™, when it’s spread out over a considerable time period it’s sometimes difficult to know if the information you’re reading still applies today. This is particularly true when looking at other people’s example configuration files alongside reading the docs. The long-winded route might still work but the tool might also do it automatically now if you just let it, which keeps your source files much simpler.</font><p><font face="Georgia">Since getting this working I’ve seen other examples which suggest I may have fallen foul of this myself and what I’ve written up may also still be overly complicated! Please feel free to use the comments section on this blog or <a href="https://gist.github.com/chrisoldwood">my gists</a> to inform any other travellers of your own wisdom in any of this.</font></p><p><p><font face="Georgia"><font face="Georgia"> </font></font></p><font face="Georgia"><font face="Georgia">[1] That’s not entirely true. I ran Linux on an <a href="http://www.anytux.org/hardware.php?system_id=695">Atari TT</a> and a circa v0.85 Linux kernel on a 386 PC in the early-to-mid ‘90s.</font></font><p><font face="Georgia">[2] The Packer docs can be misleading. For example it says the <a href="https://www.packer.io/docs/builders/qemu.html#disk_size"><font face="Courier New">disk_size</font> is in bytes</a> and you can use suffixes like <font face="Courier New">M</font> or <font face="Courier New">G</font> to simplify matters. Except they don’t work and the value is actually in <em>megabytes</em>. No wonder a value of <font face="Courier New">15,000,000,000</font> didn’t work either :o).</font></p><p><font face="Georgia">[3] Also be aware that the version of Packer available via apt is only <font face="Courier New">1.0.x</font> and you need to manually <a href="https://www.packer.io/downloads.html">download</a> the latest <font face="Courier New">1.4.x</font> version and unpack the <font face="Courier New">.zip</font>. (I initially thought the bug in [2] was down to a stale version but it’s not.)</font></p><p><font face="Georgia">[4] The <a href="http://manpages.ubuntu.com/manpages/cosmic/man1/virt-install.1.html">--import switch</a> still fires up the VM as it appears to assume you’re going to <em>add</em> to the current image, not that it <em>is</em> the final image.</font></p><p><font face="Georgia"><br></font></p>Chris Oldwoodhttp://www.blogger.com/profile/18183909440298909448noreply@blogger.com4tag:blogger.com,1999:blog-6628985022531866193.post-49725468065088573472019-11-18T20:31:00.001+00:002019-11-18T20:31:52.886+00:00Arbitrary Cache Timeouts<p><font face="Georgia">Like many other programmers I’ve probably added my fair share of caches to systems over the years, and as we know from the old joke, one of the two hardest problems in computer science is knowing when to invalidate them. It’s a hard question, to be sure, but a really annoying behaviour you can run into as a maintainer is when the invalidation <em>appears</em> to be done arbitrarily, usually by specifying some timeout seemingly plucked out of thin air and maybe even changed equally arbitrarily. 
(It may not be, but documenting such decisions is usually way down the list of important things to do.)</font></p><p><font face="Georgia"><strong>Invalidation</strong></font></p><p><font face="Georgia">If there is a need for a cache in production, and let’s face it that’s the usual driver, then any automatic invalidation is likely to be based on doing it as infrequently as possible to ensure the highest hit ratio. The problem is that that value can often be hard-coded and mask cache invalidation bugs because it rarely kicks in. The knee-jerk reaction to “things behaving weirdly” in production is to switch everything off-and-on again thereby implicitly invalidating any caches, but this doesn’t help us find those bugs.</font></p><p><font face="Georgia">The most recent impetus for this post was just such a bug which surfaced because the cache invalidation logic never ran in practice. The cache timeout was set arbitrarily large, which seemed odd, but I eventually discovered it was supposed to be irrelevant because the service hosting it should have been rebooted at midnight every day! Due to the password for the account used to run the reboot task expiring it never happened and the invalidated items then got upset when they were requested again. Instead of simply fetching the item from the upstream source and caching it again, the cache had some remnants of the stale items and failed the request instead. Being an infrequent code path it didn’t obviously ring any bells so took longer to diagnose.</font></p><p><strong><font face="Georgia">Design for Testability</font></strong></p><p><font face="Georgia">While it’s useful to avoid throwing away data unnecessarily in production we know that the live environment rarely needs the most flexibility when it comes to configuration (see “<a href="http://www.chrisoldwood.com/articles/testing-drives-the-need-for-flexible-configuration.html">Testing Drives the Need for Flexible Configuration</a>”). On the contrary, I’d expect to have any cache being cycled reasonably quickly in a test environment to try and flush out any issues as I’d expect more side-effects from cache misses than hits.</font></p><p><font face="Georgia">If you are writing any automated tests around the caching behaviour that is often a good time to consider the other non-functional requirements, such as monitoring and support. For example, does the service or tool hosting the cache expose some means to flush it manually? While rebooting a service may do the trick it does nothing to help you track down issues around residual state and often ends up wreaking havoc with any connected clients if they’re not written with a proper distributed system mindset.</font></p><p><font face="Georgia">Another scenario to consider is if the cache gets poisoned; if there is no easy way to eject the bad data you’re looking at the sledgehammer approach again. If your cache is HA (highly available) and backed by some persistent storage getting bad data out could be a real challenge when you’re under the cosh. One system I worked on had random caches poisoned with bad data due to <font face="Georgia">a threading serialization bug in an external library. </font></font></p><p><strong><font face="Georgia">Monitoring</font></strong></p><p><font face="Georgia">The monitoring side is probably equally important. If you generate no instrumentation data how do you know if your cache is even having the desired effect? 
One team I was on added a new cache to a service and we were bewildered to discover that it was never used. It turned out the WCF service settings were configured to create a new service instance for every request and therefore a new cache was created every time! This was despite the fact that we had unit tests for the cache and they were happily passing [1].</font></p><p><font face="Georgia">It’s also important to realise that a cache without an eviction policy is just another name for a memory leak. You cannot keep caching data forever unless you know there is a hard upper bound. Hence you’re going to need to use the instrumentation data to help find the sweet spot that gives you the right balance between time and space.</font></p><p><font face="Georgia">We also shouldn’t blindly assume that caches will continue to provide the same performance in future as they do now; our metrics will allow us to see any change in trends over time which might highlight a change in data that’s causing it to be less efficient. For example one cache I saw would see its efficiency plummet for a while because a large bunch of single use items got requested, cached, and then discarded as the common data got requested again. Once identified we disabled caching for those kinds of items, not so much for the performance benefit but to avoid blurring the monitoring data with unnecessary “glitches” [2].</font></p><p><font face="Georgia"> </font></p><p><font face="Georgia">[1] See “<a href="http://chrisoldwood.blogspot.com/2016/01/man-cannot-live-by-unit-testing-alone.html">Man Cannot Live by Unit Testing Alone</a>” for other tales of the perils of that mindset.</font></p><p><font face="Georgia">[2] This is a topic I covered more extensively in my Overload article “<a href="http://www.chrisoldwood.com/articles/monitoring-turning-noise-into-signal.html">Monitoring: Turning Noise Into Signal</a>”.</font></p>Chris Oldwoodhttp://www.blogger.com/profile/18183909440298909448noreply@blogger.com0tag:blogger.com,1999:blog-6628985022531866193.post-54232586285529117072019-11-14T20:51:00.001+00:002019-11-14T20:51:29.257+00:00Validate in Production<p><font face="Georgia">The change was reasonably simple: we had to denormalise some postcode data which was currently held in a centralised relational database into some new fields in every client’s database to remove some cross-database joins that would be unsupported on the new SQL platform we were migrating too [1].</font></p><p><font face="Georgia">As you might imagine the database schema changes were fairly simple – we just needed to add the new columns as nullable strings into every database. The next step was to update the service code to start populating these new fields as addresses were added or edited by using data from the centralised postcode database [2].</font></p><p><font face="Georgia">At this point any new data or data that changed going forward would have the correctly denormalised state. However we still needed to fix up any existing data and that’s the focus of this post.</font></p><p><strong><font face="Georgia">Migration Plan</font></strong></p><p><font face="Georgia">To fix-up all the existing client data we needed to write a tool which would load each client’s address data that was missing its new postcode data, look it up against the centralised list, and then write back any changes. 
Given we were still using the cross-database joins in live for the time being to satisfy the existing reports we could roll this out in the background and avoid putting any unnecessary load on the database cluster.</font></p><p><font face="Georgia">The tool wasn’t throw-away because the postcode dataset gets updated regularly and so the denormalised client data needs to be refreshed whenever the master list is updated. (This would not be that often but enough to make it worth spending a little extra time writing a reusable tool for the job for ops to run.)</font></p><p><font face="Georgia">Clearly this isn’t rocket science: it just requires loading the centralised data into a map, fetching the client’s addresses, looking them up, and writing back the relevant fields. The tool only took a few hours to write and test and so it was ready to run for the next release during a quiet period.</font></p><p><font face="Georgia">When that moment arrived the tool was run across the hundreds of client databases and plenty of data was fixed up in the process, so the task appeared complete.</font></p><p><strong><font face="Georgia">Next Steps</font></strong></p><p><font face="Georgia">With all the existing postcode data now correctly populated too we should have been in a position to switch the report generation feature toggle on so that it used the new denormalised data instead of doing a cross-database join to the existing centralised store.</font></p><p><font face="Georgia">While the team were generally confident in the changes to date I suggested we should just do a sanity check and make sure that everything was working as intended as I felt this was a reasonably simple check to run.</font></p><p><font face="Georgia">An initial SQL query someone knocked up just checked how many of the new fields had been populated and the numbers seemed about right, i.e. very high (we’d expect <em>some </em>addresses to be missing data due to missing postcodes, typos and stale postcode data). However I still felt that we should be able to get a <em>definitive </em>answer with very little effort by leveraging the existing SQL we were about to discard, i.e. use the cross-database join one last time to verify the data population more precisely.</font></p><p><strong><font face="Georgia">Close, but No Cigar</font></strong></p><p><font face="Georgia">I massaged the existing report query to show where data from the dynamic join was different to that in the new columns that had been added (again, not rocket science). To our surprise there were quite a significant number of discrepancies.</font></p><p><font face="Georgia">Fortunately it didn’t take long to work out that those addresses which were missing postcode data all had postcodes which were at least partially written in lowercase whereas the ones that had worked were entirely written in uppercase.</font></p><p><font face="Georgia">Hence the bug was fairly simple to track down. The tool loaded the postcode data into a dictionary (map) keyed on the string postcode and did a straight lookup which is case-sensitive by default. A quick change to use a case-insensitive comparison and the tool was fixed. 
The data was corrected soon after and the migration verified.</font></p><p><font face="Georgia">Why didn’t this show up in the initial testing?<font face="Georgia"> Well, it turned out the tools used to generate the test data sets and also to anonymize real client databases were somewhat simplistic and this helped to provide a false level of confidence in the new tool.</font></font></p><p><strong><font face="Georgia">Testing in Production</font></strong></p><p><font face="Georgia">Whenever we make a change to our system it’s important that we verify we’ve delivered what we intended. Oftentimes the addition of a feature has some impact on the front-end and the customer and therefore it’s fairly easy to see if it’s working or not. (The customer usually has something to say about it.)</font></p><p><font face="Georgia">However back-end changes can be harder to verify thoroughly, but it’s still important that we do the best we can to ensure they have the expected effect. In this instance we could easily check every migrated address within a reasonable time frame and know for sure, but on large data sets this might unfeasible so you might have to settle for less. Also the use of feature switches and incremental delivery meant that even though there was a bug it did not affect the customers and we were always making forward progress.</font></p><p><font face="Georgia">Testing does not end with a successful run of the build pipeline or a sign-off from a QA team – it also needs to work in real life too. Ideally the work we put in up-front will make that more likely but for some classes of change, most notably where actual customer data is involved, we need to follow through and ensure that practice and theory tie up.</font></p><p><font face="Georgia"> </font></p><p><font face="Georgia">[1] Storage limitations and other factors precluded simply moving the entire postcode database into each customer DB <em>before</em> moving platforms. The cost was worth it to de-risk the overall migration.</font></p><p><font face="Georgia">[2] There was no problem with the web service having two connections to two different databases, we just needed to stop writing SQL queries that did cross-database joins.</font></p>Chris Oldwoodhttp://www.blogger.com/profile/18183909440298909448noreply@blogger.com0tag:blogger.com,1999:blog-6628985022531866193.post-12687913065148510712019-03-28T18:24:00.001+00:002019-03-28T18:24:45.706+00:00PowerShell’s Call Operator (&) Arguments with Embedded Spaces and Quotes<p><font face="Trebuchet MS">I was recently upgrading a PowerShell script that used the v2 <font face="Courier New">nunit-console</font> runner to use the v3 one instead when I ran across a weird issue with PowerShell. 
I haven’t found a definitive bug report or release note yet to describe the change in behaviour, hence I’m documenting my observation here in the meantime.</font></p> <p><font face="Trebuchet MS">When running the script on my desktop machine, which runs Windows 10 and PowerShell v5.x, it worked first time, but when pushing the script to our build server, which was running Windows Server 2012 and PowerShell v4.x, it failed with a weird error that suggested the command line being passed to <font face="Courier New">nunit-console</font> was borked.</font></p> <p><strong><font face="Trebuchet MS">Passing Arguments with Spaces</font></strong></p> <p><font face="Trebuchet MS">The v3 <font face="Courier New">nunit-console</font> command line takes a “<font face="Courier New">/where</font>” argument which allows you to provide a filter to describe which test cases to run. This is a form of expression and the script’s default filter was essentially this:</font></p> <p><font face="Courier New">cat == Integration && cat != LongRunning</font></p> <p><font face="Trebuchet MS">Formatting this as a command line argument it then becomes:</font></p> <p><font face="Courier New">/where:"cat == Integration && cat != LongRunning"</font></p> <p><font face="Trebuchet MS">Note that the value for the <font face="Courier New">/where</font> argument contains spaces and therefore needs to be enclosed in double quotes. An alternative of course is to enclose the whole argument in double quotes instead:</font></p> <p><font face="Courier New">"/where:cat == Integration && cat != LongRunning"</font></p> <p><font face="Trebuchet MS">or you can try splitting the argument name and value up into two separate arguments:</font></p> <p><font face="Courier New">/where "cat == Integration && cat != LongRunning"</font></p> <p><font face="Trebuchet MS">I’ve generally found these command-line argument games unnecessary unless the tool I’m invoking is using some broken or naïve command line parsing library [1]. (In this particular scenario I could have removed the spaces too but if it was a path, like “<font face="Courier New">C:\Program Files\Xxx</font>”, I would not have had that luxury.)</font></p> <p><strong><font face="Trebuchet MS">PowerShell Differences</font></strong></p> <p><font face="Trebuchet MS">What I discovered was that on PowerShell v4 when an argument has embedded spaces it appears to ignore the embedded quotes and therefore sticks an extra pair of quotes around the entire argument, which you can see here:</font></p> <p><font face="Courier New">> $where='/where:"cat == Integration"'; & cmd /c echo $where <br />"/where:"cat == Integration""</font></p> <p><font face="Trebuchet MS">…whereas on PowerShell v5 it “notices” that the value with spaces is already correctly quoted and therefore elides the <em>outer</em> pair of double quotes:</font></p> <p><font face="Courier New">> $where='/where:"cat == Integration"'; & cmd /c echo $where <br /> /where:"cat == Integration"</font></p> <p><font face="Trebuchet MS">On PowerShell v4 only by removing the spaces, which I mentioned above may not always be possible, can you stop it adding the outer pair of quotes:</font></p> <p><font face="Courier New">> $where='/where:"cat==Integration"'; & cmd /c echo $where <br />/where:"cat==Integration"</font></p> <p><font face="Trebuchet MS">…of course now you don’t need the quotes anymore :o). However, if for some reason you are formatting the string, such as with the <font face="Courier New">-f</font> operator, that might be useful (e.g. 
<p><font face="Trebuchet MS">I should point out that this doesn’t just affect PowerShell v4; I also tried it on my Vista machine with PowerShell v2 and that exhibited the same behaviour, so my guess is this was “fixed” in v5.</font></p> <p><font face="Trebuchet MS"></font></p> <p><font face="Trebuchet MS">[1] I once worked with an in-house C++ based application framework that completely ignored the standard parser that fed <font face="Courier New">main()</font> and instead re-parsed the arguments, very badly, from the raw string obtained from <font face="Courier New">GetCommandLine()</font>.</font></p>Chris Oldwoodhttp://www.blogger.com/profile/18183909440298909448noreply@blogger.com0tag:blogger.com,1999:blog-6628985022531866193.post-37375413451455515832019-03-22T20:40:00.001+00:002019-03-22T20:40:35.094+00:00CI/CD Server Inline Scripts<p><font face="Trebuchet MS">As you might have already gathered if you’d read my 2014 post “<a href="http://chrisoldwood.blogspot.com/2014/10/building-pipeline-process-led-or.html">Building the Pipeline - Process Led or Product Led?</a>” I’m very much in favour of developing a build and deployment process locally first, then automating that, rather than clicking buttons in a dedicated CI/CD tool and hoping I can debug it later. I usually end up at least partially scripting builds anyway [1] to save time waiting for the IDE to open [2] when I just need some binaries for a dependency, so it’s not wasted effort.</font></p> <font face="Trebuchet MS"></font> <p><strong><font face="Trebuchet MS">Inline Scripts</font></strong></p> <font face="Trebuchet MS"></font> <p><font face="Trebuchet MS">If other teams prefer to configure their build or deployment through a tool’s UI I don’t really have a problem with that if I know I can replay the same steps locally should I need to test something out as the complexity grows. What I do find disturbing though is when some of the tasks use <em>inline</em> scripts to do something non-trivial, like perform the entire deployment. What’s even more disturbing is when that task script is then duplicated across environments and maintained independently.</font></p> <font face="Trebuchet MS"></font> <p><strong><font face="Trebuchet MS">Versioning</font></strong></p> <font face="Trebuchet MS"></font> <p><font face="Trebuchet MS">There are various reasons why we use a version control tool, but first and foremost it provides a history, which implies that we can trace back any changes that have been made and we have a natural backup should we need to roll back or restore the build server.</font></p> <font face="Trebuchet MS"></font> <p><font face="Trebuchet MS">Admittedly most half-decent build and deployment tools come with some form of versioning built in which gives you that safety net. However having that code versioned in a separate tool and repository from the main codebase means that you have to work harder to correlate what version of the system requires what version of the build process. CI/CD tools tend to present you with a fancy UI for looking at the history rather than giving you direct access to, say, its internal git repo. And even then what the tool usually gives you is “what” changed, but does not <em>also</em> provide the commentary on “why” it was changed. 
Much of what I wrote in my “<a href="http://www.chrisoldwood.com/articles/in-the-toolbox-commit-checklist.html">Commit Checklist</a>” equally applies to build and deployment scripts as it does production code.</font></p> <font face="Trebuchet MS"></font> <p><font face="Trebuchet MS">Although Jenkins isn’t the most polished of tools compared to, say, TeamCity it is pretty easy to configure one of the 3rd party plugins to yank the configuration files out and check them into the same repo as the source code along with a suitable comment. As a consequence any time the repo is tagged due to a build being promoted the Jenkins build configuration gets included for free.</font></p> <p><strong><font face="Trebuchet MS">Duplication</font></strong></p> <p><font face="Trebuchet MS">My biggest gripe is not with the versioning aspect though, which I believe is pretty important for any non-trivial process, but it’s when the script is manually duplicated across environments. Having no single point of truth, from a logic perspective, is simply asking for trouble. The script will start to drift as subtleties in the environmental differences become enshrined directly in the logic rather than becoming parameterised behaviours.</font></p> <p><font face="Trebuchet MS">The tool’s text editor for inline script blocks is usually a simple edit box designed solely for trivial changes; anything more significant is expected to be handled by pasting into a real editor instead. But we all know different people like different editors and so this becomes another unintentional source of difference as tabs and spaces fight for domination.</font></p> <p><font face="Trebuchet MS">Fundamentally there should be one common flow of logic that works for <em>every</em> environment. The differences between them should boil down to simple settings, like credentials, or cardinality of resources, e.g. the number of machines in the cluster. Occasionally there may be custom branches in logic, such as the need for a proxy server, but it should be treated as a minor deviation that <em>could</em> apply to any environment, but just happens to only be applicable to, say, one at the moment.</font></p> <p><strong><font face="Trebuchet MS">Testability</font></strong></p> <font face="Trebuchet MS"></font> <p><font face="Trebuchet MS">This naturally leads onto the inherent lack of testability outside of the tool and workflow. It’s even worse if the script makes use of some variable substitution system that the CI/CD tool provides because that means you have to manually fix-up the code before running it outside the tool, or keep running it in the tool and use <font face="Courier New">printf()</font> style debugging by looking at the task’s output.</font></p> <font face="Trebuchet MS"></font> <p><font face="Trebuchet MS">All script engines I’m aware of accept arguments, so why not run the script as an external script and pass the arguments from the tool in the tried and tested way? This means the tool runs it pretty much the same way you do except perhaps for some minor environmental differences, like user account or current working directory which are all common problems and easily overcome. Most modern scripting languages come with a debugger too which seems silly to give up.</font></p> <p><font face="Trebuchet MS">Of course this doesn’t mean that you have to make every single configuration setting a separate parameter to the script, that would be overly complicated too. 
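</font></p> <p><font face="Trebuchet MS">To make the earlier suggestion concrete – an external, versioned script that takes its variability as plain arguments – here is a minimal sketch, where every name (the script, parameters, paths and CI/CD variable) is invented purely for illustration:</font></p> <p><font face="Courier New"># deploy.ps1 – lives in the repo and flows through the pipeline like any other artefact <br />param( <br />  [Parameter(Mandatory)][string] $Environment, <br />  [Parameter(Mandatory)][string] $PackagePath <br />) <br /> <br /># The real work would live in versioned functions too; this copy is just a stand-in. <br />Copy-Item -Path "$PackagePath\*" -Destination "D:\Apps\OurService\$Environment" -Recurse -Force</font></p> <p><font face="Trebuchet MS">The inline portion of the CI/CD task is then little more than <font face="Courier New">.\deploy.ps1 -Environment 'UAT' -PackagePath $env:PACKAGE_PATH</font> (however your particular tool surfaces its variables), which is trivial to run and debug at an ordinary prompt.</font></p> <p><font face="Trebuchet MS">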
Maybe you just provide one parameter which is a settings file for the environment with a bunch of key/value pairs. You can then tweak the settings as appropriate while you test and debug. While <a href="https://en.wikipedia.org/wiki/Idempotence">idempotence</a> and the ideas behind <a href="https://en.wikipedia.org/wiki/PowerShell#Desired_State_Configuration">Desired State Configuration</a> (DSC) are highly desirable, there is no reason we can’t also borrow from the <a href="https://en.wikipedia.org/wiki/Design_for_testing">Design for Testability</a> guidebook here too by adding features making it easier to test.</font></p> <p><font face="Trebuchet MS">Don’t forget that scripting languages often come with unit test frameworks these days too which can allow you to mock out code which has nasty side-effects so you can check your handling and orchestration logic. For example PowerShell has <a href="https://github.com/pester/Pester">Pester</a> which really helps bring some extra discipline to script development; an area which has historically been tough due to the kinds of side-effects created by executing the code.</font></p> <font face="Trebuchet MS"></font> <p><strong><font face="Trebuchet MS">Complexity</font></strong></p> <font face="Trebuchet MS"></font> <p><font face="Trebuchet MS">When an inline script has grown beyond the point where <a href="https://en.wikiquote.org/wiki/C._A._R._Hoare">Hoare</a> suggests “there are obviously no deficiencies”, which is probably anything more than a trivial calculation or invocation of another tool, then it should be decomposed into smaller functional units. Then each of these units can be tested and debugged in isolation and perhaps the inline script then merely contains a couple of lines of orchestration code, which would be trivial to replicate at a REPL / prompt.</font></p> <p><font face="Trebuchet MS">For example anything around manipulating configuration files is a perfect candidate for factoring out into a function or child script. It might be less efficient to invoke the same function a few times rather than read and write the file once, but in the grand scheme of things I’d bet it’s marginal in comparison to the rest of the build or deployment process.</font></p> <p><font face="Trebuchet MS">Many modern scripting languages have a mechanism for loading some sort of module or library of code. Setting up an internal package manager is a pretty heavyweight option in comparison to publishing a .zip file of scripts but if it helps keep the script complexity under control and provides a versioned repository that can be reliably queried at execution time, then why not go for that instead?</font></p> <p><strong><font face="Trebuchet MS">Scripts are Artefacts</font></strong></p> <font face="Trebuchet MS"></font> <p><font face="Trebuchet MS">It’s easy to see how these things happen. What starts off as a line or two of script code eventually turns into a behemoth before anyone realises it’s not been versioned and there are multiple copies. After all, the deployment requirements historically come up at the end of the journey, after the main investment in the feature has already happened. 
The pressure is then on to get it live, and build & deployment, like tests, is often just another second class citizen.</font></p> <p><font face="Trebuchet MS">The <a href="http://wiki.c2.com/?WalkingSkeleton">Walking Skeleton</a> came about in part to push back against this attitude and make the build pipeline and tests part and parcel of the whole delivery process; it should not be an afterthought. This means it deserves the same rigour we apply elsewhere in our process.</font></p> <p><font face="Trebuchet MS">Personally I like to see <em>everything</em> go through the pipeline, by which I mean that source code, scripts, configuration, etc. all enter the pipeline as versioned inputs and are passed along until the deployed product pops out the other end. The way you build your artefacts is inherently tied to the source code and project configuration that produces it. Configuration, whether it be infrastructure or application settings, is also linked to the version of the tools, scripts, and code which consumes it. It’s more awkward to inject version numbers into scripts, like you do with binaries, but even pushing them through the pipeline in a .zip file with version number in the filename makes a big difference to tracking the “glue”.</font></p> <font face="Trebuchet MS"></font> <p><font face="Trebuchet MS">Ultimately any piece of the puzzle that directly affects the ability to safely deliver continuous increments of a product needs to be held in high regard and treated with the respect it deserves.</font></p> <font face="Trebuchet MS"></font> <p><font face="Trebuchet MS"> </font></p> <p><font face="Trebuchet MS">[1] See “<a href="https://chrisoldwood.blogspot.com/2014/01/cleaning-workspace.html">Cleaning the Workspace</a>” for more about why I don’t trust my IDE to clean up after itself.</font></p> <p><font face="Trebuchet MS">[2] I’m sure I could load Visual Studio, etc. in “safe mode” to avoid waiting for all the plug-ins and extensions to initialise but it still seems “wrong” to load an entire IDE just to invoke the same build tool I could invoke almost directly from the command line myself.</font></p> <p><font face="Trebuchet MS"></font></p>Chris Oldwoodhttp://www.blogger.com/profile/18183909440298909448noreply@blogger.com0tag:blogger.com,1999:blog-6628985022531866193.post-78993936359535171322019-03-14T22:54:00.001+00:002019-03-14T22:54:05.340+00:00Abstraction with Database Views<p><font face="Trebuchet MS">After being away from the relational database world for a few years it’s been interesting coming back and working on a mature system with plenty of SQL code. It’s been said that SQL is the assembly language of databases and when SQL code is written only using its primitives (types and tables) it’s easy to see why.</font></p> <p><font face="Trebuchet MS">Way back in 2011 I wrote “<a href="http://chrisoldwood.blogspot.com/2011/05/public-interface-of-database.html">The Public Interface of a Database</a>” which was a distillation of my thoughts at the time about what I felt was generally wrong with much of the database code I saw. One aspect in particular which I felt was sorely underutilised was the use of views to build a logical model over the top of the physical model to allow a more emergent design to unfold. 
This post documents some of the ways I’ve found views to be beneficial in supporting a more agile approach to database design.</font></p> <p><strong><font face="Trebuchet MS">Views for Code Reuse</font></strong></p> <font face="Trebuchet MS"></font> <p><font face="Trebuchet MS">The first thing that struck me about the recent SQL code I saw was how much there was of it. Most queries were pretty verbose and as a consequence you had to work hard to comprehend what was going on. Just as you see the same tired examples around <font face="Courier New">Orders</font> => <font face="Courier New">OrderItems</font> => <font face="Courier New">Products</font> so the code had a similar set of 3 table joins over and over again as they formed the basis for so many queries.</font></p> <font face="Trebuchet MS"></font> <p><font face="Trebuchet MS">One of the primary uses for database views is as a code reuse mechanism. Instead of copy-and-pasting the same bunch of joins everywhere:</font></p> <font face="Trebuchet MS"></font> <p><font face="Courier New">FROM Orders o <br />INNER JOIN OrderItems oi <br />ON o.Id = oi.OrderId  <br />INNER JOIN Products p <br />ON oi.ProductId = p.Id</font></p> <font face="Courier New"></font> <p><font face="Trebuchet MS">we could simply say:</font></p> <font face="Trebuchet MS"></font> <p><font face="Courier New">FROM OrdersOrderItemsProducts</font></p> <p><font face="Trebuchet MS">This one simplification reduces a lot of complexity and means that wherever we see that name we instantly recognise it without mentally working through the joins in our head. Views are composable too meaning that we can implement one view in terms of another rather than starting from scratch every time.</font></p> <font face="Trebuchet MS"></font> <p><strong><font face="Trebuchet MS">Naming</font></strong></p> <font face="Trebuchet MS"></font> <p><font face="Trebuchet MS">However, if the name <font face="Courier New">OrdersOrderItemsProducts</font> makes you wince then I don’t blame you because it’s jarring due to its length and unnaturalness. It’s a classic attempt at naming based on how it’s implemented rather than what it <em>means</em>.</font></p> <font face="Trebuchet MS"></font> <p><font face="Trebuchet MS">I suspect a difficulty in naming views is part of the reason for their lack of use in some cases. For our classic example above I would probably go with <font face="Courier New">OrderedProducts</font> or <font face="Courier New">ProductsOrdered</font>. The latter is probably preferable as the point of focus is the <font face="Courier New">Products</font> “set” with the use of <font face="Courier New">Orders</font> being a means to qualify which products we’re interested in, like “users online”. Of course one could just easily say “unread messages” and therefore we quickly remember why naming is one of the two hardest problems in computer science.</font></p> <p><font face="Trebuchet MS">Either way it’s important that we do spend the time required to name our views appropriately as they become the foundation on which we base many of our other queries.</font></p> <font face="Trebuchet MS"></font> <p><strong><font face="Trebuchet MS">Views for Encapsulation</font></strong></p> <font face="Trebuchet MS"></font> <p><font face="Trebuchet MS">Using views as a code reuse mechanism is definitely highly beneficial but where I think they start to provide more value are as a mechanism for <em>revealing</em> new, derived sets of data. 
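</font></p> <p><font face="Trebuchet MS">To have something concrete in front of us, here is a sketch of such a view – the join columns come from the classic example above, while the selected columns are purely illustrative:</font></p> <p><font face="Courier New">CREATE VIEW ProductsOrdered <br />AS <br />SELECT o.Id AS OrderId, p.Id AS ProductId, oi.Quantity <br />FROM Orders o <br />INNER JOIN OrderItems oi <br />ON o.Id = oi.OrderId <br />INNER JOIN Products p <br />ON oi.ProductId = p.Id</font></p> <p><font face="Trebuchet MS">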
The name <font face="Courier New">ProductsOrdered</font> is not radically different from the more long-winded <font face="Courier New">OrdersOrderItemsProducts</font> and therefore it still heavily reflects the physical relationship of the underlying tables.</font></p> <font face="Trebuchet MS"></font> <p><font face="Trebuchet MS">Now imagine a cinema ticketing system where you have two core relationships: <font face="Courier New">Venue</font> => <font face="Courier New">Screen</font> => <font face="Courier New">SeatingPlan</font> and <font face="Courier New">Film</font> => <font face="Courier New">Screening</font> => <font face="Courier New">Ticket</font> => <font face="Courier New">Seat</font>. By navigating these two relationships it is possible to determine the occupancy of the venue, screen, showing, etc. and yet the term <font face="Courier New">Occupancy</font> says nothing about how that is achieved. In essence we have revealed a new abstraction (<font face="Courier New">Occupancy</font>) which can be independently queried and therefore elevates our thinking to a higher plane instead of getting bogged down in the lengthy chain of joins across a variety of base tables.</font></p> <p><strong><font face="Trebuchet MS">Views for Addressing Uncertainty</font></strong></p> <font face="Trebuchet MS"></font> <p><font face="Trebuchet MS">We can also turn this thinking upside down, so that rather than creating something new by hiding the underlying existing structure, we can start with something concrete and re-organise how things work underneath. This is the essence of refactoring – changing the design without changing the behaviour.</font></p> <p><font face="Trebuchet MS">When databases were used as <a href="https://martinfowler.com/bliki/IntegrationDatabase.html">a point of integration</a> this idea of hiding the underlying schema from “consumers” made sense as it gave you more room to change the schema without breaking a bunch of queries your consumers had already created. But even if you have sole control over your schema there is still a good reason why you might want to hide the schema, nay implementation, even from much of your own code.</font></p> <font face="Trebuchet MS"></font> <p><font face="Trebuchet MS">Imagine you are developing a system where you need to keep daily versions of your customer’s details easily accessible because you regularly perform computations across multiple dates [1] and you need to use the correct version of each customer’s data for the relevant date. When you start out you may not know what the most appropriate way to store them because you do not know how frequently they change, what kinds of changes are made, or how the data will be used in practice.</font></p> <font face="Trebuchet MS"></font> <p><font face="Trebuchet MS">If you assume that most attributes change most days you may well plump to just store them daily, in full, e.g.</font></p> <font face="Trebuchet MS"></font> <p><font face="Courier New">| Date       | Name      | Valuation | ... |  <br />| 2019-03-01 | Company A | £102m     | ... |   <br />| 2019-03-01 | Company B | £47m      | ... |   <br />| 2019-03-02 | Company A | £105m     | ... |   <br />| 2019-03-02 | Company B | £42m      | ... |   <br />| 2019-03-03 | Company A | £105m     | ... |   <br />| 2019-03-03 | Company B | £42m      | ... 
|</font></p> <font face="Courier New"></font> <p><font face="Trebuchet MS">On the contrary, if the attributes rarely change each day then maybe we can version the data instead:</font></p> <p><font face="Courier New">| Name      | Version | Valuation | ... | <br />| Company A | 1       | £147m     | ... | <br />| Company A | 2       | £156m     | ... | <br />| Company B | 1       | £27m      | ... |</font></p> <p><font face="Trebuchet MS">So far so good, but how do we track which version belongs to which date? Once again I can think of two obvious choices. The first is much like the original verbose table and we record it on a daily basis:</font></p> <p><font face="Courier New">| Date       | Name      | Version | <br />| 2019-03-01 | Company A | 1       | <br />| 2019-03-01 | Company B | 1       | <br />| 2019-03-02 | Company A | 1       | <br />| 2019-03-02 | Company B | 2       |</font></p> <p><font face="Trebuchet MS">The second is to coalesce dates with the same version creating a much more compact form:</font></p> <p><font face="Courier New">| From       | To         | Name      | Version | <br />| 2019-03-01 | (null)     | Company A | 1       | <br />| 2019-03-01 | 2019-03-01 | Company B | 1       | <br />| 2019-03-02 | (null)     | Company B | 2       |</font></p> <p><font face="Trebuchet MS">Notice how we have yet another design choice to make here – whether to use <font face="Courier New">NULL</font> to represent “the future”, or whether to put today’s date as the upper bound and bump it on a daily basis [2].</font></p> <p><font face="Trebuchet MS">So, with all those choices how do we make a decision? What if we don’t need to make a decision, <em>now</em>? What if we <a href="https://www.artima.com/weblogs/viewpost.jsp?thread=351308">Use Uncertainty as a Driver</a> and create a design that is easily changeable when we know more about the shape of the data and how it’s used?</font></p> <p><font face="Trebuchet MS">What we <em>do</em> know is that we need to process customer data on a per-date basis, therefore, instead of starting with a Customer <em>table</em> we start with a Customer <em>view</em> which has the shape we’re interested in:</font></p> <font face="Trebuchet MS"><font face="Courier New">| Date | Name | Valuation | ... |  </font> <br /></font> <p><font face="Trebuchet MS">We can happily use this view wherever we like knowing that the underlying structure could change without us needing to fix up lots of code. Naturally some code will be dependent on the physical structure, but the point is that we’ve kept it to a bare minimum. If we need to transition from one design to another, but can’t take the downtime to rewrite all the data up-front, that can often be hidden behind the view too.</font></p> <p><strong><font face="Trebuchet MS">Views as Interfaces</font></strong></p> <p><font face="Trebuchet MS">It’s probably my background [3] but I can’t help but notice a strong parallel in the latter two examples with the use of interfaces in object-oriented code. 
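</font></p> <p><font face="Trebuchet MS">To make the parallel a little more tangible, here is a minimal sketch of the customer example, assuming the daily version-mapping design from above (all table and column names are invented): the view plays the role of the “interface” our queries depend on, while the tables behind it are the “implementation” we remain free to reorganise.</font></p> <p><font face="Courier New">CREATE VIEW Customer <br />AS <br />SELECT cd.[Date], cv.Name, cv.Valuation <br />FROM CustomerVersionByDate cd <br />INNER JOIN CustomerVersion cv <br />ON cd.Name = cv.Name <br />AND cd.Version = cv.Version</font></p> <p><font face="Trebuchet MS">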
<a href="https://en.wikipedia.org/wiki/All_models_are_wrong">George Box</a> reminds us that “all models are wrong, but some are useful” and so we should be careful not to strain the analogy too far but I think there is some value in considering the relationship between views and tables as somewhat akin to interfaces and classes, at least for the purposes of encapsulation as described above.</font></p> <p><font face="Trebuchet MS">On a similar note we often strive to create and use the narrowest interface that solves our problem and that should be no different in the database world either. Creating narrower interfaces (views) allows us to remain more in control of our implementation by leaking less.</font></p> <p><font face="Trebuchet MS">One final type related comparison that I think worthy of mention is that it’s easier to spot structural problems when you have a “richer type system”, i.e. many well-named views. For example, if a query joins through <font face="Courier New">ProductsOrdered</font> to get to <font face="Courier New">UserPreferences</font> you can easily see something funky is going on.</font></p> <p><strong><font face="Trebuchet MS">Embracing Change</font></strong></p> <p><font face="Trebuchet MS">When you work alongside a database where the SQL code and schema gets refactored almost as heavily as the services that depend on it is a pleasurable experience [4]. <a href="https://en.wikipedia.org/wiki/Scott_Ambler">Scott Ambler</a> wrote a couple of books over a decade ago (<a href="http://www.ambysoft.com/books/refactoringDatabases.html">Refactoring Databases: Evolutionary Database Design</a> and <a href="http://www.ambysoft.com/books/agileDatabaseTechniques.html">Agile Database Techniques</a>) which convinced me long ago that it was possible to design databases that could embrace change. 
Making judicious use of views certainly helped achieve that in part by keeping the accidental complexity down.</font></p> <p><font face="Trebuchet MS">Admittedly performance concerns, still a dark art in the world of databases, get in the way every now and then, but I’d rather <em>try</em> to make the database a better place for my successors than assume it can’t be done.</font></p> <p><font face="Trebuchet MS"> </font></p> <font face="Trebuchet MS"></font> <p><font face="Trebuchet MS">[1] In investment banking it’s common to re-evaluate trades and portfolios on historical dates both for regulatory and analytical purposes.</font></p> <p><font face="Trebuchet MS">[2] Some interesting scenarios crop up here when repeatability matters and you have an unreliable upstream data source.</font></p> <p><font face="Trebuchet MS">[3] I’m largely a self-taught, back-end developer with many years of writing C++ and C# based services.</font></p> <p><font face="Trebuchet MS">[4] Having a large suite of database unit tests, <a href="http://chrisoldwood.blogspot.com/2011/04/you-write-your-sql-unit-tests-in-sql.html">also written in T-SQL</a>, really helped as we could use TDD on the database schema too.</font></p> <p><font face="Trebuchet MS"></font></p>Chris Oldwoodhttp://www.blogger.com/profile/18183909440298909448noreply@blogger.com0tag:blogger.com,1999:blog-6628985022531866193.post-1056595102872532692019-03-08T19:38:00.001+00:002019-03-08T19:38:27.438+00:00The Perils of Multi-Phase Construction<p><font face="Trebuchet MS">I’ve never really been a fan of C#’s <a href="https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/classes-and-structs/object-and-collection-initializers">object initializer syntax</a>. Yes, it’s a little more convenient to <em>write</em> but it has a big downside which is it forces you to make your types mutable by default. Okay, that’s a bit strong – it doesn’t force <em>you</em> to do anything, but it does promote that way of thinking and allows people to take advantage of mutability outside the initialisation block [1].</font></p> <p><font face="Trebuchet MS">This post is inspired by some buggy code I encountered where my suspicion is that the subtleties of the object initialisation syntax got lost along the way and partially constructed objects eventually found their way into the wild.</font></p> <font face="Trebuchet MS"></font> <p><strong><font face="Trebuchet MS">No Dragons Yet</font></strong></p> <font face="Trebuchet MS"></font> <p><font face="Trebuchet MS">The method, which was to get the next message from a message queue, was originally written something like this:</font></p> <font face="Trebuchet MS"></font> <p><font face="Courier New">Message result = null; <br />RawMessage message = queue.Receive(); <br /> <br />if (message != null) <br />{ <br />  result = new Message <br />  { <br />    Priority = message.Priority, <br />    Type = GetHeader(message, “MessageType”), <br />    Body = message.Body,  <br />  }; <br />} <br /> <br />return result;</font></p> <font face="Courier New"></font> <p><font face="Trebuchet MS">This was effectively correct. I say “effectively correct” because it doesn’t contain the bug which came later but still relies on mutability which we know can be dangerous.</font></p> <font face="Trebuchet MS"></font> <p><font face="Trebuchet MS">For example, what would happen if the <font face="Courier New">GetHeader()</font> method threw an exception? 
At the moment there is no error handling and so the exception propagates out the method and back up the stack. Because we make no effort to recover we let the caller decide what happens when a duff message comes in.</font></p> <font face="Trebuchet MS"></font> <p><strong><font face="Trebuchet MS">The Dragons Begin Circling</font></strong></p> <font face="Trebuchet MS"></font> <p><font face="Trebuchet MS">Presumably the behaviour when a malformed message arrived was undesirable because the method was changed slightly to include some recovery fairly soon after:</font></p> <font face="Courier New"> Message result = null; <br />RawMessage message = queue.Receive(); <br /> <br />if (message != null) <br />{ <br /><strong>  try <br />  { <br /></strong>    result = new Message <br />    { <br />      Priority = message.Priority, <br />      Type = GetHeader(message, “MessageType”), <br />      Body = message.Body,   <br />    }; <br /><strong>  } <br />  catch (Exception e) <br />  { <br />    Log.Error(“Invalid message. Skipping.”); <br />  } <br /></strong>} <br /> <br />return result; </font> <p><font face="Trebuchet MS">Still no bug yet, but that catch handler falling through to the <font face="Courier New">return</font> at the bottom is somewhat questionable; we are making the reader work hard to track what happens to <font face="Courier New">result</font> under the happy / sad paths to ensure it remains correct under further change.</font></p> <p><strong><font face="Trebuchet MS">Object Initialisation Syntax</font></strong></p> <p><font face="Trebuchet MS">Before showing the bug, here’s a brief refresher on how the object initialisation syntax works under the covers [2] in the context of our example code. Essentially it invokes the default constructor first and then performs assignments on the various other properties, e.g.</font></p> <font face="Trebuchet MS"><font face="Courier New">var __m = new Message(); <br /> __m.Priority = message.Priority; <br />__m.Type = GetHeader(message, “MessageType”); <br />__m.Body = message.Body,   <br />result = __m; </font> <br /></font> <p><font face="Trebuchet MS">Notice how the compiler introduces a hidden temporary variable during the construction which it then assigns to the target at the end? This ensures that any exceptions during construction won’t create partially constructed objects that are bound to variables by accident. (This assumes you don’t use the constructor or property setter to attach itself to any global variables either.)</font></p> <p><font face="Trebuchet MS">Hence, with respect to our example, if any part of the initialization fails then <font face="Courier New">result</font> will be left as <font face="Courier New">null</font> and therefore the message is indeed discarded and the caller gets a <font face="Courier New">null</font> reference back.</font></p> <p><strong><font face="Trebuchet MS">The Dragons Surface</font></strong></p> <p><font face="Trebuchet MS">Time passes and the code is then updated to support a new property which is also passed via a header. And then another, and another. 
However, being more complicated than a simple string value, the logic to parse it is placed <strong><em>outside</em></strong> the object initialisation block, like this:</font></p> <font face="Courier New">Message result = null; <br />RawMessage message = queue.Receive(); <br /> <br />if (message != null) <br />{ <br />  try <br />  { <br />    result = new Message <br />    { <br />      Priority = message.Priority, <br />      Type = GetHeader(message, “MessageType”), <br />      Body = message.Body,   <br />    }; <br /> <br /></font><font face="Courier New"><strong>    var str = GetHeader(message, “SomeIntValue”); <br />    if (str != null && TryParseInt(str, out var value)) <br />      result.IntValue = value; <br /></strong> <br />    <strong>// ... more of the same ...</strong> <br />  } <br />  catch (Exception e) <br />  { <br />    Log.Error(“Invalid message. Skipping.”); <br />  } <br />} <br /> <br />return result; </font> <p><font face="Trebuchet MS">Now the problems start. With the latter header parsing code <em><strong>outside</strong></em> the initialisation block <font face="Courier New">result</font> is assigned a partially constructed object while the remaining parsing code runs. Any exceptions that occur [3] mean that <font face="Courier New">result</font> will be left only partially constructed and the caller will be returned the duff object because the exception handler falls out the bottom.</font></p> <p><strong><font face="Trebuchet MS">+1 for Tests</font></strong></p> <p><font face="Trebuchet MS">The reason I spotted the bug was that I was writing some tests around the code for a new header which also temporarily needed to be optional, like the others, to decouple the deployments. When running the tests there was an error displayed on the console output [4] telling me the message was being discarded, which I didn’t twig at first. It was when I added a retrospective test for the previous optional fields and found my new one wasn’t being parsed correctly that I realised something funky was going on.</font></p> <p><strong><font face="Trebuchet MS">Alternatives</font></strong></p> <p><font face="Trebuchet MS">So, what’s the answer? Well, I can think of a number of approaches that would fix this particular code, ranging from small to large in terms of the amount of code that needs changing and our appetite for it.</font></p> <p><font face="Trebuchet MS">Firstly we could avoid falling through in the exception handler and make it easier on the reader to comprehend what would be returned in the face of a parsing error:</font></p> <font face="Trebuchet MS"><font face="Courier New">catch (Exception e)   <br /> {   <br />  Log.Error(“Invalid message. Skipping.”); <br />  return null; <br /> } </font> <br /></font> <p><font face="Trebuchet MS">Secondly we could reduce the scope of the <font face="Courier New">result</font> variable and return that at the end of the parsing block so it’s also clearer about what the happy path returns:</font></p> <font face="Trebuchet MS"><font face="Courier New">var result = new Message   <br />{   <br />  // . . .   
<br />}; <br /> <br />var str = GetHeader(message, “SomeIntValue”); <br />if (str != null && TryParseInt(str, out var value)) <br />  result.IntValue = value; <br /> <br />return result;</font> <br /></font> <p><font face="Trebuchet MS">We could also short-circuit the original check and remove the longer lived <font face="Courier New">result</font> variable altogether with:</font></p> <font face="Courier New">RawMessage message = queue.Receive(); <br /> <br />if (message == null) <br />    return null;</font> <br /> <p><font face="Trebuchet MS">These are all quite simple changes which are also safe going forward should someone add more header values in the same way. Of course, if we were truly perverse and wanted to show how clever <em>we</em> were, we could fold the extra values back into the initialisation block by doing an <a href="https://refactoring.com/catalog/extractFunction.html">Extract Function</a> on the logic instead and leave the original dragons in place, e.g.</font></p> <font face="Trebuchet MS"><font face="Courier New">try <br />{   <br />  result = new Message   <br />  {   <br />    Priority = message.Priority,   <br />    Type = GetHeader(message, “MessageType”),   <br />    Body = message.Body, <br />    IntValue = GetIntHeader(message, “SomeIntValue”), <br />    // ... more of the same ...   <br />  }; <br />}   <br /> catch (Exception e)   <br /> {   <br />  Log.Error(“Invalid message. Skipping.”);   <br />}</font> <br /></font> <p><font face="Trebuchet MS">But we would never do that because the aim is to write code that helps stop people making these kinds of mistakes in the first place. If we want to be clever we should make it easier for the maintainers to fall into <a href="https://blog.codinghorror.com/falling-into-the-pit-of-success/">The Pit of Success</a>.</font></p> <p><font face="Trebuchet MS"><strong>Other Alternatives</strong></font> </p> <p><font face="Trebuchet MS">I said at the beginning that I was not a fan of mutability by default and therefore it would be remiss of me not to suggest that the entire <font face="Courier New">Message </font>type be made immutable and all properties set via the constructor instead:</font></p> <font face="Courier New">result = new Message   <br />(   <br />  priority: message.Priority,   <br />  type: GetHeader(message, “MessageType”),   <br />  body: message.Body, <br />  intValue: GetIntHeader(message, “SomeIntValue”), <br />  // ... more of the same ...   <br />);</font> <br /> <p><font face="Trebuchet MS">Yes, adding a new property is a little more work but, as always, writing the tests to make sure it all works correctly will dominate here.</font></p> <p><em>I would also prefer to see use of an <a href="https://stackoverflow.com/questions/16199227/optional-return-in-c-net">Optional<></a><font face="Courier New"></font> type instead of a <font face="Courier New">null</font> reference for signalling “no message” but that’s <a href="https://accu.org/index.php/journals/2530">a different discussion</a>.</em></p> <p><strong><font face="Trebuchet MS">Epilogue</font></strong></p> <p><font face="Trebuchet MS">While this bug was merely “theoretical” at the time I discovered it [5] it quickly came back to bite. A bug fix I made on the <em>sending</em> side got deployed before the receiving end and so the misleading error popped up in the logs after all.</font></p> <p><font face="Trebuchet MS">Although the system appeared to be functioning correctly, it had slowed down noticeably, which we quickly discovered was down to the receiving process continually restarting. 
What I hadn’t twigged just from reading this nugget of code was that, due to the catch handler falling through and passing the message on, it was being acknowledged on the queue twice – once in that catch handler, and again after processing it. This second acknowledgment attempt generated a fatal error that caused the process to restart. Deploying the fixed receiver code as well sorted the issue out.</font></p> <p><font face="Trebuchet MS">Ironically the impetus for my blog post “<a href="https://chrisoldwood.blogspot.com/2012/09/black-hole-fail-fast-anti-pattern.html">Black Hole - The Fail Fast Anti-Pattern</a>” way back in 2012 was also triggered by two-phase construction problems that caused a process to go into a nasty failure mode, but that time it processed messages <em>much too quickly</em> and stayed alive failing them all.</font></p> <p><font face="Trebuchet MS"> </font></p> <font face="Trebuchet MS"></font> <p><font face="Trebuchet MS">[1] Generally speaking the setting of multiple properties implies it’s <em>multi</em>-phase construction. The more common term <em>Two-Phase Construction</em> comes (I presume) from explicit construction method names like <font face="Courier New">Initialise()</font> or <font face="Courier New">Create()</font> which take multiple arguments, like the constructor, rather than setting properties one-by-one.</font></p> <p><font face="Trebuchet MS">[2] This is based on my copy of <a href="https://www.amazon.co.uk/Programming-Language-Annotated-Microsoft-Development/dp/0321562992">The C# Programming Language: The Annotated Edition</a>.</font></p> <p><font face="Trebuchet MS">[3] When the header was missing it was passing a <font face="Courier New">null</font> <font face="Courier New">byte[]</font> reference into a UTF8 decoder which caused it to throw an <font face="Courier New">ArgumentNullException</font>.</font></p> <p><font face="Trebuchet MS">[4] Internally it created a logger on-the-fly so it wasn’t an obvious dependency that initially needed mocking.</font></p> <p><font face="Trebuchet MS">[5] It’s old, so possibly it did bite in the past but nobody knew why or it magically fixed itself when both ends were upgraded close enough together.</font></p>Chris Oldwoodhttp://www.blogger.com/profile/18183909440298909448noreply@blogger.com0tag:blogger.com,1999:blog-6628985022531866193.post-11133357073482237682019-03-04T19:11:00.001+00:002019-03-04T19:11:45.354+00:00A Not So Minor Hardware Revision<p><font face="Trebuchet MS">[<em>These events took place two decades ago, so consider it food for thought rather than a modern tale of misfortune. Naturally some details are hazy and possibly misremembered but the basic premise is still sound.</em>]</font></p> <font face="Trebuchet MS"></font> <p><font face="Trebuchet MS">Back in the late ‘90s I was working on a </font><a href="https://simple.wikipedia.org/wiki/Travelling_salesman_problem"><font face="Trebuchet MS">Travelling Salesman</font></a><font face="Trebuchet MS"> style problem (TSP) for a large oil company which had performance improvements as a key element. 
Essentially we were taking a new rewrite of their existing scheduling product and trying to solve some huge performance problems with it, such as taking many minutes to load, let alone perform any scheduling computations.</font></p> <font face="Trebuchet MS"></font> <p><font face="Trebuchet MS">We had made a number of serious improvements, such as reducing the load time from minutes to mere seconds, and, given our successes so far, were tasked with continuing to implement the rest of the features that were needed to make it usable in practice. One feature was to import the set of orders from the various customer sites which were scheduled by the underlying TSP engine.</font></p> <p><font face="Trebuchet MS"><strong>The Catalyst</strong></font></p> <font face="Trebuchet MS"></font> <p><font face="Trebuchet MS">The importing of orders required reading some reasonably large text files, parsing them (which was implemented using the classic <a href="https://en.wikipedia.org/wiki/Lex_(software)">Lex</a> & <a href="https://en.wikipedia.org/wiki/Yacc">YACC</a> toolset) and pushing them into the database whereupon the engine would find them and work out a schedule for their delivery.</font></p> <font face="Trebuchet MS"></font> <p><font face="Trebuchet MS">Initially this importer was packaged as an ActiveX control, written in C and C++, and hosted inside the PowerBuilder (PB) based GUI. Working on the engine side (written entirely in C) we had created a number of native test harnesses (in C++/MFC) to avoid needing to use the PB front-end unless absolutely necessary due to its generally poor performance. Up until this point the importer appeared to work fine on our dev workstations, but when it was passed to the QA a performance problem started showing up.</font></p> <font face="Trebuchet MS"></font> <p><font face="Trebuchet MS">The entire team (developers and tester) had all been given identical Compaq machines. Given that we needed to run Oracle locally as well as use it for development and testing, we had a whopping 256 MB of RAM to play with along with a couple of cores. The workstations were running Windows NT 4.0 and we were using Visual C++ 2 to develop with. As far as we could see they looked and behaved identically too.</font></p> <p><font face="Trebuchet MS"><strong>The Problem</strong></font></p> <font face="Trebuchet MS"></font> <p><font face="Trebuchet MS">The initial bug report from the QA was that after importing a fresh set of orders the scheduling engine run took orders of magnitude longer (no pun intended) to find a solution. However, after restarting the product the engine run took the normal amount of time. Hence the conclusion was that the importer ActiveX control, being in-process with the engine, was somehow causing the slowdown. (This was in the days before the <a href="https://docs.microsoft.com/en-us/windows/desktop/memory/low-fragmentation-heap">low-fragmentation heap</a> in Windows and heap fragmentation was known to be a problem for our kind of application.)</font></p> <font face="Trebuchet MS"></font> <p><font face="Trebuchet MS">Weirdly though the developer of the importer could not reproduce this issue on their machine, or another developer’s machine that they tried, but it was pretty consistently reproducible on the QA’s machine. As a workaround the logic was hoisted into a separate command-line based tool instead which was then passed along to the QA to see if matters improved, but it didn’t. 
Restarting the product was the only way to get the engine to perform well after importing new orders and naturally this wasn’t a flyer with the client as this would happen in real-life throughout the day.</font></p> <font face="Trebuchet MS"></font> <p><font face="Trebuchet MS">In the meantime I had started to read up on Windows heaps and found <a href="https://docs.microsoft.com/en-us/windows/desktop/api/heapapi/">some info</a> that allowed me to write some code which could help analyse the state of the heaps and see if fragmentation was likely to be an issue anyway, even with the importer running out-of-process now. This didn’t turn up anything useful at the time but the knowledge did come in handy <a href="http://www.chrisoldwood.com/articles/utilising-more-than-4gb.html">some years later</a>.</font></p> <font face="Trebuchet MS"></font> <p><font face="Trebuchet MS">Tests on various other machines were now beginning to show that the problem was most likely with the QA’s machine or configuration rather than with the product itself. After checking some basic Windows settings it was posited that it might be a hardware problem, such as a faulty RAM chip. The Compaq machines we had been given weren’t cheap and weren’t using cheap RAM chips either; the POST was doing a memory check too, but it was worth checking out further. Despite swapping over the RAM (and possibly CPUs) with another machine the problem still persisted on the QA’s machine.</font></p> <font face="Trebuchet MS"></font> <p><font face="Trebuchet MS">Whilst putting the machines back the way they were I somehow noticed that the motherboard revision was slightly different. We double-checked the version numbers and the QAs machine was one <em>minor</em> revision lower. We checked a few other machines we knew worked and lo-and-behold they were all on the newer revision too.</font></p> <font face="Trebuchet MS"></font> <p><font face="Trebuchet MS">Fortunately, inside the case of one machine was the manual for the motherboard which gave a run down of the different revisions. According to the manual the slightly lower revision motherboard only supported caching of the first 64 MB RAM! <font face="Trebuchet MS">Due to the way the application’s memory footprint changed during the order import and subsequent cache reloading it was entirely plausible that the new data could reside outside the cached region [1].</font></font></p> <p><font face="Trebuchet MS"><font face="Trebuchet MS"><font face="Trebuchet MS">This was enough evidence to get the QA’s machine replaced and the problem never surfaced again.</font></font></font></p> <font face="Trebuchet MS"></font> <p><strong><font face="Trebuchet MS">Retrospective</font></strong></p> <font face="Trebuchet MS"></font> <p><font face="Trebuchet MS">Two decades of experience later and I find <font face="Trebuchet MS">the way this issue was handled as rather peculiar by today’s standards.</font></font></p> <font face="Trebuchet MS"></font> <p><font face="Trebuchet MS">Mostly I find the amount of time we devoted to identifying this problem as inappropriate. Granted, this problem was weird and one of the most enjoyable things about software development is dealing with “interesting” puzzles. I for one was no doubt guilty of wanting to solve the mystery <em>at any cost</em>. We should have been able to chalk the issue up to something environmental much sooner and been able to move on. 
Perhaps if a replacement machine had shown similar issues later it would be cause to investigate further [2].</font></p> <font face="Trebuchet MS"></font> <p><font face="Trebuchet MS">I, along with most of the other devs, only had a handful of years of experience which probably meant we were young enough not to be bored by such issues, but also were likely too immature to escalate the problem and get a “grown-up” to make a more rational decision. While I suspect we had experienced <em>some</em> hardware failures in our time we hadn’t experienced enough weird ones (i.e. non-terminal) to suspect a hardware issue sooner.</font></p> <font face="Trebuchet MS"></font> <p><font face="Trebuchet MS">Given the focus on performance and the fact that the project was acquired from a competing consultancy after they appeared to “drop the ball” I guess there were some political aspects that I would have been entirely unaware of. At the time I was solely interested in finding the cause [3] whereas now I might be far more aware of any ongoing “costs” in this kind of investigation and would no doubt have more clout to short-circuit it even if that means we never get to the bottom of it.</font></p> <p><font face="Trebuchet MS">As more of the infrastructure we deal with moves into the cloud there is less need, or even ability, to deal with problems in this way. That’s great from a business point of view but I’m left wondering if that takes just a little bit more fun out of the job sometimes.</font></p> <font face="Trebuchet MS"> <p><font face="Trebuchet MS"> </font></p> </font> <p><font face="Trebuchet MS"><font face="Trebuchet MS">[1] This suggests to me that the OS was dishing out physical pages from a free-list where address ordering was somehow involved. I have no idea how realistic that is or was at the time.</font></font></p> <p><font face="Trebuchet MS"><font face="Trebuchet MS">[2] It’s entirely possible that I’ve forgotten some details here and maybe more than one machine was acting weirdly but we focused on the QA’s machine for some reason.</font></font></p> <p><font face="Trebuchet MS">[3] I’m going to avoid using the term “root cause” because we know from <em><a href="https://web.mit.edu/2.75/resources/random/How%20Complex%20Systems%20Fail.pdf">How Complex Systems Fail</a></em> that we still haven’t gotten to the bottom of it. For example, where does the responsibility for verifying the hardware was identical lie, etc.?</font></p>Chris Oldwoodhttp://www.blogger.com/profile/18183909440298909448noreply@blogger.com1