Friday, 13 February 2026

The Illusion of a One-Time Set-Up

One of the most laborious things about starting work at a new client / employer can be getting your machine and user account configured so that you are able to work on their codebase. For me, the acid test of whether I’m in a good place is being able to build the code and locally run the test suites that I’ll be relying on for that early feedback loop. But that’s just the bare minimum.

There are often plenty of tools to install and configure, and I’m not just talking about light vs dark mode, but team and organisation-level stuff once you need to reach out across the network to dev, test, and production services. The list of permissions for your user account can be extensive if you’re working on a system that has lots of tentacles reaching out to message queues, databases, APIs, etc., especially when the organisation doesn’t use single sign-on everywhere.

One-Stop Shop

Sometimes this set-up process goes really smoothly and you can bootstrap yourself with very little effort, while other times it’s a long, hard slog. In the corporate world there are typically more gates to go through as, for example, local admin rights are not conferred by default and your software choices are limited to what the organisation grants access to. (See Getting Personal for a rant about where consistency in tooling actually matters.) Once you have access to the version control system, and the VCS tool installed, in theory you have the gateway to getting yourself set up with a metaphorical “flick of the switch”…

If only that were always the case. Sadly it’s not unusual to be given little to nothing to work from. If the team uses a complex IDE like Visual Studio it can often be assumed that once you have that installed you’re 99% of the way there. This is only true if you believe software developers only “write code”.

Maybe you are given a wiki page which helpfully lists many of the tools you need (to request from the company’s software portal) but probably neglects to tell you how to configure them for the team’s typical workflow and development environments. The kinds of things I’m talking about here are local / shared databases (including drivers and DSNs, or containers), the various Azure emulators for storage and queues (or shared cloud-hosted instances), shared AWS resources, upstream and downstream in-house dependent services, etc. Okay, you may not have the required access up-front, but getting the approval signed off should be the only burden; once you have that you shouldn’t need to fumble around trying to work out how to make use of your new-found powers.

Once Upon a Time

The reason typically given for not providing a better DX (developer experience) is that this is a “one-time set-up” and so the cost of improving it obviously isn’t worth paying. But here’s the thing: while it may be a one-time set-up for that particular user and machine, it’s just one of many when you factor in all the places where this kind of workflow will be needed in practice.

Okay, so you don’t normally get people joining the team every few days, but configuring a local developer’s machine is the least interesting case, in my opinion. Where this blinkered one-time set-up thinking really starts to cause a problem is once you factor in the build and deployment pipelines.

The whole “works on my machine” meme exists because it highlights the lack of appreciation for what goes into turning a pending code change into a feature deployed across the real estate, or onto users’ desktops and/or phones. All the other machines that are required to build and run the same code and tests which you have on your machine need to be set up too.

Automation Friendly

While you might build a demo-able artefact on a developer’s machine, any release binaries or deployment packages will always be built in a “clean room” environment because a developer’s desktop is typically tainted with the results of experiments with ad-hoc code and tooling. While in the (not so) distant past we might have carefully built and maintained the build and deployment pipeline servers, and the dev, test, and production servers, by hand (aka snowflakes), those days should be long behind us. The rise of virtualisation and the purity gained from the “immutable infrastructure” movement mean that the various steps in our once “one-time” set-up are now repeated, over and over again. This is even more apparent when the unit of delivery is an entire VM or container rather than just an application package. (Not seeing the similarities between how you build and test locally versus the entire delivery pipeline is a topic I covered way back in 2014 in Building the Pipeline - Process Led or Product Led?)

What this effectively boils down to is having an automation mindset. While the meme tells you to “automate all the things”, and that is a laudable goal, I’ve seen the pendulum swing too far, so I prefer the more pragmatic Automate Only What You Need To. Pedantry aside, the key point is that you think about how best to share the “process” you’re just about to discover. While it may be quicker for you to use a UI to perform this particular task now, if there is any possibility that other people or machines will need to perform it too, or that it will be used as part of some automated process, then it behoves you to spend a little time looking at whether there is an automation-friendly approach worth exploring instead.

MVP – Minimum Viable Process

Maybe you don’t have the time right now to write a nice little script that does “all the things”, but have you considered whether there is an approach which at least leans into that? For example, instead of sharing a URL for downloading a tool that then has to be manually installed, see if it’s available via a package manager, which can later be scripted as part of the larger workflow.
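To give a flavour of what I mean (the package here is purely illustrative – substitute whatever tool the team actually needs), compare a wiki instruction that says “download the installer from the vendor’s site and click through the wizard” with a one-liner for one of the Windows package managers, which can be pasted into a console today and dropped into a bootstrap script tomorrow:

    winget install --id Git.Git -e

The Chocolatey equivalent (choco install git -y) is just as script-friendly; either line documents the step and automates it at the same time.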

While automation has always been Linux’s strong suit, Windows has improved hugely over the years, such that many tasks which were once only accessible via the GUI can now be performed with a “one-liner”, if you ask your favourite search engine the right question. In essence, instead of asking “how do I do X?” you need to ask “how do I do X from the command line?” to access this Other World.
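As a concrete example (the server, database, and driver names here are entirely made up), the DSNs mentioned earlier don’t have to be clicked together in the ODBC Data Source Administrator – modern versions of Windows ship a PowerShell cmdlet that does the same job in one line, assuming the relevant driver is already installed:

    Add-OdbcDsn -Name "OrdersDb" -DsnType "User" -DriverName "ODBC Driver 17 for SQL Server" -SetPropertyValue @("Server=dev-sql01", "Database=Orders", "Trusted_Connection=Yes")

That line can sit on the wiki page now and be lifted verbatim into the team’s set-up script later.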

Console-ation Prize

Every time someone documents a process using a series of screenshots a kitten dies. Taking screenshots is labour-intensive, and the result is far more likely to go out of date because vendors love to add new features and give their tools a facelift. In contrast, a one-liner (or a few lines) in a monospaced font on a wiki page is practically timeless and almost impossible for someone to mess up. It’s also easy for someone else to pick up later and turn into a script.

You don’t need to be a DevOps kind of person to appreciate the simpler things in life, just someone who enjoys paying it forward when possible.

Monday, 26 January 2026

The Case of the Disappearing App

[Despite this being almost 30 years ago, and the finer details being lost in the mists of time, the punchline has always stayed with me – it’s scar tissue I guess :o).]

I was contracting in a team which had taken over a codebase from another company, to try and improve the performance, as it was far below what was needed for even the smaller use cases they had in mind.

The front-end was a PowerBuilder-based GUI and I was a C developer working on the in-process back-end library, which was a scheduling engine (think: travelling salesman problem). To be more precise, the actual scheduling engine was written in Pascal which was then translated into C and statically linked into the back-end library, which provided the infrastructure code – reading & writing to the database, a large caching layer, the C API for PowerBuilder to interact with the engine, etc.

One day there was a report from the QA team (which was effectively one guy – the same guy I mentioned before in A Not So Minor Hardware Revision) about the application just “disappearing”. There was no error message from the application, and no UAE / GPF-style crash dialog from the OS (Windows NT 4 at the time) either.

We could reproduce the problem fairly easily. It basically involved importing a new set of orders and running the scheduling engine. This was pretty much what we had been doing all along to profile the application and address the various performance issues we ran across. We had some nice test datasets that we kept using across the entire team and I suspect (now) he was probably using a newer one [1].

What was weird was that there was no trace of the application terminating. If it was crashing, Windows would have told us and allowed us to attach the debugger. I even ran the application under the debugger and made sure I’d enabled “first chance exceptions” in case any Structured Exception Handling was being triggered and swallowed (it was C, not C++, so there would be no language exceptions per se). There was no mention of anything unusual in the event log either.

In the end I resorted to single-stepping into the back-end and slowly drilling down into the code to see if I could work out whether something in our code was causing this. The core engine used a Tabu Search and called back out repeatedly to get various bits of data like product details, driving time between two points, etc. It was dying somewhere inside the core engine, but due to the cyclic nature of the execution it was hard to put a breakpoint anywhere useful. (This was Visual C++ 2.0, and conditional breakpoints had a habit of slowing things right down, which added to the debugging burden as the failure didn’t trigger that quickly as it was.)

During my debugging attempts I began to notice that the memory footprint was spiking, and I had a suspicion that ultimately the cause was an out-of-memory problem. Native applications typically behave pretty badly when they run out of memory as it tends to lead to dereferencing a NULL pointer and the process unceremoniously crashing. But, as pointed out earlier, the Windows crash handler was not being invoked, so something else was going on.

Eventually I decided to put conditional breakpoints on the prologue of the CRT functions malloc() and calloc() in the hope that I could catch one of them returning a NULL pointer. It worked, and as I slowly walked back up the function stack I could see why the application simply disappeared – the error handling just called abort()! Although this was quite an abrupt termination from a user’s perspective, from the Windows OS point of view the process had exited gracefully and therefore no crash handler was needed. Under the covers I suspect abort() called TerminateProcess(), so the PowerBuilder GUI would not have been able to catch the process’ impending doom either. If I remember correctly we found a couple of these calls to abort() and removed them so that the error could attempt to bubble back up the stack and the user would at least get notified, one way or another, that the application was in an unstable state.
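To illustrate the shape of the problem, here’s a minimal sketch in C (the names are hypothetical – the original code is long gone). The first helper is the pattern we found: on allocation failure it calls abort(), which ends the process via a normal exit path, so Windows shows no crash dialog and the GUI simply sees the process vanish. The second is roughly the direction we took instead, reporting the failure to the caller so it can bubble back up through the C API to the front-end.

    #include <stdlib.h>

    /* Hypothetical reconstruction - not the original code. */

    /* The pattern we found: abort() terminates the process without the
       OS fault-handling path being involved, so Windows sees a normal
       process exit - no GPF-style dialog, no crash handler, nothing in
       the event log - and the app just "disappears" from the user's
       point of view. */
    static void* alloc_or_die(size_t size)
    {
        void* block = malloc(size);
        if (block == NULL)
            abort();
        return block;
    }

    /* Roughly what we changed it to: report the failure so the caller
       can propagate an error code up the stack and the front-end can
       at least tell the user something has gone badly wrong. */
    static int alloc_or_fail(size_t size, void** out_block)
    {
        *out_block = malloc(size);
        return (*out_block != NULL) ? 0 : -1;  /* 0 = success, -1 = out of memory */
    }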

As an aside, this wasn’t the only memory issue we had on that project. Not counting the bizarre performance issue in A Not So Minor Hardware Revision, we also suffered badly from heap fragmentation because this was before the days of the Low-Fragmentation Heap. What I learned along the way debugging these, and other memory and heap issues elsewhere, paved the way for my first proper ACCU article, Utilising More Than 4GB of Memory in a 32-bit Windows Process.

 

[1] I also seem to remember we were running Oracle locally so that each developer had their own database for testing, which would have affected the overall memory footprint of the machine.