Friday, 2 May 2025

Codurance AI Hackathon

This is quite a long post (15 mins) and if you’re only interested in my final musings, and not the day itself, just skip to the last two sections.

Last Saturday I got to attend an event at Codurance’s offices in London around the use of AI-based tooling to aid software delivery. I did not have much experience with these kinds of tools, in part because my current client uses their own in-house programming language [1]. Hence, my experiences to date have been limited to some messing around in VS Code with Copilot on some fairly simple C# katas.

Event Format

There were about twenty of us spread across a range of ages but, generally speaking, all fairly experienced programmers. We were split into two teams (A and B) and got to tackle two problems – one in the morning, and another in the afternoon. We had 2 hours on each problem, with one team allowed to use AI tooling and the other not. Then, in the afternoon, we switched roles so that both teams got to try with and without AI assistance across the day. We also worked in pairs or threes. Both problems were very similar: effectively a web-based service consisting of a backend API, which used some kind of database, and a frontend UI. If you’re thinking: an online clothing store / IMDB, then you’d be pretty close to the two exercises.

There were some questions about what counted as AI in the context of search engines, as Google, for instance, now puts its own AI spin on the results. The rule was clarified as meaning no direct use of AI tooling for code generation or suggestions; the non-AI teams had to rely on search engines / Stack Overflow, blog posts, docs, etc.

Another question (from yours truly) was about how much of a “hackathon” this really was, or whether we should approach it more like we were building something for real. The answer was that we should try and treat it more like a real project than something discardable like a prototype.

The Non-AI Exercise

I was in Team B, and for the morning session we weren’t allowed to use any AI tooling for our online clothing store. Ironically, disabling it to avoid biasing the results, while still leaving the traditional IntelliSense and refactoring tools enabled, proved non-trivial. (Where does one end and the other begin…)

Our group of three had assembled because we all had C# in common. I’ve not done any non-trivial web UI work for twenty years, and it’s also been 5 years since I’ve done any production C# [1] so I wasn’t exactly up on the in-vogue architecture choices. Consequently I left it to the other two to suggest a Blazor based approach, which suited me as I had at least done some work with Razor 8 years ago, so was familiar with the underlying concepts.

The dataset we were given was hosted by Kaggle and came as a ZIP file that included a CSV file and some images. Trying to obtain this without having to sign up to their service was another small barrier, eventually overcome when I unearthed the curl instructions, which thankfully didn’t require any authentication.

We all agreed to start by creating a walking skeleton that was a simple Blazor app. With some guidance from my colleagues, I used VS Code to knock that up and we were off with a basic web page in the browser.

We then looked at the dataset in a little more detail to see what shape it was in, and there was some discussion about next steps, such as pulling that entire CSV file in. That came off the back of the initial discussion about the model we should use to back the service. Yes, the categories aspect was an interesting modelling problem, but I wanted to just create a simple model with ID, Description, and Price, then hardcode a list of three items, and get that visible on the page to complete our skeleton. That idea was accepted and we got it working quite quickly.
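For illustration, the model and hard-coded list amounted to something like the following sketch (the property values are made up here rather than being our actual data):

using System.Collections.Generic;

// A deliberately simple model - just enough to get something visible on the page.
public class Product
{
    public int Id { get; set; }
    public string Description { get; set; } = string.Empty;
    public decimal Price { get; set; }
}

// Hard-coded items to complete the walking skeleton before any real CSV data was wired in.
public static class ProductCatalogue
{
    public static readonly IReadOnlyList<Product> Items = new List<Product>
    {
        new Product { Id = 1, Description = "Blue T-Shirt", Price = 9.99m },
        new Product { Id = 2, Description = "Black Jeans", Price = 24.99m },
        new Product { Id = 3, Description = "Red Hoodie", Price = 19.99m },
    };
}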

At this point we all got a bit nervous as we hadn’t got any tests in our skeleton. Luckily one of us already had some experience with bUnit (a testing framework for Blazor), so we managed to get a test project and initial test in place relatively easily that checked for one of our hard-coded items being on the page.
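To give a flavour, the test looked roughly like the sketch below (the Products component name is illustrative, and I’ve assumed xUnit as the test runner):

using Bunit;
using Xunit;

// bUnit renders the Blazor component in-memory so we can assert against the resulting markup.
public class ProductsPageTests : TestContext
{
    [Fact]
    public void Products_page_lists_a_hard_coded_item()
    {
        var cut = RenderComponent<Products>();

        // One of our hard-coded items should appear somewhere in the rendered HTML.
        Assert.Contains("Blue T-Shirt", cut.Markup);
    }
}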

This paved the way for starting to add new features in a test-first manner and, with what time we had left of the two hours, we started to work on the test for the product details page that you reach by selecting an item on the products page. We had the failing test in place when the metaphorical whistle blew for half-time.

The AI Assisted Exercise

After lunch, where we had a chance to mingle and chat to others about their experiences so far, we tackled the IMDB clone with the addition of AI assistance.

We chose to remain in the same group, while others paired up with different people. We also thought it would be useful to use the same stack as that seemed like a useful comparison, although we did switch laptops and consequently from VS Code to JetBrains’ Rider. Once again there was some fiddling about with settings.

None of us had ever tried creating an entire project from scratch using an AI tool so we threw the README into the chat window prefixed by a sentence that asked it to create the entire solution and projects using Blazor as the stack.

The chat window started spewing out snippets of code which we then started to look at. I think we were all quite surprised about how plausible the code looked. We had some minor concerns about some of its design choices but it all looked pretty sane as a starting point. We hadn’t asked it to generate any tests, preferring instead to let this become an additional change later.

Trying to turn the suggested code in the chat window into an actual solution structure in Rider turned out to be our first time-sink. The UI was a little confusing and the laptop owner hadn’t toggled a setting which we later discovered allowed it to automatically create the files itself. Instead, to make progress in the meantime, we ended up creating the basic solution and project files ourselves, just like before. By the time we found the necessary setting we already had that aspect in place. However, once we enabled it we could at least take the suggestions for the model and Razor pages from the AI output.

This was the point when we discovered that what looked plausible wasn’t entirely correct. For example, the code comment suggesting the relative path of a file didn’t match the file it created, and the layout didn’t quite follow the Blazor conventions. Hence, we moved a couple of files to follow the expected layout.

That said, when we built the code it worked first time, and the web page appeared and looked totally sane. One of us remarked immediately that just getting the Bootstrap (UI) set-up right can be a real pain in practice, and that alone was a real blessing.

Once we tried to interact with the page, though, we found it didn’t work. Cue the next time-sink: getting the filter feature to actually work. Naturally, we asked the AI tool what was wrong and it very confidently diagnosed the problem and proposed a fix. But when I looked closely at the fix I pointed out that it had just rewritten the existing code in a more complex manner while leaving it functionally the same. We went with it anyway and, of course, it didn’t work.

Another pointless suggestion from the AI tool about how to fix that didn’t inspire confidence at this point, so we resorted to using the debugger to see if the code was even being triggered. It wasn’t, although inspecting the DOM in the browser looked correct. I had raised a question earlier about session state management which didn’t sink in until a little later, when our Blazor expert realised the page was missing the “interactive server” render mode declaration at the top of the Razor page. Once that was added things started working.
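For anyone who hasn’t bumped into this before, the fix was essentially a directive like the one below at the top of the page (a sketch assuming a .NET 8 style Blazor Web App with the default template imports; the route is made up):

@page "/movies"
@rendermode InteractiveServer

@* Without an interactive render mode the page is statically rendered, so
   event handlers such as @onclick never reach the server and nothing updates. *@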

It’s hard to know whether the impedance mismatch arose because we manually created the project files – perhaps the AI-generated ones would have matched the code it was producing – or whether it would still have been wrong.

Sadly, all this friction meant we had little time left to really explore the AI tool in the context of making changes to the existing codebase to add new features or refactor the code. We asked it a few cursory questions to see what it would suggest, and it always replied in a very confident way but would include changes that reverted some of the edits we had made during our debugging attempts. I don’t know if that was a result of us not resetting the “context window”, given the code it kept reverting to was what it had originally suggested right at the beginning.

Reflections on the Experience

First up, I don’t know how the others in my little group felt, but I was woefully underprepared for this day. With only two hours for each exercise we should have been in a position to hit the ground running, and I just didn’t have enough experience with configuring or using the AI tool to make the most effective use of the little time available.

Related to that, the problem domain wasn’t something I have any real experience of either. While this might have been useful for the “AI as a learning tool” aspect, I’m not sure it helps when trying to compare the two approaches.

With any day like this, where you are working with other people you’ve never met before, some element of “team dynamics” is going to distort the picture. My preference is to work in small steps, really small steps, and deciding whether to assert that or “go with the flow” is a much bigger side-quest than in an established team, where you already know everyone’s favoured approach and politeness levels are well calibrated.

When you only have two hours there is a clear conflict between doing what you really would do on a real project versus exploring the goals of the exercise. For example, due to my C# being a little rusty I wrote a classic POCO style class for the model instead of using the more modern C# record types, which I was aware of but had never actually used in practice (a sketch of the record equivalent is below). This prompted a short conversation about our feelings around immutability and anaemic models, which is something a real team needs to resolve, but we simply didn’t have time for that despite the “not really a hackathon” premise. Likewise, I have opinions on naming, design, etc. which I just had to put to one side this time.
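For comparison, the record equivalent of the kind of POCO sketched earlier is a one-liner (again, purely illustrative):

// Immutable, with value-based equality - the more modern alternative to the mutable POCO.
public record Product(int Id, string Description, decimal Price);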

However, I forgot about parking those opinions when writing our second test (in the first exercise) and naturally wrote a test name which opened a discussion about whether we were testing the navigation aspect or the state of the resulting page. Again, I think the approach to testing – the kinds of tests we want to write – is important in a real project. It would have been interesting to see what kinds of tests the AI tool would have generated, as I see that being touted as a big use case, but given how poorly tests are commonly written I’m sceptical it has the corpus available to have “good taste”.

As I’ve already said above, I think we were all impressed with how much the tool churned out from the README, but we were also aware that online stores and review sites are a well trodden path, with potentially a lot of example code to draw from. Other attendees got a lot further than we did, with one pair (using NodeJS) completing the exercise and having time to make up new features! In the post-exercise discussion one of that pair questioned how it would fare for, say, some embedded software where there are far fewer examples to train on.

The overconfidence in the replies when we were trying to fix the code was scary. I think we were all sceptical already, and seeing it behave like this did not endear it to us at all. Plus, its habit of reverting our code changes, rather than only changing the code it needed to when adding new features, went against our agreed approach of only changing one thing at a time. Maybe we are being too rigid here and should cede more control in those early moments before we “hit production”?

We only got to work on code we had helped to write ourselves. It might have been interesting to have swapped at half-time and continued working on one of the other team’s AI-generated codebases, to see what it might be like to pick up one of those, as that is surely the situation we’ll be finding ourselves in fairly soon.

Epilogue

I really enjoyed the day and am most grateful for the experience, but I don’t think I contributed that much personally to the experiment because of my current lack of knowledge around using these kinds of tools in practice.

I’ve only used Copilot integrated into VS Code on my laptop, but would like to try some others based on the other group’s experiences, as long as they have tight integration into an IDE. I like baby steps and an “always be ready to ship” mentality, which I’d feel uncomfortable giving up at this point in time. Used that way it feels more like enhanced auto-complete based on what I’m already typing directly in the code, rather than writing external prompts in a separate window, which feels like unnecessary context switching when writing code.

Once I am more used to these new tools I’d like to try the exercise again and go “all in” and see what it chucks out, and then spend some significant time trying to work with that codebase to get a better feel for how much the AI tool wants to change each time. Automating the boilerplate stuff has to be worth exploring, but the question is how much leash to give it?

There is definitely a difference between building a disposable prototype and making changes in a large mature codebase. I think I’ve got more of a taste of the former now (coupled with my own fumblings beforehand) but I’m going to need more time to explore how this fits in with the latter as that’s my day job. At least I feel a lot more informed now than I did a week ago.

 

[1] My client has their own pure, functional, dynamic language which is used for much of the company’s reporting, scripting, and general gluing together of business workflows.

 

Saturday, 5 April 2025

A Decade of Lightning Talks with Programming One-Liners

Note: This blog post follows the modern recipe style – a load of unimportant background history before presenting the real content. So, just hit page-down a few times if that’s what you came here for, I won’t be offended.




Just over 10 years ago, fellow ACCU member Jez Higgins replied to another one of my programming one-liners on Twitter with a flippant remark that I probably had enough material now for an entire conference talk…

Lightning Talks

The ACCU conference generally hosts 90-minute talks. Of course, there was no way anybody would sit through a session of that length, even if I had enough “quality” programming puns to fill it, and even if the conference committee were crazy enough to accept it as a serious proposal.

However, another popular feature of the ACCU conference was the lightning talks, which were only 5 minutes long and held at the end of each day. The barrier to entry there was considerably lower too, as they could cover virtually any topic. Also, I’d already given a few of these in the past, all with programming related humour as the general theme, e.g.

2010: Recycle Bin 101 – a Room 101 affair featuring some of Microsoft’s tools which refused to die gracefully back then, such as Visual C++ 6, Visual SourceSafe, and IE 6.

2012: Not Only, But Also – a riff on the #NOSQL and #NOESTIMATES movements featuring my own variation: #NOSHIT.

2014: The Art of Code – a selection of code snippets framed as art, along with humorous titles. (This was inspired by Chopin’s Waterloo at the Pompidou Centre in Paris.)

2014: Requiem for Windows XP – a recital of Rutger Hauer’s final words in Blade Runner followed by the shutdown screen for Windows XP, to mark its demise as Microsoft had finally declared its official end of life.

An Actual Comedy Routine

With Jez’s tweet in mind I decided that for 2015 I would go for a more formal stand-up comedy routine and read out what I hoped were some of my “best” and most appropriate puns for the ACCU conference demographic: typically highly technical programmers, largely from the C++ community, and mostly of a certain “vintage”. With the agile movement in full swing at the time, “The Daily Stand-Up” seemed like the perfect play on words to use for the title. The set of 30 or so puns was delivered in a deadpan style to an audience of 300+ attendees, and the reaction was such that I considered it a success, and definitely worthy of further attempts in the future.

Over subsequent years I “performed” similar routines at Agile on the Beach, the Equal Experts Christmas Party, Agile in the City Birmingham, NorDevCon, my Spektrix leaving do, and most recently, the D language conference. While those were mostly only an occasional gig or two, the lightning talks at the ACCU conference became a more regular fixture (aside from the Covid years when it wasn’t run / was purely online).

10 Years Later

And so here I am, a decade later, at the ACCU 2025 conference with another mixture of programming puns which features some old favourites, more recent material, and even some as-yet-unpublished (i.e. not tweeted, yet) material. So, if you missed out on the “live performance”, or are simply a glutton for punishment, then you can relive the experience (groans and winces entirely optional) by reading these to yourself. Enjoy!?

“The background AI in my text editor just asked me: are they paying you to write this crap? I thought that was a bit harsh, but then remembered I’d enabled dark mode.”

“I’ve started creating a typeface where each glyph is a different page from Wikipedia. It’s going to be the font of all knowledge.”

“If Bitcoin is value realized as ones and zeroes, does that effectively make it Gold Boolean?”

“Have you noticed that you never get a simple yes/no answer from people in the Tri-State area?”

“If I join the frontend channel, backend channel, database channel, and operations channel, does that make me a Full Slack Developer?”

“Now I do more working from home I like to get my kids to help me with my coding. I call it au-pair programming.”

“The trouble with pair programming in C++ is that your codebase becomes littered with .first and .second.”

“Our company is a big fan of the Pimpl Idiom. They like to hire spotty teenagers that vibe code C++.”

“I’ve always felt algorithms that round down to the nearest integer are inherently floored.”

“When first learning C++ I was given advice that I should ‘do as the ints do’. So now I make sure my code behaves slightly differently on different platforms.”

“Is a Polyglote a programmer that boasts about how many programming languages they know?”

“I once asked the Enterprise Architect if there were any non-function requirements? He said: yes, you can’t use Lisp, Haskell, F#, Elm, …”

“He clearly favoured an object-oriented approach – anything we wanted to try that was different, he’d simply object.”

“I wonder if Google have postponed the next major version of their popular programming language after Dijkstra asserted that Go 2 is considered harmful?”

“I was really excited to discover Tony Hoare and Dennis Ritchie were giving a talk about two of their most influential contributions to Computer Science, but quickly became disappointed when my tickets arrived and said ‘Null & Void’.”

“People who are PRINCE certified just want to run projects like it’s 1999.”

“I recently tried to use an LLM to write some code to produce a digital certificate, but it just made a hash of it.”

“The infosec team asked us if we regularly rotate our keys. I told them: yes, we often do that by passing them through ROT-13.”

“I once debated with Alonzo, the creator of Lambda Calculus, about whether lambdas and closures were the same thing. He said ‘No!’, and that’s when I discovered there was a clear separation between Church and state.”

“We’re really regretting hiring this guy Sisyphus as a DBA, everything he does just gets rolled back!”

“In the battle between relational and document oriented databases, I think it was when SQL introduced the PIVOT function that the tables were turned.”

“We recently recruited a prompt engineer. It’s great that he’s in the office every day at 9am sharp, but then he spends the rest of the day tweaking the PS1 variable in his Bash config.”

“Have you ever noticed that off-by-one errors know no bounds?”

“The Wildlife Trust has deemed our codebase a conservation area on account of the number of bugs.”

“Our product owner asked why our acceptance tests only cover the happy path. I told him: it’s because they’re rose-tinted specs.”

“The Marketing department said that as a company we needed to be more disruptive, so I dropped the production database and deleted all the source code.”

“The other day I passed a chap typing fast and furiously to try and quit his text editor. I think he was called Vim Diesel.”

“I reckon my colleagues only find my jokes amusing when I’m in the office. At least, I think that’s what they meant when they said: you’re not remotely funny.”

Thursday, 20 February 2025

How Do I Test This?

I’ve no idea where I first came across the following premise for when starting a new code change:

“First ask yourself, how do I test this?”

I had assumed it was Steve Maguire’s Writing Solid Code or Debugging the Development Process, as these are two of the earliest books I read as a professional developer. He has some excellent advice, such as “step through your code in the debugger” and “how could I have prevented / automatically detected this”, but it wasn’t either of those. No matter, I just like to remember my sources of inspiration.

Naturally, one shouldn’t take this advice too literally as there are other questions that really need answering first, such as whether this feature is even needed. The point is that once writing code is part of the solution, then thinking about how you’re going to test it should be right up there on your list. One behaviour that is common in those just starting out on their programming journey is the desire to jump right in and start writing code without any thought for how they are going to answer the eventual question “does it work?”. And, by extension, “does it solve the customer’s problem?”

For a bug-fix, that might seem obvious, if the bug is easy to reproduce, though there is still the follow-up question of “have I broken anything else by accident?” Discovering up-front what you might break can have a sobering impact on your approach.

For new features the question can be much harder to answer, especially if the change is buried deep inside some complex system. It’s all very well being able to change some code and assume it works because the dry-run in your head suggests it will, but it’s another thing to be able to show that it works by running it in an observable way. This is doubly true when you factor in that famous quote from Edsger Dijkstra:

“Testing can prove the presence of bugs, but not their absence!”

Design for Testability

What both of these quotes try to convey is that testing requires a change in mindset; it’s not just some activity you throw in when it’s “code complete”. I’d hope in this modern era of software development that we’d all recognise the folly of believing that you can just bash out some code in a non-trivial codebase and assume it’ll work first time, or even that you’ll obviously be able to see all the potential edge cases up front. (The average developer only spends 2-3 years in a role, so we’re constantly putting ourselves back in the position of “new joiner”.)

To be clear, I’m not specifically talking about TDD (Test Driven Development) here, at least, not in the formal unit test / Red-Green-Refactor sense. I’m talking about it in the more general sense that thinking about how you’re going to test something will affect how you approach the task at hand.

Plenty of what I do on a day-to-day basis does not have any formal automated tests produced as a by-product. For example, I am often heavily involved in the build, deployment, and support side of software delivery as much as in writing code for the main product. While the latter will have a barrage of automated tests to help validate every change, the former often relies on an amount of manual testing. That’s because the cost of introducing automated testing to a bunch of ad-hoc Bash glue scripts which rarely change is typically very high in comparison to the cost of testing them manually.

However, what makes the cost of testing them manually much lower is not their inherent nature, but the mindset used in how they were written – they were designed with testability in mind. What that means in practice is that instead of writing only the bare minimum to solve the problem at hand, some additional effort is made to allow them to be tested safely and quickly, both now and when future changes are made. It’s just another trade-off we have to be cognisant of and weigh up: is being easier to test worth the extra effort up-front? If so, how much easier is enough? (There is always an XKCD, and Is It Worth the Time along with Automation are well worth the time revisiting every now and then to remind yourself of the potential costs, pay-offs, and delusions of grandeur.)

Risk / Confidence

When I say “easier” you shouldn’t just assume I mean “quicker” either. Testing is about risk, and developing confidence: the goal is to deliver the highest level of confidence with the lowest amount of effort. Printing out a log message saying that you would have deleted a particular file gives you confidence that you’d have deleted the correct file, but it can’t point out that the entire process would have aborted because another process typically has the file locked, and that you might now need to add logic to cater for retries and partial success. The principle of YAGNI (You Ain’t Gonna Need It) is a strong force for reining in purely speculative requirements, but it can also be weaponised as an excuse for ignoring anything outside the happy path.

My point here is that you should consciously consider how you’re going to gain confidence and weigh up the options. Maybe when staring at that log message you’ll ask yourself “what would happen if that file is locked?” and consequently adjust the code to allow you to target a different folder where you can create a locked file and test your hypothesis. Is that a temporary hack or do you introduce a command line switch / config setting [1]? If you’re going to keep hacking the code every time you test new changes then you run a higher risk of committing that hack by accident and breaking production [2].
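To make that concrete, here is a minimal sketch of the shape I have in mind – written in C# for brevity, though the same idea applies to a Bash glue script, and the folder path and --dry-run switch are entirely my own invention:

using System;
using System.IO;

// A small clean-up task designed with testability in mind: the target folder is
// configurable so a test can point it at a scratch area, and a dry-run switch
// reports what would happen without actually doing it.
class PurgeTempFiles
{
    static void Main(string[] args)
    {
        bool dryRun = Array.Exists(args, a => a == "--dry-run");

        // Default to the production location, but let a test override it.
        string folder = args.Length > 0 && !args[0].StartsWith("--")
            ? args[0]
            : @"\\fileserver\exports";

        foreach (var file in Directory.EnumerateFiles(folder, "*.tmp"))
        {
            if (dryRun)
            {
                Console.WriteLine($"Would delete: {file}");
                continue;
            }

            try
            {
                File.Delete(file);
            }
            catch (IOException ex)
            {
                // A locked file is reported and skipped rather than silently aborting the run.
                Console.Error.WriteLine($"Skipping locked file '{file}': {ex.Message}");
            }
        }
    }
}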

Scope

We could very easily expand that original phrase “test this” to the more specific “test what happens when it fails”. Only testing the happy paths is such a common malaise, although I suspect it is largely borne out of naivety rather than short-sightedness. The cost of debugging code that fails in a horrible way is huge, and is frequently incurred during a production incident when time is of the essence. Taking the time up-front to think about failure scenarios and test how they will manifest will really help reduce stress levels when the inevitable happens. (Stack traces are a crutch we lean on far too often when a good error message could provide better value.)
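As a trivial example of what I mean, catching a failure at the point where the context is known and rethrowing with that context turns a bare stack trace into something actionable (the file and scenario here are invented):

using System;
using System.IO;

static class ConfigLoader
{
    public static string Load(string path)
    {
        try
        {
            return File.ReadAllText(path);
        }
        catch (IOException ex)
        {
            // Add the context a stack trace alone cannot give you: which file, and why it matters.
            throw new InvalidOperationException(
                $"Failed to load the pricing rules from '{path}' - the nightly import cannot continue without them.",
                ex);
        }
    }
}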

Another tweak would be to replace the “I” with “we” in the original proposition. The person who finds the code difficult to test in future might still be you, or it might be a colleague. I don’t particularly like the analogy of writing code as if some psychopath will hunt you down later, and prefer instead to promote “paying it forward” as simply the right thing to do. The ultimate outcome is a healthier team and codebase, and a faster, more sustainable delivery pace. Doing it right actually benefits the business in the long run.

Always be Testing

The question “how do I test this?” doesn’t necessarily stop being asked once the code has been delivered either. As one of my earlier posts (Validate in Production) shows, Dijkstra was right, and we need to be more pessimistic about our ability to get things right first time, every time. Remember, though, there is a balance here: if you take Dijkstra at his word then you’ll never deliver anything, as you’ll spend your entire time trying to prove the absence of any bugs. (Formal methods might be the exception here, but I have no experience of them, nor do I know anybody who has.)

It’s also only a short hop from that question to a closely related one – “how do I support this?” Ideally any separate support or operations team are already recognised as stakeholders in the system and their interests are explicitly factored in [3].

Taking this yet another step further leads you into the whole world of observability and ultimately around testing the business hypothesis of whether building that new feature delivered the benefits expected. Having that mindset where you can work backwards from “how to test the business impact” has yet more benefits that can be reaped.

Remember, Overthinking is not Overengineering.

 

[1] See Testing Drives the Need for Flexible Configuration for further thoughts on that approach.

[2] This is one of the many things in my mental Commit Checklist which has come from making these mistakes myself in the past.

[3] Both Support-Friendly Tooling and the much older From Test Harness To Support Tool look at this angle.