
Thursday, 28 November 2024

Using CoPilot-Like Tools is Not Pairing

Ever since the rise of LLM-based tools like ChatGPT and CoPilot there have been quips about what a great pairing partner they make. The joke is that these tools are passive and won’t trash your code, or disagree with your approach, or smell of garlic.

I get the joke.

The problem for me is that these jokes suggest that using tools like CoPilot is akin to pairing, but it’s not. And, as a consequence, they continue to perpetuate misconceptions about what pairing is really about. This topic came up one Tuesday at a Cambridge Software Crafters meet-up when we were discussing mentoring & coaching and how to adopt pairing in companies that are hostile towards it, likely for these very reasons.

(If you’re after a longer read on this subject I can offer you my 2018 C Vu article To Mob, Pair, or Fly Solo, or if you’re a video kind of person then you might want to watch the talky version I gave at the ACCU 2024 Conference, also titled Mob, Pair, or Fly Solo.)

Tools like CoPilot are there to help you with the “mechanics” of writing code [1]. They complement traditional tools like compilers and linters, which help you avoid a bunch of the annoying problems that come with expressing a solution in software. Your pairing partner may help you avoid these mistakes too, especially if you’re new to the language or tool, but that’s not primarily what they are there for.

Their input should focus on helping you to solve the actual problem. That might be at a design level, such as how best to represent the solution using common idioms or patterns, because they can see where it might be heading. They can also help come up with potential problems or edge cases that need to be factored in, and help decide how and when that might happen in the evolution of the feature you’re working on.

I’ve never paired in the strict navigator-driver sense; it’s always tended to be far more free-flowing, and likewise when ensemble programming. The keyboard tends to go back-and-forth, with “type, don’t talk” being the trigger to switch when it’s not blindingly obvious what my partner(s) are trying to convey.

Hence, in a TDD style loop the other half of the pair might already be thinking about the next test or where the design might be going for the refactoring step after the test passes. Sometimes one half is on a roll and it’s better to maintain momentum, other times both minds need to focus on the exact problem at hand, such as when the test fails unexpectedly. (If you both missed it then that is a smell worth exploring because something is clearly amiss somewhere.)

With modern IDEs and syntax highlighting the typist knows when there is a typo; they don’t need to be told by their partner. However, the way they drive the IDE and other tools can often be a point of discussion, as there are frequently shortcuts and extensions that provide easy wins. High-end tooling can try to point out other features you might find useful, and ChatGPT-like tools could probably summarise related features in popular tools if asked.

Humans often remember things better when the correction happens at the point of triggering, and what your partner can see that your tooling can’t is whether or not it’s an appropriate moment to make a suggestion. There is a time and a place to share knowledge, and part of the art of pairing is to read the mood and only interject if it’s likely to be beneficial now, or else note it for a retrospective conversation later. It may even resolve itself later.

Sharing knowledge takes practice. For instance, it means pitching the right level of detail when sharing techniques and idioms, to avoid overwhelming your partner while they are trying to focus on the problem at hand. The new generation of coding tools might be well versed in programming language details and common patterns but they don’t know anything about your organisation, team, or design principles. They might help with the small picture but they can’t help with the medium and bigger pictures, and that is the bit that makes software delivery hard. That is also the area where your partner should be adding the most value to the exercise.

Programming is so much more than just writing code, and programming in pairs and groups is about active participation to triangulate on a sustainable solution far quicker than working with passive tools and feedback.

 

[1] Maybe one day they’ll be able to consume your entire codebase, along with the VCS history, so that they can tell what’s old and what’s new and make better design suggestions too based on evolving trends. We’ll see…

 

Thursday, 19 July 2018

Delivery Anti-Pattern: Local Optimisations

The daily stand-up mostly went along as usual. I wasn’t entirely sure why there was a stand-up as it wasn’t so much a team as a bunch of people working on the same codebase but with more-or-less individual goals. Applying the microservices analogy to the team you could say we were a bunch of micro-teams – each person largely acting with autonomy rather than striving to work together to achieve a common goal [1].

Time

But I digress, only slightly. What happened at the stand-up was a brief discussion around the delivery of a feature with the suggestion that it should be hidden behind a feature toggle. The implementer explained that they weren’t going to add a feature toggle because “it was a waste of their time”.

This surprised me. Knowing what I do now about how the team operates it isn’t that surprising, but at the time I was used to working in teams where every member works towards common goals. One of those goals is to try and ensure the delivery of features is a continuous flow, not disrupted by a bad change that has to be backed out, because rolling back has the potential to create all sorts of disruption, not least the delay of the unaffected changes.

You should note that I’m not disagreeing with their choice of whether or not to use a feature toggle – I did not know anywhere near enough about the change or the system at that time to contribute to that decision. No, what disturbed me was the reason why they chose not to take that approach – that their time is more valuable than that of anyone else in the team, or the business for that matter.

In isolation that paints an unpleasant picture of the individual in question, and that simply is not the case. However their choice of words, even if said without real consideration, does appear to reinforce the culture that surrounds them. In essence, with a feeling that the focus is on them and their performance, they are naturally going to behave in a way that favours optimising the delivery of their own features over that of the team at large.

Quality

Another example of favouring a local optimisation over the longer term goal of sustained delivery occurred when I was assigned my first piece of work. This was not so much a story as a couple of epics funded as an entire project (over 4 months’ solid work in the end). My instinct, after being shown roughly where in the code I needed to dig, was to ask where the existing tests were so that I knew where to add mine. The tech lead’s immediate response was “you won’t have time to write tests”.

My usual response to this statement is a jovially phrased “how will I know if it works then?” which often has the effect of opening a line of dialogue around the testing strategy and where it’s heading. Unfortunately this time around it only succeeded in the tech lead launching into a diatribe about how important delivery was, how much the business trusted us to deliver on time, blah, blah, blah, in fact almost everything that a good test suite enables!

Of course I still went ahead and implemented the entire project TDD-style and easily delivered it on time because I knew the approach was sound and the investment was more than worth it. The subsequent enhancements and repaying of some technical debt also became trivial at that point and meant that anyone, not just me, could safely and quickly make changes to that area of code. It also showed how easy it was to add new tests to cover changes to the older parts of the component when required later.

In the end over 10% of the unit tests for the entire system had been written by me during that project, for a codebase of probably over half a million lines of code. I also added a command line test harness and a regression testing “framework” [2] in that time too, all in an effort to reduce the number of hoops you needed to jump through to diagnose and safely fix any edge cases that showed up later. None of this was rocket science or in the least bit onerous.

Knowledge

I would consider a lack of supporting documentation one further local optimisation too. When only a select few have the knowledge to help support a system you have to continually rely on their help to nurse it through the bad times. This is especially true when the system has enough quirks that the cost of taking the wrong action is quite high (in terms of additional noise). If you need to remember a complex set of conditions and actions you’re going to get it wrong eventually without some form of checklist to work from. Relying on tribal knowledge is a great form of optimisation until core members of the team leave and you unearth the gaping holes in the team’s knowledge.

Better yet, design away the problems entirely, but that’s a different can of worms…

Project Before Product

I believe this was another example of how “projects” are detrimental to the development of a complex system. With the team funded by various projects and those projects being used as a very clear division on the task board through swim lanes [3] it killed the desire to swarm on anything but a production incident because you felt beholden to your specific stakeholders.

For example, there were a number of conversations about fixing issues that were slowing down delivery through unreliability, which ended with “but who’s going to pay for that?” Although improvements were made they had to be so small as to not really affect the delivery of the project work. Hence the only real choice was to find easier ways to treat the symptoms rather than cure the disease.

Victims of Circumstance

Whenever I bump into this kind of culture my gut instinct is not to assume they are “incompetent” people; on the contrary, they’re clearly intelligent, so I assume they are shaped by their environment. Of course we all have our differences, that’s what makes diversity so useful, but we have to remember to stop once in a while and reflect on what we’re doing and question whether it’s still the right approach to take. What works for building Fizz Buzz does not work for a real-time, distributed calculation engine. And even if that approach did work once upon a time, the world keeps moving on and so now we might be able to do even better.

 

[1] Pairing was only something you did when you’d already been stuck for some time, and when the mistake was found you went your separate ways again.

[2] I say “framework” because it was really just leveraging a classic technique: a command line tool reading CSV format data which fired requests into a server, the results of which are then diff’d against a known set of results (Golden Master Testing).

[3] The stand-up was originally run in project order, led by the PM. Unsurprisingly, people were rarely engaged in the meeting unless it was their own project’s turn to be discussed.

Thursday, 17 May 2018

It Compiles, Ship It!

The method was pretty simple and a fairly bog standard affair; it just attempted to look something up in a map and return the associated result, e.g.

public string LookupName(string key)
{
  string name;

  if (!customers.TryGetValue(key, out name))
    throw new Exception("Customer not found");

  return name;
}

The use of an exception here to signal failure implied to me that this really shouldn’t happen in practice unless the data structure is screwed up or some input validation was missed further upstream. Either way you know (from looking at the implementation) that the outcome of calling the method is either the value you’re after or an exception will be thrown.

So I was more than a little surprised when I saw the implementation of the method suddenly change to this:

public string LookupName(string key)
{
  string name;

  if (!customers.TryGetValue(key, out name))
    return null;

  return name;
}

The method no longer threw an exception on failure; it now returned a null string reference.

This wouldn’t be quite so surprising if all the call sites that used this method had also been fixed-up to account for this change in behaviour. In fact what initially piqued my interest wasn’t that this method had changed (although we’ll see in a moment that it could have been expressed better) but how the calling logic would have changed.

Wishful Thinking

I always approach a change from a position of uncertainty. I’m invariably wrong or have something to learn, either from a patterns perspective or a business logic one. Hence my initial assumption was that I now needed to think differently about what happens when I need to “lookup a name” and that lookup fails. Where before it was truly exceptional and should never occur in practice (perhaps indicating a bug somewhere else) it’s now more likely and something to be formally considered, and resolving the failure needs to be handled on a case-by-case basis.

Of course that wasn’t the case at all. The method had been changed to return a null reference because it was now an implementation detail of another new method which didn’t want to use catching an exception for flow control. Instead they now simply check for null and act accordingly.

As none of the original call sites had been changed to handle the new semantics, a rich exception thrown early had now been traded for (at best) a NullReferenceException later or (worst case) no error at all and an incorrect result calculated based on bad input data [1].

The TryXxx Pattern

Coming back to reality it’s easy to see that what the author really wanted here was another method that allowed them to attempt a lookup on a name, knowing that in their scenario it could possibly fail but that’s okay because they have a back-up plan. In C# this is a very common pattern that looks like this:

public bool TryLookupName(string key, out string name)

Success or failure is indicated by the return value and the result of the lookup returned via the final argument. (Personally I’ve tended to favour using ref over out for the return value [2].)
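For completeness, a sketch of what that might look like, assuming the same customers map as the earlier examples (the fallback at the call site is purely illustrative):

public bool TryLookupName(string key, out string name)
{
  // Defer to the underlying map; the caller decides what a miss means.
  return customers.TryGetValue(key, out name);
}

// A call site with a back-up plan:
string name;

if (!TryLookupName(key, out name))
  name = "(unknown)";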

The Optional Approach

While statically typed languages are great at catching all sorts of type-related errors at compile time, they cannot catch problems when you smuggle an optional value through a null reference, as you can in languages like C# and Java. Any reference-type value in C# can inherently be null and therefore the compiler is at a loss to help you.

JetBrains’ ReSharper has some useful annotations which you can use to help its static analyser point out mistakes or elide unnecessary checks, but you have to add somewhat noisy attributes everywhere. However, expressing your intent in code is the goal, and it’s one valid and very useful approach.
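As a rough illustration, assuming the JetBrains.Annotations attributes, the original signature might be decorated like so, allowing the analyser to warn any caller that dereferences the result without first checking it for null:

[CanBeNull]
public string LookupName([NotNull] string key)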

Winding the clock into the future we have the new “optional reference” feature to look forward to in C# (currently in preview). Rather than bury their heads in the sand the C# designers have worked hard to try and right an old wrong and reduce the impact of Sir Tony Hoare’s billion dollar mistake by making nullability part of the type system.

In the meantime, and for those of us working with older C# compilers, we still have the ability to invent our own generic Optional<> type that we can use instead. This is something I’ve been dragging into C# codebases for many years (whilst standing on my soapbox [3]) in an effort to tame at least one aspect of complexity. Using one of these would have changed the signature of the method in question to:

public Optional<string> LookupName(string key)

Now all the call sites would have failed to compile and the author would have been forced to address the effects of their change. (If there had been any tests you would have hoped they would have triggered the alarm too.)
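For reference, the shape of such a type need only be something like this (a cut-down sketch, not the exact type I use, which has a few more conveniences):

using System;

public struct Optional<T>
{
  private readonly T _value;
  private readonly bool _hasValue;

  public Optional(T value)
  {
    _value = value;
    _hasValue = true;
  }

  public bool HasValue
  {
    get { return _hasValue; }
  }

  // Accessing an empty optional is a programming error, hence the throw.
  public T Value
  {
    get
    {
      if (!_hasValue)
        throw new InvalidOperationException("The optional value is empty");

      return _value;
    }
  }
}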

Fix the Design, Not the Compiler

Either of these two approaches allows you to “lean on the compiler” and leverage the power of a statically typed language. This is a useful feature to have but only if it’s put to good use and you know where the limitations are in the language.

While I would like to think that people listen to the compiler I often don’t think they hear it [4]. Too often the compiler is treated as something to be placated, or negotiated with. For example if the Optional<string> approach had been taken the call sites would all have failed to compile. However this calling code:

var name = LookupName(key);

...could easily be “fixed” by simply doing this to silence the compiler:

var name = LookupName(key).Value;

For my own Optional<> type we’d just have switched from a possible NullReferenceException on lookup failure to an InvalidOperationException. Granted this is better as we have at least avoided the chance of the null reference silently making its way further down the path but it doesn’t feel like we’ve addressed the underlying problem (if indeed there has even been a change in the way we should treat lookup failures).

Embracing Change

While the Optional<> approach is perhaps more composable the TryXxx pattern is more invasive and that probably has value in itself. Changing the signature and breaking compilation is supposed to put a speed bump in your way so that you consider the effects of your potential actions. In this sense the more invasive the workaround the more you are challenged to solve the underlying tension with the design.

At least that’s the way I like to think about it but I’m afraid I’m probably just being naïve. The reality, I suspect, is that anyone who could make such a change as switching an exception for a null reference is more concerned with getting their change completed rather than stopping to ponder the wider effects of what any compiler might be trying to tell them.

 

[1] See Postel’s Law and consider how well that worked out for HTML.

[2] See “Out vs Ref For TryXxx Style Methods”.

[3] C# already has a “Nullable” type for optional values so I find it odd that C# developers find the equivalent type for reference-type values so peculiar. Yes it’s not integrated into the language but I find it’s usually a disconnect at the conceptual level, not a syntactic one.

[4] A passing nod to the conversation between Woody Harrelson and Wesley Snipes discussing Jimi Hendrix in White Men Can’t Jump.

Friday, 11 May 2018

The Perils of DateTime.Parse()

The error message was somewhat flummoxing, largely because it was so generic, but also because the data all came from a database extract rather than manual input:

Input string was not in a correct format.

Naturally I looked carefully at all the various decimal and date values, as I knew this was the kind of message you get when parsing those kinds of values if they’re incorrectly formed, but none of them appeared to be at fault. The DateTime error message is actually slightly different [1] but I’d forgotten that at the time and so I eyeballed the dates as well as the decimal values just in case.

Then I remembered that empty string values also caused this error, but lo-and-behold I was not missing any optional decimals or dates in my table either. Time to hit the debugger and see what was going on here [2].

The Plot Thickens

I changed the settings for the FormatException error type to break on throw, sent in my data to the service, and waited for it to trip. It didn’t take long before the debugger fired into life and I could see that the code was trying to parse a decimal value as a double but the string value was “0100/04/01”, i.e. the 1st April in the year 100. WTF!

I immediately went back to my table and checked my data again, aware that a date like this would have stood out a mile first time around, but I was happy to assume that I could have missed it. This time I used some regular expressions just to be sure my eyes were not deceiving me.

The thing was I knew what column the parser thought the value was in but I didn’t entirely trust that I hadn’t mucked up the file structure and added or removed an errant comma in the CSV input file. I didn’t appear to have done that and so the value that appeared to be causing this problem was the decimal number “100.04”, but how?

None of this made any sense and so I decided to debug the client code, right from reading in the CSV data file through to sending it across the wire to the service, to see what was happening. The service was invoked via a fairly simple WCF client assembly and as I stepped into that code I came across a method called NormaliseDate()...

The Mist Clears

What this method did was to attempt to parse the input string value as a date and if it was successful it would rewrite it in an unusual (to me) “universal” format – YYYY/MM/DD [3].

The first two parsing attempts it did were very specific, i.e. it used DateTime.ParseExact() to match the intended output format and the “sane” local time format of DD/MM/YYYY. So far, so good.

However the third and last attempt, for whatever reason, just used DateTime.Parse() in its no-frills form and that was happy to take a decimal number like “100.04” and treat it as a date in the format YYY.MM! At first I wondered if it was treating it as a serial or OLE date of some kind but I think it’s just more liberal in its choice of separators than the author of our method intended [4].

Naturally there are no unit tests for this code or any kind of regression test suite that shows what scenarios this method was intended to support. Due to a lack of knowledge about how the client library was deployed and used in the wild, I was forced to pad the values in the input file with trailing zeroes as a short-term workaround, yuck! [5]
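One way to tighten up a routine like that is to make every parsing attempt explicit, so that a decimal such as “100.04” simply falls through untouched. A rough sketch of the idea (the method name and formats are illustrative, not the actual client code):

using System;
using System.Globalization;

public static string NormaliseDate(string value)
{
  // Only the formats we explicitly intend to support are attempted;
  // anything else is passed through untouched.
  string[] formats = { "yyyy/MM/dd", "dd/MM/yyyy" };

  DateTime date;

  if (DateTime.TryParseExact(value, formats, CultureInfo.InvariantCulture,
                             DateTimeStyles.None, out date))
    return date.ToString("yyyy/MM/dd", CultureInfo.InvariantCulture);

  return value;
}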

JSON Parsers

This isn’t the first time I’ve had a run-in with a date parser. When I was working on REST APIs I always got frustrated by how permissive the JSON parser would be in attempting to coerce a string value into a date (and time). All we ever wanted was to keep it simple and only allow ISO-8601 format timestamps in UTC unless there was a genuine need to support other formats.

Every time I started writing the acceptance tests for timestamp validation, though, I’d find that I could never quite configure the JSON parser to reject everything but the desired format. In the earlier days of my time with ASP.Net even getting it to stop accepting local times was a struggle, and that caused us a problem when we discovered a US/UK date format confusion error which the parser was hiding from us.

In the end we resorted to creating our own Iso8601DateTime type which used the .Net DateTimeOffset type under the covers but effectively allowed us to use our own custom JSON serializer methods to only support the exact format we wanted.
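The guts of such a type amounts to little more than a strict ParseExact call; a minimal sketch (the method name and the choice of a single supported format are illustrative):

using System;
using System.Globalization;

// Accept only UTC ISO-8601 timestamps, e.g. "2016-05-29T12:34:56Z";
// anything else throws rather than being silently coerced.
public static DateTimeOffset ParseIso8601Utc(string value)
{
  return DateTimeOffset.ParseExact(value, "yyyy-MM-dd'T'HH:mm:ss'Z'",
                                   CultureInfo.InvariantCulture,
                                   DateTimeStyles.AssumeUniversal);
}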

More recently JSON.Net has gotten better at letting you control the format and parsing of dates but it’s still not perfect and there are unit tests in past codebases that show variants that would unexpectedly pass, despite using the strictest settings. I wouldn’t be surprised if our Iso8601DateTime type was still in use as I can only assume everyone else is far less pedantic about the validation of datetimes and those that are have taken a similar route to ensure they control parsing.

A Dangerous Game

One should not lose sight though of the real issue here, which is the attempt to classify string values by trying to parse them. Even if you limit yourself to a single locale you might get away with it, but when you try to do that across arbitrary locales you’re just asking for trouble.

 

[1] “String was not recognized as a valid DateTime.”

[2] This whole fiasco falls squarely in the territory I’ve covered before in my Overload article “Terse Exception Messages”. Fixing this went to the top of my backlog, especially after I discovered it was a problem for our users too.

[3] Why they didn’t just pick THE universal format of ISO-8601 is anyone’s guess.

[4] I still need to go back and read the documentation for this method because it clearly caters for scenarios I just don’t see in my normal locale or user base.

[5] That’s what happens with tactical solutions, no one ever quite gets around to documenting anything because they never think it’ll survive for very long...

Tuesday, 16 May 2017

My Dislike of GOPATH

[This post was written in response to a tweet from Matt Aimonetti which asked “did you ever try #golang? If not, can you tell me why? If yes, what was the first blocker/annoyance you encountered?”]

I’m a dabbler in Go [1], by which I mean I know enough to be considered dangerous but not enough to be proficient. I’ve done a number of katas and even paired on some simple tools my team has built in Go. There is so much to like about it that I’ve preferred looking for 3rd party tools written in it, in the faint hope that I might be able to contribute one day. But every time I pull the source code I end up wasting so much time trying to get the thing to build, because Go has its own opinions about how the source is laid out and tools are built that don’t match the way I (or most other tools I’ve used) work.

Hello, World!

Go is really easy to get into, on Windows it’s as simple as:

> choco install golang

This installs the compiler and standard libraries and you’re all ready to get started. The obligatory first program is as simple as doing:

> pushd \Dev
> mkdir HelloWorld
> notepad HelloWorld.go
> go build HelloWorld.go
> HelloWorld

So far, so good. It’s pretty much the same as if you were writing in any other compiled language – create folder, create source file, build code, run it.

Along the way you may have run into some complaint about the variable GOPATH not being set. It’s easily fixed by simply doing:

> set GOPATH=%CD%

You might have bothered to read up on all the brouhaha, or got side-tracked and discovered that it’s easy to silence the complaint, without any loss of functionality, by setting it to point anywhere. After all, the goal early on is just to get your first program built and running, not to get bogged down in tooling esoterica.

When I reached this point I did a few katas and used a locally installed copy of the excellent multi-language exercise tool cyber-dojo.org to have a play with the test framework and some of the built-in libraries. Using a tool like cyber-dojo meant that GOPATH problems didn’t rear their ugly head again as it was already handled by the tool and the katas only needed standard library stuff.

The first non-kata program my team wrote in Go (which I paired on) was a simple HTTP smoke test tool that also just lives in the same repo as the service it tests. Once again there was nary a whiff of GOPATH issues here either – still simple.

Git Client, But not Quite

The problems eventually started for me when I tried to download a 3rd party tool off the internet and build it [2]. Normally, getting the source code for a tool from an online repository, like GitHub, and then building it is as simple as:

> git clone https://…/tool.git
> pushd tool
> build

Even if there is no Windows build script, with the little you know about Go at this point you’d hope it to be something like this:

> go build

What usually happens now is that you start getting funny errors about not being able to find some code which you can readily deduce is some missing package dependency.

This is the point where you start to haemorrhage time as you seek in vain to fix the missing package dependency without realising that the real mistake you made was right back at the beginning when you used “git clone” instead of “go get”. Not only that but you also forgot that you should have been doing all this inside the “%GOPATH%\src” folder and not in some arbitrary TEMP folder you’d created just to play around in.

The goal was likely to just build & run some 3rd party tool in isolation but that’s not the way Go wants you to see the world.

The Folder as a Sandbox

The most basic form of isolation, and therefore version control, in software development is the humble file-system folder [3]. If you want to monkey with something on the side just make a copy of it in another folder and you know your original is safe. This style of isolation (along with other less favourable forms) is something I’ve written about in depth before in my C Vu In the Toolbox column, see “The Developer’s Sandbox”.

Unfortunately for me this is how I hope (expect) all tools to work out of the box. Interestingly Go is a highly opinionated language (which is a good thing in many cases) that wants you to do all your coding under one folder, identified by the GOPATH variable. The rationale for this is that it reduces friction from versioning problems and helps ensure everyone, and everything, is always using a consistent set of dependencies – ideally the latest.

Tool User, Not Developer

That policy makes sense for Google’s developers working on their company’s tools, but I’m not a Google developer, I’m just a user of the tool. My goal is to be able to build and run the tool. If there happens to be a simple bug that I can fix, then great, I’d like to do that, but what I do not have the time for is getting bogged down in library versioning issues because the language believes everyone, everywhere should be singing from the same hymn sheet. I’m more used to a world where dependencies move forward at a pace dictated by the author, not by the toolchain itself. Things can still move quickly without having to continually live on the bleeding edge.

Improvements

As a Go outsider I can see that the situation is definitely improving. The use of a vendor subtree to house a snapshot of the dependencies in source form makes life much simpler for people like me who just want to use the tool hassle free and dabble occasionally by fixing things here and there.

In the early days when I first ran into this problem I naturally assumed it was my problem and that I just needed to set the GOPATH variable to the root of the repo I had just cloned. I soon learned that this was a fool’s errand as the repo also has to buy into this and structure its source code accordingly. However a variant of this has got some traction with the gb tool, which has (IMHO) got the right idea about isolation but sadly is not the sanctioned approach, so you’re dicing with the potential for future impedance mismatches. Ironically, building and installing this tool requires GOPATH to still be working the proper way.

The latest version of Go (1.8) will assume a default location for GOPATH (a “go” folder under your profile) if it’s not set but that does not fix the fundamental issue for me which is that you need to understand that any code you pull down may be somewhere unrelated on your file-system if you don’t understand how all this works.

Embracing GOPATH

Ultimately if I am going to properly embrace Go as a language, and I would like to do more, I know that I need to stop fighting the philosophy and just “get with the programme”. This is hard when you have a couple of decades of inertia to overcome but I’m sure I will eventually. It’s happened enough times now that I know what the warning signs are and what I need to Google to do it “the Go way”.

Okay, so I don’t like or agree with all of the choices (that I know of) the Go language has taken, but then I’m always aware of the popular quote by Bjarne Stroustrup:

“There are only two kinds of languages: the ones people complain about and the ones nobody uses.”

This post is but one tiny data point which covers one of the very few complaints I have about Go, and even then it’s just really a bit of early friction that occurs at the start of the journey if you’re not a regular. Still, better that than yet another language nobody uses.

 

[1] Or golang if you want to appease the SEO crowd :o).

[2] It was winrm-cli as I was trying to put together a bug report for Terraform and there was something going wrong with running a Windows script remotely via WinRM.

[3] Let’s put old fashioned technologies like COM to one side for the moment and assume we have learned from past mistakes.

Sunday, 29 May 2016

The Curse of NTLM Based HTTP Proxies

I first crossed swords with an NTLM authenticating proxy around the turn of the millennium when I was working on an internet based real-time trading system. We had chosen to use a Java applet for the client which provided the underlying smarts for the surrounding web page. Although we had a native TCP transport for the highest fidelity connection when it came to putting the public Internet and, more importantly, enterprise networks between the client and the server it all went downhill fast.

It soon became apparent that Internet access inside the large organisations we were dealing with was hugely different to that of the much smaller company I was working at. In fact it took another 6 months to get the service live due to all the workarounds we had to put in place to make the service usable by the companies it was targeted at. I ended up writing a number of different HTTP transports to try and get a working out-of-the-box configuration for our end users as trying to get support from their IT departments was even harder work. On Netscape it used something based on the standard Java URLConnection class, whilst on Internet Explorer it used Microsoft’s J/Direct Java extension to leverage the underlying WinInet API.

This last nasty hack was done specifically to cater for those organisations that put up the most barriers between its users and the Internet, which often turned out to be some kind of proxy which relied on NTLM for authentication. Trying to rig something similar up at our company to develop against wasn’t easy or cheap either. IIRC in the end I managed to get IIS 4 (with MS Proxy Server?) to act as an NTLM proxy so that we had something to work with.

Back then the NTLM protocol was a proprietary Microsoft technology with all the connotations that come with that, i.e. tooling had to be licensed. Hence you didn’t have any off-the-shelf open source offerings to work with and so you had a classic case of vendor lock-in. Essentially the only technologies that could (reliably) work with an NTLM proxy were Microsoft’s own.

In the intervening years both clients (though mostly just the web browser) and servers have required more and more access to the outside world, both for the development of the software itself and for its ultimate presence as hosting moves from on-premises to the cloud.

Additionally the NTLM protocol was reverse engineered and tools and libraries started to appear (e.g. Cntlm) that allowed you to work (to a degree) within this constraint. However this appears to have sprung from a need in the non-Microsoft community and so support is essentially the bare minimum to get you out of the building (i.e. manually presenting a username and password).

From a team collaboration point of view tools like Google Docs and GitHub wikis have become more common as we move away from format-heavy content, and Trello for a lightweight approach to managing a backlog. Messaging in the form of Skype, Google Hangouts and Slack also plays a wider role as the number of people outside the corporate network grows, not only due to remote working, but also to bring strategic partners closer to the work itself.

The process of developing software, even for Microsoft’s own platform, has grown to rely heavily on instant internet access in recent years with the rise of The Package Manager. You can’t develop a non-trivial C# service without access to NuGet, or a JavaScript front-end without Node.js and NPM. Even setting up and configuring your developer workstation or Windows server requires access to Chocolatey unless you prefer haemorrhaging time locating, downloading and manually installing tools.

As the need for more machines to access the Internet grows, so the amount of friction in the process also grows as you bump your head against the world of corporate IT. Due to the complexity of their networks, and the various other tight controls they have in place, you’re never quite sure which barrier you’re finding yourself up against this time.

What makes the NTLM proxy issue particularly galling is that many of the tools don’t make it obvious that this scenario is not supported. Consequently you waste significant amounts of time trying to diagnose a problem with a product that will never work anyway. If you run out of patience you may switch tack long before you discover the footnote or blog post that points out the futility of your efforts.

This was brought home once again only recently when we had developed a nice little tool in Go to help with our deployments. The artefacts were stored in an S3 bucket and there is good support for S3 via the AWS Go SDK. After building and testing the tool we then proceeded to try and work out why it didn’t work on the corporate network. Many rabbit holes were investigated, such as double and triple checking the AWS secrets were set correctly, and being read correctly, etc. before we discovered an NTLM proxy was getting in the way. Although there was a Go library that could provide NTLM support we’d have to find a way to make the S3 library use the NTLM one. Even then it turned out not to work seamlessly with whatever ambient credentials the process was running as, so it pretty much became a non-starter.

We then investigated other options, such as the AWS CLI tools that we could then script, perhaps with PowerShell. More time was wasted before again discovering that NTLM proxies are not supported by them. Finally we resorted to using the AWS Tools for PowerShell, which we hoped (by virtue of them being built using Microsoft’s own technology) would do the trick. It didn’t work out of the box, but the Set-AWSProxy cmdlet was the magic we needed and it was easy to find now that we knew what question to ask.

Or so we thought. Once we built and tested the PowerShell based deployment script we proceeded to invoke it via the Jenkins agent and once again it hung and eventually failed. After all that effort the “service account” under which we were trying to perform the deployment did not have rights to access the internet via (yes, you guessed it) the NTLM proxy.

This need to ensure service accounts are correctly configured, even for outbound-only internet access, is not a new problem; I’ve faced it a few times before. And yet every time it shows up it’s never the first thing I think of. Anyone who has ever had to configure Jenkins to talk to private Git repos will know that there are many other sources of problems aside from whether or not you can even access the internet.

Using a device like an authenticating proxy has that command-and-control air about it; it ensures that the workers only access what the company wants them to. The alternate approach which is gaining traction (albeit very slowly) is the notion of Trust & Verify. Instead of assuming the worst you grant more freedom by putting monitoring in place to ensure people don’t abuse their privilege. If security is a concern, and it almost certainly is a very serious one, then you can stick a transparent proxy in between to maintain that balance between allowing people to get work done whilst also still protecting the company from the riskier attack vectors.

The role of the organisation should be to make it easy for people to fall into The Pit of Success. Developers (testers, system administrators, etc.) in particular are different because they constantly bump into technical issues that the (probably somewhat larger) non-technical workforce, which the policies are normally targeted at, do not experience on anywhere near the same scale.

This is of course ground that I’ve covered before in my C Vu article Developer Freedom. But there I lambasted the disruption caused by overly-zealous content filtering, whereas this particular problem is more of a silent killer. At least a content filter is pretty clear on the matter when it denies access – you aren’t having it, end of. In contrast the NTLM authenticating proxy first hides in the ether waiting for you to discover its mere existence and then, when you think you’ve got it sussed, you feel the sucker punch as you unearth the footnote in the product documentation that tells you that your particular network configuration is not supported.

In retrospect I’d say that the NTLM proxy is one of the best examples of why having someone from infrastructure in your team is essential to the successful delivery of your product.

Friday, 16 January 2015

Terse Exception Messages - Part II

Quite unintentionally it looks like my post back in October last year, “Terse Exception Messages”, needs a second part. The initial post was about dealing with terse error messages generated by external code; however I’d forgotten that terse exception messages are not just the domain of 3rd party libraries, they can appear in “our code” too. Take this example from some real world production code:

throw new NotFoundException("Invalid ID");

Sadly this will likely appear in any diagnostic log file with only the usual thread context information to back it up, e.g.

<date, time, thread, etc.> ERROR: Invalid ID

If you also have a stack trace to work from, that could give you a clue as to what type the ID refers to. You might also be able to hunt around and find an earlier diagnostic message that could give you the value too, but you’re probably going to have to work for it.

Clearer Exception Messages

At a bare minimum I would also include the “logical” type and, if possible (and it usually is), then the offending value too. As such what I would have thrown is something more like this:

throw new NotFoundException("Invalid customer ID '{0}'", id);

This message contains the two key pieces of information I know are available at the throw site and so would generate a more support-friendly message like this:

<date, time, thread, etc.> ERROR: Invalid customer ID ‘1234-5678’

Discerning Empty Values

You’ll notice that the value, which in this case was a string value, is surrounded by quotes. This is important in a message where it might not be obvious that the value is empty. For example if I had not added the quotes and the customer ID was blank (a good indication of what the problem might be) then it would have looked thus:

ERROR: Invalid customer ID

Unless you know intimately what all messages look like, you could easily be mistaken for believing that the error message contains little more than our first example, when in fact the value is there too, albeit invisibly. In the improved case you would see this:

ERROR: Invalid customer ID ‘’

For strings there is another possible source of problems that can be hard to spot when there are no delimiters around the value, and that is leading and trailing whitespace. Even when using a fixed width font for viewing log files it can be tricky to spot a single erroneous extra space - adding the quotes makes that a little easier to see:

ERROR: Invalid customer ID ‘1234-5678 ’

As an aside my personal preference for using single quotes around the value probably stems from many years working with SQL. I’ve lifted values out of log files only to paste them into a variety of SQL queries countless times and so automatically including the quotes seemed a natural thing to do.

Discerning Incorrect Values

It’s probably obvious but another reason to include the value is to discern the difference between a value that is badly formed, and could therefore never work (i.e. a logic error), and one that is only temporarily unusable (i.e. a runtime error). For example, if I saw the following error I would initially assume that the client has passed the wrong value in the wrong argument or that there is an internal logic bug which has caused the value to become “corrupted”:

ERROR: Invalid customer ID ‘10p off baked beans’

If I wanted to get a little more nit-picky about this error message I would stay clear of the word “invalid” as it is a somewhat overloaded term, much like “error”. Unless you have no validation at all for a type, there are usually two kinds of errors you could generate - one during parsing and another during range validation. This is most commonly seen with datetimes where you might only support ISO-8601 formatted values which, once parsed, could be checked to ensure an end date does not precede a start date.

To me the kind of error below would still be a terse message; better than the original one, but could be even better.

ERROR: Invalid datetime ‘01/01/2001 10:12 am’

My preferred term for values that fail “structural validation” was adopted from XML parsing - malformed. A value that parses correctly is considered “well formed” whereas the converse would be “malformed”. Hence I would have thrown:

ERROR: Malformed datetime ‘01/01/2001 10:12 am’

For types like a datetime where there are many conventions I would also try and include the common name for the format or perhaps a template if the format is dynamically configurable:

ERROR: Malformed datetime ‘01/01/2001 10:12 am’ (Only ISO-8601 supported)

ERROR: Malformed datetime ‘01/01/2001 10:12 am’ (Must conform to ‘YYYY-MM-DD HH:MM:SS’)

Once a value has passed structural validation, if it then fails “semantic validation” I would report that using an appropriate term, such as “out of range”, or explicitly describe the comparison. With semantic validation you often know more about the role the value is playing (e.g. it’s a ‘start’ or ‘end’ date) and so you can be more explicit in your message about what the problem is:

ERROR: The end date ‘2010-01-01’ precedes the start date ‘2014-02-03’

Custom Parsing Methods

As we saw back in my original post the language framework parsing methods are often horribly terse, even when failing to parse a simple value like an integer. The reason is that, like all general purpose functions, there is a balance to strike between performance, security, etc., which means including the value in the exception might be undesirable in some classes of application.

However in the systems on which I work this is not a problem and I’d happily trade-off better diagnostics for whatever little extra garbage this might produce, especially given that it’s only in (hopefully) exceptional code paths.

My solution to this problem is the proverbial “extra level of indirection” and so I wrap the underlying TryParse() framework methods with something that produces a more satisfying diagnostic. In C# extension methods are a wonderful mechanism for doing that. For example I’ve wrapped the standard configuration mechanism with little methods that read a string value, try to parse it, and if that fails include the setting name and value in the message. Consequently instead of the usual terse FormatException affair you get when parsing an int with code like this:

int.Parse(settings["timeoutInMs"]);

You invoke an extension method like so:

settings.ReadInt("timeoutInMs");

If this fails you’ll get a much more pleasing error message along these lines:

ERROR: The ‘timeoutInMs’ configuration setting value ‘3.1415’ was not a well formed integer value
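The extension method itself is trivial; a minimal sketch, assuming the settings are exposed as a simple string dictionary (the names here are illustrative):

using System;
using System.Collections.Generic;

public static class SettingsExtensions
{
  // Wraps int.TryParse() so that a failure reports both the setting name
  // and the offending value instead of the framework's terse message.
  public static int ReadInt(this IDictionary<string, string> settings,
                            string name)
  {
    string value = settings[name];

    int result;

    if (!int.TryParse(value, out result))
      throw new FormatException(String.Format(
        "The '{0}' configuration setting value '{1}' was not a well formed integer value",
        name, value));

    return result;
  }
}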

Formatting Exceptions in C#

Whenever I create a custom exception type in C# I generally add a variety of overloads so that you can create one directly with a message format and selection of arguments without forcing the caller to use String.Format(). This is how the earlier example worked:

throw new NotFoundException("Invalid customer ID '{0}'", id);

All the class needs for this kind of use is a signature akin to String.Format’s:

public NotFoundException(string format,
                         params object[] args)
    : base(String.Format(format, args))
{ }

However this is not enough by itself; it’s dangerous. If we pass a raw message that happens to contain formatting instructions but no arguments (e.g. “Invalid list {0, 1, 2}”) it will throw during the internal call to String.Format(), so we need to add a simple string overload as well:

public NotFoundException(string message)
    : base(message)
{ }

As an aside I never add a public default constructor because I’m not convinced you should be allowed to throw an exception without providing some further information.

Due to the framework classes not having a varargs style ctor you might already be happy putting the formatting call at the throw site, either with String.Format() directly or via a simple extension method like FormatWith():

throw new NotFoundException(String.Format("Invalid customer ID '{0}'", id));

throw new NotFoundException("Invalid customer ID '{0}'".FormatWith(id));

One scenario where the varargs style falls down in C# is when you start adding overloads to capture any inner exception. You can’t just add it on the end of the signature due to the final params-based args parameter, so you have to add it to the beginning instead:

public NotFoundException(Exception inner,
                         string format,
                         params object[] args)
    : base(String.Format(format, args), inner)
{ }

Sadly, this is the exact opposite of what the framework does; it adds them on the end.

In my experience very little effort is put in by developers to simulate faults and validate that exception handling is behaving correctly, e.g. verifying what is reported and how. Consequently this makes the varargs overload potentially dangerous, as adding an exception argument on the end by following the framework style would cause it to be silently swallowed because no placeholder would reference it. This has never been a problem for me in practice because I always throw custom types and always try and format a string, but that’s easy when you’re the one who discovers and instigates the pattern - others are more likely to follow the pattern they’re used to, which is probably how the framework does it.

Testing Error Messages

At the tail end of the previous section I touched briefly on what I think is at the heart of why (server-side) error messages are often so poor - a lack of testing to make sure that they satisfy the main “requirement”, which is to be helpful in diagnosing a problem. When writing unit tests for error scenarios it’s all too common to see something like this:

Assert.That(() => o.DoIt(. . .), Throws.Exception);

This verifies and documents very little about what is expected to happen. In fact if “o” is a null reference the test will still pass and not even invoke the behaviour we want to exercise! Our expectations should be higher; but of course at the same time we don’t want to over specify and make the test brittle.

If a caller is going to attempt recovery of the failure then they need to know what the type of the exception is, lest they be forced to resort to “message scraping” to infer its type. Consequently it’s beneficial to verify that the exception is of at least a certain type (i.e. it may also be a derived type). Where an obvious value, such as an input argument, is a factor in the error we can loosely test that the value is contained within the message somewhere too:

const string id = "malformed-value";

Assert.That(() => o.DoIt(id),
            Throws.InstanceOf<ValidationException>()
                  .And.Message.Contains(id));

If there are many arguments to be validated it is important to try and match on the message too, or some other attribute, to discern which of the many arguments caused the exception to be thrown. It’s all too easy when making a change to a method to find that one or more of your tests are passing because they are detecting a different error to the one they were intended to, as the null reference example above highlighted.

Whilst that takes care of our code-based contractual obligations we also have an obligation to those who will be consuming these messages to make sure they form a useful narrative within any diagnostic output. I tackled this side of the issue some years ago in “Diagnostic & Support User Interfaces” and so if you’ve experienced a sense of déjà vu reading this post then that may be a contributing factor.

Saturday, 15 November 2014

The Hardware Cycle

I’m writing this on a Windows XP netbook that is now well out of extended life support as far as Microsoft is concerned. I’m still happily using Microsoft Office 2002 to read and send email and the same version of Word to write my articles. This blog post is being written with Windows Live Writer which is considerably newer (2009) but still more than adequately does the job. At Agile on the Beach 2014 back in September, I gave my Test-Driven SQL talk and coding demo on my 7 year old Dell laptop, which also runs XP (and SQL Server Express 2008). But all that technology is pretty new compared to what else we own.

I live in a house that is well over 150 years old. My turntable, CD player, amplifier, speakers etc. all date from the late 80’s and early 90’s. My first wide-screen TV, a 30 inch Sony beast from the mid 90’s is still going strong as the output for our various games consoles - GameCube, Dreamcast, Atari 2600, etc. Even our car, a 7-seater MPV that lugs a family of 6 around constantly, has just had its 8th birthday and that actually has parts that suffer from real wear-and-tear. Our last washing machine and tumble dryer, two other devices which are in constant use in a big family, also lasted over 10 years before recently giving up the ghost.

Of course we do own newer things; we have to. My wife has a laptop for her work which runs Windows 8 and the kids watch Netflix on the iPad and Virgin Media Tivo box. And yet I know that all those things will still be around for some considerable time, unlike our mobile phones. Yes, I do still have a Nokia 6020i in my possession for those occasional camping trips where electricity is scarce and a battery that lasts 10 days on standby is most welcome. No, it’s the smart phones which just don’t seem to last, we appear to have acquired a drawer full of phones (and incompatible chargers).

My own HTC Desire S is now just over 3 years old. It was fine for the first year or so but slowly, over time, each app update sucks more storage space so that you have to start removing other apps to make room for the ones you want to keep running. And the apps you do keep running get more and more sluggish over time as the new features, presumably aimed at the newer phones, cause the device to grind. Support for the phone’s OS only seemed to last 2 years at the most (Android 2.3.5). My eldest daughter’s HTC Wildfire which is of a similar age is all but useless now.

As a professional programmer I feel obligated to be using the latest kit and tools, and yet as I get older everywhere I look I just see more Silver Bullets and realise that the biggest barrier to me delivering is not the tools or the technology, but the people - it’s not knowing “how” to build, but knowing “what” to build that’s hard. For the first 5 years or so of my career I knew what each new Intel CPU offered, what sort of RAM was best, and then came the jaw-dropping 3D video cards. As a gamer Tom’s Hardware Guide was a prominent influence on my life.

Now that I’m married with 4 children my priorities have naturally changed. I am beginning to become more aware of the sustainability issues around the constant need to upgrade software and hardware, and the general declining attitude towards “fixing things” that modern society has. Sadly WD40 and Gaffer tape cannot fix most things today and somehow it’s become cheaper to dump stuff and buy a new one than to fix the old one. The second-hand dishwasher we inherited a few years back started leaking recently and the callout charge alone, just to see if it might be a trivial problem or a lost cause, was more than buying a new one.

In contrast though, the likes of YouTube and 3D printers have put some of this power back into the hands of consumers. A few years back I came home from work to find my wife’s head buried in the oven. The cooker was broken and my wife found a video on YouTube for how to replace the heating element for the exact model we had. So she decided to buy the spare part and do it herself. It took her a little longer than the expert in the 10 minute video, but she was beaming with a real sense of accomplishment at having fixed it herself, and we saved quite a few quid in the process.

I consider this aversion to filling up landfills or dumping electronics on poorer countries one of the better attributes I inherited from my father [1]. He was well ahead of the game when it came to recycling, as I suspect many of that generation who lived through the Second World War are. We still have brown paper bags he used to keep seeds in that date back to the 1970’s; each year he would write the yield for the crop on them (he owned an allotment) and then reuse the same bags the following year. The scrap paper we scribbled on as children was waste paper from his office. The front wall of the house where he grew up was built by him using spoiled bricks that he collected from a builders’ merchant on his way home from school. I’m not in that league, but I certainly try hard to question our family’s behaviour and minimise any waste.

I’m sure some economist out there would point out that keeping an old house, car, electronics, etc. is actually worse for the environment because they are less efficient than the newer models. When was the last time you went a week before charging your mobile phone? For me the argument about the overall carbon footprint of these approaches feels somewhat beside the point; it’s more about the general attitude of living in such a disposable society. I’m sure one day mobile phone technology will plateau, just as the desktop PC has [2], but I doubt I’ll be going 7 days between charges ever again.

 

[1] See “So Long and Thanks For All the Onions”.

[2] Our current home PC, which was originally bought to replace a 7 year old monster that wasn’t up to playing Supreme Commander, is still going strong 6 years later. It has 4 cores, 4 GB RAM, a decent 3D video card and runs 32-bit Vista. It still suffices for all the games the older kids can throw at it, which is pretty much the latest fad on Steam. The younger ones are more interested in Minecraft or the iPad.

Wednesday, 12 November 2014

UnitOfWorkXxxServiceFacadeDecorator

I’ve seen people in the Java community joke about class names that try and encompass as many design patterns as possible in a single name, but I never actually believed developers really did that. This blog post stands as a testament to the fact that they do. The title of this blog post is the name of a class I’ve just seen. And then proudly deleted, along with a bunch of other classes with equally vacant names.

I’ll leave the rest of this diatribe to The Codist (aka Andrew Wulf) who captured this very nicely in “I’m sick of GOF Design Patterns” [1].

 

[1] Technically speaking the Unit of Work design pattern is from Fowler’s Patterns of Enterprise Application Architecture (PoEAA).

Tuesday, 21 October 2014

So Many Wrongs, But No Rights

As we approach Christmas, that wonderful time of year for giving and receiving, we also approach that annual corporate ritual that is the Change Freeze. It’s also a time when I get to remember what is probably the worst software release I’ve had the misfortune to be involved in...

The Baroque Import Process

Like all successful systems it had been grown organically from a walking skeleton. Most of the codebase had decent test coverage and we had a fairly good build pipeline going out to a development instance that ran as close to production as possible. Most functional problems showed up in development and the UAT environment highlighted anything else non-environmental. We even had unit tests for most of our SQL code!

However, we also had some big chunks of technical debt [1]. One particular stored procedure was now a behemoth (a many-hundred line monster) and had been developed without any test coverage. By the time this was recognised it had already grown massively by cut-and-paste, and naturally nobody wanted to go back and write tests for it; there was no drive from management to tackle this particular piece of debt either [2]. The other related data was not handled by this particular procedure but by another complex maze of procedures and views. There was some “token” test coverage here, but not around what transpires below.

These procedures were used to handle the versioning of an upstream feed. Very little data changed day-to-day so we used a versioning strategy that involved comparing yesterday’s and today’s data and only creating new records for updated entities. Another table then tied together which version should be used for which business date.
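For illustration only, here’s a minimal C# sketch of that versioning idea; the real logic lived in SQL stored procedures, and the entity and property names below are entirely hypothetical. The important detail is that today’s feed is compared against the latest stored version of each entity, and a new version is only created where something has actually changed.

using System.Collections.Generic;

// Hypothetical shape of a feed row; the real feed had many more columns.
public record FeedRow(string EntityId, decimal Value);

public static class FeedVersioner
{
    // Compare today's feed against the latest stored version of each entity
    // and only return the rows that warrant a new version.
    public static IEnumerable<FeedRow> ChangedRows(
        IReadOnlyDictionary<string, FeedRow> latestVersions,
        IEnumerable<FeedRow> todaysFeed)
    {
        foreach (var row in todaysFeed)
        {
            if (!latestVersions.TryGetValue(row.EntityId, out var latest)
                || latest.Value != row.Value)
            {
                yield return row; // becomes the new latest version
            }
        }
    }
}

As will become apparent later, the subtle part is the “latest stored version” bit - compare against anything other than the latest version and a genuine change can go undetected.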

The Replacement Process

These procedures had allowed us to go live on time and had survived a year in production, but eventually parts of the manual process were going to be automated upstream, so we finally got the go-ahead to replace the major parts of this Heath Robinson process with something that lived outside the database and could provide us with better validation and diagnostics. This was developed in the lead up to Christmas and was looking good to be scheduled for release just before the change freeze kicked in.

The Change Freeze

This company, like many others, tries to mitigate the risk of problems escalating when no one is around to fix them during the festive period by putting a halt on all non-critical changes. It doesn’t matter whether you have been delivering continuously for over 12 months without a single cock-up; you still can’t deploy any changes unless they directly fix a priority one production incident.

The Last Minute Data Update

With just a week or so to go before the change freeze the business decided they needed to tweak some key data so that it would be in for the year end runs. This data was “manually calibrated” by them and very rarely changed. So we restored UAT to match production, ran in the new data and waited for the results. Most of the results were fine, but as we looked a little closer we discovered that the report contained some entities where the data change appeared to have taken effect, yet the calculation result had not changed with it.

Given the number of problems that had already shown up in the reporting component it seemed entirely feasible that another one had crept in somehow, but digging deeper proved that wasn’t entirely the case. The reporting code was known to be dubious for other reasons, but it didn’t explain why this particular data change had not had an effect everywhere it was expected.

Development != UAT

We also applied the data change to the development environment where the new release had been chugging along quite nicely. But the following morning the development system disagreed with UAT too. It looked like there might be a bug in the replacement process as well. Given how close we were to going live with this larger change we quickly dived in to see what had broken and whether we could also fix this before the freeze kicked in.

The Bug Unearthed

It turned out the new code was correct; the problem was actually in the old code, the code currently in production. The bug was in the versioning process, which failed to detect a new entity version when the one piece of datum the business wanted to change was the only change in the entity’s data. As a consequence the calculations had been done using the old value. Oops.

The One-Line Fix

The fix was trivial. All we needed to do was add an extra predicate in one of the select statements to ensure that the comparison was done against the latest version and not every prior version:

AND thing.Version = thing.LatestVersion

We knew the fix was this simple, and that it would work without even testing it...

The Irony

How did we know? Because we’d seen it before. Nope, it was better than that: the person who made this mistake had seen it before and had even raised a bug request for the previous occurrence some months earlier.

Now It Gets Ugly

Not only do we have a new bug, but the results of the test run also highlight a longstanding problem with the reporting code. This code, which was bashed out in isolation [3], pulled in data from all over the place. In particular, instead of reporting data from the dated snapshot tables, which are what the calculations use, it pulled it from the staging tables where the input feed is kept.

It might sound like these dated snapshot tables are in the “sanitised data” schema; they’re not. Both the unprocessed input feed and the dated snapshot live in the same table in the staging schema. The column where the value should have gone was never populated. No, that’s not true: it was populated, and versioned correctly, but with a different value that was never used.

Consequently what was reported was the value provided by the business, but that value never made its way into the set used for number crunching, hence the disparity. Sadly two wrongs did not make it right in this case.

Now It Gets Political

In any “sane” project I’d hope we could just hold up our hands and admit we have a bug, mention that we already have a fix for it ready to go, and then discuss how best to get it deployed before moving on to the next most important thing. If the result of the discussion was that we would have to wait, that they could live with the bug, then so be it; at least we would be transparent. But that’s probably why I’d never make a “good” manager.

Instead we tried to find ways to disguise the bug. We couldn’t deploy the fix because so many other calculations that had also been working with incorrect data would come to light. We couldn’t deploy our new release either, because the replacement component didn’t have the bug. This left us with somehow tweaking other data so that it would force a version change to occur, e.g. adding an extra space to an unused text field. The final option was stalling until the change freeze ended, when we could hopefully bury bad news by folding the supposedly technical-only release [4] in with a data release so that the bug would get lost in the noise.

The Dark Cloud Arrives

With the change freeze over (6 weeks later) we had quite a backlog of changes. The frustration of the change freeze meant that the business were happy to lump our new release in with some other 3rd party and data changes. Our opportunity to bury bad news had arrived and the release was pushed out; the numbers all moved about, people murmured that things had moved more than expected, but it quickly went quiet again. Normality resumed.

Epilogue

I don’t think I’ll ever understand why we couldn’t just hold up our hand and admit we’d made a mistake. Compared to many other projects around us we were a beacon of success as we had originally delivered something on time and under budget; although it may not have had all the bells and whistles originally planned. In the 12 months after going into production we had delivered at regular intervals measured in weeks, not months, and had not once had an outage caused by something our team had done. In contrast we had to push back a number of times on the 3rd party components provided to us because they didn’t entirely do what was expected of them - we became their regression test suite! I would hope that kind of delivery record would have afforded us the right to mess up slightly once in a while, but perhaps not. Maybe what trust we had built up was actually worth nothing.

They say confession is good for the soul. This is mine.

 

[1] Technical Debt is really about shortcuts taken for expediency, not crap code. However the crap code was quickly identified and a decision was made to live with it, by which I mean no decision was made to do anything about it up front. Is that now a conscious decision which means it becomes technical debt?

[2] Whilst I agree in principle that refactoring should be a by-product of any story, sometimes the debt grows into something better served by an architectural refactoring.

[3] This one component has provided inspiration for a number of other blog posts, most notably “The Cost of Defensive Programming”.

[4] We tried to avoid mixing purely technical changes, e.g. architectural refactorings and upcoming features (toggled off), with changes that affected the calculation results. This was to ensure our regression testing had virtually no noise. Packaging a number-breaking change in isolation was also a defence mechanism that allowed us to squarely point the finger outside our team when it went wrong. And it did, on numerous occasions.

Thursday, 22 May 2014

Promoting Functional Programming - A Missed Opportunity

One of the blog posts that seemed to get a fair bit of attention on Twitter a little while back was “Functional programming is a ghetto” by Michael O. Church. Whilst I really enjoyed the first 6-7 paragraphs, the rest just raised my ire. This post, for me, was a classic violation of the Separation of Concerns, as applied to writing. But more than that I felt the author had shot themselves in the foot by missing out on a great opportunity to help promote the virtues of functional programming techniques.

If you’ve read the post you’ll see that the early paragraphs are about how they apply aspects of functional programming alongside other paradigms to try and get the best of all worlds. But then for some reason he starts talking about how IDEs suck and how the command-line is the tool of choice for functional programmers. Sorry, but what has tooling got to do with language paradigms?

So, instead of staying focused, finishing early on a high note and thereby providing us with some great material that we could promote to others, I find myself opposing it. I want to promote good articles about functional programming, but not at the expense of appearing to side with some Ivory Tower attitude. I once tried to pass the post on but found myself having to explain why they should ignore the drivel after paragraph 7.

Hopefully, they’ll refactor the blog post and siphon off the rant so that the good content is left to stand on its own.

Wednesday, 23 April 2014

Terminology Overdose

I’ve never been a fan of dependency injection frameworks, but what really bugs me most is the use of the term “injection”. It grates on me when I hear programmers talking about using “constructor injection” or “setter injection”. Sorry, but what you’re talking about is “passing parameters”, or if you prefer, “passing arguments”. It’s bad enough that we already have two terms for the things we provide to a function to vary how it behaves; why do we need a third?

Just think about it for a moment: what exactly does “injection” mean? It’s a finely-controlled way to force a substance into some other material. Does passing a parameter sound like anything out of the ordinary? No. Now, if the framework used reflection to set a private member variable because there was no natural way to do it, then I’d be happier with the term. But only for that use case. Maybe that’s how these frameworks started out, but it sure isn’t what seems to be going on these days.

So let’s be clear: designing a class that depends on another which is typed via an interface rather than a concrete class is not anything new. It’s just the age old advice about Programming to an Interface, Not an Implementation, taken quite literally. The fact that you pass the dependent object via the constructor [1] is just Programming 101; it doesn’t need any sort of fanfare.
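To labour the point, here’s an entirely hypothetical C# example - none of these types come from any real framework - showing that “constructor injection” amounts to nothing more than declaring a constructor parameter typed on an interface:

public interface IPriceFeed
{
    decimal GetPrice(string symbol);
}

public class PortfolioValuer
{
    private readonly IPriceFeed _feed;

    // The caller chooses the implementation; no framework required.
    public PortfolioValuer(IPriceFeed feed)
    {
        _feed = feed;
    }

    public decimal Value(string symbol, int quantity)
    {
        return _feed.GetPrice(symbol) * quantity;
    }
}

A test passes in a stub implementation of IPriceFeed, production passes in the real one; either way it’s just an argument to a constructor.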

[1] Although you could use a setter, don’t.

Tuesday, 22 November 2011

Cookbook Style Programming

Changing a tyre on a car is reasonably simple. In fact if I wasn’t sure how to do it I could probably find a video on YouTube to help me work through it. Does that now make me a mechanic though? Clearly not. If I follow the right example I might even do it safely and have many happy hours of motoring whilst I get my tyre fixed. But if I follow an overly simplistic tutorial I might not put the spare on properly and cause an imbalance, which at best will cause unnecessary wear on the tyre and at worst an accident. Either way the time lapse between cause and effect could be considerable.

If you’re already thinking this is a rehash of my earlier post “The Dying Art of RTFM”, it’s not intended to be. In some respects it’s the middle ground between fumbling in the dark and conscious incompetence...

The Crowbar Effect

The StackOverflow web site has provided us with a means to search for answers to many recurring problems, and that’s A Good Thing. But is it also fostering a culture that believes the solution to every problem can be achieved just by stitching together a bunch of answers? If the answer to your question always has a green tick and lots of votes then surely it’s backed by the gurus and so is AAA grade; what can possibly be wrong with advice like that?

while (!finished)
{
  var answer = Google_problem(); 
  answer.Paste();
}

At what point can you not take an answer at face value and instead need to delve into it to understand what you’re really getting yourself into?

For example, say I need to do some database type stuff and although I’ve dabbled a bit and have a basic understanding of tables and queries I don’t have any experience with an ORM (Object Relational Mapper). From what I keep reading, an ORM (alongside an IoC container) is a must-have these days for any “enterprise” sized project, and so I’ll choose “Super Duper ORM” because it’s free and has some good press.

So I plumb it in and start developing features backed by lots of automated unit & integration tests. Life is good. Then we hit UAT and data volumes start to grow and things start creaking here and there. A quick bit of Googling suggests we’re probably pulling way too much data over the wire and should be using Lazy Loading instead. So we turn it on. Life is good again. But is it? Really?
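Since “Super Duper ORM” is made up, here’s an equally hypothetical C# sketch of what “turning it on” usually means, and what it silently trades away:

using System.Collections.Generic;

public class Order
{
    public int Id { get; set; }

    // With lazy loading enabled the ORM typically overrides this virtual
    // property so that the lines are fetched on first access - one extra
    // query per order, the classic N+1 problem.
    public virtual IList<OrderLine> Lines { get; set; }
}

public class OrderLine
{
    public int Id { get; set; }
    public decimal Amount { get; set; }
}

Report over 10,000 orders and touch order.Lines on each one and you’ve swapped a single chunky join for 10,000 extra round trips; less data per query, but a lot more queries. Neither setting is free, the cost has just moved somewhere less obvious.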

Are You Qualified to Make That Decision?

Programmers are problem solvers and therefore it’s natural for them to want to see a problem through to the end, hopefully being the one to fix it along the way. But fixing the problem at hand should be done in a considered manner that weighs up any trade-offs, lest the proverbial “free lunch” rears its ugly head. I’m not suggesting that you have to analyse every possible angle, because quite often you can’t; in which case one of the trade-offs may well be the very fact that you cannot know what they are, but at least that should be a conscious decision. Where possible there should also be some record of that decision, such as a comment in the code or configuration file, or maybe a task in the bug database to revisit the decision at a later date when more facts are available. If you hit a problem and spent any time investigating it you can be pretty sure that your successor will too; but at least you’ll have given them a head start [*].

One common source of irritation is time-outs. When a piece of code fails due to a timeout the natural answer seems to be to just bump the timeout up to some ridiculous value without regard to the consequences. If the code is cut-and-pasted from an example it may well have an INFINITE timeout, which could make things really interesting further down the line. For some reason configuration changes do not appear to require the same degree of consideration as a code change, and yet they can cause just as much head-scratching as you try to work out why one part of the system ended up dealing with an issue that should have surfaced elsewhere.
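The post doesn’t say which technology was involved, so purely as an illustration here’s what that “fix” often looks like with classic ADO.NET, where a command timeout of zero means wait forever:

using System.Data.SqlClient;

static class QueryRunner
{
    public static object RunScalar(string connectionString, string sql)
    {
        using var connection = new SqlConnection(connectionString);
        using var command = new SqlCommand(sql, connection);

        // The default is 30 seconds; 0 means wait indefinitely. A stuck
        // query now blocks its caller forever and the failure surfaces
        // somewhere else entirely, a long way from the real problem.
        command.CommandTimeout = 0;

        connection.Open();
        return command.ExecuteScalar();
    }
}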

Lifting the Bonnet

I’ll freely admit that I’m the kind of person who likes to know what’s going on behind the scenes. If I’m going to buy into some 3rd party product I want to know exactly how much control I have over it because when, not if, a problem surfaces I’m going to need to fix it or work around it. And that is unlikely to be achievable without giving up something in return, even if it’s psychological [+] rather than financial.

However I accept that not everyone is like that and some prefer to learn only enough to get them through the current problem and onto the next one; junior programmers and part-time developers don’t know any better so they’re [partly] excused. In some environments where time-to-market is everything that attitude is probably even highly desirable, but that’s not the environment I’ve ended up working in.

It’s Just a Lack of Testing...

It’s true that the side-effects of many of the problems caused by this kind of mentality could probably be observed during system testing - but only so long as your system tests actively invoke some of the more exceptional behaviour and/or are able to generate the kinds of excessive loads you’re hoping to cope with. If you don’t have the kind of infrastructure available in DEV and/or UAT for serious performance testing, and, let’s face it, most bean counters would see it as the perfect excuse to eliminate “waste”, you need to put even more thought into what you’re doing - not less.

 

[*] To some people this is the very definition of job security. Personally I just try hard not to end up on The Daily WTF.

[+] No one likes working on a basket-case system where you don’t actually do any development because you spend your entire time doing support. I have a mental note of various systems my colleagues have assured me I should avoid working on at all costs...