Friday 3 October 2014

Building the Pipeline - Process Led or Product Led?

[Edit 03/2019: I’ve tweaked the title and text slightly as it was somewhat confusing and the title gave entirely the wrong message when seen in isolation.]

After being in the comfortable position of working on greenfield projects in recent times, I’ve taken the opportunity to work on an existing system for the forthcoming months. Along with getting to know some new people, I also have to get to know someone else’s codebase, techniques, build process, deployment, etc. One of my initial surprises was around the build pipeline - it cannot (easily) be replicated by a developer.

In a situation like this, where my preconceptions have been challenged, rather than throw my hands up in disgust and cry “what are you lot doing?”, I prefer to take it as an opportunity to question my own principles. I never want to be one of those people who just says “we’ve always done it this way” - I want to continually question my own beliefs so that I don’t forget why I hold them. One of these principles is that it should be possible for me to replicate any build and deployment on my own workstation. Being able to work entirely in isolation is the one sure way to know that anything that goes wrong is almost entirely of my own doing and not due to interference from external sources.

The One-Stop Shop - Visual Studio

I believe one reason you can end up in this state of affairs is that Visual Studio appears to be a one-stop shop for developing, building and deploying a solution. Many developers rely heavily on its features and never step outside it to understand how its use might fit into the bigger picture in a large system (I covered this a while back in “Don’t Let Your Tools Pwn You”). For me, Visual Studio is mostly an integrated text editor and makefile manager - I can just as easily view and edit code in Notepad, Notepad++, TortoiseMerge, etc.

Given the way the team works and the demarcation between roles, it is easy to see how the development process is reflected in that practice, i.e. Conway’s Law. The developers write code and tests in Visual Studio and push to the repo. The build pipeline picks up the code, builds and packages it, deploys it and runs the tests. The second stage in the process is managed (as in, by a real person) by a build manager - a dedicated role I’ve never come across before. In every team I’ve worked in to date, both (much) bigger and smaller in size, it has been the developers who put together the entire build and deployment process.

Please note that I’m not suggesting someone who remains focused on the core development duties is somehow inferior to those with more general roles. On the contrary, diversity in all its guises is a good thing for a team to have. Customers pay for the functionality, not the development process, so if anything such developers generate more value than I do.

Process or Product First?

I’ve really struggled to categorise these two approaches succinctly. The line between them seems to come down to whether you develop a process first and then automate it with a product, or start automating the process directly with a product. I’ve only ever worked in the former way, essentially by building a set of scripts on my local workstation that carve out the process (see “Layered Builds” and “Wrapper Scripts”). I then get to debug the process as much as possible before attempting to automate it, by which time the only (hopefully minor) differences should be down to the environment. This also has the nice side-effect that pretty much the entire build process then lives within the repo itself and so is versioned with the code [1].
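
To make that concrete, the kind of top-level wrapper script I mean might look something like the sketch below - the step names and folder layout are purely illustrative:

  :: build.cmd - a minimal sketch of a wrapper script; each step
  :: lives in its own script so it can also be run in isolation.
  @echo off
  setlocal
  set root=%~dp0

  call "%root%scripts\clean.cmd"      || exit /b 1
  call "%root%scripts\compile.cmd" %* || exit /b 1
  call "%root%scripts\test.cmd" %*    || exit /b 1
  call "%root%scripts\package.cmd" %* || exit /b 1

The point is not the script itself, but that because it lives alongside the code the build server invokes exactly the same entry point as the developer does.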

Although I don’t know for sure, what I suspect has happened here is that the project got started using Visual Studio, which keeps the developers busy. Then the process of creating a build pipeline starts by picking a CI technology, such as Jenkins or TeamCity, and stitching together the building blocks using the CI tool’s UI. Because the developer’s role stops at getting the acceptance tests passing, the process beyond that becomes someone else’s responsibility. I’m sure the developers helped debug the pipeline at some point, but I’d wager it had to be done on the build server.

In the modern agile world, where we start with a walking skeleton, is it preferable to get the walking skeleton’s build automated first, or to get a solid, isolated development process going?

Build Differences 

The difference between these two approaches was foremost in my mind today as I spent the afternoon trying to understand why the Debug and Release build configurations behaved differently. I tried to create a simple script that would replicate what the build pipeline is doing and found that the Debug build worked fine locally, but the Release build failed. However, the converse was true on the TFS build farm. What this means is that the developers work solely with debug code and the build pipeline works solely with release code. While in practice this should not be too bothersome, it does mean that any problems that only show up once the CI pipeline gets its hands on your code cannot be easily replicated locally.

The first problem turned up straight away: building on the command line via MSBuild was fine (which explained why the build machine also passed), whilst building through Visual Studio failed during compilation. It turned out that you had to build the solution twice to make Visual Studio happy. The reason no one else had noticed (or, more likely, had forgotten about the problem) was that they weren’t following the sort of practice I advocate in “Cleaning the Workspace”.
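
For reference, the clean step I have in mind can be as blunt as deleting every bin and obj folder, so that stale outputs cannot mask a problem like this - a rough sketch, assuming the standard Visual Studio folder layout:

  :: clean.cmd - sketch; removes all build output so that nothing
  :: stale can make the build look healthier than it really is.
  @echo off
  for /d /r . %%d in (bin,obj) do @if exist "%%d" rd /s /q "%%d"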

That first problem turned out to be a simple missing NuGet package dependency. The problem this afternoon was much harder to diagnose because I knew nothing about TFS and its build farm. As with so many brownfield projects, the person you really want to speak to left just before you started, so I had to figure out for myself why the Wix setup project was using $(SolutionDir) to construct the binaries path for a debug build, but $(OutDir)_PublishedWebsites for a release build. After a little googling I stumbled across the blog post “Override the TFS Team Build OutDir property in TFS 2013”, which put me on the right track.

It seems that a common practice with TFS farm builds is to put all the binaries in one folder, and this can be achieved by overriding the $(OutDir) variable on the MSBuild command line. This led to me modifying my script so that a debug build executes like this:

> msbuild.exe Solution.sln /v:m
  "/p:Configuration=Debug" "/p:Platform=Any CPU"

…whilst a release build would be this:

> msbuild.exe Solution.sln /v:m
  "/p:Configuration=Release" "/p:Platform=Any CPU"
  "/p:OutDir=%root%\Build\Release\"

Confidence Trick

Clearly the team has coped admirably without the kind of build script I’m trying to put together, so it’s hardly essential. However, I personally feel uncomfortable developing without a tool that lets me quickly run through a build and deployment so that I can do some local system-level testing [2]. I like to refactor heavily, and to have confidence in the changes I’m making I need the tools readily available; otherwise I’m tempted not to bother.

Whether the pipeline is more maintainable for leveraging the automation product to do more of the work remains to be seen. I’m certainly looking forward to seeing how this team structure plays out, and in the meantime I may learn to trust modern build products a bit more and perhaps let go of one or two old-fashioned responsibilities in the process.

 

[1] When I first got to use Jenkins I wondered how easy it would be to keep the tool’s configuration in the VCS - it turns out to be trivial. I wrote a simple script that uses xcopy /s to copy the config.xml files from the Jenkins folder into a suitable folder in our repo, ready to check in. Whilst this is not the entire Jenkins configuration, it would be enough to help us get a replacement Jenkins instance up and running quickly, which is one of the reasons for doing it.
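
The script amounted to little more than the following - the Jenkins and repo paths here are illustrative:

  :: backup-jenkins-config.cmd - sketch; copies every config.xml
  :: (including the per-job ones) into the repo, preserving the
  :: folder structure, ready to be checked in.
  @echo off
  xcopy /s /y "%JENKINS_HOME%\config.xml" "C:\repo\admin\jenkins\"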

[2] Sadly the current setup relies on shared infrastructure, e.g. databases and message queues, so there is still some work to do if total isolation is to be achieved.
