Thursday, 9 January 2014

Cleaning the Workspace

I have a habit - I like a clean workspace. No, I don’t mean my office desk or the worktop in the kitchen, but the workspace where I make my code changes. In ClearCase it was called a view and in Subversion the working copy, i.e. the place where you edit your code, merge, build it, run tests, etc.

Whilst I’m making the edits I don’t mind what state it’s in. But, when it comes time to integrate my colleagues’ changes - merge, build and run the smoke tests before committing my own - I want the workspace spotless. I don’t want any residual files or folders lying around that might taint the codebase and give me that “well, it works on my machine” excuse. When I’ve pushed my changes I fully expect the build machine to remain green and my colleagues to suffer no unnecessary interruption.

Back at the start of last year I wrote a post called “Layered Builds” that had the following outline of what I’d expect a Continuous Integration process to use:-

“For repeatable builds you need to start with a clean slate. If you can afford to that would be to start from an empty folder and pull the entire source tree from the version control system (VCS). However, if you’re using something like ClearCase with snapshot views, that could take longer than the build itself! The alternative is to write a “clean” script that runs through the source tree deleting every file not contained in the VCS repo, such as the .obj, .pdb, .exe, etc. files. Naturally you have to be careful with what you delete, but at least you have the VCS repo as a backup whilst you develop it.”

This “clean script” that I’m referring to is nothing more than a batch file that does a bunch of deletes of files and folders, e.g.

rem Recursively delete the common build artefacts by extension
del /s *.ncb 2> nul
del /s *.obj 2> nul
del /s *.pdb 2> nul
. . .
rem Remove the output folders; a plain rmdir (no /s) only succeeds
rem once a folder is empty, i.e. contains nothing but build output
for /d /r %%d in (obj) do rmdir "%%d" 2> nul
for /d /r %%d in (bin) do rmdir "%%d" 2> nul

For a more concrete example look at the script contained in my Build Scripts (or directly on GitHub).

Note that whilst I might do a recursive delete (del /s) of a build artefact like a .obj file, I almost never do a recursive delete of the artefact folders. This ensures I only clean up the actual build output and not any log files, batch files, test results or other stuff I might have created as part of my ongoing changes.
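As a contrived illustration, suppose I’ve left a notes.txt of my own inside a bin folder (the file name is purely hypothetical):

rem A recursive delete would silently take notes.txt with it:
rem   rmdir /s /q bin
rem A plain rmdir fails harmlessly because the folder isn't empty,
rem so only genuinely empty output folders actually disappear:
rmdir bin 2> nul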

Build Detritus

My current project uses Jenkins and MSBuild, neither of which I’d used before, so someone else set up the skeleton of the build. Although the initial steps blew away the output folders, anyone who has ever used Visual Studio will know that its idea of “clean” falls way short of “spotless”. It caches a lot of data, from metadata like the .ncb files that Visual C++ uses for IntelliSense, through intermediate build output like the header files generated via the #import statement, right up to entire 3rd party packages pulled from NuGet. None of this gets blown away if you do a “Clean Solution” from Visual Studio.

Of course the metadata, like where my breakpoints point to (.suo) and the IntelliSense data (.ncb), should have absolutely no effect on the compiled binary artefacts. However a product like Visual Studio is a complex beast and it’s not always apparent which detritus contains build-dependent data and which doesn’t. Sometimes “stuff” just gets corrupted and you need to blow the lot away and start again [1]. So I err on the side of caution and try to remove as much as possible as often as possible without adversely affecting my productivity. Given the modern preference for short feedback loops it turns out there is very little I ever carry over from one integration to the next, except perhaps the occasional breakpoint or debugging command line.
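To give a flavour of what ends up in the script beyond the basic build output, these are the sorts of extra entries it might contain - the exact list obviously depends on the project and the entries here are only illustrative:

rem Visual Studio metadata (.suo files are hidden, hence /a:h)
del /s /a:h *.suo 2> nul
rem Intermediate headers generated by #import
del /s *.tlh 2> nul
del /s *.tli 2> nul
rem The solution-level NuGet package cache
rmdir /s /q packages 2> nul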

Old Habits Die Hard

Like all habits, you need to remind yourself why you’re doing it every now and again to ensure it still has value - perhaps all those weird build problems are now a thing of the past and I don’t need to do it anymore. Depending on your point of view, the good thing about being someone who religiously cleans their workspace before every integration is that you stand a good chance of being the one who finds the build problem the others didn’t know existed. On my current project the build machine also turned out to be ignorant…

One problem that I kept hitting early on was incorrectly configured NuGet package dependencies. ReSharper has this wonderful feature where it will automatically add the assembly reference and namespace “using” declaration when you start writing some code or paste in code from another assembly. The problem is that it’s not clever enough - it doesn’t fix up the NuGet package configuration when the reference is to a NuGet-sourced assembly. Consequently other developers would let ReSharper fix up their build, but when I used my clean script and blew away the NuGet package cache the build might then fail, because the referencing assembly could come earlier in the build order and the package might not have been cached at that point. Another problem was the C# code analysis step failing with a weird error when a build configuration had been cut-and-pasted and ended up pointing to a stale analysis output file - one that no longer existed the moment you cleaned up after yourself.
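To make the mismatch concrete, this is the shape of the problem, using Json.NET purely as a stand-in for whichever package was actually involved. ReSharper would happily add the reference to the .csproj:

<Reference Include="Newtonsoft.Json">
  <HintPath>..\packages\Newtonsoft.Json.5.0.8\lib\net45\Newtonsoft.Json.dll</HintPath>
</Reference>

But without the matching entry in that project’s packages.config, shown below, package restore has no reason to download the package on a spotless workspace, so the HintPath points at a folder that doesn’t exist yet:

<?xml version="1.0" encoding="utf-8"?>
<packages>
  <package id="Newtonsoft.Json" version="5.0.8" targetFramework="net45" />
</packages>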

Continuous Integration

The build machine should have been the one catching any problems, but it was never configured to wipe the slate clean before every build. Given that we were using Jenkins, which is pretty mature and has a mountain of plug-ins, I investigated what the options were. There is a plug-in to clean the workspace before running a job that I’ve since started using on all the other non-Git-repo related jobs. This picked up a couple of minor issues where we had been relying on one deployment overwriting another. We’d also been incorrectly analysing a stale TestResults.xml file from the unit test run, which nobody had spotted.

This plug-in works great except when the cost of recreating the workspace is high, such as in the case of the initial compilation step. The “master” Git repo we’re using is hosted remotely and has of late been suffering from a number of DDoS attacks [2]. That, coupled with its ever-growing size, means it takes some time to keep cloning the entire repo on every build. Consequently we now have our own step at the start of the build that runs our clean script instead. Yes, as I discovered later, I could have used “git clean -x -d -f” [3] to do the same job; however it has always been on the cards that we would move to another VCS where this won’t be an option, so I maintain the script instead.
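For reference, the dry-run form (-n) lists what would be removed before the -f form actually deletes anything:

rem List the untracked and ignored files and folders that would go
git clean -n -x -d
rem Now actually remove them
git clean -x -d -f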

Clean or Spotless?

This need to distinguish between merely “clean”, i.e. all build artefacts and non-user state removed, and “totally spotless”, where it’s the same as if you’d just cloned the repo, helps avoid some other ugly side-effects caused by auto-generated files.

Visual C# lacks the ability to mark a .cs file as generated, which is exactly what we do to inject the build number via an AssemblyVersionInfo.cs file. As a consequence, when you open Visual Studio you get little warning symbols on all these files telling you the file is missing, which is irritating. More annoying, though, is the popup dialog we get every time we open the solution when the web.config is missing, because that’s also auto-generated.

This means that some files I’d prefer to be cleaned up by default are now only purged if the “--all” switch is specified.
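A minimal sketch of how that might hang together in the script - the switch name is the one mentioned above, but everything else here is purely illustrative:

rem Normal clean: build output only (as per the script above)
. . .
rem Spotless clean: also purge the auto-generated files that Visual
rem Studio moans about when they're missing
if /i "%~1" == "--all" (
    del /s AssemblyVersionInfo.cs 2> nul
    del /s web.config 2> nul
)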

Habit Justified?

Like many of the habits programmers adopt, this one mostly unearths problems that would likely have surfaced later anyway. The question is how much later, and consequently how much longer it would take to diagnose and fix when the problem finally did appear. The longer the time between a mistake and its discovery, the harder it can become to pinpoint. This is especially true when you didn’t make the change yourself and your mental attitude is to always blame your own recent changes first rather than assume someone else is at fault.

I still don’t think it costs me much time, and ultimately the team wins, which I think is more important.

 

[1] This used to be a common problem with Visual C++. If Visual C++ started crashing randomly, then binning the .ncb and other cached data files usually did the trick. I eventually found that Visual C++ was generally pretty solid if you disabled the Navigation Bar at the top of the source files and never touched the Class Wizard. Fortunately things are much better these days (VC++ 9.0 was the last version I used on a daily basis).

[2] The beauty of Git is that, in theory, if the remote repo is down for any length of time we could just clone a repo on the build server and treat that as our new master until the remote one is back. Or we could just push changes between ourselves if the need really arose. In practice outages of any kind (local or remote) can be counted in hours, which even The Facebook Generation can cope with.

[3] This is the first time I’ve used Git and historically the version control systems I’ve used didn’t even allow you to ignore files (PVCS, SourceSafe, ClearCase), let alone provide support for resetting your workspace!