Wednesday 18 December 2019

Branching 0 – Git 1

My recent tirade against unnecessary branching – “Git is Not the Problem” – might have given the impression that I don’t appreciate the power git provides. That’s not true, and hopefully the following example shows both why I appreciate that power and why I dislike being put in the position of needing it in the first place.

The Branching Strategy

I was working in a small team with a handful of experienced developers making an old C++/ATL based GUI more accessible for users with disabilities. Given the codebase was very mature and maintenance was minimal, our remit only extended so far as making the minimal changes we needed to both the code and resource files. Hence this effectively meant no refactoring – a strictly surgical approach.

The set-up involved an integration branch per project, with us on one and the client’s team on another – master was reserved for releases. However, as they were using Stash for their repos they also wanted us to make use of its ability to create a separate pull request (PR) for every feature. This meant we needed to create an independent branch for every single feature; we didn’t have permission to push directly to the integration branch even if we had wanted to.

The Bottleneck

For those who haven’t had the pleasure of working with Visual Studio and C++/ATL on a native GUI with other people, there are certain files which tend to be a bottleneck, most notably resource.h. This file contains the mapping from the symbols (nay #defines) to the resource IDs. Whenever you add a new resource, such as a localizable string, you add a new symbol and bump the two “next ID” counters at the bottom. This project ended up with us adding a lot of new resource strings for the (localizable) annotations we used to make the dialog controls more accessible [1].

Aside from the more obvious bottleneck resource.h creates when a team is editing it, it has one other undesirable effect – project rebuilds. Being a header file, and one that has a habit of being included across most of the codebase (whether intentionally or not), any change to it means most of the codebase needs re-building. On a GUI of the size we were working on, using the development VMs we had been provided with, that amounted to 45 minutes of thumb twiddling every time it changed. As an aside, we couldn’t use the built-in Visual Studio resource editor either as the file had been edited by hand for so long that whenever the editor saved it you ended up with the diff from hell [2].

The Side-Effects

Consequently we ran into two big problems working on this codebase, both essentially linked to that one file. The first was that adding new resources meant updating the file in a way that was almost guaranteed to generate a merge conflict with every other branch, because most tasks meant adding new resources. Even though we tried to coordinate ourselves by introducing padding into the file and artificially bumping the IDs, we still ended up causing merge conflicts most of the time.

In hindsight we probably could have made this idea work if we had added a huge amount of padding up front and reserved a large range of IDs, but we knew there was another team adding GUI features on another branch and we expected to integrate with them more often than we did. (We had no real contact with them and the plethora of open branches made it difficult to see what code they were touching.)

The second issue was around the rebuilds. While you can git checkout -b <branch> to create your feature branch without touching resource.h again, the moment you git pull the integration branch and merge it you’re going to have to take the hit [3]. Once your changes are approved you push your feature branch to the git server, which performs the merge into the integration branch for you and moves it forward.

Back on your own machine you want to re-sync by switching back to the integration branch, which I’d normally do with:

> git checkout <branch>
> git pull --ff-only

…except the first step restores the old resource.h, only for the second step to bring it straight back to the version you just had! And now we’ve got another 45 minute rebuild on our hands [3].

Git to the Rescue

It had been some years since any of us had used Visual Studio on such a large GUI and therefore it took us a while to work out why the codebase always seemed to want rebuilding so much. Consequently I looked to the Internet to see if there was a way of going from my feature branch back to the integration branch (which should be identical from a working copy perspective) without any files being touched. It’s git, of course there was a way, and “Fast-forwarding a branch without checking it out” provided the answer [4]:

> git fetch origin <branch>:<branch>
> git checkout <branch>

The trick is to fetch the branch changes from upstream and point the local copy of that branch to its tip. Then, when you do checkout, only the branch metadata needs to change as the versions of the files are identical and nothing gets touched (assuming no other upstream changes have occurred in the meantime).
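
To make that concrete, here’s a minimal sketch of the round trip, assuming the integration branch is called develop and the feature branch feature/dialog-labels – both names are made up for illustration, so substitute your own:

> git checkout feature/dialog-labels
> git fetch origin develop:develop
> git checkout develop

The first line simply shows we’re still sat on the feature branch after the PR has been merged server-side; the fetch fast-forwards the local develop ref to match the server, and the final checkout then only has to rewrite the branch metadata.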

Discontinuous Integration

In a modern software development world where we strive to integrate as frequently as possible with our colleagues, it’s issues like these that remind us what some of the barriers are for some teams. Visual C++ has been around a long time (since 1993) so this problem is not new. It is possible to break up a GUI project – it doesn’t need to have a monolithic resource file – but that requires time & effort to fix, and needs to be done in a timely fashion to reap the rewards. In a product this old, which is effectively on life-support, that is never going to happen now.

As Gerry Weinberg once said, “Things are the way they are because they got that way”, which is little consolation when the clock is ticking and you’re watching the codebase compile, again.

 

[1] I hope to write up more on this later as the information around this whole area for native apps was pretty sparse and hugely diluted by the same information for web apps.

[2] Luckily it’s a fairly easy format but laying out controls by calculating every window rectangle is pretty tedious. We eventually took a hybrid approach for more complex dialogs where we used the resource editor first, saved our code snippet, reverted all changes, and then manually pasted our snippet back in, thereby keeping the diff minimal.

[3] Yes, you can use touch to tweak the file’s timestamp but you need to be sure you can get away with that by working out what the effects might be.

[4] As with any “googling” knowing what the right terms are, to ask the right question, is the majority of the battle.

Monday 16 December 2019

Git is Not the Problem

Git comes in for a lot of stick for being a complicated tool that’s hard to learn, and its critics are right: git is a complicated tool. But it’s a tool designed to solve a difficult problem – many disparate people collaborating on a single product in a totally decentralized fashion. However, many of us don’t need to work that way, so why are we using the tool in a way that makes our lives more difficult?

KISS

For my entire professional programming career, which now spans over 25 years, and in my personal endeavours, I have used a version control system (VCS) to manage the source code. In that time, for the most part, I have worked in a trunk-based development fashion [1]. That means all development goes on in one integration branch and the general philosophy for every commit is “always be ready to ship” [2]. As you might guess, feature toggles (in many different guises) play a significant part in achieving that.

A consequence of this simplistic way of working is that my development cycle, and therefore my use of git, boils down to these few steps [3]:

  • clone
  • edit / build / test
  • diff
  • add / commit
  • pull
  • push

There may occasionally be a short inner loop where a merge conflict shows up during the pull (integration) phase which causes me to go through the edit / diff / commit cycle again, but by and large conflicts are rare due to close collaboration and very short change cycles. Ultimately though, of the gazillions of commands git supports, I mostly use just those 6. As you can probably guess, despite using git for nearly 7 years, I actually know very little about it (command-wise) [4].
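
For the visually minded, a single pass through that loop looks something like this – the repository URL, file name, and commit message are all made up purely for illustration:

> git clone https://example.org/acme/product.git
> git diff
> git add src/report.cpp
> git commit -m "Fix rounding in monthly report totals"
> git pull
> git push

(The clone naturally only happens once at the very start; everything from the edit / build / test step onwards repeats for every change.)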

Isolation

Where I see people getting into trouble, and subsequently venting their anger, is when branches are involved. This is not a problem specific to git though; you see it crop up with any VCS that supports branches, whether it be ClearCase, Perforce, Subversion, etc. Hence the tool is not the problem, the workflow is. And that commonly stems from a delivery process mandated by the organisation, meaning that ultimately the issue is an organisational one, not the tooling per se.

An organisation which seeks to reduce risk by isolating work (and by extension its people) onto branches is increasing the delay in feedback, thereby paradoxically increasing the risk of integration, or so-called “merge debt”. A natural side-effect of making it harder to push through changes is that people start batching up work in an attempt to boost “efficiency”. The trick is to go in the opposite direction and break things down into smaller units of work that are easier to produce and quicker to improve. Balancing production code changes with a solid investment in test coverage and automation reduces that risk further, along with collaboration-boosting techniques like pair and mob programming.

Less is More

Instead of enforcing a complicated workflow and employing complex tools in the hope that we can remain in control of our process, we should instead seek to keep the workflow simple so that our tools remain easy to use. Git was written to solve a problem most teams don’t have, as they have neither the volume of distributed people nor the complexity of product to deal with. Organisations that do have complex codebases cannot expect to dig themselves out of their hole simply by introducing a more powerful version control tool; it will only increase the cost of delay while bringing a false sense of security as programmers work in the dark for longer.

 

[1] My “Branching Strategies” article in ACCU’s Overload covers this topic if you’re looking for a summary.

[2] This does not preclude the use of private branches for spikes and/or release branches for hotfix engineering when absolutely needed. #NoAbsolutes.

[3] See “In The Toolbox - Commit Checklist” for some deeper discussion about what goes through my head during the diff / commit phase.

[4] I pondered including “log” in the list for when doing a spot of software archaeology but that is becoming much rarer these days. I also only use “fetch” when I have to work with feature branches.

Friday 13 December 2019

Choosing “a” Database, not “the” Database

One thing I’ve run across a few times over the years is the notion that an application or system has one, and only one, database product. It’s as if the answer to the question about where we should store our data must be about where we store “all” our data.

Horses for Courses

I’ve actually touched on this topic before in “Deferring the Database Choice”, where our team tried to put the question off for as long as possible because of a previously myopic mindset. There was also a really strong possibility that we would need two different styles of database – relational and document-oriented – because we had two different types of data to store with very different constraints.

In that instance, after eventually working out what we really needed, we decided to look at a traditional relational database for the transactional data [1], while we looked towards the blossoming NoSQL crowd for the higher-volume non-transactional data. While one might have sufficed for both purposes, the organisational structure and lack of operational experience at the time meant we didn’t feel comfortable putting all our eggs in that one NoSQL basket up front.

As an aside the Solution Architect [2] who was assigned to our team by the client definitely seemed out of their comfort zone with the notion that we might want to use different products for different purposes.

Platform Investment

My more recent example of this line of reasoning – the “one size fits all” fallacy – came about while doing some consulting at a firm in the insurance sector, an area where mainframes and legacy systems pervade the landscape.

In this particular case I had been asked to help advise on the architecture of a few new internal services they were planning. Two were really just caches of upstream data, designed to reduce the per-call cost of 3rd party services, while the third would serve up flood-related data which was due to be incorporated into insurance pricing.

To me they all seemed like no-brainers. Even the flood data service just felt like a simple web service (maybe REST) that looks up the data in a document-oriented database using the postcode as the key. Neither the volume of requests nor the size of the dataset seemed remarkable in any way, and the same went for the other caches. The only thing that felt like it deserved any real thought was the versioning of the data, if that was even a genuine consideration. (I was mostly trying to find any potential risks that might offset the apparent lack of complexity.)

Given the company already called out from its mainframe to other web services they had built, this was a solved problem, and therefore I felt there was no reason not to start knocking up the flood data service. Given its simplicity it could be done outside-in, so that they’d have their first microservice built TDD-style (an approach they wanted to try out anyway). They could even plug it in pretty quickly and just ignore the responses back to the mainframe in the short term so that they could start getting a feel for the operational aspects. In essence it seemed the perfect opportunity for the department to learn many new skills.

An Undercurrent

However, while I saw this as a low-risk venture, there were questions from further up, effectively about choosing the database. I suspected there were concerns about the cost, but some rudimentary calculations based around a three-node cluster with redundant disks versus storage for the mainframe showed that they weren’t even in the same ballpark – and we’re not even talking SSDs here either. (This also ignores the fact that they were close to maxing out the mainframe anyway.)

One of the great things about databases in these modern times is that you can download the binaries, fire one up, and just get playing. Given the dataset fitted the document-oriented paradigm and there were no transactions to speak of, I suggested they pick either MongoDB or Couchbase and just get started, as it was the paradigm they most needed to get acquainted with; the specific vendor was (to me) less of a concern in the shorter term as the data model was so simple.
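
As an illustration of just how low the barrier to entry is, the following sketch – with entirely made-up paths and data, and assuming a circa-2019 MongoDB Community tarball has been unpacked locally – fires up a server and round-trips a single postcode-keyed document:

$ mkdir -p ~/flood-poc/data
$ ~/flood-poc/mongodb/bin/mongod --dbpath ~/flood-poc/data --port 27017 &
$ ~/flood-poc/mongodb/bin/mongo --eval 'db.flood.insert({ _id: "SW1A 1AA", riskBand: "low" }); printjson(db.flood.findOne({ _id: "SW1A 1AA" }))'

Couchbase offers a similarly quick start, which is really the point – the evaluation could have begun that same afternoon.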

Nevertheless, rather than build something first and get a feel for what makes most sense, they wanted to invite the various big NoSQL vendors in and discuss contracts and products up-front. So I arranged for the three main contenders at the time to visit the company’s offices and give a pitch, followed by some Q&A time for the management to ask any burning questions. It was during the first of these three pitches that I began to realise where the disconnect lay between my vision and theirs.

I had always been working on the assumption that the company was most comfortable with mainframes and relational databases, and that they wanted to step outside that and move to a less monolithic architecture, perhaps using the Strangler Pattern to break out the peripheral services into independent, self-contained ones. They, however, still saw a single database product sitting at the heart. Yes, the services might be built separately, and the data may well be partitioned via namespaces or collections or whatever, but fundamentally the assumption was that the data storage was still effectively monolithic.

A False Economy

In retrospect I shouldn’t really have been that surprised. The reason the mainframe had survived for so long was that the data was seen as the crown jewels, and the problems of redundancy and backup had been solved long ago and were pretty robust. In fact if anything went wrong the vendor could helicopter some experts in (which they had done in the past). This was not the level of service offered by the new kids on the block, and the company was still far from getting comfortable with the cloud hosting and managed service providers which were starting to spring up.

Hence, where I was looking at the somewhat disposable nature of the new services purely as an opportunity for learning, others higher up were looking at it as a stepping stone to moving all their data across to another platform. Coupled with this was the old-fashioned view that the decision needed to be made up-front and needed to be the right one from the off [3].

A Different Investment

Even with this misconception acknowledged, and the shining cost savings to be had, there was still a heavy reluctance to go with something new. I believe that in the end they put their investment into more mainframe storage instead of investing in their people and the organisation’s longer term future.

 

[1] There was definitely an element of “availability bias” here as the organisation had a volume licensing agreement with a relational database vendor.

[2] A role which highlighted their Ivory Tower approach at the time but has since fallen away as architecture has thankfully started leaning more towards shared ownership.

[3] Some of the impetus for “Don’t Fail Fast, Learn Cheaply” came from conversations I had with this organisation about their approach to career development.

Monday 9 December 2019

Automating Windows VM Creation on Ubuntu

TL;DR you can find my resulting Oz and Packer configuration files in this Oz gist and this Packer gist on my GitHub account.

As someone who has worked almost exclusively on Windows for the last 25 years I was somewhat surprised to find myself needing to create Windows VMs on Linux. Ultimately these were to be build server agents and therefore I needed to automate everything from creating the VM image, to installing Windows, and eventually the build toolchain. This post looks at the first two aspects of this process.

I did have a little prior experience with Packer, but that was on AWS where the base AMIs you’re provided have already got you over the initial OS install hurdle and you can focus on baking in your chosen toolchain and application. This time I was working on-premise and so needed to unpick the Linux virtualization world too.

In the end I managed to get two approaches working – Oz and Packer – on the Ubuntu 18.04 machine I was using. (You may find these instructions useful for other distributions but I have no idea how portable this information is.)

QEMU/KVM/libvirt

On the Windows-as-host side, virtualization boiled down (until fairly recently) to a few classic options, such as Hyper-V and VirtualBox. The addition of Docker-style Windows containers, along with Hyper-V containers, has padded things out a bit more but to me it’s still fairly manageable.

In contrast, on the Linux front, where this technology has been maturing for much longer, we have far more choice and, ultimately, for a Linux n00b like me [1], far more noise to wade through on top of the usual “which distribution are you running” type questions. In particular the fact that any documentation on “virtualization” could be referring to containers or hypervisors (or something in-between), when you’re only concerned with hypervisors for running Windows VMs, doesn’t exactly aid comprehension.

Luckily I was pointed towards KVM as a good starting point on the Linux hypervisor front. QEMU is one of those minor distractions as it can provide full emulation, but it also provides the other bit KVM needs to be useful in practice – device emulation. (If you’re feeling nostalgic you can fire up an MS-DOS recovery boot-disk from “All Boot Disks” under QEMU/KVM with minimal effort, which gives you a quick sense of achievement.)

What I also found mentioned in the same breath as these two was libvirt – an “add-on layer” that sits on top of the underlying technology so that you can use more technology-agnostic tools. Confusingly you might notice that Packer doesn’t mention libvirt, presumably because it already has providers that work directly with the lower layer.

In summary, using apt, we can install this lot with:

$ sudo apt install qemu qemu-kvm libvirt-bin bridge-utils virt-manager -y
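
Before going any further it’s worth a quick sanity check that the CPU’s virtualization extensions are visible and that libvirt is up and running. The kvm-ok tool lives in the separate cpu-checker package, and you may need to be in the libvirt group (or use sudo) for virsh to connect:

$ egrep -c '(vmx|svm)' /proc/cpuinfo
$ sudo apt install cpu-checker -y
$ kvm-ok
$ virsh list --all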

Windows ISO & Product Key

We’re going to need a Windows ISO along with a related product key to make this work. While in the end you’ll need a proper license key, I found the Windows 10 Evaluation Edition was perfect for experimentation as each VM only lasts a few minutes before you bin it and start all over again.

You can download the latest Windows image from the MS downloads page which, if you’ve configured your browser’s User-Agent string to appear to be from a non-Windows OS, will avoid all the sign-up nonsense. Alternatively google for “care.dlservice.microsoft.com” and you’ll find plenty of public build scripts that have direct download URLs which are beneficial for automation.
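
If you’d rather script the download than click through the portal, something along these lines does the trick – the URL below is just a placeholder for whichever direct link you unearth, and the User-Agent string is simply one plausible non-Windows value:

$ curl -L -A "Mozilla/5.0 (X11; Linux x86_64)" -o windows10-eval.iso "<direct-download-url>"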

Although the Windows 10 evaluation edition doesn’t need a specific license key you will need a product key to stick in the autounattend.xml file when we get to that point. Luckily you can easily get that from the MS KMS client keys page.

Windows Answer File

By default Windows presents a GUI to configure the OS installation, but if you give it a special XML file known as autounattend.xml (in a special location, which we’ll get to later) all the configuration settings can go in there and the OS installation will be hands-free.

There is a specific Windows tool you can use to generate this file, but an online version in the guise of the Windows Answer File Generator produced a working file with fairly minimal questions. You can also generate one for different versions of the Windows OS, which is important because, although there are many examples out there on the Internet, it feels like pot luck as to whether they will work or not; the format changes slightly between releases and it’s not easy to discover where the impedance mismatch lies.

So, at this point we have our Linux hypervisor installed, and downloaded a Windows installation .iso along with a generated autounattend.xml file to drive the Windows install. Now we can get onto building the VM, which I managed to do with two different tools – Oz and Packer.

Oz

I was flicking through a copy of Mastering KVM Virtualization and it mentioned a tool called Oz which was designed to make it easy to build a VM along with installing an OS. More importantly it listed support for most Windows editions too! Plus it’s been around for a fairly long time so it’s relatively mature. You can install it with apt:

$ sudo apt install oz -y

To use it you create a simple configuration file (.tdl) with the basic VM details such as CPU count, memory, disk size, etc. along with the OS details, .iso filename, and product key (for Windows), and then run the tool:

$ oz-install -d2 -p windows.tdl -x windows.libvirt.xml

If everything goes according to plan you end up with a QEMU disk image and an .xml file for the VM (called a “domain”) that you can then register with libvirt:

$ virsh define windows.libvirt.xml

Finally you can start the VM via libvirt with:

$ virsh start windows-vm
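
To keep an eye on the unattended install you can check the domain’s state and attach to its graphical console. Note that virt-viewer is a separate package (the virt-manager GUI installed earlier works just as well):

$ virsh list --all
$ virsh domdisplay windows-vm
$ sudo apt install virt-viewer -y
$ virt-viewer windows-vm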

I initially tried this with the Windows 8 RTM evaluation .iso and it worked right out of the box with Oz’s built-in template! However, when it came to Windows 10 the Windows installer complained about there being no product key, despite the Windows 10 template having a placeholder for it and the key being defined in the .tdl configuration file.

It turns out, as you can see from Issue #268 (which I raised in the Oz GitHub repo), that the Windows 10 template is broken. It seems the autounattend.xml file wants the key in the <UserData> section too. Luckily for me oz-install can accept a custom autounattend.xml file via the -a option, as long as we fill in any details manually, like the <AutoLogin> account username / password, the product key, and the machine name.

$ oz-install -d2 -p windows.tdl -x windows.libvirt.xml -a autounattend.xml

That Oz GitHub issue only contains my suggestions as to what I think needs fixing in the autounattend.xml file; I also have a personal gist on GitHub that contains both the .tdl and .xml files I successfully used. (Hopefully I’ll get a chance to submit a formal PR at some point so we can get it properly fixed; it also needs a tweak to the Python code as well I believe.)

Note: while I managed to build the basic VM I didn’t try to do any post-processing, e.g. using WinRM to drive the installation of applications and tools from the outside.

Packer

I had originally put Packer to one side because of difficulties getting anything working under Hyper-V on Windows but with my new found knowledge I decided to try again on Linux. What I hadn’t appreciated was quite how much Oz was actually doing for me under the covers.

If you use the Packer documentation [2] [3] and online examples you should happily get the disk image allocated and the VM to fire up in VNC, where it will sit waiting for you to configure the Windows install. However, after selecting your locale and keyboard you’ll probably find the disk partitioning step stumps you. Even if you follow some examples and put an autounattend.xml on a floppy drive you’ll still likely hit a <DiskConfiguration> error during set-up. The likely reason is that you don’t have the right Windows driver available for it to talk to the underlying virtual disk device (unless you’re lucky enough to pick an IDE based example).

One of the really cool things Oz appears to do is handle this nonsense, along with the autounattend.xml file, which it also slips into the .iso that it builds on-the-fly. With Packer you have to be more aware: you fetch the drivers yourself (they come as part of another .iso) and then mount that explicitly as another CD-ROM drive using the qemuargs section of the Packer builder config. (In my example it’s mapped as drive E: inside Windows.)

[ "-drive", "file=./virtio-win.iso,media=cdrom,index=3" ]

Luckily you can download the VirtIO drivers .iso from a Fedora page and stick it alongside the Windows .iso. That’s still not quite enough though: we also need to tell the Windows installer where our drivers are located, and we do that with a special section in the autounattend.xml file.
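
As a sketch, the driver .iso itself can be scripted too – the URL below reflects the Fedora project’s “stable-virtio” redirect at the time of writing, so double-check it still resolves before relying on it:

$ wget -O virtio-win.iso https://fedorapeople.org/groups/virt/virtio-win/direct-downloads/stable-virtio/virtio-win.iso

With that in place, the special section in question looks like this: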

<DriverPaths>
  <PathAndCredentials wcm:action="add" wcm:keyValue="1">
    <Path>E:\NetKVM\w10\amd64\</Path>

Finally, in case you’ve not already discovered it, the autounattend.xml file is presented by Packer to the Windows installer as a file in the root of a floppy drive. (The floppy drive and extra CD-ROM drives both fall away once Windows has bootstrapped itself.)

"floppy_files":
[
  "autounattend.xml",

Once again, as mentioned right at the top, I have a personal gist on GitHub that contains the files I eventually got working.

With the QEMU/KVM image built we can then register it with libvirt by using virt-install. I thought the --import switch would be enough here as we now have a runnable image, but that option appears to be for a different scenario [4]; instead we have to take two steps – generate the libvirt XML config file using the --print-xml option, and then apply it:

$ virt-install --vcpus ... --disk ... --print-xml > windows.libvirt.xml
$ virsh define windows.libvirt.xml

Once again you can start the finalised VM via libvirt with:

$ virsh start windows-vm

Epilogue

While having lots of documentation is generally A Good Thing™, when it’s spread out over a considerable time period it’s sometimes difficult to know if the information you’re reading still applies today. This is particularly true when looking at other people’s example configuration files alongside reading the docs. The long-winded route might still work but the tool might also do it automatically now if you just let it, which keeps your source files much simpler.

Since getting this working I’ve seen other examples which suggest I may have fallen foul of this myself and what I’ve written up may still be overly complicated! Please feel free to use the comments section on this blog, or my gists, to pass on your own wisdom in this area to other travellers.

 

[1] That’s not entirely true. I ran Linux on an Atari TT and a circa v0.85 Linux kernel on a 386 PC in the early-to-mid ‘90s.

[2] The Packer docs can be misleading. For example it says the disk_size is in bytes and you can use suffixes like M or G to simplify matters. Except they don’t work and the value is actually in megabytes. No wonder a value of 15,000,000,000 didn’t work either :o).

[3] Also be aware that the version of Packer available via apt is only 1.0.x and you need to manually download the latest 1.4.x version and unpack the .zip. (I initially thought the bug in [2] was down to a stale version but it’s not.)

[4] The --import switch still fires up the VM as it appears to assume you’re going to add to the current image, not that it is the final image.