Monday 22 August 2016

Sharing Code with Git Subtree

The codebase I currently work on is split into a number of repositories. For example the infrastructure and deployment scripts are in separate repos as are each service-style “component”.

Manual Syncing

To keep things moving along the team decided that the handful of bits of code that were shared between the two services could easily be managed by a spot of manual copying. By keeping the shared code in a separate namespace it was also partitioned off to help make it apparent that this code was at some point going to be elevated to a more formal “shared” status.

This approach was clearly not sustainable but sufficed whilst the team was still working out what to build. Eventually we reached a point where we needed to bring the logging and monitoring stuff in-sync and I also wanted to share some other useful code like an Optional<T> type. It also became apparent that the shared code was missing quite a few unit tests as well.

Share Source or Binaries?

The gut reaction to such a problem in a language like C# would probably be to hive off the shared code into a separate repo and create another build pipeline for it that would result in publishing a package via a NuGet feed. And that is certainly what we expected to do. However the problem was where to publish the packages to as this was closed source. The organisation had its own license for an Enterprise-scale product but it wasn’t initially reachable from outside the premises where our codebase lay. Also there were some problems with getting NuGet to publish to it with an API key that seemed to lay with the way the product’s permissions were configured.

Hence to keep the ball rolling we decided to share the code at the source level by pulling the shared repo into each component’s solution. There are two common ways of doing this with Git – subtrees and submodules.

Git Submodules

It seemed logical that we should adopt the more modern submodule approach as it felt easier to attach, update and detach later. It also appeared to have support in the Jenkins 1.x plugin for doing a recursive clone so we wouldn’t have to frig it with some manual Git voodoo.

As always there is a difference between theory and practice. Whilst I suspect the submodule feature in the Jenkins plugin works great with publicly accessible open-source repos it’s not quite up to scratch when it comes to private repos that require credentials. After much gnashing of teeth trying to convince the Jenkins plugin to recursively clone the submodules, we conceded defeat assuming that we’re another victim of JENKINS-20941.

Git Subtree

Given that our long term goal was to move to publishing a NuGet feed we decided to try using a Git subtree instead so that we could at least move forward and share code. This turned out (initially) to be much simpler because for tooling like Jenkins it appears no different to a single repo.

Our source tree looked (unsurprisingly) like this:

<solution>
  +- src
     +- app
     +- shared-lib
        +- .csproj
        +- *.cs

All we needed to do was replace the shared-lib folder with the contents of the new Shared repository.

First we needed to set up a Git remote. Just as the remote main branch of a cloned repo goes by the name origin/master, so we set up a remote for the Shared repository’s main branch:

> git remote add shared https://github/org/Shared.git

Next we removed the old shared library folder:

> git rm src\shared-lib

…and grafted the new one in from the remote branch:

> git subtree add --prefix src/shared shared master --squash

This effectively takes the shared/master branch and links it further down the repo source tree to src/shared which is where we had it before.

However the organisation of the new Shared repo is not exactly the same as the old shared-lib project folder. A single child project usually sits in it’s own folder, but a full-on repo has it’s own src folder and build scripts and so the source tree now looked like this:

<solution>
  +- src
     +- app
     +- shared
        +- src
           +- shared-lib
              +- .csproj
              +- *.cs

There is now two extra levels of indirection. First there is the shared folder which corresponds to the external repo, plus there is that repo’s src folder.

At this point all that was left to do was to fix up the build, i.e. fix up the path to the shared-lib project in the Visual Studio solution file (.sln) and push the changes.

We chose to use the --squash flag when creating the subtree as we weren’t interested in seeing the entire history of the shared library in the solution’s repository.

Updating the Subtree

Flowing changes from the parent repo down into the subtree of the child repo is as simple as a fetch & pull:

> git fetch shared master
> git subtree pull --prefix src/shared shared master --squash

The latter command is almost the same as the one we used earlier but we pull rather than add. Once again we’re squashing the entire history as we’re not interested in it.

Pushing Changes Back

Naturally you might want to make a change in the subtree in the context of the entire solution and then push it back up to the parent repo. This is doable but involves using git subtree push to normalise the change back into the folder structure of the parent repo.

Personally we decided just to make the changes test-first in the parent and always flow down to the child. In the few cases the child solution helped in debugging we decided to work on the fix in the child solution workspace and then simply manually copy the change over to the shared workspace and push it out through the normal route. It’s by no means optimal but a NuGet feed was always our end game so we tolerated the little bit of friction in the short term.

The End of the Road

If we were only sucking in libraries that had no external dependencies themselves (up to that point our small shared code only relied on the .Net BCL) we might have got away with this technique for longer. But in the end the need to pull in 3rd party dependencies via NuGet in the shared project pushed it over the edge.

The problem is that NuGet packages are on a per-solution basis and the <HintPath> element in the project file assumes a relative path (essentially) from the solution file. When working in the real repo as part of the shared solution it was “..\..\packages\Xxx”, but when it’s part of the subtree based solution it needed to be two levels further up as “..\..\..\..\packages\Xxx”.

Although I didn’t spend long looking I couldn’t find a simple way to easily overcome this problem and so we decided it was time to bite-the-bullet and fix the real issue which was publishing the shared library as a NuGet feed.

Partial Success

This clearly is not anything like what you’d call an extensive use of git subtree to share code, but it certainly gave me a feel for it can do and I think it was relatively painless. What caused us to abandon it was tooling specific (the relationship between the enclosing solution’s NuGet packages folder and the shared assembly project itself) and so a different toolchain may well fair much better if build configuration is only passed down from parent to subtree.

I suspect the main force that might deter you from this technique is how much you know, or feel you need to know, about how git works. When you’re inside a tool like Visual Studio it’s very easy to make a change in the subtree folder and check it in and not necessarily realise you’re modifying what is essentially read-only code. When you next update the subtree things get sticky. Hence you really need to be diligent about your changes and pay extra attention when you commit to ensure you don’t accidentally include edits within the subtree (if you’re not planning on pushing back that way). Depending on how experienced your team are this kind of tip-toeing around the codebase might be just one more thing you’re not willing to take on.

1 comment:

  1. An issue we have hit with subtrees is that you can't move them or rename the directories they live in without causing issues. Git handles renames and moving of regular directories so well that it lulls you into a false sense of security.

    The problems only strike when you try to push or pull the subtree at which point git tells you there is no such subtree, which may be many commits after the original move occurred. The only solution I have found is to remove the directory and re-add as a subtree. Messy.

    ReplyDelete