Tuesday, 20 May 2014

Developing With Git / Pushing to TFS

My current project is in the somewhat bizarre position of having to move from Git to TFS. Let’s not dwell on why enterprises make such ridiculous demands and instead focus on what we can do about it. In an ideal world the future would already be here and we’d be using VS2013 and the TFS servers would be the edition that supports Git natively (also 2013 IIRC). Sadly my client uses neither of those; we are using VS2010 and TFS 2010, which means we needed to find some kind of bridge, à la Git/SVN.

I discovered there are two main choices available: Git-TF and Git-TFS. From what I read on StackOverflow and various other blogs, Git-TF is pretty much out of the picture these days now that Microsoft are embracing Git themselves. Git-TFS, on the other hand, is still being actively maintained and has its own Google Group too, which receives some TLC.

Depending on your expectations and how you intend to use it you may find Git-TFS works great, or like me you may find there are enough quirks to make setting things up quite time consuming. The “cloning from TFS” scenario appears to be well catered for as this follows the existing Git/SVN model. The scenario we needed was to import our GitHub repo into a fresh TFS repo and then to continually import changes from the GitHub repo to TFS in the background, ideally automatically as part of our CI process.

This post mostly tackles the initial import as it’s taken some time just to get this working. At the moment the background refresh is done manually once a week until I can work out how to script it reliably.

Machine Setup

This was intended to be a fully automated process and so the first hurdle I ran into was accessing TFS from the build machine where the future Jenkins job would run. We already had the Git binaries installed, but it’s worth noting that you will get a warning if you are running Git v1.8.4, which we were, so I upgraded it.

There is no simple binaries package with just the TF.exe command line tool, but there is a good blog post that tells you how to get at the necessary minimum bits and pieces from the Team Explorer .ISO image for CI use. At least you don’t have to install the whole of Visual Studio just to do it.

Once I could “ping” the TFS servers with TF.exe I stored the login and password for accessing TFS in the Windows Credential Manager by following the instructions in this blog post. Note that this implies your machine is running something a little more modern than Windows XP/Server 2003.

With both the Git and TFS binaries installed I created a folder on the server for the git-tfs binaries, any scripts and the Git repo that would be used as the bridge:

> mkdir D:\git-tfs
> cd /d D:\git-tfs
> mkdir bin

I copied the git-tfs 0.19.2 binaries into the bin folder and then created a little batch file to adjust the PATH so that I would have the Git, TFS and Git-TFS binaries all easily accessible (I called it SetPath.cmd):

@set PATH=%PATH%;^
D:\git-tfs\bin;^
C:\Program Files (x86)\Git\bin;^
C:\Program Files (x86)\Microsoft Visual Studio 10.0\Common7\IDE

With the tooling in place we’re ready to create the TFS and Git repos.

Create the TFS Repo

First we create a folder to house the “bridge” repo and map the folder for TFS to use:

> mkdir repo
> tf workspace /new my_workspace
  /collection:http://tfs.example.org/tfs/MyCollection
  /noprompt
> tf workfold /map $/Project D:\git-tfs\repo

Next we create a folder for the working copy/branch and add a single empty file to act as the seed for Git-TFS (see later for more details).

> cd repo
> mkdir master
> echo. 2>master\seed-git-tfs

> tf add master
> tf add master\seed-git-tfs
> tf checkin /comment:"seed commit for git-tfs" /noprompt

I chose to name the root folder “master” to reflect the fact that it’s coming from the master branch in the GitHub repo. Then I removed all trace of the folder used to initialise the TFS side:

> tf workspace /delete my_workspace /noprompt
> del /f master\seed-git-tfs
> rmdir master

The final step is to convert the folder (called “master” here) to a branch in TFS. There is no command-line support in the standard TF.exe tool to convert a folder to a branch so I used the GUI, but I believe it might be possible using some TFS power tools.

With the TFS repo primed we can now switch to the Git side.

Create the Git “Bridge” Repo

We create the Git repo by cloning the TFS branch that we just created. Here I’ve chosen to name the Git repo folder “master” as well:

> git-tfs clone http://tfs.example.org/tfs/MyCollection $/Project/master master

This will create the repo, set up a remote for TFS and pull down the single commit we made earlier so that the master branch will end up with a single, empty file in the root. Next we attach the repo to our Git source, which in my case was a GitHub repo:

> cd master
> git remote add github <URL of GitHub repo>

At this point we have one remote branch called tfs/default which contains the TFS side we want to push to, and another called github/master which is the source where we are currently developing.

Import the Initial Commits

Assuming, like me, that you’ve already been working for some time with your Git repo you’ll need to push across the initial set of commits to date. To do that we do a pull, but also need to rebase our work on top of the tfs/default branch which has our TFS seed commit in it:

> git pull github master
> git rebase tfs/default

With the master branch now containing all our upstream commits (plus the TFS seed commit) we can push the lot to TFS:

> git-tfs rcheckin --quick

Using rcheckin will ensure that we get one commit in TFS for each commit in Git, i.e. the histories should match (issues from the rebase process notwithstanding). Git-TFS will be conservative by default and assume that something might change at the TFS end whilst the push is happening. However, if you know that no one will be contributing at that end you can add the --quick switch, which makes a huge difference to performance.

See the notes later if you want to know why the rebase is necessary.

Importing Subsequent Git Commits

At this point we have a TFS repo mirroring our Git source repo. However we are constantly adding to the Git repo and so we need to continually push any changes to TFS in the background. This part I was hoping to automate with a CI task but so far I’ve only done it manually a few times as we don’t need to be in sync all the time.

Due to the rebase that we had to do earlier, the master and remotes/github/master branches share no common ancestry. Although content-wise they should be identical (aside from the initial TFS commit at the tail), the SHA-1 of a commit also covers its ancestry, which is subtly different only right at the very start. Consequently the SHA-1s for the same commits on the two branches don’t match. This means we need to manually work out which are the more recent commits from the upstream repo and use the cherry-pick command to play them on top of the HEAD of master.
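To illustrate why the hashes diverge, here is a minimal Python sketch of how Git derives a commit ID: it is the SHA-1 of a short header plus the commit’s text, and that text names the parent commit. The tree, author and message values are made-up placeholders, and everything except the parent is held constant to isolate its effect.

```python
import hashlib


def commit_id(tree, parent, author, message):
    """Return the ID Git gives a commit: SHA-1 over a 'commit <size>' header, a NUL byte, then the body."""
    lines = ["tree " + tree]
    if parent is not None:
        lines.append("parent " + parent)
    lines.append("author " + author)
    lines.append("committer " + author)
    body = ("\n".join(lines) + "\n\n" + message + "\n").encode("utf-8")
    header = b"commit " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# Hypothetical values, purely for illustration.
TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
WHO = "A Developer <dev@example.org> 1400000000 +0100"
TFS_SEED = "62cb57f2422fca676055d35ed4d53fba187acac1"

# The same first commit, once as the upstream root and once rebased onto the seed.
upstream_first = commit_id(TREE, None, WHO, "Initial commit")
rebased_first = commit_id(TREE, TFS_SEED, WHO, "Initial commit")

# The difference propagates: each child names its parent's ID.
upstream_second = commit_id(TREE, upstream_first, WHO, "Second commit")
rebased_second = commit_id(TREE, rebased_first, WHO, "Second commit")

print(upstream_first != rebased_first)    # True
print(upstream_second != rebased_second)  # True
```

Changing only the parent of the very first commit is enough to change every ID that follows it, which is why the two branches never line up.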

> cd /d D:\git-tfs\repo\master
> git fetch github
> git log -n 1 --pretty=oneline master
<SHA-1> <commit message of last one pushed to TFS>
> git log --pretty=oneline remotes/github/master | findstr /c:"<message>"
<SHA-1> <matching commit message in upstream repo>
> git log -n 1 --pretty=oneline remotes/github/master
<SHA-1> <commit message of current head of upstream changes>
> git cherry-pick <SHA-1 of last upstream commit pushed>..<SHA-1 of upstream HEAD>
> git-tfs rcheckin --quick

The SHA-1 range provided to the cherry-pick command is the half-open range that is exclusive of the earliest commit, but inclusive of the latest.

As I said above I have not automated this, mostly because I need to find a better way to identify the HEAD of master in the remotes branch, as just relying on the commit message feels far too brittle even though we generally write fairly decent commit messages. Also, we need to do this so infrequently that doing it by hand is far from time-consuming.
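If I do script it, one modest improvement would be to make the message lookup fail loudly when the match is missing or ambiguous instead of silently picking the wrong commit. The following is a minimal Python sketch under that assumption; the function names and the sample log are hypothetical, it assumes a linear upstream history, and it has not been run against our repo.

```python
def parse_oneline(log_text):
    """Split 'git log --pretty=oneline' output into (sha, subject) pairs."""
    commits = []
    for line in log_text.splitlines():
        if line.strip():
            sha, _, subject = line.strip().partition(" ")
            commits.append((sha, subject))
    return commits


def cherry_pick_range(upstream_log, last_pushed_subject):
    """Return '<last pushed>..<upstream head>', or None if nothing is new.

    Raises ValueError unless exactly one upstream commit has the subject.
    """
    commits = parse_oneline(upstream_log)  # newest first, as git log prints them
    matches = [sha for sha, subject in commits if subject == last_pushed_subject]
    if len(matches) != 1:
        raise ValueError("expected 1 match for %r, found %d"
                         % (last_pushed_subject, len(matches)))
    last_pushed, head = matches[0], commits[0][0]
    if last_pushed == head:
        return None
    return last_pushed + ".." + head


# Made-up output standing in for: git log --pretty=oneline remotes/github/master
# (real output has full 40-character SHA-1s; these are shortened for readability)
SAMPLE = """\
ccc3333 Add retry to uploader
bbb2222 Fix typo in README
aaa1111 Initial commit
"""

print(cherry_pick_range(SAMPLE, "Fix typo in README"))  # bbb2222..ccc3333
```

The returned string is in the same half-open form that the cherry-pick command expects. It is still message-based matching, so it only removes the silent failure, not the underlying brittleness.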

Additional Notes

I spent quite a bit of time trying to get Git-TFS to work and along the way I bumped into a number of problems. I’m willing to accept that many of these are entirely my own fault and it’s quite possible there is an easy explanation, but given the amount of Googling I did, if there is, I’m afraid it wasn’t obvious to me. I’m not an expert in either Git or TFS, so some problems may be entirely down to my inexperience. Hopefully the comments section of this post will be put to good use to point to the correct fu.

Hence the following sections are me trying to explain what I experienced and how that has affected the process I eventually arrived at above.

TFS Branch

I started out trying to use Git-TFS without there being any branches in TFS and I just got what I thought was a benign warning about “No TFS parents found!” when I cloned it. However, I couldn’t get Git-TFS to work at all without there being a branch in TFS and it wasn’t obvious to me that this might be a significant problem. I don’t know if you’re supposed to be able to use Git-TFS without there being at least one branch in TFS but the people I spoke to have one and therefore so do I.

The TFS Seed Commit

When you clone a TFS repo, if it doesn’t have at least one commit Git-TFS will just say there is nothing to do when you try to rcheckin. Git-TFS appears to track the TFS ChangeSet number for each Git commit as it pushes it so that it can detect changes on the TFS side. If there is no ChangeSet anywhere in the history Git-TFS does not seem to be happy.

The Rebase

When I first brought in the upstream repo I naturally did a pull which merged master (containing my seed TFS commit) and the entire set of upstream commits. When I came to execute rcheckin it failed with this error:

Fetching changes from TFS to minimize possibility of late conflict...
Working on the merge commit: 11a00c52c2b54657220862d63b315ffeb80010b6
Sequence contains no elements

For some reason Git-TFS was ignoring all the commits in the merge that came from upstream. When I rebased master rcheckin was happy to push the changes to TFS.

We had tried to import another project before this one and had terrible problems with merge commits. That Git repo had a number of merge commits in and they would all have had to be resolved as part of the rebase, so we ditched the idea. With the current project we decided up front (on the suspicion that TFS would enter the frame again) that we would only ever use rebase when pulling. A couple of merge commits did slip in but there were no conflicts to resolve and so that has not been an issue this time around.

Once again I am unsure whether this is supposed to work or not.

Using Git Graft/Replace

This led me to question whether there was another way to “fake” history by trying to pretend that the upstream commits were really rooted on top of the TFS seed commit. For this I investigated the use of grafts, and subsequently the newer “replace” command. After identifying the TFS and Git initial commits I tried to re-parent the Git commit onto the TFS commit like so:

(A) TFS Seed: 62cb57f2422fca676055d35ed4d53fba187acac1
(B) Git Seed: c9bafc5f56b20b69dddca1b98449aceb96426c80
 
> git checkout c9bafc5~0
> git reset --soft 62cb57f
> git commit -C c9bafc5
> git replace c9bafc5 HEAD
> git checkout -

All appeared to go OK and looking at the history with TortoiseGit it seemed just as I would have expected. However when I ran git-tfs rcheckin I got the following error:

latest TFS commit should be parent of commits being checked in

I can’t tell if my grafting was incorrect or whether Git-TFS uses some technique to walk the parentage that is foiled by this hack.

LF Line Endings in TFS

After all this hard work getting my code into TFS with a history to match our Git repo, I was saddened to discover, when I mapped a workspace in TFS and got the latest version, that our entire codebase had LF-style line endings. Whilst this is far from disastrous, I was surprised that it had happened because we specifically set autocrlf=true in our Git repo because we’re working exclusively on Windows. My flawed assumption was that the code in TFS (which does not have this line-ending mapping concept) would match whatever the “bridge” Git repo was configured with.

I thought I’d made a mistake and should have specified autocrlf=true when cloning the TFS repo with Git-TFS, but that turned out to make no difference. Fortunately there was an existing Google Groups thread that discussed this behaviour and it appeared I was not the only one who was suffering from this.

In short, the default behaviour of Git-TFS is to push the Git content as-is to the TFS repo. This means that if you use autocrlf=true, which maps line-endings to/from the Git working copy as CR/LF but stores them internally in Git as LF, then your TFS repo will end up with LF line endings. The only way to get CR/LF line endings in the TFS repo is to use autocrlf=false which means that Git stores content as-is (which would usually be CR/LF with Windows tools).
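As a rough illustration of that mapping, here is a small Python sketch. It is a deliberately simplified model of autocrlf and not Git’s actual implementation (real Git also leaves alone files it considers binary); the function names are my own.

```python
def to_repo(data):
    """Model of what autocrlf=true stores: CR/LF pairs collapsed to LF."""
    return data.replace(b"\r\n", b"\n")


def to_working_copy(data):
    """Model of what autocrlf=true checks out: LF expanded to CR/LF."""
    return to_repo(data).replace(b"\n", b"\r\n")


on_disk = b"first line\r\nsecond line\r\n"   # file as saved by a Windows editor

stored_autocrlf_true = to_repo(on_disk)      # what Git holds internally
stored_autocrlf_false = on_disk              # stored as-is

# Per the behaviour described above, Git-TFS pushes the stored bytes to TFS.
print(stored_autocrlf_true)    # b'first line\nsecond line\n'
print(stored_autocrlf_false)   # b'first line\r\nsecond line\r\n'
```

The working copy looks the same in both cases, which is why the difference only shows up once the stored content lands in TFS.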

If you read that Google Groups thread you’ll see that @solpolenious has forked Git-TFS and added a small change that translates the line endings for text files so that they appear correctly in TFS. I have not personally tried this fork as I discovered it too late in the day, but it would be good if it could be enabled, perhaps by a command-line switch, so that those of us who started out with the wrong autocrlf setting still have a workaround.

Work Item Mapping

The one final thing that tripped us up was a change in v0.19.2 that tries to match commits to TFS work items by parsing the commit messages looking for “#<number>” tags. We use these, but to associate the commit with our Trello cards, not TFS work items. This is not configurable in Git-TFS 0.19.2, however there is a change that was pulled months ago that should appear in the next release which allows you to configure the regex used to do the matching. I didn’t want the matching to occur at all so I set it to something I knew wouldn’t appear (it’s also the example used in the Git-TFS docs):

> git config git-tfs.work-item-regex "workitem #(?<item_id>\d+)"
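To show the effect of narrowing the pattern, here is a short Python sketch. The first pattern is only my approximation of the “#<number>” matching described above, not the actual expression inside Git-TFS, and the group syntax is the Python spelling of the .NET one used in the config value.

```python
import re

# Rough stand-in for the built-in "#<number>" matching (my approximation).
DEFAULT_LIKE = re.compile(r"#(?P<item_id>\d+)")
# Python spelling of the configured pattern; .NET writes the group as (?<item_id>...).
CONFIGURED = re.compile(r"workitem #(?P<item_id>\d+)")


def work_items(pattern, message):
    """Return the IDs a pattern would pick out of a commit message."""
    return [m.group("item_id") for m in pattern.finditer(message)]


trello_style = "Tidy up logging #42"   # made-up commit message

print(work_items(DEFAULT_LIKE, trello_style))               # ['42']
print(work_items(CONFIGURED, trello_style))                 # []
print(work_items(CONFIGURED, "Fix build for workitem #7"))  # ['7']
```

Since none of our commit messages contain the word “workitem”, the stricter pattern never fires and the Trello card numbers are left alone.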

Conclusion

It might seem that I’m being overly critical of Git-TFS, but that’s really not my intention. I started out with a goal of automating a solution for pushing any changes to our GitHub repo into another TFS one, with full history intact. Whilst I’ve fallen a little short of that goal I have saved myself the huge amount of time it would have taken me to manually reconstruct the project history in TFS! Thanks Git-TFS.

3 comments:

  1. Hi. I'm interested, why you've chosen to run Jenkins against TFS instead of having some git bare repository, which fetches changes from tfs (git tfs fetch) by scheduler, like once each 5 minutes. And then having main TFS branch build in the same way, but also with git advantages.

    It was my way some time ago - actually this target is the main reason rcheckin command exists :)

    1. We're not running Jenkins against TFS, we're using Jenkins against GitHub and have been for many months. TFS is purely for escrow purposes, we don't want to use it for development (we can't work remotely with it for starters). Ideally we'd just transfer ownership of the GitHub repo over to them at the end, but for some reason they want to manage their own TFS farm instead :-).

  2. Ahh, I thought at first that you want access to the TFS on build agent for fetching latest sources. It was for syncing github repo with TFS, I see now.
    People who want TFS instead of git without management pressure/politics are indeed voodoo people :)
    Especially given that MS teams themselves work more and more with git.
