Thursday 22 May 2014

Promoting Functional Programming - A Missed Opportunity

One of the blog posts that seemed to get a fair bit of attention on Twitter a little while back was “Functional programming is a ghetto” by Michael O. Church. Whilst I really enjoyed the first 6-7 paragraphs, the rest just raised my ire. This post, for me, was a classic violation of the Separation of Concerns, as applied to writing. But more than that I felt the author had shot themselves in the foot by missing out on a great opportunity to help promote the virtues of functional programming techniques.

If you’ve read the post you’ll see that the early paragraphs are about how they apply aspects of functional programming alongside other paradigms to try and get the best of all worlds. But then for some reason he starts talking about how IDEs suck and how the command-line is the tool of choice for functional programmers. Sorry, but what has tooling got to do with language paradigms?

So, instead of staying focused, finishing early on a high note and thereby providing us with some great material that we could promote to others, I find myself opposing it. I want to promote good articles about functional programming, but not at the expense of appearing to side with some Ivory Tower attitude. I once tried to pass on the post but found myself having to explain why the reader should ignore the drivel after paragraph 7.

Hopefully, they’ll refactor the blog post and siphon off the rant so that the good content is left to stand on its own.

The Perils of Using -Filter With Get-ChildItem

I was writing a PowerShell script to process some dated files in a folder and I was bemused when I discovered my script was picking up files that didn’t appear to match the very specific filter I had used. If you want to play along at home the following commands will create two empty files with very similar names to the ones that confused me (in my case it was also a backed-up copy of a file that got picked up by mistake):

> mkdir C:\Temp\Test-GCI
> echo. 2>C:\Temp\Test-GCI\File-20140521.txt
> echo. 2>C:\Temp\Test-GCI\File-20140521-Backup.txt

Hopefully you’ll agree with me that the following attempt to match files in the test folder should only match a single file, right? After all, the mask only uses the ? character, which matches a single character, unlike *, which can match many.

> PowerShell "Get-ChildItem -Path C:\Temp\Test-GCI
  -Filter File-????????.txt | select Name"

Name
----
File-20140521.txt
File-20140521-Backup.txt

Eh? That can’t be right. So I started doing some googling and came across some StackOverflow posts like this one which mentions that the -Filter switch behaves differently to the -Like operator. The Get-ChildItem documentation tells you that -Filter is probably more efficient but the semantics are those of the underlying provider, not PowerShell’s. Doing a “dir File-????????.txt” gives the same unexpected result which ties up with the PowerShell documentation.
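For comparison, here’s the behaviour I was expecting, sketched in Python purely as an illustration: fnmatch’s ? matches exactly one character, which is the same rule as PowerShell’s -like operator, unlike the native filter:

```python
from fnmatch import fnmatch

# Strict wildcard semantics: '?' matches exactly one character,
# so only the 8-digit dated file fits the mask.
mask = 'File-????????.txt'
print(fnmatch('File-20140521.txt', mask))         # True
print(fnmatch('File-20140521-Backup.txt', mask))  # False
```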

The solution seems to be to include the file mask in the -Path argument instead of using the separate -Filter switch:

> PowerShell "Get-ChildItem -Path (Join-Path C:\Temp\Test-GCI File-????????.txt) | select Name"

Name
----
File-20140521.txt

OK, problem solved. But what’s curious here is that it doesn’t match what you get if you do “dir C:\Temp\Test-GCI\File-????????.txt” which is an interesting inconsistency that might trip you up if you’re going the other way round (testing with dir and then using the pattern with Get-ChildItem).

If you want to know why the native mask behaves like it does then you need to read Raymond Chen’s 2007 blog post “How did wildcards work in MS-DOS?”.

Tuesday 20 May 2014

Developing With Git / Pushing to TFS

My current project is in the somewhat bizarre position of having to move from Git to TFS. Let’s not dwell on why enterprises make such ridiculous demands and instead focus on what we can do about it. In an ideal world the future would already be here and we’d be using VS2013 and the TFS servers would be the edition that supports Git natively (also 2013 IIRC). Sadly my client uses neither of those; we are using VS2010 and TFS 2010, which means we needed to find some kind of bridge, à la Git/SVN.

I discovered there are two main choices available - Git-TF and Git-TFS. From what I read on StackOverflow and various other blogs, Git-TF is pretty much out of the picture these days now that Microsoft are embracing Git themselves. Git-TFS on the other hand is still being actively maintained and has its own Google Group too, which receives some TLC.

Depending on your expectations and how you intend to use it you may find Git-TFS works great, or, like me, you may find there are enough quirks to make setting things up quite time-consuming. The “cloning from TFS” scenario appears to be well catered for, as this follows the existing Git/SVN model. The scenario we needed was to import our GitHub repo into a fresh TFS repo and then to continually import changes from the GitHub repo to TFS in the background, ideally automatically as part of our CI process.

This post mostly tackles the initial import as it’s taken some time just to get this working. At the moment the background refresh is done manually once a week until I can work out how to script it reliably.

Machine Setup

This was intended to be a fully automated process and so the first hurdle I ran into was accessing TFS from the build machine where the future Jenkins job would run. We already had the Git binaries installed, but it’s worth noting that you will get a warning if you’re using Git v1.8.4, which we were, so I upgraded it.

There is no simple binaries package with just the TF.exe command-line tool, but there is a good blog post that tells you how to get at the necessary minimum bits and pieces from the Team Explorer .ISO image for CI use. At least you don’t have to install the whole of Visual Studio just to do it.

Once I could “ping” the TFS servers with TF.exe I stored the login and password for accessing TFS in the Windows Credential Manager by following the instructions in this blog post. Note that this implies your machine is running something a little more modern than Windows XP/Server 2003.

With both the Git and TFS binaries installed I created a folder on the server for the git-tfs binaries, any scripts and the Git repo that would be used as the bridge:

> mkdir D:\git-tfs
> cd /d D:\git-tfs
> mkdir bin

I copied the git-tfs 0.19.2 binaries into the bin folder and then created a little batch file to adjust the PATH so that I would have the Git, TFS and Git-TFS binaries all easily accessible (I called it SetPath.cmd):

@set PATH=%PATH%;^
D:\git-tfs\bin;^
C:\Program Files (x86)\Git\bin;^
C:\Program Files (x86)\Microsoft Visual Studio 10.0\Common7\IDE

With the tooling in place we’re ready to create the TFS and Git repos.

Create the TFS Repo

First we create a folder to house the “bridge” repo and map the folder for TFS to use:

> mkdir repo
> tf workspace /new my_workspace
  /collection:http://tfs.example.org/tfs/MyCollection
  /noprompt
> tf workfold /map $/Project D:\git-tfs\repo

Next we create a folder for the working copy/branch and add a single empty file to act as the seed for Git-TFS (see later for more details).

> cd repo
> mkdir master
> echo. 2>master\seed-git-tfs

> tf add master
> tf add master\seed-git-tfs
> tf checkin /comment:"seed commit for git-tfs" /noprompt

I chose to name the root folder “master” to reflect the fact that it’s coming from the master branch in the GitHub repo. Then I removed all trace of the folder used to initialise the TFS side:

> tf workspace /delete my_workspace /noprompt
> del /f master\seed-git-tfs
> rmdir master

The final step is to convert the folder (called “master” here) to a branch in TFS. There is no command-line support in the standard TF.exe tool to convert a folder to a branch so I used the GUI, but I believe it might be possible using some TFS power tools.

With the TFS repo primed we can now switch to the Git side.

Create the Git “Bridge” Repo

We create the Git repo by cloning the TFS branch that we just created. Here I’ve chosen to name the Git repo folder “master” as well:

> git-tfs clone http://tfs.example.org/tfs/MyCollection $/Project/master master

This will create the repo, set up a remote for TFS and pull down the single commit we made earlier so that the master branch will end up with a single, empty file in the root. Next we attach the repo to our Git source, which in my case was a GitHub repo (the URL below is just a placeholder for wherever your upstream repo lives):

> cd master
> git remote add github <github-repo-url>
> git fetch github

At this point we have one remote branch called tfs/default which contains the TFS side we want to push to, and another called github/master which is the source where we are currently developing.

Import the Initial Commits

Assuming, like me, that you’ve already been working for some time with your Git repo you’ll need to push across the initial set of commits to date. To do that we do a pull, but also need to rebase our work on top of the tfs/default branch which has our TFS seed commit in it:

> git pull github master
> git rebase tfs/default

With the master branch now containing all our upstream commits (plus the TFS seed commit) we can push the lot to TFS:

> git-tfs rcheckin --quick

Using rcheckin will ensure that we get one commit in TFS for each commit in Git, i.e. the histories should match (issues from the rebase process notwithstanding). Git-TFS will be conservative by default and assume that something might change at the TFS end whilst the push is happening; however, if you know that no one will be contributing at that end you can add the --quick switch, which makes a huge difference to performance.

See the notes later if you want to know why the rebase is necessary.

Importing Subsequent Git Commits

At this point we have a TFS repo mirroring our Git source repo. However we are constantly adding to the Git repo and so we need to continually push any changes to TFS in the background. This part I was hoping to automate with a CI task but so far I’ve only done it manually a few times as we don’t need to be in sync all the time.

Due to the rebase that we had to do earlier, the master and remotes/github/master branches share no common ancestry. Although content-wise they should be identical (aside from the initial TFS commit at the tail), a commit’s SHA-1 also covers its ancestry, and that ancestry differs subtly right at the very start. Consequently the SHA-1s for the same commits on the two branches don’t match. This means we need to manually work out which are the more recent commits from the upstream repo and use the cherry-pick command to play them on top of the HEAD of master.

> cd /d D:\git-tfs\repo\master
> git fetch github
> git log -n 1 --pretty=oneline master
<SHA-1> <commit message of last one pushed to TFS >
> git log --pretty=oneline remotes/github/master | findstr /c:"<message>"
<SHA-1> <matching commit message in upstream repo>
> git log -n 1 --pretty=oneline remotes/github/master
<SHA-1> <commit message of current head of upstream changes>
> git cherry-pick <SHA-1 of last upstream commit pushed>..<SHA-1 of upstream HEAD>
> git-tfs rcheckin --quick

The SHA-1 range provided to the cherry-pick command is the half-open range that is exclusive of the earliest commit, but inclusive of the latest.
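To see why the SHA-1s diverge in the first place, here’s a back-of-the-envelope Python sketch of how Git hashes a commit object; the tree ID is the well-known empty tree, while the parent and author details are made up for illustration:

```python
import hashlib

def git_sha1(obj_type, body):
    # Git hashes "<type> <size>\0<body>" with SHA-1.
    header = f'{obj_type} {len(body)}\x00'.encode()
    return hashlib.sha1(header + body.encode()).hexdigest()

def commit_body(parent):
    # Identical tree and message; only the parent line differs.
    return (f'tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n'
            f'parent {parent}\n'
            'author A U Thor <author@example.org> 1400000000 +0000\n'
            'committer A U Thor <author@example.org> 1400000000 +0000\n'
            '\n'
            'Identical message, identical content\n')

upstream = git_sha1('commit', commit_body('1' * 40))
rebased = git_sha1('commit', commit_body('2' * 40))
print(upstream != rebased)  # True: same change, different ancestry, different SHA-1
```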

As I said above I have not automated this, mostly because I need to find a better way to identify the HEAD of master in the remotes branch; relying solely on the commit message feels far too brittle, even though we generally write fairly decent commit messages. Also, at the frequency we need to do this it is far from time-consuming.

Additional Notes

I spent quite a bit of time trying to get Git-TFS to work and along the way I bumped into a number of problems. I’m willing to accept that many of these are entirely my own fault and it’s quite possible there is an easy explanation, but given the amount of Googling I did, if there is, I’m afraid it wasn’t obvious to me. I’m not an expert in either Git or TFS, so some problems may be entirely down to my inexperience. Hopefully the comments section of this post will be put to good use to point to the correct fu.

Hence the following sections are me trying to explain what I experienced and how that has affected the process I eventually arrived at above.

TFS Branch

I started out trying to use Git-TFS without there being any branches in TFS and I just got what I thought was a benign warning about “No TFS parents found!” when I cloned it. However, I couldn’t get Git-TFS to work at all without there being a branch in TFS and it wasn’t obvious to me that this might be a significant problem. I don’t know if you’re supposed to be able to use Git-TFS without there being at least one branch in TFS but the people I spoke to have one and therefore so do I.

The TFS Seed Commit

When you clone a TFS repo, if it doesn’t have at least one commit, Git-TFS will just say there is nothing to do when you try and rcheckin. Git-TFS appears to track the TFS ChangeSet number for each Git commit as it pushes it so that it can detect changes on the TFS side. If there is no ChangeSet anywhere in the history, Git-TFS does not seem to be happy.

The Rebase

When I first brought in the upstream repo I naturally did a pull which merged master (containing my seed TFS commit) and the entire set of upstream commits. When I came to execute rcheckin it failed with this error:

Fetching changes from TFS to minimize possibility of late conflict...
Working on the merge commit: 11a00c52c2b54657220862d63b315ffeb80010b6
Sequence contains no elements

For some reason Git-TFS was ignoring all the commits in the merge that came from upstream. When I rebased master rcheckin was happy to push the changes to TFS.

We had tried to import another project before this one and had terrible problems with merge commits. That Git repo had a number of merge commits in it, and they would all have had to be resolved as part of the rebase, so we ditched the idea. With the current project we decided up front (on the suspicion that TFS would enter the frame again) that we would only ever use rebase when pulling. A couple of merge commits did slip in, but there were no conflicts to resolve and so that has not been an issue this time around.

Once again I am unsure whether this is supposed to work or not.

Using Git Graft/Replace

This led me to question whether there was another way to “fake” history by trying to pretend that the upstream commits were really rooted on top of the TFS seed commit. For this I investigated the use of grafts, and subsequently the newer “replace” command. After identifying the TFS and Git initial commits I tried to re-parent the Git commit onto the TFS commit like so:-

(A) TFS Seed: 62cb57f2422fca676055d35ed4d53fba187acac1
(B) Git Seed: c9bafc5f56b20b69dddca1b98449aceb96426c80
 
> git checkout c9bafc5~0
> git reset --soft 62cb57f
> git commit -C c9bafc5
> git replace c9bafc5 HEAD
> git checkout -

All appeared to go OK and looking at the history with TortoiseGit it seemed just as I would have expected. However when I ran git-tfs rcheckin I got the following error:

latest TFS commit should be parent of commits being checked in

I can’t tell if my grafting was incorrect or whether Git-TFS uses some technique to walk the parentage that is foiled by this hack.

LF Line Endings in TFS

After all this hard work getting my code into TFS with a history to match our Git repo, I was saddened to discover, when I mapped a workspace in TFS and got the latest version, that our entire codebase had LF-style line endings. Whilst far from disastrous, I was surprised it had happened because we had specifically set autocrlf=true in our Git repo, as we’re working exclusively on Windows. My flawed assumption was that the code in TFS (which does not have this line-ending mapping concept) would match whatever the “bridge” Git repo was configured with.

I thought I’d made a mistake and should have specified autocrlf=true when cloning the TFS repo with Git-TFS, but that turned out to make no difference. Fortunately there was an existing Google Groups thread that discussed this behaviour and it appeared I was not the only one who was suffering from this.

In short, the default behaviour of Git-TFS is to push the Git content as-is to the TFS repo. This means that if you use autocrlf=true, which maps line-endings to/from the Git working copy as CR/LF but stores them internally in Git as LF, then your TFS repo will end up with LF line endings. The only way to get CR/LF line endings in the TFS repo is to use autocrlf=false which means that Git stores content as-is (which would usually be CR/LF with Windows tools).
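As a rough model of what’s going on (a simplified Python sketch, not Git’s actual implementation), the two directions of the autocrlf=true mapping look like this:

```python
# A rough model of autocrlf=true, just to illustrate what git-tfs sees:
# check-in normalises CRLF to LF, checkout expands LF back to CRLF.
def clean(working_copy):             # working copy -> repository
    return working_copy.replace(b'\r\n', b'\n')

def smudge(repo_content):            # repository -> working copy
    return repo_content.replace(b'\n', b'\r\n')

windows_text = b'first line\r\nsecond line\r\n'
stored = clean(windows_text)         # this LF-only form is what git-tfs pushes to TFS
print(stored)                        # b'first line\nsecond line\n'
print(smudge(stored) == windows_text)  # True
```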

If you read that Google Groups thread you’ll see that @solpolenious has forked Git-TFS and added a small change that translates the line endings for text files so that they appear correctly in TFS. I have not personally tried this fork as I discovered it too late in the day, but it would be good if it could be enabled, perhaps by a command-line switch, so that those of us who started out with the wrong autocrlf setting still have a workaround.

Work Item Mapping

One final thing that tripped us up was a change in v0.19.2 that tries to match commits to TFS work items by parsing the commit messages looking for “#<number>” tags. We use these, but to associate commits with our Trello cards, not TFS work items. This is not configurable in Git-TFS 0.19.2; however, there is a change that was merged months ago, and should appear in the next release, which allows you to configure the regex used to do the matching. I didn’t want the matching to occur at all so I set it to something I knew wouldn’t appear (it’s also the example used in the Git-TFS docs):

> git config git-tfs.work-item-regex "workitem #(?<item_id>\d+)"
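If you want to sanity-check a candidate pattern locally, here’s a quick Python sketch with made-up commit messages; note that Python spells a named group (?P<...>) where .NET, which Git-TFS uses, spells it (?<...>):

```python
import re

# Translate the .NET named-group syntax into Python's before compiling.
net_pattern = r'workitem #(?<item_id>\d+)'
pattern = re.compile(net_pattern.replace('(?<', '(?P<'))

trello_msg = 'Moved the widgets around, see card #123'
print(pattern.search(trello_msg))  # None - plain Trello refs no longer match

tfs_msg = 'Fixed the build, workitem #456'
print(pattern.search(tfs_msg).group('item_id'))  # 456
```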

Conclusion

It might seem that I’m being overly critical of Git-TFS, but that’s really not my intention. I started out with a goal of automating a solution for pushing any changes to our GitHub repo into another TFS one, with full history intact. Whilst I’ve fallen a little short of that goal I have saved myself the huge amount of time it would have taken me to manually reconstruct the project history in TFS! Thanks Git-TFS.

Terminology Abuse - Parameters vs Arguments

In my recent rant about the use of the term “injection” in software development to describe what is really just passing arguments, I slipped up. You can be sure that any blog post that attempts to complain about the (mis)use of terminology is almost certainly going to suffer from it itself, and that post was no exception. Fortunately there are people out there who are all too willing to point this out, and as someone who strives to use the right words for the right concepts I’m more than happy to be re-educated.

Parameters != Arguments

In my blog post I suggested that the term “injection” was synonymous with passing parameters or arguments and did the usual thing of citing the Wikipedia page on the subject. I had (skim) read that page and had deduced from reading the following sentence that the two terms were interchangeable:-

“These two terms parameter and argument are sometimes loosely used interchangeably...”

This, as @aral kindly pointed out to me is not actually the case. What the Wikipedia page should probably have said (for those of us lazy readers) is this:-

“These two terms parameter and argument are often incorrectly used interchangeably...”

This would (hopefully) have caused me to read it properly [*] and discover that they are two different sides of the same coin. The two terms describe the same concept, but from two different viewpoints - the caller and the callee. The values the caller passes to a function are described by the term “arguments”, whereas the same values, as received by the called function, are described by the term “parameters”.
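The distinction is easiest to see in a couple of lines of code; a throwaway Python example:

```python
# x and y are parameters: the names declared by the callee.
def add(x, y):
    return x + y

# 3 and 4 are arguments: the values the caller supplies.
print(add(3, 4))  # 7
```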

My follow-up tweet which attempted to consolidate this simplification into just 140 characters got favourited by @aral and so I take that as a thumbs up that I’ve now finally understood the difference.

 

[*] The funny thing about reading a page like this when you have a pre-conceived notion of what (you think) it says is that you completely ignore what it is really telling you. Once I understood there really was a difference and went back and read the Wikipedia page again I couldn’t fail to notice that the difference is there, plain as day, all throughout the page! I even went back and checked the page history in case it had been edited recently to make it clearer, sadly for me it’s always been that clear.