Tuesday, 5 August 2014

In The Toolbox - Season One

I never expected to write a column in a programming journal but somehow that seems to have happened. I started the “In The Toolbox” column in the ACCU’s C Vu journal out of a desire to inspire other programmers to write about the tools they use to do their job. Naturally I wanted to lead by example and somehow I’ve managed to write the first six instalments, instead of just the odd one or two. Although the best way to access the content is to become a member of the ACCU, I have collated them here now for convenience.

0: Introduction

An outline of what the column was intended for.

1: Team Chat

A piece on the various chat/discussion systems I’ve used in my programming endeavours. It extends what I started out writing about in one of my earliest blog posts “Company-Wide IRC Style Chat & Messaging”.

2: Wrapper Scripts

Examples of how I’ve used scripting languages to create simple scripts that hide some of the complexity that occurs when stitching together various tools to do builds, testing, etc.

3: Pen & Paper

My lo-tech solution to keeping track of what I’m supposed to be doing on a day-to-day basis.

4: Static Code Analysis

I like to lean heavily on static code analysis tools when writing code as I believe they help me stay clear of a lot of common pitfalls that are valid language constructs, but can also be used all too dangerously. I’ve touched on some of this before too in a very early blog post “Where Are the 'Lite' Editions of Static Code Analysis Tools?”.

5: Social Networking

I’m a Twitter addict; this article was about how I use social networking in my professional programming career.

6: Software Archaeology

The other tool I lean heavily on is the Version Control System, as I like to look back over a project’s changes to help me understand why a particular change might have been made.

Thursday, 22 May 2014

Promoting Functional Programming - A Missed Opportunity

One of the blog posts that seemed to get a fair bit of attention on Twitter a little while back was “Functional programming is a ghetto” by Michael O. Church. Whilst I really enjoyed the first 6-7 paragraphs, the rest just raised my ire. This post, for me, was a classic violation of the Separation of Concerns, as applied to writing. But more than that I felt the author had shot themselves in the foot by missing out on a great opportunity to help promote the virtues of functional programming techniques.

If you’ve read the post you’ll see that the early paragraphs are about how he applies aspects of functional programming alongside other paradigms to try and get the best of all worlds. But then for some reason he starts talking about how IDEs suck and that the command-line is the tool of choice for functional programmers. Sorry, but what has tooling got to do with language paradigms?

So, instead of staying focused and finishing early on a high note, and therefore providing us with some great material that we could promote to others, I find myself opposing it. I want to promote good articles about functional programming, but not at the expense of appearing to side with some Ivory Tower attitude. I once tried to pass on the post but found myself having to explain why they should ignore the drivel after paragraph 7.

Hopefully, they’ll refactor the blog post and siphon off the rant so that the good content is left to stand on its own.

The Perils of Using -Filter With Get-ChildItem

I was writing a PowerShell script to process some dated files in a folder and I was bemused when I discovered my script was picking up files that didn’t appear to match the very specific filter I had used. If you want to play along at home the following commands will create two empty files with very similar names to the ones that confused me (in my case it was a backed-up file that got picked up by mistake):

> mkdir C:\Temp\Test-GCI
> echo. 2>C:\Temp\Test-GCI\File-20140521.txt
> echo. 2>C:\Temp\Test-GCI\File-20140521-Backup.txt

Hopefully you’ll agree with me that the following attempt to match files in the test folder should only match a single file, right? After all, the mask only uses the ? character, which matches a single character, unlike * which can match many.

> PowerShell "Get-ChildItem -Path C:\Temp\Test-GCI
  -Filter File-????????.txt | select Name"

Name
----
File-20140521.txt
File-20140521-Backup.txt

Eh? That can’t be right. So I started doing some googling and came across some StackOverflow posts, like this one, which mention that the -Filter switch behaves differently to the -like operator. The Get-ChildItem documentation tells you that -Filter is probably more efficient, but the semantics are those of the underlying provider, not PowerShell’s. Doing a “dir File-????????.txt” gives the same unexpected result, which ties up with the PowerShell documentation.
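
For comparison, doing the wildcard matching on the PowerShell side with the -like operator (where ? really does mean exactly one character) should only pick up the single file; a quick sketch using the same test folder:

> PowerShell "Get-ChildItem -Path C:\Temp\Test-GCI | Where-Object { $_.Name -like 'File-????????.txt' } | select Name"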

The solution seems to be to include the file mask in the -Path argument instead of using the separate -Filter switch:

> PowerShell "Get-ChildItem -Path (Join-Path C:\Temp\Test-GCI File-????????.txt) | select Name"

Name
----
File-20140521.txt

OK, problem solved. But what’s curious here is that it doesn’t match what you get if you do “dir C:\Temp\Test-GCI\File-????????.txt” which is an interesting inconsistency that might trip you up if you’re going the other way round (testing with dir and then using the pattern with Get-ChildItem).

If you want to know why the native mask behaves like it does then you need to read Raymond Chen’s 2007 blog post “How did wildcards work in MS-DOS?”.

Tuesday, 20 May 2014

Developing With Git / Pushing to TFS

My current project is in the somewhat bizarre position of having to move from Git to TFS. Let’s not dwell on why enterprises make such ridiculous demands and instead focus on what we can do about it. In an ideal world the future would already be here and we’d be using VS2013 and the TFS servers would be the edition that supports Git natively (also 2013 IIRC). Sadly my client uses neither of those; we are using VS2010 and TFS 2010, which means we needed to find some kind of bridge, à la Git/SVN.

I discovered there are two main choices available - Git-TF and Git-TFS. From what I read on StackOverflow and various other blogs Git-TF is pretty much out of the picture these days now that Microsoft are embracing Git themselves. Git-TFS on the other hand is still being actively maintained and has its own Google Group too, which receives some TLC.

Depending on your expectations and how you intend to use it you may find Git-TFS works great, or like me you may find there are enough quirks to make setting things up quite time consuming. The “cloning from TFS” scenario appears to be well catered for as this follows the existing Git/SVN model. The scenario we needed was to import our GitHub repo into a fresh TFS repo and then to continually import changes from the GitHub repo to TFS in the background, ideally automatically as part of our CI process.

This post mostly tackles the initial import as it’s taken some time just to get this working. At the moment the background refresh is done manually once a week until I can work out how to script it reliably.

Machine Setup

This was intended to be a fully automated process and so the first hurdle I ran into was accessing TFS from the build machine where the future Jenkins job would run. We already had the Git binaries installed, but it’s worth noting that you will get a warning if you’re using Git v1.8.4, which we were, so I upgraded it.

There is no simple binary package with just the TF.exe command line tool, but there is a good blog post that tells you how to get at the necessary minimum bits and pieces from the Team Explorer .ISO image for CI use. At least you don’t have to install the whole of Visual Studio just to do it.

Once I could “ping” the TFS servers with TF.exe I stored the login and password for accessing TFS in the Windows Credential Manager by following the instructions in this blog post. Note that this implies your machine is running something a little more modern than Windows XP/Server 2003.
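
For reference, storing the credential boils down to a single command with the built-in CMDKEY tool; a sketch along these lines, with placeholder server and account names:

> cmdkey /add:tfs.example.org /user:DOMAIN\tfs-build /pass:<password>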

With both the Git and TFS binaries installed I created a folder on the server for the git-tfs binaries, any scripts and the Git repo that would be used as the bridge:

> mkdir D:\git-tfs
> cd /d D:\git-tfs
> mkdir bin

I copied the git-tfs 0.19.2 binaries into the bin folder and then created a little batch file to adjust the PATH so that I would have the Git, TFS and Git-TFS binaries all easily accessible (I called it SetPath.cmd):

@set PATH=%PATH%;^
D:\git-tfs\bin;^
C:\Program Files (x86)\Git\bin;^
C:\Program Files (x86)\Microsoft Visual Studio 10.0\Common7\IDE

With the tooling in place we’re ready to create the TFS and Git repos.

Create the TFS Repo

First we create a folder to house the “bridge” repo and map the folder for TFS to use:

> mkdir repo
> tf workspace /new
  /collection:http://tfs.example.org/tfs/MyCollection
  /noprompt
> tf workfold /map $/Project D:\git-tfs\repo

Next we create a folder for the working copy/branch and add a single empty file to act as the seed for Git-TFS (see later for more details).

> cd repo
> mkdir master
> echo. 2>master\seed-git-tfs

> tf add master
> tf add master\seed-git-tfs
> tf checkin /comment:"seed commit for git-tfs" /noprompt

I chose to name the root folder “master” to reflect the fact that it’s coming from the master branch in the GitHub repo. Then I removed all trace of the folder used to initialise the TFS side:

> tf workspace /delete my_workspace /noprompt
> del /f master\seed-git-tfs
> rmdir master

The final step is to convert the folder (called “master” here) to a branch in TFS. There is no command-line support in the standard TF.exe tool to convert a folder to a branch so I used the GUI, but I believe it might be possible using some TFS power tools.

With the TFS repo primed we can now switch to the Git side.

Create the Git “Bridge” Repo

We create the Git repo by cloning the TFS branch that we just created. Here I’ve chosen to name the Git repo folder “master” as well:

> git-tfs clone http://tfs.example.org/tfs/MyCollection $/Project/master master

This will create the repo, set up a remote for TFS and pull down the single commit we made earlier so that the master branch will end up with a single, empty file in the root. Next we attach the repo to our Git source, which in my case was a GitHub repo:

> cd master
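
The attaching itself amounts to adding a remote for the GitHub source and fetching from it, i.e. something along these lines (the URL is just a placeholder for the real one):

> git remote add github https://github.com/example/project.git
> git fetch github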

At this point we have one remote branch called tfs/default which contains the TFS side we want to push to, and another called github/master which is the source where we are currently developing.

Import the Initial Commits

Assuming, like me, that you’ve already been working for some time with your Git repo you’ll need to push across the initial set of commits to date. To do that we do a pull, but also need to rebase our work on top of the tfs/default branch which has our TFS seed commit in it:

> git pull github master
> git rebase tfs/default

With the master branch now containing all our upstream commits (plus the TFS seed commit) we can push the lot to TFS:

> git-tfs rcheckin --quick

Using rcheckin will ensure that we get one commit in TFS for each commit in Git, i.e. the histories should match (issues from the rebase process notwithstanding). Git-TFS will be conservative by default and assume that something might change at the TFS end whilst the push is happening; however, if you know that no one will be contributing at that end you can add the --quick switch, which makes a huge difference to performance.

See the notes later if you want to know why the rebase is necessary.

Importing Subsequent Git Commits

At this point we have a TFS repo mirroring our Git source repo. However we are constantly adding to the Git repo and so we need to continually push any changes to TFS in the background. This part I was hoping to automate with a CI task but so far I’ve only done it manually a few times as we don’t need to be in sync all the time.

Due to the rebase that we had to do earlier the master and remotes/github/master branches share no common ancestry. Although content-wise they should be identical (aside from the initial TFS commit at the tail), a commit’s SHA-1 also encodes its ancestry, which is subtly different right at the very start. Consequently the SHA-1s for the same commits on the two branches don’t match. This means we need to manually work out which are the more recent commits from the upstream repo and use the cherry-pick command to replay them on top of the HEAD of master.

> cd /d D:\git-tfs\repo\master
> git fetch github
> git log -n 1 --pretty=oneline master
<SHA-1> <commit message of last one pushed to TFS>
> git log --pretty=oneline remotes/github/master | findstr /c:"<message>"
<SHA-1> <matching commit message in upstream repo>
> git log -n 1 --pretty=oneline remotes/github/master
<SHA-1> <commit message of current head of upstream changes>
> git cherry-pick <SHA-1 of last upstream commit pushed>..<SHA-1 of upstream HEAD>
> git-tfs rcheckin --quick

The SHA-1 range provided to the cherry-pick command is the half-open range that is exclusive of the earliest commit, but inclusive of the latest.

As I said above I have not automated this, mostly because I need to find a better way to identify the HEAD of master in the remotes branch, as just relying on the commit message feels far too brittle, even though we generally write fairly decent commit messages. Also, at the frequency we currently need to do this it is far from being time-consuming.

Additional Notes

I spent quite a bit of time trying to get Git-TFS to work and along the way I bumped into a number of problems. I’m willing to accept that many of these are entirely my own fault and it’s quite possible there is an easy explanation, but given the amount of Googling I did, if there is, I’m afraid it wasn’t obvious to me. I’m not an expert in either Git or TFS, so some problems may be entirely down to my inexperience. Hopefully the comments section of this post will be put to good use to point to the correct fu.

Hence the following sections are me trying to explain what I experienced and how that has affected the process I eventually arrived at above.

TFS Branch

I started out trying to use Git-TFS without there being any branches in TFS and I just got what I thought was a benign warning about “No TFS parents found!” when I cloned it. However, I couldn’t get Git-TFS to work at all without there being a branch in TFS and it wasn’t obvious to me that this might be a significant problem. I don’t know if you’re supposed to be able to use Git-TFS without there being at least one branch in TFS but the people I spoke to have one and therefore so do I.

The TFS Seed Commit

When you clone a TFS repo, if it doesn’t have at least one commit Git-TFS will just say there is nothing to do when you try and rcheckin. Git-TFS appears to track the TFS ChangeSet number for each Git commit as it pushes it so that it can detect changes on the TFS side. If there is no ChangeSet anywhere in the history Git-TFS does not seem to be happy.

The Rebase

When I first brought in the upstream repo I naturally did a pull which merged master (containing my seed TFS commit) and the entire set of upstream commits. When I came to execute rcheckin it failed with this error:

Fetching changes from TFS to minimize possibility of late conflict...
Working on the merge commit: 11a00c52c2b54657220862d63b315ffeb80010b6
Sequence contains no elements

For some reason Git-TFS was ignoring all the commits in the merge that came from upstream. When I rebased master rcheckin was happy to push the changes to TFS.

We had tried to import another project before this one and had terrible problems with merge commits. That Git repo had a number of merge commits in and they would all have had to be resolved as part of the rebase, so we ditched the idea. With the current project we decided up front (on the suspicion that TFS would enter the frame again) that we would only ever use rebase when pulling. A couple of merge commits did slip in but there were no conflicts to resolve and so that has not been an issue this time around.
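
For what it’s worth, one way to make that the default behaviour, rather than relying on everyone remembering to pass --rebase, is a config setting along these lines:

> git config pull.rebase true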

Once again I am unsure whether this is supposed to work or not.

Using Git Graft/Replace

This led me to question whether there was another way to “fake” history by trying to pretend that the upstream commits were really rooted on top of the TFS seed commit. For this I investigated the use of grafts, and subsequently the newer “replace” command. After identifying the TFS and Git initial commits I tried to re-parent the Git commit onto the TFS commit like so:-

(A) TFS Seed: 62cb57f2422fca676055d35ed4d53fba187acac1
(B) Git Seed: c9bafc5f56b20b69dddca1b98449aceb96426c80
 
> git checkout c9bafc5~0
> git reset --soft 62cb57f
> git commit -C c9bafc5
> git replace c9bafc5 HEAD
> git checkout -

All appeared to go OK and looking at the history with TortoiseGit it seemed just as I would have expected. However when I ran git-tfs rcheckin I got the following error:

latest TFS commit should be parent of commits being checked in

I can’t tell if my grafting was incorrect or whether Git-TFS uses some technique to walk the parentage that is foiled by this hack.

LF Line Endings in TFS

After all this hard work getting my code into TFS with a history to match our Git repo, I was saddened to discover that when I mapped a workspace in TFS and got the latest version, our entire codebase had LF-style line endings. Whilst far from disastrous I was surprised that it had happened because we specifically set autocrlf=true in our Git repo because we’re working exclusively on Windows. My flawed assumption was that the code in TFS (which does not have this line-ending mapping concept) would match whatever the “bridge” Git repo was configured with.

I thought I’d made a mistake and should have specified autocrlf=true when cloning the TFS repo with Git-TFS, but that turned out to make no difference. Fortunately there was an existing Google Groups thread that discussed this behaviour and it appeared I was not the only one who was suffering from this.

In short, the default behaviour of Git-TFS is to push the Git content as-is to the TFS repo. This means that if you use autocrlf=true, which maps line-endings to/from the Git working copy as CR/LF but stores them internally in Git as LF, then your TFS repo will end up with LF line endings. The only way to get CR/LF line endings in the TFS repo is to use autocrlf=false which means that Git stores content as-is (which would usually be CR/LF with Windows tools).

If you read that Google Groups thread you’ll see that @solpolenious has forked Git-TFS and added a small change that translates the line endings for text files so that they appear correctly in TFS. I have not personally tried this fork as I discovered it too late in the day, but it would be good if it could be enabled, perhaps by a command-line switch, so that those of us who used the wrong autocrlf setting from the outset still have a workaround.

Work Item Mapping

The one final thing that tripped us up was a change in v0.19.2 that tries to match commits to TFS work items by parsing the commit messages looking for “#<number>” tags. We use these, but to associate the commit with our Trello cards, not TFS work items. This is not configurable in Git-TFS 0.19.2, however there is a change that was pulled months ago that should appear in the next release which allows you to configure the regex used to do the matching. I didn’t want the matching to occur at all so I set it to something I knew wouldn’t appear (it’s also the example used in the Git-TFS docs):

> git config git-tfs.work-item-regex "workitem #(?<item_id>\d+)"

Conclusion

It might seem that I’m being overly critical of Git-TFS, but that’s really not my intention. I started out with a goal of automating a solution for pushing any changes to our GitHub repo into another TFS one, with full history intact. Whilst I’ve fallen a little short of that goal I have saved myself the huge amount of time it would have taken me to manually reconstruct the project history in TFS! Thanks Git-TFS.

Terminology Abuse - Parameters vs Arguments

In my recent rant about the use of the term “injection” in software development to describe what is really just passing arguments, I slipped up. You can be sure that any blog post that attempts to complain about the (mis)use of terminology is almost certainly going to suffer from it itself, and that post was no exception. Fortunately there are people out there who are all too willing to point this out, and as someone who strives hard to try and use the right words for the right concepts I’m more than happy to be re-educated.

Parameters != Arguments

In my blog post I suggested that the term “injection” was synonymous with passing parameters or arguments and did the usual thing of citing the Wikipedia page on the subject. I had (skim) read that page and had deduced from reading the following sentence that the two terms were interchangeable:-

“These two terms parameter and argument are sometimes loosely used interchangeably...”

This, as @aral kindly pointed out to me, is not actually the case. What the Wikipedia page should probably have said (for those of us lazy readers) is this:-

“These two terms parameter and argument are often incorrectly used interchangeably...”

This would (hopefully) have caused me to read it properly [*] and discover that they are two different sides of the same coin. The two terms describe the same concept, but from two different viewpoints - the caller and the callee. The caller passes values to a function, and those values are described by the term “arguments”, whereas the called function receives those same values through names described by the term “parameters”.
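
To put it in code terms, here’s a trivial, made-up C# example:

// 'width' and 'height' are the parameters - the names through which
// the called function receives its values.
static int Area(int width, int height)
{
  return width * height;
}

// 6 and 7 are the arguments - the values the caller passes in.
int area = Area(6, 7);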

My follow-up tweet, which attempted to consolidate this simplification into just 140 characters, got favourited by @aral, so I take that as a thumbs up that I’ve now finally understood the difference.

 

[*] The funny thing about reading a page like this when you have a pre-conceived notion of what (you think) it says is that you completely ignore what it is really telling you. Once I understood there really was a difference and went back and read the Wikipedia page again I couldn’t fail to notice that the difference is there, plain as day, all throughout the page! I even went back and checked the page history in case it had been edited recently to make it clearer; sadly for me it’s always been that clear.

Friday, 25 April 2014

Leaky Abstractions: Querying a TIBCO Queue’s Pending Message Count

My team recently had a stark reminder about the Law of Leaky Abstractions. I wasn’t directly involved but feel compelled to pass the details on given the time we haemorrhaged [1] tracking it down...

Pending Messages

If you google how to find the number of messages that are pending in a TIBCO queue you’ll probably see a snippet of code like this:-

var admin = new Admin(hostname, login, password);
var queueInfo = admin.GetQueue(queueName);
var count = queueInfo.PendingMessageCount;

In fact I’m sure that’s how we came by this code ourselves, as we didn’t have any formal documentation to work from at the very start and this was only going in some test code. In case you’re curious, we were writing some acceptance tests and in the test code we needed to wait for the queue’s pending message count to drop to zero as a sign that the messages had been processed and we could safely perform our other asserts to verify the behaviour.

Queue Monitoring

So far so good. The test code had been working flawlessly for months but then we started writing a new component. This time we needed to batch message handling up so instead of processing the stream as fast as possible we needed to wait for “a large enough batch”, then pull the messages and aggregate them. We could have buffered the messages in our own process but it felt safer to leave them in the durable storage for as long as possible as there was no two-phase commit going on.

The new acceptance tests were written in a similar style to before but the production code didn’t seem to be working properly - the pending message count always seemed to return 0. The pending message count check was just one of a number of triggers that could start the batch processing and so the code lived in a class something like this:-

public class QueueCountTrigger : IBatchTrigger
{
  public QueueCountTrigger(...)
  {
    _admin = new Admin(hostname, login, password);
    _queueInfo = _admin.GetQueue(queueName);
    _triggerLevel = triggerLevel;
  }

  public bool IsTriggered()
  {
    return (_queueInfo.PendingMessageCount >= _triggerLevel);
  }

  private Admin _admin;
  private QueueInfo _queueInfo;
  private int _triggerLevel;
}

Granted polling is less than ideal but it would serve our purpose initially. This behaviour was most curious because, as far as we could see, similar code had been working fine in the old acceptance tests for months.

The Leaky Abstraction

Eventually one of the team started poking around under the hood with a disassembler and everything began to fall into place. The QueueInfo object returned from the Admin type was just a snapshot of the queue. When subsequent attempts were made to query the PendingMessageCount property it was just returning a cached value. Because the service always started first, after the queue had been purged, it never saw the count change.

Looking at the TIBCO documentation for the classes (and method) in question you’d struggle to find anything that suggests you can’t hold on to the QueueInfo object and get real-time updates of the various queue attributes. Perhaps the “Info” suffix is supposed to be a clue? In retrospect perhaps QueueSnapshot would be a better name? Or maybe it’s documented clearly elsewhere in some kind of design rationale that you’re supposed to read up front?
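
Armed with that knowledge, a minimal sketch of the sort of fix this implies is to treat QueueInfo as the point-in-time snapshot it really is and re-query it on every poll (here _queueName is assumed to be stashed away by the constructor):

public bool IsTriggered()
{
  // QueueInfo is only a snapshot, so ask the Admin object for a
  // fresh one on every poll rather than relying on a cached copy.
  var queueInfo = _admin.GetQueue(_queueName);

  return (queueInfo.PendingMessageCount >= _triggerLevel);
}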

I can’t remember if it was Steve Maguire in Writing Solid Code or Steve McConnell in Code Complete, but I’m sure one of them suggested that there is no such thing as a black box. Whilst the documentation may describe the interface there are often implementation details, such as performance or caching effects, that are left unspecified and eventually you’ll need to open the box to find out what’s really going on when it doesn’t behave as you would like in your scenario.

Back to the Tests

Looking more closely at the original code for the acceptance tests it made a sub-optimal call which opened a connection every time the message count was requested (just as the example code at the top would do):-

public int GetPendingMessageCount(...)
{
  var admin = new Admin(hostname, login, password);
  var queueInfo = admin.GetQueue(queueName);

  return queueInfo.PendingMessageCount;
}

This was later changed to an inline call, but one which re-fetched the QueueInfo snapshot at the end of every loop iteration. Sadly the author of this change is no longer around so it’s hard to say whether this was done after going through the same discovery exercise as above, by basing it on a better example from somewhere else, from prior knowledge of the problem, or just out-and-out luck.

public void WaitForQueueToEmpty(...)
{
  var admin = new Admin(hostname, login, password);
  var queueInfo = admin.GetQueue(queueName);

  while (queueInfo.PendingMessageCount > 0)
  {
    // Other timeout handling code
    . . .
    queueInfo = admin.GetQueue(queueName);
  }
}

 

[1] Exaggeration used for dramatic effect.

Thursday, 24 April 2014

What’s the Right Size for a Commit?


At the end of my ACCU 2014 Conference talk this year on Version Control - Patterns & Practices an interesting question came up:-

What’s the right size for a commit?

My answer, which I’ll admit was mostly based on instinct and not through any prior reasoning, was this:-

Something that can be cherry picked.

So that was the short answer; this post is the longer one that leads me to that conclusion…

Single Responsibility Principle

I’ve written before (What’s the Check-In Frequency, Kenneth?) about the effects of fine-grained commits on an integration branch; suffice to say that they cause a lot of noise when you come back to the history to do a spot of software archaeology. In my opinion, if you want to check-in after you get every unit test passing then you should use some kind of private branch. The converse side would be to bundle up so much into a single commit that it would appear as the proverbial Big Ball of Mud.

In essence the advice I gave was nothing more than the application of the age old Single Responsibility Principle as applied to the problem of committing changes to a version control system.

Cherry Picking

The practice of cherry picking changes from one branch to another (often from development to release to squeeze one more feature in) has a bad reputation. I suspect the reason for this is largely down to the breakages and stress it’s caused from incorrectly trying to divorce one single “feature” from the other changes that got made at the same time. Or from not merging all the commits that make up the “logical set” for that feature.

Don’t get me wrong, cherry picking is an ugly business that should be avoided if at all possible, but it has its uses and so my approach has always been to ensure that my commits create small, consistent units of change. Of course I break the build too sometimes and consequently I might have the odd “stray” commit that fixes up the build, but by-and-large each commit should stand alone and add some “value”.

Feature Branches

I very rarely use feature branches because I dislike all the constant merging to and fro, but when I do use them the merge at the end usually becomes a single commit to the integration branch. The exception is when I’ve also made a few other changes as a by-product. Whilst the main goal is to implement a specific feature (or fix a bug, etc.), when you’re working on a legacy codebase that lacks any automated test coverage it can save a fair bit of time if the cost of testing can be amortised across a number of changes. This means that during the final merge I need to initially cherry pick a number of other fixes first as individual commits, then merge the remainder (the core feature) as a single commit.

In the integration branch this translates to a series of commits, where each one corresponds to a single feature; it just so happens that they all came from the same source branch. Hence my view is that as long as the “observable outcome” is the same - small, feature-focused commits on the integration branch - it doesn’t really matter too much how they got there. Granted it makes reading the private branch a little more awkward in the future, but I feel the saving in development time is often worth the cost.

Feature Toggles

My preference has generally been for continuous integration through the use of feature toggles. This makes integration easier but cherry-picking harder because the entire feature might be spaced out across a number of discrete commits. I often break a feature down into many smaller tasks that can be delivered to production as soon as possible. This means I generally start with any refactoring because, by definition, I cannot have made any observable changes. Next up is the new code, which, as long as I don’t provide a means to access it can also be deployed to production as a silent passenger. That just leaves the changed or deleted code which will start to have some noticeable impact, at least when it’s toggled on. Each one of these steps may be one or more commits depending on how big the feature is, but each one should still be a cohesive unit of work.

From a cherry-picking perspective what we need to know is all the individual commits that make up the entire feature. This is where I’ve found the practice of tagging each commit with the “change request number” useful. If you’re using something like JIRA then each feature will likely be represented in that system with a unique id, e.g. PROJ-123. Even Trello cards have a unique number that can be used. I usually prefix (rather than append) each commit message with the change code as it makes them easy to see when scanning the VCS change log.
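
For example, for a hypothetical JIRA ticket the relevant slice of the change log might end up looking something like this:

PROJ-123: Refactor the pricing module ready for the new calculation
PROJ-123: Add the new calculation (hidden behind a feature toggle)
PROJ-123: Switch the feature toggle on for the new calculation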

Theory and Practice

It might sound as though cherry-picking features is a regular affair because I pay more than lip-service to them. That’s really not the case; I will avoid them like everyone else, even if they appear to be easy. Unless a change is relatively trivial, e.g. a simple bug fix, it’s quite possible that some refactoring on the integration branch will muddy the waters enough to make not doing it a no-brainer anyway.

It’s hard to quantify how much value there would be in any arbitrary commit. If the entire change is a one-line bug fix it’s easier to determine how big the commit should be. When it’s a new feature that will involve the full suite of modifications - refactoring, adding, removing, updating - it can be harder to see how to break it down into smaller units. This is where I find the notion of software archaeology comes in handy because I project forward 12 months to my future self and look back at the commit and ask myself whether it makes sense.

 

Photo by Thaddaeus Frogley (@codemonkey_uk)