What’s the right size for a commit?
My answer, which I’ll admit was based mostly on instinct rather than any prior reasoning, was this:
Something that can be cherry picked.
So that was the short answer; this post is the longer one that led me to that conclusion…
Single Responsibility Principle
I’ve written before (What’s the Check-In Frequency, Kenneth?) about the effects of fine-grained commits on an integration branch. Suffice it to say that they cause a lot of noise when you come back to the history to do a spot of software archaeology. In my opinion, if you want to check in every time you get a unit test passing then you should use some kind of private branch. The opposite extreme would be to bundle up so much into a single commit that it appears as the proverbial Big Ball of Mud.
In essence the advice I gave was nothing more than the age-old Single Responsibility Principle applied to the problem of committing changes to a version control system.
The practice of cherry picking changes from one branch to another (often from development to release to squeeze one more feature in) has a bad reputation. I suspect the reason for this is largely down to the breakages and stress it’s caused from incorrectly trying to divorce one single “feature” from the other changes that got made at the same time. Or from not merging all the commits that make up the “logical set” for that feature.
Don’t get me wrong, cherry picking is an ugly business that should be avoided if at all possible, but it has its uses and so my approach has always been to ensure that my commits create small, consistent units of change. Of course I break the build too sometimes and consequently I might have the odd “stray” commit that fixes up the build, but by-and-large each commit should stand alone and add some “value”.
I very rarely use feature branches because I dislike all the constant merging to and fro, but when I have to, the merge at the end usually becomes a single commit to the integration branch. The exception is when I’ve also made a few other changes as a by-product. Whilst the main goal is to implement a specific feature (or fix a bug, etc.), when you’re working on a legacy codebase that lacks any automated test coverage it can save a fair bit of time if the cost of testing can be amortised across a number of changes. This means that during the final merge I first need to cherry pick the other fixes as individual commits, then merge the remainder (the core feature) as a single commit.
In the integration branch this translates to a series of commits where each one corresponds to a single feature; it just so happens that they all came from the same source branch. Hence my view is that as long as the “observable outcome” is the same - small, feature-focused commits on the integration branch - it doesn’t really matter too much how they got there. Granted it makes reading the private branch a little more awkward in the future, but I feel the saving in development time is often worth the cost.
My preference has generally been for continuous integration through the use of feature toggles. This makes integration easier but cherry-picking harder because the entire feature might be spread across a number of discrete commits. I often break a feature down into many smaller tasks that can be delivered to production as soon as possible. This means I generally start with any refactoring because, by definition, it cannot have made any observable changes. Next up is the new code which, as long as I don’t provide a means to access it, can also be deployed to production as a silent passenger. That just leaves the changed or deleted code, which will start to have some noticeable impact, at least when it’s toggled on. Each one of these steps may be one or more commits depending on how big the feature is, but each one should still be a cohesive unit of work.
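As a rough illustration of that ordering, here is a minimal sketch of a toggled change. Everything in it is hypothetical (the `TOGGLES` dictionary, the toggle name and the discount rules are made up for the example); a real system would more likely read its toggles from configuration. The point is only that the new code can ship and sit idle until the final, observable step switches it on.

```python
# Hypothetical in-memory toggle store; real toggles would usually live in
# configuration or a toggle service rather than a module-level dict.
TOGGLES = {"new_discount_rules": False}


def is_enabled(name):
    """Return True if the named toggle is switched on (unknown toggles are off)."""
    return TOGGLES.get(name, False)


def legacy_discount(total_pence):
    # Existing behaviour, untouched apart from any earlier refactoring commit.
    return total_pence * 5 // 100 if total_pence > 10000 else 0


def new_discount(total_pence):
    # New code: deployed as a "silent passenger", unreachable while the toggle is off.
    return total_pence * 10 // 100 if total_pence > 10000 else 0


def discount(total_pence):
    # The changed code: the only place where behaviour becomes observable,
    # and only once the toggle is turned on.
    if is_enabled("new_discount_rules"):
        return new_discount(total_pence)
    return legacy_discount(total_pence)
```

In this sketch the refactoring, `new_discount` and the switch inside `discount` could each land as a separate commit, and each would still be a cohesive unit of work.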
From a cherry-picking perspective what we need to know is all the individual commits that make up the entire feature. This is where I’ve found the practice of tagging each commit with the “change request number” useful. If you’re using something like JIRA then each feature will likely be represented in that system with a unique id, e.g. PROJ-123. Even Trello cards have a unique number that can be used. I usually prefix (rather than append) each commit message with the change code as it makes them easy to see when scanning the VCS change log.
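To show why the prefix helps, here is a small sketch that picks out every commit belonging to one change request. It assumes the log has been exported as one "hash subject" pair per line, oldest first (with Git, something like `git log --reverse --format="%H %s"` would produce that shape), and that ids look like PROJ-123. The function name and the sample log are invented for the example.

```python
import re

# Assumed convention: the subject line starts with an id such as "PROJ-123".
TICKET_PREFIX = re.compile(r"^([A-Z][A-Z0-9]*-\d+)\b")


def commits_for_ticket(log_text, ticket):
    """Return the hashes of commits whose subject is prefixed with `ticket`,
    in the same order as they appear in `log_text`."""
    hashes = []
    for line in log_text.splitlines():
        if not line.strip():
            continue
        commit_hash, _, subject = line.partition(" ")
        match = TICKET_PREFIX.match(subject)
        # Compare the whole id so that PROJ-123 does not also match PROJ-1234.
        if match and match.group(1) == ticket:
            hashes.append(commit_hash)
    return hashes


# Made-up log, oldest commit first.
sample_log = """\
a1b2c3d PROJ-123 Extract pricing calculation into its own class
e4f5a6b PROJ-124 Fix typo in error message
c7d8e9f PROJ-123 Add new discount calculator (not yet reachable)
0a1b2c3 PROJ-1234 Unrelated change with a similar id
d4e5f6a PROJ-123 Switch checkout over to the discount calculator
"""

print(commits_for_ticket(sample_log, "PROJ-123"))
# prints ['a1b2c3d', 'c7d8e9f', 'd4e5f6a']
```

The resulting list is the “logical set” for the feature, in the order the commits would need to be applied if a cherry pick really could not be avoided.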
Theory and Practice
It might sound as though cherry-picking features is a regular affair because I pay more than lip-service to it. That’s really not the case; I will avoid cherry picks like everyone else, even if they appear to be easy. Unless a change is relatively trivial, e.g. a simple bug fix, it’s quite possible that some refactoring on the integration branch will muddy the waters enough to make not doing it a no-brainer anyway.
It’s hard to quantify how much value there would be in any arbitrary commit. If the entire change is a one-line bug fix it’s easy to determine how big the commit should be. When it’s a new feature that will involve the full suite of modifications - refactoring, adding, removing, updating - it can be harder to see how to break it down into smaller units. This is where I find the notion of software archaeology comes in handy: I project forward 12 months to my future self, look back at the commit and ask myself whether it makes sense.
Photo by Thaddaeus Frogley (@codemonkey_uk)