On Pull Request merge strategies

2022-08-28

The software repository we are working on at my company is on git and hosted on Github. Developers have different opinions on the best merge strategy to use, and historically merge commits have been preferred.
However, this has resulted in quite a messy history with thousands of branches merging each other. I explain below why I prefer “squash and merge” over other strategies when merging Pull Requests.

Basic development cycle

When software engineers make a new features or a fix a bug, the flow is usally:

Create a new branch based on the base branch to develop on
Make the necessary changes
Push to Github and open a Pull Request
Wait for the review
Improve and change code based on the review
Once the Pull Request is approved, merge it into the target branch (usually master, main or a release branch)

This is all pretty standard, but there a few hiccups.

Review comments

Let’s first assume than the original Pull Request had a clean change history that made sense with respect to the bug fix or feature. With changes requested on the review, changes are made on top of that clean history and it may now look like this:

feature: implement XXX
feature: add YYY
fix typo (review starts here)
imports
review suggestions
revert fix typo

Most developers (myself included!) are not good at writing good commit messages when fixing review comments. This is actually quite hard to do.

Once this happened, the history is no longer clean, and the author has now 2 choices:

He does not care (the general case).
He cares and rebase his history to squash changes into meaningful commits. This can often be tedious depending on the changes made. One downside, is that the comments made on the pull request on commits are no longer valid.

But usually, the author does have the time to do that, and even if he does and rebase, it is not enforced in the company. So how the branch looks like at the end just depends on the individual.

Conflicts with the target branch

Assuming all review comments have been taken care of, it is now time to merge… but unfortunately, there are sometimes some conflicts with the target branch. Two possibilities:

Rebase the current branch on top of the target branch: this can lead to tedious and not trivial changes between commits, sometimes leading to retesting lots of stuff. This is what is preferred to keep a clean history.
Merge the target branch in the current branch: this is a one time merge, and is usually much simpler than rebasing if there are multiple conflicting commits. This is what is usually done in my company. The big downside is that the history is no longer linear and quickly becomes a mess. In the case of long-lived feature branch, there are often several merges of the target branch into the feature branch.

Different merge strategies

Github proposes several Pull Request merge strategies:

Create a merge commit: merge the branch using a merge commit.
Squash and merge: merge all new commits of the current branch into a single one that will be added to the target branch.
Rebase and merge: only use that if the history is really clean. I usually prefer a merge commit between the two and discourage using that strategy.

Of the three choices, squash and merge is in my opinion the only one that:

Will keep the git history clean and linear in all cases, there are always commits that we don’t want to end up in the main branch (I am looking at you “typos”).
Ensures that the state of the software is (in most cases) consistent between all commits, assuming that all tests run before merging.
Can easily be reverted if something happens. This can also easily be done with a merge commit, not when rebasing.
Do not hinder developer velocity by rewriting the feature branch history… because it does not matter anymore. This is also valid for merge commits.
The most compatible with different individuals: the ones who care about their commit, and the ones who don’t… as they all end in a single commit anyway.

I would favor merge commits only if all developers in the team have the incentive (and the time) to rewrite their history and force push changes for all Pull Requests to be as clean as possible. And this stops working just when one persone does not follow those recommendations.

Notes:

As described in the Github documentation, squash and merge is discouraged for long running branches, when work continues on the feature branch after the merge. However I found that is easier to create a new branch after the Pull Request has been merged and cherry-picking all commits above.
Squash and merge will lose individual commits authorship. This may sometimes be an issue if several persons worked on the same branch. If keeping authorship is important (maybe when git blaming…), squash and merging should also be discouraged.

Conclusion

At the end, it all depends on the team velocity and which strategy the development team enforces. Is too much time spent on rebasing, or looking at a messed git history?
My opinion is that keeping a clean linear history is the most important, and because every developer in the team have different habits, squash merging “just works” most of the time because everything ends as a single commit.

Useful (biased) links:

Github documentation
Two years of squash merge: a blog post better explaining the benefits of squash and merge in the long run
Squash merge on Azure