On Pull Request merge strategies

2022-08-28

The software repository we are working on at my company is on Git and hosted on GitHub. Developers have different opinions on the best merge strategy to use, and historically merge commits have been preferred.
However, this has resulted in quite a messy history with thousands of branches merging each other. I explain below why I prefer “squash and merge” over other strategies when merging Pull Requests.

Basic development cycle

When software engineers make a new feature or fix a bug, the flow is usually:

create a new branch based on the base branch to develop on
make the necessary changes
push to GitHub and open a Pull Request
wait for the review
improve and change code based on the review
once the Pull Request is approved, merge it into the target branch (usually master, main, or a release branch)

This is all pretty standard, but there are a few hiccups.

Review comments

Let us first assume that the original Pull Request had a clean change history that made sense with respect to the bug fix or feature. With changes requested during the review, changes are made on top of that clean history and it may now look like this:

feature: implement XXX
feature: add YYY
fix typo (review starts here)
imports
review suggestions
revert fix typo

Most developers (myself included!) are not good at writing proper commit messages when fixing review comments. This is actually quite hard to do.

Once this has happened, the history is no longer clean, and the author now has two choices:

he does not care (the general case)
he cares and rebases his history to squash changes into meaningful commits. This can often be tedious depending on the changes made. One downside is that the comments made on the Pull Request commits are no longer valid.

But usually, the author does not have the time to do that, and even if he does and rebases, it is not enforced in the company. So how the branch looks at the end just depends on the individual.

Conflicts with the target branch

Assuming all review comments have been taken care of, it is now time to merge… but unfortunately, there are sometimes some conflicts with the target branch.
Two possibilities:

rebase the current branch on top of the target branch: this can lead to tedious and non-trivial changes between commits, sometimes requiring retesting many things. This is preferred to keep a clean history.
merge the target branch into the current branch: this is a one-time merge and is usually much simpler than rebasing if there are multiple conflicting commits. This is what is usually done in my company. The big downside is that the history is no longer linear and quickly becomes a mess. In the case of long-lived feature branches, there are often several merges of the target branch into the feature branch.

Different merge strategies

GitHub proposes several Pull Request merge strategies:

create a merge commit: merge the branch using a merge commit
squash and merge: merge all new commits of the current branch into a single one that will be added to the target branch
rebase and merge: only use this if the history is really clean. I usually prefer a merge commit between the two and discourage using this strategy.

Of the three choices, squash and merge is, in my opinion, the only one that:

will keep the Git history clean and linear in all cases. There are always commits that we do not want to end up in the main branch (I am looking at you “typos”)
ensures that the state of the software is (in most cases) consistent between all commits, assuming that all tests run before merging
can easily be reverted if something happens. This can also easily be done with a merge commit, but not when rebasing
does not hinder developer velocity by rewriting the feature branch history… because it does not matter anymore. This is also valid for merge commits
is the most compatible with different individuals: the ones who care about their commits, and the ones who do not… as they all end in a single commit anyway

I would favor merge commits only if all developers in the team have the incentive (and the time) to rewrite their history and force-push changes for all Pull Requests to be as clean as possible. And this stops working as soon as one person does not follow those recommendations.

Notes:

as described in the GitHub documentation, squash and merge is discouraged for long-running branches, when work continues on the feature branch after the merge. However, I have found that it is easier to create a new branch after the Pull Request has been merged and cherry-pick all commits above.
squash and merge will lose individual commit authorship. This may sometimes be an issue if several people worked on the same branch. If keeping authorship is important (maybe for git blame…), squash and merge should also be discouraged.

Conclusion

In the end, it all depends on the team velocity and which strategy the development team enforces. Is too much time spent on rebasing, or looking at a messy Git history?
My opinion is that keeping a clean, linear history is the most important, and because every developer in the team has different habits, squash merging “just works” most of the time because everything ends up as a single commit.

Useful (biased) links:

GitHub documentation
Two years of squash merge: a blog post better explaining the benefits of squash and merge in the long run
Squash merge on Azure