Squash vs. Merge

Currently, the parse-server GitHub repository is configured to not allow merge commits into master.

This means that if there is a pull request with dozens of commits, the commits will be squashed into a single commit.

On my own stuff, I don’t do this. I commit very frequently with a message for each thing I do. When I want what’s in master I create a merge commit instead of rebasing.

When I merge into master I do not squash.

I like this because:

  1. If I am searching for how something is done by grepping the commit history and I get a hit, it’ll generally be a small commit containing just (or nearly just) what I am looking for, instead of a large commit with lots of mostly unrelated changes.

  2. If I am trying to locate a regression, I can use the amazingly awesome git bisect to pin down the change; with small, discrete commits it is much easier to locate.

  3. I can keep track of my local branches more easily using:

alias git-clean='git branch --merged | egrep -v "(^\*|master|dev)" | xargs git branch -d'

This, to me, is a big advantage: with squash merges, the commit that lands on master is a brand-new commit, so `git branch --merged` never recognizes the local branch as merged. I can still manage my local branches by hand, but that is more labor-intensive and accident-prone.

  4. It’s mind-numbingly easy to do and easy to reason about. For example, what even happens when you squash a PR that contains commits from multiple authors? I know it is handled, but I have no idea how.
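The kind of search in point 1 maps directly onto git’s “pickaxe”. A minimal sketch in a throwaway repo (the file and function names here are made up for the demo):

```shell
set -e
dir=$(mktemp -d); cd "$dir"
git init -q
git config user.email you@example.com
git config user.name you

echo 'function parseQuery() {}' > lib.js
git add lib.js
git commit -qm "add parseQuery"

echo 'function other() {}' >> lib.js
git add lib.js
git commit -qm "add other helper"

# Which commit introduced parseQuery? -S lists commits that changed
# the number of occurrences of the string.
git log -S parseQuery --oneline   # → only the "add parseQuery" commit
```

With small commits, the hit is a commit about exactly that change; with a squashed PR, the same search lands you on one big commit you still have to dig through.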

What are the counter-arguments?

One that I can recall is that it makes history easier to read because PRs are clumped together. Maybe, but I have never looked at history that way for that purpose… maybe others do?


There are a few points in this article here: https://blog.carbonfive.com/2017/08/28/always-squash-and-rebase-your-git-commits/

I like and agree with your pros, and I think the main cons are:

  • Easier to read history
  • All commits passing tests => easier to locate a regression

I’d suggest we try the new approach for a while and evaluate how it goes, but I have some questions to think about:

  • Maybe require a rebase instead of just a merge, so we can keep all the commits from a PR together in the history?
  • Maybe a max number of commits per PR?
  • Maybe review how meaningful the commits in a PR are and ask the author to merge/exclude some of them?
  • Maybe add a hook in the commit process to run tests and therefore ensure that all commits are passing tests?
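The hook in the last bullet could look something like this. A toy demo with a stand-in test script; in a real project the hook body would run the actual suite (e.g. `npm test`):

```shell
set -e
dir=$(mktemp -d); cd "$dir"
git init -q
git config user.email you@example.com
git config user.name you

# Stand-in for the real test suite; it currently fails.
printf '#!/bin/sh\nexit 1\n' > run-tests.sh
chmod +x run-tests.sh

# The pre-commit hook: refuse the commit when tests fail.
cat > .git/hooks/pre-commit <<'EOF'
#!/bin/sh
./run-tests.sh || { echo "tests failed; commit aborted" >&2; exit 1; }
EOF
chmod +x .git/hooks/pre-commit

echo hello > file.txt
git add file.txt
if git commit -qm "wip" 2>/dev/null; then result=committed; else result=blocked; fi
echo "$result"   # → blocked
```

Note this only checks the commit you’re making right now; to check every commit on a branch, `git rebase -x ./run-tests.sh master` replays each commit and stops at the first one whose tests fail. And `git commit --no-verify` bypasses the hook, so it’s a convention, not an enforcement mechanism.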

oh, I’m not advocating we switch, at least not at this point. Just something I have been wondering about.

But to your points,

  1. who cares if all the commits pass? I don’t understand what value there would be in requiring that. In fact, it would just be a coincidence if they all happened to pass. We just want to know that when the merge commit is created, the sum of the commits passes.

  2. why a max number of commits per PR? Who cares? I don’t even care if a commit is just a nit. Just don’t even think about it; commit when you want. It’s nobody’s business but your own.

And the difference may come down to this: ‘Easier to read history’

I have never read history for any reason other than trying to fix something I or someone else has done with git, and I don’t think that squash vs. merge would have mattered one way or another in those cases. All that matters, in my experience, is that a commit has a good enough message.

In my own work, I seek small and cohesive commits that don’t have unrelated changes. So I find myself doing the exact opposite of what you’re suggesting. When I have a bunch of dirty files, I’ll go through and do partial stages and such so that, if there are two unrelated changes in the same file, each discrete change ends up in its own commit. I do this with my parse commits and everything else I do, which is what got me thinking about this.

Your point about rebase and keeping all the commits from a merge contiguous is interesting, but I think it probably only matters if you want to ‘read history’ which I at least have never found a need or reason to.

One other good thing about my method, particularly for an OSS project, is that it is completely laissez-faire, so it’s one less thing we’re ‘forcing’ contributors to adhere to in order to contribute. Though, in real life, GitHub has made what we are doing so easy for the end user that it doesn’t really matter anymore.

…ok. now I’ll read that link.

ps thanks for thinking along with me!

The author makes your point about tests passing, which is a good one and directly rebuts one of my points: you can’t use git bisect properly on commits that don’t build or otherwise can’t run.

At first I thought, ‘that’s never been an issue for me in real life, and it is countered by the advantage of small commits in bisect,’ but on even brief reflection I have to admit I have run into that problem, without fully realizing that that was why bisect was difficult to run.

Interesting…

I do sometimes read the history to see what’s new in each release (I could get the same from the release notes) and what’s in the next one. I brought up some points to think about in order to improve history reading, but I agree with you that it’s maybe not as important as an easier process for new committers would be. Maybe we should just try it for a while in some of the repos and evaluate the results…

For comparing releases, I generally use tags or specific start and end commits, but in that context, yes, seeing just one message per pull request is good too. I think I’m convinced, which means I have to figure out a better way to manage local branches or live with doing it manually.
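That tag-based compare can be as simple as a two-dot range (tag names invented for the demo):

```shell
set -e
dir=$(mktemp -d); cd "$dir"
git init -q
git config user.email you@example.com
git config user.name you

echo a > f.txt
git add f.txt
git commit -qm "old work"
git tag v2.0.0

echo b > f.txt
git commit -qam "feature for next release"
git tag v2.1.0

# Everything reachable from v2.1.0 but not from v2.0.0:
git log --oneline v2.0.0..v2.1.0   # → only "feature for next release"
```

With squashed PRs, each line in that range reads as one PR, which is the “easier to read history” argument in a nutshell.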

Thanks :).