
Cleaning Up Your Git History

Notes on Order of Operations

In my experience with git, it pays to follow some conventions about the order in which you do things.  Rebasing can be confusing and can cause a lot of havoc if done improperly, so I always confine my rebasing to private, local branches of my repository.  This means that even if I didn't make a branch off of develop to work in, I'll still make a branch before I do any rebasing.  This adds an extra level of safety, because you now have convenient labels on your original history that won't go away while you work.
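The branch-first habit described above can be sketched as follows.  This is only a minimal illustration in a throwaway repository; the branch names my-work and my-work-backup, the demo identity and the file name are hypothetical and not part of any real project.

```shell
#!/bin/sh
set -e

# Throwaway repository so the sketch is safe to run anywhere.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # hypothetical identity, local to this repo
git config user.email "demo@example.com"

echo one > one
git add one
git commit -q -m "initial commit"
echo two >> one
git commit -q -am "work in progress"

# Move onto a private branch, then keep a second label on the
# original history before rewriting anything.
git checkout -q -b my-work
git branch my-work-backup

# Any rebasing would happen on my-work from here on; my-work-backup
# keeps pointing at the commits as they were before the rewrite.
backup_tip=$(git rev-parse my-work-backup)
work_tip=$(git rev-parse my-work)
echo "backup: $backup_tip"
echo "work:   $work_tip"
```

Before any rebase both labels point at the same commit; after a rebase only my-work moves, so the backup label remains a stable way back.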

Examples

There is a concern that untidy commits make the code review process unnecessarily confusing.  Specifically, the problem is changes made to the same portion of the same file across multiple commits.  Ideally, the code reviewer should see only the final state of the file, not each incremental change.

Since there are so many different ways to make commits in git, and so many different styles of coding, I'm going to illustrate, for a couple of common cases, how to rewrite your commit history for code review so that each commit represents a single meaningful change.  For some good background reading, I refer you to the article on Upstream Merging Strategy in the Linux Kernel developers guide.  That page explains really well what a "single meaningful change" means, and why you should strive for it.

Commit Style Case Studies

Let's take as our first case the "Exploratory Developer".

Exploratory Developer

This committing style is best characterized as "learning-by-doing".  Multiple commits end up changing the same portions of several different files.  Maybe these are related configuration parameters that have to be changed together.  Regardless, once the correct combination is arrived at, the intermediate steps to get there are not of interest.  Assuming that no other work was committed in between such changes, it's a simple matter of "squashing" the history of those commits into one commit.

Assume your commit history looks like this:

* 8ae9f82        (HEAD, config-test) Final configuration (Geoff Shannon)
* 713de43        A second configuration (Geoff Shannon)
* 8f1647b        First try at config (Geoff Shannon)
* f988e8b        (develop) initial commit (Geoff Shannon)

 

The squash is then done like this:

git checkout config-test
git rebase --interactive f988e8b  # -i can be used instead of --interactive

This will pop up your system default editor with a file that should look something like this:

pick 8f1647b First try at config                                                
pick 713de43 A second configuration                                             
pick 8ae9f82 Final configuration                                                
                                                                                
# Rebase f988e8b..8ae9f82 onto f988e8b                                          
#                                                                               
# Commands:                                                                     
#  p, pick = use commit                                                         
#  r, reword = use commit, but edit the commit message                          
#  e, edit = use commit, but stop for amending                                  
#  s, squash = use commit, but meld into previous commit                        
#  f, fixup = like "squash", but discard this commit's log message              
#  x, exec = run command (the rest of the line) using shell                     
#                                                                               
# These lines can be re-ordered; they are executed from top to bottom.          
#                                                                               
# If you remove a line here THAT COMMIT WILL BE LOST.                           
# However, if you remove everything, the rebase will be aborted.                
#

This is called the "rebase todo list".  There are several important things to notice.  First, the commit that you specified in the rebase command doesn't appear in the list of commits that can be changed.  This is because, as the first line of comments states, you are rebasing onto that commit.  It's your anchor point, so to speak.  Second, the list of commits is in chronological order, with the oldest commit first, the opposite of the way most git commands show you commits.  This matters when you choose which commits to apply commands like fixup and squash to, because each of them melds a commit into the one on the line above it.
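To see the ordering difference concretely, the following sketch builds a throwaway repository with the same three commit subjects and prints them both ways.  The temporary directory, the demo identity and the file name one are assumptions made for illustration only.

```shell
#!/bin/sh
set -e

# Throwaway repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

echo base > one
git add one
git commit -q -m "initial commit"
base=$(git rev-parse HEAD)                # plays the role of f988e8b

for msg in "First try at config" "A second configuration" "Final configuration"; do
    echo "$msg" >> one
    git commit -q -am "$msg"
done

# Default log order: newest commit first.
log_order=$(git log --format=%s "$base"..HEAD)

# Todo-list order: oldest commit first, as in the rebase editor.
todo_order=$(git log --reverse --format=%s "$base"..HEAD)

echo "git log order:"
echo "$log_order"
echo "todo list order:"
echo "$todo_order"
```

Reading the second listing top to bottom matches the todo list, which is why a squash line always refers to the older commit directly above it.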

Continuing with our example, to make all of these commits into one we change the todo list to look like this:

pick 8f1647b First try at config                                                
squash 713de43 A second configuration                                             
squash 8ae9f82 Final configuration


# Rebase f988e8b..8ae9f82 onto f988e8b
...

 

Now, if you save the file and close the editor, git will present you with a new screen:

# This is a combination of 3 commits.                                           
# The first commit's message is:                                                
First try at config                                                             
                                                                                
# This is the 2nd commit message:                                               
                                                                                
A second configuration                                                          
                                                                                
# This is the 3rd commit message:                                               
                                                                                
Final configuration                                                             
                                                                                
# Please enter the commit message for your changes. Lines starting              
# with '#' will be ignored, and an empty message aborts the commit.             
# Not currently on any branch.                                                  
# Changes to be committed:                                                      
#   (use "git reset HEAD <file>..." to unstage)                                 
#                                                                               
# modified:   one                                                               
# new file:   three                                                             
# new file:   two                                                               
#

This may look complicated, but it's actually really simple once you understand the parts.

The first line is purely informative, telling you what's going on.  In this case, 3 commits are being combined into one.  The next three sections are simply the original messages of each commit, delimited by comments telling you which commit in the sequence each came from.

What git wants from you is the commit message for your new commit.  It's just trying to be helpful by showing you what you previously wrote about each individual commit.  Since these intermediate commit messages don't really matter, we can erase everything and replace it with a totally new commit message:

Changed the overall configuration
 
This configuration is good because of foo, bar and baz.
When I tried x, it was no good, but y kind of worked... blah blah blah

Again, save the file and close the editor.

Now your commit history should look like this:

* 009e58f        (HEAD, config-test) Changed the overall configuration (Geoff Shannon)
* f988e8b        (develop) initial commit (Geoff Shannon)
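For anyone who wants to replay this example without typing into an editor, here is a hedged sketch that scripts both editor steps.  It relies on git's GIT_SEQUENCE_EDITOR and GIT_EDITOR environment variables; the sed expression, the helper script write-message.sh, the temporary paths and the demo identity are stand-ins invented for this illustration, not part of the original workflow.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)
mkdir "$work/repo"
cd "$work/repo"
git init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

echo base > one
git add one
git commit -q -m "initial commit"
base=$(git rev-parse HEAD)                # plays the role of f988e8b
git checkout -q -b config-test

for msg in "First try at config" "A second configuration" "Final configuration"; do
    echo "$msg" >> one
    git commit -q -am "$msg"
done

# Stand-in for the commit-message editor: replace the proposed
# combined message with a brand new one.
cat > "$work/write-message.sh" <<'EOF'
printf '%s\n' 'Changed the overall configuration' > "$1"
EOF

# Stand-in for editing the todo list by hand: keep the first "pick"
# and turn every later "pick" into "squash".
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2,\$s/^pick /squash /'" \
GIT_EDITOR="sh '$work/write-message.sh'" \
    git rebase -q --interactive "$base"

commit_count=$(git rev-list --count "$base"..config-test)
subject=$(git log -1 --format=%s config-test)
git log --oneline
```

The hashes will differ from the ones shown in this article, but the shape of the history should be the same: one new commit sitting on top of the initial commit.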

Notice that the new commit has a completely different hash from any of the three previous ones.  In fact, if we try to check out one of them like so:

git checkout 8ae9f82

We see that not only does it work, but the log graph (git gr here is an alias for a graph-style git log) gives a very interesting view:

git gr
* 009e58f        (config-test) Changed the overall configuration (Geoff Shannon)
| * 8ae9f82      (HEAD) Final configuration (Geoff Shannon)
| * 713de43      A second configuration (Geoff Shannon)
| * 8f1647b      First try at config (Geoff Shannon)
|/  
* f988e8b        (develop) initial commit (Geoff Shannon)

This makes it clear that the rebase does not destroy the old commits.  Until git garbage collection prunes them, those commits remain accessible by their hashes; you just no longer have the convenient pointer of a branch head.
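If you do want that pointer back, one option is to attach a new branch to the old hash.  The sketch below shows this in a throwaway repository; the branch name config-test-old, the sed expression used to drive the rebase without an editor, and the demo identity are assumptions for illustration.  It uses fixup instead of squash only so that no commit-message editor is needed.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

echo base > one
git add one
git commit -q -m "initial commit"
base=$(git rev-parse HEAD)
git checkout -q -b config-test

for msg in "First try at config" "A second configuration" "Final configuration"; do
    echo "$msg" >> one
    git commit -q -am "$msg"
done
old_tip=$(git rev-parse HEAD)             # plays the role of 8ae9f82

# Rewrite the branch: every "pick" after the first becomes "fixup".
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2,\$s/^pick /fixup /'" \
    git rebase -q --interactive "$base"
new_tip=$(git rev-parse HEAD)

# The old tip is no longer on any branch, but the object still exists.
old_type=$(git cat-file -t "$old_tip")

# Give the old history a label again so it is easy to find.
git branch config-test-old "$old_tip"

git log --graph --oneline --all
```

If you did not write the old hash down, git reflog lists the commits that HEAD has pointed at recently, which is usually enough to find it again.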