Notes on Order of Operations

In my experience with git, it is usually a good idea to follow some conventions about the order in which you do things.  Rebasing can be confusing and can cause a lot of havoc if done improperly, so I always confine it to private local branches of my repository.  Even if I didn't make a branch off of develop to work in, I still make a branch before I do any rebasing.  This adds an extra level of security, because you now have convenient labels for your original history that won't go away while you work.
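As a minimal sketch of that convention, the commands below build a throwaway repository and put a label on the original history before any rebasing happens.  The branch name `config-test-backup`, the temporary directory, and the committer identity are placeholders chosen for illustration; they are not part of the workflow described on this page.

```shell
#!/bin/sh
set -e

# Throwaway repository so the sketch can run anywhere (hypothetical setup).
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > one
git add one
git commit --quiet -m "initial commit"

git checkout --quiet -b config-test
echo "first try" > one
git commit --quiet -am "First try at config"

# Label the current history before rewriting it.  The backup branch keeps
# pointing at the original commits no matter what a later rebase does.
git branch config-test-backup config-test

original=$(git rev-parse config-test)
backup=$(git rev-parse config-test-backup)
echo "config-test:        $original"
echo "config-test-backup: $backup"
```

If a rebase on `config-test` goes wrong, the original commits are still one `git checkout config-test-backup` away.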

Examples

There is concern that untidy commits make the code review process unnecessarily confusing.  The specific case is changes made to the same portion of the same file across multiple commits.  Ideally, the code reviewer should see only the final state of the file, not each incremental change.

There are many ways to make commits in git and many styles of coding, so I'm going to illustrate, for a couple of common cases, how to rewrite your commit history before code review so that each commit represents a single meaningful change.  For good background reading, I refer you to the article on Upstream Merging Strategy in the Linux Kernel developers guide.  That page explains well what a "single meaningful change" means and why you should strive for it.

Commit Style Case Studies

Let's take as our first case the "Exploratory Developer".

Exploratory Developer

This committing style is best characterized as "learning-by-doing".  Multiple commits end up changing the same portions of several different files; maybe these are related configuration parameters that have to be changed together.  Regardless, once the correct combination is arrived at, the intermediate steps are not of interest.  Assuming that no other work was committed in between, it's a simple matter of "squashing" those commits (three, in this example) into one.

Assuming your commit history looks like this:

Code Block
* 8ae9f82        (HEAD, config-test) Final configuration (Geoff Shannon)
* 713de43        A second configuration (Geoff Shannon)
* 8f1647b        First try at config (Geoff Shannon)
* f988e8b        (develop) initial commit (Geoff Shannon)

The squash is done with an interactive rebase onto f988e8b, the commit just before the ones you want to combine:

Code Block
git checkout config-test
git rebase --interactive f988e8b  # -i can be used instead of --interactive

This opens the editor git is configured to use (core.editor, or your system default) with a file that should look something like this:

Code Block
pick 8f1647b First try at config
pick 713de43 A second configuration
pick 8ae9f82 Final configuration

# Rebase f988e8b..8ae9f82 onto f988e8b
#                                                                               
# Commands:                                                                     
#  p, pick = use commit                                                         
#  r, reword = use commit, but edit the commit message                          
#  e, edit = use commit, but stop for amending                                  
#  s, squash = use commit, but meld into previous commit                        
#  f, fixup = like "squash", but discard this commit's log message              
#  x, exec = run command (the rest of the line) using shell                     
#                                                                               
# These lines can be re-ordered; they are executed from top to bottom.          
#                                                                               
# If you remove a line here THAT COMMIT WILL BE LOST.                           
# However, if you remove everything, the rebase will be aborted.                
#
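The command list above suggests what the edit looks like: the first commit stays as `pick` and the later ones become `squash`.  As a rough sketch of that edit, the script below recreates the three-commit history in a throwaway repository and makes the same change without opening an editor.  It relies on git's `GIT_SEQUENCE_EDITOR` and `GIT_EDITOR` environment variables; the `sed` expression, the temporary directory, and the committer identity are my own assumptions, and `sed -i` is used in its GNU form.

```shell
#!/bin/sh
set -e

# Throwaway repository (hypothetical setup, not the author's real one).
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example User"
git config user.email "user@example.com"

# Recreate the example history: one base commit, then three config attempts.
echo "base" > one
git add one
git commit --quiet -m "initial commit"
base=$(git rev-parse HEAD)

git checkout --quiet -b config-test
echo "first try" > one
git commit --quiet -am "First try at config"
echo "second try" > two
git add two
git commit --quiet -m "A second configuration"
echo "final" > three
git add three
git commit --quiet -m "Final configuration"
old_tip=$(git rev-parse HEAD)

# Leave the first "pick" alone and turn every later one into "squash".
# GIT_EDITOR=true accepts the combined commit message unchanged.
GIT_SEQUENCE_EDITOR="sed -i -e '2,\$s/^pick/squash/'" GIT_EDITOR=true \
    git rebase --interactive "$base" >/dev/null 2>&1

commits_after_base=$(git rev-list --count "$base"..HEAD)
echo "commits after base: $commits_after_base"
```

Done by hand, the same change means editing the second and third lines of the list so they start with `squash` (or `s`), then saving and closing the editor.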

...

Code Block
# This is a combination of 3 commits.                                           
# The first commit's message is:                                                
First try at config                                                             
                                                                                
# This is the 2nd commit message:                                               
                                                                                
A second configuration                                                          
                                                                                
# This is the 3rd commit message:                                               
                                                                                
Final configuration                                                             
                                                                                
# Please enter the commit message for your changes. Lines starting              
# with '#' will be ignored, and an empty message aborts the commit.             
# Not currently on any branch.                                                  
# Changes to be committed:                                                      
#   (use "git reset HEAD <file>..." to unstage)                                 
#                                                                               
# modified:   one                                                               
# new file:   three                                                             
# new file:   two                                                               
#

This may look complicated, but it's actually really simple once you understand the parts.

The first line is purely informative, telling you what's going on.  In this case, 3 commits are being combined into one.  The next three sections are simply the original messages of each commit, delimited by comments telling you which commit in the sequence each came from.

What git wants you to do is write the commit message you want your new commit to have.  It's just trying to be helpful by showing you what you previously wrote about each individual commit.  Since these commit messages don't really matter, we can just erase everything and replace it with a totally new commit message.

Code Block
Changed the overall configuration
 
This configuration is good because of foo, bar and baz.
When I tried x, it was no good, but y kind of worked... blah blah blah

Again, save the file and close the editor.

Now your commit history should look like this:

Code Block
* 009e58f        (HEAD, config-test) Changed the overall configuration (Geoff Shannon)
* f988e8b        (develop) initial commit (Geoff Shannon)

Notice that the new commit has a completely different hash than any of the three previous ones did.  In fact, if we try to check out one of them like so:

Code Block
git checkout 8ae9f82

We see that not only does it work, but it gives a very interesting view:

Code Block
git gr   # "gr" is not a stock git command; presumably a personal alias for a decorated graph log
* 009e58f        (config-test) Changed the overall configuration (Geoff Shannon)
| * 8ae9f82      (HEAD) Final configuration (Geoff Shannon)
| * 713de43      A second configuration (Geoff Shannon)
| * 8f1647b      First try at config (Geoff Shannon)
|/  
* f988e8b        (develop) initial commit (Geoff Shannon)

This makes it clear that the rebase is an inherently non-destructive operation: it built a new commit alongside the old ones and moved the branch pointer to it.  Until git garbage collection prunes them, the original commits remain accessible by their hashes; you just no longer have the convenient pointer of a branch head.
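To see this for yourself, the sketch below squashes two commits in a throwaway repository, confirms that the old tip commit still exists, and gives it a branch name again.  The branch name `config-test-original`, the temporary directory, and the `sed`-based non-interactive squash (GNU `sed` assumed) are illustrative assumptions.  The graph view uses stock `git log` options as a rough stand-in for the `git gr` shorthand shown earlier.

```shell
#!/bin/sh
set -e

# Throwaway repository (hypothetical setup).
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > one
git add one
git commit --quiet -m "initial commit"
base=$(git rev-parse HEAD)

git checkout --quiet -b config-test
echo "first try" > one
git commit --quiet -am "First try at config"
echo "final" > one
git commit --quiet -am "Final configuration"
old_tip=$(git rev-parse HEAD)

# Squash the two commits without opening an editor.
GIT_SEQUENCE_EDITOR="sed -i -e '2,\$s/^pick/squash/'" GIT_EDITOR=true \
    git rebase --interactive "$base" >/dev/null 2>&1
new_tip=$(git rev-parse HEAD)

# The old tip is no longer on any branch, but the object is still there.
old_type=$(git cat-file -t "$old_tip")
echo "old tip $old_tip is still a $old_type"

# The reflog records the commits HEAD has pointed at, including the old tip.
git --no-pager reflog

# Giving the old tip a branch name makes it easy to reach again.
git branch config-test-original "$old_tip"
git --no-pager log --graph --oneline --decorate --all
```

Once a branch points at the old commits, garbage collection will not prune them.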