incremental code reviews with git

For a long time I've wanted to do something that should have been simple but turned out to be surprisingly hard to figure out.

Task: Given a git branch, code review each individual commit made on that branch, one at a time. This assumes the code review is being done using the git diff command, probably using a nice graphical tool such as meld.

The obvious (and wrong) solution is to run git log, take the list of commit hashes, reverse it, and run git diff on each adjacent pair.

The problems I encountered:

Merges from the master branch usually create huge diffs:

This is easily fixed by using the --no-merges flag.

The commit listed next to the one you're looking at in git log isn't necessarily its "parent", because git log interleaves commits from merged-in history by date:

This is solved using a custom format string for git log.

Individual commits (not merges) in your branch may have originated elsewhere, if you've been keeping your branch up-to-date with master (which you probably should be):

This is solved using the git cherry command.

How it's done

The following steps break it down. Once you understand them individually, combine them into a script for ease-of-use.

figure out which commits you're interested in

This command lists the commits on your branch and marks with a + the ones that have no equivalent in master (commits whose change is already in master are marked with a -):

git cherry -v master

If you strip out everything but the commit hash, you can save a file which can easily be used as a filter later:

git cherry -v master | perl -ne 'print "$1\n" if /^\+ (\S+)/' > filter.txt
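To make the filtering concrete: git cherry -v prints one line per commit, prefixed with + (no equivalent in master) or - (an equivalent change is already in master), and only the + hashes belong in filter.txt. Here is a minimal sketch of the same extraction written with awk. The function name is made up for illustration, and it reads from standard input so it can be tried on sample lines without a repository.

```shell
# unique_hashes: reads `git cherry -v` style lines on stdin and prints only
# the hashes from lines marked "+"; lines marked "-" are dropped.
# (Hypothetical helper name, not part of git.)
unique_hashes() {
  awk '$1 == "+" { print $2 }'
}

# Usage inside a repository:
#   git cherry -v master | unique_hashes > filter.txt
```

Dropping the - lines matters because every line left in filter.txt is later handed to grep as a pattern.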

get a list of commit IDs and parent commit IDs

Next, we need all the commit hashes to compare. Using this custom format string, git will tell us exactly what the parent hash is:

git log --format="%P..%H" --no-merges > all_the_hashes.txt
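In that format string, %P expands to the parent hash and %H to the commit's own hash. Because --no-merges leaves only commits with at most one parent, each line has the shape parent..commit, which git diff accepts as a range (git diff A..B compares the same two commits as git diff A B). If you ever need the two halves separately, a small sketch using plain parameter expansion could look like this; the function name is an illustrative assumption.

```shell
# split_range: prints the two halves of a "PARENT..COMMIT" string.
# (Hypothetical helper, shown only to illustrate the line format.)
split_range() {
  range=$1
  parent=${range%%..*}   # everything before the first ".."
  commit=${range##*..}   # everything after the last ".."
  printf 'parent=%s commit=%s\n' "$parent" "$commit"
}

# Example with made-up short hashes:
#   split_range aaa111..bbb222
```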

filter

Use grep to filter all the commits down to just the ones unique to this branch. Use tac to reverse the list so that the hashes are in chronological order:

grep -f filter.txt all_the_hashes.txt | tac > to_review.txt
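Here is a small sketch of what that filter step does, wrapped in a function so it can be tried on made-up data. The function name and the short hashes in the usage comment are illustrative only. It adds -F so grep treats each hash as a fixed string rather than a regular expression. It also assumes tac from GNU coreutils is available; on BSD or macOS, tail -r is the usual substitute.

```shell
# chronological_matches FILTER_FILE HASH_FILE
# Keeps the lines of HASH_FILE that contain any hash listed in FILTER_FILE,
# then reverses them so the oldest commit comes first.
# (Hypothetical helper name; assumes GNU tac is installed.)
chronological_matches() {
  grep -F -f "$1" "$2" | tac
}

# Example with made-up short hashes (git log prints newest first):
#   filter.txt          all_the_hashes.txt
#   ccc333              ccc333..ddd444
#   ddd444              bbb222..ccc333
#                       aaa111..bbb222
#   chronological_matches filter.txt all_the_hashes.txt > to_review.txt
```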

Review!

Each line of to_review.txt is a parent..commit range. Pass them to git diff one at a time, and you can easily review each step along the way. Of course, it helps if the developer committed often.
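One way to step through the file is a small loop. This is only a sketch: the function name is made up, and it assumes the file holds one parent..commit range per line. The diff command defaults to git diff but can be swapped for something else, for example git difftool --tool=meld if meld is installed.

```shell
# review_each FILE [COMMAND...]
# Runs COMMAND (default: git diff) once per "parent..commit" line in FILE.
# (Hypothetical helper name.)
review_each() {
  file=$1
  shift
  [ "$#" -gt 0 ] || set -- git diff
  # Read the file on descriptor 3 so the diff command keeps the terminal
  # as its standard input.
  while IFS= read -r range <&3; do
    [ -n "$range" ] || continue          # skip blank lines
    printf '== %s ==\n' "$range"
    "$@" "$range"
  done 3< "$file"
}

# Usage:
#   review_each to_review.txt
#   review_each to_review.txt git difftool --tool=meld
```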

my actual script

The following script combines all these ideas, and performs the following steps:

  1. Get the hashes unique to this branch.
  2. Get the log info including author name and relative date.
  3. Filter by the hashes in the first step.
  4. Write a bash script which, when run, will echo the author name and date and execute git diff for each pair of commits, one at a time.

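The LF below is just plaintext standing in for the newline character. git log can emit a real newline with %n, but each commit's record has to stay on a single line so that grep and tac treat it as one unit, so I use Perl to substitute the newline in at the very end.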

The script prints the generated bash script to standard output. Redirect that into a file, and you can easily edit it to trim out commits you don't care about before you begin reviewing:

#!/usr/bin/env bash

git cherry -v master | perl -ne 'print "$1\n" if /^\+ (\S+)/' > /tmp/filter.txt

git log --format="echo %an, %ar: %sLFgit diff %P..%HLF" --no-merges \
| grep -f /tmp/filter.txt | tac \
| perl -pe 's/LF/\n/g;'
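Two caveats about the generated output are worth knowing. The echo line is unquoted, so a commit subject containing shell characters such as ; or $( ) would be interpreted when the review script runs. And any subject that happens to contain the letters LF (CRLF, say) would be split by the Perl substitution. Below is a hedged sketch of one way around both. It assumes a tab (%x09 in the git log format) as the field separator in place of the LF placeholder, and the helper name emit_review_script is made up for illustration.

```shell
# emit_review_script: reads "PARENT..COMMIT<TAB>SUMMARY" records on stdin and
# prints a shell script that echoes each summary (single-quoted, so it is
# never interpreted) and then runs git diff on the range.
# (Hypothetical helper; not the original script.)
emit_review_script() {
  tab=$(printf '\t')
  while IFS="$tab" read -r range summary; do
    # Escape embedded single quotes: ' becomes '\''
    quoted=$(printf '%s' "$summary" | sed "s/'/'\\\\''/g")
    printf "echo '%s'\n" "$quoted"
    printf 'git diff %s\n\n' "$range"
  done
}

# Usage inside a repository, with the filter file from the first step:
#   git log --format='%P..%H%x09%an, %ar: %s' --no-merges \
#     | grep -F -f /tmp/filter.txt | tac | emit_review_script > review.sh
```

The records still stay on one line each, so grep and tac behave as before; only the final expansion into two lines changes.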
