Wednesday, August 17, 2011

Some Points on Git vs Subversion

Since giving a talk on Git a few months ago at work, and hearing from some people about how they especially liked my comparisons of Git and Subversion, I decided to make a blog post about just that.

Aside from the obvious points about how "Subversion is centralized and Git is distributed", I thought I'd offer some other places where the two differ - particularly in practicalities. In passing conversations, it's very easy to say that Git is better than Subversion without going into much detail as to why, so I'm going to cover quite a lot here.

In Git, you create your own repositories. You don't need to ask the system administrator to do it for you on another server (unless it's to set up a central repository). You create the repo with a simple command:
 $ cd proj/
 $ git init
Compare this to Subversion's way:
 $ ssh [svn server]
 $ sudo svnadmin create /path/to/subversion/repos/proj --fs-type fsfs
The Git repository lives in the top level of your project (proj/.git). So, unlike Subversion, you just have one hidden directory for version control. No .svn directories or anything in every single directory of your project.

Git doesn't use the term working copy like Subversion does. Git uses the term working tree. Since there's no separation of the working tree from the repository, there's no copy. Make sense? In Subversion, your repository exists over there on another server. In Git, your repository is right here, in the .git directory inside your project's directory on your workstation. This means deleting your Git repo is as simple as rm -rf proj/.git.

Git also doesn't use separate file systems that you need to worry about creating or updating. With Subversion, if you create a repository with FSFS it will be a specific version of FSFS. If you upgrade svn, you'll also have to upgrade your Subversion file system to get the new features offered by the new version of the svn software. This requires asking your friendly system administrator to run svnadmin upgrade on your repositories. Git doesn't break backward compatibility like that.

"That sounds great and all, but doesn't it mean I will have to manage my own backups for each repository I create - rather than relying on my System Administrator to do it for me?" Yes and no. This is where Central Repositories for Git are useful. Even though Git is distributed, it doesn't mean Git can't make use of central repositories. In fact if you're working on a team, you'll probably want to have a central repository to push your changes to and allow others to then pull them. This also serves as a backup.

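As a rough sketch of that central-repository workflow, here is one way it could look using a bare repository on the local file system. The directory names (alice, bob, central.git) and the identities are made up for illustration; in practice the central repository would usually live on a server and be reached over ssh or https.

```shell
#!/bin/sh
# Sketch: publish a local repo as a bare "central" repo, then share changes through it.
set -e
base=$(mktemp -d)            # throwaway directory for the demo
cd "$base"

# Alice already has a local repository with one commit.
git init -q alice
cd alice
git config user.name "Alice"                 # placeholder identity
git config user.email "alice@example.com"
echo "hello" > README
git add README
git commit -q -m "initial import"
branch=$(git symbolic-ref --short HEAD)      # master or main, depending on your setup
cd ..

# A bare clone has no working tree; it only exists to be pushed to and pulled from.
git clone -q --bare alice central.git

# Bob clones the central repo, commits, and pushes back.
git clone -q central.git bob
cd bob
git config user.name "Bob"
git config user.email "bob@example.com"
echo "from bob" >> README
git commit -q -a -m "Bob's change"
git push -q origin HEAD
cd ..

# Alice adds the central repo as a remote and pulls Bob's change.
cd alice
git remote add origin ../central.git
git pull -q --ff-only origin "$branch"
alice_last_line=$(tail -n 1 README)
echo "$alice_last_line"
```

Every clone in this picture is a full copy of the history, which is why the central repository doubles as a backup.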
Git has three main states that your files can reside in: committed, modified and staged. Committed means that the data is stored in your local database; modified means that you've changed the file but haven't committed it to your database yet; and staged means you have marked a modified file in its current version to go into your next commit snapshot.

This is conceptually similar to how Subversion works, but Git's staging area is far more powerful than how svn does things. Say you create a couple of files. Just like with svn, before you can commit any new files you need to add them:
 $ git add .
That is what it means in Git terms when someone says to stage your files: you add them to the staging area so Git knows what to include in the next commit. Once the files are added, you can commit them:
 $ git commit -m "initial import"
Once a file has been committed, and you make further changes to it, you will have to stage the file again before you commit it. Fortunately, you can just use the -a option to stage it and commit it at the same time. Edit the file foo and add a line of text blah to it, then:
 $ git commit -a -m "Committing blah"
Another thing that's great about Git is that it gives you a lot more verbose output when you run commands. Subversion doesn't usually tell you very much about what's happening. For example, Git often tells you which commands you can run in order to undo an operation.

Create two new files called spam and eggs. To see what state the unadded files are in, type:
 $ git status
This is akin to svn status, which shows ? next to unadded files. Git is more verbose and will say the files are untracked. So track them:
 $ git add spam eggs
Now typing git status will say they are new files with changes ready to be committed. Instead of reverting files, you can unstage them if you change your mind and don't want to commit them.
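Here is a small sketch of that stage/unstage cycle in a throwaway repository. The file base.txt and the identity settings are only there so the script can run on its own; git reset HEAD -- <file> is one way to unstage (newer Git versions also offer git restore --staged).

```shell
#!/bin/sh
# Sketch: stage two new files, then unstage one of them.
set -e
repo=$(mktemp -d)            # throwaway repository
cd "$repo"
git init -q
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "initial import"

touch spam eggs
git add spam eggs            # both are now staged as new files

git reset -q HEAD -- eggs    # unstage eggs; the file itself stays on disk

result=$(git status --short)
echo "$result"               # spam is still staged, eggs is back to untracked
```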

Git is really good about letting you see diffs and times very easily and flexibly. For example, to see what was last committed, both the messages and the diffs, type:
 $ git show
One of my favorite commands is:
 $ git whatchanged
It will show you a git log with every file that was changed and how it was changed (M, A, D, etc.). You can even do:
 $ git whatchanged --since="2 weeks ago"
Like svn log, you can do git log to see all that was committed thus far. You can take this many steps further with Git. For instance, you can see all the diffs for everything with:
 $ git log --stat -p
To narrow this output down to just the commits that rkulla made, do:
 $ git log --author=rkulla
This is much nicer than in svn where you end up having to grep the log output. In fact, grep'ing the log output with git can be done with:
 $ git log --grep=foo
There's no need to git log | grep foo, which is nice because piping to grep causes you to lose information because it only shows you the lines that contain the exact match.

You can make aliases directly in Git. There's little need to outsource aliases to your command-shell like you need with Subversion. Since there are so many command-line options with Git, I often make custom commands. Take this one:
 $ git config --global alias.lol 'log --pretty=format:"%C(yellow)%h%d%Creset %s - %an [%ar]"'
This lets me type:
 $ git lol
to show the git log like you get with git log --oneline, but it's more verbose and shows things like the author and how long ago things were committed.
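If you'd rather try an alias out before making it global, a sketch like the following defines it for a single throwaway repository (leaving off --global stores it in that repository's .git/config only). The repository, file name, and identity here are placeholders.

```shell
#!/bin/sh
# Sketch: define the "lol" alias for one repository and run it.
set -e
repo=$(mktemp -d)            # throwaway repository
cd "$repo"
git init -q
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.com"

echo hello > foo
git add foo
git commit -q -m "initial import"

# Aliases live under the alias.<name> configuration key.
git config alias.lol 'log --pretty=format:"%C(yellow)%h%d%Creset %s - %an [%ar]"'

alias_value=$(git config --get alias.lol)    # read the stored definition back
lol_output=$(git lol)                        # hash, refs, subject, author, relative date
echo "$lol_output"
```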

You can also apply filters to git log. For example to see all files that were ever deleted from the repository:
 $ git log --name-status --diff-filter=D
The pickaxe can help you find code that's been deleted or moved (or introduced) based on a string. To use the pickaxe pass -S[string] to git log:
 $ git log -Sfoo
That will show the commit(s) that added or removed the string foo (more precisely, the commits where the number of occurrences of foo changed). The ncurses-based program tig accepts the same options as git log, so you can browse the list of matching commits and hit enter on one to see its diff by running:
 $ tig -Sfoo
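To make the difference between the pickaxe and --grep concrete, here is a sketch with three commits in a throwaway repository: one adds a file, one introduces the string foo, and one removes it again. The file name and commit messages are invented for the example, and none of the messages mention foo.

```shell
#!/bin/sh
# Sketch: -S searches the changes themselves, --grep searches commit messages.
set -e
repo=$(mktemp -d)            # throwaway repository
cd "$repo"
git init -q
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.com"

printf 'alpha\n' > notes.txt
git add notes.txt
git commit -q -m "add notes"

printf 'alpha\nfoo\n' > notes.txt
git commit -q -a -m "introduce the marker"

printf 'alpha\n' > notes.txt
git commit -q -a -m "drop the marker"

# Commits in which the number of occurrences of "foo" changed.
pickaxe=$(git log -Sfoo --format=%s)
# Commits whose message contains "foo" (none here).
by_message=$(git log --grep=foo --format=%s)

echo "pickaxe matches:"
echo "$pickaxe"
echo "message matches: [$by_message]"
```

The pickaxe finds the two commits that touched the string even though the string no longer exists in the working tree, which is exactly the case where a plain grep of your files can't help.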
With git, commands like git log, git diff--and others that produce lots of output--will automatically get piped to your pager program (e.g., less(1) or more(1)). With Subversion, you always have to pipe things to less manually.

You can also use git grep to grep for things that exist:
 $ git grep 'foo'
That will show you all files that have the string foo in them. All without the need to have you specify file names or exclusions; it even recurses into sub-directories automatically. Contrast this with what Subversion would make you go through with:
 $ find . -not \( -name .svn -prune \) -type f -exec grep foo {} +
(If you are using Subversion, do yourself a favor and install ack, so you don't have to write find commands like the one above.)

Git's grep command is also powerful enough so that you don't have to write regular expressions as much. Adding -p will show you what functions the matches are in:
 $ git grep -np VIDEORESIZE
 main():
     if event.type == VIDEORESIZE:
Moving on to branching differences. Creating branches in Git is much easier than in Subversion: the branch is created locally, without a commit to the server. So instead of doing:
 $ svn cp ^/trunk ^/branch/branchname -m "creating branch"
 $ svn switch ^/branch/branchname
You can just do:
 $ git branch branchname
 $ git checkout branchname
That's it. Or even easier:
 $ git checkout -b branchname
This creates the branch and switches to it in one step.

Deleting a branch is as easy as:
 $ git branch -d branchname
(You do delete your branches when you're done with them, don't you?)

Git supports merging between branches much better than Subversion does. Git keeps track of much more history to make it a smooth operation, and the command is easier to type. Once you're on one of the branches to be merged, you can merge the other one with:
 $ git merge [branch]
If there are no conflicts it even commits automatically for you. Though you can tell it not to commit for you with --no-commit.
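As a sketch of what --no-commit gives you, the script below sets up two branches that have each gained a commit, merges one into the other without committing, looks at what got staged, and only then commits. The branch name feature and the file names are made up for the example.

```shell
#!/bin/sh
# Sketch: merge a branch but stop before the merge commit is created.
set -e
repo=$(mktemp -d)            # throwaway repository
cd "$repo"
git init -q
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "initial import"
main_branch=$(git symbolic-ref --short HEAD) # master or main, depending on your setup

git checkout -q -b feature
echo feature > feature.txt
git add feature.txt
git commit -q -m "add feature file"

git checkout -q "$main_branch"
echo other > other.txt
git add other.txt
git commit -q -m "add other file"

# Both branches moved on, so this is a true merge rather than a fast-forward.
git merge -q --no-commit feature
staged=$(git status --short)                 # the merge result, staged but uncommitted
git commit -q -m "Merge branch 'feature'"

merge_count=$(git rev-list --count --merges HEAD)
echo "staged before commit: $staged"
echo "merge commits: $merge_count"
```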

Aside from merging, sometimes you just want to grab a commit from a different branch and apply it to your current branch. This is called cherry picking and in git the command is appropriately named:
 $ git cherry-pick [revision]
Compare this with Subversion, which has a much less intuitive way of doing this:
 $ svn merge -c [revision] [url]
 $ svn ci -m "cherry picked [rev]"
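Here is a sketch of the Git side end to end: a feature branch gets a bug fix followed by some unfinished work, and only the bug fix is cherry-picked back. Branch, file, and commit names are invented for illustration.

```shell
#!/bin/sh
# Sketch: copy a single commit from another branch onto the current one.
set -e
repo=$(mktemp -d)            # throwaway repository
cd "$repo"
git init -q
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "initial import"
main_branch=$(git symbolic-ref --short HEAD) # master or main, depending on your setup

git checkout -q -b feature
echo fix > fix.txt
git add fix.txt
git commit -q -m "bug fix"
fix_commit=$(git rev-parse HEAD)             # remember which commit we want

echo wip > wip.txt
git add wip.txt
git commit -q -m "unfinished work"

git checkout -q "$main_branch"
git cherry-pick "$fix_commit" > /dev/null    # applies and commits in one step
picked_subject=$(git log -1 --format=%s)
echo "$picked_subject"
```

Only fix.txt arrives on the original branch; the unfinished work stays behind on feature.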
As you may have guessed by now, creating a tag with Git is as easy as creating a branch:
 $ git tag -a tagname
And listing the tags you have is just as simple:
 $ git tag -l
Another thing Git can do that Subversion can't is stashing. For those times when your changes are in an incomplete state and you're not ready to commit, but you need to temporarily return to the last clean commit, you can push all your uncommitted changes onto a stack. See the documentation for how to do this.
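For a quick taste, here is a minimal sketch of the stash round trip in a throwaway repository (the file name and contents are placeholders): stash the half-finished change, confirm the working tree is clean, then pop it back.

```shell
#!/bin/sh
# Sketch: set uncommitted work aside with git stash and bring it back.
set -e
repo=$(mktemp -d)            # throwaway repository
cd "$repo"
git init -q
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.com"

echo "line one" > foo
git add foo
git commit -q -m "initial import"

echo "half-finished change" >> foo           # work you aren't ready to commit

git stash -q                                 # push the change onto the stash stack
clean=$(git status --short)                  # nothing to report: tree matches HEAD
stash_count=$(git stash list | grep -c .)    # one entry on the stack

git stash pop -q                             # reapply the change and drop the entry
restored=$(tail -n 1 foo)
echo "$restored"
```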

Moving on, moving on... Okay, how about reverting? To do the equivalent of "svn revert -R ." (discard all uncommitted changes to tracked files, staged or not):
 $ git reset --hard HEAD
Rolling back a commit is as easy in git as:
 $ git revert HEAD
It will even fill in the commit message for you with Revert "[whatever your last commit message was]" along with the SHA hash of the commit.

You can even pick specific commits to undo with:
 $ git revert [hash]
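A small sketch of what a revert leaves behind, using made-up files and messages. The --no-edit option just accepts the generated message instead of opening an editor, which keeps the script non-interactive.

```shell
#!/bin/sh
# Sketch: undo the last commit with a new commit and look at its message.
set -e
repo=$(mktemp -d)            # throwaway repository
cd "$repo"
git init -q
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.com"

echo spam > spam
git add spam
git commit -q -m "add spam"

echo eggs > eggs
git add eggs
git commit -q -m "add eggs"

git revert --no-edit HEAD > /dev/null        # new commit that undoes "add eggs"

revert_subject=$(git log -1 --format=%s)
revert_body=$(git log -1 --format=%b)
echo "$revert_subject"
echo "$revert_body"
```

Note that history isn't rewritten here: the original commit is still in the log, followed by the commit that undoes it.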
Oh yes, git lets you easily change commit messages, too. Say you have a post-commit hook script that looks for the string "bug #nnnn" in commit messages--in order to create a list of files in the corresponding ticket number in your bug tracker. Well, what if you forget to input that special syntax into your commit messages? With Git, you could just:
 $ git commit --amend
That will open your editor and let you change the commit message. Once you close your editor, it's done. Keep in mind that amending replaces the commit with a new one (it gets a new SHA hash), so it's best done before you've pushed the commit anywhere. Good luck doing that as easily with Subversion. If you need to modify multiple commit messages, or a commit message several commits back, look into using Git's interactive rebase.
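Here is a non-interactive sketch of the same idea, passing the new message with -m instead of opening an editor. The ticket number and commit message are hypothetical.

```shell
#!/bin/sh
# Sketch: reword the most recent commit message.
set -e
repo=$(mktemp -d)            # throwaway repository
cd "$repo"
git init -q
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.com"

echo fix > fix.txt
git add fix.txt
git commit -q -m "fix login timeout"
old_hash=$(git rev-parse HEAD)

# Oops, the ticket reference is missing. Replace the last commit's message.
git commit -q --amend -m "fix login timeout (bug #1234)"
new_hash=$(git rev-parse HEAD)
new_subject=$(git log -1 --format=%s)

echo "$new_subject"
echo "hash changed: $old_hash -> $new_hash"
```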

Moving onto deployments now... Git has archiving features built in. For example, you can create a tar of the latest revision using:
 $ cd my-proj
 $ git archive -o /tmp/my-proj.tar HEAD
You can create a gzipped tarball with:
 $ git archive HEAD | gzip > /tmp/my-proj.tar.gz
Say you want to zip up just the documentation of your project:
 $ git archive --format=zip --prefix=my-docs/ HEAD:docs/ >  /tmp/
Now when you unzip it will unpack a directory called my-docs with your documentation.
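To check what ends up in such an archive, here is a sketch that archives only a docs/ subdirectory under a my-docs/ prefix and lists the result. It uses tar rather than zip simply because the listing is easy to inspect with standard tools; the file names and output location are placeholders.

```shell
#!/bin/sh
# Sketch: archive one subdirectory under a new top-level directory name.
set -e
repo=$(mktemp -d)            # throwaway repository
cd "$repo"
git init -q
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.com"

mkdir docs
echo "usage notes" > docs/readme.txt
echo "code" > main.txt
git add docs main.txt
git commit -q -m "initial import"

out=$(mktemp -d)             # somewhere outside the repository for the archive
# The trailing slash on the prefix is what turns it into a directory name.
git archive --format=tar --prefix=my-docs/ HEAD:docs/ > "$out/my-docs.tar"

listing=$(tar -tf "$out/my-docs.tar")
echo "$listing"
```

Only the contents of docs/ are included, and nothing from the .git directory ever makes it into the archive.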

This list of comparisons is getting really long, so I'm going to stop now. Feel free to add your own additions in the comments section below.


