Using GIT coming from subversion

by Jettro Coenradie, November 12, 2009

A while ago I upgraded to Subversion 1.6. That was the moment I started to have more and more problems with Subversion. For the past years I was a strong advocate of Subversion, but now I am having a hard time defending its use. So I got interested in Git. About a month ago Jelmer did a presentation about Git, and I saw Peter Ledbrook using Git during a training about Groovy and Grails. Now I have read the Git Community Book and created an account at GitHub. Time to share what I have learned.

First I’d like to mention that extensive resources about git are available online. I refer to the most important ones at the end of my post. In this post I am not going to write a manual for using git. Others have done that much better than I would be able to do right now. In this post I will focus on the things I like and the things I have learned.

How does git do what it does

Elements of a git repository

Git uses a completely different format for storing the contents of your repository and the changes to those contents. With Subversion, for instance, we are used to storing only the changes. Git instead stores a snapshot of the complete tree structure for each commit. I see you thinking: doesn’t that make my repository enormous? No, it does not, and that has to do with the way git stores data.

Git uses SHA1 keys as the names of the objects it stores. That way two versions of some content can be checked for equality just by comparing their SHA1 names. Another optimization is that an object with a given SHA1 name is created only once, so objects that do not change between commits are not recreated. A commit changes something, but of course not your whole repository. So what do the objects that you commit represent? According to the mentioned book, there are 4 types of objects:

blob – the actual data
tree – a list with references to other tree and blob objects
commit – a reference to a tree that is committed at a certain moment in time, as well as some metadata such as who did the commit, who the author was and when the commit was made
tag – just an easy way to reference a commit
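The object types and the SHA1 naming can be inspected with git itself. What follows is only a minimal sketch, assuming a scratch repository in a temporary directory; the file names, the user name and the tag name v0.1 are made up for illustration. It uses the low-level commands git hash-object and git cat-file.

```shell
# Scratch repository in a temporary directory (all names are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Two files with identical content and one that differs.
printf 'hello\n' > a.txt
printf 'hello\n' > b.txt
printf 'hello world\n' > c.txt

# The object name depends on the content only, not on the file name.
hash_a=$(git hash-object a.txt)
hash_b=$(git hash-object b.txt)
hash_c=$(git hash-object c.txt)
echo "a.txt $hash_a"
echo "b.txt $hash_b"
echo "c.txt $hash_c"

# Commit the files and add an annotated tag, then ask git for the object types.
git add a.txt b.txt c.txt
git commit -q -m "first commit"
git tag -a -m "first tag" v0.1
type_blob=$(git cat-file -t "$hash_a")
type_tree=$(git cat-file -t 'HEAD^{tree}')
type_commit=$(git cat-file -t HEAD)
type_tag=$(git cat-file -t v0.1)
echo "$type_blob $type_tree $type_commit $type_tag"

# Print the tree: a.txt and b.txt point at the same blob, which is stored once.
git cat-file -p 'HEAD^{tree}'
```

In the printed tree, a.txt and b.txt should show the same blob name, while c.txt has its own.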

The commit object is important. It has a reference to zero, one or more parents. If it has no parent, it is the root commit (the start of the repository). It can have more than one parent if the commit is a result of a merge. Otherwise the commit will have one parent.
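To see the three cases side by side, here is a small sketch under a few assumptions: a throwaway repository, a branch called feature and a helper function count_parents that I made up for this example (it is not a git command).

```shell
# Throwaway repository; branch and file names are illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Example User"
git config user.email "user@example.com"

# Hypothetical helper: print the number of parents of the given commit.
count_parents() {
    git log -1 --format=%P "$1" | wc -w | tr -d ' '
}

# Root commit: no parent.
echo one > one.txt
git add one.txt
git commit -q -m "root commit"
root=$(git rev-parse HEAD)

# Ordinary commit on a branch: one parent.
git checkout -q -b feature
echo two > two.txt
git add two.txt
git commit -q -m "work on feature"
normal=$(git rev-parse HEAD)

# Let master move on as well, so that the merge cannot be a fast-forward.
git checkout -q master
echo three > three.txt
git add three.txt
git commit -q -m "work on master"

# Merge commit: two parents.
git merge -q --no-edit -m "merge feature" feature
merge=$(git rev-parse HEAD)

echo "root commit:   $(count_parents "$root") parents"
echo "normal commit: $(count_parents "$normal") parents"
echo "merge commit:  $(count_parents "$merge") parents"
```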

local meta data of git

Subversion makes extensive use of the .svn folder. I think everybody has at some point copied a folder, including its .svn folder, from one place to another and was very surprised that commits gave very strange results. With git this is different: there is only one .git folder, in the root of the project. This is a very important directory because it contains the complete history of the project. Thanks to this folder you can switch between branches without losing data. So please do not remove it :-).

The index

An important concept is the index. The index is a staging area between your working directory and your local repository. When performing a commit, the changes in the index are committed. Using the command git status you can see what changes are in your index and which ones are not.
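A short sketch of the index at work, assuming a scratch repository with two made-up files. Both files are changed, but only one of them is staged, so only that one ends up in the commit.

```shell
# Scratch repository; file names are illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

printf 'first\n' > a.txt
printf 'first\n' > b.txt
git add a.txt b.txt
git commit -q -m "initial commit"

# Change both files, but put only a.txt in the index.
printf 'second\n' >> a.txt
printf 'second\n' >> b.txt
git add a.txt

# Short status: first column is the index, second column the working directory.
git status --short
staged=$(git diff --cached --name-only)     # changes that are in the index
unstaged=$(git diff --name-only)            # changes that are not in the index

# The commit takes what is in the index and nothing else.
git commit -q -m "change a.txt only"
in_commit=$(git diff-tree --no-commit-id --name-only -r HEAD)
still_modified=$(git diff --name-only)
echo "committed: $in_commit, still modified: $still_modified"
```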

Installation

Installing git should be pretty painless on all operating systems. A plain installation is enough to clone other people’s remote git repositories and work with them. If you want to do more, like pushing changes to your own remote repository, you have to take some additional steps. I recommend reading the following two articles when using github.

Remote and local repository

cloning

To be able to have a look at the sources in a repository, you need to create a clone of that repository. Of course you first need to know the address of the repository. On github that is very easy: click the button next to the public clone url and paste the url after the command git clone:

git clone git://github.com/jettro/MyServerPark.git

Now you have a local clone of the repository. In this clone you can create branches, locally of course, and even commit to your local branches. You cannot push your changes to the remote repository, but there are other means of getting your changes committed. The easiest one is to create a patch and mail it.
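The patch route can be tried out locally. The sketch below makes some assumptions: two local directories, upstream and contributor, stand in for the owner's repository and your clone, and the mail step is replaced by a shared patches directory. git format-patch writes the patch file and the owner applies it with git am.

```shell
# Everything lives in a temporary directory; all names are illustrative.
work=$(mktemp -d)
cd "$work"

# The owner's repository with one commit.
git init -q upstream
cd upstream
git config user.name "Example Owner"
git config user.email "owner@example.com"
echo "hello" > README
git add README
git commit -q -m "initial commit"

# The contributor clones it and commits a change locally.
cd "$work"
git clone -q upstream contributor
cd contributor
git config user.name "Example Contributor"
git config user.email "contributor@example.com"
echo "a fix" >> README
git commit -q -a -m "Fix the README"

# Write the last commit as a patch file (this is what you would mail).
git format-patch -1 -o "$work/patches"

# The owner applies the received patch; the author information is preserved.
cd "$work/upstream"
git am -q "$work/patches"/*.patch
git log -1 --format='%an: %s'
```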

In case you have made changes to a project that you are not the owner of and you want to update your copy of the project, you can just revert all your changes. Only do this if you do not care about the changes you have made:

git checkout -f

Now you have a clean clone and you can do git pull again.
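To see what git checkout -f does before trying it on real work, here is a sketch on a scratch repository with a made-up file. Note that it resets tracked files only; it does not delete new files that git does not know about.

```shell
# Scratch repository; the file name is illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
printf 'original\n' > notes.txt
git add notes.txt
git commit -q -m "initial commit"

# An uncommitted local change.
printf 'local experiment\n' >> notes.txt
git status --short      # lists notes.txt as modified

# Throw away all uncommitted changes to tracked files.
git checkout -f
git status --short      # prints nothing, the working directory is clean again
cat notes.txt
```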

So creating a clone or copy is done using clone, and updating your sources with the changes in the remote repository is done using pull. If you have configured your local repository the right way, you can use push to copy the commits in your local repository back to the remote repository.
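The round trip of clone, push and pull can be played through without a server. This is a sketch under clear assumptions: a local bare repository remote.git stands in for the remote one (with write access, unlike a read-only git:// url), and alice and bob are two made-up clones.

```shell
# Everything lives in a temporary directory; all names are illustrative.
work=$(mktemp -d)
cd "$work"

# A bare repository plays the role of the remote repository.
git init -q --bare remote.git
git --git-dir=remote.git symbolic-ref HEAD refs/heads/master

# First clone: commit locally, then push to the remote.
git clone -q remote.git alice
cd alice
git symbolic-ref HEAD refs/heads/master
git config user.name "Alice Example"
git config user.email "alice@example.com"
echo first > first.txt
git add first.txt
git commit -q -m "first commit"
git push -q origin master

# Second clone: receives everything pushed so far.
cd "$work"
git clone -q remote.git bob

# A new commit is pushed from the first clone...
cd "$work/alice"
echo second > second.txt
git add second.txt
git commit -q -m "second commit"
git push -q origin master

# ...and pull brings it into the second clone.
cd "$work/bob"
git pull -q
ls
```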

Using branches

This still does not feel really different from subversion. So far the biggest difference is the index, which is a nice concept, but using things like changesets in intellij you can accomplish the same thing. Of course you then need intellij to do all the svn stuff. What makes git interesting is that you can commit locally and track your revision history locally. Again, we can make the same remark about intellij, but it will still not be as advanced as git. So git already has an advantage when using it the subversion way. What makes git very interesting is the concept of local branches.

Git makes it very easy to create a branch, make some changes and commit these changes in the branch. It is also very easy to switch to other branches. That way you can easily create a branch for that killer feature that does not have top priority. When the feature is done, merging it back into the master branch is not hard. Of course you can have merge conflicts that need to be resolved. When that killer feature turns out to be bogus, dropping the branch is easy as well. Switching between branches is done with the command:

git checkout <branch_name>

I do want to clarify what happens when doing a checkout: uncommitted changes in your files are taken with you when you check out another branch. Let me clarify this with an example. You create a new branch fix. Then you check out this branch and change a file. If you now check out the master branch, the change in your file will still be there. So if you want to keep your changes in the branch fix, you need to commit the changes in that branch. Checking out the master branch after the commit will give the desired effect: your master branch will not see the changes. Remember that you are the only one who can see the commit, so you will not pollute the code of others.
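The example with the branch fix can be replayed as follows. It is a sketch on a scratch repository; the file app.txt and its contents are made up.

```shell
# Scratch repository; the file name is illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Example User"
git config user.email "user@example.com"
printf 'version 1\n' > app.txt
git add app.txt
git commit -q -m "initial commit"

# Create the branch fix, switch to it and change a file without committing.
git checkout -q -b fix
printf 'bug fix\n' >> app.txt

# Back on master the uncommitted change is still in the file.
git checkout -q master
before_commit=$(grep -c 'bug fix' app.txt)

# Commit the change on fix, then switch to master again.
git checkout -q fix
git commit -q -a -m "Fix the bug"
git checkout -q master
after_commit=$(grep -c 'bug fix' app.txt)

# The change now lives on the branch fix only.
git checkout -q fix
on_fix=$(grep -c 'bug fix' app.txt)
echo "on master before commit: $before_commit, after commit: $after_commit, on fix: $on_fix"
```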

Sharing with public repository

Using the pull command you can obtain updates from the repository you cloned. By just typing git pull, you are actually using a shorthand. Git configured the remote repository for you when you did git clone. Find out what the remote repository is by typing the following command:

git config --get remote.origin.url

When sharing code with multiple people, it is best to use a public repository. Others can then pull from your public repository, and you can push changes to your public repository from your private repository. But how can the others give you the changes they have made for a new feature or a bug fix? When they do it just once, providing a patch by email is probably best. If they have created a public repository themselves as well, they can send you a pull request. That way you can pull the changes from their public repository, merge them into your local repository and, when satisfied, push them to your public repository.
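Handling such a pull request could look roughly like the sketch below. It rests on several assumptions: local bare repositories project.git and contributor.git stand in for the two public repositories, and the directory names and identities are made up. The maintainer pulls straight from the contributor's public repository by passing its location to git pull.

```shell
# Everything lives in a temporary directory; all names are illustrative.
work=$(mktemp -d)
cd "$work"

# The maintainer's public repository and private working clone.
git init -q --bare project.git
git --git-dir=project.git symbolic-ref HEAD refs/heads/master
git clone -q project.git maintainer
cd maintainer
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Maintainer"
git config user.email "maintainer@example.com"
echo "hello" > README
git add README
git commit -q -m "initial commit"
git push -q origin master

# The contributor has a public repository too, plus a private clone of it.
cd "$work"
git clone -q --bare project.git contributor.git
git clone -q contributor.git contributor
cd contributor
git config user.name "Example Contributor"
git config user.email "contributor@example.com"
echo "new feature" > feature.txt
git add feature.txt
git commit -q -m "Add a feature"
git push -q origin master

# After the pull request: pull from the contributor's public repository,
# check the result locally, then push to the maintainer's public repository.
cd "$work/maintainer"
git pull -q "$work/contributor.git" master
git log --format='%an: %s'
git push -q origin master
```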

The described method assumes one person per repository, so one person also maintains the main public repository. Well actually, there is no formal main public repository. The main public repository is simply the one that is communicated through the home page of the project. There are other options for working together on a project as well.

At github, working together on a project is done using collaborators. You can have a lot of public collaborators, but only a limited number of private collaborators. If you are looking for pricing, check this page: pricing plans. So far I really like the offering of github. Read about their options, I think there is one for everybody. If you are only working on open source projects, the free account will suffice. The business accounts are more suited to working with private repositories.

For now I have only looked at github, but sourceforge also has git support. I know that one of the projects I have blogged about before called beet uses the git repository at sourceforge.

Tool support

So far I have used only a few tools to work with git. Most of the time I use the command line tools. I tried intellij for a bit, but I could not get it to work together with my local repository, so I have to try that again. I did install the mac tool Gitnub, which looks better than the default gitk tool.

Doing git on a subversion repository

Git comes with a plugin (git svn) to use git locally when subversion is used on the server. I am going to try this as well, although I am a bit scared that the partial updates problem will happen then as well.

Read this post if you are interested in using git with svn.

Concluding

In general I like what I see. I will spend some time on the intellij integration, because I am very curious how it will behave when I start doing refactorings and the more advanced stuff. I’ll keep you posted about my progress through this blog.

References

http://git-scm.com/ – the homepage
http://book.git-scm.com/ – the community driven book
http://www.github.com/ – online git repository