Thursday, February 4, 2010

Bazaar

What better way to evaluate a Distributed Version Control System than by actually using it on a real project? Even if the candidate project is maintained in Subversion...

Why do I want to check out a Distributed Version Control System?

At the place I work we use Subversion as our version control system (VCS). We do not host it ourselves but rely on an external provider for that. In general, we are very pleased both with Subversion and our provider.

However, one problem with this setup is that I need to be online for every VCS action that requires access to the repository. When I am on the road, that is not always possible, or only through a low-bandwidth connection. Without a connection there is no commit, no creation or retrieval of a branch, no history, and so on.

But even when I am at the office, online through a broadband connection, some actions take a lot of time due to the sheer amount of data that has to be transferred. The primary example is the retrieval of a branch of the project I spend most of my time on, which amounts to a 158 MB download. At the office, our download speed maxes out at 200 KB/s. In theory the retrieval takes about 13 minutes, but in practice it takes more than 20 minutes.
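
A quick sanity check of the theoretical figure, as a minimal shell sketch. It assumes 1 MB = 1024 KB and a connection that runs at full speed the whole time; the function name transfer_seconds is made up for this post.

```shell
#!/bin/sh
# Estimate a download time from a size in MB and a rate in KB/s.
# Assumptions: 1 MB = 1024 KB, and the link is saturated throughout.
transfer_seconds() {
    size_mb=$1
    rate_kb_s=$2
    # Integer arithmetic is precise enough for a rough estimate.
    echo $(( size_mb * 1024 / rate_kb_s ))
}

secs=$(transfer_seconds 158 200)
echo "$secs seconds, roughly $(( secs / 60 )) minutes"
```

This lands at a little over 13 minutes, so the 20 minutes I see in practice is overhead on top of the raw transfer.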

Usually my colleagues and I "work around" this problem by working directly on the trunk. This is not a problem when the changes can be applied and tested in a short timeframe. However, changes that appear minor can turn out to be anything but minor. They take longer to develop than expected, and all that time you cannot commit your changes for fear of polluting the trunk. To make things worse, chances are that other developers have committed changes to the trunk in the meantime, and you have to update your working copy with those changes before you can commit at all.

Distributed version control systems (DVCSs) can help me with these issues, as they allow me to store the complete repository locally. So I decided to test-drive one of these DVCSs, namely Bazaar (Bzr), on the aforementioned project. There are two reasons why I chose Bzr. First, there exists an extension to Bzr that allows it to interface with a Subversion repository. But this functionality is not exclusive to Bzr; Git, for example, has it too. This brings me to the second reason. At the time of writing this blog entry, Bzr was the only DVCS I had used, and it had been a good experience.

Way of working

Just as the documentation of the Bzr Subversion extension advises, I created a shared repository, and within that repository I created a Bzr checkout of the trunk of the Subversion repository:

bzr init-repo --default-rich-root app-repo

cd app-repo
bzr checkout https:<path-to-trunk> app-trunk

The last command gave me some headaches, but that is another story.

The reason I created a checkout and not an ordinary branch is that when you commit to a checkout, Bzr also passes your commit through to the original branch. In my case, this is the trunk in the Subversion repository. This immediately makes clear the way of working I have in mind:
  1. When I want to work on a branch, I branch my local checkout and work on the new branch. I commit all my changes to the local branch.
  2. When I want to merge my changes with the trunk, I push them from my branch to the checkout of the trunk.
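
The two steps could look roughly like the sketch below. It is a sketch under assumptions, not a tested recipe: it is meant to run from inside app-repo, feature-x is a made-up branch name, and the helpers run, start_branch and land_branch are hypothetical. For step 2 it uses merge plus commit inside the checkout rather than a push, since committing in the checkout is what passes the change on to Subversion. By default it only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch of the two-step workflow. Commands are printed, not executed,
# unless DRY_RUN=0 is set in the environment.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 1: branch the local checkout; day-to-day commits go to this branch.
start_branch() {
    run bzr branch app-trunk "$1"
}

# Step 2 (assumed variant): update the checkout, merge the branch into it,
# and commit. Because app-trunk is a checkout, the commit is also passed
# through to the Subversion trunk.
land_branch() {
    run cd app-trunk || return 1
    run bzr update &&
    run bzr merge "../$1" &&
    run bzr commit -m "Merge $1"
    status=$?
    run cd ..
    return $status
}
```

Calling start_branch feature-x and, later, land_branch feature-x prints the planned bzr commands, which is a cheap way to review them before letting them touch the real trunk.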

First results

Checking out a branch from the Subversion repository through the TortoiseSVN client took 11 minutes and 17 seconds. This was measured at home, where I have a higher download speed than at the office. By comparison, branching the local Bzr checkout takes 1 minute and 9 seconds. This is almost 10 times as fast as the Subversion checkout at home, and more than 17 times as fast as the Subversion checkout at the office. This was a great speed-up, although the 1 minute and 9 seconds itself left me a bit underwhelmed. Why does it still take more than a minute to branch? One cause could be the actual size of the branch. Once branched, the directory tree of the new branch takes up 1 GB, and it definitely takes time to write that amount of data.
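
For the record, here is how those ratios work out, as a small shell sketch. The office figure uses 20 minutes, which is only the lower bound of what I see there; the helper names to_seconds and speedup are made up for this post.

```shell
#!/bin/sh
# Compare the measured timings; awk handles the floating-point division.
to_seconds() {   # arguments: minutes seconds -> total seconds
    echo $(( $1 * 60 + $2 ))
}

speedup() {      # arguments: slow_seconds fast_seconds -> ratio, one decimal
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.1f\n", slow / fast }'
}

svn_home=$(to_seconds 11 17)    # TortoiseSVN checkout at home
svn_office=$(to_seconds 20 0)   # lower bound: "more than 20 minutes"
bzr_branch=$(to_seconds 1 9)    # branching the local Bzr checkout

echo "home:   $(speedup "$svn_home" "$bzr_branch")x"
echo "office: $(speedup "$svn_office" "$bzr_branch")x"
```

This gives 9.8x for the home measurement and 17.4x for the office lower bound.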

Unanswered questions

With the repository layout described above, can I recreate a branch after I have deleted its working tree, even if the trunk has evolved in the meantime? The real question is how Bzr identifies branches inside a shared repository. By their path in the repository directory? What would happen when I create a branch whose path coincides with that of a previously created branch whose working tree I have deleted? For now it is not really an issue, as I will name my branches uniquely and, once they are deleted, will probably never need to restore them. But the answer to this question would increase my understanding of the inner workings of Bzr.

What about collaborating with my colleagues who only use Subversion? Can I push my local branch directly to a Subversion branch? Is a shared repository the best setup for me? Maybe I should use stacked branches? For now, let's see how it all works out in practice.

Kind regards.