How can a project miss the point as badly as Subversion? I’m truly flabbergasted. I hope I don’t offend people by saying this, but I rather agree with Linus Torvalds:
…my hatred of CVS has meant that I see Subversion as being the most pointless project ever started. The slogan of Subversion for a while was “CVS done right”, or something like that, and if you start with that kind of slogan, there’s nowhere you can go. There is no way to do CVS right.
Here is the unanswered question in my head. How can a project first released in 2000 have problems that were hated in the ’80s and solved in the ’90s? To put this fully in perspective, let me frame the problems by looking at the issues in SVN’s predecessor, CVS.
CVS is by far the worst revision control system I’ve ever seen. To again quote Linus:
For the first 10 years of kernel maintenance, we literally used tarballs and patches, which is a much superior source control management system than CVS…
The only revision control system to miss the point more blatantly than SVN was Team Foundation Server (the successor to VSS), which has many of the problems of CVS that SVN set out to fix.
The most common grievances about CVS are the absence of atomic checkins, awful branching support, and terrible performance. Of those three primary grievances, SVN only really addresses one: it does bring atomic checkins to the mix. The SVN developers also attempted to address branching, but they kinda missed the point and ended up with abysmal branching support. Additionally, SVN’s performance is atrocious.
Before all the SVN fans in the room get their knickers in a bunch, let me explain a few things. If you haven’t worked on a large software project with lots of branches, and you haven’t worked with Git, Perforce, or another SCM tool that has good merging support, you don’t get to talk. The honest fact is that I’ve met more than a few people whose only exposure to SCM is SVN and/or CVS, and those people LOVE the branch support in SVN. That’s because they don’t know better. Let me give you a hypothetical run-through.
The diagram above shows a visual representation of the history of a file in two different branches. The first event of importance is when Foo’s A.txt was branched to create Bar’s A.txt. That operation took revision 3 of Foo’s A.txt and merged it to create revision 1 of Bar’s A.txt. Cool, well that’s real easy in SVN. The second event is where things start getting tricky: merging Foo’s revision 5 into Bar’s revision 2 to create Bar’s revision 3. SVN isn’t terrible at this first merge, but it also isn’t good. SVN merges will attempt to resolve the changes between the two files, and if possible it will generate a nice merged file. The two big complaints here are that SVN’s merge resolution isn’t as good as the competition’s, and that it is unbearably slow. I merged a one-byte file change the other day and it took over a minute. ONE BYTE!
Things get really sticky for SVN in the next two events: merging Foo’s revision 7 in to create Bar’s revision 5, and merging Bar’s revision 6 back into Foo. Since Foo’s revision 5 has already been merged in, the only changes that should need to be resolved are changes from Foo’s revisions 6 and 7. Unfortunately, SVN isn’t smart enough to always know that. I’m not sure if it is a defect or just poor design, but SVN’s ability to track when files are merged between branches is sub-par. If the mergeinfo is properly maintained, SVN can track whether or not files need to be merged, but it doesn’t do it on a change-by-change basis. SVN’s mergeinfo also doesn’t track history across multiple branches, which makes it useless if you are dealing with a lot of branches. I recently found myself in a situation where I had merged changes to a file from one branch to another multiple times, and I had to keep re-resolving the changes that I had already merged. That is a giant waste of time.
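To make the bookkeeping concrete, here is a minimal sketch of the merge tracking I would expect, using the Foo and Bar example. It is not how SVN implements mergeinfo. The function name and the data layout are made up for illustration, and it assumes every Foo revision up to 5 counts as merged once revision 5 has been merged into Bar.

```python
def revisions_to_merge(source_revisions, already_merged):
    """Return the source revisions that have not yet been merged into the target."""
    merged = set(already_merged)
    return [rev for rev in sorted(source_revisions) if rev not in merged]


# Foo's A.txt has revisions 1 through 7.
foo_revisions = range(1, 8)

# Assumption: Bar was branched from Foo's revision 3, and the first merge
# brought Foo's changes up to revision 5 into Bar.
merged_into_bar = {1, 2, 3, 4, 5}

pending = revisions_to_merge(foo_revisions, merged_into_bar)
print(pending)  # [6, 7]
```

With a record like that, the second merge from Foo only has to look at revisions 6 and 7, and nothing that was resolved in the first merge needs to be resolved again.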
The other big issue is of course performance. I don’t know how to phrase this gracefully, so I’ll just come out and say it. SVN’s performance blows. I can rsync a directory of source code over ssh faster than I can perform an SVN checkout. That is ridiculous. When working with CVS I hated how many hours of my day were wasted with long checkout times, and while I recognize that SVN is far better than CVS, it is still piss-poor. To give another contrast, I took an SVN repository and converted it to a git repository.
Then, to see the performance difference, I did an SVN checkout and a git clone and timed the two operations. The SVN checkout took 10 minutes to pull down 100k files. The git clone only took 2. For those of you who don’t have experience with git, I should also point out that there are some significant differences between SVN’s checkout and git’s clone. SVN’s checkout makes a local working copy of a single revision and caches some data locally to allow you to perform local diffs and a few other operations. Git’s clone is essentially making a full copy of the repository database, including the complete change history for every file. By all rational thinking the git clone should take WAY longer than an SVN checkout because it does so much more, but SVN is shockingly slow.
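If you want to repeat the comparison on your own repository, a rough sketch of how it could be scripted follows. This is not the exact procedure I used. The URLs and directory names are placeholders, and the helper names `time_command` and `compare` are invented for this example.

```python
import subprocess
import time


def time_command(argv):
    """Run a command to completion and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    return time.perf_counter() - start


def compare(commands):
    """Time each labeled command in turn and return {label: seconds}."""
    return {label: time_command(argv) for label, argv in commands.items()}


# Placeholder URLs and target directories; substitute your own.
COMMANDS = {
    "svn checkout": ["svn", "checkout", "--quiet",
                     "https://svn.example.com/repo/trunk", "svn-wc"],
    "git clone": ["git", "clone", "--quiet",
                  "https://git.example.com/repo.git", "git-clone"],
}

# To run the comparison (needs svn, git, and real URLs):
#     for label, seconds in compare(COMMANDS).items():
#         print(f"{label}: {seconds:.1f}s")
```

Wall-clock time includes network latency and server load, so run each command a few times against servers on the same network before reading much into the numbers.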
Moral of this story: use git.