01 October 2011

Using git svn with a large repository

I've started using the git svn bridge for one of our projects, but I had a couple of problems with the initial clone of the repository: some of the files are large (over 100 MB), and the Subversion server kept dropping the connection.
So, I started with the standard git svn clone:
$ git svn clone https://svn.farwell.co.uk/svn/project --stdlayout
Initialized empty Git repository in c:/code/project/.git/
r1 = 339bd134b2d482cf9038c16fa75f93255ebfbc1a (refs/remotes/trunk)
W: +empty_dir: trunk/blah1
W: +empty_dir: trunk/blah2
W: +empty_dir: trunk/blah3
W: +empty_dir: trunk/blah4
The --stdlayout option means that git svn expects the trunk to be called trunk, the tags directory to be called tags and the branches directory to be called branches.
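For a repository that doesn't follow the standard layout, the same thing can be spelled out with the --trunk, --branches and --tags options of git svn clone. Here is a small sketch that prints the long form of the command above; the helper name explicit_layout_clone_cmd is made up for illustration, and it only prints the command rather than running it.

```shell
# Print the explicit-layout equivalent of "git svn clone <url> --stdlayout".
# explicit_layout_clone_cmd is a hypothetical helper, not part of git.
# The URL must be the project root, without a trailing /trunk.
explicit_layout_clone_cmd() {
    base_url=${1%/}    # drop one trailing slash, if present
    printf 'git svn clone %s --trunk=trunk --branches=branches --tags=tags\n' "$base_url"
}

explicit_layout_clone_cmd https://svn.farwell.co.uk/svn/project
```

If your directories have other names, change the three values to match.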
Note also that you need to specify the URL without trunk at the end. This ran for a while and then fell over, because the svn server has a timeout and dropped the connection on me:
RA layer request failed: REPORT request failed on '/svn/project/!svn/vcc/default':
REPORT of '/svn/project/!svn/vcc/default': Could not read chunk delimiter:
Secure connection truncated (https://svn.farwell.co.uk)
at C:\Program Files (x86)\Git/libexec/git-core/git-svn line 5114
We need to fetch in batches. git svn fetch has a -r option that lets you specify the range of revisions to fetch. We've got some large files, so we'll do 10 revisions at a time.
I started again:
$ git svn clone https://svn.farwell.co.uk/svn/project \
--stdlayout -r1:2
which fetched the first two revisions, but we still have to fetch the rest, about 1000 revisions. I used a quick Perl script:
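use strict;
use warnings;

my $count = 1;

while ($count <= 1000) {
    # executes git svn fetch -r1:11, then -r11:21, etc.
    my $cmd = "git svn fetch -r$count:" . ($count + 10);
    print "$cmd\n";
    # stop at the first failed batch so the script can be rerun from there
    system($cmd) == 0 or die "'$cmd' failed: $?\n";
    $count += 10;
}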
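The same batching idea can be sketched in shell, with two small additions: ranges that don't overlap, and a few retries per batch in case the server drops the connection again. The function names, the FETCH_CMD override, the retry count and the sleep interval are all my own choices for illustration. The example at the bottom is a dry run that only prints the commands.

```shell
# Print non-overlapping "first:last" revision ranges covering start..end.
revision_ranges() {
    start=$1 end=$2 step=$3
    while [ "$start" -le "$end" ]; do
        last=$((start + step - 1))
        if [ "$last" -gt "$end" ]; then last=$end; fi
        printf '%s:%s\n' "$start" "$last"
        start=$((last + 1))
    done
}

# Fetch one range, retrying up to 3 times if the command fails.
# FETCH_CMD is a hypothetical override used here for dry runs;
# when it is unset, the real "git svn fetch" is run.
fetch_range() {
    range=$1 attempt=1
    while ! ${FETCH_CMD:-git svn fetch} "-r$range"; do
        if [ "$attempt" -ge 3 ]; then return 1; fi
        attempt=$((attempt + 1))
        sleep 5
    done
}

# Dry run: show the commands for revisions 1..25 in batches of 10.
for range in $(revision_ranges 1 25 10); do
    FETCH_CMD="echo git svn fetch" fetch_range "$range" || break
done
```

Dropping the FETCH_CMD override and changing 25 to the real last revision would run the fetches for real, stopping at the first batch that still fails after its retries.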
But then we hit another problem: git ran out of memory and crashed, and this time it's more serious. It's another consequence of our big files. This is the error message:
Out of memory during "large" request for 268439552 bytes, total sbrk() is 140652544 bytes at /usr/lib/perl5/site_perl/Git.pm line 898,  line 3.
git svn uses Perl to download and process the files, but it slurps each file into memory in one go. So for our large files, it runs out of memory.

After a bit of searching on the internet, I found a fix for our problem on GitHub: Git.pm: Use stream-like writing in cat_blob().
This is a fairly simple patch, which doesn't seem to have made it into a release yet, so I applied it manually to C:\Program Files (x86)\Git\lib\perl5\site_perl\Git.pm.
@@ -896,22 +896,26 @@ sub cat_blob {
my $size = $1;
- my $blob;
my $bytesRead = 0;

while (1) {
+ my $blob;
my $bytesLeft = $size - $bytesRead;
last unless $bytesLeft;

my $bytesToRead = $bytesLeft < 1024 ? $bytesLeft : 1024;
- my $read = read($in, $blob, $bytesToRead, $bytesRead);
+ my $read = read($in, $blob, $bytesToRead);
unless (defined($read)) {
throw Error::Simple("in pipe went bad");
}

$bytesRead += $read;
+ unless (print $fh $blob) {
+ $self->_close_cat_blob();
+ throw Error::Simple("couldn't write to passed in filehandle");
+ }
}

# Skip past the trailing newline.

@@ -926,11 +930,6 @@ sub cat_blob {
throw Error::Simple("didn't find newline after blob");

- unless (print $fh $blob) {
- $self->_close_cat_blob();
- throw Error::Simple("couldn't write to passed in filehandle");
- }
return $size;
I restarted the process from the beginning and voilà, it got to the end. All of the revisions had been fetched, and all that was left to do was a
$ git svn rebase
to merge the changes into the tree and have a working git repo.

If I had wanted to migrate from svn to GitHub, rather than continue to use git svn, I'd have done exactly the same thing, but added --no-metadata to the clone command.
And obviously you don't need to do a git svn rebase, just a rebase.
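One extra step that often comes up in a one-way migration: git svn imports svn tags as remote branches, so you may want to turn them into real git tags before pushing. This is a rough sketch of that step, not something I ran for this project. It assumes the imported tags live under refs/remotes/tags/ (newer versions of git svn may use refs/remotes/origin/tags/ instead), and both function names are made up for illustration.

```shell
# Strip the git-svn remote prefix from a ref to get a plain tag name.
# Assumes the refs/remotes/tags/ layout; adjust the prefix if yours differs.
tag_name_from_ref() {
    printf '%s\n' "${1#refs/remotes/tags/}"
}

# Create a lightweight git tag for every imported svn tag.
# Must be run inside the cloned repository.
convert_svn_tags() {
    git for-each-ref --format='%(refname)' refs/remotes/tags |
    while read -r ref; do
        git tag "$(tag_name_from_ref "$ref")" "$ref"
    done
}

# Example of the name mapping only; this does not touch any repository.
tag_name_from_ref refs/remotes/tags/release-1.0
```

After running convert_svn_tags, adding the GitHub repository as a remote and pushing with --all and then --tags would publish the branches and tags.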


Anonymous said...

Great blog you have here, but I was wondering if you knew of any community forums that cover the same topics talked about in this article? I'd really love to be a part of a group where I can get feedback from other knowledgeable people who share the same interest. If you have any recommendations, please let me know. Appreciate it!

Matthew Farwell said...

There are a number of options: there is a git channel on IRC, irc://irc.freenode.net/git, and http://stackoverflow.com which is a great site for asking or answering questions. If you want to start somewhere, start at stackoverflow.com

Semen Vadishev said...

If you want to migrate to GitHub, consider using subgit, http://subgit.com

To use this tool you have to have local access to your svn repository. If so, just install subgit into this repository.

This will import all revisions into the specified git repository. From then on you can use pure git (not git-svn) to push changes into this newly created repository, and subgit will automatically synchronize the changes with svn.

In order to make it work with GitHub, you can publish your git repository and from time to time push changes into the git repository that is synchronized with svn.

Hope that helps.