Source repository mirroring primer

Lately I’ve been mirroring some repositories for no particular reason, or at least I haven’t used most of them yet. The repositories I mean here are the ones using one of the open source VCSes: CVS, Subversion, Git or Mercurial. I haven’t tried other open source VCSes (svk, darcs, bazaar, fossil, etc.) since none of the repositories I mirror use them.

Without further ado, here be the guides.

CVS

The classic, one of the oldest VCSes humans ever made. It’s highly difficult to synchronize consistently because of its per-file versioning, inherited from RCS. Thankfully it’s not too difficult to mirror: there are two methods, rsync and CVSup/CVSync.

The rsync method works for practically every type of repository; it just happens that CVS mirroring is limited to this method and one other. The other method isn’t widely available, though (more on that later). Using rsync is straightforward: you copy the entire repository, file by file. Make sure to use a few switches to make your life easier:

rsync -azhP <rsync.server>::<cvs/root> <destination>
  • -a: shortcut for -rlptgoD: recursive (-r), copy symlinks as symlinks (-l), preserve permissions, modification times, group and owner (-ptgo), and keep device and special files (-D, kind of useless since there shouldn’t be any). I believe you can replace it with -rlt, but -a is easier to remember.
  • -z: compression. Clear enough.
  • -h: summary and progress information will be shown in human-readable sizes (just like in df and co.)
  • -P: shortcut for --partial --progress; it gives you a nice progress bar and keeps partially transferred files, so an interrupted transfer resumes faster.

To update the mirror just run the same command again.
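Because updating is just re-running the same command, it’s easy to script for several mirrors at once. Here is a minimal sketch, assuming rsync is installed; mirror_rsync and update_all are hypothetical helper names, and the mirror list is read from standard input as one “source destination” pair per line.

```shell
#!/bin/sh
# Minimal sketch: hypothetical helpers, not part of rsync itself.

# Mirror or update one tree; running it again updates the mirror.
# $1: source, e.g. <rsync.server>::<cvs/root> (or a local path for testing)
# $2: destination directory
mirror_rsync() {
    mkdir -p "$2" || return 1
    # Same switches as above: archive mode, compression, human-readable
    # sizes, progress bar plus partial-transfer resume.
    # </dev/null keeps rsync (or an ssh underneath it) from swallowing
    # the rest of the mirror list being read by update_all.
    rsync -azhP "$1" "$2" </dev/null
}

# Read "source destination" pairs from stdin, one per line,
# skipping blank lines and lines starting with '#'.
# Returns 1 if any mirror failed, 0 otherwise.
update_all() {
    status=0
    while read -r src dest; do
        case $src in
            ''|'#'*) continue ;;
        esac
        if ! mirror_rsync "$src" "$dest"; then
            echo "mirror failed: $src" >&2
            status=1
        fi
    done
    return $status
}
```

For example, update_all < mirrors.txt, where mirrors.txt (a hypothetical file) holds lines like <rsync.server>::<cvs/root> <destination>.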

The other method, CVSup/CVSync, is also available, though as I hinted earlier, it’s usually limited to *BSD projects. You can read comprehensive guides here (CVSup, CVSync). It feels slower than rsync (which made me wonder why it was created), but YMMV.

Subversion

Successor to CVS. With a 100% less readable raw repository format. Much faster than CVS, though (what isn’t?).

Apart from the rsync method mentioned for CVS, there’s another way to mirror: svnsync. Note that it won’t create a 100% carbon copy of the repository (as in, switching from the main repository to an svnsync mirror is not quite trivial) because the mirror will have a different UUID.

There are four steps required for initial mirroring, with the last one also used to update the mirror:

  1. Create an empty repository with svnadmin create <destination>.
  2. Create an “empty” hook for revision property (revprop) changes. svnsync modifies revision properties while it works (starting with revision 0), so revprop changes must be allowed. Just create an executable file named pre-revprop-change in destination/hooks whose only line is #!/bin/sh.
  3. Initialize the synchronization properties: svnsync init file:///absolute/path/to/destination svn://sour.ce/repository
  4. Finally, update the mirror: svnsync sync file:///absolute/path/to/destination
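The four steps can be bundled into one script. A minimal sketch, assuming svnadmin and svnsync are installed and that the destination doesn’t exist yet; make_revprop_hook and svn_mirror_init are hypothetical helper names. It builds the absolute file:// URL from $PWD when given a relative path.

```shell
#!/bin/sh
# Minimal sketch of the four svnsync steps (hypothetical helpers).

# Step 2 on its own: an "empty" pre-revprop-change hook.
# A script with no commands exits with status 0, so every revprop
# change is allowed.
# $1: repository directory (must already contain hooks/)
make_revprop_hook() {
    hook="$1/hooks/pre-revprop-change"
    printf '#!/bin/sh\n' > "$hook" && chmod +x "$hook"
}

# $1: destination directory (created here)
# $2: source URL, e.g. svn://sour.ce/repository
svn_mirror_init() {
    dest=$1
    src_url=$2
    svnadmin create "$dest" || return 1          # step 1
    make_revprop_hook "$dest" || return 1        # step 2
    # svnsync needs a file:// URL with an absolute path.
    case $dest in
        /*) url="file://$dest" ;;
        *)  url="file://$PWD/$dest" ;;
    esac
    svnsync init "$url" "$src_url" || return 1   # step 3
    svnsync sync "$url"                          # step 4 (rerun to update)
}
```

For example, svn_mirror_init mirror svn://sour.ce/repository; later updates are just svnsync sync file://$PWD/mirror from the same directory.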

If the synchronization process is interrupted, it leaves an svn:sync-lock property on revision 0 of the mirror, which you must remove before synchronizing again. This is done with:

svn pdel --revprop -r 0 svn:sync-lock file:///absolute/path/to/destination

Tip: instead of typing file:///absolute/path/to/destination for every operation, you can cd to the repository root and use the $PWD environment variable (file://$PWD).
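Clearing a stale lock and resynchronizing can be combined into one update step. A minimal sketch, assuming an svn client and svnsync are installed; svn_mirror_update is a hypothetical helper name. It removes any svn:sync-lock it finds, so only run it when no other svnsync process is working on the same mirror.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper): clear a stale sync lock, then sync.
# $1: mirror repository path (absolute, or relative to $PWD)
svn_mirror_update() {
    case $1 in
        /*) url="file://$1" ;;
        *)  url="file://$PWD/$1" ;;
    esac
    # Prints nothing when the property is absent; errors are hidden.
    lock=$(svn propget --revprop -r 0 svn:sync-lock "$url" 2>/dev/null)
    if [ -n "$lock" ]; then
        echo "removing stale lock: $lock" >&2
        # Same as the svn pdel command above (pdel is short for propdel).
        svn propdel --revprop -r 0 svn:sync-lock "$url" || return 1
    fi
    svnsync sync "$url"
}
```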

Git

People’s favorite (D)VCS. It does practically everything at the fastest possible speed, that is, if you use its own git:// protocol for remote operations. For some reason I find it rather slow for remote operations over the http:// protocol. It has over 9000 options and commands, which I find highly confusing.

Initialization can’t be simpler:

git clone git://remoterepo.project.com/project.git

Add a destination path argument if you want the mirror in another directory (the default is $PWD/project).

Updating is also simple:

cd /path/to/project && git fetch -t

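You can also use git pull -t, though fetch is enough if you don’t intend to use the mirrored repository directly: it only updates the repository data and leaves the working tree alone. And as I said before, Git has over 9000 options and commands, but for simple mirroring you only need these two. I probably missed something, though (I only realized while writing this post that -t is required). On Git older than 1.9, -t fetched only the tags and left the branches untouched, so run a plain git fetch as well there; newer versions fetch all tags on top of the usual branches.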
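The clone-or-fetch routine fits in a small wrapper. A minimal sketch, assuming git is installed; git_mirror is a hypothetical helper name, and it decides between cloning and fetching only by checking whether the destination already has a .git directory.

```shell
#!/bin/sh
# Minimal sketch; git_mirror is a hypothetical helper, not a git command.
# $1: remote URL, e.g. git://remoterepo.project.com/project.git
# $2: destination directory
git_mirror() {
    if [ -d "$2/.git" ]; then
        # Existing mirror: update remote-tracking branches and
        # fetch all tags (Git 1.9 or newer).
        (cd "$2" && git fetch -t)
    else
        # First run: plain clone.
        git clone "$1" "$2"
    fi
}
```

If you never work in the mirror, git clone --mirror (a bare clone that copies every ref) updated with git remote update is another common option, though it isn’t the approach this post uses.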

Mercurial

A simpler Git. Also less flexible, but easier on my brain. The initial mirror is made the same way as with Git (hg clone), but updating uses the pull command. In fact, you can run hg init project, cd to the project directory and then run hg pull http://remote.repo.com/hg/project. If you go this route, also create a file named hgrc inside the .hg directory (in the project root directory) to define the default pull path:

[paths]
default = http://remote.repo.com/hg/project

And then you can pull away with a plain hg pull. Much easier than Git, I must say. Also read this page if you’re coming from or moving to Git. I should’ve read that page before messing with Git.

That’s it! Hopefully this post will be useful for someone (besides myself, especially for anything other than Mercurial).
