[linux-elitists] Fun with Git repository copying

Øyvind A. Holm sunny at sunbase.org
Sat Apr 13 20:21:31 PDT 2013


On 14 April 2013 05:04, Greg KH <greg at kroah.com> wrote:
> On Sat, Apr 13, 2013 at 10:26:11AM -0700, Don Marti wrote:
> > begin Greg KH quotation of Sat, Apr 13, 2013 at 08:07:32AM -0700:
> >
> > > Again, don't just use rsync or cp on a live git repo, you wouldn't
> > > do that on a database, would you?
> >
> > No, for a database I'd shut down the server, then copy, then start
> > up the server again (unless it was critical to minimize downtime, in
> > which case I'd put the database files on a separate filesystem and
> > do a snapshot, then copy that.)
> >
> > For git, though, there's no server process to shut down (unless I
> > want to bring down sshd).  What's the best way to make git not
> > modify a repository while I'm copying it or backing it up?
>
> You already said it, stop all processes that could access the repo
> (i.e. sshd), back it up / snapshot it, and start it up.  Just like any
> other database.

Why use rsync at all? We already have git fetch. Create a bare
repository on another machine, set up a remote in that bare repo that
points to the source repo (the one that should be backed up) and run
"git fetch --all --prune". In addition to that, you could recreate all
the branches locally (in the backup repo) using something like this
script:

  https://github.com/sunny256/utils/blob/master/git-allbr
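A minimal sketch of the fetch-based backup described above, written as a POSIX shell function that could be run from cron on the backup machine. The function name "git_backup" and the remote name "source" are hypothetical choices, not taken from the git-allbr script. The source can be an ssh:// URL or a local path. Fetched branches land under refs/remotes/source/ in the backup; turning them into local branches (what git-allbr is for) is not shown here.

```shell
#!/bin/sh
# Minimal sketch of a fetch-based backup (not the author's script).
# Usage: git_backup SOURCE_URL BACKUP_DIR
# SOURCE_URL may be an ssh:// URL or a local path; the remote name
# "source" is an arbitrary, hypothetical choice.
git_backup() {
    src=$1
    backup=$2

    # One-time setup: an empty bare repo with a remote pointing at the source.
    if [ ! -d "$backup" ]; then
        git init --quiet --bare "$backup" || return 1
        git --git-dir="$backup" remote add source "$src" || return 1
    fi

    # Every run: fetch all remotes and prune refs deleted upstream.
    # Branches end up under refs/remotes/source/ in the backup.
    git --git-dir="$backup" fetch --quiet --all --prune
}
```

Because of "--prune", a branch deleted in the source repo also disappears from refs/remotes/source/ in the backup on the next run, so the backup tracks the source rather than accumulating every branch it has ever seen.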

Are there any advantages to using rsync instead of just fetching a
backup? With this method, silent corruption of the main repo (like the
KDE incident some weeks ago) will be caught if fetch.fsckObjects is set
to true. There's no need to shut anything down, either. Also, if the
source repo is repacked with git-gc, no additional bandwidth is used:
fetch only transfers objects the backup doesn't already have, whereas
rsync would copy the newly written pack files all over again.
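The corruption check mentioned above depends on fetch.fsckObjects being enabled in the backup repository. The sketch below (the helper name "harden_backup" is hypothetical) turns that setting on so later fetches check every object they receive, and also runs "git fsck" once over what the backup already holds.

```shell
#!/bin/sh
# Minimal sketch: turn on object checking for an existing backup repo.
# Usage: harden_backup BACKUP_DIR   (helper name is hypothetical)
harden_backup() {
    backup=$1

    # Make every later "git fetch" into this repo check each object it
    # receives, so a corrupt source repo makes the fetch fail instead of
    # silently propagating into the backup.
    git --git-dir="$backup" config fetch.fsckObjects true || return 1

    # Also verify the objects the backup already holds.
    git --git-dir="$backup" fsck --full --no-progress
}
```

Setting transfer.fsckObjects instead would cover both fetch and receive; fetch.fsckObjects is the narrower setting named in the mail.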

Regards,
Øyvind

