[linux-elitists] rsync hacks (fwd from kragen@pobox.com)

Eugen Leitl eugen@leitl.org
Thu Jul 15 00:54:29 PDT 2004


Figured this could be useful for somebody here.

----- Forwarded message from Kragen Sitaker <kragen@pobox.com> -----

From: Kragen Sitaker <kragen@pobox.com>
Date: Wed, 14 Jul 2004 11:37:06 -0400 (EDT)
To: kragen-hacks@canonical.org
Subject: rsync hacks

rsync is a remarkably versatile program.  Here are some of the recent
uses I have for it.

Downloading mail
================

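I download my mail with rsync these days.  Here's my mail-downloading script:

#!/bin/sh
set -e
# On the server: save the last 150MB of the mailbox as tmp.mboxtail.
# (The \> makes the redirection happen on the remote side.)
ssh kragen@somehost.example.com tail --bytes=150000000 /var/mail/kragen \> tmp.mboxtail
# Fetch only the differences; -P keeps partial files and shows progress.
rsync -e ssh -Pavzz kragen@somehost.example.com:tmp.mboxtail /home/kragen/sdc1/mail/tmp.mboxtail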

This first updates a tmp.mboxtail file in my home directory on
somehost.example.com to contain the last 150 megabytes of my email,
then downloads the differences between that and my current
locally-cached tmp.mboxtail.  If the transfer is interrupted, the -P
leaves me with whatever I've downloaded so far --- I don't lose
anything.  This is important when I'm downloading over my cell phone.
(Rsync makes it possible to keep that 150 megabyte file up-to-date
over a ten-kilobit-per-second, one-second-latency, unreliable
cell-phone connection.)

There is a certain amount of overhead in using rsync for this.  If I
were always using rsync over my cell phone, I'd probably break the
mailbox up into 50MB chunks.
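
A minimal sketch of what such chunking might look like.  The chunk
size, the file names, and the fake mailbox here are stand-ins made up
for illustration, not part of the setup described above.  It splits
from the start of the file, so that mail appended later only touches
the last chunk or two:

```shell
#!/bin/sh
# Sketch: split an append-only mailbox into fixed-size chunks so that
# new mail only touches the last chunk(s).  All names are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"
chunk_bytes=4096        # stand-in for the 50MB mentioned above

# Build a fake mailbox to play with.
i=0
while [ "$i" -lt 2000 ]; do
    echo "From sender$i: message body line $i"
    i=$((i + 1))
done > mbox

mkdir chunks.before chunks.after
split -b "$chunk_bytes" mbox chunks.before/part.

# New mail arrives: the mailbox only grows at the end.
echo "From newsender: one more message" >> mbox
split -b "$chunk_bytes" mbox chunks.after/part.

# Count chunks that are new or differ from the previous split.
changed=0
for f in chunks.after/part.*; do
    if ! cmp -s "$f" "chunks.before/${f##*/}"; then
        changed=$((changed + 1))
    fi
done
echo "chunks changed by the append: $changed"

# The chunks concatenate back into the original mailbox.
cat chunks.after/part.* > rebuilt
cmp mbox rebuilt && echo "rebuilt mailbox matches"

# On a real server one would then sync the chunk directory, e.g.
# (hypothetical remote path):
#   rsync -e ssh -Pavz kragen@somehost.example.com:chunks/ chunks/
```

With a layout like this, rsync can skip every chunk whose size and
timestamp have not changed, and only has to checksum the last one.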

I think this ends up being significantly faster than POP or IMAP when
I have a lot of messages and high latency.  I always have a lot of
messages.

Putting together pieces
=======================

One of my machines on a DSL line has a mirror of my whole mailbox
(about 771 megabytes at the moment).  Normally I just use rsync to
update the mirror from the master on somehost.example.com, but that
involves downloading whatever new mail has arrived.  In this case, it
was about 50 megabytes, which would take about seven minutes to
download over DSL.  I realized that since I was physically near that
machine with my laptop, I could copy my 150-megabyte mailbox-tail file
onto it, and rsync could use that to avoid copying a lot of data over
the link.

I concatenated the 150-megabyte file (with the 150 megabytes of latest
mail) to the other file (with 700+ megabytes of mail) and got a file
that had all my mail in it, just garbled.  Then I used rsync to clean
it up: 

rsync -e ssh -Pavzz kragen@somehost.example.com:/var/mail/kragen currentmbox.somehost

rsync merely used the remote file as a pattern by which to assemble
the pieces of the currentmbox.somehost file into the right order.
Rsync in this case took a little under five minutes, most of which was
spent reading the mailbox file on the remote host to see what was in
it.  Instead of 50MB of traffic, it only used about 200KB each way ---
250 times less.  

Sadly, in this case, it didn't actually save me any time because I
spent five minutes copying the 150MB file over 802.11b onto the
machine with the mirror.  If I'd been using 100BaseT it would have
been worthwhile.

Copy-on-write backups
=====================

This is a well-known trick.  It allows you to keep a large number of
snapshots of your filesystem without using a lot of space, as long as
you don't have big files that change frequently.

I use a slightly modified form of the following script to back up my web site:

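#!/bin/sh
# Abort if the backup directory is missing, so the rm below can't run elsewhere.
cd /backups/murch-sitaker-bkup || exit 1
i=14
# Drop the oldest snapshot, then shift each remaining one up by one.
rm -rf previous.$i
while [ $i -gt 0 ] ; do
    prev=$(($i - 1))
    # Skip snapshots that don't exist yet (the first two weeks of runs).
    [ -d previous.$prev ] && mv previous.$prev previous.$i
    i=$prev
done
# Hard-link snapshot of last night's state, then bring "current" up to date.
cp -al current previous.0
rsync -av --delete /chroot/apache/var/www/murch-sitaker current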

I run it from a nightly cron job with a crontab line like this:
  35     3    *    *     *     sh /backups/murch-sitaker-bkup/bkupscript

This keeps 16 nightly snapshots, all sharing space: "current" plus
"previous.0" through "previous.14".  Every night at 3:35 AM,
the script removes previous.14, renames previous.13 to previous.14,
previous.12 to previous.13, etc.  Then it makes a copy of "current"
called "previous.0".  

The "-al" flags to "cp" cause it to run in "archive" mode --- copying
file permissions, copying symbolic links as links rather than
following them, and copying entire directories full of files --- and
to make "links", meaning hard links, instead of copying file data.
The result is that "previous.0"'s directory tree is a copy of
"current", but all the files in it are actually the same files as the
ones in "current".  If you were to edit a file in "current" in place,
the change would be reflected in the corresponding file in
"previous.0".  The two directory trees just point at the same files on
disk.

After this happens again the next night, those same files are pointed
to by "current", "previous.0", and "previous.1".

Then rsync changes "current" to look like the web-site directory being
backed up.  But it never edits files; if a file has changed, it
creates a new version of the file, then renames it over the top of the
old file.  So any files that have changed are no longer shared between
"current" and "previous.0"; each tree has its own version.  Eventually
previous.0 will get renamed all the way to previous.14, and then when
it is deleted, Linux will reclaim the space of any files that no newer
snapshot still links to.
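
The sharing and un-sharing can be seen in a scratch directory.  This
is a sketch with made-up file names; it imitates rsync's default
write-then-rename update with a plain "mv" rather than running rsync,
and it needs a cp that supports -l (GNU cp does).

```shell
#!/bin/sh
# Sketch: why "cp -al" plus replace-by-rename gives cheap snapshots.
# Directory and file names are made up; needs GNU cp for -l.
set -e
work=$(mktemp -d)
cd "$work"

mkdir current
echo "page, version 1" > current/index.html
echo "never changes"   > current/about.html

# Snapshot: a new directory tree pointing at the same underlying files.
cp -al current previous.0

# Update index.html the way rsync does by default: write a new file
# beside it, then rename it over the old name.
echo "page, version 2" > current/index.html.tmp
mv current/index.html.tmp current/index.html

cat previous.0/index.html      # the snapshot keeps the old contents
cat current/index.html

# -ef is true when two names refer to the same file (same inode).
[ current/about.html -ef previous.0/about.html ] && echo "about.html: shared"
[ current/index.html -ef previous.0/index.html ] || echo "index.html: separate copies"
```

Had index.html been overwritten in place (for example with
"echo ... > current/index.html"), the snapshot would have changed too.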

This whole process takes only about 7 seconds if nothing has changed.
The disk usage is just the sum of all the versions of files that have
existed during the last 15 days.

I use a similar technique for backing up my laptop over the network to
another machine.


Low-downtime server moves
=========================

I recently moved the above-mentioned web-server directory from one
filesystem to another.  I wanted to keep all the web server log files,
state files used by PHP, and suchlike in a valid state during the
move; I didn't want to lose log-file entries, or get half-written
ImageGallery state files.

The easy way to do this is as follows:
1. Shut down the web server.
2. Copy the directory tree to the new location.  (It's a gigabyte, so
this takes about five minutes.)
3. Start up the web server in the new location.

The trouble with this is that the web server is down for five minutes.
Or possibly more, if other things go wrong.  So here's an alternative:

1. Copy the directory tree to the new location.
2. rsync the directory tree to the new location.  Very few changes
should get copied across --- only those that happened during the
initial copy.  So this should take much less time.
3. Shut down the web server.
4. rsync the directory tree to the new location, again.
5. Start up the web server in the new location.

With this technique, my downtime was less than a minute.  The command
line that performed steps 3, 4, and 5 looked like this:

  sudo /etc/init.d/chrooted-httpd stop; \
  time sudo rsync -a /chroot/apache/ /newlocation/apache; \
  sudo mv /chroot/apache /chroot/old-apache; \
  sudo ln -s /newlocation/apache /chroot/apache; \
  sudo /etc/init.d/chrooted-httpd start

Note the trailing / on /chroot/apache/ in the rsync command.  This is
important.  Without this, it will put the files in
/newlocation/apache/apache.

In some cases it may be worthwhile to repeat step 2 several times to
catch up with the server, if it's writing a lot of files, but it
wasn't in my case.

Screwing up Macs
================

After an OS upgrade, Beatrice's Mac wouldn't run any GUI applications.
There's some key you can hold down to get it to boot into single-user
mode, and so I modified the multi-user script to not start the window
server, then ran it.  Then I used rsync to back up the Mac over the
network, over ssh, to another nearby machine.  This took nearly an
hour.

Then, after a fresh install, we used rsync to copy the files back.
Unfortunately, rsync doesn't copy resource forks or some MacOS file
attributes.  So all the applications, plus all the Quicken data, were
completely hosed.  If we'd had another Mac to do a test restore onto
before doing the fresh install, maybe we would have been OK.

Apparently there's a program called "psync" that avoids this kind of
problem.

----- End forwarded message -----
-- 
Eugen* Leitl <a href="http://leitl.org">leitl</a>
______________________________________________________________
ICBM: 48.07078, 11.61144            http://www.leitl.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A  7779 75B0 2443 8B29 F6BE
http://moleculardevices.org         http://nanomachines.net