Automatic backups with rsync

As I've explained before, one of the first things I did upon installation of Arch was to set up a backup solution. This was done simply using a little bash script, which itself would launch rsync to do the actual backup.

But after using this system for a little while, I decided that it needed a rewrite, mainly for two reasons:

  • A "bug" in rsync
  • A flaw in the script/system

A "bug" in rsync

I put "bug" in quotes because, while some might classify this as a bug, it isn't really one: the current behavior of rsync matches what the manual says. It's just easy to see how a different behavior would make a lot of sense.

It all comes down to the famous --link-dest option, which I use to keep multiple backups of my system: identical files only need to be copied/stored once on the backup disk, and are then hard-linked as many times as needed.

This works great, however here's what the manual says:

This option works best when copying into an empty destination hierarchy, as rsync treats existing files as definitive (so it never looks in the link-dest dirs when a destination file already exists)

So if a file already exists in the destination, rsync will not look into the link-dest dirs. In my case, this meant that when updating the backup week, a file would often already exist there (from the previous week) but have changed since then, and as a result rsync would copy the new file.

Ideally, it would have looked into the link-dest dir before, thus realizing that there's already an up-to-date version of the file, and therefore simply creating a hard-link is enough. It would make the whole process go faster, and save space of course. Alas, that's not how rsync works, and this would be repeated for the month backup as well.
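To make this concrete, here is a minimal sketch of the kind of invocation where --link-dest behaves well: the destination is a directory that does not exist yet, so there are no existing files for rsync to treat as definitive. The function name and the paths (/home, /mnt/backup, latest) are placeholders made up for this example, and the command is only printed, not run.

```sh
# link_dest_cmd SRC ROOT NAME
# Print an rsync command that copies SRC into the fresh directory ROOT/NAME,
# hard-linking any file that is unchanged compared to ROOT/latest.
# All paths are hypothetical; adjust them to your own layout.
link_dest_cmd() {
    src=$1
    root=$2
    name=$3
    printf 'rsync --archive --link-dest=%s %s/ %s/\n' \
        "$root/latest" "$src" "$root/$name"
}

link_dest_cmd /home /mnt/backup 2011-09-23
```

Run against a destination that already holds last week's files, the same command would fall into the behavior quoted from the manual above.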

This issue has been reported a few times, including here where someone even proposed a patch. Unfortunately, while the patch seemed to work as expected on a small test, when I tried to run it for a "real" run rsync just crashed.

Two solutions at that point:

  1. I try to fix it myself; but I don't think I would be able to do that.
  2. Change the way I use rsync to bypass this "bug"

A flaw in the script/system

I went with the latter, because of something else: a limitation that I knew about from the start and was originally okay with, but had since changed my mind about. My script would run each day (via cron) to make a new backup day.

In addition, every week it would make a backup week and every month a backup month, the idea being that at any given moment I have three backups: beginning of the day, the week, and the month.

But the way it was done would result in two backups being pretty much the same, and every once in a while all three of them would actually be identical. So, I figured I might as well improve this, and also work it so that I use rsync in a way that will bypass the "bug" described earlier.

The new & improved backup script

So I made a new version of the script, which now works as follows:

  • launch rsync to make a backup (in a new folder named after the current date, e.g. 2011-09-23). It still uses the --link-dest option (pointing to the latest backup), but since the destination is always a new folder, the problem no longer occurs.
  • then it creates/updates a symlink latest so that it points to the newly created backup. This symlink is what's used in the --link-dest option.
  • the backup from the day before is removed, unless:
    • it is the 2nd day of the month, in which case last month's backup is removed instead
    • it is a Tuesday, in which case:
      • if it is also the 2nd day of the month, nothing else is removed
      • if it is also the 9th day of the month, the backup from 2 weeks ago is also removed
      • otherwise, last week's backup is also removed
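The removal rules above can be sketched as a small function. This is one interpretation of the list, not an excerpt from the script: it assumes that on a Tuesday the backup from the day before is kept (as the new backup of the week), and the labels it prints (yesterday, last-week, and so on) are hypothetical names for the backups to delete.

```sh
# to_remove DAY_OF_MONTH DAY_OF_WEEK
# DAY_OF_WEEK is 1 (Monday) to 7 (Sunday), as printed by `date +%u`.
# Prints one label per backup that should be deleted today.
to_remove() {
    dom=${1#0}      # strip a leading zero, so "09" from `date +%d` works
    dow=$2
    if [ "$dom" -eq 2 ]; then
        # Yesterday's backup becomes the new backup of the month.
        echo last-month
    elif [ "$dow" -ne 2 ]; then
        # Ordinary day: yesterday's backup is no longer needed.
        echo yesterday
    fi
    if [ "$dow" -eq 2 ]; then
        # Tuesday: yesterday's backup becomes the new backup of the week.
        case $dom in
            2) ;;                      # keep last week's backup for now
            9) echo two-weeks-ago ;;   # the one that was spared on the 2nd
            *) echo last-week ;;
        esac
    fi
}
```

Calling it as to_remove "$(date +%d)" "$(date +%u)" gives the list for the current day.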

The result is pretty much the same as before, except that now I never have two (or more) identical backups. There's even handling for the case where a new day, week and month all begin at the same time: on the 2nd I'll then have my daily backup, the backup from the day before (as the new backup of the month), and the backup from the previous week, which is kept until it gets removed on the 9th.

I also used this occasion to move some things into a configuration file (and/or command-line options), because I realized the first version had a few too many things hard-coded (for instance, I hadn't even realized it couldn't be used to back up anything other than /!).

Now the script relies on a configuration file, so you can define as many backup schemes as you want, then simply specify which config file to use from the command line (using -c or --config). Of course, you can also define all options on the command line, should you want to.

If you define the same option both in a config file and on the command line, the latter takes precedence.
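As an illustration of that precedence rule, here is a tiny sketch. The helper name pick is made up for this example, and it assumes that an empty command-line value means "not given", which may not match how the script tells the two apart.

```sh
# pick CLI_VALUE CONFIG_VALUE
# Print the command-line value if one was given, else the config value.
pick() {
    printf '%s\n' "${1:-$2}"
}

pick /home /            # both set: prints /home
pick '' /mnt/backup     # nothing on the command line: prints /mnt/backup
```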

Configuration

The configuration file is a simple text file, where you can use comments (start the line with #). Values should not be put between quotes but directly specified after the equal sign.
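For illustration, a file in that format could be read with something like the sketch below. The function name config_get and the sample values are assumptions for this example, not the script's actual parsing code. It also assumes there is no whitespace around the equal sign.

```sh
# A hypothetical config file in the format described above:
#
#   # weekly backup of the whole system
#   source=/
#   dest-root=/mnt/backup
#   date-format=%Y-%m-%d
#
# config_get FILE KEY
# Print the value of KEY from FILE; return 1 if KEY is not set there.
config_get() {
    file=$1
    wanted=$2
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            ''|'#'*) continue ;;    # blank line or comment
            "$wanted"=*)
                # Everything after the first equal sign, taken literally.
                printf '%s\n' "${line#*=}"
                return 0 ;;
        esac
    done < "$file"
    return 1
}
```

Since values are taken literally after the equal sign, quotes would end up as part of the value, which is why they should be left out.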

Backup folders are created in a destination root, set by option dest-root. Alongside the actual backups, a symlink will be automatically created/updated after each backup, pointing to the latest backup. Its name can be set using link-dest.
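Creating or updating such a symlink can be sketched as follows. The function name update_latest, the default link name latest and the choice of a relative link target are assumptions for this example. The -n flag is not in POSIX, but both GNU and BSD ln support it.

```sh
# update_latest ROOT NAME [LINK]
# Make ROOT/LINK (default: latest) point at the backup folder ROOT/NAME.
update_latest() {
    root=$1
    name=$2
    link=${3:-latest}
    # -f replaces an existing link; -n stops ln from following the old
    # link and creating the new one inside the previous backup folder.
    ln -sfn "$name" "$root/$link"
}
```

For example, update_latest /mnt/backup 2011-09-23 would leave /mnt/backup/latest pointing at 2011-09-23.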

For the whole process to work, backups should be named after the date they were run on. You can customize their names using option date-format, defaulting to %Y-%m-%d (see man date for more about the format supported). You can also run the script while specifying the name to use this time (instead of using the date format), using option name.
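How the folder name might be derived from these two options, as a rough sketch (the function name backup_name is hypothetical, and the script may handle this differently):

```sh
# backup_name NAME [FORMAT]
# Print NAME if it is non-empty; otherwise print today's date
# rendered with FORMAT (default: %Y-%m-%d), as `date` would.
backup_name() {
    name=$1
    format=${2:-%Y-%m-%d}
    if [ -n "$name" ]; then
        printf '%s\n' "$name"
    else
        date "+$format"
    fi
}
```

So backup_name '' prints something like 2011-09-23, while backup_name before-upgrade prints before-upgrade regardless of the date.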

The backup source is simply set using option source and as before you can define exclusions through option exclude-from (will be sent to rsync's option of the same name).

Speaking of which, you can also define the arguments for rsync through option args. Make sure not to use --verbose, --exclude-from or --link-dest, as they are added automatically if needed. If not set, it defaults to --archive --acls --xattrs --human-readable -h --stats
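If you want to guard against passing one of those three options by accident, a check along these lines could be run on the arguments first. This is only a sketch: the function name check_args is made up, and nothing here implies that the script performs such a check itself.

```sh
# check_args ARG...
# Return 1 (with a message on stderr) if any argument is one of the
# options that get added automatically; return 0 otherwise.
check_args() {
    for arg in "$@"; do
        case $arg in
            --verbose|--exclude-from|--exclude-from=*|--link-dest|--link-dest=*)
                echo "do not pass $arg: it is added automatically" >&2
                return 1 ;;
        esac
    done
}
```

The default argument list passes this check, while something like check_args --archive --link-dest=/some/dir fails.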

A sample configuration file is included, and you can use -h or --help to get command-line help.

Download

All files are available on this BitBucket repository. For fellow Arch users, there's a package in the AUR.

Additionally, you can also download the latest version from this link.

It's all released under GPLv3, and of course bug reports, suggestions or any other form of constructive criticism are very much welcome.
