Automatic backups with rsync
As I've explained before, one of the first things I did upon installation of Arch was to set up a backup solution. This was done simply using a little bash script, which itself would launch rsync to do the actual backup.
But after using this system for a little while, I decided that it needed a rewrite, mainly for two reasons:
- A "bug" in rsync
- A flaw in the script/system
A "bug" in rsync
I put "bug" in quotes because, while some could classify this as a bug, it isn't really one: the current behavior of rsync actually matches what the manual says. It's just that one could easily see how a change in that behavior would make a lot of sense.
It all comes down to the famous --link-dest option, which I use to keep multiple backups of my system where all identical files only need to be copied/stored once on the backup disk, and are then hard-linked as many times as needed.
This works great, however here's what the manual says:
This option works best when copying into an empty destination hierarchy, as rsync treats existing files as definitive (so it never looks in the link-dest dirs when a destination file already exists)
So if a file already exists in the destination, rsync will not look into the link-dest dirs. Which means that, in my case, when updating the week backup, it would happen that a file did exist (from the previous week) but had changed since then, and as a result rsync would copy the new file.
Ideally, it would have looked into the link-dest dir first, realized that there's already an up-to-date version of the file there, and simply created a hard link. That would make the whole process go faster, and save space of course. Alas, that's not how rsync works, and the same thing would be repeated for the month backup as well.
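A quick way to see whether a given file ended up hard-linked or copied again is to compare the two paths directly. This is only a minimal sketch: it assumes a shell whose test supports the -ef operator (bash, dash and busybox do, although it isn't part of POSIX), and the function name and the paths in the example are made up for illustration.

```shell
# Succeeds when both paths refer to the same file on disk (same device
# and inode), which is the case when one is a hard link to the other.
# The function name is hypothetical; -ef is a common extension to test.
is_hardlinked() {
    [ "$1" -ef "$2" ]
}

# Example (hypothetical paths):
#   if is_hardlinked /mnt/backup/day/etc/fstab /mnt/backup/week/etc/fstab; then
#       echo "stored once"
#   else
#       echo "stored twice"
#   fi
```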
This issue has been reported a few times, including here where someone even proposed a patch. Unfortunately, while the patch seemed to work as expected on a small test, when I tried it on a "real" run, rsync just crashed.
Two solutions at that point:
- I try to fix it myself; but I don't think I would be able to do that.
- Change the way I use rsync to bypass this "bug"
A flaw in the script/system
And I went with the latter, because of something else that I knew about from the start and was originally okay with, but eventually changed my mind about. My script would run each day (cron) to make a new day backup. In addition, every week it would make a week backup, and every month a month backup, the idea being that at any given moment I have three backups: beginning of the day, the week, and the month.
But the way it was done would result in two backups being pretty much the same, and every once in a while all three of them would actually be identical. So, I figured I might as well improve this, and also work it so that I use rsync in a way that will bypass the "bug" described earlier.
The new & improved backup script
So I made a new version of the script, which now works like the following:
- launch rsync to make a backup (in a new folder named after the current date, e.g. 2011-09-23). It still uses the --link-dest option (pointing to the latest backup), but since the destination is always a new folder, no more problem.
- then it creates/updates a symlink latest so that it points to the newly created backup. This symlink is what's used in the --link-dest option.
- the backup from the day before is removed, unless:
  - we are the 2nd day of the month, then last month's backup is removed instead
  - we are Tuesday, then:
    - if we are also the 2nd day of the month, nothing else is removed
    - if we are also the 9th day of the month, the backup from 2 weeks ago is also removed
    - else, last week's backup is also removed
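To give an idea, the steps above could be sketched like this in plain shell. This is not the actual script, just an illustration under a few assumptions: the function names, the RSYNC override and the symbolic labels printed by removal_targets are all made up, the destination root is assumed to be an absolute path, and the removal rules follow one reading of the list, in which the backup from the day before is kept whenever it becomes the new backup of the week or of the month.

```shell
# Steps 1 and 2: back up SRC into ROOT/NAME, hard-linking unchanged files
# against the previous backup, then repoint the "latest" symlink.
# ROOT is assumed to be absolute; RSYNC can be overridden (e.g. for testing).
make_backup() {
    src=$1 root=$2 name=$3
    if [ -e "$root/latest" ]; then
        "${RSYNC:-rsync}" --archive --link-dest="$root/latest" \
            "$src" "$root/$name" || return 1
    else
        # Very first run: there is nothing to link against yet.
        "${RSYNC:-rsync}" --archive "$src" "$root/$name" || return 1
    fi
    # -n replaces the symlink itself instead of descending into its target.
    ln -sfn "$name" "$root/latest"
}

# Step 3: print which old backups to remove, one label per line.
# $1 = day of month (date +%d), $2 = ISO day of week (date +%u, 2 = Tuesday).
removal_targets() {
    dom=${1#0}   # drop a leading zero: "09" becomes "9"
    dow=$2
    if [ "$dom" -eq 2 ]; then
        # Yesterday's backup becomes the new backup of the month.
        echo last-month
    elif [ "$dow" -ne 2 ]; then
        echo yesterday
    fi
    if [ "$dow" -eq 2 ]; then
        # Yesterday's backup becomes the new backup of the week.
        case $dom in
            2) ;;                      # previous week's is kept until the 9th
            9) echo two-weeks-ago ;;
            *) echo last-week ;;
        esac
    fi
}
```

A daily run would then call something like make_backup / /mnt/backup "$(date +%Y-%m-%d)" (hypothetical paths), followed by removal_targets "$(date +%d)" "$(date +%u)" to decide what to delete. Turning the labels into actual folder names is left out on purpose, since it depends on how the backups are named.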
The result is pretty much the same as before, except that now I never have 2 (or more) identical backups. There's even handling of the case where a new day, week and month all begin at the same time. In that case, on the 2nd I'll have my daily backup, the backup from the day before (as the new backup of the month), and the backup from the previous week, which is kept until it gets removed on the 9th.
I also used this occasion to move some things out into a configuration file (and/or command-line options), because I realized the first version had a few too many things hard-coded (for instance, I hadn't even realized it couldn't be used to back up anything other than /!).
Now the script relies on a configuration file, so you can define as many backup schemes as you want, then simply specify which config file to use from the command line (using -c or --config). Of course you can also simply define all options from the command line, should you want to.
In case you define the same option both in a config file and on the command line, the latter takes precedence.
Configuration
The configuration file is a simple text file, where you can use comments (start the line with #). Values should not be put between quotes but specified directly after the equal sign.
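A file in this format is easy to read from a shell script. What follows is only a sketch of how such a lookup could be done, not the script's actual code: the function name is made up, blank lines are skipped as well (which the description above doesn't explicitly promise), and if an option appears more than once the last one wins.

```shell
# Print the value of option KEY from config FILE (empty if not set).
# Comment lines start with '#'; values follow the '=' sign unquoted.
# Hypothetical helper, not taken from the actual script.
config_get() {
    key=$1 file=$2 value=
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            '#'*|'') continue ;;           # skip comments and blank lines
            "$key="*) value=${line#*=} ;;  # keep everything after the first '='
        esac
    done < "$file"
    printf '%s\n' "$value"
}
```

Letting the command line win over the config file then comes down to a fallback such as dest_root=${cli_dest_root:-$(config_get dest-root "$config")}, where cli_dest_root and config are hypothetical variables holding the command-line value and the config file path.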
Backup folders are created in a destination root, set by option dest-root. Alongside the actual backups, a symlink will be automatically created/updated after each backup, pointing to the latest backup. Its name can be set using link-dest.
For the whole process to work, backups should be named after the date they were run on. You can customize their names using option date-format, defaulting to %Y-%m-%d (see man date for more about the supported format). You can also run the script while specifying the name to use this time (instead of using the date format), using option name.
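The naming logic described here can be pictured with a few lines of shell. Again, this is just a sketch with made-up function and parameter names, not how the script itself necessarily does it; the only things taken from above are the default format and the fact that an explicit name replaces the date.

```shell
# Pick the name of the new backup folder: an explicit name wins,
# otherwise the current date is formatted (default: %Y-%m-%d).
# Function and parameter names are hypothetical.
backup_name() {
    name=$1 date_format=${2:-%Y-%m-%d}
    if [ -n "$name" ]; then
        printf '%s\n' "$name"
    else
        date +"$date_format"
    fi
}
```

For instance, backup_name "" gives today's date in the default format, while backup_name before-upgrade simply returns before-upgrade.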
The backup source is simply set using option source, and as before you can define exclusions through option exclude-from (which is passed to rsync's option of the same name).
Speaking of which, you can also define the arguments for rsync through option args. Make sure not to use --verbose, --exclude-from or --link-dest as they are auto-added if needed. If not set, it defaults to --archive --acls --xattrs --human-readable -h --stats.
A sample configuration file is included, and you can use -h or --help to get command-line help.
Download
All files are available on this BitBucket repository. For fellow Arch users, there's a package in the AUR.
Additionally, you can also download the latest version from this link.
It's all released under GPLv3, and of course bug reports, suggestions or any other form of constructive criticism are very much welcome.