The other day I was talking to one of my friends and he mentioned how he lost a hard drive recently and consequently all of his stuff. I realized that, although I had bought a backup hard drive a little while ago, I hadn't done more than a couple backups with it when I felt like it. So I decided that I would finally set up regular backups of my home directory.
I'd looked briefly at backup solutions before and thought that rdiff-backup seemed like a good choice. I also want to do snapshots at some point either with straight-up rsync or rsnapshot, but for now just the differential backups are probably fine. So I chose rdiff-backup.
One easily-solved problem I encountered was that I have large data files that I don't have space to back up and that aren't super necessary to back up in the first place. However, I wanted to keep track of the files. So I'm using
rdiff-backup --exclude to exclude those directories, and then writing the output of
ls -lLR in those directories to a file to keep track of what's in them. Since the -l option tells us the modified times, I didn't think it was necessary to write the listing files out into my actual home directory and then have that be part of the backup. EDIT: On second thought, it actually is necessary to do so. If, for example, the entire directory we're saving the listing of was deleted, and then the backup script ran again, the listing would be overwritten with nothing and it would be pointless. So we need to do the listing before running the background and it needs to output to a directory that is going to be backed up.
I combined these two steps into a python script that prints some information about what it's doing and then runs the commands. I always prefer python scripts to bash because you don't have to worry about escaping, or portability, or unreadability, or any of the other problems one encounters with shell scripting.
The other half of the backup problem is actually running the backups. Putting a backup command in your crontab isn't the best idea for a desktop or laptop because it might be scheduled to run while your computer is off, resulting in a missed backup. Anacron is traditionally the solution to this: it runs cron jobs on a hourly/daily/monthly/whatever basis and, more importantly, will run jobs that were missed while the computer was off.
However, I knew that systemd somewhat-recently gained support for doing cron-like things, and I knew that if I used systemd and wrote a unit file to run the backups, I could get the output easily and nicely into systemd's journal, making it easy to check that backups are running. If I put it in cron or anacron, which on my system are both covered by cronie, the output from my backup script would be mixed together with the other cronjobs' outputs. (I think.)
So I wrote a simple .service unit file that runs the python script I wrote as a oneshot, and then wrote a .timer file which tells systemd to run the corresponding .service file with the same name daily (
OnCalendar=daily) and to run it ASAP if we missed the last time it was supposed to run (
Persistent=true). The service file also has some options to set the nice value to 19 (the lowest priority) and lower the IO priority. (
IOSchedulingPriority=7) These were mostly copied from this page on the arch linux wiki. Make sure to delete the in-line comments; they will cause errors if you try to actually use them as they are.
I looked into using systemd user sessions to run these, but it seemed a bit complicated to set up and I wasn't entirely sure what the benefit was. Furthermore, I might use this script to do backups of /etc or something in the future, which would mean I wouldn't use a user session anyway. So instead I have the unit run as my own user (
User=username) and copied the .service and .timer files into
/etc/systemd/system/. I copied the python script that does the backups to
/usr/local/bin/backup-scripts/, though you could just leave it in
/usr/local/bin/ just the same. The last step to make everything work is to enable the timer unit with something like
sudo systemctl enable my-backup.timer.
It's probably irrational, but I feel a bit nervous making my script and unit files public even though the worst/best thing that would probably happen would be someone pointing out a way to make them better. Email me or something if you want to see them, I guess? that feels really lame.