Motives and the Features
For the servers running in our new network, I needed a highly configurable yet easy-to-use backup solution that can take online backups of VMs and of MySQL databases running multiple storage engines.
Since my colleagues are all researchers or programmers and there are no dedicated engineers to manage our systems, I decided to write a set of command-line scripts to accomplish the task instead of using an existing backup solution, such as Amanda, that is highly configurable but takes time to learn.
What I have come up with is a backup solution with the following characteristics:
- a central backup server that takes backups of other servers over SSH using public-key authentication
- no need to install backup agents into each server
- LVM snapshot-based online, incremental backups (capable of taking online backups of LVM-based VMs)
- online backups of MySQL databases running multiple storage engines, with sophisticated lock control
- no configuration files; only crontab and shell scripts are used
The solution consists of two tools: blockdiff (kazuho's blockdiff at master - GitHub) and the cronlog script of kaztools (kazuho's kaztools at master - GitHub).
Blockdiff is a set of scripts for taking block-based diffs of files or volumes, either on a local machine or on remote machines over the network using SSH. The script below takes online backups of three LVM volumes from three servers. In the form below, a full backup is taken once a month, and incremental backups are taken for the rest of each month.
export YEARMONTH=`date '+%Y%m'`
# back up an LVM volume (using a snapshot) at /dev/pv/lv on srv00
blockdiff_backup /var/backup/srv00-pv-lv-$YEARMONTH ssh_lvm_dump --gzip \
root@srv00 /dev/pv/lv \
|| exit $?
# back up another LVM volume on another server
blockdiff_backup /var/backup/srv01-pv-lv-$YEARMONTH ssh_lvm_dump --gzip \
root@srv01 /dev/pv/lv \
|| exit $?
# back up a MySQL database stored on volume /dev/pv/lv on server db00
LVCREATE_PREFIX='mysqllock --host=db00 --user=root --password=XXXX' \
blockdiff_backup /var/backup/db00-pv-lv-$YEARMONTH ssh_lvm_dump --gzip \
root@db00 /dev/pv/lv \
|| exit $?
The backup command for the last volume uses the mysqllock command included in blockdiff to keep "FLUSH TABLES WITH READ LOCK" in effect while a snapshot is taken of the LVM volume that holds the database files. It is also possible to implement other kinds of locks so that "FLUSH TABLES WITH READ LOCK" is not issued while long-running queries are executing. Because the flush statement has to wait for all already-running queries to complete, and blocks other queries while it waits, issuing it when long-running queries exist leaves the database unresponsive to other queries for some time.
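One way to avoid issuing the flush at a bad moment is to look at the process list first. The following is only a sketch under stated assumptions, not part of blockdiff: the function name has_long_running_query is hypothetical, and it expects the tab-separated rows that the mysql client prints for SHOW PROCESSLIST in batch mode (columns Id, User, Host, db, Command, Time, State, Info).

```shell
# Hypothetical helper, not part of blockdiff.
# Reads tab-separated SHOW PROCESSLIST rows on stdin and succeeds (exit 0)
# if at least one query has been running for more than $1 seconds.
has_long_running_query() {
  awk -F '\t' -v max="$1" '
    # column 5 is Command, column 6 is Time (seconds in the current state)
    $5 == "Query" && $6 + 0 > max { found = 1 }
    END { exit(found ? 0 : 1) }
  '
}

# Possible use before taking the lock (connection options omitted):
#   while mysql --batch --skip-column-names -e 'SHOW PROCESSLIST' \
#       | has_long_running_query 10; do
#     sleep 5
#   done
```

A check like this only narrows the window: a long query can still start between the check and the flush, so it reduces the risk of a stall rather than removing it.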
Crontab and the cronlog script
The backup script is invoked by cron via cronlog, a script that logs the output of the executed task and controls the output passed to cron so that an alert mail is sent when the backup script fails. The setlock command of daemontools is used to hold an exclusive lock while the backup script runs (and to alert the administrator when the lock cannot be acquired).
5 3 * * * cd /var/backup && exec setlock -nX /tmp/backup.lock cronlog -l /var/backup/backup.log -t -- ./backup.sh 2>&1
This is all that needs to be set up to back up LVM volumes, including those holding MySQL databases. The log output will look like the following.
[Sat Jan 9 03:05:02 2010] backup-srv starting: ./backup.sh
[Sat Jan 9 03:05:02 2010] creating snapshot...
[Sat Jan 9 03:05:07 2010] Logical volume "lvm_dump" created
[Sat Jan 9 03:05:07 2010] running: ssh_blockdiff_dump --gzip "root@srv00" "/dev/pv/lv"...
[Sat Jan 9 03:19:22 2010] removing snapshot /dev/pv/lvm_dump...
[Sat Jan 9 03:19:23 2010] Logical volume "lvm_dump" successfully removed
[Sat Jan 9 03:19:23 2010] backup completed successfully
[Sat Jan 9 03:35:56 2010] creating snapshot...
[Sat Jan 9 03:35:56 2010] issuing lock statement: FLUSH TABLES WITH READ LOCK
[Sat Jan 9 03:36:00 2010] Logical volume "lvm_dump" created
[Sat Jan 9 03:36:00 2010] issuing unlock statement: UNLOCK TABLES
[Sat Jan 9 03:36:00 2010] running: bin/ssh_blockdiff_dump --gzip "root@db00" "/dev/pv/lv"...
[Sat Jan 9 04:18:44 2010] removing snapshot /dev/pv/lvm_dump...
[Sat Jan 9 04:18:46 2010] Logical volume "lvm_dump" successfully removed
[Sat Jan 9 04:18:46 2010] backup completed successfully
[Sat Jan 9 04:18:46 2010] command exited with code:0
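The idea of logging everything while staying quiet towards cron unless something fails can be sketched in a few lines of shell. This is not the real cronlog: the function name run_logged is hypothetical, and unlike the log above it stamps each line only after the command has finished, to keep the sketch portable.

```shell
# Hypothetical stand-in for the idea behind cronlog; not the real script.
# usage: run_logged LOGFILE COMMAND [ARGS...]
run_logged() {
  logfile=$1
  shift
  tmp=$(mktemp) || return 1
  printf '[%s] starting: %s\n' "$(date)" "$*" >> "$logfile"
  # capture stdout and stderr of the task
  "$@" > "$tmp" 2>&1
  status=$?
  # append the captured output to the log, one stamped line at a time
  # (timestamps here are taken at logging time, after the task has ended)
  while IFS= read -r line || [ -n "$line" ]; do
    printf '[%s] %s\n' "$(date)" "$line"
  done < "$tmp" >> "$logfile"
  printf '[%s] command exited with code:%d\n' "$(date)" "$status" >> "$logfile"
  # print the output only on failure, so that cron has something to mail
  if [ "$status" -ne 0 ]; then
    cat "$tmp"
  fi
  rm -f "$tmp"
  return "$status"
}
```

Since cron sends mail only when a job produces output, a successful run stays silent while a failed run delivers the full output of the task to the administrator.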
The backup files
The files in the backup directory will look like the listing below. The .gz files contain the backup data, and the .md5 files contain the per-block checksums used for taking incremental or differential backups.
% ls -l db00-pv-lv-201001*
-rw-r--r-- 1 backup backup 50289166539 2010-01-01 05:35 db00-pv-lv-201001.1.gz
-rw-r--r-- 1 backup backup 131072004 2010-01-01 05:35 db00-pv-lv-201001.1.md5
-rw-r--r-- 1 backup backup 10914423057 2010-01-02 04:32 db00-pv-lv-201001.2.gz
-rw-r--r-- 1 backup backup 131072004 2010-01-02 04:32 db00-pv-lv-201001.2.md5
-rw-r--r-- 1 backup backup 13648250036 2010-01-03 04:33 db00-pv-lv-201001.3.gz
-rw-r--r-- 1 backup backup 131072004 2010-01-03 04:34 db00-pv-lv-201001.3.md5
-rw-r--r-- 1 backup backup 3 2010-01-18 04:34 db00-pv-lv-201001.ver
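How per-block checksums make an incremental backup possible can be illustrated with a small sketch. It is not blockdiff's implementation and does not use its file format: the function names block_checksums and changed_blocks are hypothetical, checksums are stored as one hex MD5 per line, and md5sum from GNU coreutils is assumed to be available.

```shell
# Hypothetical illustration; not blockdiff's code or on-disk format.

# usage: block_checksums FILE BLOCKSIZE
# Prints one MD5 checksum per BLOCKSIZE-byte block of FILE.
block_checksums() {
  file=$1
  bs=$2
  size=$(wc -c < "$file")
  nblocks=$(( (size + bs - 1) / bs ))
  i=0
  while [ "$i" -lt "$nblocks" ]; do
    dd if="$file" bs="$bs" skip="$i" count=1 2>/dev/null | md5sum | cut -d' ' -f1
    i=$((i + 1))
  done
}

# usage: changed_blocks OLD_SUMS NEW_SUMS
# Prints the 0-based index of every block that is new or whose checksum differs.
changed_blocks() {
  awk '
    FILENAME == ARGV[1] { old[FNR] = $0; next }
    !(FNR in old) || old[FNR] != $0 { print FNR - 1 }
  ' "$1" "$2"
}
```

Only the blocks reported by changed_blocks would have to be stored in the next backup file, which is consistent with the later .gz files in the listing above being much smaller than the first one.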
For more information, please read the source code and the accompanying documentation.
As can be seen, this is a powerful backup solution that can be built with minimal setup. It works well for a small team of experienced engineers, though it might not suit large-scale deployments with many administrators. If you are interested, please give it a try. I look forward to your ideas and suggestions.
PS. The blockdiff_merge command can be used to restore the backups.