====== Backup ======
This is how I use ''rsync'' and ''crontab'' with some simple shell scripts to set up an automatic rotating backup system for my files.
===== Notes =====
==== References ====
Many thanks to the people who shared their knowledge and experience on the web:
  * [[http://www.mikerubel.org/computers/rsync_snapshots/]]
  * [[http://www.sanitarium.net/golug/rsync_backups_2010.html]]
  * [[http://blog.interlinked.org/tutorials/rsync_time_machine.html]]
==== External Hard Drive ====
Under Mac OS X, make sure to uncheck the "Ignore ownership on this volume" option for the external drive (in the Finder's Get Info window). Otherwise ''rsync'' will make a full copy every time instead of the intended incremental backup. I also prefer to turn off Spotlight indexing for the external hard drive.
===== Shell Scripts =====
Here are the scripts used in my backup scheme.
==== Hourly Backup ====
The first script checks whether the external hard drive is mounted. If it is, the script calls the second script to perform the backup; otherwise it exits with an error status.
<code bash>
chkuo@sage[~/script]$ cat hourly.sh
#!/bin/bash
if [ -d /Volumes/sparrow ]; then
    /Users/chkuo/script/sync.hourly.sh
else
    exit 1
fi
</code>
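The ''-d'' test in ''hourly.sh'' only checks that ''/Volumes/sparrow'' exists as a directory. If the drive is not mounted but an empty ''/Volumes/sparrow'' folder has been left behind, the test can still pass and the backup would land on the internal disk. Below is a minimal sketch of a stricter check, assuming the POSIX ''df -P'' output format (mount point in the sixth column) and a mount point without spaces; the function name ''is_mounted'' is hypothetical and not part of the original scripts.

```bash
#!/bin/bash
# Hypothetical helper: succeed only if DIR is the mount point of a file system,
# not merely an existing directory. Assumes POSIX "df -P" output, where the
# sixth column of the second line is the mount point (no spaces in the path).
is_mounted() {
    local dir=$1 mount_point
    [ -d "$dir" ] || return 1
    mount_point=$(df -P "$dir" | awk 'NR == 2 { print $6 }')
    [ "$mount_point" = "$dir" ]
}

# Usage in hourly.sh:
#   if is_mounted /Volumes/sparrow; then
#       /Users/chkuo/script/sync.hourly.sh
#   else
#       exit 1
#   fi
```

The original ''-d'' test is kept as the first step, so the check can only become stricter, never looser.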
In this example, I use an external hard drive (''/Volumes/sparrow'') to back up the files in my home directory (''/Users/chkuo''). Cache files in ''~/Library/Caches'' and the ''~/.cpan'' directory are excluded from the backup. The script keeps 10 hourly backups (''hourly.0'' is the most recent and ''hourly.9'' the oldest). Because ''--link-dest'' points ''rsync'' at the previous snapshot (''hourly.1'', after the older snapshots have been shifted back by one), only files that changed during the past hour are copied into the new snapshot; unchanged files are hard-linked to their copies in the previous snapshot and take up almost no additional disk space.
<code bash>
chkuo@sage[~/script]$ cat sync.hourly.sh
#!/bin/bash
source=/Users/chkuo
target=/Volumes/sparrow/hourly
max=9
# remove the oldest snapshot (and its log files), if it exists
if [ -e "$target.$max" ]; then
    rm -rf "$target.$max"*
fi
# shift the other snapshot(s) back by one, if they exist
for (( i=max; i>0; i-- )); do
    if [ -e "$target.$((i-1))" ]; then
        mv "$target.$((i-1))" "$target.$i"
    fi
    if [ -e "$target.$((i-1)).log" ]; then
        mv "$target.$((i-1)).log" "$target.$i.log"
    fi
    if [ -e "$target.$((i-1)).err" ]; then
        mv "$target.$((i-1)).err" "$target.$i.err"
    fi
done
# make the current snapshot, hard-linking unchanged files to the previous one
rsync -av --delete \
    --link-dest="$target.1" \
    --exclude='Library/Caches' \
    --exclude='.cpan/' \
    "$source" "$target.0" \
    1> "$target.0.log" \
    2> "$target.0.err"
# update the timestamp
touch "$target.0"
</code>
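The rotation step (remove the oldest snapshot, then rename ''.N-1'' to ''.N'') is repeated almost verbatim in the hourly, daily and weekly scripts. Below is a minimal sketch of factoring it into one shell function, assuming the same naming scheme (''PREFIX.N'', ''PREFIX.N.log'', ''PREFIX.N.err''). The name ''rotate_snapshots'' is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: rotate PREFIX.0 .. PREFIX.(MAX-1) to PREFIX.1 .. PREFIX.MAX,
# discarding the old PREFIX.MAX, together with the matching .log and .err files.
rotate_snapshots() {
    local prefix=$1 max=$2 i suffix
    # remove the oldest snapshot and its logs, if they exist
    rm -rf "$prefix.$max" "$prefix.$max.log" "$prefix.$max.err"
    # shift the remaining snapshots back by one, starting from the oldest,
    # so that a rename never lands on an existing directory
    for (( i = max; i > 0; i-- )); do
        for suffix in "" .log .err; do
            if [ -e "$prefix.$((i - 1))$suffix" ]; then
                mv "$prefix.$((i - 1))$suffix" "$prefix.$i$suffix"
            fi
        done
    done
}

# Usage in sync.hourly.sh, replacing the removal and shifting steps:
#   rotate_snapshots /Volumes/sparrow/hourly 9
```

Working from the oldest index down matters: moving ''hourly.0'' to ''hourly.1'' before ''hourly.1'' has been moved away would make ''mv'' nest one directory inside the other.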
==== Daily Backup ====
As with the hourly backup, two scripts are used to perform the daily backup. In addition to the home directory (''/Users/chkuo''), I also back up ''/scratch/chkuo'' and ''/storage/chkuo''. ''/scratch/chkuo'' is where a large number of files are generated and removed during the day, so backing it up every hour would require a large amount of disk I/O. ''/storage/chkuo'', as the name implies, holds files that need long-term storage and does not need to be checked every hour.
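To save disk space, unchanged files from the home directory (''/Users/chkuo'') are hard-linked to the latest hourly backup (''/Volumes/sparrow/hourly.0''), while unchanged files from ''/scratch/chkuo'' and ''/storage/chkuo'' are hard-linked to the previous daily backup (''daily.1'').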
<code bash>
chkuo@sage[~/script]$ cat daily.sh
#!/bin/bash
if [ -d /Volumes/sparrow ]; then
    /Users/chkuo/script/sync.daily.sh
else
    exit 1
fi
</code>
<code bash>
chkuo@sage[~/script]$ cat sync.daily.sh
#!/bin/bash
target=/Volumes/sparrow/daily
max=9
# remove the oldest snapshot (and its log files), if it exists
if [ -e "$target.$max" ]; then
    rm -rf "$target.$max"*
fi
# shift the other snapshot(s) back by one, if they exist
for (( i=max; i>0; i-- )); do
    if [ -e "$target.$((i-1))" ]; then
        mv "$target.$((i-1))" "$target.$i"
    fi
    if [ -e "$target.$((i-1)).log" ]; then
        mv "$target.$((i-1)).log" "$target.$i.log"
    fi
    if [ -e "$target.$((i-1)).err" ]; then
        mv "$target.$((i-1)).err" "$target.$i.err"
    fi
done
# make the current snapshot
# home directory: hard-link unchanged files to the latest hourly snapshot
mkdir -p "$target.0/Users"
rsync -av --delete \
    --link-dest='/Volumes/sparrow/hourly.0' \
    --exclude='Library/Caches' \
    --exclude='.cpan/' \
    /Users/chkuo "$target.0/Users" \
    1> "$target.0.log" \
    2> "$target.0.err"
# scratch and storage: hard-link unchanged files to the previous daily snapshot
mkdir -p "$target.0/scratch"
rsync -av --delete \
    --link-dest="$target.1/scratch" \
    /scratch/chkuo "$target.0/scratch" \
    1>> "$target.0.log" \
    2>> "$target.0.err"
mkdir -p "$target.0/storage"
rsync -av --delete \
    --link-dest="$target.1/storage" \
    /storage/chkuo "$target.0/storage" \
    1>> "$target.0.log" \
    2>> "$target.0.err"
# update the timestamp
touch "$target.0"
</code>
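On the very first daily run, ''daily.1'' does not exist yet, so ''--link-dest=$target.1/scratch'' points ''rsync'' at a missing directory. ''rsync'' then simply copies every file, but it may also report the missing reference directory in the ''.err'' log. Below is a minimal sketch that passes ''--link-dest'' only when the reference directory exists. The helper name ''link_dest_args'' and the use of a bash array are my own choices, not part of the original scripts.

```bash
#!/bin/bash
# Hypothetical helper: fill the global array LINK_DEST_ARGS with a --link-dest
# option for REF_DIR if that directory exists, or leave it empty otherwise.
link_dest_args() {
    LINK_DEST_ARGS=()
    if [ -d "$1" ]; then
        LINK_DEST_ARGS=(--link-dest="$1")
    fi
}

# Usage in sync.daily.sh for the scratch tree:
#   link_dest_args "$target.1/scratch"
#   mkdir -p "$target.0/scratch"
#   rsync -av --delete "${LINK_DEST_ARGS[@]}" \
#       /scratch/chkuo "$target.0/scratch" \
#       1>> "$target.0.log" 2>> "$target.0.err"
```

An array is used instead of a plain string so that an empty result expands to no argument at all, and a path containing spaces still stays a single argument.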
==== Weekly Backup ====
Once again, to save disk space, files that have not changed since the last daily backup (''daily.0'') are hard-linked rather than copied.
<code bash>
chkuo@sage[~/script]$ cat weekly.sh
#!/bin/bash
if [ -d /Volumes/sparrow ]; then
    /Users/chkuo/script/sync.weekly.sh
else
    exit 1
fi
</code>
<code bash>
chkuo@sage[~/script]$ cat sync.weekly.sh
#!/bin/bash
target=/Volumes/sparrow/weekly
max=9
# remove the oldest snapshot (and its log files), if it exists
if [ -e "$target.$max" ]; then
    rm -rf "$target.$max"*
fi
# shift the other snapshot(s) back by one, if they exist
for (( i=max; i>0; i-- )); do
    if [ -e "$target.$((i-1))" ]; then
        mv "$target.$((i-1))" "$target.$i"
    fi
    if [ -e "$target.$((i-1)).log" ]; then
        mv "$target.$((i-1)).log" "$target.$i.log"
    fi
    if [ -e "$target.$((i-1)).err" ]; then
        mv "$target.$((i-1)).err" "$target.$i.err"
    fi
done
# make the current snapshot, hard-linking unchanged files to the latest daily snapshot
mkdir -p "$target.0/Users"
rsync -av --delete \
    --link-dest='/Volumes/sparrow/daily.0/Users' \
    --exclude='Library/Caches' \
    --exclude='.cpan/' \
    /Users/chkuo "$target.0/Users" \
    1> "$target.0.log" \
    2> "$target.0.err"
mkdir -p "$target.0/scratch"
rsync -av --delete \
    --link-dest='/Volumes/sparrow/daily.0/scratch' \
    /scratch/chkuo "$target.0/scratch" \
    1>> "$target.0.log" \
    2>> "$target.0.err"
mkdir -p "$target.0/storage"
rsync -av --delete \
    --link-dest='/Volumes/sparrow/daily.0/storage' \
    /storage/chkuo "$target.0/storage" \
    1>> "$target.0.log" \
    2>> "$target.0.err"
# update the timestamp
touch "$target.0"
</code>
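To confirm that the hard links are actually being created, compare an unchanged file in two snapshots. If both paths refer to the same device and inode, the file's data is stored only once. Below is a minimal sketch using the ''-ef'' file test of bash's ''['' builtin. The function name ''is_hard_linked'' and the example file ''notes.txt'' are hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: succeed if the two paths are hard links to the same file
# (same device and inode number), i.e. the data is stored only once.
is_hard_linked() {
    [ "$1" -ef "$2" ]
}

# Usage, with a hypothetical file that has not changed since the last daily backup:
#   if is_hard_linked /Volumes/sparrow/weekly.0/Users/chkuo/notes.txt \
#                     /Volumes/sparrow/daily.0/Users/chkuo/notes.txt; then
#       echo "hard-linked"
#   else
#       echo "separate copy"
#   fi
```

Along the same lines, ''du -sh /Volumes/sparrow/daily.0 /Volumes/sparrow/weekly.0'' run as a single command counts each hard-linked file only once, so the second figure roughly shows the extra space taken by the weekly snapshot.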
==== Remote Server ====
The examples above use a locally mounted external hard drive for backup. For added safety, it is also possible to back up to a remote server with ''rsync''. For this to work, password-less ''ssh'' login (public-key authentication) must be set up first (see [[computer:bash|here]]).
The first script calls the second script and writes its output, including error messages, to a log file.
<code bash>
chkuo@mesquite[~/script]$ cat aphidhouse.sh
#!/bin/bash
/Users/chkuo/script/sync.aphidhouse.sh > /Users/chkuo/script/log/aphidhouse.log 2>&1
</code>
The second script specifies the directories to be backed up and the address of the remote server.
<code bash>
chkuo@mesquite[~/script]$ cat sync.aphidhouse.sh
#!/bin/bash
# backup to Ochman/Moran Lab server
# aphidhouse.biochem.arizona.edu
# essential files
rsync -av --rsh="ssh -l chkuo" --delete ~/active_project aphidhouse.biochem.arizona.edu:~
rsync -av --rsh="ssh -l chkuo" --delete ~/bin aphidhouse.biochem.arizona.edu:~
rsync -av --rsh="ssh -l chkuo" --delete ~/plscript aphidhouse.biochem.arizona.edu:~
rsync -av --rsh="ssh -l chkuo" --delete ~/script aphidhouse.biochem.arizona.edu:~
rsync -av --rsh="ssh -l chkuo" --delete ~/test aphidhouse.biochem.arizona.edu:~
rsync -av --rsh="ssh -l chkuo" --delete ~/work_doc aphidhouse.biochem.arizona.edu:~
</code>
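The six ''rsync'' commands differ only in the directory name, so they can also be written as a loop. Below is a minimal sketch, assuming the same host, login name and directory list as above. The ''RSYNC'' variable and the function name ''sync_to_remote'' are hypothetical additions, so that the command can be swapped for ''echo'' to preview what would be run.

```bash
#!/bin/bash
# Remote backup written as a loop over the directories to copy.
remote_host=aphidhouse.biochem.arizona.edu
remote_user=chkuo
dirs=(active_project bin plscript script test work_doc)
# Hypothetical override hook: set RSYNC=echo to print the commands instead.
RSYNC=${RSYNC:-rsync}

sync_to_remote() {
    local d status=0
    for d in "${dirs[@]}"; do
        # "|| status=1" keeps going after a failure but remembers it
        "$RSYNC" -av --rsh="ssh -l $remote_user" --delete \
            "$HOME/$d" "$remote_host:~" || status=1
    done
    return "$status"
}

# Usage: sync_to_remote
```

Adding a directory to the backup then only means adding its name to the ''dirs'' array.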
===== Cron Jobs =====
Once the shell scripts are ready, the next step is to set up a ''crontab'' that specifies when the scripts should be executed.
The crontab is a simple text file in the following format:
<code>
chkuo@mesquite[~/script]$ cat mesquite.cron
# set up cron jobs
# the fields are:
# (1) minute (0 - 59)
# (2) hour (0 - 23)
# (3) day of month (1 - 31)
# (4) month (1 - 12)
# (5) day of week (0 - 6) (Sunday = 0)
# (6) command to be executed
# sync to external hard drive (hourly backup)
0 9-18 * * 1-5 /Users/chkuo/script/hourly.sh
# sync to external hard drive (daily backup)
30 8 * * 1-5 /Users/chkuo/script/daily.sh
# sync to external hard drive (weekly backup)
30 19 * * 5 /Users/chkuo/script/weekly.sh
</code>
In the example above, ''hourly.sh'' is executed on the hour from 9 AM to 6 PM on weekdays (10 runs per day, matching the 10 hourly snapshots), ''daily.sh'' is executed at 8:30 AM Monday through Friday, and ''weekly.sh'' is executed at 7:30 PM every Friday.
Use the ''crontab'' command to install the schedule (this replaces any crontab previously installed for the user):
<code>
chkuo@mesquite[~/script]$ crontab mesquite.cron
</code>
To check whether the ''cron'' jobs are set up correctly, list the installed crontab:
<code>
chkuo@mesquite[~/script]$ crontab -l
# set up cron jobs
# the fields are:
# (1) minute (0 - 59)
# (2) hour (0 - 23)
# (3) day of month (1 - 31)
# (4) month (1 - 12)
# (5) day of week (0 - 6) (Sunday = 0)
# (6) command to be executed
# sync to external hard drive (hourly backup)
0 9-18 * * 1-5 /Users/chkuo/script/hourly.sh
# sync to external hard drive (daily backup)
30 8 * * 1-5 /Users/chkuo/script/daily.sh
# sync to external hard drive (weekly backup)
30 19 * * 5 /Users/chkuo/script/weekly.sh
</code>
The output should be identical to the ''crontab'' file used as the input earlier.
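Instead of comparing the two listings by eye, ''cmp'' can do the comparison. Below is a minimal sketch; the function name ''crontab_matches'' is hypothetical. Some ''cron'' implementations add a header comment to the installed crontab, and in that case the check reports a difference even though the schedule is correct.

```bash
#!/bin/bash
# Hypothetical helper: succeed if the installed crontab is byte-for-byte
# identical to the given file ("-" makes cmp read the crontab from stdin).
crontab_matches() {
    crontab -l 2>/dev/null | cmp -s - "$1"
}

# Usage:
#   crontab_matches mesquite.cron && echo "crontab installed correctly"
```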