Backup

This is how I use rsync and crontab with some simple shell scripts to set up an automatic rotating backup system for my files.

Notes

Many thanks to those people who shared their knowledge and experience on the web.

External Hard Drive

Under Mac OS X, make sure to uncheck the “ignore ownership on this volume” option for the backup drive (under Finder/Get Info). Otherwise the ownership recorded on the drive will not match the source files, and rsync will produce a full backup every time instead of the intended incremental backup. I also prefer to turn off Spotlight for the external hard drive.
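
If you prefer the command line, Spotlight indexing can also be turned off with mdutil (assuming the drive is mounted at /Volumes/sparrow, as in the scripts below):

chkuo@sage[~/script]$ sudo mdutil -i off /Volumes/sparrow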

Shell Scripts

Here are the scripts used in my backup scheme.

Hourly Backup

The first script checks whether the external hard drive is mounted and, if so, calls the second script to perform the backup.

chkuo@sage[~/script]$ cat hourly.sh 
#!/bin/bash
# run the backup only if the external drive is mounted
if [ -d /Volumes/sparrow ]; then
	/Users/chkuo/script/sync.hourly.sh
else
	exit 1
fi

In this example, I use an external hard drive (/Volumes/sparrow) to back up the files in the home directory (/Users/chkuo). Cache files in the ~/Library/Caches directory are excluded from the backup. The script is designed to maintain 10 hourly snapshots (hourly.0 is the most recent and hourly.9 the oldest). Because the --link-dest argument is passed to rsync, only files that have changed during the past hour are copied into the new snapshot; files that have not changed are hard-linked to the previous snapshot and occupy no additional disk space.

chkuo@sage[~/script]$ cat sync.hourly.sh 
#!/bin/bash
source=/Users/chkuo
target=/Volumes/sparrow/hourly
max=9
 
# remove the oldest snapshot, if it exists
if [ -e $target.$max ]; then
	rm -rf $target.$max*
fi;
 
# shift the other snapshot(s) back by one, if they exist
for (( i=$max;i>0;i=i-1 ))
do
    if [ -e $target.$(($i-1)) ]; then
        mv $target.$(($i-1)) $target.$i
    fi;
    if [ -e $target.$(($i-1)).log ]; then
        mv $target.$(($i-1)).log $target.$i.log
    fi;
    if [ -e $target.$(($i-1)).err ]; then
        mv $target.$(($i-1)).err $target.$i.err
    fi;
done
 
# make the current snapshot
rsync -av --delete \
	--link-dest=$target.1 \
	--exclude='Library/Caches' \
	--exclude='.cpan/' \
	$source  $target.0 \
	1> $target.0.log \
	2> $target.0.err
 
# update the timestamp
touch $target.0
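
To confirm that unchanged files are being hard-linked rather than copied, compare their inode numbers across two snapshots with ls -i (the file name below is just a hypothetical example); identical inode numbers mean the data is stored on disk only once:

chkuo@sage[~/script]$ ls -i /Volumes/sparrow/hourly.0/chkuo/some_file /Volumes/sparrow/hourly.1/chkuo/some_file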

Daily Backup

Similar to the hourly backup, two scripts are used to perform the daily backup. In addition to the home directory (/Users/chkuo), I also back up files in /scratch/chkuo and /storage/chkuo. /scratch/chkuo is where a large number of files are generated and removed during the day; backing it up hourly would require a large amount of disk I/O. /storage/chkuo, as the name implies, is the place for files that need long-term storage, and there is no need to check it hourly.

To save disk space, files in the home directory (/Users/chkuo) that have not changed are hard-linked to the latest hourly backup (/Volumes/sparrow/hourly.0).

chkuo@sage[~/script]$ cat daily.sh 
#!/bin/bash
if [ -d /Volumes/sparrow ]; then
	/Users/chkuo/script/sync.daily.sh
else
	exit 1
fi
chkuo@sage[~/script]$ cat sync.daily.sh 
#!/bin/bash
target=/Volumes/sparrow/daily
max=9
 
# remove the oldest snapshot, if it exists
if [ -e $target.$max ]; then
	rm -rf $target.$max*
fi;
 
# shift the other snapshot(s) back by one, if they exist
for (( i=$max;i>0;i=i-1 ))
do
    if [ -e $target.$(($i-1)) ]; then
        mv $target.$(($i-1)) $target.$i
    fi;
    if [ -e $target.$(($i-1)).log ]; then
        mv $target.$(($i-1)).log $target.$i.log
    fi;
    if [ -e $target.$(($i-1)).err ]; then
        mv $target.$(($i-1)).err $target.$i.err
    fi;
done
 
# make the current snapshot
mkdir -p $target.0/Users
rsync -av --delete \
	--link-dest='/Volumes/sparrow/hourly.0' \
	--exclude='Library/Caches' \
	--exclude='.cpan/' \
	/Users/chkuo  $target.0/Users \
	1> $target.0.log \
	2> $target.0.err
mkdir -p $target.0/scratch
rsync -av --delete \
	--link-dest=$target.1/scratch \
	/scratch/chkuo  $target.0/scratch \
	1>> $target.0.log \
	2>> $target.0.err
mkdir -p $target.0/storage
rsync -av --delete \
	--link-dest=$target.1/storage \
	/storage/chkuo  $target.0/storage \
	1>> $target.0.log \
	2>> $target.0.err
 
# update the timestamp
touch $target.0
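
A quick way to see how little extra space the incremental snapshots cost is to run du over all of them in a single invocation; du charges each hard-linked file only to the first snapshot in which it encounters it, so the remaining snapshots show only the data unique to them:

chkuo@sage[~/script]$ du -sh /Volumes/sparrow/daily.?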

Weekly Backup

Once again, to save disk space, hard links are used for files that have not changed since the last daily backup.

chkuo@sage[~/script]$ cat weekly.sh 
#!/bin/bash
if [ -d /Volumes/sparrow ]; then
	/Users/chkuo/script/sync.weekly.sh
else
	exit 1
fi
chkuo@sage[~/script]$ cat sync.weekly.sh 
#!/bin/bash
target=/Volumes/sparrow/weekly
max=9
 
# remove the oldest snapshot, if it exists
if [ -e $target.$max ]; then
	rm -rf $target.$max*
fi;
 
# shift the other snapshot(s) back by one, if they exist
for (( i=$max;i>0;i=i-1 ))
do
    if [ -e $target.$(($i-1)) ]; then
        mv $target.$(($i-1)) $target.$i
    fi;
    if [ -e $target.$(($i-1)).log ]; then
        mv $target.$(($i-1)).log $target.$i.log
    fi;
    if [ -e $target.$(($i-1)).err ]; then
        mv $target.$(($i-1)).err $target.$i.err
    fi;
done
 
# make the current snapshot
mkdir -p $target.0/Users
rsync -av --delete \
	--link-dest='/Volumes/sparrow/daily.0/Users' \
	--exclude='Library/Caches' \
	--exclude='.cpan/' \
	/Users/chkuo  $target.0/Users \
	1> $target.0.log \
	2> $target.0.err
mkdir -p $target.0/scratch
rsync -av --delete \
	--link-dest='/Volumes/sparrow/daily.0/scratch' \
	/scratch/chkuo  $target.0/scratch \
	1>> $target.0.log \
	2>> $target.0.err
mkdir -p $target.0/storage
rsync -av --delete \
	--link-dest='/Volumes/sparrow/daily.0/storage' \
	/storage/chkuo  $target.0/storage \
	1>> $target.0.log \
	2>> $target.0.err
 
# update the timestamp
touch $target.0
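
Restoring from any of these snapshots is an ordinary copy, since every snapshot is a complete directory tree; the file path below is a hypothetical example:

chkuo@sage[~/script]$ cp -p /Volumes/sparrow/daily.3/Users/chkuo/work_doc/report.txt ~/work_doc/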

Remote Server

The example above uses a locally mounted external hard drive for backup. For added safety, it is also possible to back up to a remote server using rsync. For this to work unattended, passwordless ssh authentication (e.g., ssh public-key authentication) needs to be set up first.
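
A common way to set this up is with an ssh key pair; the sketch below assumes the remote account is chkuo on aphidhouse.biochem.arizona.edu, as in the scripts that follow (if ssh-copy-id is not available, append the public key to ~/.ssh/authorized_keys on the server by hand):

chkuo@mesquite[~/script]$ ssh-keygen -t rsa
chkuo@mesquite[~/script]$ ssh-copy-id chkuo@aphidhouse.biochem.arizona.edu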

The first script calls the second script and writes the result to a log file.

chkuo@mesquite[~/script]$ cat aphidhouse.sh 
#!/bin/bash
/Users/chkuo/script/sync.aphidhouse.sh > /Users/chkuo/script/log/aphidhouse.log 2>&1 

The second script specifies the file directories that need to be included and the address of the remote server.

chkuo@mesquite[~/script]$ cat sync.aphidhouse.sh 
#!/bin/bash
# backup to Ochman/Moran Lab server
# aphidhouse.biochem.arizona.edu
# essential files
rsync -av --rsh="ssh -l chkuo" --delete ~/active_project aphidhouse.biochem.arizona.edu:~
rsync -av --rsh="ssh -l chkuo" --delete ~/bin aphidhouse.biochem.arizona.edu:~
rsync -av --rsh="ssh -l chkuo" --delete ~/plscript aphidhouse.biochem.arizona.edu:~
rsync -av --rsh="ssh -l chkuo" --delete ~/script aphidhouse.biochem.arizona.edu:~
rsync -av --rsh="ssh -l chkuo" --delete ~/test aphidhouse.biochem.arizona.edu:~
rsync -av --rsh="ssh -l chkuo" --delete ~/work_doc aphidhouse.biochem.arizona.edu:~
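
Since the six rsync invocations differ only in the directory being synced, the same script could be written more compactly as a loop; this is just an equivalent sketch:

#!/bin/bash
# sync each directory to the remote server in turn
for dir in active_project bin plscript script test work_doc; do
    rsync -av --rsh="ssh -l chkuo" --delete ~/"$dir" aphidhouse.biochem.arizona.edu:~
done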

Cron Jobs

Once the shell scripts are ready, the next step is to set up a crontab that specifies when the scripts should be executed.
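
Also make sure the scripts are executable, since cron invokes them directly:

chkuo@mesquite[~/script]$ chmod +x /Users/chkuo/script/*.sh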

The crontab is a simple text file in the following format:

chkuo@mesquite[~/script]$ cat mesquite.cron 
# set up cron jobs
# the fields are:
#   (1) minute (0 - 59) 
#   (2) hour (0 - 23)
#   (3) day of month (1 - 31)
#   (4) month (1 - 12)  
#   (5) day of week (0 - 6) (Sunday = 0)
#   (6) command to be executed
# sync to external hard drive (hourly backup)
0      9-18       *       *       1-5       /Users/chkuo/script/hourly.sh
# sync to external hard drive (daily backup)
30      8       *       *       1-5       /Users/chkuo/script/daily.sh
# sync to external hard drive (weekly backup)
30      19       *       *       5       /Users/chkuo/script/weekly.sh

In the example above, hourly.sh is executed every hour from 9AM to 6PM on weekdays, daily.sh is executed at 8:30AM Monday through Friday, and weekly.sh is executed at 7:30PM every Friday.
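
Note that cron runs jobs with a minimal environment. The scripts here are called by absolute path, but if one of them ever depends on commands outside the default search path, a PATH line can be added at the top of the crontab, for example:

PATH=/usr/local/bin:/usr/bin:/bin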

Use the crontab command to install the schedule:

chkuo@mesquite[~/script]$ crontab mesquite.cron 

To check if the cron jobs are set up correctly, use the following command:

chkuo@mesquite[~/script]$ crontab -l
# set up cron jobs
# the fields are:
#   (1) minute (0 - 59) 
#   (2) hour (0 - 23)
#   (3) day of month (1 - 31)
#   (4) month (1 - 12)  
#   (5) day of week (0 - 6) (Sunday = 0)
#   (6) command to be executed
# sync to external hard drive (hourly backup)
0      9-18       *       *       1-5       /Users/chkuo/script/hourly.sh
# sync to external hard drive (daily backup)
30      8       *       *       1-5       /Users/chkuo/script/daily.sh
# sync to external hard drive (weekly backup)
30      19       *       *       5       /Users/chkuo/script/weekly.sh

The output should be identical to the crontab file used as the input earlier.