Backup
This is how I use rsync and crontab with some simple shell scripts to set up an automatic rotating backup system for my files.
Notes
References
Many thanks to those people who shared their knowledge and experience on the web.
External Hard Drive
Under Mac OS X, make sure to uncheck the “Ignore ownership on this volume” option (in the Finder’s Get Info window). Otherwise, rsync will produce a full backup every time, instead of the intended incremental backup. Also, I prefer to turn off Spotlight indexing for the external hard drive.
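Spotlight can be turned off for the drive in System Preferences, or from the command line with mdutil; a sketch, assuming the drive is mounted at /Volumes/sparrow as in the rest of this page:

```shell
# disable Spotlight indexing on the backup volume (requires admin rights)
sudo mdutil -i off /Volumes/sparrow

# confirm the indexing status of the volume
mdutil -s /Volumes/sparrow
```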
Shell Scripts
Here are the scripts used in my backup scheme.
Hourly Backup
The first script checks whether the hard drive is mounted correctly and, if so, calls the second script to perform the backup.
chkuo@sage[~/script]$ cat hourly.sh
#!/bin/bash
if [ -d /Volumes/sparrow ]; then
    /Users/chkuo/script/sync.hourly.sh
else
    exit 1
fi
In this example, I use an external hard drive (/Volumes/sparrow) to back up the files in the home directory (/Users/chkuo). Cache files in the ~/Library/Caches directory are excluded from the backup operation. The script is designed to maintain 10 hourly backups (hourly.0 is the most recent and hourly.9 is the oldest). Because the --link-dest argument is used in the rsync command, only files that have changed during the past hour are saved in the incremental backup; files that have not changed are hard-linked and do not occupy additional disk space.
chkuo@sage[~/script]$ cat sync.hourly.sh
#!/bin/bash

source=/Users/chkuo
target=/Volumes/sparrow/hourly
max=9

# remove the oldest snapshot, if it exists
if [ -e $target.$max ]; then
    rm -rf $target.$max*
fi

# shift the other snapshot(s) back by one, if they exist
for (( i=$max; i>0; i=i-1 ))
do
    if [ -e $target.$(($i-1)) ]; then
        mv $target.$(($i-1)) $target.$i
    fi
    if [ -e $target.$(($i-1)).log ]; then
        mv $target.$(($i-1)).log $target.$i.log
    fi
    if [ -e $target.$(($i-1)).err ]; then
        mv $target.$(($i-1)).err $target.$i.err
    fi
done

# make the current snapshot
rsync -av --delete \
    --link-dest=$target.1 \
    --exclude='Library/Caches' \
    --exclude='.cpan/' \
    $source $target.0 \
    1> $target.0.log \
    2> $target.0.err

# update the timestamp
touch $target.0
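The hard-link behavior is easy to verify on its own, away from the backup volume. A hard link points at the same inode as the original file, so the file's link count goes up but no additional data blocks are used; this is exactly what rsync does, via --link-dest, for files that have not changed. A minimal demonstration in a scratch directory (paths here are throwaway examples):

```shell
# create a file and a hard link to it in a temporary directory
tmp=$(mktemp -d)
echo "hello" > "$tmp/a"
ln "$tmp/a" "$tmp/b"   # hard link: same inode, no extra data stored

# the link count of the file is now 2
# (GNU stat uses -c %h; BSD/Mac OS X stat uses -f %l)
links=$(stat -c %h "$tmp/a" 2>/dev/null || stat -f %l "$tmp/a")
echo "link count: $links"

rm -rf "$tmp"
```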
Daily Backup
Similar to the hourly backup, two scripts are used to perform the daily backup. In addition to the home directory (/Users/chkuo), I also back up files in /scratch/chkuo and /storage/chkuo. /scratch/chkuo is where a large number of files are generated and removed during the day; backing it up hourly would require a large amount of disk I/O. /storage/chkuo, as the name implies, is the place for files that need long-term storage and has no need to be checked hourly.
To save disk space, files in the home directory (/Users/chkuo) that have not changed are hard-linked to the latest hourly backup (/Volumes/sparrow/hourly.0).
chkuo@sage[~/script]$ cat daily.sh
#!/bin/bash
if [ -d /Volumes/sparrow ]; then
    /Users/chkuo/script/sync.daily.sh
else
    exit 1
fi
chkuo@sage[~/script]$ cat sync.daily.sh
#!/bin/bash

target=/Volumes/sparrow/daily
max=9

# remove the oldest snapshot, if it exists
if [ -e $target.$max ]; then
    rm -rf $target.$max*
fi

# shift the other snapshot(s) back by one, if they exist
for (( i=$max; i>0; i=i-1 ))
do
    if [ -e $target.$(($i-1)) ]; then
        mv $target.$(($i-1)) $target.$i
    fi
    if [ -e $target.$(($i-1)).log ]; then
        mv $target.$(($i-1)).log $target.$i.log
    fi
    if [ -e $target.$(($i-1)).err ]; then
        mv $target.$(($i-1)).err $target.$i.err
    fi
done

# make the current snapshot
mkdir -p $target.0/Users
rsync -av --delete \
    --link-dest='/Volumes/sparrow/hourly.0' \
    --exclude='Library/Caches' \
    --exclude='.cpan/' \
    /Users/chkuo $target.0/Users \
    1> $target.0.log \
    2> $target.0.err

mkdir -p $target.0/scratch
rsync -av --delete \
    --link-dest=$target.1/scratch \
    /scratch/chkuo $target.0/scratch \
    1>> $target.0.log \
    2>> $target.0.err

mkdir -p $target.0/storage
rsync -av --delete \
    --link-dest=$target.1/storage \
    /storage/chkuo $target.0/storage \
    1>> $target.0.log \
    2>> $target.0.err

# update the timestamp
touch $target.0
Weekly Backup
Once again, hard links are used for files that have not changed since the last daily backup, to save disk space.
chkuo@sage[~/script]$ cat weekly.sh
#!/bin/bash
if [ -d /Volumes/sparrow ]; then
    /Users/chkuo/script/sync.weekly.sh
else
    exit 1
fi
chkuo@sage[~/script]$ cat sync.weekly.sh
#!/bin/bash

target=/Volumes/sparrow/weekly
max=9

# remove the oldest snapshot, if it exists
if [ -e $target.$max ]; then
    rm -rf $target.$max*
fi

# shift the other snapshot(s) back by one, if they exist
for (( i=$max; i>0; i=i-1 ))
do
    if [ -e $target.$(($i-1)) ]; then
        mv $target.$(($i-1)) $target.$i
    fi
    if [ -e $target.$(($i-1)).log ]; then
        mv $target.$(($i-1)).log $target.$i.log
    fi
    if [ -e $target.$(($i-1)).err ]; then
        mv $target.$(($i-1)).err $target.$i.err
    fi
done

# make the current snapshot
mkdir -p $target.0/Users
rsync -av --delete \
    --link-dest='/Volumes/sparrow/daily.0/Users' \
    --exclude='Library/Caches' \
    --exclude='.cpan/' \
    /Users/chkuo $target.0/Users \
    1> $target.0.log \
    2> $target.0.err

mkdir -p $target.0/scratch
rsync -av --delete \
    --link-dest='/Volumes/sparrow/daily.0/scratch' \
    /scratch/chkuo $target.0/scratch \
    1>> $target.0.log \
    2>> $target.0.err

mkdir -p $target.0/storage
rsync -av --delete \
    --link-dest='/Volumes/sparrow/daily.0/storage' \
    /storage/chkuo $target.0/storage \
    1>> $target.0.log \
    2>> $target.0.err

# update the timestamp
touch $target.0
Remote Server
The example above uses a locally mounted external hard drive for backup. For added safety, it is also possible to back up to a remote server using rsync. For this to work, it is necessary to set up key-based ssh authentication first, so that the scripts can log in to the remote server without a password prompt.
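The usual key setup with OpenSSH looks roughly like the following; the account and hostname below match the lab server used later on this page, and the cat/append one-liner is a fallback for systems where ssh-copy-id is not installed:

```shell
# generate a key pair if one does not exist yet (an empty passphrase
# allows unattended cron jobs; otherwise use ssh-agent)
ssh-keygen -t rsa

# copy the public key to the remote server's authorized_keys
cat ~/.ssh/id_rsa.pub | ssh chkuo@aphidhouse.biochem.arizona.edu \
    'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys'

# test: this should now run without prompting for a password
ssh chkuo@aphidhouse.biochem.arizona.edu hostname
```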
The first script calls the second script and writes the output to a log file.
chkuo@mesquite[~/script]$ cat aphidhouse.sh
#!/bin/bash
/Users/chkuo/script/sync.aphidhouse.sh > /Users/chkuo/script/log/aphidhouse.log 2>&1
The second script specifies the directories to be backed up and the address of the remote server.
chkuo@mesquite[~/script]$ cat sync.aphidhouse.sh
#!/bin/bash

# backup to Ochman/Moran Lab server
# aphidhouse.biochem.arizona.edu

# essential files
rsync -av --rsh="ssh -l chkuo" --delete ~/active_project aphidhouse.biochem.arizona.edu:~
rsync -av --rsh="ssh -l chkuo" --delete ~/bin aphidhouse.biochem.arizona.edu:~
rsync -av --rsh="ssh -l chkuo" --delete ~/plscript aphidhouse.biochem.arizona.edu:~
rsync -av --rsh="ssh -l chkuo" --delete ~/script aphidhouse.biochem.arizona.edu:~
rsync -av --rsh="ssh -l chkuo" --delete ~/test aphidhouse.biochem.arizona.edu:~
rsync -av --rsh="ssh -l chkuo" --delete ~/work_doc aphidhouse.biochem.arizona.edu:~
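Because these commands use --delete, a mistake in the source or destination path can remove files on the server. Before running a new sync for real, rsync's -n (--dry-run) flag is worth using; it lists what would be transferred or deleted without changing anything. For example, for the last directory above:

```shell
# dry run: -n reports the planned transfers and deletions only
rsync -avn --rsh="ssh -l chkuo" --delete ~/work_doc aphidhouse.biochem.arizona.edu:~
```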
Cron Jobs
Once the shell scripts are ready, the next step is to set up a crontab to specify when the scripts should be executed. The crontab is a simple text file in the following format:
chkuo@mesquite[~/script]$ cat mesquite.cron
# set up cron jobs
# the fields are:
# (1) minute (0 - 59)
# (2) hour (0 - 23)
# (3) day of month (1 - 31)
# (4) month (1 - 12)
# (5) day of week (0 - 6) (Sunday = 0)
# (6) command to be executed

# sync to external hard drive (hourly backup)
0 9-18 * * 1-5 /Users/chkuo/script/hourly.sh

# sync to external hard drive (daily backup)
30 8 * * 1-5 /Users/chkuo/script/daily.sh

# sync to external hard drive (weekly backup)
30 19 * * 5 /Users/chkuo/script/weekly.sh
In the example above, hourly.sh is executed every hour from 9 AM to 6 PM on weekdays, daily.sh is executed at 8:30 AM Monday through Friday, and weekly.sh is executed at 7:30 PM every Friday.
Use the crontab command to set up the schedule:
chkuo@mesquite[~/script]$ crontab mesquite.cron
To check if the cron jobs are set up correctly, use the following command:
chkuo@mesquite[~/script]$ crontab -l
# set up cron jobs
# the fields are:
# (1) minute (0 - 59)
# (2) hour (0 - 23)
# (3) day of month (1 - 31)
# (4) month (1 - 12)
# (5) day of week (0 - 6) (Sunday = 0)
# (6) command to be executed

# sync to external hard drive (hourly backup)
0 9-18 * * 1-5 /Users/chkuo/script/hourly.sh

# sync to external hard drive (daily backup)
30 8 * * 1-5 /Users/chkuo/script/daily.sh

# sync to external hard drive (weekly backup)
30 19 * * 5 /Users/chkuo/script/weekly.sh
The output should be identical to the crontab file used as the input earlier.