Rsync - synchronizes files and directories from one location to another

=Background=

rsync is a free software computer program for Unix systems which synchronizes files and directories from one location to another while minimizing data transfer using delta encoding when appropriate. An important feature of rsync not found in most similar programs/protocols is that the mirroring takes place with only one transmission in each direction.

rsync can copy or display directory contents and copy files, optionally using compression and recursion.

rsyncd, the rsync protocol daemon, uses the default TCP port of 873. rsync can also be used to synchronize local directories, or via a remote shell such as RSH or SSH. In the latter case, the rsync client executable must be installed on the near as well as the far host (the computer running the remote shell daemon). There also exists a utility called rdiff, which can be used for incremental backups.
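
The three ways of reaching the other side described above (local directories, a remote shell, and the rsyncd daemon) differ only in how the source or destination is written. The following is a minimal sketch; the function names are made up for illustration, and the host and module names you pass in are whatever your own setup uses.

```shell
#!/bin/bash
# Hypothetical wrapper names; each one just shows a different path syntax.

# Local copy: both paths are on this machine.
sync_local() {
    rsync -a "$1" "$2"
}

# Remote shell: rsync logs in through ssh and starts a second rsync
# on the far host, which is why rsync must be installed on both ends.
sync_over_ssh() {
    rsync -a -e ssh "$1" "$2:$3"
}

# Daemon: talk to rsyncd directly (TCP port 873 by default).
# Arguments: host, module name exported by the daemon, local destination.
sync_from_daemon() {
    rsync -a "rsync://$1/$2/" "$3"
}
```

For example, `sync_over_ssh /SOURCE/ linkstation /DEST/` would run `rsync -a -e ssh /SOURCE/ linkstation:/DEST/`, where `linkstation` stands in for your own host name.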

For Mac OS X there is a special version, rsyncX, which allows transferring resource forks. To run rsync on Microsoft Windows, the Cygwin package is necessary to provide the expected system interfaces. A package combination is available that includes rsync, Cygwin, and an installer, making it easier and more familiar to Windows users.

There are several well-written tutorials on using rsync.

If you are interested in fully automated, ssh-secured backups from one machine to another, see the third example below and read the easy Debian Admin HOW-TO on setting up ssh/rsa keys, so that your computer can securely back itself up to your Linkstation while you are not there.

=Installation=

Compile from source
On any distribution (FreeLink or OpenLink):
 * 1) Make sure the Precompiled C development environment is installed and running on the LS first.
 * 2) Get the source, make and install.

 wget http://samba.org/ftp/rsync/rsync-3.0.2.tar.gz
 tar xfzv rsync-3.0.2.tar.gz
 cd rsync-3.0.2
 ./configure
 make
 su root
 make install

PowerPC
From the Yahoo! Linkstation General Group:

 wget http://ls.jcedata.net/rsync
 chmod a+x rsync
 cp rsync /usr/bin

FreeLink
Use apt-get to install rsync:

 apt-get install rsync

PowerPC
 * Install Ipkg on the Linkstation (for end-users) and enable the NSLU2 Feed: Ipkg Package List: PowerPC
 * Install rsync:

 ipkg install rsync

MIPSel
Alexander Skwar has created a fairly extensive selection of Ipkg packages for the MIPSel (LS2) LinkStation. Install Ipkg and enable his feed:
 * Ipkg on the Linkstation (for end-users)
 * Experimental "unstable" ipk Packages for the MIPSEL Linkstation
 * Install rsync:

 ipkg install rsync

LinkStations
The manufacturer (Buffalo) of Linkstations has installed sshd and rsync software on these machines, but has not enabled it, probably because of difficulties in maintaining warranty when a backup device is converted to use as a general computer. It is relatively easy to enable this software if you are willing to void your warranty; see:
 * Enabling Rsync on a Linkstation Mini

=Examples=

 rsync -a /SOURCE/ /DEST/`date +%Y-%m-%d`/
 * Create a backup named with today's date (formatted yyyy-mm-dd)

 #!/bin/sh
 BACKUP_DATE=`date +%Y-%m-%d`
 rsync -a --delete --link-dest=/DEST/most-recent-backup /SOURCE/ /DEST/$BACKUP_DATE/
 rm /DEST/most-recent-backup
 ln -s /DEST/$BACKUP_DATE /DEST/most-recent-backup
 * A script to create a backup, named by date, which saves space by creating hard links to files that are already backed up. It requires a symbolic link to the most recently created backup directory (like the one created by the last line of the script).
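
Why hard links save space can be seen without rsync at all. The sketch below is an illustration only: the dated directory names and the file name are made up, and plain `ln` stands in for what `--link-dest` does when a file has not changed since the previous backup.

```shell
#!/bin/bash
# Illustration only: directory and file names are made up.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/2024-01-01" "$demo_dir/2024-01-02"
echo "unchanged song data" > "$demo_dir/2024-01-01/track.mp3"

# Roughly what --link-dest does for a file that is unchanged since the
# previous backup: add a second name for the same data instead of copying it.
ln "$demo_dir/2024-01-01/track.mp3" "$demo_dir/2024-01-02/track.mp3"

# -ef is true when both names refer to the same file on disk.
if [ "$demo_dir/2024-01-01/track.mp3" -ef "$demo_dir/2024-01-02/track.mp3" ]; then
    shared=yes
else
    shared=no
fi
echo "data shared between backups: $shared"

# Deleting the older backup leaves the data reachable through the newer one.
rm -r "$demo_dir/2024-01-01"
kept=$(cat "$demo_dir/2024-01-02/track.mp3")
echo "still readable after deleting the old backup: $kept"

rm -r "$demo_dir"
```

Each dated directory therefore looks like a full backup, while unchanged files occupy disk space only once, and old backups can be removed in any order.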

echo ========kurohg mediarippertunes backup started $(date) ====== >> /var/log/musicbackuplog.txt ; rsync -av --rsh=ssh /mnt/mediarippertunes/ 10.0.1.10:/mnt/share/mediarippertunes >> /var/log/musicbackuplog.txt ; echo ======== end of backup run at $(date) ========  >> /var/log/musicbackuplog.txt ; echo   >>  /var/log/musicbackuplog.txt
 * This one-liner is in my crontab, giving me a daily backup of my KuroboxHG's MP3s as it rips them, and a log of the backup. Use it either as a one-liner, or split it into separate lines and use it as a bash script. It does the following:
 * prints a header with a timestamp
 * executes a secure, ssh-keyed machine-to-machine transfer, gives an rsync log, along with a record of what files are updated, and the average transfer speed of the backup
 * prints a footer with timestamp, and then a blank line
 * all the output is appended, so that the log is cumulative (I also have logrotation enabled)

echo ========FreeLink /dev/hda3 JFS backup started $(date) ====== >> /var/log/weeklybackuplog.txt ; rsync -av /mnt/share/shareddirs/ /mnt/share/usb0/backups/shareddirs >> /var/log/weeklybackuplog.txt ; echo ======end of backup run at $(date) =======  >> /var/log/weeklybackuplog.txt ; echo >> /var/log/weeklybackuplog.txt
 * Here is another one-liner which, using cron, gives me a weekly backup of my LS-HG's data partition hda3 onto a USB hard drive and logs everything. Use it either as a one-liner, or split it into separate lines and use it as a bash script. It does the following:
 * prints a header with a timestamp
 * executes the backup & shows the rsync log, along with a record of what files are updated, and the average transfer speed of the backup
 * prints a footer with timestamp, and then a blank line
 * all the output is appended, so that the log is cumulative (I also have logrotation enabled)
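
Both one-liners follow the same pattern (header, command, footer, blank line, all appended to one log), so the multi-line script form could look something like the sketch below. The function name `run_logged` is made up for illustration, and recording the exit status in the footer is an addition that the one-liners above do not have.

```shell
#!/bin/bash
# run_logged LOGFILE LABEL COMMAND [ARGS...]
# Hypothetical helper: appends a header, the command's output, a footer
# and a blank line to LOGFILE, then returns the command's exit status.
run_logged() {
    local logfile="$1"
    local label="$2"
    local status
    shift 2
    {
        echo "======== $label backup started $(date) ======"
        "$@"
        status=$?
        # The exit status in the footer is an addition, not in the one-liners.
        echo "======== end of backup run at $(date), exit status $status ========"
        echo
    } >> "$logfile" 2>&1
    return "$status"
}

# Possible use, with the paths from the first one-liner:
#   run_logged /var/log/musicbackuplog.txt "kurohg mediarippertunes" \
#       rsync -av --rsh=ssh /mnt/mediarippertunes/ 10.0.1.10:/mnt/share/mediarippertunes
```

Passing the command as arguments keeps the logging in one place, so the daily and the weekly job can share the same helper.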

= Using keygen and other utils for automatic, unattended login via ssh =

If you want to have your rsync happen automatically, then set up keys for ssh. Read the following for instructions:

http://freebsd.peon.net/quickies/21/

http://www.debianadmin.com/ssh-your-debian-servers-without-password.html
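
In outline, the key setup described in those guides comes down to two steps: create a key pair without a passphrase, then install the public half on the machine that receives the backup. The sketch below assumes OpenSSH's `ssh-keygen` and `ssh-copy-id` are available (the latter is not present on every system); the function names, key file name, and `backup@linkstation` are hypothetical.

```shell
#!/bin/bash
# make_backup_key KEYFILE
# Create an RSA key pair with an empty passphrase (-N ""), so that a cron
# job can use it without prompting. Writes KEYFILE and KEYFILE.pub.
make_backup_key() {
    ssh-keygen -q -t rsa -N "" -f "$1"
}

# install_backup_key KEYFILE USER@HOST
# Append the public key to the remote account's authorized_keys.
# Asks for the remote password once; later logins use the key.
install_backup_key() {
    ssh-copy-id -i "$1.pub" "$2"
}

# Possible use (hypothetical names):
#   make_backup_key "$HOME/.ssh/backup_rsa"
#   install_backup_key "$HOME/.ssh/backup_rsa" backup@linkstation
#   rsync -a -e "ssh -i $HOME/.ssh/backup_rsa" /SOURCE/ backup@linkstation:/DEST/
```

A key without a passphrase can be used by anyone who can read the file, so keep it readable only by the account that runs the backup.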

=Editing Rsync rsnapshot backup for easy LS archiving=

rsnapshot is a script that uses rsync to create a space saving archive of data (see http://www.rsnapshot.org/ for more details). Usually you configure rsnapshot to do periodical backups of certain disk areas. The backup destination may even be a remote site. Anybody who has a remote Unix server available does not need to read the following lines. You are better off doing a straight rsnapshot backup to your Unix machine.

But if you only have a Windows PC available, there are some limitations. Because Unix and Windows filesystems differ (ext3 vs. NTFS), a 1:1 backup from the LS to a Windows PC is not possible. Even if you run Cygwin, for example, some permissions and ownerships will get lost if you simply rsnapshot to your PC.

The following script works around this. It assumes that you have an rsync receiver available at the backup target; the Cygwin tools are a very good choice for providing this. rsync_rsnapshot packages all rsnapshot backups on the Unix side (your LS) and transfers only a single TAR file to the Windows PC. This has one major drawback: rsync's great strength is that it transfers only the data that has changed, and it does this well when comparing individual files. The delta algorithm is far less efficient when applied to a single compressed tar file. On the other hand, the network connection between the LS and the backup destination is usually fast, so transferring larger volumes of data does not matter too much.

You first need to configure rsnapshot to create a local backup. The following script is then parameterized by setting some global variables:


 * RS_ROOT: The root folder for your archive on the LS (you may need to set aside a considerable amount of space for this).
 * TAR_FILE: The name of the tar file on the local (LS) destination.
 * RSYNC_SERVER_DEST: The FQDN of the PC running the Cygwin tools with a listening SSHD and RSYNC installed (any Unix box would be fine, though).
 * RS_MOST_RECENT: The subfolder name of the rsnapshot tree that has the largest update rate. If you're doing daily backups, this is usually "daily.0"; if running weekly backups, it should be "weekly.0".
 * RSYNC_USER_DEST: The name of the user on the destination box.
 * RSYNC_FOLDER_DEST: The destination folder on the Cygwin box.
 * RSYNC_SSH_DSA: Name of the key file for the DSA private ssh key.

 #!/bin/bash

 RS_ROOT='/share/backup/rsnapshot'
 RS_MOST_RECENT='weekly.0'
 RS_LOCK_FILE='/var/run/rsnapshot.pid'
 MY_LOCK_FILE='/var/run/rsync_rsnapshot.pid'
 MY_NAME=`basename $0`
 TAR_FILE='/share/backup/rsnapshot/rs_nas1.tgz'
 LOG_TAG='rsync_rsnapshot'
 RSYNC_SERVER_DEST='dest3'
 RSYNC_USER_DEST='sshrsync'
 RSYNC_FOLDER_DEST='/cygdrive/n/backup/nas/'
 RSYNC_SSH_DSA='dest3_backup.id_dsa'

 # check for existence of own lockfile
 if [ -e "${MY_LOCK_FILE}" ]; then
   # check if pid from lockfile does still exist
   LOCK_PID=`cat ${MY_LOCK_FILE}`
   LOCK_PGM=`ps -p ${LOCK_PID} -o comm=`

   # check if the lock pid is valid
   if [ "${LOCK_PGM}" = "${MY_NAME}" ]; then
     logger -p syslog.info -t ${LOG_TAG} "previous run is still active: ${LOCK_PID}"
     exit
   else
     # write log record
     logger -p syslog.error -t ${LOG_TAG} "removing old lock file with dead or wrong process: ${LOCK_PGM}"
     rm ${MY_LOCK_FILE}
   fi
 fi

 # create lockfile
 echo $$ > ${MY_LOCK_FILE}

 # check if everything went ok
 LOCK_PID=`cat ${MY_LOCK_FILE}`
 if [ $$ -ne ${LOCK_PID} ]; then
   # lockfile could not be created properly
   logger -p syslog.error -t ${LOG_TAG} "could not create lock file, conflict with pid=${LOCK_PID}"
   exit 1
 fi

 # check if tar file exists or if folder has newer timestamp than the tar file
 if [ ${RS_ROOT}/${RS_MOST_RECENT} -nt ${TAR_FILE} ]; then
   logger -p syslog.info -t ${LOG_TAG} "start tar file creation"

   # exit if rsnapshot pid file exists
   if [ -e "${RS_LOCK_FILE}" ]; then
     # write log record
     logger -p syslog.warning -t ${LOG_TAG} "rsnapshot run in progress, exiting..."
     rm ${MY_LOCK_FILE}
     exit
   fi

   # generate tar file (overwrite old one if it exists)
   nice tar --numeric-owner --one-file-system --preserve --exclude=${TAR_FILE} -czPf ${TAR_FILE} ${RS_ROOT}
   chmod 600 ${TAR_FILE}

   # write log record
   logger -p syslog.info -t ${LOG_TAG} "new tar file created"
 fi

 # rsync tar archive to destination server
 logger -p syslog.info -t ${LOG_TAG} "start/check for rsync transfer"
 RSYNC_OUT=`nice rsync -e "ssh -i $HOME/.ssh/${RSYNC_SSH_DSA}" -av --delete-excluded --timeout=30 --partial --whole-file ${TAR_FILE} ${RSYNC_USER_DEST}@${RSYNC_SERVER_DEST}:${RSYNC_FOLDER_DEST}`
 # save the rsync exit status before any other command overwrites $?
 RSYNC_STATUS=$?

 # write log record
 if [ ${RSYNC_STATUS} -eq 0 ]; then
   logger -p syslog.info -t ${LOG_TAG} "${RSYNC_OUT}"
 else
   logger -p syslog.error -t ${LOG_TAG} "rsync failed (${RSYNC_STATUS})"
   rm ${MY_LOCK_FILE}
   exit ${RSYNC_STATUS}
 fi

 # remove lock file
 rm ${MY_LOCK_FILE}
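
The script removes its lock file by hand on every exit path. One possible variation, sketched below under stated assumptions, lets the shell do that with an `EXIT` trap. The function name `acquire_lock` is made up, and the sketch swaps the process-name comparison used above for a plain `kill -0` liveness test, which is simpler but would also treat an unrelated process that happens to reuse the pid as the lock holder.

```shell
#!/bin/bash
# acquire_lock LOCKFILE
# Hypothetical helper: returns 1 if a live process already holds the lock;
# otherwise writes our pid and arranges for the file to be removed when
# the script exits, whichever exit path is taken.
acquire_lock() {
    if [ -e "$1" ]; then
        lock_holder=$(cat "$1")
        # kill -0 sends no signal; it only reports whether the pid exists.
        if [ -n "$lock_holder" ] && kill -0 "$lock_holder" 2>/dev/null; then
            return 1
        fi
        rm -f "$1"    # stale lock left behind by a dead process
    fi
    echo $$ > "$1"
    # Global on purpose: the EXIT trap reads this variable when it fires.
    lock_path="$1"
    trap 'rm -f "$lock_path"' EXIT
}

# Possible use at the top of a backup script:
#   acquire_lock /var/run/rsync_rsnapshot.pid || exit 0
```

With the trap in place, the early `exit` branches no longer need their own `rm` of the lock file.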

The script tries to connect to the backup destination all day long. Whenever the destination is up and running, the script transfers the latest tar file, so the level of safety depends on the availability of your backup box.

This gives you a convenient way to create an archive of your data with history on the LS and to transfer the whole archive to another machine, usually your working PC. Personally, I use rsync_rsnapshot to keep an up-to-date mirror of my system partition. For very large data volumes, an rsync approach tends to back up more slowly and should be reconsidered. rsync_rsnapshot is best suited to providing archive and history information for fast-changing environments.

= References =