Archive and synchronization with unison and rsync

I use rsync as my primary backup utility since it archives files fairly efficiently (note: when backing up to FAT32 USB drives, make sure --modify-window=1 is used). I recently discovered unison, and am using it for real-time synchronization between directories on two Linux computers. I really like it.

Because unison works so well for syncing my two computers in real time, I wanted to compare it to rsync for backing up files (one-way) to different computers and to USB drives (use -force /first/location for one-way syncs). This post does a good job describing unison's capabilities. I tried unison in batch mode to sync between two Linux computers, and it too was quite fast (even faster than rsync).

However, it is horrendously slow when transferring to a FAT drive since it checksums every file; this is discussed in the comments here.

For my current backup needs, I can use either rsync or unison for backups between Linux computers or ext-formatted drives; I will probably rely more on rsync since unison isn't installed by default on all Linux machines and the same version is required on both linked machines. For archiving to a FAT32 USB drive, I will rely on rsync. For real-time synchronization, I will rely on unison.
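For reference, a one-way unison invocation looks like the following sketch (the two roots are placeholder paths, so this is not runnable as-is):

```shell
# example roots; -batch suppresses interactive prompts, and
# -force makes the first root authoritative, giving a one-way sync
unison /first/location /second/location -batch -force /first/location
```

Unlike rsync, unison keeps a per-replica archive of file states, which is what makes its repeated batch runs so fast on ext-formatted drives.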

Real time file synchronization like Dropbox via Unison

Dropbox is a very nice tool for real-time synchronization. It works very well to keep files from multiple devices (computers, phones, etc.) in sync. I use it mainly as a cloud-based backup for some of my files. However, it's been in the headlines recently due to security and privacy concerns, leading to calls for encrypting your files before syncing them with Dropbox.

I've always contemplated running my own Dropbox-like service to have yet another safe backup of my files. Besides knowing exactly where my data are stored, I would have (in theory) an unlimited amount of space. This post and this post outline solutions based on open source tools such as OpenSSH (for encrypted file transfer), lsyncd (for monitoring files), and Unison (an rsync-like tool). I attempted this setup, but failed to get things working with lsyncd (see the extensive discussion with the author in the comments).

I stumbled upon this post that outlines a solution based on the bleeding-edge version of Unison, which includes the -repeat watch option for monitoring files. However, that author's solution targets Mac OS X. I played around with the new Unison and arrived at a setup I am pretty satisfied with for my Ubuntu machines (easily extended to Mac and Windows, I'm sure). I will outline my setup in this post. Note that I have password-less ssh set up so that I can ssh into my server without typing a password. Also, I am using Unison version 2.44.2, which I downloaded via svn around 7/16/2011.

Installing Unison

The same version of Unison must be installed on both the client and the server. Both my client and server run Ubuntu (11.04 and 10.04 server, respectively). On the client, the folder I would like to sync is /home/vinh/Documents; the server's destination is /home/vinh/Backup/Documents.

sudo apt-get install ocaml python-pyinotify
## install the .deb file via `dpkg -i` if python-pyinotify is not in your repository
svn checkout
cd trunk
make NATIVE=true UISTYLE=text
## `make install` installs into $HOME/bin/
sudo cp src/unison /usr/local/bin/
sudo cp src/ /usr/local/bin/

Everything following is done on the client computer.



#! /bin/bash

## can't have extension in filename

# ssh username@server.ip -f -N -L 9922:server.ip:22 ## minimal
sudo -u local.username ssh username@server.ip -Y -C -f -N -L 9922:server.ip:22

## multiple instances can run in case of disconnect and reconnect

This script forwards my local port 9922 to the server's port 22 via ssh. That way, I can run ssh username@localhost -p 9922 if I want to connect to the server. I do this so that file synchronization can resume after a disconnect and reconnect (changed files do not get synced after a reconnect if I connect to the remote server directly).

Run sudo cp unisonNetworkOnPortForward /etc/network/if-up.d/ on Debian or Ubuntu. By doing this, the script will be executed whenever the computer connects to a network (this step will differ on non-Debian-based distros). Note that multiple instances of this port forwarding will be present if the network is disconnected and reconnected multiple times. This makes things a little ugly, but I haven't really noticed any problems. Also note that the script's filename cannot have an extension or things will not work.

#! /bin/bash

## in /etc/rc.local, add:
## sudo -u local.username /path/to/ &

unison default ~/Documents ssh://username@localhost:9922//home/vinh/Backup/Documents -repeat watch -times -logfile /tmp/unison.log
# -times: sync timestamps
# -repeat watch: real-time synchronization via pyinotify

Add to /etc/rc.local before the last line:

sudo -u local.username /path/to/ &

This turns on unison sync at startup (unison will keep trying to connect to the server if it is disconnected). Again, this step differs on non-Debian-based distros.

#! /bin/bash

unison -batch -times ~/Documents ssh://username@localhost:9922//home/vinh/Backup/Documents -logfile /tmp/unison.log

Run this script when you want to manually sync the two folders. I add the following line to cron (crontab -e) to have a manual sync every day at 12:30pm:

30 12 * * * /path/to/

I set up this cron job because unison only syncs files that change while its process is running. This daily backup makes sure all my files are in sync at least once a day.

#! /bin/bash

ps aux | grep '[u]nison' | awk '{print $2}' | xargs -r kill -9  # the [u] keeps grep from matching itself; -r skips kill when no PIDs are found

I run this script on the client or server when I want to clean up unison processes. The one drawback of unison's monitor feature currently is that the unison -server process on the server is not killed when the unison process stops on the client side. After multiple reconnects, this leaves a lot of unison processes running on the server. Although I haven't seen any issues from this, the script should make cleaning up the processes easier.

Start the service

Once these scripts are in their correct locations, first run the manual sync script to do the initial sync. Then restart the computer. You should see a unison process by executing ps aux | grep unison on both the client and server. Also, you should see an ssh process corresponding to the port forwarding by executing ps aux | grep ssh. Run touch foo.txt in the directory that you are watching and see if it appears on the server. Remove it and see if it gets deleted. Good luck!

What are some drawbacks of this setup compared to Dropbox? Well, I can't revert to files from a previous date, and I don't have a dedicated Android app to access the files with. To solve the former, you can set up another cron job that syncs to a different location on your server every few days, giving you access to files that are a few days old. To solve the latter, I'm sure there are Android apps that can access files via the sftp protocol.
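That extra cron job could look something like the following hypothetical crontab entry; the Documents-weekly destination directory is made up for this example:

```shell
## hypothetical crontab entry: every Sunday at 3:30am, mirror the
## documents to a second, slower-moving directory on the server
## (Documents-weekly is an example path, not part of the setup above)
30 3 * * 0 rsync -a -e "ssh -p 9922" /home/vinh/Documents/ username@localhost:/home/vinh/Backup/Documents-weekly/
```

Since this copy only updates weekly, a file deleted or mangled by mistake during the week can still be recovered from it.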

ControlMaster in OpenSSH – speeding up editing files remotely with emacs + tramp

So I was googling around to find out how to change the shell in tramp for emacs, and I ran into this and this.

When editing remote files with emacs using tramp, opening and saving files can take a bit of time due to re-logging in and authenticating. I discovered that OpenSSH has a feature that lets one reuse an existing connection to a remote host when opening new connections to that host. This is quite cool. Place the following in ~/.ssh/config:

Host *
  ControlMaster auto
  ControlPath ~/.ssh/master-%r@%h:%p
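Once a master connection is up, OpenSSH's -O flag can talk to it directly; the hostname below is an example:

```shell
# these query and close the shared master connection for a host
ssh -O check user@server.example.com   # is a master running for this host?
ssh -O exit  user@server.example.com   # close the shared connection
```

With the master alive, each new tramp open or save piggybacks on the existing authenticated connection instead of negotiating a fresh login.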

Making my personal website and course websites: iWeb + rsync

So I've been using iWeb on my MacBook to create my personal webpage and potential course websites. I use it because I don't really know HTML, and I don't think I need to learn it right now; point-and-click page creation is fine with me for the time being. Actually, I would prefer to create the pages in Google Sites and export them to my professional host, i.e., the uci-ics domain. However, that option isn't available from Google yet.

From my main webpage, I can host my personal homepage and course websites. However, when I use iWeb to publish multiple sites to the same destination via the sftp option, things get funny because iWeb puts a default index.html file in each directory, and this file redirects you to a page. As I upload multiple sites to that one root domain, the redirection gets a little fuzzy. I worked around this by uploading the course websites first, then my personal site (root directory) last. Then, with every update, I just use "Publish Site Changes." But what if I want to add more pages? I didn't like this, and I finally did something about it.

I got my information from UCI's EEE help page on iWeb.

Now, what I do is this:

  1. ICS servers: websites are in ~/public_html/.
  2. Create ~/public_html and ~/iWebSites on my MacBook.
  3. Publish my sites to a local folder, ~/iWebSites, instead of using sftp, one directory per site.
  4. After every update and publish to the local folder, run the following script (supposing I have two sites, one personal and one for a class website):
#! /bin/bash

rsync --progress -av ~/iWebSites/Vinh_Q._Nguyen/ ~/public_html/
rsync --progress -av ~/iWebSites/stat8 ~/public_html/
rsync --progress -av -e ssh ~/public_html/

Now things work great! Good thing I have password-less ssh!

The next thing to try is writing HTML in org-mode (emacs), which I found out about through Michael Zeller's comment here (he makes his website with it).