Real time file synchronization like Dropbox via Unison

Linux
Security
Author

Vinh Nguyen

Published

July 18, 2011

Dropbox is a very nice tool for real-time synchronization. It works very well to keep files from multiple devices (computers, phones, etc.) in sync. I use it mainly as a cloud-based backup for some of my files. However, it's been in the headlines recently due to security and privacy concerns, leading to calls for encrypting your files prior to syncing with Dropbox.

I've always contemplated running my own Dropbox-like service to have yet another safe backup of my files. Besides knowing exactly where my data are stored, I have (in theory) an unlimited amount of space. This post and this post outline solutions based on open source tools such as OpenSSH (for encrypted file transfer), lsyncd (for monitoring files), and Unison (an rsync-like tool). I attempted this setup, but failed to get things working with lsyncd (see the extensive discussion with the author via the comments).

I stumbled upon this post that outlines a solution based on the bleeding-edge version of Unison, which adds the -repeat watch option for monitoring files. However, the author outlined a solution for Mac OS X. I played around with the new Unison and arrived at a solution I am pretty satisfied with for my Ubuntu machines (easily extended to Mac and Windows, I'm sure). I will outline my setup in this post. Note that I have password-less ssh set up so that I can ssh into my server without typing in the password. Also, I am using Unison version 2.44.2, which I downloaded via svn around July 16, 2011.

Installing Unison

The same version of Unison must be installed on both the client and the server. Both my client and server run Ubuntu (11.04 and 10.04 server, respectively). On the client, the folder I would like to sync is /home/vinh/Documents; the server's destination is /home/vinh/Backup/Documents.

sudo apt-get install ocaml python-pyinotify
## install the .deb file from http://packages.ubuntu.com/search?keywords=python-pyinotify via `dpkg -i` if python-pyinotify is not in your repository
svn checkout https://webdav.seas.upenn.edu/svn/unison
cd unison/trunk
make NATIVE=true UISTYLE=text
## `make install` installs into $HOME/bin/
sudo cp src/unison /usr/local/bin/
sudo cp src/fsmonitor.py /usr/local/bin/
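
Since both ends need the same Unison version, it can be worth comparing them before the first sync. The following is only a minimal sketch and not part of the original setup: the function names are made up for illustration, and it assumes that `unison -version` prints a single line ending in the version number (such as `unison version 2.44.2`) and that username@server.ip is reachable over password-less ssh.

```bash
#!/bin/bash

# Print the last field of the first `unison -version` line read from stdin,
# e.g. "unison version 2.44.2" -> "2.44.2" (the output format is an assumption).
extract_version() {
    awk 'NR == 1 {print $NF}'
}

# Succeed only if both version strings are non-empty and identical.
versions_match() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Compare the client's Unison with the one found on the server's PATH.
check_unison_versions() {
    local local_v remote_v
    local_v=$(unison -version | extract_version)
    remote_v=$(ssh username@server.ip unison -version | extract_version)
    if versions_match "$local_v" "$remote_v"; then
        echo "OK: both sides run Unison $local_v"
    else
        echo "Mismatch: client '$local_v', server '$remote_v'" >&2
        return 1
    fi
}
```

Calling check_unison_versions after installing on both machines should report either the shared version or the two differing ones.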

Everything following is done on the client computer.

Scripts

unisonNetworkOnPortForward:

#! /bin/bash

## http://ubuntuforums.org/showpost.php?p=6679437&postcount=4
## can't have extension in filename http://www.duncanelliot.com/blog/?p=28

# ssh username@server.ip -f -N -L 9922:server.ip:22 ## minimal
sudo -u local.username ssh username@server.ip -Y -C -f -N -L 9922:server.ip:22

## multiple instances can run in case of disconnect and reconnect

This script forwards my local port 9922 to the server's port 22 via ssh. That way, I can run ssh username@localhost -p 9922 if I want to connect to the server. I do this so that file synchronization can resume after a disconnect and reconnect (changed files do not get synced after a reconnect if I connect to the remote server directly).

Run sudo cp unisonNetworkOnPortForward /etc/network/if-up.d/ on Debian or Ubuntu. By doing this, the script will be executed whenever the computer connects to a network (this will be different for non-Debian-based distros). Note that multiple instances of this port forwarding will be present if the network is disconnected and reconnected multiple times. This makes things a little ugly, but I haven't noticed any real problems. Also note that the script name cannot have a file extension or things will not work.
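
If the duplicate forwards bother you, one possible workaround is to start the tunnel only when nothing is listening on local port 9922 yet. This is a rough sketch rather than something from the original setup: the function names are hypothetical, and it assumes a bash built with /dev/tcp support.

```bash
#!/bin/bash

# Return 0 if something accepts TCP connections on HOST:PORT.
# Relies on bash's /dev/tcp pseudo-device (assumed to be enabled in your bash).
port_open() {
    local host=$1 port=$2
    (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

# Start the forward from the article only if local port 9922 is still free.
ensure_tunnel() {
    if port_open localhost 9922; then
        echo "port 9922 already forwarded, nothing to do"
    else
        sudo -u local.username ssh username@server.ip -Y -C -f -N -L 9922:server.ip:22
    fi
}

# In the if-up.d script, call the function instead of ssh directly:
# ensure_tunnel
```

A stale ssh process whose connection has died can still hold the port open, so this guard trades the pile of duplicates for the chance of a dead tunnel; the original behaviour is the safer default if reconnects are frequent.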

unisonMonitor.sh:

#! /bin/bash

## in /etc/rc.local, add:
## sudo -u local.username /path/to/unisonMonitor.sh &

unison default ~/Documents ssh://username@localhost:9922//home/vinh/Backup/Documents -repeat watch -times -logfile /tmp/unison.log
# -times: sync timestamps
# -repeat watch: real-time synchronization via pyinotify

Add to /etc/rc.local before the last line:

sudo -u local.username /path/to/unisonMonitor.sh &

This turns on unison sync at startup (unison will keep trying to connect to the server if it is disconnected). Again, this implementation will be different for non-Debian-based distros.

unisonSync.sh:

#! /bin/bash

unison -batch -times ~/Documents ssh://username@localhost:9922//home/vinh/Backup/Documents -logfile /tmp/unison.log

Run unisonSync.sh when you want to manually sync the two folders. I also add the following line to cron (crontab -e) to run a full sync every day at 12:30pm:

30 12 * * * /path/to/unisonSync.sh

I set up this cron job because unisonMonitor.sh will only sync files that have changed while the unison process is running. This daily backup makes sure all my files are in sync at least once a day.
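
Because cron runs the sync unattended, it may help to record whether each run succeeded. The sketch below wraps the same unison command and logs its exit status. The status meanings follow the Unison manual as I understand it (0 success, 1 files skipped, 2 non-fatal failures, 3 fatal error), while the function names and the /tmp/unisonSync-status.log path are made up for illustration.

```bash
#!/bin/bash

# Translate a Unison exit status into a short message
# (meanings taken from the Unison manual; treat them as an assumption).
describe_status() {
    case $1 in
        0) echo "everything in sync" ;;
        1) echo "some files skipped" ;;
        2) echo "non-fatal failures during transfer" ;;
        *) echo "fatal error or interrupted" ;;
    esac
}

# Run the same batch sync as unisonSync.sh and append the outcome to a status log.
run_sync() {
    unison -batch -times ~/Documents ssh://username@localhost:9922//home/vinh/Backup/Documents -logfile /tmp/unison.log
    local status=$?
    echo "$(date '+%F %T') unison exited with $status: $(describe_status "$status")" >> /tmp/unisonSync-status.log
    return "$status"
}

# Call run_sync at the end of the script that cron executes:
# run_sync
```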

unisonKill.sh:

#! /bin/bash

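## exact-name match avoids killing grep or this script (its name contains "unison")
pkill -9 -x unison
## fsmonitor.py: match against the full command line
pkill -9 -f fsmonitor.py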

I run this script on the client or server when I want to clean up unison processes. The one drawback of Unison's monitor feature at present is that the unison -server and fsmonitor.py processes on the server are not killed when the unison process stops on the client side. After multiple connects, this leaves a lot of unison processes running on the server. Although I haven't seen any issues from this, the unisonKill.sh script makes cleaning up the processes easier.
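
Before killing anything, it can be reassuring to look at what is actually running. The helper below is a small sketch (the function name is hypothetical) that uses pgrep to collect PIDs and ps to show how long each process has been alive, which makes leftovers from old connections easy to spot.

```bash
#!/bin/bash

# list_procs PGREP_ARGS...: show PID, elapsed time and command line of
# the processes that pgrep selects; prints nothing if there are none.
list_procs() {
    local pids
    pids=$(pgrep -d, "$@") || return 0   # pgrep exits non-zero when nothing matches
    ps -o pid,etime,args -p "$pids"
}

# Examples:
# list_procs -x unison          # processes named exactly "unison"
# list_procs -f fsmonitor.py    # anything with fsmonitor.py in its command line
```

Matching fsmonitor.py with -f looks at the whole command line, so it works regardless of the name the process is listed under.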

Start the service

Once these scripts are in their correct locations, first run unisonSync.sh to perform the initial sync. Then restart the computer. You should see unison and fsmonitor.py processes by executing ps aux | grep unison on the client and server. Also, you should see an ssh process corresponding to the port forwarding by executing ps aux | grep ssh. Run touch foo.txt in the directory that you are watching and see if it appears on the server. Remove it and see if it gets deleted. Good luck!
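
The touch-and-remove check can also be scripted. The following is a minimal sketch under a few assumptions: the function names are made up, the 30-second waits are arbitrary, and the server is reached through the forwarded port 9922 with password-less ssh.

```bash
#!/bin/bash

# wait_for SECONDS CMD...: retry CMD about once per second until it
# succeeds or SECONDS attempts have been made.
wait_for() {
    local timeout=$1 i=0
    shift
    while [ "$i" -lt "$timeout" ]; do
        "$@" && return 0
        sleep 1
        i=$((i + 1))
    done
    return 1
}

# Create a file in the watched folder, wait for it to appear on the server,
# then delete it and wait for the deletion to propagate.
smoke_test() {
    local name="unison-smoke-$$.txt"
    local remote="/home/vinh/Backup/Documents/$name"
    touch "$HOME/Documents/$name"
    wait_for 30 ssh -p 9922 username@localhost test -e "$remote" || return 1
    rm "$HOME/Documents/$name"
    wait_for 30 ssh -p 9922 username@localhost test ! -e "$remote"
}
```

Running smoke_test && echo "sync works" after the reboot gives a quick yes/no answer without logging in to the server by hand.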

What are some drawbacks of this setup compared to Dropbox? Well, I can't revert files to an earlier version, and I don't have a dedicated Android app for accessing the files. To solve the former, you can set up another cron job that syncs to a different location on your server every few days, giving you access to files that are a few days old. To solve the latter, I'm sure there are Android apps that allow you to access files via the sftp protocol.
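
The "older copy" idea could look something like the sketch below. The Documents-snapshot folder, the log file name, the function names, and the unisonSnapshot.sh script name in the cron comment are all assumptions for illustration; only the unison options are taken from unisonSync.sh.

```bash
#!/bin/bash

# Hypothetical second destination on the server, separate from Backup/Documents.
SNAPSHOT_DIR=/home/vinh/Backup/Documents-snapshot

# Build the remote root through the forwarded port, in the same form
# that unisonSync.sh uses.
snapshot_root() {
    echo "ssh://username@localhost:9922/${SNAPSHOT_DIR}"
}

# Sync the client folder to the snapshot location.
sync_snapshot() {
    unison -batch -times ~/Documents "$(snapshot_root)" -logfile /tmp/unison-snapshot.log
}

# Call sync_snapshot at the end of the script, and schedule it roughly every
# three days with a crontab entry such as:
# 0 3 */3 * * /path/to/unisonSnapshot.sh
```

Between runs, the snapshot folder holds the files as they were at the last snapshot, so a file deleted or overwritten by mistake can be recovered from there until the next run.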