Real-time file synchronization like Dropbox via Unison

Dropbox is a very nice tool for real-time synchronization. It works very well to keep files from multiple devices (computers, phones, etc.) in sync. I use it mainly as a cloud-based backup for some of my files. However, it has been in the headlines recently due to security and privacy concerns, leading to calls for encrypting your files prior to syncing with Dropbox.

I’ve long contemplated running my own Dropbox-like service to have yet another safe backup of my files. Besides knowing exactly where my data are stored, I have (in theory) an unlimited amount of space. This post and this post outline solutions based on open source tools such as OpenSSH (for encrypted file transfer), lsyncd (for monitoring files), and Unison (an rsync-like tool). I attempted this setup, but failed to get things working with lsyncd (see the extensive discussion with the author via the comments).

I stumbled upon this post, which outlines a solution based on the bleeding-edge version of Unison; that version includes the -repeat watch option, which monitors files for changes. However, the author outlined a solution for Mac OS X. I played around with the new Unison and arrived at a solution I am pretty satisfied with for my Ubuntu machines (easily extended to Mac and Windows, I’m sure). I will outline my setup in this post. Note that I have password-less ssh set up so that I can ssh into my server without typing in the password. Also, I am using Unison version 2.44.2, which I downloaded via svn around 7/16/2011.

Installing Unison

The same version of Unison must be installed on both the client and the server. Both my client and server run Ubuntu (11.04 and 10.04 server). On the client, the folder I would like to sync is /home/vinh/Documents; the server’s destination is /home/vinh/Backup/Documents.

sudo apt-get install ocaml python-pyinotify
## install the .deb file from http://packages.ubuntu.com/search?keywords=python-pyinotify via `dpkg -i` if python-pyinotify is not in your repository
svn checkout https://webdav.seas.upenn.edu/svn/unison
cd unison/trunk
make NATIVE=true UISTYLE=text
## `make install` installs into $HOME/bin/
sudo cp src/unison /usr/local/bin/
sudo cp src/fsmonitor.py /usr/local/bin/
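
Since both ends must run the same version, it can help to check this before going further. The following is a minimal sketch, not part of my original setup: the helper names extract_version and same_version are made up for illustration, and username@server.ip is the same placeholder used in the scripts below.

```shell
#!/bin/bash
# Sketch: check that client and server report the same Unison version.
# extract_version and same_version are hypothetical helper names.

# Pull the first dotted version number (e.g. 2.44.2) out of a string
# such as the output of `unison -version`.
extract_version() {
    echo "$1" | grep -Eo '[0-9]+(\.[0-9]+)+' | head -n 1
}

# Succeed only if both strings contain a version and the versions are identical.
same_version() {
    ver_a=$(extract_version "$1")
    ver_b=$(extract_version "$2")
    [ -n "$ver_a" ] && [ "$ver_a" = "$ver_b" ]
}

# Usage (uncomment on a machine with Unison installed and ssh access to the server):
# local_out=$(unison -version)
# remote_out=$(ssh username@server.ip unison -version)
# same_version "$local_out" "$remote_out" && echo "versions match" || echo "version mismatch"
```

The comparison is on the full version string, so 2.44.2 on one side and any other release on the other is reported as a mismatch.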

Everything following is done on the client computer.

Scripts

unisonNetworkOnPortForward:

#! /bin/bash

## http://ubuntuforums.org/showpost.php?p=6679437&postcount=4
## can't have extension in filename http://www.duncanelliot.com/blog/?p=28

# ssh username@server.ip -f -N -L 9922:server.ip:22 ## minimal
sudo -u local.username ssh username@server.ip -Y -C -f -N -L 9922:server.ip:22

## multiple instances can run in case of disconnect and reconnect

This script forwards my local port 9922 to the server’s port 22 via ssh. That way, I can run ssh username@localhost -p 9922 when I want to connect to the server. I do this so that file synchronization can resume after a disconnect and reconnect (changed files do not get synced after a reconnect if I connect to the remote server directly).

Run sudo cp unisonNetworkOnPortForward /etc/network/if-up.d/ on Debian or Ubuntu. By doing this, the script will be executed whenever the computer connects to a network (this will be different for non-Debian-based distros). Note that multiple instances of this port forwarding will be present if the network is disconnected and reconnected multiple times. This makes things a little ugly, but I haven’t noticed any real problems. Also note that the script name cannot have a file extension or things will not work.
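
One possible way to limit the pile-up of forwarding processes is to let ssh give up when the local port is already taken. The sketch below is an untested variation on the script above, not what I actually run. It relies on two standard OpenSSH client options: ExitOnForwardFailure=yes makes ssh exit if it cannot bind port 9922, and ServerAliveInterval lets a tunnel over a dead connection eventually terminate. The helper name tunnel_args and the 60-second interval are assumptions for illustration.

```shell
#!/bin/bash
# Sketch: build the ssh arguments for the port forward.
# tunnel_args is a hypothetical helper; $1 = local port, $2 = user@host, $3 = host.
tunnel_args() {
    # ExitOnForwardFailure=yes: exit instead of lingering when the port is already forwarded
    # ServerAliveInterval=60:   probe the server so a dead tunnel eventually closes
    echo "-o ExitOnForwardFailure=yes -o ServerAliveInterval=60 -f -N -L $1:$3:22 $2"
}

# Usage inside the if-up.d script (same placeholders as above):
# sudo -u local.username ssh $(tunnel_args 9922 username@server.ip server.ip)
```

With this, a second invocation while the first tunnel is still alive should exit right away rather than leave another idle ssh process behind.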

unisonMonitor.sh:

#! /bin/bash

## in /etc/rc.local, add:
## sudo -u local.username /path/to/unisonMonitor.sh &

unison default ~/Documents ssh://username@localhost:9922//home/vinh/Backup/Documents -repeat watch -times -logfile /tmp/unison.log
# -times: sync timestamps
# -repeat watch: real-time synchronization via pyinotify

Add to /etc/rc.local before the last line:

sudo -u local.username /path/to/unisonMonitor.sh &

This turns on unison sync at startup (unison will keep trying to connect to the server if it is disconnected). Again, this implementation is different for non-Debian-based distros.
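
The options on the unison command line can also live in a Unison profile, which keeps the monitor script short. Here is a minimal sketch that writes such a profile. The profile name documents.prf and the helper write_profile are made up for illustration (I use the command line shown above); the preferences themselves (root, times, repeat, logfile) are the same ones passed as flags in unisonMonitor.sh.

```shell
#!/bin/bash
# Sketch: write a Unison profile into directory $1 (normally ~/.unison).
# write_profile and the profile name "documents" are hypothetical.
write_profile() {
    mkdir -p "$1"
    cat > "$1/documents.prf" <<'EOF'
# roots: the local folder and the server folder reached through the forwarded port
root = /home/vinh/Documents
root = ssh://username@localhost:9922//home/vinh/Backup/Documents
# same options as the command line in unisonMonitor.sh
times = true
repeat = watch
logfile = /tmp/unison.log
EOF
}

# Usage:
# write_profile "$HOME/.unison"
# unison documents
```

Because the roots are stored in the profile, the monitor would then be started with just the profile name.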

unisonSync.sh:

#! /bin/bash

unison -batch -times ~/Documents ssh://username@localhost:9922//home/vinh/Backup/Documents -logfile /tmp/unison.log

Run unisonSync.sh when you want to manually sync the two folders. I add the following line to cron (crontab -e) to run a manual sync every day at 12:30pm:

30 12 * * * /path/to/unisonSync.sh

I set up this cron job because unisonMonitor.sh will only sync files that have changed while the unison process is running. This daily backup makes sure all my files are in sync at least once a day.
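
If a manual sync takes a long time, the next cron run (or a sync started by hand) could begin while the previous one is still working. A simple guard is a lock directory, since mkdir either creates the directory or fails atomically. This is a sketch under assumptions: run_locked and the lock path are hypothetical names, and the lock only keeps two manual syncs apart. It does not coordinate with the running unisonMonitor.sh process.

```shell
#!/bin/bash
# Sketch: run a command only if no other run holds the lock directory.
# run_locked is a hypothetical helper; $1 = lock directory, rest = command to run.
run_locked() {
    lockdir=$1
    shift
    # mkdir fails if the directory already exists, i.e. another run holds the lock
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "another sync is running, skipping" >&2
        return 1
    fi
    "$@"
    status=$?
    # release the lock and pass on the command's exit status
    rmdir "$lockdir"
    return $status
}

# Usage, e.g. from a small wrapper script called by cron:
# run_locked /tmp/unisonSync.lock /path/to/unisonSync.sh
```

If the machine crashes mid-sync, the stale lock directory has to be removed by hand before the next run.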

unisonKill.sh:

#! /bin/bash

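## `ps aux | grep unison | ... | xargs kill -9` also matches the grep and this script itself,
## so match the unison process by exact name and the helper by its script name instead
pkill -9 -x unison
pkill -9 -f fsmonitor.py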

I run this script on the client or server when I want to clean up unison processes. The one current drawback of unison’s monitor feature is that the unison -server and fsmonitor.py processes on the server are not killed when the unison process stops on the client side. After multiple connects, this leaves a lot of unison processes running on the server. Although I haven’t seen any issues with this, the unisonKill.sh script should make cleaning up the processes easier.

Start the service

Once these scripts are in their correct locations, first run unisonSync.sh to have the initial sync. Then restart the computer. You should see a unison and fsmonitor.py process by executing ps aux | grep unison on the client and server. Also, you should see an ssh process corresponding to the port forwarding by executing ps aux | grep ssh. Run touch foo.txt in the directory that you are watching and see if it appears on the server. Remove it and see if it gets deleted. Good luck!
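
The touch test can be scripted so that it waits for the file to show up instead of checking by hand. A minimal sketch, assuming the port forward from unisonNetworkOnPortForward is up; wait_for is a hypothetical helper and the 30-second timeout is an arbitrary choice.

```shell
#!/bin/bash
# Sketch: poll a command once per second until it succeeds or the timeout expires.
# wait_for is a hypothetical helper; $1 = timeout in seconds, rest = command to poll.
wait_for() {
    remaining=$1
    shift
    while [ "$remaining" -gt 0 ]; do
        if "$@"; then
            return 0
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
    return 1
}

# Usage (uncomment on the client once the monitor is running):
# touch ~/Documents/foo.txt
# wait_for 30 ssh -p 9922 username@localhost test -e /home/vinh/Backup/Documents/foo.txt \
#     && echo "synced" || echo "not synced within 30 seconds"
```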

What are some drawbacks of this setup compared to Dropbox? Well, I can’t revert to files from a previous date, and I don’t have a dedicated Android app for accessing the files. To solve the former, you can set up another cron job that syncs to a different location on your server every few days, giving you access to files that are a few days old. To solve the latter, I’m sure there are Android apps that allow you to access files via the sftp protocol.
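
The "different location every few days" idea could look like the sketch below, which syncs into a destination named after the current date. This is an illustration only: snapshot_dest and the Snapshots directory are made-up names, and every snapshot made this way is a full copy, so it costs the full size of the folder each time.

```shell
#!/bin/bash
# Sketch: build a dated snapshot destination on the server.
# snapshot_dest is a hypothetical helper; $1 = base directory, $2 = date string.
snapshot_dest() {
    echo "$1/Documents-$2"
}

# Usage, e.g. from a cron job that runs every few days:
# dest=$(snapshot_dest /home/vinh/Backup/Snapshots "$(date +%F)")
# ssh -p 9922 username@localhost mkdir -p "$dest"
# unison -batch -times ~/Documents "ssh://username@localhost:9922/$dest"
```

Old snapshot directories would need to be pruned by hand or by another cron job.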

About Vinh Nguyen

Statistician

12 comments

  1. Hi. For your last issue check out Back in Time. It keeps snapshots based on only files that have changed. I am interested in this project. I want a low powered media/file system that keeps files synced across multiple computers. When a file is branched then it notifies users for correction. Also when a new PC is added to the system the files are seamlessly synced in idle time or when in need. Will keep an eye on your progress. Cheers.

  2. Great implementation!

    I am new to syncing, but I think this is one of the best solutions. You have all the functionalities of Dropbox, without privacy and cost issues!

    But I have a question. How do you cope with conflicts?

  3. @Alessandro what conflicts? I don’t usually have conflicts. If it arises somehow, I just do a manual sync and then restart my real-time sync.

    @Jay What does the ‘-backups’ argument do?

  4. A conflict happens when both sides change before a connection is available again. It can be even harder if you use more than one computer.

    But you are right, I guess it is enough to just run a manual sync, I will try

  5. I’m using your sync method. Everything works well, but when a new folder is created its content is not synchronized. It seems that it’s not being monitored by pyinotify. If you have a solution for this issue I would like to know.

    Regards

  6. Thanks very much for this tutorial. Do you still recommend running a periodic manual sync via crontab? I thought it would be dangerous to do so while the continuously updating daemon is still running (as started by your unisonMonitor.sh). I have also had problems with Unison detecting changes automatically, so I’d also like to periodically start a manual sync, but doesn’t one need to kill the daemon first? Do you happen to have any new insights on this aspect?
