archive and synchronizations with unison and rsync

I use rsync as my primary backup utility as it is fairly efficient at archiving files (NOTE: to backup to fat32 usb drives, make sure --modify-window=1 is used). I recently discovered unison, and am using it for real time synchronization between directories from two Linux computers. I really like it.

Because of how well unison works for syncing my two computers in real time, I wanted to compare it to rsync for backing up files (one-way) to different computers and to usb drives (use the -force /first/location for one way syncs). This post does a good job describing unison‘s capabilities. I tried unison in batch mode to sync between two Linux computers, and it too was quite fast (even faster than rsync).

However, it is horrendously slow when transferring to a FAT drive since it uses checksum on all the files; it is discussed in the comments here.

For my current backup needs, I can use rsync or unison for backups between Linux computers or ext formatted drives; I will probably rely more on rsync since unison isn’t installed by default on all Linux machines and the same version is required on the linked machines. For archiving to a FAT32 usb drive, I will rely on rsync. For real-time synchronizations, I will rely on unison.

Real time file synchronization like Dropbox via Unison

Dropbox is a very nice tool for real time synchronization. It works very well to keep files from multiple devices (computers, phones, etc.) in sync. I use it mainly as a cloud-based backup for some of my files. However, it’s been on the headlines recently due to security and privacy concerns, leading to calls for encrypting your files prior to syncing with Dropbox.

I’ve always contemplated on running my own Dropbox-like service to have yet another safe backup of my files. Besides knowing where my data are stored exactly, I have (in theory) an unlimited amount of space. This post and this post outline solutions based on open source tools such as OpenSSH (for encrypted file transfer), lsyncd (for monitoring files), and Unison (rsync-like tool). I’ve attempted this setup, but failed to get things working with lsyncd (see the extensive discussion with the author via the comments).

I stumbled upon this post that outlines a solution based on the bleeding edge version of Unison, which includes the -repeat watch option, featuring the monitoring of files. However, the author outlined a solution for Mac OS X. I played around with the new Unison and arrived at a solution I am pretty satisfied with for my Ubuntu machines (easily extended to Mac and Windows, I’m sure). I will outline my setup in this post. Note that I have password-less ssh set up so that I can ssh into my server without typing in the password. Also, I am using Unison version 2.44.2, which I downloaded via svn around 7/16/2011.

Installing Unison

The same version of Unison must be installed on both the client and the server. Both my client and server runs Ubuntu (11.04 and 10.04 server). On the client, the folder I would like to sync is /home/vinh/Documents; the server’s destination is /home/vinh/Backup/Documents.

sudo apt-get install ocaml python-pyinotify
## install the .deb file from http://packages.ubuntu.com/search?keywords=python-pyinotify via `dpkg -i` if python-pyinotify is not in your repository
svn checkout https://webdav.seas.upenn.edu/svn/unison
cd trunk
make NATIVE=true UISTYLE=text
## `make install` installs into $HOME/bin/
sudo cp src/unison /usr/local/bin/
sudo cp src/fsmonitor.py /usr/local/bin/

Everything following is done on the client computer.

Scripts

unisonNetworkOnPortForward:

#! /bin/bash

## http://ubuntuforums.org/showpost.php?p=6679437&postcount=4
## can't have extension in filename http://www.duncanelliot.com/blog/?p=28

# ssh username@server.ip -f -N -L 9922:server.ip:22 ## minimal
sudo -u local.username ssh username@server.ip -Y -C -f -N -L 9922:server.ip:22

## multiple instances can run in case of disconnect and reconnect

This script forwards my local port 9922 to the server’s port 22 via ssh. That way, I can ssh username@localhost -p 9922 if I wanted to connect to the server. I do this so that file synchronization can resume after a disconnect and reconnect (changed files does not get synced after a reconnect if I connect to the remote server directly).

Run sudo cp unisonNetworkOnPortForward /etc/network/if-up.d/ on Debian or Ubuntu. By doing this, the script will be executed whenever the computer is connected to a network (this will be different for non-debian-based distros). Note that multiple instances of this port forwarding will be present if the network is disconnected and reconnected multiple times. This makes things a little ugly, but I haven’t noticed any problems really. Also note that the script name cannot have a file extension or things will not work.

unisonMonitor.sh:

#! /bin/bash

## in /etc/rc.local, add:
## sudo -u local.username /path/to/unisonMonitor.sh &

unison default ~/Documents ssh://username@localhost:9922//home/vinh/Backup/Documents -repeat watch -times -logfile /tmp/unison.log
# -times: sync timestamps
# -repeat watch: real-time synchronization via pyinotify

Add to /etc/rc.local before the last line:

sudo -u local.username /path/to/unisonMonitor.sh &

This turns on unison sync at startup (unison will keep trying to connect to the server if it is disconnected). Again, this implementation is different for non-debian-based distros.

unisonSync.sh:

#! /bin/bash

unison -batch -times ~/Documents ssh://username@localhost:9922//home/vinh/Backup/Documents -logfile /tmp/unison.log

Run unisonSync.sh when you want to manually sync the two folders. I add the following line to cron (crontab -e) to have a manual sync everyday at 12:30pm:

30 12 * * * /path/to/unisonSync.sh

I set up this cron job because unisonMonitor.sh will only sync files that have changed while the unison process is running. This daily backup makes sure all my files are in sync at least once a day.

unisonKill.sh:

#! /bin/bash

ps aux | grep unison | awk '{print $2}' | xargs kill -9

I run this script on the client or server when I want to clean up unison processes. The one drawback about the monitor feature of unison currently is that the unison -server and fsmonitor.py process on the server is not killed when the unison process stops on the client side. After multiple connects, this will leave a lot of unison processes running on the server. Although I haven’t seen any issues with this, the unisonKill.sh script should make cleaning up the processes easier.

Start the service

Once these scripts are in their correct locations, first run unisonSync.sh to have the initial sync. Then restart the computer. You should see a unison and fsmonitor.py process by executing ps aux | grep unison on the client and server. Also, you should see an ssh process corresponding to the port forwarding by executing ps aux | grep ssh. Run touch foo.txt in the directory that you are watching and see if it appears on the server. Remove it and see if it gets deleted. Good luck!

What are some drawbacks with this setup compared to Dropbox? Well, I can’t revert back to files from a previous date, and I don’t have a dedicated Android app that I can access the files with. To solve the former, you can set up another cron job that syncs to a different location on your server every few days, giving you access to files that are a few days old. To solve the latter, I’m sure there are Android apps that allow you to access files via the sftp protocol.

Split, cut, or sample a video file on the command line

There are many reasons to cut or split a video file. For example, one may want to cut a long video into multiple parts to upload to YouTube. I first ran into this and this, which suggests:

 <pre class="src src-sh">ffmpeg -ss 00:00:00 -t 00:01:00 -vcodec copy -acodec copy -i in.avi out.avi

## -ss: start position ## -t: end position ## can re-encode with other codecs

However, for some reason my out file is almost as large as my in file, even though I’m only sampling 1 minute out of the 2 hour segment. This lead me to the mencoder solution:

 <pre class="src src-sh">mencoder -ss 00:00:00 -endpos 00:00:01 -ovc copy -oac copy in.avi -o out.avi

The file size of my out file is more reasonable, and the speed is incredibly fast since I am not re-encoding.

Find files and find files containing certain text

This is a reminder to myself as I keep forgetting how to do these basic searches in Linux.

To find files with file name containing the text foo using the command find:

## find file with "foo" in file name
find ./ -name "*foo*" ## replace ./ with path; can use shell style wildcards
## ignore upper and lower cases
find ./ -iname "*foo*"
## print pathnames of all files
find ./ -print

To find files with bar in it’s content using the command grep:

grep "bar" -r ./
## print lines without the word "bar"
grep -v "bar" -r ./
## note, can also use regexp with -E

GPG/PGP: sign or encrypt emails or files

I always wanted to set this up, but never got around to it because I a particular need never arised. I finally had some time to set up my GPG key so others can send me encrypted files and emails. Google “gpg” or “pgp” for more information about it.

Setup

I followed these instructions to set up GPG on my laptop; setup is pretty straightforward. Be sure to back up your public key, private key, and revocation key somewhere safe, like a CD locked up in a fireproof safe:

gpg --list-secret-keys ## Look for the line that starts something like "sec 1024D/". The part after the 1024D is the key_id.
gpg -ao _something_-private.key --export-secret-keys key_id
gpg -ao _something_-public.key --export key_id
gpg --gen-revoke

Remember to import the key if you use multiple computers or got a new computer:

gpg --import _something_-public.key
gpg --import _something_-private.key

Use

This is a good beginner’s reference for GPG. Some parts are also based on this page, although it is for the pgp command as opposed to the gpg command.

Remember to place the following in ~/.bashrc so that GPG uses your key id as the default:

export GPGKEY=key id

Import trusted people’s public keys:

gpg --import public.key.file
gpg --list-keys ## list to see who we have

Encrypt file:

gpg --encrypt MyFile ## optional (prior to --encrypt): --out OutFile
## select person from your public keys ring via key id or email address

Now send the encrypted file to the person. Only he/she can open it using his/her private key.

Decrypt file:

gpg --decrypt MyFile ## optional (prior to --decrypt): --output OutFile
## enter passphrase

Signing is used to let others (with your public key) know that the message/file was indeed from you and has not been tampered with while being transported to you. The content is not private but the origin is of concern.

Signing a file:

gpg -sat textfile ## clear sign a file (--sign --armored --text) so that the original text and content are in the same file
## enter passphrase

gpg -sb binaryfile ## (--sign --detach-sign): a separate binary .sig file is generated to be delivered with the binary file
## enter passphrase

gpg -sab binaryfile ## (--sign --armored --detach-sign): a separate armored text signature .asc file is generated to be delivered with the binary file
## enter passphrase

## sign and encrypt
pgp -seat textfile "To User ID"
pgp -se binaryfile "to recipient ID"
pgp -sea binaryfile "to recipient ID"

Reading a file:

gpg gpgfile ## "This is used for all PGP files, be they encrypted, signed or a key file. PGP will handle it all automatically."
gpg signaturefile signedfile ## detached signature file

Use with Email

Since I use gmail in the web browser primarily, I’m unable to integrate GPG with gmail since FireGPG is now discontinued. I have a feeling a chrome extension will come to fruition soon.

Currently, if I want to send signed emails, I do so in Evolution Mail per the setup instructions. I will update this post on how to do it using mutt or emacs soon.

To check on signatures sent to my email, what I do is open using “Show original” and pasting the content into a text file on my computer. I use the clearmime script as follow:

$ clearmime | gpg --verify # expects you to paste a raw email message
$ clearmime < myemail.txt | gpg --verify # reads the raw email from a file

UPDATE 9/29/2010: Using GPG with mutt is actually quite easy. I followed these instructions. Basically, after mutt is set up to be able to send mail, I added the following to my ~/.muttrc file:

## gpg with mutt http://blogs.techrepublic.com.com/security/?p=413
## following for debian only (location)
source /usr/share/doc/mutt/examples/gpg.rc
## or paste content into .muttrc file

Now, type mutt in the command line, select mail, and compose. After composing the mail, type p to select whether you want to sign, encrypt, etc. Note that I believe you have to be in the mutt program to access GPG options. I don’t think sending signed/encrypted email from the command line with mutt is an option.

Also, to open encrypted emails from my gmail in the web browser (I already covered how to verify a signature previously), I just download the attached encrypted message by copy and paste and type the following:

gpg EncryptedMail.asc
## enter passphrase

Conclusions

My public key is available on my personal website for others to use to send me encrypted emails and files. I’ve also uploaded to the keyservers. However, I have not had anyone verify me in the physical world, so you might get a WARNING message.

UPDATE 9/8/2011 Revoke Keys

I recently decided to revoke the keys I set up last year. Why? The passphrase was too short. The passphrase should be long in order to protect myself from a brute-force attack. According to this, the passphrase should be 22+ characters in length to be equivalent to a 128 bit key, and 42+ characters long to be equivalent to a 256 bit key. Thus I revoked my current key and will create a new one.

To revoke, I followed these instructions:

gpg --import revoke.asc ## I saved this file in a very secure place
gpg --keyserver pgp.mit.edu --send-keys key_num

To delete the key from my computer, I followed these instructions:

gpg --delete-secret-keys key_num
gpg --delete-keys key_num
gpg --list-key