## Version control and collaborating with LaTeX files

This post finally pushed me to explore ways to version control LaTeX files and collaborate on them with others. I’m assuming the collaborators also use LaTeX, which is rare in itself when your primary collaborators are scientists who work mainly with WYSIWYG editors, in particular MS Word. I will outline how I collaborate with non-LaTeX users in a future post.

One issue arises with LaTeX files. When writing prose, I tend to write continuously on one line until a line break is necessary, usually when a new paragraph begins or when I introduce a long math formula. When diff is used to compare two LaTeX files, it is hard to see where a change occurred in a long line, since diff reports changes line by line; this problem is well documented.
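To make the problem concrete, here is a minimal sketch. The file names and the sentences are made up for illustration; the two files differ by a single word at the end of one long line.

```shell
# Two versions of a paragraph written as one long line; only the last word differs.
# (old.tex, new.tex and their contents are hypothetical examples.)
workdir=$(mktemp -d)
printf '%s\n' 'We fit a linear model to the data. The residuals look fine. We conclude that the effect is small.' > "$workdir/old.tex"
printf '%s\n' 'We fit a linear model to the data. The residuals look fine. We conclude that the effect is large.' > "$workdir/new.tex"

# diff reports the whole line as changed, so the one-word edit is hard to spot.
# diff exits with status 1 when the files differ, hence the "|| true".
diff_output=$(diff "$workdir/old.tex" "$workdir/new.tex" || true)
printf '%s\n' "$diff_output"
```

The output repeats the full line twice (once with `<`, once with `>`), and you have to scan both copies to find the changed word.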

The first obvious solution is to break up your lines. A fill-type solution (hard-wrapping at 80 characters) will probably not work too well with diff, since a small change can reflow the paragraph and affect more than one line. Comments on the original post suggest using a single line for each sentence. I think this is reasonable, both for readability and for diff to work. However, collaborators might not abide by this preference.

The utility wdiff is another suggested solution; it looks at differences in words instead of lines. The only problem I see with it is that the entire file is printed, with added and removed words “highlighted” in the text using braces and brackets; it’s hard to see all the changes at a glance in long files. This site shows how to add additional LaTeX markup that will highlight the changes visually in a generated dvi/ps/pdf file. The utility latexdiff also does this, and appears to be very popular among users. The output document is marked up similarly to the “Track Changes” feature of typical WYSIWYG editors.
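A rough sketch of how the two tools can be called, assuming both are installed (the script skips whichever one is missing). The file names and contents are hypothetical; only the basic `wdiff` and `latexdiff` invocations are the point here.

```shell
workdir=$(mktemp -d)

# Two tiny but complete LaTeX documents (hypothetical content).
cat > "$workdir/old.tex" <<'EOF'
\documentclass{article}
\begin{document}
The effect is small and not significant.
\end{document}
EOF
cat > "$workdir/new.tex" <<'EOF'
\documentclass{article}
\begin{document}
The effect is large and significant.
\end{document}
EOF

# Word-level comparison: deletions appear as [-...-], insertions as {+...+}.
# -3 (--no-common) hides the unchanged words, which helps with long files.
# wdiff exits non-zero when the files differ, hence the "|| true".
if command -v wdiff >/dev/null 2>&1; then
    wdiff -3 "$workdir/old.tex" "$workdir/new.tex" || true
fi

# latexdiff writes a new LaTeX source with the changes marked up;
# compiling diff.tex then gives the "Track Changes"-style document.
if command -v latexdiff >/dev/null 2>&1; then
    latexdiff "$workdir/old.tex" "$workdir/new.tex" > "$workdir/diff.tex"
fi
```

Dropping `-3` gives the default wdiff behaviour described above, where the whole file is printed with the changes marked inline.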

To install latexdiff on Ubuntu:

<pre class="src src-sh">sudo apt-get install latexdiff</pre>


Moving forward, I will abide by the one-sentence-per-line principle. However, I cannot force collaborators to do this, so I will use wdiff and latexdiff to compare their changes. To help with the first task, I can insert a newline after each period in Emacs using replace-regexp or replace-string on a region (replace two spaces with a newline, typed as C-q C-j).
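Outside Emacs, the same replacement can be sketched on the command line. This is only an illustration under one assumption: sentences in the draft are separated by a period followed by two spaces, as in the replacement described above. The file name and text are made up.

```shell
workdir=$(mktemp -d)

# A hypothetical draft where sentences are separated by two spaces.
printf '%s\n' 'First sentence.  Second sentence, e.g. with an abbreviation.  Third sentence.' > "$workdir/draft.tex"

# Replace "period + two spaces" with "period + newline".
# A single space after a period (as in "e.g. with") is left alone.
split_output=$(awk '{ gsub(/\.  /, ".\n"); print }' "$workdir/draft.tex")
printf '%s\n' "$split_output"
```

Matching on two spaces rather than one avoids breaking lines after abbreviations, but it only helps if the text was typed with that convention in the first place.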

## Finally started to version control my files using Git

I’ve been aware of different version control systems since I started using Linux. Why? Oftentimes I am asked to install the latest development version of a piece of software because the last tarball (*.tar.gz) has not been updated for a while. I’ve pulled code most often using Subversion (SVN), Git, and CVS. Of the various version control systems, I must say Subversion is the most prevalent out there. However, I’ve heard many preach about Git, similar to people preaching about Emacs (me being one of them). Even though Git is not as “standard” as Subversion, I chose Git because

1. many acquaintances I think highly of use Git,
2. there are many tutorials out there (although this is true for most popular systems),
3. the Linux kernel is managed using Git, and
4. git-svn exists, so I can still work with SVN through Git.

I will write more about my experience with Git on particular projects later (e.g., Git with SVN on R-Forge). For this post, I’ll just post some references I’ve used to learn Git and the common commands I use (reminder to self).

First, read this short tutorial to understand Git. As the author points out, most tutorials just show you commands. This one tries to explain the concepts of version control and how Git applies them.

Try out Git on one of your projects (create a folder and create and edit some files).

Next, read the Git Magic book (online). It gives a good analogy of version control to video games, along with

Then re-read the tutorial on how Git works, play with some files, and re-read Git Magic.

For Git and SVN, this site and this site helped me get started. Note: I did not use -s in the git svn clone. This is also a good intro to Git.

To undo what you’ve edited since the last commit, remember git reset --hard HEAD.
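Here is a small sketch of that undo in a throwaway repository, so nothing real gets lost while trying it. The file name, commit message, and identity are made up. Keep in mind that `git reset --hard HEAD` discards uncommitted changes to tracked files for good.

```shell
# Throwaway repository in a temporary directory (all names are hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name 'Example Author'
git config user.email 'author@example.com'

printf 'first draft\n' > notes.txt
git add notes.txt
git commit -q -m 'first commit'

# Make an edit, then throw it away and return to the last commit.
printf 'an edit I regret\n' >> notes.txt
git reset -q --hard HEAD

restored=$(cat notes.txt)
printf '%s\n' "$restored"
```

After the reset, notes.txt holds only what was committed.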

Once I know the command line well enough to understand exactly what’s going on, I plan to incorporate magit, an Emacs extension for Git, into my workflow.

Now, my notes to remember:

<pre class="src src-sh">## start version control in a project/folder
git init
git add … ## files, folders, etc.
git commit -a -m "COMMENTS" ## commit
</pre>

Uhh, how about just google “git reference card” to obtain this card and this card.
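For a self-contained run of the notes above, here is a sketch in a temporary folder. The file name, commit messages, and identity are made up, and the extra commands (`git status`, `git diff`, `git log`) are simply the ones I would expect to reach for next, not something from the references.

```shell
# Start version control in a new, temporary project folder.
project=$(mktemp -d)
cd "$project"
git init -q
git config user.name 'Example Author'
git config user.email 'author@example.com'

# First commit: add a file, then commit it.
printf 'One sentence per line.\n' > paper.tex
git add paper.tex
git commit -q -m 'add first draft'

# Edit the file and look at what changed before committing.
printf 'A second sentence.\n' >> paper.tex
git status --short
git --no-pager diff

# -a stages changes to files Git already tracks, so no separate "git add" is needed.
git commit -q -a -m 'add second sentence'

commit_count=$(git rev-list --count HEAD)
git --no-pager log --oneline
```

Note that `git commit -a` only picks up files Git already tracks; a brand new file still needs `git add` first.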

## Managing a statistical analysis project – guidelines and best practices

I had to share this link today, as I had better read all the content it refers to and incorporate many of the recommended practices into my workflow. Thanks to Tal Galili for compiling all that information.

## an innovative idea? blogging by email: posterous

so i recently discovered (through michael zeller again) a new blogging service called posterous. it’s pretty innovative in that u can post by email. really? blogger lets me do that too, and so do wordpress and most blogging services. however, posterous stands out because it’ll do all the work for u. put in a link to a video and it’ll embed it on your blog. attach a song, same thing. attach multiple pictures and it’ll embed a gallery. want to format ur blog? write the html and posterous will take care of everything. cool huh? look at the list of features yourself. i’m still debating whether to switch over or keep using blogger. i mean i guess i can start another blog on there for a different topic, but face it, i only have time for one active blog right now, and i don’t have another interest that i could blog about. to switch or not to switch? some drawbacks i see with posterous:

1. if i attach too many files, my gmail will get huge.
2. security issues with emails…what if someone spoofs my email and posts?

michael zeller points out that he likes the email idea because he could write everything in emacs using org-mode, save the files for version control, and email them from emacs. i guess that’s a good idea. however, right now my current blog is like a repository for my brain….it’s very random and scattered. i don’t organize (i let google do the organizing). to switch or not? i want to try to switch, but i’m having issues importing. also, i would have to change some links in my blog that refer to the older posts. i chose blogger in the first place because it is simple, integrates well with google, and has wide support online. hmmm…decisions decisions