Optimized R and Python: standard BLAS vs. ATLAS vs. OpenBLAS vs. MKL


Revolution Analytics recently released Revolution R Open (RRO), a downstream version of R built using Intel’s Math Kernel Library (MKL). The post mentions that comparable improvements are observed on Mac OS X, where the ATLAS BLAS library is used. A reader in the comments section also pointed out the lack of a comparison with ATLAS and OpenBLAS. Swapping in a different BLAS implementation is documented in the R Administration manual, and the implementations have been compared in the past here and here. As an avid R user, I should be using a faster build of R if one exists and is easy to obtain (install or compile), especially if the improvements are up to 40% as reported by the Domino Data Lab. I decided to follow the framework set out by this post and compare timings for the different builds of R on a t2.micro instance on Amazon EC2 running Ubuntu 14.04.

First, I install R and the various versions of BLAS and lapack and download the benchmark script:

sudo apt-get install libblas3gf libopenblas-base libatlas3gf-base liblapack3gf libopenblas-dev liblapack-dev libatlas-dev r-base r-base-dev
wget http://r.research.att.com/benchmarks/R-benchmark-25.R
echo "install.packages('SuppDists', dep=TRUE, repo='http://cran.stat.ucla.edu')" | sudo R --vanilla ## needed for R-benchmark-25.R

You can switch which BLAS and LAPACK libraries are used with the following commands:

sudo update-alternatives --config libblas.so.3 ## select from 3 versions of blas: blas, atlas, openblas
sudo update-alternatives --config liblapack.so.3 ## select from 2 versions of lapack: lapack and atlas-lapack
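
The --config prompt is interactive. For scripted runs, update-alternatives also offers --list and --set, so the switch-and-benchmark cycle can be looped. The following is only a sketch: the helper names run and bench_each_blas and the DRY_RUN variable are my own, and it assumes an external time command is on the PATH.

```shell
# Sketch: time the benchmark once per libblas.so.3 alternative.
# run, bench_each_blas and DRY_RUN are illustrative names, not part of any tool.
run() {
    # With DRY_RUN=1, print the command instead of executing it.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

bench_each_blas() {
    # Arguments: candidate library paths, e.g. the output of
    #   update-alternatives --list libblas.so.3
    for lib in "$@"; do
        echo "# selection: $lib"
        run sudo update-alternatives --set libblas.so.3 "$lib"
        run sh -c 'cat R-benchmark-25.R | time R --slave'
    done
}
```

A possible invocation is bench_each_blas $(update-alternatives --list libblas.so.3); prefix it with DRY_RUN=1 to preview the commands first.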

Run R, press Ctrl-z to suspend the process, and verify that R has loaded the selected BLAS and LAPACK libraries:

ps aux | grep R ## find the process id for R
lsof -p PROCESS_ID_JUST_FOUND | grep 'blas\|lapack'
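
If lsof is not installed, the same information can be read from /proc on Linux. This is a minimal sketch; blas_libs_of is a hypothetical helper name, and it assumes the library paths contain no spaces.

```shell
# Sketch: list BLAS/LAPACK shared objects mapped into a running process.
# blas_libs_of is a hypothetical helper, not a standard command.
blas_libs_of() {
    # $1 = PID; $2 = optional maps file (defaults to /proc/<PID>/maps).
    # Field 6 of each maps line is the backing file path, when there is one.
    awk '$6 ~ /blas|lapack/ { print $6 }' "${2:-/proc/$1/maps}" | sort -u
}
```

For example, blas_libs_of PROCESS_ID_JUST_FOUND prints one library path per line.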

Now run the benchmarks on different versions:

# selection: libblas + lapack
cat R-benchmark-25.R | time R --slave
...
171.71user 1.22system 2:53.01elapsed 99%CPU (0avgtext+0avgdata 425068maxresident)k
4960inputs+0outputs (32major+164552minor)pagefaults 0swaps
173.01
# selection: atlas + lapack
cat R-benchmark-25.R | time R --slave
...
69.05user 1.16system 1:10.27elapsed 99%CPU (0avgtext+0avgdata 432620maxresident)k
2824inputs+0outputs (15major+130664minor)pagefaults 0swaps
70.27
# selection: openblas + lapack
cat R-benchmark-25.R | time R --slave
...
70.69user 1.19system 1:11.93elapsed 99%CPU (0avgtext+0avgdata 429136maxresident)k
1592inputs+0outputs (6major+131181minor)pagefaults 0swaps
71.93
# selection: atlas + atlas-lapack
cat R-benchmark-25.R | time R --slave
...
68.02user 1.14system 1:09.21elapsed 99%CPU (0avgtext+0avgdata 432240maxresident)k
2904inputs+0outputs (12major+124761minor)pagefaults 0swaps
69.21

As can be seen, run time drops by about 60% with OpenBLAS or ATLAS compared with the standard libblas+lapack. What about MKL? Let’s test RRO:

sudo apt-get remove r-base r-base-dev
wget http://mran.revolutionanalytics.com/install/RRO-8.0-Beta-Ubuntu-14.04.x86_64.tar.gz
tar -xzf RRO-8.0-Beta-Ubuntu-14.04.x86_64.tar.gz
./install.sh
# check that it is using a different version of blas and lapack using lsof again
cat R-benchmark-25.R | time R --slave
...
51.19user 0.98system 0:52.24elapsed 99%CPU (0avgtext+0avgdata 417840maxresident)k
2208inputs+0outputs (11major+131128minor)pagefaults 0swaps
52.24

This is a 70% reduction in run time relative to the standard libblas+lapack version, and a 25% reduction relative to the ATLAS/OpenBLAS versions. This is quite a substantial improvement!
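
For reference, these percentages are relative reductions in elapsed time, (1 - new / baseline) * 100, computed from the elapsed seconds reported above. A small sketch of the arithmetic, where pct_reduction is a name I made up:

```shell
# Relative run-time reduction, in percent, of a timing versus a baseline.
# pct_reduction is an illustrative helper name.
pct_reduction() {
    # $1 = baseline seconds, $2 = new seconds
    awk -v base="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (1 - new / base) * 100 }'
}

pct_reduction 173.01 70.27   # ATLAS vs. reference BLAS: 59.4
pct_reduction 173.01 52.24   # MKL vs. reference BLAS:   69.8
pct_reduction 70.27 52.24    # MKL vs. ATLAS:            25.7
```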

Python

Although I don’t use Python much for data analysis (I use it as a general language for everything else), I wanted to repeat similar benchmarks for numpy and scipy as improvements have been documented. To do so, compile numpy and scipy from source and download some benchmark scripts.

sudo pip install numpy
less /usr/local/lib/python2.7/dist-packages/numpy/__config__.py ## openblas?
sudo pip install scipy
# test different blas
python
ps aux | grep python
lsof -p 20812 | grep 'blas\|lapack' ## replace 20812 with the process id found above
wget https://gist.github.com/osdf/3842524/raw/df01f7fa9d849bec353d6ab03eae0c1ee68f1538/test_numpy.py
wget https://gist.github.com/osdf/3842524/raw/22e21f5d57a9526cbcd9981385504acdc7bdc788/test_scipy.py

One could switch blas and lapack like before. Results are as follows:

# selection: blas + lapack
time python test_numpy.py
FAST BLAS
version: 1.9.1
maxint: 9223372036854775807

dot: 0.214728403091 sec

real    0m1.253s
user    0m1.119s
sys     0m0.036s

time python test_scipy.py
cholesky: 0.166237211227 sec
svd: 3.56523122787 sec

real    0m19.183s
user    0m19.105s
sys     0m0.064s

# selection: atlas + lapack
time python test_numpy.py
FAST BLAS
version: 1.9.1
maxint: 9223372036854775807

dot: 0.211034584045 sec

real    0m1.132s
user    0m1.121s
sys     0m0.008s

time python test_scipy.py
cholesky: 0.0454761981964 sec
svd: 1.33822960854 sec

real    0m7.442s
user    0m7.346s
sys     0m0.084s

# selection: openblas + lapack
time python test_numpy.py
FAST BLAS
version: 1.9.1
maxint: 9223372036854775807

dot: 0.212402009964 sec

real    0m1.139s
user    0m1.130s
sys     0m0.004s

time python test_scipy.py
cholesky: 0.0431131839752 sec
svd: 1.09770617485 sec

real    0m6.227s
user    0m6.143s
sys     0m0.076s

# selection: atlas + atlas-lapack
time python test_numpy.py
FAST BLAS
version: 1.9.1
maxint: 9223372036854775807

dot: 0.217267608643 sec

real    0m1.162s
user    0m1.143s
sys     0m0.016s

time python test_scipy.py
cholesky: 0.0429849624634 sec
svd: 1.31666741371 sec

real    0m7.318s
user    0m7.213s
sys     0m0.092s

The dot timings from test_numpy.py barely change across libraries, so I focus only on the svd results: OpenBLAS yields roughly a 70% improvement and ATLAS about a 63% improvement. What about MKL? Well, a readily available version costs money, so I wasn’t able to test it.

Conclusion

Here are my take-aways:

  • Using different BLAS/LAPACK libraries is extremely easy on Ubuntu; no need to compile as you could install the libraries and select which version to use.
  • Install and use RRO (MKL) when possible as it is the fastest.
  • When the previous isn’t possible, use ATLAS or OpenBLAS. For example, we have AIX at work. Getting R installed on there is already a difficult task, so optimizing R is a low priority. However, if it’s possible to use OpenBLAS or ATLAS, use it (Note: MKL is irrelevant here as AIX uses POWER cpu).
  • For Python, use OpenBLAS or ATLAS.

For those who want to compile R against MKL themselves, check this. For those who want to do so for Python, check this.

Finally, some visualizations to summarize the findings (figures: 2014-11-10-R_blas+atlas+openblas+mkl.png and 2014-11-10-Python_blas+atlas+openblas.png), generated with the following code:

# R results
library(ggplot2)  # needed for ggplot(), geom_bar(), ggsave()
timings <- c(173.01, 70.27, 71.93, 69.21, 52.24)
versions <- c('blas + lapack', 'atlas + lapack', 'openblas + lapack', 'atlas + atlas-lapack', 'MKL')
versions <- factor(versions, levels=versions)
d1 <- data.frame(timings, versions)
ggplot(data=d1, aes(x=versions, y=timings / max(timings))) + 
  geom_bar(stat='identity') + 
  geom_text(aes(x=versions, y=timings / max(timings), label=sprintf('%.f%%', timings / max(timings) * 100)), vjust=-.8) +
  labs(title='R - R-benchmark-25.R')
ggsave('R_blas+atlas+openblas+mkl.png')

# Python results
timings <- c(3.57, 1.34, 1.10, 1.32)
versions <- c('blas + lapack', 'atlas + lapack', 'openblas + lapack', 'atlas + atlas-lapack')
versions <- factor(versions, levels=versions)
d1 <- data.frame(timings, versions)
ggplot(data=d1, aes(x=versions, y=timings / max(timings))) + 
  geom_bar(stat='identity') + 
  geom_text(aes(x=versions, y=timings / max(timings), label=sprintf('%.f%%', timings / max(timings) * 100)), vjust=-.8) +
  labs(title='Python - test_scipy.py (SVD)')
ggsave('Python_blas+atlas+openblas.png')

Upgrading Ubuntu 12.04 to 14.04 breaks encrypted LVM

My laptop runs Ubuntu and has been fully encrypted since version 10.04. The upgrade from 10.04 to 12.04 was smooth in the sense that my system booted fine, asking for the passphrase to unlock the LVM. However, when I upgraded from 12.04 to 14.04, things broke and my laptop no longer booted properly because the LVM never got decrypted. I had to do the following to get my laptop working again (after many rounds of trial and error):

  • Booted a live USB Ubuntu session, decrypted the LVM, and chroot’ed into the original OS
  • Finished the upgrade session via apt-get update && apt-get upgrade
  • It appears Ubuntu 14.04 installed some new package (did not write name down) that manages LVM or disks somehow (based on googling the error message). I removed this package.
  • Saw lvm issues, so installed the package lvm2
  • I made sure both dm-crypt and lvm2 were installed, and were accessible in initramfs, as cryptsetup was removed from initramfs since version 13.10. Had to do something with the following CRYPTSETUP issue.
  • Based on this post, I modified various files, but things still did not boot properly. I believe what finally fixed it was explicitly pointing to the LVM by /dev/sda5 in the GRUB_CMDLINE_LINUX line in /etc/default/grub.

The following is a summary of these files on my system. /etc/crypttab:

# <target name> <source device>         <key file>      <options>
# sdb5_crypt UUID=731a44c4-4655-4f2b-ae1a-2e3e6a14f2ef none luks
sdb5_crypt UUID=731a44c4-4655-4f2b-ae1a-2e3e6a14f2ef none luks,retry=1,lvm=vg01

/etc/initramfs-tools/conf.d/cryptroot:

## vinh created http://www.joh.fi/posts/2014/03/18/install-ubuntu-1310-on-top-of-encrypted-lvm/
# CRYPTROOT=target=sdb5_crypt,source=/dev/disk/by-uuid/f1ba5a54-ac7e-419d-8762-43da3274aba4
CRYPTOPTS=target=sdb5_crypt,source=UUID=f1ba5a54-ac7e-419d-8762-43da3274aba4,lvm=vg01

Then run update-initramfs -k all -c in order to update the initramfs images.

Have this line in /etc/default/grub:

#GRUB_CMDLINE_LINUX="cryptopts=target=sdb5_crypt,source=/dev/disk/by-uuid/f1ba5a54-ac7e-419d-8762-43da3274aba4,lvm=vg01"
#GRUB_CMDLINE_LINUX="cryptopts=target=sdb5_crypt,source=UUID=f1ba5a54-ac7e-419d-8762-43da3274aba4,lvm=vg01"
GRUB_CMDLINE_LINUX="cryptopts=target=sdb5_crypt,source=/dev/sda5,lvm=vg01"

Run update-grub.

Again, I think the key is the source definition in the previous line. I kept trying to refer to it by uuid but that did not work.
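
Since a wrong source value only shows up at the next boot, it may be worth checking it beforehand. The sketch below pulls the source= field out of a cryptopts string and tests whether it names a block device on the running system; cryptopts_source is a hypothetical helper, and the check is only meaningful for /dev/... paths, not for UUID= values.

```shell
# Sketch: extract the source= value from a cryptopts string.
# cryptopts_source is an illustrative helper name.
cryptopts_source() {
    printf '%s\n' "$1" | tr ',' '\n' | sed -n 's/^source=//p'
}

src=$(cryptopts_source "target=sdb5_crypt,source=/dev/sda5,lvm=vg01")
echo "$src"
# Warn if the path is not a block device on the machine running this check.
[ -b "$src" ] || echo "warning: $src is not a block device here"
```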

Screen brightness after suspend in Ubuntu

Many laptops have their screens dimmed after returning from “suspend” and cannot get back to their original brightness. The bug hasn’t been fixed for 3 years. A fix is provided in the bug report by putting something like the following in /etc/rc.local:

brt=`cat /sys/devices/virtual/backlight/acpi_video0/brightness`
abrt=`cat /sys/devices/virtual/backlight/acpi_video0/actual_brightness`
if (( $brt != $abrt )) ; then
    echo $abrt > /sys/devices/virtual/backlight/acpi_video0/brightness
fi

Use the following

find /sys/ -iname '*bright*'

to see if you need to change the exact path to the files.
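
To see which backlight devices are actually affected, the two values can be compared for every device at once. This is a sketch that assumes the devices are exposed under /sys/class/backlight (the usual sysfs location); check_backlights is a name I made up.

```shell
# Sketch: report backlight devices whose requested and actual brightness differ.
# check_backlights is an illustrative helper name.
check_backlights() {
    # $1 = optional base directory (defaults to /sys/class/backlight).
    base="${1:-/sys/class/backlight}"
    for dev in "$base"/*; do
        # Skip entries that lack either file (or an unmatched glob).
        [ -r "$dev/brightness" ] && [ -r "$dev/actual_brightness" ] || continue
        brt=$(cat "$dev/brightness")
        abrt=$(cat "$dev/actual_brightness")
        if [ "$brt" != "$abrt" ]; then
            echo "$dev: brightness=$brt actual_brightness=$abrt"
        fi
    done
}
```

Running check_backlights with no argument right after resuming prints one line per mismatching device and nothing if all values agree.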

Enable root account in Ubuntu?

After my recent experience with broken su and sudo commands in a failed system upgrade, I realized that although disabling the root account has many advantages, one disadvantage is that I can’t log in as root in the terminal when I’m physically in front of the system. This is a major issue if the su, sudo, and passwd binaries are broken somehow. Luckily, chroot came to the rescue for me. Now, I contemplate whether I should enable the root account on my systems…

Flipping the classroom: creating screencast lectures in Linux

I’m debating the idea (hype) of flipping the classroom for one of my classes next Fall where students watch lecture videos at home (or elsewhere) so I could spend class time doing more hands-on activities like discussing the art of data analysis and how to solve problems with statistics. I think Khan Academy, Udacity, and Coursera are doing a great service for humanity by offering high quality courses taught by excellent teachers online that are accessible to anyone with an internet connection.

I don’t claim to be a great teacher, but I think my own students might benefit from this pedagogical method. My main concern with this approach is that not all students will watch the lectures, just as not all students read the assigned readings (guilty as a student). I guess I can give students short quizzes during class to push them to watch the videos. Also, I’ll give my usual challenging homework so that only students who study the material well can excel. By flipping the classroom, more material can be covered, students have access to the recordings in addition to my slides, and I can make sure everything I want to say is recorded (as opposed to a live session where I could forget a few points). Class time can then be more interactive, as opposed to me lecturing for an hour.

I think most of the online education sites use Camtasia with a Wacom Cintiq to produce their videos. I use Linux and cannot afford such an expensive device. I plan on using a screencast software like recordMyDesktop or Istanbul to record the desktop screen and audio. For recordMyDesktop, I had issues with the encode on the fly option, which means recording very long videos could be an issue (1 minute of raw video takes up about 210MB, and 1 minute encoded video takes up about 8MB). Istanbul records on the fly without problem (I think). I haven’t tried recording for an hour and 20 minutes yet.

My plan is to create my lecture slides with LaTeX Beamer and use Xournal to annotate the slides as I’m lecturing; hopefully my Asus T101MT netbook is strong enough to do the recording as I utilize its touchscreen capabilities. I can just switch over to Emacs to illustrate data analysis in R when needed. My main concern now is where I could host these (large) videos…

Update 4/27/2012: Screencast with ffmpeg

After some testing, I think the best screencast software on Linux would have to be ffmpeg. First, remove ffmpeg and compile it from source based on the latest version per this post. Then, create screencast.sh:

#! /bin/bash
DATE=`date +%Y%m%d`
TIME=`date +%Hh%M`
ffmpeg -y -f alsa -ac 2 -i pulse -f x11grab -r 24 -s $(xwininfo -root | grep 'geometry' | awk '{print $2;}') -i :0.0 -c:v libx264 -preset veryfast -crf 22 -c:a libmp3lame -ar 44100 -ab 24k -threads 0 /tmp/screencast_$DATE-$TIME.mp4
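
The -s argument above is taken from the xwininfo geometry line, which has the form WIDTHxHEIGHT+X+Y. Some ffmpeg builds may not accept the trailing +X+Y offset as a video size (an assumption on my part; check against your version). A minimal sketch that keeps only WIDTHxHEIGHT, with screen_size as a made-up helper name:

```shell
# Sketch: extract WIDTHxHEIGHT from an xwininfo "-geometry WxH+X+Y" line.
# screen_size is an illustrative helper name; it reads xwininfo output on stdin.
screen_size() {
    sed -n 's/^ *-geometry \([0-9]*x[0-9]*\).*/\1/p'
}

# Demo on a sample line; in real use: xwininfo -root | screen_size
printf '  -geometry 1366x768+0+0\n' | screen_size   # prints 1366x768
```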

For more libx264 options, see this page.

Test ram with Memtest86+ and ignore bad parts with badram in grub

Recently, my computer kept freezing whenever I started conkeror (with 100+ buffers loading from a previous session). Folks over at #conkeror on freenode suggested that the problem might be due to faulty ram. They suggested testing my ram with Memtest86+. It is installed by default on Ubuntu.

If you have multiple sticks of ram, test one stick at a time. It’s best to test one stick per night as the test can take hours. To test the ram, restart your computer and go to the grub menu (hold shift if your grub menu doesn’t display automatically). Then, select the “Memtest86+” boot option. Press “c”, “4”, and “3” to display the error locations according to the BadRAM syntax (converting the default faulty memory addresses is not obvious to me and others). If you don’t do this, you will end up wasting time fixing your boot options (details later).

If you know which ram sticks are bad, replace them if they are under warranty. If they are not under warranty and you can’t afford new ram, you can make use of BadRAM, incorporated by default in grub2, per this documentation. That is, edit /etc/default/grub and specify the faulty ram addresses with the GRUB_BADRAM option.
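
GRUB’s badram syntax is a comma-separated list of address,mask pairs. The value below is the placeholder example that ships commented out in Ubuntu’s /etc/default/grub, not real addresses from my machine. The check is a deliberately conservative sketch (it only accepts 0x-prefixed hex values in pairs), and valid_badram is a hypothetical helper name.

```shell
# Placeholder value in the format expected by /etc/default/grub;
# replace it with the pairs reported by Memtest86+ in BadRAM mode.
GRUB_BADRAM="0x01234567,0xfefefefe,0x89abcdef,0xefefefef"

# Sketch: succeed only if the value is one or more 0x-hex address,mask pairs.
# valid_badram is an illustrative helper name.
valid_badram() {
    printf '%s\n' "$1" |
        grep -Eq '^0x[0-9a-fA-F]+,0x[0-9a-fA-F]+(,0x[0-9a-fA-F]+,0x[0-9a-fA-F]+)*$'
}

if valid_badram "$GRUB_BADRAM"; then
    echo "GRUB_BADRAM looks well-formed"
else
    echo "GRUB_BADRAM is malformed; do not run update-grub yet"
fi
```

After editing /etc/default/grub, run update-grub for the setting to take effect.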

More information on running Linux with broken memory can be found here.

When I tried this out, I did not use the proper memory address syntax, so my computer failed to boot. What made things even worse was that my hard drive was encrypted. Luckily, I could still access grub, and after many trials and tribulations, I fixed the problem by booting the computer with an Ubuntu live disk (usb), mounting the first, unencrypted partition (/dev/sda1) of the hard drive that stored /boot, and removing the badram option in /boot/grub/grub.cfg (replace “boot” with the mount path). Before figuring out the solution, I was trying to mount /dev/sda5, the encrypted partition, according to this and this, as I thought that was where /boot resided. I also thought I had to generate a new initrd image. Luckily I didn’t have to (and didn’t succeed in trying), as that would have further complicated my boot options, as I have experienced in the past.

After removing the bad ram, conkeror still crashed for me. Either something is wrong with other pieces of my hardware or something is going on with the xulrunner sucking up my system resources. I was able to stop the crashes by placing this in my conkeror rc file.

Build 32 bit R on 64 bit Ubuntu by utilizing chroot

In the past, I’ve described how one could build multiarch (64 bit and 32 bit) versions of R on a 64 bit Ubuntu machine. The method based on this thread no longer works as of R 2.13 or 2.14 I believe. I received advice from someone on #R over on freenode (forgot who) a few months ago that suggested the chroot route (see this also). I recently tried it and wanted to document the procedures. Although the solution isn’t as nice as the previous multiarch route, it will suffice for now. With the chroot method, first compile the 64 bit version of R the usual way. For the 32 bit version of R, do:

#### change my.username to your username, or modify path per your taste

### create chroot jail
sudo apt-get install dchroot debootstrap
sudo mkdir ~/chroot-R32
sudo emacs -q -nw /etc/schroot/schroot.conf
## paste the following in the file: (no quotes)
[natty]
description=Ubuntu Natty
location=/home/my.username/chroot-R32
priority=3
users=my.username
groups=sbuild
root-groups=root

## build a basic Ubuntu system in the chroot jail
sudo debootstrap --variant=buildd --arch i386 natty /home/my.username/chroot-R32 http://ubuntu.cs.utah.edu/ubuntu/ ## pick a mirror from https://launchpad.net/ubuntu/+archivemirrors

## copy my source locations for apt
sudo cp /etc/apt/sources.list /home/my.username/chroot-R32/etc/apt/sources.list ## edit this new file to keep only the needed sources

### do following steps whenever you need to access 32 bit R
## access to proc and dns
sudo mount -o bind /proc /home/my.username/chroot-R32/proc
sudo cp /etc/resolv.conf /home/my.username/chroot-R32/etc/resolv.conf
## go into jail; do this whenever you want
sudo chroot /home/my.username/chroot-R32
dpkg-architecture ## make sure system is i386
### now the root / location should reflect the jail

### following happens in jail
## tools needed to build R
apt-get install gcc g++ gfortran libreadline-dev libx11-dev xorg-dev
## get svn to get latest r source code
apt-get install git-core subversion

## compile 32 bit R
cd home/
mkdir R32
cd R32
svn checkout https://svn.r-project.org/R/trunk/ r-devel
cd r-devel/
apt-get install rsync
./tools/rsync-recommended
./configure
make
make install
R

How big is my /home/my.username/chroot-R32 folder? It is at 791 MB after the above steps. Let me know if you have suggestions for having both 32 bit and 64 bit R concurrently on Linux. I believe R for Windows and Mac ships with both 32 bit and 64 bit builds. I’m surprised this isn’t the case for Linux.