Scheduled Parallel Computing with R: R + Rmpi + OpenMPI + Sun Grid Engine (SGE)

Recently I’ve learned how to do parallel computing in R on a cluster of machines, thanks to the R packages snowfall, snow, and Rmpi. I’ve been using the SOCKET method with snowfall since together they make things simple. With these tools, I can reduce day- or week-long jobs to hours or a day by spreading the work across many (100) cores/CPUs.

However, the system admins would prefer that I do things through the Sun Grid Engine (SGE) or one of their other job schedulers, since clusters are usually a shared resource, and having “rogue” jobs like mine hog all the resources isn’t really a good thing. Aside from scheduling jobs, another great thing about SGE is that it determines which (idle) nodes to use, so I don’t have to come up with the list of nodes for R myself.

Luckily, people have attacked this problem already. First, Revolutions Computing has an internal document, “sge-snow.pdf,” that gives instructions on how to install R, Rmpi, OpenMPI, and SGE so that they all work together. If you email them and ask for it, they are more than willing to share it. After things are installed, here is how to get everything working.

Rmpi with OpenMPI and SGE via qsub:

First, copy the content of the Rprofile file that is packaged with Rmpi into ~/.Rprofile (a quick way to do this from within R is sketched after the mpirun command below). Then place the following in a shell script to be submitted by qsub (an example script is at the end):

mpirun -np 51 R --no-save -q < SGEtest.R > SGEtest.Rout
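
For the copy step above, a minimal sketch from an ordinary R session, assuming Rmpi is already installed and visible on .libPaths():

# locate the Rprofile shipped with Rmpi and copy it to ~/.Rprofile
rmpi_profile <- system.file("Rprofile", package = "Rmpi")
file.copy(rmpi_profile, file.path(Sys.getenv("HOME"), ".Rprofile"), overwrite = TRUE)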

NOTE: 51 is the number of cores/CPUs to use, 1 master + 50 slaves. Inside the R script, do not use anything that belongs to snow or snowfall; use only Rmpi’s functions. Also, because mpirun spawns the slaves for us, we do NOT need to spawn them ourselves, and we do not need to call library(Rmpi) either. Put the following in the R script (SGEtest.R) to see that things are running:

mpi.remote.exec(paste("I am",mpi.comm.rank(),"of",mpi.comm.size()))
mpi.remote.exec(paste("I am",Sys.info(),"of",mpi.comm.size()))

snow with OpenMPI and SGE via qsub:

First, put the location of the RMPISNOW executable from the snow package on your PATH (or use the full path wherever you see RMPISNOW on the command line). DO NOT put the Rprofile from Rmpi into ~/.Rprofile. Place the following in the shell script to be submitted by qsub:

mpirun -np 21 RMPISNOW < SGEtest2.R > SGEtest2.Rout

In the R script, use only snow functions (not Rmpi or snowfall). No need to call library(snow). Put the following in the R script (SGEtest2.R) to test:

cl <- makeCluster()
print(clusterCall(cl, function() Sys.info()))
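
As in the Rmpi case, a real script would do some work on the cluster and then shut it down, using only snow functions; a minimal sketch:

# toy computation on the workers started by RMPISNOW
out <- parSapply(cl, 1:200, function(i) mean(rnorm(1e4, mean = i)))
print(summary(out))

stopCluster(cl) # shut the workers down cleanly before the job exits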

snow with OpenMPI and SGE via qrsh (interactive)

Similar to case 2, but run

qrsh -V -q int64 mpirun -np 9 RMPISNOW

instead of the qsub command to get an interactive session.
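
Once the interactive R session comes up, the same snow calls from case 2 apply; for example:

cl <- makeCluster() # picks up the workers started by mpirun
print(clusterCall(cl, function() Sys.info()[["nodename"]]))
stopCluster(cl)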

Sample script for SGE

For the first two cases, a sample openMPI_R.sh script is:

#!/bin/bash

# here's the SGE directives
# ------------------------------------------
#$ -q longbat-adc # <- the name of the queue you want to submit to
#$ -pe openmpi 51 # <- load the openmpi parallel environment w/ 51 slots
#$ -S /bin/bash # <- run the job under bash
#$ -N MPI-SGE # <- name of the job in the qstat output
#$ -o MPI-SGE.out # <- name of the output file.
#$ -e MPI-SGE.stderr # <- name of the stderr file.

module load R/2.10.0
echo "calling mpirun now"
## mpirun -np 51 R --no-save -q < SGEtest.R > SGEtest.Rout
mpirun -np 21 RMPISNOW < SGEtest2.R > SGEtest2.Rout
## call via: qsub openMPI_R.sh

Finally, I would like to point out that snowfall does not currently work with this SGE setup because it requires a call to sfInit(), which conflicts with the cluster already created by mpirun. This made me learn some functions from snow, which aren’t all that different from snowfall’s.

Also, there is an rsge package for R that seems to work too.

UPDATE 1/25/2010

  1. We don’t need to specify -np 51 in the mpirun command. If we omit it, SGE passes this information directly to OpenMPI.
  2. I tried installing this myself. A few things to note:

     a. Compile OpenMPI with the --with-sge flag.

     b. Place the bin directory of OpenMPI in PATH if it is installed in a non-standard place. Also remember to put the directory where RMPISNOW resides into PATH as well.

     c. Install Rmpi with

     R CMD INSTALL Rmpi_0.5-8.tar.gz --library=~/Rlib --configure-args="--with-mpi=/home/vqnguyen/openmpi-1.4.1-vqn/"

     or specify the MPI_ROOT environment variable as /home/vqnguyen/openmpi-1.4.1-vqn.

     d. Place "export LD_LIBRARY_PATH=/path/to/libmpifolder:$LD_LIBRARY_PATH" in .bashrc if the variable does not already include that directory; this is required for library(Rmpi) to work. Also place .libPaths("~/Rlib") in RMPISNOWprofile so that my Rmpi installation can be found.

     e. Set up a parallel environment in SGE, either with qmon or on the command line with:

$ qconf -Ap openmpi.config

where openmpi.config contains the following:

openmpi configuration:
===============================
pe_name openmpi
slots 666
user_lists arusers
xuser_lists NONE
start_proc_args /bin/true
stop_proc_args /bin/true
allocation_rule $round_robin
control_slaves TRUE
job_is_first_task FALSE
urgency_slots min
accounting_summary TRUE
===============================

You can name the PE anything and set the number of slots as needed. Make sure the user list has you in it. Also, make sure you add the queues you want to work with to this PE (e.g., by adding openmpi to each queue’s pe_list via qconf -mq <queue name>).

  3. Specifying an outfile in the makeCluster() call doesn’t do anything with RMPISNOW, since the cluster is already created when RMPISNOW is invoked. If we look at RMPISNOWprofile, we see that the workers’ output is sent to /dev/null via sink(). I tried a few ways to get the workers’ output out, such as calling sink() on each worker via clusterEvalQ, or setting the OUT or R_SNOW_OUTFILE variables (see RMPInode.R and RMPInode.sh). What worked for me was:
clusterEvalQ(cl, sinkWorkerOutput("nodes.out"))
  4. Of course, make sure you have passwordless ssh set up. If you get host key messages (i.e., having to type yes to accept a key) and your job doesn’t run, put

StrictHostKeyChecking no

in ~/.ssh/config. Also check the stderr file from your SGE job.

About Vinh Nguyen

Statistician

13 comments

  1. Pingback: piroyon
  2. Pingback: Crunch.io
  3. hi

    i am also interested in running Rmpi/snow across a few nodes, interactively like your case 3. i have followed your instructions closely, but when i run RMPISNOW, functions like makeCluster() are not available. any ideas on what has gone wrong?

    thanks

    1. The RMPISNOW script should have loaded the snow library, which makes makeCluster available. If you don’t see makeCluster, then snow has not been loaded.

      What exactly did you type on the command line to start R? What did you type in R? What are the outputs?

  4. hi

    thanks for the reply. i typed:

    qrsh -V -q test.q mpirun -np 9 RMPISNOW

    and get into an R session. seems the snow library is not loaded, as you said. if I type cl <- makeCluster(), or simply makeCluster(), the R session quits saying the function is not found:

    cl <- makeCluster()
    Error: could not find function "makeCluster"
    Execution halted

    1. Haven’t done this in a while, but after looking in the installed snow library, besides RMPISNOW, there is also a file called RMPISNOWprofile. If I remember correctly, you need to place that content in your ~/.Rprofile file. That should activate the snow library and set up each node correctly.

      With that said, qsub is the preferred method if you are on a truly shared SGE cluster (so resources get scheduled and allocated).

  5. hi, thanks for your help and i have made it work. but there are some issues:

    • if there is an error, it crashes and the R session is closed entirely
    • no tab completion
    • cannot use arrow keys to navigate text (either moving left/right to amend the current line or moving up/down to search through the command history)
    • prints result of every query/statement/call
    • help (?functionName) doesn’t work

    please can you help?

  6. For error logging, I recommend using the “sink” function to log output. Actually, you can even do so in one of the snowfall commands by specifying the log file.

    You really should not be testing code when you run it in parallel; you should have debugged all your code in a regular R session before moving to parallel execution. Once that is done, I recommend running the code via “qsub”, letting the computations complete, and saving your R session. Once done, you can load the saved session to do more “exploratory” stuff.

    Command history and help on functions aren’t really suited to an R session started with “qrsh”.

  7. Dear Vinh, I wonder whether you can give me a step-by-step guide to installing snowfall + sfcluster at the server and client level, and how to set up passwordless ssh.

    1. Instructions were provided in this very post. First compile OpenMPI with the SGE option if you’re using grid engine. Then compile Rmpi and link it against the appropriate OpenMPI. If you use SGE, set up your parallel environment.

      Google “passwordless ssh” to set it up.

  8. It doesn’t look like this line is correct:

    mpi.remote.exec(paste("I am",Sys.info()1,"of",mpi.comm.size()))

    as it returns Error: unexpected numeric constant in Sys.info()1. Instead, it should perhaps be something like Sys.info().
