...
For map/reduce-style computations, look at Rhipe or Segue. “R at 12,000 Cores” describes the Programming with Big Data in R (pbdR) project. For batch jobs, StarCluster may be a better choice; Brian Holt wrote up a document on using R with StarCluster (/wiki/spaces/IT/pages/7867417). Other documents give an overview of the Distributed Computation Strategy and explain how to run Distributed Compute Jobs.
Starting a cluster with Bioconductor and CloudFormation
...
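```r
library(parallel)

# Grab the host IPs. Each line of the hostfile is "<host> <cores>".
lines <- readLines("/usr/local/Rmpi/hostfile.plain")
lines <- lines[nzchar(trimws(lines))]  # ignore blank lines

# We'll want to start a worker for each core on each machine in the
# cluster, so repeat each host name once per core.
hosts <- do.call(c, lapply(strsplit(lines, " "), function(host) {
  rep(host[1], as.integer(host[2]))
}))

help(package = parallel)  # browse the package documentation

# Start one R worker per entry in `hosts` (via ssh for remote machines).
cl <- makePSOCKcluster(hosts)
```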
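To check the host-expansion step without access to the cluster, here is a small sketch that runs the same logic on an in-memory sample instead of reading `/usr/local/Rmpi/hostfile.plain`. It assumes each hostfile line has the form `<host> <cores>`, as the `host[1]`/`host[2]` indexing suggests; the helper `expand_hosts()` and the IP addresses are hypothetical.

```r
# ASSUMPTION: hostfile lines look like "<host> <number of cores>".
# The addresses below are made up for illustration.
sample_lines <- c("10.0.0.11 2", "10.0.0.12 3")

# Hypothetical helper: repeat each host name once per core.
expand_hosts <- function(lines) {
  lines <- lines[nzchar(trimws(lines))]              # skip blank lines
  fields <- strsplit(trimws(lines), "[[:space:]]+")  # tolerate extra spaces
  unlist(lapply(fields, function(f) rep(f[1], as.integer(f[2]))))
}

hosts <- expand_hosts(sample_lines)
print(hosts)
# [1] "10.0.0.11" "10.0.0.11" "10.0.0.12" "10.0.0.12" "10.0.0.12"
```

Five entries means `makePSOCKcluster(hosts)` would start five workers: two on the first machine and three on the second.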
Note that the parallel package is perfectly happy starting up several copies of R on a single machine, which can be helpful for testing.
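A minimal sketch of that local mode: passing a number to `makePSOCKcluster()` starts that many workers on `localhost`, with no hostfile or ssh involved. The worker count of 2 is an arbitrary choice for illustration.

```r
library(parallel)

# Start 2 worker copies of R on this machine.
cl <- makePSOCKcluster(2)

# Each worker is a separate R process, so each reports its own PID.
pids <- unlist(clusterEvalQ(cl, Sys.getpid()))

stopCluster(cl)  # always shut the workers down when done

print(length(unique(pids)))        # number of distinct worker processes
print(all(pids != Sys.getpid()))   # none of them is this (master) process
```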
Simple tests
Try a few simple tests to make sure we're able to evaluate code on the workers and that running in parallel actually buys us some speed.
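One way to run such tests is sketched below. It assumes a 2-worker local cluster stands in for the real one (on the cluster, use `cl <- makePSOCKcluster(hosts)` instead): first ask each worker for its node name, then time a deliberately slow, hypothetical `slow_task()` serially with `lapply()` and in parallel with `parLapply()`.

```r
library(parallel)

# ASSUMPTION: a local 2-worker cluster stands in for the real cluster.
cl <- makePSOCKcluster(2)

# 1. Can the workers evaluate code? Ask each one which machine it is on.
nodes <- unlist(clusterEvalQ(cl, Sys.info()[["nodename"]]))
print(nodes)

# 2. Does it buy us speed? Hypothetical slow task: sleep 0.5 s per item.
slow_task <- function(i) {
  Sys.sleep(0.5)
  i^2
}

serial_time <- system.time(
  serial <- lapply(1:4, slow_task)
)[["elapsed"]]

parallel_time <- system.time(
  par_result <- parLapply(cl, 1:4, slow_task)
)[["elapsed"]]

stopCluster(cl)

# Same answers either way; only the wall-clock time should differ.
stopifnot(isTRUE(all.equal(serial, par_result)))
cat(sprintf("serial: %.1f s, parallel: %.1f s\n", serial_time, parallel_time))
```

With 2 workers, the four half-second tasks should take roughly 1 s instead of 2 s, plus some communication overhead. On the real cluster, the node names should list the cluster machines rather than the local host.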
...