
...

For map/reduce style computations, look at Rhipe or Segue. R at 12,000 Cores describes the "Programming with Big Data in R" project (pbdR). For batch jobs, StarCluster may be a better choice; Brian Holt wrote up a document on using R with StarCluster (/wiki/spaces/IT/pages/7867417). Other documents give an overview of Distributed Computation Strategy and explain how to run Distributed Compute Jobs.

Note that the parallel package is perfectly happy starting up several copies of R on a single machine, which can be helpful for testing.
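For example, a minimal local test might look like the sketch below. It passes a number instead of host names to `makePSOCKcluster`, which starts that many workers on localhost; the choice of 2 workers is arbitrary.

```r
library(parallel)

# Start two worker R processes on this machine (no hostfile needed);
# 2 is an arbitrary worker count chosen for testing.
cl <- makePSOCKcluster(2)

# Each worker reports its own process ID, so we expect two distinct PIDs.
pids <- unlist(clusterEvalQ(cl, Sys.getpid()))
print(pids)

# Shut the workers down when finished.
stopCluster(cl)
```

Once this works locally, swapping the worker count for a vector of host names is the only change needed to run on real machines.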

Starting a cluster with Bioconductor and CloudFormation

...


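# grab host IPs; each line of the hostfile is "<host> <number of cores>"
lines <- readLines("/usr/local/Rmpi/hostfile.plain")
lines <- trimws(lines[nzchar(trimws(lines))])  # drop blank lines

# we'll want to start a worker for each core on each
# machine in the cluster, so repeat each host once per core
hosts <- unlist(lapply(strsplit(lines, "[[:space:]]+"), function(host) {
  rep(host[1], as.integer(host[2]))
}))

library(parallel)
help(package = parallel)  # lists the functions the package provides

# start one R worker process per entry in `hosts`
cl <- makePSOCKcluster(hosts)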
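Before launching workers, it can help to check the host expansion on its own. The sketch below runs the same one-worker-per-core expansion on an in-memory stand-in for the hostfile; the IP addresses and core counts are hypothetical.

```r
# Hypothetical hostfile contents in the "<host> <cores>" format;
# the addresses and core counts are made up for illustration.
lines <- c("10.0.0.11 2", "10.0.0.12 3")

# Repeat each host name once per core.
hosts <- unlist(lapply(strsplit(lines, "[[:space:]]+"), function(host) {
  rep(host[1], as.integer(host[2]))
}))

print(hosts)
# Count workers per host: 2 on the first host, 3 on the second.
print(table(hosts))
```

If the counts per host don't match the number of cores you expect, fix the hostfile before calling `makePSOCKcluster`.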


Simple tests

Try a few simple tests to make sure we're able to evaluate code on the workers and that it buys us some speed.
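Here is one possible pair of tests, sketched with a 2-worker local cluster so it runs anywhere; on the real cluster, reuse the `cl` built from the hostfile instead. The 1-second `slow_task` is a hypothetical stand-in for real work.

```r
library(parallel)

# Self-contained setup: 2 local workers. On the cluster, use the
# `cl` created from the hostfile instead.
cl <- makePSOCKcluster(2)

# Test 1: can the workers evaluate code? Each returns its host name and PID.
who <- clusterEvalQ(cl, c(host = Sys.info()[["nodename"]], pid = Sys.getpid()))
print(do.call(rbind, who))

# Test 2: does it buy us speed? Each task just sleeps for 1 second, so the
# serial run should take about 4 s and the 2-worker run about 2 s.
slow_task <- function(i) { Sys.sleep(1); i^2 }

serial_time   <- system.time(serial     <- lapply(1:4, slow_task))[["elapsed"]]
parallel_time <- system.time(par_result <- parLapply(cl, 1:4, slow_task))[["elapsed"]]

cat("serial:", serial_time, "s  parallel:", parallel_time, "s\n")

# Both runs should produce the same results.
stopifnot(identical(serial, par_result))

stopCluster(cl)
```

Sleeping tasks isolate the scheduling overhead from CPU contention, so the timings mainly show whether work is really being spread across the workers.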

...
