
...

For map/reduce-style computations, look at RHIPE or Segue. For batch jobs, StarCluster may be a better choice. Brian Holt wrote up a document on using R on StarCluster called /wiki/spaces/IT/pages/7867417. Other documents give an overview of Distributed Computation Strategy and how to run Distributed Compute Jobs.

Note that the parallel package is perfectly happy starting up several copies of R on a single machine, which can be helpful for testing.
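
As a minimal sketch of that kind of single-machine test, the snippet below starts two local R worker processes with the parallel package, runs a trivial function on them, and shuts them down. The worker count of 2 and the squaring function are arbitrary choices for illustration, not something prescribed by this page.

```r
library(parallel)

# Start two R worker processes on the local machine (a PSOCK cluster).
# The count of 2 is an arbitrary choice for this illustration.
cl <- makeCluster(2)

# Send a function to the workers; results come back as a list in input order.
squares <- parLapply(cl, 1:4, function(x) x^2)

# Shut the worker processes down when finished.
stopCluster(cl)

print(unlist(squares))
```

The same parLapply call should work unchanged once the cluster object points at remote workers, which is what makes a local run useful as a rehearsal.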

Starting a cluster with Bioconductor and CloudFormation

The Bioconductor group has put together a CloudFormation stack for doing interactive parallel computing in R on AWS. Follow their instructions, selecting the number of workers and the size of the EC2 instances. Once the stack comes up, which took about 10 minutes for me, log into RStudio on the head node. From there, you start R processes on the worker nodes and send commands to them.

stack name: StartBioCParallelClusterWithSSH
template url: https://s3.amazonaws.com/bioc-cloudformation-templates/parallel_cluster_ssh.json

After starting the CloudFormation script, you'll get a URL for a head node running RStudio. Click on that and continue.

Starting a cluster

The IP addresses of the workers (and the head node) get stored on the head node in a file. We'll read that file and create an R process for each core on each worker.
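
The step described above might look roughly like the following sketch. Several things here are assumptions rather than details from this page: the file path is a placeholder, the file is assumed to hold one address per line with the head node first, the number of cores per worker is hard-coded to 2, and `worker_spec` is a hypothetical helper name. It also assumes the head node can reach the workers over SSH without a password prompt.

```r
library(parallel)

# Placeholder path: this page does not name the file that holds the addresses.
host_file <- "/path/to/hostfile"

# Hypothetical helper: build one entry per core per worker,
# dropping the head node's own address and any blank lines.
worker_spec <- function(hosts, head_ip, cores_per_worker) {
  workers <- setdiff(unique(trimws(hosts)), c(head_ip, ""))
  rep(workers, each = cores_per_worker)
}

if (file.exists(host_file)) {
  hosts <- readLines(host_file)

  # Assumptions: the first line is the head node and each worker has 2 cores.
  spec <- worker_spec(hosts, head_ip = hosts[1], cores_per_worker = 2)

  # makePSOCKcluster launches one R process per entry in spec (over SSH).
  cl <- makePSOCKcluster(spec)

  # Ask each worker process which machine it is running on.
  print(clusterEvalQ(cl, Sys.info()[["nodename"]]))

  stopCluster(cl)
}
```

Repeating a host name in the specification is what yields several R processes on the same machine, so the `each` argument controls how many processes land on each worker.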

...
