...

Code Block
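# Prepend a writable library directory on every worker so packages can be installed there.
clusterEvalQ(cl, { .libPaths( c('/home/ubuntu/R/library', .libPaths()) ) })

# Install and load the package on every worker.
# library() raises an error if the install failed, whereas require() only returns FALSE.
clusterEvalQ(cl, {
    install.packages("someUsefulPackage")
    library(someUsefulPackage)
})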

Loading Sage packages

Code Block
clusterEvalQ(cl, {
    options(repos=structure(c(CRAN="http://cran.fhcrc.org/")))
    source('http://depot.sagebase.org/CRAN.R')

    pkgInstall("synapseClient")
    pkgInstall("predictiveModeling")
    
    library(synapseClient)
    library(predictiveModeling)
})

Loading synapse entities

Logging in.

Code Block
clusterEvalQ(cl, { synapseLogin('joe.user@mydomain.com','secret') })

Asking many worker nodes to load packages and request Synapse entities at once is a fun and easy way to mount a distributed denial-of-service attack on the repository service. The service copes by timing out requests, so some workers will succeed while others fail. A couple of tricks help smooth over these problems:

  1. Check whether our target data already exists. That way, we can retry after a partial failure without redoing work and needlessly thrashing Synapse.
  2. Throw in a few random seconds of rest for each worker. This spreads out the load on Synapse.
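
The two tricks above can be sketched roughly as follows. This is a minimal illustration, not the code used on the cluster: `fetch_once`, `fetch_fn`, `max_tries`, and `max_rest` are hypothetical names, and `fetch_fn` stands in for whatever call actually downloads a Synapse entity and writes it to `target_path`.

```r
# Hypothetical helper illustrating both tricks; not part of synapseClient.
# fetch_fn(target_path) is assumed to download the data and write it to target_path.
fetch_once <- function(target_path, fetch_fn, max_tries = 3, max_rest = 5) {
  # Trick 1: if the target data already exists, skip the request entirely.
  if (file.exists(target_path)) {
    return("cached")
  }
  for (attempt in seq_len(max_tries)) {
    # Trick 2: rest a random number of seconds so workers do not all hit the service at once.
    Sys.sleep(runif(1, min = 0, max = max_rest))
    # Treat any error (for example, a timed-out request) as a failed attempt.
    ok <- tryCatch({
      fetch_fn(target_path)
      TRUE
    }, error = function(e) FALSE)
    if (ok && file.exists(target_path)) {
      return("fetched")
    }
  }
  "failed"
}
```

On the cluster, one option would be to ship the helper to the workers with `clusterExport(cl, "fetch_once")` and then call it inside `clusterEvalQ`. Because the existence check comes first, re-running the same call after a partial failure only repeats the work on the workers that failed.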

...

isn't a recommended or scalable approach.

Instead, see Configuration of Cluster for Scientific Computing for an example of connecting a shared EBS volume to the nodes. How to do this in the context of a CloudFormation stack has yet to be worked out.

Accessing source code repos on worker nodes

...