...
```r
# set lib path to install packages
clusterEvalQ(cl, {
  .libPaths(c('/home/ubuntu/R/library', .libPaths()))
})
clusterEvalQ(cl, {
  install.packages("someUsefulPackage")
  require(someUsefulPackage)
})
```
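The `.libPaths()` call only takes effect if `/home/ubuntu/R/library` already exists on the worker. `.libPaths()` keeps only existing directories and silently drops the rest. The following is a minimal sketch that assumes a fresh node might not have that directory yet, so it creates the directory before prepending it. `use_user_library` is a hypothetical helper name.

```r
# Hypothetical helper: prepend a library directory to the search path,
# creating it first. .libPaths() keeps only directories that already exist,
# so without dir.create() a missing path would be dropped with no warning.
use_user_library <- function(lib_dir) {
  dir.create(lib_dir, recursive = TRUE, showWarnings = FALSE)
  .libPaths(c(lib_dir, .libPaths()))
  invisible(.libPaths())
}

# Usage on the cluster `cl` (not run here). clusterCall() ships the
# function itself to each worker, so no clusterExport() is needed:
# library(parallel)
# clusterCall(cl, use_user_library, '/home/ubuntu/R/library')
```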
Loading Sage packages
```r
clusterEvalQ(cl, {
  options(repos=structure(c(CRAN="http://cran.fhcrc.org/")))
  source('http://depot.sagebase.org/CRAN.R')
  pkgInstall("synapseClient")
  pkgInstall("predictiveModeling")
  library(synapseClient)
  library(predictiveModeling)
})
```
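When installs time out under load, `clusterEvalQ()` stops with an error as soon as `library()` fails on any one worker. That error hides which workers are actually ready. This minimal sketch assumes the cluster object `cl` used throughout. It asks every worker a yes/no question that cannot itself fail, then collects the indices of the workers that did not answer `TRUE`. The helper `failed_workers` is a hypothetical name, not part of any package.

```r
# Hypothetical helper: `results` is the list returned by clusterEvalQ(),
# with one element per worker. Returns the indices of the workers whose
# result is not exactly TRUE (FALSE, NULL, a try-error object, ...).
failed_workers <- function(results) {
  ok <- vapply(results, isTRUE, logical(1))
  which(!ok)
}

# Usage on the cluster `cl` (not run here):
# library(parallel)
# status <- clusterEvalQ(cl,
#   all(c("package:synapseClient", "package:predictiveModeling") %in% search()))
# bad <- failed_workers(status)
# if (length(bad) > 0) {
#   # cl[bad] is a smaller cluster holding only the workers that need a retry;
#   # re-run the install-and-load block on it instead of on all of cl
# }
```

Retrying only on `cl[bad]` avoids sending the healthy workers back to the repository.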
Loading Synapse entities
Logging in
```r
clusterEvalQ(cl, {
  synapseLogin('joe.user@mydomain.com', 'secret')
})
```
Asking many worker nodes to load packages and request Synapse entities at once is a fun and easy way to mount a distributed denial-of-service attack on the repository service. The service copes by timing out requests, so some workers will succeed while others fail. A couple of tricks help smooth over these problems:
- Check whether our target data already exists. That way, we can retry after a partial failure without redoing work and needlessly thrashing Synapse.
- Give our workers a few seconds of random rest before each request. This spreads out the load on Synapse.
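Both tricks can be combined in one small wrapper. The sketch below is one way to do it, not the page's own code. The helper name `fetch_with_retry` and its arguments are hypothetical. `task` stands for whatever call fetches the data, and `done` stands for whatever check says the data is already there.

```r
# Hypothetical helper: skip the work if `done()` says the target already
# exists; otherwise rest a random number of seconds, then try `task()`,
# retrying on error up to `max_tries` times.
fetch_with_retry <- function(task, done, max_tries = 5, max_wait = 10) {
  for (attempt in seq_len(max_tries)) {
    if (done()) return(TRUE)                      # data already there: no work
    Sys.sleep(runif(1, min = 0, max = max_wait))  # random rest spreads out the load
    ok <- tryCatch({
      task()
      TRUE
    }, error = function(e) {
      message("attempt ", attempt, " failed: ", conditionMessage(e))
      FALSE
    })
    if (ok) return(TRUE)
  }
  FALSE                                           # every attempt failed
}
```

On the cluster, one would `clusterExport(cl, "fetch_with_retry")` and call it inside `clusterEvalQ()`. In that call, `task` wraps the Synapse request and `done` checks for its result, for example with `exists()` or `file.exists()`. Because the check runs before every attempt, re-running the whole step after a partial failure only repeats the work on workers that are still missing their data.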
...
isn't a recommended or scalable approach.
Instead, see Configuration of Cluster for Scientific Computing for an example of attaching a shared EBS volume to the nodes. How to do this within a CloudFormation stack is yet to be figured out.
Accessing source code repos on worker nodes
...