...
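```r
# Set the library path on every worker so packages install to a writable location.
# .libPaths() silently drops directories that don't exist, so create it first.
clusterEvalQ(cl, {
  dir.create('/home/ubuntu/R/library', recursive = TRUE, showWarnings = FALSE)
  .libPaths(c('/home/ubuntu/R/library', .libPaths()))
})

# Install and load a package on every worker. Non-interactive sessions
# may need an explicit CRAN mirror.
clusterEvalQ(cl, {
  install.packages("someUsefulPackage", repos = "http://cran.fhcrc.org/")
  require(someUsefulPackage)
})
```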
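The snippets on this page assume that a cluster object `cl` already exists. Below is a minimal sketch of creating one and checking the library path on each worker. It assumes the base `parallel` package, with a local two-worker PSOCK cluster standing in for the real nodes. A temporary directory stands in for `/home/ubuntu/R/library`; the names `lib_dir` and `worker_paths` are illustrative only.

```r
library(parallel)

# Stand-in for the shared library directory ('/home/ubuntu/R/library' on the
# real nodes). .libPaths() silently drops directories that don't exist,
# so create it first.
lib_dir <- file.path(tempdir(), "R-library")
dir.create(lib_dir, recursive = TRUE, showWarnings = FALSE)

# Two local worker processes stand in for the cluster nodes.
cl <- makeCluster(2)

# Copy lib_dir into each worker's global environment, prepend it to the
# worker's library search path, and report the first entry.
clusterExport(cl, "lib_dir")
worker_paths <- unlist(clusterEvalQ(cl, {
  .libPaths(c(lib_dir, .libPaths()))
  .libPaths()[1]
}))

stopCluster(cl)
print(worker_paths)
```

`.libPaths()` normalizes the paths it stores, so compare against `normalizePath(lib_dir, winslash = "/")` rather than the raw string.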
...
Sage packages
```r
clusterEvalQ(cl, {
  options(repos = structure(c(CRAN = "http://cran.fhcrc.org/")))
  source('http://depot.sagebase.org/CRAN.R')
  pkgInstall("synapseClient")
  pkgInstall("predictiveModeling")
  library(synapseClient)
  library(predictiveModeling)
})
```
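`pkgInstall()` comes from the Sage script sourced above, and its behaviour isn't described here. If you rerun setup after a partial failure, you may want each install step to be idempotent. One option is a hypothetical `ensure_package()` helper, which is not part of synapseClient or the Sage script. It only calls `install.packages()` when `requireNamespace()` can't find the package. The sketch below runs on a local two-worker cluster and uses the base package `stats`, so nothing is downloaded.

```r
library(parallel)

# Hypothetical helper: install a package only if it isn't already available,
# then attach it and report whether it is on the search path.
ensure_package <- function(pkg, repos = "http://cran.fhcrc.org/") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repos)
  }
  library(pkg, character.only = TRUE)
  pkg %in% .packages()
}

cl <- makeCluster(2)
clusterExport(cl, "ensure_package")

# 'stats' ships with R, so no download happens in this demo.
attached <- unlist(clusterEvalQ(cl, ensure_package("stats")))

stopCluster(cl)
print(attached)
```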
Logging workers into Synapse:
```r
clusterEvalQ(cl, {
  synapseLogin('joe.user@mydomain.com', 'secret')
})
```
Asking many worker nodes to load packages and request Synapse entities isn't a recommended or scalable approach (instead, see …). Having many workers request Synapse entities at once is a fun and easy way to mount a distributed denial-of-service attack on the repository service. The service copes by timing out requests, so some workers succeed while others fail. A couple of tricks help smooth over these problems:
- Check whether the target data already exists. That way, we can retry after a partial failure without redoing work and unnecessarily thrashing Synapse.
- Give each worker a few random seconds of rest before it makes its request. This spreads out the load on Synapse.
```r
clusterEvalQ(cl, {
  if (!exists('expr')) {
    Sys.sleep(runif(1, 0, 5))
    expr_entity <- loadEntity('syn269056')
    expr <- expr_entity$objects$eSet_expr
  }
})
```
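The random initial pause only staggers the first wave of requests. A worker whose request still times out just fails. One way to let it try again is to wrap the call in a retry loop that pauses for a fresh random interval between attempts. Below is a minimal sketch using a hypothetical `with_retry()` helper, which is not part of synapseClient. A deliberately flaky function stands in for `loadEntity()`.

```r
# Hypothetical helper: call f(), and if it signals an error (for example a
# Synapse request that timed out), pause a random 0..max_wait seconds and try
# again, up to max_tries attempts in total.
with_retry <- function(f, max_tries = 3, max_wait = 5) {
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    if (attempt < max_tries) Sys.sleep(runif(1, 0, max_wait))
  }
  stop("all ", max_tries, " attempts failed; last error: ",
       conditionMessage(result))
}

# Demo: a stand-in function that fails twice before succeeding.
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("simulated timeout")
  "loaded"
}

value <- with_retry(flaky, max_tries = 5, max_wait = 0.1)
print(c(value = value, calls = calls))
```

To use this on the workers, ship the helper first with `clusterExport(cl, "with_retry")`. It could then wrap the `loadEntity('syn269056')` call inside the `if (!exists('expr'))` block, keeping the initial `Sys.sleep()` to stagger the first requests.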
Attaching a shared EBS volume
It might be worth attaching a shared EBS volume and adding it to R's .libPaths(). See Configuration of Cluster for Scientific Computing for an example of connecting a shared EBS volume to the nodes in StarCluster. How to do this in the context of a CloudFormation stack is yet to be figured out.
<<attached shared EBS volume for R packages and files>>
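A shared volume also means data only has to be fetched once rather than once per worker. Below is a minimal sketch of the file side of that, assuming every node sees the volume at the same path. The master saves an object once, and each worker reads it from disk instead of requesting it from Synapse. A temporary directory and a toy data frame stand in for the mounted volume and the real expression data. The names `shared_dir` and `expr.rds` are hypothetical.

```r
library(parallel)

# Stand-in for the mount point of the shared EBS volume.
shared_dir <- file.path(tempdir(), "shared")
dir.create(shared_dir, recursive = TRUE, showWarnings = FALSE)

# The master writes the data once...
saveRDS(data.frame(x = 1:3), file.path(shared_dir, "expr.rds"))

cl <- makeCluster(2)
clusterExport(cl, "shared_dir")

# ...and each worker loads it from the shared path (skipping the read if the
# object is already in its workspace), then reports the row count.
rows <- unlist(clusterEvalQ(cl, {
  if (!exists("expr")) expr <- readRDS(file.path(shared_dir, "expr.rds"))
  nrow(expr)
}))

stopCluster(cl)
print(rows)
```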
Accessing source code repos on worker nodes
...