...
clusterEvalQ(cl, { .libPaths( c('/home/ubuntu/R/library', .libPaths()) ) })
clusterEvalQ(cl, {
  install.packages("someUsefulPackage")
  require(someUsefulPackage)
})
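Re-running the install step on every worker each time is slow. One way around that is to install only when the package is missing. The following is a minimal sketch, not part of the original setup: `ensure_package` is a hypothetical helper name, and the CRAN mirror URL is an assumption you can swap for your own.

```r
# Hypothetical helper: install a package only if it is not already available,
# then attach it. The default mirror is an assumption; use whichever you prefer.
ensure_package <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repos)
  }
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# Local check with a package that ships with R, so nothing gets installed.
ok <- ensure_package("stats")
```

On a cluster, something like `clusterCall(cl, ensure_package, "someUsefulPackage")` should run the same helper on every worker.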
Loading Sage packages
clusterEvalQ(cl, {
  options(repos=structure(c(CRAN="http://cran.fhcrc.org/")))
  source('http://depot.sagebase.org/CRAN.R')
  pkgInstall("synapseClient")
  pkgInstall("predictiveModeling")
  library(synapseClient)
  library(predictiveModeling)
})
...
Loading Synapse entities
...
Logging in.
clusterEvalQ(cl, { synapseLogin('joe.user@mydomain.com','secret') })
Asking many worker nodes to request Synapse entities at once is a fun and easy way to mount a distributed denial-of-service attack on the repository service. The service deals with this by timing out requests, which means some workers will succeed while others fail. A couple of tricks will help smooth over these problems. First, we'll
- check whether our target data already exists. That way, we can retry after a partial failure without redoing work and unnecessarily thrashing Synapse.
...
- throw in a few random seconds of rest for our workers. This spreads out the load on Synapse.
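clusterEvalQ(cl, {
  # Skip the download if an earlier attempt already loaded the data
  if (!exists('expr')) {
    # Pause for a random 0-5 seconds so the workers don't all hit Synapse at once
    Sys.sleep(runif(1,0,5))
    expr_entity <- loadEntity('syn269056')
    expr <- expr_entity$objects$eSet_expr
  }
})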
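The snippet above still has to be re-issued by hand when some workers time out. A possible refinement is to have each worker retry on its own, with a random pause before every attempt. This is only a sketch: `load_with_retry` and `flaky_loader` are hypothetical names, and the stand-in loader replaces the real `loadEntity` call so the example runs without Synapse.

```r
# Hypothetical helper (not part of synapseClient): call `loader` up to
# `max_tries` times, sleeping a random interval before each attempt.
load_with_retry <- function(loader, max_tries = 3, max_wait = 5) {
  for (attempt in seq_len(max_tries)) {
    Sys.sleep(runif(1, 0, max_wait))  # jitter spreads out the requests
    result <- tryCatch(loader(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    message("attempt ", attempt, " failed: ", conditionMessage(result))
  }
  stop("all ", max_tries, " attempts failed")
}

# Stand-in loader that fails twice and then succeeds. On a real worker this
# would be something like function() loadEntity('syn269056').
calls <- 0
flaky_loader <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("request timed out")
  "expression data"
}

expr_demo <- load_with_retry(flaky_loader, max_tries = 5, max_wait = 0.1)
```

Keeping the `exists('expr')` guard around the call still matters, since workers that succeeded the first time then skip the download entirely.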
Accessing source code repos on worker nodes
-------------------------------------------
Getting code onto the worker nodes can be done like so:
clusterEvalQ(cl, {
system('svn export --no-auth-cache --non-interactive --username joe.user --password supeRsecRet77 https://sagebionetworks.jira.com/svn/COMPBIO/trunk/users/juser/fantasticAnalysis.R')
})
<<github example>>
Return values
-------------
Return values from distributed computations have to come across a socket connection, so be careful what you return. Status values such as dim(result) can confirm that a computation succeeded and are often better than returning a whole result.
clusterEvalQ(cl, {
  result <- produceGiantResultMatrix(foo, bar, bat)
  dim(result)
})
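To see the idea in action without an EC2 cluster, here is a small self-contained sketch. It assumes the `parallel` package that ships with R, whose `clusterEvalQ` behaves like the one used here, and a local two-worker cluster. The names `cl_demo`, `status` and `col_means` are made up for illustration, and the random matrix stands in for a real computation.

```r
library(parallel)

# Local two-worker cluster, used here only so the sketch runs anywhere.
cl_demo <- makeCluster(2)

status <- clusterEvalQ(cl_demo, {
  # Stand-in for an expensive computation; the result stays on the worker.
  result <- matrix(rnorm(1000 * 50), nrow = 1000, ncol = 50)
  # Send back a few numbers instead of the matrix itself.
  list(dim = dim(result), bytes = as.numeric(utils::object.size(result)))
})

# One small list per worker comes back over the socket.
str(status)

# The full matrix is still in each worker's global environment, so it can be
# summarised further (or fetched) later if it is really needed.
col_means <- clusterEvalQ(cl_demo, colMeans(result)[1:3])

stopCluster(cl_demo)
```

Because `clusterEvalQ` evaluates in each worker's global environment, `result` survives between calls, which is what makes the "return a status now, fetch details later" pattern work.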
Also, consider putting intermediate values in Synapse, which can serve as checkpoints for lengthy computations.
<<synapse example>>
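Until a Synapse example is filled in, the checkpointing pattern itself can be sketched with local files. This is not the Synapse API: `checkpoint` is a hypothetical helper that uses `saveRDS`/`readRDS` on the worker's disk as a stand-in for storing an entity, and `slow_step` is a toy computation.

```r
# Hypothetical helper: compute a value once and cache it on disk, so a
# re-run after a failure reuses the saved result instead of recomputing.
checkpoint <- function(path, compute) {
  if (file.exists(path)) return(readRDS(path))
  value <- compute()
  saveRDS(value, path)
  value
}

ckpt_file <- file.path(tempdir(), "step1.rds")
unlink(ckpt_file)  # start clean for the demonstration

runs <- 0
slow_step <- function() {
  runs <<- runs + 1  # count how often the computation really runs
  sum(1:100)
}

first  <- checkpoint(ckpt_file, slow_step)  # computes and saves
second <- checkpoint(ckpt_file, slow_step)  # reads the saved value
```

Local files disappear with the instance when the stack is deleted, which is the argument for putting checkpoints somewhere durable such as Synapse.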
Stopping a cluster
------------------
stopCluster(cl)
Don't forget to delete the stack in the AWS Management Console afterwards, or you will keep being charged for it.
To do
-----
* Spot instances? Is this worthwhile for interactive use?
* Create our own CloudFormation template
* Run a user-specified script on start-up