...
Code Block |
---|
# try something simple
ans <- unlist(clusterEvalQ(cl, { mean(rnorm(1000)) }), use.names=FALSE)
# test a time-consuming job
system.time(ans <- clusterEvalQ(cl, { sapply(1:1000, function(i) {mean(rnorm(10000))}) }))
# do the same thing locally
system.time(ans2 <- sapply(1:(1000*length(hosts)), function(i) {mean(rnorm(10000))}))
# use load balancing parallel lapply
n <- length(cl)*1000
system.time(ans <- parLapplyLB(cl, 1:n, function(x) { mean(rnorm(10000)) })) |
Head node vs. workers
Keep track of which commands run on the head node and which run on the workers. Many commands are best run on the head node. When it's time to do something in parallel, the workers need their own copies of any data objects they use, because each worker is a separate R process that cannot see the head node's workspace. You ship those objects with clusterExport, something like the following pattern:
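As a minimal, self-contained sketch of shipping objects to workers: this assumes the `parallel` package and a two-worker local cluster standing in for the real one, and the objects `x` and `scale_factor` are hypothetical names chosen only for illustration.

```r
library(parallel)

# Assumption: a small local cluster stands in for the real one; on the actual
# system, cl would already exist and span the worker hosts.
cl <- makeCluster(2)

# Objects created on the head node (hypothetical example data).
x <- rnorm(1000)
scale_factor <- 10

# Copy the named objects from the current environment to every worker.
clusterExport(cl, c("x", "scale_factor"), envir = environment())

# The workers can now refer to x and scale_factor by name.
res <- unlist(clusterEvalQ(cl, mean(x) * scale_factor), use.names = FALSE)

# The same computation on the head node, for comparison.
local_value <- mean(x) * scale_factor

stopCluster(cl)
```

Every worker receives an identical copy of the exported objects, so each element of `res` should match `local_value`. Exporting large objects takes time and memory on every worker, so it is usually worth exporting only what the parallel step needs.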
...