
Scientific Applications of MapReduce


First Example: Shotgun Stochastic Search

See also Getting Started with RHadoop on Elastic Map Reduce

All hosts in the Hadoop cluster must receive the following (one way to stage them is sketched after this list):

  • the features file
  • the parameters file
  • the stochastic search binary (need link to external source here)
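A minimal way to stage these, assuming the three files have already been pushed to HDFS (the paths, file names, and the fetchIfMissing helper below are all placeholders, not part of this page), is to pull them to local disk at the top of the mapper:

# hypothetical helper: fetch a shared file from HDFS into the task's
# working directory unless it is already present (paths are placeholders)
fetchIfMissing <- function(hdfsPath) {
   localName <- basename(hdfsPath)
   if (!file.exists(localName)) {
      system(paste("hadoop fs -get", hdfsPath, localName))
   }
   localName
}

featuresFile <- fetchIfMissing("/shared/features.txt")
paramsFile   <- fetchIfMissing("/shared/parameters.txt")
searchBinary <- fetchIfMissing("/shared/stochasticSearch")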

Master Script (via interactive R session)


library(rmr)   # RHadoop's mapreduce package

# generate the weights matrix: one row of candidate weights per iteration
# (the dimensions and the runif draw here are placeholders)
weightsMatrix <- matrix(runif(100 * 10), nrow = 100, ncol = 10)

mapperFunc <- function(key, value) {
   weightsIteration <- key
   weights <- value

   # write the weight vector to a file on local disk for the binary to read
   weightsFile <- sprintf("weights_%s.txt", weightsIteration)
   write(weights, file = weightsFile)

   # exec the stochastic search binary on that file (binary name is a placeholder)
   system(paste("./stochasticSearch", weightsFile))

   # read output files from the stochastic search (perhaps upload those files to S3)

   # compute p-values, area under the ROC, correlation coefficients, ...

   keyval(weightsIteration, list(pval = pval, rocArea = rocArea, coeffs = coeffs))
}

mapperInput <- to.dfs(weightsMatrix)
mapperOutput <- mapreduce(input = mapperInput, map = mapperFunc)

stochasticSearchResults <- from.dfs(mapperOutput)

From here we can iterate over stochasticSearchResults in the R session, or we could write a real reducer function too (sketched below)!
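As a minimal sketch of what that reducer might look like: each weights iteration arrives under its own key, so the reducer below just reshapes that iteration's statistics into a flat record (reducerFunc and the field names are illustrative, following the mapper above):

# hypothetical reducer: reshape one iteration's statistics into a flat record
reducerFunc <- function(key, values) {
   stats <- values[[1]]   # one statistics list per weights iteration
   keyval(key, c(pval = stats$pval, rocArea = stats$rocArea))
}

mapperOutput <- mapreduce(input = mapperInput, map = mapperFunc,
                          reduce = reducerFunc)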

Next steps

  • Nicole
    • figure out how to auto-shut down cluster
    • figure out what RHadoop does with matrices as input and lists as output
  • Bruce
    • try rmr with topological overlap
  • Erich
    • formulate data for a small example
    • write preliminary R script to kick off a job with an apply loop instead of mapreduce (see the sketch after this list)
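For that serial version, assuming the mapper sees the row index as key and the row as value (the same assumption the master script makes), the mapreduce call could be swapped for a plain apply loop:

# run the same mapper serially over the rows of the weights matrix,
# with no Hadoop cluster involved
stochasticSearchResults <- lapply(seq_len(nrow(weightsMatrix)), function(i) {
   mapperFunc(i, weightsMatrix[i, ])
})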