Scientific Applications of MapReduce

First Example: Shotgun Stochastic Search

See also Getting Started with RHadoop on Elastic Map Reduce

Every host in the Hadoop cluster must receive:

  • the features file

  • the parameters file

  • the stochastic search binary (need link to external source here)

Master Script (via interactive R session)

# generate weights file

mapperFunc <- function(key, value) {
  # this will run on all the machines in the cluster
  weightsIteration <- key
  weights <- value
  # write weight vector to a file on disk
  # exec stochastic search binary
  # read output files from stochastic search (perhaps upload those files to S3)
  # compute p-values, area under the ROC, correlation coefficients...
  keyval(weightsIteration, list(pval=pval, rocArea=rocArea, coeffs=coeffs))
}

mapperInput <- to.dfs(weightsMatrix)
mapperOutput <- mapreduce(input=mapperInput, map=mapperFunc)
stochasticSearchResults <- from.dfs(mapperOutput)

# iterate over stochasticSearchResults,
# or we could write a real reducer function too!
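The body of mapperFunc can be prototyped outside Hadoop before it is wired into mapreduce. The sketch below is one possible base-R version of the write / exec / read / summarize steps, under several assumptions that are not in these notes: `runSearchStub` is a hypothetical stand-in for the stochastic search binary (a real run would presumably invoke the binary, for example with `system2()`), the one-value-per-line file layout is invented, and the features and labels are toy data.

```r
# Hypothetical stand-in for the stochastic search binary: reads a weights
# file and writes one score per sample to an output file.
runSearchStub <- function(weightsFile, featuresMatrix, outFile) {
  weights <- scan(weightsFile, quiet = TRUE)
  scores <- as.vector(featuresMatrix %*% weights)
  writeLines(format(scores, digits = 15), outFile)
  invisible(outFile)
}

# Area under the ROC curve via the rank-sum (Mann-Whitney) identity.
rocAreaFromScores <- function(scores, labels) {
  r <- rank(scores)
  nPos <- sum(labels == 1)
  nNeg <- sum(labels == 0)
  (sum(r[labels == 1]) - nPos * (nPos + 1) / 2) / (nPos * nNeg)
}

# One mapper call: write weights, run the search, read and summarize output.
processWeights <- function(weightsIteration, weights, featuresMatrix, labels) {
  weightsFile <- tempfile(fileext = ".txt")
  outFile <- tempfile(fileext = ".txt")
  on.exit(unlink(c(weightsFile, outFile)))  # clean up temp files
  write(weights, file = weightsFile, ncolumns = 1)
  runSearchStub(weightsFile, featuresMatrix, outFile)
  scores <- scan(outFile, quiet = TRUE)
  list(iteration = weightsIteration,
       pval = wilcox.test(scores[labels == 1], scores[labels == 0])$p.value,
       rocArea = rocAreaFromScores(scores, labels),
       coeffs = cor(scores, labels))
}

# Toy data (made up for illustration only).
set.seed(1)
featuresMatrix <- matrix(rnorm(40 * 3), nrow = 40)
labels <- rep(c(0, 1), each = 20)
featuresMatrix[labels == 1, 1] <- featuresMatrix[labels == 1, 1] + 2
result <- processWeights(1, c(1, 0, 0), featuresMatrix, labels)
```

Inside the real mapper, the returned list would be passed to keyval() together with weightsIteration; the statistics chosen here (a Wilcoxon p-value, a rank-based ROC area, a Pearson correlation) are placeholders for whatever the search output actually supports.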

Next steps

  • Nicole

    • figure out how to shut down the cluster automatically

    • figure out what RHadoop does with matrices as input and lists as output

  • Bruce

    • try rmr with topological overlap

  • Erich

    • formulate data for a small example

    • write preliminary R script to kick off a job with an apply loop instead of mapreduce
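The apply-loop prototype in the last item might start out like the following. This is a sketch only: `mapperStub` is hypothetical and returns placeholder summaries of each weight vector instead of running the stochastic search, and the 4 x 3 `weightsMatrix` is toy data. It assumes each row of weightsMatrix is one weights iteration and keeps the same key/value shape that the mapper emits, so the loop can later be swapped for mapreduce.

```r
# Toy weights matrix: one row per weights iteration (values are made up).
weightsMatrix <- matrix(seq_len(12) / 12, nrow = 4, byrow = TRUE)

# Hypothetical placeholder for mapperFunc: takes an iteration index (key) and
# a weight vector (value) and returns a list shaped like the mapper output.
mapperStub <- function(key, value) {
  list(key = key,
       value = list(weightSum = sum(value), weightMax = max(value)))
}

# Serial replacement for mapreduce(): apply the mapper to every row.
stochasticSearchResults <- lapply(seq_len(nrow(weightsMatrix)), function(i) {
  mapperStub(i, weightsMatrix[i, ])
})

# Iterate over the results, collecting them into one data frame.
resultsTable <- do.call(rbind, lapply(stochasticSearchResults, function(kv) {
  data.frame(iteration = kv$key,
             weightSum = kv$value$weightSum,
             weightMax = kv$value$weightMax)
}))
print(resultsTable)
```

Keeping the mapper as a plain function of (key, value) means the same function body can be tested locally with lapply and then handed to mapreduce with only the return value wrapped in keyval().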