
Scientific Applications of MapReduce


Page version 4 (an old version of this page).

First Example: Shotgun Stochastic Search

See also Getting Started with RHadoop on Elastic Map Reduce

All hosts in the Hadoop cluster must receive:

  • the features file
  • the parameters file
  • the stochastic search binary (need link to external source here)
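A host that is missing any of these files cannot run the search, so a mapper may want to check for them before doing any work. The following is a minimal base-R sketch. The default file names are hypothetical placeholders, because the page does not give the real ones. Getting the files onto each host (for example with a cluster bootstrap step) is a separate problem that this sketch does not address.

```r
# Return the required files that are missing on the current host, plus the
# search binary if it exists but is not executable.
# All default file names are hypothetical placeholders.
missingClusterFiles <- function(featuresFile = "features.txt",
                                paramsFile = "params.txt",
                                searchBinary = "stochastic_search") {
  required <- c(featuresFile, paramsFile, searchBinary)
  missing <- required[!file.exists(required)]
  # file.access(mode = 1) returns 0 when the file is executable
  if (file.exists(searchBinary) && file.access(searchBinary, mode = 1) != 0) {
    missing <- c(missing, searchBinary)
  }
  missing
}
```

For example, mapperFunc could start with `stopifnot(length(missingClusterFiles()) == 0)`. A misconfigured host then fails immediately instead of partway through a search.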

Master Script (via interactive R session)


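library(rmr2)  # RHadoop's MapReduce package (referred to as rmr on this page)

# Generate the weights: one row of weightsMatrix per weight vector.
# (The generation scheme is not specified here; weightsMatrix is assumed
# to already exist in the session.)

mapperFunc <- function(key, value) { # this will run on all the machines on the cluster
  # Assumes rmr2 semantics: key is a vector of weight-vector ids and value
  # holds the matching chunk of rows of weightsMatrix.
  weightsIteration <- key
  weights <- value
  results <- lapply(seq_along(weightsIteration), function(i) {
    w <- weights[i, ]

    # write the weight vector to a file on local disk
    weightsFile <- tempfile(fileext = ".txt")
    write(w, file = weightsFile, ncolumns = length(w))

    # exec the stochastic search binary
    # (hypothetical path and arguments: the binary's interface is not documented here)
    outDir <- tempfile("search_out_")
    dir.create(outDir)
    system2("./stochastic_search", args = c(weightsFile, outDir))

    # read the output files from the stochastic search (perhaps upload them to S3),
    # then compute p-values, area under the ROC, correlation coefficients...
    # summarizeSearchOutput() is a hypothetical helper returning
    # list(pval = ..., rocArea = ..., coeffs = ...)
    summarizeSearchOutput(outDir)
  })
  keyval(weightsIteration, results)
}

# One record per weight vector, keyed by its row index (weightsIteration)
mapperInput <- to.dfs(keyval(seq_len(nrow(weightsMatrix)), weightsMatrix))
mapperOutput <- mapreduce(input = mapperInput, map = mapperFunc)

stochasticSearchResults <- from.dfs(mapperOutput)

# Iterate over stochasticSearchResults, or write a real reducer function too!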
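Here is a sketch of the "iterate over the results" option, done locally in the R session. It assumes that from.dfs() returns the rmr2 key/value structure, a list with `key` and `val` elements (rmr2's keys() and values() accessors return the same parts). It also assumes that each value is the list(pval, rocArea, coeffs) emitted by the mapper, with a single p-value and ROC area per run. The function name resultsToFrame is illustrative.

```r
# Flatten the key/value pairs returned by from.dfs() into a data frame with
# one row per weight vector, sorted by decreasing area under the ROC.
# Assumes kv = list(key = <ids>, val = <list of per-run summaries>) and that
# pval and rocArea are scalars; coeffs is left out because its length may vary.
resultsToFrame <- function(kv) {
  rows <- lapply(seq_along(kv$val), function(i) {
    v <- kv$val[[i]]
    data.frame(weightsIteration = kv$key[[i]], pval = v$pval, rocArea = v$rocArea)
  })
  out <- do.call(rbind, rows)
  out[order(out$rocArea, decreasing = TRUE), , drop = FALSE]
}
```

Usage would be `resultsToFrame(stochasticSearchResults)`. A reducer would be the better choice once the output is too large to pull back into one R session.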

Next steps

  • Nicole
    • figure out how to shut down the cluster automatically
    • figure out what RHadoop does with matrices as input and lists as output
  • Bruce
    • try rmr with topological overlap
  • Erich
    • formulate data for a small example
    • write a preliminary R script that kicks off a job with an apply loop instead of mapreduce
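A preliminary apply-loop script could start from a sketch like the one below. It performs the same per-weight-vector work serially over the rows of weightsMatrix, which is convenient for a small example before Hadoop is involved. runOneSearch is a hypothetical function standing in for the body of mapperFunc applied to a single weight vector (write the weights, run the binary, compute the statistics).

```r
# Serial stand-in for the MapReduce job: apply runOneSearch to each row of
# weightsMatrix and name the results by row index, mirroring weightsIteration.
# runOneSearch is a hypothetical per-weight-vector function supplied by the caller.
runSearchLocally <- function(weightsMatrix, runOneSearch) {
  ids <- seq_len(nrow(weightsMatrix))
  results <- lapply(ids, function(i) runOneSearch(weightsMatrix[i, ]))
  names(results) <- ids
  results
}
```

This uses lapply over row indices rather than apply(), because apply() tries to simplify equal-length per-run lists into a matrix. On Unix-like systems, swapping in parallel::mclapply would spread the runs across local cores.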