...

  1. Elias' randomized simulation. Requires 10,000 runs each of elastic net, lasso, and ridge regression, with every run using slightly different data.
  2. Sock's prediction pipeline. Very similar to Elias' use case. Parallelization can be over: a) each predictive model (as in Elias' case); b) each bootstrap run; c) each cross-validation fold.
  3. Roche Collaboration. A Bayesian network analysis that is computationally intensive because it explores a large parameter space.
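
The first two use cases share one shape: many independent tasks (one per model and resampling run) that can be farmed out to workers and collected afterwards. A minimal sketch of that pattern in Python, using only the standard library, might look like the following. The function names (`fit_one`, `run_all`), the placeholder "data", and the placeholder "score" are hypothetical stand-ins, not anyone's actual pipeline; a real version would fit the named model on a resampled data set inside `fit_one`.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Model names taken from the use cases above; the fitting itself is a placeholder.
MODELS = ["elastic_net", "lasso", "ridge"]


def fit_one(task):
    """Hypothetical stand-in for fitting one model on one perturbed data set."""
    model, run = task
    # Seed per task so every (model, run) pair is reproducible and independent
    # of which worker picks it up.
    rng = random.Random(f"{model}-{run}")
    sample = [rng.gauss(0.0, 1.0) for _ in range(100)]  # placeholder "data"
    score = sum(sample) / len(sample)                   # placeholder "result"
    return model, run, score


def run_all(n_runs, max_workers=4):
    """Run every (model, run) combination on a pool of workers."""
    tasks = list(product(MODELS, range(n_runs)))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in task order, regardless of completion order.
        return list(pool.map(fit_one, tasks))


results = run_all(n_runs=5)
print(len(results), "tasks completed")
```

Threads are used here only to keep the sketch self-contained. For CPU-bound model fitting, the same task list could instead be handed to a process pool or to a cluster backend, which is the part the solutions below are meant to address. The same structure applies if the unit of work is a bootstrap run or a cross-validation fold rather than a model.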

Solutions to explore

  1. IPython (on Amazon). Larsson says this allows parallelization in Python in the same way we are trying to design into BigR. He says it is already set up to run on Amazon using StarCluster.
  2. Revolution foreach (on Amazon). Chris Bare raises a good point: have we explored whether Revolution's foreach package can run on Amazon? I would think this is the first place they would implement it, and it is likely that someone has already gotten it working. (Note: from http://blog.revolutionanalytics.com/2009/07/simple-scalable-parallel-computing-in-r.html "it also allows iterations of foreach loops to run on separate machines on a cluster, or in a cloud environment like Amazon EC2")
    1. There appear to be many parallel and high-performance computing offerings for the R language: http://cran.r-project.org/web/views/HighPerformanceComputing.html