
Distributed Compute Jobs


Problem:

How to run multiple jobs in parallel on a common data set. The code and results should be open, transparent, and reproducible.

Out of scope (for now):

- Complex workflows (e.g. combining intermediate results from distributed jobs)

- Automatically configuring the nodes for arbitrary requirements

Resources

Computing Resources

- AWS

- Google Big Compute

Process Initialization

- Sun Grid Engine

- MapReduce/Hadoop

Job Assignment / Monitoring

- AWS Simple Workflow

- AWS Simple Queue Service

Approach

Phase 1 approach:

- Use StarCluster to create a Sun Grid Engine (SGE) cluster.

- Put the data and code on the cluster's NFS file system.

- Write SGE job files for the jobs; each job runs the code and writes its results to the NFS file system (see the sketch below).
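
To make the last step concrete, here is a rough sketch of generating and submitting one SGE job per data chunk from Python. The shared NFS path, the analyze.py script name, and the 10-chunk layout are placeholders used only to illustrate the pattern of one job file per unit of work; they are not part of this plan.

import subprocess
from pathlib import Path

# Placeholder path for the NFS file system that StarCluster mounts on every node.
SHARED = Path("/home/shared")
JOB_DIR = SHARED / "jobs"
RESULT_DIR = SHARED / "results"
JOB_DIR.mkdir(parents=True, exist_ok=True)
RESULT_DIR.mkdir(parents=True, exist_ok=True)

# One SGE job file per data chunk; results land on the shared file system.
JOB_TEMPLATE = """#!/bin/bash
#$ -N chunk_{i}
#$ -cwd
#$ -o {shared}/results/chunk_{i}.out
#$ -e {shared}/results/chunk_{i}.err
python {shared}/code/analyze.py --input {shared}/data/chunk_{i}.csv --output {shared}/results/chunk_{i}.result
"""

for i in range(10):  # assume the common data set is split into 10 chunks
    job_file = JOB_DIR / "chunk_{}.sh".format(i)
    job_file.write_text(JOB_TEMPLATE.format(i=i, shared=SHARED))
    subprocess.run(["qsub", str(job_file)], check=True)  # submit to the SGE queue

Each job is independent; SGE schedules them across the cluster nodes and the results simply accumulate on the shared file system.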

Phase 2 approach:

- Use StarCluster to create a Sun Grid Engine (SGE) cluster.

- Create a Synapse dataset with two locations: (1) S3 and (2) the NFS file system on the cluster.

- Write SGE job files for the jobs; each job runs the code and sends its results to Synapse.

- Push the job files themselves to Synapse for future reference (see the sketch below).
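
A sketch of the last two steps, assuming the Synapse Python client (synapseclient) is installed on the nodes; the Synapse IDs and file paths below are placeholders:

import synapseclient

# In Phase 2 the credentials still come from the client's local config/cache;
# a later phase will address passing them in without putting them in files.
syn = synapseclient.Synapse()
syn.login()

RESULTS_FOLDER = "syn000001"  # placeholder Synapse ID for the results folder
JOBS_FOLDER = "syn000002"     # placeholder Synapse ID for the job-file folder

# Send the job's result file to Synapse ...
result = synapseclient.File("/home/shared/results/chunk_0.result", parent=RESULTS_FOLDER)
syn.store(result)

# ... and push the job file itself for future reference.
job = synapseclient.File("/home/shared/jobs/chunk_0.sh", parent=JOBS_FOLDER)
syn.store(job)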

Subsequent phases will tackle these issues:

- Pull code from Synapse.

- Pass user credentials to the jobs without storing them in files.

- Move the job queue to AWS SWF or SQS (see the sketch below).
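
As a very rough sketch of the last item, assuming boto3 is available: the submit host would enqueue one SQS message per unit of work, and each worker node would pull from the queue instead of relying on SGE job files. The queue name and message format are placeholders.

import json
import boto3

sqs = boto3.client("sqs")
queue_url = sqs.create_queue(QueueName="compute-jobs")["QueueUrl"]  # placeholder queue name

# Producer (submit host): one message per data chunk.
for i in range(10):
    sqs.send_message(QueueUrl=queue_url, MessageBody=json.dumps({"chunk": i}))

# Worker loop (each cluster node): pull a message, run the job, delete the message.
while True:
    resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1, WaitTimeSeconds=10)
    messages = resp.get("Messages", [])
    if not messages:
        break  # queue drained
    msg = messages[0]
    chunk = json.loads(msg["Body"])["chunk"]
    # ... run the analysis for this chunk and store the results in Synapse ...
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])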

 

 
