Distributed Compute Jobs

Problem:

How to run multiple jobs in parallel on a common data set. Code and results should be open, transparent, and reproducible.

Out of scope (for now):

Complex workflows (e.g. combining intermediate results from distributed jobs);

Automatically configuring the nodes for arbitrary requirements.


Approach

Phase 1 approach:

- Use StarCluster to create a Sun Grid Engine (SGE) cluster.

- Put the data and code on an NFS file system on the cluster.

- Write SGE job files for the jobs; each job runs the code and writes its results to the NFS file system (see the sketch below).
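
As a concrete illustration of the job files, here is a minimal sketch in Python that writes an SGE job script and submits it with qsub. The NFS paths, job names, and the run_job.sh entry point are hypothetical placeholders, not part of the plan above.

```python
import subprocess
from pathlib import Path

# Hypothetical NFS layout shared by all cluster nodes.
DATA_DIR = "/nfs/data"           # the common input data set
RESULTS_DIR = "/nfs/results"     # where each job writes its output
CODE = "/nfs/code/run_job.sh"    # the analysis code, also on NFS

def submit_job(job_id: int) -> None:
    """Write one SGE job file and submit it to the queue via qsub."""
    job_file = Path(f"job_{job_id}.sge")
    job_file.write_text(
        "#!/bin/sh\n"
        f"#$ -N job_{job_id}\n"                    # job name
        "#$ -cwd\n"                                # run from the submit directory
        f"#$ -o {RESULTS_DIR}/job_{job_id}.out\n"  # stdout on NFS
        f"#$ -e {RESULTS_DIR}/job_{job_id}.err\n"  # stderr on NFS
        f"{CODE} {DATA_DIR} {RESULTS_DIR}/job_{job_id}\n"
    )
    subprocess.run(["qsub", str(job_file)], check=True)

# e.g. ten parallel jobs over the same data set
for i in range(10):
    submit_job(i)
```

Each job is independent, so SGE can schedule it on any node; because data, code, and results all live on NFS, every node sees the same files.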

Phase 2 approach:

- Use StarCluster to create a Sun Grid Engine (SGE) cluster.

- Create a Synapse dataset with two locations: (1) S3 and (2) the NFS file system on the cluster.

- Write SGE job files for the jobs; each job runs the code and sends its results to Synapse.

- Push the job files to Synapse for future reference (see the sketch below).
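
For the last two steps, a minimal sketch using the synapseclient Python package follows; the parent ID syn123 and the file paths are hypothetical, and configuring the dataset's dual S3/NFS locations is assumed to have been done separately.

```python
import synapseclient
from synapseclient import File

# Log in using credentials from the local Synapse config file
# (see the credentials issue under the subsequent phases below).
syn = synapseclient.Synapse()
syn.login()

# Push one job's results to Synapse (syn123 is a placeholder parent ID).
result = syn.store(File("/nfs/results/job_0/output.tsv", parent="syn123"))
print("Stored results as", result.id)

# Push the job file itself so the run stays reproducible.
syn.store(File("job_0.sge", parent="syn123"))
```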

Subsequent phases will tackle these issues:

- Pull code from Synapse.

- Pass user credentials without putting them in files (see the sketch below).

- Move the job queue to AWS SWF or SQS.
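
For the credentials issue, one possible approach (a sketch, not a decision from the plan above) is to pass a Synapse personal access token through the environment rather than writing it to a file on the shared file system; SYNAPSE_AUTH_TOKEN is a placeholder name here.

```python
import os
import synapseclient

# The token is injected into each job's environment, e.g. with
# `qsub -v SYNAPSE_AUTH_TOKEN job_0.sge`, so it never lands on NFS.
syn = synapseclient.Synapse()
syn.login(authToken=os.environ["SYNAPSE_AUTH_TOKEN"])
```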
