Comment: Migrated to Confluence 5.3

This design document is now obsolete

Design Goals

  1. Allow scientists to write the workflow activities in their preferred language
    • approach: further develop the Synapse R client and the service APIs that it utilizes
  2. Ensure that scientists can reuse the same code both as workflow activities and for one-off tasks they want to run on their laptops
    • approach: use a particular input/output parameter scheme for all R scripts
    • approach: R scripts have no direct dependencies upon Amazon's Simple Workflow Service; they depend only upon Synapse
  3. Ensure that the workflow is scalable to many nodes running concurrently
    • approach: use Amazon's Simple Workflow Service
  4. Minimize the amount of workflow decision logic needed in non-R code
    • approach: keep the complicated logic about whether a particular script should run on a particular piece of source data out of Java; instead, pass all source data to every R script and let each script decide whether it wants to work on the data
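
The approach for goal 4 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual workflow code: the class `FanOutSketch`, the method `offerAll`, and the `runner` callback are hypothetical names, and the convention that a script signals "not interested" by producing no result is an assumption rather than something this design specifies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.BiFunction;

// Hypothetical sketch: the Java side offers every source layer to every
// script and applies no filtering logic of its own.
final class FanOutSketch {

    /**
     * Offers each layer to each script. The runner stands in for "invoke the
     * R script and read its result"; it returns empty when the script chose
     * not to process the layer (an assumed convention).
     */
    static List<String> offerAll(List<String> scripts,
                                 List<Integer> layerIds,
                                 BiFunction<String, Integer, Optional<String>> runner) {
        List<String> results = new ArrayList<>();
        for (int layerId : layerIds) {
            for (String script : scripts) {
                // No decision logic here: the script itself decides.
                runner.apply(script, layerId).ifPresent(results::add);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Stand-in runner that pretends the script only accepts even layer ids.
        BiFunction<String, Integer, Optional<String>> fakeRunner = (script, layerId) ->
                layerId % 2 == 0
                        ? Optional.of(script + " processed layer " + layerId)
                        : Optional.<String>empty();
        offerAll(List.of("createMatrix.R"), List.of(543, 544), fakeRunner)
                .forEach(System.out::println);
    }
}
```

The cost of this design is that every script is invoked for every piece of source data, which is the trade the approach makes to keep the selection logic in R.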

Steps in the TCGA Workflow

[Image: steps in the TCGA workflow]

Workflow Scaling

[Image: workflow scaling]

Workflow Architecture

[Image: workflow architecture]

Preliminary R Script API

To invoke a script locally:

R createMatrix.R --args --username 'nicole.deflaux@sagebase.org' --password XXXXX --datasetId 543 --layerId 544
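
Because the same parameter scheme is meant to serve both local runs and workflow activities, a workflow activity could assemble the identical command. The following is a minimal sketch, assuming the activity is written in Java; the class `ScriptCommand` and its `build` method are hypothetical names. The command is built as an argument list, so the shell quoting around the username is not needed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that assembles the same command line shown above.
final class ScriptCommand {

    /** Builds the argument list for one script invocation. */
    static List<String> build(String script, String username, String password,
                              int datasetId, int layerId) {
        List<String> cmd = new ArrayList<>();
        cmd.add("R");
        cmd.add(script);
        cmd.add("--args");
        cmd.add("--username");
        cmd.add(username);
        cmd.add("--password");
        cmd.add(password);
        cmd.add("--datasetId");
        cmd.add(String.valueOf(datasetId));
        cmd.add("--layerId");
        cmd.add(String.valueOf(layerId));
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = build("createMatrix.R",
                "nicole.deflaux@sagebase.org", "XXXXX", 543, 544);
        // Printed only for inspection; the list could instead be handed to
        // new ProcessBuilder(cmd) to launch the script.
        System.out.println(String.join(" ", cmd));
    }
}
```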

The script writes its workflow output to STDOUT between two marker lines; anything outside the markers is ignored:

blah blah, this is ignored ...
SynapseWorkflowResult_START
{"layerId":560}
SynapseWorkflowResult_END
blah blah, this is ignored too ...
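
On the consuming side, the result can be recovered by looking for the two markers in the captured output. The sketch below assumes the consumer is Java and that the payload is handed on as a raw JSON string; the class `WorkflowResultParser` and the method `extractResult` are hypothetical names, and treating a missing marker as "no result" is an assumption.

```java
import java.util.Optional;

// Hypothetical parser for the marker-delimited result shown above.
final class WorkflowResultParser {
    static final String START = "SynapseWorkflowResult_START";
    static final String END = "SynapseWorkflowResult_END";

    /**
     * Returns the trimmed text between the first START marker and the next
     * END marker, or empty if either marker is missing.
     */
    static Optional<String> extractResult(String stdout) {
        int start = stdout.indexOf(START);
        if (start < 0) {
            return Optional.empty();
        }
        int payloadStart = start + START.length();
        int end = stdout.indexOf(END, payloadStart);
        if (end < 0) {
            return Optional.empty();
        }
        return Optional.of(stdout.substring(payloadStart, end).trim());
    }

    public static void main(String[] args) {
        String stdout = String.join("\n",
                "blah blah, this is ignored ...",
                START,
                "{\"layerId\":560}",
                END,
                "blah blah, this is ignored too ...");
        // Prints {"layerId":560}
        System.out.println(extractResult(stdout).orElse("(no result)"));
    }
}
```

Delimiting the result with markers lets the R script and the libraries it loads print freely to STDOUT without corrupting the value handed back to the workflow.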

Details

Identify new or updated TCGA Datasets

...