After some time, Alice has a result that she believes is important and that will eventually form part of a paper, and she wants to make sure Carl can see exactly what she did. At this point she builds a set of R scripts that process the data through a series of steps. She stores the scripts in a GitHub repository associated with the project. She also uses a few bioinformatics tools from the Linux command line as part of her process. Now she re-runs the analysis, this time recording what she did with Synapse provenance features. These link all the files, starting with the raw data, through all intermediate results, and ending with a set of figures, vectors, and other output data. All of this can be pushed up to Synapse as before, but now Synapse also holds a graphical representation of her process that Carl can use to review her work, including links to the code and tools she used. (The command line client would need to push up the commands used to run tools at the Linux command line.)
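The chain Alice records can be pictured as a small graph. Each step lists the files it used, the files it generated, and the code or command it executed. The following is a minimal sketch under assumed names: it is a plain in-memory model, not the Synapse client's provenance API. The `Step` class, the `lineage` helper, the file names, the repository URL, and the commands are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One processing step: what it read, what it wrote, and what ran it (hypothetical model)."""
    name: str
    used: list[str]  # input files: raw data or earlier intermediate results
    generated: list[str]  # output files written by this step
    executed: list[str] = field(default_factory=list)  # code links and shell commands


def lineage(target: str, steps: list[Step]) -> list[Step]:
    """Return the steps that led to `target`, ordered so each step follows its inputs."""
    producer = {out: step for step in steps for out in step.generated}
    ordered: list[Step] = []
    seen: set[str] = set()

    def visit(path: str) -> None:
        step = producer.get(path)
        if step is None or step.name in seen:
            return  # raw input with no producing step, or a step already recorded
        seen.add(step.name)
        for inp in step.used:
            visit(inp)
        ordered.append(step)

    visit(target)
    return ordered


# Hypothetical pipeline; file names, the repository URL, and the commands are illustrative only.
REPO = "https://github.com/example/project/blob/main"
steps = [
    Step("normalize", ["raw_counts.tsv"], ["normalized.tsv"],
         [f"{REPO}/normalize.R", "Rscript normalize.R raw_counts.tsv normalized.tsv"]),
    Step("cluster", ["normalized.tsv"], ["clusters.tsv"],
         [f"{REPO}/cluster.R", "Rscript cluster.R normalized.tsv clusters.tsv"]),
    Step("plot", ["clusters.tsv", "normalized.tsv"], ["figure1.png"],
         [f"{REPO}/plot.R", "Rscript plot.R clusters.tsv normalized.tsv figure1.png"]),
]

# Walk back from the final figure to the raw data, as Carl would when reviewing.
for step in lineage("figure1.png", steps):
    print(f"{step.used} -> {step.generated} via {step.executed[-1]}")
```

Walking backwards from an output is the view a reviewer needs. Each figure points to the step that produced it, and each step points to its inputs and to the script link or command line that ran it.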

If Carl and Alice are working on the same system, access to the code, or to the commands used to execute system programs, should give Carl a pretty good idea of exactly what Alice did. Alice can also provide additional commentary in the wiki.

An extension of this scenario, for the case where both users are working in Amazon, would include capturing the specifics of the environment used to run the analysis (AMI, instance size, etc.) as additional parts of the provenance record. These environment descriptions could be stored as Files pointing to publicly accessible AMIs, allowing anyone to execute the work. In fact, Alice may want to re-run the analysis on Amazon before publication. This would ensure that her reviewer can step into her analysis, using her project as supplemental material for her paper.
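One way to store an environment description as a File is to serialize it to a small JSON document. The following is a minimal sketch that assumes a hypothetical `Environment` record with three fields. The AMI ID, instance type, and region shown are placeholders, and nothing here calls Synapse or AWS.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Environment:
    """Hypothetical description of the Amazon environment an analysis ran in."""
    ami_id: str  # ID of a publicly accessible AMI (placeholder below)
    instance_type: str  # instance size used for the run
    region: str  # region where the AMI is available

    def to_json(self) -> str:
        # Sorted keys give a stable file body, so identical environments serialize identically.
        return json.dumps(asdict(self), sort_keys=True, indent=2)

    @classmethod
    def from_json(cls, text: str) -> "Environment":
        # Rebuild the record from a stored file body.
        return cls(**json.loads(text))


# Placeholder values for illustration; a real record would name Alice's actual AMI.
env = Environment(ami_id="ami-0123456789abcdef0", instance_type="m5.large", region="us-east-1")
body = env.to_json()  # this text would become the content of the environment File
print(body)
```

A reviewer could read the file back with `from_json` and launch an instance from the same AMI at the same size. Launching the instance is outside the scope of this sketch.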