Reproducible research is a fundamental responsibility of scientists, but the best practices for achieving it are not established in computational biology. The Synapse provenance system is one of many solutions you can use to make your work reproducible by you and others.

Overview of Synapse Provenance

Provenance is a concept describing the origin of something. In Synapse, it is used to describe the connections between the workflow steps used to create a particular file or set of results. Data analysis often involves multiple steps to go from a raw data file to a finished analysis. Synapse’s provenance tools allow users to keep track of each step involved in an analysis and share those steps with other users.

...

Below is a Synapse visualization of the provenance relationships created by the example in this guide, using our programmatic and web clients. The example has two scripts: one generates random numbers, and the other takes a list of numbers and computes their squares. The provenance graph mirrors the project’s workflow.

...

Setting Provenance when Uploading a File

Let’s begin with a script that generates a list of normally distributed random numbers and saves the output to a file. Suppose you have an R script called generate_random_data.R and you’ve saved its output to a data file called random_numbers.txt. We’ll begin by uploading the files to Synapse and then setting their provenance.

Upload a File and Add Provenance

For this example, we’ll use a project that already exists (Wondrous Research Example : syn1901847). The code file is already saved in Synapse with synID syn7205215, so we only need to upload the data file to this project. In Synapse terminology, the project will be the parent of the new entity.

...

Once the data file is uploaded, Synapse will provide the synID assigned to that file. In this case, the synID is syn7208917.
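The upload and provenance steps described above can be sketched as a small R helper. This is a minimal sketch, assuming the synapser package is installed and you have already logged in with synLogin(). The helper name upload_with_provenance and the activity name in the usage comment are illustrative and are not part of Synapse.

```r
# Hypothetical helper: upload a file and record which code entity produced it.
# Calls are namespace-qualified, so the synapser package must be installed.
upload_with_provenance <- function(path, parent_id, executed, activity_name) {
  # Describe the local file and the project (parent) it belongs to
  entity <- synapser::File(path = path, parentId = parent_id)

  # The activity records the code entity that was executed to create the file
  act <- synapser::Activity(name = activity_name, executed = executed)

  # Store the file and attach the activity in one call
  synapser::synStore(entity, activity = act)
}

# Example usage with the IDs from this guide (requires a Synapse login):
# data_file <- upload_with_provenance("random_numbers.txt", "syn1901847",
#                                     "syn7205215", "Generate random numbers")
```

The returned File object can then be passed as `used` when building the activity for a downstream result.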

Editing Provenance

To continue our example above, we’ll now add some new results from our initial data file. We’re going to take the results in random_numbers.txt and square them. The script to square the numbers will be square.R, and we’ll save the output to a data file, squares.txt. As with the previous example, the code file is already saved in Synapse, so we’ll upload the data file and set its provenance.

...

Code Block
language: r
# Add the data file to Synapse
squared_file <- File(path="squares.txt", parentId="syn1901847")
squared_file <- synStore(squared_file)

# Set provenance for the newly created entity (syn7209166):
# it used the data file syn7208917 and was produced by executing syn7209078
act <- Activity(name = "Squared numbers", used = "syn7208917", executed = "syn7209078")
squared_file <- synStore(squared_file, activity = act)

# Provenance can also be set using local variables instead of looking up synIDs
# (data_file is assumed to be the File object stored in the upload step above)
act <- Activity(name = "Squared numbers", used = data_file, executed = "syn7209078")
squared_file <- synStore(squared_file, activity = act)

Deleting Provenance

To delete a provenance relationship, you must be the person who created the entity.

...

Code Block
language: r
# Delete provenance on entity syn123
synDeleteProvenance("syn123")

Viewing Provenance

Web

Navigate to a file to view its provenance. Clicking on the triple dots above an entity will expand it to show the file's full provenance.

...

Code Block
language: r
provenance <- synGetProvenance("syn7209166")
provenance

Reusing Provenance for Multiple Files

An activity is a Synapse object that helps to keep track of what objects were used in an analysis step, as well as what objects were generated. Thus, all relationships between Synapse objects and an activity are governed by dependencies. That is, an activity needs to know what it ‘used’, and outputs need to know what activity they were ‘generatedBy’. A couple of points for clarity:

...
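One way to share a single activity across several output files is sketched below. This is a minimal sketch, assuming the synapser package is installed and you are logged in. It also assumes that an activity must first be saved with one entity and then retrieved with synGetProvenance before other entities can point to the same activity. The helper name store_with_shared_activity is hypothetical.

```r
# Hypothetical helper: store several files so they are all 'generatedBy'
# the same activity.
store_with_shared_activity <- function(paths, parent_id, activity) {
  stored <- list()
  for (i in seq_along(paths)) {
    entity <- synapser::File(path = paths[[i]], parentId = parent_id)
    entity <- synapser::synStore(entity, activity = activity)

    # After the first store, fetch the saved activity so that the remaining
    # files reference it (assumption: passing the unsaved activity again
    # would create a separate activity for each file)
    if (i == 1) {
      activity <- synapser::synGetProvenance(entity$properties$id)
    }
    stored[[i]] <- entity
  }
  stored
}

# Example usage (requires a Synapse login; file names are illustrative):
# act <- synapser::Activity(name = "Squared numbers",
#                           used = "syn7208917", executed = "syn7209078")
# files <- store_with_shared_activity(c("squares_a.txt", "squares_b.txt"),
#                                     "syn1901847", act)
```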