...
In the project she is able to make some notes on what she has done using the wiki tools, referencing the data frame and plot. The new project now appears in her list of recent projects.
At the local OS command line:
syn get . -recurse = true
This pulls down the two new plot files locally. The data frame could be either a .csv or an R binary file.
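The recursive get above can be thought of as "walk the project tree and fetch whatever is not yet present locally". The following is a minimal Python sketch of that idea only. It does not call any Synapse client; the nested-dict representation of the remote project, the function name files_to_fetch, and the file names are all hypothetical and chosen purely for illustration.

```python
import tempfile
from pathlib import Path


def files_to_fetch(remote_tree, local_root, prefix=""):
    """Return relative paths that exist remotely but are missing locally.

    remote_tree is a hypothetical stand-in for a project listing:
    folder names map to nested dicts, file names map to None.
    """
    missing = []
    for name, child in sorted(remote_tree.items()):
        rel = f"{prefix}{name}"
        if isinstance(child, dict):
            # Folder: recurse, extending the relative path.
            missing.extend(files_to_fetch(child, local_root, prefix=rel + "/"))
        elif not (Path(local_root) / rel).exists():
            # File that has no local copy yet.
            missing.append(rel)
    return missing


# Illustrative run: the data frame is already local, the two plots are not.
with tempfile.TemporaryDirectory() as local_root:
    (Path(local_root) / "data").mkdir()
    (Path(local_root) / "data" / "frame.csv").write_text("x,y\n1,2\n")
    remote = {
        "data": {"frame.csv": None},
        "plots": {"plot1.png": None, "plot2.png": None},
    }
    to_fetch = files_to_fetch(remote, local_root)

print(to_fetch)
```

A real client would also need to compare versions or checksums rather than only checking for existence; that is left out here to keep the sketch short.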
Benefits:
- Synapse as dashboard of all her projects, regardless of where the data is living or who the collaborators are (this is likely one of many projects she is switching among)
- Wiki as notebook for future self.
- Annotations to enhance ability to later find the data / project if it goes dormant for a while.
...
After some time she arrives at some preliminary findings she wants to share with Bob, a collaborator who is more of a biologist.
At the local OS command line:
syn get . -recurse = true
This pulls down the two new plot files locally. The data frame could be either a .csv or an R binary file.
At this point she switches over to the Synapse web client and uses previews of the two new results files to write up a summary of her findings in the project wiki. She then adds Bob to the project and emails him a link to view the results. Bob is able to review Alice's findings and comment on the wiki pages. He has some new data he wants to share with Alice, so he uploads it to the project from the web client. Alice receives a notification (via configured email notifications, project activity history, etc.). Alice is then able to pull the files down to her local environment using an analytical client and continue working.
syn get . -recurse = true
Later, Alice would like her analyst friend Carl at another institution to check her analysis (or she would like a backup of her work, or access to it from another machine...).
...
The project could evolve for some time in this fashion, relying mainly on the file-folder API, wiki, and collaboration features. Possible extensions include letting users manage multiple storage locations (e.g. their own S3 buckets), or providing clients that automatically sync content in the background.
Benefits:
- Authorization controls over project contents
- Synchronize files among multiple environments (different institutions' in-house systems, cloud offerings, etc.)
- Shared collaborative workspace to pull key findings together from multiple people.
Reproducible Ad hoc analysis
After some time, Alice has a result she believes is important and will eventually form part of a paper, and she wants to make sure Carl can see exactly what she did. At this point she builds a set of R scripts which process the data through a series of steps. She stores the scripts in a GitHub repository associated with the project. She also uses a few bioinformatics tools installed on her local system, run from the Linux command line, as part of her process. Now she re-runs the analysis, this time recording what she did using Synapse provenance features to link all the files, starting with raw data, through all intermediate results, and ending with a set of figures, vectors, and other output data. All this can be pushed up to Synapse as before, but now there is a graphical representation of her process available in Synapse that Carl can use to review her work, including links to the code and tools she used. (The command line client would need to push up the commands used to run tools at the Linux command line.) If Carl and Alice are working on the same system, access to the code or commands to execute system programs should give Carl a pretty good idea of exactly what Alice did, and she can provide additional commentary in the wiki and/or edit the provenance records to provide more details (e.g. version info for some of the tools she used).
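To make the idea of linking raw data through intermediate results to final figures concrete, here is a small Python sketch of a provenance record and a lineage walk. It is only an illustration under stated assumptions: AnalysisStep and lineage are local, hypothetical names and not the Synapse client API, and the file and script names are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class AnalysisStep:
    """One recorded step: what it read, what was run, what it produced."""
    name: str
    used: list        # input file names
    executed: list    # scripts or shell commands that were run
    generated: list   # output file names


def lineage(target, steps):
    """Return the steps that led to `target`, earliest first."""
    produced_by = {out: step for step in steps for out in step.generated}
    ordered, seen = [], set()

    def visit(file_name):
        step = produced_by.get(file_name)
        if step is None or step.name in seen:
            return  # raw input (no producing step) or step already handled
        seen.add(step.name)
        for parent in step.used:
            visit(parent)      # record upstream steps first
        ordered.append(step)

    visit(target)
    return ordered


# Hypothetical records for a two-step analysis plus one unrelated step.
steps = [
    AnalysisStep("plot", ["normalized.csv"], ["plot.R"], ["figure1.png"]),
    AnalysisStep("normalize", ["raw_counts.csv"], ["normalize.R"], ["normalized.csv"]),
    AnalysisStep("unrelated", ["other.csv"], ["other.R"], ["other.png"]),
]
trace = [step.name for step in lineage("figure1.png", steps)]
print(trace)
```

A reviewer such as Carl would effectively be reading the result of a walk like this: each step in order, with the code or command that was executed attached to it.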
TODO: Outline of adding provenance calls from R / command line / Python
An extension of this scenario in the case where both users are working in Amazon would include capturing the specifics of the environment used to run the analysis (AMI, size, etc) as additional parts of the provenance record. These environment descriptions could be stored as Files pointing to publicly-accessible AMIs, allowing anyone to execute the work (in their own AWS account). In fact, Alice may want to rerun the analysis on Amazon again before publication to ensure that her reviewer can step into her analysis, using her project as supplemental materials to her paper.
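One possible shape for such an environment description is a small structured record serialized to JSON, which could then be stored as a File alongside the results. The sketch below is an assumption about what that record might contain; the class name EnvironmentRecord, its fields, and the placeholder values are hypothetical and do not come from Synapse or AWS tooling.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class EnvironmentRecord:
    """Hypothetical description of where an analysis was run."""
    ami_id: str           # identifier of the machine image (placeholder below)
    instance_type: str    # the "size" of the machine
    tool_versions: dict   # tool name -> version string


def to_json(record):
    """Serialize the record with stable key order so diffs stay readable."""
    return json.dumps(asdict(record), sort_keys=True, indent=2)


# Placeholder values only; none of these refer to a real image or version.
record = EnvironmentRecord(
    ami_id="ami-EXAMPLE",
    instance_type="example.large",
    tool_versions={"R": "x.y.z"},
)
env_json = to_json(record)
restored = EnvironmentRecord(**json.loads(env_json))
print(env_json)
```

Keeping the record as plain JSON means it can be previewed in a web client and compared between runs without any special tooling.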
...