Goals

The near term goal is to provide an exploration space where individuals can try out our R offerings without having to install any software. A user spins up an RStudio Server browser session, already logged into the Synapse client, and downloads the entity that they were previously looking at on the web. They can then play around with the SageBio-curated data and run simple training tutorials. The RStudio session runs in a sandbox that gets wiped out after each user session. Specifically, in the very near term the first things to enable would be:

  1. Demo of the functionality by Mike / others to gauge interest levels
  2. Ability for Sage Bionetworks employees to play with the prototype to get deeper feedback
  3. Possibly, if it isn't too much trouble and the right situation arises, access for a close collaborator that we trust so they can also play with the system
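
As a rough illustration of the "sandbox that gets wiped out after each user session" idea, here is a minimal Python sketch. The name `session_sandbox`, the directory prefix, and the choice of a plain temporary directory are all assumptions made for illustration; the actual prototype might just as well reset a VM or container image instead.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def session_sandbox(user_id: str):
    """Create a scratch directory for one user session and wipe it on exit.

    Hypothetical helper: names and layout are illustrative assumptions.
    """
    # One throwaway directory per session, never shared between users.
    root = Path(tempfile.mkdtemp(prefix=f"rstudio-{user_id}-"))
    try:
        yield root
    finally:
        # Remove everything the user created, even if the session errored.
        shutil.rmtree(root, ignore_errors=True)


if __name__ == "__main__":
    with session_sandbox("demo-user") as workdir:
        # Anything written here disappears when the session ends.
        (workdir / "tutorial.R").write_text("print('hello')\n")
```

The point of the sketch is only that cleanup is tied to the end of the session rather than left to the user.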

A longer term goal would be to use the lessons we learn from this prototype to see whether it could be part of the larger solution for computation via Synapse.

Assumptions

Constraints and Other Info

Proposal

Phase 1 Just Get Users Connected to RStudio

Here's an idea for a quick and dirty approach to web-based R for Synapse. If we make the implementation of the auth-plugin for RStudio sound simple and generic enough, perhaps the RStudio developers would be willing to write it for us?

Phase 2 Extend RStudio Server to do more stuff

Dave had some more great suggestions such as auto-populating the RStudio edit window with some R client code and incorporating the Synapse Web widget for uploads/downloads. I'll leave it to him to describe those more.

Balancing Security and Accessibility

Providing open access to human genetic data presents an interesting challenge for SageBio. There are countless scenarios in which we could be dragged into court when data provided by Synapse is used in dubious ways. The challenge we face is how to provide data security without dialing accessibility down to "zero". Our current model is to allow users to download whatever data is available, trusting them to "do the right thing", and taking cover behind carefully authored use agreements in the event that they do not.

While this strategy may provide adequate legal protection, a string of high-profile legal cases could turn public opinion against us and potentially derail our efforts to spark an open-source movement in biology. Ultimately, there is no way for us to stop someone bent on "doing the wrong thing" with Synapse data. However, by continually striving to make it hard to do the wrong thing without stifling access, we are protecting ourselves, the burgeoning open biology movement, and, most importantly, the patients who donated their genetic information.

We believe that a web-hosted RStudio provides a promising opportunity to create a "padded cell" where analysts can work with sensitive data while keeping it in a secure environment controlled by Synapse. Here are some highlights of the possible features/design: