...

Dave had some more great suggestions, such as auto-populating the RStudio edit window with some R client code and incorporating the Synapse Web widget for uploads and downloads. I'll leave it to him to describe those in more detail.

Balancing Security and Accessibility

Providing open access to human genetic data presents an interesting challenge for SageBio. There are countless scenarios in which we could be dragged into court if data provided by Synapse were used in dubious ways. The challenge we face is how to provide data security without dialing accessibility down to "zero". Our current model is to allow users to download whatever data is available, trusting them to "do the right thing", and taking cover behind carefully authored use agreements in the event that they do not.

While this strategy may provide adequate legal protection, a string of high-profile legal cases could turn public opinion against us and potentially derail our efforts to spark an open-source movement in biology. Ultimately, there is no way for us to stop someone bent on "doing the wrong thing" with Synapse data. However, by continually striving to make it hard to do the wrong thing without stifling access, we are protecting ourselves, the burgeoning open biology movement, and most importantly, the patients who donated their genetic information.

We believe the web-hosted RStudio offers a promising opportunity to create a "padded cell" where analysts can work with sensitive data while it stays in a secure environment controlled by Synapse. Here are some highlights of the possible features and design:

  • A secure RStudio session could be spun up on an AMI that has no direct access to the internet
  • The AMI would only be able to reach Synapse web services, but would have access to the entire Synapse API
  • Highly sensitive data layers (e.g. human genetic data) would only be downloadable from one of these secure AMIs
  • Layers (i.e., legacy locations) created from the secure hosted RStudio would likewise only be downloadable from one of these secure AMIs
    • Possibly this restriction would only apply to certain layer types. For example, you could create media layers from the AMI that would be accessible from the web client.
    • This would prevent users from simply creating a copy of the data in another project that could then be downloaded without restriction
  • Layer annotations would have no restriction on where they could be accessed and modified
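Taken together, the bullets describe a small download policy. The following is a minimal Python sketch of that policy, assuming a hypothetical in-memory model: `Layer`, `Request`, `create_layer`, `can_download`, `set_annotation`, the `"genotype"` and `"media"` type strings, and the `EXEMPT_TYPES` set are all illustrative names, not part of the Synapse API. It also assumes the service can reliably tell whether a call originates from a secure AMI.

```python
from dataclasses import dataclass, field

# Hypothetical layer types exempt from the secure-AMI restriction even when
# created inside a secure session (the proposal suggests media layers as one option).
EXEMPT_TYPES = {"media"}


@dataclass
class Layer:
    name: str
    layer_type: str            # illustrative values such as "genotype" or "media"
    secure_only: bool          # True: downloads allowed only from a secure AMI
    annotations: dict = field(default_factory=dict)


@dataclass
class Request:
    from_secure_ami: bool      # assumption: the service knows where a call originates


def create_layer(name: str, layer_type: str, created_in_secure_session: bool,
                 sensitive: bool = False) -> Layer:
    """Mark a layer secure-only if it is sensitive, or if it was created inside a
    secure session and its type is not exempt. The second rule is what stops a user
    from copying restricted data into another project and downloading the copy."""
    secure_only = sensitive or (
        created_in_secure_session and layer_type not in EXEMPT_TYPES
    )
    return Layer(name, layer_type, secure_only)


def can_download(layer: Layer, request: Request) -> bool:
    """Unrestricted layers download from anywhere; secure-only layers need a secure AMI."""
    return request.from_secure_ami or not layer.secure_only


def set_annotation(layer: Layer, key: str, value: str, request: Request) -> None:
    """Annotations are readable and writable from anywhere, so `request` is not checked."""
    layer.annotations[key] = value


if __name__ == "__main__":
    outside = Request(from_secure_ami=False)
    inside = Request(from_secure_ami=True)

    raw = create_layer("snp_calls", "genotype", created_in_secure_session=False, sensitive=True)
    copy = create_layer("snp_calls_copy", "genotype", created_in_secure_session=True)
    plot = create_layer("qc_plot", "media", created_in_secure_session=True)

    print(can_download(raw, outside), can_download(raw, inside))   # False True
    print(can_download(copy, outside))                              # False
    print(can_download(plot, outside))                              # True
```

The key design choice in this sketch is that restriction is the default for anything created inside the secure session, with an explicit allow-list of exempt types. A copy of sensitive data therefore inherits the restriction instead of shedding it, while annotations remain editable from anywhere.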