Execution of User-Supplied Code
Technology Choices for Encapsulating Users' Code
Docker (a 'containerization' technology) has recently become a popular way to package code and its dependencies into re-runnable executables. In Docker these executables are called 'images', and a running instance of an image is called a 'container'.
As of December 2014 there is a new alternative, Rocket (https://coreos.com/blog/rocket/).
Should Synapse be concerned with running containers supplied by users?
Reasons for executing user-submitted containers:
- verifying output of previous run (reproducibility)
- sensitivity of input data (reluctance to release input data to the author of the executed code)
- convenience when compiling source code and dependencies into an executable is laborious
The following are not, by themselves, reasons for executing user-supplied containers:
- comprehensibility (Just because you can execute code doesn't mean you can read or understand it. Source code plus documentation is better for this.)
- repurposability (Just because you can execute code doesn't mean it's in a form that can be reused in another executable. Source code is better for this.)
- large data (Large data may require moving code closer to the data, but this can be achieved by giving the author of the executable access to the system where the data is stored. Large data could become a driver for containerization in combination with one of the aforementioned drivers, e.g. when recompiling source code on the target platform is laborious.)
Options for using Docker with Synapse/Challenges:
No image sharing (e.g. ALS Stratification Challenge)
In this approach a challenge administrator creates containers (running images) on users' behalf. Users 'log in' and modify containers as if they were working in a Unix account.
Advantages: Users need not learn Docker. Containers can be run on a shared system where users are denied admin access. Little or no Synapse development.
Disadvantages: Users have no visibility into how images are stored and organized. A mechanism is needed for users to 'signal' the administrator to create an image from a configured container. While submission may be streamlined, later sharing with or reuse by third parties is not provided and must be supported separately.
Put images into S3 files (e.g. SMC-HET Challenge)
Advantages: No software changes necessary in Synapse. Full upload/download, sharing, search, governance functionality available immediately.
Disadvantages: Wasteful of storage space: because each image is stored as a complete, self-contained file, 'layers' are not shared between images.
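The space penalty comes from duplicated layers. The toy sketch below models each layer as a byte string (the image names and sizes are made up for illustration) and compares storing each image as one file with storing each distinct layer once, keyed by its content digest, which is how a registry can share a base layer among many images.

```python
import hashlib


def digest(blob: bytes) -> str:
    # Layers are identified by a content digest of the form "sha256:<hex>".
    return "sha256:" + hashlib.sha256(blob).hexdigest()


def storage_as_files(images):
    # One self-contained file per image: every layer is stored again for every image.
    return sum(len(layer) for layers in images.values() for layer in layers)


def storage_as_registry(images):
    # Content-addressed store: a layer shared by several images is kept only once.
    blobs = {digest(layer): layer for layers in images.values() for layer in layers}
    return sum(len(blob) for blob in blobs.values())


# Hypothetical images, both built on the same large base layer.
base = b"B" * 1000
images = {
    "alice/tool-a:v1": [base, b"a" * 10],
    "alice/tool-b:v1": [base, b"b" * 20],
}

print(storage_as_files(images))     # 2030
print(storage_as_registry(images))  # 1030
```

The gap grows with every additional image built on the same base, which is the typical case for challenge submissions derived from a common starting image.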
Users/participants use Docker Hub. Synapse just tracks URIs that reference images in Docker Hub.
Advantages: Little or no Synapse development. (May add a 'URI file handle'.)
Disadvantages: Limited privacy for free accounts. Lack of integration with Synapse means no Synapse-based sharing, search, or governance.
Add a Docker registry to Synapse.
(More details below.)
Advantages: Full Synapse integration (sharing, search, governance).
Disadvantages: Non-trivial software effort.
How can Docker be tightly integrated with Synapse?
Docker images are stored in a 'registry', organized by user, repository and version (a version is called a 'tag'). User accounts and authorization are controlled by a separate component called an 'index'. The registry can be freely deployed and configured to use any index that implements the required token authentication API:
https://github.com/docker/distribution/blob/master/docs/spec/auth/token.md
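To see how the user/repository/version organization surfaces to users, the sketch below splits an image reference such as `registry.example.org:5000/alice/my-tool:v2` into its parts. It is a simplified illustration, not Docker's full reference grammar: the host name is hypothetical, digest references (`@sha256:...`) are not handled, and single-component names such as `ubuntu` are rejected rather than mapped to a default namespace.

```python
from typing import NamedTuple, Optional


class ImageRef(NamedTuple):
    registry: Optional[str]  # None means the client's default registry
    user: str
    repository: str
    tag: str


def parse_image_ref(ref: str) -> ImageRef:
    """Split 'host/user/repo:tag' into parts (simplified; digests not handled)."""
    registry = None
    parts = ref.split("/")
    # Treat the first component as a registry host if it looks like one.
    if len(parts) > 1 and ("." in parts[0] or ":" in parts[0] or parts[0] == "localhost"):
        registry = parts.pop(0)
    path = "/".join(parts)
    # With the host (which may carry a port) removed, a tag follows the last ':'.
    name, sep, tag = path.rpartition(":")
    if not sep:
        name, tag = path, "latest"  # Docker's default tag when none is given
    if "/" not in name:
        raise ValueError(f"expected user/repository, got {name!r}")
    user, repository = name.split("/", 1)
    return ImageRef(registry, user, repository, tag)


print(parse_image_ref("registry.example.org:5000/alice/my-tool:v2"))
```

Each (user, repository) pair is the unit that would need a Synapse counterpart, while tags are versions within it.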
This is good news for Synapse, which could potentially be extended to fill the role of an index, handling authorization for a private registry. To implement the API, Synapse would have to map Docker registry users to Synapse users and Docker repositories to (new) Synapse entities.
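The following is a minimal sketch of the authorization decision Synapse would make as the token server. In the token spec, a client requests a scope of the form `resourcetype:resourcename:action[,action]` (e.g. `repository:alice/my-tool:pull,push`), and the issued JWT carries an `access` claim listing the granted actions; the spec treats granting only a subset of the requested actions as normal rather than as an error. The mapping of `pull`/`push` onto Synapse `READ`/`UPDATE` permissions and the `user_permissions` lookup are assumptions made for illustration, and signing the JWT is omitted.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical mapping from Docker registry actions to Synapse permissions.
ACTION_TO_PERMISSION = {"pull": "READ", "push": "UPDATE"}


def parse_scope(scope: str) -> Tuple[str, str, List[str]]:
    """Parse 'resourcetype:resourcename:action[,action]'."""
    resource_type, rest = scope.split(":", 1)
    # Split actions from the right, since a resource name could contain ':'.
    name, actions = rest.rsplit(":", 1)
    return resource_type, name, actions.split(",")


def granted_access(scope: str, user_permissions: Dict[str, Set[str]]) -> List[dict]:
    """Build the JWT 'access' claim, keeping only actions the user may perform.

    user_permissions (hypothetical) maps a repository name to the Synapse
    permissions the requesting user holds on the corresponding Synapse entity.
    """
    resource_type, name, actions = parse_scope(scope)
    if resource_type != "repository":
        return []
    held = user_permissions.get(name, set())
    allowed = [a for a in actions if ACTION_TO_PERMISSION.get(a) in held]
    return [{"type": "repository", "name": name, "actions": allowed}]


perms = {"alice/my-tool": {"READ"}}
print(granted_access("repository:alice/my-tool:pull,push", perms))
# [{'type': 'repository', 'name': 'alice/my-tool', 'actions': ['pull']}]
```

Keeping the decision in Synapse's existing permission model is what would let registry access follow the same sharing settings as other Synapse entities.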
An open question is how to display Docker repositories in the web UI. As with Synapse Tables, repositories may be better conceptualized not as files but as another kind of entity, so it might be best to display the repositories for a user or project on a separate tab.
Running Docker containers
Beyond creating and managing images, there is the matter of where to run them. A simple approach is to install Docker on a machine and run containers there. Other approaches include:
Amazon Elastic Container Service
IBM Containers Service
https://console.ng.bluemix.net/
Galaxy Docker invocation (e.g. SMC-HET Challenge)
https://www.synapse.org/#!Synapse:syn2786217/wiki/74384
At this time we have not done a detailed comparison of the pros and cons of each alternative.