Execution of User Supplied Code


Should Synapse be concerned with running code supplied by users?

Reasons for executing user-submitted code:

- verifying the output of a previous run (reproducibility)
- sensitivity of input data (reluctance to release input data to the author of the executed code)
- reuse convenience (when compiling source code and its dependencies into an executable is laborious)

The following are not reasons for executing user-supplied code:

- comprehensibility (being able to execute code doesn't mean you can read or understand it; source code plus documentation is better for this)
- reusability (being able to execute code doesn't mean it's in a form that can be reused in another executable; source code is better for this)
- large data (code may need to move closer to the data, but this can be achieved by giving the author of the executable access to the system where the data is stored)

Technology Choices for Encapsulating User Code

Docker (a 'containerization' technology) has recently become a popular approach for creating rerunnable executables. In Docker, these executables are called 'images'.

Options for using Docker with Synapse/Challenges:

No image sharing (e.g. ALS Stratification Challenge)

In this approach a challenge administrator creates containers (running images) on users' behalf. Users 'log in' and modify containers as if they were working in a Unix account.

Advantages: Users need not learn Docker. Containers can be run on a shared system where users are denied admin access. Little or no Synapse development.
Disadvantages: Users have no visibility into how images are stored and organized. A mechanism is needed to 'signal' the administrator to create an image from a configured container. While submission may be streamlined, later sharing with or reuse by third parties is not provided and must be supported separately.

Put images into S3 files (e.g. SMC-HET Challenge)

Advantages: No software changes necessary in Synapse. Full upload/download, sharing, search, governance functionality available immediately.
Disadvantages: Wastes storage space, because 'layers' are not shared between images: each image file carries its own copy of every layer it uses.
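
To make the storage cost concrete, here is a rough sketch comparing the two cases. The image names, layer digests and sizes are made up for illustration; the only assumption about Docker is that a registry stores each distinct layer once, while self-contained image files each include every layer.

```python
def storage_bytes(images, shared_layers):
    """Return the total bytes needed to store all images.

    images maps an image name to a dict of {layer digest: size in bytes}.
    With shared_layers=True each distinct layer is counted once (registry-like);
    with shared_layers=False every image carries all of its layers (one file per image).
    """
    if shared_layers:
        unique = {}
        for layers in images.values():
            unique.update(layers)  # identical digests collapse into one entry
        return sum(unique.values())
    return sum(sum(layers.values()) for layers in images.values())


# Hypothetical submissions that all build on the same 200 MB base layer.
base = {"sha256:base": 200_000_000}
images = {
    "team-a/model": {**base, "sha256:a": 5_000_000},
    "team-b/model": {**base, "sha256:b": 8_000_000},
    "team-c/model": {**base, "sha256:c": 3_000_000},
}

as_files = storage_bytes(images, shared_layers=False)    # 616_000_000
in_registry = storage_bytes(images, shared_layers=True)  # 216_000_000
```

In this made-up case the file-per-image approach stores the base layer three times, which accounts for nearly all of the difference.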

Users/participants use DockerHub; Synapse only tracks URIs that reference images in DockerHub

Advantages: Little or no Synapse development. (May add a 'URI file handle'.)
Disadvantages: Limited privacy for free accounts. Lack of integration with Synapse means no Synapse-based sharing, search, or governance.
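
If Synapse only stores a reference, it would at least need to validate and split that reference. The following is a minimal sketch, assuming references of the simple form `user/repository[:tag]`; the `ImageRef` type and `parse_image_ref` function are hypothetical names, and registry hostnames and digests are deliberately not handled.

```python
from typing import NamedTuple


class ImageRef(NamedTuple):
    user: str
    repository: str
    tag: str


def parse_image_ref(ref: str) -> ImageRef:
    """Split 'user/repository[:tag]' into its parts (simplified form only)."""
    name, sep, tag = ref.rpartition(":")
    if not sep:
        # No tag given; Docker treats a missing tag as "latest".
        name, tag = ref, "latest"
    user, slash, repository = name.partition("/")
    if not (slash and user and repository and tag) or "/" in repository or "/" in tag:
        raise ValueError(f"expected 'user/repository[:tag]', got {ref!r}")
    return ImageRef(user, repository, tag)
```

For example, `parse_image_ref("alice/classifier:1.0")` yields the user `alice`, the repository `classifier` and the tag `1.0`, while a bare `classifier` is rejected.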

Add a Docker registry to Synapse

(More details below.)

Advantages: Full Synapse integration (sharing, search, governance).
Disadvantages: Non-trivial software effort.

How can Docker be tightly integrated with Synapse?

Docker images are stored in a 'registry', organized by user, repository and version. User accounts and authorization are controlled by a separate component called an 'index'. The registry can be freely deployed and configured to use any index that implements the right API, which is specified at:
https://github.com/docker/distribution/blob/master/docs/spec/auth/token.md
This is good news for Synapse, which could be extended to fill the role of an index, handling authorization for a private registry. To implement the API, Synapse would have to map Docker registry users to Synapse users and Docker repositories to (new) Synapse entities.
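
The authorization decision at the heart of that mapping can be sketched as follows. In the linked token specification, a client requests access with a scope string of the form `repository:<name>:<actions>` (for example `repository:samalba/my-app:pull,push`), and the service answers with the subset of actions it grants. Everything Synapse-specific here is an assumption for illustration: the `PERMISSION_TO_ACTION` mapping, the `acl` dictionary standing in for Synapse's permission store, and the function names. Token signing and the HTTP exchange are omitted.

```python
# Hypothetical mapping from Synapse-style permissions to Docker registry actions.
PERMISSION_TO_ACTION = {"READ": "pull", "UPDATE": "push"}


def parse_scope(scope):
    """Split 'type:name:action1,action2' into (type, name, [actions])."""
    resource_type, rest = scope.split(":", 1)
    name, actions = rest.rsplit(":", 1)
    return resource_type, name, actions.split(",")


def granted_access(scope, user, acl):
    """Return the access entry to grant: the requested actions the user may perform.

    acl maps (user, repository name) to a set of permissions; it stands in
    for whatever permission lookup Synapse would actually perform.
    """
    resource_type, name, requested = parse_scope(scope)
    permissions = acl.get((user, name), set())
    allowed = {PERMISSION_TO_ACTION[p] for p in permissions if p in PERMISSION_TO_ACTION}
    return {
        "type": resource_type,
        "name": name,
        "actions": [action for action in requested if action in allowed],
    }


# Made-up users and repository for illustration.
acl = {
    ("alice", "alice/classifier"): {"READ", "UPDATE"},
    ("bob", "alice/classifier"): {"READ"},
}
bob_access = granted_access("repository:alice/classifier:pull,push", "bob", acl)
# bob_access["actions"] == ["pull"]
```

Here a user with only read permission asks for both pull and push and is granted pull alone; a user with no entry in `acl` would be granted an empty action list.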

An open question is how to display Docker repositories in the web UI: As with Synapse Tables, repositories may not be best conceptualized as files but as another kind of entity. It might be best to display the repositories for a user or project on a separate tab.


Running Docker containers

Beyond creating and managing images, there is the matter of where to run them. A simple approach is to install Docker on a single machine. Other approaches include:

- Amazon Elastic Container Service
- IBM Containers Service
- Galaxy Docker extension (e.g. SMC-HET Challenge)
