This guide helps organizers create a space within Synapse to host a crowd-sourced Challenge. Challenges are community competitions designed to crowd-source new computational methods for fundamental questions in systems biology and translational medicine. Learn more about Challenges and see examples of past and current projects by visiting Challenges and Benchmarking.
A Challenge space provides participants with a Synapse project to learn about the Challenge, join the Challenge community, submit entries, track progress, and view results. This article will focus on:
Challenge infrastructure setup
Challenge launch
Submission monitoring
Challenge updates
Before You Begin
Computing Power
At Sage Bionetworks, we generally provision an EC2 Linux instance for a Challenge that leverages SynapseWorkflowOrchestrator to run CWL workflows. These workflows will be responsible for evaluating and scoring submissions (see model-to-data-challenge-workflow GitHub for an example workflow). If Sage Bionetworks is responsible for the cloud compute services, please give a general estimate of the computing power (memory, volume) needed. We can also help with the estimates if you are unsure.
By default, up to ten submissions can be evaluated concurrently, though this number can be increased or decreased accordingly within the orchestrator's .env file. Generally, the more submissions you want to run concurrently, the more power will be required of the instance.
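As an illustrative sketch, the concurrency limit is a single property in the orchestrator's `.env` file (property name from the orchestrator README; the value shown is the default, not a recommendation):

```
# .env (excerpt) -- the orchestrator reads this property at startup
MAX_CONCURRENT_WORKFLOWS=10
```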
Using Sensitive Data
Synapse has the ability to apply access restrictions to sensitive data (e.g. human data), so that legal requirements are met before participants can access it. If human data are being used in the Challenge, or if you have any questions about the sensitivity of the Challenge data, contact the Synapse Access and Compliance Team (act@sagebase.org) for support with the necessary data access procedures.
If data is sensitive and cannot be hosted on Synapse (e.g. it cannot leave the external site or data provider), provide a remote server with the following:
Support for Docker and, if possible, docker-compose
If Docker is not allowed, support for Singularity and Java 8 is required
The SynapseWorkflowOrchestrator repository
If Sage is not allowed access to the server, then it is the data contributor’s responsibility to get the orchestrator running in whatever environment is chosen. If the system does not support Docker, please let us know, as we have workarounds (e.g. executing with Java).
Challenge Infrastructure Setup
Requirements
Synapse account
Python 3.7+
(for local testing) CWL runner of choice, e.g. cwltool
Access to cloud compute services, e.g. AWS, GCP, etc.
Steps
1. Create a workflow infrastructure GitHub repository for the Challenge. For the orchestrator to work, the repository must be public.
We have created two templates in Sage-Bionetworks-Challenges that you may use as a starting point. Their READMEs outline what will need to be updated within the scripts (under Configurations); we will return to this later in Step 11.
Workflow Template | Submission Type |
---|---|
data-to-model-challenge-workflow | Flat files (e.g. CSV) |
model-to-data-challenge-workflow | Docker images |
2. Create the Challenge site on Synapse with challengeutils' create-challenge:

challengeutils create-challenge "challenge_name"
This command will create two Synapse projects:
Staging - Organizers will use this project during Challenge development to share files and draft the Challenge wiki. The create-challenge command initializes the wiki with the DREAM Challenge Wiki Template.
Live - Organizers will use this project as the pre-registration page during Challenge development. When the Challenge is ready for launch, the project will be replaced with the contents of the staging project.
You may think of these two projects as development (staging) and production (live), in that all edits must be done in the staging site, NOT the live site. Maintaining both projects enables wiki content to be edited and previewed in the staging project before it is published to the live project. Changes to the live site are synced over with challengeutils' mirror-wiki (see Update the Challenge below for more info).
At first, the live site will be just one page where a general overview about the Challenge is provided. There will also be a pre-register button that Synapse users can click on if they are interested in the upcoming Challenge.
create-challenge will also create four Synapse teams for the Challenge:
Participants - This Synapse team includes the individual participants and teams who register for the Challenge.
Organizers - The Challenge organizers must be added to this list to provide the permissions to share files and edit the wiki on the staging project.
Admin - The Challenge administrators have administrator access to both the live and staging projects. Organizers do not need to be administrators. Ideally, all admins should have a good understanding of Synapse.
Pre-registrants - This team is recommended for when the Challenge is under development. It allows participants to join a mailing list to receive notification of Challenge launch news.
Add Synapse users to the Organizers and Admin teams as needed.
3. On the live site, go to the Challenge tab and create as many evaluation queues as needed (for example, one per question/task) by clicking Challenge Tools > Create Evaluation Queue. By default, create-challenge will create an evaluation queue for writeups. More information on writeups and how to collect them is available here.
The seven-digit number in parentheses following each evaluation queue name is its evaluation ID. You will need these IDs later in Step 10.
4. While still on the live site, go to the Files tab and create a new folder called "Logs" by clicking on the add-folder icon:
This folder will contain the participants' submission logs and prediction files (if any); more information about the log folder and its structure is provided below. Make note of this Folder ID for use later in Step 10.
5. On the staging site, go to the Files tab and click on the upload icon to Upload or Link to a File:
6. In the pop-up window, switch tabs to Link to URL. For "URL", enter the web address to the zipped download of the workflow infrastructure repository. You may get this address by going to the repository and clicking on Code > right-clicking Download Zip > Copy Link Address:
Name the file (e.g. "workflow"), then click Save.
This file will be what links the evaluation queue to the orchestrator. Make note of this File ID for use later in Step 10.
7. Add an annotation to the file called ROOT_TEMPLATE by clicking File Tools > Annotations > Edit. The "Value" will be the path to the workflow script, written as:
{infrastructure workflow repo}-{branch}/path/to/workflow.cwl
For example, this is the path to workflow.cwl of the model-to-data template repo:
model-to-data-challenge-workflow-main/workflow.cwl
The ROOT_TEMPLATE annotation is what the orchestrator uses to determine which file in the repository is the workflow script.
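As a sanity check, the annotation value can be assembled from its parts; a minimal sketch (the repository, branch, and script names follow the example above):

```python
def root_template(repo: str, branch: str, workflow_path: str) -> str:
    """Build the ROOT_TEMPLATE annotation value: the workflow script's
    path inside the unzipped GitHub archive, which unpacks into a
    '{repo}-{branch}/' top-level directory."""
    return f"{repo}-{branch}/{workflow_path}"

value = root_template("model-to-data-challenge-workflow", "main", "workflow.cwl")
print(value)  # model-to-data-challenge-workflow-main/workflow.cwl
```

If you later point the file link at a different branch, the `-{branch}` part of the path must change with it.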
8. Create a cloud compute environment with the required memory and volume specifications, then SSH into the instance.
If you are a Sage employee: follow our internal instructions for setting up the instance.
If you are not a Sage employee: follow the instructions listed under "Setting up linux environment" to install and run Docker and docker-compose on the compute environment of your choice.
9. On the instance, clone the SynapseWorkflowOrchestrator repo if needed. Change directories to SynapseWorkflowOrchestrator/ and create a copy of the .envTemplate file as .env (or simply rename it to .env):

cd SynapseWorkflowOrchestrator/
cp .envTemplate .env
10. Open .env and enter values for the following property variables:

Property | Description | Example |
---|---|---|
SYNAPSE_USERNAME | Synapse credentials under which the orchestrator will run. The provided user must have access to the evaluation queue(s) being serviced. | |
SYNAPSE_PASSWORD | Password for the SYNAPSE_USERNAME account. This can be found under My Dashboard > Settings. | |
WORKFLOW_OUTPUT_ROOT_ENTITY_ID | Synapse ID of the "Logs" folder. Use the Synapse ID from Step 4. | |
EVALUATION_TEMPLATES | JSON map of evaluation IDs to the workflow repo archive, where the key is the evaluation ID and the value is the Synapse ID of the archive link. Use the evaluation IDs from Step 3 as the key(s) and the File ID from Step 5 as the value(s). | |
Other properties may also be updated if desired, e.g. SUBMITTER_NOTIFICATION_MASK, SHARE_RESULTS_IMMEDIATELY, MAX_CONCURRENT_WORKFLOWS, etc. Refer to the "Running the Orchestrator with Docker containers" notes in the README for more details.
11. Return to the workflow infrastructure repo and clone it onto your local machine. Using a text editor or IDE, make the following updates to the following scripts:
Data-to-model template:
Script | TODO | Required? |
---|---|---|
workflow.cwl | Update | yes |
| Set | no |
| Add metrics and scores to | no |
validate.cwl | Update the base image if the validation code is not Python | no |
| Remove the sample validation code and replace with validation code for the Challenge | yes |
score.cwl | Update the base image if the scoring code is not Python | no |
| Remove the sample scoring code and replace with scoring code for the Challenge | yes |
Model-to-data template:
Script | TODO | Required? |
---|---|---|
workflow.cwl | Provide the admin user ID or admin team ID for (2 steps: | yes |
| Update | yes |
| Set (2 steps: | no |
| Provide the absolute path to the data directory, denoted as | yes |
| Set | no |
| Add metrics and scores to | no |
validate.cwl | Update the base image if the validation code is not Python | no |
| Remove the sample validation code and replace with validation code for the Challenge | yes |
score.cwl | Update the base image if the scoring code is not Python | no |
| Remove the sample scoring code and replace with scoring code for the Challenge | yes |
Push the changes up to GitHub when done.
12. On the instance, change directories to SynapseWorkflowOrchestrator/ and kick-start the orchestrator with:

docker-compose up -d

where -d runs the orchestrator in the background, so you can log out of the instance without terminating it.
If validate.cwl or score.cwl uses a Docker image instead of inline code: you must first pull that image onto the instance before starting the orchestrator. Otherwise, the orchestrator will fail, stating that the image cannot be found.
If successful, the orchestrator will continuously monitor the evaluation queues specified by EVALUATION_TEMPLATES for submissions with the status RECEIVED. Once it encounters a RECEIVED submission, it will run the workflow specified by ROOT_TEMPLATE and update the submission status from RECEIVED to EVALUATION_IN_PROGRESS. If an error is encountered during any of the workflow steps, the orchestrator will update the submission status to INVALID and the workflow will stop. If, instead, the workflow runs to completion, the orchestrator will update the submission status to ACCEPTED. Depending on how the workflow is set up (configured in Step 11), participants may periodically be notified by email of their submission's progress.
13. To make changes to the .env file (e.g. change the number of concurrent submissions), first stop the orchestrator with:

docker-compose down

Once you are done making updates, save the file and restart the orchestrator with docker-compose up -d to apply the changes.
Log Files
As it’s running, the orchestrator will upload logs and prediction files to the folder specified by WORKFLOW_OUTPUT_ROOT_ENTITY_ID. For each submitter, two folders will be created:

<submitterid>
<submitterid>_LOCKED

where Docker and TOIL logs are uploaded to <submitterid> and prediction files are uploaded to <submitterid>_LOCKED. Note that the LOCKED folders will NOT be accessible to participants; this helps prevent data leakage.
The directory structure of “Logs” will look something like this:
Logs
├── submitteridA
│   ├── submission01
│   │   ├── submission01_log.txt
│   │   └── submission01_logs.zip
│   ├── submission02
│   │   ├── submission02_log.txt
│   │   └── submission02_logs.zip
│   ...
├── submitteridA_LOCKED
│   ├── submission01
│   │   └── predictions.csv
│   ├── submission02
│   │   └── predictions.csv
│   ...
...
Launch the Challenge
Before proceeding with the launch, contact Sage Governance to ensure that a clickwrap is in place for Challenge registration. You will need to provide Governance with the Synapse ID of the live Challenge site, as well as the team ID of the Participants team.
Requirements
Synapse account
Python 3.7+
synapseclient
To launch the Challenge, that is, to copy the wiki pages of the staging site over to the live site, use synapseutils' copyWiki in a Python script, e.g.

import synapseclient
import synapseutils

syn = synapseclient.login()
synapseutils.copyWiki(
    syn,
    "syn1234",                   # Synapse ID of staging site
    destinationId="syn2345",     # Synapse ID of live site
    destinationSubPageId=999999  # ID following ../wiki/ of live site URL
)
When using copyWiki, it is important to specify the destinationSubPageId parameter. This ID can be found in the URL of the live site, where it is the integer following .../wiki/.
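A small sketch for pulling that ID out of a live-site URL (the URL below is hypothetical):

```python
def sub_page_id(wiki_url: str) -> int:
    """Extract the destinationSubPageId from a Synapse wiki URL:
    the integer after the final '/wiki/' segment."""
    return int(wiki_url.rsplit("/wiki/", 1)[1])

print(sub_page_id("https://www.synapse.org/#!Synapse:syn2345/wiki/601234"))  # 601234
```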
Once copyWiki has been used once, DO NOT USE IT AGAIN!
Following this action, all changes to the live site should be synced over with challengeutils' mirror-wiki. More on updating the wikis under the Update the Challenge section below.
In addition to copying over the wiki pages, share all needed evaluation queues with the Participants team with Can submit permissions. Once the Challenge is over, we recommend updating their permissions to Can view (this helps keep their interface clean).
Monitor the Submissions
For Challenge organizers, we recommend creating a Submission View to easily track and monitor submissions as they come in. This table will be especially useful when participants need help with their submissions.
Learn more about revealing scores and adding leaderboards in Evaluation Queues.
Steps
1. Go to the staging site and click on the Tables tab. Create a new Submission View by clicking on Add New… > Add Submission View.
2. Under "Scope", add the Evaluation queue(s) you are interested in monitoring (you may add more than one), then click Next. On the next screen, select which information to display - this is known as its schema.
We recommend the following schema for monitoring Challenge submissions:
Column Name | Description | Facet values? |
---|---|---|
id | Submission ID | Not recommended |
createdOn | Date and time of the submission (in Epoch, but rendered as a date) | |
submitterid | User or team who submitted (user or team ID, but rendered as username or team name) | Recommended |
dockerrepositoryname | (recommended for model-to-data) Docker image name | Not recommended |
dockerdigest | (recommended for model-to-data) Docker SHA digest | Not recommended |
status | Workflow status of the submission (one of RECEIVED, EVALUATION_IN_PROGRESS, ACCEPTED, INVALID) | Recommended |
submission_status | Evaluation status of the submission (one of None, | Recommended |
submission_errors | (if any) Validation errors for the predictions file | Not recommended |
| Synapse ID of the submission’s logs folder | Not recommended |
| Synapse ID of the predictions file (if any) | Not recommended |
(any annotations related to scores) | Submission annotations; the names used depend on the annotations applied in the scoring step of the workflow | |
Columns that are not part of the default Submission View schema must be added manually by clicking the + Add Column button at the bottom of the Edit Columns window.
3. Click Save. A table of the submissions and their metadata will now be available for viewing and querying. The displayed information can be changed later by clicking the schema icon, followed by Edit Schema:
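Once the view exists, it can be queried like any Synapse table. A sketch of a query string you might pass to syn.tableQuery from a logged-in synapseclient session (the view ID syn5678901 is hypothetical, and the column names assume the schema suggested above):

```python
# Hypothetical Submission View ID.
view_id = "syn5678901"

# Find submissions that failed evaluation so you can follow up with
# the submitters; pass this string to syn.tableQuery(...).
query = (
    f"SELECT * FROM {view_id} "
    "WHERE status = 'INVALID' "
    "ORDER BY createdOn DESC"
)
print(query)
```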
Update the Challenge
Challenge Site and Wikis
Any changes to the Challenge site and its wiki/sub-wiki contents must be done in the staging site, not live. To update:
1. Make whatever changes needed to the staging project.
2. Use challengeutils' mirror-wiki
to push the changes to the live project.
Using the --dryrun
flag prior to officially mirroring can be helpful in ensuring that the pages to be updated are actually the ones intended.
Evaluation Queue Quotas
To update the submission quotas, go to the Synapse live site, then head to the Challenge tab. Edit the Evaluation queues as needed:
Example 1
A Challenge has four rounds, and each round has a limit of 3 submissions per submitter or team. To implement this quota, click Edit on the desired queue, then click Add Round and input the Duration and Submission Limits.
Example 2
A Challenge lasts for four months with a daily submission limit of 3. To implement this quota, click Edit on the desired queue, click Add Round, input the Duration, and then click Advanced Limits to pick daily/weekly/monthly limits.
Workflow Steps
For any changes to the infrastructure workflow steps and/or scripts involved with the workflow (e.g. run_docker.py), simply make the edits to the scripts, then push the changes.
Note: dry-runs should always follow a change to the workflow; this will ensure things are still working as expected.