One of the features of the Synapse.org platform is the ability to host crowd-sourced challenges. Hosting a challenge is a great way to crowd-source new computational methods for fundamental questions in systems biology and translational medicine.
Learn more about challenges and see examples of past/current projects by visiting Challenges and Benchmarking.
Running a challenge on Synapse will require creating a challenge space for participants to learn about the challenge, join the challenge community, submit entries, and view results. This article is aimed at challenge organizers, and will focus on:
Setting up the infrastructure for submission evaluation
Launching the challenge
Updating the challenge
Closing the challenge
Monitoring the submissions
Before You Begin
🖥️ Required Compute Power
At Sage Bionetworks, we generally provision an AWS EC2 Linux instance to run a challenge's infrastructure, using the SynapseWorkflowOrchestrator to run CWL workflows. These workflows are responsible for evaluating and scoring submissions.
If Sage is responsible for providing the cloud compute services, we ask that you give us a general estimate of the computing power needed to validate and score the submissions (for example: memory, volume, whether a GPU is required, …).
📋 Using Sensitive Data as Challenge Data
Challenge data can be hosted on Synapse. If the data is sensitive (for example, human data), Synapse can apply access restrictions so that legal requirements are met before participants can access them. Contact the Synapse Access and Compliance Team (act@sagebase.org) for support with the necessary data access procedures for sensitive data.
🛑 Restricted Data Access
If data cannot leave the external site or data provider, it will be the data contributor’s responsibility to set up the challenge infrastructure. Contact the Challenges and Benchmarking team (cnb@sagebase.org) for consultations if needed.
To set up the infrastructure, you may follow Sage’s approach of using the SynapseWorkflowOrchestrator. The following will be required to use the orchestrator:
Support for Docker
(ideally) Support for docker-compose
If Docker is not allowed, support for Singularity and Java 8 is required instead
Note that the steps outlined in this article assume that the orchestrator is used.
Challenge Infrastructure Setup
Requirements
Synapse account
Python 3.7+
(for local testing) CWL runner of choice, e.g. cwltool
Access to cloud compute services, e.g. AWS, GCP, etc.
Outcome
This infrastructure will continuously monitor the challenge’s evaluation queue(s) for new submissions. Once a submission is received, it is evaluated, which includes validation and scoring. All submissions will be downloadable by the challenge organizers, including the Docker image (for model-to-data challenges) and/or prediction files. Depending on the infrastructure configuration, participants may periodically receive email notifications about their submissions (such as status and scores).
Steps
1. Create a GitHub repository for the challenge workflow infrastructure. For the orchestrator to work, this repo must be public.
Two templates are available in Sage-Bionetworks-Challenges that you may use as a starting point. Their READMEs outline what will need to be updated within the scripts (under Configurations), but we will return to this later in Step 12.
Workflow Template | Submission Type |
---|---|
data-to-model-challenge-workflow | Flat files, like CSV files |
model-to-data-challenge-workflow | Docker images |
2. Create the challenge space on Synapse with challengeutils' create-challenge:
challengeutils create-challenge "challenge_name"
This command will create two Synapse projects:
Staging - Organizers will use this project during challenge planning and development to share files and draft the wiki content. create-challenge will initialize the wiki with the DREAM Challenge Wiki Template.
Live - Organizers will use this project as the pre-registration page during challenge development. When the challenge is ready for launch, the project will then be replaced with the contents from staging.
We encourage you to use the staging project to make all edits and preview them before officially pushing the updates over to the live project.
See Update the Challenge below to learn more about syncing changes from staging to live.
create-challenge will also create four Synapse teams for the challenge:
Pre-registrants - This team is used when the challenge is under development. It allows interested Synapse users to join a mailing list to receive notification of challenge launch news.
Participants - Once the challenge is launched, Synapse users will join this team in order to download the challenge data and make submissions.
Organizers - Synapse users added to this team will have the ability to share files and edit wikis on the staging project. Add users as needed.
Admin - Synapse users added to this team will have administrator access to both the live and staging projects. Organizers do not need to be admins. Ideally, all admins should have a good understanding of Synapse. Add users as needed.
3. On the live project, go to the Challenge tab and create as many evaluation queues as needed (for example, one per question or task) by clicking on Challenge Tools > Create Evaluation Queue. create-challenge will create one evaluation queue by default.
The 7-digit number in parentheses following each evaluation queue name is the evaluation ID. You will need these ID(s) later in Step 11.
4. While still on the live project, go to the Files tab and create a new folder called “Logs” by clicking on the add-folder icon.
This folder will contain the participants' submission logs and prediction files (if any). Make note of its synID for use later in Step 11.
5. On the staging project, go to the Files tab and click on the upload icon to Upload or Link to a File.
6. In the pop-up window, switch to the Link to URL tab. For “URL”, enter the web address of the ZIP download of the workflow repo. You can get this address by going to the repo, clicking Code, then right-clicking Download ZIP and selecting Copy Link Address.
Click Save.
This file will be what links the evaluation queue to the orchestrator. Make note of its synID for use later in Step 11.
7. Add an annotation called ROOT_TEMPLATE to the file. The orchestrator uses this annotation to determine which file in the repo is the workflow script. Click on the annotations icon, then Edit.
8. For “Value”, enter the filepath to the workflow script as if you had downloaded the repo as a ZIP. For example, model-to-data-challenge-workflow would be downloaded and unzipped as model-to-data-challenge-workflow-main, and the path to the workflow script is workflow.cwl.
In this example, “Value” will be model-to-data-challenge-workflow-main/workflow.cwl. In general, “Value” should look something like this:
{name of repo}-{branch}/workflow.cwl
9. Create a cloud compute environment with the required memory and volume specifications, then log into the instance.
If you are a Sage employee, follow our internal instructions at /wiki/spaces/CHAL/pages/2806087732.
If you are not a Sage employee, follow the instructions under "Setting up linux environment" to install and run Docker and docker-compose on the compute environment of your choice.
10. On the instance, clone the SynapseWorkflowOrchestrator repo if it’s not already available on the machine. Change directories to SynapseWorkflowOrchestrator/ and create a copy of the .envTemplate file as .env (or rename it to .env):
cd SynapseWorkflowOrchestrator/
cp .envTemplate .env
11. Open .env and enter values for the following config variables:
Property | Description |
---|---|
SYNAPSE_USERNAME | Synapse credentials under which the orchestrator will run. The provided user must have access to the evaluation queue(s) being serviced. |
SYNAPSE_PASSWORD | Password for SYNAPSE_USERNAME. This can be found under My Dashboard > Settings. |
WORKFLOW_OUTPUT_ROOT_ENTITY_ID | synID of the "Logs" folder. Use the synID from Step 4. |
EVALUATION_TEMPLATES | JSON map of evaluation IDs to the workflow repo archive, where each key is an evaluation ID and each value is the synID of the file that links to the archive. Use the evaluation IDs from Step 3 as the key(s) and the synID from Step 6 as the value(s). |
Refer to the "Running the Orchestrator with Docker containers" README section for additional configuration options.
12. Clone the workflow repo. Using a text editor or IDE, make the updates listed under Configurations in the template's README to the workflow scripts.
Push the changes up to GitHub when done.
13. On the instance, change directories to SynapseWorkflowOrchestrator/ and start the orchestrator with:
docker-compose up -d
where -d runs the orchestrator in the background (detached mode). This allows you to exit the instance without terminating the orchestrator.
If validate.cwl and/or score.cwl use a Docker image instead of inline code, you must pull that image onto the instance before starting the orchestrator. Otherwise, the orchestrator will fail and report that the image cannot be found.
14. To make changes to the .env file (such as updating the number of concurrent submissions), stop the orchestrator with:
docker-compose down
Once you are done making updates, save the file and restart the orchestrator (docker-compose up -d) to apply the changes.
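Steps 3 through 8 and Step 11 above use the web UI, but the same objects can also be created with the Synapse Python client. The sketch below is a hypothetical helper. It is not part of challengeutils or the orchestrator, and it rests on several assumptions:

- synapseclient provides getEvaluationByContentSource, Folder, and a File(..., synapseStore=False) that links to a URL without uploading it.
- GitHub's Download ZIP address has the form archive/refs/heads/<branch>.zip.
- The .env variable names are the ones in the orchestrator's .envTemplate (SYNAPSE_USERNAME, SYNAPSE_PASSWORD, WORKFLOW_OUTPUT_ROOT_ENTITY_ID, EVALUATION_TEMPLATES). Check these against your copy of the template.

```python
"""Hypothetical sketch of Steps 3-8 and 11; not part of challengeutils."""
import json


def archive_url(owner: str, repo: str, branch: str = "main") -> str:
    """Assumed GitHub "Download ZIP" address for one branch of a public repo."""
    return f"https://github.com/{owner}/{repo}/archive/refs/heads/{branch}.zip"


def root_template(repo: str, branch: str = "main", script: str = "workflow.cwl") -> str:
    """ROOT_TEMPLATE value: {name of repo}-{branch}/<path to workflow script>."""
    return f"{repo}-{branch}/{script}"


def env_lines(username: str, password: str, logs_synid: str, templates: dict) -> list:
    """Render .env assignments; templates maps evaluation ID -> workflow file synID."""
    return [
        f"SYNAPSE_USERNAME={username}",
        f"SYNAPSE_PASSWORD={password}",
        f"WORKFLOW_OUTPUT_ROOT_ENTITY_ID={logs_synid}",
        # EVALUATION_TEMPLATES holds a JSON object on a single line.
        f"EVALUATION_TEMPLATES={json.dumps(templates)}",
    ]


def set_up(syn, live_id: str, staging_id: str, owner: str, repo: str, branch: str = "main"):
    """Create "Logs" on live and the linked, annotated workflow file on staging."""
    # Imported lazily so the pure helpers above work without synapseclient installed.
    from synapseclient import File, Folder

    # Step 3: print each queue as "name (evaluation ID)".
    for evaluation in syn.getEvaluationByContentSource(live_id):
        print(f"{evaluation.name} ({evaluation.id})")

    # Step 4: the "Logs" folder on the live project.
    logs = syn.store(Folder(name="Logs", parent=live_id))

    # Steps 5-8: a File that only links to the ZIP (synapseStore=False uploads
    # nothing), annotated with ROOT_TEMPLATE so the orchestrator finds the script.
    workflow = syn.store(
        File(
            path=archive_url(owner, repo, branch),
            parent=staging_id,
            synapseStore=False,
            name=f"{repo}-{branch}.zip",
            annotations={"ROOT_TEMPLATE": root_template(repo, branch)},
        )
    )
    return logs.id, workflow.id
```

For example, you could call set_up(syn, "syn2345", "syn1234", "Sage-Bionetworks-Challenges", "model-to-data-challenge-workflow") with placeholder synIDs. You would then paste the output of env_lines(..., logs_id, {"<evaluation ID>": workflow_id}) into .env. Keep .env private, since it holds credentials.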
Log Files
As it runs, the orchestrator will upload logs and prediction files to the “Logs” folder. For each user or team that submits to the challenge, two folders will be created:
<submitterid>/
<submitterid>_LOCKED/
Docker and TOIL logs are uploaded to <submitterid>/, and prediction files are uploaded to <submitterid>_LOCKED/. Note that the LOCKED folders are not accessible to participants, in order to prevent data leakage.
The directory structure of “Logs” will look something like this:
Logs
├── submitteridA
│   ├── submission01
│   │   ├── submission01_log.txt
│   │   └── submission01_logs.zip
│   ├── submission02
│   │   ├── submission02_log.txt
│   │   └── submission02_logs.zip
│   ...
│
├── submitteridA_LOCKED
│   ├── submission01
│   │   └── predictions.csv
│   ├── submission02
│   │   └── predictions.csv
│   ...
│
...
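To audit the “Logs” folder without clicking through it, a short script can separate the participant-visible submitter folders from the _LOCKED ones and list every uploaded file. This is a hypothetical sketch. It assumes synapseclient's getChildren and synapseutils.walk. The walk function yields ((name, synID), subfolders, files) tuples, where subfolders and files are lists of (name, synID) pairs. The only naming rule relied on is the _LOCKED suffix described above.

```python
def split_log_folders(folder_names):
    """Separate participant-visible <submitterid> folders from <submitterid>_LOCKED ones."""
    visible, locked = [], []
    for name in folder_names:
        (locked if name.endswith("_LOCKED") else visible).append(name)
    return visible, locked


def locked_folders(syn, logs_synid):
    """Names of the _LOCKED folders directly under "Logs"."""
    children = syn.getChildren(logs_synid, includeTypes=["folder"])
    return split_log_folders([child["name"] for child in children])[1]


def print_logs_tree(syn, logs_synid):
    """Print every file under "Logs" together with its synID."""
    import synapseutils  # lazy import; split_log_folders has no dependencies

    for (dirpath, _dir_id), _subfolders, files in synapseutils.walk(syn, logs_synid):
        for name, file_id in files:
            print(f"{dirpath}/{name} ({file_id})")
```

Listing the _LOCKED folders separately makes it easy to confirm that none of them has been shared with the participants team.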
Launch the Challenge
Requirements
Synapse account
Python 3.7+
Important TODOs:
Before proceeding with the launch, we recommend contacting Sage Governance to add a clickwrap for challenge registration. With a clickwrap in-place, interested participants can only be registered if they agree to the terms and conditions of the challenge data usage.
If you are a Sage employee: submit a Jira ticket to the Governance board with the synID of the live project, as well as the team ID of the participants team.
Share all needed evaluation queues with the participants team with Can submit permissions. Once the challenge is over, we recommend updating the permissions to Can view to prevent late submissions. We also recommend sharing the evaluation queues with the general public so that the leaderboards are openly accessible.
After the challenge is launched, create a folder called “Data” and update its Sharing Settings. Share the “Data” folder with the participants team only. Do not make the folder public or accessible to all registered Synapse users.
Upload any challenge data that is to be provided to the participants to the “Data” Folder. DO NOT UPLOAD DATA until you have updated its sharing settings.
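The queue-sharing TODO above can also be scripted. This is a minimal sketch under these assumptions:

- synapseclient's setPermissions accepts an Evaluation object, as it does for entities.
- Can view corresponds to the READ access type, and Can submit to READ plus SUBMIT.
- 273949 is the Synapse principal ID for the general public.

The team ID in the usage note is a placeholder.

```python
# Assumed mapping from the Synapse UI permission levels to evaluation access types.
PERMISSION_LEVELS = {
    "view": ["READ"],
    "submit": ["READ", "SUBMIT"],
}

PUBLIC = 273949  # assumed principal ID for "anyone on the web"


def access_types(level: str) -> list:
    """Return the access types for a permission level ("view" or "submit")."""
    if level not in PERMISSION_LEVELS:
        raise ValueError(f"unknown permission level: {level!r}")
    return list(PERMISSION_LEVELS[level])


def share_queue(syn, evaluation_id, principal_id, level):
    """Grant (or, since setPermissions overwrites by default, replace) access on a queue."""
    evaluation = syn.getEvaluation(evaluation_id)
    return syn.setPermissions(
        evaluation, principalId=principal_id, accessType=access_types(level)
    )
```

At launch you might run share_queue(syn, evaluation_id, participants_team_id, "submit") and share_queue(syn, evaluation_id, PUBLIC, "view"). After the challenge closes, you would rerun the first call with "view".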
To launch the challenge, that is, to copy the wiki pages of the staging project over to the live project, use synapseutils' copyWiki() in a Python script.
For example:
import synapseclient
import synapseutils

syn = synapseclient.login()
synapseutils.copyWiki(
    syn,
    "syn1234",  # synID of staging site
    destinationId="syn2345",  # synID of live site
    destinationSubPageId=999999,  # ID following ../wiki/ of live project URL
)
When using copyWiki, it is important to specify the destinationSubPageId parameter. This ID can be found in the URL of the live project, where it is the number following .../wiki/.
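To avoid copying the wrong number into destinationSubPageId, you can pull the ID out of the URL programmatically. This is a small sketch that assumes the live project URL contains a /wiki/<digits> segment. The root_wiki_id alternative assumes synapseclient's getWiki(owner) returns the project's root wiki page, whose id should match the number shown in the URL of the main page.

```python
import re

# Matches the page ID in URLs such as ".../#!Synapse:syn2345/wiki/999999".
_WIKI_ID = re.compile(r"/wiki/(\d+)")


def wiki_page_id(url: str) -> int:
    """Return the number following .../wiki/ in a Synapse project URL."""
    match = _WIKI_ID.search(url)
    if match is None:
        raise ValueError(f"no /wiki/<id> segment in {url!r}")
    return int(match.group(1))


def root_wiki_id(syn, project_id: str) -> int:
    """Alternative: ask Synapse for the ID of a project's root wiki page."""
    return int(syn.getWiki(project_id).id)
```

The returned integer can be passed straight to copyWiki as destinationSubPageId.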
After copyWiki has been run once, DO NOT RUN IT AGAIN!
Once the wiki has been copied over, all changes to the live project should be synced with challengeutils' mirror-wiki.
Learn more about how to Update the Challenge below.
Monitor the Submissions
We recommend that challenge organizers create a Submission View to track and monitor submissions as they come in. This table is especially useful when participants need help with their submissions.
Learn more about revealing scores and adding leaderboards in Evaluation Queues.
Steps
1. Go to the staging project and click on the Tables tab. Create a new Submission View by clicking on Add New… > Add Submission View.
2. Under "Scope", add the evaluation queue(s) you want to monitor. More than one queue can be added. Click Next. On the following screen, select which information to display; this is known as the schema.
We recommend the following schema for monitoring challenge submissions:
Column Name | Description | Facet values? |
---|---|---|
| Evaluation ID (evaluation ID, but rendered as evaluation name) – recommended for SubmissionViews with multiple queues in scope | Recommended |
| Submission ID | |
| Date and time of the submission (in Epoch, but rendered as | |
| User or team who submitted (user or team ID, but rendered as username or team name) | Recommended |
| Docker image name – recommended for model-to-data challenges | Not recommended |
| Docker SHA digest – recommended for model-to-data challenges | Not recommended |
| Workflow status of the submission (one of [ | Recommended |
| Evaluation status of the submission (one of [None, | Recommended |
| (if any) Validation errors for the predictions file | Not recommended |
| synID to the submission’s logs folder | Not recommended |
| synID to the predictions file (if any) | Not recommended |
(any annotations related to scores) | Submission annotations; the column names depend on which annotations were set in the scoring step of the workflow | |
The highlighted columns would need to be added manually by clicking the + Add Column button at the bottom of the Edit Columns window.
3. Click Save. A table of the submissions and their metadata will now be available for viewing and querying. To change the information displayed, click on the schema icon, followed by Edit Schema.
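Alongside the Submission View, a quick status tally can be pulled straight from an evaluation queue. This is a programmatic alternative, not a replacement for the view. The sketch assumes synapseclient's getSubmissionBundles(evaluation, status=None), which yields (Submission, SubmissionStatus) pairs, and assumes that each SubmissionStatus exposes a status attribute.

```python
from collections import Counter


def status_counts(bundles):
    """Count submissions per status from (Submission, SubmissionStatus) pairs."""
    return Counter(status.status for _submission, status in bundles)


def summarize_queue(syn, evaluation_id):
    """Tally every submission in one evaluation queue by its status."""
    evaluation = syn.getEvaluation(evaluation_id)
    return status_counts(syn.getSubmissionBundles(evaluation))
```

When a participant asks for help, syn.getSubmissionBundles(evaluation, status="INVALID") narrows the list to the submissions that failed validation.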
Update the Challenge
Updating Existing Wikis
1. Go to the staging project and navigate to the page(s) you wish to edit. Click on the pencil icon to Edit Project Wiki.
Make edits as needed, then click Save.
2. Use challengeutils' mirror-wiki to push the changes to the live project:
challengeutils mirror-wiki staging_synid live_synid [--dryrun]
Use --dryrun to optionally preview which pages will be updated before doing an official sync.
Adding a New Wiki Page
1. Go to the staging project, click on Wiki Tools > Add Wiki Subpage. Enter a page title, then click OK. A new page should now be available.
2. On the new page, click on the pencil icon to Edit Project Wiki.
Add the page content, then click Save.
3. Go to the live project and create a new wiki page with the same title as the new page in the staging project. mirror-wiki relies on matching page titles to synchronize pages.
4. Use challengeutils' mirror-wiki to push the changes to the live project:
challengeutils mirror-wiki staging_synid live_synid [--dryrun]
Use --dryrun to optionally preview which pages will be updated before doing an official sync.
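Because mirror-wiki pairs staging and live pages by title, it can help to check for staging subpages that have no same-titled live page before syncing. This hypothetical pre-check assumes synapseclient's getWikiHeaders(owner), which returns headers with title and, for subpages, parentId keys. It does not reproduce mirror-wiki's own matching logic. Root pages (no parentId) are skipped, since their titles may legitimately differ.

```python
def unmatched_titles(staging_headers, live_headers):
    """Titles of staging subpages that have no same-titled subpage on live."""
    def subpage_titles(headers):
        return {h["title"] for h in headers if h.get("parentId")}

    return sorted(subpage_titles(staging_headers) - subpage_titles(live_headers))


def check_before_mirror(syn, staging_id, live_id):
    """Report staging pages that still need a same-titled page on live."""
    missing = unmatched_titles(syn.getWikiHeaders(staging_id), syn.getWikiHeaders(live_id))
    for title in missing:
        print(f"Create a page titled {title!r} on the live project first")
    return missing
```

An empty result means every staging subpage has a live counterpart to sync into.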
Extending the Deadline
Submission deadlines sometimes change. To extend the submission deadline date/time, you can either:
edit the Round End of an existing round; or
add a new round that will immediately start after the current one (note that this approach will reset everyone’s submission limit)
We also recommend notifying the participants of any changes to the timeline by posting an announcement.
Learn more about posting announcements in Utilizing the Discussion Board.
Updating the Workflow
For any changes to the CWL scripts or run_docker.py, edit the scripts as needed, then push the changes. We highly recommend conducting dry runs immediately afterward so that any errors are addressed promptly.
Evaluation Docker Image
If your workflow uses a Docker image in validate.cwl and/or score.cwl and that image has been updated, pull the latest version onto the instance with:
docker pull <image name>:<version>
Close the Challenge
Important TODOs:
Registration: remove/hide all "Register" buttons from the challenge site
Pages to search through: main page, Participation Overview, Submission Tutorial(s)
Past challenge as reference:
You can replace the registration button with an alert well like this:
<div class="alert alert-info">
  <h4><strong>Registration is closed.</strong></h4>
  <p>Thank you to everyone who joined the challenge!</p>
</div>
Data access: disallow users from joining the participant team, thus barring them from accessing the challenge data after the challenge has completed
Go to Participants team page
Click on Team Actions button > Edit Team
Under Access, select “Team is locked, users may not join or request access. New users must be invited by a team manager.”
Evaluation queues: update the participants team’s permissions from Can submit to Can view
Writeups: link writeups to the final submissions
See Collecting Writeups for more information.
Instances: stop and terminate all cloud compute instances for the challenge.
🌟 For additional assistance or guidance, contact the Challenges and Benchmarking team at cnb@sagebase.org.