One of the features of the Synapse.org platform is the ability to host a crowd-sourced challenge. Hosting a challenge is a great way to crowd-source new computational methods for fundamental questions in systems biology and translational medicine.
...
Setting up the infrastructure for submission evaluation
Launching and updating the challenge
Monitoring the submissions
...
Before You Begin
🖥️ Required Compute Power
At Sage Bionetworks, we generally provision an AWS EC2 Linux instance to run a challenge's infrastructure, leveraging the SynapseWorkflowOrchestrator to run CWL workflows. These workflows are responsible for evaluating and scoring submissions.
If Sage is responsible for providing the cloud compute services, we ask that you give us a general estimate of the computing power needed to validate and score the submissions (for example: memory, volume, and whether a GPU is required).
📋 Using Sensitive Data as Challenge Data
Challenge data can be hosted on Synapse. If the data is sensitive (for example, human data), Synapse can apply access restrictions so that legal requirements are met before participants can access it. Contact the Synapse Access and Compliance Team (act@sagebase.org) for support with the necessary data access procedures for sensitive data.
🛑 Restricted Data Access
If data cannot leave the external site or data provider, it will be the data contributor’s responsibility to set up the challenge infrastructure. Contact the Challenges and Benchmarking team (cnb@sagebase.org) for consultations if needed.
...
Note that the steps outlined in this article will assume the orchestrator will be used.
...
Challenge Infrastructure Setup
...
Requirements
Synapse account
Python 3.7+
(for local testing) CWL runner of choice, e.g. cwltool
Access to cloud compute services, e.g. AWS, GCP, etc.
Outcome
This infrastructure setup will continuously monitor the challenge's evaluation queue(s) for new submissions. Once a submission is received, it will undergo evaluation, including validation and scoring. All submissions will be downloadable by the challenge organizers, including the Docker image (for model-to-data challenges) and/or prediction files. Participants may periodically receive email notifications about their submissions (such as status and scores), depending on the infrastructure configuration.
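To make the validation step concrete, here is a minimal sketch of the kind of check a validation step in the workflow might perform on a predictions file. The required column names and error messages are placeholders for illustration, not part of any Synapse API:

```python
import csv

# Hypothetical required columns for a predictions file; a real challenge
# would define its own expected format.
REQUIRED_COLUMNS = {"id", "prediction"}

def validate_predictions(path):
    """Return a list of human-readable validation errors (empty if valid)."""
    errors = []
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
        if missing:
            return [f"missing columns: {sorted(missing)}"]
        # Row numbering starts at 2 to account for the header line.
        for i, row in enumerate(reader, start=2):
            if not row["prediction"]:
                errors.append(f"row {i}: empty prediction")
    return errors
```

In the orchestrator setup, a check like this would run inside the validation step of the workflow, and its errors would be annotated onto the submission.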
Steps
1. Create a GitHub repository for the challenge workflow infrastructure. For the orchestrator to work, this repo must be public.
...
Once you are done making updates, save the file and restart the orchestrator to apply the changes.
Log Files
As it’s running, the orchestrator will upload logs and prediction files to the “Logs” folder. For each user or team that submits to the challenge, two folders will be created:
...
```
Logs
├── submitteridA
│   ├── submission01
│   │   ├── submission01_log.txt
│   │   └── submission01_logs.zip
│   ├── submission02
│   │   ├── submission02_log.txt
│   │   └── submission02_logs.zip
│   ...
├── submitteridA_LOCKED
│   ├── submission01
│   │   └── predictions.csv
│   ├── submission02
│   │   └── predictions.csv
│   ...
...
```
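Given a local copy of this layout (for example, after downloading the "Logs" folder with the Synapse client), a small helper like the following sketch can locate a submitter's most recent log file; the folder names mirror the tree above:

```python
from pathlib import Path

def latest_log(logs_root, submitter_id):
    """Return the newest *_log.txt under <logs_root>/<submitter_id>/, or None.

    Relies on the submission folders sorting lexicographically
    (submission01, submission02, ...), as in the layout above.
    """
    submitter_dir = Path(logs_root) / submitter_id
    logs = sorted(submitter_dir.glob("submission*/*_log.txt"))
    return logs[-1] if logs else None
```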
...
Launch the Challenge
Requirements
Synapse account
Python 3.7+
Important TODOs:
Before proceeding with the launch, we recommend contacting Sage Governance to add a clickwrap for challenge registration. With a clickwrap in place, interested participants can register only after agreeing to the terms and conditions of challenge data usage.
If you are a Sage employee: submit a Jira ticket to the Governance board with the synID of the live project, as well as the team ID of the participants team.
Share all needed evaluation queues with the participants team with "Can submit" permissions. Once the challenge is over, we recommend updating the permissions to "Can view" to prevent late submissions. We also recommend sharing the evaluation queues with the general public so that the leaderboards are openly accessible.
After the challenge is launched, create a folder called “Data” and update its Sharing Settings. Share the “Data” folder with the participants team only. Do not make the folder public or accessible to all registered Synapse users. The sharing settings of the “Data” folder should look something like this:
Upload any challenge data that will be provided to the participants to the "Data" folder. DO NOT UPLOAD DATA until you have updated the folder's sharing settings.
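As a sketch of what the queue-sharing step looks like programmatically, assuming a logged-in synapseclient session and that `setPermissions` applies to evaluation queues in your client version (verify against the synapseclient documentation); the IDs here would be your own:

```python
def queue_access(open_for_submissions):
    """Access types for the participants team: "Can submit" while the
    challenge is open, "Can view" once it closes."""
    return ["READ", "SUBMIT"] if open_for_submissions else ["READ"]

def share_queue(syn, evaluation_id, principal_id, open_for_submissions=True):
    """Share one evaluation queue with a team (or the public).

    'syn' is a logged-in synapseclient.Synapse instance. Calling
    setPermissions on an evaluation is an assumption to verify for your
    synapseclient version.
    """
    syn.setPermissions(
        evaluation_id,
        principalId=principal_id,
        accessType=queue_access(open_for_submissions),
    )
```

Once the challenge closes, re-running `share_queue(..., open_for_submissions=False)` would downgrade the team to view-only, matching the recommendation above.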
...
When using copyWiki, it is important to specify the destinationSubPageId parameter. This ID can be found in the URL of the live project, where it is the number following .../wiki/.
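For illustration, the subpage ID can be pulled out of the live project URL with a small helper; the copy itself would then go through synapseutils.copyWiki (signature per the synapseutils documentation; treat this as a sketch, and run the copy only once per challenge):

```python
import re

def wiki_subpage_id(project_url):
    """Extract the destinationSubPageId from a live project URL, i.e. the
    number after '/wiki/' (e.g. '.../syn123/wiki/123456' -> '123456')."""
    match = re.search(r"/wiki/(\d+)", project_url)
    return match.group(1) if match else None

def copy_staging_wiki(syn, staging_id, live_id, live_url):
    """One-time copy of the staging wiki into the live project."""
    # Lazy import so the helper above is usable without synapseutils.
    import synapseutils
    return synapseutils.copyWiki(
        syn,
        staging_id,
        destinationId=live_id,
        destinationSubPageId=wiki_subpage_id(live_url),
    )
```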
Once copyWiki has been used, DO NOT RUN IT AGAIN! After the wiki has been copied over, all changes to the live project should be synced with challengeutils' mirror-wiki.
Learn more about how to Update the Challenge below.
...
Monitor the Submissions
We recommend that challenge organizers create a Submission View to easily track and monitor submissions as they come in. This table will be especially useful when participants need help with their submissions.
Learn more about revealing scores and adding leaderboards in Evaluation Queues.
Steps
1. Go to the staging project and click on the Tables tab. Create a new Submission View by clicking on Add New… > Add Submission View.
...
Column Name | Description | Facet values?
---|---|---
| Evaluation ID (evaluation ID, but rendered as evaluation name) – recommended for Submission Views with multiple queues in scope | Recommended
| Submission ID |
| Date and time of the submission (stored as Epoch time, but rendered as a date) |
| User or team who submitted (user or team ID, but rendered as username or team name) | Recommended
| Docker image name – recommended for model-to-data challenges | Not recommended
| Docker SHA digest – recommended for model-to-data challenges | Not recommended
| Workflow status of the submission (one of […]) | Recommended
| Evaluation status of the submission (one of [None, …]) | Recommended
| (if any) Validation errors for the predictions file | Not recommended
| Synapse ID of the submission's logs folder | Not recommended
| Synapse ID of the predictions file (if any) | Not recommended
(any annotations related to scores) | Submission annotations – names depend on the annotations used in the scoring step of the workflow |
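Since the submission date is stored as Epoch time (in milliseconds, as Synapse timestamps are), a helper like this shows how the raw value maps to a readable date; the exact display format used by the Submission View is an assumption here:

```python
from datetime import datetime, timezone

def render_created_on(epoch_ms):
    """Render an Epoch-millisecond timestamp as a readable UTC datetime."""
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc).strftime(
        "%Y-%m-%d %H:%M:%S UTC"
    )
```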
...
3. Click Save. A table of the submissions and their metadata will now be available for viewing and querying. The displayed information can be changed by clicking on the schema icon, followed by Edit Schema:
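The view can also be queried programmatically. In this sketch, the `status` column name is an assumption (match it to whatever your view's schema actually uses), and `fetch_submissions` assumes a logged-in synapseclient session with pandas installed:

```python
def build_submission_query(view_id, status=None):
    """Build a Synapse table query against a Submission View, optionally
    filtered on the (assumed) workflow status column."""
    query = f"SELECT * FROM {view_id}"
    if status:
        query += f" WHERE status = '{status}'"
    return query

def fetch_submissions(syn, view_id, status=None):
    """Run the query; tableQuery/asDataFrame are standard synapseclient calls."""
    return syn.tableQuery(build_submission_query(view_id, status)).asDataFrame()
```

For example, filtering on invalid submissions is a quick way to find participants who may need help.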
...
Update the Challenge
Challenge Site and Wikis
1. Make whatever changes are needed to the staging project.
...
Optionally, use --dryrun to preview which pages will be updated before doing an official sync.
Extending the Deadline
It is not unheard of for the submission deadline to change.
...
edit the Round End of an existing round; or
add a new round that starts immediately after the current one
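If you opt to edit the Round End, the new date is simple to compute; this sketch assumes ISO-formatted dates like those the Synapse UI accepts, with the actual edit done in the evaluation queue's settings:

```python
from datetime import datetime, timedelta

def extend_round_end(current_end_iso, extra_days):
    """Return a new Round End, pushed back by 'extra_days' days."""
    end = datetime.fromisoformat(current_end_iso)
    return (end + timedelta(days=extra_days)).isoformat()
```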
Workflow Steps
For any changes to the CWL scripts or run_docker.py, simply edit the scripts, then push the changes. We highly recommend conducting dry runs whenever the workflow changes, so that errors are addressed in a timely manner.
Evaluation Docker Image
If your workflow uses a Docker image in validate.cwl and/or score.cwl, and updates were made to the image, pull the latest changes on the instance with:
...