
One of the features of the Synapse.org platform is the ability to host a crowd-sourced challenge. Hosting challenges is a great way to crowd-source new computational methods for fundamental questions in systems biology and translational medicine.

Learn more about challenges and see examples of past and current projects by visiting Challenges and Benchmarking.

Running a challenge on Synapse requires creating a challenge space for participants to learn about the challenge, join the challenge community, submit entries, track progress, and view results. This article is aimed at challenge organizers and focuses on:

  • Setting up the infrastructure

  • Launching and updating the challenge

  • Monitoring the submissions

...

Before You Begin

🖥️ Required Compute Power

At Sage Bionetworks, we generally provision an AWS EC2 Linux instance to run the infrastructure of a challenge, leveraging the SynapseWorkflowOrchestrator to run CWL workflows. These workflows are responsible for evaluating and scoring submissions.

  • If Sage is responsible for providing the cloud compute services, we ask that you give us a general estimate of the computing power needed to validate and score the submissions (for example: memory, volume, whether a GPU is required).

📋 Using Sensitive Data as Challenge Data

Challenge data can be hosted on Synapse. If the data is sensitive (for example, human data), Synapse can apply access restrictions so that legal requirements are met before participants can access them. Contact the Synapse Access and Compliance Team (act@sagebase.org) for support with the necessary data access procedures for sensitive data.

🛑 Restricted Data Access

If data is sensitive and cannot be hosted on Synapse (for example, it cannot leave the external site or data provider), provide a remote server with the following:

...

In that case, it will be the data contributor’s responsibility to set up the challenge infrastructure. Contact the Challenges and Benchmarking team (cnb@sagebase.org) for consultations if needed.

To set up the infrastructure, you may follow Sage’s approach of using the SynapseWorkflowOrchestrator. The following will be required to use the orchestrator:

Note that the steps outlined in this article assume the orchestrator will be used.

...

Challenge Infrastructure Setup

...

Outcome

Once the infrastructure is set up, it will continuously monitor the challenge’s evaluation queue(s) for new submissions. Once a submission is received, it will undergo evaluation, including validation and scoring. All submissions will be downloadable by the challenge organizers, including the Docker image (for a model-to-data challenge) and/or prediction files. Participants may periodically receive email notifications about their submissions (such as status and scores), depending on the infrastructure configuration.

Steps

1. Create a GitHub repository for the challenge workflow infrastructure. For the orchestrator to work, this repo must be public.

Two templates are available in Sage-Bionetworks-Challenges that you may use as a starting point. Their READMEs outline what will need to be updated within the scripts (under Configurations), but we will return to this later in Step 12.

Workflow Template | Submission Type
data-to-model-challenge-workflow | Flat files, like CSV files
model-to-data-challenge-workflow | Docker images

2. Create the challenge space on Synapse with challengeutils' create-challenge:

...

  • Staging - Organizers will use this project during challenge planning and development to share files and draft the wiki content. The create-challenge command will initialize the wiki with the DREAM Challenge Wiki Template.

  • Live - Organizers will use this project as the pre-registration page during challenge development. When the challenge is ready for launch, the project will be replaced with the contents from staging.

You may think of these two projects as development (staging) and production (live): all edits must be done in the staging site, not the live site. Maintaining both projects enables wiki content to be edited and previewed in the staging project before it is published to the live project. Changes are synced over to the live site with challengeutils' mirror-wiki (see Update the Challenge below for more info).

Info

At first, the live site will be just a one-pager providing a general overview of the challenge. There will also be a Pre-register button that Synapse users can click if they are interested in staying notified about the challenge.


...

We encourage you to use the staging project to make all edits and preview them before officially pushing the updates over to the live project.


See Update the Challenge below to learn more about syncing changes from staging to live.

create-challenge will also create four Synapse teams for the challenge:

  • Pre-registrants - This team is used when the challenge is under development. It allows interested Synapse users to join a mailing list to receive notification of challenge launch news.

  • Participants - Once the challenge is launched, Synapse users will join this team in order to download the challenge data and make submissions.

  • Organizers - Synapse users added to this team will have the ability to share files and edit wikis on the staging project. Add users as needed.

  • Admin - Synapse users added to this team will have administrator access to both the live and staging projects. Organizers do not need to be administrators. Ideally, all admins should have a good understanding of Synapse.


Add Synapse users to the Organizers and Admin teams as needed.

...


3. On the live project, go to the Challenge tab and create as many evaluation queues as needed (for example, one per question/task) by clicking on Challenge Tools > Create Evaluation Queue. By default, create-challenge will create one evaluation queue for writeups. More information on writeups and how to collect them can be found here.


The 7 digits in parentheses following each evaluation queue name are its evaluation ID. You will need these ID(s) later in Step 11.

...

4. While still on the live project, go to the Files tab and create a new folder called “Logs” by clicking on the add-folder icon:

...

...


This folder will contain the participants' submission logs and prediction files (if any). Make note of its synID for use later in Step 11.

5. On the staging project, go to the Files tab and click on the upload icon to Upload or Link to a File:

...


6. In the pop-up window, switch tabs to Link to URL. For “URL”, enter the web address to the zipped download of the workflow infrastructure repo. You may get this address by going to the repo and clicking on Code > right-clicking Download ZIP > Copy Link Address:

...

Click Save.


This file will be what links the evaluation queue to the orchestrator. Make note of its synID for use later in Step 11.

7. Add an annotation to the file called ROOT_TEMPLATE. This annotation will be used by the orchestrator to determine which file in the repo is the workflow script. Click on the annotations icon, followed by Edit:

...

8. For “Value”, enter the filepath to the workflow script as if you had downloaded the repo as a ZIP. For example, model-to-data-challenge-workflow would be downloaded and unzipped as model-to-data-challenge-workflow-main,

...

and the path to the workflow script is workflow.cwl:

Info

The ROOT_TEMPLATE annotation is what the orchestrator uses to determine which file among the repo is the workflow script.

...

In this example, “Value” will be model-to-data-challenge-workflow-main/workflow.cwl. For the most part, “Value” should look something like this:

{name of repo}-{branch}/workflow.cwl
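Mechanically, the annotation value is just the repo name, branch, and script path joined together. A minimal sketch of that rule (the helper itself is illustrative, not part of any Sage tooling):

```python
def root_template_value(repo: str, branch: str, workflow_path: str = "workflow.cwl") -> str:
    """Build the ROOT_TEMPLATE annotation value: the path to the workflow
    script inside the unzipped GitHub archive, which unpacks into a
    '{repo}-{branch}/' directory. (Illustrative helper only.)"""
    return f"{repo}-{branch}/{workflow_path}"

# The example from this article:
print(root_template_value("model-to-data-challenge-workflow", "main"))
# model-to-data-challenge-workflow-main/workflow.cwl
```

If your workflow script lives in a subdirectory of the repo, pass that relative path as workflow_path.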

9. Create a cloud compute environment with the required memory and volume specifications, then SSH into the instance.

10. On the instance, clone the SynapseWorkflowOrchestrator repo if it’s not already available on the machine. Change directories to SynapseWorkflowOrchestrator/ and create a copy of the .envTemplate file as .env (or simply rename it to .env):

Code Block
cd SynapseWorkflowOrchestrator/
cp .envTemplate .env

11. Open .env and enter values for the following config variables:

Property | Description | Example
SYNAPSE_USERNAME | Synapse credentials under which the orchestrator will run. The provided user must have access to the evaluation queue(s) being serviced. | dream_user
SYNAPSE_PASSWORD | Password for SYNAPSE_USERNAME. This can be found under My Dashboard > Settings. | "abcdefghi1234=="
WORKFLOW_OUTPUT_ROOT_ENTITY_ID | Synapse ID for the “Logs” folder. Use the synID from Step 4. | syn123
EVALUATION_TEMPLATES | JSON map of evaluation IDs to the workflow repo archive, where the key is the evaluation ID and the value is the link address to the archive. Use the evaluation IDs from Step 3 as the key(s) and the synID from Step 5 as the value(s). | {"9810678": "syn456", "9810679": "syn456"}
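A malformed EVALUATION_TEMPLATES value is a common source of orchestrator startup failures. A small sketch of a sanity check you could run on the JSON before writing it to .env (the helper is hypothetical, not part of the orchestrator):

```python
import json
import re

def check_evaluation_templates(raw: str) -> dict:
    """Validate an EVALUATION_TEMPLATES value: keys should be numeric
    evaluation IDs (Step 3) and values should be synIDs of the linked
    workflow archive file (Step 5). Illustrative helper only."""
    mapping = json.loads(raw)  # raises ValueError if the JSON is malformed
    for eval_id, syn_id in mapping.items():
        if not eval_id.isdigit():
            raise ValueError(f"evaluation ID must be numeric: {eval_id!r}")
        if not re.fullmatch(r"syn\d+", syn_id):
            raise ValueError(f"value must be a synID: {syn_id!r}")
    return mapping

print(check_evaluation_templates('{"9810678": "syn456", "9810679": "syn456"}'))
```

Note that multiple evaluation queues can point at the same workflow archive, as in the example above.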

...

Refer to the "Running the Orchestrator with Docker containers" README section for additional configuration options.

12. Clone the workflow infrastructure repo onto your local machine. Using a text editor or IDE, make the following updates to the scripts:

...

If using the data-to-model template:

Script | TODO | Required?
workflow.cwl | Update synapseid to the synID of the challenge's goldstandard/groundtruth file | yes
workflow.cwl | Set errors_only to false if an email notification about a valid submission should also be sent | no
workflow.cwl | Add metrics and scores to private_annotations if they are to be withheld from the participants | no
validate.cwl | Update the base image if the validation code is not Python | no
validate.cwl | Remove the sample validation code and replace it with validation code for the challenge | yes
score.cwl | Update the base image if the scoring code is not Python | no
score.cwl | Remove the sample scoring code and replace it with scoring code for the challenge | yes

...

If using the model-to-data template:

Script | TODO | Required?
workflow.cwl | Provide the admin user ID or admin team ID for principalid (2 steps: set_submitter_folder_permissions, set_admin_folder_permissions) | yes
workflow.cwl | Update synapseid to the synID of the challenge's goldstandard | yes
workflow.cwl | Set errors_only to false if an email notification about a valid submission should also be sent (2 steps: email_docker_validation, email_validation) | no
workflow.cwl | Provide the absolute path to the data directory, denoted as input_dir, to be mounted during the container runs | yes
workflow.cwl | Set store to false if log files should be withheld from the participants | no
workflow.cwl | Add metrics and scores to private_annotations if they are to be withheld from the participants | no
validate.cwl | Update the base image if the validation code is not Python | no
validate.cwl | Remove the sample validation code and replace it with validation code for the challenge | yes
score.cwl | Update the base image if the scoring code is not Python | no
score.cwl | Remove the sample scoring code and replace it with scoring code for the challenge | yes

Push the changes up to GitHub when done.

13. On the instance, change directories to SynapseWorkflowOrchestrator/ and kick-start the orchestrator with:

Code Block
docker-compose up -d

where -d will run the orchestrator in the background. This will allow you to exit the instance without terminating the orchestrator.

Note

If validate.cwl/score.cwl uses a Docker image instead of inline code: you must first pull that image onto the instance before starting the orchestrator. Otherwise, the orchestrator will fail, stating that the image cannot be found.

If successful, the orchestrator will continuously monitor the evaluation queues specified by EVALUATION_TEMPLATES for submissions with the status RECEIVED. Once it encounters a RECEIVED submission, it will run the workflow specified by ROOT_TEMPLATE and update the submission status from RECEIVED to EVALUATION_IN_PROGRESS. If an error is encountered during any of the workflow steps, the orchestrator will update the submission status to INVALID and the workflow will stop. If, instead, the workflow runs to completion, the orchestrator will update the submission status to ACCEPTED. Depending on how the workflow is set up (configured in Step 12), participants may periodically be notified by email of their submission's progress.
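The status transitions described above can be summarized as a small state table. This is a sketch of the documented behavior, not orchestrator code:

```python
# Orchestrator-managed submission statuses, as described above (sketch only).
LIFECYCLE = {
    "RECEIVED": "EVALUATION_IN_PROGRESS",   # picked up by the orchestrator
    "EVALUATION_IN_PROGRESS": ("ACCEPTED",  # workflow ran to completion
                               "INVALID"),  # a workflow step errored
}

def next_status(current: str, workflow_succeeded: bool = True) -> str:
    """Return the status the orchestrator would assign next."""
    nxt = LIFECYCLE[current]
    if isinstance(nxt, tuple):
        return nxt[0] if workflow_succeeded else nxt[1]
    return nxt

print(next_status("RECEIVED"))                                          # EVALUATION_IN_PROGRESS
print(next_status("EVALUATION_IN_PROGRESS", workflow_succeeded=False))  # INVALID
```

ACCEPTED and INVALID are terminal: a new submission is needed to trigger another evaluation.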


14. To make changes to the .env file (such as updating the number of concurrent submissions), stop the orchestrator with:

Code Block
docker-compose down

Once you are done making updates, save the file and restart the orchestrator to apply the changes.

Log Files

As it’s running, the orchestrator will upload logs and prediction files to the “Logs” folder. For each user or team that submits to the challenge, two folders will be created:

  • <submitterid>/

  • <submitterid>_LOCKED/

where Docker and TOIL logs are uploaded to <submitterid>/, and prediction files are uploaded to <submitterid>_LOCKED/. Note that the LOCKED folders will not be accessible to the participants, in order to prevent data leakage.
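The routing rule above is simple: predictions go to the LOCKED folder, everything else to the open one. A sketch of that rule (the helper and its names are hypothetical, not orchestrator code):

```python
def upload_folder(submitter_id: str, artifact: str) -> str:
    """Return the 'Logs' subfolder an artifact is routed to: Docker/TOIL
    logs go to the submitter's open folder, prediction files to the
    participant-inaccessible LOCKED folder. (Hypothetical helper.)"""
    if artifact == "predictions":
        return f"{submitter_id}_LOCKED"  # not shared with participants
    return submitter_id                  # Docker / TOIL logs

print(upload_folder("submitteridA", "predictions"))  # submitteridA_LOCKED
print(upload_folder("submitteridA", "logs"))         # submitteridA
```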

The directory structure of “Logs” will look something like this:

Code Block
Logs
 ├── submitteridA
 │  ├── submission01
 │  │  ├── submission01_log.txt
 │  │  └── submission01_logs.zip
 │  ├── submission02
 │  │  ├── submission02_log.txt
 │  │  └── submission02_logs.zip
 │ ...
 │
 ├── submitteridA_LOCKED
 │  ├── submission01
 │  │  └── predictions.csv
 │  ├── submission02
 │  │  └── predictions.csv
 │ ...
 │
...

...

Important TODOs:

  1. (warning) Before proceeding with the launch, we recommend contacting Sage Governance to add a clickwrap for challenge registration. With a clickwrap in place, interested participants can only register if they agree to the terms and conditions of the challenge data usage.

    • If you are a Sage employee: submit a Jira ticket to the Governance board with the synID of the live project, as well as the team ID of the participants team.

  2. Share all needed evaluation queues with the participants team with Can submit permissions. Once the challenge is over, we recommend updating the permissions to Can view to prevent late submissions.

  3. We also recommend sharing the evaluation queues with the general public so that the leaderboards are openly accessible.

  4. After the challenge is launched, create a folder called “Data” and update its sharing settings. (warning) Share the “Data” folder with the participants team only! Do not make the folder public or accessible to all registered Synapse users. The sharing settings of the “Data” folder should look something like this:


  5. Upload any challenge data that is to be provided to the participants to the “Data” folder. (warning) DO NOT UPLOAD DATA until you have updated its sharing settings.

To launch the challenge, that is, to copy the wiki pages of the staging project over to the live project, use synapseutils' copyWiki() in a Python script.

For example:

Code Block
import synapseclient
import synapseutils

syn = synapseclient.login()

synapseutils.copyWiki(
    syn, "syn1234",              # Synapse ID of staging project
    destinationId="syn2345",     # Synapse ID of live project
    destinationSubPageId=999999  # ID following ../wiki/ of live project URL
)

When using copyWiki, it is important to specify the destinationSubPageId parameter. This ID can be found in the URL of the live project, where it is the integer following .../wiki/.
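Since destinationSubPageId is easy to mistype, here is a small sketch of extracting it from a live project URL. The URL shape below is an example; verify it against your own project's address:

```python
import re

def sub_page_id(live_url: str) -> int:
    """Pull the integer after '/wiki/' out of a Synapse project URL
    (illustrative; check the URL format of your own live project)."""
    match = re.search(r"/wiki/(\d+)", live_url)
    if not match:
        raise ValueError(f"no /wiki/<id> segment in: {live_url}")
    return int(match.group(1))

print(sub_page_id("https://www.synapse.org/#!Synapse:syn2345/wiki/999999"))  # 999999
```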

(warning) Once copyWiki has been used, DO NOT RUN IT AGAIN! (warning)


Once the wiki has been copied over, all changes to the live site should be synced over with challengeutils' mirror-wiki. More on updating the wikis under the Update the Challenge section below.


Monitor the Submissions

As challenge organizers, we recommend creating a Submission View to easily track and monitor submissions as they come in. This table will especially be useful when participants need help with their submissions.


Learn more about revealing scores and adding leaderboards in Evaluation Queues.

Steps

1. Go to the staging project and click on the Tables tab. Create a new Submission View by clicking on Add New… > Add Submission View

2. Under "Scope", add the evaluation queue(s) you are interested in monitoring. More than one queue can be added. Click Next. On the following screen, select which information to display; this is known as the schema.

We recommend the following schema for monitoring challenge submissions:

Column Name | Description | Facet values?
evaluationid | Evaluation ID (evaluation ID, but rendered as evaluation name) – recommended for SubmissionViews with multiple queues in scope | (tick) Recommended
id | Submission ID |
createdOn | Date and time of the submission (in Epoch, but rendered as MM/dd/yyyy, hh:mm:ss) |
submitterid | User or team who submitted (user or team ID, but rendered as username or team name) | (tick) Recommended
dockerrepositoryname | Docker image name – recommended for model-to-data challenges | (error) Not recommended
dockerdigest | Docker SHA digest – recommended for model-to-data challenges | (error) Not recommended
status | Workflow status of the submission (one of [RECEIVED, EVALUATION_IN_PROGRESS, ACCEPTED, INVALID]) | (tick) Recommended
submission_status | Evaluation status of the submission (one of [None, VALIDATED, SCORED, INVALID]) | (tick) Recommended
submission_errors | Validation errors for the predictions file (if any) | (error) Not recommended
orgSagebionetworksSynapseWorkflowOrchestratorSubmissionFolder | Synapse ID of the submission’s logs folder | (error) Not recommended
prediction_fileid | Synapse ID of the predictions file (if any) | (error) Not recommended
(any annotations related to scores) | Submission annotations – the names used depend on the annotations applied in the scoring step of the workflow |

Info

The highlighted columns would need to be added manually by clicking the + Add Column button at the bottom of the Edit Columns window.

3. Click Save. A table of the submissions and their metadata will now be available for viewing and querying. Changes to the information displayed can be edited by clicking on the schema icon, followed by Edit Schema:

...

Update the Challenge

Challenge Site and Wikis

1. Make whatever changes needed to the staging project.


2. Use challengeutils' mirror-wiki to push the changes to the live project:

Code Block
challengeutils mirror-wiki staging_synid live_synid [--dryrun]

Use --dryrun to optionally preview which pages will be updated prior to doing an official sync.

Extending the Deadline

It is not uncommon for the submission deadline to change.

To change the submission deadline date, you can either:

  • edit the Round End of an existing round; or

  • add a new round that will immediately start after the current one

Workflow Steps

For any changes to the CWL scripts or run_docker.py, simply make the edits, then push the changes. We highly recommend conducting dry runs whenever there is a change to the workflow, so that errors are addressed in a timely manner.

Evaluation Docker Image

If your workflow uses a Docker image in validate.cwl and/or score.cwl, and updates were made to that image, pull the latest changes onto the instance with:

Code Block
docker pull <image name>:<version>

...

🌟 For additional assistance or guidance, contact the Challenges and Benchmarking team at cnb@sagebase.org.