
One of the features of the Synapse.org platform is the ability to host a crowd-sourced challenge. Hosting a challenge is a great way to crowd-source new computational methods for fundamental questions in systems biology and translational medicine.

...

  • Setting up the infrastructure for submission evaluation

  • Launching the challenge

  • Updating the challenge

  • Closing the challenge

  • Monitoring the submissions

...

Before You Begin

🖥️ Required Compute Power

At Sage Bionetworks, we generally provision an AWS EC2 Linux instance to run a challenge's infrastructure, leveraging the SynapseWorkflowOrchestrator to run CWL workflows. These workflows are responsible for evaluating and scoring submissions.

  • To budget the cloud compute services: consider the computing power needed to validate and score the submissions (for example: memory, storage, GPU, network speed). If contracting with Sage Bionetworks, we can help estimate these costs as part of the Challenge budget.

📋 Using Sensitive Data as Challenge Data

Challenge data can be hosted on Synapse. If the data is sensitive (for example, human data), Synapse can apply access restrictions so that legal requirements are met before participants can access them. Contact the Synapse Access and Compliance Team (act@sagebase.org) for support with the necessary data access procedures for sensitive data.

🛑 Restricted Data Access

If data cannot leave the external site or data provider, it will be the data contributor’s responsibility to set up the challenge infrastructure. Contact the Challenges and Benchmarking team (cnb@sagebase.org) for consultations if needed.

...

Note that the steps outlined in this article assume the orchestrator is used.

...

Challenge Infrastructure Setup

...

Requirements

  • One Sage account

  • Python 3.8+

  • synapseclient

  • challengeutils

  • (for local testing) CWL runner of choice, e.g. cwltool

  • Access to cloud compute services, e.g. AWS, GCP, etc.

Outcome

This infrastructure setup will continuously monitor the challenge’s evaluation queue(s) for new submissions. Once a submission is received, it will undergo evaluation including validation and scoring. All submissions will be downloadable to the challenge organizers, including the Docker image (if model-to-data challenge) and/or prediction files. Participants may periodically receive email notifications about their submissions (such as status, scores), depending on the infrastructure configurations.

Steps

1. Create a GitHub repository for the challenge workflow infrastructure. For the orchestrator to work, this repo must be public.

...

Property

Description

Example

SYNAPSE_USERNAME

Synapse credentials under which the orchestrator will run.  

The provided user must have access to the evaluation queue(s) being serviced.

dream_user

SYNAPSE_PASSWORD

Password for SYNAPSE_USERNAME.

This can be found under My Dashboard > Settings.

"abcdefghi1234=="

WORKFLOW_OUTPUT_ROOT_ENTITY_ID

synID for "Logs" folder.

Use the synID from Step 4.

syn123

EVALUATION_TEMPLATES

JSON map of evaluation IDs to the workflow repo archive, where the key is the evaluation ID and the value is the link address to the archive.

Use the evaluation IDs from Step 3 as the key(s) and the synIDs from Step 5 as the value(s).

{
  "9810678": "syn456",
  "9810679": "syn456"
}
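
As a minimal sketch, the EVALUATION_TEMPLATES value can be generated rather than typed by hand, which avoids quoting mistakes in the JSON. The helper name `build_evaluation_templates` is hypothetical; the evaluation IDs and synID are the placeholder values from the example above.

```python
import json


def build_evaluation_templates(evaluation_ids, workflow_synid):
    """Map each evaluation ID (as a string key) to the workflow archive synID."""
    return json.dumps({str(eid): workflow_synid for eid in evaluation_ids})


# Placeholder IDs taken from the example above; replace with real values.
templates = build_evaluation_templates([9810678, 9810679], "syn456")
print(f"EVALUATION_TEMPLATES={templates}")
```

JSON object keys must be strings, so the evaluation IDs are converted with `str()` before serializing.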

...

If using data-to-model template:

Script

TODO

Required?

workflow.cwl

Update synapseid to the synID of the Challenge's gold standard / ground truth file

yes

Set errors_only to false if an email notification about a valid submission should also be sent

no

Add metrics and scores to private_annotations if they are to be withheld from the participants

no

validate.cwl

Update the base image if the validation code is not Python

no

Remove the sample validation code and replace with validation code for the challenge

yes

score.cwl

Update the base image if the scoring code is not Python

no

Remove the sample scoring code and replace with scoring code for the challenge

yes

If using model-to-data template:

Script

TODO

Required?

workflow.cwl

Provide the admin user ID or admin team ID for principalid 

(2 steps: set_submitter_folder_permissions, set_admin_folder_permissions)

yes

Update synapseid to the synID of the Challenge's gold standard / ground truth data

yes

Set errors_only to false if an email notification about a valid submission should also be sent

(2 steps: email_docker_validation, email_validation)

no

Provide the absolute path to the data directory, denoted as input_dir, to be mounted during the container runs.

yes

Set store to false if log files should be withheld from the Participants

no

Add metrics and scores to private_annotations if they are to be withheld from the Participants

no

validate.cwl

Update the base image if the validation code is not Python

no

Remove the sample validation code and replace with validation code for the Challenge

yes

score.cwl

Update the base image if the scoring code is not Python

no

Remove the sample scoring code and replace with scoring code for the Challenge

yes
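
The sample validation and scoring code has to be replaced with challenge-specific logic. The following is only a sketch of what such logic might look like in Python, not the code shipped in the templates. The column names `id` and `prediction`, the `mae` metric, and both function names are assumptions made for illustration; the `submission_status` and `submission_errors` keys mirror the annotation names listed in the Submission View table later in this article.

```python
import csv
import io

REQUIRED_COLUMNS = {"id", "prediction"}  # hypothetical column names


def validate_predictions(csv_text):
    """Return annotations describing whether a predictions file is usable."""
    errors = []
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        errors.append(f"Missing column(s): {', '.join(sorted(missing))}")
    else:
        # Line 1 is the header, so data rows start at line 2.
        for line_no, row in enumerate(reader, start=2):
            try:
                float(row["prediction"])
            except (TypeError, ValueError):
                errors.append(f"Line {line_no}: prediction is not numeric")
    return {
        "submission_status": "INVALID" if errors else "VALIDATED",
        "submission_errors": "\n".join(errors),
    }


def score_predictions(csv_text, truth):
    """Mean absolute error against a {id: value} ground truth (illustrative metric)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    diffs = [abs(float(row["prediction"]) - truth[row["id"]]) for row in reader]
    return {"submission_status": "SCORED", "mae": sum(diffs) / len(diffs)}


good = "id,prediction\na,1.0\nb,3.0\n"
print(validate_predictions(good))
print(score_predictions(good, {"a": 1.5, "b": 2.0}))
```

Keeping validation separate from scoring means an invalid file can be reported back to the participant before any scoring is attempted.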

...

Once you are done making updates, save the file and restart the orchestrator to apply the changes.

Log Files

As it’s running, the orchestrator will upload logs and prediction files to the “Logs” folder. For each user or team that submits to the challenge, two folders will be created:

...

Logs
 ├── submitteridA
 │  ├── submission01
 │  │  ├── submission01_log.txt
 │  │  └── submission01_logs.zip
 │  ├── submission02
 │  │  ├── submission02_log.txt
 │  │  └── submission02_logs.zip
 │ ...
 │
 ├── submitteridA_LOCKED
 │  ├── submission01
 │  │  └── predictions.csv
 │  ├── submission02
 │  │  └── predictions.csv
 │ ...
 │
...
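
When looking for a particular submission's files, it can help to compute where they should appear. The sketch below only mirrors the layout shown in the tree above; the function name `expected_log_entries` is hypothetical, and the file names are taken from the tree rather than from the orchestrator's source.

```python
from pathlib import PurePosixPath


def expected_log_entries(submitter_id, submission_id, root="Logs"):
    """Build the paths shown in the tree above for one submission."""
    base = PurePosixPath(root)
    logs_dir = base / submitter_id / submission_id
    locked_dir = base / f"{submitter_id}_LOCKED" / submission_id
    return {
        "logs": [
            str(logs_dir / f"{submission_id}_log.txt"),
            str(logs_dir / f"{submission_id}_logs.zip"),
        ],
        "locked": [str(locked_dir / "predictions.csv")],
    }


for kind, paths in expected_log_entries("submitteridA", "submission01").items():
    print(kind, paths)
```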

...

Launch the Challenge

Requirements

Important TODOs:

...

The following steps (1-5) should be done in the live project.

  1. Before proceeding with the launch, we recommend contacting Sage Governance to add a clickwrap for challenge registration. With a clickwrap in place, interested participants can only register if they agree to the terms and conditions of the challenge data usage.

    • If you are a Sage employee: submit a Jira ticket to the Governance board with the synID of the live project, as well as the team ID of the participants team.

  2. Share all needed evaluation queues with the participants team with Can submit permissions. Once the challenge is over, we recommend updating the permissions to Can view to prevent late submissions.

  3. We also recommend sharing the evaluation queues with the general public so that the leaderboards are openly accessible.

  4. After the challenge is launched, create a folder called “Data” and update its Sharing Settings. Share the “Data” folder with the participants team only. Do not make the folder public or accessible to all registered Synapse users. The sharing settings of the “Data” folder should look something like this:

  5. Upload any challenge data that is to be provided to the participants to the “Data” Folder. DO NOT UPLOAD DATA until you have updated its sharing settings.
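
The queue sharing in steps 2 and 3 can also be scripted. This is a sketch under assumptions: `share_queue` and `QUEUE_ACCESS` are hypothetical names, and the access types mapped to "Can view" and "Can submit" are assumed to be `READ` and `READ` plus `SUBMIT`, so verify them against the sharing settings shown in the web UI. `syn` is expected to be a logged-in `synapseclient.Synapse` object; `getEvaluation` and `setPermissions` are existing client methods.

```python
# Access types assumed for each evaluation queue sharing level.
QUEUE_ACCESS = {
    "Can view": ["READ"],
    "Can submit": ["READ", "SUBMIT"],
}


def share_queue(syn, evaluation_id, principal_id, level):
    """Grant `level` on an evaluation queue to a user or team.

    `syn` is a logged-in synapseclient.Synapse object and
    `principal_id` is the ID of the user or team to share with.
    """
    evaluation = syn.getEvaluation(evaluation_id)
    return syn.setPermissions(
        evaluation, principalId=principal_id, accessType=QUEUE_ACCESS[level]
    )
```

At launch this would be called with "Can submit" for the participants team, and again with "Can view" once the challenge is over.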

...

import synapseclient
import synapseutils
syn = synapseclient.login()

synapseutils.copyWiki(
   syn, "syn1234",  # synID of staging site
   destinationId="syn2345",  # synID of live site
   destinationSubPageId=999999  # ID following ../wiki/ of live project URL
)

When using copyWiki, it is important to specify the destinationSubPageId parameter.  This ID can be found in the URL of the live project, where it is the number following .../wiki/.
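
To avoid copying the wrong number by hand, the sub-page ID can be extracted from the URL. This is a small sketch; the helper name `wiki_subpage_id` is hypothetical and the example URL is an assumed shape built from the placeholder IDs used above.

```python
import re


def wiki_subpage_id(url):
    """Return the number that follows /wiki/ in a Synapse project URL."""
    match = re.search(r"/wiki/(\d+)", url)
    if match is None:
        raise ValueError(f"No wiki page ID found in {url!r}")
    return int(match.group(1))


# Placeholder URL; use the URL of the live project's wiki page.
print(wiki_subpage_id("https://www.synapse.org/#!Synapse:syn2345/wiki/999999"))
```

The returned integer is the value to pass as `destinationSubPageId`.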

Warning: once copyWiki has been run, DO NOT RUN IT AGAIN!


Once the wiki has been copied over, all changes to the live project should be synced with challengeutils' mirror-wiki.


Learn more about how to Update the Challenge below.

...

Monitor the Submissions

We recommend that challenge organizers create a Submission View to easily track and monitor submissions as they come in. This table is especially useful when participants need help with their submissions.


Learn more about revealing scores and adding leaderboards in Evaluation Queues.

Steps

1. Go to the staging project and click on the Tables tab. Create a new Submission View by clicking on Add New… > Add Submission View

...

Column Name

Description

Facet values?

evaluationid

Evaluation ID (evaluation ID, but rendered as evaluation name) – recommended for SubmissionViews with multiple queues in scope

(tick) Recommended

id

Submission ID

createdOn

Date and time of the submission (in Epoch, but rendered as MM/dd/yyyy, hh:mm:ss)

submitterid

User or team who submitted (user or team ID, but rendered as username or team name)

(tick) Recommended

dockerrepositoryname

Docker image name – recommended for model-to-data challenges

(error) Not recommended

dockerdigest

Docker SHA digest – recommended for model-to-data challenges

(error) Not recommended

status

Workflow status of the submission (one of [RECEIVED, EVALUATION_IN_PROGRESS, ACCEPTED, INVALID])

(tick) Recommended

submission_status

Evaluation status of the submission (one of [None, VALIDATED, SCORED, INVALID])

(tick) Recommended

submission_errors

(if any) Validation errors for the predictions file

(error) Not recommended

orgSagebionetworksSynapseWorkflowOrchestratorSubmissionFolder

synID of the submission’s logs folder

(error) Not recommended

prediction_fileid

synID of the predictions file (if any)

(error) Not recommended

(any annotations related to scores)

Submission annotations; the names depend on which annotations were used in the scoring step of the workflow

...

3. Click Save. A table of the submissions and their metadata will now be available for viewing and querying. The information displayed can be changed by clicking on the schema icon, followed by Edit Schema:

...
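
Once the Submission View exists, it can also be queried from Python. The sketch below assumes the view includes the columns listed in the table above; `submission_query` and `fetch_submissions` are hypothetical helper names. `syn` is expected to be a logged-in `synapseclient.Synapse` object, and `asDataFrame()` requires pandas to be installed.

```python
def submission_query(view_id, status=None, submission_status=None):
    """Build a query against a Submission View, optionally filtered by status."""
    clauses = []
    if status:
        clauses.append(f"status = '{status}'")
    if submission_status:
        clauses.append(f"submission_status = '{submission_status}'")
    where = f" WHERE {' AND '.join(clauses)}" if clauses else ""
    return (
        "SELECT id, createdOn, submitterid, status, submission_status "
        f"FROM {view_id}{where}"
    )


def fetch_submissions(syn, view_id, **filters):
    """Run the query with a logged-in client and return a pandas DataFrame."""
    return syn.tableQuery(submission_query(view_id, **filters)).asDataFrame()


# "syn123" is a placeholder for the synID of the Submission View.
print(submission_query("syn123", status="INVALID"))
```

Filtering on `status = 'INVALID'` is a quick way to list the submissions most likely to generate help requests.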

Update the Challenge

...

Updating Existing Wikis

1. Go to the staging project and navigate to the page(s) you wish to edit. Click on the pencil icon to Edit Project Wiki:

...

Make edits as needed, then click Save.


2. Use challengeutils' mirror-wiki to push the changes to the live project:

challengeutils mirror-wiki staging_synid live_synid [--dryrun]

Use --dryrun to optionally preview which pages will be updated prior to doing an official sync.


Adding a New Wiki Page

1. Go to the staging project, click on Wiki Tools > Add Wiki Subpage. Enter a page title, then click OK. A new page should now be available.


2. On the new page, click on the pencil icon to Edit Project Wiki:

...

Add the page content, then click Save.

3. Go to the live project and create a new wiki page with the same name as the new page in the staging project. mirror-wiki relies on the page titles being identical for synchronization.

4. Use challengeutils' mirror-wiki to push the changes to the live project:

challengeutils mirror-wiki staging_synid live_synid [--dryrun]

Use --dryrun to optionally preview which pages will be updated prior to doing an official sync.
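
Because synchronization depends on matching page titles, it can be useful to check for staging pages that have no counterpart in the live project before syncing. This is a sketch, not part of challengeutils: `pages_missing_in_live` is a hypothetical helper, and it assumes each wiki header is a dictionary with a `title` key, as returned by synapseclient's `syn.getWikiHeaders(project)`. The sample headers below are made up for illustration.

```python
def pages_missing_in_live(staging_headers, live_headers):
    """List staging page titles that have no same-titled page in the live project."""
    live_titles = {header["title"] for header in live_headers}
    return [
        header["title"]
        for header in staging_headers
        if header["title"] not in live_titles
    ]


# Illustrative headers only; real ones would come from syn.getWikiHeaders(...).
staging = [{"id": "1", "title": "Overview"}, {"id": "2", "title": "Data"}]
live = [{"id": "9", "title": "Overview"}]
print(pages_missing_in_live(staging, live))
```

Any title reported here needs a page created in the live project (step 3) before its content can be mirrored.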

Extending the Deadline

Submission deadlines sometimes change. To extend the submission deadline date/time, you can either:

  • edit the Round End of an existing round; or

  • add a new round that will immediately start after the current one (note that this approach will reset everyone’s submission limit)

We also recommend notifying the participants of any changes to the timeline by posting an announcement.


Learn more about posting announcements in Utilizing the Discussion Board.

Updating the Workflow

For any changes to the CWL scripts or run_docker.py, edit the scripts as needed, then push the changes. We highly recommend conducting dry runs immediately afterward so that errors are caught early.

Evaluation Docker Image

If your workflow is using a Docker image in validate.cwl and/or score.cwl, and updates were made, pull the latest changes on the instance with:

docker pull <image name>:<version>

...

Close the Challenge

Important TODOs:

...

<div class="alert alert-info">
  <h4><strong>Registration is closed.</strong></h4>
  <p>Thank you to everyone who joined the challenge!</p>
</div>

  • Data access: disallow users from joining the participant team, thus barring them from accessing the challenge data after the challenge has completed

    • Go to Participants team page

    • Click on Team Actions button > Edit Team

    • Under Access, select “Team is locked, users may not join or request access. New users must be invited by a team manager.”

  • Evaluation queues: update the participants team’s permissions from Can submit to Can view

  • Writeups: link writeups to the final submissions


See Collecting Writeups for more information.

  • Instances: stop and terminate all cloud compute instances for the challenge.

...

🌟 For additional assistance or guidance, contact the Challenges and Benchmarking team at cnb@sagebase.org.