Evaluating Submissions

Synapse collects all submissions via Evaluation Queues, and you can view and monitor them using Submission Views. This tutorial will walk you through how to view the submissions, download them for evaluation, and upload their scores back to Synapse. Looking to automate this process? Check out the last section for some available resources!

Need more help with infrastructure setup? Reach out to the Challenges & Benchmarking Service Desk and a CNB team member will be in touch.

Learn more about Evaluation Queues.

Learn more about setting up Submission Views at Creating the Submission View.


Retrieving Submissions

Before you can retrieve submissions from Synapse, ensure you have at least "Can score" permissions for the relevant Evaluation Queue(s); otherwise, you will encounter a permissions error.

If you created the Queues, you should already have "Admin" privileges, which grant you the necessary permissions to view and retrieve submissions.
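To quickly confirm access, you can try fetching the Evaluation itself. Below is a minimal sketch in Python; the evaluation ID is a placeholder, and note that a successful fetch confirms read access but not necessarily scoring rights:

Python

import synapseclient
from synapseclient.core.exceptions import SynapseHTTPError

syn = synapseclient.login()

evaluation_id = "9615516"  # placeholder -- replace with your Evaluation ID
try:
    evaluation = syn.getEvaluation(evaluation_id)
    print(f"Access confirmed for: {evaluation.name}")
except SynapseHTTPError as err:
    # A 403 response typically means you lack permissions on this queue.
    print(f"Could not access evaluation {evaluation_id}: {err}")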

 

View Submissions Directly from Evaluation Queues

To view submissions programmatically, you can use syn.getSubmissions(EVALUATION_ID) [ref]. For example:

Python

import synapseclient

syn = synapseclient.login()

evaluation_id = "9615516"
submissions = syn.getSubmissions(evaluation_id)  # returns a generator of submissions
for submission in submissions:
    print(submission)

# To view the submissions in a dataframe, use pandas.
import pandas as pd

submissions_df = pd.DataFrame(syn.getSubmissions(evaluation_id))
print(submissions_df)

R

library(dplyr)
library(jsonlite)
library(synapser)

synLogin()

submissions <- synGetSubmissions("9615516")$asList()

# To view the submissions in a dataframe
submissions_df <- lapply(submissions, function(s) {
  data.frame(
    id = s$id,
    userId = s$userId,
    evaluationId = s$evaluationId,
    entityId = s$entityId,
    entityBundleJSON = s$entityBundleJSON,
    versionNumber = s$versionNumber,
    name = s$name,
    createdOn = as.character(s$createdOn),
    contributors = as.character(toJSON(s$contributors, auto_unbox = TRUE)),
    stringsAsFactors = FALSE
  )
}) %>%
  bind_rows()
print(submissions_df)

 

All new submissions are denoted with a status of RECEIVED. To view only new submissions, add status="RECEIVED" to the method call:

Python

submissions = syn.getSubmissions(evaluation_id, status="RECEIVED")

R

submissions <- synGetSubmissions("9615516", status="RECEIVED")$asList()


If you do not know the evaluation ID, you can first get the Evaluation object by using syn.getEvaluationByName(EVALUATION_NAME) and then pass it into syn.getSubmissions(). For example:

Python

evaluation = syn.getEvaluationByName("MY_EVALUATION_QUEUE_NAME")
submissions = syn.getSubmissions(evaluation)

R

evaluation <- synGetEvaluationByName("MY_EVALUATION_QUEUE_NAME")
submissions <- synGetSubmissions(evaluation)

 

If you also don’t know the name, contact the Synapse user who created the Challenge project. They are likely the current admin of the Evaluation Queue(s) and can provide you with the evaluation ID(s). Otherwise, submit a support ticket to the Challenges & Benchmarking Service Desk for further assistance.

 

View Submissions via Submission View

If you have already set up a Submission View, you can view submissions by querying that table programmatically using syn.tableQuery(QUERY) [ref]:

Python

import pandas as pd
import synapseclient

syn = synapseclient.login()

# View all submissions.
view_id = "syn53293336"
submissions_df = (
    syn.tableQuery(f"SELECT * FROM {view_id}")
    .asDataFrame()
    .fillna("")
)
print(submissions_df)

R

library(synapser)
library(tidyverse)

synLogin()

# View all submissions.
view_id <- "syn53293336"
submissions_df <- synTableQuery(str_glue("SELECT * FROM {view_id}")) %>%
  .$asDataFrame() %>%
  mutate(across(everything(), ~ ifelse(is.na(.) | . == "NaN", "", .)))

 

See Using Advanced Search Queries for more examples of SQL-like queries supported by Synapse.
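For instance, you can combine clauses to sort and limit results. A short sketch, reusing syn and view_id from the example above (createdOn is a standard Submission View column, but verify it exists in your view):

Python

# Get the 10 most recent submissions, newest first.
query = f"SELECT id, status, createdOn FROM {view_id} ORDER BY createdOn DESC LIMIT 10"
recent_df = syn.tableQuery(query).asDataFrame()
print(recent_df)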

 

To view only new submissions, add status = 'RECEIVED' as a clause to the query:

Python

import pandas as pd
import synapseclient

syn = synapseclient.login()

# View only new submissions.
view_id = "syn53293336"
submissions_df = (
    syn.tableQuery(f"SELECT * FROM {view_id} WHERE status = 'RECEIVED'")  # add a clause
    .asDataFrame()
    .fillna("")
)
print(submissions_df)

R

library(synapser)
library(tidyverse)

synLogin()

# View only new submissions.
view_id <- "syn53293336"
submissions_df <- synTableQuery(str_glue("SELECT * FROM {view_id} WHERE status = 'RECEIVED'")) %>%  # add a clause
  .$asDataFrame() %>%
  mutate(across(everything(), ~ ifelse(is.na(.) | . == "NaN", "", .)))

Download Submissions

While there is not currently a feature on the Synapse web UI to download submissions, you can do so programmatically. Technically, you will be downloading Submission objects, which are copies of the entities submitted to an Evaluation Queue.

As long as you have at least "Can score" permissions on the queue, you will be able to access and download these Submission objects.

File Submissions

You can directly download file submissions by using the submission ID with syn.getSubmission(SUBMISSION_ID) [ref].

syn.getSubmission() is not the same as syn.getSubmissions()!

syn.getSubmission() (no “s” at the end) retrieves the metadata for a single submission and downloads it if it is a file, whereas syn.getSubmissions() only retrieves the metadata of the multiple submissions made to an Evaluation Queue.

Python

import synapseclient

syn = synapseclient.login()

submission_id = 9743445
syn.getSubmission(submission_id)

# By default, all files are downloaded to ~/.synapseCache. To specify
# a different location, use `downloadLocation`.
syn.getSubmission(submission_id, downloadLocation="path/to/download")

R

library(synapser)

synLogin()

submission_id <- 9743445
synGetSubmission(submission_id)

# By default, all files are downloaded to ~/.synapseCache. To specify
# a different location, use `downloadLocation`.
synGetSubmission(submission_id, downloadLocation = "path/to/download")

 

Docker Submissions

For Docker submissions, you will need to use Docker to retrieve the submitted images; using syn.getSubmission() will not pull the image onto your machine. To get the image name associated with a submission, combine the dockerRepositoryName and dockerDigest from the Submission object, separated by an @ symbol, e.g. {dockerRepositoryName}@{dockerDigest}.

CLI

docker pull DOCKER_REPOSITORY_NAME@DOCKER_DIGEST

Python

import docker
import synapseclient

syn = synapseclient.login()

# Set up Docker client.
client = docker.from_env()

# Get submission metadata with syn.getSubmission(..., downloadFile=False).
submission_id = 9753713
submission = syn.getSubmission(submission_id, downloadFile=False)
repo_name = submission.get("dockerRepositoryName")
digest = submission.get("dockerDigest")

# Attempt to pull Docker submission, otherwise output error message.
try:
    client.images.pull(f"{repo_name}@{digest}")
except docker.errors.APIError:
    print(f"Something went wrong with pulling submission {submission_id}")

This example uses the Docker SDK for Python to programmatically pull the images.
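If the submitted images are private, you will likely need to authenticate to the Synapse Docker registry before pulling. A sketch using the same client; the username and token values are placeholders (a Synapse personal access token is commonly used as the password):

Python

# Authenticate to the Synapse Docker registry (credentials are placeholders).
client.login(
    username="my_synapse_username",
    password="my_personal_access_token",
    registry="https://docker.synapse.org",
)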

R

library(synapser)
library(tidyverse)

synLogin()

# Log in to the Synapse Docker Registry.
system("docker login docker.synapse.org")

# Get submission metadata with synGetSubmission().
submission_id <- 9753713
submission <- synGetSubmission(submission_id)
repo_name <- submission$dockerRepositoryName
digest <- submission$dockerDigest

# Attempt to pull Docker submission, otherwise output error message.
if (!is.null(repo_name) && !is.null(digest)) {
  exit_code <- system(str_glue("docker pull {repo_name}@{digest}"))
  if (exit_code != 0) {
    message(str_glue("Something went wrong with pulling submission {submission_id}"))
  }
} else {
  message(str_glue("Submission {submission_id} is not a Docker submission"))
}

If you are pulling the submission metadata via a Submission View instead of using syn.getSubmission(), the annotation names will be in all lowercase (dockerrepositoryname and dockerdigest) instead of camel case, e.g.

submissions_df = syn.tableQuery(...).asDataFrame()
for _, row in submissions_df.iterrows():
    submission_id = row["id"]
    repo_name = row["dockerrepositoryname"]
    digest = row["dockerdigest"]

 

Putting It All Together

You now know how to query for new submissions and download them. By combining these steps, you can create a single script to evaluate all new submissions.

Once a submission is evaluated, we recommend updating its status from RECEIVED so that it does not get picked up again if/when you re-query for new submissions in the future.

Here's an example of an integrated script for downloading file submissions:

Python

"""Evaluating file submissions.""" import os import pandas as pd import synapseclient syn = synapseclient.login() # Get new submissions. view_id = "syn53293336" submissions_df = ( syn.tableQuery(f"SELECT * FROM {view_id} WHERE status = 'RECEIVED'") .asDataFrame() .fillna("") ) # Evaluate each new submission. for submission_id in submissions_df["id"]: # Download predictions file to /path/to/download/. submission = syn.getSubmission(submission_id, downloadLocation="/path/to/download") with open(submission.filePath) as f: scores = ... # evaluate the predictions file # Update submission status to 'SCORED' if evaluation is successful, else 'INVALID'. submission_status_obj = syn.getSubmissionStatus(submission_id) submission_status_obj.status = "SCORED" if scores else "INVALID" syn.store(submission_status_obj) # File cleanup. try: os.remove(submission.filePath) except OSError as e: print(f"Could not delete predictions file for submission {submission_id}: {e}")

R

library(synapser)
library(tidyverse)

synLogin()

# Get new submissions.
view_id <- "syn53293336"
submissions_df <- synTableQuery(str_glue("SELECT * FROM {view_id} WHERE status = 'RECEIVED'")) %>%
  .$asDataFrame() %>%
  mutate(across(everything(), ~ ifelse(is.na(.) | . == "NaN", "", .)))

# Evaluate each new submission.
for (submission_id in submissions_df$id) {

  # Download predictions file to /path/to/download/.
  submission <- synGetSubmission(submission_id, downloadLocation = "/path/to/download")
  pred_file <- submission$filePath
  scores <- ...  # evaluate the predictions file

  # Update submission status to 'SCORED' if evaluation is successful, else 'INVALID'.
  submission_status_obj <- synGetSubmissionStatus(submission_id)
  submission_status_obj$status <- ifelse(!is.null(scores), "SCORED", "INVALID")
  synStore(submission_status_obj)

  # File cleanup.
  tryCatch({
    file.remove(submission$filePath)
  }, error = function(e) {
    message(str_glue("Could not delete predictions file for submission {submission_id}: {e}"))
  })
}

 

Submission status can only be set to specific values; refer to SubmissionStatusEnum for a list of acceptable values.


Assigning and Displaying Scores

Scores are assigned to submissions by adding them as annotations. You can then display these scores in a Submission View by updating the view's schema to include these new annotations.

Annotate Submissions

You can only add annotations to submissions programmatically; the Synapse web UI does not currently support this feature.

Python

import synapseclient

syn = synapseclient.login()

# Get submission status object.
submission_id = 123
submission_status_obj = syn.getSubmissionStatus(submission_id)

# Add scores to the annotations metadata.
## adding one score
submission_status_obj.submissionAnnotations["auc_roc"] = 0.0

## adding multiple scores
score_annots = {
    "auprc": 0.0,
    "pearson": 0.0
}
submission_status_obj.submissionAnnotations.update(score_annots)

# Save the new annotations.
syn.store(submission_status_obj)

R

library(synapser)

synLogin()

# Get submission status object.
submission_id <- 123
submission_status_obj <- synGetSubmissionStatus(submission_id)

# Add scores to the annotations metadata.
## adding one score
submission_status_obj$submissionAnnotations$auc_roc <- 0.0

## adding multiple scores
score_annots <- list(
  auprc = 0.0,
  pearson = 0.0
)
submission_status_obj$submissionAnnotations$update(score_annots)

# Save the new annotations.
synStore(submission_status_obj)

 

Display Scores

To display the scores on Synapse:

  1. Navigate to the Submission View containing the Evaluation Queue(s), and click on the 3-bar icon (next to Submission View Tools) to Show Submission View Schema.

  2. The schema will now appear above the table. Click on Edit Schema and a new window will pop up.

  3. Click + Add Column. For “Column Name”, enter the exact annotation key name you used (e.g. auc_roc from the code examples above). Update the “Column Type” appropriately.

  4. Repeat Step 3 for each scoring metric you want to display.

  5. Click Save to apply the changes.

If done correctly, your Submission View should now include the new metric columns with scores displayed for each submission.
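If you prefer to update the schema programmatically rather than through the web UI, the Python client can append metric columns to the view. A minimal sketch, assuming the view ID from the earlier examples and a numeric auc_roc metric:

Python

import synapseclient
from synapseclient import Column

syn = synapseclient.login()

# Fetch the Submission View schema and append a new metric column.
view = syn.get("syn53293336")  # assumed view ID from the examples above
view.addColumn(Column(name="auc_roc", columnType="DOUBLE"))
view = syn.store(view)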

Troubleshooting: If scores are not appearing, double-check that the column names in your schema exactly match the annotation keys on your submissions, including casing. For example, AUC_ROC is not considered the same as auc_roc.

Another potential issue is a mismatch in the “Column Type”. For instance, if you specify "Integer" but your values are strings, the scores will not display.
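One way to spot such mismatches is to compare the view's column names against a submission's annotation keys. A sketch, assuming the view and submission IDs from the earlier examples:

Python

# Compare view columns against a submission's annotation keys (casing matters).
view_columns = {col.name for col in syn.getColumns("syn53293336")}
submission_status_obj = syn.getSubmissionStatus(9743445)  # assumed submission ID
annotation_keys = set(submission_status_obj.submissionAnnotations.keys())
missing = annotation_keys - view_columns
if missing:
    print(f"Annotations with no matching column: {missing}")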


Tools for Automation

If you're looking to automate the process of evaluating submissions as they come in, Sage offers several tools and services:

Orchestrators

  • ORCA (Paid Service): This tool uses Nextflow to run workflows. Your main job is to provide the evaluation code (template available below), which the Data Processing & Engineering (DPE) team then integrates into a workflow. For cost estimates, please contact the DPE Service Desk.

  • SynapseWorkflowOrchestrator: This tool executes Common Workflow Language (CWL) workflows, which you will design yourself (templates available below). This tool also requires manual setup and configuration to link it with your Challenge project. If you need help, contact the Challenges & Benchmarking Service Desk.

Workflow templates