Objective

To enhance the search and discovery experience on Synapse, we need a well-defined test dataset that supports measuring, benchmarking, and optimizing search performance. This dataset will ensure consistent evaluation of search engines, improve user engagement, and establish a baseline for future improvements.

The scope of this test data submission is currently limited to Synapse datasets—that is, collections of files, folders, and projects in Synapse identified by a synID, e.g., syn12345678. For tables, the combination of row_id and row_version serves as a stable and unique identifier, although it falls outside the scope of this schematized test data submission.

Acceptance Criteria

Submission Process

  1. Submit datasets in JSON Schema format as shown below.

  2. Commit updates in the chosen version-controlled repository.

  3. Provide the file URL and file version number or tag name as part of the submission.

Here is the JSON schema for the test cases:

{

  "$schema": "http://json-schema.org/draft-07/schema#",

  "type": "array",

  "items": {

    "type": "object",

    "properties": {

      "queryString": {

        "type": "string",

        "description": "The query string for the test case."

      },

      "ids": {

        "type": "array",

        "items": {

          "type": "string",

          "description": "An syn identifier."

        },

        "description": "A list of IDs associated with the test case."

      }

    },

    "required": [

      "queryString",

      "ids"

    ],

    "additionalProperties": false,

    "description": "A test case with a query string and a list of IDs."

  },

  "description": "An array of test cases, each with a query string and a list of IDs."

}

Here is an example of what the test cases might look like:

[

  {

    "queryString": "metabolic dynamics",

    "ids": ["syn123", "syn456", "syn789""]

  },

  {

    "queryString": "mega data",

    "ids": ["syn1001", "syn1002""]

  }

]