...
Code Block | ||
---|---|---|
| ||
{ "$schema": "http://json-schema.org/draft-07/schema", "$id": "https://repo-prod.prod.sagebase.org/repo/v1/schema/type/registered/ebispot.duo-duo", "title": "Full DUO schema", "description": "...", "allOf": [ {"$ref": "#/definitions/ebispot.duo-D0000004"}, {"$ref": "#/definitions/ebispot.duo-D0000006"}, {"$ref": "#/definitions/ebispot.duo-D0000007"}, {"$ref": "#/definitions/ebispot.duo-D0000011"}, {"$ref": "#/definitions/ebispot.duo-D0000012"}, {"$ref": "#/definitions/ebispot.duo-D0000015"}, {"$ref": "#/definitions/ebispot.duo-D0000016"}, {"$ref": "#/definitions/ebispot.duo-D0000018"}, {"$ref": "#/definitions/ebispot.duo-D0000019"}, {"$ref": "#/definitions/ebispot.duo-D0000020"}, {"$ref": "#/definitions/ebispot.duo-D0000021"}, {"$ref": "#/definitions/ebispot.duo-D0000022"}, {"$ref": "#/definitions/ebispot.duo-D0000024"}, {"$ref": "#/definitions/ebispot.duo-D0000025"}, {"$ref": "#/definitions/ebispot.duo-D0000026"}, {"$ref": "#/definitions/ebispot.duo-D0000027"}, {"$ref": "#/definitions/ebispot.duo-D0000028"}, {"$ref": "#/definitions/ebispot.duo-D0000029"}, {"$ref": "#/definitions/ebispot.duo-D0000042"}, {"$ref": "#/definitions/ebispot.duo-D0000043"}, {"$ref": "#/definitions/ebispot.duo-D0000044"}, {"$ref": "#/definitions/ebispot.duo-D0000045"}, {"$ref": "#/definitions/ebispot.duo-D0000046"} ], "definitions": { "ebispot.duo-D0000004": { "$schema": "http://json-schema.org/draft-07/schema", "properties": {"NRES": { "type": "boolean", "description": "This data use permission indicates there is no restriction on use." }}, "title": "no restriction" }, "ebispot.duo-D0000006": { "$schema": "http://json-schema.org/draft-07/schema", "properties": {"HMB": { "type": "boolean", "description": "This data use permission indicates that use is allowed for health/medical/biomedical purposes; does not include the study of population origins or ancestry." 
}}, "title": "health or medical or biomedical research" }, "ebispot.duo-D0000007": { "$schema": "http://json-schema.org/draft-07/schema", "properties": {"DS": { "type": "boolean", "description": "This data use permission indicates that use is allowed provided it is related to the specified disease." }}, "title": "disease specific research", "if": {"properties": {"DS": {"const": true}}}, "then": {"properties": {"DS_disease": { "type": "string", "description": "DUO recommends MONDO be used, to provide the basis for automated evaluation. For more information see https://github.com/EBISPOT/DUO/blob/master/MONDO_Overview.md", "enum": [ "cancer", "alzheimer", "amnesia", "..." ] }}} }, "ebispot.duo-D0000011": { "$schema": "http://json-schema.org/draft-07/schema", "properties": {"POA": { "type": "boolean", "description": "This data use permission indicates that use of the data is limited to the study of population origins or ancestry." }}, "title": "population origins or ancestry research only" }, "ebispot.duo-D0000012": { "$schema": "http://json-schema.org/draft-07/schema", "properties": {"RS": { "type": "boolean", "description": "This data use modifier indicates that use is limited to studies of a certain research type." }}, "title": "research specific restrictions", "if": {"properties": {"RS": {"const": true}}}, "then": {"properties": {"RS_research_type": { "type": "string", "description": "...", "enum": [ "???cancer", "..." ] }}} }, "ebispot.duo-D0000015": { "$schema": "http://json-schema.org/draft-07/schema", "properties": {"NMDS": { "type": "boolean", "description": "This data use modifier indicates that use does not allow methods development research (e.g., development of software or algorithms)." 
}}, "title": "no general methods research" }, "ebispot.duo-D0000016": { "$schema": "http://json-schema.org/draft-07/schema", "properties": {"GSO": { "type": "boolean", "description": "This data use modifier indicates that use is limited to genetic studies only (i.e., studies that include genotype research alone or both genotype and phenotype research, but not phenotype research exclusively)" }}, "title": "genetic studies only" }, "ebispot.duo-D0000018": { "$schema": "http://json-schema.org/draft-07/schema", "properties": {"NPUNCU": { "type": "boolean", "description": "This data use modifier indicates that use of the data is limited to not-for-profit organizations and not-for-profit use, non-commercial use." }}, "title": "not for profit, non commercial use only" }, "ebispot.duo-D0000019": { "$schema": "http://json-schema.org/draft-07/schema", "properties": {"PUB": { "type": "boolean", "description": "This data use modifier indicates that requestor agrees to make results of studies using the data available to the larger scientific community." }}, "title": "publication required" }, "ebispot.duo-D0000020": { "$schema": "http://json-schema.org/draft-07/schema", "properties": {"COL": { "type": "boolean", "description": "This data use modifier indicates that the requestor must agree to collaboration with the primary study investigator(s)." }}, "title": "collaboration required", "if": {"properties": {"COL": {"const": true}}}, "then": {"properties": {"COL_PI": { "type": "string", "description": "This could be coupled with a string describing the primary study investigator(s)." }}} }, "ebispot.duo-D0000021": { "$schema": "http://json-schema.org/draft-07/schema", "properties": {"IRB": { "type": "boolean", "description": "This data use modifier indicates that the requestor must provide documentation of local IRB/ERB approval." 
}}, "title": "ethics approval required" }, "ebispot.duo-D0000022": { "$schema": "http://json-schema.org/draft-07/schema", "properties": {"GS": { "type": "boolean", "description": "This data use modifier indicates that use is limited to within a specific geographic region." }}, "title": "geographical restriction", "if": {"properties": {"GS": {"const": true}}}, "then": {"properties": {"GS_location": { "type": "string", "description": "This should be coupled with an ontology term describing the geographical location the restriction applies to." }}} }, "ebispot.duo-D0000024": { "$schema": "http://json-schema.org/draft-07/schema", "properties": {"MOR": { "type": "boolean", "description": "This data use modifier indicates that requestor agrees not to publish results of studies until a specific date." }}, "title": "publication moratorium", "if": {"properties": {"MOR": {"const": true}}}, "then": {"properties": {"MOR_date": { "type": "string", "description": "This should be coupled with a date specified as ISO8601", "format": "date-time" }}} }, "ebispot.duo-D0000025": { "$schema": "http://json-schema.org/draft-07/schema", "properties": {"TS": { "type": "boolean", "description": "This data use modifier indicates that use is approved for a specific number of months." }}, "title": "time limit on use", "if": {"properties": {"TS": {"const": true}}}, "then": {"properties": {"TS_number_of_months": { "type": "integer", "description": "This should be coupled with an integer value indicating the number of months." }}} }, "ebispot.duo-D0000026": { "$schema": "http://json-schema.org/draft-07/schema", "properties": {"US": { "type": "boolean", "description": "This data use modifier indicates that use is limited to use by approved users." 
}}, "title": "user specific restriction" }, "ebispot.duo-D0000027": { "$schema": "http://json-schema.org/draft-07/schema", "properties": {"PS": { "type": "boolean", "description": "This data use modifier indicates that use is limited to use within an approved project." }}, "title": "project specific restriction", "if": {"properties": {"PS": {"const": true}}}, "then": {"properties": {"PS_project": { "type": "string", "description": "???" }}} }, "ebispot.duo-D0000028": { "$schema": "http://json-schema.org/draft-07/schema", "properties": {"IS": { "type": "boolean", "description": "This data use modifier indicates that use is limited to use within an approved institution." }}, "title": "institution specific restriction", "if": {"properties": {"IS": {"const": true}}}, "then": {"properties": {"IS_institution": { "type": "string", "description": "???" }}} }, "ebispot.duo-D0000029": { "$schema": "http://json-schema.org/draft-07/schema", "properties": {"RTN": { "type": "boolean", "description": "This data use modifier indicates that the requestor must return derived/enriched data to the database/resource." }}, "title": "return to database or resource" }, "ebispot.duo-D0000042": { "$schema": "http://json-schema.org/draft-07/schema", "properties": {"GRU": { "type": "boolean", "description": "This data use permission indicates that use is allowed for general research use for any research purpose." }}, "title": "general research use" }, "ebispot.duo-D0000043": { "$schema": "http://json-schema.org/draft-07/schema", "properties": {"CC": { "type": "boolean", "description": "This data use modifier indicates that use is allowed for clinical use and care. Clinical Care is defined as Health care or services provided at home, in a healthcare facility or hospital. Data may be used for clinical decision making." 
}}, "title": "clinical care use" }, "ebispot.duo-D0000044": { "$schema": "http://json-schema.org/draft-07/schema", "properties": {"NPOA": { "type": "boolean", "description": "This data use modifier indicates use for purposes of population, origin, or ancestry research is prohibited." }}, "title": "population origins or ancestry research prohibited" }, "ebispot.duo-D0000045": { "$schema": "http://json-schema.org/draft-07/schema", "properties": {"NPU": { "type": "boolean", "description": "This data use modifier indicates that use of the data is limited to not-for-profit organizations." }}, "title": "not for profit organisation use only" }, "ebispot.duo-D0000046": { "$schema": "http://json-schema.org/draft-07/schema", "properties": {"NCU": { "type": "boolean", "description": "This data use modifier indicates that use of the data is limited to not-for-profit use. This indicates that data can be used by commercial organisations for research purposes, but not commercial purposes." }}, "title": "non-commercial use only" } } } |
DUO Schema
We chose to implement each category as a separate JSON Schema, each consisting of at least one boolean property, using the ontology:shorthand as the key. For example, D0000007 uses the key DS. In each case, the boolean defaults to false. In some cases, when a value of “true” is provided, extra data is expected. For example, D0000024, which is labeled “publication moratorium” and has the key MOR, includes an extra field, MOR_date, to capture the moratorium date when MOR is set to true.
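The flag-plus-extra-field pattern described above can be illustrated with a short Python sketch. It is a hand-rolled reading of the if/then portion of the D0000024 definition, not a JSON Schema validator. The name expected_extra_fields is hypothetical, and treating a missing flag as false follows the default stated above rather than strict JSON Schema semantics.

```python
# Fragment of the D0000024 (publication moratorium) definition from the DUO schema.
MOR_DEFINITION = {
    "properties": {"MOR": {"type": "boolean"}},
    "if": {"properties": {"MOR": {"const": True}}},
    "then": {"properties": {"MOR_date": {"type": "string", "format": "date-time"}}},
}


def expected_extra_fields(definition, annotations):
    """Return the extra field names that apply to these annotations.

    Hypothetical helper: it only understands `if` blocks made of `const`
    checks, and it assumes that a missing flag means false.
    """
    condition = definition.get("if", {}).get("properties", {})
    flag_is_set = all(
        annotations.get(key, False) == rule["const"]
        for key, rule in condition.items()
    )
    if not flag_is_set:
        return []
    return sorted(definition.get("then", {}).get("properties", {}))


print(expected_extra_fields(MOR_DEFINITION, {"MOR": True}))   # ['MOR_date']
print(expected_extra_fields(MOR_DEFINITION, {"MOR": False}))  # []
```

Definitions without an if block, such as D0000004, simply yield an empty list under this reading.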
The next step is to apply the DUO schema to the project from the main governance narrative:
Code Block | ||
---|---|---|
| ||
{
"$schema": "http://json-schema.org/draft-07/schema",
"title": "Schema for Some Project",
"$id": "some.project-main",
"description": "This schema defines how DUO should be used with Some Project.",
"allOf": [
{
"$ref": "org.sagebionetworks-repo.model.FileEntity-1.0.0"
},
{
"$ref": "ebispot.duo-duo"
},
{
"if": {
"properties": {
"patientLocation": {
"const": "Germany"
},
"assayType": {
"const": "genomic"
}
}
},
"then": {
"properties": {
"GS": {
"title": "geographical restriction",
"type": "boolean",
"const": true
},
"GS_location": {
"type": "string",
"description": "This data cannot leave Germany",
"const": "Germany"
}
},
"required": [
"GS_location"
]
}
},
{
"if": {
"properties": {
"patientLocation": {
"const": "USA"
},
"assayType": {
"const": "genomic"
}
}
},
"then": {
"properties": {
"sourceGeography": {
"const": "US"
},
"jurisdiction": {
"const": "HIPAA"
},
"dataLabel": {
"const": "De-identified"
}
},
"required": [
"sourceGeography",
"jurisdiction",
"dataLabel"
]
}
}
],
"properties": {
"assayType": {
"description": "Identifies the type of data for this file.",
"type": "string",
"enum": [
"clinical",
"assay",
"imaging",
"genomic"
]
},
"patientLocation": {
"description": "The location of the patient associated with the data",
"type": "string",
"enum": [
"USA",
"Germany"
]
},
"RS": {
"title": "research specific restrictions",
"type": "boolean",
"const": true
},
"RS_research_type": {
"title": "Restricted to cancer research",
"type": "string",
"const": "cancer"
},
"IRB": {
"title": "ethics approval required",
"type": "boolean",
"const": true
},
"MOR": {
"title": "publication moratorium",
"type": "boolean",
"const": true
},
"MOR_date": {
"title": "publication moratorium date",
"type": "string",
"format": "date",
"const": "2022-05-20"
}
},
"required": [
"assayType",
"patientLocation",
"NRES",
"HMB",
"DS",
"POA",
"RS",
"NMDS",
"GSO",
"NPUNCU",
"PUB",
"COL",
"IRB",
"GS",
"MOR",
"TS",
"US",
"PS",
"IS",
"RTN",
"GRU",
"CC",
"NPOA",
"NPU",
"NCU"
]
}
|
DUO applied to a Project
Notice that line 8 indicates that this schema applies to FileEntities, while line 11 indicates that the schema “extends” the DUO schema. The first two properties of the schema, assayType (line 74) and patientLocation (line 84), are drivers that determine which conditional properties must be applied to each file. Two if/then blocks (lines 14 to 71) define the conditional properties to apply based on assayType and patientLocation. The properties RS, RS_research_type, IRB, MOR, and MOR_date are all unconditional: they have constant values that must be applied to every file in the project.
The following JSON is an example of what “valid” properties could be for syn1 from the example above:
Code Block | ||
---|---|---|
| ||
{
"name": "GermanGenomic.data",
"description": "Genomic data from patients in Germany",
"id": "syn1",
"etag": "some-etag",
"createdOn": "2020-05-20T20:20:39+00:00",
"modifiedOn": "2020-05-20T20:20:39+00:00",
"createdBy": "123456789",
"modifiedBy": "123456789",
"parentId": "syn444",
"versionLabel": "one",
"versionComment": "leaving blank",
"versionNumber": 1,
"dataFileHandleId": "98765",
"fileNameOverride": "",
"concreteType": "org.sagebionetworks.repo.model.FileEntity",
"assayType": "genomic",
"patientLocation": "Germany",
"NRES": false,
"HMB": false,
"DS": false,
"POA": false,
"RS": true,
"RS_research_type": "cancer",
"NMDS": false,
"GSO": false,
"NPUNCU": false,
"PUB": false,
"COL": false,
"IRB": true,
"GS": true,
"GS_location": "Germany",
"MOR": true,
"MOR_date": "2022-05-20",
"TS": false,
"US": false,
"PS": false,
"IS": false,
"RTN": false,
"GRU": false,
"CC": false,
"NPOA": false,
"NPU": false,
"NCU": false
} |
syn1.json
Since syn1 includes "assayType": "genomic" and "patientLocation": "Germany", it must include "GS": true and "GS_location": "Germany" according to the rules of the first if/then block. Most of the properties on lines 19 to 44 are constants based on this project's schema.
The following JSON is an example of what “valid” properties could be for syn4 from the example above:
Code Block | ||
---|---|---|
| ||
{
"name": "USGenomic.data",
"description": "Genomic data from patients in USA",
"id": "syn4",
"etag": "some-etag",
"createdOn": "2020-05-20T20:20:39+00:00",
"modifiedOn": "2020-05-20T20:20:39+00:00",
"createdBy": "123456789",
"modifiedBy": "123456789",
"parentId": "syn444",
"versionLabel": "one",
"versionComment": "leaving blank",
"versionNumber": 1,
"dataFileHandleId": "98765",
"fileNameOverride": "",
"concreteType": "org.sagebionetworks.repo.model.FileEntity",
"assayType": "genomic",
"patientLocation": "USA",
"NRES": false,
"HMB": false,
"DS": false,
"POA": false,
"RS": true,
"RS_research_type": "cancer",
"NMDS": false,
"GSO": false,
"NPUNCU": false,
"PUB": false,
"COL": false,
"IRB": true,
"GS": false,
"MOR": true,
"MOR_date": "2022-05-20",
"TS": false,
"US": false,
"PS": false,
"IS": false,
"RTN": false,
"GRU": false,
"CC": false,
"NPOA": false,
"NPU": false,
"NCU": false,
"sourceGeography":"US",
"jurisdiction": "HIPAA",
"dataLabel":"De-identified"
} |
syn4.json
Since syn4 has "assayType": "genomic" and "patientLocation": "USA", according to the if/then statements it must also include the following constant properties: "sourceGeography": "US", "jurisdiction": "HIPAA", and "dataLabel": "De-identified". Note: for syn4, "GS": false because the patient location is not Germany.
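The conditional rules that syn1 and syn4 satisfy can be checked with a few lines of Python. The sketch below restates the two if/then blocks of the project schema as plain dictionaries. CONDITIONAL_RULES and find_violations are hypothetical names, and this is only an illustration of the rule logic, not a replacement for a full JSON Schema validator.

```python
# The two if/then blocks of the project schema, restated as plain dictionaries.
CONDITIONAL_RULES = [
    {
        "if": {"patientLocation": "Germany", "assayType": "genomic"},
        "then": {"GS": True, "GS_location": "Germany"},
    },
    {
        "if": {"patientLocation": "USA", "assayType": "genomic"},
        "then": {
            "sourceGeography": "US",
            "jurisdiction": "HIPAA",
            "dataLabel": "De-identified",
        },
    },
]


def find_violations(annotations):
    """List every conditional constant that is missing or has the wrong value."""
    violations = []
    for rule in CONDITIONAL_RULES:
        applies = all(annotations.get(k) == v for k, v in rule["if"].items())
        if not applies:
            continue
        for key, expected in rule["then"].items():
            if annotations.get(key) != expected:
                violations.append(f"{key} must be {expected!r}")
    return violations


# Only the properties relevant to the conditional rules are shown here.
syn1 = {"assayType": "genomic", "patientLocation": "Germany",
        "GS": True, "GS_location": "Germany"}
syn4 = {"assayType": "genomic", "patientLocation": "USA", "GS": False,
        "sourceGeography": "US", "jurisdiction": "HIPAA",
        "dataLabel": "De-identified"}

print(find_violations(syn1))  # []
print(find_violations(syn4))  # []
```

A German genomic file with "GS": false would be reported twice under this sketch, once for GS and once for the missing GS_location.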
Derived Annotations
The examples above for syn1 and syn4 indicate that all of the governance-specific metadata is derived from two sources:
Project-specific constants - For example, “publication moratorium” ("MOR": true and "MOR_date": "2022-05-20") applies to all files in the project, as defined by the schema ("$id": "some.project-main").
User-provided properties - For example, the if/then blocks define additional properties based on the user-provided values of "assayType" and "patientLocation".
Note: For this phase we are glossing over the fact that the value of patientLocation is transitive. Its value would be found by joining the patient table with the treatment table, and then joining with the sampleIds of each file. We will attempt to address this in a later phase.
Given that the governance data is derived, it does not seem reasonable to require that any user directly provide this data as annotations. Instead, the system should be able to “automatically” provide these key-value pairs for each file using the bound JSON Schema and the user-provided values.
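As a rough illustration of what such automatic derivation might look like, the Python sketch below combines the three ingredients from the examples: DUO booleans defaulting to false, the project-wide constants, and the conditional constants selected by the user-provided values. All names (DUO_FLAGS, PROJECT_CONSTANTS, CONDITIONAL_RULES, derive_annotations) are hypothetical, and the rules are copied by hand from the example schema rather than read from a bound JSON Schema, which is an assumption made to keep the sketch short.

```python
# Every DUO boolean key from the "required" list of the project schema.
DUO_FLAGS = [
    "NRES", "HMB", "DS", "POA", "RS", "NMDS", "GSO", "NPUNCU", "PUB", "COL",
    "IRB", "GS", "MOR", "TS", "US", "PS", "IS", "RTN", "GRU", "CC", "NPOA",
    "NPU", "NCU",
]

# Unconditional constants declared under "properties" in the project schema.
PROJECT_CONSTANTS = {
    "RS": True,
    "RS_research_type": "cancer",
    "IRB": True,
    "MOR": True,
    "MOR_date": "2022-05-20",
}

# (condition, constants) pairs taken from the two if/then blocks.
CONDITIONAL_RULES = [
    ({"patientLocation": "Germany", "assayType": "genomic"},
     {"GS": True, "GS_location": "Germany"}),
    ({"patientLocation": "USA", "assayType": "genomic"},
     {"sourceGeography": "US", "jurisdiction": "HIPAA",
      "dataLabel": "De-identified"}),
]


def derive_annotations(assay_type, patient_location):
    """Derive the governance key-value pairs for one file."""
    user_values = {"assayType": assay_type, "patientLocation": patient_location}
    derived = {flag: False for flag in DUO_FLAGS}  # booleans default to false
    derived.update(PROJECT_CONSTANTS)              # project-wide constants
    for condition, constants in CONDITIONAL_RULES:
        if all(user_values[k] == v for k, v in condition.items()):
            derived.update(constants)              # conditional constants
    return derived


syn1_derived = derive_annotations("genomic", "Germany")
syn4_derived = derive_annotations("genomic", "USA")
```

Under these assumptions, syn1_derived and syn4_derived contain the same governance key-value pairs that appear in syn1.json and syn4.json, while the user only supplies assayType and patientLocation.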