...
Code Block | ||
---|---|---|
| ||
{
"$schema": "http://json-schema.org/draft-07/schema",
"title": "Schema for Some Project",
"$id": "some.project-main",
"description": "This schema defines how DUO should be used with Some Project.",
"allOf": [
{
"$ref": "org.sagebionetworks-repo.model.FileEntity-1.0.0"
},
{
"$ref": "ebispot.duo-duo-1.0.1"
},
{
"if": {
"properties": {
"patientLocation": {
"const": "Germany"
},
"assayType": {
"const": "genomic"
}
}
},
"then": {
"properties": {
"GS": {
"title": "geographical restriction",
"type": "boolean",
"const": true
},
"GS_location": {
"type": "string",
"description": "This data cannot leave Germany",
"const": "Germany"
}
},
"required": [
"GS_location"
]
}
},
{
"if": {
"properties": {
"patientLocation": {
"const": "USA"
},
"assayType": {
"const": "genomic"
}
}
},
"then": {
"properties": {
"sourceGeography": {
"const": "US"
},
"jurisdiction": {
"const": "HIPAA"
},
"dataLabel": {
"const": "De-identified"
}
},
"required": [
"sourceGeography",
"jurisdiction",
"dataLabel"
]
}
}
],
"properties": {
"assayType": {
"description": "Identifies the type of data for this file.",
"type": "string",
"enum": [
"clinical",
"assay",
"imaging",
"genomic"
]
},
"patientLocation": {
"description": "The location of the patient associated with the data",
"type": "string",
"enum": [
"USA",
"Germany"
]
},
"RS": {
"title": "research specific restrictions",
"type": "boolean",
"const": true
},
"RS_research_type": {
"title": "Restricted to cancer research",
"type": "string",
"const": "cancer"
},
"IRB": {
"title": "ethics approval required",
"type": "boolean",
"const": true
},
"MOR": {
"title": "publication moratorium",
"type": "boolean",
"const": true
},
"MOR_date": {
"title": "publication moratorium date",
"type": "string",
"format": "date",
"const": "2022-05-20"
}
},
"required": [
"assayType",
"patientLocation",
"NRES",
"HMB",
"DS",
"POA",
"RS",
"NMDS",
"GSO",
"NPUNCU",
"PUB",
"COL",
"IRB",
"GS",
"MOR",
"TS",
"US",
"PS",
"IS",
"RTN",
"GRU",
"CC",
"NPOA",
"NPU",
"NCU"
]
}
|
...
Note: For this phase we are glossing over the fact that the value of patientLocation is transitive. Its value would be found by joining the patient table with the treatment table, and then joining with the sampleIds of each file. We will attempt to address this in a later phase.
Given that there are 30+ governance annotations for this project, and all of their values can be derived, it does not seem reasonable to ask any user to provide these annotation values. Instead, it would be better if Synapse could “automatically” provide these value-key-pairs for each file using the bound JSON schema and user-provided values.
One of the governance narratives includes a case where the "patientLocation" value for a given file was mistakenly given the wrong value. For example, let's assume that syn4 was incorrectly given "patientLocation": "USA", when the patient's location is actually Germany. Correcting this single value on syn4 would require five other governance annotations to change as well. It might not even be obvious to the user making the correction that these additional changes are needed. Instead, it would be better if Synapse could “automatically” re-derive the governance annotations.
It should be noted that a system that could automatically derive annotations could be useful for many external use cases. For example, one of the main JSON schema use cases involves setting annotations on files that are uploaded in bulk. For some of these use cases, a few key values provided by the upload might be enough to automatically derive the rest of the value-key pairs.
For this discussion we are defining the following terms:
Actual Annotation - This is a value-key-pair that is provided by a user, not the system.
Derived Annotation - This is a value-key-pair that is automatically provided by the system using a combination of the JSON schema and actual annotations.
Derived Annotations Algorithm
Given a valid JSON schema, and a JSON representation of actual annotation value-key-pairs as input, calculate the list of derived annotation value-key-pairs as output. The algorithm must meet the following requirements:
Only actual annotations are to be considered. In other words, a derived annotation value-key-pair cannot be used to derive another value-key-pair.
Only JSON schema properties that are defined to have a constant value (for example: "const": "cancer") will be considered as derived annotation candidates.
If an actual annotation exists with the same key, the candidate will be eliminated. This means that derived annotations will never “correct” invalid actual annotations.
If the candidate is in an unreachable logic branch, then it will be eliminated. For example, if the candidate resides in a "then" block that is unreachable because the corresponding "if" evaluates to “false”, then the candidate will be eliminated.
Any candidate that is not eliminated will be added to the results as a derived annotation value-key-pair.
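The requirements above can be illustrated with a minimal Python sketch. It is not the Synapse implementation: the function names are hypothetical, "$ref" entries are not resolved, and the "if" evaluation assumes conditions that only use "required" and "properties" with "const", which is the shape used in the example schema.

```python
from typing import Any, Dict


def _condition_holds(condition: Dict[str, Any], actual: Dict[str, Any]) -> bool:
    """Evaluate a restricted "if" sub-schema against the actual annotations only.

    Assumption: the condition uses nothing but "required" and
    "properties" entries that carry a "const".
    """
    for key in condition.get("required", []):
        if key not in actual:
            return False
    for key, sub in condition.get("properties", {}).items():
        # In JSON Schema, "properties" only constrains keys that are present.
        if key in actual and "const" in sub and actual[key] != sub["const"]:
            return False
    return True


def derive_annotations(schema: Dict[str, Any], actual: Dict[str, Any]) -> Dict[str, Any]:
    """Return the derived value-key-pairs for one entity (hypothetical helper)."""
    derived: Dict[str, Any] = {}
    # Candidates are properties with a "const"; a candidate is eliminated
    # when an actual annotation already uses the same key.
    for key, sub in schema.get("properties", {}).items():
        if "const" in sub and key not in actual:
            derived[key] = sub["const"]
    for sub_schema in schema.get("allOf", []):
        derived.update(derive_annotations(sub_schema, actual))
    if "if" in schema:
        # Only the reachable branch contributes candidates.
        branch = "then" if _condition_holds(schema["if"], actual) else "else"
        derived.update(derive_annotations(schema.get(branch, {}), actual))
    return derived


# A cut-down version of the example schema, for illustration only.
schema = {
    "properties": {"RS": {"type": "boolean", "const": True}},
    "allOf": [
        {
            "if": {"properties": {"patientLocation": {"const": "Germany"}}},
            "then": {"properties": {"GS_location": {"const": "Germany"}}},
        },
        {
            "if": {"properties": {"patientLocation": {"const": "USA"}}},
            "then": {"properties": {"sourceGeography": {"const": "US"}}},
        },
    ],
}

print(derive_annotations(schema, {"patientLocation": "Germany"}))
```

Because the condition is evaluated against the actual annotations alone, a derived value is never used to derive another value. Re-running the function after correcting "patientLocation" yields the new set of derived annotations without any further user input.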
Derived Annotations API
Derived annotations are to be considered “transient” data. This means they are subject to recalculation any time either the input JSON schema or the actual annotations change. This implies that derived data will not be migrated between stacks, but will instead be recalculated on each stack.
Derived annotations are to be considered separate from the actual annotations of an Entity. For example, an actual annotation is part of the persisted data of an Entity. While a derived annotation might be cached, it will not be part of the persisted data of an Entity.
JSON Schema Binding API Changes
Currently the API: PUT /entity/{id}/schema/binding is used to bind a JSON schema to an Entity. We propose extending the BindSchemaToEntityRequest object to include a new boolean property called “automaticallyIncludeDerivedAnnotations” with a default value of “false”. With this value set to “false”, Synapse will not attempt to calculate derived annotations for the Entities bound to this schema. However, when “automaticallyIncludeDerivedAnnotations=true”, Synapse will automatically calculate the derived annotations for the Entities bound to this schema. Note: This new property value will be persisted with the JSON schema’s binding data.
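As an illustration, a request body with the proposed property might be built as in the sketch below. The helper name is hypothetical, and the "entityId" and "schema$id" field names are assumptions about the existing BindSchemaToEntityRequest object. Only “automaticallyIncludeDerivedAnnotations” comes from this proposal.

```python
import json
from typing import Any, Dict


def build_bind_request(entity_id: str, schema_id: str,
                       derive: bool = False) -> Dict[str, Any]:
    """Build a body for PUT /entity/{id}/schema/binding (hypothetical helper)."""
    return {
        "entityId": entity_id,    # assumed existing field name
        "schema$id": schema_id,   # assumed existing field name
        # Proposed property; defaults to false so existing callers are unaffected.
        "automaticallyIncludeDerivedAnnotations": derive,
    }


body = json.dumps(build_bind_request("syn4", "some.project-main", derive=True))
print(body)
```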
Entity Services API Changes
Currently there are three APIs for getting the annotations of an Entity:
Each API returns the annotations of the given entity id (and, optionally, version). In order to get the derived annotations of an Entity, we propose extending each of these APIs to include a new boolean parameter named “includeDerivedAnnotations” with a default value of “false”. When “includeDerivedAnnotations=true”, the results will include both the actual annotations and the derived annotations.
Note: Each service currently returns the Entity’s “etag” in addition to the annotations. When a user wishes to update the annotations of an entity, they must include the provided “etag” with the update request. However, when “includeDerivedAnnotations=true” each service will not return the “etag”. This is done to prevent the user from accidentally updating the annotations of an Entity with the transient derived annotations.
Note: When “includeDerivedAnnotations=true”, the caller will not be able to distinguish the actual annotations from the derived annotations in the results.
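The two notes above can be sketched as follows. The response shape and the function name are hypothetical and only illustrate the proposed behavior: the “etag” is omitted and the two kinds of annotations are merged into one map when derived annotations are requested.

```python
from typing import Any, Dict


def build_annotations_response(entity_id: str, etag: str,
                               actual: Dict[str, Any],
                               derived: Dict[str, Any],
                               include_derived: bool = False) -> Dict[str, Any]:
    """Assemble an annotations response (hypothetical shape)."""
    if not include_derived:
        # Default behavior: actual annotations plus the etag needed for updates.
        return {"id": entity_id, "etag": etag, "annotations": dict(actual)}
    # Derived keys never collide with actual keys under the algorithm's rules;
    # applying the actual annotations last keeps them authoritative regardless.
    merged = dict(derived)
    merged.update(actual)
    # No etag: the merged result must not be usable in an update request.
    return {"id": entity_id, "annotations": merged}


response = build_annotations_response(
    "syn4", "etag-123",
    actual={"patientLocation": "Germany"},
    derived={"GS_location": "Germany"},
    include_derived=True,
)
print(response)
```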
Entity View API Changes
Currently, an EntityView is configured with a list of ColumnModels that define the schema of that view. Users will typically use the following asynchronous service to get the possible ColumnModels when setting the schema of their views: POST /column/view/scope/async/start. In order to create an EntityView that includes derived columns, we propose extending this API’s request object, ViewColumnModelRequest, to include a boolean parameter named “includeDerivedColumns” with a default value of “false”. When this parameter is set to “true”, the service will include derived columns as possible results. In this way, users will be able to configure their views to include derived columns.
Entity Manifest Changes
Currently, when a user downloads a FileEntity via the packaging option of their download list (POST /download/list/package/async/start), the DownloadListPackageRequest includes an option to include a manifest. When “includeManifest=true”, the package will include a CSV file containing all of the annotations for any FileEntity included in the download. We propose extending this manifest to automatically include all derived annotations.
AccessRequirement API Changes
Currently, an AccessRequirement (AR) includes a list of “subjectIds” that define which Entities (or Teams) the AR applies to.