...
However, it is unlikely that we can simply grant data providers permission to assign ARs as they see fit, due to a conflict of interest. Funding agencies are interested in sharing the data they fund as broadly as is ethically possible to maximize the return on investment. The researchers who receive the funding are typically only interested in sharing their data after they have extracted all possible insights from it. It is common for funders to add sharing conditions to the research they fund to encourage researchers to share. According to our governance team, it is typical for a researcher to select “true” for each question of the DUO questionnaire in an effort to lock down their data. The assumption is that they can claim to have shared the data while at the same time ensuring it is nearly impossible for anyone to actually access it.
...
Note: We will reference each of these ARs by its access requirement ID. The expected annotations column describes the DUO annotation key-value pairs expected on any file that falls under that AR.
The next step is to build a project-specific JSON schema that defines how DUO should be applied. It is expected that the DUO-specific elements and ARs in this schema would be the result of a collaboration between ACT and the Community Manager/Data Curator. The following JSON schema is an example of such a schema that could be bound to the project from the main governance narrative:
```json
{
  "$schema": "http://json-schema.org/draft-07/schema",
  "title": "Schema for Some Project",
  "$id": "some.project-main-1.3",
  "description": "This schema defines how DUO should be used with Some Project.",
  "allOf": [
    {
      "$ref": "org.sagebionetworks-repo.model.FileEntity-1.0.0"
    },
    {
      "$ref": "ebispot.duo-duo-1.0.1"
    },
    {
      "if": {
        "properties": {
          "patientLocation": {
            "const": "Germany"
          },
          "assayType": {
            "const": "genomic"
          }
        }
      },
      "then": {
        "properties": {
          "GS": {
            "title": "geographical restriction",
            "type": "boolean",
            "const": true
          },
          "GS_location": {
            "type": "string",
            "description": "This data cannot leave Germany",
            "const": "Germany"
          },
          "_ar4": {
            "const": true
          }
        },
        "required": [
          "GS_location"
        ]
      }
    },
    {
      "if": {
        "properties": {
          "patientLocation": {
            "const": "USA"
          },
          "assayType": {
            "const": "genomic"
          }
        }
      },
      "then": {
        "properties": {
          "sourceGeography": {
            "const": "US"
          },
          "jurisdiction": {
            "const": "HIPAA"
          },
          "dataLabel": {
            "const": "De-identified"
          }
        },
        "required": [
          "sourceGeography",
          "jurisdiction",
          "dataLabel"
        ]
      }
    }
  ],
  "properties": {
    "assayType": {
      "description": "Identifies the type of data for this file.",
      "type": "string",
      "enum": [
        "clinical",
        "assay",
        "imaging",
        "genomic"
      ]
    },
    "patientLocation": {
      "description": "The location of the patient associated with the data",
      "type": "string",
      "enum": [
        "USA",
        "Germany"
      ]
    },
    "RS": {
      "title": "research specific restrictions",
      "type": "boolean",
      "const": true
    },
    "RS_research_type": {
      "title": "Restricted to cancer research",
      "type": "string",
      "const": "cancer"
    },
    "IRB": {
      "title": "ethics approval required",
      "type": "boolean",
      "const": true
    },
    "MOR": {
      "title": "publication moratorium",
      "type": "boolean",
      "const": true
    },
    "MOR_date": {
      "title": "publication moratorium date",
      "type": "string",
      "format": "date",
      "const": "2022-05-20"
    },
    "_ar1": {
      "const": true
    },
    "_ar2": {
      "const": true
    },
    "_ar3": {
      "const": true
    }
  },
  "required": [
    "assayType",
    "patientLocation",
    "NRES",
    "HMB",
    "DS",
    "POA",
    "RS",
    "NMDS",
    "GSO",
    "NPUNCU",
    "PUB",
    "COL",
    "IRB",
    "GS",
    "MOR",
    "TS",
    "US",
    "PS",
    "IS",
    "RTN",
    "GRU",
    "CC",
    "NPOA",
    "NPU",
    "NCU"
  ]
}
```
...
Notice that line:8 indicates that this schema applies to FileEntities, while line:11 indicates that the schema “extends” the DUO schema. The first two properties of the schema, assayType (line:77) and patientLocation (line:87), are drivers that determine which conditional properties must be applied to each file. There are two if/then blocks (line:14 to 75) that define which conditional properties should be applied based on assayType and patientLocation. The properties RS, RS_research_type, IRB, MOR, and MOR_date are all unconditional properties with constant values that must be applied to all files in the project.
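The interplay between unconditional constants and the if/then blocks can be illustrated with a minimal sketch. It is not how Synapse implements schema evaluation; the function names (`matches`, `const_properties`, `derive_constants`) and the trimmed-down `SCHEMA` are assumptions made for illustration, and only `const` rules are handled.

```python
from typing import Any, Dict

# Trimmed-down, hypothetical copy of the project schema: one if/then block
# plus two unconditional constants.
SCHEMA: Dict[str, Any] = {
    "allOf": [
        {
            "if": {
                "properties": {
                    "patientLocation": {"const": "Germany"},
                    "assayType": {"const": "genomic"},
                }
            },
            "then": {
                "properties": {
                    "GS": {"const": True},
                    "GS_location": {"const": "Germany"},
                    "_ar4": {"const": True},
                }
            },
        }
    ],
    "properties": {
        "RS": {"const": True},
        "_ar1": {"const": True},
    },
}


def matches(condition: Dict[str, Any], annotations: Dict[str, Any]) -> bool:
    """Return True when every `const` in the `if` block equals the annotation.

    Simplification: a missing driver key counts as "no match". Full JSON
    Schema semantics treat an absent property as passing; the example schema
    sidesteps this by listing both drivers under `required`.
    """
    for key, rule in condition.get("properties", {}).items():
        if "const" in rule and annotations.get(key) != rule["const"]:
            return False
    return True


def const_properties(properties: Dict[str, Any]) -> Dict[str, Any]:
    """Collect the key/value pairs that the schema pins with `const`."""
    return {key: rule["const"] for key, rule in properties.items() if "const" in rule}


def derive_constants(schema: Dict[str, Any], annotations: Dict[str, Any]) -> Dict[str, Any]:
    """Combine unconditional constants with those of every matching if/then."""
    derived = const_properties(schema.get("properties", {}))
    for block in schema.get("allOf", []):
        if "if" in block and "then" in block and matches(block["if"], annotations):
            derived.update(const_properties(block["then"].get("properties", {})))
    return derived


german = derive_constants(SCHEMA, {"assayType": "genomic", "patientLocation": "Germany"})
american = derive_constants(SCHEMA, {"assayType": "genomic", "patientLocation": "USA"})
```

In this sketch the German genomic file picks up `GS`, `GS_location`, and `_ar4` in addition to the unconditional constants, while the USA file receives only the unconditional ones.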
...
Synapse would be expected to use the "_ar#" properties as guidance to “automatically” associate files with ARs according to the rules defined in the schema. Specifically, "_ar4": { "const": true } (line:37) indicates that ARid:4 should be applied to any file with the annotations "assayType": "genomic" and "patientLocation": "Germany", while "_ar1", "_ar2", and "_ar3" (line:121 - 129) indicate that ARids 1, 2, and 3 should be applied to all files in the project unconditionally.
We will also add derived annotations with each of these keys and a value of true to each entity. This would allow a view designer to add these columns to a view, thus enabling end-users to query for files based on ARs.
Note: We will need to block users from adding any annotations with the prefix "_ar".
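The prefix rule in the note above could be enforced with a check along these lines. This is only a sketch: `RESERVED_PREFIX`, `reserved_keys`, and `check_user_annotations` are hypothetical names, and raising `ValueError` stands in for whatever error response the service would actually return.

```python
from typing import Any, Dict, List

# Reserved prefix taken from the note above; everything else is illustrative.
RESERVED_PREFIX = "_ar"


def reserved_keys(user_annotations: Dict[str, Any]) -> List[str]:
    """Return the user-supplied keys that start with the reserved prefix."""
    return sorted(key for key in user_annotations if key.startswith(RESERVED_PREFIX))


def check_user_annotations(user_annotations: Dict[str, Any]) -> None:
    """Raise ValueError if a user tries to set a reserved annotation key."""
    offending = reserved_keys(user_annotations)
    if offending:
        raise ValueError(
            f"Annotation keys starting with '{RESERVED_PREFIX}' are reserved: {offending}"
        )


# A normal annotation passes silently; a reserved key is rejected.
check_user_annotations({"assayType": "genomic"})
try:
    check_user_annotations({"assayType": "genomic", "_ar4": True})
    rejected = False
except ValueError:
    rejected = True
```

A plain prefix match also blocks unrelated keys such as "_array". If that turns out to be too broad, a stricter pattern (the prefix followed only by digits) would be one alternative.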
The following JSON is an example of what “valid” properties could be for syn1 from the example above:
```json
{
  "name": "GermanGenomic.data",
  "description": "Genomic data from patients in Germany",
  "id": "syn1",
  "etag": "some-etag",
  "createdOn": "2020-05-20T20:20:39+00:00",
  "modifiedOn": "2020-05-20T20:20:39+00:00",
  "createdBy": "123456789",
  "modifiedBy": "123456789",
  "parentId": "syn444",
  "versionLabel": "one",
  "versionComment": "leaving blank",
  "versionNumber": 1,
  "dataFileHandleId": "98765",
  "fileNameOverride": "",
  "concreteType": "org.sagebionetworks.repo.model.FileEntity",
  "assayType": "genomic",
  "patientLocation": "Germany",
  "NRES": false,
  "HMB": false,
  "DS": false,
  "POA": false,
  "RS": true,
  "RS_research_type": "cancer",
  "NMDS": false,
  "GSO": false,
  "NPUNCU": false,
  "PUB": false,
  "COL": false,
  "IRB": true,
  "GS": true,
  "GS_location": "Germany",
  "MOR": true,
  "MOR_date": "2022-05-20",
  "TS": false,
  "US": false,
  "PS": false,
  "IS": false,
  "RTN": false,
  "GRU": false,
  "CC": false,
  "NPOA": false,
  "NPU": false,
  "NCU": false,
  "_ar1": true,
  "_ar2": true,
  "_ar3": true,
  "_ar4": true
}
```
syn1.json
Since syn1 includes "assayType": "genomic" and "patientLocation": "Germany", it must include "GS": true and "GS_location": "Germany" according to the rules of the first if/then. Most of the properties on lines 19 to 44 are constants based on this project's schema.
...
```json
{
  "name": "USGenomic.data",
  "description": "Genomic data from patients in USA",
  "id": "syn4",
  "etag": "some-etag",
  "createdOn": "2020-05-20T20:20:39+00:00",
  "modifiedOn": "2020-05-20T20:20:39+00:00",
  "createdBy": "123456789",
  "modifiedBy": "123456789",
  "parentId": "syn444",
  "versionLabel": "one",
  "versionComment": "leaving blank",
  "versionNumber": 1,
  "dataFileHandleId": "98765",
  "fileNameOverride": "",
  "concreteType": "org.sagebionetworks.repo.model.FileEntity",
  "assayType": "genomic",
  "patientLocation": "USA",
  "NRES": false,
  "HMB": false,
  "DS": false,
  "POA": false,
  "RS": true,
  "RS_research_type": "cancer",
  "NMDS": false,
  "GSO": false,
  "NPUNCU": false,
  "PUB": false,
  "COL": false,
  "IRB": true,
  "GS": false,
  "MOR": true,
  "MOR_date": "2022-05-20",
  "TS": false,
  "US": false,
  "PS": false,
  "IS": false,
  "RTN": false,
  "GRU": false,
  "CC": false,
  "NPOA": false,
  "NPU": false,
  "NCU": false,
  "sourceGeography": "US",
  "jurisdiction": "HIPAA",
  "dataLabel": "De-identified",
  "_ar1": true,
  "_ar2": true,
  "_ar3": true
}
```
syn4.json
Since syn4 has "assayType": "genomic" and "patientLocation": "USA", according to the if/then statements it must also include the following constant properties: "sourceGeography": "US", "jurisdiction": "HIPAA", and "dataLabel": "De-identified". Note: For syn4, "GS": false because the patient location is not Germany.
...
In both of these examples (syn1 & syn4) we included annotations for the "_ar#" keys that are part of the schema. These are special keys that help Synapse determine how to associate ARs, and their values are derived by Synapse rather than set by users. In fact, Synapse should reject any attempt by a user to set an annotation with the special key "_ar#". Both files have true for AR IDs 1-3, since those ARs are unconditional. Syn1 has "_ar4": true, indicating that it requires the conditional AR. Syn4 excludes "_ar4" since it does not meet that condition.
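As a rough illustration of how the "_ar#" annotations map back to AR IDs, the sketch below reads them off the two example entities. The helper name `access_requirement_ids` and the regular expression are assumptions for illustration only, and the entities are abbreviated to the relevant keys.

```python
import re
from typing import Any, Dict, List

# Hypothetical pattern: the reserved prefix followed by the numeric AR ID.
AR_KEY = re.compile(r"^_ar(\d+)$")


def access_requirement_ids(annotations: Dict[str, Any]) -> List[int]:
    """Return the sorted AR IDs whose "_ar#" annotation is set to true."""
    ids = []
    for key, value in annotations.items():
        match = AR_KEY.match(key)
        if match and value is True:
            ids.append(int(match.group(1)))
    return sorted(ids)


# Abbreviated versions of syn1 and syn4 from the examples above.
syn1 = {
    "assayType": "genomic",
    "patientLocation": "Germany",
    "_ar1": True,
    "_ar2": True,
    "_ar3": True,
    "_ar4": True,
}
syn4 = {
    "assayType": "genomic",
    "patientLocation": "USA",
    "_ar1": True,
    "_ar2": True,
    "_ar3": True,
}

syn1_ars = access_requirement_ids(syn1)
syn4_ars = access_requirement_ids(syn4)
```

Here syn1 resolves to AR IDs 1 through 4 and syn4 to AR IDs 1 through 3, matching the description above.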
Derived Annotations
The examples above for syn1 and syn4 indicate that all of the governance-specific metadata is derived from two sources:
...