Governance Stories
Users:
ACT Member - A Member of the Access and Compliance Team (ACT). This user has general access and compliance knowledge and will work with a member of the Governance Team and Community Manager to establish sharing conditions on the platform.
Governance Member - A Member of the Sage Governance team. This user is dedicated to making data as open as ethically possible. This user is responsible for developing the underlying governance model for projects (i.e., ensuring that sharing is in compliance with laws and regulations, drafting agreements and policies, developing the underlying schema, etc).
Community Manager - The person responsible for coordinating all aspects of the Synapse project. This person might not know all of the details, but they will know who to contact for anything related to the project. This person may be internal or external to Sage.
Funder - The person(s) that funds the research project.
PI - The Principal Investigator (PI) is responsible for running a research project that will generate all of the data. The PI is responsible for following institutional policies, including human subjects research requirements, and has knowledge of restrictions on secondary data use.
Lab Technician - This person is knowledgeable about running lab assays and will be familiar with the data files generated by those assays. This person will often be the generator of data files but they might not be privy to the details of the broader research project.
Clinician - A medical professional who will have direct contact with patients.
Data Curator - A person with expertise in both curation and quality control of data.
Data consumer - A research scientist that is not part of the PI’s team, who may or may not work in the same field, and has identified this data as a dataset of interest.
Pre-Release Narrative:
A PI has just been awarded a grant to fund a novel cancer research project. The funder added a condition to the grant requiring that the data generated by this project be made available to other scientists after the PI has published their initial findings. The funder recommends that the PI publish their data in Synapse. The funder works with Governance to define what the sharing criteria will be for the study, e.g., who can access the data. The PI works with Governance to convey additional conditions of use on the data based on the PI’s informed consent process, legal obligations, and/or institutional policies. With this information Governance drafts sharing agreements between institutions, and ACT drafts access restrictions (ARs) for the platform.
The PI works with a community manager who is familiar with both this type of research and Synapse. The community manager’s job will be to coordinate the data ingress process, data curation, and the creation of a new data portal for this data.
Part of this research project will be to gather bio-samples from patients in the United States and Germany. The PI works with the clinicians of two separate and independent clinics, one in Seattle and the other in Frankfurt. The clinicians’ role will be to gather bio-samples from patients before and after a treatment. The clinicians will provide a table that maps each bio-sample ID to a patient ID and treatment. The clinicians will also provide a table of patient data, including information such as age, diagnosis, and the patient’s geographical location.
The PI hires two separate independent laboratories to run multiple assays on the bio-samples. One laboratory is in Seattle and will process all of the US samples, while the other is in Frankfurt and will process all of the German samples.
The community manager creates two folders within the project, one for the Seattle lab’s data and another for the Frankfurt lab’s data. The community manager instructs lab technicians from each lab to upload the data they generate to their respective Synapse folders using the Synapse R client. The technicians are also instructed to annotate each file with its appropriate data type. There are four data types generated by each lab - clinical data, assay data, imaging data, and genomic data. All of the data is keyed by sample ID. The community manager maps two of the data types as “genomic” information, knowing that this data is managed by Sage through controlled access.
The PI engages with the Governance and ACT members and describes any conditions of use on the data, and this involves answering relevant DUO questions. Based on the underlying governance model, and conditions conveyed by the PI, the following DUO attributes are determined to be “true” for genomic data from Germany:
Disease Specific Restriction - Data should only be accessible for cancer research.
Ethics Approval Required - Data consumers agree to have their intended use of the data approved by their institutional review board (IRB) or independent ethics committee (IEC).
Geographical Restriction - Genetic data from Germany cannot leave the country.
Project Specific Restriction - Intended use of the data should be outlined in a project/research description that is reviewed and approved or denied by the PI.
Publication Moratorium - Data consumers agree to not publish any findings with the data prior to May 20th of next year.
Based on the DUO extension created by Sage Governance, the following attributes are also associated with the US data:
Source Geography - US
Jurisdiction - HIPAA
Data Label - De-identified
Data Sensitivity - HIPAA Safe Harbor De-identified (clinical and assay data); Images and Biometric Data (imaging data); Large-Scale Omics (genomic data)
Data Tier - Registered (clinical and assay data); Controlled (imaging and genomic data)
Release Date - Data may be released to the scientific community on May 1, 2022.
Attribution - [Generic attribution statement with funder acknowledgment]
The community manager engages with the Governance and ACT members and describes the types of data that will be gathered, how it is organized, any processing of the data that will take place, and/or any other requirements for managing the data, e.g., maintaining data from Germany in a German S3 bucket. Part of this engagement is to coordinate the release of that data to the research community with the appropriate access requirements.
Once data is ready for release, the ACT Member reviews the conditions of use on the data and applies an access requirement (AR) on the data as follows:
Genomic data from Germany:
Click Wrap - Users confirm that they understand data should only be used for cancer research, that genomic data may not be transferred outside Germany, and that any findings resulting from use of the data may not be published prior to May 20th of next year.
Managed AR - Users complete a free text box describing their intended use of the data.
Managed AR - Users upload a copy of their IRB/IEC letter.
Genomic data from the US:
Click Wrap - Users confirm that they understand data should only be used for cancer research and that any findings resulting from use of the data may not be published prior to May 20th of next year.
Managed AR - Users complete a free text box describing their intended use of the data.
Managed AR - Users upload a copy of their IRB/IEC letter.
All other data:
Click Wrap - Users confirm that they understand data should only be used for cancer research and that any findings resulting from use of the data may not be published prior to May 20th of next year.
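The mapping above, from a file's data type and source geography to its access requirements, could be sketched as a small rule function. This is only an illustration of the three cases listed in this story; the class, field names, country codes, and AR labels are hypothetical and are not part of any Synapse API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FileGovernance:
    """Hypothetical governance annotations for a single file."""
    data_type: str         # "clinical", "assay", "imaging", or "genomic"
    source_geography: str  # "US" or "DE" (assumed codes)


def required_access_requirements(gov: FileGovernance) -> List[str]:
    """Return the ARs this story applies to a file with the given annotations."""
    # Every file carries the disease restriction and the publication moratorium.
    terms = ["cancer research only", "publication moratorium"]
    is_genomic = gov.data_type == "genomic"
    if is_genomic and gov.source_geography == "DE":
        # Only German genomic data carries the geographical restriction.
        terms.insert(1, "no transfer outside Germany")
    ars = ["Click Wrap: " + "; ".join(terms)]
    if is_genomic:
        # Genomic data additionally requires two managed ARs.
        ars.append("Managed AR: intended-use statement")
        ars.append("Managed AR: IRB/IEC letter")
    return ars
```

Keeping the rules in one function means the three cases stay consistent with each other if a shared term, such as the moratorium, changes.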
Release Narrative:
Once the PI has published their manuscript (on or around May 20th of next year), data is released to the scientific community. Other researchers may locate the data based on a “cancer” query. Once they have identified this data as data of interest, governance metadata provides additional information about how they can access and use this data. Researchers may then determine whether it is worthwhile to them to fulfill the access requirements.
In addition, other repositories linking to this data, such as in a portal view, can determine how and what governance metadata to surface to their users regarding how users can access and use this data.
Data correction Narrative:
The data curator detects that the geographical locations of two patients were transposed. They fix the issue by making the appropriate corrections in the patient table.
This data change has an effect on how the data associated with each patient should be governed. The data of the US patient should be less restricted, while the data of the German patient should be more restricted. Ideally, the system would detect this change and automatically apply the appropriate restrictions. Is an audit needed to determine if anyone unknowingly moved German genomic data out of the country of origin due to the original transposition?
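One way the system might detect which files are affected by such a correction is to compare the patient table before and after the fix and follow the sample-to-patient mapping. The sketch below assumes both tables are available as plain dictionaries; the function and parameter names are hypothetical.

```python
def samples_needing_regovernance(sample_to_patient, old_geography, new_geography):
    """Return sorted sample IDs whose patient's geography changed.

    sample_to_patient: dict of sample ID -> patient ID (the clinicians' mapping table)
    old_geography / new_geography: dict of patient ID -> location, before and after the fix
    """
    changed_patients = {
        patient
        for patient, location in new_geography.items()
        if old_geography.get(patient) != location
    }
    return sorted(
        sample
        for sample, patient in sample_to_patient.items()
        if patient in changed_patients
    )
```

The returned sample IDs identify the files whose restrictions should be re-evaluated, and they are also a starting point for the audit question raised above.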
Data Discovery Narrative:
An independent cancer researcher wishes to find data that is restricted to their area of expertise. The researcher currently works in the US and specializes in genomic data. Therefore, they want to find all data restricted to cancer research that can be accessed in the US.
A data consumer works for a pharmaceutical company. They need to find data that is relevant to their area of research, but the data must also be available for commercial use. Therefore, they wish to filter out all data that is restricted to non-commercial use.
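Both discovery stories amount to filtering datasets by their governance metadata. The following is a minimal sketch, assuming each dataset carries a dictionary of restrictions; the keys `disease_specific`, `geographic`, and `non_commercial_only` are hypothetical stand-ins for the corresponding DUO attributes, and a geographic restriction is modeled as "the consumer must be in that country". Note that it returns every dataset the consumer is permitted to use, including datasets with no disease restriction at all.

```python
def find_usable(datasets, *, research_area, country, commercial_use):
    """Return names of datasets the consumer is permitted to use."""
    usable = []
    for ds in datasets:
        r = ds["restrictions"]
        # A disease-specific restriction must match the consumer's research area.
        if r.get("disease_specific") not in (None, research_area):
            continue
        # A geographic restriction must match the consumer's country.
        if r.get("geographic") not in (None, country):
            continue
        # Commercial users cannot use data limited to non-commercial use.
        if commercial_use and r.get("non_commercial_only", False):
            continue
        usable.append(ds["name"])
    return usable


datasets = [
    {"name": "de_genomic", "restrictions": {"disease_specific": "cancer", "geographic": "DE"}},
    {"name": "us_genomic", "restrictions": {"disease_specific": "cancer"}},
    {"name": "academic_only", "restrictions": {"non_commercial_only": True}},
]
```

With this sample data, a US-based cancer researcher would see `us_genomic` and `academic_only`, while a commercial consumer in the US would see only `us_genomic`.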
Data Download Narrative:
A data consumer has downloaded many files from many projects in Synapse. The data consumer needs to keep track of the various restrictions that they agreed to for each file they download. One tool for keeping track of this would be to include this metadata in the “manifest” file that accompanies their data download. It would also be useful for the manifest to include all of the other metadata, such as patient information and treatment.
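A manifest along these lines could be sketched as a CSV with one row per downloaded file and an extra column listing the restrictions the consumer agreed to. The column names below are hypothetical and do not reflect the actual Synapse manifest format.

```python
import csv
import io


def build_manifest(files):
    """Return CSV text with one row per file, including agreed restrictions."""
    fieldnames = ["file_id", "patient_id", "treatment", "restrictions_agreed"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for f in files:
        writer.writerow({
            "file_id": f["file_id"],
            "patient_id": f["patient_id"],
            "treatment": f["treatment"],
            # Join multiple restrictions into a single cell.
            "restrictions_agreed": "; ".join(f["restrictions"]),
        })
    return buffer.getvalue()
```

Because the restrictions travel with the manifest, the consumer can review their obligations for each file without returning to the platform.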
Embargo Change Narrative:
Due to unforeseen circumstances, the PI will not be able to publish their results by the originally agreed-upon date of May 20th. The PI estimates they will need a three-month extension. Ideally, there would be a single place where the embargo date can be changed and automatically propagated to all affected files. Should all data consumers that have already accessed the data be notified of the new embargo date?
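One possible design for "a single place" is to have files reference a shared embargo record rather than store the date themselves, so that one update takes effect everywhere. The sketch below is illustrative only: `EmbargoRegistry`, the embargo ID, and the year 2030 are placeholders, not part of the story or of Synapse.

```python
from datetime import date


class EmbargoRegistry:
    """Single source of truth for embargo dates, keyed by a hypothetical embargo ID."""

    def __init__(self):
        self._dates = {}

    def set_date(self, embargo_id, new_date):
        self._dates[embargo_id] = new_date

    def is_embargoed(self, file, today):
        # Files store only the embargo ID, so a date change needs no per-file update.
        return today < self._dates[file["embargo_id"]]


registry = EmbargoRegistry()
registry.set_date("study-embargo", date(2030, 5, 20))  # placeholder year
files = [
    {"id": "f1", "embargo_id": "study-embargo"},
    {"id": "f2", "embargo_id": "study-embargo"},
]

# The three-month extension is a single update that every file picks up.
registry.set_date("study-embargo", date(2030, 8, 20))
```

The alternative, copying the date onto every file, would require finding and rewriting each affected annotation whenever the embargo changes.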
Review Delegation Narrative:
The community manager of the project has been properly trained to approve/reject access request submissions for one of the managed ARs in the project. A member of the ACT should be able to grant the community manager the “REVIEW” permission in order to delegate the approval process.
Auditing Narrative:
On a biannual basis, ACT conducts an audit of data stored on the platform. Among other things, this audit is intended to surface potential threats to the integrity of the platform, including:
A Synapse user who intentionally or inadvertently accesses controlled-access data without qualification, and
Inappropriate egress of data, e.g., data moving from controlled-access to open-access without rationale or controlled-access data being duplicated and shared under different access conditions.
The ACT Member queries the data warehouse to identify an entity that has been “released”, i.e., viewable by platform users, prior to the entity’s annotated Release Date.
The ACT Member queries the data warehouse to identify any conflicts in entities' governance annotations and User Profiles or Passports. A potential finding is followed up with community managers and users to confirm whether access was inadvertently granted.
The ACT Member queries the data warehouse to identify any conflicts in entities' governance annotations and ARs. A potential finding, such as an entity (or duplicate of an entity) moving from a more restricted to a less restricted state, is cross-referenced with the project’s underlying governance model to confirm that the annotations are aligned with the project’s sharing expectations. If the annotations are not aligned with sharing expectations or the AR is not aligned with the annotations, the ACT Member follows up with the community manager and/or the PI to confirm how data should be shared.
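The first audit query above, finding entities that were released before their annotated Release Date, can be sketched as a simple comparison over warehouse records. The record fields `released_on` and `release_date` are assumptions made for illustration; the actual warehouse schema is not described in this document.

```python
from datetime import date


def released_early(entities):
    """Return IDs of entities made viewable before their annotated Release Date."""
    return [
        e["id"]
        for e in entities
        # released_on is None for entities that have not been released yet.
        if e["released_on"] is not None and e["released_on"] < e["release_date"]
    ]


entities = [
    {"id": "syn1", "released_on": date(2022, 4, 15), "release_date": date(2022, 5, 1)},
    {"id": "syn2", "released_on": date(2022, 5, 1), "release_date": date(2022, 5, 1)},
    {"id": "syn3", "released_on": None, "release_date": date(2022, 5, 1)},
]
```

Here only `syn1` would be flagged for follow-up with the community manager.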
Reporting Narrative:
A community manager pulls reports for funders that demonstrate the ‘health’ of the repository/community. This report provides an update on project milestones, e.g., number of studies with data contribution and volume of data contributed, as well as transparency for how data is being shared, e.g.,
Volume of data contributed to
Open access tier
Registered access tier
Restricted access tier
Controlled access tier
Data access requests
Number approved, rejected, and total requests
Length of time between access request and access decision (i.e., approved, denied)
Requests by study, data type, &/or tier.
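The data access request metrics listed above could be computed from a list of request records, as in the sketch below. The record fields (`status`, `submitted_at`, `decided_at`) and status values are hypothetical.

```python
from collections import Counter
from datetime import datetime


def summarize_requests(requests):
    """Summarize counts by status and mean days from request to decision."""
    counts = Counter(r["status"] for r in requests)
    # Pending requests have no decision time and are excluded from the mean.
    waits = [
        (r["decided_at"] - r["submitted_at"]).total_seconds() / 86400
        for r in requests
        if r["decided_at"] is not None
    ]
    return {
        "total": len(requests),
        "approved": counts["approved"],
        "rejected": counts["rejected"],
        "mean_days_to_decision": sum(waits) / len(waits) if waits else None,
    }
```

Grouping the same records by study, data type, or tier before calling the function would yield the per-category breakdowns.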
An ACT Member pulls reports for compliance purposes, including ensuring data is not shared prior to its release date, ensuring appropriate friction is applied based on data sensitivity, assessing which datasets are subject to changes in legislation, and confirming that appropriate licenses have been applied.
Patient Removal Narrative:
A patient who originally participated in the study has decided that they want to be removed from the study. In this case, all data related to this patient must be removed. What else needs to happen in this case?