Governance Glossary

 

Term

Definition

Discussion/Guidance

References

 

 

1

Access Renewal

 

Resubmission of a Synapse Access Request by its accessor(s) to enable continued access to data governed by a Managed AR with an access expiration period.

*Applicability: Synapse

Access renewal settings, including specified intervals, are set when an Access Requirement is set up by ACT. Renewals and their intervals are determined as part of the Conditions of Use for the data.

Automated Synapse emails are sent to request submitters 2 months and 1 month before their access is set to expire. If the user does not resubmit an Access Request by the expiration date, they will lose access to the respective data. Most access renewal periods are yearly, but ACT can customize the period during Access Requirement setup.
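
The reminder schedule described above (2 months and 1 month before expiration) can be sketched as simple date arithmetic. This is an illustration only, not Synapse's implementation: the function names are hypothetical, and clamping to the last day of a shorter month is an assumption about how "one month before" is interpreted.

```python
import calendar
from datetime import date


def months_before(d: date, months: int) -> date:
    """Return the date `months` calendar months before `d`.

    Assumption: if the target month is shorter, the day is clamped
    to that month's last day (e.g. 31 March -> 28 February).
    """
    total = d.year * 12 + (d.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def renewal_reminder_dates(expiration: date) -> list[date]:
    """Hypothetical helper: reminder dates 2 months and 1 month before expiry."""
    return [months_before(expiration, 2), months_before(expiration, 1)]


def has_lost_access(expiration: date, today: date, resubmitted: bool) -> bool:
    """Access lapses if no renewal was resubmitted by the expiration date."""
    return today > expiration and not resubmitted
```

For a yearly renewal period, the expiration date passed in would be one year after the approval date; a customized period set by ACT would simply change that input.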

 

 

2

Access Requirement (AR)

 

A data restriction, or lock, placed on a Synapse entity (a folder, file, project, or team) and set according to the data contributor’s established Conditions of Use, defining the requirements that a user must meet in order to be allowed access to the entity.

*Applicability: Synapse

ARs are applied to controlled-access data and may be applied in the form of a Managed AR and/or a Click-wrap. A Managed AR requires a user to submit an Access Request. Access Requests may be reviewed and approved by members of the Synapse Access and Compliance Team (ACT) or a Data Access Committee (DAC). A Click-wrap requires a user to read data terms and conditions and click "I accept" before obtaining access. Click-wraps do not require ACT or DAC approval. ARs can be set up for projects, folders, tables, and teams.

 

 

3

Access Request

 

An electronic application submitted via Synapse by a user seeking access to controlled-access data protected by a Managed AR. The user must fulfill the terms and conditions of the AR, and the request must be reviewed and approved by either the Access and Compliance Team (ACT) or a Data Access Committee (DAC). An Access Request may be submitted by a single user on behalf of several collaborating Synapse users at their institution.

*Applicability: Synapse

Once the designated request reviewer - either Access and Compliance Team (ACT) or Data Access Committee (DAC) - issues an approval or rejection of an Access Request via Synapse, an email is generated and sent to the submitter of the Access Request.

If the Access Request is approved, the approval email will alert the submitter that they now have access to the Synapse entities associated with the Managed AR for which the request was submitted.

If the Access Request is rejected, the rejection email will include notes from the reviewer explaining the reason for rejection and guidance for successful resubmission, and will redirect the submitter to the Managed AR in Synapse to resubmit their Access Request. Approval/rejection emails are sent only to the user who submitted the Access Request and not to other users who may also have been listed on the Access Request.

 

 

4

Access Request Submitter

 

A Synapse user who submits an Access Request via Synapse for access to controlled-access data protected by a Managed AR. The Access Request Submitter may submit the Access Request on their own behalf only or may also list additional collaborating Synapse users from a single institution.

*Applicability: Synapse

A single Access Request will have a single submitter via Synapse who completes and submits the Access Request and is the only Synapse user who will receive approval/rejection emails generated for the Access Request. Multiple collaborating Synapse users from a single institution may be included by the submitter for data access through a single Access Request, but these additional users are not considered Access Request Submitters and will not receive approval/rejection emails generated for the Access Request.

 

 

5

Acknowledgement Statement

Attribution Statement

 

A statement set forth by a Data Contributor to be used by data recipients to include in publications, talks, presentations, etc., to ensure the Data Contributor (and any other relevant bodies, such as participants or funders, or Sage Bionetworks) are recognized for their efforts surrounding the data.

*Applicability: Synapse

Acknowledgement Statements are usually posted on the project wiki page or directly in a click-wrap agreement.

 

 

6

Access and Compliance Team (ACT)

 

A Sage Governance sub-team that has Synapse administration privileges enabling members to process access requests, create and manage ARs, validate user profiles, and take other administrative actions for governance purposes.

*Applicability: Synapse

 

 

 

7

Access Tiers

A categorization used to designate the level of restriction that should be applied to data based on factors such as risk of identifiability or limitations on use.

*Applicability: Synapse/General

Access Tiers are defined by Governance in a manner appropriate to each individual study. Terms used to describe access tiers include “Open/Anonymous/Whitelisted”, “Registered,” “Restricted,” “Controlled,” and “Controlled-Plus.”

  • Open/Anonymous/Whitelisted: data that is available for anyone on the web without requiring them to fulfill Conditions of Use

  • Registered: data that is available to registered users of Synapse

  • Restricted/Controlled: data that is available to registered users of Synapse who fulfill specific requirements for data access, such as submitting an Intended Data Use statement, agreeing to data use limitations, becoming Certified Users in Synapse, and/or undergoing Profile Validation.

  • Controlled-Plus: data that is restricted/controlled and is sensitive enough that additional prerequisites are required such as submitting an IRB approval letter or other institutional documentation.

 

 

8

Aggregate Data

Data produced by grouping information into categories and combining values within these categories.

*Applicability: General

Also known as tabular data or macrodata. Often presented in tables. Because aggregate data combines individual-level data, the term is often used to describe data from which it is “less easy” to identify individual subjects; however, disclosure risks can arise if a user can access multiple tables containing common data elements. Data reduction treatments (such as combining categories so sample sizes within categories represent a larger n) and data modification treatments (such as rounding or adding perturbations so the potential for re-identification is reduced) are example methods that can be applied to aggregate data as part of a robust data privacy strategy.
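
The two treatments named above can be illustrated with a small sketch. The function names, the "Other" label, the minimum cell size, and the rounding base are all hypothetical choices for illustration; they are not thresholds prescribed by Sage or by the cited guide.

```python
from collections import Counter


def combine_small_categories(counts, min_n, other_label="Other"):
    """Data reduction: merge categories with fewer than `min_n` subjects
    into a single catch-all category so each cell represents a larger n."""
    combined = Counter()
    for category, n in counts.items():
        combined[category if n >= min_n else other_label] += n
    return dict(combined)


def round_to_base(counts, base=5):
    """Data modification: round each count to the nearest multiple of `base`
    (halves round up), so exact small counts are not disclosed."""
    return {c: base * ((n + base // 2) // base) for c, n in counts.items()}


table = {"A": 40, "B": 3, "C": 2, "D": 27}
reduced = combine_small_categories(table, min_n=5)
published = round_to_base(reduced, base=5)
```

Note that the merged category can itself remain small, so the result would still need review before release.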

Definition Source: Data Confidentiality Guide, Australian Bureau of Statistics

 

9

Anonymous Data

Anonymized Data

(1) Broad Definition:

Individual-level data that has been stripped of personally identifiable information.

(2) Enhanced Definition:

Individual-level data that cannot be used alone or with other data to identify a unique individual.

*Applicability: General

Anonymization performed through simple de-identification techniques is useful as a primary safeguard for protecting privacy, but a growing body of literature has shown that as the size and diversity of available data grows, the likelihood of being able to re-identify individuals also grows substantially.

When communicating the protectiveness of de-identification, “anonymization” should be used carefully so as not to mislead participants or the community into believing that “anonymized” data, without additional treatment or analysis, is robustly protected against future re-identification.

In the “Enhanced Definition,” the data cannot be coded such that a link to the identifiers existing in a separate, existing data set could re-identify the individual.

 

 

10

Anonymous User

 

A Synapse user interacting with the platform without creating (or logging into) a Synapse account.

*Applicability: Synapse

Anonymous Users are able to review platform features, public resources (including the catalog of public projects, files, and tables), and other Anonymous Access Data.

Anonymous Users cannot create Projects in Synapse, upload or download data, add wiki content, or comment in discussion forums.

 

 

11

Biometric Data

Personal data (see below) resulting from specific technical processing relating to the physical, physiological or behavioral characteristics of a natural person, which allow or confirm the unique identification of that natural person, such as facial images or dactyloscopic data.

*Applicability: General, GDPR

Uses: This definition may be used broadly outside of the scope of GDPR, but the definition source is from GDPR.

Definition Source: GDPR Article 4. See also: 21 CFR 11.3(b)(3)

 

12

Breach

(1) General Applicability:

The loss of control, compromise, unauthorized disclosure, unauthorized acquisition, or any similar occurrence where (1) a person other than an authorized user accesses or potentially accesses personally identifiable information or (2) an authorized user accesses or potentially accesses personally identifiable information for an other than authorized purpose.

(2) HIPAA Applicability:

The acquisition, access, use, or disclosure of protected health information in a manner not permitted [by the regulations] which compromises the security or privacy of the protected health information.

*Applicability and Uses - Definition 1: Can be used broadly for breach incidents that do not involve HIPAA-regulated data. This definition is adopted from the Office of Management and Budget; however, it does not carry regulatory weight.

*Applicability and Uses - Definition 2: Applies only to Breaches subject to HIPAA regulations.

 

Definition Source, HIPAA: 45 CFR 164.402

Definition Source, General: OMB M-17-12

 

13

Business Associate (under HIPAA)

A business associate, with respect to a covered entity, is a person or entity who:

On behalf of such covered entity […] creates, receives, maintains, or transmits protected health information for a function or activity regulated by [HIPAA], including claims processing or administration, data analysis, processing or administration, utilization review, quality assurance, patient safety activities […], billing, benefit management, practice management, and repricing.

*Applicability: HIPAA

Uses: This definition should only be applied to situations where Sage is agreeing to take on a Business Associate role.

A Business Associate role can only be taken on when a formal contract (Business Associate Agreement) meeting regulatory requirements has been executed. When an organization agrees to be a Business Associate, HIPAA regulations are applied in full effect (including enforcement requirements, breach requirements and the possibility of penalties).

Definition Source: 45 CFR 160.103

 

14

Business Associate Agreement (BAA)

A contract between a covered entity and the entity or person agreeing to participate in activities on behalf of the covered entity. The BAA establishes the permitted and required uses and disclosures of protected health information by the business associate.

*Applicability: HIPAA

Uses: This definition should only be applied to situations where Sage is agreeing to take on a Business Associate role.

Definition Source: 45 CFR 164.502(e)

 

15

Certified User

A Synapse user who has created a Synapse ID, has logged into Synapse using their email and password, and has successfully completed the Certification Quiz.

*Applicability: Synapse

To become a Certified User, a Registered User must pass a short quiz concerning the Synapse Commons Data Use Procedure to ensure the user understands the rules and policies that govern data sharing on Synapse.

Certified Users have access to full Synapse functionality, including the ability to upload files and tables as well as create folders.

 

 

16

Certification Quiz

A quiz which is taken by a Registered User to become a Certified User and ensures the user understands the rules and policies that govern data sharing on Synapse.

*Applicability: Synapse

To become a Certified User, a Registered User must pass a short quiz concerning the Synapse Commons Data Use Procedure to ensure the user understands the rules and policies that govern data sharing on Synapse.

The Certification Quiz is 15 questions and takes approximately 15-20 minutes to complete.

 

 

17

Certificate of Confidentiality (CoC)

A CoC is a tool to protect information, documents, and/or biospecimens that contain identifiable, sensitive information related to a research participant. It is intended to protect the privacy of research participants by prohibiting disclosure of identifiable, sensitive research information to anyone not connected to the research, except when the participant consents or in a few other specific situations.

*Applicability: NIH, General

CoCs are:

  • Established by the Public Health Service Act §301(d), 42 U.S.C. §241(d), “Protection of privacy of individuals who are research subjects”

  • Applicable only to human subjects research studies in which identifiable, sensitive information is collected or used.

  • Issued by NIH and other HHS agencies (e.g., CDC) for research studies.

    • Since 2017, NIH automatically issues CoCs for any NIH-funded research meeting their criteria.

    • Researchers can also apply for a CoC for non-NIH funded research studies.

For more information, see the NIH FAQs page.

OHRP Guidance: Certificates of Confidentiality - Privacy Protection for Research Subjects

NIH CoC

 

18

Click-wrap

A type of Access Requirement placed on a Synapse entity (a folder, file, project, or team) that the user can satisfy by reviewing the data contributor’s conditions and clicking the button that states “I accept the terms of use.”

*Applicability: Synapse

Click-wraps generally contain Terms and Conditions of data use (i.e., what you can and cannot do with the data) and often contain an Acknowledgement Statement.

 

 

19

Community Governance

Policies, processes, and structures that guide and oversee the research activities such as the research design, data collection, analysis, tools, methods, and dissemination. Community Governance:

1) ensures the ethical, responsible, and accountable conduct of research activities; and

2) protects the rights and well-being of research participants and maintains the integrity of the research process.

*Applicability: General

Who is involved: Research consortia, steering committees, funders

 

 

20

Conditions of Use

A set of expectations and/or terms for data access applied to Synapse content.

*Applicability: Synapse

Conditions of Use are organized to help Requesters comply with the terms under which the data were collected or with other human subjects regulations. Data Contributors collaborate with ACT to set up Conditions of Use in the form of an Access Requirement.

 

 

21

Coded Data

Data is coded when:

  1. Identifying information (such as name or social security number) that would enable the investigator to readily ascertain the identity of the individual to whom the private information or specimens pertain has been replaced with a number, letter, symbol, or combination thereof (i.e., the code); and

  2. A key to decipher the code exists, enabling linkage of the identifying information to the private information or specimens.

*Applicability: General

Uses: This definition may be used broadly.
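
The two conditions in the definition (identifiers replaced with a code, and a key that links codes back to identifiers) can be sketched as follows. This is a minimal illustration, not a Sage or OHRP procedure: the function name, the `code` field, and the use of random hexadecimal codes are assumptions.

```python
import secrets


def code_records(records, id_field):
    """Replace the identifying field in each record with a random code.

    Returns (coded_records, key). The key maps code -> identifier and is
    what makes the data "coded" rather than anonymized; it would be held
    separately from the coded records.
    """
    code_for = {}  # identifier -> code, so repeat subjects share one code
    used = set()
    coded = []
    for record in records:
        identifier = record[id_field]
        if identifier not in code_for:
            code = secrets.token_hex(8)
            while code in used:  # regenerate on the (unlikely) collision
                code = secrets.token_hex(8)
            used.add(code)
            code_for[identifier] = code
        stripped = {k: v for k, v in record.items() if k != id_field}
        stripped["code"] = code_for[identifier]
        coded.append(stripped)
    key = {code: identifier for identifier, code in code_for.items()}
    return coded, key


records = [
    {"name": "Ada", "hr": 61},
    {"name": "Bo", "hr": 72},
    {"name": "Ada", "hr": 64},
]
coded, key = code_records(records, "name")
```

Anyone holding only `coded` sees no names, while the holder of `key` can re-link each code to an individual.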

Definition Source: OHRP Guidance: Coded Private Information or Specimens Use in Research (2008)

 

22

Covered Entity

(“HIPAA Covered Entity”)

Covered entity means:

(1) A health plan.

(2) A health care clearinghouse.

(3) A health care provider who transmits any health information in electronic form in connection with a transaction covered by this subchapter (45 CFR 160.102).

*Applicability: HIPAA

Uses: This definition should only be used to determine whether an institution (entity) is subject to HIPAA regulations.

Sage is not a covered entity. Covered entities are generally organizations engaged in health care operations that cause them to be subject to HIPAA laws.

Related Definitions: Hybrid Entity, Business Associate

Definition Source: 45 CFR 160.103

 

23

Creative Commons License

One of several public copyright licenses that enable the free distribution of an otherwise copyrighted work. A Creative Commons license is used when an author wants to give other people the right to share, use, and build upon a work that the author has created.

*Applicability: Synapse, General

This is required for most data in the Open Access Data Tier.

https://creativecommons.org/licenses/

 

24

Data Access Committee (DAC)

An individual or group who reviews and approves or rejects applications or requests for access to and use of data governed by a Managed AR.

*Applicability: Synapse, General

The Access and Compliance Team (ACT) serves as the Sage Data Access Committee (DAC).

 

 

25

Data Concerning Health

Personal data (see below) related to the physical or mental health of a natural person, including the provision of health care services, which reveal information about his or her health status.

*Applicability: GDPR

Uses: This definition need only be used when working with data subject to GDPR.

Related Definition (HIPAA): Health Information

Definition Source: GDPR Article 4

 

26

Data Contributor

The owner (individual, group or institution) of data (which may include analysis and/or tools) who uploads data content to Synapse.

*Applicability: Synapse, General

For Synapse communities involving Sage services (such as data curation), a data ingress agreement may be required before a Data Contributor can upload their data.

 

 

27

Data Governance

Policies, procedures, and controls of research assets including access management, safe and responsible use, supporting interoperability, and contributing to overall data lifecycle management. Data Governance:

1) ensures data is managed, protected, and utilized effectively and responsibly within an organization; and

2) ensures the availability, usability, integrity, and security of data.

*Applicability: General

Who is involved: ACT, IT and Security, Platform engineers, Privacy officers

 

 

28

Data Incident

An occurrence that (1) actually or imminently jeopardizes the integrity, confidentiality, or availability of information or an information system, or (2) constitutes a violation or imminent threat of violation of law, security policies, security procedures, or acceptable use policies.

*Applicability: General

This definition is adopted from Office of Management and Budget; however, it does not carry regulatory weight.

Definition Source: OMB M-17-12

 

29

Data Ingress Agreements

A formal contract between parties that outlines the terms and conditions under which data is allowed to enter or be imported into Synapse.

*Applicability: General

This term represents:

  • Data Processing Agreement (DPA)

  • Data Sharing Agreement (DSA)

  • Data Sharing Permission (DSP) - this is a Sage term that is not commonly used externally.

  • Data Transfer Agreement (DTA)

  • Data Transfer and Use Agreement (DTUA)

  • Data Use Agreement (DUA) - See separate definition. Note that DUAs that are issued for HIPAA Limited Data Sets must meet specific requirements as dictated by HIPAA. “DUA” should be reserved for HIPAA-regulated data sets unless the countersigning party has a preference for using this term.

  • Materials Transfer Agreement (MTA)

  • Memorandum of Understanding (MOU)

Sage engages in a variety of agreements with external customers and partners to define: the expectations for providing data to Synapse; the roles and responsibilities each party takes on to manage data; the conditions under which data will be shared with other users and/or institutions from a Sage platform (e.g., authorized persons, access tiers, security boundaries); and the roles and responsibilities that Sage may take on for reviewing access requests. The scope and applicability of these agreements depend upon a number of project-specific factors, including participant consent, data types, contractual obligations, institutional policies, rules and regulations, funder mandate, and/or research community sharing expectations.

A data ingress agreement is required for institutions contributing data to a Synapse community, and/or for institutions that are having Sage manage data access for them. Sage Governance may attempt to use a standard template to meet the agreement needs (e.g., using an FDP template), but the type and content of the agreement can vary widely depending on the nature of the data, the scope of work, and the preferences of the institution.

Note that a grant or other existing agreement such as a Data Use Agreement (DUA) can take the place of an additional ingress agreement as long as it is signed by an Institutional Signing Official and the existing document mentions that data will be stored in a repository matching the project’s access controls.

 

 

30

Data Management

The process of validating, organizing, protecting, maintaining, and processing scientific data to ensure the accessibility, reliability, and quality of the scientific data for its users.

 

Definition Source: NIH NOT-OD-21-013 (Data Sharing and Management Plans)

 

31

Data Protection Impact Assessment (DPIA)

A tool used to identify the risks and impacts arising out of the processing of personal data and to build awareness so that these risks can be minimized as much and as early as possible.

*Applicability: General, GDPR

This is a general tool that may be used at Sage regardless of the regulatory oversight.

A Data Protection Impact Assessment (DPIA) is required under the GDPR any time a new project is initiated that is likely to involve “a high risk” to personal information. (More HERE.)

Sage Data Protection Policy

GDPR Article 35

 

32

Data Repository

A database of research data maintained for the purpose of performing secondary research.

*Applicability: General

Additional synonyms: banks, registries, libraries

Data repository activities can include data curation and data maintenance (i.e., “data management”), and access management. A data repository containing de-identified data is not “research,” though the downstream product of the repository is for research.

 

 

33

Data Requester

All individuals listed on a Synapse Access Request for access to data.

*Applicability: Synapse

When applicable for a Managed Access Requirement (AR), all Data Requesters listed on a Synapse data Access Request should exactly match the Data Requesters as listed on the associated Data Use Certificate (DUC).

 

 

34

Data Sharing

The act of making scientific data available for use by others (e.g., the larger research community, institutions, the broader public), for example, via an established repository.

*Applicability: NIH, General

Definition Source: NIH NOT-OD-21-013 (Data Sharing and Management Plans)

 

35

Data Sharing and Management Plan (DSMP)

A plan describing the data management, preservation, and sharing of scientific data and accompanying metadata.

*Applicability: NIH, General

See also NOT-OD-21-014: Supplemental Information to the NIH Policy for Data Management and Sharing: Elements of an NIH Data Management and Sharing Plan

Source: NIH NOT-OD-21-013 (Data Sharing and Management Plans)

 

36

Data Subject

 

 

Identified or identifiable living individual to whom personal data relates.

*Applicability: GDPR

Uses: This definition need only be used when working with data subject to GDPR.

Related Definitions: Human Subject

Definition Source: GDPR Article 4

 

37

Data Use Agreement (DUA)

(1) General Applicability:

A contractual document used for the transfer of data that has been developed by nonprofit, government or private industry, where the data are nonpublic or are otherwise subject to some restrictions on their use.

(2) HIPAA Applicability:

An agreement between a covered entity and a limited data set recipient to establish permitted uses and disclosures by the recipient.

*Applicability: General, HIPAA

Uses: This definition has broad uses, but the HIPAA definition is for specific circumstances where a covered entity is disclosing a Limited Data Set to another institution.

  • Non-HIPAA uses of the term generally refer to data sharing agreements between institutions and may be synonymous with “DTA,” “DSA,” “MOU,” and similar agreements to govern data sharing.

  • HIPAA: DUAs under HIPAA must meet specific regulatory requirements. The terms of the DUA define the allowed uses. HIPAA regulations prohibit the recipient from further disclosing or using the information in a manner that would violate HIPAA regulations or the agreement. Recipients under the agreement are required to use appropriate safeguards to prevent use or disclosure of information outside of the defined terms of the agreement.

Definition Sources:

General: UPitt Office of Sponsored Programs

HIPAA: 45 CFR 164.514(e)(4)

 

 

38

Data Use Certificate (DUC)

A documented agreement outlining the terms of use for accessing a specific Synapse dataset, which must be signed by the Data Requester(s) and often also requires the signature of an institutional Signing Official.

*Applicability: Synapse

Managed ARs can be created to require submission of a Data Use Certificate (DUC) for data access.

 

 

39

De-identification

De-Identified Data

(1) Non-HIPAA/General:

Information that has had personally identifiable information (PII), including PHI, removed.

(2) HIPAA Safe Harbor Method:

(i) Removal of the 18 identifiers defined in 45 CFR 164.514(b)(2)(i)(A)-(R) [paraphrased]

and

(ii) The covered entity does not have actual knowledge that the information could be used alone or in combination with other information to identify an individual who is a subject of the information.

(3) HIPAA Expert Determination/Statistical Method:

A person with appropriate knowledge of and experience with generally accepted statistical and scientific principles and methods for rendering information not individually identifiable:

(i) Applying such principles and methods, determines that the risk is very small that the information could be used, alone or in combination with other reasonably available information, by an anticipated recipient to identify an individual who is a subject of the information; and

(ii) Documents the methods and results of the analysis that justify such determination.

*Applicability: HIPAA, General

Uses:

Non-HIPAA/General Considerations:

HIPAA’s de-identification standards are over 20 years old and numerous studies have demonstrated many ways in which data labeled as de-identified can be re-identified.

The U.S. Department of Health and Human Services (“HHS”) Secretary’s Advisory Committee on Human Research Protections (“SACHRP”) has noted, for example:

Though de-identification is commonly perceived to be an effective means to protect human participants, certain studies have shown convincingly that other data can be used in conjunction with de-identified data from research studies to re-identify individuals.  Increasingly, the protections afforded by removing the eighteen identifying data elements cited in HIPAA have become out of date, as technological advances and the combining of data sets increase the risk of re-identification.  For example, commercial interests have increasingly been trying to combine large, de-identified data sets with real-world data collected during the course of ordinary daily activities (e.g., credit card charges, driving habits), which increases the risk of re-identification and misuse of previously de-identified data. 

It is important to note that these de-identification methods are not recognized globally. GDPR requirements in the European Union, for example, are comparatively more rigorous. However, GDPR does not provide any specific de-identification methods.

At Sage, HIPAA standards for de-identification are applied broadly in recognition of national standards and as a basic foundation for protecting privacy; however, Governance’s evaluation of data sensitivity and privacy risks must take into account the limitations of HIPAA de-identification standards in favor of more rigorously protective methods or systems.

 

HIPAA:

HIPAA has defined two de-identification methods that have become a national standard. These definitions specifically apply to protected health information (PHI), which is created by and transmitted by a covered entity, but have been applied broadly across the U.S. and within the research profession.

For more information, see “Guidance Regarding Methods for De-identification of Protected Health Information in Accordance with the HIPAA Privacy Rule” (2012).

Definition Sources:

HIPAA Safe Harbor Method: 45 CFR 164.514(b)(2)

HIPAA Expert Determination/Statistical Method: 45 CFR 164.514(b)(1)

 

40

Derived Data

New data created by transforming, processing, or analyzing existing data.

*Applicability: General

 

 

41

FISMA

(Federal Information Security Management Act of 2002 and Federal Information Security Modernization Act of 2014)

A U.S. federal law (FISMA 2002) which requires each federal agency to develop, document, and implement an agency-wide program to provide information security for the information and systems that support the operations and assets of the agency, including those provided or managed by another agency, contractor, or other sources.

FISMA 2014 amends FISMA 2002 by modernizing federal security practices to address evolving security concerns resulting in less overall reporting, strengthening the use of continuous monitoring in systems, and increasing focus on the agencies for compliance and reporting that is more focused on the issues caused by security incidents.

FISMA 2014 also required the Office of Management and Budget (OMB) to amend/revise OMB Circular A-130 to eliminate inefficient and wasteful reporting and reflect changes in law and technological advances.

*Applicability: Synapse, General

Synapse is a FISMA-compliant platform. See the Synapse Platform page for more information.

 

 

42

Fully-Executed

Term used when all Parties’ authorized representatives have formally signed the Project Material(s).

 

 

 

43

General Data Protection Regulation (GDPR)

Rules and privacy regulations governing data in the European Union (EU). GDPR establishes personal data privacy protections as a fundamental right.

*Applicability: GDPR

Full text of GDPR: https://gdpr.eu/tag/gdpr/

 

44

Genetic Data

Personal data (see below) relating to the inherited or acquired genetic characteristics of a natural person which give unique information about the physiology or the health of that natural person and which result, in particular, from an analysis of a biological sample from the natural person in question.

*Applicability: GDPR, General

Uses: This definition may be used broadly outside of the scope of GDPR.

Definition Source: GDPR Article 4

 

45

Governance Structures

Governance Models

The data sharing framework that dictates what data to acquire, how to bring them into systems, how to store them, how to analyze them, and how to share downstream knowledge.

*Applicability: General

Types of Governance Structures:

  • Pairwise (One-to-one): Two parties agree to work together on and/or share a data set in some fashion, typically with a closed contract or an informal agreement. The negotiation terms depend on the relative status of the parties and/or the value of the data and knowledge.

  • Open Source (One-to-many or some-to-many): Data are distributed for reuse with a license defining reuse rights and conditions. The creator is in charge of the negotiation at first (choice of license), but then rights to analyze and redistribute are permanently transferred to the user. This is typical of a centralized project in the sciences, e.g., the Human Genome Project.

  • Federated Query (Many-to-many, via platform): Data are housed in a variety of locations, and users are able to query those local data simultaneously. Access is typically restricted to pre-configured queries (rather than data exploration) and may require registration before use.

  • Trusted research environment (Many-to-some): Data are housed in a central location under a contractual regime including data transfer and use agreements. Users apply to use the data. Users must “visit” the data rather than download them, agree to be known, and, in some cases, agree to be surveilled by a data steward.

  • Model-to-data (One-to-many): Data are held by a steward who is responsible for running algorithms on the behalf of researchers. In some cases, a synthetic version of the data may be released openly to facilitate model training. Researchers develop algorithms, send them to the steward, and receive back output of their analysis as run on the real dataset. The variety of analyses that may be performed is restricted by this structure, because the data steward must ensure data are specifically curated for any analytical question at hand.

  • Open citizen science (Many-to-many): Rights to use and distribute data are often fully decentralized via license or contract. Open citizen science is a peer-to-peer version of open source science.

  • Clubs and Trusts (Some-to-some): Clubs and Trusts are versions of a common pool resource: a group of people and/or institutions who agree to share resources towards a common goal. Control over the development and negotiation of data sharing and use terms is often held by the founders/settlors (and/or funders) and then can be distributed amongst club participants. Importantly, clubs that operate in the cloud can easily publish data products that are more “open” than the club itself.

  • Closed: Data are held privately by a single party.

  • Closed and Restricted: Data are held privately in order to protect a population, meet a legal requirement, or protect a secret.

Mangravite, Lara M., Avery Sen, John T. Wilbanks, and Sage Bionetworks Team. Mechanisms to Govern Responsible Sharing of Open Data: A Progress Report. Manubot, 2020.

 

46

Health Information

Any information, including genetic information, whether oral or recorded in any form or medium, that:

(1) Is created or received by a health care provider, health plan, public health authority, employer, life insurer, school or university, or health care clearinghouse; and

(2) Relates to the past, present, or future physical or mental health or condition of an individual; the provision of health care to an individual; or the past, present, or future payment for the provision of health care to an individual.

*Applicability: HIPAA, General

Uses: This definition may be used broadly, but sub-definition (1) can be omitted if the use is not within the scope of HIPAA-regulated activities.

Related Definition (GDPR): Data Concerning Health

Definition Source: 45 CFR 160.103

 

47

Health Insurance Portability & Accountability Act (HIPAA)

US health information privacy law. The HIPAA legislation resulted in regulations that are collectively referred to as “HIPAA” and are made up of the “Privacy Rule,” “Security Rule,” and “Enforcement Rule.”

*Applicability: HIPAA

HIPAA Legislation:

https://www.govinfo.gov/content/pkg/PLAW-104publ191/pdf/PLAW-104publ191.pdf

Combined HIPAA Regulations:

https://www.hhs.gov/sites/default/files/ocr/privacy/hipaa/administrative/combined/hipaa-simplification-201303.pdf

 

48

Human Subject

Research Participant

A living individual about whom an investigator (whether professional or student) conducting research:

(i) Obtains information or biospecimens through interaction or intervention with the individual, and uses, studies, or analyzes the information or biospecimens, or

(ii) Obtains, uses, studies, analyzes, or generates identifiable private information or identifiable biospecimens.

*Applicability: Common Rule, FDA Regulations

Uses: This definition is used primarily to determine whether information, interactions, interventions, or biospecimens used for research purposes are subject to human subjects regulations (i.e., whether IRB review is required).

This is a truncated definition. Contact Governance for an in-depth discussion.

Definition Source: 45 CFR 46.102(e) (2018 revision)

See also: Chart 01: Is an Activity Human Subjects Research Covered by 45 CFR Part 46?

 

49

Hybrid Entity

A single legal entity:

(1) That is a covered entity;

(2) Whose business activities include both covered and non-covered functions; and

(3) That designates health care components in accordance with paragraph 164.105(a)(2)(iii)(D) of HIPAA regulations.

*Applicability: HIPAA

Uses: This definition only applies to HIPAA-regulated organizations.

A typical example of a hybrid entity is a university with an affiliated teaching hospital. The hospital portion of the organization performs HIPAA-covered health care functions, while the rest of the university performs non-covered functions.

Definition Source: 45 CFR 164.103

 

50

Identifiable Data/Information

(1) Common Rule:

Data for which the identities of the source subjects are or may readily be ascertained by the investigator or associated with the information.

(2) NIH:

Data that are still attached to a readily available subject identifier such as name, social security number, study number, hospital number, medical record number, address, telephone number, etc., such that the identity of the subject can be ascertained.

(3) GDPR (“Identifiable Natural Person”):

One who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person.

*Applicability: Common Rule, NIH, General

Uses:

U.S. federal policies treat data as identifiable when the identities of the subjects can be readily ascertained or when readily available identifiers attached to the data would allow individual subject identities to be ascertained. When navigating the applicability of federal policies and regulations, the definition provided by the regulatory source should be applied.

The GDPR definition of an “identifiable natural person” goes beyond the U.S. references to traditional identifiers (like name, address, phone number or SSN), and includes reference to “one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity” of the subject.

At Sage, we recognize the need to combine many definition sources when evaluating factors such as the level to which data is identifiable, data sensitivity, and the risk of re-identification. In practice, Sage Governance will always apply the definition applicable to the specific laws and regulations of the data, but will take a more protective stance whenever feasible. When evaluating data outside the scope of a specific regulatory question, Sage should apply the NIH definition. While the evaluation of data sensitivity and risk should include the combined nature of the individual factors listed in the GDPR definition, Sage will not label data “identifiable” due to these factors alone unless GDPR applies.

For GDPR-regulated data, also see Personal Data.

For HIPAA-regulated data, also see Individually Identifiable Health Information (IIHI).

Related Definitions: Personally Identifiable Information (PII), Coded Data, De-identified Data

Definition Sources:

Common Rule: 45 CFR 46.102(e)(5)

NIH: 3016 - Intramural Research Program Human Data Sharing (HDS) Policy

GDPR: GDPR Article 4

 

51

Individually Identifiable Health Information (IIHI)

Individually identifiable health information is information that is a subset of health information, including demographic information collected from an individual, and:

(1) Is created or received by a health care provider, health plan, employer, or health care clearinghouse; and

(2) Relates to the past, present, or future physical or mental health or condition of an individual; the provision of health care to an individual; or the past, present, or future payment for the provision of health care to an individual; and

(i) That identifies the individual; or

(ii) With respect to which there is a reasonable basis to believe the information can be used to identify the individual.

*Applicability: HIPAA

Uses: This definition need only be used when working with data subject to HIPAA regulations.

Definition Source: 45 CFR 160.103

 

52

Informed Consent

The process of informed consent is a fundamental mechanism to ensure respect for persons through the provision of thoughtful consent for a voluntary act.

*Applicability: General

Depending on the research and the consenting plan approved by an Institutional Review Board (IRB), consent may be obtained (1) orally without a signed document; (2) using a disclosure form without a signature; or (3) using an informed consent form with required signatures.

 

 

 

 

53

Informed Consent Form (ICF)

Informed Consent Document (ICD)

Informed consent forms are written documents presented as part of an informed consent process when enrolling a human subject in research.

*Applicability: General

Informed consent forms must meet specific requirements defined by the regulations.

Informed consent is not the same as “HIPAA Authorization,” though some institutions may allow these distinct documents to be combined.

Informed consent forms often include restrictions on data sharing and future use limitations. Informed consent forms therefore help to establish Conditions for Data Use within Synapse.

Elements of informed consent are defined by the regulations at 45 CFR 46.116 (Common Rule) and 21 CFR 50.25 (for FDA-regulated studies).

Documentation requirements for informed consent are defined by the regulations at 45 CFR 46.117 (Common Rule) and 21 CFR 50.27 (for FDA-regulated studies).

 

54

Intended Data Use Statement (IDU)

 

A detailed description, submitted with a Data Access Request, of the Data Requester's research purpose for accessing and using certain data stored in Synapse. The Data Access Committee (DAC) uses the IDU to determine whether access to the data should be allowed. IDUs should address the following questions: What do you want to do with the data? Why are you doing it? How do you want to do it?

*Applicability: Synapse

IDUs can be required to access certain data via a Managed AR. They are often posted publicly on Synapse wiki pages or portal pages.

 

55

Institutional Review Board (IRB)

An independent body constituted of medical, scientific, and nonscientific members, whose responsibility it is to ensure the protection of the rights, safety, and well-being of human subjects by, among other things, reviewing, approving, and providing continuing review of protocols, amendments, and the methods and material to be used in obtaining and documenting informed consent of the research subjects.

*Applicability: General

IRB approval may be required to access certain data via a Managed AR.

Adapted from ICH E6(R2) 1.31 Good Clinical Practice

 

56

Interconnection Security Agreement (ISA)

An ISA captures the technical and security requirements to establish and maintain the interconnection between any two or more systems.

*Applicability: NIH

Federal policy recommends that agencies develop Interconnection Security Agreements (ISAs) when information is exchanged with another organization via a system interconnection. An ISA is a FISMA-required document that discusses the security-relevant aspects of an intended connection between a federal agency system and an external system.

Reference: NIST

 

57

Legacy Project

Term used for Synapse Data Coordination Center (DCC) projects that are no longer actively funded, yet require Sage’s continued support, maintenance and closure, as needed. Work completed in support of such projects is funded through indirect funds.

*Applicability: General, Synapse

 

 

58

Limited Data Set

“HIPAA Limited Data Set”

A limited data set is protected health information (PHI) that excludes the direct identifiers listed in 45 CFR 164.514(e)(2).

For simplification purposes, one or more of the following identifiers may be allowed:

  • dates such as admission, discharge, date of service, date of birth, date of death;

  • city, state, five digit or more zip code; and

  • calculated ages in years, months or days or hours (including ages over 89).

*Applicability: HIPAA

Uses: The term “Limited Data Set” is only truly applicable when:

  1. The data was created or received by a covered entity,

  2. The data was stripped of all identifiers except one or more of the identifiers listed in the definition, AND

  3. There is a Data Use Agreement in place meeting the requirements specified by HIPAA regulations.

At Sage, “Limited Data Set” is used broadly as Limited Data Sets are recognized benchmarks in de-identification in the U.S.; however, it is important to be aware of the regulatory applicability. Whereas de-identification of PHI (via the HIPAA Safe Harbor or Expert Determination methods) can convert data into a non-PHI state, Limited Data Sets remain as PHI with the DUA serving as the additional protection.

Generally, Limited Data Sets should always be categorized in the Controlled Access Data Tier.

Definition Source: 45 CFR 164.514(e)

 

59

Managed Access Requirement (AR)

An Access Requirement that requires data access to be granted via the Synapse Access and Compliance Team (ACT) and/or Data Access Committee (DAC).

*Applicability: Synapse

ACT often implements Managed ARs on data categorized in the Controlled Access Tier. Managed ARs often consist of:

  1. Data Access Application.

  2. One or more of the following: intended data use statement, IRB approval letter, or data use certificate.

  3. Requirement for data accessors to be registered, certified or validated.

 

 

60

Metadata

 

Data that provide additional information intended to make scientific data interpretable and reusable (e.g., date, independent sample and variable construction and description, methodology, data provenance, data transformations, any intermediate or descriptive observational variables).

*Applicability: NIH, General

Definition Source: NIH NOT-OD-21-013 (Data Sharing and Management Plans)

 

61

Personal Data

 

Personal data means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person.

*Applicability: GDPR

Uses: This definition should only be applied to GDPR-regulated data. See Identifiable Data/Information for more discussion and related terms.

Definition Source: GDPR Article 4

 

62

Personally Identifiable Information (PII)

 

Information that can be used to distinguish or trace an individual’s identity, either alone or when combined with other information that is linked or linkable to a specific individual.

(Because there are many different types of information that can be used to distinguish or trace an individual identity, the term PII is necessarily broad.)

*Applicability: General

PII is not defined officially by the U.S. government through legislative or regulatory bodies, but has been offered through Office of Management and Budget (OMB) memoranda.

To determine whether information is PII, the OMB has recommended to executive agencies that they should perform assessments of the specific risk that an individual can be identified using the information with other information that is linked or linkable to the individual. This is because information that is not PII can become PII whenever additional information becomes available - in any medium or from any source - that would make it possible to identify an individual.

Definition Source: OMB M-17-12

 

63

Private Access

Private Project

A category of Synapse data only available to the Data Contributor (i.e., Project Administrator) and other users that they specify in the entity's Sharing Settings.

*Applicability: Synapse

Often, Private Data is managed via sharing through Synapse Teams.

 

 

64

Private Information

 

(1) Information about behavior that occurs in a context in which an individual can reasonably expect that no observation or recording is taking place, and

(2) Information that has been provided for specific purposes by an individual and that the individual can reasonably expect will not be made public (e.g., a medical record).

*Applicability: Common Rule

Uses: This definition may be used broadly.

Definition Source: 45 CFR 46.102(e)(4)

 

65

Project Materials

Project-specific governance documentation, e.g., agreements, amendments, memorandums of understanding, and related legal documents.

 

 

 

66

Protected Health Information (PHI)

 

Protected health information means individually identifiable health information:

(1) Except as provided in paragraph (2) of this definition, that is:

(i) Transmitted by electronic media;

(ii) Maintained in electronic media; or

(iii) Transmitted or maintained in any other form or medium.

(2) Protected health information excludes individually identifiable health information:

(i) In education records covered by the Family Educational Rights and Privacy Act, as amended, 20 U.S.C. 1232g;

(ii) In records described at 20 U.S.C. 1232g(a)(4)(B)(iv);

(iii) In employment records held by a covered entity in its role as employer; and

(iv) Regarding a person who has been deceased for more than 50 years.

*Applicability: HIPAA, General

Do not use this term to mean “Personal Health Information.”

Uses: Data is only PHI when it is regulated under HIPAA. This means that it was created and transmitted by a covered entity, and/or has been transmitted to either another covered entity or a business associate (with a BAA in place). Sage is not a covered entity, but has, in some circumstances, served as a business associate.

HIPAA terminology has become commonplace when discussing health information used for research. Since health information is most often collected by or combined with data collected by covered entities (like hospitals and clinics), discussion of, and reference to PHI has served to keep a focus on data privacy and security, and the penalties that can arise when privacy rules are broken. Discussion of PHI also maintains a focus on de-identification processes, such as the removal of the 18 HIPAA identifiers, or use of Limited Data Sets.

At Sage, data will rarely meet the definition of being PHI when it is placed in Synapse. The exceptions are when Sage has signed a BAA, or if the data contributor is a covered entity and has put data in Synapse improperly.

Data may start as PHI (when Individually Identifiable Health Information [IIHI] is created by a covered entity and transmitted electronically), but through the process of compliant disclosure authorizations, releases, formal IRB-approved waivers, and/or de-identification procedures, PHI may be placed into Synapse and no longer meet the definition of PHI. Additionally, once data is transferred from a covered entity to a non-covered entity, HIPAA protections no longer apply.

In cases where PHI is put in Synapse “improperly,” this constitutes a privacy breach at the fault of the disclosing entity. These instances should be reported to Sage Governance for investigation and corrective action.

Definition Source: 45 CFR 160.103

 

67

Pseudonymization

 

(1) GDPR Applicability:

The processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organizational measures to ensure that the personal data are not attributed to an identified or identifiable natural person.

(2) General Applicability:

Data where individual identifiers have been replaced by a code or pseudo (false) identifier.

*Applicability: GDPR

Uses: These definitions may be used broadly outside of the scope of GDPR.

Definition Source: GDPR Article 4
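
Example (illustrative only): The general definition above, in which individual identifiers are replaced by a code, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a Sage or Synapse procedure: the function name pseudonymize, the field name participant_id, and the sample records are all hypothetical. The sketch only shows the replacement step. Under GDPR, the returned key would additionally need to be kept separately and protected by technical and organizational measures, which this code does not attempt.

```python
import secrets


def pseudonymize(records, id_field="participant_id"):
    """Replace the identifier in each record with a random code.

    Returns (coded_records, key), where key maps each code back to the
    original identifier. The field name is a hypothetical example.
    """
    code_for = {}  # original identifier -> code (so repeats get the same code)
    key = {}       # code -> original identifier (the "additional information")
    coded_records = []
    for record in records:
        original = record[id_field]
        if original not in code_for:
            code = secrets.token_hex(8)  # random code, not derived from the identifier
            code_for[original] = code
            key[code] = original
        coded = dict(record)  # copy so the input records are left unchanged
        coded[id_field] = code_for[original]
        coded_records.append(coded)
    return coded_records, key


# Hypothetical sample data: two participants, one with two records.
records = [
    {"participant_id": "jane.doe@example.org", "age": 54},
    {"participant_id": "john.roe@example.org", "age": 61},
    {"participant_id": "jane.doe@example.org", "age": 55},
]
coded_records, key = pseudonymize(records)
```

In this sketch the coded records can be shared while the key is held back. If the key were destroyed rather than stored, the result would be closer to what this glossary calls Unlinked Data.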

 

68

Publicly Accessible Data

 

Data that are available to qualified researchers. It may include either data that are openly accessible and available for any use or data that are accessed in a controlled manner to protect appropriately certain interests, for example, the privacy of research subjects, intellectual property or security.

*Applicability: General

In some cases, “publicly accessible” data may include only “openly accessible” data.

Definition Source: NIH 3016 - Intramural Research Program Human Data Sharing (HDS) Policy

 

69

Registered User

Synapse user who has successfully created an account, has logged into Synapse using their email and password, and has agreed to the Synapse Pledge.

*Applicability: Synapse

Registered users can create projects and wikis. They can collaborate with other registered users and create Synapse teams. Registered users can also download publicly available data and, if they fulfill the Conditions for Use, they can also access controlled data.

 

 

70

Reliable Method (RM)

Internal process documents that provide detailed, step-by-step instructions for completing a task.

*Applicability: Governance Document Control

RMs are meant to elaborate on other generalized instructions that are covered in SOP or Policy documents. Unlike SOPs or Policies, RMs are meant to be updated on a continual basis to best reflect the most reliable, comprehensive method for completing work.

 

 

71

Research

 

A systematic investigation, including research development, testing, and evaluation, designed to develop or contribute to generalizable knowledge.

*Applicability: Common Rule, HIPAA, General

Uses: This definition can be used broadly.

Definition Sources:

45 CFR 46.102(l)

45 CFR 164.501

 

72

Research Governance

Policies, processes, and structures that guide and oversee the research activities such as the research design, data collection, analysis, tools, methods, and dissemination. Research Governance:

1) ensures the ethical, responsible, and accountable conduct of research activities; and

2) protects the rights and well-being of research participants and maintains the integrity of the research process.

*Applicability: General

Who is involved: IRB, ethics committees

 

 

73

Scientific Data

 

The recorded factual material commonly accepted in the scientific community as of sufficient quality to validate and replicate research findings, regardless of whether the data are used to support scholarly publications. Scientific data do not include laboratory notebooks, preliminary analyses, completed case report forms, drafts of scientific papers, plans for future research, peer reviews, communications with colleagues, or physical objects, such as laboratory specimens.

*Applicability: NIH, General

Definition Source: NIH NOT-OD-21-013 (Data Sharing and Management Plans)

 

74

Secondary Research

 

Reusing information or specimens that are collected for some other “primary” or “initial” activity for research purposes.

*Applicability: General

Secondary research will generally involve use of data or specimens that were collected for a reason other than the present research purpose. The “primary” or “initial” activity can be for research purposes or non-research purposes.

  • For example, research performed using medical records is an example of secondary research because the medical records data was collected for regular patient care. The “initial” activity in this case was for non-research purposes.

  • In another example, a researcher might collect data for a specific research purpose by consenting subjects and administering a validated assessment. Once that research study is completed, the researcher may store the data (if the subjects consented to future use of their data and an IRB approved the protocol) and another researcher may conduct secondary research analysis of the data for a different research study.

Definition Source: Preamble to 45 CFR 46 (82 F.R. 7191)

 

75

Sensitive Data

 

(1) General Applicability:

Data that must be protected from unauthorized access to safeguard the privacy or security of an individual or organization. This includes human data at risk of re-identification.

(2) GDPR Applicability:

The following personal data is considered sensitive:

  • personal data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs;

  • trade-union membership;

  • genetic data, biometric data processed solely to identify a human being;

  • health-related data;

  • data concerning a person’s sex life or sexual orientation.

(3) Veterans Affairs Applicability (for example purposes):

Sensitive personal information includes:
(A) Education, financial transactions, medical history, and criminal or employment history.
(B) Information that can be used to distinguish or trace the individual’s identity, including name, social security number, date and place of birth, mother’s maiden name, or biometric records.

*Applicability: GDPR, General

“Sensitivity” of information is highly subjective and it is generally difficult to set a list of data elements that will reliably apply to every data set as a method to easily label information as “sensitive.” As a result, some governmental agencies choose to consider any personally identifiable information (PII) as “sensitive.”

Sage Governance processes may involve a risk-based approach to evaluating the sensitivity of data. This may include an analysis of the risk that the data could pose if data were re-identified, coupled with an analysis of the de-identification methods used to treat the data.

Definition Sources:

GDPR Article 4(13), (14) and (15), Article 9 and Recitals (51) to (56)

38 U.S.C. 5727(19)

 

76

Sharing Settings

Controls used by a Project Administrator to define and customize public or private access to a Synapse entity (Project, File, Folder, or Table). The Project Administrator also has the option to create "Local Sharing Settings," which allow for different access customization for an entity within another entity (example: a parent Folder may have Sharing Settings that allow for "public" access, while a File within that parent Folder may have Local Sharing Settings restricting access to specific Users).

*Applicability: Synapse

Within Sharing Settings, Project Administrators can grant users view, download, edit, edit/delete, and administrator access.
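
Example (illustrative only): The relationship between an entity's Sharing Settings and Local Sharing Settings on an entity nested inside it can be sketched as a lookup that walks up the hierarchy. This is a hypothetical model, not the Synapse implementation or its API. The Entity class, the effective_sharing function, and the principal names are invented for illustration, and the sketch assumes that an entity without Local Sharing Settings uses the settings of its nearest enclosing entity that has them.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Entity:
    """Hypothetical stand-in for a Project, Folder, File, or Table."""
    name: str
    parent: Optional["Entity"] = None
    # Maps a user or group name to an access level; None means
    # "no Local Sharing Settings defined on this entity".
    local_sharing: Optional[Dict[str, str]] = None


def effective_sharing(entity: Entity) -> Dict[str, str]:
    """Return the settings that apply to an entity under the assumed rule:
    its own Local Sharing Settings if present, otherwise those of the
    nearest enclosing entity that defines settings."""
    current: Optional[Entity] = entity
    while current is not None:
        if current.local_sharing is not None:
            return current.local_sharing
        current = current.parent
    return {}  # nothing defined anywhere in the hierarchy


# Mirrors the example in the definition: a public parent Folder that
# contains one File restricted to specific users.
folder = Entity("Shared Folder", local_sharing={"PUBLIC": "download"})
open_file = Entity("open.csv", parent=folder)
restricted_file = Entity(
    "restricted.csv", parent=folder, local_sharing={"user_a": "download"}
)

open_access = effective_sharing(open_file)
restricted_access = effective_sharing(restricted_file)
```

Under these assumptions, open.csv resolves to the parent Folder's public setting, while restricted.csv resolves only to the users named in its own Local Sharing Settings.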

 

 

77

Signing Official

Institutional Signing Official

 

(1) General:

An employee affiliated with the respective organization who has oversight authority.

(2) NIH:

An Institutional Signing Official is generally a senior official at an institution who is credentialed through the NIH eRA Commons system and is authorized to enter the institution into a legally binding contract and sign on behalf of an investigator who has submitted data or a data access request to NIH.

*Applicability: Synapse, General, NIH

A Data Use Certificate (DUC) or data ingress agreement may require a Signing Official's signature to validate the document. This term is not synonymous with “Institutional Official.”

For DUCs: Generally, the Signing Official should be a person meeting the following criteria:

  • Has oversight authority over the data requestor,

  • Is responsible for ensuring appropriate and ethical use of the Data by the data requestor, and

  • Is not a member of the study team (as this would introduce a conflict of interest).

For a DUC, the Signing Official role is generally best filled by a Department Head (or similar), because closer oversight of the requestor is desired.

For data ingress agreements (e.g., DTAs, DUAs, MOUs): A Signing Official must have institutional authority to enter their institution into legally binding contracts. For this reason, the Signing Official is typically a designee in a Grants & Contracts office (or similar).

For NIH Data Sharing Policy: The NIH requires additional credentialing and authority.

Definition Source (NIH): NOT-OD-14-124 Genomic Data Sharing Policy

 

78

Synthetic Data

Artificially generated data that mimics real-world data and is created using algorithms and simulations rather than collected from real-life events or observations.

*Applicability: General

 

 

79

Teams (in Synapse)

Multiple Synapse users accepted into a group.

*Applicability: Synapse

Teams can be used to share Synapse entities with multiple users at once. Access Requirements can be implemented on Synapse teams or directly on Synapse entities.

 

 

80

Unlinked Data

 

Data that were initially collected with identifiers but, before research use, have been irreversibly stripped of all identifiers by use of an arbitrary or random alphanumeric code and the key to the code is destroyed, thus making it impossible for anyone to link the samples to the sources. This does not preclude linkage with existing clinical, pathological, and demographic information so long as all individual identifiers are removed prior to distribution or receipt.

*Applicability: General

Definition Source: NIH 3016 - Intramural Research Program Human Data Sharing (HDS) Policy

 

81

Validated User

 

Synapse user who has created a Synapse ID, has logged into Synapse using their email and password, has successfully completed the Certification Quiz, and has had their profile and identity validated by Sage Access and Compliance Team.

 

*Applicability: Synapse

The process of becoming a Validated User enables greater transparency within the research community, which promotes a reciprocal relationship between the Synapse user and the data participants and contributors. Validated Users are eligible to request access to specific controlled-access data and to Bridge data.

To become a Validated User, a Certified User must establish their identity by providing to the Sage Access and Compliance Team (ACT) a combination of Synapse profile information, ORCID profile information, a signed Synapse Pledge, and an external credential.