

(1) General Applicability:

A contractual document used for the transfer of data that has been developed by nonprofit, government or private industry, where the data are nonpublic or are otherwise subject to some restrictions on their use.

(2) HIPAA Applicability:

An agreement between a covered entity and a limited data set recipient to establish permitted uses and disclosures by the recipient.

*Applicability: HIPAA, General

Uses:

Non-HIPAA/General Considerations:

HIPAA’s de-identification standards are over 20 years old and numerous studies have demonstrated many ways in which data labeled as de-identified can be re-identified.

The U.S. Department of Health and Human Services (“HHS”) Secretary’s Advisory Committee on Human Research Protections (“SACHRP”) has noted, for example:

Though de-identification is commonly perceived to be an effective means to protect human participants, certain studies have shown convincingly that other data can be used in conjunction with de-identified data from research studies to re-identify individuals.  Increasingly, the protections afforded by removing the eighteen identifying data elements cited in HIPAA have become out of date, as technological advances and the combining of data sets increase the risk of re-identification.  For example, commercial interests have increasingly been trying to combine large, de-identified data sets with real-world data collected during the course of ordinary daily activities (e.g., credit card charges, driving habits), which increases the risk of re-identification and misuse of previously de-identified data. 

It is important to note that these de-identification methods are not recognized globally. GDPR requirements in the European Union, for example, are comparatively more rigorous. However, GDPR does not provide any specific de-identification methods.

At Sage, HIPAA standards for de-identification are applied broadly in recognition of national standards and as a basic foundation for protecting privacy; however, Governance’s evaluation of data sensitivity and privacy risks must take into account the limitations of HIPAA de-identification standards in favor of more rigorously protective methods or systems.

HIPAA:

HIPAA has defined two de-identification methods that have become a national standard. These definitions specifically apply to protected health information (PHI), which is created by and transmitted by a covered entity, but have been applied broadly across the U.S. and within the research profession.

For more information, see “Guidance Regarding Methods for De-identification of Protected Health Information in Accordance with the HIPAA Privacy Rule” (2012).

*Applicability: General

Depending on the research and the consenting plan approved by an Institutional Review Board (IRB), consent may be performed (1) orally without a signed document; (2) using a disclosure form without a signature; or (3) using an informed consent form with required signatures.

Term

Definition

Discussion/Guidance

References

1

Access Renewal

Resubmission of an accessor's Synapse access request to enable continued access to data governed by a Managed AR with an access expiration period.

*Applicability: Synapse

Access renewal settings, including specified intervals, are set when an access requirement is set up by ACT. Renewals and their intervals are determined as part of the Conditions of Use for the data.

Automated Synapse emails are sent to request submitters 2 months and 1 month before their access is set to expire. If the user does not resubmit an access request application by the expiration date, they will lose access to the respective data. Most access renewal periods are yearly, but this field can be customized by ACT during the Access Requirement setup.
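The reminder schedule described above can be sketched in a few lines of Python. This is only an illustration, not Synapse's implementation: the function names are hypothetical, and clamping the day of month for shorter months (for example, March 31 minus one month becoming February 28) is an assumption about how "2 months and 1 month before" might be resolved.

```python
import calendar
from datetime import date
from typing import Optional


def months_before(d: date, months: int) -> date:
    """Step back whole calendar months, clamping the day to the target month's length."""
    month_index = d.year * 12 + (d.month - 1) - months
    year, month0 = divmod(month_index, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def reminder_dates(expiration: date) -> list:
    """Dates on which reminder emails would go out: 2 months and 1 month before expiry."""
    return [months_before(expiration, 2), months_before(expiration, 1)]


def resubmitted_in_time(expiration: date, resubmitted_on: Optional[date]) -> bool:
    """True if a renewal request was resubmitted on or before the expiration date."""
    return resubmitted_on is not None and resubmitted_on <= expiration


schedule = reminder_dates(date(2025, 1, 15))
```

For an access expiration of 15 January 2025, `schedule` holds 15 November 2024 and 15 December 2024. A renewal period other than yearly only changes the expiration date passed in, not the reminder logic.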

2

Access Requirement (AR)

A data restriction, or lock, applied by ACT to a Synapse entity (such as a folder, file, project, or team) and set according to the data contributor’s established Conditions of Use, defining the requirements that a user must meet in order to be allowed access to the entity.

*Applicability: Synapse

ARs are applied to controlled-access data and may be applied in the form of a Managed AR and/or a Click-wrap. A Managed AR requires a user to submit an Access Request. Access Requests may be reviewed and approved by members of the Synapse Access and Compliance Team (ACT) or a Data Access Committee (DAC). A Click-wrap requires a user to read data terms and conditions and click "I accept" before obtaining access. Click-wraps do not require ACT or DAC approval. ARs can be set up for projects, folders, tables, and teams.

3

Access Request

An electronic application submitted via Synapse by a user seeking access to controlled-access data protected by a Managed AR. The Managed AR requires the user to fulfill its terms and conditions and to obtain review and approval from either the Access and Compliance Team (ACT) or a Data Access Committee (DAC). An Access Request may be submitted by a single user on behalf of several collaborating Synapse users at their institution.

*Applicability: Synapse

Once the designated request reviewer - either Access and Compliance Team (ACT) or Data Access Committee (DAC) - issues an approval or rejection of an Access Request via Synapse, an email is generated and sent to the submitter of the Access Request.

If the Access Request is approved, the approval email will alert the submitter that they now have access to the Synapse entities associated with the Managed AR for which the request was submitted.

If the Access Request is rejected, the rejection email will include notes from the reviewer explaining the reason for rejection and guidance for successful resubmission, and will redirect the submitter to the Managed AR in Synapse to resubmit their Access Request. Approval/rejection emails are sent only to the user who submitted the Access Request and not to other users who may also have been listed on the Access Request.

4

Access Request Submitter

A Synapse user who submits an Access Request via Synapse for access to controlled-access data protected by a Managed AR. The Access Request Submitter may submit the Access Request on their own behalf only or may also list additional collaborating Synapse users from a single institution.

*Applicability: Synapse

A single Access Request will have a single submitter via Synapse who completes and submits the Access Request and is the only Synapse user who will receive approval/rejection emails generated for the Access Request. Multiple collaborating Synapse users from a single institution may be included by the submitter for data access through a single Access Request, but these additional users are not considered Access Request Submitters and will not receive approval/rejection emails generated for the Access Request.

5

Acknowledgement Statement

Attribution

A statement set forth by a Data Contributor to be used by data recipients to include in publications, talks, presentations, etc., to ensure the Data Contributor (and any other relevant bodies, such as participants or funders, or Sage Bionetworks) are recognized for their efforts surrounding the data.

*Applicability: Synapse

Acknowledgement Statements are usually posted on the project wiki page or directly in a click-wrap agreement.

6

Access and Compliance Team (ACT)

A Sage Governance sub-team that has Synapse administration privileges enabling members to process access requests, create and manage ARs, validate user profiles, escalate data incidents and other violations of the Synapse Terms and Conditions of Use, and take other administrative actions for governance purposes.

*Applicability: Synapse

7

Access Tiers

A categorization used to designate the level of restriction that should be applied to data based on factors such as risk of identifiability or limitations on use.

*Applicability: Synapse/General

Access Tiers are defined by Governance in a manner appropriate to each individual study. Terms used to describe access tiers include “Open/Anonymous/Whitelisted”, “Registered,” “Restricted,” “Controlled,” and “Controlled-Plus.”

  • Open/Anonymous/Whitelisted: data that is available for anyone on the web without requiring them to fulfill Conditions of Use

  • Registered: data that is available to registered users of Synapse

  • Restricted/Controlled: data that is available to registered users of Synapse who fulfill specific requirements for data access, such as submitting an Intended Data Use statement, agreeing to data use limitations, becoming Certified Users in Synapse, and/or undergoing Profile Validation.

  • Controlled-Plus: data that is restricted/controlled and is sensitive enough that additional prerequisites are required such as submitting an IRB approval letter or other institutional documentation.

8

Aggregate Data

Data produced by grouping information into categories and combining values within these categories.

*Applicability: General

Also known as tabular data or macrodata. Often presented in tables. Because aggregate data combines individual-level data, the term is often used to describe data in which individual subjects are “less easy” to identify; however, disclosure risks can arise if a user can access multiple tables containing common data elements. Data reduction treatments (such as combining categories so sample sizes within categories represent a larger n) and data modification treatments (such as rounding or adding perturbations so the potential for re-identification is reduced) are example methods that can be applied to aggregate data as part of a robust data privacy strategy.

Definition Source: Data Confidentiality Guide, Australian Bureau of Statistics
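The two treatments mentioned above (combining small categories and rounding) might look like the following in Python. This is a minimal sketch under stated assumptions: the function names, the "Other" label, the minimum cell size, and the rounding base of 5 are all hypothetical choices for illustration and are not thresholds prescribed by Sage or by the cited guide.

```python
def merge_small_categories(counts, min_n, other_label="Other"):
    """Data reduction: fold categories with fewer than min_n subjects into one combined category."""
    merged = {}
    folded = 0
    for label, n in counts.items():
        if n < min_n:
            folded += n
        else:
            merged[label] = n
    if folded:
        merged[other_label] = merged.get(other_label, 0) + folded
    return merged


def round_to_base(count, base=5):
    """Data modification: round a non-negative count to the nearest multiple of base (halves round up)."""
    return base * ((count + base // 2) // base)


# Illustrative, made-up counts per age band.
age_band_counts = {"18-29": 40, "30-39": 3, "40-49": 2}
reduced = merge_small_categories(age_band_counts, min_n=5)
published = {label: round_to_base(n) for label, n in reduced.items()}
```

Note that the combined category can itself remain below the minimum size, so a real workflow would need to check the result again, and neither treatment removes the risk that arises when several tables share common data elements.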

9

Anonymous Access Data

Data available for download on Synapse by anyone on the web without requiring them to log in to a Synapse account or fulfill Conditions for Use.

*Applicability: Synapse

10

Anonymous Data

Anonymized Data

(1) Broad Definition:

Individual-level data that has been stripped of personally identifiable information.

(2) Enhanced Definition:

Individual-level data that cannot be used alone or with other data to identify a unique individual.

*Applicability: General

Anonymization performed through simple de-identification techniques is useful as a primary safeguard for protecting privacy, but a growing body of literature has shown that as the size and diversity of available data grows, the likelihood of being able to re-identify individuals also grows substantially.

When communicating the protectiveness of de-identification, “anonymization” should be used carefully so as not to mislead participants or the community into believing that “anonymized” data, without additional treatment or analysis, is robustly protected against future re-identification.

Under the “Enhanced Definition,” the data cannot be coded in a way that would allow a link to identifiers held in a separate, existing data set to re-identify the individual.

11

Anonymous Journal Review

A process by which reviewers from a scientific journal anonymously access data in Synapse in order to evaluate it as part of their review of a manuscript being submitted for publication.

*Applicability: Synapse

To facilitate Anonymous Journal Review, ACT sets up a temporary account for journal reviewers to access data for a temporary period of time.

12

Anonymous User

A Synapse user interacting with the platform without creating (or logging into) a Synapse account.

*Applicability: Synapse

Anonymous Users are able to review platform features, public resources (including the catalog of public projects, files, and tables), and other Anonymous Access Data.

Anonymous Users cannot create Projects in Synapse, upload or download data, add wiki content, or comment in discussion forums.

13

Biometric Data

Personal data (see below) resulting from specific technical processing relating to the physical, physiological or behavioral characteristics of a natural person, which allow or confirm the unique identification of that natural person, such as facial images or dactyloscopic data.

*Applicability: General, GDPR

Uses: This definition may be used broadly outside of the scope of GDPR, but the definition source is from GDPR.

Definition Source: GDPR Article 4, See also: 21 CFR 11.3(3)

14

Breach

(1) General Applicability:

The loss of control, compromise, unauthorized disclosure, unauthorized acquisition, or any similar occurrence where (1) a person other than an authorized user accesses or potentially accesses personally identifiable information or (2) an authorized user accesses or potentially accesses personally identifiable information for an other than authorized purpose.

(2) HIPAA Applicability:

The acquisition, access, use, or disclosure of protected health information in a manner not permitted [by the regulations] which compromises the security or privacy of the protected health information.

*Applicability and Uses - Definition 1: Can be used broadly for breach incidents that do not involve HIPAA-regulated data. This definition is adopted from the Office of Management and Budget; however, it does not carry regulatory weight.

*Applicability and Uses - Definition 2: Applies only to Breaches subject to HIPAA regulations.

Definition Source, HIPAA: 45 CFR 164.402

Definition Source, General: OMB M-17-12

15

Business Associate (under HIPAA)

A business associate, with respect to a covered entity, is a person or entity who:

On behalf of such covered entity […] creates, receives, maintains, or transmits protected health information for a function or activity regulated by [HIPAA], including claims processing or administration, data analysis, processing or administration, utilization review, quality assurance, patient safety activities […], billing, benefit management, practice management, and repricing.

*Applicability: HIPAA

Uses: This definition should only be applied to situations where Sage is agreeing to take on a Business Associate role.

A Business Associate role can only be taken on when a formal contract (Business Associate Agreement) meeting regulatory requirements has been executed. When an organization agrees to be a Business Associate, HIPAA regulations are applied in full effect (including enforcement requirements, breach requirements and the possibility of penalties).

Definition Source: 45 CFR 160.103

16

Business Associate Agreement (BAA)

A contract between a covered entity and the entity or person agreeing to participate in activities on behalf of the covered entity. The BAA establishes the permitted and required uses and disclosures of protected health information by the business associate.

*Applicability: HIPAA

Uses: This definition should only be applied to situations where Sage is agreeing to take on a Business Associate role.

Definition Source: 45 CFR 164.502(e)

17

Certified User

A Synapse user who has created a Synapse ID, has logged into Synapse using their email and password, and has successfully completed the Certification Quiz.

*Applicability: Synapse

To become a Certified User, a Registered User must pass a short quiz concerning the Synapse Commons Data Use Procedure to ensure the user understands the rules and policies that govern data sharing on Synapse.

Certified Users have access to full Synapse functionality, including the ability to upload files and tables as well as create folders.

18

Certification Quiz

A quiz that a Registered User takes to become a Certified User and that ensures the user understands the rules and policies that govern data sharing on Synapse.

*Applicability: Synapse

To become a Certified User, a Registered User must pass a short quiz concerning the Synapse Commons Data Use Procedure to ensure the user understands the rules and policies that govern data sharing on Synapse.

The Certification Quiz is 15 questions and takes approximately 15-20 minutes to complete.

19

Certificate of Confidentiality (CoC)

A CoC is a tool to protect information, documents, and/or biospecimens that contain identifiable, sensitive information related to a research participant. It is intended to protect the privacy of research participants by prohibiting disclosure of identifiable, sensitive research information to anyone not connected to the research, except when the participant consents or in a few other specific situations.

*Applicability: NIH, General

CoCs are:

  • Established by the Public Health Service Act §301(d), 42 U.S.C. §241(d), “Protection of privacy of individuals who are research subjects.”

  • Applicable only to human subjects research studies in which identifiable, sensitive information is collected or used.

  • Issued by NIH and other HHS agencies (e.g., CDC) for research studies.

    • Since 2017, NIH automatically issues CoCs for any NIH-funded research meeting their criteria.

    • Researchers can also apply for a CoC for non-NIH funded research studies.

For more information, see the NIH FAQs page.

OHRP Guidance: Certificates of Confidentiality - Privacy Protection for Research Subjects

NIH CoC

20

Click-wrap

A type of Access Requirement placed on a Synapse entity (a folder, file, project, or team) that can be satisfied by the user by reviewing the data contributor’s conditions and clicking the button that states “I accept the terms of use.”

*Applicability: Synapse

Click-wraps generally contain Terms and Conditions of data use (i.e., what you can and cannot do with the data) and often contain an Acknowledgement Statement.

21

Common Rule

    A set of federal guidelines in the U.S. that protect people who participate in research studies, ensuring their rights, safety, and privacy are respected by outlining how researchers must obtain consent from participants, how the proposed research must be reviewed and approved by an Institutional Review Board (IRB), and how the researchers must handle the human subject data.

    *Applicability: General

    45 CFR 46 - "Common Rule" | HHS.gov

    22

    Community Governance

    Policies, processes, and structures that guide and oversee the research activities such as the research design, data collection, analysis, tools, methods, and dissemination. Community Governance:

    1) ensures the ethical, responsible, and accountable conduct of research activities; and

    2) protects the rights and well-being of research participants and maintains the integrity of the research process.

    23

    Data Concerning Health

    (new)

    Personal data (see below) related to the physical or mental health of a natural person, including the provision of health care services, which reveal information about his or her health status.

    *Applicability: GDPR

    Uses: This definition need only be used when working with data subject to GDPR.

    Related Definition (HIPAA): Health Information

    Definition Source: GDPR Article 4

    24

    Data Contributor

    (revised)

    The Access and Compliance Team (ACT) serves as the Sage Data Access Committee (DAC).

    The owner (individual, group or institution) of data (which may include analysis and/or tools) who uploads data content to Synapse.

    Who is involved: Research consortia, steering committees, funders

    23

    Conditions of Use

    A set of expectations and/or terms for data access applied to Synapse content.

    *Applicability: Synapse

    Conditions of Use are organized to help Requesters comply with the terms under which the data were collected or with other human subjects regulations. Data Contributors collaborate with ACT to set up Conditions for Use in the form of an Access Requirement.

    24

    Coded Data

    Data is coded when:

    1. Identifying information (such as name or social security number) that would enable the investigator to readily ascertain the identity of the individual to whom the private information or specimens pertain has been replaced with a number, letter, symbol, or combination thereof (i.e., the code); and

    2. A key to decipher the code exists, enabling linkage of the identifying information to the private information or specimens.

    *Applicability: General

    Uses: This definition may be used broadly.

    Definition Source: OHRP Guidance: Coded Private Information or Specimens Use in Research (2008)
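The two conditions above (identifiers replaced by a code, and a key that links codes back to identifiers) can be illustrated with a short Python sketch. Everything here is hypothetical: the function name, the field names, and the use of random hex strings as codes are assumptions for illustration, not a Sage or OHRP procedure. In practice the key would be stored separately from the coded data and under stricter access control.

```python
import secrets


def code_records(records, id_field):
    """Replace the identifying field with a random code and return the coded rows plus the key."""
    key = {}        # code -> identifying value; this is the "key to decipher the code"
    code_for = {}   # identifying value -> code, so repeat records share one code
    coded = []
    for record in records:
        identifier = record[id_field]
        if identifier not in code_for:
            code = secrets.token_hex(4)
            while code in key:  # regenerate in the unlikely event of a collision
                code = secrets.token_hex(4)
            code_for[identifier] = code
            key[code] = identifier
        row = {k: v for k, v in record.items() if k != id_field}
        row["code"] = code_for[identifier]
        coded.append(row)
    return coded, key


# Made-up example records.
records = [
    {"name": "Participant A", "visit": 1},
    {"name": "Participant B", "visit": 1},
    {"name": "Participant A", "visit": 2},
]
coded, key = code_records(records, id_field="name")
```

The `coded` rows no longer carry the `name` field, while `key` still allows whoever holds it to link each code back to a name. That retained link is what distinguishes coded data from data meeting the enhanced definition of anonymized data.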

    25

    Covered Entity

    (“HIPAA Covered Entity”)

    Covered entity means:

    (1) A health plan.

    (2) A health care clearinghouse.

    (3) A health care provider who transmits any health information in electronic form in connection with a transaction covered by this subchapter (45 CFR 160.102).

    *Applicability: HIPAA

    Uses: This definition should only be used to determine whether an institution (entity) is subject to HIPAA regulations.

    Sage is not a covered entity. Covered entities are generally organizations engaged in health care operations that cause them to be subject to HIPAA laws.

    Related Definitions: Hybrid Entity, Business Associate

    Definition Source: 45 CFR 160.103

    26

    Creative Commons License

    One of several public copyright licenses that enable the free distribution of an otherwise copyrighted work and is used when an author wants to give other people the right to share, use, and build upon a work that the author has created.

    *Applicability: Synapse, General

    This is required for most data in the Open Access Data Tier.

    https://creativecommons.org/licenses/

    27

    Data Access Committee (DAC)

    An individual or group who reviews and approves or rejects applications or requests for access to and use of data governed by a managed AR.

    *Applicability: Synapse, General

    For Synapse communities involving Sage services (such as data curation), a data ingress agreement may be required before a Data Contributor can upload their data.

    25

    Data Incident

    (new)

    An occurrence that (1) actually or imminently jeopardizes the integrity, confidentiality, or availability of information or an information system, or (2) constitutes a violation or imminent threat of violation of law, security policies, security procedures, or acceptable use policies.

    *Applicability: General

    This definition is adopted from Office of Management and Budget; however, it does not carry regulatory weight.

    Definition Source: OMB M-17-12

    26

    Data Ingress Agreements

    (new)

    A general term to represent multiple types of agreements that Sage may engage in with external parties as part of data ingress.

    *Applicability: General

    This term represents:

    • Data Processing Agreement (DPA)

    • Data Sharing Agreement (DSA)

    • Data Sharing Permission (DSP) - this is a Sage term that is not commonly used externally.

    • Data Transfer Agreement (DTA)

    • Data Transfer and Use Agreement (DTUA)

    • Data Use Agreement (DUA) - See separate definition. Note that DUAs that are issued for HIPAA Limited Data Sets must meet specific requirements as dictated by HIPAA. “DUA” should be reserved for HIPAA-regulated data sets unless the countersigning party has a preference for using this term.

    • Materials Transfer Agreement (MTA)

    • Memorandum of Understanding (MOU)

    Sage engages in a variety of agreements with external customers and partners to define: the expectations for providing data to Synapse; the roles and responsibilities each party takes to manage data; the conditions under which data will be shared with other users and/or institutions from a Sage platform (e.g., authorized persons, access tiers, security boundaries); and the roles and responsibilities that Sage may take on for reviewing access requests. The scope and applicability of these agreements depend on a number of project-specific factors, including participant consent, data types, contractual obligations, institutional policies, rules and regulations, funder mandate, and/or research community sharing expectations.

    A data ingress agreement is required for institutions contributing data to a Synapse community, and/or for institutions that are having Sage manage data access for them. Sage Governance may attempt to use a standard template to meet the agreement needs (e.g., using an FDP template), but the type and content of the agreement can vary widely depending on the nature of the data, the scope of work, and the preferences of the institution.

    Note that a grant or other existing agreement such as a Data Use Agreement (DUA) can take the place of an additional ingress agreement as long as it is signed by an Institutional Signing Official and the existing document mentions that data will be stored in a repository matching the project’s access controls.

    28

    Data Repository

    (new)

    A database of research data maintained for the purpose of performing secondary research.

    *Applicability: General

    Additional synonyms: banks, registries, libraries

    Data repository activities can include data curation and data maintenance (i.e., “data management”), and access management. A data repository containing de-identified data is not “research,” though the downstream product of the repository is for research.

    29

    Data Requesters

    (updated)

    An individual who submits an access request for Synapse data and all other individuals who are listed on such request.

    *Applicability: Synapse

    The Data Requesters listed on a Synapse data access request should exactly match the Data Requesters as listed on the associated Data Use Certificate (DUC) if applicable for a specific Access Requirement (AR).

    30

    Data Sharing

    (new)

    The act of making scientific data available for use by others (e.g., the larger research community, institutions, the broader public), for example, via an established repository.

    *Applicability: NIH, General

    Definition Source: NIH NOT-OD-21-013 (Data Sharing and Management Plans)

    31

    Data Sharing and Management Plan (DSMP)

    (new)

    A plan describing the data management, preservation, and sharing of scientific data and accompanying metadata.

    *Applicability: NIH, General

    See also NOT-OD-21-014: Supplemental Information to the NIH Policy for Data Management and Sharing: Elements of an NIH Data Management and Sharing Plan

    Source: NIH NOT-OD-21-013 (Data Sharing and Management Plans)

    32

    Data Subject

    Identified or identifiable living individual to whom personal data relates.

    *Applicability: GDPR

    Uses: This definition need only be used when working with data subject to GDPR.

    Related Definitions: Human Subject

    Definition Source: GDPR Article 4

    33

    Data Management

    (new)

    The Access and Compliance Team (ACT) serves as the Sage Data Access Committee (DAC).

    28

    Data Concerning Health

    Personal data (see below) related to the physical or mental health of a natural person, including the provision of health care services, which reveal information about his or her health status.

    *Applicability: GDPR

    Uses: This definition need only be used when working with data subject to GDPR.

    Related Definition (HIPAA): Health Information

    Definition Source: GDPR Article 4

    29

    Data Contributor

    The owner (individual, group or institution) of data (which may include analysis and/or tools) who uploads data content to Synapse.

    *Applicability: Synapse, General

    For Synapse communities involving Sage services (such as data curation), a Data Ingress / Egress Agreement may be required before a Data Contributor can upload their data.

    30

    Data Disposition

    The process of determining when and how data is retained, archived, or deleted. It involves making decisions about what to do with data based on factors such as its value, relevance, and legal or regulatory requirements.

    *Applicability: Synapse, General

    31

    Data Disposition View

    A tool, such as a Synapse fileview or materialized view (if data spans multiple projects), .csv file, or R or Python script that enumerates synIDs, which is created by Sage to allow a Data Contributor to easily view and confirm their contributed data for data migration and/or removal from Synapse.

    *Applicability: Synapse, General
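As an illustration of the "Python script that enumerates synIDs" option, the following is a minimal sketch rather than Sage's actual tooling. The record layout, the `build_disposition_view` function, the `disposition` column, and the placeholder synIDs are all assumptions made for this example. A real view would draw its rows from Synapse instead of a hard-coded list.

```python
import csv
import io


def build_disposition_view(records, contributor_project_ids):
    """Return CSV text listing the synIDs held in the contributor's projects."""
    fieldnames = ["syn_id", "name", "project", "disposition"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for record in sorted(records, key=lambda r: r["syn_id"]):
        if record["project"] in contributor_project_ids:
            # Every row starts unconfirmed; the Data Contributor reviews it later.
            writer.writerow({**record, "disposition": "unconfirmed"})
    return buffer.getvalue()


# Hypothetical listing of files (placeholder synIDs, not real entities).
example_records = [
    {"syn_id": "syn0000002", "name": "clinical.csv", "project": "syn0000100"},
    {"syn_id": "syn0000001", "name": "samples.csv", "project": "syn0000100"},
    {"syn_id": "syn0000003", "name": "other.csv", "project": "syn0000200"},
]

if __name__ == "__main__":
    print(build_disposition_view(example_records, {"syn0000100"}))
```

The CSV output matches the ".csv file" form of the view, so the same rows could be shared with a Data Contributor who does not use Python.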

    32

    Data Encryption Key

    A secret value used to encrypt data and to decrypt (unscramble) encrypted data so that it is readable.

    *Applicability: Synapse, General

    33

    Data Governance

    Policies, procedures, and controls of research assets including access management, safe and responsible use, supporting interoperability, and contributing to overall data lifecycle management. Data Governance:

    1) ensures data is managed, protected, and utilized effectively and responsibly within an organization; and

    2) ensures the availability, usability, integrity, and security of data.

    *Applicability: General

    Who is involved: ACT, IT and Security, Platform engineers, Privacy officers

    34

    Data Incident

    An occurrence that (1) actually or imminently jeopardizes the integrity, confidentiality, or availability of information or an information system, or (2) constitutes a violation or imminent threat of violation of law, security policies, security procedures, or acceptable use policies.

    *Applicability: General

    This definition is adopted from the Office of Management and Budget (OMB); however, it does not carry regulatory weight.

    Definition Source: OMB M-17-12

    35

    Data Ingress / Egress Agreement

    A formal contract between Sage and one or more external parties that outlines the terms and conditions under which data is allowed to enter or be imported into Synapse and the intentions, responsibilities, and roles of each party.

    *Applicability: General

    This term represents:

    • Data Processing Agreement (DPA)

    • Data Sharing Agreement (DSA)

    • Data Sharing Permission (DSP) - this is a Sage term that is not commonly used externally.

    • Data Transfer Agreement (DTA)

    • Data Transfer and Use Agreement (DTUA)

    • Data Use Agreement (DUA) - See separate definition. Note that DUAs that are issued for HIPAA Limited Data Sets must meet specific requirements as dictated by HIPAA. “DUA” should be reserved for HIPAA-regulated data sets unless the countersigning party has a preference for using this term.

    • Materials Transfer Agreement (MTA)

    • Memorandum of Understanding (MOU)

    Sage engages in a variety of agreements with external customers and partners to define: the expectations for providing data to Synapse; the roles and responsibilities each party takes to manage data; the conditions under which data will be shared with other users and/or institutions from a Sage platform (e.g., authorized persons, access tiers, security boundaries); and the roles and responsibilities that Sage may take on for reviewing access requests. The scope and applicability of these agreements depend on a number of project-specific factors, including participant consent, data types, contractual obligations, institutional policies, rules and regulations, funder mandates, and/or research community sharing expectations.

    A Data Ingress / Egress Agreement is required for institutions contributing data to a Synapse community, and/or for institutions that are having Sage manage data access for them. Sage Governance may attempt to use a standard template to meet the agreement needs (e.g., using an FDP template), but the type and content of the agreement can vary widely depending on the nature of the data, the scope of work, and the preferences of the institution.

    Note that a grant or other existing agreement such as a Data Use Agreement (DUA) can take the place of an additional Data Ingress / Egress Agreement as long as it is signed by an Institutional Signing Official and the existing document mentions that data will be stored in a repository matching the project’s access controls.

    36

    Data Landscape Survey

    Phase of initial engagement between the Data Coordination Center (DCC), data contributors, and collaborators, which involves defining parameters for receiving and classifying data.

    *Applicability: General

    37

    Data Management

    The process of validating, organizing, protecting, maintaining, and processing scientific data to ensure the accessibility, reliability, and quality of the scientific data for its users.

    *Applicability: General

    Definition Source: NIH NOT-OD-21-013 (Data Sharing and Management Plans)

    38

    Data Migration

    A Data Disposition option which involves the retention and relocation of the Data Contributor’s data on Synapse.

    *Applicability: Synapse, General

    39

    Data Protection Impact Assessment (DPIA)

    A tool used to identify the risks and impacts arising out of the processing of personal data and to build awareness so that these risks can be minimized as much and as early as possible.

    *Applicability: General, GDPR

    This is a general tool that may be used at Sage regardless of the regulatory oversight.

    A Data Protection Impact Assessment (DPIA) is required under the GDPR any time a new project is initiated that is likely to involve “a high risk” to personal information. (More HERE.)

    Sage Data Protection Policy

    GDPR Article 35

    40

    Data Repository

    A database of research data maintained for the purpose of performing secondary research.

    *Applicability: General, HIPAA

    Uses: This definition has broad uses, but the HIPAA definition is for specific circumstances where a covered entity is disclosing a Limited Data Set to another institution.

    • Non-HIPAA uses of the term generally refer to data sharing agreements between institutions and may be synonymous with “DTA,” “DSA,” “MOU,” and similar agreements to govern data sharing.

    • HIPAA: DUAs under HIPAA must meet specific regulatory requirements. The terms of the DUA define the allowed uses. HIPAA regulations prohibit the recipient from further disclosing or using the information in a manner that would violate HIPAA regulations or the agreement. Recipients under the agreement are required to use appropriate safeguards to prevent use or disclosure of information outside of the defined terms of the agreement.

    Definition Sources:

    General: UPitt Office of Sponsored Programs

    HIPAA: 45 CFR 164.514(e)(4)

    35

    Data Use Certificate (DUC)

    (updated)

    A documented agreement outlining the terms of use for accessing a specific Synapse dataset, which must be signed by the Data Requester(s) and often also requires the signature of an institutional Signing Official.

    *Applicability: Synapse

    Managed ARs can be created to require submission of a Data Use Certificate (DUC) for data access.

    36

    De-identification

    De-Identified Data

    (new)

    (1) Non-HIPAA/General:

    Information that has had personally identifiable information (PII), including PHI, removed.

    (2) HIPAA Safe Harbor Method:

    (i) Removal of the 18 identifiers defined in 45 CFR 164.514(b)(2)(i)(A)-(R) [paraphrased]

    and

    (ii) The covered entity does not have actual knowledge that the information could be used alone or in combination with other information to identify an individual who is a subject of the information.

    (3) HIPAA Expert Determination/Statistical Method:

    A person with appropriate knowledge of and experience with generally accepted statistical and scientific principles and methods for rendering information not individually identifiable:

    (i) Applying such principles and methods, determines that the risk is very small that the information could be used, alone or in combination with other reasonably available information, by an anticipated recipient to identify an individual who is a subject of the information; and

    (ii) Documents the methods and results of the analysis that justify such determination.
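The mechanical part of the Safe Harbor method in (2)(i), removing identifier fields, can be sketched as follows. This is only an illustration under stated assumptions: the field names are hypothetical, the set below is not the full list of 18 identifiers in 45 CFR 164.514(b)(2)(i), and the code does not address the "actual knowledge" condition in (2)(ii) or the Expert Determination method in (3).

```python
# Hypothetical field names standing in for a few identifier categories.
# This is NOT the complete regulatory list of 18 identifiers.
EXAMPLE_IDENTIFIER_FIELDS = frozenset(
    {"name", "street_address", "phone", "email", "ssn", "medical_record_number"}
)


def strip_identifier_fields(record, identifier_fields=EXAMPLE_IDENTIFIER_FIELDS):
    """Return a copy of the record without the listed identifier fields."""
    return {
        key: value
        for key, value in record.items()
        if key not in identifier_fields
    }


# Hypothetical record; the original dictionary is left unchanged.
example_record = {
    "name": "Jane Doe",
    "ssn": "000-00-0000",
    "diagnosis_code": "E11.9",
    "visit_year": 2020,
}

if __name__ == "__main__":
    print(strip_identifier_fields(example_record))
```

Dropping named fields does not by itself make data de-identified. Free-text fields, dates, and small geographic units can still carry identifiers and need separate review.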

    Definition Sources:

    (2) HIPAA Safe Harbor Method: 45 CFR 164.514(b)(2)

    (3) HIPAA Expert Determination/Statistical Method: 45 CFR 164.514(b)(1)

    37

    Federated Query Governance Structure

    Data are housed in a variety of locations, and users are able to query those local data simultaneously. Typically restricted to pre-configured queries (rather than data exploration) and may require registration before use.

    *Applicability: General

    38

    FISMA

    (Federal Information Security Management Act of 2002 and Federal Information Security Modernization Act of 2014)

    (new)

    A U.S. federal law (FISMA 2002) which requires each federal agency to develop, document, and implement an agency-wide program to provide information security for the information and systems that support the operations and assets of the agency, including those provided or managed by another agency, contractor, or other sources.

    FISMA 2014 amends FISMA 2002 by modernizing federal security practices to address evolving security concerns. The changes reduce overall reporting, strengthen the use of continuous monitoring in systems, and increase the focus on agency compliance and on reporting that centers on the issues caused by security incidents.

    FISMA 2014 also required the Office of Management and Budget (OMB) to amend/revise OMB Circular A-130 to eliminate inefficient and wasteful reporting and reflect changes in law and technological advances.

    *Applicability: Synapse, General

    Synapse is a FISMA-compliant platform. See the Synapse Platform page for more information.

    Federal Information Security Management Act (FISMA)

    39

    General Data Protection Regulation (GDPR)

    Rules and privacy regulations governing data in the European Union (EU). GDPR establishes personal data privacy protections as a fundamental right.

    *Applicability: GDPR

    Full text of GDPR: https://gdpr.eu/tag/gdpr/

    40

    Genetic Data

    (new)

    Personal data (see below) relating to the inherited or acquired genetic characteristics of a natural person which give unique information about the physiology or the health of that natural person and which result, in particular, from an analysis of a biological sample from the natural person in question.

    *Applicability: GDPR, General

    Uses: This definition may be used broadly outside of the scope of GDPR.

    Definition Source: GDPR Article 4

    41

    Governance Structures

    Governance Models

    (new)

    The data sharing framework that dictates what data to acquire, how to bring them into systems, how to store them, how to analyze them, and how to share downstream knowledge.

    *Applicability: General

    Types of Governance Structures:

    • Pairwise (One-to-one): Two parties agree to work together on and/or share a data set in some fashion, typically with a closed contract or an informal agreement. The negotiation terms depend on the relative status of the parties and/or the value of the data and knowledge.

    • Open Source (One-to-many or some-to-many): Data are distributed for reuse with a license defining reuse rights and conditions. The creator is in charge of the negotiation at first (choice of license), but then rights to analyze and redistribute are permanently transferred to the user. This is typical of a centralized project in the sciences, e.g., the Human Genome Project.

    • Federated Query (Many-to-many, via platform): Data are housed in a variety of locations, and users are able to query those local data simultaneously. Typically restricted to pre-configured queries (rather than data exploration) and may require registration before use.

    • Trusted research environment (Many-to-some): Data are housed in a central location under a contractual regime including data transfer and use agreements. Users apply to use the data. Users must “visit” the data rather than download them, agree to be known, and, in some cases, agree to be surveilled by a data steward.

    • Model-to-data (One-to-many): Data are held by a steward who is responsible for running algorithms on the behalf of researchers. In some cases, a synthetic version of the data may be released openly to facilitate model training. Researchers develop algorithms, send them to the steward, and receive back output of their analysis as run on the real dataset. The variety of analyses that may be performed is restricted by this structure, because the data steward must ensure data are specifically curated for any analytical question at hand.

    • Open citizen science (Many-to-many): Rights to use and distribute data are often fully decentralized via license or contract. Open citizen science is a peer-to-peer version of open source science.

    • Clubs and Trusts (Some-to-some): Clubs and Trusts are versions of a common pool resource: a group of people and/or institutions who agree to share resources towards a common goal. Control over the development and negotiation of data sharing and use terms is often held by the founders/settlers (and/or funders) and then can be distributed amongst club participants. Importantly, clubs that operate in the cloud can easily publish data products that are more “open” than the club itself.

    • Closed: Data are held privately by a single party.

    • Closed and Restricted: Data are held privately in order to protect a population, meet a legal requirement, or protect a secret.

    Mangravite, Lara M., Avery Sen, John T. Wilbanks, and Sage Bionetworks Team. Mechanisms to Govern Responsible Sharing of Open Data: A Progress Report. Manubot, 2020. https://github.com/Sage-Bionetworks/governanceGreenPaper/tree/3c2a648b892d8c672a3043c4bacda65505947921

    42

    Health Information

    (new)

    Any information, including genetic information, whether oral or recorded in any form or medium, that:

    (1) Is created or received by a health care provider, health plan, public health authority, employer, life insurer, school or university, or health care clearinghouse; and

    (2) Relates to the past, present, or future physical or mental health or condition of an individual; the provision of health care to an individual; or the past, present, or future payment for the provision of health care to an individual.

    *Applicability: HIPAA, General

    Uses: This definition may be used broadly, but sub-definition (1) can be omitted if the use is not within the scope of HIPAA-regulated activities.

    Related Definition (GDPR): Data Concerning Health

    Definition Source: 45 CFR 160.103

    43

    Health Insurance Portability and Accountability Act (HIPAA)

    (revised)

    U.S. health information privacy law. The HIPAA legislation resulted in regulations that are collectively referred to as “HIPAA” and are made up of the “Privacy Rule,” “Security Rule,” and “Enforcement Rule.”

    *Applicability: HIPAA

    HIPAA Legislation:

    https://www.govinfo.gov/content/pkg/PLAW-104publ191/pdf/PLAW-104publ191.pdf

    Combined HIPAA Regulations:

    https://www.hhs.gov/sites/default/files/ocr/privacy/hipaa/administrative/combined/hipaa-simplification-201303.pdf

    44

    Human Subject

    Research Participant

    (new)

    A living individual about whom an investigator (whether professional or student) conducting research:

    (i) Obtains information or biospecimens through interaction or intervention with the individual, and uses, studies, or analyzes the information or biospecimens, or

    (ii) Obtains, uses, studies, analyzes, or generates identifiable private information or identifiable biospecimens.

    *Applicability: Common Rule, FDA Regulations

    Uses: This definition is used primarily to determine whether information, interactions, interventions, or biospecimens used for research purposes are subject to human subjects regulations (i.e., whether IRB review is required).

    This is a truncated definition. Contact Governance for an in-depth discussion.

    Definition Source: 45 CFR 46.102(e) (2018 revision)

    See also: Chart 01: Is an Activity Human Subjects Research Covered by 45 CFR Part 46?

    https://www.hhs.gov/ohrp/regulations-and-policy/decision-charts-2018/index.html#c1

    45

    Hybrid Entity

    (new)

    A single legal entity:

    (1) That is a covered entity;

    (2) Whose business activities include both covered and non-covered functions; and

    (3) That designates health care components in accordance with paragraph 164.105(a)(2)(iii)(D) of HIPAA regulations.

    *Applicability: HIPAA

    Uses: This definition only applies to HIPAA-regulated organizations.

    A typical example of a hybrid entity is a university with an affiliated teaching hospital. The hospital portion of the organization performs HIPAA-covered health care functions, while the rest of the university performs non-covered functions.

    Definition Source: 45 CFR 164.103

    46

    Identifiable Data/Information

    (new)

    (1) Common Rule:

    Data for which the identities of the source subjects are or may readily be ascertained by the investigator or associated with the information.

    (2) NIH:

    Data that are still attached to a readily available subject identifier such as name, social security number, study number, hospital number, medical record number, address, telephone number, etc., such that the identity of the subject can be ascertained.

    (3) GDPR (“Identifiable Natural Person”):

    One who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person.

    *Applicability: Common Rule, NIH, General

    Uses:

    U.S. federal policies treat “identifiable” as meaning that the identities of the subjects can be readily ascertained or that there are readily available identifiers attached to the data that would allow individual subject identities to be ascertained. When navigating the applicability of federal policies and regulations, the definition provided by the regulatory source should be applied.

    The GDPR definition of an “identifiable natural person” goes beyond the U.S. references to traditional identifiers (like name, address, phone number or SSN), and includes reference to “one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity” of the subject.

    At Sage, we recognize the need to combine many definition sources when evaluating factors such as the level to which data is identifiable, data sensitivity, and the risk of re-identification. In practice, Sage Governance will always apply the definition applicable to the specific laws and regulations of the data, but will take a more protective stance whenever feasible. When evaluating data outside the scope of a specific regulatory question, Sage should apply the NIH definition. While the evaluation of data sensitivity and risk should include the combined nature of the individual factors listed in the GDPR definition, Sage will not label data “identifiable” due to these factors alone unless GDPR applies.

    For GDPR-regulated data, also see Personal Data.

    For HIPAA-regulated data, also see Individually Identifiable Health Information (IIHI).

    Related Definitions: Personally Identifiable Information (PII), Coded Data, De-identified Data

    Definition Sources:

    Common Rule: 45 CFR 46.102(e)(5)

    NIH: 3016 - Intramural Research Program Human Data Sharing (HDS) Policy

    GDPR: GDPR Article 4

    47

    Individually Identifiable Health Information (IIHI)

    (new)

    Individually identifiable health information is information that is a subset of health information, including demographic information collected from an individual, and:

    (1) Is created or received by a health care provider, health plan, employer, or health care clearinghouse; and

    (2) Relates to the past, present, or future physical or mental health or condition of an individual; the provision of health care to an individual; or the past, present, or future payment for the provision of health care to an individual; and

    (i) That identifies the individual; or

    (ii) With respect to which there is a reasonable basis to believe the information can be used to identify the individual

    *Applicability: HIPAA

    Uses: This definition need only be used when working with data subject to HIPAA regulations.

    Definition Source: 45 CFR 160.103

    48

    Informed Consent

    (revised)

    The process of informed consent is a fundamental mechanism to ensure respect for persons through the provision of thoughtful consent for a voluntary act.

    49

    Informed Consent Form (ICF)

    Informed Consent Document (ICD)

    (new)

    Informed consent forms are written documents presented as part of an informed consent process when enrolling a human subject in research.

    *Applicability: General

    Informed consent forms must meet specific requirements defined by the regulations.

    Informed consent is not the same as “HIPAA Authorization,” though some institutions may allow these distinct documents to be combined.

    Informed consent forms often include restrictions on data sharing and future use limitations. Informed consent forms therefore help to establish Conditions for Data Use within Synapse.

    Elements of informed consent are defined by the regulations at 45 CFR 46.116 (Common Rule) and 21 CFR 50.25 (for FDA-regulated studies).

    Documentation requirements for informed consent are defined by the regulations at 45 CFR 46.117 (Common Rule) and 21 CFR 50.27 (for FDA-regulated studies).

    50

    Intended Data Use Statement (IDU)

    A description of the research purpose for using requested Synapse data.

    *Applicability: Synapse

    IDUs can be required to access certain data via a Managed AR. They are often posted publicly on Synapse wiki pages or portal pages.

    51

    Institutional Review Board (IRB)

    (revised)

    An independent body constituted of medical, scientific, and nonscientific members, whose responsibility it is to ensure the protection of the rights, safety, and well-being of human subjects by, among other things, reviewing, approving, and providing continuing review of protocols, amendments, and the methods and material to be used in obtaining and documenting informed consent of the research subjects.

    *Applicability: General

    IRB approval may be required to access certain data via a Managed AR.

    Adapted from ICH E6(R2) 1.31 Good Clinical Practice

    52

    Interconnection Security Agreement (ISA)

    (new)

    An ISA captures the technical and security requirements to establish and maintain the interconnection between any two or more systems.

    *Applicability: NIH

    Federal policy recommends that agencies develop Interconnection Security Agreements (ISAs) when information is exchanged with another organization via a system interconnection. This is a FISMA-required document discussing security-relevant aspects of an intended connection between a federal agency system and an external system.

    Reference: NIST

    53

    Legacy Project

    (new)

    Term used for Synapse Data Coordination Center (DCC) projects that are no longer actively funded, yet require Sage’s continued support, maintenance and closure, as needed. Work completed in support of such projects is funded through indirect funds.

    *Applicability: General, Synapse

    54

    Limited Data Set

    “HIPAA Limited Data Set”

    (revised)

    A limited data set is protected health information (PHI) that excludes the direct identifiers listed in 45 CFR 164.514(e)(2).

    In simplified terms, a limited data set may retain one or more of the following identifiers:

    • dates such as admission, discharge, date of service, date of birth, date of death;

    • city, state, five-digit or longer ZIP code; and

    • ages in years, months, days, or hours (including ages over 89).

    *Applicability: HIPAA

    Uses: The term “Limited Data Set” is only truly applicable when:

    1. The data was created or received by a covered entity,

    2. The data was stripped of all identifiers except one or more of the identifiers listed in the definition of this term, AND

    3. There is a Data Use Agreement in place meeting the requirements specified by HIPAA regulations.
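The three conditions above can be read as a single conjunction. The following sketch expresses that reading. The `DatasetFacts` type, its field names, and the category labels are hypothetical and chosen only for illustration. Whether a data set with none of the permitted identifiers remaining still counts is left open by the text; the sketch accepts it. The check is a reading aid, not a regulatory determination.

```python
from dataclasses import dataclass

# Hypothetical labels for the identifier categories a limited data set may keep.
PERMITTED_IDENTIFIERS = frozenset({"dates", "city", "state", "zip_code", "age"})


@dataclass(frozen=True)
class DatasetFacts:
    created_or_received_by_covered_entity: bool
    remaining_identifiers: frozenset
    has_hipaa_compliant_dua: bool


def term_limited_data_set_applies(facts):
    """Return True only when all three conditions hold together."""
    only_permitted = facts.remaining_identifiers <= PERMITTED_IDENTIFIERS
    return (
        facts.created_or_received_by_covered_entity
        and only_permitted
        and facts.has_hipaa_compliant_dua
    )


if __name__ == "__main__":
    facts = DatasetFacts(True, frozenset({"dates", "zip_code"}), True)
    print(term_limited_data_set_applies(facts))
```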

    At Sage, “Limited Data Set” is used broadly as Limited Data Sets are recognized benchmarks in de-identification in the U.S.; however, it is important to be aware of the regulatory applicability. Whereas de-identification of PHI (via the HIPAA Safe Harbor or Expert Determination methods) can convert data into a non-PHI state, Limited Data Sets remain as PHI with the DUA serving as the additional protection.

    Generally, Limited Data Sets should be categorized in the Controlled Access Data Tier.

    Definition Source: 45 CFR 164.514(e)

    55

    Managed Access Requirement (AR)

    (updated)

    An Access Requirement that requires data access to be granted via the Synapse Access and Compliance Team (ACT) and/or Data Access Committee (DAC).

    *Applicability: Synapse

    ACT often implements Managed ARs on data categorized in the Controlled Access Tier. Managed ARs often consist of:

    1. Data Access Application.

    2. One or more of the following: intended data use statement, IRB approval letter, or data use certificate.

    3. Requirement for data accessors to be registered, certified or validated.
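To make the typical components of a Managed AR concrete, the following sketch reports what a submission still lacks. It is a hypothetical model, not the Synapse implementation: the dictionary keys, the `missing_items` function, and the idea of passing the required supporting documents as a list are all assumptions made for this example.

```python
# Hypothetical keys for the supporting documents named in item 2 above.
SUPPORTING_DOCUMENTS = (
    "intended_data_use_statement",
    "irb_approval_letter",
    "data_use_certificate",
)


def missing_items(submission, required_documents):
    """List the components a hypothetical submission is still missing."""
    missing = []
    if not submission.get("data_access_application"):
        missing.append("data_access_application")
    for document in required_documents:
        if document not in SUPPORTING_DOCUMENTS:
            raise ValueError(f"unknown supporting document: {document}")
        if not submission.get(document):
            missing.append(document)
    # Item 3: the accessor is registered, certified, or validated as required.
    if not submission.get("accessor_status_confirmed"):
        missing.append("accessor_status_confirmed")
    return missing


if __name__ == "__main__":
    submission = {"data_access_application": True, "irb_approval_letter": True}
    print(missing_items(submission, ["irb_approval_letter", "data_use_certificate"]))
```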

    56

    Metadata

    (new)

    Data that provide additional information intended to make scientific data interpretable and reusable (e.g., date, independent sample and variable construction and description, methodology, data provenance, data transformations, any intermediate or descriptive observational variables).

    *Applicability: NIH, General

    Definition Source: NIH NOT-OD-21-013 (Data Sharing and Management Plans)

    57

    Model-to-Data Governance Structure

    Data are held by a steward who is responsible for running algorithms on the behalf of researchers. In some cases, a synthetic version of the data may be released openly to facilitate model training. Researchers develop algorithms, send them to the steward, and receive back output of their analysis as run on the real dataset. The variety of analyses that may be performed is restricted by this structure, because the data steward must ensure data are specifically curated for any analytical question at hand.

    *Applicability: General

    58

    Open Source Governance Structure

    Data are distributed for reuse with a license defining reuse rights and conditions. The creator is in charge of the negotiation at first (choice of license), but then rights to analyze and redistribute are permanently transferred to the user.

    *Applicability: Synapse

    This governance structure is typical of a centralized project in the sciences, e.g., the Human Genome Project.

    59

    Pairwise Governance Structure

    Two parties agree to work together on and/or share a data set in some fashion, typically with a closed contract (data ingress agreement) or an informal agreement. The negotiation terms depend on the relative status of the parties and/or the value of the data and knowledge.

    *Applicability: General

    60

    Personal Data

    (new)

    Personal data means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person.

    *Applicability: GDPR

    Uses: This definition should only be applied to GDPR-regulated data. See Identifiable Data/Information for more discussion and related terms.

    Definition Source: GDPR Article 4

    61

    Personally Identifiable Information (PII)

    (new)

    Information that can be used to distinguish or trace an individual’s identity, either alone or when combined with other information that is linked or linkable to a specific individual.

    (Because there are many different types of information that can be used to distinguish or trace an individual identity, the term PII is necessarily broad.)

    *Applicability: General

    PII is not defined officially by the U.S. government through legislative or regulatory bodies, but has been offered through Office of Management and Budget (OMB) memoranda.

    To determine whether information is PII, the OMB has recommended to executive agencies that they should perform assessments of the specific risk that an individual can be identified using the information with other information that is linked or linkable to the individual. This is because information that is not PII can become PII whenever additional information becomes available - in any medium or from any source - that would make it possible to identify an individual.

    Definition Source: OMB M-17-12

    62

    Pseudonymization

    (new)

    Additional synonyms: banks, registries, libraries

    Data repository activities can include data curation and data maintenance (i.e., “data management”), and access management. A data repository containing de-identified data is not “research,” though the downstream product of the repository is for research.

    41

    Data Requester

    All individuals listed on a Synapse Access Request for access to data.

    *Applicability: Synapse

    When applicable for a Managed Access Requirement (AR), all Data Requesters listed on a Synapse data Access Request should exactly match the Data Requesters listed on the associated Data Use Certificate (DUC).
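
    As a rough sketch of that matching rule, the check below compares the two lists of names. The function name `requesters_match` is hypothetical, and ignoring letter case and extra whitespace is an assumption made here for illustration; a reviewer may require a stricter comparison.

```python
def requesters_match(access_request: list[str], duc: list[str]) -> bool:
    """Return True when both documents list exactly the same people."""

    def normalize(name: str) -> str:
        # Assumption: case and repeated whitespace are not meaningful.
        return " ".join(name.split()).casefold()

    # Sorting makes the comparison independent of listing order while
    # still catching missing, extra, or duplicated names.
    return sorted(map(normalize, access_request)) == sorted(map(normalize, duc))


same_people = requesters_match(
    ["Ada Lovelace", "Alan Turing"],
    ["alan turing", "Ada  Lovelace"],
)
missing_person = requesters_match(["Ada Lovelace", "Alan Turing"], ["Ada Lovelace"])
```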

    42

    Data Roadmap

    An evidence-driven data plan, developed in the Data Landscape Phase by a DCC Team, which will be updated in subsequent stages as the data landscape changes and expands.

    *Applicability: General

    The Data Roadmap will answer the following questions:

    • who is contributing data

    • what types of data

    • how much (e.g., expected number of files, samples and individuals per data type)

    • who owns the data or has ability to approve sharing it

    • who should have access to the data

    • where is the data currently

    • where will it be stored

    • where the data will be analyzed (in the case of cloud computing)

    • when will it be transferred to the DCC

    • when should the DCC expect to release the data

    • how to communicate with the DCC regarding data

    • the governance conditions for sharing and using the data

    43

    Data Sharing

    The act of making scientific data available for use by others (e.g., the larger research community, institutions, the broader public), for example, via an established repository.

    *Applicability: NIH, General

    Definition Source: NIH NOT-OD-21-013 (Data Sharing and Management Plans)

    44

    Data Sharing and Management Plan (DSMP)

    A plan describing the data management, preservation, and sharing of scientific data and accompanying metadata.

    *Applicability: NIH, General

    See also NOT-OD-21-014: Supplemental Information to the NIH Policy for Data Management and Sharing: Elements of an NIH Data Management and Sharing Plan

    Source: NIH NOT-OD-21-013 (Data Sharing and Management Plans)

    45

    Data Subject

    Identified or identifiable living individual to whom personal data relates.

    *Applicability: GDPR

    Uses: This definition need only be used when working with data subject to GDPR.

    Related Definitions: Human Subject

    Definition Source: GDPR Article 4

    46

    Data Use Agreement (DUA)

    (1) General Applicability:

    A contractual document used for the transfer of data that has been developed by nonprofit, government or private industry, where the data are nonpublic or are otherwise subject to some restrictions on their use.

    (2) HIPAA Applicability:

    An agreement between a covered entity and a limited data set recipient to establish permitted uses and disclosures by the recipient.

    *Applicability: General, HIPAA

    Uses: This definition has broad uses, but the HIPAA definition is for specific circumstances where a covered entity is disclosing a Limited Data Set to another institution.

    • Non-HIPAA uses of the term generally refer to data sharing agreements between institutions and may be synonymous with “DTA,” “DSA,” “MOU,” and similar agreements to govern data sharing.

    • HIPAA: DUAs under HIPAA must meet specific regulatory requirements. The terms of the DUA define the allowed uses. HIPAA regulations prohibit the recipient from further disclosing or using the information in a manner that would violate HIPAA regulations or the agreement. Recipients under the agreement are required to use appropriate safeguards to prevent use or disclosure of information outside of the defined terms of the agreement.

    Definition Sources:

    General: UPitt Office of Sponsored Programs

    HIPAA: 45 CFR 164.514(e)(4)

    47

    Data Use Certificate (DUC)

    A documented agreement outlining the terms of use for accessing a specific Synapse dataset, which must be signed by the Data Requester(s) and often also requires the signature of an institutional Signing Official.

    *Applicability: Synapse

    Managed ARs can be created to require submission of a Data Use Certificate (DUC) for data access.

    48

    Data Use Ontology (DUO) Codes

    Standardized terms used to specify permissible data uses and access restrictions for shared genomic and health-related datasets to help ensure that researchers and data users comply with ethical and legal requirements, particularly regarding the privacy and consent preferences of data subjects (the individuals whose data is being shared).

    *Applicability: General

    DUO Codes simplify the interpretation of Data Ingress / Egress Agreements by providing a consistent framework, making it easier for data providers, access committees, and researchers to understand what uses of the data are allowed. Examples of DUO Codes include terms like "general research use," "disease-specific research," or "not-for-profit use."

    49

    De-identification

    De-Identified Data

    (1) Non-HIPAA/General:

    Information that has had personally identifiable information (PII), including PHI, removed.

    (2) HIPAA Safe Harbor Method:

    (i) Removal of the 18 identifiers defined in 45 CFR 164.514(b)(2)(i)(A)-(R) [paraphrased]

    and

    (ii) The covered entity does not have actual knowledge that the information could be used alone or in combination with other information to identify an individual who is a subject of the information.

    (3) HIPAA Expert Determination/Statistical Method:

    A person with appropriate knowledge of and experience with generally accepted statistical and scientific principles and methods for rendering information not individually identifiable:

    (i) Applying such principles and methods, determines that the risk is very small that the information could be used, alone or in combination with other reasonably available information, by an anticipated recipient to identify an individual who is a subject of the information; and

    (ii) Documents the methods and results of the analysis that justify such determination.

    *Applicability: HIPAA, General

    Uses:

    Non-HIPAA/General Considerations:

    HIPAA’s de-identification standards are over 20 years old and numerous studies have demonstrated many ways in which data labeled as de-identified can be re-identified.

    The U.S. Department of Health and Human Services (“HHS”) Secretary’s Advisory Committee on Human Research Protections (“SACHRP”) has noted, for example:

    Though de-identification is commonly perceived to be an effective means to protect human participants, certain studies have shown convincingly that other data can be used in conjunction with de-identified data from research studies to re-identify individuals.  Increasingly, the protections afforded by removing the eighteen identifying data elements cited in HIPAA have become out of date, as technological advances and the combining of data sets increase the risk of re-identification.  For example, commercial interests have increasingly been trying to combine large, de-identified data sets with real-world data collected during the course of ordinary daily activities (e.g., credit card charges, driving habits), which increases the risk of re-identification and misuse of previously de-identified data. 

    It is important to note that these de-identification methods are not recognized globally. GDPR requirements in the European Union, for example, are comparatively more rigorous. However, GDPR does not provide any specific de-identification methods.

    At Sage, HIPAA standards for de-identification are applied broadly in recognition of national standards and as a basic foundation for protecting privacy; however, Governance’s evaluation of data sensitivity and privacy risks must take into account the limitations of HIPAA de-identification standards in favor of more rigorously protective methods or systems.

    HIPAA:

    HIPAA has defined two de-identification methods that have become a national standard. These definitions specifically apply to protected health information (PHI), which is created by and transmitted by a covered entity, but have been applied broadly across the U.S. and within the research profession.

    For more information, see “Guidance Regarding Methods for De-identification of Protected Health Information in Accordance with the HIPAA Privacy Rule” (2012).

    Definition Sources:

    HIPAA Safe Harbor Method: 45 CFR 164.514(b)(2)

    HIPAA Expert Determination/Statistical Method: 45 CFR 164.514(b)(1)
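
    The first half of the Safe Harbor method, removing identifier fields, can be sketched as below. This is an illustration under stated assumptions, not a compliant implementation: the column names are hypothetical and cover only a few of the 18 identifier categories, free-text fields are not inspected, and the second condition (no actual knowledge that the remaining information could identify an individual) cannot be checked in code.

```python
# Hypothetical column names standing in for a few identifier categories.
# The full list of 18 is in 45 CFR 164.514(b)(2)(i)(A)-(R).
ILLUSTRATIVE_IDENTIFIER_FIELDS = {
    "name",
    "email",
    "phone",
    "ssn",
    "medical_record_number",
}


def drop_identifier_fields(record: dict) -> dict:
    """Return a copy of the record without the illustrative identifier fields."""
    return {
        key: value
        for key, value in record.items()
        if key not in ILLUSTRATIVE_IDENTIFIER_FIELDS
    }


record = {
    "name": "Jane Doe",
    "ssn": "000-00-0000",
    "diagnosis": "asthma",
    "visit_year": 2021,
}
cleaned = drop_identifier_fields(record)
```

    As the discussion above notes, data cleaned this way can still carry re-identification risk when combined with other data sets.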

    50

    Derived Data

    New data created by transforming, processing, or analyzing existing data.

    *Applicability: General

    51

    FISMA

    (Federal Information Security Management Act of 2002 and Federal Information Security Modernization Act of 2014)

    A U.S. federal law (FISMA 2002) which requires each federal agency to develop, document, and implement an agency-wide program to provide information security for the information and systems that support the operations and assets of the agency, including those provided or managed by another agency, contractor, or other sources.

    FISMA 2014 amends FISMA 2002 by modernizing federal security practices to address evolving security concerns. The changes reduce overall reporting, strengthen the use of continuous monitoring in systems, and focus agency compliance and reporting on the issues caused by security incidents.

    FISMA 2014 also required the Office of Management and Budget (OMB) to amend/revise OMB Circular A-130 to eliminate inefficient and wasteful reporting and reflect changes in law and technological advances.

    *Applicability: Synapse, General

    Synapse is a FISMA-compliant platform. See the Synapse Platform page for more information. Federal Information Security Management Act (FISMA)

    52

    Fully-Executed

    Term used when all Parties’ authorized representatives have formally signed the Project Material(s).

    53

    General Data Protection Regulation (GDPR)

    Rules and privacy regulations governing data in the European Union (EU). GDPR establishes personal data privacy protections as a fundamental right.

    *Applicability: GDPR

    Full text of GDPR: https://gdpr.eu/tag/gdpr/

    54

    Genetic Data

    Personal data (see below) relating to the inherited or acquired genetic characteristics of a natural person which give unique information about the physiology or the health of that natural person and which result, in particular, from an analysis of a biological sample from the natural person in question.

    *Applicability: GDPR, General

    Uses: This definition may be used broadly outside of the scope of GDPR.

    Definition Source: GDPR Article 4

    55

    Governance Structures

    Governance Models

    The data sharing framework that dictates what data to acquire, how to bring them into systems, how to store them, how to analyze them, and how to share downstream knowledge.

    *Applicability: General

    Types of Governance Structures:

    • Pairwise (One-to-one): Two parties agree to work together on and/or share a data set in some fashion, typically with a closed contract or an informal agreement. The negotiation terms depend on the relative status of the parties and/or the value of the data and knowledge.

    • Open Source (One-to-many or some-to-many): Data are distributed for reuse with a license defining reuse rights and conditions. The creator is in charge of the negotiation at first (choice of license), but then rights to analyze and redistribute are permanently transferred to the user. This is typical of a centralized project in the sciences, e.g., the Human Genome Project.

    • Federated Query (Many-to-many, via platform): Data are housed in a variety of locations, and users are able to query those local data simultaneously. Typically restricted to pre-configured queries (rather than data exploration) and may require registration before use.

    • Trusted research environment (Many-to-some): Data are housed in a central location under a contractual regime including Data Ingress / Egress Agreements. Users apply to use the data. Users must “visit” the data rather than download them, agree to be known, and, in some cases, agree to be surveilled by a data steward.

    • Model-to-data (One-to-many): Data are held by a steward who is responsible for running algorithms on behalf of researchers. In some cases, a synthetic version of the data may be released openly to facilitate model training. Researchers develop algorithms, send them to the steward, and receive back the output of their analysis as run on the real dataset. The variety of analyses that may be performed is restricted by this structure, because the data steward must ensure data are specifically curated for any analytical question at hand.

    • Open citizen science (Many-to-many): Rights to use and distribute data are often fully decentralized via license or contract. Open citizen science is a peer-to-peer version of open source science.

    • Clubs and Trusts (Some-to-some): Clubs and Trusts are versions of a common pool resource: a group of people and/or institutions who agree to share resources towards a common goal. Control over the development and negotiation of data sharing and use terms is often held by the founders/settlers (and/or funders) and then can be distributed amongst club participants. Importantly, clubs that operate in the cloud can easily publish data products that are more “open” than the club itself.

    • Closed: Data are held privately by a single party.

    • Closed and Restricted: Data are held privately in order to protect a population, meet a legal requirement, or protect a secret.

    Mangravite, Lara M., Avery Sen, John T. Wilbanks, and Sage Bionetworks Team. Mechanisms to Govern Responsible Sharing of Open Data: A Progress Report. Manubot, 2020. https://github.com/Sage-Bionetworks/governanceGreenPaper/tree/3c2a648b892d8c672a3043c4bacda65505947921

    56

    Health Information

    Any information, including genetic information, whether oral or recorded in any form or medium, that:

    (1) Is created or received by a health care provider, health plan, public health authority, employer, life insurer, school or university, or health care clearinghouse; and

    (2) Relates to the past, present, or future physical or mental health or condition of an individual; the provision of health care to an individual; or the past, present, or future payment for the provision of health care to an individual.

    *Applicability: HIPAA, General

    Uses: This definition may be used broadly, but sub-definition (1) can be omitted if the use is not within the scope of HIPAA-regulated activities.

    Related Definition (GDPR): Data Concerning Health

    Definition Source: 45 CFR 160.103

    57

    Health Insurance Portability and Accountability Act (HIPAA)

    U.S. health information privacy law. The HIPAA legislation resulted in regulations that are collectively referred to as “HIPAA” and are made up of the “Privacy Rule,” “Security Rule,” and “Enforcement Rule.”

    *Applicability: HIPAA

    HIPAA Legislation:

    https://www.govinfo.gov/content/pkg/PLAW-104publ191/pdf/PLAW-104publ191.pdf

    Combined HIPAA Regulations:

    https://www.hhs.gov/sites/default/files/ocr/privacy/hipaa/administrative/combined/hipaa-simplification-201303.pdf

    58

    Human Subject

    Research Participant

    A living individual about whom an investigator (whether professional or student) conducting research:

    (i) Obtains information or biospecimens through interaction or intervention with the individual, and uses, studies, or analyzes the information or biospecimens, or

    (ii) Obtains, uses, studies, analyzes, or generates identifiable private information or identifiable biospecimens.

    *Applicability: Common Rule, FDA Regulations

    Uses: This definition is used primarily to determine whether information, interactions, interventions, or biospecimens used for research purposes are subject to human subjects regulations (i.e., whether IRB review is required).

    This is a truncated definition. Contact Governance for an in-depth discussion.

    Definition Source: 45 CFR 46.102(e) (2018 revision)

    See also: Chart 01: Is an Activity Human Subjects Research Covered by 45 CFR Part 46?

    https://www.hhs.gov/ohrp/regulations-and-policy/decision-charts-2018/index.html#c1

    59

    Hybrid Entity

    A single legal entity:

    (1) That is a covered entity;

    (2) Whose business activities include both covered and non-covered functions; and

    (3) That designates health care components in accordance with paragraph 164.105(a)(2)(iii)(D) of HIPAA regulations.

    *Applicability: HIPAA

    Uses: This definition only applies to HIPAA-regulated organizations.

    A typical example of a hybrid entity is a university with an affiliated teaching hospital. The hospital portion of the organization performs HIPAA-covered health care functions, while the rest of the university performs non-covered functions.

    Definition Source: 45 CFR 164.103

    60

    Identifiable Data/Information

    (1) Common Rule:

    Data for which the identities of the source subjects are or may readily be ascertained by the investigator or associated with the information.

    (2) NIH:

    Data that are still attached to a readily available subject identifier such as name, social security number, study number, hospital number, medical record number, address, telephone number, etc., such that the identity of the subject can be ascertained.

    (3) GDPR (“Identifiable Natural Person”):

    One who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person.

    *Applicability: Common Rule, NIH, General

    Uses:

    U.S. federal policies treat “identifiable” as meaning that the identities of the subjects can be readily ascertained or that there are readily available identifiers attached to the data that would allow individual subject identities to be ascertained. When navigating the applicability of federal policies and regulations, the definition provided by the regulatory source should be applied.

    The GDPR definition of an “identifiable natural person” goes beyond the U.S. references to traditional identifiers (like name, address, phone number or SSN), and includes reference to “one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity” of the subject.

    At Sage, we recognize the need to combine many definition sources when evaluating factors such as the level to which data is identifiable, data sensitivity, and the risk of re-identification. In practice, Sage Governance will always apply the definition applicable to the specific laws and regulations of the data, but will take a more protective stance whenever feasible. When evaluating data outside the scope of a specific regulatory question, Sage should apply the NIH definition. While the evaluation of data sensitivity and risk should include the combined nature of the individual factors listed in the GDPR definition, Sage will not label data “identifiable” due to these factors alone unless GDPR applies.

    For GDPR-regulated data, also see Personal Data.

    For HIPAA-regulated data, also see Individually Identifiable Health Information (IIHI).

    Related Definitions: Personally Identifiable Information (PII), Coded Data, De-identified Data

    Definition Sources:

    Common Rule: 45 CFR 46.102(e)(5)

    NIH: 3016 - Intramural Research Program Human Data Sharing (HDS) Policy

    GDPR: GDPR Article 4

    61

    Identity Attestation Document

    Documentation needed from a Synapse User to confirm their identity as part of the Synapse Profile Verification process.

    *Applicability: Synapse

    Acceptable forms of Identity Attestation Documents include:

    • Letter from a signing official (other than the person submitting) on official letterhead attesting to their identity

    • Notarized letter attesting to their identity

    • A copy of a professional license (e.g., a medical license)

    62

    Incident

    Suspected event that impacts the computer or data environment within Sage Bionetworks.

    *Applicability: Synapse, General

    63

    Individually Identifiable Health Information (IIHI)

    Individually identifiable health information is information that is a subset of health information, including demographic information collected from an individual, and:

    (1) Is created or received by a health care provider, health plan, employer, or health care clearinghouse; and

    (2) Relates to the past, present, or future physical or mental health or condition of an individual; the provision of health care to an individual; or the past, present, or future payment for the provision of health care to an individual; and

    (i) That identifies the individual; or

    (ii) With respect to which there is a reasonable basis to believe the information can be used to identify the individual

    *Applicability: HIPAA

    Uses: This definition need only be used when working with data subject to HIPAA regulations.

    Definition Source: 45 CFR 160.103

    64

    Informed Consent

    The process of informed consent is a fundamental mechanism to ensure respect for persons through the provision of thoughtful consent for a voluntary act.

    *Applicability: General

    Depending on the research and the consenting plan approved by an Institutional Review Board (IRB), consent may be performed (1) orally without a signed document; (2) using a disclosure form without a signature; or (3) using an informed consent form with required signatures.

    65

    Informed Consent Form (ICF)

    Informed Consent Document (ICD)

    Informed consent forms are written documents presented as part of an informed consent process when enrolling a human subject in research.

    *Applicability: General

    Informed consent forms must meet specific requirements defined by the regulations.

    Informed consent is not the same as “HIPAA Authorization,” though some institutions may allow these distinct documents to be combined.

    Informed consent forms often include restrictions on data sharing and future use limitations. Informed consent forms therefore help to establish Conditions for Data Use within Synapse.

    Elements of informed consent are defined by the regulations at 45 CFR 46.116 (Common Rule) and 21 CFR 50.25 (for FDA-regulated studies).

    Documentation requirements for informed consent are defined by the regulations at 45 CFR 46.117 (Common Rule) and 21 CFR 50.27 (for FDA-regulated studies).

    66

    Intended Data Use Statement (IDU)

    A detailed description, submitted with a Data Access Request, of the Data Requester's research purpose for accessing and using certain data stored in Synapse. The Data Access Committee (DAC) uses it to determine whether access to the data should be allowed. IDUs should address the following questions: What do you want to do with the data? Why are you doing it? How do you want to do it?

    *Applicability: Synapse

    IDUs can be required to access certain data via a Managed AR. They are often posted publicly on Synapse wiki pages or portal pages.

    67

    Institutional Review Board (IRB)

    An independent body constituted of medical, scientific, and nonscientific members, whose responsibility it is to ensure the protection of the rights, safety, and well-being of human subjects by, among other things, reviewing, approving, and providing continuing review of protocols, amendments, and the methods and material to be used in obtaining and documenting informed consent of the research subjects.

    *Applicability: General

    IRB approval may be required to access certain data via a Managed AR.

    Adapted from ICH E6(R2) 1.31 Good Clinical Practice

    68

    Interconnection Security Agreement (ISA)

    An ISA captures the technical and security requirements to establish and maintain the interconnection between any two or more systems.

    *Applicability: NIH

    Federal policy recommends that agencies develop Interconnection Security Agreements (ISAs) when information is exchanged with another organization via a system interconnection. This is a FISMA-required document discussing security-relevant aspects of an intended connection between a federal agency system and an external system.

    Reference: NIST

    69

    Journal

    A periodical publication that disseminates original research, reviews, and scholarly articles in a specific field of study.

    *Applicability: General

    Scientific journals serve as the primary means of sharing new knowledge, discoveries, and theories among researchers, academics, and professionals.

    70

    Legacy Project

    Term used for Synapse Data Coordination Center (DCC) projects that are no longer actively funded, yet require Sage’s continued support, maintenance and closure, as needed. Work completed in support of such projects is funded through indirect funds.

    *Applicability: General, Synapse

    71

    Limited Data Set

    “HIPAA Limited Data Set”

    A limited data set is protected health information (PHI) that excludes the direct identifiers listed in 45 CFR 164.514(e)(2).

    For simplification purposes, one or more of the following identifiers may be allowed:

    • dates such as admission, discharge, date of service, date of birth, date of death;

    • city, state, five-digit or longer ZIP code; and

    • calculated ages in years, months, days, or hours (including ages over 89).

    *Applicability: HIPAA

    Uses: The term “Limited Data Set” is only truly applicable when:

    1. The data was created or received by a covered entity,

    2. The data was stripped of all identifiers except one or more of the identifiers listed in the definition, AND

    3. There is a Data Use Agreement in place meeting the requirements specified by HIPAA regulations.

    At Sage, “Limited Data Set” is used broadly as Limited Data Sets are recognized benchmarks in de-identification in the U.S.; however, it is important to be aware of the regulatory applicability. Whereas de-identification of PHI (via the HIPAA Safe Harbor or Expert Determination methods) can convert data into a non-PHI state, Limited Data Sets remain as PHI with the DUA serving as the additional protection.

    Generally, Limited Data Sets should always be categorized in the Controlled Access Data Tier.

    Definition Source: 45 CFR 164.514(e)
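
    The three conditions under which “Limited Data Set” truly applies can be read as a simple conjunction. The sketch below is illustrative only: `DatasetFacts` and its field names are hypothetical, and each flag stands for a determination that a person, not code, has to make.

```python
from dataclasses import dataclass


@dataclass
class DatasetFacts:
    """Hypothetical record of determinations already made by a reviewer."""

    from_covered_entity: bool                # created or received by a covered entity
    only_permitted_identifiers_remain: bool  # all other identifiers stripped
    hipaa_dua_in_place: bool                 # DUA meets HIPAA requirements


def is_limited_data_set(facts: DatasetFacts) -> bool:
    # All three conditions must hold; failing any one means the term
    # does not apply in the regulatory sense.
    return (
        facts.from_covered_entity
        and facts.only_permitted_identifiers_remain
        and facts.hipaa_dua_in_place
    )


with_dua = is_limited_data_set(DatasetFacts(True, True, True))
without_dua = is_limited_data_set(DatasetFacts(True, True, False))
```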

    72

    Managed Access Requirement (AR)

    An Access Requirement that requires data access to be granted via the Synapse Access and Compliance Team (ACT) and/or Data Access Committee (DAC).

    *Applicability: Synapse

    ACT often implements Managed ARs on data categorized in the Controlled Access Tier. Managed ARs often consist of:

    1. Data Access Application.

    2. One or more of the following: intended data use statement, IRB approval letter, or data use certificate.

    3. Requirement for data accessors to be registered, certified or validated.
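
    The typical components of a Managed AR listed above can be expressed as a small completeness check. This is a sketch under assumptions, not the Synapse API: the function name, document labels, and status strings are all hypothetical, and a real Managed AR specifies which documents it requires rather than accepting any one of them.

```python
# Hypothetical labels for the supporting documents named in item 2.
SUPPORTING_DOCUMENTS = {
    "intended_data_use_statement",
    "irb_approval_letter",
    "data_use_certificate",
}

# Hypothetical labels for the accessor statuses named in item 3.
ACCEPTED_STATUSES = {"registered", "certified", "validated"}


def submission_is_complete(
    has_application: bool,
    documents: set[str],
    accessor_status: str,
) -> bool:
    """Check a submission against the three typical Managed AR components."""
    has_supporting_document = bool(documents & SUPPORTING_DOCUMENTS)
    status_ok = accessor_status in ACCEPTED_STATUSES
    return has_application and has_supporting_document and status_ok


complete = submission_is_complete(True, {"irb_approval_letter"}, "validated")
no_documents = submission_is_complete(True, set(), "validated")
```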

    73

    Manuscript

    A research paper or scholarly work that is submitted to a scientific journal for publication.

    *Applicability: General

    A manuscript typically contains original research findings, theoretical analysis, or a review of existing literature. Before being published, a manuscript goes through a peer-review process where it is evaluated by experts in the field.

    74

    Metadata

    Data that provide additional information intended to make scientific data interpretable and reusable (e.g., date, independent sample and variable construction and description, methodology, data provenance, data transformations, any intermediate or descriptive observational variables).

    *Applicability: NIH, General

    Definition Source: NIH NOT-OD-21-013 (Data Sharing and Management Plans)

    75

    Peer Review

    A process in which a submitted manuscript or research paper is evaluated by independent experts in the same field before it is accepted for publication in a journal.

    *Applicability: General

    The purpose of peer review is to ensure the quality, credibility, and validity of the research by subjecting it to scrutiny from knowledgeable professionals who are not involved in the work.

    76

    Personal Data

    Personal data means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person.

    *Applicability: GDPR

    Uses: This definition should only be applied to GDPR-regulated data. See Identifiable Data/Information for more discussion and related terms.

    Definition Source: GDPR Article 4

    77

    Personally Identifiable Information (PII)

    Information that can be used to distinguish or trace an individual’s identity, either alone or when combined with other information that is linked or linkable to a specific individual.

    (Because there are many different types of information that can be used to distinguish or trace an individual identity, the term PII is necessarily broad.)

    *Applicability: General

    PII is not officially defined by the U.S. government through legislative or regulatory bodies, but a definition has been offered through Office of Management and Budget (OMB) memoranda.

    To determine whether information is PII, OMB has recommended that executive agencies assess the specific risk that an individual can be identified using the information together with other information that is linked or linkable to the individual. This is because information that is not PII can become PII whenever additional information becomes available, in any medium or from any source, that would make it possible to identify an individual.

    Definition Source: OMB M-17-12

    78

    Privacy Incident

    An event where protected information is used or disclosed without authorization.

    *Applicability: Synapse, General

    79

    Private Access

    Private Project

    A category of Synapse data only available to the Data Contributor (i.e., Project Administrator) and other users that they specify in the entity's Sharing Settings.

    *Applicability: Synapse

    Often, Private Data is managed via sharing through Synapse Teams.

    80

    Private Information

    (1) Information about behavior that occurs in a context in which an individual can reasonably expect that no observation or recording is taking place, and

    (2) Information that has been provided for specific purposes by an individual and that the individual can reasonably expect will not be made public (e.g., a medical record).

    *Applicability: Common Rule

    Uses: This definition may be used broadly.

    Definition Source: 45 CFR 46.102(e)(4)

    81

    Project Materials

    Project-specific governance documentation, e.g., agreements, amendments, memorandums of understanding, and related legal documents.

    82

    Protected Health Information (PHI)

    Protected health information means individually identifiable health information:

    (1) Except as provided in paragraph (2) of this definition, that is:

    (i) Transmitted by electronic media;

    (ii) Maintained in electronic media; or

    (iii) Transmitted or maintained in any other form or medium.

    (2) Protected health information excludes individually identifiable health information:

    (i) In education records covered by the Family Educational Rights and Privacy Act, as amended, 20 U.S.C. 1232g;

    (ii) In records described at 20 U.S.C. 1232g(a)(4)(B)(iv);

    (iii) In employment records held by a covered entity in its role as employer; and

    (iv) Regarding a person who has been deceased for more than 50 years.

    *Applicability: HIPAA, General

    Do not use this term to mean “Personal Health Information.”

    Uses: Data is only PHI when it is regulated under HIPAA. This means that it was created and transmitted by a covered entity, and/or has either been transmitted to another covered entity or to a business associate (with a BAA in place). Sage is not a covered entity, but has, in some circumstances, served as a business associate.

    HIPAA terminology has become commonplace when discussing health information used for research. Since health information is most often collected by or combined with data collected by covered entities (like hospitals and clinics), discussion of, and reference to, PHI has served to keep a focus on data privacy and security, and the penalties that can arise when privacy rules are broken. Discussion of PHI also maintains a focus on de-identification processes, such as the removal of the 18 HIPAA identifiers, or use of Limited Data Sets.

    At Sage, data will rarely meet the definition of being PHI when it is placed in Synapse. The exceptions are when Sage has signed a BAA, or if the data contributor is a covered entity and has put data in Synapse improperly.

    Data may start as PHI (when Individually Identifiable Health Information [IIHI] is created by a covered entity and transmitted electronically), but through the process of compliant disclosure authorizations, releases, formal IRB-approved waivers, and/or de-identification procedures, PHI may be placed into Synapse and no longer meet the definition of PHI. Additionally, once data is transferred from a covered entity to a non-covered entity, HIPAA protections no longer apply.

    In cases where PHI is put in Synapse “improperly,” this constitutes a privacy breach at the fault of the disclosing entity. These instances should be reported to Sage Governance for investigation and corrective action.

    Definition Source: 45 CFR 160.103

    83

    Pseudonymization

    (1) GDPR Applicability:

    The processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organizational measures to ensure that the personal data are not attributed to an identified or identifiable natural person.

    (2) General Applicability:

    Data where individual identifiers have been replaced by a code or pseudo (false) identifier.

    *Applicability: GDPR

    Uses: These definitions may be used broadly outside of the scope of GDPR.

    Definition Source: GDPR Article 4

    63

    Protected Health Information (PHI)

    (new)

    Protected health information means individually identifiable health information:

    (1) Except as provided in paragraph (2) of this definition, that is:

    (i) Transmitted by electronic media;

    (ii) Maintained in electronic media; or

    (iii) Transmitted or maintained in any other form or medium.

    (2) Protected health information excludes individually identifiable health information:

    (i) In education records covered by the Family Educational Rights and Privacy Act, as amended, 20 U.S.C. 1232g;

    (ii) In records described at 20 U.S.C. 1232g(a)(4)(B)(iv);

    (iii) In employment records held by a covered entity in its role as employer; and

    (iv) Regarding a person who has been deceased for more than 50 years.

    *Applicability: HIPAA, General

    Do not use this term to mean “Personal Health Information.”

    Uses: Data is only PHI when it is regulated under HIPAA. This means that it was created and transmitted by a covered entity, and/or has either been transmitted to another covered entity or to a business associate (with a BAA in place). Sage is not a covered entity, but has, in some circumstances, served as a business associate.

    HIPAA terminology has become commonplace when discussing health information used for research. Since health information is most often collected by or combined with data collected by covered entities (like hospitals and clinics), discussion of, and reference to, PHI has served to keep a focus on data privacy and security, and the penalties that can arise when privacy rules are broken. Discussion of PHI also maintains a focus on de-identification processes, such as the removal of the 18 HIPAA identifiers, or use of Limited Data Sets.

    At Sage, data will rarely meet the definition of being PHI when it is placed in Synapse. The exceptions are when Sage has signed a BAA, or if the data contributor is a covered entity and has put data in Synapse improperly.

    Data may start as PHI (when Individually Identifiable Health Information [IIHI] is created by a covered entity and transmitted electronically), but through the process of compliant disclosure authorizations, releases, formal IRB-approved waivers, and/or de-identification procedures, PHI may be placed into Synapse and no longer meet the definition of PHI. Additionally, once data is transferred from a covered entity to a non-covered entity, HIPAA protections no longer apply.

    In cases where PHI is put in Synapse “improperly,” this constitutes a privacy breach at the fault of the disclosing entity. These instances should be reported to Sage Governance for investigation and corrective action.

    Definition Source: 45 CFR 160.103

    64

    Private Access

    Private Project

    A category of Synapse data only available to the Data Contributor (i.e., Project Administrator) and other users that they specify in the entity's Sharing Settings.

    *Applicability: Synapse

    Often, Private Data is managed via sharing through Synapse Teams.

    65

    Private Information

    (new)

    (1) Information about behavior that occurs in a context in which an individual can reasonably expect that no observation or recording is taking place, and

    (2) Information that has been provided for specific purposes by an individual and that the individual can reasonably expect will not be made public (e.g., a medical record).

    *Applicability: Common Rule

    Uses: This definition may be used broadly.

    Definition Source: 45 CFR 46.102(e)(4)

    66

    Publicly Accessible Data

    (new)

    Data are available to qualified researchers. It may include either data that are openly accessible and available for any use or data that are accessed in a controlled manner to protect appropriately certain interests, for example, the privacy of research subjects, intellectual property or security.

    *Applicability: General

    In some cases, “publicly accessible” data may include only “openly accessible” data.

    Definition Source: NIH 3016 - Intramural Research Program Human Data Sharing (HDS) Policy

    67

    Registered User

    Synapse users that have successfully created an account and agreed to the Synapse Pledge.

    *Applicability: Synapse

    Registered users can create projects and wikis. They can collaborate with other registered users and create Synapse teams. Registered users can also download publicly available data and, if they fulfill the Conditions for Use, they can also access controlled data.

    68

    Research

    (new)

    A systematic investigation, including research development, testing, and evaluation, designed to develop or contribute to generalizable knowledge.

    *Applicability: Common Rule, HIPAA, General

    Uses: This definition can be used broadly.

    Definition Sources:

    45 CFR 46.102(l)

    45 CFR 164.501

    69

    Scientific Data

    (new)

    The recorded factual material commonly accepted in the scientific community as of sufficient quality to validate and replicate research findings, regardless of whether the data are used to support scholarly publications. Scientific data do not include laboratory notebooks, preliminary analyses, completed case report forms, drafts of scientific papers, plans for future research, peer reviews, communications with colleagues, or physical objects, such as laboratory specimens.

    *Applicability: NIH, General

    Definition Source: NIH NOT-OD-21-013 (Data Sharing and Management Plans)

    70

    Secondary Research

    (new)

    Reusing information or specimens that are collected for some other “primary” or “initial” activity for research purposes.

    *Applicability: General

    Secondary research will generally involve use of data or specimens that were collected for a reason other than the present research purpose. The “primary” or “initial” activity can be for research purposes or non-research purposes.

    • For example, research performed using medical records is secondary research because the medical records data was collected for regular patient care. The “initial” activity in this case was for non-research purposes.

    • In another example, a researcher might collect data for a specific research purpose by consenting subjects and administering a validated assessment. Once that research study is completed, the researcher may store the data (if the subjects consented to future use of their data and an IRB approved the protocol) and another researcher may conduct secondary research analysis of the data for a different research study.

    Definition Source: Preamble to 45 CFR 46 (82 F.R. 7191)

    71

    Sensitive Data

    (revised)

    (1) General Applicability:

    Data that must be protected from unauthorized access to safeguard the privacy or security of an individual or organization. This includes human data at risk of re-identification.

    (2) GDPR Applicability:

    The following personal data is considered sensitive:

    • personal data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs;

    • trade-union membership;

    • genetic data, biometric data processed solely to identify a human being;

    • health-related data;

    • data concerning a person’s sex life or sexual orientation.

    (3) Veterans Affairs Applicability (for example purposes):

    Sensitive personal information includes:
    (A) Education, financial transactions, medical history, and criminal or employment history.
    (B) Information that can be used to distinguish or trace the individual’s identity, including name, social security number, date and place of birth, mother’s maiden name, or biometric records.

    *Applicability: GDPR, General

    “Sensitivity” of information is highly subjective, and it is generally difficult to set a list of data elements that will reliably apply to every data set as a method to easily label information as “sensitive.” As a result, some governmental agencies choose to consider any personally identifiable information (PII) as “sensitive.”

    Sage Governance processes may involve a risk-based approach to evaluating the sensitivity of data. This may include an analysis of the risk that the data could pose if data were re-identified, coupled with an analysis of the de-identification methods used to treat the data.

    Definition Sources:

    GDPR Article 4(13), (14) and (15), Article 9 and Recitals (51) to (56)

    38 U.S.C. 5727(19)

    72

    Sharing Settings

    A setting on the Synapse platform that enables a Project Administrator to define with whom a project or entity may be shared.

    *Applicability: Synapse

    Within Sharing Settings, Project Administrators can grant users view, download, edit, edit/delete, and administrator access.

    73

    Signing Official

    Institutional Signing Official

    (revised)

    (1) General:

    An employee affiliated with the respective organization who has oversight authority.

    (2) NIH:

    An Institutional Signing Official is generally a senior official at an institution who is credentialed through the NIH eRA Commons system and is authorized to enter the institution into a legally binding contract and sign on behalf of an investigator who has submitted data or a data access request to NIH.

    *Applicability: Synapse, General, NIH

    A Data Use Certificate (DUC) or data ingress agreement may require a Signing Official's signature to validate the document. This term is not synonymous with “Institutional Official.”

    For DUCs: Generally, the Signing Official should be a person meeting the following criteria:

    • Has oversight authority over the data requestor,

    • Is responsible for ensuring appropriate and ethical use of the Data by the data requestor, and

    • Is not a member of the study team (as this would introduce a conflict of interest).

    On a DUC, the Signing Official role is generally best filled by a Department Head (or similar), because that position allows closer oversight of the requestor.

    For data ingress agreements (e.g., DTAs, DUAs, MOUs): A Signing Official must have institutional authority to enter their institution into legally binding contracts. For this reason, the Signing Official is typically a designee in a Grants & Contracts office (or similar).

    For NIH Data Sharing Policy: The NIH requires additional credentialing and authority.

    Definition Source (NIH): NOT-OD-14-124 Genomic Data Sharing Policy

    74

    Teams (in Synapse)

    Multiple Synapse users accepted into a group.

    *Applicability: Synapse

    Teams can be used to share Synapse entities with multiple users at once. Access Requirements can be implemented on Synapse teams or directly on Synapse entities.

    75

    Unlinked Data

    (new)

    84

    Publicly Accessible Data

    Data are available to qualified researchers. It may include either data that are openly accessible and available for any use or data that are accessed in a controlled manner to protect appropriately certain interests, for example, the privacy of research subjects, intellectual property or security.

    *Applicability: General

    In some cases, “publicly accessible” data may include only “openly accessible” data.

    Definition Source: NIH 3016 - Intramural Research Program Human Data Sharing (HDS) Policy

    85

    Registered User

    Synapse user who has successfully created an account, has logged into Synapse using their email and password, and has agreed to the Synapse Pledge.

    *Applicability: Synapse

    Registered users can create projects and wikis. They can collaborate with other registered users and create Synapse teams. Registered users can also download publicly available data and, if they fulfill the Conditions for Use, they can also access controlled data.

    86

    Reliable Method (RM)

    Internal process documents that provide detailed, step-by-step instructions for completing a task.

    *Applicability: Governance Document Control

    RMs are meant to elaborate on the more general instructions covered in SOP or Policy documents. Unlike SOPs or Policies, RMs are meant to be updated on a continual basis to best reflect the most reliable, comprehensive method for completing work.

    87

    Research

    A systematic investigation, including research development, testing, and evaluation, designed to develop or contribute to generalizable knowledge.

    *Applicability: Common Rule, HIPAA, General

    Uses: This definition can be used broadly.

    Definition Sources:

    45 CFR 46.102(l)

    45 CFR 164.501

    88

    Research Governance

    Policies, processes, and structures that guide and oversee the research activities such as the research design, data collection, analysis, tools, methods, and dissemination. Research Governance:

    1) ensures the ethical, responsible, and accountable conduct of research activities; and

    2) protects the rights and well-being of research participants and maintains the integrity of the research process.

    *Applicability: General

    Who is involved: IRB, ethics committees

    89

    Scientific Data

    The recorded factual material commonly accepted in the scientific community as of sufficient quality to validate and replicate research findings, regardless of whether the data are used to support scholarly publications. Scientific data do not include laboratory notebooks, preliminary analyses, completed case report forms, drafts of scientific papers, plans for future research, peer reviews, communications with colleagues, or physical objects, such as laboratory specimens.

    *Applicability: NIH, General

    Definition Source: NIH NOT-OD-21-013 (Data Sharing and Management Plans)

    90

    Secondary Research

    Reusing information or specimens that are collected for some other “primary” or “initial” activity for research purposes.

    *Applicability: General

    Secondary research will generally involve use of data or specimens that were collected for a reason other than the present research purpose. The “primary” or “initial” activity can be for research purposes or non-research purposes.

    • For example, research performed using medical records is secondary research because the medical records data was collected for regular patient care. The “initial” activity in this case was for non-research purposes.

    • In another example, a researcher might collect data for a specific research purpose by consenting subjects and administering a validated assessment. Once that research study is completed, the researcher may store the data (if the subjects consented to future use of their data and an IRB approved the protocol) and another researcher may conduct secondary research analysis of the data for a different research study.

    Definition Source: Preamble to 45 CFR 46 (82 F.R. 7191)

    91

    Secret Store

    A secure service, such as LastPass, that stores sensitive information such as passwords, API keys, encryption keys, certificates, and other credentials, and protects these secrets from unauthorized access, often through encryption and access controls.

    *Applicability: General

    92

    Security Incident

    A fault in the confidentiality, availability, or integrity of an information system.

    *Applicability: Synapse, General

    93

    Security Incident Response Team (SIRT)

    Sage workforce members who are responsible for the organizational response to incidents, as well as for preparing for incidents, assessing risks, and maintaining the incident response process.

    *Applicability: General

    94

    Sensitive Data

    (1) General Applicability:

    Data that must be protected from unauthorized access to safeguard the privacy or security of an individual or organization. This includes human data at risk of re-identification.

    (2) GDPR Applicability:

    The following personal data is considered sensitive:

    • personal data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs;

    • trade-union membership;

    • genetic data, biometric data processed solely to identify a human being;

    • health-related data;

    • data concerning a person’s sex life or sexual orientation.

    (3) Veterans Affairs Applicability (for example purposes):

    Sensitive personal information includes:
    (A) Education, financial transactions, medical history, and criminal or employment history.
    (B) Information that can be used to distinguish or trace the individual’s identity, including name, social security number, date and place of birth, mother’s maiden name, or biometric records.

    *Applicability: GDPR, General

    “Sensitivity” of information is highly subjective, and it is generally difficult to set a list of data elements that will reliably apply to every data set as a method to easily label information as “sensitive.” As a result, some governmental agencies choose to consider any personally identifiable information (PII) as “sensitive.”

    Sage Governance processes may involve a risk-based approach to evaluating the sensitivity of data. This may include an analysis of the risk that the data could pose if data were re-identified, coupled with an analysis of the de-identification methods used to treat the data.

    Definition Sources:

    GDPR Article 4(13), (14) and (15), Article 9 and Recitals (51) to (56)

    38 U.S.C. 5727(19)

    95

    Sharing Settings

    Controls used by a Project Administrator to define and customize public or private access to a Synapse entity (Project, File, Folder, or Table). The Project Administrator also has the option to create "Local Sharing Settings," which allow different access customization for an entity within another entity (for example, a parent Folder may have Sharing Settings that allow "public" access, while a File within that parent Folder may have Local Sharing Settings restricting access to specific Users).

    *Applicability: Synapse

    Within Sharing Settings, Project Administrators can grant users view, download, edit, edit/delete, and administrator access.

    96

    Signing Official

    Institutional Signing Official

    (1) General:

    An employee affiliated with the respective organization who has oversight authority.

    (2) NIH:

    An Institutional Signing Official is generally a senior official at an institution who is credentialed through the NIH eRA Commons system and is authorized to enter the institution into a legally binding contract and sign on behalf of an investigator who has submitted data or a data access request to NIH.

    *Applicability: Synapse, General, NIH

    A Data Use Certificate (DUC) or Data Ingress / Egress Agreement may require a Signing Official's signature to validate the document. This term is not synonymous with “Institutional Official.”

    For DUCs: Generally, the Signing Official should be a person meeting the following criteria:

    • Has oversight authority over the data requestor,

    • Is responsible for ensuring appropriate and ethical use of the Data by the data requestor, and

    • Is not a member of the study team (as this would introduce a conflict of interest).

    On a DUC, the Signing Official role is generally best filled by a Department Head (or similar), because that position allows closer oversight of the requestor.

    For data ingress agreements (e.g., DTAs, DUAs, MOUs): A Signing Official must have institutional authority to enter their institution into legally binding contracts. For this reason, the Signing Official is typically a designee in a Grants & Contracts office (or similar).

    For NIH Data Sharing Policy: The NIH requires additional credentialing and authority.

    Definition Source (NIH): NOT-OD-14-124 Genomic Data Sharing Policy

    97

    Synthetic Data

    Artificially generated data that mimics real-world data and is created using algorithms and simulations rather than collected from real-life events or observations.

    *Applicability: General

    98

    Team (in Synapse)

    Multiple Synapse users accepted into a group for the purpose of controlling access to projects, facilitating communication within Synapse, and/or allowing participation in Challenges.

    *Applicability: Synapse

    Teams can be used to share Synapse entities with multiple users at once. Access Requirements can be implemented on Synapse teams or directly on Synapse entities.

    Synapse Docs > Collaborating in Synapse > Teams

    99

    Team Manager (Synapse)

    Role for Synapse Team member(s) with authority to invite or remove team members and the ability to edit Synapse Team settings.

    *Applicability: Synapse

    Synapse Docs > Collaborating in Synapse > Teams

    100

    Two-Factor Authentication (2FA)

    A security method that requires two different forms of identification to access data or resources, which can help protect against phishing, social engineering, password brute-force attacks, and weak or stolen credentials.

    *Applicability: Synapse, General

    Adding Two-Factor Authentication (2FA) to your Synapse account

    101

    Unlinked Data

    Data that were initially collected with identifiers but, before research use, have been irreversibly stripped of all identifiers by use of an arbitrary or random alphanumeric code, and the key to the code is destroyed, thus making it impossible for anyone to link the samples to the sources. This does not preclude linkage with existing clinical, pathological, and demographic information so long as all individual identifiers are removed prior to distribution or receipt.

    *Applicability: General

    Definition Source: NIH 3016 - Intramural Research Program Human Data Sharing (HDS) Policy

    102

    Validated User

    (updated)

    Synapse user who has created a Synapse ID, has logged into Synapse using their email and password, has successfully completed the Certification Quiz, and has had their profile and identity validated by Sage Access and Compliance Team.

    *Applicability: Synapse

    The process of becoming a Validated User enables greater transparency within the research community, which promotes a reciprocal relationship between the Synapse user and the data participants and contributors. Validated Users are eligible to request access to specific controlled-access data and to Bridge data.

    To become a Validated User, a Certified User must establish their identity by providing to the Sage Access and Compliance Team (ACT) a combination of Synapse profile information, ORCID profile information, a signed Synapse Pledge, and an external credential.

    103

    Violation

    Any behavior or action that is not compliant with the Synapse Terms and Conditions of Use, Privacy Policy, or Community Standards.

    *Applicability: Synapse

    104

    Whitelisting

    The act of making data available on Synapse as Anonymous Access, a category of data available for download by anyone on the web without requiring them to log in to a Synapse account or fulfill Conditions for Use.

    *Applicability: Synapse