The periodic audit of Synapse activity is intended to surface potential threat scenarios concerning the privacy and security of data held in Synapse. The approach to this audit is informed by an assessment of risks to priority data, such as the datasets associated with Synapse projects marked with restricted access control lists. The risk assessment process considers access control at the point when access is granted, when access is used, and when access may become uncontrolled.
Auditing may be done by analyzing a comprehensive report of activity over the audit period. A comprehensive report is generated by running queries that precisely target privacy threat scenarios.
Overview
The Synapse audit should occur twice a year, once in July and once in January. Each audit should contain data from the two quarters prior to the data pull. The purpose of the audit is to ensure that there have not been any data breaches or security risks during the respective audit period.
An audit report is generated during each audit to analyze the data and explain whether there have been any security breaches or privacy concerns. The Governance Regulatory Support Team should submit the audit report to WIRB annually during the Synapse continuing review, which occurs in October.
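The audit period described above can be derived from the date of the data pull. The following is a minimal sketch, not part of the existing automation: the function names are hypothetical, runs outside January and July are assumed to map to the most recently completed half-year, and the conversion to epoch milliseconds assumes UTC (the warehouse queries multiply `unix_timestamp(...)` by 1000, but their time zone is not stated here).

```python
from datetime import date, datetime, timezone
from typing import Tuple


def audit_window(run_date: date) -> Tuple[date, date]:
    """Return (start, end_exclusive) for the two quarters before the data pull."""
    if run_date.month >= 7:
        # July audit: Q1 and Q2 of the same year.
        return date(run_date.year, 1, 1), date(run_date.year, 7, 1)
    # January audit: Q3 and Q4 of the previous year.
    return date(run_date.year - 1, 7, 1), date(run_date.year, 1, 1)


def to_epoch_millis(day: date) -> int:
    """Midnight of `day` as epoch milliseconds; UTC is an assumption."""
    midnight = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    return int(midnight.timestamp()) * 1000
```

For example, `audit_window(date(2021, 7, 10))` returns January 1, 2021 as the start and July 1, 2021 as the exclusive end; the two bounds can then be passed through `to_epoch_millis` before being substituted into a query's timestamp filter.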
Threat Scenarios
1. Data access
Synapse implements an access control system based on the properties of the dataset and/or the properties of the user profile attempting to gain access. Public datasets may be controlled (users must agree to specific terms and request access from ACT or another specified entity, and may be required to upload documentation specific to the dataset), restricted (users must agree to specific terms, but access is granted automatically once the terms are accepted), or open (users can view data either anonymously or once they have created a Synapse account).
Within dataset-specific access requirements and project sharing settings, data contributors may specify whether users must be registered (created an account and agreed to the Synapse Pledge), certified (passed a quiz demonstrating awareness of security and privacy policy), or validated (the identity linked to the account has been verified) to obtain access.
Rather than restricting or controlling data, project administrators can also make their projects or folders private and share them only with specific Synapse users and teams. General Synapse users cannot view or access private projects or entities unless a project administrator explicitly shares them.
Data Access Threat Scenarios
Threat: A Synapse user intentionally or inadvertently accesses controlled data without qualification
Identify through data warehouse query and end user reporting:
☑ Users who have posted or accessed controlled data without the appropriate access level required for the respective dataset.
Threat: A Synapse user with significant access to data intentionally or inadvertently shares access
Identify through data warehouse query and end user reporting:
☑ A single file downloaded multiple times by a single user
Data Access Associated Queries: Top downloaders
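The indicator "a single file downloaded multiple times by a single user" can be checked by counting repeated (user, file) pairs in the download records. This is a rough sketch only: it assumes the records have been exported as `(user_id, file_handle_id)` pairs, and both the function name and the `file_handle_id` field are hypothetical, since the Top downloaders query only returns per-user totals.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Pair = Tuple[int, int]  # (user_id, file_handle_id); field names are assumptions


def repeated_downloads(records: Iterable[Pair], min_count: int = 2) -> List[Tuple[Pair, int]]:
    """Return (user, file) pairs downloaded at least `min_count` times."""
    counts = Counter(records)
    flagged = [(pair, n) for pair, n in counts.items() if n >= min_count]
    # Highest counts first; ties are ordered by the pair for a stable report.
    return sorted(flagged, key=lambda item: (-item[1], item[0]))
```

The threshold is a parameter because the source does not state how many repeat downloads should be treated as noteworthy.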
2. Data handling
Synapse allows end users to upload data once they have certified their account by passing the certification quiz. The certification process is an administrative control that trains users on appropriate data handling procedures. Once granted data upload rights, an end user is expected to determine sharing settings and request access restrictions for the data they contribute to the platform, or adopt the sharing/access settings of the respective Synapse community they are contributing to.
Data Handling Threat Scenarios
Threat: A Synapse user intentionally or accidentally copies or uploads a controlled/restricted dataset without appropriate access controls/restrictions.
Identify through data warehouse query and end user reporting:
☑ Original terms of data contribution are not respected: data has proliferated into Synapse beyond the original terms of use.
☑ Whether public Synapse spaces contain only data classified as public
Data Handling Associated Queries: MD5 duplicates, Restriction change of state
3. Data loss
A Synapse account may be permitted to access many datasets of different classifications. An incident of account sharing or account compromise may result in the download of a dataset beyond what is intended according to an access restriction.
Data Loss Threat Scenarios
Threat: A Synapse account with extensive access to controlled datasets may be compromised
Identify through data warehouse query and end user reporting:
☑ Detecting the exfiltration of data from Synapse correlated with large-scale download activity by a user
Data Loss Associated Queries: Restriction change of state, Top downloaders
Audit Constraints
The Synapse audit approach was revised in 2020 to focus on specific threats identified through a risk assessment process. Automated queries were designed to report on the activity related to each threat.
The audit reports are limited by the time spans available to the automated queries. Some queries are based on changes to properties of objects, and a query may not be able to compare an event with activity outside of its observation window. In these cases, the query will not surface a conflict between the event and a prior state. The audit reports are also constrained by what data is available in the Synapse data warehouse. Currently, account tier information (i.e., anonymous, registered, certified, validated) is not captured in the data warehouse and therefore cannot be analyzed in the audit report. Likewise, changes to access requirement text are not captured and cannot be reported.
Audit Timeline
When | Who | What |
First two weeks of January and July | Synapse Security Engineer | Run Automation. Reference the “Data Warehouse Queries, Documentation, and Handling” section for details. |
Second two weeks of January and July | Synapse ACT | Sort Data & Triage Threats. Reference the “ACT Data Sorting and Triaging” section for details. |
Mid September | Synapse Security Engineer and Synapse ACT | Generate Audit Report following this template. Reference the “Generating the Audit Report” section for details. |
Late September | Director of Governance (Christine) | Review and Approve/Reject Audit Report |
October | Synapse Security Engineer and Governance Regulatory Support Team | Security Engineer: Submit Audit Report to HITRUST. Governance Regulatory Support Team: Submit Audit Report to WIRB during Synapse Continuing Review. Reference the “Generating the Audit Report” section for details. |
Resources
Data warehouse queries, documentation, and handling
Restriction change of state
```sql
# Report nodes whose access flags differ between the most recent snapshot
# and the most recent snapshot taken before the cutoff date.
#select t1.ID, t1.IS_CONTROLLED, t1.IS_RESTRICTED, t1.IS_PUBLIC, t2.IS_CONTROLLED, t2.IS_RESTRICTED, t2.IS_PUBLIC
select t1.*, t2.*
from (
  select ns2.*
  from NODE_SNAPSHOT ns2
  join (
    # most recent snapshot per node
    select ns1.ID, max(ns1.TIMESTAMP) as MAX_TS
    from NODE_SNAPSHOT ns1
    group by ns1.ID
  ) nsmax1 on nsmax1.ID=ns2.ID and nsmax1.MAX_TS=ns2.TIMESTAMP
) t1
join (
  select ns2.*
  from NODE_SNAPSHOT ns2
  join (
    # most recent snapshot per node before the cutoff date
    select ns1.ID, max(ns1.TIMESTAMP) as MAX_TS
    from NODE_SNAPSHOT ns1
    where ns1.TIMESTAMP < unix_timestamp('2019-09-01 00:00:00')*1000
    group by ns1.ID
  ) nsmax1 on nsmax1.ID=ns2.ID and nsmax1.MAX_TS=ns2.TIMESTAMP
) t2 on t2.ID=t1.ID and t2.VERSION_NUMBER=t1.VERSION_NUMBER
where not (t1.IS_PUBLIC = t2.IS_PUBLIC and t1.IS_CONTROLLED = t2.IS_CONTROLLED and t1.IS_RESTRICTED = t2.IS_RESTRICTED)
limit 100
;
```
Export the Restriction change of state table to a spreadsheet and create a pivot table summary of the number of access control changes by project. Include this summary in the report.
Contact the project owner or community manager of each project on the list to notify them that their files have been identified as anomalies through a regular Synapse audit.
For any responses that indicate inadvertent or inappropriately permissive access control changes, create a ticket within the Governance Jira space for investigation of a privacy incident.
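The per-project pivot summary of access control changes can also be produced directly from a CSV export of the query result. This is a minimal sketch under stated assumptions: the export is assumed to have a header row containing a single `PROJECT_ID` column, and the function name and sample IDs are hypothetical.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def changes_by_project(rows: Iterable[Dict[str, str]]) -> List[Tuple[str, int]]:
    """Count flagged rows per project, largest counts first."""
    counts = Counter(row["PROJECT_ID"] for row in rows)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Illustrative export; the IDs are made up for the example.
SAMPLE_EXPORT = "ID,PROJECT_ID\nsyn101,syn1\nsyn102,syn1\nsyn201,syn2\n"
sample_summary = changes_by_project(csv.DictReader(io.StringIO(SAMPLE_EXPORT)))
```

Each tuple in the result pairs a project ID with its number of flagged changes, which is the same information the spreadsheet pivot table provides.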
Top downloaders
```sql
# top 20 downloaders by count(filehandle_id)
# adjust the date bounds to the audit period under review
select fhdr.USER_ID, count(*) as c
from FILE_HANDLE_DOWNLOAD_RECORD fhdr
where fhdr.TIMESTAMP between unix_timestamp('2019-07-01 00:00:00')*1000 and unix_timestamp('2019-09-10 00:00:00')*1000
group by fhdr.USER_ID
order by c desc
limit 20;
```
Contact the account holder of each account returned by this query with a prompt like the following:
Your Synapse account has been identified during a routine Synapse audit as having accessed a large number of files in the last six months. This activity may be expected due to how you use Synapse, or may be the result of a compromised or shared account.
Please reply to this email message to confirm that you are not aware of a breach of your Synapse credentials and that you have not shared them with anyone else.
Summarize the responses for the report.
For any responses that indicate loss of control of account credentials, create a ticket within the Governance Jira space for investigation of a privacy incident.
MD5 duplicates
```sql
# Step 1: public entities that are controlled or restricted, with MD5 duplicate counts.
# auditdb.latest_snapshot_202003 and auditdb.fhr_md5_count are helper tables
# that are assumed to have been built beforehand.
create table auditdb.fhd_detail2 as
select ls.ID, ls.VERSION_NUMBER, ls.NAME,
  ls.PROJECT_ID, ls.PARENT_ID, ls.BENEFACTOR_ID,
  ls.IS_PUBLIC, ls.IS_CONTROLLED, ls.IS_RESTRICTED,
  ls.FILE_HANDLE_ID, md5c.CONTENT_MD5, md5c.c as DUP_COUNT
from auditdb.latest_snapshot_202003 ls
join warehouse.FILE_HANDLE_RECORD fhr on fhr.ID=ls.FILE_HANDLE_ID
join auditdb.fhr_md5_count md5c on md5c.CONTENT_MD5=fhr.CONTENT_MD5
where ls.IS_PUBLIC=1 and (ls.IS_CONTROLLED=1 or ls.IS_RESTRICTED=1);

# Step 2: public duplicates (same MD5, different file handle) whose access flags differ from the source.
select fhdd.CONTENT_MD5 as MD5, fhdd.PROJECT_ID as SOURCE_PROJECT, fhdd.ID as SOURCE_ID,
  fhdd.VERSION_NUMBER as SOURCE_VERSION, fhdd.IS_PUBLIC as SOURCE_IS_PUBLIC,
  fhdd.IS_CONTROLLED as SOURCE_IS_CONTROLLED, fhdd.IS_RESTRICTED as SOURCE_IS_RESTRICTED,
  ls.PROJECT_ID as DUP_PROJECT, ls.ID as DUP_ID, ls.CREATED_BY, ls.IS_PUBLIC as DUP_IS_PUBLIC,
  ls.IS_CONTROLLED as DUP_IS_CONTROLLED, ls.IS_RESTRICTED as DUP_IS_RESTRICTED
from auditdb.fhd_detail2 fhdd
join FILE_HANDLE_RECORD fhr on fhr.CONTENT_MD5=fhdd.CONTENT_MD5
join auditdb.latest_snapshot_202003 ls on ls.FILE_HANDLE_ID=fhr.ID
where fhdd.FILE_HANDLE_ID <> fhr.ID and ls.IS_PUBLIC = 1
  and (fhdd.IS_CONTROLLED <> ls.IS_CONTROLLED or fhdd.IS_RESTRICTED <> ls.IS_RESTRICTED);
```
Export the project summary table from the MD5 duplicates table.
Contact the owner of each file on the list to notify them that their files have been identified as anomalies through a regular Synapse audit.
For any responses that indicate proliferation of files beyond what was intended, create a ticket within the Governance Jira space for investigation of a privacy incident.
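The comparison performed by the MD5 duplicates queries can be illustrated on in-memory records. This sketch mirrors the SQL conditions (same content MD5, different file handle, public duplicate, differing controlled or restricted flags) but is not the warehouse implementation: the `Snapshot` class, its field names, and the function name are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Snapshot:
    entity_id: str
    file_handle_id: int
    content_md5: str
    is_public: bool
    is_controlled: bool
    is_restricted: bool


def mismatched_duplicates(snapshots: Iterable[Snapshot]) -> List[Tuple[str, str]]:
    """Return (source_id, duplicate_id) pairs whose access flags disagree."""
    by_md5: Dict[str, List[Snapshot]] = defaultdict(list)
    for snap in snapshots:
        by_md5[snap.content_md5].append(snap)

    pairs = []
    for group in by_md5.values():
        for source in group:
            # Sources are public entities that are controlled or restricted.
            if not (source.is_public and (source.is_controlled or source.is_restricted)):
                continue
            for dup in group:
                flags_differ = (
                    dup.is_controlled != source.is_controlled
                    or dup.is_restricted != source.is_restricted
                )
                if dup.file_handle_id != source.file_handle_id and dup.is_public and flags_differ:
                    pairs.append((source.entity_id, dup.entity_id))
    return pairs
```

As in the SQL, the check flags any difference in flags rather than only a less restrictive duplicate, so each reported pair still needs to be reviewed by hand.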
ACT Data Sorting and Triaging
The Synapse ACT determines which flagged entities require further investigation or action. In general, we are most concerned about publicly available entities that have unintentionally become unrestricted or uncontrolled. Please reference the use case tables below to identify when it is necessary to reach out to a Community Manager or Synapse user. Community Managers are as follows (as of December 2020):
AD Knowledge Portal: Mette Peters
Challenges: Thomas Yu
GENIE: TBD
mHEALTH: Solly Sieberts
HTAN: Xengie Doan
imCORE: TBD
Hackathons: Jineta Banerjee
Change of State Audit Use Cases
Use Case | Why is this needed? | Action Needed |
AR/Click-wrap (controlled/restricted) was removed from a public entity/project (this query should only include public entities in public projects). | Audit for security to ensure there have been no accidental breaches or loss of data. | Sort through Jira tickets to see whether this was accounted for. Otherwise reach out to the Community Manager. For scenarios where the entity/project is not linked to a Sage-managed Synapse Community, please reach out to the Synapse Security Engineer for further investigation before contacting external Synapse users. The Synapse Security Engineer should investigate whether the flagged entity contains local access settings, indicating that the entity’s access settings were not unintentionally acquired from the parent folder/project. If it seems that an entity’s access settings were inherited accidentally, ACT should reach out to the project owner. |
File switched to public and then back to private AND there is an access change to a less restrictive state than the original public file. | Audit for security to ensure there have been no accidental breaches or loss of data. | If the file was public for more than a day, investigate whether data was downloaded. If the data was downloaded during this time, reach out to the Community Manager. |
Sage employee’s test project is public. | Internal QA | Reach out to the Community Manager (or the project owner if it is not associated with a community) to see whether the owner can make the entity private. |
AR/click-wrap was added AND the project or entity was made public. | Internal QA | Do not reach out to the Community Manager, but ACT should ensure that the project is listed on the Conditions for Use Synapse page. |
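Two of the use cases in the table above depend only on the before and after access flags, so they can be pre-sorted mechanically before manual review. The sketch below is illustrative and not an existing tool: the class, enum, and function names are hypothetical, and the remaining use cases (a file that was temporarily public, a Sage employee's test project) need history or ownership information and are left to manual review.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    CHECK_JIRA_THEN_CONTACT = "check Jira tickets, otherwise contact the Community Manager"
    CONFIRM_CONDITIONS_LISTING = "confirm the project is listed on the Conditions for Use page"
    MANUAL_REVIEW = "manual review against the use case table"


@dataclass(frozen=True)
class AccessState:
    is_public: bool
    is_controlled: bool
    is_restricted: bool


def triage_change(before: AccessState, after: AccessState) -> Action:
    """Map a flagged change of state to the follow-up named in the table."""
    removed = (before.is_controlled and not after.is_controlled) or (
        before.is_restricted and not after.is_restricted
    )
    added = (after.is_controlled and not before.is_controlled) or (
        after.is_restricted and not before.is_restricted
    )
    # AR/click-wrap removed from an entity that is public before and after.
    if before.is_public and after.is_public and removed:
        return Action.CHECK_JIRA_THEN_CONTACT
    # AR/click-wrap added and the entity was made public.
    if not before.is_public and after.is_public and added:
        return Action.CONFIRM_CONDITIONS_LISTING
    return Action.MANUAL_REVIEW
```

Removal is checked first so that a change which both drops one requirement and adds another on a public entity still gets the more cautious follow-up.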
MD5 Audit Use Cases
Use Case | Query Information Need | Why is this needed? | Action Needed |
Instances where a restricted/controlled entity is copied and the resulting public duplicate is less restricted/controlled (regardless of whether the source entity was public) | Date of event. For both source and duplicates we will need: Project: synID, name; Entity: synID, name, created by, last modified by, controlled status (was/is controlled), restricted status (was/is restricted), public/private. Potential follow-up: if a breach is discovered, we will need any downloaders after the date at which the AR/click-wrap was removed, so we can contact them. | Audit for security to ensure that any duplicated files are under the correct access requirements | Reach out to the Community Manager. Reach out to the Synapse Security Engineer if there is no Sage Community Manager associated with the project/entity. |
For Top Downloader Data, the Synapse Governance Team should reach out to all 20 top downloaders directly.
Example Email:
Your Synapse account has been identified during a routine Synapse audit as having accessed a large number of files in the last six months. This activity may be expected due to how you use Synapse, or may be the result of a compromised or shared account.
Please reply to this email message to confirm that you are not aware of a breach of your Synapse credentials and that you have not shared them with anyone else.
Ensure that you document Community Manager and Synapse User responses within the respective audit data export doc to track which flagged entities are problematic and which are not.
What to Do if a Breach is Detected
Please follow this Synapse Data Breach SOP for steps on reporting breaches, and reference this Synapse incident Confluence page for a log of incidents that have occurred.
Note that in a case where an access requirement/click-wrap was inappropriately removed, ACT should request a list of downloaders following the removal date from the Synapse Security Engineer.
Make sure you log the incident both in the Governance Incident Tracker and in the Confluence Incident Log.
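The list of downloaders following a removal date amounts to filtering download records by file and timestamp. The following is a hedged sketch rather than the Security Engineer's actual procedure: records are assumed to be `(user_id, file_handle_id, timestamp_ms)` tuples with timestamps in epoch milliseconds, and the function and field names are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Record = Tuple[int, int, int]  # (user_id, file_handle_id, timestamp_ms); assumed layout


def downloaders_after(
    records: Iterable[Record],
    affected_file_handles: Set[int],
    removal_millis: int,
) -> List[int]:
    """Return sorted user IDs that downloaded an affected file on or after the removal time."""
    users = {
        user_id
        for user_id, file_handle_id, timestamp_ms in records
        if file_handle_id in affected_file_handles and timestamp_ms >= removal_millis
    }
    return sorted(users)
```

The comparison is inclusive of the removal time, which errs toward contacting a user whose download coincided with the change.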
Audit Report
Generating the Report:
The audit report should be generated annually in September, and the report requires approval by the Sage Bionetworks Director of Governance. The report should contain data from the past two audit cycles. For example, the October 2021 audit report should contain data from the Q1/Q2 2020 audit and the Q3/Q4 2020 audit periods.
Use this audit report template to generate the report. The same audit report can be submitted to WIRB and to HITRUST. The report should be finalized by the end of September so that it can be submitted to WIRB and HITRUST in October.
Submitting the Report:
One audit report will be generated for both the WIRB and HITRUST submissions.
WIRB Submission:
The Governance Regulatory Support team will submit the audit report during the Synapse Continuing Review, which occurs annually in October. For reference, Synapse submissions to WIRB are stored here.
HITRUST Submission:
The Synapse Security Engineer will submit the audit report to HITRUST annually in October.
Storing the Report:
Store the audit report, associated data files, and Community Manager/Synapse user responses here.