
Data Contributors Exemption from Access Requirements

Personas

  • Data Contributors - The group of personas involved in the construction of a project. This can include uploaders, curators, annotators, and analysts.

  • Data Administrators - The group of personas responsible for on-boarding and off-boarding the Data Contributors.

  • Data Owner - The group of personas that actually “own” the data.

  • Access and Compliance Team (ACT) - The team of Sage employees responsible for evaluating and enforcing data governance restrictions for all projects in Synapse.

  • Data Consumer - The group of users that consume data in Synapse, including browsing, searching, and downloading.

Introduction

The construction of many Synapse projects involves many Data Contributors, including uploaders, curators, annotators, and analysts. A project may be built over multiple phases that can span months or even years. One role of a Data Administrator is to on-board and off-board Data Contributors throughout a project’s lifecycle. Today, Data Administrators manage who can contribute to a project by setting the project’s Access Control Lists (ACLs), which grant permissions such as “upload”, “edit”, or “delete”. ACLs can be set on a project, a folder, or even an individual Entity. A project’s ACLs define the first layer of access to its data.

 

The Sage ACT will often be engaged with the construction of a project at its earliest stages. This typically involves members of ACT working closely with Data Owners and Contributors to determine the appropriate level of data governance needed for a project’s data. When ACT determines that data restrictions are necessary for a project, they will bind special data controls called AccessRequirements (ARs) to the data. ARs can be bound to a project, a folder, or an individual Entity. Any data that is bound to an AR can only be downloaded by a user if that user has “met” the terms defined by the AR. The terms of some ARs are as simple as agreeing to special terms-of-use, while the terms of other ARs might require a user to submit a data access request to ACT for approval. A project’s ARs define the second layer of access to its data.

 

It is worth noting that nobody* is exempt from ARs bound to data. This means Data Owners, Contributors, Administrators, Consumers, and even members of the ACT must “meet” a bound AR before they can download the data. For Data Contributors, terms-of-use style ARs are a minor inconvenience. However, for ARs that require data access approval, Data Contributors today need to submit “dummy” access requests and wait for approval from ACT before they can analyze the data. This is a burden on both Data Contributors and ACT, and the burden is compounded for large projects with many ARs.

 

* The user that actually uploaded a particular file can always download that file. This corner case should probably be removed.

 

The goal of this design is to provide a new mechanism to ease the burden of AR approvals on both Data Contributors and members of the ACT. Ideally, Data Contributors would be “exempt” from ARs on data that they manage. At the same time, ACT must ensure that “exemptions” are not abused to bypass the controls on restricted data.

 

For more information on the use cases please see: https://sagebionetworks.jira.com/wiki/spaces/GI/pages/2946695193.

Download Authorization Mechanisms

Before we can suggest changes to the Synapse download authorization mechanisms, we need to review how they currently work. There are subtle complexities with the existing system that we will need to preserve.

Anytime a user wants to download data from Synapse, the authorization system will run the following decision chain:

  1. DENY_IF_DOES_NOT_EXIST

  2. DENY_IF_IN_TRASH

  3. GRANT_IF_ADMIN

  4. DENY_IF_HAS_UNMET_ACCESS_RESTRICTIONS

  5. DENY_IF_TWO_FA_REQUIREMENT_NOT_MET

  6. GRANT_IF_OPEN_DATA_WITH_READ

  7. DENY_IF_ANONYMOUS

  8. DENY_IF_HAS_NOT_ACCEPTED_TERMS_OF_USE

  9. GRANT_IF_HAS_DOWNLOAD

  10. DENY

For the purpose of our discussion, we will focus on the fourth rule, DENY_IF_HAS_UNMET_ACCESS_RESTRICTIONS, and the ninth rule, GRANT_IF_HAS_DOWNLOAD.
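
To make the ordering concrete, the chain can be sketched in Python as shown here. This is only an illustration, not the Synapse implementation: it assumes the chain is evaluated in order and that the first rule to reach a decision wins. The DownloadContext class, its fields, and the helper functions are hypothetical names introduced for this example; only the rule names come from the chain above.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class DownloadContext:
    """Hypothetical snapshot of the facts each rule needs (not a Synapse type)."""
    exists: bool = True
    in_trash: bool = False
    is_admin: bool = False
    has_unmet_access_restrictions: bool = False
    two_fa_requirement_met: bool = True
    is_open_data_with_read: bool = False
    is_anonymous: bool = False
    has_accepted_terms_of_use: bool = True
    has_download: bool = False


# A rule returns True (grant), False (deny), or None (no decision, keep going).
Rule = Callable[[DownloadContext], Optional[bool]]


def deny_if(predicate: Callable[[DownloadContext], bool]) -> Rule:
    return lambda ctx: False if predicate(ctx) else None


def grant_if(predicate: Callable[[DownloadContext], bool]) -> Rule:
    return lambda ctx: True if predicate(ctx) else None


DECISION_CHAIN: List[Tuple[str, Rule]] = [
    ("DENY_IF_DOES_NOT_EXIST", deny_if(lambda c: not c.exists)),
    ("DENY_IF_IN_TRASH", deny_if(lambda c: c.in_trash)),
    ("GRANT_IF_ADMIN", grant_if(lambda c: c.is_admin)),
    ("DENY_IF_HAS_UNMET_ACCESS_RESTRICTIONS",
     deny_if(lambda c: c.has_unmet_access_restrictions)),
    ("DENY_IF_TWO_FA_REQUIREMENT_NOT_MET",
     deny_if(lambda c: not c.two_fa_requirement_met)),
    ("GRANT_IF_OPEN_DATA_WITH_READ", grant_if(lambda c: c.is_open_data_with_read)),
    ("DENY_IF_ANONYMOUS", deny_if(lambda c: c.is_anonymous)),
    ("DENY_IF_HAS_NOT_ACCEPTED_TERMS_OF_USE",
     deny_if(lambda c: not c.has_accepted_terms_of_use)),
    ("GRANT_IF_HAS_DOWNLOAD", grant_if(lambda c: c.has_download)),
    ("DENY", lambda c: False),
]


def authorize_download(ctx: DownloadContext) -> Tuple[bool, str]:
    """Return (granted, name of the rule that made the decision)."""
    for name, rule in DECISION_CHAIN:
        decision = rule(ctx)
        if decision is not None:
            return decision, name
    return False, "DENY"
```

Under these assumptions, a Data Contributor who holds the download permission but has an unmet AR is denied by DENY_IF_HAS_UNMET_ACCESS_RESTRICTIONS before GRANT_IF_HAS_DOWNLOAD is ever consulted, which is the situation this design aims to ease.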

DENY_IF_HAS_UNMET_ACCESS_RESTRICTIONS

This rule works by first determining which ARs are bound to the Entity to be downloaded. An AR can be bound directly to an Entity, or to any Entity in its hierarchy, such as parent/grandparent folders, all the way up to the containing project. An Entity is the subject of all ARs found in its hierarchy. Once the full set of ARs that apply to the Entity is discovered, the next step is to determine if the user has been granted at least one approval for each AR.

Note: All AR types are explicitly granted approvals. Terms-of-use types are granted an approval when the user accepts the terms. Data access submission types are granted approvals when ACT approves the submission. ACT can also revoke approvals.

It is possible for a user to be granted multiple approvals for the same AR. For example, a scientist might belong to more than one research group where each group has been granted approval. So, even if the scientist leaves one of the research groups (resulting in a revocation), they would still have access via the remaining group’s approval.

If there is even a single AR bound to an Entity for which the user has not been approved, then the DENY_IF_HAS_UNMET_ACCESS_RESTRICTIONS rule will deny the user’s request.
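
The rule described above can be sketched as follows. This is a simplified model under stated assumptions, not the Synapse code: the Entity class, the ar_bindings mapping (Entity ID to the AR IDs bound directly to it), and the approvals counter (active approvals per user and AR) are all hypothetical structures chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class Entity:
    """Hypothetical minimal Entity: an ID plus a link to its parent."""
    id: str
    parent: Optional["Entity"] = None


def bound_access_requirements(entity: Entity,
                              ar_bindings: Dict[str, Set[str]]) -> Set[str]:
    """Collect every AR bound to the entity or to any ancestor up to the project."""
    ars: Set[str] = set()
    node: Optional[Entity] = entity
    while node is not None:
        ars |= ar_bindings.get(node.id, set())
        node = node.parent
    return ars


def has_unmet_access_restrictions(entity: Entity, user_id: str,
                                  ar_bindings: Dict[str, Set[str]],
                                  approvals: Counter) -> bool:
    """True if at least one bound AR has no active approval for the user.

    approvals maps (user_id, ar_id) to the number of active approvals, so a
    user with two approvals keeps access after one of them is revoked.
    """
    return any(approvals[(user_id, ar_id)] < 1
               for ar_id in bound_access_requirements(entity, ar_bindings))
```

Counting approvals rather than storing a single flag mirrors the research-group example: revoking one of two approvals leaves the AR met, while revoking the last one causes the rule to deny.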

GRANT_IF_HAS_DOWNLOAD

This rule works by first finding the ACL that controls the Entity to be downloaded. An ACL can be bound directly to an Entity, or to any Entity in its hierarchy, such as parent/grandparent folders, all the way up to the containing project. An Entity is controlled by the first ACL found in its hierarchy. Once the controlling ACL is found, it is checked to determine if at least one of the user’s principal IDs has been granted the download permission.

Note: A user’s principal IDs include the user’s ID plus the IDs of all teams of which the user is a member.

If the user has been granted download (either directly or via a team), then the GRANT_IF_HAS_DOWNLOAD rule will grant the user’s download request.
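
The key difference from the AR rule is that only the nearest ACL counts, while ARs accumulate across the hierarchy. A minimal sketch of the lookup follows. The Entity class, the acls mapping (Entity ID to an ACL, itself a mapping of principal ID to permissions), the team_members mapping, and the "DOWNLOAD" string are hypothetical stand-ins used only for this example.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set

# Hypothetical ACL shape: principal ID -> set of permission names.
Acl = Dict[str, Set[str]]


@dataclass(frozen=True)
class Entity:
    """Hypothetical minimal Entity: an ID plus a link to its parent."""
    id: str
    parent: Optional["Entity"] = None


def controlling_acl(entity: Entity, acls: Dict[str, Acl]) -> Optional[Acl]:
    """Return the first ACL found walking from the entity up to the project."""
    node: Optional[Entity] = entity
    while node is not None:
        if node.id in acls:
            return acls[node.id]
        node = node.parent
    return None


def principal_ids(user_id: str, team_members: Dict[str, Set[str]]) -> Set[str]:
    """The user's own ID plus the ID of every team the user belongs to."""
    teams = {team for team, members in team_members.items() if user_id in members}
    return {user_id} | teams


def has_download(entity: Entity, user_id: str, acls: Dict[str, Acl],
                 team_members: Dict[str, Set[str]]) -> bool:
    """True if any of the user's principals holds DOWNLOAD on the controlling ACL."""
    acl = controlling_acl(entity, acls)
    if acl is None:
        return False
    return any("DOWNLOAD" in acl.get(principal, set())
               for principal in principal_ids(user_id, team_members))
```

In this model, a folder with its own ACL fully replaces the project ACL for everything beneath it, so a team granted download on the project does not automatically have download inside that folder.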

Key Considerations

  1. Data contributors are not global. A Data Contributor’s AR “exemption” must be limited to data within the scope of their contribution. They must not be granted a global “exemption”.

  2. A single Data Contributor might contribute to multiple, unrelated, projects.

  3. Off-boarding a Data Contributor must remove any “exemption” granted by their participation as a contributor.

  4. A single AR can be bound to data in multiple projects.

  5. For a given set of files that are the subject of a single AR, a user might only be a Data Contributor for a sub-set of those files.

  6. Given 4 & 5, it is not possible for ACT to determine who is a Data Contributor for a single AR.

  7. Data Administrators determine who is a Data Contributor for a set of files, not ACT.

  8. ACT must approve all users for “exemptions”.

  9. The Data Contributors for a project can change over time.

  10. A single project might have many ARs, all of whose Data Contributors might need to be granted “exemptions”.

  11. ACT should be able to “reuse” the same list of “exempt” users for multiple ARs.

Proposal

As discussed above, Synapse has two independent layers of data access. The first layer is controlled by the Data Administrator to define the Data Contributors of a project through the Entity ACL system. The second layer is controlled by ACT to apply data restrictions through the AR system. In order for Data Contributors to download data they must successfully navigate both layers. It is our hope that this proposal will help these three groups better coordinate data access.

 

We recently added a feature that allows ACT to add ACLs to ARs for the purpose of delegating access approvals to non-ACT users. Specifically, we added support for a new permission, REVIEW_SUBMISSIONS, that can be granted to a user on an AR’s ACL. Any user granted this permission is permitted to review and approve data access submissions on that AR.

Note: Only members of the ACT are allowed to modify ACLs on ARs.

 

Exemption Eligible Permission on AR ACLs

We propose adding support for a new permission, EXEMPTION_ELIGIBLE, to the AR ACL system. This new permission would allow members of the ACT to mark either an individual user or a team as eligible for “exemption” from data access approval requirements. It is important to note that being eligible for “exemption” is not a data access approval. Instead, the DENY_IF_HAS_UNMET_ACCESS_RESTRICTIONS rule would be replaced by: DENY_IF_NOT_EXEMPT_AND_HAS_UNMET_ACCESS_RESTRICTIONS

This new rule would be checked for each AR bound to the Entity. For each AR, the user must either be exempt or have been granted at least one access approval. If the check fails for even a single AR, the request would be denied.

To determine if a user is “exempt” on an AR, two conditions will be checked. First, at least one of the user’s principals must have been granted the EXEMPTION_ELIGIBLE permission on the AR’s ACL. Second, at least one of the user’s principals must have been granted one or more permissions on the Entity’s ACL that identify the user as a Data Contributor. Unless both conditions are met, the user will not be considered “exempt”. In short, a user must be both eligible for exemption (via ACT) and a Data Contributor (via the Data Administrator) to be exempt from an AR, and this is evaluated on a per-file basis.

 

Note: We settled on identifying Data Contributor status by a user having been granted both the EDIT and DELETE permissions on a file.
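
The proposed rule can be sketched as follows. This is an illustrative model only. The function names, the ACL shape (principal ID to a set of permission names), and the inputs (bound_ars, approved_ars, principals) are hypothetical. The sketch also assumes that EDIT and DELETE may come from different principals of the same user, which the proposal does not specify.

```python
from typing import Dict, Set

# Hypothetical ACL shape: principal ID -> set of permission names.
Acl = Dict[str, Set[str]]

# Per the note above, EDIT plus DELETE on the file identifies a Data Contributor.
CONTRIBUTOR_PERMISSIONS = {"EDIT", "DELETE"}


def granted_permissions(acl: Acl, principals: Set[str]) -> Set[str]:
    """Union of the permissions granted to any of the user's principals.

    Assumption: permissions are pooled across principals (user ID and teams).
    """
    granted: Set[str] = set()
    for principal in principals:
        granted |= acl.get(principal, set())
    return granted


def is_exempt(ar_id: str, principals: Set[str],
              ar_acls: Dict[str, Acl], entity_acl: Acl) -> bool:
    """Exempt only if eligible on the AR's ACL AND a contributor on the Entity's ACL."""
    eligible = "EXEMPTION_ELIGIBLE" in granted_permissions(
        ar_acls.get(ar_id, {}), principals)
    contributor = CONTRIBUTOR_PERMISSIONS <= granted_permissions(
        entity_acl, principals)
    return eligible and contributor


def deny_if_not_exempt_and_has_unmet_access_restrictions(
        bound_ars: Set[str], approved_ars: Set[str], principals: Set[str],
        ar_acls: Dict[str, Acl], entity_acl: Acl) -> bool:
    """True (deny) if any bound AR is neither approved nor exempt for the user."""
    return any(
        ar_id not in approved_ars
        and not is_exempt(ar_id, principals, ar_acls, entity_acl)
        for ar_id in bound_ars
    )
```

Because each AR is checked independently, a user can pass with a mix of exemptions and ordinary approvals. Removing either the EXEMPTION_ELIGIBLE grant (by ACT) or the contributor permissions (by the Data Administrator) is enough to end the exemption. Passing this rule only skips the AR check: the user would still need the download permission to satisfy GRANT_IF_HAS_DOWNLOAD.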

 

Exemption Eligible Team

Given that a single project might have many ARs, it would be laborious for ACT to add/remove individual eligible users on each AR over the course of a project’s lifecycle. In fact, this might be just as much work, for all users, as the existing system of granting “dummy” submissions. Instead, it would be more convenient for ACT to create a reusable team of eligible individuals. The team would then be granted the EXEMPTION_ELIGIBLE permission on each AR in the project. ACT would then add/remove users from the team over the course of the project’s lifecycle.

 

While it is possible that a single team might be reused for multiple projects, it is more likely that truly distinct projects will have their own team of exemption-eligible users. This means Data Contributors would need to use the existing team membership request feature to request exemption eligibility for each project to which they contribute. Since ACT will be the administrators of the teams bound to ARs, ACT will be notified of all membership requests for these teams. ACT would control the team membership approvals/rejections to manage exemption eligibility.

 

At this time, we recommend that teams created/used by ACT for the purpose of data access exemptions be regular Synapse teams. This means the teams will be created using the standard team UI. We should also be able to reuse the existing team membership application UI for Data Contributor membership requests.

 

ACT will be free to add/remove team administrators from such teams as they see fit. This would allow ACT to delegate exemption eligibility management to a third party when required.

 

This proposal does not eliminate the need for a Data Contributor to interact with ACT in order to download files with restrictions. However, for cases where there are many ARs for a project, data access exemptions can be as simple as a single click for both Data Contributors and ACT.

 

Action Required

Currently, when a user attempts to download a file (through any of the Synapse download features), they are presented with one “action required” for each AR that they have not “met”. This is shown to the user in various places in the UI, to aid the user in “meeting” each AR. For the case where the user is a Data Contributor identified by the appropriate permissions, we propose extending the “actions required” to include the ID of the ACT-managed team or teams. We can then extend the “actions required” UI to include a link for the user to request membership in the team or teams.

 

Summary

This proposal provides a new mechanism for Data Contributors to be exempt from Access Requirements on data that they are responsible for maintaining. Exemptions are on a per-file basis. In order to be exempt, the Data Contributor must be granted the appropriate permissions on a file by the Data Administrator. In addition, the Data Contributor must also be granted the “EXEMPTION_ELIGIBLE” permission on an AR by a member of the ACT. In this way, input from both Data Administrators and ACT determines the files for which a Data Contributor is exempt. This also means an exemption can be revoked by either Data Administrators or ACT at any time, without the need to coordinate their actions.