Introduction
A data contributor works with the Sage Access and Compliance Team (ACT) to establish that a new data set added to Synapse can only be downloaded by NIH-qualified researchers. This means that when a caller attempts to download this data set, Synapse must first check with the NIH to determine whether the caller is actually an NIH-qualified researcher. GA4GH provides a technical specification to facilitate this type of authentication/authorization exchange between two systems: GA4GH Passports.
According to the GA4GH specification, Synapse would be a Passport Clearinghouse, while the NIH service would be the Passport Broker. This document covers the API changes needed to support this type of use case. Here is a high-level summary of what we propose to build into Synapse:
A new AccessRequirement (AR) type that can be created/managed by ACT to define one or more Claims that the caller must have in order to download restricted data.
A new Action type that informs callers when a passport visa is required in order to download a file.
Extend the Synapse OIDC Authentication system:
Add a new OAuthProviderBinding implementation to connect with each passport Broker that we wish to support.
Extend Synapse access_token generation to append the passport claims provided by passport Brokers to the Synapse access_token.
Add a passport visa interceptor that will validate passport visas from the Synapse access token and forward the valid subset to the thread local.
Extend the EntityAuthorizationManagerImpl to match AR visa conditions to the principal’s visas in the thread local.
PassportACTManagedAccessRequirement
Currently, an ACT managed access requirement (AR) is created by a member of ACT to restrict download access to one or more files within Synapse. When a user wishes to download a file that is the subject of a managed AR, they will typically need to first submit a data access request to ACT. The user will only be able to download the file after ACT has approved their submission. The approval process often involves providing information that demonstrates their qualification as a researcher.
The GA4GH passport specification was designed for the case where the system that holds data and the system that approves data access are not the same. In the introduction we presented an example where Synapse controls data that can only be accessed by NIH-qualified researchers. For this example, Synapse must defer to an NIH system to determine if a user is an NIH-qualified researcher. In GA4GH passport specification terms, Synapse would be the passport clearinghouse, while the NIH system would be the passport broker. The broker provides authentication information about the user in the form of one or more passport visas, and the clearinghouse uses the passport visas to make authorization decisions.
In order to support the approval delegation process in Synapse, ACT members need a new mechanism. Specifically, ACT needs a way to define cases where data access is contingent on one or more broker-provided passport visa claims. We propose adding a new managed access requirement type: PassportACTManagedAccessRequirement. This new AR type will define the passport visa claims required to download data for its associated subjects.
GA4GH supports multiple types of visa claims, each with varying degrees of complexity. In addition, each claim contains temporal data used for validation. Some visas have conditions such that they are only valid if one or more other visas are also present. Deciding if a visa matches the access requirement conditions will often require more than a simple “equals” check.
The GA4GH visa claim specification includes a section called “conditions”, which covers cases where a visa is only valid if another visa is present. The conditions specification provides a syntax for defining visa matching rules. We propose reusing this syntax within the new passport AR to define the rules for matching the AR to the appropriate visa(s).
Note: As a passport clearinghouse we are required to parse visa claim conditions in order to determine if the claim is valid. For example, if visa A has a condition on visa B, then A must be invalid if B is missing. This means we already need a system for parsing conditions and matching them to visas. We should be able to reuse that system to match passport ARs to the user’s visa claims.
PassportACTManagedAccessRequirement.json
{ "description": "This is an ACT managed access requirement used to require that a user has obtained one or more GA4GH Passport Visa Claims in order to access the associated subjects.", "extends": { "$ref": "org.sagebionetworks.repo.model.ManagedACTAccessRequirement" }, "properties": { "visaConditions": { "description": "The conditions define how this access requirement matches to each required GA4GH passport visa. Each condition group can contain one or more VisaConditions. Conditions within each group are delimited with an 'AND' while groups are delimited with an 'OR'.", "type": "array", "items": { "$ref": "org.sagebionetworks.repo.model.ar.ConditionGroup" } } } }
ConditionGroup.json
{ "description": "A group of one or more VisaConditions.", "properties": { "andConditions": { "description": "A group of one or more visa conditions. Each condition within the group is delimited with an 'AND'.", "type": "array", "items": { "$ref": "org.sagebionetworks.repo.model.ar.VisaCondition" } } } }
VisaType.json
{ "description": "Required. The visa type to be matched. Note: Custom types are not supported.", "type": "string", "enum": [ { "name": "AffiliationAndRole" }, { "name": "AcceptedTermsAndPolicies" }, { "name": "ResearcherStatus" }, { "name": "ControlledAccessGrants" }, { "name": "LinkedIdentities" } ] }
VisaCondition.json
{ "description": "Defines a match to a single GA4GH passport visa. See: <a href=\"https://github.com/ga4gh-duri/ga4gh-duri.github.io/blob/master/researcher_ids/ga4gh_passport_v1.md#conditions\">GA4GH passport conditions</a>", "properties": { "type": { "description": "Required. The visa type to be matched. Note: Custom types are not supported.", "$ref": "org.sagebionetworks.repo.model.ar.VisaType" }, "value": { "description": "Optional. When provided defines an expected 'value' claim.", "$ref": "org.sagebionetworks.repo.model.ar.MatchTypeValue" }, "source": { "description": "Optional. When provided defines an expected 'source' claim.", "$ref": "org.sagebionetworks.repo.model.ar.MatchTypeValue" }, "by": { "description": "Optional. When provided defines an expected 'by' claim.", "$ref": "org.sagebionetworks.repo.model.ar.MatchTypeValue" }, "brokerRedirectUrl": { "description": "The redirect URL of the passport broker that will provide this GA4GH passport visa for an authenticated caller.", "type": "string" }, "visaName": { "description": "The name of the visa to request from the passport broker.", "type": "string" } } }
MatchTypeValue.json
{ "description": "Defines both the operation and value for matching a single visa claim.", "properties": { "type": { "description": "Required. The type defines how value should be matched to a claim.", "name": "MatchType", "type": "string", "enum": [ { "name": "const", "description": "A case sensitive full string match." }, { "name": "pattern", "description": "Supports special meaning characters for matching values. Use '?' to match any single character, and '*' to match multiple characters." }, { "name": "split_pattern", "description": "A pattern match on part of a ';' delimited value." } ] }, "value": { "description": "The value depends on match type. For 'const' a match requires a case sensitive full string match of this value. For 'pattern', use a '?' to match any single character, and '*' to match multiple characters including the empty string and null string.", "type": "string" } } }
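The three match types can be sketched as follows. This is a minimal illustration and not the proposed Synapse implementation: the function names `pattern_to_regex` and `claim_matches` are hypothetical, and the sketch assumes that a 'split_pattern' matches when any single ';' delimited part of the claim matches the pattern.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate the '?' and '*' wildcards into a regular expression."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")    # exactly one character
        elif ch == "*":
            parts.append(".*")   # zero or more characters
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts), re.DOTALL)


def claim_matches(match_type: str, expected: str, claim: str) -> bool:
    """Return True if the visa claim satisfies the MatchTypeValue (hypothetical helper)."""
    if match_type == "const":
        # Case sensitive full string match.
        return claim == expected
    if match_type == "pattern":
        return pattern_to_regex(expected).fullmatch(claim) is not None
    if match_type == "split_pattern":
        # Assumption: one matching part of the ';' delimited claim is enough.
        regex = pattern_to_regex(expected)
        return any(regex.fullmatch(part) for part in claim.split(";"))
    raise ValueError(f"Unsupported match type: {match_type}")
```

For example, `claim_matches("pattern", "https://*.nih.gov", "https://dbgap.nih.gov")` matches, while a 'const' comparison of the same two strings does not.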
Visa Action Required
Clients use the ‘GET /entity/{id}/actions/download’ service to help guide callers with “unmet” access requirements. This service provides a list of “Actions” that the caller will need to take in order to meet all of the ARs associated with a file.
With the new passport AR, a caller will need to acquire one or more GA4GH passport visas from one or more passport brokers before they will be permitted to download any file that is the subject of the AR.
Note: The caller might have many visas available to them from a passport broker. A broker might ask the caller which visa should be sent to Synapse. For such a case, we need to provide the user with the names of the visas to request from the broker.
In order to acquire the required visas, the web client will need to redirect the caller’s browser to the broker’s portal. We will cover the details of this redirect in a later section. The end result of a broker redirect will be the creation of a new Synapse access token that will include the visa claims provided by the broker. The resulting access token can then be used by either a web or programmatic client to download files that are subject to the passport AR.
Therefore, the “Action Required” for an unmet passport AR must provide both the broker’s redirect URL and the names of the visas to acquire.
PassportVisaClaimAction.json
{ "description": "In order to download a file the user will need to provide one or more GA4GH passport visa claims. Such claims will be provided by the linked GA4GH passport broker.", "implements": [ { "$ref": "org.sagebionetworks.repo.model.download.Action" } ], "properties": { "brokerRedirectUrl": { "description": "The redirect URL of the passport broker that provides the passport visa claims needed to access data.", "type": "string" }, "visaNames": { "description": "The names of the visas to be provided.", "type": "array", "items": { "type": "string" } } } }
If more than one passport broker is needed to meet a single AR, ‘GET /entity/{id}/actions/download’ will provide a separate PassportVisaClaimAction for each broker.
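The grouping of unmet conditions into one action per broker could look roughly like the sketch below. The function name `build_visa_actions` is hypothetical, and the sketch assumes each unmet VisaCondition is available as a plain dictionary carrying the `brokerRedirectUrl` and `visaName` fields from VisaCondition.json.

```python
def build_visa_actions(unmet_conditions: list[dict]) -> list[dict]:
    """Group unmet visa conditions into one PassportVisaClaimAction per broker."""
    names_by_broker: dict[str, list[str]] = {}
    for condition in unmet_conditions:
        names = names_by_broker.setdefault(condition["brokerRedirectUrl"], [])
        # Ask for each visa name only once per broker.
        if condition["visaName"] not in names:
            names.append(condition["visaName"])
    return [
        {"brokerRedirectUrl": url, "visaNames": names}
        for url, names in names_by_broker.items()
    ]
```

Two conditions served by the same broker collapse into a single action with two visa names, while a condition served by a second broker produces its own action.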
Broker OIDC interaction
Synapse already uses OpenID Connect (OIDC) to support login via “Google” and to link an ORCID to a Synapse account. For the login case, information from Google is used to link the caller to a Synapse user ID. The final product of the OIDC process is a new Synapse access token that encodes both the user’s ID and the scope of the token. The Synapse access token is a signed JSON Web Token (JWT). The Synapse access token can be used by both web and programmatic clients to authenticate for all Synapse API calls.
The GA4GH ‘Data Passports’ specification extends the basic OIDC process to enable a passport broker to provide a passport clearinghouse with a passport containing one or more visa claims. See also: ‘AAI OIDC Profile’. Specifically, the access token (also a JWT) provided by the broker to the clearinghouse will include an entry for the caller’s passport.
We propose extending the Synapse OIDC support to not only “login” via a broker but to also capture the broker provided passport in the resulting Synapse access token.
Note: More than one passport broker might be needed to provide a full set of required passport visa claims. Therefore, it is important that newly provided visa claims accumulate with existing visa claims.
By appending claims to the resulting Synapse access token, we can ensure that the visas are available to both web and command line clients. In the next section we will cover how the Synapse access tokens with embedded visa claim JWTs can be used for download authorization.
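The accumulation of visas across brokers can be sketched as a merge of claim sets. This is only an illustration: the function name `accumulate_visas` is hypothetical, and storing the visa JWTs in the Synapse access token under the GA4GH claim name `ga4gh_passport_v1` is an assumption, not a decision made in this document.

```python
PASSPORT_CLAIM = "ga4gh_passport_v1"  # assumed claim name in the Synapse token


def accumulate_visas(existing_claims: dict, broker_visa_jwts: list[str]) -> dict:
    """Return new token claims that keep the old visas and add the broker's visas."""
    merged = dict(existing_claims)  # leave the caller's claims untouched
    visas = list(existing_claims.get(PASSPORT_CLAIM, []))
    for visa_jwt in broker_visa_jwts:
        # The same encoded visa is only stored once.
        if visa_jwt not in visas:
            visas.append(visa_jwt)
    merged[PASSPORT_CLAIM] = visas
    return merged
```

A caller who first visits one broker and then a second ends up with a token that carries the visas from both.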
Passport Visa Interceptor
Currently, the primary job of the Synapse AuthenticationFilter is to validate a user-provided access token in order to identify the caller. The filter also passes along the access token as a header that can be read by downstream code such as the OAuthScopeInterceptor. We propose adding a new interceptor for processing visa claims found in the access token. The pre-processing would include the following:
Validate signature and expiration of each visa.
Validate the conditional relationship between visas. For example, a visa might include a condition such that it is only valid if another visa also exists. For such a case, the dependent visa would be invalid if its dependency were missing.
Finally, the passport visa interceptor would bind the remaining list of valid claims to the thread local. The thread local list would be of type:
PassportVisa.json
{ "description": "A representation of a GA4GH passport visa. See: <a href=\"https://github.com/ga4gh-duri/ga4gh-duri.github.io/blob/master/researcher_ids/ga4gh_passport_v1.md#visa-format\">GA4GH passport visa</a>", "properties": { "type": { "description": "Required.", "$ref": "org.sagebionetworks.repo.model.ar.VisaType" }, "value": { "description": "Required. A string that represents any of the scope, process, identifier and version of the assertion. The format of the string can vary by the Visa Type", "type": "string" }, "source": { "description": "Required. A URL Claim that provides at a minimum the organization that made the assertion. If there is no organization making the assertion, the source claim value MUST be set to 'https://no.organization'.", "type": "string" }, "by": { "description": "Optional. The level or type of authority within the 'source' organization of the assertion.", "type": "string" } } }
Note: Since the interceptor excludes invalid visas, PassportVisa.json does not include any field used for validation or signing, such as: conditions, asserted, alg, exp, jti, iat…
In the next section we will cover how download authorization code can use the passport visas to make download decisions.
Download Authorization
The EntityAuthorizationManagerImpl is responsible for making all entity related authorization decisions, including file download. The following is the current download decision chain:
DENY_IF_DOES_NOT_EXIST, DENY_IF_IN_TRASH, GRANT_IF_ADMIN, DENY_IF_HAS_UNMET_ACCESS_RESTRICTIONS, DENY_IF_TWO_FA_REQUIREMENT_NOT_MET, GRANT_IF_OPEN_DATA_WITH_READ, DENY_IF_ANONYMOUS, DENY_IF_HAS_NOT_ACCEPTED_TERMS_OF_USE, GRANT_IF_HAS_DOWNLOAD, DENY
Currently, the fourth step in the chain, DENY_IF_HAS_UNMET_ACCESS_RESTRICTIONS, is based on managed ARs where the principal must be approved by ACT. This typically involves checking whether the principal has been granted approval for all ARs that have the given file as a subject.
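To make the ordering of the chain concrete, the sketch below models it as a list of deciders where the first one to return an outcome wins. It is an abbreviated, hypothetical model and not the EntityAuthorizationManagerImpl code: only some of the steps are included, and the context keys (`exists`, `is_admin`, `has_unmet_ars`, `has_download`) are invented for illustration.

```python
def first_decision(chain, context):
    """Walk the chain in order and return the first (outcome, step name)."""
    for name, decide in chain:
        outcome = decide(context)
        if outcome is not None:
            return outcome, name
    raise RuntimeError("The chain must end with an unconditional step")


# Abbreviated chain; each decider returns "GRANT", "DENY", or None to pass.
CHAIN = [
    ("DENY_IF_DOES_NOT_EXIST", lambda c: "DENY" if not c["exists"] else None),
    ("GRANT_IF_ADMIN", lambda c: "GRANT" if c["is_admin"] else None),
    ("DENY_IF_HAS_UNMET_ACCESS_RESTRICTIONS",
     lambda c: "DENY" if c["has_unmet_ars"] else None),
    ("GRANT_IF_HAS_DOWNLOAD", lambda c: "GRANT" if c["has_download"] else None),
    ("DENY", lambda c: "DENY"),
]
```

Because the unmet AR check runs before GRANT_IF_HAS_DOWNLOAD, a caller with download permission is still denied while any AR on the file remains unmet.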
We will need to extend the unmet AR check to look for the new passport AR type. The conditions of each passport AR must then be matched against the passport visa list from the thread local. The AR would be treated as ‘met’ if every condition in at least one of its condition groups matches a visa, and ‘unmet’ otherwise. Note: The visa condition matching system should be the same as the system used to validate visas with conditions in the interceptor layer.
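A minimal sketch of the passport AR check follows. It assumes the OR-of-AND semantics described in PassportACTManagedAccessRequirement.json, represents the AR's visaConditions and the thread local visas as plain dictionaries, and handles only the 'const' match type to stay short. The function names are hypothetical.

```python
def condition_matches(condition: dict, visa: dict) -> bool:
    """True if a single VisaCondition matches a single PassportVisa."""
    if condition["type"] != visa["type"]:
        return False
    for claim in ("value", "source", "by"):
        expected = condition.get(claim)
        if expected is None:
            continue  # optional matcher not provided
        if expected["type"] != "const":
            raise NotImplementedError("Only 'const' is handled in this sketch")
        if visa.get(claim) != expected["value"]:
            return False
    return True


def is_requirement_met(visa_conditions: list, visas: list) -> bool:
    """Groups are OR-ed; conditions within a group are AND-ed."""
    return any(
        all(any(condition_matches(c, v) for v in visas) for c in group["andConditions"])
        for group in visa_conditions
    )
```

With two groups, a caller who satisfies every condition in either group meets the AR; a caller who satisfies only part of each group does not.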