
Access Requirements with GA4GH Passport Visa Claims

This document has been superseded by:

 

 

Introduction

A data contributor works with the Sage Access and Compliance Team (ACT) to establish that a new data set added to Synapse can only be downloaded by NIH qualified researchers. This means that when a caller attempts to download this data set, Synapse must first check with the NIH to determine whether the caller is actually an NIH qualified researcher. GA4GH provides a technical specification to facilitate this type of authentication/authorization exchange between two systems: GA4GH Passports.

 

According to the GA4GH specification, Synapse would be a passport Clearinghouse, while the NIH service would be the passport Broker. This document covers the API changes needed to support this type of use case. Here is a high-level summary of what we propose to build into Synapse:

  • A new AccessRequirement (AR) type that can be created/managed by ACT to define one or more Claims that the caller must have in order to download restricted data.

  • A new Action type that informs callers when a passport visa is required in order to download a file.

  • Extend the Synapse OIDC Authentication system:

    • Add new OAuthProviderBinding implementation to connect with each passport Broker that we wish to support.

    • Extend the Synapse generated access_tokens system to append passport claims provided by passport Brokers to the Synapse access_token.

  • Add a passport visa interceptor that will validate passport visas from the Synapse access token and bind the valid subset to the thread local. Extend UserManagerImpl.getUserInfo() to add visas from the thread local to the resulting UserInfo object.

  • Extend the EntityAuthorizationManagerImpl to match AR visa conditions to the principal’s visas in the UserInfo.

  • Extend AsynchJobStatusManagerImpl to append visas from the UserInfo to the job’s status.

PassportACTManagedAccessRequirement

Currently, an ACT managed access requirement (AR) is created by a member of ACT to restrict download access to one or more files within Synapse. When a user wishes to download a file that is the subject of a managed AR, they will typically need to first submit a data access request to ACT. The user will only be able to download the file after ACT has approved their submission. The approval process often involves providing information that demonstrates their qualification as a researcher.

The GA4GH passport specification was designed for the case where the system that holds data and the system that approves data access are not the same. The introduction described an example where Synapse controls data that can only be accessed by NIH qualified researchers. For this example, Synapse must defer to an NIH system to determine if a user is an NIH qualified researcher. In GA4GH passport specification terms, Synapse would be the passport clearinghouse, while the NIH system would be the passport broker. The broker provides authentication information about the user in the form of one or more passport visas, and the clearinghouse uses the passport visas to make authorization decisions.

In order to support the approval delegation process in Synapse, ACT members need a new mechanism. Specifically, ACT needs a way to define cases where data access is contingent on one or more passport visa claims provided by a passport broker. We propose adding a new managed access requirement type: PassportACTManagedAccessRequirement. This new AR type will define the passport visa claims required to download data for its associated subjects.

GA4GH supports multiple types of visa claims, each with varying degrees of complexity. In addition, each claim contains temporal data used for validation. Some visas have conditions such that they are only valid if one or more other visas are also present. Deciding if a visa matches the access requirement conditions will often require more than a simple “equals” check.

The GA4GH visa claim specification includes a section called conditions, which covers cases where a visa is only valid if another visa is present. The conditions specification provides a syntax for defining visa matching rules. We propose reusing this syntax within the new passport AR to define the rules for matching the AR to the appropriate visa(s).

Note: As a passport clearinghouse we are required to parse visa claim conditions in order to determine if the claim is valid. For example, if visa A has a condition on visa B, then A must be invalid if B is missing. This means we already need a system for parsing conditions and matching them to visas. We should be able to reuse that system to match passport ARs to the user’s visa claims.

 

PassportACTManagedAccessRequirement.json

{
  "description": "This is an ACT managed access requirement used to require that a user has obtained one or more GA4GH Passport Visa Claims in order to access the associated subjects.",
  "extends": {
    "$ref": "org.sagebionetworks.repo.model.ManagedACTAccessRequirement"
  },
  "properties": {
    "visaConditions": {
      "description": "The conditions define how this access requirement matches to each required GA4GH passport visa. Each condition group can contain one or more VisaConditions. Conditions within each group are delimited with an 'AND' while groups are delimited with an 'OR'",
      "type": "array",
      "items": {
        "$ref": "org.sagebionetworks.repo.model.ar.ConditionGroup"
      }
    }
  }
}

 

ConditionGroup.json

{
  "description": "A group of one or more VisaConditions.",
  "properties": {
    "andConditions": {
      "description": "A group of one or more visa conditions. Each condition within the group is delimited with an 'AND'.",
      "type": "array",
      "items": {
        "$ref": "org.sagebionetworks.repo.model.ar.VisaCondition"
      }
    }
  }
}
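
The 'AND' within a group and 'OR' across groups described in these schemas can be illustrated with a minimal Python sketch. It is not the proposed implementation: the Visa and VisaCondition classes below are hypothetical stand-ins (the real VisaCondition.json may carry more fields and richer match types), and matching is reduced to a type check plus an optional exact value check.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Visa:
    # Hypothetical in-memory visa holding only the fields this sketch needs.
    type: str
    value: str


@dataclass(frozen=True)
class VisaCondition:
    # Hypothetical shape; the real VisaCondition.json may differ.
    type: str
    value: Optional[str] = None  # None means "any value of this type"


def condition_matches(condition: VisaCondition, visa: Visa) -> bool:
    """A condition matches a visa of the same type and, if given, the same value."""
    if condition.type != visa.type:
        return False
    return condition.value is None or condition.value == visa.value


def group_is_met(and_conditions: List[VisaCondition], visas: List[Visa]) -> bool:
    """Every condition in the group must be matched by at least one visa (AND)."""
    return all(
        any(condition_matches(c, v) for v in visas) for c in and_conditions
    )


def requirement_is_met(groups: List[List[VisaCondition]], visas: List[Visa]) -> bool:
    """The requirement is met if any one group is fully met (OR across groups).

    Assumption: an empty group counts as unmet rather than vacuously true.
    """
    return any(len(g) > 0 and group_is_met(g, visas) for g in groups)
```

With this shape, an AR with two groups is satisfied as soon as the caller's visas cover every condition in either group.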

VisaType.json

{
  "description": "Required. The visa type to be matched. Note: Custom types are not supported.",
  "type": "string",
  "enum": [
    { "name": "AffiliationAndRole" },
    { "name": "AcceptedTermsAndPolicies" },
    { "name": "ResearcherStatus" },
    { "name": "ControlledAccessGrants" },
    { "name": "LinkedIdentities" }
  ]
}

 

VisaCondition.json

 

MatchTypeValue.json

 

Visa Action Required

Clients use the ‘GET /entity/{id}/actions/download’ service to help guide callers with “unmet” access requirements. This service provides a list of “Actions” that the caller will need to take in order to meet all of the ARs associated with a file.

With the new passport AR, a caller will need to acquire one or more GA4GH passport visas from one or more passport brokers before they will be permitted to download any file that is the subject of the passport AR.

 

Note: The caller might have many visas available to them from a passport broker. A broker might ask the caller which visa should be sent to Synapse. For such a case, we need to provide the user with the names of the visas to request from the broker.

 

In order to acquire the required visas, the web client will need to redirect the caller’s browser to the broker’s portal. We will cover the details of this redirect in a later section. The end result of a broker redirect will be the creation of a new Synapse access token that will include the visa claims provided by the broker. The resulting access token can then be used by either a web or programmatic client to download files that are subject to the passport AR.

Therefore, the “Action Required” for an unmet passport AR must provide both the broker’s redirect URL and the names of the visas to acquire.

PassportVisaClaimAction.json

If more than one passport broker is needed to meet a single AR, ‘GET /entity/{id}/actions/download’ will provide a separate PassportVisaClaimAction for each broker.

 

Broker OIDC interaction

Synapse already uses OpenID Connect (OIDC) to support login via “Google” and to link an ORCID to a Synapse account. For the login case, information from Google is used to link the caller to a Synapse user ID. The final product of the OIDC process is a new Synapse access token that encodes both the user’s ID and the scope of the token. The Synapse access token is a signed JSON Web Token (JWT). The Synapse access token can be used by both web and programmatic clients to authenticate Synapse API requests.

 

The GA4GH ‘Data Passports’ specification extends the basic OIDC process to enable a passport broker to provide a passport clearinghouse with a passport containing one or more visa claims. See also: ‘AAI OIDC Profile’. Specifically, the access token (also a JWT) that the broker provides to the clearinghouse will include an entry for the caller’s passport.

 

We propose extending the Synapse OIDC support to not only “login” via a broker but to also capture the broker provided passport in the resulting Synapse access token.

 

Note: More than one passport broker might be needed to provide a full set of required passport visa claims. Therefore, it is important that newly provided visa claims accumulate with existing visa claims.
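
The accumulation requirement can be sketched as a small merge over lists of visa JWT strings. This is a sketch under assumptions, not the proposed implementation: `ga4gh_passport_v1` is the passport claim name used by the GA4GH Passport specification, but reusing that same claim name inside the Synapse access token, and both function names, are assumptions made for illustration.

```python
from typing import Iterable, List

# Claim name from the GA4GH Passport specification. Reusing it inside the
# Synapse access token is an assumption of this sketch.
PASSPORT_CLAIM = "ga4gh_passport_v1"


def accumulate_visas(existing: Iterable[str], incoming: Iterable[str]) -> List[str]:
    """Merge two lists of visa JWT strings, keeping order and dropping exact duplicates."""
    merged: List[str] = []
    seen = set()
    for visa_jwt in list(existing) + list(incoming):
        if visa_jwt not in seen:
            seen.add(visa_jwt)
            merged.append(visa_jwt)
    return merged


def add_passport_to_claims(token_claims: dict, broker_claims: dict) -> dict:
    """Return a copy of the token claims with the broker's visas appended."""
    updated = dict(token_claims)
    updated[PASSPORT_CLAIM] = accumulate_visas(
        token_claims.get(PASSPORT_CLAIM, []),
        broker_claims.get(PASSPORT_CLAIM, []),
    )
    return updated
```

Because the merge returns a new claims dictionary, visas obtained from an earlier broker survive a later redirect to a second broker.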

 

By appending claims to the resulting Synapse access token, we can ensure that the visas are available to both web and command line clients. In the next section we will cover how the Synapse access tokens with embedded visa claim JWTs can be used for download authorization.

 

Passport Visa Interceptor

Currently, the primary job of the Synapse AuthenticationFilter is to validate a user provided access token in order to identify the caller. The filter also passes along the access token as a header that can be accessed by downstream code such as the OAuthScopeInterceptor. We propose adding a new interceptor for processing visa claims found in the access token. The pre-processing would include the following:

  • Validate signature and expiration of each visa.

  • Validate the conditional relationship between visas. For example, a visa might include a condition such that it is only valid if another visa also exists. For such a case, the dependent visa would be invalid if its dependency were missing.
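
The two validation steps above can be sketched as a filter over already-decoded visas. This is a simplified illustration under stated assumptions: signature verification is omitted, each visa is a hypothetical flat dictionary with `type`, `exp`, and an optional `conditions` list (groups joined by OR, clauses within a group joined by AND), and a clause is matched on visa type only rather than on the full GA4GH match syntax.

```python
import time
from typing import List, Optional


def is_expired(visa: dict, now: float) -> bool:
    """A visa is expired once the current time reaches its 'exp' timestamp."""
    return visa["exp"] <= now


def conditions_met(visa: dict, others: List[dict]) -> bool:
    """True if the visa has no conditions, or at least one group is fully satisfied."""
    groups = visa.get("conditions", [])
    if not groups:
        return True

    def clause_ok(clause: dict) -> bool:
        # Simplification: match on visa type only.
        return any(other["type"] == clause["type"] for other in others)

    return any(all(clause_ok(c) for c in group) for group in groups)


def filter_valid_visas(visas: List[dict], now: Optional[float] = None) -> List[dict]:
    """Drop expired visas, then repeatedly drop visas whose conditions are unmet.

    The loop runs until nothing changes, because removing one visa can
    invalidate another visa that depended on it.
    """
    now = time.time() if now is None else now
    valid = [v for v in visas if not is_expired(v, now)]
    while True:
        kept = [
            v for v in valid
            if conditions_met(v, [o for o in valid if o is not v])
        ]
        if len(kept) == len(valid):
            return kept
        valid = kept
```

The fixed-point loop matters for chains of dependencies: if visa A depends on B and B has expired, A is removed on the next pass.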

After validation, the passport visa interceptor will bind all valid visas to the thread local. The thread local list would be of type:

PassportVisa.json

Note: Since the interceptor excludes invalid visas, PassportVisa.json does not include any field used for validation or signing, such as: conditions, asserted, alg, exp, jti, iat…

 

Currently, the service layer calls UserManagerImpl.getUserInfo() to get an in-memory representation of the user (UserInfo). This UserInfo object is then forwarded to all of the lower code layers. Therefore, we propose extending the UserManager to gather the visas from the thread local and add them to the resulting UserInfo object. This shields most of the code from the thread local data.
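
The hand-off from the interceptor to the UserInfo can be sketched with Python's `threading.local`. The function names and the UserInfo fields below are hypothetical stand-ins for the interceptor and UserManagerImpl.getUserInfo(); the point is only that the thread local is read in one place and everything downstream sees a plain object.

```python
import threading
from dataclasses import dataclass, field
from typing import List

# Per-thread storage, standing in for the thread local the interceptor binds to.
_request_context = threading.local()


def bind_visas(visas: List[dict]) -> None:
    """Called by the (hypothetical) interceptor after validation."""
    _request_context.visas = list(visas)


def clear_visas() -> None:
    """Called when the request completes so visas do not leak to the next request."""
    if hasattr(_request_context, "visas"):
        del _request_context.visas


@dataclass
class UserInfo:
    user_id: int
    visas: List[dict] = field(default_factory=list)


def get_user_info(user_id: int) -> UserInfo:
    """Build the UserInfo, copying any visas bound to the current thread."""
    visas = getattr(_request_context, "visas", [])
    return UserInfo(user_id=user_id, visas=list(visas))
```

A thread that never had visas bound simply produces a UserInfo with an empty visa list.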

In the next section we will cover how download authorization code can use the passport visas to make download decisions.

 

Download Authorization

The EntityAuthorizationManagerImpl is responsible for making all entity related authorization decisions, including file download. The following is the current download decision chain:

Currently, line:4 DENY_IF_HAS_UNMET_ACCESS_RESTRICTIONS is based on managed ARs where the principal must be approved by ACT. This typically involves checking if the principal has been granted approval for all ARs that have the given file as a subject.

We will need to extend the unmet AR check to look for the new passport AR type. The conditions of each passport AR must then be matched against the principal’s passport visas contained in the UserInfo object passed to the manager. The AR would be treated as ‘met’ if all visas match, and ‘unmet’ if one or more do not match. Note: The visa condition matching system should be the same as the system used to validate visas with conditions in the interceptor layer.

 

Asynchronous Jobs

A caller can start an asynchronous job to download files as a zip. For this case, one or more of the files to be downloaded might require one or more visas in order to be authorized for download. The machine that executes the job will not be the same as the machine that originated the request, so the thread local visa information will be unavailable on the worker’s thread.

 

Note: A user might use multiple access tokens to make API calls at the same time. For example, a user might use one token to make edits to a Synapse project in the web UI. At the same time, they might be running a headless workflow to update data in a different project. We cannot assume that both access tokens will have the same passport visas. Passport visas cannot be treated as global data automatically applied to a user.

 

In order to maintain the stateless nature of passport visas, we propose copying visas from the thread that starts an asynchronous job into the job’s status. Specifically, the AsynchJobStatusManagerImpl.startJob() method can copy visas from the provided UserInfo into the job’s status. We can then extend the AsyncJobRunnerAdapter to pull the visas from the job’s status and add them to the UserInfo used at the start of each asynchronous worker run. This would allow download authorization checks from within asynchronous workers to behave the same as synchronous calls.
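
The copy-in and copy-out steps can be sketched as follows. This is an illustrative Python sketch, not the Synapse code: `start_job` and `user_info_for_worker` are hypothetical stand-ins for AsynchJobStatusManagerImpl.startJob() and the AsyncJobRunnerAdapter, and the in-memory dictionary stands in for wherever the job status is actually persisted.

```python
import copy
import itertools
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class UserInfo:
    user_id: int
    visas: List[dict] = field(default_factory=list)


@dataclass
class JobStatus:
    job_id: int
    started_by_user_id: int
    visas: List[dict]


# Stand-in for persistent job-status storage shared by all machines.
_job_store: Dict[int, JobStatus] = {}
_job_ids = itertools.count(1)


def start_job(user: UserInfo) -> int:
    """Snapshot the caller's visas into the job status when the job starts."""
    job_id = next(_job_ids)
    _job_store[job_id] = JobStatus(
        job_id=job_id,
        started_by_user_id=user.user_id,
        visas=copy.deepcopy(user.visas),  # snapshot, not a shared reference
    )
    return job_id


def user_info_for_worker(job_id: int) -> UserInfo:
    """Rebuild the UserInfo on the worker from the job status, including visas."""
    status = _job_store[job_id]
    return UserInfo(
        user_id=status.started_by_user_id,
        visas=copy.deepcopy(status.visas),
    )
```

Because the visas are snapshotted per job, two jobs started with different access tokens by the same user keep their own visa sets.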