...
The flow starts when the user indicates to Synapse that they want to log in with an external IdP. Synapse redirects the browser (the “user agent”) to the IdP, which, after authenticating the user, returns an authorization code. This is forwarded to Synapse, which uses its so-called client credentials to exchange the authorization code for an access token, an id token and (optionally) a refresh token. The inclusion of the id token is the fundamental extension to OAuth 2.0 by OpenID Connect: in addition to authorizing access (via the access token), the IdP returns information it has about the user. The id token is a JSON Web Token (JWT), so it has a JSON payload, i.e. a key-value map. The keys are “claims” about the user, like “family_name” or “email”, and the values are the user data. If the IdP is a so-called “Broker” then it can return a GA4GH data passport. The claim name is “passport_jwt_v11” and the value is another, embedded JWT which has a claim “ga4gh_passport_v1”. The value of this claim is an array of GA4GH “visas”, described further below. An example from NIH RAS is here.
The OIDC specification provides for defining an expiration for user information. That is, the IdP can indicate that the recipient of an id token should only consider the user information valid for a limited time. Once the information has expired, a new id token should be obtained. An OIDC IdP has a “/userInfo” endpoint to which an access token can be passed. The result is a new collection of user info, returned either as a JWT or as a JSON object.
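The nesting described above (user claims containing a passport JWT, which in turn contains visa JWTs) can be unpacked with a few lines of Python. This is a minimal sketch, not Synapse code: the function names are hypothetical, the default claim name `passport_jwt_v11` is an assumption that should be checked against the Broker in use, and the signatures are deliberately NOT verified here, which a real implementation must do before trusting any claim.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Return the JSON payload of a compact JWT WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a compact JWT with three dot-separated segments")
    payload_b64 = parts[1]
    # JWT segments are base64url-encoded with the padding stripped; restore it.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def extract_visas(user_claims: dict, passport_claim: str = "passport_jwt_v11") -> list:
    """Decode every visa in the passport embedded in a set of user claims.

    `passport_claim` is an assumption; pass the claim name the Broker actually uses.
    """
    passport = decode_jwt_payload(user_claims[passport_claim])
    return [decode_jwt_payload(visa) for visa in passport.get("ga4gh_passport_v1", [])]
```

For an id token, the call would be `extract_visas(decode_jwt_payload(id_token))`. When the user info endpoint returns a plain JSON object, that object can be passed to `extract_visas` directly.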
...
The passport is an array of visas, each encoded as a signed JWT. There are multiple types of visas. The variety of types is open-ended but the GA4GH spec' defines some particular types here. It’s important to note that while some visas are assertions about the researcher (like AffiliationAndRole, ResearcherStatus), others refer to specific data sets, like ControlledAccessGrants. The reason to have these different types is illuminated by this overview article on GA4GH Data Passports, which differentiates between “registered access” and “controlled access” models. Regarding registered access:
Registered access models are a type of role-based access to datasets.
while for controlled data
In the DAC review phase, the DAC must verify the identity of the data user and determine if the proposed research is within the bounds of the permitted use(s) of the dataset. If approved, the data user and their institution must agree to the terms of use of the repository’s data through a data use or processing agreement. In the data use phase, the data user gains access to the dataset(s).
We may then conceive of different sorts of passport-linked access requirements in Synapse. One type would grant access to data if a researcher is indicated to have a certain status by a trusted Broker. Another type would grant access only if a visa provided by a trusted Broker indicates that the user has access to a certain (controlled) data set. We would expect that in the visa the data set would be referred to by its ID in the namespace known to the Broker, as opposed to its Synapse ID. Therefore the Synapse access requirement would have to include the former ID in order to be able to evaluate the user’s visas.
An example of a ResearcherStatus visa taken from here is:
```json
"ga4gh_visa_v1": {
  "type": "ResearcherStatus",
  "asserted": 1549680000,
  "value": "https://doi.org/10.1038/s41431-018-0219-y",
  "source": "https://grid.ac/institutes/grid.240952.8",
  "by": "so"
}
```
The Synapse access requirement would, at a minimum, be configured with the URI seen in the ‘value’ field. An example of a ControlledAccessGrants visa (from the same source) is:
```json
...
"ga4gh_visa_v1": {
  "type": "ControlledAccessGrants",
  "asserted": 1549632872,
  "value": "https://example-institute.org/datasets/710",
  "source": "https://grid.ac/institutes/grid.0000.0a",
  "by": "dac"
}
```
Again, the Synapse access requirement would, at a minimum, be configured with the data set ID seen in the ‘value’ field.
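To make the two kinds of access requirement concrete, the sketch below matches decoded visa payloads against a requirement configured with a visa type, a ‘value’ and a set of trusted Brokers. The class and function names are hypothetical and do not describe an existing Synapse type. It assumes each visa payload carries the standard JWT `iss` claim identifying its issuer and that the visa signatures have already been verified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PassportAccessRequirement:
    """Hypothetical shape of a passport-linked access requirement."""
    visa_type: str               # e.g. "ResearcherStatus" or "ControlledAccessGrants"
    value: str                   # URI or data set ID in the Broker's namespace
    trusted_issuers: frozenset   # "iss" values of the Brokers this requirement trusts


def is_satisfied(requirement: PassportAccessRequirement, visa_payloads: list) -> bool:
    """Return True if any decoded visa payload matches the requirement."""
    for payload in visa_payloads:
        visa = payload.get("ga4gh_visa_v1", {})
        if (
            payload.get("iss") in requirement.trusted_issuers
            and visa.get("type") == requirement.visa_type
            and visa.get("value") == requirement.value
        ):
            return True
    return False
```

With the ControlledAccessGrants example above, a requirement would be built with `visa_type="ControlledAccessGrants"` and `value="https://example-institute.org/datasets/710"`. The same data set's Synapse ID would live on the access requirement's subject, not in the visa.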
Implementation Considerations
In many places, Synapse needs to rapidly answer the question of which of one or more entities a user is authorized to download. The determination reflects access requirements placed on the entities and uses corresponding access approvals, stored in the Synapse database, to answer the question. Moreover, there are use cases in which a headless user agent (e.g., a batch data processing job) seeks to download data on behalf of a user. In such cases the user agent can’t be redirected to a data passport broker to retrieve user info via the OAuth flow. We should therefore adopt a model in which, when a user first authenticates to a broker, their user info, access token and refresh token are captured in Synapse so that Synapse can maintain up-to-date visa information, which can be used to answer authorization questions without a user’s involvement.
Passport Expiration
OIDC provides for an 'expires_in' field in the token response. This is the time, in seconds, until the provided access token expires. The client can use this to decide when to use the refresh token to get a new access token. Note that doing so may also update the refresh token.
ID Tokens, being JWTs, have an 'exp' time stamp, which is the epoch time after which the user information should no longer be considered valid. A passport should not be respected beyond this time limit.
Visas have an "asserted" field which is the timestamp (in epoch seconds) when an authority asserted what the visa claims.
The GA4GH spec' suggests that clients use this to decide whether to respect a visa. If this timestamp is not used, then the minimal check is that of the JWT "exp" timestamp.
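The two checks on a visa can be sketched as a single function: the "exp" check is always applied, and the "asserted" check is applied only when a maximum age is configured. The function name and the `max_assertion_age_s` parameter are hypothetical; the acceptable age of an assertion would be a policy decision for each access requirement.

```python
import time


def visa_is_current(payload: dict, max_assertion_age_s=None, now=None) -> bool:
    """Decide whether a decoded visa payload should still be respected.

    payload: decoded visa JWT payload with "exp" and a "ga4gh_visa_v1" object.
    max_assertion_age_s: optional policy limit (hypothetical) on the age of "asserted".
    now: epoch seconds; defaults to the current time.
    """
    now = time.time() if now is None else now

    # Minimal check: the JWT itself must not have expired.
    exp = payload.get("exp")
    if exp is None or now >= exp:
        return False

    # Optional check: the authority's assertion must be recent enough.
    if max_assertion_age_s is not None:
        asserted = payload.get("ga4gh_visa_v1", {}).get("asserted")
        if asserted is None or now - asserted > max_assertion_age_s:
            return False

    return True
```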
Synapse could periodically examine the 'expires_in' and 'exp' time stamps for the current access and id tokens it holds. If an access token is close to expiration, it could use the refresh token to update the access and id tokens. If the access token is not close to expiration but the id token's 'exp' is, then it could update just the id token. When an id token is updated, the corresponding user's access approvals would be updated (created or deleted) accordingly.
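The periodic check might look like the following sketch. All names are hypothetical. It assumes Synapse records an absolute expiry for each access token (time of receipt plus 'expires_in'), treats "close to expiration" as a configurable margin, and identifies access requirements by opaque IDs.

```python
from enum import Enum


class RefreshAction(Enum):
    NONE = "none"
    REFRESH_TOKENS = "refresh_tokens"      # access token nearly expired: renew access and id tokens
    REFRESH_ID_TOKEN = "refresh_id_token"  # access token still good: renew user info only


def choose_refresh_action(now, access_token_expires_at, id_token_exp, margin_s=300):
    """Pick what the periodic worker should do for one user.

    access_token_expires_at: epoch seconds, computed as receipt time + 'expires_in'.
    id_token_exp: the 'exp' claim of the id token, in epoch seconds.
    margin_s: how close to expiry counts as "close" (an assumed default).
    """
    if access_token_expires_at - now <= margin_s:
        return RefreshAction.REFRESH_TOKENS
    if id_token_exp - now <= margin_s:
        return RefreshAction.REFRESH_ID_TOKEN
    return RefreshAction.NONE


def reconcile_approvals(current: set, satisfied: set):
    """Return (to_create, to_delete) approval sets after an id token update.

    current: requirement IDs the user holds passport-derived approvals for.
    satisfied: requirement IDs the user's new visas satisfy.
    """
    return satisfied - current, current - satisfied
```

Keeping the decision and the reconciliation as pure functions leaves the token exchange and the database writes to the caller, so the logic can be tested without a Broker.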
Client Provided Passports
It seems some in the GA4GH community view passports as being provided to the Clearinghouse by the client, rather than the Clearinghouse retrieving them from Brokers as proposed above. See:
In the diagram shown in “Approach 2”, the client (the blue column in the sequence diagram) will receive a userInfo object from the RAS server with a subject ID “paired” to that client by RAS. The Auth server, upon receiving a passport containing that subject ID, will not be able to resolve it against the subject IDs it received from RAS. It will not “know” which of its own users the passport represents. The question then is what is required by security compliance standards (HIPAA, FISMA) and Sage Governance with respect to tracking the identity of users who download controlled data.
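The mismatch can be illustrated with one of the pairwise subject identifier methods given as an example in the OIDC Core specification, in which the subject is a SHA-256 hash of a per-client sector identifier, the local account ID and a secret salt. This is only an illustration: whether RAS derives its paired subject IDs this way is not known here, and the function name, sector identifiers and salt below are invented.

```python
import hashlib


def pairwise_subject(sector_identifier: str, local_account_id: str, salt: bytes) -> str:
    """sub = SHA-256(sector_identifier || local_account_id || salt), hex-encoded."""
    digest = hashlib.sha256(
        sector_identifier.encode("utf-8") + local_account_id.encode("utf-8") + salt
    )
    return digest.hexdigest()


# Hypothetical values: the same user as seen by two different OIDC clients.
SALT = b"secret-known-only-to-the-idp"
sub_seen_by_client = pairwise_subject("client.example.org", "user-42", SALT)
sub_seen_by_auth_server = pairwise_subject("auth-server.example.org", "user-42", SALT)
```

The two values differ even though they refer to the same account, and without the salt neither party can map one onto the other.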