This document has been superseded by: API Extensions for GA4GH Passport Integration

Introduction

A data contributor works with the Sage Access and Compliance Team (ACT) to establish that a new data set added to Synapse can only be downloaded by NIH qualified researchers. This means that when a caller attempts to download this dataset, Synapse must first check with the NIH to determine if the caller is actually an NIH qualified researcher. GA4GH provides a technical specification to facilitate this type of authentication/authorization exchange between two systems: GA4GH Passports.

According to the GA4GH specification, Synapse would be a Passport Clearinghouse, while the NIH service would be the passport Broker. This document covers the API changes needed to support this type of use case. Here is a high-level summary of what we propose to build into Synapse:

  • A new AccessRequirement (AR) type that can be created/managed by ACT to define one or more Claims that the caller must have in order to download restricted data.

  • A new Action type that informs callers when a passport visa is required in order to download a file.

  • Extend the Synapse OIDC Authentication system:

    • Add new OAuthProviderBinding implementation to connect with each passport Broker that we wish to support.

    • Extend the Synapse generated access_tokens system to append passport claims provided by passport Brokers to the Synapse access_token.

  • Add a passport visa interceptor that will validate passport visas from the Synapse access token and forward the valid subset to the thread local.

  • Extend UserManagerImpl.getUserInfo() to add visas from the thread local to the resulting UserInfo object.

  • Extend the EntityAuthorizationManagerImpl to match AR visa conditions to the principal’s visas in the UserInfo.

  • Extend AsynchJobStatusManagerImpl to append visas from the UserInfo to the job’s status.

PassportACTManagedAccessRequirement

Currently, an ACT managed access requirement (AR) is created by a member of ACT to restrict download access to one or more files within Synapse. When a user wishes to download a file that is the subject of a managed AR, they will typically need to first submit a data access request to ACT. The user will only be able to download the file after ACT has approved their submission. The approval process often involves providing information that demonstrates their qualification as a researcher.

The GA4GH passport specification was designed for the case where the system that holds data and the system that approves data access are not the same. In the introduction we presented an example where Synapse controls data that can only be accessed by NIH qualified researchers. For this example, Synapse must defer to an NIH system to determine if a user is an NIH qualified researcher. In GA4GH passport specification terms, Synapse would be the passport clearinghouse, while the NIH system would be the passport broker. The broker provides authentication information about the user in the form of one or more passport visas, and the clearinghouse uses the passport visas to make authorization decisions.

...

GA4GH supports multiple types of visa claims, each with varying degrees of complexity. In addition, each claim contains temporal data used for validation. Some visas have conditions such that they are only valid if one or more other visas are also present. Deciding if a visa matches the access requirement conditions will often require more than a simple “equals” check.

...

Code Block
languagejson
{
	"description": "This is an ACT managed access requirement used to require that a user has obtained one or more GA4GH Passport Visa Claims in order to access the associated subjects.",
	"extends": {
		"$ref": "org.sagebionetworks.repo.model.ManagedACTAccessRequirement"
	},
	"properties": {
		"visaConditions": {
			"description": "The conditions define how this access requirement matches to each required GA4GH passport visa.  Each condition group can contain one or more VisaConditions. Conditions within each group are delimited with an 'AND' while groups are delimited with an 'OR'.",
			"type": "array",
			"items": {
				"$ref": "org.sagebionetworks.repo.model.ar.ConditionGroup"
			}
		}
	}
}

...

Code Block
languagejson
{
	"description": "A group of one or more VisaConditions.",
	"properties": {
		"andConditions": {
			"description": "A group of one or more visa conditions.  Each condition within the group is delimited with an 'AND'.",
			"type": "array",
			"items": {
				"$ref": "org.sagebionetworks.repo.model.ar.VisaCondition"
			}
		}
	}
}
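
The OR-of-ANDs semantics described by visaConditions and andConditions can be illustrated with a short sketch. This is a minimal illustration in Python, not the Synapse implementation: the dictionaries mirror the schema field names above, while condition_matches is a hypothetical placeholder that compares only the visa type (real matching would also apply the value, source, and by rules).

```python
def condition_matches(condition, visa):
    """Hypothetical, simplified matcher: compares only the visa type."""
    return condition["type"] == visa["type"]


def group_is_met(group, visas):
    """A group is met when every condition in it matches at least one visa (AND)."""
    return all(
        any(condition_matches(condition, visa) for visa in visas)
        for condition in group["andConditions"]
    )


def requirement_is_met(requirement, visas):
    """The requirement is met when any one of its groups is met (OR)."""
    return any(group_is_met(group, visas) for group in requirement["visaConditions"])


# Example: (ResearcherStatus AND AcceptedTermsAndPolicies) OR ControlledAccessGrants
requirement = {
    "visaConditions": [
        {
            "andConditions": [
                {"type": "ResearcherStatus"},
                {"type": "AcceptedTermsAndPolicies"},
            ]
        },
        {"andConditions": [{"type": "ControlledAccessGrants"}]},
    ]
}
```

With this example, a caller holding only a ControlledAccessGrants visa would meet the requirement, while a caller holding only a ResearcherStatus visa would not.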

VisaType.json

Code Block
languagejson
{
	"description": "Required.  The visa type to be matched.  Note: Custom types are not supported.",
	"type": "string",
	"enum": [
		{
			"name": "AffiliationAndRole"
		},
		{
			"name": "AcceptedTermsAndPolicies"
		},
		{
			"name": "ResearcherStatus"
		},
		{
			"name": "ControlledAccessGrants"
		},
		{
			"name": "LinkedIdentities"
		}
	]
}

...

VisaCondition.json

Code Block
languagejson
{
	"description": "Defines a match to a single GA4GH passport visa.  See: <a href=\"https://github.com/ga4gh-duri/ga4gh-duri.github.io/blob/master/researcher_ids/ga4gh_passport_v1.md#conditions\">GA4GH passport conditions</a>",
	"properties": {
		"type": {
			"description": "Required.  The visa type to be matched.  Note: Custom types are not supported.",
			"$ref": "org.sagebionetworks.repo.model.ar.VisaType"
		},
		"value": {
			"description": "Optional.  When provided defines an expected 'value' claim.",
			"$ref": "org.sagebionetworks.repo.model.ar.MatchTypeValue"
		},
		"source": {
			"description": "Optional.  When provided defines an expected 'source' claim.",
			"$ref": "org.sagebionetworks.repo.model.ar.MatchTypeValue"
		},
		"by": {
			"description": "Optional.  When provided defines an expected 'by' claim.",
			"$ref": "org.sagebionetworks.repo.model.ar.MatchTypeValue"
		},
		"brokerRedirectUrl": {
			"description": "The redirect URL of the passport broker that will provide this GA4GH passport visa for an authenticated caller.",
			"type": "string"
		},
		"visaName": {
			"description": "The name of the visa to request from the passport broker.",
			"type": "string"
		}
	}
}

...

MatchTypeValue.json

Code Block
languagejson
{
	"description": "Defines both the operation and value for matching a single visa claim.",
	"properties": {
		"type": {
			"description": "Required.  The type defines how value should be matched to a claim.",
			"name": "MatchType",
			"type": "string",
			"enum": [
				{
					"name": "const",
					"description": "A case sensitive full string match."
				},
				{
					"name": "pattern",
					"description": "Supports special meaning characters for matching values.  Use '?' to match any single character, and '*' to match multiple characters"
				},
				{
					"name": "split_pattern",
					"description": "A pattern match on part of a ';' delimited value."
				}
			]
		},
		"value": {
			"description": "The value depends on match type.  For 'const' a match requires a case sensitive full string match of this value.  For 'patterns', use a '?' to match any single character, and '*' to match multiple characters including the empty string and null string.",
			"type": "string"
		}
	}
}
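
The three match types can be sketched as a single function. This is a hedged illustration in Python rather than the proposed implementation: the function names are hypothetical, and two behaviors are assumptions drawn from the descriptions above, namely that a missing (null) claim is treated like the empty string for pattern matching, and that split_pattern requires the pattern to fully match at least one ';' delimited segment.

```python
import re


def _wildcard_to_regex(pattern):
    """Translate '?' (any single character) and '*' (any run of characters) to a regex."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            # Every other character is matched literally.
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def claim_matches(match_type, expected, claim):
    """Return True when a visa claim satisfies a MatchTypeValue."""
    if match_type == "const":
        # Case-sensitive full string match.
        return claim == expected
    # Assumption: a missing (null) claim is treated like the empty string.
    text = claim if claim is not None else ""
    regex = _wildcard_to_regex(expected)
    if match_type == "pattern":
        return regex.fullmatch(text) is not None
    if match_type == "split_pattern":
        # Assumption: the pattern must fully match one ';' delimited segment.
        return any(regex.fullmatch(segment) for segment in text.split(";"))
    raise ValueError(f"Unsupported match type: {match_type}")
```

For example, claim_matches("split_pattern", "student@*", "faculty@a.edu;student@b.edu") matches on the second segment, while the same pattern does not match a value that contains only the faculty segment.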

...

Clients use the ‘GET /entity/{id}/actions/download’ service to help guide callers with “unmet” access requirements. This service provides a list of “Actions” that the caller will need to take in order to meet all of the ARs associated with a file.

With the new passport AR, a caller will need to acquire one or more GA4GH passport visas from one or more passport brokers before they will be permitted to download any file that is the subject of the passport AR.

Note: The caller might have many visas available to them from a passport broker. A broker might ask the caller which visa should be sent to Synapse. For such a case, we need to provide the user with the names of the visas to request from the broker.

...

Code Block
languagejson
{
	"description": "In order to download a file the user will need to provide one or more GA4GH passport visa claims.  Such claims will be provided by the linked GA4GH passport broker.",
	"implements": [
		{
			"$ref": "org.sagebionetworks.repo.model.download.Action"
		}
	],
	"properties": {
		"brokerRedirectUrl": {
			"description": "The redirect URL of the passport broker that provides the passport visa claims needed to access data.",
			"type": "string"
		},
		"visaNames": {
			"description": "The names of the visas to be provided.",
			"type": "array",
			"items": {
				"type": "string"
			}
		}
	}
}
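
One way the unmet VisaConditions could be turned into actions of this shape is to group the visa names by broker, so that the caller is sent to each broker once. The sketch below is a Python illustration under that assumption; build_passport_actions is a hypothetical helper and the example.org URLs are placeholders.

```python
def build_passport_actions(unmet_conditions):
    """Group the visa names of unmet VisaConditions by broker redirect URL."""
    names_by_broker = {}
    for condition in unmet_conditions:
        names = names_by_broker.setdefault(condition["brokerRedirectUrl"], [])
        # Avoid listing the same visa name twice for one broker.
        if condition["visaName"] not in names:
            names.append(condition["visaName"])
    return [
        {"brokerRedirectUrl": url, "visaNames": names}
        for url, names in names_by_broker.items()
    ]


# Placeholder data: two visas from one broker, one visa from another.
unmet = [
    {"brokerRedirectUrl": "https://broker-a.example.org", "visaName": "visa-one"},
    {"brokerRedirectUrl": "https://broker-b.example.org", "visaName": "visa-two"},
    {"brokerRedirectUrl": "https://broker-a.example.org", "visaName": "visa-three"},
]
actions = build_passport_actions(unmet)
```

Here actions would contain two entries, one per broker, with the first listing both visa-one and visa-three.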

...

Synapse already uses OpenID Connect (OIDC) to support login via “Google” and to link an ORCID to a Synapse account. For the login case, information from Google is used to link the caller to a Synapse user ID. The final product of the OIDC process is a new Synapse access token that encodes both the user’s ID and the scope of the token. The Synapse access token is a signed JSON Web Token (JWT). The Synapse access token can be used by both web and programmatic clients to authenticate for all Synapse API requests.

The GA4GH ‘Data Passports’ specification extends the basic OIDC process to enable a passport broker to provide a passport clearinghouse with a passport containing one or more visa claims. See also: ‘AAI OIDC Profile’. Specifically, the access token (also a JWT) provided by the broker to the clearinghouse will include an entry for the caller’s passport.

...

By appending claims to the resulting Synapse access token, we can ensure that the visas are available to both web and command line clients. In the next section we will cover how the Synapse access tokens with embedded visa claim JWTs can be used for download authorization.

...

Currently, the primary job of the Synapse AuthenticationFilter is to validate a user provided access token in order to identify the caller. The filter also passes along the access token as a header that can be accessed by downstream code such as the OAuthScopeInterceptor. We propose adding a new interceptor for processing visa claims found in the access token. The pre-processing would include the following:

  • Validate signature and expiration of each visa.

  • Validate the conditional relationship between visas. For example, a visa might include a condition such that it is only valid if another visa also exists. For such a case, the dependent visa would be invalid if its dependency were missing.

After validation, the passport visa interceptor will bind all valid visas to the thread local. The thread local list would be of type:

...

Note: Since the interceptor excludes invalid visas, the PassportVisa.json does not include any field used for validation or signing, such as: conditions, asserted, alg, exp, jti, iat…
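
The validation steps performed by the interceptor can be sketched as a filter over decoded visas. The Python below is only an illustration under simplifying assumptions: signature verification is delegated to a caller-supplied verify_signature function, and the conditional relationship between visas is reduced to a hypothetical requires list of visa types, which is much simpler than the conditions claim in the GA4GH specification.

```python
import time


def filter_valid_visas(visas, verify_signature, now=None):
    """Return the visas that pass signature, expiration, and dependency checks."""
    now = time.time() if now is None else now
    # Step 1: drop visas with a bad signature or an expiration in the past.
    valid = [v for v in visas if verify_signature(v) and v["exp"] > now]
    # Step 2: drop visas whose required visa types are missing. Repeat until
    # stable, because removing one visa can invalidate another that depends on it.
    while True:
        present = {v["type"] for v in valid}
        kept = [
            v for v in valid
            if all(required in present for required in v.get("requires", []))
        ]
        if len(kept) == len(valid):
            return kept
        valid = kept
```

The loop matters when a dependency is itself invalid: if a visa depends on another visa that has expired, both are excluded from the list bound to the thread local.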

Currently, the service layer calls UserManagerImpl.getUserInfo() to get an in-memory representation of the user (UserInfo). This UserInfo object is then forwarded to all of the lower code layers. Therefore, we propose extending the UserManager to gather the visas from the thread local and add them to the resulting UserInfo object. This isolates most of the code from the thread local data.
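
The hand-off from the thread local to the UserInfo can be sketched in a few lines. This Python illustration uses hypothetical names (bind_visas, clear_visas, get_user_info) and a stripped-down UserInfo; it is meant only to show that lower layers read visas from the UserInfo and never touch the thread local directly.

```python
import threading

# Per-thread storage written by the (hypothetical) passport visa interceptor.
_request_context = threading.local()


def bind_visas(visas):
    """Called by the interceptor after validation."""
    _request_context.visas = list(visas)


def clear_visas():
    """Called at the end of the request so visas never leak between requests."""
    _request_context.visas = []


class UserInfo:
    """Simplified stand-in for the in-memory user representation."""

    def __init__(self, user_id, visas=None):
        self.user_id = user_id
        self.visas = list(visas or [])


def get_user_info(user_id):
    """Build a UserInfo that carries a copy of the visas bound to this thread."""
    visas = getattr(_request_context, "visas", [])
    return UserInfo(user_id, visas)
```

Because the storage is per thread, a UserInfo built on a different thread would carry no visas unless they are bound there explicitly.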

In the next section we will cover how download authorization code can use the passport visas to make download decisions.

Download Authorization

The EntityAuthorizationManagerImpl is responsible for making all entity related authorization decisions, including file download. The following is the current download decision chain:

Code Block
			DENY_IF_DOES_NOT_EXIST,
			DENY_IF_IN_TRASH,
			GRANT_IF_ADMIN,
			DENY_IF_HAS_UNMET_ACCESS_RESTRICTIONS,
			DENY_IF_TWO_FA_REQUIREMENT_NOT_MET,
			GRANT_IF_OPEN_DATA_WITH_READ,
			DENY_IF_ANONYMOUS,
			DENY_IF_HAS_NOT_ACCEPTED_TERMS_OF_USE,
			GRANT_IF_HAS_DOWNLOAD,
			DENY

Currently, the fourth step, DENY_IF_HAS_UNMET_ACCESS_RESTRICTIONS, is based on managed ARs where the principal must be approved by ACT. This typically involves checking if the principal has been granted approval for all ARs that have the given file as a subject.

We will need to extend the unmet AR check to look for the new passport AR type. The conditions of each passport AR must then be matched against the principal’s passport visas contained in the UserInfo object passed to the manager. The AR would be treated as ‘met’ if all visas match, and ‘unmet’ if one or more do not match. Note: The visa condition matching system should be the same as the system used to validate visas with conditions in the interceptor layer.
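
To show where the passport check would sit, the sketch below models a shortened decision chain in Python. It assumes that the chain is evaluated in order and that the first step to reach a decision wins, which is implied by the step names but not stated in this document. All function names and context fields are hypothetical, and the passport check is reduced to comparing required visa types against the visas held in the UserInfo.

```python
GRANT, DENY = "GRANT", "DENY"


def deny_if_in_trash(ctx):
    return DENY if ctx["in_trash"] else None


def grant_if_admin(ctx):
    return GRANT if ctx["is_admin"] else None


def deny_if_has_unmet_access_restrictions(ctx):
    # Hypothetical passport check: every required visa type must be present
    # among the visas carried by the caller's UserInfo.
    held = {visa["type"] for visa in ctx["user_visas"]}
    unmet = [t for t in ctx["required_visa_types"] if t not in held]
    return DENY if unmet else None


def grant_if_has_download(ctx):
    return GRANT if ctx["has_download_permission"] else None


# A shortened version of the chain; the final DENY is the default.
DOWNLOAD_CHAIN = [
    deny_if_in_trash,
    grant_if_admin,
    deny_if_has_unmet_access_restrictions,
    grant_if_has_download,
]


def decide(ctx, chain=DOWNLOAD_CHAIN):
    """Return the first decision made by a step, or DENY when no step decides."""
    for step in chain:
        decision = step(ctx)
        if decision is not None:
            return decision
    return DENY
```

In this sketch a caller with download permission is still denied when a required visa is missing, and an administrator is granted access before the AR check is reached, matching the order of the chain above.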

Asynchronous Jobs

A caller can start an asynchronous job to download files as a zip. For this case, one or more of the files to be downloaded might require one or more visas in order to be authorized for download. In such a case, the machine that executes the job will not be the same as the machine that originated the request, so the thread local visa information will be unavailable to the worker’s thread.

Note: A user might use multiple access tokens to make API calls at the same time. For example, a user might use one token to make edits to a Synapse project in the web UI. At the same time, they might be running a headless workflow to update data in a different project. We cannot assume that both access tokens will have the same passport visas. Passport visas cannot be treated as global data automatically applied to a user.

In order to maintain the stateless nature of passport visas, we propose copying visas from the thread that starts an asynchronous job into the job’s status. Specifically, the AsynchJobStatusManagerImpl.startJob() method can copy visas from the provided UserInfo into the job’s status. We can then extend the AsyncJobRunnerAdapter to pull the visas from the job’s state, and add them to the UserInfo used at the start of each asynchronous worker run. This would allow download authorization checks from within asynchronous workers to behave the same as synchronous calls.
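
The copy into the job status and the restore on the worker can be sketched as follows. This is a Python illustration with hypothetical names: a dictionary stands in for the persisted job status, and the status field names (including visas) are assumptions, not the actual schema.

```python
import copy


class UserInfo:
    """Simplified stand-in for the in-memory user representation."""

    def __init__(self, user_id, visas=None):
        self.user_id = user_id
        self.visas = list(visas or [])


def start_job(job_store, job_id, user, request_body):
    """Persist the job status, including a copy of the caller's visas."""
    job_store[job_id] = {
        "startedByUserId": user.user_id,
        "requestBody": request_body,
        "visas": copy.deepcopy(user.visas),  # hypothetical status field
        "state": "PROCESSING",
    }


def load_user_for_worker(job_store, job_id):
    """Rebuild the UserInfo on the worker from the stored job status."""
    status = job_store[job_id]
    return UserInfo(status["startedByUserId"], copy.deepcopy(status["visas"]))
```

Because the visas travel with the job rather than with the user, two jobs started by the same user with different access tokens would each be authorized against their own set of visas.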