JSON Schema-based Access Requirements
Corresponding ticket: PLFM-9449
This document describes extending the Synapse API so Access Requirements can describe additional, flexible information that must be collected from end users in data access requests. The solution provides services that facilitate managing the AR questionnaires as well as simplifying the UI presented to data requesters.
The proposed solution adds new services for the Synapse Access and Compliance Team (ACT) to create and manage 'form field' elements. A new Access Requirement type contains form field elements that can be composed to define the responses required when a user requests access to data. New services use these objects to generate a streamlined form that can be presented to a user when they initiate a data access request.
Background
Sensitive data in Synapse can be protected using one or more Access Requirements (ARs). When an AR is applied to a resource (e.g. FileEntity) in Synapse, end users must meet the terms of the AR to access the resource (e.g., download the file). Many ARs are "Managed" ARs, where information is collected from a requester in a "Data Access Request" (DAR). The DAR is then reviewed by a Data Access Committee (DAC). Using the information provided by the requester, the committee may approve or reject the DAR. Requester(s) meet the terms of the AR when their corresponding DAR is approved.
Access Requirements in Synapse are currently defined using a 'static' interface, which makes it difficult to gather information from data requesters that is required to support projects with diverse data governance and adjudication needs. To support these use cases, we propose a design that extends the existing Access Requirement/Data Access Request flow to use JSON Schema to describe additional information to gather in a Data Access Request.
For more information and use cases, see PSI-1 and TECH-184.
Goals
Extend access requirement and data access submission services to collect custom data within the existing data access request flow
Streamlined form experience when submitting a request against multiple access requirements
Reuse of information from prior submissions
Synapse Governance Team can describe additional data to collect in an access request using a JSON Schema
Custom information is gathered via a form presented to the requester
Synapse Governance Team can provide an optional 'UI Schema' to customize form appearance
Submission reviewers can review custom form data
Add a path to using schemas for existing Access Requirements/approvals (OK to have a 1-time migration step from old AR to new AR)
Non-Goals
See PSI-1.
Supporting all types of form data (e.g. nested objects, file upload)
Conditional logic (see Appendix)
Enabling interoperability with other platforms (see Appendix)
Proposal Summary
We propose the following changes:
New services that can be used by ACT to create and maintain customizable, shared form fields.
A new access requirement type is defined that contains a set of form fields. Form fields can be shared by multiple access requirements.
A new service for users to generate a schema describing the fields captured by multiple access requirements. If any fields are shared between access requirements, they will be deduplicated in the schema. The schema can be used by the UI to display a form.
A new service for users to submit form data and create multiple data access requests (Submissions)
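The deduplication described above can be sketched as follows. This is a minimal illustration, not the proposed implementation: the function name `generate_schema`, the `(fieldId, fieldVersionNumber)` tuple representation, and the choice to key each schema property as `field_<id>` are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

FieldRef = Tuple[str, int]  # (fieldId, fieldVersionNumber)


def generate_schema(
    ar_fields: Dict[str, List[FieldRef]],
    field_schemas: Dict[FieldRef, dict],
) -> dict:
    """Merge the fields of several access requirements into one JSON Schema.

    A field version referenced by more than one access requirement
    appears only once in the result.
    """
    properties: Dict[str, dict] = {}
    seen = set()
    for refs in ar_fields.values():
        for ref in refs:
            if ref in seen:
                continue  # shared field: already part of the merged schema
            seen.add(ref)
            field_id, _version = ref
            # Assumption: each property is keyed by its field ID.
            properties[f"field_{field_id}"] = field_schemas[ref]
    return {"type": "object", "properties": properties}
```

With two access requirements that both reference field `1`, the merged schema contains that field once, so the requester answers the shared question a single time.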
API Design
We propose adding the following new services and new/changed objects.
Services
| Endpoint | Request Body | Response | Notes | Authorization Required |
|---|---|---|---|---|
| POST /accessRequirement/field | FormField | FormField | Used to create a form field. | ACT only |
| GET /accessRequirement/field/{id} | None | FormField | Used to retrieve a form field by its ID. | None |
| GET /accessRequirement/field/{id}/version/{versionNumber} | None | FormField | Used to retrieve a specific version of a form field. | None |
| POST /accessRequirement/field/{id}/update/async/start | UpdateFormFieldRequest | AsyncJobId | Used to update a form field by its ID. The job result (UpdateFormFieldResponse) includes the updated object and a list of updated AR IDs. Form fields are versioned, and field versions are immutable. Any change to a field will increment the version number. When a field is updated, all associated access requirements are also updated (their version is incremented) to use the new field version. | ACT only |
| GET /accessRequirement/field/{id}/update/async/get/{asyncToken} | None | UpdateFormFieldResponse | | ACT only |
| POST /accessRequirement/field/search | FormFieldSearchRequest | FormFieldSearchResponse | Search all registered form fields in the system. Only the latest versions of fields are returned. | None |
| POST /dataAccessSubmission/schema/generate/async/start | GenerateDataAccessSchemaRequestInterface | AsyncJobId | Given a set of Access Requirement IDs and versions, generates a JSON Schema and a form UI Schema, and optionally the current submission data. | None |
| GET /dataAccessSubmission/schema/generate/async/get/{asyncToken} | None | GenerateDataAccessSchemaResponse | | None |
| POST /dataAccessSubmission/schema/submit/async/start | SubmitSchemaDataRequest | AsyncJobId | Used to create multiple submissions from user-provided data that was filled in with the help of a schema. If the data is valid against the schema, the response includes the set of created submissions. If the data is invalid against the schema, no submissions are created, and the validation errors are returned. | Authenticated only |
| GET /dataAccessSubmission/schema/submit/async/get/{asyncToken} | None | SubmitSchemaDataResponse | | Authenticated only |
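The paired `async/start` and `async/get/{asyncToken}` endpoints in the table could be driven by a client along these lines. This is a sketch under stated assumptions: the `send` transport callable is hypothetical, the `AsyncJobId` body is assumed to carry a `token` property, and the `get` call is assumed to answer HTTP 202 while the job is still running and HTTP 200 with the response body once it finishes.

```python
import time
from typing import Callable, Tuple

# Hypothetical transport: (method, path, body) -> (HTTP status, parsed JSON body)
Transport = Callable[[str, str, dict], Tuple[int, dict]]


def run_async_job(
    send: Transport,
    base_path: str,
    request: dict,
    poll_seconds: float = 1.0,
    max_polls: int = 60,
) -> dict:
    """Start an asynchronous job and poll until its result is available."""
    _, job = send("POST", f"{base_path}/async/start", request)
    token = job["token"]  # assumption: AsyncJobId exposes a 'token' property
    for _ in range(max_polls):
        status, body = send("GET", f"{base_path}/async/get/{token}", {})
        if status == 200:
            return body  # job finished: body is the response object
        if status != 202:  # assumption: 202 means "still processing"
            raise RuntimeError(f"Job {token} failed with HTTP {status}: {body}")
        time.sleep(poll_seconds)
    raise TimeoutError(f"Job {token} did not finish after {max_polls} polls")
```

For example, `run_async_job(send, "/dataAccessSubmission/schema/submit", request_body)` would return the `SubmitSchemaDataResponse` once the job completes.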
Objects
org.sagebionetworks.repo.model.dataaccess.schema.FormField
{
"title": "Data Access Request Form Field",
"description": "Defines a field that is used to define the set of questions a user must respond to when providing a data access request. DAR fields cannot be deleted. The schema 'type' of a DAR field cannot be changed.",
"implements": [
{
"$ref": "org.sagebionetworks.repo.model.Versionable"
}
],
"properties": {
"id": {
"type": "string",
"description": "The unique identifier of this field."
},
"name": {
"type": "string",
"description": "The internal name of this field. This name should be used by Access Requirement editors to identify and reuse the field. This field will not be exposed in the schema or form (define \"title\" in the schema instead)."
},
"etag": {
"type": "string",
"description": "Synapse employs an Optimistic Concurrency Control (OCC) scheme to handle concurrent updates. Since the E-Tag changes every time a resource is updated it is used to detect when a client's current representation of a resource is out-of-date."
},
"schemaDefinition": {
"$ref":"org.sagebionetworks.repo.model.schema.JsonSchema",
"description": "A JSON Schema that defines this property."
},
"uiDefinition": {
"type": "object",
"description": "Defines the appearance of this field in the form UI. Defined by <a href=\"https://rjsf-team.github.io/react-jsonschema-form/docs/api-reference/uiSchema\">react-jsonschema-form</a>."
},
"preFillScope": {
"$ref": "org.sagebionetworks.repo.model.dataaccess.schema.PreFillScope",
"description": "Defines the logic for how prior submission data should be loaded when filling out the form. Default is `RENEWAL`"
},
"orderWeight": {
"type": "integer",
"description": "Defines the order of elements in the form. Fields with a lower order weight will appear earlier in the form. Fields with the same orderWeight will be ordered lexicographically by `id`, then `versionNumber`. Ensures that a schema generated from multiple access requirements has a consistent, predictable ordering."
},
"deprecated": {
"type": "boolean",
"description": "Marking a field as deprecated will hide it from search results by default. It will not affect Access Requirements that use this field. Default is `false`."
}
},
"required": ["name", "schemaDefinition", "orderWeight"]
}
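The ordering rule in the `orderWeight` description (weight first, then `id`, then `versionNumber`) can be illustrated with a short sort. The `OrderedField` type and `form_order` function are hypothetical names used only for this sketch.

```python
from typing import List, NamedTuple


class OrderedField(NamedTuple):
    id: str
    version_number: int
    order_weight: int


def form_order(fields: List[OrderedField]) -> List[OrderedField]:
    """Sort by orderWeight, breaking ties by id and then versionNumber."""
    return sorted(fields, key=lambda f: (f.order_weight, f.id, f.version_number))
```

Because `id` is a string, ties are broken lexicographically, so a field with ID `"10"` sorts before a field with ID `"9"` when both have the same weight.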
org.sagebionetworks.repo.model.dataaccess.schema.PreFillScope
{
"title": "Pre-fill Scope",
"description": "Defines the logic for how a form field should load data from a prior submission.",
"type": "string",
"enum": [
{
"name": "RENEWAL",
"description": "Loads the most recent response to this field if a prior submission from this user exists against a specified access requirement."
},
{
"name": "USER",
"description": "Loads the most recent response to this field across all submissions. Examples: Name, Institution"
},
{
"name": "NONE",
"description": "The most recent response is never loaded."
}
]
}
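One possible reading of the three scopes is sketched below. The `PriorSubmission` type and `prefill_value` function are hypothetical, `prior` is assumed to contain only the current user's submissions, and "most recent" is assumed to mean the largest submission timestamp.

```python
from typing import Any, List, NamedTuple, Optional, Set


class PriorSubmission(NamedTuple):
    access_requirement_id: str
    submitted_on: int  # epoch milliseconds (assumed timestamp representation)
    responses: dict    # field ID -> the user's response


def prefill_value(
    scope: str,
    field_id: str,
    requested_ar_ids: Set[str],
    prior: List[PriorSubmission],
) -> Optional[Any]:
    """Return the value used to pre-fill a field, or None if nothing applies."""
    if scope == "NONE":
        return None
    if scope not in ("RENEWAL", "USER"):
        raise ValueError(f"Unknown PreFillScope: {scope}")
    candidates = [s for s in prior if field_id in s.responses]
    if scope == "RENEWAL":
        # Only submissions against one of the requested access requirements count.
        candidates = [s for s in candidates if s.access_requirement_id in requested_ar_ids]
    if not candidates:
        return None
    return max(candidates, key=lambda s: s.submitted_on).responses[field_id]
```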
org.sagebionetworks.repo.model.dataaccess.schema.UpdateFormFieldRequest
{
"title": "Update Form Field Request",
"description": "Request body to update an access requirement form field.",
"implements": [
{
"$ref": "org.sagebionetworks.repo.model.asynch.AsynchronousRequestBody"
}
],
"properties": {
"field": {
"$ref": "org.sagebionetworks.repo.model.dataaccess.schema.FormField"
}
},
"required": ["field"]
}
This object could be extended to support new parameters as needed (e.g. skipUpdateAccessRequirements, dryRun).
org.sagebionetworks.repo.model.dataaccess.schema.UpdateFormFieldResponse
{
"title": "Update Form Field Response",
"description": "Response body for a request to update a data access request field.",
"implements":[
{
"$ref": "org.sagebionetworks.repo.model.asynch.AsynchronousResponseBody"
}
],
"properties": {
"field": {
"$ref": "org.sagebionetworks.repo.model.dataaccess.schema.FormField"
},
"updatedAccessRequirementIds": {
"type": "array",
"items": {
"type": "string"
}
}
},
"required": ["field", "updatedAccessRequirementIds"]
}
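The effect of a field update on associated access requirements can be modeled in memory as follows. This is only an illustration of the versioning rule described for the update endpoint; the `AccessRequirement` dataclass and `update_field` function are hypothetical, and the sketch ignores persistence and the immutability of earlier versions.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class AccessRequirement:
    id: str
    version: int
    field_refs: List[Tuple[str, int]]  # (fieldId, fieldVersionNumber)


def update_field(
    field_id: str,
    latest_versions: Dict[str, int],
    access_requirements: List[AccessRequirement],
) -> Tuple[int, List[str]]:
    """Bump a field's version and repoint every AR that references the field."""
    new_version = latest_versions[field_id] + 1
    latest_versions[field_id] = new_version
    updated_ids: List[str] = []
    for ar in access_requirements:
        if any(fid == field_id for fid, _ in ar.field_refs):
            ar.field_refs = [
                (fid, new_version if fid == field_id else version)
                for fid, version in ar.field_refs
            ]
            ar.version += 1  # the AR itself gets a new version
            updated_ids.append(ar.id)
    return new_version, updated_ids
```

The returned list corresponds to `updatedAccessRequirementIds` in the response object.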
org.sagebionetworks.repo.model.dataaccess.schema.GenerateDataAccessSchemaRequestInterface
{
"title": "Generate Data Access Schema Request",
"type": "interface",
"description": "Request body to generate the schema, UI schema, and pre-filled data for a submission.",
"implements": [
{
"$ref": "org.sagebionetworks.repo.model.asynch.AsynchronousRequestBody"
}
],
"properties": {
"concreteType": {
"type": "string",
"description": "Indicates which implementation this object represents."
}
},
"required": ["concreteType"]
}
org.sagebionetworks.repo.model.dataaccess.schema.GenerateDataAccessSchemaFromAccessRequirements
{
"title": "Generate Data Access Schema From Access Requirements",
"description": "Request body to generate the schema, UI schema, and pre-filled data for a submission. Intended to support data requesters in the data access request flow.",
"implements": [{
"$ref": "org.sagebionetworks.repo.model.dataaccess.schema.GenerateDataAccessSchemaRequestInterface"
}],
"properties": {
"accessRequirements": {
"type": "array",
"description": "The set of AR ID and version numbers used to generate a schema.",
"items": {
"$ref": "org.sagebionetworks.repo.model.AccessRequirementReference"
}
},
"includePrefilledSubmissionData": {
"type": "boolean",
"description": "Whether to include the prefilled submission data in the response. Default is `false`."
}
},
"required": ["accessRequirements"]
}
org.sagebionetworks.repo.model.dataaccess.schema.GenerateDataAccessSchemaFromFields
{
"title": "Generate Data Access Schema from Form Fields",
"description": "Request body to generate the schema and UI schema from a set of fields. Intended to support drafting a form for an access requirement.",
"implements": [{
"$ref": "org.sagebionetworks.repo.model.dataaccess.schema.GenerateDataAccessSchemaRequestInterface"
}],
"properties": {
"formFields": {
"type": "array",
"description": "The set of form fields that describes the form.",
"items": {
"$ref": "org.sagebionetworks.repo.model.dataaccess.schema.FormFieldReference"
}
}
},
"required": ["formFields"]
}
org.sagebionetworks.repo.model.dataaccess.schema.GenerateDataAccessSchemaResponse
{
"title": "Generate Data Access Schema Response",
"description": "Response body to generate the schema, UI schema, and pre-filled request data for a submission.",
"implements":[
{
"$ref": "org.sagebionetworks.repo.model.asynch.AsynchronousResponseBody"
}
],
"properties": {
"jsonSchema": {
"$ref": "org.sagebionetworks.repo.model.schema.JsonSchema"
},
"uiSchema": {
"type": "object"
},
"prefilledSubmissionData": {
"type": "object"
}
},
"required": ["jsonSchema", "uiSchema"]
}
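As a rough sketch of how the `uiSchema` in this response might be assembled from each field's `uiDefinition`, the example below uses the react-jsonschema-form convention of keying UI options by property name and listing property order under `ui:order`. The function name `build_ui_schema` and the property keys are assumptions for illustration.

```python
from typing import Dict, List, Optional, Tuple


def build_ui_schema(ordered_fields: List[Tuple[str, Optional[dict]]]) -> Dict[str, object]:
    """Combine per-field UI definitions into one react-jsonschema-form uiSchema.

    ordered_fields holds (property key, uiDefinition or None) pairs,
    already sorted into form order.
    """
    ui_schema: Dict[str, object] = {"ui:order": [key for key, _ in ordered_fields]}
    for key, ui_definition in ordered_fields:
        if ui_definition:
            ui_schema[key] = ui_definition
    return ui_schema
```

A field with no `uiDefinition` still appears in `ui:order`, so it keeps its position in the form while using the default widget.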
org.sagebionetworks.repo.model.dataaccess.schema.SubmitSchemaDataRequest
{
"title": "Submit Schema Data Request",
"description": "Request body to create submissions using user-provided data that was filled in with the assistance of a JSON Schema.",
"implements": [
{
"$ref": "org.sagebionetworks.repo.model.asynch.AsynchronousRequestBody"
}
],
"properties": {
"accessRequirements": {
"type": "array",
"description": "The set of AR ID and version numbers used to generate the schema.",
"items": {
"$ref": "org.sagebionetworks.repo.model.AccessRequirementReference"
}
},
"submissionData": {
"type": "object",
"description": "The data that the user provided to create submissions. It must be valid against the schema that describes the corresponding set of access requirements."
},
"accessorChanges": {
"type": "array",
"description": "List of user changes. For a batch submission using schema data, users can only gain access at this time.",
"items": {
"$ref":"org.sagebionetworks.repo.model.dataaccess.AccessorChange"
}
},
"subjectId":{
"type": "string",
"description": "The ID of the subject the user is interested in. This information will be used to help the user navigate back to where they were to continue their work."
},
"subjectType":{
"$ref":"org.sagebionetworks.repo.model.RestrictableObjectType",
"description": "The type of the subject the user is interested in. This information will be used to help the user navigate back to where they were to continue their work."
}
},
"required": ["accessRequirements", "submissionData", "accessorChanges"]
}
org.sagebionetworks.repo.model.dataaccess.schema.SubmitSchemaDataResponse
{
"title": "Submit Schema Data Response",
"description": "Response body representing the result of a submit schema data request. A request either results in creation of one or more submissions, or a set of schema validation errors.",
"implements":[
{
"$ref": "org.sagebionetworks.repo.model.asynch.AsynchronousResponseBody"
}
],
"properties": {
"status": {
"$ref": "org.sagebionetworks.repo.model.dataaccess.schema.SubmitSchemaDataResultStatus"
},
"createdSubmissionIds": {
"type": "array",
"description": "The set of submissions that were created as a result of the request.",
"items": {
"type": "string"
}
},
"validationErrors": {
"$ref": "org.sagebionetworks.repo.model.dataaccess.schema.SubmissionValidationResult",
"description": "The validation errors that were encountered that prevent creating submissions."
}
}
}
org.sagebionetworks.repo.model.dataaccess.schema.SubmitSchemaDataResultStatus
{
"title": "Submit Schema Data Result Status",
"description": "Status of a Submit Schema Data Response",
"type": "string",
"enum": [
{
"name": "SUCCESS",
"description": "Submissions were successfully created using the attached data."
},
{
"name": "VALIDATION_ERROR",
"description": "Submitted data was invalid against the schema. Submissions were not created."
}
]
}
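The all-or-nothing behavior of the submit service (either every submission is created, or none are and validation errors are returned) might look like the sketch below. The names `submit_schema_data`, `missing_required`, and `create_submission` are hypothetical, one submission per referenced access requirement is assumed, and `missing_required` is a deliberately tiny stand-in for a full JSON Schema validator.

```python
from typing import Callable, List


def missing_required(schema: dict, data: dict) -> List[str]:
    """Stand-in validator: reports only missing required properties."""
    return [
        f"{name} is required"
        for name in schema.get("required", [])
        if name not in data
    ]


def submit_schema_data(
    schema: dict,
    data: dict,
    access_requirement_ids: List[str],
    create_submission: Callable[[str, dict], str],
    validate: Callable[[dict, dict], List[str]] = missing_required,
) -> dict:
    """Validate once, then create either all submissions or none."""
    errors = validate(schema, data)
    if errors:
        # No submissions are created when the data is invalid.
        return {
            "status": "VALIDATION_ERROR",
            "validationErrors": {"isValid": False, "allValidationMessages": errors},
        }
    created = [create_submission(ar_id, data) for ar_id in access_requirement_ids]
    return {"status": "SUCCESS", "createdSubmissionIds": created}
```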
org.sagebionetworks.repo.model.dataaccess.schema.SubmissionValidationResult
Note: all of this object's fields also exist in org.sagebionetworks.repo.model.schema.ValidationResults. We are likely to instead factor out an interface that describes these properties and reuse it in both implementations.
{
"title": "Submission Validation Result",
"description": "Represents the JSON Schema validation results of a SubmitSchemaDataResponse.",
"properties": {
"isValid": {
"type": "boolean",
"description": "True if the object is currently valid according to the schema."
},
"validatedOn": {
"type": "string",
"format": "date-time",
"description": "The date-time this object was validated"
},
"validationErrorMessage": {
"type": "string",
"description": "If the object is not valid according to the schema, a simple one line error message will be provided."
},
"allValidationMessages": {
"description": "If the object is not valid according to the schema, a flat list of error messages will be provided with one error message per sub-schema.",
"type": "array",
"items": {
"type": "string"
}
},
"validationException": {
"description": "If the object is not valid according to the schema, a recursive ValidationException will be provided that describes all violations in the sub-schema tree.",
"$ref": "org.sagebionetworks.repo.model.schema.ValidationException"
}
}
}
isValid, validationErrorMessage, allValidationMessages, and validationException
org.sagebionetworks.repo.model.dataaccess.schema.FormFieldReference
{
"title": "Form Field Reference",
"description": "Used to reference a specific data access request field and version.",
"properties": {
"fieldId": {
"type": "string",
"description": "The unique identifier of this field."
},
"fieldVersionNumber": {
"type": "integer",
"description": "The version number of this field."
}
},
"required": ["fieldId", "fieldVersionNumber"]
}
org.sagebionetworks.repo.model.dataaccess.schema.FormFieldSearchRequest
{
"title": "Form Field Search Request",
"description": "A request body to search form fields. Only the latest versions of form fields can be retrieved.",
"properties": {
"name": {
"type": "string",
"description": "Filter by the internal name of the FormField using case-insensitive substring matching."
},
"includeDeprecated": {
"type": "boolean",
"description": "Whether to include deprecated fields in the results. Default is `false`."
},
"nextPageToken": {
"type": "string",
"description": "A token used to get the next page of a request."
}
}
}
org.sagebionetworks.repo.model.dataaccess.schema.FormFieldSearchResponse
{
"title": "Form Field Search Response",
"description": "A response body containing form field search results.",
"properties": {
"results": {
"type": "array",
"items": {
"$ref": "org.sagebionetworks.repo.model.dataaccess.schema.FormField"
},
"description": "The matching Form Fields corresponding to the search parameters."
},
"nextPageToken": {
"type": "string",
"description": "A token used to get the next page of a particular search query."
}
}
}
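A client could walk all pages of search results by feeding each `nextPageToken` back into the next request. In this sketch the `search` callable is a hypothetical wrapper around `POST /accessRequirement/field/search`, and a missing or empty token is assumed to mean there are no further pages.

```python
from typing import Callable, Iterator, Optional


def iter_form_fields(
    search: Callable[[dict], dict],
    name: Optional[str] = None,
    include_deprecated: bool = False,
) -> Iterator[dict]:
    """Yield every FormField matching the search, following nextPageToken."""
    request = {"includeDeprecated": include_deprecated}
    if name is not None:
        request["name"] = name
    while True:
        response = search(request)
        yield from response.get("results", [])
        token = response.get("nextPageToken")
        if not token:
            return  # assumption: no token means this was the last page
        request = {**request, "nextPageToken": token}
```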
org.sagebionetworks.repo.model.AccessRequirementReference
{
"title": "Access Requirement Reference",
"description": "Used to reference a specific access requirement and version.",
"properties": {
"accessRequirementId": {
"type": "string",
"description": "The unique identifier of this access requirement."
},
"accessRequirementVersionNumber": {
"type": "integer",
"description": "The version number of this access requirement."
}
},
"required": ["accessRequirementId", "accessRequirementVersionNumber"]
}
org.sagebionetworks.repo.model.HasExpiration
Interface factored out of ManagedACTAccessRequirement for reuse.
{
"title": "Has Expiration",
"type": "interface",
"description": "Used to describe an access requirement for which AccessApprovals will expire after some duration.",
"properties": {
"expirationPeriod": {
"type":"integer",
"description": "After an AccessApproval is granted for this AccessRequirement, it will be expired after expirationPeriod milliseconds. Set this value to 0 to indicate that AccessApproval will never be expired."
}
}
}
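The `expirationPeriod` semantics (a duration in milliseconds, with 0 meaning the approval never expires) reduce to a small calculation. The function name and the use of epoch milliseconds for the approval time are assumptions for this sketch.

```python
from typing import Optional


def approval_expires_at(approved_on_ms: int, expiration_period_ms: int) -> Optional[int]:
    """Return the expiry time in epoch milliseconds, or None if it never expires."""
    if expiration_period_ms == 0:
        return None  # 0 means the AccessApproval never expires
    return approved_on_ms + expiration_period_ms
```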
org.sagebionetworks.repo.model.JsonSchemaAccessRequirement
{
"title": "JSON Schema Access Requirement",
"description": "A Synapse 'Access Control Team' controlled Access Requirement, a 'tier 3' Access Requirement. In addition to the functionality provided by the Managed ACT Access Requirement, this Access Requirement type also supports collecting information described by a JSON Schema.",
"implements": [
{
"$ref": "org.sagebionetworks.repo.model.ACTAccessRequirementInterface"
},
{
"$ref": "org.sagebionetworks.repo.model.HasAccessorRequirement"
},
{
"$ref": "org.sagebionetworks.repo.model.HasExpiration"
}
],
"properties": {
"formFields": {
"type": "array",
"description": "The set of form fields that describes the required submission data. When creating a submission against multiple access requirements, the fields are deduplicated. Submission data will not be accepted by the system if it is not valid against the schema.",
"items": {
"$ref": "org.sagebionetworks.repo.model.dataaccess.schema.FormFieldReference"
}
}
},
"required": ["formFields"]
}
org.sagebionetworks.repo.model.dataaccess.Submission
{
"description": "A submission to request access to controlled data.",
"properties": {
// Existing properties omitted for brevity
"schemaData": {
"type": "object",
"description": "Additional data provided by the submitter that is described by a schema generated using the corresponding access requirement version."
}
}
}
Sequence Diagram Examples
ACT creates form fields and updates multiple JSON Schema-based access requirements to use the fields
Data requester creates submissions for ARs with overlapping fields
Open Questions
Which, if any, of these properties should continue to be collected by JSON-Schema defined ARs WITHOUT defining them in the schema?
File Uploads (we could support file upload for schema-defined properties, but that adds complexity)
Data Use Certificate
IRB Approval
Additional file attachments
ResearchProject
Institution
Project Lead
Intended Data Use Statement
If we have support for this data through fields, we can offer a migration service to convert a Managed AR to a JSON Schema AR.
Is it important for us to also streamline the renewal flow in this pass?
Today, certain fields are shown only if a submission is a renewal ("publications", "summaryOfUse"). Is support for this kind of conditional logic a requirement?
Appendix
GA4GH Passport Visa
The research spike ticket includes the following "bonus" acceptance criteria:
Assess whether JSON submission can be stored/exchanged on Passport Visa
My assessment is that form responses could technically be encoded in a GA4GH Passport Custom Visa, which is a signed JWT. However, this may not be the best approach to disseminating access request responses. For example, the set of user responses could be very large. Using the Visa protocol to disseminate this kind of information seems counter to the intent of the specification.
Field Groups
At a later date, we could add support for field groups. In addition to fields, an AR may reference field groups. Field groups could enable one or more of the following use cases:
Simplify building and updating ARs (insert or update a group of questions)
UI affordances across groups (e.g. pages or sections)
Capturing conditional logic
Dynamically changing 'requiredness' of a shared field