
Data Access Notifications

| Version | Comment |
| --- | --- |
| 11/24/2021 | Added this tracking table |
| 06/15/2020 | Created |

Use cases: Annual Renewal

JIRA:

PLFM-6072

Overview

When the ACT establishes an Access Requirement (AR) for a dataset, users can request access to the dataset by submitting a request, and they gain access once the ACT approves it. An AR can also impose an expiration on the approval, usually set to 1 year.

Once the submission is approved, an access approval is created in the system for each user included in the original request, and the expiration of the approval is stored along with it.

Today the system neither processes expired approvals automatically nor sends automatic reminders for approvals that are about to expire. Instead, members of the governance team manually review the approvals at fixed intervals and process the list, sending emails to the submitters of the requests (or to the accessors when access is revoked) and maintaining a spreadsheet of the processed users (see Access Renewals List). The current flow can be found here: Access renewal flow. A document detailing the manual process and the emails that are sent out can be found here: Access Renewal Process.

The whole process is time consuming and error prone, and it can be automated. The purpose of this design is to introduce support for sending automatic renewal notifications, as well as for automatically handling revocations and the associated notifications.

The focus of this design is on ARs of type ManagedACTAccessRequirement with accessType DOWNLOAD, which therefore apply to a set of RestrictableObjectDescriptor of type ENTITY (in other words, access to datasets). Note that the term dataset refers to a set of entities; it is not a defined object in the system but rather a convention used to manage the AR.

Requirements

The document about Annual Renewal defines a set of requirements that the system should implement.

  • The system must send two renewal reminders for a given AR and submitter: 2 months and 1 month prior to the expiration of the submission

  • The renewal notifications must be configurable to include a custom URL that points to extra instructions for the renewal

  • The system must automatically revoke expired approvals

  • The system must send a revocation notification for users that lost their approvals

  • The notifications must be delivered irrespective of user notification settings. Regarding CAN-SPAM Act compliance (see https://www.ftc.gov/tips-advice/business-center/guidance/can-spam-act-compliance-guide-business), these emails about expiration or revocation are purely transactional/relationship/other messages and hence not covered by CAN-SPAM

  • The renewal and revocation notifications should include a friendly name of the dataset referred to by the AR
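The reminder schedule in the first requirement can be sketched as follows. This is a minimal illustration, assuming that "2 months and 1 month prior" means calendar months with the day of month clamped to the end of shorter months; the function names `months_before` and `renewal_reminder_dates` are hypothetical and not part of the existing codebase.

```python
import calendar
from datetime import date
from typing import List


def months_before(day: date, months: int) -> date:
    """Return the date `months` calendar months before `day`.

    Assumption: when the target month is shorter, the day is clamped
    to the last day of that month (e.g. April 30 minus 2 months is Feb 28).
    """
    index = day.year * 12 + (day.month - 1) - months
    year, month_index = divmod(index, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def renewal_reminder_dates(expires_on: date) -> List[date]:
    """Reminder dates 2 months and 1 month before the expiration, earliest first."""
    return [months_before(expires_on, 2), months_before(expires_on, 1)]


print(renewal_reminder_dates(date(2022, 4, 30)))
```

A fixed number of days (for example 60 and 30) would be an equally valid reading of the requirement; the choice only affects how the send dates are computed.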

Proposed Design

While most of the requirements can be fulfilled already without any major change, there are some considerations both in the API and in the backend implementations:

API Changes:

  • Introduce two new fields in a ManagedACTAccessRequirement model object:

    • datasetName (String): Optional field that will be used as a friendly label when sending notifications, and could be used when showing the AR

    • renewalDetailsUrl (String → parsed as a URL): An optional URL with additional instructions for the renewal notifications

  • Include in the AccessorGroup model object the information about the notifications through a new field:

    • notifications: A list of notifications for the accessor group; each item in the list contains information about a notification that is scheduled or sent.

Notification:

| Field | Type | Description |
| --- | --- | --- |
| accessRequirementId | String | The id of the AR |
| submitterId | String | The id of the submitter |
| recipientId | String | The id of the user the notification was sent to |
| type | enum<NotificationType> | The type of notification: RENEWAL_NOTIFICATION, REVOKE_NOTIFICATION |
| status | enum<NotificationStatus> | The status of the notification: SCHEDULED, SENT |

Note that the AccessorGroup is an object computed by grouping by AR and submitter; the notifications will be computed and included accordingly.
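As an illustration of the Notification model and of how notifications could be attached to an AccessorGroup, here is a minimal sketch. The field names are the snake_case equivalents of the fields listed above; the helper `notifications_for_group` is hypothetical and only shows the grouping by AR and submitter.

```python
from dataclasses import dataclass
from enum import Enum


class NotificationType(Enum):
    RENEWAL_NOTIFICATION = "RENEWAL_NOTIFICATION"
    REVOKE_NOTIFICATION = "REVOKE_NOTIFICATION"


class NotificationStatus(Enum):
    SCHEDULED = "SCHEDULED"
    SENT = "SENT"


@dataclass(frozen=True)
class Notification:
    access_requirement_id: str
    submitter_id: str
    recipient_id: str
    type: NotificationType
    status: NotificationStatus


def notifications_for_group(notifications, access_requirement_id, submitter_id):
    """Select the notifications of the (AR, submitter) pair that defines an AccessorGroup."""
    return [
        n for n in notifications
        if n.access_requirement_id == access_requirement_id
        and n.submitter_id == submitter_id
    ]


# Hypothetical sample data: two submitters on the same AR.
sample = [
    Notification("ar-1", "user-7", "user-7",
                 NotificationType.RENEWAL_NOTIFICATION, NotificationStatus.SENT),
    Notification("ar-1", "user-8", "user-8",
                 NotificationType.RENEWAL_NOTIFICATION, NotificationStatus.SCHEDULED),
    Notification("ar-1", "user-7", "user-9",
                 NotificationType.REVOKE_NOTIFICATION, NotificationStatus.SCHEDULED),
]
group_notifications = notifications_for_group(sample, "ar-1", "user-7")
```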

Backend Changes

Access requirements

  • We need to add the new fields above: datasetName and renewalDetailsUrl.

  • Minimum expiration period: either 0 (disabled) or 1 year. This is because there seems to be no need for complex scheduling of notifications for now (e.g. what happens if the expiration is less than 2 months away). If we require more flexibility in the future, we can add a more complex notification configuration to the AR and compute the notification schedule accordingly.

Notifications

To support sending renewal and revocation notifications, we need to store the ones that have already been processed so that we avoid sending them multiple times. Additionally, renewal and revocation notifications have already been sent out manually, so we should avoid reprocessing those and give ourselves a chance to backfill the ARs with the dataset names.

To reach this goal, the proposal is to schedule the renewal notifications when a Submission is APPROVED. This allows us to:

  • Apply the system to future submissions, giving us time to:

    • Backfill the ARs with the dataset names and URLs

    • Backfill the old submissions with notifications if needed for the few datasets that we manage

  • Implement a simple worker that just checks for and sends scheduled notifications (rather than trying to compute what is about to expire)

  • Report information about the scheduled/sent notifications

The drawback is that the notifications need to be managed: for example, when a submission is renewed, the previously scheduled notifications need to be updated.

A new support table needs to be introduced to store the state of the notifications, mirroring the object model above:
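The management of scheduled notifications on approval and renewal could look like the following sketch. It uses a hypothetical in-memory list in place of a database table, and it assumes that notifications already sent are kept as history while pending ones are replaced; the function name and row layout are illustrative only.

```python
from datetime import date

# Hypothetical in-memory stand-in for the stored notifications.
notifications = []


def schedule_renewal_reminders(ar_id, submitter_id, send_dates):
    """Drop pending reminders for the (AR, submitter) pair, then schedule the new ones."""

    def is_pending(row):
        return (
            row["ar_id"] == ar_id
            and row["submitter_id"] == submitter_id
            and row["type"] == "RENEWAL_NOTIFICATION"
            and row["status"] == "SCHEDULED"
        )

    # Rows already SENT are kept as history; only pending rows are replaced.
    notifications[:] = [row for row in notifications if not is_pending(row)]
    for send_on in sorted(send_dates):
        notifications.append({
            "ar_id": ar_id,
            "submitter_id": submitter_id,
            "recipient_id": submitter_id,  # renewals go to the submitter
            "type": "RENEWAL_NOTIFICATION",
            "status": "SCHEDULED",
            "send_on": send_on,
        })


# First approval, then a renewal of the same submission before any reminder was sent.
schedule_renewal_reminders("ar-1", "user-7", [date(2022, 4, 15), date(2022, 5, 15)])
schedule_renewal_reminders("ar-1", "user-7", [date(2023, 4, 15), date(2023, 5, 15)])
```

After the second call only the reminders for the renewed approval remain scheduled.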

DATA_ACCESS_NOTIFICATIONS (omits standard fields: id, etag, created_on, etc.)

| Field | Description |
| --- | --- |
| ACCESS_REQUIREMENT_ID | The id of the AR |
| SUBMITTER_ID | The id of the submitter |
| STATUS | The status of the notification: SCHEDULED/SENT |
| TYPE | The type of the notification: RENEWAL_NOTIFICATION/REVOKE_NOTIFICATION |
| TIMESTAMP | The time when the notification should be sent |
| RECIPIENT_ID | The id of the user that is the target of the notification (the submitter for renewals, the accessor for revocations) |

With this table in place we can introduce a worker that processes the scheduled renewal notifications. Note that we cannot reuse the change message machinery, since there is no object type that actually changes (and for an AccessorGroup a notification applies to a dynamic group that is defined by the submitter/AR tuple).

Also note that we store the submitter id rather than the submission id. This is due to the disconnect between access approvals and submissions (see the open questions below).
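The core loop of such a worker can be sketched as below. This is an illustration under simplifying assumptions: rows are plain dictionaries rather than database records, `send` is a hypothetical callback standing in for the message machinery, and the `send_on` key corresponds to the TIMESTAMP column.

```python
from datetime import datetime


def process_due_notifications(rows, now, send):
    """Send every SCHEDULED notification whose send time has passed and mark it SENT.

    Returns the number of notifications processed in this run.
    """
    processed = 0
    for row in rows:
        if row["status"] == "SCHEDULED" and row["send_on"] <= now:
            send(row)
            row["status"] = "SENT"  # prevents sending the same notification twice
            processed += 1
    return processed


rows = [
    {"recipient_id": "user-7", "status": "SCHEDULED", "send_on": datetime(2022, 4, 15)},
    {"recipient_id": "user-7", "status": "SCHEDULED", "send_on": datetime(2022, 5, 15)},
]
outbox = []
first_run = process_due_notifications(rows, datetime(2022, 4, 20), outbox.append)
second_run = process_due_notifications(rows, datetime(2022, 4, 20), outbox.append)
```

Running the worker twice at the same time sends the first reminder only once, because the stored status is the source of truth.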

Note: We might want to consider using a generic “scheduling” service for messages to users (e.g. MessageManager.scheduleMessage()) that adds a layer of abstraction, for example with a message_type plus some sort of unique identifier (in this case the identifier might include the AR id and the submitter id) and a set of message factories for the given types.
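To make the idea of message factories more concrete, here is a small sketch of a registry keyed by message type. Everything in it is hypothetical: the registry, the decorator, the message type name DATA_ACCESS_RENEWAL and the message text are illustrative and do not describe an existing API.

```python
# Hypothetical registry of message factories keyed by message type.
MESSAGE_FACTORIES = {}


def message_factory(message_type):
    """Decorator that registers a factory function for a message type."""
    def register(fn):
        MESSAGE_FACTORIES[message_type] = fn
        return fn
    return register


@message_factory("DATA_ACCESS_RENEWAL")
def build_renewal_message(identifier):
    # The identifier for this type is assumed to be the (AR id, submitter id) pair.
    ar_id, submitter_id = identifier
    return f"Renewal reminder for access requirement {ar_id} (submitter {submitter_id})"


def build_message(message_type, identifier):
    """Look up the factory for the type and build the message for the identifier."""
    return MESSAGE_FACTORIES[message_type](identifier)


message = build_message("DATA_ACCESS_RENEWAL", ("ar-1", "user-7"))
```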

Revocation

To automatically revoke the approvals, a new worker will need to periodically check the APPROVED approvals that have expired, change their status and potentially schedule/send the revocation notification. Note that changing the approval status might not actually revoke access for the accessor (e.g. if the user has other approvals that have not expired), so the worker will have to implement logic to check the actual approval state of the accessor.

Note that the same process applies when a revocation is “added” through a submission renewal.
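The check on the actual accessor state can be sketched as follows. The sketch assumes that approvals are plain dictionaries and that an expired approval moves to a state named REVOKED; both the data layout and the function name are hypothetical.

```python
from datetime import datetime


def revoke_expired(approvals, now):
    """Mark expired approvals as REVOKED.

    Returns the sorted (AR id, accessor id) pairs that lost access entirely,
    i.e. pairs that have no remaining APPROVED approval. Only these accessors
    should receive a revocation notification.
    """
    touched = set()
    for approval in approvals:
        expires_on = approval["expires_on"]
        if approval["state"] == "APPROVED" and expires_on is not None and expires_on <= now:
            approval["state"] = "REVOKED"
            touched.add((approval["ar_id"], approval["accessor_id"]))
    still_approved = {
        (a["ar_id"], a["accessor_id"]) for a in approvals if a["state"] == "APPROVED"
    }
    return sorted(touched - still_approved)


approvals = [
    # user-1 has an expired approval and a newer one that is still valid.
    {"ar_id": "ar-1", "accessor_id": "user-1", "state": "APPROVED", "expires_on": datetime(2021, 6, 1)},
    {"ar_id": "ar-1", "accessor_id": "user-1", "state": "APPROVED", "expires_on": datetime(2022, 6, 1)},
    # user-2 only has an expired approval.
    {"ar_id": "ar-1", "accessor_id": "user-2", "state": "APPROVED", "expires_on": datetime(2021, 6, 1)},
]
lost_access = revoke_expired(approvals, datetime(2021, 7, 1))
```

In this example only user-2 loses access, since user-1 still holds an approval that has not expired.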

Emails

We can reuse the current machinery for sending messages to users: when the workers process the notifications, they will build the message from a template according to the AR configuration, save it to a file handle and create a message to be sent.

Note that the notification might go out from a real Synapse user account: instead of using the noreply notification address, we can create a dedicated Synapse account for data access renewals. This way the sender can receive any bounces and/or processing errors, and the recipient can reply to the notification as well.

Since the messages need to be delivered regardless of the user notification settings, we need a way to override this behavior, for example by adding a new column to the MESSAGE_TO_USER table:

| Field | Description |
| --- | --- |
| OVERRIDE_NOTIFICATION_SETTING | A flag that defines if the message should be sent despite the notification setting of the recipient. |
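The effect of the flag on delivery can be illustrated with a short sketch. The function name and the shape of the recipient settings are assumptions made for the example; only the flag itself comes from the proposal above.

```python
def should_deliver(recipient_wants_email: bool, override_notification_setting: bool) -> bool:
    """A message is delivered if the recipient opted in or if the override flag is set."""
    return override_notification_setting or recipient_wants_email


# Hypothetical recipients: user-2 disabled email notifications.
settings = {"user-1": True, "user-2": False}

regular_recipients = [u for u, wants in settings.items() if should_deliver(wants, False)]
renewal_recipients = [u for u, wants in settings.items() if should_deliver(wants, True)]
```

A regular message skips user-2, while a renewal or revocation message with the override set reaches both users.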

To build the email body we will reuse the current Velocity templating engine. The content will be built according to the configuration of the AR, with an end result that should resemble the canned language used in this document: Renewal Process and Canned Language.

The dataset name and the renewal URL in the AR will be used to build the final email. Note, however, that the name is optional, so we need some heuristic to generate a header. My proposal is to simply list the entities the AR refers to, with a link to each of them.
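The fallback heuristic for the header could be sketched like this. The function name, the plain-text output format and the entity URL prefix used as the default are assumptions for illustration; the real templates may render links differently.

```python
def dataset_label(dataset_name, entity_ids,
                  entity_url_prefix="https://www.synapse.org/#!Synapse:"):
    """Use the dataset name when present, otherwise list the entities with a link each.

    Assumption: the URL prefix above is only an illustrative default.
    """
    if dataset_name and dataset_name.strip():
        return dataset_name.strip()
    return ", ".join(f"{entity_id} ({entity_url_prefix}{entity_id})" for entity_id in entity_ids)


named = dataset_label("Example Study", ["syn123"])
unnamed = dataset_label(None, ["syn123", "syn456"])
```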

TODO: Get the final text for the templates

Open Questions

  • How to handle prod vs staging? Could staging send an email after a first migration if some notifications had not been processed yet (we probably have this problem already)? Below are some solutions discussed with the engineering team. For the first iteration of the implementation, solution 5 below is the easiest and most flexible to implement.

    1. One solution is to implement a service that disables certain features, similar to the read-only service (it could be part of the same API call), and to re-enable them when we go live. The flow might be: prepare the staging stack → put it in read-only mode and disable the email notifications (e.g. the worker that sends emails asynchronously) → perform the migration → put the system back in read-write mode → testing → read-only mode → final migration → re-enable all features → release. Note, however, that this would disable all the email notifications that are processed asynchronously by a worker (e.g. messages to users).

    2. An alternative might be to configure staging to redirect all the messages processed by the worker to a single recipient that we control (this way we can actually check the emails that are sent without spamming users)?

      Alternative solutions proposed during the design review:

    3. The staging stack could detect that it is “staging” and make decisions based on that. The drawback is that this introduces (like the solutions proposed above) different behavior between prod and staging.

    4. A service that is separate from the stack and deployed separately could be created to process the emails to be sent. The service could query a given stack only (e.g. prod only) and ask for the emails to be sent. The same service could be deployed for staging but configured differently (e.g. instead of sending the emails it could save them to S3). This way the prod and staging stacks would remain the same and there would be no need to switch configurations. The drawback is that we need to build additional infrastructure.

    5. Another interesting solution: we can implement a new (administrative) web service that allows pushing messages, for example onto a queue. A timer worker will periodically send a request to this service to trigger another worker (one that polls the given queue and is set up like any other MessageDriven worker). The target worker is the one that actually processes the scheduled messages to send, so we basically externalize the trigger of the worker that processes the scheduled messages. This way the timer worker can be instructed to always send the request to prod, and at the same time we can reuse the same web service on staging for testing. This solution has the advantage that it reuses the current infrastructure for timers and workers, and it is a bit more generic and can be reused for other purposes.

  • Would it be better to store the submission id in the notification table instead of the AR/submitter pair? It seems that an access approval can be created without a submission (see https://rest-docs.synapse.org/rest/POST/accessApproval.html). How does that tie together with the accessor group?
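Solution 5 from the first open question can be sketched with in-memory queues. This is only a model of the idea: the `Stack` class, its `trigger` method (standing in for the administrative web service) and the `poll` method (standing in for the message-driven worker) are hypothetical names, and the actual sending of notifications is replaced by a counter.

```python
import queue


class Stack:
    """Hypothetical model of a stack with a trigger service and a queue."""

    def __init__(self, name):
        self.name = name
        self.trigger_queue = queue.Queue()
        self.runs = 0

    def trigger(self, message):
        """Administrative web service: push a trigger message onto the queue."""
        self.trigger_queue.put(message)

    def poll(self):
        """Message-driven worker: process scheduled notifications once per trigger."""
        while not self.trigger_queue.empty():
            self.trigger_queue.get_nowait()
            self.runs += 1  # placeholder for sending the scheduled notifications


prod, staging = Stack("prod"), Stack("staging")


def timer_tick():
    """Timer worker: always targets prod, whichever stack it runs on."""
    prod.trigger({"action": "PROCESS_SCHEDULED_NOTIFICATIONS"})


timer_tick()  # fired by the timer running on prod
timer_tick()  # fired by the timer running on staging, still targets prod
for stack in (prod, staging):
    stack.poll()
```

Staging never processes scheduled notifications on its own in this model; a tester can still call `staging.trigger(...)` explicitly to exercise the same worker.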