Download Participant Roster Worker
See https://sagebionetworks.jira.com/browse/BRIDGE-1630
Scenario
Researchers frequently need to see all participants in their study. Some of our larger studies have thousands or tens of thousands of participants. Today, this requires calling a paginated API in a loop, which can take a very long time.
We want to provide a solution where a researcher can request a participant roster, which will kick off a worker process to gather all participants into a CSV, and email that CSV to researchers.
Milestone 1: We email the researchers a link to the CSV, similar to what the User Data Downloader currently does. This link will expire in 12 hours.
Milestone 2: The CSV will be placed in a password-protected zip file before emailing it to the researcher.
Stretch Goal: By default, we will return all participants in the caller’s org. As a stretch goal, the caller can specify a study ID. We will need to verify that the caller has access to this study ID, and then instead fetch all participants enrolled in this study.
Worker Changes
Create a new DownloadParticipantRosterWorkerProcessor class (in the BridgeWorkerPlatform git project, in a new Java package org.sagebionetworks.bridge.participantroster). See for example UploadRedriveWorkerProcessor.
The worker will take in the following JSON as an input:
{
  "service":"DownloadParticipantRosterWorker",
  "body":{
    "appId":"<app ID>",
    "userId":"<caller's user ID>",
    "password":"<password used to protect the zip file, see Milestone 2>",
    "studyId":"<optional study ID, see Stretch Goal above>"
  }
}
Note that the service name must match the Component name in the Worker Processor class. The Worker Processor will only receive the JSON inside the “body” block, not the whole JSON.
The worker will then need to get the caller’s user using Bridge API getParticipantByIdForApp (see example).
The worker will then need to call Bridge API searchAccountSummariesForApp (see example), passing in the appId and caller’s org. Because this is a paginated API, we will want to start with offsetBy=0 and increment offsetBy by pageSize with every loop until we reach a total of AccountSummaryList.getTotal() (or until we get an empty result set). Additional notes:
We’ll want to sleep 1 second between each call to avoid overloading Bridge Server.
We’ll want to explicitly specify pageSize 100 to be clear.
We may need to add one or more calls to BridgeHelper.
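The paging loop described above can be sketched as follows. This is a minimal illustration, not the actual BridgeHelper API: RosterPager, Page, and PageFetcher are hypothetical names, and PageFetcher stands in for whatever BridgeHelper method ends up wrapping searchAccountSummariesForApp. The sketch assumes the wrapper converts checked exceptions from the Bridge call into unchecked ones.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the offsetBy/pageSize loop; not part of BridgeHelper. */
final class RosterPager {
    /** One page of results plus the total reported by the server (cf. AccountSummaryList.getTotal()). */
    static final class Page<T> {
        final List<T> items;
        final int total;

        Page(List<T> items, int total) {
            this.items = items;
            this.total = total;
        }
    }

    /** Stand-in for a BridgeHelper call wrapping searchAccountSummariesForApp (assumption). */
    interface PageFetcher<T> {
        Page<T> fetch(int offsetBy, int pageSize);
    }

    static final int PAGE_SIZE = 100;        // explicit page size, as noted above
    static final long SLEEP_MILLIS = 1000L;  // 1 second between calls

    /** Fetches every page, starting at offsetBy=0 and advancing by pageSize each loop. */
    static <T> List<T> fetchAll(PageFetcher<T> fetcher, int pageSize, long sleepMillis) {
        List<T> all = new ArrayList<>();
        int offsetBy = 0;
        while (true) {
            Page<T> page = fetcher.fetch(offsetBy, pageSize);
            if (page.items.isEmpty()) {
                break; // empty result set: nothing more to read
            }
            all.addAll(page.items);
            offsetBy += pageSize;
            if (offsetBy >= page.total) {
                break; // reached the reported total
            }
            try {
                Thread.sleep(sleepMillis); // throttle so we do not hammer Bridge Server
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while paging", ex);
            }
        }
        return all;
    }
}
```

In the worker, the call would look something like RosterPager.fetchAll(fetcher, RosterPager.PAGE_SIZE, RosterPager.SLEEP_MILLIS), with each returned account summary written to the CSV as it arrives rather than collected in memory if the roster is large.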
As we collect account summaries, we should write them to a CSV using OpenCSV CSVWriter. This is already included in the Worker Platform’s dependencies. An example can be found here. Documentation can be found here.
Once this CSV is complete, we’ll want to email it to the caller’s email address (as provided by the previous Bridge call). See the following examples:
Note: You may need to create your own versions of these methods and/or refactor some of them to make them more general.
Milestone 2
Instead of emailing the CSV, we put the CSV into a password-protected zip file and email that. Note that for UserDataDownload, we use Java’s built-in ZipOutputStream, which does not have password protection capability. Kelly to investigate Zip libraries that allow password protection.
Stretch Goal
TODO
Testing the Worker Changes
Get your Worker up and running locally as per the Getting Started guide. Be sure to override your SQS queue with your own personal queue so you don’t conflict with anyone else’s tests. (You may need to create your queue in the AWS console.) In the AWS console, send a message to your queue using the JSON format described above.
Bridge Server Changes
We will need to create an API in ParticipantsController, probably POST /v3/participants/emailroster. Initially, the POST body will be empty.
BridgeServer will then post a message to the worker SQS queue with the format above, passing in the caller’s app ID and user ID.
In Milestone 2, BridgeServer will generate a password to provide to the Worker and return the password to the caller for their reference. The generated password must meet a minimum strength requirement, for example at least 8 characters, including uppercase letters, lowercase letters, and numbers.
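One possible way to generate such a password is sketched below. RosterPasswordGenerator is a hypothetical class name, not an existing BridgeServer class, and the alphabet and length check are assumptions based on the example requirements above. The sketch uses java.security.SecureRandom, guarantees at least one character from each class, and then shuffles so the guaranteed characters do not sit in fixed positions.

```java
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of a password generator; not an existing BridgeServer class. */
final class RosterPasswordGenerator {
    private static final String UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    private static final String LOWER = "abcdefghijklmnopqrstuvwxyz";
    private static final String DIGITS = "0123456789";
    private static final String ALL = UPPER + LOWER + DIGITS;
    static final int MIN_LENGTH = 8; // assumed minimum, per the example requirement
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Returns a random password with at least one uppercase letter, one lowercase letter, and one digit. */
    static String generate(int length) {
        if (length < MIN_LENGTH) {
            throw new IllegalArgumentException("length must be at least " + MIN_LENGTH);
        }
        List<Character> chars = new ArrayList<>(length);
        // Guarantee one character from each required class.
        chars.add(pick(UPPER));
        chars.add(pick(LOWER));
        chars.add(pick(DIGITS));
        // Fill the remainder from the combined alphabet.
        while (chars.size() < length) {
            chars.add(pick(ALL));
        }
        // Shuffle so the guaranteed characters are not always first.
        Collections.shuffle(chars, RANDOM);
        StringBuilder sb = new StringBuilder(length);
        for (char c : chars) {
            sb.append(c);
        }
        return sb.toString();
    }

    private static char pick(String alphabet) {
        return alphabet.charAt(RANDOM.nextInt(alphabet.length()));
    }
}
```

BridgeServer would call something like RosterPasswordGenerator.generate(12), place the result in the "password" field of the worker request, and return the same value to the caller.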
In Stretch Goal, the POST body will optionally include a studyId, which BridgeServer will then pass into the Worker request.
See for example UserDataDownloadController and UserDataDownloadService.