This is the documentation for the Bridge User Data Download (BUDD) Service.
Data Flow
- User requests their data in the app, specifying the start date and end date. (App may or may not supply default start and end dates.)
- The app calls the Bridge REST API with the start date and end date (requires user authentication).
- Bridge Server writes the request to an internal SQS queue. This request contains study ID, username, start date, and end date.
- BUDD reads the request from the SQS queue, aggregates the requested data, and emails the user a link where they can download the data. This link expires after 24 hours.
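The request that travels through the queue can be pictured as a small value object. The sketch below is illustrative only: the class name UserDataRequest, its field names, and the rule that rejects an inverted date range are assumptions, not taken from the BUDD source. The inclusive range check mirrors the way BUDD treats the start and end dates when it looks up uploads.

```java
import java.time.LocalDate;

/** Hypothetical model of the request message; class and field names are assumptions. */
final class UserDataRequest {
    final String studyId;
    final String username;
    final LocalDate startDate;
    final LocalDate endDate;

    UserDataRequest(String studyId, String username, LocalDate startDate, LocalDate endDate) {
        if (studyId == null || studyId.isEmpty()) {
            throw new IllegalArgumentException("studyId is required");
        }
        if (username == null || username.isEmpty()) {
            throw new IllegalArgumentException("username is required");
        }
        if (startDate == null || endDate == null) {
            throw new IllegalArgumentException("startDate and endDate are required");
        }
        // Assumed rule: the range must not be inverted. A one-day range is allowed.
        if (endDate.isBefore(startDate)) {
            throw new IllegalArgumentException("endDate must not be before startDate");
        }
        this.studyId = studyId;
        this.username = username;
        this.startDate = startDate;
        this.endDate = endDate;
    }

    /** True if the upload date falls within the requested range, inclusive at both ends. */
    boolean covers(LocalDate uploadDate) {
        return !uploadDate.isBefore(startDate) && !uploadDate.isAfter(endDate);
    }
}
```

For example, a request for 2015-08-15 through 2015-08-19 covers uploads dated on either boundary day, but not one dated 2015-08-20.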
What BUDD Does Internally
- Parses the request from SQS (https://github.com/Sage-Bionetworks/BridgeUserDataDownloadService/blob/develop/src/main/java/org/sagebionetworks/bridge/udd/helper/SqsHelper.java)
- Gets the study from the Dynamo DB Study table (https://github.com/Sage-Bionetworks/BridgeUserDataDownloadService/blob/develop/src/main/java/org/sagebionetworks/bridge/udd/dynamodb/DynamoHelper.java#L41). The study supplies the Stormpath information needed to look up the user's account, as well as the configured "from" email address.
- Gets the user's health ID (used to obtain the health code) and email address from Stormpath (https://github.com/Sage-Bionetworks/BridgeUserDataDownloadService/blob/develop/src/main/java/org/sagebionetworks/bridge/udd/accounts/StormpathHelper.java)
- Gets the user's health code from the Dynamo DB HealthId table, then queries the Dynamo DB Upload table (index healthCode-uploadDate-index) for uploads with that health code and an upload date between the start date and end date, inclusive (https://github.com/Sage-Bionetworks/BridgeUserDataDownloadService/blob/develop/src/main/java/org/sagebionetworks/bridge/udd/dynamodb/DynamoHelper.java#L59)
- S3 Packager (https://github.com/Sage-Bionetworks/BridgeUserDataDownloadService/blob/develop/src/main/java/org/sagebionetworks/bridge/udd/s3/S3Packager.java). This does the following:
  - Downloads the uploads from S3.
  - Decrypts the uploads and writes them to a temp directory. (A new temp directory is created for every request.) Individual files are named in the format YYYY-MM-DD-UploadId.zip, so users can organize their uploads by date.
  - Writes any download or decryption errors to an error.log file in the temp directory. This file is included in the master zip file.
  - Creates a master zip file called userdata-YYYY-MM-DD-to-YYYY-MM-DD-randomGuid.zip (the two dates are the start date and end date) and adds all upload files and error.log to it.
  - Uploads the master zip file to S3.
  - Creates a pre-signed URL for the master zip file. The URL allows HTTP GET only and expires 24 hours after it is created.
  - Deletes the temp files and temp directories.
- Emails the S3 pre-signed URL to the user's registered email address (https://github.com/Sage-Bionetworks/BridgeUserDataDownloadService/blob/develop/src/main/java/org/sagebionetworks/bridge/udd/helper/SesHelper.java)
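The file-naming and link-expiration rules in the S3 Packager steps can be sketched with the standard java.time API. This is a minimal illustration, not the actual S3Packager code: the class name PackagerNaming and its method names are hypothetical, and the sketch assumes the dates are formatted as ISO calendar dates (YYYY-MM-DD).

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper; names are assumptions, not part of the BUDD source. */
final class PackagerNaming {
    // ISO_LOCAL_DATE renders a LocalDate as YYYY-MM-DD.
    private static final DateTimeFormatter DATE = DateTimeFormatter.ISO_LOCAL_DATE;

    // The download link is valid for 24 hours.
    static final Duration LINK_LIFETIME = Duration.ofHours(24);

    private PackagerNaming() {
    }

    /** Name of a single decrypted upload: YYYY-MM-DD-UploadId.zip. */
    static String uploadFileName(LocalDate uploadDate, String uploadId) {
        return DATE.format(uploadDate) + "-" + uploadId + ".zip";
    }

    /** Name of the master zip: userdata-YYYY-MM-DD-to-YYYY-MM-DD-randomGuid.zip. */
    static String masterZipName(LocalDate startDate, LocalDate endDate, String randomGuid) {
        return "userdata-" + DATE.format(startDate) + "-to-" + DATE.format(endDate)
                + "-" + randomGuid + ".zip";
    }

    /** Expiration time for the pre-signed URL: 24 hours after the given instant. */
    static Instant linkExpiration(Instant now) {
        return now.plus(LINK_LIFETIME);
    }
}
```

A caller might pass a freshly generated identifier, such as java.util.UUID.randomUUID().toString(), as the randomGuid argument. Whether BUDD generates its GUID this way is an assumption. Because the date comes first in each upload file name, sorting the files by name also sorts them by date.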
Deployment
TODO
Future Improvements
TODO
See Also
Original design doc: Design Doc