Overview

Components:

...

This is the documentation for the Bridge User Data Download (BUDD) Service.

...

The user (participant) will call the requestUserData API in BridgePF. The user will pass in a .

Data Flow

  • User requests their data in the app, specifying the start date and end date. (App may or may not supply default start and end dates.)
  • The app calls the Bridge REST API with the start date and end date

...

  • (requires user authentication).
  • Bridge Server writes the request to an internal SQS queue. This request contains study ID, username, start date, and end date.
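
As an illustration of the queue entry described above, here is a minimal Python sketch that serializes a request with those four fields. The field names follow the example SQS message in the Testing section; the function name `build_request_message` and the date-order check are assumptions made for this sketch, not the Bridge Server implementation.

```python
import json
from datetime import date


def build_request_message(study_id: str, username: str,
                          start_date: date, end_date: date) -> str:
    """Serialize a user data request as a JSON string for the queue."""
    # Assumed validation: reject a range that ends before it starts.
    if start_date > end_date:
        raise ValueError("start date must not be after end date")
    return json.dumps({
        "studyId": study_id,
        "username": username,
        "startDate": start_date.isoformat(),
        "endDate": end_date.isoformat(),
    })


# Placeholder values, not a real account.
message = build_request_message("api", "test-user@example.org",
                                date(2015, 7, 23), date(2015, 7, 30))
print(message)
```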

...

The User Data Request Queue is backed by SQS. Queue entries contain health code, email address, start date, and end date. (NOTE: It's probably a bad idea to have health code and email address live in the same place, even if only temporarily, as this means someone only needs to break into our AWS account to potentially get identified user health data. One alternative is to put just the email address in the SQS queue and have the Bridge User Data Download Service query Stormpath, but that still leaves us with personally identifying info in SQS.)

The Bridge User Data Download Service is a daemon process running in EC2. (TBD What framework? How is this deployed?) This process will poll the User Data Request Queue for user data requests. It then queries the HealthDataRecord and HealthDataAttachment DDB tables and the attachments S3 bucket to pull the raw health data records, bundles them up, and uploads them to S3. The service then sends an email to the user with a link to download their health data.
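
The "bundles them up" step can be sketched roughly as follows. This is a simplified Python illustration that works on in-memory dictionaries rather than DynamoDB and S3. The record fields `id` and `createdOn`, the function name `bundle_records`, and the one-JSON-file-per-record layout are all assumptions for the sketch, not the actual service's format.

```python
import io
import json
import zipfile
from datetime import date


def bundle_records(records, start_date, end_date):
    """Return zip bytes holding one JSON file per record in the date range."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for record in records:
            # Hypothetical field: the date the record was created (ISO 8601).
            created_on = date.fromisoformat(record["createdOn"])
            if start_date <= created_on <= end_date:
                archive.writestr(record["id"] + ".json", json.dumps(record))
    return buffer.getvalue()


# Made-up sample records; only the first falls inside the requested range.
records = [
    {"id": "rec-1", "createdOn": "2015-07-24", "data": {"steps": 100}},
    {"id": "rec-2", "createdOn": "2015-08-01", "data": {"steps": 250}},
]
bundle = bundle_records(records, date(2015, 7, 23), date(2015, 7, 30))
with zipfile.ZipFile(io.BytesIO(bundle)) as archive:
    print(archive.namelist())
```

In the real service the resulting bytes would be uploaded to S3; that step is omitted here.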

The Email with Download Link is an email with an S3 pre-signed URL that points to the user's health data. For security, the pre-signed URL expires after 24 hours. If the user still needs access to the data after 24 hours, a renewUserData link in the email generates a new S3 pre-signed URL. This link points to the renewUserData API in BridgePF server (TODO fill in these details), which follows a data flow similar to requestUserData. However, because the user data download has already been generated, the Bridge User Data Download Service simply creates a new S3 pre-signed URL and sends a new Email with Download Link. Advantages of this approach: users can download their data on a system other than their phone, and we don't need to build a web UI for users to download their data.
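
To make the expiry rule concrete, the following Python sketch computes when a link issued at a given time stops working and whether a renewal is needed. The 24-hour lifetime comes from the paragraph above; the helper names `link_expires_at` and `needs_renewal` are hypothetical and do not correspond to any Bridge API.

```python
from datetime import datetime, timedelta, timezone

# Lifetime of the pre-signed URL, as stated in the paragraph above.
LINK_LIFETIME = timedelta(hours=24)


def link_expires_at(issued_at: datetime) -> datetime:
    """Return the moment at which a link issued at `issued_at` expires."""
    return issued_at + LINK_LIFETIME


def needs_renewal(issued_at: datetime, now: datetime) -> bool:
    """Return True once the link has reached or passed its expiry time."""
    return now >= link_expires_at(issued_at)


issued = datetime(2015, 8, 19, 12, 0, tzinfo=timezone.utc)
print(needs_renewal(issued, issued + timedelta(hours=23)))
print(needs_renewal(issued, issued + timedelta(hours=25)))
```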

Alternative Designs

Bridge User Data Download as a Web Portal

Advantages:

  • fewer moving parts, since everything is encapsulated in a single service
  • security is simpler, since the user must provide their credentials to access Bridge User Data Download

Disadvantages:

  • higher dev cost
  • users need to get credentials from phone to log into web portal, which may or may not be easy.
  • BUDD reads from the SQS, aggregates the requested data (which takes roughly a minute, depending on the amount of data), and sends an email to the user with a link to where they can download the data. This link will expire after 12 hours.

BUDD Internal Structure

The main entry point into BUDD is BridgeUddWorker https://github.com/Sage-Bionetworks/BridgeUserDataDownloadService/blob/develop/src/main/java/org/sagebionetworks/bridge/udd/worker/BridgeUddWorker.java. This contains:

Development

Local Development

Create a fork from https://github.com/Sage-Bionetworks/BridgeUserDataDownloadService. Follow the steps in https://github.com/Sage-Bionetworks/BridgeUserDataDownloadService/blob/develop/README.md. (If you only plan to run the code and not edit it, you should be able to pull from the root fork directly.)

Testing

First, make sure your test account has uploads for the time range you want to test with.

To test through the Bridge Server, use the following example request:

POST https://webservices.sagebridge.org/v3/users/self/emailData
{
  "startDate":"2015-08-15",
  "endDate":"2015-08-19",
  "type":"DateRange"
}
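
For scripted testing, the request above could be assembled in Python along these lines. This is a sketch only: it builds the request object without sending it, and the `Bridge-Session` header used to carry the session token is an assumption here, so check the Bridge API documentation for the actual authentication mechanism. The function name `build_email_data_request` is hypothetical.

```python
import json
import urllib.request

EMAIL_DATA_URL = "https://webservices.sagebridge.org/v3/users/self/emailData"


def build_email_data_request(start_date: str, end_date: str,
                             session_token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for a date-range export."""
    body = json.dumps({
        "startDate": start_date,
        "endDate": end_date,
        "type": "DateRange",
    }).encode("utf-8")
    return urllib.request.Request(
        EMAIL_DATA_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumed header name for the authenticated session token.
            "Bridge-Session": session_token,
        },
    )


request = build_email_data_request("2015-08-15", "2015-08-19", "my-session-token")
print(request.get_method(), request.full_url)
# To send it for real: urllib.request.urlopen(request)
```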

To test against BUDD directly, log into the AWS Console, go to the SQS dashboard, and submit the following example request as an SQS message:

{
  "studyId":"api",
  "username":"dwayne.jeng+test01@sagebase.org",
  "startDate":"2015-07-23",
  "endDate":"2015-07-30"
}

Either method will send an email to your registered email address.

Deploy to Dev

Submit your code changes to your own personal fork. Create a pull request to the root fork. Once the pull request has been merged, Travis will automatically build and deploy to the dev server on Elastic Beanstalk.

Deploy to Staging/Prod

  • Create a workspace from the root fork (https://github.com/Sage-Bionetworks/BridgeUserDataDownloadService) if you don't already have one.
  • Make sure all branches are up to date (git pull as necessary).
  • Go to the staging branch (git checkout uat), merge from develop (git merge --ff-only develop).
  • Push back to GitHub (git push). This should trigger Travis to automatically build and deploy to the staging server on Elastic Beanstalk.

Follow similar steps for Prod.

Rolling Back Deployments

  • Log into the AWS Console and go to the Elastic Beanstalk Dashboard.
  • In the top nav bar drop down, go to Application Versions.
  • You'll see a list of versions named travis-[git commit hash]-[timestamp in epoch seconds]. Check the version you want to roll back to and click Deploy.
  • Select the environment from the drop down and click Deploy.
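
When choosing a version to roll back to, it can help to turn the epoch timestamp in the version name into a readable date. The Python sketch below parses the travis-[git commit hash]-[timestamp in epoch seconds] naming scheme described above. The regular expression assumes a lowercase hexadecimal commit hash, the example label is made up, and `parse_version_label` is a hypothetical helper name.

```python
import re
from datetime import datetime, timezone

# Assumes the commit hash is lowercase hex and the timestamp is all digits.
VERSION_PATTERN = re.compile(r"^travis-(?P<commit>[0-9a-f]+)-(?P<epoch>\d+)$")


def parse_version_label(label: str):
    """Split a version label into (commit hash, deploy time in UTC)."""
    match = VERSION_PATTERN.match(label)
    if match is None:
        raise ValueError(f"unrecognized version label: {label!r}")
    deployed_at = datetime.fromtimestamp(int(match.group("epoch")), tz=timezone.utc)
    return match.group("commit"), deployed_at


# Made-up label for illustration.
commit, deployed_at = parse_version_label("travis-0a1b2c3d-1440000000")
print(commit, deployed_at.isoformat())
```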

Access Logs

Logs can be found at https://logentries.com/. Credentials to the root Logentries account can be found at belltown:/work/platform/PasswordsAndCredentials/passwords.txt. Alternatively, get someone with account admin access to add your user account to Logentries.

If for some reason the logs aren't showing up in Logentries, file a support ticket with Logentries. In the meantime, you can reach the logs with the following alternate steps:

  • Log into the AWS Console and go to the Elastic Beanstalk Dashboard.
  • Select the environment you want to view logs for.
  • Click on Logs in the left nav bar.
  • In the drop down (top right), click Request Logs.
    • Last 100 Lines will give you a link to a page with the logs on screen.
    • Full Logs will give you a link to a zip file you can download.

If this still doesn't work, you can SSH directly into BUDD hosts (see below) and find the logs at /var/log/tomcat8/catalina.out.

Logging Into BUDD Hosts

You may need to be in the Fred Hutch intranet or logged into the Fred Hutch VPN for this to work.

  • Log into the AWS Console and go to the EC2 Dashboard.
  • Click on Instances in the left nav bar.
  • In the table, find the host(s) with the name Bridge-UDD-Worker-Dev (or whatever environment you want to log into). Select that host. (If there's more than one in the environment you want, select just one host.)
  • In the information panel on the bottom, find the field Public DNS host. This is the hostname you want to SSH into. But first, you'll need the PEM file to log in.
  • Log into belltown and download the PEM files from /work/platform/PasswordsAndCredentials.
  • On your machine, run ssh -i [path to PEM file] ec2-user@[EC2 hostname]

You can save yourself some time with an entry in your ~/.ssh/config that looks like this:

host Bridge-UDD-Dev
     HostName ec2-52-20-91-245.compute-1.amazonaws.com
     User ec2-user
     IdentityFile ~/Bridge-UDD-Dev.pem

Now you can just run ssh Bridge-UDD-Dev.

Next Steps

Short/Medium-Term

Jira tickets (sagebionetworks.jira.com):

  • BRIDGE-735 - add User Data Download to iOS SDK
  • BRIDGE-761 - Log archiving and alarms
  • BRIDGE-762 - Monitoring
  • BRIDGE-763 - Refactor shared copy-pasted code into shared package
  • BRIDGE-764 - Audit IAM credentials
  • BRIDGE-765 - Move Stormpath keys from env vars to key management solution

Long-Term

  • Performance improvements - Multi-threading? Map-Reduce?
  • Web Portal - Better user interface than email?
  • Data visualization - More useful than raw JSON dump
  • Caching/De-duping - If the user requests the same data again, use the existing master zip file instead of generating a new one. Also helpful if their link expires and they want to get the data again.
  • Cleanup task to delete old user requests and user-requested data?

See Also

Original design doc: Design Doc