...
- User requests their data in the app, specifying the start date and end date. (App may or may not supply default start and end dates.)
- The app calls the Bridge REST API with the start date and end date (requires user authentication).
- Bridge Server writes the request to an internal SQS queue. This request contains study ID, username, start date, and end date.
- BUDD reads from the SQS queue, aggregates the requested data (which takes roughly a minute, depending on the amount of data), and sends an email to the user with a link to where they can download the data. This link will expire after 12 hours.
...
BUDD
...
Internal Structure
The main entry point into BUDD is BridgeUddWorker https://github.com/Sage-Bionetworks/BridgeUserDataDownloadService/blob/develop/src/main/java/org/sagebionetworks/bridge/udd/
...
...
BridgeUddWorker.java. This contains:
- A loop which polls SQS for requests and parses those requests. (There is a wait time configured so that while testing, if you Ctrl+C out of the process, you don't get a bunch of errors from "can't connect to SQS".)
- Gets the study from the DynamoDB Study table (https://github.com/Sage-Bionetworks/BridgeUserDataDownloadService/blob/develop/src/main/java/org/sagebionetworks/bridge/udd/dynamodb/DynamoHelper.java#L41), because accounts and data are partitioned by study. This is needed to get the Stormpath information to get the user's account, as well as to get the configured "from" email address.
- Gets the account from Stormpath by email address. (The code says username, but we recently changed Bridge Server so that every username is the same as the account's email address, and everything just keys off the email address.)
- Gets the health ID from the user's account and queries DDB with the health ID to get the health code.
- Queries DDB SynapseTables to get a list of all Synapse health data tables for that study and SynapseSurveyTables to get a list of all survey metadata tables for that study.
- Queries the DynamoDB Upload table (index healthCode-uploadDate-index) for uploads with that health code (from the DynamoDB HealthId table) and with upload dates within the start and end date (inclusive).
- Calls the SynapsePackager (https://github.com/Sage-Bionetworks/BridgeUserDataDownloadService/blob/develop/src/main/java/org/sagebionetworks/bridge/udd/synapse/SynapsePackager.java) to download all the user's data from Synapse (within the specified date range). The SynapsePackager does the following.
- It kicks off a bunch of async tasks for each Synapse table and for each survey table.
- Some of these tasks are SynapseDownloadFromTableTask (https://github.com/Sage-Bionetworks/BridgeUserDataDownloadService/blob/develop/src/main/java/org/sagebionetworks/bridge/udd/ ... java), which queries a Synapse table for the user's data by health code and date range, downloads the results as a TSV, and downloads all the file handles.
- Some of these tasks are SynapseDownloadSurveyTask (https://github.com/Sage-Bionetworks/BridgeUserDataDownloadService/blob/develop/src/main/java/org/sagebionetworks/bridge/udd/ ... SynapseDownloadSurveyTask.java), which downloads the complete table of survey metadata as a TSV.
- Downloads the uploads from S3.
- Decrypts the uploads and writes them to a temp directory. (A new temp directory is created for every request.) Individual files are named in the format YYYY-MM-DD-UploadId.zip, so users can organize their uploads by date.
- Errors in downloading or decrypting are written to an error.log in the temp directory, which is included in the master zip file.
- Creates a master zip file called userdata-YYYY-MM-DD-to-YYYY-MM-DD-randomGuid.zip (start date and end date) and zips up all files (TSVs, file handles, upload files, and error.log) into it.
- Uploads the master zip file to S3.
- Creates a pre-signed URL for the master zip file (HTTP GET only, with a limited expiration time), deletes the temp files and temp directories, and returns the pre-signed URL to BridgeUddWorker.
- Emails the S3 pre-signed URL back to the user's registered email address (https://github.com/Sage-Bionetworks/BridgeUserDataDownloadService/blob/develop/src/main/java/org/sagebionetworks/bridge/udd/helper/SesHelper.java)
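The fan-out and packaging steps above can be illustrated with a minimal sketch. This is not the SynapsePackager code: the class name PackagerSketch, the fakeDownload stand-in (which writes a one-line TSV instead of calling Synapse), and the pool size of 4 are all assumptions made for illustration. Only the master zip naming pattern comes from the description above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical sketch of "one async task per table, then zip everything". */
final class PackagerSketch {
    /** Builds a name of the form userdata-START-to-END-randomGuid.zip. */
    static String masterZipName(String startDate, String endDate) {
        return "userdata-" + startDate + "-to-" + endDate + "-" + UUID.randomUUID() + ".zip";
    }

    /** Runs one download task per table in parallel, then zips the resulting files. */
    static Path packageTables(List<String> tableIds, Path tmpDir, String startDate, String endDate)
            throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(4); // pool size is an assumption
        try {
            // Kick off all tasks first so they run concurrently.
            List<Future<Path>> futures = new ArrayList<>();
            for (String tableId : tableIds) {
                Callable<Path> task = () -> fakeDownload(tableId, tmpDir);
                futures.add(executor.submit(task));
            }

            Path masterZip = tmpDir.resolve(masterZipName(startDate, endDate));
            try (ZipOutputStream zip = new ZipOutputStream(Files.newOutputStream(masterZip))) {
                for (Future<Path> future : futures) {
                    Path file = future.get(); // blocks until that task has finished
                    zip.putNextEntry(new ZipEntry(file.getFileName().toString()));
                    Files.copy(file, zip);
                    zip.closeEntry();
                }
            }
            return masterZip;
        } finally {
            executor.shutdown();
        }
    }

    /** Stand-in for a real table download: writes a header-only TSV file. */
    static Path fakeDownload(String tableId, Path tmpDir) throws IOException {
        Path file = tmpDir.resolve(tableId + ".tsv");
        Files.write(file, "healthCode\ttableId\n".getBytes(StandardCharsets.UTF_8));
        return file;
    }
}
```

Submitting every task before calling get() on any of them is what lets the downloads overlap; the zip step then simply waits for each result in turn.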
Development
Local Development
...
{
"studyId":"api",
"username":"dwayne.jeng+test01@sagebase.org",
"startDate":"2015-07-23",
"endDate":"2015-07-30"
}
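When hand-crafting a test request like the one above, it can help to sanity-check the fields before sending it. The following is a minimal sketch, not BUDD's own request class: UddRequestSketch and its methods are hypothetical, and the only facts taken from this page are the four field names and the YYYY-MM-DD date format.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

/** Hypothetical value class mirroring the four fields of a BUDD request. */
final class UddRequestSketch {
    final String studyId;
    final String username;
    final LocalDate startDate;
    final LocalDate endDate;

    UddRequestSketch(String studyId, String username, String startDate, String endDate) {
        if (studyId == null || studyId.isEmpty()) {
            throw new IllegalArgumentException("studyId is required");
        }
        if (username == null || username.isEmpty()) {
            throw new IllegalArgumentException("username is required");
        }
        this.studyId = studyId;
        this.username = username;
        this.startDate = parseDate("startDate", startDate);
        this.endDate = parseDate("endDate", endDate);
        if (this.startDate.isAfter(this.endDate)) {
            throw new IllegalArgumentException("startDate must not be after endDate");
        }
    }

    /** Parses an ISO-8601 calendar date (YYYY-MM-DD), reporting which field was bad. */
    private static LocalDate parseDate(String field, String value) {
        try {
            return LocalDate.parse(value);
        } catch (DateTimeParseException | NullPointerException e) {
            throw new IllegalArgumentException(field + " must be in YYYY-MM-DD format", e);
        }
    }

    /** True if the date falls within the requested range, with both ends inclusive. */
    boolean covers(LocalDate date) {
        return !date.isBefore(startDate) && !date.isAfter(endDate);
    }
}
```

For example, `new UddRequestSketch("api", "dwayne.jeng+test01@sagebase.org", "2015-07-23", "2015-07-30")` accepts the sample request, while swapping the two dates throws an IllegalArgumentException.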
...
- Log into the AWS Console and go to the Elastic Beanstalk Dashboard.
- In the top nav bar drop down, go to Application Versions.
- You'll see a list of versions named travis-[git commit hash]-[timestamp in epoch seconds]. Check the version you want to roll back to and click Deploy.
- Select the environment from the drop down and click Deploy.
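When picking a version to roll back to, the epoch-seconds suffix in the version name is not easy to read at a glance. The sketch below, which assumes the travis-[git commit hash]-[timestamp in epoch seconds] pattern described above and that the commit hash contains no dashes, splits a label and converts the timestamp to a UTC instant. VersionLabelSketch and the sample label used with it are hypothetical.

```java
import java.time.Instant;

/** Hypothetical parser for Elastic Beanstalk version labels named travis-HASH-EPOCHSECONDS. */
final class VersionLabelSketch {
    final String commitHash;
    final Instant builtAt;

    private VersionLabelSketch(String commitHash, Instant builtAt) {
        this.commitHash = commitHash;
        this.builtAt = builtAt;
    }

    static VersionLabelSketch parse(String label) {
        String[] parts = label.split("-");
        if (parts.length != 3 || !parts[0].equals("travis")) {
            throw new IllegalArgumentException("expected travis-HASH-EPOCHSECONDS, got: " + label);
        }
        // Long.parseLong throws NumberFormatException (an IllegalArgumentException) on bad input.
        long epochSeconds = Long.parseLong(parts[2]);
        return new VersionLabelSketch(parts[1], Instant.ofEpochSecond(epochSeconds));
    }
}
```

With a made-up label such as travis-abc1234-1438214400, `parse(...).builtAt` is 2015-07-30T00:00:00Z, which makes it easier to match a version against a known-good deployment date.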
Access Logs
Logs can be found at https://logentries.com/. Credentials to the root Logentries account can be found at belltown:/work/platform/PasswordsAndCredentials/passwords.txt. Alternatively, get someone with account admin access to add your user account to Logentries.
If, for some reason, the logs aren't showing up in Logentries, file a support ticket with Logentries. The alternate steps to reach the logs are below:
- Log into the AWS Console and go to the Elastic Beanstalk Dashboard.
- Select the environment you want to view logs for.
- Click on Logs in the left nav bar.
- In the drop down (top right), click Request Logs.
- Last 100 Lines will give you a link to a page with the logs on screen.
- Full Logs will give you a link to a zip file you can download.
If this still doesn't work, you can SSH directly into the BUDD hosts (see below) and find the logs at /var/log/tomcat8/catalina.out.
Logging Into BUDD Hosts
You may need to be in the Fred Hutch intranet or logged into the Fred Hutch VPN for this to work.
- Log into the AWS Console and go to the EC2 Dashboard.
- Click on Instances in the left nav bar.
- In the table, find the host(s) with the name Bridge-UDD-Worker-Dev (or whatever environment you want to log into). Select that host. (If there's more than one in the environment you want, select just one host.)
- In the information panel on the bottom, find the field Public DNS host. This is the hostname you want to SSH into. But first, you'll need the PEM file to log in.
- Log into belltown and download the PEM files from /work/platform/PasswordsAndCredentials
- On your machine, run ssh -i [path to PEM file] ec2-user@[EC2 hostname]
...
Now you can just run ssh Bridge-UDD-Dev, and it will just work.
Next Steps
Short/Medium-Term
...