...

Note that this also includes the ExporterDate. This is exported to Synapse as UploadDate, but it is different from the UploadDate in the health data record in DDB. The UploadDate in DDB tracks when the health data was uploaded to Bridge (generally yesterday), while the ExporterDate (UploadDate in Synapse) tracks when the Exporter ran and the data was uploaded to Synapse. The design decision was made this way because researchers want to operate on "today's data drop" (ExporterDate), while the Exporter needs a cutoff for exporting (all data up through the end of yesterday, that is, UploadDate). Since the Bridge Exporter runs every day at ~2am and pulls yesterday's data, these dates are (in normal circumstances) always different.

The two dates exist because they serve similar but distinct use cases. The UploadDate in DDB lets the Bridge Exporter keep track of which data set it needs to export to Synapse. The ExporterDate (UploadDate in Synapse) lets researchers keep track of which data they have processed and determine "today's data drop".
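The relationship between the two dates can be sketched as follows. This is only an illustration, not the Exporter's actual code: the function name and the record fields ("id", "uploadDate") are hypothetical, and it assumes the Exporter selects exactly yesterday's records.

```python
from datetime import date, timedelta


def select_and_stamp(records, run_date):
    """Pick yesterday's records and stamp them with the run date.

    records  -- dicts with hypothetical keys "id" and "uploadDate"
                (the DDB UploadDate, as an ISO date string)
    run_date -- the date the Exporter runs (the ExporterDate)
    """
    # The Exporter's cutoff: data uploaded to Bridge yesterday.
    cutoff = (run_date - timedelta(days=1)).isoformat()
    exported = []
    for record in records:
        if record["uploadDate"] == cutoff:
            # In Synapse, the column named UploadDate holds the ExporterDate.
            exported.append({
                "recordId": record["id"],
                "UploadDate": run_date.isoformat(),
            })
    return exported
```

For a run on 2016-03-17, a record whose DDB UploadDate is 2016-03-16 is exported with a Synapse UploadDate of 2016-03-17, which is why the two values normally differ by one day.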

(worker) ExportWorker - An asynchronous execution. The Worker Manager will create one for each record, and may create more than one if the record needs, for example, both the Health Data handler and the AppVersion handler. This is executed in the Worker Manager's asynchronous thread pool and tracked in the ExportTask's asynchronous execution queue.

...

createTableWithColumnsAndAcls() encapsulates logic to create a Synapse table with the given columns, principal ID (table owner), and data access team ID (permissions to view table). This is a common pattern found in all tables created by Bridge-EX. This is used by SynapseExportHandler and its children as well as the SynapseStatusTableHelper.

...

(synapse) SynapseStatusTableHelper - At the end of every request, the Worker Manager will invoke the SynapseStatusTableHelper to write a row to the status table of each study. This is intended for use by automated systems to determine when the Exporter has finished running for the night. Currently, the status row contains only today's date. The SynapseStatusTableHelper also creates the status table if it doesn't yet exist, offloading the logic to SynapseHelper.

(synapse) SynapseTableIterator - Currently defunct. This is an iterator that steps through Synapse table rows one at a time. It was previously used by various helper scripts but is currently unused.

(util) BridgeExporterUtil - Static helper functions for Bridge-EX. Contains methods to build a schema key from a health data record DDB item and methods to extract values from DDB items and JSON objects and sanitize them (including stripping HTML, stripping newlines and tabs, and truncating strings to fit maximum length restrictions).
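As a rough illustration of the sanitizing steps described above (strip HTML, strip newlines and tabs, truncate), here is a minimal Python sketch. It is not the BridgeExporterUtil implementation: the function name is hypothetical, the regex-based tag removal is a crude stand-in for a real HTML library, and replacing whitespace runs with a single space is an assumption.

```python
import re

# Crude tag matcher; a real implementation would likely use an HTML parser.
_TAG_PATTERN = re.compile(r"<[^>]*>")
# Runs of carriage returns, newlines, and tabs.
_BREAK_PATTERN = re.compile(r"[\r\n\t]+")


def sanitize_string(value, max_length):
    """Strip HTML tags, collapse newlines/tabs to a space, then truncate."""
    if value is None:
        return None
    text = _TAG_PATTERN.sub("", value)
    text = _BREAK_PATTERN.sub(" ", text)
    # Truncate last so the length limit applies to the cleaned string.
    return text[:max_length]
```

Truncating after the other steps means markup that gets stripped does not count against the maximum length.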

DynamoDB Tables

Exporter-Scheduler-Config - Contains configuration for Bridge-EX-Scheduler to call Bridge-EX. Lambda is unable to pass any parameters into Bridge-EX-Scheduler other than the function name, so we key off of function name and use this table to get Scheduler configs.

  • schedulerName (hash key) - Matches the Lambda function name. Used to distinguish between devo, staging, and prod.
  • sqsQueueUrl - SQS queue to write requests to
  • timeZone - Currently configured to America/Los_Angeles (equivalent to Pacific Time) for all envs. In the future, if we need to launch Bridge-EX stacks in other regions, this may have other values.
  • requestOverrideJson - Optional. Request template that the Bridge-EX-Scheduler uses, filling in "date" with yesterday's date. Generally used for specialized stacks with special parameters or for testing. Example:
{
  "studyWhitelist":["api", "breastcancer", "parkinson"],
  "sharingMode":"PUBLIC_ONLY"
}
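How the Scheduler might turn this template into a request can be sketched as below. This is an assumption-laden illustration in Python rather than the Scheduler's own code: the function name is hypothetical, "today" is taken to be already resolved in the configured timeZone, and overwriting any "date" already present in the template is an assumption.

```python
import json
from datetime import date, timedelta


def build_request(request_override_json, today):
    """Build a Bridge-EX request from an optional JSON template.

    request_override_json -- template string from the config table, or None
    today                 -- current date in the configured timeZone
    """
    # Start from the template if one is configured, else from an empty request.
    request = json.loads(request_override_json) if request_override_json else {}
    # Fill in "date" with yesterday's date.
    request["date"] = (today - timedelta(days=1)).isoformat()
    return request
```

With the example template above and a run on 2016-03-17, the resulting request keeps studyWhitelist and sharingMode and gains "date": "2016-03-16".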

(ddbPrefix)SynapseMetaTables - Bridge-EX automatically writes to this table to keep track of meta tables (specifically appVersion tables and status tables). The key is the table name, generally of the form "parkinson-appVersion" or "parkinson-status", and it maps to the Synapse table ID. Bridge-EX uses this table to remember if it's already created a table, and if so, where to find that table.

(ddbPrefix)SynapseTables - Similar to SynapseMetaTables, Bridge-EX automatically writes to this table to keep track of tables, in this case, health data tables. The key is the schema name, flattened into the form "parkinson-TappingActivity-v6", which also maps to Synapse table IDs.
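The lookup-or-create pattern behind both tables can be sketched roughly as follows. The function names are hypothetical, a plain dict stands in for the DynamoDB table, and the create_table callback stands in for whatever call actually creates the Synapse table.

```python
def schema_table_key(study_id, schema_id, revision):
    """Flatten a schema into a key such as "parkinson-TappingActivity-v6"."""
    return f"{study_id}-{schema_id}-v{revision}"


def meta_table_key(study_id, kind):
    """Build a meta table key such as "parkinson-appVersion"."""
    return f"{study_id}-{kind}"


def get_or_create_table_id(table_map, key, create_table):
    """Return the Synapse table ID for key, creating the table at most once.

    table_map    -- dict standing in for the DDB table (key -> Synapse table ID)
    create_table -- zero-argument callable that creates the table, returns its ID
    """
    table_id = table_map.get(key)
    if table_id is None:
        table_id = create_table()
        # Remember the mapping so later runs reuse the same table.
        table_map[key] = table_id
    return table_id
```

Because the mapping is persisted, a second export for the same schema finds the existing table ID instead of creating a duplicate table.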

Operations

Deployment

Bridge-EX

  1. Bridge-EX changes are committed to our GitHub repository (generally via pull requests): https://github.com/Sage-Bionetworks/Bridge-Exporter
  2. Travis (https://travis-ci.org/Sage-Bionetworks/Bridge-Exporter) automatically builds the latest commit and deploys it to AWS Elastic Beanstalk according to the Travis configuration (https://github.com/Sage-Bionetworks/Bridge-Exporter/blob/develop/.travis.yml)
  3. AWS Elastic Beanstalk automatically deploys the Bridge-EX code to the AWS-managed EC2 cluster (currently configured to be a "cluster" of one machine), and then automatically starts the service.
  4. To test, go to the SQS console and generate a sample request into the appropriate SQS queue.
  5. To deploy to staging (or prod), merge the code in GitHub from the develop branch to the uat branch (or from the uat branch to the prod branch). Using a local repository cloned from the root fork, run the following commands:
    1. git checkout develop
    2. git pull
    3. git checkout uat
    4. git merge --ff-only develop
    5. git push

Bridge-EX-Scheduler

  1. Make the Bridge-EX-Scheduler changes in your local repository and commit to the root in GitHub (generally via pull request): https://github.com/Sage-Bionetworks/Bridge-EX-Scheduler
  2. In your local repository, run "mvn verify", then upload target/Bridge-EX-Scheduler-2.0.jar to AWS Lambda using the AWS Lambda console.
    1. Unfortunately, Travis doesn't support automated deployments of Java to AWS Lambda, so we have to do it manually.
  3. To test, click the "Test" button in AWS Lambda.

Troubleshooting

Logs

NOTE: We're having data loss issues with Logentries. See Logentries support ticket (https://support.logentries.com/hc/en-us/requests/12415) and BRIDGE-1241 (sagebionetworks.jira.com).

Logs can be found at https://logentries.com/. Credentials to the root Logentries account can be found at belltown:/work/platform/PasswordsAndCredentials/passwords.txt. Alternatively, get someone with account admin access to add your user account to Logentries.

If, for some reason, the logs aren't showing up in Logentries, file a support ticket with Logentries. The alternative is to go to the AWS Elastic Beanstalk console, go to the environment you need logs for, go to Logs, and click on Request Logs. This will allow you to access the logs in your browser (if you choose Last 100 Lines) or download the logs to disk (if you choose Full Logs). The log file you're looking for is catalina.out.

If this doesn't work, you can try SSHing directly into the host.

  1. To find the hostname, find the host tagged with the appropriate name (for example, Bridge-EX-Prod), select it, and note the Public DNS in the description (for example, ec2-52-91-223-70.compute-1.amazonaws.com).
  2. Download the security PEM from belltown:/work/platform/PasswordsAndCredentials/Bridge-EX-Prod.pem (or equivalent for another env).
  3. (This is optional, but makes things easier.) Set up your ~/.ssh/config with the following (replacing HostName and IdentityFile as needed). The host can be anything you want. User must be ec2-user.
    host BridgeEX2-Prod
         HostName ec2-52-91-223-70.compute-1.amazonaws.com
         User ec2-user
         IdentityFile ~/Bridge-EX-Prod.pem
  4. SSH into the host. You may need to be in the Fred Hutch intranet or log into the Fred Hutch VPN.
  5. Logs can be found at /var/log/tomcat8/catalina.out

Metrics to Look For

When scrubbing the Bridge-EX logs, the key metrics to look for are:

  • number of ERRORs - Lots of errors generally means something is wrong at the systemic level. A single error generally means a record failed to upload to Synapse and is worth redriving or repairing. See Redrives for more info.
    • The only ERROR worth ignoring is "Unable to parse sharing options for hash[healthCode]=-691460808, sharing scope value=null". However, if there are a lot of these, this generally indicates a systemic error.
    • On the flip side, exceptions and warnings generally aren't a problem. They generally are things like "#createFileHandleWithRetry(): attempt #1 of 5 failed", which indicates a Synapse call failed and was retried. That said, be sure to look at exceptions and warnings in case there are other problems.
  • A log line that looks like "Finished processing request in 835 seconds, date=2016-03-16, tag=[scheduler=Bridge-EX-Scheduler-prod;date=2016-03-16]". This indicates that Bridge-EX completed successfully and how long it took. If this line is missing, it indicates that Bridge-EX never completed. If this request time is significantly higher than usual, this indicates a systemic problem.
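The manual scrub described above could be approximated with a small script such as the one below. This is a sketch under assumptions, not an existing tool: it assumes the log level appears as the literal token "ERROR" on each error line, and the function and result key names are hypothetical.

```python
import re

# Matches the completion line quoted above and captures seconds and date.
FINISHED_PATTERN = re.compile(
    r"Finished processing request in (\d+) seconds, date=(\d{4}-\d{2}-\d{2})"
)
# The one ERROR message the text says is generally safe to ignore.
IGNORABLE_ERROR = "Unable to parse sharing options"


def scrub_log(lines):
    """Summarize error counts and the completion line from log lines."""
    summary = {"errors": 0, "ignorable_errors": 0,
               "finished_seconds": None, "finished_date": None}
    for line in lines:
        if "ERROR" in line:
            if IGNORABLE_ERROR in line:
                summary["ignorable_errors"] += 1
            else:
                summary["errors"] += 1
        match = FINISHED_PATTERN.search(line)
        if match:
            summary["finished_seconds"] = int(match.group(1))
            summary["finished_date"] = match.group(2)
    return summary
```

If finished_seconds is still None after scanning a night's log, the completion line is missing, which per the list above means Bridge-EX never completed.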

Below are other issues that are worth looking at, but are too cumbersome to look at manually. Rather, these are things we need to build an automated monitoring and alarm system for:

  • accepted[ALL_QUALIFIED_RESEARCHERS], accepted[SPONSORS_AND_PARTNERS], excluded[NO_SHARING] - If there's a big shift in these numbers, it may indicate a bug in the Sharing Settings in Bridge, or possibly a major change in the app.
  • parkinson-appVersion.lineCount (and similar for other studies) - These indicate the total number of entries exported to Synapse for a particular study. If this number shifts (up or down) by a lot, it may indicate a problem in the app or in Bridge.
  • parkinson-TappingActivity-v6.lineCount (and similar for other studies and schemas) - Similarly, if any particular table sees large shifts, that could be a problem.
  • *.errorCount - If this appears at all, that means there's an error. This generally doesn't suggest a systemic issue (unless the error count is high, in which case our logscan alarms would go off), but rather indicates that we need to redrive some records.
  • numTotal - The total number of records Bridge-EX saw today across all studies and schemas, including records that were excluded or filtered out. Similarly, if this number shifts by a lot, it could be a problem.
  • uniqueHealthCodes[parkinson] (and similar for other studies) - This represents the number of active users. If this drops suddenly, it indicates data loss somewhere in the Bridge pipeline. If the number rises suddenly, it may not be an issue, but it's worth understanding the cause behind it.
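One way an automated check over these metrics might look is sketched below. Nothing here exists in Bridge-EX today: the function name, the 50% threshold, and the idea of comparing against the previous run's values are all assumptions for illustration.

```python
def flag_metric_shifts(previous, current, threshold=0.5):
    """Compare two runs' metrics and return the ones worth a closer look.

    previous, current -- dicts of metric name -> count
    threshold         -- relative change (0.5 = 50%) treated as a big shift;
                         an arbitrary value chosen for this sketch
    """
    flagged = {}
    for name in sorted(previous.keys() | current.keys()):
        old = previous.get(name, 0)
        new = current.get(name, 0)
        if name.endswith(".errorCount"):
            # Any error count at all means some records need a redrive.
            if new > 0:
                flagged[name] = "errors present"
            continue
        if old == 0:
            # No baseline to compare against; report newly seen metrics.
            if new > 0:
                flagged[name] = "no previous value"
            continue
        change = (new - old) / old
        if abs(change) >= threshold:
            flagged[name] = f"changed by {change:+.0%}"
    return flagged
```

A drop in uniqueHealthCodes[parkinson] from 200 to 80 would be flagged, while numTotal moving from 1000 to 1010 would not.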

Currently, we manually scrub our logs about once a week. We want to move this to an automated monitoring and alarming system. This may involve pumping the logs to CloudWatch (or another system) or writing a custom solution. It may involve sending the metrics in a different format so our monitoring solution doesn't need to parse raw logs.

See BRIDGE-1225 (sagebionetworks.jira.com) for tracking automated monitoring.

Monitoring and Alarms

We have logscan alarms in Logentries for 10+ ERRORs in an hour or for 100+ WARNs in an hour. These alarms send an email to bridgeit@sagebase.org.

We don't have a good automated monitoring system outside of logscans. See above or see BRIDGE-1225 (sagebionetworks.jira.com) for more details.

Redrives

Limitations

Legacy Hacks

  • Upload freeform text as attachments
  • Converting old surveys to health data

More Info

Bridge Data Pipeline

...