...

(config) WorkerLauncher - This is a command-line runner that Spring Boot knows about. Spring Boot automatically calls the run() method on this when it's done loading the Spring context. This is what sets up and runs the PollSqsWorker (defined in bridge-base), which in turn calls the BridgeExporterSqsCallback when it gets a request. The WorkerLauncher currently does everything single-threaded, since Bridge-EX workers are already heavily multi-threaded and we never need to run multiple Export requests in parallel.

...

When the Record Processor is done iterating through all the records, it calls endOfStream(), signalling to ExportWorkerManager that there are no more subtasks. At this point, ExportWorkerManager blocks until all asynchronous executions in the current task's queue are complete (using the synchronous Future.get(); this can be thought of as the equivalent to fork-join). Once complete, it will signal to each HealthData and AppVersion handler to upload its payload to Synapse, then invoke the SynapseStatusTableHelper to update status tables.
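The block-until-complete step described above can be sketched roughly as follows. This is a minimal illustration, not the actual ExportWorkerManager code: the class name SubtaskQueueSketch, the methods addSubtask() and awaitAll(), and the thread pool size are all assumptions made for the example. Only the use of the synchronous Future.get() comes from the description above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for the task's queue of outstanding subtasks.
class SubtaskQueueSketch {
    // Assumption: a small fixed pool; the real worker's pool is configured elsewhere.
    private final ExecutorService executor = Executors.newFixedThreadPool(4);
    private final List<Future<?>> outstanding = new ArrayList<>();

    // Called once per record: submit the subtask and remember its Future.
    void addSubtask(Runnable subtask) {
        outstanding.add(executor.submit(subtask));
    }

    // Called at end of stream: block until every subtask has finished.
    // Returns the number of subtasks that threw an exception.
    int awaitAll() {
        int failures = 0;
        try {
            for (Future<?> future : outstanding) {
                try {
                    future.get(); // synchronous wait, the "join" half of fork-join
                } catch (ExecutionException ex) {
                    failures++; // the subtask itself failed; a real worker would log this
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting", ex);
                }
            }
        } finally {
            outstanding.clear();
            executor.shutdown();
        }
        return failures;
    }
}
```

Because awaitAll() only returns after every Future has completed, anything that runs after it (such as the upload step) sees the results of all subtasks.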

(worker) TsvInfo - Every ExportTask contains a TsvInfo for each study AppVersion table and for each schema. The TsvInfo contains a reference to the TSV file on disk and a writer for that file. Given a mapping from column names to column values, it knows how to write that to the TSV. To do this, it keeps track of the column names; it also keeps line counts for metrics. The handlers will get the TsvInfo from the ExportTask and call writeRow() with the column value map.

It is theoretically possible for two asynchronous executions to call the same handler with the same TsvInfo on two different records. In this case, it's unclear how the data will be written to the TSV. To prevent this, we synchronize writeRow().
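As a rough illustration of the synchronized writeRow() described above, here is a simplified sketch. TsvInfoSketch is a hypothetical stand-in, not the real TsvInfo: it writes to any Writer instead of a file on disk, and writing missing columns as empty cells is an assumption made for the example.

```java
import java.io.PrintWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for TsvInfo.
class TsvInfoSketch {
    private final List<String> columnNames;
    private final PrintWriter writer;
    private int lineCount = 0;

    TsvInfoSketch(List<String> columnNames, Writer out) {
        this.columnNames = new ArrayList<>(columnNames);
        this.writer = new PrintWriter(out);
        this.writer.println(String.join("\t", this.columnNames)); // header row
    }

    // synchronized: only one thread at a time can write a row,
    // so cells from two different records never interleave.
    synchronized void writeRow(Map<String, String> rowValueMap) {
        List<String> cells = new ArrayList<>();
        for (String name : columnNames) {
            // Assumption: a missing column value becomes an empty cell.
            cells.add(rowValueMap.getOrDefault(name, ""));
        }
        writer.println(String.join("\t", cells));
        lineCount++;
    }

    // Line count (excluding the header), as would be reported in metrics.
    synchronized int getLineCount() {
        return lineCount;
    }

    synchronized void flush() {
        writer.flush();
    }
}
```

The lock is per TsvInfo instance, so handlers writing to different TSVs do not block each other.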

Handlers

(handler) AppVersionExportHandler - This handler is for the poorly named AppVersion table. This table was originally meant to track AppVersions for each health data record. As the system evolved, this became the table used to track metadata for all records, used to compute stats and metrics, but the name stuck.

...

(handler) HealthDataExportHandler - Every schema has its own HealthData handler. The Worker Manager calls this handler with a subtask, and the HealthData handler will serialize the record's data into Synapse-compatible values (calling the SynapseHelper) and write those values to the Synapse table. This also includes logic to generate Synapse table column definitions based on a schema. See the parent class SynapseExportHandler for more details.

(handler) IosSurveyExportHandler - A special iOS Survey handler to support a legacy hack. See Legacy Hacks for more details.

(handler) SynapseExportHandler - This is the abstract parent class of both the HealthData handler and the AppVersion handler. It encapsulates logic to initialize the TsvInfo for a given task and schema, create the Synapse table if it doesn't already exist, generate column definitions and column values for common metadata columns, and upload the TSV to Synapse when the task is done.

Helpers

(dynamo) DynamoHelper - BridgeEX-specific DDB helper. Encapsulates querying, parsing, and caching for schemas, studies, and participant sharing scope. Also handles defaulting sharing scope to "no sharing" if the sharing scope could not be obtained from DDB.

(helper) ExportHelper - This helper contains some complex logic used for legacy hacks. See Legacy Hacks for more details.

(metrics) Metrics - Object for tracking metrics for a request; it lives inside the ExportTask. CounterMap allows you to associate a count with a string. (Example: parkinson-TappingActivity-v6.lineCount = 73) KeyValuesMap allows you to associate one or more values with a given key. (Example: uniqueAppVersions[parkinson] = ["version 1.0.5, build 12", "version 1.2, build 31", "version 1.3-pre9, build 40"]) SetCounterMap also associates a count with a string, but each count is driven by a value, which the counter dedupes on, so the count is the number of unique values seen. (Example: uniqueHealthCodes[parkinson] = 49) All metrics use sorted data structures, so Bridge-EX can log the metric values in alphabetical order, for ease of log viewing.

Note that KeyValuesMap and SetCounterMap are both backed by a multimap. We keep them separate to make it clear that KeyValuesMap is used for the values while SetCounterMap is used for the count. The distinction exists because (a) KeyValuesMap is expected to have a small set of values while SetCounterMap is expected to have a large set of values, and (b) SetCounterMap counts unique health codes, which we don't want to write to our logs.
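To make the values-versus-count distinction concrete, here is a small sketch of a sorted multimap exposing both views. SortedMultimapSketch and its methods are hypothetical names for illustration. The real KeyValuesMap and SetCounterMap are separate classes, and their backing multimap implementation is not shown here; this sketch uses plain java.util collections instead.

```java
import java.util.Collections;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: one sorted multimap, two ways of reading it.
class SortedMultimapSketch {
    // TreeMap and TreeSet keep keys and values in alphabetical order for logging.
    private final Map<String, SortedSet<String>> backing = new TreeMap<>();

    // Adding the same value twice under one key has no effect (set semantics).
    void add(String key, String value) {
        backing.computeIfAbsent(key, k -> new TreeSet<>()).add(value);
    }

    // KeyValuesMap-style view: the (small) set of values itself, safe to log.
    SortedSet<String> values(String key) {
        return Collections.unmodifiableSortedSet(backing.getOrDefault(key, new TreeSet<>()));
    }

    // SetCounterMap-style view: only the number of distinct values, so the
    // values themselves (for example health codes) never reach the logs.
    int count(String key) {
        SortedSet<String> set = backing.get(key);
        return set == null ? 0 : set.size();
    }
}
```

For example, adding "version 1.2, build 31" twice under the key "parkinson" yields a single entry in values("parkinson"), and count("parkinson") reports only distinct values.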

(metrics) MetricsHelper - Helper to encapsulate some metrics-related functionality. This is used only by the Record Processor and exists mainly to keep the Record Processor from becoming too complicated. Current responsibilities include capturing metrics common to all records and writing all metrics to the logs at the end of a request.

(synapse) SynapseHelper - Includes simple wrappers around Synapse Java Client calls, which add Retry annotations, as well as shared complex logic.

serializeToSynapseType() is used to convert raw health data records from JSON values in DDB to values that can be used in Synapse tables. If a value can't be serialized to the given type, it will return null. It also includes logic for downloading attachments from S3 and uploading them to Synapse as file handles. This is currently only used by the HealthData handler.

uploadFromS3ToSynapseFileHandle() is a helper method to download an attachment from S3 and upload it to Synapse as a file handle. This also uses the Bridge type to determine the correct MIME type for JSON and for CSVs, defaulting to application/octet-stream if the type is ambiguous. This is currently only used by SynapseHelper.serializeToSynapseType().
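The MIME type selection described above might look something like the sketch below. The enum AttachmentType and its constants are hypothetical placeholders, since the actual Bridge type names are not listed here. The application/json and text/csv values are the standard MIME types for those formats, and the application/octet-stream fallback comes from the description above.

```java
// Hypothetical sketch of picking a MIME type from an attachment type.
class MimeTypeSketch {
    // Assumption: placeholder type names; the real Bridge types may differ.
    enum AttachmentType { JSON, CSV, BLOB }

    static String mimeTypeFor(AttachmentType type) {
        if (type == null) {
            return "application/octet-stream"; // unknown type, use the generic default
        }
        switch (type) {
            case JSON:
                return "application/json";
            case CSV:
                return "text/csv";
            default:
                return "application/octet-stream"; // ambiguous type, use the generic default
        }
    }
}
```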

generateFilename() includes some clever logic to preserve file extensions (in case researchers actually care about file extensions), or insert a new file extension if the existing one doesn't make sense (for things like JSON or CSVs). This is currently only used by SynapseHelper.uploadFromS3ToSynapseFileHandle().

uploadTsvFileToTable() encapsulates multiple Synapse calls used to upload TSVs to a Synapse table, as well as a poll-and-wait loop to process the asynchronous call as a blocking call. This is used by SynapseExportHandler and its children.
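The poll-and-wait pattern mentioned above, which turns an asynchronous job into a blocking call, can be sketched generically as follows. This does not use the Synapse Java Client: the poll supplier stands in for whatever call checks the job status, and PollAndWaitSketch, pollUntilReady(), maxAttempts, and sleepMillis are all hypothetical names chosen for the example.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical sketch of a poll-and-wait loop around an asynchronous job.
class PollAndWaitSketch {
    // Calls poll until it returns a result or maxAttempts is reached.
    // An empty Optional means "job not finished yet".
    static <T> T pollUntilReady(Supplier<Optional<T>> poll, int maxAttempts, long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Optional<T> result = poll.get();
            if (result.isPresent()) {
                return result.get();
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis); // wait before asking again
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while polling", ex);
                }
            }
        }
        throw new IllegalStateException("job not ready after " + maxAttempts + " attempts");
    }
}
```

Bounding the number of attempts keeps a stuck job from blocking the worker forever; how the real helper handles timeouts is not covered here.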

createTableWithColumnsAndAcls() encapsulates logic to create a Synapse table with the given columns, principal ID (table owner), and data access team ID (permissions to view table). This is a common pattern found in all tables created by Bridge-EX. This is used by SynapseExportHandler and its children as well as the SynapseStatusTableHelper.

(synapse) SynapseStatusTableHelper

...