...
When the Record Processor is done iterating through all the records, it calls endOfStream(), signalling to ExportWorkerManager that there are no more subtasks. At this point, ExportWorkerManager blocks until all asynchronous executions in the current task's queue are complete (using the synchronous Future.get(); this can be thought of as the equivalent to fork-join). Once complete, it will signal to each HealthData and AppVersion handler to upload its payload to Synapse, then invoke the SynapseStatusTableHelper to update status tables.
(worker) TsvInfo - Every ExportTask contains a TsvInfo for each study AppVersion table and for each schema. The TsvInfo contains a reference to the TSV file on disk and a writer for that file. Given a mapping from column names to column values, it knows how to write that to the TSV. It keeps track of the column names so it can do this as well as line counts for metrics. The handlers will get the TsvInfo from from the ExportTask and call writeRow() with the column value map.
It is theoretically possible for two asynchronous executions to call the same handler with the same TsvInfo on two different records. In this case, it's unclear how the data will be written to the TSV. To prevent this, we synchronize writeRow().
Handlers
(handler) AppVersionExportHandler - This handler is for the poorly named AppVersion table. This table was originally meant to track AppVersions for each health data record. As the system evolved, this became the table used to track metadata for all records, used to compute stats and metrics, but the name stuck.
Every study has its own AppVersion handler. The Worker Manager calls this handler with a subtask, and the AppVersion handler will write the subtask's metadata to the AppVersion table, including a column for the original table, where the record's content lives. See the parent class SynapseExportHandler for more details.
(handler) ExportHandler - Abstract parent class for all handlers. The primary purpose for this class is to the ExportWorker (an asynchronous execution) can have a reference to a handler without needing to know what kind of handler it is. It also contains references to the Worker Manager and other various getters and setters concrete handlers need.
(handler) HealthDataExportHandler - Every schema has its own HealthData handler. The Worker Manager calls this handler with a subtask, and the HealthData handler will serialize the record's data into Synapse-compatible values (calling the SynapseHelper) and write those values to the Synapse table. This also includes logic to generate Synapse table column definitions based on a schema. See the parent class SynapseExportHandler for more details.
(handler) IosSurveyExportHandler
...
https://github.com/Sage-Bionetworks/Bridge-Exporter
...