This document contains instructions for both study administrators and Bridge Downstream developers. Once the study administrator has requested that a new study be set up and has alerted the developers to the first batch of data exported to Synapse, the developers will execute the steps described here to ensure that data is flowing smoothly from Bridge and through the pipeline.
...
Because we don't necessarily want data to be submitted if the Parquet dataset specified in --diff-s3-uri
does not exist (perhaps we specified the dataset name incorrectly and are unknowingly submitting all data in the study project each time the script is invoked), the bootstrap trigger script must be run manually for the first batch of data to ensure that the --diff-s3-uri
dataset exists. This can be done by running the same command as specified in the cron file after removing the following parameters:
--diff-s3-uri
...
--diff-parquet-field
--diff-file-view-field
...
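As a rough sketch, the first run differs from the recurring cron invocation only in the absence of the --diff-* flags. The script name, bucket, and field values below are placeholders for illustration; only the parameter names come from this page, so substitute the actual command from the study's cron file.

```shell
# Recurring run (as scheduled in the crontab) — diffs against the existing
# Parquet dataset so only new data is submitted. All values are hypothetical.
python bootstrap_trigger.py \
    --diff-s3-uri s3://example-bucket/parquet/dataset_example \
    --diff-parquet-field recordid \
    --diff-file-view-field recordId

# First run — same command with the --diff-* parameters removed, so the full
# first batch is submitted and the diff dataset gets created.
python bootstrap_trigger.py
```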
Once the /wiki/spaces/BD/pages/2746613791 completes, the /wiki/spaces/BD/pages/2749759500 can be run manually. This is a good opportunity to verify that jobs are completing successfully, that data is passing /wiki/spaces/BD/pages/2751594608, and that the resulting Parquet datasets can be read using pyarrow. Once a full run through the pipeline is complete, study data will be automatically processed on a recurring schedule as specified in the crontab in the previous step.
...