...

  • This precludes nightly or hourly processing, as either would add significant delay before the data is returned.

  • How much of a delay is acceptable? Minutes? The answer determines whether we can do batch processing on a short schedule (e.g. once every 5 minutes), or whether we need to trigger a job for each individual upload.

  • Note that at current upload rates (peaking below 100 per day across all of MTB, and averaging much lower), there is probably no practical difference between a batch job every 5 minutes and triggering per individual upload.

ARC Measures

Things we will have to do regardless of our design decisions:

  • Modify the Glue job to trigger off each upload instead of a cron schedule. (Or move the Python script to something that’s not Glue.)

  • Aggregate data across all dates and participants.

  • Write the aggregated data back to Bridge.

  • Write the post-processing status to Adherence.

  • Add a Bridge API to download the aggregated data as a CSV.

  • Add a Researcher UI feature to surface the CSV-download API.

  • Note that the Bridge Downstream code hardcodes appId=mobile-toolbox. We want to un-hardcode this and either read the appId from the FileEntity annotations, or propagate the appId through each step so that we never lose it.

    • Also, the appId is currently being set by the app. This should instead be a top-level property in the Exporter itself.
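The first item above (triggering the Glue job per upload instead of on a cron schedule) could be sketched as an S3 event notification that invokes a Lambda, which then starts the job run. This is only an illustration: the job name, argument keys, and event wiring are assumptions, not part of the current Bridge Downstream code.

```python
# Hypothetical sketch: S3 upload event -> Lambda -> Glue job run.
# The job name and the "--source_bucket"/"--source_key" argument names
# are assumptions for illustration only.
import urllib.parse

GLUE_JOB_NAME = "bridge-downstream-post-processing"  # assumed job name

def build_job_arguments(record):
    """Translate one S3 event record into Glue job arguments."""
    bucket = record["s3"]["bucket"]["name"]
    # S3 event keys are URL-encoded; decode before passing along.
    key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
    return {
        "--source_bucket": bucket,
        "--source_key": key,
    }

def lambda_handler(event, context):
    # boto3 is imported lazily so the argument-building logic above
    # stays testable without AWS dependencies.
    import boto3
    glue = boto3.client("glue")
    for record in event["Records"]:
        glue.start_job_run(
            JobName=GLUE_JOB_NAME,
            Arguments=build_job_arguments(record),
        )
```

A variant of the same wiring could propagate the appId as an extra job argument, which would address the hardcoded-appId issue at the same time.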

Proposed Design: Replace Parquet

...

It would probably be simpler to write our own custom Survey-to-Table implementation, purpose-built for the new Survey Builder and survey format.
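A custom Survey-to-Table implementation might look something like the flattener below. Note that the survey JSON shape used here (a "stepHistory" list with "identifier" and "answer" fields) is an assumption for illustration; the real Survey Builder format would dictate the actual field names.

```python
# Hypothetical sketch of a Survey-to-Table flattener. The input shape
# ("stepHistory" with "identifier"/"answer") is an assumed survey format,
# not the actual Survey Builder schema.
def survey_to_row(survey):
    """Flatten one survey result into a single table row (dict)."""
    row = {
        "recordId": survey.get("recordId"),
        "participantId": survey.get("participantId"),
    }
    # One column per answered survey question.
    for step in survey.get("stepHistory", []):
        row[step["identifier"]] = step.get("answer")
    return row
```

Rows produced this way could then be aggregated across dates and participants and written out as the CSV described under ARC Measures.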

Additional Notes

Bridge Downstream code base: https://github.com/Sage-Bionetworks/BridgeDownstream

Bridge Downstream getting started: Getting Started

Bridge Downstream developer docs: /wiki/spaces/BD/pages/2746351624

How to read parquet data in the command line:

  1. Pre-req: pip needs to be installed on your Mac. The fastest way to install it is via MacPorts: port install py-pip. This will also install Python, if necessary.

  2. pip install parquet-cli (you only need to do this once). This will put a command-line utility called parq in your bin.

  3. parq <filename> to get metadata for a parquet file.

  4. parq <filename> --head N or parq <filename> --tail N to read the actual parquet data.

  5. Alternate solution: https://github.com/devinrsmith/deephaven-parquet-viewer allows you to view parquet files in your browser. Requires Docker.