Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

  • Write the post-processing status to Adherence.

  • Note that the Bridge Downstream code hardcodes appId=mobile-toolbox. We want to un-hardcode this and either read the appId from the FileEntity annotations, or propagate the appId through each step so that we never lose it.

  • Also, the appId is currently being set by the app. This should instead be a top-level property in the Exporter itself.

Addendum 1: Who Needs Parquet?

We’ve got power users using JSON and basic users using the simple CSV summaries in the Researcher UI. However, we don’t really know who wants to use Parquet. If no one needs Parquet, then we can save a lot of work by skipping Bridge Downstream for Q4. Since we plan to re-build a lot of this in 2024 anyway, we wouldn’t be saving any work by standing up Bridge Downstream right now.

Specifically, we can cut the following Jiras from our plan:

  • Jira Legacy
    serverSystem JIRA
    serverIdba6fb084-9827-3160-8067-8ac7470f78b2
    keyDHP-1020
    (8)

  • Jira Legacy
    serverSystem JIRA
    serverIdba6fb084-9827-3160-8067-8ac7470f78b2
    keyDHP-1021
    (1)

  • Jira Legacy
    serverSystem JIRA
    serverIdba6fb084-9827-3160-8067-8ac7470f78b2
    keyDHP-1018
    (1)

  • Jira Legacy
    serverSystem JIRA
    serverIdba6fb084-9827-3160-8067-8ac7470f78b2
    keyDHP-1032
    (3)

The following Jiras would change:

  • Jira Legacy
    serverSystem JIRA
    serverIdba6fb084-9827-3160-8067-8ac7470f78b2
    keyDHP-1023
    is now a JSON-to-CSV Worker, and the cost would decrease from 8 to 5 as there’s less uncertainty and less risk.

  • Jira Legacy
    serverSystem JIRA
    serverIdba6fb084-9827-3160-8067-8ac7470f78b2
    keyDHP-1025
    , previously costed at 5, would be split into 2 Jiras, one for “summarizing” the CSV, costed at 3, and another Jira for re-writing the ARC scoring code, probably costed at 2 or 3.

    • This scoring code could either be part of the apps or it could live in the Worker.

  • Jira Legacy
    serverSystem JIRA
    serverIdba6fb084-9827-3160-8067-8ac7470f78b2
    keyDHP-1028
    would live in Bridge Worker instead of Bridge Downstream, but would stay costed at 1.

We would need to add a Jira for storing the intermediate tabular results that the JSON-to-CSV Worker generates. This work would be costed at 3.

Pros

  • Reduces the estimated amount of work by 12 sprint points.

  • Completely eliminates DHP-1020, which is the riskiest work item, and almost completely reduces the risk from DHP-1023.

  • Potentially frees up Dwayne’s schedule later in Q4 to help with the Android stuff or the Permissions stuff.

  • Bridge Downstream runs hourly. If we remove Bridge Downstream, we don’t have to worry about triggering Bridge Downstream in the CSV Worker and adding additional delay.

Cons

  • Would have to re-write the scoring code from that one Arc measure.

    • This is mitigated because we were planning to re-write it from R to Python anyway, and now we’re considering re-writing it in Kotlin (mobile) or Java (Worker).

  • In the old design, the JSON-to-CSV Worker would generate a zip file with multiple CSVs per assessment. (Some assessments, such as Number Match, could be relationalized into up to 6 Parquet tables.) In the new design, the JSON-to-CSV Worker would generate only metadata unless the Summarize component were written for the assessment.

    • This can be mitigated if we have some kind of reasonable default, like a link to the raw data in Synapse.

    • We have to write the Summarize components for all 3 Arc measures and for surveys anyway, so this might be a non-issue.

    • One possibility is that the apps provide a separate answers.json which is just a flattened map of key-value pairs, which makes Summarize very easy.

Surveys

We can easily build survey processing on top of the Bridge Downstream pipeline as described above for ARC measures.

...