Write the post-processing status to Adherence.
Note that the Bridge Downstream code hardcodes appId=mobile-toolbox. We want to un-hardcode this and either read the appId from the FileEntity annotations, or propagate the appId through each step so that we never lose it.
Also, the appId is currently being set by the app. This should instead be a top-level property in the Exporter itself.
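
A minimal sketch of how the un-hardcoded lookup could work, assuming the appId is stored under an annotation key named `appId` and that annotation values may arrive either as a string or as a list of strings. The function name `resolve_app_id`, the key name, and the fallback order are illustrative assumptions, not the actual Bridge Downstream code.

```python
from typing import Mapping, Optional, Sequence, Union

AnnotationValue = Union[str, Sequence[str]]


def resolve_app_id(
    annotations: Mapping[str, AnnotationValue],
    propagated_app_id: Optional[str] = None,
) -> str:
    """Return the appId for a file without falling back to a hardcoded value."""
    value = annotations.get("appId")  # hypothetical annotation key
    # Annotation values may be list-valued; take the first entry if so.
    if isinstance(value, (list, tuple)):
        value = value[0] if value else None
    if value:
        return value
    # Otherwise use the appId carried along from an earlier pipeline step.
    if propagated_app_id:
        return propagated_app_id
    # Fail loudly rather than silently assuming a default app.
    raise ValueError("appId missing from annotations and not propagated")
```

Raising an error when neither source has the appId is a design choice in this sketch: it surfaces the lost appId immediately instead of defaulting to one app.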

Addendum 1: Who Needs Parquet?

We’ve got power users using JSON and basic users using the simple CSV summaries in the Researcher UI. However, we don’t really know who wants to use Parquet. If no one needs Parquet, then we can save a lot of work by skipping Bridge Downstream for Q4. Since we plan to re-build a lot of this in 2024 anyway, standing up Bridge Downstream right now wouldn’t save us any work later.

Specifically, we can cut the following Jiras from our plan:

DHP-1020 (8)
DHP-1021 (1)
DHP-1018 (1)
DHP-1032 (3)

The following Jiras would change:

DHP-1023 is now a JSON-to-CSV Worker, and the cost would decrease from 8 to 5 as there’s less uncertainty and less risk.
DHP-1025, previously costed at 5, would be split into 2 Jiras, one for “summarizing” the CSV, costed at 3, and another Jira for re-writing the ARC scoring code, probably costed at 2 or 3.
- This scoring code could either be part of the apps or it could live in the Worker.
DHP-1028 would live in Bridge Worker instead of Bridge Downstream, but would stay costed at 1.

We would need to add a Jira for storing the intermediate tabular results that the JSON-to-CSV Worker generates. This work would be costed at 3.

Pros

Reduces the estimated amount of work by 12 sprint points.
Completely eliminates DHP-1020, which is the riskiest work item, and removes almost all of the risk from DHP-1023.
Potentially frees up Dwayne’s schedule later in Q4 to help with the Android stuff or the Permissions stuff.
Bridge Downstream runs hourly. If we remove Bridge Downstream, we don’t have to worry about triggering Bridge Downstream in the CSV Worker and adding additional delay.
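
The 12-point figure in the first Pro can be reproduced from the costs listed earlier. The sketch below assumes the parenthesized numbers next to the cut Jiras are their sprint-point costs; the re-written scoring Jira is costed at "2 or 3", so both cases are shown.

```python
# Sprint points for the Jiras that would be cut entirely.
cut = {"DHP-1020": 8, "DHP-1021": 1, "DHP-1018": 1, "DHP-1032": 3}
removed = sum(cut.values())  # 13


def net_savings(scoring_cost: int) -> int:
    """Net sprint points saved, given the cost of the re-written scoring Jira."""
    dhp_1023_delta = 5 - 8                   # JSON-to-CSV Worker drops from 8 to 5
    dhp_1025_delta = (3 + scoring_cost) - 5  # split into summarize (3) + scoring
    dhp_1028_delta = 0                       # moves to Bridge Worker, still 1
    new_storage_jira = 3                     # storing intermediate tabular results
    added = dhp_1023_delta + dhp_1025_delta + dhp_1028_delta + new_storage_jira
    return removed - added


print(net_savings(3))  # 12
print(net_savings(2))  # 13
```

The stated 12 points corresponds to the higher estimate (3) for the scoring re-write; at the lower estimate the savings would be 13.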

Cons

Would have to re-write the scoring code for that one Arc measure.
- This is mitigated because we were planning to re-write it from R to Python anyway, and now we’re considering re-writing it in Kotlin (mobile) or Java (Worker).
In the old design, the JSON-to-CSV Worker would generate a zip file with multiple CSVs per assessment. (Some assessments, such as Number Match, could be relationalized into up to 6 Parquet tables.) In the new design, the JSON-to-CSV Worker would generate only metadata unless the Summarize component were written for the assessment.
- This can be mitigated if we have some kind of reasonable default, like a link to the raw data in Synapse.
- We have to write the Summarize components for all 3 Arc measures and for surveys anyway, so this might be a non-issue.
- One possibility is that the apps provide a separate answers.json which is just a flattened map of key-value pairs, which makes Summarize very easy.
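
To illustrate why a flat answers.json would make Summarize easy, here is a minimal sketch that merges the key-value pairs with record metadata into a one-row CSV. The function name `summarize_answers`, the metadata field `recordId`, and the example keys are hypothetical; the actual answers.json schema is not defined here.

```python
import csv
import io
import json


def summarize_answers(answers_json: str, metadata: dict) -> str:
    """Turn a flat answers.json map plus record metadata into a one-row CSV."""
    answers = json.loads(answers_json)
    # The sketch relies on the map being flat, so reject nested values.
    for key, value in answers.items():
        if isinstance(value, (dict, list)):
            raise ValueError(f"answers.json value for {key!r} is not flat")
    # Metadata columns come first, followed by one column per answer key.
    row = {**metadata, **answers}
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(row), lineterminator="\n")
    writer.writeheader()
    writer.writerow(row)
    return out.getvalue()


print(summarize_answers('{"q1": "yes", "q2": 3}', {"recordId": "abc"}))
```

Because each key maps directly to a column, no assessment-specific relationalizing logic is needed for this case.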

Surveys

We can easily build survey processing on top of the Bridge Downstream pipeline as described above for ARC measures.

...

Version	Old Version 8	New Version 9
Changes made by	Dwayne Jeng (Unlicensed)	Dwayne Jeng (Unlicensed)
Saved on	Sept 18, 2023	Sept 25, 2023
