Requirements
These were collected from a meeting we held with Larsson, Abhi, Mike, Brian, and Thaneer, and from a follow-up interview with Dan.
- We have partners who contribute participants to our studies while being part of another study, that is, participants who download and use our app, and thus are participating both in the larger population of participants recruited from multiple locations and in a specific sub-study (we propose to call these partners "sub-study partners");
- Sub-study partners may include their participants in (at least) a couple of ways:
- They may assign them an external ID, that in essence should identify the participant's sub-study membership as well;
- They may ask existing participants to join the study, such that signing a consent in Bridge should enroll that participant in the sub-study;
- Sub-study partners will also create user accounts to manage their own users and external identifiers (and only those);
- Sub-study participants do not need to have an external ID; management accounts and users enrolled through a consent may not have an external ID. We may assign an external ID as an optional feature of enrolling through signing a consent (but it wouldn't be necessary);
- Sub-study participants may receive schedules (and thus, tasks or surveys) that are unique to their sub-study, but we believe these changes will be additive to the main study. However, Sage Bionetworks will be responsible for incorporating such changes so that they don't break the main study;
- Sub-study partners may access their list of participants and/or external identifiers, and can probably do anything an existing researcher can do with those entities... but only those that are in the sub-study. They cannot see participants or external IDs in other sub-studies (or no sub-study);
- The client app should know the sub-study memberships of a participant in case this is important to deliver the correct UI/behavior;
- Sub-study membership needs to be exported with all data exported to Synapse. The team that processes the data will use this membership information to create specific repositories for those sub-study partners;
- All users can be in multiple sub-studies (both participants, and administrative users). It's not clear what the requirements are for this (e.g. should a researcher see all participants in all the sub-studies they are a member of at once, or only one at a time with a mechanism to select their current sub-study?), but in a client app, the behavior would have to be additive, so again, Sage would want to vet and implement this to ensure no sub-study breaks and the app is useable.
- it follows that users may have multiple external IDs;
- in studies where users authenticate with external ID + phone/email, all the external IDs should be usable to authenticate;
- membership should not leak, i.e. if I am a researcher in sub-study A, and a participant is in sub-studies A and B, B should not be included in the API, BSM, etc.;
- As before, users can belong to no sub-studies:
- Researchers and developers would see everything in the study (no sub-studies means no filtering of study participants, etc.);
- Participants would not match criteria based on sub-studies (schedules/consents may or may not be filtered out for those participants);
- Research data is exported to Synapse without sub-study tags; based on our agreements with sub-study partners, this data might or might not be included in their data set.
- For developers, this feature should not require multiple builds or multiple server study configurations.
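The non-leaking rule in the list above (a researcher in sub-study A must not learn that a participant is also in B) amounts to a set intersection. The following is a minimal sketch in Python for illustration, not Bridge code; `visible_substudies` is a hypothetical helper name.

```python
def visible_substudies(caller_substudies, participant_substudies):
    """Return the participant's sub-study memberships the caller may see.

    A caller with no sub-studies is unrestricted and sees every membership;
    otherwise only memberships shared with the caller are returned.
    """
    participant = set(participant_substudies)
    if not caller_substudies:
        return participant
    return participant & set(caller_substudies)
```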
Database Schema
Other possibilities are listed at the end of this document.
This seems best given Dwayne's comments below... manage external IDs separately, and in a transaction, mark when they are used and add them as an attribute to the AccountSubStudies table record, which is a true associative table.
Tasks
Clean up accounts to remove GenericAccount and HibernateAccount, and the copying between the two classes (subsequent changes heavily involve the AccountsDao, so it would help to simplify first).
...
GET /v3/substudies?includeDeleted=boolean [list]
POST /v3/substudies [create]
GET /v3/substudies/:id [read]
POST /v3/substudies/:id [update]
DELETE /v3/substudies/:id?physical=boolean [delete]
Create AccountSubStudies
CREATE TABLE `AccountSubStudies` (
  `studyId` VARCHAR(25) NOT NULL,
  `subStudyId` VARCHAR(15) NOT NULL,
  `accountId` VARCHAR(255) NOT NULL,
  `externalId` VARCHAR(255) NULL, -- not required for the association
  PRIMARY KEY (`studyId`, `subStudyId`, `accountId`),
  INDEX `AccountId-Index` (`accountId` ASC),
  CONSTRAINT `AccountsFK` FOREIGN KEY (`accountId`) REFERENCES `Accounts` (`id`) ON DELETE CASCADE,
  CONSTRAINT `SubStudiesFK` FOREIGN KEY (`studyId`, `subStudyId`) REFERENCES `SubStudies` (`studyId`, `id`) ON DELETE CASCADE
);
Create ExternalIds
We could leave this in DDB but there might be more consistency errors. Unless we can execute DDB code as part of a SQL transaction and only commit the transaction if the DDB updates succeed.
CREATE TABLE `ExternalIds` (
  `studyId` VARCHAR(25) NOT NULL,
  `subStudyId` VARCHAR(15) NOT NULL,
  `identifier` VARCHAR(255) NOT NULL,
  `accountId` VARCHAR(255) NULL, -- do not delete the external ID record if the user is deleted
  PRIMARY KEY (`studyId`, `identifier`), -- externalId must be unique across all sub-studies
  CONSTRAINT `SubStudiesFK` FOREIGN KEY (`studyId`, `subStudyId`) REFERENCES `SubStudies` (`studyId`, `id`) ON DELETE CASCADE
);
// This replaces the external ID service
ExternalIdsServiceV2 {
listExternalIds(studyId, subStudyId, offsetBy, pageSize, includeDeleted)
createExternalId(externalIdObj)
getExternalId(studyId, subStudyId, externalId)
updateExternalId(externalIdObj)
deleteExternalId(studyId, subStudyId, externalId)
deleteExternalIdPermanently(studyId, subStudyId, externalId)
// These would update or add an associative record between Account and SubStudy with an account ID
assignExternalId(studyId, subStudyId, externalId, accountId)
unassignExternalId(studyId, subStudyId, externalId)
}
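The transactional assignment described above (mark the external ID as used and record it on the AccountSubStudies row, or do neither) could look roughly like the following. This is a hedged sketch in Python against an in-memory SQLite database, not the BridgePF implementation: `DEMO_SCHEMA` is a simplified stand-in for the two tables, and `assign_external_id` is a hypothetical function mirroring `assignExternalId`.

```python
import sqlite3

# Simplified SQLite stand-ins for the two tables (assumption: foreign keys
# and column lengths are omitted to keep the sketch short).
DEMO_SCHEMA = """
CREATE TABLE ExternalIds (
  studyId TEXT NOT NULL, subStudyId TEXT NOT NULL,
  identifier TEXT NOT NULL, accountId TEXT,
  PRIMARY KEY (studyId, identifier));
CREATE TABLE AccountSubStudies (
  studyId TEXT NOT NULL, subStudyId TEXT NOT NULL,
  accountId TEXT NOT NULL, externalId TEXT,
  PRIMARY KEY (studyId, subStudyId, accountId));
"""

def assign_external_id(conn, study_id, substudy_id, external_id, account_id):
    """Mark the external ID as used and record the membership atomically."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE ExternalIds SET accountId = ? "
            "WHERE studyId = ? AND subStudyId = ? AND identifier = ? "
            "AND accountId IS NULL",
            (account_id, study_id, substudy_id, external_id),
        )
        if cur.rowcount != 1:
            raise ValueError("external ID not found or already assigned")
        conn.execute(
            "INSERT INTO AccountSubStudies "
            "(studyId, subStudyId, accountId, externalId) VALUES (?, ?, ?, ?)",
            (study_id, substudy_id, account_id, external_id),
        )
```

Because both statements run inside one transaction, a failure in the second statement also undoes the first, so an external ID is never marked as used without a matching membership row.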
...
GET /v3/substudies/:subStudyId/externalids [list]
POST /v3/substudies/:subStudyId/externalids [create] <-- could take a list for batch creates
GET /v3/substudies/:subStudyId/externalids/:id [read]
POST /v3/substudies/:subStudyId/externalids/:id [update]
DELETE /v3/substudies/:subStudyId/externalids/:id [delete]
Add sub-study to existing external IDs API. Make it possible to associate external ID record with sub-study
Write to both external ID tables, and read first from the new external ID database. In this period, a user can only have one external identifier:
- Add subStudyId to the existing ExternalIdService APIs; to use them, a sub-study will need to be provided (this may not be functional at first)
- Write to both the new and old tables when assigning or changing external ID.
- Backfill sub-study from the new to the old table
- Add method to ExternalIdService to find external ID by health code. Use this in preference to the value stored in the Accounts table, so you can first look at the join with new table.
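The migration-period behavior in the list above (write to both stores, read from the new one first) can be sketched as follows. This is only an illustration: plain dictionaries keyed by health code stand in for the new table and the legacy value on the account, and both function names are hypothetical.

```python
def find_external_id(health_code, new_table, old_table):
    """Prefer the new table; fall back to the legacy value during migration."""
    external_id = new_table.get(health_code)
    if external_id is not None:
        return external_id
    return old_table.get(health_code)

def assign_external_id_dual(health_code, external_id, new_table, old_table):
    """Write to both stores so they agree until the old one is retired."""
    new_table[health_code] = external_id
    old_table[health_code] = external_id
```

Once every record has been backfilled, the fallback branch should never be taken, which is one way to confirm the old store can be removed.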
AccountDao - mostly calls through to the ExternalIdService
- sub-study association needs to be restricted to the caller's sub-studies. For example, the set of sub-studies need to be passed in, or the StudyParticipant needs to be "tainted" with the caller's identity when it is constructed so permissions can be checked. Submitted record can participate in a set of sub-studies and then these are checked against membership of caller.
- constructAccount: set up eventual persistence of associative record
- Iterator/List calls should specify a sub-study (or perhaps a list, if we want BSM users to see all users they can see, across all studies)
Account, AccountSummary, StudyParticipant, UserSession
- The construction of all of these objects should return their sub-studies. It's conceivable we don't include them in AccountSummary.
Migrate external IDs
- Create existing sub-studies
- read from both external ID systems, write only to new system
- Migrate external IDs to create associative records, removing external ID from the account table where appropriate.
- Remove old APIs from SDK
- Remove links to old tables
- Remove APIs from BridgePF
Read from the new table before the old one. However, at this time, a user can only belong to one sub-study and have only one external ID.
Backfill the older external ID table with sub-study IDs
Switch to looking up users via external ID by querying for the record (not looking in Accounts table)
Join tables when retrieving user to get external IDs
- at this point the externalId column in the Accounts table should not be in use
Add substudies to AccountSummary, StudyParticipant, add substudies to UserSession
Add sub-study filtering to the getAccountSummaries() and Iterator calls. A sub-study must be selected if the user has sub-studies, and it must belong in their set of sub-studies, or the request is an error. Otherwise, the records are filtered only to those accounts that have the sub-study ID.
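A minimal sketch of that filtering rule, in Python for illustration only. The dictionary shape of a summary and the name `filter_account_summaries` are assumptions, as is the choice to let a caller with no sub-studies optionally narrow the list.

```python
def filter_account_summaries(summaries, caller_substudies, selected_substudy=None):
    """Return the summaries the caller may see for the selected sub-study."""
    caller = set(caller_substudies)
    if caller:
        # A caller with sub-studies must select one of their own.
        if selected_substudy is None or selected_substudy not in caller:
            raise PermissionError("a sub-study the caller belongs to must be selected")
    elif selected_substudy is None:
        # No sub-studies and no selection: nothing is filtered.
        return list(summaries)
    return [s for s in summaries if selected_substudy in s["substudy_ids"]]
```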
Remove older external IDs API (maintaining it would be very difficult, if it has to be maintained, switch it over to be a special case of calling the new API)
Update sub-populations so that signing the consent of a sub-population will assign a user to one or more sub-studies, without an external ID (additional behaviors can be implemented as needed).
...
Add the ability to filter by sub-studies using the Criteria object. Like tags, you should be able to add a set of sub-studies, at least one of which should match, or a set where none may match. The main use for this would be to schedule different sub-studies differently, in the context of an overarching multi-study design.
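The matching rule could be sketched as below. The parameter names `any_of` and `none_of` are hypothetical (the Criteria object's real field names are not given here), and treating a participant with no sub-studies as passing a pure `none_of` rule is an assumption.

```python
def matches_substudy_criteria(user_substudies, any_of=(), none_of=()):
    """Check a participant's sub-studies against sub-study criteria.

    any_of: if non-empty, the participant must be in at least one of these.
    none_of: the participant must not be in any of these.
    """
    user = set(user_substudies)
    if any_of and not (user & set(any_of)):
        return False
    if user & set(none_of):
        return False
    return True
```

For example, a schedule limited to sub-study A would use `any_of={"A"}`, while a schedule for everyone except sub-study B would use `none_of={"B"}`.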
Other Database Options
The external IDs table is an associative table between accounts and sub-studies, but they are also entities and can exist even if an account is not associated to the ExternalId record. (The sub-study relationship is always required). This is the simplest but requires some additional indexes.
External IDs are managed as entities separately from the association (FK to sub-study). Then accounts are associated to external IDs, but only one per study (enforced in code). Queries for accounts in a sub-study will need to join two tables; we'd have to create a dummy external ID to associate people to a sub-study who otherwise weren't assigned an external ID as part of the study design.
We could create a separate associative table for sub-study membership. Simpler to query, doesn't require an external ID, does require constraints in code (a user can't be associated to an external ID without also being associated to a sub-study, so we'd add one when we add the other, and remove them together). We decided instead to join these tables.