Mapping between Accounts and Study Data
Stormpath generates and assigns its own unique identifier to each user for its own internal use, and Bridge uses this identifier as a publicly accessible account identifier. In addition, for each user in a study, the Bridge service generates its own pair of universally unique identifiers (UUIDs). The first UUID, called the “Bridge account ID”, is generated by the Bridge server on account creation, encrypted, and saved in Stormpath. Encryption uses 256-bit AES in GCM mode. The AES cipher is created using Apache Shiro, and the actual encryption is provided by the latest version of Bouncy Castle.
Bridge servers are currently deployed as Heroku stacks. As a result, the encryption key is set as an environment variable in the Heroku admin console and passed to the Bridge servers. Only the Bridge technical team has access to the Heroku admin console. Additionally, each stack (local, development, staging, or production) is assigned its own encryption key.
The second UUID, called the “participant health code”, is used internally by Bridge as the key to the user’s study data. If the same user enrolls in multiple studies, a separate account ID and health code are generated for each study to keep the data for each study isolated. A one-way mapping is set up from account ID to health code. Currently the mapping is implemented as a DynamoDB table with the account ID as the hash key. The workflow is illustrated by the following diagram:
Thus, Bridge never links the Stormpath account ID directly to the study data. When an authenticated user makes a call to create, update, or delete his or her data, the Bridge server retrieves the encrypted Bridge account ID from the user profile, decrypts it, and uses the decrypted ID to find the user’s health code in the map. The mapping may exist for a time in memory on the Bridge server, but is never stored in a way accessible to someone who gains access to the back-end systems.
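The enrollment and lookup flow described above can be sketched roughly as follows. This is a minimal illustration, not the Bridge implementation: it uses the Python cryptography package in place of the Apache Shiro and Bouncy Castle stack, an in-memory dictionary in place of the DynamoDB table, and the names BRIDGE_ENCRYPTION_KEY, HealthCodeMap, enroll, and health_code_for are all hypothetical.

```python
import os
import uuid

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Hypothetical variable name; the text only says the key is an environment variable.
KEY_ENV_VAR = "BRIDGE_ENCRYPTION_KEY"


def load_key() -> bytes:
    """Read a hex-encoded 256-bit AES key from the environment."""
    key = bytes.fromhex(os.environ[KEY_ENV_VAR])
    if len(key) != 32:
        raise ValueError("expected a 256-bit (32-byte) AES key")
    return key


def encrypt_account_id(account_id: str, key: bytes) -> bytes:
    """Encrypt with AES-256-GCM; a fresh 12-byte nonce is prepended to the output."""
    nonce = os.urandom(12)
    return nonce + AESGCM(key).encrypt(nonce, account_id.encode("utf-8"), None)


def decrypt_account_id(blob: bytes, key: bytes) -> str:
    """Split off the nonce, then decrypt and authenticate the ciphertext."""
    nonce, ciphertext = blob[:12], blob[12:]
    return AESGCM(key).decrypt(nonce, ciphertext, None).decode("utf-8")


class HealthCodeMap:
    """In-memory stand-in for the table keyed by account ID (assumption)."""

    def __init__(self) -> None:
        self._table: dict[str, str] = {}

    def enroll(self, key: bytes) -> bytes:
        """Create one account ID / health code pair for one study enrollment.

        Returns the encrypted account ID, which is what would be saved in the
        user profile. The health code never leaves the map.
        """
        account_id = str(uuid.uuid4())
        health_code = str(uuid.uuid4())
        self._table[account_id] = health_code
        return encrypt_account_id(account_id, key)

    def health_code_for(self, encrypted_account_id: bytes, key: bytes) -> str:
        """Decrypt the stored account ID and use it as the lookup key."""
        return self._table[decrypt_account_id(encrypted_account_id, key)]
```

Calling enroll once per study gives the same user unrelated identifiers in each study. Without the encryption key, the value stored in the user profile cannot be turned into a lookup key, and decrypting with the wrong key fails GCM authentication (the cryptography package raises InvalidTag) instead of returning garbage.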
Keys for the production service can be managed by a separate deployment team, allowing Sage to minimize the number of individuals with access. This design limits who can view the entire map, ensuring that even Sage Bionetworks engineers with access to back-end systems would find it exceedingly difficult to re-identify patients in this system.
Bridge provides no APIs that allow researchers to query the study data in real time. Instead, authenticated researchers on the project team can trigger an export of the aggregated study data from all participants in their study, in which participants are identified only by their unique study data ID. This ensures that a researcher cannot link any particular record back to any particular participant.
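As a rough sketch of such an export, the function below writes only an allow-listed set of columns, so any other field that happens to be on a record is dropped. It assumes the study data ID is the participant health code; the record layout and the names export_study_data, health_code, measure, and value are hypothetical, and CSV is an arbitrary choice of output format.

```python
import csv
import io

# Only these columns ever reach the researcher (hypothetical field names).
EXPORT_FIELDS = ("health_code", "measure", "value")


def export_study_data(records) -> str:
    """Serialize study records to CSV, keeping only the allow-listed columns."""
    buffer = io.StringIO()
    # extrasaction="ignore" silently drops any key not listed in fieldnames.
    writer = csv.DictWriter(buffer, fieldnames=EXPORT_FIELDS, extrasaction="ignore")
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return buffer.getvalue()
```

Using an allow-list rather than a deny-list means a newly added internal field stays out of the export unless someone adds it deliberately.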
Future Extensions
One remaining risk is that an attacker with access to the DynamoDB ID map could scan the entire table and then reverse the map. With BRIDGE-177, IAM rules will be set up to forbid querying and scanning the DynamoDB table that holds the one-way map (see the AWS documentation). With BRIDGE-178, participant health codes will be stored encrypted in AWS. And with BRIDGE-179, if we introduce the user’s password hash into the encryption, even the master key holder would not be able to reverse the map.
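Two of these extensions can be sketched under stated assumptions. The first function builds an IAM policy statement that denies the dynamodb:Scan and dynamodb:Query actions on the map table; the table ARN is a made-up placeholder, and single-item reads by hash key would still need to be allowed by a separate statement. The second shows one possible way to mix a user's password hash into key material, using HMAC-SHA256 keyed with the master key. The text does not say how BRIDGE-179 would do this, so the function is purely illustrative.

```python
import hashlib
import hmac
import json

# Placeholder ARN for illustration only; not a real account or table.
TABLE_ARN = "arn:aws:dynamodb:us-east-1:123456789012:table/HealthCodeMap"


def deny_scan_policy(table_arn: str) -> dict:
    """Build an IAM policy document that forbids scanning or querying the table."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Deny",
                "Action": ["dynamodb:Scan", "dynamodb:Query"],
                "Resource": table_arn,
            }
        ],
    }


def per_user_key(master_key: bytes, password_hash: bytes) -> bytes:
    """Derive a 256-bit per-user key from the master key and the password hash.

    Illustrative assumption: HMAC-SHA256 keyed with the master key. The
    master key alone is then not enough to decrypt a given user's entry.
    """
    return hmac.new(master_key, password_hash, hashlib.sha256).digest()


if __name__ == "__main__":
    print(json.dumps(deny_scan_policy(TABLE_ARN), indent=2))
```

With a per-user key of this kind, reversing the map would require each user's password hash in addition to the master key, which is the property BRIDGE-179 is after.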