Document toolboxDocument toolbox

Auth Migration

versioncomment
2021/11/08Added this table for tracking
2017/10/20Created

Master tracking JIRA BRIDGE-1672 - Getting issue details... STATUS

Overview

Part 1: Account Information - Stormpath is currently used to store credentials, profile attributes, consents, and other "custom data". This needs to be represented in Bridge. Given that we frequently need to query and iterate users, a SQL database is a good fit.

Part 2: Email Workflows - We need to re-implement email workflows for verify email, password reset, and "magic link" authentication.

These two parts can be done mostly in parallel.

Account Information

Table Schemas

Accounts

Primary Key: id

Constraints: unique(studyId, email)

id varchar(255) - Accounts imported from Stormpath will use the Stormpath ID. New accounts will have a randomly generated ID. (Guid?)

studyId varchar(255) - Study the account lives in.

email varchar(255) - User's email address.

createdOn bigint - Long epoch milliseconds representing when the account was created.

healthCode varchar(255) - Health code. Has an index so we can look it up.

healthId varchar(255) - Health ID (used to map to Health Code). This used to be encrypted in Stormpath. Since we're now storing healthCode unencrypted, there's no need to store this encrypted.

modifiedOn bigint - Long epoch milliseconds representing when the account was last modified.

firstName varchar(255) - User's first name.

lastName varchar(255) - User's last name.

passwordHash varchar(255) - Salted hashed passwords.

passwordModifiedOn bigint - Long epoch milliseconds representing when the account's password was last modified.

passwordAlgorithm enum - HmacSha256 or Bcrypt or maybe something, for future compatibility.

status enum - disabled, enabled, or unverified.

Attributes

Unique Key: accountId + attributeKey

accountId varchar(255) - Foreign key into Accounts table.

attributeKey varchar(255) - Attribute key. (Note: "key" is a SQL reserved keyword.)

attributeValue varchar(255) - Attribute value. (Note: "value" is a SQL reserved keyword.)

Roles

Unique Key: accountId + role

accountId varchar(255) - Foreign key into Accounts table.

role enum - developer, researcher, admin, test_users, or worker

NOTE: This table exists as a separate join table, because Hibernate doesn't support the MySQL SET type.

Consents

Unique Key: accountId + subpopulationGuid + signedOn

accountId varchar(255) - Foreign key into Accounts table.

subpopulationGuid varchar(255) - Subpopulation guid of the signed consent.

signedOn bigint - Long epoch milliseconds representing when the consent was signed.

birthdate date - User's birthdate.

consentCreatedOn bigint - Long epoch milliseconds representing when the consent was created on, used to identify what version of the subpopulation's consent was signed.

name varchar(255) - User's full name.

signatureImageData mediumtext - Base64 encoded signature image. Empirically, the largest signature is ~135kb, which is much too large for a text (64kb). So we're using a mediumtext (16mb). It is very unlikely that an app will upload a signature that's more than 16mb.

signatureImageMimeType varchar(255) - Signature image MIME type. Example: image/png

withdrewOn bigint - Long epoch milliseconds representing when the user withdrew consent, if they withdrew consent.

NOTE: Birthdate and full name are in Consents table, even though these are unlikely to change as the user consents and withdraws and re-consents. However, to maintain backwards compatibility and parity, we are keeping it in the Consents table. We can consider migrating these to Accounts at a later date.

Password Hashing Algorithm

As per https://stormpath.com/export#use-pw, password hashes are either in HmacSha256 or in Bcrypt. (NOTE: During the migration, we discovered that all Stormpath passwords were hashed using HmacSha256.) Bcrypt passwords are documented on the Stormpath FAQ.

HmacSha256 can be decrypted using the following code sample (provided by Stormpath support):

// Get MCF string from accountID.json file
String mcf_string = "$stormpath1$ctYP52a2Sp2yIjzzlJAuPg==$djHLTcfEerQ3rCQAUi1kFgGN9lqmZHwz7PjKdSst/hg=";

// Extract the b64 encoded salt and password hash
String[] mcf = mcf_string.split("\\$");
String b64_salt = mcf[2];
String b64_hash = mcf[3];

// Decode the b64 salt to get the salt byte array
byte[] salt = Base64.getDecoder().decode(b64_salt.getBytes());

// Get the user's plain password
String plaintext_password = "Jenydoby6!";

// Salt the plaintext password, prepend to user provided password, then hash
try {
    Mac sha256_HMAC = Mac.getInstance("HmacSHA256");
    SecretKeySpec secret_key = new SecretKeySpec(salt, "HmacSHA256");
    sha256_HMAC.init(secret_key);
    String hash = Base64.getEncoder().encodeToString(sha256_HMAC.doFinal(plaintext_password.getBytes()));

    if (hash.equals(b64_hash) {
        log.info("Success! Both hashes match!");
    } else {
        log.info("Passwords do not match.");
    }
} catch (Exception e) {
    log.error(e);
}

NIST recommends using PBKDF2, so new passwords will be hashed using this algorithm. Note that this means passwords between newly created accounts will always have different hashes, since we have no way of knowing a priori whether an account is using Stormpath HMAC or Bcrypt. We'll need to account for this in our Migration script.

Migration Plan

  1. Separate email workflow from account management in AccountDao.
  2. Implement HIbernateAccountDao.
  3. Implement MigrationAccountDao, which
    1. Reads and writes to Stormpath.
    2. Writes to MySQL.
  4. Wait for verifyEmail and resetPassword workflows are implemented.
    1. This is necessary because verifyEmail and resetPassword are currently opaque in Stormpath. Until we re-implement them on our side, we will never be able to sync MySQL with Stormpath.
    2. Once this is implemented, all newly created accounts should now be in sync between MySQL and Stormpath. We just need to backfill old accounts.
  5. Export from Stormpath and backfill into MySQL. At this point, accounts should be sync'ed.
  6. Reverse the MigrationAccountDao to call through to MySQL first, and then fall back to Stormpath. (We should log errors when we fall back to Stormpath, as this represents a major issue with the Stormpath export/import.)
  7. As needed, export from Stormpath again. Generate diff.
  8. Repeat previous step until there is no diff.
  9. Remove MigrationAccountDao and StormpathAccountDao and call HibernateAccountDao directly.

Encryption

Amazon RDS can be configured to be encrypted at rest. This is available on MySQL hosts for t2.larges and above, or any Aurora host. GIven that we only need t2.mediums (see below), it would be cheaper to migrate to Aurora. In addition, according to the docs (http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Overview.Encryption.html#Overview.Encryption.Limitations), once we encrypt a MySQL DB, we won't be able to migrate to Aurora. With that in mind, the most future-proof solution would be to migrate to Aurora before turning on encryption. See also BRIDGE-1842 - Getting issue details... STATUS .

In transit, we use SSL encryption. Note that we're getting SSL cert warnings from Hibernate. It's unclear whether we're actually using SSL. We willneed to address this before we can roll out the new account stuff to Prod. See BRIDGE-1783 - Getting issue details... STATUS

Database Scaling

Looking at MetricsInterceptor logs for signIn, signUp, and /v3/users, we peak at 8 requests per minute. This suggests request rate is not our bottleneck.

According to du -sh, the Stormpath export was 1.6gb uncompressed. We should assume the data size in MySQL will be similar. According to https://www.quora.com/How-much-data-is-too-much-data-for-MySQL, we want to aim for having our entire DB fit in 80% of RAM. This means we need at least 2gb RAM.To handle growth, let's shoot for 4gb RAM.

This suggests we need a db.t2.medium (4gb RAM). Assuming a read replica (for failover and redundancy), this costs 2x$0.082/hr, which is roughly $122/mo.

This is a reasonably sized machine. We probably don't need to partition.

Other Considerations

Enums: Enums in SQL can be appended to, but existing enum values must be unchanged and in the same order.

Timestamps: MySQL/Hibernate have poor support for timestamps. In particular, timezone is ambiguous, and there were no simple solutions to always force UTC. Since MySQL doesn't support timestamps with timezones anyway, we'll just use long epoch milliseconds.

Email Workflows

BRIDGE-1838 - Getting issue details... STATUS

Can be ported over from https://github.com/Sage-Bionetworks/BridgePF/pull/1429

TODO