Auth Migration
| version | comment |
| --- | --- |
| 2021/11/08 | Added this table for tracking |
| 2017/10/20 | Created |
Master tracking JIRA - BRIDGE-1672
Overview
Part 1: Account Information - Stormpath is currently used to store credentials, profile attributes, consents, and other "custom data". This needs to be represented in Bridge. Given that we frequently need to query and iterate over users, a SQL database is a good fit.
Part 2: Email Workflows - We need to re-implement the email workflows for email verification, password reset, and "magic link" authentication.
These two parts can be done mostly in parallel.
Account Information
Table Schemas
Accounts
Primary Key: id
Constraints: unique(studyId, email)
id varchar(255) - Accounts imported from Stormpath will use the Stormpath ID. New accounts will have a randomly generated ID. (Guid?)
studyId varchar(255) - Study the account lives in.
email varchar(255) - User's email address.
createdOn bigint - Long epoch milliseconds representing when the account was created.
healthCode varchar(255) - Health code. Has an index so we can look it up.
healthId varchar(255) - Health ID (used to map to Health Code). This used to be encrypted in Stormpath. Since we're now storing healthCode unencrypted, there's no need to store this encrypted.
modifiedOn bigint - Long epoch milliseconds representing when the account was last modified.
firstName varchar(255) - User's first name.
lastName varchar(255) - User's last name.
passwordHash varchar(255) - Salted, hashed password.
passwordModifiedOn bigint - Long epoch milliseconds representing when the account's password was last modified.
passwordAlgorithm enum - HmacSha256, Bcrypt, or additional algorithms added later (e.g. PBKDF2; see Password Hashing Algorithm below), for future compatibility.
status enum - disabled, enabled, or unverified.
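For illustration, a minimal Hibernate/JPA sketch of this table is below. The class name, enum names, and column details are assumptions for the example, not the final implementation; getters and setters are omitted.

import javax.persistence.*;

// Sketch only: a possible JPA mapping for the Accounts table. Names are assumptions.
@Entity
@Table(name = "Accounts",
        uniqueConstraints = @UniqueConstraint(columnNames = { "studyId", "email" }),
        indexes = @Index(columnList = "healthCode"))
public class HibernateAccount {
    @Id
    private String id;                   // Stormpath ID for migrated accounts, random GUID for new ones

    @Column(nullable = false)
    private String studyId;

    @Column(nullable = false)
    private String email;

    private long createdOn;              // epoch milliseconds
    private long modifiedOn;             // epoch milliseconds

    private String healthCode;
    private String healthId;

    private String firstName;
    private String lastName;

    private String passwordHash;
    private long passwordModifiedOn;     // epoch milliseconds

    @Enumerated(EnumType.STRING)
    private PasswordAlgorithm passwordAlgorithm;

    @Enumerated(EnumType.STRING)
    private AccountStatus status;
}

// Assumed enums backing the two MySQL enum columns.
enum PasswordAlgorithm { HMAC_SHA_256, BCRYPT, PBKDF2_HMAC_SHA_256 }
enum AccountStatus { DISABLED, ENABLED, UNVERIFIED }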
Attributes
Unique Key: accountId + attributeKey
accountId varchar(255) - Foreign key into Accounts table.
attributeKey varchar(255) - Attribute key. (Note: "key" is a SQL reserved keyword.)
attributeValue varchar(255) - Attribute value. (Note: "value" is a SQL reserved keyword.)
Roles
Unique Key: accountId + role
accountId varchar(255) - Foreign key into Accounts table.
role enum - developer, researcher, admin, test_users, or worker
NOTE: This table exists as a separate join table, because Hibernate doesn't support the MySQL SET type.
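For context, this is roughly how such a join table can be mapped without a MySQL SET column, shown as an addition to the HibernateAccount sketch above. The "AccountRoles" table name, column names, and Role enum are assumptions.

// Sketch only: assumed Role enum mirroring the role column values.
public enum Role { DEVELOPER, RESEARCHER, ADMIN, TEST_USERS, WORKER }

// Added to the HibernateAccount entity sketched above (requires java.util.Set/HashSet):
@ElementCollection(fetch = FetchType.EAGER)
@CollectionTable(name = "AccountRoles", joinColumns = @JoinColumn(name = "accountId"))
@Column(name = "role")
@Enumerated(EnumType.STRING)
private Set<Role> roles = new HashSet<>();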
Consents
Unique Key: accountId + subpopulationGuid + signedOn
accountId varchar(255) - Foreign key into Accounts table.
subpopulationGuid varchar(255) - Subpopulation guid of the signed consent.
signedOn bigint - Long epoch milliseconds representing when the consent was signed.
birthdate date - User's birthdate.
consentCreatedOn bigint - Long epoch milliseconds representing when the consent was created, used to identify which version of the subpopulation's consent was signed.
name varchar(255) - User's full name.
signatureImageData mediumtext - Base64 encoded signature image. Empirically, the largest signature is ~135kb, which is much too large for a text (64kb). So we're using a mediumtext (16mb). It is very unlikely that an app will upload a signature that's more than 16mb.
signatureImageMimeType varchar(255) - Signature image MIME type. Example: image/png
withdrewOn bigint - Long epoch milliseconds representing when the user withdrew consent, if they withdrew consent.
NOTE: Birthdate and full name are in the Consents table, even though they are unlikely to change as the user consents, withdraws, and re-consents, so they would arguably fit better in Accounts. However, to maintain backwards compatibility and parity, we are keeping them in the Consents table. We can consider migrating them to Accounts at a later date.
Password Hashing Algorithm
As per https://stormpath.com/export#use-pw, password hashes are either HmacSha256 or Bcrypt. (NOTE: During the migration, we discovered that all Stormpath passwords were hashed using HmacSha256.) Bcrypt passwords are documented in the Stormpath FAQ.
HmacSha256 hashes can be verified using the following code sample (provided by Stormpath support):
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// "log" is assumed to be a logger available in the enclosing class.

// Get MCF string from accountID.json file
String mcf_string = "$stormpath1$ctYP52a2Sp2yIjzzlJAuPg==$djHLTcfEerQ3rCQAUi1kFgGN9lqmZHwz7PjKdSst/hg=";

// Extract the b64 encoded salt and password hash
String[] mcf = mcf_string.split("\\$");
String b64_salt = mcf[2];
String b64_hash = mcf[3];

// Decode the b64 salt to get the salt byte array
byte[] salt = Base64.getDecoder().decode(b64_salt.getBytes());

// Get the user's plain password
String plaintext_password = "Jenydoby6!";

// Use the salt as the HMAC-SHA256 key and hash the plaintext password
try {
    Mac sha256_HMAC = Mac.getInstance("HmacSHA256");
    SecretKeySpec secret_key = new SecretKeySpec(salt, "HmacSHA256");
    sha256_HMAC.init(secret_key);
    String hash = Base64.getEncoder().encodeToString(sha256_HMAC.doFinal(plaintext_password.getBytes()));
    if (hash.equals(b64_hash)) {
        log.info("Success! Both hashes match!");
    } else {
        log.info("Passwords do not match.");
    }
} catch (Exception e) {
    log.error(e);
}
NIST recommends using PBKDF2, so new passwords will be hashed using this algorithm. Note that this means newly created accounts and migrated accounts will have password hashes in different formats, and we have no way of knowing a priori whether a migrated account uses Stormpath HMAC or Bcrypt. We'll need to account for this in our migration script.
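As a rough sketch of what PBKDF2 hashing could look like on the Java side, using the JDK's built-in PBKDF2WithHmacSHA256. The iteration count, salt length, and key length below are illustrative placeholders, not decided parameters.

import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

// Sketch only: hash a new password with PBKDF2-HMAC-SHA256. The iteration count
// (250000), salt length (32 bytes), and key length (256 bits) are placeholders.
public class Pbkdf2Example {
    public static void main(String[] args) throws Exception {
        String plaintextPassword = "correct horse battery staple";

        // Generate a random per-password salt.
        byte[] salt = new byte[32];
        new SecureRandom().nextBytes(salt);

        // Derive the hash with PBKDF2-HMAC-SHA256.
        PBEKeySpec spec = new PBEKeySpec(plaintextPassword.toCharArray(), salt, 250_000, 256);
        SecretKeyFactory factory = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256");
        byte[] hash = factory.generateSecret(spec).getEncoded();

        // Store the iteration count, salt, and hash together alongside the account.
        System.out.println(250_000 + "$"
                + Base64.getEncoder().encodeToString(salt) + "$"
                + Base64.getEncoder().encodeToString(hash));
    }
}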
Migration Plan
- Separate email workflow from account management in AccountDao.
- Implement HibernateAccountDao.
- Implement MigrationAccountDao, which
- Reads and writes to Stormpath.
- Writes to MySQL.
- Wait until the verifyEmail and resetPassword workflows are implemented.
- This is necessary because verifyEmail and resetPassword are currently opaque in Stormpath. Until we re-implement them on our side, we will never be able to sync MySQL with Stormpath.
- Once this is implemented, all newly created accounts should now be in sync between MySQL and Stormpath. We just need to backfill old accounts.
- Export from Stormpath and backfill into MySQL. At this point, accounts should be synced.
- Reverse the MigrationAccountDao to call through to MySQL first, and then fall back to Stormpath, as in the sketch after this list. (We should log errors when we fall back to Stormpath, as this represents a major issue with the Stormpath export/import.)
- As needed, export from Stormpath again. Generate diff.
- Repeat previous step until there is no diff.
- Remove MigrationAccountDao and StormpathAccountDao and call HibernateAccountDao directly.
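A very rough sketch of the dual-write and MySQL-first fallback behavior described in the plan above. The AccountDao/Account types and method names are assumptions for illustration, and error handling is elided.

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Sketch only: MigrationAccountDao delegates to a Stormpath-backed DAO and a
// Hibernate/MySQL-backed DAO. Interface and method names are assumptions.
public class MigrationAccountDao implements AccountDao {
    private static final Logger LOG = LoggerFactory.getLogger(MigrationAccountDao.class);

    private final AccountDao stormpathAccountDao;
    private final AccountDao hibernateAccountDao;

    public MigrationAccountDao(AccountDao stormpathAccountDao, AccountDao hibernateAccountDao) {
        this.stormpathAccountDao = stormpathAccountDao;
        this.hibernateAccountDao = hibernateAccountDao;
    }

    @Override
    public void createAccount(Account account) {
        // Write to both stores so newly created accounts stay in sync during the migration.
        stormpathAccountDao.createAccount(account);
        hibernateAccountDao.createAccount(account);
    }

    @Override
    public Account getAccount(String accountId) {
        // After the backfill, read MySQL first and fall back to Stormpath. The fallback
        // is logged as an error because it indicates a gap in the Stormpath export/import.
        Account account = hibernateAccountDao.getAccount(accountId);
        if (account == null) {
            LOG.error("Account " + accountId + " not found in MySQL; falling back to Stormpath");
            account = stormpathAccountDao.getAccount(accountId);
        }
        return account;
    }
}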
Encryption
Amazon RDS can be configured to be encrypted at rest. This is available on MySQL hosts for db.t2.large and above, or on any Aurora host. Given that we only need db.t2.mediums (see below), it would be cheaper to migrate to Aurora. In addition, according to the docs (http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Overview.Encryption.html#Overview.Encryption.Limitations), once we encrypt a MySQL DB, we won't be able to migrate it to Aurora. With that in mind, the most future-proof solution would be to migrate to Aurora before turning on encryption. See also BRIDGE-1842.
In transit, we use SSL encryption. Note that we're getting SSL cert warnings from Hibernate, and it's unclear whether we're actually using SSL. We will need to address this before we can roll out the new account stuff to Prod. See BRIDGE-1783.
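For reference, one way to make the SSL requirement explicit, assuming the connection is configured through a standard MySQL Connector/J JDBC URL passed to Hibernate. The host, database name, and exact property set shown are placeholders/assumptions and depend on the Connector/J version in use.

// Sketch only (fragment): require SSL on the JDBC connection instead of relying on
// opportunistic negotiation. Uses java.util.Properties; host and DB name are placeholders.
Properties props = new Properties();
props.setProperty("hibernate.connection.url",
        "jdbc:mysql://example-db-host:3306/bridgedb"
        + "?useSSL=true&requireSSL=true&verifyServerCertificate=true");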
Database Scaling
Looking at MetricsInterceptor logs for signIn, signUp, and /v3/users, we peak at 8 requests per minute. This suggests request rate is not our bottleneck.
According to du -sh, the Stormpath export was 1.6gb uncompressed. We should assume the data size in MySQL will be similar. According to https://www.quora.com/How-much-data-is-too-much-data-for-MySQL, we want to aim for having our entire DB fit in 80% of RAM. This means we need at least 2gb of RAM (1.6gb / 0.8). To handle growth, let's shoot for 4gb of RAM.
This suggests we need a db.t2.medium (4gb RAM). Assuming a read replica (for failover and redundancy), this costs 2x$0.082/hr, which is roughly $122/mo.
This is a reasonably sized machine. We probably don't need to partition.
Other Considerations
Enums: MySQL ENUM columns can have new values appended, but existing values must keep the same names and order.
Timestamps: MySQL/Hibernate have poor support for timestamps. In particular, the timezone is ambiguous, and there is no simple way to always force UTC. Since MySQL doesn't support timestamps with timezones anyway, we'll just use long epoch milliseconds.
Email Workflows
- BRIDGE-1838
Can be ported over from https://github.com/Sage-Bionetworks/BridgePF/pull/1429
TODO