Reauthentication Improvements

Goals of our design

1) Once a user signs in using a credential, we want them to be able to stay signed in indefinitely. This holds true even if the client returns outside of any time window we might anticipate. To reauthenticate, we've been issuing a token for the purpose of re-acquiring a session when the user loses it.

2) We don't want the reauthentication token stored in plaintext on the server, since it is effectively a password.

3) The reauthentication token can be lost in transit back to the client, along with the session being returned. When this happens the client is in a "failed reauthentication" state and these improvements are primarily designed to ensure the client can recover from this state.

4) We want to invalidate the existing session when issuing a new session (on sign in). There are some alternatives for how we might accomplish this, but because it's related to sign in, it can be done on a later update (see "Addressing Concurrent Sign In Requests" below).

In the current implementation, when a reauthentication request succeeds but the client fails to get back the session, we create a new reauth token and store the old token in Redis. The client can recover by resending the old reauth token and will get the session, but the session we send back does not include the new reauth token (we don't have it, due to #2 above); we just return the old token in the session. As a result, the user will still have to authenticate again once the cached reauthentication token expires from the cache.

The proposed design would fix this.

Signing in

This doesn't change (it might change if we attempt to address concurrent sign in requests; see below).

User signs in
We create a new session, session token, reauthToken
We store the following Redis mappings:
- sessionToken ↝ userId
- userId ↝ session
return the session with the sessionToken and reauthToken in the session
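
The sign-in steps above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: a plain dict stands in for Redis, the function name `sign_in` and the cache key prefixes are hypothetical, and both cache expiry and persistence of the reauth token hash are left out.

```python
import secrets


def sign_in(cache: dict, user_id: str) -> dict:
    """Create a session and store the two mappings (a dict stands in for Redis)."""
    session_token = secrets.token_urlsafe(32)
    reauth_token = secrets.token_urlsafe(32)
    # The cached session deliberately omits the reauth token.
    stored = {"userId": user_id, "sessionToken": session_token}
    cache[f"sessionToken:{session_token}"] = user_id  # sessionToken -> userId
    cache[f"userId:{user_id}"] = stored               # userId -> session
    # Only the copy returned to the client carries the reauth token.
    return {**stored, "reauthToken": reauth_token}
```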

Authenticating a request

This doesn't change (it might change if we attempt to address concurrent sign in requests; see below).

User makes request with a sessionToken;
Retrieve the userId with the sessionToken (if this fails return 404);
Retrieve the session with the userId (if this fails return 404);
Return the session with this session token (we do not store the reauthentication token in the session);
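
The two-step lookup above might look like the sketch below. It assumes the same hypothetical key layout (`sessionToken:<token>` and `userId:<id>`) in a dict that stands in for Redis, and a `NotFound` exception that stands in for the 404 response.

```python
class NotFound(Exception):
    """Stands in for the 404 response."""


def authenticate(cache: dict, session_token: str) -> dict:
    """Resolve sessionToken -> userId -> session, failing at either step."""
    user_id = cache.get(f"sessionToken:{session_token}")
    if user_id is None:
        raise NotFound("no userId for this session token")
    session = cache.get(f"userId:{user_id}")
    if session is None:
        raise NotFound("no session for this userId")
    # The cached session never contains the reauth token.
    return session
```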

Reauthentication

This is changing so we'll include the success, failure, and concurrent scenarios:

Success

User reauthenticates with the reauthentication token
We retrieve the N most recent records by their creation date, hash the token by the algorithm in each record, and compare to the hashed records looking for a match. A match is an authentication success
If a session exists, we create a new session but keep the session token/internal session token;
If a session does not exist, a new session includes new session token/internal session token;
Persist a new record in the secrets table for the new reauthToken
We store the following Redis mappings:
- sessionToken ↝ userId
- userId ↝ session
return the session with the sessionToken and reauthToken in the session

The session token is not rotated once it exists (the session contents are rebuilt, but the session and internal session tokens are not changed unless the session doesn't exist in the first place because it has expired). If the session token were rotated on each reauth request and the previous token invalidated, one of several concurrent reauthentication requests could be handed a session token that has already been invalidated. The session token still expires after 12 hours regardless of how it is read or updated.
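
The matching step (take the N most recent records, hash the submitted token with each record's algorithm, compare) could be sketched as below. Several things here are assumptions for illustration: only `PBKDF2_HMAC_SHA_256` is handled, the salt is assumed to be stored inside the `hash` column as `<salt hex>$<digest hex>` (the proposed table has no salt column), and the iteration count is kept low so the example runs quickly.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass


@dataclass
class SecretRecord:
    algorithm: str
    hash: str        # assumed format: "<salt hex>$<digest hex>"
    created_on: int


def pbkdf2_hash(token: str, salt: bytes, iterations: int = 1000) -> str:
    digest = hashlib.pbkdf2_hmac("sha256", token.encode(), salt, iterations)
    return f"{salt.hex()}${digest.hex()}"


def matches(token: str, record: SecretRecord) -> bool:
    if record.algorithm != "PBKDF2_HMAC_SHA_256":
        return False  # other algorithms are omitted from this sketch
    salt = bytes.fromhex(record.hash.split("$")[0])
    # Constant-time comparison of the recomputed and stored values.
    return hmac.compare_digest(pbkdf2_hash(token, salt), record.hash)


def verify_reauth(token: str, records: list, n: int) -> bool:
    """True if the token matches any of the n most recently created records."""
    recent = sorted(records, key=lambda r: r.created_on, reverse=True)[:n]
    return any(matches(token, r) for r in recent)
```

Because each record names its own algorithm, records hashed under an older scheme can sit next to newer ones in the same window.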

Failure

User reauthenticates with the reauthentication token
We retrieve the N most recent records by their creation date, hash the token by the algorithm in each record, and compare to the hashed records looking for a match. In this case, there is no match
We return a 404 to the user and we do not rotate the reauthentication tokens.

Concurrent Requests

User reauthenticates with the reauthentication token, then sends a second identical request;
For the first request, we retrieve the N most recent records by their creation date, hash the token by the algorithm in each record, and compare to the hashed records looking for a match. A match is an authentication success;
a new session is prepared for the first request;
a new record in the secrets table is persisted for the first request;
We store the following Redis mappings for the first request:
- sessionToken ↝ userId
- userId ↝ session
We return the first request with this session prepared;
The second request, meanwhile, retrieves the N most recent records, which may or may not include the new row in the secrets table created by the other request, but will still include the desired reauth token row and should lead to another authentication success;
a new session is prepared and the session token from the first request is maintained;
We store the following mappings again, which should be identical except possibly for the session contents:
- sessionToken ↝ userId
- userId ↝ session
We return the second request with this session which looks similar to the first instance that was returned.

If the first request failed to return and was followed by a retry, the steps should look similar to this. We can adjust N to make reauthentication more or less robust to concurrent reauthentication requests (e.g. if we find that clients routinely send 5 requests at once, we could increase N, although a larger N is not desirable because more tokens stay valid at the same time).
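
To make the effect of N concrete, here is a small end-to-end sketch of the reauthentication flow, run sequentially rather than truly concurrently. It is illustrative only: a dict and a list stand in for Redis and the secrets table, plain SHA-256 replaces the per-record algorithm for brevity, a counter stands in for the creation timestamp, and `N = 3` is an arbitrary choice.

```python
import hashlib
import secrets

N = 3  # hypothetical window size


def digest(token: str) -> str:
    # Plain SHA-256 for brevity only; the design hashes by each record's algorithm.
    return hashlib.sha256(token.encode()).hexdigest()


def reauthenticate(cache: dict, table: list, user_id: str, reauth_token: str):
    """Return a session dict, or None when no recent record matches (the 404 case)."""
    recent = sorted((r for r in table if r["userId"] == user_id),
                    key=lambda r: r["createdOn"], reverse=True)[:N]
    if not any(r["hash"] == digest(reauth_token) for r in recent):
        return None  # failure: nothing is rotated
    existing = cache.get(f"userId:{user_id}")
    # Keep the session token if a session exists; otherwise mint a new one.
    session_token = existing["sessionToken"] if existing else secrets.token_urlsafe(32)
    new_reauth = secrets.token_urlsafe(32)
    # len(table) is a stand-in for a creation timestamp.
    table.append({"userId": user_id, "hash": digest(new_reauth), "createdOn": len(table)})
    session = {"userId": user_id, "sessionToken": session_token}
    cache[f"sessionToken:{session_token}"] = user_id
    cache[f"userId:{user_id}"] = session
    return {**session, "reauthToken": new_reauth}
```

With this sketch, a second request carrying the same reauth token gets the same session token and a different, equally valid reauth token. The original token stops matching only after N newer records have been written.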

Sign Out

Delete the userId ↝ session mapping
Delete the sessionToken ↝ userId mapping
Delete the reauth secret records for this user in the secrets table
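
A sketch of the sign-out cleanup, under the same assumptions as elsewhere (a dict for Redis, a list of row dicts for the secrets table, hypothetical key prefixes). It reads the session token from the cached session in order to delete the second mapping; the real code might take it from the request instead.

```python
def sign_out(cache: dict, table: list, user_id: str) -> None:
    """Remove both cache mappings and every secret record for the user."""
    session = cache.pop(f"userId:{user_id}", None)  # userId -> session
    if session is not None:
        cache.pop(f"sessionToken:{session['sessionToken']}", None)  # sessionToken -> userId
    # Deleting all secret rows invalidates every outstanding reauth token.
    table[:] = [row for row in table if row["userId"] != user_id]
```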

Concurrency issues

There are some issues we have identified when the client makes multiple requests to reauthenticate:

Multiple requests lead to a reauthentication token being consumed, and then subsequent reauthentication requests fail. Without keeping the reauth token in plaintext on the server, we issue a unique token on each request, and each token remains valid until N newer tokens have been issued. So any request that is persisted will have a valid reauthentication token;
Multiple requests lead to multiple session tokens being returned, only one of which can be valid. We return the existing session token if it exists, rather than rotating it. The token still expires every 12 hours, or can be deleted with a sign in/sign out operation (along with all valid reauth tokens). So any request that is persisted will have the valid session token;
There were issues with updating outdated versions of the account record when two requests were both writing a reauth token to the account table. By moving the creation of new reauth tokens to a different table, we eliminate 409 responses during reauthentication (unless the health code is missing, no update to the account table occurs).

Persistence

Add this table, along with a DAO to manage writes to it. Possible names: AccountCredential, AccountSecret, AccountToken, Account(Secret)Key... this table could eventually hold other credentials, like passwords or API keys, so I would keep the nomenclature more general.

CREATE TABLE `AccountSecret` (
  `userId` VARCHAR(255) NOT NULL,
  `algorithm` ENUM('STORMPATH_HMAC_SHA_256', 'BCRYPT', 'PBKDF2_HMAC_SHA_256') NOT NULL,
  `hash` VARCHAR(255) NOT NULL,
  `createdOn` BIGINT NOT NULL,
  `sessionToken` VARCHAR(255) NOT NULL, # maybe... not sure we'll ever need to know the pairing
  `type` ENUM('REAUTH_TOKEN') DEFAULT 'REAUTH_TOKEN'
);

Migration

For some amount of time we'll need to read and incorporate the existing reauth token in the Accounts table into the records we load from this new table, and persist back to this new table. Once this is deployed, we can migrate the tokens out of the Accounts table, then remove the 3 columns from Accounts.

We could eventually migrate passwords out this way as well, if it's ever useful.

Addressing Concurrent Sign In Requests

Concurrent sign ins should be rarer because they involve human intervention (enter credentials, click on a link), but could still theoretically happen. There are two approaches to dealing with this.

The simpler option would be to record the timestamp when a session token is created, and reuse that token for a grace period on subsequent or concurrent sign ins.
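
A minimal sketch of the grace-period idea follows. The 30-second window, the record shape, and the function name are all assumptions made for illustration; the design does not specify them.

```python
import secrets
import time

GRACE_SECONDS = 30  # hypothetical grace period


def session_token_for_sign_in(existing, now=None):
    """Reuse a recently created token record, or mint a new one.

    `existing` is None or a dict like {"token": ..., "createdOn": epoch seconds}.
    """
    now = time.time() if now is None else now
    if existing is not None and now - existing["createdOn"] <= GRACE_SECONDS:
        return existing  # concurrent or closely following sign in: same token
    return {"token": secrets.token_urlsafe(32), "createdOn": now}
```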

A more complicated approach would be to issue a new token with each successful sign in. This would solve concurrent sign ins and it would also allow for sign ins on multiple devices (which has been discussed as a capability for Bridge). The logic could be as follows:

issue new session token on each sign in, adding to a set of tokens in the session
on access
- sessionToken ↝ userId
- userId ↝ session
  - is token in session tokens set?
    - NO: not authenticated
    - YES: is there more than one token in session?
      - NO: return session
      - YES: replace set with a set consisting only of this token, write session to cache, return session

Note that with this approach, we can later allow multiple clients to authenticate simultaneously by not stripping out other session tokens. Each token has an expiry due to the first sessionToken ↝ userId lookup, independent of the session expiry. We might also want to tie these session tokens to something like a UA header or an IP address to make it harder to hijack them.
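
The access logic for the token-set approach could be sketched like this. A dict stands in for Redis, the `sessionTokens` field name and the `single_client` flag are hypothetical, and binding tokens to a UA header or IP address is not shown.

```python
def access(cache: dict, session_token: str, single_client: bool = True):
    """Return the session for a token, or None if the caller is not authenticated."""
    user_id = cache.get(f"sessionToken:{session_token}")
    if user_id is None:
        return None
    session = cache.get(f"userId:{user_id}")
    if session is None or session_token not in session["sessionTokens"]:
        return None
    if single_client and len(session["sessionTokens"]) > 1:
        # The first token actually used wins; the others are stripped.
        session["sessionTokens"] = {session_token}
        cache[f"userId:{user_id}"] = session
    return session
```

Passing `single_client=False` corresponds to the multi-client variant, where the other tokens in the set are left in place.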