Goals of our design
1) Once a user signs in using a credential, we want them to be able to stay signed in indefinitely. This holds true even if the client returns outside of any time window we might anticipate. To reauthenticate, we've been issuing a token for the purpose of re-acquiring a session when the user loses it.
2) We don't want the reauthentication token stored in plaintext on the server, since it is effectively a password.
3) The reauthentication token can be lost in transit back to the client, along with the session being returned. When this happens the client is in a "failed reauthentication" state and these improvements are primarily designed to ensure the client can recover from this state.
4) We want to invalidate the existing session when issuing a new session. However in practice, this means that we want to invalidate any existing session token(s) once a new session token has been successfully used by the client to authenticate (that's the only session token we know the client has received).
When a reauthentication request succeeds, but the client fails to get back the session, we create a new reauth token and store the old token in Redis. While the client can recover by resending the old reauth token, and they will get the session, the session we send back does not include the new reauth token (we don't have it due to #2 above). We just return the old token in the session. As a result, at some point, that user will still have to authenticate when the cached reauthentication token expires from cache.
The proposed design would fix this.
Signing in
- User signs in
- We create a new session, session token, reauthToken
- We store the following Redis mappings:
- sessionToken ↝ userId
- userId ↝ (sessionToken) the set of valid session tokens is only this token when user signs in
- userId ↝ session
- return the session with the sessionToken and reauthToken in the session
Alternative sign in options that address concurrent sign ins
Concurrent sign ins should be rarer because they involve human intervention (enter credentials, click on a link), but could still theoretically happen. One approach to dealing with this would be to issue a new token with each successful sign in, following this logic:
- issue new session token on each sign in, adding to a set of tokens in the session
- on access
- sessionToken ↝ userId
- userId ↝ session
- is token in session tokens set?
- NO: not authenticated
- YES: is there more than one token in session?
- NO: return session
- YES: replace set with a set consisting only of this token, write session to cache, return session
Note that with this approach, we can later allow multiple clients to authenticate simultaneously by not stripping out other session tokens. Each token has an expiry due to the first sessionToken ↝ userId lookup, independent of the session expiry.
Another alternative would be to try the userId ↝ session lookup and, if a session already exists, return the session token in the session for some grace period we can record in the session or in the cache (due to network issues, though, "concurrency" issues could spread out over time in unexpected ways).
Authenticating a request
...
Goals of our design
1) Once a user signs in using a credential, we want them to be able to stay signed in indefinitely. This holds true even if the client returns outside of any time window we might anticipate. To reauthenticate, we've been issuing a token for the purpose of re-acquiring a session when the user loses it.
2) We don't want the reauthentication token stored in plaintext on the server, since it is effectively a password.
3) The reauthentication token can be lost in transit back to the client, along with the session being returned. When this happens the client is in a "failed reauthentication" state and these improvements are primarily designed to ensure the client can recover from this state.
4) We want to invalidate the existing session when issuing a new session (on sign in). There are some alternatives for how we might accomplish this, but because it's related to sign in, it can be done on a later update (see "Addressing Concurrent Sign In Requests" below).
In the current implementation, when a reauthentication request succeeds but the client fails to get back the session, we create a new reauth token and store the old token in Redis. The client can recover by resending the old reauth token, and they will get the session, but the session we send back does not include the new reauth token (we don't have it due to #2 above). We just return the old token in the session. As a result, at some point that user will still have to authenticate again, when the cached reauthentication token expires from the cache.
The proposed design would fix this.
Signing in
This doesn't change (it might change if we attempt to address concurrent sign in requests; see below).
- User signs in
- We create a new session, session token, reauthToken
- We store the following Redis mappings:
- sessionToken ↝ userId
- userId ↝ session
- return the session with the sessionToken and reauthToken in the session
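The sign-in steps above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: `FakeCache` is a dict-backed stand-in for Redis, the key prefixes `token:` and `session:` are hypothetical, and the sketch assumes the cached session carries the session token but not the reauth token. Persisting the hashed reauth token is left out.

```python
import secrets

SESSION_TTL_SECONDS = 12 * 60 * 60  # the 12-hour session expiry described in this document


class FakeCache:
    """Dict-backed stand-in for Redis; TTLs are recorded but not enforced."""

    def __init__(self):
        self.values = {}
        self.ttls = {}

    def set(self, key, value, ttl):
        self.values[key] = value
        self.ttls[key] = ttl

    def get(self, key):
        return self.values.get(key)


def sign_in(cache, user_id):
    """Create a session and store the two mappings listed above."""
    session_token = secrets.token_urlsafe(32)
    reauth_token = secrets.token_urlsafe(32)
    # Assumption: the cached copy keeps the session token but never the reauth token.
    session = {"userId": user_id, "sessionToken": session_token}
    cache.set("token:" + session_token, user_id, SESSION_TTL_SECONDS)  # sessionToken -> userId
    cache.set("session:" + user_id, session, SESSION_TTL_SECONDS)      # userId -> session
    # The reauth token is added only to the copy returned to the client.
    return dict(session, reauthToken=reauth_token)
```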
Authenticating a request
This doesn't change (it might change if we attempt to address concurrent sign in requests; see below).
- User makes request with a sessionToken;
- Retrieve the userId with the sessionToken (if this fails return 404);
- Retrieve the set of valid sessionTokens with the userId (if this fails return 404)
- Verify the sessionToken is in the set (if this fails return 404)
- Update the set to include only this sessionToken (if the set only includes this token do nothing)
- Retrieve the session with the userId
- return the session with this session token and whatever reauthentication token is in the session
...
- Return the session with this session token (we do not store the reauthentication token in the session);
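As a rough sketch of the simplest form of this lookup (two cache reads, with either miss treated as "not authenticated"), assuming a plain dict in place of Redis and hypothetical `token:` and `session:` key prefixes. It does not cover the token-set steps.

```python
def authenticate(cache, session_token):
    """Return the session for a session token, or None if either lookup misses."""
    user_id = cache.get("token:" + session_token)   # sessionToken -> userId
    if user_id is None:
        return None
    session = cache.get("session:" + user_id)       # userId -> session
    if session is None:
        return None
    # Return a copy that carries the token the client presented.
    return dict(session, sessionToken=session_token)
```

The caller would map a `None` result to whichever error status the API uses for an unknown or expired token.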
Reauthentication
This is changing so we'll include the success, failure, and concurrent scenarios:
Success
- User reauthenticates with the reauthentication token
- We retrieve the N most recent records by their creation date, hash the token by the algorithm in each record, and compare to the hashed records looking for a match. A match is an authentication success
- We create a new session, sessionToken, reauthToken
- If a session exists, we create a new session but keep the session token/internal session token;
- If a session does not exist, a new session includes new session token/internal session token;
- Persist a new record in the secrets table for the new reauthToken
- We store the following Redis mappings:
- sessionToken ↝ userId
- userId ↝ (sessionToken) add this session token to the set, do not recreate it
- userId ↝ session
- return the session with the sessionToken and reauthToken in the session
Thus, the tokens are rotated by successful reauthentication attempts, not by an expiration time.
To force rotation of the session token, we regenerate it with each reauthentication attempt. (If we copied it over, you could reauthenticate every day just before the session expired, keeping the token indefinitely, which isn't secure). We can issue multiple session tokens and we need to accept any of them, but once the client uses a session token, that is the only token that we will accept. That's the one the client has definitely received.
Sign Out
- userId ↝ () empty the set of valid tokens
- Delete the userId ↝ session mapping
- Delete the reauth secret records for this user in the secrets table
Reauthentication
When the session token is expired, the client can send a reauth token via the reauth API. We retrieve the N most recent records for that user by their creation date (probably N=2, but could be N=3 if this is more robust), hash the token by the algorithm in each record, and compare to the hashed records looking for a match. (Do this intelligently: cache the hash by algorithm and reuse it, since the algorithm is unlikely to change between reauthentications.) If there's a match, we treat this like a sign in: we generate a new session token, persist a new reauth token, and return a new session with these new tokens. If the reauthentication fails on the return trip (the client never receives the new session), the previous token continues to work, because we're comparing against older records as well.
Successful reauthentication
- the user signs in, we create a session token and reauth token, create a new record in the secrets table that includes the reauth token hashed, and we return the session token and reauth token as part of the new session. However, the session is stored without the session token
- Redis expires the session after 12 hours, which renders the session token unusable. The client, on getting a 401, makes a request to the reauthentication API with the reauth token;
- we load the most recent N records from the secrets table. Proceeding through each record:
- we hash the reauth token according to the algorithm in the record, OR reuse a cached version of the hash;
- if the hash does not match, proceed to the next record;
- if no records match, return a 401
- if the record matches, we remove the current session if it's there, then we create a new session token and reauth token, create a new record in the secrets table that includes the reauth token hashed, and we return the session token and reauth token as part of the new session. This means the oldest record in the secrets table will "drop off" on future queries to load the most recent N records from the secrets table
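The record-matching loop above, including reuse of a cached hash per algorithm, might look something like this sketch. The two hashers are stand-ins built from the Python standard library, not the real STORMPATH or BCRYPT implementations, and `SERVER_KEY` is a hypothetical server-side secret. Caching by algorithm only works if the hash does not depend on a per-record salt, which is assumed here.

```python
import hashlib
import hmac

SERVER_KEY = b"example-only-key"  # assumption: one server-side secret, no per-record salt

# Stand-in hashers keyed by the algorithm name stored in each record.
HASHERS = {
    "HMAC_SHA_256": lambda t: hmac.new(SERVER_KEY, t.encode(), hashlib.sha256).hexdigest(),
    "PBKDF2_HMAC_SHA_256": lambda t: hashlib.pbkdf2_hmac(
        "sha256", t.encode(), SERVER_KEY, 1000).hex(),
}


def find_match(records, token, n=2):
    """Return the matching record among the n newest, or None if none match."""
    newest = sorted(records, key=lambda r: r["createdOn"], reverse=True)[:n]
    hashed = {}  # algorithm -> hash of this token, computed at most once
    for record in newest:
        algorithm = record["algorithm"]
        if algorithm not in hashed:
            hashed[algorithm] = HASHERS[algorithm](token)
        # Constant-time comparison of the two hex strings.
        if hmac.compare_digest(hashed[algorithm], record["hash"]):
            return record
    return None
```

A `None` result corresponds to the "no records match" branch; a match would be followed by creating the new tokens and persisting a new record.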
User reauths, fails to receive the session, and reauths again with the same token
Let's assume in the worst case that the client does not get the session back from the reauthentication call.
- the client makes the same request with the (now old) reauth token;
- we load the most recent N records from the secrets table. It includes the old token, now the second oldest record in the system, and so reauthentication succeeds, as above.
- again a new session is created, a new table record is created, and a session is returned to the user. We can recover from this failure as many times as we want to configure, so if N=3, we can fail 2 times and recover the third time. If that's not robust enough, we can switch to N=4 or higher.
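The recovery window can be illustrated with a small in-memory simulation. This is a sketch under simplifying assumptions: `SecretsTable` is a hypothetical stand-in for the secrets table of a single user, and a plain SHA-256 digest replaces the per-record algorithm.

```python
import hashlib
import secrets


def digest(token):
    """Simplified stand-in for hashing a reauth token."""
    return hashlib.sha256(token.encode()).hexdigest()


class SecretsTable:
    """Hashed reauth tokens for one user, oldest first."""

    def __init__(self, n):
        self.n = n
        self.hashes = []

    def issue(self):
        """Create a new reauth token and persist only its hash."""
        token = secrets.token_urlsafe(32)
        self.hashes.append(digest(token))
        return token

    def reauthenticate(self, token):
        """Return a new reauth token if `token` is among the n newest records."""
        if digest(token) not in self.hashes[-self.n:]:
            return None
        return self.issue()


table = SecretsTable(n=3)
original = table.issue()
lost_1 = table.reauthenticate(original)    # succeeds, response lost in transit
lost_2 = table.reauthenticate(original)    # succeeds, response lost again
received = table.reauthenticate(original)  # third attempt still succeeds
too_late = table.reauthenticate(original)  # original has dropped out of the window
```

With N=3 the same token survives two lost responses and works on the third attempt; a fourth attempt with that token fails because three newer records now precede it.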
The session token is not rotated once it exists (the session contents are rebuilt, but the session and internal session tokens are not changed unless the session doesn't exist in the first place because it has expired). If the session token rotated with each reauth request and we removed the validity of the last session token, concurrent reauthentication requests might capture the invalidated session token. The session token still expires after 12 hours regardless of how it is read or updated.
Failure
- User reauthenticates with the reauthentication token
- We retrieve the N most recent records by their creation date, hash the token by the algorithm in each record, and compare to the hashed records looking for a match. In this case, there is no match
- We return a 404 to the user and we do not rotate the reauthentication tokens.
Concurrent Requests
- User reauthenticates with the reauthentication token, then sends a second identical request;
- For the first request, we retrieve the N most recent records by their creation date, hash the token by the algorithm in each record, and compare to the hashed records looking for a match. A match is an authentication success;
- a new session is prepared for the first request;
- a new record in the secrets table is persisted for the first request;
- We store the following Redis mappings for the first request:
- sessionToken ↝ userId
- userId ↝ session
- We return the first request with this session prepared;
- The second request, meanwhile, retrieves the N most recent records, which may or may not include the new row in the secrets table created by the other request, but it will still include the desired reauth token row and should lead to another authentication success;
- a new session is prepared and the session token from the first request is maintained;
- We store the following mappings again, which should be identical except possibly for the session contents:
- sessionToken ↝ userId
- userId ↝ session
- We return the second request with this session which looks similar to the first instance that was returned.
If the first request failed to return and was followed by a retry, the steps should look similar to this. We can adjust N to make reauthentication more or less robust to concurrent reauthentication requests (e.g. if we find out clients routinely send 5 at once, we could increase N, though that is not desirable).
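A minimal sketch of the "keep the existing session token" step follows. It assumes a plain dict in place of Redis, hypothetical `token:` and `session:` key prefixes, and a cached session that carries its own session token. It runs the two requests one after the other; a real implementation would presumably need an atomic check-and-set to handle a true race.

```python
import secrets


def session_for_reauth(cache, user_id):
    """Rebuild the session, reusing the session token when a session exists."""
    existing = cache.get("session:" + user_id)
    if existing is not None:
        session_token = existing["sessionToken"]   # keep the current token
    else:
        session_token = secrets.token_urlsafe(32)  # expired or absent: issue a new one
    session = {"userId": user_id, "sessionToken": session_token}
    cache["session:" + user_id] = session          # userId -> session
    cache["token:" + session_token] = user_id      # sessionToken -> userId
    return session


cache = {}
first = session_for_reauth(cache, "u1")
second = session_for_reauth(cache, "u1")
# Both responses carry the same session token, so either one is usable.
```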
Sign Out
- Delete the userId ↝ session mapping
- Delete the sessionToken ↝ userId mapping
- Delete the reauth secret records for this user in the secrets table
Concurrency issues
There are some issues we have identified when the client makes multiple requests to reauthenticate:
- Multiple requests lead to a reauthentication token being consumed and then subsequent reauthentication requests fail. Without keeping the reauth token in plaintext on the server, we issue a unique token on each request and all tokens will be valid until N tokens are issued. So any request that is persisted will have a valid reauthentication token;
- Multiple requests lead to multiple session tokens being returned, only one of which can be valid. We return the existing session token if it exists, rather than rotating it. The token still expires every 12 hours, or can be deleted with a sign in/sign out operation (along with all valid reauth tokens). So any request that is persisted will have the valid session token;
- There were issues with updating outdated versions of the account record when two requests were both writing a reauth token to the account table. By moving the creation of new reauth tokens to a different table, we eliminate 409 responses during reauthentication (unless the health code is missing, no update to the account table occurs).
Persistence
Add this table, along with a DAO to manage writes to it. Possible names: AccountCredential, AccountSecret, AccountToken, Account(Secret)Key... this table could eventually hold other credentials, like passwords or API keys, so I would keep the nomenclature more general.
CREATE TABLE `AccountSecret` (
`userId` VARCHAR(255) NOT NULL,
`algorithm` ENUM('STORMPATH_HMAC_SHA_256', 'BCRYPT', 'PBKDF2_HMAC_SHA_256') NOT NULL,
`hash` VARCHAR(255) NOT NULL,
`createdOn` BIGINT NOT NULL,
`sessionToken` VARCHAR(255) NOT NULL, # maybe... not sure we'll ever need to know the pairing
`type` ENUM('REAUTH_TOKEN') DEFAULT 'REAUTH_TOKEN'
);
Migration
For some amount of time we'll need to read and incorporate the existing reauth token in the Accounts table into the records we load from this new table, and persist back to this new table. Once this is deployed, we can migrate the tokens out of the Accounts table, then remove the 3 columns from Accounts.
We could eventually migrate passwords out this way as well, if it's ever useful.
Addressing Concurrent Sign In Requests
Concurrent sign ins should be rarer because they involve human intervention (enter credentials, click on a link), but could still theoretically happen. There are two approaches to dealing with this.
The simpler option would be to record the timestamp when a session token is created, and reuse that token for a grace period on subsequent or concurrent sign ins.
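The grace-period option could be sketched as below. The 30-second value and the `state` dict (holding one user's current token and its creation time) are hypothetical, and timestamps are passed in explicitly to keep the example deterministic.

```python
import secrets

GRACE_SECONDS = 30  # hypothetical grace period


def session_token_for_sign_in(state, now):
    """Reuse the recorded token inside the grace period, otherwise issue a new one."""
    if state and now - state["createdOn"] <= GRACE_SECONDS:
        return state["token"]
    state["token"] = secrets.token_urlsafe(32)
    state["createdOn"] = now
    return state["token"]
```

Two sign ins that land within the grace period get the same token; a later sign in replaces it and restarts the clock.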
A more complicated approach would be to issue a new token with each successful sign in. This would solve concurrent sign ins and it would also allow for sign ins on multiple devices (which has been discussed as a capability for Bridge). The logic could be as follows:
- issue new session token on each sign in, adding to a set of tokens in the session
- on access
- sessionToken ↝ userId
- userId ↝ session
- is token in session tokens set?
- NO: not authenticated
- YES: is there more than one token in session?
- NO: return session
- YES: replace set with a set consisting only of this token, write session to cache, return session
Note that with this approach, we can later allow multiple clients to authenticate simultaneously by not stripping out other session tokens. Each token has an expiry due to the first sessionToken ↝ userId lookup, independent of the session expiry. We might also want to tie these session tokens to something like a UA header or an IP address to make it harder to hijack them.
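The access logic for the token-set approach might look like this sketch, with plain dicts standing in for the two cache mappings and a hypothetical `tokens` field on the session. Collapsing the set is what invalidates the other outstanding tokens once the client has used one.

```python
def access(token_to_user, sessions, token):
    """Return the session if the token is valid, collapsing the set to that token."""
    user_id = token_to_user.get(token)                 # sessionToken -> userId
    session = sessions.get(user_id) if user_id is not None else None
    if session is None or token not in session["tokens"]:
        return None                                    # not authenticated
    if len(session["tokens"]) > 1:
        # In-place change stands in for writing the session back to the cache.
        session["tokens"] = {token}
    return session
```

Skipping the collapse step is what would later allow multiple clients to stay signed in at the same time.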