This page started as a collection of notes and design decisions related to implementing OAuth2 in Synapse. Part of the process has been weighing whether to develop our own library or to use an off-the-shelf solution like ORY Hydra. The information on this page may change as the project evolves.
See also: Bruce Hoff's presentation on his preliminary research into OAuth2 and how it relates to Synapse.
More reading: Synapse as OAuth 2.0 Provider
And a Jira Epic: PLFM-4585
Good summary of OIDC: https://github.com/dexidp/dex/blob/master/Documentation/openid-connect.md
Use cases
High level use-cases, per Bruce Hoff's presentation:
- Let third-party (web) apps securely access a user’s data in Synapse. Today such apps must either
  - predownload/embed data,
  - use the app author’s Synapse credentials, or
  - prompt the user for their Synapse credentials
- Let a headless batch job (e.g., a “workflow”) securely access a user’s data in Synapse. Today such a process must either
  - use predownloaded data, or
  - use the job runner’s Synapse credentials
These use cases must guide how we encode scope.
Some questions:
- How granular do we expect scope must be?
- Do Read/Edit/Write permissions cover all use cases?
- Do permissions need to be set on the entity level?
  - e.g. can we afford to make all cases "this external app can read everything you have access to" vs. "this external app can read syn123, a file in project syn999"?
Basic OAuth or OAuth-like flow (authorization code flow only, with no external tools)
Important question: How important is it that we are as-compliant-as-possible to specs like RFC 6749? Do we expect clients to develop their own solutions, or to use existing libraries that expect spec compliance? Are we considering developing our own OAuth client for users? Per this blog post, there are large orgs that aren't entirely up to spec, so we might be able to get away with it too.
To implement the bare minimum to address use cases with an OAuth-like flow, we need
- Endpoints to create, read, list, delete (and optionally, update) clients
- Authorization code generation
Verb | Endpoint | Purpose | Request Object/Params | Response Object/Params | Notes |
---|---|---|---|---|---|
GET | /oauth2/clients/{id} | Get details about one client | Path param: id (an existing OAuth2 Client ID) | OAuth2Client: name: String; redirect_uri: String; client_id: Unique; created_by/on; modified_by/on; supplemental params | The web layer can use this to get details about a client requesting authorization. |
POST | /oauth2/clients | Create a client | OAuth2Client: name: String; redirect_uri: String; supplemental params | OAuth2Client: name: String; redirect_uri: String; client_id: Unique; client_secret: String; created_by/on; modified_by/on; supplemental params | Supplemental params could include the URL to a logo, link to a website for the app, terms of service, etc. Typically the secret key is used to perform these actions, but if we give Synapse users a claim over an OAuth2 client, we could use their credentials. |
DELETE | /oauth2/clients/{id} | Delete a client | Path param: id (an existing OAuth2 Client ID) | None | |
GET | /oauth2/auth | Display the login/consent info to the user and prompt for an accept/reject | URL parameters: response_type: String (always "code"); client_id: Unique; redirect_uri: String (points to OAuth client); scope: String; state: String | Web interface for Synapse authorization. The form should permit login, and we must be able to include the request parameters in a new request. | This endpoint should point to a web layer that can show a UI with a login form and display the access that the user can consent to, along with a prompt for the user to accept/reject. We should think about using login cookies here to simplify the UX if a user is already logged into Synapse. |
POST | /oauth2/auth | The user grants the OAuth2 Client access to protected resources | URL parameters: response_type: String (always "code"); client_id: Unique; redirect_uri: String (points to OAuth client); scope: String; state: String. Body: LoginRequest (already exists) | If login is successful: redirect to redirect_uri (provided in request) with parameters: code: the authorization code; state: the same value as in the request. Should we include a LoginResponse body? (Probably not; if the body is kept in the redirect then we may be exposing a session token to the 3rd party client.) | Who should execute this? The User Agent or the Web Layer on behalf of the user agent? Question: how do we handle this with various Synapse IdPs? (E.g. Synapse users who sign in with Google accounts.) The "state" parameter is designed to avoid CSRF attacks. More info. |
POST | /oauth2/token | Called by a client to get an access token | Body: OAuth2AuthorizationCodeTokenRequest: grant_type: String (always "authorization_code" for this call); code: String (the authorization code); redirect_uri: String (should be the same as previous redirect uri); client_id: Unique; client_secret: String | Body: OAuth2AccessToken: access_token: String; token_type: String ("Bearer"); expires_in: Integer (seconds); refresh_token: String; (optionally, scope) | The redirect URI should be validated here before granting a token, along with the credentials in the request. More info. The token type in almost all OAuth2 cases is "Bearer". We can use a different token type (or make our own) if we want to, but there is probably no need. More info. |
POST | /oauth2/token or /oauth2/token/refresh | Called by a client to refresh an access token | Body: OAuth2AuthorizationCodeRefreshTokenRequest: grant_type: String (always "refresh_token" for this call); refresh_token: String; client_id: Unique; client_secret: String | Body: OAuth2AccessToken: access_token: String; token_type: String ("Bearer"); expires_in: Integer (seconds); refresh_token: String | |
GET | /oauth2/token/introspect or /oauth2/token/info | Clients can determine if an access token is valid (and get scope, if it is opaque in the token) | Body: OAuth2TokenIntrospectionRequest: token: String; client_id: Unique; client_secret: String | Body: OAuth2TokenIntrospectionResponse: active: Boolean; client_id: Unique; username: String (principal of user who authorized); exp: Date (seconds until expiration); scope: String | We must have this if we decide to not include scope with the access token. Note that a GET request with a body is unusual; RFC 7662 defines token introspection as a POST. |
POST | /oauth2/revoke | A logged-in user can revoke OAuth2 client access using this method. | OAuth2RevokeRequest: client_id: Unique. Is there a need for more granularity? | None | Revoking access is not in the core OAuth2 spec (RFC 6749; token revocation is covered separately by RFC 7009), but allowing users to revoke client access may be important. |
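As a minimal sketch of how an OAuth2 client might use the two /oauth2/auth rows above, the Python below builds the authorization request and checks the "state" value when the user agent is redirected back. The host name, client ID, and redirect URI are placeholders invented for illustration; they are not real Synapse values.

```python
import hmac
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical endpoint location; the real host and path are still undecided.
AUTH_ENDPOINT = "https://synapse.example.org/oauth2/auth"


def build_authorization_url(client_id, redirect_uri, scope, state):
    """Build the URL that sends the user agent to GET /oauth2/auth."""
    params = {
        "response_type": "code",  # always "code" in the authorization code flow
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,
    }
    return AUTH_ENDPOINT + "?" + urlencode(params)


def read_callback(callback_url, expected_state):
    """Return the authorization code, rejecting a missing or mismatched state."""
    query = parse_qs(urlparse(callback_url).query)
    returned_state = query.get("state", [""])[0]
    # Constant-time comparison of the state we sent and the state we got back.
    if not hmac.compare_digest(returned_state.encode(), expected_state.encode()):
        raise ValueError("state mismatch: possible CSRF")
    if "code" not in query:
        raise ValueError("no authorization code in callback")
    return query["code"][0]


# The client generates an unguessable state and remembers it for the callback.
state = secrets.token_urlsafe(16)
url = build_authorization_url(
    "client-123", "https://app.example.com/callback", "read", state
)
code = read_callback("https://app.example.com/callback?code=abc&state=" + state, state)
```

The client stores `state` (for example in the user's session) before redirecting, so a callback that was not initiated by that user fails the comparison.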
Diagrams to show where these API endpoints would be used and what objects/params are needed:
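The token exchange step (POST /oauth2/token) can be roughly sketched from the server side. This is an illustration under assumptions, not a proposed implementation: the in-memory dictionaries, the function names, the 60-second code lifetime, the one-hour token lifetime, and the plain SHA-256 secret hashing are all hypothetical stand-ins.

```python
import hashlib
import hmac
import secrets
import time

# Hypothetical in-memory stand-ins for persistent storage.
CLIENT_SECRET_HASHES = {"client-123": hashlib.sha256(b"s3cret").hexdigest()}
AUTH_CODES = {}  # code -> grant details recorded when the user consented


def issue_code(client_id, redirect_uri, scope, ttl_seconds=60):
    """Record a short-lived authorization code after the user grants access."""
    code = secrets.token_urlsafe(16)
    AUTH_CODES[code] = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "expires_at": time.time() + ttl_seconds,
    }
    return code


def exchange_code(request):
    """Validate a token request and return an OAuth2AccessToken-like dict."""
    if request.get("grant_type") != "authorization_code":
        raise ValueError("unsupported_grant_type")
    client_id = request.get("client_id")
    expected = CLIENT_SECRET_HASHES.get(client_id)
    supplied = hashlib.sha256(request.get("client_secret", "").encode()).hexdigest()
    if expected is None or not hmac.compare_digest(expected, supplied):
        raise ValueError("invalid_client")
    # pop() makes the code single-use: a replayed code is no longer present.
    grant = AUTH_CODES.pop(request.get("code"), None)
    if (
        grant is None
        or grant["client_id"] != client_id
        or grant["redirect_uri"] != request.get("redirect_uri")
        or grant["expires_at"] < time.time()
    ):
        raise ValueError("invalid_grant")
    return {
        "access_token": secrets.token_urlsafe(32),
        "token_type": "Bearer",
        "expires_in": 3600,
        "refresh_token": secrets.token_urlsafe(32),
        "scope": grant["scope"],
    }


code = issue_code("client-123", "https://app.example.com/callback", "read")
token = exchange_code({
    "grant_type": "authorization_code",
    "code": code,
    "redirect_uri": "https://app.example.com/callback",
    "client_id": "client-123",
    "client_secret": "s3cret",
})
```

The error strings are the error codes RFC 6749 defines for the token endpoint. A real service would persist codes and tokens and use a dedicated password-hashing scheme for client secrets.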
What is "scope"?
JIRAs(?): PLFM-5170
In short: scopes are clearly defined permissions that a user may grant to an OAuth client.
In ORY Hydra, OAuth clients may be limited in the scope they can request (e.g. a photo printing service (OAuth client) may be restricted to only acquire read-photo permission, even if they attempt to request edit-photo permission). This is not a requirement of our implementation, but it is worth consideration.
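A small sketch of the idea, using the photo printing example: the requested scope is narrowed to what the client is registered for. This is not Hydra's API or necessarily its behavior; the registry and function name are hypothetical, and rejecting the request outright instead of narrowing it would be an equally valid policy.

```python
# Hypothetical registry of the scopes each client may ever be granted.
CLIENT_ALLOWED_SCOPES = {"photo-printer": {"read-photo"}}


def restrict_scope(client_id, requested_scope):
    """Drop requested scope strings that the client is not registered for."""
    allowed = CLIENT_ALLOWED_SCOPES.get(client_id, set())
    kept = []
    for token in requested_scope.split():
        if token in allowed and token not in kept:
            kept.append(token)
    return " ".join(kept)


granted = restrict_scope("photo-printer", "read-photo edit-photo")
print(granted)  # read-photo
```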
From RFC-6749
The authorization and token endpoints allow the client to specify the scope of the access request using the "scope" request parameter. In turn, the authorization server uses the "scope" response parameter to inform the client of the scope of the access token issued.
The value of the scope parameter is expressed as a list of space-delimited, case-sensitive strings. The strings are defined by the authorization server. If the value contains multiple space-delimited strings, their order does not matter, and each string adds an additional access range to the requested scope.
    scope = scope-token *( SP scope-token )
    scope-token = 1*( %x21 / %x23-5B / %x5D-7E )
The authorization server MAY fully or partially ignore the scope requested by the client, based on the authorization server policy or the resource owner's instructions. If the issued access token scope is different from the one requested by the client, the authorization server MUST include the "scope" response parameter to inform the client of the actual scope granted. If the client omits the scope parameter when requesting authorization, the authorization server MUST either process the request using a pre-defined default value or fail the request indicating an invalid scope. The authorization server SHOULD document its scope requirements and default value (if defined).
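The rules quoted above can be sketched in a few lines of Python. The supported scope names and the default scope below are assumptions made for illustration (loosely based on the Read/Edit/Write question earlier on this page), not decisions.

```python
import re

# scope-token = 1*( %x21 / %x23-5B / %x5D-7E ), per the ABNF quoted above.
SCOPE_TOKEN = re.compile(r"[\x21\x23-\x5B\x5D-\x7E]+")

SUPPORTED_SCOPES = {"read", "edit", "write"}  # hypothetical scope names
DEFAULT_SCOPE = {"read"}  # hypothetical default when the client omits scope


def parse_scope(value):
    """Turn a scope parameter into a set of strings, or fall back to the default."""
    if value is None:
        return set(DEFAULT_SCOPE)
    tokens = value.split(" ")  # exactly one space between scope-tokens
    if not all(SCOPE_TOKEN.fullmatch(t) for t in tokens):
        raise ValueError("invalid_scope")
    return set(tokens)  # a set, because order does not matter


def grant_scope(value):
    """Return (granted scopes, extra fields for the token response)."""
    requested = parse_scope(value)
    granted = requested & SUPPORTED_SCOPES  # server policy may ignore part of the request
    if not granted:
        raise ValueError("invalid_scope")
    response = {}
    # Tell the client what it actually got when that differs from what it asked
    # for. Echoing the default when scope was omitted is a choice made here.
    if value is None or granted != requested:
        response["scope"] = " ".join(sorted(granted))
    return granted, response


print(grant_scope("read admin"))  # ({'read'}, {'scope': 'read'})
```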
The actual encoding and representation of scope is not necessarily within the scope of this document (ha ha), but it is something that will likely strongly guide our implementation of OAuth. Naturally, how we design scope should be informed by use cases. We should methodically determine what access we wish to grant OAuth2 clients, as well as how much granularity (both breadth of permissions and Synapse object access) we can/should reasonably encode. Additionally, this should be informed by how we have designed and use ACLs, since this paradigm of authorizing access to content in Synapse is likely to be the driver of OAuth-based content authorization.
An OAuth2 client developer must be able to determine the scope that they need when designing their OAuth2 client service, so it is also critical that we document the scopes we support.
Examples of OAuth2 scope documentation in the wild:
If we choose to be ultra-granular with scope (e.g. only granting read access to one particular file, with a permission structure like read:syn123), we will likely have to store these scopes in a database. One implementation option is to make scope a UUID and store the actual scope information in a database; the downside is that client developers may have to generate scope UUIDs on the fly. This also seems to conflict with the design decisions of other OAuth providers (see external documentation samples above).
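To make the granular option concrete, here is a minimal sketch of how scope strings shaped like read:syn123 could be parsed and checked. The permission names, the rule that a bare permission such as "read" covers every entity, and the function names are all assumptions for illustration. The sketch ignores the Synapse entity hierarchy (e.g. a file inside a project) and ACLs.

```python
import re
from typing import NamedTuple, Optional


class Scope(NamedTuple):
    permission: str
    entity_id: Optional[str]  # None means "everything the user can access"


# Hypothetical format: <permission> or <permission>:<Synapse ID>
_SCOPE_PATTERN = re.compile(r"(read|edit|write)(?::(syn\d+))?")


def parse(token):
    """Parse one scope string such as "read" or "read:syn123"."""
    match = _SCOPE_PATTERN.fullmatch(token)
    if match is None:
        raise ValueError("invalid_scope: " + token)
    return Scope(match.group(1), match.group(2))


def permits(granted_scope, permission, entity_id):
    """Check whether a space-delimited granted scope covers one access."""
    for token in granted_scope.split():
        scope = parse(token)
        if scope.permission == permission and scope.entity_id in (None, entity_id):
            return True
    return False


print(permits("read:syn123", "read", "syn123"))  # True
print(permits("read:syn123", "read", "syn999"))  # False
```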
Worth looking into: UMA
Stack Exchange question that led me towards UMA: https://softwareengineering.stackexchange.com/questions/372526/how-to-handle-per-resource-fine-grained-permissions-in-oauth
The spec: https://docs.kantarainitiative.org/uma/ed/uma-core-2.0-01.html
Possibly of interest is the HEART Working group: https://openid.net/wg/heart/
HEART WG aims to standardize data sharing in healthcare using FHIR, OAuth, OIDC, and UMA. They seem to be more aimed at patients having control over the data they share, but some of their specs may be appropriate for some of our use cases.
ORY Hydra
This is a good place to collect information and research relevant to using ORY Hydra to implement OAuth2 and OIDC in Synapse. This portion of the document is not complete, and may not be completed if we choose not to use ORY Hydra to implement OAuth2.
Why ORY Hydra:
Per Bruce Hoff's preliminary research, ORY Hydra is one of very few (and perhaps the only) established, off-the-shelf OAuth2 + OIDC solutions that delegate the authorization provider and resource provider roles to external services. Since Synapse already has this infrastructure in place (excepting interfaces for the OAuth2 authorization flow), we can use Hydra to handle some of the more complicated parts of implementing OAuth2 and OIDC, and create our own authorization/resource provider services that use existing Synapse infrastructure.
Why NOT ORY Hydra:
To be expanded upon; individual points may be covered in more detail later in this document.
- Possible future maintenance costs, incompatibilities with our infrastructure, and other tech-debt related concerns
- Overkill for our use case?
Does Hydra work with our use cases?
For example, can we implement our proposed scope pattern into Hydra? What other constraints exist in Hydra that may be major roadblocks?
ORY Hydra in the wild
Can we find cases of other engineers using Hydra?
What problems have they run into?
What benefits have they seen that Hydra provides?
What other potentially crucial information about Hydra can we find on the internet?
Decisions and requirements to deploy via Cloudformation
See:
- PLFM-5163
Internal configuration
We should tailor ORY Hydra's configuration to meet our needs.
Database
Per the ORY Hydra docs:
The SQL adapter supports two DBMS: PostgreSQL 9.6+ and MySQL 5.7+. Please note that older MySQL versions have issues with ORY Hydra's database schema. For more information go here.
If my understanding is correct, the DB that Hydra uses is entirely separate from other services, so it should not be a concern here that Synapse currently uses MySQL 5.6.
One concern here is that ORY Hydra requirements may evolve to conflict with other constraints. For example, Hydra may have a security flaw that is only patched by upgrading to a database service that is not provided by Amazon RDS.
Infrastructure Configuration
For reliability, we will want to deploy two instances of ORY Hydra behind a load balancer.
Is Hydra truly stateless? Can we safely configure it behind a load balancer?
How does Hydra "federate" identity providers? Should we configure Hydra differently to interact with Synapse username/password users vs. Synapse users that use Google SSO via OAuth? Or should we delegate that complexity to a Synapse Authentication provider?
Choosing an ELB type
AWS offers three different types of load balancers, described further in this AWS document.
- Application Load Balancer
- Network Load Balancer
- Classic Load Balancer (formerly Elastic Load Balancer)
Also worth looking into (if relevant?) is the Elastic Container Service.
VPC
Per the ORY Hydra docs, ORY Hydra has two ports: a public port and an administrative port. I think the VPC/ELB should forward requests over TLS/443 to the public port.
The administrative port should not be exposed to public internet traffic. If you want to expose certain endpoints, such as the /clients endpoint for OpenID Connect Dynamic Client Registry, you can do so but you need to properly secure these endpoints with an API Gateway or Authorization Proxy.
Do we need this?
Spring Security
Most documentation/blog posts on Spring Security do not refer to our use case. Some posts refer to Spring Security OAuth 2, which is in "maintenance mode" and does not support OIDC. Many posts also instruct using OAuth2/OIDC as a client (we wish to act as a provider).
https://spring.io/blog/2018/01/30/next-generation-oauth-2-0-support-with-spring-security
I think this uses the old version of Spring Security:
This question seems to outline what we want to do:
https://stackoverflow.com/questions/52683165/creating-oauth-2-0-login-provider-with-spring-boot
OIDC is a layer on top of OAuth, so why can we not just implement it on top of the old version of Spring Security? The response_type values involved:
- code
- token
- id_token
- id_token token
- code id_token
- code token
- code id_token token
- none
The old version of Spring Security was not built to handle this. Here is the issue (which has not been resolved at the time of writing): https://github.com/spring-projects/spring-security-oauth/issues/619
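To show what handling these values involves, here is a minimal sketch that validates a response_type parameter against the combinations listed above. It is written in Python for brevity and is not tied to Spring Security or any other library; the function name is hypothetical. Per RFC 6749, the order of space-delimited response type values does not matter, so the combinations are compared as sets.

```python
# The combinations listed above, stored as order-insensitive sets.
SUPPORTED_RESPONSE_TYPES = {
    frozenset(value.split())
    for value in [
        "code",
        "token",
        "id_token",
        "id_token token",
        "code id_token",
        "code token",
        "code id_token token",
        "none",
    ]
}


def parse_response_type(value):
    """Return the requested response types as a set, or reject the request."""
    parts = value.split()
    requested = frozenset(parts)
    # Reject duplicates ("code code") and combinations that are not listed.
    if len(parts) != len(requested) or requested not in SUPPORTED_RESPONSE_TYPES:
        raise ValueError("unsupported_response_type")
    return requested


print(parse_response_type("token id_token") == parse_response_type("id_token token"))  # True
```

Each accepted combination still needs its own issuing logic (which of the code, access token, and ID token are returned, and where).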
The answerer of the SO post also has a blog post that goes more in-depth: https://medium.com/@darutk/full-scratch-implementor-of-oauth-and-openid-connect-talks-about-findings-55015f36d1c3
So how would we use Spring Security 5?
First we need to make sure it can handle all of our needs. Development on OAuth2+OIDC support is ongoing, so it isn't guaranteed that it can currently do what we need it to.
This shows the state of OAuth2 support in Spring Security 5 (and compares it to Spring Security OAuth 2, the old version). Note that this has not been updated since Jan 2018, and I suspect it is out of date.
https://github.com/spring-projects/spring-security/wiki/OAuth-2.0-Features-Matrix
Here are some Spring Boot examples that we could probably leverage:
OAuth2 Authorization Server: https://github.com/spring-projects/spring-security/tree/5.1.1.RELEASE/samples/boot/oauth2authorizationserver
OAuth2 Resource Server: https://github.com/spring-projects/spring-security/tree/5.1.1.RELEASE/samples/boot/oauth2resourceserver
There is no ea
Another library to look into: Connect2ID's OAuth2.0 SDK with OpenID Connect
https://connect2id.com/products/nimbus-oauth-openid-connect-sdk
https://bitbucket.org/connect2id/oauth-2.0-sdk-with-openid-connect-extensions/overview
Apache 2.0 license
This may just help us bootstrap our own solution if Spring doesn't fit our needs. Need to collect more info.