
Synapse as OAuth 2.0 Provider

Introduction

On Tue, Aug 22, 2017 users started complaining that Synapse was 'slow'. The resulting investigation revealed that a web crawler walking over the public content of Synapse had inadvertently triggered a Denial of Service (DoS) event (see PLFM-4568). Synapse uses throttling to both prevent and resolve DoS events. However, throttling was not an option for this event because all of the calls were made anonymously (see: PLFM-4586). To address this issue we need to ensure that session identifiers are provided for all calls, including anonymous calls.


Since we are going to change how sessions are created in Synapse, it seemed like a good time to reevaluate the Synapse authentication services. Specifically, should Synapse adopt an industry standard for web services authentication such as OAuth 2.0? The following epic is used to track all things related to OAuth 2.0: PLFM-4585.

What is OAuth 2.0

"The OAuth 2.0 authorization framework enables a third-party application to obtain limited access to an HTTP service, either on behalf of a resource owner by orchestrating an approval interaction between the resource owner and the HTTP service, or by allowing the third-party application to obtain access on its own behalf." – RFC 6749

For the full specification see: RFC 6749 The OAuth 2.0 Authorization Framework.

Roles

OAuth 2.0 defines four roles.  For Synapse these roles are assigned as follows:

  • resource owner - A Synapse end-user that has been granted access to a resource within Synapse such as a file.
  • resource server - The Synapse REST API services that provide access to a resource.  The majority of Synapse REST API services fall into this category.
  • client - Any client software designed to communicate with the Synapse REST API services.
  • authorization server - The subset of Synapse REST API services that perform authorization checks.

Protocol Flow

Synapse Protocol Flow

A core feature of Synapse is to act as a file store.  Each file is a resource owned by the end-user that uploaded it.  Resource owners control access to a resource by adding or removing users or teams from the resource's Access Control List (ACL).  The following steps are required to access a resource in Synapse:

  1. The identity of the caller must be confirmed by the process of Authentication.
  2. A check is then made to determine if the identified caller has access to the resource by the process of Authorization.
  3. Upon successful completion of both Authentication and Authorization, the end-user gains access to the resource.

In Synapse, step one is achieved by calling POST /login to acquire a session token.  The session token only identifies the caller; it does not authorize the holder to do anything.  Instead, an authorization check occurs each time the session token is provided in the header of a call to access a resource, such as GET /fileHandle/{handleId}/url.
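The three steps can be illustrated with a small in-memory sketch. This is a toy model only, not Synapse code: the class SynapseSketch, its method names, and the example.invalid URL are hypothetical, and real calls would go over HTTP to POST /login and GET /fileHandle/{handleId}/url.

```python
import secrets


class SynapseSketch:
    """Toy in-memory model of the session-token flow (hypothetical, not Synapse code)."""

    def __init__(self):
        self.passwords = {}  # username -> password
        self.sessions = {}   # session token -> username
        self.acls = {}       # resource id -> set of usernames with access

    def login(self, username, password):
        # Step 1 (Authentication): stands in for POST /login.
        if username not in self.passwords or self.passwords[username] != password:
            raise PermissionError("authentication failed")
        token = secrets.token_hex(16)
        self.sessions[token] = username
        return token

    def get_file_url(self, session_token, handle_id):
        # Stands in for GET /fileHandle/{handleId}/url with the token in a header.
        user = self.sessions.get(session_token)
        if user is None:
            raise PermissionError("unknown session token")
        # Step 2 (Authorization): repeated on every resource request.
        if user not in self.acls.get(handle_id, set()):
            raise PermissionError("not authorized")
        # Step 3: access is granted.
        return f"https://example.invalid/file/{handle_id}"
```

Note that the session token returned by login carries no permissions of its own; the ACL lookup inside get_file_url is what grants or denies access.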

OAuth 2.0 Protocol Flow

The OAuth 2.0 Protocol Flow is fundamentally different from the Synapse protocol flow.  The OAuth protocol does not include an equivalent to a Synapse session token.  Instead, callers request Access Tokens from an authorization server.  The authorization check occurs when an access token is acquired.  Therefore, when an access token is presented to a resource server, no further authorization check is required.  Instead, the resource server is expected to provide access to the resource if the provided access token is valid.
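A minimal sketch of this split, under stated assumptions: the token is an HMAC-signed string, the two servers share a secret, and each token is scoped to a single resource. RFC 6749 does not prescribe a token format, so the format, the SECRET value, and the function names below are all hypothetical.

```python
import hashlib
import hmac
import time

SECRET = b"demo-shared-secret"  # hypothetical key shared by both servers


def _sign(payload):
    return hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()


def issue_access_token(user, resource_id, acl, ttl_seconds=300, now=None):
    """Authorization server: the ACL check happens here, once."""
    if user not in acl.get(resource_id, set()):
        raise PermissionError("not authorized")
    now = time.time() if now is None else now
    payload = f"{resource_id}|{int(now) + ttl_seconds}"
    return f"{payload}|{_sign(payload)}"


def serve_resource(access_token, resource_id, now=None):
    """Resource server: validates the token only; it never consults the ACL."""
    payload, _, signature = access_token.rpartition("|")
    if not hmac.compare_digest(signature, _sign(payload)):
        raise PermissionError("invalid token")
    token_resource, _, expires = payload.rpartition("|")
    now = time.time() if now is None else now
    if token_resource != resource_id or now >= int(expires):
        raise PermissionError("token expired or issued for another resource")
    return f"contents of {resource_id}"
```

In this sketch serve_resource has no access to the ACL at all, which is the decoupling described above: a valid, unexpired token is sufficient.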

Synapse VS OAuth 2.0

The main advantage of the OAuth 2.0 approach is that authorization is completely decoupled from resource access.  In Synapse, authorization is integral to each and every web service request.  The main disadvantage of OAuth 2.0 is that it takes at least two web service requests to do anything: one call to get an access token and another to get the actual resource.

Requiring two calls to gain access to a resource might not be as bad as it sounds.  Here is an extreme example where it could actually be a performance boost.  The GET /entity/{id}/bundle service allows a caller to gain access to fourteen different aspects of an Entity in a single call.  In this case, each aspect is actually its own service, each with its own integrated authorization.  This means a single call to GET /entity/{id}/bundle can trigger at least fourteen separate and redundant authorization checks.  If there were a clean separation of authorization and resource access, then only a single authorization check would be required for this call.  So, is the price of the extra web service request to get an access token less than the price of the fourteen redundant authorization checks?
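The difference in the number of checks can be made concrete with a counting sketch. It only counts checks and says nothing about their real cost; the aspect names are placeholders, and CountingAuthorizer, bundle_coupled, and bundle_decoupled are hypothetical names, not Synapse services.

```python
class CountingAuthorizer:
    """Stand-in authorization service that records how often it is asked."""

    def __init__(self):
        self.checks = 0

    def check(self, user, entity_id):
        self.checks += 1
        return True  # always allow; only the call count matters here


# Placeholder names for the fourteen aspects of an entity bundle.
ASPECTS = [f"aspect_{i}" for i in range(1, 15)]


def bundle_coupled(auth, user, entity_id):
    """Each aspect service performs its own authorization check."""
    bundle = {}
    for aspect in ASPECTS:
        if not auth.check(user, entity_id):
            raise PermissionError("not authorized")
        bundle[aspect] = f"{aspect} of {entity_id}"
    return bundle


def bundle_decoupled(auth, user, entity_id):
    """Authorization is checked once, then every aspect is fetched."""
    if not auth.check(user, entity_id):
        raise PermissionError("not authorized")
    return {aspect: f"{aspect} of {entity_id}" for aspect in ASPECTS}
```

Both functions return the same bundle; the coupled version records fourteen checks and the decoupled version records one.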

Decoupling authorization from all Synapse service calls would be a monumental task.  We might want to consider a hybrid approach that slowly transitions to the OAuth 2.0 Protocol Flow.

OAuth Client Types

OAuth 2.0 defines two client types.  A client's type determines what the client is allowed to do.  For Synapse, the client types are assigned as follows:

  • confidential - Clients developed by Sage Bionetworks specifically to communicate with the Synapse REST API.  These are clients written by us, so we can guarantee that a user's credentials remain confidential.  Confidential clients are allowed to gather and use a user's credentials (username & password).  Confidential clients are limited to the following:
    • Synapse Web Client (synapse.org) via Synapse Java Client.
    • Synapse Python client
    • Synapse R client
  • public - Clients developed by 3rd parties.  These are clients that we have no control over, so we cannot guarantee that a user's credentials remain confidential.  Public clients are not allowed to gather or use a user's private credentials (password or API key).

OAuth 2.0 defines separate services for confidential and public clients.

OAuth Authorization Use Cases

The OAuth 2.0 specification covers four main authorization use cases for obtaining access tokens:

  • 4.1. Authorization Code Grant - Supports authorization by confidential web clients such as www.synapse.org (see: PLFM-4590).
  • 4.2. Implicit Grant - Supports authorization by public (3rd party) web clients (see: PLFM-4591).  Not sure why we would do this rather than simply asking all 3rd party web-based clients to use the Authorization Code Grant.
  • 4.3. Resource Owner Password Credentials Grant - Supports authorization by confidential non-web clients using the resource owner's credentials (see: PLFM-4592).
  • 4.4. Client Credentials Grant - Supports access by a trusted client using the client's credentials and not the resource owner's credentials (see PLFM-4593).  Examples in Synapse would be CloudMailIn (when sending an email via a web request) and the Synapse Docker registry (when sending event notifications).

Note: Both 4.1 and 4.2 require the client to have a web page that the authorization server can redirect a browser to upon success.
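To make the four grants more concrete, the sketch below builds the form-encoded parameters that RFC 6749 defines for each one. The parameter names and grant_type values come from the RFC (sections 4.1.3, 4.2.1, 4.3.2 and 4.4.2); the function names are hypothetical, client authentication is omitted, and nothing here reflects endpoints that Synapse actually exposes.

```python
from urllib.parse import urlencode


def authorization_code_request(code, redirect_uri, client_id):
    # 4.1.3: token request that exchanges an authorization code for a token.
    return urlencode({
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,
        "client_id": client_id,
    })


def implicit_authorization_query(client_id, redirect_uri):
    # 4.2.1: there is no token request; the token is returned from the
    # authorization endpoint when response_type is "token".
    return urlencode({
        "response_type": "token",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
    })


def password_request(username, password):
    # 4.3.2: the client sends the resource owner's credentials directly.
    return urlencode({
        "grant_type": "password",
        "username": username,
        "password": password,
    })


def client_credentials_request(scope=None):
    # 4.4.2: the client acts on its own behalf; scope is optional.
    params = {"grant_type": "client_credentials"}
    if scope:
        params["scope"] = scope
    return urlencode(params)
```

Only the first two builders take a redirect_uri, which matches the note above that 4.1 and 4.2 depend on a browser redirect back to the client.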