Service API Design

Current APIs

See Repository Service API for full examples of requests and responses. What follows are merely examples of the URL patterns for accessing the service using the HTTP methods GET to read an entity, POST to create an entity, PUT to update an entity, and DELETE to delete an entity.

Proposed APIs

Proposed API for EULAs

EULA = end user license agreement

Assumptions

One EULA might be used by many datasets
Some datasets may have no EULA at all
There is no way to agree to EULAs in bulk; agreement must be given once per dataset, unless the user has write access to the dataset (still to be confirmed). (We don't want our scientists to have to agree to the EULA every time they want to download a new dataset they just created.)

EULA entity

A EULA is yet another entity (node) in the repository service and has the following additional fields:

{
  "name": "TCGA Redistribution Use Agreement",
  "agreement": "The recipient acknowledges that the data herein is provided by TCGA and not SageBionetworks and must abide by ..."
}

The entity kind is 'eula':

POST/GET/PUT/DELETE /repo/v1/eula/#

Dataset to EULA mapping

Datasets will have a new field called eulaId which may be null.

Agreements: User to EULA mapping

There will be a table to record when users agree to EULAs.
The structure of this table will be:

datasetId	datasetVersionId	eulaId	eulaVersionId	userId	agreementDate

Downloading Data

When a user tries to get a location for a dataset or layer:

If ((the user does not have write access to the dataset) AND (the dataset has a eulaId))
- look in the agreement table for an agreement
  - If there is no agreement, return an AgreementNeeded exception
Otherwise return the location
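
The check above can be sketched in a few lines. This is only an illustration under stated assumptions: the Dataset class, the writers set, and the agreements set of (datasetId, eulaId, userId) tuples are hypothetical in-memory stand-ins for the real entity, access-control list, and agreement table.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


class AgreementNeeded(Exception):
    """Raised when the user must agree to the dataset's EULA first."""


@dataclass
class Dataset:
    id: str
    location: str
    eula_id: Optional[str] = None  # None when the dataset has no EULA


def get_location(user_id: str, dataset: Dataset, writers: Set[str],
                 agreements: Set[Tuple[str, str, str]]) -> str:
    """Return the dataset location, or raise AgreementNeeded.

    writers: ids of users with write access to the dataset (hypothetical).
    agreements: (dataset_id, eula_id, user_id) tuples (hypothetical).
    """
    needs_check = user_id not in writers and dataset.eula_id is not None
    if needs_check and (dataset.id, dataset.eula_id, user_id) not in agreements:
        raise AgreementNeeded(
            f"user {user_id} must agree to EULA {dataset.eula_id}")
    return dataset.location
```

Users with write access and datasets without a eulaId both skip the agreement lookup entirely, which matches the two conditions in the rule.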

Making an Agreement

To create the agreement

POST /repo/v1/agreement
{
  "datasetId":"123",
  "eulaId":"456"
}

Note that values for userId, datasetVersionId, eulaVersionId, and agreementDate will all be set by the system at agreement time. Therefore a user must be authenticated and can only create agreements for themselves.
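
As a rough sketch of how the server side might fill in those fields: the function name and its parameters below are hypothetical, and rejecting client-supplied values for system fields is just one possible policy (silently ignoring them would be another).

```python
from datetime import datetime, timezone
from typing import Dict, Optional

# The only fields the client is allowed to send in the POST body.
CLIENT_FIELDS = {"datasetId", "eulaId"}


def create_agreement(body: Dict[str, str],
                     authenticated_user_id: Optional[str],
                     dataset_version_id: str,
                     eula_version_id: str) -> Dict[str, str]:
    """Build an agreement record; system fields never come from the client."""
    if authenticated_user_id is None:
        raise PermissionError("You must be logged in to create an agreement")
    unexpected = set(body) - CLIENT_FIELDS
    if unexpected:
        raise ValueError(f"fields set by the system: {sorted(unexpected)}")
    missing = CLIENT_FIELDS - set(body)
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")
    return {
        "datasetId": body["datasetId"],
        "datasetVersionId": dataset_version_id,  # current version, looked up by the service
        "eulaId": body["eulaId"],
        "eulaVersionId": eula_version_id,        # current version, looked up by the service
        "userId": authenticated_user_id,         # always the caller, never from the body
        "agreementDate": datetime.now(timezone.utc).isoformat(),
    }
```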

Determining in advance whether a user needs to agree:

/repo/v1/query?query=select * from agreement where datasetId=='123' and eulaId=='456' and userId=='789'
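
The query text contains spaces, quotes, and other reserved characters, so a client would need to percent-encode it before sending. A minimal sketch, assuming the query is passed in a single query parameter named query as shown above (the helper name is hypothetical):

```python
from urllib.parse import urlencode


def build_query_url(query: str, base: str = "/repo/v1/query") -> str:
    """Return the query URL with the query text safely percent-encoded."""
    return base + "?" + urlencode({"query": query})
```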

Proposed API for Projects

This work is done.

Get all projects
- GET http://platform.sagebase.org/repo/v1/project

Get primary fields for a project

GET http://platform.sagebase.org/repo/v1/project/812

Response

{ "creationDate": "1/1/2011", "leaderUserId": "david.burdick@sagebase.org", "numberOfMembers": 1, "numberOfPublications": 0, "status": "Active", "projectWebsite": "http://daveproj.com", "description": "project description text" }

Create a new project

POST http://platform.sagebase.org/repo/v1/project

Transfer Object

{ "leaderUserId": "david.burdick@sagebase.org", "numberOfMembers": 1, "numberOfPublications": 0, "status": "Active", "projectWebsite": "http://daveproj.com", "description": "project description text" }

Update project
- PUT http://platform.sagebase.org/repo/v1/project/812
Delete a project
- DELETE http://platform.sagebase.org/repo/v1/project/812
Get all annotations for a project
- GET http://platform.sagebase.org/repo/v1/project/812/annotations
Update annotations for a project
- PUT http://platform.sagebase.org/repo/v1/project/812/annotations
Get all Actions for a project (history)
- GET http://platform.sagebase.org/repo/v1/project/812/actions
Create a new Action for a project
- POST http://platform.sagebase.org/repo/v1/project/812/actions
Get all Analyses for a project
- GET http://platform.sagebase.org/repo/v1/project/812/analyses
Get all Datasets for a project
- GET http://platform.sagebase.org/repo/v1/project/812/datasets

Proposed API for Actions

Create happens from several other areas (e.g. a Project)
Update an Action
- PUT http://platform.sagebase.org/repo/action/99
Delete an Action
- DELETE http://platform.sagebase.org/repo/action/99

Proposed API for Analyses

Create happens from a project; thus an analysis is always owned by a project
Update an Analysis
- PUT http://platform.sagebase.org/repo/analysis/88
Delete an Analysis
- DELETE http://platform.sagebase.org/repo/analysis/88

Proposed API for downloading datasets or layers

This is mostly done, except for enforcing authentication and authorization along with the EULA.

Get a location, user is not logged in

GET http://platform.sagebase.org/repo/v1/location/511

HTTP/1.1 401 Unauthorized
Content-Type: application/json
Date: Fri, 11 Feb 2011 19:03:18 GMT
Server: Google Frontend
Cache-Control: private, x-gzip-ok=""
Transfer-Encoding: chunked

{"reason":"You must be logged in to access data"}

Get a location, user has not yet been granted access

GET http://platform.sagebase.org/repo/v1/location/511

HTTP/1.1 403 Forbidden
Content-Type: application/json
Date: Fri, 11 Feb 2011 19:03:18 GMT
Server: Google Frontend
Cache-Control: private, x-gzip-ok=""
Transfer-Encoding: chunked

{"reason":"You are not authorized to access this resource"}

Get a location, user has previously been granted access

GET http://platform.sagebase.org/repo/v1/location/511

HTTP/1.1 200 OK
Content-Type: application/json
Date: Fri, 11 Feb 2011 19:03:18 GMT
Server: Google Frontend
Cache-Control: private, x-gzip-ok=""
Transfer-Encoding: chunked

{
    "path":"http://data01.sagebase.org.s3.amazonaws.com/tcga_curation_pacakge.tar.gz?AWSAccessKeyId=44CF9SAMPLEF252F707&Expires=1177363698&Signature=vjSAMPLENmGa%2ByT272YEAiv4%3D",
    "md5sum":"3a4460b6378bea1509954b6c13d84387",
    "type":"awss3"
}
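
Since the location response carries an md5sum, a client could use it to check that the download arrived intact. A minimal sketch, assuming the checksum is the hex MD5 digest of the file contents; the function names are hypothetical:

```python
import hashlib
from typing import BinaryIO, Dict


def md5_of_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Compute the hex MD5 digest of a binary stream without loading it all."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_download(stream: BinaryIO, location: Dict[str, str]) -> bool:
    """Compare the downloaded bytes against the md5sum in the location response."""
    return md5_of_stream(stream) == location["md5sum"].lower()
```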

Proposed API for Free Text Search

Free Text Search datasets
- GET http://platform.sagebase.org/repo/v1/search?q=aging&type=dataset
- This is a free-text search for "aging". We will also need to support structured queries against specific fields (e.g. for tissue = "Brain", the service returns any datasets where tissue is brain or a subclass of brain in an ontology).
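
One way to picture the ontology-aware structured query is to expand the requested term into itself plus all of its subclasses, then match datasets against that set. The sketch below assumes the ontology is available as a hypothetical mapping from each term to its direct subclasses; how the real service would store or query the ontology is not decided here.

```python
from typing import Dict, List, Set


def descendants(term: str, children: Dict[str, List[str]]) -> Set[str]:
    """Return the term together with all of its transitive subclasses."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guards against cycles in a malformed ontology
        seen.add(current)
        stack.extend(children.get(current, []))
    return seen


def search_by_tissue(datasets: List[dict], tissue: str,
                     children: Dict[str, List[str]]) -> List[dict]:
    """Return datasets whose tissue is the term or one of its subclasses."""
    accepted = descendants(tissue, children)
    return [d for d in datasets if d.get("tissue") in accepted]
```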

Proposed Annotation Type API

JSON Schema Resources

Example schemas: http://json-schema.org/
Draft JSON Schema Spec: http://tools.ietf.org/html/draft-zyp-json-schema-03
Schema to Java POJO code generator: http://code.google.com/p/jsonschema2pojo/
Java POJO to Schema generator: http://wiki.fasterxml.com/JacksonJsonSchemaGeneration
Google is using JSON schema to describe its APIs
- http://www.pcworld.com/businesscenter/article/227477/google_launches_discovery_service_for_its_apis.html
- http://code.google.com/apis/discovery/

Positive Integer Annotation

POST /repo/v1/annotationtype
      {
         "name":"numberOfSamples",
         "displayName":"Number of Samples",
         "schema":{
            "description":"The number of samples in a dataset layer.",
            "type":"integer",
            "minimum":1
         }
      }

Enumerated Type Annotation

POST /repo/v1/annotationtype
      {
         "name":"status",
         "displayName":"Status",
         "schema":{
            "description":"The status of a dataset.",
            "type":"string",
            "enum":[
               "unknown",
               "pending",
               "curated",
               "QCed"
            ],
            "default":"unknown"
         }
      }

Very Long String Annotation

POST /repo/v1/annotationtype
      {
         "name":"curationNotes",
         "displayName":"Curation Notes",
         "schema":{
            "description":"The free text notes on curation that did not have a proper home in the ISA-Tab representation of the dataset metadata.",
            "type":"string",
            "maxLength":4096
         }
      }

Ontology Annotation

POST /repo/v1/annotationtype
      {
         "name":"tissueType",
         "displayName":"Tissue Type",
         "schema":{
            "description":"The type of tissue in a dataset layer.",
            "type":"string",
            "format":"TODO URL to the ontology we are using for this"
         }
      }

Array of Values Annotation

POST /repo/v1/annotationtype
      {
         "name":"tissueTypes",
         "displayName":"Tissue Types",
         "schema":{
            "description":"The types of tissue in a dataset.",
            "type":"array",
            "items":{
               "type":"string",
               "format":"TODO URL to the ontology we are using for this"
            }
         }
      }
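
To illustrate how the service might use these schemas when an annotation value is submitted, here is a small hand-rolled checker. It is a sketch only: it covers just the keywords used in the examples above (type, minimum, enum, maxLength, items), it ignores format, and it is not a complete JSON Schema implementation. The function name is hypothetical.

```python
from typing import Any, Dict, List


def validate(value: Any, schema: Dict[str, Any]) -> List[str]:
    """Return a list of error messages; an empty list means the value is valid."""
    errors: List[str] = []
    kind = schema.get("type")
    if kind == "integer":
        # bool is a subclass of int in Python, so exclude it explicitly
        if not isinstance(value, int) or isinstance(value, bool):
            return [f"expected integer, got {value!r}"]
        if "minimum" in schema and value < schema["minimum"]:
            errors.append(f"{value} is below minimum {schema['minimum']}")
    elif kind == "string":
        if not isinstance(value, str):
            return [f"expected string, got {value!r}"]
        if "maxLength" in schema and len(value) > schema["maxLength"]:
            errors.append(f"string is longer than {schema['maxLength']} characters")
    elif kind == "array":
        if not isinstance(value, list):
            return [f"expected array, got {value!r}"]
        for index, item in enumerate(value):
            for error in validate(item, schema.get("items", {})):
                errors.append(f"item {index}: {error}")
    if "enum" in schema and value not in schema["enum"]:
        errors.append(f"{value!r} is not one of {schema['enum']}")
    return errors
```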

Proposed API for Batch Requests

Ultimately we would like to support a full-on batch implementation similar to https://developers.facebook.com/docs/api/batch/ that would allow fully independent requests to be issued in batch

BATCH REQUEST

curl \
    -F 'access_token=…' \
    -F 'batch=[ \
          { "method": "POST", \
            "relative_url": "me/feed", \
            "body": "message=Test status update&link=http://developers.facebook.com/" \
          }, \
          { "method":"GET", \
            "relative_url":"me/feed?limit=1" \
          } \
        ]'\
    https://graph.facebook.com

BATCH RESPONSE

[
    { "code": 200,
      "headers": [
          { "name":"Content-Type", 
            "value":"text/javascript; charset=UTF-8"}
       ],
      "body":"{\"id\":\"…\"}"
    },
    { "code": 200,
      "headers": [
          { "name":"Content-Type", 
            "value":"text/javascript; charset=UTF-8"
          },
          { "name":"ETag", 
            "value": "…"
          }
      ],
      "body": "{\"data\": [{…}]}"
    }
]

We have an immediate need for one particular API to support batch requests and, in the interest of time, could go with something simpler:

BATCH REQUEST

GET /repo/v1/entity/type?batch=123,456,789

BATCH RESPONSE

{
  "paging": {},
  "results": [
              {
               "id":"123",
               "name":"Example Dataset 1",
               "type":"/dataset"
              },
              {
               "id":"456",
               "name":"Example Dataset 2",
               "type":"/dataset"
              },
              {
               "id":"789",
               "name":"Example Dataset 3",
               "type":"/dataset"
              }
             ],
  "totalNumberOfResults": 3
}

And so on:

GET /entity/type?batch=123,456,789

GET /entity/annotations?batch=123,456,789

GET /entity/s3Token?batch=123,456,789

GET /entity/acl?batch=123,456,789

Each returning PaginatedResults<T> where T is whatever /entity/<the id>/<the suffix> would return.
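
A rough sketch of the simpler batch form: split the batch parameter into ids, fetch each one with whatever the single-entity call would use, and wrap the results in the paginated shape shown above. The function names and the fetch_one callback are hypothetical, and skipping ids that are not found is an assumption; the service might prefer to fail the whole request instead.

```python
from typing import Callable, Dict, List, Optional


def parse_batch(param: str) -> List[str]:
    """Split a comma-separated batch parameter into ids, dropping empty entries."""
    return [part.strip() for part in param.split(",") if part.strip()]


def batch_get(param: str,
              fetch_one: Callable[[str], Optional[dict]]) -> Dict[str, object]:
    """Fetch every id in the batch and wrap the results in a paginated envelope."""
    results = []
    for entity_id in parse_batch(param):
        item = fetch_one(entity_id)
        if item is not None:  # assumption: unknown ids are skipped
            results.append(item)
    return {
        "paging": {},
        "results": results,
        "totalNumberOfResults": len(results),
    }
```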

Requirements and Design Goals

Strongly Recommended

Different clients should not need different APIs.
- R and the web UI should be able to call the same service APIs, although the encoding of the request body and result might differ, with differences indicated by HTTP headers such as Content-Type and Accept.
Read-only requests should be GET requests. This makes them cacheable and bookmark-able.
- Requests that are not read-only should not be GET requests.
All resources in the system should be uniquely identifiable by a particular URL.
- We should work hard to make sure the mapping of URL to resource is immutable.
- This URL will return the most recent version. For discussion: An additional query parameter would be used to specify a particular version? Or do we add more to the path to specify the version?
- The idea is that these resource URLs will occur in publications, hence the need for immutability.
APIs must be idempotent.
- For resource creation requests, include a request ID if there is no other field in the request that could be used to determine whether the creation request is a duplicate.
The primary data format for responses will be JSON.
- We may support additional formats on an as-needed basis, such as JSONP, ATOM (XML), RSS 2.0 (XML), and RDF.
- For any data that might be valuable in mashups, we should strongly consider adding support for JSONP.
All APIs that could potentially return more than ~5 items should support pagination parameters.
All responses returning resources and/or modifying resources should include an ETag, an opaque identifier for a specific version of a resource.
Update requests include the ETag via the If-Match header for concurrency control.
The API should carry a version number.
All components in the software stack should speak UTF-8
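
Two of the goals above, idempotent creation via a request ID and ETag-based concurrency control via If-Match, can be sketched together with an in-memory store. Everything here is hypothetical: the class and method names, and in particular deriving the ETag from a hash of the resource content, which is only one of several ways a server could generate one. A failed If-Match check conventionally maps to HTTP 412 Precondition Failed.

```python
import hashlib
import json
from typing import Dict, Tuple


class PreconditionFailed(Exception):
    """The supplied If-Match value is stale (would map to HTTP 412)."""


class ResourceStore:
    def __init__(self) -> None:
        self._resources: Dict[str, dict] = {}
        self._by_request_id: Dict[str, str] = {}
        self._next_id = 1

    @staticmethod
    def etag_of(resource: dict) -> str:
        # Assumption: the ETag is a hash of the canonical JSON encoding.
        canonical = json.dumps(resource, sort_keys=True).encode("utf-8")
        return hashlib.sha1(canonical).hexdigest()

    def create(self, request_id: str, body: dict) -> Tuple[str, str]:
        """Create a resource; repeating a request ID returns the original."""
        if request_id in self._by_request_id:
            resource_id = self._by_request_id[request_id]
            return resource_id, self.etag_of(self._resources[resource_id])
        resource_id = str(self._next_id)
        self._next_id += 1
        self._resources[resource_id] = dict(body)
        self._by_request_id[request_id] = resource_id
        return resource_id, self.etag_of(self._resources[resource_id])

    def update(self, resource_id: str, body: dict, if_match: str) -> str:
        """Replace a resource only if the caller holds its current ETag."""
        current = self.etag_of(self._resources[resource_id])
        if if_match != current:
            raise PreconditionFailed("resource was modified by someone else")
        self._resources[resource_id] = dict(body)
        return self.etag_of(self._resources[resource_id])
```

A client that retries a timed-out creation with the same request ID gets the same resource back rather than a duplicate, and a client holding a stale ETag is told to re-read before updating.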

Nice To Have

Partial Responses: query parameters are used to indicate which portions of the resource to return
Partial Updates: request parameters specify only the portions of the resource to be updated
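
Both ideas can be sketched with plain dictionaries. The fields parameter name and both function names are assumptions for illustration; the actual parameter syntax has not been decided in this document.

```python
from typing import Dict, Optional


def partial_response(resource: Dict[str, object],
                     fields_param: Optional[str]) -> Dict[str, object]:
    """Return only the requested fields, e.g. for ?fields=id,status."""
    if not fields_param:
        return dict(resource)  # no filter requested: return everything
    wanted = [name.strip() for name in fields_param.split(",") if name.strip()]
    return {name: resource[name] for name in wanted if name in resource}


def partial_update(resource: Dict[str, object],
                   changes: Dict[str, object]) -> Dict[str, object]:
    """Return a copy of the resource with only the supplied fields replaced."""
    updated = dict(resource)
    updated.update(changes)
    return updated
```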

Options

Options Considered

Generic REST
Google Data API
OData

Options Ruled Out Early

ATOM
- ATOM is XML only
JSON-RPC
- this is what we used at IMDb (it was not a recommendation of mine though)
- it is very expressive
- but "heavy" and overkill for many things
SOAP
- from my reading the trend is that new development is moving away from SOAP because it is too "heavy".

REST Details

REST is merely guidance on your API. It's not actually a protocol.

Good intro to REST presentation http://www.slideshare.net/guestb2ed5f/scalable-reliable-secure-rest

Spring MVC 3.0 added many more features to make REST-ful services easier to write.

Good introductory slide shows on Spring MVC 3.0 REST features http://www.slideshare.net/habuma/spring-mvc-rest
Versus JAX-RS (also a good introduction): http://www.infoq.com/articles/springmvc_jsx-rs
Spring MVC 3.0 Documentation http://static.springsource.org/spring/docs/3.0.x/spring-framework-reference/html/mvc.html
More server-side info
Client-side info http://blog.springsource.com/2009/03/27/rest-in-spring-3-resttemplate/

Some REST-ful API Examples

LinkedIn http://blog.linkedin.com/2009/07/08/brandon-duncan-java-one-building-consistent-restful-apis-in-a-high-performance-environment/
Facebook Graph API
- Overview http://developers.facebook.com/docs/api
- Reference http://developers.facebook.com/docs/reference/api/
NextBio Query API

Helpful links on HTTP support in Spring:

HTTP status codes http://static.springsource.org/spring/docs/3.0.x/javadoc-api/org/springframework/http/HttpStatus.html
Annotations http://static.springsource.org/spring/docs/3.0.x/javadoc-api/org/springframework/web/bind/annotation/package-tree.html
Exception Handlers http://pietrowski.info/2010/06/spring-mvc-exception-handler/ an example

Google Data Protocol Details

Google Data Protocol is a REST-ful protocol based upon ATOM and JSON used by many Google APIs

Example requests and responses: http://code.google.com/apis/gdata/docs/2.0/basics.html
Protocol reference: http://code.google.com/apis/gdata/docs/2.0/reference.html
PUT means update, POST means create, PATCH means partial update
The item's version identifier is held in the gd:etag field and the ETag: header
- use the If-Match header for PUT, PATCH, and DELETE methods
For PUT and PATCH you have to specify the original entry's ETag to make sure you don't overwrite anyone else's changes.
For results formatted as JSON instead of XML, add request parameter alt=json
For results formatted as JSONP (JSON wrapped in a call to a callback function and loaded via a script tag) instead of XML, add request parameter alt=json-in-script&callback=myFunction.
- Using callback functions allows you to get around some of the cross-domain security issues you might encounter in typical client-side JavaScript. Browsers usually prevent you from loading files across domains because of potential security holes and the cross-domain attacks that could result.
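
On the server side, producing such a response amounts to passing the JSON payload as the argument of a call to the function named by the callback parameter. A minimal sketch, with a hypothetical function name; validating the callback name is an added precaution, not something the Google Data documentation is being quoted on here.

```python
import json
import re

# Accept only plain (optionally dotted) JavaScript identifiers as callbacks,
# so the parameter cannot be used to inject arbitrary script.
_CALLBACK = re.compile(r"^[A-Za-z_$][A-Za-z0-9_$]*(\.[A-Za-z_$][A-Za-z0-9_$]*)*$")


def to_jsonp(payload: object, callback: str) -> str:
    """Wrap a JSON-serializable payload in a call to the named callback."""
    if not _CALLBACK.match(callback):
        raise ValueError(f"invalid callback name: {callback!r}")
    return f"{callback}({json.dumps(payload)});"
```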

OData Details

OData is a REST-ful protocol based upon ATOM and JSON and backed by Microsoft.

Overview http://www.odata.org/developers/protocols/overview
Protocol Reference http://www.odata.org/developers/protocols/operations
PUT means update, POST means create, MERGE means partial update
- use the If-Match header for PUT, MERGE, and DELETE methods
Uses $value at the end of a URL to indicate raw data instead of OData formatted data

More Details

APIKeys for Authentication and Authorization

(proposal from Bruce here)

HTTP Methods

Some browsers and/or firewalls do not support PUT or DELETE operations. Here are two common workarounds:

HTTP Header Override
- X-HTTP-Method-Override: PUT
- X-HTTP-Method-Override: DELETE
- used by Google Data Protocol
Hidden Form Field
- _method=PUT
- _method=DELETE
- Spring MVC 3.0 implements this via org.springframework.web.filter.HiddenHttpMethodFilter
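
Both workarounds boil down to the server deciding the effective method before dispatching the request. The sketch below is framework-independent and uses a hypothetical function name; honoring overrides only on POST, preferring the header over the form field, and allowing only PUT and DELETE are assumptions made for the example.

```python
from typing import Mapping

ALLOWED_OVERRIDES = {"PUT", "DELETE"}


def effective_method(method: str, headers: Mapping[str, str],
                     form: Mapping[str, str]) -> str:
    """Resolve the method a tunneled POST request is really asking for."""
    method = method.upper()
    if method != "POST":
        return method  # overrides are only honored on POST requests
    # Header names are case-insensitive, so normalize before looking up.
    normalized = {name.lower(): value for name, value in headers.items()}
    override = normalized.get("x-http-method-override") or form.get("_method")
    if override and override.upper() in ALLOWED_OVERRIDES:
        return override.upper()
    return "POST"
```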