
Service API Design

Current APIs

See Repository Service API for full examples of requests and responses. What follows are merely examples of the URL patterns for accessing the service using the HTTP methods GET to read an entity, POST to create an entity, PUT to update an entity, and DELETE to delete an entity.

Proposed APIs

Proposed API for EULAs

EULA = end user license agreement

Assumptions

  • One EULA might be used by many datasets
  • Some datasets may have no EULA at all
  • There is no way to agree to EULAs in bulk; agreement must be made once per dataset, unless you have write access to the dataset??? (We don't want our scientists to have to agree to the EULA every time they want to download a new dataset they just created.)

EULA entity

A EULA is yet another entity (node) in the repository service and has the following additional fields:

{
  "name": "TCGA Redistribution Use Agreement",
  "agreement": "The recipient acknowledges that the data herein is provided by TCGA and not SageBionetworks and must abide by ..."
}

The entity kind is 'eula':

POST/GET/PUT/DELETE /repo/v1/eula/#
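
As a rough illustration of how the four HTTP methods map onto the 'eula' kind, here is a minimal sketch. The helper name `eula_request` is hypothetical, and posting to the collection URL (without an id) on create is an assumption, not something this page specifies.

```python
# Hypothetical helper: maps CRUD operations onto an HTTP method and URL
# for the 'eula' entity kind described above.
BASE = "/repo/v1/eula"

METHODS = {"create": "POST", "read": "GET", "update": "PUT", "delete": "DELETE"}


def eula_request(operation, eula_id=None):
    """Return (http_method, path) for a CRUD operation on a EULA."""
    if operation not in METHODS:
        raise ValueError(f"unknown operation: {operation}")
    if operation == "create":
        # Assumption: creation posts to the collection URL because the
        # entity id does not exist yet.
        return METHODS[operation], BASE
    if eula_id is None:
        raise ValueError(f"{operation} requires an id")
    return METHODS[operation], f"{BASE}/{eula_id}"
```

For example, `eula_request("update", "456")` yields the method `PUT` and the path `/repo/v1/eula/456`.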

Dataset to EULA mapping

Datasets will have a new field called eulaId which may be null.

Agreements: User to EULA mapping

There will be a table to record when users agree to EULAs.
The structure of this table will be:

datasetId

datasetVersionId

eulaId

eulaVersionId

userId

agreementDate

Downloading Data

When a user tries to get a location for a dataset or layer:

  • If ((the user does not have write access to the dataset) AND (the dataset has a eulaId))
    • look in the agreement table for an agreement
      • If no agreement, return an AgreementNeeded exception
  • Otherwise return the location
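
The decision steps above could be sketched roughly as follows. The function name, the dictionary shape of a dataset, and the use of a set of tuples in place of the agreement table are all assumptions made for illustration.

```python
class AgreementNeeded(Exception):
    """Raised when the user must agree to a EULA before downloading."""


def get_location(user_id, dataset, agreements, has_write_access):
    """Return the dataset location, or raise AgreementNeeded.

    dataset: dict with 'id', 'eulaId' (may be None), and 'location'.
    agreements: set of (userId, datasetId, eulaId) tuples, standing in
    for the agreement table (an assumption for this sketch).
    """
    eula_id = dataset.get("eulaId")
    if not has_write_access and eula_id is not None:
        # Look in the agreement table for an agreement.
        if (user_id, dataset["id"], eula_id) not in agreements:
            raise AgreementNeeded(
                f"user {user_id} must agree to EULA {eula_id} "
                f"for dataset {dataset['id']}"
            )
    # Writers, datasets without a EULA, and users who agreed all land here.
    return dataset["location"]
```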

Making an Agreement

  • To create the agreement

    POST /repo/v1/agreement
    {
      "datasetId":"123",
      "eulaId":"456"
    }
    
  • Note that values for userId, datasetVersionId, eulaVersionId, and agreementDate will all be set by the system at agreement time. Therefore a user must be authenticated and can only create agreements for themselves.
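
A minimal sketch of how the service might fill in the system-set fields, assuming a hypothetical `create_agreement` function and a hypothetical `current_versions` lookup keyed by entity kind and id. The field names follow the agreement table above; everything else is illustrative.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Agreement:
    datasetId: str
    datasetVersionId: str
    eulaId: str
    eulaVersionId: str
    userId: str
    agreementDate: datetime


def create_agreement(body, authenticated_user_id, current_versions):
    """Build an agreement record from a POST body.

    Only datasetId and eulaId are read from the body. Any other fields
    the client sends (userId, agreementDate, ...) are ignored, so a user
    can only create agreements for the authenticated account.
    """
    if authenticated_user_id is None:
        raise PermissionError("user must be authenticated")
    dataset_id = body["datasetId"]
    eula_id = body["eulaId"]
    return Agreement(
        datasetId=dataset_id,
        datasetVersionId=current_versions[("dataset", dataset_id)],
        eulaId=eula_id,
        eulaVersionId=current_versions[("eula", eula_id)],
        userId=authenticated_user_id,
        agreementDate=datetime.now(timezone.utc),
    )
```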

Determining in advance whether a user needs to agree:

/repo/v1/query?query=select * from agreement where datasetId=='123' and eulaId=='456' and userId=='789'

Proposed API for Projects

This work is done.

Proposed API for Actions

Proposed API for Analyses

Proposed API for downloading datasets or layers

This is mostly done, except for enforcing authentication and authorization along with the EULA.

  • Get a location, user is not logged in

    GET http://platform.sagebase.org/repo/v1/location/511
    
    HTTP/1.1 401 Unauthorized
    Content-Type: application/json
    Date: Fri, 11 Feb 2011 19:03:18 GMT
    Server: Google Frontend
    Cache-Control: private, x-gzip-ok=""
    Transfer-Encoding: chunked
    
    {"reason":"You must be logged in to access data"}
    
  • Get a location, user has not yet been granted access

    GET http://platform.sagebase.org/repo/v1/location/511
    
    HTTP/1.1 403 Forbidden
    Content-Type: application/json
    Date: Fri, 11 Feb 2011 19:03:18 GMT
    Server: Google Frontend
    Cache-Control: private, x-gzip-ok=""
    Transfer-Encoding: chunked
    
    {"reason":"You are not authorized to access this resource"}
    
  • Get a location, user has previously been granted access

    GET http://platform.sagebase.org/repo/v1/location/511
    
    HTTP/1.1 200 OK
    Content-Type: application/json
    Date: Fri, 11 Feb 2011 19:03:18 GMT
    Server: Google Frontend
    Cache-Control: private, x-gzip-ok=""
    Transfer-Encoding: chunked
    
    {
        "path":"http://data01.sagebase.org.s3.amazonaws.com/tcga_curation_pacakge.tar.gz?AWSAccessKeyId=44CF9SAMPLEF252F707&Expires=1177363698&Signature=vjSAMPLENmGa%2ByT272YEAiv4%3D",
        "md5sum":"3a4460b6378bea1509954b6c13d84387",
        "type":"awss3"
    }
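
A client could branch on these three responses along the following lines. This is only a sketch: the function name and the action labels ("download", "login", "request_access") are hypothetical, while the status codes and the `reason` and `path` fields come from the examples above.

```python
import json


def interpret_location_response(status, body):
    """Map a location response to a (next_action, detail) pair."""
    payload = json.loads(body)
    if status == 200:
        # The path is a pre-signed URL the client can fetch directly.
        return ("download", payload["path"])
    if status == 401:
        return ("login", payload["reason"])
    if status == 403:
        return ("request_access", payload["reason"])
    raise ValueError(f"unexpected status {status}")
```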
    

Proposed API for Free Text Search

Proposed Annotation Type API

JSON Schema Resources

Positive Integer Annotation

POST /repo/v1/annotationtype
      {
         "name":"numberOfSamples",
         "displayName":"Number of Samples",
         "schema":{
            "description":"The number of samples in a dataset layer.",
            "type":"integer",
            "minimum":1
         }
      }

Enumerated Type Annotation

POST /repo/v1/annotationtype
      {
         "name":"status",
         "displayName":"Status",
         "schema":{
            "description":"The status of a dataset.",
            "type":"string",
            "enum":[
               "unknown",
               "pending",
               "curated",
               "QCed"
            ],
            "default":"unknown"
         }
      }

Very Long String Annotation

POST /repo/v1/annotationtype
      {
         "name":"curationNotes",
         "displayName":"Curation Notes",
         "schema":{
            "description":"The free text notes on curation that did not have a proper home in the ISA-Tab representation of the dataset metadata.",
            "type":"string",
            "maxLength":4096
         }
      }

Ontology Annotation

POST /repo/v1/annotationtype
      {
         "name":"tissueType",
         "displayName":"Tissue Type",
         "schema":{
            "description":"The type of tissue in a dataset layer.",
            "type":"string",
            "format":"TODO URL to the ontology we are using for this"
         }
      }

Array of Values Annotation

POST /repo/v1/annotationtype
      {
         "name":"tissueTypes",
         "displayName":"Tissue Types",
         "schema":{
            "description":"The types of tissue in a dataset.",
            "type":"array",
            "items":{
               "type":"string",
               "format":"TODO URL to the ontology we are using for this"
            }
         }
      }
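
To show how the schemas above might be enforced, here is a minimal hand-written validator. It covers only the keywords used in these examples (type, minimum, maxLength, enum, items), ignores format and default, and is an illustrative sketch rather than a full JSON Schema implementation.

```python
def validate(value, schema):
    """Return a list of error strings; an empty list means valid."""
    errors = []
    kind = schema.get("type")
    if kind == "integer":
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            return [f"{value!r} is not an integer"]
        if "minimum" in schema and value < schema["minimum"]:
            errors.append(f"{value} is less than minimum {schema['minimum']}")
    elif kind == "string":
        if not isinstance(value, str):
            return [f"{value!r} is not a string"]
        if "maxLength" in schema and len(value) > schema["maxLength"]:
            errors.append(f"string is longer than {schema['maxLength']}")
    elif kind == "array":
        if not isinstance(value, list):
            return [f"{value!r} is not an array"]
        item_schema = schema.get("items", {})
        for index, item in enumerate(value):
            errors.extend(f"[{index}] {e}" for e in validate(item, item_schema))
    if "enum" in schema and value not in schema["enum"]:
        errors.append(f"{value!r} is not one of {schema['enum']}")
    return errors
```

With the numberOfSamples schema, `validate(0, {"type": "integer", "minimum": 1})` reports one error and `validate(3, ...)` reports none.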

Proposed API for Batch Requests

Ultimately we would like to support a full-on batch implementation similar to https://developers.facebook.com/docs/api/batch/ that would allow fully independent requests to be issued in batch.

BATCH REQUEST

curl \
    -F 'access_token=…' \
    -F 'batch=[ \
          { "method": "POST", \
            "relative_url": "me/feed", \
            "body": "message=Test status update&link=http://developers.facebook.com/" \
          }, \
          { "method":"GET", \
            "relative_url":"me/feed?limit=1" \
          } \
        ]'\
    https://graph.facebook.com

BATCH RESPONSE

[
    { "code": 200,
      "headers": [
          { "name":"Content-Type", 
            "value":"text/javascript; charset=UTF-8"}
       ],
      "body":"{\"id\":\"…\"}"
    },
    { "code": 200,
      "headers": [
          { "name":"Content-Type", 
            "value":"text/javascript; charset=UTF-8"
          },
          { "name":"ETag", 
            "value": "…"
          }
      ],
      "body": "{\"data\": [{…}]}"
    }
]

We have an immediate need for one particular API to support batch requests and in the interest of time could go with something more simple:

BATCH REQUEST

GET /repo/v1/entity/type?batch=123,456,789

BATCH RESPONSE

{
  "paging": {},
  "results": [
              {
               "id":"123",
               "name":"Example Dataset 1",
               "type":"/dataset"
              },
              {
               "id":"456",
               "name":"Example Dataset 2",
               "type":"/dataset"
              },
              {
               "id":"789",
               "name":"Example Dataset 3",
               "type":"/dataset"
              }
             ],
  "totalNumberOfResults": 3
}
And so on:
GET /entity/type?batch=123,456,789
GET /entity/annotations?batch=123,456,789
GET /entity/s3Token?batch=123,456,789
GET /entity/acl?batch=123,456,789
Each returning PaginatedResults<T> where T is whatever /entity/<the id>/<the suffix> would return.
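
A server-side handler for the simple batch form might look roughly like this. The function name `batch_get` and the `fetch_one` callback (standing in for whatever the single-entity endpoint returns) are assumptions; the response keys follow the example above.

```python
def batch_get(batch_param, fetch_one):
    """Resolve a comma-separated batch parameter into a paginated result.

    batch_param: raw query value such as "123,456,789".
    fetch_one: callable returning what /entity/<id>/<suffix> would return.
    """
    # Tolerate stray whitespace and empty segments like "123,,456".
    ids = [part.strip() for part in batch_param.split(",") if part.strip()]
    results = [fetch_one(entity_id) for entity_id in ids]
    return {
        "paging": {},
        "results": results,
        "totalNumberOfResults": len(results),
    }
```

Results come back in the same order as the requested ids, which lets a client line them up without re-sorting.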

Requirements and Design Goals

Strongly Recommended

  • Different clients should not need different APIs.
    • R and the web UI should be able to call the same service APIs, although the encoding of the request body and result might differ, with differences indicated by HTTP headers such as Content-Type and Accept.
  • Read-only requests should be GET requests. This makes them cacheable and bookmark-able.
    • Requests that are not read-only should not be GET requests.
  • All resources in the system should be uniquely identifiable by a particular URL.
    • We should work hard to make sure the mapping of URL to resource is immutable.
    • This URL will return the most recent version. For discussion: An additional query parameter would be used to specify a particular version? Or do we add more to the path to specify the version?
    • The idea is that these resource URLs will occur in publications, hence the need for immutability.
  • APIs must be idempotent.
    • For resource creation requests, include a request ID if there is no other field in the request that could be used to determine whether the creation request is a duplicate.
  • The primary data format for responses will be JSON.
    • We may support additional formats on an as-needed basis, such as JSONP, ATOM (XML), RSS 2.0 (XML), and RDF.
    • For any data that might be valuable in mashups, we should strongly consider adding support for JSONP.
  • All APIs that could potentially return more than ~5 items should support pagination parameters.
  • All responses returning resources and/or modifying resources should include an ETag. What is an ETag?
  • Update requests should include the ETag via the If-Match header for concurrency control.
  • A version number for our API
  • All components in the software stack should speak UTF-8
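
Two of the goals above, idempotent creation via a request ID and ETag-based concurrency control, can be sketched with a small in-memory store. The class and method names are hypothetical, and deriving the ETag from an MD5 hash of the entity's JSON is just one possible choice, not a decision recorded on this page.

```python
import hashlib
import json


class PreconditionFailed(Exception):
    """The If-Match value did not match the current ETag (HTTP 412)."""


class Store:
    def __init__(self):
        self._entities = {}    # entity id -> entity dict
        self._by_request = {}  # request id -> entity id
        self._next_id = 1

    @staticmethod
    def etag(entity):
        # Assumption: the ETag is a hash of the canonical JSON form.
        canonical = json.dumps(entity, sort_keys=True).encode("utf-8")
        return hashlib.md5(canonical).hexdigest()

    def create(self, request_id, fields):
        # A repeated request ID returns the entity created the first time.
        if request_id in self._by_request:
            return self._entities[self._by_request[request_id]]
        entity_id = str(self._next_id)
        self._next_id += 1
        entity = dict(fields, id=entity_id)
        self._entities[entity_id] = entity
        self._by_request[request_id] = entity_id
        return entity

    def update(self, entity_id, fields, if_match):
        current = self._entities[entity_id]
        if if_match != self.etag(current):
            raise PreconditionFailed("entity changed since it was read")
        updated = dict(current, **fields)
        self._entities[entity_id] = updated
        return updated
```

A client that retries a timed-out create with the same request ID gets the original entity back, and a client holding a stale ETag has its update rejected instead of silently overwriting someone else's change.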

Nice To Have

  • Partial Responses: query parameters are used to indicate which portions of the resource to return
  • Partial Updates: request parameters specify only the portions of the resource to be updated
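
As an illustration of both ideas, the sketch below filters a resource by a comma-separated field list and merges a partial update into an existing resource. The function names and the idea of a comma-separated `fields` value are assumptions; this page does not fix a parameter name or syntax.

```python
def partial_response(resource, fields_param):
    """Return only the requested top-level fields of a resource."""
    if not fields_param:
        # No selection given: return the whole resource.
        return dict(resource)
    wanted = [f.strip() for f in fields_param.split(",") if f.strip()]
    # Unknown field names are silently skipped in this sketch.
    return {name: resource[name] for name in wanted if name in resource}


def partial_update(resource, changes):
    """Return a copy of the resource with only the given fields replaced."""
    return dict(resource, **changes)
```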

Options

Options Considered

  • Generic REST
  • Google Data API
  • OData

Options Ruled Out Early

  • ATOM
    • ATOM is XML only
  • JSON-RPC
    • this is what we used at IMDb (it was not a recommendation of mine though)
    • it is very expressive
    • but "heavy" and overkill for many things
  • SOAP
    • from my reading the trend is that new development is moving away from SOAP because it is too "heavy".

REST Details

REST is merely guidance for your API. It's not actually a protocol.

Good intro to REST presentation http://www.slideshare.net/guestb2ed5f/scalable-reliable-secure-rest

Spring MVC 3.0 added many more features to make REST-ful services easier to write.

Some REST-ful API Examples

Helpful Links to HTTP protocol Spring stuff:

Google Data Protocol Details

Google Data Protocol is a REST-ful protocol based upon ATOM and JSON used by many Google APIs

  • Example requests and responses: http://code.google.com/apis/gdata/docs/2.0/basics.html
  • Protocol reference: http://code.google.com/apis/gdata/docs/2.0/reference.html
  • PUT means update, POST means create, PATCH means partial update
  • the item's version identifier is held in the gd:etag field and the ETag: header
    • use the If-Match header for PUT, PATCH, and DELETE methods
  • For PUT and PATCH you have to specify the original entry's ETag to make sure you don't overwrite anyone else's changes.
  • For results formatted as JSON instead of XML, add request parameter alt=json
  • For results formatted as JSONP (JSON passed to a callback function and loaded via a script tag) instead of XML, add request parameter alt=json-in-script&callback=myFunction
    • Using callback functions allows you to get around some of the cross-domain security issues you might encounter in typical client-side JavaScript. Usually browsers prevent you from loading files across domains because of potential security holes and the cross-domain attacks that could result.
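
The difference between the two alt values can be sketched as below. The `render` function is hypothetical; the parameter values alt=json and alt=json-in-script come from the list above, and restricting the callback to a plain JavaScript identifier is a safety assumption added for this sketch.

```python
import json
import re

# Accept only dotted JavaScript identifiers such as myFunction or app.onData.
_CALLBACK = re.compile(r"^[A-Za-z_$][A-Za-z0-9_$]*(\.[A-Za-z_$][A-Za-z0-9_$]*)*$")


def render(payload, alt, callback=None):
    """Serialize a payload as plain JSON or as a JSONP script body."""
    body = json.dumps(payload)
    if alt == "json":
        return body
    if alt == "json-in-script":
        if not callback or not _CALLBACK.match(callback):
            raise ValueError("callback must be a JavaScript identifier")
        # The browser loads this via a script tag and runs the callback.
        return f"{callback}({body});"
    raise ValueError(f"unsupported alt value: {alt}")
```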

OData Details

OData is a REST-ful protocol based upon ATOM and JSON and backed by Microsoft.

More Details

APIKeys for Authentication and Authorization

(proposal from Bruce here)

HTTP Methods

Some browsers and/or firewalls do not support PUT or DELETE operations. Here are two common workarounds:

  • HTTP Header Override
    • X-HTTP-Method-Override: PUT
    • X-HTTP-Method-Override: DELETE
    • used by Google Data Protocol
  • Hidden Form Field
    • _method=PUT
    • _method=DELETE
    • Spring MVC 3.0 implements this via org.springframework.web.filter.HiddenHttpMethodFilter
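
Either workaround comes down to the server working out the intended method before dispatching the request. A minimal sketch follows, assuming a hypothetical `effective_method` function that takes the headers and form fields as plain dictionaries, and assuming overrides are honored only on POST requests and only for PUT and DELETE.

```python
OVERRIDABLE = {"PUT", "DELETE"}


def effective_method(method, headers, form):
    """Return the HTTP method the client actually intended."""
    method = method.upper()
    if method != "POST":
        # Assumption: only POST requests may tunnel another method.
        return method
    # Header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    override = lowered.get("x-http-method-override") or form.get("_method")
    if override and override.upper() in OVERRIDABLE:
        return override.upper()
    return method
```

In this sketch the header takes precedence over the hidden form field when both are present; that ordering is an arbitrary choice.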