Service API Design
Current APIs
See Repository Service API for full examples of requests and responses. What follows are merely examples of the URL patterns for accessing the service, using HTTP GET to read an entity, POST to create an entity, PUT to update an entity, and DELETE to delete an entity.
Proposed APIs
Proposed API for EULAs
EULA = end user license agreement
Assumptions
- One EULA might be used by many datasets
- Some datasets may have no EULA at all
- There is no way to agree to EULAs in bulk; agreement must be made once per dataset, unless the user has write access to the dataset (open question). (We don't want our scientists to have to agree to the EULA every time they download a new dataset they just created.)
EULA entity
A EULA is yet another entity (node) in the repository service and has the following additional fields:
{ "name": "TCGA Redistribution Use Agreement", "agreement": "The recipient acknowledges that the data herein is provided by TCGA and not Sage Bionetworks and must abide by ..." }
The entity kind is 'eula':
POST/GET/PUT/DELETE /repo/v1/eula/#
Dataset to EULA mapping
Datasets will have a new field called eulaId, which may be null.
Agreements: User to EULA mapping
There will be a table to record when users agree to EULAs.
The structure of this table will be:
| datasetId | datasetVersionId | eulaId | eulaVersionId | userId | agreementDate |
|---|---|---|---|---|---|
Downloading Data
When a user tries to get a location for a dataset or layer:
- If ((the user does not have write access to the dataset) AND (the dataset has a eulaId))
  - look in the agreement table for an agreement
  - If no agreement, return an AgreementNeeded exception
- Otherwise return the location
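The check above fits in one method. This is a minimal Java sketch, not the repository's implementation: Dataset, AgreementStore, and AgreementNeededException are hypothetical names, and the in-memory set stands in for the agreement table (version columns omitted).

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the download check; none of these names are existing repository classes.
class EulaGate {

    // Minimal view of a dataset: its id and its (possibly null) eulaId.
    record Dataset(String id, String eulaId) {}

    // Thrown when the user must agree to the dataset's EULA before downloading.
    static class AgreementNeededException extends RuntimeException {
        AgreementNeededException(String datasetId, String eulaId) {
            super("Agreement to EULA " + eulaId + " is needed for dataset " + datasetId);
        }
    }

    // In-memory stand-in for the agreement table (version columns omitted for brevity).
    static class AgreementStore {
        private record Key(String datasetId, String eulaId, String userId) {}

        private final Set<Key> agreements = new HashSet<>();

        void addAgreement(String datasetId, String eulaId, String userId) {
            agreements.add(new Key(datasetId, eulaId, userId));
        }

        boolean hasAgreement(String datasetId, String eulaId, String userId) {
            return agreements.contains(new Key(datasetId, eulaId, userId));
        }
    }

    // Returns the location, or throws AgreementNeededException if the user lacks write
    // access, the dataset has a EULA, and no matching agreement has been recorded.
    static String getLocation(Dataset dataset, String userId, boolean hasWriteAccess,
                              AgreementStore store, String location) {
        boolean eulaApplies = !hasWriteAccess && dataset.eulaId() != null;
        if (eulaApplies && !store.hasAgreement(dataset.id(), dataset.eulaId(), userId)) {
            throw new AgreementNeededException(dataset.id(), dataset.eulaId());
        }
        return location;
    }
}
```

Writers and datasets without a EULA never touch the agreement table, which keeps the common "download my own dataset" path cheap.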
Making an Agreement
To create the agreement
POST /repo/v1/agreement { "datasetId":"123", "eulaId":"456" }
- Note that the values for userId, datasetVersionId, eulaVersionId, and agreementDate will all be set by the system at agreement time. Therefore a user must be authenticated and can only create agreements for himself.
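A sketch of how the server side might turn the two client-supplied fields into a full agreement row. AgreementRequest, Agreement, and the way current version ids are passed in are assumptions for illustration; the key point is that userId comes from the authenticated session, never from the request body.

```java
import java.time.Instant;

// Hypothetical sketch: the client sends only datasetId and eulaId; the server fills in the rest.
class AgreementService {

    // What the client POSTs to /repo/v1/agreement.
    record AgreementRequest(String datasetId, String eulaId) {}

    // One row of the agreement table.
    record Agreement(String datasetId, String datasetVersionId, String eulaId,
                     String eulaVersionId, String userId, Instant agreementDate) {}

    // authenticatedUserId is taken from the session, so a user can only agree on their own behalf.
    static Agreement create(AgreementRequest request, String authenticatedUserId,
                            String currentDatasetVersionId, String currentEulaVersionId,
                            Instant now) {
        if (authenticatedUserId == null) {
            throw new IllegalStateException("User must be authenticated to agree to a EULA");
        }
        return new Agreement(request.datasetId(), currentDatasetVersionId, request.eulaId(),
                currentEulaVersionId, authenticatedUserId, now);
    }
}
```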
Determining in advance whether a user needs to agree:
GET /repo/v1/query?query=select * from agreement where datasetId=='123' and eulaId=='456' and userId=='789'
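The query string contains spaces, quotes, and '=' characters, so a client has to URL-encode it before sending. A minimal Java sketch, assuming the == comparison operator and a hypothetical AgreementQuery helper; note that URLEncoder produces form encoding, so spaces become '+'.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that builds the "does this user need to agree?" query URL.
class AgreementQuery {

    static String buildUrl(String baseUrl, String datasetId, String eulaId, String userId) {
        // Ids are concatenated as-is; a real client should validate them first.
        String query = "select * from agreement where datasetId=='" + datasetId
                + "' and eulaId=='" + eulaId + "' and userId=='" + userId + "'";
        return baseUrl + "/repo/v1/query?query=" + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }
}
```

An empty result means the user has not agreed yet and the UI can show the EULA before attempting the download.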
Proposed API for Projects
This work is done.
- Get all projects
- Get primary fields for a project
GET http://platform.sagebase.org/repo/v1/project/812
Response:
{ "creationDate": "1/1/2011", "leaderUserId": "david.burdick@sagebase.org", "numberOfMembers": 1, "numberOfPublications": 0, "status": "Active", "projectWebsite": "http://daveproj.com", "description": "project description text" }
- Create a new project
POST http://platform.sagebase.org/repo/v1/project
Transfer Object:
{ "leaderUserId": "david.burdick@sagebase.org", "numberOfMembers": 1, "numberOfPublications": 0, "status": "Active", "projectWebsite": "http://daveproj.com", "description": "project description text" }
- Update project
- Delete a project
- Get all annotations for a project
- Update annotations for a project
- Get all Actions for a project (history)
- Create a new Action for a project
- Get all Analyses for a project
- Get all Datasets for a project
Proposed API for Actions
- Create happens from several other areas (e.g. a Project)
- Update an Action
- Delete an Action
Proposed API for Analyses
- Create happens from a project, so an analysis is always owned by a project
- Update an Analysis
- Delete an Analysis
Proposed API for downloading datasets or layers
This is mostly done, except for enforcing authentication and authorization along with the EULA.
Get a location, user is not logged in
GET http://platform.sagebase.org/repo/v1/location/511

HTTP/1.1 401 Unauthorized
Content-Type: application/json
Date: Fri, 11 Feb 2011 19:03:18 GMT
Server: Google Frontend
Cache-Control: private, x-gzip-ok=""
Transfer-Encoding: chunked

{"reason":"You must be logged in to access data"}
Get a location, user has not yet been granted access
GET http://platform.sagebase.org/repo/v1/location/511

HTTP/1.1 403 Forbidden
Content-Type: application/json
Date: Fri, 11 Feb 2011 19:03:18 GMT
Server: Google Frontend
Cache-Control: private, x-gzip-ok=""
Transfer-Encoding: chunked

{"reason":"You are not authorized to access this resource"}
Get a location, user has previously been granted access
GET http://platform.sagebase.org/repo/v1/location/511

HTTP/1.1 200 OK
Content-Type: application/json
Date: Fri, 11 Feb 2011 19:03:18 GMT
Server: Google Frontend
Cache-Control: private, x-gzip-ok=""
Transfer-Encoding: chunked

{ "path":"http://data01.sagebase.org.s3.amazonaws.com/tcga_curation_pacakge.tar.gz?AWSAccessKeyId=44CF9SAMPLEF252F707&Expires=1177363698&Signature=vjSAMPLENmGa%2ByT272YEAiv4%3D", "md5sum":"3a4460b6378bea1509954b6c13d84387", "type":"awss3" }
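The three exchanges differ only in which check fails first: 401 means "we don't know who you are", 403 means "we know who you are, but you may not have this". A sketch of that ordering in Java; the boolean inputs are hypothetical, since how authentication and authorization are determined is not specified here.

```java
// Hypothetical sketch of the status-code decision for GET /repo/v1/location/{id}.
class LocationAccess {

    record Result(int status, String body) {}

    static Result respond(boolean loggedIn, boolean authorized, String locationJson) {
        if (!loggedIn) {
            // Authentication is checked first: an anonymous caller always gets 401.
            return new Result(401, "{\"reason\":\"You must be logged in to access data\"}");
        }
        if (!authorized) {
            return new Result(403, "{\"reason\":\"You are not authorized to access this resource\"}");
        }
        return new Result(200, locationJson);
    }
}
```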
Proposed API for Free Text Search
- Free Text Search datasets
GET http://platform.sagebase.org/repo/v1/search?q=aging&type=dataset
- This is for a free-text search of "aging". We will also need structured queries against specific fields (e.g. for tissue = "Brain", the service returns any datasets whose tissue is brain or a subclass of brain in an ontology).
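The "brain or a subclass of brain" rule amounts to walking up the ontology from each dataset's tissue term. A minimal Java sketch that assumes a single-parent, acyclic ontology held in a parent map; real ontologies may allow multiple parents, and the term names and Dataset record here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of ontology-aware tissue matching.
class OntologySearch {

    record Dataset(String id, String tissue) {}

    // True if term equals ancestor, or ancestor is reachable by following parent links.
    static boolean isSameOrSubclass(String term, String ancestor, Map<String, String> parentOf) {
        for (String t = term; t != null; t = parentOf.get(t)) {
            if (t.equals(ancestor)) {
                return true;
            }
        }
        return false;
    }

    static List<Dataset> findByTissue(List<Dataset> datasets, String tissue, Map<String, String> parentOf) {
        List<Dataset> matches = new ArrayList<>();
        for (Dataset d : datasets) {
            if (d.tissue() != null && isSameOrSubclass(d.tissue(), tissue, parentOf)) {
                matches.add(d);
            }
        }
        return matches;
    }
}
```

A search for "Brain" then also returns a dataset annotated with a narrower term such as "Cerebellum", provided the parent map links them.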
Proposed Annotation Type API
JSON Schema Resources
- Example schemas: http://json-schema.org/
- Draft JSON Schema Spec: http://tools.ietf.org/html/draft-zyp-json-schema-03
- Schema to Java POJO code generator: http://code.google.com/p/jsonschema2pojo/
- Java POJO to Schema generator: http://wiki.fasterxml.com/JacksonJsonSchemaGeneration
- Google is using JSON schema to describe its APIs
Positive Integer Annotation
POST /repo/v1/annotationtype { "name":"numberOfSamples", "displayName":"Number of Samples", "schema":{ "description":"The number of samples in a dataset layer.", "type":"integer", "minimum":1 } }
Enumerated Type Annotation
POST /repo/v1/annotationtype { "name":"status", "displayName":"Status", "schema":{ "description":"The status of a dataset.", "type":"string", "enum":[ "unknown", "pending", "curated", "QCed" ], "default":"unknown" } }
Very Long String Annotation
POST /repo/v1/annotationtype { "name":"curationNotes", "displayName":"Curation Notes", "schema":{ "description":"The free text notes on curation that did not have a proper home in the ISA-Tab representation of the dataset metadata.", "type":"string", "maxLength":4096 } }
Ontology Annotation
POST /repo/v1/annotationtype { "name":"tissueType", "displayName":"Tissue Type", "schema":{ "description":"The type of tissue in a dataset layer.", "type":"string", "format":"TODO URL to the ontology we are using for this" } }
Array of Values Annotation
POST /repo/v1/annotationtype { "name":"tissueTypes", "displayName":"Tissue Types", "schema":{ "description":"The types of tissue in a dataset.", "type":"array", "items":{ "type":"string", "format":"TODO URL to the ontology we are using for this" } } }
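To show what these schemas buy us at annotation time, here is a Java sketch that checks a value against just the keywords used above (type, minimum, enum, maxLength, array items). AnnotationSchema is a hypothetical Java mirror of the "schema" object, JSON integers are assumed to arrive as Long, and "format" (the ontology link) is not checked.

```java
import java.util.List;

// Hypothetical validator for the small JSON Schema subset used in the annotation type examples.
class AnnotationValidator {

    record AnnotationSchema(String type, Long minimum, List<String> enumValues,
                            Integer maxLength, AnnotationSchema items) {}

    static boolean isValid(Object value, AnnotationSchema schema) {
        return switch (schema.type()) {
            case "integer" -> value instanceof Long n
                    && (schema.minimum() == null || n >= schema.minimum());
            // String.length() counts UTF-16 code units, a simplification of JSON Schema's rule.
            case "string" -> value instanceof String s
                    && (schema.enumValues() == null || schema.enumValues().contains(s))
                    && (schema.maxLength() == null || s.length() <= schema.maxLength());
            case "array" -> value instanceof List<?> list
                    && schema.items() != null
                    && list.stream().allMatch(item -> isValid(item, schema.items()));
            default -> false; // other JSON Schema types are out of scope for this sketch
        };
    }
}
```

In practice a JSON Schema library would do this; the sketch only makes the semantics of each keyword concrete.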
Proposed API for Batch Requests
Ultimately we would like to support a full-on batch implementation similar to https://developers.facebook.com/docs/api/batch/ that would allow fully independent requests to be issued in a single batch.
BATCH REQUEST
curl \
  -F 'access_token=…' \
  -F 'batch=[
        { "method": "POST",
          "relative_url": "me/feed",
          "body": "message=Test status update&link=http://developers.facebook.com/"
        },
        { "method": "GET",
          "relative_url": "me/feed?limit=1"
        }
      ]' \
  https://graph.facebook.com
BATCH RESPONSE
[
  { "code": 200,
    "headers": [ { "name":"Content-Type", "value":"text/javascript; charset=UTF-8" } ],
    "body":"{\"id\":\"…\"}"
  },
  { "code": 200,
    "headers": [ { "name":"Content-Type", "value":"text/javascript; charset=UTF-8" },
                 { "name":"ETag", "value":"…" } ],
    "body":"{\"data\": [{…}]}"
  }
]
We have an immediate need for one particular API to support batch requests, and in the interest of time we could go with something simpler:
BATCH REQUEST
GET /repo/v1/entity/type?batch=123,456,789
BATCH RESPONSE
{ "paging": {}, "results": [ { "id":"123", "name":"Example Dataset 1", "type":"/dataset" }, { "id":"456", "name":"Example Dataset 2", "type":"/dataset" }, { "id":"789", "name":"Example Dataset 3", "type":"/dataset" } ], "totalNumberOfResults": 3 }
GET /entity/type?batch=123,456,789
GET /entity/acl?batch=123,456,789
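The simple batch form needs little more than splitting the batch parameter. A Java sketch with a hypothetical EntityHeader record and an in-memory lookup map; silently skipping unknown ids is an assumption, since the proposal does not say how missing ids should be reported.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of ?batch=123,456,789 handling.
class BatchLookup {

    record EntityHeader(String id, String name, String type) {}

    record BatchResponse(List<EntityHeader> results, int totalNumberOfResults) {}

    static BatchResponse lookup(String batchParam, Map<String, EntityHeader> entities) {
        List<EntityHeader> results = new ArrayList<>();
        for (String raw : batchParam.split(",")) {
            EntityHeader header = entities.get(raw.trim());
            if (header != null) { // unknown ids are skipped (assumption)
                results.add(header);
            }
        }
        return new BatchResponse(results, results.size());
    }
}
```

Results come back in request order, which lets a client zip them against the ids it asked for.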
Requirements and Design Goals
Strongly Recommended
- Different clients should not need different APIs.
- R and the web UI should be able to call the same service APIs, although the encoding of the request body and result might differ; the differences are indicated by HTTP headers such as Content-Type and Accept.
- Read-only requests should be GET requests. This makes them cacheable and bookmark-able.
- Requests that are not read-only should not be GET requests.
- All resources in the system should be uniquely identifiable by a particular URL.
- We should work hard to make sure the mapping of URL to resource is immutable.
- This URL will return the most recent version. For discussion: An additional query parameter would be used to specify a particular version? Or do we add more to the path to specify the version?
- The idea is that these resource URLs will occur in publications, hence the need for immutability.
- APIs must be idempotent.
- For resource creation requests, include a request ID if there is no other field in the request that could be used to determine whether the creation request is a duplicate.
- The primary data format for responses will be JSON.
- We may support additional formats on an as-needed basis, such as JSONP, ATOM (XML), RSS 2.0 (XML), and RDF.
- For any data that might be valuable in mashups, we should strongly consider adding support for JSONP.
- All APIs that could potentially return more than ~5 items should support pagination parameters.
- All responses returning resources and/or modifying resources should include an ETag, an opaque identifier for a specific version of a resource, sent in the ETag response header.
- Update requests include the ETag via the If-Match header for concurrency control.
- A version number for our API
- All components in the software stack should speak UTF-8
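To make the ETag/If-Match requirement concrete, here is a minimal optimistic-concurrency sketch in Java. VersionedStore is hypothetical, the ETag is simply a per-entity version counter in quotes, and answering a missing If-Match with 428 Precondition Required (RFC 6585) is one possible policy, not something decided above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of ETag-based optimistic concurrency for PUT.
class VersionedStore {

    record Update(int status, String etag) {}

    private final Map<String, String> bodies = new HashMap<>();
    private final Map<String, Long> versions = new HashMap<>();

    synchronized String create(String id, String body) {
        bodies.put(id, body);
        versions.put(id, 1L);
        return etagFor(id);
    }

    synchronized String get(String id) {
        return bodies.get(id);
    }

    // ifMatch is the request's If-Match header value, or null if absent.
    // synchronized so the ETag comparison and the write happen atomically.
    synchronized Update update(String id, String ifMatch, String newBody) {
        if (!bodies.containsKey(id)) {
            return new Update(404, null);
        }
        if (ifMatch == null) {
            return new Update(428, null); // 428 Precondition Required: If-Match is mandatory here
        }
        String current = etagFor(id);
        if (!current.equals(ifMatch)) {
            return new Update(412, current); // 412 Precondition Failed: someone else updated first
        }
        bodies.put(id, newBody);
        versions.merge(id, 1L, Long::sum);
        return new Update(200, etagFor(id));
    }

    private String etagFor(String id) {
        return "\"" + versions.get(id) + "\"";
    }
}
```

A client that gets 412 re-reads the resource, reapplies its change to the fresh copy, and retries with the new ETag.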
Nice To Have
- Partial Responses: query parameters are used to indicate which portions of the resource to return
- Partial Updates: request parameters specify only the portions of the resource to be updated
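A partial response can be as simple as filtering the resource's top-level fields. A Java sketch; the comma-separated "fields" query parameter is a hypothetical name (Google's APIs use a parameter of that name), and nested field selection is out of scope.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a partial response, e.g. GET /repo/v1/project/812?fields=status,description
class PartialResponse {

    // Null or blank fieldsParam returns the whole resource; unknown field names are ignored.
    static Map<String, Object> select(Map<String, Object> resource, String fieldsParam) {
        if (fieldsParam == null || fieldsParam.isBlank()) {
            return resource;
        }
        Map<String, Object> partial = new LinkedHashMap<>();
        for (String raw : fieldsParam.split(",")) {
            String field = raw.trim();
            if (resource.containsKey(field)) {
                partial.put(field, resource.get(field));
            }
        }
        return partial;
    }
}
```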
Options
Options Considered
- Generic REST
- Google Data API
- OData
Options Ruled Out Early
- ATOM
- ATOM is XML only
- JSON-RPC
- this is what we used at IMDb (it was not a recommendation of mine though)
- it is very expressive
- but "heavy" and overkill for many things
- SOAP
- from my reading the trend is that new development is moving away from SOAP because it is too "heavy".
REST Details
REST is merely guidance for designing your API. It's not actually a protocol.
Good intro to REST presentation http://www.slideshare.net/guestb2ed5f/scalable-reliable-secure-rest
Spring MVC 3.0 added many more features to make REST-ful services easier to write.
- Good introductory slide shows on Spring MVC 3.0 REST features http://www.slideshare.net/habuma/spring-mvc-rest
- Versus JAX-RS (also a good introduction): http://www.infoq.com/articles/springmvc_jsx-rs
- Spring MVC 3.0 Documentation http://static.springsource.org/spring/docs/3.0.x/spring-framework-reference/html/mvc.html
- More server-side info
- Client-side info http://blog.springsource.com/2009/03/27/rest-in-spring-3-resttemplate/
Some REST-ful API Examples
- LinkedIn http://blog.linkedin.com/2009/07/08/brandon-duncan-java-one-building-consistent-restful-apis-in-a-high-performance-environment/
- Facebook Graph API
- NextBio Query API
Helpful links to Spring's HTTP support:
- HTTP status codes http://static.springsource.org/spring/docs/3.0.x/javadoc-api/org/springframework/http/HttpStatus.html
- Annotations http://static.springsource.org/spring/docs/3.0.x/javadoc-api/org/springframework/web/bind/annotation/package-tree.html
- Exception Handlers (an example): http://pietrowski.info/2010/06/spring-mvc-exception-handler/
Google Data Protocol Details
Google Data Protocol is a REST-ful protocol based upon ATOM and JSON and used by many Google APIs.
- Example requests and responses: http://code.google.com/apis/gdata/docs/2.0/basics.html
- Protocol reference: http://code.google.com/apis/gdata/docs/2.0/reference.html
- PUT means update, POST means create, PATCH means partial update
- the entry's version identifier (its ETag) is held in the gd:etag attribute and the ETag header
- use the If-Match header for PUT, PATCH, and DELETE methods
  - For PUT and PATCH you have to specify the original entry's ETag to make sure you don't overwrite anyone else's changes.
- For results formatted as JSON instead of XML, add the request parameter alt=json
- For results formatted as JSONP (JSON wrapped in a call to a JavaScript callback function) instead of XML, add the request parameters alt=json-in-script&callback=myFunction
  - Using callback functions allows you to get around some of the cross-domain security restrictions you might encounter in typical client-side JavaScript. Browsers usually prevent scripts from loading data across domains because of potential security holes and the cross-domain attacks that could result.
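On the server side, honoring a JSONP request means wrapping the JSON in a call to the requested function. A Java sketch; the callback-name whitelist (a JavaScript identifier, optionally dotted) is an assumption of this sketch rather than Google's documented rule, and it exists so a crafted callback parameter cannot inject arbitrary script. Such responses would normally be served as application/javascript.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of JSONP wrapping with a validated callback name.
class Jsonp {

    // Identifier segments separated by dots, e.g. "myFunction" or "ns.handler".
    private static final Pattern CALLBACK =
            Pattern.compile("[A-Za-z_$][A-Za-z0-9_$]*(\\.[A-Za-z_$][A-Za-z0-9_$]*)*");

    static String wrap(String json, String callback) {
        if (callback == null || !CALLBACK.matcher(callback).matches()) {
            throw new IllegalArgumentException("Invalid JSONP callback name: " + callback);
        }
        return callback + "(" + json + ");";
    }
}
```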
OData Details
OData is a REST-ful protocol based upon ATOM and JSON and backed by Microsoft.
- Overview http://www.odata.org/developers/protocols/overview
- Protocol Reference http://www.odata.org/developers/protocols/operations
- PUT means update, POST means create, MERGE means partial update
- use the If-Match header for PUT, MERGE, and DELETE methods
- Uses $value at the end of a URL to indicate raw data instead of OData-formatted data
More Details
APIKeys for Authentication and Authorization
(proposal from Bruce here)
HTTP Methods
Some browsers and/or firewalls do not support PUT or DELETE operations. Here are two common workarounds:
- HTTP Header Override
X-HTTP-Method-Override: PUT
X-HTTP-Method-Override: DELETE
- used by Google Data Protocol
- Hidden Form Field
_method=PUT
_method=DELETE
- Spring MVC 3.0 implements this via org.springframework.web.filter.HiddenHttpMethodFilter
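Both workarounds boil down to resolving an "effective" method for a POST. A Java sketch; like Spring's HiddenHttpMethodFilter it only overrides POST requests, but restricting overrides to PUT and DELETE and letting the header win over the form field are assumptions of this sketch. Headers are assumed to be keyed by their canonical name, although real HTTP header names are case-insensitive.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of method tunneling through POST.
class MethodOverride {

    private static final Set<String> ALLOWED = Set.of("PUT", "DELETE");

    static String effectiveMethod(String method, Map<String, String> headers, Map<String, String> formParams) {
        if (!"POST".equalsIgnoreCase(method)) {
            return method.toUpperCase(Locale.ROOT); // only POST may be overridden
        }
        String override = headers.get("X-HTTP-Method-Override");
        if (override == null) {
            override = formParams.get("_method");
        }
        if (override != null && ALLOWED.contains(override.toUpperCase(Locale.ROOT))) {
            return override.toUpperCase(Locale.ROOT);
        }
        return "POST";
    }
}
```

Refusing overrides on GET keeps a tunneled DELETE from ever being triggered by a cacheable or bookmarkable link.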