Service API Design
- 1 Current APIs
- 2 Proposed APIs
- 2.1 Proposed API for EULAs
- 2.1.1 Assumptions
- 2.1.2 EULA entity
- 2.1.3 Dataset to EULA mapping
- 2.1.4 Agreements: User to EULA mapping
- 2.1.5 Downloading Data
- 2.1.6 Making an Agreement
- 2.1.7 Determining in advance whether a user needs to agree:
- 2.2 Proposed API for Projects
- 2.3 Proposed API for Actions
- 2.4 Proposed API for Analyses
- 2.5 Proposed API for downloading datasets or layers
- 2.6 Proposed API for Free Text Search
- 2.7 Proposed Annotation Type API
- 2.8 Proposed API for Batch Requests
- 3 Requirements and Design Goals
- 4 Options
- 5 More Details
Current APIs
See Repository Service API for full examples of requests and responses. What follows are merely examples of the URL patterns for accessing the service using the HTTP methods GET to read an entity, POST to create an entity, PUT to update an entity, and DELETE to delete an entity.
Proposed APIs
Proposed API for EULAs
EULA = end user license agreement
Assumptions
One EULA might be used by many datasets
Some datasets may have no EULA at all
EULAs cannot be agreed to in bulk; agreement must be done once per dataset, unless you have write access to the dataset??? (We don't want our scientists to have to agree to the EULA every time they want to download a new dataset they just created.)
EULA entity
A EULA is yet another entity (node) in the repository service and has the following additional fields:
{
"name": "TCGA Redistribution Use Agreement",
"agreement": "The recipient acknowledges that the data herein is provided by TCGA and not SageBionetworks and must abide by ..."
}
The entity kind is 'eula':
POST/GET/PUT/DELETE /repo/v1/eula/#
Dataset to EULA mapping
Datasets will have a new field called eulaId which may be null.
Agreements: User to EULA mapping
There will be a table to record when users agree to EULAs.
The structure of this table will be:
| datasetId | datasetVersionId | eulaId | eulaVersionId | userId | agreementDate |
|---|---|---|---|---|---|
Downloading Data
When a user tries to get a location for a dataset or layer:
If ((the user does not have write access to the dataset) AND (the dataset has a eulaId))
    look in the agreement table for an agreement
    If there is no agreement, return an AgreementNeeded exception
Otherwise return the location
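The check above can be sketched in Python as follows. This is only an illustration under stated assumptions: the in-memory `agreements` set stands in for the agreement table, and the `has_write_access` flag and the `location` field on the dataset are hypothetical names, not part of the proposed service.

```python
class AgreementNeeded(Exception):
    """Raised when the user must agree to the dataset's EULA first."""


def get_location(user_id, dataset, agreements, has_write_access):
    """Return the dataset location, enforcing the EULA rule described above.

    dataset: dict with "id", "eulaId" (may be None), and "location".
    agreements: set of (dataset_id, eula_id, user_id) tuples standing in
    for the agreement table (hypothetical in-memory representation).
    """
    eula_id = dataset.get("eulaId")
    # Only users without write access to a dataset that has a EULA are checked.
    if not has_write_access and eula_id is not None:
        if (dataset["id"], eula_id, user_id) not in agreements:
            raise AgreementNeeded(
                f"user {user_id} must agree to EULA {eula_id} "
                f"for dataset {dataset['id']}"
            )
    return dataset["location"]
```

A caller would catch `AgreementNeeded`, present the EULA text, and retry after the agreement has been created.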
Making an Agreement
To create the agreement
POST /repo/v1/agreement
{ "datasetId":"123", "eulaId":"456" }
Note that values for userId, datasetVersionId, eulaVersionId, and agreementDate will all be set by the system at agreement time. Therefore a user must be authenticated and can only create agreements for himself.
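As a rough sketch of how the server side might fill in the system-controlled fields, assuming hypothetical `dataset_versions` and `eula_versions` lookups that map an id to its current version id:

```python
from datetime import datetime, timezone


def create_agreement(request_body, authenticated_user_id,
                     dataset_versions, eula_versions, now=None):
    """Build an agreement row from a POST body.

    The system fields (userId, both version ids, agreementDate) are never
    taken from the client, so a user can only create agreements for the
    account they are authenticated as.
    """
    if authenticated_user_id is None:
        raise PermissionError("must be authenticated to create an agreement")
    dataset_id = request_body["datasetId"]
    eula_id = request_body["eulaId"]
    return {
        "datasetId": dataset_id,
        "datasetVersionId": dataset_versions[dataset_id],  # hypothetical lookup
        "eulaId": eula_id,
        "eulaVersionId": eula_versions[eula_id],  # hypothetical lookup
        "userId": authenticated_user_id,
        "agreementDate": (now or datetime.now(timezone.utc)).isoformat(),
    }
```

Any `userId` supplied in the request body is ignored, which is what makes the "only for himself" rule hold.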
Determining in advance whether a user needs to agree:
/repo/v1/query?query=select * from agreement where datasetId=='123' and eulaId=='456' and userId=='789'
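Because the query contains spaces, quotes, and `=` characters, a client has to URL-encode it before sending. A minimal sketch, assuming `==` is the equality operator for every clause and that ids contain no quote characters (the function name is hypothetical):

```python
from urllib.parse import quote, urlencode


def agreement_query_url(dataset_id, eula_id, user_id, base="/repo/v1/query"):
    """Build the URL-encoded query that checks for an existing agreement."""
    query = (
        "select * from agreement "
        f"where datasetId=='{dataset_id}' and eulaId=='{eula_id}' "
        f"and userId=='{user_id}'"
    )
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return f"{base}?{urlencode({'query': query}, quote_via=quote)}"
```

An empty result set would mean the user still needs to agree before downloading.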
Proposed API for Projects
This work is done.
Get all projects
Get primary fields for a project
Response
{ creationDate: 1/1/2011, leaderUserId: 'david.burdick@sagebase.org', numberOfMembers: 1, numberOfPublications: 0, status: 'Active', projectWebsite: 'http://daveproj.com', description: 'project description text' }
Create a new project
Transfer Object
{ leaderUserId: 'david.burdick@sagebase.org', numberOfMembers: 1, numberOfPublications: 0, status: 'Active', projectWebsite: 'http://daveproj.com', description: 'project description text' }
Update project
Delete a project
Get all annotations for a project
Update annotations for a project
Get all Actions for a project (history)
Create a new Action for a project
Get all Analyses for a project
Get all Datasets for a project
Proposed API for Actions
An Action is created from several other areas (e.g., a Project)
Update an Action
Delete an Action
Proposed API for Analyses
An Analysis is created from a project, thus an analysis is always owned by a project
Update an Analysis
Delete an Analysis
Proposed API for downloading datasets or layers
This is mostly done, except for enforcing authentication and authorization along with the EULA.
Get a location, user is not logged in
GET http://platform.sagebase.org/repo/v1/location/511

HTTP/1.1 401 Unauthorized
Content-Type: application/json
Date: Fri, 11 Feb 2011 19:03:18 GMT
Server: Google Frontend
Cache-Control: private, x-gzip-ok=""
Transfer-Encoding: chunked

{"reason":"You must be logged in to access data"}
Get a location, user has not yet been granted access
GET http://platform.sagebase.org/repo/v1/location/511

HTTP/1.1 403 Forbidden
Content-Type: application/json
Date: Fri, 11 Feb 2011 19:03:18 GMT
Server: Google Frontend
Cache-Control: private, x-gzip-ok=""
Transfer-Encoding: chunked

{"reason":"You are not authorized to access this resource"}
Get a location, user has previously been granted access
GET http://platform.sagebase.org/repo/v1/location/511

HTTP/1.1 200 OK
Content-Type: application/json
Date: Fri, 11 Feb 2011 19:03:18 GMT
Server: Google Frontend
Cache-Control: private, x-gzip-ok=""
Transfer-Encoding: chunked

{
"path":"http://data01.sagebase.org.s3.amazonaws.com/tcga_curation_pacakge.tar.gz?AWSAccessKeyId=44CF9SAMPLEF252F707&Expires=1177363698&Signature=vjSAMPLENmGa%2ByT272YEAiv4%3D",
"md5sum":"3a4460b6378bea1509954b6c13d84387",
"type":"awss3"
}
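From the client's point of view, the three cases above differ only in status code and body. The following sketch shows one way a client might branch on them; the action names (`login`, `request_access`, `download`) are hypothetical labels chosen for illustration.

```python
import json


def interpret_location_response(status, body):
    """Map the status codes shown above to a client-side action."""
    payload = json.loads(body) if body else {}
    if status == 200:
        # The signed URL to download from is in the "path" field.
        return ("download", payload["path"])
    if status == 401:
        return ("login", payload.get("reason", ""))
    if status == 403:
        return ("request_access", payload.get("reason", ""))
    return ("error", f"unexpected status {status}")
```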
Proposed API for Free Text Search
Free Text Search datasets
GET http://platform.sagebase.org/repo/v1/search?q=aging&type=dataset
This is a free-text search for "aging". We will also need to support structured queries against specific fields (e.g., tissue = "Brain", where the service returns any datasets whose tissue is brain or a subclass of brain in an ontology).
Proposed Annotation Type API
JSON Schema Resources
Example schemas: http://json-schema.org/
Draft JSON Schema Spec: http://tools.ietf.org/html/draft-zyp-json-schema-03
Schema to Java POJO code generator: http://code.google.com/p/jsonschema2pojo/
Java POJO to Schema generator: http://wiki.fasterxml.com/JacksonJsonSchemaGeneration
Google is using JSON schema to describe its APIs
Positive Integer Annotation
POST /repo/v1/annotationtype
{
"name":"numberOfSamples",
"displayName":"Number of Samples",
"schema":{
"description":"The number of samples in a dataset layer.",
"type":"integer",
"minimum":1
}
}
Enumerated Type Annotation
POST /repo/v1/annotationtype
{
"name":"status",
"displayName":"Status",
"schema":{
"description":"The status of a dataset.",
"type":"string",
"enum":[
"unknown",
"pending",
"curated",
"QCed"
],
"default":"unknown"
}
}
Very Long String Annotation
POST /repo/v1/annotationtype
{
"name":"curationNotes",
"displayName":"Curation Notes",
"schema":{
"description":"The free text notes on curation that did not have a proper home in the ISA-Tab representation of the dataset metadata.",
"type":"string",
"maxLength":4096
}
}
Ontology Annotation
POST /repo/v1/annotationtype
{
"name":"tissueType",
"displayName":"Tissue Type",
"schema":{
"description":"The type of tissue in a dataset layer.",
"type":"string",
"format":"TODO URL to the ontology we are using for this"
}
}
Array of Values Annotation
POST /repo/v1/annotationtype
{
"name":"tissueTypes",
"displayName":"Tissue Types",
"schema":{
"description":"The types of tissue in a dataset.",
"type":"array",
"items":{
"type":"string",
"format":"TODO URL to the ontology we are using for this"
}
}
}
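To show how the `schema` portion of an annotation type could be used to check values, here is a deliberately small hand-written validator. It is a sketch, not a JSON Schema implementation: it covers only the keywords used in the examples above (`type`, `minimum`, `maxLength`, `enum`, `items`) and ignores `format` and `default`. A real service would more likely use an existing JSON Schema library.

```python
def validate_annotation(value, schema):
    """Return a list of error strings; an empty list means the value is valid."""
    errors = []
    expected = schema.get("type")
    if expected == "integer":
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            return [f"expected integer, got {type(value).__name__}"]
        if "minimum" in schema and value < schema["minimum"]:
            errors.append(f"{value} is less than minimum {schema['minimum']}")
    elif expected == "string":
        if not isinstance(value, str):
            return [f"expected string, got {type(value).__name__}"]
        if "maxLength" in schema and len(value) > schema["maxLength"]:
            errors.append(f"length {len(value)} exceeds maxLength {schema['maxLength']}")
    elif expected == "array":
        if not isinstance(value, list):
            return [f"expected array, got {type(value).__name__}"]
        # Validate each element against the "items" sub-schema.
        for index, item in enumerate(value):
            for error in validate_annotation(item, schema.get("items", {})):
                errors.append(f"item {index}: {error}")
    if "enum" in schema and value not in schema["enum"]:
        errors.append(f"{value!r} is not one of {schema['enum']}")
    return errors
```

For example, validating `0` against the `numberOfSamples` schema yields one error because of `"minimum":1`, while `"curated"` passes the `status` schema.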
Proposed API for Batch Requests
Ultimately we would like to support a full batch implementation, similar to https://developers.facebook.com/docs/api/batch/, that would allow fully independent requests to be issued in a single batch.
BATCH REQUEST
curl \
-F 'access_token=…' \
-F 'batch=[ \
{ "method": "POST", \
"relative_url": "me/feed", \
"body": "message=Test status update&link=http://developers.facebook.com/" \
}, \
{ "method":"GET", \
"relative_url":"me/feed?limit=1" \
} \
]'\
https://graph.facebook.com
BATCH RESPONSE
[
{ "code": 200,
"headers": [
{ "name":"Content-Type",
"value":"text/javascript; charset=UTF-8"}
],
"body":"{\"id\":\"…\"}"
},
{ "code": 200,
"headers": [
{ "name":"Content-Type",
"value":"text/javascript; charset=UTF-8"
},
{ "name":"ETag",
"value": "…"
}
],
"body": "{\"data\": [{…}]}"
}
]
We have an immediate need for one particular API to support batch requests and, in the interest of time, could go with something simpler:
BATCH REQUEST
GET /repo/v1/entity/type?batch=123,456,789
BATCH RESPONSE
{
"paging": {},
"results": [
{
"id":"123",
"name":"Example Dataset 1",
"type":"/dataset"
},
{
"id":"456",
"name":"Example Dataset 2",
"type":"/dataset"
},
{
"id":"789",
"name":"Example Dataset 3",
"type":"/dataset"
}
],
"totalNumberOfResults": 3
}
And so on:
GET /entity/type?batch=123,456,789
GET /entity/annotations?batch=123,456,789
GET /entity/s3Token?batch=123,456,789
GET /entity/acl?batch=123,456,789
Each returns PaginatedResults<T>, where T is whatever /entity/<the id>/<the suffix> would return.
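A minimal sketch of the simpler batch form, assuming a hypothetical `fetch_one` callable that returns whatever the single-entity endpoint would return for one id:

```python
def batch_get(batch_param, fetch_one):
    """Resolve ?batch=123,456,789 into a PaginatedResults-style dict.

    fetch_one is a hypothetical callable returning what
    /entity/<the id>/<the suffix> would return for a single id.
    """
    # Split the comma-separated ids, ignoring blanks such as a trailing comma.
    ids = [part.strip() for part in batch_param.split(",") if part.strip()]
    results = [fetch_one(entity_id) for entity_id in ids]
    return {
        "paging": {},
        "results": results,
        "totalNumberOfResults": len(results),
    }
```

The same helper could back each suffix (`type`, `annotations`, `s3Token`, `acl`) by passing a different `fetch_one`. How missing or unauthorized ids should be reported is left open here.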
Requirements and Design Goals
Strongly Recommended
Different clients should not need different APIs.
R and the web UI should be able to call the same service APIs, although the encoding of the request body and result might differ; such differences are indicated by HTTP headers such as Content-Type and Accept.
Read-only requests should be GET requests. This makes them cacheable and bookmark-able.
Requests that are not read-only should not be GET requests.
All resources in the system should be uniquely identifiable by a particular URL.
We should work hard to make sure the mapping of URL to resource is immutable.
This URL will return the most recent version. For discussion: An additional query parameter would be used to specify a particular version? Or do we add more to the path to specify the version?
The idea is that these resource URLs will occur in publications, hence the need for immutability.
APIs must be idempotent.
For resource creation requests, include a request ID if there is no other field in the request that could be used to determine whether the creation request is a duplicate.
The primary data format for responses will be JSON.
We may support additional formats on an as-needed basis, such as JSONP, ATOM (XML), RSS 2.0 (XML), and RDF.
For any data that might be valuable in mashups, we should strongly consider adding support for JSONP.
All APIs that could potentially return more than ~5 items should support pagination parameters.
All responses returning resources and/or modifying resources should include an ETag. What is an ETag?
Update requests include the ETag via the If-Match header for concurrency control.
A version number for our API
All components in the software stack should speak UTF-8
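The ETag and If-Match requirement above amounts to optimistic concurrency control. The sketch below illustrates the idea with an in-memory dict as the store; hashing the canonical JSON is just one possible way to derive an ETag and is an assumption here, as are the function names.

```python
import hashlib
import json


def compute_etag(resource):
    """Derive an ETag from the resource's canonical JSON (one possible scheme)."""
    canonical = json.dumps(resource, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


def update_resource(store, resource_id, new_fields, if_match):
    """Apply an update only if the caller's If-Match value is current.

    Returns (status_code, resource, etag), using 412 Precondition Failed
    when the supplied ETag is stale.
    """
    current = store[resource_id]
    current_etag = compute_etag(current)
    if if_match != current_etag:
        # Someone else changed the resource since the caller last read it.
        return 412, current, current_etag
    updated = {**current, **new_fields}
    store[resource_id] = updated
    return 200, updated, compute_etag(updated)
```

A client that receives 412 would re-read the resource, reapply its change, and retry with the new ETag.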
Nice To Have
Partial Responses: query parameters are used to indicate which portions of the resource to return
Partial Updates: request parameters specify only the portions of the resource to be updated
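As an illustration of the two nice-to-have items, the sketch below filters a resource down to requested top-level fields and merges a partial update into an existing resource. The `fields` parameter name and both function names are assumptions made for this example only.

```python
def partial_response(resource, fields_param):
    """Return only the requested top-level fields, e.g. fields=name,status.

    An empty or missing parameter returns the whole resource.
    """
    if not fields_param:
        return dict(resource)
    wanted = [name.strip() for name in fields_param.split(",") if name.strip()]
    return {name: resource[name] for name in wanted if name in resource}


def partial_update(resource, changes):
    """Return a copy of the resource with only the supplied fields replaced."""
    return {**resource, **changes}
```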
Options
Options Considered
Generic REST
Google Data API
OData
Options Ruled Out Early
ATOM
ATOM is XML only
JSON-RPC
this is what we used at IMDb (it was not a recommendation of mine though)
it is very expressive
but "heavy" and overkill for many things
SOAP
from my reading the trend is that new development is moving away from SOAP because it is too "heavy".
REST Details
REST is merely guidance on your API. It's not actually a protocol.
Good intro to REST presentation http://www.slideshare.net/guestb2ed5f/scalable-reliable-secure-rest
Spring MVC 3.0 added many more features to make REST-ful services easier to write.
Good introductory slide shows on Spring MVC 3.0 REST features http://www.slideshare.net/habuma/spring-mvc-rest
Versus JAX-RS (also a good introduction): http://www.infoq.com/articles/springmvc_jsx-rs
Spring MVC 3.0 Documentation http://static.springsource.org/spring/docs/3.0.x/spring-framework-reference/html/mvc.html
More server-side info
Client-side info http://blog.springsource.com/2009/03/27/rest-in-spring-3-resttemplate/
Some REST-ful API Examples
Facebook Graph API
NextBio Query API
Helpful links on HTTP protocol support in Spring:
HTTP status codes http://static.springsource.org/spring/docs/3.0.x/javadoc-api/org/springframework/http/HttpStatus.html
Exception Handlers http://pietrowski.info/2010/06/spring-mvc-exception-handler/ an example
Google Data Protocol Details
Google Data Protocol is a REST-ful protocol based upon ATOM and JSON used by many Google APIs
Example requests and responses: http://code.google.com/apis/gdata/docs/2.0/basics.html
Protocol reference: http://code.google.com/apis/gdata/docs/2.0/reference.html
PUT means update, POST means create, PATCH means partial update
the item's version identifier (its ETag) is held in the gd:etag field and the ETag: header
use the If-Match header for PUT, PATCH, and DELETE methods
For PUT and PATCH you have to specify the original entry's ETag to make sure you don't overwrite anyone else's changes.
For results formatted as JSON instead of XML, add the request parameter alt=json
For results formatted as JSONP (JSON wrapped in a script tag) instead of XML, add the request parameters alt=json-in-script&callback=myFunction. Using callback functions allows you to get around some of the cross-domain security issues you might encounter in typical client-side JavaScript. Usually browsers prevent you from loading files across domains because of potential security holes and the cross-domain attacks that could result.
OData Details
OData is a REST-ful protocol based upon ATOM and JSON and backed by Microsoft.
Protocol Reference http://www.odata.org/developers/protocols/operations
PUT means update, POST means create, MERGE means partial update
use the If-Match header for PUT, MERGE, and DELETE methods
Uses $value at the end of a URL to indicate raw data instead of OData formatted data
More Details
APIKeys for Authentication and Authorization
(proposal from Bruce here)
HTTP Methods
Some browsers and/or firewalls do not support PUT or DELETE operations. Here are two common workarounds:
HTTP Header Override
X-HTTP-Method-Override: PUT
X-HTTP-Method-Override: DELETE
used by Google Data Protocol
Hidden Form Field
_method=PUT
_method=DELETE
Spring MVC 3.0 implements this via org.springframework.web.filter.HiddenHttpMethodFilter
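Both workarounds tunnel the intended method through a POST. The following sketch shows how a server might resolve the effective method; it is an illustration only, and the choice to honor overrides solely on POST requests and only for PUT and DELETE is an assumption of this example.

```python
def effective_method(method, headers, form):
    """Resolve the intended HTTP method for a possibly tunneled request.

    headers and form are plain dicts of request headers and form fields.
    """
    allowed = {"PUT", "DELETE"}
    if method.upper() != "POST":
        # Assumption: overrides are only honored on POST requests.
        return method.upper()
    # HTTP header names are case-insensitive.
    lowered = {name.lower(): value for name, value in headers.items()}
    override = lowered.get("x-http-method-override") or form.get("_method")
    if override and override.upper() in allowed:
        return override.upper()
    return "POST"
```

In this sketch the header takes precedence over the hidden form field when both are present.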