Data Repository Service (DRS)
Introduction
The Data Repository Service (DRS) API provides a generic interface to data repositories so data consumers, including workflow systems, can access data objects in a single, standard way regardless of where they are stored and how they are managed. The primary function of DRS is to map a logical ID to a means of physically retrieving the data represented by that ID.
DRS gives data producers a standard way to make their data available to data consumers, one that supports the control needs of the former and the access needs of the latter. It also needs to be interoperable, so that anyone who builds access tools and systems can be confident they will work with all the data out there, and anyone who publishes data can be confident it will work with all the tools out there.
Use Cases
A data producer has stored data on the Synapse platform, and a data consumer uses some other platform to consume data. Without a common tool that provides access across multiple platforms, the consumer has no way to consume the data held in Synapse.
With DRS, the data producer uploads data to Synapse through our existing APIs, and the data consumer uses the DRS APIs to consume it. The common DRS standard makes the data platform independent.
User Type
Data Producer : Anyone who has authorization can upload the data
Data Consumer : Anyone who has authorization can download the data
DRS URIs
Hostname-based DRS URIs should be used. They are simple: they contain only the DRS host name and the DRS ID, and can be converted directly into a fetchable URL by a simple rule. The ID is always percent-encoded so that special characters do not interfere with subsequent DRS endpoint calls.
drs://<hostname>/<id>
<hostname> = repo-prod.prod.sagebase.org
<id> = syn32042766.1 (synapse ID plus version)
e.g. drs://repo-prod.prod.sagebase.org/syn32042766.1
The client starts from a DRS URI in the standard syntax:
drs://repo-prod.prod.sagebase.org/syn32042766.1
The workflow system converts it to an HTTPS URL and makes a GET request to the DRS server:
GET https://repo-prod.prod.sagebase.org/ga4gh/drs/v1/objects/syn32042766.1
Prod hostname : repo-prod.prod.sagebase.org
Staging hostname : repo-staging.staging.sagebase.org
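The conversion rule above can be sketched in a few lines of Python. This is a minimal illustration, not the workflow system's actual code: the function name drs_uri_to_url is hypothetical, and it assumes the ID in the URI is already percent-encoded, so existing % escapes are left untouched.

```python
from urllib.parse import quote, urlsplit

# Standard DRS path prefix, as used in the example URL above.
DRS_OBJECTS_PATH = "/ga4gh/drs/v1/objects/"


def drs_uri_to_url(drs_uri: str) -> str:
    """Convert drs://<hostname>/<id> into an HTTPS object URL (hypothetical helper)."""
    parts = urlsplit(drs_uri)
    if parts.scheme != "drs" or not parts.netloc:
        raise ValueError(f"not a hostname-based DRS URI: {drs_uri!r}")
    object_id = parts.path.lstrip("/")
    if not object_id:
        raise ValueError(f"DRS URI has no object id: {drs_uri!r}")
    # Keep '%' safe so an already percent-encoded id is not encoded twice.
    return f"https://{parts.netloc}{DRS_OBJECTS_PATH}{quote(object_id, safe='%')}"


print(drs_uri_to_url("drs://repo-prod.prod.sagebase.org/syn32042766.1"))
```

The same function works for the staging hostname, since only the host part of the URI changes.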
Datatypes
DRS v1 supports two types of content:
Blob (file): a DRS blob is a Synapse FileEntity and is represented by a DrsObject without a contents array.
Bundle (dataset): a DRS bundle is a Synapse Dataset and is represented by a DrsObject with a contents array.
Schema
DRS Object Json:
{
"id":"string",
"name":"string",
"self_uri":"drs://repo-prod.prod.sagebase.org/syn32042766.1",
"size":0,
"created_time":"2019-08-24T14:15:22Z",
"updated_time":"2019-08-24T14:15:22Z",
"version":"string",
"mime_type":"application/json",
"checksums":[
{
}
],
"access_methods":[
{
}
],
"contents":[
{
}
],
"description":"string"
}
Checksums json:
{
"checksum": "string",
"type": "md5"
}
Access method json:
{
"type": "https",
"access_id": "string"
}
Access url json:
Contents json:
DRS Object Attribute Description
Attribute name | Blob (File) | Bundle (Dataset) |
---|---|---|
id | The DRS ID is either the Synapse file ID with version, which makes it immutable (e.g. syn32132536.1), or a file handle ID prepended with the string “fh” (e.g. fh56789345). | The DRS ID is the Synapse dataset ID with version, which makes it immutable (e.g. syn32132349.1). |
name | Name of file e.g Test3.pages | Name of Dataset e.g Test Dataset |
self_uri | A drs URI, as defined in the DRS documentation, that tells clients how to access this object. | A drs URI, as defined in the DRS documentation, that tells clients how to access this object. |
size | File size in bytes, e.g. 85.7 kB is 85700 | For a dataset, the cumulative size, in bytes, of the files it contains. |
created_time | Timestamp of file creation | Timestamp of dataset creation |
updated_time | Timestamp of the last file update | Timestamp of the last dataset update |
version | A string representing a version e.g 3 | A string representing a version e.g 1 |
mime_type | Has no mime type | |
checksums | e.g. d269b370219876bb6ace9a1ce190d730 | The checksum is computed over a sorted concatenation of the checksums of the top-level contained objects (not recursive, names not included). The list of checksums is sorted alphabetically (hex-code) before concatenation, and a further checksum is computed on the concatenated value. For example, if a dataset contains two files, file1 and file2, the checksum of the bundle is: md5( concat( sort( md5file1, md5file2 ) ) ) |
access_methods | access method will provide access id and the type will be https. | Has no access_method |
contents | Has no contents. | List of objects inside the bundle. If the user has access to the dataset, contents lists all the files in the dataset, irrespective of the user's access to each individual file. |
description | Description of file. | Description of Dataset. |
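The bundle checksum rule in the table above can be sketched as follows. The function name is hypothetical, and the sketch assumes the file checksums are concatenated as hex strings (not raw digest bytes) before the final md5 is taken; the text does not state which form is used.

```python
import hashlib
from typing import Iterable


def bundle_md5(file_md5_hex: Iterable[str]) -> str:
    """md5( concat( sort( md5file1, md5file2, ... ) ) ) over top-level files only."""
    # Sort the hex digests alphabetically, then join them with no separator.
    concatenated = "".join(sorted(file_md5_hex))
    # Assumption: the concatenated hex text, encoded as ASCII, is what gets hashed.
    return hashlib.md5(concatenated.encode("ascii")).hexdigest()


# The two example checksums that appear in this document.
checksums = [
    "d269b370219876bb6ace9a1ce190d730",
    "b15bd58c8f0946b636545d8309bf0f27",
]
print(bundle_md5(checksums))
```

Because the list is sorted before concatenation, the result does not depend on the order in which the files are listed in the dataset.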
checksums attribute description
Attribute name | Description |
---|---|
checksum | The hex-string encoded checksum for the data. e.g b15bd58c8f0946b636545d8309bf0f27 |
type | The digest method used to create the checksum. e.g md5 |
access_method attribute description
Attribute name | Description |
---|---|
type | Type of the access method e.g https |
access_id | The access ID is built from the FileHandleAssociationType, the syn_id and the filehandle_id, joined with '_': <FileHandleAssociationType>_<syn_id>_<filehandle_id>, e.g. FileEntity_syn123.1_56789345 (the <syn_id> includes the version, as in syn123.1). If the DRS object is retrieved with a file handle ID, the access ID is the file handle ID prepended with the string “fh”, e.g. fh56789345. |
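A minimal sketch of building and splitting an access ID, following the FileEntity_syn123.1_56789345 and fh56789345 examples in the table. All function names here are hypothetical, and splitting from the right assumes that neither the syn_id nor the file handle ID contains an underscore.

```python
from typing import Tuple


def build_access_id(association_type: str, syn_id: str, file_handle_id: str) -> str:
    """Join the three parts with '_' as in FileEntity_syn123.1_56789345."""
    return f"{association_type}_{syn_id}_{file_handle_id}"


def build_file_handle_access_id(file_handle_id: str) -> str:
    """Access id used when the object was requested by file handle ID."""
    return f"fh{file_handle_id}"


def split_access_id(access_id: str) -> Tuple[str, str, str]:
    """Return (association_type, syn_id, file_handle_id) for the three-part form."""
    # Split from the right so only the last two underscores act as separators.
    parts = access_id.rsplit("_", 2)
    if len(parts) != 3:
        raise ValueError(f"not a three-part access id: {access_id!r}")
    return parts[0], parts[1], parts[2]


print(build_access_id("FileEntity", "syn123.1", "56789345"))
print(build_file_handle_access_id("56789345"))
```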
contents attribute description
Attribute name | Description |
---|---|
name | A name declared by the bundle author that must be used when materializing this object, overriding any name directly associated with the object itself. The name must be unique within the containing bundle. This string is made up of uppercase and lowercase letters, decimal digits, hyphen, period, and underscore, e.g. syn32132536.1, as the Synapse ID is unique. |
id | A DRS-identifier of a DrsObject e.g syn32132536.1 |
drs_uri | A list of full DRS identifier URI paths that may be used to obtain the object. These URIs may be external to this DRS instance. e.g. drs://repo-prod.prod.sagebase.org/syn32132536.1 |
Note
Nesting of bundle(dataset containing dataset) is not supported.
EndPoints
1. Get information about a DrsObject
This endpoint returns information about a DrsObject, which can be a file or a dataset, as shown in the examples below. The DrsObject is fetched by its DRS ID, i.e. the Synapse ID plus version (which makes it immutable) or a file handle ID prepended with the string “fh” (e.g. fh123).
https://{serverURL}/ga4gh/drs/v1/objects/{object_id}
HTTP method : GET
Path Parameters :
object_id: The DRS object ID, i.e. the Synapse ID plus version (which makes it immutable) or a file handle ID prepended with the string “fh” (e.g. fh123).
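The two accepted object_id forms can be told apart with a small parser. This is only a sketch: the regular expressions assume that Synapse IDs, versions and file handle IDs are purely numeric, as in the examples in this document, and the names ParsedObjectId and parse_object_id are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Assumed shapes, taken from the examples syn32132536.1 and fh56789345.
_SYNAPSE_ID = re.compile(r"^(syn\d+)\.(\d+)$")
_FILE_HANDLE_ID = re.compile(r"^fh(\d+)$")


class ParsedObjectId(NamedTuple):
    kind: str               # "entity" or "file_handle"
    id: str                 # Synapse ID, or file handle ID without the "fh" prefix
    version: Optional[int]  # None for file handle IDs


def parse_object_id(object_id: str) -> ParsedObjectId:
    match = _SYNAPSE_ID.match(object_id)
    if match:
        return ParsedObjectId("entity", match.group(1), int(match.group(2)))
    match = _FILE_HANDLE_ID.match(object_id)
    if match:
        return ParsedObjectId("file_handle", match.group(1), None)
    # A Synapse ID without a version is rejected, since the version is required.
    raise ValueError(f"unsupported DRS object id: {object_id!r}")


print(parse_object_id("syn32132536.1"))
print(parse_object_id("fh56789345"))
```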
Authorization :
Bearer authentication should be handled at the controller level, as is done for all other APIs.
Bundle (dataset) Example:
Dataset syn32132349 has been created and contains two files, syn31538774.3 and syn32132536.1.
Request url: https://repo-prod.prod.sagebase.org/ga4gh/drs/v1/objects/syn32132349.1
REQUEST BODY SCHEMA: application/json
expand : If false and the object_id refers to a bundle, the ContentsObject array contains only those objects directly contained in the bundle.
If true and the object_id refers to a bundle, a response with HTTP status code 400 and the message “nesting of bundle is not supported” is returned.
If the object_id refers to a blob, the query parameter is ignored.
RESPONSE BODY SCHEMA: application/json
RESPONSE CODE: 200
Blob (file) example with Synapse ID as Object ID:
Request Url: https://repo-prod.prod.sagebase.org/ga4gh/drs/v1/objects/syn31538774.3
RESPONSE BODY SCHEMA: application/json
Blob (file) example with file handle ID as Object ID:
Request Url: https://repo-prod.prod.sagebase.org/ga4gh/drs/v1/objects/fh56789345
RESPONSE BODY SCHEMA: application/json
HTTP Responses
HTTP Code | Description | Schema |
---|---|---|
200 | The DrsObject was found successfully. | DrsObject |
400 | The request is malformed. | Error |
401 | The request is unauthorized. | Error |
403 | The requester is not authorized to perform this action. | Error |
404 | The requested DrsObject wasn’t found. | Error |
500 | An unexpected error occurred. | Error |
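A client call to this endpoint might be prepared as below. This is a hedged sketch using only the Python standard library: the function name and the token value are placeholders, and the request is built but not sent so the example has no network dependency.

```python
import urllib.request


def build_get_object_request(server: str, object_id: str, bearer_token: str) -> urllib.request.Request:
    """Build (but do not send) the GET request for a DrsObject."""
    url = f"https://{server}/ga4gh/drs/v1/objects/{object_id}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {bearer_token}"},
        method="GET",
    )


request = build_get_object_request(
    "repo-prod.prod.sagebase.org", "syn32132349.1", "<access token>"
)
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would send it; for the 4xx and 5xx codes in the table, urlopen raises urllib.error.HTTPError, whose code attribute carries the HTTP status.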
2. Get a URL for fetching bytes
This endpoint returns the actual URL of the blob (for example in an S3 bucket or Google Cloud) from which the file can be downloaded.
https://{serverURL}/ga4gh/drs/v1/objects/{object_id}/access/{access_id}
HTTP method : GET
Path parameters :
object_id: The DRS object ID, i.e. the Synapse ID plus version (which makes it immutable) or a file handle ID prepended with the string “fh” (e.g. fh123).
access_id: Access id from access methods list of drs object.
Authorization :
Bearer authentication should be handled at the controller level, as is done for all other APIs.
Blob (file) example with Synapse ID as Object ID:
https://repo-prod.prod.sagebase.org/ga4gh/drs/v1/objects/syn31538774.3/access/FileEntity_syn31538774.3_56789345
REQUEST BODY SCHEMA: None
RESPONSE BODY SCHEMA: application/json
The presigned URL is returned to the user, and the file can be downloaded directly from it without any further authentication, because the presigned URL includes tokens, which expire with time.
Blob (file) example with file handle ID as Object ID:
https://repo-prod.prod.sagebase.org/ga4gh/drs/v1/objects/fh56789345/access/fh56789345
REQUEST BODY SCHEMA: None
RESPONSE BODY SCHEMA: application/json
The presigned URL is returned to the user, and the file can be downloaded directly from it without any further authentication, because the presigned URL includes tokens, which expire with time.
HTTP Responses
HTTP Code | Description | Schema |
---|---|---|
200 | The access URL was found successfully. | Access url |
400 | The request is malformed. | Error |
401 | The request is unauthorized. | Error |
403 | The requester is not authorized to perform this action. | Error |
404 | The requested DrsObject wasn’t found. | Error |
500 | An unexpected error occurred. | Error |
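Given a DrsObject returned by the first endpoint, the access endpoint URLs can be derived from its access_methods list. The sketch below assumes the DrsObject has already been parsed into a Python dict with the id and access_methods attributes described earlier; the function name is hypothetical.

```python
from typing import List
from urllib.parse import quote


def access_endpoint_urls(server: str, drs_object: dict) -> List[str]:
    """Return one /access/{access_id} URL per access method that has an access_id."""
    object_id = quote(drs_object["id"], safe="")
    urls = []
    # Bundles have no access methods, so they yield an empty list.
    for method in drs_object.get("access_methods", []):
        access_id = method.get("access_id")
        if access_id:
            urls.append(
                f"https://{server}/ga4gh/drs/v1/objects/{object_id}"
                f"/access/{quote(access_id, safe='')}"
            )
    return urls


blob = {
    "id": "fh56789345",
    "access_methods": [{"type": "https", "access_id": "fh56789345"}],
}
print(access_endpoint_urls("repo-prod.prod.sagebase.org", blob))
```

Each of these URLs is then requested with the bearer token, and the presigned URL in the response is used for the actual download.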
3. Get information about DRS service
The GA4GH Service Registry API specification allows information about GA4GH-compliant web services, including DRS services, to be aggregated into registries and made available via a standard API. The following considerations should be followed when registering DRS services within a service registry.
The DRS service attributes returned by /service-info (i.e. id, name, description, etc.) should have the same values as the registry entry for that service.
The value of the type object's artifact property should be drs (i.e. the same as it appears in service-info)
Each entry in a Service Registry must have a url, indicating the base URL to the web service. For DRS services, the registered url must include everything up to the standardized /ga4gh/drs/v1 path.
https://{serverURL}/ga4gh/drs/v1/service-info
HTTP method : GET
Path parameters : None
Authorization : None
Example url: https://repo-prod.prod.sagebase.org/ga4gh/drs/v1/service-info
REQUEST BODY SCHEMA: None
RESPONSE BODY SCHEMA: application/json
Attribute description :
Attribute name | description |
---|---|
id | Unique ID of this service. Reverse domain name notation is recommended, though not required. The identifier should attempt to be globally unique so it can be used in downstream aggregator services e.g. Service Registry |
name | Name of this service. Should be human readable. |
type | Type of a GA4GH service. |
description | Description of the service. Should be human readable and provide information about the service. |
organization | Organization providing the service. |
contactUrl | URL of the contact for the provider of this service, e.g. a link to a contact form (RFC 3986 format), or an email (RFC 2368 format). |
documentationUrl | URL of the documentation of this service (RFC 3986 format). This should help someone learn how to use your service, including any specifics required to access data, e.g. authentication. |
createdAt | Timestamp describing when the service was first deployed and available (RFC 3339 format) |
updatedAt | Timestamp describing when the service was last updated (RFC 3339 format) |
environment | Environment the service is running in. Use this to distinguish between production, development and testing/staging deployments. Suggested values are prod, test, dev, staging. However this is advised and not enforced. |
version | Version of the release in which we will deliver the DRS API. |
url | DRS Service base url for the provider of service. |
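A quick sanity check of a service-info response against the attributes listed above could look like this. It is a sketch under assumptions: treating every listed attribute as expected is a choice made here, not something this document states, and the function name is hypothetical. The only value checked is the type object's artifact property, which should be drs.

```python
from typing import List

# Attribute names taken from the table above.
EXPECTED_ATTRIBUTES = (
    "id", "name", "type", "description", "organization", "contactUrl",
    "documentationUrl", "createdAt", "updatedAt", "environment", "version", "url",
)


def check_service_info(info: dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = [f"missing attribute: {name}" for name in EXPECTED_ATTRIBUTES if name not in info]
    service_type = info.get("type")
    # The artifact property of the type object should be "drs".
    if isinstance(service_type, dict) and service_type.get("artifact") != "drs":
        problems.append("type.artifact should be 'drs'")
    return problems


print(check_service_info({"id": "example", "type": {"artifact": "drs"}}))
```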
Error Response
If a request fails, an error should be thrown with an error message and a status code.