This documentation describes the data format Bridge expects. For documentation on how to upload files to the Bridge server, see Bridge REST API.

Overview

Each upload is a bundle of files. The bundle is zipped using the ZIP format, in a form readable by http://docs.oracle.com/javase/7/docs/api/java/util/zip/ZipInputStream.html. The ZIP archive is then encrypted using the public encryption key installed in each app.

Encrypted file
* ZIP file
  * info.json
  * foo.json
  * bar.json
  * baz.json
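
As a rough illustration of the layout above, the following Python sketch assembles the ZIP layer of a bundle in memory. The function name build_bundle and the sample file contents are hypothetical, and the encryption step is deliberately left out because this page does not specify the encryption scheme.

```python
import io
import json
import zipfile


def build_bundle(info, files):
    """Return the bytes of a ZIP archive holding info.json plus the data files.

    `info` is the dict to serialize as info.json; `files` maps filename -> bytes.
    Encryption with the app's public key is NOT performed here (assumption:
    it happens in a separate step after zipping).
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("info.json", json.dumps(info))
        for name, data in files.items():
            archive.writestr(name, data)
    return buffer.getvalue()


# Hypothetical sample bundle with a single data file.
bundle = build_bundle(
    {
        "files": [
            {"filename": "foo.json", "timestamp": "2015-03-02T03:26:59-08:00"}
        ],
        "item": "Example Activity",
    },
    {"foo.json": b'{"xyz": "sample field xyz"}'},
)
```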

iOS ResearchKit Uploads

info.json

Upload bundles from iOS ResearchKit apps must include a file called info.json. This file is in JSON format and looks like:

  {
    "files" : [ {
      "filename" : "audio_countdown.m4a",
      "timestamp" : "2015-03-02T03:26:59-08:00"
    }, {
      "filename" : "audio_audio.m4a",
      "timestamp" : "2015-03-02T03:27:09-08:00"
    } ],
    "item" : "Voice Activity"
  }

info.json must contain a list of files. Each file in the bundle must have a corresponding entry in info.json's file list, and vice versa. Each file entry must have a filename and a timestamp representing when the file's data was measured and written. If the data was measured over a long period of time, the timestamp should represent when the data was last measured and written. The timestamp must be in ISO 8601 format (http://en.wikipedia.org/wiki/ISO_8601).

info.json must also include item, a human-readable string describing the activity that this data was measured from. The item name must distinguish the activity from all other activities within the app, and every upload for a given activity must use the same item name.

info.json may include other fields, in either the top-level struct, or in the file entries, or both. These additional fields are used as metadata for the researchers and have no restrictions imposed by the Bridge server. These fields could include things like app name, app version, OS version, etc.
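
The rules above could be checked on the client before uploading. The sketch below is one possible check, not the Bridge server's actual validation. The function name check_info is hypothetical, it assumes info.json does not list itself (as in the example above), and it relies on datetime.fromisoformat, which accepts only a subset of ISO 8601 on older Python versions.

```python
from datetime import datetime


def check_info(info, bundle_filenames):
    """Return a list of problems found in a parsed info.json (empty if none)."""
    problems = []
    if not isinstance(info.get("item"), str) or not info["item"]:
        problems.append("missing item")
    entries = info.get("files")
    if not isinstance(entries, list):
        return problems + ["missing files list"]

    listed = set()
    for entry in entries:
        name = entry.get("filename") if isinstance(entry, dict) else None
        if not name:
            problems.append("file entry without filename")
            continue
        listed.add(name)
        try:
            # str() turns a missing (None) timestamp into an unparseable string.
            datetime.fromisoformat(str(entry.get("timestamp")))
        except ValueError:
            problems.append(f"{name}: bad timestamp")

    # Assumption: info.json itself is not expected in its own file list.
    actual = set(bundle_filenames) - {"info.json"}
    for name in sorted(actual - listed):
        problems.append(f"{name}: in bundle but not listed")
    for name in sorted(listed - actual):
        problems.append(f"{name}: listed but not in bundle")
    return problems
```

An empty list means the file list and the bundle contents agree and every entry has a parseable timestamp.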

Surveys

Surveys consist of metadata in the info.json file as well as individual files for each survey answer.

Each answer to each individual survey question is stored in its own file. That file is in JSON format and looks like:

  {
    "questionType" : 6,
    "booleanAnswer" : true,
    "startDate" : "2015-02-15T19:35:10+0000",
    "questionTypeName" : "Boolean",
    "item" : "limitations",
    "endDate" : "2015-02-15T19:35:12+0000"
  }

Each survey answer file must include the following keys: item, startDate, endDate, questionType, questionTypeName.

The item field is an identifier that uniquely identifies the survey question within the survey.

startDate and endDate are metadata representing when the user started and finished answering the question. The app may use the same value for both startDate and endDate. Both of these fields are timestamps in ISO 8601 format.

The questionType and questionTypeName fields represent the data type of the answer. questionTypeName is a string representation of the answer field type. Possible questionTypeName values include: Boolean, Date, Decimal, Integer, MultipleChoice, None, Scale, SingleChoice, Text, TimeInterval. questionType is the numeric representation that maps to the string representation of the question type. For documentation on the valid question types and their string and numeric type representations, please consult the ResearchKit documentation.

Each survey file must also have a field representing the survey answer. The name of this field varies with each question type. Possible answer field names include: booleanAnswer, dateAnswer, intervalAnswer, numericAnswer, scaleAnswer, textAnswer. The value type of this field depends on the question type. There may also be additional fields required for the question type, such as unit. For more information, please consult the ResearchKit documentation.
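
A minimal sketch of these requirements follows. The names REQUIRED_KEYS, ANSWER_KEYS, and check_survey_answer are hypothetical. The answer field names are taken from the list above, which may not be exhaustive, so treat the "no answer field" check as an assumption rather than a rule the server enforces.

```python
# Keys every survey answer file must include, per the text above.
REQUIRED_KEYS = {"item", "startDate", "endDate", "questionType", "questionTypeName"}

# Answer field names listed above; the real set may be larger.
ANSWER_KEYS = {
    "booleanAnswer", "dateAnswer", "intervalAnswer",
    "numericAnswer", "scaleAnswer", "textAnswer",
}


def check_survey_answer(answer):
    """Return a list of problems found in one parsed survey answer file."""
    present = set(answer)
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - present)]
    if not ANSWER_KEYS & present:
        problems.append("no answer field")
    return problems
```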

The survey must also have a corresponding schema (see Schemas). The "item" field in info.json must match the schema ID. There must be a field definition in the schema for every "item" field in every possible survey file in the upload bundle. For example, if your survey info.json declares "item":"DailySurvey", and it has files foo.json, bar.json, baz.json with "item":"foo", "item":"bar", and "item":"baz", then you must have a schema with ID "DailySurvey" with fields "foo", "bar", and "baz". TODO: Support server-side surveys, that match the survey and questions using guids and the survey API instead of Upload Schemas.
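
The DailySurvey example can be sketched as a small consistency check. The function name survey_schema_problems is hypothetical, and the schema dict follows the shape shown in the Schemas section of this page.

```python
def survey_schema_problems(info, answers, schema):
    """Check a survey bundle against its schema; return a list of problems.

    `info` is the parsed info.json, `answers` is a list of parsed survey
    answer files, and `schema` is a parsed upload schema.
    """
    problems = []
    if schema["schemaId"] != info["item"]:
        problems.append("schema ID does not match info.json item")
    defined = {definition["name"] for definition in schema["fieldDefinitions"]}
    for answer in answers:
        if answer["item"] not in defined:
            problems.append(f"no schema field for item {answer['item']}")
    return problems
```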

Non-JSON Data

Some activities will produce non-JSON data, such as audio files or CSVs. If there is at least one non-JSON file in the upload bundle, the Bridge server will automatically detect this and treat the upload bundle as a non-JSON data bundle, even if there is other JSON data in the bundle.

The Bridge server will first use the value of the "item" field in info.json to identify the schema, matching the schema name (not ID) with the item name in info.json. (TODO: Update Bridge server to match with schema ID, then fall back to schema name.) The Bridge server will then match each file in the bundle with the corresponding field in the schema, using the filename. For example, if your bundle contains the files voice_recording.wav, accelerometer_data.csv, and metadata.json, it will match them up with the field names voice_recording.wav, accelerometer_data.csv, and metadata.json. For more information about schemas, see Schemas.
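
The matching described above might look roughly like the following. This is a sketch of the described behavior, not the server's code. The function names are hypothetical, and the caller is assumed to pass only the data filenames (whether info.json needs its own schema field is not stated here).

```python
def find_schema_by_name(schemas, item):
    """Return the first schema whose display name equals the item, else None."""
    for schema in schemas:
        if schema["name"] == item:
            return schema
    return None


def unmatched_files(filenames, schema):
    """Return the filenames that have no schema field with the same name."""
    field_names = {definition["name"] for definition in schema["fieldDefinitions"]}
    return sorted(name for name in filenames if name not in field_names)
```

For a bundle holding voice_recording.wav and accelerometer_data.csv, unmatched_files returns an empty list only if the schema defines fields with exactly those names.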

info.json will still be preserved as metadata and still be used as JSON data.

JSON Data

Editor's note: ResearchKit data does not tag itself with the data's schema ID, so the Bridge server has to parse into the data to determine its schema. This may or may not be improved in the future.

If an upload bundle is neither a survey nor non-JSON data, then the Bridge server will treat the bundle as a JSON bundle.

The Bridge server first determines the fields the data represents. Since different files can have fields with the same name, it will use filenames to prefix the file's field names to generate the bundle's field names. For example, if your data looks like

foo.json

{
  "xyz":"sample field xyz",
  "persistence":"up",
  "color":"chartreuse"
}

bar.json

{
  "speed":88,
  "speedUnit":"mph",
  "color":"taupe"
}

Then your data will have the field names "foo.json.xyz", "foo.json.persistence", "foo.json.color", "bar.json.speed", "bar.json.speedUnit", and "bar.json.color". The Bridge server will then check these field names and field types against the field definitions of every schema registered to your app. If every required field in the schema is present in your upload bundle with the correct type, and your data contains no fields that the schema does not define (as either a required or an optional field), then the data is matched to the schema.
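
The prefixing and matching rules can be sketched as below. The function names are hypothetical, and for brevity the sketch compares field names only; the type check the server also performs is omitted.

```python
def bundle_field_names(files):
    """Flatten parsed JSON files into bundle fields keyed by "<filename>.<key>".

    `files` maps filename -> parsed JSON object (a dict).
    """
    return {
        f"{filename}.{key}": value
        for filename, data in files.items()
        for key, value in data.items()
    }


def matches_schema(fields, schema):
    """Return True if the fields satisfy the schema (names only, no types)."""
    defined = {d["name"]: d for d in schema["fieldDefinitions"]}
    # Any field the schema does not define disqualifies the match.
    if any(name not in defined for name in fields):
        return False
    # Every required field must be present; optional fields may be absent.
    return all(name in fields for name, d in defined.items() if d["required"])
```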

If no schema matches your data, the server will reject the data.

TODO: Update the server to treat null JSON values and empty strings, structs, and lists as non-existent.

Schemas

A schema is defined as follows:

{
  "name": "Sample Schema",
  "schemaId": "sample-schema",
  "revision": 1,
  "fieldDefinitions": [
    {
      "name": "foo",
      "required": true,
      "type": "STRING"
    },
    {
      "name": "bar",
      "required": true,
      "type": "INT"
    },
    {
      "name": "baz",
      "required": false,
      "type": "BOOLEAN"
    }
  ],
  "type": "UploadSchema"
}

The name field is the human-readable display name for your schema. It is also used to match non-JSON bundles.

schemaId is the machine-readable unique identifier for your schema.

revision is the revision number of your schema. Schemas are immutable. Once registered, they cannot be modified. However, researchers can upload a new revision of a schema.

fieldDefinitions is a list of fieldDefinition entries. Each entry specifies the field name, field type, and whether the field is required or optional.

Valid field types include:

ATTACHMENT_BLOB - Used for non-JSON data uploads. Represents any kind of non-JSON attachment that should be uploaded separately from structured data. Generally used for things like audio files.
ATTACHMENT_CSV - Used for non-JSON data uploads. Represents a file in CSV format. Bridge server will attempt to perform further post-processing on this CSV file.
ATTACHMENT_JSON_BLOB - Large blobs of JSON data that should be stored separately from structured data. This is treated the same as ATTACHMENT_BLOB, but is tagged as JSON data for researcher convenience.

ATTACHMENT_JSON_TABLE - A blob of JSON data in a structured tabular form, which can be used for additional post-processing on the server. Example:

[
  {
    "accelerometer":{
      "x":0.23,
      "y":-0.92,
      "z":0.0
    },
    "speed":2.78,
    "feeling":"hungry"
  },
  {
    "accelerometer":{
      "x":0.88,
      "y":-2.7,
      "z":-9.8
    },
    "speed":0.12,
    "feeling":"excited"
  }
]

BOOLEAN - true or false
CALENDAR_DATE - String in YYYY-MM-DD format.
FLOAT - floating point number, including floats, doubles, and decimals. TODO: Update server to accept ints as floats, in addition to floats, decimals, and doubles.
INLINE_JSON_BLOB - JSON blob that's small enough to fit inside the health data. Generally something that's less than a hundred characters. This is used for things like small lists: [ "apples", "bananas", "cranberries" ].
INT - integers, including longs and big integers.
STRING - any non-empty string. NOTE: Strings can't exceed 100 characters in length, or they won't fit into Synapse tables. If this is freeform text from the user, make sure it is limited to 100 characters, or declare the field type as ATTACHMENT_BLOB.
TIMESTAMP - timestamps, can either be a string in ISO 8601 format or a long that's milliseconds since the beginning of the epoch (1970-01-01).
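
To illustrate the scalar types above, here is a sketch of a client-side type check. It is an approximation written from the descriptions on this page, not the server's logic. The function name value_fits_type is hypothetical, attachment and inline JSON types are not covered, and datetime.fromisoformat accepts only a subset of ISO 8601 on older Python versions.

```python
import re
from datetime import datetime


def value_fits_type(value, field_type):
    """Return True if a parsed JSON value fits the given scalar field type."""
    is_int = isinstance(value, int) and not isinstance(value, bool)
    if field_type == "BOOLEAN":
        return isinstance(value, bool)
    if field_type == "INT":
        return is_int
    if field_type == "FLOAT":
        # Ints are rejected, mirroring the TODO noted for FLOAT above.
        return isinstance(value, float)
    if field_type == "STRING":
        return isinstance(value, str) and 0 < len(value) <= 100
    if field_type == "CALENDAR_DATE":
        if not isinstance(value, str) or not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
            return False
        try:
            datetime.strptime(value, "%Y-%m-%d")
        except ValueError:
            return False
        return True
    if field_type == "TIMESTAMP":
        if is_int:  # milliseconds since 1970-01-01
            return True
        if not isinstance(value, str):
            return False
        try:
            datetime.fromisoformat(value)
        except ValueError:
            return False
        return True
    raise ValueError(f"type not covered by this sketch: {field_type}")
```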

For how to create new schemas or new revisions, see Bridge REST API.

Bridge Upload Data Format