Document toolboxDocument toolbox

Table Entities

Introduction

The use-cases and requirements for this design are outlined in Records and RecordStores.  This document will describe the proposed model object, REST APIs, and usage patterns of a new Synapse object called a TableEntity.

A TableEntity is a new Entity type that has many of the attributes of a relational database table.  In fact, we will adopt all of the nomenclature from: http://en.wikipedia.org/wiki/Relational_database.

Object Model

The following class diagram shows how TableEntity integrates with the rest of the Entity model objects:


The TableEntity is a model object that describes the table including the columns and any constraints.  The TableEntity model object does not encapsulate any row data.  Instead, separate services will be provided to add, update, delete, fetch and query rows from a table entity.  A TableEntity can have an ACL like all other Synapse Entities that can be used to restrict access to the table and all of its rows.  We are not planning to support row level authorization at this time.

REST API

URLMethodRequest BodyResponse BodyDescriptionAuthority
/entityPOSTTableEntityTableEntityCreate a new TableEntity as you would any other Entity in Synapse. A new TableEntity will not have any rows.  Add rows see POST /table/<id>Must have the CREATE permission on the parent entity of the table.
/entity/<id>GET
TableEntityGet a TableEntity as you would any other Entity in SynapseMust have READ permission on the Entity
/entity/<id>PUTTableEntityTableEntityUpdate the TableEntity.Must have UPDATE permission on the Entity
/entity<id>DELETE

Delete a TableEntity and all of its rows.Must have DELETE permission on the Entity
/entity/<id>/tablePOSTRowSetTableRowReferenceThis method is used to both add and update rows to a TableEntity.  The <id> must be the ID of the TableEntity.  The passed RowSet will contain all data for the rows to add or update.  The RowSet.rows is a list of Rows, one of each row to add or update.  If the Row.rowId is null, then a row will be added for that request, if a rowId is provided then the row with that ID will be updated (a 400 will be returned if a row ID is provided that does not actually exist).  The Row.values list should contain a value for each column of the row.  The RowSet.headers identifies the columns (by name) that are to be updated by this request.  Each Row.value list must be the same size as the RowSet.headers, as each value is mapped to a column by the index of these two arrays.  When a row is added it will be issued both a rowId and a version number.  When a row is updated it will be issued a new version number (each row version is immutable).  The resulting TableRowReference will enumerate all rowIds and versionNumbers for this update.  The resulting RowReferecnes will be listed in the same order as the passed result set.  A single POST to this services will be treated as a single transaction, meaning all of the rows will be added/updated or none of the rows will be added/updated.  If this web-services fails for any reason all changes will be "rolled back".Must have UPDATE permission on the TableEntity identified by the <id>
/entity/<id>/table/deleteRowsPOSTTableRowReferenceTableRowReferenceDelete all rows identified in the request.  When a row is delted it will now longer appear in queries.  However, old version of the row will still exist and may still be referenced and retrieved from the POST /entity/<id>/table/version call.Must have UPDATE permission on the Entity
/entity/<id>/table?query="<sql like queryr>"GET
RowSet

Query the rows of a table with a "SQL like" query string.  For example:

"select * from <table_id> where foo=bar limit 100 offset 0"

Must have READ permission on the table.
/entity/<id>/table/versionPOSTTableRowReferenceRowSetGiven a TableRowReference, get the RowSet of the rows.  Each RowReference of the requested can included a versionNumber.  When a versionNumber is included the data for that row will be from the requested version number.  If the versionNumber is blank or null then the current version of that row will be returned.Must have READ permission on the table.
/entity/{id}/table/column/{columnId}/row/{rowId}/version/{versionId}/file
GET
URLGiven a TableRowReference and a column name of a file handle id column, return a signed url for the file.Must have READ permission on the table.
/entity/{id}/table/column/{columnId}/row/{rowId}/version/{versionId}/filepreviewGET
URLGiven a TableRowReference and a column name of a file handle id column, return a signed url for the file previewMust have READ permission on the table.



Row Versions

A Table Entity does not have multiple versions.  Instead, each row is issued a version number when created.  A version of a row is immutable.  When an row is updated, a new version of that row is added and issued a new version number.  Each version of a row can be referenced using a RowReference object that includes both the rowId and a versionNumber. While is is possible to fetch an old version of a row (see POST /entity/<id>/table/version), it is not possible to query for old row version.

Table Query

Synapse will automatically create an index of the current rows of each table entity.  This index will be used to support all SQL-like queries against a table.  Since the index only contains the most recent (or current) version of each row, it is only possible to query against the current version.

The following BNF defines the supported query language.  This BNF is simplified sub-set of: sql-92, that excludes sub-queries, joins, aliases, and all DML. 

<query specification> ::=
		SELECT [ <set quantifier> ] <select list> <table expression>
<select list> ::=
		<asterisk>
	|	<column name> [ { <comma> <column name> }... ] 
<table expression> ::=
		<from clause>
		[ <where clause> ]
		[ <group by clause> ]
		[ <order by clause> ]
		[ <pagination clause> ]
<from clause> ::= FROM <table name> 
<where clause> ::= WHERE <search condition>
<group by clause> ::= GROUP BY <grouping column reference list>
<grouping column reference list> ::=
		<column name> [ { <comma> <column name> }... ]
<order by clause> ::= ORDER BY <sort specification list>
<sort specification list> ::= <column name> [ { <comma> <column name> }... ]
<pagination clause> ::= LIMIT <literal> OFFSET <literal>

<search conditions>

<search condition> ::=
		<boolean term>
	|   <search condition> OR <boolean term>

<boolean term> ::=
		<boolean factor>
	|   <boolean term> AND <boolean factor>

<boolean factor> ::= [ NOT ] <boolean test>

<boolean test> ::= <boolean primary> [ IS [ NOT ] <truth value> ]

<boolean primary> ::= <predicate> | <left paren> <search condition> <right paren>

<predicate> ::=
		<comparison predicate>
	|   <between predicate>
	|   <in predicate>
	|   <like predicate>
	|   <null predicate>
	|   <quantified comparison predicate>
	|   <exists predicate>
	|   <match predicate>
	|   <overlaps predicate>

<comparison predicate> ::= <row value constructor> <comp op> <row value constructor>

<row value constructor> ::=
		<row value constructor element>
	|   <left paren> <row value constructor list> <right paren>
	|   <row subquery>

<row value constructor element> ::=
		<value expression>
	|   <null specification>
	|   <default specification>

<value expression> ::=
		<numeric value expression>
	|   <string value expression>
	|   <datetime value expression>
	|   <interval value expression>

<numeric value expression> ::=
		<term>
	|   <numeric value expression> <plus sign> <term>
	|   <numeric value expression> <minus sign> <term>

<term> ::=
		<factor>
	|   <term> <asterisk> <factor>
	|   <term> <solidus> <factor>

<factor> ::= [ <sign> ] <numeric primary>

<numeric primary> ::= <value expression primary> | <numeric value function>

<value expression primary> ::=
		<unsigned value specification>
	|   <column reference>
	|   <set function specification>
	|   <case expression>
	|   <left paren> <value expression> <right paren>
	|   <cast specification>

<unsigned value specification> ::= <unsigned literal> | <general value specification>

<unsigned literal> ::= <unsigned numeric literal> | <general literal>

<general value specification> ::=
		<parameter specification>
	|   <dynamic parameter specification>
	|   <variable specification>

<parameter specification> ::= <parameter name> [ <indicator parameter> ]

<parameter name> ::= <colon> <identifier>

<indicator parameter> ::= [ INDICATOR ] <parameter name>

<dynamic parameter specification> ::= <question mark>

<variable specification> ::= <embedded variable name> [ <indicator variable> ]

<embedded variable name> ::= <colon><host identifier>

<qualifier> ::= <table name> | <correlation name>

<correlation name> ::= <identifier>

<set function specification> ::=
		COUNT <left paren> <asterisk> <right paren>
	|   <general set function>

<general set function> ::=
		<set function type> <left paren> [ <set quantifier> ] <value expression> <right paren>

<set function type> ::= AVG | MAX | MIN | SUM | COUNT

<set quantifier> ::= DISTINCT | ALL


Table metadata Updates

Columns and constrains can be added or removed from a TableEntity at any time.  This is done by updating the TableEntity (similar to all Entity updates).  However, a few items should be considered:

  • A constraint can only be added if the all existing data already meets the constraint or a 400 will be returned.
  • When a column is added a default value should be provided in the column model, if not the default value will be null.
  • When a column is removed, old version of each row will retain the values of the deleted column.

Provenance

Anytime a query is run the resulting RowSet will contain the a rowId and versionNumber for each row.  This row metadata can be bundled into a TableRowRefrence object can be used in provenance.

Consistency, Availability, Partition Tolerance (CAP)

According to Brewer's CAP theorem any database system can only support two-out-of-three of the following concerns: Consistency (C), Availability (A), and Partition Tolerance (P).    Table Entities are designed to support Consistency and Partition Tolerance at the cost of Availability, making it a CP system (see: visual-guide-to-nosql-systems to see how this compares with other technologies).

So how does this translate into supported functionality?

Table Entities provide consistency by guaranteeing the following:

  • All rows of a RowSet will either be accepted or rejected (partial updates are not possible).
  • Concurrent RowSet updates are processed serially (not in parallel).
  • An optimistic row-level locking scheme is used to ensure that all column values of a single row are consistent between reads and write. If one user changes one column of a row, while another user changes another column of the same row, the first change will be accepted, while the second will be rejected.  In other words, you must have read the current row in order to update that row.
  • Consistency is not guaranteed between separate rows.  This means users can update separate rows concurrently without conflict.
  • Query results only reflect the current state of the table.  This means an "update lock" is held during the execution of a query.  It also means query results may be temporarily unavailable during updates.  Availability is traded for Consistency since the index that supports the query is on a separte partition from the raw update data.  In onther words, the update must be replicated from the "truth" partition to the index partition.

Note: Pagination Consistency - There is a size limit on all query results.  This means large query results can only be fetched using pagination, where the caller fetches one page of a query at a time.  While the data of a single page is guaranteed to be consistent, there is no guarantee of consistency between pages.  In other words, a table can be updated between fetching one page and fetching the next.  However, each query result will always return the current etag of the table, so if the etag changes between one page fetch and the next the client can detect the potential inconsistency between page fetches.  It is up to the client to decide how to handle this case.