Skip to end of metadata
Go to start of metadata

You are viewing an old version of this page. View the current version.

Compare with Current View Page History

Version 1 Current »

The epic PLFM-4698 captures all of the use cases associated with table versions and/or history.  In this document we will summarize the key use cases, review how the tables feature currently works then present a design for the additional features.

Driving Use Case Summary

  1. Add support to allow a user to "freeze" the current state of a table including both the current row data and the current schema.  The freeze would be immutable such that subsequent changes to either the rows or schema will not change the previous freeze (PLFM-4890, PLFM-5335).
  2. The freeze would be reference-able such that reference to freeze can be added to historical records such as provenance (PLFM-2870, PLFM-4558).
  3. The freeze would be query-able such that a query run against the freeze will return results consistent with the state of the table at the time of the freeze (PLFM-4558).
  4. Add support to create a "freeze" of a view by creating a table containing the data from the view and then adding a freeze to the resulting table (PLFM-4247, PLFM-5335, PLFM-5433).
  5. Add support to revert a table to a previous state (PLFM-4607, PLFM-4639).
  6. Add support to add, update, or delete rows using primary keys defined by the user (PLFM-3815).
  7. Add support to remove all history from a table to speed up table build times (PLFM-4988).

Table Architecture Review

The Synapse Table feature is an append-only database.  For example, a user can start a new transaction that includes row addition, updates, and deleted.  To handle such a request, the Synapse table engine will calculate a "delta" that represents each row or column change to be appended to the table.  The engine will automatically batches the deltas into the blocks that are optimized for storage and retrieval.  Each block will contain as many deltas as will fit into memory.  This means the number of row changes in each block is a function of the "width" of the table.  Wide tables have fewer rows deltas per block than narrow tables.  Each block is assigned a version number.  This is the version number required to update a row.  Finally all blocks are grouped together with a transaction number that captures all of the changes from a single user's request. Each delta block is stored in AWS S3 and tracked in the main Synapse database.  The blocks of deltas are the "truth" of all Synapse tables. 


Table query is then supported by building an actual MySQL table, called the "table's index", in a separate MySQL database cluster.  Specifically, each index is built by applying blocks of deltas to the table in the order they were created.  Currently, we always apply all of the blocks of data when building a table's index.  Creating an index that represents a data "freeze" is as simple as building a index that includes all blocks from the beginning of the table's history up to, and including, the blocks marked in the "freeze".


Feature Design

Table Freeze

TableEntities already implement VersionableEntity.  It is currently possible to create a new version of TableEntity using the the API.  It is currently not super useful to create a new version of a table as the only information captured in the version is the table's annotations.  However, this means table versions can be referenced in the same way as file versions.  The following entity version nomenclature is already used throughout Synapse and for Entity DOIs:

syn<id>.<version_number>


To make table versions more useful, the version needs to capture an transaction identifier (aka transaction number).  Synapse would then interpret this to mean that the version includes all deltas from the first transaction number, up to, and including the referenced transaction identifier.

Figure 1

The example in figure 1, show a single table: syn123, that has had three user updates identified by the three transaction IDs (82, 98, & 101).  Notice that transaction 82 has three row version blocks (0, 1, & 2) as it was a large change.  In this example the first version of the table (syn123.1) references transaction 98.  This means version one will include all deltas from 98 and earlier which includes row version blocks 0 though 4.  This also means version one excludes all changes that occurred after transaction 98.  So transaction 101 will not be include (nor will the changes from row version block 5.  However, the second version of the table syn123.2 includes all changes applied to the table.





  • No labels