Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Table query is then supported by building an actual MySQL table, called the "table's index", in a separate MySQL database cluster.  Specifically, each index is built by applying blocks of deltas to the table in the order they were created.  Currently, we always apply all of the blocks of data when building a table's index.  Creating an index that represents a data "freeze" table version is as simple as building a index that includes all blocks from the beginning of the table's history up to, and including, the blocks marked in the "freeze"version.


Feature Design

Table

...

Version

TableEntities already implement VersionableEntity.  It is currently possible to create a new version of TableEntity using the the API.  It is currently not super useful to create a new version of a table as the only information captured in the version is the table's annotations.  However, this means table , but we are not currently capturing any transaction information in the table's version.  We propose extending the table's version to include a reference to a transaction ID.  The example in Figure 1 shows the current state of the table syn123.  We will use this example to demonstrate the proposal for how table version will function.

Figure 1Image Added

Figure 1.


In Figure 1. the blocks of row deltas are on the left (Red), the transactions are in the middle (Green), and the version are on the right (blue).  As depicted, syn123 has eight  blocks of row deltas (0-7), four transactions (82, 98, 101, & 105), and two table versions (syn123.1 & syn123.2).

As mentioned above, each time a users updates a table, all of the changes of each update are captured with a transaction identifier. In this example the first user update created three blocks of row deltas (0-2) which are identified as transaction 82.  The second update included two blocks of row deltas (3-4) identified by transaction 98, and so on.  Notice that first and last transactions (82 & 105 respectively), do not have a version a table version associated with it.  This is because table updates do not automatically create new table version.  Instead, the user must explicitly request that a new table version be created using one of the following two methods:

  1. POST /entity/{id}/table/transaction/async/start - We will add a new Boolean parameter to the TableUpdateTransactionRequest called 'newVersion'.   By default the 'newVersion' will be 'false'.  When 'newVersion' is set to 'true', if the transaction completes successfully, a new table version will be created that references the newly created transaction.
  2. PUT /entity/{id} - This method already has a parameter called 'newVersion' and is currently used as a way to create new version of FileEntites.  We will extend this method to do the same for TableEntites.  When 'newVersion' is set to 'true' (for a TableEntity), a new version will be created that referenced the last transaction of that table.

Table version References

Table versions can be referenced in the same way as file versions.  The following entity version nomenclature is already used throughout Synapse and for Entity DOIs:

Code Block
syn<id>.<version_number>

To make table versions more useful, the version needs to capture an transaction identifier (aka transaction number).  Synapse would then interpret this to mean that the version includes all deltas from the first transaction number, up to, and including the referenced transaction identifier.

Image Removed

Figure 1


The example in figure 1, show a single table: syn123, that has had three user updates identified by the three transaction IDs (82, 98, & 101).  Notice that transaction 82 has three row version blocks (0, 1, & 2) as it was a large change.  In this example the first version of the table (syn123.1) references transaction 98.  This means version one will include all deltas from 98 and earlier which includes row version blocks 0 though 4.  This also means version one excludes all changes that occurred after transaction 98.  So transaction 101 will not be include (nor will the changes from row version block 5.  However, the second version of the table syn123.2 includes all changes applied to the tableIt will also be possible to reference a table version in a table query using the same pattern:

Code Block
select * from syn123.2 where foo = 'bar'

When a table version is referenced in a query, a table "index" will be built (if it does not already exist) using the transactions from the beginning of the table's history up to and including the transaction referenced by the version.  The query above references version two (syn123.2), so Synapse will run such a query against an index that has been built to include blocks 0 - 5.  Any changes applied to the table after version two will not be reflected in the query results.  So the query results for syn123.2 will not reflect the changes from transaction 105 (blocks 6 & 7).


Table Version History

The same API used to list the version history for FileEntities will be used to list a table's version history:  GET /entity/{id}/version.  However, unlike files that always have at least one version, a table's version history will be empty until a user actively creates the first version.  For tables, the List of VersionInfo objects will not include values for 'contentSize' or 'contentMd5', but all other fields will be provided.


View Freeze

Yet to be determined...


User Defined Primary Keys

Yet to be determined...