...
patient id | patient age |
---|---|
x | 23 |
y | 65 |
z | 85 |
Part of normalizing our Files and Patients tables is to guarantee that they do not contain duplicate rows. This is typically achieved by defining a primary key. All FileViews automatically define the syn<ID> of each file (or object) as the primary key. However, currently there is no way to set a primary key for a Synapse table. In order to guarantee that we cannot have duplicate patient rows, we would want to define the primary key of our Patients table as the patient id column.
Now that our data is in two separate tables: Files and Patients, we no longer have data duplication. If we need to update the patient y’s age, there is only one place where we need to make the change. However, this creates a new problem. How to find all files with patients over the age of 60? The answer is a relational join.
...
When a materialized view is first created, the source tables/views might not be available on the a given shard. This means we would need to build an index for all source table/views on a given shard before, building a materialized view.
When a change occurs in a source table/view, the change will first need to be applied to the source index on the destination shard before the change can propagate to the destination materialized view. The amount of time for this propagation is unbounded.
In order to meet sub-second query times the materialized view index must remain available for query even during unbounded change propagation.
Views queries currently have a equivalent transaction isolation level of Read Committed. While tables currently have a transaction isolation level of Serializable. Since a materialized views can have either a table or a view as a source, the highest possible materialized view isolation level would be Read Committed.
Joining two tables without specifying an on clause will produce a Cartesian product. The Cartesian product can be large for even small tables. In addition it is unlikely that our users will intentionally want to create the Cartesian product. In order to prevent a potentially expensive mistake we reject any SQL JOIN that does not include an on clause.
API Design
MaterializedViews
The API design is fairly simple, and only involves the creation of a new Entity type that extends the View interface called MaterializedView:
...
File ID | name | patient id | patient age |
---|---|---|---|
syn456 | other.csv | y | 65 |
syn890 | more.csv | z | 85 |
syn333 | less.csv | y | 65 |
Primary Keys/Uniqueness
A uniqueness constraint on a single column would ensure that all values in that column are unique. A uniqueness constraint on multiple columns ensure that the combined values of those columns are unique. A single table can have more than one uniqueness constraint. A primary key is just a specific type of uniqueness constraint with the restriction that there can be only one primary key per table. Therefore we are proposing supporting the more generic uniqueness constraint.
The following object model defines a uniqueness constraint:
UniquenessConstraint
type | field | description |
---|---|---|
String | name | A user friendly name to show when a uniqueness constraint has been violated. |
List<Long> | columnIds | A list of one or more columnIDs that define the uniqueness constraint. |
We will extend the TableEntity object mode to include: List<UniquenessConstraint> uniquenessConstraints;
Does it make sense to add Uniqueness Constraints to views?