Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

It is important to note that our materialized view cannot be directly modified. If we want to change the age of patient y, we would first need to update the patient table which would then rebuild the materialized view.

Direct View vs Materialized View

Since the release of views in Synapse the primary concern from our users has been query performance. Users expect sub-second query performance since these queries drive many UI interactions. Poor query performance can make a portal sluggish. On the other hand, data producers are tolerant of slow view creation times. They are also willing to accept eventual consistency if it yields faster query performance. This trade-off lends itself to caching, or materialization. All existing views in Synapse today are materialized.

User Requirements

  1. From the data consumer’s perspective a materialized view would behave exactly like any other table or view in Synapse. For example, faceted navigation of materialized view would behave exactly like faceted navigation in a table.

  2. The data consumer should not need to understand that the data in a materialized view was derived from one or more tables/views.

  3. The data producer would be expected to construct materialized view on behalf of their data consumers.

  4. The data producer would be expected to have a basic understanding of database normalization and the SQL join syntax.

  5. When the source data of a materialized view changes (example a change occurs in the patients table), the materialized view should eventually get update automatically. Specifically, no users action should be required to force the materialized view to re-build upon source changes.

  6. Materialized view sources can be either views, tables, or even other materialized views.

  7. Materialized view are defined by SQL only. The SQL can include any supported SQL including joins, where clause, group by clause, and select functions.

  8. All authorization checks on the source data must extends to materialized view.

    1. Tables - If a materialized view has a source table, the user must have the DOWNLOAD permission on the source table to see any rows in the materialized view. All or none.

    2. View - The user must have READ on each object from a view. Row level filtering.

...

  1. When a materialized view is first created, the source tables/views might not be available on the a given shard. This means we would need to build an index for all source table/views on a given shard before, building a materialized view.

  2. When a change occurs in a source table/view, the change will first need to be applied to the source index on the destination shard before the change can propagate to the destination materialized view. The amount of time for this propagation is unbounded.

  3. In order to meet sub-second query times the materialized view index must remain available for query even during unbounded change propagation.

  4. Views queries currently have a equivalent transaction isolation level of Read Committed. While tables currently have a transaction isolation level of Serializable. Since a materialized views can have either a table or a view as a source, the highest possible materialized view isolation level would be Read Committed.

  5. Joining two tables without specifying an on clause will produce a Cartesian product. The Cartesian product can be large for even small tables. In addition it is unlikely that our users will intentionally want to create the Cartesian product. In order to prevent a potentially expensive mistake we reject any SQL JOIN that does not include an on clause.

...