Combining Data from Multiple Table Sources
Database normalization (not to be confused with statistical normalization) is a common technique used in data management to limit the amount of redundant data that is stored in a database, which can reduce the risk of introducing errors or inconsistencies in data, and simplify the process of updating data.
When data has been structured in this way, it is considered to be normalized. However, for practical purposes, like presenting a report or running an analysis, data from normalized sources often must be joined together, or denormalized.
If you store normalized data in Synapse tables, views, or datasets, you can combine separate data sources using Materialized Views. A materialized view is a type of Synapse table that is defined using a Synapse SQL statement, which can contain SQL keywords such as JOIN and UNION to combine existing Synapse tables.
Creating a Materialized View
You can create a materialized view from the “Tables” tab of any project in which you have permission to create new objects. From the toolbar of the Tables tab, click “Add New…”, then click “Add Materialized View”.
Enter the name of your new materialized view, and enter a Synapse SQL query that will define the results of your materialized view.
Allowed SQL Keywords
In the defining SQL of a materialized view, you can use any supported Synapse SQL you can use in a query, as well as the following operations:
LEFT JOIN
RIGHT JOIN
INNER JOIN
UNION (column definitions must match)
Like other tables and views, JOINs and UNIONs are not permitted in a query on a materialized view.
Example: Creating and Joining Denormalized Datasets
Suppose we wanted to augment the classic Iris dataset with the iris species' common name. Naïvely, we could just add another column to the dataset:
sepallength | sepalwidth | petallength | petalwidth | species | commonName |
---|---|---|---|---|---|
5.1 | 3.5 | 1.4 | 0.2 | Setosa | bristle-pointed iris |
4.9 | 3.0 | 1.4 | 0.2 | Setosa | bristle-pointed iris |
7.0 | 3.2 | 4.7 | 1.4 | Versicolor | blue flag |
6.4 | 3.2 | 4.5 | 1.5 | Versicolor | blue flag |
6.3 | 3.3 | 6.0 | 2.5 | Virginica | Virginia blueflag |
5.8 | 2.7 | 5.1 | 1.9 | Virginica | Virginia blueflag |
However, this data is not normalized. Instead, we could create a table that maps the unique species name to its common name:
species | commonName |
---|---|
Setosa | bristle-pointed iris |
Versicolor | blue flag |
Virginica | Virginia blueflag |
If we create two separate Synapse tables for the Iris dataset (syn51941633) and the mapping between species and common name (syn51941637), we can create a new materialized view with the following defining SQL:
SELECT
iris.sepallength as sepalLength,
iris.sepalwidth as sepalWidth,
iris.petallength as petalLength,
iris.petalwidth as petalWidth,
iris.species as species,
nameMap.commonName as commonName
FROM
syn51941633 iris
LEFT JOIN
syn51941635 nameMap
ON
iris.species = nameMap.species
Synapse will build the materialized view containing the following sample data, which can be further queried and treated like any other table or view:
sepalLength | sepalWidth | petalLength | petalWidth | species | commonName |
---|---|---|---|---|---|
5.1 | 3.5 | 1.4 | 0.2 | Setosa | bristle-pointed iris |
4.9 | 3.0 | 1.4 | 0.2 | Setosa | bristle-pointed iris |
7.0 | 3.2 | 4.7 | 1.4 | Versicolor | blue flag |
6.4 | 3.2 | 4.5 | 1.5 | Versicolor | blue flag |
6.3 | 3.3 | 6.0 | 2.5 | Virginica | Virginia blueflag |
5.8 | 2.7 | 5.1 | 1.9 | Virginica | Virginia blueflag |
Querying Materialized Views
Materialized views can be queried like any other table or view in Synapse. For more information, see Querying Tables, Views, and Datasets .
Versioning Materialized Views
Unlike tables and file views, you cannot create a snapshot of a materialized view. When the referenced tables and views are updated, the query results of your materialized view may change.
One technique that can be used to ensure your materialized view results do not change unexpectedly is to reference snapshot versions of sources in the materialized view’s defining SQL. See Versioning Tables, Views, and Datasets for guidance on versioning source tables.
Permissions on Materialized Views
Any Synapse user with the view permission on a materialized view can query it. However, the results of a materialized view may vary depending on permissions in source tables and views referenced in the materialized view’s defining SQL. For more information about permissions in Synapse, see Sharing Settings, Permissions, and Conditions for Use .
To query a materialized view, a Synapse user must have “download” permission on all source tables (if any exist). Additionally, each row in a materialized view that is derived from a row in a view is only visible to Synapse users that have view permission on the corresponding object, such as a file. Rows that are derived from multiple objects in one or more views are only visible if the user has view permission on all referenced objects.