Skip to end of banner
Go to start of banner

Combining Data from Multiple Table Sources

Skip to end of metadata
Go to start of metadata

You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 3 Next »

Database normalization (not to be confused with statistical normalization) is a common technique used in data management to limit the amount of redundant data that is stored in a database, which can reduce the risk of introducing errors or inconsistencies in data, and simplify the process of updating data.

When data has been structured in this way, it is considered to be normalized. However, for practical purposes, like presenting a report or running an analysis, data from normalized sources often must be joined together, or denormalized.

If you store normalized data in Synapse tables, views, or datasets, you can combine separate data sources using Materialized Views. A materialized view is a type of Synapse table that is defined using a Synapse SQL statement, which can contain SQL keywords such as JOIN and UNION to combine existing Synapse tables.

Creating a Materialized View

You can create a materialized view from the “Tables” tab of any project in which you have permission to create new objects. From the toolbar of the Tables tab, click “Add New…”, then click “Add Materialized View”.

Enter the name of your new materialized view, and enter a Synapse SQL query that will define the results of your materialized view.

Allowed SQL Keywords

In the defining SQL of a materialized view, you can use any supported Synapse SQL you can use in a query, as well as the following operations:

  • LEFT JOIN

  • RIGHT JOIN

  • INNER JOIN

  • UNION (column definitions must match)

Like other tables and views, JOINs and UNIONs are not permitted in a query on a materialized view.

Example: Creating and Joining Denormalized Datasets

Suppose we wanted to augment the classic Iris dataset with the iris species' common name. Naïvely, we could just add another column to the dataset:

sepallength

sepalwidth

petallength

petalwidth

species

Common Name

5.1

3.5

1.4

0.2

Setosa

bristle-pointed iris

4.9

3.0

1.4

0.2

Setosa

bristle-pointed iris

7.0

3.2

4.7

1.4

Versicolor

blue flag

6.4

3.2

4.5

1.5

Versicolor

blue flag

6.3

3.3

6.0

2.5

Virginica

Virginia blueflag

5.8

2.7

5.1

1.9

Virginica

Virginia blueflag

However, this data is not normalized. Instead, we could create a table that maps the unique species name to its common name:

species

commonName

Setosa

bristle-pointed iris

Versicolor

blue flag

Virginica

Virginia blueflag

If we create two separate Synapse tables for the Iris dataset (syn51941633) and the mapping between species and column name (syn51941637), we can create a new materialized view with the following defining SQL:

SELECT 
  iris.sepallength as sepalLength,
  iris.sepalwidth as sepalWidth,
  iris.petallength as petalLength,
  iris.petalwidth as petalWidth,
  iris.species as species,
  nameMap.commonName as commonName
FROM 
  syn51941633 iris
LEFT JOIN 
  syn51941635 nameMap
ON
  iris.species = nameMap.species

Synapse will build the materialized view containing the following sample data, which can be further queried and treated like any other table or view:

sepalLength

sepalWidth

petalLength

petalWidth

species

commonName

5.1

3.5

1.4

0.2

Setosa

bristle-pointed iris

4.9

3.0

1.4

0.2

Setosa

bristle-pointed iris

7.0

3.2

4.7

1.4

Versicolor

blue flag

6.4

3.2

4.5

1.5

Versicolor

blue flag

6.3

3.3

6.0

2.5

Virginica

Virginia blueflag

5.8

2.7

5.1

1.9

Virginica

Virginia blueflag

Querying Materialized Views

Materialized views can be queried like any other table or view in Synapse. For more information, see Querying Tables, Views, and Datasets .

Versioning Materialized Views

Unlike tables and file views, you cannot create a snapshot of a materialized view. When the referenced tables and views are updated, the query results of your materialized view may change.

One technique that can be used to ensure your materialized view results do not change unexpectedly is to reference snapshot versions of sources in the materialized view’s defining SQL. See Versioning Tables, Views, and Datasets for guidance on versioning source tables.

Permissions on Materialized Views

Any Synapse user with the view permission on a materialized view can query it. However, the results of a materialized view may vary depending on permissions in source tables and views referenced in the materialized view’s defining SQL. For more information about permissions in Synapse, see Sharing Settings, Permissions, and Conditions for Use .

To query a materialized view, a Synapse user must have “download” permission on all source tables (if any exist). Additionally, each row in a materialized view that is derived from a row in a view is only visible to Synapse users that have view permission on the corresponding object, such as a file. Rows that are derived from multiple objects in one or more views are only visible if the user has view permission on all referenced objects.

  • No labels