Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

  • FILE_VIEW- This represents a Synapse file view that contains all of the files within the scope of the project. Each row represents a single file. The metadata about each file is captured in the columns of this table. These files represent the entire pool of files that the end user can draw from.

  • PARTICIPANTS - This represents a Synapse table that contains one row per participant. The metadata about each participant is captured in the columns of this table. These participants represent the entire pool of candidate participants that the end user can draw from.

  • FILE_TO_PART - This is a Synapse table that maps files to participant. In the real data a single participant might be mapped to thousands of files, and vise versa.

  • MATERIAL - This table represents a materialized view that is the de-normalized join of the proceeding three tables. It includes files that might not have any participants and vise versa. In the real world this table would be automatically index on all relevant columns. Note: A worst case for this table would be where each file is associated with each participant thereby resulting in a Cartesian product.

...

While the MATERIAL table contains all of the data, it is not super user-friendly. To make it user friendly, we create improve usability, the data is exposed as two separate perspectives of the data:

  • Participants-with-aggregated-files - The results of this query is such that each row represents a single participant, including columns for the participant’s metadata. This table The results also has include additional columns that contain file aggregation data for each participant.

  • Files-with-aggregated-participants - The results of this query is such that each row represents a single file, including columns for the file’s metadata. This table The results also has include additional columns that contain participant aggregation data for each file.

...

Extending our existing user story, the user decides they only want to include files that include all four of their selected participants, so they add the filter (see: line 14):

Code Block
WITH
	F2P AS (
		SELECT 
			FILE_ID,
			MAX(FILE_NAME) AS FILE_NAME,
			MAX(FILE_TYPE) AS FILE_TYPE,
            MAX(FILE_SIZE) AS FILE_SIZE,
			COUNT(PART_ID) AS PART_COUNT,
			SUM(CASE STAGE WHEN 'one' THEN 1 ELSE 0 END) AS STAGE_ONE_COUNT,
			SUM(CASE STAGE WHEN 'two' THEN 1 ELSE 0 END) AS STAGE_TWO_COUNT,
			GROUP_CONCAT(DISTINCT PART_ID) AS PART_IDS FROM MATERIAL
				WHERE FILE_ID IS NOT NULL AND PART_ID IN (5,6,7,8) GROUP BY FILE_ID
    )
SELECT * FROM F2P WHERE PART_COUNT >= 4;

...

Row-level Filtering

Anytime a solution includes runtime query result can include files, we need to ensure that the system automatically filters out files that the caller does not have permission to read.

Note: The current query system does not filter files that the user does not have permission to download, or unmet access requirements, by design.

In our example FILE_This is something we must consider if we want to extend runtime queries to support CTE.

In our example FILE_VIEW table we include a column called BEN_ID. It represents the ID of the Project/Folder that controls access to a file via its access control list. It is called a benefactor ID because it defines where a file inherits its permissions from. This column simulate how Synapse treats a real file view. You might also notice that this column was include in the materialized view (MATERIAL). This also simulates how Synapse treats materialized views that include data from file views.

...

The BEN_ID column is then used at query time to automatically filter out rows that the caller cannot see using the following two steps:

...

. This filtering is done in three steps.

In step 1 we extract the FROM and WHERE from the user

...

We run an authorization check to the the sub-set of BEN_IDs that the caller has the read permission on.

...

Finally, the user’s original query is modified to include an unconditional IN clause that limits the results to only benefactors the caller can see.

If we apply this concept to the our example scenario above, we detect that the MATERIAL has two distinct benefactor ID by automatically converting the user’s input query as followsprovided query and use them in combination with a distinct SELECT clause. For this example the results would look like:

Code Block
		SELECT 
			DISTINCT FILE_BEN_ID FROM MATERIAL
				WHERE FILE_ID IS NOT NULL AND PART_ID IN (5,6,7,8);

In this step we must discard the original CTE, select, and group by provided by the user’s query and replace the select with the DISTINCT FILE_BEN_ID. The result of this query returns two rows: 111 and 222.

We then do an authorization check on the user and determine they can This query will return two values: 111, and 222. This tells us the full set of possible benefactors that this query could return. These values are then used in step 2 which is an authorization check to determine the sub-set that they the user is allowed to see. For our example, lets say the caller is allowed to see 111 but not 222. We are then ready to re-write their input SQL as follows

This information is then used in the final step (3), where we rewrite the user’s query to unconditionally limit to rows that the user is authorized to see:

Code Block
WITH
	F2P AS (
		SELECT 
			FILE_ID,
			MAX(FILE_NAME) AS FILE_NAME,
			MAX(FILE_TYPE) AS FILE_TYPE,
            MAX(FILE_SIZE) AS FILE_SIZE,
			COUNT(PART_ID) AS PART_COUNT,
			SUM(CASE STAGE WHEN 'one' THEN 1 ELSE 0 END) AS STAGE_ONE_COUNT,
			SUM(CASE STAGE WHEN 'two' THEN 1 ELSE 0 END) AS STAGE_TWO_COUNT,
			GROUP_CONCAT(DISTINCT PART_ID) AS PART_IDS FROM MATERIAL
				WHERE (FILE_ID IS NOT NULL AND PART_ID IN (5,6,7,8)) AND FILE_BEN_ID IN (111) GROUP BY FILE_ID
    )
SELECT * FROM F2P;

Note: At line 12, the user’s original where clause is unconditionally in parenthesis and combined withAND :FILE_BEN_ID IN (111). This ensures that the caller will never see file rows that they are unauthorized to see.We added this section to show how the adding a CTE query filter requires that we also plan for how to apply the correct runtime row-level filtering

In summary, if we want to support CTE in runtime queries we will need to extend the authorization sub-system to extract and rebuild the inner query, while leaving outer query intact.

New Features

In order to make the above use cases work using the provide example queries we are going to need to add several new features to the Synapse query system.

...