...
Technically, the aggregate statistics for the PART_STAGE are gathered in two separate columns using the following in the SQL select:
...
These statistics seem similar to aggregate-enumeration. Should they really be two separate types?

Now that we understand exactly what we want in our files-with-aggregated-participants perspective, how do we tell Synapse what we want? We will tackle this problem in the next section.
Filtering Before Aggregation
So far, we have only discussed applying filters to aggregated columns. Such filters must be applied after the aggregation results are calculated. However, the main use case requires some runtime filtering to occur before aggregation is applied.
For example, a user might start with the participants-with-aggregated-files perspective and narrow down their selection of participant IDs to: 2, 4, 5, & 7.
They will then want to apply their selection of participant IDs as a filter to the files-with-aggregated-participants perspective. This filter must change the aggregated participant data, which means the filter must be applied before aggregation occurs.
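To make the ordering concrete, here is a rough MySQL-flavored sketch of such a query. The FILE and FILE_TO_PART tables and their columns are assumptions for illustration, not the actual schema; the key point is that the participant-ID filter sits in the WHERE clause and is therefore applied before the GROUP BY that produces the aggregates:

```sql
-- Hypothetical schema: FILE(FILE_ID, FILE_NAME, FILE_TYPE, FILE_SIZE) and
-- FILE_TO_PART(FILE_ID, PART_ID, PART_STATE, PART_AGE) linking files to participants.
SELECT
    F.FILE_ID,
    F.FILE_NAME,
    F.FILE_TYPE,
    F.FILE_SIZE,
    COUNT(P.PART_ID) AS PART_COUNT,
    -- Per-state counts gathered into a single JSON column.
    JSON_OBJECT(
        'one', SUM(CASE WHEN P.PART_STATE = 'one' THEN 1 ELSE 0 END),
        'two', SUM(CASE WHEN P.PART_STATE = 'two' THEN 1 ELSE 0 END)
    ) AS PART_STATE,
    -- Min/max age statistics gathered into a single JSON column.
    JSON_OBJECT('max', MAX(P.PART_AGE), 'min', MIN(P.PART_AGE)) AS PART_AGE,
    GROUP_CONCAT(P.PART_ID ORDER BY P.PART_ID) AS PART_ID
FROM FILE F
    JOIN FILE_TO_PART P ON F.FILE_ID = P.FILE_ID
-- The user's runtime selection is applied here, before aggregation.
WHERE P.PART_ID IN (2, 4, 5, 7)
GROUP BY F.FILE_ID, F.FILE_NAME, F.FILE_TYPE, F.FILE_SIZE;
```

Because the WHERE clause removes the non-selected participants first, every aggregate in the result (PART_COUNT, PART_STATE, PART_AGE) reflects only participants 2, 4, 5, and 7.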
Here is what the results look like with this filter applied:
FILE_ID | FILE_NAME | FILE_TYPE | FILE_SIZE | PART_COUNT | PART_STATE | PART_AGE | PART_ID |
---|---|---|---|---|---|---|---|
1 | f1 | raw | 100 | 4 | {"one": 2, "two": 2} | {"max": 40, "min": 10} | 2,4,5,7 |
2 | f2 | raw | 200 | 2 | {"one": 2, "two": 0} | {"max": 40, "min": 20} | 2,4 |
3 | f3 | raw | 300 | 2 | {"one": 0, "two": 2} | {"max": 30, "min": 10} | 5,7 |
4 | f4 | raw | 400 | 2 | {"one": 2, "two": 0} | {"max": 40, "min": 20} | 2,4 |
5 | f5 | proc | 100 | 2 | {"one": 0, "two": 2} | {"max": 30, "min": 10} | 5,7 |
6 | f6 | proc | 200 | 2 | {"one": 1, "two": 1} | {"max": 20, "min": 10} | 2,5 |
7 | f7 | proc | 300 | 2 | {"one": 1, "two": 1} | {"max": 40, "min": 30} | 4,7 |
8 | f8 | proc | 400 | 4 | {"one": 2, "two": 2} | {"max": 40, "min": 10} | 2,4,5,7 |
Notice that the previous unfiltered results for file 1 had a PART_COUNT=8, while the filtered results have a PART_COUNT=4.
If the aggregation results did not need to change at runtime, then we could simply use a materialized view as a solution to the entire problem. For example, we could pre-build an MV with millions of rows of “static” aggregate data. End users could query this “static” data at runtime without any problems.
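Purely as an illustration of the static case, the pre-build could be as simple as materializing the unfiltered aggregation once into a real table (reusing the hypothetical schema from the sketch above), so that end-user queries never pay for the GROUP BY:

```sql
-- One-time build: run the expensive aggregation once and store the results.
-- FILE_PART_AGGREGATES is a hypothetical name for the materialized result.
CREATE TABLE FILE_PART_AGGREGATES AS
SELECT
    F.FILE_ID,
    F.FILE_TYPE,
    COUNT(P.PART_ID) AS PART_COUNT,
    JSON_OBJECT('max', MAX(P.PART_AGE), 'min', MIN(P.PART_AGE)) AS PART_AGE
FROM FILE F
    JOIN FILE_TO_PART P ON F.FILE_ID = P.FILE_ID
GROUP BY F.FILE_ID, F.FILE_TYPE;

-- Runtime: users read the pre-built rows directly; no aggregation happens per query.
SELECT * FROM FILE_PART_AGGREGATES WHERE FILE_TYPE = 'raw';
```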
On the other hand, if a user’s runtime selections can change the aggregation results, then pre-building static results will not work. We also cannot rebuild a million-row MV each time a user makes a runtime selection.
This means we need a solution that performs the aggregation at runtime in response to the user’s selections.
Aggregate View
We are proposing adding a new entity type called AggregateView. Like a MaterializedView, an AggregateView would be configured by setting its defining SQL. However, that is where the similarities end. An AggregateView is not a materialization, which means we do not create an actual table in the database for this view. Instead, an AggregateView provides a simple, table-like layer of abstraction over a complex aggregation query. To better understand this concept, let's show how we would use an AggregateView to create the files-to-participant perspective.
...