Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Technically, the aggregate statics statistics for the PART_STAGE are gathered in two separate columns using the following in the the SQL select:

...

These statistics seem similar to aggregate-enumeration. Should they really be two separate types?Now that we understand exactly what we want in our files

Filtering Before Aggregation

So far, we have only discussed applying filters to aggregated columns. Such filter must be applied after the aggregation results are calculated. However, the main use case requires some runtime filtering to occur before aggregation is applied.

For example, a user might start with the participants-with-aggregated-participants perspective, how do we tell Synapse what we want? We will tackle this problem in the next sectionfiles perspective and narrow down their selection of participant IDs to: 2, 4, 5, & 7.

They then will want to apply their selection of participant IDs as a filter to files-with-aggregated-participants perspective. This filter must change the aggregated participant data. This means the filter must be applied before aggregation occurs.

Here is what the results look like with this filter applied:

Expand
titleExpand to see the sql that generated these results:
Code Block
WITH
	F2P AS (
		SELECT 
			FILE_ID,
			FILE_NAME,
			FILE_TYPE,
            FILE_SIZE,
			COUNT(PART_ID) AS PART_COUNT,
			SUM(CASE STAGE WHEN 'one' THEN 1 ELSE 0 END) AS STAGE_ONE_COUNT,
			SUM(CASE STAGE WHEN 'two' THEN 1 ELSE 0 END) AS STAGE_TWO_COUNT,
            MAX(AGE) AS MAX_AGE,
            MIN(AGE) AS MIN_AGE,
			GROUP_CONCAT(DISTINCT PART_ID) AS PART_IDS FROM MATERIAL
				WHERE FILE_ID IS NOT NULL 
				AND PART_ID IN(2, 4, 5, 7)
                GROUP BY FILE_ID, FILE_NAME, FILE_TYPE, FILE_SIZE
    ),
    AGG AS (
		SELECT FILE_ID, FILE_NAME, FILE_TYPE, FILE_SIZE, PART_COUNT,
			JSON_OBJECT('one', STAGE_ONE_COUNT, 'two', STAGE_TWO_COUNT) as PART_STAGE,
            JSON_OBJECT('min', MIN_AGE, 'max', MAX_AGE) as PART_AGE,
            PART_IDS
				FROM F2P
    )
SELECT * FROM AGG;

FILE_ID

FILE_NAME

FILE_TYPE

FILE_SIZE

PART_COUNT

PART_STATE

PART_AGE

PART_ID

1

f1

raw

100

4

{"one": 2, "two": 2}

{"max": 40, "min": 10}

2,4,5,7

2

f2

raw

200

2

{"one": 2, "two": 0}

{"max": 40, "min": 20}

2,4

3

f3

raw

300

2

{"one": 0, "two": 2}

{"max": 30, "min": 10}

5,7

4

f3

raw

400

2

{"one": 2, "two": 0}

{"max": 40, "min": 20}

2,4

5

f5

proc

100

2

{"one": 0, "two": 2}

{"max": 30, "min": 10}

5,7

6

f6

proc

200

2

{"one": 1, "two": 1}

{"max": 20, "min": 10}

2,5

7

f7

proc

300

2

{"one": 1, "two": 1}

{"max": 40, "min": 30}

4,7

8

f8

proc

400

4

{"one": 2, "two": 2}

{"max": 40, "min": 10}

2,4,5,7

Notice, the previous unfiltered results for file:1 had a PART_COUNT=8, while the filtered results have a PART_COUNT=4.

If the aggregation results did not need to change at runtime, then we could simply use a materialized view as a solution to the entire problem. For example, we could pre-build a MV with millions rows of “static” aggregate data. End users could query this “static” data at runtime without any problems.

On the other hand, if user’s runtime selections can change the aggregation results, then pre-building static results will now work. We also cannot rebuild a million row MV each time a user makes a runtime selection.

This means we need a solution that will perform the aggregation at runtime in response to user’s selections.

Aggregate View

We are proposing adding a new entity Type called AggregateView. Like a MaterializedView, an AggregateView would be configured by setting its defining SQL. However, that is where the similarities end. An AggregateView is not a materialization. This means that we do not create an actual table in the database for this view. Instead, an AggregateView provides a simple, table like, layer of of abstraction over a complex aggregation query. To better understand this concept, lets show how we would use an AggregateView to create the files-to-participant-perspective.

...