...
Technically, the aggregate statistics for the PART_STAGE are gathered in two separate columns using the following in the SQL select:
...
These statistics seem similar to aggregate-enumeration. Should they really be two separate types?

Now that we understand exactly what we want in our files-with-aggregated-participants perspective, how do we tell Synapse what we want? We will tackle this problem in the next section.
Filtering Before Aggregation
So far, we have only discussed applying filters to aggregated columns. Such filters must be applied after the aggregation results are calculated. However, the main use case requires some runtime filtering to occur before aggregation is applied.
For example, a user might start with the participants-with-aggregated-files perspective and narrow down their selection of participant IDs to: 2, 4, 5, & 7.
They will then want to apply their selection of participant IDs as a filter to the files-with-aggregated-participants perspective. This filter must change the aggregated participant data, which means the filter must be applied before aggregation occurs.
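To make the ordering concrete, here is a rough MySQL-flavored sketch of such a query. The FILE and FILE_TO_PART tables and their columns are assumptions for illustration, not the actual schema; the key point is that the participant-ID filter sits in the WHERE clause and is therefore applied before the GROUP BY that produces the aggregates:

```sql
-- Hypothetical schema: FILE(FILE_ID, FILE_NAME, FILE_TYPE, FILE_SIZE) and
-- FILE_TO_PART(FILE_ID, PART_ID, PART_STATE, PART_AGE) linking files to participants.
SELECT
    F.FILE_ID,
    F.FILE_NAME,
    F.FILE_TYPE,
    F.FILE_SIZE,
    COUNT(P.PART_ID) AS PART_COUNT,
    -- Per-state counts gathered into a single JSON column.
    JSON_OBJECT(
        'one', SUM(CASE WHEN P.PART_STATE = 'one' THEN 1 ELSE 0 END),
        'two', SUM(CASE WHEN P.PART_STATE = 'two' THEN 1 ELSE 0 END)
    ) AS PART_STATE,
    -- Min/max age statistics gathered into a single JSON column.
    JSON_OBJECT('max', MAX(P.PART_AGE), 'min', MIN(P.PART_AGE)) AS PART_AGE,
    GROUP_CONCAT(P.PART_ID ORDER BY P.PART_ID) AS PART_ID
FROM FILE F
    JOIN FILE_TO_PART P ON F.FILE_ID = P.FILE_ID
-- The user's runtime selection is applied here, before aggregation.
WHERE P.PART_ID IN (2, 4, 5, 7)
GROUP BY F.FILE_ID, F.FILE_NAME, F.FILE_TYPE, F.FILE_SIZE;
```

Because the WHERE clause removes the non-selected participants first, every aggregate in the result (PART_COUNT, PART_STATE, PART_AGE) reflects only participants 2, 4, 5, and 7.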
Here is what the results look like with this filter applied:
FILE_ID | FILE_NAME | FILE_TYPE | FILE_SIZE | PART_COUNT | PART_STATE | PART_AGE | PART_ID |
---|---|---|---|---|---|---|---|
1 | f1 | raw | 100 | 4 | {"one": 2, "two": 2} | {"max": 40, "min": 10} | 2,4,5,7 |
2 | f2 | raw | 200 | 2 | {"one": 2, "two": 0} | {"max": 40, "min": 20} | 2,4 |
3 | f3 | raw | 300 | 2 | {"one": 0, "two": 2} | {"max": 30, "min": 10} | 5,7 |
4 | f4 | raw | 400 | 2 | {"one": 2, "two": 0} | {"max": 40, "min": 20} | 2,4 |
5 | f5 | proc | 100 | 2 | {"one": 0, "two": 2} | {"max": 30, "min": 10} | 5,7 |
6 | f6 | proc | 200 | 2 | {"one": 1, "two": 1} | {"max": 20, "min": 10} | 2,5 |
7 | f7 | proc | 300 | 2 | {"one": 1, "two": 1} | {"max": 40, "min": 30} | 4,7 |
8 | f8 | proc | 400 | 4 | {"one": 2, "two": 2} | {"max": 40, "min": 10} | 2,4,5,7 |
Notice that the previous unfiltered results for file 1 had a PART_COUNT=8, while the filtered results have a PART_COUNT=4.
If the aggregation results did not need to change at runtime, then we could simply use a materialized view as a solution to the entire problem. For example, we could pre-build an MV with millions of rows of “static” aggregate data. End users could query this “static” data at runtime without any problems.
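Purely as an illustration of the static case, the pre-build could be as simple as materializing the unfiltered aggregation once into a real table (reusing the hypothetical schema from the sketch above), so that end-user queries never pay for the GROUP BY:

```sql
-- One-time build: run the expensive aggregation once and store the results.
-- FILE_PART_AGGREGATES is a hypothetical name for the materialized result.
CREATE TABLE FILE_PART_AGGREGATES AS
SELECT
    F.FILE_ID,
    F.FILE_TYPE,
    COUNT(P.PART_ID) AS PART_COUNT,
    JSON_OBJECT('max', MAX(P.PART_AGE), 'min', MIN(P.PART_AGE)) AS PART_AGE
FROM FILE F
    JOIN FILE_TO_PART P ON F.FILE_ID = P.FILE_ID
GROUP BY F.FILE_ID, F.FILE_TYPE;

-- Runtime: users read the pre-built rows directly; no aggregation happens per query.
SELECT * FROM FILE_PART_AGGREGATES WHERE FILE_TYPE = 'raw';
```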
On the other hand, if a user’s runtime selections can change the aggregation results, then pre-building static results will not work. We also cannot rebuild a million-row MV each time a user makes a runtime selection.
This means we need a solution that performs the aggregation at runtime in response to the user’s selections.
Aggregate View
We are proposing adding a new entity type called AggregateView. Like a MaterializedView, an AggregateView would be configured by setting its defining SQL. However, that is where the similarities end. An AggregateView is not a materialization, which means we do not create an actual table in the database for this view. Instead, an AggregateView provides a simple, table-like layer of abstraction over a complex aggregation query. To better understand this concept, let's show how we would use an AggregateView to create the files-to-participant perspective.
...