Limit Facet Statistics Query Results

Today, when a client runs a query against a table or view with facets, the caller can request that the query results include the statistics for all faceted columns by setting the following in the QueryBundleRequest:

{ ... partMask=0x20 ... }
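As a minimal sketch of how a client might build such a mask, the following combines the bit values listed in the partMask description of QueryBundleRequest. The helper names, the table ID syn123, and the "sql" field of the query are illustrative assumptions, not part of this proposal.

```python
# Bit values copied from the partMask description in QueryBundleRequest.
QUERY_RESULTS = 0x1
QUERY_COUNT = 0x2
FACET_STATISTICS = 0x20


def build_part_mask(*parts: int) -> int:
    """OR the requested part flags into a single integer mask."""
    mask = 0
    for part in parts:
        mask |= part
    return mask


def wants_facet_statistics(part_mask: int) -> bool:
    """Return True when the facet statistics bit (0x20) is set."""
    return (part_mask & FACET_STATISTICS) != 0


# Hypothetical request body: "syn123" and the "sql" field are placeholders.
request = {
    "query": {"sql": "SELECT * FROM syn123"},
    "partMask": build_part_mask(QUERY_RESULTS, QUERY_COUNT, FACET_STATISTICS),
}
```

With the current API, setting the 0x20 bit is the only control the caller has: statistics are returned either for every faceted column or for none.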

This works well when the client uses all of the facet statistics. However, for many of the portals, the UI shows only a subset of the facet statistics to the end user. For example, consider the following view from https://eliteportal.synapse.org:

EL-portal-example.png

Notice that this view has fourteen faceted columns (Age, Consortium, Diagnosis, Project, Resource Type, Sex, Species, Study, Analysis Type, Data Type, Family Study Participant, File Format, Is Model System, Is Multi-Specimen). However, the UI only shows statistics for the four selected facets: Age, Study, Data Type, and Species. This means the query service spends time gathering statistics for ten facets that are not even shown to the user.

With the current query API, it is only possible to request all or none of the facet statistics. Ideally, the query API would give clients a way to specify which facet statistics they actually intend to use. This document proposes a simple extension to the query request that limits which statistics are returned.

New property (includeFacetStatsFor) to be added to QueryBundleRequest:

{
    "properties": {
        "query": {
            "$ref": "org.sagebionetworks.repo.model.table.Query"
        },
        "partMask": {
            "type": "integer",
            "description": "Optional, default all. The 'partsMask' is an integer mask that can be combined into to request any desired part. The mask is defined as follows:<ul><li>Query Results <i>(queryResults)</i> = 0x1</li><li>Query Count <i>(queryCount)</i> = 0x2</li><li>Select Columns <i>(selectColumns)</i> = 0x4</li><li>Max Rows Per Page <i>(maxRowsPerPage)</i> = 0x8</li><li>The Table Columns <i>(columnModels)</i> = 0x10</li><li>Facet statistics for each faceted column <i>(facetStatistics)</i> = 0x20</li><li>The sum of the file sizes <i>(sumFileSizesBytes)</i> = 0x40</li><li>The last updated on date <i>(lastUpdatedOn)</i> = 0x80</li><li>The combined SQL query including additional filters <i>(combinedSql)</i> = 0x100</li><li>The list of actions required for any file in the query<i>(actionsRequired)</i> = 0x200 (The query.selectFileColumn needs to be specified)</li></ul>"
        },
        "includeFacetStatsFor": {
            "description": "Optional. When partMask is set to include facet statistics (0x20), use this to limit which facet columns should return facet statistics. Each value should be the name of a facet column for which the facet statistics should be returned in the query results. When empty, or excluded, all faceted columns statistics will be returned (when partMask=0x20).",
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "columnName": {
                        "type": "string",
                        "description": "The column name of the facet.",
                        "required": "true"
                    },
                    "jsonPath": {
                        "type": "string",
                        "description": "The optional JSON Path, which should be provided for JSON column facets",
                        "required": "false"
                    }
                }
            }
        }
    }
}
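To illustrate the intended semantics, here is a rough sketch of a request that uses includeFacetStatsFor for the portal example above, together with one possible way a service could decide which facets to compute. The function name, the table ID syn123, the "sql" field, and the use of display names as column names are assumptions for illustration only. For brevity, the sketch matches on columnName and ignores jsonPath.

```python
FACET_STATISTICS = 0x20  # facetStatistics bit from the partMask definition

# The fourteen faceted columns from the portal example (display names;
# the real column names in the view may differ).
ALL_FACETS = [
    "Age", "Consortium", "Diagnosis", "Project", "Resource Type", "Sex",
    "Species", "Study", "Analysis Type", "Data Type",
    "Family Study Participant", "File Format", "Is Model System",
    "Is Multi-Specimen",
]


def facet_columns_to_compute(all_facets, include_facet_stats_for=None):
    """Pick the facet columns whose statistics should be computed.

    An empty or missing filter keeps the current behaviour and returns
    every faceted column. This sketch ignores jsonPath.
    """
    if not include_facet_stats_for:
        return list(all_facets)
    wanted = {entry["columnName"] for entry in include_facet_stats_for}
    return [name for name in all_facets if name in wanted]


# Hypothetical request body: "syn123" and the "sql" field are placeholders.
request = {
    "query": {"sql": "SELECT * FROM syn123"},
    "partMask": 0x1 | FACET_STATISTICS,
    "includeFacetStatsFor": [
        {"columnName": "Age"},
        {"columnName": "Study"},
        {"columnName": "Data Type"},
        {"columnName": "Species"},
    ],
}

selected = facet_columns_to_compute(ALL_FACETS, request["includeFacetStatsFor"])
```

Under these assumptions, only four of the fourteen facets are computed for this request, while a request that omits includeFacetStatsFor (or passes an empty list) behaves exactly as it does today.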