User Behavior Search Metrics & Data Collection (2025)

This document is a research output of SWC-7240 in support of the epic PLFM-8589.

Summary

In 2025, we wish to improve the quality of Synapse search using a metrics-driven approach. We must first gather data to establish metrics that we will track as we make changes to the system.

We will continue to use Google Analytics to collect new metrics related to search queries, search results, and which search results are clicked by users. This data can be used to establish initial metrics for Synapse search, such as Time to First Relevant Result and Query Success Rate. Classifying search results and user behavior is complex and imperfect, so the methodology of these metrics must be carefully scrutinized and can be refined over time.

Goals

  • Decide on a set of metrics that we can begin using to measure user behavior in Synapse search features

  • Determine the set of search features / web pages on which we should capture new events

  • Determine the new Google Analytics events that are needed to compute those metrics and validate that our desired measurements can be computed with such events.

Non-Goals

  • Measuring the precision or recall of search results. Doing so would require classifying production data based on expected outcomes. Additionally, this can be instrumented using just the Search APIs--the UI is irrelevant for these kinds of measurements.

  • Creating or adopting a new analytics/tracking service to measure behavior

  • Comparing performance of LLMs used in Synapse. Due to the nature of LLMs, the scope of functionality is unbounded, making it hard to measure and even harder to aggregate and compare measurements between users.

Background

There are many ways in which we can gather information about the usage of our search API(s). We have decided to track user behavior in the Synapse front-end (rather than usage of the API) for various reasons:

  • The Search API(s) are used in many contexts where the constraints and user expectations are different. For example, the entity search page and the entity finder’s search functionality use the same API, but the use cases and user expectations for these features differ.

  • Certain metrics that are commonly used to benchmark information retrieval services rely on measuring user interactions, such as time spent on a search results page, which would be difficult or impossible to derive from API access records

  • A subset of potential improvements are fundamentally related to user experience in the web UI. We may make changes to the user interface that impact the perceived quality of search without making any changes to the API.

 

[Figure: Entity search page]

[Figure: Search feature in the Entity Finder]

These use the same API, but users are typically trying to accomplish different goals.

Today, we use Google Analytics to capture user metrics on Synapse. Google Analytics provides a dashboard that can be used to query and track the captured events. Only a subset of users provide data to Google Analytics; it will not track users who have disabled tracking cookies or use browser extensions that block trackers.

While we gather many metrics and events today, they are not sufficient to compute or estimate many search metrics. We can easily add new custom events to calculate new metrics.

Metrics & New Custom Events

We are aiming to collect many of the suggested user behavior-based metrics from the reference listed in the Appendix (How to Evaluate Search Engines), such as

  • Time to First Relevant Result

  • Search Abandonment Rate

  • Click Position of First Relevant Result

  • Query Success Rate

  • Mean Reciprocal Rank

We propose to collect new data by tracking the following new events (technical specification below):

  • Upon the submission of a new search query, collect the query terms and context

  • Upon receiving the results of a search query, collect the query terms, context, page number, and total number of results (if available)

  • Upon receiving the results of a search query, for each displayed result, collect the item identifier and rank

  • Upon a user-click of a search result, collect the item identifier and rank

With this information, in conjunction with the information already in Google Analytics, we should be able to estimate some of these measurements.

Many of these metrics rely on marking a result as ‘relevant’. Google Analytics tracks ‘engagement time’, which we can use as a surrogate for ‘relevance’ (e.g. 30 seconds of engagement time after clicking a search result indicates that the result was relevant).
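
As a rough illustration of how these metrics might be computed once click and engagement data have been joined, the sketch below treats a clicked result as relevant when its engagement time is at least 30 seconds. The `Search` and `Click` records, their field names, and the threshold constant are assumptions made for this example; they are not part of the event specification or of Google Analytics.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Assumed cutoff: engagement time (seconds) at which a click counts as relevant.
RELEVANCE_THRESHOLD_S = 30.0


@dataclass
class Click:
    rank: int                    # 1-based position of the clicked result
    seconds_after_query: float   # time from query submission to the click
    engagement_s: float          # engagement time on the clicked item


@dataclass
class Search:
    clicks: List[Click] = field(default_factory=list)


def first_relevant_click(search: Search) -> Optional[Click]:
    """Earliest click whose engagement time meets the threshold, if any."""
    relevant = [c for c in search.clicks if c.engagement_s >= RELEVANCE_THRESHOLD_S]
    return min(relevant, key=lambda c: c.seconds_after_query, default=None)


def summarize(searches: List[Search]) -> Dict[str, Optional[float]]:
    """Compute surrogate-based metrics over a batch of searches."""
    hits = [c for c in map(first_relevant_click, searches) if c is not None]
    total = len(searches)
    return {
        # Share of searches with at least one relevant click.
        "query_success_rate": len(hits) / total if total else None,
        # Searches without a relevant click contribute 0 to the sum.
        "mean_reciprocal_rank": sum(1 / c.rank for c in hits) / total if total else None,
        "mean_time_to_first_relevant_s": (
            sum(c.seconds_after_query for c in hits) / len(hits) if hits else None
        ),
        "mean_click_position": sum(c.rank for c in hits) / len(hits) if hits else None,
    }
```

Keeping the threshold as a single named constant makes it easy to recompute the metrics under a different definition of relevance if 30 seconds turns out to be a poor surrogate.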

Major caveats to this approach include:

  • It is not clear whether Google Analytics' current measurements are the right criteria for us. More research must be done to see if these are appropriate, or if additional data not described in this document must be collected to create effective metrics.

  • Google Analytics limits how data can be collected in custom events. These limits may make it difficult to gather some of the data described here.

  • The Google Analytics UI has limits in terms of how data can be transformed and visualized. As we try to analyze this data, we may find that we must export the data and perform analysis in a different system.

Technical Specification: New Custom Events

This section defines the specific events and event data that we will send to GA. Note the limitations on GA4 custom events.
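
To make these limitations concrete, the sketch below checks a candidate event against a few commonly cited GA4 collection limits. The numeric limits (40-character event and parameter names, 25 parameters per event, 100-character parameter values) reflect our understanding of standard GA4 properties and should be confirmed against Google's current documentation. The function name `check_event` is hypothetical.

```python
from typing import Dict, List, Union

# Assumed GA4 limits for standard properties; verify before relying on them.
MAX_EVENT_NAME_LEN = 40
MAX_PARAMS_PER_EVENT = 25
MAX_PARAM_NAME_LEN = 40
MAX_PARAM_VALUE_LEN = 100

ParamValue = Union[str, int, float]


def check_event(name: str, params: Dict[str, ParamValue]) -> List[str]:
    """Return a list of human-readable limit violations (empty if none)."""
    problems: List[str] = []
    if len(name) > MAX_EVENT_NAME_LEN:
        problems.append(f"event name '{name}' exceeds {MAX_EVENT_NAME_LEN} characters")
    if len(params) > MAX_PARAMS_PER_EVENT:
        problems.append(f"{len(params)} parameters exceeds {MAX_PARAMS_PER_EVENT}")
    for key, value in params.items():
        if len(key) > MAX_PARAM_NAME_LEN:
            problems.append(f"parameter name '{key}' exceeds {MAX_PARAM_NAME_LEN} characters")
        # Only string values are subject to the length check in this sketch.
        if isinstance(value, str) and len(value) > MAX_PARAM_VALUE_LEN:
            problems.append(f"value of '{key}' exceeds {MAX_PARAM_VALUE_LEN} characters")
    return problems
```

A check like this could run in development builds to flag events, such as long serialized queries, that risk being truncated or dropped.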

Shared JSON Schema definitions

Many of these events will track the same data. To simplify our definitions, these model definitions are shared.

SearchContext

Defines a specific place in the app where a search is conducted. This can be extended as additional search pages are added or changed.

{
  "$id": "/models/SearchContext",
  "type": "string",
  "enum": ["synapse_entity", "synapse_people", "synapse_team"]
}

SearchQuery

Captures all of the fields that may be included in a search, formatted and serialized to be compatible with GA4 events.

{
  "$id": "/models/SearchQuery",
  "type": "object",
  "properties": {
    "query_term": { "type": "string" },
    "serialized_boolean_query": { "type": "string" },
    "serialized_range_query": { "type": "string" }
  }
}
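
One possible way to produce these string-only fields is sketched below. The input shapes (a list of query terms, plus lists of filter dictionaries for the boolean and range queries) and the choice of compact JSON as the serialization format are assumptions for illustration; `to_ga_search_query` is a hypothetical helper, not an existing function.

```python
import json
from typing import Any, Dict, List, Optional


def to_ga_search_query(
    query_terms: List[str],
    boolean_query: Optional[List[Dict[str, Any]]] = None,
    range_query: Optional[List[Dict[str, Any]]] = None,
) -> Dict[str, str]:
    """Flatten a search request into the string fields of SearchQuery."""

    def compact(value: Any) -> str:
        # Sorted keys and no whitespace, so equal filters serialize identically.
        return json.dumps(value, sort_keys=True, separators=(",", ":"))

    event: Dict[str, str] = {"query_term": " ".join(query_terms)}
    # Omit empty filters so the event carries only fields that were used.
    if boolean_query:
        event["serialized_boolean_query"] = compact(boolean_query)
    if range_query:
        event["serialized_range_query"] = compact(range_query)
    return event
```

Serializing deterministically keeps identical searches grouped together when the values are aggregated, though long filter lists may still run into the GA4 value-length limits noted above.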

Each event below is described by its name, when it is triggered, its JSON Schema, and its examples.

search_query_submitted

  • Triggered: When a user navigates to a search page and submits a search query

  • JSON Schema:

{
  "type": "object",
  "allOf": [
    { "properties": { "search_context": { "$ref": "/models/SearchContext" } } },
    { "$ref": "/models/SearchQuery" }
  ]
}

  • Examples: Entity, Team, User

search_results_returned

  • Triggered: When search results are returned and displayed to the user

  • Examples: Entity, Team (total results not available via API), User (total results not available via API)

search_result_returned

  • Triggered: When an individual search result is returned and displayed to the user

  • Examples: Entity, Team, User

search_result_clicked

  • Triggered: When an individual search result is clicked by the user
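
As a sketch of how the per-result events could carry a rank that is consistent across pages, the example below builds one payload per displayed result. The parameter names `item_id` and `rank`, the 0-based `page_number`, and the helper `result_events` are hypothetical; the schemas for these events are not given here.

```python
from typing import Dict, List, Union


def result_events(
    search_context: str,
    item_ids: List[str],
    page_number: int,
    page_size: int,
) -> List[Dict[str, Union[str, int]]]:
    """Build one search_result_returned payload per displayed result.

    page_number is assumed to be 0-based; rank is 1-based across all pages.
    """
    offset = page_number * page_size
    return [
        {"search_context": search_context, "item_id": item_id, "rank": offset + i + 1}
        for i, item_id in enumerate(item_ids)
    ]
```

Computing rank across pages, rather than within a page, means a click on the first result of the second page is not mistaken for a click on the top result.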

Future Directions

Once these events are established and basic data is gathered, we must validate that the volume of data we are collecting is sufficient to support meaningful comparisons before and after changes to our search experience.

Some metrics, such as search abandonment, may be more reliably measured by creating custom events that are triggered in the app when a user meets certain conditions (e.g. the user leaves the search page after clicking no results). To improve the measurement of these metrics, we may wish to create new, complex events.
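
A minimal sketch of such an in-app condition is shown below: a small tracker that reports abandonment when the user leaves after results were displayed but none were clicked. The class, its method names, and the `search_abandoned` event name are hypothetical and only illustrate the idea.

```python
from typing import Dict, Optional


class SearchPageTracker:
    """Tracks whether the current search received a click before the user left."""

    def __init__(self, search_context: str) -> None:
        self.search_context = search_context
        self._results_shown = False
        self._clicked = False

    def on_results_displayed(self) -> None:
        # A new result set starts a fresh abandonment window.
        self._results_shown = True
        self._clicked = False

    def on_result_clicked(self) -> None:
        self._clicked = True

    def on_leave(self) -> Optional[Dict[str, str]]:
        """Return a hypothetical 'search_abandoned' event, or None."""
        abandoned = self._results_shown and not self._clicked
        # Reset so a second on_leave call does not report the same search twice.
        self._results_shown = False
        if not abandoned:
            return None
        return {"event": "search_abandoned", "search_context": self.search_context}
```

Deciding abandonment in the app avoids having to infer it later from the absence of click events, which is harder to express in the Google Analytics UI.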

Other common metrics for validating the quality of search results are based on conversion rates, i.e. what percentage of searches resulted in some key event. Some ideas for key events on our platform are:

  • Adding file(s) to download cart

  • Downloading a file

  • Creating a data access request

  • Posting in a forum

We could consider adding more custom events to track correlation between search and key events like these.
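
For illustration, a search conversion rate could be approximated per session as sketched below. The key event names are hypothetical placeholders for the ideas listed above, and counting per session (rather than per individual search) is a simplifying assumption of this example.

```python
from typing import Iterable, List, Set

SEARCH_EVENT = "search_query_submitted"

# Hypothetical event names standing in for the key events listed above.
KEY_EVENTS: Set[str] = {
    "add_to_download_cart",
    "file_download",
    "data_access_request_created",
    "forum_post_created",
}


def search_conversion_rate(sessions: Iterable[List[str]]) -> float:
    """Share of sessions with a search in which a key event follows the first search.

    Each session is a time-ordered list of event names.
    """
    searched = 0
    converted = 0
    for events in sessions:
        if SEARCH_EVENT not in events:
            continue  # sessions without a search do not count toward the rate
        searched += 1
        first_search = events.index(SEARCH_EVENT)
        if any(name in KEY_EVENTS for name in events[first_search + 1:]):
            converted += 1
    return converted / searched if searched else 0.0
```

This only shows that a key event followed a search in the same session; it does not establish that the search caused it.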

Appendix

  1. How to Evaluate Search Engines