Sage Portals OpenSearch Integration
Examples of Limitations of Table and Portal Search
Introduction
In 2021 we introduced full text search (FTS) for synapse tables driven by the NF portal need and funding. At the time we did an analysis of the available options, and we ended up choosing the FTS functionality that we had already available in MySQL. The reasoning was mainly because of costs and ease of integration (Synapse tables are stored in MySQL tables).
Throughout the years we received a small number of reports of the limitations of the current search implementation from the NF portal (see Examples of Limitations of Table and Portal Search ).
More recently the search functionality was extended client side to perform a “multi-table” search across the resources (tables) exposed by a Portal and integrated into the cancer complexity portal home page (https://cancercomplexity.synapse.org/ ). This brought to light some additional limitations (See ), that were addressed exposing boolean mode and adapting it to support phrase search.
While we could extend the portal search functionality to support more advanced features such as fuzzy matching, auto-complete and potentially vector search (e.g. semantic search) these functionalities are not supported natively by MySQL FTS and adding them to the current system would require a big technical effort and it is probably wiser to just use a dedicated search index.
OpenSearch
Looking at the (managed) options we have in AWS we still end up with the OpenSearch service (OpenSearch is a fork of ElasticSearch). Since 2021, it became the de-facto replacement for the old cloud AWS Search Service and an alternative to the elasic.co offering.
In 2021 we looked at costs (estimated between $500 to $2000/month) and the situation improved with a serverless option of OpenSearch (OSS). While the base cost for OSS is still relatively high (starts from ~$350/month for small production workloads) it is a fully managed solution with autoscaling, this is quite different from the alternative offering (OpenSearch Domain) where a cluster needs to be sized in advance (e.g. number of data nodes, number of orchestrator nodes, warm and cold storage etc) and managed according to the number of indices, index size as well as search and indexing load.
The pricing of the OSS is based on compute and storage and uses “OpenSearch Compute Units (OCUs). The number of OCUs corresponds directly to the CPU, memory, Amazon EBS storage, and I/O resources required to index data or run queries. One OCU comprises 6 GB of RAM, corresponding vCPU, GP3 storage (used to provide fast access to the most frequently accessed data), and data transfer to Amazon Simple Storage Service (S3)”.
According to https://aws.amazon.com/blogs/big-data/amazon-opensearch-serverless-cost-effective-search-capabilities-at-any-scale/ “A full OCU includes one vCPU, 6GB of RAM and 120GB of storage.”.
Indexing and search loads are billed separately. We are billed at least for a minimum of 2 OCUs (1 OCU [0.5 x 2] indexing includes primary and standby, and 1 OCU [0.5 x 2] search includes one replica for HA), the OCU in an account are shared across all the collections of the same type (vector vs search and time series) in the account.
Note that Synapse already uses an OOS deployment for its AI agent and RAG integration with the synapse help documents but the collection used to store this data is a vector type so the OCU will NOT be shared (meaning that we will have to pay on top of what we already pay for that).
Note that OpenSearch has a limit of 4000 indices that can be managed by a single cluster (or collection in the OOS terminology), the reason is because in an open search deployment there are dedicated nodes used for orchestration that needs to keep in memory some information about every index (and shards) in the cluster.
To control costs, there is a (default) limit of 10 OCU at the account level (this limit is adjustable), in worst case scenario (indexing and search 100% of the time for a whole month) we would spend less than $2K:
The estimate above considers a total of 10 OCU (5 for indexing and 5 for searching) used constantly for a month.
The other option for deploying OpenSearch in AWS is to provision a cluster with sizing and compute requirements established in advance (e.g. similar to deploying an RDS instance but with some more complexity in the cluster configuration). This option might require in depth knowledge about cluster management and maintenance and more aligned with a long-term always live deployment (but it is possible to deploy a small cluster for around $500).
Portals Structure
Each Synapse portal stores its data using a combination of entity views, dataset and dataset collections (for data directly referencing entities), tables (for data that might not be representable as entities such as publications), materialized views (to represent relationships or unions of multiple sources) and virtual tables (for cohort builder).
The data is then presented to the user mostly in tabular form querying the data using the Synapse tables SQL functionality; The same functionality is used for full text search (which is integrated in the Synapse tables SQL) and has the advantage that the results are returned in the same tabular and structured format of a normal query (and can therefore be easily integrated into the existing UI).
Another key functionality that is part of the portal experience is the integrated faceted navigation that is also integrated with the FTS feature.
Each portal stores in a configuration file directly in the source code of the portal itself the list of Synapse tables that are used in the form of a basic Synapse SQL query (e.g. “SELECT * FROM syn123”, for example the resources of the NF portal are stored in https://github.com/Sage-Bionetworks/synapse-web-monorepo/blob/main/apps/portals/nf/src/config/resources.ts ). A framework of shared components is then used to build the visualization needed in each portal section. The framework handles the table querying and visualization.
By convention each portal uses a dedicated “backend” Synapse project that contains the tables and other parts that compose the portal, such as wiki pages (Note however that in some portals multiple projects are used, such as the elite portal).
We currently have 11 portals that use the same type of table navigation (The following is a snapshot of the data as of 5/6/2025):
Portal | Table | Type | Search Enabled | Row Count | Column Count | Size (MB) |
---|---|---|---|---|---|---|
Tables Count: 21 Total Size: 350 MiB Uses FTS: Yes Backend: https://www.synapse.org/Synapse:syn26451327
| TableEntity | True | 241 | 11 | 0.44 | |
TableEntity | True | 6 | 10 | 0.14 | ||
TableEntity | True | 193 | 2 | 0.06 | ||
TableEntity | True | 30 | 7 | 0.16 | ||
EntityView | True | 29 | 18 | 0.28 | ||
TableEntity | False | 4 | 2 | 0.06 | ||
TableEntity | True | 1,147 | 16 | 2.75 | ||
TableEntity | False | 37 | 7 | 0.13 | ||
TableEntity | False | 257 | 5 | 0.16 | ||
DatasetCollection | True | 58 | 42 | 0.69 | ||
MaterializedView | True | 1147 | 39 | 2.67 | ||
MaterializedView | True | 244 | 9 | 0.3 | ||
MaterializedView | True | 33 | 5 | 0.11 | ||
MaterializedView | True | 1,851 | 12 | 12.38 | ||
MaterializedView | True | 923 | 20 | 7.36 | ||
MaterializedView | True | 128 | 12 | 1.66 | ||
MaterializedView | True | 584 | 8 | 0.78 | ||
MaterializedView | True | 237 | 15 | 0.39 | ||
MaterializedView | True | 285 | 20 | 2.75 | ||
MaterializedView | True | 303,547 | 23 | 314.36 | ||
MaterializedView | False | 22 | 7 | 0.16 | ||
https://cancercomplexity.synapse.org Tables Count: 9 Total Size: 817.03 MB Uses FTS: Yes | TableEntity | True | 3,969 | 20 | 24.66 | |
TableEntity | False | 242 | 9 | 1.63 | ||
TableEntity | True | 970 | 19 | 8.56 | ||
TableEntity | True | 154 | 17 | 3.22 | ||
TableEntity | True | 311 | 46 | 2.08 | ||
TableEntity | True | 83 | 18 | 0.31 | ||
TableEntity | True | 9 | 27 | 0.28 | ||
MaterializedView | False | 24,477 | 13 | 30.06 | ||
EntityView | False | 266,967 | 82 | 746.23 | ||
Tables Count: 7 Total Size: 3.52 MB Uses FTS: No | TableEntity | False | 14 | 4 | 0.09 | |
TableEntity | False | 36 | 6 | 0.13 | ||
TableEntity | False | 7 | 3 | 0.08 | ||
MaterializedView | False | 32 | 8 | 0.17 | ||
MaterializedView | False | 4,313 | 7 | 2.67 | ||
MaterializedView | False | 115 | 7 | 0.16 | ||
MaterializedView | False | 83 | 11 | 0.22 | ||
https://eliteportal.synapse.org Tables Count: 12 Total Size: 11 MB Uses FTS: Yes Backend: https://www.synapse.org/Synapse:syn27229419, https://www.synapse.org/Synapse:syn51223800 | VirtualTable | N/A | 5,368 | 12 | 0 | |
VirtualTabvle | N/A | 52,567 | 28 | 0 | ||
TableEntity | True | 67 | 13 | 0.19 | ||
TableEntity | True | 6 | 15 | 0.25 | ||
EntityView | True | 586 | 15 | 1.83 | ||
MaterializedView | True | 9 | 15 | 0.25 | ||
MaterializedView | True | 16 | 27 | 0.05 | ||
MaterializedView | False | 24,870 | 2 | 8.58 | ||
TableEntity | False | 2 | 6 | 0.13 | ||
TableEntity | False | 6 | 8 | 0.14 | ||
TableEntity | False | 6 | 3 | 0.08 | ||
TableEntity | False | 5 | 6 | 0.13 | ||
Tables Count: 4 Total Size: 465 MB Uses FTS: No | EntityView | False | 735,921 | 22 | 464.16 | |
EntityView | False | 17 | 27 | 0.3 | ||
EntityView | False | 1 | 16 | 0.06 | ||
EntityView | False | 32 | 21 | 0.38 | ||
https://challenges.synapse.org Tables Count: 1 Total Size: 1 MB Uses FTS: Yes | EntityView | True | 27 | 82 | 1.02 | |
Tables Count: 5 Total Size: 1.2 MB Uses FTS: No | EntityView | False | 24 | 23 | 0.38 | |
EntityView | False | 10 | 18 | 0.27 | ||
EntityView | False | 16 | 19 | 0.22 | ||
EntityView | False | 9 | 17 | 0.25 | ||
TableEntity | False | 75 | 7 | 0.09 | ||
https://b2ai.standards.synapse.org Tables Count: 2 Total Size: 2.3 MB Uses FTS: Yes | TableEntity | False | 4 | 6 | 0.13 | |
MaterializedView | True | 898 | 23 | 2.16 | ||
Tables Count: 5 Total Size: 48 MB Uses FTS: No | TableEntity | False | 1 | 2 | 0.03 |