Jira: PLFM-8728

Background

Synapse offers a search feature for users, accessible through a dedicated Search API: https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/model/search/query/SearchQuery.html . Search was initially implemented using AWS CloudSearch. Because of its limitations, and because it is no longer available for new deployments, CloudSearch is being phased out. AWS has not officially announced an end-of-support date for existing users, but it recommends migrating to Amazon OpenSearch Service. More information is available at https://aws.amazon.com/blogs/big-data/transition-from-amazon-cloudsearch-to-amazon-opensearch-service/

CloudSearch limitation

...

solved by OpenSearch

| Category | CloudSearch limitation | OpenSearch solution |
|---|---|---|
| Query Language | Limited query flexibility | Full Elasticsearch Query DSL (JSON-based, supports bool, range, script, etc.) |
| Custom Ranking | Minimal relevance tuning (only via expr) | Function score queries, script scoring, boosting fields for advanced tuning |
| Multi-field search | No native multi-field search | Use multi-match to search across multiple fields simultaneously |
| Field Types | Limited field types (no boolean, nested, etc.) | Wide support: text, keyword, boolean, geo_point, nested, etc. |
| Monitoring | No detailed logging or query trace | Built-in slow query logs, profiling, and monitoring via CloudWatch + APIs |
| Aggregation/Facets | Limited aggregation capabilities (facets only) | Aggregations framework: terms, range, histogram, date_histogram, etc. |
| Security | Only IAM-based security | Fine-grained access control (roles, field-level, document-level security) |
| Data ingestion | Limited ingest and update options (only batch document uploads) | Supports bulk API, ingestion pipelines, Logstash, Beats, real-time indexing |
| Testing | No testing tools or dev utilities | OpenSearch Dashboards with Dev Tools, query profiling, real-time testing |
| Scaling & Performance Tuning | Scaling is automatic, but not tunable | Control over shards, replicas, index-level tuning, or serverless |
| Integration | Limited integration ecosystem | Integrates with Kibana (Dashboards), Beats, Logstash, Grafana, etc. |
| Autocomplete | Simple suggesters | Completion + edge n-gram + full control |
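
To make a few rows of the comparison concrete, the sketch below builds a Query DSL request body as a plain Python dictionary. It combines a multi_match query with per-field boosts, a bool filter, and a terms aggregation for facets. The field names (name, description, node_type) and the helper build_search_body are hypothetical and are not taken from the actual Synapse search index.

```python
import json


def build_search_body(text, node_type=None, size=10):
    """Build a Query DSL body: boosted multi-field match, optional filter, one facet.

    Field names here are illustrative assumptions, not the real Synapse schema.
    """
    must = [{
        "multi_match": {
            "query": text,
            # "^3" boosts matches in the name field over description.
            "fields": ["name^3", "description"],
        }
    }]
    # Filters restrict results without contributing to the relevance score.
    filters = [{"term": {"node_type": node_type}}] if node_type else []
    return {
        "size": size,
        "query": {"bool": {"must": must, "filter": filters}},
        # A terms aggregation returns per-value counts, comparable to a facet.
        "aggs": {"by_node_type": {"terms": {"field": "node_type"}}},
    }


if __name__ == "__main__":
    body = build_search_body("cancer genomics", node_type="dataset")
    print(json.dumps(body, indent=2))
```

The resulting dictionary would be sent as the JSON body of a search request. Keeping the filter separate from the scored multi_match clause means facet selections narrow the result set without changing the ranking.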

OpenSearch Introduction

OpenSearch is a distributed search and analytics engine. After adding data to OpenSearch, we can perform full-text searches on it with all of the features we might expect: search by field, search multiple indexes, boost fields, rank results by score, sort results by field, and aggregate results. For more information, see the OpenSearch documentation.
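
One common way to add data is the bulk API, which takes newline-delimited JSON: an action line followed by a document line for each record. The sketch below only builds such a payload. The index name and document fields are made-up placeholders, and to_bulk_payload is a hypothetical helper.

```python
import json


def to_bulk_payload(index_name, docs):
    """Serialize documents into the NDJSON body expected by the _bulk endpoint.

    `docs` maps a document id to its JSON-serializable source.
    """
    lines = []
    for doc_id, source in docs.items():
        # Action line: index (create or replace) the document under this id.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        # Source line: the document itself.
        lines.append(json.dumps(source))
    # The bulk body must end with a trailing newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder documents, not real Synapse data.
    sample = {
        "syn1": {"name": "Example project", "node_type": "project"},
        "syn2": {"name": "Example dataset", "node_type": "dataset"},
    }
    print(to_bulk_payload("synapse-entities", sample), end="")
```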

OpenSearch Terminology

Document: A document is a unit that stores information (text or structured data). In OpenSearch, documents are stored in JSON format.

...

Relevance: When a search query is executed, OpenSearch matches the query terms against the indexed documents and assigns a relevance score to each result. This score indicates how closely a document matches the query criteria.
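
As a rough illustration of how a relevance score can be computed, the sketch below implements a simplified form of BM25, the default similarity in OpenSearch. It is a teaching aid under stated assumptions (naive lowercase whitespace tokenization, k1 = 1.2, b = 0.75, a non-empty document list), not the engine's actual implementation, and the sample documents are invented.

```python
import math


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with a simplified BM25.

    Tokenization is a naive lowercase whitespace split (an assumption).
    `docs` must be a non-empty list of strings.
    """
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    scores = []
    for tokens in tokenized:
        score = 0.0
        for term in set(query.lower().split()):
            # Number of documents containing the term.
            df = sum(1 for t in tokenized if term in t)
            if df == 0:
                continue
            # Rarer terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            tf = tokens.count(term)
            # Term frequency saturates and is normalized by document length.
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


if __name__ == "__main__":
    sample_docs = [
        "gene expression data",
        "clinical trial data for cancer",
        "cancer cancer genomics",
    ]
    for doc, score in zip(sample_docs, bm25_scores("cancer", sample_docs)):
        print(f"{score:.3f}  {doc}")
```

A document that does not contain the query term scores zero, and a short document that repeats the term scores higher than a longer one that mentions it once.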

OpenSearch deployment options

OpenSearch Service domain: Amazon OpenSearch Service provides a managed environment to deploy and operate OpenSearch clusters. It gives you full control over configuration, including instance types, storage, and network settings. It supports fine-tuned performance optimization, availability zones, VPC access, and security configurations. This option may require in-depth knowledge of cluster management and maintenance, and is better suited to a long-term, always-on deployment.

OpenSearch Serverless: Amazon OpenSearch Serverless is an on-demand, serverless option for Amazon OpenSearch Service that eliminates the operational complexity of provisioning, configuring, and tuning OpenSearch clusters. With OpenSearch Serverless, we can search and analyze large volumes of data without managing the underlying infrastructure. An OpenSearch Serverless collection is a group of OpenSearch indexes that work together to support a specific workload or use case. Collections simplify operations compared to self-managed OpenSearch clusters, which require manual provisioning. For more information, see https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-overview.html

Types of collection in OpenSearch Serverless

Search: Full-text search over natural language text, where documents are indexed with analyzers (tokenizers, stemmers, etc.) to support ranking, relevance, and partial matching. This type fits when users need to find relevant documents from entered keywords, when ranking, matching accuracy, and highlighting matter, and when the data is text-heavy.

...

Time Series: Time series search focuses on analyzing machine-generated, timestamped data such as logs, metrics, and events. The goal is often operational insight, security monitoring, or business performance tracking.
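
The collection type is chosen when a collection is created. The sketch below prepares the arguments for the create_collection operation of the boto3 opensearchserverless client. The collection name is a placeholder and collection_request is a hypothetical helper. The actual call assumes that boto3, AWS credentials, and a matching encryption policy are already in place, so it is kept in a separate function that is not invoked here.

```python
# Collection type identifiers accepted by the OpenSearch Serverless API.
COLLECTION_TYPES = {
    "search": "SEARCH",
    "time series": "TIMESERIES",
    "vector": "VECTORSEARCH",
}


def collection_request(name, kind="search", description=""):
    """Build keyword arguments for the create_collection API call."""
    try:
        api_type = COLLECTION_TYPES[kind.lower()]
    except KeyError:
        raise ValueError(f"unknown collection type: {kind!r}") from None
    request = {"name": name, "type": api_type}
    if description:
        request["description"] = description
    return request


def create_collection(request):
    """Send the request to AWS; needs boto3, credentials, and an encryption policy."""
    import boto3  # imported lazily so the builder above works without boto3

    client = boto3.client("opensearchserverless")
    return client.create_collection(**request)


if __name__ == "__main__":
    # "synapse-search-eval" is a placeholder name, not the one used in the evaluation.
    print(collection_request("synapse-search-eval", "Search"))
```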

Feasibility Evaluation

To assess whether the features currently supported by CloudSearch can be replicated in OpenSearch, I explored the OpenSearch Serverless offering. OpenSearch Serverless supports three collection types: Search, Time Series, and Vector. Since our requirement focuses on full-text search, I created a collection of type "Search", which is specifically designed for full-text search use cases. The following steps outline the approach taken during this evaluation:

...