
...

Jira issue: PLFM-8728

Background

Synapse offers users a search feature, accessible through a dedicated Search API (see https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/model/search/query/SearchQuery.html). Search was originally implemented with AWS CloudSearch. However, CloudSearch is being phased out because of its limitations and because it is no longer available for new deployments. AWS has not officially announced an end-of-support date for existing customers, but it recommends migrating to Amazon OpenSearch Service. For more information, see https://aws.amazon.com/blogs/big-data/transition-from-amazon-cloudsearch-to-amazon-opensearch-service/

CloudSearch limitation

...

solved by OpenSearch

Category	CloudSearch limitation	OpenSearch Solution
Query Language	Limited query flexibility	Full Elasticsearch Query DSL (JSON-based, supports `bool`, `range`, `script`, etc.)
Custom Ranking	Minimal relevance tuning (only via `expr`)	Function score queries, script scoring, boosting fields for advanced tuning
Multi-field search	No native multi-field search	`multi_match` query searches across multiple fields simultaneously
Field Types	Limited field types (no `boolean`, `nested`, etc.)	Wide support: `text`, `keyword`, `boolean`, `geo_point`, `nested`, etc.
Monitoring	No detailed logging or query trace	Built-in slow query logs, profiling, and monitoring via CloudWatch + APIs
Aggregation/Facets	Limited aggregation capabilities (facets only)	Aggregations framework: `terms`, `range`, `histogram`, `date_histogram`, etc.
Security	Only IAM-based security	Fine-grained access control (roles, field-level, document-level security)
Data ingestion	Limited ingest and update options (only batch document uploads)	Supports bulk API, ingestion pipelines, Logstash, Beats, near-real-time indexing
Testing	No testing tools or dev utilities	OpenSearch Dashboards with Dev Tools, query profiling, real-time testing
Scaling & Performance Tuning	Scaling is automatic, but not tunable	Control over shards, replicas, and index-level settings, or a serverless deployment
Integration	Limited integration ecosystem	Integrates with OpenSearch Dashboards (a fork of Kibana), Beats, Logstash, Grafana, etc.
Autocomplete	Simple suggesters	Completion suggester, edge n-gram analyzers, and full control over text analysis
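To make the "Data ingestion" row concrete: CloudSearch accepts batches of add and delete operations, while the OpenSearch `_bulk` endpoint takes newline-delimited JSON (NDJSON), where each action line is followed by a document line (except for deletes). The sketch below only builds the request body; the index name `synapse-entities`, the document fields, and the helper `build_bulk_body` are hypothetical, and sending the signed HTTP request is left out.

```python
import json
from typing import Any, Iterable, Mapping, Tuple


def build_bulk_body(
    index: str,
    upserts: Iterable[Tuple[str, Mapping[str, Any]]],
    deletes: Iterable[str] = (),
) -> str:
    """Build an NDJSON request body for the OpenSearch `_bulk` endpoint."""
    lines = []
    for doc_id, source in upserts:
        # "index" creates the document, or replaces it if the ID already exists.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    for doc_id in deletes:
        # "delete" actions carry no document line.
        lines.append(json.dumps({"delete": {"_index": index, "_id": doc_id}}))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical index name and document fields, for illustration only.
body = build_bulk_body(
    "synapse-entities",
    upserts=[("syn123", {"name": "RNA-seq counts", "node_type": "file"})],
    deletes=["syn456"],
)
print(body, end="")
```

Unlike CloudSearch batches, the same payload shape can carry a single document or thousands, so an indexer could flush small batches frequently to keep search results close to real time.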

OpenSearch Introduction

OpenSearch is a distributed search and analytics engine. After adding data to OpenSearch, we can run full-text searches on it with all of the features we might expect: search by field, search across multiple indexes, boost fields, rank results by score, sort results by field, and aggregate results. For more information, see the OpenSearch documentation.
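As a rough illustration of several of these features in one request, the sketch below builds a query DSL body that searches two fields with different boosts, sorts by a date field with relevance as a tie-breaker, and adds a `terms` aggregation as a facet. The field names (`name`, `description`, `modified_on`, `node_type`) and the helper `build_search_body` are hypothetical stand-ins, not the actual Synapse mapping.

```python
import json
from typing import Any, Dict, Mapping, Optional, Sequence


def build_search_body(
    text: str,
    boosted_fields: Mapping[str, float],
    sort_field: Optional[str] = None,
    facet_fields: Sequence[str] = (),
    start: int = 0,
    size: int = 10,
) -> Dict[str, Any]:
    """Build an OpenSearch query DSL body for a keyword search."""
    body: Dict[str, Any] = {
        "from": start,
        "size": size,
        # multi_match searches several fields at once; "field^N" boosts that field.
        "query": {
            "multi_match": {
                "query": text,
                "fields": [f"{name}^{boost}" for name, boost in boosted_fields.items()],
            }
        },
    }
    if sort_field:
        # Sort by the field first, then fall back to the relevance score.
        body["sort"] = [{sort_field: {"order": "desc"}}, "_score"]
    if facet_fields:
        # One terms aggregation per facet (these fields should be keyword-typed).
        body["aggs"] = {f: {"terms": {"field": f}} for f in facet_fields}
    return body


# Hypothetical field names, for illustration only.
body = build_search_body(
    "alzheimer rna-seq",
    boosted_fields={"name": 3, "description": 1},
    sort_field="modified_on",
    facet_fields=["node_type"],
)
print(json.dumps(body, indent=2))
```

The resulting dictionary would be sent as the JSON body of `POST /<index>/_search`; listing several comma-separated index names in the path searches multiple indexes at once.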

OpenSearch Terminology

Document: A document is a unit that stores information (text or structured data). In OpenSearch, documents are stored in JSON format.

...

Relevance: When a search query is executed, OpenSearch matches the query terms against the indexed documents and assigns a relevance score to each result. This score indicates how closely a document matches the query criteria.
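By default, OpenSearch computes this relevance score with BM25, Lucene's default similarity. To build intuition for how term frequency, term rarity, and document length affect ranking, the sketch below re-implements the per-term Lucene BM25 formula, idf · freq / (freq + k1 · (1 − b + b · dl / avgdl)), with the default parameters k1 = 1.2 and b = 0.75. It is an approximation for illustration only: actual scores also depend on the analyzer, on field and query boosts, and on Lucene's lossy encoding of document length. The toy documents are made up.

```python
import math
from collections import Counter
from typing import List, Sequence


def bm25_scores(
    query_terms: Sequence[str],
    docs: Sequence[Sequence[str]],
    k1: float = 1.2,
    b: float = 0.75,
) -> List[float]:
    """Score pre-tokenized documents against query terms with Lucene-style BM25."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            freq = tf[term]
            if freq == 0:
                continue
            # Rarer terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates (k1) and is normalized by document length (b).
            tf_norm = freq / (freq + k1 * (1 - b + b * len(d) / avgdl))
            score += idf * tf_norm
        scores.append(score)
    return scores


docs = [
    ["cancer", "genomics", "data"],
    ["alzheimer", "imaging", "data"],
    ["cancer", "cancer", "imaging"],
]
print(bm25_scores(["cancer"], docs))
```

The third document ranks first because it mentions "cancer" twice, but the saturation term keeps a second occurrence from doubling the score.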

OpenSearch deployment options

OpenSearch Service domain: Amazon OpenSearch Service provides a managed environment for deploying and operating OpenSearch clusters. It gives you full control over configuration, including instance types, storage, and network settings, and supports performance tuning, deployment across multiple Availability Zones, VPC access, and security configuration. This option may require in-depth knowledge of cluster management and maintenance, and it is better suited to a long-term, always-on deployment.

OpenSearch Serverless: Amazon OpenSearch Serverless is an on-demand, serverless option for Amazon OpenSearch Service that eliminates the operational complexity of provisioning, configuring, and tuning OpenSearch clusters. With OpenSearch Serverless, we can search and analyze large volumes of data without managing the underlying infrastructure. An OpenSearch Serverless collection is a group of OpenSearch indexes that work together to support a specific workload or use case. Collections are simpler to operate than self-managed OpenSearch clusters, which require manual provisioning. For more information, see https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-overview.html
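Before a Serverless collection can be created and used, it needs an encryption policy, a network policy, and a data access policy. The sketch below only builds those JSON policy documents; the collection name `synapse-search`, the IAM role ARN, and the helper `serverless_policies` are hypothetical. The public network access is an assumption suited to a short-lived evaluation only; a production setup would more likely restrict access to a VPC endpoint.

```python
import json
from typing import Dict


def serverless_policies(collection: str, principal_arn: str) -> Dict[str, str]:
    """Build the policy documents an OpenSearch Serverless collection needs."""
    coll = f"collection/{collection}"
    # Encryption policy: encrypt the collection with an AWS-owned key.
    encryption = {
        "Rules": [{"ResourceType": "collection", "Resource": [coll]}],
        "AWSOwnedKey": True,
    }
    # Network policy: public endpoint, acceptable for a throwaway evaluation only.
    network = [{
        "Rules": [{"ResourceType": "collection", "Resource": [coll]}],
        "AllowFromPublic": True,
    }]
    # Data access policy: let one IAM principal manage indexes and documents.
    data_access = [{
        "Rules": [{
            "ResourceType": "index",
            "Resource": [f"index/{collection}/*"],
            "Permission": [
                "aoss:CreateIndex", "aoss:DescribeIndex", "aoss:UpdateIndex",
                "aoss:ReadDocument", "aoss:WriteDocument",
            ],
        }],
        "Principal": [principal_arn],
    }]
    return {
        "encryption": json.dumps(encryption),
        "network": json.dumps(network),
        "data": json.dumps(data_access),
    }


# Hypothetical collection name and IAM role, for illustration only.
policies = serverless_policies(
    "synapse-search", "arn:aws:iam::123456789012:role/search-indexer"
)
for kind, doc in policies.items():
    print(kind, doc)
```

With boto3, these strings could be passed to the `opensearchserverless` client through `create_security_policy(name=..., type="encryption", policy=...)`, `create_security_policy(name=..., type="network", policy=...)`, and `create_access_policy(name=..., type="data", policy=...)`, followed by `create_collection(name="synapse-search", type="SEARCH")`. A matching encryption policy has to exist before the collection is created.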

Types of collections in OpenSearch Serverless

Search: Full-text search over natural-language text, where documents are indexed with analyzers (tokenizers, stemmers, etc.) to support ranking, relevance, and partial matching. This type fits use cases where we need to find relevant documents from user-entered keywords, where ranking, matching accuracy, and highlighting matter, and where the data is text-heavy.
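To show what "indexed with analyzers" can look like in practice, here is a sketch of an index definition with a custom analyzer (standard tokenizer, lowercasing, English stemming) and a `keyword` sub-field for exact matching, sorting, and facets. The index fields are hypothetical stand-ins for Synapse entity fields, and the sketch assumes the collection accepts custom analysis settings at index creation. Shard and replica settings are omitted because Serverless manages capacity itself.

```python
import json

# Index definition for a "Search" collection: analyzers control how text is
# tokenized and normalized, both at index time and at query time.
INDEX_DEFINITION = {
    "settings": {
        "analysis": {
            "filter": {
                # Reduce English words to a common stem so variants match.
                "english_stemmer": {"type": "stemmer", "language": "english"}
            },
            "analyzer": {
                "text_en": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "english_stemmer"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            # Analyzed for relevance ranking, plus an exact-match keyword sub-field.
            "name": {
                "type": "text",
                "analyzer": "text_en",
                "fields": {"raw": {"type": "keyword"}},
            },
            "description": {"type": "text", "analyzer": "text_en"},
            # Keyword and date fields support filtering, sorting, and facets.
            "node_type": {"type": "keyword"},
            "modified_on": {"type": "date"},
        }
    },
}

print(json.dumps(INDEX_DEFINITION, indent=2))
```

This JSON would be the body of `PUT /<index>`; queries against `name` are then analyzed the same way, while `name.raw` can be used for exact-match filters and alphabetical sorting.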

...

Time Series: Time series search focuses on analyzing machine-generated, timestamped data such as logs, metrics, and events. The goal is often operational insight, security monitoring, or business performance tracking.

Feasibility Evaluation

To assess whether the features currently supported by CloudSearch can be replicated in OpenSearch, I evaluated the OpenSearch Serverless offering. OpenSearch Serverless supports three collection types: Search, Time Series, and Vector. Because our requirement is full-text search, I created a collection of type "Search", which is designed specifically for that use case. The following steps outline the approach taken during this evaluation:

...

Version	Old Version 16	New Version 17
Changes made by	Sandhra Sokhal	Sandhra Sokhal
Saved on	May 19, 2025	May 19, 2025
