...
Background
Synapse offers a search feature for users, accessible through a dedicated Search API: https://rest-docs.synapse.org/rest/org/sagebionetworks/repo/model/search/query/SearchQuery.html . Search was initially implemented using AWS CloudSearch. However, because of its limitations and because it is no longer available for new deployments, it is being phased out. AWS has not officially announced an end-of-support date for existing users, but it recommends migrating to Amazon OpenSearch Service. More information is available here: https://aws.amazon.com/blogs/big-data/transition-from-amazon-cloudsearch-to-amazon-opensearch-service/
CloudSearch limitation
...
solved by OpenSearch
Category | CloudSearch limitation | OpenSearch Solution |
---|---|---|
Query Language | Limited query flexibility | Full Elasticsearch Query DSL (JSON-based, supports |
Custom Ranking | Minimal relevance tuning (Only via | Function score queries, script scoring, boosting fields for advanced tuning |
Multi-field search | No native multi-field search | Use multi-match to search across multiple fields simultaneously |
Field Types | Limited field types (no | Wide support: |
Monitoring | No detailed logging or query trace | Built-in slow query logs, profiling, and monitoring via CloudWatch + APIs |
Aggregation/Facets | Limited aggregation capabilities (facets only) | Aggregations framework: |
Security | Only IAM-based security | Fine-grained access control (roles, field-level, document-level security) |
Data ingestion | Limited ingest and update options | Supports bulk API, ingestion pipelines, Logstash, Beats, real-time indexing |
Testing | No testing tools or dev utilities | OpenSearch Dashboards with Dev Tools, query profiling, real-time testing |
Scaling & Performance Tuning | Scaling is automatic, but not tunable | Control over shards, replicas, index-level tuning, or serverless |
Integration | Limited integration ecosystem | Integrates with OpenSearch Dashboards (a fork of Kibana), Beats, Logstash, Grafana, etc. |
Autocomplete | Simple suggesters | Completion + edge n-gram + full control |
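
As an illustration of the "Multi-field search" and "Custom Ranking" rows above, the following is a minimal sketch of how a multi-match query body with per-field boosts could be assembled. The field names `name` and `description` and the boost values are assumptions chosen for the example, not the actual Synapse index schema. The sketch only builds the JSON body and does not send it to a cluster.

```python
import json


def build_multi_match_query(text, field_boosts, size=10):
    """Build an OpenSearch Query DSL body that searches several fields at once.

    field_boosts maps a field name to its boost, e.g. {"name": 3}.
    The "field^boost" syntax tells OpenSearch to weight matches in that field.
    """
    fields = [f"{name}^{boost}" for name, boost in field_boosts.items()]
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": text,
                "fields": fields,
            }
        },
    }


# Hypothetical fields: matches in "name" count three times as much as
# matches in "description".
body = build_multi_match_query("alzheimer", {"name": 3, "description": 1})
print(json.dumps(body, indent=2))
```

The resulting body would be sent as the payload of a `_search` request against the index; the equivalent in CloudSearch would require querying each field separately or hand-building a compound query.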
OpenSearch Introduction
OpenSearch is a distributed search and analytics engine. After adding data to OpenSearch, we can run full-text searches on it with the features we would expect: search by field, search across multiple indexes, boost fields, rank results by score, sort results by field, and aggregate results. For more information, see the OpenSearch documentation.
OpenSearch Terminology
Document: A document is a unit that stores information (text or structured data). In OpenSearch, documents are stored in JSON format.
...
Relevance: When a search query is executed, OpenSearch matches the query terms against the indexed documents and assigns a relevance score to each result. This score indicates how closely a document matches the query criteria.
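
To make the idea of a relevance score more concrete, here is a toy sketch of a BM25-style score computed over a handful of in-memory strings. OpenSearch uses BM25 as its default similarity, but this is a simplified illustration only: the whitespace tokenizer, the `k1` and `b` defaults, and the sample documents are assumptions for the example, and the real engine applies analyzers and differs in implementation details.

```python
import math
from collections import Counter


def tokenize(text):
    # Simplistic analyzer: lowercase and split on whitespace.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Return one BM25-style score per document (docs must be non-empty)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if tf[term] == 0:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates and is normalized by document length.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "synapse search api",
    "cloud storage service",
    "search search relevance",
]
for doc, score in zip(docs, bm25_scores("search", docs)):
    print(f"{score:.3f}  {doc}")
```

Documents that do not contain the query term score zero, and a document that mentions the term more often (at the same length) scores higher, which is the behaviour the Relevance definition describes.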
OpenSearch deployment options
OpenSearch Service domain: Amazon OpenSearch Service provides a managed environment to deploy and operate OpenSearch clusters. It gives you full control over configuration, including instance types, storage, and network settings. It supports fine-tuned performance optimization, availability zones, VPC access, and security configurations. This option may require in-depth knowledge of cluster management and maintenance, and it is better suited to a long-term, always-on deployment.
OpenSearch Serverless: Amazon OpenSearch Serverless is an on-demand, serverless option for Amazon OpenSearch Service that eliminates the operational complexity of provisioning, configuring, and tuning OpenSearch clusters. With OpenSearch Serverless, we can search and analyze large volumes of data without managing the underlying infrastructure. An OpenSearch Serverless collection is a group of OpenSearch indexes that work together to support a specific workload or use case. Collections simplify operations compared to self-managed OpenSearch clusters, which require manual provisioning. For more information, see https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-overview.html
Types of collection in OpenSearch Serverless
Search: Full-text search over natural language text, where documents are indexed with analyzers (tokenizers, stemmers, etc.) to support ranking, relevance, and partial matching. This type fits when users need to find relevant documents from entered keywords, when ranking, matching accuracy, and highlighting matter, and when the data is text-heavy.
...
Time Series: Time series search focuses on analyzing machine-generated, timestamped data such as logs, metrics, and events. The goal is often operational insight, security monitoring, or business performance tracking.
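
As a rough sketch of how a collection of the Search type could be created programmatically, the snippet below builds the request parameters and passes them to the boto3 `opensearchserverless` client. The collection name `synapse-search` is a placeholder, and the sketch assumes that AWS credentials are configured and that any prerequisite security policies (for example, an encryption policy matching the collection name) already exist; verify those requirements against the AWS documentation before relying on it.

```python
def build_create_collection_request(name, description=None):
    """Build the parameters for an OpenSearch Serverless CreateCollection call.

    The type is fixed to "SEARCH", the collection type meant for
    full-text search workloads.
    """
    params = {"name": name, "type": "SEARCH"}
    if description:
        params["description"] = description
    return params


def create_search_collection(name, description=None):
    # boto3 is imported lazily so the request builder above can be used
    # (and tested) without the AWS SDK or credentials.
    import boto3

    client = boto3.client("opensearchserverless")
    return client.create_collection(
        **build_create_collection_request(name, description)
    )


# Placeholder name, for illustration only; nothing is sent to AWS here.
print(build_create_collection_request("synapse-search", "Full-text search evaluation"))
```

Keeping the request construction separate from the API call makes it easy to review exactly what would be sent before touching an AWS account.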
Feasibility Evaluation
To assess whether the features currently supported by CloudSearch can be replicated in OpenSearch, I explored the OpenSearch Serverless offering. OpenSearch Serverless supports three collection types: Search, Time Series, and Vector. Because our requirement is full-text search, I created a collection of type "Search", which is designed for that use case. The following steps outline the approach taken during this evaluation:
...