REST API Design for OpenSearch Integration

Portal Search: REST API Reference

Author: @Bryan Fauble


Overview

Portal Search provides REST endpoints for creating OpenSearch-backed search indexes over Synapse table-like entities and executing full-text search queries against them.

Resources

The API is organized around four standalone configuration resources under /repo/v1/search/* and one SearchIndex Synapse Entity managed via the standard entity controller under /repo/v1/entity/*.

All resources are logically scoped to an Organization (an existing Synapse concept — CRUD and ACL management at /schema/organizations/*), but note that SearchIndex does not have its own organizationId field; it is scoped via the standard entity parentId (Project or Folder).

| Resource | Purpose |
| --- | --- |
| SynonymSet | Reusable synonym rules (e.g., "NF1" = "neurofibromatosis 1"). Cannot be deleted while referenced by a SearchConfiguration. |
| ColumnAnalyzerOverride | Per-column text analysis overrides. Cannot be deleted while referenced by a SearchConfiguration. |
| TextAnalyzer | A standalone reusable resource that defines how text is analyzed (tokenizer, filters, synonym awareness). Referenced by ColumnAnalyzerOverrideEntry (indexAnalyzerId/searchAnalyzerId) and SearchConfiguration (defaultAnalyzerId). System analyzers (currently IDs 1-6, within the reserved range 1-999) are bootstrapped on startup; user-defined analyzers start at ID 1000. |
| SearchConfiguration | Bundles synonym sets and column overrides. References a default TextAnalyzer (defaultAnalyzerId) for columns without explicit overrides. Can be associated with projects or folders via the Project Settings framework for inheritance. |
| SearchIndex (Entity) | A Synapse Entity (concreteType: "org.sagebionetworks.repo.model.table.search.SearchIndex") that ties a Synapse table-like entity (via definingSQL) to an OpenSearch index, optionally referencing a SearchConfiguration. The source entity is parsed from definingSQL at runtime. CRUD is via /repo/v1/entity/*; only search operations use /repo/v1/search/*. |

Resources compose as:

SearchIndex → SearchConfiguration → {SynonymSets[], ColumnAnalyzerOverrides[]}.

TextAnalyzer is a standalone resource referenced by ColumnAnalyzerOverrideEntry.indexAnalyzerId/searchAnalyzerId and SearchConfiguration.defaultAnalyzerId.

Design Principles

  • Organization-scoped authorization for configuration. The four configuration resources (SynonymSet, ColumnAnalyzerOverride, TextAnalyzer, SearchConfiguration) are Organization-scoped and identified by organizationId. They are publicly readable; mutations require Organization ACL permissions. SearchIndex is a standard Synapse Entity without its own organizationId field; it is scoped by parentId (Project or Folder), and its ACL follows normal entity rules.

  • Secure search access. Search queries and autocomplete requests are not fully public. Callers must have:

    • READ on the SearchIndex entity and

    • Sufficient permissions to read from the underlying table referenced by definingSQL (mirrors the MaterializedView authorization pattern).

  • Build-once semantics with rebuild on update. Amazon OpenSearch Serverless (AOSS) indexes are point-in-time snapshots with no incremental updates. The lifecycle worker builds (or rebuilds) the index on both CREATE and UPDATE of a SearchIndex entity.

  • Single-entity constraint. definingSQL must reference exactly one entity. Multi-entity JOINs are rejected with 400 Bad Request.

  • Shared resource protection. SynonymSets and ColumnAnalyzerOverrides cannot be deleted while referenced by any SearchConfiguration.

  • Name uniqueness.

    • For SynonymSet, ColumnAnalyzerOverride, and SearchConfiguration, names are unique per Organization: UNIQUE(organizationId, name).

    • For TextAnalyzer, system analyzer names are globally unique; user-defined analyzer names are unique per Organization.

    • For SearchIndex (as a Synapse Entity), name uniqueness follows the standard entity rules: unique within its parent container (parentId).

  • Single polling point. Search queries run as async jobs. If an index is still building, the worker automatically retries — the client only polls the async result endpoint.
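The single-entity constraint above can be illustrated with a minimal Python sketch. It assumes a plain regular-expression scan for Synapse IDs; the actual service presumably relies on its SQL parser, and the function names here are hypothetical.

```python
import re
from typing import Set

# Matches Synapse entity IDs such as syn123, optionally with a version suffix (syn123.4).
_SYN_ID = re.compile(r"\bsyn\d+(?:\.\d+)?\b", re.IGNORECASE)


def referenced_entities(defining_sql: str) -> Set[str]:
    """Return the distinct entity IDs mentioned anywhere in the SQL text."""
    return {match.group(0).lower() for match in _SYN_ID.finditer(defining_sql)}


def validate_single_entity(defining_sql: str) -> str:
    """Return the single source entity ID, or raise ValueError (the 400 Bad Request case)."""
    ids = referenced_entities(defining_sql)
    if len(ids) != 1:
        raise ValueError(
            f"definingSQL must reference exactly one entity, found {len(ids)}"
        )
    return next(iter(ids))
```

A regex scan is only an approximation (a column that happens to be named like a Synapse ID would be miscounted), which is why a real parser is the safer choice.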


Quick Start

Step 1: Create Supporting Resources

Create a synonym set in your Organization:

POST /repo/v1/search/synonym/set
{ "organizationId": "42", "name": "NF Disease Terms", "description": "Synonym mappings for neurofibromatosis-related disease terminology", "rules": [ { "ruleType": "EQUIVALENT", "terms": ["NF1", "neurofibromatosis 1", "von Recklinghausen disease"] }, { "ruleType": "EQUIVALENT", "terms": ["schwannoma", "vestibular schwannoma", "acoustic neuroma"] } ] }

Create a search configuration referencing the synonym set:

POST /repo/v1/search/configuration
{ "organizationId": "42", "name": "NF Portal Config", "synonymSetIds": ["501"], "defaultAnalyzerId": "1", "columnAnalyzerOverrideIds": [] }
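As a rough client-side sketch of Step 1, the two requests could be prepared in Python as follows. The host name, the bearer-token header, and the helper names are assumptions made for illustration; only the paths and body fields come from this document. Nothing is sent over the network here.

```python
import json
import urllib.request
from typing import List

# Hypothetical host; substitute the real Synapse repository endpoint.
BASE_URL = "https://synapse.example.org/repo/v1"


def build_post(path: str, body: dict, token: str) -> urllib.request.Request:
    """Prepare (but do not send) an authenticated JSON POST request."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: the caller authenticates with a bearer token.
            "Authorization": f"Bearer {token}",
        },
    )


def configuration_body(org_id: str, name: str, synonym_set_ids: List[str],
                       default_analyzer_id: str = "1") -> dict:
    """Body for POST /search/configuration, mirroring the example above."""
    return {
        "organizationId": org_id,
        "name": name,
        "synonymSetIds": synonym_set_ids,
        "defaultAnalyzerId": default_analyzer_id,
        "columnAnalyzerOverrideIds": [],
    }


synonym_set = {
    "organizationId": "42",
    "name": "NF Disease Terms",
    "rules": [
        {"ruleType": "EQUIVALENT", "terms": ["NF1", "neurofibromatosis 1"]},
    ],
}
set_request = build_post("/search/synonym/set", synonym_set, token="<access-token>")

# The id returned for the synonym set ("501" in the example above) feeds the configuration.
config_request = build_post(
    "/search/configuration",
    configuration_body("42", "NF Portal Config", ["501"]),
    token="<access-token>",
)
```

Either request can then be passed to urllib.request.urlopen (or translated to any other HTTP client) once a real host and token are supplied.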

Step 2: Create a Search Index Entity

SearchIndex is a Synapse Entity, not a standalone /search/index resource. Creation goes through the standard entity controller. Use the SearchIndex concrete type and provide a parentId (a Project or Folder):

POST /repo/v1/entity
{ "concreteType": "org.sagebionetworks.repo.model.table.search.SearchIndex", "parentId": "syn123", "name": "Studies Search", "definingSQL": "SELECT studyName, summary, diseaseFocus, species, assay FROM syn52694652", "searchConfigurationId": "301" }

Sample Response (201 Created):

{ "id": "syn101", "concreteType": "org.sagebionetworks.repo.model.table.search.SearchIndex", "parentId": "syn123", "name": "Studies Search", "definingSQL": "SELECT studyName, summary, diseaseFocus, species, assay FROM syn52694652", "searchConfigurationId": "301", "versionNumber": 1, "etag": "aaa-bbb-ccc", "createdBy": "3350396", "createdOn": "2026-02-18T12:00:00.000Z", "modifiedBy": "3350396", "modifiedOn": "2026-02-18T12:00:00.000Z" }

Note: Index state (CREATING, ACTIVE, FAILED, DELETING) is stored in the indexing DB's SEARCH_INDEX_STATUS table and is not a property of the SearchIndex entity itself.

The index build is handled asynchronously by the SearchIndexLifecycleWorker. Builds are triggered on both create and update of the SearchIndex entity.

Step 3: Submit a Search Query

Search operations remain under the /repo/v1/search/* namespace.

POST /repo/v1/search/query/async/start
{ "searchIndexId": "syn101", "queryText": "schwannoma gene expression", "size": 10 }

Response (201 Created):

{ "token": "98765" }

Step 4: Poll for Results

Poll until the job completes. If the index is still building (CREATING), the worker automatically retries — the client only polls this endpoint.

GET /repo/v1/search/query/async/get/98765

Response (200 OK when ready, 202 Accepted while processing):

{ "searchIndexId": "syn101", "totalHits": 3, "hits": [ { "rowId": 42, "score": 8.73, "fields": { "studyName": "Genomic Landscape of Schwannoma", "diseaseFocus": "Schwannomatosis" } } ], "facets": [] }
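The polling contract in Step 4 (202 while processing, 200 when ready) can be sketched as a small Python loop. The fetch callable, the attempt limit, and the interval are illustrative assumptions; fetch stands in for a GET on /repo/v1/search/query/async/get/{token} and returns the HTTP status with the parsed body.

```python
import time
from typing import Callable, Tuple


def poll_search_result(
    fetch: Callable[[], Tuple[int, dict]],
    interval_seconds: float = 1.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job returns 200, waiting between 202 responses."""
    for _ in range(max_attempts):
        status, body = fetch()
        if status == 200:
            return body  # job finished; body is the SearchResults object
        if status != 202:
            raise RuntimeError(f"search job failed with HTTP {status}: {body}")
        sleep(interval_seconds)  # still processing (or index still CREATING)
    raise TimeoutError(f"search job not finished after {max_attempts} polls")
```

Injecting fetch and sleep keeps the loop independent of any particular HTTP client and makes it easy to exercise without a live service.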

Endpoint Summary

All URLs are prefixed with /repo/v1.

Configuration Resources

| Method | URL | Auth | Description |
| --- | --- | --- | --- |
| POST | /search/synonym/set | CREATE on Organization | Create synonym set |
| GET | /search/synonym/set/{synonymSetId} | Public | Get synonym set |
| PUT | /search/synonym/set/{synonymSetId} | UPDATE on Organization | Update synonym set |
| DELETE | /search/synonym/set/{synonymSetId} | DELETE on Organization | Delete synonym set |
| POST | /search/synonym/set/list | Public | List synonym sets |
| POST | /search/column/analyzer/override | CREATE on Organization | Create column analyzer override |
| GET | /search/column/analyzer/override/{columnAnalyzerOverrideId} | Public | Get column analyzer override |
| PUT | /search/column/analyzer/override/{columnAnalyzerOverrideId} | UPDATE on Organization | Update column analyzer override |
| DELETE | /search/column/analyzer/override/{columnAnalyzerOverrideId} | DELETE on Organization | Delete column analyzer override |
| POST | /search/column/analyzer/override/list | Public | List column analyzer overrides |
| GET | /search/text/analyzer/{id} | Public | Get text analyzer |
| POST | /search/text/analyzer/list | Public | List text analyzers |
| POST | /search/configuration | CREATE on Organization | Create search configuration |
| GET | /search/configuration/{searchConfigurationId} | Public | Get search configuration |
| PUT | /search/configuration/{searchConfigurationId} | UPDATE on Organization | Update search configuration |
| DELETE | /search/configuration/{searchConfigurationId} | DELETE on Organization | Delete search configuration |
| POST | /search/configuration/list | Public | List search configurations |

Note: SearchIndex CRUD (create, get, update, delete, trash/restore) is handled by the standard Synapse Entity API under /repo/v1/entity/* and is not enumerated above. SearchIndex behaves like other entity types (e.g., Table, View) with additional search-specific fields.

Search Operations

| Method | URL | Auth | Description |
| --- | --- | --- | --- |
| POST | /search/query/async/start | READ on SearchIndex and read access to source table | Start async search |
| GET | /search/query/async/get/{token} | Same as start | Get async results |
| POST | /search/autocomplete | READ on SearchIndex and read access to source table | Autocomplete |


Security & Authorization

Organization-Scoped ACL for Configuration Resources

Configuration resources (SynonymSet, ColumnAnalyzerOverride, SearchConfiguration) belong to an Organization (via organizationId). Authorization reuses the existing Organization ACL model (same as JSON Schemas):

  • ACL scope: Per-Organization (not per-resource).

  • Public read: All configuration resources are publicly readable (no auth for GET/list).

  • Mutating operations: Require appropriate ACCESS_TYPE on the Organization ACL.

  • Admin bypass: Admins skip ACL checks (UserInfo.isAdmin()).

  • Default ACL: When an Organization is created, the creator gets {READ, CREATE, CHANGE_PERMISSIONS, UPDATE, DELETE}.

| Operation | Required ACCESS_TYPE / Permissions |
| --- | --- |
| Create configuration resource | CREATE on Organization |
| Get / List configuration resources | Public (no auth) |
| Update configuration resource | UPDATE on Organization |
| Delete configuration resource | DELETE on Organization |
| Manage Organization ACL | CHANGE_PERMISSIONS on Organization |

TextAnalyzer endpoints are currently read-only (GET and LIST only) and publicly accessible. System analyzers are managed by the bootstrapper on startup; user-defined analyzer CRUD is not yet exposed via REST.

Data Plane: Search & Autocomplete

Actual behavior (mirrors MaterializedView authorization):

  • Search queries and autocomplete requests are not public.

  • To execute a search or autocomplete request, the caller must have:

    • READ on the SearchIndex entity and

    • Sufficient permissions to read from the source table referenced by definingSQL.

  • The implementation loads data as the anonymous user when building the OpenSearch index:

    • The SearchIndexLifecycleWorker uses TableQueryManager.runQueryAsStream() with the anonymous user.

    • Only publicly accessible rows are indexed. Even if a caller has elevated permissions on the table, the underlying index contains only public rows; search results cannot leak non-public data.

| Operation | Effective Authorization |
| --- | --- |
| Execute search query | READ on SearchIndex entity and read access to underlying table; data indexed only from rows visible to anonymous |
| Autocomplete | Same as search query |
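The two layers of protection described in this section (anonymous-only indexing at build time, dual permission checks at query time) can be modelled with a toy Python sketch. The per-row "public" flag, the Caller class, and the substring matching are hypothetical simplifications, not the service's data model.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Caller:
    can_read_search_index: bool   # READ on the SearchIndex entity
    can_read_source_table: bool   # read access to the table in definingSQL


def rows_to_index(rows: List[Dict]) -> List[Dict]:
    """Build time: keep only rows the anonymous user can see."""
    return [row for row in rows if row["public"]]


def search(caller: Caller, index: List[Dict], text: str) -> List[Dict]:
    """Query time: both permissions are required before any row is returned."""
    if not (caller.can_read_search_index and caller.can_read_source_table):
        raise PermissionError("caller lacks READ on the index or its source table")
    return [row for row in index if text.lower() in row["studyName"].lower()]
```

Because filtering happens when the index is built, even a fully authorized caller can only ever match rows that were public at build time.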


Data Model

This section defines all JSON schema types used across the API. Configuration resources are regular REST objects stored in dedicated tables. SearchIndex is implemented as a Synapse Entity backed by NODE/NODE_REVISION, with a separate indexing DB holding status.

SynonymSet

{ "description": "A shared set of synonym rules. SynonymSets belong to an Organization and can be referenced by SearchConfigurations. Cannot be deleted while referenced.", "properties": { "id": { "type": "string", "readOnly": true }, "organizationId": { "type": "string", "description": "Organization this resource belongs to." }, "name": { "type": "string", "description": "Unique within the organization." }, "description": { "type": "string" }, "rules": { "type": "array", "items": { "$ref": "#SynonymRule" } }, "etag": { "type": "string", "readOnly": true }, "createdOn": { "type": "string", "format": "date-time", "readOnly": true }, "createdBy": { "type": "string", "readOnly": true }, "modifiedOn": { "type": "string", "format": "date-time", "readOnly": true }, "modifiedBy": { "type": "string", "readOnly": true } }, "required": ["organizationId", "name", "rules"] }

SynonymRule

{ "description": "A single synonym rule.", "properties": { "ruleType": { "type": "string", "enum": ["EQUIVALENT", "EXPLICIT"], "description": "EQUIVALENT (bidirectional) or EXPLICIT (one-way expansion)." }, "terms": { "type": "array", "items": { "type": "string" }, "minItems": 2, "description": "For EQUIVALENT: all terms are interchangeable. For EXPLICIT: first term maps to the rest." } }, "required": ["ruleType", "terms"] }

| Rule Type | Behavior | Example |
| --- | --- | --- |
| EQUIVALENT | Bidirectional. All terms are interchangeable. | ["NF1", "neurofibromatosis 1"] |
| EXPLICIT | One-way. First term expands to the rest. | ["OPG", "optic pathway glioma"] |
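One plausible way to hand these rules to an OpenSearch synonym filter is to render them in the Solr synonym format, where a comma-separated line denotes equivalence and "=>" denotes a one-way mapping. The Python sketch below assumes that rendering; this document does not state how the service translates rules, and the function name is hypothetical.

```python
def to_solr_synonym_line(rule: dict) -> str:
    """Render one SynonymRule as a Solr-format synonym line."""
    terms = [term.strip() for term in rule["terms"]]
    if len(terms) < 2:
        raise ValueError("a synonym rule needs at least two terms")
    if rule["ruleType"] == "EQUIVALENT":
        # All terms are interchangeable.
        return ", ".join(terms)
    if rule["ruleType"] == "EXPLICIT":
        # The first term expands to the remaining terms.
        return f"{terms[0]} => {', '.join(terms[1:])}"
    raise ValueError(f"unknown ruleType: {rule['ruleType']}")
```

Terms that themselves contain commas would need escaping before being rendered this way; the sketch does not handle that case.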

ColumnAnalyzerOverride

{ "description": "A shared resource containing per-column analyzer override entries. ColumnAnalyzerOverrides belong to an Organization and can be referenced by SearchConfigurations. Cannot be deleted while referenced.", "properties": { "id": { "type": "string", "readOnly": true }, "organizationId": { "type": "string" }, "name": { "type": "string", "description": "Unique within the organization." }, "description": { "type": "string" }, "overrides": { "type": "array", "items": { "$ref": "#ColumnAnalyzerOverrideEntry" } }, "etag": { "type": "string", "readOnly": true }, "createdOn": { "type": "string", "format": "date-time", "readOnly": true }, "createdBy": { "type": "string", "readOnly": true }, "modifiedOn": { "type": "string", "format": "date-time", "readOnly": true }, "modifiedBy": { "type": "string", "readOnly": true } }, "required": ["organizationId", "name", "overrides"] }

ColumnAnalyzerOverrideEntry

{ "description": "A per-column analyzer override entry. Specifies which analyzers to use at index and search time for a specific column.", "properties": { "columnName": { "type": "string", "description": "Must exist in the entity's schema." }, "indexAnalyzerId": { "type": "string", "description": "The ID of the TextAnalyzer to use when indexing this column." }, "searchAnalyzerId": { "type": "string", "description": "The ID of the TextAnalyzer to use when searching this column." } }, "required": ["columnName", "indexAnalyzerId", "searchAnalyzerId"] }

TextAnalyzer

{ "description": "A database-stored text analyzer configuration. System analyzers have IDs 1-999 and are read-only; user-defined analyzers start at 1000+.", "properties": { "id": { "type": "string", "readOnly": true }, "name": { "type": "string", "description": "Unique within the organization (or globally for system analyzers)." }, "description": { "type": "string" }, "organizationId": { "type": "string", "description": "Null for system analyzers." }, "settings": { "$ref": "#TextAnalyzerSettings", "description": "The analyzer configuration." }, "isSystem": { "type": "boolean", "description": "True for system analyzers (read-only)." }, "etag": { "type": "string", "readOnly": true }, "createdOn": { "type": "string", "format": "date-time", "readOnly": true }, "modifiedOn": { "type": "string", "format": "date-time", "readOnly": true } } }

TextAnalyzerSettings

{ "description": "The OpenSearch analyzer configuration. Stores the full definition of how text is analyzed.", "properties": { "charFilters": { "type": "map(string, string)", "description": "Named character filter definitions. Values are JSON-serialized config objects." }, "tokenizer": { "type": "string", "description": "Tokenizer name (e.g., 'standard', 'whitespace', 'keyword')." }, "tokenizerConfig": { "type": "map(string, string)", "description": "Optional custom tokenizer config." }, "tokenFilters": { "type": "map(string, string)", "description": "Named token filter definitions. Values are JSON-serialized config objects." }, "filterOrder": { "type": "array", "items": { "type": "string" }, "description": "Ordered list of token filter names to apply." }, "charFilterOrder": { "type": "array", "items": { "type": "string" }, "description": "Ordered list of character filter names." }, "synonymAware": { "type": "boolean", "description": "Whether synonym filter should be appended when synonyms are configured." } } }

SearchConfiguration

{ "description": "A reusable search configuration resource. References SynonymSets, ColumnAnalyzerOverrides, and a default text analyzer. Can be associated with projects via the Project Settings framework.", "properties": { "id": { "type": "string", "readOnly": true }, "organizationId": { "type": "string" }, "name": { "type": "string", "description": "Unique within the organization." }, "description": { "type": "string" }, "synonymSetIds": { "type": "array", "items": { "type": "string" } }, "columnAnalyzerOverrideIds": { "type": "array", "items": { "type": "string" } }, "defaultAnalyzerId": { "type": "string", "description": "Default text analyzer ID for columns without an explicit override." }, "etag": { "type": "string", "readOnly": true }, "createdOn": { "type": "string", "format": "date-time", "readOnly": true }, "createdBy": { "type": "string", "readOnly": true }, "modifiedOn": { "type": "string", "format": "date-time", "readOnly": true }, "modifiedBy": { "type": "string", "readOnly": true } }, "required": ["organizationId", "name"] }
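The fallback from per-column overrides to defaultAnalyzerId can be sketched in Python as below. The function name is hypothetical, and the sketch assumes the entries of all referenced ColumnAnalyzerOverrides have already been gathered into one list, with the first match winning.

```python
from typing import List, Optional, Tuple


def resolve_analyzers(
    column_name: str,
    override_entries: List[dict],
    default_analyzer_id: Optional[str],
) -> Tuple[Optional[str], Optional[str]]:
    """Return (indexAnalyzerId, searchAnalyzerId) for a column."""
    for entry in override_entries:
        if entry["columnName"] == column_name:
            return entry["indexAnalyzerId"], entry["searchAnalyzerId"]
    # No explicit override: the configuration default applies to both phases.
    # What happens when defaultAnalyzerId is also unset is not specified here.
    return default_analyzer_id, default_analyzer_id
```

For example, with a single override on "summary", that column gets its own index and search analyzers while every other column resolves to the default.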

SearchIndex

The SearchIndex schema has only two explicit properties (everything else is inherited from Entity):

{ "description": "An OpenSearch index definition for a specific entity. Represented as a Synapse Entity.", "properties": { "definingSQL": { "type": "string", "description": "Required. Must reference exactly one entity." }, "searchConfigurationId": { "type": "string", "description": "Optional reference to a SearchConfiguration." } }, "required": ["definingSQL"] }

Note: Index state (CREATING, ACTIVE, FAILED, DELETING) is stored in the indexing DB's SEARCH_INDEX_STATUS table and is not a property of the SearchIndex entity itself.

As a Synapse Entity, SearchIndex also includes the standard entity fields inherited from Entity, such as:

  • id

  • concreteType (must be "org.sagebionetworks.repo.model.table.search.SearchIndex")

  • parentId (Project or Folder)

  • name

  • etag

  • createdOn, createdBy

  • modifiedOn, modifiedBy

  • versionNumber


SearchIndexState

Index state is stored in the indexing DB, not on the entity itself.

| Value | Description |
| --- | --- |
| CREATING | Index build in progress. Queries submitted against this index will automatically retry until the build completes. |
| ACTIVE | Index is live and serving queries. |
| FAILED | Last build failed. Queries will fail with an error. Update the SearchIndex to trigger a rebuild, or delete and recreate it. |
| DELETING | AOSS index cleanup in progress. |
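How a query worker might branch on these states is sketched below in Python. The QueryAction enum and function name are hypothetical, and treating DELETING as an error is an assumption, since only the CREATING, ACTIVE, and FAILED behaviors are described above.

```python
from enum import Enum


class SearchIndexState(Enum):
    CREATING = "CREATING"
    ACTIVE = "ACTIVE"
    FAILED = "FAILED"
    DELETING = "DELETING"


class QueryAction(Enum):
    RUN = "RUN"
    RETRY_LATER = "RETRY_LATER"


def query_action(state: SearchIndexState) -> QueryAction:
    """Decide what to do with a query given the current index state."""
    if state is SearchIndexState.ACTIVE:
        return QueryAction.RUN
    if state is SearchIndexState.CREATING:
        return QueryAction.RETRY_LATER  # the worker retries; the client keeps polling
    # FAILED is documented as an error; handling DELETING the same way is an assumption.
    raise RuntimeError(f"index is {state.value}; queries cannot be served")
```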

SearchQuery

{ "description": "A structured query against a SearchIndex's OpenSearch index.", "properties": { "searchIndexId": { "type": "string", "description": "The ID of the SearchIndex entity to query. Supplied by the client in the request body." }, "queryType": { "type": "string", "enum": [ "SIMPLE_QUERY_STRING", "MATCH", "MULTI_MATCH", "MATCH_PHRASE", "PREFIX", "WILDCARD", "MATCH_ALL" ], "description": "Full-text query type. Default: SIMPLE_QUERY_STRING." }, "queryText": { "type": "string", "description": "Search text. Null/empty = match all." }, "queryFields": { "type": "array", "items": { "type": "string" }, "description": "Column names with optional boost (e.g., 'studyName^3'). Empty = all indexed fields." }, "booleanFilters": { "type": "array", "items": { "$ref": "#KeyValue" }, "description": "Exact-match filters." }, "termsFilters": { "type": "array", "items": { "$ref": "#KeyValues" }, "description": "Multi-value filters (IN clause)." }, "rangeFilters": { "type": "array", "items": { "$ref": "#KeyRange" }, "description": "Range filters with min/max." }, "existsFilters": { "type": "array", "items": { "type": "string" }, "description": "Columns that must have a non-null value." }, "notExistsFilters": { "type": "array", "items": { "type": "string" }, "description": "Columns that must be null/missing." }, "fuzziness": { "type": "string", "description": "Typo tolerance: 'AUTO', '0', '1', '2'." }, "facetRequests": { "type": "array", "items": { "$ref": "#FacetRequest" }, "description": "Columns to aggregate as facets." }, "returnFields": { "type": "array", "items": { "type": "string" }, "description": "Columns to include in results. Empty = all." }, "sort": { "type": "array", "items": { "$ref": "#SortField" }, "description": "Sort order. Default: relevance descending." }, "highlight": { "type": "boolean", "description": "Return highlighted snippets. Default: false." }, "from": { "type": "integer", "description": "Zero-based pagination offset. Default: 0." }, "size": { "type": "integer", "description": "Results per page. Default: 25. Max: 100." } }, "required": ["searchIndexId"] }
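A client-side helper for assembling a SearchQuery body might look like the Python sketch below. It applies the documented defaults (size 25, from 0) and the size cap of 100; the helper name and the decision to reject rather than clamp out-of-range values are assumptions.

```python
from typing import List, Optional

DEFAULT_SIZE = 25
MAX_SIZE = 100


def build_search_query(
    search_index_id: str,
    query_text: Optional[str] = None,
    *,
    query_fields: Optional[List[str]] = None,
    size: int = DEFAULT_SIZE,
    offset: int = 0,
) -> dict:
    """Build a SearchQuery body; offset maps to the schema's 'from' field."""
    if not search_index_id:
        raise ValueError("searchIndexId is required")
    if not 0 <= size <= MAX_SIZE:
        raise ValueError(f"size must be between 0 and {MAX_SIZE}")
    if offset < 0:
        raise ValueError("from must be zero or greater")
    query = {"searchIndexId": search_index_id, "from": offset, "size": size}
    if query_text:  # null/empty text means "match all", so the field is omitted
        query["queryText"] = query_text
    if query_fields:  # empty means "all indexed fields"
        query["queryFields"] = list(query_fields)
    return query
```

Calling build_search_query("syn101", "schwannoma gene expression", size=10) yields a body equivalent to the Quick Start example, with the default offset made explicit.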

SearchResults

{ "description": "Results of a search query against a SearchIndex's OpenSearch index.", "properties": { "searchIndexId": { "type": "string" }, "totalHits": { "type": "integer" }, "hits": { "type": "array", "items": { "$ref": "#SearchHit" } }, "facets": { "type": "array", "items": { "$ref": "#FacetColumnResult" } }, "from": { "type": "integer" } }, "required": ["searchIndexId", "totalHits", "hits"] }

SearchHit

{ "description": "A single search result hit.", "properties": { "rowId": { "type": "integer" }, "rowVersion": { "type": "integer" }, "score": { "type": "number" }, "fields": { "type": "object", "additionalProperties": { "type": "string" } }, "highlights": { "type": "object", "additionalProperties": { "type": "string" } } }, "required": ["rowId", "rowVersion", "score", "fields"] }
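Paging through results follows from totalHits, from, and the number of hits returned. The Python sketch below computes the next offset; the function name is hypothetical, and because "from" is not a required field of SearchResults it falls back to the offset the client originally requested.

```python
from typing import Optional


def next_from(results: dict, requested_from: int = 0) -> Optional[int]:
    """Return the offset for the next page, or None when no pages remain."""
    hits = results["hits"]
    start = results.get("from", requested_from)
    end = start + len(hits)
    if not hits or end >= results["totalHits"]:
        return None
    return end
```

The returned value can be passed as the "from" field of the next SearchQuery, keeping size unchanged.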