
Text Search in Discussion Forums

Introduction

This is an API design document for adding text search to Synapse discussion forums.

Synapse currently has a variety of text search features:

  • People Search - Used to find people in Synapse by their real name or user name.

  • Entity Search - An advanced text searching tool to find Entities (files, folders, projects…) that combines both unstructured text search and structured faceted search.

  • Table Search - An advanced text searching tool to find rows in a Table using a combination of both unstructured text search and structured SQL queries.

 

The original discussion forum design included a search box in all of the mock-ups. I believe it was our intention to add search as a feature to discussions from the beginning.

Requirements

  • A user should be able to find all threads and replies within a single project’s discussion forum that match a provided text string.

  • The search results should be ranked by relevance.

  • A user must have the READ permission on the project to see any search results for that project’s forum.

Out of Scope

  • Faceted navigation of search results.

  • Searching across all projects.

  • Integration of discussion forum search with Synapse entity search.

API Design

A single new API will be added to the existing Discussion Services:

| Response | URL | Request | Description |
|---|---|---|---|
| ForumSearchResult | PUT /forum/{forumId}/search | ForumSearchRequest | Execute the provided search request against the forum identified in the URL. This service will return a single page of results. The results will be ordered with the most relevant first. |
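As a rough illustration of the call described above, the following sketch builds (but does not send) the PUT request with the JDK's `java.net.http` client. The base URL, the class name `ForumSearchRequestSketch`, and the hand-rolled JSON encoding are assumptions made for this example. Authentication is left out entirely.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class ForumSearchRequestSketch {

    // Assumption: placeholder base URL, to be replaced with the real repository endpoint.
    static final String BASE_URL = "https://example.org/repo/v1";

    // Minimal escaping of backslashes and double quotes; control characters are not handled.
    static String escapeJson(String text) {
        return text.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Serializes the two ForumSearchRequest fields; nextPageToken is omitted when null.
    static String toJson(String searchString, String nextPageToken) {
        StringBuilder json = new StringBuilder("{\"searchString\":\"")
                .append(escapeJson(searchString)).append('"');
        if (nextPageToken != null) {
            json.append(",\"nextPageToken\":\"").append(escapeJson(nextPageToken)).append('"');
        }
        return json.append('}').toString();
    }

    // Builds PUT /forum/{forumId}/search with the request object as the JSON body.
    static HttpRequest buildSearchRequest(String forumId, String searchString, String nextPageToken) {
        if (searchString == null || searchString.isEmpty()) {
            throw new IllegalArgumentException("searchString must be non-null and non-empty");
        }
        return HttpRequest.newBuilder(URI.create(BASE_URL + "/forum/" + forumId + "/search"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(toJson(searchString, nextPageToken)))
                .build();
    }
}
```

For the first page the caller would pass `null` as the token, for example `buildSearchRequest("123", "rna", null)`.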

ForumSearchRequest:

| type | name | description |
|---|---|---|
| String | searchString | A non-null, non-empty string containing the user’s text search. |
| String | nextPageToken | If a previous search result included a non-null nextPageToken, then there are more results available. Forward the provided nextPageToken to get the next page of results. |

ForumSearchResults:

| type | name | description |
|---|---|---|
| List<Match> | matches | A single page of matches for the provided request. |
| String | nextPageToken | When a non-null nextPageToken is provided, then another page of results exists. Forward this token in another request to fetch the next page. |

Match:

| type | name | description |
|---|---|---|
| String | threadId | The ID of the matching thread. If this is a match to the text of the thread, the replyId will be null. |
| String | replyId | If this is a match to a reply within a thread, the replyId will have a non-null value. |
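The paging contract above could be consumed roughly as follows. This is a minimal sketch, not the Synapse client: the Java classes mirror the three objects from this section, while `fetchAllMatches`, `isReplyMatch`, and the in-memory `fakeSearch` stand-in for the service are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

final class ForumSearchPagingSketch {

    static final class Match {
        final String threadId;
        final String replyId; // null when the thread text itself matched
        Match(String threadId, String replyId) {
            this.threadId = threadId;
            this.replyId = replyId;
        }
        boolean isReplyMatch() {
            return replyId != null;
        }
    }

    static final class ForumSearchRequest {
        final String searchString;
        final String nextPageToken;
        ForumSearchRequest(String searchString, String nextPageToken) {
            this.searchString = searchString;
            this.nextPageToken = nextPageToken;
        }
    }

    static final class ForumSearchResults {
        final List<Match> matches;
        final String nextPageToken;
        ForumSearchResults(List<Match> matches, String nextPageToken) {
            this.matches = matches;
            this.nextPageToken = nextPageToken;
        }
    }

    // Requests pages until the service returns a null nextPageToken.
    static List<Match> fetchAllMatches(String searchString,
            Function<ForumSearchRequest, ForumSearchResults> searchCall) {
        List<Match> all = new ArrayList<>();
        String token = null; // the first request carries no token
        do {
            ForumSearchResults page = searchCall.apply(new ForumSearchRequest(searchString, token));
            all.addAll(page.matches);
            token = page.nextPageToken;
        } while (token != null);
        return all;
    }

    // Hypothetical stand-in for the service: two pages of canned results.
    static ForumSearchResults fakeSearch(ForumSearchRequest request) {
        if (request.nextPageToken == null) {
            return new ForumSearchResults(
                    Arrays.asList(new Match("101", null), new Match("101", "5001")), "page-2");
        }
        return new ForumSearchResults(
                Collections.singletonList(new Match("102", "5002")), null);
    }
}
```

Calling `fetchAllMatches("rna", ForumSearchPagingSketch::fakeSearch)` walks both canned pages. A real client would more likely fetch the next page only when the user asks for it.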

Implementation

Currently, the maximum number of threads for a single discussion forum is 667, while the average is 10.1. Similarly, the maximum number of replies to a single discussion thread is 90, while the average is 3.1. Since these are fairly small numbers and we do not require advanced search features like faceted navigation, we recommend an implementation based on the MySQL Full-Text search feature. This is by far the most economical option since we can leverage the feature in our existing database without incurring any additional costs.

The implementation will be as follows:

  • Events - All discussion thread and reply CRUD operations will generate change events. Since these events are migrated, any change events that occur on production will be “re-played” on staging.

  • Worker - A new indirect worker will be added that implements BatchChangeMessageDrivenRunner. This worker’s role is to listen to discussion CRUD events and drive the DiscussionSearchIndexManager.

  • DiscussionSearchIndexManager - A new manager that will contain all business logic needed to build a database-backed search index. The basic workflow involves loading metadata from the appropriate DAO (DiscussionReplyDAO or DiscussionThreadDAO), then loading the full text from S3, then pushing the text to the new DiscussionSearchIndexDao.

  • DiscussionSearchIndexDao - This DAO is a layer of abstraction between the MySQL database and Java objects. It should encapsulate the JDBC logic for reading/writing the new DISCUSSION_SEARCH_INDEX table.

  • DISCUSSION_SEARCH_INDEX - A new table in the main MySQL database. Discussion searches will query this table. The data in this table will not migrate.

  • DiscussionController - Extend the DiscussionController to include the new API from above. The new method will call the DiscussionSearchIndexManager via the DiscussionService.
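The event-to-index flow in the list above might look roughly like the sketch below. Everything here is illustrative: the `ChangeEvent` shape, the method names, and the in-memory DAO are assumptions. The `textLoader` function stands in for the metadata lookup plus the S3 read, and the worker is reduced to a direct call to `processChange`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class DiscussionIndexingSketch {

    enum ChangeType { CREATE, UPDATE, DELETE }

    // Hypothetical change event for a thread or a reply.
    static final class ChangeEvent {
        final ChangeType type;
        final long forumId;
        final long threadId;
        final Long replyId; // null when the event concerns the thread itself
        ChangeEvent(ChangeType type, long forumId, long threadId, Long replyId) {
            this.type = type;
            this.forumId = forumId;
            this.threadId = threadId;
            this.replyId = replyId;
        }
    }

    // Assumed shape of the DAO in front of DISCUSSION_SEARCH_INDEX.
    interface SearchIndexDao {
        void createOrUpdate(long forumId, long threadId, Long replyId, String text);
        void delete(long forumId, long threadId, Long replyId);
    }

    // In-memory stand-in for the JDBC-backed DAO, keyed by the three IDs.
    static final class InMemorySearchIndexDao implements SearchIndexDao {
        final Map<String, String> rows = new HashMap<>();
        private static String key(long forumId, long threadId, Long replyId) {
            return forumId + "/" + threadId + "/" + replyId;
        }
        @Override
        public void createOrUpdate(long forumId, long threadId, Long replyId, String text) {
            rows.put(key(forumId, threadId, replyId), text);
        }
        @Override
        public void delete(long forumId, long threadId, Long replyId) {
            rows.remove(key(forumId, threadId, replyId));
        }
    }

    // Manager logic: load the full text for creates and updates, remove the row for deletes.
    static final class SearchIndexManager {
        private final Function<ChangeEvent, String> textLoader;
        private final SearchIndexDao dao;
        SearchIndexManager(Function<ChangeEvent, String> textLoader, SearchIndexDao dao) {
            this.textLoader = textLoader;
            this.dao = dao;
        }
        void processChange(ChangeEvent event) {
            if (event.type == ChangeType.DELETE) {
                dao.delete(event.forumId, event.threadId, event.replyId);
                return;
            }
            String text = textLoader.apply(event);
            dao.createOrUpdate(event.forumId, event.threadId, event.replyId, text);
        }
    }
}
```

Keeping the DAO behind an interface lets the manager logic be exercised without a database, which is the main reason for the split shown here.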

 

DISCUSSION_SEARCH_INDEX:

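-- Note: MySQL requires every PRIMARY KEY column to be NOT NULL and applies that implicitly,
-- so a thread-level row cannot store NULL in REPLY_ID under this key.
CREATE TABLE IF NOT EXISTS `DISCUSSION_SEARCH_INDEX` (
  `FORUM_ID` BIGINT NOT NULL,
  `THREAD_ID` BIGINT NOT NULL,
  `REPLY_ID` BIGINT,
  `TEXT` TEXT NOT NULL,
  PRIMARY KEY (`FORUM_ID`, `THREAD_ID`, `REPLY_ID`),
  FULLTEXT idx (`TEXT`)
);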
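A search against this table could be issued through JDBC along the following lines. This is a sketch under assumptions, not the planned DAO: the class and method names are hypothetical, and translating a nextPageToken into the `limit` and `offset` arguments is left to the caller. The query uses MySQL's natural language full-text mode and orders by the relevance score it returns.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class DiscussionSearchQuerySketch {

    // MATCH ... AGAINST appears twice: once to expose the score, once to filter rows.
    // The ID columns act as tie-breakers so paging stays stable for equal scores.
    static final String SEARCH_SQL =
            "SELECT `THREAD_ID`, `REPLY_ID`,"
            + " MATCH (`TEXT`) AGAINST (? IN NATURAL LANGUAGE MODE) AS `SCORE`"
            + " FROM `DISCUSSION_SEARCH_INDEX`"
            + " WHERE `FORUM_ID` = ?"
            + " AND MATCH (`TEXT`) AGAINST (? IN NATURAL LANGUAGE MODE)"
            + " ORDER BY `SCORE` DESC, `THREAD_ID`, `REPLY_ID`"
            + " LIMIT ? OFFSET ?";

    // Returns one page of {threadId, replyId} pairs, most relevant first.
    static List<String[]> search(Connection connection, long forumId, String searchString,
            long limit, long offset) throws SQLException {
        List<String[]> matches = new ArrayList<>();
        try (PreparedStatement statement = connection.prepareStatement(SEARCH_SQL)) {
            statement.setString(1, searchString);
            statement.setLong(2, forumId);
            statement.setString(3, searchString);
            statement.setLong(4, limit);
            statement.setLong(5, offset);
            try (ResultSet rows = statement.executeQuery()) {
                while (rows.next()) {
                    String threadId = rows.getString("THREAD_ID");
                    // getString yields null only if the row stores SQL NULL in REPLY_ID.
                    String replyId = rows.getString("REPLY_ID");
                    matches.add(new String[] {threadId, replyId});
                }
            }
        }
        return matches;
    }
}
```

Restricting by `FORUM_ID` keeps the search scoped to a single project's forum, in line with the requirements. The READ permission check is assumed to happen before this query runs.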