  1. MySQL Full Text Index:

    1. A table can be configured to support full text search; the user selects which columns to index

    2. Given MySQL's limits on indexes, a special column (potentially in a special table) contains the concatenated text of the selected columns

    3. Provide a special construct in the SQL language

    4. In some cases the whole data set will need to be re-indexed (e.g. if we drop/add a column, since we concatenate the text)

    5. Can be extended in the future with more advanced features as needed (e.g. we can set up stemming). We could also explore Apache Lucene (Elasticsearch is based on it) or OpenNLP
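
As a rough illustration of option 1, the sketch below builds the concatenated text for a row and the MySQL statements that would create and query a full text index on it. The column name `SEARCH_CONTENT`, the `ROW_ID` column, the index naming and the helper names are assumptions made for this example, not an existing Synapse schema. The `%s` placeholder assumes a Python driver that uses the `format` parameter style.

```python
def concat_search_text(row, indexed_columns):
    """Join the values of the selected columns into one searchable string."""
    parts = []
    for column in indexed_columns:
        value = row.get(column)
        # Skip missing or blank values so they do not add stray spaces.
        if value is not None and str(value).strip():
            parts.append(str(value).strip())
    return " ".join(parts)


def fulltext_ddl(table_name, search_column="SEARCH_CONTENT"):
    """DDL that adds a FULLTEXT index on the (hypothetical) search column."""
    return (
        f"ALTER TABLE `{table_name}` "
        f"ADD FULLTEXT INDEX `{table_name}_FTS` (`{search_column}`)"
    )


def fulltext_query(table_name, search_column="SEARCH_CONTENT"):
    """Parameterized query; the search terms are bound by the driver."""
    return (
        f"SELECT ROW_ID FROM `{table_name}` "
        f"WHERE MATCH(`{search_column}`) AGAINST (%s IN NATURAL LANGUAGE MODE)"
    )


row = {"title": "Gene expression", "notes": None, "tissue": " liver "}
print(concat_search_text(row, ["title", "notes", "tissue"]))
print(fulltext_ddl("T123"))
print(fulltext_query("T123"))
```

Because the stored text depends on the list of selected columns, adding or dropping a column changes the output of `concat_search_text` for every row, which is why a full re-index is needed in that case.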

  2. Synapse Elasticsearch Cluster with one index-per-table configuration

    1. Set up an Elasticsearch cluster with the Synapse stack

    2. A table can be configured to be indexed with a one-index-per-table configuration

    3. Provide the users with a special search API, or integrate with the SQL language

    4. Limit the search results to a maximum value (e.g. 1000) and integrate the results with the faceted results

    5. In some cases a full re-indexing might be necessary, but with the one-index-per-table configuration we might be able to run modify-by-query operations instead

    6. We can extend it to allow the users to set up the index configuration as they see fit (e.g. custom analyzers, stop words, suggesters etc.)

    7. We can extend it to set up a default query “template” and let the users customize it per table
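
A minimal sketch of what option 2 could look like on the request side, using plain dictionaries in the Elasticsearch query DSL rather than a client library. The index naming scheme, the analyzer name `table_text` and the helper names are assumptions for illustration only.

```python
MAX_RESULTS = 1000  # cap from the design; the exact value is still open


def index_name(table_id):
    """One index per table; Elasticsearch index names must be lowercase."""
    return f"synapse-table-{table_id}".lower()


def index_settings(stop_words=None):
    """Index creation body with an optional per-table stop word list."""
    analyzer = {"type": "standard"}
    if stop_words:
        analyzer["stopwords"] = list(stop_words)
    return {"settings": {"analysis": {"analyzer": {"table_text": analyzer}}}}


def search_body(text, fields, size=MAX_RESULTS):
    """Default query template: match the text against the given fields."""
    return {
        "size": min(size, MAX_RESULTS),  # never return more than the cap
        "query": {"multi_match": {"query": text, "fields": list(fields)}},
    }


print(index_name("syn123"))
print(index_settings(["the", "of"]))
print(search_body("liver tissue", ["title", "description"], size=5000))
```

Keeping the default template in one function like `search_body` would leave room for a per-table override later without changing the callers.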

  3. Synapse Elasticsearch Cluster with single index configuration (can be turned into multiple indexes for scalability)

    1. Similar to 2., but we instead construct a special document containing the concatenated values, with the table id as a field

    2. Full re-indexing might be necessary for some operations (add/drop column)

    3. No customization for the index or query

    4. We need to be careful not to leak information across tables (e.g. suggestion APIs might return data from different tables)
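
For option 3, the sketch below shows one way the shared-index document and a table-scoped query could be shaped. The field names `table_id`, `row_id` and `text`, the index name and the helper names are assumptions for this example. The `term` filter assumes `table_id` is mapped as a keyword field.

```python
SHARED_INDEX = "synapse-tables"  # hypothetical name of the single shared index


def build_document(table_id, row_id, values):
    """One document per row: concatenated text plus the owning table id."""
    text = " ".join(str(v) for v in values if v is not None)
    return {"table_id": table_id, "row_id": row_id, "text": text}


def scoped_search_body(table_id, text, size=1000):
    """Search the shared index but only return rows of the given table."""
    return {
        "size": size,
        "query": {
            "bool": {
                "must": [{"match": {"text": text}}],
                # Without this filter a query could match rows of other tables.
                "filter": [{"term": {"table_id": table_id}}],
            }
        },
    }


print(build_document("syn123", 7, ["liver", None, 42]))
print(scoped_search_body("syn123", "liver"))
```

Every request against the shared index would need to go through something like `scoped_search_body`, since any API that skips the table filter (suggesters included) could surface text from other tables.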

  4. External Elasticsearch Cluster with one index-per-table configuration

    1. Similar to 2., but the user specifies the cluster endpoint (e.g. similar to what we do for custom storage locations) and we verify ownership. The cluster setup would need to give the Synapse account access to a specific index.

    2. Synapse handles the index synchronization. There might be some issues with the fact that tables are actually rebuilt with every stack (e.g. the users might not be happy with us rebuilding indexes every week)