...

The code used can be found in the following repository: https://github.com/marcomarasca/synapse-table-search

MySQL Setup

An RDS MySQL instance (version 8.0.16, db.t3.small instance class) was used, and a dedicated table was created by importing the data from the Synapse table. Only the STRING, STRING_LIST and LARGETEXT columns were imported, for a total of 46 columns. The table structure is very similar to that of a real Synapse table (the same data types were used).

...

Code Block
host: dev-marco-db.cdusmwdhqvso.us-east-1.rds.amazonaws.com
db: devmarco
user: devmarcouser
password: platform
table: SEARCH_TEST

Elasticsearch Setup

We set up an AWS Elasticsearch cluster in a VPC with a single data node (t3.small.elasticsearch instance) and no dedicated master node. The setup was initially done using fine-grained access control, with an IAM user performing the import. Authentication was later switched to the internal user management with a dedicated user, so that queries could be run from the command line for testing.

...

Code Block
Endpoint: https://vpc-tables-search-test-es7bt4peajix4wokysfxfldqoy.us-east-1.es.amazonaws.com
Kibana Console: https://vpc-tables-search-test-es7bt4peajix4wokysfxfldqoy.us-east-1.es.amazonaws.com/_plugin/kibana/
user: devmarco
password: Platform?es2021
indexes: syn26050977_index_default, syn26050977_index_eng

Queries can be run using curl, for example:

Code Block
curl -XGET -u 'devmarco:Platform?es2021' 'https://vpc-tables-search-test-es7bt4peajix4wokysfxfldqoy.us-east-1.es.amazonaws.com/syn26050977_index_default/_search?q=testtumor&pretty=true'
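The same query can also be issued from a script. The following is a minimal sketch using only the Python standard library; the helper names (`build_search_url`, `build_request`, `search`) are hypothetical and not part of the linked repository. It assumes the caller has network access to the VPC endpoint and passes in the credentials listed above.

```python
import base64
import json
import urllib.parse
import urllib.request

# Endpoint of the test domain, as listed in the setup details above.
ENDPOINT = "https://vpc-tables-search-test-es7bt4peajix4wokysfxfldqoy.us-east-1.es.amazonaws.com"


def build_search_url(index, query, endpoint=ENDPOINT):
    """Build the URI-search URL used by the curl example (q=...&pretty=true)."""
    params = urllib.parse.urlencode({"q": query, "pretty": "true"})
    return f"{endpoint}/{index}/_search?{params}"


def build_request(index, query, user, password):
    """Create a GET request with HTTP basic auth, equivalent to `curl -u`."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(build_search_url(index, query), method="GET")
    request.add_header("Authorization", f"Basic {token}")
    return request


def search(index, query, user, password):
    """Send the request and return the parsed JSON response.

    This performs a real network call and only works from inside the VPC.
    """
    with urllib.request.urlopen(build_request(index, query, user, password)) as response:
        return json.load(response)
```

Calling `search("syn26050977_index_default", "testtumor", user, password)` would then mirror the curl command; the response is returned as a dictionary, so the matching documents should be available under `["hits"]["hits"]`.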

Only the STRING, STRING_LIST and LARGETEXT columns were imported into the indexes. No static mapping was defined beforehand; we let Elasticsearch map the fields dynamically. Multi-value columns were set in the documents as arrays.
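To illustrate the document shape described above, here is a rough sketch of how a table row could be turned into a document. It is not the import code from the repository: the function name, the column names in the usage note and the assumption that STRING_LIST values arrive as JSON-encoded strings are all hypothetical.

```python
import json

# Column types that were imported into the indexes.
TEXT_COLUMN_TYPES = {"STRING", "STRING_LIST", "LARGETEXT"}


def row_to_document(row, column_types):
    """Convert one table row (dict) into a document for indexing.

    Columns of other types and null values are skipped; STRING_LIST
    values become arrays so that dynamic mapping sees multiple values.
    """
    document = {}
    for name, value in row.items():
        column_type = column_types.get(name)
        if column_type not in TEXT_COLUMN_TYPES or value is None:
            continue
        if column_type == "STRING_LIST":
            # Assumption: list values are stored as JSON-encoded strings.
            document[name] = json.loads(value) if isinstance(value, str) else list(value)
        else:
            document[name] = value
    return document
```

For example, with a hypothetical schema `{"name": "STRING", "tags": "STRING_LIST", "age": "INTEGER"}`, the row `{"name": "sample", "tags": '["a", "b"]', "age": 3}` yields `{"name": "sample", "tags": ["a", "b"]}`.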

The syn26050977_index_default index is "as-is", created simply by submitting the documents. The syn26050977_index_eng index was instead configured to use the pre-configured English analyzer as its default analyzer (see https://www.elastic.co/guide/en/elasticsearch/reference/7.x/analysis-lang-analyzer.html#english-analyzer; note that this is the Elastic.co documentation, as I could not find an equivalent in the AWS, Open Distro or OpenSearch docs).
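As a sketch of what such a configuration could look like, the snippet below builds the index-creation body that registers the built-in `english` analyzer under the name `default`, which is how Elasticsearch 7.x selects an index-wide default analyzer. The exact settings used for syn26050977_index_eng are not recorded here, so this is an assumption, and the function names are hypothetical.

```python
import json
import urllib.request


def english_default_settings():
    """Index settings that make the built-in English analyzer the default."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    # An analyzer named "default" applies to every text field
                    # that does not declare its own analyzer.
                    "default": {"type": "english"}
                }
            }
        }
    }


def build_create_index_request(endpoint, index):
    """Build the PUT request that creates the index with these settings.

    Authentication headers are left out and would have to be added
    before sending the request.
    """
    body = json.dumps(english_default_settings()).encode("utf-8")
    request = urllib.request.Request(f"{endpoint}/{index}", data=body, method="PUT")
    request.add_header("Content-Type", "application/json")
    return request
```

The settings have to be supplied when the index is created, before any document is submitted, since the analyzer of an existing field cannot be changed without reindexing.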