Some example queries are taken from: https://sagebionetworks.jira.com/browse/PLFM-6876

The code used can be found in the following repository: https://github.com/marcomarasca/synapse-table-search

MySQL

An RDS MySQL instance (version 8.0.16, instance class db.t3.small) was used, and a dedicated table was created by importing the data from the Synapse table. Only STRING, STRING_LIST and LARGETEXT columns were imported, for a total of 46 columns. The table structure is very similar to that of a real Synapse table (the same data types were used).

MySQL limits the number of secondary indexes per table to 64 and the number of columns in a single index to 16. With 46 columns, one index cannot cover all of them, so we added a special column that contains the concatenated values of all the columns and created a FULLTEXT index on that column.
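The concatenated-column approach described above can be sketched as follows. This is a minimal illustration, not the code from the repository: the column name `SEARCH_CONTENT`, the index name, the `MEDIUMTEXT` type, the use of `CONCAT_WS` and the example source columns `COL_A` and `COL_B` are all assumptions. The sketch only builds the SQL strings and does not connect to the database.

```python
from typing import List


def quote_ident(name: str) -> str:
    """Quote a MySQL identifier with backticks, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_search_column_statements(
    table: str,
    columns: List[str],
    search_column: str = "SEARCH_CONTENT",  # hypothetical column name
) -> List[str]:
    """Return the statements that add, populate and index the search column."""
    t = quote_ident(table)
    s = quote_ident(search_column)
    cols = ", ".join(quote_ident(c) for c in columns)
    index_name = quote_ident(search_column + "_IDX")  # hypothetical index name
    return [
        # 1. Add a plain text column to hold the concatenated values.
        f"ALTER TABLE {t} ADD COLUMN {s} MEDIUMTEXT",
        # 2. Fill it; CONCAT_WS skips NULL arguments instead of returning NULL.
        f"UPDATE {t} SET {s} = CONCAT_WS(' ', {cols})",
        # 3. A single FULLTEXT index on one column stays within the index limits.
        f"CREATE FULLTEXT INDEX {index_name} ON {t} ({s})",
    ]


def build_search_query(table: str, search_column: str = "SEARCH_CONTENT") -> str:
    """Return a parameterized full-text query (%s is the driver placeholder)."""
    t = quote_ident(table)
    s = quote_ident(search_column)
    return f"SELECT * FROM {t} WHERE MATCH({s}) AGAINST (%s IN NATURAL LANGUAGE MODE)"
```

For example, `build_search_column_statements("SEARCH_TEST", ["COL_A", "COL_B"])` returns the three statements in the order they would need to be executed, and `build_search_query("SEARCH_TEST")` returns a query whose search term would be bound by the database driver.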

The RDS instance with the data can be reached (through the VPN) at:

Code Block
host: dev-marco-db.cdusmwdhqvso.us-east-1.rds.amazonaws.com
db: devmarco
user: devmarcouser
password: platform
table: SEARCH_TEST

Elasticsearch

We set up an AWS Elasticsearch cluster in a VPC, with a single data node (t3.small.elasticsearch instance) and no dedicated master node. The initial setup used fine-grained access control with an IAM user to perform the import. Authentication was later switched to the internal user database with a dedicated user, so that queries can be run from the command line for testing.

The instance can be reached (through the VPN) at:

Code Block
Endpoint: https://vpc-tables-search-test-es7bt4peajix4wokysfxfldqoy.us-east-1.es.amazonaws.com
user: devmarco
password: Platform?es2021
index: syn26050977_index

Queries can be run using curl, e.g.

Code Block
curl -XGET -u 'devmarco:Platform?es2021' 'https://vpc-tables-search-test-es7bt4peajix4wokysfxfldqoy.us-east-1.es.amazonaws.com/syn26050977_index/_search?q=test&pretty=true'
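The same kind of request can also be issued from a script. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and the endpoint, index, user and password are passed in as parameters rather than hard-coded. It mirrors the curl call above (URI search with `q` and `pretty`, HTTP basic authentication) and assumes the VPN connection is already established.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_search_request(
    endpoint: str, index: str, query: str, user: str, password: str
) -> urllib.request.Request:
    """Build a GET request for <endpoint>/<index>/_search?q=<query>&pretty=true."""
    params = urllib.parse.urlencode({"q": query, "pretty": "true"})
    url = f"{endpoint.rstrip('/')}/{index}/_search?{params}"
    # HTTP basic authentication, equivalent to curl's -u 'user:password'.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def run_search(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (requires network access)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `run_search(build_search_request(endpoint, "syn26050977_index", "test", user, password))` with the endpoint and credentials listed above would return the parsed search response as a dictionary.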