We setup a testing environment with some sample data, the source data is stored in the following Synapse table:
https://www.synapse.org/#!Synapse:syn26050977/tables/
We setup both a MySQL RDS instance and an AWS Elasticsearch cluster where we imported the data to perform some testing around indexing and searching. The number of records is very small (51) so we cannot gauge performance metrics.
Some example queries are taken from: https://sagebionetworks.jira.com/browse/PLFM-6876
The code used can be found in the following repository: https://github.com/marcomarasca/synapse-table-search
MySQL
An RDS MySQL instance (version 8.0.16) was used (with a db.t3.small instance) and a dedicated table was created importing the data from the Synapse table: only STRING, STRING_LIST and LARGETEXT columns were imported. This totaled to 46 columns. The table structure is very similar to a real Synapse table (same datatypes were used).
MySQL limits the number of secondary indexes to 64, and the total number of columns in one index to 16. We added a special column that contains the concatenated values of all the columns and created a FULL TEXT index on that particular column.
The RDS instance with the data can be reached (through the VPN) at:
host: dev-marco-db.cdusmwdhqvso.us-east-1.rds.amazonaws.com db: devmarco user: devmarcouser password: platform table: SEARCH_TEST
Elasticsearch
We setup an AWS elasticsearch cluster with a single data node (t3.small.elasticsearch instance) and no dedicated master node in a VPC. The setup was initially done using fine grained access with a IAM user to perform the import. Later the authentication was switched to the internal user management with a dedicated user so that queries can be run from the command line for testing.
The instance can be reached (through the VPN) at:
Endpoint: https://vpc-tables-search-test-es7bt4peajix4wokysfxfldqoy.us-east-1.es.amazonaws.com Kibana Console: https://vpc-tables-search-test-es7bt4peajix4wokysfxfldqoy.us-east-1.es.amazonaws.com/_plugin/kibana/ user: devmarco password: Platform?es2021 index: syn26050977_index
Queries can be run using curl, e.g.
curl -XGET -u 'devmarco:Platform?es2021' 'https://vpc-tables-search-test-es7bt4peajix4wokysfxfldqoy.us-east-1.es.amazonaws.com/syn26050977_index/_search?q=test&pretty=true'