Querying Tables, Views, and Datasets
You can query tables, views, and datasets to find the data you need quickly. Views are especially useful for finding data that may be spread across multiple folders or projects. You can take advantage of querying via the Synapse UI, or using one of the programmatic clients.
Querying Tables, Views, and Datasets via the Synapse UI
Using Simple Search
On the web, simple search mode displays radio buttons or sliders to the left of a table, view, or dataset. These are facets that you can use to search the data. Each facet corresponds to a column in the table, view, or dataset
If the simple search is not visible on a table, view, or dataset, then no facets were specified by the owner of that table, view, or dataset. In this case, use the advanced search described below to query for data.
To use simple search facets, navigate to a table, view, or dataset. Select the features you are interested in to filter the results. Note that the slider for range in simple search is inclusive, meaning the smallest and largest numbers in the selected range will be included in your search results.
Setting Up Simple Search
If you are the owner of a table, view, or dataset, you can choose what facets appear in a simple search menu. To select the facets, click on the Tools menu and select Show View Schema, Show Table Schema, or Show Dataset Schema. Click on the Edit Schema button to view the Edit Columns window. You can choose what facets appear by selecting Values or Range from the dropdown menus under the Facet option. Values can be thought of as categories whereas Range is a date or number. Selecting a blank field will remove the facet from the search window.
Note: If you change the Column Type in the schema, you must set its facet selection again.
Using Advanced Search Queries
The data within a table, view, or dataset can also be retrieved by using a SQL-like query language from the Synapse web interface. To query a table, view, or dataset, select the wrench icon in the upper right-hand menu to reveal the advanced search bar.
By default, the advanced search bar will be pre-populated with a query to capture all of the data contained in the table, view, or dataset. For example:
SELECT * FROM syn3079449
You may specify columns explicitly after the SELECT
statement to subset the data:
SELECT age, gender FROM syn3079449
To keep all columns in the query and filter rows that meet a certain condition, incorporate a WHERE
statement:
SELECT * FROM syn3079449 WHERE age > 50
Add an ORDER BY
statement and a column name to sort the results by that column. The ASC
statement sorts the data returned in ascending order.
SELECT * FROM syn3079449 WHERE age > 50 ORDER BY "treatmentArm" ASC
COUNT
, SELECT AS
, and GROUP_CONCAT
SQL statements are also supported. To count the number of rows:
SELECT count(*) FROM syn3079449
Select and rename a subset of columns:
SELECT age AS "Age at Diagnosis", gender AS "Gender" FROM syn3079449
Group rows by name and identify distinct differences:
SELECT count(distinct(treatmentArm)) AS "Number of Treatments", gender FROM syn3079449 group by gender
To list out the distinct treatment arms that were studied, by gender:
SELECT GROUP_CONCAT(distinct(treatmentArm) SEPARATOR ', ') AS "Available Treatments", gender as "By Gender" FROM syn3079449 group by gender
See the REST API docs for a list of all queries that can be performed.
Querying Tables, Views, and Datasets Programatically
Tables, views, and datasets can also be queried directly from the programmatic clients, which accept all of the SQL-like language used above. For example, to query for the contents of syn12345678:
Command Line
synapse query 'SELECT * FROM syn12345678'
Python
query = syn.tableQuery('SELECT * FROM syn12345678')
R
query <- synTableQuery('SELECT * FROM syn12345678')
The expressions are the conditions for limiting a search. Every entity has properties useful for searching:
All entities (projects, files, folders, tables/views, Docker containers):
id
,name
,createdOn
,createdBy
,modifiedOn
,modifiedBy
,etag
,type
,parentId
,benefactorId
,projectId
Versionable entities (files, table, views, datasets):
currentVersion
Files only:
dataFileHandleId
Files also have contentMd5
, contentSize
, and contentType
as properties. These properties are not available in a view and are not searchable.
SELECT * FROM syn12345678 WHERE "id" = 'syn00012'
For a complete list of example queries, see:
Finding Files in a Specific Project
To find files in a specific project, create a file view in the web client. For example, if you’d like to see all files in a project, navigate to your project and then select the Tables tab. From there, click Tables Tools and Add File View. Click Add container and Enter Synapse ID to create a tabular file view that contains every file in the project, which you can now query. Importantly, if you want to later query on annotations, you must select Add All Annotations.
Listing Files in a Specific Folder
If you are using a programmatic client, you can list the files in a specific folder. First, you need to know the synID of the folder (for example syn1524884, which has data from TCGA related to melanoma). All entities in this folder will have a parentID of syn1524884.
The function to find all files in this folder is called “getChildren”:
Python
foo = list(syn.getChildren(parent='syn1524884', includeTypes=['file']))
R
foo <- as.list(synGetChildren(parent='syn1524884', includeTypes=list('file')))
Queries on Annotations
If annotations have been added to files, they can be used to discover files of interest from a file view syn12345678. For example, you can identify all files annotated as bam
files (fileFormat = bam
) with the following query:
SELECT * FROM syn123456 WHERE "fileFormat"='bam'
Likewise, if you put the RNA-Seq related files described in the section above into the project syn00123 with the described annotations, then you could find all of the files for conditionB
and sampleA
:
SELECT * FROM syn123456 WHERE "projectId"='syn00123' AND "specimenID"='sampleA_conditionB'
Lastly, you can query on a subset of entities that have a specific annotation. You can limit the annotations you want displayed as following.
SELECT specimenID,genomeBuild,fileFormat,platform FROM file WHERE "projectId"='syn00123' AND "specimenID"='sampleA_conditionB'
Reproducible queries can be constructed using one of the analytical clients (command line, Python, and R) and on the web client, query results can be displayed in a table on a wiki page.
In a project, from the wiki page click Wiki Tools in the upper right corner to Edit Project Wiki. Click Insert and choose Table: Query on Files/Folders. Enter your query in the box and click the Insert button. Once you save the wiki page, the results will be displayed as a table.
synapse query "SELECT specimenID,genomeBuild,fileFormat,platform FROM syn123456 WHERE \"specimenID\"='sampleA_conditionB'"
result = syn.tableQuery("SELECT specimenID,genomeBuild,fileFormat,platform FROM syn123456 WHERE \"specimenID\"='sampleA_conditionB'")
result = synTableQuery("SELECT specimenID,genomeBuild,fileFormat,platform FROM syn123456 WHERE \"specimenID\"='sampleA_conditionB'")
Downloading from a Query
You can download files in a folder using queries. Currently this feature is only available in the command line client. For example, if you want to download all files in a file view that has a synapse ID of syn00123
, use:
synapse get -q "SELECT * FROM file WHERE parentId = 'syn00123'"
Troubleshooting
Single quotes in Synapse queries must be replaced by double quotes or two single quotes. In order to query for the chemicalStructure
of 4'-chemical
:
SELECT * FROM syn123 where "chemicalStructure" = '4"-chemical'
# OR
SELECT * FROM syn123 where "chemicalStructure" = '4''-chemical'