Data in Synapse can be downloaded using the programmatic clients (Python, R, and command line) as well as the web client. In this guide, you will learn the basic commands to download data programmatically. For instructions on how to download data from the web, see Downloading Files in our quick start guide.
Downloading Files
Before you begin, it is important to understand that most items in Synapse have a unique identifier associated with them. This identifier is called a Synapse ID, or a synID. The synID format is the prefix “syn” followed by a series of digits (for example, syn12345678). Items that have unique synIDs in Synapse are: files, folders, projects, tables, views, wikis, links, and Docker repositories. You can use synIDs to refer to specific items when working with Synapse programmatically.
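Because synIDs follow a simple pattern, a script can check them before calling a client. The sketch below is only an illustration: the helper name `is_syn_id` is hypothetical, and it assumes the only rule is the prefix "syn" followed by one or more digits (the IDs used in this guide vary in length).

```python
import re

# Assumed pattern: the literal prefix "syn" followed by one or more ASCII digits.
SYN_ID_PATTERN = re.compile(r"syn[0-9]+")


def is_syn_id(value: str) -> bool:
    """Return True if `value` looks like a Synapse ID (hypothetical helper)."""
    return SYN_ID_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    # Print each candidate next to the result of the check.
    for candidate in ["syn12345678", "syn3158111", "12345678", "synABC"]:
        print(candidate, is_syn_id(candidate))
```

A check like this only confirms the shape of the identifier. It does not tell you whether the item exists or whether you have access to it.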
When using the Python, R, or command line clients, files can be downloaded by using the get command. Downloaded files are stored and/or registered in a cache. By default, the cache is a hidden folder named .synapseCache in your home directory. Whenever the get function is invoked, the cache is checked, by MD5 checksum, to see whether the same file is already present. If the current version of the file has already been downloaded, Synapse will not download it again.
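The idea of comparing MD5 checksums before downloading can be sketched with the Python standard library. This is not the Synapse client's own cache code; the function names `md5_of_file` and `needs_download` are hypothetical, and the sketch assumes you already know the expected checksum of the remote file.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(local_path: Path, expected_md5: str) -> bool:
    """Return True when no local copy exists or its checksum differs."""
    if not local_path.is_file():
        return True
    return md5_of_file(local_path) != expected_md5.lower()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "protocol.txt"
        sample.write_bytes(b"example contents")
        checksum = md5_of_file(sample)
        # Matching checksum: the local copy can be reused.
        print(needs_download(sample, checksum))
        # Different checksum: the file would have to be fetched again.
        print(needs_download(sample, "0" * 32))
```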
For the Python and R clients, the default download location is the Synapse cache. The command line client downloads to your current working directory. On the web, your own browser settings determine the download location for files. The Synapse cache is not updated to reflect downloads through a web browser. In all cases you can specify the directory in which to download the file.
For example, to download the experimental protocol on Adult Mouse Cardiac Myocyte Isolation (syn3158111) from the Progenitor Cell Biology Consortium (PCBC), you would run the following:
Command line
Code Block |
---|
synapse get syn3158111
|
Python
Code Block | ||
---|---|---|
| ||
import synapseclient
syn = synapseclient.Synapse()
syn.login()
entity = syn.get("syn3158111")
|
R
Code Block | ||
---|---|---|
| ||
library(synapser)
synLogin()
entity <- synGet("syn3158111")
|
Once a file has been downloaded, you can find the file path using the following:
Command line
Code Block |
---|
# The command line client prints the path where the file was saved.
|
Python
Code Block | ||
---|---|---|
| ||
filepath = entity.path
|
R
Code Block | ||
---|---|---|
| ||
filepath <- entity$path
|
Downloading a Specific File Version
If there are multiple versions of a file, a specific version can be downloaded by passing the version parameter.
In this example, there are multiple versions of an miRNA FASTQ file (syn3260973) from the Progenitor Cell Biology Consortium. To download the first version:
Command line
Code Block |
---|
synapse get syn3260973 -v 1
|
Python
Code Block | ||
---|---|---|
| ||
entity = syn.get("syn3260973", version=1)
|
R
Code Block | ||
---|---|---|
| ||
entity <- synGet("syn3260973", version=1)
|
See /wiki/spaces/DOCS/pages/2668134540 for more details.
Links
When you click a link on the Synapse website, you are redirected to the linked entity. With the programmatic clients, you must specify the followLink parameter; otherwise, only the link itself is retrieved and the linked entity is not downloaded.
Command line
Code Block |
---|
synapse get syn1234 --followLink
|
Python
Code Block |
---|
import synapseclient
syn = synapseclient.login()
linkEnt = syn.get("syn1234")
entity = syn.get("syn1234", followLink=True)
|
R
Code Block |
---|
library(synapser)
synLogin()
linkEnt <- synGet("syn1234")
entity <- synGet("syn1234", followLink=TRUE)
|
Download Location
To override the default download location, you can specify the downloadLocation parameter.
Command line
Code Block |
---|
synapse get syn00123 --downloadLocation /path/to/folder
|
Python
Code Block |
---|
entity = syn.get("syn00123", downloadLocation="/path/to/folder")
|
R
Code Block |
---|
entity <- synGet("syn00123", downloadLocation="/path/to/folder")
|
Finding and Downloading Files
Files can be annotated in Synapse to help organize your data and make files findable. In order to search the annotations, a /wiki/spaces/DOCS/pages/2011070739 must be created first.
For example, the PCBC Project has a table listing sequencing data files that are annotated. To find all mRNA fastq files originating from CD34+ cells, we can query by:
Command line
Code Block |
---|
synapse query 'select * from syn7511263 where dataType="mRNA" AND fileType="fastq" AND Cell_Type_of_Origin="CD34+ cells"'
|
Python
Code Block | ||
---|---|---|
| ||
results = syn.tableQuery('select * from syn7511263 where dataType="mRNA" AND fileType="fastq" AND Cell_Type_of_Origin="CD34+ cells"')
|
R
Code Block | ||
---|---|---|
| ||
results <- synTableQuery('select * from syn7511263 where dataType="mRNA" AND fileType="fastq" AND Cell_Type_of_Origin="CD34+ cells"')
df <- as.data.frame(results)
|
Once you’ve queried for the files of interest, they can be downloaded using the following:
Command line
Code Block |
---|
synapse get -q 'select * from syn7511263 where dataType="mRNA" AND fileType="fastq" AND Cell_Type_of_Origin="CD34+ cells"'
|
Python
Code Block | ||
---|---|---|
| ||
results = syn.tableQuery('select * from syn7511263 where dataType="mRNA" AND fileType="fastq" AND Cell_Type_of_Origin="CD34+ cells"')
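# tableQuery rows are plain lists, so select the ID column by name from a DataFrame ('file.id' mirrors the R example and is assumed to hold the synIDs)
entity = [syn.get(syn_id) for syn_id in results.asDataFrame()['file.id']]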
|
R
Code Block | ||
---|---|---|
| ||
results <- synTableQuery('select * from syn7511263 where dataType="mRNA" AND fileType="fastq" AND Cell_Type_of_Origin="CD34+ cells"')
df <- as.data.frame(results)
entity <- lapply(df$file.id, function(x) synGet(x))
|
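The same query string appears in every example above, so it can be convenient to build it from a set of annotation filters. The sketch below is one way to do that under stated assumptions: `build_query` is a hypothetical helper, it only supports equality conditions joined with AND, and it quotes values with double quotes as the examples in this guide do.

```python
def build_query(table_id, filters):
    """Build a `select *` query with one equality condition per filter.

    Values are wrapped in double quotes, so a value that itself contains
    a double quote is rejected rather than escaped (illustrative choice).
    """
    conditions = []
    for column, value in filters.items():
        if '"' in str(value):
            raise ValueError(f"unsupported character in value for {column!r}")
        conditions.append(f'{column}="{value}"')
    query = f"select * from {table_id}"
    if conditions:
        query += " where " + " AND ".join(conditions)
    return query


if __name__ == "__main__":
    print(
        build_query(
            "syn7511263",
            {"dataType": "mRNA", "fileType": "fastq", "Cell_Type_of_Origin": "CD34+ cells"},
        )
    )
```

The resulting string can be passed to `syn.tableQuery` or to `synapse get -q` in the same way as the hand-written query.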
Recursive Downloads
The folder structure that is present on Synapse can be maintained by recursive downloading.
Command line
Code Block |
---|
synapse get -r syn2390898
|
Python
Code Block |
---|
import synapseutils
import synapseclient
syn = synapseclient.login()
files = synapseutils.syncFromSynapse(syn, 'syn2390898')
|
R
Code Block |
---|
# This feature is not available in synapser itself; the synapserutils package provides syncFromSynapse (see Downloading in Bulk)
|
Download Wikis
The structure of a wiki can be extracted through the R and Python clients. The same call also returns the ID, title, and parent wiki page of each sub-wiki page.
Python
Code Block | ||
---|---|---|
| ||
wiki = syn.getWikiHeaders("syn00123")
|
R
Code Block | ||
---|---|---|
| ||
entity <- synGet("syn00123")
wiki <- synGetWikiHeaders(entity)
|
The Markdown content within a wiki page can be downloaded if you know the synID and page ID for the wiki. The wiki page ID can either be obtained through the above method or can be found in the URL. For example, in the URL www.synapse.org/#!Synapse:syn00123/wiki/123456, the last six digits of the URL path are the wiki page ID (123456).
Python
Code Block | ||
---|---|---|
| ||
wiki = syn.getWiki("syn00123", 123456)
|
R
Code Block |
---|
entity <- synGet("syn00123")
wiki <- synGetWiki(entity, 123456)
|
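When the wiki page ID comes from a URL, both the synID and the page ID can be pulled out with a regular expression. This is a small sketch that assumes URLs shaped like the example above (`Synapse:<synID>/wiki/<page ID>`); the helper name `parse_wiki_url` is hypothetical.

```python
import re

# Assumed URL shape: ...Synapse:<synID>/wiki/<numeric page ID>
WIKI_URL_PATTERN = re.compile(r"Synapse:(syn[0-9]+)/wiki/([0-9]+)")


def parse_wiki_url(url):
    """Return (synID, wiki page ID) extracted from a Synapse wiki URL."""
    match = WIKI_URL_PATTERN.search(url)
    if match is None:
        raise ValueError(f"not a recognised Synapse wiki URL: {url}")
    return match.group(1), int(match.group(2))


if __name__ == "__main__":
    syn_id, page_id = parse_wiki_url("www.synapse.org/#!Synapse:syn00123/wiki/123456")
    print(syn_id, page_id)
```

The two values can then be passed to `syn.getWiki(syn_id, page_id)` in the Python client.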
Downloading in Bulk
Files can be downloaded in bulk using the syncFromSynapse function found in the synapseutils helper package. This function crawls all the subfolders of the project or folder that you specify and retrieves all the files that have not been downloaded. By default, the files are downloaded into your synapseCache, but a different download location can be specified with the path parameter. If you download to a location outside of synapseCache, this function also creates a tab-delimited manifest of all the files along with their metadata (path, provenance, annotations, etc.).
Python
Code Block | ||
---|---|---|
| ||
# Load required libraries
import synapseclient
import synapseutils
# login to Synapse
syn = synapseclient.login(email='me@example.com', password='secret', rememberMe=True)
# download all the files in folder syn123 to a local folder called "myFolder"
all_files = synapseutils.syncFromSynapse(syn, entity='syn123', path='/path/to/myFolder')
|
R
Code Block |
---|
# Load required libraries
library(synapser)
library(synapserutils)
# login to Synapse
synLogin(email='me@example.com', password='secret', rememberMe=TRUE)
# download all the files in folder syn123 to a local folder called "myFolder"
all_files = syncFromSynapse(entity='syn123', path='/path/to/myFolder')
|
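Because the manifest is tab-delimited, it can be read with Python's standard csv module. The sketch below assumes you pass in the path of the manifest written to your download folder (its file name is not given in this guide) and that it has a header row containing a `path` column; the helpers `read_manifest` and `local_paths` are hypothetical.

```python
import csv


def read_manifest(manifest_path):
    """Read a tab-delimited manifest into a list of dictionaries, one per file."""
    with open(manifest_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def local_paths(rows):
    """Collect the non-empty values of the assumed `path` column."""
    return [row["path"] for row in rows if row.get("path")]
```

Each row is a dictionary keyed by the manifest's column names, so any annotation columns present in the file are available in the same way as `path`.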
...
Info |
---|
Remember, when downloading this way, the maximum size of a download is 5 GB, or a maximum of 100 files if using the download cart. |
Once you have found your data of interest and gained access, here’s how to download that data. Reference the screenshots below for a visual representation of the instructions.
Within the project, you may see a series of folders. Any standalone files are downloadable using the down arrow icon in the Download column for that file (1). You can click the > arrow next to any folder to expand it and view the files within (2). To download any individual file from here, click the download icon in the Download column for that file (1), which will add that file to your download cart.
...
Alternatively, if you want to download every file within a folder, you can do so more efficiently. Click on the name of the folder (instead of just expanding it). On this new page with just that folder and its contents, click Download Options (3) followed by Add to Download Cart (4).
...
Repeat this process for as many files (individual and within folders) as you want to download!
As you add items to your download cart, notice that this gets reflected in the Downloads icon of your Synapse toolbar on the left (6). Click this icon once you are ready to download all files in your cart (at this point, they are not downloaded to your computer yet).
In your download cart, review all the items in the list. From here, you can use the Action column to remove any files that you no longer wish to download (7).
When you’re ready to download, click Download As .Zip Packages (8). This will reveal a Create Your Download Package box below, which will prompt you to enter a package name (9). Enter a name that will be easy to find and follows protocols for your project. Then, click Download Package (10). The zipped package will now be available on your computer.
...