Working with a large number of files on the web can be tedious, especially if you want to download, upload, or set annotations and provenance. The command line, Python, and R clients have convenience functions for bulk upload and download. Uploading requires a tab-delimited manifest in which each row specifies a file to be uploaded and, optionally, the annotations to apply to it. Downloading in bulk requires identifying a container (Folder, Project, Table, or View) that contains the files of interest. In this article we will cover how to:

create a manifest
upload the files in bulk
modify files in bulk using a manifest
download the files in bulk

Uploading Data in Bulk

Creating a Manifest

Files to be uploaded are specified in a tab separated (.tsv) manifest. The manifest has columns that contain information about each file to be uploaded along with annotations that will be associated with the file in Synapse.

The required columns in the manifest are:

path: the path to the file to be uploaded
parent: the Synapse ID of the Folder to upload to

It is optional to link provenance - the used column can indicate files that were used to create the one being uploaded, and the executed column can indicate code (in Synapse or on the web) that was used to generate the file. Here is an example manifest that uploads a single file:

| path | parent | name | used | executed | emotion | species |
| --- | --- | --- | --- | --- | --- | --- |
| /path/to/file.csv | syn123 | Tardar Sauce | syn654 | https://github.com/your/code/repo | grumpy | cat |

The above manifest describes a “file.csv” that will be uploaded to the Synapse folder syn123 and named “Tardar Sauce”. The manifest also describes the provenance of the file, indicating that it was generated using code deposited in GitHub (https://github.com/your/code/repo) from the data in syn654. Additionally, the file has been annotated with emotion: grumpy and species: cat. Additional annotations could be associated with the file by adding more columns.

To review:

the path and parent columns are required
the name column is only necessary if the displayed name in Synapse should be different from the name of the uploaded file
used and executed are optional for provenance (but helpful!)
emotion and species are optional annotations (but also helpful!)
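
A manifest like the one described above can also be generated with a short script instead of a spreadsheet. The following is a minimal sketch using only the Python standard library. It reuses the example values from this section, and the helper name write_manifest is hypothetical and not part of any Synapse client.

```python
import csv
import io

# Column order follows the example manifest: required columns first,
# then the optional name and provenance columns, then the annotations.
COLUMNS = ["path", "parent", "name", "used", "executed", "emotion", "species"]


def write_manifest(rows, handle):
    """Write manifest rows (a list of dicts) to a file-like object as TSV."""
    writer = csv.DictWriter(
        handle, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)


rows = [
    {
        "path": "/path/to/file.csv",
        "parent": "syn123",
        "name": "Tardar Sauce",
        "used": "syn654",
        "executed": "https://github.com/your/code/repo",
        "emotion": "grumpy",
        "species": "cat",
    }
]

# Write to an in-memory buffer here; a real script would open a .tsv file.
buffer = io.StringIO()
write_manifest(rows, buffer)
manifest_text = buffer.getvalue()
print(manifest_text)
```

To save the manifest to disk, pass a handle opened with open("filesToUpload.tsv", "w", newline="") in place of the in-memory buffer.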

Download the template.

Validate the Manifest and Upload Files

The format of the manifest file (called ‘filesToUpload.tsv’ in this example) can be validated prior to upload by using the dryRun parameter of syncToSynapse. With dryRun set, the client does not upload the data specified in the manifest file. Instead, it checks that the manifest file is correctly formatted, that all file paths exist, that all files are unique, that provenance (if specified) can be set, and that the parent synID exists. The dryRun output also summarizes the number of files and the total upload size. This helps ensure your data upload does not end prematurely due to a typo in a file path or parent synID.

validate the manifest in the Python client or command line.
validate the manifest in the R client.
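
Some of these checks can be approximated locally before running the client at all. The sketch below is not the client's dryRun implementation. It is a hypothetical helper (precheck_manifest) that looks only for the required columns, missing files, duplicate paths, and parent values that do not look like a synID. It cannot confirm that the parent exists in Synapse or that provenance can be set, so the dryRun check is still needed.

```python
import csv
import io
import os
import re

REQUIRED_COLUMNS = ("path", "parent")
# Assumption: a synID is "syn" followed by one or more digits.
SYN_ID = re.compile(r"syn\d+")


def precheck_manifest(handle, path_exists=os.path.isfile):
    """Return a list of human-readable problems found in a TSV manifest."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [column for column in REQUIRED_COLUMNS if column not in header]
    if missing:
        return ["missing required column(s): " + ", ".join(missing)]

    problems = []
    seen = set()
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        path = row.get("path") or ""
        parent = row.get("parent") or ""
        if not path_exists(path):
            problems.append(f"line {line_no}: file not found: {path}")
        if path in seen:
            problems.append(f"line {line_no}: duplicate path: {path}")
        seen.add(path)
        if not SYN_ID.fullmatch(parent):
            problems.append(f"line {line_no}: parent is not a synID: {parent}")
    return problems


# Example: the second row repeats a path and has an invalid parent.
# path_exists is stubbed out so the example does not depend on real files.
example = "path\tparent\n/a.csv\tsyn123\n/a.csv\tfolder\n"
example_problems = precheck_manifest(io.StringIO(example), path_exists=lambda p: True)
print(example_problems)
```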

After validating the manifest, you can now upload the files to Synapse by removing the dryRun parameter. Once the upload is complete, you will receive an email notification. This notification will also show any errors from the upload.

...

Data in Synapse can be downloaded using our programmatic clients (Python, R, and command line), or /wiki/spaces/DOCS/pages/2004254837. In this guide, you will learn the basic commands to download data programmatically.

Downloading Files

Before you begin, it is important to understand that most items in Synapse have a unique identifier associated with them. This identifier is called a Synapse ID, or a synID. The synID format is the prefix “syn” followed by a series of digits (for example, syn12345678). Items that have unique synIDs in Synapse are: files, folders, projects, tables, views, wikis, links, and Docker repositories. You can use synIDs to refer to specific items when working with Synapse programmatically.
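
When synIDs are passed around in scripts, it can help to check their shape before calling the client. The following is a small sketch, assuming a synID is the prefix “syn” followed by one or more digits. The helper names is_syn_id and find_syn_ids are hypothetical, and the check says nothing about whether the item actually exists in Synapse.

```python
import re

# Assumption: "syn" followed by one or more digits.
SYN_ID = re.compile(r"syn\d+")


def is_syn_id(value):
    """Return True if the whole string looks like a synID."""
    return SYN_ID.fullmatch(value) is not None


def find_syn_ids(text):
    """Return every synID-shaped token found in free text, in order."""
    return re.findall(r"\bsyn\d+\b", text)


found = find_syn_ids("Upload to syn123 using data from syn654.")
print(is_syn_id("syn12345678"), is_syn_id("12345678"), found)
```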

When using the Python, R, or command line clients, files can be downloaded by using the get command. Downloaded files are stored and/or registered in a cache. By default, the cache location is in your home directory in a hidden folder named .synapseCache. Whenever the get function is invoked, the cache is checked to see if the same file is already present by checking its MD5 checksum. If it already exists, the file will not be downloaded again. In other words, if the current version of a file has already been downloaded, Synapse will not re-download the same file.
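
The checksum comparison described above can be illustrated with the standard library. This is only a sketch of the idea, not the client's cache code. The function names md5_of_file and needs_download are hypothetical, and the expected checksum is assumed to be known from elsewhere.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(local_path, expected_md5):
    """True when there is no local copy or its checksum differs."""
    if not os.path.isfile(local_path):
        return True
    return md5_of_file(local_path) != expected_md5


# Demonstration with a throwaway file standing in for a cached download.
with tempfile.TemporaryDirectory() as folder:
    cached = os.path.join(folder, "file.csv")
    with open(cached, "wb") as handle:
        handle.write(b"hello")
    checksum = md5_of_file(cached)
    unchanged = needs_download(cached, checksum)
    changed = needs_download(cached, "0" * 32)
    absent = needs_download(os.path.join(folder, "missing.csv"), checksum)

print(checksum, unchanged, changed, absent)
```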

For the Python and R clients, the default download location is the Synapse cache. The command line client downloads to your current working directory. On the web, your own browser settings determine the download location for files. The Synapse cache is not updated to reflect downloads through a web browser. In all cases you can specify the directory in which to download the file.

For example, to download the experimental protocol on Adult Mouse Cardiac Myocyte Isolation (syn3158111) from the Progenitor Cell Biology Consortium (PCBC) you would run the following:

Command line

Code Block
synapse get syn3158111

Python

Code Block
import synapseclient
syn = synapseclient.Synapse()
syn.login()
entity = syn.get("syn3158111")

R

Code Block
library(synapser)
synLogin()
entity <- synGet("syn3158111")

Once a file has been downloaded, you can find the file path using the following:

Command line

Code Block
# When downloading using the command line client, it will print the filepath of where the file was saved to.

Python

Code Block
filepath = entity.path

R

Code Block
filepath <- entity$path

Downloading a Specific File Version

If there are multiple versions of a file, a specific version can be downloaded by passing the version parameter.

In this example, there are multiple versions of an miRNA FASTQ file (syn3260973) from the Progenitor Cell Biology Consortium. To download the first version:

Command line

Code Block
synapse get syn3260973 -v 1

Python

Code Block
entity = syn.get("syn3260973", version=1)

R

Code Block
entity <- synGet("syn3260973", version=1)

See Versioning for more details.

Downloading Linked Data

When you click on a link on the Synapse website, it will redirect you to the linked entity. The followLink parameter will have to be specified when using the programmatic clients or you will only retrieve the link itself without downloading the linked entity.

Command line

Code Block
synapse get syn1234 --followLink

Python

Code Block
import synapseclient
syn = synapseclient.login()
linkEnt = syn.get("syn1234")
entity = syn.get("syn1234", followLink=True)

R

Code Block
library(synapser)
synLogin()
linkEnt <- synGet("syn1234")
entity <- synGet("syn1234", followLink=TRUE)

Downloading Location

To override the default download location, you can specify the downloadLocation parameter.

Command line

Code Block
synapse get syn00123 --downloadLocation /path/to/folder

Python

Code Block
entity = syn.get("syn00123", downloadLocation="/path/to/folder")

R

Code Block
entity <- synGet("syn00123", downloadLocation="/path/to/folder")

Finding and Downloading Files via Annotations

Files can be annotated (/wiki/spaces/DOCS/pages/2667708522) in Synapse to help organize your data and make files findable. In order to search the annotations, you must first create a File View (/wiki/spaces/DOCS/pages/2011070739).

For example, the PCBC Project has a table listing sequencing data files that are annotated. To find all mRNA fastq files originating from CD34+ cells, we can query by:

Command line

Code Block
synapse query 'select * from syn7511263 where dataType="mRNA" AND fileType="fastq" AND Cell_Type_of_Origin="CD34+ cells"'

Python

Code Block
results = syn.tableQuery('select * from syn7511263 where dataType="mRNA" AND fileType="fastq" AND Cell_Type_of_Origin="CD34+ cells"')

R

Code Block
results <- synTableQuery("select * from syn7511263 where dataType='mRNA' AND fileType='fastq' AND Cell_Type_of_Origin='CD34+ cells'")
df <- as.data.frame(results)

Once you’ve queried for the files of interest, they can be downloaded using the following:

Command line

Code Block
synapse get -q 'select * from syn7511263 where dataType="mRNA" AND fileType="fastq" AND Cell_Type_of_Origin="CD34+ cells"'

Python

Code Block
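results = syn.tableQuery('select * from syn7511263 where dataType="mRNA" AND fileType="fastq" AND Cell_Type_of_Origin="CD34+ cells"')
# Iterating over the query result yields plain lists, so convert it to a
# pandas DataFrame (requires pandas) to select the ID column by name.
df = results.asDataFrame()
entity = [syn.get(file_id) for file_id in df['file.id']]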

R

Code Block

results <- synTableQuery("select * from syn7511263 where dataType='mRNA' AND fileType='fastq' AND Cell_Type_of_Origin='CD34+ cells'")
df <- as.data.frame(results)
entity <- lapply(df$file.id, function(x) synGet(x))
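
The query strings used in this section can also be assembled from a dictionary of annotation filters, which is convenient when the same script is reused with different values. The helper build_query below is hypothetical. It is a sketch that supports only simple equality filters and rejects values containing quote characters rather than guessing at escaping rules.

```python
def build_query(table_id, filters):
    """Build a 'select *' query with one equality clause per filter."""
    clauses = []
    for column, value in filters.items():
        text = str(value)
        if "'" in text or '"' in text:
            # Quoting rules are outside the scope of this sketch.
            raise ValueError(f"unsupported quote character in value: {text!r}")
        clauses.append(f"{column}='{text}'")
    query = f"select * from {table_id}"
    if clauses:
        query += " where " + " AND ".join(clauses)
    return query


query = build_query(
    "syn7511263",
    {
        "dataType": "mRNA",
        "fileType": "fastq",
        "Cell_Type_of_Origin": "CD34+ cells",
    },
)
print(query)
```

The resulting string can then be passed to the table query call shown in the Python example above.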

Recursive Downloading

The folder structure that is present on Synapse can be maintained by recursive downloading.

Command line

Code Block
synapse get -r syn2390898

Python

Code Block
import synapseutils
import synapseclient
syn = synapseclient.login()
files = synapseutils.syncFromSynapse(syn, 'syn2390898')

R

Code Block
# Unfortunately, this feature is not available in the R client

Downloading Wikis

The structure of a wiki page can be extracted through the R and Python clients. The ID, title, and parent wiki page of each sub-wiki page are also returned by the same method.

Python

Code Block
wiki = syn.getWikiHeaders("syn00123")

R

Code Block
entity <- synGet("syn00123")
wiki <- synGetWikiHeaders(entity)

The Markdown content within a wiki page can be downloaded if you know the synID and page ID for the wiki. The wiki page ID can either be obtained through the above method or can be found in the URL. For example, in the URL www.synapse.org/#!Synapse:syn00123/wiki/123456, the digits at the end of the URL path are the wiki page ID (123456).

Python

Code Block
wiki = syn.getWiki("syn00123", 123456)

R

Code Block
entity <- synGet("syn00123")
wiki <- synGetWiki(entity, 123456)
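
If you are starting from a wiki URL, the synID and wiki page ID can be pulled out of it with a regular expression. This is a minimal sketch that assumes URLs of the form shown in this section (Synapse:synID/wiki/pageID). The helper name parse_wiki_url is hypothetical.

```python
import re

# Assumption: the URL contains "Synapse:<synID>/wiki/<page ID>".
WIKI_URL = re.compile(r"Synapse:(syn\d+)/wiki/(\d+)")


def parse_wiki_url(url):
    """Return (synID, wiki page ID) extracted from a Synapse wiki URL."""
    match = WIKI_URL.search(url)
    if match is None:
        raise ValueError(f"not a recognised wiki URL: {url}")
    return match.group(1), int(match.group(2))


owner_id, page_id = parse_wiki_url("www.synapse.org/#!Synapse:syn00123/wiki/123456")
print(owner_id, page_id)
```

The two values can then be passed to the getWiki call shown above.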

Downloading in Bulk

Files can be downloaded in bulk using the syncFromSynapse function found in the synapseutils helper package. This function crawls all the files in a folder or project and downloads them along with all the annotations and provenance on those files. A manifest file called SYNAPSE_METADATA_MANIFEST.tsv that contains the metadata is also added to the download path.

download files in the Python client or command line.
download files in the R client.
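
Because the downloaded manifest is a tab-separated file, its metadata can be inspected with the standard library once the download has finished. The sketch below assumes the manifest has a path column and treats every column other than path, parent, name, used, and executed as an annotation. That column list mirrors the upload manifest described earlier and is an assumption, as are the helper names and the sample rows.

```python
import csv
import io

# Assumption: these columns describe the file itself; the rest are annotations.
CORE_COLUMNS = {"path", "parent", "name", "used", "executed"}


def read_manifest(handle):
    """Read a TSV manifest into a list of dicts, one per file."""
    return list(csv.DictReader(handle, delimiter="\t"))


def annotations_by_path(rows):
    """Map each file path to its non-empty annotation values."""
    return {
        row["path"]: {
            key: value
            for key, value in row.items()
            if key not in CORE_COLUMNS and value
        }
        for row in rows
    }


# Sample content standing in for a downloaded SYNAPSE_METADATA_MANIFEST.tsv.
sample = (
    "path\tparent\tname\tspecies\temotion\n"
    "/downloads/file.csv\tsyn123\tTardar Sauce\tcat\tgrumpy\n"
)
rows = read_manifest(io.StringIO(sample))
annotations = annotations_by_path(rows)
print(annotations)
```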

Editing in Bulk

You can modify values in the manifest and re-upload it to Synapse using syncToSynapse to edit files in bulk. The manifest allows you to modify everything: file path, provenance, annotations, and versions. If the files have not changed and you only want to update the file annotations, add a column called forceVersion to the manifest with the value False for each row. This will stop syncToSynapse from uploading new versions of the files.
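
Adding the forceVersion column by hand is error-prone for large manifests. The following sketch adds it to every row using the standard library. The helper name add_force_version is hypothetical, and the sample manifest reuses the example values from earlier in this article.

```python
import csv
import io


def add_force_version(source, target, value="False"):
    """Copy a TSV manifest, setting forceVersion to the given value on every row."""
    reader = csv.DictReader(source, delimiter="\t")
    fieldnames = list(reader.fieldnames or [])
    if "forceVersion" not in fieldnames:
        fieldnames.append("forceVersion")
    writer = csv.DictWriter(
        target, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        row["forceVersion"] = value
        writer.writerow(row)


original = "path\tparent\tspecies\n/path/to/file.csv\tsyn123\tcat\n"
updated = io.StringIO()
add_force_version(io.StringIO(original), updated)
updated_text = updated.getvalue()
print(updated_text)
```

The updated manifest can then be re-uploaded with syncToSynapse as described above.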

You can also update annotations using File Views.

Note: You cannot move files with a manifest. Changing the parent synID in a manifest creates a copy of the file rather than moving it, so the file will then exist in two different locations.
