Working with a large number of Files on the web can be tedious, especially if you want to download, upload, or set annotations and Provenance. The command line, Python client, and R client have convenient functions for bulk upload and download. Uploading requires a tab-delimited manifest where each row specifies a File to be uploaded and, optionally, annotations to be applied. Downloading in bulk requires identifying a container (Folder, Project, Table, or View) that contains your Files of interest.
In this article, you will learn how to:
Create a manifest
Upload Files in bulk
Modify Files in bulk using a manifest
Download Files in bulk
Uploading Data in Bulk
...
Files to be uploaded are specified in a tab-separated (.tsv) manifest. The manifest has columns that contain information about each File to be uploaded, along with annotations that will be associated with the File in Synapse.
The required columns in the manifest are:
path: the local path of the File to be uploaded
parent: the Synapse ID of the Folder where Files will be uploaded
...
You can also create Provenance for a File during bulk upload. A used column lists the Files that were used to create the one being uploaded, and an executed column can point to code (in Synapse or on the web) that was used to generate the File. Here is an example manifest that uploads a single File:
path | parent | name | used | executed | emotion | species |
---|---|---|---|---|---|---|
/path/to/file.csv | syn123 | Tardar Sauce | syn654 | https://github.com/your/code/repo | grumpy | cat |
The above manifest describes a file, “file.csv”, that will be uploaded to the Synapse Folder syn123 and named “Tardar Sauce”. The manifest describes the Provenance of the File, indicating that it was generated from the data in syn654 using code deposited in GitHub (https://github.com/your/code/repo). Additionally, the File has been annotated with emotion: grumpy and species: cat. Additional annotations could be associated with the File by adding more columns.
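If you generate manifests from a script, a minimal Python sketch using only the standard library's csv module could write the example manifest as a tab-separated file. The column names and values are taken from the example manifest; the helper name write_manifest is hypothetical, and the file name filesToUpload.tsv matches the one used for validation in this article.

```python
import csv

# Columns from the example manifest: "path" and "parent" are required,
# the rest are optional (display name, Provenance, and annotations).
FIELDNAMES = ["path", "parent", "name", "used", "executed", "emotion", "species"]


def write_manifest(manifest_path, rows):
    """Write a list of dicts as a tab-separated manifest with a header row."""
    with open(manifest_path, "w", newline="") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=FIELDNAMES, delimiter="\t", lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)


rows = [
    {
        "path": "/path/to/file.csv",
        "parent": "syn123",
        "name": "Tardar Sauce",
        "used": "syn654",
        "executed": "https://github.com/your/code/repo",
        "emotion": "grumpy",
        "species": "cat",
    }
]
write_manifest("filesToUpload.tsv", rows)
```

Any key missing from a row is written as an empty cell, which is how optional columns can be left blank for some Files.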
To review:
The path and parent columns are required.
The name column is only necessary if the displayed name in Synapse should be different from the name of the uploaded file.
The used and executed columns are optional for Provenance (but helpful!).
The emotion and species columns are optional annotations (but also helpful!).
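The review can be expressed as a small, hypothetical helper that sorts manifest column names into required, Provenance, and annotation columns. This is a simplified model built only from the columns named in this article; the real client recognizes other reserved column names as well (forceVersion, covered under Editing in Bulk, is one), so treating every remaining column as an annotation is an assumption.

```python
# Column roles as described in this article only (a simplified model,
# not the client's own logic).
REQUIRED = {"path", "parent"}
OPTIONAL_NAME = {"name"}
PROVENANCE = {"used", "executed"}


def describe_columns(header):
    """Sort manifest column names into required, Provenance, and annotation roles."""
    columns = set(header)
    return {
        "missing_required": sorted(REQUIRED - columns),
        "provenance": sorted(PROVENANCE & columns),
        # Everything that is not required, the name, or Provenance is
        # treated as an annotation in this model.
        "annotations": sorted(columns - REQUIRED - OPTIONAL_NAME - PROVENANCE),
    }


roles = describe_columns(
    ["path", "parent", "name", "used", "executed", "emotion", "species"]
)
```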
...
The format of the manifest file (called ‘filesToUpload.tsv’ in this example) can be validated prior to upload by using the dryRun parameter in syncToSynapse. With dryRun, the data specified in the manifest file is not uploaded. Instead, the client checks that the manifest file format is correct, all file paths exist, all files are unique, Provenance can be set (optional), and the parent synID exists. The number of files and the total upload size are also summarized in the dryRun output. This helps ensure your data upload does not end prematurely due to a typo in a file path or parent synID.
Validate the manifest in the Python client or command line.
Validate the manifest in the R client.
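Before calling dryRun, you could catch some of the same problems locally. The Python sketch below is a hypothetical helper (preflight is not part of any Synapse client) that checks for the required columns, missing or repeated file paths, and parent values that do not look like synIDs. It assumes every path is a local file and that "unique" means no path is listed twice, and it cannot confirm that the parent exists in Synapse; only the client's dryRun does that.

```python
import csv
import os
import re

# A synID such as syn123: "syn" followed by digits.
SYN_ID = re.compile(r"^syn\d+$")


def preflight(manifest_path):
    """Return a list of problems found in a tab-separated upload manifest.

    Hypothetical local check covering only part of what dryRun does; it
    cannot confirm that a parent actually exists in Synapse.
    """
    problems = []
    with open(manifest_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        header = reader.fieldnames or []
        for column in ("path", "parent"):
            if column not in header:
                problems.append(f"missing required column: {column}")
        if problems:
            return problems
        seen = set()
        # Rows are numbered from 1, not counting the header.
        for row_no, row in enumerate(reader, start=1):
            path = row["path"] or ""
            parent = row["parent"] or ""
            if not os.path.isfile(path):
                problems.append(f"row {row_no}: file not found: {path}")
            if path in seen:
                problems.append(f"row {row_no}: duplicate path: {path}")
            seen.add(path)
            if not SYN_ID.match(parent):
                problems.append(f"row {row_no}: parent is not a synID: {parent}")
    return problems
```

An empty list means the local checks passed; the manifest can then be validated against Synapse with synapseutils.syncToSynapse(syn, "filesToUpload.tsv", dryRun=True), where syn is a logged-in synapseclient.Synapse object.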
...
Files can be downloaded in bulk using the syncFromSynapse function. This function allows you to download all the Files in a Folder or Project, along with all the annotations and Provenance on those Files. A manifest file called SYNAPSE_METADATA_MANIFEST.tsv that contains the metadata will also be written to the download path.
Download Files in the Python client or command line.
Download Files in the R client.
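Because SYNAPSE_METADATA_MANIFEST.tsv is an ordinary tab-separated file, you can query it with standard tools. The sketch below assumes that annotations appear as extra columns, as in the upload manifest, and compares values as plain strings, which suits single-valued annotations only; rows_with_annotation is a hypothetical helper name.

```python
import csv


def rows_with_annotation(manifest_path, key, value):
    """Return the rows of a tab-separated manifest whose `key` column equals `value`.

    Assumes annotations are stored as extra columns; rows without the
    column never match.
    """
    with open(manifest_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return [row for row in reader if row.get(key) == value]
```

For example, after downloading a Folder with synapseutils.syncFromSynapse(syn, "syn123", path="./download"), calling rows_with_annotation("./download/SYNAPSE_METADATA_MANIFEST.tsv", "species", "cat") would return the rows for Files annotated with species: cat.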
Editing in Bulk
You can edit Files in bulk by modifying values in the manifest and re-uploading it to Synapse using syncToSynapse. The manifest lets you modify file paths, Provenance, annotations, and versions. If the files have not changed and you only want to update their annotations, add a column called forceVersion to the manifest with the value False for each row. This will stop syncToSynapse from uploading new versions of the files.
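Adding the forceVersion column by hand is error-prone for large manifests. A minimal Python sketch, assuming a tab-separated manifest with a header row, copies the manifest and sets forceVersion to False on every row; the function name disable_new_versions is hypothetical.

```python
import csv


def disable_new_versions(src_path, dst_path):
    """Copy a manifest, setting forceVersion to False on every row.

    Intended for annotation-only edits, so that syncToSynapse does not
    upload new versions of unchanged files.
    """
    with open(src_path, newline="") as src:
        reader = csv.DictReader(src, delimiter="\t")
        fieldnames = list(reader.fieldnames or [])
        rows = list(reader)
    # Add the column only if the manifest does not already have it.
    if "forceVersion" not in fieldnames:
        fieldnames.append("forceVersion")
    with open(dst_path, "w", newline="") as dst:
        writer = csv.DictWriter(
            dst, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
        )
        writer.writeheader()
        for row in rows:
            row["forceVersion"] = "False"
            writer.writerow(row)
```

Pass the output file, rather than the original, to syncToSynapse when you only want to update annotations.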
...
Please note that you cannot move items in Synapse with a manifest. If the parentID is changed, a copy is created and the File will exist in two different locations.
Info |
---|
Note: Changing the parent synID in a manifest creates a copy of the File. It does not move it. |
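Because a changed parent produces a copy, it can help to compare an edited manifest against the one you started from before re-uploading. This hypothetical sketch matches rows by their path column (an assumption: it only works if you did not also edit the paths) and reports every row whose parent differs.

```python
import csv


def load_parents(manifest_path):
    """Map each manifest row's path to its parent synID."""
    with open(manifest_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row["path"]: row["parent"] for row in reader}


def parent_changes(original_path, edited_path):
    """List (path, old_parent, new_parent) for rows whose parent was edited."""
    before = load_parents(original_path)
    after = load_parents(edited_path)
    return [
        (path, before[path], after[path])
        for path in sorted(after)
        if path in before and before[path] != after[path]
    ]
```

Any row it reports would create a second copy of that File in the new parent Folder rather than moving it.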
See Also
...