mirror local folder structure for bulk upload

Description

A user asked how to bulk upload thousands of files spread across hundreds of folders.

We have an in-house script that builds a sync manifest and creates the required folder hierarchy:

https://github.com/Sage-Bionetworks/synAnnotationUtils/blob/master/bin/sync_manifest.py

Combined with the command-line 'synapse sync', this solves the problem. The user had no way of discovering this: the bulk upload documentation only explains how to run the sync, which is not practical when the manifest has to be created by hand.

We should migrate this script to the client with more testing and support.

Another user on the same project posted a tool that he had written to do just that:

https://github.com/pcstout/synapse_uploader

Environment

None

Activity

Phil Snyder
July 14, 2020, 7:05 PM
Edited

This is more or less my use case. There’s the issue I describe in my comment above, where files have non-unique names but are indexed by their path prefix. Right now we have a not-too-easy-to-find script that will create the directory structure on Synapse before syncing files to their respective parents. It would be nice to have a parameter on syncToSynapse that preserves directory structure. But this implies we have a manifest file to use with syncToSynapse, which brings me to my second issue.

We have a convenience function, syncToSynapse, which will bulk upload files to Synapse given a manifest file. But creating this manifest file is anything but convenient. There are no functions included with the client that will create a minimal manifest – with columns path and parent. So users need to implement their own version of our script just to traverse their local directory and output a minimal manifest. This should be handled by the client imo.
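A minimal sketch of the kind of helper described here, using only the standard library. It walks a local directory and writes a tab-separated manifest with the two columns named above, path and parent. The function names and the parent_for_dir callback are hypothetical, not part of the Synapse client; the callback stands in for whatever step creates or looks up the matching Synapse folder and returns its ID.

```python
import csv
import os
from typing import Callable, Iterator, Tuple


def iter_manifest_rows(
    root: str, parent_for_dir: Callable[[str], str]
) -> Iterator[Tuple[str, str]]:
    """Yield (path, parent) for every file under root.

    parent_for_dir is a hypothetical callback: it receives a directory
    path relative to root ("." for root itself) and returns the Synapse
    ID of the folder that should hold that directory's files.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        parent = parent_for_dir(os.path.relpath(dirpath, root))
        for name in sorted(filenames):
            yield os.path.join(dirpath, name), parent


def write_manifest(
    root: str, parent_for_dir: Callable[[str], str], manifest_path: str
) -> int:
    """Write a minimal path/parent manifest and return the number of file rows."""
    count = 0
    with open(manifest_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["path", "parent"])
        for row in iter_manifest_rows(root, parent_for_dir):
            writer.writerow(row)
            count += 1
    return count
```

Because each local directory gets its own parent, files that share a name in different directories end up under different folders rather than colliding.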

I’m envisioning a future where syncToSynapse is as easy to use as aws s3 sync.

Kimyen Truong
July 29, 2019, 9:43 PM

Thanks for the details. I really appreciate it.

Phil Snyder
July 29, 2019, 9:37 PM

My example is from the second Parkinson’s mobile sensor DREAM challenge (https://www.synapse.org/#!Synapse:syn20540161). An external source provided files organized in a file hierarchy, but ultimately the files themselves have only a few distinct names.

Nobody should be using these files directly – we will derive other, smaller files from them before sharing with participants – and we want to codify this process from source to finish. Hence the need to dump the files as-is on Synapse.

Kimyen Truong
July 18, 2019, 5:32 PM

Your point is well taken, and this ticket is about upload, not download, so you are in the right place. Ljubo is redesigning how bulk upload and annotation work. I will talk to him about this use case, but please also help me make your voice heard.

Phil Snyder
July 18, 2019, 5:12 PM
Edited

Let’s revive this conversation. Although this is about mirroring the Synapse folder hierarchy when using syncFromSynapse, I want to mirror my local file hierarchy on Synapse when using syncToSynapse.

e.g., if my manifest file looks like

 

and I run syncToSynapse within a runtime where these relative paths make sense, accelvals shouldn’t be written over on Synapse, leaving me with one file and no folder hierarchy.

I don’t think I’m the first person to upload a huge manifest where files are repetitively named but organized in a folder hierarchy, only to find a few files on Synapse after all my hard work. There’s no mention in the docstring that the folder hierarchy will be flattened.
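One way to catch this before uploading is to check the manifest for rows that share both a parent and a file name. The sketch below assumes the manifest has already been read into (path, parent) pairs; the function name and the example paths and IDs are hypothetical.

```python
import os
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_name_collisions(
    rows: Iterable[Tuple[str, str]]
) -> Dict[Tuple[str, str], List[str]]:
    """Return {(parent, file name): [paths]} for names that occur more than once.

    Any group returned here holds local files that would land under the
    same parent with the same name, so they would not all survive as
    separate files.
    """
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for path, parent in rows:
        groups[(parent, os.path.basename(path))].append(path)
    return {key: paths for key, paths in groups.items() if len(paths) > 1}


# Hypothetical rows: two files named accelvals sent to the same parent.
rows = [
    ("subject_1/accelvals", "syn000"),
    ("subject_2/accelvals", "syn000"),
    ("subject_1/notes", "syn000"),
]
collisions = find_name_collisions(rows)
```

An empty result means every file in the manifest has a unique name within its parent.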


Assignee

Jordan Kiang

Reporter

Kenneth Daily