mirror local folder structure for bulk upload
A user asked how to bulk upload thousands of files spread across hundreds of folders.
We have an in-house script that creates the required folder hierarchy and a sync manifest:
Combined with the command-line 'synapse sync', this solves the problem. However, the user had no way of discovering this: the bulk upload documentation only explains how to use sync, which is not very practical when you have to create the manifest by hand.
We should migrate this script to the client with more testing and support.
Another user on the same project posted a tool that he had written to do just that:
This is more or less my use case. There's the issue I describe in my comment above, where files have non-unique names but are indexed by their path prefix. Right now we have a not-too-easy-to-find script that creates the directory structure on Synapse before syncing files to their respective parents. It would be nice to have a parameter in syncToSynapse that preserves the directory structure. But this implies we have a manifest file to use with syncToSynapse, which brings me to my second issue.
We have a convenience function, syncToSynapse, which bulk uploads files to Synapse given a manifest file. But creating this manifest file is anything but convenient. The client includes no function that creates a minimal manifest (one with just the columns path and parent), so users need to implement their own version of our script just to traverse their local directory and output a minimal manifest. This should be handled by the client, imo.
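A minimal sketch of the kind of helper the client could provide, assuming the manifest only needs the path and parent columns described above. `build_manifest`, `write_manifest`, and the `create_folder` callback are hypothetical names, not part of synapseclient. The callback keeps folder creation (which needs a logged-in Synapse session) separate from the directory walk, so the walk can be tested offline. Design assumption: files directly under the local root go into the given parent, and each subdirectory becomes a Synapse folder under its parent's folder.

```python
import csv
import os
from typing import Callable, Dict, List, Tuple

# Hypothetical callback: (folder name, parent Synapse ID) -> new folder's Synapse ID.
CreateFolder = Callable[[str, str], str]


def build_manifest(root: str, root_parent_id: str,
                   create_folder: CreateFolder) -> List[Tuple[str, str]]:
    """Mirror the directories under `root` and return (path, parent) manifest rows.

    Files directly in `root` go under `root_parent_id`; every subdirectory
    becomes a folder created via `create_folder` under its parent's folder.
    """
    root = os.path.abspath(root)
    folder_ids: Dict[str, str] = {root: root_parent_id}
    rows: List[Tuple[str, str]] = []
    # os.walk is top-down by default, so a directory's folder ID is always
    # recorded before any of its children are visited.
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sort in place so traversal order is deterministic
        parent_id = folder_ids[dirpath]
        for name in dirnames:
            folder_ids[os.path.join(dirpath, name)] = create_folder(name, parent_id)
        for name in sorted(filenames):
            rows.append((os.path.join(dirpath, name), parent_id))
    return rows


def write_manifest(rows: List[Tuple[str, str]], manifest_path: str) -> None:
    """Write a tab-separated manifest with the minimal path and parent columns."""
    with open(manifest_path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t")
        writer.writerow(["path", "parent"])
        writer.writerows(rows)
```

With synapseclient, the callback could be something like `lambda name, parent: syn.store(synapseclient.Folder(name, parent=parent)).id`, after which the written file would be passed to `synapseutils.syncToSynapse(syn, "manifest.tsv")`. Paths are written as absolute paths so the manifest does not depend on the working directory.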
I'm envisioning a future where syncToSynapse is as easy to use as aws s3 sync.
Thanks for the details. I really appreciate it.
My example is from the second Parkinson’s mobile sensor DREAM challenge (https://www.synapse.org/#!Synapse:syn20540161). An external source provided files organized in a file hierarchy, but ultimately the files themselves have only a few distinct names.
Nobody should be using these files directly; we will derive other, smaller files from them before sharing with participants, and we want to codify this process from source to finish. Hence the need to dump the files as-is on Synapse.
Your point is well taken. And this ticket is about upload, not download, so you are in the right place. Ljubo is redesigning how bulk upload and annotation work. I will talk to him about this use case, but please also help me make your voice heard.
Let’s revive this conversation. Although this is about mirroring the Synapse folder hierarchy when using syncFromSynapse, I want to mirror my local file hierarchy on Synapse when using syncToSynapse.
e.g., if my manifest file looks like
and I run syncToSynapse within a runtime where these relative paths make sense, the accelvals files shouldn't be overwritten on Synapse, leaving me with one file and no folder hierarchy.
I don't think I'm the first person to upload a huge manifest where files are repetitively named but organized in a folder hierarchy, only to find a few files on Synapse after all my hard work. There's no mention in the docstring that the folder hierarchy will be flattened.
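Until the client preserves the hierarchy, a pre-flight check could catch this before uploading. Synapse requires sibling entities to have unique names, so manifest rows that share a parent and a file name are expected to end up as a single entity on Synapse, as described above. The sketch below is hypothetical (`find_name_collisions` is not a client function), and it assumes the manifest may carry an optional `name` column that overrides the file's base name.

```python
import csv
import os
from collections import defaultdict
from typing import Dict, List, Tuple


def find_name_collisions(manifest_path: str) -> Dict[Tuple[str, str], List[str]]:
    """Group manifest rows by (parent, entity name) and return groups with >1 path.

    The entity name is taken from the optional `name` column when it is present
    and non-empty (an assumption), otherwise from the file's base name.
    """
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    with open(manifest_path, newline="") as fh:
        for row in csv.DictReader(fh, delimiter="\t"):
            name = row.get("name") or os.path.basename(row["path"])
            groups[(row["parent"], name)].append(row["path"])
    # Only keys shared by two or more paths would overwrite each other.
    return {key: paths for key, paths in groups.items() if len(paths) > 1}
```

An empty result means every file should land as its own entity; any non-empty result lists the paths that would collapse into one.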