Explore moving to /filehandle/batch and async over downloadTableColumn
Downloading lots of small files is still a problematic issue with the clients. We have two possibilities: either download each file sequentially, or ask a worker to package the files up into a zip file. The former is extremely slow, and the latter can also be slow because it requires two operations (download of the files to workers, followed by zipping and download of the zip); additionally, it can block if workers are busy. We should benchmark using /fileHandle/batch with either a queue/consumer or an async/await model for downloading these files in batch directly to the client.
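A minimal sketch of what the async/await model could look like, assuming one batch call resolves file handles to presigned URLs which are then fetched concurrently with bounded concurrency. The names `get_batch_presigned_urls` and `download_one` are hypothetical stand-ins (here simulated) for a POST to /fileHandle/batch and an HTTP GET of each presigned URL:

```python
import asyncio

MAX_CONCURRENCY = 8  # bound parallelism so we don't overwhelm the client or S3


async def get_batch_presigned_urls(file_handle_ids):
    # Hypothetical placeholder: one POST to /fileHandle/batch would return
    # a presigned S3 URL per file handle. Simulated here.
    return {fh: f"https://s3.example.com/{fh}" for fh in file_handle_ids}


async def download_one(sem, url, dest):
    # Hypothetical placeholder: stream an HTTP GET of the presigned URL
    # to dest. Simulated as a no-op here.
    async with sem:
        await asyncio.sleep(0)
        return dest


async def download_batch(file_handle_ids, dest_dir):
    # One round trip for the whole batch, then N bounded-concurrency fetches.
    urls = await get_batch_presigned_urls(file_handle_ids)
    sem = asyncio.Semaphore(MAX_CONCURRENCY)
    tasks = [
        download_one(sem, url, f"{dest_dir}/{fh}") for fh, url in urls.items()
    ]
    return await asyncio.gather(*tasks)


paths = asyncio.run(download_batch(["fh1", "fh2", "fh3"], "/tmp/data"))
```

The key benefit to benchmark is that many small fetches overlap their network latency directly against S3, rather than serializing per file or waiting on a worker to build a zip.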
Yes. An example is data from mPower, where it takes us 5 days to download all of the data with 4 parallel threads to an EC2 instance in the same region as Synapse. This could likely be halved or better if we fetch in parallel directly from S3. Of note, we have very similar code in syncToSynapse that could be extended for this specific case, but ideally we would solve this by completing:
and then hooking into the same mechanism for table files. This is harder, however: it is relatively easy to solve SYNPY-682 on its own, but ideally we would address both issues by parallelizing downloads of big files as well as parallelizing across many small files.
Do you have any data about download times that could help here? We have expanded our worker army quite a bit, so hopefully you won't see issues with workers being too busy, but I'm not sure how big a priority this is.