Review contributed download improvements/methods
A collaborator with Gates is investigating how to download from Synapse in various ways. He has written a CLI that implements several download approaches. Would you like to evaluate this with me?
From Patrick today:
I don't recall the exact numbers, but I believe the current code (heavily using asyncio) was the fastest, or at least on par with multiple threads.
I decided to go with asyncio over threads since it greatly simplifies the code and fits the use case (heavy IO).
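For anyone evaluating this, here is a minimal sketch of the general asyncio pattern for I/O-bound downloads, with a semaphore capping how many run at once. It is not taken from synapse-downloader: `fake_download`, `download_all`, and `max_concurrent` are hypothetical names, and the network call is simulated with `asyncio.sleep`.

```python
import asyncio


async def fake_download(file_id: str, delay: float = 0.01) -> str:
    # Hypothetical stand-in for a network request; a real download would
    # await an HTTP client here instead of sleeping.
    await asyncio.sleep(delay)
    return f"{file_id}.done"


async def download_all(file_ids, max_concurrent: int = 10):
    # The semaphore limits how many downloads are in flight at once.
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(file_id: str) -> str:
        async with semaphore:
            return await fake_download(file_id)

    # gather() returns results in the same order as the inputs.
    return await asyncio.gather(*(bounded(f) for f in file_ids))


results = asyncio.run(download_all([f"syn{i}" for i in range(5)]))
```

All downloads share one thread and one event loop, so there is no locking to manage, which is likely the simplification being described.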
This package can also use a file view (--with-view, https://github.com/ki-tools/synapse-downloader/blob/master/src/synapse_downloader/download/file_handle_view.py) which speeds it up a lot on huge projects. The file view is used to build a cache of all the dataFileHandleIds.
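To illustrate the cache idea, here is a rough sketch that assumes the file view rows have already been fetched and arrive as dictionaries with `id` and `dataFileHandleId` keys. The row shape and the `build_file_handle_cache` helper are assumptions for illustration, not the package's actual code.

```python
def build_file_handle_cache(view_rows):
    """Map each entity ID to its dataFileHandleId from file view rows."""
    cache = {}
    for row in view_rows:
        handle_id = row.get("dataFileHandleId")
        # Skip rows without a file handle (assumed possible in this sketch).
        if handle_id is not None:
            cache[row["id"]] = handle_id
    return cache


# Hypothetical rows, as they might look after querying a file view.
rows = [
    {"id": "syn1", "dataFileHandleId": "101"},
    {"id": "syn2", "dataFileHandleId": "102"},
    {"id": "syn3", "dataFileHandleId": None},
]
cache = build_file_handle_cache(rows)
```

With a cache like this, looking up a file handle becomes a dictionary access rather than one request per entity, which would explain the gain on huge projects.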
I built some pure async API methods (https://github.com/ki-tools/synapse-downloader/blob/master/src/synapse_downloader/core/synapse_proxy.py#L98) that really helped too, and wrapped some of the existing methods (https://github.com/ki-tools/synapse-downloader/blob/master/src/synapse_downloader/core/synapse_proxy.py#L58) to make them async.
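One common way to wrap an existing blocking method so it can be awaited is to run it in the event loop's default thread pool. The sketch below shows that general technique only; `blocking_get_entity` and `run_async` are hypothetical names, and the linked synapse_proxy.py may do this differently.

```python
import asyncio
import functools
import time


def blocking_get_entity(entity_id: str) -> dict:
    # Hypothetical stand-in for an existing synchronous client call.
    time.sleep(0.01)
    return {"id": entity_id}


async def run_async(func, *args, **kwargs):
    # Run the blocking function in the loop's default thread pool so the
    # event loop stays free to schedule other tasks.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, functools.partial(func, *args, **kwargs))


async def main():
    calls = (run_async(blocking_get_entity, f"syn{i}") for i in range(3))
    return await asyncio.gather(*calls)


entities = asyncio.run(main())
```

The wrapped calls still occupy worker threads, so a purely async method that awaits the HTTP request directly avoids that overhead.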
We have been using this package daily in one of our production environments without any issues, so it should be pretty solid. This code was rushed a bit, though, and I'm sure there are some improvements that could be made.
I'd be happy to review and chat about this with you and/or your new engineer, just let me know.
A heads up: Patrick (the code's author) has made some changes that no longer expose some of the utilities he previously had. I have a fork of the repository that is still at the state it was in when we were discussing the speed issues:
I think we should assign this to Jordan Kiang once he's on board. For now I will assign it to myself.
Update from the contributor:
The real speed improvements we've seen are on huge Projects. The one we are testing has 56,000+ files and around 80GB of data.
So far this class is the winner (it uses the entity view, thanks for the tip!).
Also, using asyncio appears to make a big difference in speed.
Thanks for reaching out, I would be happy to look into this. I ran some tests of the spccore multithreaded download I wrote on Wednesday, so it would be interesting to compare speed among other things.