...
Accelerate upload of small files: Each upload to Synapse incurs a fixed overhead of several web requests. Synapse is optimized for transferring large files, for which this overhead is negligible, but uploading many small (say, ~1 KB) files is slow. A leading use case is the upload of measurements from mobile devices, collected by Bridge. A solution is for the client (Bridge) to upload directly to the S3 bucket.
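To make the overhead argument concrete, here is a back-of-the-envelope sketch of total web-request counts for the two paths. The per-file request count for a Synapse upload is an illustrative assumption, not measured API behavior:

```python
# Illustrative request-count comparison for uploading n small files.
# ASSUMPTION: each upload through Synapse costs ~3 web requests
# (e.g., initiate, transfer, confirm); the exact number may differ.

def requests_via_synapse(n_files, per_file_overhead=3):
    # Per-file overhead dominates when files are tiny.
    return n_files * per_file_overhead

def requests_via_sts(n_files):
    # One STS token request up front, then one S3 PUT per file.
    return 1 + n_files

if __name__ == "__main__":
    n = 10_000
    print(requests_via_synapse(n))  # 30000
    print(requests_via_sts(n))      # 10001
```

The point is that direct S3 upload amortizes the Synapse-side overhead down to a single token request for the whole batch.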
To allow use of existing workflow tools: Some popular tools are built to access S3 directly. A leading example is Apache Spark, a cluster-computing engine built on the Hadoop ecosystem. It has an AmazonS3 connector allowing it to work with S3 directly. (I believe the code is here.) Working with Synapse instead would require writing and maintaining a Synapse-specific connector. The significant effort that would take argues for simply letting tools access the underlying bucket. Groups needing to access Synapse data through Spark include Sage Sys Bio (Tess Thayer) and the BMGF.
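As a sketch of how this would look in practice, Spark's s3a connector can consume temporary STS credentials directly; the property values below are placeholders to be filled from the STS token response (spark-defaults.conf style):

```properties
# Use temporary (STS) credentials with the Hadoop s3a connector.
spark.hadoop.fs.s3a.aws.credentials.provider  org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider
spark.hadoop.fs.s3a.access.key      <accessKeyId from the STS token>
spark.hadoop.fs.s3a.secret.key      <secretAccessKey from the STS token>
spark.hadoop.fs.s3a.session.token   <sessionToken from the STS token>
```

With that in place, Spark jobs can read the Synapse-managed bucket directly, e.g. `spark.read.parquet("s3a://<bucket>/<sts-folder>/")`, with no Synapse-specific connector required.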
To move large quantities of data. (Will P. gave the example of bucket-to-bucket copy, but STS doesn't cover that case, since the token only grants access to the bucket to which the STS folder is linked.)
What tools?
TODO: Ask Kelsey what tools they will use in PsychEncode. What specific workflows?
Potential Solutions
Separate S3 Access from Synapse
...