Support range requests on synapse gets
A user is requesting support for range requests on synapse gets to avoid having to retrieve a whole file.
We already use range requests internally to implement multi-threaded download, so this should not be too difficult (at least for S3 downloads), although there may be some interactions with the synapse cache.
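For context, a multi-part download built on range requests can be sketched roughly as below. This is a minimal illustration, not the client's actual implementation: `split_into_ranges` and `range_header` are hypothetical helper names, and the part size is an arbitrary assumption.

```python
def split_into_ranges(total_size, part_size):
    """Return inclusive (start, end) byte offsets that cover total_size bytes.

    Hypothetical helper for illustration; not part of synapseclient.
    """
    if total_size < 0 or part_size <= 0:
        raise ValueError("total_size must be >= 0 and part_size must be > 0")
    ranges = []
    start = 0
    while start < total_size:
        # HTTP byte ranges are inclusive on both ends, hence the "- 1".
        end = min(start + part_size, total_size) - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def range_header(start, end):
    """Build the HTTP Range header for one inclusive byte range."""
    return {"Range": f"bytes={start}-{end}"}


if __name__ == "__main__":
    # Each (start, end) pair could be handed to a separate worker thread.
    for start, end in split_into_ranges(10, 4):
        print(range_header(start, end))
```

Exposing something like this to users would mostly mean letting the caller pick a single (start, end) pair instead of the client choosing all of them.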
Yes, you did the right thing by creating a Jira issue.
made me aware of this request on the forum. I’m not sure how we typically handle random outside feature requests. I don’t necessarily think it’s high priority or anything, but I figured it should be logged.
thanks for the context.
Also, I think 's assessment of the LOE is spot on.
- from the request on the forum - “Is it possible to use the synapseclient to do a HTTP range request to get a portion of a file? For example, if there is a large bam file that I want to different worker nodes to operate on different regions of the bam, I do not want each worker to have to download the entire bam file.”
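To make the request concrete, here is a rough sketch of what a single worker would do at the HTTP level, using only the Python standard library. It assumes the worker already has a direct download URL for the file (for example a pre-signed URL) and that the server honors range requests; `build_range_request` and `fetch_range` are hypothetical names, not synapseclient functions.

```python
import urllib.request


def build_range_request(url, start, end):
    """Create a GET request for the inclusive byte range [start, end]."""
    if start < 0 or end < start:
        raise ValueError("need 0 <= start <= end")
    return urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})


def fetch_range(url, start, end, timeout=30):
    """Download only bytes start..end of the resource at url.

    Assumes the server supports range requests; a server that ignores the
    Range header answers 200 with the whole body, which we treat as an error.
    """
    request = build_range_request(url, start, end)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        if response.status != 206:  # 206 Partial Content
            raise RuntimeError(f"expected 206 Partial Content, got {response.status}")
        return response.read()
```

Note that this only covers fetching bytes. Mapping a genomic region of a BAM file to byte offsets would presumably need the BAM index, which is outside what this sketch shows.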
Information on what a bam file is from https://software.broadinstitute.org/software/igv/BAM
“A BAM file (.bam) is the binary version of a SAM file. A SAM file (.sam) is a tab-delimited text file that contains sequence alignment data. These formats are described on the SAM Tools web site: http://samtools.github.io/hts-specs/.
BAM, rather than SAM, is the recommended format for IGV. Starting with IGV 2.0.11, IUPAC ambiguity codes in BAM files are supported.”
If there haven’t been other requests for this type of feature, I don’t think it’s something we should prioritize. I’m going to ask folks on Slack.
I've never heard such a request. Did the requester give additional context, e.g. what sort of file they are hoping to read part of and how big it is? If we decide to do this, we should make sure that the range request is performant, i.e. that the time to download doesn't increase as the 'range' goes deeper into the file.
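One way to check the performance concern would be to time same-sized reads at increasing offsets and compare. The sketch below is only an assumption about how such a check could look: `time_range_reads` is a hypothetical helper, and `fetch` stands in for whatever function ends up performing the range request (it takes an inclusive start and end offset and returns bytes).

```python
import time


def time_range_reads(fetch, offsets, length):
    """Time one read of `length` bytes at each offset.

    `fetch(start, end)` is a caller-supplied callable (an assumption here)
    that returns the bytes in the inclusive range [start, end].
    Returns a dict mapping offset -> elapsed seconds.
    """
    timings = {}
    for offset in offsets:
        started = time.perf_counter()
        data = fetch(offset, offset + length - 1)
        timings[offset] = time.perf_counter() - started
        if len(data) != length:
            raise RuntimeError(f"short read at offset {offset}: got {len(data)} bytes")
    return timings


if __name__ == "__main__":
    # Stand-in fetch that slices an in-memory buffer instead of hitting a server.
    blob = bytes(1_000_000)
    fake_fetch = lambda start, end: blob[start:end + 1]
    for offset, seconds in time_range_reads(fake_fetch, [0, 500_000, 999_000], 1000).items():
        print(offset, f"{seconds:.6f}s")
```

If the timings stay roughly flat as the offset grows, reads deep into the file cost about the same as reads near the start.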