download ending early and not restarting from previous spot

Description

Context:

https://www.synapse.org/#!Synapse:syn2580853/discussion/threadId=1807

User experienced high rate of restarts, but each time had downloaded a significant portion of the file and then the download restarted from the beginning (because of md5 mismatch on the whole file).

We should check the logic to make sure there are no unexpected ways for a download to appear complete when it isn't, and find out whether it is possible to restart from the previous point.

Environment

None

Activity

Ziming Dong
April 20, 2017, 2:20 AM
Edited

This is sort of related to the current issue: our progress bar percentage calculation is wrong.

The 'content-length' header is the size of the response body. However, if the body is encoded in any way (e.g. gzip), then 'content-length' no longer reflects the size of the downloaded file. When we use a data chunk from response.iter_content(), the chunk has already been decoded, so len(chunk) is not guaranteed to equal the number of bytes read from the body.

One example is from test_command_line_client.py:test_command_line_store_and_submit() line 505. It tries to download an external link: https://www.synapse.org/Portal/clear.cache.gif.
For this file, 'content-length' is 55 but the actual decoded size is 43, since gzip encoding adds extra headers. We end up with this being displayed at the end of the file download:

However, for larger files the encoded size should be smaller than the actual decoded file size, so I'm still not sure how the user is getting his error. This is just something I noticed when I added logic for comparing written bytes to 'content-length'.
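
To illustrate the size mismatch described in this comment, here is a minimal offline sketch. It uses only the standard-library gzip module instead of a real HTTP response, and the helper name body_and_decoded_sizes and both payloads are made up for the example. The encoded length stands in for 'content-length', and the decoded length stands in for the sum of len(chunk) over the decoded chunks.

```python
import gzip
from typing import Tuple


def body_and_decoded_sizes(payload: bytes) -> Tuple[int, int]:
    """Return (gzip-encoded body size, size after decoding).

    The first value plays the role of 'content-length' for a
    gzip-encoded response; the second is what the decoded chunks
    would add up to. Hypothetical helper, for illustration only.
    """
    encoded = gzip.compress(payload)
    return len(encoded), len(gzip.decompress(encoded))


# A tiny payload with no repetition: the gzip header and trailer
# outweigh any savings, so the encoded body is larger than the file.
tiny_encoded, tiny_decoded = body_and_decoded_sizes(bytes(range(43)))

# A large, highly repetitive payload: the encoded body is much smaller.
large_encoded, large_decoded = body_and_decoded_sizes(b"x" * 100_000)

print(tiny_encoded, tiny_decoded)
print(large_encoded, large_decoded)
```

Either way, a progress bar that divides decoded bytes by 'content-length' will not end at exactly 100% when the body is encoded.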

Ziming Dong
April 20, 2017, 4:28 PM

Still can't reproduce the download-stopping-early bug, but retrying when bytes transferred < 'content-length' header should work properly now.
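
For reference, the retry idea can be sketched roughly as below. This is not the client's actual code: download_with_resume, fetch_from and flaky_fetch are hypothetical names, and the sketch assumes the body is not content-encoded, so that byte offsets refer to bytes of the file itself. In a real client, fetch_from(offset) would presumably issue a GET with a "Range: bytes=<offset>-" header so the transfer continues from the previous point instead of starting over.

```python
import io
from typing import BinaryIO, Callable, Iterable


def download_with_resume(
    fetch_from: Callable[[int], Iterable[bytes]],
    expected_size: int,
    out: BinaryIO,
    max_attempts: int = 5,
) -> int:
    """Write chunks to `out`, re-requesting from the current offset
    whenever a transfer ends before `expected_size` bytes arrived.

    `fetch_from(offset)` is a hypothetical callable that yields the
    file's bytes starting at `offset`.
    """
    written = 0
    for _ in range(max_attempts):
        if written >= expected_size:
            break
        try:
            for chunk in fetch_from(written):
                out.write(chunk)
                written += len(chunk)
        except IOError:
            # Connection dropped mid-transfer; keep what we have and
            # retry from the current offset on the next attempt.
            pass
    if written != expected_size:
        raise IOError(
            "expected %d bytes, got %d" % (expected_size, written)
        )
    return written


# Simulated server that closes the connection after at most 4 bytes.
DATA = b"0123456789"


def flaky_fetch(offset: int) -> Iterable[bytes]:
    yield DATA[offset:offset + 4]


buf = io.BytesIO()
total = download_with_resume(flaky_fetch, len(DATA), buf)
```

With this shape, a transfer that ends early costs only the missing bytes, and the md5 check on the whole file runs once, after the byte count matches.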

Kenneth Daily
May 15, 2017, 4:29 PM

Tried again with original test and could not get the same error as before.

Assignee

Ziming Dong

Reporter

Kenneth Daily

Labels

None

Validator

Kenneth Daily

Development Area

None

Release Version History

None

Fix versions

Priority

Major