The user experienced a high rate of restarts: each time, a significant portion of the file had been downloaded before the download restarted from the beginning (because of an md5 mismatch on the whole file).
We should check the logic to make sure there aren't unexpected ways a download can appear to be completed when it isn't, and see whether it's possible to resume from the previous point instead of restarting.
This is sort of related to the current issue: our progress bar percentage calculation is wrong.
The 'content-length' header is the size of the response body as sent on the wire. However, if the body is encoded in any way (e.g. gzip), 'content-length' no longer reflects the size of the downloaded file. When we use a data chunk from response.iter_content(), the chunk has already been decoded, so len(chunk) is not guaranteed to equal the number of bytes read from the body.
One example is from test_command_line_client.py:test_command_line_store_and_submit() line 505. It tries to download an external link: https://www.synapse.org/Portal/clear.cache.gif.
For this file, 'content-length' is 55 but the actual decoded size is 43, since the extra gzip framing makes the encoded body larger than this tiny file. We end up with this being displayed at the end of the file download:
However, for larger files the encoded size should be smaller than the decoded file size, so I'm still not sure how the user is getting his error. This is just something I noticed when I added logic for comparing written bytes to 'content-length'.
Still can't reproduce the download-stopping-early bug, but retrying when bytes transferred < the 'content-length' header should work properly now.
Tried again with the original test and could not get the same error as before.