improve download speeds to be comparable to AWS

Description

Our data sizes are increasing, specifically for whole genome sequencing (500GB-1TB per file). Our download speeds are significantly slower than using AWS clients - my tests show at least 4x slower. In cases where we have data in external S3 buckets, our users are circumventing Synapse to get improved download speeds, and are (rightfully) complaining that data that does reside in Synapse S3 storage is slow to download when they know what kinds of speeds S3 can offer.

We have made significant improvements to data upload, and it has been indicated that similar improvements to download are likely possible as well, though not trivially.

Environment

None

Activity

Justin Guinney
February 29, 2020, 7:28 PM

Tom, it’d be helpful if you formatted the table better, i.e., limit the precision of the numbers.

Ziming Dong
March 2, 2020, 11:01 PM

It doesn't look like we can speed up the MD5 calculations by adding byte chunks while it's still downloading, so I'm gonna move on.
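For context, here is a minimal sketch of what "adding byte chunks" to an MD5 looks like with Python's standard `hashlib`. It is not the client's actual code; the helper name `md5_of_chunks` and the sample data are made up for illustration. It also shows that MD5 depends on chunk order, so chunks would have to be fed to the digest sequentially. Whether that is the reason the idea was dropped here is not stated in this thread.

```python
import hashlib


def md5_of_chunks(chunks):
    """Feed byte chunks to a single MD5 digest, in the order given."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


# Hypothetical sample data: 16 KiB made of distinct 4-byte counters,
# split into 4 KiB chunks so that every chunk is different.
data = b"".join(i.to_bytes(4, "big") for i in range(4096))
chunk_size = 4096
chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

whole = hashlib.md5(data).hexdigest()
in_order = md5_of_chunks(chunks)
out_of_order = md5_of_chunks(reversed(chunks))

print("in order matches whole file:", in_order == whole)
print("out of order matches whole file:", out_of_order == whole)
```

Feeding the chunks in file order reproduces the whole-file digest; feeding them in any other order does not.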

My next step is working on a responsiveness issue: if the user signals to stop the download process via KeyboardInterrupt (Ctrl + C), the workers continue to do work until whatever is left on the queue is exhausted. This can be problematic because it may be another 30 seconds or so before the download progress bar stops moving, leading users to believe that the program did not respond. If they continue to spam Ctrl + C, the program will not clean up threads properly, leaving orphaned threads that take up resources.
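One common way to make workers stop promptly is a shared `threading.Event` that each worker checks before taking more work off the queue. The sketch below is only an illustration of that pattern, not the fix that went into the client; `run_download`, `worker`, and the doubled-integer "download" are hypothetical stand-ins.

```python
import queue
import threading


def worker(tasks, stop_event, results):
    """Process queued items until the queue is empty or a stop is requested."""
    while not stop_event.is_set():
        try:
            item = tasks.get_nowait()
        except queue.Empty:
            return
        results.append(item * 2)  # stand-in for downloading one chunk


def run_download(items, num_workers=4, stop_event=None):
    """Run workers over items; on Ctrl + C, signal them to stop and wait."""
    if stop_event is None:
        stop_event = threading.Event()
    tasks = queue.Queue()
    for item in items:
        tasks.put(item)
    results = []
    threads = [
        threading.Thread(target=worker, args=(tasks, stop_event, results))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    try:
        # Join with a timeout so the main thread stays responsive to Ctrl + C.
        for t in threads:
            while t.is_alive():
                t.join(timeout=0.1)
    except KeyboardInterrupt:
        stop_event.set()  # workers exit after finishing their current item
        for t in threads:
            t.join()
        raise
    return results
```

With this shape, a single Ctrl + C leaves at most one in-flight item per worker instead of the whole remaining queue, and every thread is joined before the interrupt propagates.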

Thomas Yu
March 12, 2020, 6:06 AM
Edited

Final test on t3.xlarge instance

file         0      1      2      3      4      average
1mb_syn      0.53   0.49   0.54   0.48   0.53   0.51
1mb_aws      0.85   0.86   0.75   0.78   0.76   0.80
2mb_syn      0.65   0.57   0.56   0.52   0.46   0.55
2mb_aws      0.75   0.77   0.78   0.60   0.70   0.72
4mb_syn      0.51   0.65   0.57   0.82   0.65   0.64
4mb_aws      1.03   0.73   0.72   0.79   0.77   0.81
8mb_syn      0.71   0.72   1.04   0.70   0.87   0.81
8mb_aws      0.67   0.68   1.24   1.02   0.85   0.89
16mb_syn     0.59   0.96   0.62   0.88   0.80   0.77
16mb_aws     0.81   0.98   1.06   0.88   0.94   0.93
32mb_syn     0.95   0.86   0.88   0.85   0.82   0.87
32mb_aws     1.17   0.95   1.52   1.22   1.21   1.21
64mb_syn     1.11   1.03   1.53   1.08   1.53   1.26
64mb_aws     1.15   1.19   1.14   1.34   1.29   1.22
128mb_syn    1.49   1.58   1.54   1.66   1.46   1.55
128mb_aws    1.61   1.49   1.72   1.82   1.46   1.62
256mb_syn    3.25   3.21   3.17   3.00   2.68   3.06
256mb_aws    2.96   2.61   2.38   2.66   2.59   2.64
512mb_syn    4.54   4.11   5.04   5.64   4.62   4.79
512mb_aws    4.37   4.38   4.33   4.35   4.28   4.34
1024mb_syn   8.27   7.91   9.14   7.77   8.22   8.26
1024mb_aws   7.74   7.91   7.75   7.97   7.78   7.83
2048mb_syn   15.16  16.30  16.96  15.64  16.64  16.14
2048mb_aws   14.93  15.40  14.97  15.12  15.30  15.15
4096mb_syn   32.58  31.69  31.68  31.60  29.28  31.37
4096mb_aws   29.57  29.92  29.68  32.15  29.95  30.25

Looks good to me! There is a very slight drop-off, but it's basically comparable.
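To put a number on the drop-off, the averages reported above can be turned into a syn/aws ratio per file size. This is a rough sketch that assumes the values are download times where lower is better (the table does not state units); the names `averages` and `ratios` are just for illustration.

```python
# Average values copied from the "average" results above,
# keyed by file size in MB: (syn, aws).
averages = {
    1: (0.51, 0.80),
    2: (0.55, 0.72),
    4: (0.64, 0.81),
    8: (0.81, 0.89),
    16: (0.77, 0.93),
    32: (0.87, 1.21),
    64: (1.26, 1.22),
    128: (1.55, 1.62),
    256: (3.06, 2.64),
    512: (4.79, 4.34),
    1024: (8.26, 7.83),
    2048: (16.14, 15.15),
    4096: (31.37, 30.25),
}

# Ratio above 1 means the syn average is higher than the aws average.
ratios = {size: syn / aws for size, (syn, aws) in averages.items()}

for size, ratio in ratios.items():
    print(f"{size}mb: syn/aws = {ratio:.2f}")
```

Under that assumption, a ratio below 1 means the Synapse average was lower than the AWS average for that size, and a ratio above 1 means it was higher.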

Justin Guinney
March 12, 2020, 2:36 PM

Could you provide a plot?

Thomas Yu
March 19, 2020, 2:30 AM

Please assist us with the release of the client by looking at the plots I provided. If they look good, please 'Close Issue' or let me know if I can close.

Assignee

Ziming Dong

Reporter

Kenneth Daily

Labels

Validator

Kenneth Daily

Development Area

None

Release Version History

None

Sprint

None

Fix versions

Priority

Critical