...
```sql
with latest as (
    select bucket, key, max(snapshot_timestamp) as snapshot_timestamp
    from filesnapshots
    where snapshot_timestamp > current_timestamp - INTERVAL '14' DAY
    group by bucket, key, id, content_size, status
), max_id as (
    select bucket, key, max(id) as id
    from filesnapshots
    where snapshot_timestamp > current_timestamp - INTERVAL '14' DAY
    group by bucket, key
)
select sum(fs.content_size)/1099511627776 total_size_tebibytes
from filesnapshots fs
join latest on fs.snapshot_timestamp=latest.snapshot_timestamp
    and fs.bucket=latest.bucket and fs.key=latest.key
join max_id on fs.id=max_id.id
    and fs.bucket=max_id.bucket and fs.key=max_id.key
where fs.snapshot_timestamp > current_timestamp - INTERVAL '14' DAY
```
...
```sql
with latest as (
    select bucket, key, max(snapshot_timestamp) as snapshot_timestamp
    from filesnapshots
    where snapshot_timestamp > current_timestamp - INTERVAL '14' DAY
    group by bucket, key, id, content_size, status
), max_id as (
    select bucket, key, max(id) as id
    from filesnapshots
    where snapshot_timestamp > current_timestamp - INTERVAL '14' DAY
    group by bucket, key
)
select sum(fs.content_size)/1099511627776 size_tebibytes, status
from filesnapshots fs
join latest on fs.snapshot_timestamp=latest.snapshot_timestamp
    and fs.bucket=latest.bucket and fs.key=latest.key
join max_id on fs.id=max_id.id
    and fs.bucket=max_id.bucket and fs.key=max_id.key
where fs.snapshot_timestamp > current_timestamp - INTERVAL '14' DAY
group by fs.status
```
This returns:
| | size_tebibytes | status |
|---|---|---|
| 1 | 2271 | AVAILABLE |
| 2 | 276 | UNLINKED |
| 3 | 236 | ARCHIVED |
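The divisor 1099511627776 in these queries is 1024^4, i.e. bytes per tebibyte. A quick sanity check on the conversion and the per-status shares (figures from the table above):

```python
# Per-status totals in TiB, from the query result above.
status_tib = {"AVAILABLE": 2271, "UNLINKED": 276, "ARCHIVED": 236}

BYTES_PER_TIB = 1024 ** 4  # 1099511627776, the divisor used in the queries

total_tib = sum(status_tib.values())  # total indexed across all statuses
shares = {s: round(100 * v / total_tib) for s, v in status_tib.items()}
print(total_tib)  # 2783
print(shares)     # AVAILABLE dominates at ~82%
```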
...
```sql
with latest as (
    select bucket, key, max(snapshot_timestamp) as snapshot_timestamp
    from filesnapshots
    where snapshot_timestamp > current_timestamp - INTERVAL '14' DAY
    group by bucket, key, id, content_size
), max_id as (
    select bucket, key, max(id) as id
    from filesnapshots
    where snapshot_timestamp > current_timestamp - INTERVAL '14' DAY
    group by bucket, key
)
select fs.bucket, sum(fs.content_size)/1099511627776 size_tebibytes
from filesnapshots fs
join latest on fs.snapshot_timestamp=latest.snapshot_timestamp
    and fs.bucket=latest.bucket and fs.key=latest.key
join max_id on fs.id=max_id.id
    and fs.bucket=max_id.bucket and fs.key=max_id.key
where fs.snapshot_timestamp > current_timestamp - INTERVAL '14' DAY
    and status='AVAILABLE'
group by fs.bucket
order by size_tebibytes desc
limit 25
```
| | bucket | size_tebibytes |
|---|---|---|
| 1 | proddata.sagebase.org | 698 |
| 2 | ad-knowledge-portal-main | 324 |
| 3 | ad-knowledge-portal-large | 236 |
| 4 | amp-mayo-sinai-synapseencryptedexternalbucket-1bmvn8rlwixv2 | 163 |
| 5 | nda-bsmn-scratch | 152 |
| 6 | htan-dcc-htapp | 85 |
| 7 | exceptional-longevity | 66 |
| 8 | diverse-cohorts | 52 |
| 9 | mpnstwgs | 45 |
| 10 | nf-syn23664726-s3-bucket-n9uakf7bowwd | 43 |
| 11 | integrated-immunology | 41 |
| 12 | gates-ki-001-synapseencryptedexternalbucket-kl8vu7tqto1b | 38 |
| 13 | sc-237179673806-pp-hg7rd6vvvd7mk-s3bucket-eu11pjwpxxiv | 37 |
| 14 | htan-dcc-chop | 36 |
| 15 | htan-dcc-stanford | 33 |
| 16 | strides-ampad-project-tower-bucket | 24 |
| 17 | htan-dcc-hms | 20 |
| 18 | htan-dcc-vanderbilt | 18 |
| 19 | htan-dcc-duke | 15 |
| 20 | ntap-add5-project-tower-bucket | 12 |
...
The aggregate size of these 25 buckets is 2180 TiB, or 96% of the total. proddata.sagebase.org is the primary/default Synapse bucket. The next two large buckets, ad-knowledge-portal-main and ad-knowledge-portal-large, are in Sage's AWS STRIDES account, where the costs are paid directly by NIH. All of the buckets are recognizable by name as being related to a Sage project and residing in a Sage AWS account. We can look at storage from the S3 perspective as a rough validation of these figures: our total S3 footprint should be larger than our Synapse footprint.
We can use S3 Storage Lens to get a summary of our storage usage from the AWS perspective. As of June 22, 2023:
| AWS Account(s) | S3 usage, TiB |
|---|---|
| Sage Organization (~35 accounts) | 1953 |
| *6185: Strides AD | 823.3 |
| *5034: (Nextflow Tower) | 230.7 |
| Total | 3007 |
Since 3007 TiB > 2783 TiB, the values seem to be consistent. The remaining ~220 TiB may be accounted for by data which Sage owns but which is unrelated to Synapse. Below we reconcile on a per-bucket basis. There are two numerical columns for bucket size as reported by S3, in Sage's "organization" and in STRIDES respectively, and one column for the amount indexed in Synapse.
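The totals above reduce to simple arithmetic (figures taken from the two preceding tables):

```python
# Synapse-indexed totals by status, TiB (from the filesnapshots query above)
synapse_total_tib = 2271 + 276 + 236      # AVAILABLE + UNLINKED + ARCHIVED

# S3 Storage Lens totals, TiB (Sage org + AD STRIDES + Nextflow Tower)
s3_total_tib = 1953 + 823.3 + 230.7

# The S3 footprint should exceed the Synapse-indexed footprint.
assert s3_total_tib > synapse_total_tib
print(s3_total_tib - synapse_total_tib)   # the unaccounted-for remainder, TiB
```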
| Bucket | S3 bucket size (TiB) | STRIDES S3 bucket size (TiB) | Indexed in Synapse ("AVAILABLE") (TiB) |
|---|---|---|---|
| | 937.3 | | 698 |
| ad-knowledge-portal-main | | 333.1 | 324 |
| ad-knowledge-portal-large | | 236.1 | 236 |
| amp-mayo-sinai-synapseencryptedexternalbucket-1bmvn8rlwixv2 | 312 | | 163 |
| nda-bsmn-scratch | | | 152 |
| htan-dcc-htapp | 0 (bucket is gone) | | 85 (also 62 TiB "UNLINKED") |
| exceptional-longevity | 66.2 | | 66 |
| diverse-cohorts | 54.9 | | 52 |
| mpnstwgs | 45 | | 45 |
| nf-syn23664726-s3-bucket-n9uakf7bowwd | 45.4 | | 43 |
| integrated-immunology | 41.8 | | 41 |
| gates-ki-001-synapseencryptedexternalbucket-kl8vu7tqto1b | 39.2 | | 38 |
| sc-237179673806-pp-hg7rd6vvvd7mk-s3bucket-eu11pjwpxxiv | 37.7 | | 37 |
| htan-dcc-chop | 35.2 | | 36 |
| htan-dcc-stanford | 26.9 | | 33 |
| strides-ampad-project-tower-bucket | | 187.8 | 24 |
| htan-dcc-hms | 16.1 | | 20 |
| htan-dcc-vanderbilt | 19.2 | | 18 |
| htan-dcc-duke | 15.8 | | 15 |
| ntap-add5-project-tower-bucket | 16.3 | | 12 |
| ctf-swnts-project-tower-bucket | 11.9 | | 11 |
| htan-dcc-dfci | 0 | | 8 |
| htan-dcc-washu | 0 | | 8 |
| sc-237179673806-pp-pisxjdspvwrtg-s3bucket-deoh9m8xdfcc | 6.8 | | 8 |
| nda-bsmn | | | 7 |
The five buckets highlighted in red appear in Synapse but not in our list of top S3 buckets. They have been emptied or removed by their owners, leaving the index in Synapse 'stale'. See the child page on stale buckets for more details.
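The stale-bucket condition is easy to express: a bucket whose Synapse index still shows AVAILABLE data but which reports no size in S3. A minimal sketch over the affected rows of the reconciliation table (`None` marks a bucket absent from the S3 listings entirely):

```python
# (bucket, s3_tib, synapse_available_tib) for a sample of rows from the
# reconciliation table above; None = absent from the S3 listing.
rows = [
    ("ad-knowledge-portal-main", 333.1, 324),
    ("nda-bsmn-scratch", None, 152),
    ("htan-dcc-htapp", 0, 85),
    ("htan-dcc-dfci", 0, 8),
    ("htan-dcc-washu", 0, 8),
    ("nda-bsmn", None, 7),
]

# "Stale": Synapse still indexes data, but S3 reports nothing.
stale = [b for b, s3, syn in rows if (s3 is None or s3 == 0) and syn > 0]
print(stale)  # the five buckets whose Synapse index outlived the data
```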
...
```sql
with latest_ns as (
    select id, max(snapshot_timestamp) as snapshot_timestamp
    from nodesnapshots
    where snapshot_timestamp > current_timestamp - INTERVAL '14' DAY
    group by id
), latest_fs as (
    select bucket, key, max(snapshot_timestamp) as snapshot_timestamp
    from filesnapshots
    where snapshot_timestamp > current_timestamp - INTERVAL '14' DAY
    group by bucket, key
), fs_max_id as (
    select bucket, key, max(id) as id
    from filesnapshots
    where snapshot_timestamp > current_timestamp - INTERVAL '14' DAY
    group by bucket, key
)
select sum(fs.content_size)/1099511627776 project_size_tebibytes
from nodesnapshots ns
join latest_ns on ns.id=latest_ns.id
    and ns.snapshot_timestamp=latest_ns.snapshot_timestamp
join filesnapshots fs on ns.file_handle_id=fs.id
join fs_max_id on fs.id=fs_max_id.id, latest_fs
where ns.node_type='file'
    and ns.snapshot_timestamp > current_timestamp - INTERVAL '14' DAY
    and fs.snapshot_timestamp > current_timestamp - INTERVAL '14' DAY
    and fs.bucket=latest_fs.bucket and fs.key=latest_fs.key
    and fs.snapshot_timestamp=latest_fs.snapshot_timestamp
    and fs.bucket=fs_max_id.bucket and fs.key=fs_max_id.key
    and fs.id=fs_max_id.id
    and fs.status='AVAILABLE'
```
The result is 2108 TiB. This is close to the result of the earlier query on filesnapshots alone. (TODO: where did the other 163 TiB go?) Now we partition the result by project and bucket:
```sql
with latest_ns as (
    select id, max(snapshot_timestamp) as snapshot_timestamp
    from nodesnapshots
    where snapshot_timestamp > current_timestamp - INTERVAL '14' DAY
    group by id
), latest_fs as (
    select bucket, key, max(snapshot_timestamp) as snapshot_timestamp
    from filesnapshots
    where snapshot_timestamp > current_timestamp - INTERVAL '14' DAY
    group by bucket, key
), fs_max_id as (
    select bucket, key, max(id) as id
    from filesnapshots
    where snapshot_timestamp > current_timestamp - INTERVAL '14' DAY
    group by bucket, key
)
select 'https://www.synapse.org/#!Synapse:syn' || cast(ns.project_id as varchar),
    fs.bucket,
    sum(fs.content_size)/1099511627776 project_size_tebibytes
from nodesnapshots ns
join latest_ns on ns.id=latest_ns.id
    and ns.snapshot_timestamp=latest_ns.snapshot_timestamp
join filesnapshots fs on ns.file_handle_id=fs.id
join fs_max_id on fs.id=fs_max_id.id, latest_fs
where ns.snapshot_timestamp > current_timestamp - INTERVAL '14' DAY
    and fs.snapshot_timestamp > current_timestamp - INTERVAL '14' DAY
    and fs.bucket=latest_fs.bucket and fs.key=latest_fs.key
    and fs.snapshot_timestamp=latest_fs.snapshot_timestamp
    and fs.bucket=fs_max_id.bucket and fs.key=fs_max_id.key
    and fs.id=fs_max_id.id
    and fs.status='AVAILABLE'
group by ns.project_id, fs.bucket
order by project_size_tebibytes desc
limit 50
```
The top 50 account for 86% of the 2108 TiB:
| Project | Bucket | Project size, TiB |
|---|---|---|
| | ad-knowledge-portal-main | 325 |
| | | 249 |
| | amp-mayo-sinai-synapseencryptedexternalbucket-1bmvn8rlwixv2 | 154 |
| | nda-bsmn-scratch | 150 |
| | ad-knowledge-portal-large | 144 |
| | htan-dcc-htapp | 85 |
| | exceptional-longevity | 66 |
| | mpnstwgs | 45 |
| | nf-syn23664726-s3-bucket-n9uakf7bowwd | 43 |
| | integrated-immunology | 39 |
| | sc-237179673806-pp-hg7rd6vvvd7mk-s3bucket-eu11pjwpxxiv | 37 |
| | htan-dcc-chop | 33 |
| | | 27 |
| | diverse-cohorts | 27 |
| | htan-dcc-stanford | 25 |
| | strides-ampad-project-tower-bucket | 24 |
| | diverse-cohorts | 23 |
| | | 21 |
| | | 18 |
| | htan-dcc-vanderbilt | 17 |
| | | 17 |
| | | 16 |
| | | 15 |
| | | 14 |
| | htan-dcc-hms | 14 |
| | htan-dcc-duke | 14 |
| | | 10 |
| | | 10 |
| | gates-ki-001-synapseencryptedexternalbucket-kl8vu7tqto1b | 9 |
| | gates-ki-001-synapseencryptedexternalbucket-kl8vu7tqto1b | 9 |
| | sc-237179673806-pp-pisxjdspvwrtg-s3bucket-deoh9m8xdfcc | 8 |
| | ctf-swnts-project-tower-bucket | 8 |
| | htan-dcc-washu | 8 |
| | | 8 |
| | gates-ki-001-synapseencryptedexternalbucket-kl8vu7tqto1b | 7 |
| | | 7 |
| | nda-bsmn | 7 |
| | | 7 |
| | | 6 |
| | | 6 |
| | synodos.eu.frankfurt.penn | 6 |
| | ntap-add5-project-tower-bucket | 6 |
| | | 6 |
| | htan-dcc-dfci | 6 |
| | ntap-cnf-cell-project-tower-bucket | 5 |
| | | 5 |
| | | 5 |
| | htan-dcc-ohsu | 5 |
| | htan-dcc-tnp-sardana | 5 |
| | | 4 |
This query breaks down the data by is_public, is_controlled, is_restricted:
```sql
with latest_ns as (
    select id, max(snapshot_timestamp) as snapshot_timestamp
    from nodesnapshots
    where snapshot_timestamp > current_timestamp - INTERVAL '14' DAY
    group by id
), latest_fs as (
    select bucket, key, max(snapshot_timestamp) as snapshot_timestamp
    from filesnapshots
    where snapshot_timestamp > current_timestamp - INTERVAL '14' DAY
    group by bucket, key
), fs_max_id as (
    select bucket, key, max(id) as id
    from filesnapshots
    where snapshot_timestamp > current_timestamp - INTERVAL '14' DAY
    group by bucket, key
)
select ns.is_public, ns.is_controlled, ns.is_restricted,
    sum(fs.content_size)/1099511627776 size_tebibytes
from nodesnapshots ns
join latest_ns on ns.id=latest_ns.id
    and ns.snapshot_timestamp=latest_ns.snapshot_timestamp
join filesnapshots fs on ns.file_handle_id=fs.id
join fs_max_id on fs.id=fs_max_id.id, latest_fs
where ns.snapshot_timestamp > current_timestamp - INTERVAL '14' DAY
    and fs.snapshot_timestamp > current_timestamp - INTERVAL '14' DAY
    and fs.bucket=latest_fs.bucket and fs.key=latest_fs.key
    and fs.snapshot_timestamp=latest_fs.snapshot_timestamp
    and fs.bucket=fs_max_id.bucket and fs.key=fs_max_id.key
    and fs.id=fs_max_id.id
    and fs.status='AVAILABLE'
group by ns.is_public, ns.is_controlled, ns.is_restricted
order by size_tebibytes desc
```
Result:
| is_public | is_controlled | is_restricted | size_tebibytes |
|---|---|---|---|
| FALSE | FALSE | FALSE | 827 |
| TRUE | TRUE | TRUE | 615 |
| TRUE | FALSE | FALSE | 196 |
| TRUE | TRUE | FALSE | 155 |
| FALSE | TRUE | FALSE | 124 |
| FALSE | FALSE | TRUE | 73 |
| TRUE | FALSE | TRUE | 63 |
| FALSE | TRUE | TRUE | 53 |
TODO: What are the exact definitions of these terms?
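Whatever the precise definitions, the table supports a simple public/non-public rollup (the small gap from the 2108 TiB total is per-row rounding):

```python
# (is_public, is_controlled, is_restricted, tebibytes) from the table above
rows = [
    (False, False, False, 827), (True, True, True, 615),
    (True, False, False, 196), (True, True, False, 155),
    (False, True, False, 124), (False, False, True, 73),
    (True, False, True, 63), (False, True, True, 53),
]

public = sum(t for p, c, r, t in rows if p)
private = sum(t for p, c, r, t in rows if not p)
print(public, private, public + private)  # roughly half the data is public
```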
...
```sql
with latest_ns as (
    select id, max(snapshot_timestamp) as snapshot_timestamp
    from nodesnapshots
    where snapshot_timestamp > current_timestamp - INTERVAL '14' DAY
    group by id
), latest_fs as (
    select bucket, key, max(snapshot_timestamp) as snapshot_timestamp
    from filesnapshots
    where snapshot_timestamp > current_timestamp - INTERVAL '14' DAY
    group by bucket, key
), fs_max_id as (
    select bucket, key, max(id) as id
    from filesnapshots
    where snapshot_timestamp > current_timestamp - INTERVAL '14' DAY
    group by bucket, key
)
select 'https://www.synapse.org/#!Synapse:syn' || cast(ns.project_id as varchar),
    fs.bucket,
    sum(fs.content_size)/1099511627776 project_size_tebibytes
from nodesnapshots ns
join latest_ns on ns.id=latest_ns.id
    and ns.snapshot_timestamp=latest_ns.snapshot_timestamp
join filesnapshots fs on ns.file_handle_id=fs.id
join fs_max_id on fs.id=fs_max_id.id, latest_fs
where ns.snapshot_timestamp > current_timestamp - INTERVAL '14' DAY
    and fs.snapshot_timestamp > current_timestamp - INTERVAL '14' DAY
    and fs.bucket=latest_fs.bucket and fs.key=latest_fs.key
    and fs.snapshot_timestamp=latest_fs.snapshot_timestamp
    and fs.bucket=fs_max_id.bucket and fs.key=fs_max_id.key
    and fs.id=fs_max_id.id
    and fs.status='AVAILABLE'
    and ns.is_public=TRUE
group by ns.project_id, fs.bucket
order by project_size_tebibytes desc
limit 50
```
The total drops to 948 TiB, dominated by AD Portal and PsychEncode, as before. If we restrict the query to the Synapse 'default' bucket, we find 279 TiB of data. To see how it breaks down by project:
```sql
with latest_ns as (
    select id, max(snapshot_timestamp) as snapshot_timestamp
    from nodesnapshots
    where snapshot_timestamp > current_timestamp - INTERVAL '14' DAY
    group by id
), latest_fs as (
    select bucket, key, max(snapshot_timestamp) as snapshot_timestamp
    from filesnapshots
    where snapshot_timestamp > current_timestamp - INTERVAL '14' DAY
    group by bucket, key
), fs_max_id as (
    select bucket, key, max(id) as id
    from filesnapshots
    where snapshot_timestamp > current_timestamp - INTERVAL '14' DAY
    group by bucket, key
)
select 'https://www.synapse.org/#!Synapse:syn' || cast(ns.project_id as varchar),
    fs.bucket,
    sum(fs.content_size)/1099511627776 project_size_tebibytes
from nodesnapshots ns
join latest_ns on ns.id=latest_ns.id
    and ns.snapshot_timestamp=latest_ns.snapshot_timestamp
join filesnapshots fs on ns.file_handle_id=fs.id
join fs_max_id on fs.id=fs_max_id.id, latest_fs
where ns.snapshot_timestamp > current_timestamp - INTERVAL '14' DAY
    and fs.snapshot_timestamp > current_timestamp - INTERVAL '14' DAY
    and fs.bucket=latest_fs.bucket and fs.key=latest_fs.key
    and fs.snapshot_timestamp=latest_fs.snapshot_timestamp
    and fs.bucket=fs_max_id.bucket and fs.key=fs_max_id.key
    and fs.id=fs_max_id.id
    and fs.status='AVAILABLE'
    and fs.bucket='proddata.sagebase.org'
    and ns.is_public=TRUE
group by ns.project_id, fs.bucket
order by project_size_tebibytes desc
limit 50
```
The top 21 projects account for 83% of the data, with PsychEncode at the top of the list:
| _col0 | bucket | project_size_tebibytes |
|---|---|---|
| | | 152 |
| | | 21 |
| | | 19 |
| | | 6 |
| | | 5 |
| | | 4 |
| | | 4 |
| | | 3 |
| | | 3 |
| | | 2 |
| | | 2 |
| | | 1 |
| | | 1 |
| | | 1 |
| | | 1 |
| | | 1 |
| | | 1 |
| | | 1 |
| | | 1 |
| | | 1 |
| | | 1 |
How much is the data used?
...
```sql
with latest_fs as (
    select bucket, key, max(snapshot_timestamp) as snapshot_timestamp
    from filesnapshots
    where snapshot_timestamp > current_timestamp - INTERVAL '180' DAY
    group by bucket, key
), fs_max_id as (
    select bucket, key, max(id) as id
    from filesnapshots
    where snapshot_timestamp > current_timestamp - INTERVAL '180' DAY
    group by bucket, key
)
select month(fdr.record_date) as month,
    sum(fs.content_size)/1073741824 as download_volume_gigabytes
from filedownloadrecords fdr
join filesnapshots fs on fdr.file_handle_id=fs.id
join fs_max_id on fdr.file_handle_id=fs_max_id.id, latest_fs
where fdr.file_handle_id=fs.id
    and fdr.timestamp > current_timestamp - INTERVAL '180' DAY -- downloads in the last 180 days
    and fs.snapshot_timestamp > current_timestamp - INTERVAL '180' DAY
    and fs.bucket=latest_fs.bucket and fs.key=latest_fs.key
    and fs.snapshot_timestamp=latest_fs.snapshot_timestamp
    and fs.bucket=fs_max_id.bucket and fs.key=fs_max_id.key
    and fs.id=fs_max_id.id
    and fs.status='AVAILABLE'
group by month(fdr.record_date)
order by month(fdr.record_date)
```
returns
| Month | Download volume, GB |
|---|---|
| 6 | 2031037 |
| 7 | 4553643 |
At the time of this writing we have not yet accumulated a full month's worth of data. Let's compare the current month's egress to AWS's statistics. For July 1-12, these are the egress charges from AWS Cost Explorer:
| | Cost | Data out, GB |
|---|---|---|
| Sage Organization | $2400 | 29,800 |
| AD STRIDES | $6500 | 94,670 |
| NF STRIDES | ~$0 | ~0 |
The numbers are far less than those reported by Synapse. This could be because the data did not leave the region or because the presigned URL received from Synapse (the creation of which triggered the recording of a download) was not actually used by the client.
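As a rough plausibility check on the Cost Explorer side, we can compute the implied egress rate per account; the comparison against ~$0.09/GB (the list price for the first S3 internet-egress tier) is an assumption here, not something reported by Cost Explorer:

```python
# (egress_cost_usd, data_out_gb) from the Cost Explorer table above
egress = {"Sage Organization": (2400, 29_800), "AD STRIDES": (6500, 94_670)}

for account, (cost, gb) in egress.items():
    print(account, round(cost / gb, 3), "USD/GB")  # ~0.081 and ~0.069

# Both rates sit below the ~$0.09/GB list price, consistent with mixed
# destinations (in-region traffic is free) and tiered pricing.
```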
...
15 projects accounted for 99% of the download activity. Only 6 of these projects use the Synapse default bucket:
| Project | Bucket | Download volume, GB |
|---|---|---|
| | htan-dcc-stanford | 1804241 |
| | ad-knowledge-portal-main | 846984 |
| | diverse-cohorts | 530371 |
| | mpnstwgs | 325521 |
| | | 233216 |
| | htan-dcc-vanderbilt | 146573 |
| | htan-dcc-duke | 132917 |
| | | 126349 |
| | htan-dcc-ohsu | 73147 |
| | ecmonsen-emorypipeline-lambda-055273631518-us-east-1 | 61412 |
| | | 51600 |
| | sc-237179673806-pp-hg7rd6vvvd7mk-s3bucket-eu11pjwpxxiv | 50989 |
| | | 50526 |
| | | 31063 |
| | | 26497 |
This analysis could be further refined to look just at downloads of public data.