Stale Buckets
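It’s possible for a user to link their private bucket to Synapse, populate it with files, then remove the bucket, leaving ‘stale’ file references in Synapse. To find such stale buckets, we first queried the data warehouse for the list of buckets linked to Synapse, together with the total size in megabytes of the files recorded in each bucket (taken from the file snapshots of the past 14 days):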
-- Latest snapshot timestamp for each file record seen in the past 14 days
with latest as (
    select bucket, key, max(snapshot_timestamp) as snapshot_timestamp
    from filesnapshots
    where snapshot_timestamp > current_timestamp - INTERVAL '14' DAY
    group by bucket, key, id, content_size
),
-- Highest file id for each (bucket, key) pair in the same window
max_id as (
    select bucket, key, max(id) as id
    from filesnapshots
    where snapshot_timestamp > current_timestamp - INTERVAL '14' DAY
    group by bucket, key
)
-- Total size per bucket; content_size is divided by 1048576 to give megabytes
select fs.bucket, sum(fs.content_size)/1048576 size_megabytes
from filesnapshots fs
join latest
    on fs.snapshot_timestamp=latest.snapshot_timestamp
    and fs.bucket=latest.bucket
    and fs.key=latest.key
join max_id
    on fs.id=max_id.id
    and fs.bucket=max_id.bucket
    and fs.key=max_id.key
where
    fs.snapshot_timestamp > current_timestamp - INTERVAL '14' DAY
group by fs.bucket
order by size_megabytes desc
This returned 215 results, starting with these 10:
bucket | size_megabytes |
proddata.sagebase.org | 987312108 |
ad-knowledge-portal-main | 387109989 |
amp-mayo-sinai-synapseencryptedexternalbucket-1bmvn8rlwixv2 | 322466473 |
ad-knowledge-portal-large | 247570974 |
nda-bsmn-scratch | 160541109 |
htan-dcc-htapp | 154986925 |
exceptional-longevity | 69363601 |
diverse-cohorts | 57514536 |
nf-syn23664726-s3-bucket-n9uakf7bowwd | 48237330 |
mpnstwgs | 47217345 |
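With 215 rows, copying bucket names by hand is error-prone. As a minimal sketch, assuming the query result has been exported to a text file with the bucket name in the first pipe-delimited column, the names could be extracted as follows. The function name `bucket_names` and the file name `query_results.txt` are hypothetical and not part of the original procedure.

```shell
#!/bin/bash
# Print the bucket column of a pipe-delimited result file, one name per line.
# Assumption: each row looks like "bucket-name | size |"; a header row whose
# first column is literally "bucket" is skipped, as are rows with no name.
bucket_names() {
  awk -F'|' '
    { name = $1; gsub(/^[ \t]+|[ \t]+$/, "", name) }   # trim surrounding blanks
    name != "" && name != "bucket" { print name }
  ' "$1"
}

# Possible usage (bash 4+), loading the names into an array:
#   mapfile -t AttachedBuckets < <(bucket_names query_results.txt)
```

The result is one bucket name per line, which can be read into a bash array or written to a file for later steps.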
We then ran this script on the Synapse ‘ops’ Jenkins server to find buckets which are unreachable by Synapse:
#!/bin/bash
declare -a AttachedBuckets=(
"proddata.sagebase.org"
"ad-knowledge-portal-main"
"amp-mayo-sinai-synapseencryptedexternalbucket-1bmvn8rlwixv2"
"ad-knowledge-portal-large"
"nda-bsmn-scratch"
"htan-dcc-htapp"
"exceptional-longevity"
"diverse-cohorts"
"nf-syn23664726-s3-bucket-n9uakf7bowwd"
"mpnstwgs"
...
)
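# Keep going when a head-bucket call fails; failures are what we are looking for.
set +e
echo "The following buckets are not connected to Synapse:"
# Iterate over the array, quoting each expansion so names are never word-split.
for val in "${AttachedBuckets[@]}"; do
  # head-bucket exits non-zero when the bucket is missing or access is denied.
  if ! aws s3api head-bucket --bucket "$val" 2> /dev/null; then
    echo "$val"
  fi
done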
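A non-zero exit from `head-bucket` does not by itself show that a bucket was deleted: the call also fails when the bucket exists but the caller is denied access, or when the request never reaches S3. The sketch below separates those cases by inspecting the error text. It assumes the AWS CLI reports the HTTP status in parentheses, as in "An error occurred (404) when calling the HeadBucket operation: Not Found". The function names `classify_head_bucket_error` and `check_bucket` are hypothetical helpers, not part of the script that was actually run.

```shell
#!/bin/bash
# Map the stderr text of a failed `aws s3api head-bucket` call to a category.
# Assumption: the HTTP status appears in parentheses in the CLI error message.
classify_head_bucket_error() {
  local err="$1"
  case "$err" in
    *"(404)"*) echo "missing" ;;     # the bucket does not exist
    *"(403)"*) echo "forbidden" ;;   # the bucket exists, but access is denied
    *)         echo "unknown" ;;     # e.g. network failure or bad credentials
  esac
}

# Hypothetical wrapper: prints "<bucket><TAB><category>" for one bucket.
check_bucket() {
  local bucket="$1" err
  # Capture stderr only; anything written to stdout is discarded.
  if err=$(aws s3api head-bucket --bucket "$bucket" 2>&1 >/dev/null); then
    printf '%s\t%s\n' "$bucket" "reachable"
  else
    printf '%s\t%s\n' "$bucket" "$(classify_head_bucket_error "$err")"
  fi
}
```

Calling `check_bucket "$val"` inside the loop in place of the bare `head-bucket` test would label each bucket, so that deleted buckets can be told apart from buckets that merely reject the credentials used on the server.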
This produced the following list of 110 disconnected buckets:
htan-dcc-htapp
amp-alzheimers-mssm
synodos.eu.frankfurt
amp-alzheimers-mayo
htan-dcc-bu
amp-alzheimers-rushbroad
dmchallenge.synapse.org
endomapperlossless
kf-study-us-east-1-prd-sd-bhjxbdqk
jkiang-sts-us-west-2
metanetworks
htan-perf-test-pdkvtcffvsvpaukp
htan-dcc-srrs
jkiang-sts-test
jkiang-py-upload-perf-test
nkauer-test-bucket
nkauer-test-bucket-2
plfm-5212-unencrypted-bucket
jkiang-external-bucket
plfm-5212-encrypted-bucket
sc-237179673806-pp-cyco5q2hxoiq2-s3bucket-462bbk93wqqg
targetosteo
synapse-perf
par-synapse-test
metanetworksynpasetestbucket
htan-dcc-center-b
phil-sample-sensor-data
target-osteosarcoma
test-bucket-cl
gcptosynapse
nrg-test2
xdoan-clarisse-htan-s3-synapseencryptedexternalbu-1242pz4oeu3c7
btc-sage-bucket-example
xschildw-test.sagebase.org
sage-stack2-data
atediarjo-bucket-kdtest
synapse-gcloud-demo
xs.s3.bucket.east.sagebase.org
tyu-test-bucket
htan-dcc-dsa-test
sage-external-bucket-aryton
python-client-integration-test.sagebase.org
djengtest
vt-test-synapse
omgwhyisfoobartaken
synapse-test-bucket
atri-a4-image-share-synapse-543263417119
sc-237179673806-pp-daamam3ykuje4-s3bucket-1jslzxl1xq4zm
sc-237179673806-pp-wqncg5habar2m-s3bucket-145lkuisa6gqf
test1-kdaily-encryptedex-synapseencryptedexternal-1tw5bmn8d70hx
sc-237179673806-pp-vm3xmsz3uax7q-s3bucket-1274cotbk6f2n
sc-237179673806-pp-i7vklp56the66-s3bucket-eal8qlgc87kj
sc-237179673806-pp-hrx2giywqix2s-s3bucket-mhb2u0tf0ift
nrg-test-s3
test-gstorage-docs-xschildw.sagebase.org
bridgeserver2-develop-processedhealthdatabucket-ivek7ruwdwo5
tthyer-synapse-bucket
kdaily-test-direct-acces-synapseencryptedexternal-1ik9hqs85g1f8
synapse-sync
btc-synapse
test-3398.sagebase.org
org-sagebridge-attachment-uat
testifownerisneeded
external.bucket.test
sc-237179673806-pp-2k2han5e5yflq-s3bucket-9qtpvgladj1j
sc-237179673806-pp-j52t2hyhut5k6-s3bucket-awh918n7wk3x
plfm5082.sagebase.org
org-sagebridge-rawhealthdata-devstaging
xindi-test-cvat
org-sagebridge-rawhealthdata-devlocal-dwaynejeng
gk-cvisb-two
srpbs-synapse-test
testbucketpolicyforsynapse
testingbucketwithr
bridgeserver2-prod-processedhealthdatabucket-16zre2b1awr2x
swc-5086.sagebase.org
zdong-test-bucket
io.sunrisedata.sage-sandbox
kd-test.sagebase.org
plfm-4637.sagebase.org
diverse-cohorts-main
sc-237179673806-pp-hdapydq2iyt4q-s3bucket-t8pns4o03y9a
oms-atlas
tyu-test
test-synapse-gstorage-xschildw.sagebase.org
org-sagebridge-attachment-develop
org-sagebridge-rawhealthdata-devdevelop
sc-237179673806-pp-7sencllu5em3a-s3bucket-miptirty88ld
com.dnli.bioinfo.synapse
srpbs-test
test-external-bucket-config-synapse
ziming-test-same-region-synapseencryptedexternalb-nm2mkkbhm3f8
s3-synapse-demo-trisha.zintel
it1142.sagebase.org
org-sagebridge-attachment-dwaynejeng
synapse-testing
plfm5082-upstream.sagebase.org
beabrian-synapse
test1-kdaily-unencryptedext-synapseexternalbucket-ooygqkyifsqx
sc-237179673806-pp-hnscyp6cgdbsy-s3bucket-w91j7ocw8wrv
testinanotherregion
phil-test-script-werer
sage-stack5-data
synpy1054
bridgeserver2-uat-processedhealthdatabucket-zddu4sii7pgj
ecmonsen-emorypipeline-lambda-localdev-563295687221-us-east-1
smoke-293-1.sagebase.org
test-bucketmover-1
dm-dryrun
test-bucketmover-2
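To gauge how much referenced data sits in disconnected buckets, the list above can be cross-referenced with the sizes returned by the warehouse query. The following is a minimal sketch, assuming both outputs have been saved to text files: one with "bucket | size_megabytes |" rows and one with a bucket name per line. The function name `stale_size_megabytes` and both file layouts are assumptions made for illustration.

```shell
#!/bin/bash
# Sum the warehouse-reported size of every bucket named in a disconnected list.
#   $1: sizes file, one "bucket | size_megabytes |" row per line (assumed export)
#   $2: disconnected file, one bucket name per line (assumed export)
stale_size_megabytes() {
  awk -F'|' '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    # First file on the command line: remember each disconnected bucket name.
    NR == FNR { if (trim($0) != "") stale[trim($0)] = 1; next }
    # Second file: add the size of every row whose bucket is in the list.
    (trim($1) in stale) { total += trim($2) }
    END { printf "%.0f\n", total }
  ' "$2" "$1"
}
```

For example, `stale_size_megabytes sizes.txt disconnected.txt` would print a single total in megabytes. Buckets that appear in the disconnected list but not in the sizes file contribute nothing to the total.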