It’s possible for a user to link their private bucket to Synapse, populate it with files, then remove the bucket, leaving ‘stale’ file references in Synapse. To find such stale buckets, we first queried the data warehouse for the list of buckets linked to Synapse:

-- Most recent snapshot timestamp in the last 14 days for each
-- (bucket, key, id, content_size) combination.
with latest as (
    select bucket, key, max(snapshot_timestamp) as snapshot_timestamp
    from filesnapshots
    where snapshot_timestamp > current_timestamp - INTERVAL '14' DAY
    group by bucket, key, id, content_size
),
-- Highest id seen in the last 14 days for each (bucket, key).
max_id as (
    select bucket, key, max(id) as id
    from filesnapshots
    where snapshot_timestamp > current_timestamp - INTERVAL '14' DAY
    group by bucket, key
)
-- Total size per bucket of the AVAILABLE rows that match both a latest
-- snapshot timestamp and the highest id; 1048576 = 1024 * 1024.
select fs.bucket, sum(fs.content_size)/1048576 as size_megabytes
from filesnapshots fs
join latest
    on fs.snapshot_timestamp=latest.snapshot_timestamp
    and fs.bucket=latest.bucket
    and fs.key=latest.key
join max_id
    on fs.id=max_id.id
    and fs.bucket=max_id.bucket
    and fs.key=max_id.key
where fs.snapshot_timestamp > current_timestamp - INTERVAL '14' DAY
and fs.status='AVAILABLE'
group by fs.bucket
order by size_megabytes desc

This returned 193 results, starting with these 10:

bucket                                                       size_megabytes
proddata.sagebase.org                                        736866405
ad-knowledge-portal-main                                     378302670
ad-knowledge-portal-large                                    247570974
amp-mayo-sinai-synapseencryptedexternalbucket-1bmvn8rlwixv2  170940992
nda-bsmn-scratch                                             160145196
htan-dcc-htapp                                               89183976
exceptional-longevity                                        69345301
diverse-cohorts                                              55424113
mpnstwgs                                                     47217345
nf-syn23664726-s3-bucket-n9uakf7bowwd                        45949738
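
The query divides the summed content_size by 1048576 (1024 * 1024), so if content_size is in bytes (an assumption; the schema is not shown here), size_megabytes is in mebibytes. A small sketch for converting a value from that column to tebibytes for easier reading; the helper name mib_to_tib is hypothetical:

```shell
#!/bin/bash

# Convert a size_megabytes value (assumed to be MiB) to TiB.
# 1 TiB = 1048576 MiB; the result is printed with one decimal place.
mib_to_tib() {
  awk -v mib="$1" 'BEGIN { printf "%.1f\n", mib / 1048576 }'
}

# Largest bucket in the results above (proddata.sagebase.org).
mib_to_tib 736866405
```

Under that assumption, the largest bucket holds roughly 700 TiB.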

We then ran this script on the Synapse ‘ops’ Jenkins server to find buckets which are unreachable by Synapse:

#!/bin/bash

declare -a AttachedBuckets=(
"proddata.sagebase.org"
"ad-knowledge-portal-main"
"ad-knowledge-portal-large"
"amp-mayo-sinai-synapseencryptedexternalbucket-1bmvn8rlwixv2"
"nda-bsmn-scratch"
"htan-dcc-htapp"
"exceptional-longevity"
"diverse-cohorts"
"mpnstwgs"
"nf-syn23664726-s3-bucket-n9uakf7bowwd"
...
)

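# Do not abort the loop when a head-bucket check fails.
set +e

echo "The following buckets are not connected to Synapse:"
# head-bucket exits non-zero when the bucket does not exist or access is denied.
# Print only the buckets that fail the check.
for bucket in "${AttachedBuckets[@]}"; do
  if ! aws s3api head-bucket --bucket "$bucket" >/dev/null 2>&1; then
    echo "$bucket"
  fi
done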

This produced the following list of disconnected buckets:

htan-dcc-htapp
synodos.eu.frankfurt
htan-dcc-bu
endomapperlossless
kf-study-us-east-1-prd-sd-bhjxbdqk
metanetworks
jkiang-sts-us-west-2
jkiang-py-upload-perf-test
amp-alzheimers-rushbroad
htan-dcc-srrs
htan-perf-test-pdkvtcffvsvpaukp
amp-alzheimers-mssm
par-synapse-test
jkiang-external-bucket
metanetworksynpasetestbucket
targetosteo
htan-dcc-center-b
xdoan-clarisse-htan-s3-synapseencryptedexternalbu-1242pz4oeu3c7
btc-sage-bucket-example
xschildw-test.sagebase.org
synapse-gcloud-demo
xs.s3.bucket.east.sagebase.org
test-bucket-cl
target-osteosarcoma
gcptosynapse
sage-external-bucket-aryton
vt-test-synapse
phil-sample-sensor-data
python-client-integration-test.sagebase.org
omgwhyisfoobartaken
plfm-5212-unencrypted-bucket
com.dnli.bioinfo.synapse
test-synapse-gstorage-xschildw.sagebase.org
synapse-sync
test1-kdaily-unencryptedext-synapseexternalbucket-ooygqkyifsqx
beabrian-synapse
sc-237179673806-pp-i7vklp56the66-s3bucket-eal8qlgc87kj
plfm5082.sagebase.org
sc-237179673806-pp-wqncg5habar2m-s3bucket-145lkuisa6gqf
org-sagebridge-rawhealthdata-devdevelop
nrg-test2
testinanotherregion
test-bucketmover-1
zdong-test-bucket
ecmonsen-emorypipeline-lambda-localdev-563295687221-us-east-1
dm-dryrun
ziming-test-same-region-synapseencryptedexternalb-nm2mkkbhm3f8
plfm5082-upstream.sagebase.org
plfm-4637.sagebase.org
test-external-bucket-config-synapse
sc-237179673806-pp-cyco5q2hxoiq2-s3bucket-462bbk93wqqg
synpy1054
s3-synapse-demo-trisha.zintel
btc-synapse
atediarjo-bucket-kdtest
org-sagebridge-attachment-develop
dmchallenge.synapse.org
phil-test-script-werer
synapse-perf
synapse-test-bucket
xindi-test-cvat
sc-237179673806-pp-7sencllu5em3a-s3bucket-miptirty88ld
test-3398.sagebase.org
sc-237179673806-pp-2k2han5e5yflq-s3bucket-9qtpvgladj1j
nrg-test-s3
sc-237179673806-pp-hrx2giywqix2s-s3bucket-mhb2u0tf0ift
atri-a4-image-share-synapse-543263417119
oms-atlas
external.bucket.test
org-sagebridge-attachment-dwaynejeng
bridgeserver2-develop-processedhealthdatabucket-ivek7ruwdwo5
sc-237179673806-pp-daamam3ykuje4-s3bucket-1jslzxl1xq4zm
plfm-5212-encrypted-bucket
bridgeserver2-prod-processedhealthdatabucket-16zre2b1awr2x
test1-kdaily-encryptedex-synapseencryptedexternal-1tw5bmn8d70hx
synapse-testing
test-bucketmover-2
kdaily-test-direct-acces-synapseencryptedexternal-1ik9hqs85g1f8
org-sagebridge-attachment-uat
nkauer-test-bucket
jkiang-sts-test
htan-dcc-dsa-test
test-gstorage-docs-xschildw.sagebase.org
srpbs-synapse-test
it1142.sagebase.org
tyu-test
bridgeserver2-uat-processedhealthdatabucket-zddu4sii7pgj
testingbucketwithr
kd-test.sagebase.org
gk-cvisb-two
smoke-293-1.sagebase.org
tthyer-synapse-bucket
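
The check above treats every head-bucket failure the same way, so the list mixes buckets that no longer exist with buckets that exist but deny access to the caller. A minimal sketch for telling the two apart, assuming the AWS CLI includes the HTTP status in parentheses in its error text (for example "(404)" or "(403)"); the function names classify_head_bucket_error and check_bucket are hypothetical:

```shell
#!/bin/bash

# Classify the stderr text of a failed `aws s3api head-bucket` call.
# Assumption: the error text contains the HTTP status in parentheses.
classify_head_bucket_error() {
  local err="$1"
  case "$err" in
    *"(404)"*) echo "missing" ;;    # bucket does not exist
    *"(403)"*) echo "forbidden" ;;  # bucket exists but access is denied
    *)         echo "unknown" ;;
  esac
}

# Print "<bucket> <status>" for a single bucket.
check_bucket() {
  local bucket="$1" err
  # Capture stderr only: stderr goes to the substitution, stdout is discarded.
  if err=$(aws s3api head-bucket --bucket "$bucket" 2>&1 >/dev/null); then
    echo "$bucket reachable"
  else
    echo "$bucket $(classify_head_bucket_error "$err")"
  fi
}
```

With the AttachedBuckets array from the script above, each entry could be passed to check_bucket in the same for loop. Buckets reported as "missing" are the ones most likely to have been deleted, while "forbidden" buckets may only have had their access policy changed.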
