Stale Buckets

A user can link a private bucket to Synapse, populate it with files, and then remove the bucket, leaving ‘stale’ file references in Synapse. To find such stale buckets, we first queried the data warehouse for the buckets linked to Synapse, together with the total size in megabytes of the files referenced in each, based on file snapshots from the last 14 days:

with latest as (
    select bucket, key, max(snapshot_timestamp) as snapshot_timestamp
    from filesnapshots
    where snapshot_timestamp > current_timestamp - INTERVAL '14' DAY
    group by bucket, key, id, content_size
),
max_id as (
    select bucket, key, max(id) as id
    from filesnapshots
    where snapshot_timestamp > current_timestamp - INTERVAL '14' DAY
    group by bucket, key
)
select fs.bucket, sum(fs.content_size)/1048576 size_megabytes
from filesnapshots fs
join latest
    on fs.snapshot_timestamp = latest.snapshot_timestamp
    and fs.bucket = latest.bucket
    and fs.key = latest.key
join max_id
    on fs.id = max_id.id
    and fs.bucket = max_id.bucket
    and fs.key = max_id.key
where fs.snapshot_timestamp > current_timestamp - INTERVAL '14' DAY
group by fs.bucket
order by size_megabytes desc

This returned 215 results, starting with these 10:

bucket                                                       size_megabytes
proddata.sagebase.org                                        987312108
ad-knowledge-portal-main                                     387109989
amp-mayo-sinai-synapseencryptedexternalbucket-1bmvn8rlwixv2  322466473
ad-knowledge-portal-large                                    247570974
nda-bsmn-scratch                                             160541109
htan-dcc-htapp                                               154986925
exceptional-longevity                                        69363601
diverse-cohorts                                              57514536
nf-syn23664726-s3-bucket-n9uakf7bowwd                        48237330
mpnstwgs                                                     47217345

We then ran this script on the Synapse ‘ops’ Jenkins server to find buckets which are unreachable by Synapse:

#!/bin/bash

# Buckets returned by the data warehouse query
declare -a AttachedBuckets=(
    "proddata.sagebase.org"
    "ad-knowledge-portal-main"
    "amp-mayo-sinai-synapseencryptedexternalbucket-1bmvn8rlwixv2"
    "ad-knowledge-portal-large"
    "nda-bsmn-scratch"
    "htan-dcc-htapp"
    "exceptional-longevity"
    "diverse-cohorts"
    "nf-syn23664726-s3-bucket-n9uakf7bowwd"
    "mpnstwgs"
    ...
)

# Keep going when a command fails: a failing head-bucket call is the signal we are looking for
set +e

echo "The following buckets are not connected to Synapse:"

# Iterate over the array; quoting keeps each bucket name as a single word
for val in "${AttachedBuckets[@]}"; do
    # head-bucket exits non-zero when the bucket cannot be reached
    if ! aws s3api head-bucket --bucket "$val" 2> /dev/null; then
        echo "$val"
    fi
done
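The script above treats every failed `head-bucket` call the same way, but the call can fail for more than one reason: S3 answers 404 when the bucket no longer exists and 403 when the caller lacks permission to access it. The following is a minimal sketch of how the two cases might be told apart, assuming the AWS CLI reports the HTTP status in its error text as `(404)` or `(403)`. The function names `classify_bucket` and `report_buckets` are hypothetical and not part of the original script.

```shell
#!/bin/bash

# Hypothetical helper: print one of reachable / missing / forbidden / unknown
# for the bucket named in $1, based on the error text printed by head-bucket.
classify_bucket() {
    local err
    # Capture stderr only; stdout is discarded and stdin is detached.
    if err=$(aws s3api head-bucket --bucket "$1" 2>&1 >/dev/null </dev/null); then
        echo "reachable"
        return 0
    fi
    case "$err" in
        *"(404)"*) echo "missing" ;;    # bucket does not exist
        *"(403)"*) echo "forbidden" ;;  # bucket exists but access is denied
        *)         echo "unknown" ;;    # anything else, e.g. network or credential errors
    esac
}

# Hypothetical helper: read bucket names from standard input, one per line,
# and print "<bucket><TAB><status>" for each.
report_buckets() {
    local bucket
    while IFS= read -r bucket; do
        [ -n "$bucket" ] || continue
        printf '%s\t%s\n' "$bucket" "$(classify_bucket "$bucket")"
    done
}
```

With the bucket names saved one per line in a file (here a hypothetical `buckets.txt`), `report_buckets < buckets.txt` would print a status per bucket. Only the `missing` entries are buckets that have actually been removed; `forbidden` entries may still exist but be inaccessible to the credentials in use.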

 

This produced the following list of 110 disconnected buckets:

 

htan-dcc-htapp
amp-alzheimers-mssm
synodos.eu.frankfurt
amp-alzheimers-mayo
htan-dcc-bu
amp-alzheimers-rushbroad
dmchallenge.synapse.org
endomapperlossless
kf-study-us-east-1-prd-sd-bhjxbdqk
jkiang-sts-us-west-2
metanetworks
htan-perf-test-pdkvtcffvsvpaukp
htan-dcc-srrs
jkiang-sts-test
jkiang-py-upload-perf-test
nkauer-test-bucket
nkauer-test-bucket-2
plfm-5212-unencrypted-bucket
jkiang-external-bucket
plfm-5212-encrypted-bucket
sc-237179673806-pp-cyco5q2hxoiq2-s3bucket-462bbk93wqqg
targetosteo
synapse-perf
par-synapse-test
metanetworksynpasetestbucket
htan-dcc-center-b
phil-sample-sensor-data
target-osteosarcoma
test-bucket-cl
gcptosynapse
nrg-test2
xdoan-clarisse-htan-s3-synapseencryptedexternalbu-1242pz4oeu3c7
btc-sage-bucket-example
xschildw-test.sagebase.org
sage-stack2-data
atediarjo-bucket-kdtest
synapse-gcloud-demo
xs.s3.bucket.east.sagebase.org
tyu-test-bucket
htan-dcc-dsa-test
sage-external-bucket-aryton
python-client-integration-test.sagebase.org
djengtest
vt-test-synapse
omgwhyisfoobartaken
synapse-test-bucket
atri-a4-image-share-synapse-543263417119
sc-237179673806-pp-daamam3ykuje4-s3bucket-1jslzxl1xq4zm
sc-237179673806-pp-wqncg5habar2m-s3bucket-145lkuisa6gqf
test1-kdaily-encryptedex-synapseencryptedexternal-1tw5bmn8d70hx
sc-237179673806-pp-vm3xmsz3uax7q-s3bucket-1274cotbk6f2n
sc-237179673806-pp-i7vklp56the66-s3bucket-eal8qlgc87kj
sc-237179673806-pp-hrx2giywqix2s-s3bucket-mhb2u0tf0ift
nrg-test-s3
test-gstorage-docs-xschildw.sagebase.org
bridgeserver2-develop-processedhealthdatabucket-ivek7ruwdwo5
tthyer-synapse-bucket
kdaily-test-direct-acces-synapseencryptedexternal-1ik9hqs85g1f8
synapse-sync
btc-synapse
test-3398.sagebase.org
org-sagebridge-attachment-uat
testifownerisneeded
external.bucket.test
sc-237179673806-pp-2k2han5e5yflq-s3bucket-9qtpvgladj1j
sc-237179673806-pp-j52t2hyhut5k6-s3bucket-awh918n7wk3x
plfm5082.sagebase.org
org-sagebridge-rawhealthdata-devstaging
xindi-test-cvat
org-sagebridge-rawhealthdata-devlocal-dwaynejeng
gk-cvisb-two
srpbs-synapse-test
testbucketpolicyforsynapse
testingbucketwithr
bridgeserver2-prod-processedhealthdatabucket-16zre2b1awr2x
swc-5086.sagebase.org
zdong-test-bucket
io.sunrisedata.sage-sandbox
kd-test.sagebase.org
plfm-4637.sagebase.org
diverse-cohorts-main
sc-237179673806-pp-hdapydq2iyt4q-s3bucket-t8pns4o03y9a
oms-atlas
tyu-test
test-synapse-gstorage-xschildw.sagebase.org
org-sagebridge-attachment-develop
org-sagebridge-rawhealthdata-devdevelop
sc-237179673806-pp-7sencllu5em3a-s3bucket-miptirty88ld
com.dnli.bioinfo.synapse
srpbs-test
test-external-bucket-config-synapse
ziming-test-same-region-synapseencryptedexternalb-nm2mkkbhm3f8
s3-synapse-demo-trisha.zintel
it1142.sagebase.org
org-sagebridge-attachment-dwaynejeng
synapse-testing
plfm5082-upstream.sagebase.org
beabrian-synapse
test1-kdaily-unencryptedext-synapseexternalbucket-ooygqkyifsqx
sc-237179673806-pp-hnscyp6cgdbsy-s3bucket-w91j7ocw8wrv
testinanotherregion
phil-test-script-werer
sage-stack5-data
synpy1054
bridgeserver2-uat-processedhealthdatabucket-zddu4sii7pgj
ecmonsen-emorypipeline-lambda-localdev-563295687221-us-east-1
smoke-293-1.sagebase.org
test-bucketmover-1
dm-dryrun
test-bucketmover-2
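Because the warehouse query also returns a size per bucket, the two results can be combined to see how much referenced data sits in disconnected buckets (htan-dcc-htapp, for instance, appears both among the 10 largest buckets and in the disconnected list). The following is a rough sketch under stated assumptions: the query result has been saved as a tab-separated file of `bucket<TAB>size_megabytes`, and the disconnected list as a file with one bucket per line. The function name `stale_sizes` and both file layouts are hypothetical.

```shell
#!/bin/bash

# Hypothetical helper.
# Usage: stale_sizes DISCONNECTED_FILE SIZES_FILE
#   DISCONNECTED_FILE: one bucket name per line
#   SIZES_FILE:        bucket<TAB>size_megabytes, one bucket per line
# Prints the size rows of disconnected buckets, then a TOTAL row.
stale_sizes() {
    awk -F '\t' '
        # First file: remember every disconnected bucket name.
        FILENAME == ARGV[1] { stale[$1] = 1; next }
        # Second file: keep only rows whose bucket is disconnected.
        ($1 in stale) { print $1 "\t" $2; total += $2 }
        END { printf "TOTAL\t%.0f\n", total }
    ' "$1" "$2"
}
```

Rows are printed in the order of the sizes file, so if that file keeps the query's descending order, the largest stale buckets come first.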

 
