...
```sql
SELECT COUNT(*) FROM MULTIPART_UPLOAD U WHERE U.STATE = 'UPLOADING'
```
Result: 6353
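For context on where those uploads live, a similar query can break the count down per destination bucket. This is only a sketch reusing the table and columns already referenced above; the results are not reproduced here:

```sql
-- Sketch: count unfinished uploads per destination bucket.
-- Uses only MULTIPART_UPLOAD columns already referenced in this page.
SELECT U.BUCKET, COUNT(*) AS UPLOADS
FROM MULTIPART_UPLOAD U
WHERE U.STATE = 'UPLOADING'
GROUP BY U.BUCKET
ORDER BY UPLOADS DESC;
```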
We can get a rough estimate of the amount of data that has been uploaded to prod (the proddata.sagebase.org bucket) but never completed:
```sql
WITH UPLOADING AS (
  SELECT U.ID, U.PART_SIZE, COUNT(*) AS PARTS
  FROM MULTIPART_UPLOAD U
  JOIN MULTIPART_UPLOAD_PART_STATE P ON U.ID = P.UPLOAD_ID
  WHERE U.STATE = 'UPLOADING' AND U.BUCKET = 'proddata.sagebase.org'
  GROUP BY U.ID
),
UPLOADING_SIZE AS (
  SELECT (PART_SIZE * PARTS) AS SIZE FROM UPLOADING
)
SELECT COUNT(*), SUM(SIZE) FROM UPLOADING_SIZE
```
| Count | Size |
|---|---|
| 3037 | 2649792499252 bytes (≈2.4 TiB) |
So roughly 2.4 TiB of data could potentially be freed simply by removing the unfinished multipart uploads.
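As a sketch of how such a cleanup could be scoped, the stale uploads could first be listed by age. Note that UPDATED_ON is a hypothetical column name assumed here to track when an upload was last touched, and the 30-day window is an arbitrary choice:

```sql
-- Sketch: list unfinished uploads untouched for more than 30 days as
-- candidates for aborting the S3 multipart upload and removing the row.
-- UPDATED_ON is a hypothetical column; adjust to the actual schema.
SELECT U.ID, U.BUCKET, U.PART_SIZE
FROM MULTIPART_UPLOAD U
WHERE U.STATE = 'UPLOADING'
  AND U.UPDATED_ON < NOW() - INTERVAL 30 DAY;
```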
Upon further analysis of the backend code, we discovered a bug where a multipart upload is initiated when a wiki page is created or updated through the first version of the wiki API, which submitted the markdown as a string; in these cases the multipart upload is never completed: https://sagebionetworks.jira.com/browse/PLFM-6523. Additionally, the new multipart upload implementation that tracks uploads was introduced relatively recently; the previous implementation might have left behind other unfinished multipart uploads.
...