Content Comparison

...

Code Block

language	sql

SELECT COUNT(*) FROM MULTIPART_UPLOAD U WHERE U.STATE = 'UPLOADING'

Result: 6353
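The query above has no bucket filter, so the 6353 uploads are counted across all buckets, while the estimate below is restricted to `proddata.sagebase.org`. A per-bucket breakdown shows how the total splits up. This is a minimal sketch that uses only the `BUCKET` and `STATE` columns already used on this page:

```sql
-- Unfinished multipart uploads grouped by bucket, largest first.
-- Uses only the MULTIPART_UPLOAD columns already referenced on this page.
SELECT U.BUCKET, COUNT(*) AS UPLOADS
FROM MULTIPART_UPLOAD U
WHERE U.STATE = 'UPLOADING'
GROUP BY U.BUCKET
ORDER BY UPLOADS DESC
```

For the prod bucket, this count may be higher than the 3037 reported by the size estimate, because that query's inner join leaves out uploads that have no recorded parts.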

We can get a rough estimate of how much data has been uploaded to the prod bucket as part of multipart uploads that were never completed:

Code Block

language	sql

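-- Unfinished uploads in the prod bucket, with the number of part rows recorded for each.
-- The inner join skips uploads that have no recorded parts.
WITH UPLOADING AS (
	SELECT U.ID, U.PART_SIZE, COUNT(*) AS PARTS
	FROM MULTIPART_UPLOAD U JOIN MULTIPART_UPLOAD_PART_STATE P ON U.ID = P.UPLOAD_ID
	WHERE U.STATE = 'UPLOADING' AND U.BUCKET = 'proddata.sagebase.org'
	GROUP BY U.ID, U.PART_SIZE
),
-- Estimated size per upload: every part is counted at the full PART_SIZE,
-- so this may slightly overestimate (the last part of a file can be smaller).
UPLOADING_SIZE AS (
	SELECT (PART_SIZE * PARTS) AS SIZE FROM UPLOADING
)
-- Number of uploads and total estimated size in bytes.
SELECT COUNT(*), SUM(SIZE) FROM UPLOADING_SIZE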

Count	Size (bytes)
3037	2649792499252 (≈2.4 TiB)

So roughly 2.4 TiB of data in the prod bucket could potentially be freed just by removing the unfinished multipart uploads.
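The size returned by the query is in bytes, and the 2.4 figure corresponds to binary units (1 TiB = 1024^4 bytes). In decimal units (1 TB = 1000^4 bytes) the same total is about 2.65 TB. A quick check of both conversions, using the value from the result above:

```sql
-- Convert the SUM(SIZE) result (bytes) into binary (TiB) and decimal (TB) units.
SELECT
  ROUND(2649792499252 / (1024.0 * 1024 * 1024 * 1024), 2) AS SIZE_TIB, -- 2.41
  ROUND(2649792499252 / (1000.0 * 1000 * 1000 * 1000), 2) AS SIZE_TB   -- 2.65
```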

Further analysis of the backend code uncovered a bug: a multipart upload is initiated whenever a wiki page is created or updated through the first version of the wiki API (which submits the markdown as a string), and that multipart upload is never completed (https://sagebionetworks.jira.com/browse/PLFM-6523). Additionally, the new multipart upload implementation that tracks the uploads was introduced relatively recently, so the previous implementation might have left behind other unfinished multipart uploads.

...

Version	Old Version 18	New Version 19
Changes made by	Marco Marasca	Marco Marasca
Saved on	Nov 20, 2020	Dec 03, 2020
