...

SELECT COUNT(*) FROM MULTIPART_UPLOAD U WHERE U.STATE = 'UPLOADING'

Result: 6353

We can roughly estimate the amount of data in prod that was uploaded as parts of multipart uploads that were never completed:

WITH UPLOADING AS (
	SELECT U.ID, U.PART_SIZE, COUNT(*) AS PARTS FROM MULTIPART_UPLOAD U JOIN MULTIPART_UPLOAD_PART_STATE P ON U.ID = P.UPLOAD_ID
	WHERE U.STATE = 'UPLOADING' AND U.BUCKET = 'proddata.sagebase.org'
	GROUP BY U.ID
),
UPLOADING_SIZE AS (
	SELECT (PART_SIZE * PARTS) AS SIZE FROM UPLOADING
)
SELECT COUNT(*), SUM(SIZE) FROM UPLOADING_SIZE

Count    Size (bytes)
3037     2649792499252 (2.4TB)

So about 2.4TB of data could potentially be freed just by removing the unfinished multipart uploads.
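To make the estimate easier to sanity-check, here is a minimal sketch that runs the same query against an in-memory SQLite database with a few made-up rows. The column types, the PART_NUMBER column and the toy data are assumptions for illustration only and are not taken from the production schema. It shows two properties of the query: an upload with no recorded parts is dropped by the inner join, which may be one reason the count above (3037) is lower than the total number of uploads in the UPLOADING state, and PART_SIZE * PARTS is an upper bound, since the last part of an upload can be smaller than PART_SIZE.

```python
import sqlite3

# Toy schema with only the columns the estimate query touches (types are assumed).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MULTIPART_UPLOAD (
    ID INTEGER PRIMARY KEY, STATE TEXT, BUCKET TEXT, PART_SIZE INTEGER);
CREATE TABLE MULTIPART_UPLOAD_PART_STATE (UPLOAD_ID INTEGER, PART_NUMBER INTEGER);
""")

part_size = 5 * 1024 * 1024  # hypothetical 5 MiB part size
conn.executemany("INSERT INTO MULTIPART_UPLOAD VALUES (?, ?, ?, ?)", [
    (1, "UPLOADING", "proddata.sagebase.org", part_size),  # two parts uploaded
    (2, "UPLOADING", "proddata.sagebase.org", part_size),  # no parts uploaded yet
    (3, "COMPLETED", "proddata.sagebase.org", part_size),  # finished, filtered out
])
conn.executemany("INSERT INTO MULTIPART_UPLOAD_PART_STATE VALUES (?, ?)",
                 [(1, 1), (1, 2), (3, 1)])

# Same query as in the page, unchanged.
ESTIMATE_SQL = """
WITH UPLOADING AS (
    SELECT U.ID, U.PART_SIZE, COUNT(*) AS PARTS
    FROM MULTIPART_UPLOAD U JOIN MULTIPART_UPLOAD_PART_STATE P ON U.ID = P.UPLOAD_ID
    WHERE U.STATE = 'UPLOADING' AND U.BUCKET = 'proddata.sagebase.org'
    GROUP BY U.ID
),
UPLOADING_SIZE AS (
    SELECT (PART_SIZE * PARTS) AS SIZE FROM UPLOADING
)
SELECT COUNT(*), SUM(SIZE) FROM UPLOADING_SIZE
"""
count, size = conn.execute(ESTIMATE_SQL).fetchone()
print(count, size)  # only upload 1 is counted; upload 2 has no parts to join on


def to_tib(num_bytes):
    """Convert a byte count to binary terabytes (1 TiB = 2**40 bytes)."""
    return num_bytes / 2**40


# The figure reported above, expressed in binary units.
reported_tib = to_tib(2649792499252)
print(round(reported_tib, 2))
```

The conversion at the end suggests that the "2.4TB" figure is in binary units (TiB); in decimal units the same byte count is closer to 2.65 TB.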

Upon further analysis of the backend code, we discovered a bug where a multipart upload is initiated whenever a wiki page is created or updated through the first version of the wiki API, which submitted the markdown as a string. In such cases the multipart upload is never completed: https://sagebionetworks.jira.com/browse/PLFM-6523. Additionally, the new multipart upload implementation that tracks the uploads was introduced relatively recently, so the previous implementation might have left behind other unfinished multipart uploads.

...