Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Of note is that part of the unlinked data is temporary file handles that synapse creates internally (e.g. table queries, tables csv for uploads etc), the fact that each month we have such a high number (30K) of file handles that are unlinked might be attributed to this. Unfortunately we do not tag this kind of data so we do not know exactly how much of those 1TiB are actually temporary file handles (We do know that at least 20TiB of unlinked data is due to this: https://sagebionetworks.jira.com/wiki/spaces/PLFM/pages/1629782065/S3+Bucket+Analysis#Temporary-File-Handles).

To solve this we should upload this data either with an special object tag that expires the object through a lifecycle transition or in a dedicated bucket (e.g. another storage location) with a dedicated lifecycle that expires everything after 30 days. This temporary objects could be flagged in the database with a special status (e.g. TEMPORARY) so that they are not collected as UNLINKED but rather removed either by S3 notifications (see open questions below) or through a worker.

...