...
On further analysis we checked a few of those files and confirmed that they were in fact normal multipart uploads in the DB, with the corresponding file handles. The likely reason for this inconsistency is that we encrypted the S3 bucket back in 2019, most probably by issuing a PUT copy of each object onto itself. This is reinforced by the fact that the modified dates on those objects are consistent with the timeline of the encryption, while the original upload dates in Synapse predate it. If the Python API was used for the copy, all objects below a certain size threshold would have been "copied" over without multipart.
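For reference, a minimal sketch of what such an in-place re-encryption copy looks like with boto3: its managed copy only switches to a multipart copy above the transfer configuration's multipart_threshold (8 MB by default), so smaller objects end up as single PUT copies. The key below is a placeholder, not an actual object from the analysis:

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# boto3's managed copy only uses multipart above multipart_threshold
# (defaults to 8 MB); anything smaller is copied with a single PUT copy.
config = TransferConfig(multipart_threshold=8 * 1024 * 1024)

# Hypothetical key: copying an object onto itself is legal here because
# the encryption attributes change.
source = {"Bucket": "proddata.sagebase.org", "Key": "example/key"}
s3.copy(
    CopySource=source,
    Bucket="proddata.sagebase.org",
    Key="example/key",
    ExtraArgs={"ServerSideEncryption": "aws:kms"},
    Config=config,
)
```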
...
File Handles
We created a table containing only the file handle data pointing to the proddata.sagebase.org bucket, which also includes a de-duplication identifier:
...
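As a rough illustration of how such a table could be built (a sketch only; the connection details and all table and column names below are hypothetical, not the actual schema), rows that share the same key point at the same S3 object, so numbering the rows within each key yields a de-duplication identifier:

```python
import mysql.connector  # assumes the mysql-connector-python package

# Hypothetical connection to a replica holding the file handles data.
conn = mysql.connector.connect(host="localhost", database="warehouse")
cursor = conn.cursor()

# DUPLICATE_INDEX is 0 for the first file handle pointing at a key and
# greater than 0 for every additional handle pointing at the same object.
cursor.execute("""
    CREATE TABLE FILES_PROD AS
    SELECT ID, `KEY`, CONTENT_SIZE,
           ROW_NUMBER() OVER (PARTITION BY `KEY` ORDER BY ID) - 1 AS DUPLICATE_INDEX
    FROM FILES
    WHERE BUCKET_NAME = 'proddata.sagebase.org'
""")
conn.commit()
```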
Note that this number most likely includes temporary objects that were never deleted (e.g. temporary multipart upload files, old tests, staging data, etc.).
...
File Entities
From the Synapse storage report, which is generated monthly and stored in a Synapse table, we can get an idea of how much data is used by Synapse entities in projects. The following query gets the aggregated sum of the size in bytes for the last 10 months:
...
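A sketch of running such a query through the Synapse Python client; the table id syn123 and the column names are placeholders for the real storage report table and its schema:

```python
import synapseclient

syn = synapseclient.Synapse()
syn.login()  # assumes cached credentials

# 'syn123', 'date' and 'sizeInBytes' are hypothetical stand-ins for the
# monthly storage report table and its columns.
results = syn.tableQuery("""
    SELECT date, SUM(sizeInBytes) AS totalSizeInBytes
    FROM syn123
    GROUP BY date
    ORDER BY date DESC
    LIMIT 10
""")
df = results.asDataFrame()
```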