...
dt | count | size |
---|---|---|
2020-11-05-00-00 | 41625517 | 704212857157112 (640.4TB) |
2020-11-08-00-00 | 41654050 | 708573173690177 (644.4TB) |
The inventory also reports whether a file was uploaded as multipart; this would give us an idea of how many objects were not uploaded through the standard Synapse upload API:
...
dt | count | size |
---|---|---|
2020-11-05-00-00 | 5705476 | 527196059092709 (479.4TB) |
2020-11-08-00-00 | 5712315 | 531513092323401 (483.4TB) |
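Independently of the inventory flag, one way to check whether an individual object was last written as multipart is to look at its ETag: S3 gives multipart objects an ETag of the form `<md5>-<part count>`, while a single PUT produces a plain 32-hex-character MD5. A minimal sketch of that check (the helper name is ours, and this is a heuristic, since some encryption settings also change the ETag format):

```python
def is_multipart_etag(etag: str) -> bool:
    """Return True if an S3 ETag has the multipart form '<md5>-<part count>'.

    Multipart ETags are the MD5 of the concatenated part MD5s, suffixed
    with '-N' where N is the number of parts; a single PUT produces a
    plain 32-hex-character MD5. Heuristic only: certain server-side
    encryption options also produce non-MD5 ETags.
    """
    etag = etag.strip('"')  # ETags from the API/inventory are often quoted
    if "-" not in etag:
        return False
    digest, _, part_count = etag.partition("-")
    return len(digest) == 32 and part_count.isdigit()
```

This is the same signal the inventory's multipart column is built from, so it is handy for spot-checking individual objects from the console or a `HEAD` request.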
This result is surprising: only 5.7M objects appear to be multipart uploads, but we have an order of magnitude more than that. What is going on?
On further analysis we checked a few of those files and could see that they were in fact normal multipart uploads in the DB, with the corresponding file handles. The reason for this inconsistency is that we encrypted the S3 bucket back in 2019, most likely using a PUT copy of each object onto itself. This belief is reinforced by the fact that the modified dates on those objects are consistent with the timeline of the encryption, while the original upload in Synapse happened earlier. If the Python API was used, most likely all objects smaller than a certain size were copied over without multipart.
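The "certain size" here would be the SDK's multipart threshold: boto3's managed copy, like its managed upload, only switches to multipart for objects at or above `TransferConfig.multipart_threshold`, which defaults to 8MB. A bulk re-encryption copy would therefore have rewritten every smaller object as a single PUT, resetting its multipart ETag. A sketch of that decision (the helper is illustrative, not from any codebase, and the exact boundary comparison in the SDK may differ):

```python
# boto3 TransferConfig's default multipart_threshold (8 MiB): managed
# copies/uploads of objects at or above this size use multipart; smaller
# objects go through a single PUT, which produces a plain-MD5 ETag.
MULTIPART_THRESHOLD = 8 * 1024 * 1024


def copy_uses_multipart(object_size: int, threshold: int = MULTIPART_THRESHOLD) -> bool:
    """Return True if a managed copy of this size would be done as multipart."""
    return object_size >= threshold
```

Under this assumption, the ~5.7M objects still flagged as multipart in the inventory would be those above the threshold, whose re-encryption copy was itself performed as a multipart copy.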
Synapse File Handles
...