How much data is there?

Synapse has an object called a “file handle”, a low-level object that references a cloud storage bucket and key and carries basic metadata such as file name and size. Conceptually, the data “footprint” of Synapse is the sum of the sizes of all the files that Synapse indexes as file handles. The Athena filesnapshots table lists all the file handles in Synapse and is useful for computing count and aggregate size statistics. However, it contains duplicate records that must be addressed. First, snapshots are taken periodically, so the same file handle appears in more than one snapshot; we eliminate these duplicates by taking just the latest snapshot of each record. Second, there can be multiple file IDs for a single <bucket, key> pair, which we address by taking just the latest ID. The following query in Athena asks how much file data is indexed in Synapse:

...
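
The two de-duplication steps described above can be sketched in Python on a handful of in-memory records. This is only an illustration of the logic, not the Athena query itself. The field names (`id`, `bucket`, `key`, `size`, `snapshot_ts`) and the function name `footprint` are hypothetical and do not reflect the actual filesnapshots schema, and "latest ID" is assumed here to mean the numerically largest ID.

```python
def footprint(records):
    """Return (file_count, total_bytes) after removing duplicate records.

    Each record is a dict with hypothetical keys:
    id, bucket, key, size, snapshot_ts.
    """
    # Step 1: keep only the latest snapshot of each file handle ID.
    latest_by_id = {}
    for rec in records:
        current = latest_by_id.get(rec["id"])
        if current is None or rec["snapshot_ts"] > current["snapshot_ts"]:
            latest_by_id[rec["id"]] = rec

    # Step 2: for each <bucket, key> pair keep only the latest ID
    # (assumed to be the largest ID).
    latest_by_object = {}
    for rec in latest_by_id.values():
        location = (rec["bucket"], rec["key"])
        current = latest_by_object.get(location)
        if current is None or rec["id"] > current["id"]:
            latest_by_object[location] = rec

    total_bytes = sum(rec["size"] for rec in latest_by_object.values())
    return len(latest_by_object), total_bytes


# Made-up sample data: ID 1 appears in two snapshots, and IDs 1 and 2
# both point at the same <bucket, key> pair.
sample = [
    {"id": 1, "bucket": "b", "key": "k1", "size": 100, "snapshot_ts": 1},
    {"id": 1, "bucket": "b", "key": "k1", "size": 100, "snapshot_ts": 2},
    {"id": 2, "bucket": "b", "key": "k1", "size": 100, "snapshot_ts": 2},
    {"id": 3, "bucket": "b", "key": "k2", "size": 50, "snapshot_ts": 1},
]

count, total = footprint(sample)
print(count, total)
```

Without the two steps, the sample would be counted as four files totalling 350 bytes; after de-duplication only the two distinct stored objects remain.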