...

An alternative is to set up a Glue job** that periodically dumps the files table to S3, similar to the S3 inventory. A second job would then join the two tables using Athena to find un-indexed data. Note, however, that we need to make sure that no temporary data (e.g. the multipart upload parts) is included in the join, for example by filtering on sensible dates.
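As a rough sketch of what that join could look like: the query below is an anti-join (LEFT JOIN, keeping only rows with no match) with a date cutoff to skip recent, possibly temporary, objects. The table names (`s3_inventory`, `files_snapshot`) and column names are assumptions, since the actual schemas are not described here, and the example runs against an in-memory SQLite database only so that it can be tried locally. The same join shape should carry over to Athena, though parameter syntax and date types would need adapting.

```python
import sqlite3

# Hypothetical table and column names, for illustration only.
# :cutoff is SQLite's named-parameter syntax, not Athena's.
UNINDEXED_QUERY = """
SELECT i.bucket, i.object_key
FROM s3_inventory AS i
LEFT JOIN files_snapshot AS f
  ON f.bucket = i.bucket AND f.object_key = i.object_key
WHERE f.object_key IS NULL          -- no matching row in the files table
  AND i.last_modified < :cutoff     -- skip recent (possibly temporary) objects
ORDER BY i.object_key
"""


def find_unindexed(conn, cutoff):
    """Return (bucket, object_key) pairs in the inventory but not in the files table."""
    return conn.execute(UNINDEXED_QUERY, {"cutoff": cutoff}).fetchall()


def demo():
    """Build a tiny in-memory data set and run the anti-join against it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE s3_inventory (bucket TEXT, object_key TEXT, last_modified TEXT)"
    )
    conn.execute("CREATE TABLE files_snapshot (bucket TEXT, object_key TEXT)")
    conn.executemany(
        "INSERT INTO s3_inventory VALUES (?, ?, ?)",
        [
            ("prod", "indexed.txt", "2021-01-01"),
            ("prod", "orphan.txt", "2021-01-02"),
            # Recent object, e.g. a multipart upload part still in flight.
            ("prod", "upload-part-1", "2021-03-01"),
        ],
    )
    conn.execute("INSERT INTO files_snapshot VALUES (?, ?)", ("prod", "indexed.txt"))
    # ISO-8601 date strings compare correctly as plain text.
    return find_unindexed(conn, cutoff="2021-02-01")


if __name__ == "__main__":
    for bucket, object_key in demo():
        print(bucket, object_key)
```

The date cutoff is only a heuristic: it hides recent temporary objects, but it also delays detection of genuinely un-indexed data until it is older than the cutoff.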

In general, we should make sure that any data that ends up in the prod bucket is actually indexed in file handles. For example, I would move the temporary objects used for the multipart upload to their own dedicated bucket.

** Unfortunately, at the time of writing, AWS Glue does not seem to support connections to MySQL 8.

...