Introduction
Current System
...
Each file upload and download event is captured by FileEventRecordWorker and sent to a Kinesis stream, which delivers the data to S3 in queryable Parquet format. The file events can be queried in Athena through the <Env><Stack>firehoselogs.fileuploadsrecords and <Stack>firehoselogs.filedownloadsrecords Glue tables.
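As an illustration of how the Parquet-backed Glue tables are consumed, the sketch below builds a monthly Athena query that counts file events and distinct users per project. It is a minimal sketch under stated assumptions: the column names projectId, userId and eventTimestamp are hypothetical, and the real table schema (including any partition columns) should be checked in the Glue catalog.

```python
from datetime import date


def month_bounds(year: int, month: int) -> tuple[date, date]:
    """Return the [start, end) dates covering one calendar month."""
    if not 1 <= month <= 12:
        raise ValueError(f"invalid month: {month}")
    start = date(year, month, 1)
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start, end


def build_monthly_files_query(database: str, table: str, year: int, month: int) -> str:
    """Build an Athena SQL query that counts file events and distinct users
    per project for one month.

    Hypothetical schema: projectId, userId and eventTimestamp are assumed
    column names; verify them against the actual Glue table definition.
    """
    start, end = month_bounds(year, month)
    return (
        "SELECT projectId, COUNT(*) AS files_count, COUNT(DISTINCT userId) AS users_count "
        f'FROM "{database}"."{table}" '
        f"WHERE eventTimestamp >= TIMESTAMP '{start} 00:00:00' "
        f"AND eventTimestamp < TIMESTAMP '{end} 00:00:00' "
        "GROUP BY projectId"
    )
```

The half-open [start, end) range avoids double counting events at month boundaries. The resulting string could be submitted with the Athena `StartQueryExecution` API (for example boto3's `start_query_execution(QueryString=...)`).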
StatisticsMonthlyStatusWatcherWorker identifies unprocessed months for project statistics and initiates a processing request by sending a message to a queue, which starts processing for the specified object type and month.
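The watcher's job of finding unprocessed months and queuing requests can be sketched as follows. This is a hypothetical illustration, not the actual worker: the 12-month lookback window, the rule of skipping the still-incomplete current month, and the message fields objectType and month are all assumptions.

```python
import json
from datetime import date


def previous_months(today: date, count: int) -> list[date]:
    """First day of each of the `count` complete months before `today`, oldest first."""
    months = []
    year, month = today.year, today.month
    for _ in range(count):
        month -= 1
        if month == 0:
            year, month = year - 1, 12
        months.append(date(year, month, 1))
    months.reverse()
    return months


def unprocessed_month_messages(
    today: date, processed: set[date], object_type: str, lookback: int = 12
) -> list[str]:
    """Build one JSON queue message per month in the lookback window that has
    no processing status yet. The current month is skipped because it is
    still incomplete (an assumed policy)."""
    return [
        json.dumps({"objectType": object_type, "month": m.strftime("%Y-%m")})
        for m in previous_months(today, lookback)
        if m not in processed
    ]
```

For example, on 2024-03-15 with a lookback of 3 and only January 2024 already processed, messages would be produced for 2023-12 and 2024-02.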
StatisticsMonthlyWorker retrieves the message from the queue and processes it: it executes the Athena queries for file upload and file download statistics and stores the results in the STATISTICS_MONTHLY_PROJECT_FILES table of the Synapse main database.
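The processing step can be sketched with the Athena call and the database injected, so the control flow is visible without AWS access. In this hypothetical sketch, `run_query` stands in for executing the Athena query, a dict keyed by (project, month, event type) stands in for STATISTICS_MONTHLY_PROJECT_FILES, and the event type labels and column names are assumptions.

```python
import json
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class MonthlyProjectFiles:
    project_id: int
    month: str
    event_type: str  # hypothetical labels: "FILE_UPLOAD" or "FILE_DOWNLOAD"
    files_count: int
    users_count: int


# In-memory stand-in for STATISTICS_MONTHLY_PROJECT_FILES, keyed by an assumed primary key.
Table = dict[tuple[int, str, str], MonthlyProjectFiles]

# run_query(event_type, month) yields (project_id, files_count, users_count) rows.
QueryRunner = Callable[[str, str], Iterable[tuple[int, int, int]]]


def process_message(body: str, run_query: QueryRunner, table: Table) -> int:
    """Handle one queue message: query both event types for the month and
    upsert the per-project results. Returns the number of rows written."""
    month = json.loads(body)["month"]
    written = 0
    for event_type in ("FILE_UPLOAD", "FILE_DOWNLOAD"):
        for project_id, files_count, users_count in run_query(event_type, month):
            # Upsert: reprocessing the same month overwrites rather than duplicates.
            table[(project_id, month, event_type)] = MonthlyProjectFiles(
                project_id, month, event_type, files_count, users_count
            )
            written += 1
    return written
```

Keying the rows by (project, month, event type) makes the handler idempotent, which matters if a queue message is delivered more than once.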
Synapse users who are admins or have read permission on a project can access these statistics through the POST /statistics API: https://rest-docs.synapse.org/rest/POST/statistics.html
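A client call to this endpoint could be prepared as in the sketch below, which builds the request with the standard library but does not send it. The request body fields (concreteType of ProjectFilesStatisticsRequest, objectId, fileUploads, fileDownloads), the production repository endpoint and the bearer-token header are recalled from the Synapse REST documentation and should be verified against the linked page before use.

```python
import json
import urllib.request

# Assumed production repository endpoint; verify against the Synapse REST docs.
REPO_ENDPOINT = "https://repo-prod.prod.sagebase.org/repo/v1"


def build_statistics_request(
    project_id: str, access_token: str, uploads: bool = True, downloads: bool = True
) -> urllib.request.Request:
    """Prepare (but do not send) a POST /statistics request for a project's
    monthly file upload/download statistics."""
    body = {
        # Assumed request type for project file statistics.
        "concreteType": "org.sagebionetworks.repo.model.statistics.ProjectFilesStatisticsRequest",
        "objectId": project_id,
        "fileUploads": uploads,
        "fileDownloads": downloads,
    }
    return urllib.request.Request(
        url=f"{REPO_ENDPOINT}/statistics",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {access_token}",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it; the caller needs admin or read access on the project, as described above, and the response body is JSON.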
Why are changes required in the current system?
The architecture of the Synapse data warehouse has been updated. Audit and snapshot data is now sent to S3 in JSON format, and AWS Glue ETL jobs process this data and store it in queryable Parquet format. The fileuploadrecords and filedownloadrecords tables are now available in the warehouse database in Glue. As a result, file event data is duplicated: one copy resides in the older <Stack>firehoselogs Glue database and another in the warehouse Glue database. We should use the newer warehouse database and retire the outdated Kinesis streams and the <Stack>firehoselogs Glue database.
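One way to stage the switch is to make the statistics query's source table configurable, so the workers can move to the warehouse tables before the legacy streams are removed. This is a hypothetical sketch: the `EventSource` flag is illustrative, the table names are copied as written in this document, and the `<stack>warehouse` database naming is an assumption (the actual warehouse Glue database name may differ).

```python
from enum import Enum


class EventSource(Enum):
    LEGACY_FIREHOSE = "legacy"
    WAREHOUSE = "warehouse"


# Table names as written in this document; verify them in the Glue catalog.
_TABLES = {
    EventSource.LEGACY_FIREHOSE: {"upload": "fileuploadsrecords", "download": "filedownloadsrecords"},
    EventSource.WAREHOUSE: {"upload": "fileuploadrecords", "download": "filedownloadrecords"},
}


def source_table(source: EventSource, stack: str, event: str) -> str:
    """Return the quoted Athena "database"."table" reference for an event type
    ("upload" or "download") from the selected source."""
    if source is EventSource.LEGACY_FIREHOSE:
        database = f"{stack}firehoselogs"
    else:
        database = f"{stack}warehouse"  # hypothetical warehouse database name
    return f'"{database}"."{_TABLES[source][event]}"'
```

With such a flag, results from both sources could be compared for the same months before the Kinesis streams and the legacy Glue database are retired.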
...