Generating a File Access Report
Introduction
In August 2013, audit logging was added to all Synapse REST calls. A web filter on the REST servers records every HTTP call. The result is similar to the Tomcat access log, but with Synapse-specific metadata, including the ID of the user who made the call. The logs are swept to an S3 bucket (prod.access.record.sagebase.org), where they are collated by stack number, date, and hour. The logs are formatted as comma-separated values (CSV), where each row represents a single HTTP call. Each CSV file is gzip-compressed (.gz). The following example shows entries recorded from the Elastic Beanstalk health check:
| returnObjectId | elapseMS | timestamp | via | host | threadId | userAgent | queryString | sessionId | xForwardedFor | requestURL | userId | origin | date | method | vmId | instance | stack | success | responseStatus |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | 2 | 1404964808225 | | 10.65.180.149 | 159 | ELB-HealthChecker/1.0 | | 86d2c75c-bff0-4af8-80ce-fdf125e9c44e | | /repo/v1/version | 273950 | | 2014-07-10 | GET | ebcda4f1e0855063:-680a71b6:1471d26a22c:-7ffd | 49 | prod | true | 200 |
| | 2 | 1404964808227 | | 10.230.21.27 | 144 | ELB-HealthChecker/1.0 | | 3a84edf2-e942-4702-b59f-68484625122f | | /repo/v1/version | 273950 | | 2014-07-10 | GET | f2f37ebbcf90c925:60e5000c:1471d2705fc:-7ffd | 49 | prod | true | 200 |
We often need to generate a report of the users who accessed a particular set of files. This guide walks through the steps required to generate such a report.
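As a rough illustration of the record format described above, the following Python sketch reads one gzipped CSV file and collects the IDs of users whose calls match a URL fragment. It rests on assumptions that should be checked against a real file: that the CSV columns appear in the same order as in the example table, and that the files carry no header row. The function names `read_access_records` and `users_matching` are hypothetical helpers, not part of Synapse.

```python
import csv
import gzip

# ASSUMPTION: column order copied from the example table above.
# Verify against a real access record file before relying on it.
COLUMNS = [
    "returnObjectId", "elapseMS", "timestamp", "via", "host", "threadId",
    "userAgent", "queryString", "sessionId", "xForwardedFor", "requestURL",
    "userId", "origin", "date", "method", "vmId", "instance", "stack",
    "success", "responseStatus",
]


def read_access_records(path, columns=COLUMNS):
    """Yield one dict per CSV row (one HTTP call) from a gzipped file."""
    with gzip.open(path, mode="rt", newline="", encoding="utf-8") as handle:
        # The csv module handles quoted fields, e.g. user agents with commas.
        for row in csv.DictReader(handle, fieldnames=columns):
            yield row


def users_matching(paths, url_fragment):
    """Return the set of userId values for successful calls whose
    requestURL contains url_fragment."""
    users = set()
    for path in paths:
        for record in read_access_records(path):
            url = record["requestURL"] or ""  # short rows yield None
            if record["success"] == "true" and url_fragment in url:
                users.add(record["userId"])
    return users
```

For example, `users_matching(["sample.csv.gz"], "/repo/v1/version")` would return the user IDs behind the health-check calls shown in the table, where `sample.csv.gz` stands in for a downloaded file. If the files turn out to include a header row, that row is skipped by the `success == "true"` check, but passing explicit column names would then be unnecessary.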
Set up the AWS command-line client
Because we will need to inspect many CSV files, we need a tool that can download the required access record files from S3. The AWS command-line client has a powerful utility that can synchronize files from an S3 bucket to a local hard drive. The client can be downloaded from Amazon: http://aws.amazon.com/cli/ Once the client is installed, follow the instructions for setting up your AWS access key ID and secret key.
Once the client is installed and set up correctly, test the setup by running the following:
$ aws s3 ls s3://prod.access.record.sagebase.org
You should see a list of folders, one for each stack:
PRE 000000009/
PRE 000000010/
PRE 000000011/
PRE 000000012/
PRE 000000013/
....
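Once the listing works, the synchronize utility mentioned earlier could be driven from a small script. The Python sketch below only builds an `aws s3 sync` command for a single stack folder. The helper name `build_sync_command`, the local directory `access-records`, and the decision to sync an entire stack folder are assumptions for illustration; the nine-digit, zero-padded folder name is inferred from the listing above. The `--dryrun` flag asks the CLI to print what it would transfer without downloading anything.

```python
import subprocess

BUCKET = "prod.access.record.sagebase.org"


def build_sync_command(stack_number, local_root="access-records", dry_run=True):
    """Build the argument list for syncing one stack folder to local disk.

    ASSUMPTION: stack folders are the stack number zero-padded to nine
    digits, as in the listing above (e.g. 13 -> 000000013/).
    """
    prefix = f"{stack_number:09d}"
    command = [
        "aws", "s3", "sync",
        f"s3://{BUCKET}/{prefix}/",
        f"{local_root}/{prefix}/",
    ]
    if dry_run:
        command.append("--dryrun")  # preview only, nothing is downloaded
    return command


if __name__ == "__main__":
    # Print the preview command for stack 13.
    print(" ".join(build_sync_command(13)))
    # To actually download, something like the following could be used:
    # subprocess.run(build_sync_command(13, dry_run=False), check=True)
```

Keeping the command construction separate from its execution makes it easy to review the exact S3 path before any large download starts.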