Introduction

Auditing data for the Synapse REST API is captured by a Spring interceptor, the AccessInterceptor, which is similar to a web filter.  This interceptor is configured to listen to all web service calls made to the repository services.  For each call, the AccessInterceptor gathers data to fill out an AccessRecord model object.  The AccessRecord data is then written as gzipped CSV files directly to the prod.access.record.sagebase.org S3 bucket.  These CSV files are initially too small to process efficiently, so a worker process merges the files by hour.
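The hourly merge can be pictured with a minimal Python sketch. This is not the worker's actual code: the function names hour_bucket and merge_by_hour are hypothetical, and it assumes that records are grouped by the UTC hour of their timestamp value (epoch milliseconds), which the page does not state.

```python
from collections import defaultdict
from datetime import datetime, timezone


def hour_bucket(timestamp_ms):
    """Return the UTC hour, as 'YYYY-MM-DD/HH', of an epoch-millisecond timestamp."""
    moment = datetime.fromtimestamp(int(timestamp_ms) // 1000, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d/%H")


def merge_by_hour(batches):
    """Merge many small batches of records into one sorted list per hour.

    Each batch stands in for one small CSV file; each record is a dict
    with at least a 'timestamp' key (assumed layout, for illustration only).
    """
    merged = defaultdict(list)
    for batch in batches:
        for record in batch:
            merged[hour_bucket(record["timestamp"])].append(record)
    # Keep the records of each hour in chronological order.
    for records in merged.values():
        records.sort(key=lambda record: int(record["timestamp"]))
    return dict(merged)
```

Each key of the returned dictionary would correspond to one merged file in this simplified model.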

AccessRecord S3 Files

All AccessRecord CSV data for a single hour, from all EC2 instances of a stack, is merged into a single file.  The following is an example of the resulting path:

https://s3.amazonaws.com/prod.access.record.sagebase.org/000000013/2013-09-17/14-39-03-055-75ed6416-438d-4688-8789-3df56f4e4670.csv.gz

The above path is composed of the following parts:

https://s3.amazonaws.com/prod.access.record.sagebase.org/<instance_number>/<year_month_day>/<hour_minutes_seconds_milliseconds>-<UUID>.csv.gz
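As an illustration of the path layout above, the following Python sketch splits such a URL into its parts. The function name parse_access_record_url and the regular expression are assumptions derived only from the single example path shown here, so other valid paths may need a looser pattern.

```python
import re
from urllib.parse import urlparse

# Pattern inferred from the example path; not an official specification.
KEY_PATTERN = re.compile(
    r"^(?P<instance>\d+)/"
    r"(?P<date>\d{4}-\d{2}-\d{2})/"
    r"(?P<time>\d{2}-\d{2}-\d{2}-\d{3})-"
    r"(?P<uuid>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})\.csv\.gz$"
)


def parse_access_record_url(url):
    """Split an access record URL into bucket, instance, date, time and UUID."""
    path = urlparse(url).path.lstrip("/")
    bucket, _, key = path.partition("/")
    match = KEY_PATTERN.match(key)
    if match is None:
        raise ValueError("unexpected access record key: " + key)
    parts = match.groupdict()
    parts["bucket"] = bucket
    # The instance number is zero-padded in the path.
    parts["instance"] = int(parts["instance"])
    return parts
```

Calling it on the example path yields the bucket name, the instance number as an integer, and the date, time and UUID as strings.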

Here is some sample data from one of the access record files:

| returnObjectId | elapseMS | timestamp | via | host | threadId | userAgent | queryString | sessionId | xForwardedFor | requestURL | userId | origin | date | method | vmId | instance | stack | success |
|  | 43 | 1379430033942 |  | repo-prod-13.prod.sagebase.org | 659 | Synpase-Java-Client/2013-09-13-e70558e-662 Synapse-Web-Client/13.0 |  | b4331f55-6c65-4f6b-a2c5-ee6830cf7641 |  | /repo/v1/entity/header | 273978 |  | 2013-09-17 | POST | eca4eb39c13ac98c:7461944:1412973c3ee:-7ffd | 13 | prod | true |
|  | 30 | 1379430033943 |  | repo-prod-13.prod.sagebase.org | 656 | Jakarta Commons-HttpClient/3.1 | query=select+id,name,nodeType+from+entity+where+parentId+==+%22syn2228808%22+limit+500+offset+1 | b1b9c385-4dba-4e3a-b49c-0f40f7c99ac5 |  | /repo/v1/query | 273978 |  | 2013-09-17 | GET | eca4eb39c13ac98c:7461944:1412973c3ee:-7ffd | 13 | prod | true |
|  | 14 | 1379430034027 |  | repo-prod-13.prod.sagebase.org | 1177 | Synpase-Java-Client/2013-09-13-e70558e-662 Synapse-Web-Client/13.0 | mask=64 | 597767ef-8ff2-40d0-a65d-b519f5b2f937 |  | /repo/v1/entity/syn2228808/bundle | 273978 |  | 2013-09-17 | GET | 9b5a47b65e8703f0:229cd7a3:1412973c18a:-7ffd | 13 | prod | true |
| syn2228807 | 35 | 1379430034057 |  | repo-prod-13.prod.sagebase.org | 159 | Synpase-Java-Client/2013-09-13-e70558e-662 Synapse-Web-Client/13.0 |  | e9b15054-dbc6-454b-a1bf-8bef3d5f0fbc |  | /repo/v1/entity/syn2228808/benefactor | 273978 |  | 2013-09-17 | GET | 9b5a47b65e8703f0:229cd7a3:1412973c18a:-7ffd | 13 | prod | true |
| syn2228807 | 19 | 1379430034107 |  | repo-prod-13.prod.sagebase.org | 153 | Synpase-Java-Client/2013-09-13-e70558e-662 Synapse-Web-Client/13.0 |  | 23216ee3-dade-43ac-8efe-fa1e6dc9877d |  | /repo/v1/entity/syn2228807/acl | 273978 |  | 2013-09-17 | GET | 9b5a47b65e8703f0:229cd7a3:1412973c18a:-7ffd | 13 | prod | true |
| 59638 | 39 | 1379430034123 |  | repo-prod-13.prod.sagebase.org | 656 | Synpase-Java-Client/2013-09-13-e70558e-662 Synapse-Web-Client/13.0 |  | d7ed19dd-2ed9-47d1-b345-be1aaca0d688 |  | /repo/v1/entity/syn2228808/wiki | 273978 |  | 2013-09-17 | GET | eca4eb39c13ac98c:7461944:1412973c3ee:-7ffd | 13 | prod | true |
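To work with such a file programmatically, a minimal Python sketch like the one below could load it into a list of dictionaries. The function name read_access_records is hypothetical, and the sketch assumes that each .csv.gz file is comma-delimited, UTF-8 encoded, and begins with a header row naming the columns; none of this is confirmed by the page, so adjust it if the real files differ.

```python
import csv
import gzip
import io


def read_access_records(gz_bytes):
    """Decode the bytes of one gzipped CSV file into a list of row dicts.

    Assumes the first CSV row is a header with the column names
    (for example returnObjectId, elapseMS, timestamp, ...).
    """
    with gzip.open(io.BytesIO(gz_bytes), mode="rt", encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle))
```

All values come back as strings, so numeric columns such as elapseMS and timestamp need an explicit int() conversion before any arithmetic.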

Column Description

  • returnObjectId - For any method that returns an object with an ID, this column will contain the returned ID.  This is the only way to determine the ID of a newly created object from a POST.
  • elapseMS - The elapsed time of the call in milliseconds.
  • timestamp - The exact time the call was made in epoch time (milliseconds since 1/1/1970).
  • via - The value of the "via" header (see: http://en.wikipedia.org/wiki/List_of_HTTP_header_fields).
  • host - The value of the "host" header (see: http://en.wikipedia.org/wiki/List_of_HTTP_header_fields).
  • threadId - The ID of the thread used to process the request.
  • userAgent - The value of the "User-Agent" header (see: http://en.wikipedia.org/wiki/List_of_HTTP_header_fields).
  • queryString - The value of the request queryString.
  • sessionId - For each call a new UUID is generated for the sessionId.  The sessionId is also bound to the logging thread context and written in all log entries.  This ties access records to log entries.
  • xForwardedFor - The value of the "X-Forwarded-For" header (see: http://en.wikipedia.org/wiki/List_of_HTTP_header_fields).
  • requestURL - The URL of the request.
  • userId - For calls where the user is authenticated via a sessionToken or an API key, this column will contain the numeric ID of the user.
  • origin - The value of the "Origin" header (see: http://en.wikipedia.org/wiki/List_of_HTTP_header_fields).
  • date - The year-month-day date string.
  • method - The HTTP method: GET, POST, PUT, DELETE
  • vmId - When each EC2 instance of a stack starts, a new unique identifier for the JVM is issued.  This is captured in the access log so calls from a single machine can be grouped together.
  • instance - The instance number of the stack.
  • stack - The stack identifier.  This will always be "prod" for production stacks.
  • success - Set to true when a call completes without an exception, otherwise set to false.  The stack trace of an exception can be found by searching the logs for the sessionId of the failed access record.
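Two of the columns above lend themselves to a short Python sketch: converting timestamp into a readable time, and collecting the sessionId values of failed calls so they can be searched for in the logs. The function names are hypothetical, records are assumed to be dictionaries of strings keyed by column name, and interpreting the timestamp in UTC is an assumption made here for illustration.

```python
from datetime import datetime, timezone


def timestamp_to_utc(timestamp_ms):
    """Convert an epoch-millisecond timestamp (int or str) to a UTC datetime."""
    ms = int(timestamp_ms)
    whole_seconds = datetime.fromtimestamp(ms // 1000, tz=timezone.utc)
    # Re-attach the millisecond part without going through floats.
    return whole_seconds.replace(microsecond=(ms % 1000) * 1000)


def failed_session_ids(records):
    """Return the sessionId of every record whose success column is not 'true'."""
    return [
        record["sessionId"]
        for record in records
        if record["success"].strip().lower() != "true"
    ]
```

Each ID returned by failed_session_ids can then be used as a search term in the stack's log files to locate the matching stack trace.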

Log Files

Log files for all components of a stack are also captured in S3, including data from the repo, portal, and workers.

 

 

 

 
