...
Proposed Solutions
Amazon CloudWatch
It turns out that CloudWatch does no data aggregation or consolidation and holds onto data for only two weeks, so it is not suitable for the usage metrics use case.
...
- No storage limitations except cost
- No data expectations to work around
- This EC2 instance would be able to act as both the data collection mechanism and the data store, allowing it to keep its back-end storage mechanism in a consistent state.
Cons:
- Another custom application/library to build and maintain
- It's not clear how best to implement the data collection itself.
Google BigQuery
Pros:
- No storage limitations except cost
- Can store the data in full detail
- Fast search times, with no pre-calculated search parameters
- Thus exposing a new metric is as simple as thinking of it, fetching the data from BigQuery, and revealing it in the UI
- The infrastructure is built for us. Amazon provides all the pieces to build a system like BigQuery, or at least one that solves the same problems, but they're pieces, not a product.
Cons:
- It's a Google technology, not an Amazon one, thus doubling the number of accounts, maintenance, etc.
- I feel like there may be others, but I can't think of them right now
Data Requirements
User Activity Data
Given data set: A time-ordered list (or set of lists) of user auth-events from the crowd servers
...
- Current activity status
- Creation date
- Last login date
- Number of total logins
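As a rough illustration of how the per-user fields above could be derived from a time-ordered list of auth events, here is a minimal Python sketch. Everything specific in it is an assumption for illustration: the event shape `(timestamp, user_id)`, the names `UserActivity` and `summarize_auth_events`, the use of the first observed event as a stand-in for the creation date, and the 30-day window that defines "active". None of these are taken from the actual crowd server data.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class UserActivity:
    # Hypothetical record; field names are illustrative only.
    first_seen: datetime   # stand-in for creation date (assumption)
    last_login: datetime
    total_logins: int

    def is_active(self, now, window=timedelta(days=30)):
        # "Active" here means a login within `window` of `now`.
        # The 30-day default is an assumed threshold.
        return now - self.last_login <= window


def summarize_auth_events(events):
    """Fold (timestamp, user_id) auth events into one record per user."""
    summary = {}
    for timestamp, user_id in events:
        record = summary.get(user_id)
        if record is None:
            # First event seen for this user.
            summary[user_id] = UserActivity(timestamp, timestamp, 1)
        else:
            # max/min keep the result correct even if the input
            # is not perfectly time-ordered.
            record.first_seen = min(record.first_seen, timestamp)
            record.last_login = max(record.last_login, timestamp)
            record.total_logins += 1
    return summary
```

Because the summary is a single pass over the events, it could be recomputed from the full-detail data at any time, or updated incrementally as new events arrive.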
Project Activity Data
Essentially the same source data as for user activity is available (or could be), and the method for calculating it is the same. So it's the same metric, just for projects rather than users. There may be additional information on what type of usage it was (data access, data modification, data addition, etc.), but otherwise nothing differs.
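To make the "same metric, plus a usage type" idea concrete, here is a minimal Python sketch. The event shape `(timestamp, project_id, usage_type)`, the function name `summarize_project_events`, and the usage type labels `"access"`, `"modification"`, and `"addition"` are all assumptions for illustration; the real source data may look different.

```python
from collections import Counter


def summarize_project_events(events):
    """Fold (timestamp, project_id, usage_type) events into per-project totals."""
    summary = {}
    for timestamp, project_id, usage_type in events:
        entry = summary.setdefault(
            project_id, {"last_used": None, "by_type": Counter()}
        )
        # Track the most recent use of the project.
        if entry["last_used"] is None or timestamp > entry["last_used"]:
            entry["last_used"] = timestamp
        # Count events per usage type (hypothetical labels).
        entry["by_type"][usage_type] += 1
    return summary
```

The total usage count for a project is then `sum(entry["by_type"].values())`, which mirrors the total login count on the user side.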
...