...
- Round Robin type data store - Any kind of storage that is fixed in size and performs semi-automatic data aggregation. The basic idea is that for a period X the full details of whatever data is logged are kept. After X has elapsed, the data is aggregated in one or several ways (average, minimum, maximum), and the aggregate is stored for another period Y. This repeats until the data is no longer relevant and can be dropped from the store (see the sketch after this list).
- Data Collection - Since Synapse runs on Amazon's Elastic Beanstalk, usage data may need to be aggregated from several different Synapse instances. In addition, certain data (like user activity) is most easily collected from sources other than the Synapse services themselves (like the Crowd servers). Thus some kind of data collection mechanism is needed.
- Data Interpretation - Since both metrics proposed so far (user and project activity levels) are somewhat expensive to compute, when they can be computed at all, the front-end GUI should ideally never trigger recomputation of this data. Some background process, whether hosted in the metrics web server or run independently, is needed to pre-process the data before it is entered into the data store.
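To make the round robin scheme concrete, here is a minimal sketch in Python. It is not tied to any particular storage technology; the class name, retention sizes, bucket size, and choice of aggregation functions are all assumptions made for illustration.

```python
from collections import deque
from statistics import mean

class RoundRobinStore:
    """Fixed-size store: raw samples for period X, aggregates for period Y."""

    def __init__(self, raw_capacity, aggregate_capacity, bucket_size):
        self.bucket_size = bucket_size                       # samples consolidated per bucket
        self.raw = deque(maxlen=raw_capacity)                # full-detail samples (period X)
        self.aggregates = deque(maxlen=aggregate_capacity)   # consolidated buckets (period Y)
        self._pending = []                                   # samples awaiting consolidation

    def add(self, value):
        """Log a raw sample; consolidate once a bucket's worth has accumulated."""
        self.raw.append(value)
        self._pending.append(value)
        if len(self._pending) >= self.bucket_size:
            bucket, self._pending = self._pending, []
            # Aggregate in several ways; the oldest aggregates fall off the deque
            # automatically once aggregate_capacity is exceeded (data is dropped).
            self.aggregates.append({
                "avg": mean(bucket),
                "min": min(bucket),
                "max": max(bucket),
            })

# Example: keep 60 raw samples, plus 24 aggregate buckets of 60 samples each.
store = RoundRobinStore(raw_capacity=60, aggregate_capacity=24, bucket_size=60)
for i in range(3600):
    store.add(i % 100)
```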
Proposed Solutions
Amazon CloudWatch
Since it turns out CloudWatch does no data aggregation/consolidation and only retains data for two weeks, it is not suitable on its own for the usage metrics use case. To extend the data's life span, it must be retrieved via the CloudWatch API and then stored to S3, DynamoDB, or Redshift, as sketched below.
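A minimal sketch of that retrieval, using boto3 and S3 as the longer-term store. The namespace, metric name, bucket, and key are placeholders, since the actual metrics to export have not been decided; this only illustrates the CloudWatch-to-S3 hand-off.

```python
import json
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch")
s3 = boto3.client("s3")

end = datetime.now(timezone.utc)
start = end - timedelta(days=1)

# Pull one day of statistics for a hypothetical metric before CloudWatch drops it.
stats = cloudwatch.get_metric_statistics(
    Namespace="Synapse/Usage",          # placeholder namespace
    MetricName="UserActivity",          # placeholder metric name
    StartTime=start,
    EndTime=end,
    Period=3600,                        # one-hour granularity
    Statistics=["Average", "Minimum", "Maximum"],
)

# Persist the datapoints to S3 so they outlive CloudWatch's retention window.
s3.put_object(
    Bucket="synapse-usage-metrics",     # placeholder bucket
    Key=f"cloudwatch/UserActivity/{end:%Y-%m-%d}.json",
    Body=json.dumps(stats["Datapoints"], default=str),
)
```

A scheduled job (cron on an EC2 instance, or a similar background task) would run this export periodically so no datapoints age out before they are copied.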
Custom EC2 Instance
Pros:
...