...

  • Round-robin data store - Essentially any fixed-size storage that performs semi-automatic data aggregation. The basic idea is that for a period X you keep the full detail of whatever data you log. After X has elapsed, the data is aggregated in one or more ways (average, minimum, maximum), and the aggregate is stored for a further period Y. This repeats until the data is no longer relevant and can be dropped from the store.
  • Data Collection - Since Synapse runs on Amazon's Elastic Beanstalk, usage data may have to be aggregated from several different Synapse instances. In addition, certain data (such as user activity) is most easily collected from sources other than the services themselves (such as the Crowd servers). Some kind of data collection mechanism is therefore needed.
  • Data Interpretation - Both metrics proposed so far (user and project activity levels) are somewhat expensive to compute, if they can be computed at all, so ideally the front-end GUI should never have to request that this data be recomputed. A background process, whether hosted in the metrics web server or run independently, is needed to do any pre-processing of the data before it is entered into the data store.
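The round-robin data store described in the first bullet can be sketched in a few lines of Python. This is a minimal in-memory sketch, assuming a single raw tier (period X) feeding a single aggregate tier (period Y), samples that arrive in timestamp order, and hypothetical names (`RoundRobinStore`, `compact`, `bucket_size`); a real tool such as RRDtool supports several archives per data source.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Aggregate:
    """Summary of one bucket of expired raw samples (hypothetical layout)."""
    start: float
    avg: float
    minimum: float
    maximum: float


@dataclass
class RoundRobinStore:
    """Raw samples are kept for raw_period (X); older samples are rolled into
    bucket_size-wide aggregates, which are kept for agg_period (Y) longer."""
    raw_period: float
    bucket_size: float
    agg_period: float
    raw: list = field(default_factory=list)         # (timestamp, value) pairs
    aggregates: list = field(default_factory=list)  # Aggregate records, oldest first

    def add(self, ts, value):
        self.raw.append((ts, value))

    def compact(self, now):
        # Align the raw cutoff to a bucket boundary so that, with in-order
        # samples, each bucket is aggregated exactly once, after it fully expires.
        cutoff = now - self.raw_period
        cutoff -= cutoff % self.bucket_size
        expired = [s for s in self.raw if s[0] < cutoff]
        self.raw = [s for s in self.raw if s[0] >= cutoff]

        # Group expired samples by bucket and keep average, minimum and maximum.
        buckets = {}
        for ts, value in expired:
            buckets.setdefault(ts - ts % self.bucket_size, []).append(value)
        for start in sorted(buckets):
            vals = buckets[start]
            self.aggregates.append(Aggregate(start, mean(vals), min(vals), max(vals)))

        # Aggregates more than Y older than the raw window are dropped.
        oldest_kept = cutoff - self.agg_period
        self.aggregates = [a for a in self.aggregates if a.start >= oldest_kept]
```

Calling `compact` from a periodic background job keeps the store bounded in size, which is the property that makes this kind of store attractive for metrics.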
Proposed Solutions
Amazon CloudWatch

...

An attractive system for several reasons:

...

However, genuine time-series data, such as the number of active projects or users at any given time (or the number that are inactive, aborted, etc.), would be perfect for storage in CloudWatch. Depending on how long CloudWatch actually retains data, this might be a viable solution.
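Counts like these map naturally onto CloudWatch custom metrics. The sketch below only builds the `MetricData` list accepted by boto3's CloudWatch `put_metric_data` call; the metric names, the `Synapse/Metrics` namespace and the choice of one count per metric are assumptions for illustration, and the actual AWS call is left commented out because it needs credentials.

```python
from datetime import datetime, timezone


def activity_metric_data(counts, timestamp):
    """Build a MetricData list for CloudWatch's put_metric_data.

    counts maps a hypothetical metric name (e.g. "ActiveProjects") to a count
    observed at the given timestamp.
    """
    return [
        {
            "MetricName": name,
            "Timestamp": timestamp,
            "Value": float(count),
            "Unit": "Count",
        }
        for name, count in sorted(counts.items())
    ]


if __name__ == "__main__":
    # Illustrative values only.
    payload = activity_metric_data(
        {"ActiveProjects": 5, "ActiveUsers": 12},
        datetime(2012, 1, 1, tzinfo=timezone.utc),
    )
    print(payload)
    # With AWS credentials configured, the payload could be published with:
    # import boto3
    # boto3.client("cloudwatch").put_metric_data(
    #     Namespace="Synapse/Metrics", MetricData=payload)
```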
Custom EC2 Instance with RDS backing

Pros:

  • No storage limitations except cost
  • No data expectations to work around
  • This EC2 instance could act as both the data collection mechanism and the data store, allowing it to keep its back-end storage in a consistent state.

Cons:

  • Another custom application/library to build and maintain
  • It is not yet clear how best to implement the actual collection of the data.

...

Essentially the same source data as for user activity is (or could be made) available, and the method for calculating it is essentially the same. It is effectively the same metric, applied to projects instead of users. There may be additional information on the type of usage (data access, data modification, data addition, etc.), but otherwise the calculation is the same.
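Because the metric is the same for users and projects, one function parameterized by the entity key can compute both. This is a sketch under an assumed definition: "activity level" is taken here to mean the number of distinct days with at least one event in a reporting window, which this page does not specify. The event fields and the optional `kinds` filter (for the usage types mentioned above) are hypothetical.

```python
from collections import defaultdict
from datetime import date


def activity_levels(events, key, start, end, kinds=None):
    """Count distinct active days per entity between start and end (inclusive).

    Hypothetical event shape: dicts with "user", "project", "day" (a date) and
    "kind" (e.g. "access", "modify", "add"). Passing key="user" or
    key="project" computes the same metric for either entity type.
    """
    active_days = defaultdict(set)
    for e in events:
        if not (start <= e["day"] <= end):
            continue  # outside the reporting window
        if kinds is not None and e["kind"] not in kinds:
            continue  # optionally restrict to certain types of usage
        active_days[e[key]].add(e["day"])
    return {entity: len(days) for entity, days in active_days.items()}


if __name__ == "__main__":
    sample = [
        {"user": "alice", "project": "p1", "day": date(2012, 1, 2), "kind": "access"},
        {"user": "bob", "project": "p1", "day": date(2012, 1, 3), "kind": "add"},
    ]
    week = (date(2012, 1, 1), date(2012, 1, 7))
    print(activity_levels(sample, "user", *week))     # {'alice': 1, 'bob': 1}
    print(activity_levels(sample, "project", *week))  # {'p1': 2}
```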

Summary

The best option for the back end appears to be a combination of a custom data storage system and Amazon CloudWatch. The custom data store would hold "windowed" data, that is, high-resolution raw data (such as per-user login events with timestamps, or project activity events) kept only for a strictly limited period of time. This data would be aggregated periodically (daily) and pushed to CloudWatch. If it later turns out that we want to store data at a higher resolution or for a longer period than CloudWatch allows, we can export it to our own round-robin database tool (possibly RRDtool).
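The windowed store from this summary can be sketched as follows. It assumes raw events are (timestamp, type) pairs, a daily job that summarizes the previous day into per-type counts (the values that would then be pushed to CloudWatch), and pruning of raw events older than the window. `WindowedEventStore` and its methods are hypothetical, and a real implementation would keep the raw events in a database rather than in memory.

```python
from collections import Counter
from datetime import datetime, timedelta


class WindowedEventStore:
    """Keeps raw, timestamped events only for a limited window (hypothetical design)."""

    def __init__(self, window_days: int) -> None:
        self.window = timedelta(days=window_days)
        self.events = []  # (datetime, event_type) pairs, e.g. (ts, "login")

    def record(self, ts: datetime, event_type: str) -> None:
        self.events.append((ts, event_type))

    def daily_counts(self, day) -> Counter:
        """Count events of each type that occurred on the given date."""
        return Counter(t for ts, t in self.events if ts.date() == day)

    def prune(self, now: datetime) -> None:
        """Drop raw events that have fallen out of the window."""
        cutoff = now - self.window
        self.events = [(ts, t) for ts, t in self.events if ts >= cutoff]

    def run_daily(self, now: datetime):
        """Daily job: summarize yesterday, then prune expired raw events."""
        yesterday = (now - timedelta(days=1)).date()
        summary = self.daily_counts(yesterday)
        self.prune(now)
        return yesterday, summary
```

Keeping the daily summary separate from pruning means the window length only controls how much raw detail is available for recomputation, not what gets published.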