...

  1. The actual dashboard component, or the UI (front end)
  2. The data storage/collection mechanism (back end)

The dashboard is a fairly straightforward piece of software: it takes information from the storage mechanism (whatever that might be) and displays it to the end user. For this end of the project it seems natural to continue with technologies that are already in use, namely GWT. Specifically, GXT 3 seems like a strong candidate for the main UI work, both because it helps create a user-friendly experience and because of its support for graphs and charts.

...

  • Round Robin type data store - Any kind of storage that is fixed in size and does semi-automatic data aggregation. The basic idea is that for a period X you keep the full details of whatever data you log. After X has elapsed, the data is aggregated in one or several ways (average, minimum, maximum), and the aggregate is stored for another period Y. This repeats until the data is no longer relevant and can be dropped from the store.
  • Data Collection - Since Synapse runs on Amazon's Elastic Beanstalk, usage data may have to be aggregated from several different Synapse instances. In addition, certain data (like user activity) is most easily collected from sources other than the services themselves (like the Crowd servers). Thus some kind of data collection mechanism is needed.
  • Data Interpretation - Since both metrics proposed so far (user and project activity levels) are somewhat expensive to compute (if computing them is even possible), ideally the front-end GUI will never request that this data be recomputed. Some background process, whether hosted in the metrics web server or run independently, is needed to pre-process the data before it is entered into the data store.
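
The round-robin idea described above could be sketched roughly as follows. This is only an illustration, not a proposed implementation: the class name `RoundRobinStore` and its parameters are hypothetical, and the periods X and Y are measured in number of samples rather than wall-clock time to keep the example short.

```python
from collections import deque
from statistics import mean


class RoundRobinStore:
    """Fixed-size store that rolls raw samples up into aggregates.

    Assumption: raw_size stands in for period X and agg_size for
    period Y, both counted in samples rather than in time.
    """

    def __init__(self, raw_size, agg_size):
        self.raw_size = raw_size
        self.raw = []  # full-detail samples for the current period X
        # A bounded deque drops the oldest aggregate once it is full,
        # which keeps the total storage fixed in size.
        self.aggregates = deque(maxlen=agg_size)

    def add(self, value):
        """Log one sample; roll up when period X has elapsed."""
        self.raw.append(value)
        if len(self.raw) == self.raw_size:
            self.aggregates.append({
                "avg": mean(self.raw),
                "min": min(self.raw),
                "max": max(self.raw),
            })
            self.raw.clear()


store = RoundRobinStore(raw_size=3, agg_size=2)
for sample in range(1, 10):
    store.add(sample)
```

After nine samples, the store above holds only the two most recent aggregates; the aggregate for the first three samples has already been dropped. A real store would likely chain several tiers (for example hourly, daily, monthly) in the same way.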

Proposed Solutions

Amazon CloudWatch - An attractive system for several reasons:

  • Ease of use - no setup/administration costs other than monetary
  • Integrated - since we're already heavily using Amazon's cloud services, using CloudWatch is a pretty natural extension
  • Scalability - It is entirely a push mechanism, so as long as one Synapse instance is set up correctly, they all are.

However there are also some problems:

  • Length of storage - It's not completely clear whether CloudWatch does any data aggregation. However, Amazon is very clear that the original data is only kept for two weeks. Since this is far less than the period for which we would like at least some of our metrics to be stored, a supplementary data store may be needed if CloudWatch is used.
  • Data expectations - According to Amazon's promotional materials, they expect CloudWatch to be used for things like "CPU utilization, latency, and request counts". These metrics are all time series: it is natural to measure them at consistent intervals and to relate the data across time. On the other hand, 

Data Requirements

User Activity

Given data set: A time-ordered list (or set of lists) of user auth events from the Crowd servers

Computed data points:

  • For each user, a list of recent logins (either a fixed number of logins or all logins within a time window)
  • Activity status - New, Aborted, Active, or Inactive. This could be computed daily from the current activity status and the login record for today. Continuous calculation of this kind would also make it easier to detect changes in someone's status (for example, by putting them on a list of users who transitioned to a new status).
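
As a rough illustration of the first computed data point, the per-user list of recent logins for a time window could be derived from the auth-event list along these lines. The function name `recent_logins`, the `(username, timestamp)` event shape, and the 30-day default window are all assumptions made for the example; the actual format of the Crowd auth events is not specified here.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def recent_logins(events, now, window=timedelta(days=30)):
    """Group auth events by user, keeping only those inside the window.

    events: iterable of (username, datetime) pairs in time order
            (hypothetical shape of the Crowd auth-event list).
    """
    cutoff = now - window
    logins = defaultdict(list)
    for user, when in events:
        if cutoff <= when <= now:
            logins[user].append(when)
    return dict(logins)


events = [
    ("alice", datetime(2012, 1, 1, 9, 0)),
    ("bob", datetime(2012, 2, 20, 14, 30)),
    ("alice", datetime(2012, 2, 25, 8, 15)),
]
result = recent_logins(events, now=datetime(2012, 3, 1))
```

Here the January login falls outside the 30-day window, so each user ends up with a single recent login. The fixed-number variant would instead keep only the last N entries per user.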

Persistent daily data needed:

  • Current activity status
  • Last login date
  • Number of total logins
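
A minimal sketch of a daily update over this persistent data follows. The status definitions are not specified above, so the transition rules here are purely assumptions chosen for illustration: a user is New after a first login, becomes Active on any later login, and after 30 days without a login becomes Aborted (if still New) or Inactive (otherwise). The names `UserRecord`, `daily_update`, and `INACTIVE_AFTER` are likewise hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumed threshold; the real cut-off would need to be agreed on.
INACTIVE_AFTER = timedelta(days=30)


@dataclass
class UserRecord:
    status: str        # "New", "Aborted", "Active" or "Inactive"
    last_login: date
    total_logins: int


def daily_update(record, today, logins_today):
    """Return (updated_record, status_changed) for one user and one day.

    record is None for a user who has never been seen before.
    """
    if record is None:
        if logins_today == 0:
            return None, False
        return UserRecord("New", today, logins_today), True

    old_status = record.status
    last_login = record.last_login
    if logins_today > 0:
        # Any login after the first day makes the user Active (assumption).
        status = "Active"
        last_login = today
    elif today - record.last_login >= INACTIVE_AFTER:
        # Users who never returned are Aborted; others go Inactive.
        status = "Aborted" if old_status in ("New", "Aborted") else "Inactive"
    else:
        status = old_status

    updated = UserRecord(status, last_login, record.total_logins + logins_today)
    return updated, status != old_status


r1, c1 = daily_update(None, date(2012, 1, 1), 2)
r2, c2 = daily_update(r1, date(2012, 1, 2), 0)
r3, c3 = daily_update(r2, date(2012, 1, 31), 0)
r4, c4 = daily_update(r3, date(2012, 2, 1), 1)
```

The second return value is what would feed a list of users who transitioned to a new status on a given day. Because the update only needs yesterday's record and today's login count, the raw auth events never have to be re-read.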