Synapse Usage Metrics Dashboard Implementation Plan

Data Origin

The data comes from two sources:

Synapse Repository Services (possibly via logs, or possibly through other means)
Synapse Crowd Server(s) (via the logs)

These logs are scanned daily (scans could run more often if needed).

Data Collection/Aggregation

The scanning process will involve running two Simple Workflow workers:

scan log data and update the RDS window (labeled Agg)
take data from the RDS window and update the RRD (labeled Upd)
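As a rough illustration of the Agg step, the sketch below rolls log lines up into per-minute request counts. The log line format, the class name LogAggregator, and the use of an in-memory TreeMap in place of the RDS window table are all hypothetical; the real Repository Services and Crowd log formats will differ, and the Simple Workflow worker plumbing is omitted.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Sketch of the "Agg" step: roll log lines up into per-minute request counts.
 * Assumes a hypothetical log format whose first field is an ISO-8601 UTC
 * timestamp, e.g. "2013-01-15T10:03:27Z user42 GET /repo/v1/entity".
 * The returned map stands in for the RDS window table.
 */
final class LogAggregator {

    /** Parses the leading timestamp field of a log line. */
    static Instant parseTimestamp(String line) {
        int space = line.indexOf(' ');
        String field = space < 0 ? line : line.substring(0, space);
        return Instant.parse(field);
    }

    /** Counts lines per minute, keyed by the start of each minute, in time order. */
    static Map<Instant, Long> aggregate(List<String> lines) {
        Map<Instant, Long> window = new TreeMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            Instant bucket = parseTimestamp(line.trim()).truncatedTo(ChronoUnit.MINUTES);
            window.merge(bucket, 1L, Long::sum); // increment this minute's count
        }
        return window;
    }
}
```

Each minute bucket in the result would become one row in the high-resolution window; the Upd worker would then read those rows and push them into the RRD.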

Web UI

The UI will be written in GWT/GXT3. It presents and visualizes the data, which is available from two sources:

An interface to a temporary high-resolution data window, backed by any of several implementations
A long-term RRD (implemented with rrd4j and backed by any of a variety of storage mechanisms)
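The interface to the high-resolution window could look something like the following. DataWindow, Sample, and InMemoryDataWindow are hypothetical names chosen for this sketch; an RDS-backed implementation would issue equivalent time-range queries against a table instead of a sorted map.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;

/** One high-resolution sample of a named metric (hypothetical type). */
final class Sample {
    final String metric;
    final Instant time;
    final double value;

    Sample(String metric, Instant time, double value) {
        this.metric = metric;
        this.time = time;
        this.value = value;
    }
}

/**
 * Hypothetical interface to the temporary high-resolution data window.
 * Implementations might be backed by RDS, an in-memory store, and so on.
 */
interface DataWindow {
    /** Stores a sample; a later sample at the same instant replaces the earlier one. */
    void put(Sample sample);

    /** Returns samples for a metric with start <= time < end, in time order. */
    List<Sample> query(String metric, Instant start, Instant end);

    /** Drops samples older than the cutoff so the window stays bounded. */
    void evictBefore(Instant cutoff);
}

/** In-memory implementation, e.g. for tests or local development. */
final class InMemoryDataWindow implements DataWindow {
    private final Map<String, NavigableMap<Instant, Sample>> byMetric = new ConcurrentHashMap<>();

    @Override
    public void put(Sample sample) {
        byMetric.computeIfAbsent(sample.metric, m -> new ConcurrentSkipListMap<>())
                .put(sample.time, sample);
    }

    @Override
    public List<Sample> query(String metric, Instant start, Instant end) {
        NavigableMap<Instant, Sample> series = byMetric.get(metric);
        if (series == null) {
            return new ArrayList<>(); // unknown metric: no data
        }
        return new ArrayList<>(series.subMap(start, true, end, false).values());
    }

    @Override
    public void evictBefore(Instant cutoff) {
        for (NavigableMap<Instant, Sample> series : byMetric.values()) {
            series.headMap(cutoff, false).clear(); // removes entries from the backing map
        }
    }
}
```

Keeping the UI behind an interface like this would let the storage behind the window change without touching the GWT/GXT3 code.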

Diagrams

Below is a diagram of the basic data flow for the application. I've tried to show the relationships among all the moving parts, and have specifically called out what needs to be built as part of this project.

RRD Considerations

See the RRD Size Estimates page for the storage requirements (hint: they're small).
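For a back-of-the-envelope check, each archive in an RRD holds a fixed number of rows, and each row stores one 8-byte double per data source, so the data portion is roughly sources × rows × 8 bytes. The sketch below ignores header and metadata overhead, and the archive layout in the example is hypothetical rather than taken from the RRD Size Estimates page.

```java
/**
 * Rough lower-bound estimate of RRD data storage. Each archive keeps a fixed
 * number of rows; each row holds one 8-byte double per data source. Header
 * and metadata overhead are ignored, so real files are somewhat larger.
 */
final class RrdSizeEstimate {
    static final int BYTES_PER_VALUE = Double.BYTES; // 8

    /** Bytes needed for the values of one archive. */
    static long archiveBytes(int dataSources, long rows) {
        return (long) dataSources * rows * BYTES_PER_VALUE;
    }

    /** Sums archive sizes; each argument is the row count of one archive. */
    static long totalBytes(int dataSources, long... rowsPerArchive) {
        long total = 0;
        for (long rows : rowsPerArchive) {
            total += archiveBytes(dataSources, rows);
        }
        return total;
    }
}
```

For example, 10 data sources with three archives (1,440 one-minute rows for a day, 744 hourly rows for a 31-day month, 3,650 daily rows for ten years) give totalBytes(10, 1440, 744, 3650) = 466,720 bytes, under half a megabyte. Each consolidation function gets its own archives, so storing both AVERAGE and MAX at those resolutions doubles the data portion.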

Queries

Queries against the RRD are fairly limited. Essentially, you can request a data set, specified by data source, start and end times, and consolidation function. All of these parameters are constrained by how the database is defined when it is created, and that definition cannot be changed afterwards (short of exporting the data, creating a similar database, and importing the data into it; even then, you can only get out of the database what you put into it). As the RRD page says, the trick is to choose the right aggregation functions so that the meaning of the data is best preserved.
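To make the point about consolidation functions concrete, the sketch below mimics how an RRD archive collapses every stepsPerRow primary data points into one stored value. The Consolidator class and its Fn enum are hypothetical stand-ins that mirror the usual RRD functions (they are not the rrd4j API), and unknown values and the xff setting are ignored for brevity.

```java
import java.util.Arrays;

/**
 * Sketch of RRD-style consolidation: every stepsPerRow primary data points
 * become one archived value. Hypothetical stand-in for an RRD archive;
 * NaN handling and xff are omitted.
 */
final class Consolidator {
    enum Fn { AVERAGE, MIN, MAX, LAST }

    static double[] consolidate(double[] primary, int stepsPerRow, Fn fn) {
        if (stepsPerRow <= 0) {
            throw new IllegalArgumentException("stepsPerRow must be positive");
        }
        int rows = primary.length / stepsPerRow; // an incomplete trailing group is dropped
        double[] out = new double[rows];
        for (int r = 0; r < rows; r++) {
            double[] group = Arrays.copyOfRange(primary, r * stepsPerRow, (r + 1) * stepsPerRow);
            out[r] = apply(group, fn);
        }
        return out;
    }

    private static double apply(double[] group, Fn fn) {
        switch (fn) {
            case AVERAGE: return Arrays.stream(group).average().orElse(Double.NaN);
            case MIN: return Arrays.stream(group).min().orElse(Double.NaN);
            case MAX: return Arrays.stream(group).max().orElse(Double.NaN);
            case LAST: return group[group.length - 1];
            default: throw new IllegalArgumentException("unknown function: " + fn);
        }
    }
}
```

With per-minute request counts {2, 3, 40, 1, 2, 3} and three steps per row, AVERAGE stores {15.0, 2.0} while MAX stores {40.0, 3.0}. If only the AVERAGE archive exists, the 40-request spike cannot be recovered later, which is why the functions have to be chosen when the database is defined.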

One of the main issues with the RRD seems to be that as we add more metrics, we'll probably want to keep adding data sources to the databases. Since this may not always be practical, we may end up creating more and more databases, or, more likely, we will need to develop a migration strategy.
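If a migration strategy is needed, one option is to read each archive out and rewrite it under the new set of data sources, matching columns by name. The sketch below is a hypothetical, in-memory version of that copy step for a single archive; the class and method names are illustrative only, and newly added data sources start out as NaN, the conventional RRD value for unknown data.

```java
import java.util.List;

/**
 * Hypothetical migration step for one archive: copy rows from an old list of
 * data sources to a new one, matching columns by name. New sources are
 * filled with NaN ("unknown"); sources absent from the new list are dropped.
 */
final class RrdSchemaMigration {
    static double[][] migrate(List<String> oldSources, double[][] oldRows, List<String> newSources) {
        // For each new column, find its position in the old schema (-1 if it is new).
        int[] sourceIndex = new int[newSources.size()];
        for (int i = 0; i < newSources.size(); i++) {
            sourceIndex[i] = oldSources.indexOf(newSources.get(i));
        }
        double[][] newRows = new double[oldRows.length][];
        for (int r = 0; r < oldRows.length; r++) {
            double[] row = new double[newSources.size()];
            for (int i = 0; i < row.length; i++) {
                row[i] = sourceIndex[i] >= 0 ? oldRows[r][sourceIndex[i]] : Double.NaN;
            }
            newRows[r] = row;
        }
        return newRows;
    }
}
```

A real migration would repeat this for every archive in the database and write the result into a newly defined RRD; the history of a newly added metric simply stays unknown until data starts arriving.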