Synapse Usage Metrics Dashboard Implementation Plan
Data Origin
The data comes from two sources:
- Synapse Repository Services (possibly via logs, but possibly through other means)
- Synapse Crowd Server(s) (via the logs)
These logs are scanned daily (more often if needed).
Data Collection/Aggregation
The scanning process will involve running two Simple Workflow workers:
- scan log data and update the RDS window (labeled Agg)
- take data from the RDS window and update the RRD (labeled Upd)
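The two-stage flow above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the planned implementation: the log line format, the function names `aggregate` and `update`, and the in-memory structures standing in for the RDS window and the RRD are all hypothetical, and the Simple Workflow plumbing is left out entirely.

```python
from collections import Counter

# Assumed (hypothetical) log line format: "<epoch_seconds> <metric_name>"
def aggregate(log_lines, window, bucket_seconds=60):
    """Agg step: count events per (metric, time bucket) into the window."""
    for line in log_lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip malformed lines
        try:
            ts = int(parts[0])
        except ValueError:
            continue  # skip lines without a numeric timestamp
        bucket = ts - ts % bucket_seconds
        window[(parts[1], bucket)] += 1
    return window

def update(window, long_term, cutoff):
    """Upd step: move buckets older than cutoff into long-term storage."""
    for key in sorted(k for k in window if k[1] < cutoff):
        metric, bucket = key
        long_term.setdefault(metric, []).append((bucket, window.pop(key)))
    return long_term

window = Counter()   # stand-in for the RDS window
long_term = {}       # stand-in for the RRD
aggregate(["120 login", "130 login", "185 download", "bad line here"], window)
update(window, long_term, cutoff=180)
```

Keeping the two steps separate means the window can absorb frequent scans while the long-term store is only touched once a bucket is complete.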
Web UI
The web UI will be written in GWT/GXT3. It exposes and visualizes the data, which is available from two sources:
- An interface to a temporary high-resolution data window, backed by any of several implementations. The main purpose of this data window is to provide a source of data for an update stream, and possibly a place to buffer data before it is pushed into long-term storage
- Long-term storage
RRD Considerations
See the page on RRD Size Estimates for information about the storage needs. (Hint: they're small.)
Queries
Queries against the RRD are relatively limited. You can ask for a data set, specified by data source, start and end times, and consolidation function. All of these parameters are constrained by how the database was defined when it was created. There is no possibility of modification after that (short of exporting the data, creating a similar database and importing the data, and even then you can only get out of the database what you put into it). As the RRD page says, the trick is to choose the right consolidation functions so that the meaning of the data is best preserved.
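To make the query model concrete, here is a toy in-memory model of it. It is a sketch only and does not use or mirror the API of any real RRD library: the class `ToyRRD`, its parameters, and the data source name `logins` are all made up for illustration. The point is that the data sources, consolidation functions, and retention are fixed at creation, and a fetch can only select among them.

```python
# Consolidation functions available to this toy model.
CONSOLIDATE = {
    "AVERAGE": lambda xs: sum(xs) / len(xs),
    "MAX": max,
    "MIN": min,
}

class ToyRRD:
    def __init__(self, sources, functions, steps_per_row, rows):
        # Everything here is fixed at creation time.
        self.steps_per_row = steps_per_row  # raw points consolidated per row
        self.rows = rows                    # rows kept per archive
        self.archives = {(ds, cf): [] for ds in sources for cf in functions}
        self.pending = {ds: [] for ds in sources}

    def update(self, ds, timestamp, value):
        """Buffer one raw data point; consolidate once a row is full."""
        buf = self.pending[ds]
        buf.append(value)
        if len(buf) == self.steps_per_row:
            for (name, cf), rows in self.archives.items():
                if name == ds:
                    rows.append((timestamp, CONSOLIDATE[cf](buf)))
                    del rows[:-self.rows]  # round-robin: keep newest rows only
            buf.clear()

    def fetch(self, ds, cf, start, end):
        """Return consolidated rows with start <= timestamp <= end."""
        if (ds, cf) not in self.archives:
            raise KeyError(f"no archive for {ds}/{cf}; not defined at creation")
        return [(t, v) for t, v in self.archives[(ds, cf)] if start <= t <= end]

rrd = ToyRRD(sources=["logins"], functions=["AVERAGE", "MAX"],
             steps_per_row=2, rows=3)
for i, value in enumerate([1, 3, 5, 7, 2, 4, 6, 10]):
    rrd.update("logins", 60 * (i + 1), value)

recent_max = rrd.fetch("logins", "MAX", 0, 400)
```

In this model, asking for `MIN` raises an error because no such archive was defined at creation, and the oldest consolidated row has already been overwritten, so neither can be recovered later.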
One of the main issues with the RRD seems to be that as we add more metrics, we'll probably want to keep adding data sources to the databases. Since this may or may not be practical, we may end up creating more and more databases. More likely, a migration strategy will have to be developed.