...
The scanning process will involve running two Simple Workflow workers:
- scan log data and update the RDS window (labeled Agg)
- take data from the RDS window and update the RRD (labeled Upd)
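As a rough illustration of how the two workers could divide the work, here is a minimal sketch. It uses plain in-memory structures as hypothetical stand-ins for the RDS window and the RRD; the function names, the 60-second step, and the record layout are all assumptions, not part of the design.

```python
from collections import defaultdict


def agg_worker(window, log_records, step=60):
    """Agg: scan log records and fold them into the high-resolution window.

    window is a dict keyed by (metric, bucket_start); each record is a
    (timestamp, metric, value) tuple. Both layouts are assumed here.
    """
    for ts, metric, value in log_records:
        bucket = ts - ts % step  # align the timestamp to the start of its step
        window[(metric, bucket)] += value


def upd_worker(window, rrd_updates, up_to):
    """Upd: move buckets that started before up_to out of the window.

    rrd_updates is a list standing in for the RRD; entries are appended in
    time order, since an RRD expects updates with increasing timestamps.
    """
    ready = sorted((k for k in window if k[1] < up_to), key=lambda k: (k[1], k[0]))
    for metric, bucket in ready:
        rrd_updates.append((bucket, metric, window.pop((metric, bucket))))


window = defaultdict(float)
rrd_updates = []
agg_worker(window, [(0, "hits", 1), (30, "hits", 2), (65, "hits", 5)])
upd_worker(window, rrd_updates, up_to=60)
```

After this run, the bucket starting at 0 has been pushed to the stand-in RRD, while the bucket starting at 60 stays in the window until a later Upd pass.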
Web UI
The UI will be written in GWT/GXT3. It exposes and does pretty things with the data, which is available from two sources:
- Amazon RDS: an interface to a temporary high-resolution data window, backed by any of several implementations. The main purpose of this data window is to provide a source of data for an update stream, and possibly a place to buffer data before it is pushed into long-term storage
- Long-term storage
RRD Considerations
See the page on RRD
...
Size Estimates for information about what the storage needs are. (Hint: they're small).
Diagrams
Below is a diagram of the basic data flow for the application. I've endeavored to show the relationships of all the moving parts, and have specifically called out what needs to be built as part of this project.
...
Queries
Queries against the RRD are relatively limited. Basically, you can ask for a data set, and you can specify it by data source, start and end times, and what consolidation function was used. All of these parameters are essentially limited by how the database is defined when it is created. There is no possibility of modification after that (unless maybe you export the data, create a similar database and import the data, but even then, you can only get out of the database what you put into it). As the RRD page says, the trick is to choose the right aggregation functions so that the meaning of the data is best preserved.
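To make the shape of such a query concrete, here is a simplified model, not the RRD tool's actual API. The `Archive` class and `fetch` function are hypothetical names; the point is only that a query can succeed solely for a data source and consolidation function that were defined when the database was created.

```python
from dataclasses import dataclass


@dataclass
class Archive:
    """One stored series, fixed when the database is created (assumed layout)."""
    data_source: str
    cf: str      # consolidation function, e.g. "AVERAGE" or "MAX"
    step: int    # seconds covered by each stored row
    rows: dict   # row start timestamp -> consolidated value


def fetch(archives, data_source, cf, start, end):
    """Return (timestamp, value) pairs with start <= timestamp < end."""
    for archive in archives:
        if archive.data_source == data_source and archive.cf == cf:
            return [(ts, v) for ts, v in sorted(archive.rows.items())
                    if start <= ts < end]
    # Nothing was defined for this combination, so the data cannot be recovered.
    raise LookupError(f"no archive for {data_source!r} with {cf!r}")


archives = [Archive("hits", "AVERAGE", 300, {0: 1.5, 300: 2.5, 600: 4.0})]
result = fetch(archives, "hits", "AVERAGE", 0, 600)
```

Asking this model for `"MAX"` raises `LookupError`, because only an `"AVERAGE"` archive was defined for `hits`.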
It seems like one of the main issues with the RRD is that as we add more metrics, we'll probably want to keep adding more data sources to the databases. Since this may or may not be practical, we may end up creating more and more databases. More likely, a migration strategy will have to be developed.
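One possible shape for such a migration is the export, re-create, and import route mentioned above. The sketch below is an assumption-laden illustration, not a worked-out strategy: `migrate` is a hypothetical helper, rows are modeled as `(timestamp, values)` tuples, and a data source that did not exist in the old database is filled with `None` to mark it as unknown.

```python
def migrate(old_rows, old_sources, new_sources):
    """Re-key exported rows to a wider set of data sources.

    old_rows is a list of (timestamp, values) tuples, with values ordered
    like old_sources. Sources present only in new_sources come out as None,
    since the old database never recorded them.
    """
    dropped = set(old_sources) - set(new_sources)
    if dropped:
        raise ValueError(f"migration would drop data sources: {sorted(dropped)}")
    migrated = []
    for ts, values in old_rows:
        by_name = dict(zip(old_sources, values))
        migrated.append((ts, tuple(by_name.get(name) for name in new_sources)))
    return migrated


exported = [(0, (1.0, 2.0)), (300, (3.0, 4.0))]
imported = migrate(exported, ["hits", "errors"], ["hits", "errors", "latency"])
```

The old values carry over unchanged, which matches the point above: you can only get out of the database what you put into it.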