There's a Web UI, written in GWT/GXT3.
This exposes and does pretty things with the data, which is available
from two (possibly three) sources:
- RDS High-resolution data window
- Amazon CloudWatch aggregates
- Long-term RRD of CloudWatch aggregates
If this third component is necessary, it is questionable whether involving CloudWatch at all is cost-effective or wanted.
Data Origin
The data comes from two sources:
...
These logs are scanned daily (more often if wanted).
Data Collection/Aggregation
The scanning process will involve running two Simple Workflow (SWF) workers:
- scan log data and update the RDS window (labeled Agg)
- take data from the RDS window and update the RRD (labeled Upd)
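As a rough illustration of how the Agg and Upd steps relate, here is a minimal in-memory sketch. Everything in it is an assumption made for the example: the log line layout ("timestamp metric value"), the dictionary standing in for the RDS window, the dictionary standing in for long-term storage, and the 300-second step. It does not use SWF or any real storage API.

```python
from collections import defaultdict


def agg(log_lines, window):
    """Agg: scan log lines and add their values to the high-resolution window.

    Assumes a hypothetical log format of "<unix_ts> <metric> <value>".
    """
    for line in log_lines:
        ts, metric, value = line.split()
        window[(int(ts), metric)] += float(value)
    return window


def upd(window, long_term, step=300):
    """Upd: move everything in the window into long-term storage.

    Points are summed into buckets of `step` seconds (an assumed resolution),
    then the window is emptied.
    """
    for (ts, metric), value in sorted(window.items()):
        bucket = ts - ts % step
        long_term[(bucket, metric)] += value
    window.clear()
    return long_term


window = defaultdict(float)
long_term = defaultdict(float)
agg(["10 hits 1", "20 hits 2", "310 hits 4"], window)
upd(window, long_term)
```

In this sketch the window only holds data between an Agg run and the next Upd run, which matches the idea of the window as a temporary buffer.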
Web UI
This will be written in GWT/GXT3. It exposes and does pretty things with the data, which is available from two sources:
- An interface to a temporary high-resolution data window, backed by any of several implementations
  - the main purpose of this data window would be to provide a source of data for an update stream, and possibly a place to buffer data before it is pushed into long-term storage
- Long-term storage
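One way to picture "an interface backed by several implementations" is an abstract class with interchangeable backends. The sketch below is only an assumption about what such an interface could look like: the names DataWindow, InMemoryWindow, append, since, and drain are all hypothetical, and an RDS-backed class would implement the same three methods.

```python
from abc import ABC, abstractmethod


class DataWindow(ABC):
    """Hypothetical interface to the temporary high-resolution data window."""

    @abstractmethod
    def append(self, timestamp, metric, value):
        """Record one data point."""

    @abstractmethod
    def since(self, timestamp):
        """Return points newer than `timestamp` (feeds the update stream)."""

    @abstractmethod
    def drain(self, older_than):
        """Remove and return points older than `older_than` (the buffer role)."""


class InMemoryWindow(DataWindow):
    """Simplest possible backend: a list held in memory."""

    def __init__(self):
        self._points = []

    def append(self, timestamp, metric, value):
        self._points.append((timestamp, metric, value))

    def since(self, timestamp):
        return [p for p in self._points if p[0] > timestamp]

    def drain(self, older_than):
        old = [p for p in self._points if p[0] < older_than]
        self._points = [p for p in self._points if p[0] >= older_than]
        return old
```

The two read methods correspond to the two purposes named above: since serves the update stream, and drain hands buffered points over for long-term storage.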
RRD Considerations
See the page on RRD Size Estimates for information about what the storage needs are. (Hint: they're small).
Queries
Queries against the RRD are relatively limited. You can ask for a data set, specified by data source, start and end times, and consolidation function. All of these parameters are constrained by how the database was defined when it was created. There is no possibility of modification after that (unless maybe you export the data, create a similar database, and import the data, but even then you can only get out of the database what you put into it). As the RRD page says, the trick is to choose the right aggregation functions so that the meaning of the data is best preserved.
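To see why the choice of consolidation function matters, here is a simplified sketch of what consolidation does to a series of samples. It is an illustration only, not RRDtool's implementation: it ignores unknown values and the xfiles factor, and the sample numbers are made up. The function names AVERAGE, MIN, MAX, and LAST are the consolidation functions RRDtool offers.

```python
def consolidate(samples, steps_per_row, cf):
    """Collapse every `steps_per_row` samples into one stored value.

    `cf` is one of "AVERAGE", "MIN", "MAX", "LAST". A trailing partial
    group is dropped, since it would not yet form a complete row.
    """
    functions = {
        "AVERAGE": lambda xs: sum(xs) / len(xs),
        "MIN": min,
        "MAX": max,
        "LAST": lambda xs: xs[-1],
    }
    fn = functions[cf]
    return [
        fn(samples[i:i + steps_per_row])
        for i in range(0, len(samples) - steps_per_row + 1, steps_per_row)
    ]


samples = [2.0, 4.0, 12.0, 3.0, 5.0, 7.0]  # made-up primary data points
averages = consolidate(samples, 3, "AVERAGE")
peaks = consolidate(samples, 3, "MAX")
```

With these samples, the averaged rows come out as 6.0 and 5.0, while the peak rows are 12.0 and 7.0. If only AVERAGE had been configured, the spike to 12.0 could not be recovered later, which is the sense in which you only get out what you put in.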
One of the main issues with the RRD seems to be that as we add more metrics, we'll probably want to keep adding data sources to the databases. Since this may or may not be practical, we may end up creating more and more databases. Or, more likely, a migration strategy may have to be developed.