...
- The Dashboard will surface information about how users are actually behaving on Synapse so that Sage's Engineering and Management teams can drill into this data to make a variety of decisions related to the product roadmap and the development of partnerships. The dashboard must serve both technical and non-technical users (e.g. it must be usable by the CEO).
- In the short term, we want to limit visibility of the dashboard to Sage employees, but make it very easy for all Sage employees to get at the dashboard. Embedding the dashboard in either the Sage intranet or a Confluence wiki page would be a great way to accomplish this.
- In the long term we might want to also surface some metrics about Synapse on Synapse itself, or expose a public API for others to access and mine our metrics for a variety of purposes. For example, a large data generator might want to be able to find out who is using their data. Journals or funding agencies might want to assess the impact of work performed on Synapse. However, we probably will always want a separate Sage-only dashboard that may be more specific or tailored to our needs than what we'd put on Synapse itself for public consumption.
- We are interested in observing long-term trends in user behavior over the course of months or years. We will want to demonstrate uptake of the technology for purposes like raising grant money to continue Synapse development. We are also interested in short-term snapshots, e.g. which users have become active or inactive in the last 30 days, since such a change might require someone to contact the user and understand what has happened.
- This does not need to be an operational dashboard for technical people to monitor and troubleshoot the performance of Synapse or its components. CloudWatch-type metrics on things like load on different infrastructure components are a different category of metric, and can be managed separately.
- We expect the specific metrics gathered to start off high level and general, and to continuously evolve and become more granular as we generate more questions to ask of Synapse about its users. We want to make it easy for new developers to incrementally add to the dashboard. For example, a new developer might develop a new feature and add new custom metrics to measure how the feature is actually used in production by live users.
- We want to capture activity from both the web application and the analytical client tools. Note that we have turned on Google Analytics for the Synapse web application at https://www.google.com/analytics/ (log in with the account infrastructure@sagebase.org; the password is in the usual place). We don't want to duplicate in our metrics system anything that we already get for free from Google Analytics.
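To make the short-term snapshot requirement above concrete, here is a minimal sketch of how "recently active / inactive" users might be derived from raw activity events. It assumes a definition that is not fixed anywhere in these requirements: a user is "newly active" if seen in the latest 30-day window but not in the 30 days before it, and "newly inactive" in the opposite case. The function name, the event format, and the sample data are all hypothetical.

```python
from datetime import date, timedelta


def activity_snapshot(events, today, window_days=30):
    """Compare the latest window of activity with the window before it.

    events: iterable of (user_id, date) pairs, one per recorded user action
            (a hypothetical format, not an existing Synapse log schema).
    Returns (newly_active, newly_inactive) as sets of user ids.
    """
    current_start = today - timedelta(days=window_days)
    previous_start = current_start - timedelta(days=window_days)

    current, previous = set(), set()
    for user_id, day in events:
        if current_start < day <= today:
            current.add(user_id)
        elif previous_start < day <= current_start:
            previous.add(user_id)
        # Events older than both windows are ignored.

    newly_active = current - previous    # seen now, silent in the prior window
    newly_inactive = previous - current  # seen before, silent now
    return newly_active, newly_inactive


# Hypothetical sample data, for illustration only.
events = [
    ("alice", date(2012, 5, 20)),  # latest window only
    ("bob", date(2012, 4, 10)),    # previous window only
    ("carol", date(2012, 4, 15)),  # both windows
    ("carol", date(2012, 5, 25)),
]
newly_active, newly_inactive = activity_snapshot(events, today=date(2012, 6, 1))
```

With this sample data, "alice" is reported as newly active and "bob" as newly inactive, while "carol" appears in neither set because she was active in both windows.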
Design Options
There are several components to a tracking system such as this:
- The actual dashboard component, or the UI
- The data storage/collection mechanism
The dashboard is a fairly straightforward piece of software: it takes information from the storage mechanism (whatever that might be) and displays it to the end user. For this end of the project it seems natural to continue using technologies that are already in use, namely GWT. Specifically, GXT 3 seems like a strong candidate for the main UI work, both because it helps in creating a user-friendly experience and because of the support it provides for graphs and charts.
For the back end, the choices are less clear cut, so let's start by listing some requirements.
Back End Requirements
- Round Robin type data store - Basically any kind of storage that is fixed in size and does semi-automatic data aggregation. The basic idea is that for a period X you keep the full details of whatever data you log. After X has elapsed, the data is aggregated in one or several ways (average, minimum, maximum). This aggregate data is then stored for another period Y. Repeat until the data is no longer relevant and can be dropped from the store.
- Data Collection - Since Synapse is on Amazon's Elastic Beanstalk, usage data may have to be aggregated from several different Synapse instances. In addition, certain data (like user activity) is most easily collected from sources other than the Synapse services, such as the Crowd servers. Thus some kind of data collection mechanism is needed.
- Data Interpretation - Since both metrics proposed so far (user and project activity levels) are somewhat expensive to compute (if computing them is even possible), ideally the front-end GUI will never request that this data be recomputed. Some background process, whether hosted in the metrics web server or run independently, is needed to pre-process the data before it is entered into the data store.
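The round-robin idea in the first requirement can be illustrated with a small in-memory sketch. This is not a proposal for the actual storage technology. It assumes only two tiers (raw samples for period X, aggregates for period Y) and measures both periods in number of samples rather than in elapsed time. The class and parameter names are hypothetical.

```python
from collections import deque


class RoundRobinStore:
    """Fixed-size store: a detailed tier that feeds a coarser, aggregated tier."""

    def __init__(self, detail_slots, samples_per_aggregate, aggregate_slots):
        # detail_slots is assumed to be at least 1.
        self.detail = deque(maxlen=detail_slots)         # period X: raw samples
        self.aggregates = deque(maxlen=aggregate_slots)  # period Y: summaries
        self.samples_per_aggregate = samples_per_aggregate
        self._pending = []  # samples that aged out of X, awaiting aggregation

    def record(self, value):
        if len(self.detail) == self.detail.maxlen:
            # The oldest raw sample has outlived period X: queue it for aggregation.
            self._pending.append(self.detail.popleft())
            if len(self._pending) == self.samples_per_aggregate:
                self._consolidate()
        self.detail.append(value)

    def _consolidate(self):
        batch = self._pending
        # Appending to a full deque silently drops the oldest aggregate,
        # which is how data that is "no longer relevant" leaves the store.
        self.aggregates.append({
            "average": sum(batch) / len(batch),
            "minimum": min(batch),
            "maximum": max(batch),
        })
        self._pending = []


# Hypothetical usage: 12 samples into a store that keeps 4 raw values
# and 3 aggregates of 2 samples each.
store = RoundRobinStore(detail_slots=4, samples_per_aggregate=2, aggregate_slots=3)
for sample in range(1, 13):
    store.record(sample)
```

After the loop, the detailed tier holds only the four newest samples (9 to 12), and the aggregate tier holds summaries for samples 3 to 8; the summary for samples 1 and 2 has already been dropped. In line with the Data Interpretation requirement, a background process would call `record`, and the GUI would only ever read `detail` and `aggregates`.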