Librato Monitoring Settings
| Variable | Value |
|---|---|
| load_avg_1m | >1 for 10 min |
| router.service.median | >50ms for 10 min |
| router.service.perc95 | >500ms for 10 min |
| router.status.5xx | >10 for 10 min |
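Each alert above fires only when its threshold stays exceeded for the whole window. A minimal sketch of that rule, assuming one sample per minute; `breached_for` and the sample readings are hypothetical and not part of Librato's API:

```python
from typing import Sequence


def breached_for(samples: Sequence[float], threshold: float, window: int) -> bool:
    """Return True if the last `window` samples all exceed `threshold`.

    Assumes one sample per minute, so window=10 models "for 10 min".
    """
    if len(samples) < window:
        return False  # not enough history to cover the window
    return all(value > threshold for value in samples[-window:])


# Hypothetical router.service.perc95 readings in milliseconds, one per minute.
perc95_ms = [480, 510, 520, 530, 540, 550, 560, 570, 580, 590, 600]
print(breached_for(perc95_ms, threshold=500, window=10))
```

A single reading at or below the threshold inside the window resets the condition, which is what keeps short spikes from paging anyone.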
New Relic Monitoring Settings
Application Alert Policies
| Variable | Value |
|---|---|
| Apdex T | 0.35 seconds (1.7 seconds in staging) |
| Alert policy | Apdex <0.94, error rate >1% for 10 minutes |
| Downtime | When unresponsive for 5 minutes |
Setting the Apdex T threshold: https://docs.newrelic.com/docs/apm/new-relic-apm/apdex/changing-your-apdex-settings
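To show how the 0.35-second threshold relates to the 0.94 alert level, here is a sketch of the standard Apdex formula: requests at or under T count as satisfied, requests up to 4T count half, and anything slower counts as zero. The function name and sample data are hypothetical, and the sketch ignores errored requests, which New Relic treats as frustrating:

```python
from typing import Iterable


def apdex(response_times_s: Iterable[float], t: float = 0.35) -> float:
    """Standard Apdex: (satisfied + tolerating / 2) / total."""
    times = list(response_times_s)
    if not times:
        raise ValueError("need at least one sample")
    satisfied = sum(1 for rt in times if rt <= t)
    tolerating = sum(1 for rt in times if t < rt <= 4 * t)
    return (satisfied + tolerating / 2) / len(times)


# Hypothetical sample: 88 fast, 8 tolerable and 4 frustrating requests.
sample = [0.2] * 88 + [1.0] * 8 + [2.0] * 4
score = apdex(sample)
print(f"{score:.2f}", "ALERT" if score < 0.94 else "ok")
```

With T = 0.35, a request must finish within 1.4 seconds to count at all, so a small share of slow requests is enough to push the score under 0.94.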
DynamoDB Monitoring
Configured as CloudWatch alarms.
| Variable | Value |
|---|---|
| Read throughput | 0.8 (80%) for 5 or more minutes |
| Write throughput | 0.8 (80%) for 5 or more minutes |
| Throttled reads | 50 or more in 5 minutes |
| Throttled writes | 50 or more in 5 minutes |
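The 0.8 throughput values are ratios, not absolute numbers. The sketch below assumes they mean consumed capacity divided by provisioned capacity, with consumed capacity taken from the `Sum` statistic of CloudWatch's `ConsumedReadCapacityUnits` (or `ConsumedWriteCapacityUnits`) over one period. The function name and figures are hypothetical:

```python
def utilization(consumed_units_sum: float, period_s: int, provisioned_per_s: float) -> float:
    """Fraction of provisioned throughput used over one CloudWatch period."""
    if period_s <= 0 or provisioned_per_s <= 0:
        raise ValueError("period and provisioned capacity must be positive")
    # The Sum statistic is a total for the period; convert it to units per second.
    consumed_per_s = consumed_units_sum / period_s
    return consumed_per_s / provisioned_per_s


# Hypothetical: 25,500 read units consumed in 300 s on a table
# provisioned for 100 read capacity units per second.
ratio = utilization(25_500, period_s=300, provisioned_per_s=100)
print(f"{ratio:.2f}", "ALARM" if ratio >= 0.8 else "ok")
```

Alarming at 80% rather than 100% leaves room to raise provisioned capacity before requests start being throttled.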
Rediscloud
| Variable | Value |
|---|---|
| Data store usage | >80% |
Logentries / Papertrail
| Variable | Value |
|---|---|
| Papertrail 503 errors | 10/hr |
| Logentries errors | 10/hr |
| Logentries warnings | 100/hr |