
Monitoring policies

Librato Monitoring Settings

Variable                Value
load_avg_1m             > 1 for 10 min
router.service.median   > 50 ms for 10 min
router.service.perc95   > 500 ms for 10 min
router.status.5xx       > 10 for 10 min
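
Each rule above has the same shape: a metric must stay above a limit for a sustained period before the alert fires. The following is a minimal Python sketch of that logic, not Librato's actual evaluation code. The function name `sustained_breach` and the (timestamp, value) sample format are assumptions made for illustration.

```python
from typing import Iterable, Tuple


def sustained_breach(
    samples: Iterable[Tuple[float, float]],
    threshold: float,
    duration_s: float,
) -> bool:
    """Return True if the metric stayed above `threshold` for `duration_s`.

    `samples` is an iterable of (timestamp_seconds, value) pairs sorted by
    timestamp. This is a hypothetical helper, not part of any Librato API.
    """
    run_start = None  # timestamp where the current above-threshold run began
    for ts, value in samples:
        if value > threshold:
            if run_start is None:
                run_start = ts
            if ts - run_start >= duration_s:
                return True
        else:
            # Any sample at or below the threshold resets the run.
            run_start = None
    return False


# One sample per minute for 10 minutes, load average always above 1.
steady = [(t, 1.5) for t in range(0, 601, 60)]
print(sustained_breach(steady, threshold=1.0, duration_s=600))
```

A single sample at or below the limit resets the run, so brief spikes do not trigger an alert.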

New Relic Monitoring Settings

Application Alert Policies

Apdex T (threshold)   0.35 seconds (1.7 seconds in staging)
Alert policy          Apdex < 0.94, error rate > 1% for 10 minutes
Downtime              When unresponsive for 5 minutes

Setting the Apdex threshold (Apdex T): https://docs.newrelic.com/docs/apm/new-relic-apm/apdex/changing-your-apdex-settings
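
To make the numbers above concrete, here is a small sketch of the standard Apdex formula: requests at or under T count as satisfied, requests between T and 4T count as tolerating (half weight), and the rest as frustrated. It reads the 0.35 seconds value as the Apdex T threshold, which is an assumption, and the function name `apdex` and the sample response times are illustrative only.

```python
from typing import Sequence


def apdex(response_times_s: Sequence[float], t: float) -> float:
    """Compute the Apdex score for response times given threshold T (seconds)."""
    if not response_times_s:
        raise ValueError("need at least one response time")
    satisfied = sum(1 for r in response_times_s if r <= t)
    tolerating = sum(1 for r in response_times_s if t < r <= 4 * t)
    # Frustrated requests (slower than 4T) contribute nothing to the score.
    return (satisfied + tolerating / 2) / len(response_times_s)


# Hypothetical sample: three fast requests, one tolerable, one frustrating.
times = [0.1, 0.2, 0.3, 1.0, 2.0]
score = apdex(times, t=0.35)
print(score, "alert" if score < 0.94 else "ok")
```

With T = 0.35 s the tolerating band ends at 1.4 s, so a small share of slow requests is enough to push the score under 0.94.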

DynamoDB Monitoring

CloudWatch alarms.

Variable           Value
Read throughput    0.8 (80%) for 5 or more minutes
Write throughput   0.8 (80%) for 5 or more minutes
Throttled reads    50 or more in 5 minutes
Throttled writes   50 or more in 5 minutes
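
The 0.8 throughput ratio has to be converted into an absolute number when the alarm is defined on a consumed-capacity metric reported as a sum per period. The sketch below shows that arithmetic, assuming a 60-second period and five consecutive periods. The period length, the helper names, and the provisioned value of 100 units per second are all assumptions, not values taken from the actual alarm configuration.

```python
from typing import Sequence


def consumed_units_threshold(
    provisioned_per_s: float, period_s: int = 60, utilization: float = 0.8
) -> float:
    """Sum of consumed capacity units per period that equals `utilization`."""
    return provisioned_per_s * period_s * utilization


def utilization_ratio(
    consumed_sum: float, provisioned_per_s: float, period_s: int = 60
) -> float:
    """Fraction of provisioned throughput used during one period."""
    if provisioned_per_s <= 0 or period_s <= 0:
        raise ValueError("provisioned throughput and period must be positive")
    return consumed_sum / (provisioned_per_s * period_s)


def should_alarm(
    ratios: Sequence[float], limit: float = 0.8, periods: int = 5
) -> bool:
    """True if the last `periods` ratios are all at or above `limit`."""
    recent = list(ratios)[-periods:]
    return len(recent) == periods and all(r >= limit for r in recent)


# Hypothetical table provisioned at 100 read units per second.
print(consumed_units_threshold(100))
print(should_alarm([0.5, 0.85, 0.9, 0.82, 0.95, 0.81]))
```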

Rediscloud

Variable           Value
Data store usage   > 80%

Logentries / Papertrail

Variable                Value
Papertrail 503 errors   10/hr
Logentries errors       10/hr
Logentries warnings     100/hr
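
The per-hour limits above can be pictured as a sliding one-hour window over matching log events. The following sketch illustrates that idea only; it is not how Papertrail or Logentries evaluate alerts internally. The class name `HourlyRateAlert` is hypothetical, and it assumes the alert fires once the count reaches the limit, since the table does not say whether the limit is inclusive.

```python
from collections import deque


class HourlyRateAlert:
    """Count events in a sliding window and flag when the limit is reached."""

    def __init__(self, limit: int, window_s: float = 3600.0) -> None:
        self.limit = limit
        self.window_s = window_s
        self._events = deque()  # event timestamps, oldest first

    def record(self, ts: float) -> bool:
        """Record an event at time `ts` (seconds); return True if alerting."""
        self._events.append(ts)
        cutoff = ts - self.window_s
        # Drop events that fell out of the window.
        while self._events and self._events[0] <= cutoff:
            self._events.popleft()
        return len(self._events) >= self.limit


# Ten errors one minute apart reach the 10/hr limit on the tenth event.
alert = HourlyRateAlert(limit=10)
states = [alert.record(60.0 * i) for i in range(10)]
print(states[-1])
```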