Check that the staging stack is running
Check that data has migrated to the staging stack
Check that resolved issues with JIRA fixVersion=stack-n (the staging stack) have been validated (closed)
Make the Go-Live Decision for stack-n
Identify significant features that may merit an announcement to users
Set validators for stack-<n+1>
Review validation work on the R and Python clients
Review performance metrics:
Dashboard: https://dashboard.synapse.org
Latencies & Errors > Global Latencies and Query Latencies
Note: Some spikes are due to known administrative tasks. Check with the team.
Latencies & Errors > Trending HTTP status codes (Percentage)
Users > (all metrics)
Table Metrics
AWS CloudWatch:
Async workers (Are there many 'no retry' jobs that failed? Are there 'retry' jobs that retry forever?)
Throttling
Client logs (Note: look at metrics with no dimensions; select All to quickly see if there are many hits.)
AWS RDS
prod-<stack>-db
CPU (use 1 min resolution to check for spikes)
Free Storage Space (MB)
If there are symptoms of performance problems: Write Throughput and Read Throughput (Note: This shows the load between the network and the database.)
Now repeat the review for the Table database:
prod-<stack>-table-0
CPU (use 1 min resolution to check for spikes)
Free Storage Space (MB)
If there are symptoms of performance problems: Write Throughput and Read Throughput
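The RDS checks above can also be scripted. The following is a minimal sketch, assuming boto3 is installed and AWS credentials are configured; the helper names and the example instance name `prod-123-db` are hypothetical. It builds a CloudWatch `get_metric_statistics` request with a 60-second period, so short CPU spikes are not averaged away.

```python
from datetime import datetime, timedelta, timezone


def rds_metric_request(db_instance_id, metric_name, statistic="Maximum",
                       hours=12, now=None):
    """Build keyword arguments for CloudWatch get_metric_statistics.

    db_instance_id is the RDS instance name (for example, a hypothetical
    "prod-123-db"); metric_name is an AWS/RDS metric such as
    "CPUUtilization" or "FreeStorageSpace".
    """
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/RDS",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": db_instance_id}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 60,  # 1-minute resolution, as the checklist recommends
        "Statistics": [statistic],
    }


def fetch_datapoints(request):
    """Run the request against CloudWatch (requires boto3 and credentials)."""
    import boto3  # imported lazily so the request builder works without it

    cloudwatch = boto3.client("cloudwatch")
    response = cloudwatch.get_metric_statistics(**request)
    return sorted(response["Datapoints"], key=lambda point: point["Timestamp"])
```

A call such as `fetch_datapoints(rds_metric_request("prod-123-db", "CPUUtilization"))` would return the datapoints in time order. CloudWatch reports `FreeStorageSpace` in bytes, so divide by 1024 * 1024 to compare with the MB figure in the console. The 12-hour default keeps a single request under the per-call limit of 1,440 datapoints.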
AWS EC2 page > Running instances
workers-prod-xx > CPU Utilization (Is work balanced across workers? Are workers 'pegged'?)
repo-prod-xx > CPU Utilization (Is work balanced across instances? Are instances 'pegged'?)
(use 1 min resolution to check for spikes)
portal-prod-xx > network OUT
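The "balanced" and "pegged" questions above can be made concrete with a small helper. This is a rough sketch, not a team standard: the function name, the instance names, and both thresholds (95% for "pegged", a 30-point gap for "unbalanced") are assumptions chosen for illustration. The input is CPU Utilization samples per instance, however they were collected.

```python
from statistics import mean


def summarize_cpu(cpu_by_instance, pegged_at=95.0, imbalance_gap=30.0):
    """Flag pegged instances and an unbalanced group.

    cpu_by_instance maps an instance name to its CPU Utilization samples
    (percent). Both thresholds are illustrative assumptions.
    """
    averages = {name: mean(samples) for name, samples in cpu_by_instance.items()}
    # An instance counts as pegged when its average sits at or above the threshold.
    pegged = sorted(name for name, avg in averages.items() if avg >= pegged_at)
    # The group counts as balanced when busiest and idlest averages are close.
    gap = max(averages.values()) - min(averages.values())
    return {"averages": averages, "pegged": pegged, "balanced": gap < imbalance_gap}


report = summarize_cpu({
    "workers-a": [97.0, 99.0, 98.0],  # hypothetical instance names and samples
    "workers-b": [20.0, 25.0, 15.0],
})
print(report["pegged"], report["balanced"])
```

In this made-up example one worker is saturated while the other is mostly idle, which is the pattern the checklist asks reviewers to look for. Averages hide short spikes, so the 1-minute view in the console remains the better tool for spotting those.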
For any metrics we track week-to-week, enter in this sheet: https://docs.google.com/a/sagebase.org/spreadsheets/d/1u1fYXFkW4pzQ4f1OhQvyON9PEtgcP6UZLierGeuknbw/edit?usp=sharing
Each team member briefly outlines their work for the coming week.
Related links:
Google Analytics: http://www.google.com/analytics/
Look at Behavior > Site Content > All Pages
...