Issue management
This document describes how we take in, triage, prioritize, and assign resources to incoming user issues and bugs. It also defines the matrix used to set severity, impact, and target MTTR and MTTC for ranked issues. Issues are prioritized for work based on the number of users impacted and the severity of the impact, according to the evaluation criteria below. PM facilitates and finalizes issue prioritization in collaboration with the assigned response team. This process is NOT for triaging new feature requests, but these issues may feed back into feature development and improvements via after-action reviews.
Definitions
DCC: Data Contribution Center. This is an internal Sage shorthand for a group of researchers who are contributing, accessing, and sharing data as part of our platform.
Assigned response team: a rotating team comprising PM, Sage Dev, Partner Dev, QA, and a DCC Scientific representative, on point to resolve issues. The team is responsible for driving issues to completion within MTTC (initial metric: 3 weeks) and for participating in after-action reviews.
Impact level: describes the total number of known users who are impacted by the issue.
Web app user: a user with a verified Synapse (or One Sage) account who is participating in a data management project/DCC.
Severity level: describes the ability of the user to successfully access client functionality.
Assigned priority: describes the priority assigned to the issue. Assigned priority determines the recommended action. Priority will be assigned by PM.
Escalate: as additional user reports are gathered or investigative info predicts a change, impact and severity can be escalated and the assigned priority updated accordingly.
Downgrade: as additional user reports and investigative info are gathered that predict a change, impact and severity can be downgraded and the assigned priority updated accordingly.
Recommended action: describes the actions and timeline the response team should follow to address the issue.
Alert thresholds: TBD. Settings for the alert thresholds that trigger team email alerts from the CloudFront/AWS dashboard. We need to define scenarios to determine the granularity of tooling and reporting:
on-page errors?
entire app crashes?
independent tool crash?
perf slowdowns?
site outage?
programmatic client?
Impact level | Description (count or % of install base) |
---|---|
1 | x > 15 users on web app; x > 2 DCCs |
2 | 3 < x < 15 users on web app; x > 1 DCC |
3 | x < 3 users on web app; does not impact DCC as a whole |
Include in pre-investigation: review all crash logs to help inform impact, severity, and repro.
Severity level | Description | Examples |
---|---|---|
1 | Data contribution is blocked | |
1 | User is blocked from using entire app | |
2 | User is blocked from using some functionality | |
3 | User is blocked but a work-around exists | |
Impact/Severity (I/S) | Assigned priority in Jira | Recommended action |
---|---|---|
1/1 | Blocker | Investigate and address immediately; hotfix deploy |
1/2 | Critical | Resource balance to address in current sprint; deploy on cadence |
1/3 | Critical | Resource balance to address in current sprint; deploy on cadence |
2/1 | Blocker | Investigate and address immediately; hotfix deploy |
2/2 | Critical | Resource balance to address in current or next sprint; deploy on cadence |
2/3 | Major | Resource balance to address in later sprint; deploy on cadence |
3/1 | Major | Resource balance to address in later sprint; monitor for additional cases; deploy on cadence |
3/2 | Major | Resource balance to address in later sprint; deploy on cadence |
3/3 | Minor | Address as time allows or designate as 'won’t fix' |
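For teams that want to automate the lookup, the impact/severity matrix above can be sketched as a small Python table. This is an illustrative sketch only, not an existing tool: the names `Triage`, `PRIORITY_MATRIX`, and `assign_priority` are hypothetical, and splitting each recommended action into separate steps is an assumption about how the table cells are meant to be read. Final priority is still assigned by PM.

```python
from typing import NamedTuple


class Triage(NamedTuple):
    """Assigned priority plus recommended actions (hypothetical type)."""
    priority: str
    actions: tuple


# Shared action lists, copied from the matrix in this document.
HOTFIX = ("Investigate and address immediately", "Hotfix deploy")
CURRENT = ("Resource balance to address in current sprint", "Deploy on cadence")
LATER = ("Resource balance to address in later sprint", "Deploy on cadence")

# Keys are (impact level, severity level), matching the I/S column.
PRIORITY_MATRIX = {
    (1, 1): Triage("Blocker", HOTFIX),
    (1, 2): Triage("Critical", CURRENT),
    (1, 3): Triage("Critical", CURRENT),
    (2, 1): Triage("Blocker", HOTFIX),
    (2, 2): Triage("Critical", (
        "Resource balance to address in current or next sprint",
        "Deploy on cadence",
    )),
    (2, 3): Triage("Major", LATER),
    (3, 1): Triage("Major", (
        "Resource balance to address in later sprint",
        "Monitor for additional cases",
        "Deploy on cadence",
    )),
    (3, 2): Triage("Major", LATER),
    (3, 3): Triage("Minor", (
        "Address as time allows or designate as 'won't fix'",
    )),
}


def assign_priority(impact: int, severity: int) -> Triage:
    """Look up the suggested priority for an impact/severity pair."""
    try:
        return PRIORITY_MATRIX[(impact, severity)]
    except KeyError:
        raise ValueError(
            f"impact and severity must each be 1, 2, or 3; got {impact}/{severity}"
        ) from None


if __name__ == "__main__":
    # Escalation example: an issue first ranked 3/2 is re-ranked 2/1
    # after more user reports arrive, so its priority is looked up again.
    print(assign_priority(3, 2).priority)
    print(assign_priority(2, 1).priority)
```

Because escalation and downgrade only change the impact or severity inputs, re-running the lookup with the updated pair is enough to get the updated priority and recommended action.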
Process flow