Incident Report Template
This document outlines the reporting requirements for major incidents. A major incident is defined as a software failure incident reported via the Sage Privacy Officer, Funding Program Officer, or IRB. Describing incidents and their resolutions helps de-risk tools and contributes to SDLC improvements. Make a copy of this template for each incident and parent it under the Issue Management content section.
Title of Incident
A short descriptive title.
Jira ticket link
Every incident must be created as an Initiative within the BDF Jira board (https://sagebionetworks.jira.com/browse/BDFLINC). All tasks related to the bug capture(s) and resolution(s) must be parented to this Initiative. This enables transparent reporting and tracking of incidents.
Incident Response Team
A table of the people participating in the incident response, with their roles and functions.
| Name | Role | Function |
|---|---|---|
| | PM | QA, Release Management, Communications, Incident Response Lead, Prioritization |
| | Dev | Root Cause Analysis, Bug Resolution |
| | Leadership | Communications |
| | SME | Data Validation |
Summary
A 2-3 paragraph description of the initiation, root cause analysis, and final resolution including dates. Example:
On XX/XX/20XX, Collaborator Z reported that… The most immediate proximal cause of the incident turned out to be… The incident was compounded or obscured by something else that was occurring in parallel. The bug(s) were resolved by fixing ABC, and a hotfix was deployed on XX/XX/20XX.
Impact and Risk assessment
Describe how many users and/or how much data were impacted. Describe what the resolution does and does not cover (i.e., new data only or retroactive data as well), what users can expect to change going forward, and whether other apps, tools, or processes were also impacted. Describe potential risks to users, data, systems, and timelines, as well as other project impacts.
Timeline
Describe the timeline of events. Include the timeline of relevant releases and adjunct investigations. Update the status when all issues are resolved.
| Date | Action | Status |
|---|---|---|
| | include brief description and link to bug | resolved |
| | release version | in production |
Proximal Cause
3-5 paragraphs describing the technical root cause of the issue. This can also include compounding errors in process or human behavior.
Resolution and Recovery
A table of dates of all actions taken to resolve the incident. Include bug filing, internal builds, test passes, release to production, communications, and user validation of resolution.
Opportunities for Improvement
Describe areas for improvement. This section should be completed after an After-Action Review. Include links to future feature development if relevant.
Recommendations
Describe recommendations for actions that users need to take and for communicating with users and funders. Describe any post-recovery analyses or monitoring recommendations. Confirm with the Governance team whether any protocol deviations or other actions need to be addressed. Propose mitigations for identified risks.
Acknowledgement
Obtain signoff on the Jira ticket from the Program Lead, an LT member, and the Governance lead to acknowledge that flaws and recommendations have been communicated to the responsible parties across teams, that follow-up action items have been addressed, and that the decisions on recommended risk mitigation have been acknowledged. Assign the Program Lead (Milen Nikolov) as the validator. The validator will resolve and close the Jira ticket. The LT member and Governance lead can provide their signoff in the comments, or a separate task can be created and assigned to them.