This document outlines the reporting requirements for major incidents. A major incident is defined as a software failure incident reported via the Sage Privacy Officer, Funding Program Officer, or IRB. Describing incidents and their resolutions helps de-risk tools and contributes to SDLC improvements. Make a copy of this template for each incident and parent it under the Issue Management content section.

Title of Incident

A short descriptive title.

Every incident must be created as an Initiative on the BDF Jira board (https://sagebionetworks.jira.com/browse/BDFLINC). All tasks related to bug capture and resolution must be parented to this initiative. This enables transparent reporting and tracking of incidents.
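For teams that script ticket creation, the Initiative and its child tasks can be sketched as request bodies for Jira's create-issue REST endpoint. This is a minimal sketch, not the team's tooling: the project key `BDFLINC` is inferred from the board URL, the issue type names `Initiative` and `Task` and the use of the `parent` field depend on how the Jira project is configured, and the key `BDFLINC-1` is a hypothetical placeholder.

```python
import json

# Assumptions (not stated in this template): the project key, the issue type
# names, and the use of the "parent" field to link tasks to the Initiative.
PROJECT_KEY = "BDFLINC"


def initiative_payload(title: str) -> dict:
    """Build the request body for creating the incident's Initiative."""
    return {
        "fields": {
            "project": {"key": PROJECT_KEY},
            "summary": title,
            "issuetype": {"name": "Initiative"},
        }
    }


def task_payload(summary: str, initiative_key: str) -> dict:
    """Build the request body for a task parented to the Initiative."""
    return {
        "fields": {
            "project": {"key": PROJECT_KEY},
            "summary": summary,
            "issuetype": {"name": "Task"},
            "parent": {"key": initiative_key},
        }
    }


if __name__ == "__main__":
    # "BDFLINC-1" is a hypothetical Initiative key used only for illustration.
    print(json.dumps(initiative_payload("Example incident title"), indent=2))
    print(json.dumps(task_payload("Root cause analysis", "BDFLINC-1"), indent=2))
```

The sketch only builds the payloads; sending them (and authenticating) is left out, since those details vary by Jira instance.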

Incident Response Team

A table of people participating in the incident response, with their role and function.

| Name | Role | Function |
| --- | --- | --- |
|  | PM | QA, Release Management, Communications, Incident Response Lead, Prioritization |
|  | Dev | Root Cause Analysis, bug resolution |
|  | Leadership | Communications |
|  | SME | Data Validation |

Summary

A 2-3 paragraph description of the initiation, root cause analysis, and final resolution including dates. Example:

On XX/XX/20XX, Collaborator Z reported that… The most immediate proximal cause of the incident turned out to be… The incident was compounded/obscured by something else that was also occurring in parallel. The bug(s) were resolved by fixing ABC, and a hotfix was deployed on XX/XX/20XX.

Impact and Risk Assessment

Describe how many users and/or how much data was impacted. Describe what the resolution does and does not cover (i.e., new data only or retroactive data), what users can expect to change going forward, and whether other apps, tools, or processes were also impacted. Describe potential risks to users, data, systems, and timelines, and any other project impacts.

Timeline

Describe the timeline of events. Include timeline of relevant releases and adjunct investigations. Update status when all issues are resolved.

| Date | Action | Status |
| --- | --- | --- |
|  | include brief description and link to bug | resolved |
|  | release version | in production |

Proximal Cause

3-5 paragraphs describing the technical root cause of the issue. This can also include compounding errors in process or human behavior.

Resolution and Recovery

A table of dates of all actions taken to resolve the incident. Include bug filing, internal builds, test passes, release to production, communications, and user validation of resolution.

Opportunities for Improvement

Describe areas for improvement. This section should be completed after an After-Action Review. Include links to future feature development if relevant.

Recommendations

Describe recommendations for actions that users need to take and for communicating with users and funders. Describe any post-recovery analyses or monitoring recommendations. Confirm with the Governance team whether any protocol deviations or other actions need to be addressed. Propose mitigations for identified risks.

Acknowledgement

Obtain signoff on the Jira ticket from the Program Lead, an LT member, and the Governance lead to acknowledge that flaws and recommendations have been communicated to cross-team responsible parties and that follow-up action items have been addressed. Assign the Program Lead (Milen Nikolov) as the validator. The validator will resolve and close the Jira ticket. The LT member and Governance lead can provide their signoff in the comments, or a separate task can be created and assigned to them.
