gps-denied-desktop/_docs/00_templates/incident_playbook.md
Oleksandr Bezdieniezhnykh fd75243a84 more detailed SDLC plan
2025-12-10 19:05:17 +02:00

Incident Playbook Template

Incident Overview

| Field | Value |
|-------|-------|
| Playbook Name | [Name] |
| Severity | Critical / High / Medium / Low |
| Last Updated | [YYYY-MM-DD] |
| Owner | [Team/Person] |

Detection

Symptoms

  • [How will you know this incident is occurring?]
  • Alert: [Alert name that triggers]
  • User reports: [Expected user complaints]

Monitoring

  • Dashboard: [Link to relevant dashboard]
  • Logs: [Log query to investigate]
  • Metrics: [Key metrics to watch]

Assessment

Impact Analysis

  • Users affected: [All / Subset / Internal only]
  • Data at risk: [Yes / No]
  • Revenue impact: [High / Medium / Low / None]

Severity Determination

| Condition | Severity |
|-----------|----------|
| Service completely down | Critical |
| Partial degradation | High |
| Intermittent issues | Medium |
| Minor impact | Low |
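
The table above can be encoded so that tooling assigns a severity consistently. The following is a minimal sketch, not part of the template itself: the condition keys (`service_down`, `partial_degradation`, and so on) and the function name `determine_severity` are hypothetical, and the rule that the most severe observed condition wins is an assumption.

```python
from enum import Enum


class Severity(Enum):
    CRITICAL = "Critical"
    HIGH = "High"
    MEDIUM = "Medium"
    LOW = "Low"


# Hypothetical condition keys; the values mirror the Severity Determination table.
SEVERITY_BY_CONDITION = {
    "service_down": Severity.CRITICAL,
    "partial_degradation": Severity.HIGH,
    "intermittent_issues": Severity.MEDIUM,
    "minor_impact": Severity.LOW,
}

# Most severe first, so the first match below is the highest severity observed.
SEVERITY_ORDER = [Severity.CRITICAL, Severity.HIGH, Severity.MEDIUM, Severity.LOW]


def determine_severity(conditions):
    """Return the most severe level among the observed conditions (assumed rule)."""
    observed = {SEVERITY_BY_CONDITION[c] for c in conditions}
    for level in SEVERITY_ORDER:
        if level in observed:
            return level
    raise ValueError("at least one condition is required")
```

For example, `determine_severity(["minor_impact", "partial_degradation"])` returns `Severity.HIGH`, because partial degradation outranks minor impact.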

Response

Immediate Actions (First 5 minutes)

  1. Acknowledge alert
  2. Verify incident is real (not false positive)
  3. Notify on-call team
  4. Start incident channel/call

Investigation Steps

  1. Check recent deployments
  2. Review error logs
  3. Check infrastructure metrics
  4. Identify affected components

Communication

| Audience | Channel | Frequency |
|----------|---------|-----------|
| Engineering | Slack #incidents | Continuous |
| Stakeholders | Email | Every 30 min |
| Users | Status page | Major updates |
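
To keep the 30-minute stakeholder cadence from slipping during an incident, the next update time can be computed from the incident start. This is an illustrative sketch: the function name `next_stakeholder_update` is hypothetical, and it assumes updates are scheduled on fixed 30-minute boundaries measured from the start of the incident.

```python
from datetime import datetime, timedelta


def next_stakeholder_update(incident_start, now, interval=timedelta(minutes=30)):
    """Return the next scheduled update time strictly after `now`.

    Assumes updates fall on fixed multiples of `interval` after `incident_start`.
    """
    elapsed = now - incident_start
    if elapsed < timedelta(0):
        raise ValueError("now must not be earlier than incident_start")
    # timedelta // timedelta yields the number of whole intervals already elapsed.
    intervals_done = elapsed // interval
    return incident_start + (intervals_done + 1) * interval
```

With an incident that started at 12:00 and a current time of 12:45, the function returns 13:00.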

Resolution

Common Fixes

Fix 1: [Common issue]

# Commands to fix

Expected outcome: [What should happen]

Fix 2: [Another common issue]

# Commands to fix

Expected outcome: [What should happen]

Rollback Procedure

  1. Identify last known good version
  2. Execute rollback
# Rollback commands
  3. Verify service restored
  4. Monitor for 15 minutes
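
The 15-minute monitoring step can be automated with a simple polling loop. The sketch below is one possible approach under stated assumptions: `check_healthy` is a hypothetical callable supplied by the team (for example, a wrapper around a health endpoint or a metrics query), and the 30-second polling interval is an arbitrary choice, not something the template prescribes.

```python
import time


def monitor_after_rollback(
    check_healthy,
    duration_s=15 * 60,   # the 15-minute window from the rollback procedure
    interval_s=30,        # assumed polling interval
    sleep=time.sleep,
    clock=time.monotonic,
):
    """Poll `check_healthy` until the window ends.

    Returns True if every check passed, False on the first failed check.
    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + duration_s
    while clock() < deadline:
        if not check_healthy():
            return False
        sleep(interval_s)
    return True
```

A `False` result suggests the rollback did not restore the service and the incident should stay open.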

Escalation Path

| Time Elapsed | Escalate To |
|--------------|-------------|
| 0-15 min | On-call engineer |
| 15-30 min | Team lead |
| 30-60 min | Engineering manager |
| 60+ min | Director/VP |
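
The escalation path above can be expressed as a lookup keyed on minutes since the incident began. This is a minimal sketch with hypothetical names (`ESCALATION_PATH`, `escalation_owner`). Because the ranges in the table share their boundary values, it assumes that a boundary minute (15, 30, 60) belongs to the later tier.

```python
# (start_minute, role) pairs taken from the Escalation Path table, in ascending order.
ESCALATION_PATH = [
    (0, "On-call engineer"),
    (15, "Team lead"),
    (30, "Engineering manager"),
    (60, "Director/VP"),
]


def escalation_owner(elapsed_minutes):
    """Return the role responsible after `elapsed_minutes` of an unresolved incident."""
    if elapsed_minutes < 0:
        raise ValueError("elapsed_minutes must be non-negative")
    owner = ESCALATION_PATH[0][1]
    for start_minute, role in ESCALATION_PATH:
        # Keep the last tier whose start time has been reached.
        if elapsed_minutes >= start_minute:
            owner = role
    return owner
```

For example, `escalation_owner(20)` returns `"Team lead"`, and `escalation_owner(90)` returns `"Director/VP"`.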

Post-Incident

Verification

  • Service fully restored
  • All alerts cleared
  • User-facing functionality verified
  • Monitoring back to normal

Documentation

  • Timeline documented
  • Root cause identified
  • Action items created
  • Post-mortem scheduled

Post-Mortem Template

## Incident Summary
- Date/Time:
- Duration:
- Impact:
- Root Cause:

## Timeline
- [Time] - Event

## What Went Well
-

## What Went Wrong
-

## Action Items
| Action | Owner | Due Date |
|--------|-------|----------|
| | | |

Contacts

| Role | Name | Contact |
|------|------|---------|
| On-call | | |
| Team Lead | | |
| Manager | | |

Revision History

| Date | Author | Changes |
|------|--------|---------|
| | | |