# Incident Playbook Template

## Incident Overview

| Field | Value |
|-------|-------|
| Playbook Name | [Name] |
| Severity | Critical / High / Medium / Low |
| Last Updated | [YYYY-MM-DD] |
| Owner | [Team/Person] |

---

## Detection

### Symptoms
- [How will you know this incident is occurring?]
- Alert: [Alert name that triggers]
- User reports: [Expected user complaints]

### Monitoring
- Dashboard: [Link to relevant dashboard]
- Logs: [Log query to investigate]
- Metrics: [Key metrics to watch]

---

## Assessment

### Impact Analysis
- Users affected: [All / Subset / Internal only]
- Data at risk: [Yes / No]
- Revenue impact: [High / Medium / Low / None]

### Severity Determination
| Condition | Severity |
|-----------|----------|
| Service completely down | Critical |
| Partial degradation | High |
| Intermittent issues | Medium |
| Minor impact | Low |
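
The severity table can be encoded as a small lookup so that alerting or paging scripts classify incidents consistently. The following is a minimal sketch, not part of the template: the function name `severity_for` and the short condition keys (`down`, `partial`, `intermittent`, `minor`) are hypothetical labels chosen for illustration.

```bash
# Map an observed condition to a severity level, mirroring the table above.
# The condition keys are hypothetical shorthand, not defined by the template.
severity_for() {
  case "$1" in
    down)         echo "Critical" ;;  # Service completely down
    partial)      echo "High" ;;      # Partial degradation
    intermittent) echo "Medium" ;;    # Intermittent issues
    minor)        echo "Low" ;;       # Minor impact
    *)            echo "Unknown"; return 1 ;;  # unrecognised key
  esac
}

severity_for partial   # prints "High"
```

Keeping the mapping in one function means the table and any automation can be updated together when the severity definitions change.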

---

## Response

### Immediate Actions (First 5 minutes)
1. [ ] Acknowledge alert
2. [ ] Verify incident is real (not false positive)
3. [ ] Notify on-call team
4. [ ] Start incident channel/call

### Investigation Steps
1. [ ] Check recent deployments
2. [ ] Review error logs
3. [ ] Check infrastructure metrics
4. [ ] Identify affected components

### Communication
| Audience | Channel | Frequency |
|----------|---------|-----------|
| Engineering | Slack #incidents | Continuous |
| Stakeholders | Email | Every 30 min |
| Users | Status page | Major updates |
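
If update reminders are automated, the cadence in the table might be expressed as a simple check. This is a sketch under assumptions: the function name `update_due` and the lowercase audience keys are hypothetical, and "Major updates" is treated as event-driven, so the timer never triggers it.

```bash
# Return 0 (true) when a timed update is due for the given audience.
# Arguments: <audience> <minutes since the last update>
update_due() {
  local audience="$1" minutes_since_last="$2"
  case "$audience" in
    engineering)  return 0 ;;                          # continuous updates
    stakeholders) [ "$minutes_since_last" -ge 30 ] ;;  # every 30 min
    users)        return 1 ;;  # event-driven: post on major updates only
    *)            echo "unknown audience: $audience" >&2; return 2 ;;
  esac
}

if update_due stakeholders 35; then
  echo "Send stakeholder email update"
fi
```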

---

## Resolution

### Common Fixes

#### Fix 1: [Common issue]
```bash
# Commands to fix
```
Expected outcome: [What should happen]

#### Fix 2: [Another common issue]
```bash
# Commands to fix
```
Expected outcome: [What should happen]

### Rollback Procedure
1. [ ] Identify last known good version
2. [ ] Execute rollback
```bash
# Rollback commands
```
3. [ ] Verify service restored
4. [ ] Monitor for 15 minutes
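
Steps 3 and 4 can be combined into a polling loop that repeats a health check for the whole monitoring window and stops at the first failure. The sketch below is illustrative only: `monitor_after_rollback` is a hypothetical helper, and the health check command is whatever suits the service (the endpoint in the usage comment is a placeholder, not a real URL).

```bash
# Poll a health check until the window elapses; fail on the first failure.
# Usage: monitor_after_rollback <window_seconds> <interval_seconds> <check command...>
monitor_after_rollback() {
  local window="$1" interval="$2" deadline
  shift 2
  deadline=$(( $(date +%s) + window ))
  while :; do
    # Run the caller-supplied check; any non-zero exit counts as unhealthy.
    if ! "$@"; then
      echo "Health check failed; rollback not verified" >&2
      return 1
    fi
    if [ "$(date +%s)" -ge "$deadline" ]; then
      break
    fi
    sleep "$interval"
  done
  echo "Service healthy for ${window}s"
}

# 15-minute window, polling every 30 seconds, against a placeholder endpoint:
# monitor_after_rollback 900 30 curl -fsS https://example.invalid/healthz
```

Passing the check as a command keeps the loop independent of how health is measured, so the same helper works with an HTTP probe, a CLI status command, or a custom script.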

### Escalation Path
| Time | Action |
|------|--------|
| 0-15 min | On-call engineer |
| 15-30 min | Team lead |
| 30-60 min | Engineering manager |
| 60+ min | Director/VP |
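
The escalation table translates directly into a lookup on elapsed time. In this sketch the function name `escalation_contact` is hypothetical, and because adjacent ranges in the table share their boundary values, each range is assumed to be half-open: at exactly 15 minutes the incident moves to the team lead.

```bash
# Print who should be engaged after <minutes> of an unresolved incident.
# Assumption: each range includes its lower bound and excludes its upper bound.
escalation_contact() {
  local minutes="$1"
  if   [ "$minutes" -lt 15 ]; then echo "On-call engineer"
  elif [ "$minutes" -lt 30 ]; then echo "Team lead"
  elif [ "$minutes" -lt 60 ]; then echo "Engineering manager"
  else                             echo "Director/VP"
  fi
}

escalation_contact 20   # prints "Team lead"
```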

---

## Post-Incident

### Verification
- [ ] Service fully restored
- [ ] All alerts cleared
- [ ] User-facing functionality verified
- [ ] Monitoring back to normal

### Documentation
- [ ] Timeline documented
- [ ] Root cause identified
- [ ] Action items created
- [ ] Post-mortem scheduled

### Post-Mortem Template
```markdown
## Incident Summary
- Date/Time:
- Duration:
- Impact:
- Root Cause:

## Timeline
- [Time] - Event

## What Went Well
-

## What Went Wrong
-

## Action Items
| Action | Owner | Due Date |
|--------|-------|----------|
| | | |
```

---

## Contacts

| Role | Name | Contact |
|------|------|---------|
| On-call | | |
| Team Lead | | |
| Manager | | |

---

## Revision History

| Date | Author | Changes |
|------|--------|---------|
| | | |