Mirror of https://github.com/azaion/gps-denied-onboard.git, synced 2026-04-23 04:46:37 +00:00
more detailed SDLC plan
# Incident Playbook Template

## Incident Overview

| Field | Value |
|-------|-------|
| Playbook Name | [Name] |
| Severity | Critical / High / Medium / Low |
| Last Updated | [YYYY-MM-DD] |
| Owner | [Team/Person] |

---

## Detection

### Symptoms
- [How will you know this incident is occurring?]
- Alert: [Alert name that triggers]
- User reports: [Expected user complaints]

### Monitoring
- Dashboard: [Link to relevant dashboard]
- Logs: [Log query to investigate]
- Metrics: [Key metrics to watch]

---

## Assessment

### Impact Analysis
- Users affected: [All / Subset / Internal only]
- Data at risk: [Yes / No]
- Revenue impact: [High / Medium / Low / None]

### Severity Determination
| Condition | Severity |
|-----------|----------|
| Service completely down | Critical |
| Partial degradation | High |
| Intermittent issues | Medium |
| Minor impact | Low |

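
The Severity Determination table can be expressed as a small lookup, which may help when wiring the playbook into alerting or triage tooling. The sketch below is illustrative only: the condition keys (`service_down` and so on) are hypothetical identifiers, and the rule that the highest matching severity wins when several conditions apply is an assumption, not something the template states.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Severity levels from the playbook, ordered so that max() picks the worst."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


# Rows of the Severity Determination table.
# The keys are hypothetical identifiers chosen for this sketch.
SEVERITY_BY_CONDITION = {
    "service_down": Severity.CRITICAL,       # Service completely down
    "partial_degradation": Severity.HIGH,    # Partial degradation
    "intermittent_issues": Severity.MEDIUM,  # Intermittent issues
    "minor_impact": Severity.LOW,            # Minor impact
}


def determine_severity(conditions):
    """Return the highest severity among the observed conditions.

    Assumption: when several conditions hold at once, the worst one decides.
    """
    if not conditions:
        raise ValueError("at least one condition is required")
    return max(SEVERITY_BY_CONDITION[c] for c in conditions)


if __name__ == "__main__":
    print(determine_severity(["minor_impact", "partial_degradation"]).name)
```
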
---

## Response

### Immediate Actions (First 5 minutes)
1. [ ] Acknowledge alert
2. [ ] Verify incident is real (not a false positive)
3. [ ] Notify on-call team
4. [ ] Start incident channel/call

### Investigation Steps
1. [ ] Check recent deployments
2. [ ] Review error logs
3. [ ] Check infrastructure metrics
4. [ ] Identify affected components

### Communication
| Audience | Channel | Frequency |
|----------|---------|-----------|
| Engineering | Slack #incidents | Continuous |
| Stakeholders | Email | Every 30 min |
| Users | Status page | Major updates |

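
The "Every 30 min" stakeholder cadence in the Communication table is easy to lose track of during an incident. A minimal sketch of a reminder calculation follows; the function names are hypothetical, and it assumes the first stakeholder email is due 30 minutes after the incident starts, which the template does not specify.

```python
from datetime import datetime, timedelta

# "Every 30 min" from the Stakeholders row of the Communication table.
STAKEHOLDER_UPDATE_INTERVAL = timedelta(minutes=30)


def next_stakeholder_update(incident_start, last_update=None):
    """Return when the next stakeholder email is due.

    Assumption: before any update has been sent, the clock starts
    at the incident start time.
    """
    reference = last_update if last_update is not None else incident_start
    return reference + STAKEHOLDER_UPDATE_INTERVAL


def is_update_overdue(incident_start, now, last_update=None):
    """True once the next stakeholder update time has been reached."""
    return now >= next_stakeholder_update(incident_start, last_update)


if __name__ == "__main__":
    start = datetime(2026, 1, 1, 12, 0)
    print(next_stakeholder_update(start))
```
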
---

## Resolution

### Common Fixes

#### Fix 1: [Common issue]
```bash
# Commands to fix
```
Expected outcome: [What should happen]

#### Fix 2: [Another common issue]
```bash
# Commands to fix
```
Expected outcome: [What should happen]

### Rollback Procedure
1. [ ] Identify last known good version
2. [ ] Execute rollback
   ```bash
   # Rollback commands
   ```
3. [ ] Verify service restored
4. [ ] Monitor for 15 minutes

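
The final rollback step, monitoring for 15 minutes, can be partly automated by polling a health check until the window elapses. This is a sketch under stated assumptions: `check_health` is a hypothetical callable supplied by the playbook owner (for example, a wrapper around a service health endpoint), and the 30-second polling interval is an arbitrary choice, not part of the template.

```python
import time


def monitor_after_rollback(check_health, duration_s=15 * 60, interval_s=30,
                           sleep=time.sleep, clock=time.monotonic):
    """Poll check_health() for duration_s seconds after a rollback.

    Returns False as soon as one check fails, True if every check
    within the window passes. `sleep` and `clock` are injectable so
    the loop can be exercised without waiting in real time.
    """
    deadline = clock() + duration_s
    while clock() < deadline:
        if not check_health():
            return False
        sleep(interval_s)
    return True
```

Returning on the first failed check is a deliberately strict choice; a team that expects brief flapping after a rollback might prefer to tolerate a small number of consecutive failures instead.
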
### Escalation Path
| Elapsed Time | Escalate To |
|--------------|-------------|
| 0-15 min | On-call engineer |
| 15-30 min | Team lead |
| 30-60 min | Engineering manager |
| 60+ min | Director/VP |

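
The Escalation Path table maps time since the incident started to the role that should be involved. A minimal lookup sketch follows. Because the table's ranges share their boundary values (15 appears in both "0-15" and "15-30"), the code assumes each tier begins at its lower bound, so exactly 15 minutes escalates to the team lead; that interpretation and the function name are assumptions.

```python
# (minutes since incident start at which the tier begins, role to involve)
ESCALATION_TIERS = [
    (0, "On-call engineer"),
    (15, "Team lead"),
    (30, "Engineering manager"),
    (60, "Director/VP"),
]


def escalation_contact(elapsed_minutes):
    """Return the role responsible after elapsed_minutes of an open incident."""
    if elapsed_minutes < 0:
        raise ValueError("elapsed_minutes must be non-negative")
    contact = ESCALATION_TIERS[0][1]
    for tier_start, role in ESCALATION_TIERS:
        # Tiers are sorted, so the last tier whose start has passed wins.
        if elapsed_minutes >= tier_start:
            contact = role
    return contact


if __name__ == "__main__":
    for minutes in (5, 15, 45, 90):
        print(minutes, escalation_contact(minutes))
```
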
---

## Post-Incident

### Verification
- [ ] Service fully restored
- [ ] All alerts cleared
- [ ] User-facing functionality verified
- [ ] Monitoring back to normal

### Documentation
- [ ] Timeline documented
- [ ] Root cause identified
- [ ] Action items created
- [ ] Post-mortem scheduled

### Post-Mortem Template
```markdown
## Incident Summary
- Date/Time:
- Duration:
- Impact:
- Root Cause:

## Timeline
- [Time] - Event

## What Went Well
-

## What Went Wrong
-

## Action Items
| Action | Owner | Due Date |
|--------|-------|----------|
| | | |
```

---

## Contacts

| Role | Name | Contact |
|------|------|---------|
| On-call | | |
| Team Lead | | |
| Manager | | |

---

## Revision History

| Date | Author | Changes |
|------|--------|---------|
| | | |