# Rollback Strategy Template

## Overview

| Field | Value |
|-------|-------|
| Service/Component | [Name] |
| Last Updated | [YYYY-MM-DD] |
| Owner | [Team/Person] |
| Max Rollback Time | [Target: X minutes] |

---

## Rollback Triggers

### Automatic Rollback Triggers

- [ ] More than 3 consecutive health check failures
- [ ] Error rate > 10% for 5 minutes
- [ ] P99 latency > 2x baseline for 5 minutes
- [ ] Critical alert triggered

### Manual Rollback Triggers

- [ ] User-reported critical bug
- [ ] Data corruption detected
- [ ] Security vulnerability discovered
- [ ] Stakeholder decision

---

## Pre-Rollback Checklist

- [ ] Incident acknowledged and documented
- [ ] Stakeholders notified of rollback decision
- [ ] Current state captured (logs, metrics snapshot)
- [ ] Rollback target version identified
- [ ] Database state assessed (are the migrations reversible?)

---

## Rollback Procedures

### Application Rollback

#### Option 1: Revert Deployment (Preferred)

```bash
# Using CI/CD
# Trigger previous successful deployment

# Manual (if needed): revert the offending commit, then push
git revert [commit-sha]
git push origin main
```

#### Option 2: Blue-Green Switch

```bash
# Switch traffic to previous version
# [Platform-specific commands]
```

#### Option 3: Feature Flag Disable

```bash
# Disable feature flag
# [Feature flag system commands]
```

### Database Rollback

#### If Migration is Reversible

```bash
# Run down migration
# [Migration tool command]
```

#### If Migration is NOT Reversible

1. [ ] Restore from backup
2. [ ] Point-in-time recovery to pre-deployment
3. [ ] **WARNING**: May cause data loss - requires approval

### Configuration Rollback

```bash
# Restore previous configuration
# [Config management commands]
```

---

## Post-Rollback Verification

### Immediate (0-5 minutes)

- [ ] Service responding to health checks
- [ ] No error spikes in logs
- [ ] Basic functionality verified

### Short-term (5-30 minutes)

- [ ] All critical paths functional
- [ ] Error rate returned to baseline
- [ ] Performance metrics normal

### Extended (30-60 minutes)

- [ ] No delayed issues appearing
- [ ] User reports resolved
- [ ] All alerts cleared

---

## Communication Plan

### During Rollback

| Audience | Message | Channel |
|----------|---------|---------|
| Engineering | "Initiating rollback due to [reason]" | Slack |
| Stakeholders | "Service issue detected, rollback in progress" | Email |
| Users | "We're aware of issues and working on a fix" | Status page |

### After Rollback

| Audience | Message | Channel |
|----------|---------|---------|
| Engineering | "Rollback complete, monitoring" | Slack |
| Stakeholders | "Service restored, post-mortem scheduled" | Email |
| Users | "Issue resolved, service fully operational" | Status page |

---

## Known Limitations

### Cannot Roll Back If:

- [ ] Database migration deleted columns with data
- [ ] External API contracts changed
- [ ] Third-party integrations updated

### Partial Rollback Scenarios

- [ ] When only specific components are affected
- [ ] When data migration is complex

---

## Recovery After Rollback

### Investigation

1. [ ] Collect all relevant logs
2. [ ] Identify root cause
3. [ ] Document findings

### Re-deployment Planning

1. [ ] Fix identified in development
2. [ ] Additional tests added
3. [ ] Staged rollout planned
4. [ ] Monitoring enhanced

---

## Rollback Testing

### Test Schedule

- [ ] Monthly rollback drill
- [ ] After major infrastructure changes
- [ ] Before critical releases

### Test Scenarios

1. Application rollback
2. Database rollback (in staging)
3. Configuration rollback

---

## Contacts

| Role | Name | Contact |
|------|------|---------|
| On-call | | |
| Database Admin | | |
| Platform Team | | |
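
---

## Example: Automatic Trigger Check

The numeric thresholds under Automatic Rollback Triggers can be sketched as a small shell helper. This is only an illustration, not a definitive implementation: the function name `should_rollback` is hypothetical, and it assumes your monitoring system already supplies the consecutive health check failure count, the error rate over the last 5 minutes (whole percent), and the P99 latency over the last 5 minutes and its baseline (milliseconds). The "Critical alert triggered" condition is left out because it depends on your alerting platform.

```bash
#!/usr/bin/env bash
# Hypothetical helper: evaluates the numeric automatic rollback triggers.
# All inputs are assumed to be pre-aggregated by the monitoring system:
#   $1  consecutive health check failures
#   $2  error rate over the last 5 minutes, in whole percent
#   $3  P99 latency over the last 5 minutes, in milliseconds
#   $4  baseline P99 latency, in milliseconds
# Returns 0 (and prints the reason) if a rollback should be triggered,
# returns 1 (and prints "ok") otherwise.
should_rollback() {
  local failures=$1 error_pct=$2 p99_ms=$3 baseline_ms=$4

  # Health check failures > 3 consecutive
  if (( failures > 3 )); then
    echo "rollback: health check failures"
    return 0
  fi

  # Error rate > 10% for 5 minutes
  if (( error_pct > 10 )); then
    echo "rollback: error rate"
    return 0
  fi

  # P99 latency > 2x baseline for 5 minutes
  if (( p99_ms > 2 * baseline_ms )); then
    echo "rollback: p99 latency"
    return 0
  fi

  echo "ok"
  return 1
}
```

A deployment script could call it as `if should_rollback "$failures" "$error_pct" "$p99_ms" "$baseline_ms"; then ...; fi` and run the chosen rollback procedure inside the branch. Integer inputs keep the comparison within Bash's built-in arithmetic; fractional thresholds would need a different tool.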