more detailed SDLC plan

Oleksandr Bezdieniezhnykh
2025-12-10 19:05:17 +02:00
parent 73cbe43397
commit fd75243a84
22 changed files with 2087 additions and 34 deletions
+92
@@ -0,0 +1,92 @@
# Definition of Done (DoD)
A feature/task is considered DONE when all applicable items are completed.
---
## Code Complete
- [ ] All acceptance criteria from the spec are implemented
- [ ] Code compiles/builds without errors
- [ ] No new linting errors or warnings
- [ ] Code follows project coding standards and conventions
- [ ] No hardcoded environment-specific values (use configuration/environment variables)
- [ ] Error handling implemented per project standards
---
## Testing Complete
- [ ] Unit tests written for new code
- [ ] Unit tests pass locally
- [ ] Integration tests written (if applicable)
- [ ] Integration tests pass
- [ ] Code coverage meets minimum threshold (75%)
- [ ] Manual testing performed for UI changes
---
## Code Review Complete
- [ ] Pull request created with proper description
- [ ] PR linked to Jira ticket
- [ ] At least one approval from reviewer
- [ ] All review comments addressed
- [ ] No merge conflicts
---
## Documentation Complete
- [ ] Code comments for complex logic (if needed)
- [ ] API documentation updated (if endpoints changed)
- [ ] README updated (if setup/usage changed)
- [ ] CHANGELOG updated with changes
---
## CI/CD Complete
- [ ] All CI pipeline stages pass
- [ ] Security scan passes (no critical/high vulnerabilities)
- [ ] Build artifacts generated successfully
---
## Deployment Ready
- [ ] Database migrations tested (if applicable)
- [ ] Configuration changes documented
- [ ] Feature flags configured (if applicable)
- [ ] Rollback plan identified
---
## Communication Complete
- [ ] Jira ticket moved to Done
- [ ] Stakeholders notified of completion (if required)
- [ ] Any blockers or follow-up items documented
---
## Quick Reference
| Category | Must Have | Nice to Have |
|----------|-----------|--------------|
| Code | Builds, No lint errors | Optimized |
| Tests | Unit + Integration pass | E2E tests |
| Coverage | >= 75% | >= 85% |
| Review | 1 approval | 2 approvals |
| Docs | CHANGELOG | Full API docs |
---
## Exceptions
If any DoD item cannot be completed, document:
1. Which item is incomplete
2. Reason for exception
3. Plan to address (with timeline)
4. Approval from tech lead
+139
@@ -0,0 +1,139 @@
# Environment Strategy Template
## Overview
Define the environment strategy for the project, including configuration, access, and deployment procedures for each environment.
---
## Environments
### Development (dev)
**Purpose**: Local development and feature testing
| Aspect | Configuration |
|--------|---------------|
| Branch | `dev`, feature branches |
| Database | Local or shared dev instance |
| External Services | Mock/sandbox endpoints |
| Logging Level | DEBUG |
| Access | All developers |
**Configuration**:
```
# .env.development
ENV=development
DATABASE_URL=<dev_database_url>
API_TIMEOUT=30
LOG_LEVEL=DEBUG
```
### Staging (stage)
**Purpose**: Pre-production testing, QA, UAT
| Aspect | Configuration |
|--------|---------------|
| Branch | `stage` |
| Database | Staging instance (production-like) |
| External Services | Sandbox/test endpoints |
| Logging Level | INFO |
| Access | Development team, QA |
**Configuration**:
```
# .env.staging
ENV=staging
DATABASE_URL=<staging_database_url>
API_TIMEOUT=15
LOG_LEVEL=INFO
```
**Deployment Trigger**: Merge to `stage` branch
### Production (prod)
**Purpose**: Live system serving end users
| Aspect | Configuration |
|--------|---------------|
| Branch | `main` |
| Database | Production instance |
| External Services | Production endpoints |
| Logging Level | WARN |
| Access | Restricted (ops team) |
**Configuration**:
```
# .env.production
ENV=production
DATABASE_URL=<production_database_url>
API_TIMEOUT=10
LOG_LEVEL=WARN
```
**Deployment Trigger**: Manual approval after staging validation
---
## Secrets Management
### Secret Categories
- Database credentials
- API keys (internal and external)
- Encryption keys
- Service account credentials
### Storage
| Environment | Secret Storage |
|-------------|----------------|
| Development | .env.local (gitignored) |
| Staging | CI/CD secrets / Vault |
| Production | CI/CD secrets / Vault |
### Rotation Policy
- Database passwords: Every 90 days
- API keys: Every 180 days or on compromise
- Encryption keys: Annually
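The rotation intervals above can be turned into an automated reminder. The following is a minimal sketch, not a prescribed implementation: `ROTATION_DAYS` and `rotation_due` are hypothetical names, "annually" is approximated as 365 days, and treating a compromise as an immediate rotation trigger for every secret type (not only API keys) is an assumption.

```python
from datetime import date, timedelta

# Intervals copied from the rotation policy; "annually" is assumed to mean 365 days.
ROTATION_DAYS = {
    "database_password": 90,
    "api_key": 180,
    "encryption_key": 365,
}


def rotation_due(kind: str, last_rotated: date, today: date, compromised: bool = False) -> bool:
    """Return True when a secret of the given kind should be rotated."""
    if compromised:
        # Assumption: any compromised secret is rotated immediately.
        return True
    return today - last_rotated >= timedelta(days=ROTATION_DAYS[kind])


# Example: a database password last rotated on 2025-01-01, checked 90 days later.
due = rotation_due("database_password", date(2025, 1, 1), date(2025, 4, 1))
```

A scheduled job could call `rotation_due` for each secret's recorded rotation date and open a task when it returns `True`.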
---
## Environment Parity
### Required Parity
- Same database engine and version
- Same runtime version
- Same dependency versions
- Same configuration structure
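The "same configuration structure" requirement can be checked mechanically. Below is a minimal sketch, assuming each environment's settings are available as dotenv-style text like the configuration examples earlier in this template; `parse_env` and `missing_keys` are hypothetical helper names, not part of any existing tool.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse dotenv-style text into a dict, skipping blank lines and comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(envs: dict[str, dict[str, str]]) -> dict[str, set[str]]:
    """For each environment, list keys that some other environment defines but it lacks."""
    all_keys = set()
    for cfg in envs.values():
        all_keys.update(cfg)
    return {name: all_keys - set(cfg) for name, cfg in envs.items()}


# Illustrative inputs: the production text deliberately omits LOG_LEVEL.
dev = parse_env("# .env.development\nENV=development\nAPI_TIMEOUT=30\nLOG_LEVEL=DEBUG\n")
prod = parse_env("# .env.production\nENV=production\nAPI_TIMEOUT=10\n")
gaps = missing_keys({"dev": dev, "prod": prod})
```

Only key names are compared, so the values remain free to differ in the ways listed under Allowed Differences.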
### Allowed Differences
- Resource scaling (CPU, memory)
- External service endpoints (sandbox vs production)
- Logging verbosity
- Feature flags
---
## Access Control
| Role | Dev | Staging | Production |
|------|-----|---------|------------|
| Developer | Full | Read + Deploy | Read logs only |
| QA | Read | Full | Read logs only |
| DevOps | Full | Full | Full |
| Stakeholder | None | Read | Read dashboards |
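If the matrix needs to be enforced in tooling, it can be encoded as data. This is a sketch under stated assumptions: the `ACCESS` dictionary and `can_deploy` function are hypothetical, and the rule that "Full" access includes the right to deploy is an interpretation of the table rather than something it states.

```python
# Access levels copied from the table above, keyed by role and environment.
ACCESS = {
    "Developer": {"dev": "Full", "staging": "Read + Deploy", "production": "Read logs only"},
    "QA": {"dev": "Read", "staging": "Full", "production": "Read logs only"},
    "DevOps": {"dev": "Full", "staging": "Full", "production": "Full"},
    "Stakeholder": {"dev": "None", "staging": "Read", "production": "Read dashboards"},
}


def can_deploy(role: str, environment: str) -> bool:
    """Return True if the role may deploy to the environment.

    Assumption: "Full" implies deploy rights; unknown roles get no access.
    """
    level = ACCESS.get(role, {}).get(environment, "None")
    return level == "Full" or "Deploy" in level
```

For example, `can_deploy("Developer", "staging")` is true, while `can_deploy("Developer", "production")` is false.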
---
## Backup & Recovery
| Environment | Backup Frequency | Retention | RTO | RPO |
|-------------|------------------|-----------|-----|-----|
| Development | None | N/A | N/A | N/A |
| Staging | Daily | 7 days | 4 hours | 24 hours |
| Production | Hourly | 30 days | 1 hour | 1 hour |
---
## Notes
- Never copy production data to lower environments without anonymization
- All environment-specific values must be externalized (no hardcoding)
- Document any environment-specific behaviors in code comments
@@ -0,0 +1,103 @@
# Feature Dependency Matrix
Track feature dependencies to ensure proper implementation order.
---
## Active Features
| Feature ID | Feature Name | Status | Dependencies | Blocks |
|------------|--------------|--------|--------------|--------|
| | | Draft/In Progress/Done | List IDs | List IDs |
---
## Dependency Rules
### Status Definitions
- **Draft**: Spec created, not started
- **In Progress**: Development started
- **Done**: Merged to dev, verified
- **Blocked**: Waiting on dependencies
### Dependency Types
- **Hard**: Cannot start without dependency complete
- **Soft**: Can mock dependency, integrate later
- **API**: Depends on API contract (can parallelize with mock)
- **Data**: Depends on data/schema (must be complete)
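The dependency types determine whether work on a feature can begin. The sketch below assumes that Hard and Data dependencies block until their status is Done, while Soft and API dependencies can be mocked; the `Dependency` class and both helper functions are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

BLOCKING_KINDS = {"Hard", "Data"}   # must be complete before work starts
MOCKABLE_KINDS = {"Soft", "API"}    # can proceed against a mock


@dataclass
class Dependency:
    name: str
    kind: str    # Hard / Soft / API / Data
    status: str  # Draft / In Progress / Done / Blocked


def blocking_dependencies(deps: list[Dependency]) -> list[str]:
    """Names of unfinished dependencies that prevent the feature from starting."""
    return [d.name for d in deps if d.kind in BLOCKING_KINDS and d.status != "Done"]


def dependencies_to_mock(deps: list[Dependency]) -> list[str]:
    """Names of unfinished dependencies that can be replaced with a mock for now."""
    return [d.name for d in deps if d.kind in MOCKABLE_KINDS and d.status != "Done"]


# Illustrative data only; these feature names are made up.
deps = [
    Dependency("Auth API", "API", "In Progress"),
    Dependency("User schema", "Data", "In Progress"),
    Dependency("Core library", "Hard", "Done"),
]
```

Here the feature is blocked by "User schema" only, and "Auth API" would need a mock and a later integration task.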
---
## Current Dependencies
### [Feature A] depends on:
| Dependency | Type | Status | Blocker? |
|------------|------|--------|----------|
| | Hard/Soft/API/Data | Done/In Progress | Yes/No |
### [Feature B] depends on:
| Dependency | Type | Status | Blocker? |
|------------|------|--------|----------|
| | | | |
---
## Dependency Graph
```
Feature A (Done)
└── Feature B (In Progress)
└── Feature D (Draft)
└── Feature C (Draft)
Feature E (Done)
└── Feature F (In Progress)
```
---
## Implementation Order
Based on dependencies, recommended implementation order:
1. **Phase 1** (No dependencies)
   - [ ] Feature X
   - [ ] Feature Y
2. **Phase 2** (Depends on Phase 1)
   - [ ] Feature Z (after X)
   - [ ] Feature W (after Y)
3. **Phase 3** (Depends on Phase 2)
   - [ ] Feature V (after Z, W)
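Phases like these can be derived from the dependency matrix rather than maintained by hand. The following is a minimal sketch of a level-by-level topological grouping; `implementation_phases` is a hypothetical function, and the input format (feature name mapped to the set of features it depends on) is an assumption.

```python
def implementation_phases(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group features into phases so each phase depends only on earlier phases."""
    remaining = {feature: set(required) for feature, required in deps.items()}
    done: set[str] = set()
    phases: list[list[str]] = []
    while remaining:
        # A feature is ready once all of its dependencies are in earlier phases.
        ready = sorted(f for f, required in remaining.items() if required <= done)
        if not ready:
            raise ValueError(f"Circular or unknown dependency among: {sorted(remaining)}")
        phases.append(ready)
        done.update(ready)
        for feature in ready:
            del remaining[feature]
    return phases


# The placeholder features from the list above.
deps = {"X": set(), "Y": set(), "Z": {"X"}, "W": {"Y"}, "V": {"Z", "W"}}
phases = implementation_phases(deps)
```

A circular dependency, or a dependency on a feature missing from the input, raises `ValueError` instead of silently producing an incomplete plan.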
---
## Handling Blocked Features
When a feature is blocked:
1. **Identify** the blocking dependency
2. **Escalate** if blocker is delayed
3. **Consider** if feature can proceed with mocks
4. **Document** any workarounds used
5. **Schedule** integration when blocker completes
---
## Mock Strategy
When using mocks for dependencies:
| Feature | Mocked Dependency | Mock Type | Integration Task |
|---------|-------------------|-----------|------------------|
| | | Interface/Data/API | Link to task |
---
## Update Log
| Date | Feature | Change | By |
|------|---------|--------|-----|
| | | Added/Updated/Completed | |
@@ -0,0 +1,129 @@
# Feature Parity Checklist
Use this checklist to ensure all functionality is preserved during refactoring.
---
## Project: [Project Name]
## Refactoring Scope: [Brief description]
## Date: [YYYY-MM-DD]
---
## Feature Inventory
### API Endpoints
| Endpoint | Method | Before | After | Verified |
|----------|--------|--------|-------|----------|
| /api/v1/example | GET | Working | | [ ] |
| | | | | [ ] |
### Core Functions
| Function/Module | Purpose | Before | After | Verified |
|-----------------|---------|--------|-------|----------|
| | | Working | | [ ] |
| | | | | [ ] |
### User Workflows
| Workflow | Steps | Before | After | Verified |
|----------|-------|--------|-------|----------|
| User login | 1. Enter credentials 2. Submit | Working | | [ ] |
| | | | | [ ] |
### Integrations
| External System | Integration Type | Before | After | Verified |
|-----------------|------------------|--------|-------|----------|
| | API/Webhook/DB | Working | | [ ] |
| | | | | [ ] |
---
## Behavioral Parity
### Input Handling
- [ ] Same inputs produce same outputs
- [ ] Error messages unchanged (or improved)
- [ ] Validation rules preserved
- [ ] Edge cases handled identically
### Output Format
- [ ] Response structure unchanged
- [ ] Data types preserved
- [ ] Null handling consistent
- [ ] Date/time formats preserved
### Side Effects
- [ ] Database writes produce same results
- [ ] File operations unchanged
- [ ] External API calls preserved
- [ ] Event emissions maintained
---
## Non-Functional Parity
### Performance
- [ ] Response times no more than 10% above baseline
- [ ] Memory usage no more than 10% above baseline
- [ ] CPU usage no more than 10% above baseline
- [ ] No new N+1 queries introduced
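The 10% tolerance in the performance checks above can be verified against captured baseline metrics. This is a sketch only: `within_baseline` and `regressions` are hypothetical helpers, and the metric names and values in the example are made up.

```python
def within_baseline(baseline: float, current: float, tolerance: float = 0.10) -> bool:
    """Return True if the current value is at most `tolerance` above the baseline."""
    return current <= baseline * (1 + tolerance)


def regressions(baseline: dict[str, float], current: dict[str, float]) -> list[str]:
    """Names of metrics whose current value exceeds the baseline by more than 10%."""
    return [name for name in baseline if not within_baseline(baseline[name], current[name])]


# Illustrative numbers only.
baseline = {"response_ms": 200.0, "memory_mb": 512.0, "cpu_percent": 40.0}
current = {"response_ms": 215.0, "memory_mb": 600.0, "cpu_percent": 41.0}
regressed = regressions(baseline, current)
```

In this example only memory usage regressed: 600 MB is above the 563.2 MB allowed by a 512 MB baseline.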
### Security
- [ ] Authentication unchanged
- [ ] Authorization rules preserved
- [ ] Input sanitization maintained
- [ ] No new vulnerabilities introduced
### Reliability
- [ ] Error handling preserved
- [ ] Retry logic maintained
- [ ] Timeout behavior unchanged
- [ ] Circuit breakers preserved
---
## Test Coverage
| Test Type | Before | After | Status |
|-----------|--------|-------|--------|
| Unit Tests | X pass | | [ ] Same or better |
| Integration Tests | X pass | | [ ] Same or better |
| E2E Tests | X pass | | [ ] Same or better |
---
## Verification Steps
### Automated Verification
1. [ ] All existing tests pass
2. [ ] No new linting errors
3. [ ] Coverage >= baseline
### Manual Verification
1. [ ] Smoke test critical paths
2. [ ] Verify UI behavior (if applicable)
3. [ ] Test error scenarios
### Stakeholder Sign-off
- [ ] QA approved
- [ ] Product owner approved (if behavior changed)
---
## Discrepancies Found
| Feature | Expected | Actual | Resolution | Status |
|---------|----------|--------|------------|--------|
| | | | | |
---
## Notes
- Any intentional behavior changes must be documented and approved
- Update this checklist as refactoring progresses
- Keep baseline metrics for comparison
+157
@@ -0,0 +1,157 @@
# Incident Playbook Template
## Incident Overview
| Field | Value |
|-------|-------|
| Playbook Name | [Name] |
| Severity | Critical / High / Medium / Low |
| Last Updated | [YYYY-MM-DD] |
| Owner | [Team/Person] |
---
## Detection
### Symptoms
- [How will you know this incident is occurring?]
- Alert: [Alert name that triggers]
- User reports: [Expected user complaints]
### Monitoring
- Dashboard: [Link to relevant dashboard]
- Logs: [Log query to investigate]
- Metrics: [Key metrics to watch]
---
## Assessment
### Impact Analysis
- Users affected: [All / Subset / Internal only]
- Data at risk: [Yes / No]
- Revenue impact: [High / Medium / Low / None]
### Severity Determination
| Condition | Severity |
|-----------|----------|
| Service completely down | Critical |
| Partial degradation | High |
| Intermittent issues | Medium |
| Minor impact | Low |
---
## Response
### Immediate Actions (First 5 minutes)
1. [ ] Acknowledge alert
2. [ ] Verify incident is real (not false positive)
3. [ ] Notify on-call team
4. [ ] Start incident channel/call
### Investigation Steps
1. [ ] Check recent deployments
2. [ ] Review error logs
3. [ ] Check infrastructure metrics
4. [ ] Identify affected components
### Communication
| Audience | Channel | Frequency |
|----------|---------|-----------|
| Engineering | Slack #incidents | Continuous |
| Stakeholders | Email | Every 30 min |
| Users | Status page | Major updates |
---
## Resolution
### Common Fixes
#### Fix 1: [Common issue]
```bash
# Commands to fix
```
Expected outcome: [What should happen]
#### Fix 2: [Another common issue]
```bash
# Commands to fix
```
Expected outcome: [What should happen]
### Rollback Procedure
1. [ ] Identify last known good version
2. [ ] Execute rollback
```bash
# Rollback commands
```
3. [ ] Verify service restored
4. [ ] Monitor for 15 minutes
### Escalation Path
| Time | Action |
|------|--------|
| 0-15 min | On-call engineer |
| 15-30 min | Team lead |
| 30-60 min | Engineering manager |
| 60+ min | Director/VP |
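The escalation path can be looked up from the time elapsed since the incident began. A minimal sketch follows; `escalation_contact` is a hypothetical function, and because the table's ranges share their boundaries (15, 30, 60), the lower bound of each range is assumed to be inclusive.

```python
# (minutes from incident start, who to escalate to), copied from the table above.
ESCALATION = [
    (0, "On-call engineer"),
    (15, "Team lead"),
    (30, "Engineering manager"),
    (60, "Director/VP"),
]


def escalation_contact(minutes_elapsed: float) -> str:
    """Return who should be engaged after the given number of minutes."""
    if minutes_elapsed < 0:
        raise ValueError("minutes_elapsed must be non-negative")
    contact = ESCALATION[0][1]
    for start, who in ESCALATION:
        # Assumption: a range's lower bound is inclusive, so 15 minutes maps to the team lead.
        if minutes_elapsed >= start:
            contact = who
    return contact
```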
---
## Post-Incident
### Verification
- [ ] Service fully restored
- [ ] All alerts cleared
- [ ] User-facing functionality verified
- [ ] Monitoring back to normal
### Documentation
- [ ] Timeline documented
- [ ] Root cause identified
- [ ] Action items created
- [ ] Post-mortem scheduled
### Post-Mortem Template
```markdown
## Incident Summary
- Date/Time:
- Duration:
- Impact:
- Root Cause:
## Timeline
- [Time] - Event
## What Went Well
-
## What Went Wrong
-
## Action Items
| Action | Owner | Due Date |
|--------|-------|----------|
| | | |
```
---
## Contacts
| Role | Name | Contact |
|------|------|---------|
| On-call | | |
| Team Lead | | |
| Manager | | |
---
## Revision History
| Date | Author | Changes |
|------|--------|---------|
| | | |
+46 -2
@@ -11,16 +11,60 @@ Jira ticket: [AZ-XXX](link)
- [ ] New feature
- [ ] Refactoring
- [ ] Documentation
- [ ] Performance improvement
- [ ] Security fix
## Checklist
- [ ] Code follows project conventions
- [ ] Self-review completed
- [ ] Tests added/updated
- [ ] All tests pass
- [ ] Code coverage maintained/improved
- [ ] Documentation updated (if needed)
- [ ] CHANGELOG updated
## Breaking Changes
<!-- List any breaking changes, or write "None" -->
- None
## API Changes
<!-- List any API changes (new endpoints, changed signatures, removed endpoints) -->
- None
## Database Changes
<!-- List any database changes (migrations, schema changes) -->
- [ ] No database changes
- [ ] Migration included and tested
- [ ] Rollback migration included
## Deployment Notes
<!-- Special considerations for deployment -->
- [ ] No special deployment steps required
- [ ] Environment variables added/changed (documented in .env.example)
- [ ] Feature flags configured
- [ ] External service dependencies
## Rollback Plan
<!-- Steps to rollback if issues arise -->
1. Revert this PR commit
2. [Additional steps if needed]
## Testing
How to test these changes:
1.
2.
3.
## Performance Impact
<!-- Note any performance implications -->
- [ ] No performance impact expected
- [ ] Performance tested (attach results if applicable)
## Security Considerations
<!-- Note any security implications -->
- [ ] No security implications
- [ ] Security review completed
- [ ] Sensitive data handling reviewed
## Screenshots (if applicable)
<!-- Add screenshots for UI changes -->
+140
@@ -0,0 +1,140 @@
# Quality Gates
Quality gates are checkpoints that must pass before proceeding to the next phase.
---
## Kickstart Tutorial Quality Gates
### Gate 1: Research Complete (after 1.40)
Before proceeding to Planning phase:
- [ ] Problem description is clear and complete
- [ ] Acceptance criteria are measurable and testable
- [ ] Restrictions are documented
- [ ] Security requirements defined
- [ ] Solution draft reviewed and finalized
- [ ] Tech stack evaluated and selected
### Gate 2: Planning Complete (after 2.40)
Before proceeding to Implementation phase:
- [ ] All components defined with clear boundaries
- [ ] Data model designed and reviewed
- [ ] API contracts defined
- [ ] Test specifications created
- [ ] Jira epics/tasks created
- [ ] Effort estimated
- [ ] Risks identified, with mitigations planned
### Gate 3: Implementation Complete (after 3.40)
Before merging to main:
- [ ] All components implemented
- [ ] Code coverage >= 75%
- [ ] All tests pass (unit, integration)
- [ ] Code review approved
- [ ] Security scan passed
- [ ] CI/CD pipeline green
- [ ] Deployment tested on staging
- [ ] Documentation complete
---
## Iterative Tutorial Quality Gates
### Gate 1: Spec Ready (after step 20)
Before creating Jira task:
- [ ] Building block clearly defines problem/goal
- [ ] Feature spec has measurable acceptance criteria
- [ ] Dependencies identified
- [ ] Complexity estimated
### Gate 2: Implementation Ready (after step 50)
Before starting development:
- [ ] Plan reviewed and approved
- [ ] Test strategy defined
- [ ] Dependencies available or mocked
### Gate 3: Merge Ready (after step 70)
Before creating PR:
- [ ] All acceptance criteria met
- [ ] Tests pass locally
- [ ] Definition of Done checklist completed
- [ ] No unresolved TODOs in code
---
## Refactoring Tutorial Quality Gates
### Gate 1: Safety Net Ready (after 4.50)
Before starting refactoring:
- [ ] Baseline metrics captured
- [ ] Current behavior documented
- [ ] Integration tests pass (>= 75% coverage)
- [ ] Feature parity checklist created
### Gate 2: Refactoring Safe (after each 4.70 cycle)
After each refactoring step:
- [ ] All existing tests still pass
- [ ] No functionality lost (feature parity check)
- [ ] Performance not degraded (compare to baseline)
### Gate 3: Refactoring Complete (after 4.95)
Before declaring refactoring done:
- [ ] All tests pass
- [ ] Performance improved or maintained
- [ ] Security review passed
- [ ] Technical debt reduced
- [ ] Documentation updated
---
## Automated Gate Checks
### CI Pipeline Gates
```yaml
gates:
  build:
    - compilation_success: true
  quality:
    - lint_errors: 0
    - code_coverage: ">= 75%"
    - code_smells: "< 10 new"
  security:
    - critical_vulnerabilities: 0
    - high_vulnerabilities: 0
  tests:
    - unit_tests_pass: true
    - integration_tests_pass: true
```
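The gate definitions above are declarative and not tied to a specific CI product. As one possible interpretation, the sketch below evaluates the same thresholds against a dictionary of pipeline metrics; `failed_gates` and the metric key names (`code_coverage_percent`, `new_code_smells`) are assumptions, not fields of any real CI system.

```python
def failed_gates(metrics: dict) -> list[str]:
    """Return the names of CI gate checks that fail for the given metrics."""
    checks = {
        "compilation_success": metrics["compilation_success"] is True,
        "lint_errors": metrics["lint_errors"] == 0,
        "code_coverage": metrics["code_coverage_percent"] >= 75,
        "code_smells": metrics["new_code_smells"] < 10,
        "critical_vulnerabilities": metrics["critical_vulnerabilities"] == 0,
        "high_vulnerabilities": metrics["high_vulnerabilities"] == 0,
        "unit_tests_pass": metrics["unit_tests_pass"] is True,
        "integration_tests_pass": metrics["integration_tests_pass"] is True,
    }
    return [name for name, passed in checks.items() if not passed]


# Illustrative metrics for a passing build and for a failing one.
passing = {
    "compilation_success": True,
    "lint_errors": 0,
    "code_coverage_percent": 81.2,
    "new_code_smells": 3,
    "critical_vulnerabilities": 0,
    "high_vulnerabilities": 0,
    "unit_tests_pass": True,
    "integration_tests_pass": True,
}
failing = dict(passing, code_coverage_percent=71.5, high_vulnerabilities=2)
```

A pipeline step could fail the build whenever `failed_gates` returns a non-empty list and print the names for the Identify step of gate failure handling.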
### Manual Gate Checks
Some gates require human verification:
- Architecture review
- Security review
- UX review (for UI changes)
- Stakeholder sign-off
---
## Gate Failure Handling
When a gate fails:
1. **Stop** - Do not proceed to next phase
2. **Identify** - Determine which checks failed
3. **Fix** - Address the failures
4. **Re-verify** - Run gate checks again
5. **Document** - If exception needed, get approval and document reason
---
## Exception Process
If a gate must be bypassed:
1. Document the reason
2. Get tech lead approval
3. Create follow-up task to address
4. Set deadline for resolution
5. Add to risk register
+173
@@ -0,0 +1,173 @@
# Rollback Strategy Template
## Overview
| Field | Value |
|-------|-------|
| Service/Component | [Name] |
| Last Updated | [YYYY-MM-DD] |
| Owner | [Team/Person] |
| Max Rollback Time | [Target: X minutes] |
---
## Rollback Triggers
### Automatic Rollback Triggers
- [ ] More than 3 consecutive health check failures
- [ ] Error rate > 10% for 5 minutes
- [ ] P99 latency > 2x baseline for 5 minutes
- [ ] Critical alert triggered
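The automatic triggers can be expressed as a single decision function. This is a sketch under stated assumptions: the `HealthSnapshot` fields are hypothetical, and the caller is assumed to supply the error rate and P99 latency already aggregated over the 5-minute window rather than as raw samples.

```python
from dataclasses import dataclass


@dataclass
class HealthSnapshot:
    consecutive_health_check_failures: int
    error_rate_5m: float       # fraction of failed requests over the last 5 minutes
    p99_latency_5m_ms: float   # P99 latency over the last 5 minutes
    baseline_p99_ms: float     # P99 latency before the deployment
    critical_alert: bool


def rollback_reasons(s: HealthSnapshot) -> list[str]:
    """Return the automatic rollback triggers that currently fire (empty if none)."""
    reasons = []
    if s.consecutive_health_check_failures > 3:
        reasons.append("health checks failing")
    if s.error_rate_5m > 0.10:
        reasons.append("error rate above 10%")
    if s.p99_latency_5m_ms > 2 * s.baseline_p99_ms:
        reasons.append("P99 latency above 2x baseline")
    if s.critical_alert:
        reasons.append("critical alert")
    return reasons


# Illustrative snapshots only.
healthy = HealthSnapshot(0, 0.01, 180.0, 150.0, False)
degraded = HealthSnapshot(4, 0.02, 400.0, 150.0, False)
```

Returning the reasons rather than a bare boolean gives the rollback announcement something concrete to report.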
### Manual Rollback Triggers
- [ ] User-reported critical bug
- [ ] Data corruption detected
- [ ] Security vulnerability discovered
- [ ] Stakeholder decision
---
## Pre-Rollback Checklist
- [ ] Incident acknowledged and documented
- [ ] Stakeholders notified of rollback decision
- [ ] Current state captured (logs, metrics snapshot)
- [ ] Rollback target version identified
- [ ] Database state assessed (migrations reversible?)
---
## Rollback Procedures
### Application Rollback
#### Option 1: Revert Deployment (Preferred)
```bash
# Using CI/CD
# Trigger previous successful deployment
# Manual (if needed)
git revert <commit-hash>
git push origin main
```
#### Option 2: Blue-Green Switch
```bash
# Switch traffic to previous version
# [Platform-specific commands]
```
#### Option 3: Feature Flag Disable
```bash
# Disable feature flag
# [Feature flag system commands]
```
### Database Rollback
#### If Migration is Reversible
```bash
# Run down migration
# [Migration tool command]
```
#### If Migration is NOT Reversible
1. [ ] Restore from backup
2. [ ] Point-in-time recovery to pre-deployment
3. [ ] Obtain approval before restoring (**WARNING**: a restore may cause data loss)
### Configuration Rollback
```bash
# Restore previous configuration
# [Config management commands]
```
---
## Post-Rollback Verification
### Immediate (0-5 minutes)
- [ ] Service responding to health checks
- [ ] No error spikes in logs
- [ ] Basic functionality verified
### Short-term (5-30 minutes)
- [ ] All critical paths functional
- [ ] Error rate returned to baseline
- [ ] Performance metrics normal
### Extended (30-60 minutes)
- [ ] No delayed issues appearing
- [ ] User reports resolved
- [ ] All alerts cleared
---
## Communication Plan
### During Rollback
| Audience | Message | Channel |
|----------|---------|---------|
| Engineering | "Initiating rollback due to [reason]" | Slack |
| Stakeholders | "Service issue detected, rollback in progress" | Email |
| Users | "We're aware of issues and working on a fix" | Status page |
### After Rollback
| Audience | Message | Channel |
|----------|---------|---------|
| Engineering | "Rollback complete, monitoring" | Slack |
| Stakeholders | "Service restored, post-mortem scheduled" | Email |
| Users | "Issue resolved, service fully operational" | Status page |
---
## Known Limitations
### Cannot Rollback If:
- [ ] Database migration deleted columns with data
- [ ] External API contracts changed
- [ ] Third-party integrations updated
### Partial Rollback Scenarios
- [ ] When only specific components affected
- [ ] When data migration is complex
---
## Recovery After Rollback
### Investigation
1. [ ] Collect all relevant logs
2. [ ] Identify root cause
3. [ ] Document findings
### Re-deployment Planning
1. [ ] Fix identified in development
2. [ ] Additional tests added
3. [ ] Staged rollout planned
4. [ ] Monitoring enhanced
---
## Rollback Testing
### Test Schedule
- [ ] Monthly rollback drill
- [ ] After major infrastructure changes
- [ ] Before critical releases
### Test Scenarios
1. Application rollback
2. Database rollback (in staging)
3. Configuration rollback
---
## Contacts
| Role | Name | Contact |
|------|------|---------|
| On-call | | |
| Database Admin | | |
| Platform Team | | |