more detailed SDLC plan

Oleksandr Bezdieniezhnykh
2025-12-10 19:05:17 +02:00
parent 73cbe43397
commit fd75243a84
22 changed files with 2087 additions and 34 deletions
@@ -22,10 +22,21 @@
- helpers - empty implementations or interfaces
- Add .gitignore appropriate for the project's language/framework
- Add .env.example with required environment variables
- Add CI/CD skeleton (GitHub Actions, GitLab CI, or appropriate)
- Configure CI/CD pipeline with full stages:
  - Build stage
  - Lint/Static analysis stage
  - Unit tests stage
  - Integration tests stage
  - Security scan stage (SAST/dependency check)
  - Deploy to staging stage (triggered on merge to stage branch)
- Define environment strategy based on `@_docs/00_templates/environment_strategy.md`:
  - Development environment configuration
  - Staging environment configuration
  - Production environment configuration (if applicable)
- Add database migration setup if applicable
- Add README.md describing the project, based on `@_docs/01_solution/solution.md`
- Create a separate folder for the integration tests (not a separate repo)
- Document recommended branch protection rules
## Example
The structure should look roughly like this:
@@ -1,42 +1,64 @@
# CI/CD Setup
# CI/CD Pipeline Validation & Enhancement
## Initial data:
- Problem description: `@_docs/00_problem/problem_description.md`.
- Restrictions: `@_docs/00_problem/restrictions.md`.
- Problem description: `@_docs/00_problem/problem_description.md`
- Restrictions: `@_docs/00_problem/restrictions.md`
- Full Solution Description: `@_docs/01_solution/solution.md`
- Components: `@_docs/02_components`
- Environment Strategy: `@_docs/00_templates/environment_strategy.md`
## Role
You are a DevOps engineer
## Task
- Review project structure and dependencies
- Configure CI/CD pipeline with stages:
  - Build
  - Lint
  - Unit tests
  - Integration tests
  - Security scan (if applicable)
  - Deploy to staging (if applicable)
- Configure environment variables handling
- Set up test reporting
- Configure branch protection rules recommendations
- Review existing CI/CD pipeline configuration
- Validate all stages are working correctly
- Optimize pipeline performance (parallelization, caching)
- Ensure test coverage gates are enforced
- Verify security scanning is properly configured
- Add missing quality gates
## Checklist
### Pipeline Health
- [ ] All stages execute successfully
- [ ] Build time is acceptable (<10 min for most projects)
- [ ] Caching is properly configured (dependencies, build artifacts)
- [ ] Parallel execution where possible
### Quality Gates
- [ ] Code coverage threshold enforced (minimum 75%)
- [ ] Linting errors block merge
- [ ] Security vulnerabilities block merge (critical/high)
- [ ] All tests must pass
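
The quality gates above can be expressed as a single merge check. The following is a minimal sketch, not a prescribed implementation: the function name `blocking_reasons` and its parameters are hypothetical, and a real pipeline would read these values from the coverage, lint, and security scan reports of the chosen CI/CD tool.

```python
from typing import Iterable, List

# Severities that block a merge, per the checklist (critical/high).
BLOCKING_SEVERITIES = {"critical", "high"}


def blocking_reasons(
    coverage_pct: float,
    lint_errors: int,
    vulnerability_severities: Iterable[str],
    tests_passed: bool,
    min_coverage: float = 75.0,  # minimum coverage from the checklist
) -> List[str]:
    """Return the reasons a merge should be blocked (empty list = all gates pass)."""
    reasons: List[str] = []
    if coverage_pct < min_coverage:
        reasons.append(f"coverage {coverage_pct:.1f}% is below {min_coverage:.1f}%")
    if lint_errors > 0:
        reasons.append(f"{lint_errors} linting error(s)")
    blocking = sorted(
        s.lower() for s in vulnerability_severities if s.lower() in BLOCKING_SEVERITIES
    )
    if blocking:
        reasons.append("blocking vulnerabilities: " + ", ".join(blocking))
    if not tests_passed:
        reasons.append("test failures")
    return reasons
```

A pipeline step could call this function and exit with a non-zero status when the returned list is not empty, so the merge is blocked and the reasons appear in the job log.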
### Environment Deployments
- [ ] Staging deployment works on merge to stage branch
- [ ] Environment variables properly configured per environment
- [ ] Secrets are securely managed (not in code)
- [ ] Rollback procedure documented
### Monitoring
- [ ] Build notifications configured (Slack, email, etc.)
- [ ] Failed build alerts
- [ ] Deployment success/failure notifications
## Output
### Pipeline Configuration
- Pipeline file(s) created/updated
- Stages description
- Triggers (on push, PR, etc.)
### Environment Setup
- Required secrets/variables
- Environment-specific configs
### Pipeline Status Report
- Current pipeline configuration summary
- Issues found and fixes applied
- Performance metrics (build times)
### Deployment Strategy
- Staging deployment steps
- Production deployment steps (if applicable)
### Recommended Improvements
- Short-term improvements
- Long-term optimizations
### Quality Gate Configuration
- Thresholds configured
- Enforcement rules
## Notes
- Use project-appropriate CI/CD tool (GitHub Actions, GitLab CI, Azure DevOps, etc.)
- Keep pipeline fast - parallelize where possible
- Do not break existing functionality
- Test changes in separate branch first
- Document any manual steps required
@@ -0,0 +1,72 @@
# Deployment Strategy Planning
## Initial data:
- Problem description: `@_docs/00_problem/problem_description.md`
- Restrictions: `@_docs/00_problem/restrictions.md`
- Full Solution Description: `@_docs/01_solution/solution.md`
- Components: `@_docs/02_components`
- Environment Strategy: `@_docs/00_templates/environment_strategy.md`
## Role
You are a DevOps/Platform engineer
## Task
- Define deployment strategy for each environment
- Plan deployment procedures and automation
- Define rollback procedures
- Establish deployment verification steps
- Document manual intervention points
## Output
### Deployment Architecture
- Infrastructure diagram (where components run)
- Network topology
- Load balancing strategy
- Container/VM configuration
### Deployment Procedures
#### Staging Deployment
- Trigger conditions
- Pre-deployment checks
- Deployment steps
- Post-deployment verification
- Smoke tests to run
#### Production Deployment
- Approval workflow
- Deployment window
- Pre-deployment checks
- Deployment steps (blue-green, rolling, canary)
- Post-deployment verification
- Smoke tests to run
### Rollback Procedures
- Rollback trigger criteria
- Rollback steps per environment
- Data rollback considerations
- Communication plan during rollback
### Health Checks
- Liveness probe configuration
- Readiness probe configuration
- Custom health endpoints
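
The difference between the two probes can be sketched as follows. This is an illustrative example under stated assumptions: the function names, the status payloads, and the dependency check names are hypothetical, and the wiring to an actual HTTP endpoint depends on the framework in use.

```python
from typing import Callable, Dict, Tuple

# A dependency check returns True when the dependency is usable.
Check = Callable[[], bool]


def liveness() -> Tuple[int, Dict[str, str]]:
    """Liveness: the process is running and able to answer at all."""
    return 200, {"status": "alive"}


def readiness(checks: Dict[str, Check]) -> Tuple[int, Dict[str, object]]:
    """Readiness: every dependency check passes, so traffic can be routed here."""
    results: Dict[str, bool] = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            # A check that raises is treated as a failed dependency.
            results[name] = False
    ready = all(results.values())
    status = "ready" if ready else "not_ready"
    return (200 if ready else 503), {"status": status, "checks": results}
```

Keeping liveness free of dependency checks is a deliberate choice in this sketch: a failing database should take the instance out of rotation (readiness) rather than trigger a restart (liveness).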
### Deployment Checklist
- [ ] All tests pass in CI
- [ ] Security scan clean
- [ ] Database migrations reviewed
- [ ] Feature flags configured
- [ ] Monitoring alerts configured
- [ ] Rollback plan documented
- [ ] Stakeholders notified
Store output to `_docs/02_components/deployment_strategy.md`
## Notes
- Prefer automated deployments over manual
- Zero-downtime deployments for production
- Always have a rollback plan
- Ask questions about infrastructure constraints
@@ -0,0 +1,123 @@
# Observability Planning
## Initial data:
- Problem description: `@_docs/00_problem/problem_description.md`
- Full Solution Description: `@_docs/01_solution/solution.md`
- Components: `@_docs/02_components`
- Deployment Strategy: `@_docs/02_components/deployment_strategy.md`
## Role
You are a Site Reliability Engineer (SRE)
## Task
- Define logging strategy across all components
- Plan metrics collection and dashboards
- Design distributed tracing (if applicable)
- Establish alerting rules
- Document incident response procedures
## Output
### Logging Strategy
#### Log Levels
| Level | Usage | Example |
|-------|-------|---------|
| ERROR | Exceptions, failures requiring attention | Database connection failed |
| WARN | Potential issues, degraded performance | Retry attempt 2/3 |
| INFO | Significant business events | User registered, Order placed |
| DEBUG | Detailed diagnostic information | Request payload, Query params |
#### Log Format
```json
{
  "timestamp": "ISO8601",
  "level": "INFO",
  "service": "service-name",
  "correlation_id": "uuid",
  "message": "Event description",
  "context": {}
}
```
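
One way to produce this format is a custom formatter for Python's standard `logging` module. The sketch below is a minimal example, not a required implementation: the class name `JsonFormatter`, the helper `build_logger`, and the use of `extra` to pass `correlation_id` and `context` are assumptions made for illustration.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object with the fields listed above."""

    def __init__(self, service: str) -> None:
        super().__init__()
        self.service = service  # hypothetical service name

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            # ISO 8601 timestamp in UTC, taken from the record creation time
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "service": self.service,
            # Both fields are optional and supplied through `extra=...`
            "correlation_id": getattr(record, "correlation_id", None),
            "message": record.getMessage(),
            "context": getattr(record, "context", {}),
        }
        return json.dumps(payload)


def build_logger(service: str) -> logging.Logger:
    """Create a logger that writes JSON lines to the console."""
    logger = logging.getLogger(service)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter(service))
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

A call such as `build_logger("service-name").info("User registered", extra={"correlation_id": "...", "context": {"user_id": 42}})` would then emit one JSON line per event. Values placed in `context` must be JSON-serializable and, per the notes in this document, must not contain PII.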
#### Log Storage
- Development: Console/file
- Staging: Centralized (ELK, CloudWatch, etc.)
- Production: Centralized with retention policy
### Metrics
#### System Metrics
- CPU usage
- Memory usage
- Disk I/O
- Network I/O
#### Application Metrics
| Metric | Type | Description |
|--------|------|-------------|
| request_count | Counter | Total requests |
| request_duration | Histogram | Response time |
| error_count | Counter | Failed requests |
| active_connections | Gauge | Current connections |
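
To make the three metric types concrete, here is a small in-memory sketch. It is illustrative only: the class names and the bucket boundaries are assumptions, and a real service would normally use the client library of its metrics backend instead of hand-written classes.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Counter:
    """Monotonically increasing value, e.g. request_count or error_count."""
    value: int = 0

    def inc(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a counter can only increase")
        self.value += amount


@dataclass
class Gauge:
    """Value that can go up and down, e.g. active_connections."""
    value: float = 0.0

    def set(self, value: float) -> None:
        self.value = value


@dataclass
class Histogram:
    """Distribution of observations, e.g. request_duration in seconds."""
    bounds: Tuple[float, ...] = (0.1, 0.5, 1.0, 5.0)  # assumed bucket upper bounds
    counts: List[int] = field(default_factory=list)
    total: float = 0.0

    def __post_init__(self) -> None:
        # One bucket per upper bound plus one overflow bucket.
        self.counts = [0] * (len(self.bounds) + 1)

    def observe(self, value: float) -> None:
        # bisect_left picks the first bucket whose upper bound is >= value.
        self.counts[bisect_left(self.bounds, value)] += 1
        self.total += value
```

For example, a request handler would call `request_count.inc()` and `request_duration.observe(elapsed)` on every request, and `error_count.inc()` only on failures.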
#### Business Metrics
- [Define based on acceptance criteria]
### Distributed Tracing
#### Trace Context
- Correlation ID propagation
- Span naming conventions
- Sampling strategy
#### Integration Points
- HTTP headers
- Message queue metadata
- Database query tagging
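
Correlation ID propagation over HTTP headers can be sketched as follows. The header name `X-Correlation-ID` and the function names are assumptions chosen for illustration; the same idea applies to message queue metadata.

```python
import uuid
from contextvars import ContextVar
from typing import Dict, Mapping, Optional

CORRELATION_HEADER = "X-Correlation-ID"  # assumed header name

# Holds the correlation ID for the request currently being handled.
_correlation_id: ContextVar[str] = ContextVar("correlation_id", default="")


def accept_request(headers: Mapping[str, str]) -> str:
    """Reuse the caller's correlation ID, or start a new one at the edge."""
    incoming = {key.lower(): value for key, value in headers.items()}
    correlation_id = incoming.get(CORRELATION_HEADER.lower()) or str(uuid.uuid4())
    _correlation_id.set(correlation_id)
    return correlation_id


def outgoing_headers(extra: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build headers for a downstream call, carrying the current correlation ID."""
    headers = dict(extra or {})
    headers[CORRELATION_HEADER] = _correlation_id.get()
    return headers
```

Storing the ID in a `ContextVar` keeps it isolated per request in threaded and asyncio code, so log statements and downstream calls can read it without passing it through every function signature.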
### Alerting
#### Alert Categories
| Severity | Response Time | Examples |
|----------|---------------|----------|
| Critical | 5 min | Service down, Data loss |
| High | 30 min | High error rate, Performance degradation |
| Medium | 4 hours | Elevated latency, Disk usage high |
| Low | Next business day | Non-critical warnings |
#### Alert Rules
```yaml
alerts:
  - name: high_error_rate
    condition: error_rate > 5%
    duration: 5m
    severity: high
  - name: service_down
    condition: health_check_failed
    duration: 1m
    severity: critical
```
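
The `duration` field means the condition must hold continuously for that long before the alert fires. The sketch below shows that behaviour for a simple threshold rule. It is an assumption-laden illustration: the `AlertRule` class and its fields are hypothetical and do not correspond to the rule syntax of any particular alerting tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertRule:
    name: str
    threshold: float   # alert when the observed value is above this
    duration_s: float  # how long the breach must last, in seconds
    severity: str
    breached_since: Optional[float] = None  # time the current breach started

    def evaluate(self, value: float, now: float) -> bool:
        """Return True when the value has stayed above the threshold for duration_s."""
        if value <= self.threshold:
            # Condition cleared: reset the timer.
            self.breached_since = None
            return False
        if self.breached_since is None:
            self.breached_since = now
        return now - self.breached_since >= self.duration_s


# Mirrors the high_error_rate rule: error rate above 5% for 5 minutes.
high_error_rate = AlertRule("high_error_rate", threshold=0.05, duration_s=300, severity="high")
```

Resetting the timer as soon as the value drops back under the threshold avoids alerting on short spikes, which is the purpose of the `duration` setting.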
### Dashboards
#### Operations Dashboard
- Service health status
- Request rate and error rate
- Response time percentiles
- Resource utilization
#### Business Dashboard
- Key business metrics
- User activity
- Transaction volumes
Store output to `_docs/02_components/observability_plan.md`
## Notes
- Follow the principle: "If it's not monitored, it's not in production"
- Balance verbosity with cost
- Ensure PII is not logged
- Plan for log rotation and retention