more detailed SDLC plan

2026-04-22 22:06:37 +00:00 · 2025-12-10 19:05:17 +02:00
parent 73cbe43397
commit fd75243a84
22 changed files with 2087 additions and 34 deletions
@@ -22,10 +22,21 @@
   - helpers - empty implementations or interfaces
 - Add .gitignore appropriate for the project's language/framework
 - Add .env.example with required environment variables
- - Add CI/CD skeleton (GitHub Actions, GitLab CI, or appropriate)
+ - Configure CI/CD pipeline with full stages:
+   - Build stage
+   - Lint/Static analysis stage
+   - Unit tests stage
+   - Integration tests stage
+   - Security scan stage (SAST/dependency check)
+   - Deploy to staging stage (triggered on merge to stage branch)
+ - Define environment strategy based on `@_docs/00_templates/environment_strategy.md`:
+   - Development environment configuration
+   - Staging environment configuration
+   - Production environment configuration (if applicable)
 - Add database migration setup if applicable
 - Add README.md describing the project, based on `@_docs/01_solution/solution.md`
 - Create a separate folder for the integration tests (not a separate repo)
+ - Provide branch protection rule recommendations

 ## Example
 The structure should roughly look like this:
@@ -1,42 +1,64 @@
-# CI/CD Setup
+# CI/CD Pipeline Validation & Enhancement

 ## Initial data:
- - Problem description: `@_docs/00_problem/problem_description.md`.
- - Restrictions: `@_docs/00_problem/restrictions.md`.
+ - Problem description: `@_docs/00_problem/problem_description.md`
+ - Restrictions: `@_docs/00_problem/restrictions.md`
 - Full Solution Description: `@_docs/01_solution/solution.md`
 - Components: `@_docs/02_components`
+ - Environment Strategy: `@_docs/00_templates/environment_strategy.md`

 ## Role
  You are a DevOps engineer

 ## Task
- - Review project structure and dependencies
- - Configure CI/CD pipeline with stages:
-   - Build
-   - Lint
-   - Unit tests
-   - Integration tests
-   - Security scan (if applicable)
-   - Deploy to staging (if applicable)
- - Configure environment variables handling
- - Set up test reporting
- - Configure branch protection rules recommendations
+ - Review existing CI/CD pipeline configuration
+ - Validate all stages are working correctly
+ - Optimize pipeline performance (parallelization, caching)
+ - Ensure test coverage gates are enforced
+ - Verify security scanning is properly configured
+ - Add missing quality gates
+
+## Checklist
+
+### Pipeline Health
+ - [ ] All stages execute successfully
+ - [ ] Build time is acceptable (<10 min for most projects)
+ - [ ] Caching is properly configured (dependencies, build artifacts)
+ - [ ] Independent stages run in parallel where possible
+
+### Quality Gates
+ - [ ] Code coverage threshold enforced (minimum 75%)
+ - [ ] Linting errors block merge
+ - [ ] Security vulnerabilities block merge (critical/high)
+ - [ ] All tests must pass
+
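The coverage gate in the checklist above could be enforced by a small check in a CI step. This is a minimal sketch, assuming the pipeline can supply covered and total line counts from whatever coverage tool the project uses. The function name `coverage_gate` and its inputs are hypothetical; only the 75% threshold comes from the checklist.

```python
MIN_COVERAGE = 75.0  # threshold taken from the Quality Gates checklist


def coverage_gate(covered_lines: int, total_lines: int,
                  minimum: float = MIN_COVERAGE) -> tuple[bool, float]:
    """Return (passed, percent) for one coverage measurement."""
    if total_lines <= 0:
        # No measurable code: fail so the gate cannot pass silently.
        return False, 0.0
    percent = 100.0 * covered_lines / total_lines
    return percent >= minimum, percent
```

A CI job would typically exit non-zero when `passed` is false, which is what makes the gate block a merge.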
+### Environment Deployments
+ - [ ] Staging deployment works on merge to stage branch
+ - [ ] Environment variables properly configured per environment
+ - [ ] Secrets are securely managed (not in code)
+ - [ ] Rollback procedure documented
+
+### Monitoring
+ - [ ] Build notifications configured (Slack, email, etc.)
+ - [ ] Failed build alerts configured
+ - [ ] Deployment success/failure notifications

 ## Output
- ### Pipeline Configuration
-  - Pipeline file(s) created/updated
-  - Stages description
-  - Triggers (on push, PR, etc.)

- ### Environment Setup
-  - Required secrets/variables
-  - Environment-specific configs
+### Pipeline Status Report
+ - Current pipeline configuration summary
+ - Issues found and fixes applied
+ - Performance metrics (build times)

- ### Deployment Strategy
-  - Staging deployment steps
-  - Production deployment steps (if applicable)
+### Recommended Improvements
+ - Short-term improvements
+ - Long-term optimizations
+
+### Quality Gate Configuration
+ - Thresholds configured
+ - Enforcement rules

 ## Notes
- - Use project-appropriate CI/CD tool (GitHub Actions, GitLab CI, Azure DevOps, etc.)
- - Keep pipeline fast - parallelize where possible
-
+ - Do not break existing functionality
+ - Test changes in separate branch first
+ - Document any manual steps required
@@ -0,0 +1,72 @@
+# Deployment Strategy Planning
+
+## Initial data:
+ - Problem description: `@_docs/00_problem/problem_description.md`
+ - Restrictions: `@_docs/00_problem/restrictions.md`
+ - Full Solution Description: `@_docs/01_solution/solution.md`
+ - Components: `@_docs/02_components`
+ - Environment Strategy: `@_docs/00_templates/environment_strategy.md`
+
+## Role
+  You are a DevOps/Platform engineer
+
+## Task
+ - Define deployment strategy for each environment
+ - Plan deployment procedures and automation
+ - Define rollback procedures
+ - Establish deployment verification steps
+ - Document manual intervention points
+
+## Output
+
+### Deployment Architecture
+ - Infrastructure diagram (where components run)
+ - Network topology
+ - Load balancing strategy
+ - Container/VM configuration
+
+### Deployment Procedures
+
+#### Staging Deployment
+ - Trigger conditions
+ - Pre-deployment checks
+ - Deployment steps
+ - Post-deployment verification
+ - Smoke tests to run
+
+#### Production Deployment
+ - Approval workflow
+ - Deployment window
+ - Pre-deployment checks
+ - Deployment steps (blue-green, rolling, canary)
+ - Post-deployment verification
+ - Smoke tests to run
+
+### Rollback Procedures
+ - Rollback trigger criteria
+ - Rollback steps per environment
+ - Data rollback considerations
+ - Communication plan during rollback
+
+### Health Checks
+ - Liveness probe configuration
+ - Readiness probe configuration
+ - Custom health endpoints
+
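The difference between the two probes listed above can be illustrated with a framework-independent sketch. It assumes each dependency exposes a zero-argument check returning a boolean; the function names, status strings, and response shape are hypothetical and not prescribed by the template.

```python
from typing import Callable, Dict, Tuple


def liveness() -> Tuple[int, dict]:
    """Liveness: the process is up and able to answer at all."""
    return 200, {"status": "alive"}


def readiness(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, dict]:
    """Readiness: every dependency check passes, so traffic can be routed here."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            # A check that raises counts as a failed dependency.
            results[name] = False
    ready = all(results.values())
    status = "ready" if ready else "not_ready"
    return (200 if ready else 503), {"status": status, "checks": results}
```

Keeping liveness free of dependency checks is a deliberate choice in this sketch: a failing database should take the instance out of rotation (readiness) rather than cause it to be restarted (liveness).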
+### Deployment Checklist
+ - [ ] All tests pass in CI
+ - [ ] Security scan clean
+ - [ ] Database migrations reviewed
+ - [ ] Feature flags configured
+ - [ ] Monitoring alerts configured
+ - [ ] Rollback plan documented
+ - [ ] Stakeholders notified
+
+Store the output in `_docs/02_components/deployment_strategy.md`
+
+## Notes
+ - Prefer automated deployments over manual
+ - Zero-downtime deployments for production
+ - Always have a rollback plan
+ - Ask questions about infrastructure constraints
+
@@ -0,0 +1,123 @@
+# Observability Planning
+
+## Initial data:
+ - Problem description: `@_docs/00_problem/problem_description.md`
+ - Full Solution Description: `@_docs/01_solution/solution.md`
+ - Components: `@_docs/02_components`
+ - Deployment Strategy: `@_docs/02_components/deployment_strategy.md`
+
+## Role
+  You are a Site Reliability Engineer (SRE)
+
+## Task
+ - Define logging strategy across all components
+ - Plan metrics collection and dashboards
+ - Design distributed tracing (if applicable)
+ - Establish alerting rules
+ - Document incident response procedures
+
+## Output
+
+### Logging Strategy
+
+#### Log Levels
+| Level | Usage | Example |
+|-------|-------|---------|
+| ERROR | Exceptions, failures requiring attention | Database connection failed |
+| WARN | Potential issues, degraded performance | Retry attempt 2/3 |
+| INFO | Significant business events | User registered, Order placed |
+| DEBUG | Detailed diagnostic information | Request payload, Query params |
+
+#### Log Format
+```json
+{
+  "timestamp": "ISO8601",
+  "level": "INFO",
+  "service": "service-name",
+  "correlation_id": "uuid",
+  "message": "Event description",
+  "context": {}
+}
+```
+
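One way to emit the log format above is a custom formatter on Python's standard `logging` module. This is a sketch under stated assumptions: the class name `JsonFormatter` and helper `build_logger` are hypothetical, and `correlation_id` and `context` are assumed to arrive through the `extra=` argument of the logging call.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object with the fields of the log format."""

    def __init__(self, service: str) -> None:
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "service": self.service,
            # Both fields are optional; they are set via `extra=` when present.
            "correlation_id": getattr(record, "correlation_id", None),
            "message": record.getMessage(),
            "context": getattr(record, "context", {}),
        }
        return json.dumps(entry)


def build_logger(service: str) -> logging.Logger:
    """Return a logger that writes JSON lines to the console."""
    logger = logging.getLogger(service)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter(service))
        logger.addHandler(handler)
    return logger
```

A call such as `build_logger("orders").info("Order placed", extra={"correlation_id": cid, "context": {"order_id": 42}})` would then produce a single JSON line per event.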
+#### Log Storage
+- Development: Console/file
+- Staging: Centralized (ELK, CloudWatch, etc.)
+- Production: Centralized with retention policy
+
+### Metrics
+
+#### System Metrics
+- CPU usage
+- Memory usage
+- Disk I/O
+- Network I/O
+
+#### Application Metrics
+| Metric | Type | Description |
+|--------|------|-------------|
+| request_count | Counter | Total requests |
+| request_duration | Histogram | Response time |
+| error_count | Counter | Failed requests |
+| active_connections | Gauge | Current connections |
+
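The three metric types in the table behave differently, which the following in-process sketch tries to show. It is an illustration only, not a replacement for a real metrics client; the class name `Metrics` and the histogram bucket bounds are assumptions.

```python
import bisect
from collections import defaultdict


class Metrics:
    """In-process registry covering the Counter, Gauge, and Histogram types."""

    def __init__(self, buckets=(0.1, 0.5, 1.0, 5.0)):
        self.counters = defaultdict(int)   # monotonically increasing totals
        self.gauges = {}                   # last-set values
        self.buckets = sorted(buckets)     # histogram upper bounds, in seconds
        self.histograms = defaultdict(lambda: [0] * (len(self.buckets) + 1))

    def inc(self, name: str, amount: int = 1) -> None:
        self.counters[name] += amount

    def set_gauge(self, name: str, value: float) -> None:
        self.gauges[name] = value

    def observe(self, name: str, value: float) -> None:
        # bisect_left finds the first bucket whose upper bound is >= value;
        # values above every bound land in the final overflow slot.
        self.histograms[name][bisect.bisect_left(self.buckets, value)] += 1
```

A counter only ever grows, a gauge is overwritten with the latest value, and a histogram keeps per-bucket counts so that percentiles can be estimated later.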
+#### Business Metrics
+- [Define based on acceptance criteria]
+
+### Distributed Tracing
+
+#### Trace Context
+- Correlation ID propagation
+- Span naming conventions
+- Sampling strategy
+
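Correlation ID propagation, the first item above, can be sketched with the standard `contextvars` module. The header name `X-Correlation-ID` and both function names are hypothetical; a project would substitute whatever convention it standardizes on, and the lookup here is case-sensitive for simplicity.

```python
import uuid
from contextvars import ContextVar
from typing import Dict, Optional

# Hypothetical header name, chosen for illustration only.
CORRELATION_HEADER = "X-Correlation-ID"

_correlation_id: ContextVar[Optional[str]] = ContextVar("correlation_id", default=None)


def accept_incoming(headers: Dict[str, str]) -> str:
    """Reuse the caller's correlation ID, or start a new one at the edge."""
    cid = headers.get(CORRELATION_HEADER) or str(uuid.uuid4())
    _correlation_id.set(cid)
    return cid


def outgoing_headers(headers: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy headers for a downstream call, adding the current correlation ID."""
    result = dict(headers or {})
    cid = _correlation_id.get()
    if cid is not None:
        result[CORRELATION_HEADER] = cid
    return result
```

Storing the ID in a context variable rather than a global keeps concurrent requests from overwriting each other's IDs when the service handles them in separate threads or async tasks.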
+#### Integration Points
+- HTTP headers
+- Message queue metadata
+- Database query tagging
+
+### Alerting
+
+#### Alert Categories
+| Severity | Response Time | Examples |
+|----------|---------------|----------|
+| Critical | 5 min | Service down, Data loss |
+| High | 30 min | High error rate, Performance degradation |
+| Medium | 4 hours | Elevated latency, Disk usage high |
+| Low | Next business day | Non-critical warnings |
+
+#### Alert Rules
+```yaml
+alerts:
+  - name: high_error_rate
+    condition: error_rate > 5%
+    duration: 5m
+    severity: high
+
+  - name: service_down
+    condition: health_check_failed
+    duration: 1m
+    severity: critical
+```
+
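The `duration` field in the rules above implies that a condition must hold continuously before the alert fires. A minimal sketch of that behavior follows, assuming the condition is a simple "value above threshold" comparison and that the caller supplies timestamps in seconds. The `AlertRule` class is hypothetical and not tied to any particular alerting tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertRule:
    name: str
    threshold: float      # e.g. 0.05 for the 5% error rate rule
    duration_s: float     # how long the breach must last, e.g. 300 for 5m
    severity: str
    _breach_started: Optional[float] = None

    def observe(self, value: float, now: float) -> bool:
        """Record one sample; return True once the breach has lasted duration_s."""
        if value <= self.threshold:
            self._breach_started = None  # condition cleared, reset the timer
            return False
        if self._breach_started is None:
            self._breach_started = now
        return now - self._breach_started >= self.duration_s
```

For example, `AlertRule("high_error_rate", 0.05, 300, "high")` would ignore a short spike and fire only after the error rate has stayed above 5% for five minutes.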
+### Dashboards
+
+#### Operations Dashboard
+- Service health status
+- Request rate and error rate
+- Response time percentiles
+- Resource utilization
+
+#### Business Dashboard
+- Key business metrics
+- User activity
+- Transaction volumes
+
+Store the output in `_docs/02_components/observability_plan.md`
+
+## Notes
+ - Follow the principle: "If it's not monitored, it's not in production"
+ - Balance verbosity with cost
+ - Ensure PII is not logged
+ - Plan for log rotation and retention
+