Refactor README and command documentation to streamline deployment and CI/CD processes. Consolidate deployment strategies and remove obsolete commands related to CI/CD and observability. Enhance task decomposition workflow by adding data model and deployment planning sections, and update directory structures for improved clarity.

Oleksandr Bezdieniezhnykh
2026-03-19 12:10:11 +02:00
parent 5b1739186e
commit cfd09c79e1
17 changed files with 1314 additions and 313 deletions
+40 -37
@@ -20,23 +20,20 @@
5. /implement — auto-orchestrates all tasks: batches by dependencies, launches parallel implementers, runs code review, loops until done
6. /implement-black-box-tests — E2E tests via Docker consumer app (after all tasks)
7. commit & push
6. commit & push
```
### SHIP (deploy and operate)
```
8. /implement-cicd — validate/enhance CI/CD pipeline
9. /deploy — deployment strategy per environment
10. /observability — monitoring, logging, alerting plan
7. /deploy — containerization, CI/CD, environment strategy, observability, deployment procedures (skill, 5-step workflow)
```
### EVOLVE (maintenance and improvement)
```
11. /refactor — structured refactoring (skill, 6-phase workflow)
8. /refactor — structured refactoring (skill, 6-phase workflow)
9. /retrospective — collect metrics, analyze trends, produce improvement report (skill)
```
## Implementation Flow
@@ -67,29 +64,25 @@ Multi-phase code review invoked after each implementation batch:
Produces structured findings with severity (Critical/High/Medium/Low) and verdict (PASS/FAIL/PASS_WITH_WARNINGS).
### `/implement-black-box-tests`
Reads `_docs/02_plans/integration_tests/` (produced by plan skill Step 1). Builds a separate Docker-based consumer app that exercises the system as a black box — no internal imports, no direct DB access. Runs E2E scenarios, produces a CSV test report.
Run after all tasks are done.
### `/implement-cicd`
Reviews existing CI/CD pipeline configuration, validates all stages work, optimizes performance (parallelization, caching), ensures quality gates are enforced (coverage, linting, security scanning).
Run after `/implement` or after all tasks.
### `/deploy`
Defines deployment strategy per environment: deployment procedures, rollback procedures, health checks, deployment checklist. Outputs `_docs/02_components/deployment_strategy.md`.
Comprehensive deployment skill (5-step workflow). Outputs to `_docs/02_plans/deployment/`. Run before first production release:
1. Containerization — Dockerfiles per component, docker-compose for dev and tests
2. CI/CD Pipeline — lint, test, security scan, build, deploy with quality gates
3. Environment Strategy — dev, staging, production with secrets management
4. Observability — structured logging, metrics, tracing, alerting, dashboards
5. Deployment Procedures — rollback, health checks, graceful shutdown
### `/observability`
Plans logging strategy, metrics collection, distributed tracing, alerting rules, and dashboards. Outputs `_docs/02_components/observability_plan.md`. Run after `/implement` or before first production release.
### `/retrospective`
Collects metrics from batch reports and code review findings, analyzes trends across implementation cycles, and produces improvement reports. Outputs to `_docs/05_metrics/`.
### `/rollback`
Reverts implementation to a specific batch checkpoint using git revert, resets Jira ticket statuses, and verifies rollback integrity with tests.
### Commit
@@ -106,6 +99,8 @@ After each confirmed batch, the `/implement` skill automatically commits and pushes
| **code-review** | "code review", "review code" | 6-phase structured review with findings |
| **refactor** | "refactor", "refactoring", "improve code" | 6-phase structured refactoring workflow |
| **security** | "security audit", "OWASP" | OWASP-based security testing |
| **deploy** | "deploy", "CI/CD", "containerize", "observability" | Containerization, CI/CD, observability, deployment procedures |
| **retrospective** | "retrospective", "retro", "metrics review" | Collect metrics, analyze trends, produce improvement report |
## Project Folder Structure
@@ -134,6 +129,7 @@ _docs/
├── 02_plans/
│ ├── architecture.md
│ ├── system-flows.md
│ ├── data_model.md
│ ├── risk_mitigations.md
│ ├── components/
│ │ └── [##]_[name]/
@@ -146,6 +142,12 @@ _docs/
│ │ ├── functional_tests.md
│ │ ├── non_functional_tests.md
│ │ └── traceability_matrix.md
│ ├── deployment/
│ │ ├── containerization.md
│ │ ├── ci_cd_pipeline.md
│ │ ├── environment_strategy.md
│ │ ├── observability.md
│ │ └── deployment_procedures.md
│ ├── diagrams/
│ └── FINAL_report.md
├── 02_tasks/
@@ -158,15 +160,17 @@ _docs/
│ ├── batch_02_report.md
│ ├── ...
│ └── FINAL_implementation_report.md
├── 04_refactoring/
├── baseline_metrics.md
├── discovery/
├── analysis/
├── test_specs/
├── coupling_analysis.md
├── execution_log.md
├── hardening/
└── FINAL_report.md
└── 05_metrics/
└── retro_[date].md
```
## Implementation Tools
@@ -176,10 +180,9 @@ _docs/
| `implementer` | Subagent | Implements a single task from its spec. Launched by /implement. |
| `/implement` | Skill | Orchestrates all tasks: dependency batching, parallel agents, code review. |
| `/code-review` | Skill | Multi-phase code review with structured findings. |
| `/implement-black-box-tests` | Command | E2E tests via Docker consumer app. After all tasks. |
| `/implement-cicd` | Command | Validate and enhance CI/CD pipeline. |
| `/deploy` | Command | Plan deployment strategy per environment. |
| `/observability` | Command | Plan logging, metrics, tracing, alerting. |
| `/deploy` | Skill | Containerization, CI/CD, observability, deployment procedures. |
| `/retrospective` | Skill | Collect metrics, analyze trends, produce improvement reports. |
| `/rollback` | Command | Revert to a batch checkpoint with Jira status reset. |
## Automations (Planned)
-71
@@ -1,71 +0,0 @@
# Deployment Strategy Planning
## Initial data:
- Problem description: `@_docs/00_problem/problem_description.md`
- Restrictions: `@_docs/00_problem/restrictions.md`
- Full Solution Description: `@_docs/01_solution/solution.md`
- Components: `@_docs/02_components`
- Environment Strategy: `@_docs/00_templates/environment_strategy.md`
## Role
You are a DevOps/Platform engineer
## Task
- Define deployment strategy for each environment
- Plan deployment procedures and automation
- Define rollback procedures
- Establish deployment verification steps
- Document manual intervention points
## Output
### Deployment Architecture
- Infrastructure diagram (where components run)
- Network topology
- Load balancing strategy
- Container/VM configuration
### Deployment Procedures
#### Staging Deployment
- Trigger conditions
- Pre-deployment checks
- Deployment steps
- Post-deployment verification
- Smoke tests to run
#### Production Deployment
- Approval workflow
- Deployment window
- Pre-deployment checks
- Deployment steps (blue-green, rolling, canary)
- Post-deployment verification
- Smoke tests to run
### Rollback Procedures
- Rollback trigger criteria
- Rollback steps per environment
- Data rollback considerations
- Communication plan during rollback
### Health Checks
- Liveness probe configuration
- Readiness probe configuration
- Custom health endpoints
### Deployment Checklist
- [ ] All tests pass in CI
- [ ] Security scan clean
- [ ] Database migrations reviewed
- [ ] Feature flags configured
- [ ] Monitoring alerts configured
- [ ] Rollback plan documented
- [ ] Stakeholders notified
Store output to `_docs/02_components/deployment_strategy.md`
## Notes
- Prefer automated deployments over manual
- Zero-downtime deployments for production
- Always have a rollback plan
- Ask questions about infrastructure constraints
-64
@@ -1,64 +0,0 @@
# CI/CD Pipeline Validation & Enhancement
## Initial data:
- Problem description: `@_docs/00_problem/problem_description.md`
- Restrictions: `@_docs/00_problem/restrictions.md`
- Full Solution Description: `@_docs/01_solution/solution.md`
- Components: `@_docs/02_components`
- Environment Strategy: `@_docs/00_templates/environment_strategy.md`
## Role
You are a DevOps engineer
## Task
- Review existing CI/CD pipeline configuration
- Validate all stages are working correctly
- Optimize pipeline performance (parallelization, caching)
- Ensure test coverage gates are enforced
- Verify security scanning is properly configured
- Add missing quality gates
## Checklist
### Pipeline Health
- [ ] All stages execute successfully
- [ ] Build time is acceptable (<10 min for most projects)
- [ ] Caching is properly configured (dependencies, build artifacts)
- [ ] Parallel execution where possible
### Quality Gates
- [ ] Code coverage threshold enforced (minimum 75%)
- [ ] Linting errors block merge
- [ ] Security vulnerabilities block merge (critical/high)
- [ ] All tests must pass
### Environment Deployments
- [ ] Staging deployment works on merge to stage branch
- [ ] Environment variables properly configured per environment
- [ ] Secrets are securely managed (not in code)
- [ ] Rollback procedure documented
### Monitoring
- [ ] Build notifications configured (Slack, email, etc.)
- [ ] Failed build alerts
- [ ] Deployment success/failure notifications
## Output
### Pipeline Status Report
- Current pipeline configuration summary
- Issues found and fixes applied
- Performance metrics (build times)
### Recommended Improvements
- Short-term improvements
- Long-term optimizations
### Quality Gate Configuration
- Thresholds configured
- Enforcement rules
## Notes
- Do not break existing functionality
- Test changes in separate branch first
- Document any manual steps required
-122
@@ -1,122 +0,0 @@
# Observability Planning
## Initial data:
- Problem description: `@_docs/00_problem/problem_description.md`
- Full Solution Description: `@_docs/01_solution/solution.md`
- Components: `@_docs/02_components`
- Deployment Strategy: `@_docs/02_components/deployment_strategy.md`
## Role
You are a Site Reliability Engineer (SRE)
## Task
- Define logging strategy across all components
- Plan metrics collection and dashboards
- Design distributed tracing (if applicable)
- Establish alerting rules
- Document incident response procedures
## Output
### Logging Strategy
#### Log Levels
| Level | Usage | Example |
|-------|-------|---------|
| ERROR | Exceptions, failures requiring attention | Database connection failed |
| WARN | Potential issues, degraded performance | Retry attempt 2/3 |
| INFO | Significant business events | User registered, Order placed |
| DEBUG | Detailed diagnostic information | Request payload, Query params |
#### Log Format
```json
{
"timestamp": "ISO8601",
"level": "INFO",
"service": "service-name",
"correlation_id": "uuid",
"message": "Event description",
"context": {}
}
```
#### Log Storage
- Development: Console/file
- Staging: Centralized (ELK, CloudWatch, etc.)
- Production: Centralized with retention policy
### Metrics
#### System Metrics
- CPU usage
- Memory usage
- Disk I/O
- Network I/O
#### Application Metrics
| Metric | Type | Description |
|--------|------|-------------|
| request_count | Counter | Total requests |
| request_duration | Histogram | Response time |
| error_count | Counter | Failed requests |
| active_connections | Gauge | Current connections |
#### Business Metrics
- [Define based on acceptance criteria]
### Distributed Tracing
#### Trace Context
- Correlation ID propagation
- Span naming conventions
- Sampling strategy
#### Integration Points
- HTTP headers
- Message queue metadata
- Database query tagging
### Alerting
#### Alert Categories
| Severity | Response Time | Examples |
|----------|---------------|----------|
| Critical | 5 min | Service down, Data loss |
| High | 30 min | High error rate, Performance degradation |
| Medium | 4 hours | Elevated latency, Disk usage high |
| Low | Next business day | Non-critical warnings |
#### Alert Rules
```yaml
alerts:
- name: high_error_rate
condition: error_rate > 5%
duration: 5m
severity: high
- name: service_down
condition: health_check_failed
duration: 1m
severity: critical
```
### Dashboards
#### Operations Dashboard
- Service health status
- Request rate and error rate
- Response time percentiles
- Resource utilization
#### Business Dashboard
- Key business metrics
- User activity
- Transaction volumes
Store output to `_docs/02_components/observability_plan.md`
## Notes
- Follow the principle: "If it's not monitored, it's not in production"
- Balance verbosity with cost
- Ensure PII is not logged
- Plan for log rotation and retention
+54
@@ -0,0 +1,54 @@
# Implementation Rollback
## Role
You are a DevOps engineer performing a controlled rollback of implementation batches.
## Input
- User specifies a target batch number or commit hash to roll back to
- If not specified, present the list of available batch checkpoints and ask
## Process
1. Read `_docs/03_implementation/batch_*_report.md` files to identify all batch checkpoints with their commit hashes
2. Present batch list to user with: batch number, date, tasks included, commit hash
3. Determine which commits need to be reverted (all commits after the target batch)
4. For each commit to revert (in reverse chronological order):
- Run `git revert <commit-hash> --no-edit`
- Verify no merge conflicts; if conflicts occur, ask user for resolution
5. Run the full test suite to verify rollback integrity
6. If tests fail, report failures and ask user how to proceed
7. Reset Jira ticket statuses for all reverted tasks back to "To Do" via Jira MCP
8. Commit the revert with message: `[ROLLBACK] Reverted to batch [N]: [task list]`
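The revert mechanics in steps 3–5 and 8 can be sketched as follows. This is a minimal illustration on a throwaway repository; in the real workflow the commit list comes from the batch reports, and the Jira status resets happen afterwards.

```shell
# Sketch of the rollback loop, demonstrated on a throwaway repo.
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git -c user.email=ci@example.com -c user.name=ci commit -q --allow-empty -m "batch 1 checkpoint"
echo "feature" > feature.txt
git add feature.txt
git -c user.email=ci@example.com -c user.name=ci commit -q -m "batch 2"
target=$(git rev-list --max-parents=0 HEAD)   # roll back to the batch 1 checkpoint
# Revert every commit after the target, newest first, preserving history
for sha in $(git rev-list HEAD "^$target"); do
  git -c user.email=ci@example.com -c user.name=ci revert --no-edit "$sha" >/dev/null
done
test ! -e "$repo/feature.txt" && echo "rollback verified"
```

Because `git revert` adds new commits rather than rewriting history, the rollback itself remains auditable and reversible.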
## Output
Write `_docs/03_implementation/rollback_report.md`:
```markdown
# Rollback Report
**Date**: [YYYY-MM-DD]
**Target**: Batch [N] (commit [hash])
**Reverted Batches**: [list]
## Reverted Tasks
| Task | Batch | Status Before | Status After |
|------|-------|--------------|-------------|
| [JIRA-ID] | [batch #] | In Testing | To Do |
## Test Results
- [pass/fail count]
## Jira Updates
- [list of ticket transitions]
## Notes
- [any conflicts, manual steps, or issues encountered]
```
## Safety Rules
- Never force-push; always use `git revert` to preserve history
- Always run tests after rollback
- Always update Jira statuses for reverted tasks
- If rollback fails midway, stop and report — do not leave the codebase in a partial state
+8
@@ -0,0 +1,8 @@
---
description: "Git workflow: work on dev branch, commit message format with Jira IDs"
alwaysApply: true
---
# Git Workflow
- Work on the `dev` branch
- Commit message format: `[JIRA-ID-1] [JIRA-ID-2] Summary of changes`
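A hypothetical example of the format in practice (`PROJ-12` / `PROJ-13` are placeholder Jira IDs), demonstrated in a throwaway repository:

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git checkout -q -b dev        # all work happens on the dev branch
git -c user.email=dev@example.com -c user.name=dev commit -q --allow-empty \
  -m "[PROJ-12] [PROJ-13] Add retry logic to payment client"
git log -1 --format=%s        # prints the Jira-prefixed commit subject
```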
+22 -2
@@ -124,15 +124,35 @@ At the start of execution, create a TodoWrite with all applicable steps. Update status as each step completes.
**Goal**: Produce `01_initial_structure.md` — the first task describing the project skeleton
**Constraints**: This is a plan document, not code. The `/implement` skill executes it.
1. Read architecture.md, all component specs, and system-flows.md from PLANS_DIR
1. Read architecture.md, all component specs, system-flows.md, data_model.md, and `deployment/` from PLANS_DIR
2. Read problem, solution, and restrictions from `_docs/00_problem/` and `_docs/01_solution/`
3. Research best implementation patterns for the identified tech stack
4. Document the structure plan using `templates/initial-structure-task.md`
The bootstrap structure plan must include:
- Project folder layout with all component directories
- Shared models, interfaces, and DTOs
- Dockerfile per component (multi-stage, non-root, health checks, pinned base images)
- `docker-compose.yml` for local development (all components + database + dependencies)
- `docker-compose.test.yml` for integration test environment (black-box test runner)
- `.dockerignore`
- CI/CD pipeline file (`.github/workflows/ci.yml` or `azure-pipelines.yml`) with stages from `deployment/ci_cd_pipeline.md`
- Database migration setup and initial seed data scripts
- Observability configuration: structured logging setup, health check endpoints (`/health/live`, `/health/ready`), metrics endpoint (`/metrics`)
- Environment variable documentation (`.env.example`)
- Test structure with unit and integration test locations
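The `docker-compose.yml` item above could be sketched as follows. Service, image, and volume names are placeholders to be replaced per component; the Postgres password shown is a local-dev-only value, never a real secret.

```yaml
services:
  api:
    build: ./api
    env_file: .env.dev
    ports: ["8080:8080"]
    depends_on:
      db:
        condition: service_healthy    # wait until Postgres answers pg_isready
  db:
    image: postgres:16-alpine         # pinned base image
    environment:
      POSTGRES_PASSWORD: dev-only     # real secrets never live in this file
    volumes: [pgdata:/var/lib/postgresql/data]
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
volumes:
  pgdata:
```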
**Self-verification**:
- [ ] All components have corresponding folders in the layout
- [ ] All inter-component interfaces have DTOs defined
- [ ] CI/CD stages cover build, lint, test, security, deploy
- [ ] Dockerfile defined for each component
- [ ] `docker-compose.yml` covers all components and dependencies
- [ ] `docker-compose.test.yml` enables black-box integration testing
- [ ] CI/CD pipeline file defined with lint, test, security, build, deploy stages
- [ ] Database migration setup included
- [ ] Health check endpoints specified for each service
- [ ] Structured logging configuration included
- [ ] `.env.example` with all required environment variables
- [ ] Environment strategy covers dev, staging, production
- [ ] Test structure includes unit and integration test locations
+363
@@ -0,0 +1,363 @@
---
name: deploy
description: |
Comprehensive deployment skill covering containerization, CI/CD pipeline, environment strategy, observability, and deployment procedures.
5-step workflow: Docker containerization, CI/CD pipeline definition, environment strategy, observability planning, deployment procedures.
Uses _docs/02_plans/deployment/ structure.
Trigger phrases:
- "deploy", "deployment", "deployment strategy"
- "CI/CD", "pipeline", "containerize"
- "observability", "monitoring", "logging"
- "dockerize", "docker compose"
category: ship
tags: [deployment, docker, ci-cd, observability, monitoring, containerization]
disable-model-invocation: true
---
# Deployment Planning
Plan and document the full deployment lifecycle: containerize the application, define CI/CD pipelines, configure environments, set up observability, and document deployment procedures.
## Core Principles
- **Docker-first**: every component runs in a container; local dev, integration tests, and production all use Docker
- **Infrastructure as code**: all deployment configuration is version-controlled
- **Observability built-in**: logging, metrics, and tracing are part of the deployment plan, not afterthoughts
- **Environment parity**: dev, staging, and production environments mirror each other as closely as possible
- **Save immediately**: write artifacts to disk after each step; never accumulate unsaved work
- **Ask, don't assume**: when infrastructure constraints or preferences are unclear, ask the user
- **Plan, don't code**: this workflow produces deployment documents and specifications, not implementation code
## Context Resolution
Fixed paths:
- PLANS_DIR: `_docs/02_plans/`
- DEPLOY_DIR: `_docs/02_plans/deployment/`
- ARCHITECTURE: `_docs/02_plans/architecture.md`
- COMPONENTS_DIR: `_docs/02_plans/components/`
Announce the resolved paths to the user before proceeding.
## Input Specification
### Required Files
| File | Purpose |
|------|---------|
| `_docs/00_problem/problem.md` | Problem description and context |
| `_docs/00_problem/restrictions.md` | Constraints and limitations |
| `_docs/01_solution/solution.md` | Finalized solution |
| `PLANS_DIR/architecture.md` | Architecture from plan skill |
| `PLANS_DIR/components/` | Component specs |
### Prerequisite Checks (BLOCKING)
1. `architecture.md` exists — **STOP if missing**, run `/plan` first
2. At least one component spec exists in `PLANS_DIR/components/` — **STOP if missing**
3. Create DEPLOY_DIR if it does not exist
4. If DEPLOY_DIR already contains artifacts, ask user: **resume from last checkpoint or start fresh?**
## Artifact Management
### Directory Structure
```
DEPLOY_DIR/
├── containerization.md
├── ci_cd_pipeline.md
├── environment_strategy.md
├── observability.md
└── deployment_procedures.md
```
### Save Timing
| Step | Save immediately after | Filename |
|------|------------------------|----------|
| Step 1 | Containerization plan complete | `containerization.md` |
| Step 2 | CI/CD pipeline defined | `ci_cd_pipeline.md` |
| Step 3 | Environment strategy documented | `environment_strategy.md` |
| Step 4 | Observability plan complete | `observability.md` |
| Step 5 | Deployment procedures documented | `deployment_procedures.md` |
### Resumability
If DEPLOY_DIR already contains artifacts:
1. List existing files and match to the save timing table
2. Identify the last completed step
3. Resume from the next incomplete step
4. Inform the user which steps are being skipped
## Progress Tracking
At the start of execution, create a TodoWrite with all steps (1 through 5). Update status as each step completes.
## Workflow
### Step 1: Containerization
**Role**: DevOps / Platform engineer
**Goal**: Define Docker configuration for every component, local development, and integration test environments
**Constraints**: Plan only — no Dockerfile creation. Describe what each Dockerfile should contain.
1. Read architecture.md and all component specs
2. Read restrictions.md for infrastructure constraints
3. Research best Docker practices for the project's tech stack (multi-stage builds, base image selection, layer optimization)
4. For each component, define:
- Base image (pinned version, prefer alpine/distroless for production)
- Build stages (dependency install, build, production)
- Non-root user configuration
- Health check endpoint and command
- Exposed ports
- `.dockerignore` contents
5. Define `docker-compose.yml` for local development:
- All application components
- Database (Postgres) with named volume
- Any message queues, caches, or external service mocks
- Shared network
- Environment variable files (`.env.dev`)
6. Define `docker-compose.test.yml` for integration tests:
- Application components under test
- Test runner container (black-box, no internal imports)
- Isolated database with seed data
- All tests runnable via `docker compose -f docker-compose.test.yml up --abort-on-container-exit`
7. Define image tagging strategy: `<registry>/<project>/<component>:<git-sha>` for CI, `latest` for local dev only
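Step 6 can be sketched as a minimal `docker-compose.test.yml`. Service names and the test command are illustrative; the point is that the test runner reaches the app only over the network, as a black box.

```yaml
services:
  api:
    build: ./api
    environment:
      DATABASE_URL: postgres://postgres:test@db:5432/app   # isolated test DB
  db:
    image: postgres:16-alpine
    environment: { POSTGRES_PASSWORD: test, POSTGRES_DB: app }
  test-runner:
    build: ./tests/blackbox      # no internal imports, HTTP calls only
    depends_on: [api]
    command: ["pytest", "--maxfail=1"]
# run: docker compose -f docker-compose.test.yml up --abort-on-container-exit
```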
**Self-verification**:
- [ ] Every component has a Dockerfile specification
- [ ] Multi-stage builds specified for all production images
- [ ] Non-root user for all containers
- [ ] Health checks defined for every service
- [ ] docker-compose.yml covers all components + dependencies
- [ ] docker-compose.test.yml enables black-box integration testing
- [ ] `.dockerignore` defined
**Save action**: Write `containerization.md` using `templates/containerization.md`
**BLOCKING**: Present containerization plan to user. Do NOT proceed until confirmed.
---
### Step 2: CI/CD Pipeline
**Role**: DevOps engineer
**Goal**: Define the CI/CD pipeline with quality gates, security scanning, and multi-environment deployment
**Constraints**: Pipeline definition only — produce YAML specification, not implementation
1. Read architecture.md for tech stack and deployment targets
2. Read restrictions.md for CI/CD constraints (cloud provider, registry, etc.)
3. Research CI/CD best practices for the project's platform (GitHub Actions / Azure Pipelines)
4. Define pipeline stages:
| Stage | Trigger | Steps | Quality Gate |
|-------|---------|-------|-------------|
| **Lint** | Every push | Run linters per language (black, rustfmt, prettier, dotnet format) | Zero errors |
| **Test** | Every push | Unit tests, integration tests, coverage report | 75%+ coverage |
| **Security** | Every push | Dependency audit, SAST scan (Semgrep/SonarQube), image scan (Trivy) | Zero critical/high CVEs |
| **Build** | PR merge to dev | Build Docker images, tag with git SHA | Build succeeds |
| **Push** | After build | Push to container registry | Push succeeds |
| **Deploy Staging** | After push | Deploy to staging environment | Health checks pass |
| **Smoke Tests** | After staging deploy | Run critical path tests against staging | All pass |
| **Deploy Production** | Manual approval | Deploy to production | Health checks pass |
5. Define caching strategy: dependency caches, Docker layer caches, build artifact caches
6. Define parallelization: which stages can run concurrently
7. Define notifications: build failures, deployment status, security alerts
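The push-triggered stages above could look like the following GitHub Actions sketch. The `make` targets are hypothetical stand-ins for the stack-specific lint, test, and audit commands; the three jobs run in parallel per step 6.

```yaml
name: ci
on: [push]
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make lint            # quality gate: zero lint errors
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/cache@v4
        with:
          path: ~/.cache/pip      # dependency cache (path is stack-specific)
          key: deps-${{ hashFiles('**/requirements.txt') }}
      - run: make test            # quality gate: 75%+ coverage
  security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make audit           # dependency audit + SAST; blocks on critical/high
```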
**Self-verification**:
- [ ] All pipeline stages defined with triggers and gates
- [ ] Coverage threshold enforced (75%+)
- [ ] Security scanning included (dependencies + images + SAST)
- [ ] Caching configured for dependencies and Docker layers
- [ ] Multi-environment deployment (staging → production)
- [ ] Rollback procedure referenced
- [ ] Notifications configured
**Save action**: Write `ci_cd_pipeline.md` using `templates/ci_cd_pipeline.md`
---
### Step 3: Environment Strategy
**Role**: Platform engineer
**Goal**: Define environment configuration, secrets management, and environment parity
**Constraints**: Strategy document — no secrets or credentials in output
1. Define environments:
| Environment | Purpose | Infrastructure | Data |
|-------------|---------|---------------|------|
| **Development** | Local developer workflow | docker-compose, local volumes | Seed data, mocks for external APIs |
| **Staging** | Pre-production validation | Mirrors production topology | Anonymized production-like data |
| **Production** | Live system | Full infrastructure | Real data |
2. Define environment variable management:
- `.env.example` with all required variables (no real values)
- Per-environment variable sources (`.env` for dev, secret manager for staging/prod)
- Validation: fail fast on missing required variables at startup
3. Define secrets management:
- Never commit secrets to version control
- Development: `.env` files (git-ignored)
- Staging/Production: secret manager (AWS Secrets Manager / Azure Key Vault / Vault)
- Rotation policy
4. Define database management per environment:
- Development: Docker Postgres with named volume, seed data
- Staging: managed Postgres, migrations applied via CI/CD
- Production: managed Postgres, migrations require approval
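The fail-fast validation in item 2 can be sketched in shell. Variable names are illustrative; in a real service the check runs at process startup, before any connection is attempted.

```shell
# Fail fast on missing required environment variables.
export DATABASE_URL="postgres://localhost:5432/dev"  # normally injected per environment
: "${DATABASE_URL:?DATABASE_URL is required}"        # aborts with an error if unset
: "${APP_ENV:=development}"                          # optional variable with a default
echo "environment ok: $APP_ENV"
```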
**Self-verification**:
- [ ] All three environments defined with clear purpose
- [ ] Environment variable documentation complete (`.env.example`)
- [ ] No secrets in any output document
- [ ] Secret manager specified for staging/production
- [ ] Database strategy per environment
**Save action**: Write `environment_strategy.md` using `templates/environment_strategy.md`
---
### Step 4: Observability
**Role**: Site Reliability Engineer (SRE)
**Goal**: Define logging, metrics, tracing, and alerting strategy
**Constraints**: Strategy document — describe what to implement, not how to wire it
1. Read architecture.md and component specs for service boundaries
2. Research observability best practices for the tech stack
**Logging**:
- Structured JSON to stdout/stderr (no file logging in containers)
- Fields: `timestamp` (ISO 8601), `level`, `service`, `correlation_id`, `message`, `context`
- Levels: ERROR (exceptions), WARN (degraded), INFO (business events), DEBUG (diagnostics, dev only)
- No PII in logs
- Retention: dev = console, staging = 7 days, production = 30 days
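A log line following the field list above might look like this (service name and values are illustrative):

```json
{
  "timestamp": "2026-03-19T10:12:11Z",
  "level": "INFO",
  "service": "orders-api",
  "correlation_id": "6f1c2b7e-9d44-4a2b-8e01-3c5a7f90d2ab",
  "message": "Order placed",
  "context": { "order_id": "A-1001" }
}
```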
**Metrics**:
- Expose Prometheus-compatible `/metrics` endpoint per service
- System metrics: CPU, memory, disk, network
- Application metrics: `request_count`, `request_duration` (histogram), `error_count`, `active_connections`
- Business metrics: derived from acceptance criteria
- Collection interval: 15s
**Distributed Tracing**:
- OpenTelemetry SDK integration
- Trace context propagation via HTTP headers and message queue metadata
- Span naming: `<service>.<operation>`
- Sampling: 100% in dev/staging, 10% in production (adjust based on volume)
**Alerting**:
| Severity | Response Time | Condition Examples |
|----------|---------------|-------------------|
| Critical | 5 min | Service down, data loss, health check failed |
| High | 30 min | Error rate > 5%, P95 latency > 2x baseline |
| Medium | 4 hours | Disk > 80%, elevated latency |
| Low | Next business day | Non-critical warnings |
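Two rows of the table above could be expressed as Prometheus-style alerting rules; metric names follow the application metrics listed earlier, and the exact expressions are assumptions to be tuned per service.

```yaml
groups:
  - name: service-alerts
    rules:
      - alert: HighErrorRate
        expr: rate(error_count[5m]) / rate(request_count[5m]) > 0.05
        for: 5m
        labels: { severity: high }        # respond within 30 min
      - alert: ServiceDown
        expr: up == 0
        for: 1m
        labels: { severity: critical }    # respond within 5 min
```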
**Dashboards**:
- Operations: service health, request rate, error rate, response time percentiles, resource utilization
- Business: key business metrics from acceptance criteria
**Self-verification**:
- [ ] Structured logging format defined with required fields
- [ ] Metrics endpoint specified per service
- [ ] OpenTelemetry tracing configured
- [ ] Alert severities with response times defined
- [ ] Dashboards cover operations and business metrics
- [ ] PII exclusion from logs addressed
**Save action**: Write `observability.md` using `templates/observability.md`
---
### Step 5: Deployment Procedures
**Role**: DevOps / Platform engineer
**Goal**: Define deployment strategy, rollback procedures, health checks, and deployment checklist
**Constraints**: Procedures document — no implementation
1. Define deployment strategy:
- Preferred pattern: blue-green / rolling / canary (choose based on architecture)
- Zero-downtime requirement for production
- Graceful shutdown: 30-second grace period for in-flight requests
- Database migration ordering: migrate before deploy, backward-compatible only
2. Define health checks:
| Check | Type | Endpoint | Interval | Threshold |
|-------|------|----------|----------|-----------|
| Liveness | HTTP GET | `/health/live` | 10s | 3 failures → restart |
| Readiness | HTTP GET | `/health/ready` | 5s | 3 failures → remove from LB |
| Startup | HTTP GET | `/health/ready` | 5s | 30 attempts max |
3. Define rollback procedures:
- Trigger criteria: health check failures, error rate spike, critical alert
- Rollback steps: redeploy previous image tag, verify health, rollback database if needed
- Communication: notify stakeholders during rollback
- Post-mortem: required after every production rollback
4. Define deployment checklist:
- [ ] All tests pass in CI
- [ ] Security scan clean (zero critical/high CVEs)
- [ ] Database migrations reviewed and tested
- [ ] Environment variables configured
- [ ] Health check endpoints responding
- [ ] Monitoring alerts configured
- [ ] Rollback plan documented and tested
- [ ] Stakeholders notified
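The health checks in step 2 can be sketched as Kubernetes-style probes, assuming the service listens on port 8080; adjust to the real deployment target.

```yaml
livenessProbe:
  httpGet: { path: /health/live, port: 8080 }
  periodSeconds: 10
  failureThreshold: 3        # 3 failures -> container restart
readinessProbe:
  httpGet: { path: /health/ready, port: 8080 }
  periodSeconds: 5
  failureThreshold: 3        # 3 failures -> removed from load balancing
startupProbe:
  httpGet: { path: /health/ready, port: 8080 }
  periodSeconds: 5
  failureThreshold: 30       # 30 attempts max before giving up
```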
**Self-verification**:
- [ ] Deployment strategy chosen and justified
- [ ] Zero-downtime approach specified
- [ ] Health checks defined (liveness, readiness, startup)
- [ ] Rollback trigger criteria and steps documented
- [ ] Deployment checklist complete
**Save action**: Write `deployment_procedures.md` using `templates/deployment_procedures.md`
**BLOCKING**: Present deployment procedures to user. Do NOT proceed until confirmed.
---
## Escalation Rules
| Situation | Action |
|-----------|--------|
| Unknown cloud provider or hosting | **ASK user** |
| Container registry not specified | **ASK user** |
| CI/CD platform preference unclear | **ASK user** — default to GitHub Actions |
| Secret manager not chosen | **ASK user** |
| Deployment pattern trade-offs | **ASK user** with recommendation |
| Missing architecture.md | **STOP** — run `/plan` first |
## Common Mistakes
- **Implementing during planning**: this workflow produces documents, not Dockerfiles or pipeline YAML
- **Hardcoding secrets**: never include real credentials in deployment documents
- **Ignoring integration test containerization**: the test environment must be containerized alongside the app
- **Skipping BLOCKING gates**: never proceed past a BLOCKING marker without user confirmation
- **Using `:latest` tags**: always pin base image versions
- **Forgetting observability**: logging, metrics, and tracing are deployment concerns, not post-deployment additions
## Methodology Quick Reference
```
┌────────────────────────────────────────────────────────────────┐
│ Deployment Planning (5-Step Method) │
├────────────────────────────────────────────────────────────────┤
│ PREREQ: architecture.md + component specs exist │
│ │
│ 1. Containerization → containerization.md │
│ [BLOCKING: user confirms Docker plan] │
│ 2. CI/CD Pipeline → ci_cd_pipeline.md │
│ 3. Environment → environment_strategy.md │
│ 4. Observability → observability.md │
│ 5. Procedures → deployment_procedures.md │
│ [BLOCKING: user confirms deployment plan] │
├────────────────────────────────────────────────────────────────┤
│ Principles: Docker-first · IaC · Observability built-in │
│ Environment parity · Save immediately │
└────────────────────────────────────────────────────────────────┘
```
@@ -0,0 +1,87 @@
# CI/CD Pipeline Template
Save as `_docs/02_plans/deployment/ci_cd_pipeline.md`.
---
```markdown
# [System Name] — CI/CD Pipeline
## Pipeline Overview
| Stage | Trigger | Quality Gate |
|-------|---------|-------------|
| Lint | Every push | Zero lint errors |
| Test | Every push | 75%+ coverage, all tests pass |
| Security | Every push | Zero critical/high CVEs |
| Build | PR merge to dev | Docker build succeeds |
| Push | After build | Images pushed to registry |
| Deploy Staging | After push | Health checks pass |
| Smoke Tests | After staging deploy | Critical paths pass |
| Deploy Production | Manual approval | Health checks pass |
## Stage Details
### Lint
- [Language-specific linters and formatters]
- Runs in parallel per language
### Test
- Unit tests: [framework and command]
- Integration tests: [framework and command, uses docker-compose.test.yml]
- Coverage threshold: 75% overall, 90% critical paths
- Coverage report published as pipeline artifact
### Security
- Dependency audit: [tool, e.g., npm audit / pip-audit / dotnet list package --vulnerable]
- SAST scan: [tool, e.g., Semgrep / SonarQube]
- Image scan: Trivy on built Docker images
- Block on: critical or high severity findings
### Build
- Docker images built using multi-stage Dockerfiles
- Tagged with git SHA: `<registry>/<component>:<sha>`
- Build cache: Docker layer cache via CI cache action
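The SHA-based tag above can be assembled in a single pipeline step; a minimal sketch, where the registry and component names are placeholders:

```shell
# Sketch: build the image tag from the commit SHA (placeholder names).
REGISTRY="ghcr.io/org"
COMPONENT="api"
GIT_SHA="${GIT_SHA:-$(git rev-parse --short HEAD 2>/dev/null || echo dev)}"
IMAGE="${REGISTRY}/${COMPONENT}:${GIT_SHA}"
echo "building ${IMAGE}"
# docker build -t "${IMAGE}" "./${COMPONENT}"   # enable in CI
```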
### Push
- Registry: [container registry URL]
- Authentication: [method]
### Deploy Staging
- Deployment method: [docker compose / Kubernetes / cloud service]
- Pre-deploy: run database migrations
- Post-deploy: verify health check endpoints
- Automated rollback on health check failure
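The post-deploy health verification can be scripted as a bounded poll; a sketch, with the URL, attempt count, and interval as placeholders:

```shell
# Sketch: poll the readiness endpoint until it answers, or give up.
wait_for_ready() {
  url="$1"; attempts="${2:-30}"; i=1
  while [ "$i" -le "$attempts" ]; do
    if curl -fsS "$url" >/dev/null 2>&1; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    i=$((i + 1))
    sleep 2
  done
  echo "health check timed out" >&2
  return 1
}
# wait_for_ready "https://staging.example.com/health/ready" 30
```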
### Smoke Tests
- Subset of integration tests targeting staging environment
- Validates critical user flows
- Timeout: [maximum duration]
### Deploy Production
- Requires manual approval via [mechanism]
- Deployment strategy: [blue-green / rolling / canary]
- Pre-deploy: database migration review
- Post-deploy: health checks + monitoring for 15 min
## Caching Strategy
| Cache | Key | Restore Keys |
|-------|-----|-------------|
| Dependencies | [lockfile hash] | [partial match] |
| Docker layers | [Dockerfile hash] | [partial match] |
| Build artifacts | [source hash] | [partial match] |
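The lockfile-hash key in the first row can be derived like this; a sketch, with the lockfile name as a placeholder:

```shell
# Sketch: content-addressed dependency cache key from a lockfile hash.
cache_key() {
  printf 'deps-%s\n' "$(sha256sum "$1" | cut -c1-16)"
}
# Usage: cache_key package-lock.json
```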
## Parallelization
[Diagram or description of which stages run concurrently]
## Notifications
| Event | Channel | Recipients |
|-------|---------|-----------|
| Build failure | [Slack/email] | [team] |
| Security alert | [Slack/email] | [team + security] |
| Deploy success | [Slack] | [team] |
| Deploy failure | [Slack/email + PagerDuty] | [on-call] |
```
@@ -0,0 +1,94 @@
# Containerization Plan Template
Save as `_docs/02_plans/deployment/containerization.md`.
---
```markdown
# [System Name] — Containerization
## Component Dockerfiles
### [Component Name]
| Property | Value |
|----------|-------|
| Base image | [e.g., mcr.microsoft.com/dotnet/aspnet:8.0-alpine] |
| Build image | [e.g., mcr.microsoft.com/dotnet/sdk:8.0-alpine] |
| Stages | [dependency install → build → production] |
| User | [non-root user name] |
| Health check | [endpoint and command] |
| Exposed ports | [port list] |
| Key build args | [if any] |
### [Repeat for each component]
## Docker Compose — Local Development
```yaml
# docker-compose.yml structure
services:
[component]:
build: ./[path]
ports: ["host:container"]
environment: [reference .env.dev]
depends_on: [dependencies with health condition]
healthcheck: [command, interval, timeout, retries]
db:
image: [postgres:version-alpine]
volumes: [named volume]
environment: [credentials from .env.dev]
healthcheck: [pg_isready]
volumes:
[named volumes]
networks:
[shared network]
```
## Docker Compose — Integration Tests
```yaml
# docker-compose.test.yml structure
services:
[app components under test]
test-runner:
build: ./tests/integration
depends_on: [app components with health condition]
environment: [test configuration]
# Exit code determines test pass/fail
db:
image: [postgres:version-alpine]
volumes: [seed data mount]
```
Run: `docker compose -f docker-compose.test.yml up --abort-on-container-exit`
## Image Tagging Strategy
| Context | Tag Format | Example |
|---------|-----------|---------|
| CI build | `<registry>/<project>/<component>:<git-sha>` | `ghcr.io/org/api:a1b2c3d` |
| Release | `<registry>/<project>/<component>:<semver>` | `ghcr.io/org/api:1.2.0` |
| Local dev | `<component>:latest` | `api:latest` |
## .dockerignore
```
.git
.cursor
_docs
_standalone
node_modules
**/bin
**/obj
**/__pycache__
*.md
.env*
docker-compose*.yml
```
```
@@ -0,0 +1,103 @@
# Deployment Procedures Template
Save as `_docs/02_plans/deployment/deployment_procedures.md`.
---
```markdown
# [System Name] — Deployment Procedures
## Deployment Strategy
**Pattern**: [blue-green / rolling / canary]
**Rationale**: [why this pattern fits the architecture]
**Zero-downtime**: required for production deployments
### Graceful Shutdown
- Grace period: 30 seconds for in-flight requests
- Sequence: stop accepting new requests → drain connections → shutdown
- Container orchestrator: `terminationGracePeriodSeconds: 40`
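One way to honor the drain sequence is a thin entrypoint wrapper that forwards termination signals to the app instead of letting the shell swallow them; a sketch, with the app command as a placeholder:

```shell
# Sketch: forward SIGTERM to the app so it can drain before exiting.
app_cmd="${APP_CMD:-sleep 1}"   # placeholder; your real server command here
$app_cmd &
app_pid=$!
trap 'kill -TERM "$app_pid" 2>/dev/null' TERM INT
wait "$app_pid"
status=$?
echo "app exited with status $status"
```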
### Database Migration Ordering
- Migrations run **before** new code deploys
- All migrations must be backward-compatible (old code works with new schema)
- Irreversible migrations require explicit approval
## Health Checks
| Check | Type | Endpoint | Interval | Failure Threshold | Action |
|-------|------|----------|----------|-------------------|--------|
| Liveness | HTTP GET | `/health/live` | 10s | 3 failures | Restart container |
| Readiness | HTTP GET | `/health/ready` | 5s | 3 failures | Remove from load balancer |
| Startup | HTTP GET | `/health/ready` | 5s | 30 attempts | Kill and recreate |
### Health Check Responses
- `/health/live`: returns 200 if process is running (no dependency checks)
- `/health/ready`: returns 200 if all dependencies (DB, cache, queues) are reachable
## Staging Deployment
1. CI/CD builds and pushes Docker images tagged with git SHA
2. Run database migrations against staging
3. Deploy new images to staging environment
4. Wait for health checks to pass (readiness probe)
5. Run smoke tests against staging
6. If smoke tests fail: automatic rollback to previous image
## Production Deployment
1. **Approval**: manual approval required via [mechanism]
2. **Pre-deploy checks**:
- [ ] Staging smoke tests passed
- [ ] Security scan clean
- [ ] Database migration reviewed
- [ ] Monitoring alerts configured
- [ ] Rollback plan confirmed
3. **Deploy**: apply deployment strategy (blue-green / rolling / canary)
4. **Verify**: health checks pass, error rate stable, latency within baseline
5. **Monitor**: observe dashboards for 15 minutes post-deploy
6. **Finalize**: mark deployment as successful or trigger rollback
## Rollback Procedures
### Trigger Criteria
- Health check failures persist after deploy
- Error rate exceeds 5% for more than 5 minutes
- Critical alert fires within 15 minutes of deploy
- Manual decision by on-call engineer
### Rollback Steps
1. Redeploy previous Docker image tag (from CI/CD artifact)
2. Verify health checks pass
3. If database migration was applied:
- Run DOWN migration if reversible
- If irreversible: assess data impact, escalate if needed
4. Notify stakeholders
5. Schedule post-mortem within 24 hours
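Step 1 presumes the previous good image tag is known; a minimal sketch of recording it on each successful deploy and reading it back on rollback (the file name and commands are placeholders):

```shell
# Sketch: persist the last known-good image tag for rollback.
record_good_tag() {
  printf '%s\n' "$1" > .last_good_tag
}
last_good_tag() {
  cat .last_good_tag 2>/dev/null || {
    echo "no known-good tag recorded" >&2
    return 1
  }
}
# On successful deploy: record_good_tag "ghcr.io/org/api:a1b2c3d"
# On rollback: redeploy the image named by "$(last_good_tag)"
```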
### Post-Mortem
Required after every production rollback:
- Timeline of events
- Root cause
- What went wrong
- Prevention measures
## Deployment Checklist
- [ ] All tests pass in CI
- [ ] Security scan clean (zero critical/high CVEs)
- [ ] Docker images built and pushed
- [ ] Database migrations reviewed and tested
- [ ] Environment variables configured for target environment
- [ ] Health check endpoints verified
- [ ] Monitoring alerts configured
- [ ] Rollback plan documented and tested
- [ ] Stakeholders notified of deployment window
- [ ] On-call engineer available during deployment
```
@@ -0,0 +1,61 @@
# Environment Strategy Template
Save as `_docs/02_plans/deployment/environment_strategy.md`.
---
```markdown
# [System Name] — Environment Strategy
## Environments
| Environment | Purpose | Infrastructure | Data Source |
|-------------|---------|---------------|-------------|
| Development | Local developer workflow | docker-compose | Seed data, mocked externals |
| Staging | Pre-production validation | [mirrors production] | Anonymized production-like data |
| Production | Live system | [full infrastructure] | Real data |
## Environment Variables
### Required Variables
| Variable | Purpose | Dev Default | Staging/Prod Source |
|----------|---------|-------------|-------------------|
| `DATABASE_URL` | Postgres connection | `postgres://dev:dev@db:5432/app` | Secret manager |
| [add all required variables] | | | |
### `.env.example`
```env
# Copy to .env and fill in values
DATABASE_URL=postgres://user:pass@host:5432/dbname
# [all required variables with placeholder values]
```
### Variable Validation
All services validate required environment variables at startup and fail fast with a clear error message if any are missing.
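The fail-fast check can be a few lines in the service entrypoint; a sketch, where the variable list is an example to extend per service:

```shell
# Sketch: verify required environment variables before starting the app.
check_env() {
  missing=""
  for v in "$@"; do
    eval "val=\${$v:-}"
    [ -n "$val" ] || missing="$missing $v"
  done
  if [ -n "$missing" ]; then
    echo "FATAL: missing required environment variables:$missing" >&2
    return 1
  fi
}
# In the entrypoint: check_env DATABASE_URL || exit 1
```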
## Secrets Management
| Environment | Method | Tool |
|-------------|--------|------|
| Development | `.env` file (git-ignored) | dotenv |
| Staging | Secret manager | [AWS Secrets Manager / Azure Key Vault / Vault] |
| Production | Secret manager | [AWS Secrets Manager / Azure Key Vault / Vault] |
Rotation policy: [frequency and procedure]
## Database Management
| Environment | Type | Migrations | Data |
|-------------|------|-----------|------|
| Development | Docker Postgres, named volume | Applied on container start | Seed data via init script |
| Staging | Managed Postgres | Applied via CI/CD pipeline | Anonymized production snapshot |
| Production | Managed Postgres | Applied via CI/CD with approval | Live data |
Migration rules:
- All migrations must be backward-compatible (support old and new code simultaneously)
- Reversible migrations required (DOWN/rollback script)
- Production migrations require review before apply
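The reversibility rule can be enforced mechanically in CI; a sketch assuming a `NNN_name.up.sql` / `NNN_name.down.sql` naming convention:

```shell
# Sketch: every UP migration must ship with a matching DOWN script.
check_reversible() {
  dir="$1"; rc=0
  for up in "$dir"/*.up.sql; do
    [ -e "$up" ] || continue   # no migrations present at all
    down="${up%.up.sql}.down.sql"
    if [ ! -f "$down" ]; then
      echo "missing rollback script for $up" >&2
      rc=1
    fi
  done
  return "$rc"
}
# check_reversible ./migrations
```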
```
@@ -0,0 +1,132 @@
# Observability Template
Save as `_docs/02_plans/deployment/observability.md`.
---
```markdown
# [System Name] — Observability
## Logging
### Format
Structured JSON to stdout/stderr. No file-based logging in containers.
```json
{
"timestamp": "ISO8601",
"level": "INFO",
"service": "service-name",
"correlation_id": "uuid",
"message": "Event description",
"context": {}
}
```
### Log Levels
| Level | Usage | Example |
|-------|-------|---------|
| ERROR | Exceptions, failures requiring attention | Database connection failed |
| WARN | Potential issues, degraded performance | Retry attempt 2/3 |
| INFO | Significant business events | User registered, Order placed |
| DEBUG | Detailed diagnostics (dev/staging only) | Request payload, Query params |
### Retention
| Environment | Destination | Retention |
|-------------|-------------|-----------|
| Development | Console | Session |
| Staging | [log aggregator] | 7 days |
| Production | [log aggregator] | 30 days |
### PII Rules
- Never log passwords, tokens, or session IDs
- Mask email addresses and personal identifiers
- Log user IDs (opaque) instead of usernames
## Metrics
### Endpoints
Every service exposes Prometheus-compatible metrics at `/metrics`.
### Application Metrics
| Metric | Type | Description |
|--------|------|-------------|
| `request_count` | Counter | Total HTTP requests by method, path, status |
| `request_duration_seconds` | Histogram | Response time by method, path |
| `error_count` | Counter | Failed requests by type |
| `active_connections` | Gauge | Current open connections |
### System Metrics
- CPU usage, Memory usage, Disk I/O, Network I/O
### Business Metrics
| Metric | Type | Description | Source |
|--------|------|-------------|--------|
| [from acceptance criteria] | | | |
Collection interval: 15 seconds
## Distributed Tracing
### Configuration
- SDK: OpenTelemetry
- Propagation: W3C Trace Context via HTTP headers
- Span naming: `<service>.<operation>`
### Sampling
| Environment | Rate | Rationale |
|-------------|------|-----------|
| Development | 100% | Full visibility |
| Staging | 100% | Full visibility |
| Production | 10% | Balance cost vs observability |
### Integration Points
- HTTP requests: automatic instrumentation
- Database queries: automatic instrumentation
- Message queues: manual span creation on publish/consume
## Alerting
| Severity | Response Time | Conditions |
|----------|---------------|-----------|
| Critical | 5 min | Service unreachable, health check failed for 1 min, data loss detected |
| High | 30 min | Error rate > 5% for 5 min, P95 latency > 2x baseline for 10 min |
| Medium | 4 hours | Disk usage > 80%, elevated latency, connection pool exhaustion |
| Low | Next business day | Non-critical warnings, deprecated API usage |
### Notification Channels
| Severity | Channel |
|----------|---------|
| Critical | [PagerDuty / phone] |
| High | [Slack + email] |
| Medium | [Slack] |
| Low | [Dashboard only] |
## Dashboards
### Operations Dashboard
- Service health status (up/down per component)
- Request rate and error rate
- Response time percentiles (P50, P95, P99)
- Resource utilization (CPU, memory per container)
- Active alerts
### Business Dashboard
- [Key business metrics from acceptance criteria]
- [User activity indicators]
- [Transaction volumes]
```
@@ -1,7 +1,7 @@
---
name: plan
description: |
Decompose a solution into architecture, system flows, components, tests, and Jira epics.
Decompose a solution into architecture, data model, deployment plan, system flows, components, tests, and Jira epics.
Systematic 6-step planning workflow with BLOCKING gates, self-verification, and structured artifact management.
Uses _docs/ + _docs/02_plans/ structure.
Trigger phrases:
@@ -15,7 +15,7 @@ disable-model-invocation: true
# Solution Planning
Decompose a problem and solution into architecture, system flows, components, tests, and Jira epics through a systematic 6-step workflow.
Decompose a problem and solution into architecture, data model, deployment plan, system flows, components, tests, and Jira epics through a systematic 6-step workflow.
## Core Principles
@@ -91,6 +91,13 @@ PLANS_DIR/
│ └── traceability_matrix.md
├── architecture.md
├── system-flows.md
├── data_model.md
├── deployment/
│ ├── containerization.md
│ ├── ci_cd_pipeline.md
│ ├── environment_strategy.md
│ ├── observability.md
│ └── deployment_procedures.md
├── risk_mitigations.md
├── risk_mitigations_02.md (iterative, ## as sequence)
├── components/
@@ -124,6 +131,8 @@ PLANS_DIR/
| Step 1 | Integration traceability matrix | `integration_tests/traceability_matrix.md` |
| Step 2 | Architecture analysis complete | `architecture.md` |
| Step 2 | System flows documented | `system-flows.md` |
| Step 2 | Data model documented | `data_model.md` |
| Step 2 | Deployment plan complete | `deployment/` (5 files) |
| Step 3 | Each component analyzed | `components/[##]_[name]/description.md` |
| Step 3 | Common helpers generated | `common-helpers/[##]_helper_[name].md` |
| Step 3 | Diagrams generated | `diagrams/` |
@@ -210,9 +219,11 @@ Capture any new questions, findings, or insights that arise during test specific
### Step 2: Solution Analysis
**Role**: Professional software architect
**Goal**: Produce `architecture.md` and `system-flows.md` from the solution draft
**Goal**: Produce `architecture.md`, `system-flows.md`, `data_model.md`, and `deployment/` from the solution draft
**Constraints**: No code, no component-level detail yet; focus on system-level view
#### Phase 2a: Architecture & Flows
1. Read all input files thoroughly
2. Incorporate findings, questions, and insights discovered during Step 1 (integration tests)
3. Research unknown or questionable topics via internet; ask user about ambiguities
@@ -230,6 +241,56 @@ Capture any new questions, findings, or insights that arise during test specific
**BLOCKING**: Present architecture summary to user. Do NOT proceed until user confirms.
#### Phase 2b: Data Model
**Role**: Professional software architect
**Goal**: Produce a detailed data model document covering entities, relationships, and migration strategy
1. Extract core entities from architecture.md and solution.md
2. Define entity attributes, types, and constraints
3. Define relationships between entities (Mermaid ERD)
4. Define migration strategy: versioning tool (EF Core migrations / Alembic / sql-migrate), reversibility requirement, naming convention
5. Define seed data requirements per environment (dev, staging)
6. Define backward compatibility approach for schema changes (additive-only by default)
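A minimal example of the Mermaid ERD expected in step 3, using hypothetical `User` and `Order` entities:

```mermaid
erDiagram
    USER ||--o{ ORDER : places
    USER {
        uuid id PK
        string email
        timestamptz created_at
    }
    ORDER {
        uuid id PK
        uuid user_id FK
        int total_cents
        timestamptz placed_at
    }
```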
**Self-verification**:
- [ ] Every entity mentioned in architecture.md is defined
- [ ] Relationships are explicit with cardinality
- [ ] Migration strategy specifies reversibility requirement
- [ ] Seed data requirements defined
- [ ] Backward compatibility approach documented
**Save action**: Write `data_model.md`
#### Phase 2c: Deployment Planning
**Role**: DevOps / Platform engineer
**Goal**: Produce deployment plan covering containerization, CI/CD, environment strategy, observability, and deployment procedures
Use the `/deploy` skill's templates as structure for each artifact:
1. Read architecture.md and restrictions.md for infrastructure constraints
2. Research Docker best practices for the project's tech stack
3. Define containerization plan: Dockerfile per component, docker-compose for dev and tests
4. Define CI/CD pipeline: stages, quality gates, caching, parallelization
5. Define environment strategy: dev, staging, production with secrets management
6. Define observability: structured logging, metrics, tracing, alerting
7. Define deployment procedures: strategy, health checks, rollback, checklist
**Self-verification**:
- [ ] Every component has a Docker specification
- [ ] CI/CD pipeline covers lint, test, security, build, deploy
- [ ] Environment strategy covers dev, staging, production
- [ ] Observability covers logging, metrics, tracing, alerting
- [ ] Deployment procedures include rollback and health checks
**Save action**: Write all 5 files under `deployment/`:
- `containerization.md`
- `ci_cd_pipeline.md`
- `environment_strategy.md`
- `observability.md`
- `deployment_procedures.md`
---
### Step 3: Component Decomposition
@@ -375,6 +436,20 @@ Before writing the final report, verify ALL of the following:
- [ ] Deployment model is defined
- [ ] Integration test findings are reflected in architecture decisions
### Data Model
- [ ] Every entity from architecture.md is defined
- [ ] Relationships have explicit cardinality
- [ ] Migration strategy with reversibility requirement
- [ ] Seed data requirements defined
- [ ] Backward compatibility approach documented
### Deployment
- [ ] Containerization plan covers all components
- [ ] CI/CD pipeline includes lint, test, security, build, deploy stages
- [ ] Environment strategy covers dev, staging, production
- [ ] Observability covers logging, metrics, tracing, alerting
- [ ] Deployment procedures include rollback and health checks
### Components
- [ ] Every component follows SRP
- [ ] No circular dependencies
@@ -441,8 +516,10 @@ Before writing the final report, verify ALL of the following:
│ │
│ 1. Integration Tests → integration_tests/ (5 files) │
│ [BLOCKING: user confirms test coverage] │
│ 2. Solution Analysis → architecture.md, system-flows.md │
│ 2a. Architecture → architecture.md, system-flows.md │
│ [BLOCKING: user confirms architecture] │
│ 2b. Data Model → data_model.md │
│ 2c. Deployment → deployment/ (5 files) │
│ 3. Component Decompose → components/[##]_[name]/description │
│ [BLOCKING: user confirms decomposition] │
│ 4. Review & Risk → risk_mitigations.md │
@@ -0,0 +1,174 @@
---
name: retrospective
description: |
Collect metrics from implementation batch reports and code review findings, analyze trends across cycles,
and produce improvement reports with actionable recommendations.
3-step workflow: collect metrics, analyze trends, produce report.
Outputs to _docs/05_metrics/.
Trigger phrases:
- "retrospective", "retro", "run retro"
- "metrics review", "feedback loop"
- "implementation metrics", "analyze trends"
category: evolve
tags: [retrospective, metrics, trends, improvement, feedback-loop]
disable-model-invocation: true
---
# Retrospective
Collect metrics from implementation artifacts, analyze trends across development cycles, and produce actionable improvement reports.
## Core Principles
- **Data-driven**: conclusions come from metrics, not impressions
- **Actionable**: every finding must have a concrete improvement suggestion
- **Cumulative**: each retrospective compares against previous ones to track progress
- **Save immediately**: write artifacts to disk after each step
- **Non-judgmental**: focus on process improvement, not blame
## Context Resolution
Fixed paths:
- IMPL_DIR: `_docs/03_implementation/`
- METRICS_DIR: `_docs/05_metrics/`
- TASKS_DIR: `_docs/02_tasks/`
Announce the resolved paths to the user before proceeding.
## Prerequisite Checks (BLOCKING)
1. `IMPL_DIR` exists and contains at least one `batch_*_report.md` — **STOP if missing** (nothing to analyze)
2. Create METRICS_DIR if it does not exist
3. Check for previous retrospective reports in METRICS_DIR to enable trend comparison
## Artifact Management
### Directory Structure
```
METRICS_DIR/
├── retro_[YYYY-MM-DD].md
├── retro_[YYYY-MM-DD].md
└── ...
```
## Progress Tracking
At the start of execution, create a TodoWrite with all steps (1 through 3). Update status as each step completes.
## Workflow
### Step 1: Collect Metrics
**Role**: Data analyst
**Goal**: Parse all implementation artifacts and extract quantitative metrics
**Constraints**: Collection only — no interpretation yet
#### Sources
| Source | Metrics Extracted |
|--------|------------------|
| `batch_*_report.md` | Tasks per batch, batch count, task statuses (Done/Blocked/Partial) |
| Code review sections in batch reports | PASS/FAIL/PASS_WITH_WARNINGS ratios, finding counts by severity and category |
| Task spec files in TASKS_DIR | Complexity points per task, dependency count |
| `FINAL_implementation_report.md` | Total tasks, total batches, overall duration |
| Git log (if available) | Commits per batch, files changed per batch |
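Verdict extraction from batch reports can be as simple as a grep tally; a sketch assuming verdicts appear as literal `PASS` / `FAIL` / `PASS_WITH_WARNINGS` tokens in the reports:

```shell
# Sketch: tally code-review verdicts across all batch reports.
tally_verdicts() {
  grep -rhoE 'PASS_WITH_WARNINGS|PASS|FAIL' "$1"/batch_*_report.md 2>/dev/null \
    | sort | uniq -c | sort -rn
}
# tally_verdicts _docs/03_implementation
```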
#### Metrics to Compute
**Implementation Metrics**:
- Total tasks implemented
- Total batches executed
- Average tasks per batch
- Average complexity points per batch
- Total complexity points delivered
**Quality Metrics**:
- Code review pass rate (PASS / total reviews)
- Code review findings by severity: Critical, High, Medium, Low counts
- Code review findings by category: Bug, Spec-Gap, Security, Performance, Maintainability, Style, Scope
- FAIL count (batches that required user intervention)
**Efficiency Metrics**:
- Blocked task count and reasons
- Tasks completed on first attempt vs requiring fixes
- Batch with most findings (identify problem areas)
**Self-verification**:
- [ ] All batch reports parsed
- [ ] All metric categories computed
- [ ] No batch reports missed
---
### Step 2: Analyze Trends
**Role**: Process improvement analyst
**Goal**: Identify patterns, recurring issues, and improvement opportunities
**Constraints**: Analysis must be grounded in the metrics from Step 1
1. If previous retrospective reports exist in METRICS_DIR, load the most recent one for comparison
2. Identify patterns:
- **Recurring findings**: which code review categories appear most frequently?
- **Problem components**: which components/files generate the most findings?
- **Complexity accuracy**: do high-complexity tasks actually produce more issues?
- **Blocker patterns**: what types of blockers occur and can they be prevented?
3. Compare against previous retrospective (if exists):
- Which metrics improved?
- Which metrics degraded?
- Were previous improvement actions effective?
4. Identify top 3 improvement actions ranked by impact
**Self-verification**:
- [ ] Patterns are grounded in specific metrics
- [ ] Comparison with previous retro included (if exists)
- [ ] Top 3 actions are concrete and actionable
---
### Step 3: Produce Report
**Role**: Technical writer
**Goal**: Write a structured retrospective report with metrics, trends, and recommendations
**Constraints**: Concise, data-driven, actionable
Write `METRICS_DIR/retro_[YYYY-MM-DD].md` using `templates/retrospective-report.md` as structure.
**Self-verification**:
- [ ] All metrics from Step 1 included
- [ ] Trend analysis from Step 2 included
- [ ] Top 3 improvement actions clearly stated
- [ ] Suggested rule/skill updates are specific
**Save action**: Write `retro_[YYYY-MM-DD].md`
Present the report summary to the user.
---
## Escalation Rules
| Situation | Action |
|-----------|--------|
| No batch reports exist | **STOP** — nothing to analyze |
| Batch reports have inconsistent format | **WARN user**, extract what is available |
| No previous retrospective for comparison | PROCEED — report baseline metrics only |
| Metrics suggest systemic issue (>50% FAIL rate) | **WARN user** — suggest immediate process review |
## Methodology Quick Reference
```
┌────────────────────────────────────────────────────────────────┐
│ Retrospective (3-Step Method) │
├────────────────────────────────────────────────────────────────┤
│ PREREQ: batch reports exist in _docs/03_implementation/ │
│ │
│ 1. Collect Metrics → parse batch reports, compute metrics │
│ 2. Analyze Trends → patterns, comparison, improvement areas │
│ 3. Produce Report → _docs/05_metrics/retro_[date].md │
├────────────────────────────────────────────────────────────────┤
│ Principles: Data-driven · Actionable · Cumulative │
│ Non-judgmental · Save immediately │
└────────────────────────────────────────────────────────────────┘
```
@@ -0,0 +1,93 @@
# Retrospective Report Template
Save as `_docs/05_metrics/retro_[YYYY-MM-DD].md`.
---
```markdown
# Retrospective — [YYYY-MM-DD]
## Implementation Summary
| Metric | Value |
|--------|-------|
| Total tasks | [count] |
| Total batches | [count] |
| Total complexity points | [sum] |
| Avg tasks per batch | [value] |
| Avg complexity per batch | [value] |
## Quality Metrics
### Code Review Results
| Verdict | Count | Percentage |
|---------|-------|-----------|
| PASS | [count] | [%] |
| PASS_WITH_WARNINGS | [count] | [%] |
| FAIL | [count] | [%] |
### Findings by Severity
| Severity | Count |
|----------|-------|
| Critical | [count] |
| High | [count] |
| Medium | [count] |
| Low | [count] |
### Findings by Category
| Category | Count | Top Files |
|----------|-------|-----------|
| Bug | [count] | [most affected files] |
| Spec-Gap | [count] | [most affected files] |
| Security | [count] | [most affected files] |
| Performance | [count] | [most affected files] |
| Maintainability | [count] | [most affected files] |
| Style | [count] | [most affected files] |
| Scope | [count] | [most affected files] |
## Efficiency
| Metric | Value |
|--------|-------|
| Blocked tasks | [count] |
| Tasks requiring fixes after review | [count] |
| Batch with most findings | Batch [N] — [reason] |
### Blocker Analysis
| Blocker Type | Count | Prevention |
|-------------|-------|-----------|
| [type] | [count] | [suggested prevention] |
## Trend Comparison
| Metric | Previous | Current | Change |
|--------|----------|---------|--------|
| Pass rate | [%] | [%] | [+/-] |
| Avg findings per batch | [value] | [value] | [+/-] |
| Blocked tasks | [count] | [count] | [+/-] |
*Previous retrospective: [date or "N/A — first retro"]*
## Top 3 Improvement Actions
1. **[Action title]**: [specific, actionable description]
- Impact: [expected improvement]
- Effort: [low/medium/high]
2. **[Action title]**: [specific, actionable description]
- Impact: [expected improvement]
- Effort: [low/medium/high]
3. **[Action title]**: [specific, actionable description]
- Impact: [expected improvement]
- Effort: [low/medium/high]
## Suggested Rule/Skill Updates
| File | Change | Rationale |
|------|--------|-----------|
| [.cursor/rules/... or .cursor/skills/...] | [specific change] | [based on which metric] |
```
@@ -49,19 +49,8 @@ When testing security or conducting audits:
- Validating input sanitization
- Reviewing security configuration
### OWASP Top 10 (2021)
| # | Vulnerability | Key Test |
|---|---------------|----------|
| 1 | Broken Access Control | User A accessing User B's data |
| 2 | Cryptographic Failures | Plaintext passwords, HTTP |
| 3 | Injection | SQL/XSS/command injection |
| 4 | Insecure Design | Rate limiting, session timeout |
| 5 | Security Misconfiguration | Verbose errors, exposed /admin |
| 6 | Vulnerable Components | npm audit, outdated packages |
| 7 | Auth Failures | Weak passwords, no MFA |
| 8 | Integrity Failures | Unsigned updates, malware |
| 9 | Logging Failures | No audit trail for breaches |
| 10 | SSRF | Server fetching internal URLs |
### OWASP Top 10
Use the most recent **stable** version of the OWASP Top 10. At the start of each security audit, research the current version at https://owasp.org/www-project-top-ten/ and test against all listed categories. Do not rely on a hardcoded list — the OWASP Top 10 is updated periodically and the current version must be verified.
### Tools
| Type | Tool | Purpose |