Refactor README and command documentation to streamline deployment and CI/CD processes. Consolidate deployment strategies and remove obsolete commands related to CI/CD and observability. Enhance task decomposition workflow by adding data model and deployment planning sections, and update directory structures for improved clarity.

Oleksandr Bezdieniezhnykh
2026-03-19 12:10:11 +02:00
parent 5b1739186e
commit cfd09c79e1
17 changed files with 1314 additions and 313 deletions
+40 -37
@@ -20,23 +20,20 @@
5. /implement — auto-orchestrates all tasks: batches by dependencies, launches parallel implementers, runs code review, loops until done
6. /implement-black-box-tests — E2E tests via Docker consumer app (after all tasks)
7. commit & push
6. commit & push
```
### SHIP (deploy and operate)
```
8. /implement-cicd — validate/enhance CI/CD pipeline
9. /deploy — deployment strategy per environment
10. /observability — monitoring, logging, alerting plan
7. /deploy — containerization, CI/CD, environment strategy, observability, deployment procedures (skill, 5-step workflow)
```
### EVOLVE (maintenance and improvement)
```
11. /refactor — structured refactoring (skill, 6-phase workflow)
8. /refactor — structured refactoring (skill, 6-phase workflow)
9. /retrospective — collect metrics, analyze trends, produce improvement report (skill)
```
## Implementation Flow
@@ -67,29 +64,25 @@ Multi-phase code review invoked after each implementation batch:
Produces structured findings with severity (Critical/High/Medium/Low) and verdict (PASS/FAIL/PASS_WITH_WARNINGS).
### `/implement-black-box-tests`
Reads `_docs/02_plans/integration_tests/` (produced by plan skill Step 1). Builds a separate Docker-based consumer app that exercises the system as a black box — no internal imports, no direct DB access. Runs E2E scenarios, produces a CSV test report.
Run after all tasks are done.
### `/implement-cicd`
Reviews existing CI/CD pipeline configuration, validates all stages work, optimizes performance (parallelization, caching), ensures quality gates are enforced (coverage, linting, security scanning).
Run after `/implement` or after all tasks.
### `/deploy`
Defines deployment strategy per environment: deployment procedures, rollback procedures, health checks, deployment checklist. Outputs `_docs/02_components/deployment_strategy.md`.
Comprehensive deployment skill (5-step workflow). Outputs to `_docs/02_plans/deployment/`. Run before first production release:
1. Containerization — Dockerfiles per component, docker-compose for dev and tests
2. CI/CD Pipeline — lint, test, security scan, build, deploy with quality gates
3. Environment Strategy — dev, staging, production with secrets management
4. Observability — structured logging, metrics, tracing, alerting, dashboards
5. Deployment Procedures — rollback, health checks, graceful shutdown
### `/observability`
Plans logging strategy, metrics collection, distributed tracing, alerting rules, and dashboards. Outputs `_docs/02_components/observability_plan.md`. Run after `/implement` or before first production release.
### `/retrospective`
Collects metrics from batch reports and code review findings, analyzes trends across implementation cycles, and produces improvement reports. Outputs to `_docs/05_metrics/`.
### `/rollback`
Reverts implementation to a specific batch checkpoint using git revert, resets Jira ticket statuses, and verifies rollback integrity with tests.
### Commit
@@ -106,6 +99,8 @@ After each confirmed batch, the `/implement` skill automatically commits and pushes
| **code-review** | "code review", "review code" | 6-phase structured review with findings |
| **refactor** | "refactor", "refactoring", "improve code" | 6-phase structured refactoring workflow |
| **security** | "security audit", "OWASP" | OWASP-based security testing |
| **deploy** | "deploy", "CI/CD", "containerize", "observability" | Containerization, CI/CD, observability, deployment procedures |
| **retrospective** | "retrospective", "retro", "metrics review" | Collect metrics, analyze trends, produce improvement report |
## Project Folder Structure
@@ -134,6 +129,7 @@ _docs/
├── 02_plans/
│ ├── architecture.md
│ ├── system-flows.md
│ ├── data_model.md
│ ├── risk_mitigations.md
│ ├── components/
│ │ └── [##]_[name]/
@@ -146,6 +142,12 @@ _docs/
│ │ ├── functional_tests.md
│ │ ├── non_functional_tests.md
│ │ └── traceability_matrix.md
│ ├── deployment/
│ │ ├── containerization.md
│ │ ├── ci_cd_pipeline.md
│ │ ├── environment_strategy.md
│ │ ├── observability.md
│ │ └── deployment_procedures.md
│ ├── diagrams/
│ └── FINAL_report.md
├── 02_tasks/
@@ -158,15 +160,17 @@ _docs/
│ ├── batch_02_report.md
│ ├── ...
│ └── FINAL_implementation_report.md
├── 04_refactoring/
├── baseline_metrics.md
├── discovery/
├── analysis/
├── test_specs/
├── coupling_analysis.md
├── execution_log.md
├── hardening/
└── FINAL_report.md
└── 05_metrics/
└── retro_[date].md
```
## Implementation Tools
@@ -176,10 +180,9 @@ _docs/
| `implementer` | Subagent | Implements a single task from its spec. Launched by /implement. |
| `/implement` | Skill | Orchestrates all tasks: dependency batching, parallel agents, code review. |
| `/code-review` | Skill | Multi-phase code review with structured findings. |
| `/implement-black-box-tests` | Command | E2E tests via Docker consumer app. After all tasks. |
| `/implement-cicd` | Command | Validate and enhance CI/CD pipeline. |
| `/deploy` | Command | Plan deployment strategy per environment. |
| `/observability` | Command | Plan logging, metrics, tracing, alerting. |
| `/deploy` | Skill | Containerization, CI/CD, observability, deployment procedures. |
| `/retrospective` | Skill | Collect metrics, analyze trends, produce improvement reports. |
| `/rollback` | Command | Revert to a batch checkpoint with Jira status reset. |
## Automations (Planned)
-71
@@ -1,71 +0,0 @@
# Deployment Strategy Planning
## Initial data:
- Problem description: `@_docs/00_problem/problem_description.md`
- Restrictions: `@_docs/00_problem/restrictions.md`
- Full Solution Description: `@_docs/01_solution/solution.md`
- Components: `@_docs/02_components`
- Environment Strategy: `@_docs/00_templates/environment_strategy.md`
## Role
You are a DevOps/Platform engineer
## Task
- Define deployment strategy for each environment
- Plan deployment procedures and automation
- Define rollback procedures
- Establish deployment verification steps
- Document manual intervention points
## Output
### Deployment Architecture
- Infrastructure diagram (where components run)
- Network topology
- Load balancing strategy
- Container/VM configuration
### Deployment Procedures
#### Staging Deployment
- Trigger conditions
- Pre-deployment checks
- Deployment steps
- Post-deployment verification
- Smoke tests to run
#### Production Deployment
- Approval workflow
- Deployment window
- Pre-deployment checks
- Deployment steps (blue-green, rolling, canary)
- Post-deployment verification
- Smoke tests to run
### Rollback Procedures
- Rollback trigger criteria
- Rollback steps per environment
- Data rollback considerations
- Communication plan during rollback
### Health Checks
- Liveness probe configuration
- Readiness probe configuration
- Custom health endpoints
### Deployment Checklist
- [ ] All tests pass in CI
- [ ] Security scan clean
- [ ] Database migrations reviewed
- [ ] Feature flags configured
- [ ] Monitoring alerts configured
- [ ] Rollback plan documented
- [ ] Stakeholders notified
Store output to `_docs/02_components/deployment_strategy.md`
## Notes
- Prefer automated deployments over manual
- Zero-downtime deployments for production
- Always have a rollback plan
- Ask questions about infrastructure constraints
-64
@@ -1,64 +0,0 @@
# CI/CD Pipeline Validation & Enhancement
## Initial data:
- Problem description: `@_docs/00_problem/problem_description.md`
- Restrictions: `@_docs/00_problem/restrictions.md`
- Full Solution Description: `@_docs/01_solution/solution.md`
- Components: `@_docs/02_components`
- Environment Strategy: `@_docs/00_templates/environment_strategy.md`
## Role
You are a DevOps engineer
## Task
- Review existing CI/CD pipeline configuration
- Validate all stages are working correctly
- Optimize pipeline performance (parallelization, caching)
- Ensure test coverage gates are enforced
- Verify security scanning is properly configured
- Add missing quality gates
## Checklist
### Pipeline Health
- [ ] All stages execute successfully
- [ ] Build time is acceptable (<10 min for most projects)
- [ ] Caching is properly configured (dependencies, build artifacts)
- [ ] Parallel execution where possible
### Quality Gates
- [ ] Code coverage threshold enforced (minimum 75%)
- [ ] Linting errors block merge
- [ ] Security vulnerabilities block merge (critical/high)
- [ ] All tests must pass
### Environment Deployments
- [ ] Staging deployment works on merge to stage branch
- [ ] Environment variables properly configured per environment
- [ ] Secrets are securely managed (not in code)
- [ ] Rollback procedure documented
### Monitoring
- [ ] Build notifications configured (Slack, email, etc.)
- [ ] Failed build alerts
- [ ] Deployment success/failure notifications
## Output
### Pipeline Status Report
- Current pipeline configuration summary
- Issues found and fixes applied
- Performance metrics (build times)
### Recommended Improvements
- Short-term improvements
- Long-term optimizations
### Quality Gate Configuration
- Thresholds configured
- Enforcement rules
## Notes
- Do not break existing functionality
- Test changes in separate branch first
- Document any manual steps required
-122
@@ -1,122 +0,0 @@
# Observability Planning
## Initial data:
- Problem description: `@_docs/00_problem/problem_description.md`
- Full Solution Description: `@_docs/01_solution/solution.md`
- Components: `@_docs/02_components`
- Deployment Strategy: `@_docs/02_components/deployment_strategy.md`
## Role
You are a Site Reliability Engineer (SRE)
## Task
- Define logging strategy across all components
- Plan metrics collection and dashboards
- Design distributed tracing (if applicable)
- Establish alerting rules
- Document incident response procedures
## Output
### Logging Strategy
#### Log Levels
| Level | Usage | Example |
|-------|-------|---------|
| ERROR | Exceptions, failures requiring attention | Database connection failed |
| WARN | Potential issues, degraded performance | Retry attempt 2/3 |
| INFO | Significant business events | User registered, Order placed |
| DEBUG | Detailed diagnostic information | Request payload, Query params |
#### Log Format
```json
{
"timestamp": "ISO8601",
"level": "INFO",
"service": "service-name",
"correlation_id": "uuid",
"message": "Event description",
"context": {}
}
```
#### Log Storage
- Development: Console/file
- Staging: Centralized (ELK, CloudWatch, etc.)
- Production: Centralized with retention policy
### Metrics
#### System Metrics
- CPU usage
- Memory usage
- Disk I/O
- Network I/O
#### Application Metrics
| Metric | Type | Description |
|--------|------|-------------|
| request_count | Counter | Total requests |
| request_duration | Histogram | Response time |
| error_count | Counter | Failed requests |
| active_connections | Gauge | Current connections |
#### Business Metrics
- [Define based on acceptance criteria]
### Distributed Tracing
#### Trace Context
- Correlation ID propagation
- Span naming conventions
- Sampling strategy
#### Integration Points
- HTTP headers
- Message queue metadata
- Database query tagging
### Alerting
#### Alert Categories
| Severity | Response Time | Examples |
|----------|---------------|----------|
| Critical | 5 min | Service down, Data loss |
| High | 30 min | High error rate, Performance degradation |
| Medium | 4 hours | Elevated latency, Disk usage high |
| Low | Next business day | Non-critical warnings |
#### Alert Rules
```yaml
alerts:
- name: high_error_rate
condition: error_rate > 5%
duration: 5m
severity: high
- name: service_down
condition: health_check_failed
duration: 1m
severity: critical
```
### Dashboards
#### Operations Dashboard
- Service health status
- Request rate and error rate
- Response time percentiles
- Resource utilization
#### Business Dashboard
- Key business metrics
- User activity
- Transaction volumes
Store output to `_docs/02_components/observability_plan.md`
## Notes
- Follow the principle: "If it's not monitored, it's not in production"
- Balance verbosity with cost
- Ensure PII is not logged
- Plan for log rotation and retention
+54
@@ -0,0 +1,54 @@
# Implementation Rollback
## Role
You are a DevOps engineer performing a controlled rollback of implementation batches.
## Input
- User specifies a target batch number or commit hash to roll back to
- If not specified, present the list of available batch checkpoints and ask
## Process
1. Read `_docs/03_implementation/batch_*_report.md` files to identify all batch checkpoints with their commit hashes
2. Present batch list to user with: batch number, date, tasks included, commit hash
3. Determine which commits need to be reverted (all commits after the target batch)
4. For each commit to revert (in reverse chronological order):
- Run `git revert <commit-hash> --no-edit`
- Verify no merge conflicts; if conflicts occur, ask user for resolution
5. Run the full test suite to verify rollback integrity
6. If tests fail, report failures and ask user how to proceed
7. Reset Jira ticket statuses for all reverted tasks back to "To Do" via Jira MCP
8. Commit the revert with message: `[ROLLBACK] Reverted to batch [N]: [task list]`
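The revert mechanics in steps 3–5 and 8 can be sketched as follows. This is a minimal illustration on a throwaway repository; in the real workflow the commit list comes from the batch reports, and the Jira status resets happen afterwards.

```shell
# Sketch of the rollback loop, demonstrated on a throwaway repo.
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git -c user.email=ci@example.com -c user.name=ci commit -q --allow-empty -m "batch 1 checkpoint"
echo "feature" > feature.txt
git add feature.txt
git -c user.email=ci@example.com -c user.name=ci commit -q -m "batch 2"
target=$(git rev-list --max-parents=0 HEAD)   # roll back to the batch 1 checkpoint
# Revert every commit after the target, newest first, preserving history
for sha in $(git rev-list HEAD "^$target"); do
  git -c user.email=ci@example.com -c user.name=ci revert --no-edit "$sha" >/dev/null
done
test ! -e "$repo/feature.txt" && echo "rollback verified"
```

Because `git revert` adds new commits rather than rewriting history, the rollback itself remains auditable and reversible.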
## Output
Write `_docs/03_implementation/rollback_report.md`:
```markdown
# Rollback Report
**Date**: [YYYY-MM-DD]
**Target**: Batch [N] (commit [hash])
**Reverted Batches**: [list]
## Reverted Tasks
| Task | Batch | Status Before | Status After |
|------|-------|--------------|-------------|
| [JIRA-ID] | [batch #] | In Testing | To Do |
## Test Results
- [pass/fail count]
## Jira Updates
- [list of ticket transitions]
## Notes
- [any conflicts, manual steps, or issues encountered]
```
## Safety Rules
- Never force-push; always use `git revert` to preserve history
- Always run tests after rollback
- Always update Jira statuses for reverted tasks
- If rollback fails midway, stop and report — do not leave the codebase in a partial state
+8
@@ -0,0 +1,8 @@
---
description: "Git workflow: work on dev branch, commit message format with Jira IDs"
alwaysApply: true
---
# Git Workflow
- Work on the `dev` branch
- Commit message format: `[JIRA-ID-1] [JIRA-ID-2] Summary of changes`
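A hypothetical example of the format in practice (`PROJ-12` / `PROJ-13` are placeholder Jira IDs), demonstrated in a throwaway repository:

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git checkout -q -b dev        # all work happens on the dev branch
git -c user.email=dev@example.com -c user.name=dev commit -q --allow-empty \
  -m "[PROJ-12] [PROJ-13] Add retry logic to payment client"
git log -1 --format=%s        # prints the Jira-prefixed commit subject
```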
+22 -2
@@ -124,15 +124,35 @@ At the start of execution, create a TodoWrite with all applicable steps. Update status as each step completes.
**Goal**: Produce `01_initial_structure.md` — the first task describing the project skeleton
**Constraints**: This is a plan document, not code. The `/implement` skill executes it.
1. Read architecture.md, all component specs, and system-flows.md from PLANS_DIR
1. Read architecture.md, all component specs, system-flows.md, data_model.md, and `deployment/` from PLANS_DIR
2. Read problem, solution, and restrictions from `_docs/00_problem/` and `_docs/01_solution/`
3. Research best implementation patterns for the identified tech stack
4. Document the structure plan using `templates/initial-structure-task.md`
The bootstrap structure plan must include:
- Project folder layout with all component directories
- Shared models, interfaces, and DTOs
- Dockerfile per component (multi-stage, non-root, health checks, pinned base images)
- `docker-compose.yml` for local development (all components + database + dependencies)
- `docker-compose.test.yml` for integration test environment (black-box test runner)
- `.dockerignore`
- CI/CD pipeline file (`.github/workflows/ci.yml` or `azure-pipelines.yml`) with stages from `deployment/ci_cd_pipeline.md`
- Database migration setup and initial seed data scripts
- Observability configuration: structured logging setup, health check endpoints (`/health/live`, `/health/ready`), metrics endpoint (`/metrics`)
- Environment variable documentation (`.env.example`)
- Test structure with unit and integration test locations
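The `docker-compose.yml` item above could be sketched as follows. Service, image, and volume names are placeholders to be replaced per component; the Postgres password shown is a local-dev-only value, never a real secret.

```yaml
services:
  api:
    build: ./api
    env_file: .env.dev
    ports: ["8080:8080"]
    depends_on:
      db:
        condition: service_healthy    # wait until Postgres answers pg_isready
  db:
    image: postgres:16-alpine         # pinned base image
    environment:
      POSTGRES_PASSWORD: dev-only     # real secrets never live in this file
    volumes: [pgdata:/var/lib/postgresql/data]
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
volumes:
  pgdata:
```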
**Self-verification**:
- [ ] All components have corresponding folders in the layout
- [ ] All inter-component interfaces have DTOs defined
- [ ] CI/CD stages cover build, lint, test, security, deploy
- [ ] Dockerfile defined for each component
- [ ] `docker-compose.yml` covers all components and dependencies
- [ ] `docker-compose.test.yml` enables black-box integration testing
- [ ] CI/CD pipeline file defined with lint, test, security, build, deploy stages
- [ ] Database migration setup included
- [ ] Health check endpoints specified for each service
- [ ] Structured logging configuration included
- [ ] `.env.example` with all required environment variables
- [ ] Environment strategy covers dev, staging, production
- [ ] Test structure includes unit and integration test locations
+363
@@ -0,0 +1,363 @@
---
name: deploy
description: |
Comprehensive deployment skill covering containerization, CI/CD pipeline, environment strategy, observability, and deployment procedures.
5-step workflow: Docker containerization, CI/CD pipeline definition, environment strategy, observability planning, deployment procedures.
Uses _docs/02_plans/deployment/ structure.
Trigger phrases:
- "deploy", "deployment", "deployment strategy"
- "CI/CD", "pipeline", "containerize"
- "observability", "monitoring", "logging"
- "dockerize", "docker compose"
category: ship
tags: [deployment, docker, ci-cd, observability, monitoring, containerization]
disable-model-invocation: true
---
# Deployment Planning
Plan and document the full deployment lifecycle: containerize the application, define CI/CD pipelines, configure environments, set up observability, and document deployment procedures.
## Core Principles
- **Docker-first**: every component runs in a container; local dev, integration tests, and production all use Docker
- **Infrastructure as code**: all deployment configuration is version-controlled
- **Observability built-in**: logging, metrics, and tracing are part of the deployment plan, not afterthoughts
- **Environment parity**: dev, staging, and production environments mirror each other as closely as possible
- **Save immediately**: write artifacts to disk after each step; never accumulate unsaved work
- **Ask, don't assume**: when infrastructure constraints or preferences are unclear, ask the user
- **Plan, don't code**: this workflow produces deployment documents and specifications, not implementation code
## Context Resolution
Fixed paths:
- PLANS_DIR: `_docs/02_plans/`
- DEPLOY_DIR: `_docs/02_plans/deployment/`
- ARCHITECTURE: `_docs/02_plans/architecture.md`
- COMPONENTS_DIR: `_docs/02_plans/components/`
Announce the resolved paths to the user before proceeding.
## Input Specification
### Required Files
| File | Purpose |
|------|---------|
| `_docs/00_problem/problem.md` | Problem description and context |
| `_docs/00_problem/restrictions.md` | Constraints and limitations |
| `_docs/01_solution/solution.md` | Finalized solution |
| `PLANS_DIR/architecture.md` | Architecture from plan skill |
| `PLANS_DIR/components/` | Component specs |
### Prerequisite Checks (BLOCKING)
1. `architecture.md` exists — **STOP if missing**, run `/plan` first
2. At least one component spec exists in `PLANS_DIR/components/` — **STOP if missing**
3. Create DEPLOY_DIR if it does not exist
4. If DEPLOY_DIR already contains artifacts, ask user: **resume from last checkpoint or start fresh?**
## Artifact Management
### Directory Structure
```
DEPLOY_DIR/
├── containerization.md
├── ci_cd_pipeline.md
├── environment_strategy.md
├── observability.md
└── deployment_procedures.md
```
### Save Timing
| Step | Save immediately after | Filename |
|------|------------------------|----------|
| Step 1 | Containerization plan complete | `containerization.md` |
| Step 2 | CI/CD pipeline defined | `ci_cd_pipeline.md` |
| Step 3 | Environment strategy documented | `environment_strategy.md` |
| Step 4 | Observability plan complete | `observability.md` |
| Step 5 | Deployment procedures documented | `deployment_procedures.md` |
### Resumability
If DEPLOY_DIR already contains artifacts:
1. List existing files and match to the save timing table
2. Identify the last completed step
3. Resume from the next incomplete step
4. Inform the user which steps are being skipped
## Progress Tracking
At the start of execution, create a TodoWrite with all steps (1 through 5). Update status as each step completes.
## Workflow
### Step 1: Containerization
**Role**: DevOps / Platform engineer
**Goal**: Define Docker configuration for every component, local development, and integration test environments
**Constraints**: Plan only — no Dockerfile creation. Describe what each Dockerfile should contain.
1. Read architecture.md and all component specs
2. Read restrictions.md for infrastructure constraints
3. Research best Docker practices for the project's tech stack (multi-stage builds, base image selection, layer optimization)
4. For each component, define:
- Base image (pinned version, prefer alpine/distroless for production)
- Build stages (dependency install, build, production)
- Non-root user configuration
- Health check endpoint and command
- Exposed ports
- `.dockerignore` contents
5. Define `docker-compose.yml` for local development:
- All application components
- Database (Postgres) with named volume
- Any message queues, caches, or external service mocks
- Shared network
- Environment variable files (`.env.dev`)
6. Define `docker-compose.test.yml` for integration tests:
- Application components under test
- Test runner container (black-box, no internal imports)
- Isolated database with seed data
- All tests runnable via `docker compose -f docker-compose.test.yml up --abort-on-container-exit`
7. Define image tagging strategy: `<registry>/<project>/<component>:<git-sha>` for CI, `latest` for local dev only
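Step 6 can be sketched as a minimal `docker-compose.test.yml`. Service names and the test command are illustrative; the point is that the test runner reaches the app only over the network, as a black box.

```yaml
services:
  api:
    build: ./api
    environment:
      DATABASE_URL: postgres://postgres:test@db:5432/app   # isolated test DB
  db:
    image: postgres:16-alpine
    environment: { POSTGRES_PASSWORD: test, POSTGRES_DB: app }
  test-runner:
    build: ./tests/blackbox      # no internal imports, HTTP calls only
    depends_on: [api]
    command: ["pytest", "--maxfail=1"]
# run: docker compose -f docker-compose.test.yml up --abort-on-container-exit
```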
**Self-verification**:
- [ ] Every component has a Dockerfile specification
- [ ] Multi-stage builds specified for all production images
- [ ] Non-root user for all containers
- [ ] Health checks defined for every service
- [ ] docker-compose.yml covers all components + dependencies
- [ ] docker-compose.test.yml enables black-box integration testing
- [ ] `.dockerignore` defined
**Save action**: Write `containerization.md` using `templates/containerization.md`
**BLOCKING**: Present containerization plan to user. Do NOT proceed until confirmed.
---
### Step 2: CI/CD Pipeline
**Role**: DevOps engineer
**Goal**: Define the CI/CD pipeline with quality gates, security scanning, and multi-environment deployment
**Constraints**: Pipeline definition only — produce YAML specification, not implementation
1. Read architecture.md for tech stack and deployment targets
2. Read restrictions.md for CI/CD constraints (cloud provider, registry, etc.)
3. Research CI/CD best practices for the project's platform (GitHub Actions / Azure Pipelines)
4. Define pipeline stages:
| Stage | Trigger | Steps | Quality Gate |
|-------|---------|-------|-------------|
| **Lint** | Every push | Run linters per language (black, rustfmt, prettier, dotnet format) | Zero errors |
| **Test** | Every push | Unit tests, integration tests, coverage report | 75%+ coverage |
| **Security** | Every push | Dependency audit, SAST scan (Semgrep/SonarQube), image scan (Trivy) | Zero critical/high CVEs |
| **Build** | PR merge to dev | Build Docker images, tag with git SHA | Build succeeds |
| **Push** | After build | Push to container registry | Push succeeds |
| **Deploy Staging** | After push | Deploy to staging environment | Health checks pass |
| **Smoke Tests** | After staging deploy | Run critical path tests against staging | All pass |
| **Deploy Production** | Manual approval | Deploy to production | Health checks pass |
5. Define caching strategy: dependency caches, Docker layer caches, build artifact caches
6. Define parallelization: which stages can run concurrently
7. Define notifications: build failures, deployment status, security alerts
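The push-triggered stages above could look like the following GitHub Actions sketch. The `make` targets are hypothetical stand-ins for the stack-specific lint, test, and audit commands; the three jobs run in parallel per step 6.

```yaml
name: ci
on: [push]
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make lint            # quality gate: zero lint errors
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/cache@v4
        with:
          path: ~/.cache/pip      # dependency cache (path is stack-specific)
          key: deps-${{ hashFiles('**/requirements.txt') }}
      - run: make test            # quality gate: 75%+ coverage
  security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make audit           # dependency audit + SAST; blocks on critical/high
```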
**Self-verification**:
- [ ] All pipeline stages defined with triggers and gates
- [ ] Coverage threshold enforced (75%+)
- [ ] Security scanning included (dependencies + images + SAST)
- [ ] Caching configured for dependencies and Docker layers
- [ ] Multi-environment deployment (staging → production)
- [ ] Rollback procedure referenced
- [ ] Notifications configured
**Save action**: Write `ci_cd_pipeline.md` using `templates/ci_cd_pipeline.md`
---
### Step 3: Environment Strategy
**Role**: Platform engineer
**Goal**: Define environment configuration, secrets management, and environment parity
**Constraints**: Strategy document — no secrets or credentials in output
1. Define environments:
| Environment | Purpose | Infrastructure | Data |
|-------------|---------|---------------|------|
| **Development** | Local developer workflow | docker-compose, local volumes | Seed data, mocks for external APIs |
| **Staging** | Pre-production validation | Mirrors production topology | Anonymized production-like data |
| **Production** | Live system | Full infrastructure | Real data |
2. Define environment variable management:
- `.env.example` with all required variables (no real values)
- Per-environment variable sources (`.env` for dev, secret manager for staging/prod)
- Validation: fail fast on missing required variables at startup
3. Define secrets management:
- Never commit secrets to version control
- Development: `.env` files (git-ignored)
- Staging/Production: secret manager (AWS Secrets Manager / Azure Key Vault / Vault)
- Rotation policy
4. Define database management per environment:
- Development: Docker Postgres with named volume, seed data
- Staging: managed Postgres, migrations applied via CI/CD
- Production: managed Postgres, migrations require approval
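The fail-fast validation in item 2 can be sketched in shell. Variable names are illustrative; in a real service the check runs at process startup, before any connection is attempted.

```shell
# Fail fast on missing required environment variables.
export DATABASE_URL="postgres://localhost:5432/dev"  # normally injected per environment
: "${DATABASE_URL:?DATABASE_URL is required}"        # aborts with an error if unset
: "${APP_ENV:=development}"                          # optional variable with a default
echo "environment ok: $APP_ENV"
```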
**Self-verification**:
- [ ] All three environments defined with clear purpose
- [ ] Environment variable documentation complete (`.env.example`)
- [ ] No secrets in any output document
- [ ] Secret manager specified for staging/production
- [ ] Database strategy per environment
**Save action**: Write `environment_strategy.md` using `templates/environment_strategy.md`
---
### Step 4: Observability
**Role**: Site Reliability Engineer (SRE)
**Goal**: Define logging, metrics, tracing, and alerting strategy
**Constraints**: Strategy document — describe what to implement, not how to wire it
1. Read architecture.md and component specs for service boundaries
2. Research observability best practices for the tech stack
**Logging**:
- Structured JSON to stdout/stderr (no file logging in containers)
- Fields: `timestamp` (ISO 8601), `level`, `service`, `correlation_id`, `message`, `context`
- Levels: ERROR (exceptions), WARN (degraded), INFO (business events), DEBUG (diagnostics, dev only)
- No PII in logs
- Retention: dev = console, staging = 7 days, production = 30 days
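A log line following the field list above might look like this (service name and values are illustrative):

```json
{
  "timestamp": "2026-03-19T10:12:11Z",
  "level": "INFO",
  "service": "orders-api",
  "correlation_id": "6f1c2b7e-9d44-4a2b-8e01-3c5a7f90d2ab",
  "message": "Order placed",
  "context": { "order_id": "A-1001" }
}
```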
**Metrics**:
- Expose Prometheus-compatible `/metrics` endpoint per service
- System metrics: CPU, memory, disk, network
- Application metrics: `request_count`, `request_duration` (histogram), `error_count`, `active_connections`
- Business metrics: derived from acceptance criteria
- Collection interval: 15s
**Distributed Tracing**:
- OpenTelemetry SDK integration
- Trace context propagation via HTTP headers and message queue metadata
- Span naming: `<service>.<operation>`
- Sampling: 100% in dev/staging, 10% in production (adjust based on volume)
**Alerting**:
| Severity | Response Time | Condition Examples |
|----------|---------------|-------------------|
| Critical | 5 min | Service down, data loss, health check failed |
| High | 30 min | Error rate > 5%, P95 latency > 2x baseline |
| Medium | 4 hours | Disk > 80%, elevated latency |
| Low | Next business day | Non-critical warnings |
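Two rows of the table above could be expressed as Prometheus-style alerting rules; metric names follow the application metrics listed earlier, and the exact expressions are assumptions to be tuned per service.

```yaml
groups:
  - name: service-alerts
    rules:
      - alert: HighErrorRate
        expr: rate(error_count[5m]) / rate(request_count[5m]) > 0.05
        for: 5m
        labels: { severity: high }        # respond within 30 min
      - alert: ServiceDown
        expr: up == 0
        for: 1m
        labels: { severity: critical }    # respond within 5 min
```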
**Dashboards**:
- Operations: service health, request rate, error rate, response time percentiles, resource utilization
- Business: key business metrics from acceptance criteria
**Self-verification**:
- [ ] Structured logging format defined with required fields
- [ ] Metrics endpoint specified per service
- [ ] OpenTelemetry tracing configured
- [ ] Alert severities with response times defined
- [ ] Dashboards cover operations and business metrics
- [ ] PII exclusion from logs addressed
**Save action**: Write `observability.md` using `templates/observability.md`
---
### Step 5: Deployment Procedures
**Role**: DevOps / Platform engineer
**Goal**: Define deployment strategy, rollback procedures, health checks, and deployment checklist
**Constraints**: Procedures document — no implementation
1. Define deployment strategy:
- Preferred pattern: blue-green / rolling / canary (choose based on architecture)
- Zero-downtime requirement for production
- Graceful shutdown: 30-second grace period for in-flight requests
- Database migration ordering: migrate before deploy, backward-compatible only
2. Define health checks:
| Check | Type | Endpoint | Interval | Threshold |
|-------|------|----------|----------|-----------|
| Liveness | HTTP GET | `/health/live` | 10s | 3 failures → restart |
| Readiness | HTTP GET | `/health/ready` | 5s | 3 failures → remove from LB |
| Startup | HTTP GET | `/health/ready` | 5s | 30 attempts max |
3. Define rollback procedures:
- Trigger criteria: health check failures, error rate spike, critical alert
- Rollback steps: redeploy previous image tag, verify health, rollback database if needed
- Communication: notify stakeholders during rollback
- Post-mortem: required after every production rollback
4. Define deployment checklist:
- [ ] All tests pass in CI
- [ ] Security scan clean (zero critical/high CVEs)
- [ ] Database migrations reviewed and tested
- [ ] Environment variables configured
- [ ] Health check endpoints responding
- [ ] Monitoring alerts configured
- [ ] Rollback plan documented and tested
- [ ] Stakeholders notified
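The health checks in step 2 can be sketched as Kubernetes-style probes, assuming the service listens on port 8080; adjust to the real deployment target.

```yaml
livenessProbe:
  httpGet: { path: /health/live, port: 8080 }
  periodSeconds: 10
  failureThreshold: 3        # 3 failures -> container restart
readinessProbe:
  httpGet: { path: /health/ready, port: 8080 }
  periodSeconds: 5
  failureThreshold: 3        # 3 failures -> removed from load balancing
startupProbe:
  httpGet: { path: /health/ready, port: 8080 }
  periodSeconds: 5
  failureThreshold: 30       # 30 attempts max before giving up
```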
**Self-verification**:
- [ ] Deployment strategy chosen and justified
- [ ] Zero-downtime approach specified
- [ ] Health checks defined (liveness, readiness, startup)
- [ ] Rollback trigger criteria and steps documented
- [ ] Deployment checklist complete
**Save action**: Write `deployment_procedures.md` using `templates/deployment_procedures.md`
**BLOCKING**: Present deployment procedures to user. Do NOT proceed until confirmed.
---
## Escalation Rules
| Situation | Action |
|-----------|--------|
| Unknown cloud provider or hosting | **ASK user** |
| Container registry not specified | **ASK user** |
| CI/CD platform preference unclear | **ASK user** — default to GitHub Actions |
| Secret manager not chosen | **ASK user** |
| Deployment pattern trade-offs | **ASK user** with recommendation |
| Missing architecture.md | **STOP** — run `/plan` first |
## Common Mistakes
- **Implementing during planning**: this workflow produces documents, not Dockerfiles or pipeline YAML
- **Hardcoding secrets**: never include real credentials in deployment documents
- **Ignoring integration test containerization**: the test environment must be containerized alongside the app
- **Skipping BLOCKING gates**: never proceed past a BLOCKING marker without user confirmation
- **Using `:latest` tags**: always pin base image versions
- **Forgetting observability**: logging, metrics, and tracing are deployment concerns, not post-deployment additions
## Methodology Quick Reference
```
┌────────────────────────────────────────────────────────────────┐
│ Deployment Planning (5-Step Method) │
├────────────────────────────────────────────────────────────────┤
│ PREREQ: architecture.md + component specs exist │
│ │
│ 1. Containerization → containerization.md │
│ [BLOCKING: user confirms Docker plan] │
│ 2. CI/CD Pipeline → ci_cd_pipeline.md │
│ 3. Environment → environment_strategy.md │
│ 4. Observability → observability.md │
│ 5. Procedures → deployment_procedures.md │
│ [BLOCKING: user confirms deployment plan] │
├────────────────────────────────────────────────────────────────┤
│ Principles: Docker-first · IaC · Observability built-in │
│ Environment parity · Save immediately │
└────────────────────────────────────────────────────────────────┘
```
@@ -0,0 +1,87 @@
# CI/CD Pipeline Template
Save as `_docs/02_plans/deployment/ci_cd_pipeline.md`.
---
```markdown
# [System Name] — CI/CD Pipeline
## Pipeline Overview
| Stage | Trigger | Quality Gate |
|-------|---------|-------------|
| Lint | Every push | Zero lint errors |
| Test | Every push | 75%+ coverage, all tests pass |
| Security | Every push | Zero critical/high CVEs |
| Build | PR merge to dev | Docker build succeeds |
| Push | After build | Images pushed to registry |
| Deploy Staging | After push | Health checks pass |
| Smoke Tests | After staging deploy | Critical paths pass |
| Deploy Production | Manual approval | Health checks pass |
## Stage Details
### Lint
- [Language-specific linters and formatters]
- Runs in parallel per language
### Test
- Unit tests: [framework and command]
- Integration tests: [framework and command, uses docker-compose.test.yml]
- Coverage threshold: 75% overall, 90% critical paths
- Coverage report published as pipeline artifact
### Security
- Dependency audit: [tool, e.g., npm audit / pip-audit / dotnet list package --vulnerable]
- SAST scan: [tool, e.g., Semgrep / SonarQube]
- Image scan: Trivy on built Docker images
- Block on: critical or high severity findings
### Build
- Docker images built using multi-stage Dockerfiles
- Tagged with git SHA: `<registry>/<component>:<sha>`
- Build cache: Docker layer cache via CI cache action
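The SHA-based tag above can be assembled in a single pipeline step; a minimal sketch, where the registry and component names are placeholders:

```shell
# Sketch: build the image tag from the commit SHA (placeholder names).
REGISTRY="ghcr.io/org"
COMPONENT="api"
GIT_SHA="${GIT_SHA:-$(git rev-parse --short HEAD 2>/dev/null || echo dev)}"
IMAGE="${REGISTRY}/${COMPONENT}:${GIT_SHA}"
echo "building ${IMAGE}"
# docker build -t "${IMAGE}" "./${COMPONENT}"   # enable in CI
```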
### Push
- Registry: [container registry URL]
- Authentication: [method]
### Deploy Staging
- Deployment method: [docker compose / Kubernetes / cloud service]
- Pre-deploy: run database migrations
- Post-deploy: verify health check endpoints
- Automated rollback on health check failure
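The post-deploy health verification can be scripted as a bounded poll; a sketch, with the URL, attempt count, and interval as placeholders:

```shell
# Sketch: poll the readiness endpoint until it answers, or give up.
wait_for_ready() {
  url="$1"; attempts="${2:-30}"; i=1
  while [ "$i" -le "$attempts" ]; do
    if curl -fsS "$url" >/dev/null 2>&1; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    i=$((i + 1))
    sleep 2
  done
  echo "health check timed out" >&2
  return 1
}
# wait_for_ready "https://staging.example.com/health/ready" 30
```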
### Smoke Tests
- Subset of integration tests targeting staging environment
- Validates critical user flows
- Timeout: [maximum duration]
### Deploy Production
- Requires manual approval via [mechanism]
- Deployment strategy: [blue-green / rolling / canary]
- Pre-deploy: database migration review
- Post-deploy: health checks + monitoring for 15 min
## Caching Strategy
| Cache | Key | Restore Keys |
|-------|-----|-------------|
| Dependencies | [lockfile hash] | [partial match] |
| Docker layers | [Dockerfile hash] | [partial match] |
| Build artifacts | [source hash] | [partial match] |
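The lockfile-hash key in the first row can be derived like this; a sketch, with the lockfile name as a placeholder:

```shell
# Sketch: content-addressed dependency cache key from a lockfile hash.
cache_key() {
  printf 'deps-%s\n' "$(sha256sum "$1" | cut -c1-16)"
}
# Usage: cache_key package-lock.json
```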
## Parallelization
[Diagram or description of which stages run concurrently]
## Notifications
| Event | Channel | Recipients |
|-------|---------|-----------|
| Build failure | [Slack/email] | [team] |
| Security alert | [Slack/email] | [team + security] |
| Deploy success | [Slack] | [team] |
| Deploy failure | [Slack/email + PagerDuty] | [on-call] |
```
@@ -0,0 +1,94 @@
# Containerization Plan Template
Save as `_docs/02_plans/deployment/containerization.md`.
---
```markdown
# [System Name] — Containerization
## Component Dockerfiles
### [Component Name]
| Property | Value |
|----------|-------|
| Base image | [e.g., mcr.microsoft.com/dotnet/aspnet:8.0-alpine] |
| Build image | [e.g., mcr.microsoft.com/dotnet/sdk:8.0-alpine] |
| Stages | [dependency install → build → production] |
| User | [non-root user name] |
| Health check | [endpoint and command] |
| Exposed ports | [port list] |
| Key build args | [if any] |
### [Repeat for each component]
## Docker Compose — Local Development
```yaml
# docker-compose.yml structure
services:
[component]:
build: ./[path]
ports: ["host:container"]
environment: [reference .env.dev]
depends_on: [dependencies with health condition]
healthcheck: [command, interval, timeout, retries]
db:
image: [postgres:version-alpine]
volumes: [named volume]
environment: [credentials from .env.dev]
healthcheck: [pg_isready]
volumes:
[named volumes]
networks:
[shared network]
```
## Docker Compose — Integration Tests
```yaml
# docker-compose.test.yml structure
services:
[app components under test]
test-runner:
build: ./tests/integration
depends_on: [app components with health condition]
environment: [test configuration]
# Exit code determines test pass/fail
db:
image: [postgres:version-alpine]
volumes: [seed data mount]
```
Run: `docker compose -f docker-compose.test.yml up --abort-on-container-exit`
## Image Tagging Strategy
| Context | Tag Format | Example |
|---------|-----------|---------|
| CI build | `<registry>/<project>/<component>:<git-sha>` | `ghcr.io/org/api:a1b2c3d` |
| Release | `<registry>/<project>/<component>:<semver>` | `ghcr.io/org/api:1.2.0` |
| Local dev | `<component>:latest` | `api:latest` |
## .dockerignore
```
.git
.cursor
_docs
_standalone
node_modules
**/bin
**/obj
**/__pycache__
*.md
.env*
docker-compose*.yml
```
```
@@ -0,0 +1,103 @@
# Deployment Procedures Template
Save as `_docs/02_plans/deployment/deployment_procedures.md`.
---
```markdown
# [System Name] — Deployment Procedures
## Deployment Strategy
**Pattern**: [blue-green / rolling / canary]
**Rationale**: [why this pattern fits the architecture]
**Zero-downtime**: required for production deployments
### Graceful Shutdown
- Grace period: 30 seconds for in-flight requests
- Sequence: stop accepting new requests → drain connections → shutdown
- Container orchestrator: `terminationGracePeriodSeconds: 40`
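One way to honor the drain sequence is a thin entrypoint wrapper that forwards termination signals to the app instead of letting the shell swallow them; a sketch, with the app command as a placeholder:

```shell
# Sketch: forward SIGTERM to the app so it can drain before exiting.
app_cmd="${APP_CMD:-sleep 1}"   # placeholder; your real server command here
$app_cmd &
app_pid=$!
trap 'kill -TERM "$app_pid" 2>/dev/null' TERM INT
wait "$app_pid"
status=$?
echo "app exited with status $status"
```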
### Database Migration Ordering
- Migrations run **before** new code deploys
- All migrations must be backward-compatible (old code works with new schema)
- Irreversible migrations require explicit approval
## Health Checks
| Check | Type | Endpoint | Interval | Failure Threshold | Action |
|-------|------|----------|----------|-------------------|--------|
| Liveness | HTTP GET | `/health/live` | 10s | 3 failures | Restart container |
| Readiness | HTTP GET | `/health/ready` | 5s | 3 failures | Remove from load balancer |
| Startup | HTTP GET | `/health/ready` | 5s | 30 attempts | Kill and recreate |
### Health Check Responses
- `/health/live`: returns 200 if process is running (no dependency checks)
- `/health/ready`: returns 200 if all dependencies (DB, cache, queues) are reachable
## Staging Deployment
1. CI/CD builds and pushes Docker images tagged with git SHA
2. Run database migrations against staging
3. Deploy new images to staging environment
4. Wait for health checks to pass (readiness probe)
5. Run smoke tests against staging
6. If smoke tests fail: automatic rollback to previous image
## Production Deployment
1. **Approval**: manual approval required via [mechanism]
2. **Pre-deploy checks**:
- [ ] Staging smoke tests passed
- [ ] Security scan clean
- [ ] Database migration reviewed
- [ ] Monitoring alerts configured
- [ ] Rollback plan confirmed
3. **Deploy**: apply deployment strategy (blue-green / rolling / canary)
4. **Verify**: health checks pass, error rate stable, latency within baseline
5. **Monitor**: observe dashboards for 15 minutes post-deploy
6. **Finalize**: mark deployment as successful or trigger rollback
## Rollback Procedures
### Trigger Criteria
- Health check failures persist after deploy
- Error rate exceeds 5% for more than 5 minutes
- Critical alert fires within 15 minutes of deploy
- Manual decision by on-call engineer
### Rollback Steps
1. Redeploy previous Docker image tag (from CI/CD artifact)
2. Verify health checks pass
3. If database migration was applied:
- Run DOWN migration if reversible
- If irreversible: assess data impact, escalate if needed
4. Notify stakeholders
5. Schedule post-mortem within 24 hours
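Step 1 presumes the previous good image tag is known; a minimal sketch of recording it on each successful deploy and reading it back on rollback (the file name and commands are placeholders):

```shell
# Sketch: persist the last known-good image tag for rollback.
record_good_tag() {
  printf '%s\n' "$1" > .last_good_tag
}
last_good_tag() {
  cat .last_good_tag 2>/dev/null || {
    echo "no known-good tag recorded" >&2
    return 1
  }
}
# On successful deploy: record_good_tag "ghcr.io/org/api:a1b2c3d"
# On rollback: redeploy the image named by "$(last_good_tag)"
```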
### Post-Mortem
Required after every production rollback:
- Timeline of events
- Root cause
- What went wrong
- Prevention measures
## Deployment Checklist
- [ ] All tests pass in CI
- [ ] Security scan clean (zero critical/high CVEs)
- [ ] Docker images built and pushed
- [ ] Database migrations reviewed and tested
- [ ] Environment variables configured for target environment
- [ ] Health check endpoints verified
- [ ] Monitoring alerts configured
- [ ] Rollback plan documented and tested
- [ ] Stakeholders notified of deployment window
- [ ] On-call engineer available during deployment
```
@@ -0,0 +1,61 @@
# Environment Strategy Template
Save as `_docs/02_plans/deployment/environment_strategy.md`.
---
```markdown
# [System Name] — Environment Strategy
## Environments
| Environment | Purpose | Infrastructure | Data Source |
|-------------|---------|---------------|-------------|
| Development | Local developer workflow | docker-compose | Seed data, mocked externals |
| Staging | Pre-production validation | [mirrors production] | Anonymized production-like data |
| Production | Live system | [full infrastructure] | Real data |
## Environment Variables
### Required Variables
| Variable | Purpose | Dev Default | Staging/Prod Source |
|----------|---------|-------------|-------------------|
| `DATABASE_URL` | Postgres connection | `postgres://dev:dev@db:5432/app` | Secret manager |
| [add all required variables] | | | |
### `.env.example`
```env
# Copy to .env and fill in values
DATABASE_URL=postgres://user:pass@host:5432/dbname
# [all required variables with placeholder values]
```
### Variable Validation
All services validate required environment variables at startup and fail fast with a clear error message if any are missing.
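The fail-fast check can be a few lines in the service entrypoint; a sketch, where the variable list is an example to extend per service:

```shell
# Sketch: verify required environment variables before starting the app.
check_env() {
  missing=""
  for v in "$@"; do
    eval "val=\${$v:-}"
    [ -n "$val" ] || missing="$missing $v"
  done
  if [ -n "$missing" ]; then
    echo "FATAL: missing required environment variables:$missing" >&2
    return 1
  fi
}
# In the entrypoint: check_env DATABASE_URL || exit 1
```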
## Secrets Management
| Environment | Method | Tool |
|-------------|--------|------|
| Development | `.env` file (git-ignored) | dotenv |
| Staging | Secret manager | [AWS Secrets Manager / Azure Key Vault / Vault] |
| Production | Secret manager | [AWS Secrets Manager / Azure Key Vault / Vault] |
Rotation policy: [frequency and procedure]
## Database Management
| Environment | Type | Migrations | Data |
|-------------|------|-----------|------|
| Development | Docker Postgres, named volume | Applied on container start | Seed data via init script |
| Staging | Managed Postgres | Applied via CI/CD pipeline | Anonymized production snapshot |
| Production | Managed Postgres | Applied via CI/CD with approval | Live data |
Migration rules:
- All migrations must be backward-compatible (support old and new code simultaneously)
- Reversible migrations required (DOWN/rollback script)
- Production migrations require review before apply
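The reversibility rule can be enforced mechanically in CI; a sketch assuming a `NNN_name.up.sql` / `NNN_name.down.sql` naming convention:

```shell
# Sketch: every UP migration must ship with a matching DOWN script.
check_reversible() {
  dir="$1"; rc=0
  for up in "$dir"/*.up.sql; do
    [ -e "$up" ] || continue   # no migrations present at all
    down="${up%.up.sql}.down.sql"
    if [ ! -f "$down" ]; then
      echo "missing rollback script for $up" >&2
      rc=1
    fi
  done
  return "$rc"
}
# check_reversible ./migrations
```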
```
@@ -0,0 +1,132 @@
# Observability Template
Save as `_docs/02_plans/deployment/observability.md`.
---
```markdown
# [System Name] — Observability
## Logging
### Format
Structured JSON to stdout/stderr. No file-based logging in containers.
```json
{
"timestamp": "ISO8601",
"level": "INFO",
"service": "service-name",
"correlation_id": "uuid",
"message": "Event description",
"context": {}
}
```
### Log Levels
| Level | Usage | Example |
|-------|-------|---------|
| ERROR | Exceptions, failures requiring attention | Database connection failed |
| WARN | Potential issues, degraded performance | Retry attempt 2/3 |
| INFO | Significant business events | User registered, Order placed |
| DEBUG | Detailed diagnostics (dev/staging only) | Request payload, Query params |
### Retention
| Environment | Destination | Retention |
|-------------|-------------|-----------|
| Development | Console | Session |
| Staging | [log aggregator] | 7 days |
| Production | [log aggregator] | 30 days |
### PII Rules
- Never log passwords, tokens, or session IDs
- Mask email addresses and personal identifiers
- Log user IDs (opaque) instead of usernames
## Metrics
### Endpoints
Every service exposes Prometheus-compatible metrics at `/metrics`.
### Application Metrics
| Metric | Type | Description |
|--------|------|-------------|
| `request_count` | Counter | Total HTTP requests by method, path, status |
| `request_duration_seconds` | Histogram | Response time by method, path |
| `error_count` | Counter | Failed requests by type |
| `active_connections` | Gauge | Current open connections |
### System Metrics
- CPU usage, Memory usage, Disk I/O, Network I/O
### Business Metrics
| Metric | Type | Description | Source |
|--------|------|-------------|--------|
| [from acceptance criteria] | | | |
Collection interval: 15 seconds
## Distributed Tracing
### Configuration
- SDK: OpenTelemetry
- Propagation: W3C Trace Context via HTTP headers
- Span naming: `<service>.<operation>`
### Sampling
| Environment | Rate | Rationale |
|-------------|------|-----------|
| Development | 100% | Full visibility |
| Staging | 100% | Full visibility |
| Production | 10% | Balance cost vs observability |
### Integration Points
- HTTP requests: automatic instrumentation
- Database queries: automatic instrumentation
- Message queues: manual span creation on publish/consume
## Alerting
| Severity | Response Time | Conditions |
|----------|---------------|-----------|
| Critical | 5 min | Service unreachable, health check failed for 1 min, data loss detected |
| High | 30 min | Error rate > 5% for 5 min, P95 latency > 2x baseline for 10 min |
| Medium | 4 hours | Disk usage > 80%, elevated latency, connection pool exhaustion |
| Low | Next business day | Non-critical warnings, deprecated API usage |
### Notification Channels
| Severity | Channel |
|----------|---------|
| Critical | [PagerDuty / phone] |
| High | [Slack + email] |
| Medium | [Slack] |
| Low | [Dashboard only] |
## Dashboards
### Operations Dashboard
- Service health status (up/down per component)
- Request rate and error rate
- Response time percentiles (P50, P95, P99)
- Resource utilization (CPU, memory per container)
- Active alerts
### Business Dashboard
- [Key business metrics from acceptance criteria]
- [User activity indicators]
- [Transaction volumes]
```
@@ -1,7 +1,7 @@
---
name: plan
description: |
Decompose a solution into architecture, system flows, components, tests, and Jira epics.
Decompose a solution into architecture, data model, deployment plan, system flows, components, tests, and Jira epics.
Systematic 6-step planning workflow with BLOCKING gates, self-verification, and structured artifact management.
Uses _docs/ + _docs/02_plans/ structure.
Trigger phrases:
@@ -15,7 +15,7 @@ disable-model-invocation: true
# Solution Planning
Decompose a problem and solution into architecture, system flows, components, tests, and Jira epics through a systematic 6-step workflow.
Decompose a problem and solution into architecture, data model, deployment plan, system flows, components, tests, and Jira epics through a systematic 6-step workflow.
## Core Principles
@@ -91,6 +91,13 @@ PLANS_DIR/
│ └── traceability_matrix.md
├── architecture.md
├── system-flows.md
├── data_model.md
├── deployment/
│ ├── containerization.md
│ ├── ci_cd_pipeline.md
│ ├── environment_strategy.md
│ ├── observability.md
│ └── deployment_procedures.md
├── risk_mitigations.md
├── risk_mitigations_02.md (iterative, ## as sequence)
├── components/
@@ -124,6 +131,8 @@ PLANS_DIR/
| Step 1 | Integration traceability matrix | `integration_tests/traceability_matrix.md` |
| Step 2 | Architecture analysis complete | `architecture.md` |
| Step 2 | System flows documented | `system-flows.md` |
| Step 2 | Data model documented | `data_model.md` |
| Step 2 | Deployment plan complete | `deployment/` (5 files) |
| Step 3 | Each component analyzed | `components/[##]_[name]/description.md` |
| Step 3 | Common helpers generated | `common-helpers/[##]_helper_[name].md` |
| Step 3 | Diagrams generated | `diagrams/` |
@@ -210,9 +219,11 @@ Capture any new questions, findings, or insights that arise during test specific
### Step 2: Solution Analysis
**Role**: Professional software architect
**Goal**: Produce `architecture.md` and `system-flows.md` from the solution draft
**Goal**: Produce `architecture.md`, `system-flows.md`, `data_model.md`, and `deployment/` from the solution draft
**Constraints**: No code, no component-level detail yet; focus on system-level view
#### Phase 2a: Architecture & Flows
1. Read all input files thoroughly
2. Incorporate findings, questions, and insights discovered during Step 1 (integration tests)
3. Research unknown or questionable topics via internet; ask user about ambiguities
@@ -230,6 +241,56 @@ Capture any new questions, findings, or insights that arise during test specific
**BLOCKING**: Present architecture summary to user. Do NOT proceed until user confirms.
#### Phase 2b: Data Model
**Role**: Professional software architect
**Goal**: Produce a detailed data model document covering entities, relationships, and migration strategy
1. Extract core entities from architecture.md and solution.md
2. Define entity attributes, types, and constraints
3. Define relationships between entities (Mermaid ERD)
4. Define migration strategy: versioning tool (EF Core migrations / Alembic / sql-migrate), reversibility requirement, naming convention
5. Define seed data requirements per environment (dev, staging)
6. Define backward compatibility approach for schema changes (additive-only by default)
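A minimal example of the Mermaid ERD expected in step 3, using hypothetical `User` and `Order` entities:

```mermaid
erDiagram
    USER ||--o{ ORDER : places
    USER {
        uuid id PK
        string email
        timestamptz created_at
    }
    ORDER {
        uuid id PK
        uuid user_id FK
        int total_cents
        timestamptz placed_at
    }
```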
**Self-verification**:
- [ ] Every entity mentioned in architecture.md is defined
- [ ] Relationships are explicit with cardinality
- [ ] Migration strategy specifies reversibility requirement
- [ ] Seed data requirements defined
- [ ] Backward compatibility approach documented
**Save action**: Write `data_model.md`
#### Phase 2c: Deployment Planning
**Role**: DevOps / Platform engineer
**Goal**: Produce deployment plan covering containerization, CI/CD, environment strategy, observability, and deployment procedures
Use the `/deploy` skill's templates as structure for each artifact:
1. Read architecture.md and restrictions.md for infrastructure constraints
2. Research Docker best practices for the project's tech stack
3. Define containerization plan: Dockerfile per component, docker-compose for dev and tests
4. Define CI/CD pipeline: stages, quality gates, caching, parallelization
5. Define environment strategy: dev, staging, production with secrets management
6. Define observability: structured logging, metrics, tracing, alerting
7. Define deployment procedures: strategy, health checks, rollback, checklist
**Self-verification**:
- [ ] Every component has a Docker specification
- [ ] CI/CD pipeline covers lint, test, security, build, deploy
- [ ] Environment strategy covers dev, staging, production
- [ ] Observability covers logging, metrics, tracing, alerting
- [ ] Deployment procedures include rollback and health checks
**Save action**: Write all 5 files under `deployment/`:
- `containerization.md`
- `ci_cd_pipeline.md`
- `environment_strategy.md`
- `observability.md`
- `deployment_procedures.md`
---
### Step 3: Component Decomposition
@@ -375,6 +436,20 @@ Before writing the final report, verify ALL of the following:
- [ ] Deployment model is defined
- [ ] Integration test findings are reflected in architecture decisions
### Data Model
- [ ] Every entity from architecture.md is defined
- [ ] Relationships have explicit cardinality
- [ ] Migration strategy with reversibility requirement
- [ ] Seed data requirements defined
- [ ] Backward compatibility approach documented
### Deployment
- [ ] Containerization plan covers all components
- [ ] CI/CD pipeline includes lint, test, security, build, deploy stages
- [ ] Environment strategy covers dev, staging, production
- [ ] Observability covers logging, metrics, tracing, alerting
- [ ] Deployment procedures include rollback and health checks
### Components
- [ ] Every component follows SRP
- [ ] No circular dependencies
@@ -441,8 +516,10 @@ Before writing the final report, verify ALL of the following:
│ │
│ 1. Integration Tests → integration_tests/ (5 files) │
│ [BLOCKING: user confirms test coverage] │
│ 2. Solution Analysis → architecture.md, system-flows.md │
│ 2a. Architecture → architecture.md, system-flows.md │
│ [BLOCKING: user confirms architecture] │
│ 2b. Data Model → data_model.md │
│ 2c. Deployment → deployment/ (5 files) │
│ 3. Component Decompose → components/[##]_[name]/description │
│ [BLOCKING: user confirms decomposition] │
│ 4. Review & Risk → risk_mitigations.md │
@@ -0,0 +1,174 @@
---
name: retrospective
description: |
Collect metrics from implementation batch reports and code review findings, analyze trends across cycles,
and produce improvement reports with actionable recommendations.
3-step workflow: collect metrics, analyze trends, produce report.
Outputs to _docs/05_metrics/.
Trigger phrases:
- "retrospective", "retro", "run retro"
- "metrics review", "feedback loop"
- "implementation metrics", "analyze trends"
category: evolve
tags: [retrospective, metrics, trends, improvement, feedback-loop]
disable-model-invocation: true
---
# Retrospective
Collect metrics from implementation artifacts, analyze trends across development cycles, and produce actionable improvement reports.
## Core Principles
- **Data-driven**: conclusions come from metrics, not impressions
- **Actionable**: every finding must have a concrete improvement suggestion
- **Cumulative**: each retrospective compares against previous ones to track progress
- **Save immediately**: write artifacts to disk after each step
- **Non-judgmental**: focus on process improvement, not blame
## Context Resolution
Fixed paths:
- IMPL_DIR: `_docs/03_implementation/`
- METRICS_DIR: `_docs/05_metrics/`
- TASKS_DIR: `_docs/02_tasks/`
Announce the resolved paths to the user before proceeding.
## Prerequisite Checks (BLOCKING)
1. `IMPL_DIR` exists and contains at least one `batch_*_report.md` — **STOP if missing** (nothing to analyze)
2. Create METRICS_DIR if it does not exist
3. Check for previous retrospective reports in METRICS_DIR to enable trend comparison
## Artifact Management
### Directory Structure
```
METRICS_DIR/
├── retro_[YYYY-MM-DD].md
├── retro_[YYYY-MM-DD].md
└── ...
```
## Progress Tracking
At the start of execution, create a TodoWrite with all steps (1 through 3). Update status as each step completes.
## Workflow
### Step 1: Collect Metrics
**Role**: Data analyst
**Goal**: Parse all implementation artifacts and extract quantitative metrics
**Constraints**: Collection only — no interpretation yet
#### Sources
| Source | Metrics Extracted |
|--------|------------------|
| `batch_*_report.md` | Tasks per batch, batch count, task statuses (Done/Blocked/Partial) |
| Code review sections in batch reports | PASS/FAIL/PASS_WITH_WARNINGS ratios, finding counts by severity and category |
| Task spec files in TASKS_DIR | Complexity points per task, dependency count |
| `FINAL_implementation_report.md` | Total tasks, total batches, overall duration |
| Git log (if available) | Commits per batch, files changed per batch |
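Verdict extraction from batch reports can be as simple as a grep tally; a sketch assuming verdicts appear as literal `PASS` / `FAIL` / `PASS_WITH_WARNINGS` tokens in the reports:

```shell
# Sketch: tally code-review verdicts across all batch reports.
tally_verdicts() {
  grep -rhoE 'PASS_WITH_WARNINGS|PASS|FAIL' "$1"/batch_*_report.md 2>/dev/null \
    | sort | uniq -c | sort -rn
}
# tally_verdicts _docs/03_implementation
```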
#### Metrics to Compute
**Implementation Metrics**:
- Total tasks implemented
- Total batches executed
- Average tasks per batch
- Average complexity points per batch
- Total complexity points delivered
**Quality Metrics**:
- Code review pass rate (PASS / total reviews)
- Code review findings by severity: Critical, High, Medium, Low counts
- Code review findings by category: Bug, Spec-Gap, Security, Performance, Maintainability, Style, Scope
- FAIL count (batches that required user intervention)
**Efficiency Metrics**:
- Blocked task count and reasons
- Tasks completed on first attempt vs requiring fixes
- Batch with most findings (identify problem areas)
**Self-verification**:
- [ ] All batch reports parsed
- [ ] All metric categories computed
- [ ] No batch reports missed
---
### Step 2: Analyze Trends
**Role**: Process improvement analyst
**Goal**: Identify patterns, recurring issues, and improvement opportunities
**Constraints**: Analysis must be grounded in the metrics from Step 1
1. If previous retrospective reports exist in METRICS_DIR, load the most recent one for comparison
2. Identify patterns:
- **Recurring findings**: which code review categories appear most frequently?
- **Problem components**: which components/files generate the most findings?
- **Complexity accuracy**: do high-complexity tasks actually produce more issues?
- **Blocker patterns**: what types of blockers occur and can they be prevented?
3. Compare against previous retrospective (if exists):
- Which metrics improved?
- Which metrics degraded?
- Were previous improvement actions effective?
4. Identify top 3 improvement actions ranked by impact
**Self-verification**:
- [ ] Patterns are grounded in specific metrics
- [ ] Comparison with previous retro included (if exists)
- [ ] Top 3 actions are concrete and actionable
---
### Step 3: Produce Report
**Role**: Technical writer
**Goal**: Write a structured retrospective report with metrics, trends, and recommendations
**Constraints**: Concise, data-driven, actionable
Write `METRICS_DIR/retro_[YYYY-MM-DD].md` using `templates/retrospective-report.md` as structure.
**Self-verification**:
- [ ] All metrics from Step 1 included
- [ ] Trend analysis from Step 2 included
- [ ] Top 3 improvement actions clearly stated
- [ ] Suggested rule/skill updates are specific
**Save action**: Write `retro_[YYYY-MM-DD].md`
Present the report summary to the user.
---
## Escalation Rules
| Situation | Action |
|-----------|--------|
| No batch reports exist | **STOP** — nothing to analyze |
| Batch reports have inconsistent format | **WARN user**, extract what is available |
| No previous retrospective for comparison | PROCEED — report baseline metrics only |
| Metrics suggest systemic issue (>50% FAIL rate) | **WARN user** — suggest immediate process review |
## Methodology Quick Reference
```
┌────────────────────────────────────────────────────────────────┐
│ Retrospective (3-Step Method) │
├────────────────────────────────────────────────────────────────┤
│ PREREQ: batch reports exist in _docs/03_implementation/ │
│ │
│ 1. Collect Metrics → parse batch reports, compute metrics │
│ 2. Analyze Trends → patterns, comparison, improvement areas │
│ 3. Produce Report → _docs/05_metrics/retro_[date].md │
├────────────────────────────────────────────────────────────────┤
│ Principles: Data-driven · Actionable · Cumulative │
│ Non-judgmental · Save immediately │
└────────────────────────────────────────────────────────────────┘
```
@@ -0,0 +1,93 @@
# Retrospective Report Template
Save as `_docs/05_metrics/retro_[YYYY-MM-DD].md`.
---
```markdown
# Retrospective — [YYYY-MM-DD]
## Implementation Summary
| Metric | Value |
|--------|-------|
| Total tasks | [count] |
| Total batches | [count] |
| Total complexity points | [sum] |
| Avg tasks per batch | [value] |
| Avg complexity per batch | [value] |
## Quality Metrics
### Code Review Results
| Verdict | Count | Percentage |
|---------|-------|-----------|
| PASS | [count] | [%] |
| PASS_WITH_WARNINGS | [count] | [%] |
| FAIL | [count] | [%] |
### Findings by Severity
| Severity | Count |
|----------|-------|
| Critical | [count] |
| High | [count] |
| Medium | [count] |
| Low | [count] |
### Findings by Category
| Category | Count | Top Files |
|----------|-------|-----------|
| Bug | [count] | [most affected files] |
| Spec-Gap | [count] | [most affected files] |
| Security | [count] | [most affected files] |
| Performance | [count] | [most affected files] |
| Maintainability | [count] | [most affected files] |
| Style | [count] | [most affected files] |
| Scope | [count] | [most affected files] |
## Efficiency
| Metric | Value |
|--------|-------|
| Blocked tasks | [count] |
| Tasks requiring fixes after review | [count] |
| Batch with most findings | Batch [N] — [reason] |
### Blocker Analysis
| Blocker Type | Count | Prevention |
|-------------|-------|-----------|
| [type] | [count] | [suggested prevention] |
## Trend Comparison
| Metric | Previous | Current | Change |
|--------|----------|---------|--------|
| Pass rate | [%] | [%] | [+/-] |
| Avg findings per batch | [value] | [value] | [+/-] |
| Blocked tasks | [count] | [count] | [+/-] |
*Previous retrospective: [date or "N/A — first retro"]*
## Top 3 Improvement Actions
1. **[Action title]**: [specific, actionable description]
- Impact: [expected improvement]
- Effort: [low/medium/high]
2. **[Action title]**: [specific, actionable description]
- Impact: [expected improvement]
- Effort: [low/medium/high]
3. **[Action title]**: [specific, actionable description]
- Impact: [expected improvement]
- Effort: [low/medium/high]
## Suggested Rule/Skill Updates
| File | Change | Rationale |
|------|--------|-----------|
| [.cursor/rules/... or .cursor/skills/...] | [specific change] | [based on which metric] |
```
@@ -49,19 +49,8 @@ When testing security or conducting audits:
- Validating input sanitization
- Reviewing security configuration
### OWASP Top 10 (2021)
| # | Vulnerability | Key Test |
|---|---------------|----------|
| 1 | Broken Access Control | User A accessing User B's data |
| 2 | Cryptographic Failures | Plaintext passwords, HTTP |
| 3 | Injection | SQL/XSS/command injection |
| 4 | Insecure Design | Rate limiting, session timeout |
| 5 | Security Misconfiguration | Verbose errors, exposed /admin |
| 6 | Vulnerable Components | npm audit, outdated packages |
| 7 | Auth Failures | Weak passwords, no MFA |
| 8 | Integrity Failures | Unsigned updates, malware |
| 9 | Logging Failures | No audit trail for breaches |
| 10 | SSRF | Server fetching internal URLs |
### OWASP Top 10
Use the most recent **stable** version of the OWASP Top 10. At the start of each security audit, research the current version at https://owasp.org/www-project-top-ten/ and test against all listed categories. Do not rely on a hardcoded list — the OWASP Top 10 is updated periodically and the current version must be verified.
### Tools
| Type | Tool | Purpose |