diff --git a/.cursor/README.md b/.cursor/README.md index 819f3f6..e373682 100644 --- a/.cursor/README.md +++ b/.cursor/README.md @@ -20,23 +20,20 @@ 5. /implement — auto-orchestrates all tasks: batches by dependencies, launches parallel implementers, runs code review, loops until done -6. /implement-black-box-tests — E2E tests via Docker consumer app (after all tasks) - -7. commit & push +6. commit & push ``` ### SHIP (deploy and operate) ``` -8. /implement-cicd — validate/enhance CI/CD pipeline -9. /deploy — deployment strategy per environment -10. /observability — monitoring, logging, alerting plan +7. /deploy — containerization, CI/CD, environment strategy, observability, deployment procedures (skill, 5-step workflow) ``` ### EVOLVE (maintenance and improvement) ``` -11. /refactor — structured refactoring (skill, 6-phase workflow) +8. /refactor — structured refactoring (skill, 6-phase workflow) +9. /retrospective — collect metrics, analyze trends, produce improvement report (skill) ``` ## Implementation Flow @@ -67,29 +64,25 @@ Multi-phase code review invoked after each implementation batch: Produces structured findings with severity (Critical/High/Medium/Low) and verdict (PASS/FAIL/PASS_WITH_WARNINGS). -### `/implement-black-box-tests` - -Reads `_docs/02_plans/integration_tests/` (produced by plan skill Step 1). Builds a separate Docker-based consumer app that exercises the system as a black box — no internal imports, no direct DB access. Runs E2E scenarios, produces a CSV test report. - -Run after all tasks are done. - -### `/implement-cicd` - -Reviews existing CI/CD pipeline configuration, validates all stages work, optimizes performance (parallelization, caching), ensures quality gates are enforced (coverage, linting, security scanning). - -Run after `/implement` or after all tasks. - ### `/deploy` -Defines deployment strategy per environment: deployment procedures, rollback procedures, health checks, deployment checklist. 
Outputs `_docs/02_components/deployment_strategy.md`. +Comprehensive deployment skill (5-step workflow): -Run before first production release. +1. Containerization — Dockerfiles per component, docker-compose for dev and tests +2. CI/CD Pipeline — lint, test, security scan, build, deploy with quality gates +3. Environment Strategy — dev, staging, production with secrets management +4. Observability — structured logging, metrics, tracing, alerting, dashboards +5. Deployment Procedures — rollback, health checks, graceful shutdown -### `/observability` +Outputs to `_docs/02_plans/deployment/`. Run after `/implement` or before first production release. -Plans logging strategy, metrics collection, distributed tracing, alerting rules, and dashboards. Outputs `_docs/02_components/observability_plan.md`. +### `/retrospective` -Run before first production release. +Collects metrics from batch reports and code review findings, analyzes trends across implementation cycles, and produces improvement reports. Outputs to `_docs/05_metrics/`. + +### `/rollback` + +Reverts implementation to a specific batch checkpoint using git revert, resets Jira ticket statuses, and verifies rollback integrity with tests. 
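The revert mechanics behind `/rollback` can be sketched in shell. This is a hedged illustration, assuming batch checkpoints are plain commit hashes recorded in the batch reports; `rollback_to_checkpoint` is a hypothetical helper, and the Jira resets and rollback report are separate steps of the command:

```shell
# Revert every commit made after the target batch checkpoint, newest first.
# Each revert becomes its own commit, so history is preserved and nothing
# is ever force-pushed.
rollback_to_checkpoint() {
  target="$1"    # commit hash recorded in the batch report
  git revert --no-edit "$target"..HEAD
}
```

After the reverts land, the command runs the full test suite and only then transitions the reverted Jira tickets back to "To Do".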
### Commit @@ -106,6 +99,8 @@ After each confirmed batch, the `/implement` skill automatically commits and pus | **code-review** | "code review", "review code" | 6-phase structured review with findings | | **refactor** | "refactor", "refactoring", "improve code" | 6-phase structured refactoring workflow | | **security** | "security audit", "OWASP" | OWASP-based security testing | +| **deploy** | "deploy", "CI/CD", "containerize", "observability" | Containerization, CI/CD, observability, deployment procedures | +| **retrospective** | "retrospective", "retro", "metrics review" | Collect metrics, analyze trends, produce improvement report | ## Project Folder Structure @@ -134,6 +129,7 @@ _docs/ ├── 02_plans/ │ ├── architecture.md │ ├── system-flows.md +│ ├── data_model.md │ ├── risk_mitigations.md │ ├── components/ │ │ └── [##]_[name]/ @@ -146,6 +142,12 @@ _docs/ │ │ ├── functional_tests.md │ │ ├── non_functional_tests.md │ │ └── traceability_matrix.md +│ ├── deployment/ +│ │ ├── containerization.md +│ │ ├── ci_cd_pipeline.md +│ │ ├── environment_strategy.md +│ │ ├── observability.md +│ │ └── deployment_procedures.md │ ├── diagrams/ │ └── FINAL_report.md ├── 02_tasks/ @@ -158,15 +160,17 @@ _docs/ │ ├── batch_02_report.md │ ├── ... │ └── FINAL_implementation_report.md -└── 04_refactoring/ - ├── baseline_metrics.md - ├── discovery/ - ├── analysis/ - ├── test_specs/ - ├── coupling_analysis.md - ├── execution_log.md - ├── hardening/ - └── FINAL_report.md +├── 04_refactoring/ +│ ├── baseline_metrics.md +│ ├── discovery/ +│ ├── analysis/ +│ ├── test_specs/ +│ ├── coupling_analysis.md +│ ├── execution_log.md +│ ├── hardening/ +│ └── FINAL_report.md +└── 05_metrics/ + └── retro_[date].md ``` ## Implementation Tools @@ -176,10 +180,9 @@ _docs/ | `implementer` | Subagent | Implements a single task from its spec. Launched by /implement. | | `/implement` | Skill | Orchestrates all tasks: dependency batching, parallel agents, code review. 
| | `/code-review` | Skill | Multi-phase code review with structured findings. | -| `/implement-black-box-tests` | Command | E2E tests via Docker consumer app. After all tasks. | -| `/implement-cicd` | Command | Validate and enhance CI/CD pipeline. | -| `/deploy` | Command | Plan deployment strategy per environment. | -| `/observability` | Command | Plan logging, metrics, tracing, alerting. | +| `/deploy` | Skill | Containerization, CI/CD, observability, deployment procedures. | +| `/retrospective` | Skill | Collect metrics, analyze trends, produce improvement reports. | +| `/rollback` | Command | Revert to a batch checkpoint with Jira status reset. | ## Automations (Planned) diff --git a/.cursor/commands/deploy.md b/.cursor/commands/deploy.md deleted file mode 100644 index 4605aa2..0000000 --- a/.cursor/commands/deploy.md +++ /dev/null @@ -1,71 +0,0 @@ -# Deployment Strategy Planning - -## Initial data: - - Problem description: `@_docs/00_problem/problem_description.md` - - Restrictions: `@_docs/00_problem/restrictions.md` - - Full Solution Description: `@_docs/01_solution/solution.md` - - Components: `@_docs/02_components` - - Environment Strategy: `@_docs/00_templates/environment_strategy.md` - -## Role - You are a DevOps/Platform engineer - -## Task - - Define deployment strategy for each environment - - Plan deployment procedures and automation - - Define rollback procedures - - Establish deployment verification steps - - Document manual intervention points - -## Output - -### Deployment Architecture - - Infrastructure diagram (where components run) - - Network topology - - Load balancing strategy - - Container/VM configuration - -### Deployment Procedures - -#### Staging Deployment - - Trigger conditions - - Pre-deployment checks - - Deployment steps - - Post-deployment verification - - Smoke tests to run - -#### Production Deployment - - Approval workflow - - Deployment window - - Pre-deployment checks - - Deployment steps (blue-green, rolling, canary) - - 
Post-deployment verification - - Smoke tests to run - -### Rollback Procedures - - Rollback trigger criteria - - Rollback steps per environment - - Data rollback considerations - - Communication plan during rollback - -### Health Checks - - Liveness probe configuration - - Readiness probe configuration - - Custom health endpoints - -### Deployment Checklist - - [ ] All tests pass in CI - - [ ] Security scan clean - - [ ] Database migrations reviewed - - [ ] Feature flags configured - - [ ] Monitoring alerts configured - - [ ] Rollback plan documented - - [ ] Stakeholders notified - -Store output to `_docs/02_components/deployment_strategy.md` - -## Notes - - Prefer automated deployments over manual - - Zero-downtime deployments for production - - Always have a rollback plan - - Ask questions about infrastructure constraints diff --git a/.cursor/commands/implement-cicd.md b/.cursor/commands/implement-cicd.md deleted file mode 100644 index 282ec2c..0000000 --- a/.cursor/commands/implement-cicd.md +++ /dev/null @@ -1,64 +0,0 @@ -# CI/CD Pipeline Validation & Enhancement - -## Initial data: - - Problem description: `@_docs/00_problem/problem_description.md` - - Restrictions: `@_docs/00_problem/restrictions.md` - - Full Solution Description: `@_docs/01_solution/solution.md` - - Components: `@_docs/02_components` - - Environment Strategy: `@_docs/00_templates/environment_strategy.md` - -## Role - You are a DevOps engineer - -## Task - - Review existing CI/CD pipeline configuration - - Validate all stages are working correctly - - Optimize pipeline performance (parallelization, caching) - - Ensure test coverage gates are enforced - - Verify security scanning is properly configured - - Add missing quality gates - -## Checklist - -### Pipeline Health - - [ ] All stages execute successfully - - [ ] Build time is acceptable (<10 min for most projects) - - [ ] Caching is properly configured (dependencies, build artifacts) - - [ ] Parallel execution where possible - -### 
Quality Gates - - [ ] Code coverage threshold enforced (minimum 75%) - - [ ] Linting errors block merge - - [ ] Security vulnerabilities block merge (critical/high) - - [ ] All tests must pass - -### Environment Deployments - - [ ] Staging deployment works on merge to stage branch - - [ ] Environment variables properly configured per environment - - [ ] Secrets are securely managed (not in code) - - [ ] Rollback procedure documented - -### Monitoring - - [ ] Build notifications configured (Slack, email, etc.) - - [ ] Failed build alerts - - [ ] Deployment success/failure notifications - -## Output - -### Pipeline Status Report - - Current pipeline configuration summary - - Issues found and fixes applied - - Performance metrics (build times) - -### Recommended Improvements - - Short-term improvements - - Long-term optimizations - -### Quality Gate Configuration - - Thresholds configured - - Enforcement rules - -## Notes - - Do not break existing functionality - - Test changes in separate branch first - - Document any manual steps required diff --git a/.cursor/commands/observability.md b/.cursor/commands/observability.md deleted file mode 100644 index a9be136..0000000 --- a/.cursor/commands/observability.md +++ /dev/null @@ -1,122 +0,0 @@ -# Observability Planning - -## Initial data: - - Problem description: `@_docs/00_problem/problem_description.md` - - Full Solution Description: `@_docs/01_solution/solution.md` - - Components: `@_docs/02_components` - - Deployment Strategy: `@_docs/02_components/deployment_strategy.md` - -## Role - You are a Site Reliability Engineer (SRE) - -## Task - - Define logging strategy across all components - - Plan metrics collection and dashboards - - Design distributed tracing (if applicable) - - Establish alerting rules - - Document incident response procedures - -## Output - -### Logging Strategy - -#### Log Levels -| Level | Usage | Example | -|-------|-------|---------| -| ERROR | Exceptions, failures requiring attention | Database 
connection failed | -| WARN | Potential issues, degraded performance | Retry attempt 2/3 | -| INFO | Significant business events | User registered, Order placed | -| DEBUG | Detailed diagnostic information | Request payload, Query params | - -#### Log Format -```json -{ - "timestamp": "ISO8601", - "level": "INFO", - "service": "service-name", - "correlation_id": "uuid", - "message": "Event description", - "context": {} -} -``` - -#### Log Storage -- Development: Console/file -- Staging: Centralized (ELK, CloudWatch, etc.) -- Production: Centralized with retention policy - -### Metrics - -#### System Metrics -- CPU usage -- Memory usage -- Disk I/O -- Network I/O - -#### Application Metrics -| Metric | Type | Description | -|--------|------|-------------| -| request_count | Counter | Total requests | -| request_duration | Histogram | Response time | -| error_count | Counter | Failed requests | -| active_connections | Gauge | Current connections | - -#### Business Metrics -- [Define based on acceptance criteria] - -### Distributed Tracing - -#### Trace Context -- Correlation ID propagation -- Span naming conventions -- Sampling strategy - -#### Integration Points -- HTTP headers -- Message queue metadata -- Database query tagging - -### Alerting - -#### Alert Categories -| Severity | Response Time | Examples | -|----------|---------------|----------| -| Critical | 5 min | Service down, Data loss | -| High | 30 min | High error rate, Performance degradation | -| Medium | 4 hours | Elevated latency, Disk usage high | -| Low | Next business day | Non-critical warnings | - -#### Alert Rules -```yaml -alerts: - - name: high_error_rate - condition: error_rate > 5% - duration: 5m - severity: high - - - name: service_down - condition: health_check_failed - duration: 1m - severity: critical -``` - -### Dashboards - -#### Operations Dashboard -- Service health status -- Request rate and error rate -- Response time percentiles -- Resource utilization - -#### Business Dashboard 
-- Key business metrics -- User activity -- Transaction volumes - -Store output to `_docs/02_components/observability_plan.md` - -## Notes - - Follow the principle: "If it's not monitored, it's not in production" - - Balance verbosity with cost - - Ensure PII is not logged - - Plan for log rotation and retention diff --git a/.cursor/commands/rollback.md b/.cursor/commands/rollback.md new file mode 100644 index 0000000..0e39bb9 --- /dev/null +++ b/.cursor/commands/rollback.md @@ -0,0 +1,54 @@ +# Implementation Rollback + +## Role +You are a DevOps engineer performing a controlled rollback of implementation batches. + +## Input +- User specifies a target batch number or commit hash to roll back to +- If not specified, present the list of available batch checkpoints and ask + +## Process + +1. Read `_docs/03_implementation/batch_*_report.md` files to identify all batch checkpoints with their commit hashes +2. Present batch list to user with: batch number, date, tasks included, commit hash +3. Determine which commits need to be reverted (all commits after the target batch) +4. For each commit to revert (in reverse chronological order): + - Run `git revert --no-edit` + - Verify no merge conflicts; if conflicts occur, ask user for resolution +5. Run the full test suite to verify rollback integrity +6. If tests fail, report failures and ask user how to proceed +7. Reset Jira ticket statuses for all reverted tasks back to "To Do" via Jira MCP +8. 
Push the revert commits (each one was already created by `git revert` in step 4) and record the rollback with message: `[ROLLBACK] Reverted to batch [N]: [task list]` + + ## Output + + Write `_docs/03_implementation/rollback_report.md`: + + ```markdown + # Rollback Report + + **Date**: [YYYY-MM-DD] + **Target**: Batch [N] (commit [hash]) + **Reverted Batches**: [list] + + ## Reverted Tasks + + | Task | Batch | Status Before | Status After | + |------|-------|--------------|-------------| + | [JIRA-ID] | [batch #] | In Testing | To Do | + + ## Test Results + - [pass/fail count] + + ## Jira Updates + - [list of ticket transitions] + + ## Notes + - [any conflicts, manual steps, or issues encountered] + ``` + + ## Safety Rules + - Never force-push; always use `git revert` to preserve history + - Always run tests after rollback + - Always update Jira statuses for reverted tasks + - If rollback fails midway, stop and report — do not leave the codebase in a partial state diff --git a/.cursor/rules/git-workflow.mdc b/.cursor/rules/git-workflow.mdc new file mode 100644 index 0000000..2ab10c1 --- /dev/null +++ b/.cursor/rules/git-workflow.mdc @@ -0,0 +1,8 @@ +--- +description: "Git workflow: work on dev branch, commit message format with Jira IDs" +alwaysApply: true +--- +# Git Workflow + +- Work on the `dev` branch +- Commit message format: `[JIRA-ID-1] [JIRA-ID-2] Summary of changes` diff --git a/.cursor/skills/decompose/SKILL.md b/.cursor/skills/decompose/SKILL.md index d995bf9..8fac9a3 100644 --- a/.cursor/skills/decompose/SKILL.md +++ b/.cursor/skills/decompose/SKILL.md @@ -124,15 +124,35 @@ At the start of execution, create a TodoWrite with all applicable steps. Update **Goal**: Produce `01_initial_structure.md` — the first task describing the project skeleton **Constraints**: This is a plan document, not code. The `/implement` skill executes it. -1. Read architecture.md, all component specs, and system-flows.md from PLANS_DIR +1. Read architecture.md, all component specs, system-flows.md, data_model.md, and `deployment/` from PLANS_DIR 2.
Read problem, solution, and restrictions from `_docs/00_problem/` and `_docs/01_solution/` 3. Research best implementation patterns for the identified tech stack 4. Document the structure plan using `templates/initial-structure-task.md` +The bootstrap structure plan must include: +- Project folder layout with all component directories +- Shared models, interfaces, and DTOs +- Dockerfile per component (multi-stage, non-root, health checks, pinned base images) +- `docker-compose.yml` for local development (all components + database + dependencies) +- `docker-compose.test.yml` for integration test environment (black-box test runner) +- `.dockerignore` +- CI/CD pipeline file (`.github/workflows/ci.yml` or `azure-pipelines.yml`) with stages from `deployment/ci_cd_pipeline.md` +- Database migration setup and initial seed data scripts +- Observability configuration: structured logging setup, health check endpoints (`/health/live`, `/health/ready`), metrics endpoint (`/metrics`) +- Environment variable documentation (`.env.example`) +- Test structure with unit and integration test locations + **Self-verification**: - [ ] All components have corresponding folders in the layout - [ ] All inter-component interfaces have DTOs defined -- [ ] CI/CD stages cover build, lint, test, security, deploy +- [ ] Dockerfile defined for each component +- [ ] `docker-compose.yml` covers all components and dependencies +- [ ] `docker-compose.test.yml` enables black-box integration testing +- [ ] CI/CD pipeline file defined with lint, test, security, build, deploy stages +- [ ] Database migration setup included +- [ ] Health check endpoints specified for each service +- [ ] Structured logging configuration included +- [ ] `.env.example` with all required environment variables - [ ] Environment strategy covers dev, staging, production - [ ] Test structure includes unit and integration test locations diff --git a/.cursor/skills/deploy/SKILL.md b/.cursor/skills/deploy/SKILL.md new file mode 
100644 index 0000000..6f496ef --- /dev/null +++ b/.cursor/skills/deploy/SKILL.md @@ -0,0 +1,363 @@ +--- +name: deploy +description: | + Comprehensive deployment skill covering containerization, CI/CD pipeline, environment strategy, observability, and deployment procedures. + 5-step workflow: Docker containerization, CI/CD pipeline definition, environment strategy, observability planning, deployment procedures. + Uses _docs/02_plans/deployment/ structure. + Trigger phrases: + - "deploy", "deployment", "deployment strategy" + - "CI/CD", "pipeline", "containerize" + - "observability", "monitoring", "logging" + - "dockerize", "docker compose" +category: ship +tags: [deployment, docker, ci-cd, observability, monitoring, containerization] +disable-model-invocation: true +--- + +# Deployment Planning + +Plan and document the full deployment lifecycle: containerize the application, define CI/CD pipelines, configure environments, set up observability, and document deployment procedures. + +## Core Principles + +- **Docker-first**: every component runs in a container; local dev, integration tests, and production all use Docker +- **Infrastructure as code**: all deployment configuration is version-controlled +- **Observability built-in**: logging, metrics, and tracing are part of the deployment plan, not afterthoughts +- **Environment parity**: dev, staging, and production environments mirror each other as closely as possible +- **Save immediately**: write artifacts to disk after each step; never accumulate unsaved work +- **Ask, don't assume**: when infrastructure constraints or preferences are unclear, ask the user +- **Plan, don't code**: this workflow produces deployment documents and specifications, not implementation code + +## Context Resolution + +Fixed paths: + +- PLANS_DIR: `_docs/02_plans/` +- DEPLOY_DIR: `_docs/02_plans/deployment/` +- ARCHITECTURE: `_docs/02_plans/architecture.md` +- COMPONENTS_DIR: `_docs/02_plans/components/` + +Announce the resolved paths to 
the user before proceeding. + +## Input Specification + +### Required Files + +| File | Purpose | +|------|---------| +| `_docs/00_problem/problem.md` | Problem description and context | +| `_docs/00_problem/restrictions.md` | Constraints and limitations | +| `_docs/01_solution/solution.md` | Finalized solution | +| `PLANS_DIR/architecture.md` | Architecture from plan skill | +| `PLANS_DIR/components/` | Component specs | + +### Prerequisite Checks (BLOCKING) + +1. `architecture.md` exists — **STOP if missing**, run `/plan` first +2. At least one component spec exists in `PLANS_DIR/components/` — **STOP if missing** +3. Create DEPLOY_DIR if it does not exist +4. If DEPLOY_DIR already contains artifacts, ask user: **resume from last checkpoint or start fresh?** + +## Artifact Management + +### Directory Structure + +``` +DEPLOY_DIR/ +├── containerization.md +├── ci_cd_pipeline.md +├── environment_strategy.md +├── observability.md +└── deployment_procedures.md +``` + +### Save Timing + +| Step | Save immediately after | Filename | +|------|------------------------|----------| +| Step 1 | Containerization plan complete | `containerization.md` | +| Step 2 | CI/CD pipeline defined | `ci_cd_pipeline.md` | +| Step 3 | Environment strategy documented | `environment_strategy.md` | +| Step 4 | Observability plan complete | `observability.md` | +| Step 5 | Deployment procedures documented | `deployment_procedures.md` | + +### Resumability + +If DEPLOY_DIR already contains artifacts: + +1. List existing files and match to the save timing table +2. Identify the last completed step +3. Resume from the next incomplete step +4. Inform the user which steps are being skipped + +## Progress Tracking + +At the start of execution, create a TodoWrite with all steps (1 through 5). Update status as each step completes. 
+ + ## Workflow + + ### Step 1: Containerization + + **Role**: DevOps / Platform engineer + **Goal**: Define Docker configuration for every component, local development, and integration test environments + **Constraints**: Plan only — no Dockerfile creation. Describe what each Dockerfile should contain. + + 1. Read architecture.md and all component specs + 2. Read restrictions.md for infrastructure constraints + 3. Research best Docker practices for the project's tech stack (multi-stage builds, base image selection, layer optimization) + 4. For each component, define: + - Base image (pinned version, prefer alpine/distroless for production) + - Build stages (dependency install, build, production) + - Non-root user configuration + - Health check endpoint and command + - Exposed ports + - `.dockerignore` contents + 5. Define `docker-compose.yml` for local development: + - All application components + - Database (Postgres) with named volume + - Any message queues, caches, or external service mocks + - Shared network + - Environment variable files (`.env.dev`) + 6. Define `docker-compose.test.yml` for integration tests: + - Application components under test + - Test runner container (black-box, no internal imports) + - Isolated database with seed data + - All tests runnable via `docker compose -f docker-compose.test.yml up --abort-on-container-exit` + 7. Define image tagging strategy: `[registry]/[project]/[component]:[git-sha]` for CI, `latest` for local dev only + + **Self-verification**: + - [ ] Every component has a Dockerfile specification + - [ ] Multi-stage builds specified for all production images + - [ ] Non-root user for all containers + - [ ] Health checks defined for every service + - [ ] docker-compose.yml covers all components + dependencies + - [ ] docker-compose.test.yml enables black-box integration testing + - [ ] `.dockerignore` defined + + **Save action**: Write `containerization.md` using `templates/containerization.md` + + **BLOCKING**: Present containerization plan to user. Do NOT proceed until confirmed.
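To make the per-component specification in Step 1 concrete, here is a hedged sketch of what one such Dockerfile specification might describe, assuming a hypothetical Node.js component (the base image version, port, and paths are illustrative assumptions, not decisions of this plan):

```dockerfile
# syntax=docker/dockerfile:1
# --- build stage: install dependencies and compile ---
FROM node:20.11-alpine AS build          # pinned base image, never :latest
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# --- production stage: minimal runtime, non-root user, health check ---
FROM node:20.11-alpine
WORKDIR /app
RUN addgroup -S app && adduser -S app -G app
COPY --from=build /app/dist ./dist
COPY --from=build /app/node_modules ./node_modules
USER app
EXPOSE 3000
HEALTHCHECK --interval=10s --timeout=3s --retries=3 \
  CMD wget -qO- http://localhost:3000/health/live || exit 1
CMD ["node", "dist/server.js"]
```

The same structure (pinned base, multi-stage build, non-root user, health check) carries over to other stacks; only the build and runtime stages change.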
+ +--- + +### Step 2: CI/CD Pipeline + +**Role**: DevOps engineer +**Goal**: Define the CI/CD pipeline with quality gates, security scanning, and multi-environment deployment +**Constraints**: Pipeline definition only — produce YAML specification, not implementation + +1. Read architecture.md for tech stack and deployment targets +2. Read restrictions.md for CI/CD constraints (cloud provider, registry, etc.) +3. Research CI/CD best practices for the project's platform (GitHub Actions / Azure Pipelines) +4. Define pipeline stages: + +| Stage | Trigger | Steps | Quality Gate | +|-------|---------|-------|-------------| +| **Lint** | Every push | Run linters per language (black, rustfmt, prettier, dotnet format) | Zero errors | +| **Test** | Every push | Unit tests, integration tests, coverage report | 75%+ coverage | +| **Security** | Every push | Dependency audit, SAST scan (Semgrep/SonarQube), image scan (Trivy) | Zero critical/high CVEs | +| **Build** | PR merge to dev | Build Docker images, tag with git SHA | Build succeeds | +| **Push** | After build | Push to container registry | Push succeeds | +| **Deploy Staging** | After push | Deploy to staging environment | Health checks pass | +| **Smoke Tests** | After staging deploy | Run critical path tests against staging | All pass | +| **Deploy Production** | Manual approval | Deploy to production | Health checks pass | + +5. Define caching strategy: dependency caches, Docker layer caches, build artifact caches +6. Define parallelization: which stages can run concurrently +7. 
Define notifications: build failures, deployment status, security alerts + +**Self-verification**: +- [ ] All pipeline stages defined with triggers and gates +- [ ] Coverage threshold enforced (75%+) +- [ ] Security scanning included (dependencies + images + SAST) +- [ ] Caching configured for dependencies and Docker layers +- [ ] Multi-environment deployment (staging → production) +- [ ] Rollback procedure referenced +- [ ] Notifications configured + +**Save action**: Write `ci_cd_pipeline.md` using `templates/ci_cd_pipeline.md` + +--- + +### Step 3: Environment Strategy + +**Role**: Platform engineer +**Goal**: Define environment configuration, secrets management, and environment parity +**Constraints**: Strategy document — no secrets or credentials in output + +1. Define environments: + +| Environment | Purpose | Infrastructure | Data | +|-------------|---------|---------------|------| +| **Development** | Local developer workflow | docker-compose, local volumes | Seed data, mocks for external APIs | +| **Staging** | Pre-production validation | Mirrors production topology | Anonymized production-like data | +| **Production** | Live system | Full infrastructure | Real data | + +2. Define environment variable management: + - `.env.example` with all required variables (no real values) + - Per-environment variable sources (`.env` for dev, secret manager for staging/prod) + - Validation: fail fast on missing required variables at startup +3. Define secrets management: + - Never commit secrets to version control + - Development: `.env` files (git-ignored) + - Staging/Production: secret manager (AWS Secrets Manager / Azure Key Vault / Vault) + - Rotation policy +4. 
Define database management per environment: + - Development: Docker Postgres with named volume, seed data + - Staging: managed Postgres, migrations applied via CI/CD + - Production: managed Postgres, migrations require approval + + **Self-verification**: + - [ ] All three environments defined with clear purpose + - [ ] Environment variable documentation complete (`.env.example`) + - [ ] No secrets in any output document + - [ ] Secret manager specified for staging/production + - [ ] Database strategy per environment + + **Save action**: Write `environment_strategy.md` using `templates/environment_strategy.md` + + --- + + ### Step 4: Observability + + **Role**: Site Reliability Engineer (SRE) + **Goal**: Define logging, metrics, tracing, and alerting strategy + **Constraints**: Strategy document — describe what to implement, not how to wire it + + 1. Read architecture.md and component specs for service boundaries + 2. Research observability best practices for the tech stack + + **Logging**: + - Structured JSON to stdout/stderr (no file logging in containers) + - Fields: `timestamp` (ISO 8601), `level`, `service`, `correlation_id`, `message`, `context` + - Levels: ERROR (exceptions), WARN (degraded), INFO (business events), DEBUG (diagnostics, dev only) + - No PII in logs + - Retention: dev = console, staging = 7 days, production = 30 days + + **Metrics**: + - Expose Prometheus-compatible `/metrics` endpoint per service + - System metrics: CPU, memory, disk, network + - Application metrics: `request_count`, `request_duration` (histogram), `error_count`, `active_connections` + - Business metrics: derived from acceptance criteria + - Collection interval: 15s + + **Distributed Tracing**: + - OpenTelemetry SDK integration + - Trace context propagation via HTTP headers and message queue metadata + - Span naming: `[service].[operation]` + - Sampling: 100% in dev/staging, 10% in production (adjust based on volume) + + **Alerting**: + + | Severity | Response Time | Condition Examples |
+|----------|---------------|-------------------| +| Critical | 5 min | Service down, data loss, health check failed | +| High | 30 min | Error rate > 5%, P95 latency > 2x baseline | +| Medium | 4 hours | Disk > 80%, elevated latency | +| Low | Next business day | Non-critical warnings | + +**Dashboards**: +- Operations: service health, request rate, error rate, response time percentiles, resource utilization +- Business: key business metrics from acceptance criteria + +**Self-verification**: +- [ ] Structured logging format defined with required fields +- [ ] Metrics endpoint specified per service +- [ ] OpenTelemetry tracing configured +- [ ] Alert severities with response times defined +- [ ] Dashboards cover operations and business metrics +- [ ] PII exclusion from logs addressed + +**Save action**: Write `observability.md` using `templates/observability.md` + +--- + +### Step 5: Deployment Procedures + +**Role**: DevOps / Platform engineer +**Goal**: Define deployment strategy, rollback procedures, health checks, and deployment checklist +**Constraints**: Procedures document — no implementation + +1. Define deployment strategy: + - Preferred pattern: blue-green / rolling / canary (choose based on architecture) + - Zero-downtime requirement for production + - Graceful shutdown: 30-second grace period for in-flight requests + - Database migration ordering: migrate before deploy, backward-compatible only + +2. Define health checks: + +| Check | Type | Endpoint | Interval | Threshold | +|-------|------|----------|----------|-----------| +| Liveness | HTTP GET | `/health/live` | 10s | 3 failures → restart | +| Readiness | HTTP GET | `/health/ready` | 5s | 3 failures → remove from LB | +| Startup | HTTP GET | `/health/ready` | 5s | 30 attempts max | + +3. 
Define rollback procedures: + - Trigger criteria: health check failures, error rate spike, critical alert + - Rollback steps: redeploy previous image tag, verify health, rollback database if needed + - Communication: notify stakeholders during rollback + - Post-mortem: required after every production rollback + +4. Define deployment checklist: + - [ ] All tests pass in CI + - [ ] Security scan clean (zero critical/high CVEs) + - [ ] Database migrations reviewed and tested + - [ ] Environment variables configured + - [ ] Health check endpoints responding + - [ ] Monitoring alerts configured + - [ ] Rollback plan documented and tested + - [ ] Stakeholders notified + +**Self-verification**: +- [ ] Deployment strategy chosen and justified +- [ ] Zero-downtime approach specified +- [ ] Health checks defined (liveness, readiness, startup) +- [ ] Rollback trigger criteria and steps documented +- [ ] Deployment checklist complete + +**Save action**: Write `deployment_procedures.md` using `templates/deployment_procedures.md` + +**BLOCKING**: Present deployment procedures to user. Do NOT proceed until confirmed. 
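The health-check table in Step 5 maps naturally onto orchestrator probe configuration. A sketch in Kubernetes syntax follows, with intervals and thresholds taken from that table; the orchestrator itself and the port are assumptions, since the plan does not fix either:

```yaml
# Hypothetical probe block for one service (port is illustrative).
livenessProbe:
  httpGet:
    path: /health/live
    port: 3000
  periodSeconds: 10
  failureThreshold: 3      # 3 failures -> container restarted
readinessProbe:
  httpGet:
    path: /health/ready
    port: 3000
  periodSeconds: 5
  failureThreshold: 3      # 3 failures -> removed from load balancing
startupProbe:
  httpGet:
    path: /health/ready
    port: 3000
  periodSeconds: 5
  failureThreshold: 30     # 30 attempts max before giving up
```

In a docker-compose deployment the equivalent would be a `healthcheck:` stanza per service with the same intervals.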
+ +--- + +## Escalation Rules + +| Situation | Action | +|-----------|--------| +| Unknown cloud provider or hosting | **ASK user** | +| Container registry not specified | **ASK user** | +| CI/CD platform preference unclear | **ASK user** — default to GitHub Actions | +| Secret manager not chosen | **ASK user** | +| Deployment pattern trade-offs | **ASK user** with recommendation | +| Missing architecture.md | **STOP** — run `/plan` first | + +## Common Mistakes + +- **Implementing during planning**: this workflow produces documents, not Dockerfiles or pipeline YAML +- **Hardcoding secrets**: never include real credentials in deployment documents +- **Ignoring integration test containerization**: the test environment must be containerized alongside the app +- **Skipping BLOCKING gates**: never proceed past a BLOCKING marker without user confirmation +- **Using `:latest` tags**: always pin base image versions +- **Forgetting observability**: logging, metrics, and tracing are deployment concerns, not post-deployment additions + +## Methodology Quick Reference + +``` +┌────────────────────────────────────────────────────────────────┐ +│ Deployment Planning (5-Step Method) │ +├────────────────────────────────────────────────────────────────┤ +│ PREREQ: architecture.md + component specs exist │ +│ │ +│ 1. Containerization → containerization.md │ +│ [BLOCKING: user confirms Docker plan] │ +│ 2. CI/CD Pipeline → ci_cd_pipeline.md │ +│ 3. Environment → environment_strategy.md │ +│ 4. Observability → observability.md │ +│ 5. 
Procedures → deployment_procedures.md           │
+│    [BLOCKING: user confirms deployment plan]                   │
+├────────────────────────────────────────────────────────────────┤
+│ Principles: Docker-first · IaC · Observability built-in        │
+│ Environment parity · Save immediately                          │
+└────────────────────────────────────────────────────────────────┘
+```
diff --git a/.cursor/skills/deploy/templates/ci_cd_pipeline.md b/.cursor/skills/deploy/templates/ci_cd_pipeline.md
new file mode 100644
index 0000000..d21c1f4
--- /dev/null
+++ b/.cursor/skills/deploy/templates/ci_cd_pipeline.md
@@ -0,0 +1,87 @@
+# CI/CD Pipeline Template
+
+Save as `_docs/02_plans/deployment/ci_cd_pipeline.md`.
+
+---
+
+```markdown
+# [System Name] — CI/CD Pipeline
+
+## Pipeline Overview
+
+| Stage | Trigger | Quality Gate |
+|-------|---------|-------------|
+| Lint | Every push | Zero lint errors |
+| Test | Every push | 75%+ coverage, all tests pass |
+| Security | Every push | Zero critical/high CVEs |
+| Build | PR merge to dev | Docker build succeeds |
+| Push | After build | Images pushed to registry |
+| Deploy Staging | After push | Health checks pass |
+| Smoke Tests | After staging deploy | Critical paths pass |
+| Deploy Production | Manual approval | Health checks pass |
+
+## Stage Details
+
+### Lint
+- [Language-specific linters and formatters]
+- Runs in parallel per language
+
+### Test
+- Unit tests: [framework and command]
+- Integration tests: [framework and command, uses docker-compose.test.yml]
+- Coverage threshold: 75% overall, 90% critical paths
+- Coverage report published as pipeline artifact
+
+### Security
+- Dependency audit: [tool, e.g., npm audit / pip-audit / dotnet list package --vulnerable]
+- SAST scan: [tool, e.g., Semgrep / SonarQube]
+- Image scan: Trivy on built Docker images
+- Block on: critical or high severity findings
+
+### Build
+- Docker images built using multi-stage Dockerfiles
+- Tagged with git SHA: `<registry>/<image>:<git-sha>`
+- Build cache: Docker layer cache via CI cache action
+
+### 
Push +- Registry: [container registry URL] +- Authentication: [method] + +### Deploy Staging +- Deployment method: [docker compose / Kubernetes / cloud service] +- Pre-deploy: run database migrations +- Post-deploy: verify health check endpoints +- Automated rollback on health check failure + +### Smoke Tests +- Subset of integration tests targeting staging environment +- Validates critical user flows +- Timeout: [maximum duration] + +### Deploy Production +- Requires manual approval via [mechanism] +- Deployment strategy: [blue-green / rolling / canary] +- Pre-deploy: database migration review +- Post-deploy: health checks + monitoring for 15 min + +## Caching Strategy + +| Cache | Key | Restore Keys | +|-------|-----|-------------| +| Dependencies | [lockfile hash] | [partial match] | +| Docker layers | [Dockerfile hash] | [partial match] | +| Build artifacts | [source hash] | [partial match] | + +## Parallelization + +[Diagram or description of which stages run concurrently] + +## Notifications + +| Event | Channel | Recipients | +|-------|---------|-----------| +| Build failure | [Slack/email] | [team] | +| Security alert | [Slack/email] | [team + security] | +| Deploy success | [Slack] | [team] | +| Deploy failure | [Slack/email + PagerDuty] | [on-call] | +``` diff --git a/.cursor/skills/deploy/templates/containerization.md b/.cursor/skills/deploy/templates/containerization.md new file mode 100644 index 0000000..db982a7 --- /dev/null +++ b/.cursor/skills/deploy/templates/containerization.md @@ -0,0 +1,94 @@ +# Containerization Plan Template + +Save as `_docs/02_plans/deployment/containerization.md`. 
+
+---
+
+```markdown
+# [System Name] — Containerization
+
+## Component Dockerfiles
+
+### [Component Name]
+
+| Property | Value |
+|----------|-------|
+| Base image | [e.g., mcr.microsoft.com/dotnet/aspnet:8.0-alpine] |
+| Build image | [e.g., mcr.microsoft.com/dotnet/sdk:8.0-alpine] |
+| Stages | [dependency install → build → production] |
+| User | [non-root user name] |
+| Health check | [endpoint and command] |
+| Exposed ports | [port list] |
+| Key build args | [if any] |
+
+### [Repeat for each component]
+
+## Docker Compose — Local Development
+
+```yaml
+# docker-compose.yml structure
+services:
+  [component]:
+    build: ./[path]
+    ports: ["host:container"]
+    environment: [reference .env.dev]
+    depends_on: [dependencies with health condition]
+    healthcheck: [command, interval, timeout, retries]
+
+  db:
+    image: [postgres:version-alpine]
+    volumes: [named volume]
+    environment: [credentials from .env.dev]
+    healthcheck: [pg_isready]
+
+volumes:
+  [named volumes]
+
+networks:
+  [shared network]
+```
+
+## Docker Compose — Integration Tests
+
+```yaml
+# docker-compose.test.yml structure
+services:
+  [app components under test]
+
+  test-runner:
+    build: ./tests/integration
+    depends_on: [app components with health condition]
+    environment: [test configuration]
+    # Exit code determines test pass/fail
+
+  db:
+    image: [postgres:version-alpine]
+    volumes: [seed data mount]
+```
+
+Run: `docker compose -f docker-compose.test.yml up --abort-on-container-exit`
+
+## Image Tagging Strategy
+
+| Context | Tag Format | Example |
+|---------|-----------|---------|
+| CI build | `<registry>/<org>/<image>:<git-sha>` | `ghcr.io/org/api:a1b2c3d` |
+| Release | `<registry>/<org>/<image>:<version>` | `ghcr.io/org/api:1.2.0` |
+| Local dev | `<image>:latest` | `api:latest` |
+
+## .dockerignore
+
+```
+.git
+.cursor
+_docs
+_standalone
+node_modules
+**/bin
+**/obj
+**/__pycache__
+*.md
+.env*
+docker-compose*.yml
+```
+```
diff --git a/.cursor/skills/deploy/templates/deployment_procedures.md 
b/.cursor/skills/deploy/templates/deployment_procedures.md new file mode 100644 index 0000000..f9da36c --- /dev/null +++ b/.cursor/skills/deploy/templates/deployment_procedures.md @@ -0,0 +1,103 @@ +# Deployment Procedures Template + +Save as `_docs/02_plans/deployment/deployment_procedures.md`. + +--- + +```markdown +# [System Name] — Deployment Procedures + +## Deployment Strategy + +**Pattern**: [blue-green / rolling / canary] +**Rationale**: [why this pattern fits the architecture] +**Zero-downtime**: required for production deployments + +### Graceful Shutdown + +- Grace period: 30 seconds for in-flight requests +- Sequence: stop accepting new requests → drain connections → shutdown +- Container orchestrator: `terminationGracePeriodSeconds: 40` + +### Database Migration Ordering + +- Migrations run **before** new code deploys +- All migrations must be backward-compatible (old code works with new schema) +- Irreversible migrations require explicit approval + +## Health Checks + +| Check | Type | Endpoint | Interval | Failure Threshold | Action | +|-------|------|----------|----------|-------------------|--------| +| Liveness | HTTP GET | `/health/live` | 10s | 3 failures | Restart container | +| Readiness | HTTP GET | `/health/ready` | 5s | 3 failures | Remove from load balancer | +| Startup | HTTP GET | `/health/ready` | 5s | 30 attempts | Kill and recreate | + +### Health Check Responses + +- `/health/live`: returns 200 if process is running (no dependency checks) +- `/health/ready`: returns 200 if all dependencies (DB, cache, queues) are reachable + +## Staging Deployment + +1. CI/CD builds and pushes Docker images tagged with git SHA +2. Run database migrations against staging +3. Deploy new images to staging environment +4. Wait for health checks to pass (readiness probe) +5. Run smoke tests against staging +6. If smoke tests fail: automatic rollback to previous image + +## Production Deployment + +1. 
**Approval**: manual approval required via [mechanism] +2. **Pre-deploy checks**: + - [ ] Staging smoke tests passed + - [ ] Security scan clean + - [ ] Database migration reviewed + - [ ] Monitoring alerts configured + - [ ] Rollback plan confirmed +3. **Deploy**: apply deployment strategy (blue-green / rolling / canary) +4. **Verify**: health checks pass, error rate stable, latency within baseline +5. **Monitor**: observe dashboards for 15 minutes post-deploy +6. **Finalize**: mark deployment as successful or trigger rollback + +## Rollback Procedures + +### Trigger Criteria + +- Health check failures persist after deploy +- Error rate exceeds 5% for more than 5 minutes +- Critical alert fires within 15 minutes of deploy +- Manual decision by on-call engineer + +### Rollback Steps + +1. Redeploy previous Docker image tag (from CI/CD artifact) +2. Verify health checks pass +3. If database migration was applied: + - Run DOWN migration if reversible + - If irreversible: assess data impact, escalate if needed +4. Notify stakeholders +5. 
Schedule post-mortem within 24 hours + +### Post-Mortem + +Required after every production rollback: +- Timeline of events +- Root cause +- What went wrong +- Prevention measures + +## Deployment Checklist + +- [ ] All tests pass in CI +- [ ] Security scan clean (zero critical/high CVEs) +- [ ] Docker images built and pushed +- [ ] Database migrations reviewed and tested +- [ ] Environment variables configured for target environment +- [ ] Health check endpoints verified +- [ ] Monitoring alerts configured +- [ ] Rollback plan documented and tested +- [ ] Stakeholders notified of deployment window +- [ ] On-call engineer available during deployment +``` diff --git a/.cursor/skills/deploy/templates/environment_strategy.md b/.cursor/skills/deploy/templates/environment_strategy.md new file mode 100644 index 0000000..6c3632b --- /dev/null +++ b/.cursor/skills/deploy/templates/environment_strategy.md @@ -0,0 +1,61 @@ +# Environment Strategy Template + +Save as `_docs/02_plans/deployment/environment_strategy.md`. 
+ +--- + +```markdown +# [System Name] — Environment Strategy + +## Environments + +| Environment | Purpose | Infrastructure | Data Source | +|-------------|---------|---------------|-------------| +| Development | Local developer workflow | docker-compose | Seed data, mocked externals | +| Staging | Pre-production validation | [mirrors production] | Anonymized production-like data | +| Production | Live system | [full infrastructure] | Real data | + +## Environment Variables + +### Required Variables + +| Variable | Purpose | Dev Default | Staging/Prod Source | +|----------|---------|-------------|-------------------| +| `DATABASE_URL` | Postgres connection | `postgres://dev:dev@db:5432/app` | Secret manager | +| [add all required variables] | | | | + +### `.env.example` + +```env +# Copy to .env and fill in values +DATABASE_URL=postgres://user:pass@host:5432/dbname +# [all required variables with placeholder values] +``` + +### Variable Validation + +All services validate required environment variables at startup and fail fast with a clear error message if any are missing. 
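
As an illustration of this fail-fast rule (a sketch assuming a Python service; `REQUIRED_VARS` is a placeholder to fill from the table above):

```python
import os
import sys

# Illustrative: populate from the Required Variables table.
REQUIRED_VARS = ["DATABASE_URL"]

def validate_env(environ=os.environ) -> None:
    """Exit immediately with a clear message if any required variable is unset."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        print(f"FATAL: missing required environment variables: {', '.join(missing)}",
              file=sys.stderr)
        sys.exit(1)
```

Call `validate_env()` before opening any database or network connections, so a misconfigured container dies at startup rather than mid-request.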
+ +## Secrets Management + +| Environment | Method | Tool | +|-------------|--------|------| +| Development | `.env` file (git-ignored) | dotenv | +| Staging | Secret manager | [AWS Secrets Manager / Azure Key Vault / Vault] | +| Production | Secret manager | [AWS Secrets Manager / Azure Key Vault / Vault] | + +Rotation policy: [frequency and procedure] + +## Database Management + +| Environment | Type | Migrations | Data | +|-------------|------|-----------|------| +| Development | Docker Postgres, named volume | Applied on container start | Seed data via init script | +| Staging | Managed Postgres | Applied via CI/CD pipeline | Anonymized production snapshot | +| Production | Managed Postgres | Applied via CI/CD with approval | Live data | + +Migration rules: +- All migrations must be backward-compatible (support old and new code simultaneously) +- Reversible migrations required (DOWN/rollback script) +- Production migrations require review before apply +``` diff --git a/.cursor/skills/deploy/templates/observability.md b/.cursor/skills/deploy/templates/observability.md new file mode 100644 index 0000000..b656b29 --- /dev/null +++ b/.cursor/skills/deploy/templates/observability.md @@ -0,0 +1,132 @@ +# Observability Template + +Save as `_docs/02_plans/deployment/observability.md`. + +--- + +```markdown +# [System Name] — Observability + +## Logging + +### Format + +Structured JSON to stdout/stderr. No file-based logging in containers. 
+ +```json +{ + "timestamp": "ISO8601", + "level": "INFO", + "service": "service-name", + "correlation_id": "uuid", + "message": "Event description", + "context": {} +} +``` + +### Log Levels + +| Level | Usage | Example | +|-------|-------|---------| +| ERROR | Exceptions, failures requiring attention | Database connection failed | +| WARN | Potential issues, degraded performance | Retry attempt 2/3 | +| INFO | Significant business events | User registered, Order placed | +| DEBUG | Detailed diagnostics (dev/staging only) | Request payload, Query params | + +### Retention + +| Environment | Destination | Retention | +|-------------|-------------|-----------| +| Development | Console | Session | +| Staging | [log aggregator] | 7 days | +| Production | [log aggregator] | 30 days | + +### PII Rules + +- Never log passwords, tokens, or session IDs +- Mask email addresses and personal identifiers +- Log user IDs (opaque) instead of usernames + +## Metrics + +### Endpoints + +Every service exposes Prometheus-compatible metrics at `/metrics`. 
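
For reference, the text exposition format behind `/metrics` can be rendered by hand; the stdlib-only Python sketch below is purely illustrative, as a real service would use a metrics client library (e.g. prometheus_client) rather than hand-rolling this:

```python
from collections import Counter

# (method, path, status) -> request count, mirroring the request_count metric.
request_count: Counter = Counter()

def observe(method: str, path: str, status: int) -> None:
    request_count[(method, path, str(status))] += 1

def render_metrics() -> str:
    """Render request_count in Prometheus text exposition format."""
    lines = ["# TYPE request_count counter"]
    for (method, path, status), n in sorted(request_count.items()):
        lines.append(
            f'request_count{{method="{method}",path="{path}",status="{status}"}} {n}')
    return "\n".join(lines) + "\n"
```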
+
+### Application Metrics
+
+| Metric | Type | Description |
+|--------|------|-------------|
+| `request_count` | Counter | Total HTTP requests by method, path, status |
+| `request_duration_seconds` | Histogram | Response time by method, path |
+| `error_count` | Counter | Failed requests by type |
+| `active_connections` | Gauge | Current open connections |
+
+### System Metrics
+
+- CPU usage, Memory usage, Disk I/O, Network I/O
+
+### Business Metrics
+
+| Metric | Type | Description | Source |
+|--------|------|-------------|--------|
+| [from acceptance criteria] | | | |
+
+Collection interval: 15 seconds
+
+## Distributed Tracing
+
+### Configuration
+
+- SDK: OpenTelemetry
+- Propagation: W3C Trace Context via HTTP headers
+- Span naming: `<service>.<operation>`
+
+### Sampling
+
+| Environment | Rate | Rationale |
+|-------------|------|-----------|
+| Development | 100% | Full visibility |
+| Staging | 100% | Full visibility |
+| Production | 10% | Balance cost vs observability |
+
+### Integration Points
+
+- HTTP requests: automatic instrumentation
+- Database queries: automatic instrumentation
+- Message queues: manual span creation on publish/consume
+
+## Alerting
+
+| Severity | Response Time | Conditions |
+|----------|---------------|-----------|
+| Critical | 5 min | Service unreachable, health check failed for 1 min, data loss detected |
+| High | 30 min | Error rate > 5% for 5 min, P95 latency > 2x baseline for 10 min |
+| Medium | 4 hours | Disk usage > 80%, elevated latency, connection pool exhaustion |
+| Low | Next business day | Non-critical warnings, deprecated API usage |
+
+### Notification Channels
+
+| Severity | Channel |
+|----------|---------|
+| Critical | [PagerDuty / phone] |
+| High | [Slack + email] |
+| Medium | [Slack] |
+| Low | [Dashboard only] |
+
+## Dashboards
+
+### Operations Dashboard
+
+- Service health status (up/down per component)
+- Request rate and error rate
+- Response time percentiles (P50, P95, P99)
+- Resource utilization 
(CPU, memory per container) +- Active alerts + +### Business Dashboard + +- [Key business metrics from acceptance criteria] +- [User activity indicators] +- [Transaction volumes] +``` diff --git a/.cursor/skills/plan/SKILL.md b/.cursor/skills/plan/SKILL.md index 9a0ef3b..5ee6222 100644 --- a/.cursor/skills/plan/SKILL.md +++ b/.cursor/skills/plan/SKILL.md @@ -1,7 +1,7 @@ --- name: plan description: | - Decompose a solution into architecture, system flows, components, tests, and Jira epics. + Decompose a solution into architecture, data model, deployment plan, system flows, components, tests, and Jira epics. Systematic 6-step planning workflow with BLOCKING gates, self-verification, and structured artifact management. Uses _docs/ + _docs/02_plans/ structure. Trigger phrases: @@ -15,7 +15,7 @@ disable-model-invocation: true # Solution Planning -Decompose a problem and solution into architecture, system flows, components, tests, and Jira epics through a systematic 6-step workflow. +Decompose a problem and solution into architecture, data model, deployment plan, system flows, components, tests, and Jira epics through a systematic 6-step workflow. 
## Core Principles @@ -91,6 +91,13 @@ PLANS_DIR/ │ └── traceability_matrix.md ├── architecture.md ├── system-flows.md +├── data_model.md +├── deployment/ +│ ├── containerization.md +│ ├── ci_cd_pipeline.md +│ ├── environment_strategy.md +│ ├── observability.md +│ └── deployment_procedures.md ├── risk_mitigations.md ├── risk_mitigations_02.md (iterative, ## as sequence) ├── components/ @@ -124,6 +131,8 @@ PLANS_DIR/ | Step 1 | Integration traceability matrix | `integration_tests/traceability_matrix.md` | | Step 2 | Architecture analysis complete | `architecture.md` | | Step 2 | System flows documented | `system-flows.md` | +| Step 2 | Data model documented | `data_model.md` | +| Step 2 | Deployment plan complete | `deployment/` (5 files) | | Step 3 | Each component analyzed | `components/[##]_[name]/description.md` | | Step 3 | Common helpers generated | `common-helpers/[##]_helper_[name].md` | | Step 3 | Diagrams generated | `diagrams/` | @@ -210,9 +219,11 @@ Capture any new questions, findings, or insights that arise during test specific ### Step 2: Solution Analysis **Role**: Professional software architect -**Goal**: Produce `architecture.md` and `system-flows.md` from the solution draft +**Goal**: Produce `architecture.md`, `system-flows.md`, `data_model.md`, and `deployment/` from the solution draft **Constraints**: No code, no component-level detail yet; focus on system-level view +#### Phase 2a: Architecture & Flows + 1. Read all input files thoroughly 2. Incorporate findings, questions, and insights discovered during Step 1 (integration tests) 3. Research unknown or questionable topics via internet; ask user about ambiguities @@ -230,6 +241,56 @@ Capture any new questions, findings, or insights that arise during test specific **BLOCKING**: Present architecture summary to user. Do NOT proceed until user confirms. 
+#### Phase 2b: Data Model + +**Role**: Professional software architect +**Goal**: Produce a detailed data model document covering entities, relationships, and migration strategy + +1. Extract core entities from architecture.md and solution.md +2. Define entity attributes, types, and constraints +3. Define relationships between entities (Mermaid ERD) +4. Define migration strategy: versioning tool (EF Core migrations / Alembic / sql-migrate), reversibility requirement, naming convention +5. Define seed data requirements per environment (dev, staging) +6. Define backward compatibility approach for schema changes (additive-only by default) + +**Self-verification**: +- [ ] Every entity mentioned in architecture.md is defined +- [ ] Relationships are explicit with cardinality +- [ ] Migration strategy specifies reversibility requirement +- [ ] Seed data requirements defined +- [ ] Backward compatibility approach documented + +**Save action**: Write `data_model.md` + +#### Phase 2c: Deployment Planning + +**Role**: DevOps / Platform engineer +**Goal**: Produce deployment plan covering containerization, CI/CD, environment strategy, observability, and deployment procedures + +Use the `/deploy` skill's templates as structure for each artifact: + +1. Read architecture.md and restrictions.md for infrastructure constraints +2. Research Docker best practices for the project's tech stack +3. Define containerization plan: Dockerfile per component, docker-compose for dev and tests +4. Define CI/CD pipeline: stages, quality gates, caching, parallelization +5. Define environment strategy: dev, staging, production with secrets management +6. Define observability: structured logging, metrics, tracing, alerting +7. 
Define deployment procedures: strategy, health checks, rollback, checklist + +**Self-verification**: +- [ ] Every component has a Docker specification +- [ ] CI/CD pipeline covers lint, test, security, build, deploy +- [ ] Environment strategy covers dev, staging, production +- [ ] Observability covers logging, metrics, tracing, alerting +- [ ] Deployment procedures include rollback and health checks + +**Save action**: Write all 5 files under `deployment/`: +- `containerization.md` +- `ci_cd_pipeline.md` +- `environment_strategy.md` +- `observability.md` +- `deployment_procedures.md` + --- ### Step 3: Component Decomposition @@ -375,6 +436,20 @@ Before writing the final report, verify ALL of the following: - [ ] Deployment model is defined - [ ] Integration test findings are reflected in architecture decisions +### Data Model +- [ ] Every entity from architecture.md is defined +- [ ] Relationships have explicit cardinality +- [ ] Migration strategy with reversibility requirement +- [ ] Seed data requirements defined +- [ ] Backward compatibility approach documented + +### Deployment +- [ ] Containerization plan covers all components +- [ ] CI/CD pipeline includes lint, test, security, build, deploy stages +- [ ] Environment strategy covers dev, staging, production +- [ ] Observability covers logging, metrics, tracing, alerting +- [ ] Deployment procedures include rollback and health checks + ### Components - [ ] Every component follows SRP - [ ] No circular dependencies @@ -441,8 +516,10 @@ Before writing the final report, verify ALL of the following: │ │ │ 1. Integration Tests → integration_tests/ (5 files) │ │ [BLOCKING: user confirms test coverage] │ -│ 2. Solution Analysis → architecture.md, system-flows.md │ +│ 2a. Architecture → architecture.md, system-flows.md │ │ [BLOCKING: user confirms architecture] │ +│ 2b. Data Model → data_model.md │ +│ 2c. Deployment → deployment/ (5 files) │ │ 3. 
Component Decompose → components/[##]_[name]/description │ │ [BLOCKING: user confirms decomposition] │ │ 4. Review & Risk → risk_mitigations.md │ diff --git a/.cursor/skills/retrospective/SKILL.md b/.cursor/skills/retrospective/SKILL.md new file mode 100644 index 0000000..414c121 --- /dev/null +++ b/.cursor/skills/retrospective/SKILL.md @@ -0,0 +1,174 @@ +--- +name: retrospective +description: | + Collect metrics from implementation batch reports and code review findings, analyze trends across cycles, + and produce improvement reports with actionable recommendations. + 3-step workflow: collect metrics, analyze trends, produce report. + Outputs to _docs/05_metrics/. + Trigger phrases: + - "retrospective", "retro", "run retro" + - "metrics review", "feedback loop" + - "implementation metrics", "analyze trends" +category: evolve +tags: [retrospective, metrics, trends, improvement, feedback-loop] +disable-model-invocation: true +--- + +# Retrospective + +Collect metrics from implementation artifacts, analyze trends across development cycles, and produce actionable improvement reports. + +## Core Principles + +- **Data-driven**: conclusions come from metrics, not impressions +- **Actionable**: every finding must have a concrete improvement suggestion +- **Cumulative**: each retrospective compares against previous ones to track progress +- **Save immediately**: write artifacts to disk after each step +- **Non-judgmental**: focus on process improvement, not blame + +## Context Resolution + +Fixed paths: + +- IMPL_DIR: `_docs/03_implementation/` +- METRICS_DIR: `_docs/05_metrics/` +- TASKS_DIR: `_docs/02_tasks/` + +Announce the resolved paths to the user before proceeding. + +## Prerequisite Checks (BLOCKING) + +1. `IMPL_DIR` exists and contains at least one `batch_*_report.md` — **STOP if missing** (nothing to analyze) +2. Create METRICS_DIR if it does not exist +3. 
Check for previous retrospective reports in METRICS_DIR to enable trend comparison + +## Artifact Management + +### Directory Structure + +``` +METRICS_DIR/ +├── retro_[YYYY-MM-DD].md +├── retro_[YYYY-MM-DD].md +└── ... +``` + +## Progress Tracking + +At the start of execution, create a TodoWrite with all steps (1 through 3). Update status as each step completes. + +## Workflow + +### Step 1: Collect Metrics + +**Role**: Data analyst +**Goal**: Parse all implementation artifacts and extract quantitative metrics +**Constraints**: Collection only — no interpretation yet + +#### Sources + +| Source | Metrics Extracted | +|--------|------------------| +| `batch_*_report.md` | Tasks per batch, batch count, task statuses (Done/Blocked/Partial) | +| Code review sections in batch reports | PASS/FAIL/PASS_WITH_WARNINGS ratios, finding counts by severity and category | +| Task spec files in TASKS_DIR | Complexity points per task, dependency count | +| `FINAL_implementation_report.md` | Total tasks, total batches, overall duration | +| Git log (if available) | Commits per batch, files changed per batch | + +#### Metrics to Compute + +**Implementation Metrics**: +- Total tasks implemented +- Total batches executed +- Average tasks per batch +- Average complexity points per batch +- Total complexity points delivered + +**Quality Metrics**: +- Code review pass rate (PASS / total reviews) +- Code review findings by severity: Critical, High, Medium, Low counts +- Code review findings by category: Bug, Spec-Gap, Security, Performance, Maintainability, Style, Scope +- FAIL count (batches that required user intervention) + +**Efficiency Metrics**: +- Blocked task count and reasons +- Tasks completed on first attempt vs requiring fixes +- Batch with most findings (identify problem areas) + +**Self-verification**: +- [ ] All batch reports parsed +- [ ] All metric categories computed +- [ ] No batch reports missed + +--- + +### Step 2: Analyze Trends + +**Role**: Process improvement 
analyst +**Goal**: Identify patterns, recurring issues, and improvement opportunities +**Constraints**: Analysis must be grounded in the metrics from Step 1 + +1. If previous retrospective reports exist in METRICS_DIR, load the most recent one for comparison +2. Identify patterns: + - **Recurring findings**: which code review categories appear most frequently? + - **Problem components**: which components/files generate the most findings? + - **Complexity accuracy**: do high-complexity tasks actually produce more issues? + - **Blocker patterns**: what types of blockers occur and can they be prevented? +3. Compare against previous retrospective (if exists): + - Which metrics improved? + - Which metrics degraded? + - Were previous improvement actions effective? +4. Identify top 3 improvement actions ranked by impact + +**Self-verification**: +- [ ] Patterns are grounded in specific metrics +- [ ] Comparison with previous retro included (if exists) +- [ ] Top 3 actions are concrete and actionable + +--- + +### Step 3: Produce Report + +**Role**: Technical writer +**Goal**: Write a structured retrospective report with metrics, trends, and recommendations +**Constraints**: Concise, data-driven, actionable + +Write `METRICS_DIR/retro_[YYYY-MM-DD].md` using `templates/retrospective-report.md` as structure. + +**Self-verification**: +- [ ] All metrics from Step 1 included +- [ ] Trend analysis from Step 2 included +- [ ] Top 3 improvement actions clearly stated +- [ ] Suggested rule/skill updates are specific + +**Save action**: Write `retro_[YYYY-MM-DD].md` + +Present the report summary to the user. 
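
The Step 1 verdict collection and pass-rate computation can be sketched as follows. It assumes batch reports contain a `Verdict: PASS|FAIL|PASS_WITH_WARNINGS` line, which is a hypothetical format; a real parser must match whatever the batch reports actually emit:

```python
import re
from collections import Counter

# PASS_WITH_WARNINGS must precede PASS in the alternation, or it would
# be matched as plain PASS.
VERDICT_RE = re.compile(r"Verdict:\s*(PASS_WITH_WARNINGS|PASS|FAIL)")

def collect_verdicts(report_bodies: list[str]) -> Counter:
    """Count code review verdicts across all batch report bodies."""
    counts: Counter = Counter()
    for body in report_bodies:
        counts.update(VERDICT_RE.findall(body))
    return counts

def pass_rate(counts: Counter) -> float:
    """Pass rate = PASS / total reviews (0.0 when there are no reviews)."""
    total = sum(counts.values())
    return counts["PASS"] / total if total else 0.0
```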
+ +--- + +## Escalation Rules + +| Situation | Action | +|-----------|--------| +| No batch reports exist | **STOP** — nothing to analyze | +| Batch reports have inconsistent format | **WARN user**, extract what is available | +| No previous retrospective for comparison | PROCEED — report baseline metrics only | +| Metrics suggest systemic issue (>50% FAIL rate) | **WARN user** — suggest immediate process review | + +## Methodology Quick Reference + +``` +┌────────────────────────────────────────────────────────────────┐ +│ Retrospective (3-Step Method) │ +├────────────────────────────────────────────────────────────────┤ +│ PREREQ: batch reports exist in _docs/03_implementation/ │ +│ │ +│ 1. Collect Metrics → parse batch reports, compute metrics │ +│ 2. Analyze Trends → patterns, comparison, improvement areas │ +│ 3. Produce Report → _docs/05_metrics/retro_[date].md │ +├────────────────────────────────────────────────────────────────┤ +│ Principles: Data-driven · Actionable · Cumulative │ +│ Non-judgmental · Save immediately │ +└────────────────────────────────────────────────────────────────┘ +``` diff --git a/.cursor/skills/retrospective/templates/retrospective-report.md b/.cursor/skills/retrospective/templates/retrospective-report.md new file mode 100644 index 0000000..629c730 --- /dev/null +++ b/.cursor/skills/retrospective/templates/retrospective-report.md @@ -0,0 +1,93 @@ +# Retrospective Report Template + +Save as `_docs/05_metrics/retro_[YYYY-MM-DD].md`. 
+
+---
+
+```markdown
+# Retrospective — [YYYY-MM-DD]
+
+## Implementation Summary
+
+| Metric | Value |
+|--------|-------|
+| Total tasks | [count] |
+| Total batches | [count] |
+| Total complexity points | [sum] |
+| Avg tasks per batch | [value] |
+| Avg complexity per batch | [value] |
+
+## Quality Metrics
+
+### Code Review Results
+
+| Verdict | Count | Percentage |
+|---------|-------|-----------|
+| PASS | [count] | [%] |
+| PASS_WITH_WARNINGS | [count] | [%] |
+| FAIL | [count] | [%] |
+
+### Findings by Severity
+
+| Severity | Count |
+|----------|-------|
+| Critical | [count] |
+| High | [count] |
+| Medium | [count] |
+| Low | [count] |
+
+### Findings by Category
+
+| Category | Count | Top Files |
+|----------|-------|-----------|
+| Bug | [count] | [most affected files] |
+| Spec-Gap | [count] | [most affected files] |
+| Security | [count] | [most affected files] |
+| Performance | [count] | [most affected files] |
+| Maintainability | [count] | [most affected files] |
+| Style | [count] | [most affected files] |
+| Scope | [count] | [most affected files] |
+
+## Efficiency
+
+| Metric | Value |
+|--------|-------|
+| Blocked tasks | [count] |
+| Tasks requiring fixes after review | [count] |
+| Batch with most findings | Batch [N] — [reason] |
+
+### Blocker Analysis
+
+| Blocker Type | Count | Prevention |
+|-------------|-------|-----------|
+| [type] | [count] | [suggested prevention] |
+
+## Trend Comparison
+
+| Metric | Previous | Current | Change |
+|--------|----------|---------|--------|
+| Pass rate | [%] | [%] | [+/-] |
+| Avg findings per batch | [value] | [value] | [+/-] |
+| Blocked tasks | [count] | [count] | [+/-] |
+
+*Previous retrospective: [date or "N/A — first retro"]*
+
+## Top 3 Improvement Actions
+
+1. **[Action title]**: [specific, actionable description]
+   - Impact: [expected improvement]
+   - Effort: [low/medium/high]
+
+2. **[Action title]**: [specific, actionable description]
+   - Impact: [expected improvement]
+   - Effort: [low/medium/high]
+
+3. 
**[Action title]**: [specific, actionable description] + - Impact: [expected improvement] + - Effort: [low/medium/high] + +## Suggested Rule/Skill Updates + +| File | Change | Rationale | +|------|--------|-----------| +| [.cursor/rules/... or .cursor/skills/...] | [specific change] | [based on which metric] | +``` diff --git a/.cursor/skills/security/SKILL.md b/.cursor/skills/security/SKILL.md index ceab368..5be5701 100644 --- a/.cursor/skills/security/SKILL.md +++ b/.cursor/skills/security/SKILL.md @@ -49,19 +49,8 @@ When testing security or conducting audits: - Validating input sanitization - Reviewing security configuration -### OWASP Top 10 (2021) -| # | Vulnerability | Key Test | -|---|---------------|----------| -| 1 | Broken Access Control | User A accessing User B's data | -| 2 | Cryptographic Failures | Plaintext passwords, HTTP | -| 3 | Injection | SQL/XSS/command injection | -| 4 | Insecure Design | Rate limiting, session timeout | -| 5 | Security Misconfiguration | Verbose errors, exposed /admin | -| 6 | Vulnerable Components | npm audit, outdated packages | -| 7 | Auth Failures | Weak passwords, no MFA | -| 8 | Integrity Failures | Unsigned updates, malware | -| 9 | Logging Failures | No audit trail for breaches | -| 10 | SSRF | Server fetching internal URLs | +### OWASP Top 10 +Use the most recent **stable** version of the OWASP Top 10. At the start of each security audit, research the current version at https://owasp.org/www-project-top-ten/ and test against all listed categories. Do not rely on a hardcoded list — the OWASP Top 10 is updated periodically and the current version must be verified. ### Tools | Type | Tool | Purpose |