Sync .cursor from suite (autodev orchestrator + monorepo skills)

2026-04-22 10:56:33 +00:00 · 2026-04-18 22:04:23 +03:00
parent d883fdb3cc
commit 14a81b0fcc
60 changed files with 4232 additions and 1728 deletions
@@ -115,332 +115,43 @@ At the start of execution, create a TodoWrite with all steps (1 through 7). Upda

 ### Step 1: Deployment Status & Environment Setup

-**Role**: DevOps / Platform engineer
-**Goal**: Assess current deployment readiness, identify all required environment variables, and create `.env` files
-**Constraints**: Must complete before any other step
-
-1. Read architecture.md, all component specs, and restrictions.md
-2. Assess deployment readiness:
-   - List all components and their current state (planned / implemented / tested)
-   - Identify external dependencies (databases, APIs, message queues, cloud services)
-   - Identify infrastructure prerequisites (container registry, cloud accounts, DNS, SSL certificates)
-   - Check if any deployment blockers exist
-3. Identify all required environment variables by scanning:
-   - Component specs for configuration needs
-   - Database connection requirements
-   - External API endpoints and credentials
-   - Feature flags and runtime configuration
-   - Container registry credentials
-   - Cloud provider credentials
-   - Monitoring/logging service endpoints
-4. Generate `.env.example` in project root with all variables and placeholder values (committed to VCS)
-5. Generate `.env` in project root with development defaults filled in where safe (git-ignored)
-6. Ensure `.gitignore` includes `.env` (but NOT `.env.example`)
-7. Produce a deployment status report summarizing readiness, blockers, and required setup
-
-**Self-verification**:
- [ ] All components assessed for deployment readiness
- [ ] External dependencies catalogued
- [ ] Infrastructure prerequisites identified
- [ ] All required environment variables discovered
- [ ] `.env.example` created with placeholder values
- [ ] `.env` created with safe development defaults
- [ ] `.gitignore` updated to exclude `.env`
- [ ] Status report written to `reports/deploy_status_report.md`
-
-**Save action**: Write `reports/deploy_status_report.md` using `templates/deploy_status_report.md`, create `.env` and `.env.example` in project root
-
-**BLOCKING**: Present status report and environment variables to user. Do NOT proceed until confirmed.
+Read and follow `steps/01_status-env.md`.

 ---

 ### Step 2: Containerization

-**Role**: DevOps / Platform engineer
-**Goal**: Define Docker configuration for every component, local development, and blackbox test environments
-**Constraints**: Plan only — no Dockerfile creation. Describe what each Dockerfile should contain.
-
-1. Read architecture.md and all component specs
-2. Read restrictions.md for infrastructure constraints
-3. Research best Docker practices for the project's tech stack (multi-stage builds, base image selection, layer optimization)
-4. For each component, define:
-   - Base image (pinned version, prefer alpine/distroless for production)
-   - Build stages (dependency install, build, production)
-   - Non-root user configuration
-   - Health check endpoint and command
-   - Exposed ports
-   - `.dockerignore` contents
-5. Define `docker-compose.yml` for local development:
-   - All application components
-   - Database (Postgres) with named volume
-   - Any message queues, caches, or external service mocks
-   - Shared network
-   - Environment variable files (`.env`)
-6. Define `docker-compose.test.yml` for blackbox tests:
-   - Application components under test
-   - Test runner container (black-box, no internal imports)
-   - Isolated database with seed data
-   - All tests runnable via `docker compose -f docker-compose.test.yml up --abort-on-container-exit`
-7. Define image tagging strategy: `<registry>/<project>/<component>:<git-sha>` for CI, `latest` for local dev only
-
-**Self-verification**:
- [ ] Every component has a Dockerfile specification
- [ ] Multi-stage builds specified for all production images
- [ ] Non-root user for all containers
- [ ] Health checks defined for every service
- [ ] docker-compose.yml covers all components + dependencies
- [ ] docker-compose.test.yml enables black-box testing
- [ ] `.dockerignore` defined
-
-**Save action**: Write `containerization.md` using `templates/containerization.md`
-
-**BLOCKING**: Present containerization plan to user. Do NOT proceed until confirmed.
+Read and follow `steps/02_containerization.md`.

 ---

 ### Step 3: CI/CD Pipeline

-**Role**: DevOps engineer
-**Goal**: Define the CI/CD pipeline with quality gates, security scanning, and multi-environment deployment
-**Constraints**: Pipeline definition only — produce YAML specification, not implementation
-
-1. Read architecture.md for tech stack and deployment targets
-2. Read restrictions.md for CI/CD constraints (cloud provider, registry, etc.)
-3. Research CI/CD best practices for the project's platform (GitHub Actions / Azure Pipelines)
-4. Define pipeline stages:
-
-| Stage | Trigger | Steps | Quality Gate |
-|-------|---------|-------|-------------|
-| **Lint** | Every push | Run linters per language (black, rustfmt, prettier, dotnet format) | Zero errors |
-| **Test** | Every push | Unit tests, blackbox tests, coverage report | 75%+ coverage (see `.cursor/rules/cursor-meta.mdc` Quality Thresholds) |
-| **Security** | Every push | Dependency audit, SAST scan (Semgrep/SonarQube), image scan (Trivy) | Zero critical/high CVEs |
-| **Build** | PR merge to dev | Build Docker images, tag with git SHA | Build succeeds |
-| **Push** | After build | Push to container registry | Push succeeds |
-| **Deploy Staging** | After push | Deploy to staging environment | Health checks pass |
-| **Smoke Tests** | After staging deploy | Run critical path tests against staging | All pass |
-| **Deploy Production** | Manual approval | Deploy to production | Health checks pass |
-
-5. Define caching strategy: dependency caches, Docker layer caches, build artifact caches
-6. Define parallelization: which stages can run concurrently
-7. Define notifications: build failures, deployment status, security alerts
-
-**Self-verification**:
- [ ] All pipeline stages defined with triggers and gates
- [ ] Coverage threshold enforced (75%+)
- [ ] Security scanning included (dependencies + images + SAST)
- [ ] Caching configured for dependencies and Docker layers
- [ ] Multi-environment deployment (staging → production)
- [ ] Rollback procedure referenced
- [ ] Notifications configured
-
-**Save action**: Write `ci_cd_pipeline.md` using `templates/ci_cd_pipeline.md`
+Read and follow `steps/03_ci-cd-pipeline.md`.

 ---

 ### Step 4: Environment Strategy

-**Role**: Platform engineer
-**Goal**: Define environment configuration, secrets management, and environment parity
-**Constraints**: Strategy document — no secrets or credentials in output
-
-1. Define environments:
-
-| Environment | Purpose | Infrastructure | Data |
-|-------------|---------|---------------|------|
-| **Development** | Local developer workflow | docker-compose, local volumes | Seed data, mocks for external APIs |
-| **Staging** | Pre-production validation | Mirrors production topology | Anonymized production-like data |
-| **Production** | Live system | Full infrastructure | Real data |
-
-2. Define environment variable management:
-   - Reference `.env.example` created in Step 1
-   - Per-environment variable sources (`.env` for dev, secret manager for staging/prod)
-   - Validation: fail fast on missing required variables at startup
-3. Define secrets management:
-   - Never commit secrets to version control
-   - Development: `.env` files (git-ignored)
-   - Staging/Production: secret manager (AWS Secrets Manager / Azure Key Vault / Vault)
-   - Rotation policy
-4. Define database management per environment:
-   - Development: Docker Postgres with named volume, seed data
-   - Staging: managed Postgres, migrations applied via CI/CD
-   - Production: managed Postgres, migrations require approval
-
-**Self-verification**:
- [ ] All three environments defined with clear purpose
- [ ] Environment variable documentation complete (references `.env.example` from Step 1)
- [ ] No secrets in any output document
- [ ] Secret manager specified for staging/production
- [ ] Database strategy per environment
-
-**Save action**: Write `environment_strategy.md` using `templates/environment_strategy.md`
+Read and follow `steps/04_environment-strategy.md`.

 ---

 ### Step 5: Observability

-**Role**: Site Reliability Engineer (SRE)
-**Goal**: Define logging, metrics, tracing, and alerting strategy
-**Constraints**: Strategy document — describe what to implement, not how to wire it
-
-1. Read architecture.md and component specs for service boundaries
-2. Research observability best practices for the tech stack
-
-**Logging**:
- Structured JSON to stdout/stderr (no file logging in containers)
- Fields: `timestamp` (ISO 8601), `level`, `service`, `correlation_id`, `message`, `context`
- Levels: ERROR (exceptions), WARN (degraded), INFO (business events), DEBUG (diagnostics, dev only)
- No PII in logs
- Retention: dev = console, staging = 7 days, production = 30 days
-
-**Metrics**:
- Expose Prometheus-compatible `/metrics` endpoint per service
- System metrics: CPU, memory, disk, network
- Application metrics: `request_count`, `request_duration` (histogram), `error_count`, `active_connections`
- Business metrics: derived from acceptance criteria
- Collection interval: 15s
-
-**Distributed Tracing**:
- OpenTelemetry SDK integration
- Trace context propagation via HTTP headers and message queue metadata
- Span naming: `<service>.<operation>`
- Sampling: 100% in dev/staging, 10% in production (adjust based on volume)
-
-**Alerting**:
-
-| Severity | Response Time | Condition Examples |
-|----------|---------------|-------------------|
-| Critical | 5 min | Service down, data loss, health check failed |
-| High | 30 min | Error rate > 5%, P95 latency > 2x baseline |
-| Medium | 4 hours | Disk > 80%, elevated latency |
-| Low | Next business day | Non-critical warnings |
-
-**Dashboards**:
- Operations: service health, request rate, error rate, response time percentiles, resource utilization
- Business: key business metrics from acceptance criteria
-
-**Self-verification**:
- [ ] Structured logging format defined with required fields
- [ ] Metrics endpoint specified per service
- [ ] OpenTelemetry tracing configured
- [ ] Alert severities with response times defined
- [ ] Dashboards cover operations and business metrics
- [ ] PII exclusion from logs addressed
-
-**Save action**: Write `observability.md` using `templates/observability.md`
+Read and follow `steps/05_observability.md`.

 ---

 ### Step 6: Deployment Procedures

-**Role**: DevOps / Platform engineer
-**Goal**: Define deployment strategy, rollback procedures, health checks, and deployment checklist
-**Constraints**: Procedures document — no implementation
-
-1. Define deployment strategy:
-   - Preferred pattern: blue-green / rolling / canary (choose based on architecture)
-   - Zero-downtime requirement for production
-   - Graceful shutdown: 30-second grace period for in-flight requests
-   - Database migration ordering: migrate before deploy, backward-compatible only
-
-2. Define health checks:
-
-| Check | Type | Endpoint | Interval | Threshold |
-|-------|------|----------|----------|-----------|
-| Liveness | HTTP GET | `/health/live` | 10s | 3 failures → restart |
-| Readiness | HTTP GET | `/health/ready` | 5s | 3 failures → remove from LB |
-| Startup | HTTP GET | `/health/ready` | 5s | 30 attempts max |
-
-3. Define rollback procedures:
-   - Trigger criteria: health check failures, error rate spike, critical alert
-   - Rollback steps: redeploy previous image tag, verify health, rollback database if needed
-   - Communication: notify stakeholders during rollback
-   - Post-mortem: required after every production rollback
-
-4. Define deployment checklist:
-   - [ ] All tests pass in CI
-   - [ ] Security scan clean (zero critical/high CVEs)
-   - [ ] Database migrations reviewed and tested
-   - [ ] Environment variables configured
-   - [ ] Health check endpoints responding
-   - [ ] Monitoring alerts configured
-   - [ ] Rollback plan documented and tested
-   - [ ] Stakeholders notified
-
-**Self-verification**:
- [ ] Deployment strategy chosen and justified
- [ ] Zero-downtime approach specified
- [ ] Health checks defined (liveness, readiness, startup)
- [ ] Rollback trigger criteria and steps documented
- [ ] Deployment checklist complete
-
-**Save action**: Write `deployment_procedures.md` using `templates/deployment_procedures.md`
-
-**BLOCKING**: Present deployment procedures to user. Do NOT proceed until confirmed.
+Read and follow `steps/06_procedures.md`.

 ---

 ### Step 7: Deployment Scripts

-**Role**: DevOps / Platform engineer
-**Goal**: Create executable deployment scripts for pulling Docker images and running services on the remote target machine
-**Constraints**: Produce real, executable shell scripts. This is the ONLY step that creates implementation artifacts.
-
-1. Read containerization.md and deployment_procedures.md from previous steps
-2. Read `.env.example` for required variables
-3. Create the following scripts in `SCRIPTS_DIR/`:
-
-**`deploy.sh`** — Main deployment orchestrator:
-   - Validates that required environment variables are set (sources `.env` if present)
-   - Calls `pull-images.sh`, then `stop-services.sh`, then `start-services.sh`, then `health-check.sh`
-   - Exits with non-zero code on any failure
-   - Supports `--rollback` flag to redeploy previous image tags
-
-**`pull-images.sh`** — Pull Docker images to target machine:
-   - Reads image list and tags from environment or config
-   - Authenticates with container registry
-   - Pulls all required images
-   - Verifies image integrity (digest check)
-
-**`start-services.sh`** — Start services on target machine:
-   - Runs `docker compose up -d` or individual `docker run` commands
-   - Applies environment variables from `.env`
-   - Configures networks and volumes
-   - Waits for containers to reach healthy state
-
-**`stop-services.sh`** — Graceful shutdown:
-   - Stops services with graceful shutdown period
-   - Saves current image tags for rollback reference
-   - Cleans up orphaned containers/networks
-
-**`health-check.sh`** — Verify deployment health:
-   - Checks all health endpoints
-   - Reports status per service
-   - Returns non-zero if any service is unhealthy
-
-4. All scripts must:
-   - Be POSIX-compatible (#!/bin/bash with set -euo pipefail)
-   - Source `.env` from project root or accept env vars from the environment
-   - Include usage/help output (`--help` flag)
-   - Be idempotent where possible
-   - Handle SSH connection to remote target (configurable via `DEPLOY_HOST` env var)
-
-5. Document all scripts in `deploy_scripts.md`
-
-**Self-verification**:
- [ ] All five scripts created and executable
- [ ] Scripts source environment variables correctly
- [ ] `deploy.sh` orchestrates the full flow
- [ ] `pull-images.sh` handles registry auth and image pull
- [ ] `start-services.sh` starts containers with correct config
- [ ] `stop-services.sh` handles graceful shutdown
- [ ] `health-check.sh` validates all endpoints
- [ ] Rollback supported via `deploy.sh --rollback`
- [ ] Scripts work for remote deployment via SSH (DEPLOY_HOST)
- [ ] `deploy_scripts.md` documents all scripts
-
-**Save action**: Write scripts to `SCRIPTS_DIR/`, write `deploy_scripts.md` using `templates/deploy_scripts.md`
-
---
+Read and follow `steps/07_scripts.md`.

 ## Escalation Rules

@@ -473,17 +184,24 @@ At the start of execution, create a TodoWrite with all steps (1 through 7). Upda
 ├────────────────────────────────────────────────────────────────┤
 │ PREREQ: architecture.md + component specs exist                │
 │                                                                │
-│ 1. Status & Env     → reports/deploy_status_report.md          │
+│ 1. Status & Env     → steps/01_status-env.md                   │
+│                       → reports/deploy_status_report.md        │
 │                       + .env + .env.example                    │
 │    [BLOCKING: user confirms status & env vars]                 │
-│ 2. Containerization  → containerization.md                     │
+│ 2. Containerization → steps/02_containerization.md             │
+│                       → containerization.md                    │
 │    [BLOCKING: user confirms Docker plan]                       │
-│ 3. CI/CD Pipeline    → ci_cd_pipeline.md                       │
-│ 4. Environment       → environment_strategy.md                 │
-│ 5. Observability     → observability.md                        │
-│ 6. Procedures        → deployment_procedures.md                │
+│ 3. CI/CD Pipeline   → steps/03_ci-cd-pipeline.md               │
+│                       → ci_cd_pipeline.md                      │
+│ 4. Environment      → steps/04_environment-strategy.md         │
+│                       → environment_strategy.md                │
+│ 5. Observability    → steps/05_observability.md                │
+│                       → observability.md                       │
+│ 6. Procedures       → steps/06_procedures.md                   │
+│                       → deployment_procedures.md               │
 │    [BLOCKING: user confirms deployment plan]                   │
-│ 7. Scripts           → deploy_scripts.md + scripts/            │
+│ 7. Scripts          → steps/07_scripts.md                      │
+│                       → deploy_scripts.md + scripts/           │
 ├────────────────────────────────────────────────────────────────┤
 │ Principles: Docker-first · IaC · Observability built-in        │
 │             Environment parity · Save immediately              │
@@ -0,0 +1,45 @@
+# Step 1: Deployment Status & Environment Setup
+
+**Role**: DevOps / Platform engineer
+**Goal**: Assess current deployment readiness, identify all required environment variables, and create `.env` files.
+**Constraints**: Must complete before any other step.
+
+## Steps
+
+1. Read `architecture.md`, all component specs, and `restrictions.md`
+2. Assess deployment readiness:
+   - List all components and their current state (planned / implemented / tested)
+   - Identify external dependencies (databases, APIs, message queues, cloud services)
+   - Identify infrastructure prerequisites (container registry, cloud accounts, DNS, SSL certificates)
+   - Check if any deployment blockers exist
+3. Identify all required environment variables by scanning:
+   - Component specs for configuration needs
+   - Database connection requirements
+   - External API endpoints and credentials
+   - Feature flags and runtime configuration
+   - Container registry credentials
+   - Cloud provider credentials
+   - Monitoring/logging service endpoints
+4. Generate `.env.example` in project root with all variables and placeholder values (committed to VCS)
+5. Generate `.env` in project root with development defaults filled in where safe (git-ignored)
+6. Ensure `.gitignore` includes `.env` (but NOT `.env.example`)
+7. Produce a deployment status report summarizing readiness, blockers, and required setup
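A minimal Bash sketch of items 4 to 6, assuming plain `KEY=value` env files. The helpers `env_keys`, `missing_keys` and `ensure_gitignore` are hypothetical names, not part of the skill. The first lists the variables a file declares. The second reports template variables that `.env` does not declare yet. The third idempotently ignores `.env` while keeping `.env.example` tracked.

```bash
#!/bin/bash
set -euo pipefail

# Hypothetical helper: list variable names declared in an env file (KEY=value).
# Comments and blank lines are skipped; an optional "export " prefix is allowed.
env_keys() {
  sed -n -E 's/^[[:space:]]*(export[[:space:]]+)?([A-Za-z_][A-Za-z0-9_]*)=.*/\2/p' "$1" | sort -u
}

# Print keys declared in the template (.env.example) but absent from .env.
missing_keys() {
  local example="$1" actual="$2"
  comm -23 <(env_keys "$example") <(env_keys "$actual")
}

# Idempotently make git ignore .env while keeping .env.example tracked.
ensure_gitignore() {
  local gitignore="$1"
  touch "$gitignore"
  # Terminate a last line that lacks a newline before appending to it.
  if [ -s "$gitignore" ] && [ -n "$(tail -c1 "$gitignore")" ]; then
    echo >> "$gitignore"
  fi
  grep -qxF '.env' "$gitignore" || echo '.env' >> "$gitignore"
  # Re-include the template in case a broader pattern such as ".env*" exists.
  grep -qxF '!.env.example' "$gitignore" || echo '!.env.example' >> "$gitignore"
}
```

A key written as `KEY=` counts as declared here. Checking whether it holds a usable value is a runtime concern.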
+
+## Self-verification
+
+- [ ] All components assessed for deployment readiness
+- [ ] External dependencies catalogued
+- [ ] Infrastructure prerequisites identified
+- [ ] All required environment variables discovered
+- [ ] `.env.example` created with placeholder values
+- [ ] `.env` created with safe development defaults
+- [ ] `.gitignore` updated to exclude `.env`
+- [ ] Status report written to `reports/deploy_status_report.md`
+
+## Save action
+
+Write `reports/deploy_status_report.md` using `templates/deploy_status_report.md`. Create `.env` and `.env.example` in project root.
+
+## Blocking
+
+**BLOCKING**: Present status report and environment variables to user. Do NOT proceed until confirmed.
@@ -0,0 +1,48 @@
+# Step 2: Containerization
+
+**Role**: DevOps / Platform engineer
+**Goal**: Define Docker configuration for every component, local development, and blackbox test environments.
+**Constraints**: Plan only — no Dockerfile creation. Describe what each Dockerfile should contain.
+
+## Steps
+
+1. Read `architecture.md` and all component specs
+2. Read `restrictions.md` for infrastructure constraints
+3. Research Docker best practices for the project's tech stack (multi-stage builds, base image selection, layer optimization)
+4. For each component, define:
+   - Base image (pinned version, prefer alpine/distroless for production)
+   - Build stages (dependency install, build, production)
+   - Non-root user configuration
+   - Health check endpoint and command
+   - Exposed ports
+   - `.dockerignore` contents
+5. Define `docker-compose.yml` for local development:
+   - All application components
+   - Database (Postgres) with named volume
+   - Any message queues, caches, or external service mocks
+   - Shared network
+   - Environment variable files (`.env`)
+6. Define `docker-compose.test.yml` for blackbox tests:
+   - Application components under test
+   - Test runner container (black-box, no internal imports)
+   - Isolated database with seed data
+   - All tests runnable via `docker compose -f docker-compose.test.yml up --abort-on-container-exit`
+7. Define image tagging strategy: `<registry>/<project>/<component>:<git-sha>` for CI, `latest` for local dev only
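The tagging rule in item 7 fits in two small Bash helpers. This is a sketch: `image_ref` and `ci_image_ref` are hypothetical names. The SHA check is an added guard that accepts 7 to 40 lowercase hex characters, which covers short or full `git rev-parse` output. It keeps a mutable tag such as `latest` out of CI.

```bash
#!/bin/bash
set -euo pipefail

# Hypothetical helper: <registry>/<project>/<component>:<tag>
image_ref() {
  local registry="$1" project="$2" component="$3" tag="$4"
  printf '%s/%s/%s:%s\n' "$registry" "$project" "$component" "$tag"
}

# CI images must be tagged with a git SHA, never with "latest".
ci_image_ref() {
  local sha="$4"
  if [[ ! "$sha" =~ ^[0-9a-f]{7,40}$ ]]; then
    echo "ci_image_ref: '$sha' is not a git SHA" >&2
    return 1
  fi
  image_ref "$1" "$2" "$3" "$sha"
}
```

In CI, run `ci_image_ref registry.example.com shop api "$(git rev-parse --short HEAD)"`. For local development, run `image_ref registry.example.com shop api latest`. The registry host and project name are placeholders.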
+
+## Self-verification
+
+- [ ] Every component has a Dockerfile specification
+- [ ] Multi-stage builds specified for all production images
+- [ ] Non-root user for all containers
+- [ ] Health checks defined for every service
+- [ ] `docker-compose.yml` covers all components + dependencies
+- [ ] `docker-compose.test.yml` enables black-box testing
+- [ ] `.dockerignore` defined
+
+## Save action
+
+Write `containerization.md` using `templates/containerization.md`.
+
+## Blocking
+
+**BLOCKING**: Present containerization plan to user. Do NOT proceed until confirmed.
@@ -0,0 +1,41 @@
+# Step 3: CI/CD Pipeline
+
+**Role**: DevOps engineer
+**Goal**: Define the CI/CD pipeline with quality gates, security scanning, and multi-environment deployment.
+**Constraints**: Pipeline definition only — produce YAML specification, not implementation.
+
+## Steps
+
+1. Read `architecture.md` for tech stack and deployment targets
+2. Read `restrictions.md` for CI/CD constraints (cloud provider, registry, etc.)
+3. Research CI/CD best practices for the project's platform (GitHub Actions / Azure Pipelines)
+4. Define pipeline stages:
+
+| Stage | Trigger | Steps | Quality Gate |
+|-------|---------|-------|-------------|
+| **Lint** | Every push | Run linters per language (black, rustfmt, prettier, dotnet format) | Zero errors |
+| **Test** | Every push | Unit tests, blackbox tests, coverage report | 75%+ coverage (see `.cursor/rules/cursor-meta.mdc` Quality Thresholds) |
+| **Security** | Every push | Dependency audit, SAST scan (Semgrep/SonarQube), image scan (Trivy) | Zero critical/high CVEs |
+| **Build** | PR merge to dev | Build Docker images, tag with git SHA | Build succeeds |
+| **Push** | After build | Push to container registry | Push succeeds |
+| **Deploy Staging** | After push | Deploy to staging environment | Health checks pass |
+| **Smoke Tests** | After staging deploy | Run critical path tests against staging | All pass |
+| **Deploy Production** | Manual approval | Deploy to production | Health checks pass |
+
+5. Define caching strategy: dependency caches, Docker layer caches, build artifact caches
+6. Define parallelization: which stages can run concurrently
+7. Define notifications: build failures, deployment status, security alerts
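Two parts of this plan lend themselves to a quick sketch: the Test stage's coverage gate, and item 6's parallel execution of independent stages. The Bash below assumes the coverage percentage has already been extracted from the test tool's report. That parsing is tool-specific and omitted here. `coverage_gate` and `run_parallel` are hypothetical helper names. A real pipeline would express the same logic in GitHub Actions or Azure Pipelines YAML.

```bash
#!/bin/bash
set -euo pipefail

# Hypothetical gate: fail unless coverage >= threshold (default 75, per the Test stage).
coverage_gate() {
  local coverage="$1" threshold="${2:-75}"
  # Bash arithmetic is integer-only, so let awk compare the decimals.
  if awk -v c="$coverage" -v t="$threshold" 'BEGIN { exit !(c + 0 >= t + 0) }'; then
    echo "coverage ${coverage}% >= ${threshold}%: gate passed"
  else
    echo "coverage ${coverage}% < ${threshold}%: gate FAILED" >&2
    return 1
  fi
}

# Run independent stages (e.g. Lint, Test and Security, which all trigger on
# every push) concurrently. Waits for all of them; fails if any one failed.
# Expects at least one command string.
run_parallel() {
  local cmd pid status=0
  local -a pids=()
  for cmd in "$@"; do
    bash -c "$cmd" &
    pids+=("$!")
  done
  for pid in "${pids[@]}"; do
    wait "$pid" || status=1
  done
  return "$status"
}
```

For example, `run_parallel 'make lint' 'make test' 'make audit'`, where the `make` targets are placeholders.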
+
+## Self-verification
+
+- [ ] All pipeline stages defined with triggers and gates
+- [ ] Coverage threshold enforced (75%+)
+- [ ] Security scanning included (dependencies + images + SAST)
+- [ ] Caching configured for dependencies and Docker layers
+- [ ] Multi-environment deployment (staging → production)
+- [ ] Rollback procedure referenced
+- [ ] Notifications configured
+
+## Save action
+
+Write `ci_cd_pipeline.md` using `templates/ci_cd_pipeline.md`.
@@ -0,0 +1,41 @@
+# Step 4: Environment Strategy
+
+**Role**: Platform engineer
+**Goal**: Define environment configuration, secrets management, and environment parity.
+**Constraints**: Strategy document — no secrets or credentials in output.
+
+## Steps
+
+1. Define environments:
+
+| Environment | Purpose | Infrastructure | Data |
+|-------------|---------|---------------|------|
+| **Development** | Local developer workflow | docker-compose, local volumes | Seed data, mocks for external APIs |
+| **Staging** | Pre-production validation | Mirrors production topology | Anonymized production-like data |
+| **Production** | Live system | Full infrastructure | Real data |
+
+2. Define environment variable management:
+   - Reference `.env.example` created in Step 1
+   - Per-environment variable sources (`.env` for dev, secret manager for staging/prod)
+   - Validation: fail fast on missing required variables at startup
+3. Define secrets management:
+   - Never commit secrets to version control
+   - Development: `.env` files (git-ignored)
+   - Staging/Production: secret manager (AWS Secrets Manager / Azure Key Vault / Vault)
+   - Rotation policy
+4. Define database management per environment:
+   - Development: Docker Postgres with named volume, seed data
+   - Staging: managed Postgres, migrations applied via CI/CD
+   - Production: managed Postgres, migrations require approval
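A minimal Bash sketch of the per-environment variable source and the fail-fast validation. It assumes the environment name is one of `development`, `staging` or `production`. It also assumes staging and production receive their variables from the secret manager as already-set process environment. `load_env` and `require_env` are illustrative names.

```bash
#!/bin/bash
set -euo pipefail

# Hypothetical helper. Development reads .env; staging and production expect the
# platform to inject variables from the secret manager, so nothing is sourced there.
load_env() {
  local environment="$1" env_file="${2:-.env}"
  if [ "$environment" = "development" ] && [ -f "$env_file" ]; then
    set -a          # export every variable assigned while sourcing
    . "$env_file"
    set +a
  fi
}

# Fail fast if any required variable is unset or empty.
require_env() {
  local name
  local -a missing=()
  for name in "$@"; do
    # ${!name:-} is Bash indirect expansion; ":-" keeps set -u from aborting.
    if [ -z "${!name:-}" ]; then
      missing+=("$name")
    fi
  done
  if [ "${#missing[@]}" -gt 0 ]; then
    echo "missing required environment variables: ${missing[*]}" >&2
    return 1
  fi
}
```

At startup, run `load_env "${APP_ENV:-development}"` and then `require_env DATABASE_URL REGISTRY_URL`. `APP_ENV` and both variable names are hypothetical; the real list comes from `.env.example`.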
+
+## Self-verification
+
+- [ ] All three environments defined with clear purpose
+- [ ] Environment variable documentation complete (references `.env.example` from Step 1)
+- [ ] No secrets in any output document
+- [ ] Secret manager specified for staging/production
+- [ ] Database strategy per environment
+
+## Save action
+
+Write `environment_strategy.md` using `templates/environment_strategy.md`.
@@ -0,0 +1,60 @@
+# Step 5: Observability
+
+**Role**: Site Reliability Engineer (SRE)
+**Goal**: Define logging, metrics, tracing, and alerting strategy.
+**Constraints**: Strategy document — describe what to implement, not how to wire it.
+
+## Steps
+
+1. Read `architecture.md` and component specs for service boundaries
+2. Research observability best practices for the tech stack
+
+## Logging
+
+- Structured JSON to stdout/stderr (no file logging in containers)
+- Fields: `timestamp` (ISO 8601), `level`, `service`, `correlation_id`, `message`, `context`
+- Levels: ERROR (exceptions), WARN (degraded), INFO (business events), DEBUG (diagnostics, dev only)
+- No PII in logs
+- Retention: dev = console, staging = 7 days, production = 30 days
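A dependency-free Bash sketch of one log line in this format. It can serve deployment scripts, or act as a reference for what application loggers should emit. The sketch makes several assumptions:

- `context` is passed in as an already-serialized JSON object, defaulting to `{}`.
- Only backslash, double quote, newline, carriage return and tab are escaped.
- Timestamps are second-precision UTC.

`json_escape` and `log_json` are hypothetical names. Services would normally use their language's structured-logging library instead.

```bash
#!/bin/bash
set -euo pipefail

# Hypothetical helper: escape a string for a JSON string literal.
# Handles \ " \n \r \t; other control characters are not escaped in this sketch.
json_escape() {
  local s="$1" out="" ch i
  for (( i = 0; i < ${#s}; i++ )); do
    ch="${s:i:1}"
    case "$ch" in
      '\') out+='\\' ;;
      '"') out+='\"' ;;
      $'\n') out+='\n' ;;
      $'\r') out+='\r' ;;
      $'\t') out+='\t' ;;
      *) out+="$ch" ;;
    esac
  done
  printf '%s' "$out"
}

# Print one JSON log object on stdout with the fields listed above.
log_json() {
  local level="$1" service="$2" correlation_id="$3" message="$4" context="${5:-}"
  case "$level" in
    ERROR|WARN|INFO|DEBUG) ;;
    *) echo "log_json: unknown level '$level'" >&2; return 1 ;;
  esac
  [ -n "$context" ] || context='{}'
  printf '{"timestamp":"%s","level":"%s","service":"%s","correlation_id":"%s","message":"%s","context":%s}\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$level" "$(json_escape "$service")" \
    "$(json_escape "$correlation_id")" "$(json_escape "$message")" "$context"
}
```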
+
+## Metrics
+
+- Expose Prometheus-compatible `/metrics` endpoint per service
+- System metrics: CPU, memory, disk, network
+- Application metrics: `request_count`, `request_duration` (histogram), `error_count`, `active_connections`
+- Business metrics: derived from acceptance criteria
+- Collection interval: 15s
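Smoke tests and scripted checks often need to read a value straight from a service's `/metrics` output. The awk-based Bash helper below handles only the basic Prometheus text exposition format:

- It skips `#` comment lines.
- It drops the `{...}` label set, assuming no `}` appears inside label values.
- It sums every series whose name matches exactly.

`metric_value` is a hypothetical name.

```bash
#!/bin/bash
set -euo pipefail

# Hypothetical helper. Usage: metric_value NAME < metrics.txt
# Prints the summed value of all series named NAME; exits 1 if none exist.
metric_value() {
  awk -v name="$1" '
    /^#/ { next }                      # HELP / TYPE comments
    {
      line = $0
      sub(/[{].*[}]/, "", line)        # drop the label set, if any
      split(line, f)                   # f[1] = name, f[2] = value
      if (f[1] == name) { sum += f[2]; found = 1 }
    }
    END { if (!found) exit 1; print sum }'
}
```

For example, `curl -fsS http://localhost:8080/metrics | metric_value error_count`, where the host and port are placeholders. For the `request_duration` histogram, the exposition format publishes `request_duration_bucket`, `request_duration_sum` and `request_duration_count` series. Query one of those names rather than the bare metric name.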
+
+## Distributed Tracing
+
+- OpenTelemetry SDK integration
+- Trace context propagation via HTTP headers and message queue metadata
+- Span naming: `<service>.<operation>`
+- Sampling: 100% in dev/staging, 10% in production (adjust based on volume)
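Trace context propagation over HTTP is standardized by W3C Trace Context. Its `traceparent` header is what OpenTelemetry's default propagator reads. A Bash script can use this to start a trace that instrumented services will join. The helper names, and the idea of tagging scripted smoke-test requests, are assumptions rather than part of the original plan. The flags byte `01` marks the trace as sampled and `00` as not sampled. With the SDK's default parent-based sampler, downstream services follow that flag.

```bash
#!/bin/bash
set -euo pipefail

# Hypothetical helper: N random bytes as lowercase hex, read from /dev/urandom.
rand_hex() {
  od -An -v -N"$1" -tx1 /dev/urandom | tr -d ' \n'
}

# Build a traceparent value: 00-<32-hex trace-id>-<16-hex parent-id>-<flags>.
# PERCENT is the head-sampling rate (e.g. 100 in dev/staging, 10 in production).
traceparent() {
  local percent="${1:-100}" flags="00"
  # $RANDOM is 0..32767, so the rate is approximate; fine for a sketch.
  if (( RANDOM % 100 < percent )); then
    flags="01"
  fi
  printf '00-%s-%s-%s\n' "$(rand_hex 16)" "$(rand_hex 8)" "$flags"
}
```

For example, `curl -fsS -H "traceparent: $(traceparent 10)" "$SMOKE_URL"`, where `SMOKE_URL` is a placeholder.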
+
+## Alerting
+
+| Severity | Response Time | Condition Examples |
+|----------|---------------|-------------------|
+| Critical | 5 min | Service down, data loss, health check failed |
+| High | 30 min | Error rate > 5%, P95 latency > 2x baseline |
+| Medium | 4 hours | Disk > 80%, elevated latency |
+| Low | Next business day | Non-critical warnings |
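Two of the example conditions are simple enough to evaluate in a script. That can help when checking alert thresholds against sample numbers. This sketch assumes raw counts (for example from `/metrics`) and a disk-usage percentage (for example from `df`) are already available. The function names are hypothetical. In practice, these thresholds would live in the alerting tool's own rule format.

```bash
#!/bin/bash
set -euo pipefail

# Hypothetical check for the High row: error rate above 5% of requests.
error_rate_severity() {
  local errors="$1" total="$2"
  awk -v e="$errors" -v t="$total" 'BEGIN {
    sev = "none"
    if (t > 0 && e / t > 0.05) sev = "high"
    print sev
  }'
}

# Medium row: disk usage above 80%. Accepts "83" or "83%".
disk_severity() {
  local used_pct="${1%\%}"
  if (( used_pct > 80 )); then echo "medium"; else echo "none"; fi
}
```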
+
+## Dashboards
+
+- Operations: service health, request rate, error rate, response time percentiles, resource utilization
+- Business: key business metrics from acceptance criteria
+
+## Self-verification
+
+- [ ] Structured logging format defined with required fields
+- [ ] Metrics endpoint specified per service
+- [ ] OpenTelemetry tracing configured
+- [ ] Alert severities with response times defined
+- [ ] Dashboards cover operations and business metrics
+- [ ] PII exclusion from logs addressed
+
+## Save action
+
+Write `observability.md` using `templates/observability.md`.
@@ -0,0 +1,53 @@
+# Step 6: Deployment Procedures
+
+**Role**: DevOps / Platform engineer
+**Goal**: Define deployment strategy, rollback procedures, health checks, and deployment checklist.
+**Constraints**: Procedures document — no implementation.
+
+## Steps
+
+1. Define deployment strategy:
+   - Preferred pattern: blue-green / rolling / canary (choose based on architecture)
+   - Zero-downtime requirement for production
+   - Graceful shutdown: 30-second grace period for in-flight requests
+   - Database migration ordering: migrate before deploy, backward-compatible only
+
+2. Define health checks:
+
+| Check | Type | Endpoint | Interval | Threshold |
+|-------|------|----------|----------|-----------|
+| Liveness | HTTP GET | `/health/live` | 10s | 3 failures → restart |
+| Readiness | HTTP GET | `/health/ready` | 5s | 3 failures → remove from load balancer |
+| Startup | HTTP GET | `/health/ready` | 5s | 30 attempts max |
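The startup row translates into a bounded polling loop. This Bash sketch assumes the probe is any command that exits 0 when the service is healthy. For HTTP endpoints, `curl -fsS` treats 4xx/5xx responses as failures. `wait_until_healthy` is a hypothetical name. Liveness restarts and load-balancer removal remain the orchestrator's job.

```bash
#!/bin/bash
set -euo pipefail

# Hypothetical helper.
# Usage: wait_until_healthy MAX_ATTEMPTS INTERVAL_SECONDS COMMAND [ARGS...]
wait_until_healthy() {
  local max_attempts="$1" interval="$2" attempt
  shift 2
  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    if "$@" >/dev/null 2>&1; then
      echo "healthy after ${attempt} attempt(s)"
      return 0
    fi
    # No sleep after the final failed attempt.
    if (( attempt < max_attempts )); then
      sleep "$interval"
    fi
  done
  echo "unhealthy after ${max_attempts} attempts: $*" >&2
  return 1
}
```

The startup row then becomes `wait_until_healthy 30 5 curl -fsS http://localhost:8080/health/ready`, where the host and port are placeholders.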
+
+3. Define rollback procedures:
+   - Trigger criteria: health check failures, error rate spike, critical alert
+   - Rollback steps: redeploy the previous image tag, verify health, roll back the database if needed
+   - Communication: notify stakeholders during rollback
+   - Post-mortem: required after every production rollback
+
+4. Define deployment checklist:
+   - [ ] All tests pass in CI
+   - [ ] Security scan clean (zero critical/high CVEs)
+   - [ ] Database migrations reviewed and tested
+   - [ ] Environment variables configured
+   - [ ] Health check endpoints responding
+   - [ ] Monitoring alerts configured
+   - [ ] Rollback plan documented and tested
+   - [ ] Stakeholders notified
+
+## Self-verification
+
+- [ ] Deployment strategy chosen and justified
+- [ ] Zero-downtime approach specified
+- [ ] Health checks defined (liveness, readiness, startup)
+- [ ] Rollback trigger criteria and steps documented
+- [ ] Deployment checklist complete
+
+## Save action
+
+Write `deployment_procedures.md` using `templates/deployment_procedures.md`.
+
+## Blocking
+
+**BLOCKING**: Present deployment procedures to user. Do NOT proceed until confirmed.
@@ -0,0 +1,70 @@
+# Step 7: Deployment Scripts
+
+**Role**: DevOps / Platform engineer
+**Goal**: Create executable deployment scripts for pulling Docker images and running services on the remote target machine.
+**Constraints**: Produce real, executable shell scripts. This is the ONLY step that creates implementation artifacts.
+
+## Steps
+
+1. Read `containerization.md` and `deployment_procedures.md` from previous steps
+2. Read `.env.example` for required variables
+3. Create the following scripts in `SCRIPTS_DIR/`:
+
+### `deploy.sh` — Main deployment orchestrator
+
+- Validates that required environment variables are set (sources `.env` if present)
+- Calls `pull-images.sh`, then `stop-services.sh`, then `start-services.sh`, then `health-check.sh`
+- Exits with non-zero code on any failure
+- Supports `--rollback` flag to redeploy previous image tags
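A skeleton of `deploy.sh` covering argument parsing, `--help`, `--rollback` and the fixed call order. It rests on two assumptions:

- The sub-scripts live in `SCRIPTS_DIR`, read here as an environment variable that defaults to the script's own directory.
- Rollback mode is passed to the sub-scripts through a hypothetical `DEPLOY_MODE` variable. How each sub-script honours it is left to that script.

```bash
#!/bin/bash
set -euo pipefail

usage() {
  cat <<'EOF'
Usage: deploy.sh [--rollback] [--help]
  --rollback  redeploy the image tags saved by the previous deployment
  --help      show this message
EOF
}

# SCRIPTS_DIR (as an env var) and DEPLOY_MODE are assumed names for this sketch.
deploy_main() {
  local scripts_dir="${SCRIPTS_DIR:-$(dirname "$0")}" mode="deploy" step
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --rollback) mode="rollback" ;;
      -h|--help) usage; return 0 ;;
      *) echo "unknown argument: $1" >&2; usage >&2; return 2 ;;
    esac
    shift
  done
  for step in pull-images stop-services start-services health-check; do
    echo "==> ${step} (${mode})"
    # Explicit check: set -e does not apply when deploy_main runs inside `if`.
    if ! DEPLOY_MODE="$mode" "${scripts_dir}/${step}.sh"; then
      echo "deploy: ${step}.sh failed" >&2
      return 1
    fi
  done
}
```

In the actual `deploy.sh`, the file would end with `deploy_main "$@"`.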
+
+### `pull-images.sh` — Pull Docker images to target machine
+
+- Reads image list and tags from environment or config
+- Authenticates with container registry
+- Pulls all required images
+- Verifies image integrity (digest check)
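Integrity checking can lean on Docker's content addressing. Pulling by digest (`docker pull <image>@sha256:<digest>`) makes the daemon reject content that does not match. For tag-based pulls, the sketch below compares the pulled image's `RepoDigests` with an expected digest. Two things are assumptions here: where the expected digest comes from (for example, recorded by CI after the push), and the `REGISTRY*` variable names. The function names are also hypothetical.

```bash
#!/bin/bash
set -euo pipefail

# Hypothetical: REGISTRY, REGISTRY_USER and REGISTRY_PASSWORD come from .env.
registry_login() {
  printf '%s' "$REGISTRY_PASSWORD" |
    docker login --username "$REGISTRY_USER" --password-stdin "$REGISTRY"
}

# Succeed if any "repo@sha256:..." line on stdin carries the expected digest.
digest_in_list() {
  local expected="$1" line
  while IFS= read -r line || [ -n "$line" ]; do
    if [ "${line##*@}" = "$expected" ]; then
      return 0
    fi
  done
  return 1
}

# Pull IMAGE (repo:tag) and verify it resolves to EXPECTED (sha256:...).
pull_and_verify() {
  local image="$1" expected="$2"
  docker pull --quiet "$image" >/dev/null
  docker image inspect --format '{{range .RepoDigests}}{{println .}}{{end}}' "$image" \
    | digest_in_list "$expected"
}
```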
+
+### `start-services.sh` — Start services on target machine
+
+- Runs `docker compose up -d` or individual `docker run` commands
+- Applies environment variables from `.env`
+- Configures networks and volumes
+- Waits for containers to reach healthy state
+
+### `stop-services.sh` — Graceful shutdown
+
+- Stops services, allowing the graceful shutdown period defined in `deployment_procedures.md`
+- Saves current image tags for rollback reference
+- Cleans up orphaned containers/networks
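A sketch of the `stop-services.sh` duties, assuming a Docker Compose deployment. It records which image each running service uses, then stops with a grace period and removes orphans. The state-file path and function names are hypothetical. The 30-second timeout mirrors the grace period in Step 6. `docker compose down` keeps named volumes unless `-v` is passed, so database data survives.

```bash
#!/bin/bash
set -euo pipefail

# Hypothetical helper: write stdin to FILE via temp file + mv, so an
# interrupted run never leaves a half-written rollback record.
write_atomically() {
  local file="$1" tmp
  tmp="$(mktemp "${file}.XXXXXX")"
  cat > "$tmp"
  mv -f "$tmp" "$file"
}

# Emit "service=image" for each running container in the current Compose
# project, using the com.docker.compose.service label that Compose sets.
running_images() {
  local ids
  ids="$(docker compose ps -q)"
  [ -n "$ids" ] || return 0
  # $ids is deliberately unquoted: one argument per container ID.
  docker inspect --format '{{index .Config.Labels "com.docker.compose.service"}}={{.Config.Image}}' $ids
}

# Print the saved image for SERVICE from FILE (read by deploy.sh --rollback).
saved_image() {
  awk -F= -v s="$1" '$1 == s { print substr($0, length(s) + 2); found = 1; exit } END { exit !found }' "$2"
}

stop_services() {
  local state_file="$1" images
  images="$(running_images)"
  # Keep the previous record when nothing is running (e.g. on a retry).
  if [ -n "$images" ]; then
    printf '%s\n' "$images" | write_atomically "$state_file"
  fi
  docker compose down --remove-orphans --timeout 30
}
```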
+
+### `health-check.sh` — Verify deployment health
+
+- Checks all health endpoints
+- Reports status per service
+- Returns non-zero if any service is unhealthy
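A sketch of the reporting loop in `health-check.sh`. It checks each service's endpoint, prints one status line per service, and returns non-zero if any check fails. Services are given as hypothetical `name=url` pairs, and the 5-second timeout is an arbitrary choice. The probe is isolated in `probe_url` so it can be swapped out.

```bash
#!/bin/bash
set -euo pipefail

# Hypothetical probe: succeeds if URL answers with a status below 400 within 5 s.
probe_url() {
  curl -fsS --max-time 5 -o /dev/null "$1"
}

# Usage: check_all name=url [name=url ...]
check_all() {
  local pair name url failed=0
  for pair in "$@"; do
    name="${pair%%=*}"
    url="${pair#*=}"
    if probe_url "$url" 2>/dev/null; then
      printf '%-20s OK\n' "$name"
    else
      printf '%-20s UNHEALTHY (%s)\n' "$name" "$url"
      failed=1
    fi
  done
  return "$failed"
}
```

For example, `check_all api=http://localhost:8080/health/ready worker=http://localhost:8081/health/ready`, where the hosts and ports are placeholders.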
+
+4. All scripts must:
+   - Target Bash rather than plain POSIX `sh` (`#!/bin/bash` with `set -euo pipefail`)
+   - Source `.env` from project root or accept env vars from the environment
+   - Include usage/help output (`--help` flag)
+   - Be idempotent where possible
+   - Handle SSH connection to remote target (configurable via `DEPLOY_HOST` env var)
+
+5. Document all scripts in `deploy_scripts.md`
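For the `DEPLOY_HOST` requirement in item 4, one option is a small wrapper. It runs each command over SSH when `DEPLOY_HOST` is set, and locally otherwise. In this sketch:

- `remote` and the optional `DEPLOY_USER` variable are hypothetical.
- `BatchMode=yes` makes SSH fail instead of prompting for a password.
- `printf %q` quotes arguments for a Bash-compatible remote shell.

```bash
#!/bin/bash
set -euo pipefail

# Hypothetical wrapper: run COMMAND [ARGS...] on DEPLOY_HOST via SSH,
# or locally if DEPLOY_HOST is unset. DEPLOY_USER is optional.
remote() {
  if [ -n "${DEPLOY_HOST:-}" ]; then
    # printf %q escapes each argument so it survives the remote shell's parsing.
    ssh -o BatchMode=yes "${DEPLOY_USER:+${DEPLOY_USER}@}${DEPLOY_HOST}" "$(printf '%q ' "$@")"
  else
    "$@"
  fi
}
```

Each call is a separate SSH session. A sequence such as `cd /srv/app && docker compose up -d` should therefore be sent as one `remote bash -c '...'` call; the path is a placeholder.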
+
+## Self-verification
+
+- [ ] All five scripts created and executable
+- [ ] Scripts source environment variables correctly
+- [ ] `deploy.sh` orchestrates the full flow
+- [ ] `pull-images.sh` handles registry auth and image pull
+- [ ] `start-services.sh` starts containers with correct config
+- [ ] `stop-services.sh` handles graceful shutdown
+- [ ] `health-check.sh` validates all endpoints
+- [ ] Rollback supported via `deploy.sh --rollback`
+- [ ] Scripts work for remote deployment via SSH (`DEPLOY_HOST`)
+- [ ] `deploy_scripts.md` documents all scripts
+
+## Save action
+
+Write scripts to `SCRIPTS_DIR/`. Write `deploy_scripts.md` using `templates/deploy_scripts.md`.