add python scaffold folder and autodevelopment system

2026-04-22 11:06:34 +00:00 · 2026-03-25 21:57:26 +02:00
parent 1009af4a32
commit d8f91ef6a9
99 changed files with 11070 additions and 1 deletions
@@ -0,0 +1,491 @@
+---
+name: deploy
+description: |
+  Comprehensive deployment skill covering status check, env setup, containerization, CI/CD pipeline, environment strategy, observability, deployment procedures, and deployment scripts.
+  7-step workflow: Status & env check, Docker containerization, CI/CD pipeline definition, environment strategy, observability planning, deployment procedures, deployment scripts.
+  Uses _docs/04_deploy/ structure.
+  Trigger phrases:
+  - "deploy", "deployment", "deployment strategy"
+  - "CI/CD", "pipeline", "containerize"
+  - "observability", "monitoring", "logging"
+  - "dockerize", "docker compose"
+category: ship
+tags: [deployment, docker, ci-cd, observability, monitoring, containerization, scripts]
+disable-model-invocation: true
+---
+
+# Deployment Planning
+
+Plan and document the full deployment lifecycle: check deployment status and environment requirements, containerize the application, define CI/CD pipelines, configure environments, set up observability, document deployment procedures, and generate deployment scripts.
+
+## Core Principles
+
+- **Docker-first**: every component runs in a container; local dev, blackbox tests, and production all use Docker
+- **Infrastructure as code**: all deployment configuration is version-controlled
+- **Observability built-in**: logging, metrics, and tracing are part of the deployment plan, not afterthoughts
+- **Environment parity**: dev, staging, and production environments mirror each other as closely as possible
+- **Save immediately**: write artifacts to disk after each step; never accumulate unsaved work
+- **Ask, don't assume**: when infrastructure constraints or preferences are unclear, ask the user
+- **Plan, don't code**: this workflow produces deployment documents and specifications, not implementation code (except deployment scripts in Step 7)
+
+## Context Resolution
+
+Fixed paths:
+
+- DOCUMENT_DIR: `_docs/02_document/`
+- DEPLOY_DIR: `_docs/04_deploy/`
+- REPORTS_DIR: `_docs/04_deploy/reports/`
+- SCRIPTS_DIR: `scripts/`
+- ARCHITECTURE: `_docs/02_document/architecture.md`
+- COMPONENTS_DIR: `_docs/02_document/components/`
+
+Announce the resolved paths to the user before proceeding.
+
+## Input Specification
+
+### Required Files
+
+| File | Purpose | Required |
+|------|---------|----------|
+| `_docs/00_problem/problem.md` | Problem description and context | Greenfield only |
+| `_docs/00_problem/restrictions.md` | Constraints and limitations | Greenfield only |
+| `_docs/01_solution/solution.md` | Finalized solution | Greenfield only |
+| `DOCUMENT_DIR/architecture.md` | Architecture (from plan or document skill) | Always |
+| `DOCUMENT_DIR/components/` | Component specs | Always |
+
+### Prerequisite Checks (BLOCKING)
+
+1. `architecture.md` exists — **STOP if missing**, run `/plan` first
+2. At least one component spec exists in `DOCUMENT_DIR/components/` — **STOP if missing**
+3. Create DEPLOY_DIR, REPORTS_DIR, and SCRIPTS_DIR if they do not exist
+4. If DEPLOY_DIR already contains artifacts, ask user: **resume from last checkpoint or start fresh?**
+
+## Artifact Management
+
+### Directory Structure
+
+```
+DEPLOY_DIR/
+├── containerization.md
+├── ci_cd_pipeline.md
+├── environment_strategy.md
+├── observability.md
+├── deployment_procedures.md
+├── deploy_scripts.md
+└── reports/
+    └── deploy_status_report.md
+
+SCRIPTS_DIR/           (project root)
+├── deploy.sh
+├── pull-images.sh
+├── start-services.sh
+├── stop-services.sh
+└── health-check.sh
+
+.env                   (project root, git-ignored)
+.env.example           (project root, committed)
+```
+
+### Save Timing
+
+| Step | Save immediately after | Filename |
+|------|------------------------|----------|
+| Step 1 | Status check & env setup complete | `reports/deploy_status_report.md` + `.env` + `.env.example` |
+| Step 2 | Containerization plan complete | `containerization.md` |
+| Step 3 | CI/CD pipeline defined | `ci_cd_pipeline.md` |
+| Step 4 | Environment strategy documented | `environment_strategy.md` |
+| Step 5 | Observability plan complete | `observability.md` |
+| Step 6 | Deployment procedures documented | `deployment_procedures.md` |
+| Step 7 | Deployment scripts created | `deploy_scripts.md` + scripts in `SCRIPTS_DIR/` |
+
+### Resumability
+
+If DEPLOY_DIR already contains artifacts:
+
+1. List existing files and match to the save timing table
+2. Identify the last completed step
+3. Resume from the next incomplete step
+4. Inform the user which steps are being skipped
+
+## Progress Tracking
+
+At the start of execution, create a TodoWrite list with one entry per step (1 through 7). Update each entry's status as its step completes.
+
+## Workflow
+
+### Step 1: Deployment Status & Environment Setup
+
+**Role**: DevOps / Platform engineer
+**Goal**: Assess current deployment readiness, identify all required environment variables, and create `.env` files
+**Constraints**: Must complete before any other step
+
+1. Read architecture.md, all component specs, and restrictions.md
+2. Assess deployment readiness:
+   - List all components and their current state (planned / implemented / tested)
+   - Identify external dependencies (databases, APIs, message queues, cloud services)
+   - Identify infrastructure prerequisites (container registry, cloud accounts, DNS, SSL certificates)
+   - Check if any deployment blockers exist
+3. Identify all required environment variables by scanning:
+   - Component specs for configuration needs
+   - Database connection requirements
+   - External API endpoints and credentials
+   - Feature flags and runtime configuration
+   - Container registry credentials
+   - Cloud provider credentials
+   - Monitoring/logging service endpoints
+4. Generate `.env.example` in project root with all variables and placeholder values (committed to VCS)
+5. Generate `.env` in project root with development defaults filled in where safe (git-ignored)
+6. Ensure `.gitignore` includes `.env` (but NOT `.env.example`)
+7. Produce a deployment status report summarizing readiness, blockers, and required setup
+
+**Self-verification**:
+- [ ] All components assessed for deployment readiness
+- [ ] External dependencies catalogued
+- [ ] Infrastructure prerequisites identified
+- [ ] All required environment variables discovered
+- [ ] `.env.example` created with placeholder values
+- [ ] `.env` created with safe development defaults
+- [ ] `.gitignore` updated to exclude `.env`
+- [ ] Status report written to `reports/deploy_status_report.md`
+
+**Save action**: Write `reports/deploy_status_report.md` using `templates/deploy_status_report.md`, create `.env` and `.env.example` in project root
+
+**BLOCKING**: Present status report and environment variables to user. Do NOT proceed until confirmed.
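Item 6 of Step 1 (keep `.env` out of version control without touching `.env.example`) can be made idempotent. The following is a minimal sketch, not part of the committed skill: the function name `ensure_gitignored` and its optional second argument are hypothetical, and it assumes GNU or BSD `grep` and `tail` are available.

```shell
#!/bin/bash
set -euo pipefail

# Append an entry to a gitignore file only if that exact line is not already there.
# The function name and the optional file argument are illustrative assumptions.
ensure_gitignored() {
  local entry="$1" file="${2:-.gitignore}"
  touch "$file"
  # -x matches whole lines only, -F treats the entry as a fixed string, -q stays quiet.
  if grep -qxF -- "$entry" "$file"; then
    return 0
  fi
  # Add a newline first if the file is non-empty and does not end with one.
  if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
    printf '\n' >> "$file"
  fi
  printf '%s\n' "$entry" >> "$file"
}
```

Calling `ensure_gitignored ".env"` twice leaves a single `.env` line. Because the match is on whole lines, an existing `.env.example` line does not count as a match for `.env`.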
+
+---
+
+### Step 2: Containerization
+
+**Role**: DevOps / Platform engineer
+**Goal**: Define Docker configuration for every component, local development, and blackbox test environments
+**Constraints**: Plan only — no Dockerfile creation. Describe what each Dockerfile should contain.
+
+1. Read architecture.md and all component specs
+2. Read restrictions.md for infrastructure constraints
+3. Research best Docker practices for the project's tech stack (multi-stage builds, base image selection, layer optimization)
+4. For each component, define:
+   - Base image (pinned version, prefer alpine/distroless for production)
+   - Build stages (dependency install, build, production)
+   - Non-root user configuration
+   - Health check endpoint and command
+   - Exposed ports
+   - `.dockerignore` contents
+5. Define `docker-compose.yml` for local development:
+   - All application components
+   - Database (Postgres) with named volume
+   - Any message queues, caches, or external service mocks
+   - Shared network
+   - Environment variable files (`.env`)
+6. Define `docker-compose.test.yml` for blackbox tests:
+   - Application components under test
+   - Test runner container (black-box, no internal imports)
+   - Isolated database with seed data
+   - All tests runnable via `docker compose -f docker-compose.test.yml up --abort-on-container-exit`
+7. Define image tagging strategy: `<registry>/<project>/<component>:<git-sha>` for CI, `latest` for local dev only
+
+**Self-verification**:
+- [ ] Every component has a Dockerfile specification
+- [ ] Multi-stage builds specified for all production images
+- [ ] Non-root user for all containers
+- [ ] Health checks defined for every service
+- [ ] docker-compose.yml covers all components + dependencies
+- [ ] docker-compose.test.yml enables black-box testing
+- [ ] `.dockerignore` defined
+
+**Save action**: Write `containerization.md` using `templates/containerization.md`
+
+**BLOCKING**: Present containerization plan to user. Do NOT proceed until confirmed.
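The tagging rule in item 7 of Step 2 (`<registry>/<project>/<component>:<git-sha>`) is simple enough to sketch as a helper. This is an illustration under assumptions: `image_ref` and `ci_tag` are hypothetical names, and `ci_tag` assumes it runs inside a git checkout.

```shell
#!/bin/bash
set -euo pipefail

# Build "<registry>/<project>/<component>:<tag>" from its four parts.
image_ref() {
  local registry="$1" project="$2" component="$3" tag="$4"
  printf '%s/%s/%s:%s\n' "$registry" "$project" "$component" "$tag"
}

# Tag used for CI builds: the abbreviated SHA of the checked-out commit.
ci_tag() {
  git rev-parse --short HEAD
}
```

In a pipeline this might be called as `image_ref "$REGISTRY_URL" myproject api "$(ci_tag)"`, where `myproject` and `api` are placeholders.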
+
+---
+
+### Step 3: CI/CD Pipeline
+
+**Role**: DevOps engineer
+**Goal**: Define the CI/CD pipeline with quality gates, security scanning, and multi-environment deployment
+**Constraints**: Pipeline definition only — produce YAML specification, not implementation
+
+1. Read architecture.md for tech stack and deployment targets
+2. Read restrictions.md for CI/CD constraints (cloud provider, registry, etc.)
+3. Research CI/CD best practices for the project's platform (GitHub Actions / Azure Pipelines)
+4. Define pipeline stages:
+
+| Stage | Trigger | Steps | Quality Gate |
+|-------|---------|-------|-------------|
+| **Lint** | Every push | Run linters per language (black, rustfmt, prettier, dotnet format) | Zero errors |
+| **Test** | Every push | Unit tests, blackbox tests, coverage report | 75%+ coverage (see `.cursor/rules/cursor-meta.mdc` Quality Thresholds) |
+| **Security** | Every push | Dependency audit, SAST scan (Semgrep/SonarQube), image scan (Trivy) | Zero critical/high CVEs |
+| **Build** | PR merge to dev | Build Docker images, tag with git SHA | Build succeeds |
+| **Push** | After build | Push to container registry | Push succeeds |
+| **Deploy Staging** | After push | Deploy to staging environment | Health checks pass |
+| **Smoke Tests** | After staging deploy | Run critical path tests against staging | All pass |
+| **Deploy Production** | Manual approval | Deploy to production | Health checks pass |
+
+5. Define caching strategy: dependency caches, Docker layer caches, build artifact caches
+6. Define parallelization: which stages can run concurrently
+7. Define notifications: build failures, deployment status, security alerts
+
+**Self-verification**:
+- [ ] All pipeline stages defined with triggers and gates
+- [ ] Coverage threshold enforced (75%+)
+- [ ] Security scanning included (dependencies + images + SAST)
+- [ ] Caching configured for dependencies and Docker layers
+- [ ] Multi-environment deployment (staging → production)
+- [ ] Rollback procedure referenced
+- [ ] Notifications configured
+
+**Save action**: Write `ci_cd_pipeline.md` using `templates/ci_cd_pipeline.md`
+
+---
+
+### Step 4: Environment Strategy
+
+**Role**: Platform engineer
+**Goal**: Define environment configuration, secrets management, and environment parity
+**Constraints**: Strategy document — no secrets or credentials in output
+
+1. Define environments:
+
+| Environment | Purpose | Infrastructure | Data |
+|-------------|---------|---------------|------|
+| **Development** | Local developer workflow | docker-compose, local volumes | Seed data, mocks for external APIs |
+| **Staging** | Pre-production validation | Mirrors production topology | Anonymized production-like data |
+| **Production** | Live system | Full infrastructure | Real data |
+
+2. Define environment variable management:
+   - Reference `.env.example` created in Step 1
+   - Per-environment variable sources (`.env` for dev, secret manager for staging/prod)
+   - Validation: fail fast on missing required variables at startup
+3. Define secrets management:
+   - Never commit secrets to version control
+   - Development: `.env` files (git-ignored)
+   - Staging/Production: secret manager (AWS Secrets Manager / Azure Key Vault / Vault)
+   - Rotation policy
+4. Define database management per environment:
+   - Development: Docker Postgres with named volume, seed data
+   - Staging: managed Postgres, migrations applied via CI/CD
+   - Production: managed Postgres, migrations require approval
+
+**Self-verification**:
+- [ ] All three environments defined with clear purpose
+- [ ] Environment variable documentation complete (references `.env.example` from Step 1)
+- [ ] No secrets in any output document
+- [ ] Secret manager specified for staging/production
+- [ ] Database strategy per environment
+
+**Save action**: Write `environment_strategy.md` using `templates/environment_strategy.md`
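The "fail fast on missing required variables" rule in Step 4 could be checked against `.env.example` directly. The sketch below is one possible approach, not the skill's prescribed method: `check_required_vars` is a hypothetical name, and it assumes the example file contains plain `NAME=value` lines plus optional `#` comments.

```shell
#!/bin/bash
set -euo pipefail

# Report every variable named in an env example file that is unset or empty
# in the current environment. Returns 1 if at least one is missing.
check_required_vars() {
  local example_file="$1" missing=0 name
  # The "|| [ -n ... ]" part keeps a final line that lacks a trailing newline.
  while IFS='=' read -r name _ || [ -n "$name" ]; do
    case "$name" in
      # Skip blank lines, comments, and anything that is not a plain identifier.
      ''|'#'*|*[!A-Za-z0-9_]*) continue ;;
    esac
    # ${!name} is bash indirect expansion: the value of the variable named by $name.
    if [ -z "${!name:-}" ]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done < "$example_file"
  return "$missing"
}
```

A script would typically call `check_required_vars .env.example` right after sourcing `.env`, so that it stops before any image is pulled or any service is restarted.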
+
+---
+
+### Step 5: Observability
+
+**Role**: Site Reliability Engineer (SRE)
+**Goal**: Define logging, metrics, tracing, and alerting strategy
+**Constraints**: Strategy document — describe what to implement, not how to wire it
+
+1. Read architecture.md and component specs for service boundaries
+2. Research observability best practices for the tech stack
+
+**Logging**:
+- Structured JSON to stdout/stderr (no file logging in containers)
+- Fields: `timestamp` (ISO 8601), `level`, `service`, `correlation_id`, `message`, `context`
+- Levels: ERROR (exceptions), WARN (degraded), INFO (business events), DEBUG (diagnostics, dev only)
+- No PII in logs
+- Retention: dev = console, staging = 7 days, production = 30 days
+
+**Metrics**:
+- Expose Prometheus-compatible `/metrics` endpoint per service
+- System metrics: CPU, memory, disk, network
+- Application metrics: `request_count`, `request_duration` (histogram), `error_count`, `active_connections`
+- Business metrics: derived from acceptance criteria
+- Collection interval: 15s
+
+**Distributed Tracing**:
+- OpenTelemetry SDK integration
+- Trace context propagation via HTTP headers and message queue metadata
+- Span naming: `<service>.<operation>`
+- Sampling: 100% in dev/staging, 10% in production (adjust based on volume)
+
+**Alerting**:
+
+| Severity | Response Time | Condition Examples |
+|----------|---------------|-------------------|
+| Critical | 5 min | Service down, data loss, health check failed |
+| High | 30 min | Error rate > 5%, P95 latency > 2x baseline |
+| Medium | 4 hours | Disk > 80%, elevated latency |
+| Low | Next business day | Non-critical warnings |
+
+**Dashboards**:
+- Operations: service health, request rate, error rate, response time percentiles, resource utilization
+- Business: key business metrics from acceptance criteria
+
+**Self-verification**:
+- [ ] Structured logging format defined with required fields
+- [ ] Metrics endpoint specified per service
+- [ ] OpenTelemetry tracing configured
+- [ ] Alert severities with response times defined
+- [ ] Dashboards cover operations and business metrics
+- [ ] PII exclusion from logs addressed
+
+**Save action**: Write `observability.md` using `templates/observability.md`
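The structured log format in Step 5 targets application services, but the deployment scripts from Step 7 could emit the same shape. The following is a rough sketch under assumptions: `json_escape` and `log_json` are hypothetical helpers, the optional `context` field is omitted, and only backslashes and double quotes are escaped (control characters such as newlines are not handled).

```shell
#!/bin/bash
set -euo pipefail

# Escape backslashes and double quotes so a value can sit inside a JSON string.
json_escape() {
  local s="$1"
  s="${s//\\/\\\\}"
  s="${s//\"/\\\"}"
  printf '%s' "$s"
}

# Emit one structured log line on stdout with an ISO 8601 UTC timestamp.
log_json() {
  local level="$1" service="$2" correlation_id="$3" message="$4"
  printf '{"timestamp":"%s","level":"%s","service":"%s","correlation_id":"%s","message":"%s"}\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    "$(json_escape "$level")" \
    "$(json_escape "$service")" \
    "$(json_escape "$correlation_id")" \
    "$(json_escape "$message")"
}
```

A call such as `log_json INFO deploy "$RUN_ID" "pulling images"` (with `RUN_ID` as an assumed correlation value) writes a single line that a log collector can parse.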
+
+---
+
+### Step 6: Deployment Procedures
+
+**Role**: DevOps / Platform engineer
+**Goal**: Define deployment strategy, rollback procedures, health checks, and deployment checklist
+**Constraints**: Procedures document — no implementation
+
+1. Define deployment strategy:
+   - Preferred pattern: blue-green / rolling / canary (choose based on architecture)
+   - Zero-downtime requirement for production
+   - Graceful shutdown: 30-second grace period for in-flight requests
+   - Database migration ordering: migrate before deploy, backward-compatible only
+
+2. Define health checks:
+
+| Check | Type | Endpoint | Interval | Threshold |
+|-------|------|----------|----------|-----------|
+| Liveness | HTTP GET | `/health/live` | 10s | 3 failures → restart |
+| Readiness | HTTP GET | `/health/ready` | 5s | 3 failures → remove from LB |
+| Startup | HTTP GET | `/health/ready` | 5s | 30 attempts max |
+
+3. Define rollback procedures:
+   - Trigger criteria: health check failures, error rate spike, critical alert
+   - Rollback steps: redeploy previous image tag, verify health, rollback database if needed
+   - Communication: notify stakeholders during rollback
+   - Post-mortem: required after every production rollback
+
+4. Define deployment checklist:
+   - [ ] All tests pass in CI
+   - [ ] Security scan clean (zero critical/high CVEs)
+   - [ ] Database migrations reviewed and tested
+   - [ ] Environment variables configured
+   - [ ] Health check endpoints responding
+   - [ ] Monitoring alerts configured
+   - [ ] Rollback plan documented and tested
+   - [ ] Stakeholders notified
+
+**Self-verification**:
+- [ ] Deployment strategy chosen and justified
+- [ ] Zero-downtime approach specified
+- [ ] Health checks defined (liveness, readiness, startup)
+- [ ] Rollback trigger criteria and steps documented
+- [ ] Deployment checklist complete
+
+**Save action**: Write `deployment_procedures.md` using `templates/deployment_procedures.md`
+
+**BLOCKING**: Present deployment procedures to user. Do NOT proceed until confirmed.
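The interval and threshold columns of the health check table in Step 6 amount to a bounded retry loop. The sketch below shows one way to express it; `wait_until_healthy` and `probe_http` are hypothetical names, and the probe assumes `curl` is installed on the machine running the check.

```shell
#!/bin/bash
set -euo pipefail

# Run a probe command up to <attempts> times, sleeping <interval> seconds
# after each failed attempt. Succeeds as soon as the probe succeeds.
wait_until_healthy() {
  local attempts="$1" interval="$2" i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    sleep "$interval"
  done
  return 1
}

# HTTP probe: any non-2xx/3xx status or connection error counts as a failure.
probe_http() {
  curl --fail --silent --show-error --max-time 5 --output /dev/null "$1"
}
```

For the startup check (5s interval, 30 attempts max) this might be used as `wait_until_healthy 30 5 probe_http "http://localhost:8080/health/ready"`, where port 8080 is a placeholder.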
+
+---
+
+### Step 7: Deployment Scripts
+
+**Role**: DevOps / Platform engineer
+**Goal**: Create executable deployment scripts for pulling Docker images and running services on the remote target machine
+**Constraints**: Produce real, executable shell scripts. This is the ONLY step that creates implementation artifacts.
+
+1. Read containerization.md and deployment_procedures.md from previous steps
+2. Read `.env.example` for required variables
+3. Create the following scripts in `SCRIPTS_DIR/`:
+
+**`deploy.sh`** — Main deployment orchestrator:
+   - Validates that required environment variables are set (sources `.env` if present)
+   - Calls `pull-images.sh`, then `stop-services.sh`, then `start-services.sh`, then `health-check.sh`
+   - Exits with non-zero code on any failure
+   - Supports `--rollback` flag to redeploy previous image tags
+
+**`pull-images.sh`** — Pull Docker images to target machine:
+   - Reads image list and tags from environment or config
+   - Authenticates with container registry
+   - Pulls all required images
+   - Verifies image integrity (digest check)
+
+**`start-services.sh`** — Start services on target machine:
+   - Runs `docker compose up -d` or individual `docker run` commands
+   - Applies environment variables from `.env`
+   - Configures networks and volumes
+   - Waits for containers to reach healthy state
+
+**`stop-services.sh`** — Graceful shutdown:
+   - Stops services with graceful shutdown period
+   - Saves current image tags for rollback reference
+   - Cleans up orphaned containers/networks
+
+**`health-check.sh`** — Verify deployment health:
+   - Checks all health endpoints
+   - Reports status per service
+   - Returns non-zero if any service is unhealthy
+
+4. All scripts must:
+   - Target Bash (`#!/bin/bash` with `set -euo pipefail`); `pipefail` is a Bash option, so the scripts are not strictly POSIX `sh`
+   - Source `.env` from project root or accept env vars from the environment
+   - Include usage/help output (`--help` flag)
+   - Be idempotent where possible
+   - Handle SSH connection to remote target (configurable via `DEPLOY_HOST` env var)
+
+5. Document all scripts in `deploy_scripts.md`
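The `pull-images.sh` responsibilities listed above (authenticate, pull, digest check) might look roughly like the functions below. This is a sketch under assumptions: the function names are hypothetical, `REGISTRY_URL`, `REGISTRY_USER`, and `REGISTRY_PASS` are the variables from the status report template, and where the expected digest comes from is left open because the skill does not specify it.

```shell
#!/bin/bash
set -euo pipefail

# Log in by piping the password on stdin so it never appears in the process list.
registry_login() {
  printf '%s' "$REGISTRY_PASS" |
    docker login --username "$REGISTRY_USER" --password-stdin "$REGISTRY_URL"
}

# Pull one image, then print the first repo digest recorded for it locally.
pull_and_report() {
  local image="$1"
  docker pull "$image" > /dev/null
  docker image inspect --format '{{index .RepoDigests 0}}' "$image"
}

# Succeed only if the pulled image's digest equals the expected "sha256:..." value.
verify_digest() {
  local image="$1" expected="$2" actual
  actual="$(pull_and_report "$image")"
  # RepoDigests entries look like "<repository>@sha256:<hex>"; keep the part after "@".
  [ "${actual##*@}" = "$expected" ]
}
```

Pinning by digest in the image list (instead of comparing after the pull) is an alternative that achieves the same guarantee with less code.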
+
+**Self-verification**:
+- [ ] All five scripts created and executable
+- [ ] Scripts source environment variables correctly
+- [ ] `deploy.sh` orchestrates the full flow
+- [ ] `pull-images.sh` handles registry auth and image pull
+- [ ] `start-services.sh` starts containers with correct config
+- [ ] `stop-services.sh` handles graceful shutdown
+- [ ] `health-check.sh` validates all endpoints
+- [ ] Rollback supported via `deploy.sh --rollback`
+- [ ] Scripts work for remote deployment via SSH (DEPLOY_HOST)
+- [ ] `deploy_scripts.md` documents all scripts
+
+**Save action**: Write scripts to `SCRIPTS_DIR/`, write `deploy_scripts.md` using `templates/deploy_scripts.md`
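The `deploy.sh` flow described in Step 7 can be sketched as two functions. This is a hedged illustration, not the generated script: `run_stages` and `deploy_main` are hypothetical names, the stage scripts are invoked through `bash` for simplicity, and the rollback branch assumes `previous_tags.env` lives next to the scripts and contains a line of the form `IMAGE_TAG=<previous tag>`.

```shell
#!/bin/bash
set -euo pipefail

usage() {
  echo "Usage: deploy.sh [--rollback] [--help]"
}

# Run the four stage scripts in the documented order; stop at the first failure.
run_stages() {
  local scripts_dir="$1" stage
  for stage in pull-images.sh stop-services.sh start-services.sh health-check.sh; do
    echo "==> $stage"
    if ! bash "$scripts_dir/$stage"; then
      echo "deploy failed at $stage" >&2
      return 1
    fi
  done
  echo "deploy succeeded"
}

# Parse the single optional flag, load the saved tag on rollback, then run the stages.
deploy_main() {
  local scripts_dir="$1" flag="${2:-}"
  case "$flag" in
    --help) usage; return 0 ;;
    --rollback)
      # Assumption: the file holds IMAGE_TAG=<tag> as written by stop-services.sh.
      . "$scripts_dir/previous_tags.env"
      export IMAGE_TAG
      ;;
    '') ;;
    *) usage >&2; return 2 ;;
  esac
  run_stages "$scripts_dir"
}
```

A real `deploy.sh` would end with something like `deploy_main "$(dirname "$0")" "$@"`. Loading the saved tag before the stages run matters because `stop-services.sh` overwrites the saved tags during the same run.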
+
+---
+
+## Escalation Rules
+
+| Situation | Action |
+|-----------|--------|
+| Unknown cloud provider or hosting | **ASK user** |
+| Container registry not specified | **ASK user** |
+| CI/CD platform preference unclear | **ASK user** — default to GitHub Actions |
+| Secret manager not chosen | **ASK user** |
+| Deployment pattern trade-offs | **ASK user** with recommendation |
+| Missing architecture.md | **STOP** — run `/plan` first |
+| Remote target machine details unknown | **ASK user** for SSH access, OS, and specs |
+
+## Common Mistakes
+
+- **Implementing during planning**: Steps 1–6 produce documents, not code (Step 7 is the exception — it creates scripts)
+- **Hardcoding secrets**: never include real credentials in deployment documents or scripts
+- **Ignoring blackbox test containerization**: the test environment must be containerized alongside the app
+- **Skipping BLOCKING gates**: never proceed past a BLOCKING marker without user confirmation
+- **Using `:latest` tags**: always pin base image versions and tag CI-built images with the git SHA; `latest` is acceptable only for locally built dev images
+- **Forgetting observability**: logging, metrics, and tracing are deployment concerns, not post-deployment additions
+- **Committing `.env`**: only `.env.example` goes to version control; `.env` must be in `.gitignore`
+- **Non-portable scripts**: deployment scripts must work across environments; avoid hardcoded paths
+
+## Methodology Quick Reference
+
+```
+┌────────────────────────────────────────────────────────────────┐
+│            Deployment Planning (7-Step Method)                  │
+├────────────────────────────────────────────────────────────────┤
+│ PREREQ: architecture.md + component specs exist                │
+│                                                                │
+│ 1. Status & Env     → reports/deploy_status_report.md          │
+│                       + .env + .env.example                    │
+│    [BLOCKING: user confirms status & env vars]                 │
+│ 2. Containerization  → containerization.md                     │
+│    [BLOCKING: user confirms Docker plan]                       │
+│ 3. CI/CD Pipeline    → ci_cd_pipeline.md                       │
+│ 4. Environment       → environment_strategy.md                 │
+│ 5. Observability     → observability.md                        │
+│ 6. Procedures        → deployment_procedures.md                │
+│    [BLOCKING: user confirms deployment plan]                   │
+│ 7. Scripts           → deploy_scripts.md + scripts/            │
+├────────────────────────────────────────────────────────────────┤
+│ Principles: Docker-first · IaC · Observability built-in        │
+│             Environment parity · Save immediately              │
+└────────────────────────────────────────────────────────────────┘
+```
@@ -0,0 +1,87 @@
+# CI/CD Pipeline Template
+
+Save as `_docs/04_deploy/ci_cd_pipeline.md`.
+
+---
+
+```markdown
+# [System Name] — CI/CD Pipeline
+
+## Pipeline Overview
+
+| Stage | Trigger | Quality Gate |
+|-------|---------|-------------|
+| Lint | Every push | Zero lint errors |
+| Test | Every push | 75%+ coverage, all tests pass |
+| Security | Every push | Zero critical/high CVEs |
+| Build | PR merge to dev | Docker build succeeds |
+| Push | After build | Images pushed to registry |
+| Deploy Staging | After push | Health checks pass |
+| Smoke Tests | After staging deploy | Critical paths pass |
+| Deploy Production | Manual approval | Health checks pass |
+
+## Stage Details
+
+### Lint
+- [Language-specific linters and formatters]
+- Runs in parallel per language
+
+### Test
+- Unit tests: [framework and command]
+- Blackbox tests: [framework and command, uses docker-compose.test.yml]
+- Coverage threshold: 75% overall, 90% critical paths
+- Coverage report published as pipeline artifact
+
+### Security
+- Dependency audit: [tool, e.g., npm audit / pip-audit / dotnet list package --vulnerable]
+- SAST scan: [tool, e.g., Semgrep / SonarQube]
+- Image scan: Trivy on built Docker images
+- Block on: critical or high severity findings
+
+### Build
+- Docker images built using multi-stage Dockerfiles
+- Tagged with git SHA: `<registry>/<project>/<component>:<git-sha>`
+- Build cache: Docker layer cache via CI cache action
+
+### Push
+- Registry: [container registry URL]
+- Authentication: [method]
+
+### Deploy Staging
+- Deployment method: [docker compose / Kubernetes / cloud service]
+- Pre-deploy: run database migrations
+- Post-deploy: verify health check endpoints
+- Automated rollback on health check failure
+
+### Smoke Tests
+- Subset of blackbox tests targeting staging environment
+- Validates critical user flows
+- Timeout: [maximum duration]
+
+### Deploy Production
+- Requires manual approval via [mechanism]
+- Deployment strategy: [blue-green / rolling / canary]
+- Pre-deploy: database migration review
+- Post-deploy: health checks + monitoring for 15 min
+
+## Caching Strategy
+
+| Cache | Key | Restore Keys |
+|-------|-----|-------------|
+| Dependencies | [lockfile hash] | [partial match] |
+| Docker layers | [Dockerfile hash] | [partial match] |
+| Build artifacts | [source hash] | [partial match] |
+
+## Parallelization
+
+[Diagram or description of which stages run concurrently]
+
+## Notifications
+
+| Event | Channel | Recipients |
+|-------|---------|-----------|
+| Build failure | [Slack/email] | [team] |
+| Security alert | [Slack/email] | [team + security] |
+| Deploy success | [Slack] | [team] |
+| Deploy failure | [Slack/email + PagerDuty] | [on-call] |
+```
@@ -0,0 +1,94 @@
+# Containerization Plan Template
+
+Save as `_docs/04_deploy/containerization.md`.
+
+---
+
+````markdown
+# [System Name] — Containerization
+
+## Component Dockerfiles
+
+### [Component Name]
+
+| Property | Value |
+|----------|-------|
+| Base image | [e.g., mcr.microsoft.com/dotnet/aspnet:8.0-alpine] |
+| Build image | [e.g., mcr.microsoft.com/dotnet/sdk:8.0-alpine] |
+| Stages | [dependency install → build → production] |
+| User | [non-root user name] |
+| Health check | [endpoint and command] |
+| Exposed ports | [port list] |
+| Key build args | [if any] |
+
+### [Repeat for each component]
+
+## Docker Compose — Local Development
+
+```yaml
+# docker-compose.yml structure
+services:
+  [component]:
+    build: ./[path]
+    ports: ["host:container"]
+    environment: [reference .env]
+    depends_on: [dependencies with health condition]
+    healthcheck: [command, interval, timeout, retries]
+
+  db:
+    image: [postgres:version-alpine]
+    volumes: [named volume]
+    environment: [credentials from .env]
+    healthcheck: [pg_isready]
+
+volumes:
+  [named volumes]
+
+networks:
+  [shared network]
+```
+
+## Docker Compose — Blackbox Tests
+
+```yaml
+# docker-compose.test.yml structure
+services:
+  [app components under test]
+
+  test-runner:
+    build: ./tests/integration
+    depends_on: [app components with health condition]
+    environment: [test configuration]
+    # Exit code determines test pass/fail
+
+  db:
+    image: [postgres:version-alpine]
+    volumes: [seed data mount]
+```
+
+Run: `docker compose -f docker-compose.test.yml up --abort-on-container-exit`
+
+## Image Tagging Strategy
+
+| Context | Tag Format | Example |
+|---------|-----------|---------|
+| CI build | `<registry>/<project>/<component>:<git-sha>` | `ghcr.io/org/api:a1b2c3d` |
+| Release | `<registry>/<project>/<component>:<semver>` | `ghcr.io/org/api:1.2.0` |
+| Local dev | `<component>:latest` | `api:latest` |
+
+## .dockerignore
+
+```
+.git
+.cursor
+_docs
+_standalone
+node_modules
+**/bin
+**/obj
+**/__pycache__
+*.md
+.env*
+docker-compose*.yml
+```
+````
@@ -0,0 +1,114 @@
+# Deployment Scripts Documentation Template
+
+Save as `_docs/04_deploy/deploy_scripts.md`.
+
+---
+
+```markdown
+# [System Name] — Deployment Scripts
+
+## Overview
+
+| Script | Purpose | Location |
+|--------|---------|----------|
+| `deploy.sh` | Main deployment orchestrator | `scripts/deploy.sh` |
+| `pull-images.sh` | Pull Docker images from registry | `scripts/pull-images.sh` |
+| `start-services.sh` | Start all services | `scripts/start-services.sh` |
+| `stop-services.sh` | Graceful shutdown | `scripts/stop-services.sh` |
+| `health-check.sh` | Verify deployment health | `scripts/health-check.sh` |
+
+## Prerequisites
+
+- Docker and Docker Compose installed on target machine
+- SSH access to target machine (configured via `DEPLOY_HOST`)
+- Container registry credentials configured
+- `.env` file with required environment variables (see `.env.example`)
+
+## Environment Variables
+
+All scripts source `.env` from the project root or accept variables from the environment.
+
+| Variable | Required By | Purpose |
+|----------|------------|---------|
+| `DEPLOY_HOST` | All (remote mode) | SSH target for remote deployment |
+| `REGISTRY_URL` | `pull-images.sh` | Container registry URL |
+| `REGISTRY_USER` | `pull-images.sh` | Registry authentication |
+| `REGISTRY_PASS` | `pull-images.sh` | Registry authentication |
+| `IMAGE_TAG` | `pull-images.sh`, `start-services.sh` | Image version to deploy (default: latest git SHA) |
+| [add project-specific variables] | | |
+
+## Script Details
+
+### deploy.sh
+
+Main orchestrator that runs the full deployment flow.
+
+**Usage**:
+- `./scripts/deploy.sh` — Deploy latest version
+- `./scripts/deploy.sh --rollback` — Rollback to previous version
+- `./scripts/deploy.sh --help` — Show usage
+
+**Flow**:
+1. Validate required environment variables
+2. Call `pull-images.sh`
+3. Call `stop-services.sh`
+4. Call `start-services.sh`
+5. Call `health-check.sh`
+6. Report success or failure
+
+**Rollback**: When `--rollback` is passed, reads the previous image tags saved by `stop-services.sh` and redeploys those versions.
+
+### pull-images.sh
+
+**Usage**: `./scripts/pull-images.sh [--help]`
+
+**Steps**:
+1. Authenticate with container registry (`REGISTRY_URL`)
+2. Pull all required images with specified `IMAGE_TAG`
+3. Verify image integrity via digest check
+4. Report pull results per image
+
+### start-services.sh
+
+**Usage**: `./scripts/start-services.sh [--help]`
+
+**Steps**:
+1. Run `docker compose up -d` with the correct env file
+2. Configure networks and volumes
+3. Wait for all containers to report healthy state
+4. Report startup status per service
+
+### stop-services.sh
+
+**Usage**: `./scripts/stop-services.sh [--help]`
+
+**Steps**:
+1. Save current image tags to `previous_tags.env` (for rollback)
+2. Stop services with graceful shutdown period (30s)
+3. Clean up orphaned containers and networks
+
+### health-check.sh
+
+**Usage**: `./scripts/health-check.sh [--help]`
+
+**Checks**:
+
+| Service | Endpoint | Expected |
+|---------|----------|----------|
+| [Component 1] | `http://localhost:[port]/health/live` | HTTP 200 |
+| [Component 2] | `http://localhost:[port]/health/ready` | HTTP 200 |
+| [add all services] | | |
+
+**Exit codes**:
+- `0` — All services healthy
+- `1` — One or more services unhealthy
+
+## Common Script Properties
+
+All scripts:
+- Use `#!/bin/bash` with `set -euo pipefail`
+- Support `--help` flag for usage information
+- Source `.env` from project root if present
+- Are idempotent where possible
+- Support remote execution via SSH when `DEPLOY_HOST` is set
+```
@@ -0,0 +1,73 @@
+# Deployment Status Report Template
+
+Save as `_docs/04_deploy/reports/deploy_status_report.md`.
+
+---
+
+```markdown
+# [System Name] — Deployment Status Report
+
+## Deployment Readiness Summary
+
+| Aspect | Status | Notes |
+|--------|--------|-------|
+| Architecture defined | ✅ / ❌ | |
+| Component specs complete | ✅ / ❌ | |
+| Infrastructure prerequisites met | ✅ / ❌ | |
+| External dependencies identified | ✅ / ❌ | |
+| Blockers | [count] | [summary] |
+
+## Component Status
+
+| Component | State | Docker-ready | Notes |
+|-----------|-------|-------------|-------|
+| [Component 1] | planned / implemented / tested | yes / no | |
+| [Component 2] | planned / implemented / tested | yes / no | |
+
+## External Dependencies
+
+| Dependency | Type | Required For | Status |
+|------------|------|-------------|--------|
+| [e.g., PostgreSQL] | Database | Data persistence | [available / needs setup] |
+| [e.g., Redis] | Cache | Session management | [available / needs setup] |
+| [e.g., External API] | API | [purpose] | [available / needs setup] |
+
+## Infrastructure Prerequisites
+
+| Prerequisite | Status | Action Needed |
+|-------------|--------|--------------|
+| Container registry | [ready / not set up] | [action] |
+| Cloud account | [ready / not set up] | [action] |
+| DNS configuration | [ready / not set up] | [action] |
+| SSL certificates | [ready / not set up] | [action] |
+| CI/CD platform | [ready / not set up] | [action] |
+| Secret manager | [ready / not set up] | [action] |
+
+## Deployment Blockers
+
+| Blocker | Severity | Resolution |
+|---------|----------|-----------|
+| [blocker description] | critical / high / medium | [resolution steps] |
+
+## Required Environment Variables
+
+| Variable | Purpose | Required In | Default (Dev) | Source (Staging/Prod) |
+|----------|---------|------------|---------------|----------------------|
+| `DATABASE_URL` | Postgres connection string | All components | `postgres://dev:dev@db:5432/app` | Secret manager |
+| `DEPLOY_HOST` | Remote target machine | Deployment scripts | `localhost` | Environment |
+| `REGISTRY_URL` | Container registry URL | CI/CD, deploy scripts | `localhost:5000` | Environment |
+| `REGISTRY_USER` | Registry username | CI/CD, deploy scripts | — | Secret manager |
+| `REGISTRY_PASS` | Registry password | CI/CD, deploy scripts | — | Secret manager |
+| [add all required variables] | | | | |
+
+## .env Files Created
+
+- `.env.example` — committed to VCS, contains all variable names with placeholder values
+- `.env` — git-ignored, contains development defaults
+
+## Next Steps
+
+1. [Resolve any blockers listed above]
+2. [Set up missing infrastructure prerequisites]
+3. [Proceed to containerization planning]
+```
@@ -0,0 +1,103 @@
+# Deployment Procedures Template
+
+Save as `_docs/04_deploy/deployment_procedures.md`.
+
+---
+
+```markdown
+# [System Name] — Deployment Procedures
+
+## Deployment Strategy
+
+**Pattern**: [blue-green / rolling / canary]
+**Rationale**: [why this pattern fits the architecture]
+**Zero-downtime**: required for production deployments
+
+### Graceful Shutdown
+
+- Grace period: 30 seconds for in-flight requests
+- Sequence: stop accepting new requests → drain connections → shutdown
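+- Container orchestrator: `terminationGracePeriodSeconds: 40` (Kubernetes setting; the 30-second grace period plus a 10-second buffer before the container is force-killed)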
+
+### Database Migration Ordering
+
+- Migrations run **before** new code deploys
+- All migrations must be backward-compatible (old code works with new schema)
+- Irreversible migrations require explicit approval
+
+## Health Checks
+
+| Check | Type | Endpoint | Interval | Failure Threshold | Action |
+|-------|------|----------|----------|-------------------|--------|
+| Liveness | HTTP GET | `/health/live` | 10s | 3 failures | Restart container |
+| Readiness | HTTP GET | `/health/ready` | 5s | 3 failures | Remove from load balancer |
+| Startup | HTTP GET | `/health/ready` | 5s | 30 attempts (150s startup budget) | Kill and recreate |
+
+### Health Check Responses
+
+- `/health/live`: returns 200 if process is running (no dependency checks)
+- `/health/ready`: returns 200 if all dependencies (DB, cache, queues) are reachable
+
+## Staging Deployment
+
+1. CI/CD builds and pushes Docker images tagged with git SHA
+2. Run database migrations against staging
+3. Deploy new images to staging environment
+4. Wait for health checks to pass (readiness probe)
+5. Run smoke tests against staging
+6. If smoke tests fail: automatic rollback to previous image
+
+## Production Deployment
+
+1. **Approval**: manual approval required via [mechanism]
+2. **Pre-deploy checks**:
+   - [ ] Staging smoke tests passed
+   - [ ] Security scan clean
+   - [ ] Database migration reviewed
+   - [ ] Monitoring alerts configured
+   - [ ] Rollback plan confirmed
+3. **Deploy**: apply deployment strategy (blue-green / rolling / canary)
+4. **Verify**: health checks pass, error rate stable, latency within baseline
+5. **Monitor**: observe dashboards for 15 minutes post-deploy
+6. **Finalize**: mark deployment as successful or trigger rollback
+
+## Rollback Procedures
+
+### Trigger Criteria
+
+- Health check failures persist after deploy
+- Error rate exceeds 5% for more than 5 minutes
+- Critical alert fires within 15 minutes of deploy
+- Manual decision by on-call engineer
+
+### Rollback Steps
+
+1. Redeploy previous Docker image tag (from CI/CD artifact)
+2. Verify health checks pass
+3. If database migration was applied:
+   - Run DOWN migration if reversible
+   - If irreversible: assess data impact, escalate if needed
+4. Notify stakeholders
+5. Schedule post-mortem within 24 hours
+
+### Post-Mortem
+
+Required after every production rollback:
+- Timeline of events
+- Root cause: what went wrong and why
+- Prevention measures
+
+## Deployment Checklist
+
+- [ ] All tests pass in CI
+- [ ] Security scan clean (zero critical/high CVEs)
+- [ ] Docker images built and pushed
+- [ ] Database migrations reviewed and tested
+- [ ] Environment variables configured for target environment
+- [ ] Health check endpoints verified
+- [ ] Monitoring alerts configured
+- [ ] Rollback plan documented and tested
+- [ ] Stakeholders notified of deployment window
+- [ ] On-call engineer available during deployment
+```
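The rollback trigger "error rate exceeds 5% for more than 5 minutes" is a sustained-threshold check. A minimal sketch of that rule follows, assuming error-rate samples arrive as time-ordered `(unix_timestamp, error_rate)` pairs. The function name and sample format are hypothetical, and a real deployment would normally express this as an alert rule in the monitoring system.

```python
from typing import Iterable, Optional, Tuple

ERROR_RATE_THRESHOLD = 0.05  # 5%, from the trigger criteria
SUSTAINED_SECONDS = 5 * 60   # 5 minutes, from the trigger criteria


def should_roll_back(samples: Iterable[Tuple[float, float]],
                     threshold: float = ERROR_RATE_THRESHOLD,
                     sustained: float = SUSTAINED_SECONDS) -> bool:
    """Return True if the error rate stayed above `threshold` for longer
    than `sustained` seconds without dropping back in between."""
    breach_started: Optional[float] = None
    for timestamp, error_rate in samples:
        if error_rate > threshold:
            if breach_started is None:
                breach_started = timestamp  # first sample of a new streak
            if timestamp - breach_started > sustained:
                return True
        else:
            breach_started = None  # the streak is broken; start over
    return False
```

A single healthy sample resets the streak, so brief spikes do not trigger a rollback on their own.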
@@ -0,0 +1,61 @@
+# Environment Strategy Template
+
+Save as `_docs/04_deploy/environment_strategy.md`.
+
+---
+
+```markdown
+# [System Name] — Environment Strategy
+
+## Environments
+
+| Environment | Purpose | Infrastructure | Data Source |
+|-------------|---------|---------------|-------------|
+| Development | Local developer workflow | Docker Compose | Seed data, mocked externals |
+| Staging | Pre-production validation | [mirrors production] | Anonymized production-like data |
+| Production | Live system | [full infrastructure] | Real data |
+
+## Environment Variables
+
+### Required Variables
+
+| Variable | Purpose | Dev Default | Staging/Prod Source |
+|----------|---------|-------------|-------------------|
+| `DATABASE_URL` | Postgres connection | `postgres://dev:dev@db:5432/app` | Secret manager |
+| [add all required variables] | | | |
+
+### `.env.example`
+
+~~~env
+# Copy to .env and fill in values
+DATABASE_URL=postgres://user:pass@host:5432/dbname
+# [all required variables with placeholder values]
+~~~
+
+### Variable Validation
+
+All services validate required environment variables at startup and fail fast with a clear error message if any are missing.
+
+## Secrets Management
+
+| Environment | Method | Tool |
+|-------------|--------|------|
+| Development | `.env` file (git-ignored) | dotenv |
+| Staging | Secret manager | [AWS Secrets Manager / Azure Key Vault / Vault] |
+| Production | Secret manager | [AWS Secrets Manager / Azure Key Vault / Vault] |
+
+Rotation policy: [frequency and procedure]
+
+## Database Management
+
+| Environment | Type | Migrations | Data |
+|-------------|------|-----------|------|
+| Development | Docker Postgres, named volume | Applied on container start | Seed data via init script |
+| Staging | Managed Postgres | Applied via CI/CD pipeline | Anonymized production snapshot |
+| Production | Managed Postgres | Applied via CI/CD with approval | Live data |
+
+Migration rules:
+- All migrations must be backward-compatible (support old and new code simultaneously)
+- Migrations should be reversible (DOWN/rollback script); irreversible migrations require explicit approval
+- Production migrations require review before apply
+```
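The "Variable Validation" rule in this template (validate required variables at startup and fail fast with a clear message) could look roughly like the sketch below. `REQUIRED_VARIABLES`, `MissingEnvironmentError`, and `validate_environment` are hypothetical names. The real variable list comes from the filled-in "Required Variables" table.

```python
import os
from typing import Iterable, Mapping, Optional

# Hypothetical list; extend it with every row of the "Required Variables" table.
REQUIRED_VARIABLES = ("DATABASE_URL",)


class MissingEnvironmentError(RuntimeError):
    """Raised at startup when required configuration is absent."""


def validate_environment(required: Iterable[str] = REQUIRED_VARIABLES,
                         environ: Optional[Mapping[str, str]] = None) -> None:
    """Fail fast, naming every missing or blank variable in one message."""
    env = os.environ if environ is None else environ
    missing = sorted(name for name in required if not env.get(name, "").strip())
    if missing:
        raise MissingEnvironmentError(
            "Missing required environment variables: " + ", ".join(missing)
        )
```

Calling `validate_environment()` as the first statement of a service's entry point reports all missing variables at once, before any connection is attempted.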
@@ -0,0 +1,132 @@
+# Observability Template
+
+Save as `_docs/04_deploy/observability.md`.
+
+---
+
+```markdown
+# [System Name] — Observability
+
+## Logging
+
+### Format
+
+Structured JSON to stdout/stderr. No file-based logging in containers.
+
+~~~json
+{
+  "timestamp": "ISO8601",
+  "level": "INFO",
+  "service": "service-name",
+  "correlation_id": "uuid",
+  "message": "Event description",
+  "context": {}
+}
+~~~
+
+### Log Levels
+
+| Level | Usage | Example |
+|-------|-------|---------|
+| ERROR | Exceptions, failures requiring attention | Database connection failed |
+| WARN | Potential issues, degraded performance | Retry attempt 2/3 |
+| INFO | Significant business events | User registered, Order placed |
+| DEBUG | Detailed diagnostics (dev/staging only) | Request payload, Query params |
+
+### Retention
+
+| Environment | Destination | Retention |
+|-------------|-------------|-----------|
+| Development | Console | Session |
+| Staging | [log aggregator] | 7 days |
+| Production | [log aggregator] | 30 days |
+
+### PII Rules
+
+- Never log passwords, tokens, or session IDs
+- Mask email addresses and personal identifiers
+- Log user IDs (opaque) instead of usernames
+
+## Metrics
+
+### Endpoints
+
+Every service exposes Prometheus-compatible metrics at `/metrics`.
+
+### Application Metrics
+
+| Metric | Type | Description |
+|--------|------|-------------|
+| `request_count` | Counter | Total HTTP requests by method, path, status |
+| `request_duration_seconds` | Histogram | Response time by method, path |
+| `error_count` | Counter | Failed requests by type |
+| `active_connections` | Gauge | Current open connections |
+
+### System Metrics
+
+- CPU usage, memory usage, disk I/O, network I/O
+
+### Business Metrics
+
+| Metric | Type | Description | Source |
+|--------|------|-------------|--------|
+| [from acceptance criteria] | | | |
+
+Collection interval: 15 seconds
+
+## Distributed Tracing
+
+### Configuration
+
+- SDK: OpenTelemetry
+- Propagation: W3C Trace Context via HTTP headers
+- Span naming: `<service>.<operation>`
+
+### Sampling
+
+| Environment | Rate | Rationale |
+|-------------|------|-----------|
+| Development | 100% | Full visibility |
+| Staging | 100% | Full visibility |
+| Production | 10% | Balance cost vs observability |
+
+### Integration Points
+
+- HTTP requests: automatic instrumentation
+- Database queries: automatic instrumentation
+- Message queues: manual span creation on publish/consume
+
+## Alerting
+
+| Severity | Response Time | Conditions |
+|----------|---------------|-----------|
+| Critical | 5 min | Service unreachable, health check failed for 1 min, data loss detected |
+| High | 30 min | Error rate > 5% for 5 min, P95 latency > 2x baseline for 10 min |
+| Medium | 4 hours | Disk usage > 80%, latency above baseline but below the High threshold, connection pool exhaustion |
+| Low | Next business day | Non-critical warnings, deprecated API usage |
+
+### Notification Channels
+
+| Severity | Channel |
+|----------|---------|
+| Critical | [PagerDuty / phone] |
+| High | [Slack + email] |
+| Medium | [Slack] |
+| Low | [Dashboard only] |
+
+## Dashboards
+
+### Operations Dashboard
+
+- Service health status (up/down per component)
+- Request rate and error rate
+- Response time percentiles (P50, P95, P99)
+- Resource utilization (CPU, memory per container)
+- Active alerts
+
+### Business Dashboard
+
+- [Key business metrics from acceptance criteria]
+- [User activity indicators]
+- [Transaction volumes]
+```
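The structured log format in this template maps onto Python's standard `logging` module with a custom formatter. The sketch below is one possible shape, not a prescribed implementation. `JsonFormatter` and `build_logger` are hypothetical names, and it assumes `correlation_id` and `context` are passed per call through `extra=`.

```python
import json
import logging
from datetime import datetime, timezone

# The template uses WARN, while the logging module names the level WARNING.
LEVEL_NAMES = {"WARNING": "WARN"}


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object with the template's fields."""

    def __init__(self, service: str) -> None:
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        created = datetime.fromtimestamp(record.created, tz=timezone.utc)
        payload = {
            "timestamp": created.isoformat(),
            "level": LEVEL_NAMES.get(record.levelname, record.levelname),
            "service": self.service,
            # Both fields below are optional and supplied via `extra=`.
            "correlation_id": getattr(record, "correlation_id", None),
            "message": record.getMessage(),
            "context": getattr(record, "context", {}),
        }
        return json.dumps(payload)


def build_logger(service: str, stream=None) -> logging.Logger:
    """Return a logger that writes JSON lines to `stream` (stderr by default)."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter(service))
    logger = logging.getLogger(service)
    logger.handlers = [handler]  # replace any handlers from earlier calls
    logger.setLevel(logging.INFO)
    logger.propagate = False     # avoid duplicate output via the root logger
    return logger
```

Usage would be along the lines of `build_logger("orders").info("Order placed", extra={"correlation_id": cid, "context": {"order_id": 7}})`. Writing to a stream rather than a file matches the template's "no file-based logging in containers" rule.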