add python scaffold folder and autodevelopment system

This commit is contained in:
Oleksandr Bezdieniezhnykh
2026-03-25 21:57:26 +02:00
parent 1009af4a32
commit d8f91ef6a9
99 changed files with 11070 additions and 1 deletions
+491
View File
@@ -0,0 +1,491 @@
---
name: deploy
description: |
Comprehensive deployment skill covering status check, env setup, containerization, CI/CD pipeline, environment strategy, observability, deployment procedures, and deployment scripts.
7-step workflow: Status & env check, Docker containerization, CI/CD pipeline definition, environment strategy, observability planning, deployment procedures, deployment scripts.
Uses _docs/04_deploy/ structure.
Trigger phrases:
- "deploy", "deployment", "deployment strategy"
- "CI/CD", "pipeline", "containerize"
- "observability", "monitoring", "logging"
- "dockerize", "docker compose"
category: ship
tags: [deployment, docker, ci-cd, observability, monitoring, containerization, scripts]
disable-model-invocation: true
---
# Deployment Planning
Plan and document the full deployment lifecycle: check deployment status and environment requirements, containerize the application, define CI/CD pipelines, configure environments, set up observability, document deployment procedures, and generate deployment scripts.
## Core Principles
- **Docker-first**: every component runs in a container; local dev, blackbox tests, and production all use Docker
- **Infrastructure as code**: all deployment configuration is version-controlled
- **Observability built-in**: logging, metrics, and tracing are part of the deployment plan, not afterthoughts
- **Environment parity**: dev, staging, and production environments mirror each other as closely as possible
- **Save immediately**: write artifacts to disk after each step; never accumulate unsaved work
- **Ask, don't assume**: when infrastructure constraints or preferences are unclear, ask the user
- **Plan, don't code**: this workflow produces deployment documents and specifications, not implementation code (except deployment scripts in Step 7)
## Context Resolution
Fixed paths:
- DOCUMENT_DIR: `_docs/02_document/`
- DEPLOY_DIR: `_docs/04_deploy/`
- REPORTS_DIR: `_docs/04_deploy/reports/`
- SCRIPTS_DIR: `scripts/`
- ARCHITECTURE: `_docs/02_document/architecture.md`
- COMPONENTS_DIR: `_docs/02_document/components/`
Announce the resolved paths to the user before proceeding.
## Input Specification
### Required Files
| File | Purpose | Required |
|------|---------|----------|
| `_docs/00_problem/problem.md` | Problem description and context | Greenfield only |
| `_docs/00_problem/restrictions.md` | Constraints and limitations | Greenfield only |
| `_docs/01_solution/solution.md` | Finalized solution | Greenfield only |
| `DOCUMENT_DIR/architecture.md` | Architecture (from plan or document skill) | Always |
| `DOCUMENT_DIR/components/` | Component specs | Always |
### Prerequisite Checks (BLOCKING)
1. `architecture.md` exists — **STOP if missing**, run `/plan` first
2. At least one component spec exists in `DOCUMENT_DIR/components/`**STOP if missing**
3. Create DEPLOY_DIR, REPORTS_DIR, and SCRIPTS_DIR if they do not exist
4. If DEPLOY_DIR already contains artifacts, ask user: **resume from last checkpoint or start fresh?**
## Artifact Management
### Directory Structure
```
DEPLOY_DIR/
├── containerization.md
├── ci_cd_pipeline.md
├── environment_strategy.md
├── observability.md
├── deployment_procedures.md
├── deploy_scripts.md
└── reports/
└── deploy_status_report.md
SCRIPTS_DIR/ (project root)
├── deploy.sh
├── pull-images.sh
├── start-services.sh
├── stop-services.sh
└── health-check.sh
.env (project root, git-ignored)
.env.example (project root, committed)
```
### Save Timing
| Step | Save immediately after | Filename |
|------|------------------------|----------|
| Step 1 | Status check & env setup complete | `reports/deploy_status_report.md` + `.env` + `.env.example` |
| Step 2 | Containerization plan complete | `containerization.md` |
| Step 3 | CI/CD pipeline defined | `ci_cd_pipeline.md` |
| Step 4 | Environment strategy documented | `environment_strategy.md` |
| Step 5 | Observability plan complete | `observability.md` |
| Step 6 | Deployment procedures documented | `deployment_procedures.md` |
| Step 7 | Deployment scripts created | `deploy_scripts.md` + scripts in `SCRIPTS_DIR/` |
### Resumability
If DEPLOY_DIR already contains artifacts:
1. List existing files and match to the save timing table
2. Identify the last completed step
3. Resume from the next incomplete step
4. Inform the user which steps are being skipped
## Progress Tracking
At the start of execution, create a TodoWrite with all steps (1 through 7). Update status as each step completes.
## Workflow
### Step 1: Deployment Status & Environment Setup
**Role**: DevOps / Platform engineer
**Goal**: Assess current deployment readiness, identify all required environment variables, and create `.env` files
**Constraints**: Must complete before any other step
1. Read architecture.md, all component specs, and restrictions.md
2. Assess deployment readiness:
- List all components and their current state (planned / implemented / tested)
- Identify external dependencies (databases, APIs, message queues, cloud services)
- Identify infrastructure prerequisites (container registry, cloud accounts, DNS, SSL certificates)
- Check if any deployment blockers exist
3. Identify all required environment variables by scanning:
- Component specs for configuration needs
- Database connection requirements
- External API endpoints and credentials
- Feature flags and runtime configuration
- Container registry credentials
- Cloud provider credentials
- Monitoring/logging service endpoints
4. Generate `.env.example` in project root with all variables and placeholder values (committed to VCS)
5. Generate `.env` in project root with development defaults filled in where safe (git-ignored)
6. Ensure `.gitignore` includes `.env` (but NOT `.env.example`)
7. Produce a deployment status report summarizing readiness, blockers, and required setup
**Self-verification**:
- [ ] All components assessed for deployment readiness
- [ ] External dependencies catalogued
- [ ] Infrastructure prerequisites identified
- [ ] All required environment variables discovered
- [ ] `.env.example` created with placeholder values
- [ ] `.env` created with safe development defaults
- [ ] `.gitignore` updated to exclude `.env`
- [ ] Status report written to `reports/deploy_status_report.md`
**Save action**: Write `reports/deploy_status_report.md` using `templates/deploy_status_report.md`, create `.env` and `.env.example` in project root
**BLOCKING**: Present status report and environment variables to user. Do NOT proceed until confirmed.
---
### Step 2: Containerization
**Role**: DevOps / Platform engineer
**Goal**: Define Docker configuration for every component, local development, and blackbox test environments
**Constraints**: Plan only — no Dockerfile creation. Describe what each Dockerfile should contain.
1. Read architecture.md and all component specs
2. Read restrictions.md for infrastructure constraints
3. Research best Docker practices for the project's tech stack (multi-stage builds, base image selection, layer optimization)
4. For each component, define:
- Base image (pinned version, prefer alpine/distroless for production)
- Build stages (dependency install, build, production)
- Non-root user configuration
- Health check endpoint and command
- Exposed ports
- `.dockerignore` contents
5. Define `docker-compose.yml` for local development:
- All application components
- Database (Postgres) with named volume
- Any message queues, caches, or external service mocks
- Shared network
- Environment variable files (`.env`)
6. Define `docker-compose.test.yml` for blackbox tests:
- Application components under test
- Test runner container (black-box, no internal imports)
- Isolated database with seed data
- All tests runnable via `docker compose -f docker-compose.test.yml up --abort-on-container-exit`
7. Define image tagging strategy: `<registry>/<project>/<component>:<git-sha>` for CI, `latest` for local dev only
**Self-verification**:
- [ ] Every component has a Dockerfile specification
- [ ] Multi-stage builds specified for all production images
- [ ] Non-root user for all containers
- [ ] Health checks defined for every service
- [ ] docker-compose.yml covers all components + dependencies
- [ ] docker-compose.test.yml enables black-box testing
- [ ] `.dockerignore` defined
**Save action**: Write `containerization.md` using `templates/containerization.md`
**BLOCKING**: Present containerization plan to user. Do NOT proceed until confirmed.
---
### Step 3: CI/CD Pipeline
**Role**: DevOps engineer
**Goal**: Define the CI/CD pipeline with quality gates, security scanning, and multi-environment deployment
**Constraints**: Pipeline definition only — produce YAML specification, not implementation
1. Read architecture.md for tech stack and deployment targets
2. Read restrictions.md for CI/CD constraints (cloud provider, registry, etc.)
3. Research CI/CD best practices for the project's platform (GitHub Actions / Azure Pipelines)
4. Define pipeline stages:
| Stage | Trigger | Steps | Quality Gate |
|-------|---------|-------|-------------|
| **Lint** | Every push | Run linters per language (black, rustfmt, prettier, dotnet format) | Zero errors |
| **Test** | Every push | Unit tests, blackbox tests, coverage report | 75%+ coverage (see `.cursor/rules/cursor-meta.mdc` Quality Thresholds) |
| **Security** | Every push | Dependency audit, SAST scan (Semgrep/SonarQube), image scan (Trivy) | Zero critical/high CVEs |
| **Build** | PR merge to dev | Build Docker images, tag with git SHA | Build succeeds |
| **Push** | After build | Push to container registry | Push succeeds |
| **Deploy Staging** | After push | Deploy to staging environment | Health checks pass |
| **Smoke Tests** | After staging deploy | Run critical path tests against staging | All pass |
| **Deploy Production** | Manual approval | Deploy to production | Health checks pass |
5. Define caching strategy: dependency caches, Docker layer caches, build artifact caches
6. Define parallelization: which stages can run concurrently
7. Define notifications: build failures, deployment status, security alerts
**Self-verification**:
- [ ] All pipeline stages defined with triggers and gates
- [ ] Coverage threshold enforced (75%+)
- [ ] Security scanning included (dependencies + images + SAST)
- [ ] Caching configured for dependencies and Docker layers
- [ ] Multi-environment deployment (staging → production)
- [ ] Rollback procedure referenced
- [ ] Notifications configured
**Save action**: Write `ci_cd_pipeline.md` using `templates/ci_cd_pipeline.md`
---
### Step 4: Environment Strategy
**Role**: Platform engineer
**Goal**: Define environment configuration, secrets management, and environment parity
**Constraints**: Strategy document — no secrets or credentials in output
1. Define environments:
| Environment | Purpose | Infrastructure | Data |
|-------------|---------|---------------|------|
| **Development** | Local developer workflow | docker-compose, local volumes | Seed data, mocks for external APIs |
| **Staging** | Pre-production validation | Mirrors production topology | Anonymized production-like data |
| **Production** | Live system | Full infrastructure | Real data |
2. Define environment variable management:
- Reference `.env.example` created in Step 1
- Per-environment variable sources (`.env` for dev, secret manager for staging/prod)
- Validation: fail fast on missing required variables at startup
3. Define secrets management:
- Never commit secrets to version control
- Development: `.env` files (git-ignored)
- Staging/Production: secret manager (AWS Secrets Manager / Azure Key Vault / Vault)
- Rotation policy
4. Define database management per environment:
- Development: Docker Postgres with named volume, seed data
- Staging: managed Postgres, migrations applied via CI/CD
- Production: managed Postgres, migrations require approval
**Self-verification**:
- [ ] All three environments defined with clear purpose
- [ ] Environment variable documentation complete (references `.env.example` from Step 1)
- [ ] No secrets in any output document
- [ ] Secret manager specified for staging/production
- [ ] Database strategy per environment
**Save action**: Write `environment_strategy.md` using `templates/environment_strategy.md`
---
### Step 5: Observability
**Role**: Site Reliability Engineer (SRE)
**Goal**: Define logging, metrics, tracing, and alerting strategy
**Constraints**: Strategy document — describe what to implement, not how to wire it
1. Read architecture.md and component specs for service boundaries
2. Research observability best practices for the tech stack
**Logging**:
- Structured JSON to stdout/stderr (no file logging in containers)
- Fields: `timestamp` (ISO 8601), `level`, `service`, `correlation_id`, `message`, `context`
- Levels: ERROR (exceptions), WARN (degraded), INFO (business events), DEBUG (diagnostics, dev only)
- No PII in logs
- Retention: dev = console, staging = 7 days, production = 30 days
**Metrics**:
- Expose Prometheus-compatible `/metrics` endpoint per service
- System metrics: CPU, memory, disk, network
- Application metrics: `request_count`, `request_duration` (histogram), `error_count`, `active_connections`
- Business metrics: derived from acceptance criteria
- Collection interval: 15s
**Distributed Tracing**:
- OpenTelemetry SDK integration
- Trace context propagation via HTTP headers and message queue metadata
- Span naming: `<service>.<operation>`
- Sampling: 100% in dev/staging, 10% in production (adjust based on volume)
**Alerting**:
| Severity | Response Time | Condition Examples |
|----------|---------------|-------------------|
| Critical | 5 min | Service down, data loss, health check failed |
| High | 30 min | Error rate > 5%, P95 latency > 2x baseline |
| Medium | 4 hours | Disk > 80%, elevated latency |
| Low | Next business day | Non-critical warnings |
**Dashboards**:
- Operations: service health, request rate, error rate, response time percentiles, resource utilization
- Business: key business metrics from acceptance criteria
**Self-verification**:
- [ ] Structured logging format defined with required fields
- [ ] Metrics endpoint specified per service
- [ ] OpenTelemetry tracing configured
- [ ] Alert severities with response times defined
- [ ] Dashboards cover operations and business metrics
- [ ] PII exclusion from logs addressed
**Save action**: Write `observability.md` using `templates/observability.md`
---
### Step 6: Deployment Procedures
**Role**: DevOps / Platform engineer
**Goal**: Define deployment strategy, rollback procedures, health checks, and deployment checklist
**Constraints**: Procedures document — no implementation
1. Define deployment strategy:
- Preferred pattern: blue-green / rolling / canary (choose based on architecture)
- Zero-downtime requirement for production
- Graceful shutdown: 30-second grace period for in-flight requests
- Database migration ordering: migrate before deploy, backward-compatible only
2. Define health checks:
| Check | Type | Endpoint | Interval | Threshold |
|-------|------|----------|----------|-----------|
| Liveness | HTTP GET | `/health/live` | 10s | 3 failures → restart |
| Readiness | HTTP GET | `/health/ready` | 5s | 3 failures → remove from LB |
| Startup | HTTP GET | `/health/ready` | 5s | 30 attempts max |
3. Define rollback procedures:
- Trigger criteria: health check failures, error rate spike, critical alert
- Rollback steps: redeploy previous image tag, verify health, rollback database if needed
- Communication: notify stakeholders during rollback
- Post-mortem: required after every production rollback
4. Define deployment checklist:
- [ ] All tests pass in CI
- [ ] Security scan clean (zero critical/high CVEs)
- [ ] Database migrations reviewed and tested
- [ ] Environment variables configured
- [ ] Health check endpoints responding
- [ ] Monitoring alerts configured
- [ ] Rollback plan documented and tested
- [ ] Stakeholders notified
**Self-verification**:
- [ ] Deployment strategy chosen and justified
- [ ] Zero-downtime approach specified
- [ ] Health checks defined (liveness, readiness, startup)
- [ ] Rollback trigger criteria and steps documented
- [ ] Deployment checklist complete
**Save action**: Write `deployment_procedures.md` using `templates/deployment_procedures.md`
**BLOCKING**: Present deployment procedures to user. Do NOT proceed until confirmed.
---
### Step 7: Deployment Scripts
**Role**: DevOps / Platform engineer
**Goal**: Create executable deployment scripts for pulling Docker images and running services on the remote target machine
**Constraints**: Produce real, executable shell scripts. This is the ONLY step that creates implementation artifacts.
1. Read containerization.md and deployment_procedures.md from previous steps
2. Read `.env.example` for required variables
3. Create the following scripts in `SCRIPTS_DIR/`:
**`deploy.sh`** — Main deployment orchestrator:
- Validates that required environment variables are set (sources `.env` if present)
- Calls `pull-images.sh`, then `stop-services.sh`, then `start-services.sh`, then `health-check.sh`
- Exits with non-zero code on any failure
- Supports `--rollback` flag to redeploy previous image tags
**`pull-images.sh`** — Pull Docker images to target machine:
- Reads image list and tags from environment or config
- Authenticates with container registry
- Pulls all required images
- Verifies image integrity (digest check)
**`start-services.sh`** — Start services on target machine:
- Runs `docker compose up -d` or individual `docker run` commands
- Applies environment variables from `.env`
- Configures networks and volumes
- Waits for containers to reach healthy state
**`stop-services.sh`** — Graceful shutdown:
- Stops services with graceful shutdown period
- Saves current image tags for rollback reference
- Cleans up orphaned containers/networks
**`health-check.sh`** — Verify deployment health:
- Checks all health endpoints
- Reports status per service
- Returns non-zero if any service is unhealthy
4. All scripts must:
- Be POSIX-compatible (#!/bin/bash with set -euo pipefail)
- Source `.env` from project root or accept env vars from the environment
- Include usage/help output (`--help` flag)
- Be idempotent where possible
- Handle SSH connection to remote target (configurable via `DEPLOY_HOST` env var)
5. Document all scripts in `deploy_scripts.md`
**Self-verification**:
- [ ] All five scripts created and executable
- [ ] Scripts source environment variables correctly
- [ ] `deploy.sh` orchestrates the full flow
- [ ] `pull-images.sh` handles registry auth and image pull
- [ ] `start-services.sh` starts containers with correct config
- [ ] `stop-services.sh` handles graceful shutdown
- [ ] `health-check.sh` validates all endpoints
- [ ] Rollback supported via `deploy.sh --rollback`
- [ ] Scripts work for remote deployment via SSH (DEPLOY_HOST)
- [ ] `deploy_scripts.md` documents all scripts
**Save action**: Write scripts to `SCRIPTS_DIR/`, write `deploy_scripts.md` using `templates/deploy_scripts.md`
---
## Escalation Rules
| Situation | Action |
|-----------|--------|
| Unknown cloud provider or hosting | **ASK user** |
| Container registry not specified | **ASK user** |
| CI/CD platform preference unclear | **ASK user** — default to GitHub Actions |
| Secret manager not chosen | **ASK user** |
| Deployment pattern trade-offs | **ASK user** with recommendation |
| Missing architecture.md | **STOP** — run `/plan` first |
| Remote target machine details unknown | **ASK user** for SSH access, OS, and specs |
## Common Mistakes
- **Implementing during planning**: Steps 16 produce documents, not code (Step 7 is the exception — it creates scripts)
- **Hardcoding secrets**: never include real credentials in deployment documents or scripts
- **Ignoring blackbox test containerization**: the test environment must be containerized alongside the app
- **Skipping BLOCKING gates**: never proceed past a BLOCKING marker without user confirmation
- **Using `:latest` tags**: always pin base image versions
- **Forgetting observability**: logging, metrics, and tracing are deployment concerns, not post-deployment additions
- **Committing `.env`**: only `.env.example` goes to version control; `.env` must be in `.gitignore`
- **Non-portable scripts**: deployment scripts must work across environments; avoid hardcoded paths
## Methodology Quick Reference
```
┌────────────────────────────────────────────────────────────────┐
│ Deployment Planning (7-Step Method) │
├────────────────────────────────────────────────────────────────┤
│ PREREQ: architecture.md + component specs exist │
│ │
│ 1. Status & Env → reports/deploy_status_report.md │
│ + .env + .env.example │
│ [BLOCKING: user confirms status & env vars] │
│ 2. Containerization → containerization.md │
│ [BLOCKING: user confirms Docker plan] │
│ 3. CI/CD Pipeline → ci_cd_pipeline.md │
│ 4. Environment → environment_strategy.md │
│ 5. Observability → observability.md │
│ 6. Procedures → deployment_procedures.md │
│ [BLOCKING: user confirms deployment plan] │
│ 7. Scripts → deploy_scripts.md + scripts/ │
├────────────────────────────────────────────────────────────────┤
│ Principles: Docker-first · IaC · Observability built-in │
│ Environment parity · Save immediately │
└────────────────────────────────────────────────────────────────┘
```
@@ -0,0 +1,87 @@
# CI/CD Pipeline Template
Save as `_docs/04_deploy/ci_cd_pipeline.md`.
---
```markdown
# [System Name] — CI/CD Pipeline
## Pipeline Overview
| Stage | Trigger | Quality Gate |
|-------|---------|-------------|
| Lint | Every push | Zero lint errors |
| Test | Every push | 75%+ coverage, all tests pass |
| Security | Every push | Zero critical/high CVEs |
| Build | PR merge to dev | Docker build succeeds |
| Push | After build | Images pushed to registry |
| Deploy Staging | After push | Health checks pass |
| Smoke Tests | After staging deploy | Critical paths pass |
| Deploy Production | Manual approval | Health checks pass |
## Stage Details
### Lint
- [Language-specific linters and formatters]
- Runs in parallel per language
### Test
- Unit tests: [framework and command]
- Blackbox tests: [framework and command, uses docker-compose.test.yml]
- Coverage threshold: 75% overall, 90% critical paths
- Coverage report published as pipeline artifact
### Security
- Dependency audit: [tool, e.g., npm audit / pip-audit / dotnet list package --vulnerable]
- SAST scan: [tool, e.g., Semgrep / SonarQube]
- Image scan: Trivy on built Docker images
- Block on: critical or high severity findings
### Build
- Docker images built using multi-stage Dockerfiles
- Tagged with git SHA: `<registry>/<component>:<sha>`
- Build cache: Docker layer cache via CI cache action
### Push
- Registry: [container registry URL]
- Authentication: [method]
### Deploy Staging
- Deployment method: [docker compose / Kubernetes / cloud service]
- Pre-deploy: run database migrations
- Post-deploy: verify health check endpoints
- Automated rollback on health check failure
### Smoke Tests
- Subset of blackbox tests targeting staging environment
- Validates critical user flows
- Timeout: [maximum duration]
### Deploy Production
- Requires manual approval via [mechanism]
- Deployment strategy: [blue-green / rolling / canary]
- Pre-deploy: database migration review
- Post-deploy: health checks + monitoring for 15 min
## Caching Strategy
| Cache | Key | Restore Keys |
|-------|-----|-------------|
| Dependencies | [lockfile hash] | [partial match] |
| Docker layers | [Dockerfile hash] | [partial match] |
| Build artifacts | [source hash] | [partial match] |
## Parallelization
[Diagram or description of which stages run concurrently]
## Notifications
| Event | Channel | Recipients |
|-------|---------|-----------|
| Build failure | [Slack/email] | [team] |
| Security alert | [Slack/email] | [team + security] |
| Deploy success | [Slack] | [team] |
| Deploy failure | [Slack/email + PagerDuty] | [on-call] |
```
@@ -0,0 +1,94 @@
# Containerization Plan Template
Save as `_docs/04_deploy/containerization.md`.
---
```markdown
# [System Name] — Containerization
## Component Dockerfiles
### [Component Name]
| Property | Value |
|----------|-------|
| Base image | [e.g., mcr.microsoft.com/dotnet/aspnet:8.0-alpine] |
| Build image | [e.g., mcr.microsoft.com/dotnet/sdk:8.0-alpine] |
| Stages | [dependency install → build → production] |
| User | [non-root user name] |
| Health check | [endpoint and command] |
| Exposed ports | [port list] |
| Key build args | [if any] |
### [Repeat for each component]
## Docker Compose — Local Development
```yaml
# docker-compose.yml structure
services:
[component]:
build: ./[path]
ports: ["host:container"]
environment: [reference .env.dev]
depends_on: [dependencies with health condition]
healthcheck: [command, interval, timeout, retries]
db:
image: [postgres:version-alpine]
volumes: [named volume]
environment: [credentials from .env.dev]
healthcheck: [pg_isready]
volumes:
[named volumes]
networks:
[shared network]
```
## Docker Compose — Blackbox Tests
```yaml
# docker-compose.test.yml structure
services:
[app components under test]
test-runner:
build: ./tests/integration
depends_on: [app components with health condition]
environment: [test configuration]
# Exit code determines test pass/fail
db:
image: [postgres:version-alpine]
volumes: [seed data mount]
```
Run: `docker compose -f docker-compose.test.yml up --abort-on-container-exit`
## Image Tagging Strategy
| Context | Tag Format | Example |
|---------|-----------|---------|
| CI build | `<registry>/<project>/<component>:<git-sha>` | `ghcr.io/org/api:a1b2c3d` |
| Release | `<registry>/<project>/<component>:<semver>` | `ghcr.io/org/api:1.2.0` |
| Local dev | `<component>:latest` | `api:latest` |
## .dockerignore
```
.git
.cursor
_docs
_standalone
node_modules
**/bin
**/obj
**/__pycache__
*.md
.env*
docker-compose*.yml
```
```
@@ -0,0 +1,114 @@
# Deployment Scripts Documentation Template
Save as `_docs/04_deploy/deploy_scripts.md`.
---
```markdown
# [System Name] — Deployment Scripts
## Overview
| Script | Purpose | Location |
|--------|---------|----------|
| `deploy.sh` | Main deployment orchestrator | `scripts/deploy.sh` |
| `pull-images.sh` | Pull Docker images from registry | `scripts/pull-images.sh` |
| `start-services.sh` | Start all services | `scripts/start-services.sh` |
| `stop-services.sh` | Graceful shutdown | `scripts/stop-services.sh` |
| `health-check.sh` | Verify deployment health | `scripts/health-check.sh` |
## Prerequisites
- Docker and Docker Compose installed on target machine
- SSH access to target machine (configured via `DEPLOY_HOST`)
- Container registry credentials configured
- `.env` file with required environment variables (see `.env.example`)
## Environment Variables
All scripts source `.env` from the project root or accept variables from the environment.
| Variable | Required By | Purpose |
|----------|------------|---------|
| `DEPLOY_HOST` | All (remote mode) | SSH target for remote deployment |
| `REGISTRY_URL` | `pull-images.sh` | Container registry URL |
| `REGISTRY_USER` | `pull-images.sh` | Registry authentication |
| `REGISTRY_PASS` | `pull-images.sh` | Registry authentication |
| `IMAGE_TAG` | `pull-images.sh`, `start-services.sh` | Image version to deploy (default: latest git SHA) |
| [add project-specific variables] | | |
## Script Details
### deploy.sh
Main orchestrator that runs the full deployment flow.
**Usage**:
- `./scripts/deploy.sh` — Deploy latest version
- `./scripts/deploy.sh --rollback` — Rollback to previous version
- `./scripts/deploy.sh --help` — Show usage
**Flow**:
1. Validate required environment variables
2. Call `pull-images.sh`
3. Call `stop-services.sh`
4. Call `start-services.sh`
5. Call `health-check.sh`
6. Report success or failure
**Rollback**: When `--rollback` is passed, reads the previous image tags saved by `stop-services.sh` and redeploys those versions.
### pull-images.sh
**Usage**: `./scripts/pull-images.sh [--help]`
**Steps**:
1. Authenticate with container registry (`REGISTRY_URL`)
2. Pull all required images with specified `IMAGE_TAG`
3. Verify image integrity via digest check
4. Report pull results per image
### start-services.sh
**Usage**: `./scripts/start-services.sh [--help]`
**Steps**:
1. Run `docker compose up -d` with the correct env file
2. Configure networks and volumes
3. Wait for all containers to report healthy state
4. Report startup status per service
### stop-services.sh
**Usage**: `./scripts/stop-services.sh [--help]`
**Steps**:
1. Save current image tags to `previous_tags.env` (for rollback)
2. Stop services with graceful shutdown period (30s)
3. Clean up orphaned containers and networks
### health-check.sh
**Usage**: `./scripts/health-check.sh [--help]`
**Checks**:
| Service | Endpoint | Expected |
|---------|----------|----------|
| [Component 1] | `http://localhost:[port]/health/live` | HTTP 200 |
| [Component 2] | `http://localhost:[port]/health/ready` | HTTP 200 |
| [add all services] | | |
**Exit codes**:
- `0` — All services healthy
- `1` — One or more services unhealthy
## Common Script Properties
All scripts:
- Use `#!/bin/bash` with `set -euo pipefail`
- Support `--help` flag for usage information
- Source `.env` from project root if present
- Are idempotent where possible
- Support remote execution via SSH when `DEPLOY_HOST` is set
```
@@ -0,0 +1,73 @@
# Deployment Status Report Template
Save as `_docs/04_deploy/reports/deploy_status_report.md`.
---
```markdown
# [System Name] — Deployment Status Report
## Deployment Readiness Summary
| Aspect | Status | Notes |
|--------|--------|-------|
| Architecture defined | ✅ / ❌ | |
| Component specs complete | ✅ / ❌ | |
| Infrastructure prerequisites met | ✅ / ❌ | |
| External dependencies identified | ✅ / ❌ | |
| Blockers | [count] | [summary] |
## Component Status
| Component | State | Docker-ready | Notes |
|-----------|-------|-------------|-------|
| [Component 1] | planned / implemented / tested | yes / no | |
| [Component 2] | planned / implemented / tested | yes / no | |
## External Dependencies
| Dependency | Type | Required For | Status |
|------------|------|-------------|--------|
| [e.g., PostgreSQL] | Database | Data persistence | [available / needs setup] |
| [e.g., Redis] | Cache | Session management | [available / needs setup] |
| [e.g., External API] | API | [purpose] | [available / needs setup] |
## Infrastructure Prerequisites
| Prerequisite | Status | Action Needed |
|-------------|--------|--------------|
| Container registry | [ready / not set up] | [action] |
| Cloud account | [ready / not set up] | [action] |
| DNS configuration | [ready / not set up] | [action] |
| SSL certificates | [ready / not set up] | [action] |
| CI/CD platform | [ready / not set up] | [action] |
| Secret manager | [ready / not set up] | [action] |
## Deployment Blockers
| Blocker | Severity | Resolution |
|---------|----------|-----------|
| [blocker description] | critical / high / medium | [resolution steps] |
## Required Environment Variables
| Variable | Purpose | Required In | Default (Dev) | Source (Staging/Prod) |
|----------|---------|------------|---------------|----------------------|
| `DATABASE_URL` | Postgres connection string | All components | `postgres://dev:dev@db:5432/app` | Secret manager |
| `DEPLOY_HOST` | Remote target machine | Deployment scripts | `localhost` | Environment |
| `REGISTRY_URL` | Container registry URL | CI/CD, deploy scripts | `localhost:5000` | Environment |
| `REGISTRY_USER` | Registry username | CI/CD, deploy scripts | — | Secret manager |
| `REGISTRY_PASS` | Registry password | CI/CD, deploy scripts | — | Secret manager |
| [add all required variables] | | | | |
## .env Files Created
- `.env.example` — committed to VCS, contains all variable names with placeholder values
- `.env` — git-ignored, contains development defaults
## Next Steps
1. [Resolve any blockers listed above]
2. [Set up missing infrastructure prerequisites]
3. [Proceed to containerization planning]
```
@@ -0,0 +1,103 @@
# Deployment Procedures Template
Save as `_docs/04_deploy/deployment_procedures.md`.
---
```markdown
# [System Name] — Deployment Procedures
## Deployment Strategy
**Pattern**: [blue-green / rolling / canary]
**Rationale**: [why this pattern fits the architecture]
**Zero-downtime**: required for production deployments
### Graceful Shutdown
- Grace period: 30 seconds for in-flight requests
- Sequence: stop accepting new requests → drain connections → shutdown
- Container orchestrator: `terminationGracePeriodSeconds: 40`
### Database Migration Ordering
- Migrations run **before** new code deploys
- All migrations must be backward-compatible (old code works with new schema)
- Irreversible migrations require explicit approval
## Health Checks
| Check | Type | Endpoint | Interval | Failure Threshold | Action |
|-------|------|----------|----------|-------------------|--------|
| Liveness | HTTP GET | `/health/live` | 10s | 3 failures | Restart container |
| Readiness | HTTP GET | `/health/ready` | 5s | 3 failures | Remove from load balancer |
| Startup | HTTP GET | `/health/ready` | 5s | 30 attempts | Kill and recreate |
### Health Check Responses
- `/health/live`: returns 200 if process is running (no dependency checks)
- `/health/ready`: returns 200 if all dependencies (DB, cache, queues) are reachable
## Staging Deployment
1. CI/CD builds and pushes Docker images tagged with git SHA
2. Run database migrations against staging
3. Deploy new images to staging environment
4. Wait for health checks to pass (readiness probe)
5. Run smoke tests against staging
6. If smoke tests fail: automatic rollback to previous image
## Production Deployment
1. **Approval**: manual approval required via [mechanism]
2. **Pre-deploy checks**:
- [ ] Staging smoke tests passed
- [ ] Security scan clean
- [ ] Database migration reviewed
- [ ] Monitoring alerts configured
- [ ] Rollback plan confirmed
3. **Deploy**: apply deployment strategy (blue-green / rolling / canary)
4. **Verify**: health checks pass, error rate stable, latency within baseline
5. **Monitor**: observe dashboards for 15 minutes post-deploy
6. **Finalize**: mark deployment as successful or trigger rollback
## Rollback Procedures
### Trigger Criteria
- Health check failures persist after deploy
- Error rate exceeds 5% for more than 5 minutes
- Critical alert fires within 15 minutes of deploy
- Manual decision by on-call engineer
### Rollback Steps
1. Redeploy previous Docker image tag (from CI/CD artifact)
2. Verify health checks pass
3. If database migration was applied:
- Run DOWN migration if reversible
- If irreversible: assess data impact, escalate if needed
4. Notify stakeholders
5. Schedule post-mortem within 24 hours
### Post-Mortem
Required after every production rollback:
- Timeline of events
- Root cause
- What went wrong
- Prevention measures
## Deployment Checklist
- [ ] All tests pass in CI
- [ ] Security scan clean (zero critical/high CVEs)
- [ ] Docker images built and pushed
- [ ] Database migrations reviewed and tested
- [ ] Environment variables configured for target environment
- [ ] Health check endpoints verified
- [ ] Monitoring alerts configured
- [ ] Rollback plan documented and tested
- [ ] Stakeholders notified of deployment window
- [ ] On-call engineer available during deployment
```
@@ -0,0 +1,61 @@
# Environment Strategy Template
Save as `_docs/04_deploy/environment_strategy.md`.
---
```markdown
# [System Name] — Environment Strategy
## Environments
| Environment | Purpose | Infrastructure | Data Source |
|-------------|---------|---------------|-------------|
| Development | Local developer workflow | docker-compose | Seed data, mocked externals |
| Staging | Pre-production validation | [mirrors production] | Anonymized production-like data |
| Production | Live system | [full infrastructure] | Real data |
## Environment Variables
### Required Variables
| Variable | Purpose | Dev Default | Staging/Prod Source |
|----------|---------|-------------|-------------------|
| `DATABASE_URL` | Postgres connection | `postgres://dev:dev@db:5432/app` | Secret manager |
| [add all required variables] | | | |
### `.env.example`
```env
# Copy to .env and fill in values
DATABASE_URL=postgres://user:pass@host:5432/dbname
# [all required variables with placeholder values]
```
### Variable Validation
All services validate required environment variables at startup and fail fast with a clear error message if any are missing.
## Secrets Management
| Environment | Method | Tool |
|-------------|--------|------|
| Development | `.env` file (git-ignored) | dotenv |
| Staging | Secret manager | [AWS Secrets Manager / Azure Key Vault / Vault] |
| Production | Secret manager | [AWS Secrets Manager / Azure Key Vault / Vault] |
Rotation policy: [frequency and procedure]
## Database Management
| Environment | Type | Migrations | Data |
|-------------|------|-----------|------|
| Development | Docker Postgres, named volume | Applied on container start | Seed data via init script |
| Staging | Managed Postgres | Applied via CI/CD pipeline | Anonymized production snapshot |
| Production | Managed Postgres | Applied via CI/CD with approval | Live data |
Migration rules:
- All migrations must be backward-compatible (support old and new code simultaneously)
- Reversible migrations required (DOWN/rollback script)
- Production migrations require review before apply
```
@@ -0,0 +1,132 @@
# Observability Template
Save as `_docs/04_deploy/observability.md`.
---
```markdown
# [System Name] — Observability
## Logging
### Format
Structured JSON to stdout/stderr. No file-based logging in containers.
```json
{
"timestamp": "ISO8601",
"level": "INFO",
"service": "service-name",
"correlation_id": "uuid",
"message": "Event description",
"context": {}
}
```
### Log Levels
| Level | Usage | Example |
|-------|-------|---------|
| ERROR | Exceptions, failures requiring attention | Database connection failed |
| WARN | Potential issues, degraded performance | Retry attempt 2/3 |
| INFO | Significant business events | User registered, Order placed |
| DEBUG | Detailed diagnostics (dev/staging only) | Request payload, Query params |
### Retention
| Environment | Destination | Retention |
|-------------|-------------|-----------|
| Development | Console | Session |
| Staging | [log aggregator] | 7 days |
| Production | [log aggregator] | 30 days |
### PII Rules
- Never log passwords, tokens, or session IDs
- Mask email addresses and personal identifiers
- Log user IDs (opaque) instead of usernames
## Metrics
### Endpoints
Every service exposes Prometheus-compatible metrics at `/metrics`.
### Application Metrics
| Metric | Type | Description |
|--------|------|-------------|
| `request_count` | Counter | Total HTTP requests by method, path, status |
| `request_duration_seconds` | Histogram | Response time by method, path |
| `error_count` | Counter | Failed requests by type |
| `active_connections` | Gauge | Current open connections |
### System Metrics
- CPU usage, Memory usage, Disk I/O, Network I/O
### Business Metrics
| Metric | Type | Description | Source |
|--------|------|-------------|--------|
| [from acceptance criteria] | | | |
Collection interval: 15 seconds
## Distributed Tracing
### Configuration
- SDK: OpenTelemetry
- Propagation: W3C Trace Context via HTTP headers
- Span naming: `<service>.<operation>`
### Sampling
| Environment | Rate | Rationale |
|-------------|------|-----------|
| Development | 100% | Full visibility |
| Staging | 100% | Full visibility |
| Production | 10% | Balance cost vs observability |
### Integration Points
- HTTP requests: automatic instrumentation
- Database queries: automatic instrumentation
- Message queues: manual span creation on publish/consume
## Alerting
| Severity | Response Time | Conditions |
|----------|---------------|-----------|
| Critical | 5 min | Service unreachable, health check failed for 1 min, data loss detected |
| High | 30 min | Error rate > 5% for 5 min, P95 latency > 2x baseline for 10 min |
| Medium | 4 hours | Disk usage > 80%, elevated latency, connection pool exhaustion |
| Low | Next business day | Non-critical warnings, deprecated API usage |
### Notification Channels
| Severity | Channel |
|----------|---------|
| Critical | [PagerDuty / phone] |
| High | [Slack + email] |
| Medium | [Slack] |
| Low | [Dashboard only] |
## Dashboards
### Operations Dashboard
- Service health status (up/down per component)
- Request rate and error rate
- Response time percentiles (P50, P95, P99)
- Resource utilization (CPU, memory per container)
- Active alerts
### Business Dashboard
- [Key business metrics from acceptance criteria]
- [User activity indicators]
- [Transaction volumes]
```