mirror of
https://github.com/azaion/detections.git
synced 2026-04-22 11:06:32 +00:00
Update .gitignore to include additional file types and directories for Python projects, enhancing environment management and build artifacts exclusion.
@@ -0,0 +1,491 @@
---
name: deploy
description: |
  Comprehensive deployment skill covering status check, env setup, containerization, CI/CD pipeline, environment strategy, observability, deployment procedures, and deployment scripts.
  7-step workflow: Status & env check, Docker containerization, CI/CD pipeline definition, environment strategy, observability planning, deployment procedures, deployment scripts.
  Uses _docs/04_deploy/ structure.
  Trigger phrases:
  - "deploy", "deployment", "deployment strategy"
  - "CI/CD", "pipeline", "containerize"
  - "observability", "monitoring", "logging"
  - "dockerize", "docker compose"
category: ship
tags: [deployment, docker, ci-cd, observability, monitoring, containerization, scripts]
disable-model-invocation: true
---

# Deployment Planning

Plan and document the full deployment lifecycle: check deployment status and environment requirements, containerize the application, define CI/CD pipelines, configure environments, set up observability, document deployment procedures, and generate deployment scripts.

## Core Principles

- **Docker-first**: every component runs in a container; local dev, integration tests, and production all use Docker
- **Infrastructure as code**: all deployment configuration is version-controlled
- **Observability built-in**: logging, metrics, and tracing are part of the deployment plan, not afterthoughts
- **Environment parity**: dev, staging, and production environments mirror each other as closely as possible
- **Save immediately**: write artifacts to disk after each step; never accumulate unsaved work
- **Ask, don't assume**: when infrastructure constraints or preferences are unclear, ask the user
- **Plan, don't code**: this workflow produces deployment documents and specifications, not implementation code (except deployment scripts in Step 7)

## Context Resolution

Fixed paths:

- PLANS_DIR: `_docs/02_plans/`
- DEPLOY_DIR: `_docs/04_deploy/`
- REPORTS_DIR: `_docs/04_deploy/reports/`
- SCRIPTS_DIR: `scripts/`
- ARCHITECTURE: `_docs/02_plans/architecture.md`
- COMPONENTS_DIR: `_docs/02_plans/components/`

Announce the resolved paths to the user before proceeding.

## Input Specification

### Required Files

| File | Purpose |
|------|---------|
| `_docs/00_problem/problem.md` | Problem description and context |
| `_docs/00_problem/restrictions.md` | Constraints and limitations |
| `_docs/01_solution/solution.md` | Finalized solution |
| `PLANS_DIR/architecture.md` | Architecture from plan skill |
| `PLANS_DIR/components/` | Component specs |

### Prerequisite Checks (BLOCKING)

1. `architecture.md` exists — **STOP if missing**, run `/plan` first
2. At least one component spec exists in `PLANS_DIR/components/` — **STOP if missing**
3. Create DEPLOY_DIR, REPORTS_DIR, and SCRIPTS_DIR if they do not exist
4. If DEPLOY_DIR already contains artifacts, ask user: **resume from last checkpoint or start fresh?**
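
Checks 1 to 3 above can be sketched as a small Bash function. This is a minimal illustration under assumptions, not part of the skill: the name `check_prereqs` and the `*.md` pattern used to recognize component specs are hypothetical.

```shell
#!/bin/bash
# Sketch of the prerequisite checks; paths follow the "Fixed paths" list.
# check_prereqs and the *.md glob for component specs are illustrative assumptions.
check_prereqs() {
  local root="${1:-.}"
  local plans_dir="$root/_docs/02_plans"
  local deploy_dir="$root/_docs/04_deploy"

  # 1. architecture.md must exist, otherwise stop.
  if [ ! -f "$plans_dir/architecture.md" ]; then
    echo "STOP: $plans_dir/architecture.md is missing; run /plan first" >&2
    return 1
  fi

  # 2. At least one component spec must exist.
  if ! find "$plans_dir/components" -type f -name '*.md' 2>/dev/null | grep -q .; then
    echo "STOP: no component specs in $plans_dir/components/" >&2
    return 1
  fi

  # 3. Create the output directories if they do not exist.
  mkdir -p "$deploy_dir/reports" "$root/scripts"
}
```

Check 4 (resume or start fresh) is a question for the user, so it is left out of the sketch.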

## Artifact Management

### Directory Structure

```
DEPLOY_DIR/
├── containerization.md
├── ci_cd_pipeline.md
├── environment_strategy.md
├── observability.md
├── deployment_procedures.md
├── deploy_scripts.md
└── reports/
    └── deploy_status_report.md

SCRIPTS_DIR/ (project root)
├── deploy.sh
├── pull-images.sh
├── start-services.sh
├── stop-services.sh
└── health-check.sh

.env (project root, git-ignored)
.env.example (project root, committed)
```

### Save Timing

| Step | Save immediately after | Filename |
|------|------------------------|----------|
| Step 1 | Status check & env setup complete | `reports/deploy_status_report.md` + `.env` + `.env.example` |
| Step 2 | Containerization plan complete | `containerization.md` |
| Step 3 | CI/CD pipeline defined | `ci_cd_pipeline.md` |
| Step 4 | Environment strategy documented | `environment_strategy.md` |
| Step 5 | Observability plan complete | `observability.md` |
| Step 6 | Deployment procedures documented | `deployment_procedures.md` |
| Step 7 | Deployment scripts created | `deploy_scripts.md` + scripts in `SCRIPTS_DIR/` |

### Resumability

If DEPLOY_DIR already contains artifacts:

1. List existing files and match to the save timing table
2. Identify the last completed step
3. Resume from the next incomplete step
4. Inform the user which steps are being skipped
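
One way to picture the resume logic is a function that walks the Save Timing filenames in order and reports the first step whose artifact is missing. The helper name `next_step` is hypothetical, and the sketch only tests that a file exists, not that its content is complete.

```shell
#!/bin/bash
# Sketch: find the first step whose primary artifact is missing.
# Filenames come from the Save Timing table; next_step is a hypothetical helper.
next_step() {
  local deploy_dir="${1:-_docs/04_deploy}"
  local step=1
  local artifact
  for artifact in \
    reports/deploy_status_report.md \
    containerization.md \
    ci_cd_pipeline.md \
    environment_strategy.md \
    observability.md \
    deployment_procedures.md \
    deploy_scripts.md
  do
    if [ ! -f "$deploy_dir/$artifact" ]; then
      echo "$step"
      return 0
    fi
    step=$((step + 1))
  done
  echo "done"
}
```

Calling `next_step _docs/04_deploy` prints the step number to resume from, or `done` when every artifact is present.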

## Progress Tracking

At the start of execution, create a TodoWrite with all steps (1 through 7). Update status as each step completes.

## Workflow

### Step 1: Deployment Status & Environment Setup

**Role**: DevOps / Platform engineer
**Goal**: Assess current deployment readiness, identify all required environment variables, and create `.env` files
**Constraints**: Must complete before any other step

1. Read architecture.md, all component specs, and restrictions.md
2. Assess deployment readiness:
   - List all components and their current state (planned / implemented / tested)
   - Identify external dependencies (databases, APIs, message queues, cloud services)
   - Identify infrastructure prerequisites (container registry, cloud accounts, DNS, SSL certificates)
   - Check if any deployment blockers exist
3. Identify all required environment variables by scanning:
   - Component specs for configuration needs
   - Database connection requirements
   - External API endpoints and credentials
   - Feature flags and runtime configuration
   - Container registry credentials
   - Cloud provider credentials
   - Monitoring/logging service endpoints
4. Generate `.env.example` in project root with all variables and placeholder values (committed to VCS)
5. Generate `.env` in project root with development defaults filled in where safe (git-ignored)
6. Ensure `.gitignore` includes `.env` (but NOT `.env.example`)
7. Produce a deployment status report summarizing readiness, blockers, and required setup
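
Item 6 is easy to get subtly wrong, because a substring match on `.env` also matches other entries. A minimal sketch, assuming a hypothetical helper named `ensure_env_ignored`, adds the entry only when an exact `.env` line is absent:

```shell
#!/bin/bash
# Sketch: add ".env" to .gitignore exactly once; ensure_env_ignored is a hypothetical helper.
ensure_env_ignored() {
  local gitignore="${1:-.gitignore}"
  touch "$gitignore"
  # -x matches the whole line and -F treats the pattern literally,
  # so entries such as ".env.local" do not count as a match.
  if ! grep -qxF '.env' "$gitignore"; then
    # Start on a fresh line in case the file lacks a trailing newline.
    if [ -s "$gitignore" ] && [ -n "$(tail -c 1 "$gitignore")" ]; then
      echo >> "$gitignore"
    fi
    echo '.env' >> "$gitignore"
  fi
}
```

Because the entry is the exact line `.env`, `.env.example` stays tracked, and running the function twice leaves the file unchanged.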

**Self-verification**:
- [ ] All components assessed for deployment readiness
- [ ] External dependencies catalogued
- [ ] Infrastructure prerequisites identified
- [ ] All required environment variables discovered
- [ ] `.env.example` created with placeholder values
- [ ] `.env` created with safe development defaults
- [ ] `.gitignore` updated to exclude `.env`
- [ ] Status report written to `reports/deploy_status_report.md`

**Save action**: Write `reports/deploy_status_report.md` using `templates/deploy_status_report.md`, create `.env` and `.env.example` in project root

**BLOCKING**: Present status report and environment variables to user. Do NOT proceed until confirmed.

---

### Step 2: Containerization

**Role**: DevOps / Platform engineer
**Goal**: Define Docker configuration for every component, local development, and integration test environments
**Constraints**: Plan only — no Dockerfile creation. Describe what each Dockerfile should contain.

1. Read architecture.md and all component specs
2. Read restrictions.md for infrastructure constraints
3. Research best Docker practices for the project's tech stack (multi-stage builds, base image selection, layer optimization)
4. For each component, define:
   - Base image (pinned version, prefer alpine/distroless for production)
   - Build stages (dependency install, build, production)
   - Non-root user configuration
   - Health check endpoint and command
   - Exposed ports
   - `.dockerignore` contents
5. Define `docker-compose.yml` for local development:
   - All application components
   - Database (Postgres) with named volume
   - Any message queues, caches, or external service mocks
   - Shared network
   - Environment variable files (`.env`)
6. Define `docker-compose.test.yml` for integration tests:
   - Application components under test
   - Test runner container (black-box, no internal imports)
   - Isolated database with seed data
   - All tests runnable via `docker compose -f docker-compose.test.yml up --abort-on-container-exit`
7. Define image tagging strategy: `<registry>/<project>/<component>:<git-sha>` for CI, `latest` for local dev only
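
To make the tagging format in item 7 concrete, here is a small sketch that assembles an image reference from its four parts. The helper name `image_ref` and the example values are placeholders, not names the skill defines.

```shell
#!/bin/bash
# Sketch: build an image reference in the <registry>/<project>/<component>:<git-sha> format.
# image_ref is a hypothetical helper; the example values are placeholders.
image_ref() {
  local registry="$1" project="$2" component="$3" sha="$4"
  printf '%s/%s/%s:%s\n' "$registry" "$project" "$component" "$sha"
}
```

In CI the last argument would typically come from the checkout, for example `image_ref ghcr.io org api "$(git rev-parse --short HEAD)"`.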

**Self-verification**:
- [ ] Every component has a Dockerfile specification
- [ ] Multi-stage builds specified for all production images
- [ ] Non-root user for all containers
- [ ] Health checks defined for every service
- [ ] docker-compose.yml covers all components + dependencies
- [ ] docker-compose.test.yml enables black-box integration testing
- [ ] `.dockerignore` defined

**Save action**: Write `containerization.md` using `templates/containerization.md`

**BLOCKING**: Present containerization plan to user. Do NOT proceed until confirmed.

---

### Step 3: CI/CD Pipeline

**Role**: DevOps engineer
**Goal**: Define the CI/CD pipeline with quality gates, security scanning, and multi-environment deployment
**Constraints**: Pipeline definition only — produce YAML specification, not implementation

1. Read architecture.md for tech stack and deployment targets
2. Read restrictions.md for CI/CD constraints (cloud provider, registry, etc.)
3. Research CI/CD best practices for the project's platform (GitHub Actions / Azure Pipelines)
4. Define pipeline stages:

| Stage | Trigger | Steps | Quality Gate |
|-------|---------|-------|-------------|
| **Lint** | Every push | Run linters per language (black, rustfmt, prettier, dotnet format) | Zero errors |
| **Test** | Every push | Unit tests, integration tests, coverage report | 75%+ coverage |
| **Security** | Every push | Dependency audit, SAST scan (Semgrep/SonarQube), image scan (Trivy) | Zero critical/high CVEs |
| **Build** | PR merge to dev | Build Docker images, tag with git SHA | Build succeeds |
| **Push** | After build | Push to container registry | Push succeeds |
| **Deploy Staging** | After push | Deploy to staging environment | Health checks pass |
| **Smoke Tests** | After staging deploy | Run critical path tests against staging | All pass |
| **Deploy Production** | Manual approval | Deploy to production | Health checks pass |

5. Define caching strategy: dependency caches, Docker layer caches, build artifact caches
6. Define parallelization: which stages can run concurrently
7. Define notifications: build failures, deployment status, security alerts
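
The Test stage's quality gate (75%+ coverage) comes down to a numeric comparison that fails the job. A hedged sketch follows; `coverage_gate` is a hypothetical helper, and extracting the percentage from a real coverage report depends on the tool and is not shown.

```shell
#!/bin/bash
# Sketch: fail the Test stage when coverage is below the threshold.
# coverage_gate is a hypothetical helper; how the percentage is extracted
# depends on the coverage tool and is not shown here.
coverage_gate() {
  local coverage="$1" threshold="${2:-75}"
  # awk handles decimal values such as 74.9, which shell integer tests cannot compare.
  if awk -v c="$coverage" -v t="$threshold" 'BEGIN { exit !(c + 0 >= t + 0) }'; then
    echo "coverage ${coverage}% meets the ${threshold}% gate"
  else
    echo "coverage ${coverage}% is below the ${threshold}% gate" >&2
    return 1
  fi
}
```

A non-zero return is what lets the CI platform mark the stage as failed.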

**Self-verification**:
- [ ] All pipeline stages defined with triggers and gates
- [ ] Coverage threshold enforced (75%+)
- [ ] Security scanning included (dependencies + images + SAST)
- [ ] Caching configured for dependencies and Docker layers
- [ ] Multi-environment deployment (staging → production)
- [ ] Rollback procedure referenced
- [ ] Notifications configured

**Save action**: Write `ci_cd_pipeline.md` using `templates/ci_cd_pipeline.md`

---

### Step 4: Environment Strategy

**Role**: Platform engineer
**Goal**: Define environment configuration, secrets management, and environment parity
**Constraints**: Strategy document — no secrets or credentials in output

1. Define environments:

| Environment | Purpose | Infrastructure | Data |
|-------------|---------|---------------|------|
| **Development** | Local developer workflow | docker-compose, local volumes | Seed data, mocks for external APIs |
| **Staging** | Pre-production validation | Mirrors production topology | Anonymized production-like data |
| **Production** | Live system | Full infrastructure | Real data |

2. Define environment variable management:
   - Reference `.env.example` created in Step 1
   - Per-environment variable sources (`.env` for dev, secret manager for staging/prod)
   - Validation: fail fast on missing required variables at startup
3. Define secrets management:
   - Never commit secrets to version control
   - Development: `.env` files (git-ignored)
   - Staging/Production: secret manager (AWS Secrets Manager / Azure Key Vault / Vault)
   - Rotation policy
4. Define database management per environment:
   - Development: Docker Postgres with named volume, seed data
   - Staging: managed Postgres, migrations applied via CI/CD
   - Production: managed Postgres, migrations require approval
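
The "fail fast on missing required variables" rule in item 2 could look like the following in Bash. The helper name `require_vars` is an assumption; the variable names passed to it would come from `.env.example`.

```shell
#!/bin/bash
# Sketch: fail fast when required variables are missing or empty.
# require_vars is a hypothetical helper; pass the names listed in .env.example.
require_vars() {
  local missing=0 name
  for name in "$@"; do
    # ${!name} is Bash indirect expansion: the value of the variable named by $name.
    if [ -z "${!name:-}" ]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Reporting every missing name before returning, rather than stopping at the first, saves a round trip when several variables are unset. Usage might be `require_vars DATABASE_URL REGISTRY_URL || exit 1` at the top of a startup script.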

**Self-verification**:
- [ ] All three environments defined with clear purpose
- [ ] Environment variable documentation complete (references `.env.example` from Step 1)
- [ ] No secrets in any output document
- [ ] Secret manager specified for staging/production
- [ ] Database strategy per environment

**Save action**: Write `environment_strategy.md` using `templates/environment_strategy.md`

---

### Step 5: Observability

**Role**: Site Reliability Engineer (SRE)
**Goal**: Define logging, metrics, tracing, and alerting strategy
**Constraints**: Strategy document — describe what to implement, not how to wire it

1. Read architecture.md and component specs for service boundaries
2. Research observability best practices for the tech stack

**Logging**:
- Structured JSON to stdout/stderr (no file logging in containers)
- Fields: `timestamp` (ISO 8601), `level`, `service`, `correlation_id`, `message`, `context`
- Levels: ERROR (exceptions), WARN (degraded), INFO (business events), DEBUG (diagnostics, dev only)
- No PII in logs
- Retention: dev = console, staging = 7 days, production = 30 days
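
For a sense of what one log line with these fields looks like, the sketch below prints a single JSON object to stdout. It is an illustration only: `log_json` and `SERVICE_NAME` are hypothetical names, and the escaping covers just backslashes and double quotes, so a real service would use its language's JSON encoder instead.

```shell
#!/bin/bash
# Sketch: emit one structured JSON log line to stdout.
# log_json and SERVICE_NAME are hypothetical names. Only backslashes and double
# quotes in the message are escaped; control characters are not handled.
log_json() {
  local level="$1" correlation_id="$2" message="$3" context="${4:-}"
  # Default to an empty JSON object when no context is supplied.
  [ -n "$context" ] || context='{}'

  message="${message//\\/\\\\}"   # escape backslashes first
  message="${message//\"/\\\"}"   # then escape double quotes

  printf '{"timestamp":"%s","level":"%s","service":"%s","correlation_id":"%s","message":"%s","context":%s}\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$level" "${SERVICE_NAME:-unknown}" \
    "$correlation_id" "$message" "$context"
}
```

A call such as `log_json INFO req-1 "order created" '{"order_id":42}'` yields one line that a log collector can parse without multi-line handling.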

**Metrics**:
- Expose Prometheus-compatible `/metrics` endpoint per service
- System metrics: CPU, memory, disk, network
- Application metrics: `request_count`, `request_duration` (histogram), `error_count`, `active_connections`
- Business metrics: derived from acceptance criteria
- Collection interval: 15s

**Distributed Tracing**:
- OpenTelemetry SDK integration
- Trace context propagation via HTTP headers and message queue metadata
- Span naming: `<service>.<operation>`
- Sampling: 100% in dev/staging, 10% in production (adjust based on volume)

**Alerting**:

| Severity | Response Time | Condition Examples |
|----------|---------------|-------------------|
| Critical | 5 min | Service down, data loss, health check failed |
| High | 30 min | Error rate > 5%, P95 latency > 2x baseline |
| Medium | 4 hours | Disk > 80%, elevated latency |
| Low | Next business day | Non-critical warnings |

**Dashboards**:
- Operations: service health, request rate, error rate, response time percentiles, resource utilization
- Business: key business metrics from acceptance criteria

**Self-verification**:
- [ ] Structured logging format defined with required fields
- [ ] Metrics endpoint specified per service
- [ ] OpenTelemetry tracing configured
- [ ] Alert severities with response times defined
- [ ] Dashboards cover operations and business metrics
- [ ] PII exclusion from logs addressed

**Save action**: Write `observability.md` using `templates/observability.md`

---

### Step 6: Deployment Procedures

**Role**: DevOps / Platform engineer
**Goal**: Define deployment strategy, rollback procedures, health checks, and deployment checklist
**Constraints**: Procedures document — no implementation

1. Define deployment strategy:
   - Preferred pattern: blue-green / rolling / canary (choose based on architecture)
   - Zero-downtime requirement for production
   - Graceful shutdown: 30-second grace period for in-flight requests
   - Database migration ordering: migrate before deploy, backward-compatible only

2. Define health checks:

| Check | Type | Endpoint | Interval | Threshold |
|-------|------|----------|----------|-----------|
| Liveness | HTTP GET | `/health/live` | 10s | 3 failures → restart |
| Readiness | HTTP GET | `/health/ready` | 5s | 3 failures → remove from LB |
| Startup | HTTP GET | `/health/ready` | 5s | 30 attempts max |
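
The Startup row (poll every 5s, give up after 30 attempts) maps onto a simple retry loop. The sketch below assumes `curl` is available on the machine running the check; `wait_healthy` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: poll a health endpoint until it responds or the attempt budget runs out.
# wait_healthy is a hypothetical helper; defaults mirror the Startup row
# (5s interval, 30 attempts).
wait_healthy() {
  local url="$1" attempts="${2:-30}" interval="${3:-5}"
  local i
  for ((i = 1; i <= attempts; i++)); do
    # -f makes curl return non-zero on HTTP 4xx/5xx; --max-time bounds each probe.
    if curl -fsS --max-time 5 "$url" > /dev/null 2>&1; then
      echo "healthy after $i attempt(s): $url"
      return 0
    fi
    sleep "$interval"
  done
  echo "unhealthy after $attempts attempts: $url" >&2
  return 1
}
```

For example, `wait_healthy http://localhost:8080/health/ready` blocks for at most roughly 30 × 5 seconds plus probe time; the host and port here are placeholders.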

3. Define rollback procedures:
   - Trigger criteria: health check failures, error rate spike, critical alert
   - Rollback steps: redeploy previous image tag, verify health, rollback database if needed
   - Communication: notify stakeholders during rollback
   - Post-mortem: required after every production rollback

4. Define deployment checklist:
   - [ ] All tests pass in CI
   - [ ] Security scan clean (zero critical/high CVEs)
   - [ ] Database migrations reviewed and tested
   - [ ] Environment variables configured
   - [ ] Health check endpoints responding
   - [ ] Monitoring alerts configured
   - [ ] Rollback plan documented and tested
   - [ ] Stakeholders notified

**Self-verification**:
- [ ] Deployment strategy chosen and justified
- [ ] Zero-downtime approach specified
- [ ] Health checks defined (liveness, readiness, startup)
- [ ] Rollback trigger criteria and steps documented
- [ ] Deployment checklist complete

**Save action**: Write `deployment_procedures.md` using `templates/deployment_procedures.md`

**BLOCKING**: Present deployment procedures to user. Do NOT proceed until confirmed.

---

### Step 7: Deployment Scripts

**Role**: DevOps / Platform engineer
**Goal**: Create executable deployment scripts for pulling Docker images and running services on the remote target machine
**Constraints**: Produce real, executable shell scripts. This is the ONLY step that creates implementation artifacts.

1. Read containerization.md and deployment_procedures.md from previous steps
2. Read `.env.example` for required variables
3. Create the following scripts in `SCRIPTS_DIR/`:

**`deploy.sh`** — Main deployment orchestrator:
- Validates that required environment variables are set (sources `.env` if present)
- Calls `pull-images.sh`, then `stop-services.sh`, then `start-services.sh`, then `health-check.sh`
- Exits with non-zero code on any failure
- Supports `--rollback` flag to redeploy previous image tags
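
The control flow described for `deploy.sh` can be sketched as a loop over the four child scripts that stops at the first failure. This is an outline under assumptions, not the generated script: `run_deploy` is a hypothetical function name, and `DEPLOY_ROLLBACK` is a hypothetical variable that the child scripts would need to honor for `--rollback` to have any effect.

```shell
#!/bin/bash
# Sketch of the deploy.sh control flow. SCRIPTS_DIR defaults to scripts/;
# DEPLOY_ROLLBACK is a hypothetical variable that the child scripts would have to honor.
run_deploy() {
  local scripts_dir="${SCRIPTS_DIR:-scripts}"
  local step

  if [ "${1:-}" = "--rollback" ]; then
    export DEPLOY_ROLLBACK=1
  fi

  # Load .env if present; set -a exports every variable it defines.
  if [ -f .env ]; then
    set -a
    . ./.env
    set +a
  fi

  for step in pull-images.sh stop-services.sh start-services.sh health-check.sh; do
    echo "==> $step"
    # Stop at the first failing step and propagate its exit code.
    "$scripts_dir/$step" || return $?
  done
}
```

Validation of required variables is omitted here; it would run after `.env` is loaded and before the loop.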

**`pull-images.sh`** — Pull Docker images to target machine:
- Reads image list and tags from environment or config
- Authenticates with container registry
- Pulls all required images
- Verifies image integrity (digest check)
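
A possible shape for `pull-images.sh` is sketched below. `REGISTRY_URL`, `REGISTRY_USER`, and `REGISTRY_PASS` are the variables from the status report template; `IMAGES` (a space-separated list of image references) and the function name `pull_images` are assumptions made for the example.

```shell
#!/bin/bash
# Sketch of pull-images.sh. REGISTRY_URL, REGISTRY_USER and REGISTRY_PASS come
# from .env; IMAGES (a space-separated list of image references) is a
# hypothetical variable.
pull_images() {
  local image digest

  # --password-stdin keeps the password out of the process list.
  printf '%s' "$REGISTRY_PASS" |
    docker login "$REGISTRY_URL" --username "$REGISTRY_USER" --password-stdin

  for image in $IMAGES; do
    docker pull "$image"
    # RepoDigests holds the registry content digest recorded for the pulled image.
    digest="$(docker image inspect --format '{{index .RepoDigests 0}}' "$image")"
    if [ -z "$digest" ]; then
      echo "no digest recorded for $image" >&2
      return 1
    fi
    echo "$image -> $digest"
  done
}
```

This sketch only records the digest. A stricter check would compare it with a digest published by CI, or reference images as `name@sha256:...` so that the pull itself enforces the content hash.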

**`start-services.sh`** — Start services on target machine:
- Runs `docker compose up -d` or individual `docker run` commands
- Applies environment variables from `.env`
- Configures networks and volumes
- Waits for containers to reach healthy state

**`stop-services.sh`** — Graceful shutdown:
- Stops services with graceful shutdown period
- Saves current image tags for rollback reference
- Cleans up orphaned containers/networks
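
The three `stop-services.sh` responsibilities could be covered with two Docker commands, as in this sketch. It assumes the services are managed with Docker Compose; `ROLLBACK_FILE` and `stop_services` are hypothetical names.

```shell
#!/bin/bash
# Sketch of stop-services.sh. ROLLBACK_FILE is a hypothetical path used to
# remember which image each running container was started from.
stop_services() {
  local rollback_file="${ROLLBACK_FILE:-.previous-images}"

  # Record "<container name>=<image>" for every running container.
  docker ps --format '{{.Names}}={{.Image}}' > "$rollback_file"

  # --timeout gives containers 30 seconds to exit before they are killed;
  # --remove-orphans cleans up containers no longer defined in the compose file.
  docker compose down --timeout 30 --remove-orphans
}
```

The 30-second timeout matches the grace period set in Step 6. The tags are saved before shutdown because `docker ps` lists only running containers.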

**`health-check.sh`** — Verify deployment health:
- Checks all health endpoints
- Reports status per service
- Returns non-zero if any service is unhealthy

4. All scripts must:
   - Target Bash (`#!/bin/bash` with `set -euo pipefail`) rather than strict POSIX `sh`
   - Source `.env` from project root or accept env vars from the environment
   - Include usage/help output (`--help` flag)
   - Be idempotent where possible
   - Handle SSH connection to remote target (configurable via `DEPLOY_HOST` env var)

5. Document all scripts in `deploy_scripts.md`
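
The requirements in item 4 suggest a shared preamble along these lines. It is a sketch under assumptions: `usage`, `load_env`, and `run_on_target` are hypothetical helper names, and treating `DEPLOY_HOST=localhost` as "run locally" is a choice made for the example.

```shell
#!/bin/bash
# Sketch of a preamble each script could share. usage, load_env and
# run_on_target are hypothetical helper names.
set -euo pipefail

usage() {
  echo "Usage: $(basename "$0") [--help]"
  echo "Reads configuration from .env or from the environment."
}

load_env() {
  local env_file="${1:-./.env}"
  if [ -f "$env_file" ]; then
    set -a   # export everything the file defines
    . "$env_file"
    set +a
  fi
}

run_on_target() {
  local host="${DEPLOY_HOST:-localhost}"
  if [ "$host" = "localhost" ]; then
    "$@"
  else
    # ssh joins its arguments into one remote command line,
    # so arguments containing spaces need extra quoting.
    ssh "$host" "$@"
  fi
}

if [ "${1:-}" = "--help" ]; then
  usage
  exit 0
fi
```

With this in place a script body can call, for instance, `run_on_target docker compose up -d` and behave the same whether the target is local or remote.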

**Self-verification**:
- [ ] All five scripts created and executable
- [ ] Scripts source environment variables correctly
- [ ] `deploy.sh` orchestrates the full flow
- [ ] `pull-images.sh` handles registry auth and image pull
- [ ] `start-services.sh` starts containers with correct config
- [ ] `stop-services.sh` handles graceful shutdown
- [ ] `health-check.sh` validates all endpoints
- [ ] Rollback supported via `deploy.sh --rollback`
- [ ] Scripts work for remote deployment via SSH (DEPLOY_HOST)
- [ ] `deploy_scripts.md` documents all scripts

**Save action**: Write scripts to `SCRIPTS_DIR/`, write `deploy_scripts.md` using `templates/deploy_scripts.md`

---

## Escalation Rules

| Situation | Action |
|-----------|--------|
| Unknown cloud provider or hosting | **ASK user** |
| Container registry not specified | **ASK user** |
| CI/CD platform preference unclear | **ASK user** — default to GitHub Actions |
| Secret manager not chosen | **ASK user** |
| Deployment pattern trade-offs | **ASK user** with recommendation |
| Missing architecture.md | **STOP** — run `/plan` first |
| Remote target machine details unknown | **ASK user** for SSH access, OS, and specs |

## Common Mistakes

- **Implementing during planning**: Steps 1–6 produce documents, not code (Step 7 is the exception — it creates scripts)
- **Hardcoding secrets**: never include real credentials in deployment documents or scripts
- **Ignoring integration test containerization**: the test environment must be containerized alongside the app
- **Skipping BLOCKING gates**: never proceed past a BLOCKING marker without user confirmation
- **Using `:latest` tags**: always pin base image versions; `latest` is acceptable only for local dev images
- **Forgetting observability**: logging, metrics, and tracing are deployment concerns, not post-deployment additions
- **Committing `.env`**: only `.env.example` goes to version control; `.env` must be in `.gitignore`
- **Non-portable scripts**: deployment scripts must work across environments; avoid hardcoded paths

## Methodology Quick Reference

```
┌────────────────────────────────────────────────────────────────┐
│              Deployment Planning (7-Step Method)               │
├────────────────────────────────────────────────────────────────┤
│ PREREQ: architecture.md + component specs exist                │
│                                                                │
│ 1. Status & Env     → reports/deploy_status_report.md          │
│                       + .env + .env.example                    │
│    [BLOCKING: user confirms status & env vars]                 │
│ 2. Containerization → containerization.md                      │
│    [BLOCKING: user confirms Docker plan]                       │
│ 3. CI/CD Pipeline   → ci_cd_pipeline.md                        │
│ 4. Environment      → environment_strategy.md                  │
│ 5. Observability    → observability.md                         │
│ 6. Procedures       → deployment_procedures.md                 │
│    [BLOCKING: user confirms deployment plan]                   │
│ 7. Scripts          → deploy_scripts.md + scripts/             │
├────────────────────────────────────────────────────────────────┤
│ Principles: Docker-first · IaC · Observability built-in        │
│             Environment parity · Save immediately              │
└────────────────────────────────────────────────────────────────┘
```
|
||||
@@ -0,0 +1,87 @@
# CI/CD Pipeline Template

Save as `_docs/04_deploy/ci_cd_pipeline.md`.

---

```markdown
# [System Name] — CI/CD Pipeline

## Pipeline Overview

| Stage | Trigger | Quality Gate |
|-------|---------|-------------|
| Lint | Every push | Zero lint errors |
| Test | Every push | 75%+ coverage, all tests pass |
| Security | Every push | Zero critical/high CVEs |
| Build | PR merge to dev | Docker build succeeds |
| Push | After build | Images pushed to registry |
| Deploy Staging | After push | Health checks pass |
| Smoke Tests | After staging deploy | Critical paths pass |
| Deploy Production | Manual approval | Health checks pass |

## Stage Details

### Lint
- [Language-specific linters and formatters]
- Runs in parallel per language

### Test
- Unit tests: [framework and command]
- Integration tests: [framework and command, uses docker-compose.test.yml]
- Coverage threshold: 75% overall, 90% critical paths
- Coverage report published as pipeline artifact

### Security
- Dependency audit: [tool, e.g., npm audit / pip-audit / dotnet list package --vulnerable]
- SAST scan: [tool, e.g., Semgrep / SonarQube]
- Image scan: Trivy on built Docker images
- Block on: critical or high severity findings

### Build
- Docker images built using multi-stage Dockerfiles
- Tagged with git SHA: `<registry>/<project>/<component>:<git-sha>`
- Build cache: Docker layer cache via CI cache action

### Push
- Registry: [container registry URL]
- Authentication: [method]

### Deploy Staging
- Deployment method: [docker compose / Kubernetes / cloud service]
- Pre-deploy: run database migrations
- Post-deploy: verify health check endpoints
- Automated rollback on health check failure

### Smoke Tests
- Subset of integration tests targeting staging environment
- Validates critical user flows
- Timeout: [maximum duration]

### Deploy Production
- Requires manual approval via [mechanism]
- Deployment strategy: [blue-green / rolling / canary]
- Pre-deploy: database migration review
- Post-deploy: health checks + monitoring for 15 min

## Caching Strategy

| Cache | Key | Restore Keys |
|-------|-----|-------------|
| Dependencies | [lockfile hash] | [partial match] |
| Docker layers | [Dockerfile hash] | [partial match] |
| Build artifacts | [source hash] | [partial match] |

## Parallelization

[Diagram or description of which stages run concurrently]

## Notifications

| Event | Channel | Recipients |
|-------|---------|-----------|
| Build failure | [Slack/email] | [team] |
| Security alert | [Slack/email] | [team + security] |
| Deploy success | [Slack] | [team] |
| Deploy failure | [Slack/email + PagerDuty] | [on-call] |
```
|
||||
@@ -0,0 +1,94 @@
# Containerization Plan Template

Save as `_docs/04_deploy/containerization.md`.

---

```markdown
# [System Name] — Containerization

## Component Dockerfiles

### [Component Name]

| Property | Value |
|----------|-------|
| Base image | [e.g., mcr.microsoft.com/dotnet/aspnet:8.0-alpine] |
| Build image | [e.g., mcr.microsoft.com/dotnet/sdk:8.0-alpine] |
| Stages | [dependency install → build → production] |
| User | [non-root user name] |
| Health check | [endpoint and command] |
| Exposed ports | [port list] |
| Key build args | [if any] |

### [Repeat for each component]

## Docker Compose — Local Development

```yaml
# docker-compose.yml structure
services:
  [component]:
    build: ./[path]
    ports: ["host:container"]
    environment: [reference .env]
    depends_on: [dependencies with health condition]
    healthcheck: [command, interval, timeout, retries]

  db:
    image: [postgres:version-alpine]
    volumes: [named volume]
    environment: [credentials from .env]
    healthcheck: [pg_isready]

volumes:
  [named volumes]

networks:
  [shared network]
```

## Docker Compose — Integration Tests

```yaml
# docker-compose.test.yml structure
services:
  [app components under test]

  test-runner:
    build: ./tests/integration
    depends_on: [app components with health condition]
    environment: [test configuration]
    # Exit code determines test pass/fail

  db:
    image: [postgres:version-alpine]
    volumes: [seed data mount]
```

Run: `docker compose -f docker-compose.test.yml up --abort-on-container-exit`

## Image Tagging Strategy

| Context | Tag Format | Example |
|---------|-----------|---------|
| CI build | `<registry>/<project>/<component>:<git-sha>` | `ghcr.io/org/api:a1b2c3d` |
| Release | `<registry>/<project>/<component>:<semver>` | `ghcr.io/org/api:1.2.0` |
| Local dev | `<component>:latest` | `api:latest` |

## .dockerignore

```
.git
.cursor
_docs
_standalone
node_modules
**/bin
**/obj
**/__pycache__
*.md
.env*
docker-compose*.yml
```
```
|
||||
@@ -0,0 +1,73 @@
|
||||
# Deployment Status Report Template
|
||||
|
||||
Save as `_docs/04_deploy/reports/deploy_status_report.md`.
|
||||
|
||||
---
|
||||
|
||||
```markdown
|
||||
# [System Name] — Deployment Status Report
|
||||
|
||||
## Deployment Readiness Summary
|
||||
|
||||
| Aspect | Status | Notes |
|
||||
|--------|--------|-------|
|
||||
| Architecture defined | ✅ / ❌ | |
|
||||
| Component specs complete | ✅ / ❌ | |
|
||||
| Infrastructure prerequisites met | ✅ / ❌ | |
|
||||
| External dependencies identified | ✅ / ❌ | |
|
||||
| Blockers | [count] | [summary] |
|
||||
|
||||
## Component Status
|
||||
|
||||
| Component | State | Docker-ready | Notes |
|
||||
|-----------|-------|-------------|-------|
|
||||
| [Component 1] | planned / implemented / tested | yes / no | |
|
||||
| [Component 2] | planned / implemented / tested | yes / no | |
|
||||
|
||||
## External Dependencies

| Dependency | Type | Required For | Status |
|------------|------|-------------|--------|
| [e.g., PostgreSQL] | Database | Data persistence | [available / needs setup] |
| [e.g., Redis] | Cache | Session management | [available / needs setup] |
| [e.g., External API] | API | [purpose] | [available / needs setup] |

## Infrastructure Prerequisites

| Prerequisite | Status | Action Needed |
|-------------|--------|--------------|
| Container registry | [ready / not set up] | [action] |
| Cloud account | [ready / not set up] | [action] |
| DNS configuration | [ready / not set up] | [action] |
| SSL certificates | [ready / not set up] | [action] |
| CI/CD platform | [ready / not set up] | [action] |
| Secret manager | [ready / not set up] | [action] |

## Deployment Blockers

| Blocker | Severity | Resolution |
|---------|----------|-----------|
| [blocker description] | critical / high / medium | [resolution steps] |

## Required Environment Variables

| Variable | Purpose | Required In | Default (Dev) | Source (Staging/Prod) |
|----------|---------|------------|---------------|----------------------|
| `DATABASE_URL` | Postgres connection string | All components | `postgres://dev:dev@db:5432/app` | Secret manager |
| `DEPLOY_HOST` | Remote target machine | Deployment scripts | `localhost` | Environment |
| `REGISTRY_URL` | Container registry URL | CI/CD, deploy scripts | `localhost:5000` | Environment |
| `REGISTRY_USER` | Registry username | CI/CD, deploy scripts | — | Secret manager |
| `REGISTRY_PASS` | Registry password | CI/CD, deploy scripts | — | Secret manager |
| [add all required variables] | | | | |

## .env Files Created

- `.env.example` — committed to VCS, contains all variable names with placeholder values
- `.env` — git-ignored, contains development defaults

## Next Steps

1. [Resolve any blockers listed above]
2. [Set up missing infrastructure prerequisites]
3. [Proceed to containerization planning]
```
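
Because `.env.example` is meant to list every variable name while `.env` holds the development defaults, the two files can drift apart. A minimal sketch of a drift check follows; the function names are hypothetical and the parser only handles simple `NAME=value` lines, which is an assumption about how these files are written.

```python
def variable_names(env_text):
    """Collect variable names from dotenv-style text (simple NAME=value lines)."""
    names = set()
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name = line.split("=", 1)[0].strip()
        if name.startswith("export "):
            name = name[len("export "):].strip()
        names.add(name)
    return names


def missing_from_env(example_text, env_text):
    """Return names declared in .env.example but absent from .env, sorted."""
    return sorted(variable_names(example_text) - variable_names(env_text))


if __name__ == "__main__":
    example = "# Copy to .env\nDATABASE_URL=postgres://user:pass@host:5432/dbname\nREGISTRY_URL=\n"
    local = "DATABASE_URL=postgres://dev:dev@db:5432/app\n"
    print(missing_from_env(example, local))
```

A check like this could run before the status report is written, so the "Required Environment Variables" table and the files on disk stay consistent.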
@@ -0,0 +1,103 @@
# Deployment Procedures Template

Save as `_docs/04_deploy/deployment_procedures.md`.

---

```markdown
# [System Name] — Deployment Procedures

## Deployment Strategy

**Pattern**: [blue-green / rolling / canary]
**Rationale**: [why this pattern fits the architecture]
**Zero-downtime**: required for production deployments

### Graceful Shutdown

- Grace period: 30 seconds for in-flight requests
- Sequence: stop accepting new requests → drain connections → shutdown
- Container orchestrator: `terminationGracePeriodSeconds: 40`

### Database Migration Ordering

- Migrations run **before** new code deploys
- All migrations must be backward-compatible (old code works with new schema)
- Irreversible migrations require explicit approval

## Health Checks

| Check | Type | Endpoint | Interval | Failure Threshold | Action |
|-------|------|----------|----------|-------------------|--------|
| Liveness | HTTP GET | `/health/live` | 10s | 3 failures | Restart container |
| Readiness | HTTP GET | `/health/ready` | 5s | 3 failures | Remove from load balancer |
| Startup | HTTP GET | `/health/ready` | 5s | 30 attempts | Kill and recreate |

### Health Check Responses

- `/health/live`: returns 200 if process is running (no dependency checks)
- `/health/ready`: returns 200 if all dependencies (DB, cache, queues) are reachable

## Staging Deployment

1. CI/CD builds and pushes Docker images tagged with git SHA
2. Run database migrations against staging
3. Deploy new images to staging environment
4. Wait for health checks to pass (readiness probe)
5. Run smoke tests against staging
6. If smoke tests fail: automatic rollback to previous image

## Production Deployment

1. **Approval**: manual approval required via [mechanism]
2. **Pre-deploy checks**:
   - [ ] Staging smoke tests passed
   - [ ] Security scan clean
   - [ ] Database migration reviewed
   - [ ] Monitoring alerts configured
   - [ ] Rollback plan confirmed
3. **Deploy**: apply deployment strategy (blue-green / rolling / canary)
4. **Verify**: health checks pass, error rate stable, latency within baseline
5. **Monitor**: observe dashboards for 15 minutes post-deploy
6. **Finalize**: mark deployment as successful or trigger rollback

## Rollback Procedures

### Trigger Criteria

- Health check failures persist after deploy
- Error rate exceeds 5% for more than 5 minutes
- Critical alert fires within 15 minutes of deploy
- Manual decision by on-call engineer

### Rollback Steps

1. Redeploy previous Docker image tag (from CI/CD artifact)
2. Verify health checks pass
3. If database migration was applied:
   - Run DOWN migration if reversible
   - If irreversible: assess data impact, escalate if needed
4. Notify stakeholders
5. Schedule post-mortem within 24 hours

### Post-Mortem

Required after every production rollback:
- Timeline of events
- Root cause
- What went wrong
- Prevention measures

## Deployment Checklist

- [ ] All tests pass in CI
- [ ] Security scan clean (zero critical/high CVEs)
- [ ] Docker images built and pushed
- [ ] Database migrations reviewed and tested
- [ ] Environment variables configured for target environment
- [ ] Health check endpoints verified
- [ ] Monitoring alerts configured
- [ ] Rollback plan documented and tested
- [ ] Stakeholders notified of deployment window
- [ ] On-call engineer available during deployment
```
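
The rollback trigger "error rate exceeds 5% for more than 5 minutes" from the template above can be evaluated mechanically. The sketch below assumes error-rate samples arrive as `(timestamp_seconds, error_rate)` pairs in ascending time order; the function name and the sample format are illustrative assumptions, not part of any monitoring tool's API.

```python
def should_roll_back(samples, threshold=0.05, window_seconds=300):
    """Return True if error_rate stayed above threshold for longer than the window.

    samples: iterable of (timestamp_seconds, error_rate), sorted by timestamp.
    """
    breach_started = None
    for timestamp, error_rate in samples:
        if error_rate > threshold:
            # Remember when the current run of bad samples began.
            if breach_started is None:
                breach_started = timestamp
            if timestamp - breach_started > window_seconds:
                return True
        else:
            # Any healthy sample resets the streak.
            breach_started = None
    return False


if __name__ == "__main__":
    sustained = [(t, 0.06) for t in range(0, 361, 60)]
    print(should_roll_back(sustained))
```

Resetting the streak on any healthy sample means a brief spike does not trigger a rollback on its own; only a sustained breach does, which matches the wording of the criterion.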
@@ -0,0 +1,61 @@
# Environment Strategy Template

Save as `_docs/04_deploy/environment_strategy.md`.

---

```markdown
# [System Name] — Environment Strategy

## Environments

| Environment | Purpose | Infrastructure | Data Source |
|-------------|---------|---------------|-------------|
| Development | Local developer workflow | docker-compose | Seed data, mocked externals |
| Staging | Pre-production validation | [mirrors production] | Anonymized production-like data |
| Production | Live system | [full infrastructure] | Real data |

## Environment Variables

### Required Variables

| Variable | Purpose | Dev Default | Staging/Prod Source |
|----------|---------|-------------|-------------------|
| `DATABASE_URL` | Postgres connection | `postgres://dev:dev@db:5432/app` | Secret manager |
| [add all required variables] | | | |

### `.env.example`

```env
# Copy to .env and fill in values
DATABASE_URL=postgres://user:pass@host:5432/dbname
# [all required variables with placeholder values]
```

### Variable Validation

All services validate required environment variables at startup and fail fast with a clear error message if any are missing.

## Secrets Management

| Environment | Method | Tool |
|-------------|--------|------|
| Development | `.env` file (git-ignored) | dotenv |
| Staging | Secret manager | [AWS Secrets Manager / Azure Key Vault / Vault] |
| Production | Secret manager | [AWS Secrets Manager / Azure Key Vault / Vault] |

Rotation policy: [frequency and procedure]

## Database Management

| Environment | Type | Migrations | Data |
|-------------|------|-----------|------|
| Development | Docker Postgres, named volume | Applied on container start | Seed data via init script |
| Staging | Managed Postgres | Applied via CI/CD pipeline | Anonymized production snapshot |
| Production | Managed Postgres | Applied via CI/CD with approval | Live data |

Migration rules:
- All migrations must be backward-compatible (support old and new code simultaneously)
- Every migration ships with a DOWN/rollback script; an irreversible migration requires explicit approval
- Production migrations require review before apply
```
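
The "Variable Validation" rule in the template above (validate at startup, fail fast with a clear message) could look roughly like this in a Python service. This is a sketch under assumptions: the `validate_env` function and the `MissingEnvironmentError` class are hypothetical names, and empty values are treated as missing.

```python
import os


class MissingEnvironmentError(RuntimeError):
    """Raised at startup when required environment variables are absent."""


def validate_env(required, environ=None):
    """Return the required variables as a dict, or raise listing every missing one."""
    environ = os.environ if environ is None else environ
    # Treat unset and empty values the same way: both count as missing.
    missing = [name for name in required if not environ.get(name)]
    if missing:
        raise MissingEnvironmentError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return {name: environ[name] for name in required}


if __name__ == "__main__":
    try:
        config = validate_env(["DATABASE_URL"])
        print("configuration OK:", sorted(config))
    except MissingEnvironmentError as exc:
        raise SystemExit(str(exc))
```

Reporting all missing names in one error, rather than stopping at the first, saves a redeploy cycle per forgotten variable.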
@@ -0,0 +1,132 @@
# Observability Template

Save as `_docs/04_deploy/observability.md`.

---

```markdown
# [System Name] — Observability

## Logging

### Format

Structured JSON to stdout/stderr. No file-based logging in containers.

```json
{
  "timestamp": "ISO8601",
  "level": "INFO",
  "service": "service-name",
  "correlation_id": "uuid",
  "message": "Event description",
  "context": {}
}
```

### Log Levels

| Level | Usage | Example |
|-------|-------|---------|
| ERROR | Exceptions, failures requiring attention | Database connection failed |
| WARN | Potential issues, degraded performance | Retry attempt 2/3 |
| INFO | Significant business events | User registered, Order placed |
| DEBUG | Detailed diagnostics (dev/staging only) | Request payload, Query params |

### Retention

| Environment | Destination | Retention |
|-------------|-------------|-----------|
| Development | Console | Session |
| Staging | [log aggregator] | 7 days |
| Production | [log aggregator] | 30 days |

### PII Rules

- Never log passwords, tokens, or session IDs
- Mask email addresses and personal identifiers
- Log user IDs (opaque) instead of usernames

## Metrics

### Endpoints

Every service exposes Prometheus-compatible metrics at `/metrics`.

### Application Metrics

| Metric | Type | Description |
|--------|------|-------------|
| `request_count` | Counter | Total HTTP requests by method, path, status |
| `request_duration_seconds` | Histogram | Response time by method, path |
| `error_count` | Counter | Failed requests by type |
| `active_connections` | Gauge | Current open connections |

### System Metrics

- CPU usage, Memory usage, Disk I/O, Network I/O

### Business Metrics

| Metric | Type | Description | Source |
|--------|------|-------------|--------|
| [from acceptance criteria] | | | |

Collection interval: 15 seconds

## Distributed Tracing

### Configuration

- SDK: OpenTelemetry
- Propagation: W3C Trace Context via HTTP headers
- Span naming: `<service>.<operation>`

### Sampling

| Environment | Rate | Rationale |
|-------------|------|-----------|
| Development | 100% | Full visibility |
| Staging | 100% | Full visibility |
| Production | 10% | Balance cost vs observability |

### Integration Points

- HTTP requests: automatic instrumentation
- Database queries: automatic instrumentation
- Message queues: manual span creation on publish/consume

## Alerting

| Severity | Response Time | Conditions |
|----------|---------------|-----------|
| Critical | 5 min | Service unreachable, health check failed for 1 min, data loss detected |
| High | 30 min | Error rate > 5% for 5 min, P95 latency > 2x baseline for 10 min |
| Medium | 4 hours | Disk usage > 80%, elevated latency, connection pool exhaustion |
| Low | Next business day | Non-critical warnings, deprecated API usage |

### Notification Channels

| Severity | Channel |
|----------|---------|
| Critical | [PagerDuty / phone] |
| High | [Slack + email] |
| Medium | [Slack] |
| Low | [Dashboard only] |

## Dashboards

### Operations Dashboard

- Service health status (up/down per component)
- Request rate and error rate
- Response time percentiles (P50, P95, P99)
- Resource utilization (CPU, memory per container)
- Active alerts

### Business Dashboard

- [Key business metrics from acceptance criteria]
- [User activity indicators]
- [Transaction volumes]
```
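
The structured log format in the template above maps naturally onto Python's standard `logging` module. The sketch below is one possible shape, not a prescribed implementation: the `JsonFormatter` class and `build_logger` helper are hypothetical names, and `correlation_id` and `context` are assumed to be supplied through the `extra` argument of the logging call.

```python
import json
import logging
import sys
from datetime import datetime, timezone

# Python names this level "WARNING"; the template's level table uses "WARN".
LEVEL_NAMES = {"WARNING": "WARN"}


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object with the template's six fields."""

    def __init__(self, service):
        super().__init__()
        self.service = service

    def format(self, record):
        payload = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": LEVEL_NAMES.get(record.levelname, record.levelname),
            "service": self.service,
            # Assumed to be passed via extra={...}; absent values become null / {}.
            "correlation_id": getattr(record, "correlation_id", None),
            "message": record.getMessage(),
            "context": getattr(record, "context", {}),
        }
        return json.dumps(payload)


def build_logger(service, level=logging.INFO):
    """Create a logger that writes JSON lines to stdout (no file handlers)."""
    logger = logging.getLogger(service)
    logger.setLevel(level)
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter(service))
    logger.handlers = [handler]
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = build_logger("api")
    log.warning("Retry attempt %d/3", 2, extra={"correlation_id": "req-123", "context": {"attempt": 2}})
```

Keeping the formatter separate from the logger setup lets each service reuse the same field layout while choosing its own level per environment, for example DEBUG in dev and staging only.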