admin/_docs/04_deploy/deploy_scripts.md
Oleksandr Bezdieniezhnykh f369153149 [AZ-552] [AZ-553] [AZ-554] [AZ-555] Cycle-2 hotfix: deploy/infra chain
Batch 5 (cycle 2 hotfix sprint, batch 1 of 2). 6 story points under epic
AZ-530. Addresses 2 Critical + 2 High deploy-blocking findings from
security_report_cycle2.md (F-INFRA-1..F-INFRA-4).

AZ-552 — drop_jwt_secret_deploy_preflight (1 pt, F-INFRA-1 Critical)
  scripts/start-services.sh swaps obsolete JwtConfig__Secret preflight
  for the cycle-2 trio (KeysFolder + ActiveKid + DataProtection.KeysFolder).
  .env.example, env/api/env.ps1, _docs/04_deploy/* updated to match. Repo
  scan in scripts/ and .env.example returns 0 offenders.

AZ-553 — bind_mount_es256_keys (2 pts, F-INFRA-2 Critical)
  start-services.sh bind-mounts DEPLOY_HOST_JWT_KEYS_DIR read-only at
  /etc/azaion/jwt-keys; preflight fails fast on a missing or empty host
  directory with operator-actionable error messages.

AZ-554 — persist_dataprotection_keys (2 pts, F-INFRA-3 High)
  Program.cs DataProtection wiring now fails fast in Production when
  KeysFolder is unset OR not probe-writable. start-services.sh bind-mounts
  DEPLOY_HOST_DP_KEYS_DIR read-write at /var/lib/azaion/dp-keys.
  Development behaviour unchanged (ephemeral default).

AZ-555 — secrets_readme_es256_rewrite (1 pt, F-INFRA-4 High)
  secrets/README.md schema fully rewritten; new "Host-side directories"
  subsection with bind-mount table + ownership/permission guidance.
  Cycle-1 JwtConfig__Secret removed from live schema (one prose
  deprecation paragraph retained).

Adjacent hygiene
  module-layout.md "Owns" extended to include scripts/, secrets/, env/,
  .env.example (gap from Step 9 new-task layout-delta).

Tests
  e2e/Azaion.E2E/Tests/Cycle2HotfixDeployTests.cs — 19 facts (8 exec,
  11 Skip with rationale per AZ-537/AZ-538 precedent). Skipped tests
  cover preflight/restart/Production-only paths verified at deploy gate.

Build: 0W 0E across Azaion.AdminApi + Azaion.E2E.
Test run deferred to autodev Step 11 (Run Tests).
Tracker transition deferred to next batch (MCP availability unverified
in this session — Leftovers pattern).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-14 09:35:57 +03:00


Azaion Admin API — Deployment Scripts

Date: 2026-05-13 · Cycle: 1 · Status: shipped (this is the only doc that matches concrete files in scripts/ and secrets/).

1. Overview

| Script | Purpose | Location |
| --- | --- | --- |
| deploy.sh | Main orchestrator (pull → stop → start → health) | scripts/deploy.sh |
| pull-images.sh | docker login + docker pull the target image | scripts/pull-images.sh |
| stop-services.sh | Graceful stop + record rollback target | scripts/stop-services.sh |
| start-services.sh | docker run with the materialized env file and bind mounts | scripts/start-services.sh |
| health-check.sh | Poll /health/ready until 200 or timeout | scripts/health-check.sh |
| smoke.sh | 6 critical-path checks against the public URL | scripts/smoke.sh |
| _lib.sh | Shared logging + env-overlay helpers | scripts/_lib.sh (sourced, not executed) |
| run-tests.sh | Existing: runs the docker-compose test suite locally | scripts/run-tests.sh |
| run-performance-tests.sh | Existing: runs k6 against the test compose stack | scripts/run-performance-tests.sh |

2. Prerequisites

On the deploy host:

| Requirement | Why |
| --- | --- |
| Docker 24+ | docker pull, docker run, --restart unless-stopped |
| sops (≥ 3.8) | Decrypt secrets/<env>.env |
| age (≥ 1.1) | Backing crypto for sops |
| curl | Used by health-check.sh and smoke.sh |
| jq | Used by smoke.sh for JSON parsing |
| /etc/azaion/age.key (mode 0400) | Per-host age private key (see secrets/README.md) |

On the operator's machine (only for smoke.sh):

| Requirement | Why |
| --- | --- |
| curl, jq | Same as host |
| Network access to the public URL | BASE_URL is the production / staging hostname |

3. Environment Variables

scripts/_lib.sh load_env_overlay <env> resolves variables in this order (later sources override earlier):

  1. <repo>/.env (if present — local-dev convenience; harmless on a prod host that has no .env)
  2. secrets/<env>.public.env (committed plain text; loaded with set -a)
  3. secrets/<env>.env (sops-decrypted to a tempfile, sourced, tempfile deleted on exit)
  4. The shell environment that invoked deploy.sh (operator overrides)
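
The file layers in the list above can be illustrated with a minimal sketch. It assumes each layer is a plain KEY=VALUE file; `source_layers` is a hypothetical name, not the real `load_env_overlay`, and the port values are made up for the demo. The sops decryption step and the operator-shell override (layer 4) are deliberately left out.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not the real load_env_overlay): source each existing
# KEY=VALUE file in order and export what it defines, so later files win.
source_layers() {
  local layer
  for layer in "$@"; do
    [ -f "$layer" ] || continue   # absent layers (e.g. no .env on a prod host) are skipped
    set -a                        # auto-export every assignment made while sourcing
    # shellcheck disable=SC1090
    . "$layer"
    set +a
  done
}

# Demo with throwaway files standing in for .env and secrets/<env>.public.env.
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
printf 'DEPLOY_HOST_PORT=5000\nLOCAL_ONLY=1\n' > "$demo_dir/dotenv"
printf 'DEPLOY_HOST_PORT=8080\n' > "$demo_dir/public.env"

source_layers "$demo_dir/dotenv" "$demo_dir/missing.env" "$demo_dir/public.env"
echo "DEPLOY_HOST_PORT=$DEPLOY_HOST_PORT LOCAL_ONLY=$LOCAL_ONLY"
```

The second file overrides `DEPLOY_HOST_PORT`, while `LOCAL_ONLY`, defined only in the first file, survives.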

The complete variable inventory is .env.example at the repo root. Variables specifically consumed by these scripts:

| Variable | Required by | Source | Notes |
| --- | --- | --- | --- |
| ENV | deploy.sh | operator shell | staging or production |
| REGISTRY_HOST, REGISTRY_IMAGE, REGISTRY_TAG | pull / start | public env / operator | tag is the <sha12>-<arch> immutable tag from .woodpecker/02-build-push.yml |
| REGISTRY_USER, REGISTRY_TOKEN | pull | encrypted env | optional; if both missing, assumes docker login was done out-of-band |
| DEPLOY_CONTAINER_NAME, DEPLOY_HOST_PORT, DEPLOY_HOST_CONTENT_DIR, DEPLOY_HOST_LOGS_DIR | stop / start | public env | identical for staging and prod by default |
| ASPNETCORE_ConnectionStrings__AzaionDb, __AzaionDbAdmin | start | encrypted env | the API fail-fast checks these on boot |
| ASPNETCORE_JwtConfig__KeysFolder, __ActiveKid (AZ-552/AZ-553) | start | public env | container-side path to the ES256 PEMs + active kid; preflight + JwtSigningKeyProvider fail-fast if unset |
| ASPNETCORE_DataProtection__KeysFolder (AZ-554) | start | public env | container-side path to the persisted DataProtection key ring; Production fail-fast if unset |
| DEPLOY_HOST_JWT_KEYS_DIR, DEPLOY_HOST_DP_KEYS_DIR (AZ-553/AZ-554) | start | host env / public env | host-side directories bind-mounted into the container (JWT keys RO; DP keys RW) |
| ASPNETCORE_ResourcesConfig__*, JwtConfig__{Issuer,Audience,AccessTokenLifetimeMinutes} | start | public env (defaults from appsettings.json) | only override if the env value differs from the appsettings default |
| SOPS_AGE_KEY_FILE | _lib.sh | host | defaults to /etc/azaion/age.key if unset |
| SMOKE_ADMIN_EMAIL, SMOKE_ADMIN_PASSWORD | smoke.sh | operator shell | dedicated smoke-test admin user; rotate as a regular admin password |
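
A preflight over the AZ-552/AZ-553 variables could look roughly like the sketch below. The function name `preflight` and the error wording are assumptions, not the shipped start-services.sh. It checks only the behaviour described here: the three container-side settings must be set, and the host JWT keys directory must exist and be non-empty.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical preflight: report every problem found, return non-zero if any.
preflight() {
  local failed=0 var dir
  for var in ASPNETCORE_JwtConfig__KeysFolder \
             ASPNETCORE_JwtConfig__ActiveKid \
             ASPNETCORE_DataProtection__KeysFolder; do
    # ${!var:-} reads the variable whose name is stored in $var.
    if [ -z "${!var:-}" ]; then
      echo "preflight: $var is unset" >&2
      failed=1
    fi
  done

  dir=${DEPLOY_HOST_JWT_KEYS_DIR:-}
  if [ -z "$dir" ] || [ ! -d "$dir" ]; then
    echo "preflight: DEPLOY_HOST_JWT_KEYS_DIR is unset or not a directory" >&2
    failed=1
  elif [ -z "$(ls -A "$dir")" ]; then
    echo "preflight: $dir is empty; place the ES256 PEM files there" >&2
    failed=1
  fi
  return "$failed"
}
```

Collecting all failures before returning lets the operator fix everything in one pass rather than one error per run.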

4. Script details

deploy.sh

Usage:

ENV=staging   ./scripts/deploy.sh <sha-tag>
ENV=production ./scripts/deploy.sh <sha-tag>
ENV=staging   ./scripts/deploy.sh --rollback   # uses scripts/.previous_tags.env
./scripts/deploy.sh --help

Flow (matches _docs/04_deploy/deployment_procedures.md §3 / §4):

  1. Validate ENV and required commands.
  2. Load env overlay (public + sops-decrypted).
  3. If --rollback: read scripts/.previous_tags.env → set SHA_TAG to PREVIOUS_SHA_TAG.
  4. pull-images.sh (login + pull).
  5. stop-services.sh (records the SHA of whatever was running; graceful stop with docker stop -t 40; remove).
  6. start-services.sh (docker run --restart unless-stopped --env-file <materialized> --publish $DEPLOY_HOST_PORT:8080).
  7. health-check.sh (poll /health/ready with timeout).
  8. Print success line with the running revision.

Failure handling: any non-zero exit from a sub-script aborts deploy.sh (because set -euo pipefail propagates). The previously-recorded SHA in .previous_tags.env is unchanged, so --rollback after a failed deploy targets the version that was running BEFORE the failed attempt.
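
Steps 1 and 3 of the flow can be sketched as two small functions. `validate_env` and `resolve_sha_tag` are hypothetical names chosen for illustration; the real deploy.sh may structure this differently.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper for step 1: only the two documented environments pass.
validate_env() {
  case "${1:-}" in
    staging|production) return 0 ;;
    *) echo "ENV must be staging or production" >&2; return 1 ;;
  esac
}

# Hypothetical helper for step 3: print the tag to deploy. A positional tag
# is used as-is; --rollback reads PREVIOUS_SHA_TAG from the tags file.
resolve_sha_tag() {
  local arg=${1:-} tags_file=${2:-scripts/.previous_tags.env}
  if [ "$arg" = "--rollback" ]; then
    [ -f "$tags_file" ] || { echo "no rollback target in $tags_file" >&2; return 1; }
    # Source in a subshell so the file's variables do not leak into the caller.
    ( . "$tags_file"; printf '%s\n' "${PREVIOUS_SHA_TAG:?missing in tags file}" )
  elif [ -n "$arg" ]; then
    printf '%s\n' "$arg"
  else
    echo "usage: deploy.sh <sha-tag> | --rollback" >&2
    return 1
  fi
}
```

Because a failed deploy never rewrites the tags file, `resolve_sha_tag --rollback` keeps returning the tag that was running before the failed attempt.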

pull-images.sh

  • docker login only when both REGISTRY_USER and REGISTRY_TOKEN are set; otherwise warns and continues (assumes pre-auth).
  • docker pull $REGISTRY_HOST/$REGISTRY_IMAGE:$REGISTRY_TAG.
  • Logs the resolved RepoDigests[0] to give the operator an immutable identifier in the deploy log.
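
The three bullets above might translate into something like this sketch. Passing the token via `--password-stdin` is an assumption about how the login is done, and `pull_image` is a hypothetical function name.

```shell
#!/usr/bin/env bash
set -euo pipefail

pull_image() {
  local ref="$REGISTRY_HOST/$REGISTRY_IMAGE:$REGISTRY_TAG"

  if [ -n "${REGISTRY_USER:-}" ] && [ -n "${REGISTRY_TOKEN:-}" ]; then
    # --password-stdin keeps the token out of the process list.
    printf '%s' "$REGISTRY_TOKEN" |
      docker login "$REGISTRY_HOST" --username "$REGISTRY_USER" --password-stdin
  else
    echo "WARN: registry credentials not set; assuming prior docker login" >&2
  fi

  docker pull "$ref"

  # Log the content-addressed digest as an immutable identifier.
  docker image inspect --format '{{index .RepoDigests 0}}' "$ref" >&2
}
```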

stop-services.sh

  • Reads org.opencontainers.image.revision from the running container (label set by the Dockerfile).
  • Writes scripts/.previous_tags.env:
    PREVIOUS_SHA_TAG=<sha12>-<arch>
    PREVIOUS_REVISION=<full sha>
    RECORDED_AT=<ISO 8601>
    
  • docker stop -t 40 then docker rm -f.
  • If the container does not exist, logs and exits 0 (idempotent — first deploy on a new host should succeed).
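
A minimal sketch of the record-then-stop sequence follows. How the real script derives the `<sha12>-<arch>` tag is not stated here, so the sketch takes it as a parameter; `record_previous_tag` and `stop_container` are hypothetical names.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: write the rollback record in the three-line format.
record_previous_tag() {
  local sha_tag=$1 revision=$2 out=${3:-scripts/.previous_tags.env}
  {
    printf 'PREVIOUS_SHA_TAG=%s\n' "$sha_tag"
    printf 'PREVIOUS_REVISION=%s\n' "$revision"
    printf 'RECORDED_AT=%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  } > "$out"
}

stop_container() {
  local name=$1 sha_tag=$2 out=${3:-scripts/.previous_tags.env} revision

  # Idempotent: a missing container is not an error (first deploy on a host).
  if ! docker container inspect "$name" >/dev/null 2>&1; then
    echo "container $name does not exist; nothing to stop" >&2
    return 0
  fi

  # Read the OCI revision label from the running container.
  revision=$(docker inspect --format \
    '{{ index .Config.Labels "org.opencontainers.image.revision" }}' "$name")
  record_previous_tag "$sha_tag" "$revision" "$out"

  docker stop -t 40 "$name"
  docker rm -f "$name"
}
```

Writing the record before stopping means the rollback target is on disk even if the stop itself fails.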

start-services.sh

  • Materializes a runtime env file by filtering the current shell environment with grep '^(ASPNETCORE_|AZAION_)'. Registry credentials and deploy-host plumbing variables stay on the host and never enter the container.
  • mkdir -p for the bind-mounted Content/ and logs/ dirs (idempotent).
  • docker run --detach --name --restart unless-stopped --env-file --publish --volume.
  • Logs the container ID and the running revision.
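
The env-file filtering can be sketched as below. `materialize_env_file` is a hypothetical name; the sketch uses `grep -E` because the alternation in the pattern needs extended regular expressions. It assumes single-line values, since a value containing a newline would be split by `env`. The commented `docker run` shows only the two key mounts whose container paths are given in the AZ-553/AZ-554 notes.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: write only ASPNETCORE_* / AZAION_* variables to a
# private temp file and print its path.
materialize_env_file() {
  local env_file
  env_file=$(mktemp)
  chmod 600 "$env_file"
  # grep exits 1 when nothing matches; tolerate that under pipefail.
  env | grep -E '^(ASPNETCORE_|AZAION_)' > "$env_file" || true
  printf '%s\n' "$env_file"
}

# Possible use (not executed here):
#   env_file=$(materialize_env_file)
#   docker run --detach --name "$DEPLOY_CONTAINER_NAME" \
#     --restart unless-stopped --env-file "$env_file" \
#     --publish "$DEPLOY_HOST_PORT:8080" \
#     --volume "$DEPLOY_HOST_JWT_KEYS_DIR:/etc/azaion/jwt-keys:ro" \
#     --volume "$DEPLOY_HOST_DP_KEYS_DIR:/var/lib/azaion/dp-keys" \
#     "$REGISTRY_HOST/$REGISTRY_IMAGE:$REGISTRY_TAG"
```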

health-check.sh

  • One-shot check on /health/live first (3 s timeout). If this fails the container is wedged — fail fast.
  • Polls /health/ready every HEALTH_INTERVAL (default 2 s) until 200 or HEALTH_TIMEOUT (default 60 s).
  • Returns 0 on first 200; non-zero on timeout.
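
The liveness-then-readiness sequence could be sketched like this. `probe_status` and `wait_ready` are hypothetical names; only `HEALTH_INTERVAL`, `HEALTH_TIMEOUT`, and the two endpoints come from the description above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the HTTP status code for a URL ("000" if the request itself fails).
probe_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time "${2:-3}" "$1" || true
}

wait_ready() {
  local base_url=$1
  local interval=${HEALTH_INTERVAL:-2} timeout=${HEALTH_TIMEOUT:-60}
  local deadline=$(( SECONDS + timeout ))   # SECONDS is bash's uptime counter

  # One-shot liveness check: a dead process will never become ready.
  if [ "$(probe_status "$base_url/health/live" 3)" != "200" ]; then
    echo "liveness probe failed; container looks wedged" >&2
    return 1
  fi

  # Poll readiness until the first 200 or the deadline.
  while [ "$SECONDS" -lt "$deadline" ]; do
    if [ "$(probe_status "$base_url/health/ready")" = "200" ]; then
      return 0
    fi
    sleep "$interval"
  done

  echo "timed out after ${timeout}s waiting for /health/ready" >&2
  return 1
}
```

A call such as `wait_ready "http://localhost:$DEPLOY_HOST_PORT"` would fit the deploy flow, assuming the check runs on the deploy host.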

smoke.sh

Six checks, each ≤ 10 s, against the public BASE_URL:

  1. GET /health/live (200)
  2. GET /health/ready (200, best-effort — public URL may legitimately not expose this)
  3. POST /login — extract JWT
  4. GET /users/current (Bearer auth)
  5. GET /users — count rows
  6. GET /resources/list — sanity that filesystem-backed paths are reachable

Smoke is intentionally lightweight; it does NOT exercise CRUD or detection-class endpoints (those are covered by E2E in CI).

_lib.sh

Shared sourced library. Sourced via . "$SCRIPT_DIR/_lib.sh" from every script. NOT executable (lives at scripts/_lib.sh mode 0644). Contains:

  • log_info / log_warn / log_error / die
  • require_env <var…> / require_cmd <cmd…>
  • load_env_overlay <env> (the sops + age decryption pipeline)
  • container_exists, container_running, current_image_revision
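
For orientation, the logging and guard helpers might look like the sketch below. The log prefix format and the error wording are assumptions; only the function names come from the list above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Log lines go to stderr so stdout stays free for data.
log_info()  { printf '[INFO] %s\n'  "$*" >&2; }
log_warn()  { printf '[WARN] %s\n'  "$*" >&2; }
log_error() { printf '[ERROR] %s\n' "$*" >&2; }
die()       { log_error "$*"; exit 1; }

# Abort unless every named variable is set and non-empty.
require_env() {
  local var
  for var in "$@"; do
    [ -n "${!var:-}" ] || die "required variable $var is not set"
  done
}

# Abort unless every named command is on PATH.
require_cmd() {
  local cmd
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || die "required command $cmd not found"
  done
}
```

A script would typically start with something like `require_cmd docker curl` followed by `require_env ENV REGISTRY_TAG`.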

5. Examples

First-ever staging deploy

# On the staging host, as deploy operator:
cd /opt/azaion/admin   # or wherever the repo is checked out
ENV=staging ./scripts/deploy.sh a1b2c3d4e5f6-arm

Rolling back production after a bad deploy

# Same host, immediately after the failed deploy:
ENV=production ./scripts/deploy.sh --rollback

Running smoke from the operator workstation

export BASE_URL=https://stage.admin.azaion.com
export SMOKE_ADMIN_EMAIL=ops-smoke@azaion.com
export SMOKE_ADMIN_PASSWORD=...   # from the operator's password manager
./scripts/smoke.sh

Local development against the dockerized stack

The dev-time compose was deferred (Drift K-adjacent). Until it lands, run the API directly:

# Postgres on host port 4312 (per env/db/00_install.sh)
dotnet run --project Azaion.AdminApi

6. Common script properties

All scripts:

  • Use #!/usr/bin/env bash with set -euo pipefail.
  • Support --help / -h for usage.
  • Source _lib.sh for logging and env-overlay helpers.
  • Are idempotent where possible (running deploy.sh twice with the same SHA tag is a no-op for pull-images.sh, recreates the container in stop/start, and re-checks health).
  • Echo to stderr for log lines (so stdout from a sub-process can still be piped).

7. What is NOT shipped in cycle 1

  • Remote SSH wrapper. The deploy procedure assumes the operator runs the script on the target host. A --remote $DEPLOY_HOST mode is recorded as Drift O (carried forward).
  • Slack notifications from inside the scripts. Notifications happen out-of-band per _docs/04_deploy/observability.md §5.
  • Database migration step. Migrations are applied manually with psql per _docs/04_deploy/environment_strategy.md §4 (Drift J).

Related documents:

  • Postmortem template: _docs/06_metrics/postmortem_template.md
  • Procedures: _docs/04_deploy/deployment_procedures.md
  • Environment strategy: _docs/04_deploy/environment_strategy.md
  • secrets/ folder onboarding: secrets/README.md