Azaion Admin API — Deployment Scripts
Date: 2026-05-13 · Cycle: 1 · Status: shipped (this is the only doc that matches concrete files in scripts/ and secrets/).
1. Overview
| Script | Purpose | Location |
|---|---|---|
| `deploy.sh` | Main orchestrator (pull → stop → start → health) | `scripts/deploy.sh` |
| `pull-images.sh` | `docker login` + `docker pull` the target image | `scripts/pull-images.sh` |
| `stop-services.sh` | Graceful stop + record rollback target | `scripts/stop-services.sh` |
| `start-services.sh` | `docker run` with the materialized env file and bind mounts | `scripts/start-services.sh` |
| `health-check.sh` | Poll `/health/ready` until 200 or timeout | `scripts/health-check.sh` |
| `smoke.sh` | 6 critical-path checks against the public URL | `scripts/smoke.sh` |
| `_lib.sh` | Shared logging + env-overlay helpers | `scripts/_lib.sh` (sourced, not executed) |
| `run-tests.sh` | Existing: runs the docker-compose test suite locally | `scripts/run-tests.sh` |
| `run-performance-tests.sh` | Existing: runs k6 against the test compose stack | `scripts/run-performance-tests.sh` |
2. Prerequisites
On the deploy host:
| Requirement | Why |
|---|---|
| Docker 24+ | `docker pull`, `docker run`, `--restart unless-stopped` |
| sops (≥ 3.8) | Decrypt `secrets/<env>.env` |
| age (≥ 1.1) | Backing crypto for sops |
| curl | Used by `health-check.sh` and `smoke.sh` |
| jq | Used by `smoke.sh` for JSON parsing |
| `/etc/azaion/age.key` (mode 0400) | Per-host age private key (see `secrets/README.md`) |
On the operator's machine (only for smoke.sh):
| Requirement | Why |
|---|---|
| curl, jq | Same as host |
| Network access to the public URL | BASE_URL is the production / staging hostname |
3. Environment Variables
`load_env_overlay <env>` in `scripts/_lib.sh` resolves variables in this order (later sources override earlier):

1. `<repo>/.env` (if present; a local-dev convenience that is harmless on a prod host with no `.env`)
2. `secrets/<env>.public.env` (committed plain text; loaded with `set -a`)
3. `secrets/<env>.env` (sops-decrypted to a tempfile, sourced, tempfile deleted on exit)
4. The shell environment that invoked `deploy.sh` (operator overrides)
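The precedence described above can be sketched roughly as follows. This is an illustrative sketch, not the contents of `scripts/_lib.sh`: the function name `load_env_overlay_sketch`, its `repo_root` argument, and the snapshot-and-replay approach to operator overrides are all assumptions. It also assumes the env files contain plain `KEY=value` lines that are safe to source.

```bash
# Hypothetical sketch of layered env loading; the real load_env_overlay may differ.
load_env_overlay_sketch() {
  local env_name="$1" repo_root="$2"
  local snapshot decrypted
  snapshot="$(mktemp)"

  # Capture what the invoking shell exported so it can be replayed last.
  # bash prints `declare -x NAME=...`; rewriting to `export` keeps the
  # variables global when the snapshot is sourced inside this function.
  export -p | sed 's/^declare -x /export /' > "$snapshot"

  set -a  # auto-export everything assigned while sourcing
  # 1. Local-dev convenience file, if present.
  if [ -f "$repo_root/.env" ]; then . "$repo_root/.env"; fi
  # 2. Committed plain-text values.
  if [ -f "$repo_root/secrets/$env_name.public.env" ]; then
    . "$repo_root/secrets/$env_name.public.env"
  fi
  # 3. Encrypted values: decrypt to a tempfile, source it, delete it.
  if [ -f "$repo_root/secrets/$env_name.env" ]; then
    decrypted="$(mktemp)"
    SOPS_AGE_KEY_FILE="${SOPS_AGE_KEY_FILE:-/etc/azaion/age.key}" \
      sops --decrypt "$repo_root/secrets/$env_name.env" > "$decrypted"
    . "$decrypted"
    rm -f "$decrypted"
  fi
  set +a

  # 4. Operator overrides win: replay the original exports on top.
  . "$snapshot"
  rm -f "$snapshot"
}
```

A call such as `load_env_overlay_sketch staging /opt/azaion/admin` would leave a value exported by the operator's shell in place even when the same key appears in either env file.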
The complete variable inventory is .env.example at the repo root. Variables specifically consumed by these scripts:
| Variable | Required by | Source | Notes |
|---|---|---|---|
| `ENV` | `deploy.sh` | operator shell | `staging` or `production` |
| `REGISTRY_HOST`, `REGISTRY_IMAGE`, `REGISTRY_TAG` | pull / start | public env / operator | tag is the `<sha12>-<arch>` immutable tag from `.woodpecker/02-build-push.yml` |
| `REGISTRY_USER`, `REGISTRY_TOKEN` | pull | encrypted env | optional; if both missing, assumes `docker login` was done out-of-band |
| `DEPLOY_CONTAINER_NAME`, `DEPLOY_HOST_PORT`, `DEPLOY_HOST_CONTENT_DIR`, `DEPLOY_HOST_LOGS_DIR` | stop / start | public env | identical for staging and prod by default |
| `ASPNETCORE_ConnectionStrings__AzaionDb`, `__AzaionDbAdmin` | start | encrypted env | the API runs fail-fast checks on these at boot |
| `ASPNETCORE_JwtConfig__KeysFolder`, `__ActiveKid` (AZ-552/AZ-553) | start | public env | container-side path to the ES256 PEMs + active kid; preflight + `JwtSigningKeyProvider` fail fast if unset |
| `ASPNETCORE_DataProtection__KeysFolder` (AZ-554) | start | public env | container-side path to the persisted DataProtection key ring; Production fails fast if unset |
| `DEPLOY_HOST_JWT_KEYS_DIR`, `DEPLOY_HOST_DP_KEYS_DIR` (AZ-553/AZ-554) | start | host env / public env | host-side directories bind-mounted into the container (JWT keys RO; DP keys RW) |
| `ASPNETCORE_ResourcesConfig__*`, `JwtConfig__{Issuer,Audience,AccessTokenLifetimeMinutes}` | start | public env (defaults from `appsettings.json`) | only override if the env value differs from the appsettings default |
| `SOPS_AGE_KEY_FILE` | `_lib.sh` | host | defaults to `/etc/azaion/age.key` if unset |
| `SMOKE_ADMIN_EMAIL`, `SMOKE_ADMIN_PASSWORD` | `smoke.sh` | operator shell | dedicated smoke-test admin user; rotate as a regular admin password |
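A preflight that refuses to start when a required host directory is unset, missing, or empty could look roughly like the sketch below. The helper name `require_nonempty_dir` and the error wording are hypothetical; the actual checks in `start-services.sh` may be structured differently.

```bash
# Hypothetical preflight helper; not taken from start-services.sh.
# Usage: require_nonempty_dir <variable name> <directory path>
require_nonempty_dir() {
  local var_name="$1" dir="$2"
  if [ -z "$dir" ]; then
    echo "ERROR: $var_name is not set; export it before deploying." >&2
    return 1
  fi
  if [ ! -d "$dir" ]; then
    echo "ERROR: $var_name=$dir does not exist on this host." >&2
    return 1
  fi
  # `ls -A` lists dotfiles too, so an empty result means a truly empty dir.
  if [ -z "$(ls -A "$dir" 2>/dev/null)" ]; then
    echo "ERROR: $var_name=$dir is empty; place the key files there first." >&2
    return 1
  fi
}
```

It would be called as `require_nonempty_dir DEPLOY_HOST_JWT_KEYS_DIR "${DEPLOY_HOST_JWT_KEYS_DIR:-}"` before `docker run`. A read-write directory that is legitimately empty on first start would need only the first two checks.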
4. Script details
deploy.sh
Usage:
ENV=staging ./scripts/deploy.sh <sha-tag>
ENV=production ./scripts/deploy.sh <sha-tag>
ENV=staging ./scripts/deploy.sh --rollback # uses scripts/.previous_tags.env
./scripts/deploy.sh --help
Flow (matches _docs/04_deploy/deployment_procedures.md §3 / §4):
1. Validate `ENV` and required commands.
2. Load env overlay (public + sops-decrypted).
3. If `--rollback`: read `scripts/.previous_tags.env` → set `SHA_TAG` to `PREVIOUS_SHA_TAG`.
4. `pull-images.sh` (login + pull).
5. `stop-services.sh` (records the SHA of whatever was running; graceful stop with `docker stop -t 40`; remove).
6. `start-services.sh` (`docker run --restart unless-stopped --env-file <materialized> --publish $DEPLOY_HOST_PORT:8080`).
7. `health-check.sh` (poll `/health/ready` with timeout).
8. Print success line with the running revision.
Failure handling: any non-zero exit from a sub-script aborts deploy.sh (because set -euo pipefail propagates). The previously-recorded SHA in .previous_tags.env is unchanged, so --rollback after a failed deploy targets the version that was running BEFORE the failed attempt.
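The `--rollback` branch of the flow, which takes the target tag from `scripts/.previous_tags.env`, might be sketched as below. The function name `resolve_sha_tag` and the choice to read a single key with `sed` rather than sourcing the file are assumptions made for illustration.

```bash
# Hypothetical helper: print the tag to deploy, honouring --rollback.
# Usage: resolve_sha_tag <sha-tag | --rollback> [tags file]
resolve_sha_tag() {
  local arg="$1" tags_file="${2:-scripts/.previous_tags.env}"
  local previous
  if [ "$arg" != "--rollback" ]; then
    printf '%s\n' "$arg"
    return 0
  fi
  if [ ! -f "$tags_file" ]; then
    echo "ERROR: no rollback target recorded in $tags_file" >&2
    return 1
  fi
  # Extract only PREVIOUS_SHA_TAG; the other keys are informational here.
  previous="$(sed -n 's/^PREVIOUS_SHA_TAG=//p' "$tags_file" | head -n 1)"
  if [ -z "$previous" ]; then
    echo "ERROR: PREVIOUS_SHA_TAG missing from $tags_file" >&2
    return 1
  fi
  printf '%s\n' "$previous"
}
```

A caller could then write `SHA_TAG="$(resolve_sha_tag "$1")"` and let a non-zero status abort the deploy.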
pull-images.sh
- `docker login` only when both `REGISTRY_USER` and `REGISTRY_TOKEN` are set; otherwise warns and continues (assumes pre-auth).
- `docker pull $REGISTRY_HOST/$REGISTRY_IMAGE:$REGISTRY_TAG`.
- Logs the resolved `RepoDigests[0]` to give the operator an immutable identifier in the deploy log.
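As a rough illustration of the three steps above, the sketch below logs in only when both credentials are present, pulls the image, and prints the first repo digest. The function name `pull_image_sketch` is hypothetical, and the real script may log differently.

```bash
# Hypothetical sketch of the pull step; expects REGISTRY_HOST,
# REGISTRY_IMAGE and REGISTRY_TAG to be set by the env overlay.
pull_image_sketch() {
  local ref="$REGISTRY_HOST/$REGISTRY_IMAGE:$REGISTRY_TAG"
  if [ -n "${REGISTRY_USER:-}" ] && [ -n "${REGISTRY_TOKEN:-}" ]; then
    # --password-stdin keeps the token out of the process list.
    printf '%s' "$REGISTRY_TOKEN" |
      docker login --username "$REGISTRY_USER" --password-stdin "$REGISTRY_HOST" >&2
  else
    echo "WARN: registry credentials not set; assuming prior docker login" >&2
  fi
  docker pull "$ref" >&2
  # Print the immutable digest reference on stdout for the deploy log.
  docker image inspect --format '{{index .RepoDigests 0}}' "$ref"
}
```

Sending progress output to stderr leaves stdout holding only the digest, so a caller can capture it with `digest="$(pull_image_sketch)"`.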
stop-services.sh
- Reads `org.opencontainers.image.revision` from the running container (label set by the Dockerfile).
- Writes `scripts/.previous_tags.env` with `PREVIOUS_SHA_TAG=<sha12>-<arch>`, `PREVIOUS_REVISION=<full sha>`, and `RECORDED_AT=<ISO 8601>`.
- `docker stop -t 40` then `docker rm -f`.
- If the container does not exist, logs and exits 0 (idempotent; the first deploy on a new host should succeed).
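The record-then-stop behaviour can be sketched as follows. The function name `record_and_stop_sketch` is hypothetical, and deriving `PREVIOUS_SHA_TAG` from the tag part of the container's image reference is an assumption; the real script may obtain it another way.

```bash
# Hypothetical sketch of the stop step; not the real stop-services.sh.
# Usage: record_and_stop_sketch <container name> <tags file>
record_and_stop_sketch() {
  local name="$1" tags_file="$2"
  local revision image_ref
  if ! docker container inspect "$name" >/dev/null 2>&1; then
    echo "INFO: container $name not found; nothing to stop" >&2
    return 0
  fi
  revision="$(docker inspect --format \
    '{{ index .Config.Labels "org.opencontainers.image.revision" }}' "$name")"
  # Assumption: the deployed tag is the suffix of the image reference.
  image_ref="$(docker inspect --format '{{ .Config.Image }}' "$name")"
  {
    printf 'PREVIOUS_SHA_TAG=%s\n' "${image_ref##*:}"
    printf 'PREVIOUS_REVISION=%s\n' "$revision"
    printf 'RECORDED_AT=%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  } > "$tags_file"
  docker stop -t 40 "$name" >/dev/null   # 40 s grace period before SIGKILL
  docker rm -f "$name" >/dev/null
}
```

Writing the tags file before stopping means the rollback target is on disk even if the stop itself fails.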
start-services.sh
- Materializes a runtime env file by filtering the current shell environment with `grep '^(ASPNETCORE_|AZAION_)'`. Registry credentials and deploy-host plumbing variables stay on the host and never enter the container.
- `mkdir -p` for the bind-mounted `Content/` and `logs/` dirs (idempotent).
- `docker run --detach --name --restart unless-stopped --env-file --publish --volume`.
- Logs the container ID and the running revision.
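The env-file filtering step could be sketched like this. The function name `materialize_env_file` is hypothetical, and the sketch uses `grep -E` because the alternation in the pattern needs extended regular expressions. It assumes no variable value spans multiple lines.

```bash
# Hypothetical sketch: write only container-facing variables to a file
# suitable for `docker run --env-file`.
materialize_env_file() {
  local out_file="$1"
  # Create the file owner-readable only, since it will hold secrets.
  ( umask 077; : > "$out_file" )
  # Keep ASPNETCORE_* and AZAION_* entries; registry credentials and
  # DEPLOY_* plumbing never match, so they stay on the host.
  # `|| true` tolerates the no-match case, where grep exits 1.
  env | grep -E '^(ASPNETCORE_|AZAION_)' > "$out_file" || true
}
```

The resulting file would then be passed as `--env-file "$out_file"` and removed once the container has started.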
health-check.sh
- One-shot check on `/health/live` first (3 s timeout). If this fails the container is wedged, so the script fails fast.
- Polls `/health/ready` every `HEALTH_INTERVAL` (default 2 s) until 200 or `HEALTH_TIMEOUT` (default 60 s).
- Returns 0 on first 200; non-zero on timeout.
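A minimal sketch of the liveness-then-readiness polling described above is shown below. The function name `wait_until_ready` and its `base_url` argument are hypothetical. Elapsed time is approximated by summing the sleep intervals, so time spent inside `curl` is not counted.

```bash
# Hypothetical sketch of the health check; not the real health-check.sh.
# Usage: wait_until_ready <base url, e.g. http://localhost:8080>
wait_until_ready() {
  local base_url="$1"
  local interval="${HEALTH_INTERVAL:-2}" timeout="${HEALTH_TIMEOUT:-60}"
  local elapsed=0 code
  # One-shot liveness probe: if this fails, polling readiness is pointless.
  if ! curl --silent --fail --max-time 3 --output /dev/null "$base_url/health/live"; then
    echo "ERROR: /health/live failed; container looks wedged" >&2
    return 1
  fi
  while [ "$elapsed" -lt "$timeout" ]; do
    # --write-out prints only the HTTP status; connection errors yield 000.
    code="$(curl --silent --max-time 3 --output /dev/null \
      --write-out '%{http_code}' "$base_url/health/ready" || true)"
    if [ "$code" = "200" ]; then
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "ERROR: /health/ready not 200 after ${timeout}s" >&2
  return 1
}
```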
smoke.sh
Six checks, each ≤ 10 s, against the public BASE_URL:
1. `GET /health/live` (200)
2. `GET /health/ready` (200, best-effort; the public URL may legitimately not expose this)
3. `POST /login`: extract JWT
4. `GET /users/current` (Bearer auth)
5. `GET /users`: count rows
6. `GET /resources/list`: sanity check that filesystem-backed paths are reachable
Smoke is intentionally lightweight; it does NOT exercise CRUD or detection-class endpoints (those are covered by E2E in CI).
_lib.sh
Shared sourced library. Sourced via . "$SCRIPT_DIR/_lib.sh" from every script. NOT executable (lives at scripts/_lib.sh mode 0644). Contains:
- `log_info` / `log_warn` / `log_error` / `die`
- `require_env <var…>` / `require_cmd <cmd…>`
- `load_env_overlay <env>` (the sops + age decryption pipeline)
- `container_exists`, `container_running`, `current_image_revision`
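For orientation, the logging and requirement helpers might look something like the sketch below. The function names match the list above, but the bodies, the log prefixes, and the choice to return non-zero instead of exiting from `require_env` and `require_cmd` are assumptions rather than the actual `_lib.sh` code.

```bash
# Hypothetical bodies for the helper names listed above.
# All log output goes to stderr so stdout stays free for piping.
log_info()  { printf '[INFO] %s\n'  "$*" >&2; }
log_warn()  { printf '[WARN] %s\n'  "$*" >&2; }
log_error() { printf '[ERROR] %s\n' "$*" >&2; }
die()       { log_error "$*"; exit 1; }

# Report every unset or empty variable, then fail once.
require_env() {
  local name value missing=0
  for name in "$@"; do
    # Names are trusted literals from the calling script, so eval is safe here.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      log_error "required variable $name is not set"
      missing=1
    fi
  done
  [ "$missing" -eq 0 ]
}

# Fail on the first command that is not on PATH.
require_cmd() {
  local cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      log_error "required command not found: $cmd"
      return 1
    fi
  done
}
```

A script would typically combine them as `require_cmd docker sops curl || die "missing prerequisites"`.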
5. Examples
First-ever staging deploy
# On the staging host, as deploy operator:
cd /opt/azaion/admin # or wherever the repo is checked out
ENV=staging ./scripts/deploy.sh a1b2c3d4e5f6-arm
Rolling back production after a bad deploy
# Same host, immediately after the failed deploy:
ENV=production ./scripts/deploy.sh --rollback
Running smoke from the operator workstation
export BASE_URL=https://stage.admin.azaion.com
export SMOKE_ADMIN_EMAIL=ops-smoke@azaion.com
export SMOKE_ADMIN_PASSWORD=... # from the operator's password manager
./scripts/smoke.sh
Local development against the dockerized stack
The dev-time compose was deferred (Drift K-adjacent). Until it lands, run the API directly:
# Postgres on host port 4312 (per env/db/00_install.sh)
dotnet run --project Azaion.AdminApi
6. Common script properties
All scripts:
- Use `#!/usr/bin/env bash` with `set -euo pipefail`.
- Support `--help` / `-h` for usage.
- Source `_lib.sh` for logging and env-overlay helpers.
- Are idempotent where possible (running `deploy.sh` twice with the same SHA tag is a no-op for `pull-images.sh`, recreates the container in `stop`/`start`, and re-checks health).
- Echo to stderr for log lines (so stdout from a sub-process can still be piped).
7. What is NOT shipped in cycle 1
- Remote SSH wrapper. The deploy procedure assumes the operator runs the script on the target host. A `--remote $DEPLOY_HOST` mode is recorded as Drift O (carried forward).
- Slack notifications from inside the scripts. Notifications happen out-of-band per `_docs/04_deploy/observability.md` §5.
- Database migration step. Migrations are applied manually with `psql` per `_docs/04_deploy/environment_strategy.md` §4 (Drift J).
8. Related artifacts
- Postmortem template: `_docs/06_metrics/postmortem_template.md`
- Procedures: `_docs/04_deploy/deployment_procedures.md`
- Environment strategy: `_docs/04_deploy/environment_strategy.md`
- `secrets/` folder onboarding: `secrets/README.md`