# Azaion Admin API — Deployment Scripts
**Date**: 2026-05-13 · **Cycle**: 1 · **Status**: shipped (this is the only doc that matches concrete files in `scripts/` and `secrets/`).

## 1. Overview

| Script | Purpose | Location |
|--------|---------|----------|
| `deploy.sh` | Main orchestrator (pull → stop → start → health) | `scripts/deploy.sh` |
| `pull-images.sh` | `docker login` + `docker pull` the target image | `scripts/pull-images.sh` |
| `stop-services.sh` | Graceful stop + record rollback target | `scripts/stop-services.sh` |
| `start-services.sh` | `docker run` with the materialized env file and bind mounts | `scripts/start-services.sh` |
| `health-check.sh` | Poll `/health/ready` until 200 or timeout | `scripts/health-check.sh` |
| `smoke.sh` | 6 critical-path checks against the **public** URL | `scripts/smoke.sh` |
| `_lib.sh` | Shared logging + env-overlay helpers | `scripts/_lib.sh` (sourced, not executed) |
| `run-tests.sh` | Pre-existing: runs the docker-compose test suite locally | `scripts/run-tests.sh` |
| `run-performance-tests.sh` | Pre-existing: runs k6 against the test compose stack | `scripts/run-performance-tests.sh` |

## 2. Prerequisites

On the **deploy host**:

| Requirement | Why |
|-------------|-----|
| Docker 24+ | `docker pull`, `docker run`, `--restart unless-stopped` |
| `sops` (≥ 3.8) | Decrypt `secrets/<env>.env` |
| `age` (≥ 1.1) | Backing crypto for sops |
| `curl` | Used by `health-check.sh` and `smoke.sh` |
| `jq` | Used by `smoke.sh` for JSON parsing |
| `/etc/azaion/age.key` (mode 0400) | Per-host age private key (see `secrets/README.md`) |

On the **operator's machine** (only for `smoke.sh`):

| Requirement | Why |
|-------------|-----|
| `curl`, `jq` | Same as the deploy host |
| Network access to the public URL | `BASE_URL` is the production / staging hostname |

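Most rows above are plain "is it on `PATH`" checks, but two of them (the Docker version floor and the age key mode) need a little more. A minimal host-side sketch, assuming GNU coreutils `stat -c` (Linux hosts); the function name `check_host_prereqs` is hypothetical and not part of `_lib.sh`. It asks the daemon, rather than the client, for its version via `docker version --format '{{.Server.Version}}'`.

```bash
#!/usr/bin/env bash
# Hypothetical host preflight sketch; not shipped in scripts/.
set -euo pipefail

# check_host_prereqs [AGE_KEY_FILE]
# Returns 1 if a required CLI is missing, the Docker daemon is older
# than 24, or the age key is not mode 0400. Reports every problem found.
check_host_prereqs() {
  local key="${1:-/etc/azaion/age.key}" cmd ok=0

  for cmd in docker sops age curl jq; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing required command: $cmd" >&2
      ok=1
    fi
  done

  # Daemon version, e.g. "24.0.7"; keep only the major number.
  local version major
  version="$(docker version --format '{{.Server.Version}}' 2>/dev/null || true)"
  major="${version%%.*}"
  if [[ ! "$major" =~ ^[0-9]+$ ]] || (( major < 24 )); then
    echo "Docker 24+ required (daemon reports '${version:-unknown}')" >&2
    ok=1
  fi

  # GNU stat; on BSD/macOS the equivalent is: stat -f '%Lp'
  if [[ ! -f "$key" ]]; then
    echo "age key not found: $key" >&2
    ok=1
  elif [[ "$(stat -c '%a' "$key")" != "400" ]]; then
    echo "age key $key must be mode 0400" >&2
    ok=1
  fi

  return "$ok"
}
```

Collecting all failures before returning lets an operator fix a fresh host in one pass instead of re-running the check once per missing tool.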
## 3. Environment Variables

`load_env_overlay <env>` in `scripts/_lib.sh` resolves variables in this order (later sources override earlier ones):

1. `<repo>/.env`, if present (a local-dev convenience; harmless on a prod host that has no `.env`)
2. `secrets/<env>.public.env` (committed plain text; loaded with `set -a`)
3. `secrets/<env>.env` (sops-decrypted to a tempfile, sourced, tempfile deleted on exit)
4. The shell environment that invoked `deploy.sh` (operator overrides)

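The precedence rules above can be expressed as a short Bash function. This is a minimal sketch, not the shipped `load_env_overlay`: `overlay_env_files` is a hypothetical name, and it expects the caller to have already decrypted `secrets/<env>.env` into a tempfile passed in the third position. The subtle part is layer 4: a plain `source` would let the files clobber operator exports, so the sketch snapshots the exported environment first and re-applies it afterwards.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the overlay order; not the real _lib.sh code.
set -euo pipefail

# overlay_env_files FILE...
# Sources each FILE that exists, in argument order, with auto-export on,
# so later files override earlier ones. Variables the invoking shell had
# already exported are then restored, so operator overrides always win.
overlay_env_files() {
  local -A shell_env=()
  local name f

  # Snapshot the operator's exported environment before sourcing anything.
  while IFS= read -r name; do
    shell_env["$name"]="${!name-}"
  done < <(compgen -e)

  set -a                      # every assignment in the sourced files is exported
  for f in "$@"; do
    if [[ -f "$f" ]]; then
      # shellcheck disable=SC1090
      . "$f"
    fi
  done
  set +a

  # Re-apply operator values wherever a sourced file changed them.
  for name in "${!shell_env[@]}"; do
    if [[ "${!name-}" != "${shell_env[$name]}" ]]; then
      export "$name=${shell_env[$name]}"
    fi
  done
}
```

A call would look like `overlay_env_files .env "secrets/$ENV.public.env" "$decrypted"`, where `$decrypted` is, for example, a `mktemp` file filled by `sops --decrypt "secrets/$ENV.env"` and removed by an `EXIT` trap.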
The complete variable inventory is `.env.example` at the repo root. Variables specifically consumed by these scripts:

| Variable | Required by | Source | Notes |
|----------|-------------|--------|-------|
| `ENV` | `deploy.sh` | operator shell | `staging` or `production` |
| `REGISTRY_HOST`, `REGISTRY_IMAGE`, `REGISTRY_TAG` | pull / start | public env / operator | the tag is the immutable `<sha12>-<arch>` tag from `.woodpecker/02-build-push.yml` |
| `REGISTRY_USER`, `REGISTRY_TOKEN` | pull | encrypted env | optional; if either is missing, `docker login` is assumed to have been done out-of-band |
| `DEPLOY_CONTAINER_NAME`, `DEPLOY_HOST_PORT`, `DEPLOY_HOST_CONTENT_DIR`, `DEPLOY_HOST_LOGS_DIR` | stop / start | public env | identical for staging and prod by default |
| `ASPNETCORE_ConnectionStrings__AzaionDb`, `__AzaionDbAdmin` | start | encrypted env | the API's fail-fast check validates these on boot |
| `ASPNETCORE_JwtConfig__KeysFolder`, `__ActiveKid` (AZ-552/AZ-553) | start | public env | container-side path to the ES256 PEMs, and the active key ID (kid); the preflight and `JwtSigningKeyProvider` fail fast if either is unset |
| `ASPNETCORE_DataProtection__KeysFolder` (AZ-554) | start | public env | container-side path to the persisted DataProtection key ring; the API fails fast in Production if it is unset |
| `DEPLOY_HOST_JWT_KEYS_DIR`, `DEPLOY_HOST_DP_KEYS_DIR` (AZ-553/AZ-554) | start | host env / public env | host-side directories bind-mounted into the container (JWT keys read-only; DP keys read-write) |
| `ASPNETCORE_ResourcesConfig__*`, `ASPNETCORE_JwtConfig__{Issuer,Audience,AccessTokenLifetimeMinutes}` | start | public env (defaults from `appsettings.json`) | override only if the env value differs from the `appsettings.json` default |
| `SOPS_AGE_KEY_FILE` | `_lib.sh` | host | defaults to `/etc/azaion/age.key` if unset |
| `SMOKE_ADMIN_EMAIL`, `SMOKE_ADMIN_PASSWORD` | `smoke.sh` | operator shell | dedicated smoke-test admin user; rotate it like any other admin password |

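The fail-fast behaviour described in the table can be approximated at deploy time by checking both the variables and the host directories behind the bind mounts. A minimal sketch; `preflight_key_dirs` is a hypothetical name, and the specific checks (JWT directory present and non-empty, DataProtection directory writable) are assumptions about what such a preflight should cover, not a copy of `start-services.sh`.

```bash
#!/usr/bin/env bash
# Hypothetical preflight sketch; the shipped start-services.sh may differ.
set -euo pipefail

# preflight_key_dirs: returns 1 with an operator-facing message on the
# first problem found, 0 when the JWT and DataProtection inputs look usable.
preflight_key_dirs() {
  local var
  for var in ASPNETCORE_JwtConfig__KeysFolder \
             ASPNETCORE_JwtConfig__ActiveKid \
             ASPNETCORE_DataProtection__KeysFolder \
             DEPLOY_HOST_JWT_KEYS_DIR \
             DEPLOY_HOST_DP_KEYS_DIR; do
    if [[ -z "${!var-}" ]]; then
      echo "preflight: $var is unset or empty" >&2
      return 1
    fi
  done

  # JWT keys are mounted read-only, so they must already exist on the host.
  local jwt_dir="$DEPLOY_HOST_JWT_KEYS_DIR"
  if [[ ! -d "$jwt_dir" ]]; then
    echo "preflight: JWT keys dir $jwt_dir does not exist" >&2
    return 1
  fi
  if [[ -z "$(ls -A "$jwt_dir")" ]]; then
    echo "preflight: JWT keys dir $jwt_dir is empty; install the ES256 PEMs first" >&2
    return 1
  fi

  # DataProtection keys are mounted read-write. Note: -w tests the deploy
  # user only; the container's UID may still lack write access.
  local dp_dir="$DEPLOY_HOST_DP_KEYS_DIR"
  if [[ ! -d "$dp_dir" || ! -w "$dp_dir" ]]; then
    echo "preflight: DataProtection keys dir $dp_dir missing or not writable" >&2
    return 1
  fi
}
```

Running this before `docker run` turns a container that would crash-loop on boot into a single, explicit deploy error.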
## 4. Script details

### `deploy.sh`

**Usage**:

```bash
ENV=staging ./scripts/deploy.sh <sha-tag>
ENV=production ./scripts/deploy.sh <sha-tag>
ENV=staging ./scripts/deploy.sh --rollback   # uses scripts/.previous_tags.env
./scripts/deploy.sh --help
```

**Flow** (matches `_docs/04_deploy/deployment_procedures.md` §3 / §4):

1. Validate `ENV` and required commands.
2. Load the env overlay (public + sops-decrypted).
3. If `--rollback`: read `scripts/.previous_tags.env` and set `SHA_TAG` to `PREVIOUS_SHA_TAG`.
4. `pull-images.sh` (login + pull).
5. `stop-services.sh` (records the SHA of whatever was running; graceful stop with `docker stop -t 40`; remove).
6. `start-services.sh` (`docker run --restart unless-stopped --env-file <materialized> --publish $DEPLOY_HOST_PORT:8080`).
7. `health-check.sh` (poll `/health/ready` with timeout).
8. Print a success line with the running revision.

**Failure handling**: any non-zero exit from a sub-script aborts `deploy.sh`, because `set -e` (part of `set -euo pipefail`) stops the script at the first failing command. A failed deploy never records its own, failed version in `.previous_tags.env`, so `--rollback` after a failed deploy targets the version that was running BEFORE the failed attempt.

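Step 3 and the usage block imply a small argument-resolution step before anything touches Docker. A minimal sketch, assuming a hypothetical `resolve_target_tag` helper: it reads `.previous_tags.env` in a subshell so the rollback file cannot leak other variables into the deploy, and the `<sha12>-<arch>` pattern check is inferred from the tag format in §3, not taken from `deploy.sh`.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of deploy.sh argument handling; not the shipped code.
set -euo pipefail

# resolve_target_tag ARG [PREVIOUS_TAGS_FILE]
# Prints the image tag to deploy: ARG itself, or PREVIOUS_SHA_TAG from the
# rollback file when ARG is --rollback. Returns 1 on anything else.
resolve_target_tag() {
  local arg="${1:-}" prev_file="${2:-scripts/.previous_tags.env}" tag

  if [[ "$arg" == "--rollback" ]]; then
    if [[ ! -f "$prev_file" ]]; then
      echo "no rollback target: $prev_file not found" >&2
      return 1
    fi
    # Source in a subshell so only PREVIOUS_SHA_TAG escapes.
    tag="$(. "$prev_file" && printf '%s' "${PREVIOUS_SHA_TAG-}")"
  else
    tag="$arg"
  fi

  # Assumed format: 12 lowercase hex chars, a dash, then an arch suffix.
  if [[ ! "$tag" =~ ^[0-9a-f]{12}-[a-z0-9_]+$ ]]; then
    echo "invalid or missing tag: '${tag}'" >&2
    return 1
  fi
  printf '%s\n' "$tag"
}
```

Rejecting a malformed tag here is cheap: it happens before `pull-images.sh`, so the running container is never touched.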
### `pull-images.sh`

- `docker login` only when both `REGISTRY_USER` and `REGISTRY_TOKEN` are set; otherwise warns and continues (assumes pre-auth).
- `docker pull $REGISTRY_HOST/$REGISTRY_IMAGE:$REGISTRY_TAG`.
- Logs the resolved `RepoDigests[0]` to give the operator an immutable identifier in the deploy log.

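The three bullets correspond to three Docker CLI calls. A sketch assuming the variable names from §3; `pull_image` is a hypothetical function name. Piping the token through `docker login --password-stdin` keeps it out of the process list and shell history, which is the usual reason to prefer it over `--password`.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of pull-images.sh; not the shipped script.
set -euo pipefail

pull_image() {
  local ref="$REGISTRY_HOST/$REGISTRY_IMAGE:$REGISTRY_TAG"

  # Log in only when both credentials are present; otherwise assume the
  # host was authenticated out of band.
  if [[ -n "${REGISTRY_USER-}" && -n "${REGISTRY_TOKEN-}" ]]; then
    printf '%s' "$REGISTRY_TOKEN" |
      docker login --username "$REGISTRY_USER" --password-stdin "$REGISTRY_HOST" >/dev/null
  else
    echo "WARN: REGISTRY_USER/REGISTRY_TOKEN not both set; assuming pre-auth" >&2
  fi

  docker pull "$ref" >/dev/null

  # The digest stays valid even if the tag is later re-pushed.
  local digest
  digest="$(docker image inspect --format '{{index .RepoDigests 0}}' "$ref")"
  echo "pulled $ref -> $digest" >&2
}
```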
### `stop-services.sh`

- Reads `org.opencontainers.image.revision` from the running container (label set by the Dockerfile).
- Writes `scripts/.previous_tags.env`:

  ```
  PREVIOUS_SHA_TAG=<sha12>-<arch>
  PREVIOUS_REVISION=<full sha>
  RECORDED_AT=<ISO 8601>
  ```

- `docker stop -t 40`, then `docker rm -f`.
- If the container does not exist, logs and exits 0 (idempotent, so the first deploy on a new host succeeds).

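A sketch of the same sequence, with the rollback file written before the container is touched. Hypothetical details: the function name `stop_and_record`, and deriving `PREVIOUS_SHA_TAG` from the tag part of the container's `.Config.Image` reference (which assumes the container was started from a tagged reference rather than a bare digest). The revision comes from the OCI label named above.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of stop-services.sh; not the shipped script.
set -euo pipefail

# stop_and_record NAME OUT_FILE
stop_and_record() {
  local name="$1" out="$2"

  # Idempotent: a host with no container yet is not an error.
  if ! docker container inspect "$name" >/dev/null 2>&1; then
    echo "no container named $name; nothing to stop" >&2
    return 0
  fi

  local image rev
  image="$(docker container inspect --format '{{.Config.Image}}' "$name")"
  rev="$(docker container inspect \
    --format '{{index .Config.Labels "org.opencontainers.image.revision"}}' "$name")"

  # Record the rollback target BEFORE stopping anything.
  {
    printf 'PREVIOUS_SHA_TAG=%s\n' "${image##*:}"   # text after the last ':'
    printf 'PREVIOUS_REVISION=%s\n' "$rev"
    printf 'RECORDED_AT=%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  } > "$out"

  docker stop -t 40 "$name" >/dev/null   # up to 40 s for graceful shutdown
  docker rm -f "$name" >/dev/null
}
```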
### `start-services.sh`

- Materializes a runtime env file by filtering the current shell environment with `grep -E '^(ASPNETCORE_|AZAION_)'`. Registry credentials and deploy-host plumbing variables stay on the host and never enter the container.
- Runs `mkdir -p` for the bind-mounted `Content/` and `logs/` dirs (idempotent).
- Runs `docker run` with `--detach`, `--name`, `--restart unless-stopped`, `--env-file`, `--publish`, and one `--volume` per bind mount.
- Logs the container ID and the running revision.

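A sketch of the materialize-then-run steps, assuming the variable names from §3; `materialize_env_file` and `start_container` are hypothetical names. The container-side mount targets `/etc/azaion/jwt-keys` (read-only) and `/var/lib/azaion/dp-keys` (read-write) are taken from the cycle-2 hotfix notes for AZ-553/AZ-554 and should match `ASPNETCORE_JwtConfig__KeysFolder` / `ASPNETCORE_DataProtection__KeysFolder`; the `Content/` and `logs/` mounts are left out because their container paths are not documented here.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of start-services.sh; not the shipped script.
set -euo pipefail

# materialize_env_file OUT
# Copies only ASPNETCORE_* and AZAION_* variables into OUT, created with
# mode 0600. docker --env-file takes one VAR=value per line, so
# multi-line values are not supported.
materialize_env_file() {
  local out="$1"
  # -E is required: '(' and '|' are literal characters in basic grep.
  # '|| true' keeps pipefail from aborting when nothing matches.
  ( umask 077; { env | grep -E '^(ASPNETCORE_|AZAION_)' || true; } > "$out" )
}

start_container() {
  local env_file="$1"
  docker run --detach \
    --name "$DEPLOY_CONTAINER_NAME" \
    --restart unless-stopped \
    --env-file "$env_file" \
    --publish "$DEPLOY_HOST_PORT:8080" \
    --volume "$DEPLOY_HOST_JWT_KEYS_DIR:/etc/azaion/jwt-keys:ro" \
    --volume "$DEPLOY_HOST_DP_KEYS_DIR:/var/lib/azaion/dp-keys" \
    "$REGISTRY_HOST/$REGISTRY_IMAGE:$REGISTRY_TAG"
}
```

Filtering by prefix is an allow-list: a new deploy-host variable stays out of the container by default instead of leaking in.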
### `health-check.sh`

- One-shot check on `/health/live` first (3 s timeout). If this fails, the container is wedged, so the script fails fast.
- Polls `/health/ready` every `HEALTH_INTERVAL` (default 2 s) until it returns 200 or `HEALTH_TIMEOUT` (default 60 s) expires.
- Returns 0 on the first 200; non-zero on timeout.

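The live-then-ready sequence can be sketched with `curl --write-out '%{http_code}'`, which prints `000` when no HTTP response arrives at all. `http_status` and `wait_healthy` are hypothetical names; reusing the 3 s per-request cap for the readiness probe is an assumption (the text only specifies it for `/health/live`), and Bash's integer `SECONDS` makes the deadline accurate to about a second.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of health-check.sh; not the shipped script.
set -euo pipefail

# http_status URL: prints the HTTP status code, or 000 if no response.
http_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 3 "$1" || true
}

# wait_healthy BASE_URL
wait_healthy() {
  local base="$1"
  local interval="${HEALTH_INTERVAL:-2}" timeout="${HEALTH_TIMEOUT:-60}"

  # Liveness is a single shot: if the process is not answering at all,
  # polling readiness would only delay the failure.
  if [[ "$(http_status "$base/health/live")" != 200 ]]; then
    echo "liveness check failed; container looks wedged" >&2
    return 1
  fi

  local deadline=$(( SECONDS + timeout ))
  while (( SECONDS < deadline )); do
    if [[ "$(http_status "$base/health/ready")" == 200 ]]; then
      echo "ready" >&2
      return 0
    fi
    sleep "$interval"
  done
  echo "not ready after ${timeout}s" >&2
  return 1
}
```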
### `smoke.sh`

Six checks, each ≤ 10 s, against the public `BASE_URL`:

1. `GET /health/live` (200)
2. `GET /health/ready` (200, best-effort: the public URL may legitimately not expose this)
3. `POST /login`: extract the JWT
4. `GET /users/current` (Bearer auth)
5. `GET /users`: count rows
6. `GET /resources/list`: sanity check that filesystem-backed paths are reachable

Smoke is intentionally lightweight; it does NOT exercise CRUD or detection-class endpoints (those are covered by E2E in CI).

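Each check boils down to one bounded request and a status-code comparison. A minimal helper in that spirit, with the hypothetical name `smoke_check`; the JSON shape of the `POST /login` request and response is not documented here, so token extraction is left out and `$TOKEN` in the usage comments is a placeholder.

```bash
#!/usr/bin/env bash
# Hypothetical helper in the spirit of smoke.sh; not the shipped script.
set -euo pipefail

# smoke_check NAME EXPECTED_STATUS [CURL_ARGS...] URL
# One request capped at 10 s; prints PASS/FAIL to stderr, returns 1 on FAIL.
smoke_check() {
  local name="$1" want="$2" got
  shift 2
  got="$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$@" || true)"
  if [[ "$got" == "$want" ]]; then
    echo "PASS $name" >&2
  else
    echo "FAIL $name: got $got, want $want" >&2
    return 1
  fi
}

# Usage ($TOKEN would come from the POST /login step, not shown):
#   smoke_check live 200 "$BASE_URL/health/live"
#   smoke_check ready 200 "$BASE_URL/health/ready" || true   # best-effort
#   smoke_check current-user 200 -H "Authorization: Bearer $TOKEN" "$BASE_URL/users/current"
```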
### `_lib.sh`

Shared library, sourced from every script via `. "$SCRIPT_DIR/_lib.sh"`. It is NOT executable (`scripts/_lib.sh` has mode 0644). Contains:

- `log_info` / `log_warn` / `log_error` / `die`
- `require_env <var…>` / `require_cmd <cmd…>`
- `load_env_overlay <env>` (the sops + age decryption pipeline)
- `container_exists`, `container_running`, `current_image_revision`

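A sketch of what helpers with these names might look like; the bodies are illustrative, not copied from `scripts/_lib.sh`. Two choices worth noting: all log output goes to stderr, and `require_env` / `require_cmd` collect every missing item before calling `die`, so the operator can fix them all in one pass.

```bash
# Hypothetical sketch of _lib.sh-style helpers; the real file may differ.
# shellcheck shell=bash
# Meant to be sourced, so it sets no shell options of its own.

_log() { printf '%s [%s] %s\n' "$(date -u +%H:%M:%S)" "$1" "${*:2}" >&2; }
log_info()  { _log INFO "$@"; }
log_warn()  { _log WARN "$@"; }
log_error() { _log ERROR "$@"; }
die()       { log_error "$@"; exit 1; }

# require_env VAR...: die listing every unset or empty variable at once.
require_env() {
  local v missing=()
  for v in "$@"; do
    [[ -n "${!v-}" ]] || missing+=("$v")
  done
  (( ${#missing[@]} == 0 )) || die "missing required env: ${missing[*]}"
}

# require_cmd CMD...: same, for commands resolvable via PATH.
require_cmd() {
  local c missing=()
  for c in "$@"; do
    command -v "$c" >/dev/null 2>&1 || missing+=("$c")
  done
  (( ${#missing[@]} == 0 )) || die "missing required command: ${missing[*]}"
}
```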
## 5. Examples

### First-ever staging deploy

```bash
# On the staging host, as the deploy operator:
cd /opt/azaion/admin   # or wherever the repo is checked out
ENV=staging ./scripts/deploy.sh a1b2c3d4e5f6-arm
```

### Rolling back production after a bad deploy

```bash
# Same host, immediately after the failed deploy:
ENV=production ./scripts/deploy.sh --rollback
```

### Running smoke from the operator workstation

```bash
export BASE_URL=https://stage.admin.azaion.com
export SMOKE_ADMIN_EMAIL=ops-smoke@azaion.com
export SMOKE_ADMIN_PASSWORD=...   # from the operator's password manager
./scripts/smoke.sh
```

### Local development against the dockerized stack

The dev-time compose was deferred (Drift K-adjacent). Until it lands, run the API directly:

```bash
# Postgres on host port 4312 (per env/db/00_install.sh)
dotnet run --project Azaion.AdminApi
```

## 6. Common script properties

All scripts:

- Use `#!/usr/bin/env bash` with `set -euo pipefail`.
- Support `--help` / `-h` for usage.
- Source `_lib.sh` for logging and env-overlay helpers.
- Are idempotent where possible (running `deploy.sh` twice with the same SHA tag is a no-op for `pull-images.sh`, recreates the container in `stop`/`start`, and re-checks health).
- Write log lines to stderr, so stdout from a sub-process can still be piped.

## 7. What is NOT shipped in cycle 1

- Remote SSH wrapper. The deploy procedure assumes the operator runs the script on the target host. A `--remote $DEPLOY_HOST` mode is recorded as **Drift O** (carried forward).
- Slack notifications from inside the scripts. Notifications happen out-of-band per `_docs/04_deploy/observability.md` §5.
- Database migration step. Migrations are applied manually with `psql` per `_docs/04_deploy/environment_strategy.md` §4 (Drift J).

## 8. Related artifacts

- Postmortem template: `_docs/06_metrics/postmortem_template.md`
- Procedures: `_docs/04_deploy/deployment_procedures.md`
- Environment strategy: `_docs/04_deploy/environment_strategy.md`
- `secrets/` folder onboarding: `secrets/README.md`