[AZ-552] [AZ-553] [AZ-554] [AZ-555] Cycle-2 hotfix: deploy/infra chain

Batch 5 (cycle 2 hotfix sprint, batch 1 of 2). 6 story points under epic
AZ-530. Addresses 2 Critical + 2 High deploy-blocking findings from
security_report_cycle2.md (F-INFRA-1..F-INFRA-4).

AZ-552 — drop_jwt_secret_deploy_preflight (1 pt, F-INFRA-1 Critical)
  scripts/start-services.sh swaps obsolete JwtConfig__Secret preflight
  for the cycle-2 trio (KeysFolder + ActiveKid + DataProtection.KeysFolder).
  .env.example, env/api/env.ps1, _docs/04_deploy/* updated to match. Repo
  scan in scripts/ and .env.example returns 0 offenders.
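
A repo scan of this kind can be sketched as below. This is only an illustration under assumptions: the function name `scan_for_jwt_secret` is hypothetical, and the command actually used for the scan is not shown in this commit.

```shell
# Hypothetical helper (not part of the commit): report every remaining
# reference to the retired JwtConfig__Secret variable in the given paths.
# Returns 0 when clean, 1 when offenders exist, 2 when grep itself failed.
scan_for_jwt_secret() {
  local rc=0
  # -r: recurse into directories, -n: print line numbers for the operator.
  grep -rn -- 'JwtConfig__Secret' "$@" || rc=$?
  case "$rc" in
    0)
      echo "ERROR: obsolete JwtConfig__Secret reference(s) listed above." >&2
      return 1
      ;;
    1)
      # grep exit code 1 means "no match", which is the desired state here.
      return 0
      ;;
    *)
      echo "ERROR: grep failed (exit $rc); scan result is not trustworthy." >&2
      return 2
      ;;
  esac
}
```

Called as `scan_for_jwt_secret scripts/ .env.example`, it would cover the same paths the message mentions.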

AZ-553 — bind_mount_es256_keys (2 pts, F-INFRA-2 Critical)
  start-services.sh bind-mounts DEPLOY_HOST_JWT_KEYS_DIR read-only at
  /etc/azaion/jwt-keys; preflight fails fast on a missing or empty host
  directory with operator-actionable error messages.
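
The fail-fast check described above might look roughly like this. It is a minimal sketch, not the code from `start-services.sh` (which is not shown here): the function name `preflight_jwt_keys_dir` and the message wording are assumptions.

```shell
# Hypothetical helper (not the real start-services.sh code): verify that the
# host directory holding the ES256 PEMs exists and is not empty.
# Prints an operator-facing message to stderr and returns 1 on any failure.
preflight_jwt_keys_dir() {
  local dir="${1:-}"
  if [ -z "$dir" ]; then
    echo "ERROR: DEPLOY_HOST_JWT_KEYS_DIR is not set; export it before deploying." >&2
    return 1
  fi
  if [ ! -d "$dir" ]; then
    echo "ERROR: '$dir' does not exist; create it and generate a signing key first." >&2
    return 1
  fi
  # ls -A lists everything except . and .., so empty output means an empty dir.
  if [ -z "$(ls -A "$dir" 2>/dev/null)" ]; then
    echo "ERROR: '$dir' is empty; the container would start without a signing key." >&2
    return 1
  fi
  return 0
}
```

A deploy script could call `preflight_jwt_keys_dir "$DEPLOY_HOST_JWT_KEYS_DIR" || exit 1` before starting the container. If the container is started with `docker run` (an assumption), the read-only mount would then be `-v "$DEPLOY_HOST_JWT_KEYS_DIR:/etc/azaion/jwt-keys:ro"`.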

AZ-554 — persist_dataprotection_keys (2 pts, F-INFRA-3 High)
  Program.cs DataProtection wiring now fails fast in Production when
  KeysFolder is unset OR not probe-writable. start-services.sh bind-mounts
  DEPLOY_HOST_DP_KEYS_DIR read-write at /var/lib/azaion/dp-keys.
  Development behaviour unchanged (ephemeral default).
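
The `Program.cs` probe itself is not shown in this commit view. As a host-side analogue, a "probe-writable" check can be sketched in shell as follows; the function name `probe_writable_dir` and the probe file name are hypothetical.

```shell
# Hypothetical helper (a shell analogue, not the Program.cs code): succeed
# only if a file can actually be created in, and removed from, the directory.
probe_writable_dir() {
  local dir="${1:-}"
  local probe
  if [ -z "$dir" ] || [ ! -d "$dir" ]; then
    echo "ERROR: DataProtection keys directory '${dir:-<unset>}' is unset or missing." >&2
    return 1
  fi
  # Use the shell PID in the name so concurrent probes do not collide.
  probe="$dir/.write-probe.$$"
  # Attempt a real write; a permission test alone can disagree with the mount.
  if ! ( : > "$probe" ) 2>/dev/null; then
    echo "ERROR: '$dir' is not writable; the key ring could not be persisted." >&2
    return 1
  fi
  rm -f "$probe"
}
```

Creating a real file is deliberate: a read-only bind mount can reject writes even when the directory's permission bits suggest they would succeed.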

AZ-555 — secrets_readme_es256_rewrite (1 pt, F-INFRA-4 High)
  secrets/README.md schema fully rewritten; new "Host-side directories"
  subsection with bind-mount table + ownership/permission guidance.
  Cycle-1 JwtConfig__Secret removed from live schema (one prose
  deprecation paragraph retained).

Adjacent hygiene
  module-layout.md "Owns" extended to include scripts/, secrets/, env/,
  .env.example (gap from Step 9 new-task layout-delta).

Tests
  e2e/Azaion.E2E/Tests/Cycle2HotfixDeployTests.cs — 19 facts (8 exec,
  11 Skip with rationale per AZ-537/AZ-538 precedent). Skipped tests
  cover preflight/restart/Production-only paths verified at deploy gate.

Build: 0W 0E across Azaion.AdminApi + Azaion.E2E.
Test run deferred to autodev Step 11 (Run Tests).
Tracker transition deferred to next batch (MCP availability unverified
in this session — Leftovers pattern).

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-14 09:35:57 +03:00
parent d2b5308b45
commit f369153149
20 changed files with 517 additions and 45 deletions
+60 -8
@@ -1,4 +1,4 @@
# `secrets/` — sops + age secret material
# `secrets/` — sops + age secret material + host handover
This folder holds **per-environment** runtime configuration for the Admin API.
@@ -9,6 +9,7 @@ This folder holds **per-environment** runtime configuration for the Admin API.
| `production.public.env` | yes | no | same |
| `staging.env` | yes (after first encryption) | **yes** (sops + age) | `scripts/deploy.sh` decrypts to a tempfile then sources it |
| `production.env` | yes (after first encryption) | **yes** (sops + age) | same |
| `jwt-keys/` | yes (PEMs are committed under the sops recipient set) | private keys are filesystem-protected (0600 in dev; bind-mounted on the host in prod) | `JwtSigningKeyProvider` reads them from `JwtConfig.KeysFolder` |
| age private key | **never tracked** | n/a | lives at `/etc/azaion/age.key` on the deploy host (mode 0400) |
## First-time bootstrap on a fresh host
@@ -33,25 +34,76 @@ sudo grep '^# public key:' /etc/azaion/age.key
# 4. Sanity-check on the host:
SOPS_AGE_KEY_FILE=/etc/azaion/age.key sops -d secrets/staging.env | head
# 5. Generate the cycle-2 ES256 JWT signing key on the host (AZ-552/AZ-553):
sudo install -d -m 0750 -o <container-uid> -g <container-gid> /var/lib/azaion/jwt-keys
sudo bash scripts/generate-jwt-key.sh "" /var/lib/azaion/jwt-keys
# Take note of the generated kid; you'll set ASPNETCORE_JwtConfig__ActiveKid to it.
# 6. Create the DataProtection key folder on the host (AZ-554):
sudo install -d -m 0700 -o <container-uid> -g <container-gid> /var/lib/azaion/dp-keys
```
## Rotation
## Host-side directories (bind-mounted into the container)
See `_docs/04_deploy/environment_strategy.md` §3 for the per-secret rotation cadence and procedure.
`scripts/start-services.sh` bind-mounts two host directories into the admin
container. Both are operator-provisioned and MUST exist before deploy.
For procedural detail (rotation, recovery, etc.) see
`_docs/04_deploy/environment_strategy.md` and `_docs/04_deploy/deploy_scripts.md`.
| Host env var | Default host path | Container path | Mode | Holds |
|--------------|-------------------|----------------|------|-------|
| `DEPLOY_HOST_JWT_KEYS_DIR` (AZ-553) | `/var/lib/azaion/jwt-keys` | `/etc/azaion/jwt-keys` | **read-only** | ES256 PEM(s) generated by the operator; each filename minus `.pem` is the JWK kid |
| `DEPLOY_HOST_DP_KEYS_DIR` (AZ-554) | `/var/lib/azaion/dp-keys` | `/var/lib/azaion/dp-keys` | **read-write** | DataProtection master key ring; rotated automatically by ASP.NET Core |
Ownership / permissions guidance:
- **JWT keys** — `chown <container-uid>:<container-gid>`, `chmod 0750` on the directory and `chmod 0400` (or `0640`) on each PEM. Container needs read; nothing else needs anything.
- **DataProtection keys** — `chown <container-uid>:<container-gid>`, `chmod 0700` on the directory. The ring file is rotated by the framework, so the container needs write. Never world-readable.
- The `<container-uid>` / `<container-gid>` are whatever the `app` user maps to in `Dockerfile` (cycle-2: see `Dockerfile:7-11`).
## Key rotation
- **ES256 signing keys** — follow the procedure in the `scripts/generate-jwt-key.sh` header (steps 1-6). Rotation is non-breaking because both kids stay in JWKS during the verifier-cache overlap window.
- **DataProtection master keys** — rotated automatically by ASP.NET Core (default lifetime 90 days). The directory must remain writable across restarts; never delete it manually unless you also accept that every MFA secret ciphertext becomes unreadable.
- **Postgres role passwords** — every 90 days; see `_docs/04_deploy/environment_strategy.md` §rotation table.
- **Registry token** — every 90 days OR on CI compromise; same table.
- **age private key** — every 365 days OR on host compromise; same table.
## What goes where
- **Public env (staging.public.env / production.public.env)** — anything that is NOT a secret: hostname, port, container name, JWT issuer/audience, resource folder names. Reviewable in PRs.
- **Encrypted env (staging.env / production.env)** — DB connection strings (with passwords), `JwtConfig__Secret`, `REGISTRY_USER`, `REGISTRY_TOKEN`, anything else sensitive. NEVER readable in plain text outside the host.
- **Public env (`staging.public.env` / `production.public.env`)** — anything that is NOT a secret: hostname, port, container name, JWT issuer/audience, KeysFolder paths, resource folder names. Reviewable in PRs.
- **Encrypted env (`staging.env` / `production.env`)** — DB connection strings (with passwords), `JwtConfig__ActiveKid` (if you prefer not to commit it), `REGISTRY_USER`, `REGISTRY_TOKEN`, anything else sensitive. NEVER readable in plain text outside the host.
## Schema (variables that MUST be in the encrypted file)
## Schema (variables that MUST be set for a Production deploy)
The cycle-2 startup pipeline fails fast if any of these is unset. `scripts/start-services.sh`
runs its preflight check against the same list.
```
# --- Database -----------------------------------------------------------------
ASPNETCORE_ConnectionStrings__AzaionDb=Host=...;Port=4312;Database=azaion;Username=azaion_reader;Password=...
ASPNETCORE_ConnectionStrings__AzaionDbAdmin=Host=...;Port=4312;Database=azaion;Username=azaion_admin;Password=...
ASPNETCORE_JwtConfig__Secret=<>= 32 random bytes>
# --- JWT signing (cycle-2 ES256 — AZ-532/AZ-552/AZ-553) -----------------------
# Container-side path; host dir is bind-mounted by start-services.sh.
ASPNETCORE_JwtConfig__KeysFolder=/etc/azaion/jwt-keys
# kid of the PEM currently used to sign. Set during generate-jwt-key.sh rotation.
ASPNETCORE_JwtConfig__ActiveKid=<kid-of-active-pem>
# --- DataProtection (cycle-2 MFA at-rest — AZ-554) ----------------------------
# Container-side path; host dir is RW bind-mounted by start-services.sh.
ASPNETCORE_DataProtection__KeysFolder=/var/lib/azaion/dp-keys
# --- Host-side bind-mount sources (consumed by scripts/, NOT the app) ---------
DEPLOY_HOST_JWT_KEYS_DIR=/var/lib/azaion/jwt-keys
DEPLOY_HOST_DP_KEYS_DIR=/var/lib/azaion/dp-keys
# --- Registry -----------------------------------------------------------------
REGISTRY_USER=<registry account>
REGISTRY_TOKEN=<registry token>
```
The deploy script will fail-fast if any of the first three are missing once the container starts.
The cycle-1 symmetric `JwtConfig.Secret` was removed by AZ-532 and is **no
longer supported** — verifiers fetch the public key from
`/.well-known/jwks.json` instead. Any operator runbook or `.env` that still
sets it should drop the line.