mirror of https://github.com/azaion/admin.git synced 2026-06-21 06:51:08 +00:00

Oleksandr Bezdieniezhnykh f369153149 [AZ-552] [AZ-553] [AZ-554] [AZ-555] Cycle-2 hotfix: deploy/infra chain

Batch 5 (cycle 2 hotfix sprint, batch 1 of 2). 6 story points under epic
AZ-530. Addresses 2 Critical + 2 High deploy-blocking findings from
security_report_cycle2.md (F-INFRA-1..F-INFRA-4).

AZ-552 — drop_jwt_secret_deploy_preflight (1 pt, F-INFRA-1 Critical)
  scripts/start-services.sh swaps obsolete JwtConfig__Secret preflight
  for the cycle-2 trio (KeysFolder + ActiveKid + DataProtection.KeysFolder).
  .env.example, env/api/env.ps1, _docs/04_deploy/* updated to match. Repo
  scan in scripts/ and .env.example returns 0 offenders.

AZ-553 — bind_mount_es256_keys (2 pts, F-INFRA-2 Critical)
  start-services.sh bind-mounts DEPLOY_HOST_JWT_KEYS_DIR read-only at
  /etc/azaion/jwt-keys; preflight fails fast on a missing or empty host
  directory with operator-actionable error messages.

AZ-554 — persist_dataprotection_keys (2 pts, F-INFRA-3 High)
  Program.cs DataProtection wiring now fails fast in Production when
  KeysFolder is unset OR not probe-writable. start-services.sh bind-mounts
  DEPLOY_HOST_DP_KEYS_DIR read-write at /var/lib/azaion/dp-keys.
  Development behaviour unchanged (ephemeral default).

AZ-555 — secrets_readme_es256_rewrite (1 pt, F-INFRA-4 High)
  secrets/README.md schema fully rewritten; new "Host-side directories"
  subsection with bind-mount table + ownership/permission guidance.
  Cycle-1 JwtConfig__Secret removed from live schema (one prose
  deprecation paragraph retained).

Adjacent hygiene
  module-layout.md "Owns" extended to include scripts/, secrets/, env/,
  .env.example (gap from Step 9 new-task layout-delta).

Tests
  e2e/Azaion.E2E/Tests/Cycle2HotfixDeployTests.cs — 19 facts (8 exec,
  11 Skip with rationale per AZ-537/AZ-538 precedent). Skipped tests
  cover preflight/restart/Production-only paths verified at deploy gate.

Build: 0W 0E across Azaion.AdminApi + Azaion.E2E.
Test run deferred to autodev Step 11 (Run Tests).
Tracker transition deferred to next batch (MCP availability unverified
in this session — Leftovers pattern).

Co-authored-by: Cursor <cursoragent@cursor.com>

2026-05-14 09:35:57 +03:00

Azaion Admin API — Environment Strategy

Date: 2026-05-13 · Cycle: 1 · Status: planning artifact (no scripts; concrete wiring lands in Step 7).

1. Environments

Environment	Purpose	Infrastructure	Data Source
Development	Local developer workflow on macOS / Linux.	Either bare `dotnet run` against host Postgres (port 4312) or the new `docker-compose.yml` planned in Step 2 (API + Postgres on a private Docker network).	Empty database; SQL files under `env/db/` create roles + schema; no fixtures.
Test (CI)	Black-box tests in CI and locally via `scripts/run-tests.sh`.	`docker-compose.test.yml` — API + Postgres + e2e-runner on a Docker network.	Functional fixtures from `e2e/db-init/00_run_all.sh` + `99_test_seed.sql`.
Staging	Pre-production validation.	Self-hosted Linux server, single Docker host, behind Nginx reverse proxy on `stage.admin.azaion.com`. Mirrors prod topology and Postgres major version.	Anonymized snapshot of production (PII scrubbed by an offline script before import).
Production	Live system.	Self-hosted Linux server, single Docker host, behind Nginx reverse proxy on `admin.azaion.com`.	Live data; daily off-host backups.

Test is added as a first-class environment because cycle 1 already exercises it (docker-compose.test.yml). The deploy template lists three; we list four to match reality.

2. Environment Variables

Source of Truth

The complete variable inventory lives in .env.example at the repo root (Step 1, 24 entries). This document does NOT duplicate that table — it only specifies, per environment, where each variable is sourced.

Per-environment sourcing

Variable group	Development	Test (CI)	Staging	Production
`ASPNETCORE_ENVIRONMENT`	`.env` (`Development`)	docker-compose `environment:` (`Development`)	docker-compose / `--env-file` (`Staging`)	docker-compose / `--env-file` (`Production`)
`ASPNETCORE_URLS`	`.env`	compose	host `.env` (rendered from sops)	host `.env` (rendered from sops)
`ConnectionStrings__*`	`.env` (real local creds)	compose (literal — accepted F-10)	sops-encrypted file in git → decrypted on host at deploy time	same as staging
`JwtConfig__KeysFolder`, `__ActiveKid` (AZ-552/AZ-553)	`.env` (dev-only path)	compose (volume mount)	public env + bind-mount via `DEPLOY_HOST_JWT_KEYS_DIR`	same
`DataProtection__KeysFolder` (AZ-554)	unset (ephemeral dev default)	unset	public env + bind-mount via `DEPLOY_HOST_DP_KEYS_DIR`	same; Production fail-fast if unset
`JwtConfig__{Issuer,Audience,AccessTokenLifetimeMinutes}`	appsettings defaults	appsettings defaults	host `.env` if non-default	host `.env` if non-default
`ResourcesConfig__*`	appsettings defaults	compose	host `.env` if non-default	host `.env` if non-default
`DEPLOY_*`, `REGISTRY_TAG`	`.env` (developer machine)	n/a	passed to `scripts/deploy.sh` from operator's shell or CI manual trigger	same
`REGISTRY_USER`, `REGISTRY_TOKEN`	empty in dev `.env`	Woodpecker secrets `registry_user` / `registry_token`	Woodpecker secrets (CI deploy) or operator's shell (manual deploy)	same
`CI_COMMIT_SHA`	unset → image label `unknown`	Woodpecker built-in	Woodpecker built-in	Woodpecker built-in
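
The two bind-mount rows above only work if the host directories exist before the container starts. The following is a minimal sketch of such a preflight, assuming the container paths named in the AZ-553/AZ-554 commit notes (`/etc/azaion/jwt-keys` read-only, `/var/lib/azaion/dp-keys` read-write). The helper names (`require_nonempty_dir`, `require_writable_dir`, `bind_mount_args`) and the error wording are hypothetical and are not taken from `scripts/start-services.sh`.

```shell
# Hypothetical helpers; names and messages are illustrative only.

# Fail unless the variable named by $1 points at an existing, non-empty directory.
require_nonempty_dir() {
  var_name=$1
  eval "dir=\${$var_name:-}"
  if [ -z "$dir" ]; then
    echo "preflight: $var_name is not set" >&2
    return 1
  fi
  if [ ! -d "$dir" ]; then
    echo "preflight: $var_name=$dir is not a directory" >&2
    return 1
  fi
  if [ -z "$(ls -A "$dir")" ]; then
    echo "preflight: $var_name=$dir is empty; place the signing PEMs there" >&2
    return 1
  fi
}

# Fail unless the variable named by $1 points at an existing, writable directory.
require_writable_dir() {
  var_name=$1
  eval "dir=\${$var_name:-}"
  if [ -z "$dir" ] || [ ! -d "$dir" ] || [ ! -w "$dir" ]; then
    echo "preflight: $var_name must name a writable directory" >&2
    return 1
  fi
}

# Print the docker volume flags, one argument per line, once both checks pass.
bind_mount_args() {
  require_nonempty_dir DEPLOY_HOST_JWT_KEYS_DIR || return 1
  require_writable_dir DEPLOY_HOST_DP_KEYS_DIR || return 1
  printf '%s\n' \
    -v "$DEPLOY_HOST_JWT_KEYS_DIR:/etc/azaion/jwt-keys:ro" \
    -v "$DEPLOY_HOST_DP_KEYS_DIR:/var/lib/azaion/dp-keys"
}
```

The JWT directory is required to be non-empty because the API cannot sign without a PEM. The DataProtection directory may legitimately be empty on a first deploy, so the sketch only requires it to be writable.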

Variable Validation (fail-fast)

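The Admin API already does this for the most security-critical variable. The snippet is the cycle-1 check on `JwtConfig.Secret`; since AZ-552 the deploy preflight in `scripts/start-services.sh` checks `JwtConfig__KeysFolder`, `JwtConfig__ActiveKid`, and `DataProtection__KeysFolder` rather than `JwtConfig__Secret`.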

var jwtConfig = builder.Configuration.GetSection(nameof(JwtConfig)).Get<JwtConfig>();
if (jwtConfig == null || string.IsNullOrEmpty(jwtConfig.Secret))
    throw new Exception("Missing configuration section: JwtConfig");

The deploy plan adds the same fail-fast check for connection strings during Step 7 wiring (a one-time `_ = configuration.GetConnectionString("AzaionDb") ?? throw …` plus the same for `AzaionDbAdmin`, executed during `WebApplication` build). Without the check, a missing variable currently surfaces only on the first DB call, which is too late.

Static / lookup-style variables (`ResourcesConfig__*`, `JwtConfig__{Issuer,Audience,AccessTokenLifetimeMinutes}`) keep their `appsettings.json` defaults in every environment unless an override is required. We do NOT add fail-fast checks for them.
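
The same fail-fast idea can be applied on the deploy side before the container is started. This is a minimal sketch that does not replace the in-process check. The variable names follow the standard ASP.NET Core `Section__Key` environment mapping for the two connection strings named above, and the helper `require_env` is hypothetical.

```shell
# Hypothetical helper: report every missing variable at once, then fail.
require_env() {
  missing=""
  for name in "$@"; do
    # Indirect lookup of the variable whose name is in $name (POSIX sh).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  if [ -n "$missing" ]; then
    echo "preflight: missing required variables:$missing" >&2
    return 1
  fi
}

# Example call (assumed variable names):
#   require_env ConnectionStrings__AzaionDb ConnectionStrings__AzaionDbAdmin || exit 1
```

Collecting all missing names before failing saves the operator from fixing one variable per deploy attempt.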

3. Secrets Management

Decision

Environment	Method	Tool
Development	`.env` file	committed `.env.example` + per-developer `.env` (git-ignored)
Test (CI)	docker-compose `environment:` literals	accepted as test-only (security audit F-10)
Staging	git-tracked encrypted file	sops + age
Production	git-tracked encrypted file	sops + age

Why sops + age (not Vault, not Woodpecker secrets, not hand-edited `.env`)

Constraints: self-hosted, no cloud account, single ops engineer, currently hand-editing .env on the host.

Option	Pros	Cons	Verdict
sops + age (chosen)	Secrets versioned in git, encrypted at rest, decrypted on the host with a single age key. No new infra. Works offline.	Requires per-environment age keypair stored on the host outside git. Manual key rotation.	✅ pragmatic for this team size and topology
HashiCorp Vault (self-hosted)	Dynamic DB creds, audit log, fine-grained ACL, KV v2.	Adds a service to operate, monitor, back up. Single-engineer ops budget cannot absorb it now.	⏳ revisit in a future cycle when ops capacity grows
Woodpecker secrets exported into runtime container	Reuses existing secret store.	Couples runtime config to CI; secrets are not visible/auditable outside Woodpecker UI; cannot run the container outside CI without manually exporting them.	❌ leaks the CI/runtime boundary
Hand-edited host `.env` (status quo)	Zero new tooling.	No version history, no encryption, no review trail. Single point of failure if the file is lost; security audit can't track changes.	❌ status quo we are leaving behind (Drift B)

sops + age conventions for this repo

secrets/
├── .sops.yaml                 # routes secrets/staging.env / production.env to the right age recipients
├── staging.env                # SOPS-encrypted; safe to commit
└── production.env             # SOPS-encrypted; safe to commit

`.sops.yaml` declares two age recipients: `recipient_staging` and `recipient_production` (public keys).
The matching age private keys live on each host at `/etc/azaion/age.key`, mode `0400`, owned by root. They are NEVER committed.
`scripts/deploy.sh` (Step 7) runs `SOPS_AGE_KEY_FILE=/etc/azaion/age.key sops -d secrets/${env}.env > /tmp/azaion.env` and feeds it to `docker run --env-file`.
All staging/production env values that are NOT secret (e.g. `DEPLOY_HOST_PORT`, `REGISTRY_TAG`) live in plain-text `secrets/staging.public.env` / `secrets/production.public.env` next to the encrypted file, also git-tracked. They are loaded before the decrypted overlay.
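
The "public file first, decrypted overlay second" ordering can be made explicit by merging the two files into one before handing it to Docker. This is a minimal sketch under that assumption: the function name `merge_env_files` and the shell variables in the commented wiring are hypothetical, and the real `scripts/deploy.sh` may pass the files differently.

```shell
# Merge KEY=VALUE files left to right; when a key repeats, the last file wins
# but the key keeps its first position. Comments and blank lines are dropped.
merge_env_files() {
  awk '
    /^[[:space:]]*(#|$)/ { next }
    {
      eq = index($0, "=")
      if (eq == 0) next            # skip lines that are not KEY=VALUE
      key = substr($0, 1, eq - 1)
      if (!(key in line)) order[++n] = key
      line[key] = $0
    }
    END { for (i = 1; i <= n; i++) print line[order[i]] }
  ' "$@"
}

# Hypothetical wiring (not run here): decrypt, merge, and keep the result private.
#   umask 077
#   SOPS_AGE_KEY_FILE=/etc/azaion/age.key sops -d "secrets/${env}.env" > "$decrypted"
#   merge_env_files "secrets/${env}.public.env" "$decrypted" > "$merged"
#   docker run --env-file "$merged" ...
```

Setting `umask 077` before writing the decrypted file keeps it readable only by the deploying user, which matters because the plaintext briefly exists on disk.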

Rotation policy

Secret	Rotation cadence	Procedure
Postgres `azaion_admin` / `azaion_reader` passwords	every 90 days, on operator schedule	`ALTER ROLE … WITH PASSWORD …` → re-encrypt `production.env` → `scripts/deploy.sh`
JWT signing PEMs in `DEPLOY_HOST_JWT_KEYS_DIR` (AZ-532/AZ-552/AZ-553)	every 180 days, AND on any suspected leak	follow `scripts/generate-jwt-key.sh` header (steps 1-6: drop a new PEM next to the active one → restart → wait verifier-cache TTL → switch `ActiveKid` → wait access-token TTL → delete old PEM). Rotation is non-breaking because both kids are exposed via `/.well-known/jwks.json` during the overlap window.
`azaion_superadmin` password	every 365 days, AND on owner change	manual; not used by the running app, only by DB migrations
Registry `REGISTRY_TOKEN`	every 90 days OR on CI compromise	rotate registry credential → update Woodpecker secret `registry_token` → re-encrypt `production.env` if also referenced there
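age private key (`/etc/azaion/age.key`)	every 365 days OR on host compromise	generate new key → add public recipient to `.sops.yaml` → run `sops updatekeys` on each encrypted file (`secrets/staging.env`, `secrets/production.env`; the plain-text `*.public.env` files are not sops-managed and are skipped) → distribute new private key out-of-band → remove old recipient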
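
The cadences in the table reduce to a simple elapsed-days comparison. This is a minimal sketch of such a check, assuming the operator records each secret's last rotation as a Unix timestamp somewhere; that bookkeeping is not specified in this document, and the function name `rotation_due` is hypothetical.

```shell
# rotation_due LAST_ROTATED_EPOCH CADENCE_DAYS [NOW_EPOCH]
# Returns 0 (due) when at least CADENCE_DAYS whole days have elapsed, 1 otherwise.
rotation_due() {
  last=$1
  cadence_days=$2
  now=${3:-$(date +%s)}
  elapsed_days=$(( (now - last) / 86400 ))
  [ "$elapsed_days" -ge "$cadence_days" ]
}

# Example with the cadences from the table (timestamps are placeholders):
#   rotation_due "$pg_password_rotated_at" 90  && echo "rotate Postgres passwords"
#   rotation_due "$jwt_pem_rotated_at"     180 && echo "rotate JWT signing PEMs"
#   rotation_due "$age_key_rotated_at"     365 && echo "rotate age private key"
```

The check covers only the scheduled cadence. The "on suspected leak" and "on compromise" triggers remain a manual operator decision.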

4. Database Management

Environment	Type	Migrations	Data	Backup
Development	Local Postgres on host (port 4312) or dockerized Postgres from `docker-compose.yml`	`env/db/*.sql` applied manually by developer the first time, then `*_users_email_unique.sql`-style additive scripts run with `psql` on demand	empty	none
Test (CI)	Postgres 16-alpine from `docker-compose.test.yml`	`env/db/*.sql` mounted into `/docker-entrypoint-initdb.d/sql/`, ordered by `00_run_all.sh`	`99_test_seed.sql` (functional) + 500 perf users injected by `scripts/run-performance-tests.sh` when needed	none — `down -v` between runs
Staging	Same Postgres major (16) on the staging server, port 4312, `azaion` database	`env/db/*.sql` applied manually under change control via `psql -U azaion_superadmin`. New migrations land in the same numeric-prefix sequence (`07_*.sql`, `08_*.sql`, …)	anonymized prod snapshot, refreshed on demand	nightly `pg_dump` snapshot retained 14 days
Production	Same Postgres 16 on prod server	Same as staging; migration must be applied to staging first, observed for ≥ 24 h, then promoted to prod with operator approval	live	nightly `pg_dump` retained 30 days; weekly snapshot retained 12 weeks; off-host copy via `rsync`

Migration rules (cycle 1)

The project does NOT use an ORM migration framework (linq2db; restrictions.md). The conventions below replace it:

Numeric-prefix ordering — every new migration is added as env/db/NN_<description>.sql where NN continues the existing sequence. The current sequence is 01..06; the next is 07_*.sql.
Forward-only by default. Reversibility is provided by the off-host backup, NOT by hand-written DOWN scripts. The existing files (02_structure.sql, 03_add_timestamp_columns.sql, 04_detection_classes.sql, 06_users_email_unique.sql) follow this pattern; we keep it.
Backward-compatible deploys — every schema change must be safe to apply BEFORE the matching code is deployed (additive change → deploy code → cleanup change in a later release). The cycle 1 example: 06_users_email_unique.sql was applied first; the RegisterUser change to translate 23505 came after. AZ-197's User.Hardware column was kept as a tombstone instead of dropped, for the same reason.
Production migrations need approval — operator manually runs the SQL on prod after staging soak. No automatic CI execution against prod in cycle 1 (Drift J — automation is a future cycle's work).
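
The numeric-prefix rule can be checked mechanically when a new migration is added. This is a minimal sketch that scans a directory for `NN_*.sql` files and prints the next two-digit prefix; the function name `next_migration_prefix` is hypothetical and no such helper is claimed to exist in the repository.

```shell
# Print the next two-digit migration prefix for the directory given as $1.
next_migration_prefix() {
  dir=$1
  max=0
  for f in "$dir"/[0-9][0-9]_*.sql; do
    [ -e "$f" ] || continue        # glob matched nothing
    base=${f##*/}                  # strip the directory part
    n=${base%%_*}                  # keep the NN prefix
    n=${n#0}                       # drop one leading zero so 08/09 are not read as octal
    if [ "$n" -gt "$max" ]; then
      max=$n
    fi
  done
  printf '%02d\n' "$((max + 1))"
}

# Example: next_migration_prefix env/db
```

Only `.sql` files with a two-digit prefix are counted, so helper scripts such as `00_run_all.sh` do not affect the sequence.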

Drifts logged here

ID	Severity	Description	Resolved In
B	Medium	No secret manager (status quo: hand-edited host `.env`)	Resolved in spec — sops + age (§3); concrete files + script in Step 7
J	Low (NEW)	DB migrations applied manually on staging/prod; no automation	Carried forward to a future cycle

5. Self-verification

Four environments (Dev, Test/CI, Staging, Production) defined with purpose, infrastructure, and data source.
Environment variable sourcing matrix references .env.example (Step 1) without duplicating it.
No literal secrets in this document — only variable names and tool names.
Secret manager chosen for staging/production (sops + age) with rotation policy.
Database strategy per environment, including the explicit no-ORM-migrations convention.
