# Azaion Admin API — Environment Strategy **Date**: 2026-05-13 · **Cycle**: 1 · **Status**: planning artifact (no scripts; concrete wiring lands in Step 7). ## 1. Environments | Environment | Purpose | Infrastructure | Data Source | |-------------|---------|----------------|-------------| | **Development** | Local developer workflow on macOS / Linux. | Either bare `dotnet run` against host Postgres (port 4312) **or** the new `docker-compose.yml` planned in Step 2 (API + Postgres on a private Docker network). | Empty database; SQL files under `env/db/` create roles + schema; no fixtures. | | **Test (CI)** | Black-box tests in CI and locally via `scripts/run-tests.sh`. | `docker-compose.test.yml` — API + Postgres + e2e-runner on a Docker network. | Functional fixtures from `e2e/db-init/00_run_all.sh` + `99_test_seed.sql`. | | **Staging** | Pre-production validation. | Self-hosted Linux server, single Docker host, behind Nginx reverse proxy on `stage.admin.azaion.com`. Mirrors prod topology and Postgres major version. | Anonymized snapshot of production (PII scrubbed by an offline script before import). | | **Production** | Live system. | Self-hosted Linux server, single Docker host, behind Nginx reverse proxy on `admin.azaion.com`. | Live data; daily off-host backups. | > Test is added as a first-class environment because cycle 1 already exercises it (`docker-compose.test.yml`). The deploy template lists three; we list four to match reality. ## 2. Environment Variables ### Source of Truth The complete variable inventory lives in `.env.example` at the repo root (Step 1, 24 entries). This document does NOT duplicate that table — it only specifies, per environment, **where each variable is sourced**. ### Per-environment sourcing | Variable group | Development | Test (CI) | Staging | Production | |----------------|-------------|-----------|---------|------------| | `ASPNETCORE_ENVIRONMENT` | `.env` (`Development`) | docker-compose `environment:` (`Development`) | docker-compose / `--env-file` (`Staging`) | docker-compose / `--env-file` (`Production`) | | `ASPNETCORE_URLS` | `.env` | compose | host `.env` (rendered from sops) | host `.env` (rendered from sops) | | `ConnectionStrings__*` | `.env` (real local creds) | compose (literal — accepted F-10) | **sops-encrypted file in git** → decrypted on host at deploy time | same as staging | | `JwtConfig__Secret` | `.env` (dev-only literal) | compose (literal — accepted F-10) | **sops-encrypted** | **sops-encrypted** | | `JwtConfig__{Issuer,Audience,Lifetime}` | appsettings defaults | appsettings defaults | host `.env` if non-default | host `.env` if non-default | | `ResourcesConfig__*` | appsettings defaults | compose | host `.env` if non-default | host `.env` if non-default | | `DEPLOY_*`, `REGISTRY_TAG` | `.env` (developer machine) | n/a | passed to `scripts/deploy.sh` from operator's shell or CI manual trigger | same | | `REGISTRY_USER`, `REGISTRY_TOKEN` | empty in dev `.env` | Woodpecker secrets `registry_user` / `registry_token` | Woodpecker secrets (CI deploy) or operator's shell (manual deploy) | same | | `CI_COMMIT_SHA` | unset → image label `unknown` | Woodpecker built-in | Woodpecker built-in | Woodpecker built-in | ### Variable Validation (fail-fast) The Admin API already does this for the most security-critical variable: ```csharp var jwtConfig = builder.Configuration.GetSection(nameof(JwtConfig)).Get(); if (jwtConfig == null || string.IsNullOrEmpty(jwtConfig.Secret)) throw new Exception("Missing configuration section: JwtConfig"); ``` The deploy plan **adds** the same fail-fast check for connection strings during Step 7 wiring (a one-time `_ = configuration.GetConnectionString("AzaionDb") ?? throw …` plus the same for `AzaionDbAdmin`, executed during `WebApplication` build). Without the check, a missing variable currently surfaces only on the first DB call, which is too late. > Static / lookup-style variables (`ResourcesConfig__*`, `JwtConfig__{Issuer,Audience,Lifetime}`) keep their `appsettings.json` defaults in every environment unless an override is required. We do NOT add fail-fast checks for them. ## 3. Secrets Management ### Decision | Environment | Method | Tool | |-------------|--------|------| | Development | `.env` file | committed `.env.example` + per-developer `.env` (git-ignored) | | Test (CI) | docker-compose `environment:` literals | accepted as test-only (security audit F-10) | | Staging | git-tracked encrypted file | **sops + age** | | Production | git-tracked encrypted file | **sops + age** | ### Why sops + age (not Vault, not Woodpecker secrets, not hand-edited `.env`) Constraints: self-hosted, no cloud account, single ops engineer, currently hand-editing `.env` on the host. | Option | Pros | Cons | Verdict | |--------|------|------|---------| | sops + age (chosen) | Secrets versioned in git, encrypted at rest, decrypted on the host with a single age key. No new infra. Works offline. | Requires per-environment age keypair stored on the host outside git. Manual key rotation. | ✅ pragmatic for this team size and topology | | HashiCorp Vault (self-hosted) | Dynamic DB creds, audit log, fine-grained ACL, KV v2. | Adds a service to operate, monitor, back up. Single-engineer ops budget cannot absorb it now. | ⏳ revisit in a future cycle when ops capacity grows | | Woodpecker secrets exported into runtime container | Reuses existing secret store. | Couples runtime config to CI; secrets are not visible/auditable outside Woodpecker UI; cannot run the container outside CI without manually exporting them. | ❌ leaks the CI/runtime boundary | | Hand-edited host `.env` (status quo) | Zero new tooling. | No version history, no encryption, no review trail. Single point of failure if the file is lost; security audit can't track changes. | ❌ status quo we are leaving behind (Drift B) | ### sops + age conventions for this repo ``` secrets/ ├── .sops.yaml # routes secrets/staging.env / production.env to the right age recipients ├── staging.env # SOPS-encrypted; safe to commit └── production.env # SOPS-encrypted; safe to commit ``` - `.sops.yaml` declares two age recipients: `recipient_staging` and `recipient_production` (public keys). - The matching age **private** keys live on each host at `/etc/azaion/age.key`, mode `0400`, owned by root. They are NEVER committed. - `scripts/deploy.sh` (Step 7) runs `SOPS_AGE_KEY_FILE=/etc/azaion/age.key sops -d secrets/${env}.env > /tmp/azaion.env` and feeds it to `docker run --env-file`. - All staging/production env values that are NOT secret (e.g. `DEPLOY_HOST_PORT`, `REGISTRY_TAG`) live in plain-text `secrets/staging.public.env` / `secrets/production.public.env` next to the encrypted file, also git-tracked. Loaded before the decrypted overlay. ### Rotation policy | Secret | Rotation cadence | Procedure | |--------|------------------|-----------| | Postgres `azaion_admin` / `azaion_reader` passwords | every 90 days, on operator schedule | `ALTER ROLE … WITH PASSWORD …` → re-encrypt `production.env` → `scripts/deploy.sh` | | JWT `JwtConfig__Secret` | every 180 days, AND on any suspected leak | re-encrypt → deploy. **All issued tokens become invalid** — communicate maintenance window. | | `azaion_superadmin` password | every 365 days, AND on owner change | manual; not used by the running app, only by DB migrations | | Registry `REGISTRY_TOKEN` | every 90 days OR on CI compromise | rotate registry credential → update Woodpecker secret `registry_token` → re-encrypt `production.env` if also referenced there | | age private key (`/etc/azaion/age.key`) | every 365 days OR on host compromise | generate new key → add public recipient to `.sops.yaml` → `sops updatekeys secrets/*.env` → distribute new private key out-of-band → remove old recipient | ## 4. Database Management | Environment | Type | Migrations | Data | Backup | |-------------|------|------------|------|--------| | Development | Local Postgres on host (port 4312) **or** dockerized Postgres from `docker-compose.yml` | `env/db/*.sql` applied manually by developer the first time, then `*_users_email_unique.sql`-style additive scripts run with `psql` on demand | empty | none | | Test (CI) | Postgres 16-alpine from `docker-compose.test.yml` | `env/db/*.sql` mounted into `/docker-entrypoint-initdb.d/sql/`, ordered by `00_run_all.sh` | `99_test_seed.sql` (functional) + 500 perf users injected by `scripts/run-performance-tests.sh` when needed | none — `down -v` between runs | | Staging | Same Postgres major (16) on the staging server, port 4312, `azaion` database | `env/db/*.sql` applied **manually under change control** via `psql -U azaion_superadmin`. New migrations land in the same numeric-prefix sequence (`07_*.sql`, `08_*.sql`, …) | anonymized prod snapshot, refreshed on demand | nightly `pg_dump` snapshot retained 14 days | | Production | Same Postgres 16 on prod server | Same as staging; **migration must be applied to staging first**, observed for ≥ 24 h, then promoted to prod with operator approval | live | nightly `pg_dump` retained 30 days; weekly snapshot retained 12 weeks; off-host copy via `rsync` | ### Migration rules (cycle 1) The project does NOT use an ORM migration framework (linq2db; restrictions.md). The conventions below replace it: 1. **Numeric-prefix ordering** — every new migration is added as `env/db/NN_.sql` where `NN` continues the existing sequence. The current sequence is `01..06`; the next is `07_*.sql`. 2. **Forward-only by default**. Reversibility is provided by the off-host backup, NOT by hand-written DOWN scripts. The existing files (`02_structure.sql`, `03_add_timestamp_columns.sql`, `04_detection_classes.sql`, `06_users_email_unique.sql`) follow this pattern; we keep it. 3. **Backward-compatible deploys** — every schema change must be safe to apply BEFORE the matching code is deployed (additive change → deploy code → cleanup change in a later release). The cycle 1 example: `06_users_email_unique.sql` was applied first; the `RegisterUser` change to translate `23505` came after. AZ-197's `User.Hardware` column was kept as a tombstone instead of dropped, for the same reason. 4. **Production migrations need approval** — operator manually runs the SQL on prod after staging soak. No automatic CI execution against prod in cycle 1 (Drift J — automation is a future cycle's work). ### Drifts logged here | ID | Severity | Description | Resolved In | |----|----------|-------------|-------------| | B | Medium | No secret manager (status quo: hand-edited host `.env`) | **Resolved in spec** — sops + age (§3); concrete files + script in Step 7 | | J | Low (NEW) | DB migrations applied manually on staging/prod; no automation | **Carried forward** to a future cycle | ## 5. Self-verification - [x] Four environments (Dev, Test/CI, Staging, Production) defined with purpose, infrastructure, and data source. - [x] Environment variable sourcing matrix references `.env.example` (Step 1) without duplicating it. - [x] No literal secrets in this document — only variable names and tool names. - [x] Secret manager chosen for staging/production (sops + age) with rotation policy. - [x] Database strategy per environment, including the explicit no-ORM-migrations convention.