# Azaion Admin API — Environment Strategy

**Date**: 2026-05-13 · **Cycle**: 1 · **Status**: planning artifact (no scripts; concrete wiring lands in Step 7).

## 1. Environments

| Environment | Purpose | Infrastructure | Data Source |
|-------------|---------|----------------|-------------|
| **Development** | Local developer workflow on macOS / Linux. | Either bare `dotnet run` against host Postgres (port 4312) **or** the new `docker-compose.yml` planned in Step 2 (API + Postgres on a private Docker network). | Empty database; SQL files under `env/db/` create roles + schema; no fixtures. |
| **Test (CI)** | Black-box tests in CI and locally via `scripts/run-tests.sh`. | `docker-compose.test.yml` — API + Postgres + e2e-runner on a Docker network. | Functional fixtures from `e2e/db-init/00_run_all.sh` + `99_test_seed.sql`. |
| **Staging** | Pre-production validation. | Self-hosted Linux server, single Docker host, behind Nginx reverse proxy on `stage.admin.azaion.com`. Mirrors prod topology and Postgres major version. | Anonymized snapshot of production (PII scrubbed by an offline script before import). |
| **Production** | Live system. | Self-hosted Linux server, single Docker host, behind Nginx reverse proxy on `admin.azaion.com`. | Live data; daily off-host backups. |

> Test is added as a first-class environment because cycle 1 already exercises it (`docker-compose.test.yml`). The deploy template lists three; we list four to match reality.

## 2. Environment Variables

### Source of Truth

The complete variable inventory lives in `.env.example` at the repo root (Step 1, 24 entries). This document does NOT duplicate that table — it only specifies, per environment, **where each variable is sourced**.

### Per-environment sourcing

| Variable group | Development | Test (CI) | Staging | Production |
|----------------|-------------|-----------|---------|------------|
| `ASPNETCORE_ENVIRONMENT` | `.env` (`Development`) | docker-compose `environment:` (`Development`) | docker-compose / `--env-file` (`Staging`) | docker-compose / `--env-file` (`Production`) |
| `ASPNETCORE_URLS` | `.env` | compose | host `.env` (rendered from sops) | host `.env` (rendered from sops) |
| `ConnectionStrings__*` | `.env` (real local creds) | compose (literal — accepted F-10) | **sops-encrypted file in git** → decrypted on host at deploy time | same as staging |
| `JwtConfig__Secret` | `.env` (dev-only literal) | compose (literal — accepted F-10) | **sops-encrypted** | **sops-encrypted** |
| `JwtConfig__{Issuer,Audience,Lifetime}` | appsettings defaults | appsettings defaults | host `.env` if non-default | host `.env` if non-default |
| `ResourcesConfig__*` | appsettings defaults | compose | host `.env` if non-default | host `.env` if non-default |
| `DEPLOY_*`, `REGISTRY_TAG` | `.env` (developer machine) | n/a | passed to `scripts/deploy.sh` from operator's shell or CI manual trigger | same |
| `REGISTRY_USER`, `REGISTRY_TOKEN` | empty in dev `.env` | Woodpecker secrets `registry_user` / `registry_token` | Woodpecker secrets (CI deploy) or operator's shell (manual deploy) | same |
| `CI_COMMIT_SHA` | unset → image label `unknown` | Woodpecker built-in | Woodpecker built-in | Woodpecker built-in |

### Variable Validation (fail-fast)

The Admin API already does this for the most security-critical variable:

```csharp
var jwtConfig = builder.Configuration.GetSection(nameof(JwtConfig)).Get<JwtConfig>();
if (jwtConfig == null || string.IsNullOrEmpty(jwtConfig.Secret))
    throw new Exception("Missing configuration section: JwtConfig");
```

The deploy plan **adds** the same fail-fast check for connection strings during Step 7 wiring (a one-time `_ = configuration.GetConnectionString("AzaionDb") ?? throw …` plus the same for `AzaionDbAdmin`, executed during `WebApplication` build). Without the check, a missing variable currently surfaces only on the first DB call, which is too late.

> Static / lookup-style variables (`ResourcesConfig__*`, `JwtConfig__{Issuer,Audience,Lifetime}`) keep their `appsettings.json` defaults in every environment unless an override is required. We do NOT add fail-fast checks for them.

## 3. Secrets Management

### Decision

| Environment | Method | Tool |
|-------------|--------|------|
| Development | `.env` file | committed `.env.example` + per-developer `.env` (git-ignored) |
| Test (CI) | docker-compose `environment:` literals | accepted as test-only (security audit F-10) |
| Staging | git-tracked encrypted file | **sops + age** |
| Production | git-tracked encrypted file | **sops + age** |

### Why sops + age (not Vault, not Woodpecker secrets, not hand-edited `.env`)

Constraints: self-hosted, no cloud account, single ops engineer, currently hand-editing `.env` on the host.

| Option | Pros | Cons | Verdict |
|--------|------|------|---------|
| sops + age (chosen) | Secrets versioned in git, encrypted at rest, decrypted on the host with a single age key. No new infra. Works offline. | Requires per-environment age keypair stored on the host outside git. Manual key rotation. | ✅ pragmatic for this team size and topology |
| HashiCorp Vault (self-hosted) | Dynamic DB creds, audit log, fine-grained ACL, KV v2. | Adds a service to operate, monitor, back up. Single-engineer ops budget cannot absorb it now. | ⏳ revisit in a future cycle when ops capacity grows |
| Woodpecker secrets exported into runtime container | Reuses existing secret store. | Couples runtime config to CI; secrets are not visible/auditable outside Woodpecker UI; cannot run the container outside CI without manually exporting them. | ❌ leaks the CI/runtime boundary |
| Hand-edited host `.env` (status quo) | Zero new tooling. | No version history, no encryption, no review trail. Single point of failure if the file is lost; security audit can't track changes. | ❌ status quo we are leaving behind (Drift B) |

### sops + age conventions for this repo

```
secrets/
├── .sops.yaml                 # routes secrets/staging.env / production.env to the right age recipients
├── staging.env                # SOPS-encrypted; safe to commit
└── production.env             # SOPS-encrypted; safe to commit
```

- `.sops.yaml` declares two age recipients: `recipient_staging` and `recipient_production` (public keys).
- The matching age **private** keys live on each host at `/etc/azaion/age.key`, mode `0400`, owned by root. They are NEVER committed.
- `scripts/deploy.sh` (Step 7) runs `SOPS_AGE_KEY_FILE=/etc/azaion/age.key sops -d secrets/${env}.env > /tmp/azaion.env` and feeds it to `docker run --env-file`.
- All staging/production env values that are NOT secret (e.g. `DEPLOY_HOST_PORT`, `REGISTRY_TAG`) live in plain-text `secrets/staging.public.env` / `secrets/production.public.env` next to the encrypted file, also git-tracked. Loaded before the decrypted overlay.

### Rotation policy

| Secret | Rotation cadence | Procedure |
|--------|------------------|-----------|
| Postgres `azaion_admin` / `azaion_reader` passwords | every 90 days, on operator schedule | `ALTER ROLE … WITH PASSWORD …` → re-encrypt `production.env` → `scripts/deploy.sh` |
| JWT `JwtConfig__Secret` | every 180 days, AND on any suspected leak | re-encrypt → deploy. **All issued tokens become invalid** — communicate maintenance window. |
| `azaion_superadmin` password | every 365 days, AND on owner change | manual; not used by the running app, only by DB migrations |
| Registry `REGISTRY_TOKEN` | every 90 days OR on CI compromise | rotate registry credential → update Woodpecker secret `registry_token` → re-encrypt `production.env` if also referenced there |
| age private key (`/etc/azaion/age.key`) | every 365 days OR on host compromise | generate new key → add public recipient to `.sops.yaml` → `sops updatekeys secrets/*.env` → distribute new private key out-of-band → remove old recipient |

## 4. Database Management

| Environment | Type | Migrations | Data | Backup |
|-------------|------|------------|------|--------|
| Development | Local Postgres on host (port 4312) **or** dockerized Postgres from `docker-compose.yml` | `env/db/*.sql` applied manually by developer the first time, then `*_users_email_unique.sql`-style additive scripts run with `psql` on demand | empty | none |
| Test (CI) | Postgres 16-alpine from `docker-compose.test.yml` | `env/db/*.sql` mounted into `/docker-entrypoint-initdb.d/sql/`, ordered by `00_run_all.sh` | `99_test_seed.sql` (functional) + 500 perf users injected by `scripts/run-performance-tests.sh` when needed | none — `down -v` between runs |
| Staging | Same Postgres major (16) on the staging server, port 4312, `azaion` database | `env/db/*.sql` applied **manually under change control** via `psql -U azaion_superadmin`. New migrations land in the same numeric-prefix sequence (`07_*.sql`, `08_*.sql`, …) | anonymized prod snapshot, refreshed on demand | nightly `pg_dump` snapshot retained 14 days |
| Production | Same Postgres 16 on prod server | Same as staging; **migration must be applied to staging first**, observed for ≥ 24 h, then promoted to prod with operator approval | live | nightly `pg_dump` retained 30 days; weekly snapshot retained 12 weeks; off-host copy via `rsync` |

### Migration rules (cycle 1)

The project does NOT use an ORM migration framework (linq2db; restrictions.md). The conventions below replace it:

1. **Numeric-prefix ordering** — every new migration is added as `env/db/NN_<description>.sql` where `NN` continues the existing sequence. The current sequence is `01..06`; the next is `07_*.sql`.
2. **Forward-only by default**. Reversibility is provided by the off-host backup, NOT by hand-written DOWN scripts. The existing files (`02_structure.sql`, `03_add_timestamp_columns.sql`, `04_detection_classes.sql`, `06_users_email_unique.sql`) follow this pattern; we keep it.
3. **Backward-compatible deploys** — every schema change must be safe to apply BEFORE the matching code is deployed (additive change → deploy code → cleanup change in a later release). The cycle 1 example: `06_users_email_unique.sql` was applied first; the `RegisterUser` change to translate `23505` came after. AZ-197's `User.Hardware` column was kept as a tombstone instead of dropped, for the same reason.
4. **Production migrations need approval** — operator manually runs the SQL on prod after staging soak. No automatic CI execution against prod in cycle 1 (Drift J — automation is a future cycle's work).

### Drifts logged here

| ID | Severity | Description | Resolved In |
|----|----------|-------------|-------------|
| B  | Medium | No secret manager (status quo: hand-edited host `.env`) | **Resolved in spec** — sops + age (§3); concrete files + script in Step 7 |
| J  | Low (NEW) | DB migrations applied manually on staging/prod; no automation | **Carried forward** to a future cycle |

## 5. Self-verification

- [x] Four environments (Dev, Test/CI, Staging, Production) defined with purpose, infrastructure, and data source.
- [x] Environment variable sourcing matrix references `.env.example` (Step 1) without duplicating it.
- [x] No literal secrets in this document — only variable names and tool names.
- [x] Secret manager chosen for staging/production (sops + age) with rotation policy.
- [x] Database strategy per environment, including the explicit no-ORM-migrations convention.