Azaion Admin API — Environment Strategy

Date: 2026-05-13 · Cycle: 1 · Status: planning artifact (no scripts; concrete wiring lands in Step 7).

1. Environments

| Environment | Purpose | Infrastructure | Data Source |
|---|---|---|---|
| Development | Local developer workflow on macOS / Linux. | Either bare `dotnet run` against host Postgres (port 4312) or the new `docker-compose.yml` planned in Step 2 (API + Postgres on a private Docker network). | Empty database; SQL files under `env/db/` create roles + schema; no fixtures. |
| Test (CI) | Black-box tests in CI and locally via `scripts/run-tests.sh`. | `docker-compose.test.yml`: API + Postgres + e2e-runner on a Docker network. | Functional fixtures from `e2e/db-init/00_run_all.sh` + `99_test_seed.sql`. |
| Staging | Pre-production validation. | Self-hosted Linux server, single Docker host, behind Nginx reverse proxy on stage.admin.azaion.com. Mirrors prod topology and Postgres major version. | Anonymized snapshot of production (PII scrubbed by an offline script before import). |
| Production | Live system. | Self-hosted Linux server, single Docker host, behind Nginx reverse proxy on admin.azaion.com. | Live data; daily off-host backups. |

Test is added as a first-class environment because cycle 1 already exercises it (docker-compose.test.yml). The deploy template lists three; we list four to match reality.

2. Environment Variables

Source of Truth

The complete variable inventory lives in .env.example at the repo root (Step 1, 24 entries). This document does NOT duplicate that table — it only specifies, per environment, where each variable is sourced.

Per-environment sourcing

| Variable group | Development | Test (CI) | Staging | Production |
|---|---|---|---|---|
| `ASPNETCORE_ENVIRONMENT` | `.env` (Development) | docker-compose `environment:` (Development) | docker-compose / `--env-file` (Staging) | docker-compose / `--env-file` (Production) |
| `ASPNETCORE_URLS` | `.env` | compose | host `.env` (rendered from sops) | host `.env` (rendered from sops) |
| `ConnectionStrings__*` | `.env` (real local creds) | compose (literal; accepted F-10) | sops-encrypted file in git → decrypted on host at deploy time | same as staging |
| `JwtConfig__Secret` | `.env` (dev-only literal) | compose (literal; accepted F-10) | sops-encrypted | sops-encrypted |
| `JwtConfig__{Issuer,Audience,Lifetime}` | appsettings defaults | appsettings defaults | host `.env` if non-default | host `.env` if non-default |
| `ResourcesConfig__*` | appsettings defaults | compose | host `.env` if non-default | host `.env` if non-default |
| `DEPLOY_*`, `REGISTRY_TAG` | `.env` (developer machine) | n/a | passed to `scripts/deploy.sh` from operator's shell or CI manual trigger | same as staging |
| `REGISTRY_USER`, `REGISTRY_TOKEN` | empty in dev `.env` | Woodpecker secrets `registry_user` / `registry_token` | Woodpecker secrets (CI deploy) or operator's shell (manual deploy) | same as staging |
| `CI_COMMIT_SHA` | unset → image label `unknown` | Woodpecker built-in | Woodpecker built-in | Woodpecker built-in |

Variable Validation (fail-fast)

The Admin API already does this for the most security-critical variable:

var jwtConfig = builder.Configuration.GetSection(nameof(JwtConfig)).Get<JwtConfig>();
if (jwtConfig == null || string.IsNullOrEmpty(jwtConfig.Secret))
    throw new Exception("Missing configuration section: JwtConfig");

The deploy plan adds the same fail-fast check for connection strings during Step 7 wiring (a one-time _ = configuration.GetConnectionString("AzaionDb") ?? throw … plus the same for AzaionDbAdmin, executed during WebApplication build). Without the check, a missing variable currently surfaces only on the first DB call, which is too late.
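
The same fail-fast idea could also be mirrored on the deploy side, before the container starts. The sketch below is an assumption, not part of the Step 7 plan: `require_env_keys` is a hypothetical helper, and the key names assume ASP.NET Core's usual double-underscore mapping (`ConnectionStrings__AzaionDb` for `GetConnectionString("AzaionDb")`). It only checks that each key is present with a non-empty value in a rendered env file.

```sh
# Hypothetical pre-flight helper (not in the repo): fail when any listed key
# is absent from the env file or has an empty value.
require_env_keys() {
  local env_file="$1" missing=0 key
  shift
  for key in "$@"; do
    # A key counts as present only if something follows the "=".
    if ! grep -Eq "^${key}=.+" "$env_file"; then
      echo "missing or empty: ${key}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A deploy script might call it as `require_env_keys "$rendered_env" ConnectionStrings__AzaionDb ConnectionStrings__AzaionDbAdmin JwtConfig__Secret || exit 1`, where `$rendered_env` is whatever file the script passes to `--env-file`. The in-process check remains the authoritative one; this only surfaces the problem one step earlier.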

Static / lookup-style variables (ResourcesConfig__*, JwtConfig__{Issuer,Audience,Lifetime}) keep their appsettings.json defaults in every environment unless an override is required. We do NOT add fail-fast checks for them.

3. Secrets Management

Decision

| Environment | Method | Tool |
|---|---|---|
| Development | `.env` file | committed `.env.example` + per-developer `.env` (git-ignored) |
| Test (CI) | docker-compose `environment:` literals | accepted as test-only (security audit F-10) |
| Staging | git-tracked encrypted file | sops + age |
| Production | git-tracked encrypted file | sops + age |

Why sops + age (not Vault, not Woodpecker secrets, not hand-edited .env)

Constraints: self-hosted, no cloud account, single ops engineer, currently hand-editing .env on the host.

| Option | Pros | Cons | Verdict |
|---|---|---|---|
| sops + age (chosen) | Secrets versioned in git, encrypted at rest, decrypted on the host with a single age key. No new infra. Works offline. | Requires per-environment age keypair stored on the host outside git. Manual key rotation. | pragmatic for this team size and topology |
| HashiCorp Vault (self-hosted) | Dynamic DB creds, audit log, fine-grained ACL, KV v2. | Adds a service to operate, monitor, back up. Single-engineer ops budget cannot absorb it now. | revisit in a future cycle when ops capacity grows |
| Woodpecker secrets exported into runtime container | Reuses existing secret store. | Couples runtime config to CI; secrets are not visible/auditable outside Woodpecker UI; cannot run the container outside CI without manually exporting them. | leaks the CI/runtime boundary |
| Hand-edited host `.env` (status quo) | Zero new tooling. | No version history, no encryption, no review trail. Single point of failure if the file is lost; security audit can't track changes. | status quo we are leaving behind (Drift B) |

sops + age conventions for this repo

secrets/
├── .sops.yaml                 # routes secrets/staging.env / production.env to the right age recipients
├── staging.env                # SOPS-encrypted; safe to commit
└── production.env             # SOPS-encrypted; safe to commit
  • .sops.yaml declares two age recipients: recipient_staging and recipient_production (public keys).
  • The matching age private keys live on each host at /etc/azaion/age.key, mode 0400, owned by root. They are NEVER committed.
  • scripts/deploy.sh (Step 7) runs SOPS_AGE_KEY_FILE=/etc/azaion/age.key sops -d secrets/${env}.env > /tmp/azaion.env and feeds it to docker run --env-file.
  • All staging/production env values that are NOT secret (e.g. DEPLOY_HOST_PORT, REGISTRY_TAG) live in plain-text secrets/staging.public.env / secrets/production.public.env next to the encrypted file, also git-tracked. Loaded before the decrypted overlay.
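
The "public values first, decrypted overlay second" rule can be sketched as a small merge step. This is a minimal illustration under assumptions: `merge_env_files` and `render_env` are hypothetical names, the real wiring is deferred to Step 7, and unlike the `/tmp/azaion.env` flow described above, this variant pipes the decrypted values instead of writing them to a file.

```sh
# Hypothetical helper: merge KEY=VALUE files in argument order.
# Later files override earlier ones; keys keep their first-seen position.
merge_env_files() {
  awk '
    /^[ \t]*(#|$)/ { next }                  # skip comments and blank lines
    {
      eq = index($0, "=")
      if (eq == 0) next                      # ignore lines without KEY=VALUE shape
      key = substr($0, 1, eq - 1)
      if (!(key in value)) order[++n] = key  # remember first-seen order
      value[key] = substr($0, eq + 1)        # last occurrence wins
    }
    END { for (i = 1; i <= n; i++) print order[i] "=" value[order[i]] }
  ' "$@"
}

# Hypothetical wrapper: decrypt secrets/<env>.env with the host age key and
# print the merged result. Paths follow the conventions listed above.
render_env() {
  local env_name="$1" secrets
  secrets=$(SOPS_AGE_KEY_FILE=/etc/azaion/age.key sops -d "secrets/${env_name}.env") || return 1
  # "-" makes awk read the decrypted overlay from stdin after the public file.
  printf '%s\n' "$secrets" | merge_env_files "secrets/${env_name}.public.env" -
}
```

Doing the merge explicitly keeps the override order visible in the script rather than relying on how the container runtime treats duplicate keys.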

Rotation policy

| Secret | Rotation cadence | Procedure |
|---|---|---|
| Postgres `azaion_admin` / `azaion_reader` passwords | every 90 days, on operator schedule | `ALTER ROLE … WITH PASSWORD …` → re-encrypt `production.env` → `scripts/deploy.sh` |
| JWT `JwtConfig__Secret` | every 180 days, AND on any suspected leak | re-encrypt → deploy. All issued tokens become invalid, so communicate a maintenance window. |
| `azaion_superadmin` password | every 365 days, AND on owner change | manual; not used by the running app, only by DB migrations |
| Registry `REGISTRY_TOKEN` | every 90 days OR on CI compromise | rotate registry credential → update Woodpecker secret `registry_token` → re-encrypt `production.env` if also referenced there |
| age private key (`/etc/azaion/age.key`) | every 365 days OR on host compromise | generate new key → add public recipient to `.sops.yaml` → `sops updatekeys secrets/*.env` → distribute new private key out-of-band → remove old recipient |
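
The cadences above are plain day counts, so a reminder check is easy to script. The sketch below is illustrative only: `rotation_due` is a hypothetical helper, and where the "last rotated" timestamp is recorded is left open because the document does not specify it.

```sh
# Hypothetical helper: succeed (exit 0) when a secret is due for rotation.
#   $1  last rotation time, seconds since the Unix epoch
#   $2  cadence in days (90, 180 or 365 in the table above)
#   $3  optional "now" in epoch seconds; defaults to the current time
rotation_due() {
  local last_epoch="$1" cadence_days="$2" now_epoch="${3:-$(date +%s)}"
  local age_days=$(( (now_epoch - last_epoch) / 86400 ))
  [ "$age_days" -ge "$cadence_days" ]
}
```

For example, `rotation_due "$last" 180 && echo "JwtConfig__Secret is due"` would flag the JWT secret once 180 full days have passed. The event-based triggers (suspected leak, host compromise, owner change) still need a human decision.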

4. Database Management

| Environment | Type | Migrations | Data | Backup |
|---|---|---|---|---|
| Development | Local Postgres on host (port 4312) or dockerized Postgres from `docker-compose.yml` | `env/db/*.sql` applied manually by developer the first time, then `*_users_email_unique.sql`-style additive scripts run with `psql` on demand | empty | none |
| Test (CI) | Postgres 16-alpine from `docker-compose.test.yml` | `env/db/*.sql` mounted into `/docker-entrypoint-initdb.d/sql/`, ordered by `00_run_all.sh` | `99_test_seed.sql` (functional) + 500 perf users injected by `scripts/run-performance-tests.sh` when needed | none (`down -v` between runs) |
| Staging | Same Postgres major (16) on the staging server, port 4312, `azaion` database | `env/db/*.sql` applied manually under change control via `psql -U azaion_superadmin`. New migrations land in the same numeric-prefix sequence (`07_*.sql`, `08_*.sql`, …) | anonymized prod snapshot, refreshed on demand | nightly `pg_dump` snapshot retained 14 days |
| Production | Same Postgres 16 on prod server | Same as staging; migration must be applied to staging first, observed for ≥ 24 h, then promoted to prod with operator approval | live | nightly `pg_dump` retained 30 days; weekly snapshot retained 12 weeks; off-host copy via `rsync` |
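
The retention windows in the Backup column could be enforced with a small pruning step after each nightly dump. This is a sketch under assumptions: `prune_backups` is a hypothetical helper, and the `*.sql.gz` naming and flat backup directory are illustrative choices the document does not specify.

```sh
# Hypothetical helper: delete dump files older than the retention window.
#   $1  directory that holds the dumps
#   $2  days to keep (14 for staging, 30 for production nightlies)
prune_backups() {
  local dir="$1" keep_days="$2"
  # -mtime +N matches files whose age in whole 24-hour periods exceeds N.
  find "$dir" -maxdepth 1 -type f -name '*.sql.gz' -mtime +"$keep_days" -print -delete
}
```

The function prints each file it removes, which gives the operator a simple log of what was pruned. Weekly snapshots with their 12-week window would need a separate directory or name pattern so that the nightly rule does not delete them.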

Migration rules (cycle 1)

The project does NOT use an ORM migration framework (the data layer is linq2db; see restrictions.md). The conventions below take its place:

  1. Numeric-prefix ordering — every new migration is added as env/db/NN_<description>.sql where NN continues the existing sequence. The current sequence is 01..06; the next is 07_*.sql.
  2. Forward-only by default. Reversibility is provided by the off-host backup, NOT by hand-written DOWN scripts. The existing files (02_structure.sql, 03_add_timestamp_columns.sql, 04_detection_classes.sql, 06_users_email_unique.sql) follow this pattern; we keep it.
  3. Backward-compatible deploys — every schema change must be safe to apply BEFORE the matching code is deployed (additive change → deploy code → cleanup change in a later release). The cycle 1 example: 06_users_email_unique.sql was applied first; the RegisterUser change to translate 23505 came after. AZ-197's User.Hardware column was kept as a tombstone instead of dropped, for the same reason.
  4. Production migrations need approval — operator manually runs the SQL on prod after staging soak. No automatic CI execution against prod in cycle 1 (Drift J — automation is a future cycle's work).
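
The numeric-prefix rule in item 1 can be checked mechanically. The sketch below assumes two-digit prefixes as in the existing files; `next_migration_prefix` is a hypothetical helper, not something the repo provides.

```sh
# Hypothetical helper: print the next free two-digit prefix for a directory
# of NN_<description>.sql files (for example env/db).
next_migration_prefix() {
  local dir="$1" max=0 file base num
  for file in "$dir"/[0-9][0-9]_*.sql; do
    [ -e "$file" ] || continue   # the glob stays literal when nothing matches
    base="${file##*/}"           # strip the directory part
    num="${base%%_*}"            # keep the two-digit prefix, e.g. "06"
    num="${num#0}"               # drop one leading zero so "08" is not read as octal
    if [ "$num" -gt "$max" ]; then max="$num"; fi
  done
  printf '%02d\n' "$((max + 1))"
}
```

With the current sequence ending at `06_users_email_unique.sql`, running `next_migration_prefix env/db` would print `07`. Gaps in the sequence are ignored, because only the highest prefix matters for ordering.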

Drifts logged here

| ID | Severity | Description | Resolved In |
|---|---|---|---|
| B | Medium | No secret manager (status quo: hand-edited host `.env`) | Resolved in spec: sops + age (§3); concrete files + script in Step 7 |
| J | Low (NEW) | DB migrations applied manually on staging/prod; no automation | Carried forward to a future cycle |

5. Self-verification

  • Four environments (Dev, Test/CI, Staging, Production) defined with purpose, infrastructure, and data source.
  • Environment variable sourcing matrix references .env.example (Step 1) without duplicating it.
  • No literal secrets in this document — only variable names and tool names.
  • Secret manager chosen for staging/production (sops + age) with rotation policy.
  • Database strategy per environment, including the explicit no-ORM-migrations convention.