admin/_docs/05_security/infrastructure_review_cycle2.md
Oleksandr Bezdieniezhnykh 1bdbe8c96d [AZ-529] [AZ-530] Cycle-2 security audit reports
Step 14 (Security Audit) output for cycle 2. Verdict: FAIL — 2 Critical
(F-INFRA-1, F-INFRA-2) + 4 High (F-INFRA-3, F-INFRA-4, F-AUTH-1,
F-AUTH-2) block deploy. 13 cycle-2 findings total; cycle-1 closures
confirmed for F-6, F-7, F-8, F-13, A09.

Files:
- security_report_cycle2.md (delta on cycle-1 report; FAIL verdict,
  tracker follow-ups filed as AZ-552..AZ-557 + 9 deferred Medium/Low)
- owasp_review_cycle2.md (A01..A09 delta; 2 FAIL / 2 PASS_W_W / 5 PASS)
- static_analysis_cycle2.md (F-AUTH-1..9 with locations + remediation)
- infrastructure_review_cycle2.md (F-INFRA-1..6 with locations
  + remediation)
- dependency_scan_cycle2.md (no new CVEs; cycle-1 deprecations re-flagged)

Cycle-1 reports remain authoritative for non-cycle-2 surface.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-14 09:23:02 +03:00


Infrastructure & Configuration Review — Cycle 2 (Delta)

Date: 2026-05-14

Scope: cycle-2 surface only (Program.cs middleware/CORS/HSTS, DataProtection wiring, ES256 key store, secrets/, scripts/deploy.sh + start-services.sh + _lib.sh + health-check.sh, docker-compose.test.yml, .env.example, .woodpecker/). Cycle-1 categories that did not change since infrastructure_review.md are out of scope here.

Read order: this file is a delta on infrastructure_review.md. Categories not listed here keep their cycle-1 status.

Cycle-2 Findings

F-2026Q2-INFRA-1: Deploy script hard-blocks on obsolete JwtConfig__Secret (CRITICAL — deploy-blocking)

  • Location: scripts/start-services.sh:32.
  • Description: start-services.sh does require_env ... ASPNETCORE_JwtConfig__Secret, which kills the deploy if the variable is not set. AZ-532 removed JwtConfig.Secret entirely — Program.cs:60-83 configures JwtBearer against IssuerSigningKeyResolver backed by JwtSigningKeyProvider (ES256 PEMs). A correctly-configured cycle-2 deploy that follows .env.example (which does NOT include JwtConfig__Secret) will fail at the deploy-script preflight.
  • Impact: cycle-2 cannot deploy with the current scripts. Either the deploy fails on preflight, or the operator sets a dummy JwtConfig__Secret=dummy to get past the check — and then we hit F-INFRA-2 below.
  • Remediation: replace the line with:
    require_env ... ASPNETCORE_JwtConfig__KeysFolder ASPNETCORE_JwtConfig__ActiveKid
    
    Drop ASPNETCORE_JwtConfig__Secret from the schema in secrets/README.md and from any documented env templates.
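
The preflight change above can be sketched as follows. This is a minimal illustration, not the project's code: the require_env body here is a hypothetical stand-in (the real helper presumably lives in scripts/_lib.sh and may behave differently), preflight_jwt_env is an invented wrapper name, and the other variables elided as "..." in the original call are deliberately not reproduced.

```bash
# Hypothetical stand-in for the require_env helper; the real one may differ.
# Returns non-zero after naming every variable that is unset or empty.
require_env() {
  local missing=0 name
  for name in "$@"; do
    # ${!name:-} is bash indirect expansion: the value of the variable named by $name.
    if [ -z "${!name:-}" ]; then
      echo "missing required env var: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Cycle-2 JWT preflight: the ES256 settings replace the obsolete JwtConfig__Secret.
preflight_jwt_env() {
  require_env ASPNETCORE_JwtConfig__KeysFolder ASPNETCORE_JwtConfig__ActiveKid
}
```

Reporting every missing variable before failing, rather than stopping at the first, saves the operator one deploy attempt per missing variable.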

F-2026Q2-INFRA-2: ES256 keys folder not bind-mounted into container (CRITICAL — deploy-blocking)

  • Location: scripts/start-services.sh:48-56 (the docker run line).
  • Description: JwtConfig.KeysFolder defaults to secrets/jwt-keys (relative path, per appsettings.json:15). Inside the container, this resolves under /app/. The Dockerfile does not COPY secrets/, and start-services.sh does not add a --volume mapping for the host secrets/jwt-keys directory. Result: JwtSigningKeyProvider.Load (cycle-2 ctor) fails fast at startup with "no PEM files found".
  • Impact: container restart loop on every cycle-2 deploy. The only way to bring it up today is to manually docker cp the PEMs into the container, which defeats reproducibility and leaves no rotation story.
  • Remediation: add a host-side directory (e.g. /etc/azaion/jwt-keys owned by the runtime user, mode 0700, PEMs mode 0400) and a corresponding --volume "$DEPLOY_HOST_JWT_KEYS_DIR:/etc/azaion/jwt-keys:ro" line in start-services.sh. Set ASPNETCORE_JwtConfig__KeysFolder=/etc/azaion/jwt-keys in the public env overlay. Document the host-side procedure in secrets/README.md and _docs/04_deploy/.
  • Cross-ref: e2e/test-keys is correctly mounted in docker-compose.test.yml:42 — the test stack works; only the prod deploy script is broken.
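
A deploy-side check plus the bind-mount arguments might look like the sketch below. The function names are hypothetical, and the container path /etc/azaion/jwt-keys is the one suggested in the remediation rather than anything that exists in the script today.

```bash
# Fails unless the host directory exists and holds at least one *.pem file.
check_jwt_keys_dir() {
  local dir="$1" pem
  [ -d "$dir" ] || { echo "JWT keys dir not found: $dir" >&2; return 1; }
  for pem in "$dir"/*.pem; do
    # An unmatched glob stays literal, so -f also covers the empty-directory case.
    [ -f "$pem" ] && return 0
  done
  echo "no PEM files found in $dir" >&2
  return 1
}

# Prints the read-only bind-mount arguments for docker run, one per line.
jwt_keys_volume_args() {
  local host_dir="$1"
  printf '%s\n' --volume "${host_dir}:/etc/azaion/jwt-keys:ro"
}
```

In start-services.sh the two lines printed by jwt_keys_volume_args could be read into an array (for example with mapfile -t) and expanded into the docker run command after check_jwt_keys_dir "$DEPLOY_HOST_JWT_KEYS_DIR" has passed. Running the check on the host turns the startup crash loop into a preflight failure with a clear message.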

F-2026Q2-INFRA-3: DataProtection key store ephemeral by default — MFA secrets unrecoverable across restarts (HIGH)

  • Location: Program.cs:147-160, scripts/start-services.sh (no DataProtection bind-mount), secrets/README.md (no entry).
  • Description: AZ-534 wraps MfaSecret ciphertext with IDataProtector (Azaion.Mfa.Secret.v1 purpose). When DataProtection:KeysFolder is unset, ASP.NET Core writes its master keys to a default per-user path inside the container (%LOCALAPPDATA%/ASP.NET/DataProtection-Keys on Windows, ~/.aspnet/DataProtection-Keys on Linux), which is lost whenever the container is replaced (for example by a docker stop && docker run redeploy). After the first such restart, every existing MfaSecret becomes undecryptable. Users with MFA enabled can no longer log in (their /login/mfa fails because the server can't unwrap the secret to verify TOTP), and they can't self-disable MFA via /users/me/mfa/disable either, because that path also re-validates the existing TOTP. Recovery codes still work (SHA-256 hashed, no DataProtection involvement), so the only escape is recovery-code-based login, then disable-and-re-enroll.
  • Impact: catastrophic data loss for the auth surface. Every docker stop && docker run cycle locks every MFA user out.
  • Mitigating control (current): the cycle-2 test deploy is single-instance, so within one process lifetime the keys are stable. The risk crystallizes on first restart.
  • Remediation:
    1. Add a host-side persistent directory (e.g. /var/lib/azaion/dp-keys) owned by the runtime user, mode 0700.
    2. Add --volume "$DEPLOY_HOST_DP_KEYS_DIR:/var/lib/azaion/dp-keys" to start-services.sh.
    3. Set ASPNETCORE_DataProtection__KeysFolder=/var/lib/azaion/dp-keys in secrets/<env>.public.env.
    4. Add a fail-fast in Program.cs:151-160: if app.Environment.IsProduction() and DataProtection:KeysFolder is unset, throw at startup. This makes the misconfiguration loud instead of silent.
    5. Document key persistence and rotation in secrets/README.md and _docs/04_deploy/.
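
Steps 1 and 3 above, together with a deploy-side counterpart to the step 4 fail-fast, could be sketched as below. Both function names are hypothetical. The guard is only a complement to the Program.cs check proposed in step 4, not a replacement for it, and it treats an unset ASPNETCORE_ENVIRONMENT as Production because that is the ASP.NET Core default.

```bash
# Deploy-side guard: refuse a Production deploy when the DataProtection
# key folder is not configured.
check_dp_keys_config() {
  local env_name="${ASPNETCORE_ENVIRONMENT:-Production}"
  if [ "$env_name" = "Production" ] && [ -z "${ASPNETCORE_DataProtection__KeysFolder:-}" ]; then
    echo "ASPNETCORE_DataProtection__KeysFolder must be set in Production" >&2
    return 1
  fi
  return 0
}

# Creates the host directory with owner-only permissions (step 1).
# Ownership by the runtime user still has to be arranged separately.
prepare_dp_keys_dir() {
  local dir="$1"
  mkdir -p "$dir" && chmod 0700 "$dir"
}
```

Unlike the JWT keys mount, the DataProtection mount must stay writable, since the framework creates and rotates key files in that folder.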

F-2026Q2-INFRA-4: secrets/README.md schema still lists HS256-era JwtConfig__Secret (HIGH — doc drift, deploy-blocking by proxy)

  • Location: secrets/README.md:50-55 ("Schema (variables that MUST be in the encrypted file)").
  • Description: the documented schema still requires ASPNETCORE_JwtConfig__Secret=<32 random bytes>. This is the same root cause as F-INFRA-1 but on the documentation side — operators following the README will set a useless variable, miss JwtConfig__KeysFolder / JwtConfig__ActiveKid, and miss DataProtection__KeysFolder.
  • Impact: misleads any operator onboarding to the project; reinforces the broken deploy script.
  • Remediation: rewrite the "What goes where" + "Schema" sections to:
    • Drop ASPNETCORE_JwtConfig__Secret.
    • Add ASPNETCORE_JwtConfig__KeysFolder=/etc/azaion/jwt-keys (path; not a secret — belongs in <env>.public.env).
    • Add ASPNETCORE_JwtConfig__ActiveKid=<current-kid> (key identifier; not a secret, belongs in <env>.public.env).
    • Add ASPNETCORE_DataProtection__KeysFolder=/var/lib/azaion/dp-keys (path; not a secret).
    • Note: the PEM private keys themselves are NOT in sops; they live on the host filesystem at KeysFolder, owned by the runtime user, mode 0400. Rotation procedure is in scripts/generate-jwt-key.sh.
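
To keep the README schema and the actual env files from drifting apart again, a small lint could be run in CI or before deploy. The sketch below is hypothetical (lint_env_file is an invented name) and assumes the public and decrypted overlays have been concatenated into a single KEY=value file first.

```bash
# Flags the obsolete HS256 variable and reports any missing cycle-2 variable.
lint_env_file() {
  local file="$1" rc=0 name
  if grep -q '^ASPNETCORE_JwtConfig__Secret=' "$file"; then
    echo "obsolete variable present: ASPNETCORE_JwtConfig__Secret" >&2
    rc=1
  fi
  for name in ASPNETCORE_JwtConfig__KeysFolder \
              ASPNETCORE_JwtConfig__ActiveKid \
              ASPNETCORE_DataProtection__KeysFolder; do
    if ! grep -q "^${name}=" "$file"; then
      echo "missing variable: $name" >&2
      rc=1
    fi
  done
  return "$rc"
}
```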

F-2026Q2-INFRA-5: HSTS preload+includeSubDomains may break legacy subdomains (MEDIUM)

  • Location: Program.cs:217-225.
  • Description: HSTS is configured with MaxAge = 365 days, IncludeSubDomains = true, Preload = true in non-Development environments. If any current or future *.azaion.com subdomain serves over plain HTTP (legacy admin tools, internal monitoring, dev/staging mirrors of unrelated systems), browsers that have seen the header will upgrade every request to HTTPS and fail to reach those subdomains. Worse, HSTS preload registration is slow to undo: once a domain is submitted, removal from the Chrome HSTS preload list takes weeks to months, even after the header is disabled.
  • Impact: operational blast radius if a non-HTTPS subdomain exists or is added later. Preload makes the mistake hard to reverse.
  • Remediation:
    1. Audit all *.azaion.com subdomains; confirm 100% HTTPS-only (including any internal-only ones — DNS hijacking can expose them to user browsers).
    2. Document the subdomain inventory in _docs/04_deploy/.
    3. Consider gating Preload = true behind an env var so staging and dev hosts don't advertise preload eligibility.
    4. Do NOT submit to the public preload list (https://hstspreload.org) until the audit is complete and signed off.
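
For the subdomain audit in step 1, it helps to be able to test which directives a given Strict-Transport-Security value carries. The helper below is a hypothetical sketch that only parses a header value passed as a string; fetching the header from each host (for example with curl -sI) is left to the caller.

```bash
# Succeeds when the Strict-Transport-Security value contains the directive.
# Directive names are matched case-insensitively, per the HSTS spec.
hsts_has_directive() {
  local header_value="$1" directive="$2"
  printf '%s' "$header_value" \
    | tr 'A-Z' 'a-z' \
    | tr ';' '\n' \
    | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
    | grep -qx "$(printf '%s' "$directive" | tr 'A-Z' 'a-z')"
}
```

A staging host that passes hsts_has_directive "$value" preload would be a candidate for the env-var gate in step 3.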

F-2026Q2-INFRA-6: audit_events table has no retention policy (LOW — operational hygiene)

  • Location: env/db/07_auth_lockout_and_audit.sql.
  • Description: audit_events is append-only with no documented retention or partitioning. Every login attempt writes a row. At 10K users × 5 attempts/day = 50K rows/day = ~18M rows/year. Postgres handles this fine, and the composite index (event_type, email, occurred_at DESC) keeps CountRecentFailedLogins sub-millisecond, but unbounded growth has implications for compliance (GDPR / data minimization), backup/restore time, and storage cost.
  • Impact: not a security risk per se — audit completeness is the goal — but the regulatory storage horizon needs a stated answer.
  • Remediation: agree on a retention period (CMMC says ≥1 year for audit logs), add a nightly DELETE FROM audit_events WHERE occurred_at < now() - interval '13 months' job (cron + small script), document in _docs/04_deploy/. Optional: switch to monthly partitions so the cleanup drops a whole expired partition (Postgres does this with DROP TABLE on the partition, optionally after DETACH PARTITION) instead of a row-by-row delete.
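
The growth estimate and the nightly cleanup can be sketched as two small helpers. The function names are hypothetical. The table and column names and the 13-month window come from the finding, while the psql invocation and the DATABASE_URL variable mentioned afterwards are assumptions about how the job would be wired up.

```bash
# Rough yearly row estimate: users x attempts per day x 365.
estimate_audit_rows_per_year() {
  local users="$1" attempts_per_day="$2"
  echo $(( users * attempts_per_day * 365 ))
}

# Prints the cleanup statement for a retention window given in whole months.
# Rejects anything that is not a plain positive integer, so the value can
# never smuggle extra SQL into the statement.
audit_retention_sql() {
  local months="$1"
  case "$months" in
    ''|*[!0-9]*) echo "months must be a positive integer" >&2; return 1 ;;
  esac
  printf "DELETE FROM audit_events WHERE occurred_at < now() - interval '%s months';\n" "$months"
}
```

With the figures from the description, estimate_audit_rows_per_year 10000 5 gives 18250000, matching the ~18M rows/year estimate. A cron entry could then pipe audit_retention_sql 13 into psql "$DATABASE_URL".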

F-2026Q2-INFRA-7: JwtSigningKeyProvider silent fallback to first PEM (MEDIUM)

  • Location: Azaion.Services/JwtSigningKeyProvider.cs:73-86.
  • Description: when JwtConfig.ActiveKid is unset, the provider picks the alphabetically-first PEM and only logs at LogInformation. Adding a new PEM with a name that sorts earlier silently changes the signing key on next restart. The deploy script in scripts/start-services.sh does not require ActiveKid, and .env.example:25-26 calls it "optional".
  • Impact: an operator drops kid-2026-04-aaa.pem into the keys folder thinking it's a side-by-side rotation key and restarts the service; all newly minted tokens are now signed under a kid the verifiers may not yet have in their JWKS cache (1-h max-age, so the fleet sees signature failures for up to 1 h).
  • Remediation:
    1. Make ActiveKid required when more than one PEM is present (fail-fast at startup if ambiguous).
    2. If exactly one PEM exists, accept it but log at Warning (not Information).
    3. Update .env.example to mark ActiveKid as required for prod rather than "optional".
  • Cross-ref: same finding documented in static_analysis_cycle2.md as F-2026Q2-AUTH-7. Listed here too because the operational-hardening fix (require it in start-services.sh) is in this scope.
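
The operational-hardening side of this finding (requiring ActiveKid in start-services.sh) might be sketched as below. The function names are hypothetical, and the check is a deploy-side mirror of remediation items 1 and 2, not the provider's actual logic. It deliberately does not verify that the kid matches a file name, since the kid-to-file mapping is not described here.

```bash
# Counts *.pem files directly inside the keys directory.
count_pems() {
  local dir="$1" pem n=0
  for pem in "$dir"/*.pem; do
    [ -f "$pem" ] && n=$((n + 1))
  done
  echo "$n"
}

# Fails when no PEM exists, or when several exist and no active kid is named.
check_active_kid() {
  local dir="$1" active_kid="${2:-}" n
  n="$(count_pems "$dir")"
  if [ "$n" -eq 0 ]; then
    echo "no PEM files found in $dir" >&2
    return 1
  fi
  if [ "$n" -gt 1 ] && [ -z "$active_kid" ]; then
    echo "ActiveKid is required: $n PEMs present in $dir" >&2
    return 1
  fi
  if [ -z "$active_kid" ]; then
    echo "warning: ActiveKid unset, the single PEM will be used" >&2
  fi
  return 0
}
```

A call such as check_active_kid "$DEPLOY_HOST_JWT_KEYS_DIR" "${ASPNETCORE_JwtConfig__ActiveKid:-}" before docker run would catch the ambiguous case before the container starts.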

Re-verified categories (no cycle-2 regression)

| Area | Cycle-1 status | Cycle-2 verdict |
| --- | --- | --- |
| Container non-root user (F-6) | FAIL | PASS: Dockerfile now sets USER app (line 40) and chowns /app/Content + /app/logs. Closes F-6. |
| Production HTTPS enforcement (F-13) | FAIL | PASS: app.UseHttpsRedirection() + app.UseHsts() enabled in non-Development. Closes F-13 in code (still reverse-proxy fronting in deploy). |
| CORS | tight | TIGHTER: cycle-2 dropped the http://admin.azaion.com origin. Only https://admin.azaion.com remains. |
| Image pinned by digest | WARN | unchanged. Deferred. |
| Secrets via env vars | PASS | unchanged. |
| Test sidecar / E2E images | acceptable | unchanged. |
| Test compose ASPNETCORE_ENVIRONMENT=Development | acceptable for tests | unchanged. Flag operator risk: a misconfigured prod that inherits this value silently loses HTTPS enforcement and HSTS. |
| .gitignore excludes secrets | PASS | PASS: secrets/jwt-keys/* is gitignored; only .gitkeep tracked. Verified no PEMs in repo. |

Self-verification

  • All cycle-2-touched infra files reviewed (Dockerfile, docker-compose.test.yml, scripts/, secrets/, .env.example, appsettings*.json, .woodpecker/*)
  • Each finding has file path + line number + remediation
  • Cycle-1 closures verified by re-reading the code (F-6 USER directive, F-13 HSTS+HttpsRedirection)
  • No false positives from test-only files (test fixtures are flagged as acceptable, not as findings)