Infrastructure & Configuration Review — Cycle 2 (Delta)
Date: 2026-05-14
Scope: cycle-2 surface only — Program.cs middleware/CORS/HSTS, DataProtection wiring, ES256 key store, secrets/, scripts/deploy.sh + start-services.sh + _lib.sh + health-check.sh, docker-compose.test.yml, .env.example, .woodpecker/. Cycle-1 categories that did not change since infrastructure_review.md are out of scope here.
Read order: this file is a delta on infrastructure_review.md. Categories not listed here keep their cycle-1 status.
Cycle-2 Findings
F-2026Q2-INFRA-1: Deploy script hard-blocks on obsolete JwtConfig__Secret (CRITICAL — deploy-blocking)
- Location: `scripts/start-services.sh:32`.
- Description: `start-services.sh` does `require_env ... ASPNETCORE_JwtConfig__Secret`, which kills the deploy if the variable is not set. AZ-532 removed `JwtConfig.Secret` entirely; `Program.cs:60-83` configures `JwtBearer` against `IssuerSigningKeyResolver` backed by `JwtSigningKeyProvider` (ES256 PEMs). A correctly configured cycle-2 deploy that follows `.env.example` (which does NOT include `JwtConfig__Secret`) will fail at the deploy-script preflight.
- Impact: cycle-2 cannot deploy with the current scripts. Either the deploy fails on preflight, or the operator sets a dummy `JwtConfig__Secret=dummy` to get past the check, and then we hit F-INFRA-2 below.
- Remediation: replace the line with `require_env ... ASPNETCORE_JwtConfig__KeysFolder ASPNETCORE_JwtConfig__ActiveKid`. Drop `ASPNETCORE_JwtConfig__Secret` from the schema in `secrets/README.md` and from any documented env templates.
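The preflight change can be sketched as follows. This is a minimal illustration, not the project's code: the `require_env` body here is a hypothetical stand-in for the real helper (presumably defined in `scripts/_lib.sh`), `preflight_jwt` is an invented wrapper name, and the other variables elided as `...` in the finding are left out.

```shell
# Hypothetical stand-in for the real require_env helper; the actual
# implementation in the deploy scripts may differ.
require_env() {
  local missing=0
  local name value
  for name in "$@"; do
    # Look up the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required env var: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Cycle-2 preflight (hypothetical wrapper): ES256 settings only,
# with no reference to the removed JwtConfig__Secret.
preflight_jwt() {
  require_env ASPNETCORE_JwtConfig__KeysFolder ASPNETCORE_JwtConfig__ActiveKid
}
```

Reporting every missing variable before returning, rather than stopping at the first, lets an operator fix the whole overlay in one pass.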
F-2026Q2-INFRA-2: ES256 keys folder not bind-mounted into container (CRITICAL — deploy-blocking)
- Location: `scripts/start-services.sh:48-56` (the `docker run` line).
- Description: `JwtConfig.KeysFolder` defaults to `secrets/jwt-keys` (relative path, per `appsettings.json:15`). Inside the container, this resolves under `/app/`. The Dockerfile does not COPY `secrets/`, and `start-services.sh` does not add a `--volume` mapping for the host `secrets/jwt-keys` directory. Result: `JwtSigningKeyProvider.Load` (cycle-2 ctor) fails fast at startup with "no PEM files found".
- Impact: container restart loop on every cycle-2 deploy. The only way to bring it up today is to manually `docker cp` the PEMs into the container, which defeats reproducibility and leaves no rotation story.
- Remediation: add a host-side directory (e.g. `/etc/azaion/jwt-keys` owned by the runtime user, mode 0700, PEMs mode 0400) and a corresponding `--volume "$DEPLOY_HOST_JWT_KEYS_DIR:/etc/azaion/jwt-keys:ro"` line in `start-services.sh`. Set `ASPNETCORE_JwtConfig__KeysFolder=/etc/azaion/jwt-keys` in the public env overlay. Document the host-side procedure in `secrets/README.md` and `_docs/04_deploy/`.
- Cross-ref: `e2e/test-keys` is correctly mounted in `docker-compose.test.yml:42`; the test stack works, and only the prod deploy script is broken.
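A rough sketch of the host-side preparation and the extra `docker run` argument, under stated assumptions: the function names are invented for illustration, ownership (`chown` to the runtime user) is left to the operator because it needs root, and the real `docker run` line in `start-services.sh` is not reproduced here.

```shell
# Apply the modes named in the remediation: directory 0700, PEMs 0400.
# Hypothetical helper; chown to the runtime user is intentionally omitted.
prepare_jwt_keys_dir() {
  local dir="$1"
  local pem
  mkdir -p "$dir" || return 1
  chmod 0700 "$dir" || return 1
  for pem in "$dir"/*.pem; do
    # The glob stays literal when no PEM exists yet; skip that case.
    [ -e "$pem" ] || continue
    chmod 0400 "$pem" || return 1
  done
}

# Print the read-only bind-mount arguments, one per line, so the caller
# can append them to its docker run invocation.
jwt_keys_docker_args() {
  local host_dir="$1"
  printf '%s\n' --volume "$host_dir:/etc/azaion/jwt-keys:ro"
}
```

Mounting read-only matches the remediation text: the application only needs to read the PEMs, and rotation happens on the host.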
F-2026Q2-INFRA-3: DataProtection key store ephemeral by default — MFA secrets unrecoverable across restarts (HIGH)
- Location: `Program.cs:147-160`, `scripts/start-services.sh` (no DataProtection bind-mount), `secrets/README.md` (no entry).
- Description: AZ-534 wraps `MfaSecret` ciphertext with `IDataProtector` (`Azaion.Mfa.Secret.v1` purpose). When `DataProtection:KeysFolder` is unset, ASP.NET Core writes its master keys to a per-machine path inside the container (`%LOCALAPPDATA%/ASP.NET/DataProtection-Keys` or `~/.aspnet/DataProtection-Keys` depending on platform), which is lost whenever the container is replaced (every `docker stop && docker run` cycle starts from a fresh filesystem). After the first such restart, every existing `MfaSecret` becomes undecryptable. Users with MFA enabled can no longer log in (their `/login/mfa` fails because the server can't unwrap the secret to verify TOTP), and they can't even self-disable MFA via `/users/me/mfa/disable` because that path also re-validates the existing TOTP. Recovery codes still work (SHA-256 hashed, no DataProtection involvement), so the only escape is recovery-code-based login, then disable-and-re-enroll.
- Impact: catastrophic data loss for the auth surface. Every `docker stop && docker run` cycle locks every MFA user out.
- Mitigating control (current): the cycle-2 test deploy is single-instance, so within one process lifetime the keys are stable. The risk crystallizes on first restart.
- Remediation:
  - Add a host-side persistent directory (e.g. `/var/lib/azaion/dp-keys`) owned by the runtime user, mode 0700.
  - Add `--volume "$DEPLOY_HOST_DP_KEYS_DIR:/var/lib/azaion/dp-keys"` to `start-services.sh`.
  - Set `ASPNETCORE_DataProtection__KeysFolder=/var/lib/azaion/dp-keys` in `secrets/<env>.public.env`.
  - Add a fail-fast in `Program.cs:151-160`: if `app.Environment.IsProduction()` and `DataProtection:KeysFolder` is unset, throw at startup. This makes the misconfiguration loud instead of silent.
  - Document key persistence and rotation in `secrets/README.md` and `_docs/04_deploy/`.
F-2026Q2-INFRA-4: secrets/README.md schema still lists HS256-era JwtConfig__Secret (HIGH — doc drift, deploy-blocking by proxy)
- Location: `secrets/README.md:50-55` ("Schema (variables that MUST be in the encrypted file)").
- Description: the documented schema still requires `ASPNETCORE_JwtConfig__Secret=<32 random bytes>`. This is the same root cause as F-INFRA-1 but on the documentation side: operators following the README will set a useless variable, miss `JwtConfig__KeysFolder` / `JwtConfig__ActiveKid`, and miss `DataProtection__KeysFolder`.
- Impact: misleads any operator onboarding to the project; reinforces the broken deploy script.
- Remediation: rewrite the "What goes where" + "Schema" sections to:
  - Drop `ASPNETCORE_JwtConfig__Secret`.
  - Add `ASPNETCORE_JwtConfig__KeysFolder=/etc/azaion/jwt-keys` (path; not a secret, so it belongs in `<env>.public.env`).
  - Add `ASPNETCORE_JwtConfig__ActiveKid=<current-kid>` (key identifier; not a secret, so it belongs in `<env>.public.env`).
  - Add `ASPNETCORE_DataProtection__KeysFolder=/var/lib/azaion/dp-keys` (path; not a secret).
  - Note: the PEM private keys themselves are NOT in sops; they live on the host filesystem at `KeysFolder`, owned by the runtime user, mode 0400. Rotation procedure is in `scripts/generate-jwt-key.sh`.
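To keep the README and the env files from drifting apart again, the schema above could be checked mechanically. The following is a sketch only; `lint_env_schema` is an invented helper, and it assumes the overlay is a plain `KEY=value` file.

```shell
# Hypothetical lint for an env overlay against the cycle-2 schema.
# Prints one line per problem and returns non-zero if any were found.
lint_env_schema() {
  local file="$1"
  local status=0
  local key
  if grep -q '^ASPNETCORE_JwtConfig__Secret=' "$file"; then
    echo "obsolete: ASPNETCORE_JwtConfig__Secret"
    status=1
  fi
  for key in ASPNETCORE_JwtConfig__KeysFolder \
             ASPNETCORE_JwtConfig__ActiveKid \
             ASPNETCORE_DataProtection__KeysFolder; do
    # Require the key to be present with a non-empty value.
    if ! grep -q "^$key=." "$file"; then
      echo "missing: $key"
      status=1
    fi
  done
  return "$status"
}
```

Running it against each `secrets/<env>.public.env` in CI would turn this class of doc drift into a failed check rather than a failed deploy.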
F-2026Q2-INFRA-5: HSTS preload+includeSubDomains may break legacy subdomains (MEDIUM)
- Location: `Program.cs:217-225`.
- Description: HSTS is configured with `MaxAge = 365 days`, `IncludeSubDomains = true`, `Preload = true` in non-Development environments. If any current or future `*.azaion.com` subdomain serves over plain HTTP (legacy admin tools, internal monitoring, dev/staging mirrors of unrelated systems), browsers that have seen the header will refuse to connect to those subdomains. Worse, HSTS preload registration is essentially permanent: once a domain has been submitted, removal from the Chrome HSTS preload list takes weeks to months, even after the header is disabled.
- Impact: operational blast radius if a non-HTTPS subdomain exists or is added later. Preload makes the mistake hard to reverse.
- Remediation:
  - Audit all `*.azaion.com` subdomains; confirm 100% HTTPS-only (including any internal-only ones, since DNS hijacking can expose them to user browsers).
  - Document the subdomain inventory in `_docs/04_deploy/`.
  - Consider gating `Preload = true` behind an env var so staging and dev hosts don't trigger preload-list submission attempts.
  - Do NOT submit to the public preload list (https://hstspreload.org) until the audit is complete and signed off.
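Part of the subdomain audit can be scripted. The sketch below assumes a hypothetical inventory file with one hostname per line and uses `curl` to read response headers; both function names are invented. It only reports which hosts already advertise the `preload` directive and does not by itself prove that a host is HTTPS-only.

```shell
# Classify a Strict-Transport-Security header value.
# Prints "preload" when the preload directive is present, else "no-preload".
hsts_preload_state() {
  local normalized
  # Directive names are case-insensitive and separated by semicolons.
  normalized=$(printf '%s' "$1" | tr 'A-Z' 'a-z' | tr -d ' \t')
  case ";$normalized;" in
    *";preload;"*) echo preload ;;
    *) echo no-preload ;;
  esac
}

# Hypothetical audit loop over an inventory file (one hostname per line).
audit_hosts() {
  local host header
  while IFS= read -r host; do
    [ -n "$host" ] || continue
    header=$(curl -sI --max-time 10 "https://$host/" | tr -d '\r' |
      grep -i '^strict-transport-security:' | head -n 1 | cut -d: -f2-)
    printf '%s\t%s\n' "$host" "$(hsts_preload_state "$header")"
  done < "$1"
}
```

A host that fails the HTTPS request shows up as `no-preload` here, so unreachable hosts still need a manual look before sign-off.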
F-2026Q2-INFRA-6: audit_events table has no retention policy (LOW — operational hygiene)
- Location: `env/db/07_auth_lockout_and_audit.sql`.
- Description: `audit_events` is append-only with no documented retention or partitioning. Every login attempt writes a row. At 10K users × 5 attempts/day = 50K rows/day = ~18M rows/year. Postgres handles this fine, and the composite index `(event_type, email, occurred_at DESC)` keeps `CountRecentFailedLogins` sub-millisecond, but unbounded growth affects compliance (GDPR / data minimization), backup/restore time, and storage cost.
- Impact: not a security risk per se, since audit completeness is the goal, but the regulatory storage horizon needs a stated answer.
- Remediation: agree retention (CMMC says ≥1 year for audit logs), add a nightly `DELETE FROM audit_events WHERE occurred_at < now() - interval '13 months'` job (cron + small script), document in `_docs/04_deploy/`. Optional: switch to monthly partitions so the cleanup drops a whole partition (in Postgres, `DROP TABLE` on the partition, optionally after `ALTER TABLE ... DETACH PARTITION`) instead of a row-by-row delete.
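The "cron + small script" part might look like the sketch below. The function names and the `DATABASE_URL` variable are assumptions for illustration; the retention window defaults to the 13 months proposed above.

```shell
# Build the retention statement. The month count is validated as a plain
# positive integer so it can be inlined into the SQL text safely.
retention_sql() {
  local months="$1"
  case "$months" in
    ''|*[!0-9]*|0*)
      echo "months must be a whole number, at least 1" >&2
      return 1
      ;;
  esac
  printf "DELETE FROM audit_events WHERE occurred_at < now() - interval '%s months';\n" "$months"
}

# Nightly entry point, e.g. called from cron.
# DATABASE_URL is an assumed variable name, not taken from the project.
purge_audit_events() {
  local sql
  sql=$(retention_sql "${1:-13}") || return 1
  psql "$DATABASE_URL" --set ON_ERROR_STOP=1 --command "$sql"
}
```

Rejecting `0` matters: a zero-month window would delete the entire table, which is the opposite of what an audit log is for.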
F-2026Q2-INFRA-7: JwtSigningKeyProvider silent fallback to first PEM (MEDIUM)
- Location: `Azaion.Services/JwtSigningKeyProvider.cs:73-86`.
- Description: when `JwtConfig.ActiveKid` is unset, the provider picks the alphabetically-first PEM and only logs at `LogInformation`. Adding a new PEM with a name that sorts earlier silently changes the signing key on next restart. The deploy script in `scripts/start-services.sh` does not require `ActiveKid`, and `.env.example:25-26` calls it "optional".
- Impact: an operator drops in `kid-2026-04-aaa.pem` thinking it's a side-by-side rotation key and restarts; now all newly minted tokens are signed under a kid the verifiers may not yet have in their JWKS cache (1-h max-age, so the fleet sees signature failures for up to 1 h).
- Remediation:
  - Make `ActiveKid` required when more than one PEM is present (fail-fast at startup if ambiguous).
  - If exactly one PEM exists, accept it but log at `Warning` (not `Information`).
  - Update `.env.example` to mark `ActiveKid` as required for prod rather than "optional".
- Cross-ref: same finding documented in `static_analysis_cycle2.md` as F-2026Q2-AUTH-7. Listed here too because the operational-hardening fix (require it in `start-services.sh`) is in this scope.
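The operational-hardening side of this fix could be a preflight along the following lines. It is a sketch under two assumptions that the source does not confirm: that a key's kid equals its file name without the `.pem` suffix, and that the check runs against the host-side keys directory. `check_active_kid` is an invented name.

```shell
# Hypothetical preflight: enforce the ActiveKid rule before starting.
check_active_kid() {
  local keys_dir="$1"
  local active_kid="${2:-}"
  local count=0
  local pem
  for pem in "$keys_dir"/*.pem; do
    [ -e "$pem" ] || continue
    count=$((count + 1))
  done
  if [ "$count" -eq 0 ]; then
    echo "no PEM files in $keys_dir" >&2
    return 1
  fi
  if [ -z "$active_kid" ]; then
    if [ "$count" -gt 1 ]; then
      echo "ActiveKid is required when $count PEMs are present" >&2
      return 1
    fi
    echo "warning: ActiveKid unset; using the only PEM present" >&2
    return 0
  fi
  # Assumption: kid == file name without the .pem suffix.
  if [ ! -e "$keys_dir/$active_kid.pem" ]; then
    echo "ActiveKid $active_kid has no matching PEM" >&2
    return 1
  fi
}
```

With this in place, dropping a second PEM into the folder without also setting `ActiveKid` stops the deploy instead of silently switching the signing key.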
Re-verified categories (no cycle-2 regression)
| Area | Cycle-1 status | Cycle-2 verdict |
|---|---|---|
| Container non-root user (F-6) | FAIL | PASS — Dockerfile now sets USER app (line 40) and chowns /app/Content + /app/logs. Closes F-6. |
| Production HTTPS enforcement (F-13) | FAIL | PASS — app.UseHttpsRedirection() + app.UseHsts() enabled in non-Development. Closes F-13 in code (still reverse-proxy fronting in deploy). |
| CORS | tight | TIGHTER — cycle-2 dropped the http://admin.azaion.com origin. Only https://admin.azaion.com remains. |
| Image pinned by digest | WARN | unchanged. Deferred. |
| Secrets via env vars | PASS | unchanged. |
| Test sidecar / E2E images | acceptable | unchanged. |
| Test compose ASPNETCORE_ENVIRONMENT=Development | acceptable for tests | unchanged. Flag operator risk: a misconfigured prod that inherits this value silently loses HTTPS enforcement and HSTS. |
| .gitignore excludes secrets | PASS | PASS. secrets/jwt-keys/* is gitignored; only .gitkeep is tracked. Verified no PEMs in repo. |
Self-verification
- All cycle-2-touched infra files reviewed (Dockerfile, docker-compose.test.yml, scripts/, secrets/, .env.example, appsettings*.json, .woodpecker/*)
- Each finding has file path + line number + remediation
- Cycle-1 closures verified by re-reading the code (F-6 USER directive, F-13 HSTS+HttpsRedirection)
- No false positives from test-only files (test fixtures are flagged as acceptable, not as findings)