mirror of
https://github.com/azaion/admin.git
synced 2026-06-21 09:41:10 +00:00
1bdbe8c96d
Step 14 (Security Audit) output for cycle 2. Verdict: FAIL. 2 Critical (F-INFRA-1, F-INFRA-2) + 4 High (F-INFRA-3, F-INFRA-4, F-AUTH-1, F-AUTH-2) block deploy. 13 cycle-2 findings total; cycle-1 closures confirmed for F-6, F-7, F-8, F-13, A09.

Files:
- security_report_cycle2.md (delta on cycle-1 report; FAIL verdict, tracker follow-ups filed as AZ-552..AZ-557 + 9 deferred Medium/Low)
- owasp_review_cycle2.md (A01..A09 delta; 2 FAIL / 2 PASS_W_W / 5 PASS)
- static_analysis_cycle2.md (F-AUTH-1..9 with locations + remediation)
- infrastructure_review_cycle2.md (F-INFRA-1..6 with locations + remediation)
- dependency_scan_cycle2.md (no new CVEs; cycle-1 deprecations re-flagged)

Cycle-1 reports remain authoritative for non-cycle-2 surface.

Co-authored-by: Cursor <cursoragent@cursor.com>
101 lines
11 KiB
Markdown
# Infrastructure & Configuration Review — Cycle 2 (Delta)
**Date**: 2026-05-14
**Scope**: cycle-2 surface only — `Program.cs` middleware/CORS/HSTS, DataProtection wiring, ES256 key store, `secrets/`, `scripts/deploy.sh` + `start-services.sh` + `_lib.sh` + `health-check.sh`, `docker-compose.test.yml`, `.env.example`, `.woodpecker/`. Cycle-1 categories that did not change since `infrastructure_review.md` are out of scope here.
**Read order**: this file is a delta on `infrastructure_review.md`. Categories not listed here keep their cycle-1 status.
## Cycle-2 Findings
### F-2026Q2-INFRA-1: Deploy script hard-blocks on obsolete `JwtConfig__Secret` (CRITICAL — deploy-blocking)
- **Location**: `scripts/start-services.sh:32`.
- **Description**: `start-services.sh` does `require_env ... ASPNETCORE_JwtConfig__Secret`, which kills the deploy if the variable is not set. AZ-532 removed `JwtConfig.Secret` entirely — `Program.cs:60-83` configures `JwtBearer` against `IssuerSigningKeyResolver` backed by `JwtSigningKeyProvider` (ES256 PEMs). A correctly-configured cycle-2 deploy that follows `.env.example` (which does NOT include `JwtConfig__Secret`) will fail at the deploy-script preflight.
- **Impact**: cycle-2 cannot deploy with the current scripts. Either the deploy fails on preflight, or the operator sets a dummy `JwtConfig__Secret=dummy` to get past the check — and then we hit F-INFRA-2 below.
- **Remediation**: replace the line with:
  ```
  require_env ... ASPNETCORE_JwtConfig__KeysFolder ASPNETCORE_JwtConfig__ActiveKid
  ```
  Drop `ASPNETCORE_JwtConfig__Secret` from the schema in `secrets/README.md` and from any documented env templates.
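
A minimal sketch of what the updated preflight could look like, assuming `require_env` simply checks that each named variable is non-empty. The helper body below is a hypothetical stand-in for the real one in the deploy scripts, `preflight_jwt` is an illustrative name, and the variables elided as `...` in the line above are left out:

```shell
# Hypothetical stand-in for require_env; the real helper may behave differently.
require_env() {
  missing=0
  for name in "$@"; do
    # Indirect lookup: read the value of the variable whose name is in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required env var: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Cycle-2 preflight: the ES256 settings replace the removed JwtConfig__Secret.
preflight_jwt() {
  require_env ASPNETCORE_JwtConfig__KeysFolder ASPNETCORE_JwtConfig__ActiveKid
}
```

Reporting every missing variable before returning, rather than stopping at the first, saves the operator one failed deploy per missing setting.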
### F-2026Q2-INFRA-2: ES256 keys folder not bind-mounted into container (CRITICAL — deploy-blocking)
- **Location**: `scripts/start-services.sh:48-56` (the `docker run` line).
- **Description**: `JwtConfig.KeysFolder` defaults to `secrets/jwt-keys` (relative path, per `appsettings.json:15`). Inside the container, this resolves under `/app/`. The Dockerfile **does not** COPY `secrets/`, and `start-services.sh` **does not** add a `--volume` mapping for the host `secrets/jwt-keys` directory. Result: `JwtSigningKeyProvider.Load` (cycle-2 ctor) fails fast at startup with "no PEM files found".
- **Impact**: container restart loop on every cycle-2 deploy. The only way to bring it up today is to manually `docker cp` the PEMs into the container, which defeats reproducibility and leaves no rotation story.
- **Remediation**: add a host-side directory (e.g. `/etc/azaion/jwt-keys` owned by the runtime user, mode 0700, PEMs mode 0400) and a corresponding `--volume "$DEPLOY_HOST_JWT_KEYS_DIR:/etc/azaion/jwt-keys:ro"` line in `start-services.sh`. Set `ASPNETCORE_JwtConfig__KeysFolder=/etc/azaion/jwt-keys` in the public env overlay. Document the host-side procedure in `secrets/README.md` and `_docs/04_deploy/`.
- **Cross-ref**: `e2e/test-keys` is correctly mounted in `docker-compose.test.yml:42` — the test stack works; only the prod deploy script is broken.
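
The host-side half of this remediation could be guarded by a preflight along these lines. This is a sketch only: `check_jwt_keys_dir` is a hypothetical name, and it assumes the key files carry a `.pem` extension.

```shell
# Hypothetical preflight for the host-side ES256 key directory.
check_jwt_keys_dir() {
  keys_dir=$1
  if [ ! -d "$keys_dir" ]; then
    echo "JWT keys dir not found: $keys_dir" >&2
    return 1
  fi
  # Expand the glob into the function's positional parameters.
  # If nothing matches, $1 stays the literal pattern and the -e test fails.
  set -- "$keys_dir"/*.pem
  if [ ! -e "$1" ]; then
    echo "no PEM files in $keys_dir" >&2
    return 1
  fi
  return 0
}
```

In `start-services.sh` this would run as `check_jwt_keys_dir "$DEPLOY_HOST_JWT_KEYS_DIR"` before the `docker run` line that carries the read-only `--volume` mapping, so a missing or empty directory stops the deploy instead of producing a restart loop.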
### F-2026Q2-INFRA-3: DataProtection key store ephemeral by default — MFA secrets unrecoverable across restarts (HIGH)
- **Location**: `Program.cs:147-160`, `scripts/start-services.sh` (no DataProtection bind-mount), `secrets/README.md` (no entry).
- **Description**: AZ-534 wraps `MfaSecret` ciphertext with `IDataProtector` (`Azaion.Mfa.Secret.v1` purpose). When `DataProtection:KeysFolder` is unset, ASP.NET Core writes its master keys to a user-profile path inside the container (`%LOCALAPPDATA%/ASP.NET/DataProtection-Keys` or `~/.aspnet/DataProtection-Keys` depending on platform). That path lives in the container's writable layer, so it is **lost whenever the container is recreated** (the `docker stop && docker run` cycle), not merely when the process restarts inside the same container. After the first such redeploy, every existing `MfaSecret` becomes undecryptable: users with MFA enabled can no longer log in (their `/login/mfa` fails because the server can't unwrap the secret to verify TOTP), and they can't even self-disable MFA via `/users/me/mfa/disable` because that path also re-validates the existing TOTP. Recovery codes still work (SHA-256 hashed, no DataProtection involvement), so the only escape is recovery-code-based login, then disable-and-re-enroll.
- **Impact**: catastrophic data loss for the auth surface. Every `docker stop && docker run` cycle locks every MFA user out.
- **Mitigating control (current)**: the cycle-2 test deploy is single-instance, so within one process lifetime the keys are stable. The risk crystallizes on first restart.
- **Remediation**:
  1. Add a host-side persistent directory (e.g. `/var/lib/azaion/dp-keys`) owned by the runtime user, mode 0700.
  2. Add `--volume "$DEPLOY_HOST_DP_KEYS_DIR:/var/lib/azaion/dp-keys"` to `start-services.sh`.
  3. Set `ASPNETCORE_DataProtection__KeysFolder=/var/lib/azaion/dp-keys` in `secrets/<env>.public.env`.
  4. Add a fail-fast in `Program.cs:151-160`: if `app.Environment.IsProduction()` and `DataProtection:KeysFolder` is unset, throw at startup. This makes the misconfiguration loud instead of silent.
  5. Document key persistence and rotation in `secrets/README.md` and `_docs/04_deploy/`.
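
The fail-fast in step 4 belongs in `Program.cs`, but the same rule could also be enforced earlier, in the deploy preflight. A sketch, with `preflight_dp_keys` as a hypothetical function name:

```shell
# Hypothetical deploy-side guard mirroring the proposed Program.cs fail-fast.
preflight_dp_keys() {
  # ASP.NET Core treats an unset ASPNETCORE_ENVIRONMENT as Production.
  env_name=${ASPNETCORE_ENVIRONMENT:-Production}
  if [ "$env_name" = "Production" ] && [ -z "${ASPNETCORE_DataProtection__KeysFolder:-}" ]; then
    echo "ASPNETCORE_DataProtection__KeysFolder must be set in Production" >&2
    return 1
  fi
  return 0
}
```

Treating an unset environment name as Production matches the framework default, so a deploy that forgets both variables is rejected rather than waved through.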
### F-2026Q2-INFRA-4: `secrets/README.md` schema still lists HS256-era `JwtConfig__Secret` (HIGH — doc drift, deploy-blocking by proxy)
- **Location**: `secrets/README.md:50-55` ("Schema (variables that MUST be in the encrypted file)").
- **Description**: the documented schema still requires `ASPNETCORE_JwtConfig__Secret=<32 random bytes>`. This is the same root cause as F-INFRA-1 but on the documentation side — operators following the README will set a useless variable, miss `JwtConfig__KeysFolder` / `JwtConfig__ActiveKid`, and miss `DataProtection__KeysFolder`.
- **Impact**: misleads any operator onboarding to the project; reinforces the broken deploy script.
- **Remediation**: rewrite the "What goes where" + "Schema" sections to:
  - Drop `ASPNETCORE_JwtConfig__Secret`.
  - Add `ASPNETCORE_JwtConfig__KeysFolder=/etc/azaion/jwt-keys` (path; not a secret, belongs in `<env>.public.env`).
  - Add `ASPNETCORE_JwtConfig__ActiveKid=<current-kid>` (key identifier; not a secret, belongs in `<env>.public.env`).
  - Add `ASPNETCORE_DataProtection__KeysFolder=/var/lib/azaion/dp-keys` (path; not a secret).
  - Note: the PEM private keys themselves are NOT in sops; they live on the host filesystem at `KeysFolder`, owned by the runtime user, mode 0400. Rotation procedure is in `scripts/generate-jwt-key.sh`.
### F-2026Q2-INFRA-5: HSTS preload+includeSubDomains may break legacy subdomains (MEDIUM)
- **Location**: `Program.cs:217-225`.
- **Description**: HSTS is configured with `MaxAge = 365 days`, `IncludeSubDomains = true`, `Preload = true` in non-Development environments. If any current or future `*.azaion.com` subdomain serves over plain HTTP (legacy admin tools, internal monitoring, dev/staging mirrors of unrelated systems), browsers that have seen the header will refuse to connect to those subdomains. Worse, **HSTS preload registration is essentially permanent**: once a domain has been submitted, removal from the Chrome HSTS preload list takes weeks to months, even after the header is disabled.
- **Impact**: operational blast radius if a non-HTTPS subdomain exists or is added later. Preload makes the mistake hard to reverse.
- **Remediation**:
  1. Audit all `*.azaion.com` subdomains; confirm 100% HTTPS-only (including any internal-only ones, since DNS hijacking can expose them to user browsers).
  2. Document the subdomain inventory in `_docs/04_deploy/`.
  3. Consider gating `Preload = true` behind an env var so staging and dev hosts don't advertise the `preload` directive.
  4. Do NOT submit to the public preload list (https://hstspreload.org) until the audit is complete and signed off.
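
For the subdomain audit in step 1, a small helper could classify the `Strict-Transport-Security` value each host returns. The sketch below is illustrative only; `hsts_flags` is a hypothetical name and it inspects just the two directives discussed in this finding:

```shell
# Hypothetical audit helper: report which HSTS directives a header value carries.
hsts_flags() {
  # Directive names are case-insensitive, so normalise to lower case first.
  value=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  flags=""
  case $value in
    *includesubdomains*) flags="includeSubDomains" ;;
  esac
  case $value in
    *preload*) flags="${flags:+$flags }preload" ;;
  esac
  printf '%s\n' "${flags:-none}"
}
```

One way to feed it is `hsts_flags "$(curl -sI "https://$host" | grep -i '^strict-transport-security:' | cut -d: -f2-)"`, where `$host` iterates over the inventory from step 2. A host that prints `none` either sends no header or sends only `max-age`.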
### F-2026Q2-INFRA-6: `audit_events` table has no retention policy (LOW — operational hygiene)
- **Location**: `env/db/07_auth_lockout_and_audit.sql`.
- **Description**: `audit_events` is append-only with no documented retention or partitioning. Every login attempt writes a row. At 10K users × 5 attempts/day = 50K rows/day ≈ 18M rows/year. Postgres handles this fine, and the composite index `(event_type, email, occurred_at DESC)` keeps `CountRecentFailedLogins` sub-millisecond, but unbounded growth has compliance implications (GDPR / data minimization) and increases backup/restore time and storage cost.
- **Impact**: not a security risk per se — audit completeness is the goal — but the regulatory storage horizon needs a stated answer.
- **Remediation**: agree on a retention period (CMMC says ≥1 year for audit logs), add a nightly `DELETE FROM audit_events WHERE occurred_at < now() - interval '13 months'` job (cron + small script), and document it in `_docs/04_deploy/`. Optional: switch to monthly partitions so the cleanup is a `DROP TABLE` on the expired partition (Postgres has no `DROP PARTITION` statement) instead of a row-by-row delete.
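
The growth estimate and the nightly cleanup can be sketched as two small shell functions. Both names are hypothetical, and the 13-month default simply mirrors the interval quoted in the remediation:

```shell
# Back-of-envelope growth: users x attempts per day x 365 days.
rows_per_year() {
  echo $(( $1 * $2 * 365 ))
}

# Emit the cleanup statement for a given retention window in months.
retention_sql() {
  months=${1:-13}
  # Accept digits only, so the value is safe to place inside the SQL text.
  case $months in
    ''|*[!0-9]*) echo "months must be a positive integer" >&2; return 1 ;;
  esac
  printf "DELETE FROM audit_events WHERE occurred_at < now() - interval '%s months';\n" "$months"
}
```

`rows_per_year 10000 5` prints `18250000`, matching the figure of roughly 18M rows/year. A cron entry could then run something like `retention_sql 13 | psql -v ON_ERROR_STOP=1 "$DATABASE_URL"`, where `DATABASE_URL` is an assumed connection string rather than a variable the project is known to define.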
### F-2026Q2-INFRA-7: `JwtSigningKeyProvider` silent fallback to first PEM (MEDIUM)
- **Location**: `Azaion.Services/JwtSigningKeyProvider.cs:73-86`.
- **Description**: when `JwtConfig.ActiveKid` is unset, the provider picks the alphabetically-first PEM and only logs at `LogInformation`. Adding a new PEM with a name that sorts earlier silently changes the signing key on next restart. The deploy script in `scripts/start-services.sh` does not require `ActiveKid`, and `.env.example:25-26` calls it "optional".
- **Impact**: an operator drops `kid-2026-04-aaa.pem` into the folder thinking it's a side-by-side rotation key and restarts the service; all newly minted tokens are then signed under a kid the verifiers may not yet have in their JWKS cache (1-h max-age, so the fleet sees signature failures for up to 1 h).
- **Remediation**:
  1. Make `ActiveKid` required when more than one PEM is present (fail-fast at startup if ambiguous).
  2. If exactly one PEM exists, accept it but log at `Warning` (not `Information`).
  3. Update `.env.example` to mark `ActiveKid` as **required for prod** rather than "optional".
- **Cross-ref**: same finding documented in `static_analysis_cycle2.md` as F-2026Q2-AUTH-7. Listed here too because the operational-hardening fix (require it in `start-services.sh`) is in this scope.
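
The operational-hardening side of this fix could look roughly like the following in `start-services.sh`. It is a sketch under two assumptions that the source does not confirm: that each key file is named `<kid>.pem`, and that the check takes the keys directory and the configured kid as arguments. `check_active_kid` is a hypothetical name.

```shell
# Hypothetical guard: refuse an ambiguous key set instead of silently picking the first PEM.
check_active_kid() {
  keys_dir=$1
  active_kid=$2
  # Expand the glob into positional parameters; $# becomes the PEM count.
  set -- "$keys_dir"/*.pem
  if [ ! -e "$1" ]; then
    echo "no PEM files in $keys_dir" >&2
    return 1
  fi
  if [ "$#" -gt 1 ] && [ -z "$active_kid" ]; then
    echo "ActiveKid is required: $# PEMs found in $keys_dir" >&2
    return 1
  fi
  # Assumption: the kid matches the PEM file name without its extension.
  if [ -n "$active_kid" ] && [ ! -f "$keys_dir/$active_kid.pem" ]; then
    echo "ActiveKid '$active_kid' has no matching PEM in $keys_dir" >&2
    return 1
  fi
  return 0
}
```

A call such as `check_active_kid "$DEPLOY_HOST_JWT_KEYS_DIR" "${ASPNETCORE_JwtConfig__ActiveKid:-}"` still accepts the single-PEM case, in line with remediation step 2, but rejects a second PEM without an explicit kid.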
## Re-verified categories (no cycle-2 regression)
| Area | Cycle-1 status | Cycle-2 verdict |
|------|----------------|-----------------|
| Container non-root user (F-6) | FAIL | **PASS**: Dockerfile now sets `USER app` (line 40) and chowns `/app/Content` + `/app/logs`. Closes F-6. |
| Production HTTPS enforcement (F-13) | FAIL | **PASS**: `app.UseHttpsRedirection()` + `app.UseHsts()` enabled in non-Development. Closes F-13 in code (still reverse-proxy fronting in deploy). |
| CORS | tight | **TIGHTER**: cycle-2 dropped the `http://admin.azaion.com` origin. Only `https://admin.azaion.com` remains. |
| Image pinned by digest | WARN | unchanged. Deferred. |
| Secrets via env vars | PASS | unchanged. |
| Test sidecar / E2E images | acceptable | unchanged. |
| Test compose `ASPNETCORE_ENVIRONMENT=Development` | acceptable for tests | unchanged. Flag operator risk: a misconfigured prod that inherits this value silently loses HTTPS enforcement and HSTS. |
| `.gitignore` excludes secrets | PASS | **PASS**: `secrets/jwt-keys/*` is gitignored; only `.gitkeep` tracked. Verified no PEMs in repo. |
## Self-verification
- [x] All cycle-2-touched infra files reviewed (Dockerfile, docker-compose.test.yml, scripts/*, secrets/*, .env.example, appsettings*.json, .woodpecker/*)
- [x] Each finding has file path + line number + remediation
- [x] Cycle-1 closures verified by re-reading the code (F-6 USER directive, F-13 HSTS+HttpsRedirection)
- [x] No false positives from test-only files (test fixtures are flagged as acceptable, not as findings)