admin/_docs/05_security/infrastructure_review_cycle2.md
Oleksandr Bezdieniezhnykh 1bdbe8c96d [AZ-529] [AZ-530] Cycle-2 security audit reports
Step 14 (Security Audit) output for cycle 2. Verdict: FAIL — 2 Critical
(F-INFRA-1, F-INFRA-2) + 4 High (F-INFRA-3, F-INFRA-4, F-AUTH-1,
F-AUTH-2) block deploy. 13 cycle-2 findings total; cycle-1 closures
confirmed for F-6, F-7, F-8, F-13, A09.

Files:
- security_report_cycle2.md (delta on cycle-1 report; FAIL verdict,
  tracker follow-ups filed as AZ-552..AZ-557 + 9 deferred Medium/Low)
- owasp_review_cycle2.md (A01..A09 delta; 2 FAIL / 2 PASS_W_W / 5 PASS)
- static_analysis_cycle2.md (F-AUTH-1..9 with locations + remediation)
- infrastructure_review_cycle2.md (F-INFRA-1..6 with locations
  + remediation)
- dependency_scan_cycle2.md (no new CVEs; cycle-1 deprecations re-flagged)

Cycle-1 reports remain authoritative for non-cycle-2 surface.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-14 09:23:02 +03:00

# Infrastructure & Configuration Review — Cycle 2 (Delta)
**Date**: 2026-05-14
**Scope**: cycle-2 surface only — `Program.cs` middleware/CORS/HSTS, DataProtection wiring, ES256 key store, `secrets/`, `scripts/deploy.sh` + `start-services.sh` + `_lib.sh` + `health-check.sh`, `docker-compose.test.yml`, `.env.example`, `.woodpecker/`. Cycle-1 categories that did not change since `infrastructure_review.md` are out of scope here.
**Read order**: this file is a delta on `infrastructure_review.md`. Categories not listed here keep their cycle-1 status.
## Cycle-2 Findings
### F-2026Q2-INFRA-1: Deploy script hard-blocks on obsolete `JwtConfig__Secret` (CRITICAL — deploy-blocking)
- **Location**: `scripts/start-services.sh:32`.
- **Description**: `start-services.sh` does `require_env ... ASPNETCORE_JwtConfig__Secret`, which kills the deploy if the variable is not set. AZ-532 removed `JwtConfig.Secret` entirely — `Program.cs:60-83` configures `JwtBearer` against `IssuerSigningKeyResolver` backed by `JwtSigningKeyProvider` (ES256 PEMs). A correctly-configured cycle-2 deploy that follows `.env.example` (which does NOT include `JwtConfig__Secret`) will fail at the deploy-script preflight.
- **Impact**: cycle-2 cannot deploy with the current scripts. Either the deploy fails on preflight, or the operator sets a dummy `JwtConfig__Secret=dummy` to get past the check — and then we hit F-INFRA-2 below.
- **Remediation**: replace the line with:
```
require_env ... ASPNETCORE_JwtConfig__KeysFolder ASPNETCORE_JwtConfig__ActiveKid
```
Drop `ASPNETCORE_JwtConfig__Secret` from the schema in `secrets/README.md` and from any documented env templates.
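
To illustrate what the corrected preflight would enforce, here is a minimal sketch. The real `require_env` lives in `scripts/_lib.sh` and is not reproduced in this review, so `require_env_sketch` below is a hypothetical stand-in and may differ from the project's helper.

```shell
# Hypothetical stand-in for require_env from scripts/_lib.sh (real implementation not shown here).
# Returns non-zero and lists every variable that is unset or empty.
require_env_sketch() {
    missing=""
    for name in "$@"; do
        # Indirect lookup: read the value of the variable whose name is in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            missing="$missing $name"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing required env:$missing" >&2
        return 1
    fi
    return 0
}
```

With only `ASPNETCORE_JwtConfig__KeysFolder` exported, `require_env_sketch ASPNETCORE_JwtConfig__KeysFolder ASPNETCORE_JwtConfig__ActiveKid` would return non-zero and name `ASPNETCORE_JwtConfig__ActiveKid` on stderr, which is the behaviour the deploy preflight needs after AZ-532.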
### F-2026Q2-INFRA-2: ES256 keys folder not bind-mounted into container (CRITICAL — deploy-blocking)
- **Location**: `scripts/start-services.sh:48-56` (the `docker run` line).
- **Description**: `JwtConfig.KeysFolder` defaults to `secrets/jwt-keys` (relative path, per `appsettings.json:15`). Inside the container, this resolves under `/app/`. The Dockerfile **does not** COPY `secrets/`, and `start-services.sh` **does not** add a `--volume` mapping for the host `secrets/jwt-keys` directory. Result: `JwtSigningKeyProvider.Load` (cycle-2 ctor) fails fast at startup with "no PEM files found".
- **Impact**: container restart loop on every cycle-2 deploy. The only way to bring it up today is to manually `docker cp` the PEMs into the container — defeats reproducibility, no rotation story.
- **Remediation**: add a host-side directory (e.g. `/etc/azaion/jwt-keys` owned by the runtime user, mode 0700, PEMs mode 0400) and a corresponding `--volume "$DEPLOY_HOST_JWT_KEYS_DIR:/etc/azaion/jwt-keys:ro"` line in `start-services.sh`. Set `ASPNETCORE_JwtConfig__KeysFolder=/etc/azaion/jwt-keys` in the public env overlay. Document the host-side procedure in `secrets/README.md` and `_docs/04_deploy/`.
- **Cross-ref**: `e2e/test-keys` is correctly mounted in `docker-compose.test.yml:42` — the test stack works; only the prod deploy script is broken.
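
The bind mount in the remediation could be guarded on the host side before `docker run` is reached. The sketch below assumes the variable name `DEPLOY_HOST_JWT_KEYS_DIR` from the remediation text; the function name `jwt_keys_volume_spec` is hypothetical, and the rest of the `docker run` line is left elided because it is not shown in this review.

```shell
# Hypothetical helper: prints the --volume spec for the JWT keys folder,
# or fails early when the host directory contains no PEM files.
jwt_keys_volume_spec() {
    host_dir="$1"
    # Expand the glob into the function's positional parameters.
    # If nothing matches, "$1" stays the literal pattern and -e is false.
    set -- "$host_dir"/*.pem
    if [ ! -e "$1" ]; then
        echo "no PEM files in $host_dir" >&2
        return 1
    fi
    # Read-only mount at the container path used in the remediation above.
    printf '%s\n' "$host_dir:/etc/azaion/jwt-keys:ro"
}

# Possible use inside start-services.sh (other docker run arguments elided):
#   spec=$(jwt_keys_volume_spec "$DEPLOY_HOST_JWT_KEYS_DIR") || exit 1
#   docker run ... --volume "$spec" ...
```

Failing on the host gives the operator a clear message at deploy time instead of a container restart loop.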
### F-2026Q2-INFRA-3: DataProtection key store ephemeral by default — MFA secrets unrecoverable across restarts (HIGH)
- **Location**: `Program.cs:147-160`, `scripts/start-services.sh` (no DataProtection bind-mount), `secrets/README.md` (no entry).
- **Description**: AZ-534 wraps `MfaSecret` ciphertext with `IDataProtector` (`Azaion.Mfa.Secret.v1` purpose). When `DataProtection:KeysFolder` is unset, ASP.NET Core writes its master keys to a per-machine path inside the container (`%LOCALAPPDATA%/ASP.NET/DataProtection-Keys` or `~/.aspnet/DataProtection-Keys` depending on platform). That path lives in the container's writable layer, so it is **lost whenever the container is recreated**, which is what a `docker stop && docker run` redeploy does. After the first such restart, every existing `MfaSecret` becomes undecryptable. Users with MFA enabled can no longer log in, because `/login/mfa` fails when the server cannot unwrap the secret to verify TOTP. They cannot self-disable MFA via `/users/me/mfa/disable` either, because that path also re-validates the existing TOTP. Recovery codes still work (SHA-256 hashed, no DataProtection involvement), so the only escape is recovery-code-based login followed by disable-and-re-enroll.
- **Impact**: catastrophic data loss for the auth surface. Every `docker stop && docker run` cycle locks every MFA user out.
- **Mitigating control (current)**: the cycle-2 test deploy is single-instance, so within one process lifetime the keys are stable. The risk crystallizes on first restart.
- **Remediation**:
1. Add a host-side persistent directory (e.g. `/var/lib/azaion/dp-keys`) owned by the runtime user, mode 0700.
2. Add `--volume "$DEPLOY_HOST_DP_KEYS_DIR:/var/lib/azaion/dp-keys"` to `start-services.sh`.
3. Set `ASPNETCORE_DataProtection__KeysFolder=/var/lib/azaion/dp-keys` in `secrets/<env>.public.env`.
4. Add a fail-fast in `Program.cs:151-160`: if `app.Environment.IsProduction()` and `DataProtection:KeysFolder` is unset, throw at startup. This makes the misconfiguration loud instead of silent.
5. Document key persistence and rotation in `secrets/README.md` and `_docs/04_deploy/`.
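
Steps 1 and 2 above can be sketched as a small host-side helper. `prepare_dp_keys_dir` is a hypothetical name, `DEPLOY_HOST_DP_KEYS_DIR` is the variable from step 2, and ownership by the runtime user is left to the operator because `chown` normally needs root.

```shell
# Hypothetical helper: ensures the DataProtection key directory exists with
# mode 0700 and prints the read-write --volume spec for it.
prepare_dp_keys_dir() {
    host_dir="$1"
    mkdir -p "$host_dir" || return 1
    chmod 700 "$host_dir" || return 1
    if [ ! -w "$host_dir" ]; then
        echo "$host_dir is not writable by the current user" >&2
        return 1
    fi
    # No :ro suffix: ASP.NET Core must be able to write new keys here.
    printf '%s\n' "$host_dir:/var/lib/azaion/dp-keys"
}

# Possible use inside start-services.sh (other docker run arguments elided):
#   spec=$(prepare_dp_keys_dir "$DEPLOY_HOST_DP_KEYS_DIR") || exit 1
#   docker run ... --volume "$spec" ...
```

Unlike the JWT keys mount, this one has to stay writable, since the framework creates and rotates its own keys in that folder.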
### F-2026Q2-INFRA-4: `secrets/README.md` schema still lists HS256-era `JwtConfig__Secret` (HIGH — doc drift, deploy-blocking by proxy)
- **Location**: `secrets/README.md:50-55` ("Schema (variables that MUST be in the encrypted file)").
- **Description**: the documented schema still requires `ASPNETCORE_JwtConfig__Secret=<32 random bytes>`. This is the same root cause as F-INFRA-1 but on the documentation side — operators following the README will set a useless variable, miss `JwtConfig__KeysFolder` / `JwtConfig__ActiveKid`, and miss `DataProtection__KeysFolder`.
- **Impact**: misleads any operator onboarding to the project; reinforces the broken deploy script.
- **Remediation**: rewrite the "What goes where" + "Schema" sections to:
- Drop `ASPNETCORE_JwtConfig__Secret`.
- Add `ASPNETCORE_JwtConfig__KeysFolder=/etc/azaion/jwt-keys` (path; not a secret — belongs in `<env>.public.env`).
- Add `ASPNETCORE_JwtConfig__ActiveKid=<current-kid>` (key identifier; not a secret, belongs in `<env>.public.env`).
- Add `ASPNETCORE_DataProtection__KeysFolder=/var/lib/azaion/dp-keys` (path; not a secret).
- Note: the PEM private keys themselves are NOT in sops; they live on the host filesystem at `KeysFolder`, owned by the runtime user, mode 0400. Rotation procedure is in `scripts/generate-jwt-key.sh`.
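
Put together, the non-secret part of the rewritten schema might look like the fragment below. The two paths are the ones proposed in this review; the `ActiveKid` value `kid-example` is a placeholder, not a real key identifier.

```shell
# Sketch of secrets/<env>.public.env after the rewrite (none of these are secrets).
# ASPNETCORE_JwtConfig__Secret is intentionally absent: AZ-532 removed it.
ASPNETCORE_JwtConfig__KeysFolder=/etc/azaion/jwt-keys
# Placeholder value; set this to the kid of the PEM currently used for signing.
ASPNETCORE_JwtConfig__ActiveKid=kid-example
ASPNETCORE_DataProtection__KeysFolder=/var/lib/azaion/dp-keys
```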
### F-2026Q2-INFRA-5: HSTS preload+includeSubDomains may break legacy subdomains (MEDIUM)
- **Location**: `Program.cs:217-225`.
- **Description**: HSTS is configured with `MaxAge = 365 days`, `IncludeSubDomains = true`, `Preload = true` in non-Development environments. If any current or future `*.azaion.com` subdomain serves over plain HTTP (legacy admin tools, internal monitoring, dev/staging mirrors of unrelated systems), browsers that have seen the header will refuse to connect to those subdomains. Worse, **HSTS preload registration is essentially permanent**: once a domain is submitted, removal from the Chrome HSTS preload list takes weeks to months, even after the header is disabled.
- **Impact**: operational blast radius if a non-HTTPS subdomain exists or is added later. Preload makes the mistake hard to reverse.
- **Remediation**:
1. Audit all `*.azaion.com` subdomains; confirm 100% HTTPS-only (including any internal-only ones — DNS hijacking can expose them to user browsers).
2. Document the subdomain inventory in `_docs/04_deploy/`.
3. Consider gating `Preload = true` behind an env var so staging and dev hosts don't trigger preload-list submission attempts.
4. Do NOT submit to the public preload list (https://hstspreload.org) until the audit is complete and signed off.
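
Once `Preload` is gated as suggested in step 3, the result can be checked from outside by inspecting the `Strict-Transport-Security` header each host returns. The sketch below only parses a header value that has already been fetched; `hsts_has_preload` is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds if an HSTS header value carries the preload directive.
# Directive names are case-insensitive, so the value is lower-cased first.
hsts_has_preload() {
    printf '%s' "$1" \
        | tr 'A-Z' 'a-z' \
        | tr ';' '\n' \
        | tr -d ' \t' \
        | grep -qx 'preload'
}

# Example: the production-style value described above carries preload.
if hsts_has_preload "max-age=31536000; includeSubDomains; preload"; then
    echo "preload present"
fi
```

The header value itself could come from something like `curl -sI` against each host in the subdomain inventory; staging and dev hosts would be expected to fail this check.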
### F-2026Q2-INFRA-6: `audit_events` table has no retention policy (LOW — operational hygiene)
- **Location**: `env/db/07_auth_lockout_and_audit.sql`.
- **Description**: `audit_events` is append-only with no documented retention or partitioning. Every login attempt writes a row. At 10K users × 5 attempts/day = 50K rows/day = ~18M rows/year. Postgres handles this fine, and the composite index `(event_type, email, occurred_at DESC)` keeps `CountRecentFailedLogins` sub-millisecond, but unbounded growth has implications for compliance (GDPR / data minimization), backup/restore time, and storage cost.
- **Impact**: not a security risk per se — audit completeness is the goal — but the regulatory storage horizon needs a stated answer.
- **Remediation**: agree on a retention period (CMMC says ≥1 year for audit logs), add a nightly `DELETE FROM audit_events WHERE occurred_at < now() - interval '13 months'` job (cron + small script), and document it in `_docs/04_deploy/`. Optional: switch to monthly partitions so the cleanup becomes dropping a whole partition (`ALTER TABLE ... DETACH PARTITION` followed by `DROP TABLE`) instead of a row-by-row delete.
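
The growth estimate and the nightly cleanup can be sketched in a few lines. `audit_retention_sql` is a hypothetical helper name, and the `psql` invocation in the comment assumes a `DATABASE_URL` variable that is not part of this review.

```shell
# Row-count estimate from the description: users x attempts per day x days per year.
rows_per_year=$((10000 * 5 * 365))
echo "$rows_per_year"   # 18250000, i.e. roughly 18M rows/year

# Hypothetical helper: prints the cleanup statement for a retention window in months.
audit_retention_sql() {
    months="$1"
    printf "DELETE FROM audit_events WHERE occurred_at < now() - interval '%s months';\n" "$months"
}

# Possible nightly cron entry point (DATABASE_URL is an assumed variable):
#   audit_retention_sql 13 | psql "$DATABASE_URL"
```

Keeping the window as a parameter makes it easy to adjust once the retention period is agreed.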
### F-2026Q2-INFRA-7: `JwtSigningKeyProvider` silent fallback to first PEM (MEDIUM)
- **Location**: `Azaion.Services/JwtSigningKeyProvider.cs:73-86`.
- **Description**: when `JwtConfig.ActiveKid` is unset, the provider picks the alphabetically-first PEM and only logs at `LogInformation`. Adding a new PEM with a name that sorts earlier silently changes the signing key on next restart. The deploy script in `scripts/start-services.sh` does not require `ActiveKid`, and `.env.example:25-26` calls it "optional".
- **Impact**: an operator drops `kid-2026-04-aaa.pem` into the keys folder thinking it's a side-by-side rotation key, restarts, and now all newly-minted tokens are signed under a kid the verifiers may not yet have in their JWKS cache (1-h max-age, so the fleet sees signature failures for up to 1 h).
- **Remediation**:
1. Make `ActiveKid` required when more than one PEM is present (fail-fast at startup if ambiguous).
2. If exactly one PEM exists, accept it but log at `Warning` (not `Information`).
3. Update `.env.example` to mark `ActiveKid` as **required for prod** rather than "optional".
- **Cross-ref**: same finding documented in `static_analysis_cycle2.md` as F-2026Q2-AUTH-7. Listed here too because the operational-hardening fix (require it in `start-services.sh`) is in this scope.
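
On the deploy-script side, the rule from remediation steps 1 and 2 could be enforced before the container starts. This is a sketch: `check_active_kid` is a hypothetical name, and it assumes each PEM is named `<kid>.pem`, which the example file name above suggests but this review does not confirm.

```shell
# Hypothetical preflight: fail when the signing key choice would be ambiguous.
# Assumption: the file name (minus .pem) is the kid.
check_active_kid() {
    keys_dir="$1"
    active_kid="${2:-}"
    count=0
    for pem in "$keys_dir"/*.pem; do
        # An unmatched glob stays literal, so test for existence before counting.
        if [ -e "$pem" ]; then
            count=$((count + 1))
        fi
    done
    if [ "$count" -eq 0 ]; then
        echo "no PEM files in $keys_dir" >&2
        return 1
    fi
    if [ -z "$active_kid" ]; then
        if [ "$count" -gt 1 ]; then
            echo "ActiveKid is required: $count PEM files present" >&2
            return 1
        fi
        echo "warning: ActiveKid unset, the single PEM will be used" >&2
        return 0
    fi
    if [ ! -e "$keys_dir/$active_kid.pem" ]; then
        echo "ActiveKid $active_kid has no matching PEM in $keys_dir" >&2
        return 1
    fi
    return 0
}
```

Run against the host keys directory with the configured `ActiveKid`, this turns the silent alphabetical fallback into a deploy-time failure whenever a second PEM appears without an explicit choice.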
## Re-verified categories (no cycle-2 regression)
| Area | Cycle-1 status | Cycle-2 verdict |
|------|----------------|-----------------|
| Container non-root user (F-6) | FAIL | **PASS** — Dockerfile now sets `USER app` (line 40) and chowns `/app/Content` + `/app/logs`. Closes F-6. |
| Production HTTPS enforcement (F-13) | FAIL | **PASS** — `app.UseHttpsRedirection()` + `app.UseHsts()` enabled in non-Development. Closes F-13 in code (still reverse-proxy fronting in deploy). |
| CORS | tight | **TIGHTER** — cycle-2 dropped the `http://admin.azaion.com` origin. Only `https://admin.azaion.com` remains. |
| Image pinned by digest | WARN | unchanged. Deferred. |
| Secrets via env vars | PASS | unchanged. |
| Test sidecar / E2E images | acceptable | unchanged. |
| Test compose `ASPNETCORE_ENVIRONMENT=Development` | acceptable for tests | unchanged. Flag operator risk: a misconfigured prod that inherits this value silently loses HTTPS enforcement and HSTS. |
| `.gitignore` excludes secrets | PASS | **PASS** — `secrets/jwt-keys/*` is gitignored; only `.gitkeep` tracked. Verified no PEMs in repo. |
## Self-verification
- [x] All cycle-2-touched infra files reviewed (Dockerfile, docker-compose.test.yml, scripts/*, secrets/*, .env.example, appsettings*.json, .woodpecker/*)
- [x] Each finding has file path + line number + remediation
- [x] Cycle-1 closures verified by re-reading the code (F-6 USER directive, F-13 HSTS+HttpsRedirection)
- [x] No false positives from test-only files (test fixtures are flagged as acceptable, not as findings)