mirror of
https://github.com/azaion/admin.git
synced 2026-06-21 12:11:09 +00:00
[AZ-552] [AZ-553] [AZ-554] [AZ-555] Cycle-2 hotfix: deploy/infra chain
Batch 5 (cycle 2 hotfix sprint, batch 1 of 2). 6 story points under epic AZ-530. Addresses 2 Critical + 2 High deploy-blocking findings from security_report_cycle2.md (F-INFRA-1..F-INFRA-4).

AZ-552 — drop_jwt_secret_deploy_preflight (1 pt, F-INFRA-1 Critical)
scripts/start-services.sh swaps obsolete JwtConfig__Secret preflight for the cycle-2 trio (KeysFolder + ActiveKid + DataProtection.KeysFolder). .env.example, env/api/env.ps1, _docs/04_deploy/* updated to match. Repo scan in scripts/ and .env.example returns 0 offenders.

AZ-553 — bind_mount_es256_keys (2 pts, F-INFRA-2 Critical)
start-services.sh bind-mounts DEPLOY_HOST_JWT_KEYS_DIR read-only at /etc/azaion/jwt-keys; preflight fails fast on a missing or empty host directory with operator-actionable error messages.

AZ-554 — persist_dataprotection_keys (2 pts, F-INFRA-3 High)
Program.cs DataProtection wiring now fails fast in Production when KeysFolder is unset OR not probe-writable. start-services.sh bind-mounts DEPLOY_HOST_DP_KEYS_DIR read-write at /var/lib/azaion/dp-keys. Development behaviour unchanged (ephemeral default).

AZ-555 — secrets_readme_es256_rewrite (1 pt, F-INFRA-4 High)
secrets/README.md schema fully rewritten; new "Host-side directories" subsection with bind-mount table + ownership/permission guidance. Cycle-1 JwtConfig__Secret removed from live schema (one prose deprecation paragraph retained).

Adjacent hygiene
module-layout.md "Owns" extended to include scripts/, secrets/, env/, .env.example (gap from Step 9 new-task layout-delta).

Tests
e2e/Azaion.E2E/Tests/Cycle2HotfixDeployTests.cs — 19 facts (8 exec, 11 Skip with rationale per AZ-537/AZ-538 precedent). Skipped tests cover preflight/restart/Production-only paths verified at deploy gate.

Build: 0W 0E across Azaion.AdminApi + Azaion.E2E.
Test run deferred to autodev Step 11 (Run Tests).
Tracker transition deferred to next batch (MCP availability unverified in this session — Leftovers pattern).

Co-authored-by: Cursor <cursoragent@cursor.com>
@@ -0,0 +1,89 @@
# Drop Obsolete `JwtConfig__Secret` From Deploy Preflight

**Task**: AZ-552_drop_jwt_secret_deploy_preflight
**Name**: Drop obsolete `JwtConfig__Secret` from deploy preflight
**Description**: `scripts/start-services.sh` still hard-requires `ASPNETCORE_JwtConfig__Secret`, the HS256-era env var that AZ-532 removed. A correctly-configured cycle-2 deploy fails at preflight before the container starts. Replace the check with the new ES256 inputs (`KeysFolder` + `ActiveKid`).
**Complexity**: 1 point
**Dependencies**: None
**Component**: Deploy / scripts
**Tracker**: AZ-552
**Epic**: AZ-530
**CMMC ref**: SC.L2-3.13.11 (FIPS-validated cryptography — cycle-2 ES256 supersedes HS256)
**Source**: `_docs/05_security/security_report_cycle2.md` F-INFRA-1 (Critical, deploy-blocking); `_docs/05_security/infrastructure_review_cycle2.md` §F-2026Q2-INFRA-1

## Problem

`scripts/start-services.sh` calls `require_env ... ASPNETCORE_JwtConfig__Secret` against the obsolete HS256 symmetric secret. AZ-532 removed `JwtConfig.Secret` from `Azaion.Common/Configs/JwtConfig.cs` — `Program.cs` now configures JwtBearer via `IssuerSigningKeyResolver` backed by `JwtSigningKeyProvider`, which reads ES256 PEMs from `JwtConfig.KeysFolder` and selects the active key by `JwtConfig.ActiveKid`. A cycle-2 deploy that follows the new `.env.example` (which does NOT set `JwtConfig__Secret`) fails the preflight gate and never starts the container. Operators who work around this by setting a dummy `JwtConfig__Secret=dummy` immediately hit F-INFRA-2 (no key folder mounted), so the workaround doesn't help.

## Outcome

- Cycle-2 deploys that supply `ASPNETCORE_JwtConfig__KeysFolder` + `ASPNETCORE_JwtConfig__ActiveKid` pass preflight without `JwtConfig__Secret` being set.
- Cycle-2 deploys that omit `KeysFolder` or `ActiveKid` fail preflight with a clear, actionable error naming the missing variable.
- The deploy script no longer references `JwtConfig__Secret` anywhere.
- `.env.example` no longer documents `JwtConfig__Secret`.

## Scope

### Included

- Edit `scripts/start-services.sh`: replace the `require_env ... ASPNETCORE_JwtConfig__Secret` line with the cycle-2 required pair.
- Audit `scripts/_lib.sh`, `scripts/deploy.sh`, `scripts/pull-images.sh`, `scripts/health-check.sh` and `.env.example` for any other reference to `JwtConfig__Secret` / `JWT_SECRET`; remove them.
- Update `_docs/04_deploy/` if any deploy doc still names `JwtConfig__Secret` as required.

### Excluded

- The bind-mount of the keys folder itself — that is AZ-553. This ticket only stops the deploy from failing on the obsolete env var; AZ-553 makes the keys actually reach the container.
- `secrets/README.md` rewrite — that is AZ-555.
- The suite-level `_infra/deploy/webserver/` flow that still uses `JWT_SECRET`. That is owned by the suite repo, not admin. Logged separately as a process leftover.

## Acceptance Criteria

**AC-1: Deploy preflight passes without `JwtConfig__Secret`**
Given `ASPNETCORE_JwtConfig__KeysFolder=/etc/azaion/jwt-keys` and `ASPNETCORE_JwtConfig__ActiveKid=<kid>` are set
And `ASPNETCORE_JwtConfig__Secret` is unset
When `scripts/start-services.sh` runs preflight
Then preflight completes successfully and the container is started.

**AC-2: Preflight fails clearly when `KeysFolder` is missing**
Given `ASPNETCORE_JwtConfig__ActiveKid` is set but `ASPNETCORE_JwtConfig__KeysFolder` is unset
When `scripts/start-services.sh` runs preflight
Then the script exits non-zero with an error message that names `ASPNETCORE_JwtConfig__KeysFolder`.

**AC-3: Preflight fails clearly when `ActiveKid` is missing**
Given `ASPNETCORE_JwtConfig__KeysFolder` is set but `ASPNETCORE_JwtConfig__ActiveKid` is unset
When `scripts/start-services.sh` runs preflight
Then the script exits non-zero with an error message that names `ASPNETCORE_JwtConfig__ActiveKid`.
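
The preflight behaviour in AC-1 to AC-3 could look roughly like the sketch below. `require_env_sketch` and `preflight_jwt_sketch` are hypothetical names: the real `require_env` helper in `_lib.sh` and its exact signature are not shown in this task, so this stand-in only illustrates the expected contract (name the missing variable, return non-zero). Only the two variable names come from the ticket.

```sh
# Hypothetical stand-in for the require_env helper in scripts/_lib.sh.
# Reports every missing variable by name; returns non-zero if any is unset or empty.
require_env_sketch() {
  re_missing=0
  for re_name in "$@"; do
    # Indirect lookup of the variable whose name is in $re_name (POSIX-compatible).
    eval "re_value=\${$re_name:-}"
    if [ -z "$re_value" ]; then
      printf 'ERROR: required environment variable %s is not set\n' "$re_name" >&2
      re_missing=1
    fi
  done
  return "$re_missing"
}

# Cycle-2 preflight: the ES256 pair replaces the obsolete JwtConfig__Secret check.
preflight_jwt_sketch() {
  require_env_sketch \
    ASPNETCORE_JwtConfig__KeysFolder \
    ASPNETCORE_JwtConfig__ActiveKid
}
```

In this sketch a stale `ASPNETCORE_JwtConfig__Secret` in the environment is never inspected, so it neither helps nor hurts the preflight result.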

**AC-4: No references to `JwtConfig__Secret` remain in `scripts/` or `.env.example`**
Given the admin repo at HEAD
When `rg "JwtConfig__Secret"` is run against `scripts/` and `.env.example`
Then no matches are returned.
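
AC-4 uses `rg`; on a host without ripgrep, an equivalent check can be sketched with `grep -r`. The function name is illustrative, and matching `JWT_SECRET` as well follows the audit item in the Scope section rather than AC-4 itself.

```sh
# Scan the given paths for the obsolete HS256 variable names.
# Returns 0 when clean, 1 when offenders are found (they are printed), 2 on a grep error.
scan_obsolete_jwt_secret() {
  sc_status=0
  grep -rn -e 'JwtConfig__Secret' -e 'JWT_SECRET' -- "$@" || sc_status=$?
  case $sc_status in
    0) return 1 ;;  # grep found at least one offender
    1) return 0 ;;  # no matches: the scan is clean
    *) return 2 ;;  # grep reported an error, e.g. an unreadable path
  esac
}
```

A call such as `scan_obsolete_jwt_secret scripts/ .env.example` from the repo root would then mirror the AC-4 command.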

## Non-Functional Requirements

**Compatibility**
- Existing operators with both old and new env vars set must not be broken by the change — the old var is simply ignored.

## Blackbox Tests

| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|------------------------|-------------|-------------------|----------------|
| AC-1 | Env: `KeysFolder`+`ActiveKid` set, `Secret` unset | Run `start-services.sh` preflight | Preflight passes, container starts | — |
| AC-2 | Env: `ActiveKid` set, `KeysFolder` unset | Run `start-services.sh` preflight | Exit non-zero, error names `KeysFolder` | — |
| AC-3 | Env: `KeysFolder` set, `ActiveKid` unset | Run `start-services.sh` preflight | Exit non-zero, error names `ActiveKid` | — |
| AC-4 | Repo at HEAD | `rg "JwtConfig__Secret" scripts/ .env.example` | Empty result | — |

## Constraints

- Must not change any runtime behaviour of the application — this is a script-only change.
- Error messages must come from the existing `require_env` helper in `_lib.sh` (do not add a new ad-hoc error path).

## Risks & Mitigation

**Risk 1: Operators with stale `.env` files**
- *Risk*: An operator with an old `.env` that sets `JwtConfig__Secret` but not the new pair will see the deploy fail at preflight.
- *Mitigation*: This is the desired behaviour. Document the migration in `secrets/README.md` (AZ-555) so the failure is self-diagnosable.

**Risk 2: Suite-level `_infra/deploy/webserver/` deploy still works the old way**
- *Risk*: The suite-level webserver deploy pipeline at `suite/_infra/deploy/webserver/` injects `JWT_SECRET` and would still appear functional even though it shouldn't. Out-of-scope here; logged as suite-level leftover.
- *Mitigation*: Cross-reference the suite-level follow-up ticket in this task's commit message so the linkage is discoverable.
@@ -0,0 +1,105 @@
# Bind-Mount ES256 Keys Folder Into Container + Host-Side Procedure

**Task**: AZ-553_bind_mount_es256_keys
**Name**: Bind-mount ES256 keys folder into container + host-side procedure
**Description**: `JwtSigningKeyProvider` fail-fasts on startup if `JwtConfig.KeysFolder` is missing or empty. The deploy script never makes `secrets/jwt-keys` visible inside the container — the path is host-only. Add the bind-mount, document the host-side directory, and gate it through the existing env-template machinery.
**Complexity**: 2 points
**Dependencies**: AZ-552 (preflight must accept the new env vars first)
**Component**: Deploy / scripts + host provisioning
**Tracker**: AZ-553
**Epic**: AZ-530
**CMMC ref**: SC.L2-3.13.10 (key management), SC.L2-3.13.11 (FIPS-validated crypto)
**Source**: `_docs/05_security/security_report_cycle2.md` F-INFRA-2 (Critical, deploy-blocking); `_docs/05_security/infrastructure_review_cycle2.md` §F-2026Q2-INFRA-2

## Problem

`Azaion.AdminApi/Program.cs` configures JwtBearer to resolve signing keys via `JwtSigningKeyProvider`, which reads PEM files from `JwtConfig.KeysFolder` at startup and fails fast if the folder is missing, empty, or unreadable. `appsettings.json` defaults `KeysFolder` to a container-local path (e.g. `/etc/azaion/jwt-keys`), but `scripts/start-services.sh` does not bind-mount the host's `secrets/jwt-keys` into that path. Even if AZ-552 unblocks the preflight, the container itself fails to start because the keys folder inside the container is empty.

## Outcome

- Container has read-only access to ES256 PEMs at the path named by `JwtConfig.KeysFolder` at startup.
- The host-side directory is parameterised by an env var (`DEPLOY_HOST_JWT_KEYS_DIR`) so the deploy works from CI runners, dev VMs, and production hosts without code changes.
- `JwtSigningKeyProvider` startup probe passes on a freshly-deployed cycle-2 container with a populated host-side keys folder.
- `.env.example` documents the new host-side env var with a sensible default and a note that it must point at a directory the container user can read.

## Scope

### Included

- Edit `scripts/start-services.sh`: add `--volume "$DEPLOY_HOST_JWT_KEYS_DIR:/etc/azaion/jwt-keys:ro"` (or the equivalent in the docker-compose stack the script orchestrates) to the admin container args.
- Preflight: also require `DEPLOY_HOST_JWT_KEYS_DIR` to be set AND to point at an existing directory containing at least one `.pem` file.
- Document `DEPLOY_HOST_JWT_KEYS_DIR` in `.env.example`.
- Add a short host-side runbook section to `_docs/04_deploy/` (or extend the existing one) covering: where the host directory lives, how to populate it (use `scripts/generate-jwt-key.sh`), file ownership/permissions (readable by the container's `app` UID), and rotation.
- Sanity-check that `JwtConfig.KeysFolder` in `appsettings.json` matches the container-side mount target the script uses; if not, align them.
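
As a rough illustration of the first and last items above, the mount argument and an alignment check could be sketched as follows. The function names and the `JWT_KEYS_MOUNT_TARGET` variable are hypothetical; only `DEPLOY_HOST_JWT_KEYS_DIR`, the `/etc/azaion/jwt-keys` target, and the `:ro` flag come from this task. The check compares the environment override only, so the `appsettings.json` default still needs a manual look.

```sh
# Container-side mount target named in this task; it must equal JwtConfig.KeysFolder.
JWT_KEYS_MOUNT_TARGET=/etc/azaion/jwt-keys

# Print the value for docker's --volume flag: host dir, container target, read-only.
jwt_keys_volume_arg() {
  printf '%s:%s:ro\n' "$DEPLOY_HOST_JWT_KEYS_DIR" "$JWT_KEYS_MOUNT_TARGET"
}

# True when the KeysFolder env override points at the mount target.
# AZ-552 preflight already requires the variable to be set, so unset counts as a mismatch.
jwt_keys_target_matches_config() {
  [ "${ASPNETCORE_JwtConfig__KeysFolder:-}" = "$JWT_KEYS_MOUNT_TARGET" ]
}

# Example wiring (not executed here):
#   docker run --volume "$(jwt_keys_volume_arg)" <other admin container args>
```

Keeping the target in one variable means the mount and the check cannot disagree with each other, which narrows the drift described under Risk 2 to the `appsettings.json` default.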

### Excluded

- Operational key-rotation policy (cadence, key-revocation lifecycle). Tracked separately if not already captured in cycle-1 deploy docs.
- DataProtection key folder — that is AZ-554.
- `secrets/README.md` rewrite for the new env vars — that is AZ-555.

## Acceptance Criteria

**AC-1: Container can read PEMs at the configured KeysFolder path**
Given `DEPLOY_HOST_JWT_KEYS_DIR=/var/lib/azaion/jwt-keys` exists on the host and contains a valid PEM
And `ASPNETCORE_JwtConfig__KeysFolder=/etc/azaion/jwt-keys`
And `ASPNETCORE_JwtConfig__ActiveKid=<kid>` matches a PEM in the folder
When `scripts/start-services.sh` deploys the admin container
Then the container reports a successful startup and the readiness probe on `/health/ready` returns 200.

**AC-2: Preflight fails when the host-side directory is missing**
Given `DEPLOY_HOST_JWT_KEYS_DIR` is set but the directory does not exist
When `scripts/start-services.sh` runs preflight
Then the script exits non-zero with an error message that names the missing directory.

**AC-3: Preflight fails when the host-side directory is empty**
Given `DEPLOY_HOST_JWT_KEYS_DIR` is set and the directory exists but contains no `.pem` files
When `scripts/start-services.sh` runs preflight
Then the script exits non-zero with an actionable error referencing the missing PEMs.
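
The host-side checks in AC-2 and AC-3 might be sketched as follows. `check_jwt_keys_dir` is an illustrative name, not an existing helper; the real change is expected to follow the `_lib.sh` helper style, which this task does not show.

```sh
# Preflight sketch: the host directory must be set, must exist, and must hold a .pem file.
check_jwt_keys_dir() {
  ck_dir=${DEPLOY_HOST_JWT_KEYS_DIR:-}
  if [ -z "$ck_dir" ]; then
    printf 'ERROR: DEPLOY_HOST_JWT_KEYS_DIR is not set\n' >&2
    return 1
  fi
  if [ ! -d "$ck_dir" ]; then
    printf 'ERROR: DEPLOY_HOST_JWT_KEYS_DIR=%s is not an existing directory\n' "$ck_dir" >&2
    return 1
  fi
  # An unmatched glob stays literal, so the -f test filters that case out.
  for ck_pem in "$ck_dir"/*.pem; do
    if [ -f "$ck_pem" ]; then
      return 0
    fi
  done
  printf 'ERROR: no .pem files in %s; populate it with scripts/generate-jwt-key.sh\n' "$ck_dir" >&2
  return 1
}
```

Each failure path names the variable or directory involved, so the operator does not need the container logs to see what is missing.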

**AC-4: Bind-mount is read-only**
Given the admin container is running with the new bind-mount
When the container process attempts to write to `/etc/azaion/jwt-keys/`
Then the write is denied by the filesystem layer.

**AC-5: `.env.example` documents the new variable**
Given the admin repo at HEAD
When `.env.example` is opened
Then it contains a `DEPLOY_HOST_JWT_KEYS_DIR=` entry with a comment explaining its purpose.

## Non-Functional Requirements

**Security**
- The bind-mount MUST be read-only. The container process never has write authority over the key store.

**Reliability**
- Preflight failures must be explicit and actionable — operators should not have to inspect container logs to diagnose a missing mount.

## Blackbox Tests

| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|------------------------|-------------|-------------------|----------------|
| AC-1 | Host dir populated, env vars set | Run `start-services.sh`, then `curl /health/ready` | Container up, `/health/ready` → 200 | — |
| AC-2 | Env var set, host dir missing | Run `start-services.sh` preflight | Exit non-zero, error names the directory | — |
| AC-3 | Env var set, host dir present but empty | Run `start-services.sh` preflight | Exit non-zero, error names the missing PEMs | — |
| AC-4 | Container running, attempt write inside container | `touch /etc/azaion/jwt-keys/x` from container | Permission denied | Security |
| AC-5 | Repo at HEAD | Open `.env.example` | `DEPLOY_HOST_JWT_KEYS_DIR=` is documented | — |

## Constraints

- Must follow the existing `_lib.sh` helper style — do not introduce a new preflight pattern.
- Must work on both the CI runner deploy path AND the production host deploy path (no host-specific hard-coding).

## Risks & Mitigation

**Risk 1: Container user cannot read the host-side PEMs**
- *Risk*: PEMs owned by `root:root 600` on the host are invisible to the container's `app` user.
- *Mitigation*: Host runbook prescribes ownership/perms (`chown app:app`, `chmod 640` or `0400`). Include a verification step in the runbook.
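
One possible shape for that verification step, assuming the prescribed modes are exactly `640` or `400`: list any PEM whose mode deviates, and any PEM not owned by the expected UID. Both function names are hypothetical, and the numeric UID that the container's `app` user maps to on the host is deployment-specific, so it is passed in as a parameter.

```sh
# List PEMs whose permission bits are neither 640 nor 400 (exact match on the mode).
list_pems_with_unexpected_mode() {
  find "$1" -type f -name '*.pem' ! -perm 640 ! -perm 400
}

# List PEMs not owned by the given numeric UID (second argument).
list_pems_not_owned_by() {
  find "$1" -type f -name '*.pem' ! -user "$2"
}
```

Empty output from both functions means every PEM matches the runbook; a file left at `root:root 600` would show up in both lists.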

**Risk 2: KeysFolder default in `appsettings.json` drifts from the mount target**
- *Risk*: If `JwtConfig.KeysFolder` in `appsettings.json` says `/secrets/jwt-keys` but the bind-mount uses `/etc/azaion/jwt-keys`, the container fails-fast even with the mount in place.
- *Mitigation*: AC-1 covers the end-to-end happy path; if it fails, the alignment is the first thing to check. Document the contract in the runbook.

**Risk 3: Multiple PEMs, ambiguous active key**
- *Risk*: If the operator drops several PEMs into the folder, `JwtSigningKeyProvider` must still pick one deterministically.
- *Mitigation*: Already covered by AZ-NEW-10 (F-AUTH-7) which tightens `ActiveKid` semantics. This task only ensures the folder is reachable.
@@ -0,0 +1,111 @@
# Persist DataProtection Keys Folder + Fail-Fast In Production

**Task**: AZ-554_persist_dataprotection_keys
**Name**: Persist DataProtection keys folder + fail-fast in Production
**Description**: DataProtection (which encrypts MFA secrets, recovery codes, and any other protected payload) currently writes its master keys to an ephemeral container path. Every container restart rotates the master key, which permanently locks every MFA-enrolled user out of their account. Persist the key folder onto the host, document the env var, and fail-fast in Production if the folder is unconfigured.
**Complexity**: 2 points
**Dependencies**: AZ-553 (host-side volume pattern + runbook section established)
**Component**: Admin API + Deploy / scripts
**Tracker**: AZ-554
**Epic**: AZ-530
**CMMC ref**: SC.L2-3.13.10 (key management), IA.L2-3.5.7 (passwords, secrets storage)
**Source**: `_docs/05_security/security_report_cycle2.md` F-INFRA-3 (High); `_docs/05_security/infrastructure_review_cycle2.md` §F-2026Q2-INFRA-3

## Problem

`Program.cs` configures `services.AddDataProtection()` without specifying a persistent key folder. ASP.NET Core defaults the key ring to an OS-specific path that, inside a container, lives on the writable layer and vanishes on every restart. AZ-534 uses DataProtection to encrypt the per-user TOTP `MfaSecret` at rest; AZ-534 also encrypts recovery codes. When the master key rotates on restart:

- Existing `MfaSecret` ciphertexts can no longer be decrypted → no user can verify TOTP at login.
- Existing recovery-code hashes (if also DataProtection-wrapped) become unusable.

The net effect on the next `docker restart` is a hard lockout of every MFA-enrolled user. No data is corrupted on disk — but recovery requires either operator intervention or a re-enrolment campaign.

## Outcome

- DataProtection master keys persist across container restarts in Production.
- In Production, the app refuses to start if `DataProtection.KeysFolder` is unset (no silent fallback to the ephemeral path).
- Development environment continues to work with the ephemeral default (no behavioural change for local devs).
- `.env.example` and the deploy runbook document the new host-side env var.

## Scope

### Included

- `Program.cs`: bind `DataProtection.KeysFolder` from configuration, call `PersistKeysToFileSystem(...)` when set, and add a Production-only fail-fast in the `AppEnv.IsProduction()` branch if the folder is unset, missing, or not writable.
- `appsettings.json`: add a `DataProtection` section with documented keys (`KeysFolder`).
- `scripts/start-services.sh`: bind-mount `$DEPLOY_HOST_DP_KEYS_DIR` onto the container at `/var/lib/azaion/dp-keys` (read-write — DataProtection must rotate keys on its own schedule).
- `secrets/<env>.public.env`: set `ASPNETCORE_DataProtection__KeysFolder=/var/lib/azaion/dp-keys` in production/staging templates.
- `.env.example`: document `DEPLOY_HOST_DP_KEYS_DIR`.
- Extend the deploy runbook section authored by AZ-553 to cover the DataProtection mount alongside the JWT mount (same host-side layout, same ownership/perms guidance).

### Excluded

- Encrypting the DataProtection keys at rest with a hardware secret (HSM / KMS-wrapped). Larger scope; would belong to a separate hardening epic.
- Cross-instance key sharing for a horizontally-scaled admin deployment. Currently single-instance per environment.
- Reading the AZ-534 / AZ-NEW-12 user-cache invalidation concern — out of scope for this ticket.
- `secrets/README.md` rewrite — AZ-555.

## Acceptance Criteria

**AC-1: MFA survives container restart in Production**
Given a Production deploy with `DEPLOY_HOST_DP_KEYS_DIR` mounted
And a user has enrolled in TOTP MFA before the restart
When the admin container is stopped and started again
Then the user can complete a fresh `/login` + `/login/mfa` cycle using their existing TOTP authenticator (no recovery code, no re-enrolment).

**AC-2: Production fails-fast when `KeysFolder` is unset**
Given `ASPNETCORE_ENVIRONMENT=Production` and `ASPNETCORE_DataProtection__KeysFolder` is unset
When the admin process starts
Then the process exits non-zero with a startup-log entry that names `DataProtection.KeysFolder` as the missing/invalid configuration.

**AC-3: Production fails-fast when `KeysFolder` is not writable**
Given `ASPNETCORE_ENVIRONMENT=Production` and `KeysFolder` points at a path that is not writable by the container user
When the admin process starts
Then the process exits non-zero with a startup-log entry naming the path and the missing permission.

**AC-4: Development unchanged**
Given `ASPNETCORE_ENVIRONMENT=Development` and `KeysFolder` is unset
When the admin process starts
Then the process starts normally (uses the ephemeral default) and no fail-fast is triggered.

**AC-5: Mount is read-write**
Given the admin container is running with the new bind-mount
When the DataProtection key ring rotates (test by writing a probe file `/var/lib/azaion/dp-keys/.probe`)
Then the write succeeds.
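
The probe write in AC-5 could be scripted along these lines and run inside the admin container, for example through `docker exec`. `probe_writable` is a hypothetical helper name; only the `.probe` file name and the mount path come from this task.

```sh
# Try to create and then remove a probe file inside the given directory.
probe_writable() {
  pw_probe="$1/.probe"
  # The subshell keeps a failed redirection from aborting the calling shell.
  if ( : > "$pw_probe" ) 2>/dev/null; then
    rm -f "$pw_probe"
    return 0
  fi
  printf 'ERROR: %s is not writable by uid %s\n' "$1" "$(id -u)" >&2
  return 1
}
```

Calling `probe_writable /var/lib/azaion/dp-keys` should succeed on the read-write mount. The same probe against the read-only JWT keys mount from AZ-553 is expected to fail.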

## Non-Functional Requirements

**Reliability**
- Container restart MUST NOT invalidate already-issued MFA secrets or DataProtection-wrapped ciphertexts.

**Security**
- Mount must be writable by the container user but not world-readable on the host (`chmod 0700` host-side, container user owns).
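
A minimal host-side provisioning sketch for the security requirement above. The function names are hypothetical, and the `chown` step is deliberately left out because the UID the container user maps to on the host is not stated in this task.

```sh
# Create the host-side DataProtection directory with owner-only access (0700).
# Ownership is left to the operator: chown to the UID the container user maps to.
provision_dp_keys_dir() {
  if [ -z "${DEPLOY_HOST_DP_KEYS_DIR:-}" ]; then
    printf 'ERROR: DEPLOY_HOST_DP_KEYS_DIR is not set\n' >&2
    return 1
  fi
  mkdir -p "$DEPLOY_HOST_DP_KEYS_DIR" && chmod 0700 "$DEPLOY_HOST_DP_KEYS_DIR"
}

# True when the directory mode is exactly rwx------ (no group or world access).
dp_keys_dir_mode_ok() {
  [ "$(ls -ld "$1" | cut -c1-10)" = "drwx------" ]
}
```

The mode check reads the first ten characters of `ls -ld` rather than calling `stat`, since `stat` flags differ between GNU and BSD hosts.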

## Blackbox Tests

| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|------------------------|-------------|-------------------|----------------|
| AC-1 | Prod env, mount configured, user MFA-enrolled, restart container | Login + MFA verify after restart | Same TOTP secret still works | Reliability |
| AC-2 | Prod env, `KeysFolder` unset | Start admin process | Exit non-zero, log names `DataProtection.KeysFolder` | — |
| AC-3 | Prod env, `KeysFolder` read-only path | Start admin process | Exit non-zero, log names path + permission | — |
| AC-4 | Dev env, `KeysFolder` unset | Start admin process | Process starts, ephemeral default used | — |
| AC-5 | Container running, mount RW | Probe write inside mount | Write succeeds | Security |

## Constraints

- Persist via `PersistKeysToFileSystem` on the configured folder; do not introduce a database-backed or third-party key store in this ticket.
- Fail-fast must be Production-only — Development workflows depend on the ephemeral default.

## Risks & Mitigation

**Risk 1: Existing prod users locked out at first restart after deploy**
- *Risk*: The first container restart AFTER this fix ships is fine going forward, but any MFA enrolments done on the cycle-2 build BEFORE this fix are encrypted with an already-lost master key. Those users are still locked out.
- *Mitigation*: Cycle 2 has not been deployed to Production yet (the security audit FAILed before deploy). No real users are affected. Document this lifecycle clearly in the runbook so future hotfix sequencing avoids the same trap.

**Risk 2: Host-side directory permissions wrong**
- *Risk*: If the operator creates `$DEPLOY_HOST_DP_KEYS_DIR` as `root:root 700`, the container user cannot write.
- *Mitigation*: AC-3 fail-fast catches this immediately on startup. Runbook includes the explicit ownership/perms command.

**Risk 3: Drift between `appsettings.json` default and the runtime mount target**
- *Risk*: Default in `appsettings.json` says one path; deploy script mounts another; container fails-fast.
- *Mitigation*: AC-5 indirectly covers this via the probe-write step; runbook section explicitly states the mount target == config value.
@@ -0,0 +1,106 @@
# Rewrite `secrets/README.md` Schema For ES256 + DataProtection

**Task**: AZ-555_secrets_readme_es256_rewrite
**Name**: Rewrite `secrets/README.md` schema for ES256 + DataProtection
**Description**: `secrets/README.md` still documents the obsolete HS256-era `JwtConfig__Secret` env var and omits the new cycle-2 env vars (`JwtConfig__KeysFolder`, `JwtConfig__ActiveKid`, `DataProtection__KeysFolder`, and their `DEPLOY_HOST_*` host-side counterparts). Operators following this README will misconfigure the deploy, producing the same failure modes that F-INFRA-1/2/3 describe. Rewrite the schema section to match the cycle-2 reality.
**Complexity**: 1 point
**Dependencies**: AZ-552, AZ-553, AZ-554 (all three must define their env vars first so the README documents what actually exists)
**Component**: Operator docs / `secrets/`
**Tracker**: AZ-555
**Epic**: AZ-530
**CMMC ref**: CM.L2-3.4.1 (baseline configuration), CM.L2-3.4.2 (security configuration settings)
**Source**: `_docs/05_security/security_report_cycle2.md` F-INFRA-4 (High); `_docs/05_security/infrastructure_review_cycle2.md` §F-2026Q2-INFRA-4

## Problem

`secrets/README.md` is the canonical operator handover for what env vars to set, where, and why. Today it still:
- Lists `ASPNETCORE_JwtConfig__Secret` as a required HS256 symmetric secret with rotation guidance.
- Does not document `ASPNETCORE_JwtConfig__KeysFolder` or `ASPNETCORE_JwtConfig__ActiveKid`.
- Does not mention DataProtection key persistence at all.
- Does not mention the host-side `DEPLOY_HOST_JWT_KEYS_DIR` / `DEPLOY_HOST_DP_KEYS_DIR` bind-mount sources.

An operator following this README produces a misconfigured deploy. Even after AZ-552/553/554 land, the README will silently steer operators back to the broken pattern.

## Outcome

- `secrets/README.md` "Schema" section is the source of truth for cycle-2 env vars.
- Removed: every reference to `JwtConfig__Secret` / `JWT_SECRET` for the admin component.
- Added: `JwtConfig__KeysFolder`, `JwtConfig__ActiveKid`, `DataProtection__KeysFolder`, plus the `DEPLOY_HOST_*` host-side variables.
- Added: a short "Host-side directories" subsection that mirrors the deploy runbook (with a one-line cross-link, not a duplicate).
- Added: a "Key rotation" subsection covering both JWT signing keys and DataProtection master keys, with file-ownership / permission guidance.
- README's "Files in this folder" inventory matches the actual filesystem layout under `secrets/`.

## Scope

### Included

- Rewrite `secrets/README.md` Schema section in full.
- Update the inventory list to include `jwt-keys/` and (if introduced for prod) the DataProtection key dir handover.
- Cross-link to the deploy runbook section authored by AZ-553/AZ-554 — do not duplicate the runbook content here.
- Reconcile against `.env.example` so no required env var is listed in one place and not the other.

### Excluded

- Cycle-1 sections of the README that are still accurate (signing-cert handover, database connection strings) — leave them alone unless inconsistent.
- Operational SOPs that live in `_docs/04_deploy/` — those are owned by the deploy skill.
- A real key-rotation runbook (cadence, revocation lifecycle) — only document the file-level guidance here.

## Acceptance Criteria

**AC-1: No remaining references to `JwtConfig__Secret`**
Given the admin repo at HEAD
When `rg "JwtConfig__Secret|JWT_SECRET" secrets/README.md` is run
Then no matches are returned.

**AC-2: New env vars are documented**
Given `secrets/README.md` at HEAD
When the Schema section is read
Then it documents each of: `ASPNETCORE_JwtConfig__KeysFolder`, `ASPNETCORE_JwtConfig__ActiveKid`, `ASPNETCORE_DataProtection__KeysFolder`, `DEPLOY_HOST_JWT_KEYS_DIR`, `DEPLOY_HOST_DP_KEYS_DIR`.
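
A quick mechanical pre-check for AC-2 could look like the sketch below. The function name is hypothetical, and the check only confirms that each name appears somewhere in the file; whether it sits in the Schema section and is actually explained still needs a human read.

```sh
# Print each cycle-2 variable that the given README does not mention.
# No output means all five names are present.
missing_cycle2_vars() {
  for mv_name in \
    ASPNETCORE_JwtConfig__KeysFolder \
    ASPNETCORE_JwtConfig__ActiveKid \
    ASPNETCORE_DataProtection__KeysFolder \
    DEPLOY_HOST_JWT_KEYS_DIR \
    DEPLOY_HOST_DP_KEYS_DIR
  do
    # -F matches the name literally; -q keeps grep silent.
    grep -qF -- "$mv_name" "$1" || printf '%s\n' "$mv_name"
  done
}
```

Run as `missing_cycle2_vars secrets/README.md` from the repo root.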

**AC-3: README and `.env.example` are consistent**
Given both files at HEAD
When the lists of required env vars are diffed
Then every variable required by the README is present in `.env.example` and vice versa (no orphans in either direction).
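
The diff in AC-3 can be approximated with a small script. This sketch assumes every variable of interest starts with `ASPNETCORE_` or `DEPLOY_HOST_`, which holds for the names in this task but may not cover the whole README; both function names are hypothetical. It compares every mentioned name, not only those marked as required.

```sh
# Extract candidate variable names from a file, one per line, sorted and de-duplicated.
# Assumption: names of interest start with ASPNETCORE_ or DEPLOY_HOST_.
env_var_names() {
  grep -oE '(ASPNETCORE|DEPLOY_HOST)_[A-Za-z0-9_]+' "$1" | LC_ALL=C sort -u
}

# Print variables that appear in only one of the two files; no output means no orphans.
# $1 is the README, $2 is .env.example.
env_var_orphans() {
  eo_readme=$(mktemp)
  eo_example=$(mktemp)
  env_var_names "$1" > "$eo_readme"
  env_var_names "$2" > "$eo_example"
  LC_ALL=C comm -23 "$eo_readme" "$eo_example" | sed 's/^/README only: /'
  LC_ALL=C comm -13 "$eo_readme" "$eo_example" | sed 's/^/.env.example only: /'
  rm -f "$eo_readme" "$eo_example"
}
```

Run as `env_var_orphans secrets/README.md .env.example`. Sorting and comparing under `LC_ALL=C` keeps `sort` and `comm` in agreement about ordering.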

**AC-4: File-ownership guidance present**
Given `secrets/README.md` at HEAD
When the Host-side directories subsection is read
Then it states the required ownership/perms for the host-side directories (container user readable for JWT keys, container user writable for DataProtection keys).

**AC-5: Operator can deploy from README alone**
Given a fresh operator who has never seen the cycle-2 deploy
When they follow only `secrets/README.md` and `.env.example`
Then they end up with a deploy that passes preflight (AZ-552), starts the container (AZ-553), and survives a restart with MFA intact (AZ-554). This is verified by a dry-run review during code review, not by an automated test.

## Non-Functional Requirements

**Accuracy**
- Every env var named in the README must exist in code (`appsettings.json`, `Program.cs`, deploy script). No phantom vars.

**Maintainability**
- One-line cross-links to the deploy runbook for procedural detail; the README is a schema reference, not a procedure manual.

## Blackbox Tests

| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|------------------------|-------------|-------------------|----------------|
| AC-1 | Repo at HEAD | `rg "JwtConfig__Secret\|JWT_SECRET" secrets/README.md` | Empty result | Accuracy |
| AC-2 | Repo at HEAD | Schema section names all 5 new env vars | All present | Accuracy |
| AC-3 | Repo at HEAD | Diff README required-list against `.env.example` | No orphans on either side | Accuracy |
| AC-4 | Repo at HEAD | Host-side subsection read | Ownership/perms guidance present | — |
| AC-5 | Fresh operator dry-run | Follow README + `.env.example` to a working deploy | Deploy reaches `/health/ready` 200 | Maintainability |

## Constraints

- Do not change behaviour. This is a docs-only ticket.
- Keep the README short — operators do not read long files. Refactor the existing structure rather than appending.

## Risks & Mitigation

**Risk 1: Out-of-band consumers of the old schema**
- *Risk*: Internal wikis, runbooks, or CI templates may still reference `JwtConfig__Secret`.
- *Mitigation*: Out of scope here. Note in the commit message that operators should grep their own infra for the obsolete name.

**Risk 2: README and `.env.example` drift again on the next change**
- *Risk*: A future cycle adds a new env var to one but not the other.
- *Mitigation*: Add a LESSONS-style note to `_docs/LESSONS.md` suggesting a CI lint or pre-commit check. Such a check is the right long-term fix, but it is a separate hardening ticket and out of scope for this hotfix.