Mirror of https://github.com/azaion/admin.git, synced 2026-06-21 13:01:08 +00:00
[AZ-552..AZ-557] Cycle-2 hotfix task intake (6 specs, 11 pts)
Materializes the cycle-2 hotfix sprint task specs from the `security_report_cycle2.md` findings. All six tasks roll up to epic AZ-530 per the `cycle-2-hotfix` / `AZ-530-followup` Jira labels. Total: 11 story points; the sprint gates the next deploy.

Tasks:

- AZ-552 drop_jwt_secret_deploy_preflight (1 pt): F-INFRA-1, Critical
- AZ-553 bind_mount_es256_keys (2 pts): F-INFRA-2, Critical
- AZ-554 persist_dataprotection_keys (2 pts): F-INFRA-3, High
- AZ-555 secrets_readme_es256_rewrite (1 pt): F-INFRA-4, High
- AZ-556 unify_login_error_codes (2 pts): F-AUTH-1 + F-AUTH-3, High
- AZ-557 mfa_brute_force_lockout (3 pts): F-AUTH-2, High

Also:

- `_dependencies_table.md` updated (25 tasks / 82 pts; hotfix landing order).
- `_autodev_state.md` rolled to step 10 (Implement), status `not_started`.
- `_process_leftovers/2026-05-14_suite_infra_jwt_secret_drift.md` logs the out-of-scope suite-level `_infra/deploy/webserver/` JWT_SECRET drift. A separate Jira ticket is needed against the suite repo; it is not blocking.

Output of Step 9 (New Task) for the cycle-2-hotfix intake.

Co-authored-by: Cursor <cursoragent@cursor.com>
@@ -1,30 +1,36 @@
# Dependencies Table

**Date**: 2026-05-14 (post batch 4 cycle 2; previous 2026-05-14)
**Total Tasks**: 19 (7 done test tasks + 4 done product tasks + 5 done cross-workspace + 3 done CMMC + 5 done auth-modernization)
**Total Complexity Points**: 71

**Date**: 2026-05-14 (post cycle-2 security audit; previous 2026-05-14)
**Total Tasks**: 25 (7 done test tasks + 4 done product tasks + 5 done cross-workspace + 3 done CMMC + 5 done auth-modernization + 6 todo cycle-2 hotfix)
**Total Complexity Points**: 82 (71 done + 11 todo)

| Task | Name | Complexity | Dependencies | Epic | Status |
|--------|-------------------------------------|-----------:|-------------------------|--------|--------|
| AZ-189 | test_infrastructure | 5 | None | AZ-188 | done |
| AZ-190 | auth_tests | 3 | AZ-189 | AZ-188 | done |
| AZ-191 | user_mgmt_tests | 5 | AZ-189, AZ-190 | AZ-188 | done |
| AZ-192 | hardware_tests | 3 | AZ-189, AZ-190 | AZ-188 | done |
| AZ-193 | resource_tests | 5 | AZ-189, AZ-190, AZ-192 | AZ-188 | done |
| AZ-194 | security_tests | 3 | AZ-189, AZ-190 | AZ-188 | done |
| AZ-195 | resilience_perf_tests | 5 | AZ-189, AZ-190 | AZ-188 | done |
| AZ-183 | resources_table_update_api | 3 | None | AZ-181 | done |
| AZ-196 | register_device_endpoint | 2 | None | AZ-181 | done |
| AZ-197 | remove_hardware_id | 3 | None | AZ-181 | done |
| AZ-513 | classes_crud_routes | 3 | None | AZ-509 | done |
| AZ-531 | refresh_token_flow | 5 | None | AZ-529 | done |
| AZ-532 | asymmetric_signing_jwks | 5 | None | AZ-529 | done |
| AZ-533 | mission_token_uav | 5 | AZ-531 | AZ-529 | done |
| AZ-534 | totp_2fa_login | 5 | None (coord. AZ-531/537) | AZ-529 | done |
| AZ-535 | logout_revocation | 3 | AZ-531 | AZ-529 | done |
| AZ-536 | argon2id_password_hashing | 3 | None | AZ-530 | done |
| AZ-537 | login_rate_limit_lockout | 3 | None (coord. AZ-536) | AZ-530 | done |
| AZ-538 | cors_https_only_hsts | 2 | None | AZ-530 | done |
| AZ-552 | drop_jwt_secret_deploy_preflight | 1 | None | AZ-530 | todo |
| AZ-553 | bind_mount_es256_keys | 2 | AZ-552 | AZ-530 | todo |
| AZ-554 | persist_dataprotection_keys | 2 | AZ-553 | AZ-530 | todo |
| AZ-555 | secrets_readme_es256_rewrite | 1 | AZ-552, AZ-553, AZ-554 | AZ-530 | todo |
| AZ-556 | unify_login_error_codes | 2 | None | AZ-530 | todo |
| AZ-557 | mfa_brute_force_lockout | 3 | AZ-534, AZ-537 | AZ-530 | todo |

## Notes

@@ -35,3 +41,4 @@
- **Cross-workspace verifier work** (satellite-provider, gps-denied and ui must switch from the HS256 shared secret to JWKS verification and add denylist polling) is intentionally **deferred** to per-workspace tickets, to be filed once admin's AZ-529 epic is close to shipping.
- AZ-513 added 2026-05-13 (cross-workspace prerequisite from `ui/` workspace AZ-512). Filed under epic AZ-509.
- AZ-197 originally listed `Component: Admin API, Loader`; the Loader workspace was architecturally retired (see `suite/_docs/_repo-config.yaml` `unresolved:loader-retirement-arch-doc`) and the spec was adapted on 2026-05-13 to be admin-only.
- **AZ-552..AZ-557 added 2026-05-14** as the cycle-2 hotfix sprint that blocks the next deploy. All six roll up to **AZ-530** per the `cycle-2-hotfix` / `AZ-530-followup` Jira labels. Source of truth: the "Tracker Follow-Ups" section of `_docs/05_security/security_report_cycle2.md`. 11 story points total. Recommended landing order: AZ-552 → AZ-553 → AZ-554 → AZ-555 (docs) as one PR train; AZ-556 + AZ-557 (auth surface) can land in parallel with the deploy chain. None of the six depends on the deferred Medium / Low items (AZ-NEW-7..AZ-NEW-15; see the "Open" table in `security_report_cycle2.md`).

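The recommended landing order can be cross-checked against the Dependencies column with POSIX `tsort`, which prints one valid topological order (several may exist). A minimal sketch, assuming the hotfix edges exactly as listed in the table; the function name is illustrative, and AZ-534/AZ-537 appear only because AZ-557 lists them (both are already done):

```bash
#!/usr/bin/env bash
# Print one valid landing order for the cycle-2 hotfix tasks.
# Each pair is "prerequisite dependent", copied from the Dependencies column.
# A pair with the same token twice declares a task with no in-table edges.
hotfix_landing_order() {
  tsort <<'EOF'
AZ-552 AZ-553
AZ-553 AZ-554
AZ-552 AZ-555
AZ-553 AZ-555
AZ-554 AZ-555
AZ-534 AZ-557
AZ-537 AZ-557
AZ-556 AZ-556
EOF
}

hotfix_landing_order
```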
@@ -0,0 +1,89 @@
# Drop Obsolete `JwtConfig__Secret` From Deploy Preflight

**Task**: AZ-552_drop_jwt_secret_deploy_preflight
**Name**: Drop obsolete `JwtConfig__Secret` from deploy preflight
**Description**: `scripts/start-services.sh` still hard-requires `ASPNETCORE_JwtConfig__Secret`, the env var behind the HS256-era `JwtConfig.Secret` setting that AZ-532 removed. As a result, a correctly configured cycle-2 deploy fails at preflight before the container starts. Replace the check with the new ES256 inputs (`KeysFolder` + `ActiveKid`).
**Complexity**: 1 point
**Dependencies**: None
**Component**: Deploy / scripts
**Tracker**: AZ-552
**Epic**: AZ-530
**CMMC ref**: SC.L2-3.13.11 (FIPS-validated cryptography; cycle-2 ES256 supersedes HS256)
**Source**: `_docs/05_security/security_report_cycle2.md` F-INFRA-1 (Critical, deploy-blocking); `_docs/05_security/infrastructure_review_cycle2.md` §F-2026Q2-INFRA-1

## Problem

`scripts/start-services.sh` calls `require_env ... ASPNETCORE_JwtConfig__Secret`, so preflight still gates on the obsolete HS256 symmetric secret. AZ-532 removed `JwtConfig.Secret` from `Azaion.Common/Configs/JwtConfig.cs`: `Program.cs` now configures JwtBearer via `IssuerSigningKeyResolver` backed by `JwtSigningKeyProvider`, which reads ES256 PEMs from `JwtConfig.KeysFolder` and selects the active key by `JwtConfig.ActiveKid`. A cycle-2 deploy that follows the new `.env.example` (which does NOT set `JwtConfig__Secret`) fails the preflight gate and never starts the container. Operators who work around this by setting a dummy `JwtConfig__Secret=dummy` immediately hit F-INFRA-2 (no key folder mounted), so the workaround does not help.

## Outcome

- Cycle-2 deploys that supply `ASPNETCORE_JwtConfig__KeysFolder` + `ASPNETCORE_JwtConfig__ActiveKid` pass preflight without `JwtConfig__Secret` being set.
- Cycle-2 deploys that omit `KeysFolder` or `ActiveKid` fail preflight with a clear, actionable error naming the missing variable.
- The deploy script no longer references `JwtConfig__Secret` anywhere.
- `.env.example` no longer documents `JwtConfig__Secret`.

## Scope

### Included

- Edit `scripts/start-services.sh`: replace the `require_env ... ASPNETCORE_JwtConfig__Secret` line with the cycle-2 required pair.
- Audit `scripts/_lib.sh`, `scripts/deploy.sh`, `scripts/pull-images.sh`, `scripts/health-check.sh` and `.env.example` for any other reference to `JwtConfig__Secret` / `JWT_SECRET`, and remove them.
- Update `_docs/04_deploy/` if any deploy doc still names `JwtConfig__Secret` as required.

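A minimal sketch of the intended preflight behaviour for AC-1 to AC-3. Per the Constraints, the real change must go through the existing `require_env` helper in `scripts/_lib.sh`, whose exact interface is not shown in this spec, so `require_env_sketch` below is a hypothetical stand-in that only illustrates the contract: check every listed variable and name each missing one.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for the require_env helper in scripts/_lib.sh.
# Checks every named variable and reports each one that is unset or empty.
require_env_sketch() {
  local name status=0
  for name in "$@"; do
    if [ -z "${!name:-}" ]; then
      echo "preflight: required variable $name is not set" >&2
      status=1
    fi
  done
  return "$status"
}

# Cycle-2 JWT preflight: the ES256 pair replaces ASPNETCORE_JwtConfig__Secret,
# which is no longer checked (and is ignored if an old .env still sets it).
jwt_preflight() {
  require_env_sketch \
    ASPNETCORE_JwtConfig__KeysFolder \
    ASPNETCORE_JwtConfig__ActiveKid
}
```

Reporting every missing variable in one pass, rather than stopping at the first, saves a preflight round-trip when an operator has skipped both new settings.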
### Excluded

- The bind-mount of the keys folder itself: that is AZ-553. This ticket only stops the deploy from failing on the obsolete env var; AZ-553 makes the keys actually reach the container.
- `secrets/README.md` rewrite: that is AZ-555.
- The suite-level `_infra/deploy/webserver/` flow that still uses `JWT_SECRET`. That flow is owned by the suite repo, not admin, and is logged separately as a process leftover.

## Acceptance Criteria

**AC-1: Deploy preflight passes without `JwtConfig__Secret`**
Given `ASPNETCORE_JwtConfig__KeysFolder=/etc/azaion/jwt-keys` and `ASPNETCORE_JwtConfig__ActiveKid=<kid>` are set
And `ASPNETCORE_JwtConfig__Secret` is unset
When `scripts/start-services.sh` runs preflight
Then preflight completes successfully and the container is started.

**AC-2: Preflight fails clearly when `KeysFolder` is missing**
Given `ASPNETCORE_JwtConfig__ActiveKid` is set but `ASPNETCORE_JwtConfig__KeysFolder` is unset
When `scripts/start-services.sh` runs preflight
Then the script exits non-zero with an error message that names `ASPNETCORE_JwtConfig__KeysFolder`.

**AC-3: Preflight fails clearly when `ActiveKid` is missing**
Given `ASPNETCORE_JwtConfig__KeysFolder` is set but `ASPNETCORE_JwtConfig__ActiveKid` is unset
When `scripts/start-services.sh` runs preflight
Then the script exits non-zero with an error message that names `ASPNETCORE_JwtConfig__ActiveKid`.

**AC-4: No references to `JwtConfig__Secret` remain in `scripts/` or `.env.example`**
Given the admin repo at HEAD
When `rg "JwtConfig__Secret"` is run against `scripts/` and `.env.example`
Then no matches are returned.

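`rg` (like `grep`) exits 0 when it finds a match, 1 when it finds none, and 2 on an error, so a CI wrapper for AC-4 must treat exit 1 as the pass case and must not mistake an error (for example, a mistyped path) for a clean result. A minimal sketch, assuming it runs from the repo root; `assert_no_refs` is a hypothetical name, and the `grep -rnE` fallback only exists so the check also works where `rg` is not installed:

```bash
#!/usr/bin/env bash
# Succeeds only if PATTERN (a regex) matches nothing under the given paths.
# rg and grep share exit codes: 0 = match found, 1 = no match, 2 = error.
assert_no_refs() {
  local pattern=$1 status
  shift
  if command -v rg >/dev/null 2>&1; then
    rg -n -- "$pattern" "$@"
  else
    grep -rnE -- "$pattern" "$@"
  fi
  status=$?
  case $status in
    1) return 0 ;;
    0) echo "stale references to '$pattern' found (listed above)" >&2; return 1 ;;
    *) echo "search failed with exit code $status" >&2; return 2 ;;
  esac
}
```

For AC-4 the call would be `assert_no_refs 'JwtConfig__Secret' scripts/ .env.example`; AZ-555's AC-1 could reuse it with the pattern `'JwtConfig__Secret|JWT_SECRET'`.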
## Non-Functional Requirements

**Compatibility**
- Existing operators with both old and new env vars set must not be broken by the change: the old var is simply ignored.

## Blackbox Tests

| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|------------------------|-------------|-------------------|----------------|
| AC-1 | Env: `KeysFolder`+`ActiveKid` set, `Secret` unset | Run `start-services.sh` preflight | Preflight passes, container starts | n/a |
| AC-2 | Env: `ActiveKid` set, `KeysFolder` unset | Run `start-services.sh` preflight | Exit non-zero, error names `KeysFolder` | n/a |
| AC-3 | Env: `KeysFolder` set, `ActiveKid` unset | Run `start-services.sh` preflight | Exit non-zero, error names `ActiveKid` | n/a |
| AC-4 | Repo at HEAD | `rg "JwtConfig__Secret" scripts/ .env.example` | Empty result | n/a |

## Constraints

- Must not change any runtime behaviour of the application: this is a script-only change.
- Error messages must come from the existing `require_env` helper in `_lib.sh` (do not add a new ad-hoc error path).

## Risks & Mitigation

**Risk 1: Operators with stale `.env` files**
- *Risk*: An operator with an old `.env` that sets `JwtConfig__Secret` but not the new pair will see the deploy fail at preflight.
- *Mitigation*: This is the desired behaviour. Document the migration in `secrets/README.md` (AZ-555) so the failure is self-diagnosable.

**Risk 2: Suite-level `_infra/deploy/webserver/` deploy still works the old way**
- *Risk*: The suite-level webserver deploy pipeline at `suite/_infra/deploy/webserver/` injects `JWT_SECRET` and would still appear functional even though it shouldn't. Out of scope here; logged as a suite-level leftover.
- *Mitigation*: Cross-reference the suite-level follow-up ticket in this task's commit message so the linkage is discoverable.
@@ -0,0 +1,105 @@
# Bind-Mount ES256 Keys Folder Into Container + Host-Side Procedure

**Task**: AZ-553_bind_mount_es256_keys
**Name**: Bind-mount ES256 keys folder into container + host-side procedure
**Description**: `JwtSigningKeyProvider` fails fast on startup if `JwtConfig.KeysFolder` is missing or empty. The deploy script never makes `secrets/jwt-keys` visible inside the container; the path exists only on the host. Add the bind-mount, document the host-side directory, and gate it through the existing env-template machinery.
**Complexity**: 2 points
**Dependencies**: AZ-552 (preflight must accept the new env vars first)
**Component**: Deploy / scripts + host provisioning
**Tracker**: AZ-553
**Epic**: AZ-530
**CMMC ref**: SC.L2-3.13.10 (key management), SC.L2-3.13.11 (FIPS-validated crypto)
**Source**: `_docs/05_security/security_report_cycle2.md` F-INFRA-2 (Critical, deploy-blocking); `_docs/05_security/infrastructure_review_cycle2.md` §F-2026Q2-INFRA-2

## Problem

`Azaion.AdminApi/Program.cs` configures JwtBearer to resolve signing keys via `JwtSigningKeyProvider`, which reads PEM files from `JwtConfig.KeysFolder` at startup and fails fast if the folder is missing, empty, or unreadable. `appsettings.json` defaults `KeysFolder` to a container-local path (e.g. `/etc/azaion/jwt-keys`), but `scripts/start-services.sh` does not bind-mount the host's `secrets/jwt-keys` onto that path. So even once AZ-552 unblocks the preflight, the container itself fails to start because the keys folder inside it is empty.

## Outcome

- The container has read-only access to the ES256 PEMs at the path named by `JwtConfig.KeysFolder` at startup.
- The host-side directory is parameterised by an env var (`DEPLOY_HOST_JWT_KEYS_DIR`) so the deploy works from CI runners, dev VMs, and production hosts without code changes.
- The `JwtSigningKeyProvider` startup probe passes on a freshly deployed cycle-2 container with a populated host-side keys folder.
- `.env.example` documents the new host-side env var with a sensible default and a note that it must point at a directory the container user can read.

## Scope

### Included

- Edit `scripts/start-services.sh`: add `--volume "$DEPLOY_HOST_JWT_KEYS_DIR:/etc/azaion/jwt-keys:ro"` (or the equivalent in the docker-compose stack the script orchestrates) to the admin container args.
- Preflight: also require `DEPLOY_HOST_JWT_KEYS_DIR` to be set AND to point at an existing directory containing at least one `.pem` file.
- Document `DEPLOY_HOST_JWT_KEYS_DIR` in `.env.example`.
- Add a short host-side runbook section to `_docs/04_deploy/` (or extend the existing one) covering: where the host directory lives, how to populate it (use `scripts/generate-jwt-key.sh`), file ownership/permissions (readable by the container's `app` UID), and rotation.
- Sanity-check that `JwtConfig.KeysFolder` in `appsettings.json` matches the container-side mount target the script uses; if not, align them.

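A minimal sketch of the extra preflight and the mount argument, assuming a plain `docker run` invocation (the compose equivalent would be a `volumes:` entry with the same `:ro` suffix). `check_jwt_keys_dir` and `jwt_keys_volume_spec` are hypothetical names; the container-side path is the `/etc/azaion/jwt-keys` used throughout this spec:

```bash
#!/usr/bin/env bash
# Container-side mount target; must equal JwtConfig.KeysFolder in appsettings.json.
JWT_KEYS_MOUNT_TARGET=/etc/azaion/jwt-keys

# Hypothetical preflight: DEPLOY_HOST_JWT_KEYS_DIR must be set, must be an
# existing directory, and must contain at least one .pem file.
check_jwt_keys_dir() {
  local dir=${DEPLOY_HOST_JWT_KEYS_DIR:-}
  if [ -z "$dir" ]; then
    echo "preflight: DEPLOY_HOST_JWT_KEYS_DIR is not set" >&2
    return 1
  fi
  if [ ! -d "$dir" ]; then
    echo "preflight: JWT keys directory '$dir' does not exist" >&2
    return 1
  fi
  local pems=("$dir"/*.pem)
  # Without nullglob an unmatched glob stays literal, so test the first entry.
  if [ ! -e "${pems[0]:-}" ]; then
    echo "preflight: no .pem files in '$dir'; populate it with scripts/generate-jwt-key.sh" >&2
    return 1
  fi
}

# Read-only bind-mount spec for `docker run --volume`.
jwt_keys_volume_spec() {
  printf '%s\n' "${DEPLOY_HOST_JWT_KEYS_DIR}:${JWT_KEYS_MOUNT_TARGET}:ro"
}
```

A caller would run `check_jwt_keys_dir || exit 1` during preflight and then pass `--volume "$(jwt_keys_volume_spec)"` to `docker run`. Keeping the mount target in one variable makes the contract "mount target equals `JwtConfig.KeysFolder`" a single line to review.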
### Excluded

- Operational key-rotation policy (cadence, key-revocation lifecycle). Tracked separately if not already captured in the cycle-1 deploy docs.
- DataProtection key folder: that is AZ-554.
- `secrets/README.md` rewrite for the new env vars: that is AZ-555.

## Acceptance Criteria

**AC-1: Container can read PEMs at the configured KeysFolder path**
Given `DEPLOY_HOST_JWT_KEYS_DIR=/var/lib/azaion/jwt-keys`, and that directory exists on the host and contains a valid PEM
And `ASPNETCORE_JwtConfig__KeysFolder=/etc/azaion/jwt-keys`
And `ASPNETCORE_JwtConfig__ActiveKid=<kid>` matches a PEM in the folder
When `scripts/start-services.sh` deploys the admin container
Then the container reports a successful startup and the readiness probe on `/health/ready` returns 200.

**AC-2: Preflight fails when the host-side directory is missing**
Given `DEPLOY_HOST_JWT_KEYS_DIR` is set but the directory does not exist
When `scripts/start-services.sh` runs preflight
Then the script exits non-zero with an error message that names the missing directory.

**AC-3: Preflight fails when the host-side directory is empty**
Given `DEPLOY_HOST_JWT_KEYS_DIR` is set and the directory exists but contains no `.pem` files
When `scripts/start-services.sh` runs preflight
Then the script exits non-zero with an actionable error referencing the missing PEMs.

**AC-4: Bind-mount is read-only**
Given the admin container is running with the new bind-mount
When the container process attempts to write to `/etc/azaion/jwt-keys/`
Then the write is denied by the filesystem layer.

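Besides the `touch` probe, the read-only flag can be read from the mount table: on Linux, a bind mount created with `:ro` carries `ro` in the options field of `/proc/mounts`, so a copy of that file taken from inside the container (for example with `docker exec <container> cat /proc/mounts`) shows whether AC-4 holds without attempting a write. A minimal sketch; `is_mounted_readonly` and its optional mounts-file argument are hypothetical, the argument existing only so the check can run against a saved copy:

```bash
#!/usr/bin/env bash
# Succeeds if MOUNTPOINT is mounted read-only according to a mount table
# in /proc/mounts format: device mountpoint fstype options dump pass.
# If the mountpoint appears more than once, the last (topmost) entry wins.
is_mounted_readonly() {
  local mountpoint=$1 table=${2:-/proc/mounts}
  awk -v mp="$mountpoint" '
    $2 == mp { found = 1; ro = (("," $4 ",") ~ /,ro,/) }
    END { exit !(found && ro) }
  ' "$table"
}
```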
**AC-5: `.env.example` documents the new variable**
Given the admin repo at HEAD
When `.env.example` is opened
Then it contains a `DEPLOY_HOST_JWT_KEYS_DIR=` entry with a comment explaining its purpose.

## Non-Functional Requirements

**Security**
- The bind-mount MUST be read-only. The container process never has write authority over the key store.

**Reliability**
- Preflight failures must be explicit and actionable: operators should not have to inspect container logs to diagnose a missing mount.

## Blackbox Tests

| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|------------------------|-------------|-------------------|----------------|
| AC-1 | Host dir populated, env vars set | Run `start-services.sh`, then `curl /health/ready` | Container up, `/health/ready` → 200 | n/a |
| AC-2 | Env var set, host dir missing | Run `start-services.sh` preflight | Exit non-zero, error names the directory | n/a |
| AC-3 | Env var set, host dir present but empty | Run `start-services.sh` preflight | Exit non-zero, error names the missing PEMs | n/a |
| AC-4 | Container running, attempt write inside container | `touch /etc/azaion/jwt-keys/x` from container | Write fails (read-only file system) | Security |
| AC-5 | Repo at HEAD | Open `.env.example` | `DEPLOY_HOST_JWT_KEYS_DIR=` is documented | n/a |

## Constraints

- Must follow the existing `_lib.sh` helper style; do not introduce a new preflight pattern.
- Must work on both the CI runner deploy path AND the production host deploy path (no host-specific hard-coding).

## Risks & Mitigation

**Risk 1: Container user cannot read the host-side PEMs**
- *Risk*: PEMs owned by `root:root` with mode `600` on the host are unreadable by the container's `app` user.
- *Mitigation*: The host runbook prescribes ownership and permissions: `chown` to the container `app` user's numeric UID:GID (bind mounts keep numeric IDs, so a host account named `app` only works if its IDs match), plus `chmod 640` or `0400`. Include a verification step in the runbook.

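Bind mounts keep numeric UIDs and GIDs, so the host-side owner has to be the container user's numeric UID rather than whatever account happens to be named `app` on the host. A minimal runbook sketch, assuming GNU coreutils on the host; the function names are hypothetical, and the UID/GID are passed in explicitly (official .NET 8+ images expose the `app` user's UID as the `APP_UID` environment variable, but confirm against the image actually deployed):

```bash
#!/usr/bin/env bash
# Hypothetical runbook step: hand the JWT keys directory to the container
# user by numeric UID/GID and make it read-only for that user.
provision_jwt_keys_dir() {
  local dir=$1 uid=$2 gid=$3
  chown "$uid:$gid" "$dir" "$dir"/*.pem &&
    chmod 0400 "$dir"/*.pem &&
    chmod 0500 "$dir"
}

# Verification step (GNU stat): every PEM must be owned by UID and use one of
# the runbook modes (0400 or 0640); anything else is reported.
verify_jwt_keys_dir() {
  local dir=$1 uid=$2 f mode
  for f in "$dir"/*.pem; do
    if [ ! -e "$f" ]; then
      echo "no .pem files in '$dir'" >&2
      return 1
    fi
    if [ "$(stat -c '%u' "$f")" != "$uid" ]; then
      echo "'$f' is not owned by UID $uid" >&2
      return 1
    fi
    mode=$(stat -c '%a' "$f")
    case $mode in
      400 | 640) ;;
      *) echo "'$f' has mode $mode; expected 400 or 640" >&2; return 1 ;;
    esac
  done
}
```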
**Risk 2: `KeysFolder` default in `appsettings.json` drifts from the mount target**
- *Risk*: If `JwtConfig.KeysFolder` in `appsettings.json` says `/secrets/jwt-keys` but the bind-mount uses `/etc/azaion/jwt-keys`, the container fails fast even with the mount in place.
- *Mitigation*: AC-1 covers the end-to-end happy path; if it fails, the alignment is the first thing to check. Document the contract in the runbook.

**Risk 3: Multiple PEMs, ambiguous active key**
- *Risk*: If the operator drops several PEMs into the folder, `JwtSigningKeyProvider` must still pick one deterministically.
- *Mitigation*: Already covered by AZ-NEW-10 (F-AUTH-7), which tightens `ActiveKid` semantics. This task only ensures the folder is reachable.
@@ -0,0 +1,111 @@
# Persist DataProtection Keys Folder + Fail-Fast In Production

**Task**: AZ-554_persist_dataprotection_keys
**Name**: Persist DataProtection keys folder + fail-fast in Production
**Description**: DataProtection (which encrypts MFA secrets, recovery codes, and any other protected payload) currently writes its master keys to an ephemeral path inside the container. Every container recreation (each redeploy, or any restart that replaces the container) generates a new master key, which permanently locks every MFA-enrolled user out of their account. Persist the key folder on the host, document the env var, and fail fast in Production if the folder is unconfigured.
**Complexity**: 2 points
**Dependencies**: AZ-553 (host-side volume pattern + runbook section established)
**Component**: Admin API + Deploy / scripts
**Tracker**: AZ-554
**Epic**: AZ-530
**CMMC ref**: SC.L2-3.13.10 (key management), IA.L2-3.5.10 (cryptographically protected passwords, secrets storage)
**Source**: `_docs/05_security/security_report_cycle2.md` F-INFRA-3 (High); `_docs/05_security/infrastructure_review_cycle2.md` §F-2026Q2-INFRA-3

## Problem

`Program.cs` configures `services.AddDataProtection()` without specifying a persistent key folder. ASP.NET Core then stores the key ring at an OS-specific default location; inside a container that location sits on the container's writable layer, which is discarded whenever the container is recreated. (If no writable default location is available at all, DataProtection falls back to an ephemeral in-memory key ring that is lost on every process restart.) AZ-534 uses DataProtection to encrypt each user's TOTP `MfaSecret` at rest, and also to encrypt recovery codes. When the master key is lost:

- Existing `MfaSecret` ciphertexts can no longer be decrypted → no user can verify TOTP at login.
- Existing recovery-code hashes (if also DataProtection-wrapped) become unusable.

The net effect of the next container recreation (for example, the next redeploy) is a hard lockout of every MFA-enrolled user. No data is corrupted on disk, but recovery requires either operator intervention or a re-enrolment campaign.

## Outcome

- DataProtection master keys persist across container restarts and recreations in Production.
- In Production, the app refuses to start if `DataProtection.KeysFolder` is unset (no silent fallback to the ephemeral path).
- The Development environment continues to work with the ephemeral default (no behavioural change for local devs).
- `.env.example` and the deploy runbook document the new host-side env var.

## Scope

### Included

- `Program.cs`: bind `DataProtection.KeysFolder` from configuration, call `PersistKeysToFileSystem(...)` when it is set, and add a Production-only fail-fast in the `AppEnv.IsProduction()` branch if the folder is unset, missing, or not writable.
- `appsettings.json`: add a `DataProtection` section with documented keys (`KeysFolder`).
- `scripts/start-services.sh`: bind-mount `$DEPLOY_HOST_DP_KEYS_DIR` into the container at `/var/lib/azaion/dp-keys` (read-write, because DataProtection must create and rotate keys on its own schedule).
- `secrets/<env>.public.env`: set `ASPNETCORE_DataProtection__KeysFolder=/var/lib/azaion/dp-keys` in the production/staging templates.
- `.env.example`: document `DEPLOY_HOST_DP_KEYS_DIR`.
- Extend the deploy runbook section authored by AZ-553 to cover the DataProtection mount alongside the JWT mount (same host-side layout, same ownership/perms guidance).

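The fail-fast itself belongs in `Program.cs`, but the deploy script can run an equivalent check before starting the container, using the same probe-file idea as AC-5. A minimal sketch; `check_dp_keys_dir` is a hypothetical name, it takes the *name* of the variable so the error message can name it (as AC-2 asks of the app-side check), and when run on the host it tests the invoking user's permissions rather than the container user's:

```bash
#!/usr/bin/env bash
# Hypothetical deploy-side preflight for the DataProtection key directory.
# VAR_NAME is the name of the variable holding the directory path, so that
# every error message can name the variable.
check_dp_keys_dir() {
  local var_name=$1
  local dir=${!var_name:-}
  if [ -z "$dir" ]; then
    echo "preflight: $var_name is not set" >&2
    return 1
  fi
  if [ ! -d "$dir" ]; then
    echo "preflight: $var_name='$dir' does not exist" >&2
    return 1
  fi
  # Probe with a uniquely named file, then remove it again.
  local probe="$dir/.probe.$$"
  if ! (: >"$probe") 2>/dev/null; then
    echo "preflight: $var_name='$dir' is not writable by $(id -un)" >&2
    return 1
  fi
  rm -f -- "$probe"
}
```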
### Excluded

- Encrypting the DataProtection keys at rest with a hardware secret (HSM / KMS-wrapped). Larger scope; would belong to a separate hardening epic.
- Cross-instance key sharing for a horizontally scaled admin deployment. Currently single-instance per environment.
- Addressing the AZ-534 / AZ-NEW-12 user-cache invalidation concern: out of scope for this ticket.
- `secrets/README.md` rewrite: AZ-555.

## Acceptance Criteria

**AC-1: MFA survives container recreation in Production**
Given a Production deploy with `DEPLOY_HOST_DP_KEYS_DIR` mounted
And a user has enrolled in TOTP MFA before the container is recreated
When the admin container is removed and recreated from its image (as on a redeploy)
Then the user can complete a fresh `/login` + `/login/mfa` cycle using their existing TOTP authenticator (no recovery code, no re-enrolment).

**AC-2: Production fails fast when `KeysFolder` is unset**
Given `ASPNETCORE_ENVIRONMENT=Production` and `ASPNETCORE_DataProtection__KeysFolder` is unset
When the admin process starts
Then the process exits non-zero with a startup-log entry that names `DataProtection.KeysFolder` as the missing/invalid configuration.

**AC-3: Production fails fast when `KeysFolder` is not writable**
Given `ASPNETCORE_ENVIRONMENT=Production` and `KeysFolder` points at a path that is not writable by the container user
When the admin process starts
Then the process exits non-zero with a startup-log entry naming the path and the missing permission.

**AC-4: Development unchanged**
Given `ASPNETCORE_ENVIRONMENT=Development` and `KeysFolder` is unset
When the admin process starts
Then the process starts normally (uses the ephemeral default) and no fail-fast is triggered.

**AC-5: Mount is read-write**
Given the admin container is running with the new bind-mount
When a probe file `/var/lib/azaion/dp-keys/.probe` is written from inside the container (standing in for a key-ring rotation)
Then the write succeeds.

## Non-Functional Requirements

**Reliability**
- Container restarts and recreations MUST NOT invalidate already-issued MFA secrets or DataProtection-wrapped ciphertexts.

**Security**
- The mount must be writable by the container user but not world-readable on the host (`chmod 0700` host-side, owned by the container user).

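For the DataProtection directory the NFR asks for the opposite of the JWT mount: writable by the container user and closed to everyone else. GNU `install -d` can create the directory with owner, group, and mode in one step. A minimal sketch, assuming GNU coreutils on the host and that the container user's numeric UID/GID are known; the function names are hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical runbook step: create the DataProtection key directory owned by
# the container user's numeric UID/GID, with mode 0700 (GNU install).
provision_dp_keys_dir() {
  local dir=$1 uid=$2 gid=$3
  install -d -m 0700 -o "$uid" -g "$gid" -- "$dir"
}

# Verification step (GNU stat): prints "<owner UID> <octal mode>".
describe_dir() {
  stat -c '%u %a' -- "$1"
}
```

Running `describe_dir` right after provisioning gives the runbook its verification step: the output should show the container user's UID and `700`.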
## Blackbox Tests

| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|------------------------|-------------|-------------------|----------------|
| AC-1 | Prod env, mount configured, user MFA-enrolled, container recreated | Login + MFA verify after recreation | Same TOTP secret still works | Reliability |
| AC-2 | Prod env, `KeysFolder` unset | Start admin process | Exit non-zero, log names `DataProtection.KeysFolder` | n/a |
| AC-3 | Prod env, `KeysFolder` read-only path | Start admin process | Exit non-zero, log names path + permission | n/a |
| AC-4 | Dev env, `KeysFolder` unset | Start admin process | Process starts, ephemeral default used | n/a |
| AC-5 | Container running, mount RW | Probe write inside mount | Write succeeds | Security |

## Constraints

- Persist via `PersistKeysToFileSystem` on the configured folder; do not introduce a database-backed or third-party key store in this ticket.
- Fail-fast must be Production-only: Development workflows depend on the ephemeral default.

## Risks & Mitigation

**Risk 1: Existing prod users locked out at first restart after deploy**
- *Risk*: Restarts after this fix ships are safe, but any MFA enrolments made on the cycle-2 build BEFORE this fix are encrypted with a master key that is not in the persisted key ring. Those users remain locked out.
- *Mitigation*: Cycle 2 has not been deployed to Production yet (the security audit FAILed before deploy), so no real users are affected. Document this lifecycle clearly in the runbook so future hotfix sequencing avoids the same trap.

**Risk 2: Host-side directory permissions wrong**
- *Risk*: If the operator creates `$DEPLOY_HOST_DP_KEYS_DIR` as `root:root` with mode `700`, the container user cannot write to it.
- *Mitigation*: The AC-3 fail-fast catches this immediately on startup. The runbook includes the explicit ownership/perms command.

**Risk 3: Drift between the `appsettings.json` default and the runtime mount target**
- *Risk*: The default in `appsettings.json` names one path while the deploy script mounts another, so the container fails fast.
- *Mitigation*: AC-5 indirectly covers this via the probe-write step; the runbook section explicitly states that the mount target must equal the config value.
@@ -0,0 +1,106 @@
# Rewrite `secrets/README.md` Schema For ES256 + DataProtection

**Task**: AZ-555_secrets_readme_es256_rewrite
**Name**: Rewrite `secrets/README.md` schema for ES256 + DataProtection
**Description**: `secrets/README.md` still documents the obsolete HS256-era `JwtConfig__Secret` env var and omits the new cycle-2 env vars (`JwtConfig__KeysFolder`, `JwtConfig__ActiveKid`, `DataProtection__KeysFolder`, and their `DEPLOY_HOST_*` host-side counterparts). Operators following this README will misconfigure the deploy and hit the same failure modes that F-INFRA-1/2/3 describe. Rewrite the schema section to match the cycle-2 reality.
**Complexity**: 1 point
**Dependencies**: AZ-552, AZ-553, AZ-554 (all three must define their env vars first so the README documents what actually exists)
**Component**: Operator docs / `secrets/`
**Tracker**: AZ-555
**Epic**: AZ-530
**CMMC ref**: CM.L2-3.4.1 (baseline configuration), CM.L2-3.4.2 (security configuration settings)
**Source**: `_docs/05_security/security_report_cycle2.md` F-INFRA-4 (High); `_docs/05_security/infrastructure_review_cycle2.md` §F-2026Q2-INFRA-4

## Problem

`secrets/README.md` is the canonical operator handover for which env vars to set, where, and why. Today it still:
- Lists `ASPNETCORE_JwtConfig__Secret` as a required HS256 symmetric secret with rotation guidance.
- Does not document `ASPNETCORE_JwtConfig__KeysFolder` or `ASPNETCORE_JwtConfig__ActiveKid`.
- Does not mention DataProtection key persistence at all.
- Does not mention the host-side `DEPLOY_HOST_JWT_KEYS_DIR` / `DEPLOY_HOST_DP_KEYS_DIR` bind-mount sources.

An operator following this README produces a misconfigured deploy. Unless it is rewritten, the README will keep steering operators back to the broken pattern even after AZ-552/553/554 land.

## Outcome

- The `secrets/README.md` "Schema" section is the source of truth for cycle-2 env vars.
- Removed: every reference to `JwtConfig__Secret` / `JWT_SECRET` for the admin component.
- Added: `JwtConfig__KeysFolder`, `JwtConfig__ActiveKid`, `DataProtection__KeysFolder`, plus the `DEPLOY_HOST_*` host-side variables.
- Added: a short "Host-side directories" subsection that points to the deploy runbook (a one-line cross-link, not a duplicate of it).
- Added: a "Key rotation" subsection covering both JWT signing keys and DataProtection master keys, with file-ownership / permission guidance.
- The README's "Files in this folder" inventory matches the actual filesystem layout under `secrets/`.

## Scope
|
||||
|
||||
### Included
|
||||
|
||||
- Rewrite `secrets/README.md` Schema section in full.
|
||||
- Update the inventory list to include `jwt-keys/` and (if introduced for prod) the DataProtection key dir handover.
|
||||
- Cross-link to the deploy runbook section authored by AZ-553/AZ-554 — do not duplicate the runbook content here.
|
||||
- Reconcile against `.env.example` so no required env var is listed in one place and not the other.
|
||||
|
||||
### Excluded
|
||||
|
||||
- Cycle-1 sections of the README that are still accurate (signing-cert handover, database connection strings) — leave them alone unless inconsistent.
|
||||
- Operational SOPs that live in `_docs/04_deploy/` — those are owned by the deploy skill.
|
||||
- A real key-rotation runbook (cadence, revocation lifecycle) — only document the file-level guidance here.
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
**AC-1: No remaining references to `JwtConfig__Secret`**
|
||||
Given the admin repo at HEAD
|
||||
When `rg "JwtConfig__Secret|JWT_SECRET" secrets/README.md` is run
|
||||
Then no matches are returned.
|
||||
|
||||
**AC-2: New env vars are documented**
|
||||
Given `secrets/README.md` at HEAD
|
||||
When the Schema section is read
|
||||
Then it documents each of: `ASPNETCORE_JwtConfig__KeysFolder`, `ASPNETCORE_JwtConfig__ActiveKid`, `ASPNETCORE_DataProtection__KeysFolder`, `DEPLOY_HOST_JWT_KEYS_DIR`, `DEPLOY_HOST_DP_KEYS_DIR`.
|
||||
|
||||
**AC-3: README and `.env.example` are consistent**
|
||||
Given both files at HEAD
|
||||
When the lists of required env vars are diffed
|
||||
Then every variable required by the README is present in `.env.example` and vice versa (no orphans in either direction).
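AC-3 can be checked mechanically. The Python sketch below is illustrative only: the admin service itself is .NET, and no such script exists in the repo. It assumes the README lists env vars as backticked names under a `## Schema` heading, and that `.env.example` uses plain `NAME=value` lines. The function names are hypothetical.

```python
import re

# Assumption: env var names appear in backticks, e.g. `ASPNETCORE_JwtConfig__KeysFolder`.
ENV_NAME = re.compile(r"`([A-Z][A-Z0-9]*_[A-Za-z0-9_]+)`")


def readme_vars(readme_text: str) -> set[str]:
    """Collect backticked env var names from the README's '## Schema' section only."""
    names: set[str] = set()
    in_schema = False
    for line in readme_text.splitlines():
        if line.startswith("## "):
            # A level-2 heading either enters or leaves the Schema section;
            # '### ' subsections do not match and stay inside it.
            in_schema = line.strip().lower() == "## schema"
            continue
        if in_schema:
            names.update(ENV_NAME.findall(line))
    return names


def env_example_vars(env_text: str) -> set[str]:
    """Collect NAME from NAME=value lines, skipping blanks and comments."""
    names: set[str] = set()
    for line in env_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        names.add(line.split("=", 1)[0].strip())
    return names


def orphans(readme_text: str, env_text: str) -> tuple[set[str], set[str]]:
    """Return (only_in_readme, only_in_env_example); both empty means AC-3 passes."""
    r, e = readme_vars(readme_text), env_example_vars(env_text)
    return r - e, e - r
```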
|
||||
|
||||
**AC-4: File-ownership guidance present**
|
||||
Given `secrets/README.md` at HEAD
|
||||
When the Host-side directories subsection is read
|
||||
Then it states the required ownership and permissions for the host-side directories: readable by the container user for JWT keys, writable by the container user for DataProtection keys.
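To self-check AC-4 on a host, a small Python helper can inspect a directory's mode bits for the container's uid/gid. This is a hypothetical sketch. The uid the admin image runs as is not stated in this spec, and the check looks only at owner/group/other bits: ACLs, SELinux labels and root's override are ignored. Listing and traversing a directory needs both the read and the execute bit, so both are always required.

```python
import os
import stat


def dir_allows(path: str, uid: int, gid: int, need_write: bool) -> bool:
    """Return True if uid/gid can read (and, optionally, write) the directory,
    judged purely from its owner/group/other mode bits."""
    st = os.stat(path)
    if st.st_uid == uid:
        r, w, x = stat.S_IRUSR, stat.S_IWUSR, stat.S_IXUSR
    elif st.st_gid == gid:
        r, w, x = stat.S_IRGRP, stat.S_IWGRP, stat.S_IXGRP
    else:
        r, w, x = stat.S_IROTH, stat.S_IWOTH, stat.S_IXOTH
    # Read + execute to list and open files; add write when new files must be created.
    needed = r | x | (w if need_write else 0)
    return (st.st_mode & needed) == needed
```

For the JWT keys directory, call it with `need_write=False`. For the DataProtection keys directory, which the container writes new keys into, call it with `need_write=True`.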
|
||||
|
||||
**AC-5: Operator can deploy from README alone**
|
||||
Given a fresh operator who has never seen the cycle-2 deploy
|
||||
When they follow only `secrets/README.md` and `.env.example`
|
||||
Then they end up with a deploy that passes preflight (AZ-552), starts the container (AZ-553), and survives a restart with MFA intact (AZ-554). This is verified by a dry-run review during code review, not by an automated test.
|
||||
|
||||
## Non-Functional Requirements
|
||||
|
||||
**Accuracy**
|
||||
- Every env var named in the README must exist in code (`appsettings.json`, `Program.cs`, deploy script). No phantom vars.
|
||||
|
||||
**Maintainability**
|
||||
- One-line cross-links to the deploy runbook for procedural detail; the README is a schema reference, not a procedure manual.
|
||||
|
||||
## Blackbox Tests
|
||||
|
||||
| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|------------------------|-------------|-------------------|----------------|
| AC-1 | Repo at HEAD | `rg "JwtConfig__Secret\|JWT_SECRET" secrets/README.md` | Empty result | Accuracy |
| AC-2 | Repo at HEAD | Schema section names all 5 new env vars | All present | Accuracy |
| AC-3 | Repo at HEAD | Diff README required-list against `.env.example` | No orphans on either side | Accuracy |
| AC-4 | Repo at HEAD | Host-side subsection read | Ownership/perms guidance present | — |
| AC-5 | Fresh operator dry-run | Follow README + `.env.example` to a working deploy | Deploy reaches `/health/ready` 200 | Maintainability |
|
||||
|
||||
## Constraints
|
||||
|
||||
- Do not change behaviour. This is a docs-only ticket.
|
||||
- Keep the README short — operators do not read long files. Refactor the existing structure rather than appending.
|
||||
|
||||
## Risks & Mitigation
|
||||
|
||||
**Risk 1: Out-of-band consumers of the old schema**
|
||||
- *Risk*: Internal wikis, runbooks, or CI templates may still reference `JwtConfig__Secret`.
|
||||
- *Mitigation*: Out of scope here. Note in the commit message that operators should grep their own infra for the obsolete name.
|
||||
|
||||
**Risk 2: README and `.env.example` drift again on the next change**
|
||||
- *Risk*: A future cycle adds a new env var to one but not the other.
|
||||
- *Mitigation*: Add a note to `_docs/LESSONS.md` recording that a CI lint or pre-commit check is the right long-term fix. Building that check is a separate hardening ticket, out of scope for this hotfix.
|
||||
@@ -0,0 +1,145 @@
|
||||
# Unify Login Error Codes To `InvalidCredentials` + Reorder `IsEnabled` Check
|
||||
|
||||
**Task**: AZ-556_unify_login_error_codes
|
||||
**Name**: Unify login error codes to `InvalidCredentials` + reorder `IsEnabled` check
|
||||
**Description**: `/login` returns distinguishable error codes (`NoEmailFound` vs `WrongPassword`) and additionally leaks disabled-account status by checking `IsEnabled` *after* password verification. Combined with the new per-account lockout, an attacker can pre-filter a credential-stuffing list to known-real accounts and selectively trigger a lockout DoS. Collapse all three rejection paths (unknown email, wrong password, disabled account) into a single opaque `InvalidCredentials` code, and move the `IsEnabled` check to BEFORE the password verify (timing-equivalent rejection).
|
||||
**Complexity**: 2 points
|
||||
**Dependencies**: None (touches AZ-537 lockout logic but that work is already shipped)
|
||||
**Component**: Services (UserService) + Common (BusinessException)
|
||||
**Tracker**: AZ-556
|
||||
**Epic**: AZ-530
|
||||
**CMMC ref**: IA.L2-3.5.11 (obscure feedback of authentication information), AC.L2-3.1.8 (limit unsuccessful login attempts)
|
||||
**Source**: `_docs/05_security/security_report_cycle2.md` F-AUTH-1 + F-AUTH-3 (High); `_docs/05_security/static_analysis_cycle2.md` §F-2026Q2-AUTH-1, §F-2026Q2-AUTH-3
|
||||
|
||||
## Problem
|
||||
|
||||
`Azaion.Services/UserService.ValidateUser` (~lines 120–148) and `Azaion.Common/BusinessException.cs` (codes 10 + 30, ~lines 33–52) expose two materially-distinguishable login failure signals:
|
||||
|
||||
1. `BusinessException(NoEmailFound)` — code 10, message "No such email found." — when the email doesn't exist.
|
||||
2. `BusinessException(WrongPassword)` — code 30, message "Passwords do not match." — when the email exists but the password is wrong.
|
||||
|
||||
A client can trivially separate "real account" from "unknown account" via this signal. Combined with the cycle-2 per-account lockout (AZ-537), an attacker can:
|
||||
- Enumerate real accounts at request volume.
|
||||
- Selectively trigger lockout on real accounts to DoS specific users.
|
||||
- Pre-filter credential-stuffing lists to maximise hit rate.
|
||||
|
||||
Separately, `ValidateUser` runs the password verify (Argon2id) *before* checking `IsEnabled`. A disabled account therefore takes the slow Argon2id path AND returns a different error from a wrong-password path — both timing and error-shape leak the disabled state.
|
||||
|
||||
## Outcome
|
||||
|
||||
- `/login` returns the same error code, HTTP status, response shape, and human-readable message for: unknown email, wrong password, and disabled account.
|
||||
- The new unified path takes effectively the same wall-clock time for all three rejection categories (constant-time within the resolution practical for a request-response API).
|
||||
- The order of checks in `ValidateUser` is: `IsEnabled` first (a disabled account is rejected via the dummy-verify path), then the real password verify, then lockout-on-failure accounting.
|
||||
- Audit log still distinguishes the three categories internally (so SecOps can analyse them) — the leak is only fixed at the wire.
|
||||
- Existing callers of `BusinessException` codes 10 and 30 continue to work; the codes themselves are deprecated in favour of the new `InvalidCredentials` code, with a migration plan documented in the BusinessException file.
|
||||
|
||||
## Scope
|
||||
|
||||
### Included
|
||||
|
||||
- Introduce a new `BusinessException` code (e.g. `InvalidCredentials`, code 70 or next-available) with a single opaque message.
|
||||
- Update `Azaion.Services/UserService.ValidateUser` to:
|
||||
- Look up the user (or get a `null` for unknown email).
|
||||
- If user is `null` OR `!IsEnabled`, perform a **dummy Argon2id verify** against a known constant hash to equalise timing, then throw `InvalidCredentials`. (The lockout accounting branch is skipped — there is nothing to lock out.)
|
||||
- If user exists and is enabled, run real Argon2id verify; on mismatch, run the existing failure-count + lockout pipeline, then throw `InvalidCredentials`.
|
||||
- When the account reaches the lockout state, also throw `InvalidCredentials` with the existing `Retry-After` header populated.
|
||||
- Update `Azaion.Services/AuditLog` callers: each rejection path still records its true reason (`LoginFailed_UnknownEmail`, `LoginFailed_WrongPassword`, `LoginFailed_AccountDisabled`) for internal forensics.
|
||||
- Update tests under `e2e/Azaion.E2E/Tests/` to assert the new unified wire response and verify the audit-log internal distinction.
|
||||
- Document the deprecation of codes 10 and 30 in a comment near their declaration (do not delete — there may be cross-workspace consumers).
|
||||
|
||||
### Excluded
|
||||
|
||||
- A full constant-time audit of every error path in admin — only the `/login` path is in scope.
|
||||
- Account-discovery via response timing on other endpoints (`/users/me/mfa/*` etc.). Tracked separately under F-AUTH-4 / AZ-NEW-7.
|
||||
- Changing the lockout policy itself — AZ-537 owns the policy; this ticket only changes which path leads to lockout accounting.
|
||||
- UI changes to map the new code. The UI already shows a generic "Invalid credentials" string for both codes today, so no UI change is required (verify during code review).
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
**AC-1: Unknown email returns `InvalidCredentials`**
|
||||
Given `POST /login` with email that does not exist in the `users` table
|
||||
When the request is processed
|
||||
Then the response is the same `InvalidCredentials` error code, HTTP status, and body as a wrong-password attempt on a known account.
|
||||
|
||||
**AC-2: Wrong password returns `InvalidCredentials`**
|
||||
Given `POST /login` with a known email and a wrong password
|
||||
When the request is processed
|
||||
Then the response is `InvalidCredentials`, AND the account's `failed_login_count` is incremented per the existing AZ-537 policy.
|
||||
|
||||
**AC-3: Disabled account returns `InvalidCredentials`**
|
||||
Given `POST /login` with a known email belonging to a disabled (`IsEnabled = false`) account
|
||||
When the request is processed
|
||||
Then the response is `InvalidCredentials`, AND the audit log records the rejection as `LoginFailed_AccountDisabled` internally.
|
||||
|
||||
**AC-4: `IsEnabled` checked before password verify**
|
||||
Given a disabled account
|
||||
When `ValidateUser` runs
|
||||
Then the password verify is **not** invoked on the real stored hash for that account. (Verified by an instrumented test that asserts no Argon2id-against-the-real-hash call occurs.)
|
||||
|
||||
**AC-5: Timing equivalence (smoke level)**
|
||||
Given 1000 paired requests — half "unknown email", half "known email wrong password"
|
||||
When request latency is measured at the API edge
|
||||
Then the median and p95 latencies of the two groups are within 5% of each other. (Not a constant-time crypto guarantee; this is a smoke ceiling against gross timing differences.)
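The AC leaves open exactly how "within 5% of each other" is measured. One reading, sketched in Python below as an assumption, compares p50 with p50 and p95 with p95, each pair relative to the larger value, and uses a nearest-rank percentile. The helper names are hypothetical.

```python
import statistics


def p95(samples: list[float]) -> float:
    """Nearest-rank 95th percentile; integer arithmetic avoids float rounding in the rank."""
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[max(rank, 1) - 1]


def within_ceiling(a: list[float], b: list[float], tolerance: float = 0.05) -> bool:
    """AC-5 smoke check: the p50s and the p95s of the two groups each differ
    by at most `tolerance`, measured relative to the larger value of the pair."""
    pairs = [(statistics.median(a), statistics.median(b)), (p95(a), p95(b))]
    return all(abs(x - y) <= tolerance * max(x, y) for x, y in pairs)
```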
|
||||
|
||||
**AC-6: Audit log still distinguishes internally**
|
||||
Given the three rejection categories
|
||||
When the `audit_events` table is read after a representative run
|
||||
Then each category produces a distinct internal action label, with email + IP + timestamp.
|
||||
|
||||
**AC-7: Lockout still triggers**
|
||||
Given a known enabled account hit with N wrong passwords (per AZ-537 policy)
|
||||
When the threshold is reached
|
||||
Then the account is locked AND the lockout response uses `InvalidCredentials` + the existing `Retry-After` header.
|
||||
|
||||
## Non-Functional Requirements
|
||||
|
||||
**Security**
|
||||
- The wire response carries no signal that distinguishes the three rejection categories — code, body, headers, AND timing within the AC-5 ceiling.
|
||||
|
||||
**Compatibility**
|
||||
- BusinessException codes 10 and 30 remain defined (deprecated, comment-marked) for any cross-workspace caller. Removal scheduled in a separate ticket only after a deprecation window.
|
||||
|
||||
## Unit Tests
|
||||
|
||||
| AC Ref | What to Test | Required Outcome |
|--------|-------------|-----------------|
| AC-1 | `ValidateUser` with unknown email | Throws `InvalidCredentials`, performs dummy verify |
| AC-2 | `ValidateUser` with wrong password | Throws `InvalidCredentials`, increments failure count |
| AC-3 | `ValidateUser` with disabled account | Throws `InvalidCredentials`, no real-hash verify |
| AC-4 | Instrumented Argon2id wrapper | Real-hash verify not called for disabled account |
| AC-6 | AuditLog write for each category | Distinct internal action label per rejection |
| AC-7 | Threshold-reaching wrong-password sequence | Throws `InvalidCredentials` + `Retry-After` |
|
||||
|
||||
## Blackbox Tests
|
||||
|
||||
| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|------------------------|-------------|-------------------|----------------|
| AC-1 | DB empty of test email | `POST /login` unknown | `InvalidCredentials`, identical body to AC-2 | Security |
| AC-2 | Known account, wrong pwd | `POST /login` wrong | `InvalidCredentials`, failure count + 1 | — |
| AC-3 | Known disabled account | `POST /login` correct pwd | `InvalidCredentials`, identical body to AC-1/AC-2 | Security |
| AC-5 | 1000 paired requests | Latency p50, p95 | Within 5% | Security |
| AC-7 | At-threshold account, one more wrong | `POST /login` | `InvalidCredentials` + `Retry-After` | — |
|
||||
|
||||
## Constraints
|
||||
|
||||
- The dummy Argon2id verify must use the same `AuthConfig` parameters as the real verify (same time/memory cost) so timing equalises authentically.
|
||||
- Audit log writes must NOT be skipped just because the wire-side error is unified — internal forensics depend on the distinction.
|
||||
- Lockout accounting MUST NOT run on the "unknown email" path (there is no row to update).
|
||||
|
||||
## Risks & Mitigation
|
||||
|
||||
**Risk 1: Dummy Argon2id verify becomes a DoS amplifier**
|
||||
- *Risk*: An attacker hitting `/login` with rotating unknown emails now consumes Argon2id CPU per request even though no real account exists.
|
||||
- *Mitigation*: This is the desired property — without it, the timing leak survives. The per-IP rate limiter (existing, from AZ-537) bounds the amplification.
|
||||
|
||||
**Risk 2: Constant test-hash leaks**
|
||||
- *Risk*: If the dummy verify uses a checked-in hash of a known password, an attacker who reads the binary can craft a request that "succeeds" against the dummy path.
|
||||
- *Mitigation*: The dummy verify path always throws `InvalidCredentials` regardless of result — the verify is run only for timing, not for control-flow.
|
||||
|
||||
**Risk 3: BusinessException code churn breaks cross-workspace verifiers**
|
||||
- *Risk*: Other admin-API consumers (gps-denied, satellite-provider) decode response bodies and may pattern-match on the old codes.
|
||||
- *Mitigation*: Old codes remain defined; new code is additive. Audit cross-workspace usage during code review.
|
||||
|
||||
**Risk 4: UI shows different strings for each old code**
|
||||
- *Risk*: The UI may branch on code 10 vs 30 and show a different string for each. Once `/login` returns only `InvalidCredentials`, every login rejection must still resolve to the single generic "Invalid credentials" message.
|
||||
- *Mitigation*: Code review checklist: verify `ui/` workspace already maps codes 10/30 to the same display string. If not, file a UI ticket.
|
||||
@@ -0,0 +1,142 @@
|
||||
# Wire MFA Brute-Force Into Per-Account Lockout / Rate-Limit Pipeline
|
||||
|
||||
**Task**: AZ-557_mfa_brute_force_lockout
|
||||
**Name**: Wire MFA brute-force into per-account lockout / rate-limit pipeline
|
||||
**Description**: `MfaService.VerifyForLogin` validates the second-factor TOTP but never increments `failed_login_count` and is excluded from `AuditLog.CountRecentFailedLogins`. An attacker who has captured the step-1 token from a known account can brute-force the 6-digit TOTP at full request volume from rotating IPs without ever tripping lockout. Bring MFA failures into the same per-account lockout/rate-limit pipeline that AZ-537 built for `/login`.
|
||||
**Complexity**: 3 points
|
||||
**Dependencies**: AZ-537 (lockout pipeline), AZ-534 (MFA endpoints)
|
||||
**Component**: Services (MfaService, AuditLog, UserService) + Admin API
|
||||
**Tracker**: AZ-557
|
||||
**Epic**: AZ-530
|
||||
**CMMC ref**: IA.L2-3.5.11 (obscure feedback of authentication information), AC.L2-3.1.8 (limit unsuccessful login attempts)
|
||||
**Source**: `_docs/05_security/security_report_cycle2.md` F-AUTH-2 (High); `_docs/05_security/static_analysis_cycle2.md` §F-2026Q2-AUTH-2
|
||||
|
||||
## Problem
|
||||
|
||||
The cycle-2 auth pipeline has a gap between login factor 1 and factor 2:
|
||||
|
||||
- `Azaion.Services/UserService.ValidateUser` (AZ-537) tracks `failed_login_count`, enforces the per-account rate limit, and trips lockout when the threshold is crossed.
|
||||
- `Azaion.Services/MfaService.VerifyForLogin` (~lines 247–278) rejects a failed TOTP with `Wrong code`, but does NOT call into the lockout pipeline.
|
||||
- `Azaion.Services/AuditLog.CountRecentFailedLogins` (~lines 53–63) queries only `LoginFailed` events; it ignores `MfaLoginFailed`.
|
||||
|
||||
Concretely: an attacker who phishes (or steals via XSS, or sniffs from logs) a step-1 MFA token can hit `/login/mfa` at full request rate, trying all 10^6 TOTP candidates within the token's lifetime, from rotating source IPs. The per-IP rate limit doesn't help because the attacker rotates IPs. The per-account rate limit doesn't help because MFA failures take a different code path. The account never locks out, which defeats the second factor entirely.
|
||||
|
||||
## Outcome
|
||||
|
||||
- A failed MFA verify increments the same `failed_login_count` that AZ-537 maintains for password failures.
|
||||
- `AuditLog.CountRecentFailedLogins` counts `MfaLoginFailed` events alongside `LoginFailed` events.
|
||||
- When the combined failed-count crosses the AZ-537 threshold, the account locks out — regardless of whether the failures were password-side, MFA-side, or mixed.
|
||||
- The MFA verify rejects with the same response shape it does today (no new error code on the wire), but a locked-out account at the MFA step now responds with the existing lockout response + `Retry-After`.
|
||||
- Per-IP rate-limit also applies to `/login/mfa` (defence in depth even if IPs aren't rotating fast enough).
|
||||
- Audit log still records the rejection category (`MfaLoginFailed` vs `LoginFailed`) internally so SecOps can analyse separately.
|
||||
|
||||
## Scope
|
||||
|
||||
### Included
|
||||
|
||||
- `Azaion.Services/MfaService.VerifyForLogin`:
|
||||
- On TOTP mismatch: call the shared failure-accounting path (extract from `UserService.ValidateUser` into a private helper or a tiny internal collaborator that both services use). Same increment, same threshold check, same `Retry-After` shape on lockout.
|
||||
- When the account reaches, or is already in, the lockout state during MFA verify: throw the same lockout response shape that the password path throws.
|
||||
- `Azaion.Services/AuditLog.CountRecentFailedLogins`: extend the query to `WHERE action IN ('LoginFailed', 'MfaLoginFailed')`.
|
||||
- `Azaion.AdminApi/Program.cs`: attach the existing `LoginPerIpPolicy` (or a parallel `MfaLoginPerIpPolicy` with the same parameters) to the `/login/mfa` endpoint.
|
||||
- Tests under `e2e/Azaion.E2E/Tests/`: add cases for the four failure-mix scenarios (5×password-fail → lock; 5×MFA-fail → lock; 3×password + 2×MFA → lock; 1×password + 4×MFA → lock). Plus the `/login/mfa` per-IP rate-limit smoke test.
|
||||
- Audit-log assertion: each rejection step writes the right internal action label.
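Below is a minimal Python sketch of the shared failure-accounting collaborator, for illustration. `FailureAccounting`, `LockoutPolicy` and `AccountLocked` are hypothetical names. The in-memory dicts stand in for `failed_login_count` and `audit_events`. The threshold of 5 is assumed from the acceptance criteria rather than read from `AuthConfig.LockoutOptions`, and the lockout duration is made up.

```python
import math
from dataclasses import dataclass, field


@dataclass
class LockoutPolicy:
    threshold: int = 5           # assumed to match the AZ-537 threshold used in the ACs
    lockout_seconds: int = 900   # hypothetical; the real value lives in AuthConfig.LockoutOptions


class AccountLocked(Exception):
    """Raised before any credential check while the account is locked."""

    def __init__(self, retry_after: int):
        super().__init__(f"account locked, retry after {retry_after}s")
        self.retry_after = retry_after


@dataclass
class FailureAccounting:
    """Hypothetical collaborator shared by the password and MFA paths, so every
    failure kind feeds the same per-account counter."""

    policy: LockoutPolicy
    counts: dict = field(default_factory=dict)        # stands in for failed_login_count
    locked_until: dict = field(default_factory=dict)
    audit: list = field(default_factory=list)         # stands in for audit_events

    def ensure_not_locked(self, account: str, now: float) -> None:
        until = self.locked_until.get(account, 0)
        if now < until:
            # One exception type and one Retry-After value for /login and /login/mfa alike.
            raise AccountLocked(retry_after=max(1, math.ceil(until - now)))

    def record_failure(self, account: str, action: str, now: float) -> None:
        # action is "LoginFailed" or "MfaLoginFailed": distinct in the audit trail,
        # but both increment the single shared counter.
        self.audit.append((account, action))
        self.counts[account] = self.counts.get(account, 0) + 1
        if self.counts[account] >= self.policy.threshold:
            self.locked_until[account] = now + self.policy.lockout_seconds

    def record_success(self, account: str) -> None:
        self.counts[account] = 0
```

Both `/login` and `/login/mfa` would call `ensure_not_locked` before checking anything, so a locked account is rejected even when it presents the correct code. After the check, each endpoint calls `record_failure` on a mismatch or `record_success` on a match.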
|
||||
|
||||
### Excluded
|
||||
|
||||
- `/users/me/mfa/{enroll,confirm,disable}` rate limiting — that is F-AUTH-4 / AZ-NEW-7. Separate ticket because step-up auth there is different.
|
||||
- TOTP code reuse / replay detection beyond the existing window — out of scope.
|
||||
- Recovery-code brute-force protection — recovery codes are high-entropy (verified in security audit); not the same risk profile.
|
||||
- Cross-workspace verifier changes (gps-denied, satellite-provider, ui) — none required; this is admin-only.
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
**AC-1: 5 wrong TOTP attempts lock the account**
|
||||
Given a known account with valid step-1 token
|
||||
When 5 sequential `POST /login/mfa` calls with wrong TOTP are made (per AZ-537 policy threshold)
|
||||
Then the 6th call (any code, even the correct one) returns the lockout response with `Retry-After`.
|
||||
|
||||
**AC-2: Mixed-mode failures aggregate**
|
||||
Given a known account
|
||||
When 3 wrong-password attempts then 2 wrong-MFA attempts occur within the rate-limit window
|
||||
Then the 6th attempt (password-side OR MFA-side) returns the lockout response.
|
||||
|
||||
**AC-3: `CountRecentFailedLogins` includes MFA failures**
|
||||
Given an account with 2 `LoginFailed` and 3 `MfaLoginFailed` rows within the window
|
||||
When `CountRecentFailedLogins` is called
|
||||
Then it returns 5.
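The query change behind AC-3 is small. Below is a Python/SQLite sketch. It assumes hypothetical `email` and `created_at` columns alongside the `action` column that AC-6 names. The real query may key on user id rather than email.

```python
import sqlite3

COUNT_RECENT_FAILED = """
SELECT COUNT(*) FROM audit_events
WHERE email = ?
  AND action IN ('LoginFailed', 'MfaLoginFailed')
  AND created_at >= ?
"""


def count_recent_failed_logins(conn: sqlite3.Connection, email: str, window_start: int) -> int:
    """Count password-side and MFA-side failures for one account inside the window."""
    (n,) = conn.execute(COUNT_RECENT_FAILED, (email, window_start)).fetchone()
    return n
```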
|
||||
|
||||
**AC-4: `/login/mfa` is per-IP rate-limited**
|
||||
Given a single source IP sending `/login/mfa` requests at high volume across many fabricated step-1 tokens
|
||||
When the per-IP burst limit is exceeded
|
||||
Then subsequent requests from that IP are rejected at the rate-limit layer (HTTP 429 or equivalent), regardless of which account is targeted.
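A fixed-window limiter keyed only on client IP shows why AC-4 holds regardless of which account or step-1 token a request targets. This is a hypothetical Python stand-in. The spec does not say whether the existing `LoginPerIpPolicy` is fixed-window, sliding-window or token-bucket, and the class name is made up.

```python
class PerIpFixedWindow:
    """Hypothetical fixed-window limiter keyed only by client IP."""

    def __init__(self, permit_limit: int, window_seconds: float):
        self.permit_limit = permit_limit
        self.window_seconds = window_seconds
        self._windows: dict[str, tuple[float, int]] = {}  # ip -> (window start, requests seen)

    def check(self, ip: str, now: float) -> int:
        """Return 200 if the request may proceed, 429 if this IP is over its budget."""
        start, count = self._windows.get(ip, (now, 0))
        if now - start >= self.window_seconds:
            start, count = now, 0  # previous window expired; start a new one
        if count >= self.permit_limit:
            return 429  # rejected before any token or account lookup happens
        self._windows[ip] = (start, count + 1)
        return 200
```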
|
||||
|
||||
**AC-5: Locked-out account at MFA step gets the same response shape**
|
||||
Given a locked-out account that still presents a valid step-1 token
|
||||
When `POST /login/mfa` is called
|
||||
Then the response code, body, and `Retry-After` header match the response of a locked-out account at `/login` (no new shape).
|
||||
|
||||
**AC-6: Audit log records the right action**
|
||||
Given a wrong-TOTP rejection
|
||||
When the `audit_events` row is read
|
||||
Then `action = 'MfaLoginFailed'` (not `LoginFailed`), with email + IP + timestamp.
|
||||
|
||||
**AC-7: Correct TOTP after sub-threshold failures succeeds and resets the count**
|
||||
Given an account with 2 prior MFA failures (under the threshold)
|
||||
When the user submits the correct TOTP
|
||||
Then verification succeeds AND the failure count is reset per the existing AZ-537 reset policy.
|
||||
|
||||
## Non-Functional Requirements
|
||||
|
||||
**Security**
|
||||
- Wire response from `/login/mfa` carries no extra information distinguishing "wrong code" from "locked out" beyond what AZ-537 already exposes at `/login`.
|
||||
|
||||
**Performance**
|
||||
- The shared failure-accounting helper is on the hot path: it must not add a network round-trip or an extra DB transaction beyond what the password path already does.
|
||||
|
||||
**Reliability**
|
||||
- Concurrent failures must not be undercounted: use the same locking / `RowVersion` pattern that AZ-537 uses (verify in code review).
|
||||
|
||||
## Unit Tests
|
||||
|
||||
| AC Ref | What to Test | Required Outcome |
|--------|-------------|-----------------|
| AC-1 | `MfaService.VerifyForLogin` 5 wrong TOTPs | 6th call throws lockout, `Retry-After` populated |
| AC-2 | Mixed 3-password + 2-MFA | 6th throws lockout |
| AC-3 | `CountRecentFailedLogins` with mixed actions | Returns combined count |
| AC-6 | Audit-log row after wrong TOTP | `action = 'MfaLoginFailed'` |
| AC-7 | Correct TOTP after 2 failures | Verify succeeds, failure count reset |
|
||||
|
||||
## Blackbox Tests
|
||||
|
||||
| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|------------------------|-------------|-------------------|----------------|
| AC-1 | Known MFA-enrolled account | 5 wrong-TOTP → 6th any-TOTP | Lockout + `Retry-After` | Security |
| AC-2 | Same account | 3 wrong-pwd + 2 wrong-TOTP → 6th any | Lockout | Security |
| AC-4 | Single IP, many step-1 tokens | Burst `/login/mfa` calls | HTTP 429 at threshold | Security |
| AC-5 | Locked account, valid step-1 | `POST /login/mfa` | Identical shape to `/login` lockout response | Security |
| AC-7 | Account with 2 prior MFA fails | Correct TOTP | Verify OK, count reset | Reliability |
|
||||
|
||||
## Constraints
|
||||
|
||||
- Re-use the AZ-537 `AuthConfig.LockoutOptions` and `RateLimitOptions` values — do not introduce a separate threshold tuned just for MFA.
|
||||
- The shared failure-accounting helper must live where both `UserService` and `MfaService` can reach it without one importing the other.
|
||||
- Audit-log writes happen in the same transaction as the failure-count increment to avoid drift between the two stores.
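Python's `sqlite3` connection context manager makes the single-transaction constraint easy to illustrate: if the audit insert fails, the counter increment rolls back with it. Table and column names are hypothetical. The increment is done in SQL (`failed_login_count + 1`) rather than as a read-modify-write, which also avoids lost updates under concurrency. The real code should still follow whatever locking / `RowVersion` pattern AZ-537 uses.

```python
import sqlite3


def record_failure_atomically(conn: sqlite3.Connection, user_id: int, email: str,
                              action: str, ip: str, ts: int) -> None:
    """Increment the failure counter and write the audit row in one transaction."""
    with conn:  # commits if both statements succeed, rolls back both on any exception
        conn.execute(
            "UPDATE users SET failed_login_count = failed_login_count + 1 WHERE id = ?",
            (user_id,),
        )
        conn.execute(
            "INSERT INTO audit_events (email, action, ip, created_at) VALUES (?, ?, ?, ?)",
            (email, action, ip, ts),
        )
```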
|
||||
|
||||
## Risks & Mitigation
|
||||
|
||||
**Risk 1: Helper extraction breaks AZ-537 behaviour**
|
||||
- *Risk*: Pulling the accounting code out of `ValidateUser` introduces a regression on the password path.
|
||||
- *Mitigation*: AZ-537's existing E2E tests are exercised at every test run; any regression appears immediately. Code review focuses on parity.
|
||||
|
||||
**Risk 2: MFA step-up endpoints still unprotected**
|
||||
- *Risk*: `/users/me/mfa/{enroll,confirm,disable}` remain rate-unlimited; a stolen access token can brute-force MFA disable.
|
||||
- *Mitigation*: Tracked separately under F-AUTH-4 / AZ-NEW-7. Not in scope here.
|
||||
|
||||
**Risk 3: False lockouts for legitimate users switching devices**
|
||||
- *Risk*: A user who fat-fingers their TOTP across two devices in quick succession may now be locked out where previously they would not have been.
|
||||
- *Mitigation*: The threshold values are the same as AZ-537's already-shipping `/login` thresholds, which were sized for password fat-fingering. The risk is bounded by that prior tuning.
|
||||
|
||||
**Risk 4: Test environment has rate-limit windows that interfere**
|
||||
- *Risk*: E2E tests that hit `/login/mfa` repeatedly may themselves be rate-limited.
|
||||
- *Mitigation*: Existing E2E test infrastructure already manages this for `/login` (per `AZ-189` test infrastructure). Re-use the same reset hooks.
|
||||
@@ -2,13 +2,13 @@
|
||||
|
||||
## Current Step
|
||||
flow: existing-code
|
||||
step: 12
|
||||
name: Test-Spec Sync
|
||||
status: not_started
|
||||
step: 10
|
||||
name: Implement
|
||||
status: in_progress
|
||||
sub_step:
|
||||
phase: 0
|
||||
name: awaiting-invocation
|
||||
detail: ""
|
||||
detail: "cycle-2 hotfix sprint: AZ-552..AZ-557 (11 pts) under epic AZ-530"
|
||||
retry_count: 0
|
||||
cycle: 2
|
||||
tracker: jira
|
||||
|
||||
@@ -0,0 +1,44 @@
|
||||
# Leftover: Suite-Level `_infra/deploy/webserver/` Still Uses Obsolete `JWT_SECRET`
|
||||
|
||||
**Timestamp**: 2026-05-14T09:18:00+03:00
|
||||
**Type**: cross-workspace follow-up (non-blocking)
|
||||
**Source**: `/autodev` Step 9 (cycle-2 hotfix intake) — ownership verification before drafting AZ-552..AZ-557
|
||||
|
||||
## What was blocked
|
||||
|
||||
Nothing in this workspace is blocked. This leftover records a related concern that lives **outside** the admin repo and therefore cannot be addressed by tickets AZ-552..AZ-557 (which are admin-only).
|
||||
|
||||
## Observation
|
||||
|
||||
The suite-level deploy artifact at `/Users/obezdienie001/dev/azaion/suite/_infra/deploy/webserver/` still references the obsolete HS256-era `JWT_SECRET` for the admin service:
|
||||
|
||||
- `_infra/deploy/webserver/docker-compose.yml:45,52-60,71,141` — `JWT_SECRET: ${JWT_SECRET}` injected into admin and at least one other service.
|
||||
- `_infra/deploy/webserver/install.sh:87` — `JWT_SECRET=changeme` default in the installer.
|
||||
- `_infra/deploy/webserver/.env.example:20` — `JWT_SECRET=changeme` template.
|
||||
|
||||
The cycle-2 admin build no longer reads `JWT_SECRET` / `JwtConfig__Secret` (AZ-532 removed it; AZ-552 will remove the script-level preflight check). The suite-level webserver deploy path is therefore out-of-sync: it injects an env var that the cycle-2 admin container ignores, and it does NOT set up the new `JwtConfig__KeysFolder` / `JwtConfig__ActiveKid` / `DataProtection__KeysFolder` env vars that the cycle-2 admin REQUIRES.
|
||||
|
||||
If anyone deploys cycle-2 admin via `_infra/deploy/webserver/`, the container will fail-fast at startup (same root cause as F-INFRA-1/F-INFRA-2, just at a different layer).
|
||||
|
||||
## What the suite repo needs to do
|
||||
|
||||
Equivalent of AZ-552..AZ-555 but against the `_infra/deploy/webserver/` flow:
|
||||
|
||||
1. Drop `JWT_SECRET` injection for the admin service from `docker-compose.yml`, `install.sh`, `.env.example`.
|
||||
2. Add `JwtConfig__KeysFolder`, `JwtConfig__ActiveKid`, `DataProtection__KeysFolder` env vars to the admin service block.
|
||||
3. Bind-mount the host-side JWT keys folder and DataProtection keys folder into the admin container (mirroring AZ-553/AZ-554's pattern from the admin repo).
|
||||
4. Update `_infra/deploy/webserver/README.md` schema.
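The suite repo could keep this drift from recurring with a small lint over the admin service's environment block. Below is a hypothetical Python sketch that uses the variable names from step 2 exactly as written. If the suite's compose file uses `ASPNETCORE_`-prefixed names, the two sets need adjusting.

```python
# Names as listed in step 2 above; OBSOLETE covers the HS256-era variables.
REQUIRED = {"JwtConfig__KeysFolder", "JwtConfig__ActiveKid", "DataProtection__KeysFolder"}
OBSOLETE = {"JWT_SECRET", "JwtConfig__Secret"}


def check_admin_env(env: dict[str, str]) -> list[str]:
    """List problems in the admin service's environment block; empty means clean."""
    names = set(env)
    problems = [f"missing {n}" for n in sorted(REQUIRED - names)]
    problems += [f"obsolete {n}" for n in sorted(OBSOLETE & names)]
    return problems
```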
|
||||
|
||||
These changes must land in the **suite repo** (`/Users/obezdienie001/dev/azaion/suite/`), not the admin repo. They are NOT covered by AZ-552..AZ-557.
|
||||
|
||||
## Recommended action
|
||||
|
||||
File a Jira ticket against the suite repo under epic AZ-530 (or whichever epic owns suite-level deploy) titled "Update `_infra/deploy/webserver/` for cycle-2 ES256 + DataProtection env vars". Cross-link it from the admin repo's AZ-553/AZ-555 commit messages so reviewers can see that the suite-side follow-up exists.
|
||||
|
||||
Estimated complexity: 3 points (mirrors AZ-552 + AZ-553 + AZ-555 combined but in a different repo).
|
||||
|
||||
## Replay status
|
||||
|
||||
- Replay attempted: not yet — this is informational only; no automated tracker write is queued.
|
||||
- Next replay opportunity: at the start of the next `/autodev` invocation, the user should be reminded of this entry. If they confirm the suite ticket has been filed, delete this leftover.
|
||||
- Blocker for autodev progress: **NO**. This leftover does not block any cycle-2 hotfix work in the admin repo.
|
||||