[AZ-556] [AZ-557] Close cycle-2 hotfix sprint, hand off to Run Tests

Archive AZ-556 + AZ-557 task specs, mark dependencies table 25/25 done (82/82 pts), write batch_06_cycle2_report.md and the sprint-level implementation_report_auth_modernization_cycle2_hotfix.md, advance _autodev_state.md to step 11 (Run Tests). Per implement skill step 16, the final-suite gate is owned by the test-run skill; not run here to avoid duplicate full runs. Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-21 15:41:08 +00:00 · 2026-05-14 09:59:23 +03:00
parent 4bf2e689cb
commit 8b7d8a4275
6 changed files with 141 additions and 11 deletions
@@ -0,0 +1,145 @@
+# Unify Login Error Codes To `InvalidCredentials` + Reorder `IsEnabled` Check
+
+**Task**: AZ-556_unify_login_error_codes
+**Name**: Unify login error codes to `InvalidCredentials` + reorder `IsEnabled` check
+**Description**: `/login` returns distinguishable error codes (`NoEmailFound` vs `WrongPassword`) and additionally leaks disabled-account status by checking `IsEnabled` *after* password verification. Combined with the new per-account lockout, an attacker can pre-filter a credential-stuffing list to known-real accounts and selectively trigger lockout DoS. Collapse both paths to a single opaque `InvalidCredentials` code and move the `IsEnabled` check to BEFORE the password verify (timing-equivalent rejection).
+**Complexity**: 2 points
+**Dependencies**: None (touches AZ-537 lockout logic but that work is already shipped)
+**Component**: Services (UserService) + Common (BusinessException)
+**Tracker**: AZ-556
+**Epic**: AZ-530
+**CMMC ref**: IA.L2-3.5.11 (obscure feedback of authentication information), AC.L2-3.1.8 (limit unsuccessful login attempts)
+**Source**: `_docs/05_security/security_report_cycle2.md` F-AUTH-1 + F-AUTH-3 (High); `_docs/05_security/static_analysis_cycle2.md` §F-2026Q2-AUTH-1, §F-2026Q2-AUTH-3
+
+## Problem
+
+`Azaion.Services/UserService.ValidateUser` (~lines 120–148) and `Azaion.Common/BusinessException.cs` (codes 10 + 30, ~lines 33–52) expose two materially-distinguishable login failure signals:
+
+1. `BusinessException(NoEmailFound)` — code 10, message "No such email found." — when the email doesn't exist.
+2. `BusinessException(WrongPassword)` — code 30, message "Passwords do not match." — when the email exists but the password is wrong.
+
+A client can trivially separate "real account" from "unknown account" via this signal. Combined with the cycle-2 per-account lockout (AZ-537), an attacker can:
+- Enumerate real accounts at request volume.
+- Selectively trigger lockout on real accounts to DoS specific users.
+- Pre-filter credential-stuffing lists to maximise hit rate.
+
+Separately, `ValidateUser` runs the password verify (Argon2id) *before* checking `IsEnabled`. A disabled account therefore takes the slow Argon2id path AND returns a different error from a wrong-password path — both timing and error-shape leak the disabled state.
+
+## Outcome
+
+- `/login` returns the same error code, HTTP status, response shape, and human-readable message for: unknown email, wrong password, and disabled account.
+- The new unified path takes effectively the same wall-clock time for all three rejection categories (constant-time within the resolution practical for a request-response API).
+- The order of checks in `ValidateUser` is: short-circuit `IsEnabled` first, then password verify, then lockout-on-failure accounting.
+- Audit log still distinguishes the three categories internally (so SecOps can analyse them) — the leak is only fixed at the wire.
+- Existing callers of `BusinessException` codes 10 and 30 continue to work; the codes themselves are deprecated in favour of the new `InvalidCredentials` code, with a migration plan documented in the BusinessException file.
+
+## Scope
+
+### Included
+
+- Introduce a new `BusinessException` code (e.g. `InvalidCredentials`, code 70 or next-available) with a single opaque message.
+- Update `Azaion.Services/UserService.ValidateUser` to:
+  - Look up the user (or get a `null` for unknown email).
+  - If user is `null` OR `!IsEnabled`, perform a **dummy Argon2id verify** against a known constant hash to equalise timing, then throw `InvalidCredentials`. (The lockout accounting branch is skipped — there is nothing to lock out.)
+  - If user exists and is enabled, run real Argon2id verify; on mismatch, run the existing failure-count + lockout pipeline, then throw `InvalidCredentials`.
+  - On lockout-state-reached, also throw `InvalidCredentials` with the existing `Retry-After` header populated.
+- Update `Azaion.Services/AuditLog` callers: each rejection path still records its true reason (`LoginFailed_UnknownEmail`, `LoginFailed_WrongPassword`, `LoginFailed_AccountDisabled`) for internal forensics.
+- Update tests under `e2e/Azaion.E2E/Tests/` to assert the new unified wire response and verify the audit-log internal distinction.
+- Document the deprecation of codes 10 and 30 in a comment near their declaration (do not delete — there may be cross-workspace consumers).
+
+### Excluded
+
+- A full constant-time audit of every error path in admin — only the `/login` path is in scope.
+- Account-discovery via response timing on other endpoints (`/users/me/mfa/*` etc.). Tracked separately under F-AUTH-4 / AZ-NEW-7.
+- Changing the lockout policy itself — AZ-537 owns the policy; this ticket only changes which path leads to lockout accounting.
+- UI changes to map the new code. The UI already shows a generic "Invalid credentials" string for both codes today, so no UI change is required (verify during code review).
+
+## Acceptance Criteria
+
+**AC-1: Unknown email returns `InvalidCredentials`**
+Given `POST /login` with email that does not exist in the `users` table
+When the request is processed
+Then the response is the same `InvalidCredentials` error code, HTTP status, and body as a wrong-password attempt on a known account.
+
+**AC-2: Wrong password returns `InvalidCredentials`**
+Given `POST /login` with a known email and a wrong password
+When the request is processed
+Then the response is `InvalidCredentials`, AND the account's `failed_login_count` is incremented per the existing AZ-537 policy.
+
+**AC-3: Disabled account returns `InvalidCredentials`**
+Given `POST /login` with a known email belonging to a disabled (`IsEnabled = false`) account
+When the request is processed
+Then the response is `InvalidCredentials`, AND the audit log records the rejection as `LoginFailed_AccountDisabled` internally.
+
+**AC-4: `IsEnabled` checked before password verify**
+Given a disabled account
+When `ValidateUser` runs
+Then the password verify is **not** invoked on the real stored hash for that account. (Verified by an instrumented test that asserts no Argon2id-against-the-real-hash call occurs.)
+
+**AC-5: Timing equivalence (smoke level)**
+Given 1000 paired requests — half "unknown email", half "known email wrong password"
+When request latency is measured at the API edge
+Then the median and p95 latencies of the two groups are within 5% of each other. (Not a constant-time crypto guarantee; this is a smoke ceiling against gross timing differences.)
+
+**AC-6: Audit log still distinguishes internally**
+Given the three rejection categories
+When the `audit_events` table is read after a representative run
+Then each category produces a distinct internal action label, with email + IP + timestamp.
+
+**AC-7: Lockout still triggers**
+Given a known enabled account hit with N wrong passwords (per AZ-537 policy)
+When the threshold is reached
+Then the account is locked AND the lockout response uses `InvalidCredentials` + the existing `Retry-After` header.
+
+## Non-Functional Requirements
+
+**Security**
+- The wire response carries no signal that distinguishes the three rejection categories — code, body, headers, AND timing within the AC-5 ceiling.
+
+**Compatibility**
+- BusinessException codes 10 and 30 remain defined (deprecated, comment-marked) for any cross-workspace caller. Removal scheduled in a separate ticket only after a deprecation window.
+
+## Unit Tests
+
+| AC Ref | What to Test | Required Outcome |
+|--------|-------------|-----------------|
+| AC-1 | `ValidateUser` with unknown email | Throws `InvalidCredentials`, performs dummy verify |
+| AC-2 | `ValidateUser` with wrong password | Throws `InvalidCredentials`, increments failure count |
+| AC-3 | `ValidateUser` with disabled account | Throws `InvalidCredentials`, no real-hash verify |
+| AC-4 | Instrumented Argon2id wrapper | Real-hash verify not called for disabled account |
+| AC-6 | AuditLog write for each category | Distinct internal action label per rejection |
+| AC-7 | Threshold-reaching wrong-password sequence | Throws `InvalidCredentials` + `Retry-After` |
+
+## Blackbox Tests
+
+| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
+|--------|------------------------|-------------|-------------------|----------------|
+| AC-1 | DB empty of test email | `POST /login` unknown | `InvalidCredentials`, identical body to AC-2 | Security |
+| AC-2 | Known account, wrong pwd | `POST /login` wrong | `InvalidCredentials`, failure count + 1 | — |
+| AC-3 | Known disabled account | `POST /login` correct pwd | `InvalidCredentials`, identical body to AC-1/AC-2 | Security |
+| AC-5 | 1000 paired requests | Latency p50, p95 | Within 5% | Security |
+| AC-7 | At-threshold account, one more wrong | `POST /login` | `InvalidCredentials` + `Retry-After` | — |
+
+## Constraints
+
+- The dummy Argon2id verify must use the same `AuthConfig` parameters as the real verify (same time/memory cost) so timing equalises authentically.
+- Audit log writes must NOT be skipped just because the wire-side error is unified — internal forensics depend on the distinction.
+- Lockout accounting MUST NOT run on the "unknown email" path (there is no row to update).
+
+## Risks & Mitigation
+
+**Risk 1: Dummy Argon2id verify becomes a DoS amplifier**
+- *Risk*: An attacker hitting `/login` with rotating unknown emails now consumes Argon2id CPU per request even though no real account exists.
+- *Mitigation*: This is the desired property — without it, the timing leak survives. The per-IP rate limiter (existing, from AZ-537) bounds the amplification.
+
+**Risk 2: Constant test-hash leaks**
+- *Risk*: If the dummy verify uses a checked-in hash of a known password, an attacker who reads the binary can craft a request that "succeeds" against the dummy path.
+- *Mitigation*: The dummy verify path always throws `InvalidCredentials` regardless of result — the verify is run only for timing, not for control-flow.
+
+**Risk 3: BusinessException code churn breaks cross-workspace verifiers**
+- *Risk*: Other admin-API consumers (gps-denied, satellite-provider) decode response bodies and may pattern-match on the old codes.
+- *Mitigation*: Old codes remain defined; new code is additive. Audit cross-workspace usage during code review.
+
+**Risk 4: UI shows different strings for each old code**
+- *Risk*: UI may have branched on code 10 vs 30. If so, both branches now show the same message, but the UI continues to map both to "Invalid credentials".
+- *Mitigation*: Code review checklist: verify `ui/` workspace already maps codes 10/30 to the same display string. If not, file a UI ticket.
@@ -0,0 +1,142 @@
+# Wire MFA Brute-Force Into Per-Account Lockout / Rate-Limit Pipeline
+
+**Task**: AZ-557_mfa_brute_force_lockout
+**Name**: Wire MFA brute-force into per-account lockout / rate-limit pipeline
+**Description**: `MfaService.VerifyForLogin` validates the second-factor TOTP but never increments `failed_login_count` and is excluded from `AuditLog.CountRecentFailedLogins`. An attacker who has captured the step-1 token from a known account can brute-force the 6-digit TOTP at full request volume from rotating IPs without ever tripping lockout. Bring MFA failures into the same per-account lockout/rate-limit pipeline that AZ-537 built for `/login`.
+**Complexity**: 3 points
+**Dependencies**: AZ-537 (lockout pipeline), AZ-534 (MFA endpoints)
+**Component**: Services (MfaService, AuditLog, UserService) + Admin API
+**Tracker**: AZ-557
+**Epic**: AZ-530
+**CMMC ref**: IA.L2-3.5.11 (obscure feedback of authentication information), AC.L2-3.1.8 (limit unsuccessful login attempts)
+**Source**: `_docs/05_security/security_report_cycle2.md` F-AUTH-2 (High); `_docs/05_security/static_analysis_cycle2.md` §F-2026Q2-AUTH-2
+
+## Problem
+
+The cycle-2 auth pipeline has a gap between login factor 1 and factor 2:
+
+- `Azaion.Services/UserService.ValidateUser` (AZ-537) tracks `failed_login_count`, enforces the per-account rate limit, and trips lockout when the threshold is crossed.
+- `Azaion.Services/MfaService.VerifyForLogin` (~lines 247–278) ALSO returns `Wrong code` on a failed TOTP, but it does NOT call into the lockout pipeline.
+- `Azaion.Services/AuditLog.CountRecentFailedLogins` (~lines 53–63) queries only `LoginFailed` events; it ignores `MfaLoginFailed`.
+
+Concretely: an attacker who phishes (or steals via XSS, or sniffs from logs) a step-1 MFA token can hit `/login/mfa` at full request rate, trying all 10^6 TOTP candidates within the token's lifetime, from rotating source IPs. Per-IP rate-limit doesn't apply (rotates IPs). Per-account rate-limit doesn't apply (different code path). The account never locks out. This entirely defeats the second factor.
+
+## Outcome
+
+- A failed MFA verify increments the same `failed_login_count` that AZ-537 maintains for password failures.
+- `AuditLog.CountRecentFailedLogins` counts `MfaLoginFailed` events alongside `LoginFailed` events.
+- When the combined failed-count crosses the AZ-537 threshold, the account locks out — regardless of whether the failures were password-side, MFA-side, or mixed.
+- The MFA verify rejects with the same response shape it does today (no new error code on the wire), but a locked-out account at the MFA step now responds with the existing lockout response + `Retry-After`.
+- Per-IP rate-limit also applies to `/login/mfa` (defence in depth even if IPs aren't rotating fast enough).
+- Audit log still records the rejection category (`MfaLoginFailed` vs `LoginFailed`) internally so SecOps can analyse separately.
+
+## Scope
+
+### Included
+
+- `Azaion.Services/MfaService.VerifyForLogin`:
+  - On TOTP mismatch: call the shared failure-accounting path (extract from `UserService.ValidateUser` into a private helper or a tiny internal collaborator that both services use). Same increment, same threshold check, same `Retry-After` shape on lockout.
+  - On lockout-state-reached during MFA verify: throw the same lockout response shape that the password path throws.
+- `Azaion.Services/AuditLog.CountRecentFailedLogins`: extend the query to `WHERE action IN ('LoginFailed', 'MfaLoginFailed')`.
+- `Azaion.AdminApi/Program.cs`: attach the existing `LoginPerIpPolicy` (or a parallel `MfaLoginPerIpPolicy` with the same parameters) to the `/login/mfa` endpoint.
+- Tests under `e2e/Azaion.E2E/Tests/`: add cases for the four failure-mix scenarios (5×password-fail → lock; 5×MFA-fail → lock; 3×password + 2×MFA → lock; 1×password + 4×MFA → lock). Plus the `/login/mfa` per-IP rate-limit smoke test.
+- Audit-log assertion: each rejection step writes the right internal action label.
+
+### Excluded
+
+- `/users/me/mfa/{enroll,confirm,disable}` rate limiting — that is F-AUTH-4 / AZ-NEW-7. Separate ticket because step-up auth there is different.
+- TOTP code reuse / replay detection beyond the existing window — out of scope.
+- Recovery-code brute-force protection — recovery codes are high-entropy (verified in security audit); not the same risk profile.
+- Cross-workspace verifier changes (gps-denied, satellite-provider, ui) — none required; this is admin-only.
+
+## Acceptance Criteria
+
+**AC-1: 5 wrong TOTP attempts lock the account**
+Given a known account with valid step-1 token
+When 5 sequential `POST /login/mfa` calls with wrong TOTP are made (per AZ-537 policy threshold)
+Then the 6th call (any code, even the correct one) returns the lockout response with `Retry-After`.
+
+**AC-2: Mixed-mode failures aggregate**
+Given a known account
+When 3 wrong-password attempts then 2 wrong-MFA attempts occur within the rate-limit window
+Then the 6th attempt (password-side OR MFA-side) returns the lockout response.
+
+**AC-3: `CountRecentFailedLogins` includes MFA failures**
+Given an account with 2 `LoginFailed` and 3 `MfaLoginFailed` rows within the window
+When `CountRecentFailedLogins` is called
+Then it returns 5.
+
+**AC-4: `/login/mfa` is per-IP rate-limited**
+Given a single source IP sending `/login/mfa` requests at high volume across many fabricated step-1 tokens
+When the per-IP burst limit is exceeded
+Then subsequent requests from that IP are rejected at the rate-limit layer (HTTP 429 or equivalent), regardless of which account is targeted.
+
+**AC-5: Locked-out account at MFA step gets the same response shape**
+Given a locked-out account that still presents a valid step-1 token
+When `POST /login/mfa` is called
+Then the response code, body, and `Retry-After` header match the response of a locked-out account at `/login` (no new shape).
+
+**AC-6: Audit log records the right action**
+Given a wrong-TOTP rejection
+When the `audit_events` row is read
+Then `action = 'MfaLoginFailed'` (not `LoginFailed`), with email + IP + timestamp.
+
+**AC-7: Correct TOTP after partial failures resets nothing prematurely**
+Given an account with 2 prior MFA failures (under the threshold)
+When the user submits the correct TOTP
+Then verification succeeds AND the failure count is reset per the existing AZ-537 reset policy.
+
+## Non-Functional Requirements
+
+**Security**
+- Wire response from `/login/mfa` carries no extra information distinguishing "wrong code" from "locked out" beyond what AZ-537 already exposes at `/login`.
+
+**Performance**
+- The shared failure-accounting helper is hot-path. Must not add a network round-trip or extra DB transaction beyond what the password path already does.
+
+**Reliability**
+- Race condition on concurrent failures must not undercount — use the same locking / `RowVersion` pattern that AZ-537 uses (verify in code review).
+
+## Unit Tests
+
+| AC Ref | What to Test | Required Outcome |
+|--------|-------------|-----------------|
+| AC-1 | `MfaService.VerifyForLogin` 5 wrong TOTPs | 6th call throws lockout, `Retry-After` populated |
+| AC-2 | Mixed 3-password + 2-MFA | 6th throws lockout |
+| AC-3 | `CountRecentFailedLogins` with mixed actions | Returns combined count |
+| AC-6 | Audit-log row after wrong TOTP | `action = 'MfaLoginFailed'` |
+| AC-7 | Correct TOTP after 2 failures | Verify succeeds, failure count reset |
+
+## Blackbox Tests
+
+| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
+|--------|------------------------|-------------|-------------------|----------------|
+| AC-1 | Known MFA-enrolled account | 5 wrong-TOTP → 6th any-TOTP | Lockout + `Retry-After` | Security |
+| AC-2 | Same account | 3 wrong-pwd + 2 wrong-TOTP → 6th any | Lockout | Security |
+| AC-4 | Single IP, many step-1 tokens | Burst `/login/mfa` calls | HTTP 429 at threshold | Security |
+| AC-5 | Locked account, valid step-1 | `POST /login/mfa` | Identical shape to `/login` lockout response | Security |
+| AC-7 | Account with 2 prior MFA fails | Correct TOTP | Verify OK, count reset | Reliability |
+
+## Constraints
+
+- Re-use the AZ-537 `AuthConfig.LockoutOptions` and `RateLimitOptions` values — do not introduce a separate threshold tuned just for MFA.
+- The shared failure-accounting helper must live where both `UserService` and `MfaService` can reach it without one importing the other.
+- Audit-log writes happen in the same transaction as the failure-count increment to avoid drift between the two stores.
+
+## Risks & Mitigation
+
+**Risk 1: Helper extraction breaks AZ-537 behaviour**
+- *Risk*: Pulling the accounting code out of `ValidateUser` introduces a regression on the password path.
+- *Mitigation*: AZ-537's existing E2E tests are exercised at every test run; any regression appears immediately. Code review focuses on parity.
+
+**Risk 2: MFA step-up endpoints still unprotected**
+- *Risk*: `/users/me/mfa/{enroll,confirm,disable}` remain rate-unlimited; a stolen access token can brute-force MFA disable.
+- *Mitigation*: Tracked separately under F-AUTH-4 / AZ-NEW-7. Not in scope here.
+
+**Risk 3: Friendly false lockouts during legitimate roaming**
+- *Risk*: A user who fat-fingers their TOTP across two devices in quick succession may now lock out where they wouldn't before.
+- *Mitigation*: The threshold values are the same as AZ-537's already-shipping `/login` thresholds, which were sized for password fat-fingering. The risk is bounded by that prior tuning.
+
+**Risk 4: Test environment has rate-limit windows that interfere**
+- *Risk*: E2E tests that hit `/login/mfa` repeatedly may themselves be rate-limited.
+- *Mitigation*: Existing E2E test infrastructure already manages this for `/login` (per `AZ-189` test infrastructure). Re-use the same reset hooks.