[AZ-556] [AZ-557] Close cycle-2 hotfix sprint, hand off to Run Tests

Archive AZ-556 + AZ-557 task specs, mark dependencies table 25/25 done
(82/82 pts), write batch_06_cycle2_report.md and the sprint-level
implementation_report_auth_modernization_cycle2_hotfix.md, advance
_autodev_state.md to step 11 (Run Tests).

Per implement skill step 16, the final-suite gate is owned by the
test-run skill; not run here to avoid duplicate full runs.

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-14 09:59:23 +03:00
parent 4bf2e689cb
commit 8b7d8a4275
6 changed files with 141 additions and 11 deletions
@@ -0,0 +1,145 @@
# Unify Login Error Codes To `InvalidCredentials` + Reorder `IsEnabled` Check
**Task**: AZ-556_unify_login_error_codes
**Name**: Unify login error codes to `InvalidCredentials` + reorder `IsEnabled` check
**Description**: `/login` returns distinguishable error codes (`NoEmailFound` vs `WrongPassword`) and additionally leaks disabled-account status by checking `IsEnabled` *after* password verification. Combined with the new per-account lockout, an attacker can pre-filter a credential-stuffing list to known-real accounts and selectively trigger lockout DoS. Collapse both paths to a single opaque `InvalidCredentials` code and move the `IsEnabled` check to BEFORE the password verify (timing-equivalent rejection).
**Complexity**: 2 points
**Dependencies**: None (touches AZ-537 lockout logic but that work is already shipped)
**Component**: Services (UserService) + Common (BusinessException)
**Tracker**: AZ-556
**Epic**: AZ-530
**CMMC ref**: IA.L2-3.5.11 (obscure feedback of authentication information), AC.L2-3.1.8 (limit unsuccessful login attempts)
**Source**: `_docs/05_security/security_report_cycle2.md` F-AUTH-1 + F-AUTH-3 (High); `_docs/05_security/static_analysis_cycle2.md` §F-2026Q2-AUTH-1, §F-2026Q2-AUTH-3
## Problem
`Azaion.Services/UserService.ValidateUser` (~lines 120-148) and `Azaion.Common/BusinessException.cs` (codes 10 + 30, ~lines 33-52) expose two materially distinguishable login failure signals:
1. `BusinessException(NoEmailFound)` — code 10, message "No such email found." — when the email doesn't exist.
2. `BusinessException(WrongPassword)` — code 30, message "Passwords do not match." — when the email exists but the password is wrong.
A client can trivially separate "real account" from "unknown account" via this signal. Combined with the cycle-2 per-account lockout (AZ-537), an attacker can:
- Enumerate real accounts at request volume.
- Selectively trigger lockout on real accounts to DoS specific users.
- Pre-filter credential-stuffing lists to maximise hit rate.
Separately, `ValidateUser` runs the password verify (Argon2id) *before* checking `IsEnabled`. A disabled account therefore takes the slow Argon2id path AND returns a different error from a wrong-password path — both timing and error-shape leak the disabled state.
## Outcome
- `/login` returns the same error code, HTTP status, response shape, and human-readable message for: unknown email, wrong password, and disabled account.
- The new unified path takes effectively the same wall-clock time for all three rejection categories (constant-time within the resolution practical for a request-response API).
- The order of checks in `ValidateUser` is: short-circuit `IsEnabled` first, then password verify, then lockout-on-failure accounting.
- Audit log still distinguishes the three categories internally (so SecOps can analyse them) — the leak is only fixed at the wire.
- Existing callers of `BusinessException` codes 10 and 30 continue to work; the codes themselves are deprecated in favour of the new `InvalidCredentials` code, with a migration plan documented in the BusinessException file.
## Scope
### Included
- Introduce a new `BusinessException` code (e.g. `InvalidCredentials`, code 70 or next-available) with a single opaque message.
- Update `Azaion.Services/UserService.ValidateUser` to:
  - Look up the user (or get a `null` for unknown email).
  - If user is `null` OR `!IsEnabled`, perform a **dummy Argon2id verify** against a known constant hash to equalise timing, then throw `InvalidCredentials`. (The lockout accounting branch is skipped; there is nothing to lock out.)
  - If user exists and is enabled, run real Argon2id verify; on mismatch, run the existing failure-count + lockout pipeline, then throw `InvalidCredentials`.
  - On lockout-state-reached, also throw `InvalidCredentials` with the existing `Retry-After` header populated.
- Update `Azaion.Services/AuditLog` callers: each rejection path still records its true reason (`LoginFailed_UnknownEmail`, `LoginFailed_WrongPassword`, `LoginFailed_AccountDisabled`) for internal forensics.
- Update tests under `e2e/Azaion.E2E/Tests/` to assert the new unified wire response and verify the audit-log internal distinction.
- Document the deprecation of codes 10 and 30 in a comment near their declaration (do not delete — there may be cross-workspace consumers).
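
The `ValidateUser` ordering listed above can be sketched as follows. This is a minimal, language-neutral Python illustration, not the service's actual code: PBKDF2 from the standard library stands in for Argon2id, and `slow_hash`, `validate_user`, the `User` shape, the in-memory `users` dict, and the iteration count are all hypothetical names and values.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass

# Stand-in for Argon2id (assumption): PBKDF2 with a hypothetical cost.
ITERATIONS = 50_000


def slow_hash(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)


def slow_verify(password: str, salt: bytes, expected: bytes) -> bool:
    return hmac.compare_digest(slow_hash(password, salt), expected)


class InvalidCredentials(Exception):
    """Single opaque wire-side error for every rejection category."""


@dataclass
class User:
    email: str
    salt: bytes
    password_hash: bytes
    is_enabled: bool = True
    failed_login_count: int = 0


# Dummy hash built once, with the same cost parameters as real hashes.
DUMMY_SALT = os.urandom(16)
DUMMY_HASH = slow_hash("not-a-real-password", DUMMY_SALT)


def validate_user(users: dict, audit: list, email: str, password: str) -> User:
    user = users.get(email)
    if user is None or not user.is_enabled:
        # The result is discarded on purpose: this verify runs for timing only.
        slow_verify(password, DUMMY_SALT, DUMMY_HASH)
        reason = ("LoginFailed_UnknownEmail" if user is None
                  else "LoginFailed_AccountDisabled")
        audit.append((reason, email))  # internal label, never sent on the wire
        raise InvalidCredentials()
    if not slow_verify(password, user.salt, user.password_hash):
        user.failed_login_count += 1  # lockout accounting only on this path
        audit.append(("LoginFailed_WrongPassword", email))
        raise InvalidCredentials()
    return user
```

Because the dummy path raises regardless of the verify result, a caller who happens to know the dummy password gains nothing, which is the property Risk 2 relies on.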
### Excluded
- A full constant-time audit of every error path in admin — only the `/login` path is in scope.
- Account-discovery via response timing on other endpoints (`/users/me/mfa/*` etc.). Tracked separately under F-AUTH-4 / AZ-NEW-7.
- Changing the lockout policy itself — AZ-537 owns the policy; this ticket only changes which path leads to lockout accounting.
- UI changes to map the new code. The UI already shows a generic "Invalid credentials" string for both codes today, so no UI change is required (verify during code review).
## Acceptance Criteria
**AC-1: Unknown email returns `InvalidCredentials`**
Given `POST /login` with email that does not exist in the `users` table
When the request is processed
Then the response is the same `InvalidCredentials` error code, HTTP status, and body as a wrong-password attempt on a known account.
**AC-2: Wrong password returns `InvalidCredentials`**
Given `POST /login` with a known email and a wrong password
When the request is processed
Then the response is `InvalidCredentials`, AND the account's `failed_login_count` is incremented per the existing AZ-537 policy.
**AC-3: Disabled account returns `InvalidCredentials`**
Given `POST /login` with a known email belonging to a disabled (`IsEnabled = false`) account
When the request is processed
Then the response is `InvalidCredentials`, AND the audit log records the rejection as `LoginFailed_AccountDisabled` internally.
**AC-4: `IsEnabled` checked before password verify**
Given a disabled account
When `ValidateUser` runs
Then the password verify is **not** invoked on the real stored hash for that account. (Verified by an instrumented test that asserts no Argon2id-against-the-real-hash call occurs.)
**AC-5: Timing equivalence (smoke level)**
Given 1000 paired requests — half "unknown email", half "known email wrong password"
When request latency is measured at the API edge
Then the median and p95 latencies of the two groups are within 5% of each other. (Not a constant-time crypto guarantee; this is a smoke ceiling against gross timing differences.)
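
One way to evaluate the AC-5 ceiling is sketched below. The spec does not define "within 5%", so this sketch assumes a relative difference measured against the larger of the two values; `percentile` (nearest-rank) and `within_ceiling` are hypothetical helper names.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile for pct in (0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def within_ceiling(group_a, group_b, tolerance=0.05):
    """True when both median and p95 differ by at most `tolerance` (relative)."""
    stats = (statistics.median, lambda s: percentile(s, 95))
    for stat in stats:
        a, b = stat(group_a), stat(group_b)
        if abs(a - b) > tolerance * max(a, b):
            return False
    return True
```

A test harness would collect the per-request latencies for the "unknown email" and "wrong password" groups and fail the run when `within_ceiling` returns `False`.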
**AC-6: Audit log still distinguishes internally**
Given the three rejection categories
When the `audit_events` table is read after a representative run
Then each category produces a distinct internal action label, with email + IP + timestamp.
**AC-7: Lockout still triggers**
Given a known enabled account hit with N wrong passwords (per AZ-537 policy)
When the threshold is reached
Then the account is locked AND the lockout response uses `InvalidCredentials` + the existing `Retry-After` header.
## Non-Functional Requirements
**Security**
- The wire response carries no signal that distinguishes the three rejection categories — code, body, headers, AND timing within the AC-5 ceiling.
**Compatibility**
- BusinessException codes 10 and 30 remain defined (deprecated, comment-marked) for any cross-workspace caller. Removal scheduled in a separate ticket only after a deprecation window.
## Unit Tests
| AC Ref | What to Test | Required Outcome |
|--------|-------------|-----------------|
| AC-1 | `ValidateUser` with unknown email | Throws `InvalidCredentials`, performs dummy verify |
| AC-2 | `ValidateUser` with wrong password | Throws `InvalidCredentials`, increments failure count |
| AC-3 | `ValidateUser` with disabled account | Throws `InvalidCredentials`, no real-hash verify |
| AC-4 | Instrumented Argon2id wrapper | Real-hash verify not called for disabled account |
| AC-6 | AuditLog write for each category | Distinct internal action label per rejection |
| AC-7 | Threshold-reaching wrong-password sequence | Throws `InvalidCredentials` + `Retry-After` |
## Blackbox Tests
| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|------------------------|-------------|-------------------|----------------|
| AC-1 | DB empty of test email | `POST /login` unknown | `InvalidCredentials`, identical body to AC-2 | Security |
| AC-2 | Known account, wrong pwd | `POST /login` wrong | `InvalidCredentials`, failure count + 1 | — |
| AC-3 | Known disabled account | `POST /login` correct pwd | `InvalidCredentials`, identical body to AC-1/AC-2 | Security |
| AC-5 | 1000 paired requests | Latency p50, p95 | Within 5% | Security |
| AC-7 | At-threshold account, one more wrong | `POST /login` | `InvalidCredentials` + `Retry-After` | — |
## Constraints
- The dummy Argon2id verify must use the same `AuthConfig` parameters as the real verify (same time/memory cost) so timing equalises authentically.
- Audit log writes must NOT be skipped just because the wire-side error is unified — internal forensics depend on the distinction.
- Lockout accounting MUST NOT run on the "unknown email" path (there is no row to update).
## Risks & Mitigation
**Risk 1: Dummy Argon2id verify becomes a DoS amplifier**
- *Risk*: An attacker hitting `/login` with rotating unknown emails now consumes Argon2id CPU per request even though no real account exists.
- *Mitigation*: This is the desired property — without it, the timing leak survives. The per-IP rate limiter (existing, from AZ-537) bounds the amplification.
**Risk 2: Constant test-hash leaks**
- *Risk*: If the dummy verify uses a checked-in hash of a known password, an attacker who reads the binary can craft a request that "succeeds" against the dummy path.
- *Mitigation*: The dummy verify path always throws `InvalidCredentials` regardless of result — the verify is run only for timing, not for control-flow.
**Risk 3: BusinessException code churn breaks cross-workspace verifiers**
- *Risk*: Other admin-API consumers (gps-denied, satellite-provider) decode response bodies and may pattern-match on the old codes.
- *Mitigation*: Old codes remain defined; new code is additive. Audit cross-workspace usage during code review.
**Risk 4: UI shows different strings for each old code**
- *Risk*: The UI may have branched on code 10 vs 30 and shown a different string for each. The Excluded scope assumes it already maps both to the same "Invalid credentials" string, but that assumption is unverified.
- *Mitigation*: Code review checklist: verify `ui/` workspace already maps codes 10/30 to the same display string. If not, file a UI ticket.
@@ -0,0 +1,142 @@
# Wire MFA Brute-Force Into Per-Account Lockout / Rate-Limit Pipeline
**Task**: AZ-557_mfa_brute_force_lockout
**Name**: Wire MFA brute-force into per-account lockout / rate-limit pipeline
**Description**: `MfaService.VerifyForLogin` validates the second-factor TOTP but never increments `failed_login_count` and is excluded from `AuditLog.CountRecentFailedLogins`. An attacker who has captured the step-1 token from a known account can brute-force the 6-digit TOTP at full request volume from rotating IPs without ever tripping lockout. Bring MFA failures into the same per-account lockout/rate-limit pipeline that AZ-537 built for `/login`.
**Complexity**: 3 points
**Dependencies**: AZ-537 (lockout pipeline), AZ-534 (MFA endpoints)
**Component**: Services (MfaService, AuditLog, UserService) + Admin API
**Tracker**: AZ-557
**Epic**: AZ-530
**CMMC ref**: IA.L2-3.5.11 (obscure feedback of authentication information), AC.L2-3.1.8 (limit unsuccessful login attempts)
**Source**: `_docs/05_security/security_report_cycle2.md` F-AUTH-2 (High); `_docs/05_security/static_analysis_cycle2.md` §F-2026Q2-AUTH-2
## Problem
The cycle-2 auth pipeline has a gap between login factor 1 and factor 2:
- `Azaion.Services/UserService.ValidateUser` (AZ-537) tracks `failed_login_count`, enforces the per-account rate limit, and trips lockout when the threshold is crossed.
- `Azaion.Services/MfaService.VerifyForLogin` (~lines 247-278) returns `Wrong code` on a failed TOTP, but it does NOT call into the lockout pipeline.
- `Azaion.Services/AuditLog.CountRecentFailedLogins` (~lines 53-63) queries only `LoginFailed` events; it ignores `MfaLoginFailed`.
Concretely: an attacker who phishes (or steals via XSS, or sniffs from logs) a step-1 MFA token can hit `/login/mfa` at full request rate, trying all 10^6 TOTP candidates within the token's lifetime, from rotating source IPs. The per-IP rate limit doesn't help (the attacker rotates IPs). The per-account rate limit doesn't apply (MFA verification is a different code path). The account never locks out. This entirely defeats the second factor.
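
The scale of the gap can be illustrated with a small calculation. The request rate and token lifetime below are hypothetical, and the model treats every guess as an independent 1-in-10^6 chance, which is a simplification of how TOTP windows rotate.

```python
CODE_SPACE = 10**6  # 6-digit TOTP


def guess_probability(attempts: int, code_space: int = CODE_SPACE) -> float:
    """Chance that at least one of `attempts` independent uniform guesses hits."""
    return 1.0 - (1.0 - 1.0 / code_space) ** attempts


# Hypothetical attacker: 1,000 requests/s for a 300 s step-1 token lifetime.
unthrottled = guess_probability(1_000 * 300)

# With lockout after 5 failures, only 5 guesses land before the account locks.
throttled = guess_probability(5)

print(f"unthrottled: {unthrottled:.3f}, throttled: {throttled:.6f}")
```

Under these assumed numbers the unthrottled attacker succeeds roughly one time in four, while a 5-attempt lockout cuts the chance per lockout window to about five in a million.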
## Outcome
- A failed MFA verify increments the same `failed_login_count` that AZ-537 maintains for password failures.
- `AuditLog.CountRecentFailedLogins` counts `MfaLoginFailed` events alongside `LoginFailed` events.
- When the combined failed-count crosses the AZ-537 threshold, the account locks out — regardless of whether the failures were password-side, MFA-side, or mixed.
- The MFA verify rejects with the same response shape it does today (no new error code on the wire), but a locked-out account at the MFA step now responds with the existing lockout response + `Retry-After`.
- Per-IP rate-limit also applies to `/login/mfa` (defence in depth against attackers who do not rotate IPs fast enough to stay under the limit).
- Audit log still records the rejection category (`MfaLoginFailed` vs `LoginFailed`) internally so SecOps can analyse separately.
## Scope
### Included
- `Azaion.Services/MfaService.VerifyForLogin`:
  - On TOTP mismatch: call the shared failure-accounting path (extract from `UserService.ValidateUser` into a private helper or a tiny internal collaborator that both services use). Same increment, same threshold check, same `Retry-After` shape on lockout.
  - On lockout-state-reached during MFA verify: throw the same lockout response shape that the password path throws.
- `Azaion.Services/AuditLog.CountRecentFailedLogins`: extend the query to `WHERE action IN ('LoginFailed', 'MfaLoginFailed')`.
- `Azaion.AdminApi/Program.cs`: attach the existing `LoginPerIpPolicy` (or a parallel `MfaLoginPerIpPolicy` with the same parameters) to the `/login/mfa` endpoint.
- Tests under `e2e/Azaion.E2E/Tests/`: add cases for the four failure-mix scenarios (5×password-fail → lock; 5×MFA-fail → lock; 3×password + 2×MFA → lock; 1×password + 4×MFA → lock). Plus the `/login/mfa` per-IP rate-limit smoke test.
- Audit-log assertion: each rejection step writes the right internal action label.
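
The shared failure-accounting collaborator described above could take roughly the following shape. This is a Python sketch under stated assumptions: `FailureAccounting`, `Account`, `LockedOut`, and the 900-second lockout duration are hypothetical, and the threshold of 5 is taken from the test scenarios in this spec rather than from the AZ-537 configuration itself.

```python
import math
from dataclasses import dataclass


class LockedOut(Exception):
    def __init__(self, retry_after_seconds: int):
        super().__init__("locked out")
        self.retry_after_seconds = retry_after_seconds  # feeds Retry-After


@dataclass
class Account:
    email: str
    failed_login_count: int = 0
    locked_until: float = 0.0


class FailureAccounting:
    """Hypothetical collaborator called by both the password and MFA paths."""

    def __init__(self, threshold: int = 5, lockout_seconds: int = 900):
        self.threshold = threshold
        self.lockout_seconds = lockout_seconds

    def ensure_not_locked(self, account: Account, now: float) -> None:
        if now < account.locked_until:
            raise LockedOut(math.ceil(account.locked_until - now))

    def record_failure(self, account: Account, action: str,
                       audit: list, now: float) -> None:
        # One counter for both factors; the audit label keeps them apart.
        account.failed_login_count += 1
        audit.append((action, account.email))
        if account.failed_login_count >= self.threshold:
            account.locked_until = now + self.lockout_seconds

    def record_success(self, account: Account) -> None:
        account.failed_login_count = 0
```

Keeping the counter in one place means the 3×password + 2×MFA scenario locks the account on the same code path as 5×password, so neither service needs to import the other.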
### Excluded
- `/users/me/mfa/{enroll,confirm,disable}` rate limiting — that is F-AUTH-4 / AZ-NEW-7. Separate ticket because step-up auth there is different.
- TOTP code reuse / replay detection beyond the existing window — out of scope.
- Recovery-code brute-force protection — recovery codes are high-entropy (verified in security audit); not the same risk profile.
- Cross-workspace verifier changes (gps-denied, satellite-provider, ui) — none required; this is admin-only.
## Acceptance Criteria
**AC-1: 5 wrong TOTP attempts lock the account**
Given a known account with valid step-1 token
When 5 sequential `POST /login/mfa` calls with wrong TOTP are made (per AZ-537 policy threshold)
Then the 6th call (any code, even the correct one) returns the lockout response with `Retry-After`.
**AC-2: Mixed-mode failures aggregate**
Given a known account
When 3 wrong-password attempts then 2 wrong-MFA attempts occur within the rate-limit window
Then the 6th attempt (password-side OR MFA-side) returns the lockout response.
**AC-3: `CountRecentFailedLogins` includes MFA failures**
Given an account with 2 `LoginFailed` and 3 `MfaLoginFailed` rows within the window
When `CountRecentFailedLogins` is called
Then it returns 5.
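
The AC-3 expectation can be sketched against an in-memory SQLite table. The `audit_events` column names (`email`, `action`, `created_at`) and the helper names are assumptions for illustration; the real schema and query layer may differ.

```python
import sqlite3

COUNTED_ACTIONS = ("LoginFailed", "MfaLoginFailed")


def make_audit_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE audit_events ("
        "email TEXT NOT NULL, action TEXT NOT NULL, created_at REAL NOT NULL)"
    )
    return conn


def count_recent_failed_logins(conn, email: str, since_ts: float) -> int:
    # Bind one placeholder per counted action instead of formatting values in.
    placeholders = ",".join("?" for _ in COUNTED_ACTIONS)
    row = conn.execute(
        "SELECT COUNT(*) FROM audit_events "
        "WHERE email = ? AND created_at >= ? "
        f"AND action IN ({placeholders})",
        (email, since_ts, *COUNTED_ACTIONS),
    ).fetchone()
    return row[0]
```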
**AC-4: `/login/mfa` is per-IP rate-limited**
Given a single source IP sending `/login/mfa` requests at high volume across many fabricated step-1 tokens
When the per-IP burst limit is exceeded
Then subsequent requests from that IP are rejected at the rate-limit layer (HTTP 429 or equivalent), regardless of which account is targeted.
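
The behaviour AC-4 asks for can be illustrated with a fixed-window counter keyed by source IP. Whether the existing `LoginPerIpPolicy` uses a fixed window is not stated in this spec, so the algorithm, the class name `FixedWindowPerIpLimiter`, and its parameters are assumptions.

```python
class FixedWindowPerIpLimiter:
    """Allows at most `permit_limit` requests per IP in each time window."""

    def __init__(self, permit_limit: int, window_seconds: float):
        self.permit_limit = permit_limit
        self.window_seconds = window_seconds
        self._state = {}  # ip -> (window_index, count)

    def allow(self, ip: str, now: float) -> bool:
        window = int(now // self.window_seconds)
        last_window, count = self._state.get(ip, (window, 0))
        if last_window != window:
            count = 0  # a new window started; forget the old count
        if count >= self.permit_limit:
            self._state[ip] = (window, count)
            return False  # the caller would answer HTTP 429 here
        self._state[ip] = (window, count + 1)
        return True
```

The limiter never looks at the step-1 token or the target account, which is why it rejects a burst from one IP regardless of which account is targeted.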
**AC-5: Locked-out account at MFA step gets the same response shape**
Given a locked-out account that still presents a valid step-1 token
When `POST /login/mfa` is called
Then the response code, body, and `Retry-After` header match the response of a locked-out account at `/login` (no new shape).
**AC-6: Audit log records the right action**
Given a wrong-TOTP rejection
When the `audit_events` row is read
Then `action = 'MfaLoginFailed'` (not `LoginFailed`), with email + IP + timestamp.
**AC-7: Correct TOTP after partial failures succeeds and resets the count per policy**
Given an account with 2 prior MFA failures (under the threshold)
When the user submits the correct TOTP
Then verification succeeds AND the failure count is reset per the existing AZ-537 reset policy.
## Non-Functional Requirements
**Security**
- Wire response from `/login/mfa` carries no extra information distinguishing "wrong code" from "locked out" beyond what AZ-537 already exposes at `/login`.
**Performance**
- The shared failure-accounting helper is hot-path. Must not add a network round-trip or extra DB transaction beyond what the password path already does.
**Reliability**
- Race condition on concurrent failures must not undercount — use the same locking / `RowVersion` pattern that AZ-537 uses (verify in code review).
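
A row-version (optimistic concurrency) increment that cannot undercount might look like the sketch below. The spec leaves the exact AZ-537 pattern to code review, so this is only one possible shape: the `users` columns, `make_users_db`, and `increment_failed_count` are hypothetical, and SQLite stands in for the real database.

```python
import sqlite3


def make_users_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users ("
        "email TEXT PRIMARY KEY, "
        "failed_login_count INTEGER NOT NULL DEFAULT 0, "
        "row_version INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def increment_failed_count(conn, email: str, max_retries: int = 5) -> int:
    for _ in range(max_retries):
        row = conn.execute(
            "SELECT failed_login_count, row_version FROM users WHERE email = ?",
            (email,),
        ).fetchone()
        if row is None:
            raise KeyError(email)
        count, version = row
        # The write only lands if nobody bumped row_version since the read.
        cur = conn.execute(
            "UPDATE users SET failed_login_count = ?, row_version = ? "
            "WHERE email = ? AND row_version = ?",
            (count + 1, version + 1, email, version),
        )
        conn.commit()
        if cur.rowcount == 1:
            return count + 1
        # Lost the race: re-read and retry so the failure is still counted.
    raise RuntimeError("could not record failure after retries")
```

A writer that loses the race re-reads and retries instead of overwriting, so two concurrent failures always advance the count by two.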
## Unit Tests
| AC Ref | What to Test | Required Outcome |
|--------|-------------|-----------------|
| AC-1 | `MfaService.VerifyForLogin` 5 wrong TOTPs | 6th call throws lockout, `Retry-After` populated |
| AC-2 | Mixed 3-password + 2-MFA | 6th throws lockout |
| AC-3 | `CountRecentFailedLogins` with mixed actions | Returns combined count |
| AC-6 | Audit-log row after wrong TOTP | `action = 'MfaLoginFailed'` |
| AC-7 | Correct TOTP after 2 failures | Verify succeeds, failure count reset |
## Blackbox Tests
| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|------------------------|-------------|-------------------|----------------|
| AC-1 | Known MFA-enrolled account | 5 wrong-TOTP → 6th any-TOTP | Lockout + `Retry-After` | Security |
| AC-2 | Same account | 3 wrong-pwd + 2 wrong-TOTP → 6th any | Lockout | Security |
| AC-4 | Single IP, many step-1 tokens | Burst `/login/mfa` calls | HTTP 429 at threshold | Security |
| AC-5 | Locked account, valid step-1 | `POST /login/mfa` | Identical shape to `/login` lockout response | Security |
| AC-7 | Account with 2 prior MFA fails | Correct TOTP | Verify OK, count reset | Reliability |
## Constraints
- Re-use the AZ-537 `AuthConfig.LockoutOptions` and `RateLimitOptions` values — do not introduce a separate threshold tuned just for MFA.
- The shared failure-accounting helper must live where both `UserService` and `MfaService` can reach it without one importing the other.
- Audit-log writes happen in the same transaction as the failure-count increment to avoid drift between the two stores.
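
The same-transaction constraint can be sketched with SQLite, where a connection used as a context manager commits on success and rolls back on any exception. The table layouts and the names `make_db` and `record_failure` are assumptions for illustration only.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users ("
        "email TEXT PRIMARY KEY, "
        "failed_login_count INTEGER NOT NULL DEFAULT 0)"
    )
    conn.execute(
        "CREATE TABLE audit_events ("
        "email TEXT NOT NULL, action TEXT NOT NULL, ip TEXT NOT NULL)"
    )
    return conn


def record_failure(conn, email: str, action: str, ip: str) -> None:
    # Both writes commit together, or neither does.
    with conn:
        conn.execute(
            "UPDATE users SET failed_login_count = failed_login_count + 1 "
            "WHERE email = ?",
            (email,),
        )
        conn.execute(
            "INSERT INTO audit_events (email, action, ip) VALUES (?, ?, ?)",
            (email, action, ip),
        )
```

If the audit insert fails, the counter increment is rolled back with it, so the two stores cannot drift apart.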
## Risks & Mitigation
**Risk 1: Helper extraction breaks AZ-537 behaviour**
- *Risk*: Pulling the accounting code out of `ValidateUser` introduces a regression on the password path.
- *Mitigation*: AZ-537's existing E2E tests are exercised at every test run; any regression appears immediately. Code review focuses on parity.
**Risk 2: MFA step-up endpoints still unprotected**
- *Risk*: `/users/me/mfa/{enroll,confirm,disable}` remain rate-unlimited; a stolen access token can brute-force MFA disable.
- *Mitigation*: Tracked separately under F-AUTH-4 / AZ-NEW-7. Not in scope here.
**Risk 3: Friendly false lockouts during legitimate roaming**
- *Risk*: A user who fat-fingers their TOTP across two devices in quick succession may now lock out where they wouldn't before.
- *Mitigation*: The threshold values are the same as AZ-537's already-shipping `/login` thresholds, which were sized for password fat-fingering. The risk is bounded by that prior tuning.
**Risk 4: Test environment has rate-limit windows that interfere**
- *Risk*: E2E tests that hit `/login/mfa` repeatedly may themselves be rate-limited.
- *Mitigation*: Existing E2E test infrastructure already manages this for `/login` (per `AZ-189` test infrastructure). Re-use the same reset hooks.