mirror of
https://github.com/azaion/admin.git
synced 2026-06-21 15:41:08 +00:00
[AZ-556] [AZ-557] Close cycle-2 hotfix sprint, hand off to Run Tests
Archive AZ-556 + AZ-557 task specs, mark dependencies table 25/25 done (82/82 pts), write batch_06_cycle2_report.md and the sprint-level implementation_report_auth_modernization_cycle2_hotfix.md, advance _autodev_state.md to step 11 (Run Tests). Per implement skill step 16, the final-suite gate is owned by the test-run skill; not run here to avoid duplicate full runs. Co-authored-by: Cursor <cursoragent@cursor.com>
This commit is contained in:
@@ -0,0 +1,145 @@
|
||||
# Unify Login Error Codes To `InvalidCredentials` + Reorder `IsEnabled` Check
|
||||
|
||||
**Task**: AZ-556_unify_login_error_codes
|
||||
**Name**: Unify login error codes to `InvalidCredentials` + reorder `IsEnabled` check
|
||||
**Description**: `/login` returns distinguishable error codes (`NoEmailFound` vs `WrongPassword`) and additionally leaks disabled-account status by checking `IsEnabled` *after* password verification. Combined with the new per-account lockout, an attacker can pre-filter a credential-stuffing list to known-real accounts and selectively trigger lockout DoS. Collapse both paths to a single opaque `InvalidCredentials` code and move the `IsEnabled` check to BEFORE the password verify (timing-equivalent rejection).
|
||||
**Complexity**: 2 points
|
||||
**Dependencies**: None (touches AZ-537 lockout logic but that work is already shipped)
|
||||
**Component**: Services (UserService) + Common (BusinessException)
|
||||
**Tracker**: AZ-556
|
||||
**Epic**: AZ-530
|
||||
**CMMC ref**: IA.L2-3.5.11 (obscure feedback of authentication information), AC.L2-3.1.8 (limit unsuccessful login attempts)
|
||||
**Source**: `_docs/05_security/security_report_cycle2.md` F-AUTH-1 + F-AUTH-3 (High); `_docs/05_security/static_analysis_cycle2.md` §F-2026Q2-AUTH-1, §F-2026Q2-AUTH-3
|
||||
|
||||
## Problem
|
||||
|
||||
`Azaion.Services/UserService.ValidateUser` (~lines 120–148) and `Azaion.Common/BusinessException.cs` (codes 10 + 30, ~lines 33–52) expose two materially-distinguishable login failure signals:
|
||||
|
||||
1. `BusinessException(NoEmailFound)` — code 10, message "No such email found." — when the email doesn't exist.
|
||||
2. `BusinessException(WrongPassword)` — code 30, message "Passwords do not match." — when the email exists but the password is wrong.
|
||||
|
||||
A client can trivially separate "real account" from "unknown account" via this signal. Combined with the cycle-2 per-account lockout (AZ-537), an attacker can:
|
||||
- Enumerate real accounts at request volume.
|
||||
- Selectively trigger lockout on real accounts to DoS specific users.
|
||||
- Pre-filter credential-stuffing lists to maximise hit rate.
|
||||
|
||||
Separately, `ValidateUser` runs the password verify (Argon2id) *before* checking `IsEnabled`. A disabled account therefore takes the slow Argon2id path AND returns a different error from a wrong-password path — both timing and error-shape leak the disabled state.
|
||||
|
||||
## Outcome
|
||||
|
||||
- `/login` returns the same error code, HTTP status, response shape, and human-readable message for: unknown email, wrong password, and disabled account.
|
||||
- The new unified path takes effectively the same wall-clock time for all three rejection categories (constant-time within the resolution practical for a request-response API).
|
||||
- The order of checks in `ValidateUser` is: short-circuit `IsEnabled` first, then password verify, then lockout-on-failure accounting.
|
||||
- Audit log still distinguishes the three categories internally (so SecOps can analyse them) — the leak is only fixed at the wire.
|
||||
- Existing callers of `BusinessException` codes 10 and 30 continue to work; the codes themselves are deprecated in favour of the new `InvalidCredentials` code, with a migration plan documented in the BusinessException file.
|
||||
|
||||
## Scope
|
||||
|
||||
### Included
|
||||
|
||||
- Introduce a new `BusinessException` code (e.g. `InvalidCredentials`, code 70 or next-available) with a single opaque message.
|
||||
- Update `Azaion.Services/UserService.ValidateUser` to:
|
||||
- Look up the user (or get a `null` for unknown email).
|
||||
- If user is `null` OR `!IsEnabled`, perform a **dummy Argon2id verify** against a known constant hash to equalise timing, then throw `InvalidCredentials`. (The lockout accounting branch is skipped — there is nothing to lock out.)
|
||||
- If user exists and is enabled, run real Argon2id verify; on mismatch, run the existing failure-count + lockout pipeline, then throw `InvalidCredentials`.
|
||||
- On lockout-state-reached, also throw `InvalidCredentials` with the existing `Retry-After` header populated.
|
||||
- Update `Azaion.Services/AuditLog` callers: each rejection path still records its true reason (`LoginFailed_UnknownEmail`, `LoginFailed_WrongPassword`, `LoginFailed_AccountDisabled`) for internal forensics.
|
||||
- Update tests under `e2e/Azaion.E2E/Tests/` to assert the new unified wire response and verify the audit-log internal distinction.
|
||||
- Document the deprecation of codes 10 and 30 in a comment near their declaration (do not delete — there may be cross-workspace consumers).
|
||||
|
||||
### Excluded
|
||||
|
||||
- A full constant-time audit of every error path in admin — only the `/login` path is in scope.
|
||||
- Account-discovery via response timing on other endpoints (`/users/me/mfa/*` etc.). Tracked separately under F-AUTH-4 / AZ-NEW-7.
|
||||
- Changing the lockout policy itself — AZ-537 owns the policy; this ticket only changes which path leads to lockout accounting.
|
||||
- UI changes to map the new code. The UI already shows a generic "Invalid credentials" string for both codes today, so no UI change is required (verify during code review).
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
**AC-1: Unknown email returns `InvalidCredentials`**
|
||||
Given `POST /login` with email that does not exist in the `users` table
|
||||
When the request is processed
|
||||
Then the response is the same `InvalidCredentials` error code, HTTP status, and body as a wrong-password attempt on a known account.
|
||||
|
||||
**AC-2: Wrong password returns `InvalidCredentials`**
|
||||
Given `POST /login` with a known email and a wrong password
|
||||
When the request is processed
|
||||
Then the response is `InvalidCredentials`, AND the account's `failed_login_count` is incremented per the existing AZ-537 policy.
|
||||
|
||||
**AC-3: Disabled account returns `InvalidCredentials`**
|
||||
Given `POST /login` with a known email belonging to a disabled (`IsEnabled = false`) account
|
||||
When the request is processed
|
||||
Then the response is `InvalidCredentials`, AND the audit log records the rejection as `LoginFailed_AccountDisabled` internally.
|
||||
|
||||
**AC-4: `IsEnabled` checked before password verify**
|
||||
Given a disabled account
|
||||
When `ValidateUser` runs
|
||||
Then the password verify is **not** invoked on the real stored hash for that account. (Verified by an instrumented test that asserts no Argon2id-against-the-real-hash call occurs.)
|
||||
|
||||
**AC-5: Timing equivalence (smoke level)**
|
||||
Given 1000 paired requests — half "unknown email", half "known email wrong password"
|
||||
When request latency is measured at the API edge
|
||||
Then the median and p95 latencies of the two groups are within 5% of each other. (Not a constant-time crypto guarantee; this is a smoke ceiling against gross timing differences.)
|
||||
|
||||
**AC-6: Audit log still distinguishes internally**
|
||||
Given the three rejection categories
|
||||
When the `audit_events` table is read after a representative run
|
||||
Then each category produces a distinct internal action label, with email + IP + timestamp.
|
||||
|
||||
**AC-7: Lockout still triggers**
|
||||
Given a known enabled account hit with N wrong passwords (per AZ-537 policy)
|
||||
When the threshold is reached
|
||||
Then the account is locked AND the lockout response uses `InvalidCredentials` + the existing `Retry-After` header.
|
||||
|
||||
## Non-Functional Requirements
|
||||
|
||||
**Security**
|
||||
- The wire response carries no signal that distinguishes the three rejection categories — code, body, headers, AND timing within the AC-5 ceiling.
|
||||
|
||||
**Compatibility**
|
||||
- BusinessException codes 10 and 30 remain defined (deprecated, comment-marked) for any cross-workspace caller. Removal scheduled in a separate ticket only after a deprecation window.
|
||||
|
||||
## Unit Tests
|
||||
|
||||
| AC Ref | What to Test | Required Outcome |
|
||||
|--------|-------------|-----------------|
|
||||
| AC-1 | `ValidateUser` with unknown email | Throws `InvalidCredentials`, performs dummy verify |
|
||||
| AC-2 | `ValidateUser` with wrong password | Throws `InvalidCredentials`, increments failure count |
|
||||
| AC-3 | `ValidateUser` with disabled account | Throws `InvalidCredentials`, no real-hash verify |
|
||||
| AC-4 | Instrumented Argon2id wrapper | Real-hash verify not called for disabled account |
|
||||
| AC-6 | AuditLog write for each category | Distinct internal action label per rejection |
|
||||
| AC-7 | Threshold-reaching wrong-password sequence | Throws `InvalidCredentials` + `Retry-After` |
|
||||
|
||||
## Blackbox Tests
|
||||
|
||||
| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|
||||
|--------|------------------------|-------------|-------------------|----------------|
|
||||
| AC-1 | DB empty of test email | `POST /login` unknown | `InvalidCredentials`, identical body to AC-2 | Security |
|
||||
| AC-2 | Known account, wrong pwd | `POST /login` wrong | `InvalidCredentials`, failure count + 1 | — |
|
||||
| AC-3 | Known disabled account | `POST /login` correct pwd | `InvalidCredentials`, identical body to AC-1/AC-2 | Security |
|
||||
| AC-5 | 1000 paired requests | Latency p50, p95 | Within 5% | Security |
|
||||
| AC-7 | At-threshold account, one more wrong | `POST /login` | `InvalidCredentials` + `Retry-After` | — |
|
||||
|
||||
## Constraints
|
||||
|
||||
- The dummy Argon2id verify must use the same `AuthConfig` parameters as the real verify (same time/memory cost) so timing equalises authentically.
|
||||
- Audit log writes must NOT be skipped just because the wire-side error is unified — internal forensics depend on the distinction.
|
||||
- Lockout accounting MUST NOT run on the "unknown email" path (there is no row to update).
|
||||
|
||||
## Risks & Mitigation
|
||||
|
||||
**Risk 1: Dummy Argon2id verify becomes a DoS amplifier**
|
||||
- *Risk*: An attacker hitting `/login` with rotating unknown emails now consumes Argon2id CPU per request even though no real account exists.
|
||||
- *Mitigation*: This is the desired property — without it, the timing leak survives. The per-IP rate limiter (existing, from AZ-537) bounds the amplification.
|
||||
|
||||
**Risk 2: Constant test-hash leaks**
|
||||
- *Risk*: If the dummy verify uses a checked-in hash of a known password, an attacker who reads the binary can craft a request that "succeeds" against the dummy path.
|
||||
- *Mitigation*: The dummy verify path always throws `InvalidCredentials` regardless of result — the verify is run only for timing, not for control-flow.
|
||||
|
||||
**Risk 3: BusinessException code churn breaks cross-workspace verifiers**
|
||||
- *Risk*: Other admin-API consumers (gps-denied, satellite-provider) decode response bodies and may pattern-match on the old codes.
|
||||
- *Mitigation*: Old codes remain defined; new code is additive. Audit cross-workspace usage during code review.
|
||||
|
||||
**Risk 4: UI shows different strings for each old code**
|
||||
- *Risk*: UI may have branched on code 10 vs 30. If so, both branches now show the same message, but the UI continues to map both to "Invalid credentials".
|
||||
- *Mitigation*: Code review checklist: verify `ui/` workspace already maps codes 10/30 to the same display string. If not, file a UI ticket.
|
||||
@@ -0,0 +1,142 @@
|
||||
# Wire MFA Brute-Force Into Per-Account Lockout / Rate-Limit Pipeline
|
||||
|
||||
**Task**: AZ-557_mfa_brute_force_lockout
|
||||
**Name**: Wire MFA brute-force into per-account lockout / rate-limit pipeline
|
||||
**Description**: `MfaService.VerifyForLogin` validates the second-factor TOTP but never increments `failed_login_count` and is excluded from `AuditLog.CountRecentFailedLogins`. An attacker who has captured the step-1 token from a known account can brute-force the 6-digit TOTP at full request volume from rotating IPs without ever tripping lockout. Bring MFA failures into the same per-account lockout/rate-limit pipeline that AZ-537 built for `/login`.
|
||||
**Complexity**: 3 points
|
||||
**Dependencies**: AZ-537 (lockout pipeline), AZ-534 (MFA endpoints)
|
||||
**Component**: Services (MfaService, AuditLog, UserService) + Admin API
|
||||
**Tracker**: AZ-557
|
||||
**Epic**: AZ-530
|
||||
**CMMC ref**: IA.L2-3.5.11 (obscure feedback of authentication information), AC.L2-3.1.8 (limit unsuccessful login attempts)
|
||||
**Source**: `_docs/05_security/security_report_cycle2.md` F-AUTH-2 (High); `_docs/05_security/static_analysis_cycle2.md` §F-2026Q2-AUTH-2
|
||||
|
||||
## Problem
|
||||
|
||||
The cycle-2 auth pipeline has a gap between login factor 1 and factor 2:
|
||||
|
||||
- `Azaion.Services/UserService.ValidateUser` (AZ-537) tracks `failed_login_count`, enforces the per-account rate limit, and trips lockout when the threshold is crossed.
|
||||
- `Azaion.Services/MfaService.VerifyForLogin` (~lines 247–278) ALSO returns `Wrong code` on a failed TOTP, but it does NOT call into the lockout pipeline.
|
||||
- `Azaion.Services/AuditLog.CountRecentFailedLogins` (~lines 53–63) queries only `LoginFailed` events; it ignores `MfaLoginFailed`.
|
||||
|
||||
Concretely: an attacker who phishes (or steals via XSS, or sniffs from logs) a step-1 MFA token can hit `/login/mfa` at full request rate, trying all 10^6 TOTP candidates within the token's lifetime, from rotating source IPs. Per-IP rate-limit doesn't apply (rotates IPs). Per-account rate-limit doesn't apply (different code path). The account never locks out. This entirely defeats the second factor.
|
||||
|
||||
## Outcome
|
||||
|
||||
- A failed MFA verify increments the same `failed_login_count` that AZ-537 maintains for password failures.
|
||||
- `AuditLog.CountRecentFailedLogins` counts `MfaLoginFailed` events alongside `LoginFailed` events.
|
||||
- When the combined failed-count crosses the AZ-537 threshold, the account locks out — regardless of whether the failures were password-side, MFA-side, or mixed.
|
||||
- The MFA verify rejects with the same response shape it does today (no new error code on the wire), but a locked-out account at the MFA step now responds with the existing lockout response + `Retry-After`.
|
||||
- Per-IP rate-limit also applies to `/login/mfa` (defence in depth even if IPs aren't rotating fast enough).
|
||||
- Audit log still records the rejection category (`MfaLoginFailed` vs `LoginFailed`) internally so SecOps can analyse separately.
|
||||
|
||||
## Scope
|
||||
|
||||
### Included
|
||||
|
||||
- `Azaion.Services/MfaService.VerifyForLogin`:
|
||||
- On TOTP mismatch: call the shared failure-accounting path (extract from `UserService.ValidateUser` into a private helper or a tiny internal collaborator that both services use). Same increment, same threshold check, same `Retry-After` shape on lockout.
|
||||
- On lockout-state-reached during MFA verify: throw the same lockout response shape that the password path throws.
|
||||
- `Azaion.Services/AuditLog.CountRecentFailedLogins`: extend the query to `WHERE action IN ('LoginFailed', 'MfaLoginFailed')`.
|
||||
- `Azaion.AdminApi/Program.cs`: attach the existing `LoginPerIpPolicy` (or a parallel `MfaLoginPerIpPolicy` with the same parameters) to the `/login/mfa` endpoint.
|
||||
- Tests under `e2e/Azaion.E2E/Tests/`: add cases for the four failure-mix scenarios (5×password-fail → lock; 5×MFA-fail → lock; 3×password + 2×MFA → lock; 1×password + 4×MFA → lock). Plus the `/login/mfa` per-IP rate-limit smoke test.
|
||||
- Audit-log assertion: each rejection step writes the right internal action label.
|
||||
|
||||
### Excluded
|
||||
|
||||
- `/users/me/mfa/{enroll,confirm,disable}` rate limiting — that is F-AUTH-4 / AZ-NEW-7. Separate ticket because step-up auth there is different.
|
||||
- TOTP code reuse / replay detection beyond the existing window — out of scope.
|
||||
- Recovery-code brute-force protection — recovery codes are high-entropy (verified in security audit); not the same risk profile.
|
||||
- Cross-workspace verifier changes (gps-denied, satellite-provider, ui) — none required; this is admin-only.
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
**AC-1: 5 wrong TOTP attempts lock the account**
|
||||
Given a known account with valid step-1 token
|
||||
When 5 sequential `POST /login/mfa` calls with wrong TOTP are made (per AZ-537 policy threshold)
|
||||
Then the 6th call (any code, even the correct one) returns the lockout response with `Retry-After`.
|
||||
|
||||
**AC-2: Mixed-mode failures aggregate**
|
||||
Given a known account
|
||||
When 3 wrong-password attempts then 2 wrong-MFA attempts occur within the rate-limit window
|
||||
Then the 6th attempt (password-side OR MFA-side) returns the lockout response.
|
||||
|
||||
**AC-3: `CountRecentFailedLogins` includes MFA failures**
|
||||
Given an account with 2 `LoginFailed` and 3 `MfaLoginFailed` rows within the window
|
||||
When `CountRecentFailedLogins` is called
|
||||
Then it returns 5.
|
||||
|
||||
**AC-4: `/login/mfa` is per-IP rate-limited**
|
||||
Given a single source IP sending `/login/mfa` requests at high volume across many fabricated step-1 tokens
|
||||
When the per-IP burst limit is exceeded
|
||||
Then subsequent requests from that IP are rejected at the rate-limit layer (HTTP 429 or equivalent), regardless of which account is targeted.
|
||||
|
||||
**AC-5: Locked-out account at MFA step gets the same response shape**
|
||||
Given a locked-out account that still presents a valid step-1 token
|
||||
When `POST /login/mfa` is called
|
||||
Then the response code, body, and `Retry-After` header match the response of a locked-out account at `/login` (no new shape).
|
||||
|
||||
**AC-6: Audit log records the right action**
|
||||
Given a wrong-TOTP rejection
|
||||
When the `audit_events` row is read
|
||||
Then `action = 'MfaLoginFailed'` (not `LoginFailed`), with email + IP + timestamp.
|
||||
|
||||
**AC-7: Correct TOTP after partial failures resets nothing prematurely**
|
||||
Given an account with 2 prior MFA failures (under the threshold)
|
||||
When the user submits the correct TOTP
|
||||
Then verification succeeds AND the failure count is reset per the existing AZ-537 reset policy.
|
||||
|
||||
## Non-Functional Requirements
|
||||
|
||||
**Security**
|
||||
- Wire response from `/login/mfa` carries no extra information distinguishing "wrong code" from "locked out" beyond what AZ-537 already exposes at `/login`.
|
||||
|
||||
**Performance**
|
||||
- The shared failure-accounting helper is hot-path. Must not add a network round-trip or extra DB transaction beyond what the password path already does.
|
||||
|
||||
**Reliability**
|
||||
- Race condition on concurrent failures must not undercount — use the same locking / `RowVersion` pattern that AZ-537 uses (verify in code review).
|
||||
|
||||
## Unit Tests
|
||||
|
||||
| AC Ref | What to Test | Required Outcome |
|
||||
|--------|-------------|-----------------|
|
||||
| AC-1 | `MfaService.VerifyForLogin` 5 wrong TOTPs | 6th call throws lockout, `Retry-After` populated |
|
||||
| AC-2 | Mixed 3-password + 2-MFA | 6th throws lockout |
|
||||
| AC-3 | `CountRecentFailedLogins` with mixed actions | Returns combined count |
|
||||
| AC-6 | Audit-log row after wrong TOTP | `action = 'MfaLoginFailed'` |
|
||||
| AC-7 | Correct TOTP after 2 failures | Verify succeeds, failure count reset |
|
||||
|
||||
## Blackbox Tests
|
||||
|
||||
| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|
||||
|--------|------------------------|-------------|-------------------|----------------|
|
||||
| AC-1 | Known MFA-enrolled account | 5 wrong-TOTP → 6th any-TOTP | Lockout + `Retry-After` | Security |
|
||||
| AC-2 | Same account | 3 wrong-pwd + 2 wrong-TOTP → 6th any | Lockout | Security |
|
||||
| AC-4 | Single IP, many step-1 tokens | Burst `/login/mfa` calls | HTTP 429 at threshold | Security |
|
||||
| AC-5 | Locked account, valid step-1 | `POST /login/mfa` | Identical shape to `/login` lockout response | Security |
|
||||
| AC-7 | Account with 2 prior MFA fails | Correct TOTP | Verify OK, count reset | Reliability |
|
||||
|
||||
## Constraints
|
||||
|
||||
- Re-use the AZ-537 `AuthConfig.LockoutOptions` and `RateLimitOptions` values — do not introduce a separate threshold tuned just for MFA.
|
||||
- The shared failure-accounting helper must live where both `UserService` and `MfaService` can reach it without one importing the other.
|
||||
- Audit-log writes happen in the same transaction as the failure-count increment to avoid drift between the two stores.
|
||||
|
||||
## Risks & Mitigation
|
||||
|
||||
**Risk 1: Helper extraction breaks AZ-537 behaviour**
|
||||
- *Risk*: Pulling the accounting code out of `ValidateUser` introduces a regression on the password path.
|
||||
- *Mitigation*: AZ-537's existing E2E tests are exercised at every test run; any regression appears immediately. Code review focuses on parity.
|
||||
|
||||
**Risk 2: MFA step-up endpoints still unprotected**
|
||||
- *Risk*: `/users/me/mfa/{enroll,confirm,disable}` remain rate-unlimited; a stolen access token can brute-force MFA disable.
|
||||
- *Mitigation*: Tracked separately under F-AUTH-4 / AZ-NEW-7. Not in scope here.
|
||||
|
||||
**Risk 3: Friendly false lockouts during legitimate roaming**
|
||||
- *Risk*: A user who fat-fingers their TOTP across two devices in quick succession may now lock out where they wouldn't before.
|
||||
- *Mitigation*: The threshold values are the same as AZ-537's already-shipping `/login` thresholds, which were sized for password fat-fingering. The risk is bounded by that prior tuning.
|
||||
|
||||
**Risk 4: Test environment has rate-limit windows that interfere**
|
||||
- *Risk*: E2E tests that hit `/login/mfa` repeatedly may themselves be rate-limited.
|
||||
- *Mitigation*: Existing E2E test infrastructure already manages this for `/login` (per `AZ-189` test infrastructure). Re-use the same reset hooks.
|
||||
Reference in New Issue
Block a user