[AZ-534] TOTP-based 2FA at credential login

Add RFC 6238 TOTP enrollment, two-step /login flow, recovery codes, and
the amr=["pwd","mfa"] claim that propagates through refresh-token rotation.

- New endpoints: /users/me/mfa/{enroll,confirm,disable} and /login/mfa.
- /login short-circuits to a 5-min ES256 step-1 token (audience-pinned
  azaion-mfa-step2) when the user has MFA enabled; real access+refresh
  pair is minted only after /login/mfa.
- mfa_secret encrypted at rest via ASP.NET Core IDataProtector
  (purpose=Azaion.Mfa.Secret.v1; key folder configurable via
  DataProtection:KeysFolder for production persistence).
- Recovery codes (10 single-use, base32, ~80-bit entropy) hashed with
  SHA-256 and stored as JSONB; constant-time compare on lookup.
- RFC 6238 §5.2 replay defense via mfa_last_used_window per user.
- Sessions carry mfa_authenticated so /token/refresh re-stamps the
  amr claim correctly across the entire 30-day refresh window.
- New audit events: enroll, confirm, disable, login-success/failed,
  recovery-used.
- Schema: env/db/10_users_mfa.sql adds users.mfa_* columns and
  sessions.mfa_authenticated; mfa_recovery_codes mapped as BinaryJson
  in AzaionDbSchemaHolder; disable path uses raw parameterised SQL to
  avoid LinqToDB null-literal type-inference on jsonb columns.

E2E: 6 new tests in MfaLoginTests cover all six AC; full suite
82 passed / 0 failed / 3 intentional skips.

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-14 06:21:28 +03:00
parent 8e7c602f51
commit 1e1ded73f5
24 changed files with 1188 additions and 57 deletions
@@ -0,0 +1,75 @@
# Batch Report
**Batch**: 4 (cycle 2)
**Tasks**: AZ-534 (totp_2fa_login)
**Date**: 2026-05-14
**Total Complexity**: 5 points
**Epic**: AZ-529 — Auth Mechanism Modernization
## Task Results
| Task | Status | Files Modified | Tests | AC Coverage | Issues |
|--------|--------|-----------------------------------------|-------------|-------------|--------|
| AZ-534 | Done | 9 source + 1 sql migration + 1 test | 6/6 pass | 6/6 | None blocking — see review |
## Files Touched
**Source (production)**
- `Azaion.AdminApi/Program.cs` — DI for `IMfaService`; configure ASP.NET Core DataProtection (with optional `DataProtection:KeysFolder` for production persistence); `/login` short-circuits to step-1 token when `user.MfaEnabled`; new `/login/mfa` endpoint; new `/users/me/mfa/{enroll,confirm,disable}` endpoints; `IssueDualTokens` helper centralises access+refresh minting; `/token/refresh` propagates `amr` from the persisted `MfaAuthenticated` flag
- `Azaion.AdminApi/BusinessExceptionHandler.cs` — map `MfaAlreadyEnabled` / `MfaNotEnrolling` / `MfaNotEnabled` → 409, `InvalidMfaCode` / `InvalidMfaToken` → 401
- `Azaion.Common/BusinessException.cs` — add `MfaAlreadyEnabled = 56`, `MfaNotEnrolling = 57`, `MfaNotEnabled = 58`, `InvalidMfaCode = 59`, `InvalidMfaToken = 61`
- `Azaion.Common/Database/AzaionDbShemaHolder.cs` — `User.MfaRecoveryCodes` mapped to `DataType.BinaryJson` so Npgsql sends the JSONB type oid on insert/update
- `Azaion.Common/Entities/User.cs` — add `MfaEnabled`, `MfaSecret`, `MfaRecoveryCodes`, `MfaEnrolledAt`, `MfaLastUsedWindow`; sensitive fields `[JsonIgnore]`
- `Azaion.Common/Entities/Session.cs` — add `MfaAuthenticated` (preserves AMR strength across refresh rotations)
- `Azaion.Common/Entities/AuditEvent.cs` — new event type strings: `MfaEnroll`, `MfaConfirm`, `MfaDisable`, `MfaLoginSuccess`, `MfaLoginFailed`, `MfaRecoveryUsed`
- `Azaion.Common/Requests/MfaRequests.cs` — *new*; `MfaEnrollRequest`/`Response`, `MfaConfirmRequest`, `MfaDisableRequest`, `MfaRequiredResponse`, `MfaLoginRequest`
- `Azaion.Services/AuthService.cs` — `CreateToken` accepts optional `amr` collection; values stamped as repeated `amr` claims per RFC 8176
- `Azaion.Services/AuditLog.cs` — new `RecordMfa…` helpers
- `Azaion.Services/MfaService.cs` — *new*; TOTP enrol / confirm / disable / verify-for-login; ES256 step-1 token (5-min, audience-pinned `azaion-mfa-step2`); single-use recovery codes (SHA-256 hashed, JSONB-stored); RFC 6238 replay defence via `MfaLastUsedWindow`; `IDataProtector` encrypts `mfa_secret` at rest
- `Azaion.Services/RefreshTokenService.cs` — `IssueForNewLogin` accepts `mfaAuthenticated`; `Rotate` carries the flag forward to the new session row
**Migrations / infra**
- `env/db/10_users_mfa.sql` — *new*; ALTER TABLE adds `mfa_enabled` (default false), `mfa_secret` (text), `mfa_recovery_codes` (jsonb), `mfa_enrolled_at` (timestamp), `mfa_last_used_window` (bigint); `sessions.mfa_authenticated` (default false)
- `e2e/db-init/00_run_all.sh` — apply 10_users_mfa.sql in test DB
- `e2e/Azaion.E2E/Azaion.E2E.csproj` — add `Otp.NET` package (test-side TOTP code generation)
**Tests**
- `e2e/Azaion.E2E/Tests/MfaLoginTests.cs` — *new*; 6 tests (enrol payload shape, confirm activates, two-step login + amr, recovery single-use, disable round-trip, ciphertext-at-rest)
- `e2e/Azaion.E2E/Helpers/DbHelper.cs` — add `GetMfaSecretRaw`, `GetMfaEnabled`
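
For readers unfamiliar with what the test-side TOTP generation has to produce, the following is a minimal, language-neutral sketch of the RFC 6238 computation in Python. It is not the project's C# code (the E2E tests use `Otp.NET`); the function name `totp` and the constant `RFC_SECRET` are hypothetical, and HMAC-SHA1 with a 30-second step and 6 digits is assumed because those are the common authenticator defaults.

```python
import base64
import hashlib
import hmac
import struct


def totp(secret_b32: str, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Compute an RFC 6238 TOTP code (HMAC-SHA1) for the given Unix time."""
    key = base64.b32decode(secret_b32, casefold=True)
    counter = unix_time // step  # index of the time-step window
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation, RFC 4226 section 5.3
    value = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


# Hypothetical constant: the RFC 6238 Appendix B test secret, base32-encoded.
# 20 raw bytes encode to exactly 32 base32 characters with no padding.
RFC_SECRET = base64.b32encode(b"12345678901234567890").decode()
```

Calling `totp(RFC_SECRET, 59, digits=8)` reproduces the first SHA-1 test vector from RFC 6238 Appendix B, which is a quick way to sanity-check any alternative implementation.
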
## Test Run Results
**Batch 4 only** (`--filter MfaLoginTests`): **6 / 6 passed**, ~14 s.
**Full suite**: **82 passed, 0 failed, 3 skipped**, ~77 s.
The `PasswordHashingTests.AC5_Verify_uses_constant_time_comparator_no_obvious_timing_leak` flake noted in batch 3 review passed cleanly in this run, which is consistent with it being an environmental flake rather than a regression.
## AC Coverage
- **AC-1**: Enrol returns base32 `secret` (32 chars), `otpauth://` URL, base64 PNG QR, 10 recovery codes ≥12 chars; DB still `mfa_enabled=false` — `AC1_Enroll_returns_secret_otpauth_qr_and_recovery_codes`
- **AC-2**: Confirm with valid TOTP flips `mfa_enabled=true` — `AC2_Confirm_enables_MFA`
- **AC-3**: `/login` returns `{mfa_required, mfa_token, expires_in:300}` then `/login/mfa` returns access+refresh with `amr=["pwd","mfa"]` — `AC3_Login_returns_mfa_required_then_step2_returns_tokens_with_amr_pwd_mfa`
- **AC-4**: Recovery code works once (yields `amr=["pwd","mfa","recovery"]`); reuse rejected — `AC4_Recovery_code_works_once_then_fails`
- **AC-5**: `/users/me/mfa/disable` requires password + valid TOTP; subsequent `/login` returns access+refresh directly without step 2 — `AC5_Disable_requires_password_and_code_then_login_returns_tokens_directly`
- **AC-6**: `users.mfa_secret` read directly from Postgres is ciphertext (DataProtection envelope), not the base32 secret — `AC6_Mfa_secret_is_encrypted_at_rest`
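
The single-use behaviour that AC-4 exercises can be illustrated with a short Python sketch (again, not the project's C# code). All names here are hypothetical. Removing the redeemed hash from the stored list is an assumption about how "used" is tracked; the source only states that codes are SHA-256 hashed, stored as JSONB, and compared in constant time.

```python
import base64
import hashlib
import hmac
import secrets


def generate_recovery_codes(count: int = 10) -> list[str]:
    # 10 random bytes = 80 bits, which base32-encodes to exactly 16 characters
    return [base64.b32encode(secrets.token_bytes(10)).decode() for _ in range(count)]


def hash_code(code: str) -> str:
    # A fast hash is sufficient because each code already carries 80 bits of entropy
    return hashlib.sha256(code.encode()).hexdigest()


def redeem(stored_hashes: list[str], candidate: str) -> bool:
    """Consume the matching hash from stored_hashes; return whether one matched."""
    candidate_hash = hash_code(candidate.strip().upper())
    match_index = None
    for i, stored in enumerate(stored_hashes):
        # compare_digest is constant-time; every entry is visited, no early exit
        if hmac.compare_digest(stored, candidate_hash) and match_index is None:
            match_index = i
    if match_index is None:
        return False
    del stored_hashes[match_index]  # assumption: a redeemed code is removed
    return True
```

A second `redeem` call with the same code returns `False`, which mirrors the "works once, then fails" expectation of the AC-4 test.
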
## Key Implementation Decisions
1. **`IDataProtector` for `mfa_secret`, not a hand-rolled AES wrapper.** ASP.NET Core's DataProtection handles key generation, automatic 90-day rotation, and a versioned envelope format that survives key rolls without re-encrypting all rows. Custom AES-GCM would have given the same security guarantee but with three new test vectors and a manual rotation runbook. `Purpose = "Azaion.Mfa.Secret.v1"` namespaces the keys so an accidental cross-purpose decrypt fails. Key persistence is opt-in via `DataProtection:KeysFolder` — production deployments MUST set it (Program.cs comment is explicit), or restarts invalidate every enrolled secret.
2. **SHA-256 for recovery code hashing, not Argon2id.** Recovery codes are 16-character base32 strings (~80 bits of entropy from `KeyGeneration.GenerateRandomKey(10)`). Argon2id at the calibrated `~250 ms` cost would add 2.5 s to every wrong-code attempt (we walk all unused codes). High-entropy secrets need a fast hash, not a slow KDF — the same reasoning the refresh-token store uses. Constant-time compare via `CryptographicOperations.FixedTimeEquals` defends against timing oracles on the hash bytes.
3. **`mfa_authenticated` persisted on the session row, not re-derived from the access token.** Refresh-token rotation produces a brand-new access token; we'd otherwise have no source of truth for "was this session born of MFA?" once the original access token expires. Storing the boolean on the session lets `/token/refresh` re-stamp `amr=["pwd","mfa"]` correctly across the entire 30-day refresh window. Costs one boolean column.
4. **Step-1 MFA token is ES256, audience-pinned `azaion-mfa-step2`.** Re-uses the JWKS keypair so verifiers don't need to learn a second key. The narrow audience makes the main `JwtBearer` middleware reject this token for normal endpoints, and `MfaService.ValidateMfaStepToken` rejects any other audience — so a step-1 token cannot be presented at `/users/me`, and an access token cannot be presented at `/login/mfa`.
5. **`VerifyTotpCode` checks `lastUsedWindow > matchedWindow` first.** RFC 6238 §5.2 requires that the verifier MUST NOT accept a second attempt of an OTP once the first attempt has been validated successfully. `OtpNet.VerificationWindow.RfcSpecifiedNetworkDelay` accepts the previous, current, and next 30-second windows. Without the per-user `mfa_last_used_window` check, a man-in-the-middle who captured a code in flight could replay it for the 30-90 s during which it remains acceptable. Persisting the matched window costs one extra `UPDATE users` per successful login.
6. **Disable uses raw SQL parameter for the JSONB null.** LinqToDB's `UpdateAsync` lambda compiles `MfaRecoveryCodes = null` into an untyped `NULL` literal which Postgres parses as `text` and rejects against the `jsonb` column (42804). The `BinaryJson` mapping handles non-null values fine, but null literals in expression bodies bypass parameter typing. Switched the disable path to a single parameterised `UPDATE … SET mfa_recovery_codes = NULL::jsonb …`. Local fix, doesn't affect the enrol/confirm/login paths.
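
The replay defence from decision 5 can be sketched as follows, in Python rather than the project's C#. The function and parameter names are hypothetical, and `code_for_window` stands in for the real TOTP computation. This sketch rejects any matched window that is less than or equal to the stored one, so that reuse inside the same window also fails; the exact comparison inside `VerifyTotpCode` may differ.

```python
import hmac
from typing import Callable, Optional

STEP_SECONDS = 30


def verify_with_replay_guard(
    submitted: str,
    now: int,
    last_used_window: Optional[int],
    code_for_window: Callable[[int], str],
) -> Optional[int]:
    """Return the matched window to persist, or None if the code is rejected."""
    current = now // STEP_SECONDS
    for window in (current - 1, current, current + 1):  # previous, current, next
        if not hmac.compare_digest(code_for_window(window), submitted):
            continue
        if last_used_window is not None and window <= last_used_window:
            return None  # this window (or a later one) was already consumed
        return window
    return None
```

The caller would store the returned window as the new `mfa_last_used_window` in the same transaction that accepts the login, so two concurrent submissions of one code cannot both succeed.
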
## Backward Compatibility
- All new `users` columns default to MFA-off (`mfa_enabled=false`, others NULL). Existing rows untouched.
- Pre-existing `sessions` rows default `mfa_authenticated=false`; `/token/refresh` against an old session continues to issue `amr=["pwd"]` — same behaviour as before.
- `/login` response shape is unchanged for users without MFA enabled — no client-visible change for the existing CompanionPC fleet or any non-enrolled admin.
- `LoginResponse` and `LoginRequest` DTOs unchanged. The MFA branch returns a different DTO (`MfaRequiredResponse`); clients that don't recognise the `mfaRequired` field will see an unexpected payload — UI workspace ticket flagged in the spec under "Risks / Notes".
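
The refresh behaviour described above, where old sessions keep `amr=["pwd"]` and MFA-born sessions keep `amr=["pwd","mfa"]`, can be illustrated with a minimal Python sketch. The `Session` dataclass and the `amr_for` and `rotate` helpers are hypothetical stand-ins for the session row and `RefreshTokenService.Rotate`, not the actual C# types, and the incrementing `session_id` is only there to show that a new row is created.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Session:
    session_id: int
    mfa_authenticated: bool = False  # pre-existing rows default to False


def amr_for(session: Session) -> list[str]:
    # The amr claim is derived from the persisted flag, never from the old token
    return ["pwd", "mfa"] if session.mfa_authenticated else ["pwd"]


def rotate(old: Session) -> Session:
    # The new session row inherits the flag, so each later refresh re-stamps
    # the same amr value
    return replace(old, session_id=old.session_id + 1)
```

Because the flag is copied on every rotation, any number of refreshes inside the refresh window yields the same `amr` value as the original login.
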