Files
Oleksandr Bezdieniezhnykh 1e1ded73f5
ci/woodpecker/push/01-test Pipeline failed
ci/woodpecker/push/02-build-push unknown status
[AZ-534] TOTP-based 2FA at credential login
Add RFC 6238 TOTP enrollment, two-step /login flow, recovery codes, and
the amr=["pwd","mfa"] claim that propagates through refresh-token rotation.

- New endpoints: /users/me/mfa/{enroll,confirm,disable} and /login/mfa.
- /login short-circuits to a 5-min ES256 step-1 token (audience-pinned
  azaion-mfa-step2) when the user has MFA enabled; real access+refresh
  pair is minted only after /login/mfa.
- mfa_secret encrypted at rest via ASP.NET Core IDataProtector
  (purpose=Azaion.Mfa.Secret.v1; key folder configurable via
  DataProtection:KeysFolder for production persistence).
- Recovery codes (10 single-use, base32, ~80-bit entropy) hashed with
  SHA-256 and stored as JSONB; constant-time compare on lookup.
- RFC 6238 §5.2 replay defense via mfa_last_used_window per user.
- Sessions carry mfa_authenticated so /token/refresh re-stamps the
  amr claim correctly across the entire 30-day refresh window.
- New audit events: enroll, confirm, disable, login-success/failed,
  recovery-used.
- Schema: env/db/10_users_mfa.sql adds users.mfa_* columns and
  sessions.mfa_authenticated; mfa_recovery_codes mapped as BinaryJson
  in AzaionDbSchemaHolder; disable path uses raw parameterised SQL to
  avoid LinqToDB null-literal type-inference on jsonb columns.

E2E: 6 new tests in MfaLoginTests cover all six AC; full suite
82 passed / 0 failed / 3 intentional skips.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-14 06:21:28 +03:00

7.1 KiB
Raw Permalink Blame History

Code Review Report

Batch: 4 (cycle 2) — AZ-534 (totp_2fa_login) Date: 2026-05-14 Verdict: PASS_WITH_WARNINGS

Phases Covered

  • Phase 1: Context loading (read AZ-534 spec + Program.cs / MfaService / AuthService / RefreshTokenService deltas)
  • Phase 2: Spec compliance (6/6 ACs covered, see below)
  • Phase 3: Code quality (SOLID, naming, error handling, complexity)
  • Phase 4: Security quick-scan (TOTP replay, recovery codes, encryption-at-rest, step-1 token audience pinning, AMR propagation)
  • Phase 5: Performance scan (per-login DB writes, recovery-code verification cost)
  • Phase 6: Cross-task consistency (sessions schema reused; AMR claim feeds future AZ-533 mission gate)
  • Phase 7: Architecture compliance (DI registration follows existing pattern; no new cross-component imports; ProjectReference layering respected)

AC Coverage

AC Test Status
1 MfaLoginTests.AC1_Enroll_returns_secret_otpauth_qr_and_recovery_codes Covered
2 MfaLoginTests.AC2_Confirm_enables_MFA Covered
3 MfaLoginTests.AC3_Login_returns_mfa_required_then_step2_returns_tokens_with_amr_pwd_mfa Covered
4 MfaLoginTests.AC4_Recovery_code_works_once_then_fails Covered
5 MfaLoginTests.AC5_Disable_requires_password_and_code_then_login_returns_tokens_directly Covered
6 MfaLoginTests.AC6_Mfa_secret_is_encrypted_at_rest Covered

6 of 6 acceptance criteria covered by running tests.

Findings

# Severity Category File Title
1 Medium Spec-Gap Azaion.AdminApi/Program.cs (/sessions/mission endpoint) Cross-batch: amr=mfa gate on mission-issuance still a TODO
2 Low Security Azaion.Services/MfaService.TryConsumeRecoveryCode Conditional update returns true even on 0-row write
3 Low Operational Azaion.AdminApi/Program.cs (DataProtection block) Default key-store is ephemeral inside containers
4 Low Performance Azaion.Services/MfaService.VerifyForLogin Successful login costs an extra UPDATE users for mfa_last_used_window

Finding Details

F1: Mission-issuance MFA gate still a TODO (Medium / Spec-Gap)

  • Location: Azaion.AdminApi/Program.cs/sessions/mission endpoint, line ~489
  • Description: Batch 3 deferred the RequireClaim("amr","mfa") gate on /sessions/mission with the comment "until MFA ships this is a code comment per the AZ-533 spec, not an enforced gate." MFA has now shipped. The endpoint is still gated only on RequireAuthorization — any password-only access token can issue a mission token.
  • Suggestion: small follow-up ticket (or amendment to AZ-533) — add .RequireAuthorization(p => p.RequireClaim("amr", "mfa")) (or a named policy) and remove the TODO. Intentionally not done in this batch (scope discipline: AZ-534 spec does not list it as an AC).
  • Task: AZ-533 / AZ-534 follow-up

F2: Recovery-code conditional update bypass (Low / Security)

  • Location: Azaion.Services/MfaService.TryConsumeRecoveryCode, line ~316322
  • Description: The conditional WHERE id = @userId AND mfa_recovery_codes = @priorJson defends against the read-modify-write race between two concurrent /login/mfa calls submitting the same recovery code. But we don't check the affected row count — both flows hit auditLog.RecordMfaRecoveryUsed and return tokens. Only the write of the consumed-code state is single-shot; the outcome (token issuance) double-spends. Practical risk is low (recovery codes are 80-bit secrets, not user-known; concurrent same-code attacks require an attacker who already has the code), but it's a real correctness gap.
  • Suggestion: capture the LinqToDB UpdateAsync return value and treat 0 as "lost the race; reject this attempt". Adds one branch.
  • Task: AZ-534 follow-up

F3: DataProtection key-store ephemeral by default (Low / Operational)

  • Location: Azaion.AdminApi/Program.cs — DataProtection configuration block, line ~151
  • Description: When DataProtection:KeysFolder is not set, ASP.NET Core defaults to ~/.aspnet/DataProtection-Keys inside the container. On container restart that path is lost → every previously-encrypted mfa_secret becomes unrecoverable, locking out every enrolled user. The Program.cs comment is explicit about it ("Production deployments MUST set..."), and the SUT log even prints the framework's own warning. Ops needs the runbook entry, not just a code comment.
  • Suggestion: (a) document DataProtection:KeysFolder in _docs/04_deploy/deployment_procedures.md next to the JWKS key-rotation section; (b) add a startup warning when running in Production and the folder is unset.
  • Task: AZ-534 follow-up (operational)

F4: Successful login costs an extra UPDATE (Low / Performance)

  • Location: Azaion.Services/MfaService.VerifyForLogin, line ~260264
  • Description: Every TOTP success persists mfa_last_used_window (RFC 6238 replay defence). One UPDATE users per /login/mfa for MFA-enabled users. At admin-only MFA scope (handful of accounts) this is a non-issue. If MFA is later mandated for Role IN (Admin, ApiAdmin, ResourceUploader) and the fleet grows, watch the users row write rate.
  • Suggestion: monitor only — no change today.
  • Task: AZ-534 (informational)

Notes (non-blocking)

  • The AC-5 test deliberately uses a recovery code for the mid-test login so the TOTP window stays unused for the subsequent /disable call. Without that, the same code presented twice within 30 s would be rejected by the (correct) replay-window check, producing a flaky 31-second Task.Delay. Worth highlighting in case anyone refactors that test later.
  • User.MfaRecoveryCodes mapped in AzaionDbSchemaHolder with DataType.BinaryJson so inserts work; the disable path uses raw SQL because LinqToDB doesn't carry the type annotation through to literal null values in update-set expressions. Captured in the batch report (Decision #6).
  • RoleEnum.Service = 60 from batch 3 is unaffected by this change. No new role added.

Verdict Rationale

PASS_WITH_WARNINGS — 6/6 ACs pass; full E2E suite green (82/82 enabled tests). Architecture is consistent with the existing Auth*Service/SessionService separation, DI registration follows the existing pattern, and the amr claim now feeds correctly through /login → session → /token/refresh. The findings are deferred-improvement items, not blocking defects.