[AZ-529] [AZ-530] Cycle-2 documentation refresh

Refreshes _docs/02_document/ to reflect the cycle-2 auth-modernization
+ CMMC hardening landings (AZ-531..AZ-538). Authoritative source for
the ripple set is ripple_log_cycle2.md.

Covered:
- architecture.md (section 1 rewritten, ADRs 6-9 added)
- data_model.md (sessions, audit_events, user columns, migrations)
- system-flows.md (F1 rewritten; F11-F17 added; F2/F7/F9 minor)
- module-layout.md (cycle-2 sub-component table)
- diagrams/flows/flow_login.md (dual-token + MFA)
- components/{01_data_layer,03_auth_and_security,05_admin_api}
- modules/ (12 new, 8 modified — full Argon2id/ES256/MFA/refresh
  /mission/session/audit/jwks rollup)
- tests/{blackbox,security,traceability-matrix}

Step 13 (Update Docs) output for cycle 2.

Co-authored-by: Cursor <cursoragent@cursor.com>
This commit is contained in:
Oleksandr Bezdieniezhnykh
2026-05-14 09:22:53 +03:00
parent c2c659ef62
commit a77b3f8a59
35 changed files with 3624 additions and 468 deletions
@@ -1,69 +1,92 @@
# Module: Azaion.Services.UserService
## Purpose
Core business logic for user management: registration (web users + provisioned devices), authentication, role management, and account lifecycle.
Core business logic for user management: registration (web users + provisioned devices), authentication (with rate-limit + lockout enforcement), role management, and account lifecycle.
> **Cycle 1 (2026-05-13) note** — hardware-binding methods (`UpdateHardware`, `CheckHardwareHash`, private `UpdateLastLoginDate`) and the bound `IUserService` declarations were removed by AZ-197 (admin-side hardware-binding cleanup). Device auto-provisioning (`RegisterDevice`) was added by AZ-196. **Post-cycle-1 (security audit F-3)**: `RegisterDevice` was refactored to delegate the row insert to `RegisterUser`, and `RegisterUser` itself now relies on the new `users_email_uidx` UNIQUE INDEX (`env/db/06_users_email_unique.sql`) — the check-then-insert race is gone; `Npgsql.PostgresException(SqlState=23505)` is translated to `BusinessException(EmailExists)`. See `_docs/03_implementation/batch_05_report.md` and `batch_06_report.md`.
> **Cycle 1 (2026-05-13) note** — hardware-binding methods removed by AZ-197; device auto-provisioning (`RegisterDevice`) added by AZ-196. Post-cycle-1: `RegisterUser` now relies on the `users_email_uidx` UNIQUE INDEX; `Npgsql.PostgresException(SqlState=23505)` is translated to `BusinessException(EmailExists)`.
>
> **Cycle 2 (2026-05-14) note A (AZ-536)** — password hashing switched to Argon2id. `RegisterUser` calls `Security.HashPassword`; `ValidateUser` calls `Security.VerifyPassword`. On a `NeedsRehash=true` outcome the user's row is updated transactionally with a fresh Argon2id hash (conditional on the original `password_hash` to avoid clobbering a parallel rehash from a concurrent login).
>
> **Cycle 2 (2026-05-14) note B (AZ-537)** — `ValidateUser` now enforces account lockout (423) and per-account sliding-window rate limit (429-equivalent via `BusinessException(LoginRateLimited)`). The lockout state lives on `users.failed_login_count` / `users.lockout_until`; the rate-limit feed is `audit_events` rows of type `login_failed`. `IAuditLog` and `IOptions<AuthConfig>` are new constructor dependencies.
## Public Interface
### IUserService
| Method | Signature | Description |
|--------|-----------|-------------|
| `RegisterUser` | `Task RegisterUser(RegisterUserRequest request, CancellationToken ct)` | Creates a new user with hashed password |
| `RegisterDevice` | `Task<RegisterDeviceResponse> RegisterDevice(CancellationToken ct)` | Creates a new `CompanionPC` user with auto-assigned `azj-NNNN` serial / email and a 32-char hex password (returned plaintext exactly once) |
| `ValidateUser` | `Task<User> ValidateUser(LoginRequest request, CancellationToken ct)` | Validates email + password, returns user. Throws `NoEmailFound`, `WrongPassword`, or `UserDisabled` |
| `GetByEmail` | `Task<User?> GetByEmail(string? email, CancellationToken ct)` | Cached user lookup by email |
| `UpdateQueueOffsets` | `Task UpdateQueueOffsets(string email, UserQueueOffsets offsets, CancellationToken ct)` | Updates user's annotation queue offsets |
| `GetUsers` | `Task<IEnumerable<User>> GetUsers(string? searchEmail, RoleEnum? searchRole, CancellationToken ct)` | Lists users with optional email/role filters |
| `ChangeRole` | `Task ChangeRole(string email, RoleEnum newRole, CancellationToken ct)` | Changes a user's role |
| `SetEnableStatus` | `Task SetEnableStatus(string email, bool isEnabled, CancellationToken ct)` | Enables or disables a user account |
| `RemoveUser` | `Task RemoveUser(string email, CancellationToken ct)` | Permanently deletes a user |
| `RegisterUser` | `Task RegisterUser(RegisterUserRequest request, CancellationToken ct)` | Creates a new user with Argon2id-hashed password. Translates `users_email_uidx` 23505 violations to `BusinessException(EmailExists)`. |
| `RegisterDevice` | `Task<RegisterDeviceResponse> RegisterDevice(CancellationToken ct)` | Creates a new `CompanionPC` user with auto-assigned `azj-NNNN` serial / email and a 32-char hex password (returned plaintext exactly once). |
| `ValidateUser` | `Task<User> ValidateUser(LoginRequest request, CancellationToken ct)` | Validates email + password; enforces account lockout and per-account rate limit. Returns the user on success (with `failed_login_count` zeroed and any legacy SHA-384 hash transparently upgraded). Throws `NoEmailFound`, `AccountLocked` (with retry-after seconds), `LoginRateLimited` (with retry-after window), `WrongPassword`, or `UserDisabled`. |
| `GetByEmail` | `Task<User?> GetByEmail(string? email, CancellationToken ct)` | Cached user lookup by email. |
| `GetById` | `Task<User?> GetById(Guid userId, CancellationToken ct)` | Direct DB lookup by id (used by token-bound flows: refresh, MFA, mission). Not cached. |
| `UpdateQueueOffsets` | `Task UpdateQueueOffsets(string email, UserQueueOffsets offsets, CancellationToken ct)` | Updates user's annotation queue offsets. |
| `GetUsers` | `Task<IEnumerable<User>> GetUsers(string? searchEmail, RoleEnum? searchRole, CancellationToken ct)` | Lists users with optional email/role filters. |
| `ChangeRole` | `Task ChangeRole(string email, RoleEnum newRole, CancellationToken ct)` | Changes a user's role. |
| `SetEnableStatus` | `Task SetEnableStatus(string email, bool isEnabled, CancellationToken ct)` | Enables or disables a user account. |
| `RemoveUser` | `Task RemoveUser(string email, CancellationToken ct)` | Permanently deletes a user. |
## Internal Logic
- **RegisterUser**: hashes password via `Security.ToHash`, inserts via `RunAdmin`. Catches `Npgsql.PostgresException` with `SqlState == PostgresErrorCodes.UniqueViolation` (23505) on the `users_email_uidx` UNIQUE INDEX and rethrows as `BusinessException(EmailExists)`. The previous check-then-insert pattern was removed (race-prone before the index existed; redundant after).
- **RegisterDevice**: calls private `NextDeviceIdentity` (read-only) to compute the next `azj-NNNN` serial + matching email, generates a 32-char hex password from `RandomNumberGenerator.GetBytes(16)`, then delegates the row insert to `RegisterUser` (so any future change to user-creation policy applies here too). Returns `{Serial, Email, Password}` (plaintext password exposed exactly once at provisioning time). On a serial-allocation race, the second caller's insert hits the UNIQUE INDEX and surfaces `BusinessException(EmailExists)`; the caller can retry.
- **NextDeviceIdentity** (private): queries the most recent `RoleEnum.CompanionPC` user via `dbFactory.Run` (read connection), parses the `azj-NNNN` suffix (chars `[SerialNumberStart, SerialNumberLength)` of the email, constants on the class), increments by 1, returns `(serial, email)`.
- **ValidateUser**: finds user by email, compares password hash. Throws `NoEmailFound`, `WrongPassword`, or `UserDisabled`.
- **GetByEmail**: uses `ICache.GetFromCacheAsync` with key `User.{email}`.
- **UpdateQueueOffsets**: writes via `RunAdmin`, then invalidates the user cache.
- **GetUsers**: uses `WhereIf` for optional filter predicates.
- **RegisterUser**: hashes password via `Security.HashPassword` (Argon2id), inserts via `RunAdmin`. Catches `PostgresException(23505)` on `users_email_uidx` and rethrows as `BusinessException(EmailExists)`.
- **RegisterDevice**: queries the most recent `RoleEnum.CompanionPC` user via `dbFactory.Run`, parses the `azj-NNNN` suffix, increments by 1, generates a 32-char hex password from `RandomNumberGenerator.GetBytes(16)`, then delegates the row insert to `RegisterUser` (so future user-creation policy changes apply here too).
- **ValidateUser** (sequence — order matters):
1. Lookup by email; missing → `NoEmailFound`.
2. **Lockout gate** — if `lockout_until > now()`, throw `AccountLocked` with the remaining seconds as `RetryAfterSeconds`. This precedes the password check (CMMC AC.L2-3.1.8 — even a correct password is rejected during lockout).
3. **Per-account rate limit**`IAuditLog.CountRecentFailedLogins` over `AuthConfig.RateLimit.PerAccountWindowSeconds`; if ≥ `PerAccountPermitLimit`, throw `LoginRateLimited` with the window as `RetryAfterSeconds`.
4. **Password verify** via `Security.VerifyPassword`. Failure → `RegisterFailedLogin` (audit row + counter increment + maybe lockout) → throw `WrongPassword` (or `AccountLocked` if the failure crossed the threshold).
5. `IsEnabled` check (after verify so wrong-password and disabled-account look identical to attackers from the outside).
6. **Success path**`RegisterSuccessfulLogin`: lazy Argon2id rehash if `NeedsRehash=true` (conditional on the original hash to avoid clobbering a parallel rehash), zero `failed_login_count`, clear `lockout_until`, invalidate cache, write `login_success` audit row.
- **RegisterFailedLogin**: writes `login_failed` audit row, increments `failed_login_count`. If the new count reaches `Lockout.MaxAttempts`, sets `lockout_until = now() + DurationSeconds`, writes a `login_lockout` audit row, and throws `AccountLocked` immediately so the caller learns the threshold was crossed.
- **GetByEmail**: cached via `ICache.GetFromCacheAsync` keyed `User.{email}`.
- **GetById**: not cached (used by token-bound flows where the user id is already authenticated).
Private constants (device provisioning):
- `DeviceEmailPrefix = "azj-"`, `DeviceEmailDomain = "@azaion.com"`, `SerialNumberStart = 4`, `SerialNumberLength = 4`, `DevicePasswordBytes = 16`.
## Dependencies
- `IDbFactory` (database access)
- `ICache` (user caching)
- `Security` (hashing — `ToHash`)
- `IAuditLog` (cycle 2 — audit row writes + per-account rate-limit feed)
- `IOptions<AuthConfig>` (cycle 2 — `RateLimit.*`, `Lockout.*` thresholds)
- `Security` (Argon2id hashing — `HashPassword` / `VerifyPassword`)
- `System.Security.Cryptography.RandomNumberGenerator` (device password entropy)
- `Npgsql` (`PostgresException`, `PostgresErrorCodes.UniqueViolation` — used to translate UNIQUE-INDEX violations to `BusinessException(EmailExists)`)
- `BusinessException` (domain errors)
- `Npgsql` (`PostgresException`, `PostgresErrorCodes.UniqueViolation`)
- `BusinessException` / `ExceptionEnum` (`NoEmailFound`, `WrongPassword`, `EmailExists`, `UserDisabled`, `AccountLocked`, `LoginRateLimited`)
- `QueryableExtensions.WhereIf`
- `User`, `UserConfig`, `UserQueueOffsets`, `RoleEnum`
- `RegisterUserRequest`, `LoginRequest`, `RegisterDeviceResponse`
## Consumers
- `Program.cs``/users/*` endpoints delegate to `IUserService`
- `Program.cs` `POST /devices` calls `RegisterDevice` (added by AZ-196)
- `Program.cs` `/users/*` endpoints — delegate to `IUserService`
- `Program.cs` `POST /devices` — calls `RegisterDevice`
- `Program.cs` `/login` — calls `ValidateUser` then either short-circuits to MFA step-1 or issues dual tokens
- `Program.cs` `/login/mfa`, `/token/refresh`, `/sessions/mission` — call `GetById` after token-side identity is established
- `AuthService.GetCurrentUser` — calls `GetByEmail`
- `MfaService` — calls `GetById` for re-auth in `Enroll` / `Confirm` / `Disable` / `VerifyForLogin`
## Data Models
Operates on `User` entity via `AzaionDb.Users` table. The `User.Hardware` column is left in place (nullable, unused) per AZ-197 — see the entity doc.
Operates on `User` entity via `AzaionDb.Users`. Reads `failed_login_count` / `lockout_until` (AZ-537) and `mfa_enabled` (AZ-534). Writes `password_hash`, `failed_login_count`, `lockout_until` along the lockout/rehash paths. The `User.Hardware` column remains a tombstone (nullable, unused) per AZ-197.
## Configuration
None.
- `AuthConfig.RateLimit.PerAccountPermitLimit` / `PerAccountWindowSeconds` — sliding-window thresholds.
- `AuthConfig.Lockout.MaxAttempts` / `DurationSeconds` — consecutive-failure lockout.
## External Integrations
PostgreSQL via `IDbFactory`.
## Security
- Passwords hashed with SHA-384 (via `Security.ToHash`) before storage.
- Device passwords are returned plaintext to the caller exactly once at provisioning; the persisted form is the SHA-384 hash. The plaintext is never re-derivable.
- Passwords hashed with Argon2id (post-AZ-536). Legacy SHA-384 entries still validate and are transparently upgraded on next successful login.
- Device passwords are returned plaintext to the caller exactly once at provisioning; the persisted form is the Argon2id hash. The plaintext is never re-derivable.
- Lockout precedence (CMMC AC.L2-3.1.8): a locked account returns 423 even for a correct password until `lockout_until` passes.
- The per-account rate limit is DB-backed (via `audit_events`) so it survives process restarts — distinct from the in-memory per-IP limiter that lives in `Program.cs`.
- Read operations use the read-only DB connection; writes use the admin connection.
## Tests
- `e2e/Azaion.E2E/Tests/DeviceTests.cs` — e2e for AZ-196 device-provisioning ACs
- `e2e/Azaion.E2E/Tests/UserManagementTests.cs` and `LoginTests.cs` — e2e coverage for the rest of the user lifecycle (login, register, role change, enable/disable, delete, queue offsets)
(Unit-test coverage in `Azaion.Test/UserServiceTest.cs` was removed earlier with the AZ-197 hardware-binding cleanup; the `Azaion.Test` project itself was removed from the solution in cycle 2 once its only remaining file — `SecurityTest.cs` — was deleted with the encrypted-download stack.)
- `e2e/Azaion.E2E/Tests/RateLimitLockoutTests.cs` — AZ-537 ACs (per-IP 429, per-account 429, lockout 423, counter reset, lockout auto-expires, audit_events row on lockout).
- `e2e/Azaion.E2E/Tests/PasswordHashingTests.cs` — AZ-536 ACs (Argon2id format, legacy verify, transparent re-hash, wrong-password fail, constant-time verify).
- `e2e/Azaion.E2E/Tests/DeviceTests.cs` — AZ-196 device-provisioning ACs.
- `e2e/Azaion.E2E/Tests/UserManagementTests.cs` and `LoginTests.cs` — broader user lifecycle coverage.