[AZ-529] [AZ-530] Cycle-2 documentation refresh

Refreshes _docs/02_document/ to reflect the cycle-2 auth-modernization
+ CMMC hardening landings (AZ-531..AZ-538). Authoritative source for
the ripple set is ripple_log_cycle2.md.

Covered:
- architecture.md (section 1 rewritten, ADRs 6-9 added)
- data_model.md (sessions, audit_events, user columns, migrations)
- system-flows.md (F1 rewritten; F11-F17 added; F2/F7/F9 minor)
- module-layout.md (cycle-2 sub-component table)
- diagrams/flows/flow_login.md (dual-token + MFA)
- components/{01_data_layer,03_auth_and_security,05_admin_api}
- modules/ (12 new, 8 modified — full Argon2id/ES256/MFA/refresh
  /mission/session/audit/jwks rollup)
- tests/{blackbox,security,traceability-matrix}

Step 13 (Update Docs) output for cycle 2.

Co-authored-by: Cursor <cursoragent@cursor.com>
This commit is contained in:
Oleksandr Bezdieniezhnykh
2026-05-14 09:22:53 +03:00
parent c2c659ef62
commit a77b3f8a59
35 changed files with 3624 additions and 468 deletions
@@ -1,90 +1,181 @@
# Authentication & Security
> **Cycle 1 (2026-05-13) note** — AZ-197 simplified `GetApiEncryptionKey` to `(email, password)` and removed `GetHWHash` outright. The hardware-binding threat model that motivated those primitives is no longer in scope (fTPM-anchored Jetsons + browser SaaS).
> **Cycle 1 (2026-05-13) note** — AZ-197 simplified `GetApiEncryptionKey` to `(email, password)` and removed `GetHWHash` outright. The hardware-binding threat model is no longer in scope.
>
> **Cycle 2 (2026-05-14) note** — `GetApiEncryptionKey`, `EncryptTo`, and `DecryptTo` were all removed along with the encrypted-download endpoint. `Security` is now a one-method utility (`ToHash`) that backs SHA-384 password hashing.
> **Cycle 2 — early (2026-05-14, batches 01-04)** — `GetApiEncryptionKey`, `EncryptTo`, and `DecryptTo` were removed along with the encrypted-download endpoint. `Security` was briefly a one-method utility (`ToHash`) wrapping SHA-384.
>
> **Cycle 2 — Auth Modernization (2026-05-14, AZ-531..AZ-538)** — this component was rebuilt from a single-token issuer + SHA-384 hasher into the full session/refresh/MFA/audit/mission stack described below. Old single-token, symmetric-HS256, SHA-384 paths are gone.
## 1. High-Level Overview
**Purpose**: JWT token creation/validation and password hashing (`Security.ToHash`).
**Purpose**: end-to-end authentication, authorization, session management, second factor (TOTP), token signing/verification, mission credentials, audit, and request-time abuse protection (rate limiting / lockout).
**Architectural Pattern**: Service + static utility — `AuthService` is a DI-managed service for JWT operations; `Security` is a static class with a single SHA-384 helper.
**Architectural Pattern**: a cluster of focused DI-registered services backed by Postgres tables, fronted by Admin API endpoints. Token signing is asymmetric (ES256) with file-system key storage and JWKS publication. Refresh tokens use server-side rotation with reuse detection. MFA secrets are encrypted at rest via ASP.NET `IDataProtector`.
**Upstream dependencies**: Data Layer (JwtConfig, IUserService for GetByEmail), ASP.NET Core (IHttpContextAccessor).
**Upstream dependencies**:
- Data Layer (`AzaionDb`, `JwtConfig`, `SessionConfig`, `AuthConfig`, `IUserService.GetByEmail`)
- ASP.NET Core (`IHttpContextAccessor`, `IDataProtectionProvider`, `RateLimiter` middleware)
- File system (`JwtConfig.KeysFolder` for ES256 keys; one PEM per kid)
**Downstream consumers**: Admin API (token creation on login, current user resolution), User Management (password hashing for both web users and provisioned devices).
**Downstream consumers**:
- Admin API endpoints (`/login`, `/login/mfa`, `/refresh`, `/logout`, `/logout/all`, `/users/me/mfa/*`, `/sessions/{sid}`, `/aircraft/{id}/sessions`, `/sessions/revoked`, `/missions/sessions`, `/.well-known/jwks.json`)
- All authorized requests (JWT bearer middleware verifies via `IJwtSigningKeyProvider` and Verifier services consult the revoked-sessions snapshot)
- User Management (Argon2id hashing for register/update; lazy migration on login)
## 2. Internal Interfaces
### Interface: IAuthService
### Service: `IAuthService`
| Method | Input | Output | Async | Error Types |
|--------|-------|--------|-------|-------------|
| `GetCurrentUser` | (none — reads from HttpContext) | `User?` | Yes | None |
| `CreateToken` | `User` | `string` (JWT) | No | None |
| Method | Input | Output | Async | Notes |
|--------|-------|--------|-------|-------|
| `GetCurrentUser` | (HttpContext) | `User?` | Yes | Reads `ClaimTypes.Name` (email) and looks up via `IUserService.GetByEmail` |
| `CreateToken` | `User`, `Guid sessionId`, `Guid jti`, `IEnumerable<string>? amr` | `AccessToken` (record: `Jwt`, `ExpiresAt`) | No | ES256 signed; lifetime from `JwtConfig.AccessTokenLifetimeMinutes`. Stamps `sub` (`NameIdentifier`), `email` (`Name`), `role`, `sid`, `jti`, and one `amr` claim per value (defaults to `["pwd"]`). |
### Static: Security
### Service: `IRefreshTokenService` *(AZ-531)*
| Method | Input | Output | Notes |
|--------|-------|--------|-------|
| `IssueForNewLogin` | `Guid userId`, `bool mfaAuthenticated`, `CancellationToken` | `(string OpaqueToken, Session Session)` | Creates a new session family (the returned `Session.Id` is the `sid` claim) + initial refresh token. `MfaAuthenticated` is pinned on the session so refresh rotations inherit AMR strength. |
| `Rotate` | `string opaqueToken`, `CancellationToken` | `(string OpaqueToken, Session Session)` | Validates → marks old as rotated → inserts new row in same family. Presenting an already-rotated token revokes the entire family. |
### Service: `ISessionService` *(AZ-535)*
| Method | Input | Output |
|--------|-------|--------|
| `RevokeBySid` | `Guid sessionId`, `Guid? byUserId`, `string reason`, `CancellationToken` | `Task<bool>` (true = was already revoked = no-op) |
| `RevokeAllForUser` | `Guid userId`, `Guid? byUserId`, `string reason`, `CancellationToken` | `Task<int>` (rows revoked) |
| `RevokeMissionsForAircraft` | `Guid aircraftId`, `CancellationToken` | `Task<int>` (called from `MissionTokenService.Issue` and from any successful aircraft re-login) |
| `GetRevokedSince` | `DateTime since`, `CancellationToken` | `Task<IReadOnlyList<RevokedSession>>` (sid, exp, revokedAt, reason) |
### Service: `IMfaService` *(AZ-534)*
| Method | Input | Output |
|--------|-------|--------|
| `Enroll` | `Guid userId`, `string password`, `CancellationToken` | `Task<MfaEnrollResponse>` (otpauth URL, base32 secret, QR PNG bytes — DataProtection-encrypted secret persisted) |
| `Confirm` | `Guid userId`, `string code`, `CancellationToken` | `Task` (sets `MfaEnabled=true`, generates and stores hashed recovery codes) |
| `Disable` | `Guid userId`, `string password`, `string code`, `CancellationToken` | `Task` |
| `IssueMfaStepToken` | `Guid userId` | `string` (short-lived JWT with `mfa_pending`, audience `mfa-step`, signed by active ES256 key) |
| `ValidateMfaStepToken` | `string token` | `Guid userId` |
| `VerifyForLogin` | `Guid userId`, `string code`, `CancellationToken` | `Task<string[]>` — returns the AMR array (`["pwd","mfa"]` or with `"recovery"` appended); throws `InvalidMfaCode` on failure |
### Service: `IMissionTokenService` *(AZ-533)*
| Method | Input | Output | Notes |
|--------|-------|--------|-------|
| `Issue` | `Guid pilotUserId`, `MissionSessionRequest`, `CancellationToken` | `Task<MissionSessionResponse>` | Validates aircraft is `CompanionPC`; auto-revokes prior mission sessions for the aircraft; inserts session row with `Class = "mission"` BEFORE signing so `sid` is bound; planned duration = absolute lifetime (no refresh). |
### Service: `IJwtSigningKeyProvider` *(AZ-532)*
| Member | Output | Notes |
|--------|--------|-------|
| `Active` | `JwtSigningKey` (`Kid`, `EcdsaSecurityKey SecurityKey`, `ECDsa Ecdsa`) | The signing key. Eager — constructed once at app start so missing/malformed keys fail-fast. |
| `All` | `IReadOnlyList<JwtSigningKey>` | Drives `/.well-known/jwks.json` and `IssuerSigningKeyResolver`. All discovered keys are exposed; only `Active` signs. |
### Service: `IAuditLog` *(AZ-537 + AZ-534)*
| Method | Purpose |
|--------|---------|
| `RecordLoginSuccess(email)` / `RecordLoginFailed(email)` / `RecordLoginLockout(email)` | Persists `audit_events` rows with normalised email + caller IP. |
| `RecordMfaEnroll/Confirm/Disable/LoginSuccess/LoginFailed/RecoveryUsed(email)` | One per MFA lifecycle event. |
| `CountRecentFailedLogins(email, windowSeconds)` | Backs the per-account sliding-window check in `UserService.ValidateUser`. |
### Static: `Security` *(AZ-536 — replaces SHA-384)*
| Method | Input | Output | Description |
|--------|-------|--------|-------------|
| `ToHash` | `string` | `string` (Base64) | SHA-384 hash |
| `HashPassword` | `string` | `string` (PHC) | Argon2id, parameters from `AuthConfig.PasswordHashing` |
| `VerifyPassword` | `string presented`, `string stored` | `VerifyResult` (`Ok`, `NeedsRehash`) | Constant-time; recognizes legacy SHA-384 base64 strings and returns `Ok=true, NeedsRehash=true` so `UserService` can lazy-upgrade |
**Removed**:
- `GetHWHash(string hardware)` — removed by AZ-197 (cycle 1).
- `GetApiEncryptionKey(string email, string password)` — removed in cycle 2 (no remaining callers after `POST /resources/get/{dataFolder?}` was deleted).
- `EncryptTo` / `DecryptTo` extension methods — removed in cycle 2 (no remaining callers; the only consumer was `ResourcesService.GetEncryptedResource`, also deleted).
- `ToHash(string)` — removed by AZ-536. All callers now use `HashPassword` / `VerifyPassword`.
- `GetHWHash`, `GetApiEncryptionKey`, `EncryptTo`, `DecryptTo` — removed earlier in cycle 2.
## 3. External API Specification
N/A — exposed through Admin API.
Exposed via Admin API (component 05). Cycle 2 added:
- `POST /login` — now returns either `LoginResponse` (access + refresh + sid) or `MfaRequiredResponse` (mfa_token only when MFA is enabled). Per-IP sliding-window rate limit applied.
- `POST /login/mfa` — completes MFA login (anonymous + per-IP rate limit; the step-1 token is the proof of mid-flow) → `LoginResponse`
- `POST /token/refresh` — rotates refresh token + new access token (anonymous; the refresh token IS the proof)
- `POST /logout` — revokes the caller's current `sid` (read from the access-token claim). Idempotent.
- `POST /logout/all` — revokes every session for the caller's user
- `POST /users/me/mfa/enroll` / `confirm` / `disable`
- `POST /sessions/{sid:guid}/revoke` *(ApiAdmin)*
- `GET /sessions/revoked?since=...` *(verifier role / ApiAdmin via `revocationReaderPolicy`)*
- `POST /sessions/mission` *(authenticated; pilot's interactive token)* → mission `LoginResponse`-shaped reply
- `GET /.well-known/jwks.json` — anonymous; serves all loaded ES256 public keys (active + retiring); cached 1h.
## 4. Data Access Patterns
No direct database access. `AuthService.GetCurrentUser` delegates to `IUserService.GetByEmail`.
| Service | Tables touched | Pattern |
|---------|----------------|---------|
| `RefreshTokenService` | `public.sessions` | Insert on issue / rotate; update `RevokedAt`+`RevokedReason` on rotate / reuse-detected; index lookup by `RefreshTokenHash` |
| `SessionService` | `public.sessions` | Update by `Sid`; bulk update by `UserId`; range read for revoked-since snapshot |
| `MfaService` | `public.users` | Update MFA columns (`MfaEnabled`, `MfaSecret`, `MfaRecoveryCodes`, `MfaEnrolledAt`, `MfaLastUsedWindow`) |
| `MissionTokenService` | `public.sessions`, `public.users` | Insert mission session row; lookup aircraft user |
| `AuditLog` | `public.audit_events`, `public.users` | Insert events; update `FailedLoginCount` / `LockoutUntil` on the user |
| `AuthService` / `UserService` | `public.users` | Reads for current-user resolution and password verify; updates on lazy rehash |
All tables are LinqToDB-mapped via `AzaionDbShemaHolder`; recovery codes use `jsonb`.
## 5. Implementation Details
**Algorithmic Complexity**: SHA-384 hashing is O(n) where n is input length; in practice it operates on short password strings only.
**Argon2id parameters** (cycle 2 default): time=3, memory=64 MiB, parallelism=2 — overridable via `AuthConfig.PasswordHashing`. Output is a PHC-format string self-describing all parameters; verification re-derives them from the stored value.
**State Management**: `AuthService` is stateless (reads claims from HTTP context per request). `Security` is purely static.
**ES256 keys**: one PEM file per kid in `JwtConfig.KeysFolder`. `ActiveKid` selects the signer; all PEMs with valid `P-256` curves are exposed via JWKS. Rotation procedure: drop a new PEM, set `ActiveKid` to it, restart. Old keys remain in JWKS until physically removed (by ops) so already-issued tokens stay verifiable.
**Key Dependencies**:
**Refresh token format**: opaque random `Base64Url(32 bytes)`. Server stores SHA-256 hash + family id (`Sid`) + `RotatedFromTokenId` to support reuse detection. Sliding window per `SessionConfig.RefreshSlidingHours`; absolute cap per `SessionConfig.RefreshAbsoluteHours`.
| Library | Version | Purpose |
|---------|---------|---------|
| System.IdentityModel.Tokens.Jwt | 7.1.2 | JWT token generation |
| Microsoft.AspNetCore.Authentication.JwtBearer | 10.0.3 | JWT middleware integration |
**Reuse detection**: presenting an already-rotated refresh token revokes the entire family (`Sid`) with reason `RefreshReuseDetected`. The next-snapshot poll picks this up.
**Error Handling Strategy**:
- JWT token creation does not throw (malformed config would cause runtime errors at middleware level).
- `GetCurrentUser` returns null if claims are missing or user not found.
**MFA**:
- Secret: 20 random bytes → base32; URL `otpauth://totp/Azaion:{email}?secret=...&issuer=Azaion`.
- QR: PNG generated with `QRCoder` and returned as bytes (only on enroll).
- Recovery codes: 10 codes, each `Argon2id`-hashed before storage. Single-use; checked on `VerifyForLoginAsync` after TOTP fails.
- Step-1 token: short-lived JWT (`mfa_pending = true`, audience `mfa-step`) signed by the active ES256 key. Lifetime `JwtConfig.MfaStepTokenLifetimeMinutes`.
- Replay defense: persisted `MfaLastUsedWindow` blocks reuse of the same TOTP window within the 30s step.
**Rate limiting / lockout** (AZ-537):
- Per-IP token-bucket via ASP.NET Core `RateLimiter` on `/login`, `/login/mfa`, `/refresh`.
- Per-account sliding window via `IAuditLog.CountRecentFailedLoginsAsync`; threshold + window from `AuthConfig.RateLimit`.
- Lockout via `LockoutOptions`: N consecutive failures within window → `LockoutUntil` set; subsequent logins throw `AccountLocked` with `RetryAfterSeconds`.
**HSTS / HTTPS / CORS** (AZ-538):
- HSTS enabled in non-Development with the standard 1y `includeSubDomains` policy.
- HTTPS redirection in non-Development.
- CORS narrowed to the configured admin origins; credentials allowed only for those origins.
## 6. Extensions and Helpers
None — `Security` itself is a utility consumed by other components.
- `Program.cs` helpers: `ParseSidClaim`, `ParseUserIdClaim` (both throw `InvalidRefreshToken` on malformed/missing claims so handlers don't need to repeat the check).
- `BusinessExceptionHandler` adds the `Retry-After` header for `AccountLocked` / `LoginRateLimited`.
## 7. Caveats & Edge Cases
**Known limitations**:
- Password hashing uses SHA-384 without per-user salt or key stretching. Not resistant to rainbow table attacks. (Unchanged by cycles 1 and 2.)
- `GetCurrentUserEmail` assumes `ClaimTypes.Name` is always present; accessing a missing key would throw `KeyNotFoundException`.
**Removed in cycle 1**: hardware fingerprint hashing was a known weakness (static salt, no rotation); deleting it via AZ-197 also removed that attack surface.
**Removed in cycle 2**: per-user file encryption (`GetApiEncryptionKey` + `EncryptTo` + `DecryptTo`). The hardcoded encryption-key salt and the in-memory `MemoryStream` round-trip are no longer attack / performance surfaces in this codebase.
- **Asymmetric key roll-forward only**: revoking a kid means deleting its PEM. There is no per-kid revocation list separate from the file system. Operators must coordinate kid retirement with refresh-token expiry.
- **Verifier polling cadence**: `GET /sessions/revoked?since=` returns the snapshot since a timestamp. Verifiers must clock-skew-tolerate by stepping `since` back ~30s. Snapshot rows are pruned only after both `expiry + grace` window has passed.
- **MFA recovery codes are single-use**: there is no `regenerate` endpoint in cycle 2. A user who burns all 10 codes and loses their authenticator must contact an admin to disable MFA via `/users/me/mfa/disable` (re-uses password + TOTP, so admin is currently NOT able to disable on behalf of the user — flagged as a follow-up).
- **Mission tokens have no refresh**: `planned_duration_h` is the hard cap; expiry is absolute. Aircraft must re-request via the admin path on re-connect.
- **Lazy password rehash leak window**: a successful login with a SHA-384 stored hash returns `Ok=true, NeedsRehash=true` and `UserService` re-hashes via Argon2id within the same request. If that update fails (DB error), the legacy hash stays — surfaced via logs but not blocking.
## 8. Dependency Graph
**Must be implemented after**: Data Layer (for JwtConfig, IUserService).
**Must be implemented after**: Data Layer (configs + DB tables `users`, `sessions`, `audit_events`).
**Can be implemented in parallel with**: User Management (shared dependency on Data Layer).
**Blocks**: Admin API. (Resource Management no longer depends on this component after cycle 2 removed `EncryptTo` / `DecryptTo`.)
**Blocks**: Admin API (every authenticated endpoint), Verifier components (consume `GET /sessions/revoked` and JWKS).
## 9. Logging Strategy
No explicit logging in AuthService or Security.
- All MFA failures, lockouts, refresh-reuse events, and admin revocations log at `Warning`+ via `IAuditLog` and structured logger.
- Successful logins log at `Information`.
- Argon2id verification failures log only the audit row (no plaintext, no hash).
## Modules Covered
- `Services/AuthService`
- `Services/Security`
- `Services/RefreshTokenService`
- `Services/SessionService`
- `Services/MfaService`
- `Services/MissionTokenService`
- `Services/JwtSigningKeyProvider`
- `Services/AuditLog`