mirror of
https://github.com/azaion/admin.git
synced 2026-06-21 21:11:08 +00:00
a77b3f8a59
Refreshes _docs/02_document/ to reflect the cycle-2 auth-modernization
+ CMMC hardening landings (AZ-531..AZ-538). Authoritative source for
the ripple set is ripple_log_cycle2.md.
Covered:
- architecture.md (section 1 rewritten, ADRs 6-9 added)
- data_model.md (sessions, audit_events, user columns, migrations)
- system-flows.md (F1 rewritten; F11-F17 added; F2/F7/F9 minor)
- module-layout.md (cycle-2 sub-component table)
- diagrams/flows/flow_login.md (dual-token + MFA)
- components/{01_data_layer,03_auth_and_security,05_admin_api}
- modules/ (12 new, 8 modified — full Argon2id/ES256/MFA/refresh
/mission/session/audit/jwks rollup)
- tests/{blackbox,security,traceability-matrix}
Step 13 (Update Docs) output for cycle 2.
Co-authored-by: Cursor <cursoragent@cursor.com>
279 lines
22 KiB
Markdown
# Azaion Admin API — Architecture

## 1. System Context

**Problem being solved**: Azaion Suite requires a centralized admin API to manage users + roles, authenticate humans (with optional second factor), authenticate UAVs for offline missions, and broker token revocation across a fleet of verifier services.

**System boundaries**:
- **Inside**: user management, password hashing (Argon2id), authentication (ES256 JWT + opaque refresh tokens with rotation + reuse detection), TOTP MFA, mission-token issuance, session revocation + verifier-poll snapshot, account lockout + per-IP and per-account rate limiting, JWKS publication, role-based authorization, file-based resource storage (upload / list / clear), HSTS + HTTPS redirect.
- **Outside**: admin web panel (`admin.azaion.com`), fTPM-secured Jetson edge devices (CompanionPC), verifier fleet (satellite-provider, gps-denied, ui — service-role identities), PostgreSQL, server filesystem.

> **Note (AZ-197, cycle 1)**: hardware-fingerprint binding removed.
>
> **Note (cycle 2 early)**: encrypted resource download + installer endpoints removed; ADR-003 retired.
>
> **Note (cycle 2 — Auth Modernization, 2026-05-14, AZ-531..AZ-538)**: the entire authentication layer was rebuilt:
> - **AZ-536** — Argon2id password hashing replaced SHA-384; lazy migration on login.
> - **AZ-531** — opaque refresh tokens with server-side rotation, family-based reuse detection, sliding + absolute lifetimes (`SessionConfig`).
> - **AZ-532** — symmetric HS256 → asymmetric ES256 with file-system key store + JWKS endpoint.
> - **AZ-534** — TOTP MFA (enroll/confirm/disable, recovery codes, two-step login, `IDataProtector`-encrypted secret, `amr` claim).
> - **AZ-535** — logout (single + all) + admin revoke + verifier-poll snapshot of revoked sessions; new `Service` role for verifier identities.
> - **AZ-533** — long-lived no-refresh mission tokens for UAV ops, with auto-revoke on aircraft reconnect.
> - **AZ-537** — DB-backed account lockout + per-account sliding-window rate limit + per-IP sliding-window rate limit via ASP.NET `RateLimiter`; `audit_events` table.
> - **AZ-538** — CORS narrowed to single HTTPS origin, HSTS enabled (non-Development), HTTPS redirection (non-Development).
> - New ADRs **ADR-006** through **ADR-009** below capture the per-decision context.

**External systems**:

| System | Integration Type | Direction | Purpose |
|--------|-----------------|-----------|---------|
| PostgreSQL | Database (linq2db) | Both | User + session + audit_events persistence |
| Server filesystem | File I/O | Both | Resource files; ES256 PEM key store; DataProtection key store (when `DataProtection:KeysFolder` is set) |
| Admin web panel (admin.azaion.com) | REST API | Inbound | User management, login, MFA, refresh, resource upload |
| Verifier fleet (Service role) | REST API | Inbound | Polls `/sessions/revoked`, fetches `/.well-known/jwks.json` |
| CompanionPC (Jetson) edge devices | REST API | Inbound | Login + refresh; mission-token consumer |

## 2. Technology Stack

| Layer | Technology | Version | Rationale |
|-------|-----------|---------|-----------|
| Language | C# | .NET 10.0 | Modern, cross-platform, strong typing |
| Framework | ASP.NET Core Minimal API | 10.0 | Lightweight, minimal boilerplate |
| Database | PostgreSQL | (server-side) | Open-source, robust relational DB |
| ORM | linq2db | 5.4.1 | Lightweight, LINQ-native, no migrations overhead |
| Cache | LazyCache (in-memory) | 2.4.0 | Simple async caching for user lookups |
| Auth | JWT Bearer (ES256) | 10.0.3 | Stateless token auth; cycle 2 — switched from HS256 to ES256 with JWKS (AZ-532) |
| Password hashing | Konscious.Security.Cryptography (Argon2id) | (cycle 2 add) | Replaces SHA-384 (AZ-536) |
| MFA | OtpNet (TOTP) + QRCoder (PNG) | (cycle 2 add) | TOTP + recovery codes (AZ-534) |
| Rate limiting | Microsoft.AspNetCore.RateLimiting | 10.0 | Per-IP sliding window (AZ-537) |
| Data protection | Microsoft.AspNetCore.DataProtection | 10.0 | Encrypt MFA secret at rest (AZ-534) |
| Validation | FluentValidation | 11.3.0 / 11.10.0 | Declarative request validation |
| Logging | Serilog | 4.1.0 | Structured logging (console + file) |
| API Docs | Swashbuckle (Swagger) | 10.1.4 | OpenAPI specification |
| Serialization | Newtonsoft.Json | 13.0.4 | JSON for DB field mapping and responses (bumped from 13.0.1 by audit D-1) |
| Container | Docker | .NET 10.0 images | Multi-stage build, ARM64 support |
| CI/CD | Woodpecker CI | — | Branch-based ARM64 builds |
| Registry | docker.azaion.com | — | Private container registry |

## 3. Deployment Model

**Environments**: Development (local), Production (Linux server)

**Infrastructure**:
- Self-hosted Linux server (evidenced by `env/` provisioning scripts for Debian/Ubuntu)
- Docker containerization with private registry (`docker.azaion.com`, `localhost:5000`)
- No orchestration (single container deployment via `deploy.cmd`)

**Environment-specific configuration**:

| Config | Development | Production |
|--------|-------------|------------|
| Database | Local PostgreSQL (port 4312) | Remote PostgreSQL (same custom port) |
| Secrets | Environment variables (`ASPNETCORE_*`) | Environment variables |
| Logging | Console + file | Console + rolling file (`logs/log.txt`) |
| Swagger | Enabled | Disabled |
| CORS | (same policy registered, allows `https://admin.azaion.com`) | `https://admin.azaion.com` only |
| HSTS | **Disabled** (Development bypass) | **Enabled** (1 y, includeSubDomains, preload) |
| HTTPS redirect | **Disabled** (Development bypass) | **Enabled** |
| ES256 keys | `JwtConfig.KeysFolder` — at least one PEM, `ActiveKid` selects | Same; persistent volume mandatory |
| DataProtection keys | Ephemeral OK (single-instance dev) | `DataProtection:KeysFolder` MUST be a persistent volume — otherwise MFA secrets are unrecoverable after restart |

## 4. Data Model Overview

**Core entities**:

| Entity | Description | Owned By Component |
|--------|-------------|--------------------|
| User | System user. Cycle 2 added `failed_login_count`, `lockout_until` (AZ-537) and `mfa_*` columns (AZ-534). `password_hash` is now Argon2id PHC; legacy SHA-384 base64 lazily upgraded on next login (AZ-536). | 01 Data Layer |
| Session *(AZ-531+535+533+534)* | One row per refresh token (interactive) or per mission token. Carries `family_id` (rotation chain), `revoked_at`/`revoked_reason`/`revoked_by_user_id`, `class` ∈ {`interactive`, `mission`}, `aircraft_id`, `mfa_authenticated`. | 01 Data Layer |
| AuditEvent *(AZ-537+534)* | Append-only `audit_events` row: login_failed/success/lockout, mfa_enroll/confirm/disable/login_success/login_failed/recovery_used. | 01 Data Layer |
| UserConfig | JSON-serialized per-user configuration (queue offsets). | 01 Data Layer |
| RoleEnum | Authorization role hierarchy. Cycle 2 added `Service = 60` for verifier identities (AZ-535). | 01 Data Layer |
| DetectionClass | Operator-managed catalogue. Unchanged in cycle 2. | 01 Data Layer |
| ExceptionEnum | Business error code catalog. Cycle 2 added codes 50–61 for the auth/MFA/refresh/mission/lockout paths. | Common Helpers |

**Key relationships** (cycle 2 additions):
- User 1 — N Session (`sessions.user_id` FK, ON DELETE CASCADE)
- User 1 — N Session (`sessions.aircraft_id` FK for mission rows, ON DELETE SET NULL)
- User 1 — N Session (`sessions.revoked_by_user_id` FK, ON DELETE SET NULL)
- Session 1 — N Session (`parent_session_id` rotation chain)

**Data flow summary**:
- Client → API → UserService → PostgreSQL: user CRUD + Argon2id verify/hash + lazy migration
- Client → API → RefreshTokenService / SessionService / MfaService / MissionTokenService → PostgreSQL `sessions` + `users` + `audit_events`
- Verifier → API → SessionService → PostgreSQL `sessions` (revoked-since snapshot) + JwtSigningKeyProvider (JWKS)
- Client → API → ResourcesService → Filesystem: resource upload / list / clear

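
The "lazy migration" step in the first data flow can be pictured with a small, language-neutral sketch (Python here for brevity; the service itself is C#). It assumes the legacy value is `Base64(SHA-384(password))` with no salt, which this document does not confirm, and it uses PBKDF2 from the standard library as a stand-in for Argon2id. The prefix `$kdf-demo$` and all function names are hypothetical.

```python
import base64
import hashlib
import hmac
import os

# Stand-in marker. The real column holds an Argon2id PHC string instead.
MODERN_PREFIX = "$kdf-demo$"


def legacy_hash(password: str) -> str:
    # ASSUMPTION: legacy format is Base64(SHA-384(password)), unsalted.
    return base64.b64encode(hashlib.sha384(password.encode()).digest()).decode()


def modern_hash(password: str) -> str:
    # PBKDF2 is only a stdlib stand-in for Argon2id in this sketch.
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return (MODERN_PREFIX + base64.b64encode(salt).decode()
            + "$" + base64.b64encode(digest).decode())


def verify_modern(password: str, stored: str) -> bool:
    salt_b64, digest_b64 = stored[len(MODERN_PREFIX):].split("$")
    salt = base64.b64decode(salt_b64)
    expected = base64.b64decode(digest_b64)
    actual = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return hmac.compare_digest(actual, expected)


def verify_and_upgrade(password: str, stored: str):
    """Return (ok, new_stored); new_stored is set only when a legacy hash was upgraded."""
    if stored.startswith(MODERN_PREFIX):
        return verify_modern(password, stored), None
    if hmac.compare_digest(legacy_hash(password).encode(), stored.encode()):
        # Correct password against a legacy hash: re-hash with the modern KDF.
        return True, modern_hash(password)
    return False, None
```

The caller would persist `new_stored` when it is not `None`, so each account is upgraded at most once, on its next successful login.
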
## 5. Integration Points

### Internal Communication

| From | To | Protocol | Pattern | Notes |
|------|----|----------|---------|-------|
| Admin API | User Management | Direct DI call | Request-Response | Scoped |
| Admin API | AuthService | Direct DI call | Request-Response | Scoped — also reads `IJwtSigningKeyProvider` (singleton) |
| Admin API | RefreshTokenService / SessionService / MfaService / MissionTokenService / AuditLog | Direct DI call | Request-Response | Scoped |
| Admin API | Resource Management | Direct DI call | Request-Response | Scoped |
| User Management | AuditLog | Direct DI call | Request-Response | Failed/success/lockout audit + sliding-window count |
| MfaService | IDataProtector | Direct DI call | Request-Response | Encrypt/decrypt mfa_secret |
| All services | Data Layer | Direct DI call | Request-Response | Singleton DbFactory |

### External Integrations

| External System | Protocol | Auth | Rate Limits | Failure Mode |
|----------------|----------|------|-------------|--------------|
| PostgreSQL | TCP (Npgsql) | Username/password | None configured | Exception propagation |
| Filesystem | OS I/O | OS-level permissions | None | Exception propagation |

## 6. Non-Functional Requirements

| Requirement | Target | Measurement | Priority |
|------------|--------|-------------|----------|
| Max upload size | 200 MB | Kestrel MaxRequestBodySize | High |
| Password hashing | Argon2id (parameters from `AuthConfig.PasswordHashing`) | Per-user, constant-time verify | High |
| Access token lifetime | `JwtConfig.AccessTokenLifetimeMinutes` (15 default) | Per token | High |
| Refresh token sliding lifetime | `SessionConfig.RefreshSlidingHours` | Per session row | High |
| Refresh token absolute lifetime | `SessionConfig.RefreshAbsoluteHours` | Per family | High |
| Mission token lifetime | `MissionSessionRequest.PlannedDurationH` (validation-bounded) | Per mission session | High |
| Per-IP login rate | `AuthConfig.RateLimit.PerIpPermitLimit` per `PerIpWindowSeconds` | Sliding window | High |
| Per-account login rate | `AuthConfig.RateLimit.PerAccountFailedThreshold` per `PerAccountWindowSeconds` | DB sliding window via `audit_events` | High |
| Account lockout | `AuthConfig.Lockout.ConsecutiveFailureThreshold` failures → `LockoutSeconds` lockout | DB-backed | High |
| HSTS | 1 y, includeSubDomains, preload (non-Development) | HTTP header | High |
| HTTPS redirect | Enabled (non-Development) | Middleware | High |
| Cache TTL | 4 hours | User entity cache | Low |

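
One plausible reading of the two refresh-token rows in the table is that the effective expiry is whichever deadline comes first. The sketch below (Python, illustrative only) assumes the sliding window restarts at each rotation and the absolute window runs from the first token of the family; the function and parameter names are hypothetical and the hour values in the usage note are made up.

```python
from datetime import datetime, timedelta, timezone


def refresh_expiry(family_started_at: datetime,
                   last_rotated_at: datetime,
                   sliding_hours: int,
                   absolute_hours: int) -> datetime:
    """Earliest of the sliding (per row) and absolute (per family) deadlines."""
    sliding_deadline = last_rotated_at + timedelta(hours=sliding_hours)
    absolute_deadline = family_started_at + timedelta(hours=absolute_hours)
    return min(sliding_deadline, absolute_deadline)


# Example with invented values: 24 h sliding, 30 h absolute.
start = datetime(2026, 1, 1, tzinfo=timezone.utc)
early = refresh_expiry(start, start + timedelta(hours=1), 24, 30)   # sliding wins
late = refresh_expiry(start, start + timedelta(hours=10), 24, 30)   # absolute caps
```

Under this reading, an actively used session keeps sliding forward until the family's absolute deadline, after which a fresh login is required.
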
No explicit availability, latency, throughput, or recovery targets found in the codebase.

## 7. Security Architecture

**Authentication**:
- ES256 (ECDSA P-256) JWT bearer tokens (AZ-532). `ValidAlgorithms` pinned to `ES256` to prevent the HS256-with-public-key forgery class.
- Opaque refresh tokens with server-side rotation + reuse detection (AZ-531). Stored as SHA-256 hashes; never re-presented.
- TOTP MFA + recovery codes (AZ-534). Step-1 token is itself an ES256 JWT with a separate audience.
- Mission tokens (AZ-533) — long-lived, no refresh, bound to `aircraft_id`, auto-revoked on aircraft reconnect.

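
For readers unfamiliar with the TOTP step in the list above, the code computation defined by RFC 6238 fits in a few lines. This is a generic sketch in Python, not the service's OtpNet-based C# code; the HMAC-SHA1 / 30-second / 6-digit parameters and the one-step tolerance window are assumptions about the deployment, not facts from this document.

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP using HMAC-SHA1 and RFC 4226 dynamic truncation."""
    counter = unix_time // step
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify_totp(secret: bytes, submitted: str, unix_time: int,
                step: int = 30, window: int = 1) -> bool:
    """Accept the current step plus or minus `window` steps of clock drift."""
    for delta in range(-window, window + 1):
        t = unix_time + delta * step
        if t < 0:
            continue  # no counter exists before the epoch
        if hmac.compare_digest(totp(secret, t, step).encode(), submitted.encode()):
            return True
    return False
```

With the RFC 6238 test secret `b"12345678901234567890"` and time 59, `totp(..., digits=8)` yields `94287082`, which matches the published test vector.
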
**Authorization**: Role-based (RBAC) via ASP.NET Core authorization policies:
- `apiAdminPolicy` — requires `ApiAdmin`
- `revocationReaderPolicy` — requires `Service` OR `ApiAdmin` (verifier fleet)
- General `[Authorize]` — any authenticated user

**Data protection**:
- **At rest**: `mfa_secret` is encrypted via `IDataProtector` (purpose `Azaion.Mfa.Secret`). MFA recovery codes are individually Argon2id-hashed and single-use. Passwords are Argon2id PHC strings. ES256 PEM keys live in `JwtConfig.KeysFolder` — protect via filesystem permissions.
- **In transit**: HSTS + HTTPS redirection in non-Development environments (AZ-538). CORS narrowed to `https://admin.azaion.com` only.
- **Token revocation propagation**: `GET /sessions/revoked` provides a verifier-poll snapshot; verifiers are responsible for honoring it within their poll cadence (currently ~30s recommended).
- **Secrets management**: Environment variables (`ASPNETCORE_*` prefix).

**Audit logging**: `audit_events` table records login_success/failed/lockout and mfa_enroll/confirm/disable/login_success/login_failed/recovery_used events with normalised email + caller IP. Drives the per-account rate limit and provides forensic evidence. Serilog continues to log business exceptions (WARN) and general events (INFO).

## 8. Key Architectural Decisions

### ADR-001: Minimal API over Controllers

**Context**: API has ~17 endpoints with simple request/response patterns.

**Decision**: Use ASP.NET Core Minimal API with top-level statements instead of MVC controllers.

**Consequences**: All endpoints in a single `Program.cs`. Simple for small APIs but could become unwieldy as endpoints grow.

### ADR-002: Read/Write Database Connection Separation

**Context**: Needed different privilege levels for read vs. write operations.

**Decision**: `DbFactory` maintains two connection strings — a read-only one (`AzaionDb`) and an admin one (`AzaionDbAdmin`) — with separate `Run` and `RunAdmin` methods.

**Consequences**: Write operations are explicitly gated through `RunAdmin`. Prevents accidental writes through the reader connection. Requires maintaining two DB users with different privileges.

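
The shape of this decision can be shown with a self-contained analogue. The sketch below is Python with SQLite standing in for PostgreSQL: the read-only URI mode plays the role of the restricted `AzaionDb` user, and the read-write mode plays the role of `AzaionDbAdmin`. The method names mirror `Run` / `RunAdmin` from the decision, but the class body is an assumption and not the actual C# `DbFactory`.

```python
import os
import sqlite3
import tempfile


class DbFactory:
    """Two connection strings, two entry points (SQLite stand-in)."""

    def __init__(self, path: str):
        self._reader_uri = f"file:{path}?mode=ro"    # analogue of AzaionDb
        self._admin_uri = f"file:{path}?mode=rwc"    # analogue of AzaionDbAdmin

    def run(self, fn):
        # Reads only: the connection itself refuses writes.
        conn = sqlite3.connect(self._reader_uri, uri=True)
        try:
            return fn(conn)
        finally:
            conn.close()

    def run_admin(self, fn):
        # Writes are possible only through this explicit entry point.
        conn = sqlite3.connect(self._admin_uri, uri=True)
        try:
            result = fn(conn)
            conn.commit()
            return result
        finally:
            conn.close()
```

A write attempted through `run` fails at the database layer rather than relying on caller discipline, which is the property the ADR is after.
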
### ADR-003: Per-User Resource Encryption — RETIRED (cycle 2, 2026-05-14)

**Original context**: Resources (DLLs, AI models) had to be delivered only to authorized users via a per-download AES-256-CBC stream keyed off the user's email + password.

**Retirement decision**: With the OTA delivery flow (AZ-183) and the hardware-binding flow (AZ-197) both gone, the only remaining consumer of the encrypted-download path was a now-vestigial `POST /resources/get/{dataFolder?}` endpoint and the two installer endpoints. None of them are part of the target architecture (browser SaaS + fTPM Jetsons), so the entire encrypt-on-download stack — `POST /resources/get`, `GET /resources/get-installer`, `GET /resources/get-installer/stage`, `ResourcesService.GetEncryptedResource`, `ResourcesService.GetInstaller`, `Security.GetApiEncryptionKey`, `Security.EncryptTo`, `Security.DecryptTo`, `GetResourceRequest`, `WrongResourceName` (50), `ResourcesConfig.SuiteInstallerFolder` / `SuiteStageInstallerFolder` — was removed. `Security.ToHash` is retained because `UserService` still needs it to verify legacy SHA-384 password hashes until they are lazily upgraded to Argon2id on login (AZ-536).

**Consequences**: resource files now live on disk as plain bytes; any future at-rest encryption must come from filesystem or storage-layer features (LUKS, object-store SSE), not from application code.

### ADR-004: Hardware Fingerprint Binding — RETIRED (AZ-197)

**Original context**: Resources should only be usable on a specific physical machine.

**Original decision**: On first resource access, the user's hardware fingerprint string was stored. Subsequent accesses compared the hash of the provided hardware against the stored value.

**Retirement decision (2026-05-13, AZ-197)**: The threat model that motivated this binding (credential reuse across machines via desktop installers) no longer applies:

- **Edge devices** ship as **fTPM-secured Jetsons** (secure boot, fTPM-protected key storage, no user filesystem access, no installer redistribution). Hardware identity is anchored in the fTPM, not in a SHA-384 of CPU/GPU/Memory/DriveSerial strings.
- **Server / desktop access** is **SaaS-only** (browser → admin API). There is no installer to copy and no hardware fingerprint to take.

The binding's only remaining effect was a real production failure mode (`HardwareIdMismatch`, error code 40) on legitimate hardware events. AZ-197 removed `CheckHardwareHash`, `UpdateHardware`, `Security.GetHWHash`, the `PUT /users/hardware/set` and `POST /resources/check` endpoints, and the `Hardware` field from `GetResourceRequest`. The `User.Hardware` DB column is a nullable tombstone (no migration in AZ-197; separate ticket if/when the column is dropped).

### ADR-005: linq2db over Entity Framework

**Context**: Needed a lightweight ORM for PostgreSQL.

**Decision**: Use linq2db instead of Entity Framework Core.

**Consequences**: No migration framework — schema managed via SQL scripts (`env/db/`). Lighter runtime footprint. Manual mapping configuration in `AzaionDbSchemaHolder`.

### ADR-006: Asymmetric ES256 JWT signing with file-system key store + JWKS *(cycle 2 — AZ-532)*

**Context**: Cycle-1 JWT signing was symmetric HS256 with the secret in environment configuration. The verifier fleet (satellite-provider, gps-denied, ui) needed to validate tokens without sharing the signing secret with every service. Sharing the HS256 secret would have made any verifier compromise also a token-forgery primitive.

**Decision**: Switch to ES256 (ECDSA P-256). The Admin API holds the private key; verifiers fetch the public key set from `GET /.well-known/jwks.json`. Keys live as one PEM per kid in `JwtConfig.KeysFolder`. `JwtConfig.ActiveKid` selects the signer; ALL discovered keys are exposed in JWKS so existing tokens stay verifiable across rotations.

**Alternatives rejected**:
- **Continue HS256 + share secret**: rejected — secret-distribution + verifier-compromise blast radius.
- **RS256**: equivalent security, larger keys, no operational benefit at our scale.
- **External KMS / HSM**: deferred — adds operational complexity (KMS auth, latency on every signing op) without near-term benefit. The PEM-on-disk approach is reversible to KMS later.

**Consequences**:
- JwtBearer `ValidAlgorithms = [ES256]` is mandatory — without it, a token forged with `alg=HS256` using the public key as the HMAC secret would validate.
- The PEM directory MUST be a persistent volume.
- Key rotation is "drop a new PEM, set `ActiveKid`, restart" — the old kid keeps verifying tokens until physically removed.
- Verifiers MUST NOT cache the JWKS for longer than 1 hour, so that new kids are picked up promptly.

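
The algorithm-pinning consequence can be illustrated from the verifier's side. The Python sketch below inspects only the JWT header: it rejects any `alg` outside the allow-list and then selects the JWKS entry by `kid`. It deliberately performs no signature verification (that needs a crypto library), and the function names are hypothetical; it shows the check order, not the JwtBearer implementation.

```python
import base64
import json

ALLOWED_ALGS = {"ES256"}  # pinned; HS256 must never be accepted


def b64url_decode(segment: str) -> bytes:
    # JWT segments are unpadded base64url; restore the padding first.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def pinned_header(token: str) -> dict:
    """Parse the JWT header and refuse any algorithm outside the allow-list."""
    header = json.loads(b64url_decode(token.split(".")[0]))
    if header.get("alg") not in ALLOWED_ALGS:
        raise ValueError(f"algorithm {header.get('alg')!r} is not allowed")
    return header


def select_key(header: dict, jwks: dict) -> dict:
    """Pick the public key whose kid matches the token header."""
    for key in jwks.get("keys", []):
        if key.get("kid") == header.get("kid"):
            return key
    raise KeyError(f"unknown kid {header.get('kid')!r}")
```

Rejecting on `alg` before any key material is touched is what closes the HS256-with-public-key path: the attacker's token never reaches a code path that would treat the public key as an HMAC secret.
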
### ADR-007: Refresh tokens as opaque rotating server-side rows (not JWT) *(cycle 2 — AZ-531)*

**Context**: The dual-token model needs a refresh token. The two viable shapes are (a) signed self-describing JWT or (b) opaque server-stored value. Refresh tokens are long-lived; their threat model centres on theft + replay.

**Decision**: Opaque random `Base64Url(32 bytes)` stored on the server as a SHA-256 hash. Each rotation marks the previous row as `revoked_reason='rotated'` and inserts a new row in the same `family_id`. Presenting an already-rotated token revokes the entire family with `reason='reuse_detected'`.

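
The rotation and reuse-detection rule in this decision can be sketched with an in-memory store. This is illustrative Python, not the `RefreshTokenService` code: the dictionary stands in for the `sessions` table, expiry checks and transactions are omitted, and the class and method names are hypothetical.

```python
import base64
import hashlib
import secrets
import uuid


class RefreshStore:
    def __init__(self):
        self.rows = {}  # SHA-256(token) -> {"family_id", "revoked_reason"}

    @staticmethod
    def _hash(token: str) -> str:
        return hashlib.sha256(token.encode()).hexdigest()

    def _issue(self, family_id: str) -> str:
        # Opaque Base64Url(32 random bytes); only its hash is stored.
        token = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
        self.rows[self._hash(token)] = {"family_id": family_id, "revoked_reason": None}
        return token

    def login(self) -> str:
        return self._issue(str(uuid.uuid4()))  # new family per login

    def rotate(self, presented: str) -> str:
        row = self.rows.get(self._hash(presented))
        if row is None:
            raise PermissionError("unknown refresh token")
        if row["revoked_reason"] == "rotated":
            # Replay of an already-rotated token: revoke every live row in the family.
            for other in self.rows.values():
                if other["family_id"] == row["family_id"] and other["revoked_reason"] is None:
                    other["revoked_reason"] = "reuse_detected"
            raise PermissionError("refresh token reuse detected")
        if row["revoked_reason"] is not None:
            raise PermissionError("refresh token revoked")
        row["revoked_reason"] = "rotated"
        return self._issue(row["family_id"])
```

If a stolen token is replayed after the legitimate client has rotated it, both the thief and the legitimate client lose the family and must log in again, which is the intended trade-off.
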
**Alternatives rejected**:
- **JWT refresh token**: server cannot revoke without a denylist (which negates the "stateless" advantage). No reuse-detection without ALSO server state.
- **Sliding session ID alone (no rotation)**: theft is permanent until manual revocation.

**Consequences**:
- Every refresh hits Postgres (one indexed lookup + one update + one insert in a transaction). Acceptable at current load; if it becomes a bottleneck, the `sessions_refresh_hash_idx` UNIQUE INDEX is the obvious caching boundary.
- Refresh-token theft is detectable on the next legitimate refresh.
- The session row is also the `sid` claim in the access token — the same row drives logout (F12), JWKS-independent revocation snapshots (F15), and AMR persistence across rotations (`mfa_authenticated`).

### ADR-008: TOTP MFA secrets encrypted via `IDataProtector` *(cycle 2 — AZ-534)*

**Context**: MFA secrets are TOTP shared secrets — possession of the database alone (DBA access, backup leak) must NOT yield the ability to mint TOTP codes for users.

**Decision**: Encrypt `mfa_secret` with ASP.NET `IDataProtector` (purpose string `Azaion.Mfa.Secret`) before persisting. The DataProtection key store is configured via `DataProtection:KeysFolder` and MUST be a persistent volume in production. Recovery codes are individually Argon2id-hashed and stored as a `jsonb` array; single-use is enforced by setting `used_at` transactionally with the rest of the login.

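
The single-use rule for recovery codes can be sketched as follows. This is illustrative Python: PBKDF2 from the standard library stands in for Argon2id, a list of dictionaries stands in for the `jsonb` array, and the surrounding login transaction is not shown. The code count, code length, and all names are hypothetical.

```python
import hashlib
import hmac
import os
import secrets
from datetime import datetime, timezone


def _hash(code: str, salt: bytes) -> bytes:
    # Stand-in KDF; the document specifies Argon2id for the real service.
    return hashlib.pbkdf2_hmac("sha256", code.encode(), salt, 50_000)


def generate_recovery_codes(count: int = 10):
    """Return (plaintext codes shown once to the user, hashed entries to store)."""
    plain = [secrets.token_hex(5) for _ in range(count)]
    stored = []
    for code in plain:
        salt = os.urandom(16)
        stored.append({"salt": salt, "hash": _hash(code, salt), "used_at": None})
    return plain, stored


def consume_recovery_code(stored: list, submitted: str) -> bool:
    """Mark the matching unused entry as used; an already-used code never matches."""
    for entry in stored:
        if entry["used_at"] is not None:
            continue
        if hmac.compare_digest(_hash(submitted, entry["salt"]), entry["hash"]):
            entry["used_at"] = datetime.now(timezone.utc)
            return True
    return False
```

Because each entry has its own salt, every candidate entry must be hashed separately, which is why the list is kept short.
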
**Alternatives rejected**:
- **Plaintext**: explicit DB-leak escalation path.
- **Application-managed AES via env-var key**: re-introduces the very key-distribution problem ADR-006 solved for JWT signing.
- **External KMS for MFA secrets**: deferred for the same reason as ADR-006.

**Consequences**:
- Loss of the DataProtection key folder = users must re-enroll MFA (no recovery path). This MUST be backed up alongside DB backups.
- DBA-only access does not yield MFA bypass.

### ADR-009: Per-account lockout + DB-backed sliding-window rate limit alongside per-IP sliding-window rate limiter *(cycle 2 — AZ-537)*

**Context**: ASP.NET `RateLimiter` is per-process and per-IP. CMMC AC.L2-3.1.8 requires per-account lockout that survives process restarts. Per-IP alone is insufficient (NAT'd attacker farm; bot rotates IPs). Per-account-only is insufficient (single IP can DoS many accounts at "just below threshold").

**Decision**: Both layers, both required to pass:
1. Per-IP — ASP.NET `RateLimiter` middleware with `SlidingWindowRateLimiter` on `/login` and `/login/mfa`. In-memory; resets on restart but recovers within seconds.
2. Per-account — DB-backed sliding window via `audit_events` (count `login_failed` rows for the email within `PerAccountWindowSeconds`).
3. Lockout — `users.failed_login_count` + `users.lockout_until`. After `ConsecutiveFailureThreshold` failures, `lockout_until = now + LockoutSeconds`. Subsequent logins throw `AccountLocked` with `RetryAfterSeconds` until the window passes.

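
Layers 2 and 3 of this decision can be sketched together. The Python below is illustrative only: in-memory lists stand in for `audit_events` and the `users` columns, `RateLimited` is a hypothetical exception name, and resetting the failure counter on a successful login is an assumption drawn from the word "consecutive" rather than something this document states.

```python
from datetime import datetime, timedelta, timezone


class AccountLocked(Exception):
    def __init__(self, retry_after_seconds: int):
        super().__init__("AccountLocked")
        self.retry_after_seconds = retry_after_seconds


class RateLimited(Exception):  # hypothetical name for the per-account rejection
    pass


class AccountGate:
    def __init__(self, failed_threshold, window_seconds, lockout_threshold, lockout_seconds):
        self.failed_threshold = failed_threshold
        self.window_seconds = window_seconds
        self.lockout_threshold = lockout_threshold
        self.lockout_seconds = lockout_seconds
        self.audit_events = []  # (email, event_type, at): stand-in for audit_events
        self.users = {}         # email -> failed_login_count / lockout_until

    def _user(self, email):
        return self.users.setdefault(email, {"failed_login_count": 0, "lockout_until": None})

    def check(self, email, now):
        """Call before verifying credentials; raises if the account is gated."""
        until = self._user(email)["lockout_until"]
        if until is not None and now < until:
            raise AccountLocked(int((until - now).total_seconds()))
        window_start = now - timedelta(seconds=self.window_seconds)
        recent = sum(1 for e, kind, at in self.audit_events
                     if e == email and kind == "login_failed" and at >= window_start)
        if recent >= self.failed_threshold:
            raise RateLimited()

    def record_failure(self, email, now):
        self.audit_events.append((email, "login_failed", now))
        user = self._user(email)
        user["failed_login_count"] += 1
        if user["failed_login_count"] >= self.lockout_threshold:
            user["lockout_until"] = now + timedelta(seconds=self.lockout_seconds)

    def record_success(self, email, now):
        self.audit_events.append((email, "login_success", now))
        user = self._user(email)
        user["failed_login_count"] = 0   # ASSUMPTION: success resets the streak
        user["lockout_until"] = None
```

Keeping both checks in one `check` call mirrors the "both required to pass" wording: the lockout answers "is this account frozen?", while the window count answers "is this account being hammered right now?".
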
**Alternatives rejected**:
- **Redis token bucket per account**: avoids DB load but adds a new infra dependency for a low-write workload. The DB sliding window has acceptable cost (`audit_events_event_type_email_idx`).
- **Single combined rule**: harder to tune.

**Consequences**:
- `audit_events` will grow large (~14 GB/yr at projected fleet scale); operational follow-up to time-partition.
- The `Retry-After` header is set both by the per-IP middleware (lease metadata) and by the `BusinessExceptionHandler` (from `BusinessException.RetryAfterSeconds`), so clients see consistent backoff hints regardless of which layer rejected.
- All gating events go through `audit_events`, providing a single auditable history.