mirror of
https://github.com/azaion/admin.git
synced 2026-06-21 18:01:10 +00:00
[AZ-529] [AZ-530] Cycle-2 documentation refresh
Refreshes _docs/02_document/ to reflect the cycle-2 auth-modernization
+ CMMC hardening landings (AZ-531..AZ-538). Authoritative source for
the ripple set is ripple_log_cycle2.md.
Covered:
- architecture.md (section 1 rewritten, ADRs 6-9 added)
- data_model.md (sessions, audit_events, user columns, migrations)
- system-flows.md (F1 rewritten; F11-F17 added; F2/F7/F9 minor)
- module-layout.md (cycle-2 sub-component table)
- diagrams/flows/flow_login.md (dual-token + MFA)
- components/{01_data_layer,03_auth_and_security,05_admin_api}
- modules/ (12 new, 8 modified — full Argon2id/ES256/MFA/refresh
/mission/session/audit/jwks rollup)
- tests/{blackbox,security,traceability-matrix}
Step 13 (Update Docs) output for cycle 2.
Co-authored-by: Cursor <cursoragent@cursor.com>
This commit is contained in:
@@ -2,24 +2,36 @@
|
||||
|
||||
## 1. System Context
|
||||
|
||||
**Problem being solved**: Azaion Suite requires a centralized admin API to manage users, assign roles, and securely distribute encrypted software resources (DLLs, AI models, installers) to authorized devices and SaaS users.
|
||||
**Problem being solved**: Azaion Suite requires a centralized admin API to manage users + roles, authenticate humans (with optional second factor), authenticate UAVs for offline missions, and broker token revocation across a fleet of verifier services.
|
||||
|
||||
**System boundaries**:
|
||||
- **Inside**: User management, authentication (JWT), role-based authorization, file-based resource storage (upload / list / clear).
|
||||
- **Outside**: Client applications (admin web panel at admin.azaion.com, fTPM-secured Jetson edge devices), PostgreSQL database, server filesystem for resource storage.
|
||||
- **Inside**: user management, password hashing (Argon2id), authentication (ES256 JWT + opaque refresh tokens with rotation + reuse detection), TOTP MFA, mission-token issuance, session revocation + verifier-poll snapshot, account lockout + per-IP and per-account rate limiting, JWKS publication, role-based authorization, file-based resource storage (upload / list / clear), HSTS + HTTPS redirect.
|
||||
- **Outside**: admin web panel (`admin.azaion.com`), fTPM-secured Jetson edge devices (CompanionPC), verifier fleet (satellite-provider, gps-denied, ui — service-role identities), PostgreSQL, server filesystem.
|
||||
|
||||
> **Note (AZ-197, 2026-05-13)**: hardware-fingerprint binding (`User.Hardware`, `CheckHardwareHash`, `PUT /users/hardware/set`, `POST /resources/check`, `HardwareIdMismatch`/`BadHardware` error codes) was removed. Edge devices now ship as fTPM-secured Jetsons; server/desktop access is SaaS-only. The `User.Hardware` DB column remains as a nullable tombstone (no migration in AZ-197).
|
||||
|
||||
> **Note (cycle 2, 2026-05-14)**: the encrypted resource download (`POST /resources/get/{dataFolder?}`) and both installer endpoints (`GET /resources/get-installer`, `GET /resources/get-installer/stage`) were removed as obsolete. Their orphaned support code went with them: `ResourcesService.GetEncryptedResource` / `GetInstaller`, `Security.GetApiEncryptionKey` / `EncryptTo` / `DecryptTo`, the `GetResourceRequest` DTO (+ `WrongResourceName` error code 50, gap kept), and the `ResourcesConfig.SuiteInstallerFolder` / `SuiteStageInstallerFolder` properties + their env var rows in every config artifact. The `Azaion.Test` unit-test project became empty and was removed from the solution. Per-user file encryption is no longer part of the system; resource delivery is now upload + list + clear only. ADR-003 below is **retired** as a result.
|
||||
> **Note (AZ-197, cycle 1)**: hardware-fingerprint binding removed.
|
||||
>
|
||||
> **Note (cycle 2 early)**: encrypted resource download + installer endpoints removed; ADR-003 retired.
|
||||
>
|
||||
> **Note (cycle 2 — Auth Modernization, 2026-05-14, AZ-531..AZ-538)**: the entire authentication layer was rebuilt:
|
||||
> - **AZ-536** — Argon2id password hashing replaced SHA-384; lazy migration on login.
|
||||
> - **AZ-531** — opaque refresh tokens with server-side rotation, family-based reuse detection, sliding + absolute lifetimes (`SessionConfig`).
|
||||
> - **AZ-532** — symmetric HS256 → asymmetric ES256 with file-system key store + JWKS endpoint.
|
||||
> - **AZ-534** — TOTP MFA (enroll/confirm/disable, recovery codes, two-step login, `IDataProtector`-encrypted secret, `amr` claim).
|
||||
> - **AZ-535** — logout (single + all) + admin revoke + verifier-poll snapshot of revoked sessions; new `Service` role for verifier identities.
|
||||
> - **AZ-533** — long-lived no-refresh mission tokens for UAV ops, with auto-revoke on aircraft reconnect.
|
||||
> - **AZ-537** — DB-backed account lockout + per-account sliding-window rate limit + per-IP token-bucket via ASP.NET `RateLimiter`; `audit_events` table.
|
||||
> - **AZ-538** — CORS narrowed to single HTTPS origin, HSTS enabled (non-Development), HTTPS redirection (non-Development).
|
||||
> - New ADRs **ADR-006** through **ADR-009** below capture the per-decision context.
|
||||
|
||||
**External systems**:
|
||||
|
||||
| System | Integration Type | Direction | Purpose |
|
||||
|--------|-----------------|-----------|---------|
|
||||
| PostgreSQL | Database (linq2db) | Both | User data persistence |
|
||||
| Server filesystem | File I/O | Both | Resource file storage and retrieval |
|
||||
| Azaion Suite client | REST API | Inbound | Resource download, login |
|
||||
| Admin web panel (admin.azaion.com) | REST API | Inbound | User management, resource upload |
|
||||
| PostgreSQL | Database (linq2db) | Both | User + session + audit_events persistence |
|
||||
| Server filesystem | File I/O | Both | Resource files; ES256 PEM key store; DataProtection key store (when `DataProtection:KeysFolder` is set) |
|
||||
| Admin web panel (admin.azaion.com) | REST API | Inbound | User management, login, MFA, refresh, resource upload |
|
||||
| Verifier fleet (Service role) | REST API | Inbound | Polls `/sessions/revoked`, fetches `/.well-known/jwks.json` |
|
||||
| CompanionPC (Jetson) edge devices | REST API | Inbound | Login + refresh; mission-token consumer |
|
||||
|
||||
## 2. Technology Stack
|
||||
|
||||
@@ -30,11 +42,15 @@
|
||||
| Database | PostgreSQL | (server-side) | Open-source, robust relational DB |
|
||||
| ORM | linq2db | 5.4.1 | Lightweight, LINQ-native, no migrations overhead |
|
||||
| Cache | LazyCache (in-memory) | 2.4.0 | Simple async caching for user lookups |
|
||||
| Auth | JWT Bearer | 10.0.3 | Stateless token authentication |
|
||||
| Auth | JWT Bearer (ES256) | 10.0.3 | Stateless token auth; cycle 2 — switched from HS256 to ES256 with JWKS (AZ-532) |
|
||||
| Password hashing | Konscious.Security.Cryptography (Argon2id) | (cycle 2 add) | Replaces SHA-384 (AZ-536) |
|
||||
| MFA | OtpNet (TOTP) + QRCoder (PNG) | (cycle 2 add) | TOTP + recovery codes (AZ-534) |
|
||||
| Rate limiting | Microsoft.AspNetCore.RateLimiting | 10.0 | Per-IP sliding window (AZ-537) |
|
||||
| Data protection | Microsoft.AspNetCore.DataProtection | 10.0 | Encrypt MFA secret at rest (AZ-534) |
|
||||
| Validation | FluentValidation | 11.3.0 / 11.10.0 | Declarative request validation |
|
||||
| Logging | Serilog | 4.1.0 | Structured logging (console + file) |
|
||||
| API Docs | Swashbuckle (Swagger) | 10.1.4 | OpenAPI specification |
|
||||
| Serialization | Newtonsoft.Json | 13.0.1 | JSON for DB field mapping and responses |
|
||||
| Serialization | Newtonsoft.Json | 13.0.4 | JSON for DB field mapping and responses (bumped from 13.0.1 by audit D-1) |
|
||||
| Container | Docker | .NET 10.0 images | Multi-stage build, ARM64 support |
|
||||
| CI/CD | Woodpecker CI | — | Branch-based ARM64 builds |
|
||||
| Registry | docker.azaion.com | — | Private container registry |
|
||||
@@ -56,7 +72,11 @@
|
||||
| Secrets | Environment variables (`ASPNETCORE_*`) | Environment variables |
|
||||
| Logging | Console + file | Console + rolling file (`logs/log.txt`) |
|
||||
| Swagger | Enabled | Disabled |
|
||||
| CORS | Same as prod | `admin.azaion.com` |
|
||||
| CORS | (same policy registered, allows `https://admin.azaion.com`) | `https://admin.azaion.com` only |
|
||||
| HSTS | **Disabled** (Development bypass) | **Enabled** (1 y, includeSubDomains, preload) |
|
||||
| HTTPS redirect | **Disabled** (Development bypass) | **Enabled** |
|
||||
| ES256 keys | `JwtConfig.KeysFolder` — at least one PEM, `ActiveKid` selects | Same; persistent volume mandatory |
|
||||
| DataProtection keys | Ephemeral OK (single-instance dev) | `DataProtection:KeysFolder` MUST be a persistent volume — otherwise MFA secrets are unrecoverable after restart |
|
||||
|
||||
## 4. Data Model Overview
|
||||
|
||||
@@ -64,21 +84,25 @@
|
||||
|
||||
| Entity | Description | Owned By Component |
|
||||
|--------|-------------|--------------------|
|
||||
| User | System user with email (UNIQUE-indexed via `users_email_uidx`), password hash, role, config (legacy `Hardware` column tombstoned per AZ-197). Subset of users have `Role = CompanionPC` and are auto-provisioned via `POST /devices` (AZ-196), which delegates the insert to `UserService.RegisterUser` (post-security-audit consolidation, finding F-3). | 01 Data Layer |
|
||||
| UserConfig | JSON-serialized per-user configuration (queue offsets) | 01 Data Layer |
|
||||
| RoleEnum | Authorization role hierarchy (None → ApiAdmin); `ResourceUploader` retained as data only after the OTA endpoints were retired | 01 Data Layer |
|
||||
| DetectionClass *(AZ-513, cycle 1)* | Operator-managed detection-class catalogue (Name, ShortName, Color, MaxSizeM, PhotoMode?) backing the UI Detection Classes table | 01 Data Layer |
|
||||
| ExceptionEnum | Business error code catalog (HW-related codes 40/45 removed by AZ-197) | Common Helpers |
|
||||
| User | System user. Cycle 2 added `failed_login_count`, `lockout_until` (AZ-537) and `mfa_*` columns (AZ-534). `password_hash` is now Argon2id PHC; legacy SHA-384 base64 lazily upgraded on next login (AZ-536). | 01 Data Layer |
|
||||
| Session *(AZ-531+535+533+534)* | One row per refresh token (interactive) or per mission token. Carries `family_id` (rotation chain), `revoked_at`/`revoked_reason`/`revoked_by_user_id`, `class` ∈ {`interactive`, `mission`}, `aircraft_id`, `mfa_authenticated`. | 01 Data Layer |
|
||||
| AuditEvent *(AZ-537+534)* | Append-only `audit_events` row: login_failed/success/lockout, mfa_enroll/confirm/disable/login_success/login_failed/recovery_used. | 01 Data Layer |
|
||||
| UserConfig | JSON-serialized per-user configuration (queue offsets). | 01 Data Layer |
|
||||
| RoleEnum | Authorization role hierarchy. Cycle 2 added `Service = 60` for verifier identities (AZ-535). | 01 Data Layer |
|
||||
| DetectionClass | Operator-managed catalogue. Unchanged in cycle 2. | 01 Data Layer |
|
||||
| ExceptionEnum | Business error code catalog. Cycle 2 added codes 50–61 for the auth/MFA/refresh/mission/lockout paths. | Common Helpers |
|
||||
|
||||
> **Removed in cycle 1 / post-cycle-1**: the `Resource` entity, the `resources` table, and the OTA delivery flow (AZ-183 — F10) were reverted after the security audit (finding F-1). The data model no longer carries an OTA-artifact entity.
|
||||
|
||||
**Key relationships**:
|
||||
- User → RoleEnum: each user has exactly one role
|
||||
- User → UserConfig: optional 1:1 JSON field containing queue offsets
|
||||
**Key relationships** (cycle 2 additions):
|
||||
- User 1 — N Session (`sessions.user_id` FK, ON DELETE CASCADE)
|
||||
- User 1 — N Session (`sessions.aircraft_id` FK for mission rows, ON DELETE SET NULL)
|
||||
- User 1 — N Session (`sessions.revoked_by_user_id` FK, ON DELETE SET NULL)
|
||||
- Session 1 — N Session (`parent_session_id` rotation chain)
|
||||
|
||||
**Data flow summary**:
|
||||
- Client → API → UserService → PostgreSQL: user CRUD operations
|
||||
- Client → API → ResourcesService → Filesystem: resource upload / list / clear (encrypted download + installer delivery were retired in cycle 2)
|
||||
- Client → API → UserService → PostgreSQL: user CRUD + Argon2id verify/hash + lazy migration
|
||||
- Client → API → RefreshTokenService / SessionService / MfaService / MissionTokenService → PostgreSQL `sessions` + `users` + `audit_events`
|
||||
- Verifier → API → SessionService → PostgreSQL `sessions` (revoked-since snapshot) + JwtSigningKeyProvider (JWKS)
|
||||
- Client → API → ResourcesService → Filesystem: resource upload / list / clear
|
||||
|
||||
## 5. Integration Points
|
||||
|
||||
@@ -86,11 +110,13 @@
|
||||
|
||||
| From | To | Protocol | Pattern | Notes |
|
||||
|------|----|----------|---------|-------|
|
||||
| Admin API | User Management | Direct DI call | Request-Response | Scoped service injection |
|
||||
| Admin API | Auth & Security | Direct DI call | Request-Response | Scoped service injection |
|
||||
| Admin API | Resource Management | Direct DI call | Request-Response | Scoped service injection |
|
||||
| User Management | Data Layer | Direct DI call | Request-Response | Singleton DbFactory |
|
||||
| Auth & Security | User Management | Direct DI call | Request-Response | IUserService.GetByEmail |
|
||||
| Admin API | User Management | Direct DI call | Request-Response | Scoped |
|
||||
| Admin API | AuthService | Direct DI call | Request-Response | Scoped — also reads `IJwtSigningKeyProvider` (singleton) |
|
||||
| Admin API | RefreshTokenService / SessionService / MfaService / MissionTokenService / AuditLog | Direct DI call | Request-Response | Scoped |
|
||||
| Admin API | Resource Management | Direct DI call | Request-Response | Scoped |
|
||||
| User Management | AuditLog | Direct DI call | Request-Response | Failed/success/lockout audit + sliding-window count |
|
||||
| MfaService | IDataProtector | Direct DI call | Request-Response | Encrypt/decrypt mfa_secret |
|
||||
| All services | Data Layer | Direct DI call | Request-Response | Singleton DbFactory |
|
||||
|
||||
### External Integrations
|
||||
|
||||
@@ -104,29 +130,40 @@
|
||||
| Requirement | Target | Measurement | Priority |
|
||||
|------------|--------|-------------|----------|
|
||||
| Max upload size | 200 MB | Kestrel MaxRequestBodySize | High |
|
||||
| Password hashing | SHA-384 | Per-user | Medium |
|
||||
| Password hashing | Argon2id (parameters from `AuthConfig.PasswordHashing`) | Per-user, constant-time verify | High |
|
||||
| Access token lifetime | `JwtConfig.AccessTokenLifetimeMinutes` (15 default) | Per token | High |
|
||||
| Refresh token sliding lifetime | `SessionConfig.RefreshSlidingHours` | Per session row | High |
|
||||
| Refresh token absolute lifetime | `SessionConfig.RefreshAbsoluteHours` | Per family | High |
|
||||
| Mission token lifetime | `MissionSessionRequest.PlannedDurationH` (validation-bounded) | Per mission session | High |
|
||||
| Per-IP login rate | `AuthConfig.RateLimit.PerIpPermitLimit` per `PerIpWindowSeconds` | Sliding window | High |
|
||||
| Per-account login rate | `AuthConfig.RateLimit.PerAccountFailedThreshold` per `PerAccountWindowSeconds` | DB sliding window via `audit_events` | High |
|
||||
| Account lockout | `AuthConfig.Lockout.ConsecutiveFailureThreshold` failures → `LockoutSeconds` lockout | DB-backed | High |
|
||||
| HSTS | 1 y, includeSubDomains, preload (non-Development) | HTTP header | High |
|
||||
| HTTPS redirect | Enabled (non-Development) | Middleware | High |
|
||||
| Cache TTL | 4 hours | User entity cache | Low |
|
||||
|
||||
> The "File encryption / AES-256-CBC" NFR was retired in cycle 2 along with the encrypted-download endpoint. See ADR-003.
|
||||
|
||||
No explicit availability, latency, throughput, or recovery targets found in the codebase.
|
||||
|
||||
## 7. Security Architecture
|
||||
|
||||
**Authentication**: JWT Bearer tokens (HMAC-SHA256 signed, validated for issuer/audience/lifetime/signing key).
|
||||
**Authentication**:
|
||||
- ES256 (ECDSA P-256) JWT bearer tokens (AZ-532). `ValidAlgorithms` pinned to `ES256` to prevent the HS256-with-public-key forgery class.
|
||||
- Opaque refresh tokens with server-side rotation + reuse detection (AZ-531). Stored as SHA-256 hashes; never re-presented.
|
||||
- TOTP MFA + recovery codes (AZ-534). Step-1 token is itself an ES256 JWT with a separate audience.
|
||||
- Mission tokens (AZ-533) — long-lived, no refresh, bound to `aircraft_id`, auto-revoked on aircraft reconnect.
|
||||
|
||||
**Authorization**: Role-based (RBAC) via ASP.NET Core authorization policies:
|
||||
- `apiAdminPolicy` — requires `ApiAdmin` role
|
||||
- `apiAdminPolicy` — requires `ApiAdmin`
|
||||
- `revocationReaderPolicy` — requires `Service` OR `ApiAdmin` (verifier fleet)
|
||||
- General `[Authorize]` — any authenticated user
|
||||
|
||||
> The `apiUploaderPolicy` was added by AZ-183 and removed in the post-cycle-1 revert along with the OTA endpoints it guarded. `RoleEnum.ResourceUploader` remains as data only.
|
||||
|
||||
**Data protection**:
|
||||
- At rest: resource files are stored as plain bytes on the server filesystem (per-user AES-256-CBC encryption was retired in cycle 2 — see ADR-003).
|
||||
- In transit: HTTPS (assumed, not enforced in code)
|
||||
- Secrets management: Environment variables (`ASPNETCORE_*` prefix)
|
||||
- **At rest**: `mfa_secret` is encrypted via `IDataProtector` (purpose `Azaion.Mfa.Secret`). MFA recovery codes are individually Argon2id-hashed and single-use. Passwords are Argon2id PHC strings. ES256 PEM keys live in `JwtConfig.KeysFolder` — protect via filesystem permissions.
|
||||
- **In transit**: HSTS + HTTPS redirection in non-Development environments (AZ-538). CORS narrowed to `https://admin.azaion.com` only.
|
||||
- **Token revocation propagation**: `GET /sessions/revoked` provides a verifier-poll snapshot; verifiers are responsible for honoring it within their poll cadence (currently ~30s recommended).
|
||||
- **Secrets management**: Environment variables (`ASPNETCORE_*` prefix).
|
||||
|
||||
**Audit logging**: No explicit audit trail. Serilog logs business exceptions (WARN) and general events (INFO).
|
||||
**Audit logging**: `audit_events` table records login_success/failed/lockout and mfa_enroll/confirm/disable/login_success/login_failed/recovery_used events with normalised email + caller IP. Drives the per-account rate limit and provides forensic evidence. Serilog continues to log business exceptions (WARN) and general events (INFO).
|
||||
|
||||
## 8. Key Architectural Decisions
|
||||
|
||||
@@ -174,3 +211,68 @@ The binding's only remaining effect was a real production failure mode (`Hardwar
|
||||
**Decision**: Use linq2db instead of Entity Framework Core.
|
||||
|
||||
**Consequences**: No migration framework — schema managed via SQL scripts (`env/db/`). Lighter runtime footprint. Manual mapping configuration in `AzaionDbSchemaHolder`.
|
||||
|
||||
### ADR-006: Asymmetric ES256 JWT signing with file-system key store + JWKS *(cycle 2 — AZ-532)*
|
||||
|
||||
**Context**: Cycle-1 JWT signing was symmetric HS256 with the secret in environment configuration. The verifier fleet (satellite-provider, gps-denied, ui) needed to validate tokens without sharing the signing secret with every service. Sharing the HS256 secret would have made any verifier compromise also a token-forgery primitive.
|
||||
|
||||
**Decision**: Switch to ES256 (ECDSA P-256). The Admin API holds the private key; verifiers fetch the public key set from `GET /.well-known/jwks.json`. Keys live as one PEM per kid in `JwtConfig.KeysFolder`. `JwtConfig.ActiveKid` selects the signer; ALL discovered keys are exposed in JWKS so existing tokens stay verifiable across rotations.
|
||||
|
||||
**Alternatives rejected**:
|
||||
- **Continue HS256 + share secret**: rejected — secret-distribution + verifier-compromise blast radius.
|
||||
- **RS256**: equivalent security, larger keys, no operational benefit at our scale.
|
||||
- **External KMS / HSM**: deferred — adds operational complexity (KMS auth, latency on every signing op) without near-term benefit. The PEM-on-disk approach is reversible to KMS later.
|
||||
|
||||
**Consequences**:
|
||||
- JwtBearer `ValidAlgorithms = [ES256]` is mandatory — without it, a token forged with `alg=HS256` using the public key as the HMAC secret would validate.
|
||||
- The PEM directory MUST be a persistent volume.
|
||||
- Key rotation is "drop a new PEM, set `ActiveKid`, restart" — the old kid keeps verifying tokens until physically removed.
|
||||
- Verifiers MUST cache the JWKS for at most 1 hour to pick up new kids quickly.
|
||||
|
||||
### ADR-007: Refresh tokens as opaque rotating server-side rows (not JWT) *(cycle 2 — AZ-531)*
|
||||
|
||||
**Context**: The dual-token model needs a refresh token. The two viable shapes are (a) signed self-describing JWT or (b) opaque server-stored value. Refresh tokens are long-lived; their threat model centres on theft + replay.
|
||||
|
||||
**Decision**: Opaque random `Base64Url(32 bytes)` stored on the server as a SHA-256 hash. Each rotation marks the previous row as `revoked_reason='rotated'` and inserts a new row in the same `family_id`. Presenting an already-rotated token revokes the entire family with `reason='reuse_detected'`.
|
||||
|
||||
**Alternatives rejected**:
|
||||
- **JWT refresh token**: server cannot revoke without a denylist (which negates the "stateless" advantage). No reuse-detection without ALSO server state.
|
||||
- **Sliding session ID alone (no rotation)**: theft is permanent until manual revocation.
|
||||
|
||||
**Consequences**:
|
||||
- Every refresh hits Postgres (one indexed lookup + one update + one insert in a transaction). Acceptable at current load; if it becomes a bottleneck, the `sessions_refresh_hash_idx` UNIQUE INDEX is the obvious caching boundary.
|
||||
- Refresh-token theft is detectable on the next legitimate refresh.
|
||||
- The session row is also the `sid` claim in the access token — the same row drives logout (F12), JWKS-independent revocation snapshots (F15), and AMR persistence across rotations (`mfa_authenticated`).
|
||||
|
||||
### ADR-008: TOTP MFA secrets encrypted via `IDataProtector` *(cycle 2 — AZ-534)*
|
||||
|
||||
**Context**: MFA secrets are TOTP shared secrets — possession of the database alone (DBA access, backup leak) must NOT yield the ability to mint TOTP codes for users.
|
||||
|
||||
**Decision**: Encrypt `mfa_secret` with ASP.NET `IDataProtector` (purpose string `Azaion.Mfa.Secret`) before persisting. The DataProtection key store is configured via `DataProtection:KeysFolder` and MUST be a persistent volume in production. Recovery codes are individually Argon2id-hashed and stored as a `jsonb` array; single-use is enforced by setting `used_at` transactionally with the rest of the login.
|
||||
|
||||
**Alternatives rejected**:
|
||||
- **Plaintext**: explicit DB-leak escalation path.
|
||||
- **Application-managed AES via env-var key**: re-introduces the very key-distribution problem ADR-006 solved for JWT signing.
|
||||
- **External KMS for MFA secrets**: deferred for the same reason as ADR-006.
|
||||
|
||||
**Consequences**:
|
||||
- Loss of the DataProtection key folder = users must re-enroll MFA (no recovery path). This MUST be backed up alongside DB backups.
|
||||
- DBA-only access does not yield MFA bypass.
|
||||
|
||||
### ADR-009: Per-account lockout + DB-backed sliding-window rate limit alongside per-IP token bucket *(cycle 2 — AZ-537)*
|
||||
|
||||
**Context**: ASP.NET `RateLimiter` is per-process and per-IP. CMMC AC.L2-3.1.8 requires per-account lockout that survives process restarts. Per-IP alone is insufficient (NAT'd attacker farm; bot rotates IPs). Per-account-only is insufficient (single IP can DoS many accounts at "just below threshold").
|
||||
|
||||
**Decision**: Both layers, both required to pass:
|
||||
1. Per-IP — ASP.NET `RateLimiter` middleware with `SlidingWindowRateLimiter` on `/login` and `/login/mfa`. In-memory; resets on restart but recovers within seconds.
|
||||
2. Per-account — DB-backed sliding window via `audit_events` (count `login_failed` rows for the email within `PerAccountWindowSeconds`).
|
||||
3. Lockout — `users.failed_login_count` + `users.lockout_until`. After `ConsecutiveFailureThreshold` failures, `lockout_until = now + LockoutSeconds`. Subsequent logins throw `AccountLocked` with `RetryAfterSeconds` until the window passes.
|
||||
|
||||
**Alternatives rejected**:
|
||||
- **Redis token bucket per account**: avoids DB load but adds a new infra dependency for a low-write workload. The DB sliding window has acceptable cost (`audit_events_event_type_email_idx`).
|
||||
- **Single combined rule**: harder to tune.
|
||||
|
||||
**Consequences**:
|
||||
- `audit_events` will grow large (~14 GB/yr at projected fleet scale); operational follow-up to time-partition.
|
||||
- The `Retry-After` header is set both by the per-IP middleware (lease metadata) and by the `BusinessExceptionHandler` (from `BusinessException.RetryAfterSeconds`), so clients see consistent backoff hints regardless of which layer rejected.
|
||||
- All gating events go through `audit_events`, providing a single auditable history.
|
||||
|
||||
Reference in New Issue
Block a user