[AZ-536] [AZ-537] [AZ-538] Argon2id, login rate limit + lockout, CORS https-only

AZ-536 — replace unsalted SHA-384 password hashing with Argon2id (RFC 9106).
Stored as PHC string with 64 MiB / 3 iter / 1 lane defaults; legacy SHA-384
hashes detected by prefix and lazily re-hashed on next successful login.
Verify uses CryptographicOperations.FixedTimeEquals on both formats.

AZ-537 — add per-IP sliding window rate limit on /login (ASP.NET Core
RateLimiter, 10/60s default — production-tight) plus DB-backed per-account
limit (5/300s) and consecutive-failure lockout (10 / 15 min) on the users
row. Adds a generic audit_events table with INSERT/SELECT-only grants for
the app role so the per-account count is queryable and admins cannot erase
their own forensic trail. BusinessExceptionHandler maps AccountLocked to
423 and LoginRateLimited to 429, both with Retry-After.

AZ-538 — drop the http://admin.azaion.com origin from CORS, gate
UseHsts() + UseHttpsRedirection() to non-Development envs (1y / preload).

Test infra: Npgsql in the e2e project + a DbHelper for direct DB
inspection used by the AZ-536/537 ACs. appsettings.Development.json
raises PerIpPermitLimit to 1000 so the suite (~270 logins from one
container IP) doesn't false-trip the limiter.

Tests: 53 pass + 3 documented skips (per-IP rate limit needs distinct
client IPs; HSTS/HTTPS redirect need ASPNETCORE_ENVIRONMENT=Production).

Code review: PASS_WITH_WARNINGS — 0 Critical, 0 High, 1 Medium, 3 Low.
See _docs/03_implementation/reviews/batch_01_cycle2_review.md.

Closes AZ-530 epic batch 1 of 4.

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-14 04:52:31 +03:00
parent 9679b5636f
commit 491993f9c1
31 changed files with 1327 additions and 36 deletions
@@ -0,0 +1,93 @@
# Replace SHA-384 Password Hashing with Argon2id (Salted)
**Task**: AZ-536_argon2id_password_hashing
**Name**: Replace SHA-384 password hashing with Argon2id (salted)
**Description**: Replace the unsalted single-pass SHA-384 in `Azaion.Services/Security.cs::ToHash` with Argon2id (RFC 9106), salted, memory-hard. PHC string format (self-describing — no separate salt column needed). Lazy migration: existing SHA-384 hashes re-hash on next successful login.
**Complexity**: 3 points
**Dependencies**: None
**Component**: Services + DataAccess
**Tracker**: AZ-536
**Epic**: AZ-530
**CMMC ref**: IA.L2-3.5.10 (cryptographic mechanisms to protect passwords)
## Problem
`Azaion.Services/Security.cs::ToHash` does:
```csharp
public static string ToHash(this string str) =>
    Convert.ToBase64String(SHA384.HashData(Encoding.UTF8.GetBytes(str)));
```
Used at `UserService.cs:43` (registration) and `UserService.cs:115` (login validation). This is **unsalted**, **fast**, **single-pass** SHA-384.
Problems:
- Trivially attacked with rainbow tables (no salt).
- GPU bruteforce ≈ billions of guesses/sec.
- Identical passwords across users produce identical hashes (visible in DB dumps).
- Affects every `users` row in the central admin DB — including operator, admin, and CompanionPC device passwords.
## Outcome
- Replace `ToHash` with Argon2id (RFC 9106), salted, with conservative parameters (memory ≥ 64 MiB, iterations ≥ 3, parallelism ≥ 1).
- Each password hash stored in PHC string format: `$argon2id$v=19$m=65536,t=3,p=1$<salt-b64>$<hash-b64>` — self-describing, no separate salt column needed.
- **Lazy migration**: existing SHA-384 hashes stay in the DB. On next successful login (verified by re-hashing the submitted plaintext with SHA-384 and matching), the password is re-hashed with Argon2id and the row updated. Detect format by prefix (`$argon2id$` vs base64).
- For service accounts that never log in interactively (CompanionPC devices), provide an admin-side bulk-reset script that rotates their passwords during next provisioning cycle.
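The lazy-migration check described above can be sketched roughly as follows. This is a minimal illustration, not the shipped `Security.cs`: the class name `PasswordVerifySketch`, the `Verify` signature, and the `verifyArgon2id` delegate (a stand-in for whichever Argon2id library is chosen) are all assumptions made for the example.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical sketch; names are illustrative and not taken from the codebase.
public static class PasswordVerifySketch
{
    public const string Argon2idPrefix = "$argon2id$";

    // Legacy format: Base64(SHA-384(UTF-8 password)), unsalted, as the old ToHash produced.
    public static string LegacySha384(string plaintext) =>
        Convert.ToBase64String(SHA384.HashData(Encoding.UTF8.GetBytes(plaintext)));

    // Anything that does not carry the PHC prefix is treated as a legacy hash.
    public static bool IsLegacy(string stored) =>
        !stored.StartsWith(Argon2idPrefix, StringComparison.Ordinal);

    // Returns whether the password matched and whether the caller should
    // re-hash with Argon2id and persist the new value.
    public static (bool Ok, bool NeedsRehash) Verify(
        string plaintext, string stored, Func<string, string, bool> verifyArgon2id)
    {
        if (!IsLegacy(stored))
            return (verifyArgon2id(plaintext, stored), false);

        byte[] expected;
        try { expected = Convert.FromBase64String(stored); }
        catch (FormatException) { return (false, false); }

        byte[] actual = SHA384.HashData(Encoding.UTF8.GetBytes(plaintext));

        // Fixed-time comparison on the raw digest bytes; never string ==.
        bool ok = CryptographicOperations.FixedTimeEquals(actual, expected);
        return (ok, ok);
    }
}
```

A caller would write the new Argon2id hash back only when `NeedsRehash` is true, so a failed legacy login never touches the row.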
## Scope
### Included
- Add `Konscious.Security.Cryptography.Argon2` (or `Isopoh.Cryptography.Argon2`; both are pure C#) as an `Azaion.Services` dependency. Pin a specific version.
- Refactor `Security.cs`: `HashPassword(string)` returns PHC string; `VerifyPassword(string plaintext, string stored)` handles both formats and triggers re-hash for legacy SHA-384.
- Update `UserService.RegisterUser` to call `HashPassword`.
- Update `UserService.ValidateUser` to call `VerifyPassword` and on legacy-hash match, write the new Argon2id hash back transactionally before returning success.
- Update `_docs/05_security/security_report.md` to reflect the new state and the migration plan.
- Tests: hash format, verify happy path, verify legacy hash transparently re-hashes, verify wrong password fails for both formats, parameter sanity (m ≥ 64 MiB).
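As a rough illustration of the `HashPassword` item above, the sketch below assembles a PHC string from a fresh random salt. The `derive` delegate is a hypothetical stand-in for the Argon2id call of whichever library is pinned, the 16-byte salt length is an assumption, and `PhcFormatSketch` is an illustrative name.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical sketch; the Argon2id derivation itself is injected by the caller.
public static class PhcFormatSketch
{
    // PHC strings use standard Base64 with the '=' padding stripped.
    private static string B64(byte[] bytes) => Convert.ToBase64String(bytes).TrimEnd('=');

    // derive(passwordBytes, salt, memoryKiB, iterations, lanes) returns the raw hash bytes.
    public static string HashPassword(
        string plaintext,
        Func<byte[], byte[], int, int, int, byte[]> derive,
        int memoryKiB = 65536, int iterations = 3, int lanes = 1)
    {
        // Assumption: 16-byte random salt, unique per password.
        byte[] salt = RandomNumberGenerator.GetBytes(16);
        byte[] hash = derive(Encoding.UTF8.GetBytes(plaintext), salt, memoryKiB, iterations, lanes);

        // m is expressed in KiB, so 65536 corresponds to 64 MiB.
        return $"$argon2id$v=19$m={memoryKiB},t={iterations},p={lanes}${B64(salt)}${B64(hash)}";
    }
}
```

Because the parameters and salt travel inside the string, verification can later re-derive with the stored values even after the defaults are raised.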
### Excluded
- Forced password reset on next login (not required — lazy migration covers humans; service accounts via separate provisioning).
- Pepper / HSM-bound hashing — future hardening pass.
- Algorithm agility framework ("add bcrypt support too") — not needed; Argon2id is the answer for the next 5+ years.
## Acceptance Criteria
**AC-1: New users get Argon2id hashes**
Given a fresh registration
When the row is inspected
Then `password_hash` starts with `$argon2id$v=19$m=`… and parameter parses confirm m ≥ 65536, t ≥ 3, p ≥ 1.
**AC-2: Legacy SHA-384 hashes still validate**
Given a seed user with a SHA-384 hash from before this change
When they log in with the correct password
Then 200 — login succeeds.
**AC-3: Successful legacy login transparently re-hashes**
After AC-2, when the same user's row is re-read
Then `password_hash` is now in Argon2id PHC format. The same plaintext continues to validate.
**AC-4: Wrong password fails for both formats**
Given a user with a SHA-384 hash and a user with an Argon2id hash
When each tries to log in with the wrong password
Then both return 409 ExceptionEnum=WrongPassword (existing error semantics preserved).
**AC-5: Verify is constant-time**
Given any stored hash
When `VerifyPassword` is called with various wrong passwords of different lengths
Then timing variance is not observable to a remote attacker (rely on the library's constant-time comparator; do NOT use `string ==`).
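The parameter check in AC-1 could be automated with a small parser along these lines. This is a sketch under the assumption that stored hashes follow the `$argon2id$v=19$m=…,t=…,p=…$<salt>$<hash>` layout quoted in the Outcome section; `PhcParamsSketch` and its members are illustrative names, not existing test helpers.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper for asserting Argon2id cost parameters in a test.
public static class PhcParamsSketch
{
    public static (int MemoryKiB, int Iterations, int Lanes) Parse(string phc)
    {
        // The leading '$' yields an empty first field:
        // ["", "argon2id", "v=19", "m=..,t=..,p=..", salt, hash]
        string[] parts = phc.Split('$');
        if (parts.Length != 6 || parts[0] != "" || parts[1] != "argon2id" || parts[2] != "v=19")
            throw new FormatException("Not an Argon2id v=19 PHC string.");

        var values = new Dictionary<string, int>();
        foreach (string pair in parts[3].Split(','))
        {
            string[] kv = pair.Split('=');
            if (kv.Length != 2) throw new FormatException("Malformed parameter: " + pair);
            values[kv[0]] = int.Parse(kv[1], CultureInfo.InvariantCulture);
        }

        if (!values.TryGetValue("m", out int m) ||
            !values.TryGetValue("t", out int t) ||
            !values.TryGetValue("p", out int p))
            throw new FormatException("Missing m, t or p.");

        return (m, t, p);
    }

    // AC-1 minimums: m >= 65536 KiB (64 MiB), t >= 3, p >= 1.
    public static bool MeetsMinimums(string phc)
    {
        var (m, t, p) = Parse(phc);
        return m >= 65536 && t >= 3 && p >= 1;
    }
}
```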
## Blackbox Tests
| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|------------------------|-------------|-------------------|----------------|
| AC-1 | Fresh registration | Read users.password_hash | Starts with $argon2id$v=19$, m ≥ 65536 | NFT-SEC-NEW |
| AC-2 | Seed user with legacy SHA-384 hash | POST /login with correct pwd | 200 | — |
| AC-3 | After AC-2 | Read users.password_hash | Now Argon2id PHC format | — |
| AC-4 | Both hash formats | POST /login with wrong pwd | 409 WrongPassword | — |
| AC-5 | Various-length wrong pwds | Time the verify | No remotely-observable timing leak | — |
## Risks / Notes
- Argon2id with 64 MiB × 3 iterations costs ≈ 50-200 ms per verify on commodity hardware. Login latency increases noticeably (was ≈ 1 ms with SHA-384). This is the point — it makes bruteforce expensive. Document the new latency in security report.
- AZ-537 (rate limit + lockout) and this ticket touch the same code path (`UserService.ValidateUser`). Coordinate merge order — land Argon2id (this ticket) first since it changes the success path semantics, then AZ-537 layers on top.
@@ -0,0 +1,99 @@
# /login Rate Limit + Account Lockout
**Task**: AZ-537_login_rate_limit_lockout
**Name**: /login rate limit + account lockout
**Description**: Add ASP.NET Core sliding-window rate limiter on `/login` (per-IP and per-account) plus an account-lockout policy after 10 consecutive failures. Closes the unbounded credential-stuffing / password-spray surface.
**Complexity**: 3 points
**Dependencies**: None functionally; coordinate merge order with AZ-536 (both touch `UserService.ValidateUser`)
**Component**: Admin API + Services + DataAccess
**Tracker**: AZ-537
**Epic**: AZ-530
**CMMC ref**: AC.L2-3.1.8 (limit unsuccessful logon attempts)
## Problem
`Azaion.AdminApi/Program.cs:177` (`POST /login`) has no rate limiting and no account lockout. An attacker can:
- **Credential stuffing**: spray leaked username/password pairs from other breaches at unlimited RPS.
- **Password spray**: try one common password against every known account.
- **Targeted bruteforce**: hammer one account.
Nothing in the request path slows them down. Combined with the SHA-384 hashing flaw (sister ticket AZ-536), this is high-severity.
## Outcome
- ASP.NET Core built-in rate limiter (`AddRateLimiter`) attached to `/login`:
  - **Per-IP**: 10 attempts / 60 s (sliding window). Burst of 3.
  - **Per-account** (keyed by submitted email, normalised lowercase): 5 attempts / 5 min.
  - Both limits return 429 with `Retry-After` header when exceeded.
- **Account lockout**: after 10 consecutive failed logins for a single account, lock it for 15 min (configurable). Lockout state is stored on the `users` row (`lockout_until timestamptz`, `failed_login_count int`).
- Lockout takes precedence over rate limit (if account is locked, return 423 Locked even if request is within rate budget).
- A successful login zeroes `failed_login_count` and clears `lockout_until`.
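The per-IP part of the limiter above might be wired up roughly like this. It is a sketch, not the committed `Program.cs`: the policy name `login-per-ip`, the `SegmentsPerWindow` value, the 60 s fallback for `Retry-After`, and the placeholder `/login` handler are assumptions. The sliding-window options have no burst setting as far as this sketch is aware, so the "burst of 3" is not modelled, and the per-account limit is left out because the commit message describes it as DB-backed.

```csharp
using System;
using System.Globalization;
using System.Threading.RateLimiting;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.RateLimiting;
using Microsoft.Extensions.DependencyInjection;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddRateLimiter(options =>
{
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;

    // One sliding-window limiter per client IP: 10 permits per 60 s.
    options.AddPolicy("login-per-ip", httpContext =>
        RateLimitPartition.GetSlidingWindowLimiter(
            partitionKey: httpContext.Connection.RemoteIpAddress?.ToString() ?? "unknown",
            factory: _ => new SlidingWindowRateLimiterOptions
            {
                PermitLimit = 10,
                Window = TimeSpan.FromSeconds(60),
                SegmentsPerWindow = 6, // assumption: 10 s segments
                QueueLimit = 0,        // reject immediately instead of queueing
            }));

    options.OnRejected = (context, cancellationToken) =>
    {
        // Use limiter-provided metadata when present; otherwise fall back to the window length.
        int seconds = context.Lease.TryGetMetadata(MetadataName.RetryAfter, out TimeSpan retryAfter)
            ? (int)Math.Ceiling(retryAfter.TotalSeconds)
            : 60;
        context.HttpContext.Response.Headers.RetryAfter =
            seconds.ToString(CultureInfo.InvariantCulture);
        return ValueTask.CompletedTask;
    };
});

var app = builder.Build();
app.UseRateLimiter();

// Placeholder handler; the real endpoint calls into UserService.
app.MapPost("/login", () => Results.Ok()).RequireRateLimiting("login-per-ip");
app.Run();
```

Behind a reverse proxy, `RemoteIpAddress` is the proxy's address unless forwarded headers are processed first, which would collapse every client into one partition.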
## Scope
### Included
- New columns on `users`: `failed_login_count int default 0`, `lockout_until timestamptz null`.
- Migration script for the schema change.
- `RateLimiter` configuration in `Program.cs` (use built-in `AddSlidingWindowLimiter` for IP + account partitions).
- Update `UserService.ValidateUser` to:
  - Reject early with 423 if `lockout_until > now()`.
  - On wrong password: increment `failed_login_count`; if it hits the threshold, set `lockout_until = now() + 15min`.
  - On success: zero the counter and clear `lockout_until`.
- `appsettings.json` keys for thresholds (`Auth:RateLimit:*`, `Auth:Lockout:MaxAttempts`, `Auth:Lockout:DurationMinutes`).
- Tests: rate-limit triggers 429, lockout triggers 423 even for correct password, success resets counter, lockout auto-expires after duration.
- Audit log entries for each lockout event (security-relevant).
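The `ValidateUser` lockout transitions listed above can be expressed as a pure function, which keeps them easy to unit-test away from the database. This is a sketch with hypothetical names (`LockoutState`, `LoginOutcome`, `LockoutSketch.Apply`). It also assumes the counter is not reset when a lockout merely expires, which the spec does not state either way.

```csharp
using System;

// Hypothetical in-memory mirror of users.failed_login_count / users.lockout_until.
public sealed record LockoutState(int FailedLoginCount, DateTimeOffset? LockoutUntil);

public enum LoginOutcome { Success, WrongPassword, Locked }

public static class LockoutSketch
{
    public static (LoginOutcome Outcome, LockoutState Next) Apply(
        LockoutState state, bool passwordOk, DateTimeOffset now,
        int maxAttempts = 10, int durationMinutes = 15)
    {
        // An active lockout wins even when the password is correct (maps to 423).
        if (state.LockoutUntil is { } until && until > now)
            return (LoginOutcome.Locked, state);

        // Success zeroes the counter and clears the lockout.
        if (passwordOk)
            return (LoginOutcome.Success, new LockoutState(0, null));

        // Failure increments; reaching the threshold starts the lockout window.
        int failures = state.FailedLoginCount + 1;
        DateTimeOffset? lockUntil =
            failures >= maxAttempts ? now.AddMinutes(durationMinutes) : null;
        return (LoginOutcome.WrongPassword, new LockoutState(failures, lockUntil));
    }
}
```

In the real service the read-modify-write would need to be atomic per row, for example a single `UPDATE`, so that concurrent failures cannot under-count.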
### Excluded
- CAPTCHA challenge — not in scope; rate-limit + lockout is sufficient for CMMC L2.
- Distributed rate-limit store (Redis-backed limiter for multi-instance admin) — in-memory limiter is acceptable for current single-instance deploy. Document the upgrade path.
- Admin-side "unlock user" API: separate small ticket if needed; for now, either wait out the 15-min window or clear the lockout directly in the DB.
## Acceptance Criteria
**AC-1: Per-IP rate limit triggers 429**
Given 11 `/login` requests from the same IP within 60 s
When the 11th is sent
Then response is 429 with a `Retry-After` header.
**AC-2: Per-account rate limit triggers 429**
Given 6 `/login` requests for `alice@x.com` from 6 different IPs within 5 min
When the 6th is sent
Then response is 429 (account-key partition triggered).
**AC-3: Account lockout after 10 failures**
Given `alice@x.com` has 9 consecutive wrong-password attempts (across IPs / time)
When the 10th wrong attempt arrives
Then `users.lockout_until = now() + 15min`. Subsequent attempts — even with the correct password — return 423 Locked until that time.
**AC-4: Successful login resets the counter**
Given `alice@x.com` has 5 failed attempts
When she submits the correct password (within the rate-limit budget)
Then login succeeds and `failed_login_count = 0`, `lockout_until = NULL`.
**AC-5: Lockout auto-expires**
Given `alice@x.com` is locked with `lockout_until = T`
When she submits the correct password at `T + 1s`
Then login succeeds.
**AC-6: Audit log on lockout**
Given AC-3 fires
When the audit log is inspected
Then there is a `login_lockout` entry with `email`, `ip`, `timestamp`.
## Blackbox Tests
| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|------------------------|-------------|-------------------|----------------|
| AC-1 | 11 requests from same IP in 60s | 11th POST /login | 429 with Retry-After | NFT-SEC-NEW |
| AC-2 | 6 requests for alice from 6 IPs in 5min | 6th POST /login | 429 | NFT-SEC-NEW |
| AC-3 | 10 wrong-pwd attempts | 11th attempt with correct pwd | 423 Locked | NFT-SEC-NEW |
| AC-4 | 5 failed attempts | Successful login | counter=0, lockout_until=NULL | — |
| AC-5 | Locked until T | Login at T+1s with correct pwd | 200 | — |
| AC-6 | AC-3 fires | Inspect audit log | login_lockout entry present | — |
## Risks / Notes
- DoS-as-a-service: an attacker can lock out a known target's account by spraying wrong passwords from many IPs. The per-account counter intentionally allows this (CMMC requires lockout regardless of source). Mitigate operationally by clearing the lockout for the affected user (a direct DB update until an unlock API exists); do not weaken the rule.
- AZ-536 (Argon2id hashing) and this ticket both modify `UserService.ValidateUser`. Coordinate merge order — land AZ-536 first since it changes the success path semantics; this ticket layers on top.
@@ -0,0 +1,95 @@
# CORS — Drop HTTP Origin, Enforce HTTPS-Only + HSTS
**Task**: AZ-538_cors_https_only_hsts
**Name**: CORS — drop http origin, enforce HTTPS-only + HSTS
**Description**: Remove `http://admin.azaion.com` from the CORS allow-list (currently combined with `AllowCredentials()`, which permits credentialed traffic over cleartext), enable HSTS in non-Development envs, and add HTTPS redirection as defence in depth.
**Complexity**: 2 points
**Dependencies**: None
**Component**: Admin API
**Tracker**: AZ-538
**Epic**: AZ-530
**CMMC ref**: SC.L2-3.13.8 (encrypt CUI in transit), SC.L2-3.13.11 (FIPS-validated cryptography)
## Problem
`Azaion.AdminApi/Program.cs` lines 117-127:
```csharp
builder.Services.AddCors(options =>
{
    options.AddPolicy("AdminCorsPolicy", policy =>
    {
        policy.WithOrigins("https://admin.azaion.com", "http://admin.azaion.com")
              .AllowAnyMethod()
              .AllowAnyHeader()
              .AllowCredentials();
    });
});
```
Allowing the `http://` origin together with `AllowCredentials()` means a page served over cleartext from `http://admin.azaion.com` is trusted to make credentialed requests (cookies / `Authorization` headers) to the admin API and to read the responses. Any LAN MITM (coffee shop wifi, compromised AP, ARP spoof) can tamper with that cleartext page and use it to capture or ride the session.
## Outcome
- Drop `"http://admin.azaion.com"` from `WithOrigins`. Only `https://admin.azaion.com` remains.
- Enable HSTS via `app.UseHsts()` in non-Development environments. `max-age=31536000; includeSubDomains; preload`.
- Add `app.UseHttpsRedirection()` to bounce any cleartext request to HTTPS at the protocol layer (defence in depth — even if someone re-adds the http origin by accident, the redirect kicks in first).
- Verify dev workflow: any contributor who relied on `http://admin.azaion.com` locally must switch to `https://localhost:<port>` (devcert is already in `secrets/`).
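Taken together, the changes above might look roughly like this in `Program.cs`. It is a sketch rather than the committed file: the middleware ordering and the `/health/live` placeholder handler are assumptions, while the policy name and origin come from the existing code.

```csharp
using System;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.HttpsPolicy;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddCors(options =>
{
    options.AddPolicy("AdminCorsPolicy", policy =>
    {
        // Only the https origin remains on the allow-list.
        policy.WithOrigins("https://admin.azaion.com")
              .AllowAnyMethod()
              .AllowAnyHeader()
              .AllowCredentials();
    });
});

builder.Services.AddHsts(options =>
{
    options.MaxAge = TimeSpan.FromDays(365); // 31536000 seconds
    options.IncludeSubDomains = true;
    options.Preload = true;
});

var app = builder.Build();

// Gated so that plain http://localhost keeps working under dotnet watch.
if (!app.Environment.IsDevelopment())
{
    app.UseHsts();
    app.UseHttpsRedirection();
}

app.UseCors("AdminCorsPolicy");

// Placeholder endpoint standing in for the real health check.
app.MapGet("/health/live", () => "ok");
app.Run();
```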
## Scope
### Included
- One-line `WithOrigins` change.
- `UseHsts` + `UseHttpsRedirection` in `Program.cs`, gated to non-Development env to keep `dotnet watch` flow on http://localhost intact.
- Update `_docs/05_security/security_report.md` (close the finding).
- Update `_docs/02_document/architecture.md` if it documents the http allowance.
- Smoke test: cleartext origin returns CORS rejection in browser preflight.
### Excluded
- mTLS between services — separate ticket, larger scope.
- Cert pinning at clients — separate ticket.
- TLS 1.3 enforcement: Kestrel defers to the operating system's default TLS protocol set, which negotiates TLS 1.3 when the client supports it; no action needed in this ticket.
## Acceptance Criteria
**AC-1: http origin rejected by CORS**
Given a browser preflight `OPTIONS /login` with `Origin: http://admin.azaion.com`
When the response is inspected
Then no `Access-Control-Allow-Origin` header is returned (CORS denies the request).
**AC-2: https origin still works**
Given a browser preflight `OPTIONS /login` with `Origin: https://admin.azaion.com`
When the response is inspected
Then `Access-Control-Allow-Origin: https://admin.azaion.com` is present and `Access-Control-Allow-Credentials: true`.
**AC-3: HSTS header on prod responses**
Given the app runs with `ASPNETCORE_ENVIRONMENT=Production`
When any HTTPS request returns
Then response includes `Strict-Transport-Security: max-age=31536000; includeSubDomains; preload`.
**AC-4: HTTP requests redirect to HTTPS**
Given the app runs with `ASPNETCORE_ENVIRONMENT=Production`
When `GET http://admin.azaion.com/health/live` is called
Then response is 307 to `https://admin.azaion.com/health/live`.
**AC-5: Development env unchanged**
Given `ASPNETCORE_ENVIRONMENT=Development`
When `GET http://localhost:8080/health/live` is called
Then 200 (no HTTPS redirect, no HSTS).
## Blackbox Tests
| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|------------------------|-------------|-------------------|----------------|
| AC-1 | Origin: http://admin.azaion.com | OPTIONS preflight | No ACAO header | NFT-SEC-NEW |
| AC-2 | Origin: https://admin.azaion.com | OPTIONS preflight | ACAO present, ACAC: true | — |
| AC-3 | Production env | Any HTTPS response | HSTS header present | NFT-SEC-NEW |
| AC-4 | Production env | GET http:// URL | 307 to https:// | — |
| AC-5 | Development env | GET http://localhost:8080/health/live | 200, no HSTS | — |
## Risks / Notes
- If any deployed UI build is pinned to `http://admin.azaion.com`, this change will break it. Verify the UI build's API base URL before merging.
- If a reverse proxy / load balancer terminates TLS upstream, ensure `app.UseForwardedHeaders` is correctly configured so `UseHttpsRedirection` doesn't loop. Document expected header config in `_docs/04_deploy/`.
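For the reverse-proxy case in the last note, a forwarded-headers setup could look something like the sketch below. The proxy address `10.0.0.10` is purely hypothetical, and which headers the proxy actually sets is an assumption that has to be checked against the real deployment.

```csharp
using System.Net;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.HttpOverrides;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

var builder = WebApplication.CreateBuilder(args);

builder.Services.Configure<ForwardedHeadersOptions>(options =>
{
    // Trust X-Forwarded-For / X-Forwarded-Proto so the app sees the original scheme and client IP.
    options.ForwardedHeaders =
        ForwardedHeaders.XForwardedFor | ForwardedHeaders.XForwardedProto;

    // Hypothetical proxy address; replace with the real TLS terminator.
    options.KnownProxies.Add(IPAddress.Parse("10.0.0.10"));
});

var app = builder.Build();

// Must run before the HTTPS middleware, otherwise every proxied request
// looks like cleartext and gets redirected again.
app.UseForwardedHeaders();

if (!app.Environment.IsDevelopment())
{
    app.UseHsts();
    app.UseHttpsRedirection();
}

app.Run();
```

Processing `X-Forwarded-For` here also matters for AZ-537, since the per-IP rate limiter would otherwise key every request on the proxy's address.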