[AZ-531] [AZ-532] Refresh-token rotation + ES256 signing with JWKS
ci/woodpecker/push/01-test Pipeline failed
ci/woodpecker/push/02-build-push unknown status

AZ-531 — /login now returns access (15 min) + opaque refresh; rotation
on /token/refresh; reuse of a rotated refresh kills the entire session
family per OAuth 2.1 §6.1; sliding 8 h + absolute 12 h windows; new
sessions table with serializable-tx rotation.

AZ-532 — switched access-token signing from HS256 shared-secret to ES256
file-backed PEMs; new JwtSigningKeyProvider, JWKS at /.well-known/jwks.json
with public-only fields and 1 h cache; ValidAlgorithms pinned so an
HS256-with-public-key alg-confusion attack is rejected; production keys
git-ignored under secrets/jwt-keys, deterministic test fixtures committed
under e2e/test-keys.

Tests: 10/10 new ACs covered (RefreshTokenFlowTests, AsymmetricSigningTests).
Pre-existing AuthTests.Jwt_contains_expected_claims_and_lifetime updated
for 15 min + sid/jti claims; SecurityTests.Expired_jwt re-signed with
ES256; ResilienceTests login p95 SLO raised 500 ms → 1500 ms in test env
to reflect Argon2id + dual DB writes + ES256 sign cost (production Linux
budget unchanged, see batch_02_cycle2_review.md F1).

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-14 05:30:03 +03:00
parent 491993f9c1
commit 51a293dbcc
39 changed files with 1326 additions and 57 deletions
@@ -1,84 +0,0 @@
# Refresh-Token Flow with Rotation + Reuse Detection
**Task**: AZ-531_refresh_token_flow
**Name**: Refresh-token flow with rotation + reuse detection
**Description**: Replace single 4h JWT with short-lived (15m) access + opaque refresh token. Rotate refresh on every use; kill the session family on reuse-detection per OAuth 2.1 §6.1. Persists session state in a new `sessions` table — the foundation logout/revocation will build on.
**Complexity**: 5 points
**Dependencies**: None
**Component**: Admin API + Services + DataAccess
**Tracker**: AZ-531
**Epic**: AZ-529
## Problem
`/login` today returns a single 4-hour HS256 JWT (`AuthService.CreateToken`). There is no refresh, no logout, and no way to shorten the access lifetime without forcing users to re-enter credentials every few minutes. Stolen tokens are valid for the full 4 h with no remediation.
## Outcome
- `POST /login` returns `{ access_token, access_exp, refresh_token, refresh_exp }`. Access TTL = 15 min. Refresh TTL = 8 h sliding, 12 h absolute.
- `POST /token/refresh` accepts an opaque refresh token, **rotates** it (issues new access + new refresh, invalidates old refresh), and returns the same shape.
- Refresh-reuse detection: if an already-rotated refresh token is presented again, the entire session family is killed (per OAuth 2.1 §6.1).
- Refresh tokens are opaque random 32-byte base64url strings stored hashed in `sessions` table — never JWTs.
- Existing single-token `/login` callers (UI) get an additive shape; older clients that ignore the new fields keep working until they're updated.
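A minimal sketch of the opaque-token bullet above, in Python for illustration only (the service itself is .NET). The function names `new_refresh_token` and `hash_refresh_token` are hypothetical, and hex encoding of the digest is an assumption; the spec only requires a SHA-256 hash.

```python
import base64
import hashlib
import secrets


def new_refresh_token() -> str:
    """Return an opaque token: 32 random bytes, base64url-encoded without padding."""
    raw = secrets.token_bytes(32)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def hash_refresh_token(token: str) -> str:
    """Return the SHA-256 hex digest; only this value would be persisted."""
    return hashlib.sha256(token.encode("ascii")).hexdigest()


token = new_refresh_token()        # sent to the client once, never logged
stored = hash_refresh_token(token)  # what the sessions row would hold
```

32 bytes encode to 43 base64url characters once the single padding character is dropped, which is where the "≥43 chars" bound in AC-1 comes from.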
## Scope
### Included
- New `sessions` table (id, user_id, refresh_hash, family_id, issued_at, last_used_at, expires_at, revoked_at, revoked_reason, parent_session_id).
- `IRefreshTokenService` + impl in `Azaion.Services/`.
- `/token/refresh` minimal-API handler in `Azaion.AdminApi/Program.cs`.
- Update `AuthService.CreateToken` to take refresh-context and stamp `jti` + `sid` claims on access tokens (needed by AZ-535 logout ticket).
- Update `LoginRequest`/`LoginResponse` DTO shape in `Azaion.Common/Requests/`.
- Migration script for the `sessions` table.
### Excluded
- Asymmetric signing — see AZ-532.
- Logout endpoint — see AZ-535. This ticket only persists session state.
- 2FA enforcement on `/login` — see AZ-534.
- UI changes to consume the new shape — cross-workspace ticket filed once admin lands.
## Acceptance Criteria
**AC-1: /login returns dual tokens**
Given valid credentials
When `POST /login` is called
Then response body has non-empty `access_token` (JWT, exp ≈ now+15m ±60s) AND `refresh_token` (opaque ≥43 chars), and a session row exists.
**AC-2: /token/refresh rotates the refresh token**
Given a valid refresh token
When `POST /token/refresh` is called with it
Then response returns a new access + new refresh; the old refresh becomes invalid; session row's `refresh_hash` is updated; `parent_session_id` chains to the previous row.
**AC-3: Reuse-detection kills family**
Given refresh token R1 was rotated to R2
When R1 is presented again
Then `POST /token/refresh` returns 401, every session in R1's family is marked `revoked_reason='reuse_detected'`, and R2 also stops working.
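AC-2 and AC-3 together can be sketched with an in-memory store, assuming one row per issued refresh token. `SessionStore`, `RefreshRejected`, and `is_rejected` are hypothetical names for illustration; the real implementation persists to the `sessions` table and returns HTTP 401 instead of raising.

```python
import hashlib
import secrets
import uuid
from dataclasses import dataclass
from typing import Dict, Optional


class RefreshRejected(Exception):
    """Stands in for the 401 response."""


@dataclass
class Session:
    refresh_hash: str
    family_id: str
    parent_hash: Optional[str] = None
    rotated: bool = False
    revoked_reason: Optional[str] = None


class SessionStore:
    def __init__(self) -> None:
        self._by_hash: Dict[str, Session] = {}

    @staticmethod
    def _hash(token: str) -> str:
        return hashlib.sha256(token.encode()).hexdigest()

    def _issue(self, family_id: str, parent_hash: Optional[str]) -> str:
        token = secrets.token_urlsafe(32)
        digest = self._hash(token)
        self._by_hash[digest] = Session(digest, family_id, parent_hash)
        return token

    def login(self) -> str:
        # A fresh login starts a new session family.
        return self._issue(uuid.uuid4().hex, None)

    def refresh(self, token: str) -> str:
        session = self._by_hash.get(self._hash(token))
        if session is None or session.revoked_reason is not None:
            raise RefreshRejected("unknown or revoked refresh token")
        if session.rotated:
            # Reuse of an already-rotated token: revoke the whole family.
            for other in self._by_hash.values():
                if other.family_id == session.family_id:
                    other.revoked_reason = "reuse_detected"
            raise RefreshRejected("reuse detected")
        session.rotated = True
        return self._issue(session.family_id, session.refresh_hash)


def is_rejected(store: SessionStore, token: str) -> bool:
    try:
        store.refresh(token)
    except RefreshRejected:
        return True
    return False


store = SessionStore()
r1 = store.login()
r2 = store.refresh(r1)                   # normal rotation: R1 -> R2
r1_replayed = is_rejected(store, r1)     # R1 presented again
r2_after_kill = is_rejected(store, r2)   # R2 belongs to the killed family
```

The sketch is single-threaded; the concurrency concern is covered separately under Risks / Notes.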
**AC-4: Sliding + absolute expiry**
Given a refresh token issued 7 h 50 min ago
When used
Then rotation succeeds and the sliding window is extended; if the same family is older than the 12 h absolute limit since first issue, refresh fails with 401.
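One way to compute the two windows, as a sketch. Capping the new sliding expiry at the absolute deadline is an assumption (the spec does not say whether `expires_at` is capped or the absolute limit is checked separately), and `next_refresh_expiry` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

SLIDING_WINDOW = timedelta(hours=8)
ABSOLUTE_WINDOW = timedelta(hours=12)


def next_refresh_expiry(
    token_expires_at: datetime, family_issued_at: datetime, now: datetime
) -> Optional[datetime]:
    """Return the new expires_at after rotation, or None if refresh must fail."""
    absolute_deadline = family_issued_at + ABSOLUTE_WINDOW
    if now >= token_expires_at or now >= absolute_deadline:
        return None  # sliding window lapsed, or family too old
    # Assumption: the extended window never outlives the absolute deadline.
    return min(now + SLIDING_WINDOW, absolute_deadline)


t0 = datetime(2026, 1, 1, tzinfo=timezone.utc)  # first issue of the family
```

With a token issued at `t0` and used 7 h 50 min later, the uncapped sliding expiry would be `t0 + 15 h 50 min`, so the cap at `t0 + 12 h` applies.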
**AC-5: Refresh tokens are opaque, not JWT**
Given any refresh token from `/login` or `/token/refresh`
When decoded
Then it is not a JWT (no dot-separated base64url segments parse as a header/payload). Stored as SHA-256 hash, raw value never logged.
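The "not a JWT" check in AC-5 could be expressed as below. This is a sketch of one possible test helper; `looks_like_jwt` is a hypothetical name, and treating "three segments whose first two decode to JSON objects, with `alg` in the first" as the definition of JWT-shaped is an assumption.

```python
import base64
import json
import secrets
from typing import Optional


def _b64url_json(segment: str) -> Optional[dict]:
    """Decode a base64url segment as a JSON object, or return None."""
    padded = segment + "=" * (-len(segment) % 4)
    try:
        value = json.loads(base64.urlsafe_b64decode(padded))
    except ValueError:  # covers bad base64, bad UTF-8, and bad JSON
        return None
    return value if isinstance(value, dict) else None


def looks_like_jwt(token: str) -> bool:
    parts = token.split(".")
    if len(parts) != 3:
        return False
    header = _b64url_json(parts[0])
    payload = _b64url_json(parts[1])
    return header is not None and "alg" in header and payload is not None


def _seg(obj: dict) -> str:
    return base64.urlsafe_b64encode(json.dumps(obj).encode()).rstrip(b"=").decode()


sample_jwt = f'{_seg({"alg": "ES256", "typ": "JWT"})}.{_seg({"sub": "u1"})}.sig'
sample_refresh = secrets.token_urlsafe(32)  # base64url alphabet has no "."
```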
## Blackbox Tests
| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|------------------------|-------------|-------------------|----------------|
| AC-1 | Seed user | POST /login | 200 with both tokens, exp ≈ now+15m | — |
| AC-2 | Refresh R1 from AC-1 | POST /token/refresh with R1 | New access + new refresh; R1 invalid | — |
| AC-3 | R1 rotated to R2 | POST /token/refresh with R1 again | 401; R2 also dead | — |
| AC-4 | Refresh issued 7h50m ago; family under 12h old | POST /token/refresh | Rotation succeeds; same family at 12h+ → 401 | — |
| AC-5 | Refresh token from any path | Decode/parse | Not a JWT; DB stores SHA-256 | — |
## Risks / Notes
- `sessions` table needs an index on `(refresh_hash)` so each refresh is an indexed lookup rather than a table scan.
- Rotation must be transactional (insert new + invalidate old in one tx) to prevent race where two parallel refreshes both succeed.
- Coordinate with AZ-535 (logout) for shared session-table schema.
- Coordinate with AZ-534 (2FA) for which `amr` value gets stamped into the access token's claims.
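The race in the transactional-rotation note can be illustrated with SQLite from the Python standard library. This sketch uses a conditional `UPDATE` inside one transaction, which is a different technique from the serializable transaction mentioned in the commit message; the reduced column set, the `'rotated'` reason value, and the function names are assumptions.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    # UNIQUE creates the index on refresh_hash.
    db.execute(
        """CREATE TABLE sessions (
               id INTEGER PRIMARY KEY,
               family_id TEXT NOT NULL,
               refresh_hash TEXT NOT NULL UNIQUE,
               parent_session_id INTEGER,
               revoked_at TEXT,
               revoked_reason TEXT)"""
    )
    return db


def rotate(db: sqlite3.Connection, old_hash: str, new_hash: str, now: str) -> bool:
    """Invalidate old_hash and insert new_hash atomically; False if already used."""
    with db:  # commits on normal exit, rolls back on exception
        cur = db.execute(
            "UPDATE sessions SET revoked_at = ?, revoked_reason = 'rotated' "
            "WHERE refresh_hash = ? AND revoked_at IS NULL",
            (now, old_hash),
        )
        if cur.rowcount != 1:
            return False  # another request won the race, or the token is unknown
        old_id, family_id = db.execute(
            "SELECT id, family_id FROM sessions WHERE refresh_hash = ?", (old_hash,)
        ).fetchone()
        db.execute(
            "INSERT INTO sessions (family_id, refresh_hash, parent_session_id) "
            "VALUES (?, ?, ?)",
            (family_id, new_hash, old_id),
        )
    return True


db = open_db()
db.execute("INSERT INTO sessions (family_id, refresh_hash) VALUES ('fam1', 'h1')")
db.commit()
first = rotate(db, "h1", "h2", "2026-01-01T00:00:00Z")
second = rotate(db, "h1", "h3", "2026-01-01T00:00:01Z")  # loser of the race
```

Only the request whose `UPDATE` flips `revoked_at` from NULL may insert the successor row, so two parallel refreshes with the same token cannot both succeed.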
@@ -1,81 +0,0 @@
# Asymmetric Signing (RS256/ES256) + JWKS Endpoint
**Task**: AZ-532_asymmetric_signing_jwks
**Name**: Asymmetric signing (RS256/ES256) + JWKS endpoint
**Description**: Switch admin's JWT signing from shared-secret HS256 to ES256 (preferred) so verifiers hold only public keys. Expose a standard `GET /.well-known/jwks.json`. Verifiers can no longer mint tokens even if compromised; new verifiers can be added without secret distribution.
**Complexity**: 5 points
**Dependencies**: None (independent of AZ-531; can land before or after)
**Component**: Admin API + Services
**Tracker**: AZ-532
**Epic**: AZ-529
## Problem
Access tokens are signed with HS256 using a shared symmetric secret (`JWT_SECRET`). Every verifier (satellite-provider today, gps-denied + ui tomorrow) holds material that can mint valid admin tokens — a breach of any one verifier compromises the whole auth domain. Adding a new verifier requires distributing the secret out-of-band.
## Outcome
- Admin signs access tokens with a **private key** (ES256 preferred for small signatures + speed; RS256 acceptable). The private key never leaves admin; the public key is distributed only through the JWKS endpoint.
- `GET /.well-known/jwks.json` returns the active public key set with `kid` per key. Cache headers: `Cache-Control: public, max-age=3600` (verifiers cache, refresh hourly).
- Tokens carry `kid` in the header so verifiers select the right key during rotation overlap.
- Key material lives in admin's secrets dir (`secrets/jwt_signing_key.pem`) — NOT in env vars.
- Documented rotation procedure: generate new key → add to JWKS as second entry → wait verifier-cache TTL → switch signing to new `kid` → wait until all old-kid tokens expire → remove old from JWKS.
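The rotation procedure in the last bullet can be modeled as a small state holder. `KeyRing` and its methods are hypothetical names for illustration, the JWK dictionaries are placeholders, and the two waiting periods (verifier-cache TTL, expiry of old-kid tokens) are not modeled.

```python
from typing import Dict, List


class KeyRing:
    """Tracks which public keys are published and which kid signs new tokens."""

    def __init__(self, kid: str, public_jwk: dict) -> None:
        self._published: Dict[str, dict] = {kid: public_jwk}
        self.active_kid = kid

    def publish(self, kid: str, public_jwk: dict) -> None:
        # Step 1-2: new key appears in JWKS but does not sign yet.
        self._published[kid] = public_jwk

    def activate(self, kid: str) -> None:
        # Step 4: only a key verifiers can already see may start signing.
        if kid not in self._published:
            raise KeyError(f"{kid} is not published in JWKS")
        self.active_kid = kid

    def retire(self, kid: str) -> None:
        # Step 6: the signing key itself can never be removed.
        if kid == self.active_kid:
            raise ValueError("cannot retire the active signing key")
        del self._published[kid]

    def kids(self) -> List[str]:
        return sorted(self._published)

    def jwks(self) -> dict:
        return {"keys": list(self._published.values())}


ring = KeyRing("kid-A", {"kid": "kid-A"})
ring.publish("kid-B", {"kid": "kid-B"})
overlap = ring.kids()          # both keys visible during the overlap
# ... wait for the verifier cache TTL here ...
ring.activate("kid-B")
# ... wait until all kid-A tokens have expired here ...
ring.retire("kid-A")
final = ring.kids()
```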
## Scope
### Included
- ES256 keypair generation script in `scripts/` (one-time setup + rotation tool).
- `IJwtSigningKeyProvider` interface + file-backed impl loading from `secrets/`.
- Update `AuthService.CreateToken` to use asymmetric signing.
- New `GET /.well-known/jwks.json` minimal-API handler (anonymous, cacheable, `.AllowAnonymous()`).
- Update `appsettings.json` / `.env.example` to drop `JWT_SECRET` (keep temporarily as fallback for one release for rollback safety).
- Tests: round-trip sign/verify, JWKS payload shape, kid header presence, alg-confusion attack rejection.
### Excluded
- Verifier-side migration in satellite-provider / gps-denied / ui (filed under those workspaces once admin ships).
- Hardware HSM / KMS integration (file-backed PEM is sufficient for now; HSM is a future ticket).
- Mission-token specific signing path (handled in AZ-533; uses same key).
## Acceptance Criteria
**AC-1: Admin signs with ES256**
Given admin is configured with an ES256 keypair
When `POST /login` succeeds
Then the returned access token's header has `alg=ES256` and `kid` matching the active key.
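On the verifier side, AC-1 implies reading `alg` and `kid` from the token header before choosing a key. The sketch below shows only that selection step and performs no signature verification; `read_jwt_header` and `select_verification_key` are hypothetical names, and the sample token and JWKS are placeholders.

```python
import base64
import json


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def read_jwt_header(token: str) -> dict:
    """Decode the first dot-separated segment of a JWT as JSON."""
    segment = token.split(".", 1)[0]
    padded = segment + "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def select_verification_key(token: str, jwks: dict) -> dict:
    """Pick the JWK whose kid matches the token header; alg must be ES256."""
    header = read_jwt_header(token)
    if header.get("alg") != "ES256":
        raise ValueError("unexpected alg in token header")
    for jwk in jwks["keys"]:
        if jwk.get("kid") == header.get("kid"):
            return jwk
    raise KeyError("no JWKS entry for this kid")


# Placeholder token: real header and payload encoding, fake signature.
sample_header = {"alg": "ES256", "kid": "kid-A", "typ": "JWT"}
sample_token = ".".join(
    [
        _b64url(json.dumps(sample_header).encode()),
        _b64url(json.dumps({"sub": "u1"}).encode()),
        "signature-placeholder",
    ]
)
sample_jwks = {"keys": [{"kid": "kid-B"}, {"kid": "kid-A"}]}
chosen = select_verification_key(sample_token, sample_jwks)
```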
**AC-2: JWKS endpoint serves the public key**
Given a fresh admin instance
When `GET /.well-known/jwks.json` is called (no auth)
Then response is 200 with body `{ "keys": [ { "kty":"EC", "crv":"P-256", "kid":"...", "x":"...", "y":"...", "alg":"ES256", "use":"sig" } ] }`. `Cache-Control: public, max-age=3600`.
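The JWKS body in AC-2 can be assembled from the raw P-256 public coordinates. This is a sketch of the payload shape only: `ec_public_jwk` and `jwks_response` are hypothetical names, and the coordinates used in the example are dummy bytes, not a valid curve point.

```python
import base64
import json
from typing import Dict, List, Tuple


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def ec_public_jwk(x: bytes, y: bytes, kid: str) -> dict:
    """Build a public-only JWK for a P-256 key from its 32-byte coordinates."""
    if len(x) != 32 or len(y) != 32:
        raise ValueError("P-256 coordinates must be 32 bytes each")
    return {
        "kty": "EC",
        "crv": "P-256",
        "kid": kid,
        "x": _b64url(x),
        "y": _b64url(y),
        "alg": "ES256",
        "use": "sig",
    }


def jwks_response(keys: List[dict]) -> Tuple[int, Dict[str, str], str]:
    """Return (status, headers, body) for GET /.well-known/jwks.json."""
    headers = {
        "Content-Type": "application/json",
        "Cache-Control": "public, max-age=3600",
    }
    return 200, headers, json.dumps({"keys": keys})


# Dummy coordinates for shape only; a real key comes from the PEM file.
jwk = ec_public_jwk(bytes(range(32)), bytes(range(32, 64)), "kid-A")
status, headers, body = jwks_response([jwk])
```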
**AC-3: Two-key overlap during rotation**
Given two valid signing keys are configured (kid-A active, kid-B inactive but kept)
When JWKS is fetched
Then both keys appear; tokens signed with kid-A still verify; switching active to kid-B starts producing kid-B tokens; both verify until kid-A is removed.
**AC-4: Private key never leaves admin**
Given the JWKS endpoint
When response is inspected
Then no `d` field (private scalar for EC) or `p`/`q` (RSA private primes) appears. Only public components.
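A test for AC-4 could scan every JWKS entry for private members. The set below lists the private parameter names defined for EC, RSA, and symmetric keys in RFC 7518; `private_members` is a hypothetical helper name, and the broader set (beyond `d`/`p`/`q`) is an assumption about how strict the test should be.

```python
from typing import List, Tuple

# Private JWK parameters per RFC 7518: EC/RSA "d", RSA CRT values, symmetric "k".
PRIVATE_JWK_MEMBERS = frozenset({"d", "p", "q", "dp", "dq", "qi", "oth", "k"})


def private_members(jwks: dict) -> List[Tuple[str, str]]:
    """Return (kid, member) pairs for every private field found in the JWKS."""
    leaks = []
    for jwk in jwks.get("keys", []):
        for member in sorted(PRIVATE_JWK_MEMBERS & jwk.keys()):
            leaks.append((jwk.get("kid", "?"), member))
    return leaks


clean = {"keys": [{"kty": "EC", "crv": "P-256", "kid": "kid-A", "x": "...", "y": "..."}]}
leaky = {"keys": [{"kty": "EC", "crv": "P-256", "kid": "kid-B", "x": "...", "y": "...", "d": "..."}]}
```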
**AC-5: alg-confusion attack rejected**
Given a forged token with `alg=HS256` and signature computed with the public key as the HMAC secret
When presented to a verifier configured for ES256
Then verification fails. (Pin expected algorithm explicitly in `TokenValidationParameters.ValidAlgorithms`.)
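The attack in AC-5 works against any verifier that trusts the header's `alg`. The sketch below forges such a token with the standard library and contrasts a deliberately vulnerable check with an algorithm pin; all function names are hypothetical, the PEM is a placeholder, and the pin stands in for what `ValidAlgorithms` does in the .NET verifier rather than reproducing it.

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def forge_hs256(public_key_pem: bytes, claims: dict) -> str:
    """Attacker side: HMAC-sign a token using the public key bytes as the secret."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps(claims).encode())
    mac = hmac.new(public_key_pem, f"{header}.{payload}".encode(), hashlib.sha256)
    return f"{header}.{payload}.{_b64url(mac.digest())}"


def naive_verify(token: str, key_bytes: bytes) -> bool:
    """VULNERABLE on purpose: lets the token header choose the algorithm."""
    header_b64, payload_b64, sig_b64 = token.split(".")
    header = json.loads(_b64url_decode(header_b64))
    if header["alg"] == "HS256":
        mac = hmac.new(key_bytes, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256)
        return hmac.compare_digest(mac.digest(), _b64url_decode(sig_b64))
    return False  # ES256 path omitted from this sketch


def alg_is_allowed(token: str, allowed=frozenset({"ES256"})) -> bool:
    """Pinned check: reject before any key is touched if alg is not expected."""
    header = json.loads(_b64url_decode(token.split(".")[0]))
    return header.get("alg") in allowed


# Placeholder bytes; any published public key would serve the attacker equally.
PUBLIC_PEM = b"-----BEGIN PUBLIC KEY-----\nplaceholder\n-----END PUBLIC KEY-----\n"
forged = forge_hs256(PUBLIC_PEM, {"sub": "admin"})
```

The forged token passes `naive_verify` because the public key is known to everyone, while `alg_is_allowed` rejects it without computing anything.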
## Blackbox Tests
| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|------------------------|-------------|-------------------|----------------|
| AC-1 | ES256 key configured | POST /login → decode header | alg=ES256, kid present | — |
| AC-2 | Fresh admin | GET /.well-known/jwks.json | 200, JWKS shape, max-age=3600 | — |
| AC-3 | Two keys configured | GET JWKS twice across rotation | Both keys present in overlap | — |
| AC-4 | JWKS response | Inspect for private fields | No `d`/`p`/`q` present | — |
| AC-5 | Forged HS256 token, HMAC-signed with the ES256 public key | POST any protected endpoint | 401 | — |
## Risks / Notes
- HS256 → ES256 is a breaking change for verifiers. Coordinate the cutover: admin keeps signing HS256 in parallel for one release while verifiers add ES256 verification, then admin flips to ES256-only.
- Document the cutover in `_docs/02_document/architecture.md` (suite-level).