docs+src: complete Steps 1-3 outcomes + auth re-sync baseline

This commit captures everything produced during autodev existing-code
Steps 1 (Document), 2 (Architecture Baseline Scan), and 3 (Test Spec),
together with the targeted auth + CORS re-sync triggered on 2026-05-14
when codebase drift was detected at Step 4 entry. None of this work was
previously committed.

Step 1 (Document) — 50+ _docs/02_document/ files: problem, solution,
architecture, system flows, glossary, module-layout, per-component
specs (01..06), modules, deployment, diagrams, data model, FINAL
report, verification log, discovery.

Step 2 (Architecture Baseline) — architecture_compliance_baseline.md.
Verdict PASS_WITH_WARNINGS (0 Critical, 0 High, 1 Medium, 2 Low). No
High/Critical findings; auto-chained to Step 3 per existing-code flow.

Step 3 (Test Spec) — _docs/02_document/tests/* (67 scenarios across
blackbox, security, resilience, resource-limit, performance), plus
e2e/docker-compose.test.yml, e2e/seed/run.sh, scripts/run-tests.sh,
scripts/run-performance-tests.sh. Coverage 88% over the active scope
(40 of 45 items covered, 6 RB-deferred, 5 documented-as-uncovered).

Targeted auth + CORS re-sync — replaces the deleted in-house token
issuer with a JWKS-verifier model. AuthController and TokenService
removed; JwtExtensions switched from HS256 symmetric to ES256 over
admin's JWKS. ConfigurationResolver and CorsConfigurationValidator
added under src/Infrastructure/. ADR-002 and ADR-006 retired; SEC-01,
SEC-02, SEC-03 marked Closed. One new testability risk recorded in
architecture.md Open Risks Section 6 (JWKS HTTPS gating).

Source changes:
- src/Auth/JwtExtensions.cs (modified) — ES256, JWKS, alg pinning
- src/Program.cs (modified) — DI wiring for ConfigurationResolver
  and CorsConfigurationValidator
- src/Controllers/AuthController.cs (deleted) — no in-service issuance
- src/Services/TokenService.cs (deleted) — same
- src/Infrastructure/ConfigurationResolver.cs (new)
- src/Infrastructure/CorsConfigurationValidator.cs (new)
- .env.example (new) — required env var documentation
- .gitignore (updated)

Cross-repo coordination: _docs/cross-repo/flights_h1_h2_h3_change_spec
captures the change-spec for downstream services that consumed the
now-deleted /auth endpoints.

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-14 20:19:05 +03:00
parent 08eadc1158
commit 03f879206e
66 changed files with 6006 additions and 133 deletions
@@ -0,0 +1,134 @@
# Resilience Tests
### NFT-RES-01: RabbitMQ broker outage during create
**Summary**: `POST /annotations` succeeds (HTTP 200) when the RabbitMQ broker is unreachable; the outbox row is preserved; `FailsafeProducer` does not crash; on broker recovery the message is delivered.
**Traces to**: AC-F-12, OP-02 (single-instance baseline)
**Preconditions**: SUT healthy; broker initially reachable; clean outbox.
**Fault injection**:
- `docker exec rabbitmq rabbitmqctl stop_app` mid-test (stops AMQP/streams listeners; container stays up).
**Steps**:
| Step | Action | Expected Behavior |
|------|--------|------------------|
| 1 | Stop RabbitMQ app | broker unreachable on port 5552 |
| 2 | `POST /annotations` once | HTTP 200; outbox row inserted |
| 3 | Out-of-band: `SELECT COUNT(*) FROM annotations_queue_records WHERE annotation_id = <id>` | `count == 1` (row not deleted because drain failed) |
| 4 | `GET /health` | HTTP 200 (SUT not crashed) |
| 5 | `docker exec rabbitmq rabbitmqctl start_app` | broker recovers |
| 6 | Wait `drain_interval × 3` | drainer publishes the queued message |
| 7 | Out-of-band: `SELECT COUNT(*) FROM annotations_queue_records WHERE annotation_id = <id>` | `count == 0` (drained) |
| 8 | Stream consumer (started before step 5 at offset `next`) reads one message | message body matches the documented schema |
**Pass criteria**: zero 5xx during outage; outbox preserves the row; recovery delivers the deferred message; total recovery time ≤ 60s after broker comes back.
**Duration**: ~2 minutes.
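
Steps 6 and 7, together with the 60 s recovery bound, amount to polling the outbox until it drains. A minimal Python sketch of such a polling helper follows. `wait_until` and its parameters are hypothetical names, not part of the repository; the predicate is assumed to wrap the `SELECT COUNT(*)` query from step 7. The clock and sleep functions are injectable so the helper can be exercised without a live broker.

```python
import time
from typing import Callable


def wait_until(
    predicate: Callable[[], bool],
    timeout_s: float,
    interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Poll `predicate` until it returns True.

    Returns the elapsed time in seconds. Raises TimeoutError once
    `timeout_s` has passed without the predicate becoming true.
    """
    start = clock()
    while True:
        if predicate():
            return clock() - start
        if clock() - start >= timeout_s:
            raise TimeoutError(f"condition not met within {timeout_s} s")
        sleep(interval_s)
```

In a real run the call might look like `wait_until(lambda: outbox_count(annotation_id) == 0, timeout_s=60)`, where `outbox_count` is an assumed helper that executes the query from step 7.
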
---
### NFT-RES-02: Postgres restart between writes
**Summary**: Restarting Postgres during a quiet period does not corrupt state; subsequent writes succeed.
**Traces to**: AC-N-02 (idempotent migrator), implicit data-integrity NFR
**Fault injection**: `docker compose restart postgres` while no requests are in flight.
**Steps**:
| Step | Action | Expected Behavior |
|------|--------|------------------|
| 1 | `POST /annotations` once (FT-P-01-shape) | HTTP 200; row in DB |
| 2 | `docker compose restart postgres` | DB up after ~5s |
| 3 | Wait for SUT `/health` to return 200 | SUT recovers connection pool (or restarts itself) |
| 4 | `POST /annotations` again | HTTP 200; row in DB |
| 5 | `GET /annotations/<id from step 1>` | HTTP 200; original row intact |
**Pass criteria**: original row intact after restart; new write succeeds within 30s of DB recovery; zero data loss.
**Duration**: ~2 minutes.
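
The pass criteria above can be expressed as a small evaluation function over what the test observed. The sketch below is illustrative only: `RestartObservation`, `evaluate_restart`, and the field names are assumptions introduced here, and the timestamps are assumed to come from the test harness's own clock.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class RestartObservation:
    row_before: dict                      # GET /annotations/<id> body captured after step 1
    row_after: dict                       # body of the same GET in step 5
    db_recovered_at: float                # seconds, when Postgres accepted connections again
    second_write_ok_at: Optional[float]   # seconds, when step 4 returned 200 (None if never)


def evaluate_restart(obs: RestartObservation, max_write_delay_s: float = 30.0) -> list[str]:
    """Return a list of pass-criteria violations; an empty list means the test passed."""
    failures = []
    if obs.row_before != obs.row_after:
        keys = set(obs.row_before) | set(obs.row_after)
        changed = sorted(k for k in keys if obs.row_before.get(k) != obs.row_after.get(k))
        failures.append(f"original row changed in fields: {changed}")
    if obs.second_write_ok_at is None:
        failures.append("second write never succeeded")
    elif obs.second_write_ok_at - obs.db_recovered_at > max_write_delay_s:
        delay = obs.second_write_ok_at - obs.db_recovered_at
        failures.append(f"second write took {delay:.1f} s after DB recovery")
    return failures
```
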
---
### NFT-RES-03: Postgres unreachable during create
**Summary**: When DB is unreachable mid-request, the SUT returns a structured error envelope (no 500 with stack trace); the SUT recovers when DB returns.
**Traces to**: AC-N-04 (zero unhandled exceptions to clients)
**Fault injection**: `docker pause postgres` between request start and request end (race-y; use a delay-injecting test proxy if needed).
**Steps**:
| Step | Action | Expected Behavior |
|------|--------|------------------|
| 1 | `docker pause postgres` | DB connections hang |
| 2 | `POST /annotations` once with timeout 30s | HTTP 5xx (e.g. 503); **error envelope present**; **no raw exception text in body** |
| 3 | `docker unpause postgres` | DB responsive |
| 4 | `POST /annotations` again | HTTP 200; SUT recovered |
**Pass criteria**: under-DB-outage response uses the error envelope; SUT recovers within 30s of DB recovery.
**Duration**: ~2 minutes.
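
The step 2 assertion (5xx status, envelope present, no raw exception text) could be checked with a helper along the following lines. This is a sketch under stated assumptions: the envelope is assumed to be a JSON object with a top-level `error` key, which the spec above does not define, and the stack-trace patterns are heuristics for .NET-style output rather than an exhaustive detector.

```python
import json
import re

# Heuristic markers of leaked exception details (assumed, not exhaustive).
LEAK_PATTERNS = [
    re.compile(r"^\s+at\s+[\w.<>]+\(", re.MULTILINE),  # .NET-style stack frame lines
    re.compile(r"\b[\w.]+Exception\b"),                # exception type names
]


def check_outage_response(status: int, body: str) -> list[str]:
    """Return the problems found in a response received while the DB is paused."""
    problems = []
    if not 500 <= status <= 599:
        problems.append(f"expected a 5xx status, got {status}")
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        payload = None
    # Assumption: the error envelope is a JSON object with an "error" key.
    if not (isinstance(payload, dict) and "error" in payload):
        problems.append("error envelope missing")
    if any(pattern.search(body) for pattern in LEAK_PATTERNS):
        problems.append("raw exception text in body")
    return problems
```
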
---
### NFT-RES-04: SSE subscriber disconnect mid-stream
**Summary**: A subscriber that disconnects mid-stream does not crash the SUT or block other subscribers.
**Traces to**: AC-F-10, OP-01 (per-instance SSE state)
**Steps**:
| Step | Action | Expected Behavior |
|------|--------|------------------|
| 1 | Open 3 SSE connections to `/annotations/events?missionId=<m>` | all 3 alive |
| 2 | Abruptly close subscriber #2 (TCP RST) | SUT cleans up its channel slot |
| 3 | `POST /annotations` for mission `<m>` | HTTP 200 |
| 4 | Subscribers #1 and #3 each receive the event | both receive within 1000ms |
| 5 | `GET /health` | HTTP 200 |
**Pass criteria**: surviving subscribers still receive events; no SUT memory growth visible (channel slots reclaimed); `/health` stays green.
**Duration**: ~1 minute.
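
Step 4 reduces to comparing each surviving subscriber's receive time against the time of the `POST`. A minimal sketch of that comparison is shown below. The function name, the subscriber labels, and the idea of recording timestamps in milliseconds on the test client are all assumptions made for illustration.

```python
from __future__ import annotations


def undelivered(
    posted_at_ms: float,
    received_at_ms: dict[str, float],
    survivors: list[str],
    deadline_ms: float = 1000.0,
) -> dict[str, str]:
    """Map each surviving subscriber that missed the deadline to a reason.

    `received_at_ms` holds the receive timestamp per subscriber; a subscriber
    that never received the event is simply absent from it.
    """
    problems = {}
    for name in survivors:
        received = received_at_ms.get(name)
        if received is None:
            problems[name] = "no event received"
        elif received - posted_at_ms > deadline_ms:
            late_by = received - posted_at_ms - deadline_ms
            problems[name] = f"event late by {late_by:.0f} ms"
    return problems
```
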
---
### NFT-RES-05: Repeated FailsafeProducer empty-catch path
**Summary**: When the image referenced by an outbox row no longer exists on disk, the drainer must not crash. Today the failure is swallowed by an empty catch; after RB-05 lands, the drainer logs the failure and proceeds. This test covers today's behavior and, once RB-05 lands, also asserts the logged failure path.
**Traces to**: RB-05
**Fault injection**: create an outbox row whose `annotation_id` references a missing image by manually deleting the image file after `POST /annotations` returns 200 and before the drain interval fires.
**Steps**:
| Step | Action | Expected Behavior |
|------|--------|------------------|
| 1 | `POST /annotations` (FT-P-01) | HTTP 200; outbox row + image file present |
| 2 | Delete `<images_dir>/<id>.jpg` | image gone |
| 3 | Wait `drain_interval × 2` | drainer runs |
| 4 | Out-of-band: `SELECT COUNT(*) FROM annotations_queue_records WHERE annotation_id = <id>` | today's behavior: row may be deleted or stuck (empty catch swallows IOException) — **document actual behavior here** |
| 5 `[after RB-05]` | Inspect SUT logs for an `ERROR` entry mentioning the missing image | one log entry present; metric counter `failsafe_drain_errors` incremented |
**Pass criteria today**: SUT does not crash; `/health` stays 200.
**Pass criteria after RB-05**: as above + the logged failure path is exercised.
**Duration**: ~1 minute.
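
For the post-RB-05 check in step 5, the log inspection could be automated roughly as follows. The log format is not specified above, so this sketch assumes plain-text lines that contain the level as the word `ERROR` and mention the image as `<id>.jpg` (the file name used in step 2). `missing_image_errors` is a hypothetical helper name.

```python
from __future__ import annotations

import re


def missing_image_errors(log_lines: list[str], annotation_id: str) -> list[str]:
    """Return ERROR-level log lines that mention the image file of `annotation_id`."""
    # The lookbehind prevents id "2" from matching inside "42.jpg".
    image = re.compile(r"(?<![\w-])" + re.escape(f"{annotation_id}.jpg"))
    level = re.compile(r"\bERROR\b")
    return [line for line in log_lines if level.search(line) and image.search(line)]
```

The "one log entry present" expectation then becomes `len(missing_image_errors(lines, annotation_id)) == 1`.
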
---
### NFT-RES-06: Stream consumer reconnect
**Summary**: A stream consumer that drops and reconnects with offset `last_committed` reads only post-disconnect messages.
**Traces to**: implicit (consumer-side concern, but documents the contract the Annotations producer expects)
**Steps**:
| Step | Action | Expected Behavior |
|------|--------|------------------|
| 1 | Start consumer at offset `next`; record current end-of-stream offset `O0` | consumer up |
| 2 | `POST /annotations` 5 times | 5 outbox rows; 5 stream messages produced shortly after |
| 3 | Consumer reads all 5; commits offset after each | consumer offset = `O0 + 5` |
| 4 | Disconnect consumer | done |
| 5 | `POST /annotations` 3 more times | 3 more stream messages |
| 6 | Reconnect consumer at `last_committed = O0 + 5` | consumer reads only messages 6..8 |
**Pass criteria**: re-attached consumer sees no duplicates and no gaps.
**Duration**: ~1 minute.
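
The "no duplicates and no gaps" criterion can be checked directly on the offsets the re-attached consumer observes. The sketch below follows the table's convention that `last_committed` is the first offset the consumer should read after reconnecting (`O0 + 5`), and assumes offsets are contiguous integers. The function and key names are hypothetical.

```python
from __future__ import annotations


def check_resumed_offsets(
    last_committed: int,
    observed: list[int],
    expected_count: int,
) -> dict[str, list[int]]:
    """Classify offsets read after reconnect into duplicates, gaps, and replays."""
    expected = range(last_committed, last_committed + expected_count)
    seen: set[int] = set()
    duplicates = []
    for offset in observed:
        if offset in seen:
            duplicates.append(offset)
        seen.add(offset)
    return {
        "duplicates": duplicates,                                   # read more than once
        "gaps": [o for o in expected if o not in seen],             # expected but never read
        "replayed": sorted(o for o in seen if o < last_committed),  # pre-disconnect messages
    }
```
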