docs+src: complete Steps 1-3 outcomes + auth re-sync baseline

This commit captures everything produced during autodev existing-code
Steps 1 (Document), 2 (Architecture Baseline Scan), and 3 (Test Spec),
together with the targeted auth + CORS re-sync triggered on 2026-05-14
when codebase drift was detected at Step 4 entry. None of this work was
previously committed.

Step 1 (Document) — 50+ _docs/02_document/ files: problem, solution,
architecture, system flows, glossary, module-layout, per-component
specs (01..06), modules, deployment, diagrams, data model, FINAL
report, verification log, discovery.

Step 2 (Architecture Baseline) — architecture_compliance_baseline.md.
Verdict PASS_WITH_WARNINGS (0 Critical, 0 High, 1 Medium, 2 Low). No
High/Critical findings; auto-chained to Step 3 per existing-code flow.

Step 3 (Test Spec) — _docs/02_document/tests/* (67 scenarios across
blackbox, security, resilience, resource-limit, performance), plus
e2e/docker-compose.test.yml, e2e/seed/run.sh, scripts/run-tests.sh,
scripts/run-performance-tests.sh. Coverage 88% over the active scope
(40 of 45 items covered, 6 RB-deferred, 5 documented-as-uncovered).

Targeted auth + CORS re-sync — replaces the deleted in-house token
issuer with a JWKS-verifier model. AuthController and TokenService
removed; JwtExtensions switched from HS256 symmetric to ES256 over
admin's JWKS. ConfigurationResolver and CorsConfigurationValidator
added under src/Infrastructure/. ADR-002 and ADR-006 retired; SEC-01,
SEC-02, SEC-03 marked Closed. One new testability risk recorded in
architecture.md Open Risks Section 6 (JWKS HTTPS gating).

Source changes:
- src/Auth/JwtExtensions.cs (modified) — ES256, JWKS, alg pinning
- src/Program.cs (modified) — DI wiring for ConfigurationResolver
  and CorsConfigurationValidator
- src/Controllers/AuthController.cs (deleted) — no in-service issuance
- src/Services/TokenService.cs (deleted) — same
- src/Infrastructure/ConfigurationResolver.cs (new)
- src/Infrastructure/CorsConfigurationValidator.cs (new)
- .env.example (new) — required env var documentation
- .gitignore (updated)

Cross-repo coordination: _docs/cross-repo/flights_h1_h2_h3_change_spec
captures the change-spec for downstream services that consumed the
now-deleted /auth endpoints.

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-14 20:19:05 +03:00
parent 08eadc1158
commit 03f879206e
66 changed files with 6006 additions and 133 deletions
@@ -0,0 +1,123 @@
# Resource Limit Tests
### NFT-RES-LIM-01: Sustained-load process memory
**Summary**: Process memory stays bounded under sustained `POST /annotations` traffic.
**Traces to**: AC-N-03 (outbox depth bounded → memory bounded), HW-03 (memory pressure on `FailsafeProducer`'s image re-read)
**Preconditions**: SUT freshly started; clean state; a stream consumer connected so the outbox actually drains.
**Monitoring**:
- `docker stats annotations` polled every 10s for `MemUsage` (used as the RSS figure; Docker reports it net of page cache) and `MemPerc`.
- Sample at the 0s / 60s / 600s marks.
**Duration**: 10 minutes at 5 RPS.
**Pass criteria**: RSS at the 600s mark ≤ 1.5× RSS at the 60s mark; no OOMKilled events; container stays healthy.
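
The 1.5× rule above can be checked mechanically from the sampled `MemUsage` cells. The following is a minimal sketch, assuming the samples are captured as the raw strings `docker stats` prints (for example `182.4MiB / 1.944GiB`); the helper names `parse_mem_usage` and `memory_growth_ok` are hypothetical and not part of the repository.

```python
import re

# Binary unit suffixes as printed in the MEM USAGE column of `docker stats`.
_UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}


def parse_mem_usage(mem_usage: str) -> float:
    """Return the used-bytes part of a cell such as '182.4MiB / 1.944GiB'."""
    used = mem_usage.split("/")[0].strip()
    match = re.fullmatch(r"([0-9.]+)\s*([A-Za-z]+)", used)
    if match is None or match.group(2) not in _UNITS:
        raise ValueError(f"unrecognised MemUsage value: {mem_usage!r}")
    return float(match.group(1)) * _UNITS[match.group(2)]


def memory_growth_ok(at_60s: str, at_600s: str, max_ratio: float = 1.5) -> bool:
    """NFT-RES-LIM-01 rule: usage at 600s must be <= max_ratio x usage at 60s."""
    return parse_mem_usage(at_600s) <= max_ratio * parse_mem_usage(at_60s)
```

The 60s sample is used as the reference rather than the 0s sample, matching the pass criteria; the OOMKilled and health checks still need to be read from the container state separately.
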
---
### NFT-RES-LIM-02: Single-file upload boundary
**Summary**: Determine the maximum single-file upload size accepted by `POST /media`.
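**Traces to**: documented gap (no explicit limit in code; ASP.NET form-options apply, i.e. the `FormOptions.MultipartBodyLengthLimit` default of 128 MB, and on Kestrel the default `MaxRequestBodySize` of 30,000,000 bytes can reject earlier with HTTP 413)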
**Monitoring**: HTTP status code per uploaded size.
**Steps**:
| Size | Expected Result |
|------|-----------------|
| 1 MB | HTTP 200 |
| 10 MB | HTTP 200 |
| 50 MB | HTTP 200 |
| 100 MB | HTTP 200 (probable, depends on ASP.NET defaults) |
| 256 MB | HTTP 200 OR 400 (test the boundary) |
| 512 MB | likely HTTP 400 / form-options reject |
**Duration**: ~5 minutes (one upload per size).
**Pass criteria**: a clear cutoff size is documented; below it the SUT accepts; at or above it the SUT returns the error envelope (NOT a 500 with no body, NOT a hang).
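
One way to turn the per-size results into the "clear cutoff" the pass criteria ask for is sketched below. It assumes each upload is recorded as a `(size_mb, http_status, has_body)` tuple; that shape and the function name `find_upload_cutoff` are illustrative assumptions, not something the test harness is known to produce.

```python
from typing import Optional, Sequence, Tuple

# (size in MB, HTTP status, whether the response carried a body)
UploadResult = Tuple[int, int, bool]


def find_upload_cutoff(results: Sequence[UploadResult]) -> Optional[int]:
    """Return the smallest rejected size in MB, or None if every size was accepted.

    Raises ValueError when the results do not form a clean boundary:
    a rejection without a body, or an accepted size above a rejected one.
    """
    cutoff: Optional[int] = None
    for size_mb, status, has_body in sorted(results):
        accepted = 200 <= status < 300
        if not accepted and not has_body:
            raise ValueError(f"{size_mb} MB rejected with status {status} and no body")
        if accepted and cutoff is not None:
            raise ValueError(f"{size_mb} MB accepted above cutoff {cutoff} MB")
        if not accepted and cutoff is None:
            cutoff = size_mb
    return cutoff
```

A hang cannot be expressed as a status code, so the client-side timeout for each upload still has to be enforced by whatever issues the requests.
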
---
### NFT-RES-LIM-03: Outbox depth under broker outage
**Summary**: With RabbitMQ stopped for an extended period, the outbox `annotations_queue_records` table grows linearly with traffic AND does not exceed disk capacity / DB connection pool limits within the test window.
**Traces to**: NFT-RES-01 (extended), AC-N-03
**Monitoring**:
- `SELECT COUNT(*) FROM annotations_queue_records` every 30s.
- Disk usage of the Postgres data volume every minute.
- `docker stats postgres` for memory.
**Steps**:
| Step | Action | Expected Behavior |
|------|--------|------------------|
| 1 | `docker exec rabbitmq rabbitmqctl stop_app` | broker down |
| 2 | Run 10 RPS of `POST /annotations` for 5 minutes | 3000 outbox rows written |
| 3 | Sample queue depth and disk usage | depth grows linearly; disk grows linearly with image bytes (since `images_dir` is also written) |
| 4 | `docker exec rabbitmq rabbitmqctl start_app` | broker recovers |
| 5 | Wait for queue to drain | depth goes to 0 within 5 minutes of recovery |
**Duration**: 15 minutes total.
**Pass criteria**:
- During outage: SUT does not return 5xx; queue depth is exactly equal to total successful POSTs since the outage started.
- During recovery: queue drains to 0 within 5 minutes.
- No DB connection pool exhaustion (no `connection refused` from Postgres).
- No SUT crashes.
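
The outage and recovery criteria above reduce to a few arithmetic checks over the `COUNT(*)` samples. This is a rough sketch under the assumption that samples are collected in time order at the 30s interval listed under Monitoring; the function names are hypothetical.

```python
from typing import Sequence


def expected_outbox_rows(rps: int, outage_seconds: int) -> int:
    """Rows expected if every POST during the outage succeeds (10 RPS x 300s = 3000)."""
    return rps * outage_seconds


def outage_depth_ok(depth_samples: Sequence[int], successful_posts: int) -> bool:
    """Depth must never shrink while the broker is down and must end equal to
    the number of successful POSTs since the outage started."""
    if not depth_samples:
        return False
    non_decreasing = all(a <= b for a, b in zip(depth_samples, depth_samples[1:]))
    return non_decreasing and depth_samples[-1] == successful_posts


def drain_ok(recovery_samples: Sequence[int], interval_s: int = 30, budget_s: int = 300) -> bool:
    """recovery_samples[0] is taken at broker recovery; the first zero reading
    must fall within budget_s (5 minutes by default)."""
    for index, depth in enumerate(recovery_samples):
        if depth == 0:
            return index * interval_s <= budget_s
    return False
```

The final outage sample should be taken after the last POST has returned, otherwise the equality check can fail on an in-flight request rather than on a real defect.
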
---
### NFT-RES-LIM-04: Disk usage by `images_dir` over many distinct uploads
**Summary**: Each distinct `image_bytes` POST consumes O(image-size) disk; identical re-uploads consume zero additional disk (idempotent).
**Traces to**: AC-F-01, AC-F-02
**Steps**:
| Step | Action | Expected Behavior |
|------|--------|------------------|
| 1 | Capture `du -sb $images_dir` baseline | non-empty path |
| 2 | `POST /annotations` 100× with `image_small.jpg` (same bytes) | 1 file added, ~1.5 MB delta from step 1 |
| 3 | `POST /annotations` 100× with random distinct image bytes (synthetic) | 100 new files; delta ≈ 100 × avg-size |
**Pass criteria**: identical uploads do not duplicate disk (step 2 adds exactly 1 file); distinct uploads scale linearly (step 3 adds 100 files, delta ≈ 100 × avg-size).
**Duration**: ~5 minutes.
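
The before/after comparison in the steps above could be scripted roughly as follows. This sketch sums regular-file sizes instead of shelling out to `du -sb`, so absolute totals may differ slightly from `du` (which also counts directory entries); only the deltas matter here. The helper names and the 5% tolerance are assumptions for illustration.

```python
import os
from typing import Tuple

Snapshot = Tuple[int, int]  # (file_count, total_bytes)


def snapshot(root: str) -> Snapshot:
    """Count regular files under root and sum their sizes in bytes."""
    count = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            count += 1
            total += os.path.getsize(os.path.join(dirpath, name))
    return count, total


def identical_uploads_ok(before: Snapshot, after: Snapshot) -> bool:
    """Step 2: repeated POSTs of the same bytes should add exactly one file."""
    return after[0] - before[0] == 1


def distinct_uploads_ok(before: Snapshot, after: Snapshot, uploads: int,
                        avg_size: float, tolerance: float = 0.05) -> bool:
    """Step 3: N distinct POSTs should add N files and about N x avg_size bytes."""
    added_files = after[0] - before[0]
    added_bytes = after[1] - before[1]
    expected = uploads * avg_size
    return added_files == uploads and abs(added_bytes - expected) <= tolerance * expected
```
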
---
### NFT-RES-LIM-05: Concurrent SSE subscribers — process-memory boundary
**Summary**: 100 simultaneous SSE subscribers do not exhaust the SUT's memory or thread pool.
**Traces to**: AC-N-05 (idle-channel memory bounded), OP-01 (per-instance SSE state)
**Preconditions**: SUT freshly started.
**Steps**:
| Step | Action | Expected Behavior |
|------|--------|------------------|
| 1 | Open 100 SSE connections to `/annotations/events?missionId=<m>` | all 100 alive |
| 2 | Sample `docker stats annotations` immediately after connection | RSS recorded |
| 3 | Idle for 10 minutes; sample every 60s | RSS stays within ± 10% of step 2 |
| 4 | `POST /annotations` once for mission `<m>` | all 100 subscribers receive the event within 1500ms |
**Pass criteria**: RSS stays within ±10% of the step 2 sample for the full idle window; all 100 subscribers receive the event within 1500ms; no `connection refused` or thread-pool starvation.
**Duration**: ~12 minutes.
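
The two numeric conditions in this scenario (the ±10% RSS band from step 3 and the 1500ms fan-out budget from step 4) can be evaluated with a pair of small predicates. This is a sketch only; it assumes RSS samples are already converted to a common numeric unit and that per-subscriber delivery latencies are measured by the test client, and both function names are hypothetical.

```python
from typing import Sequence


def rss_within_band(baseline: float, samples: Sequence[float], tolerance: float = 0.10) -> bool:
    """True when every idle-window sample stays within +/- tolerance of the baseline."""
    low = baseline * (1 - tolerance)
    high = baseline * (1 + tolerance)
    return all(low <= sample <= high for sample in samples)


def fanout_ok(latencies_ms: Sequence[float], expected_subscribers: int = 100,
              budget_ms: float = 1500) -> bool:
    """True when every expected subscriber got the event within the budget.

    latencies_ms holds one entry per subscriber that actually received the event,
    so a short list means at least one subscriber missed it.
    """
    return (len(latencies_ms) == expected_subscribers
            and all(latency <= budget_ms for latency in latencies_ms))
```
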
---
### NFT-RES-LIM-06: Migration cost on cold start
**Summary**: Boot-time `DatabaseMigrator.MigrateAsync()` adds bounded latency to cold start (`/health` returns 200 within `<budget>` after container start).
**Traces to**: AC-N-01
**Steps**:
| Step | Action | Expected Behavior |
|------|--------|------------------|
| 1 | `docker compose down annotations && docker compose up -d annotations` | container starting |
| 2 | Poll `/health` every 200ms; record time-to-first-200 | record time |
| 3 | Repeat with a fresh DB (cold migrator) and a populated DB (warm migrator) | both runs measured |
**Pass criteria** (until contracted): time-to-first-200 ≤ 30s on cold migrator; ≤ 10s on warm migrator. **Step 15 will tune.**
**Duration**: ~2 minutes.
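
The polling in step 2 could look something like the sketch below. It assumes `/health` is reachable over plain HTTP from the test runner; `http_probe`, `time_to_first_200`, and `within_budget` are hypothetical names, and the clock and sleep functions are injectable only so the loop can be exercised without a running container.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional


def http_probe(url: str) -> int:
    """Return the HTTP status for url; connection errors propagate as OSError."""
    try:
        with urllib.request.urlopen(url, timeout=1) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def time_to_first_200(probe: Callable[[], int], interval_s: float = 0.2,
                      timeout_s: float = 30.0,
                      clock: Callable[[], float] = time.monotonic,
                      sleep: Callable[[float], None] = time.sleep) -> Optional[float]:
    """Poll probe every interval_s; return seconds until the first 200, or None on timeout."""
    start = clock()
    while clock() - start <= timeout_s:
        try:
            status = probe()
        except OSError:
            status = None  # e.g. connection refused while the container is still booting
        if status == 200:
            return clock() - start
        sleep(interval_s)
    return None


def within_budget(elapsed_s: Optional[float], budget_s: float) -> bool:
    """Provisional budgets from the pass criteria: 30s cold migrator, 10s warm."""
    return elapsed_s is not None and elapsed_s <= budget_s
```

A run against the compose stack might call `time_to_first_200(lambda: http_probe(base_url + "/health"))` right after `docker compose up -d annotations` returns, where `base_url` is whatever address the test environment exposes.
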