docs+src: complete Steps 1-3 outcomes + auth re-sync baseline

This commit captures everything produced during autodev existing-code
Steps 1 (Document), 2 (Architecture Baseline Scan), and 3 (Test Spec),
together with the targeted auth + CORS re-sync triggered on 2026-05-14
when codebase drift was detected at Step 4 entry. None of this work was
previously committed.

Step 1 (Document) — 50+ _docs/02_document/ files: problem, solution,
architecture, system flows, glossary, module-layout, per-component
specs (01..06), modules, deployment, diagrams, data model, FINAL
report, verification log, discovery.

Step 2 (Architecture Baseline) — architecture_compliance_baseline.md.
Verdict PASS_WITH_WARNINGS (0 Critical, 0 High, 1 Medium, 2 Low). No
High/Critical findings; auto-chained to Step 3 per existing-code flow.

Step 3 (Test Spec) — _docs/02_document/tests/* (67 scenarios across
blackbox, security, resilience, resource-limit, performance), plus
e2e/docker-compose.test.yml, e2e/seed/run.sh, scripts/run-tests.sh,
scripts/run-performance-tests.sh. Coverage 88% over the active scope
(40 of 45 items covered, 6 RB-deferred, 5 documented-as-uncovered).

Targeted auth + CORS re-sync — replaces the deleted in-house token
issuer with a JWKS-verifier model. AuthController and TokenService
removed; JwtExtensions switched from HS256 symmetric to ES256 over
admin's JWKS. ConfigurationResolver and CorsConfigurationValidator
added under src/Infrastructure/. ADR-002 and ADR-006 retired; SEC-01,
SEC-02, SEC-03 marked Closed. One new testability risk recorded in
architecture.md Open Risks Section 6 (JWKS HTTPS gating).

Source changes:
- src/Auth/JwtExtensions.cs (modified) — ES256, JWKS, alg pinning
- src/Program.cs (modified) — DI wiring for ConfigurationResolver
  and CorsConfigurationValidator
- src/Controllers/AuthController.cs (deleted) — no in-service issuance
- src/Services/TokenService.cs (deleted) — same
- src/Infrastructure/ConfigurationResolver.cs (new)
- src/Infrastructure/CorsConfigurationValidator.cs (new)
- .env.example (new) — required env var documentation
- .gitignore (updated)

Cross-repo coordination: _docs/cross-repo/flights_h1_h2_h3_change_spec
captures the change-spec for downstream services that consumed the now
deleted /auth endpoints.

Co-authored-by: Cursor <cursoragent@cursor.com>
This commit is contained in:
Oleksandr Bezdieniezhnykh
2026-05-14 20:19:05 +03:00
parent 08eadc1158
commit 03f879206e
66 changed files with 6006 additions and 133 deletions
@@ -0,0 +1,55 @@
# Observability
Source of truth: `src/Program.cs` (no dedicated logging config files exist in repo).
## Health check
```csharp
app.MapGet("/health", () => Results.Ok(new { status = "healthy" }));
```
- Path: `GET /health`
- Auth: none (`MapGet` bypasses controller-level `[Authorize]`).
- Response: `200 { "status": "healthy" }`
- **Liveness only**: this endpoint does not probe the DB, RabbitMQ, or filesystem. A pod can return healthy while the failsafe outbox is unable to publish or while DB connectivity is broken.
## API documentation
- `app.UseSwagger()` and `app.UseSwaggerUI()` mounted unconditionally (ADR-005).
- Endpoints: `/swagger/v1/swagger.json` (OpenAPI), `/swagger/index.html` (UI).
- No version pinning of the OpenAPI document (Swashbuckle defaults).
## Logging
- Default ASP.NET Core console logger. No `appsettings.json` overrides in repo.
- No structured logger (Serilog / NLog) configured.
- No correlation id middleware in repo (`X-Request-Id` not propagated).
## Metrics
None configured today. Possible additions:
- `prometheus-net` exporter on `/metrics`.
- ASP.NET Core `MetricsCollector` (built-in HTTP / runtime counters).
## Traces
None configured. OpenTelemetry SDK is not referenced in `csproj`.
## Image revision stamp
The runtime container has `AZAION_REVISION = $CI_COMMIT_SHA` set as an env var (Dockerfile + Woodpecker pipeline). This makes "what's running?" diagnosable from inside the container with `printenv AZAION_REVISION` or by surfacing it in a future `/info` endpoint.
## Error visibility to clients
`ErrorHandlingMiddleware` maps exceptions to JSON `{ statusCode, message }` with HTTP 400 / 404 / 409 / 500. Internal exception details are not leaked beyond the `message` string (confirm during Step 4 verification — make sure 500 paths do not echo stack traces).
## Open observability items
- **Readiness vs liveness split**: today there is one endpoint that does not check dependencies. A `GET /ready` that pings DB and (optionally) RabbitMQ would unblock proper rolling-update gates.
- **Structured logs** with request id correlation across HTTP + outbox drain + SSE.
- **Outbox depth metric** (`COUNT(*)` on `annotations_queue_records`) — surfaces stuck-failsafe scenarios early.
- **SSE subscriber count metric** — visibility into connected UIs.
- **Stream publish lag** — time from outbox row insertion to RabbitMQ publish.
- **Failure injection / chaos hooks** — none today.
These are candidates for the deploy-and-retro phase of autodev (Steps 1417 once the project enters Phase B).