Files
annotations/_docs/02_document/deployment/observability.md
T
Oleksandr Bezdieniezhnykh 03f879206e docs+src: complete Steps 1-3 outcomes + auth re-sync baseline
This commit captures everything produced during autodev existing-code
Steps 1 (Document), 2 (Architecture Baseline Scan), and 3 (Test Spec),
together with the targeted auth + CORS re-sync triggered on 2026-05-14
when codebase drift was detected at Step 4 entry. None of this work was
previously committed.

Step 1 (Document) — 50+ _docs/02_document/ files: problem, solution,
architecture, system flows, glossary, module-layout, per-component
specs (01..06), modules, deployment, diagrams, data model, FINAL
report, verification log, discovery.

Step 2 (Architecture Baseline) — architecture_compliance_baseline.md.
Verdict PASS_WITH_WARNINGS (0 Critical, 0 High, 1 Medium, 2 Low). No
High/Critical findings; auto-chained to Step 3 per existing-code flow.

Step 3 (Test Spec) — _docs/02_document/tests/* (67 scenarios across
blackbox, security, resilience, resource-limit, performance), plus
e2e/docker-compose.test.yml, e2e/seed/run.sh, scripts/run-tests.sh,
scripts/run-performance-tests.sh. Coverage 88% over the active scope
(40 of 45 items covered, 6 RB-deferred, 5 documented-as-uncovered).

Targeted auth + CORS re-sync — replaces the deleted in-house token
issuer with a JWKS-verifier model. AuthController and TokenService
removed; JwtExtensions switched from HS256 symmetric to ES256 over
admin's JWKS. ConfigurationResolver and CorsConfigurationValidator
added under src/Infrastructure/. ADR-002 and ADR-006 retired; SEC-01,
SEC-02, SEC-03 marked Closed. One new testability risk recorded in
architecture.md Open Risks Section 6 (JWKS HTTPS gating).

Source changes:
- src/Auth/JwtExtensions.cs (modified) — ES256, JWKS, alg pinning
- src/Program.cs (modified) — DI wiring for ConfigurationResolver
  and CorsConfigurationValidator
- src/Controllers/AuthController.cs (deleted) — no in-service issuance
- src/Services/TokenService.cs (deleted) — same
- src/Infrastructure/ConfigurationResolver.cs (new)
- src/Infrastructure/CorsConfigurationValidator.cs (new)
- .env.example (new) — required env var documentation
- .gitignore (updated)

Cross-repo coordination: _docs/cross-repo/flights_h1_h2_h3_change_spec
captures the change-spec for downstream services that consumed the now
deleted /auth endpoints.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-14 20:19:05 +03:00

56 lines
2.5 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Observability
Source of truth: `src/Program.cs` (no dedicated logging config files exist in repo).
## Health check
```csharp
app.MapGet("/health", () => Results.Ok(new { status = "healthy" }));
```
- Path: `GET /health`
- Auth: none (`MapGet` bypasses controller-level `[Authorize]`).
- Response: `200 { "status": "healthy" }`
- **Liveness only**: this endpoint does not probe the DB, RabbitMQ, or filesystem. A pod can return healthy while the failsafe outbox is unable to publish or while DB connectivity is broken.
## API documentation
- `app.UseSwagger()` and `app.UseSwaggerUI()` mounted unconditionally (ADR-005).
- Endpoints: `/swagger/v1/swagger.json` (OpenAPI), `/swagger/index.html` (UI).
- No version pinning of the OpenAPI document (Swashbuckle defaults).
## Logging
- Default ASP.NET Core console logger. No `appsettings.json` overrides in repo.
- No structured logger (Serilog / NLog) configured.
- No correlation id middleware in repo (`X-Request-Id` not propagated).
## Metrics
None configured today. Possible additions:
- `prometheus-net` exporter on `/metrics`.
- ASP.NET Core `MetricsCollector` (built-in HTTP / runtime counters).
## Traces
None configured. OpenTelemetry SDK is not referenced in `csproj`.
## Image revision stamp
The runtime container has `AZAION_REVISION = $CI_COMMIT_SHA` set as an env var (Dockerfile + Woodpecker pipeline). This makes "what's running?" diagnosable from inside the container with `printenv AZAION_REVISION` or by surfacing it in a future `/info` endpoint.
## Error visibility to clients
`ErrorHandlingMiddleware` maps exceptions to JSON `{ statusCode, message }` with HTTP 400 / 404 / 409 / 500. Internal exception details are not leaked beyond the `message` string (confirm during Step 4 verification — make sure 500 paths do not echo stack traces).
## Open observability items
- **Readiness vs liveness split**: today there is one endpoint that does not check dependencies. A `GET /ready` that pings DB and (optionally) RabbitMQ would unblock proper rolling-update gates.
- **Structured logs** with request id correlation across HTTP + outbox drain + SSE.
- **Outbox depth metric** (`COUNT(*)` on `annotations_queue_records`) — surfaces stuck-failsafe scenarios early.
- **SSE subscriber count metric** — visibility into connected UIs.
- **Stream publish lag** — time from outbox row insertion to RabbitMQ publish.
- **Failure injection / chaos hooks** — none today.
These are candidates for the deploy-and-retro phase of autodev (Steps 1417 once the project enters Phase B).