annotations/_docs/02_document/deployment/observability.md
Oleksandr Bezdieniezhnykh 03f879206e docs+src: complete Steps 1-3 outcomes + auth re-sync baseline
This commit captures everything produced during autodev existing-code
Steps 1 (Document), 2 (Architecture Baseline Scan), and 3 (Test Spec),
together with the targeted auth + CORS re-sync triggered on 2026-05-14
when codebase drift was detected at Step 4 entry. None of this work was
previously committed.

Step 1 (Document) — 50+ _docs/02_document/ files: problem, solution,
architecture, system flows, glossary, module-layout, per-component
specs (01..06), modules, deployment, diagrams, data model, FINAL
report, verification log, discovery.

Step 2 (Architecture Baseline) — architecture_compliance_baseline.md.
Verdict PASS_WITH_WARNINGS (0 Critical, 0 High, 1 Medium, 2 Low). No
High/Critical findings; auto-chained to Step 3 per existing-code flow.

Step 3 (Test Spec) — _docs/02_document/tests/* (67 scenarios across
blackbox, security, resilience, resource-limit, performance), plus
e2e/docker-compose.test.yml, e2e/seed/run.sh, scripts/run-tests.sh,
scripts/run-performance-tests.sh. Coverage 88% over the active scope
(40 of 45 items covered, 6 RB-deferred, 5 documented-as-uncovered).

Targeted auth + CORS re-sync — replaces the deleted in-house token
issuer with a JWKS-verifier model. AuthController and TokenService
removed; JwtExtensions switched from HS256 symmetric to ES256 over
admin's JWKS. ConfigurationResolver and CorsConfigurationValidator
added under src/Infrastructure/. ADR-002 and ADR-006 retired; SEC-01,
SEC-02, SEC-03 marked Closed. One new testability risk recorded in
architecture.md Open Risks Section 6 (JWKS HTTPS gating).

Source changes:
- src/Auth/JwtExtensions.cs (modified) — ES256, JWKS, alg pinning
- src/Program.cs (modified) — DI wiring for ConfigurationResolver
  and CorsConfigurationValidator
- src/Controllers/AuthController.cs (deleted) — no in-service issuance
- src/Services/TokenService.cs (deleted) — same
- src/Infrastructure/ConfigurationResolver.cs (new)
- src/Infrastructure/CorsConfigurationValidator.cs (new)
- .env.example (new) — required env var documentation
- .gitignore (updated)

Cross-repo coordination: _docs/cross-repo/flights_h1_h2_h3_change_spec
captures the change-spec for downstream services that consumed the now
deleted /auth endpoints.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-14 20:19:05 +03:00


Observability

Source of truth: src/Program.cs (no dedicated logging config files exist in repo).

Health check

app.MapGet("/health", () => Results.Ok(new { status = "healthy" }));
  • Path: GET /health
  • Auth: none. The route is a minimal API endpoint registered with MapGet, so controller-level [Authorize] attributes do not apply to it.
  • Response: 200 { "status": "healthy" }
  • Liveness only: this endpoint does not probe the DB, RabbitMQ, or filesystem. A pod can return healthy while the failsafe outbox is unable to publish or while DB connectivity is broken.
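
Because /health is liveness-only, a dependency-aware check would have to be added separately. The following is a minimal sketch under stated assumptions, not code from the repo: ProbeHealthCheck and the PingDatabaseAsync helper named in the trailing comment are hypothetical, while IHealthCheck, AddHealthChecks, AddCheck, and MapHealthChecks are the standard ASP.NET Core health-check APIs.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical health check: wraps any async probe that returns true when the
// dependency answers (for example a "SELECT 1" against the DB).
public sealed class ProbeHealthCheck : IHealthCheck
{
    private readonly Func<CancellationToken, Task<bool>> _probe;

    public ProbeHealthCheck(Func<CancellationToken, Task<bool>> probe) => _probe = probe;

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default)
    {
        try
        {
            return await _probe(cancellationToken)
                ? HealthCheckResult.Healthy()
                : HealthCheckResult.Unhealthy("probe returned false");
        }
        catch (Exception ex)
        {
            // A throwing probe counts as unhealthy instead of crashing the endpoint.
            return HealthCheckResult.Unhealthy("probe threw", ex);
        }
    }
}

// Possible wiring in Program.cs (PingDatabaseAsync is an assumed helper):
//   builder.Services.AddHealthChecks()
//       .AddCheck("database", new ProbeHealthCheck(ct => PingDatabaseAsync(ct)));
//   app.MapHealthChecks("/ready");
```

With the default options, MapHealthChecks answers 200 when no registered check is Unhealthy and 503 otherwise, so the existing GET /health can stay as the liveness probe.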

API documentation

  • app.UseSwagger() and app.UseSwaggerUI() are mounted unconditionally, so the UI is served in every environment, not only Development (ADR-005).
  • Endpoints: /swagger/v1/swagger.json (OpenAPI), /swagger/index.html (UI).
  • No version pinning of the OpenAPI document (Swashbuckle defaults).

Logging

  • Default ASP.NET Core console logger. No appsettings.json overrides in repo.
  • No structured logger (Serilog / NLog) configured.
  • No correlation id middleware in repo (X-Request-Id not propagated).
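
For illustration, request-id propagation could be added with a small middleware along these lines. This is a sketch, not something that exists in the repo: the RequestIdMiddleware class, the "RequestId" scope key, and the choice to generate a GUID when the header is missing are all assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.Logging;

// Hypothetical middleware: reuses an incoming X-Request-Id or generates one,
// echoes it on the response, and attaches it to the logging scope.
public sealed class RequestIdMiddleware
{
    public const string HeaderName = "X-Request-Id";

    private readonly RequestDelegate _next;
    private readonly ILogger<RequestIdMiddleware> _logger;

    public RequestIdMiddleware(RequestDelegate next, ILogger<RequestIdMiddleware> logger)
    {
        _next = next;
        _logger = logger;
    }

    public async Task InvokeAsync(HttpContext context)
    {
        var requestId = ResolveRequestId(context.Request.Headers[HeaderName].ToString());
        context.Response.Headers[HeaderName] = requestId;

        // Every log entry written while the request is in flight carries the id.
        using (_logger.BeginScope(new Dictionary<string, object> { ["RequestId"] = requestId }))
        {
            await _next(context);
        }
    }

    // Keeps a caller-supplied id, otherwise creates a 32-character hex GUID.
    public static string ResolveRequestId(string? incoming) =>
        string.IsNullOrWhiteSpace(incoming) ? Guid.NewGuid().ToString("N") : incoming.Trim();
}

// Possible registration in Program.cs, before the endpoints are mapped:
//   app.UseMiddleware<RequestIdMiddleware>();
```

Note that the default console logger prints scope values only when its IncludeScopes option is enabled, so this sketch would need that setting (or a structured logger) before the id shows up in log output.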

Metrics

None configured today. Possible additions:

  • prometheus-net exporter on /metrics.
  • Built-in ASP.NET Core / .NET runtime instrumentation (System.Diagnostics.Metrics meters and EventCounters) for HTTP and runtime counters.

Traces

None configured. The OpenTelemetry SDK is not referenced in the project's csproj.

Image revision stamp

The runtime container has AZAION_REVISION = $CI_COMMIT_SHA set as an env var (Dockerfile + Woodpecker pipeline). This makes "what's running?" diagnosable from inside the container with printenv AZAION_REVISION or by surfacing it in a future /info endpoint.
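
A future /info endpoint could surface the stamp roughly as follows. This is only a sketch: the BuildInfo helper, the "unknown" fallback, and the response shape are assumptions; only the AZAION_REVISION variable name comes from the deployment setup described above.

```csharp
using System;

// Hypothetical helper that reads the image revision stamp from the environment.
public static class BuildInfo
{
    public const string RevisionVariable = "AZAION_REVISION";

    // readEnv is injectable so the lookup can be exercised without touching
    // the real process environment; it defaults to Environment.GetEnvironmentVariable.
    public static string GetRevision(Func<string, string?>? readEnv = null)
    {
        readEnv ??= name => Environment.GetEnvironmentVariable(name);
        var value = readEnv(RevisionVariable);
        return string.IsNullOrWhiteSpace(value) ? "unknown" : value.Trim();
    }
}

// Possible wiring in Program.cs:
//   app.MapGet("/info", () => Results.Ok(new { revision = BuildInfo.GetRevision() }));
```

Returning "unknown" instead of failing keeps the endpoint usable in local runs where the variable is not set.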

Error visibility to clients

ErrorHandlingMiddleware maps exceptions to a JSON body { statusCode, message } with HTTP status 400, 404, 409, or 500. Internal exception details are not expected to leak beyond the message string. This still has to be confirmed during Step 4 verification, in particular that 500 paths do not echo stack traces.
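
The shape of such a mapping can be sketched as below. This is not the repo's ErrorHandlingMiddleware: the ErrorBody record, the ErrorMapping class, and above all the choice of which exception type maps to which status code are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

// Serialized by ASP.NET Core's default camelCase JSON policy as { statusCode, message }.
public sealed record ErrorBody(int StatusCode, string Message);

public static class ErrorMapping
{
    // Assumed mapping, chosen only to illustrate the 400 / 404 / 409 / 500 split.
    public static ErrorBody ToErrorBody(Exception ex) => ex switch
    {
        ArgumentException => new ErrorBody(400, ex.Message),
        KeyNotFoundException => new ErrorBody(404, ex.Message),
        InvalidOperationException => new ErrorBody(409, ex.Message),
        // Unknown failures get a fixed message so neither the exception text
        // nor a stack trace reaches the client.
        _ => new ErrorBody(500, "Internal server error"),
    };
}
```

A check along these lines is what the Step 4 verification would look for: whatever the real mapping is, the 500 branch should return a constant message rather than ex.Message or ex.ToString().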

Open observability items

  • Readiness vs liveness split: today there is one endpoint that does not check dependencies. A GET /ready that pings DB and (optionally) RabbitMQ would unblock proper rolling-update gates.
  • Structured logs with request id correlation across HTTP + outbox drain + SSE.
  • Outbox depth metric (COUNT(*) on annotations_queue_records) — surfaces stuck-failsafe scenarios early.
  • SSE subscriber count metric — visibility into connected UIs.
  • Stream publish lag — time from outbox row insertion to RabbitMQ publish.
  • Failure injection / chaos hooks — none today.
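
The outbox depth and publish lag items above could be instrumented with System.Diagnostics.Metrics roughly as follows. This is a sketch under assumptions: the OutboxMetrics class, the meter and instrument names, and the readDepth callback (which would run the COUNT(*) on annotations_queue_records) are all hypothetical, and no exporter is wired up here.

```csharp
using System;
using System.Diagnostics.Metrics;

// Hypothetical metrics holder for the failsafe outbox.
public sealed class OutboxMetrics : IDisposable
{
    private readonly Meter _meter = new("Annotations.Outbox"); // assumed meter name
    private readonly Histogram<double> _publishLagSeconds;

    // readDepth is invoked each time a listener or exporter collects the gauge.
    public OutboxMetrics(Func<long> readDepth)
    {
        _meter.CreateObservableGauge<long>(
            "outbox_depth", readDepth, unit: "rows",
            description: "Rows waiting in the outbox table");
        _publishLagSeconds = _meter.CreateHistogram<double>(
            "outbox_publish_lag", unit: "s",
            description: "Time from outbox row insertion to RabbitMQ publish");
    }

    // Called by the drain loop after a row has been published.
    public void RecordPublished(DateTimeOffset insertedAt, DateTimeOffset publishedAt) =>
        _publishLagSeconds.Record(LagSeconds(insertedAt, publishedAt));

    // Clamped at zero so clock skew never produces a negative lag.
    public static double LagSeconds(DateTimeOffset insertedAt, DateTimeOffset publishedAt) =>
        Math.Max(0, (publishedAt - insertedAt).TotalSeconds);

    public void Dispose() => _meter.Dispose();
}
```

A steadily growing outbox_depth combined with no new outbox_publish_lag samples would be the signature of the stuck-failsafe scenario mentioned in the list.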

These are candidates for the deploy-and-retro phase of autodev (Steps 14-17, once the project enters Phase B).