# Observability

Source of truth: `src/Program.cs` (no dedicated logging config files exist in repo).

## Health check

```csharp
app.MapGet("/health", () => Results.Ok(new { status = "healthy" }));
```

- Path: `GET /health`
- Auth: none (`MapGet` bypasses controller-level `[Authorize]`).
- Response: `200 { "status": "healthy" }`
- **Liveness only**: this endpoint does not probe the DB, RabbitMQ, or filesystem. A pod can return healthy while the failsafe outbox is unable to publish or while DB connectivity is broken.

## API documentation

- `app.UseSwagger()` and `app.UseSwaggerUI()` are mounted unconditionally (ADR-005).
- Endpoints: `/swagger/v1/swagger.json` (OpenAPI), `/swagger/index.html` (UI).
- No version pinning of the OpenAPI document (Swashbuckle defaults).

## Logging

- Default ASP.NET Core console logger. No `appsettings.json` overrides in repo.
- No structured logger (Serilog / NLog) configured.
- No correlation id middleware in repo (`X-Request-Id` not propagated).

## Metrics

None configured today. Possible additions:

- `prometheus-net` exporter on `/metrics`.
- ASP.NET Core built-in metrics (`System.Diagnostics.Metrics`: HTTP / runtime counters).

## Traces

None configured. OpenTelemetry SDK is not referenced in `csproj`.

## Image revision stamp

The runtime container has `AZAION_REVISION = $CI_COMMIT_SHA` set as an env var (Dockerfile + Woodpecker pipeline). This makes "what's running?" diagnosable from inside the container with `printenv AZAION_REVISION` or by surfacing it in a future `/info` endpoint.

## Error visibility to clients

`ErrorHandlingMiddleware` maps exceptions to JSON `{ statusCode, message }` with HTTP 400 / 404 / 409 / 500. Internal exception details are not expected to leak beyond the `message` string; confirm during Step 4 verification that 500 paths do not echo stack traces.

## Open observability items

- **Readiness vs liveness split**: today there is one endpoint, and it does not check dependencies. A `GET /ready` that pings the DB and (optionally) RabbitMQ would unblock proper rolling-update gates.
- **Structured logs** with request id correlation across HTTP + outbox drain + SSE.
- **Outbox depth metric** (`COUNT(*)` on `annotations_queue_records`): surfaces stuck-failsafe scenarios early.
- **SSE subscriber count metric**: visibility into connected UIs.
- **Stream publish lag**: time from outbox row insertion to RabbitMQ publish.
- **Failure injection / chaos hooks**: none today.

These are candidates for the deploy-and-retro phase of autodev (Steps 14–17 once the project enters Phase B).
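
As a rough illustration of the readiness vs liveness split listed above, the sketch below keeps the existing `/health` liveness route and adds a `/ready` route using ASP.NET Core's built-in health checks middleware. The check name `database`, the `ready` tag, and the `PingDatabaseAsync` helper are hypothetical placeholders, not names from the repo. The helper would need to be replaced with a real round trip through the service's own data access. The sketch assumes a `Microsoft.NET.Sdk.Web` project with implicit usings enabled.

```csharp
using Microsoft.AspNetCore.Diagnostics.HealthChecks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

var builder = WebApplication.CreateBuilder(args);

// Register a dependency check and tag it so only /ready evaluates it.
builder.Services.AddHealthChecks()
    .AddAsyncCheck("database", async ct =>
    {
        try
        {
            await PingDatabaseAsync(ct);
            return HealthCheckResult.Healthy();
        }
        catch (Exception ex)
        {
            // Any failure of the probe marks the instance as not ready.
            return HealthCheckResult.Unhealthy("database unreachable", ex);
        }
    }, tags: new[] { "ready" });

var app = builder.Build();

// Liveness: unchanged, never touches dependencies.
app.MapGet("/health", () => Results.Ok(new { status = "healthy" }));

// Readiness: runs only the checks tagged "ready".
app.MapHealthChecks("/ready", new HealthCheckOptions
{
    Predicate = check => check.Tags.Contains("ready")
});

app.Run();

// Hypothetical placeholder (assumption): swap in a real probe such as
// "SELECT 1" executed through the application's own DB connection.
static Task PingDatabaseAsync(CancellationToken ct) => Task.CompletedTask;
```

With default options, `MapHealthChecks` responds 200 when every selected check is healthy and 503 when any is unhealthy, so a rolling-update gate could poll `/ready` while `/health` stays a cheap liveness probe. A RabbitMQ check could be registered the same way under the same tag.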