ui/_docs/02_document/deployment/observability.md

# Azaion UI — Observability

> Synthesis output of `/document` Step 3d (observability). Derived from inspection
> of all module docs + `nginx.conf` + the absence of any client telemetry SDK
> in `package.json`.

## 1. Status: minimal

The browser-side SPA emits **no centralized telemetry today**:

- No analytics SDK (no `@sentry/*`, `@datadog/*`, `web-vitals`, `posthog`, etc.).
- No error reporting service.
- No client-side feature-flag service.
- Errors that aren't caught by an `<ErrorBoundary>` (which doesn't exist today — finding in `10_app-shell`) end up as `console.error` only.

This is acceptable as a starting state. A future iteration adds an error-tracking SDK (Sentry candidate) with the SDK key sourced from a runtime `/config.json` — see `environment_strategy.md`.

## 2. Existing logging (per module)

| Module | What is logged | How | Why it's unsatisfactory |
|--------|----------------|-----|-------------------------|
| `01_api-transport/client.ts` | request / response errors | `console.error` | No retries, no spans, no correlation IDs |
| `01_api-transport/sse.ts` | EventSource errors | `console.error` | No reconnect logic; no telemetry |
| `02_auth/AuthContext.tsx` | login / refresh outcomes | `console.error` | Successful refresh is silent (good); failures are silent (bad — need user-visible recovery flow) |
| `03_shared-ui/FlightContext.tsx` | flight load + select-flight errors | swallowed | `selectFlight` is fire-and-forget, error invisible |
| `06_annotations/AnnotationsSidebar.tsx` | AI-detect errors | `console.error` | User sees no feedback (finding #21–23) |
| `06_annotations/AnnotationsPage.tsx` | save errors | partial — `handleSave` has fallback that **hides save loss** (finding) | Worst case: user thinks the annotation saved but it didn't |
| `07_dataset/DatasetPage.tsx` | various | swallowed `catch` blocks (finding #6) | Same risk |
| `05_flights/FlightsPage.tsx` | save partial-failure | not detected | Per-waypoint failures invisible (finding #19) |
| `05_flights/flightPlanUtils.ts` | weather fetch errors | swallowed silently | Wind data missing → battery estimate wrong; user not informed |

The dominant pattern is "silent catch + console.error" — this is the single biggest observability gap.

## 3. Server-side logs the UI relies on

The suite services (admin, flights, annotations, detect, etc.) are responsible for:

- Audit logging (login, logout, role changes, destructive admin actions)
- Request tracing (the UI does not send a `traceparent` header today — Step 6 candidate)
- Performance metrics (UI does not measure RUM)

The UI's bug-reproduction story relies on suite-side logs. A correlation ID injected by the UI on every request would dramatically simplify cross-service debugging — a Step 6 problem-extraction surface.

## 4. Client-side metrics (none)

No `web-vitals` or equivalent is installed. Recommended (Step 5 solution surface):

- **CLS** (cumulative layout shift) — the canvas + leaflet + chart layout has known shifts on initial load.
- **LCP** (largest contentful paint) — the bundle is the dominant cost.
- **FID / INP** (interaction latency) — relevant for the canvas drag and waypoint drag-drop.
- **Custom metrics**: time-to-first-flight-list, time-to-first-thumbnail, time-to-first-detection.

## 5. Error boundaries

`10_app-shell` finding: no `<ErrorBoundary>` wraps the route tree. A single uncaught render error today crashes the whole SPA. Step 4 / Step 5 surface — add a top-level `<ErrorBoundary>` plus per-feature boundaries for the canvas / map / chart so isolated failures don't take down the whole UI.

## 6. Recommended near-term improvements (Step 5 solution candidates)

1. **Add a top-level `<ErrorBoundary>`** in `App.tsx` with a "something broke" recovery card.
2. **Replace silent catches** (`}` `catch {}`) with `console.error` + user toast — at minimum.
3. **Inject a correlation ID** (`X-Request-Id` header) on every fetch + EventSource.
4. **Surface AI-detect progress + errors** — see Flow F7 (currently flow doesn't even subscribe).
5. **Add Sentry (or equivalent)** with runtime-config-driven DSN.
6. **Add `web-vitals`** + emit to suite admin/ telemetry endpoint.