Oleksandr Bezdieniezhnykh 510df68bcf [AZ-447] autodev Steps 1-4 baseline: docs, tests, refactor specs
Captures the full output of autodev existing-code Phase A through
Step 4 (Code Testability Revision) for the Azaion UI workspace:

- Step 1 Document: _docs/02_document/ (FINAL_report, architecture,
  glossary, components/, modules/, diagrams/, system-flows,
  module-layout) plus _docs/00_problem/ + _docs/01_solution/ +
  _docs/legacy/ + _docs/how_to_test + README.
- Step 2 Architecture Baseline: architecture_compliance_baseline.md.
- Step 3 Test Spec: _docs/02_document/tests/ (environment,
  test-data, blackbox/performance/resilience/security/
  resource-limit tests, traceability-matrix), enum_spec_snapshot,
  expected_results/results_report.md (98 rows), plus the
  run-tests.sh + run-performance-tests.sh runners.
- Step 4 Code Testability Revision: 01-testability-refactoring/
  run dir (list-of-changes C01-C07, deferred_to_refactor,
  analysis/research_findings + refactoring_roadmap) and the 7
  child task specs AZ-448..AZ-454 under _docs/02_tasks/todo/
  plus _dependencies_table.md.
- _docs/_autodev_state.md pins the cursor at Step 4 / refactor
  Phase 4 entry so /autodev resumes cleanly.

Epic AZ-447 (UI testability gates) tracks the 7 child tasks that
will land in subsequent commits.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-11 00:38:49 +03:00

# Azaion UI — Observability
> Synthesis output of `/document` Step 3d (observability). Derived from inspection
> of all module docs + `nginx.conf` + the absence of any client telemetry SDK
> in `package.json`.
## 1. Status: minimal
The browser-side SPA emits **no centralized telemetry today**:
- No analytics SDK (no `@sentry/*`, `@datadog/*`, `web-vitals`, `posthog`, etc.).
- No error reporting service.
- No client-side feature-flag service.
- Errors that aren't caught by an `<ErrorBoundary>` (which doesn't exist today — finding in `10_app-shell`) end up as `console.error` only.

This is acceptable as a starting state. A future iteration could add an error-tracking SDK (Sentry is the candidate) with the DSN sourced from a runtime `/config.json` — see `environment_strategy.md`.
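
As a rough illustration of the runtime-config idea, the sketch below reads a DSN from `/config.json` and falls back to an empty config when the file is missing or malformed. The `sentryDsn` field name, the `RuntimeConfig` shape, and both function names are assumptions made for this example; the actual schema is whatever `environment_strategy.md` defines.

```typescript
// Hypothetical shape of /config.json; the field name is an assumption.
interface RuntimeConfig {
  sentryDsn?: string;
}

// Minimal subset of a fetch Response that the loader needs.
interface ConfigResponse {
  ok: boolean;
  json(): Promise<unknown>;
}
type FetchLike = (url: string) => Promise<ConfigResponse>;

// Validates untrusted JSON; anything unexpected yields an empty config.
function parseRuntimeConfig(raw: unknown): RuntimeConfig {
  if (typeof raw !== "object" || raw === null) return {};
  const dsn = (raw as Record<string, unknown>)["sentryDsn"];
  return typeof dsn === "string" && dsn.length > 0 ? { sentryDsn: dsn } : {};
}

// Never throws: a missing or broken config only leaves error tracking disabled.
async function loadRuntimeConfig(
  fetchImpl: FetchLike,
  url = "/config.json",
): Promise<RuntimeConfig> {
  try {
    const res = await fetchImpl(url);
    return res.ok ? parseRuntimeConfig(await res.json()) : {};
  } catch (err) {
    console.error("runtime config unavailable; error tracking stays disabled", err);
    return {};
  }
}
```

In the browser the loader could be called as `loadRuntimeConfig((u) => fetch(u))` before the app mounts. Passing the fetch function in keeps the loader testable without a network.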
## 2. Existing logging (per module)
| Module | What is logged | How | Why it's unsatisfactory |
|--------|----------------|-----|-------------------------|
| `01_api-transport/client.ts` | request / response errors | `console.error` | No retries, no spans, no correlation IDs |
| `01_api-transport/sse.ts` | EventSource errors | `console.error` | No reconnect logic; no telemetry |
| `02_auth/AuthContext.tsx` | login / refresh outcomes | `console.error` | Successful refresh is silent (good); failures go only to the console (bad — need user-visible recovery flow) |
| `03_shared-ui/FlightContext.tsx` | flight load + select-flight errors | swallowed | `selectFlight` is fire-and-forget, error invisible |
| `06_annotations/AnnotationsSidebar.tsx` | AI-detect errors | `console.error` | User sees no feedback (finding #2123) |
| `06_annotations/AnnotationsPage.tsx` | save errors | partial — `handleSave` has fallback that **hides save loss** (finding) | Worst case: user thinks the annotation saved but it didn't |
| `07_dataset/DatasetPage.tsx` | various | swallowed `catch` blocks (finding #6) | Same risk |
| `05_flights/FlightsPage.tsx` | save partial-failure | not detected | Per-waypoint failures invisible (finding #19) |
| `05_flights/flightPlanUtils.ts` | weather fetch errors | swallowed silently | Wind data missing → battery estimate wrong; user not informed |

The dominant pattern is "silent catch + console.error" — this is the single biggest observability gap.
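
One possible shape for closing this gap is a small wrapper that logs the failure and tells the user, instead of swallowing it. This is only a sketch: `withReporting`, `describeError`, and the `Notify` callback are hypothetical names, and the real toast mechanism in this codebase may look different.

```typescript
// Hypothetical user-facing notifier (for example a toast); an assumption for this sketch.
type Notify = (message: string) => void;

interface ErrorReport {
  scope: string;
  message: string;
}
type LogFn = (report: ErrorReport, err: unknown) => void;

// Turns any thrown value into a short human-readable message.
function describeError(err: unknown): string {
  if (err instanceof Error) return err.message;
  return typeof err === "string" ? err : "unknown error";
}

// Default logger keeps today's console.error behaviour but adds the scope.
const consoleLog: LogFn = (report, err) =>
  console.error(`[${report.scope}]`, report.message, err);

// Runs an async action; on failure it logs, notifies the user, and returns
// undefined so the caller can branch without writing another try/catch.
async function withReporting<T>(
  scope: string,
  action: () => Promise<T>,
  notify: Notify,
  log: LogFn = consoleLog,
): Promise<T | undefined> {
  try {
    return await action();
  } catch (err) {
    const report: ErrorReport = { scope, message: describeError(err) };
    log(report, err);
    notify(`${scope} failed: ${report.message}`);
    return undefined;
  }
}
```

A call site such as a save handler would wrap its request in `withReporting("save annotation", ...)` and treat an `undefined` result as "not saved", which avoids the case where the user believes a save succeeded when it did not.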
## 3. Server-side logs the UI relies on
The suite services (admin, flights, annotations, detect, etc.) are responsible for:
- Audit logging (login, logout, role changes, destructive admin actions)
- Request tracing (the UI does not send a `traceparent` header today — Step 6 candidate)
- Performance metrics (UI does not measure RUM)

The UI's bug-reproduction story relies on suite-side logs. A correlation ID injected by the UI on every request would dramatically simplify cross-service debugging — a Step 6 problem-extraction surface.
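
A minimal sketch of such an injection follows, assuming the suite services would log either header. It builds an `X-Request-Id` plus a `traceparent` value in the W3C Trace Context layout (`version-traceid-parentid-flags`). The helper names `randomHex` and `tracingHeaders` are hypothetical, and reusing the trace ID as the request ID is a simplification chosen for this example.

```typescript
type HeaderMap = Record<string, string>;

// Random lowercase hex string of `bytes` bytes. Math.random is adequate for
// log correlation, but it is not a cryptographic source.
function randomHex(bytes: number): string {
  let out = "";
  for (let i = 0; i < bytes; i++) {
    out += Math.floor(Math.random() * 256).toString(16).padStart(2, "0");
  }
  return out;
}

// Returns a copy of `existing` with correlation headers added.
// traceparent layout: 00-<32 hex trace id>-<16 hex parent id>-<2 hex flags>.
function tracingHeaders(existing: HeaderMap = {}): HeaderMap {
  const traceId = randomHex(16);
  const parentId = randomHex(8);
  return {
    ...existing,
    "X-Request-Id": traceId,
    traceparent: `00-${traceId}-${parentId}-01`, // flags 01 = sampled
  };
}
```

A shared API client could merge the result into each request, for example `fetch(url, { headers: tracingHeaders({ Accept: "application/json" }) })`, and include the same ID in any error it logs.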
## 4. Client-side metrics (none)
No `web-vitals` or equivalent is installed. Recommended (Step 5 solution surface):
- **CLS** (cumulative layout shift) — the canvas + leaflet + chart layout has known shifts on initial load.
- **LCP** (largest contentful paint) — the bundle is the dominant cost.
- **INP** (interaction latency; it replaced FID as a Core Web Vital in March 2024) — relevant for the canvas drag and waypoint drag-drop.
- **Custom metrics**: time-to-first-flight-list, time-to-first-thumbnail, time-to-first-detection.
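
The custom metrics in the last bullet could be captured with a small first-occurrence timer, sketched below. `FirstEventTimer`, the injected clock, and the sink callback are all hypothetical; where the measurements are sent is left open.

```typescript
type Clock = () => number; // returns milliseconds
type MetricSink = (name: string, durationMs: number) => void;

// Measures time from construction (for example app start) to the first
// occurrence of each named event; repeated marks are ignored.
class FirstEventTimer {
  private readonly now: Clock;
  private readonly sink: MetricSink;
  private readonly start: number;
  private readonly seen = new Set<string>();

  constructor(now: Clock, sink: MetricSink) {
    this.now = now;
    this.sink = sink;
    this.start = now();
  }

  // Emits the metric once and returns its duration; returns undefined on repeats.
  mark(name: string): number | undefined {
    if (this.seen.has(name)) return undefined;
    this.seen.add(name);
    const durationMs = this.now() - this.start;
    this.sink(name, durationMs);
    return durationMs;
  }
}
```

In the browser the clock would typically be `() => performance.now()`, and the flight list component would call `timer.mark("time-to-first-flight-list")` after its first successful render with data. Injecting the clock keeps the timer deterministic in tests.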
## 5. Error boundaries
`10_app-shell` finding: no `<ErrorBoundary>` wraps the route tree. A single uncaught render error today crashes the whole SPA. Step 4 / Step 5 surface — add a top-level `<ErrorBoundary>` plus per-feature boundaries for the canvas / map / chart so isolated failures don't take down the whole UI.
## 6. Recommended near-term improvements (Step 5 solution candidates)
1. **Add a top-level `<ErrorBoundary>`** in `App.tsx` with a "something broke" recovery card.
2. **Replace silent catches** (`} catch {}`) with `console.error` + user toast — at minimum.
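3. **Inject a correlation ID** (`X-Request-Id` header) on every fetch. The native `EventSource` API cannot set custom request headers, so SSE connections would need the ID in a query parameter or a fetch-based SSE client.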
4. **Surface AI-detect progress + errors** — see Flow F7 (the flow currently doesn't even subscribe).
5. **Add Sentry (or equivalent)** with runtime-config-driven DSN.
6. **Add `web-vitals`** + emit to a suite `admin/` telemetry endpoint.