Azaion UI — Observability
Synthesis output of `/document` Step 3d (observability). Derived from inspection of all module docs + `nginx.conf` + the absence of any client telemetry SDK in `package.json`.
1. Status: minimal
The browser-side SPA emits no centralized telemetry today:
- No analytics SDK (no `@sentry/*`, `@datadog/*`, `web-vitals`, `posthog`, etc.).
- No error reporting service.
- No client-side feature-flag service.
- Errors that aren't caught by an `<ErrorBoundary>` (which doesn't exist today; finding in `10_app-shell`) end up as `console.error` only.
This is acceptable as a starting state. A future iteration adds an error-tracking SDK (Sentry candidate) with the SDK key sourced from a runtime /config.json — see environment_strategy.md.
2. Existing logging (per module)
| Module | What is logged | How | Why it's unsatisfactory |
|---|---|---|---|
| `01_api-transport/client.ts` | request / response errors | `console.error` | No retries, no spans, no correlation IDs |
| `01_api-transport/sse.ts` | EventSource errors | `console.error` | No reconnect logic; no telemetry |
| `02_auth/AuthContext.tsx` | login / refresh outcomes | `console.error` | Successful refresh is silent (good); failures are silent (bad: need user-visible recovery flow) |
| `03_shared-ui/FlightContext.tsx` | flight load + select-flight errors | swallowed | `selectFlight` is fire-and-forget, error invisible |
| `06_annotations/AnnotationsSidebar.tsx` | AI-detect errors | `console.error` | User sees no feedback (finding #21–23) |
| `06_annotations/AnnotationsPage.tsx` | save errors | partial: `handleSave` has a fallback that hides save loss (finding) | Worst case: user thinks the annotation saved but it didn't |
| `07_dataset/DatasetPage.tsx` | various | swallowed catch blocks (finding #6) | Same risk |
| `05_flights/FlightsPage.tsx` | save partial-failure | not detected | Per-waypoint failures invisible (finding #19) |
| `05_flights/flightPlanUtils.ts` | weather fetch errors | swallowed silently | Wind data missing → battery estimate wrong; user not informed |
The dominant pattern is "silent catch + console.error" — this is the single biggest observability gap.
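As an illustration of the alternative to a silent catch, the sketch below logs the error and also hands a message to a user-facing notifier. It is a minimal sketch under stated assumptions: `makeErrorReporter`, `describeError`, and the `Notify` callback are hypothetical names, not existing code in this repository, and the toast mechanism is left to the caller.

```typescript
// Hypothetical helper: none of these names exist in the codebase today.
type Notify = (message: string) => void;
type Log = (...args: unknown[]) => void;

interface ReportedError {
  scope: string;
  message: string;
}

// Turns any thrown value into a readable message.
function describeError(err: unknown): string {
  return err instanceof Error ? err.message : String(err);
}

// Builds a reporter that both logs the error and notifies the user.
function makeErrorReporter(notify: Notify, log: Log = console.error) {
  return function report(scope: string, err: unknown): ReportedError {
    const reported: ReportedError = { scope, message: describeError(err) };
    log("[" + scope + "]", err);
    notify(scope + ": " + reported.message);
    return reported;
  };
}

// Usage inside a previously silent catch block (showToast and saveAnnotation are hypothetical):
//   const report = makeErrorReporter(showToast);
//   try { await saveAnnotation(); } catch (err) { report("annotations.save", err); }
```

Keeping the notifier injectable means the same reporter could later forward to an error-tracking SDK without touching call sites.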
3. Server-side logs the UI relies on
The suite services (admin, flights, annotations, detect, etc.) are responsible for:
- Audit logging (login, logout, role changes, destructive admin actions)
- Request tracing (the UI does not send a `traceparent` header today; Step 6 candidate)
- Performance metrics (UI does not measure RUM)
The UI's bug-reproduction story relies on suite-side logs. A correlation ID injected by the UI on every request would dramatically simplify cross-service debugging — a Step 6 problem-extraction surface.
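A correlation ID of this kind could be attached roughly as follows. This is a sketch, not the project's transport code: the function names, the `RequestInitLike` type, and the `requestId` query parameter are all assumptions. The query-parameter variant is included because the standard browser `EventSource` constructor does not accept custom request headers, so an SSE stream would need some other carrier for the ID.

```typescript
// Hypothetical names throughout; only the X-Request-Id header name comes from this document.
interface RequestInitLike {
  method?: string;
  headers?: Record<string, string>;
}

// Generates 16 random bytes rendered as 32 lowercase hex characters.
// The random source is injectable so the function can be tested deterministically.
function newCorrelationId(random: () => number = Math.random): string {
  let id = "";
  for (let i = 0; i < 16; i++) {
    const byte = Math.floor(random() * 256);
    id += ("0" + byte.toString(16)).slice(-2);
  }
  return id;
}

// Returns a copy of the request options with the correlation header added.
function withCorrelationId(
  init: RequestInitLike,
  id: string
): RequestInitLike & { headers: Record<string, string> } {
  return { ...init, headers: { ...init.headers, "X-Request-Id": id } };
}

// For EventSource, which cannot send custom headers: append the ID to the URL instead.
// The parameter name "requestId" is an assumption.
function withCorrelationQuery(url: string, id: string): string {
  const separator = url.indexOf("?") === -1 ? "?" : "&";
  return url + separator + "requestId=" + encodeURIComponent(id);
}

// Usage (illustrative):
//   fetch("/api/flights", withCorrelationId({ method: "GET" }, newCorrelationId()));
//   new EventSource(withCorrelationQuery("/api/events", newCorrelationId()));
```

The suite services would need to log the same value for the ID to be useful in cross-service debugging.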
4. Client-side metrics (none)
No web-vitals or equivalent is installed. Recommended (Step 5 solution surface):
- CLS (cumulative layout shift) — the canvas + leaflet + chart layout has known shifts on initial load.
- LCP (largest contentful paint) — the bundle is the dominant cost.
- FID / INP (interaction latency) — relevant for the canvas drag and waypoint drag-drop.
- Custom metrics: time-to-first-flight-list, time-to-first-thumbnail, time-to-first-detection.
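
The custom metrics in the last bullet could be collected with a small start/end timer such as the one sketched below. `MetricTimer` and `MetricSample` are hypothetical names, and the emit callback stands in for whatever telemetry endpoint is eventually chosen; the clock is injected so that the browser would pass `performance.now` while tests can pass a fake.

```typescript
// Hypothetical sketch: no such class exists in the codebase today.
interface MetricSample {
  name: string;
  durationMs: number;
}

class MetricTimer {
  private starts: { [name: string]: number } = {};

  constructor(
    private now: () => number,
    private emit: (sample: MetricSample) => void
  ) {}

  // Records the start time for a named metric.
  start(name: string): void {
    this.starts[name] = this.now();
  }

  // Emits the elapsed time once; returns undefined if start() was never called
  // or the metric has already been ended.
  end(name: string): MetricSample | undefined {
    const startedAt = this.starts[name];
    if (startedAt === undefined) return undefined;
    delete this.starts[name];
    const sample: MetricSample = { name, durationMs: this.now() - startedAt };
    this.emit(sample);
    return sample;
  }
}

// Usage (illustrative):
//   const timer = new MetricTimer(() => performance.now(), (s) => console.log(s));
//   timer.start("time-to-first-flight-list");
//   ...after the first flight list renders...
//   timer.end("time-to-first-flight-list");
```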
5. Error boundaries
10_app-shell finding: no <ErrorBoundary> wraps the route tree. A single uncaught render error today crashes the whole SPA. Step 4 / Step 5 surface — add a top-level <ErrorBoundary> plus per-feature boundaries for the canvas / map / chart so isolated failures don't take down the whole UI.
6. Recommended near-term improvements (Step 5 solution candidates)
- Add a top-level `<ErrorBoundary>` in `App.tsx` with a "something broke" recovery card.
- Replace silent catches (`} catch {}`) with `console.error` + user toast, at minimum.
- Inject a correlation ID (`X-Request-Id` header) on every fetch + EventSource.
- Surface AI-detect progress + errors; see Flow F7 (the flow currently doesn't even subscribe).
- Add Sentry (or equivalent) with runtime-config-driven DSN.
- Add `web-vitals` + emit to suite `admin/` telemetry endpoint.
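
For the runtime-config-driven DSN, one possible shape is a small validator applied to the parsed `/config.json` before any SDK is initialised. This is a sketch under assumptions: the field name `sentryDsn` and the function `parseTelemetryConfig` are hypothetical, and the actual config schema is not defined in this document.

```typescript
// Hypothetical shape: the field name `sentryDsn` is an assumption.
interface TelemetryConfig {
  sentryDsn?: string;
}

// Accepts any parsed JSON value and returns only a well-formed, non-empty DSN.
// Anything else yields an empty config, which callers can treat as "telemetry disabled".
function parseTelemetryConfig(raw: unknown): TelemetryConfig {
  if (typeof raw !== "object" || raw === null) return {};
  const dsn = (raw as { sentryDsn?: unknown }).sentryDsn;
  if (typeof dsn === "string" && dsn.trim() !== "") {
    return { sentryDsn: dsn.trim() };
  }
  return {};
}

// Usage (illustrative):
//   fetch("/config.json")
//     .then((response) => response.json())
//     .then(parseTelemetryConfig)
//     .then((config) => { if (config.sentryDsn) { /* initialise the error-tracking SDK here */ } });
```

Treating a missing or malformed value as "disabled" keeps a bad deployment config from breaking the SPA at startup.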