# Azaion UI — Observability

> Synthesis output of `/document` Step 3d (observability). Derived from inspection
> of all module docs + `nginx.conf` + the absence of any client telemetry SDK
> in `package.json`.

## 1. Status: minimal

The browser-side SPA emits **no centralized telemetry today**:

- No analytics SDK (no `@sentry/*`, `@datadog/*`, `web-vitals`, `posthog`, etc.).
- No error reporting service.
- No client-side feature-flag service.
- Uncaught render errors end up as `console.error` only, because no `<ErrorBoundary>` exists today (finding in `10_app-shell`).

This is acceptable as a starting state. A future iteration adds an error-tracking SDK (Sentry is the candidate) with the SDK key sourced from a runtime `/config.json`; see `environment_strategy.md`.

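A minimal sketch of how a runtime `/config.json` could feed the SDK key, assuming a single optional field. The `RuntimeConfig` shape, the `sentryDsn` field name, and both function names are hypothetical; the real schema belongs to `environment_strategy.md` and is not specified here.

```typescript
// Hypothetical shape: the actual /config.json schema is not defined in this document.
interface RuntimeConfig {
  sentryDsn?: string;
}

// Validate untrusted JSON and keep only the fields this sketch understands.
function parseRuntimeConfig(raw: unknown): RuntimeConfig {
  if (typeof raw !== "object" || raw === null) return {};
  const dsn = (raw as Record<string, unknown>)["sentryDsn"];
  return typeof dsn === "string" && dsn.length > 0 ? { sentryDsn: dsn } : {};
}

// `fetchJson` is injected so the loader does not depend on a specific HTTP client.
async function loadRuntimeConfig(
  fetchJson: (url: string) => Promise<unknown>,
): Promise<RuntimeConfig> {
  try {
    return parseRuntimeConfig(await fetchJson("/config.json"));
  } catch (error) {
    // Telemetry stays disabled, but the failure is at least logged.
    console.error("[runtime-config] load failed", error);
    return {};
  }
}
```

With this shape, a missing or malformed config disables error tracking instead of breaking startup, and the key never has to be baked into the bundle.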
## 2. Existing logging (per module)

| Module | What is logged | How | Why it's unsatisfactory |
|--------|----------------|-----|-------------------------|
| `01_api-transport/client.ts` | request / response errors | `console.error` | No retries, no spans, no correlation IDs |
| `01_api-transport/sse.ts` | EventSource errors | `console.error` | No reconnect logic; no telemetry |
| `02_auth/AuthContext.tsx` | login / refresh outcomes | `console.error` | Successful refresh is silent (good); failures reach only the console, so the user sees nothing (bad: a user-visible recovery flow is needed) |
| `03_shared-ui/FlightContext.tsx` | flight load + select-flight errors | swallowed | `selectFlight` is fire-and-forget, so its errors are invisible |
| `06_annotations/AnnotationsSidebar.tsx` | AI-detect errors | `console.error` | User sees no feedback (finding #21–23) |
| `06_annotations/AnnotationsPage.tsx` | save errors | partial: `handleSave` has a fallback that **hides save loss** (finding) | Worst case: user thinks the annotation saved but it didn't |
| `07_dataset/DatasetPage.tsx` | various | swallowed `catch` blocks (finding #6) | Same risk |
| `05_flights/FlightsPage.tsx` | save partial-failure | not detected | Per-waypoint failures invisible (finding #19) |
| `05_flights/flightPlanUtils.ts` | weather fetch errors | swallowed silently | Wind data missing → battery estimate wrong; user not informed |

The dominant pattern is "silent catch + console.error": errors are either swallowed outright or written only to the browser console, where neither the user nor an operator sees them. This is the single biggest observability gap.

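One way to close that gap is to route every caught error through a single helper that logs it and tells the user. The sketch below is illustrative only: `reportError`, `describeError`, `ErrorSinks`, and `consoleSinks` are hypothetical names that do not exist in the codebase, and the `notify` sink stands in for whatever toast component the UI ends up using.

```typescript
// Hypothetical helper: none of these names come from the Azaion UI codebase.
interface ErrorSinks {
  log: (context: string, error: unknown) => void; // developer-facing output
  notify: (message: string) => void;              // user-facing output, e.g. a toast
}

// Turn any thrown value into a short string that can be shown to a user.
function describeError(error: unknown): string {
  if (error instanceof Error) return error.message;
  return typeof error === "string" ? error : "Unknown error";
}

// Report a caught error to both sinks instead of swallowing it.
function reportError(context: string, error: unknown, sinks: ErrorSinks): void {
  sinks.log(context, error);
  sinks.notify(`${context} failed: ${describeError(error)}`);
}

// Default sinks for the sketch; console.warn is a placeholder for a real toast.
const consoleSinks: ErrorSinks = {
  log: (context, error) => { console.error(`[${context}]`, error); },
  notify: (message) => { console.warn(message); },
};
```

A fire-and-forget call such as `selectFlight` could then end with `.catch((e) => reportError("selectFlight", e, consoleSinks))`, and an empty `catch` block would become a one-line call to the same helper.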
## 3. Server-side logs the UI relies on

The suite services (admin, flights, annotations, detect, etc.) are responsible for:

- Audit logging (login, logout, role changes, destructive admin actions)
- Request tracing (the UI does not send a `traceparent` header today; Step 6 candidate)
- Performance metrics (the UI does not collect RUM data)

The UI's bug-reproduction story relies on suite-side logs. A correlation ID injected by the UI on every request would let a failed UI action be matched to the corresponding log lines in each service, which would dramatically simplify cross-service debugging. This is a Step 6 problem-extraction surface.

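The correlation-ID idea can be sketched as a small header helper, assuming request headers are passed around as plain objects. `withRequestId`, `newRequestId`, and `HeaderMap` are hypothetical names, and the random-hex generator is a placeholder: in a browser, `crypto.randomUUID()` would be the more natural source of IDs.

```typescript
// Hypothetical sketch: these names are illustrative, not part of the codebase.
type HeaderMap = Record<string, string>;

// Placeholder ID generator: 16 random hex characters, not cryptographically strong.
function newRequestId(): string {
  let id = "";
  for (let i = 0; i < 16; i++) {
    id += Math.floor(Math.random() * 16).toString(16);
  }
  return id;
}

// Return a copy of `headers` with X-Request-Id set.
// An ID already supplied by the caller (in any letter case) is left untouched.
function withRequestId(
  headers: HeaderMap = {},
  makeId: () => string = newRequestId,
): HeaderMap {
  const existing = Object.keys(headers).find(
    (name) => name.toLowerCase() === "x-request-id",
  );
  if (existing !== undefined) return { ...headers };
  return { ...headers, "X-Request-Id": makeId() };
}
```

The transport layer in `01_api-transport/client.ts` would be the natural single place to call such a helper, so that individual features never have to think about it. Logging the same ID next to any client-side error would tie the browser console to the suite-side logs.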
## 4. Client-side metrics (none)

No `web-vitals` or equivalent is installed. Recommended (Step 5 solution surface):

- **CLS** (cumulative layout shift): the canvas + leaflet + chart layout has known shifts on initial load.
- **LCP** (largest contentful paint): the bundle is the dominant cost.
- **INP** (interaction to next paint, which replaced FID as a Core Web Vital in March 2024): relevant for the canvas drag and waypoint drag-drop.
- **Custom metrics**: time-to-first-flight-list, time-to-first-thumbnail, time-to-first-detection.

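The custom "time-to-first-X" metrics listed above do not need a library. A rough sketch follows, assuming each metric is measured from the moment the recorder is created; `FirstEventTimer` and its methods are hypothetical names, and the injectable clock exists only to make the behaviour testable (in a browser, `performance.now()` would be a better default than `Date.now()`).

```typescript
// Hypothetical recorder for "time-to-first-X" metrics; not part of the codebase.
class FirstEventTimer {
  private readonly now: () => number;
  private readonly startedAt: number;
  private readonly recorded = new Map<string, number>();

  constructor(now: () => number = () => Date.now()) {
    this.now = now;
    this.startedAt = now();
  }

  // Record elapsed milliseconds the first time `name` is marked.
  // Later marks for the same name return the original value unchanged.
  mark(name: string): number {
    const existing = this.recorded.get(name);
    if (existing !== undefined) return existing;
    const elapsed = this.now() - this.startedAt;
    this.recorded.set(name, elapsed);
    return elapsed;
  }

  // Plain-object copy of everything recorded so far, ready to be sent or logged.
  snapshot(): Record<string, number> {
    const out: Record<string, number> = {};
    this.recorded.forEach((value, key) => {
      out[key] = value;
    });
    return out;
  }
}
```

Calling `timer.mark("first-flight-list")` from the place where the flight list first renders would be enough to capture that metric; because repeat calls are ignored, the call can safely sit inside a render or effect path.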
## 5. Error boundaries

`10_app-shell` finding: no `<ErrorBoundary>` wraps the route tree, so a single uncaught render error today crashes the whole SPA. This is a Step 4 / Step 5 surface: add a top-level `<ErrorBoundary>` plus per-feature boundaries for the canvas / map / chart so isolated failures don't take down the whole UI.

## 6. Recommended near-term improvements (Step 5 solution candidates)

1. **Add a top-level `<ErrorBoundary>`** in `App.tsx` with a "something broke" recovery card.
2. **Replace silent catches** (`catch {}`) with, at minimum, `console.error` plus a user-facing toast.
3. **Inject a correlation ID** (`X-Request-Id` header) on every fetch. If `sse.ts` uses the native `EventSource`, note that its constructor cannot set custom request headers, so SSE connections would need to carry the ID another way (for example as a query parameter) or move to a fetch-based client.
4. **Surface AI-detect progress + errors**; see Flow F7 (the flow currently does not even subscribe).
5. **Add Sentry (or equivalent)** with a runtime-config-driven DSN.
6. **Add `web-vitals`** and emit the results to the suite `admin/` telemetry endpoint.