[AZ-447] autodev Steps 1-4 baseline: docs, tests, refactor specs

Captures the full output of autodev existing-code Phase A through
Step 4 (Code Testability Revision) for the Azaion UI workspace:

- Step 1 Document: _docs/02_document/ (FINAL_report, architecture,
  glossary, components/, modules/, diagrams/, system-flows,
  module-layout) plus _docs/00_problem/ + _docs/01_solution/ +
  _docs/legacy/ + _docs/how_to_test + README.
- Step 2 Architecture Baseline: architecture_compliance_baseline.md.
- Step 3 Test Spec: _docs/02_document/tests/ (environment,
  test-data, blackbox/performance/resilience/security/
  resource-limit tests, traceability-matrix), enum_spec_snapshot,
  expected_results/results_report.md (98 rows), plus the
  run-tests.sh + run-performance-tests.sh runners.
- Step 4 Code Testability Revision: 01-testability-refactoring/
  run dir (list-of-changes C01-C07, deferred_to_refactor,
  analysis/research_findings + refactoring_roadmap) and the 7
  child task specs AZ-448..AZ-454 under _docs/02_tasks/todo/
  plus _dependencies_table.md.
- _docs/_autodev_state.md pins the cursor at Step 4 / refactor
  Phase 4 entry so /autodev resumes cleanly.

Epic AZ-447 (UI testability gates) tracks the 7 child tasks that
will land in subsequent commits.

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-11 00:38:49 +03:00
parent da0a5aa187
commit 510df68bcf
84 changed files with 13065 additions and 0 deletions
@@ -0,0 +1,56 @@
# Azaion UI — CI/CD Pipeline
> Synthesis output of `/document` Step 3d (ci_cd_pipeline). Derived from
> `.woodpecker/build-arm.yml`.
## 1. Triggers
| Branch | Triggers | Image tag |
|--------|----------|-----------|
| `dev` | every push | `${REGISTRY_HOST}/azaion/ui:dev-arm` |
| `stage` | every push | `${REGISTRY_HOST}/azaion/ui:stage-arm` |
| `main` | every push | `${REGISTRY_HOST}/azaion/ui:main-arm` |
Other branches do NOT build (PR builds, feature-branch builds, tag builds — none configured today).
## 2. Steps
| # | Step | What | Notes |
|---|------|------|-------|
| 1 | Checkout | `git clone` + `git checkout $CI_COMMIT_SHA` | Standard Woodpecker behaviour |
| 2 | Build + Push image | Multi-stage Dockerfile produces `nginx:alpine` image with `dist/` baked in | Pushes to `${REGISTRY_HOST}/azaion/ui:${branch}-arm` with OCI labels (revision, created, source) |
**Missing steps** (recommended for autodev Steps 5-7):
| Step | Purpose | Tool candidates |
|------|---------|-----------------|
| `bun install --frozen-lockfile` smoke | Catch lockfile drift before build | First few seconds of the build stage cover this |
| `tsc --noEmit` | Type-check the whole project | Already part of `bun run build` (`tsc -b && vite build`) |
| `bun test` (or vitest / jest) | Run test suite | **Required** — there is no test runner today |
| `eslint` / `biome` | Lint | Not configured today |
| Vulnerability scan | CVE scan on the image | `trivy` or `grype` candidates |
| SBOM emission | Software bill of materials | `syft` candidate |
| Image signing | Supply-chain trust | `cosign` candidate |
| Multi-arch build | Add AMD64 alongside ARM64 | `docker buildx` candidate |
These are tracked as Steps 4-7 deliverables under autodev; the current pipeline is correct but minimal.
## 3. Secrets & registry
- `${REGISTRY_HOST}` — provided by Woodpecker secrets at runtime.
- Registry credentials — stored as Woodpecker secrets; not in this repo.
- No GPG/TLS signing keys today.
## 4. Branch model
- `dev` is the active development branch (per `.cursor/rules/git-workflow.mdc`).
- `stage` is for pre-production validation.
- `main` is production.
- No `release/*` long-lived branches.
- PR builds are not configured (Woodpecker build only fires on push, not on PR open).
## 5. Build artifact
The output of the pipeline is exactly one OCI image per push: `${REGISTRY_HOST}/azaion/ui:${branch}-arm`. There is **no** versioned image tag (e.g., `1.2.3-arm`); branch tags are mutable. The OCI `revision` label is the deterministic anchor (= `$CI_COMMIT_SHA`).
**Future**: when this UI ships under a versioned suite release, the pipeline should also tag images with `vMAJOR.MINOR.PATCH-arm` derived from `package.json` `version`.
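A minimal sketch of how the tag set could be derived once versioned tags are introduced. `imageTags` is a hypothetical helper, not part of the current pipeline, and it assumes the version string is read from `package.json` by the caller:

```typescript
// Hypothetical helper: computes the image references a push could publish.
// The branch tag mirrors today's pipeline; the version tag is the proposed addition.
function imageTags(registryHost: string, branch: string, version?: string): string[] {
  const repo = `${registryHost}/azaion/ui`;
  const tags = [`${repo}:${branch}-arm`]; // mutable branch tag (current behaviour)
  // Only a plain MAJOR.MINOR.PATCH string is accepted; anything else is skipped, not guessed.
  if (version !== undefined && /^\d+\.\d+\.\d+$/.test(version)) {
    tags.push(`${repo}:v${version}-arm`);
  }
  return tags;
}
```

Calling it without a version yields only the branch tag, so the current behaviour stays the default.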
@@ -0,0 +1,72 @@
# Azaion UI — Containerization
> Synthesis output of `/document` Step 3d (containerization). Derived from
> `Dockerfile`, `nginx.conf`, and `00_discovery.md` §3.
## 1. Image
**Multi-stage build** (`Dockerfile`):
| Stage | Base image | Role |
|-------|------------|------|
| 1 (builder) | `oven/bun:1.3.11-alpine` | `bun install --frozen-lockfile` + `bun run build` (= `tsc -b && vite build`) → `dist/` |
| 2 (runtime) | `nginx:alpine` | Serves `/usr/share/nginx/html` (`dist/`); listens on `:80` |
**Why this shape**:
- Bun gives a fast install + build vs. npm/yarn/pnpm.
- nginx alpine is a sub-25 MB runtime that already has reverse-proxy routing for `/api`.
- No Node runtime in production → smaller attack surface, faster startup, lower memory.
**Image labels** (OCI, set by Woodpecker CI):
- `org.opencontainers.image.revision = $CI_COMMIT_SHA`
- `org.opencontainers.image.created = $CI_BUILD_CREATED`
- `org.opencontainers.image.source = <repo url>`
**Environment**:
- `AZAION_REVISION = $CI_COMMIT_SHA` — accessible at runtime for diagnostics.
- No other env vars consumed at runtime by the SPA bundle (the bundle is fully static).
## 2. nginx routing (`nginx.conf`)
The image's nginx config strips `/api/<service>/` and reverse-proxies to the matching suite service inside the container network.
| Public path | Upstream (intra-cluster) | Service |
|-------------|--------------------------|---------|
| `/api/annotations/` | `http://annotations:8080/` | `annotations/` |
| `/api/flights/` | `http://flights:8080/` | `flights/` |
| `/api/admin/` | `http://admin:8080/` | `admin/` |
| `/api/resource/` | `http://resource:8080/` | `resource/` |
| `/api/detect/` | `http://detect:8080/` | `detect/` |
| `/api/loader/` | `http://loader:8080/` | `loader/` |
| `/api/gps-denied-desktop/` | `http://gps-denied-desktop:8080/` | `gps-denied-desktop/` |
| `/api/gps-denied-onboard/` | `http://gps-denied-onboard:8080/` | `gps-denied-onboard/` |
| `/api/autopilot/` | `http://autopilot:8080/` | `autopilot/` |
| `/` (any other path) | static fallback to `/index.html` (SPA routing) | — |
**Body size cap**: `client_max_body_size 500M` — tlog + video uploads in GPS-Denied Test Mode and large image uploads in Annotations both ride this limit.
**Headers passed to upstream**: standard `Host`, `X-Real-IP`, `X-Forwarded-For`, `X-Forwarded-Proto` (assumed — verify in `nginx.conf`).
**SSE handling**: `proxy_buffering off` MUST be set on `/api/detect/` and any other path that streams (Step 4 verification — confirm in `nginx.conf`).
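The prefix-stripping rule in the routing table above can be modelled as a small function. This is an illustration of the table only, not a transcription of `nginx.conf`; `resolveUpstream` is a hypothetical name:

```typescript
// Service names taken from the routing table.
const SERVICES = [
  "annotations", "flights", "admin", "resource", "detect", "loader",
  "gps-denied-desktop", "gps-denied-onboard", "autopilot",
] as const;

// Returns the upstream URL a request would be proxied to, or null when the
// request falls through to the static SPA fallback (/index.html).
function resolveUpstream(path: string): string | null {
  for (const service of SERVICES) {
    const prefix = `/api/${service}/`;
    if (path.startsWith(prefix)) {
      // The /api/<service>/ prefix is stripped; the remainder is forwarded unchanged.
      return `http://${service}:8080/${path.slice(prefix.length)}`;
    }
  }
  return null;
}
```

Under this model `/api/flights/42` maps to `http://flights:8080/42`, while `/flights` is served by the SPA fallback.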
## 3. Resource sizing (recommended, not enforced)
| Resource | Recommendation | Rationale |
|----------|----------------|-----------|
| CPU | 100 m (0.1 vCPU) | nginx is near-idle; 99 % of work is suite services |
| Memory | 32 Mi | nginx + ~5 MB of static assets |
| Storage | ephemeral 50 Mi | bundle is sub-5 MB gzipped today; some headroom |
| Replicas | 1+ | trivially horizontal; HA only matters if the ingress sits in front |
**Bundle size budget**: `vite build` output should stay under ~2 MB gzipped initial JS. Currently `chart.js` and `leaflet` are the dominant chunks; `AltitudeChart` is a lazy-load candidate (finding in `05_flights`).
## 4. Health checks
**Today: none.**
Recommended (Step 4 / Step 6 surface):
- **Liveness**: `GET /index.html → 200`
- **Readiness**: same (the SPA has no warm-up)
- **Container health**: `wget --spider -q http://localhost/index.html`
The suite-level orchestrator (parent suite docker-compose / k8s) is expected to handle ingress health-checking; individual UI replicas don't need their own.
@@ -0,0 +1,51 @@
# Azaion UI — Environment Strategy
> Synthesis output of `/document` Step 3d (environment_strategy). Derived from
> `vite.config.ts`, `nginx.conf`, `.gitignore`, the workspace `README.md`, and
> the absence of a workspace `.env.example`.
## 1. Environments
| Env | How it runs | API base | Auth | Tile providers |
|-----|-------------|----------|------|----------------|
| Development | `bun run dev` (Vite dev server, port 5173) | Vite dev proxy: `/api → http://localhost:8080` (configured in `vite.config.ts`) | Suite admin/ service running locally (typically via parent suite `docker-compose up`) | OSM + satellite (env-configurable in mission-planner only) |
| Stage | nginx in container, ARM image `:stage-arm` | nginx `/api/<service>/ → http://<service>:8080/` (intra-cluster) | Stage suite admin/ service | Same |
| Production | nginx in container, ARM image `:main-arm` | nginx `/api/<service>/ → http://<service>:8080/` | Prod suite admin/ service | Same |
## 2. Configuration model
The SPA bundle is **fully static**. No env vars are read at runtime by the bundle. Every cross-environment difference is resolved at the **deployment edge** (nginx) or at the **suite-service level**.
| Concern | Where it's set | Notes |
|---------|----------------|-------|
| Backend API URL | nginx `proxy_pass` (`nginx.conf`) — same nginx config across stage / prod | Base URLs are intra-cluster service names (`http://annotations:8080`, etc.); the URL difference between environments is hidden by the orchestrator's DNS |
| Auth cookie domain | Set by suite admin/ service on `Set-Cookie` | UI does not control |
| Refresh-token lifetime | Set by suite admin/ service | UI tolerates any TTL |
| Tile provider URL (mission-planner) | `.env.example` declares `VITE_SATELLITE_TILE_URL` | mission-planner only; not deployed |
| OpenWeatherMap API key | **Hardcoded in source** (`flightPlanUtils.ts:60`) | Security finding — Step 4 fix to remove + proxy via suite |
| `AZAION_REVISION` | Stamped into image at build time | For diagnostics |
## 3. Why no `.env`
The workspace `.env.example` is **absent** today. The `README.md` "Local development" section explicitly notes this as a Step 4 testability fix.
**Trade-off**: avoiding build-time env injection means `dist/` is identical across environments, which is great for promotability (the same image could flow dev → stage → prod unchanged). The cost: the OpenWeatherMap key (and any future runtime config) cannot be changed without a rebuild.
**Future direction** (Step 4 / Step 5):
- Move the OpenWeatherMap call server-side (`flights/` service) — eliminates the bundled key entirely.
- Introduce a runtime `/config.json` that nginx serves — lets ops change feature flags / tile URLs without rebuilding.
- OR keep the static bundle and use Vite's `define` for build-time injection of safe-to-publish values (no secrets).
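For the runtime `/config.json` option, the SPA would need to treat the payload as untrusted input. A minimal validation sketch follows; the `RuntimeConfig` shape and its field names are assumptions for illustration, not an agreed contract:

```typescript
// Hypothetical shape; field names are illustrative only.
interface RuntimeConfig {
  satelliteTileUrl: string | null;
  featureFlags: Record<string, boolean>;
}

// Validates an already-parsed JSON value (e.g. the body of GET /config.json)
// and falls back to safe defaults for missing or malformed fields.
function parseRuntimeConfig(raw: unknown): RuntimeConfig {
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    return { satelliteTileUrl: null, featureFlags: {} };
  }
  const obj = raw as Record<string, unknown>;
  const flags: Record<string, boolean> = {};
  const rawFlags = obj["featureFlags"];
  if (typeof rawFlags === "object" && rawFlags !== null && !Array.isArray(rawFlags)) {
    for (const [key, value] of Object.entries(rawFlags as Record<string, unknown>)) {
      if (typeof value === "boolean") flags[key] = value; // non-boolean flags are dropped
    }
  }
  const tileUrl = obj["satelliteTileUrl"];
  return {
    satelliteTileUrl: typeof tileUrl === "string" ? tileUrl : null,
    featureFlags: flags,
  };
}
```

Because every field has a default, a missing or broken `/config.json` would degrade to today's static behaviour rather than block startup. No secrets should travel through this file, since it is publicly fetchable.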
## 4. Promotability
A separate image (`:dev-arm`, `:stage-arm`, `:main-arm`) is built per branch from the same Dockerfile. In theory the `:dev-arm` image is functionally identical to the `:main-arm` image except for the stamped `AZAION_REVISION` value.
In practice: branch separation is the gating mechanism. Once dev → stage → main propagation is normalized, the safer pattern is to build ONE image per commit and re-tag it across environments (immutable image promotion). The Woodpecker pipeline does not implement this today; it rebuilds per-branch.
## 5. Local-dev quirks
- **Vite dev proxy** (`vite.config.ts`) requires the suite to be reachable on `http://localhost:8080`. If the parent suite's docker-compose binds to a different port, the developer must edit `vite.config.ts` (no env-driven override today).
- **`bun.lock`**: committed (per `package.json`'s `packageManager` field). `package-lock.json` is gitignored.
- **`.idea/`, `.claude/`, `.superpowers/`**: gitignored — IDE / agent metadata.
- **Playwright entries in `.gitignore`**: present but aspirational; Playwright is not installed (Steps 5-7 territory).
- **mission-planner**: has its own `.env.example` declaring `VITE_SATELLITE_TILE_URL` and runs as a sibling Vite app. Not bundled into the deployed image.
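The Vite dev proxy quirk above (no env-driven override) could be addressed along these lines. This is a sketch only: `VITE_API_PROXY_TARGET` and `resolveProxyTarget` are hypothetical names, and nothing in the workspace reads such a variable today.

```typescript
// Hypothetical variable name; not read anywhere in the workspace today.
const PROXY_ENV_KEY = "VITE_API_PROXY_TARGET";
// The value currently hard-coded in vite.config.ts.
const DEFAULT_PROXY_TARGET = "http://localhost:8080";

// Picks the dev-proxy target from an env map, keeping today's default when the
// variable is unset or is not a bare http(s) origin.
function resolveProxyTarget(env: Record<string, string | undefined>): string {
  const candidate = (env[PROXY_ENV_KEY] ?? "").trim().replace(/\/+$/, "");
  return /^https?:\/\/[^\s/]+$/.test(candidate) ? candidate : DEFAULT_PROXY_TARGET;
}
```

In `vite.config.ts` the env map could come from Vite's `loadEnv`, with the result used as the `/api` proxy `target`; whether that wiring fits the existing config needs checking against the file.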
@@ -0,0 +1,64 @@
# Azaion UI — Observability
> Synthesis output of `/document` Step 3d (observability). Derived from inspection
> of all module docs + `nginx.conf` + the absence of any client telemetry SDK
> in `package.json`.
## 1. Status: minimal
The browser-side SPA emits **no centralized telemetry today**:
- No analytics SDK (no `@sentry/*`, `@datadog/*`, `web-vitals`, `posthog`, etc.).
- No error reporting service.
- No client-side feature-flag service.
- Errors that aren't caught by an `<ErrorBoundary>` (which doesn't exist today — finding in `10_app-shell`) end up as `console.error` only.
This is acceptable as a starting state. A future iteration adds an error-tracking SDK (Sentry candidate) with the SDK key sourced from a runtime `/config.json` — see `environment_strategy.md`.
## 2. Existing logging (per module)
| Module | What is logged | How | Why it's unsatisfactory |
|--------|----------------|-----|-------------------------|
| `01_api-transport/client.ts` | request / response errors | `console.error` | No retries, no spans, no correlation IDs |
| `01_api-transport/sse.ts` | EventSource errors | `console.error` | No reconnect logic; no telemetry |
| `02_auth/AuthContext.tsx` | login / refresh outcomes | `console.error` | Successful refresh is silent (good); failures are silent (bad — need user-visible recovery flow) |
| `03_shared-ui/FlightContext.tsx` | flight load + select-flight errors | swallowed | `selectFlight` is fire-and-forget, error invisible |
| `06_annotations/AnnotationsSidebar.tsx` | AI-detect errors | `console.error` | User sees no feedback (findings #21-23) |
| `06_annotations/AnnotationsPage.tsx` | save errors | partial — `handleSave` has fallback that **hides save loss** (finding) | Worst case: user thinks the annotation saved but it didn't |
| `07_dataset/DatasetPage.tsx` | various | swallowed `catch` blocks (finding #6) | Same risk |
| `05_flights/FlightsPage.tsx` | save partial-failure | not detected | Per-waypoint failures invisible (finding #19) |
| `05_flights/flightPlanUtils.ts` | weather fetch errors | swallowed silently | Wind data missing → battery estimate wrong; user not informed |
The dominant pattern is "silent catch + console.error" — this is the single biggest observability gap.
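One way to close this gap is a single helper that every `catch` block calls, so an error is always both logged and shown to the user. The sketch below is illustrative: `reportError` and the `Notify` callback are hypothetical names, and the toast mechanism is left to the caller.

```typescript
type Notify = (message: string) => void;

// Hypothetical helper: call from a catch block instead of swallowing the error.
// Logs the raw error for developers and sends a readable message to the user.
function reportError(
  scope: string,
  err: unknown,
  notify: Notify,
  log: (...args: unknown[]) => void = console.error,
): string {
  const detail = err instanceof Error ? err.message : String(err);
  const message = `${scope} failed: ${detail}`;
  log(`[${scope}]`, err); // keeps the stack trace visible in the console
  notify(message);        // user-visible feedback (toast, banner, ...)
  return message;
}

// Usage inside existing code would look roughly like:
//   try { await save(); } catch (err) { reportError("Save annotation", err, showToast); }
```

Centralising the call also gives a single place to attach an error-tracking SDK later.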
## 3. Server-side logs the UI relies on
The suite services (admin, flights, annotations, detect, etc.) are responsible for:
- Audit logging (login, logout, role changes, destructive admin actions)
- Request tracing (the UI does not send a `traceparent` header today — Step 6 candidate)
- Performance metrics (UI does not measure RUM)
The UI's bug-reproduction story relies on suite-side logs. A correlation ID injected by the UI on every request would dramatically simplify cross-service debugging — a Step 6 problem-extraction surface.
## 4. Client-side metrics (none)
No `web-vitals` or equivalent is installed. Recommended (Step 5 solution surface):
- **CLS** (cumulative layout shift) — the canvas + leaflet + chart layout has known shifts on initial load.
- **LCP** (largest contentful paint) — the bundle is the dominant cost.
- **FID / INP** (interaction latency) — relevant for the canvas drag and waypoint drag-drop.
- **Custom metrics**: time-to-first-flight-list, time-to-first-thumbnail, time-to-first-detection.
## 5. Error boundaries
`10_app-shell` finding: no `<ErrorBoundary>` wraps the route tree. A single uncaught render error today crashes the whole SPA. Step 4 / Step 5 surface — add a top-level `<ErrorBoundary>` plus per-feature boundaries for the canvas / map / chart so isolated failures don't take down the whole UI.
## 6. Recommended near-term improvements (Step 5 solution candidates)
1. **Add a top-level `<ErrorBoundary>`** in `App.tsx` with a "something broke" recovery card.
2. **Replace silent catches** (`} catch {}`) with `console.error` + user toast, at minimum.
3. **Inject a correlation ID** (`X-Request-Id` header) on every fetch + EventSource.
4. **Surface AI-detect progress + errors**; see Flow F7 (currently the flow doesn't even subscribe).
5. **Add Sentry (or equivalent)** with runtime-config-driven DSN.
6. **Add `web-vitals`** + emit to suite admin/ telemetry endpoint.
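For item 3, a minimal sketch of header injection. `withRequestId` is a hypothetical helper, and the ID generator is passed in so the choice of ID format stays with the caller:

```typescript
const REQUEST_ID_HEADER = "X-Request-Id";

// Returns a copy of the headers with a correlation ID added.
// An ID already set by the caller is kept (header names compare case-insensitively).
function withRequestId(
  headers: Record<string, string>,
  newId: () => string,
): Record<string, string> {
  const alreadySet = Object.keys(headers).some(
    (name) => name.toLowerCase() === REQUEST_ID_HEADER.toLowerCase(),
  );
  if (alreadySet) return { ...headers };
  return { ...headers, [REQUEST_ID_HEADER]: newId() };
}
```

In the browser the generator could be `() => crypto.randomUUID()`. Note that the native `EventSource` constructor does not accept custom request headers, so the SSE path would need another carrier for the ID (a query parameter, for example) or a fetch-based stream reader.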