Update demo replay validation and testing documentation
ci/woodpecker/push/02-build-push Pipeline failed

- Modified the autodev state to reflect the current testing phase and details of the new `jetson-e2e` tests.
- Enhanced the "How to Test" documentation to provide clearer instructions on the demo replay validation process, including video and tlog alignment steps.
- Updated architectural documentation to include the new demo replay operator flow and its dependencies.
- Documented the removal of deprecated auto-sync features and clarified the operator-facing UI for replay validation.
- Added new entries in the dependencies table for upcoming tasks related to the demo replay flow.

These changes improve clarity and usability for operators and developers working with the demo replay system.
This commit is contained in:
Oleksandr Bezdieniezhnykh
2026-06-20 11:24:43 +03:00
parent 12d0008763
commit 1f634c2604
175 changed files with 20701 additions and 41 deletions
+39 -5
View File
@@ -274,8 +274,8 @@ source repo
Two consequences for the architecture:
1. **C11 read contract adapted to the v1.0.0 inventory shape (AZ-777 Phase 1)**`POST /api/satellite/tiles/inventory` + `GET /tiles/{z}/{x}/{y}` replace the historical `GET /api/satellite/tiles?bbox=…&zoom=…` shape. The bbox-driven `download_tiles_for_area` entry point and its DTOs are unchanged at the call-site level; the contract adaptation is internal to `HttpTileDownloader`. Auth is JWT Bearer (`SATELLITE_PROVIDER_API_KEY`) over TLS; `SATELLITE_PROVIDER_TLS_INSECURE=1` is a documented dev-only knob for self-signed certs.
2. **Route-driven seeding (Epic AZ-835 — C11's third interface, `SatelliteProviderRouteClient`)** — the operator can now submit a tlog-derived `RouteSpec` (waypoints + region size; produced by `replay_input.tlog_route.extract_route_from_tlog` — AZ-836; canonical DTO at `_types/route.py` per AZ-845) via `POST /api/satellite/route` and have `satellite-provider` materialise just the corridor tiles, polling `GET /api/satellite/route/{id}` until `mapsReady=true`. This is ~100× more tile-efficient than the bbox path on long, narrow flights. Pre-emptive validation mirrors the AZ-809 `CreateRouteRequestValidator` bounds. The route-driven path is exercised today by the cycle-3 e2e fixture `operator_pre_flight_setup` (AZ-839) and the orchestrator test `test_az835_e2e_real_flight.py` (AZ-840); the C12 production CLI binding is a future-cycle integration.
1. **C11 read contract adapted to the v1.0.0 inventory shape (AZ-777 Phase 1)**`POST /api/satellite/tiles/inventory` + `GET /tiles/{z}/{x}/{y}` replace the historical `GET /api/satellite/tiles?bbox=…&zoom=…` shape. The bbox-driven `download_tiles_for_area` entry point and its DTOs are unchanged at the call-site level; the contract adaptation is internal to `HttpTileDownloader`. Auth is JWT Bearer (`SATELLITE_PROVIDER_API_KEY`) over TLS; `SATELLITE_PROVIDER_TLS_INSECURE=1` is a documented dev-only knob for self-signed certs. **Proposed successor (ADR-013 / AZ-976)**: gRPC `satellite.v1.RouteTileDelivery.DeliverRouteTiles` server-streaming with client tile catalog — see `tile_provision_grpc.md`; supersedes the never-shipped inventory REST endpoint.
2. **Route-driven seeding (Epic AZ-835 / AZ-969)** — the operator submits a tlog-derived `RouteSpec` (produced by `replay_input.tlog_route.extract_route_from_tlog` — AZ-836) via C12 `seed-cache-from-tlog` (AZ-974) or the F11 `replay_api` demo job (AZ-973). E2E fixture `operator_pre_flight_setup` wraps the same production `operator_replay.cache_seed` module.
**Imagery source license attribution (cycle 3)**: the Jetson `satellite-provider` instance downloads from the **Google Maps satellite layer** (`lyrs=s`), governed by Google Maps Platform Terms of Service. Dev/research use only; production deployment requires either a Google Maps Platform licensing review or migration to a true CC-BY satellite source on the parent-suite side (parent-suite ticket TBD). Operator-side seed scripts (`tests/fixtures/derkachi_c6/seed_region.py`, `seed_route.py`) propagate the "Imagery © Google" attribution.
@@ -292,11 +292,17 @@ Cycle 4 rebuilt the replay-mode operator-input surface around a single canonical
| **AZ-894** (CSV adapter) | New primary path | `csv_replay_input.CsvReplayInputAdapter` consumes a paired `(video, CSV)` where the CSV's `Time` column is the canonical clock for every IMU/GPS sample. Gated `BUILD_CSV_REPLAY_ADAPTER=ON` in airborne and research binaries; OFF in operator-orchestrator. |
| **AZ-895** (auto-sync deprecation) | Removed legacy | `replay_input.auto_sync` (AZ-405) reduced to a no-op stub that raises on first call; `tlog_video_adapter.py` reduced to a deprecated stub whose `open()` raises immediately. The legacy `--time-offset-ms` / `--skip-auto-sync` / `--auto-trim` CLI flags accepted-with-warning, ignored. Hard removal tracked in AZ-908 (cycle 5+ backlog). |
| **AZ-896** (CSV format spec) | Contract | `_docs/02_document/contracts/replay/csv_replay_format.md` documents the CSV row schema, the row-0-alignment-with-video-frame-0 invariant, and an example `data_imu.csv` shipped under the same path. |
| **AZ-897** (operator UI) | Cycle-5+ follow-up | First operator-facing UI surface — a React + Tailwind single-page form that uploads a paired `(video, CSV)`, links to AZ-896's format docs + example CSV, and tails the verdict from the headless `gps-denied-replay` invocation. Not on cycle-4 critical path; flagged here so the CSV format stays UI-friendly. |
| **AZ-897** (operator UI) | Cycle 5 — Epic AZ-969 | Dual-timeline `(video, tlog)` alignment UI in `../ui`; uploads raw tlog, calls `replay_api` preview/align/demo endpoints; displays map + verdict. Spec: `../ui/_docs/02_tasks/todo/AZ-897_operator_replay_sync_ui.md`. |
The architectural rationale is captured in **Invariant 14** of the replay protocol (`_docs/02_document/contracts/replay/replay_protocol.md`): the system runs as a single edge process on a single device; there must be exactly one wall/monotonic clock authoritative for timestamps that cross component boundaries. In live mode that clock is the C8 inbound `FcAdapter`'s FC-boot-relative timestamp; in replay mode (after cycle 4) it is the CSV row's `Time` column. The previous design's two-clock surface (Jetson monotonic at C1 VIO emission, FC-boot at C8 IMU window arrival) produced the AZ-848 regression and is retired with the auto-sync deprecation.
The legacy `TlogReplayFcAdapter` is retained for two audit-only paths — offline FDR analysis from `tools/` and a one-shot `gps-denied-tlog-to-csv` migration utility that exports legacy tlog inputs to the canonical CSV. Neither path runs from the airborne composition root after cycle 4.
The legacy `TlogReplayFcAdapter` is retained for audit paths — offline FDR analysis and `gps-denied-tlog-to-csv` export (AZ-972). Runtime replay uses the CSV adapter after operator alignment (F11 / Epic AZ-969).
### Demo replay operator flow (cycle 5 — Epic AZ-969)
F11 in `system-flows.md` is the **primary product demo**, not an e2e-test concern. Raw operator inputs are `(video, tlog, calibration)`; alignment produces an AZ-896 CSV on a single canonical clock; route-driven cache seeding uses `extract_route_from_tlog` via C12 / `replay_api` production modules (AZ-974, AZ-973). Backend children: AZ-970 (preview API), AZ-971 (alignment refine), AZ-972 (CSV export), AZ-973 (orchestration), AZ-974 (C12 seed CLI), AZ-975 (docs). UI: AZ-897 in `../ui`.
The cycle-4 `(video, CSV)` upload bypass (AZ-959) remains for operators who already have an aligned CSV; it is not the default demo entry.
### `satellite-provider` upload contract (per D-PROJ-2 carryforward)
@@ -781,4 +787,32 @@ When C5 ships a second strategy — `eskf` (ESKF baseline, AZ-588) — the subst
- `_docs/02_document/contracts/replay/replay_protocol.md` gains a new "Open-loop ESKF composition profile" sub-section in **Composition root extension** plus a new **Invariant 13** ("C4↔C5 pairing matrix is enforced at compose time") that the AZ-776 unit tests own.
- `_docs/02_document/components/06_c4_pose/description.md` gains an "Enabled flag" sub-section that points at this ADR; the rest of the component contract is unchanged.
- The unit-test surface at `tests/unit/runtime_root/test_az776_open_loop_eskf_composition.py` owns the seven invariants AZ-776 introduces: `C4PoseConfig.enabled` default-true, AC-1 (open-loop ESKF composes without C4), AC-2 (default GTSAM profile still includes C4), AC-3a + AC-3b (the two forbidden pairings raise `CompositionError`), and the two `pre_constructed` behaviours (`c5_isam2_graph_handle` omitted when C4 disabled, present when C4 enabled). The full suite passes in ~4 s.
- The composition root's contract surface in `runtime_root/__init__.py` gains one public helper (`CompositionError` was already public; the new `skip_slugs` parameter to `_compose` is module-private). No public CLI flag is added — operators set `c4_pose.enabled = false` in YAML.
- The composition root's contract surface in `runtime_root/__init__.py` gains one public helper (`CompositionError` was already public; the new `skip_slugs` parameter to `_compose` is module-private). No public CLI flag is added — operators set `c4_pose.enabled = false` in YAML.
### ADR-013 — gRPC server-streaming tile provision for operator pre-flight (AZ-976)
**Context**: Operator-side cache build (C11/C12 ↔ `satellite-provider`) is off the hot airborne path but dominates time-to-ready when a corridor has thousands of tiles. The current REST shape (`POST /route` + poll + planned `POST /inventory` + N× `GET /tiles/{z}/{x}/{y}`) multiplies round-trips and cannot overlap "tiles already on SP disk" with "tiles still downloading from Google Maps". The inventory POST was specified in AZ-777 but never shipped in satellite-provider; Jetson smoke tests 404 on it today. Both codebases are owned by the same team (.NET satellite-provider, Python gps-denied operator tooling), so a typed streaming contract is feasible without a browser client.
**Decision**:
1. **We will add `satellite.v1.RouteTileDelivery.DeliverRouteTiles`** — unary request (`RouteSpec` + `client_tiles`), server-streaming `RouteTileEvent` (manifest → batches → progress → complete | error) — as the primary operator-side pre-flight transport (Epic AZ-976). Proto: `tile_provision.proto`; human contract: `tile_provision_grpc.md`.
2. **The request carries `RouteSpec.route_id` (idempotent UUID) plus `ClientTileRecord[]`.** satellite-provider omits tiles when the client catalog already has equal-or-better resolution and equal-or-newer `captured_at` (lower m/px = better).
3. **First stream event is `RouteManifest`** (`total_candidates`, `skipped_by_client`, `to_deliver`); then `TileBatch` messages with inline JPEGs. Server sends on-disk hits before externally fetched tiles (wire-agnostic ordering; `TilePayload.route_priority` hints along-route order).
4. **ADR-004 boundary is preserved**: only C11/C12 on the operator workstation import gRPC stubs.
**Alternatives considered**:
| Alternative | Rejected because |
|-------------|------------------|
| REST `POST /inventory` + parallel GET | Never implemented in satellite-provider; still N+1 HTTP; no overlap of cached vs in-flight fetch |
| SSE over HTTPS | Weaker typing; both sides are service binaries, not browsers — gRPC + protobuf is the better fit |
| ZeroMQ between products | Poor fit across WAN/NAT; better kept **inside** satellite-provider's fetch workers |
| In-flight streaming to UAV | Violates RESTRICT-SAT-1 / ADR-004; wrong reliability model for the aircraft |
**Consequences**:
- Epic AZ-976 decomposes: AZ-977 (SP gRPC server), AZ-978 (C11 client + C12 wiring), AZ-979 (Jetson benchmark + flip default).
- REST `route_client` + `HttpTileDownloader` remain as fallback until AZ-979 benchmark promotes gRPC.
- Finished C6 is still staged onto the Jetson via USB/rsync before flight — this ADR optimizes operator wait time, not in-air link dependency.
**Evidence**: `_docs/02_document/contracts/c11_tilemanager/tile_provision.proto`, `tile_provision_grpc.md`, `_docs/02_tasks/todo/AZ-976_grpc_tile_provision_epic.md`.