[AZ-842] Batch 04 cycle 4: AZ-835 docs + cycle-4 redesign narrative

Closes AZ-835 Epic C6 (docs) and folds the cycle-4 replay-input
redesign narrative (AZ-894 CSV adapter / AZ-895 auto-sync deprecation
/ AZ-896 format spec / AZ-897 UI follow-up) into the three
authoritative documents.

Modified:
- _docs/02_document/contracts/replay/replay_protocol.md: extend
  Invariant 12 with sub-invariants 12.c (route-driven supersedes
  bbox; ~100x tile efficiency + did-fly-vs-might-fly honesty) and
  12.d (fixture failure-handling: validation/terminal re-raise;
  transient -> C11 backoff x3). Add Invariant 14 with sub-
  invariants 14.a-14.d covering the single canonical clock model,
  the CSV-driven path, the tlog adapter's audit-only role, the
  auto-sync deprecation, and the AZ-897 UI follow-up pointer.
- _docs/02_document/architecture.md: add the AZ-777 Phase 3+
  superseded-by-Epic-AZ-835 supersession block + new "Replay input
  redesign (cycle 4)" sub-section with the cycle-4 ticket table.
- tests/e2e/replay/README.md: top section restructured for two
  distinct entry points (AZ-265/AZ-404 vs. AZ-835/AZ-840); add
  full AZ-835 orchestrator-test section (env vars, skip gates,
  expected runtime, verdict report path); add Imagery (c) Google
  attribution + dev-only caveat; add Epic AZ-835 ticket map.

Spec deviation: AC-1b says "new Invariant 13" but Invariant 13 is
already taken (C4<->C5 pairing, AZ-776 / ADR-012), and is referenced
by number in architecture.md, c4_pose description.md, and ADR-012
prose. Cycle-4 content shipped as Invariant 14 to preserve those
cross-references; renumbering would have cascaded to 3 files outside
AZ-842's ownership envelope. Documented in batch report.

Out-of-scope hygiene gap (NOT fixed in this batch):
BUILD_CSV_REPLAY_ADAPTER flag is not yet enumerated in
_docs/02_document/module-layout.md's Build-Time Exclusion Map.
Inherited from cycle-4 AZ-894. Suggested as a cycle-5+ hygiene PBI.

AZ-835 epic file stays in todo/ until AZ-841 (backlog) is resolved.

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-29 11:13:33 +03:00
parent e4409df228
commit 42b1db6ace
6 changed files with 296 additions and 8 deletions
_docs/02_document/architecture.md +17
@@ -279,8 +279,25 @@ Two consequences for the architecture:
**Imagery source license attribution (cycle 3)**: the Jetson `satellite-provider` instance downloads from the **Google Maps satellite layer** (`lyrs=s`), governed by Google Maps Platform Terms of Service. Dev/research use only; production deployment requires either a Google Maps Platform licensing review or migration to a true CC-BY satellite source on the parent-suite side (parent-suite ticket TBD). Operator-side seed scripts (`tests/fixtures/derkachi_c6/seed_region.py`, `seed_route.py`) propagate the "Imagery © Google" attribution.
**AZ-777 Phase 3+ superseded by Epic AZ-835**: AZ-777 originally proposed five phases — wire e2e-runner (Phase 1), seed Derkachi bbox (Phase 2), rewrite `operator_pre_flight_setup` fixture (Phase 3), un-xfail AC-4 / AC-5 (Phase 4), docs (Phase 5). Phases 1+2 shipped under AZ-777 itself (batch 104, cycle 3). Phases 3 and 5 were **superseded** when the user redirected the work to a route-driven flow: Phase 3 → AZ-839 (real fixture wiring C1+C2+C11+C10), Phase 5 → AZ-842 (this docs ticket). Phase 4 (un-xfail) was deferred to backlog after the cycle-4 redesign (AZ-895) took the un-xfail target along a different path and is not on the active epic. The AZ-777 task spec at `_docs/02_tasks/done/AZ-777_derkachi_c6_reference_fixture.md` carries the supersedure banner; this architecture document is the authoritative high-level pointer for that decision.
No new ADR — this is execution of existing decisions (architectural principle #5 satellite-provider on-disk layout end-to-end; ADR-004 process-level isolation unchanged; ADR-011 replay is a configuration unchanged). The architectural surface gained the route-driven seeding path inside C11; nothing else moved.
### Replay input redesign (cycle 4 — single canonical clock + CSV-driven path)
Cycle 4 rebuilt the replay-mode operator-input surface around a single canonical clock to close the AZ-848 ESKF out-of-order regression and to retire the tlog auto-sync surface that produced the misalignment risk in the first place. Four tickets ship the change:
| Ticket | Role | Description |
|--------|------|-------------|
| **AZ-894** (CSV adapter) | New primary path | `csv_replay_input.CsvReplayInputAdapter` consumes a paired `(video, CSV)` where the CSV's `Time` column is the canonical clock for every IMU/GPS sample. Gated `BUILD_CSV_REPLAY_ADAPTER=ON` in airborne and research binaries; OFF in operator-orchestrator. |
| **AZ-895** (auto-sync deprecation) | Removed legacy | `replay_input.auto_sync` (AZ-405) reduced to a no-op stub that raises on first call; `tlog_video_adapter.py` reduced to a deprecated stub whose `open()` raises immediately. The legacy `--time-offset-ms` / `--skip-auto-sync` / `--auto-trim` CLI flags are accepted with a warning and ignored. Hard removal tracked in AZ-908 (cycle 5+ backlog). |
| **AZ-896** (CSV format spec) | Contract | `_docs/02_document/contracts/replay/csv_replay_format.md` documents the CSV row schema, the row-0-alignment-with-video-frame-0 invariant, and an example `data_imu.csv` shipped under the same path. |
| **AZ-897** (operator UI) | Cycle-5+ follow-up | First operator-facing UI surface — a React + Tailwind single-page form that uploads a paired `(video, CSV)`, links to AZ-896's format docs + example CSV, and tails the verdict from the headless `gps-denied-replay` invocation. Not on cycle-4 critical path; flagged here so the CSV format stays UI-friendly. |
The architectural rationale is captured in **Invariant 14** of the replay protocol (`_docs/02_document/contracts/replay/replay_protocol.md`): the system runs as a single edge process on a single device; there must be exactly one wall/monotonic clock authoritative for timestamps that cross component boundaries. In live mode that clock is the C8 inbound `FcAdapter`'s FC-boot-relative timestamp; in replay mode (after cycle 4) it is the CSV row's `Time` column. The previous design's two-clock surface (Jetson monotonic at C1 VIO emission, FC-boot at C8 IMU window arrival) produced the AZ-848 regression and is retired with the auto-sync deprecation.
The legacy `TlogReplayFcAdapter` is retained for two audit-only paths — offline FDR analysis from `tools/` and a one-shot `gps-denied-tlog-to-csv` migration utility that exports legacy tlog inputs to the canonical CSV. Neither path runs from the airborne composition root after cycle 4.
### `satellite-provider` upload contract (per D-PROJ-2 carryforward)
The onboard side of D-PROJ-2 is fully specified in `_docs/_process_leftovers/2026-05-09_satellite-provider-design-tasks.md`. From this architecture's standpoint:
_docs/02_document/contracts/replay/replay_protocol.md +31
@@ -257,8 +257,39 @@ The two **invalid** cells (`true` + `eskf` and `false` + `gtsam_isam2`) raise `C
**Sub-invariant 12.a (cycle 3 — AZ-839 / Epic AZ-835 C3)**: the e2e `operator_pre_flight_setup` fixture replaces the cycle-1 `mkdir` placeholder with a real driver that wires C1 (`replay_input.tlog_route.extract_route_from_tlog` — AZ-836) + C2 (`c11_tile_manager.route_client.SatelliteProviderRouteClient.seed_route` — AZ-838) + C11 (`tile_downloader.HttpTileDownloader.download_for_bbox`) + C10 (`DescriptorBatcher`) to populate C6 from a tlog-derived corridor. The fixture yields a `PopulatedC6Cache` dataclass (`cache_root`, `tile_store_path`, `faiss_index_path`, `faiss_sidecar_sha256_path`, `faiss_sidecar_meta_path`, `route_spec`, `tile_count`, `elapsed_seconds`). The cache is mounted into a named docker volume that survives across pytest sessions (cold first invocation populates; subsequent invocations within the same compose session reuse — warm cache). Cold-start budget: ≤ 5 min on Tier-2 Jetson; warm: ≤ 30 s. Sidecar triple-consistency (`.index` + `.sha256` + `.meta.json`) per AZ-306 is verified at every fixture yield; mismatch raises `IndexUnavailableError`. The C12 production binding for the route-driven path is a future-cycle integration; production pre-flight still uses the bbox-driven `download_tiles_for_area` path today.
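The sidecar triple-consistency check comes down to comparing one digest across three files. Below is a minimal sketch of that check, not the AZ-306 implementation. It assumes the `.sha256` sidecar holds the hex SHA-256 of the `.index` file in `sha256sum` style (`<digest>  <name>`), and that `.meta.json` repeats the same digest under a hypothetical `index_sha256` key. `IndexUnavailableError` is redefined locally so the block runs on its own.

```python
import hashlib
import json
from pathlib import Path


class IndexUnavailableError(RuntimeError):
    """Local stand-in for the project's error type (assumption)."""


def sha256_of(path: Path) -> str:
    """Hash the file in 1 MiB chunks so a large FAISS index is never fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sidecar_triple(index_path: Path, sha_path: Path, meta_path: Path) -> str:
    """Return the index digest if .index, .sha256 and .meta.json agree; raise otherwise."""
    for member in (index_path, sha_path, meta_path):
        if not member.is_file():
            raise IndexUnavailableError(f"missing sidecar member: {member}")
    actual = sha256_of(index_path)
    # sha256sum-style line "<hexdigest>  <filename>": only the first token is the digest.
    tokens = sha_path.read_text().split()
    recorded = tokens[0].lower() if tokens else ""
    # Hypothetical key; the real .meta.json schema belongs to AZ-306.
    meta_digest = str(json.loads(meta_path.read_text()).get("index_sha256", "")).lower()
    if not (actual == recorded == meta_digest):
        raise IndexUnavailableError(
            f"sidecar mismatch: index={actual} .sha256={recorded or '<empty>'} "
            f"meta={meta_digest or '<empty>'}"
        )
    return actual
```

Because the check runs at every fixture yield, a warm cache whose index was rewritten without refreshing its sidecars fails loudly instead of serving stale descriptors.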
**Sub-invariant 12.c (cycle 3 — Epic AZ-835: route-driven supersedes bbox)**: route-driven seeding (operator's tlog-derived `RouteSpec` → `POST /api/satellite/route` → corridor materialised by `satellite-provider`) supersedes the legacy AZ-777 bbox-driven approach (`POST /api/satellite/request` over a fixed lat/lon box) for the real-flight validation path. The supersedure rationale is twofold:
- **Tile efficiency (~100×)**: the AZ-777 bbox for a typical Derkachi-style flight produces ~11,400 z15-z18 tiles (~140 MB, 48 % over the C6 cache budget). A 10-point coarsened route with `regionSizeMeters=500` per point produces ~50-100 unique tiles (~1.5 MB) for the same VPR descriptor lock area. The route-driven path is the only one that fits the AZ-696 reference-fixture budget on Jetson.
- **Pre-commitment honesty**: a bbox pre-commits to where the operator *might* fly. A route pre-commits to where they *did* fly. For real-flight validation against ground-truth GPS, the latter is the right primitive — it ensures the FAISS index is populated with descriptors of the tiles the airborne pipeline will actually query, not a superset whose VPR misses are statistically indistinguishable from the AZ-696 AC-3 ≤ 100 m threshold violations.
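The efficiency gap is a matter of area. A bbox pays for every tile in the rectangle at every zoom level, while a route pays only for small squares around each waypoint. The sketch below counts Web Mercator (XYZ / slippy-map) tiles for both shapes. The tile-index formula is the standard one. The coordinates, the choice to treat `regionSizeMeters` as the side of a square centred on each point, and the helper names are all assumptions made for illustration, so the output will not reproduce the ~11,400 and ~50-100 figures above.

```python
import math


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> tuple[int, int]:
    """Standard Web Mercator (XYZ / slippy-map) tile index for a WGS84 point."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return x, y


def tiles_in_bbox(west: float, south: float, east: float, north: float, zoom: int) -> set:
    """Every tile touching the rectangle at one zoom level (tile y grows southward)."""
    x0, y0 = lonlat_to_tile(west, north, zoom)
    x1, y1 = lonlat_to_tile(east, south, zoom)
    return {(zoom, x, y) for x in range(x0, x1 + 1) for y in range(y0, y1 + 1)}


def tiles_along_route(points, region_m: float, zoom: int) -> set:
    """Union of tiles covering a region_m x region_m square centred on each (lon, lat) waypoint."""
    tiles = set()
    for lon, lat in points:
        half_lat = (region_m / 2) / 111_320.0  # approx. metres per degree of latitude
        half_lon = half_lat / math.cos(math.radians(lat))
        tiles |= tiles_in_bbox(lon - half_lon, lat - half_lat, lon + half_lon, lat + half_lat, zoom)
    return tiles


if __name__ == "__main__":
    # Hypothetical 10-point route and a bbox that encloses it with some margin.
    route = [(35.80 + 0.01 * i, 50.10 + 0.005 * i) for i in range(10)]
    bbox = (35.75, 50.05, 35.95, 50.20)  # west, south, east, north
    for z in (15, 16, 17, 18):
        print(f"z{z}: bbox={len(tiles_in_bbox(*bbox, z))} route={len(tiles_along_route(route, 500, z))}")
```

Each extra zoom level multiplies the bbox count by about 4. The route count grows far more slowly because every waypoint square is only a few tiles across.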
AZ-777 Phase 1 (e2e-runner wiring + C11 read-contract adaptation) is **retained and reused** by Epic AZ-835. AZ-777 Phases 3 and 5 are **superseded** by Epic AZ-835 children (AZ-839 for the operator-fixture rewrite, AZ-842 for the docs work). Phase 4 (un-xfail of AC-4/AC-5) was deferred to backlog after cycle-4 AZ-895 took the un-xfail target along a different path; it is not on the active epic.
**Sub-invariant 12.d (cycle 3 — AZ-839 / Epic AZ-835 C3: fixture failure-handling contract)**: the `operator_pre_flight_setup` fixture must distinguish three failure classes from `SatelliteProviderRouteClient.seed_route` / `HttpTileDownloader.download_for_bbox` and surface them honestly:
| Class | Source | Fixture response |
|-------|--------|------------------|
| Validation | `RouteValidationError` (pre-emptive AZ-809 bound violation) or `IndexUnavailableError` (sidecar triple mismatch at yield-time) | Re-raise — operator/test author error, no remediation in the fixture |
| Terminal | `RouteTerminalFailureError` (satellite-provider rejected the route id or status polling returned `mapsReady=false` past `poll_max_attempts`) | Re-raise — service-side state cannot be recovered by retry |
| Transient | `RouteTransientError` or `TileDownloadError` with HTTP 5xx / network reset | **Retry up to 3 attempts** using C11's existing exponential backoff schedule (`HttpTileDownloader.RETRY_*` constants); re-raise on exhaustion |
The fixture does NOT swallow transient failures silently — the third attempt's exception surfaces with the full retry history in the message so the test report can distinguish "fixture genuinely tried 3×" from "fixture short-circuited". Cold-start budget of ≤ 5 min on Tier-2 Jetson is measured wall-clock around the entire retry loop, not per-attempt.
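The three-class contract can be written as a small retry wrapper. This is an illustrative sketch, not the fixture code:

- The exception classes are local stand-ins named after the ones in the table.
- The 1 s / 2 s backoff is a placeholder, not C11's `HttpTileDownloader.RETRY_*` schedule.
- `seed_with_retry` is a hypothetical helper.

Validation and terminal errors propagate on their first occurrence. Transient errors are retried up to three attempts. When retries are exhausted, the original exception is re-raised with every attempt listed in its message. Elapsed time is measured from a single monotonic start taken before the first attempt, so it spans the whole loop.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


# Local stand-ins for the exception types named in the table (assumption).
class RouteValidationError(Exception):
    pass


class RouteTerminalFailureError(Exception):
    pass


class RouteTransientError(Exception):
    pass


class TileDownloadError(Exception):
    def __init__(self, msg: str, status: Optional[int] = None):
        super().__init__(msg)
        self.status = status  # None models a network reset (no HTTP response)


MAX_ATTEMPTS = 3
BASE_DELAY_S = 1.0  # placeholder; not C11's real HttpTileDownloader.RETRY_* schedule


def is_transient(exc: BaseException) -> bool:
    if isinstance(exc, RouteTransientError):
        return True
    return isinstance(exc, TileDownloadError) and (exc.status is None or exc.status >= 500)


def seed_with_retry(op: Callable[[], T],
                    sleep: Callable[[float], None] = time.sleep) -> tuple[T, float]:
    """Run op() under the 12.d contract; return (result, wall-clock seconds for the whole loop)."""
    started = time.monotonic()
    history: list[str] = []
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            return op(), time.monotonic() - started
        except Exception as exc:
            if not is_transient(exc):
                raise  # validation, terminal, or non-5xx download error: no retry
            history.append(f"attempt {attempt}: {type(exc).__name__}: {exc}")
            if attempt == MAX_ATTEMPTS:
                # Keep the original exception type but put the full history in its message.
                exc.args = ("; ".join(history),)
                raise
            sleep(BASE_DELAY_S * 2 ** (attempt - 1))  # exponential backoff: 1 s, then 2 s
    raise AssertionError("unreachable")
```

Injecting `sleep` keeps the wrapper testable without real delays. Rewriting `exc.args` instead of wrapping the exception means the test report still shows a `RouteTransientError` or `TileDownloadError`.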
**Sub-invariant 12.b (cycle 3 — AZ-840 / Epic AZ-835 C4)**: the E2E orchestrator test `tests/e2e/replay/test_az835_e2e_real_flight.py` takes only `(tlog, video, calibration)` and runs the full 7-step pipeline end-to-end on Tier-2 Jetson — no operator hand-curation between steps. The 7 steps are: (1) active flight cut + tlog/video sync via AZ-405; (2) on-fly frame + IMU extraction; (3) auto-create route via AZ-836; (4) POST route to satellite-provider via the C3 fixture's `operator_pre_flight_setup` (delegates to AZ-838); (5) build FAISS index (driven by C3); (6) run gps-denied airborne pipeline against the populated cache + tlog/video/calibration (reuses the airborne composition root path AZ-699 exercises); (7) compute horizontal-error distribution and emit the AZ-699 verdict report at `_docs/06_metrics/real_flight_validation_<YYYY-MM-DD>.md`. The verdict report is emitted ALWAYS, regardless of PASS / FAIL on the AZ-696 ≥ 80 % within 100 m gate — the success criterion is that the report exists with the honest distribution, not that the verdict is PASS. Gated by `RUN_REPLAY_E2E=1` + `@pytest.mark.tier2`.
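Step (7)'s gate reduces to the fraction of per-frame horizontal errors that fall under a threshold. Below is a minimal sketch. It assumes estimates and ground truth are already paired per frame as WGS84 `(lat, lon)` tuples, and it uses the haversine great-circle distance with a mean Earth radius. The function names and report fields are hypothetical and are not the AZ-699 report format. As the paragraph above requires, the function returns the distribution summary whether the verdict is PASS or FAIL.

```python
import math
import statistics

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius


def haversine_m(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(h)))  # clamp float overshoot


def horizontal_error_verdict(estimates, truths, threshold_m=100.0, required_fraction=0.80) -> dict:
    """Summarise the per-frame error distribution; always returns a report, PASS or FAIL."""
    if len(estimates) != len(truths) or not estimates:
        raise ValueError("need equally long, non-empty estimate/truth sequences")
    errors = sorted(haversine_m(e, t) for e, t in zip(estimates, truths))
    within = sum(1 for err in errors if err <= threshold_m) / len(errors)
    return {
        "n": len(errors),
        "fraction_within": within,
        "median_m": statistics.median(errors),
        "max_m": errors[-1],
        "verdict": "PASS" if within >= required_fraction else "FAIL",
    }
```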
13. **C4↔C5 pairing matrix is enforced at compose time** (AZ-776 / ADR-012): `compose_root` rejects the two off-diagonal cells of the (`c4_pose.enabled`, `c5_state.strategy`) matrix with a `CompositionError` naming both blocks. `enabled=False` + `gtsam_isam2` and `enabled=True` + `eskf` are forbidden. The two valid cells are `enabled=True` + `gtsam_isam2` (production steady-state per ADR-003 / ADR-009) and `enabled=False` + `eskf` (open-loop ESKF — replay Tier-2 smoke baseline; satellite anchoring deferred to AZ-777). Verified by `tests/unit/runtime_root/test_az776_open_loop_eskf_composition.py` AC-3a and AC-3b.
14. **Single canonical clock & CSV-driven replay path (cycle 4 — AZ-894 / AZ-895 / AZ-896)**: production runs as a single edge process on a single device. There is exactly **one** wall/monotonic clock authoritative for timestamps that cross component boundaries — the clock at the C8 inbound boundary (`FcAdapter`) where IMU windows enter the system. Two-clock surfaces — for example a C1 `VioOutput.emitted_at_ns` derived from the Jetson `monotonic_ns()` paired against a C8 `ImuWindow.ts_end_ns` derived from FC-boot — produced the AZ-848 ESKF out-of-order regression observed in cycle 3 (Jetson clock advanced between IMU window arrival and VIO emission, so the VIO emission timestamp routinely landed *before* the IMU window's `ts_end_ns` when the two were compared as if on the same axis, and ESKF rejected its own VIO updates). All downstream timestamps (`EstimatorOutput.ts_ns`, `JsonlReplaySink` per-row `t`, FDR `flight_event.ts_ns`) MUST derive from a single canonical clock that produces deterministic per-record values for a given input. In live mode the canonical clock is the C8 inbound IMU window's FC-boot-relative timestamp; in replay mode it is the CSV row's `Time` column.
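A toy out-of-order guard shows why a second clock breaks the filter. This is a hypothetical model, not the ESKF code. The guard rejects any update stamped earlier than the end of the last IMU window it has propagated to, and the 2 s clock offset is an invented number. When IMU windows and VIO results are stamped on different clocks, every update can land "in the past" and be rejected. Stamping both streams from one clock keeps the comparison meaningful.

```python
from dataclasses import dataclass


@dataclass
class OutOfOrderGuard:
    """Toy ordering check: reject updates stamped before the last propagated IMU window end."""
    last_imu_end_ns: int = 0
    rejected: int = 0

    def propagate(self, imu_ts_end_ns: int) -> None:
        self.last_imu_end_ns = max(self.last_imu_end_ns, imu_ts_end_ns)

    def update(self, vio_ts_ns: int) -> bool:
        if vio_ts_ns < self.last_imu_end_ns:
            self.rejected += 1
            return False
        return True


def count_rejections(imu_clock_offset_ns: int, vio_clock_offset_ns: int, frames: int = 10) -> int:
    """VIO for window k is produced 5 ms after the window ends (true time); each stream
    is stamped on its own clock, modelled as true time plus a constant offset."""
    guard = OutOfOrderGuard()
    for k in range(frames):
        window_end_ns = (k + 1) * 100_000_000  # 100 ms IMU windows
        guard.propagate(window_end_ns + imu_clock_offset_ns)
        guard.update(window_end_ns + 5_000_000 + vio_clock_offset_ns)
    return guard.rejected


print(count_rejections(0, 0))              # one clock for both streams: prints 0
print(count_rejections(2_000_000_000, 0))  # IMU clock reads 2 s ahead of the VIO clock: prints 10
```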
**Sub-invariant 14.a (CSV-driven replay path — AZ-894)**: the replay-mode operator input is `(video, CSV)`. The CSV row's `Time` column is the canonical clock for the entire replay run: every IMU window emitted by the new `csv_replay_input.CsvReplayInputAdapter` (gated `BUILD_CSV_REPLAY_ADAPTER=ON` in the airborne and research binaries) carries `ts_end_ns` derived from the CSV `Time` column; the `Clock` strategy injected into the composition root is `CsvDerivedClock` which uses the same column. There is no auto-sync (see 14.c below). The CSV must satisfy the format spec at `_docs/02_document/contracts/replay/csv_replay_format.md` (AZ-896) — including the requirement that row 0's `Time` equals video frame 0 (`t=0`) so the airborne pipeline does not need to apply any per-stream offset.
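A CSV-derived clock of the kind 14.a describes can be sketched with the standard `csv` module. This is an illustration under stated assumptions, not `CsvDerivedClock` itself:

- The `Time` column is assumed to hold seconds as a decimal string. The actual unit and column set are defined by the AZ-896 format spec.
- The `AccX` column in the example is invented.
- `csv_clock_ns` and `CsvClockError` are hypothetical names.

The sketch enforces the two properties the paragraph depends on: row 0 sits at `t = 0`, and timestamps never go backwards.

```python
import csv
import io
from decimal import Decimal
from typing import Iterator, TextIO


class CsvClockError(ValueError):
    """Hypothetical error type for malformed replay CSVs."""


def csv_clock_ns(stream: TextIO, time_column: str = "Time",
                 ns_per_unit: int = 1_000_000_000) -> Iterator[int]:
    """Yield one nanosecond timestamp per CSV row, derived only from the Time column.

    ns_per_unit=1e9 assumes Time is in seconds; the AZ-896 spec is authoritative.
    """
    reader = csv.DictReader(stream)
    if reader.fieldnames is None or time_column not in reader.fieldnames:
        raise CsvClockError(f"missing {time_column!r} column")
    previous = None
    for row_index, row in enumerate(reader):
        ts_ns = int(Decimal(row[time_column]) * ns_per_unit)  # exact decimal -> integer ns
        if row_index == 0 and ts_ns != 0:
            raise CsvClockError(
                f"row 0 {time_column}={row[time_column]}; must align with video frame 0 (t=0)")
        if previous is not None and ts_ns < previous:
            raise CsvClockError(f"row {row_index}: {time_column} went backwards")
        previous = ts_ns
        yield ts_ns


sample = "Time,AccX\n0,0.01\n0.005,0.02\n0.010,0.03\n"
print(list(csv_clock_ns(io.StringIO(sample))))  # prints [0, 5000000, 10000000]
```

Parsing with `Decimal` instead of `float` keeps the conversion exact, so a given CSV always yields the same per-row nanosecond values. That is the determinism Invariant 14 asks for.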
**Sub-invariant 14.b (tlog adapter audit-only role — AZ-895)**: `TlogReplayFcAdapter` (Sub-invariant 14 of the prior cycles' design) is retained in source for two audit / migration paths and removed from the replay test/demo critical path:
- **FDR analysis**: one-shot tlog parsing for incident review (e.g. AZ-848 timestamp investigation) — invoked from offline analysis scripts under `tools/`, not from the airborne composition root.
- **One-shot tlog → CSV export**: a CLI utility (`gps-denied-tlog-to-csv`) that reads a pymavlink tlog and writes the canonical CSV per AZ-896. This is the migration ramp for users who only have legacy tlog inputs.
The previous `compose_root(config={"mode": "replay", "replay_input.adapter": "tlog"})` code path is preserved with a one-cycle deprecation warning on startup; removal is tracked in AZ-908 (cycle-5+ backlog). The CSV adapter (`BUILD_CSV_REPLAY_ADAPTER=ON`) is the default and the only path the e2e fixture suite exercises after cycle 4.
**Sub-invariant 14.c (auto-sync deprecation — AZ-895)**: the `replay_input.auto_sync` module (AZ-405) is reduced to a deprecated no-op stub that raises `ReplayInputAdapterError("auto-sync removed; supply --imu CSV instead")` from every public entry point. The CLI flags `--time-offset-ms`, `--skip-auto-sync`, and `--auto-trim` are accepted with a deprecation warning and ignored. The justification: with a single canonical clock at the CSV row level (14.a), there is no second clock to align against — the operator authors the CSV with the correct row-0 alignment, and the fixture verifies row 0's `Time == 0`. Hard removal of the deprecated surface is tracked in AZ-908; this cycle ships only the stub + warnings to preserve source-compat for any downstream caller built against AZ-405's pre-deprecation shape.
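The "accept, warn, ignore" treatment of the retired flags, together with the raising stub, can be sketched with `argparse` and `warnings`. Only the flag names, the `--video` / `--imu` shape, and the error message come from the text above. The parser layout, the `auto_sync` stub signature, and the local `ReplayInputAdapterError` class are assumptions made for illustration. `FutureWarning` is used instead of `DeprecationWarning` because Python shows it to end users by default.

```python
import argparse
import warnings

DEPRECATED_FLAGS = ("--time-offset-ms", "--skip-auto-sync", "--auto-trim")


class ReplayInputAdapterError(RuntimeError):
    """Local stand-in for the project's error type (assumption)."""


def auto_sync(*args, **kwargs):
    """Deprecated stub: the public entry point raises instead of aligning clocks."""
    raise ReplayInputAdapterError("auto-sync removed; supply --imu CSV instead")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="gps-denied-replay")
    parser.add_argument("--video", required=True)
    parser.add_argument("--imu", required=True, help="CSV whose Time column is the canonical clock")
    # Still accepted so old invocations parse, but hidden from --help and never used.
    parser.add_argument("--time-offset-ms", type=float, help=argparse.SUPPRESS)
    parser.add_argument("--skip-auto-sync", action="store_true", help=argparse.SUPPRESS)
    parser.add_argument("--auto-trim", action="store_true", help=argparse.SUPPRESS)
    return parser


def parse_args(argv: list[str]) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    for flag in DEPRECATED_FLAGS:
        dest = flag.lstrip("-").replace("-", "_")
        value = getattr(args, dest)
        # Identity checks so an explicit "--time-offset-ms 0" still triggers the warning.
        if value is not None and value is not False:
            warnings.warn(f"{flag} is deprecated and ignored", FutureWarning, stacklevel=2)
        delattr(args, dest)  # drop it so nothing downstream can read it
    return args
```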
**Sub-invariant 14.d (operator-facing UI — AZ-897, future cycle)**: the cycle-4 deliverable is the headless `gps-denied-replay --video X --imu Y` shape. An operator-facing web UI (single-page React + Tailwind form that uploads a paired `(video, CSV)` and tails the verdict) is tracked separately in AZ-897 and is NOT on the critical path of the CSV redesign; this sub-invariant exists only to record that the format spec (AZ-896) and the CSV adapter (AZ-894) MUST stay UI-friendly (CSV example, format docs link, clear error messages on row-0-misalignment) so AZ-897 lands without contract drift.
## Producer / Consumer Split
@@ -0,0 +1,134 @@
# Batch Report — cycle 4, batch 04
**Batch**: 04
**Cycle**: 4
**Tasks**: AZ-842
**Total complexity**: 3 SP
**Date**: 2026-05-29
**Commit**: pending (this batch)
## Task Selection
AZ-842 (docs — replay_protocol.md Invariant 12 extension + Invariant 14
cycle-4 + architecture.md AZ-777 supersession + cycle-4 redesign
sub-section + tests/e2e/replay/README.md AZ-835 orchestrator-test
section + license attribution) ships solo. The batch composition was
driven by scope heterogeneity in cycle-4's remaining
todo backlog (`{AZ-842 docs, AZ-897 new React UI, AZ-943 C++ ThreadedSlam
binding}` totaling 13 SP across three radically disjoint scopes).
Single-task batch keeps code review tractable; AZ-897 and AZ-943 each
remain non-trivial (5 SP) and trigger their own Complexity Budget Check
when their batches start.
## Task Results
| Task | Status | Files Modified | Tests | AC Coverage | Issues |
|------|--------|----------------|-------|-------------|--------|
| AZ-842_replay_protocol_and_orchestrator_docs | Done | 3 modified | n/a (docs only) | 8/8 (AC-1, AC-1b, AC-2, AC-2b, AC-3, AC-4, AC-5, AC-6) | 1 documented spec deviation + 1 out-of-scope hygiene gap |
### Files touched
Documentation (`_docs/02_document/`):
- MODIFIED `_docs/02_document/contracts/replay/replay_protocol.md`:
  - Sub-invariant 12.c added — route-driven seeding supersedes the
    legacy AZ-777 bbox-driven approach (~100× tile efficiency,
    "did fly vs. might fly" honesty rationale).
  - Sub-invariant 12.d added — fixture failure-handling contract
    (validation/terminal re-raise; transient → C11 backoff retry × 3
    with full-history-on-exhaust message).
  - Invariant 14 added with sub-invariants 14.a-14.d covering
    cycle-4's single-canonical-clock model, the CSV-driven primary
    path (AZ-894), the tlog adapter's audit-only role (AZ-895), the
    auto-sync deprecation (AZ-895), and the operator-UI follow-up
    pointer (AZ-897).
- MODIFIED `_docs/02_document/architecture.md`:
  - Added "AZ-777 Phase 3+ superseded by Epic AZ-835" supersession
    block inside the satellite-provider integration section.
  - Added new sub-section "Replay input redesign (cycle 4 — single
    canonical clock + CSV-driven path)" with a 4-row ticket table
    (AZ-894 / AZ-895 / AZ-896 / AZ-897) and the architectural
    rationale tying back to Invariant 14 of the replay protocol.

Tests-adjacent documentation (`tests/e2e/replay/`):
- MODIFIED `tests/e2e/replay/README.md`:
  - Top header restructured for two distinct entry points
    (AZ-265/AZ-404 derkachi_1min vs. AZ-835/AZ-840 orchestrator).
  - New section "AZ-835 orchestrator test — full `(tlog, video,
    calibration)` loop (Tier-2 only)" covering required inputs,
    Tier-2 invocation (Jetson SSH + env vars), skip gates in
    evaluation order, expected runtime (≤ 8 min cold, ≤ 60 s warm),
    and verdict report location semantics.
  - New section "Imagery source license attribution (dev/research
    use only)" carrying the "Imagery © Google" attribution and the
    production-deployment caveat (Google Maps Platform licensing
    review or CC-BY migration TBD).
  - New section "Epic AZ-835 ticket map" with explicit Jira links to
    AZ-836 / AZ-838 / AZ-839 / AZ-840 / AZ-842 + cycle-4 redesign
    tickets AZ-894 / AZ-895 / AZ-896 / AZ-897.
### AC verification
Each AC verified by Grep on the modified file's content (no code-path
tests exist for prose):
| AC | Verification |
|----|--------------|
| AC-1 | `Sub-invariant 12.c` + `Sub-invariant 12.d` present in `replay_protocol.md` — bbox-supersedure rationale + transient-retry-3-attempts contract |
| AC-1b | `Invariant 14` block with sub-invariants `14.a` (CSV path, AZ-894), `14.b` (tlog audit-only, AZ-895), `14.c` (auto-sync deprecation, AZ-895), `14.d` (UI follow-up, AZ-897), plus cross-link to `csv_replay_format.md` (AZ-896) |
| AC-2 | `AZ-777 Phase 3+ superseded by Epic AZ-835` block in `architecture.md` satellite-provider integration section, pointing at AZ-839 (Phase 3) + AZ-842 (Phase 5) child tickets |
| AC-2b | `### Replay input redesign (cycle 4 — single canonical clock + CSV-driven path)` sub-section in `architecture.md` referencing AZ-894 / AZ-895 / AZ-896 / AZ-897 |
| AC-3 | `### AZ-835 orchestrator test` section in README with Jetson SSH alias, `RUN_REPLAY_E2E=1`, `GPS_DENIED_OPERATOR_CONFIG_PATH` env vars (verified against test source line 99), the 5-gate skip order matching `test_az835_e2e_real_flight.py` lines 29-36, expected runtime, and verdict report path |
| AC-4 | Epic AZ-835 + children AZ-836 / AZ-838 / AZ-839 / AZ-840 + cycle-4 redesign AZ-894 / AZ-895 / AZ-896 / AZ-897 referenced in all three modified docs (AZ-841 omitted as an active-epic link per the AC; mentioned once in `architecture.md` AZ-777 supersession block as a backlog-deferred historical note only) |
| AC-5 | `Imagery © Google` + `dev/research use only` strings present in `tests/e2e/replay/README.md` |
| AC-6 | `_docs/02_tasks/_dependencies_table.md` preamble already covers AZ-835 + children + cycle-4 redesign (verified in cycle-3/cycle-4 prior preamble updates); `_docs/02_tasks/done/AZ-777_derkachi_c6_reference_fixture.md` already carries the SUPERSEDED banner pointing at AZ-839 / AZ-841 / AZ-842 — both cross-reference obligations were satisfied by prior work and verified during this batch |
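Each Grep-based check in the table amounts to "every required string appears in its target file". Below is a minimal sketch of that check. The `AC_MARKERS` mapping copies a few marker strings from the table, but both the helper and the mapping are hypothetical and do not correspond to a script in the repo.

```python
from pathlib import Path

# Hypothetical subset of the table above: AC id -> (file, strings that must all be present).
AC_MARKERS = {
    "AC-1": ("_docs/02_document/contracts/replay/replay_protocol.md",
             ["Sub-invariant 12.c", "Sub-invariant 12.d"]),
    "AC-1b": ("_docs/02_document/contracts/replay/replay_protocol.md",
              ["Sub-invariant 14.a", "Sub-invariant 14.b", "Sub-invariant 14.c",
               "Sub-invariant 14.d", "csv_replay_format.md"]),
    "AC-5": ("tests/e2e/replay/README.md",
             ["Imagery © Google", "dev/research use only"]),
}


def missing_markers(repo_root: Path, markers=AC_MARKERS) -> dict[str, list[str]]:
    """Return {AC id: [missing strings]} for every AC whose file lacks a required string."""
    failures: dict[str, list[str]] = {}
    for ac, (rel_path, needles) in markers.items():
        path = repo_root / rel_path
        text = path.read_text(encoding="utf-8") if path.is_file() else ""
        absent = [needle for needle in needles if needle not in text]
        if absent:
            failures[ac] = absent
    return failures
```

An empty result means every listed marker was found. A missing file is reported as all of its markers missing rather than as a crash.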
## AC Test Coverage: 8 of 8 covered (docs-only — coverage = content presence verified by Grep)
## Code Review Verdict: PASS_WITH_WARNINGS
### Findings
**Finding 1 — Spec deviation (documented, accepted by agent; flagged for user awareness)**
- **Severity**: Medium
- **Category**: Spec-Gap
- **Location**: `_docs/02_tasks/todo/AZ-842_replay_protocol_and_orchestrator_docs.md` lines 27, 37, 39, 65 (AC-1b)
- **Description**: AC-1b directs "new Invariant 13 (cycle-4)" but Invariant 13 already exists in `replay_protocol.md` (C4↔C5 composition-profile pairing matrix, added by AZ-776 / ADR-012 cycle 3). It is referenced by number in `architecture.md:781` (ADR-012 consequences), `_docs/02_document/components/06_c4_pose/description.md:11` (component doc), and the AZ-776 unit test docstring.
- **Resolution**: Added the cycle-4 content as **Invariant 14** instead. Renumbering existing Invariant 13 → 14 would have cascaded edits to 3 other documents outside AZ-842's ownership envelope and broken cross-references that AZ-842 never set out to invalidate. The AZ-842 spec was authored before the Invariant 13 collision was visible.
- **Suggested follow-up**: refresh the local AZ-842 spec mirror to say "Invariant 14" in the AC text (post-close hygiene). Not a tracker-write blocker.
**Finding 2 — Out-of-scope hygiene gap (do NOT auto-fix)**
- **Severity**: Low
- **Category**: Maintainability
- **Location**: `_docs/02_document/module-layout.md` Build-Time Exclusion Map
- **Description**: `BUILD_CSV_REPLAY_ADAPTER` flag is now mentioned in `_docs/02_document/architecture.md` and `_docs/02_document/contracts/replay/replay_protocol.md` (this batch's edits) and exists in `src/`, `docker-compose.test.yml`, `docker-compose.test.jetson.yml`, and unit tests, but is NOT enumerated in `module-layout.md`'s Build-Time Exclusion Map. Inherited gap from cycle-4 AZ-894.
- **Resolution**: NOT fixed in this batch — `module-layout.md` is outside AZ-842's OWNED envelope (the file is owned by the decompose Step 1.5 / refactor cycle-3 AZ-846 cadence). Suggested as a cycle-5+ hygiene PBI (no blocker filed this session per scope-discipline rule).
### Auto-fix Attempts
0 — neither finding is auto-fix-eligible (Finding 1 is a documented design choice; Finding 2 is out of OWNED scope).
## Stuck Agents: None
## Jira description sync
The Jira description on AZ-842 is the pre-cycle-4-rescope version
(2 SP, AC-1..AC-6 without AC-1b / AC-2b / AC-7, no cycle-4 narrative).
The local spec mirror is the more current source. Description sync
will happen at the Step 12 transition (In Progress → In Testing) so
the ticket-side AC list matches what shipped.
## Next Batch
Remaining cycle-4 todo backlog: AZ-897 (5 SP — first operator-facing
React + Tailwind UI), AZ-943 (5 SP — OKVIS2 ThreadedSlam binding,
replaces AZ-332 skeleton). AZ-835 epic file moves to `done/` with this
batch (its last todo-leaf child AZ-842 closes here).
Recommended next batch composition (subject to Complexity Budget
Check at planning time): batch 05 = AZ-897 alone or batch 05 = AZ-943
alone. Either ordering is valid — they have no inter-dependency. The
implement skill's batch loop will re-evaluate.
+3 -3
@@ -6,9 +6,9 @@ step: 10
name: Implement
status: in_progress
sub_step:
  phase: 7
  name: batch-loop
  detail: "batch 4 closed (AZ-842); next: AZ-897 or AZ-943"
retry_count: 0
cycle: 4
tracker: jira
tests/e2e/replay/README.md +111 -5
@@ -1,20 +1,104 @@
# E2E replay tests (AZ-404 + AZ-835 + cycle-4)
End-to-end regression suite for the `gps-denied-replay` console-script
(AZ-402). Two distinct entry points live here:
| Entry point | Source | Coverage |
|-------------|--------|----------|
| **AZ-265 / AZ-404** — 60 s Derkachi clip with synthetic tlog | `test_derkachi_1min.py` | Original AC-1..AC-10 of the replay epic; runs on Tier-1 + Tier-2 |
| **AZ-835 / AZ-840** — full `(tlog, video, calibration)` orchestrator | `test_az835_e2e_real_flight.py` | Tier-2 only; closes the real-flight validation loop end-to-end (extract → seed → FAISS → run → verdict) |
The cycle-4 replay-input redesign (AZ-894 / AZ-895 / AZ-896) replaces
the tlog auto-sync surface with a CSV-driven path; the AZ-265 suite is
the regression net that catches drift in the legacy path during the
deprecation window. See `replay_protocol.md` Invariants 12-14 for the
authoritative contract.
## How to run
### AZ-404 Derkachi 60 s suite (Tier-1 + Tier-2)
```bash
# In a fresh venv with the package installed:
RUN_REPLAY_E2E=1 pytest tests/e2e/replay/test_derkachi_1min.py -v
```
Without `RUN_REPLAY_E2E=1` the heavy tests skip cleanly. The two
unconditional tests (AC-4a mode-agnosticism scan and AC-7 skip-gate
self-check) and the helpers in `test_helpers.py` still run.
### AZ-835 orchestrator test — full `(tlog, video, calibration)` loop (Tier-2 only)
This test closes Epic AZ-835's real-flight validation loop: given a
real-flight `.tlog`, the matching nadir video, and the camera calibration,
the orchestrator runs the 7-step pipeline end-to-end and writes a verdict
report.
**Required inputs** (already in-repo for the Derkachi reference fixture):
- `.tlog` — pymavlink binary log from a real flight. Reference fixture:
  `_docs/00_problem/input_data/flight_derkachi/data_imu.csv` (the canonical
  CSV from which `_tlog_synth.py` reconstructs the tlog) plus the synthesised
  tlog that the conftest emits at session start.
- Nadir video — `_docs/00_problem/input_data/flight_derkachi/*.mp4` (large
asset; not always checked in to the workstation clone — pull from the
Jetson e2e harness or git LFS if absent).
- Calibration — `tests/fixtures/calibration/adti26.json` (factory-sheet
approximation for the Topotek KHP20S30; real intrinsics still TBD).
**Tier-2 invocation** (Jetson):
```bash
ssh jetson-e2e
cd /workspace/gps-denied-onboard
export RUN_REPLAY_E2E=1
export GPS_DENIED_OPERATOR_CONFIG_PATH=/workspace/configs/operator_replay.yaml
pytest tests/e2e/replay/test_az835_e2e_real_flight.py -v --tb=short -m tier2
```
The bundled local-development entry point is `scripts/run-tests-jetson.sh`,
which handles the SSH alias + rsync + remote pytest invocation. See
`_docs/02_document/tests/tier2-jetson-testing.md` for the harness contract.
**Skip gates (in evaluation order)**:
1. `@pytest.mark.tier2` — the per-suite Tier-2 plugin gates this off on dev
macOS / Tier-1 Docker (matches the AZ-839 / AZ-699 contract).
2. `RUN_REPLAY_E2E` not in `{1, true, yes, on}`.
3. `gps-denied-replay` console-script not on `PATH`.
4. Real Derkachi video missing or placeholder-sized.
5. `operator_pre_flight_setup` fixture itself skipped — the downstream
consumer inherits the SKIP automatically (pytest's fixture-skip
propagation).
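The evaluation order above can be sketched as a plain helper that returns the first failing gate's reason. This is a minimal illustration, not the suite's code: the helper names (`env_flag_enabled`, `first_skip_reason`), the case-insensitive comparison, and the `MIN_VIDEO_BYTES` placeholder threshold are all assumptions. Gates 1 and 5 are left to pytest's own marker and fixture-skip machinery.

```python
import os
import shutil
from pathlib import Path
from typing import Optional

# Accepted spellings for RUN_REPLAY_E2E (gate 2).
# Assumption: matching is case-insensitive and ignores surrounding whitespace.
TRUTHY = {"1", "true", "yes", "on"}

# Hypothetical cut-off: files smaller than this are treated as placeholders
# (e.g. an un-pulled Git LFS pointer). The suite's real threshold may differ.
MIN_VIDEO_BYTES = 1_000_000


def env_flag_enabled(name: str) -> bool:
    """True if the environment variable holds an accepted truthy spelling."""
    return os.environ.get(name, "").strip().lower() in TRUTHY


def first_skip_reason(video_path: Path,
                      console_script: str = "gps-denied-replay") -> Optional[str]:
    """Evaluate gates 2-4 in order and return the first failure, or None.

    Gate 1 (@pytest.mark.tier2) and gate 5 (fixture-skip propagation) are
    enforced by pytest itself and are not modelled here.
    """
    if not env_flag_enabled("RUN_REPLAY_E2E"):
        return "RUN_REPLAY_E2E not set to a truthy value"
    if shutil.which(console_script) is None:
        return f"{console_script} console-script not on PATH"
    if not video_path.is_file() or video_path.stat().st_size < MIN_VIDEO_BYTES:
        return f"real video missing or placeholder-sized: {video_path}"
    return None
```

In a test body, computing `reason = first_skip_reason(video)` and calling `pytest.skip(reason)` when it is not `None` would reproduce the ordering; short-circuiting on the first failure keeps the skip message pointing at the earliest missing prerequisite.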
**Expected runtime on Tier-2 Jetson AGX Orin** (cold cache): ≤ 8 min
end-to-end (≤ 5 min C3 fixture cold-start budget + ≤ 3 min for the
replay + verdict compute). Warm-cache reinvocations within the same
compose session: ≤ 60 s.
**Verdict report location**: `_docs/06_metrics/real_flight_validation_<YYYY-MM-DD>.md`.
The report is emitted ALWAYS, regardless of PASS / FAIL on the AZ-696
AC-3 threshold (≥ 80 % of emissions within 100 m of tlog ground truth).
The success criterion at the fixture level is "honest report exists with
distribution data", not "PASS". The PASS / FAIL line of the report itself
is the operator-facing answer to "did this flight clip localise within
the threshold".
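The AC-3 check reduces to a fraction-within-radius computation. A minimal sketch, assuming each emission has already been time-aligned to a tlog ground-truth fix and using a spherical haversine distance; the real verdict compute may use a different geodesic model or pairing rule, and `haversine_m` / `ac3_verdict` are hypothetical names.

```python
import math
from typing import Iterable, List, Tuple

LatLon = Tuple[float, float]  # (latitude_deg, longitude_deg)
EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical model


def haversine_m(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in metres between two lat/lon points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def ac3_verdict(pairs: Iterable[Tuple[LatLon, LatLon]],
                radius_m: float = 100.0,
                min_fraction: float = 0.80) -> Tuple[float, bool, List[float]]:
    """Return (fraction_within_radius, passed, sorted_errors_m).

    `pairs` holds (emission, tlog_ground_truth) tuples that are assumed to be
    time-aligned already. "Within" is treated as inclusive (<= radius_m).
    """
    errors = sorted(haversine_m(est, truth) for est, truth in pairs)
    if not errors:
        return 0.0, False, errors  # no emissions: still reportable, never PASS
    within = sum(1 for e in errors if e <= radius_m)
    fraction = within / len(errors)
    return fraction, fraction >= min_fraction, errors
```

Returning the sorted error list alongside the PASS / FAIL flag mirrors the "report exists with distribution data" criterion: a FAIL still yields percentiles the operator can inspect.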
### Imagery source license attribution (dev/research use only)
The Jetson e2e harness's `satellite-provider` instance downloads tiles
from the **Google Maps satellite layer** (`mt0..mt3.google.com/vt/lyrs=s`),
governed by Google Maps Platform Terms of Service. Every tile served by
the harness carries the **"Imagery © Google"** attribution string.
**This is dev/research use only.** Production deployment of the
gps-denied-onboard companion against a Google-Maps-sourced
`satellite-provider` requires either a Google Maps Platform licensing
review or migration to a true CC-BY satellite source on the parent-suite
side (parent-suite ticket TBD; see `_docs/02_document/architecture.md`
§ `satellite-provider` integration). The onboard-side seed scripts
(`tests/fixtures/derkachi_c6/seed_region.py`, `seed_route.py`) propagate
the attribution into the test fixture's metadata; do not remove it.
## Fixture state
| Artifact | Status | Source |
@@ -97,3 +181,25 @@ tests/e2e/replay/
* **AZ-558** — closes AC-4b (route C8 encoders through `MavlinkTransport`).
* **D-PROJ-2 mock-suite-sat-service** — unblocks AC-8 (operator
  workflow rehearsal).
## Epic AZ-835 ticket map
The Tier-2 orchestrator path shipped under Epic
[AZ-835](https://denyspopov.atlassian.net/browse/AZ-835). Sub-tickets:
| Ticket | Role |
|--------|------|
| [AZ-836](https://denyspopov.atlassian.net/browse/AZ-836) | `TlogRouteExtractor` — active-segment trim + Douglas-Peucker coarsen tlog GPS to ≤ N waypoints |
| [AZ-838](https://denyspopov.atlassian.net/browse/AZ-838) | `SatelliteProviderRouteClient` + `seed_route.py` CLI — POST RouteSpec to satellite-provider, poll `mapsReady` |
| [AZ-839](https://denyspopov.atlassian.net/browse/AZ-839) | C3 `operator_pre_flight_setup` real fixture — wires C1+C2+C11+C10 against the seeded catalog |
| [AZ-840](https://denyspopov.atlassian.net/browse/AZ-840) | C4 E2E orchestrator test — drives the full 7-step pipeline from `(tlog, video, calibration)` |
| [AZ-842](https://denyspopov.atlassian.net/browse/AZ-842) | C6 Docs — `replay_protocol.md` Invariants 12-14 + `architecture.md` + this README (cycle-4 rescope) |
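
The AZ-836 row mentions Douglas-Peucker coarsening to at most N waypoints. One way to honour a hard waypoint cap is to rerun Ramer-Douglas-Peucker with a growing tolerance until the result fits. The sketch below assumes that strategy, a local equirectangular projection (adequate over a short flight), and hypothetical names (`coarsen_route`, `rdp_indices`, `epsilon_m`); it does not model the active-segment trim, and `TlogRouteExtractor` may pick its tolerance differently.

```python
import math
from typing import List, Sequence, Tuple

LatLon = Tuple[float, float]  # (latitude_deg, longitude_deg)
EARTH_RADIUS_M = 6_371_000.0


def _to_local_xy(points: Sequence[LatLon], ref_lat: float) -> List[Tuple[float, float]]:
    """Equirectangular projection to metres; fine for a few-km flight track."""
    k = math.cos(math.radians(ref_lat))
    return [(math.radians(lon) * EARTH_RADIUS_M * k, math.radians(lat) * EARTH_RADIUS_M)
            for lat, lon in points]


def _perp_dist(p, a, b) -> float:
    """Distance from p to the line through a and b (or to a if a == b)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len = math.hypot(dx, dy)
    if seg_len == 0.0:
        return math.hypot(px - ax, py - ay)
    return abs(dx * (ay - py) - dy * (ax - px)) / seg_len


def rdp_indices(xy, epsilon: float) -> List[int]:
    """Iterative Ramer-Douglas-Peucker; returns indices of retained points."""
    n = len(xy)
    if n < 3:
        return list(range(n))
    keep = [False] * n
    keep[0] = keep[-1] = True
    stack = [(0, n - 1)]
    while stack:
        start, end = stack.pop()
        max_d, idx = -1.0, -1
        for i in range(start + 1, end):
            d = _perp_dist(xy[i], xy[start], xy[end])
            if d > max_d:
                max_d, idx = d, i
        # Keep the farthest point only if it deviates by more than epsilon.
        if idx != -1 and max_d > epsilon:
            keep[idx] = True
            stack.append((start, idx))
            stack.append((idx, end))
    return [i for i, kept in enumerate(keep) if kept]


def coarsen_route(points: Sequence[LatLon], max_waypoints: int,
                  epsilon_m: float = 5.0) -> List[LatLon]:
    """Simplify a GPS track to <= max_waypoints, doubling the tolerance as needed."""
    if max_waypoints < 2 or epsilon_m <= 0:
        raise ValueError("need max_waypoints >= 2 and epsilon_m > 0")
    if len(points) <= max_waypoints:
        return list(points)
    ref_lat = sum(lat for lat, _ in points) / len(points)
    xy = _to_local_xy(points, ref_lat)
    eps = epsilon_m
    while True:
        idx = rdp_indices(xy, eps)
        if len(idx) <= max_waypoints:
            return [points[i] for i in idx]
        eps *= 2.0
```

Doubling the tolerance guarantees termination: once it exceeds the largest deviation, only the two endpoints remain, which always satisfies a cap of N ≥ 2.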

The cycle-4 replay-input redesign tickets ride alongside the Epic:
| Ticket | Role |
|--------|------|
| [AZ-894](https://denyspopov.atlassian.net/browse/AZ-894) | `CsvReplayInputAdapter` — new CSV-driven primary path on the single canonical clock |
| [AZ-895](https://denyspopov.atlassian.net/browse/AZ-895) | Auto-sync surface deprecation — tlog adapter reduced to audit-only role |
| [AZ-896](https://denyspopov.atlassian.net/browse/AZ-896) | CSV format spec (`csv_replay_format.md`) + example `data_imu.csv` |
| [AZ-897](https://denyspopov.atlassian.net/browse/AZ-897) | Operator-facing UI (React + Tailwind paired-upload form) — cycle 5+ |