chore: WIP pre-implement

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-21 08:41:12 +00:00 · 2026-05-26 17:09:13 +03:00
parent be743a72d6
commit 940066bee2
31 changed files with 1709 additions and 54 deletions
@@ -672,3 +672,44 @@ All tests run from the `e2e-runner` container against the SUT through public bou
 The Vertical-error section is replaced by `_No emissions carried a comparable altitude — vertical stats skipped._` when none of the JSONL rows carry an `alt_m` field comparable to the ground-truth altitude.

 **Skip semantics**: AZ-699 distinguishes between *missing-prerequisite skip* (cleanly skipped with the missing file's path) and *test-cannot-resolve mask* (`@xfail` — explicitly forbidden by AZ-699 AC-1). The AZ-404 1-min test's `@xfail` on AC-3 is unchanged (AZ-699 AC-4 is "add a new test, don't replace") — FT-P-20 is the honest replacement that runs alongside it.
+
+---
+
+### FT-P-21: End-to-end orchestrator pipeline from `(tlog, video, calibration)` only
+
+**Summary**: Validates the full 7-step Epic AZ-835 pipeline — given only `(tlog, video, calibration)`, the system auto-extracts a `RouteSpec` (AZ-836), posts it to the real satellite-provider (AZ-838), builds the C6 FAISS index via the route-driven `operator_pre_flight_setup` fixture (AZ-839, supersedes the AZ-777 Phase 3 bbox-seeded placeholder), runs the airborne replay pipeline, and emits a horizontal-error verdict report. No operator hand-curation between steps. Closes the Epic AZ-835 narrative: "give it a tlog + video + calibration, and the system does everything else."
+**Traces to**: AZ-840 AC-1..AC-8 (epic AZ-835 narrative); supplementary product-AC coverage on AC-1.1, AC-1.2, AC-8.3 (pre-loaded cache realised from route, not bbox).
+**Category**: End-to-end Integration + Position Accuracy
+
+**Preconditions**:
+- Tier-2 Jetson with `@pytest.mark.tier2` + `RUN_REPLAY_E2E=1` env (explicit skip-reason naming the missing env var — no silent skip per AZ-840 AC-6).
+- Real `satellite-provider` + `satellite-provider-postgres` services running in `docker-compose.test.jetson.yml` (lineage AZ-688 / AZ-691 / AZ-692; cycle-3 AZ-777 Phase 1 adapted C11 to the real `POST /api/satellite/tiles/inventory` + `GET /tiles/{z}/{x}/{y}` endpoints).
+- `tests/e2e/replay/conftest.py::operator_pre_flight_setup` from AZ-839 (route-driven C6 population, supersedes the AZ-777 Phase 3 placeholder).
+- `_docs/00_problem/input_data/flight_derkachi/derkachi.tlog` + `flight_derkachi.mp4` (real binary + real video >1 MB).
+- `_docs/00_problem/input_data/flight_derkachi/khp20s30_factory.json` (AZ-702 factory-sheet camera calibration).
+- `gps-denied-replay` console-script installed in the e2e-runner image (AZ-604).
+- AZ-776 (eskf open-loop composition profile) landed; AZ-848 — Jetson `eskf_out_of_order` regression — currently blocks the heavy-AC path on Jetson, so FT-P-21 produces its first honest verdict once AZ-848 lands.
+
+**Input data**: real `derkachi.tlog`, real `flight_derkachi.mp4`, factory-sheet camera calibration. AZ-836's `extract_route_from_tlog(tlog, max_waypoints=10)` derives the `RouteSpec` from the tlog itself; no operator authoring required.
+
+**Steps**:
+
+| Step | Consumer Action | Expected System Response |
+|------|----------------|------------------------|
+| 1 | Active-flight cut + tlog/video sync via AZ-405's `tlog_video_adapter` | Active segment located; tlog↔video offset resolved (`replay.compose_root.ready` logs `auto_sync_used=true|false`, AC-8 escape hatch honored). |
+| 2 | On-fly frame + IMU extraction via `VideoFileFrameSource` + `TlogReplayFcAdapter` | Frame and IMU streams co-aligned per AZ-697 ground-truth invariants. |
+| 3 | `extract_route_from_tlog(tlog, max_waypoints=10)` → `RouteSpec` | Route materially follows tlog trajectory; waypoints inside the Derkachi bbox (lat 50.0808..50.0832, lon 36.1070..36.1134) per AZ-836 AC-1. |
+| 4 | `operator_pre_flight_setup` posts route via `SatelliteProviderRouteClient.seed_route`; satellite-provider downloads Google Maps tiles into C6 | Route registered; `mapsReady=true` within poll budget; `tile_count > 0`; warm fixture re-invocation within the same compose session ≤ 30 s (AZ-839 AC-2). |
+| 5 | C10 `DescriptorBatcher` builds the FAISS HNSW NetVLAD index from the populated C6 | Three sidecar files (`.index` + `.sha256` + `.meta.json`) pass the AZ-306 triple-consistency check; tamper test raises `IndexUnavailableError` (AZ-839 AC-6). |
+| 6 | Invoke airborne `gps-denied-replay` against the populated cache + tlog/video/calibration | Subprocess runs the per-frame loop end-to-end; emits JSONL outputs (currently blocked by AZ-848 — `eskf_out_of_order` at frame 3 fails the binary with exit 1 deterministically on the Derkachi 1-min clip). |
+| 7 | Compute horizontal-error distribution via `helpers/accuracy_report.py` + `helpers/gps_compare.py`; write verdict report | `_docs/06_metrics/real_flight_validation_<YYYY-MM-DD>.md` exists with the honest distribution (PASS or FAIL on the AZ-696 100 m / 80 % gate — verdict emitted **regardless** of PASS/FAIL per AZ-840 AC-2). |
+
+**Expected outcome**: Verdict report exists with the honest horizontal-error distribution. Test PASSes iff the run meets the AZ-696 100 m / 80 % gate (≥ 80 % of ticks within 100 m of ground truth). Mid-pipeline failures (e.g., satellite-provider rejection at step 4, sidecar mismatch at step 5, ESKF divergence at step 6) fail LOUD with a clear error pointing at the failing step — no silent skip past a failure (AZ-840 AC-5).
+
+**Max execution time**: 15 min wall-clock on the Derkachi clip (AZ-840 AC-4 soft target for first delivery; hard NFR set after first honest measurement is recorded in the verdict report).
+
+**Relationship to existing tests**:
+- FT-P-20 (AZ-699 real-flight runner) is preserved (AZ-840 AC-7) — FT-P-21 reuses its verdict-report-writing path through `_report_writer.py` rather than superseding it. Either the two live alongside, or AZ-699's runner is wrapped by AZ-840's orchestrator with the verdict-writing path preserved.
+- FT-P-15 + FT-P-16 (pre-loaded cache, AC-8.3) remain the canonical bbox-fixture tests; FT-P-21 is the route-driven supplementary test that exercises the same end-state (populated C6) via the production C11→satellite-provider path.
+
+**Implemented as**: `tests/e2e/replay/test_az835_e2e_real_flight.py` (per AZ-840). Unit-tested orchestration helper: `tests/e2e/replay/test_e2e_orchestrator_unit.py` (17 tests covering parameter validation + error propagation between the 7 orchestration steps).
@@ -8,8 +8,8 @@ This matrix is the canonical view of test coverage for the planning context. It

 | AC ID | Acceptance Criterion (one-line) | Test IDs | Coverage |
 |-------|---------------------|----------|----------|
-| AC-1.1 | Frame-center GPS within 50 m for ≥80% of normal-flight photos | FT-P-01 | Covered |
-| AC-1.2 | Frame-center GPS within 20 m for ≥50% of normal-flight photos | FT-P-01 | Covered |
+| AC-1.1 | Frame-center GPS within 50 m for ≥80% of normal-flight photos | FT-P-01, FT-P-21 (orchestrator-level supplementary) | Covered |
+| AC-1.2 | Frame-center GPS within 20 m for ≥50% of normal-flight photos | FT-P-01, FT-P-21 (orchestrator-level supplementary) | Covered |
 | AC-1.3 | Cumulative drift between satellite-anchored fixes <100 m visual / <50 m IMU-fused | FT-P-02 | Covered |
 | AC-1.4 | Estimate reports 95% covariance + source label | FT-P-03 | Covered |
 | AC-2.1a | Frame-to-frame registration ≥95% on normal segments | FT-P-04 | Covered |
@@ -35,7 +35,7 @@ This matrix is the canonical view of test coverage for the planning context. It
 | AC-7.2 | AI-camera object coordinates from gimbal/zoom/altitude | — | NOT COVERED — same as AC-7.1 |
 | AC-8.1 | Imagery via Suite Sat Service offline cache, ≥0.5 m/px | FT-P-15, FT-P-16, NFT-SEC-02 | Covered |
 | AC-8.2 | Tile freshness <6 mo (active-conflict) / <12 mo (rear) | FT-N-05 | Covered |
-| AC-8.3 | Imagery pre-loaded onto companion before flight | FT-P-15, FT-P-16 | Covered |
+| AC-8.3 | Imagery pre-loaded onto companion before flight | FT-P-15, FT-P-16, FT-P-21 (route-driven via real satellite-provider) | Covered |
 | AC-8.4 | Mid-flight tile generation with quality metadata | FT-P-17 | Covered |
 | AC-8.5 | No raw nav/AI-cam frame retention except thumbnail log | FT-P-18 | Covered |
 | AC-8.6 | Satellite relocalization scale-ratio + scene-change | FT-P-19 (scale FULL; scene-change PARTIAL) | PARTIAL — scene-change subset reduced confidence (only 2/60 stills have paired sat refs; no labeled change-pair dataset). Independent of the AC-NEW-4 / AC-NEW-7 multi-flight gap (those rows were resolved by AC-text relaxation 2026-05-09; AC-8.6 scene-change still requires a labeled change-pair dataset that synthetic perturbations cannot substitute for). Mitigation: deferred to a follow-up cycle when labeled change-pair data becomes available; surfaced in the Step 4 risk register |
@@ -78,6 +78,8 @@ This matrix is the canonical view of test coverage for the planning context. It
 > Revised 2026-05-09 (Plan Phase 2a.0 outcomes): three rows moved PARTIAL → Covered (AC-NEW-4, AC-NEW-7, RESTRICT-FAIL-2) following AC-text relaxation per Q3=B. Restriction row count corrected from 19 to 20 (pre-existing arithmetic error).
 >
 > Revised 2026-05-19 (Greenfield Step 12 cycle-update — autodev): NFT-RES-05 appended to `resilience-tests.md` capturing the composition-root bootstrap contract introduced by AZ-591 / AZ-618 / AZ-687 (replay-mode minimal config, `AirborneBootstrapError` operator-error contract, Tier-2 `replay.compose_root.ready` + `replay.input.frame_emitted` log-boundary gate). NFT-RES-05 is added to AC-NEW-1 and AC-4.1 as bootstrap-precondition coverage; no coverage counts move because the scenario is supplementary, not promoting any PARTIAL row.
+>
+> Revised 2026-05-24 (Existing-code cycle-3 Step 12 cycle-update — autodev): FT-P-21 appended to `blackbox-tests.md` capturing the Epic AZ-835 orchestrator-level end-to-end pipeline (AZ-836 `RouteSpec` extractor + AZ-838 `SatelliteProviderRouteClient` + AZ-839 route-driven `operator_pre_flight_setup` + AZ-840 orchestrator test). FT-P-21 is supplementary route-driven coverage on AC-1.1, AC-1.2 (orchestrator-level pipeline accuracy) and AC-8.3 (pre-loaded cache realised via the production C11→satellite-provider path rather than the bbox-seeded FT-P-15/FT-P-16 fixture). No coverage counts move — FT-P-21 supplements already-Covered rows. **Currently blocked on Jetson by AZ-848** (`eskf_out_of_order` regression introduced by AZ-776's missing Jetson-verification gate — pre-existing, surfaced cycle-3 Step 11; tracked locally at `_docs/02_tasks/todo/AZ-848_jetson_eskf_out_of_order_regression.md`). Cycle-3 internal changes (C11 contract adaptation per AZ-777 Phase 1; RouteSpec relocation per AZ-845; module-layout refresh AZ-846; AZ-270 lint widening AZ-847; C12 cold-start unit-NFR threshold relax AZ-844) are implementation-only and produce no new black-box scenarios.

 | Category | Total Items | Covered | PARTIAL | Not Covered | Coverage % (Covered + PARTIAL counted half) |
 |----------|-----------|---------|---------|-------------|--------------------------------------------|