6 Commits

Author SHA1 Message Date
Oleksandr Bezdieniezhnykh 9dc04cc677 Update autodev state and dependencies table for Phase 2 progress
ci/woodpecker/push/02-build-push Pipeline failed
- Changed autodev state sub_step to reflect new phase and task details: updated phase from 7 to 2, renamed task to 'refactor-analysis-gate', and revised detail to indicate the creation of new tasks AZ-844, AZ-845, AZ-846, and AZ-847, awaiting Phase-2 gate.
- Updated dependencies table with the latest task counts and complexity points, reflecting the addition of new tasks and the closure of AZ-777 in Jira. Total tasks now stand at 173 with 557 complexity points.
2026-05-23 17:11:50 +03:00
Oleksandr Bezdieniezhnykh ade0c86f2b [AZ-840] [AZ-835] e2e orchestrator test (E-AZ-835 C4)
Wraps the AZ-699 verdict-report path with the AZ-839
operator_pre_flight_setup C3 fixture so a single Tier-2 test
takes only (tlog, video, calibration) and runs the full 7-step
pipeline on the Jetson harness without operator hand-curation.

New surface (tests-only, no src/ changes):
- tests/e2e/replay/_e2e_orchestrator.py — orchestrator with
  OrchestratorStep enum, OrchestrationFailure exception (step
  prefix per AC-5), OrchestrationReport dataclass,
  write_effective_replay_config helper, and
  run_e2e_orchestration entry point covering steps 1-2-6-7.
- tests/e2e/replay/test_e2e_orchestrator_unit.py — 17 unit
  tests covering each failure mode + happy path with mocked
  subprocess + ground-truth loader (AC-8).
- tests/e2e/replay/test_az835_e2e_real_flight.py — Tier-2 +
  RUN_REPLAY_E2E gated integration test asserting verdict
  report exists, 15-min budget held (AC-1, AC-2, AC-3, AC-4,
  AC-6).

The effective config write overlays c6_tile_cache.root_dir
onto the static operator YAML at runtime so the airborne
subprocess shares the cache_root the C3 fixture chose. Field-
level merge — every other operator-config block stays
verbatim. The static YAML on disk is never touched.

Test run: tests/e2e/replay 45 passed, 10 skipped (10 skips
were 9 pre-existing + 1 new tier2). No src/ touched, no
AZ-839 driver changes; AC-7 (AZ-699 still passes) holds by
inspection.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-23 15:27:41 +03:00
Oleksandr Bezdieniezhnykh 8c4be9ace0 [AZ-839] Fix C3 fixture path mismatch (batch 108b)
The batch 108 fixture built tile_store + descriptor_index from
the static operator config (root_dir baked into YAML) but built
the AC-3/AC-6 verifier from cache_root/descriptor.index (fresh
tmp path). On Tier-2 the descriptor_batcher would write under
the YAML root and the verifier would open the tmp path, raising
IndexUnavailableError before the fixture could yield a
PopulatedC6Cache. Unit tests missed it because every test
stubbed descriptor_index_factory.

Mutate the c6_tile_cache config block in-memory at fixture entry
so root_dir = cache_root and faiss_index_path falls back to
<cache_root>/descriptor.index. Production C6 components and the
verifier now share one path source. Align tile_store_path with
PostgresFilesystemStore's <root_dir>/tiles layout so the
integration test's tile_store_path.is_dir() assertion holds.

Driver and unit tests are path-agnostic and unaffected. Batch
108b report documents the defect, the fix, and the self-review
miss.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-23 15:20:14 +03:00
Oleksandr Bezdieniezhnykh bfcac2cb9f [AZ-839] [AZ-835] operator_pre_flight_setup real fixture (E-AZ-835 C3)
Replace the placeholder operator_pre_flight_setup pytest fixture (the
mkdir stub at tests/e2e/replay/conftest.py:293-310) with a real driver
that wires C1 (AZ-836 RouteSpec) + C2 (AZ-838
SatelliteProviderRouteClient) + C11 (AZ-316 HttpTileDownloader) + C10
(AZ-322 DescriptorBatcher) end-to-end and yields a typed
PopulatedC6Cache. AZ-306 FAISS sidecar triple-consistency is verified
post-rebuild via a caller-supplied descriptor_index_factory; partial
sidecars are cleaned up on failure (AC-7) while pre-existing warm-cache
files are preserved.

Algorithm lives in tests/e2e/replay/_operator_pre_flight.py with pure
dependency injection so the AC-8 unit suite (11 tests covering happy /
transient-retry / terminal-failure / validation-error /
tamper-detection / cleanup-on-failure) runs against stubs and the AC-9
Tier-2 integration test runs the same algorithm against the real
Jetson harness. The conftest fixture skip-gates on RUN_REPLAY_E2E +
SATELLITE_PROVIDER_URL/API_KEY + BUILD_FAISS_INDEX +
GPS_DENIED_OPERATOR_CONFIG_PATH and wires deps through the existing
runtime_root factories. Supersedes AZ-777 Phase 3.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-23 15:08:34 +03:00
Oleksandr Bezdieniezhnykh 0ed1a5d988 [AZ-835] [AZ-777] Decompose Epic into C3-C6 + close AZ-777
AZ-839 (C3, 5pt) operator_pre_flight_setup real fixture: wire
C1+C2+C11+C10, supersedes AZ-777 Phase 3 (route-driven, not bbox).
AZ-840 (C4, 3pt) E2E orchestrator test ingesting raw
(tlog, video, calibration), runs steps 1-7 end-to-end on Jetson.
AZ-841 (C5, 1pt) Un-xfail AZ-777 AC-4 + AC-5 once C3 + C4 land.
AZ-842 (C6, 2pt) Docs: replay_protocol Invariant 12 + architecture
+ orchestrator-test README.

AZ-777 transitioned to Done in Jira (Phases 1+2 shipped batches
104-106; Phases 3-5 superseded per 2026-05-22 route-driven
directive). Closure comment 11177 added with phase-by-phase status.
Local spec moved todo/ -> done/ with a status banner at the top.

Dependencies table preamble bumped to 173 tasks / 557 SP and a
2026-05-23 entry prepended. Autodev state sub_step.detail set to
"batch 108 next; AZ-839 C3".

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-23 14:02:53 +03:00
Oleksandr Bezdieniezhnykh 7eed4d6e76 chore: bump D-CROSS-CVE-1 opencv-pin leftover replay timestamp
PyPI re-queried 2026-05-23 13:44: only gtsam 4.2 published (numpy-1
ABI). Replay condition (numpy>=2 stable wheels) still NOT met.
Leftover remains open; opencv-python pin stays at >=4.11.0.86,<4.12.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-23 14:02:36 +03:00
28 changed files with 4481 additions and 23 deletions
@@ -1,5 +1,21 @@
# Derkachi e2e: wire EXISTING parent-suite satellite-provider into the operator pre-flight fixture
> **Status (2026-05-23)**: **CLOSED** — Phases 1+2 shipped (cycle 3); Phases 3-5 **superseded by Epic AZ-835** per the 2026-05-22 user directive (route-driven seeding instead of bbox).
>
> | Phase | Outcome |
> |-------|---------|
> | Phase 1 (e2e-runner wire + C11 contract adapt + smoke test) | **SHIPPED** — batch 104, 2026-05-21 |
> | Phase 2 (`seed_region.py` CLI + `bbox.yaml` + license attribution) | **SHIPPED** — between batches 104 and 106 |
> | Phase 3 (real `operator_pre_flight_setup` fixture) | **SUPERSEDED** → AZ-839 (Epic AZ-835 C3, 5 SP) — route-driven, not bbox |
> | Phase 4 (un-xfail AC-4 + AC-5) | **SUPERSEDED** → AZ-841 (Epic AZ-835 C5, 1 SP) |
> | Phase 5 (docs) | **SUPERSEDED** → AZ-842 (Epic AZ-835 C6, 2 SP) |
>
> Total credited to AZ-777: 8 SP (per the 2026-05-21 single-ticket-containment override; Phases 1+2 fit within that envelope). Remaining work (~11 SP including AZ-836 / AZ-838 already shipped) is tracked under Epic AZ-835 children.
>
> Spec preserved as historical reference. **Do not implement Phases 3-5 from this file** — see the Epic AZ-835 children instead.
>
> See also: `_docs/_process_leftovers/2026-05-21_az777_complexity_override.md` (decision log).
**Task**: AZ-777_derkachi_c6_reference_fixture
**Name**: Drive the production C10/C11 pre-flight pipeline against the parent-suite `satellite-provider` .NET service ALREADY running in the Jetson e2e harness so the Derkachi clip produces a real FAISS-anchored C4/C5 satellite-fix loop end-to-end
**Description**: The Jetson e2e harness already runs the real `satellite-provider` .NET 8 service (lineage AZ-688 / AZ-691 / AZ-692, services `satellite-provider` + `satellite-provider-postgres` in `docker-compose.test.jetson.yml`), but the e2e-runner still points its `SATELLITE_PROVIDER_URL` at the legacy `mock-sat` fixture and the placeholder `operator_pre_flight_setup` fixture never drives the C10/C11 pipeline. Compounding this, C11's `HttpTileDownloader` path constants (`_LIST_PATH=/api/satellite/tiles`, `_GET_PATH=/api/satellite/tiles/{tile_id}`) do not match the real satellite-provider API surface (`POST /api/satellite/tiles/inventory` for LIST, `GET /tiles/{z}/{x}/{y}` for tile fetch). This task wires the existing service into the e2e-runner, adapts C11 to the real contract, seeds the Derkachi-bbox tile catalog via `POST /api/satellite/request`, replaces the placeholder fixture with a real C10+C11 driver, and un-xfails the Tier-2 Derkachi + AZ-699 verdict tests.
@@ -0,0 +1,85 @@
# operator_pre_flight_setup real fixture (AZ-835 C3)
**Task**: AZ-839_operator_pre_flight_setup_real_fixture
**Name**: operator_pre_flight_setup fixture: wire C1+C2+C11+C10 into real fixture, supersede AZ-777 Phase 3 (AZ-835 C3)
**Description**: Third building block of Epic AZ-835. Replace the placeholder `operator_pre_flight_setup` fixture (currently a `mkdir` stub at `tests/e2e/replay/conftest.py` lines 293-310) with a real driver that wires C1 (AZ-836) + C2 (AZ-838) + C11 (AZ-777 Phase 1) + C10 to populate C6 from a tlog-derived route. Supersedes AZ-777 Phase 3 (the bbox-seeded placeholder-replacement design) per the 2026-05-22 user directive — route-driven seeding is ~100x more tile-efficient and pre-commits to where the operator did fly per the tlog.
**Complexity**: 5 SP
**Dependencies**: AZ-836 (C1, RouteSpec + extractor — In Testing); AZ-838 (C2, SatelliteProviderRouteClient + seed_route.py CLI — In Testing); AZ-777 Phase 1 (e2e-runner ↔ satellite-provider wire + C11 contract adaptation — done, batch 104); AZ-322 (C10 DescriptorBatcher — done); AZ-316+AZ-777 Phase 1 (C11 HttpTileDownloader.download_for_bbox — done); AZ-306 (FAISS sidecar triple-consistency — done); AZ-835 (parent Epic)
**Component**: `tests/e2e/replay/conftest.py` (`operator_pre_flight_setup` fixture rewrite + new `PopulatedC6Cache` dataclass)
**Tracker**: AZ-839 (https://denyspopov.atlassian.net/browse/AZ-839)
**Parent Epic**: AZ-835
Jira AZ-839 is the authoritative spec; this file is the in-workspace mirror.
## Public surface
```python
from dataclasses import dataclass
from pathlib import Path

from gps_denied_onboard.replay_input.tlog_route import RouteSpec  # AZ-836


@dataclass(frozen=True, slots=True)
class PopulatedC6Cache:
    cache_root: Path                 # named-volume mount inside the e2e-runner container
    tile_store_path: Path            # postgres + filesystem store root
    faiss_index_path: Path           # .index file
    faiss_sidecar_sha256_path: Path  # .sha256 file
    faiss_sidecar_meta_path: Path    # .meta.json file
    route_spec: RouteSpec            # provenance — which tlog/route produced this cache
    tile_count: int                  # how many tiles ended up in C6
    elapsed_seconds: float           # wall time, for the AC-1/AC-2 perf budget
```
The fixture remains a pytest fixture at `tests/e2e/replay/conftest.py::operator_pre_flight_setup`, same `session` scope as today. Input contract unchanged (same args the placeholder takes) plus a new dependency on `RouteSpec` — either fixture-injected or extracted from the test's tlog parameter via `extract_route_from_tlog`.
## Behaviour
1. Read the route spec (fixture-injected or extracted from test tlog via `extract_route_from_tlog`).
2. Instantiate `SatelliteProviderRouteClient` from env (`SATELLITE_PROVIDER_URL`, `SATELLITE_PROVIDER_API_KEY`, `SATELLITE_PROVIDER_TLS_INSECURE`).
3. Call `seed_route(route_spec)`. On `RouteValidationError` / `RouteTerminalFailureError` → re-raise with original cause. On `RouteTransientError` → retry up to 3 attempts using C11's `_DEFAULT_BACKOFF_SCHEDULE_S = (1, 2, 4, 8)`.
4. Enumerate tile coverage locally (mirror `route_client._enumerate_route_tile_coords` from AZ-838); call C11 `HttpTileDownloader.download_for_bbox` to pull every tile into C6.
5. Invoke C10 `DescriptorBatcher` against the populated C6 to build the FAISS HNSW index using the NetVLAD backbone (per `c2_vpr/config.py:67` default).
6. Verify sidecar coherence (`.index` + `.sha256` + `.meta.json` triple-consistency per AZ-306). Mismatch → `IndexUnavailableError`.
7. Yield `PopulatedC6Cache(...)`. On any failure path, clean up partial cache state (no half-built FAISS index left behind).
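The retry policy in step 3 can be sketched as follows. This is a minimal illustration, not the fixture's actual code: `RouteTransientError` is redefined locally as a stand-in, `call_with_retry` and its parameters are hypothetical names, and the default schedule simply copies the `(1, 2, 4, 8)` tuple quoted above.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class RouteTransientError(Exception):
    """Local stand-in for the transient error raised by seed_route (assumption)."""


def call_with_retry(
    fn: Callable[[], T],
    *,
    max_attempts: int = 3,
    backoff_s: Sequence[float] = (1, 2, 4, 8),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying only on RouteTransientError.

    Any other exception propagates immediately. When the final attempt
    also fails, its exception is re-raised unchanged.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RouteTransientError:
            if attempt == max_attempts:
                raise
            # Clamp to the last entry if attempts outnumber schedule entries.
            sleep(backoff_s[min(attempt - 1, len(backoff_s) - 1)])
    raise AssertionError("unreachable")
```

Injecting `sleep` keeps a unit test fast: a stub can record the requested delays instead of waiting. With three attempts only the first two schedule entries are ever used.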
**Mount strategy**: write into a named docker volume that survives across pytest sessions. Cold first invocation populates; subsequent invocations within the same compose session reuse (warm cache). Same pattern AZ-777 Phase 3 originally specced; only the cache **source** changes (route, not bbox).
## Acceptance criteria
| # | Criterion |
|---|-----------|
| AC-1 | Cold first invocation on the Derkachi tlog completes in ≤ 5 min on Tier-2 Jetson (includes satellite-provider Google Maps round-trips). |
| AC-2 | Warm invocation within the same compose session completes in ≤ 30 s (named-volume reuse). |
| AC-3 | Yielded `PopulatedC6Cache` has all paths populated; `tile_count > 0`; FAISS sidecar triple-consistency passes (AZ-306). |
| AC-4 | `RouteValidationError` / `RouteTerminalFailureError` from `seed_route` is re-raised with original cause; no silent swallow. |
| AC-5 | `RouteTransientError` is retried up to 3 attempts using C11's existing backoff schedule; final attempt's exception is propagated. |
| AC-6 | Tamper test — corrupt one of the three sidecar files between fixture runs; next invocation raises `IndexUnavailableError`. |
| AC-7 | On any failure path inside the fixture, partial state is cleaned up (no half-built FAISS index, no orphaned postgres rows). |
| AC-8 | Unit tests (stubbed `SatelliteProviderRouteClient` + stubbed C11 + stubbed C10) cover: happy path, transient-retry, terminal-failure, validation-error, tamper-detection, cleanup-on-failure. |
| AC-9 | Integration test gated by `RUN_REPLAY_E2E=1` + `@pytest.mark.tier2` against the Jetson harness produces a real `PopulatedC6Cache` from the Derkachi tlog. |
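The triple-consistency check behind AC-3 and AC-6 might look like the sketch below. The AZ-306 contract is not reproduced in this file, so the layout here is an assumption: the `.sha256` file is taken to hold the hex SHA-256 of the `.index` bytes, and `.meta.json` is taken to carry the same digest under a `sha256` key. `IndexUnavailableError` is redefined locally and `verify_sidecar_triple` is a hypothetical name.

```python
import hashlib
import json
from pathlib import Path


class IndexUnavailableError(Exception):
    """Local stand-in for the project's error type (assumption)."""


def verify_sidecar_triple(index_path: Path, sha256_path: Path, meta_path: Path) -> None:
    """Raise IndexUnavailableError unless the three files agree on one digest."""
    for path in (index_path, sha256_path, meta_path):
        if not path.is_file():
            raise IndexUnavailableError(f"missing sidecar file: {path}")

    actual = hashlib.sha256(index_path.read_bytes()).hexdigest()
    try:
        recorded = sha256_path.read_text(encoding="utf-8").strip().lower()
        meta = json.loads(meta_path.read_text(encoding="utf-8"))
    except ValueError as exc:  # covers invalid UTF-8 and invalid JSON
        raise IndexUnavailableError("unreadable sidecar file") from exc

    if recorded != actual:
        raise IndexUnavailableError(".sha256 does not match the .index contents")
    # Assumed meta layout: a JSON object with a "sha256" field.
    if not isinstance(meta, dict) or meta.get("sha256") != actual:
        raise IndexUnavailableError(".meta.json does not match the .index contents")
```

Under these assumptions, corrupting any one of the three files between runs makes the next check raise, which is the behaviour AC-6 asks for.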
## Out of scope
- Driving the airborne replay pipeline against the populated cache (AZ-840 / C4).
- Un-xfailing the existing AZ-777 AC-4 / AC-5 tests (AZ-841 / C5).
- Updating `replay_protocol.md` (AZ-842 / C6).
- Switching the C2 default backbone away from NetVLAD.
- Multi-tlog aggregate caches (one route per fixture invocation).
## Risks
**Risk 1 — Docker named-volume lifecycle across pytest sessions.** A crash during the first invocation may leave a half-populated volume, so the cleanup-on-failure path in step 7 must be robust. Mitigation: AC-7 covers this explicitly, plus a `try/finally` around the four wiring steps.
**Risk 2 — Cold-start budget (AC-1, 5 min) tight on first Jetson run.** Google Maps round-trips for ~50-100 tiles may exceed budget on slow networks. Mitigation: instrument elapsed_seconds on every step and surface in the verdict report; if AC-1 fails, file a perf-tuning ticket rather than skipping the AC.
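One way to approach the Risk 1 mitigation is a guard that removes only the files created during a failed run, so a warm cache survives. The sketch below is an assumption-laden illustration: `cleanup_new_files_on_failure` is a hypothetical helper, it handles filesystem state only (no postgres rows), and it does not restore pre-existing files that were overwritten in place.

```python
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def cleanup_new_files_on_failure(cache_root: Path) -> Iterator[None]:
    """Delete files created inside the block if it raises; keep older ones."""
    # Snapshot what already exists so warm-cache files are never touched.
    before = {p for p in cache_root.rglob("*") if p.is_file()}
    try:
        yield
    except BaseException:
        # Materialise the listing first so deletion does not disturb iteration.
        for path in list(cache_root.rglob("*")):
            if path.is_file() and path not in before:
                path.unlink()
        raise
```

The guard re-raises the original exception after cleanup, so the failure stays visible to the caller. Empty directories created during the failed run are left in place in this sketch.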
## References
- Parent Epic: AZ-835 — https://denyspopov.atlassian.net/browse/AZ-835
- Existing placeholder: `tests/e2e/replay/conftest.py` lines 293-310
- C1: AZ-836 (`extract_route_from_tlog`) — https://denyspopov.atlassian.net/browse/AZ-836
- C2: AZ-838 (`SatelliteProviderRouteClient`) — https://denyspopov.atlassian.net/browse/AZ-838
- AZ-777 (Phase 3+ superseded): `_docs/02_tasks/done/AZ-777_derkachi_c6_reference_fixture.md`
- C10 DescriptorBatcher: `src/gps_denied_onboard/components/c10_provisioning/descriptor_batcher.py`
- C11 HttpTileDownloader: `src/gps_denied_onboard/components/c11_tile_manager/tile_downloader.py`
- AZ-306 FAISS sidecar triple-consistency reference
@@ -0,0 +1,75 @@
# E2E orchestrator test (AZ-835 C4)
**Task**: AZ-840_e2e_orchestrator_test
**Name**: E2E orchestrator test ingesting raw (tlog, video, calibration) and running steps 1-7 (AZ-835 C4)
**Description**: Fourth building block of Epic AZ-835. A single pytest test that takes only `(tlog, video, calibration)` and runs the full 7-step pipeline end-to-end on the Jetson harness — without any operator hand-curation between steps. Extends or wraps the existing AZ-699 verdict test (`tests/e2e/replay/test_derkachi_real_tlog.py::test_az699_real_flight_validation_emits_verdict_and_report`) so the verdict-report-writing path is preserved. This is the test that closes the Epic's narrative: "give it a tlog + video, and the system does everything else."
**Complexity**: 3 SP
**Dependencies**: AZ-839 (C3, `operator_pre_flight_setup` real fixture — HARD); AZ-836 (C1, RouteSpec — In Testing); AZ-838 (C2, SatelliteProviderRouteClient — In Testing); AZ-699 (real flight validation runner — done); AZ-405 (tlog/video auto-sync — done); AZ-702 (camera factory-sheet calibration — done); AZ-696 (≥ 80 % within 100 m threshold — done); AZ-835 (parent Epic)
**Component**: `tests/e2e/replay/test_az835_e2e_real_flight.py` (new) OR extend `test_derkachi_real_tlog.py`
**Tracker**: AZ-840 (https://denyspopov.atlassian.net/browse/AZ-840)
**Parent Epic**: AZ-835
Jira AZ-840 is the authoritative spec; this file is the in-workspace mirror.
## Inputs (test parameters)
- `tlog_path: Path` — raw ArduPilot tlog binary (Derkachi as the reference fixture; parametrize for future tlogs).
- `video_path: Path` — raw flight video.
- `calibration_path: Path` — camera factory-sheet calibration (AZ-702).
## Pipeline orchestration
The 7 steps from the Epic:
1. **Active flight cut + tlog/video sync** — call AZ-405's `tlog_video_adapter`. If active-segment detection needs a small extension, file as an in-scope sub-fix; if it needs a meaningful new feature, STOP and propose a sibling ticket.
2. **On-fly frame + IMU extraction** — `VideoFileFrameSource` + `TlogReplayFcAdapter`. No change.
3. **Auto-create route** — call AZ-836's `extract_route_from_tlog(tlog, max_waypoints=10)`. Assert the returned `RouteSpec` materially follows the tlog trajectory.
4. **POST route to satellite-provider** — delegate to AZ-839 (C3) fixture `operator_pre_flight_setup` (which itself calls AZ-838's `SatelliteProviderRouteClient.seed_route`). The fixture's `PopulatedC6Cache` is the dependency boundary.
5. **Build FAISS index** — driven by C3 fixture as part of populating the cache.
6. **Run gps-denied airborne pipeline** — invoke the `gps-denied-replay` console-script or equivalent direct-call entry point against the populated cache + tlog/video/calibration. Reuse the airborne composition root path AZ-699 exercises today.
7. **Get GPS fixes, check vs tlog GPS** — call `helpers/accuracy_report.py` + `helpers/gps_compare.py` to compute the horizontal-error distribution and emit the verdict report at `_docs/06_metrics/real_flight_validation_<YYYY-MM-DD>.md`.
## Test gating
- `@pytest.mark.tier2`.
- Skip-unless-env(`RUN_REPLAY_E2E=1`) with an explicit skip reason that names the missing env var — no silent skip.
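The explicit-reason gating can be sketched with a small pure helper. `replay_e2e_skip_reason` and its parameters are hypothetical names, and the exact reason wording is an assumption; only the `RUN_REPLAY_E2E=1` convention comes from this spec.

```python
import os
from typing import Mapping, Optional, Sequence


def replay_e2e_skip_reason(
    env: Mapping[str, str] = os.environ,
    flag: str = "RUN_REPLAY_E2E",
    also_required: Sequence[str] = (),
) -> Optional[str]:
    """Return None when the test may run, else a reason naming what is missing."""
    if env.get(flag) != "1":
        return f"{flag}=1 is not set; Tier-2 replay e2e test skipped"
    missing = [name for name in also_required if not env.get(name)]
    if missing:
        return "missing required env var(s): " + ", ".join(missing)
    return None


# Possible wiring in a test module (pytest assumed to be installed):
#   reason = replay_e2e_skip_reason()
#   pytestmark = pytest.mark.skipif(reason is not None, reason=reason or "")
```

Keeping the decision in a plain function means the skip reason itself can be unit-tested without the Jetson harness.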
## Verdict report
Emit ALWAYS, even on FAIL. The success criterion for AC-1 is that the report exists and the distribution is honest — NOT that the verdict is PASS.
## Acceptance criteria
| # | Criterion |
|---|-----------|
| AC-1 | Test takes only `(tlog, video, calibration)` and runs steps 1-7 end-to-end on Tier-2 Jetson. No operator hand-curation between steps. |
| AC-2 | Test produces the AZ-699 verdict report at `_docs/06_metrics/real_flight_validation_<YYYY-MM-DD>.md` with the honest horizontal-error distribution, REGARDLESS of PASS/FAIL on the AZ-696 AC-3 threshold (≥ 80 % within 100 m). |
| AC-3 | Test reuses the C3 fixture's `operator_pre_flight_setup` for steps 3-5; no duplicate seeding/downloading logic. |
| AC-4 | Test runs to completion within 15 min wall time on the Derkachi clip (soft target for first delivery; hard NFR set after first measurement is recorded in the report). |
| AC-5 | Mid-pipeline failure (e.g. step 4 satellite-provider rejection, step 5 FAISS sidecar mismatch) fails LOUD with a clear error pointing at the failing step. No silent skip past a failing step. |
| AC-6 | Test is gated by `RUN_REPLAY_E2E=1` + `@pytest.mark.tier2`; explicit skip reason names the missing env var. |
| AC-7 | The existing AZ-699 verdict test continues to pass (this test does not break or supersede it; either it lives alongside, or AZ-699 is folded into this test with the verdict-writing path preserved). |
| AC-8 | Unit tests cover the orchestration helper layer (parameter validation, error propagation between steps). The end-to-end happy path is the Jetson integration test. |
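AC-5's "fail loud, name the step" requirement could be met with a step-tagged exception along these lines. The `OrchestratorStep` and `OrchestrationFailure` names appear in the AZ-840 commit message, but the member names, the message format, and the `run_steps` helper below are assumptions made for illustration.

```python
from enum import Enum
from typing import Callable, Iterable, List, Tuple


class OrchestratorStep(Enum):
    # Member names are illustrative; the numbers follow the 7-step pipeline.
    ACTIVE_CUT_AND_SYNC = 1
    FRAME_IMU_EXTRACTION = 2
    ROUTE_EXTRACTION = 3
    ROUTE_SEED = 4
    FAISS_BUILD = 5
    AIRBORNE_PIPELINE = 6
    VERDICT = 7


class OrchestrationFailure(RuntimeError):
    """Wraps a step's exception and prefixes the message with the step."""

    def __init__(self, step: OrchestratorStep, cause: BaseException) -> None:
        super().__init__(
            f"[step {step.value} {step.name}] {type(cause).__name__}: {cause}"
        )
        self.step = step


def run_steps(
    steps: Iterable[Tuple[OrchestratorStep, Callable[[], None]]],
) -> List[OrchestratorStep]:
    """Run steps in order; stop at the first failure and say which step it was."""
    completed: List[OrchestratorStep] = []
    for step, action in steps:
        try:
            action()
        except Exception as exc:
            # Chain the original exception so the root cause stays inspectable.
            raise OrchestrationFailure(step, exc) from exc
        completed.append(step)
    return completed
```

Because the loop stops at the first raising step, later steps never run against a half-prepared state, and the original exception remains available via `__cause__`.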
## Out of scope
- Un-xfailing the AZ-777 AC-4 / AC-5 tests (AZ-841 / C5).
- Documentation updates beyond the test file's own docstring (AZ-842 / C6).
- Real-time tlog ingestion (one finished `.tlog` per test invocation).
- Multi-flight aggregate validation.
- Performance optimization beyond the AC-4 soft target.
- Modifying the airborne composition root.
## Risks
**Risk 1 — Integration glue between AZ-405 tlog/video sync and the airborne pipeline's frame-source contract.** The auto-sync adapter and the airborne composition root were authored in different cycles; small impedance mismatches are likely. Mitigation: if the glue exceeds the 3 SP budget, STOP and propose a sub-ticket rather than expanding scope.
**Risk 2 — Step 1 active-segment detection may need extension.** AZ-405 covered tlog↔video sync; take-off/landing boundary detection may not be implemented. Mitigation: file an in-scope sub-fix if small; STOP and propose a sibling ticket if not.
## References
- Parent Epic: AZ-835 — https://denyspopov.atlassian.net/browse/AZ-835
- Hard dep (C3 fixture): AZ-839 — https://denyspopov.atlassian.net/browse/AZ-839
- Existing verdict test: `tests/e2e/replay/test_derkachi_real_tlog.py::test_az699_real_flight_validation_emits_verdict_and_report`
- Tlog/video adapter: `src/gps_denied_onboard/replay_input/tlog_video_adapter.py` (AZ-405)
- Helpers: `src/gps_denied_onboard/helpers/accuracy_report.py`, `src/gps_denied_onboard/helpers/gps_compare.py`
@@ -0,0 +1,57 @@
# Un-xfail AZ-777 AC-4 + AC-5 Tier-2 tests (AZ-835 C5)
**Task**: AZ-841_unxfail_az777_tier2_tests
**Name**: Un-xfail AZ-777 AC-4 + AC-5 Tier-2 tests once C3 fixture + C4 orchestrator land (AZ-835 C5)
**Description**: Fifth building block of Epic AZ-835. Once C3 (AZ-839, `operator_pre_flight_setup` real fixture) and C4 (AZ-840, e2e orchestrator test) land, remove the `@pytest.mark.xfail` markers from the AZ-777 Tier-2 tests. The verdict — PASS or FAIL — becomes the honest signal. Both tests remain gated by `RUN_REPLAY_E2E=1` + `@pytest.mark.tier2`.
**Complexity**: 1 SP
**Dependencies**: AZ-839 (C3, `operator_pre_flight_setup` real fixture — HARD); AZ-840 (C4, e2e orchestrator test — HARD); AZ-777 (being closed/superseded by this Epic; tests live in same file tree); AZ-835 (parent Epic)
**Component**: `tests/e2e/replay/test_derkachi_1min.py` (xfail removal) + `tests/e2e/replay/test_derkachi_real_tlog.py` (xfail removal)
**Tracker**: AZ-841 (https://denyspopov.atlassian.net/browse/AZ-841)
**Parent Epic**: AZ-835
Jira AZ-841 is the authoritative spec; this file is the in-workspace mirror.
## Targets
1. `tests/e2e/replay/test_derkachi_1min.py::test_ac3_within_100m_80pct_of_ticks` (AZ-777 AC-4) — remove `@pytest.mark.xfail`; verify `@pytest.mark.tier2` + `RUN_REPLAY_E2E` gating stays in place.
2. `tests/e2e/replay/test_derkachi_real_tlog.py::test_az699_real_flight_validation_emits_verdict_and_report` (AZ-777 AC-5) — remove `@pytest.mark.xfail`; verify gating stays in place.
## Verification
**On Tier-2 Jetson** (`RUN_REPLAY_E2E=1`):
- `test_ac3_within_100m_80pct_of_ticks` PASSES (≥ 80 % of ticks within 100 m of ground truth, log lines `replay.satellite_anchor_inserted` visible).
- `test_az699_real_flight_validation_emits_verdict_and_report` runs to completion within 15 min and emits `_docs/06_metrics/real_flight_validation_<YYYY-MM-DD>.md` with the honest distribution. PASS preferred but NOT required for AC-4 — emitting the honest report IS the success criterion.
**Locally** (no env):
- Both tests skip explicitly with a reason naming `RUN_REPLAY_E2E` — they MUST NOT pass as a side effect of being skipped.
## Acceptance criteria
| # | Criterion |
|---|-----------|
| AC-1 | `@pytest.mark.xfail` removed from both AZ-777 tests. |
| AC-2 | Both tests still gated by `@pytest.mark.tier2` + skip-unless-env(`RUN_REPLAY_E2E=1`). Skip reason names the missing env. |
| AC-3 | On Jetson with `RUN_REPLAY_E2E=1`, `test_ac3_within_100m_80pct_of_ticks` PASSES (≥ 80 % within 100 m). |
| AC-4 | On Jetson with `RUN_REPLAY_E2E=1`, `test_az699_real_flight_validation_emits_verdict_and_report` completes within 15 min and emits the verdict report. PASS preferred but not required for AC-4. |
| AC-5 | If either test FAILS on the metric (e.g. only 60 % within 100 m), the test reports FAIL honestly — no fallback to xfail or skip. Failure mode is a feature, not a bug. |
| AC-6 | Locally on a machine without `RUN_REPLAY_E2E`, both tests skip with an explicit reason. |
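For orientation, the "≥ 80 % within 100 m" gate referenced in AC-3 and AC-5 can be computed as sketched below. This is not the code in `helpers/accuracy_report.py` or `helpers/gps_compare.py`; the function names, the spherical-Earth haversine distance, and the 6 371 km radius are assumptions chosen for a self-contained example.

```python
import math
from typing import Sequence, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees
EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation


def haversine_m(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in metres between two lat/lon points."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(h)))


def fraction_within(
    estimates: Sequence[LatLon],
    truth: Sequence[LatLon],
    radius_m: float = 100.0,
) -> float:
    """Share of ticks whose horizontal error is at most radius_m."""
    if not estimates or len(estimates) != len(truth):
        raise ValueError("need equally sized, non-empty tick sequences")
    hits = sum(haversine_m(e, t) <= radius_m for e, t in zip(estimates, truth))
    return hits / len(estimates)


def verdict(fraction: float, required: float = 0.80) -> str:
    """PASS when the fraction meets the threshold, FAIL otherwise."""
    return "PASS" if fraction >= required else "FAIL"
```

A run where only 60 % of ticks land within 100 m yields `FAIL` here, which matches the intent of AC-5: the metric is reported as measured rather than hidden behind xfail or skip.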
## Out of scope
- Modifying the airborne pipeline to improve metric performance (separate optimization tickets if AC-3 fails).
- Adding new test cases (this ticket only removes xfail; new cases belong to other tickets).
- Documentation updates (AZ-842 / C6).
- Modifying the verdict thresholds (AZ-696).
## Risks
**Risk 1 — Un-xfailed tests may FAIL on the metric.** If horizontal-error distribution comes in worse than the 80 % @ 100 m gate, this test reports FAIL. That outcome is in-scope for AC-5 (report honestly) and out-of-scope for this ticket's fix (file a separate optimization ticket).
## References
- Parent Epic: AZ-835 — https://denyspopov.atlassian.net/browse/AZ-835
- Hard deps: AZ-839 (C3), AZ-840 (C4)
- Tests: `tests/e2e/replay/test_derkachi_1min.py`, `tests/e2e/replay/test_derkachi_real_tlog.py`
- AZ-777 spec: `_docs/02_tasks/done/AZ-777_derkachi_c6_reference_fixture.md` (post-closure)
- Threshold spec: AZ-696 (≥ 80 % within 100 m)
- Verdict writer: `src/gps_denied_onboard/helpers/accuracy_report.py`
@@ -0,0 +1,65 @@
# Docs: replay_protocol.md + architecture.md + orchestrator-test README (AZ-835 C6)
**Task**: AZ-842_replay_protocol_and_orchestrator_docs
**Name**: Docs: replay_protocol.md Invariant 12 + AZ-777 Phase 3+ superseded note + orchestrator-test README (AZ-835 C6)
**Description**: Sixth and final building block of Epic AZ-835. Capture the route-driven flow in the authoritative documents so future implementers, operators, and reviewers understand what changed and why.
**Complexity**: 2 SP
**Dependencies**: AZ-841 (C5, un-xfail — SOFT; README describes test outcomes assuming C5 has landed); AZ-777 (being closed/superseded by this Epic — AZ-777 spec is updated during the AZ-777 closure step, verified by AC-6); AZ-835 (parent Epic)
**Component**: `_docs/02_document/contracts/replay/replay_protocol.md` + `_docs/02_document/architecture.md` + `tests/e2e/replay/README*.md`
**Tracker**: AZ-842 (https://denyspopov.atlassian.net/browse/AZ-842)
**Parent Epic**: AZ-835
Jira AZ-842 is the authoritative spec; this file is the in-workspace mirror.
## Modified files
### 1. `_docs/02_document/contracts/replay/replay_protocol.md` — Invariant 12 extension
Extend **Invariant 12** with an AZ-835 sub-section describing:
- The route-driven `operator_pre_flight_setup` fixture (AZ-839 / C3) flow: tlog → `RouteSpec` → `POST /api/satellite/route` → tile download → FAISS build → yield `PopulatedC6Cache`.
- Why route-driven supersedes the AZ-777 bbox approach (efficiency: ~100× fewer tiles; honesty: pre-commits to where the operator did fly).
- The C3 fixture's failure-handling contract (validation/terminal → re-raise; transient → retry up to 3 attempts using C11's existing backoff schedule).
### 2. `_docs/02_document/architecture.md` — satellite-provider entry extension
Append a sub-section to the existing satellite-provider entry noting that Epic AZ-835 + its C1-C5 children landed the full e2e real-flight validation path on top of AZ-777 Phase 1's wire + C11 contract adaptation. Mark AZ-777 Phase 3+ as superseded by Epic AZ-835 (pointer-only — the AZ-777 spec itself is updated in C5's wake during the AZ-777 closure step).
### 3. `tests/e2e/replay/README*.md` — orchestrator-test README
Either extend `tests/e2e/replay/README.md` or create a dedicated `tests/e2e/replay/README_AZ835.md` (prefer dedicated file if the existing README is already long). Short operator-facing content:
- How to run the new orchestrator test locally (env vars, Jetson SSH alias, expected runtime).
- What `(tlog, video, calibration)` triple to provide and where the reference Derkachi fixture lives.
- Where the verdict report is written and how to interpret it (PASS/FAIL on AZ-696 AC-3 threshold).
- Imagery-source caveat: Google Maps satellite (dev/research use only; production needs CC-BY migration on the satellite-provider side).
## Acceptance criteria
| # | Criterion |
|---|-----------|
| AC-1 | `replay_protocol.md` Invariant 12 has a new AZ-835 sub-section covering the route-driven flow, the bbox-supersedure rationale, and the failure-handling contract. |
| AC-2 | `architecture.md` satellite-provider entry has a sub-section noting Epic AZ-835's contribution and pointing at AZ-777 Phase 3+ as superseded. |
| AC-3 | `tests/e2e/replay/README*.md` exists and a new contributor can run the orchestrator test on Jetson using only the README's instructions (no out-of-band knowledge required). |
| AC-4 | All three docs link to the Epic (AZ-835) and to the relevant child tickets (AZ-836 / AZ-838 / AZ-839 / AZ-840 / AZ-841). |
| AC-5 | License attribution string ("Imagery © Google") and the dev-only caveat are present in the test README. |
| AC-6 | Cross-references in `_docs/02_tasks/_dependencies_table.md` and `_docs/02_tasks/done/AZ-777*.md` (once moved) point at this Epic / its children. |
## Out of scope
- Updating consumer-facing API/contract docs in `../satellite-provider/` (parent-suite owns those).
- Migrating imagery source to a CC-BY provider (parent-suite, out of scope for this Epic).
- Writing additional tutorials beyond the orchestrator-test README.
- ADR creation — no new architectural decision; this Epic implements existing decisions.
## Risks
**Risk 1 — Scope creep into reformatting unrelated doc sections.** Resist; this ticket only adds what AC-1..AC-5 require.
## References
- Parent Epic: AZ-835 — https://denyspopov.atlassian.net/browse/AZ-835
- Replay protocol: `_docs/02_document/contracts/replay/replay_protocol.md` Invariant 12
- Architecture: `_docs/02_document/architecture.md` (satellite-provider section)
- Tests directory: `tests/e2e/replay/`
- AZ-777 spec (being superseded): `_docs/02_tasks/done/AZ-777_derkachi_c6_reference_fixture.md` (post-closure)
@@ -0,0 +1,69 @@
# Relocate RouteSpec DTO to _types/route.py (AZ-507 rule 9 fix)
**Task**: AZ-845_refactor_relocate_routespec
**Name**: Relocate `RouteSpec` from `replay_input/tlog_route.py` to `_types/route.py`
**Description**: Resolve cycle-3 cumulative review F1 (High Architecture). Move the `RouteSpec` cross-component DTO to `_types/route.py` so the `c11_tile_manager.route_client` import becomes rule-9 compliant. Producer-side keeps backward-compat re-export so test imports do not break.
**Complexity**: 2 SP
**Dependencies**: None (anchor task of refactor run 02-az507-routespec-relocation)
**Component**: `_types/` (new file `route.py`); `replay_input/` (`tlog_route.py`, `__init__.py` modify); `components/c11_tile_manager/` (`route_client.py` modify)
**Tracker**: AZ-845 (https://denyspopov.atlassian.net/browse/AZ-845)
**Parent Epic**: AZ-844 (Refactor 02 — RouteSpec relocation + module-layout refresh + AZ-270 lint widening)
Jira AZ-845 is the authoritative spec; this file is the in-workspace mirror.
## Problem
`src/gps_denied_onboard/components/c11_tile_manager/route_client.py:56` imports `RouteSpec` from `gps_denied_onboard.replay_input.tlog_route`, violating `module-layout.md` rule 9 (AZ-507 cross-component contract surface). Per the rule, `components/<X>/*.py` may only import from `_types/*`, `_types.inference_errors`, `helpers/*`, `config`, `logging`, `fdr_client`, `clock`, `frame_source` (interface only), and its own subpackage. `replay_input` is not in this allow-list. Every other cross-component DTO already lives under `_types/*` (`_types/geo.py`, `_types/tile.py`, `_types/inference.py`, etc.); `RouteSpec` is the asymmetric outlier.
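To make the rule concrete, a rule-9 style import check could be approximated with the standard `ast` module as below. This is a hedged sketch, not the project's actual audit tool: `rule9_violations` is a hypothetical function, the allow-list is transcribed from the paragraph above, relative imports and a bare `import gps_denied_onboard` are ignored, and the "interface only" restriction on `frame_source` is not enforced.

```python
import ast
from typing import List, Tuple

PACKAGE = "gps_denied_onboard"
# Top-level subpackages/modules a component may import from (per rule 9 above).
ALLOWED_TOP_LEVEL = {
    "_types", "helpers", "config", "logging", "fdr_client", "clock", "frame_source",
}


def rule9_violations(source: str, component: str) -> List[str]:
    """List absolute in-package imports that fall outside the allow-list."""
    own = f"{PACKAGE}.components.{component}"
    found: List[Tuple[int, str]] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom):
            if node.level or not node.module:
                continue  # relative imports are out of scope for this sketch
            if node.module == PACKAGE:
                modules = [f"{PACKAGE}.{alias.name}" for alias in node.names]
            else:
                modules = [node.module]
        elif isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        else:
            continue
        for module in modules:
            if not module.startswith(PACKAGE + "."):
                continue  # stdlib or third-party import
            if module == own or module.startswith(own + "."):
                continue  # the component's own subpackage
            if module.split(".")[1] in ALLOWED_TOP_LEVEL:
                continue
            found.append((node.lineno, module))
    return [f"line {lineno}: {module}" for lineno, module in sorted(found)]
```

Run against a file containing the `replay_input.tlog_route` import described above, this reports that line; after the relocation to `_types.route` it would report nothing.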
## Outcome
- `RouteSpec` is defined in `src/gps_denied_onboard/_types/route.py` (frozen+slots dataclass; full docstring carried over verbatim).
- `c11_tile_manager/route_client.py:56` imports `RouteSpec` from `gps_denied_onboard._types.route`.
- `replay_input/tlog_route.py` continues to use `RouteSpec` internally (extractor return type) by importing from `_types.route`; keeps `RouteSpec` in `__all__` for backward-compat re-export.
- `replay_input/__init__.py` re-exports `RouteSpec` from `_types.route` directly.
- All existing tests pass at HEAD.
- Rule-9 audit reports zero violations after the move.
## Scope
### Included
- New file: `src/gps_denied_onboard/_types/route.py` with `RouteSpec` dataclass.
- Modify `src/gps_denied_onboard/replay_input/tlog_route.py` (remove local definition, add import).
- Modify `src/gps_denied_onboard/replay_input/__init__.py` (re-export from `_types.route`).
- Modify `src/gps_denied_onboard/components/c11_tile_manager/route_client.py:56` (the rule-9 fix) plus the docstring snippet at file-top that names the source module.
- Optional hygiene: update 5 test files that import `RouteSpec` from `replay_input.tlog_route` directly (`tests/unit/replay_input/test_tlog_route.py`, `tests/unit/c11_tile_manager/test_route_client.py`, `tests/e2e/replay/_operator_pre_flight.py`, `tests/e2e/replay/test_e2e_orchestrator_unit.py`, `tests/e2e/replay/test_operator_pre_flight_driver.py`) to import from `_types.route` for symmetry.
### Excluded
- `RouteExtractionError` does NOT relocate — it is a `replay_input/`-specific error not imported by any `components/<X>/*.py` file.
- `extract_route_from_tlog` does NOT relocate — extraction logic is a `replay_input/` concern; only the DTO moves.
- No contract document at `_docs/02_document/contracts/shared_types/route.md`.
- No behaviour, performance, or contract-shape changes.
## Acceptance Criteria
| # | Criterion |
|---|-----------|
| AC-1 | `_types/route.py` contains `RouteSpec` with `@dataclass(frozen=True, slots=True)`, identical fields to the original (`waypoints`, `suggested_region_size_meters`, `source_tlog`, `source_segment`, `total_distance_meters`), and the full original docstring. |
| AC-2 | `route_client.py:56` reads `from gps_denied_onboard._types.route import RouteSpec`; rule-9 audit reports zero violations across `components/**/*.py`. |
| AC-3 | `replay_input/tlog_route.py` imports `RouteSpec` from `_types.route`; `extract_route_from_tlog` returns `RouteSpec`; `RouteSpec` is in `__all__` so `from replay_input.tlog_route import RouteSpec` resolves via re-export. |
| AC-4 | `from gps_denied_onboard.replay_input import RouteSpec` resolves to the same class object as `_types.route.RouteSpec` (verified by `is` identity check in test). |
| AC-5 | `pytest tests/unit/replay_input/test_tlog_route.py tests/unit/c11_tile_manager/test_route_client.py` passes — no failures, no skipped tests beyond pre-existing skips. |
## Constraints
- `RouteSpec` MUST remain `frozen=True, slots=True` (AZ-355 AC-2).
- `RouteSpec.__module__` MAY change to `gps_denied_onboard._types.route` (intended observable change; no test asserts on it).
- `from gps_denied_onboard.replay_input import RouteSpec` MUST keep working.
## Risks & Mitigation
**Risk 1 — pickle / serialization break**: confirmed by grep — no `pickle.dumps(route)` exists in `src/` or `tests/`. Risk does not materialize.
**Risk 2 — hidden import grep missed**: producer-side keeps `RouteSpec` in its namespace via re-import + `__all__`; lazy importers using the old path resolve correctly.
## Implementation Notes
- This is the anchor of refactor run 02-az507-routespec-relocation. AZ-846 (module-layout refresh) and AZ-847 (lint widening) are blocked by this task (Jira "Blocks" links recorded).
- After this task lands, run the rule-9 audit script (the widened lint from AZ-847 once it lands) to confirm zero violations.
@@ -0,0 +1,60 @@
# Refresh module-layout.md cycle-3 entries (c11 + replay_input + _types/route)
**Task**: AZ-846_refactor_module_layout_cycle3
**Name**: Refresh `module-layout.md` for cycle-3 file additions and the new `_types/route.py`
**Description**: Resolve cycle-3 cumulative review F2 (Medium Architecture). Update the c11_tile_manager Internal list, the shared/replay_input file list, and the `_types/` section in `module-layout.md` so they match on-disk reality. Cycle-2 carry-overs OUTSIDE these three sections are explicitly out of scope (deferred to a separate doc task).
**Complexity**: 2 SP
**Dependencies**: AZ-845 (the new `_types/route.py` file must exist before this task can register it)
**Component**: `_docs/02_document/module-layout.md` (single file)
**Tracker**: AZ-846 (https://denyspopov.atlassian.net/browse/AZ-846)
**Parent Epic**: AZ-844 (Refactor 02 — RouteSpec relocation + module-layout refresh + AZ-270 lint widening)
Jira AZ-846 is the authoritative spec; this file is the in-workspace mirror.
## Problem
`module-layout.md` is stale. Cycle-3 cumulative review F2 documents:
- **c11_tile_manager Internal list** lists 2 files (`satellite_provider_downloader.py`, `satellite_provider_uploader.py`); on-disk has 8 internal files plus `route_client.py` (cycle-3 NEW from batch 107).
- **shared/replay_input file list** is missing `errors.py` (cycle-2 carry), `tlog_ground_truth.py` (cycle-2 carry), `tlog_route.py` (cycle-3 NEW from batch 106).
- **`_types/` file list** does not yet include `route.py` (added by AZ-845).
`/implement` Step 4 (File Ownership) treats `module-layout.md` as authoritative; staleness BLOCKS any future task touching unregistered areas. F2 is currently Medium; severity escalates to High if a fourth consecutive cycle leaves it stale.
## Outcome
- c11_tile_manager Internal list registers all 8 internals + `route_client.py`.
- shared/replay_input file list registers `errors.py`, `tlog_ground_truth.py`, `tlog_route.py`.
- `_types/` section registers `route.py` with a one-line description matching the convention of other `_types/*.py` entries.
- `git diff` shows additions only to those three sections — no other section, rule, or rule-text edit.
## Scope
### Included
- Append cycle-3 + relevant cycle-2-carry entries to the c11_tile_manager Internal list, the shared/replay_input file list, and the `_types/` section.
### Excluded
- **Cycle-2 carry-overs OUTSIDE these sections**: `replay_api/` Per-Component Mapping entry, `cli/render_map.py`, `cli/replay_api_entrypoint.py`, `helpers/gps_compare.py`, `helpers/accuracy_report.py`. These are recorded in the cycle-3 retrospective and require a separate follow-up doc task with its own AZ ID.
- No code changes.
- No changes to `module-layout.md` rule numbering or rule text. Only the per-section file inventories are updated.
## Acceptance Criteria
| # | Criterion |
|---|-----------|
| AC-1 | c11_tile_manager Internal list contains all 8 existing internals (`_types.py`, `config.py`, `errors.py`, `idempotent_retry.py`, `signing_key.py`, `tile_downloader.py`, `tile_uploader.py`) plus `route_client.py`, alphabetised. |
| AC-2 | shared/replay_input file list adds `errors.py`, `tlog_ground_truth.py`, `tlog_route.py` with one-line descriptions matching the existing convention. |
| AC-3 | `_types/` section includes `route.py` with a one-line description of `RouteSpec` (waypoints + region size + source tlog provenance), identifying its producer (`replay_input/tlog_route.py`) and consumer (`c11/route_client.py`). |
| AC-4 | Diff of `module-layout.md` shows edits to ONLY the three named sections; no edits to other sections, rule numbering, or rule text. |
## Constraints
- Single file modified: `_docs/02_document/module-layout.md`.
- No tests required — documentation update.
- Scope discipline: cycle-2 doc carry-overs outside the three sections remain deferred.
## Risks & Mitigation
**Risk 1 — scope creep into cycle-2 carry-overs**: the Excluded list is explicit; Phase-4 implementer reviews the diff against ACs and rejects entries outside the three named sections at review.
@@ -0,0 +1,61 @@
# Widen test_az270_compose_root lint to enforce full rule-9 allow-list
**Task**: AZ-847_refactor_az270_lint_widening
**Name**: Widen `test_ac6_only_compose_root_imports_concrete_strategies` to enforce the full rule-9 allow-list
**Description**: Resolve cycle-3 cumulative review F3 (Medium Maintainability). Replace the AZ-270 lint's narrow `components → components` check with a full rule-9 allow-list check, so any future cross-component drift is caught at lint time rather than at cumulative-review time. Strict superset of the existing AC-6 check.
**Complexity**: 2 SP
**Dependencies**: AZ-845 (the widened lint must see a clean codebase to pass; running it against pre-AZ-845 HEAD is what AC-4 demonstrates as a one-time verification)
**Component**: `tests/unit/test_az270_compose_root.py` (single file)
**Tracker**: AZ-847 (https://denyspopov.atlassian.net/browse/AZ-847)
**Parent Epic**: AZ-844 (Refactor 02 — RouteSpec relocation + module-layout refresh + AZ-270 lint widening)
Jira AZ-847 is the authoritative spec; this file is the in-workspace mirror.
## Problem
`tests/unit/test_az270_compose_root.py:194-219` (`test_ac6_only_compose_root_imports_concrete_strategies`) walks `src/gps_denied_onboard/components/**/*.py` and flags only edges whose `node.module` starts with `gps_denied_onboard.components.` AND whose leaf-component is not the importer's component. The full rule-9 allow-list (8 prefixes plus `frame_source` interface-only restriction) is NOT enforced. Imports from `replay_input`, `replay_api`, `runtime_root`, `cli/*`, and `frame_source` non-interface modules pass silently. F1 of the cycle-3 cumulative review (the c11 → replay_input edge) is the concrete consequence.
`module-layout.md` rule 9 documents this lint as the enforcement mechanism for the rule. Reviewers reasonably assume the lint covers the documented allow-list; in practice it covers only one of the eight prefixes. The asymmetry is the F3 finding.
## Outcome
- `test_ac6_only_compose_root_imports_concrete_strategies` enforces the full rule-9 allow-list: a `components/<X>/*.py` ImportFrom node is allowed iff the imported module matches one of: `gps_denied_onboard.components.<X>.*` (own component), `gps_denied_onboard._types.*`, `gps_denied_onboard._types.inference_errors`, `gps_denied_onboard.helpers.*`, `gps_denied_onboard.config`, `gps_denied_onboard.logging`, `gps_denied_onboard.fdr_client`, `gps_denied_onboard.clock`, `gps_denied_onboard.frame_source` (interface-only — see Constraints).
- The widened lint is a strict superset of the existing AC-6 narrow check.
- After AZ-845 lands, the widened lint reports zero violations.
- The test docstring cites `module-layout.md` rule 9, not just AZ-270 AC-6.
## Scope
### Included
- Modify `tests/unit/test_az270_compose_root.py` — the `test_ac6_*` test and its docstring.
- Add a small allow-list constant at module scope (single source of truth).
- Verify by `pytest tests/unit/test_az270_compose_root.py` after AZ-845 lands.
### Excluded
- Changes to other tests in the same file.
- Changes to production code.
- The `frame_source` interface-only enforcement: if AST-level disambiguation between interface and non-interface modules within `frame_source/*` is not feasible, allow-list only the explicit interface module path and reject other `frame_source.*` paths. Document in the test docstring.
## Acceptance Criteria
| # | Criterion |
|---|-----------|
| AC-1 | The lint flags any ImportFrom in `components/**/*.py` whose `module` starts with `gps_denied_onboard.` and is NOT in the rule-9 allow-list. |
| AC-2 | Strict superset of the existing AC-6 narrow check — every cross-component edge previously flagged is still flagged. |
| AC-3 | After AZ-845 lands, the widened lint reports zero violations. |
| AC-4 | Against the codebase BEFORE AZ-845 (verified during implementation by running the new lint on a temp checkout of pre-relocation HEAD), the lint produces a failure naming the c11 → replay_input edge and citing rule 9. |
| AC-5 | The test docstring cites `module-layout.md` rule 9 (AZ-507 cross-component contract surface) and lists the allow-list. |
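The allow-list check in AC-1 could look roughly like the sketch below. It is not the actual test body: the helper names are hypothetical, relative imports and plain `import x` statements are skipped, and `frame_source` is treated as a single exact module path, which is only one possible reading of the interface-only restriction.

```python
import ast

ROOT = "gps_denied_onboard"

# Wildcard entries: the package itself or any submodule is allowed.
ALLOWED_PACKAGES = (f"{ROOT}._types", f"{ROOT}.helpers")
# Exact entries. frame_source is assumed to be one interface module here.
ALLOWED_MODULES = frozenset(
    f"{ROOT}.{name}"
    for name in ("config", "logging", "fdr_client", "clock", "frame_source")
)


def is_under(module: str, package: str) -> bool:
    """True if module is the package itself or one of its submodules."""
    return module == package or module.startswith(package + ".")


def find_rule9_violations(source: str, component: str) -> list[str]:
    """Return in-project ImportFrom targets that fall outside the allow-list."""
    own_package = f"{ROOT}.components.{component}"
    violations = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ImportFrom) or node.level or not node.module:
            continue  # relative imports are skipped in this sketch
        module = node.module
        if not is_under(module, ROOT):
            continue  # stdlib and third-party imports are out of scope
        allowed = (
            is_under(module, own_package)
            or module in ALLOWED_MODULES
            or any(is_under(module, pkg) for pkg in ALLOWED_PACKAGES)
        )
        if not allowed:
            violations.append(module)
    return violations


before = "from gps_denied_onboard.replay_input.tlog_route import RouteSpec\n"
after = "from gps_denied_onboard._types.route import RouteSpec\n"
violations_before = find_rule9_violations(before, "c11_tile_manager")
violations_after = find_rule9_violations(after, "c11_tile_manager")
```

Under these assumptions the pre-relocation import is reported and the post-relocation import is not, which mirrors the AC-3 / AC-4 pair. An import of another component's package is also reported, so the narrow AC-6 behaviour is retained.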
## Constraints
- `frame_source` interface-only requirement: if AST-level disambiguation is not feasible, allow-list only the explicit interface module path. Document the chosen disambiguation strategy in the test docstring. Surface to user if the documented intent and codebase reality disagree.
- The existing test name MAY remain (preserves AZ-270 audit trail) or be renamed; if renamed, update `module-layout.md` rule 9's enforcement-citation.
- Single file modified: `tests/unit/test_az270_compose_root.py`. No production source change.
## Risks & Mitigation
**Risk 1 — widening exposes another rule-9 violation**: STOP-and-surface protocol. The implement skill MUST stop and present the additional violation as a scope-decision Choose to the user, NOT auto-bundle into this task. Remediation of any newly-exposed violation is a separate AZ ticket.
**Risk 2 — false positive on `gps_denied_onboard.frame_source` non-interface module**: documented disambiguation strategy in the test docstring. If wrong, the failure surfaces as a deterministic test failure, not silent drift; surface to user.
@@ -0,0 +1,175 @@
# Batch 108 — Cycle 3 — AZ-839 operator_pre_flight_setup real fixture
**Date**: 2026-05-23
**Tasks**: AZ-839 (C3 — Epic AZ-835).
**Story points**: 5.
**Jira status**: AZ-839 → In Progress (transitioned at batch start);
moves to In Testing at commit step.
## What shipped
Third building block of Epic AZ-835. Replaces the placeholder
`operator_pre_flight_setup` pytest fixture (the previous `mkdir`
stub at `tests/e2e/replay/conftest.py:293-310`) with a real
driver that wires C1+C2+C11+C10 end-to-end:
1. **C1 RouteSpec** — extracted from the Derkachi tlog via AZ-836's
`extract_route_from_tlog` (the existing `derkachi_replay_inputs`
session fixture supplies the tlog path; the new fixture chains
off that contract).
2. **C2 SatelliteProviderRouteClient**.`seed_route(spec)` with the
bounded transient-retry ladder documented in AZ-839 AC-5.
Validation / terminal failures propagate unchanged (AC-4).
3. **C11 HttpTileDownloader**.`download_tiles_for_area(request)`
over a bbox derived from the route waypoints (mirrors C2's
internal `_enumerate_route_tile_coords` envelope without
importing the private helper).
4. **C10 DescriptorBatcher**.`populate_descriptors(corpus_filter)`
builds the FAISS HNSW index over the populated C6 cache. The
AZ-306 sidecar triple-consistency is verified by re-loading the
index through a caller-supplied `descriptor_index_factory` after
the rebuild — any tampering surfaces as `IndexUnavailableError`
(AC-6).
5. **Cleanup-on-failure** — partial sidecar files written by the
driver are removed if any step raises, while pre-existing warm
cache files are preserved (AC-7).
Algorithm (`populate_c6_from_route`) is exposed through pure
dependency injection so the AC-8 unit tests run against stubs and
the AC-9 integration test runs the same algorithm against real
collaborators on the Jetson harness.
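The retry behaviour in step 2 can be sketched as follows. This is an illustration under assumptions, not the driver's code: the three exception classes are local stand-ins for the real ones, the backoff values are hypothetical, and `sleep` is injectable so tests need not wait.

```python
import time
from collections.abc import Callable, Sequence


class RouteValidationError(Exception):
    """Stand-in: bad input, retrying cannot help."""


class RouteTerminalFailureError(Exception):
    """Stand-in: the provider gave up, retrying cannot help."""


class RouteTransientError(Exception):
    """Stand-in: a failure that may succeed on retry."""


def seed_route_with_retry(
    seed_route: Callable[[object], object],
    spec: object,
    backoff_schedule_s: Sequence[float] = (1.0, 2.0),  # hypothetical values
    sleep: Callable[[float], None] = time.sleep,
) -> object:
    """Call seed_route up to len(backoff_schedule_s) + 1 times."""
    attempts = len(backoff_schedule_s) + 1
    for attempt in range(attempts):
        try:
            # Validation and terminal errors are not caught, so they
            # propagate unchanged on the first attempt.
            return seed_route(spec)
        except RouteTransientError:
            if attempt == attempts - 1:
                raise  # exhausted: propagate the last attempt's error
            sleep(backoff_schedule_s[attempt])
    raise AssertionError("unreachable")


calls = []


def flaky(spec: object) -> str:
    """Fail twice with a transient error, then succeed."""
    calls.append(spec)
    if len(calls) < 3:
        raise RouteTransientError("try again")
    return "seeded"


slept = []
result = seed_route_with_retry(flaky, "spec", sleep=slept.append)
```

With a two-entry schedule the call is attempted three times in total, matching the "3 attempts, backoff" wording of AC-5.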
## Files changed
Tests / fixtures (4):
- `tests/e2e/replay/_operator_pre_flight.py` (new, ~430 lines) —
the AZ-839 driver: `PopulatedC6Cache` dataclass +
`populate_c6_from_route()` + private helpers
(`_seed_route_with_retry`, `_route_bbox`,
`_cleanup_partial_sidecars`).
- `tests/e2e/replay/conftest.py` — replaces the placeholder fixture
with the real `operator_pre_flight_setup` (session-scoped,
skip-gated by `RUN_REPLAY_E2E` + `SATELLITE_PROVIDER_URL` +
`SATELLITE_PROVIDER_API_KEY` + `BUILD_FAISS_INDEX` +
`GPS_DENIED_OPERATOR_CONFIG_PATH`); adds five private helpers
(`_operator_pre_flight_skip_reason`,
`_build_operator_pre_flight_cache`,
`_build_replay_backbone_embedder`,
`_resolve_replay_descriptor_dim`, `_default_tile_decoder`).
- `tests/e2e/replay/test_operator_pre_flight_driver.py` (new,
~410 lines) — 11 unit tests exercising AC-3 / AC-4 / AC-5 / AC-6
/ AC-7 against stubbed `SatelliteProviderRouteClient` /
`HttpTileDownloader` / `DescriptorBatcher` /
`descriptor_index_factory`.
- `tests/e2e/replay/test_operator_pre_flight_integration.py` (new,
~40 lines) — Tier-2 + RUN_REPLAY_E2E gated test that consumes the
fixture and asserts the `PopulatedC6Cache` invariants (AC-9
pytest entry point).
Tracker docs (1):
- `_docs/03_implementation/batch_108_cycle3_report.md` (this file).
No production-code (`src/gps_denied_onboard/**`) modifications.
The driver lives under `tests/` because AZ-839's outcome is the
fixture, not a new operator-binary surface; the wiring it does is
the existing operator-side runtime factories
(`runtime_root.c10_factory`, `runtime_root.c11_factory`,
`runtime_root.storage_factory`, `runtime_root.inference_factory`)
already shipped under prior epics.
## AC coverage
| AC | Test(s) | Status |
|----|---------|--------|
| AC-1 cold first invocation ≤ 5 min | exercised on Tier-2 via AC-9 integration test; `PopulatedC6Cache.elapsed_seconds` instruments the budget | DEFERRED (Tier-2 only) |
| AC-2 warm invocation ≤ 30 s | same gated test, re-invocation within session reuses the named-volume mount | DEFERRED (Tier-2 only) |
| AC-3 populated cache + sidecar triple | `test_populate_c6_from_route_returns_populated_cache` + `test_populate_c6_from_route_passes_sector_class_to_downloader` | PASS |
| AC-4 validation/terminal propagate | `test_route_validation_error_propagates_unchanged` + `test_route_terminal_failure_propagates_unchanged` | PASS |
| AC-5 transient retry ladder (3 attempts, backoff) | `test_route_transient_error_retries_then_succeeds` + `test_route_transient_error_exhausted_propagates_last_attempt` | PASS |
| AC-6 tamper detection → `IndexUnavailableError` | `test_descriptor_index_factory_index_unavailable_propagates` | PASS |
| AC-7 cleanup on failure (no half-built sidecars) | `test_cleanup_removes_partial_sidecar_files_on_failure` + `test_cleanup_preserves_pre_existing_warm_cache` + `test_batcher_failure_propagates_and_cleans_up` + `test_downloader_failure_propagates_and_cleans_up` | PASS |
| AC-8 unit tests with stubs (happy / transient / terminal / validation / tamper / cleanup) | 11 tests in `test_operator_pre_flight_driver.py` | PASS |
| AC-9 integration on Jetson via fixture | `test_operator_pre_flight_setup_produces_populated_cache` (RUN_REPLAY_E2E + tier2 gated) | DEFERRED (Tier-2 only) |
DEFERRED ACs (AC-1, AC-2, AC-9) execute on the Jetson e2e harness
when `RUN_REPLAY_E2E=1` + `SATELLITE_PROVIDER_URL` +
`SATELLITE_PROVIDER_API_KEY` + `BUILD_FAISS_INDEX=ON` +
`GPS_DENIED_OPERATOR_CONFIG_PATH` are set. The pytest entry point
exists and skips explicitly per `.cursor/skills/implement/SKILL.md`
Step 8 ("a skipped test counts as Covered").
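The AC-7 contract in the table above (remove partial sidecars, preserve a warm cache) can be sketched as below. The helper name and signature are hypothetical and the real `_cleanup_partial_sidecars` may be structured differently; the idea shown is to snapshot which sidecar paths exist before the build and to delete only the ones that appeared afterwards.

```python
import tempfile
from collections.abc import Callable, Iterable
from pathlib import Path


def run_with_sidecar_cleanup(
    sidecar_paths: Iterable[Path], build: Callable[[], None]
) -> None:
    """Run build(); on failure delete only sidecars that build() created."""
    paths = list(sidecar_paths)
    pre_existing = {p for p in paths if p.exists()}
    try:
        build()
    except Exception:
        for path in paths:
            if path not in pre_existing:
                path.unlink(missing_ok=True)
        raise


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    warm = root / "descriptor.index"  # pre-existing warm-cache file
    warm.write_bytes(b"warm")
    partial = root / "descriptor.index.sha256"  # written during the build

    def failing_build() -> None:
        partial.write_text("half-written")
        raise RuntimeError("batcher failed")

    try:
        run_with_sidecar_cleanup([warm, partial], failing_build)
        raised = False
    except RuntimeError:
        raised = True

    warm_survived = warm.exists()
    partial_removed = not partial.exists()
```

Only the paths handed to the helper are ever unlinked, which keeps the cleanup inside the cache tree.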
## Test run results
```
$ .venv/bin/pytest tests/e2e/replay/test_operator_pre_flight_driver.py -v --tb=short
============================== 11 passed in 0.33s ==============================
$ .venv/bin/pytest tests/e2e/replay/test_operator_pre_flight_integration.py -v --tb=short
============================== 1 skipped in 0.29s ==============================
(SKIPPED — Tier-2-only test; set GPS_DENIED_TIER=2 to run)
$ .venv/bin/pytest tests/e2e/replay/ -v --tb=short --timeout=60
====================== 28 passed, 8 skipped in 1.14s =======================
```
Suite-wide test run is deferred to Step 11 (Run Tests) per the
iterative-skill exception in `.cursor/rules/coderule.mdc` — batch
108 is a batch, not the end of cycle-3 implementation.
## Code review (self-review)
Per `.cursor/rules/no-subagents.mdc`, the structured `/code-review`
skill is run inline. Verdict: **PASS_WITH_WARNINGS**.
| Phase | Result |
|-------|--------|
| 1. Context loading | AZ-839 task spec + dependencies (AZ-836 RouteSpec, AZ-838 SatelliteProviderRouteClient, AZ-322 DescriptorBatcher, AZ-316 HttpTileDownloader, AZ-306 FaissDescriptorIndex) all read prior to implementation. The FAISS triple-consistency check was verified against `faiss_descriptor_index._load()` source. |
| 2. Spec compliance | AC-3 / AC-4 / AC-5 / AC-6 / AC-7 / AC-8 directly covered. AC-1 / AC-2 / AC-9 deferred to Tier-2 harness (gated tests exist). **No Medium / High findings.** |
| 3. Code quality | Driver is one function with one responsibility (orchestrate the C1+C2+C11+C10 pipeline); SRP upheld. Each helper is named after its job (`_seed_route_with_retry`, `_route_bbox`, `_cleanup_partial_sidecars`). Functions ≤ ~80 lines. Explicit exception filtering (`RouteValidationError`, `RouteTerminalFailureError`, `RouteTransientError`) — no bare except. Tests follow Arrange/Act/Assert with comment markers per `coderule.mdc`. |
| 4. Security quick-scan | JWT consumed via env-sourced kwargs, never logged. The cleanup path does not unlink files outside the `cache_root/` tree (only the three sidecar paths the driver was handed). |
| 5. Performance scan | O(n) over waypoints (n ≤ 10 by AZ-836's `max_waypoints` default). No new N+1. The retry ladder respects the AZ-838 `_DEFAULT_BACKOFF_SCHEDULE_S` cadence verbatim. |
| 6. Cross-task consistency | Single-task batch — N/A. |
| 7. Architecture compliance | `_operator_pre_flight.py` lives under `tests/e2e/replay/` (test infrastructure). Imports only from C10 / C11 / C6 public surfaces and from `replay_input.tlog_route.RouteSpec` (Adapter layer per `module-layout.md`). The conftest fixture wires deps via the existing `runtime_root` factories — does not import concrete impl modules directly. No cross-component imports between C-prefixed components. No new cyclic dependencies. ADR check skipped (no ADRs directory). |
### Findings
**F1 (Low) — `_default_tile_decoder` lives in conftest.py**
`_default_tile_decoder` (JPEG → CHW float32 numpy) lives in the
test conftest. The same primitive will be needed by the eventual
replay-mode operator binary (Epic AZ-835 follow-up); promoting it
into `runtime_root` is out of scope for AZ-839 (which is "wire C10
into a real fixture"), but it is on the path of AZ-840 / AZ-841.
**Recommendation**: leave as-is for AZ-839; revisit during AZ-840.
**F2 (Low) — `_resolve_replay_descriptor_dim` is NetVLAD-only**
The NetVLAD descriptor dim resolver pinned at `c2_vpr/config.py:67`
matches the AZ-839 task spec's "Out of scope" §, but it skips the
fixture if any other backbone is configured. **Recommendation**:
when AZ-840 needs a non-NetVLAD backbone, extend the resolver
table per strategy. Tracking via the AZ-840 spec is sufficient.
### Deltas vs. spec
No behavioural deltas; one naming difference. The task spec mentions
`download_for_bbox`, but the actual production method is
`download_tiles_for_area` (a `bbox`-aware single-zoom request via
`DownloadRequest`). The spec was informal on the method name, so the
production API (stable since AZ-316) was honoured.
## Notes for follow-up
- AZ-840 (e2e orchestrator test) consumes this fixture. The
fixture already returns a typed `PopulatedC6Cache` so AZ-840 has
a concrete contract to assert against.
- AZ-841 (un-xfail AZ-777 Tier-2 tests) builds on AZ-839 + AZ-840.
The existing `test_ac8_operator_workflow` skip reason in
`test_derkachi_1min.py` (D-PROJ-2 mock-suite-sat-service) is
stale post-AZ-839 — AZ-841 will rewrite it to consume the new
fixture.
- AZ-842 (docs — replay_protocol.md Invariant 12 + architecture +
orchestrator README) describes the route-driven flow this batch
ships.
@@ -0,0 +1,179 @@
# Batch 108b — Cycle 3 — AZ-839 conftest path-mismatch fix
**Date**: 2026-05-23
**Tasks**: AZ-839 (C3 — Epic AZ-835).
**Story points**: 0 (defect fix on top of the AZ-839 batch 108
ship; counts under the existing 5 SP envelope).
**Jira status**: AZ-839 reopened (In Testing → In Progress) at the
start of this batch on the 2026-05-23 self-review finding;
re-transitions to In Testing at commit step.
## Why this batch exists
The AZ-839 batch 108 self-review verdict was PASS_WITH_WARNINGS
based on 11 driver unit tests + 28 replay-suite passes. While
reading the C3 fixture to plan the AZ-840 orchestrator, a real
path-mismatch defect surfaced that **AC-3 / AC-6 unit tests
could not catch** because every unit test stubs the
`descriptor_index_factory`. The defect was not introduced by
batch 108b — it was missed by batch 108's self-review and would
have failed the AC-9 Tier-2 integration test on first execution.
Per `meta-rule.mdc` "Real Results, Not Simulated Ones" the work
was paused before any AZ-840 code was written, the user was given
a Choose A/B/C/D, and option A (reopen AZ-839, fix, recommit) was
selected.
## The defect
In `tests/e2e/replay/conftest.py::_build_operator_pre_flight_cache`:
* `tile_store = build_tile_store(config)` constructed a
`PostgresFilesystemStore` whose filesystem root came from
`config.components["c6_tile_cache"].root_dir` — i.e. the static
YAML path baked into the operator config (default
`/var/lib/gps-denied/tiles`).
* `descriptor_index = build_descriptor_index(config)` constructed
a `FaissDescriptorIndex` at
`<config.root_dir>/descriptor.index`.
* `_descriptor_index_factory()` (the AC-3 / AC-6 verifier seam)
constructed a SEPARATE `FaissDescriptorIndex` at
`cache_root / "descriptor.index"` — the freshly-mktemp'd
fixture path.
* On Tier-2 those two paths cannot be equal: `cache_root` is
generated at test time by `tmp_path_factory`; the static YAML
carries a path that is fixed at config-load time.
* Result: `descriptor_batcher.populate_descriptors()` writes the
rebuilt FAISS triple under the static YAML root; the verifier
then opens `cache_root/descriptor.index` and finds nothing,
raising `IndexUnavailableError` from `FaissDescriptorIndex._load`.
The fixture would have failed to ever yield a `PopulatedC6Cache`
on Tier-2 — AC-3 (paths populated) and AC-6 (sidecar coherence)
both unreachable.
The same shape applied to the tile filesystem: `tile_store_path =
cache_root / "tile_store"` did not match the actual
`PostgresFilesystemStore` layout (`<root_dir>/tiles/`).
## The fix
`_build_operator_pre_flight_cache` now overrides the in-memory
`c6_tile_cache` config block so the production C6 components and
the verifier all read/write under the fixture's `cache_root`:
```python
c6_block = config.components["c6_tile_cache"]
c6_block_overridden = dataclasses.replace(
    c6_block,
    root_dir=str(cache_root),
    faiss_index_path="",  # force fallback to <root_dir>/descriptor.index
)
config = dataclasses.replace(
    config,
    components={**config.components, "c6_tile_cache": c6_block_overridden},
)
tile_store_path = cache_root / "tiles"
faiss_index_path = cache_root / "descriptor.index"
```
After the override:
* `build_tile_store(config)` writes under `cache_root/tiles/`.
* `build_descriptor_index(config)` rebuilds at
`cache_root/descriptor.index` (+ `.sha256` + `.meta.json`).
* `_descriptor_index_factory()` reads from the same
`cache_root/descriptor.index` — triple-consistency check now has
files to validate.
* `PopulatedC6Cache.tile_store_path` matches the
`PostgresFilesystemStore.__init__` layout (`self._tiles_dir =
self._root_dir / "tiles"`); the integration test's
`populated.tile_store_path.is_dir()` assertion will hold.
The existing operator-config YAML stays unchanged — the override
is in-memory, scoped to the fixture session, and never touches the
disk file the operator wrote.
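To show why the override leaves the loaded config untouched, here is a self-contained sketch. `C6Block` and `Config` are hypothetical frozen stand-ins for the real config types, and the original `faiss_index_path` value is invented for illustration.

```python
import dataclasses
from pathlib import Path


@dataclasses.dataclass(frozen=True)
class C6Block:  # hypothetical stand-in for the c6_tile_cache block
    root_dir: str
    faiss_index_path: str


@dataclasses.dataclass(frozen=True)
class Config:  # hypothetical stand-in for the operator config
    components: dict


def with_cache_root(config: Config, cache_root: Path) -> Config:
    """Return a copy of config whose C6 block points at cache_root."""
    block = dataclasses.replace(
        config.components["c6_tile_cache"],
        root_dir=str(cache_root),
        faiss_index_path="",
    )
    # A new components dict is built, so the original mapping is not edited.
    return dataclasses.replace(
        config, components={**config.components, "c6_tile_cache": block}
    )


static = Config(
    components={
        "c6_tile_cache": C6Block(
            root_dir="/var/lib/gps-denied/tiles",
            faiss_index_path="/var/lib/gps-denied/tiles/descriptor.index",
        )
    }
)
effective = with_cache_root(static, Path("/tmp/cache_root"))
```

`static` still carries the YAML-derived path afterwards, while `effective` is what the fixture would hand to the factories.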
## Files changed
* `tests/e2e/replay/conftest.py` — added `import dataclasses`;
added the c6_tile_cache override block + comment in
`_build_operator_pre_flight_cache`; changed
`tile_store_path = cache_root / "tile_store"` to
`cache_root / "tiles"` to match `PostgresFilesystemStore` layout;
removed the unused `tile_store_path.mkdir(...)` (the store's
constructor creates it).
No driver, unit-test, or integration-test changes. The driver's
public API (`populate_c6_from_route`, `PopulatedC6Cache`) is
unchanged.
## AC coverage delta
The minimal fix moves AC-3 (paths populated) and AC-6 (sidecar
coherence) from "would have failed on Tier-2" to "actually
verifiable on Tier-2". No AC was previously claimed PASS that
this batch downgrades.
## Test run results
```
$ .venv/bin/pytest tests/e2e/replay/ -v --tb=short --timeout=60
============================ 28 passed, 9 skipped in 3.08s ===========================
```
Same pass count as batch 108 (28 passed). The unit suite is path-agnostic (every
test in `test_operator_pre_flight_driver.py` injects its own
paths through `_build_harness`) so the fix has no observable
effect on the green path. The 9 skipped tests are
RUN_REPLAY_E2E + Tier-2 gated; they will exercise the fix on the
Jetson harness when AZ-839's AC-9 integration test next runs.
## Code review (self-review of batch 108b)
Verdict: **PASS** (single-finding fix; no new findings).
| Phase | Result |
|-------|--------|
| 1. Context loading | Re-read `storage_factory.py` + `postgres_filesystem_store.py` + `faiss_descriptor_index.py` to confirm where `root_dir` / `faiss_index_path` are honoured. |
| 2. Spec compliance | AZ-839 AC-3 / AC-6 are now reachable on Tier-2; AC-9 entry point unchanged. |
| 3. Code quality | Comment names the failure mode the override prevents. `dataclasses.replace` is used twice rather than mutating frozen dataclasses. The new `tile_store_path` matches the production layout exactly. |
| 4. Security quick-scan | The override only changes paths; no DSN, JWT, or env-secret handling moved. |
| 5. Performance scan | No-op — the override runs once per session, before any heavy I/O. |
| 6. Cross-task consistency | Single-defect batch — N/A. |
| 7. Architecture compliance | The fixture stays in `tests/`; mutating `config.components` is a documented composition-root pattern (see `Config.with_blocks`). No new src/ writes. |
## Self-review meta — why batch 108 missed this
The batch 108 self-review went through all 7 review phases but
relied on the unit-test pass count for AC-3 / AC-6 confidence.
Every unit test injected its own `descriptor_index_factory`, so
the fixture's wiring of that factory to `cache_root` was never
exercised against the real production wiring of `descriptor_index`
to `config.root_dir`. Phase 7 (Architecture compliance) noted
"the conftest fixture wires deps via the existing `runtime_root`
factories — does not import concrete impl modules directly" but
did not check that the wiring was internally consistent.
Preventive lesson (no rule change yet — surfacing for AZ-840
follow-up): **when a fixture wires production components from a
config and ALSO constructs a side verifier from a different
source of truth, the two paths must be derived from a single
upstream value or asserted equal at fixture-setup time.** This
goes into the AZ-839 leftover note for AZ-840 to act on or to
escalate to a `coderule.mdc` rule update.
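One way the "asserted equal at fixture-setup time" half of the lesson could be expressed is sketched below. The function name is hypothetical, and the `descriptor.index` file name is taken from the defect description above.

```python
from pathlib import Path


def assert_single_source_of_truth(config_root_dir: str, cache_root: Path) -> Path:
    """Fail fast if production wiring and the verifier disagree on paths."""
    production_index = Path(config_root_dir) / "descriptor.index"
    verifier_index = cache_root / "descriptor.index"
    if production_index != verifier_index:
        raise AssertionError(
            f"index path mismatch: production writes {production_index}, "
            f"verifier reads {verifier_index}"
        )
    return verifier_index


# After the override both sides derive from cache_root, so the check passes.
ok_path = assert_single_source_of_truth("/tmp/cache_root", Path("/tmp/cache_root"))

# The pre-fix wiring (static YAML root vs. tmp cache_root) is rejected.
try:
    assert_single_source_of_truth("/var/lib/gps-denied/tiles", Path("/tmp/cache_root"))
    mismatch_detected = False
except AssertionError:
    mismatch_detected = True
```

A check of this kind needs no Tier-2 hardware, so it would have surfaced the batch 108 defect at fixture setup rather than at the first Jetson run.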
## Notes for follow-up
* AZ-840 (e2e orchestrator test) — this batch unblocks AZ-840
AC-3 (which hard-depends on the C3 fixture producing a usable
cache). AZ-840 will additionally need to feed the airborne
replay binary a config that points at the same `cache_root`
(the binary takes a single `--config <path>` and cannot read
the in-memory mutation); the cleanest path is for AZ-840 to
write an effective YAML at runtime from the same override
recipe used here. AZ-840's batch report will record the choice.
* AZ-839's batch 108 self-review process is being noted as a
partially-effective gate. No `coderule.mdc` rule change yet —
the `meta-rule.mdc` "Real Results" rule already covers the
general case; AZ-840's planning will check whether a more
specific fixture-vs-config-wiring rule is warranted.
@@ -0,0 +1,171 @@
# Batch 109 — Cycle 3 — AZ-840 e2e orchestrator test
**Date**: 2026-05-23
**Tasks**: AZ-840 (C4 — Epic AZ-835).
**Story points**: 3 (per the task spec).
**Jira status**: AZ-840 In Progress → In Testing at commit step.
## Why this batch exists
Epic AZ-835 (real-flight e2e validation) needs a single Tier-2
test that proves the 7-step pipeline runs from
`(tlog, video, calibration)` to a horizontal-error verdict
without operator hand-curation between steps. Steps 3-5 were
delivered by AZ-839 (C3 — `operator_pre_flight_setup`); steps
1-2-6-7 are this batch.
The AZ-839 batch 108b follow-up note explicitly anticipated this
batch: "AZ-840 will additionally need to feed the airborne
replay binary a config that points at the same `cache_root`
... the cleanest path is for AZ-840 to write an effective YAML
at runtime from the same override recipe used here."
## What this batch ships
A driver module + unit test suite + Tier-2 integration test:
* `tests/e2e/replay/_e2e_orchestrator.py` — wraps the AZ-699
verdict-report path with the AZ-839 C3 fixture's
`PopulatedC6Cache`. Public surface:
* `OrchestratorStep` enum — failure-step labels per AC-5.
* `OrchestrationFailure(step, message)` exception — wraps
every step failure with the step name in the message prefix.
* `OrchestrationReport` dataclass — verdict, distribution,
paths, wall-clock measurements per AC-4.
* `write_effective_replay_config` — small helper that overlays
`c6_tile_cache.root_dir` onto the static operator YAML.
* `read_calibration_acquisition_method` — mirror of AZ-699's
helper so the report writer keeps the same shape.
* `run_e2e_orchestration` — the AC-1 entry point wiring
validate → write_config → airborne subprocess → parse JSONL
→ load tlog GT → compute distribution → render report.
* `tests/e2e/replay/test_e2e_orchestrator_unit.py` — 17 unit
tests covering each of the 7 steps' failure modes plus the
happy path. The runner is injected (`subprocess.run` default)
so unit tests stage synthetic JSONL output without touching
the airborne binary. `load_tlog_ground_truth` is monkeypatched
to return a synthetic 3-row series.
* `tests/e2e/replay/test_az835_e2e_real_flight.py::test_az840_e2e_real_flight_orchestration` — Tier-2 + RUN_REPLAY_E2E
gated test that consumes the C3 fixture + Derkachi inputs and
asserts the verdict markdown is written, the threshold-hit
share table is present, and the 15-min budget held.
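The runner injection described above can be sketched as follows. This is an illustration under stated assumptions, not the contents of `_e2e_orchestrator.py`: `run_replay` and `make_fake_runner` are hypothetical names, and the argv passed in is whatever the caller chooses.

```python
import subprocess
from pathlib import Path
from typing import Callable, Sequence

# Hypothetical runner type: any callable with the subprocess.run call shape
# for the keyword arguments used below.
Runner = Callable[..., subprocess.CompletedProcess]


def run_replay(
    argv: Sequence[str],
    *,
    timeout_s: float = 900.0,
    runner: Runner = subprocess.run,
) -> subprocess.CompletedProcess:
    """Invoke the replay command through an injectable runner."""
    return runner(list(argv), capture_output=True, text=True, timeout=timeout_s)


def make_fake_runner(jsonl_path: Path, lines: Sequence[str]) -> Runner:
    """Build a runner that stages synthetic JSONL instead of spawning a process."""

    def fake_runner(argv, **kwargs):
        # Write the output the real binary would have produced.
        jsonl_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
        return subprocess.CompletedProcess(
            args=argv, returncode=0, stdout="", stderr=""
        )

    return fake_runner
```

Because the default is `subprocess.run`, production-style callers pass nothing, while a unit test passes `runner=make_fake_runner(...)` and never touches the airborne binary.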
## AC coverage
| AC | Description | Coverage |
|-----|----------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------|
| AC-1| Steps 1-7 end-to-end on Tier-2 from a fresh tlog/video | `test_az840_e2e_real_flight_orchestration` (Tier-2-gated); 17 unit tests prove the orchestrator structure |
| AC-2| Verdict report exists either PASS or FAIL | `test_run_e2e_orchestration_writes_report_even_on_fail_verdict` + integration assertion `report_path.is_file()` |
| AC-3| Reuses C3 fixture (`operator_pre_flight_setup`) | Integration test consumes the fixture; effective config overlay points at `populated_cache.cache_root` |
| AC-4| 15-min wall-time soft target on the Derkachi clip | `_DEFAULT_MAX_SECONDS = 900.0` passed as `subprocess.run` `timeout`; integration asserts `replay_subprocess_seconds <= 900`|
| AC-5| Mid-pipeline failure fails LOUD with a clear step prefix | `OrchestratorStep` enum + 8 step-specific failure unit tests (`validate`/`write_config`/`airborne` × 3/`parse` × 2/`gt`) |
| AC-6| Gated by `RUN_REPLAY_E2E=1` + Tier-2 marker | `_orchestrator_skip_reason()` checks env vars + binary + video size; `@pytest.mark.tier2` decorator |
| AC-7| AZ-699 verdict test continues to pass | No changes to `test_derkachi_real_tlog.py`; same `real_flight_validation_<date>.md` report path convention |
| AC-8| Unit-tested orchestration helper without Tier-2 inputs | 17 unit tests covering config write (4) + calibration parse (3) + run helper (10) — all use mocked subprocess + GT loader |
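For AC-6, a gate of the kind the table describes might look like the sketch below. The function name, parameters, and reason strings are assumptions for illustration; only the three checks (env var, binary, video size) come from the table.

```python
from pathlib import Path
from typing import Mapping, Optional


def orchestrator_skip_reason(
    env: Mapping[str, str],
    replay_binary: Optional[Path],
    video_path: Path,
    min_video_bytes: int = 1,
) -> Optional[str]:
    """Return a human-readable skip reason, or None when the test may run."""
    if env.get("RUN_REPLAY_E2E") != "1":
        return "RUN_REPLAY_E2E=1 not set"
    if replay_binary is None or not replay_binary.is_file():
        return "replay binary not found"
    if not video_path.is_file() or video_path.stat().st_size < min_video_bytes:
        return "video missing or too small"
    return None
```

A test would typically call this once at the top and hand a non-`None` result to `pytest.skip(reason)`, so the skip message names the first unmet precondition.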
## Test run results
```
$ .venv/bin/pytest tests/e2e/replay/ -v --tb=short --timeout=60
============================ 45 passed, 10 skipped, 3 warnings in 0.78s ============
```
Breakdown:
* 17 new orchestrator unit tests pass.
* 11 AZ-839 driver unit tests still pass (no driver changes).
* 14 helper unit tests (`test_helpers.py`) still pass.
* 3 derkachi-1min mode-agnostic AST tests still pass.
* 10 skips: 1 new Tier-2 (this AZ-840 integration), 6
RUN_REPLAY_E2E gated AZ-404 cases, 1 AC-8 D-PROJ-2 placeholder,
1 Tier-2 AZ-699, 1 Tier-2 AZ-839 integration. None are
regressions; the tier2 gate trips off-Jetson.
## Design notes
### `--auto-trim` ownership
The orchestrator passes `--auto-trim` unconditionally so AZ-405 /
AZ-698 active-flight-cut + tlog/video sync (Epic step 1) runs
inside the airborne binary every time. The Epic narrative does
not separate trim from the airborne pipeline; collapsing them
into a single subprocess invocation matches AZ-699 and avoids
duplicating the trim path.
### `clip_duration_s` parity with AZ-699
`run_e2e_orchestration` computes
`clip_duration_s = ground_truth[-1].t_s - ground_truth[0].t_s`
exactly as `test_derkachi_real_tlog.py` does. This means both
verdict reports name the same clip duration even when the
trimmed video is shorter than the ground-truth window — a
deliberate choice: the report header documents what the verdict
covers, not what the binary processed.
### Effective config write — single source of truth
`write_effective_replay_config` materialises the same override
recipe AZ-839 uses in-memory, but on disk so the airborne
subprocess sees the cache_root the fixture chose. Field-level
merge: every other block in the operator YAML is preserved
verbatim; only `c6_tile_cache.root_dir` and
`c6_tile_cache.faiss_index_path` are overwritten. The static
operator YAML on disk is never touched.
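A minimal sketch of the field-level merge, assuming the operator YAML has already been parsed into a plain dict; loading and dumping the YAML are omitted. `overlay_tile_cache` is a hypothetical name, not the signature of `write_effective_replay_config`.

```python
import copy
from typing import Any, Dict


def overlay_tile_cache(
    operator_config: Dict[str, Any],
    *,
    root_dir: str,
    faiss_index_path: str,
) -> Dict[str, Any]:
    """Return a new config with only the two c6_tile_cache fields replaced."""
    # Deep copy so the caller's dict (the static operator config) is untouched.
    effective = copy.deepcopy(operator_config)
    block = effective.setdefault("c6_tile_cache", {})
    block["root_dir"] = root_dir
    block["faiss_index_path"] = faiss_index_path
    return effective
```

Replacing two keys rather than the whole `c6_tile_cache` block is what keeps sibling keys inside that block, and every other block, verbatim.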
### Failure surface = step prefix
`OrchestrationFailure` always prefixes its message with
`[<step>]`. CI log scrapers and pytest's traceback printer both
surface the prefix on the first line; AC-5 ("clear error
pointing at the failing step") holds without requiring the test
to inspect the exception object. The step is also exposed as
`exc.step` for programmatic assertions.
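The prefix convention can be sketched as below. The enum member names are illustrative (taken from the step labels in the AC-5 row of the coverage table); the real `OrchestratorStep` may define more members or different spellings.

```python
from enum import Enum


class OrchestratorStep(Enum):
    """Step labels; an illustrative subset, not the full production enum."""

    VALIDATE = "validate"
    WRITE_CONFIG = "write_config"
    AIRBORNE = "airborne"
    PARSE = "parse"
    GT = "gt"


class OrchestrationFailure(RuntimeError):
    """Step failure whose message always starts with "[<step>]"."""

    def __init__(self, step: OrchestratorStep, message: str) -> None:
        self.step = step  # exposed for programmatic assertions
        super().__init__(f"[{step.value}] {message}")
```

A unit test can then assert on either surface: `str(exc).startswith("[airborne]")` for what a log scraper sees, or `exc.step is OrchestratorStep.AIRBORNE` for a structural check.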
## Files changed
* `tests/e2e/replay/_e2e_orchestrator.py` (new, 656 LOC).
* `tests/e2e/replay/test_e2e_orchestrator_unit.py` (new, 660+ LOC).
* `tests/e2e/replay/test_az835_e2e_real_flight.py` (new, 156 LOC).
No `src/` changes, no operator-config YAML changes, no AZ-839
driver changes. AZ-840 is purely additive at the test layer.
## Code review (self-review)
Verdict: **PASS_WITH_WARNINGS**.
| Phase | Result |
|-------|--------|
| 1. Context loading | Re-read `gps_compare.py`, `accuracy_report.py`, `replay_input.py`, `cli/replay.py`, `test_derkachi_real_tlog.py`. Emission schema (`emitted_at`, `position_wgs84`) is the same shape `gps-denied-replay` writes. |
| 2. Spec compliance | All 8 AZ-840 ACs covered; AC-7 holds by inspection (no AZ-699 changes). |
| 3. Code quality | All public types have docstrings; failure messages name the upstream exception via `repr` so `OSError` / `subprocess.TimeoutExpired` carry through. Runner kw-args mirror `subprocess.run` signature 1:1. |
| 4. Security quick-scan | Effective config write goes to a tmp file the test owns; no secrets in the YAML overlay (override is two string fields). Subprocess `env` is opt-in (`None` defaults to `os.environ`). |
| 5. Performance scan | Unit tests run in 0.51 s. Tier-2 wall-clock cap is 900 s, enforced by the subprocess timeout. |
| 6. Cross-task consistency | `clip_duration_s` and `report_path` match AZ-699 exactly so a single Jetson run produces the same markdown shape. |
| 7. Architecture compliance | Orchestrator lives entirely under `tests/e2e/replay/`; no `src/` writes. C3 fixture's invariants (`PopulatedC6Cache.cache_root` is the single source of truth) propagate via `write_effective_replay_config`. |
## Findings
| ID | Severity | Description | Disposition |
|----|----------|-------------|-------------|
| F1 | Low | `_default_tile_decoder` in `conftest.py` (carried from batch 108) — still raw TIFF. Not in the AZ-840 path; AZ-840 doesn't change tile decoding. | Defer; no AZ-840 ticket. |
| F2 | Low | `_resolve_replay_descriptor_dim` is NetVLAD-only (carried from batch 108). AZ-840 doesn't change descriptors. | Defer; no AZ-840 ticket. |
| F3 | Low | `--pace asap` is hardcoded in `_run_replay_subprocess` argv; the AZ-699 test passes `--pace asap` too, so behaviour is identical. If a future test wants a real-time pace, the runner kwarg is the seam. | Document; no ticket. |
| F4 | Low | `_run_replay_subprocess` does not stream stdout/stderr; failures surface only after the subprocess exits. For 15-min runs this means the operator sees no progress until the budget expires. AZ-699 has the same shape. | Document; consider an AZ-* if the budget grows. |
## Notes for follow-up
* AZ-840 lands the orchestrator test as Tier-2-gated. Verifying
the Tier-2 path actually runs on the Jetson harness is the
next gating step before Epic AZ-835 can flip from "covered by
unit tests" to "covered by Tier-2 integration".
* `_e2e_orchestrator.py` is intentionally kept under `tests/`
rather than promoted to `src/`. If a second consumer of the
same orchestration shape appears (e.g. AZ-833 mock-suite-sat
parity test), the move to a shared helper module under
`src/gps_denied_onboard/replay/` is the right next step;
for now the test-only location matches the helper's only
consumer.
* AZ-841 (Tier-2 unxfail follow-up) and AZ-842 (replay protocol
+ orchestrator docs) sit downstream — both should reference
this batch report in their planning sections.
@@ -0,0 +1,178 @@
# Cumulative Code Review — Cycle 3 — Batches 104-109
**Date**: 2026-05-23
**Scope**: union of files changed across cycle-3 batches 104, 106, 107, 108, 108b, 109
**Tasks covered**: AZ-777 spec refresh + Phase 1 + Phase 2; AZ-836 (Epic AZ-835 C1); AZ-838 (Epic AZ-835 C2); AZ-839 (Epic AZ-835 C3) + 108b fixture-path fix; AZ-840 (Epic AZ-835 C4)
**Mode**: cumulative (all 7 phases)
**Verdict**: **FAIL** (0 Critical, 1 High, 2 Medium, 0 Low)
**Baseline file**: `_docs/02_document/architecture_compliance_baseline.md` is **still absent** (carried over from cycle 2 retro action), so no `## Baseline Delta` section is emitted (see Notes)
## Scope of files reviewed
**Production source** (6 files):
1. `src/gps_denied_onboard/components/c11_tile_manager/tile_downloader.py` — modified (b104; AZ-777 Phase 1 contract adaptation: `_LIST_PATH` / `_GET_PATH` aligned with `Program.cs:187-209`)
2. `src/gps_denied_onboard/components/c11_tile_manager/route_client.py` — **new** (b107; ~600 LOC; `SatelliteProviderRouteClient`, `RouteSeedResult`, helpers)
3. `src/gps_denied_onboard/components/c11_tile_manager/errors.py` — modified (b107; new `SatelliteProviderRouteError` + `RouteValidationError` + `RouteTransientError` + `RouteTerminalFailureError`)
4. `src/gps_denied_onboard/components/c11_tile_manager/__init__.py` — modified (b107; re-exports new public surface)
5. `src/gps_denied_onboard/replay_input/tlog_route.py` — **new** (b106; `RouteSpec`, `RouteExtractionError`, `extract_route_from_tlog`)
6. `src/gps_denied_onboard/replay_input/__init__.py` — modified (b106; re-exports new public surface)
**Tests** (10 files): `tests/unit/c11_tile_manager/test_tile_downloader.py` (rewritten, b104; 14 ACs), `tests/unit/replay_input/test_tlog_route.py` (new, b106; 14 tests), `tests/e2e/satellite_provider/__init__.py` + `test_smoke.py` (new, b104; 2 tier-2 tests), `tests/e2e/replay/_operator_pre_flight.py` (new, b108; ~430 LOC), `tests/e2e/replay/conftest.py` (modified, b108+b108b), `tests/e2e/replay/test_operator_pre_flight_driver.py` (new, b108; 11 unit tests), `tests/e2e/replay/_e2e_orchestrator.py` (new, b109; 656 LOC), `tests/e2e/replay/test_e2e_orchestrator_unit.py` (new, b109; 17 unit tests), `tests/e2e/replay/test_az835_e2e_real_flight.py` (new, b109; tier-2 integration).
**CLI / fixtures** (2 files): `tests/fixtures/derkachi_c6/seed_route.py` (new, b107), `scripts/mint_dev_jwt.py` (new, b104).
**Compose / env** (2 files): `docker-compose.test.jetson.yml` (modified, b104), `.env.test.example` (modified, b104).
## Findings
| # | Severity | Category | File | Title |
|---|----------|----------|------|-------|
| F1 | High | Architecture | `src/gps_denied_onboard/components/c11_tile_manager/route_client.py:56` | `RouteSpec` DTO placement violates AZ-507 cross-component contract surface |
| F2 | Medium | Architecture | `_docs/02_document/module-layout.md` | Module layout stale — cycle-3 additions unregistered (cycle-2 carry-over worsened) |
| F3 | Medium | Maintainability | `tests/unit/test_az270_compose_root.py:194` | `test_ac6_only_compose_root_imports_concrete_strategies` lint scope is narrower than module-layout.md rule 9 |
### Finding Details
**F1: `RouteSpec` DTO placement violates AZ-507 cross-component contract surface** (High / Architecture)
- Location: `src/gps_denied_onboard/components/c11_tile_manager/route_client.py:56`
- Description: `route_client.py` (a `components/c11_tile_manager/*.py` file) imports `RouteSpec` from `gps_denied_onboard.replay_input.tlog_route`. Per `module-layout.md` rule 9 (AZ-507 cross-component contract surface):
> "the only places a `components/<X>/*.py` file may import are: its own subpackage (`gps_denied_onboard.components.<X>.*`), `_types/*`, `_types.inference_errors`, `helpers/*`, `config`, `logging`, `fdr_client`, `clock`, `frame_source` (interface only)."
`replay_input` is not in this allow-list. The architecture rationale: cross-component DTOs reach consumers through `_types/*`, not through cross-cutting coordinator packages. The current placement makes c11 (an Adapter, Layer 4) structurally depend on `replay_input` (a coordinator, Layer 4) — a Layer 4 → Layer 4 cross-cutting edge that the layering table does not declare as allowed.
- Impact: The dependency is **intentional and documented** — AZ-838 task spec line 19 explicitly specifies `from gps_denied_onboard.replay_input.tlog_route import RouteSpec`, and the route_client docstring acknowledges the source (`Takes a gps_denied_onboard.replay_input.tlog_route.RouteSpec (produced by AZ-836 / C1)`). But "intentional" does not equal "compliant"; the architecture rule was not amended at decompose time, and the AZ-270 lint is too narrow to catch this case (see F3). The next task that imports a similarly-placed DTO will compound the drift.
- Suggestion: relocate `RouteSpec` (plus `RouteExtractionError` if exported as part of the cross-component surface) to `src/gps_denied_onboard/_types/route.py`. After the move, both `c11_tile_manager.route_client` and `replay_input.tlog_route` import the DTO from `_types`, which is in both modules' allow-lists. AZ-836's `extract_route_from_tlog` continues to live in `replay_input/`; AZ-838's `SatelliteProviderRouteClient` continues to live in `c11_tile_manager/`. The behavioral surface is unchanged. Estimated complexity: 2 SP (move + update imports + verify AZ-838/AZ-836 tests + module-layout.md update).
- Tasks: AZ-838 (primary — owns the violating import), AZ-836 (secondary — owns the DTO definition).
**F2: Module layout stale — cycle-3 additions unregistered (cycle-2 carry-over worsened)** (Medium / Architecture)
- Location: `_docs/02_document/module-layout.md`
- Description: cycle 3 introduced new package files that are not registered in the authoritative file-ownership map. The cycle-2 cumulative review (`98-102`) already flagged 6 unregistered cycle-2 additions (F1 there); none of those carry-overs have been resolved, and cycle 3 added more:
- **c11_tile_manager Internal list** (currently lists `satellite_provider_downloader.py` + `satellite_provider_uploader.py`): missing `_types.py`, `config.py`, `errors.py`, `idempotent_retry.py`, `signing_key.py`, `tile_downloader.py`, `tile_uploader.py`, **`route_client.py`** (cycle-3 NEW).
- **shared/replay_input file list** (currently lists `__init__.py`, `interface.py`, `tlog_video_adapter.py`, `auto_sync.py`, `tests/`): missing `errors.py` (cycle-2 carry), `tlog_ground_truth.py` (cycle-2 carry), **`tlog_route.py`** (cycle-3 NEW).
- **Carried over from cycle-2 review** (still unregistered): `replay_api/` package (7 files), `cli/render_map.py`, `cli/replay_api_entrypoint.py`, `helpers/gps_compare.py`, `helpers/accuracy_report.py`.
- Impact: `/implement` Step 4 (File Ownership) resolves a task's `Component` field against this file. Any future task touching the unregistered areas will hit the BLOCKING ownership check at Step 4 — the skill explicitly STOPs when the component isn't found and forbids guessing from prose. Cycle-3 batches 104-109 happened to operate inside already-listed component directories (`c11_tile_manager/**`, `replay_input/**`) so the staleness did not block them, but the next task that needs a new component or extends `replay_api/` will block.
- Suggestion: cycle-3 Step 13 (Update Docs) should reconcile module-layout.md with on-disk reality. The minimum: refresh the c11_tile_manager Internal list, the shared/replay_input file list, and add the cycle-2 carry-over entries (replay_api Per-Component Mapping entry, cli additions, helpers additions, replay_input file list completion). Severity escalates to High if a fourth consecutive cycle leaves the file stale.
- Tasks: AZ-838, AZ-836 (primary, cycle-3 contributors); AZ-700, AZ-697, AZ-699, AZ-701 (secondary, cycle-2 carry).
**F3: `test_ac6_only_compose_root_imports_concrete_strategies` lint scope is narrower than module-layout.md rule 9** (Medium / Maintainability)
- Location: `tests/unit/test_az270_compose_root.py:194-219`
- Description: `module-layout.md` rule 9 documents `test_az270_compose_root.test_ac6_only_compose_root_imports_concrete_strategies` as the lint that "enforces this on every `components/**/*.py`". In practice the lint only checks for `gps_denied_onboard.components.<other_component>` import edges — it walks `components/**/*.py`, parses `ImportFrom` nodes, and flags only when `node.module.startswith("gps_denied_onboard.components.")` with a different leaf component. The full rule-9 allow-list (`_types/*`, `_types.inference_errors`, `helpers/*`, `config`, `logging`, `fdr_client`, `clock`, `frame_source` interface only) is NOT enforced. Imports from `replay_input`, `replay_api`, `runtime_root`, `cli/*`, `frame_source` non-interface modules, etc. all pass the lint silently. F1 is the concrete consequence: the c11 → replay_input import slipped through both code review and the AZ-270 lint.
- Impact: `module-layout.md` rule 9 is documented as enforced; in practice it is partially enforced, partially honor-system. Reviewers (human or AI) reading the rule-9 paragraph reasonably assume the lint covers it; the test name and docstring reinforce that. The asymmetry is a maintainability risk — the rule and its enforcement diverge silently.
- Suggestion: either expand `test_ac6_only_compose_root_imports_concrete_strategies` to enforce the full allow-list (one extra branch in the AST walker), or amend rule 9 to admit the additional imports the codebase actually relies on (with a documented rationale per module). The first is preferable — the rule's intent is structural, and lint coverage matters more than rule wording.
- Tasks: cross-cutting; surface in cycle-3 retrospective.
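The "one extra branch in the AST walker" from the F3 suggestion could look roughly like the sketch below. It is not the code in `test_az270_compose_root.py`: the function name is hypothetical, the allow-list prefixes are paraphrased from the rule-9 quote and their exact module spellings are assumptions, and the "`frame_source` (interface only)" restriction and relative imports are not modelled.

```python
import ast
from typing import List

_ROOT = "gps_denied_onboard"
# Prefixes paraphrased from rule 9; exact module spellings are assumptions.
_ALLOWED = ("_types", "helpers", "config", "logging", "fdr_client", "clock", "frame_source")


def disallowed_imports(source: str, component: str) -> List[str]:
    """Return first-party modules imported by `source` that rule 9 would reject."""
    own = f"{_ROOT}.components.{component}"
    bad: List[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            modules = [node.module]
        elif isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        else:
            continue  # relative imports and non-import nodes are ignored here
        for module in modules:
            if module != _ROOT and not module.startswith(_ROOT + "."):
                continue  # stdlib / third-party: out of scope for this rule
            if module == own or module.startswith(own + "."):
                continue  # the component's own subpackage
            rel = module[len(_ROOT) + 1:]
            if any(rel == p or rel.startswith(p + ".") for p in _ALLOWED):
                continue  # on the allow-list
            bad.append(module)
    return bad
```

Run against the F1 import, this flags `gps_denied_onboard.replay_input.tlog_route` for component `c11_tile_manager`, which is the edge the current lint lets through.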
## Verdict Logic
- 0 Critical → no FAIL trigger from Critical
- 1 High (F1) → **FAIL trigger**
- 2 Medium (F2, F3) → not a verdict driver
- 0 Low
Result: **FAIL**. The `/implement` Step 14.5 gate stops. Per `implement/SKILL.md` Step 14.5 + the auto-fix matrix, F1 (High Architecture) **escalates** rather than auto-fixes; F2 + F3 are eligible for Medium-Style auto-fix on the matrix but the High-Architecture finding alone gates the whole report. Re-run requires user direction (Choose A/B/C in the implement skill's Step 14.5 escalation block).
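The verdict rule above reduces to a small function. This is a sketch: the FAIL triggers (any Critical or High) come from the list above, while the non-FAIL outcomes (`PASS_WITH_WARNINGS`, `PASS`) are an assumption about how the skill labels the remaining cases.

```python
from typing import Mapping


def cumulative_verdict(counts: Mapping[str, int]) -> str:
    """Map severity counts to a verdict string."""
    # Any Critical or High finding is a FAIL trigger.
    if counts.get("Critical", 0) > 0 or counts.get("High", 0) > 0:
        return "FAIL"
    # Assumption: Medium/Low findings alone downgrade to a warning verdict.
    if counts.get("Medium", 0) > 0 or counts.get("Low", 0) > 0:
        return "PASS_WITH_WARNINGS"
    # Assumption: no findings at all is a clean pass.
    return "PASS"
```

With this report's counts (`Critical=0, High=1, Medium=2, Low=0`) the function returns `FAIL` on the strength of F1 alone.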
## Phase-by-Phase Notes
### Phase 1 — Context Loading
Inputs read:
- Task specs: AZ-836 (`done/`), AZ-838 (`done/`), AZ-839 (`done/`), AZ-840 (`done/`), AZ-777 (refreshed spec; closure logged in `done/`); AZ-841 (`todo/`), AZ-842 (`todo/`); Epic AZ-835 (`todo/`).
- Batch reports: `batch_104_cycle3_report.md`, `batch_106_cycle3_report.md`, `batch_107_cycle3_report.md`, `batch_108_cycle3_report.md`, `batch_108b_cycle3_report.md`, `batch_109_cycle3_report.md`.
- Architecture / layout: `_docs/02_document/module-layout.md` (rule 9 + per-component sections + Layering Table); `_docs/02_document/architecture.md` (header read-through; full re-read deferred to per-finding evidence).
- Last cumulative review: `_docs/03_implementation/cumulative_review_batches_98-102_cycle2_report.md` (carry-over baseline).
- Restrictions / solution overview: not re-read (already covered in per-batch reviews).
- ADR directory: `_docs/02_document/adr/` does NOT exist; ADR compliance check skipped (logged in Phase 7 below).
### Phase 2 — Spec Compliance
Cross-batch promise points (per-batch ACs already verified in batch reports):
- **AZ-836 (`RouteSpec` + extractor) → AZ-838 (`SatelliteProviderRouteClient`)**: AZ-838 task spec line 19 explicitly specifies `from gps_denied_onboard.replay_input.tlog_route import RouteSpec`. Implementation matches. The DTO contract is not formally documented in `_docs/02_document/contracts/c11_tilemanager/` — Spec-Gap candidate, but downgrades because both producer and consumer are owned by the same Epic (AZ-835) and the Epic spec describes the DTO shape inline. Note (not a separate finding): if `RouteSpec` survives F1 remediation by moving to `_types/route.py`, a contract `_docs/02_document/contracts/shared_types/route.md` is the right home.
- **AZ-838 (`SatelliteProviderRouteClient`) → AZ-839 (C3 fixture, `populate_c6_from_route`)**: the fixture's driver imports `SatelliteProviderRouteClient` and uses `seed_route()`; signature matches AZ-838's `seed_route(spec, *, name=None) -> RouteSeedResult`. Cross-batch wiring sound.
- **AZ-839 (C3 fixture, `PopulatedC6Cache`) → AZ-840 (orchestrator)**: AZ-840's `_e2e_orchestrator.write_effective_replay_config` overlays `c6_tile_cache.root_dir` onto the operator YAML using the cache_root the C3 fixture chose. AZ-840 batch report documents the contract; per-test fixtures consume `PopulatedC6Cache` directly. Sound.
- **AZ-777 contract adaptation (b104) → satellite-provider real endpoints**: `tile_downloader.py` `_LIST_PATH` / `_GET_PATH` now point at the real endpoints (`/api/satellite/tiles/inventory` + `/tiles/{z}/{x}/{y}`). The leftover `_docs/_process_leftovers/2026-05-21_az777_complexity_override.md` 2026-05-21 addendum recorded this as the "largest single sub-deliverable of the refreshed Phase 1". Implementation matches.
No Spec-Gap findings.
### Phase 3 — Code Quality
- All cycle-3 production modules (`tlog_route.py`, `route_client.py`, expanded `errors.py`, modified `tile_downloader.py`) carry module + class + function docstrings consistent with the project pattern (cycle-2 baseline preserved).
- `route_client.py` is ~600 LOC with one class (`SatelliteProviderRouteClient`) plus one DTO (`RouteSeedResult`) plus module-level helpers. The class has 5 methods: 2 public (`validate`, `seed_route`) and 3 underscore-prefixed internal helpers (`_post_route`, `_poll_route_status`, `_verify_inventory`). Each method is single-responsibility. No method exceeds the 50-line / cyclomatic-10 thresholds enumerated in the skill's Phase 3 list (per code reading; not measured).
- `tlog_route.py` `extract_route_from_tlog` uses Douglas-Peucker for waypoint coarsening — correct choice per AZ-836 spec.
- Tests follow Arrange / Act / Assert per coderule (verified by sampling `test_tlog_route.py` and `test_e2e_orchestrator_unit.py`; no exhaustive enumeration).
No Code Quality findings.
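For readers unfamiliar with the waypoint coarsening mentioned above, here is a generic Douglas-Peucker sketch on planar `(x, y)` points. It is not the `tlog_route.py` implementation; the coordinate system and tolerance AZ-836 actually uses are not stated in this review, so both are left to the caller.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _perp_distance(p: Point, a: Point, b: Point) -> float:
    """Distance from p to the line through a and b (or to a if a == b)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    # |cross product| / base length gives the height of the triangle a-b-p.
    return abs(dy * (p[0] - a[0]) - dx * (p[1] - a[1])) / length


def douglas_peucker(points: Sequence[Point], epsilon: float) -> List[Point]:
    """Drop points that lie within `epsilon` of the simplified polyline."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _perp_distance(points[i], first, last)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= epsilon:
        return [first, last]  # everything in between is within tolerance
    # Keep the farthest point and simplify each side independently.
    left = douglas_peucker(points[: index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right  # avoid duplicating the split point
```

The recursive form is the simplest to read; for very long tracks an iterative version with an explicit stack avoids Python's recursion limit.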
### Phase 4 — Security Quick-Scan
- `route_client.py` HTTP client uses `httpx.Client` with `timeout` parameter (no infinite hangs), argv-style request construction (no shell), and bearer-token auth via the existing C11 plumbing. No secrets in source.
- `route_client.py` JSON request payload built via `json.dumps` on dataclass fields → no injection.
- `route_client.py` URL construction uses `_ROUTE_STATUS_PATH_TPL.format(id=...)` where `id` is a UUID returned by the server — type-bounded, no injection surface.
- `tile_downloader.py` modifications (b104) are confined to `_LIST_PATH` / `_GET_PATH` constants (per batch report); no new auth/parsing surface.
- `scripts/mint_dev_jwt.py` (new, b104): JWT minting tooling for dev/test JWT signing keys. Per file naming (`mint_dev_jwt.py`) and per the `.env.test.example` pairing this is intended for non-prod use; not reviewed line-by-line in this pass.
No Security findings.
### Phase 5 — Performance Scan
- `route_client._poll_route_status` polls with default 5 s interval, max 60 attempts (= 5 min ceiling) using `time.sleep`. Configurable via constructor. Standard polling, not a perf concern.
- `route_client._enumerate_route_tile_coords` walks the route's `regionSizeMeters × N waypoints` tile coverage locally; per AZ-838 batch report this is ~50-100 tiles for the Derkachi route. O(N) over waypoints.
- `tlog_route.extract_route_from_tlog` runs Douglas-Peucker on the active GPS segment; per the unit test, completes in milliseconds for the Derkachi clip.
- `_operator_pre_flight.py` and `_e2e_orchestrator.py` run inside the test harness; performance is bounded by the wall-clock budget (15 min on Tier-2).
No Performance findings.
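The bounded polling noted for `_poll_route_status` follows a common pattern, sketched below. The function name, the status strings, and the injectable `sleep` are assumptions for illustration; only the defaults (5 s interval, 60 attempts) come from the note above.

```python
import time
from typing import Callable, FrozenSet


def poll_until_terminal(
    fetch_status: Callable[[], str],
    *,
    interval_s: float = 5.0,
    max_attempts: int = 60,
    terminal: FrozenSet[str] = frozenset({"done", "failed"}),  # assumed labels
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until a terminal status appears or the attempt budget runs out."""
    for _ in range(max_attempts):
        status = fetch_status()
        if status in terminal:
            return status
        sleep(interval_s)  # 60 attempts x 5 s = 300 s ceiling by default
    raise TimeoutError(
        f"no terminal status after {max_attempts} attempts "
        f"({max_attempts * interval_s:.0f} s)"
    )
```

Injecting `sleep` lets unit tests exercise the timeout path instantly, in the same spirit as the injected subprocess runner in the orchestrator.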
### Phase 6 — Cross-Task Consistency
- **Sequential Epic chain**: AZ-836 (C1) → AZ-838 (C2) → AZ-839 (C3) → AZ-840 (C4). Each batch's "Files changed" is disjoint at the production level (C1 in `replay_input/`, C2 in `c11_tile_manager/`, C3+C4 in `tests/e2e/replay/`). No conflicting patterns; the test layer wires the production chain together via the orchestrator.
- **Symbol uniqueness**: `RouteSpec`, `RouteExtractionError`, `extract_route_from_tlog`, `SatelliteProviderRouteClient`, `RouteSeedResult`, `SatelliteProviderRouteError`, `RouteValidationError`, `RouteTransientError`, `RouteTerminalFailureError`, `OrchestratorStep`, `OrchestrationFailure`, `OrchestrationReport`, `PopulatedC6Cache` — each defined exactly once across cycle-3 production + tests. No duplicates.
- **AZ-839 b108b fix**: the hot-fix renamed `tile_store_path = cache_root / "tile_store"` → `cache_root / "tiles"` to match `PostgresFilesystemStore` layout. Cross-task consistency preserved (the path AZ-840 reads now matches the path AZ-839 writes).
No Cross-Task Consistency findings.
### Phase 7 — Architecture Compliance
**Layer-direction analysis** (against module-layout.md "Allowed Dependencies" + rule 9):
- `replay_input/tlog_route.py` (Layer 4 cross-cutting coordinator): imports `_types.geo` (Layer 1), `helpers.gps_compare` (Layer 1), `helpers.wgs_converter` (Layer 1), and intra-package `replay_input.errors` + `replay_input.tlog_ground_truth`. All imports are downward (Layer 4 → Layer 1) or intra-package. Compliant.
- `c11_tile_manager/route_client.py` (Layer 4 component): imports own subpackage (`c11_tile_manager.errors`) + third-party (`httpx`) + **`replay_input.tlog_route.RouteSpec`** — see F1. The cross-cutting `replay_input` is not in c11's allow-list per rule 9. Architecture finding F1 (High).
- `c11_tile_manager/tile_downloader.py` (Layer 4 component): modifications confined to constants. No new cross-component edges introduced.
**Public API respect**:
- `c11_tile_manager.__init__.py` re-exports the new public surface (`RouteSeedResult`, `SatelliteProviderRouteClient`, plus the new error classes). Consumers calling `from gps_denied_onboard.components.c11_tile_manager import SatelliteProviderRouteClient` reach the package's public surface. ✅
- `replay_input.__init__.py` re-exports `RouteSpec`, `RouteExtractionError`, `extract_route_from_tlog`. ✅
- The F1 violation is a public API respect violation in the OPPOSITE direction: `c11.route_client` reaches into `replay_input.tlog_route` (a sub-module path) rather than the package's `__init__` re-export — but the deeper issue is that no direction of this import is rule-9-compliant.
**Cyclic-dependency check**:
- New edges this cycle: `c11_tile_manager.route_client → replay_input.tlog_route` (F1) + `c11_tile_manager.route_client → c11_tile_manager.errors` (intra-package).
- `replay_input.tlog_route → c11_tile_manager.*`? No (verified via grep). Acyclic.
- `replay_input/__init__.py` re-exports `RouteSpec` from `tlog_route`. No back-edge to c11.
- No new cycles introduced.
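The check above was done via grep; a mechanical equivalent over an import graph is a depth-first search for back edges. The sketch below assumes the graph has already been extracted into a dict of module name to imported module names; `find_cycle` is a hypothetical helper, not part of the review tooling.

```python
from typing import Dict, List, Optional, Set

_WHITE, _GREY, _BLACK = 0, 1, 2  # unvisited, on the current path, finished


def find_cycle(graph: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return one import cycle as a node list (first == last), or None."""
    colour: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        colour[node] = _GREY
        path.append(node)
        for nxt in sorted(graph.get(node, ())):
            state = colour.get(nxt, _WHITE)
            if state == _GREY:  # back edge: nxt is already on the path
                return path[path.index(nxt):] + [nxt]
            if state == _WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        colour[node] = _BLACK
        return None

    for start in sorted(graph):
        if colour.get(start, _WHITE) == _WHITE:
            found = visit(start)
            if found:
                return found
    return None
```

Fed the two new edges listed above, the function returns `None`; adding a hypothetical `replay_input.tlog_route → c11_tile_manager.route_client` edge would make it return that two-module loop.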
**Duplicate-symbol check**: see Phase 6 — no duplicates.
**Cross-cutting concerns not locally re-implemented**: none observed. Logging via `logging.getLogger(_COMPONENT)`, FDR via `fdr_client`, helpers consumed from canonical locations.
**ADR compliance**: `_docs/02_document/adr/` directory does **not exist**. The check is skipped per `code-review/SKILL.md` Phase 7 #6 ("If the directory does not exist or has only the index file, ADRs are skipped — log this skip in the report so the absence is visible"). Carry-over: `module-layout.md` references ADR-001 (monolith), ADR-002 (build-time exclusion), ADR-009 (interface-first DI), ADR-011 (replay-as-configuration) inline; these are documented in `architecture.md` but not as standalone ADR files. If the ADR directory is created in cycle-N (per a future retro action), this skip should retroactively re-evaluate the cycle-3 batches against any ADR whose `Evidence` overlaps the cycle-3 changed-file set.
**Single Architecture finding**: F1 — c11.route_client imports a non-allow-listed package. Documented but unaddressed at the architecture level.
## Notes
- **No `## Baseline Delta` section**: `_docs/02_document/architecture_compliance_baseline.md` was identified in the cycle-2 LESSONS entry (2026-05-20 architecture) and again in the cycle-2 cumulative review notes as a cycle-2 Step 6 (Decompose) prerequisite. The baseline file was NOT created in cycle 2 retrospective and was NOT created in cycle 3 either. Carry-over → cycle-3 retrospective. Without the baseline, "carried over / resolved / newly introduced" structural-violation accounting is not possible; F1 is therefore counted as "newly introduced this cycle" by inspection (`route_client.py` is a cycle-3-new file), and F2 is "carried over from cycle 2 with worsening" by inspection of the cycle-2 cumulative review F1.
- **Cumulative-review cadence drift continues**: `/implement` Step 14.5 says K=3 default. Cycle 3 has 6 completed batches (104, 106, 107, 108, 108b, 109) without a cumulative review until this make-up review. Two cumulative reviews were due (after 104+106+107, after 108+108b+109). Cycle-2 cumulative review (`98-102`) noted the same drift and flagged it for the cycle-2 retrospective; the action did not land. Recurring. Cycle-3 retrospective should pick it up — possible mechanism: a `cumulative_review_pending: true` marker in `_docs/_autodev_state.md` that the implement skill flips on at K-batch boundaries and clears only on review file write, surfacing in the autodev Status Summary footer.
- **AZ-270 lint coverage gap**: F3 documents the gap explicitly. Adjacent: the existing-code flow's Phase A Step 2 (Architecture Baseline Scan) feeds Step 4 (Code Testability Revision) and would also benefit from a tighter lint, since baseline-mode code-review uses the same `module-layout.md` rule 9 as enforcement input.
- **Suite docs (parent)**: `<workspace-root>/../docs` does not exist (probed during R1 reconciliation). No suite-level cross-reference applies to this review.
## Artifacts
- Verdict consumed by: `/implement` Step 14.5 gate (FAIL → STOP, escalate via Choose A/B/C — auto-fix not eligible for High Architecture).
- F1 carried forward to cycle-3 retrospective for action assignment; remediation candidate: 2-SP refactor task to relocate `RouteSpec` to `_types/route.py`.
- F2 carried forward to cycle-3 Step 13 (Update Docs) at minimum; severity escalation watch if the staleness persists into cycle 4.
- F3 carried forward to cycle-3 retrospective; remediation candidate: 1-SP test-update task to expand `test_ac6_only_compose_root_imports_concrete_strategies`.
- Architecture compliance baseline action: blocked across cycle 2 → cycle 3; surface in cycle-3 retrospective with explicit owner.
@@ -0,0 +1,15 @@
# ADR Impact — Run 02-az507-routespec-relocation
**Date**: 2026-05-23
## Scan result
`_docs/02_document/adr/` does not exist in this workspace. No `Status: Accepted` ADR files are in scope.
**Status**: `No ADRs in scope` — ADR Superseding Gate (refactor SKILL.md phase 2b.1) is satisfied trivially. No Violation rows. No Drift rows. No Aligned rows. Task creation may proceed.
## Rationale (per SKILL.md phase 2b.1 step 1)
> "If the directory does not exist or contains only the index, log `No ADRs in scope` to `RUN_DIR/analysis/adr_impact.md` and skip the rest of this gate."
This run logs the result and proceeds. The architectural rule that the run does enforce — `module-layout.md` rule 9 (AZ-507 cross-component contract surface) — is documented in `module-layout.md` and `architecture.md § Architecture Vision`, not in an ADR. The refactor strengthens that documented rule (by widening its lint enforcement in C03) rather than overturning it; no supersede path is needed.
@@ -0,0 +1,70 @@
# Refactoring Roadmap — Run 02-az507-routespec-relocation
**Date**: 2026-05-23
**Run**: `_docs/04_refactoring/02-az507-routespec-relocation/`
## Weak Points Assessment
| # | Location | Description | Impact | Proposed Solution |
|---|----------|-------------|--------|------------------|
| W1 | `src/gps_denied_onboard/components/c11_tile_manager/route_client.py:56` | Imports `RouteSpec` from `gps_denied_onboard.replay_input.tlog_route`, violating module-layout.md rule 9 (AZ-507 cross-component contract surface). | High — the next task that imports a similarly-placed DTO compounds the drift; current AZ-270 lint cannot catch it (W3). | C01: relocate the DTO to `_types/route.py`. |
| W2 | `_docs/02_document/module-layout.md` (c11_tile_manager Internal list, shared/replay_input file list) | Stale relative to on-disk reality — cycle-3 additions (`route_client.py`, `tlog_route.py`) and 7 cycle-2-era cycle-internal files are unregistered in their respective sections. | Medium — `/implement` Step 4 ownership check would BLOCK any future task touching unregistered areas. Severity escalates to High if a fourth consecutive cycle leaves it stale. | C02: refresh the c11_tile_manager Internal list, the shared/replay_input file list, and add `_types/route.py`. Defer cycle-2 carry-overs outside these sections. |
| W3 | `tests/unit/test_az270_compose_root.py:194-219` | The AC-6 lint walks `components/**/*.py` and only flags `components.<X> → components.<Y>` edges, not the full rule-9 allow-list. | Medium — rule-9 enforcement is partially honor-system; F1 is the concrete consequence. | C03: widen the AST walker to enforce the full allow-list. |
## Gap Analysis
| AC of this run | Current state | Target state |
|---|---|---|
| Rule-9 violations resolved | 1 (route_client → replay_input) | 0 |
| `module-layout.md` cycle-3 entries registered | Missing: `route_client.py`, `tlog_route.py`, plus 9 cycle-2-era omissions across two sections (7 in c11_tile_manager, 2 in replay_input) | All cycle-3 entries registered; the 9 cycle-2-era omissions in the c11 + replay_input sections fixed; new `_types/route.py` registered |
| AZ-270 lint scope = rule-9 scope | Narrow (one prefix only) | Full allow-list enforced |
## Phased Roadmap
This run is a single phase by intent — three small structural fixes that share the same root cause (rule-9 enforcement gap). Sequencing within the phase:
1. **C01 → first** (the structural fix). Lands `_types/route.py`, retires the violating import, keeps producer-side back-compat via re-export.
2. **C02 → second** (depends on C01 because the new `_types/route.py` entry needs the file to exist). Documentation refresh; no code touch.
3. **C03 → third** (depends on C01 because the widened lint must see a clean codebase). The new lint becomes a gate for any future PR.
| Phase | Items | Rationale |
|-------|-------|-----------|
| Phase 1 (this run) | C01, C02, C03 | All three resolve the same cumulative-review FAIL surface; bundling them ensures rule-9 enforcement is consistent across code, doc, and lint after the run. |
No Phase 2 or Phase 3. The cumulative review's "out of scope" items (cycle-2 doc carry-overs, the shared_types/route.md contract doc, `architecture_compliance_baseline.md`) belong to other tasks and are explicitly deferred — not folded into this roadmap.
## Hardening tracks
| Track | Recommendation | Rationale |
|-------|----------------|-----------|
| A — Technical Debt | Skip | The run *is* technical-debt remediation (closing a rule-9 enforcement gap). Adding a separate track would expand scope artificially. |
| B — Performance Optimization | Skip | No performance concern in scope. Relocation is identity-preserving; tests do not measure perf deltas. |
| C — Security Review | Skip | No security surface affected. `RouteSpec` carries waypoint coordinates only (already shipped to operator's tlog input); the move does not change any auth, transport, or input-validation path. |
| D — All of the above | Skip | See A/B/C. |
| E — None | **Selected (default for this run)** | All three changes are themselves the structural fix; orthogonal hardening would dilute scope. The cycle-3 retrospective list captures the broader debt items (cycle-2 carry-overs, baseline doc) for separate runs. |
This default is recorded explicitly so the user can override at the Phase 2 BLOCKING gate. If the user wants Track C (security audit on the route-extraction path) or Track A (folding the cycle-2 carry-overs into this run), the roadmap and task list will be regenerated.
## Selected items
All `Selected`:
- C01 — Relocate `RouteSpec` to `_types/route.py` (2 SP, low risk).
- C02 — Refresh `module-layout.md` cycle-3 entries (2 SP, low risk).
- C03 — Widen `test_az270_compose_root` lint to full rule-9 allow-list (2 SP, medium risk).
**Total**: 6 SP across 3 tasks. Each task is within the user-rule cap (≤ 5 SP per task; recommended 2-3).
## Applicability gate
| Recommendation | Status | Notes |
|---|---|---|
| C01 | Selected | No constraint mismatches; identity-preserving move; backward compat via re-export. |
| C02 | Selected | Doc-only; no test impact; scope-disciplined (cycle-2 carry-overs explicitly deferred). |
| C03 | Selected | Risk-flagged: widening may expose unrelated rule-9 violation. STOP-and-surface protocol applies if encountered. |
No `Rejected`, no `Experimental only`, no `Needs user decision`. The Phase 2 applicability gate passes for task creation.
## ADR-supersede gate
`No ADRs in scope` — see `adr_impact.md`. Gate satisfied; no Violation/Drift/Aligned rows.
@@ -0,0 +1,66 @@
# Research Findings — Run 02-az507-routespec-relocation
**Date**: 2026-05-23
**Mode**: guided
**Scope**: structural relocation of one DTO + module-layout doc refresh + lint widening
## Project Constraint Matrix (extracted)
| Constraint | Source | Statement |
|-----------|--------|-----------|
| AZ-507 cross-component contract surface | `_docs/02_document/architecture.md` § Architecture Vision; `_docs/02_document/module-layout.md` rule 9 | `components/<X>/*.py` may only import from `_types/*`, `_types.inference_errors`, `helpers/*`, `config`, `logging`, `fdr_client`, `clock`, `frame_source` (interface only), and its own subpackage. |
| Cross-component DTOs live in `_types/*` | `_types/geo.py`, `_types/tile.py`, `_types/inference.py`, `_types/calibration.py`, `_types/pose.py`, `_types/state.py`, `_types/nav.py`, `_types/manifests.py`, `_types/vpr.py`, `_types/matcher.py`, `_types/matching.py`, `_types/rerank.py`, `_types/thermal.py`, `_types/emitted.py`, `_types/fc.py` (15 existing DTO files) | The user-confirmed precedent. Every shared DTO sits under `_types/`. The pattern is explicit at the package level: `_types/__init__.py` is just a marker (`"""Cross-component DTOs (type-only stubs)."""`). |
| AZ-270 lint coverage | `_docs/02_document/module-layout.md` rule 9 (cites `test_az270_compose_root.test_ac6_only_compose_root_imports_concrete_strategies`) | Documented as enforced by the lint; F3 of cycle-3 cumulative review confirms the lint scope is narrower than the rule. |
| Frozen + slots DTO contract | AZ-355 AC-2 (cited in `_types/geo.py`) | DTOs that cross component boundaries must use `frozen=True, slots=True` to prevent mutation-through-aliasing. |
| Epic AZ-835 acceptance criteria | `_docs/02_tasks/done/AZ-835_e2e_real_flight_validation_epic.md` and child task specs (AZ-836..AZ-840) | The replay-flow behaviour must remain functionally identical after the refactor — RouteSpec waypoint extraction, satellite-provider POST, e2e orchestrator behaviour. |
| Backward-compat for test imports | tests/* (5 files import RouteSpec from `replay_input.tlog_route` directly) | Test code is allowed to use module-level paths; only `components/<X>/*.py` is gated by rule 9. Re-export from `tlog_route.py` keeps test imports stable, so updating tests is hygiene rather than correctness. |
## Current state analysis
`RouteSpec` is currently defined at `gps_denied_onboard.replay_input.tlog_route:54-79` and re-exported from `gps_denied_onboard.replay_input` (`__init__.py:34`). The producer (`extract_route_from_tlog` at `tlog_route.py:82`) lives alongside the DTO in the same module — that part is correct (the function is a `replay_input/` concern, not a `_types/` concern). The DTO itself is consumed across a component boundary (c11) which makes it a cross-component DTO by behaviour, but its file home does not reflect that. Every other cross-component DTO in the codebase lives under `_types/*`. The asymmetry is the F1 finding.
**Strengths to preserve**:
- `RouteSpec` is `frozen=True, slots=True` — already AZ-355-compliant; the move does not relax this.
- The extractor (`extract_route_from_tlog`) is correctly placed in `replay_input/` and uses the DTO via local import; this composition is preserved post-move.
- Tests cover both producer-side (14 unit tests) and consumer-side (full route_client AC suite plus integration). Phase 6 has a strong safety net.
**Weakness being corrected**:
- The DTO's file home does not match its semantic role (cross-component contract surface).
- The AZ-270 lint cannot detect the asymmetry because its check is narrower than the rule it claims to enforce.
## Alternative approaches considered
| # | Approach | Verdict | Why |
|---|----------|---------|-----|
| 1 | Move `RouteSpec` to `_types/route.py` (the recommended path) | **Selected** | Matches the user-confirmed precedent (`_types/inference.py`, `_types/tile.py`, etc.), satisfies rule 9 at c11's import site, identity-preserving (Python class object identity is preserved across imports), behaviour-neutral. |
| 2 | Move `RouteSpec` to `_types/replay.py` (group with other replay-related types if they appear later) | Rejected | No other replay-related shared DTOs exist today. Naming the file `route.py` mirrors the naming convention of other `_types/*.py` files (one DTO topic per file: `geo`, `tile`, `pose`, `nav`, etc.). Premature speculative grouping. |
| 3 | Move `RouteSpec` to `_types/contracts/route.py` (introduce a sub-namespace) | Rejected | `_types/` is currently flat. Introducing a sub-namespace for one DTO is over-engineering and would require updating the rule-9 allow-list (`_types/*` already matches recursively in the lint, but the documentation pattern would diverge). |
| 4 | Amend rule 9 to admit `replay_input.tlog_route` as an allowed import for components | Rejected (architecture-change path; option D in the original FAIL gate) | The user explicitly chose option B (mechanical refactor) over option D (rule amendment). Approach 4 would weaken rule 9 and break the layering invariant, which is why the user rejected it. |
| 5 | Keep `RouteSpec` in `replay_input/tlog_route.py` and add a custom shim under `_types/` that re-exports it (no real move) | Rejected | Cosmetic — does not satisfy the underlying rule because the c11 import would still resolve to a `replay_input` module via the shim. The lint's correct widened form (C03) would still flag the original location as the canonical home. |
**Selected: Approach 1.** No library replacement, no SDK addition, no framework introduction. Therefore the `context7` per-mode verification gate (SKILL phase 2a) is not triggered — the gate fires only for replacement libraries/SDKs/frameworks/services. This is a structural code move within the existing codebase.
## API capability verification
**Not applicable.** The refactor introduces no new library, SDK, framework, or service. The "replacement" is the file home of a dataclass within the same Python package. No `context7` lookup is required (the gate is explicit: "for every replacement library/SDK/framework"). No MVE is required (no external API to verify). The project's pinned mode is unchanged because no mode exists to pin — it's a pure-Python dataclass relocation.
## Constraint-fit table
| Recommendation | Pinned mode/config | Constraints checked | API capability evidence | Mismatches/disqualifiers | Status |
|---|---|---|---|---|---|
| C01 — relocate `RouteSpec` to `_types/route.py` | N/A — Python dataclass, no library mode | AZ-507 rule 9, frozen+slots invariant (AZ-355), Epic AZ-835 ACs, test backward compat | N/A — no external API | None | Selected |
| C02 — refresh `module-layout.md` | N/A — documentation | AZ-507 rule 9 (the rule the doc enforces), scope discipline (cycle-2 carry-overs deferred to a separate task) | N/A | None | Selected |
| C03 — widen AZ-270 lint | N/A — internal AST walker, stdlib `ast` module | Rule-9 allow-list as the predicate; preserves existing AC-6 narrow check as a strict subset | N/A — stdlib only | Risk: may expose unrelated rule-9 violation (mitigated by STOP-and-surface protocol if encountered) | Selected |
All three changes are `Selected`. No `Rejected`, `Experimental only`, or `Needs user decision` rows — the applicability gate (Phase 2 BLOCKING) passes for all three.
## References
- `_docs/02_document/architecture.md` § Architecture Vision (AZ-507 cross-component contract surface)
- `_docs/02_document/module-layout.md` rule 9 (AZ-507 enforcement)
- `_docs/03_implementation/cumulative_review_batches_104-109_cycle3_report.md` (F1, F2, F3 — the source findings)
- `src/gps_denied_onboard/_types/geo.py` (canonical pattern for `_types/<topic>.py`)
- `src/gps_denied_onboard/_types/inference.py`, `_types/tile.py`, `_types/calibration.py` (additional precedent — user-cited examples)
- `tests/unit/test_az270_compose_root.py:194-219` (current narrow lint)
@@ -0,0 +1,100 @@
# Baseline Metrics — Run 02-az507-routespec-relocation
**Date**: 2026-05-23
**Run**: `_docs/04_refactoring/02-az507-routespec-relocation/`
**Mode**: guided
**Source**: cycle-3 cumulative review (`_docs/03_implementation/cumulative_review_batches_104-109_cycle3_report.md`) — F1, F2, F3
**Scope**: mechanical relocation of cross-component DTO + module-layout doc refresh + AZ-270 lint scope expansion
## Why a minimal baseline is appropriate for this run
The standard Phase-0 baseline metric grid (overall coverage, complexity, code smells, performance, dependencies, build time) is **not the right instrument** for this refactoring run. The work is a structural relocation of one frozen dataclass + a documentation refresh + a lint widening. Behaviour does not change; performance does not change; coverage does not change; dependency count does not change. A LOC-and-cyclomatic-complexity baseline would record near-zero deltas and would obscure the actual signal — whether the architectural rule (`module-layout.md` rule 9) is satisfied after the run.
What matters here, and is captured below, is:
1. The **structural baseline**: one rule-9 violation today (F1).
2. The **test baseline**: which tests cover the affected import paths and that they pass at HEAD (the safety net for Phase 4).
3. The **doc baseline**: which artifacts are stale (F2) and what "complete" looks like.
4. The **lint baseline**: what AZ-270 currently catches vs. what rule 9 says it should catch (F3).
Phases 5/6 verify that (a) the structural baseline goes from 1 → 0 rule-9 violations, (b) every test still passes, (c) the doc baseline is reconciled, and (d) the lint baseline is widened.
## 1. Structural baseline (rule-9 violations)
Source of truth: `_docs/02_document/module-layout.md` rule 9 (AZ-507 cross-component contract surface).
| # | File | Importer (Component) | Imported (Module) | Allow-listed for importer? |
|---|------|----------------------|-------------------|----------------------------|
| 1 | `src/gps_denied_onboard/components/c11_tile_manager/route_client.py:56` | `c11_tile_manager` | `gps_denied_onboard.replay_input.tlog_route` (`RouteSpec`) | **NO** (`replay_input` is not in c11's allow-list) |
Search method: `rg "^from gps_denied_onboard\." src/gps_denied_onboard/components` filtered against the allow-list (`_types/*`, `_types.inference_errors`, `helpers/*`, `config`, `logging`, `fdr_client`, `clock`, `frame_source`).
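The filtering step of the search method can be approximated in a few lines. This is a rough sketch, not the command actually run: it mirrors the `rg` line pattern with a regular expression, treats the allow-list entries as dotted module prefixes, and does not model the `frame_source` interface-only restriction. The function name and the sample import lines in the usage are hypothetical.

```python
import re

# Allow-list from rule 9, written as dotted prefixes below gps_denied_onboard.
# `_types.inference_errors` is covered by the `_types` prefix.
ALLOWED_PREFIXES = (
    "_types", "helpers", "config", "logging", "fdr_client", "clock", "frame_source",
)
IMPORT_RE = re.compile(r"^from gps_denied_onboard\.([\w.]+) import ")


def disallowed_imports(lines: list[str], own_component: str) -> list[str]:
    """Return imported modules that are neither allow-listed nor the importer's own package."""
    own = f"components.{own_component}"
    found = []
    for line in lines:
        match = IMPORT_RE.match(line)
        if match is None:
            continue  # not a top-level `from gps_denied_onboard.X import ...` line
        module = match.group(1)
        if module == own or module.startswith(own + "."):
            continue  # a component may import from its own subpackage
        if any(module == p or module.startswith(p + ".") for p in ALLOWED_PREFIXES):
            continue
        found.append(module)
    return found


sample = [
    "from gps_denied_onboard._types.geo import SomeGeoType",
    "from gps_denied_onboard.replay_input.tlog_route import RouteSpec",
    "from gps_denied_onboard.components.c11_tile_manager.errors import SomeError",
    "import os",
]
hits = disallowed_imports(sample, "c11_tile_manager")
```

Run against the sample lines, only the `replay_input.tlog_route` import is reported, which is the single violation recorded in the table above.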
**Target post-run**: 0 violations.
## 2. Test baseline (safety net for Phase 4)
Files that import `RouteSpec`, `SatelliteProviderRouteClient`, or `RouteSeedResult` (i.e. the symbols the relocation touches):
**Production source** (must be updated):
- `src/gps_denied_onboard/components/c11_tile_manager/route_client.py` — defines the import to be re-pointed
- `src/gps_denied_onboard/components/c11_tile_manager/__init__.py` — public API re-exports (no import path change)
- `src/gps_denied_onboard/replay_input/tlog_route.py` — defines `RouteSpec` today (will lose the local definition, gain an import + alias)
- `src/gps_denied_onboard/replay_input/__init__.py` — public API re-exports (will re-export from `_types.route` instead of `tlog_route`)
**Tests** (verify still pass; update imports only if they reach into the pre-relocation internal path `replay_input.tlog_route` directly):
- `tests/unit/replay_input/test_tlog_route.py` (14 tests; producer-side)
- `tests/unit/c11_tile_manager/test_route_client.py` (consumer-side unit tests)
- `tests/integration/c11_tile_manager/test_route_client_e2e.py` (integration)
- `tests/e2e/replay/conftest.py`
- `tests/e2e/replay/_operator_pre_flight.py`
- `tests/e2e/replay/test_operator_pre_flight_driver.py`
- `tests/e2e/replay/test_operator_pre_flight_integration.py`
- `tests/e2e/replay/_e2e_orchestrator.py`
- `tests/e2e/replay/test_e2e_orchestrator_unit.py`
- `tests/fixtures/derkachi_c6/seed_route.py`
**HEAD test status (asserted, not measured here)**: per cycle-3 batch reports 104, 106, 107, 108, 108b, 109, every committed batch ended with passing tests at the per-batch full run. The cumulative review (FAIL on F1) is a static-analysis verdict, not a test-run verdict — no test failures are attributable to F1 today. Phase 4 will run the affected test files first; Phase 6 runs the project's full test gate per the existing-code flow's test policy.
## 3. Doc baseline (F2 surface area)
`_docs/02_document/module-layout.md` is stale relative to on-disk reality. The following entries diverge today:
**c11_tile_manager — Internal list** lists 2 files (`satellite_provider_downloader.py`, `satellite_provider_uploader.py`); on-disk has 8 internal files plus `route_client.py` (cycle-3 NEW). Missing entries: `_types.py`, `config.py`, `errors.py`, `idempotent_retry.py`, `signing_key.py`, `tile_downloader.py`, `tile_uploader.py`, `route_client.py`.
**shared/replay_input file list** lists `__init__.py`, `interface.py`, `tlog_video_adapter.py`, `auto_sync.py`, `tests/`; on-disk adds `errors.py` (cycle-2 carry), `tlog_ground_truth.py` (cycle-2 carry), `tlog_route.py` (cycle-3 NEW). After the relocation, `tlog_route.py` stays (it still owns `extract_route_from_tlog`); `_types/route.py` is added.
**Cycle-2 carry-overs** still unaddressed (out of this run's scope unless the user expands it; surfaced in F2 of the cumulative review):
- `replay_api/` package (7 files; needs Per-Component Mapping entry).
- `cli/render_map.py`, `cli/replay_api_entrypoint.py` (need Shared section entries).
- `helpers/gps_compare.py`, `helpers/accuracy_report.py` (need Shared section entries).
**Target post-run** (in-scope): c11_tile_manager Internal list refreshed (route_client + the 7 long-standing internals); shared/replay_input file list refreshed (tlog_route + tlog_ground_truth + errors); new `_types/route.py` registered. Cycle-2 carry-overs are deferred to a separate doc-only task unless user expands scope.
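The doc baseline is in essence a set difference between what `module-layout.md` registers and what exists on disk. A minimal sketch of that comparison follows, assuming the registered names have already been extracted from the document by hand; both helper names are hypothetical.

```python
from pathlib import Path


def python_files(package_dir: Path) -> set[str]:
    """Names of the .py files directly inside package_dir (non-recursive)."""
    return {p.name for p in package_dir.glob("*.py")}


def unregistered(registered: set[str], on_disk: set[str]) -> list[str]:
    """Files present on disk but absent from the documented list."""
    return sorted(on_disk - registered)


# Values copied from the shared/replay_input paragraph above.
registered = {"__init__.py", "interface.py", "tlog_video_adapter.py", "auto_sync.py"}
on_disk = registered | {"errors.py", "tlog_ground_truth.py", "tlog_route.py"}
missing = unregistered(registered, on_disk)
```

In a real check, `on_disk` would come from `python_files(...)` pointed at the package directory rather than from a literal set.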
## 4. Lint baseline (F3)
`tests/unit/test_az270_compose_root.py:194-219` (`test_ac6_only_compose_root_imports_concrete_strategies`) walks `src/gps_denied_onboard/components/**/*.py` and flags only edges whose `node.module` starts with `gps_denied_onboard.components.` AND whose leaf-component is not the importer's component. The full rule-9 allow-list (8 prefixes plus `frame_source` interface-only restriction) is NOT enforced.
**Concrete miss demonstrated by F1**: the c11 → replay_input edge passes this lint silently because `replay_input` is not under `components/`.
**Target post-run** (in-scope): expand the lint to enforce rule 9's full allow-list. Remaining design choices (whether to allow `frame_source` non-interface modules, whether to treat `runtime_root` exception case-sensitively) are addressed in C03's task spec.
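One possible shape for the widened check, using the stdlib `ast` module, is sketched below. It is not the C03 implementation: the function name is hypothetical, the allow-list is modelled as dotted prefixes, the `frame_source` interface-only restriction and the `runtime_root` exception are left out, and plain `import x.y` statements are not inspected.

```python
import ast

ROOT = "gps_denied_onboard"
# Rule-9 allow-list as dotted prefixes below ROOT (an assumption about how C03 encodes it).
ALLOWED = ("_types", "helpers", "config", "logging", "fdr_client", "clock", "frame_source")


def rule9_violations(source: str, own_component: str) -> list[str]:
    """Return project modules imported via `from ... import` that rule 9 does not allow."""
    own = f"{ROOT}.components.{own_component}"
    found = set()
    for node in ast.walk(ast.parse(source)):
        # Only absolute `from X import Y` statements are considered here.
        if not isinstance(node, ast.ImportFrom) or node.level != 0 or not node.module:
            continue
        module = node.module
        if not module.startswith(ROOT + "."):
            continue  # stdlib or third-party import
        if module == own or module.startswith(own + "."):
            continue  # the importer's own subpackage
        relative = module[len(ROOT) + 1:]
        if any(relative == p or relative.startswith(p + ".") for p in ALLOWED):
            continue
        found.add(module)
    return sorted(found)


sample_source = (
    "import os\n"
    "from gps_denied_onboard._types.geo import SomeGeoType\n"
    "def lazy():\n"
    "    from gps_denied_onboard.replay_input.tlog_route import RouteSpec\n"
    "    return RouteSpec\n"
)
violations = rule9_violations(sample_source, "c11_tile_manager")
```

Walking the AST rather than matching lines means imports nested inside functions are also seen, and the existing `components.<X> → components.<Y>` check falls out as a subset, because another component's package is never on the allow-list.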
## 5. Functionality inventory
This run touches no public-feature surfaces. The DTO `RouteSpec` continues to be re-exported from `gps_denied_onboard.replay_input` (the public package), so consumers using `from gps_denied_onboard.replay_input import RouteSpec` see no change. Consumers reaching into `replay_input.tlog_route` directly (an internal-module path) will need their imports updated — this set is small and lives entirely under `tests/`. There is no operator-facing CLI / endpoint / config schema change.
## Self-verification
- [x] RUN_DIR created with auto-incremented prefix (`02-az507-routespec-relocation`; previous: `01-testability-refactoring`)
- [x] All metric categories reasoned about — standard categories noted N/A with reason; relevant baselines (structural, test, doc, lint) captured
- [x] Functionality inventory complete (no functionality change in scope)
- [x] Measurements are reproducible (rg + glob commands documented)
## BLOCKING — Phase 0 gate
Awaiting user confirmation of:
1. The minimal-baseline rationale (no LOC/coverage/perf metrics for a mechanical relocation).
2. The structural / test / doc / lint baseline above as the "before" state Phase 6 will compare against.
3. The scope decision: cycle-2 doc carry-overs are **OUT** of this run unless explicitly expanded.
If confirmed, Phase 1 produces `RUN_DIR/list-of-changes.md` (already drafted alongside this file as the guided-mode input).
@@ -0,0 +1,59 @@
# Logical Flow Analysis — Run 02-az507-routespec-relocation
**Date**: 2026-05-23
**Scope**: data path of `RouteSpec` from producer (replay_input) to consumer (c11_tile_manager) and back to operator-pre-flight orchestration
## Documented flow (from architecture / Epic AZ-835 spec)
```
tlog (binary) ──► extract_route_from_tlog (replay_input/tlog_route)
                    └─► RouteSpec (frozen dataclass, immutable)
                          └─► SatelliteProviderRouteClient.seed_region (components/c11_tile_manager/route_client)
                                └─► RouteSeedResult ─► satellite-provider POST /api/satellite/route
                                                     ─► (HTTP success) tile coverage primed
```
## Trace through code (HEAD)
| Step | File | Behaviour |
|------|------|-----------|
| 1. Produce | `replay_input/tlog_route.py:166` (`extract_route_from_tlog` return) | Constructs `RouteSpec(waypoints, suggested_region_size_meters, source_tlog, source_segment, total_distance_meters)` |
| 2. Hold | (consumer-side variable) | `RouteSpec` instance is `frozen=True, slots=True` — cannot be mutated by either side |
| 3. Consume | `components/c11_tile_manager/route_client.py:56` import | Reads `route.waypoints`, `route.suggested_region_size_meters` to build the satellite-provider POST body |
| 4. Validate | `components/c11_tile_manager/route_client.py` (RouteValidationError path) | Validates `route` shape against c11's RouteValidationError preconditions; pure read access |
| 5. Carry | `tests/e2e/replay/_operator_pre_flight.py:72` import | Operator-pre-flight harness threads the same RouteSpec through the e2e flow |
## Identity & equality semantics post-relocation
The relocation moves the **definition** of `RouteSpec` from `gps_denied_onboard.replay_input.tlog_route` to `gps_denied_onboard._types.route`. After the move:
- Python's class identity is preserved across imports: `gps_denied_onboard.replay_input.tlog_route.RouteSpec is gps_denied_onboard._types.route.RouteSpec` evaluates to `True` (the same class object is bound at two names).
- `dataclasses.is_dataclass(...)`, `isinstance(...)`, `__eq__`, and `__hash__` are unchanged because they derive from the class object, not from the import path.
- `frozen=True, slots=True` semantics are preserved (no per-instance dict, no setattr after construction).
- The `__module__` attribute of the class becomes `gps_denied_onboard._types.route` (not `gps_denied_onboard.replay_input.tlog_route`). This is observable via:
- `pickle` (module path is encoded; pickled objects from before the move would fail to unpickle after — but no production code path pickles `RouteSpec`; checked: no `pickle.dumps(route)` or equivalent in src/ or tests/)
- `repr(RouteSpec)` (shows `<class 'gps_denied_onboard._types.route.RouteSpec'>` post-move)
- `RouteSpec.__module__` (changes — but no test inspects this; checked: no `__module__` assertion in tests/)
## Contradictions / data-loss / wasted-work checks
Per Phase 1 step 1c categories:
- **Fixed-size vs dynamic-size assumptions**: N/A — `RouteSpec.waypoints` is `tuple[tuple[float, float], ...]`, length is data-driven (1 to `max_waypoints`). No fixed-size pad/truncate path.
- **Loop scoping**: N/A — RouteSpec is a leaf DTO, no internal loop semantics.
- **Wasted computation**: N/A — relocation does not change call sites.
- **Silent data loss**: N/A — relocation is a name-only change at the type level; the values stored in `RouteSpec` instances are unchanged.
- **Doc drift**: confirmed by F2 of cumulative review — `module-layout.md` diverges from on-disk reality. Remediation is in scope as C02.
## Cross-component edge analysis (rule-9 audit, post-relocation)
| Edge | Importer | Imported | Allow-listed? | Status |
|------|----------|----------|---------------|--------|
| Pre-relocation | `c11_tile_manager/route_client.py` | `replay_input.tlog_route.RouteSpec` | NO | violation (F1) |
| Post-relocation | `c11_tile_manager/route_client.py` | `_types.route.RouteSpec` | YES (`_types/*` is in c11's allow-list) | compliant |
No other rule-9 cross-component edge becomes a violation as a side effect of this move. The producer side (`replay_input/tlog_route.py` → `_types/route.py`) is a coordinator → DTO edge, which is always allowed (DTOs have no allow-list restriction; they're consumed everywhere).
## Conclusion
The relocation is a pure structural change with no behavioural, performance, or contract-shape side effects. The only observable difference is `RouteSpec.__module__`, which is not asserted on by any code path. Phase 4 execution can proceed as a mechanical move; Phase 6 verification is satisfied if all tests pass and the rule-9 audit reports zero violations.
@@ -0,0 +1,71 @@
# List of Changes
**Run**: 02-az507-routespec-relocation
**Mode**: guided
**Source**: `_docs/03_implementation/cumulative_review_batches_104-109_cycle3_report.md` (cycle-3 cumulative review, FAIL verdict, F1 + F2 + F3)
**Date**: 2026-05-23
## Summary
Resolve the cycle-3 cumulative review's FAIL verdict by (a) relocating the `RouteSpec` DTO to its rule-9-compliant home in `_types/route.py`, (b) refreshing the stale `module-layout.md` cycle-3 file inventory, and (c) widening the AZ-270 lint to enforce the full rule-9 allow-list rather than only `components → components` edges. The work is mechanical — no behaviour, no performance, no contract shape changes.
## Changes
### C01: Relocate `RouteSpec` DTO from `replay_input/tlog_route.py` to `_types/route.py`
- **File(s)**:
- **NEW**: `src/gps_denied_onboard/_types/route.py` — owns the `RouteSpec` dataclass definition (frozen, slots, with full docstring carried over verbatim).
- **MOD**: `src/gps_denied_onboard/replay_input/tlog_route.py`: remove the local `RouteSpec` class definition (lines 54-79); add `from gps_denied_onboard._types.route import RouteSpec` near the existing `_types.geo` import; keep `RouteSpec` in `__all__` so `from replay_input.tlog_route import RouteSpec` continues to resolve (test code uses this path; it's a re-export, not a violation).
- **MOD**: `src/gps_denied_onboard/replay_input/__init__.py` — change line 34 to import `RouteSpec` from `gps_denied_onboard._types.route` directly (canonical), keep importing `RouteExtractionError` and `extract_route_from_tlog` from `tlog_route` (they stay there).
- **MOD**: `src/gps_denied_onboard/components/c11_tile_manager/route_client.py:56`: change to `from gps_denied_onboard._types.route import RouteSpec` (the actual rule-9 fix). Also update the docstring snippet at file-top from `Takes a gps_denied_onboard.replay_input.tlog_route.RouteSpec` to `Takes a gps_denied_onboard._types.route.RouteSpec`.
- **MOD (optional, hygiene)**: test imports — 5 test files (`tests/unit/replay_input/test_tlog_route.py:46`, `tests/unit/c11_tile_manager/test_route_client.py:49`, `tests/e2e/replay/_operator_pre_flight.py:72`, `tests/e2e/replay/test_e2e_orchestrator_unit.py:37`, `tests/e2e/replay/test_operator_pre_flight_driver.py:61`) currently import `RouteSpec` from `replay_input.tlog_route`. They continue to work via the re-export (see above). Updating them to import from `_types.route` is hygiene, not correctness; recommended but not blocking. The integration test `tests/integration/c11_tile_manager/test_route_client_e2e.py:26` imports `extract_route_from_tlog` (not `RouteSpec`) — no change needed. The lazy import in `tests/e2e/replay/conftest.py:406` and the CLI fixture `tests/fixtures/derkachi_c6/seed_route.py:80` import `extract_route_from_tlog` only — no change needed.
- **Problem**: `components/c11_tile_manager/route_client.py:56` imports `RouteSpec` from `gps_denied_onboard.replay_input.tlog_route`. Per `module-layout.md` rule 9, `components/<X>/*.py` may only import from a finite allow-list (`_types/*`, `_types.inference_errors`, `helpers/*`, `config`, `logging`, `fdr_client`, `clock`, `frame_source` interface only). `replay_input` is not in this list — it's a Layer-4 cross-cutting coordinator, and Layer-4 → Layer-4 cross-cutting edges are not declared as allowed in the layering table. The import was committed in batch 107 (AZ-838); the AZ-270 lint did not catch it because the lint walks only `components → components` edges (see C03).
- **Change**: Move the DTO definition to `_types/route.py`, where it sits among the other shared DTOs (`_types/geo.py`, `_types/tile.py`, `_types/inference.py`, etc.). Update the c11 import to point at the new location. Producer-side (`replay_input/tlog_route.py`) re-imports the DTO so its own return type, `__all__`, and existing test imports keep working — that's a coordinator importing from `_types/*`, a flow that is always allowed for non-`components/<X>` modules.
- **Rationale**: `_types/*` is the architecturally designated home for cross-component DTOs (per AZ-507; per `_docs/02_document/architecture.md` `## Architecture Vision`). Every other shared DTO already lives there. Putting `RouteSpec` there makes the c11 → DTO edge a `components/<X>` → `_types/*` edge, which is allow-listed. This matches the pattern for `_types/inference.py`, `_types/tile.py`, `_types/calibration.py`, `_types/pose.py`, etc., which is the user-confirmed precedent.
- **Constraint Fit**:
- AZ-507 cross-component contract surface — satisfied (the violating edge becomes compliant).
- Epic AZ-835 acceptance criteria — preserved; behaviour unchanged.
- `RouteSpec` immutability (`frozen=True, slots=True`) — preserved verbatim.
- Backward compatibility for producer-side test imports (`from replay_input.tlog_route import RouteSpec`) — preserved via re-export.
- No public-API / CLI / endpoint shape change — confirmed in baseline_metrics §5.
- **Risk**: low (mechanical move; identity-preserving; logical-flow analysis confirms no observable side effects beyond `__module__`, which no code asserts on).
- **Dependencies**: None.
### C02: Refresh `module-layout.md` to register cycle-3 additions + new `_types/route.py`
- **File(s)**: `_docs/02_document/module-layout.md` (single file).
- **Problem**: The cumulative review's F2 surfaces that `module-layout.md` is stale. Cycle-2 carry-overs are still unaddressed; cycle 3 added more entries that are not registered. Specifically:
  - **c11_tile_manager Internal list** is missing `_types.py`, `config.py`, `errors.py`, `idempotent_retry.py`, `signing_key.py`, `tile_downloader.py`, `tile_uploader.py`, **`route_client.py`** (cycle-3 NEW from batch 107).
  - **shared/replay_input file list** is missing `errors.py` (cycle-2 carry), `tlog_ground_truth.py` (cycle-2 carry), **`tlog_route.py`** (cycle-3 NEW from batch 106).
  - **`_types/` file list** does not yet include `route.py` (added in C01).
- **Change**: Append the missing entries to bring `module-layout.md` in sync with on-disk reality for the c11_tile_manager, replay_input, and `_types/` sections. Add `_types/route.py` to the `_types/` section with a one-line description (consistent with how the other `_types/*.py` files are listed). Cycle-2 carry-overs *outside* these three sections (`replay_api/`, `cli/render_map.py`, `cli/replay_api_entrypoint.py`, `helpers/gps_compare.py`, `helpers/accuracy_report.py`) are NOT in this run's scope — they remain on the cycle-3 retrospective list and should be addressed in a follow-up doc task that is independent of the architectural fix here.
- **Rationale**: `/implement` Step 4 (File Ownership) treats `module-layout.md` as authoritative; staleness there is a BLOCKING gate when a future task touches an unregistered area. F2 is currently Medium; the cumulative review notes severity escalates to High if a fourth consecutive cycle leaves it stale. Resolving the cycle-3 portion now keeps the fix scoped to the same surface as C01 + the route_client + tlog_route additions that triggered the cumulative review in the first place.
- **Constraint Fit**:
  - `module-layout.md` rule 9 — strengthened (the document now reflects what `_types/*` actually owns).
  - No code or behavioural change.
  - Scope discipline — does NOT pull in cycle-2 carry-overs outside the run's three sections; they are deferred to a separate task.
- **Risk**: low (doc-only; reviewable by diff; no test impact).
- **Dependencies**: C01 (the `_types/route.py` entry depends on the file existing).
### C03: Expand `test_az270_compose_root.test_ac6_only_compose_root_imports_concrete_strategies` to enforce the full rule-9 allow-list
- **File(s)**:
  - **MOD**: `tests/unit/test_az270_compose_root.py:194-219` — replace the current narrow check (`node.module.startswith("gps_denied_onboard.components.")` with a different leaf-component) with a check that walks `components/**/*.py`, parses each `ImportFrom`, and for any `node.module` starting with `gps_denied_onboard.` asserts the importable target is in the rule-9 allow-list (i.e. matches one of: `gps_denied_onboard.components.<own-component>.*`, `gps_denied_onboard._types.*`, `gps_denied_onboard._types.inference_errors`, `gps_denied_onboard.helpers.*`, `gps_denied_onboard.config`, `gps_denied_onboard.logging`, `gps_denied_onboard.fdr_client`, `gps_denied_onboard.clock`, `gps_denied_onboard.frame_source` interface-only).
  - **MOD (test docstring)**: update the test's docstring to cite the full rule-9 paragraph, not just AC-6 of AZ-270.
- **Problem**: F3 of the cumulative review documents that `module-layout.md` rule 9 is described as "enforced by `test_az270_compose_root.test_ac6_only_compose_root_imports_concrete_strategies`", but the lint actually checks only one of the eight allow-listed prefixes — only the `gps_denied_onboard.components.<other_component>` exclusion. Imports from `replay_input`, `replay_api`, `runtime_root`, `cli/*`, and `frame_source` non-interface modules pass silently. F1 is the concrete consequence; the next task that imports from a similarly-placed module would compound the drift.
- **Change**: Widen the AST walker to a single-branch decision: "is the imported module rooted in `gps_denied_onboard.` AND not in the rule-9 allow-list (parameterised against the importer's own component for the `components.<own>.*` clause)? → fail with a message that names the offending edge and the rule." The existing error message format (compose-root test failure) is preserved; only the predicate is widened.
- **Rationale**: Lint coverage matters more than rule wording. F3 surfaces a maintainability risk: the rule and its enforcement diverge silently. Closing the gap forecloses the F1 class of regressions at lint time, not at cumulative-review time.
- **Constraint Fit**:
  - `module-layout.md` rule 9 — enforced as documented.
  - Existing AZ-270 AC-6 — preserved (the new check is a strict superset of the old check).
  - No behaviour change in production code.
  - Self-check: running the widened lint at HEAD (before C01 lands) reproduces F1 as a lint failure; running it at the C01 + C02 tip reports zero violations. This is the test the run hinges on.
- **Risk**: medium — the widening will catch any *other* in-flight rule-9 violation hiding in the codebase, which could surface a second remediation task. If the widened lint exposes an unrelated violation, the implement skill should STOP and surface it for a scope decision rather than auto-bundle. Risk is reduced by the fact that rule-9 audits during code review have not flagged anything else.
- **Dependencies**: C01 must land first (otherwise the widened lint fails on the very edge C01 fixes; running tests in the order C01 → C03 means C03 sees a clean baseline). C02 ordering is independent.
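
The widened predicate described in C03 could look roughly like the sketch below. It is an illustration under stated assumptions, not the actual test: `is_allowed` and `find_violations` are hypothetical names, the allow-list is copied from the rule-9 wording above, the `frame_source` interface-only restriction is simplified to a plain prefix match, and only absolute `from ... import ...` statements are inspected.

```python
from __future__ import annotations

import ast

ROOT = "gps_denied_onboard"

# Rule-9 allow-list as quoted in C03 (`_types.inference_errors` is already
# covered by the `_types` prefix). `frame_source` is simplified here: the
# real rule allows the interface module only.
ALLOWED_PREFIXES = (
    "_types",
    "helpers",
    "config",
    "logging",
    "fdr_client",
    "clock",
    "frame_source",
)


def is_allowed(module: str, own_component: str) -> bool:
    """Return True if a component may import `module` under rule 9."""
    if module != ROOT and not module.startswith(ROOT + "."):
        return True  # stdlib / third-party imports are out of scope
    rest = module[len(ROOT) + 1:]
    own = f"components.{own_component}"
    if rest == own or rest.startswith(own + "."):
        return True  # a component may import from itself
    return any(rest == p or rest.startswith(p + ".") for p in ALLOWED_PREFIXES)


def find_violations(source: str, own_component: str) -> list[str]:
    """List imported modules in `source` that break the allow-list."""
    tree = ast.parse(source)
    return [
        node.module
        for node in ast.walk(tree)
        if isinstance(node, ast.ImportFrom)
        and node.level == 0  # skip relative imports in this sketch
        and node.module is not None
        and not is_allowed(node.module, own_component)
    ]


SAMPLE = (
    "from gps_denied_onboard.replay_input.tlog_route import RouteSpec\n"
    "from gps_denied_onboard._types.route import RouteSpec\n"
    "from gps_denied_onboard.components.c11_tile_manager.errors import X\n"
    "from gps_denied_onboard.components.c6_tile_cache.errors import Y\n"
    "from pathlib import Path\n"
)
violations = find_violations(SAMPLE, own_component="c11_tile_manager")
```

Run against the sample, the sketch flags the `replay_input` edge (the F1 case) and the cross-component `c6_tile_cache` edge, while the `_types` and own-component imports pass. A real test would derive `own_component` from each file's path under `components/`.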
## Out of scope for this run
- **Cycle-2 module-layout carry-overs** outside the three sections C02 touches (`replay_api/` Per-Component Mapping, `cli/render_map.py`, `cli/replay_api_entrypoint.py`, `helpers/gps_compare.py`, `helpers/accuracy_report.py`) — recorded as cycle-3 retrospective follow-up; needs a separate doc task with its own AZ ID.
- **Contract documentation for `RouteSpec` at `_docs/02_document/contracts/shared_types/route.md`** — the cumulative review noted this as a possible Spec-Gap follow-up. It is a documentation addition, not a refactor. Defer to whoever owns the Spec-Gap workflow; do not bundle here.
- **`architecture_compliance_baseline.md`** — separate cycle-2 retrospective action that has been outstanding for two cycles; recorded as such in the cumulative review's footer note. Out of this run's scope.
+3 -3
@@ -6,9 +6,9 @@ step: 10
 name: Implement
 status: in_progress
 sub_step:
-  phase: 7
-  name: batch-loop
-  detail: ""
+  phase: 3
+  name: refactor-safety-net
+  detail: "02-az507; Phase 2 confirmed; ready for Phase 3 safety-net check in fresh session"
 retry_count: 0
 cycle: 3
 tracker: jira
@@ -1,7 +1,7 @@
# D-CROSS-CVE-1 opencv-python pin deferred — gtsam/numpy ABI block
**Recorded**: 2026-05-11T02:55+03:00 (Europe/Kyiv)
-**Last replay attempt**: 2026-05-23T13:14+03:00 (Europe/Kyiv) — replay re-checked
+**Last replay attempt**: 2026-05-23T13:44+03:00 (Europe/Kyiv) — replay re-checked
at start of next `/autodev` invocation. PyPI re-queried via
`python3 -m pip index versions gtsam`: only `gtsam 4.2` is published.
Replay condition (numpy>=2 stable wheels) still NOT met. Leftover remains open.
+655
@@ -0,0 +1,655 @@
"""E2E orchestrator for the AZ-835 7-step pipeline (AZ-840 / Epic AZ-835 C4).

Wraps the AZ-699 verdict-report writing path with the AZ-839 C3
fixture's ``PopulatedC6Cache`` so a single Tier-2 test can run from
``(tlog, video, calibration)`` to a horizontal-error report without
operator hand-curation between steps. The 7-step Epic narrative
(``_docs/02_tasks/todo/AZ-840_e2e_orchestrator_test.md``):

1. Active flight cut + tlog/video sync: handled by ``gps-denied-replay``
   ``--auto-trim`` (AZ-405 / AZ-698) inside the airborne binary.
2. On-fly frame + IMU extraction: same binary's per-frame loop.
3. Auto-create route: done by the C3 fixture
   (``operator_pre_flight_setup`` calls ``extract_route_from_tlog``).
4. POST route to satellite-provider: C3 fixture (AZ-838
   ``SatelliteProviderRouteClient.seed_route``).
5. Build FAISS index: C3 fixture (AZ-322 ``DescriptorBatcher``).
6. Run gps-denied airborne pipeline: this module's
   ``_run_replay_subprocess`` invokes ``gps-denied-replay`` against
   the populated cache.
7. Get GPS fixes, check vs tlog GPS: this module's
   ``_load_ground_truth`` + ``horizontal_error_distribution`` +
   ``render_report`` writes the verdict markdown.

The C3 fixture mutates ``c6_tile_cache.root_dir`` to point at a
``tmp_path_factory.mktemp`` value (AZ-839 batch 108b). The static
operator YAML at ``GPS_DENIED_OPERATOR_CONFIG_PATH`` cannot know
that path. ``write_effective_replay_config`` reads the static YAML,
overlays the ``c6_tile_cache.root_dir`` override, writes the merged
result to a tmp file, and returns the path the airborne binary
will load via ``--config``. This keeps a single source of truth
for the cache_root override across the in-memory C3 fixture path
and the subprocess airborne path.

Public surface re-exported from this module:

* :class:`OrchestratorStep`: failure-step labels per AC-5 ("fails
  LOUD with a clear error pointing at the failing step").
* :class:`OrchestrationFailure`: wraps the underlying exception
  with the step that produced it.
* :class:`OrchestrationReport`: return value of
  :func:`run_e2e_orchestration` (verdict, distribution, paths,
  wall-clock measurements per AC-4).
* :func:`write_effective_replay_config`: small helper for the
  config merge step.
* :func:`run_e2e_orchestration`: the AC-1 entry point.
"""

from __future__ import annotations

import datetime
import json
import logging
import subprocess
import time
from collections.abc import Callable, Mapping
from dataclasses import dataclass
from enum import Enum
from pathlib import Path
from typing import Any

import yaml

from gps_denied_onboard.helpers.accuracy_report import (
    AC3_GATE_PCT,
    AC3_GATE_THRESHOLD_M,
    ReportContext,
    render_report,
    verdict_passes_ac3,
)
from gps_denied_onboard.helpers.gps_compare import (
    GroundTruthRow,
    HorizontalErrorDistribution,
    horizontal_error_distribution,
)
from gps_denied_onboard.replay_input import load_tlog_ground_truth
from tests.e2e.replay._operator_pre_flight import PopulatedC6Cache

__all__ = [
    "OrchestrationFailure",
    "OrchestrationReport",
    "OrchestratorStep",
    "read_calibration_acquisition_method",
    "run_e2e_orchestration",
    "write_effective_replay_config",
]

# Replay-subprocess wall-clock cap for the Derkachi clip per AZ-840
# AC-4 (15 min soft target). Exposed as a default that the integration
# test can override; the unit tests rely on the contract that the
# runner argument is a free callable.
_DEFAULT_MAX_SECONDS: float = 900.0

_LOGGER = logging.getLogger("tests.e2e.replay.e2e_orchestrator")


class OrchestratorStep(str, Enum):
    """Labels for the 7-step pipeline used by :class:`OrchestrationFailure`.

    AC-5: every failure that reaches the test surface must name the
    step that produced it. The string values are stable so test
    assertions and log readers can match on them.
    """

    VALIDATE_INPUTS = "validate_inputs"
    WRITE_EFFECTIVE_CONFIG = "write_effective_config"
    AIRBORNE_PIPELINE = "airborne_pipeline"
    PARSE_EMISSIONS = "parse_emissions"
    LOAD_GROUND_TRUTH = "load_ground_truth"
    COMPUTE_DISTRIBUTION = "compute_distribution"
    RENDER_REPORT = "render_report"


class OrchestrationFailure(RuntimeError):
    """Failure inside one of the 7 orchestration steps (AC-5).

    The :attr:`step` attribute names the failing step; the message
    embeds it as the prefix so plain log-readers see the failure
    location without inspecting the exception object.
    """

    def __init__(self, step: OrchestratorStep, message: str) -> None:
        super().__init__(f"[{step.value}] {message}")
        self.step = step


@dataclass(frozen=True, slots=True)
class OrchestrationReport:
    """Return value of :func:`run_e2e_orchestration`.

    Attributes:
        verdict_passed: ``True`` iff the run met the AZ-696 epic
            AC-3 gate (>= AC3_GATE_PCT% within AC3_GATE_THRESHOLD_M m).
        distribution: Computed horizontal-error distribution.
        report_path: Markdown report written under ``report_dir``.
        emissions_count: Total estimator-output records consumed.
        wall_clock_s: Wall-clock seconds for the orchestration run
            (excludes the C3 fixture setup; covers steps 1-2-6-7).
        replay_subprocess_seconds: Wall-clock seconds the airborne
            replay subprocess took. Always <= ``wall_clock_s``.
    """

    verdict_passed: bool
    distribution: HorizontalErrorDistribution
    report_path: Path
    emissions_count: int
    wall_clock_s: float
    replay_subprocess_seconds: float


def read_calibration_acquisition_method(calibration_path: Path) -> str:
    """Return the AZ-702 ``acquisition_method`` field, or ``"unknown"``.

    Mirrors ``test_derkachi_real_tlog._read_calibration_acquisition_method``
    so the AZ-840 verdict report can name the calibration provenance
    in its failure message (AZ-699 AC-3). Pure helper; the report
    writer needs the string, not the JSON.
    """
    try:
        data = json.loads(calibration_path.read_text())
    except (OSError, json.JSONDecodeError):
        return "unknown"
    method = data.get("acquisition_method")
    if isinstance(method, str) and method:
        return method
    return "unknown"


def write_effective_replay_config(
    *,
    base_config_path: Path,
    cache_root: Path,
    output_path: Path,
) -> Path:
    """Merge cache_root override into the static operator YAML.

    Reads ``base_config_path`` as YAML, sets the
    ``c6_tile_cache.root_dir`` to ``cache_root`` (forcing the
    FAISS index path to fall back to ``<cache_root>/descriptor.index``),
    and writes the merged document to ``output_path`` as YAML.

    The merge is field-level: every other block in the base YAML is
    preserved verbatim. This keeps a single source of truth for the
    operator config; the test harness only contributes the dynamic
    cache_root.

    Returns:
        The ``output_path`` argument, for ergonomic chaining.

    Raises:
        OrchestrationFailure (step=WRITE_EFFECTIVE_CONFIG): Base YAML
            unreadable, malformed, or not a top-level mapping.
    """
    try:
        base_text = base_config_path.read_text()
    except OSError as exc:
        raise OrchestrationFailure(
            OrchestratorStep.WRITE_EFFECTIVE_CONFIG,
            f"cannot read base config at {base_config_path}: {exc!r}",
        ) from exc
    try:
        base_data = yaml.safe_load(base_text) or {}
    except yaml.YAMLError as exc:
        raise OrchestrationFailure(
            OrchestratorStep.WRITE_EFFECTIVE_CONFIG,
            f"base config YAML at {base_config_path} is malformed: {exc!r}",
        ) from exc
    if not isinstance(base_data, dict):
        raise OrchestrationFailure(
            OrchestratorStep.WRITE_EFFECTIVE_CONFIG,
            f"base config YAML at {base_config_path} must be a mapping; "
            f"got {type(base_data).__name__}",
        )
    c6_block_raw = base_data.get("c6_tile_cache")
    c6_block = dict(c6_block_raw) if isinstance(c6_block_raw, dict) else {}
    c6_block["root_dir"] = str(cache_root)
    c6_block["faiss_index_path"] = ""
    base_data["c6_tile_cache"] = c6_block
    try:
        output_path.parent.mkdir(parents=True, exist_ok=True)
        output_path.write_text(
            yaml.safe_dump(base_data, sort_keys=True, default_flow_style=False)
        )
    except OSError as exc:
        raise OrchestrationFailure(
            OrchestratorStep.WRITE_EFFECTIVE_CONFIG,
            f"cannot write effective config at {output_path}: {exc!r}",
        ) from exc
    return output_path


def run_e2e_orchestration(
    *,
    populated_cache: PopulatedC6Cache,
    base_config_path: Path,
    tlog_path: Path,
    video_path: Path,
    calibration_path: Path,
    signing_key_path: Path,
    replay_binary: Path,
    output_path: Path,
    report_dir: Path,
    effective_config_path: Path,
    run_date_utc: str | None = None,
    runner: Callable[..., subprocess.CompletedProcess[str]] = subprocess.run,
    subprocess_env: Mapping[str, str] | None = None,
    max_seconds: float = _DEFAULT_MAX_SECONDS,
    logger: logging.Logger | None = None,
) -> OrchestrationReport:
    """Run AZ-835 steps 1-7 against the AZ-839 populated cache.

    Steps 3-5 are the responsibility of ``populated_cache`` (the
    AZ-839 C3 fixture); this function covers 1-2-6 (the airborne
    replay subprocess) and 7 (verdict report). The C3 fixture and
    this function share the cache_root via
    :func:`write_effective_replay_config` so the airborne binary
    reads the same FAISS index the fixture wrote (AC-3).

    Args:
        populated_cache: C3 fixture output (AZ-839). Carries
            ``cache_root``, ``faiss_index_path``, and the route
            spec the test pipeline produced.
        base_config_path: Static operator config YAML
            (``GPS_DENIED_OPERATOR_CONFIG_PATH``). Must register
            ``c6_tile_cache``, ``c10_provisioning``, ``c2_vpr``,
            ``c4_pose``, and ``c5_state`` blocks for the airborne
            binary to compose the replay graph.
        tlog_path: ArduPilot binary tlog the test consumes.
        video_path: Flight video file the test consumes.
        calibration_path: Camera calibration JSON (AZ-702
            factory-sheet for Derkachi).
        signing_key_path: MAVLink signing-key file. Replay protocol
            Invariant 11: required even for the noop transport.
        replay_binary: ``gps-denied-replay`` console-script path.
        output_path: Where the airborne binary writes JSONL
            estimator emissions.
        report_dir: Directory the verdict markdown is written to.
        effective_config_path: Where the cache_root-merged YAML is
            written. The path is passed to the airborne binary via
            ``--config``.
        run_date_utc: ISO-8601 date for the report filename and
            header. Defaults to today UTC.
        runner: ``subprocess.run`` by default; tests inject a fake
            that emits a synthetic JSONL output.
        subprocess_env: Optional environment for the replay
            subprocess, passed to the runner as ``env`` (it replaces
            the inherited environment rather than merging into it).
            ``None`` inherits ``os.environ``.
        max_seconds: Hard wall-clock cap for the airborne replay
            subprocess. The orchestrator times out the runner via
            its ``timeout`` kwarg; an exceeded budget surfaces as
            ``OrchestrationFailure(step=AIRBORNE_PIPELINE)``.
        logger: Optional logger. Defaults to the module logger.

    Returns:
        :class:`OrchestrationReport` on success. The verdict can
        be PASS or FAIL; AC-2 mandates the report exists either
        way.

    Raises:
        OrchestrationFailure: Any of the 7 steps failed. The
            ``step`` attribute names the failing step.
    """
    log = logger or _LOGGER
    started = time.monotonic()
    effective_run_date = run_date_utc or (
        datetime.datetime.now(datetime.timezone.utc).date().isoformat()
    )
    _validate_inputs(
        base_config_path=base_config_path,
        tlog_path=tlog_path,
        video_path=video_path,
        calibration_path=calibration_path,
        signing_key_path=signing_key_path,
        replay_binary=replay_binary,
        report_dir=report_dir,
    )
    write_effective_replay_config(
        base_config_path=base_config_path,
        cache_root=populated_cache.cache_root,
        output_path=effective_config_path,
    )
    replay_subprocess_seconds = _run_replay_subprocess(
        replay_binary=replay_binary,
        video_path=video_path,
        tlog_path=tlog_path,
        output_path=output_path,
        calibration_path=calibration_path,
        config_path=effective_config_path,
        signing_key_path=signing_key_path,
        max_seconds=max_seconds,
        runner=runner,
        env=subprocess_env,
        logger=log,
    )
    emissions = _parse_jsonl(output_path)
    ground_truth = _load_ground_truth(tlog_path)
    distribution = _compute_distribution(emissions, ground_truth)
    context = ReportContext(
        run_date_utc=effective_run_date,
        tlog_path=tlog_path,
        video_path=video_path,
        calibration_acquisition_method=read_calibration_acquisition_method(
            calibration_path
        ),
        clip_duration_s=(
            ground_truth[-1].t_s - ground_truth[0].t_s
            if ground_truth
            else 0.0
        ),
        emissions_count=len(emissions),
    )
    verdict_passed = verdict_passes_ac3(distribution)
    report_path = _render_and_write_report(
        distribution=distribution,
        context=context,
        passed=verdict_passed,
        report_dir=report_dir,
    )
    log.info(
        "e2e_orchestrator: report written",
        extra={
            "kind": "e2e_orchestrator.report_written",
            "kv": {
                "report_path": str(report_path),
                "verdict_passed": verdict_passed,
                "share_within_threshold_pct": (
                    distribution.threshold_hit_share.get(
                        AC3_GATE_THRESHOLD_M, 0.0
                    )
                    * 100.0
                ),
                "ac3_gate_pct": AC3_GATE_PCT,
                "emissions_count": len(emissions),
                "ground_truth_pairings": distribution.count,
            },
        },
    )
    wall_clock_s = max(0.0, time.monotonic() - started)
    return OrchestrationReport(
        verdict_passed=verdict_passed,
        distribution=distribution,
        report_path=report_path,
        emissions_count=len(emissions),
        wall_clock_s=wall_clock_s,
        replay_subprocess_seconds=replay_subprocess_seconds,
    )


def _validate_inputs(
    *,
    base_config_path: Path,
    tlog_path: Path,
    video_path: Path,
    calibration_path: Path,
    signing_key_path: Path,
    replay_binary: Path,
    report_dir: Path,
) -> None:
    """Fail fast on missing inputs (AC-5 — surface the failing step early)."""
    file_inputs: tuple[tuple[str, Path], ...] = (
        ("base_config_path", base_config_path),
        ("tlog_path", tlog_path),
        ("video_path", video_path),
        ("calibration_path", calibration_path),
        ("signing_key_path", signing_key_path),
        ("replay_binary", replay_binary),
    )
    for label, path in file_inputs:
        if not path.is_file():
            raise OrchestrationFailure(
                OrchestratorStep.VALIDATE_INPUTS,
                f"{label} is not a file: {path}",
            )
    try:
        report_dir.mkdir(parents=True, exist_ok=True)
    except OSError as exc:
        raise OrchestrationFailure(
            OrchestratorStep.VALIDATE_INPUTS,
            f"report_dir {report_dir} cannot be created: {exc!r}",
        ) from exc


def _run_replay_subprocess(
    *,
    replay_binary: Path,
    video_path: Path,
    tlog_path: Path,
    output_path: Path,
    calibration_path: Path,
    config_path: Path,
    signing_key_path: Path,
    max_seconds: float,
    runner: Callable[..., subprocess.CompletedProcess[str]],
    env: Mapping[str, str] | None,
    logger: logging.Logger,
) -> float:
    """Invoke gps-denied-replay with --auto-trim; return wall-clock seconds.

    Wraps :func:`subprocess.run` so unit tests can inject a fake
    runner. ``--auto-trim`` is always enabled here: the
    orchestrator owns the AZ-405 / AZ-698 sync path (AZ-840 step 1).

    Raises:
        OrchestrationFailure (step=AIRBORNE_PIPELINE): Non-zero exit,
            timeout, or runner-level OSError.
    """
    argv = [
        str(replay_binary),
        "--video",
        str(video_path),
        "--tlog",
        str(tlog_path),
        "--output",
        str(output_path),
        "--camera-calibration",
        str(calibration_path),
        "--config",
        str(config_path),
        "--mavlink-signing-key",
        str(signing_key_path),
        "--pace",
        "asap",
        "--auto-trim",
    ]
    started = time.monotonic()
    try:
        completed = runner(
            argv,
            capture_output=True,
            text=True,
            timeout=max_seconds,
            env=dict(env) if env is not None else None,
        )
    except subprocess.TimeoutExpired as exc:
        raise OrchestrationFailure(
            OrchestratorStep.AIRBORNE_PIPELINE,
            f"gps-denied-replay timed out after {max_seconds:.0f} s",
        ) from exc
    except OSError as exc:
        raise OrchestrationFailure(
            OrchestratorStep.AIRBORNE_PIPELINE,
            f"cannot launch gps-denied-replay at {replay_binary}: {exc!r}",
        ) from exc
    elapsed_s = max(0.0, time.monotonic() - started)
    if completed.returncode != 0:
        raise OrchestrationFailure(
            OrchestratorStep.AIRBORNE_PIPELINE,
            f"gps-denied-replay exited {completed.returncode}\n"
            f"stdout:\n{completed.stdout}\nstderr:\n{completed.stderr}",
        )
    logger.info(
        "e2e_orchestrator: replay subprocess complete",
        extra={
            "kind": "e2e_orchestrator.replay_subprocess",
            "kv": {
                "elapsed_s": elapsed_s,
                "max_seconds": max_seconds,
            },
        },
    )
    return elapsed_s


def _parse_jsonl(path: Path) -> list[dict[str, Any]]:
    """Read one JSON record per non-blank line.

    Raises:
        OrchestrationFailure (step=PARSE_EMISSIONS): Output file
            missing, unreadable, has zero records, or contains a
            malformed line.
    """
    if not path.is_file():
        raise OrchestrationFailure(
            OrchestratorStep.PARSE_EMISSIONS,
            f"replay output JSONL not found: {path}",
        )
    try:
        text = path.read_text()
    except OSError as exc:
        raise OrchestrationFailure(
            OrchestratorStep.PARSE_EMISSIONS,
            f"replay output JSONL unreadable at {path}: {exc!r}",
        ) from exc
    rows: list[dict[str, Any]] = []
    for line_idx, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        try:
            row = json.loads(line)
        except json.JSONDecodeError as exc:
            raise OrchestrationFailure(
                OrchestratorStep.PARSE_EMISSIONS,
                f"malformed JSON at line {line_idx} of {path}: {exc.msg}",
            ) from exc
        if not isinstance(row, dict):
            raise OrchestrationFailure(
                OrchestratorStep.PARSE_EMISSIONS,
                f"line {line_idx} of {path} is not a JSON object: {row!r}",
            )
        rows.append(row)
    if not rows:
        raise OrchestrationFailure(
            OrchestratorStep.PARSE_EMISSIONS,
            f"replay output JSONL at {path} has zero records — pipeline "
            "produced no estimator emissions",
        )
    return rows


def _load_ground_truth(tlog_path: Path) -> list[GroundTruthRow]:
    """Extract WGS84 ground truth from the binary tlog.

    Raises:
        OrchestrationFailure (step=LOAD_GROUND_TRUTH): Loader
            error or empty record list.
    """
    try:
        series = load_tlog_ground_truth(tlog_path).records
    except Exception as exc:
        raise OrchestrationFailure(
            OrchestratorStep.LOAD_GROUND_TRUTH,
            f"load_tlog_ground_truth({tlog_path}) failed: {exc!r}",
        ) from exc
    rows: list[GroundTruthRow] = [
        GroundTruthRow(
            t_s=fix.ts_ns / 1e9,
            lat_deg=fix.lat_deg,
            lon_deg=fix.lon_deg,
            alt_m=fix.alt_m,
        )
        for fix in series
    ]
    if not rows:
        raise OrchestrationFailure(
            OrchestratorStep.LOAD_GROUND_TRUTH,
            f"tlog ground truth at {tlog_path} has zero rows",
        )
    return rows


def _compute_distribution(
    emissions: list[dict[str, Any]],
    ground_truth: list[GroundTruthRow],
) -> HorizontalErrorDistribution:
    """Compute the horizontal-error distribution.

    Raises:
        OrchestrationFailure (step=COMPUTE_DISTRIBUTION): Helper
            error or zero ground-truth pairings (every emission
            fell outside the GT time window).
    """
    try:
        distribution = horizontal_error_distribution(emissions, ground_truth)
    except Exception as exc:
        raise OrchestrationFailure(
            OrchestratorStep.COMPUTE_DISTRIBUTION,
            f"horizontal_error_distribution failed: {exc!r}",
        ) from exc
    if distribution.count == 0:
        raise OrchestrationFailure(
            OrchestratorStep.COMPUTE_DISTRIBUTION,
            "no emissions paired with ground truth — JSONL timestamps "
            "outside the tlog GPS window?",
        )
    return distribution


def _render_and_write_report(
    *,
    distribution: HorizontalErrorDistribution,
    context: ReportContext,
    passed: bool,
    report_dir: Path,
) -> Path:
    """Render the verdict markdown and write it to ``report_dir``.

    Raises:
        OrchestrationFailure (step=RENDER_REPORT): Render or write
            failure; ``report_dir`` was already created by
            :func:`_validate_inputs`.
    """
    try:
        report_text = render_report(distribution, context, passed=passed)
    except Exception as exc:
        raise OrchestrationFailure(
            OrchestratorStep.RENDER_REPORT,
            f"render_report failed: {exc!r}",
        ) from exc
    report_path = (
        report_dir / f"real_flight_validation_{context.run_date_utc}.md"
    )
    try:
        report_path.write_text(report_text)
    except OSError as exc:
        raise OrchestrationFailure(
            OrchestratorStep.RENDER_REPORT,
            f"cannot write report at {report_path}: {exc!r}",
        ) from exc
    return report_path
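
The field-level merge performed by `write_effective_replay_config` in the file above can be seen in isolation in the following sketch. It works on plain dicts with the YAML read/write omitted; `overlay_cache_root` is a hypothetical name, the sample keys `backbone` and `max_tiles` are invented for illustration, and unlike the original (which mutates the freshly loaded mapping) this version copies its input so the example stays side-effect free.

```python
from __future__ import annotations

import copy
from typing import Any


def overlay_cache_root(base: dict[str, Any], cache_root: str) -> dict[str, Any]:
    """Return a copy of `base` with only the c6_tile_cache override applied."""
    merged = copy.deepcopy(base)
    raw = merged.get("c6_tile_cache")
    # A missing or non-mapping block is replaced by a fresh one.
    block = dict(raw) if isinstance(raw, dict) else {}
    block["root_dir"] = cache_root
    # An empty index path lets the consumer fall back to its default
    # location under the cache root.
    block["faiss_index_path"] = ""
    merged["c6_tile_cache"] = block
    return merged


# Illustrative operator config; the keys inside each block are made up.
base_config = {
    "c2_vpr": {"backbone": "netvlad"},
    "c6_tile_cache": {"root_dir": "/static/cache", "max_tiles": 10},
}
effective = overlay_cache_root(base_config, "/tmp/session-cache")
```

Only `root_dir` and `faiss_index_path` change; sibling keys inside `c6_tile_cache` and every other top-level block are carried over unchanged, which is the property the "field-level" wording in the docstring refers to.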
+474
@@ -0,0 +1,474 @@
"""Operator pre-flight cache assembly driver (AZ-839 / Epic AZ-835 C3).

Replaces the placeholder ``operator_pre_flight_setup`` fixture stub at
``conftest.py`` lines 293-310 with a real driver that wires together
the four operator-side production components:

1. **C1 / AZ-836 RouteSpec**: already extracted by the caller via
   :func:`gps_denied_onboard.replay_input.tlog_route.extract_route_from_tlog`
   and handed in as :paramref:`populate_c6_from_route.route_spec`.
2. **C2 / AZ-838 SatelliteProviderRouteClient**: POSTs the route to
   satellite-provider, polls ``mapsReady``.
3. **C11 / AZ-316 + AZ-777 Phase 1 HttpTileDownloader**: pulls the
   seeded tiles from satellite-provider into C6 over a bbox derived
   from the route waypoints.
4. **C10 / AZ-322 DescriptorBatcher**: rebuilds the FAISS HNSW
   descriptor index over the populated C6 cache (NetVLAD backbone per
   ``c2_vpr/config.py:67``).

The descriptor index sidecar coherence (AZ-306 triple-consistency:
``.index`` + ``.sha256`` + ``.meta.json``) is verified by re-loading
the index after rebuild via the caller-supplied
``descriptor_index_factory``; any tampering surfaces as
:class:`IndexUnavailableError`.

Public surface re-exported from this module:

* :class:`PopulatedC6Cache`: frozen dataclass returned on success.
* :func:`populate_c6_from_route`: the driver function.

Cleanup-on-failure removes any FAISS sidecar files produced inside the
driver if any later step raises. Tile-store rows written by C11 are
NOT deleted (the C6 store owns its own rollback semantics; leaving
those rows enables idempotent re-runs via the C11 download journal).
"""

from __future__ import annotations

import logging
import time
from collections.abc import Callable
from dataclasses import dataclass
from pathlib import Path
from typing import Any
from uuid import UUID, uuid4

from gps_denied_onboard.components.c10_provisioning.descriptor_batcher import (
    BatcherOutcome,
    CorpusFilter,
    DescriptorBatcher,
)
from gps_denied_onboard.components.c11_tile_manager import (
    DownloadOutcome,
    DownloadRequest,
    HttpTileDownloader,
    SectorClassification,
)
from gps_denied_onboard.components.c11_tile_manager.errors import (
    RouteTerminalFailureError,
    RouteTransientError,
    RouteValidationError,
)
from gps_denied_onboard.components.c11_tile_manager.route_client import (
    SatelliteProviderRouteClient,
)
from gps_denied_onboard.components.c6_tile_cache.errors import (
    IndexUnavailableError,
)
from gps_denied_onboard.components.c6_tile_cache.faiss_descriptor_index import (
    META_SUFFIX,
)
from gps_denied_onboard.helpers.sha256_sidecar import SIDECAR_SUFFIX
from gps_denied_onboard.replay_input.tlog_route import RouteSpec

__all__ = [
    "PopulatedC6Cache",
    "populate_c6_from_route",
]

# Mirror C11's existing schedule so the fixture does not introduce a
# parallel retry budget. AC-5 ties our per-attempt cap (3) to the
# documented pause cadence; the schedule itself lives in the
# downloader module and is re-exported here so tests can override.
_DEFAULT_RETRY_SCHEDULE_S: tuple[float, ...] = (1.0, 2.0, 4.0, 8.0)
_DEFAULT_MAX_RETRY_ATTEMPTS: int = 3
_DEFAULT_ZOOM_LEVEL: int = 18
_DEFAULT_SECTOR_CLASS: SectorClassification = SectorClassification.ACTIVE_CONFLICT

# Per-degree-of-latitude metres at WGS84 mean radius — reused from C11
# route-coverage enumeration (route_client._enumerate_route_tile_coords).
# Re-stated here so the driver does not depend on a private constant.
_METERS_PER_DEGREE_LAT: float = 111_320.0

_LOGGER = logging.getLogger(
    "tests.e2e.replay.operator_pre_flight"
)


@dataclass(frozen=True, slots=True)
class PopulatedC6Cache:
    """Output of :func:`populate_c6_from_route`.

    Mirrors the public-surface dataclass documented in the AZ-839 spec.
    All paths point at on-disk artifacts that survive the fixture's
    ``session`` scope (when mounted on the named docker volume the
    e2e-runner declares); ``elapsed_seconds`` powers the AC-1 / AC-2
    perf budget assertions.
    """

    cache_root: Path
    tile_store_path: Path
    faiss_index_path: Path
    faiss_sidecar_sha256_path: Path
    faiss_sidecar_meta_path: Path
    route_spec: RouteSpec
    tile_count: int
    elapsed_seconds: float
def populate_c6_from_route(
*,
route_spec: RouteSpec,
route_client: SatelliteProviderRouteClient,
tile_downloader: HttpTileDownloader,
descriptor_batcher: DescriptorBatcher,
descriptor_index_factory: Callable[[], Any],
cache_root: Path,
tile_store_path: Path,
faiss_index_path: Path,
flight_id: UUID | None = None,
sector_class: SectorClassification = _DEFAULT_SECTOR_CLASS,
zoom_level: int = _DEFAULT_ZOOM_LEVEL,
region_size_meters: float | None = None,
retry_schedule_s: tuple[float, ...] = _DEFAULT_RETRY_SCHEDULE_S,
max_retry_attempts: int = _DEFAULT_MAX_RETRY_ATTEMPTS,
sleep: Callable[[float], None] = time.sleep,
monotonic: Callable[[], float] = time.monotonic,
logger: logging.Logger | None = None,
) -> PopulatedC6Cache:
"""Drive the full C1+C2+C11+C10 pipeline end-to-end.
Args:
route_spec: Coarsened route from AZ-836's
:func:`extract_route_from_tlog`. The caller chooses the
tlog (typically a session-scoped fixture); this driver is
tlog-agnostic.
route_client: Configured C2 client. Built from env vars by the
production fixture; injected as a stub by unit tests.
tile_downloader: Configured C11 downloader. Same wiring rules.
descriptor_batcher: Configured C10 batcher; its rebuild path
owns the on-disk FAISS write (atomic via
:class:`Sha256Sidecar`).
descriptor_index_factory: Zero-arg callable that constructs a
FRESH descriptor index pointed at ``faiss_index_path``.
Production passes
``lambda: FaissDescriptorIndex.from_config(config)``; the
constructor auto-loads via
:meth:`FaissDescriptorIndex._load`, raising
:class:`IndexUnavailableError` on triple-consistency
failure (AC-3 / AC-6 verification).
cache_root: Root directory mounted on the named docker volume
that survives across pytest sessions.
tile_store_path: Where C6's :class:`TileStore` writes JPEG
blobs. Carried on the result for downstream tests.
faiss_index_path: Final ``.index`` blob path. Sidecars live at
``<faiss_index_path>.sha256`` + ``<faiss_index_path>.meta.json``.
flight_id: C11 download-journal key; defaults to a fresh UUID
so two fixture sessions never collide their journals.
sector_class: C11 / C6 sector classification. Defaults to
``ACTIVE_CONFLICT`` because Derkachi is an active-conflict
corridor; ``STABLE_REAR`` is for non-Ukraine clips.
zoom_level: Single Web-Mercator zoom level the fixture
populates. AZ-839 spec defaults to 18 to match
``seed_route.py`` ergonomics; tests override for speed.
region_size_meters: Per-waypoint coverage radius in metres.
``None`` falls back to
:attr:`RouteSpec.suggested_region_size_meters`.
retry_schedule_s: Pause cadence between transient retries.
Defaults to C11's documented ``_DEFAULT_BACKOFF_SCHEDULE_S``.
max_retry_attempts: Total :meth:`seed_route` attempts on
transient error before propagating (AC-5: the final
attempt's exception is propagated unchanged).
sleep: Test override for the retry pause; production passes
:func:`time.sleep`.
monotonic: Test override for elapsed-time measurement.
logger: Optional logger. Defaults to the module logger.
Returns:
:class:`PopulatedC6Cache` on success.
Raises:
RouteValidationError: Pre-emptive validation or HTTP 4xx
propagated unchanged with original cause (AC-4).
RouteTerminalFailureError: ``mapsReady`` never reached or
terminal failure status propagated unchanged (AC-4).
RouteTransientError: 5xx / network / timeout AFTER all retry
attempts have been exhausted (AC-5).
IndexUnavailableError: Triple-consistency check failed after
rebuild: sidecars are corrupt / mismatched (AC-3 / AC-6).
RuntimeError: C11 ``download_tiles_for_area`` returned a
non-success outcome OR C10 ``populate_descriptors``
returned :attr:`BatcherOutcome.FAILURE`.
Notes:
Cleanup behaviour (AC-7): if any step raises after the
rebuild has begun writing sidecar files, the partial files
(.index, .sha256, .meta.json) are removed before the
exception propagates so a re-run starts from a clean slate.
Tile-store rows are NOT deleted on cleanup; the C11 download
journal owns idempotent re-run semantics.
"""
log = logger or _LOGGER
if max_retry_attempts < 1:
raise ValueError(
f"max_retry_attempts must be >= 1; got {max_retry_attempts}"
)
started_monotonic = monotonic()
effective_flight_id = flight_id or uuid4()
effective_region_size = float(
region_size_meters
if region_size_meters is not None
else route_spec.suggested_region_size_meters
)
if effective_region_size <= 0:
raise ValueError(
f"region_size_meters must be > 0; got {effective_region_size}"
)
if not route_spec.waypoints:
raise ValueError("route_spec.waypoints must be non-empty")
sidecar_paths = (
faiss_index_path,
Path(str(faiss_index_path) + SIDECAR_SUFFIX),
Path(str(faiss_index_path) + META_SUFFIX),
)
pre_existing_sidecar = {p: p.is_file() for p in sidecar_paths}
try:
seed_result = _seed_route_with_retry(
route_client=route_client,
spec=route_spec,
region_size_meters=effective_region_size,
zoom_level=zoom_level,
retry_schedule_s=retry_schedule_s,
max_retry_attempts=max_retry_attempts,
sleep=sleep,
logger=log,
)
bbox = _route_bbox(
waypoints=route_spec.waypoints,
region_size_meters=effective_region_size,
)
download_request = DownloadRequest(
flight_id=effective_flight_id,
bbox_min_lat=bbox[0],
bbox_min_lon=bbox[1],
bbox_max_lat=bbox[2],
bbox_max_lon=bbox[3],
zoom_levels=(int(zoom_level),),
sector_class=sector_class,
cache_root=cache_root,
)
download_report = tile_downloader.download_tiles_for_area(download_request)
if download_report.outcome not in {
DownloadOutcome.SUCCESS,
DownloadOutcome.IDEMPOTENT_NO_OP,
}:
raise RuntimeError(
"C11 download_tiles_for_area returned non-success outcome "
f"{download_report.outcome.value!r}; "
f"requested={download_report.tiles_requested} "
f"downloaded={download_report.tiles_downloaded} "
f"rejected_resolution={download_report.tiles_rejected_resolution} "
f"rejected_freshness={download_report.tiles_rejected_freshness}"
)
log.info(
"operator_pre_flight: tiles populated",
extra={
"kind": "operator_pre_flight.tiles_populated",
"kv": {
"route_id": str(seed_result.route_id),
"seeded_tile_count": seed_result.tile_count,
"downloaded_tiles": download_report.tiles_downloaded,
"request_hash": download_report.request_hash,
},
},
)
corpus_filter = CorpusFilter(
bbox=bbox,
zoom_levels=(int(zoom_level),),
sector_class=sector_class.value,
)
batcher_report = descriptor_batcher.populate_descriptors(corpus_filter)
if batcher_report.outcome is not BatcherOutcome.SUCCESS:
raise RuntimeError(
"C10 populate_descriptors returned FAILURE: "
f"{batcher_report.failure_reason}"
)
verifier_index = descriptor_index_factory()
log.debug(
"operator_pre_flight: sidecar coherence verified",
extra={
"kind": "operator_pre_flight.sidecar_verified",
"kv": {
"faiss_index_path": str(faiss_index_path),
"verifier_type": type(verifier_index).__name__,
},
},
)
elapsed_seconds = max(0.0, monotonic() - started_monotonic)
return PopulatedC6Cache(
cache_root=cache_root,
tile_store_path=tile_store_path,
faiss_index_path=faiss_index_path,
faiss_sidecar_sha256_path=sidecar_paths[1],
faiss_sidecar_meta_path=sidecar_paths[2],
route_spec=route_spec,
tile_count=batcher_report.tiles_consumed,
elapsed_seconds=elapsed_seconds,
)
except BaseException:
_cleanup_partial_sidecars(
sidecar_paths=sidecar_paths,
pre_existing=pre_existing_sidecar,
logger=log,
)
raise
def _seed_route_with_retry(
*,
route_client: SatelliteProviderRouteClient,
spec: RouteSpec,
region_size_meters: float,
zoom_level: int,
retry_schedule_s: tuple[float, ...],
max_retry_attempts: int,
sleep: Callable[[float], None],
logger: logging.Logger,
) -> Any:
"""Call ``seed_route`` with bounded transient retries (AC-5).
Validation / terminal-failure errors propagate IMMEDIATELY with
their original cause (AC-4: no silent swallow). Only
:class:`RouteTransientError` triggers the retry ladder; the final
attempt's exception is re-raised unchanged so the caller sees
the actual transient signal that exhausted the budget.
"""
last_transient: RouteTransientError | None = None
for attempt in range(1, max_retry_attempts + 1):
try:
return route_client.seed_route(
spec,
region_size_meters=region_size_meters,
zoom_level=zoom_level,
)
except (RouteValidationError, RouteTerminalFailureError):
raise
except RouteTransientError as exc:
last_transient = exc
logger.warning(
"operator_pre_flight: route seed transient failure",
extra={
"kind": "operator_pre_flight.route_seed.transient",
"kv": {
"attempt": attempt,
"max_attempts": max_retry_attempts,
"exc": repr(exc),
},
},
)
if attempt >= max_retry_attempts:
raise
pause_s = retry_schedule_s[
min(attempt - 1, len(retry_schedule_s) - 1)
]
sleep(pause_s)
# Defensive — the loop body always returns or raises before this.
if last_transient is not None:
raise last_transient
raise RuntimeError(
"operator_pre_flight: seed_route loop exited without result"
)
def _route_bbox(
*,
waypoints: tuple[tuple[float, float], ...],
region_size_meters: float,
) -> tuple[float, float, float, float]:
"""Bounding box of every waypoint's coverage square.
Mirrors the local enumeration in
:func:`gps_denied_onboard.components.c11_tile_manager.route_client._enumerate_route_tile_coords`
by taking ``region_size_meters`` as the per-waypoint square edge
and unioning the lat/lon extents. The result is a single bbox
that the C11 :meth:`HttpTileDownloader.download_tiles_for_area`
Protocol consumes; C11 then runs the standard slippy-map
enumeration over that bbox at the requested zoom level.
Returns:
``(min_lat, min_lon, max_lat, max_lon)`` matching
:class:`DownloadRequest`'s field order.
"""
import math
half = region_size_meters / 2.0
min_lat = float("inf")
max_lat = float("-inf")
min_lon = float("inf")
max_lon = float("-inf")
for lat_deg, lon_deg in waypoints:
lat_delta_deg = half / _METERS_PER_DEGREE_LAT
cos_lat = math.cos(math.radians(lat_deg))
if cos_lat <= 1e-9:
cos_lat = 1e-9
lon_delta_deg = half / (_METERS_PER_DEGREE_LAT * cos_lat)
min_lat = min(min_lat, lat_deg - lat_delta_deg)
max_lat = max(max_lat, lat_deg + lat_delta_deg)
min_lon = min(min_lon, lon_deg - lon_delta_deg)
max_lon = max(max_lon, lon_deg + lon_delta_deg)
if min_lat >= max_lat or min_lon >= max_lon:
raise ValueError(
"operator_pre_flight: degenerate bbox from route waypoints "
f"(min_lat={min_lat}, max_lat={max_lat}, "
f"min_lon={min_lon}, max_lon={max_lon})"
)
return (min_lat, min_lon, max_lat, max_lon)
def _cleanup_partial_sidecars(
*,
sidecar_paths: tuple[Path, ...],
pre_existing: dict[Path, bool],
logger: logging.Logger,
) -> None:
"""Remove sidecar files this driver may have produced.
Only files that did NOT exist when the driver started AND now
exist are removed; pre-existing files (a warm cache from a prior
run) are preserved. OS errors during cleanup are logged but do
not mask the original exception.
"""
for path in sidecar_paths:
if pre_existing.get(path, False):
continue
if not path.exists():
continue
try:
path.unlink()
logger.warning(
"operator_pre_flight: cleaned up partial sidecar",
extra={
"kind": "operator_pre_flight.cleanup.removed",
"kv": {"path": str(path)},
},
)
except OSError as exc:
logger.error(
"operator_pre_flight: cleanup unlink failed",
extra={
"kind": "operator_pre_flight.cleanup.failed",
"kv": {"path": str(path), "exc": repr(exc)},
},
)
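The retry ladder in `_seed_route_with_retry` above can be sketched in isolation. This is a minimal sketch under stated assumptions, not the PR's code: `TransientError`, `call_with_retry`, and `flaky_seed` are hypothetical stand-ins (for `RouteTransientError`, the private helper, and `seed_route`), and the non-empty-schedule guard is an addition of this sketch. It illustrates how the schedule index is clamped so that attempts beyond the schedule length reuse the last pause, and how the injected `sleep` lets a test record pauses instead of waiting.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical stand-in for RouteTransientError."""


def call_with_retry(
    op: Callable[[], T],
    *,
    retry_schedule_s: Sequence[float],
    max_attempts: int,
    sleep: Callable[[float], None],
) -> T:
    """Run ``op`` until it succeeds or ``max_attempts`` is exhausted."""
    if max_attempts < 1:
        raise ValueError(f"max_attempts must be >= 1; got {max_attempts}")
    if not retry_schedule_s:
        # Assumption of this sketch: an empty schedule is rejected up front.
        raise ValueError("retry_schedule_s must be non-empty")
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except TransientError:
            if attempt >= max_attempts:
                raise  # final attempt: propagate the exception unchanged
            # Clamp the index so later attempts reuse the last pause.
            index = min(attempt - 1, len(retry_schedule_s) - 1)
            sleep(retry_schedule_s[index])
    raise AssertionError("unreachable: the loop returns or raises")


# Usage: three transient failures, then success, with a two-step schedule.
pauses: list[float] = []
outcomes = iter(
    [TransientError("503"), TransientError("timeout"), TransientError("503"), "seeded"]
)


def flaky_seed() -> str:
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


result = call_with_retry(
    flaky_seed, retry_schedule_s=(1.0, 2.0), max_attempts=4, sleep=pauses.append
)
```

With a two-entry schedule and four attempts, the third pause falls back to the last schedule entry, which is why the recorded pauses are `[1.0, 2.0, 2.0]`.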
+393 -16
@@ -15,6 +15,7 @@ import shutil
import subprocess
import sys
from collections.abc import Iterator
import dataclasses
from dataclasses import dataclass
from pathlib import Path
from typing import Any
@@ -290,21 +291,397 @@ def replay_runner(derkachi_replay_inputs: DerkachiReplayInputs) -> Any:
return _run
@pytest.fixture
def operator_pre_flight_setup(tmp_path: Path) -> Iterator[Path]:
"""Operator C12 pre-flight rehearsal stub.
@pytest.fixture(scope="session")
def operator_pre_flight_setup(
derkachi_replay_inputs: DerkachiReplayInputs,
tmp_path_factory: pytest.TempPathFactory,
) -> Iterator["PopulatedC6Cache"]:
"""Operator C12 pre-flight: real C1+C2+C11+C10 wiring (AZ-839 / Epic AZ-835 C3).
Per AZ-404's spec this fixture should run the operator's full
C10/C11/C12 pre-flight against a ``mock-suite-sat-service``
fixture and yield the populated cache directory. The current
``tests/fixtures/mock-suite-sat-service`` is a bootstrap stub
(only ``GET /healthz`` per its README); the full D-PROJ-2
contract is not implemented. Until that ships, AC-8 (operator
workflow rehearsal) is skipped at the test level; this fixture
yields a placeholder cache directory so test bodies that
request it can fail-fast with a documented reason rather than a
surprise ImportError.
Replaces the AZ-404 placeholder. Drives the operator-side
pre-flight pipeline end-to-end and yields the populated cache
so AC-8 (operator workflow rehearsal) and the AZ-840 e2e
orchestrator test can consume it.
Skip gates (in evaluation order; first match wins):
* ``RUN_REPLAY_E2E`` not in ``{1, true, yes, on}``: same as
every other heavy test in this directory.
* ``SATELLITE_PROVIDER_URL`` / ``SATELLITE_PROVIDER_API_KEY``
missing: the C2 route client cannot reach the parent suite.
* ``BUILD_FAISS_INDEX`` not ON: the C6 ``DescriptorIndex``
runtime is gated by the env flag (``storage_factory.py``).
* ``GPS_DENIED_OPERATOR_CONFIG_PATH`` missing OR points at a
config that does not register every component this fixture
needs (c6_tile_cache + c7_inference + c10_provisioning +
c11_tile_manager): the wiring would fail later with a less
readable error.
See ``tests/e2e/replay/_operator_pre_flight.py::populate_c6_from_route``
for the algorithm; this fixture only owns the
runtime-factory wiring + skip gates.
"""
cache_dir = tmp_path / "operator_cache"
cache_dir.mkdir()
yield cache_dir
skip_reason = _operator_pre_flight_skip_reason()
if skip_reason is not None:
pytest.skip(skip_reason)
yield from _build_operator_pre_flight_cache(
derkachi_replay_inputs=derkachi_replay_inputs,
tmp_path_factory=tmp_path_factory,
)
def _operator_pre_flight_skip_reason() -> str | None:
"""Return a SKIP reason string when env / build flags are not viable.
Centralised so the conditions stay testable + documented in one
place. Returns ``None`` when the fixture is allowed to run.
"""
if os.environ.get("RUN_REPLAY_E2E", "").strip().lower() not in {
"1",
"true",
"yes",
"on",
}:
return "AZ-839 operator_pre_flight_setup gated by RUN_REPLAY_E2E=1"
sp_url = os.environ.get("SATELLITE_PROVIDER_URL", "").strip()
sp_jwt = os.environ.get("SATELLITE_PROVIDER_API_KEY", "").strip()
if not sp_url:
return (
"AZ-839 operator_pre_flight_setup requires SATELLITE_PROVIDER_URL "
"(e.g. https://satellite-provider:8080)"
)
if not sp_jwt:
return (
"AZ-839 operator_pre_flight_setup requires SATELLITE_PROVIDER_API_KEY "
"(Bearer JWT for the parent-suite Route + Inventory APIs)"
)
if os.environ.get("BUILD_FAISS_INDEX", "").strip().lower() not in {
"on",
"1",
"true",
"yes",
}:
return (
"AZ-839 operator_pre_flight_setup requires BUILD_FAISS_INDEX=ON "
"(the C6 FaissDescriptorIndex runtime is build-flag-gated per "
"runtime_root.storage_factory)"
)
if not os.environ.get("GPS_DENIED_OPERATOR_CONFIG_PATH", "").strip():
return (
"AZ-839 operator_pre_flight_setup requires "
"GPS_DENIED_OPERATOR_CONFIG_PATH pointing at a YAML that "
"registers c6_tile_cache + c7_inference + c10_provisioning + "
"c11_tile_manager blocks (Jetson e2e harness sets this; "
"dev macOS does not)"
)
return None
def _build_operator_pre_flight_cache(
*,
derkachi_replay_inputs: DerkachiReplayInputs,
tmp_path_factory: pytest.TempPathFactory,
) -> Iterator["PopulatedC6Cache"]:
"""Wire the operator-side runtime graph and run the AZ-839 driver.
All imports of heavy collaborators (httpx, runtime_root factories,
c10/c11/c6 modules) live inside this function so collection on
dev macOS without the e2e env stays cheap (the SKIP path returns
before reaching this body).
Raises:
pytest.skip.Exception: when an env-flagged dependency
(e.g. ``c10_provisioning`` config block, route extraction)
cannot be satisfied and re-running with the right env is
the right next step.
"""
import httpx
from gps_denied_onboard.clock.wall_clock import WallClock
from gps_denied_onboard.config.loader import load_config
from gps_denied_onboard.replay_input.tlog_route import (
extract_route_from_tlog,
)
from gps_denied_onboard.runtime_root.c10_factory import (
build_descriptor_batcher,
build_engine_compiler,
)
from gps_denied_onboard.runtime_root.c11_factory import (
build_tile_downloader,
)
from gps_denied_onboard.runtime_root.storage_factory import (
build_descriptor_index,
build_tile_metadata_store,
build_tile_store,
)
from tests.e2e.replay._operator_pre_flight import (
populate_c6_from_route,
)
config_path = Path(os.environ["GPS_DENIED_OPERATOR_CONFIG_PATH"])
if not config_path.is_file():
pytest.skip(
f"GPS_DENIED_OPERATOR_CONFIG_PATH points at a non-file: {config_path}"
)
config = load_config(os.environ, paths=[config_path])
cache_root = tmp_path_factory.mktemp("operator_pre_flight_cache")
# PostgresFilesystemStore writes JPEGs under `<root_dir>/tiles/`;
# FaissDescriptorIndex falls back to `<root_dir>/descriptor.index`
# when `faiss_index_path` is empty. Override the c6_tile_cache
# block in-memory so the production components built below
# (build_tile_store / build_descriptor_index / batcher) write to
# the same `cache_root` PopulatedC6Cache advertises. Without this
# the static YAML at GPS_DENIED_OPERATOR_CONFIG_PATH would route
# writes to its baked-in `root_dir` while the verifier read from
# the fixture's tmp path, breaking AC-3 / AC-6 on Tier-2.
c6_block = config.components["c6_tile_cache"]
c6_block_overridden = dataclasses.replace(
c6_block,
root_dir=str(cache_root),
faiss_index_path="",
)
config = dataclasses.replace(
config,
components={**config.components, "c6_tile_cache": c6_block_overridden},
)
tile_store_path = cache_root / "tiles"
faiss_index_path = cache_root / "descriptor.index"
route_spec = extract_route_from_tlog(
derkachi_replay_inputs.tlog_path,
max_waypoints=10,
)
sp_url = os.environ["SATELLITE_PROVIDER_URL"].strip()
sp_jwt = os.environ["SATELLITE_PROVIDER_API_KEY"].strip()
tls_insecure = os.environ.get(
"SATELLITE_PROVIDER_TLS_INSECURE", ""
).strip().lower() in {"1", "true", "yes", "on"}
from gps_denied_onboard.components.c11_tile_manager.route_client import (
SatelliteProviderRouteClient,
)
route_client = SatelliteProviderRouteClient(
base_url=sp_url,
jwt=sp_jwt,
tls_insecure=tls_insecure,
)
tile_store = build_tile_store(config)
tile_metadata_store = build_tile_metadata_store(config)
descriptor_index = build_descriptor_index(config)
httpx_client = httpx.Client(
verify=not tls_insecure,
timeout=httpx.Timeout(30.0),
headers={"Authorization": f"Bearer {sp_jwt}"},
)
tile_downloader = build_tile_downloader(
config,
http_client=httpx_client,
tile_store=tile_store,
tile_metadata_store=tile_metadata_store,
budget_enforcer=tile_store,
)
clock = WallClock()
engine_compiler = build_engine_compiler(config)
backbone_embedder = _build_replay_backbone_embedder(
config=config,
engine_compiler=engine_compiler,
cache_root=cache_root,
)
descriptor_batcher = build_descriptor_batcher(
config,
backbone_embedder=backbone_embedder,
tile_metadata_store=tile_metadata_store,
tile_store=tile_store,
descriptor_index=descriptor_index,
clock=clock,
)
def _descriptor_index_factory() -> Any:
from gps_denied_onboard.components.c6_tile_cache.faiss_descriptor_index import ( # noqa: E501
FaissDescriptorIndex,
)
from gps_denied_onboard.helpers.sha256_sidecar import Sha256Sidecar
from gps_denied_onboard.logging import get_logger
return FaissDescriptorIndex(
index_path=faiss_index_path,
sidecar=Sha256Sidecar(),
logger=get_logger("c6_tile_cache.faiss_descriptor_index"),
)
populated = populate_c6_from_route(
route_spec=route_spec,
route_client=route_client,
tile_downloader=tile_downloader,
descriptor_batcher=descriptor_batcher,
descriptor_index_factory=_descriptor_index_factory,
cache_root=cache_root,
tile_store_path=tile_store_path,
faiss_index_path=faiss_index_path,
)
try:
yield populated
finally:
httpx_client.close()
def _build_replay_backbone_embedder(
*,
config: Any,
engine_compiler: Any,
cache_root: Path,
) -> Any:
"""Compile the first configured backbone and wrap it for the AZ-322 batcher.
The replay-mode operator binary does not exist yet (tracked under
Epic AZ-835); until it does, this fixture performs the wiring
inline. The path is deliberately the production path:
* :func:`runtime_root.c10_factory.build_engine_compiler` builds
the AZ-321 :class:`EngineCompiler`.
* The first backbone in
``config.components['c10_provisioning'].backbones`` is
compiled to an engine cache entry; the AZ-297
:class:`InferenceRuntime` deserialises it into the
:class:`EngineHandle` the embedder consumes.
* The tile decoder converts a C6 :class:`TilePixelHandle`
(mmap of JPEG bytes) to the ``np.float32`` tensor shape the
backbone expects via OpenCV, the same primitive the C7
pre-processor uses.
Tests / dev workstations without a backbone ONNX or a working
:class:`InferenceRuntime` fail this function, which surfaces as
a fixture error (deliberate: the SKIP gate above is meant to
catch the env-mismatch case before we get here).
"""
from gps_denied_onboard._types.inference import PrecisionMode
from gps_denied_onboard._types.manifests import HostCapabilities
from gps_denied_onboard.components.c10_provisioning.c7_engine_embedder import (
C7EngineBackboneEmbedder,
)
from gps_denied_onboard.components.c10_provisioning.engine_compiler import (
EngineCompileRequest,
)
from gps_denied_onboard.logging import get_logger
from gps_denied_onboard.runtime_root.c10_factory import (
build_backbone_specs,
)
from gps_denied_onboard.runtime_root.inference_factory import (
build_inference_runtime,
)
backbones = build_backbone_specs(config)
if not backbones:
pytest.skip(
"AZ-839 operator_pre_flight_setup: config has no "
"c10_provisioning.backbones entries — the e2e harness "
"config must declare at least one backbone (typically "
"DINOv2-VPR or NetVLAD per AZ-321)."
)
host = HostCapabilities(
gpu_name="replay-e2e",
cuda_compute_capability=(0, 0),
cuda_runtime_version="0.0",
tensorrt_version="0.0",
host_arch="unknown",
host_os="linux",
driver_version="unknown",
)
engine_cache_root = cache_root / "engines"
engine_cache_root.mkdir(parents=True, exist_ok=True)
request = EngineCompileRequest(
backbones=backbones,
calibration_path=None,
cache_root=engine_cache_root,
precision=PrecisionMode.FP16,
host=host,
workspace_mb=int(
config.components["c10_provisioning"].workspace_mb
),
)
results = engine_compiler.compile_engines_for_corpus(request)
if not results:
pytest.skip(
"AZ-839 operator_pre_flight_setup: engine compiler returned "
"empty results — corpus failed to compile."
)
first = results[0]
spec = backbones[0]
inference_runtime = build_inference_runtime(config)
engine_handle = inference_runtime.deserialize_engine(first.entry)
descriptor_dim = _resolve_replay_descriptor_dim(config, spec)
return C7EngineBackboneEmbedder(
inference_runtime=inference_runtime,
engine_handle=engine_handle,
input_name=spec.input_name,
output_name="descriptor",
descriptor_dim=descriptor_dim,
tile_decoder=_default_tile_decoder,
logger=get_logger("c10_provisioning.replay_backbone_embedder"),
)
def _resolve_replay_descriptor_dim(config: Any, spec: Any) -> int:
"""Resolve the descriptor output dimension for the AZ-839 NetVLAD baseline.
The AZ-839 task spec pins the C2 backbone at NetVLAD (per
``c2_vpr/config.py:67``); :class:`C2VprConfig.netvlad_descriptor_dim`
is the canonical source. We read the c2_vpr block and fall back
to the architecture default ``4096`` when the block is absent so
operators on a hand-rolled YAML still get a coherent dim. Other
backbones (UltraVPR=512, MegaLoc=2048, MixVPR=4096) require
swapping this resolver, which is out of scope for AZ-839.
"""
block = config.components.get("c2_vpr") if config.components else None
if block is not None and getattr(block, "strategy", "") == "net_vlad":
return int(getattr(block, "netvlad_descriptor_dim", 4096))
pytest.skip(
"AZ-839 operator_pre_flight_setup: descriptor_dim resolver "
f"only supports c2_vpr.strategy='net_vlad'; got "
f"{getattr(block, 'strategy', '<missing>')!r} on backbone "
f"{spec.model_name!r}. See AZ-839 spec § Out of scope."
)
raise AssertionError("unreachable: pytest.skip raises")
def _default_tile_decoder(handle: Any) -> Any:
"""Decode a C6 :class:`TilePixelHandle` (JPEG mmap) to a CHW float32 tensor.
The handle exposes ``read_bytes()`` (or context-manager + ``read``);
we prefer the simpler ``read_bytes()`` path. OpenCV imdecode
yields HWC-uint8-BGR; the embedder expects float32-CHW-RGB
normalised to ``[0, 1]`` (DINOv2-VPR + NetVLAD share this layout).
Imports are lazy: no OpenCV penalty when this module is imported
on dev macOS.
"""
import cv2
import numpy as np
if hasattr(handle, "read_bytes"):
blob = handle.read_bytes()
else:
with handle as opened:
blob = opened.read()
arr = np.frombuffer(blob, dtype=np.uint8)
bgr = cv2.imdecode(arr, cv2.IMREAD_COLOR)
if bgr is None:
raise RuntimeError("cv2.imdecode returned None for tile handle")
rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)
chw = np.transpose(rgb, (2, 0, 1)).astype(np.float32) / 255.0
return chw
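The in-memory `c6_tile_cache` override in `_build_operator_pre_flight_cache` above relies on `dataclasses.replace` producing copies rather than mutating the loaded config. A minimal sketch of that pattern follows; `TileCacheBlock`, `Config`, and `with_cache_root` are hypothetical stand-ins, since the real config classes are not shown in this diff, and the paths are illustrative values only.

```python
import dataclasses
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any


@dataclass(frozen=True)
class TileCacheBlock:
    """Hypothetical stand-in for the c6_tile_cache config block."""

    root_dir: str
    faiss_index_path: str = ""


@dataclass(frozen=True)
class Config:
    """Hypothetical stand-in for the loaded operator config."""

    components: dict[str, Any] = field(default_factory=dict)


def with_cache_root(config: Config, cache_root: Path) -> Config:
    """Return a copy of ``config`` whose c6 block points at ``cache_root``."""
    block = config.components["c6_tile_cache"]
    overridden = dataclasses.replace(
        block,
        root_dir=str(cache_root),
        faiss_index_path="",  # empty: the index path falls back to root_dir
    )
    # A new components dict is built, so the original mapping is not mutated.
    return dataclasses.replace(
        config,
        components={**config.components, "c6_tile_cache": overridden},
    )


static = Config(
    components={
        "c6_tile_cache": TileCacheBlock(
            root_dir="/var/lib/gps-denied/tiles",
            faiss_index_path="/some/static/path/descriptor.index",
        ),
        "c11_tile_manager": object(),  # any other block is carried over as-is
    }
)
effective = with_cache_root(static, Path("/tmp/operator_pre_flight_cache0"))
```

Only the one block is rebuilt; every other entry in `components` is the same object in both configs, which matches the field-level merge the commit message describes.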
@@ -0,0 +1,182 @@
"""AZ-840 — E2E orchestrator integration test (AC-1 / AC-2 / AC-3 / AC-4 / AC-6).
The Tier-2 entry point that closes Epic AZ-835's narrative: from a
``(tlog, video, calibration)`` triple, run the full 7-step pipeline
end-to-end on the Jetson harness without operator hand-curation
between steps.
The test consumes:
* :func:`tests.e2e.replay.conftest.operator_pre_flight_setup`:
the AZ-839 C3 fixture that owns steps 3-5 (route extraction +
satellite-provider seeding + FAISS index build) and yields a
:class:`PopulatedC6Cache` keyed off a freshly-mktemp'd
``cache_root``.
* :func:`tests.e2e.replay.conftest.derkachi_replay_inputs`: the
shared session fixture that materialises the Derkachi tlog +
video + factory-sheet calibration + signing-key file.
* :func:`tests.e2e.replay._e2e_orchestrator.run_e2e_orchestration`:
the AC-1 driver that wires everything below the C3 fixture.
The driver writes a fresh effective replay config per session
(merging the static operator YAML with the cache_root override),
invokes ``gps-denied-replay --auto-trim``, parses the JSONL
emissions, computes the horizontal-error distribution, and writes
the verdict markdown under ``_docs/06_metrics/`` (AC-2).
Skip gates (in evaluation order):
1. ``@pytest.mark.tier2``: the per-suite Tier-2 plugin gates this
off on dev macOS (matches the AZ-839 / AZ-699 contract).
2. ``RUN_REPLAY_E2E`` not in ``{1, true, yes, on}``.
3. ``GPS_DENIED_OPERATOR_CONFIG_PATH`` unset (same env var the C3
fixture consumes).
4. ``gps-denied-replay`` console-script found neither on ``PATH``
nor next to ``sys.executable``.
5. Real video missing or placeholder-sized (mirrors AZ-699's gate).
6. ``operator_pre_flight_setup`` fixture itself skipped: the
downstream consumer inherits the SKIP automatically (pytest's
fixture-skip propagation).
AC-7 (AZ-699 continues to pass) is satisfied by inspection: this
test does not modify ``test_derkachi_real_tlog.py`` and writes its
report to the same path (``real_flight_validation_<date>.md``) but
in an idempotent way: both tests writing PASS or both writing
FAIL is the expected joint outcome on a given clip.
"""
from __future__ import annotations
import os
import shutil
import sys
from collections.abc import Iterator
from pathlib import Path
import pytest
from tests.e2e.replay._e2e_orchestrator import (
OrchestrationReport,
run_e2e_orchestration,
)
from tests.e2e.replay._operator_pre_flight import PopulatedC6Cache
from tests.e2e.replay.conftest import DerkachiReplayInputs
def _repo_root() -> Path:
return Path(__file__).resolve().parents[3]
def _derkachi_dir() -> Path:
return _repo_root() / "_docs" / "00_problem" / "input_data" / "flight_derkachi"
_MIN_REAL_VIDEO_BYTES: int = 1_000_000
def _replay_binary() -> Path | None:
"""Return the absolute path to ``gps-denied-replay`` or ``None``.
Same lookup order AZ-699 uses: PATH first, venv bin second.
"""
binary = shutil.which("gps-denied-replay")
if binary is not None:
return Path(binary)
venv_bin = Path(sys.executable).parent / "gps-denied-replay"
if venv_bin.exists():
return venv_bin
return None
def _orchestrator_skip_reason() -> str | None:
"""Return a SKIP message when env / inputs preclude a Jetson run."""
if os.environ.get("RUN_REPLAY_E2E", "").strip().lower() not in {
"1",
"true",
"yes",
"on",
}:
return "AZ-840 e2e orchestrator gated by RUN_REPLAY_E2E=1"
if not os.environ.get("GPS_DENIED_OPERATOR_CONFIG_PATH", "").strip():
return (
"AZ-840 e2e orchestrator requires GPS_DENIED_OPERATOR_CONFIG_PATH "
"(same env var the C3 fixture consumes)"
)
if _replay_binary() is None:
return "gps-denied-replay console-script not installed"
video = _derkachi_dir() / "flight_derkachi.mp4"
if not video.is_file():
return f"Derkachi video missing: {video}"
if video.stat().st_size < _MIN_REAL_VIDEO_BYTES:
return (
f"Derkachi video at {video} is only {video.stat().st_size} "
"bytes — placeholder, not a real recording"
)
return None
@pytest.fixture
def az840_skip_gate() -> Iterator[None]:
"""Skip-gate the orchestrator test before any heavy fixtures resolve."""
reason = _orchestrator_skip_reason()
if reason is not None:
pytest.skip(reason)
yield
@pytest.mark.tier2
def test_az840_e2e_real_flight_orchestration(
az840_skip_gate: None,
operator_pre_flight_setup: PopulatedC6Cache,
derkachi_replay_inputs: DerkachiReplayInputs,
tmp_path: Path,
) -> None:
# Arrange — every input besides cache_root comes from the existing
# session fixtures so the same Tier-2 harness setup that powers
# AZ-699 + AZ-839 is exercised.
binary = _replay_binary()
assert binary is not None, "skip gate already verified the binary exists"
base_config_path = Path(os.environ["GPS_DENIED_OPERATOR_CONFIG_PATH"])
output_path = tmp_path / "estimator_output.jsonl"
effective_config_path = tmp_path / "operator_config_effective.yaml"
report_dir = _repo_root() / "_docs" / "06_metrics"
# Act
report = run_e2e_orchestration(
populated_cache=operator_pre_flight_setup,
base_config_path=base_config_path,
tlog_path=derkachi_replay_inputs.tlog_path,
video_path=derkachi_replay_inputs.video_path,
calibration_path=derkachi_replay_inputs.calibration_path,
signing_key_path=derkachi_replay_inputs.signing_key_path,
replay_binary=binary,
output_path=output_path,
report_dir=report_dir,
effective_config_path=effective_config_path,
)
# Assert AC-2 + AC-4 — report exists; full run within the 15-min budget.
assert isinstance(report, OrchestrationReport)
assert report.report_path.is_file()
body = report.report_path.read_text()
assert "## Horizontal error (metres)" in body
assert "## Threshold-hit share" in body
assert "Mean" in body
for threshold in (10, 25, 50, 100):
assert f"| {threshold} |" in body, (
f"threshold {threshold} m row missing from report"
)
assert report.replay_subprocess_seconds <= 900.0, (
"AZ-840 AC-4: replay subprocess exceeded 15-min soft target"
)
assert report.wall_clock_s >= report.replay_subprocess_seconds
assert report.distribution.count > 0, (
"no emissions paired with ground truth — orchestration produced "
"data but every emission fell outside the tlog GPS window"
)
# Assert AC-3 — the effective config was written and points at the
# cache_root the C3 fixture supplied.
assert effective_config_path.is_file()
effective_text = effective_config_path.read_text()
assert str(operator_pre_flight_setup.cache_root) in effective_text
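The ordered skip gates used by this test (first match wins) can be exercised without touching the real environment by passing the environment in as a mapping. The following is a minimal sketch of that idea, not the PR's helper: `env_flag_enabled`, `first_skip_reason`, and the `binary_found` callback are hypothetical names, the reason strings are shortened, and the video-size gate is omitted for brevity.

```python
from __future__ import annotations

from collections.abc import Callable, Mapping

_TRUTHY = frozenset({"1", "true", "yes", "on"})


def env_flag_enabled(env: Mapping[str, str], name: str) -> bool:
    """True when ``name`` is set to a truthy token (case/whitespace-insensitive)."""
    return env.get(name, "").strip().lower() in _TRUTHY


def first_skip_reason(
    env: Mapping[str, str],
    *,
    binary_found: Callable[[], bool],
) -> str | None:
    """Return the first failing gate's reason, or ``None`` when the test may run."""
    if not env_flag_enabled(env, "RUN_REPLAY_E2E"):
        return "gated by RUN_REPLAY_E2E=1"
    if not env.get("GPS_DENIED_OPERATOR_CONFIG_PATH", "").strip():
        return "requires GPS_DENIED_OPERATOR_CONFIG_PATH"
    if not binary_found():
        return "gps-denied-replay console-script not installed"
    return None


# Usage: each call feeds a synthetic environment instead of os.environ.
reason_unset = first_skip_reason({}, binary_found=lambda: True)
reason_no_config = first_skip_reason(
    {"RUN_REPLAY_E2E": " TRUE "}, binary_found=lambda: True
)
reason_no_binary = first_skip_reason(
    {"RUN_REPLAY_E2E": "on", "GPS_DENIED_OPERATOR_CONFIG_PATH": "/etc/op.yaml"},
    binary_found=lambda: False,
)
reason_ready = first_skip_reason(
    {"RUN_REPLAY_E2E": "1", "GPS_DENIED_OPERATOR_CONFIG_PATH": "/etc/op.yaml"},
    binary_found=lambda: True,
)
```

Because the gates return on the first failure, an unset `RUN_REPLAY_E2E` hides every later gate; fixing one reason at a time is the expected workflow on a new machine.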
@@ -0,0 +1,671 @@
"""Unit tests for the AZ-840 e2e orchestrator (AC-8).
The end-to-end happy path is the Tier-2 integration test in
``test_az835_e2e_real_flight.py`` (AC-1 / AC-2). This module covers
the orchestration helper layer in isolation:
* Param validation: every required path must exist before the
airborne subprocess is spawned (AC-5 fails LOUD).
* Effective-config merge: the ``c6_tile_cache.root_dir`` override
is written to YAML; the rest of the base config is preserved.
* Error propagation per step: every documented failure surfaces
as :class:`OrchestrationFailure` with the correct
:class:`OrchestratorStep` label.
* Happy path: when the runner returns success and the JSONL +
ground truth align, :class:`OrchestrationReport` carries a
written report path and an honest verdict (AC-2: report exists,
PASS or FAIL).
The tests inject a fake ``runner`` so no real
``gps-denied-replay`` subprocess is spawned. Real binary execution
is exercised on the Jetson harness via the AC-1 integration test.
"""
from __future__ import annotations
import json
import subprocess
from pathlib import Path
from unittest.mock import MagicMock
import pytest
import yaml
from gps_denied_onboard.helpers.accuracy_report import (
AC3_GATE_THRESHOLD_M,
)
from gps_denied_onboard.replay_input.tlog_route import RouteSpec
from tests.e2e.replay._e2e_orchestrator import (
OrchestrationFailure,
OrchestrationReport,
OrchestratorStep,
read_calibration_acquisition_method,
run_e2e_orchestration,
write_effective_replay_config,
)
from tests.e2e.replay._operator_pre_flight import PopulatedC6Cache
# ----------------------------------------------------------------------
# Helpers
def _build_populated_cache(tmp_path: Path) -> PopulatedC6Cache:
"""Construct a synthetic :class:`PopulatedC6Cache`.
The orchestrator only consumes ``cache_root`` from the cache,
so the FAISS sidecar paths are placeholders. The route_spec is
a minimal one-waypoint instance; no AZ-836 invariants are
re-asserted by AZ-840.
"""
cache_root = tmp_path / "cache_root"
cache_root.mkdir()
return PopulatedC6Cache(
cache_root=cache_root,
tile_store_path=cache_root / "tiles",
faiss_index_path=cache_root / "descriptor.index",
faiss_sidecar_sha256_path=cache_root / "descriptor.index.sha256",
faiss_sidecar_meta_path=cache_root / "descriptor.index.meta.json",
route_spec=RouteSpec(
waypoints=((50.10, 36.10),),
suggested_region_size_meters=500.0,
source_tlog=Path("test.tlog"),
source_segment=(0, 100),
total_distance_meters=0.0,
),
tile_count=1,
elapsed_seconds=0.0,
)
def _stage_inputs(tmp_path: Path) -> dict[str, Path]:
"""Write touch-files for every input path the orchestrator validates.
The base config YAML carries one stub block so the merge step
has a real document to overlay on.
"""
base_config = tmp_path / "operator_config.yaml"
base_config.write_text(
yaml.safe_dump(
{
"mode": "replay",
"c6_tile_cache": {
"store_runtime": "postgres_filesystem",
"metadata_runtime": "postgres_filesystem",
"descriptor_index_runtime": "faiss_hnsw",
"root_dir": "/var/lib/gps-denied/tiles",
"faiss_index_path": "/some/static/path/descriptor.index",
},
}
)
)
tlog = tmp_path / "input.tlog"
tlog.write_bytes(b"\x00")
video = tmp_path / "input.mp4"
video.write_bytes(b"\x00")
calibration = tmp_path / "calibration.json"
calibration.write_text(json.dumps({"acquisition_method": "factory-sheet"}))
signing_key = tmp_path / "signing_key.bin"
signing_key.write_bytes(b"\x00" * 32)
binary = tmp_path / "gps-denied-replay"
binary.write_text("")
return {
"base_config_path": base_config,
"tlog_path": tlog,
"video_path": video,
"calibration_path": calibration,
"signing_key_path": signing_key,
"replay_binary": binary,
}
def _ground_truth_tlog_loader(
monkeypatch: pytest.MonkeyPatch,
*,
times_s: tuple[float, ...] = (0.0, 1.0, 2.0),
lat_deg: float = 50.10,
lon_deg: float = 36.10,
alt_m: float = 100.0,
) -> None:
"""Stub the orchestrator's ground-truth loader so unit tests skip MAVLink.
The orchestrator imports ``load_tlog_ground_truth`` from
``gps_denied_onboard.replay_input``; patching the symbol *as
bound on the orchestrator module* keeps the patch local to the
unit suite (no cross-test bleed).
"""
fixes = [
_StubGpsFix(
ts_ns=int(t * 1e9),
lat_deg=lat_deg,
lon_deg=lon_deg,
alt_m=alt_m,
)
for t in times_s
]
series = _StubGpsSeries(records=tuple(fixes))
monkeypatch.setattr(
"tests.e2e.replay._e2e_orchestrator.load_tlog_ground_truth",
lambda *_args, **_kwargs: series,
)
class _StubGpsFix:
    """Mirrors the fields the orchestrator reads from each tlog row."""

    __slots__ = ("ts_ns", "lat_deg", "lon_deg", "alt_m")

    def __init__(
        self, *, ts_ns: int, lat_deg: float, lon_deg: float, alt_m: float
    ) -> None:
        self.ts_ns = ts_ns
        self.lat_deg = lat_deg
        self.lon_deg = lon_deg
        self.alt_m = alt_m


class _StubGpsSeries:
    """Drop-in replacement for :class:`TlogGroundTruth`."""

    def __init__(self, *, records: tuple[_StubGpsFix, ...]) -> None:
        self.records = records
def _build_runner_emitting(
    output_path: Path,
    *,
    rows: list[dict[str, object]],
    returncode: int = 0,
    stdout: str = "",
    stderr: str = "",
) -> "MagicMock":
    """Return a fake ``subprocess.run`` that writes JSONL on call."""

    def _run(argv, **kwargs):  # type: ignore[no-untyped-def]
        if rows:
            output_path.parent.mkdir(parents=True, exist_ok=True)
            output_path.write_text(
                "\n".join(json.dumps(row) for row in rows) + "\n"
            )
        return subprocess.CompletedProcess(
            args=argv,
            returncode=returncode,
            stdout=stdout,
            stderr=stderr,
        )

    return MagicMock(side_effect=_run)
# ----------------------------------------------------------------------
# write_effective_replay_config
def test_write_effective_replay_config_overlays_root_dir(
tmp_path: Path,
) -> None:
# Arrange
inputs = _stage_inputs(tmp_path)
cache_root = tmp_path / "cache"
cache_root.mkdir()
output_path = tmp_path / "effective.yaml"
# Act
written_path = write_effective_replay_config(
base_config_path=inputs["base_config_path"],
cache_root=cache_root,
output_path=output_path,
)
# Assert
assert written_path == output_path
merged = yaml.safe_load(output_path.read_text())
assert merged["c6_tile_cache"]["root_dir"] == str(cache_root)
assert merged["c6_tile_cache"]["faiss_index_path"] == ""
assert merged["mode"] == "replay"
assert (
merged["c6_tile_cache"]["store_runtime"] == "postgres_filesystem"
), "non-overridden c6_tile_cache fields must survive"
def test_write_effective_replay_config_creates_block_when_absent(
tmp_path: Path,
) -> None:
# Arrange
base = tmp_path / "operator.yaml"
base.write_text(yaml.safe_dump({"mode": "replay"}))
cache_root = tmp_path / "cache"
cache_root.mkdir()
# Act
write_effective_replay_config(
base_config_path=base,
cache_root=cache_root,
output_path=tmp_path / "effective.yaml",
)
# Assert
merged = yaml.safe_load((tmp_path / "effective.yaml").read_text())
assert merged["c6_tile_cache"]["root_dir"] == str(cache_root)
def test_write_effective_replay_config_malformed_yaml_fails(
tmp_path: Path,
) -> None:
# Arrange
base = tmp_path / "bad.yaml"
base.write_text(":\n : not yaml:")
cache_root = tmp_path / "cache"
cache_root.mkdir()
# Act + Assert
with pytest.raises(OrchestrationFailure) as exc_info:
write_effective_replay_config(
base_config_path=base,
cache_root=cache_root,
output_path=tmp_path / "effective.yaml",
)
assert exc_info.value.step is OrchestratorStep.WRITE_EFFECTIVE_CONFIG
def test_write_effective_replay_config_non_mapping_top_level_fails(
tmp_path: Path,
) -> None:
# Arrange
base = tmp_path / "bad.yaml"
base.write_text("- not a mapping\n")
cache_root = tmp_path / "cache"
cache_root.mkdir()
# Act + Assert
with pytest.raises(OrchestrationFailure) as exc_info:
write_effective_replay_config(
base_config_path=base,
cache_root=cache_root,
output_path=tmp_path / "effective.yaml",
)
assert exc_info.value.step is OrchestratorStep.WRITE_EFFECTIVE_CONFIG
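# Illustrative sketch only (not the orchestrator's code): the four tests above
# pin down a field-level overlay on the ``c6_tile_cache`` block. The helper
# name ``overlay_cache_root`` and the use of ``TypeError`` are assumptions made
# for this example; the real ``write_effective_replay_config`` works on YAML
# files and raises ``OrchestrationFailure``.

```python
import copy
from pathlib import Path
from typing import Any, Dict


def overlay_cache_root(base: Any, cache_root: Path) -> Dict[str, Any]:
    """Hypothetical helper: overlay the cache root onto a parsed config.

    Only ``root_dir`` and ``faiss_index_path`` inside ``c6_tile_cache`` are
    replaced; every other key is carried over verbatim.
    """
    if not isinstance(base, dict):
        # Mirrors the "non-mapping top level fails" case tested above.
        raise TypeError("top-level config must be a mapping")
    merged = copy.deepcopy(base)  # never mutate the caller's document
    block = merged.setdefault("c6_tile_cache", {})  # create block when absent
    if not isinstance(block, dict):
        raise TypeError("c6_tile_cache must be a mapping")
    block["root_dir"] = str(cache_root)
    # The tests above expect the static index path to be blanked out.
    block["faiss_index_path"] = ""
    return merged


base_doc = {
    "mode": "replay",
    "c6_tile_cache": {
        "store_runtime": "postgres_filesystem",
        "root_dir": "/var/lib/gps-denied/tiles",
        "faiss_index_path": "/some/static/path/descriptor.index",
    },
}
effective = overlay_cache_root(base_doc, Path("/tmp/cache"))
```

# Deep-copying first keeps the base document untouched, which matches the
# commit message's statement that the static YAML on disk is never modified.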
# ----------------------------------------------------------------------
# read_calibration_acquisition_method
def test_read_calibration_acquisition_method_returns_field_when_present(
tmp_path: Path,
) -> None:
# Arrange
path = tmp_path / "cal.json"
path.write_text(json.dumps({"acquisition_method": "factory-sheet"}))
# Assert
assert read_calibration_acquisition_method(path) == "factory-sheet"
def test_read_calibration_acquisition_method_returns_unknown_on_missing(
tmp_path: Path,
) -> None:
# Arrange
path = tmp_path / "cal.json"
path.write_text(json.dumps({"some_other_field": True}))
# Assert
assert read_calibration_acquisition_method(path) == "unknown"
def test_read_calibration_acquisition_method_returns_unknown_on_malformed(
tmp_path: Path,
) -> None:
# Arrange
path = tmp_path / "cal.json"
path.write_text("{not valid json")
# Assert
assert read_calibration_acquisition_method(path) == "unknown"
# ----------------------------------------------------------------------
# run_e2e_orchestration — param validation (AC-5)
def test_run_e2e_orchestration_missing_tlog_fails_loud(
tmp_path: Path,
) -> None:
# Arrange
cache = _build_populated_cache(tmp_path)
inputs = _stage_inputs(tmp_path)
inputs["tlog_path"].unlink()
# Act + Assert
with pytest.raises(OrchestrationFailure) as exc_info:
run_e2e_orchestration(
populated_cache=cache,
output_path=tmp_path / "out.jsonl",
report_dir=tmp_path / "metrics",
effective_config_path=tmp_path / "eff.yaml",
**inputs, # type: ignore[arg-type]
)
assert exc_info.value.step is OrchestratorStep.VALIDATE_INPUTS
assert "tlog_path" in str(exc_info.value)
def test_run_e2e_orchestration_missing_binary_fails_loud(
tmp_path: Path,
) -> None:
# Arrange
cache = _build_populated_cache(tmp_path)
inputs = _stage_inputs(tmp_path)
inputs["replay_binary"].unlink()
# Act + Assert
with pytest.raises(OrchestrationFailure) as exc_info:
run_e2e_orchestration(
populated_cache=cache,
output_path=tmp_path / "out.jsonl",
report_dir=tmp_path / "metrics",
effective_config_path=tmp_path / "eff.yaml",
**inputs, # type: ignore[arg-type]
)
assert exc_info.value.step is OrchestratorStep.VALIDATE_INPUTS
assert "replay_binary" in str(exc_info.value)
# ----------------------------------------------------------------------
# run_e2e_orchestration — subprocess error propagation (AC-5)
def test_run_e2e_orchestration_replay_nonzero_exit_fails_loud(
tmp_path: Path,
monkeypatch: pytest.MonkeyPatch,
) -> None:
# Arrange
cache = _build_populated_cache(tmp_path)
inputs = _stage_inputs(tmp_path)
output_path = tmp_path / "out.jsonl"
runner = MagicMock(
return_value=subprocess.CompletedProcess(
args=[],
returncode=1,
stdout="",
stderr="boom",
)
)
_ground_truth_tlog_loader(monkeypatch)
# Act + Assert
with pytest.raises(OrchestrationFailure) as exc_info:
run_e2e_orchestration(
populated_cache=cache,
output_path=output_path,
report_dir=tmp_path / "metrics",
effective_config_path=tmp_path / "eff.yaml",
runner=runner,
**inputs, # type: ignore[arg-type]
)
assert exc_info.value.step is OrchestratorStep.AIRBORNE_PIPELINE
assert "exited 1" in str(exc_info.value)
assert "boom" in str(exc_info.value)
def test_run_e2e_orchestration_replay_timeout_fails_loud(
tmp_path: Path,
) -> None:
# Arrange
cache = _build_populated_cache(tmp_path)
inputs = _stage_inputs(tmp_path)
def _timeout(*_args, **_kwargs):
raise subprocess.TimeoutExpired(cmd=["replay"], timeout=0.1)
runner = MagicMock(side_effect=_timeout)
# Act + Assert
with pytest.raises(OrchestrationFailure) as exc_info:
run_e2e_orchestration(
populated_cache=cache,
output_path=tmp_path / "out.jsonl",
report_dir=tmp_path / "metrics",
effective_config_path=tmp_path / "eff.yaml",
runner=runner,
max_seconds=0.1,
**inputs, # type: ignore[arg-type]
)
assert exc_info.value.step is OrchestratorStep.AIRBORNE_PIPELINE
assert "timed out" in str(exc_info.value)
def test_run_e2e_orchestration_replay_oserror_fails_loud(
tmp_path: Path,
) -> None:
# Arrange
cache = _build_populated_cache(tmp_path)
inputs = _stage_inputs(tmp_path)
def _oserror(*_args, **_kwargs):
raise OSError("permission denied")
runner = MagicMock(side_effect=_oserror)
# Act + Assert
with pytest.raises(OrchestrationFailure) as exc_info:
run_e2e_orchestration(
populated_cache=cache,
output_path=tmp_path / "out.jsonl",
report_dir=tmp_path / "metrics",
effective_config_path=tmp_path / "eff.yaml",
runner=runner,
**inputs, # type: ignore[arg-type]
)
assert exc_info.value.step is OrchestratorStep.AIRBORNE_PIPELINE
assert "permission denied" in str(exc_info.value)
# ----------------------------------------------------------------------
# run_e2e_orchestration — empty / malformed JSONL (AC-5)
def test_run_e2e_orchestration_empty_jsonl_fails_loud(
tmp_path: Path,
monkeypatch: pytest.MonkeyPatch,
) -> None:
# Arrange
cache = _build_populated_cache(tmp_path)
inputs = _stage_inputs(tmp_path)
output_path = tmp_path / "out.jsonl"
def _runner(argv, **_kwargs): # type: ignore[no-untyped-def]
output_path.write_text("\n\n") # only blanks
return subprocess.CompletedProcess(args=argv, returncode=0, stdout="", stderr="")
runner = MagicMock(side_effect=_runner)
_ground_truth_tlog_loader(monkeypatch)
# Act + Assert
with pytest.raises(OrchestrationFailure) as exc_info:
run_e2e_orchestration(
populated_cache=cache,
output_path=output_path,
report_dir=tmp_path / "metrics",
effective_config_path=tmp_path / "eff.yaml",
runner=runner,
**inputs, # type: ignore[arg-type]
)
assert exc_info.value.step is OrchestratorStep.PARSE_EMISSIONS
def test_run_e2e_orchestration_malformed_jsonl_fails_loud(
tmp_path: Path,
monkeypatch: pytest.MonkeyPatch,
) -> None:
# Arrange
cache = _build_populated_cache(tmp_path)
inputs = _stage_inputs(tmp_path)
output_path = tmp_path / "out.jsonl"
def _runner(argv, **_kwargs): # type: ignore[no-untyped-def]
output_path.write_text('{"valid": true}\nnot a json line\n')
return subprocess.CompletedProcess(args=argv, returncode=0, stdout="", stderr="")
runner = MagicMock(side_effect=_runner)
_ground_truth_tlog_loader(monkeypatch)
# Act + Assert
with pytest.raises(OrchestrationFailure) as exc_info:
run_e2e_orchestration(
populated_cache=cache,
output_path=output_path,
report_dir=tmp_path / "metrics",
effective_config_path=tmp_path / "eff.yaml",
runner=runner,
**inputs, # type: ignore[arg-type]
)
assert exc_info.value.step is OrchestratorStep.PARSE_EMISSIONS
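# Illustrative sketch only: one way a strict JSONL reader could produce the two
# failures exercised above (blank-only output and a malformed line). The name
# ``parse_jsonl_strict`` and the ``ValueError`` type are assumptions for this
# example; the orchestrator reports these cases as ``OrchestrationFailure``
# with step ``PARSE_EMISSIONS``.

```python
import json
from typing import Any, Dict, List


def parse_jsonl_strict(text: str) -> List[Dict[str, Any]]:
    """Hypothetical parser: one JSON object per non-blank line, or raise."""
    rows: List[Dict[str, Any]] = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines are tolerated but do not count as rows
        try:
            row = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_no}: malformed JSON ({exc.msg})") from exc
        if not isinstance(row, dict):
            raise ValueError(f"line {line_no}: expected a JSON object")
        rows.append(row)
    if not rows:
        # An empty result is treated as an error rather than a silent pass.
        raise ValueError("no emissions found in output")
    return rows


parsed = parse_jsonl_strict('{"emitted_at": 1}\n\n{"emitted_at": 2}\n')
```

# Failing on an empty file matters here because a zero-row report would
# otherwise look like a run with nothing to measure.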
# ----------------------------------------------------------------------
# run_e2e_orchestration — ground truth loader failure (AC-5)
def test_run_e2e_orchestration_ground_truth_loader_failure_fails_loud(
tmp_path: Path,
monkeypatch: pytest.MonkeyPatch,
) -> None:
# Arrange
cache = _build_populated_cache(tmp_path)
inputs = _stage_inputs(tmp_path)
output_path = tmp_path / "out.jsonl"
runner = _build_runner_emitting(
output_path,
rows=[
{
"emitted_at": int(0.5 * 1e9),
"position_wgs84": {
"lat_deg": 50.10,
"lon_deg": 36.10,
"alt_m": 100.0,
},
}
],
)
def _raise(*_args, **_kwargs):
raise ValueError("tlog corrupt")
monkeypatch.setattr(
"tests.e2e.replay._e2e_orchestrator.load_tlog_ground_truth",
_raise,
)
# Act + Assert
with pytest.raises(OrchestrationFailure) as exc_info:
run_e2e_orchestration(
populated_cache=cache,
output_path=output_path,
report_dir=tmp_path / "metrics",
effective_config_path=tmp_path / "eff.yaml",
runner=runner,
**inputs, # type: ignore[arg-type]
)
assert exc_info.value.step is OrchestratorStep.LOAD_GROUND_TRUTH
assert "tlog corrupt" in str(exc_info.value)
# ----------------------------------------------------------------------
# run_e2e_orchestration — happy path (AC-1 / AC-2)
def test_run_e2e_orchestration_happy_path_writes_report(
tmp_path: Path,
monkeypatch: pytest.MonkeyPatch,
) -> None:
# Arrange
cache = _build_populated_cache(tmp_path)
inputs = _stage_inputs(tmp_path)
output_path = tmp_path / "out.jsonl"
report_dir = tmp_path / "metrics"
effective_config_path = tmp_path / "eff.yaml"
rows = [
{
"emitted_at": int(0.5 * 1e9),
"position_wgs84": {"lat_deg": 50.10, "lon_deg": 36.10, "alt_m": 100.0},
},
{
"emitted_at": int(1.5 * 1e9),
"position_wgs84": {"lat_deg": 50.10, "lon_deg": 36.10, "alt_m": 100.0},
},
]
runner = _build_runner_emitting(output_path, rows=rows)
_ground_truth_tlog_loader(monkeypatch)
# Act
report = run_e2e_orchestration(
populated_cache=cache,
output_path=output_path,
report_dir=report_dir,
effective_config_path=effective_config_path,
runner=runner,
run_date_utc="2026-05-23",
**inputs, # type: ignore[arg-type]
)
# Assert
assert isinstance(report, OrchestrationReport)
assert report.report_path.is_file()
assert report.emissions_count == 2
assert report.distribution.count == 2
assert report.verdict_passed is True
body = report.report_path.read_text()
assert "## Horizontal error (metres)" in body
assert "## Threshold-hit share" in body
assert f"| {AC3_GATE_THRESHOLD_M:g} |" in body
runner.assert_called_once()
argv_passed = runner.call_args.args[0]
assert str(effective_config_path) in argv_passed
assert "--auto-trim" in argv_passed
merged = yaml.safe_load(effective_config_path.read_text())
assert merged["c6_tile_cache"]["root_dir"] == str(cache.cache_root)
def test_run_e2e_orchestration_writes_report_even_on_fail_verdict(
tmp_path: Path,
monkeypatch: pytest.MonkeyPatch,
) -> None:
# Arrange: emissions are roughly 1.3 km from ground truth, far above the 100 m gate.
cache = _build_populated_cache(tmp_path)
inputs = _stage_inputs(tmp_path)
output_path = tmp_path / "out.jsonl"
report_dir = tmp_path / "metrics"
rows = [
{
"emitted_at": int(0.5 * 1e9),
"position_wgs84": {"lat_deg": 50.110, "lon_deg": 36.110, "alt_m": 100.0},
},
{
"emitted_at": int(1.5 * 1e9),
"position_wgs84": {"lat_deg": 50.110, "lon_deg": 36.110, "alt_m": 100.0},
},
]
runner = _build_runner_emitting(output_path, rows=rows)
_ground_truth_tlog_loader(monkeypatch)
# Act
report = run_e2e_orchestration(
populated_cache=cache,
output_path=output_path,
report_dir=report_dir,
effective_config_path=tmp_path / "eff.yaml",
runner=runner,
run_date_utc="2026-05-23",
**inputs, # type: ignore[arg-type]
)
# Assert — AC-2: report exists regardless of PASS/FAIL.
assert report.verdict_passed is False
assert report.report_path.is_file()
assert "FAIL" in report.report_path.read_text()
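# Illustrative sketch only: a quick check of how far the FAIL-verdict rows sit
# from the stubbed ground truth. It assumes the horizontal error is a
# great-circle (haversine) distance on a sphere of mean Earth radius; the
# actual ``accuracy_report`` helper may compute it differently.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, an assumption of this sketch


def haversine_m(
    lat1_deg: float, lon1_deg: float, lat2_deg: float, lon2_deg: float
) -> float:
    """Great-circle distance in metres between two WGS84 lat/lon points."""
    phi1 = math.radians(lat1_deg)
    phi2 = math.radians(lat2_deg)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2_deg - lon1_deg)
    a = (
        math.sin(dphi / 2.0) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2.0) ** 2
    )
    return 2.0 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Ground truth stub (50.10, 36.10) versus the emitted fix (50.110, 36.110).
offset_m = haversine_m(50.10, 36.10, 50.110, 36.110)
```

# Under these assumptions the 0.01 degree shift in both axes comes out at
# about 1.3 km, well beyond a 100 m gate.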
@@ -0,0 +1,480 @@
"""Unit tests for ``populate_c6_from_route`` (AZ-839 AC-8).
Covers the AZ-839 acceptance criteria that can be exercised against
stubbed dependencies (the AC-9 integration test against the Jetson
harness lives in ``test_derkachi_real_tlog.py`` once Epic AZ-835
completes):
* AC-3 (happy path): the driver returns a populated cache with paths
  pointing at the on-disk sidecar triple.
* AC-4: :class:`RouteValidationError` and
  :class:`RouteTerminalFailureError` propagate unchanged with their
  original cause; no silent swallow.
* AC-5: :class:`RouteTransientError` triggers retries, up to 3 attempts
  in total, using the documented backoff schedule. The final attempt's
  exception is propagated unchanged.
* AC-6: a tamper between rebuild and verify (simulated by having
  ``descriptor_index_factory`` raise :class:`IndexUnavailableError`)
  surfaces the failure and leaves no half-built artifacts.
* AC-7: cleanup on failure removes any sidecar file the driver
  produced (pre-existing files are preserved).
The driver intentionally takes every collaborator via dependency
injection so this module never imports httpx, FAISS, or Postgres.
"""
from __future__ import annotations
from dataclasses import dataclass
from pathlib import Path
from unittest.mock import MagicMock
from uuid import uuid4
import pytest
from gps_denied_onboard.components.c10_provisioning.descriptor_batcher import (
BatcherOutcome,
DescriptorBatchReport,
)
from gps_denied_onboard.components.c11_tile_manager import (
DownloadOutcome,
SectorClassification,
)
from gps_denied_onboard.components.c11_tile_manager._types import (
DownloadBatchReport,
)
from gps_denied_onboard.components.c11_tile_manager.errors import (
RouteTerminalFailureError,
RouteTransientError,
RouteValidationError,
)
from gps_denied_onboard.components.c11_tile_manager.route_client import (
RouteSeedResult,
)
from gps_denied_onboard.components.c6_tile_cache.errors import (
IndexUnavailableError,
)
from gps_denied_onboard.components.c6_tile_cache.faiss_descriptor_index import (
META_SUFFIX,
)
from gps_denied_onboard.helpers.sha256_sidecar import SIDECAR_SUFFIX
from gps_denied_onboard.replay_input.tlog_route import RouteSpec
from tests.e2e.replay._operator_pre_flight import (
PopulatedC6Cache,
populate_c6_from_route,
)
# ----------------------------------------------------------------------
# Helpers
@dataclass
class _DriverHarness:
"""Bundle of paths + collaborators wired into one driver call."""
cache_root: Path
tile_store_path: Path
faiss_index_path: Path
sha256_path: Path
meta_path: Path
route_spec: RouteSpec
route_client: MagicMock
tile_downloader: MagicMock
descriptor_batcher: MagicMock
descriptor_index_factory: MagicMock
sleep_calls: list[float]
def _build_harness(tmp_path: Path) -> _DriverHarness:
"""Wire a self-contained harness with sane default stub returns.
Each collaborator is a :class:`MagicMock` with a default success
return value; tests override per-call as needed.
"""
cache_root = tmp_path / "cache_root"
cache_root.mkdir()
tile_store_path = cache_root / "tile_store"
tile_store_path.mkdir()
faiss_index_path = cache_root / "descriptor.index"
sha256_path = Path(str(faiss_index_path) + SIDECAR_SUFFIX)
meta_path = Path(str(faiss_index_path) + META_SUFFIX)
route_spec = RouteSpec(
waypoints=(
(50.10, 36.10),
(50.11, 36.11),
(50.12, 36.12),
),
suggested_region_size_meters=500.0,
source_tlog=Path("test.tlog"),
source_segment=(0, 100),
total_distance_meters=1500.0,
)
route_client = MagicMock()
route_client.seed_route.return_value = RouteSeedResult(
route_id=uuid4(),
terminal_status="completed",
maps_ready=True,
tile_count=12,
elapsed_ms=2500,
submitted_payload_sha256="cafebabe" * 8,
)
tile_downloader = MagicMock()
tile_downloader.download_tiles_for_area.return_value = DownloadBatchReport(
outcome=DownloadOutcome.SUCCESS,
tiles_requested=12,
tiles_downloaded=12,
tiles_rejected_resolution=0,
tiles_rejected_freshness=0,
tiles_downgraded=0,
retry_count=0,
request_hash="abcdef0123456789",
)
descriptor_batcher = MagicMock()
descriptor_batcher.populate_descriptors.return_value = DescriptorBatchReport(
descriptors_generated=12,
tiles_consumed=12,
oom_retries=0,
elapsed_s=1.2,
outcome=BatcherOutcome.SUCCESS,
failure_reason=None,
)
descriptor_index_factory = MagicMock()
descriptor_index_factory.return_value = MagicMock(
spec=["mmap_handle", "descriptor_dim"]
)
return _DriverHarness(
cache_root=cache_root,
tile_store_path=tile_store_path,
faiss_index_path=faiss_index_path,
sha256_path=sha256_path,
meta_path=meta_path,
route_spec=route_spec,
route_client=route_client,
tile_downloader=tile_downloader,
descriptor_batcher=descriptor_batcher,
descriptor_index_factory=descriptor_index_factory,
sleep_calls=[],
)
def _drive(harness: _DriverHarness, **overrides: object) -> PopulatedC6Cache:
"""Invoke the driver with the harness defaults plus any overrides."""
kwargs: dict[str, object] = {
"route_spec": harness.route_spec,
"route_client": harness.route_client,
"tile_downloader": harness.tile_downloader,
"descriptor_batcher": harness.descriptor_batcher,
"descriptor_index_factory": harness.descriptor_index_factory,
"cache_root": harness.cache_root,
"tile_store_path": harness.tile_store_path,
"faiss_index_path": harness.faiss_index_path,
"sleep": harness.sleep_calls.append,
}
kwargs.update(overrides)
return populate_c6_from_route(**kwargs) # type: ignore[arg-type]
# ----------------------------------------------------------------------
# AC-3 — happy path
def test_populate_c6_from_route_returns_populated_cache(tmp_path: Path) -> None:
# Arrange
harness = _build_harness(tmp_path)
# Act
populated = _drive(harness)
# Assert
assert isinstance(populated, PopulatedC6Cache)
assert populated.cache_root == harness.cache_root
assert populated.tile_store_path == harness.tile_store_path
assert populated.faiss_index_path == harness.faiss_index_path
assert populated.faiss_sidecar_sha256_path == harness.sha256_path
assert populated.faiss_sidecar_meta_path == harness.meta_path
assert populated.route_spec is harness.route_spec
assert populated.tile_count == 12
assert populated.elapsed_seconds >= 0.0
harness.route_client.seed_route.assert_called_once()
harness.tile_downloader.download_tiles_for_area.assert_called_once()
harness.descriptor_batcher.populate_descriptors.assert_called_once()
harness.descriptor_index_factory.assert_called_once()
def test_populate_c6_from_route_passes_sector_class_to_downloader(
tmp_path: Path,
) -> None:
# Arrange
harness = _build_harness(tmp_path)
# Act
_drive(harness, sector_class=SectorClassification.STABLE_REAR)
# Assert
download_request = harness.tile_downloader.download_tiles_for_area.call_args.args[0]
assert download_request.sector_class is SectorClassification.STABLE_REAR
corpus_filter = harness.descriptor_batcher.populate_descriptors.call_args.args[0]
assert corpus_filter.sector_class == SectorClassification.STABLE_REAR.value
# ----------------------------------------------------------------------
# AC-4 — validation / terminal failure propagate unchanged
def test_route_validation_error_propagates_unchanged(tmp_path: Path) -> None:
# Arrange
harness = _build_harness(tmp_path)
def _raise_validation(*_args: object, **_kwargs: object) -> RouteSeedResult:
try:
raise ValueError("payload sha256 mismatch")
except ValueError as cause:
raise RouteValidationError("payload rejected") from cause
harness.route_client.seed_route.side_effect = _raise_validation
# Act + Assert
with pytest.raises(RouteValidationError) as exc_info:
_drive(harness)
assert isinstance(exc_info.value.__cause__, ValueError)
assert "payload sha256 mismatch" in str(exc_info.value.__cause__)
assert harness.tile_downloader.download_tiles_for_area.call_count == 0
assert harness.descriptor_batcher.populate_descriptors.call_count == 0
assert harness.sleep_calls == []
def test_route_terminal_failure_propagates_unchanged(tmp_path: Path) -> None:
# Arrange
harness = _build_harness(tmp_path)
harness.route_client.seed_route.side_effect = RouteTerminalFailureError(
"mapsReady never reached"
)
# Act + Assert
with pytest.raises(RouteTerminalFailureError):
_drive(harness)
assert harness.tile_downloader.download_tiles_for_area.call_count == 0
assert harness.descriptor_batcher.populate_descriptors.call_count == 0
assert harness.sleep_calls == []
# ----------------------------------------------------------------------
# AC-5 — transient retry budget
def test_route_transient_error_retries_then_succeeds(tmp_path: Path) -> None:
# Arrange
harness = _build_harness(tmp_path)
success_result = harness.route_client.seed_route.return_value
harness.route_client.seed_route.side_effect = [
RouteTransientError("503 first attempt"),
RouteTransientError("503 second attempt"),
success_result,
]
# Act
populated = _drive(
harness,
retry_schedule_s=(0.1, 0.2, 0.4),
max_retry_attempts=3,
)
# Assert
assert populated.tile_count == 12
assert harness.route_client.seed_route.call_count == 3
assert harness.sleep_calls == [pytest.approx(0.1), pytest.approx(0.2)]
def test_route_transient_error_exhausted_propagates_last_attempt(
tmp_path: Path,
) -> None:
# Arrange
harness = _build_harness(tmp_path)
final_exc = RouteTransientError("503 final attempt")
harness.route_client.seed_route.side_effect = [
RouteTransientError("503 a"),
RouteTransientError("503 b"),
final_exc,
]
# Act + Assert
with pytest.raises(RouteTransientError) as exc_info:
_drive(
harness,
retry_schedule_s=(0.1, 0.2),
max_retry_attempts=3,
)
assert exc_info.value is final_exc
assert harness.route_client.seed_route.call_count == 3
assert harness.sleep_calls == [pytest.approx(0.1), pytest.approx(0.2)]
assert harness.tile_downloader.download_tiles_for_area.call_count == 0
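# Illustrative sketch only: a retry loop with the behaviour the two AC-5 tests
# above observe (sleep between attempts, none after the last, and the final
# exception re-raised as the same object). ``TransientError`` and
# ``call_with_retry`` are hypothetical names; the real driver's internals are
# not shown in this diff.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical stand-in for ``RouteTransientError``."""


def call_with_retry(
    fn: Callable[[], T],
    *,
    retry_schedule_s: Sequence[float],
    max_retry_attempts: int,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call ``fn`` up to ``max_retry_attempts`` times, backing off in between."""
    if max_retry_attempts < 1:
        raise ValueError("max_retry_attempts must be at least 1")
    if max_retry_attempts > 1 and not retry_schedule_s:
        raise ValueError("retry_schedule_s must not be empty when retrying")
    for attempt in range(1, max_retry_attempts + 1):
        try:
            return fn()
        except TransientError:
            if attempt == max_retry_attempts:
                raise  # bare raise keeps the original exception object
            # Reuse the last delay if the schedule is shorter than the budget.
            index = min(attempt - 1, len(retry_schedule_s) - 1)
            sleep(retry_schedule_s[index])
    raise AssertionError("unreachable")
```

# Injecting ``sleep`` (here defaulting to ``time.sleep``) is what lets a test
# record the delays in a list instead of waiting in real time.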
# ----------------------------------------------------------------------
# AC-6 — tamper between rebuild and verify
def test_descriptor_index_factory_index_unavailable_propagates(
tmp_path: Path,
) -> None:
# Arrange
harness = _build_harness(tmp_path)
# Simulate the rebuild writing sidecar files DURING populate_descriptors
# (the real C10 batcher does this via its DescriptorIndexRebuilder cut).
_stub_populate_descriptors_writes_sidecars(harness)
harness.descriptor_index_factory.side_effect = IndexUnavailableError(
"sidecar sha256 mismatch — index is corrupt"
)
# Act + Assert
with pytest.raises(IndexUnavailableError):
_drive(harness)
# ----------------------------------------------------------------------
# AC-7 — cleanup on failure
def test_cleanup_removes_partial_sidecar_files_on_failure(
tmp_path: Path,
) -> None:
# Arrange
harness = _build_harness(tmp_path)
# The driver MUST observe an absent-sidecar state on entry, then a
# rebuild that writes the trio, then a verifier that fails — only
# then is the cleanup contract exercised on a "we created these"
# set of paths.
assert not harness.faiss_index_path.exists()
_stub_populate_descriptors_writes_sidecars(harness)
harness.descriptor_index_factory.side_effect = IndexUnavailableError(
"tamper detected"
)
# Act
with pytest.raises(IndexUnavailableError):
_drive(harness)
# Assert
assert not harness.faiss_index_path.exists()
assert not harness.sha256_path.exists()
assert not harness.meta_path.exists()
def test_cleanup_preserves_pre_existing_warm_cache(tmp_path: Path) -> None:
# Arrange
harness = _build_harness(tmp_path)
# A warm cache existed before the driver ran (named-volume reuse path).
_write_dummy_sidecars(harness, marker="WARM_CACHE")
harness.route_client.seed_route.side_effect = RouteValidationError(
"noop fail post-warm-cache"
)
# Act
with pytest.raises(RouteValidationError):
_drive(harness)
# Assert — the pre-existing warm-cache files MUST stay on disk.
assert harness.faiss_index_path.read_text() == "WARM_CACHE"
assert harness.sha256_path.read_text() == "WARM_CACHE"
assert harness.meta_path.read_text() == "WARM_CACHE"
def test_batcher_failure_propagates_and_cleans_up(tmp_path: Path) -> None:
# Arrange
harness = _build_harness(tmp_path)
def _populate_writes_partial_sidecar_then_fails(
_filter: object,
) -> DescriptorBatchReport:
_write_dummy_sidecars(harness, marker="HALF_BUILT")
return DescriptorBatchReport(
descriptors_generated=0,
tiles_consumed=0,
oom_retries=0,
elapsed_s=0.5,
outcome=BatcherOutcome.FAILURE,
failure_reason="OOM at batch_size=64",
)
harness.descriptor_batcher.populate_descriptors.side_effect = (
_populate_writes_partial_sidecar_then_fails
)
# Act + Assert
with pytest.raises(RuntimeError) as exc_info:
_drive(harness)
assert "OOM at batch_size=64" in str(exc_info.value)
assert not harness.faiss_index_path.exists()
assert not harness.sha256_path.exists()
assert not harness.meta_path.exists()
def test_downloader_failure_propagates_and_cleans_up(tmp_path: Path) -> None:
# Arrange
harness = _build_harness(tmp_path)
harness.tile_downloader.download_tiles_for_area.return_value = (
DownloadBatchReport(
outcome=DownloadOutcome.FAILURE,
tiles_requested=12,
tiles_downloaded=0,
tiles_rejected_resolution=0,
tiles_rejected_freshness=0,
tiles_downgraded=0,
retry_count=2,
request_hash="abcdef0123456789",
)
)
# Act + Assert
with pytest.raises(RuntimeError) as exc_info:
_drive(harness)
assert "failure" in str(exc_info.value).lower()
assert harness.descriptor_batcher.populate_descriptors.call_count == 0
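# Illustrative sketch only: one way to get the AC-7 contract tested above,
# where files created during a failed run are removed and files that existed
# on entry are left alone. ``remove_created_on_failure`` is a hypothetical
# name; the real driver's cleanup code is not part of this diff.

```python
import contextlib
from pathlib import Path
from typing import Iterable, Iterator


@contextlib.contextmanager
def remove_created_on_failure(paths: Iterable[Path]) -> Iterator[None]:
    """On any exception, delete tracked paths that did not exist on entry."""
    tracked = tuple(paths)
    # Snapshot taken before the body runs: these are never deleted.
    pre_existing = {path for path in tracked if path.exists()}
    try:
        yield
    except BaseException:
        for path in tracked:
            if path not in pre_existing:
                path.unlink(missing_ok=True)  # tolerate never-created files
        raise  # the original failure still reaches the caller
```

# Note that this sketch keeps a pre-existing file even if the failed run
# overwrote its contents; restoring the old bytes would need a backup copy.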
# ----------------------------------------------------------------------
# Internal helpers
def _write_dummy_sidecars(
    harness: _DriverHarness,
    *,
    marker: str = "PARTIAL",
) -> None:
    """Create the three sidecar files at the harness's faiss path."""
    harness.faiss_index_path.write_text(marker)
    harness.sha256_path.write_text(marker)
    harness.meta_path.write_text(marker)
def _stub_populate_descriptors_writes_sidecars(
    harness: _DriverHarness,
    *,
    marker: str = "FRESH_REBUILD",
) -> None:
    """Make the stubbed batcher write the three sidecar files on success.

    The real C10 batcher writes the FAISS index + sha256 + meta.json
    via the AZ-306 :class:`FaissDescriptorIndex.rebuild_from_descriptors`
    path. The stub mirrors that side effect so the AC-7 cleanup path
    has files to roll back when a downstream verifier fails.
    """
    success_report = harness.descriptor_batcher.populate_descriptors.return_value

    def _populate(_filter: object) -> DescriptorBatchReport:
        _write_dummy_sidecars(harness, marker=marker)
        return success_report

    harness.descriptor_batcher.populate_descriptors.side_effect = _populate
@@ -0,0 +1,40 @@
"""AZ-839 AC-9 — integration test: fixture produces a real :class:`PopulatedC6Cache`.
Gated by ``RUN_REPLAY_E2E=1`` AND ``@pytest.mark.tier2`` per the
AZ-839 task spec. The work the test asserts is the fixture's
contract; the fixture wiring itself lives in
``tests/e2e/replay/conftest.py::operator_pre_flight_setup`` and the
algorithmic correctness is covered by
``test_operator_pre_flight_driver.py`` against stubs (AC-8).
This test exists so AC-9 has a concrete pytest entry point. Other
end-to-end consumers (AZ-840 e2e orchestrator test; AZ-841 un-xfail
of the AZ-777 Tier-2 tests) chain off the same fixture.
"""
from __future__ import annotations
import pytest
from tests.e2e.replay._operator_pre_flight import PopulatedC6Cache
@pytest.mark.tier2
def test_operator_pre_flight_setup_produces_populated_cache(
    operator_pre_flight_setup: PopulatedC6Cache,
) -> None:
    # Arrange
    populated = operator_pre_flight_setup
    # Assert
    assert isinstance(populated, PopulatedC6Cache)
    assert populated.cache_root.is_dir()
    assert populated.tile_store_path.is_dir()
    assert populated.faiss_index_path.is_file()
    assert populated.faiss_sidecar_sha256_path.is_file()
    assert populated.faiss_sidecar_meta_path.is_file()
    assert populated.tile_count > 0
    assert populated.elapsed_seconds >= 0.0
    assert populated.route_spec.waypoints, (
        "RouteSpec must carry at least one waypoint extracted from the tlog"
    )