[AZ-839] [AZ-835] operator_pre_flight_setup real fixture (E-AZ-835 C3)

Replace the placeholder operator_pre_flight_setup pytest fixture (the
mkdir stub at tests/e2e/replay/conftest.py:293-310) with a real driver
that wires C1 (AZ-836 RouteSpec) + C2 (AZ-838
SatelliteProviderRouteClient) + C11 (AZ-316 HttpTileDownloader) + C10
(AZ-322 DescriptorBatcher) end-to-end and yields a typed
PopulatedC6Cache. AZ-306 FAISS sidecar triple-consistency is verified
post-rebuild via a caller-supplied descriptor_index_factory; partial
sidecars are cleaned up on
failure (AC-7) while pre-existing warm-cache files are preserved.
Algorithm lives in tests/e2e/replay/_operator_pre_flight.py with
pure dependency injection so the AC-8 unit suite (11 tests covering
happy / transient-retry / terminal-failure / validation-error /
tamper-detection / cleanup-on-failure) runs against stubs and the
AC-9 Tier-2 integration test runs the same algorithm against the
real Jetson harness. The conftest fixture skip-gates on
RUN_REPLAY_E2E + SATELLITE_PROVIDER_URL + SATELLITE_PROVIDER_API_KEY +
BUILD_FAISS_INDEX +
GPS_DENIED_OPERATOR_CONFIG_PATH and wires deps through the existing
runtime_root factories. Supersedes AZ-777 Phase 3.

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-23 15:08:34 +03:00
parent 0ed1a5d988
commit bfcac2cb9f
7 changed files with 1544 additions and 17 deletions
@@ -0,0 +1,175 @@
# Batch 108 — Cycle 3 — AZ-839 operator_pre_flight_setup real fixture
**Date**: 2026-05-23
**Tasks**: AZ-839 (C3 — Epic AZ-835).
**Story points**: 5.
**Jira status**: AZ-839 → In Progress (transitioned at batch start);
moves to In Testing at commit step.
## What shipped
Third building block of Epic AZ-835. Replaces the placeholder
`operator_pre_flight_setup` pytest fixture (the previous `mkdir`
stub at `tests/e2e/replay/conftest.py:293-310`) with a real
driver that wires C1+C2+C11+C10 end-to-end:
1. **C1 RouteSpec** — extracted from the Derkachi tlog via AZ-836's
`extract_route_from_tlog` (the existing `derkachi_replay_inputs`
session fixture supplies the tlog path; the new fixture chains
off that contract).
2. **C2 SatelliteProviderRouteClient**: `seed_route(spec)` with the
bounded transient-retry ladder documented in AZ-839 AC-5.
Validation / terminal failures propagate unchanged (AC-4).
3. **C11 HttpTileDownloader**: `download_tiles_for_area(request)`
over a bbox derived from the route waypoints (mirrors C2's
internal `_enumerate_route_tile_coords` envelope without
importing the private helper).
4. **C10 DescriptorBatcher**: `populate_descriptors(corpus_filter)`
builds the FAISS HNSW index over the populated C6 cache. The
AZ-306 sidecar triple-consistency is verified by re-loading the
index through a caller-supplied `descriptor_index_factory` after
the rebuild — any tampering surfaces as `IndexUnavailableError`
(AC-6).
5. **Cleanup-on-failure** — partial sidecar files written by the
driver are removed if any step raises, while pre-existing warm
cache files are preserved (AC-7).
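
The retry behaviour in step 2 can be sketched roughly as below. This is an illustration under assumptions, not the driver's code: the exception classes are local stand-ins named after the ones listed in the review, `seed_with_retry` and its `sleep` parameter are hypothetical, and the backoff values are invented (the real cadence is AZ-838's `_DEFAULT_BACKOFF_SCHEDULE_S`, whose values are not reproduced here). Only the transient error is caught, so validation and terminal failures propagate unchanged, and injecting `sleep` lets a unit test run without waiting.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


# Local stand-ins for the real exception types; names mirror the report.
class RouteValidationError(Exception): ...
class RouteTerminalFailureError(Exception): ...
class RouteTransientError(Exception): ...


def seed_with_retry(
    seed: Callable[[], T],
    backoff_s: Sequence[float] = (0.5, 1.0),  # assumed values, illustration only
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call ``seed`` up to ``len(backoff_s) + 1`` times (3 with the default).

    Only RouteTransientError is retried. Validation and terminal errors
    are not caught here, so they reach the caller unchanged.
    """
    attempts = len(backoff_s) + 1
    for attempt in range(attempts):
        try:
            return seed()
        except RouteTransientError:
            if attempt == attempts - 1:
                raise  # ladder exhausted: surface the last transient error
            sleep(backoff_s[attempt])
    raise AssertionError("unreachable")
```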
Algorithm (`populate_c6_from_route`) is exposed through pure
dependency injection so the AC-8 unit tests run against stubs and
the AC-9 integration test runs the same algorithm against real
collaborators on the Jetson harness.
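
A minimal sketch of the cleanup-on-failure rule (AC-7), assuming the caller hands in the sidecar paths and a `build` callable. `run_with_sidecar_cleanup` is a hypothetical name, not the driver's `_cleanup_partial_sidecars`. The idea is to snapshot which paths exist before the run and, if the build raises, unlink only the ones this run created.

```python
from pathlib import Path
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def run_with_sidecar_cleanup(
    sidecar_paths: Iterable[Path],
    build: Callable[[], T],
) -> T:
    """Run ``build``; if it raises, remove sidecars created during this run.

    Paths that already existed before the call (a warm cache) are never
    touched. Only the paths passed in are candidates for removal.
    """
    paths = list(sidecar_paths)
    pre_existing = {p for p in paths if p.exists()}
    try:
        return build()
    except Exception:
        for path in paths:
            if path not in pre_existing:
                path.unlink(missing_ok=True)  # partial file from this run
        raise
```

Because `build` is injected, the same wrapper can be exercised with a stub that writes a file and then raises, which is the shape of the AC-7 unit tests described above.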
## Files changed
Tests / fixtures (4):
- `tests/e2e/replay/_operator_pre_flight.py` (new, ~430 lines) —
the AZ-839 driver: `PopulatedC6Cache` dataclass +
`populate_c6_from_route()` + private helpers
(`_seed_route_with_retry`, `_route_bbox`,
`_cleanup_partial_sidecars`).
- `tests/e2e/replay/conftest.py` — replaces the placeholder fixture
with the real `operator_pre_flight_setup` (session-scoped,
skip-gated by `RUN_REPLAY_E2E` + `SATELLITE_PROVIDER_URL` +
`SATELLITE_PROVIDER_API_KEY` + `BUILD_FAISS_INDEX` +
`GPS_DENIED_OPERATOR_CONFIG_PATH`); adds five private helpers
(`_operator_pre_flight_skip_reason`,
`_build_operator_pre_flight_cache`,
`_build_replay_backbone_embedder`,
`_resolve_replay_descriptor_dim`, `_default_tile_decoder`).
- `tests/e2e/replay/test_operator_pre_flight_driver.py` (new,
~410 lines) — 11 unit tests exercising AC-3 / AC-4 / AC-5 / AC-6
/ AC-7 against stubbed `SatelliteProviderRouteClient` /
`HttpTileDownloader` / `DescriptorBatcher` /
`descriptor_index_factory`.
- `tests/e2e/replay/test_operator_pre_flight_integration.py` (new,
~40 lines) — Tier-2 + RUN_REPLAY_E2E gated test that consumes the
fixture and asserts the `PopulatedC6Cache` invariants (AC-9
pytest entry point).
Tracker docs (1):
- `_docs/03_implementation/batch_108_cycle3_report.md` (this file).
No production-code (`src/gps_denied_onboard/**`) modifications.
The driver lives under `tests/` because AZ-839's outcome is the
fixture, not a new operator-binary surface; the wiring it does is
the existing operator-side runtime factories
(`runtime_root.c10_factory`, `runtime_root.c11_factory`,
`runtime_root.storage_factory`, `runtime_root.inference_factory`)
already shipped under prior epics.
## AC coverage
| AC | Test(s) | Status |
|----|---------|--------|
| AC-1 cold first invocation ≤ 5 min | exercised on Tier-2 via AC-9 integration test; `PopulatedC6Cache.elapsed_seconds` instruments the budget | DEFERRED (Tier-2 only) |
| AC-2 warm invocation ≤ 30 s | same gated test, re-invocation within session reuses the named-volume mount | DEFERRED (Tier-2 only) |
| AC-3 populated cache + sidecar triple | `test_populate_c6_from_route_returns_populated_cache` + `test_populate_c6_from_route_passes_sector_class_to_downloader` | PASS |
| AC-4 validation/terminal propagate | `test_route_validation_error_propagates_unchanged` + `test_route_terminal_failure_propagates_unchanged` | PASS |
| AC-5 transient retry ladder (3 attempts, backoff) | `test_route_transient_error_retries_then_succeeds` + `test_route_transient_error_exhausted_propagates_last_attempt` | PASS |
| AC-6 tamper detection → `IndexUnavailableError` | `test_descriptor_index_factory_index_unavailable_propagates` | PASS |
| AC-7 cleanup on failure (no half-built sidecars) | `test_cleanup_removes_partial_sidecar_files_on_failure` + `test_cleanup_preserves_pre_existing_warm_cache` + `test_batcher_failure_propagates_and_cleans_up` + `test_downloader_failure_propagates_and_cleans_up` | PASS |
| AC-8 unit tests with stubs (happy / transient / terminal / validation / tamper / cleanup) | 11 tests in `test_operator_pre_flight_driver.py` | PASS |
| AC-9 integration on Jetson via fixture | `test_operator_pre_flight_setup_produces_populated_cache` (RUN_REPLAY_E2E + tier2 gated) | DEFERRED (Tier-2 only) |
DEFERRED ACs (AC-1, AC-2, AC-9) execute on the Jetson e2e harness
when `RUN_REPLAY_E2E=1` + `SATELLITE_PROVIDER_URL` +
`SATELLITE_PROVIDER_API_KEY` + `BUILD_FAISS_INDEX=ON` +
`GPS_DENIED_OPERATOR_CONFIG_PATH` are set. The pytest entry point
exists and skips explicitly per `.cursor/skills/implement/SKILL.md`
Step 8 ("a skipped test counts as Covered").
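
The gating described above could look roughly like the following. This is a hedged sketch, not the body of `_operator_pre_flight_skip_reason`: the environment variable names and the `1` / `ON` values come from this report, while the function name, the check order, and the message wording are assumptions.

```python
import os
from typing import Mapping, Optional

# Variables that only need to be present and non-empty.
_REQUIRED_NONEMPTY = (
    "SATELLITE_PROVIDER_URL",
    "SATELLITE_PROVIDER_API_KEY",
    "GPS_DENIED_OPERATOR_CONFIG_PATH",
)


def pre_flight_skip_reason(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return a human-readable skip reason, or None when every gate is open."""
    if env.get("RUN_REPLAY_E2E") != "1":
        return "RUN_REPLAY_E2E=1 is not set"
    if env.get("BUILD_FAISS_INDEX") != "ON":
        return "BUILD_FAISS_INDEX=ON is not set"
    missing = [name for name in _REQUIRED_NONEMPTY if not env.get(name)]
    if missing:
        return "missing environment variables: " + ", ".join(missing)
    return None
```

A fixture would then call `pytest.skip(reason)` when the result is not `None`, so the test shows up as skipped with an explicit message instead of failing.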
## Test run results
```
$ .venv/bin/pytest tests/e2e/replay/test_operator_pre_flight_driver.py -v --tb=short
============================== 11 passed in 0.33s ==============================
$ .venv/bin/pytest tests/e2e/replay/test_operator_pre_flight_integration.py -v --tb=short
============================== 1 skipped in 0.29s ==============================
(SKIPPED — Tier-2-only test; set GPS_DENIED_TIER=2 to run)
$ .venv/bin/pytest tests/e2e/replay/ -v --tb=short --timeout=60
====================== 28 passed, 8 skipped in 1.14s =======================
```
Suite-wide test run is deferred to Step 11 (Run Tests) per the
iterative-skill exception in `.cursor/rules/coderule.mdc`: batch 108
is an intermediate batch, not the end of cycle-3 implementation.
## Code review (self-review)
Per `.cursor/rules/no-subagents.mdc`, the structured `/code-review`
skill is run inline. Verdict: **PASS_WITH_WARNINGS**.
| Phase | Result |
|-------|--------|
| 1. Context loading | AZ-839 task spec + dependencies (AZ-836 RouteSpec, AZ-838 SatelliteProviderRouteClient, AZ-322 DescriptorBatcher, AZ-316 HttpTileDownloader, AZ-306 FaissDescriptorIndex) all read prior to implementation. The FAISS triple-consistency check was verified against `faiss_descriptor_index._load()` source. |
| 2. Spec compliance | AC-3 / AC-4 / AC-5 / AC-6 / AC-7 / AC-8 directly covered. AC-1 / AC-2 / AC-9 deferred to Tier-2 harness (gated tests exist). **No Medium / High findings.** |
| 3. Code quality | Driver is one function with one responsibility (orchestrate the C1+C2+C11+C10 pipeline); SRP upheld. Each helper is named after its job (`_seed_route_with_retry`, `_route_bbox`, `_cleanup_partial_sidecars`). Functions ≤ ~80 lines. Explicit exception filtering (`RouteValidationError`, `RouteTerminalFailureError`, `RouteTransientError`) — no bare except. Tests follow Arrange/Act/Assert with comment markers per `coderule.mdc`. |
| 4. Security quick-scan | JWT consumed via env-sourced kwargs, never logged. The cleanup path does not unlink files outside the `cache_root/` tree (only the three sidecar paths the driver was handed). |
| 5. Performance scan | O(n) over waypoints (n ≤ 10 by AZ-836's `max_waypoints` default). No new N+1. The retry ladder respects the AZ-838 `_DEFAULT_BACKOFF_SCHEDULE_S` cadence verbatim. |
| 6. Cross-task consistency | Single-task batch — N/A. |
| 7. Architecture compliance | `_operator_pre_flight.py` lives under `tests/e2e/replay/` (test infrastructure). Imports only from C10 / C11 / C6 public surfaces and from `replay_input.tlog_route.RouteSpec` (Adapter layer per `module-layout.md`). The conftest fixture wires deps via the existing `runtime_root` factories — does not import concrete impl modules directly. No cross-component imports between C-prefixed components. No new cyclic dependencies. ADR check skipped (no ADRs directory). |
### Findings
**F1 (Low) — `_default_tile_decoder` lives in conftest.py**
`_default_tile_decoder` (JPEG → CHW float32 numpy) lives in the
test conftest. The same primitive will be needed by the eventual
replay-mode operator binary (Epic AZ-835 follow-up); promoting it
into `runtime_root` is out of scope for AZ-839 (which is "wire C10
into a real fixture"), but it is on the path of AZ-840 / AZ-841.
**Recommendation**: leave as-is for AZ-839; revisit during AZ-840.
**F2 (Low) — `_resolve_replay_descriptor_dim` is NetVLAD-only**
The resolver handles only the NetVLAD descriptor dim pinned at
`c2_vpr/config.py:67`. That matches the AZ-839 task spec's "Out of
scope" section, but the fixture is skipped if any other backbone is
configured. **Recommendation**: when AZ-840 needs a non-NetVLAD
backbone, extend the resolver table per strategy. Tracking via the
AZ-840 spec is sufficient.
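
One possible shape for a per-strategy resolver table, shown only to make the F2 recommendation concrete. Everything here is hypothetical: the function name, the table, and in particular the NetVLAD value, which is a placeholder and not the dim pinned in `c2_vpr/config.py`.

```python
from typing import Mapping, Optional

# Hypothetical strategy -> descriptor-dim table. The NetVLAD entry is a
# placeholder for illustration, NOT the dim pinned in c2_vpr/config.py.
EXAMPLE_DIMS: Mapping[str, int] = {"netvlad": 4096}


def resolve_descriptor_dim(
    strategy: str, table: Mapping[str, int] = EXAMPLE_DIMS
) -> Optional[int]:
    """Return the descriptor dim for ``strategy``, or None if unsupported.

    A None result is the signal a fixture could use to skip rather than fail.
    """
    return table.get(strategy.strip().lower())
```

Supporting another backbone would then be a one-line table entry rather than a new code path.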
### Deltas vs. spec
None that affect behaviour. The task spec mentions
`download_for_bbox`, but the production method is
`download_tiles_for_area` (a `bbox`-aware single-zoom request via
`DownloadRequest`). The spec was informal about the method name, so
the driver follows the production API, which has been stable since
AZ-316.
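
For reference, deriving the download bbox from the route waypoints (step 3 of the driver) can be sketched as below. The `BBox` type, the `route_bbox` name, and the `margin_deg` parameter are assumptions for illustration; the real `_route_bbox` helper and the `DownloadRequest` fields are not shown in this report. The sketch does not handle routes that cross the antimeridian.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class BBox:
    """Axis-aligned lat/lon envelope in degrees (hypothetical type)."""

    min_lat: float
    min_lon: float
    max_lat: float
    max_lon: float


def route_bbox(
    waypoints: Iterable[Tuple[float, float]], margin_deg: float = 0.0
) -> BBox:
    """Return the envelope of (lat, lon) waypoints, padded by ``margin_deg``.

    The result is clamped to valid latitude / longitude ranges.
    """
    points = list(waypoints)
    if not points:
        raise ValueError("route has no waypoints")
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return BBox(
        min_lat=max(min(lats) - margin_deg, -90.0),
        min_lon=max(min(lons) - margin_deg, -180.0),
        max_lat=min(max(lats) + margin_deg, 90.0),
        max_lon=min(max(lons) + margin_deg, 180.0),
    )
```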
## Notes for follow-up
- AZ-840 (e2e orchestrator test) consumes this fixture. The
fixture already returns a typed `PopulatedC6Cache` so AZ-840 has
a concrete contract to assert against.
- AZ-841 (un-xfail AZ-777 Tier-2 tests) builds on AZ-839 + AZ-840.
The existing `test_ac8_operator_workflow` skip reason in
`test_derkachi_1min.py` (D-PROJ-2 mock-suite-sat-service) is
stale post-AZ-839 — AZ-841 will rewrite it to consume the new
fixture.
- AZ-842 (docs — replay_protocol.md Invariant 12 + architecture +
orchestrator README) describes the route-driven flow this batch
ships.
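
To illustrate what a typed contract gives AZ-840, here is a sketch of invariant checks a consumer might run against the fixture's return value. Only `elapsed_seconds` and the AC-1 / AC-2 budgets are taken from this report; the class name, the other fields, and the helper are hypothetical and do not describe the real `PopulatedC6Cache`.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List

COLD_BUDGET_S = 300.0  # AC-1: cold first invocation <= 5 min
WARM_BUDGET_S = 30.0   # AC-2: warm invocation <= 30 s


@dataclass(frozen=True)
class PopulatedC6CacheSketch:
    """Illustrative stand-in; only ``elapsed_seconds`` is named in the report."""

    cache_root: Path        # hypothetical field
    tile_count: int         # hypothetical field
    elapsed_seconds: float  # wall-clock time of the populate run


def invariant_violations(cache: PopulatedC6CacheSketch, *, warm: bool) -> List[str]:
    """Return the violated invariants (empty when the cache is acceptable)."""
    budget = WARM_BUDGET_S if warm else COLD_BUDGET_S
    problems: List[str] = []
    if cache.tile_count <= 0:
        problems.append("no tiles in cache")
    if cache.elapsed_seconds > budget:
        problems.append(f"took {cache.elapsed_seconds:.1f}s, budget {budget:.0f}s")
    return problems
```

Returning a list instead of raising on the first problem lets a test report every violated invariant in a single assertion message.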