[AZ-597] Batch 77: replay_mode helpers + 13 scenario stub rewires

Add `runner/helpers/replay_mode.py` (NullFrameSink, NullFcInboundEmitter, default_frame_period_ms, load_replay_json, resolve_replay_subdir, imu_replay_noop) and rewire all 13 scenarios off their local `_resolve_*` / `_drive_*` / `_push_*` NotImplementedError stubs. This closes the offline FDR-replay execution path: a recursive grep for `raise NotImplementedError` under `e2e/tests/` now returns zero matches. Adds 17 replay_mode unit tests plus 1 directory-layout parametrize entry (626 total, up from 608). Unit-test behaviour is unchanged: scenarios still skip via the b75 sitl_replay_ready gate when E2E_SITL_REPLAY_DIR is unset.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-22 17:11:14 +00:00 · 2026-05-17 09:52:05 +03:00
parent 6554d568f1
commit f49d803252
22 changed files with 798 additions and 85 deletions
@@ -0,0 +1,102 @@
+# Scenario stub cleanup (replay_mode helpers + 13 rewires)
+
+**Task**: AZ-597_scenario_stub_cleanup
+**Name**: Add `runner/helpers/replay_mode.py` + rewire 13 scenarios off local `_resolve_*` / `_drive_*` / `_push_*` stubs
+**Description**: After AZ-594/595/596 landed the three core harness helpers, sitl_observer, and the fc_proxy_runtime driver, the last unimplemented layer is a grab-bag of local per-scenario stubs that all share the same FDR-replay no-op pattern. Bundle them into one shared `runner/helpers/replay_mode.py` module so the offline FDR-replay path is end-to-end executable once the SITL replay fixture builder lands.
+**Complexity**: 3 points
+**Dependencies**: AZ-594, AZ-595, AZ-596
+**Component**: Blackbox Tests / Test Infrastructure (epic AZ-262)
+**Tracker**: AZ-597
+**Epic**: AZ-262 (E-BBT)
+
+## Problem
+
+Despite the AZ-594/595/596 arc, 13 scenarios still carry local
+`_resolve_*` / `_drive_*` / `_push_*` `NotImplementedError` stubs:
+
+| Stub | Scenarios |
+|------|-----------|
+| `_resolve_frame_sink()` | 14 (FT-P-01/02/04/05/07/08/09-AP/09-iNav/10/11, FT-N-01/02/03/04) |
+| `_resolve_fc_inbound_emitter(fc_adapter[, host])` | 3 (FT-P-02/04/10) |
+| `_drive_imu_replay(csv_path)` | 2 (FT-P-07, FT-N-02) |
+| `_resolve_frame_period_ms()` | 2 (FT-N-03/04) |
+| `_resolve_outage_injection_frames()` | 1 (FT-N-03) |
+| `_resolve_gt_per_frame(report)` | 1 (FT-N-01) |
+| `_push_single_image_and_observe(...)` | 1 (FT-P-03/14) |
+
+These are unreachable today (the b75 `sitl_replay_ready` gate skips
+before they're called) so this cleanup can land safely under the
+unit-test regression gate. The value: once the SITL replay fixture
+builder ships, scenarios become runnable with no further per-scenario
+edits.
+
+## Surfaces (`runner/helpers/replay_mode.py`)
+
+* `NullFrameSink` — implements `FrameSink` protocol. `write_frame`
+  is a counter; exposes `frames_written: int`.
+* `NullFcInboundEmitter` — implements `FcInboundEmitter` protocol.
+  `emit` is a counter; exposes `samples_emitted: int`.
+* `DEFAULT_FRAME_PERIOD_MS = 33` + `default_frame_period_ms() -> int`.
+* `load_replay_json(filename: str) -> dict | list` — reads
+  `${E2E_SITL_REPLAY_DIR}/<filename>`. Raises `FileNotFoundError`
+  when env var unset OR file missing; `ValueError` with file pointer
+  on malformed JSON.
+* `resolve_replay_subdir(name: str) -> Path` — returns
+  `${E2E_SITL_REPLAY_DIR}/<name>/`. Raises `FileNotFoundError` when
+  env var unset OR directory missing.
+* `imu_replay_noop(csv_path: Path) -> None` — no-op stand-in for the
+  per-scenario `_drive_imu_replay` (IMU is pre-baked into the FDR
+  archive in replay mode; the CSV path is preserved as a parameter
+  for diagnostic logging only).
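
A minimal sketch of the counter surfaces and the no-op listed above. Only the class names, the counter attributes (`frames_written`, `samples_emitted`), and the 33 ms constant come from the spec. The argument lists of `write_frame` and `emit` are not shown in this commit, so the sketch accepts any arguments as a stated assumption.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any

DEFAULT_FRAME_PERIOD_MS = 33  # roughly 30 fps, per the spec


def default_frame_period_ms() -> int:
    """Return the default frame period used in replay mode."""
    return DEFAULT_FRAME_PERIOD_MS


@dataclass
class NullFrameSink:
    """Counter-only frame sink: discards the frame, counts the call."""

    frames_written: int = 0

    def write_frame(self, *args: Any, **kwargs: Any) -> None:
        # Assumed signature: the real FrameSink protocol may be narrower.
        self.frames_written += 1


@dataclass
class NullFcInboundEmitter:
    """Counter-only emitter: discards the sample, counts the call."""

    samples_emitted: int = 0

    def emit(self, *args: Any, **kwargs: Any) -> None:
        # Assumed signature: the real FcInboundEmitter protocol may differ.
        self.samples_emitted += 1


def imu_replay_noop(csv_path: Path) -> None:
    """No-op: IMU samples are pre-baked into the FDR archive in replay mode."""
    return None
```

Keeping the counters as plain public attributes lets a scenario assert on how many frames or samples were pushed without any I/O.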
+
+## Per-scenario rewire pattern
+
+```python
+# Before:
+def _resolve_frame_sink():
+    raise NotImplementedError(...)
+
+# After:
+from runner.helpers.replay_mode import NullFrameSink
+def _resolve_frame_sink():
+    return NullFrameSink()
+```
+
+The other six helpers follow the same shape. The two stubs that need
+scenario-specific JSON (`_resolve_gt_per_frame` and
+`_push_single_image_and_observe`) call
+`load_replay_json("gt_per_frame.json")` and
+`load_replay_json("single_image_observation.json")` respectively, then
+project the result into their scenario-local dataclass or tuple.
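
The projection step for `gt_per_frame.json` might look roughly like the sketch below. The key names `frame_idx`, `lat_deg`, and `lon_deg` appear in the batch review; the `GtPoseRow` dataclass, the `project_gt_rows` helper, and the assumption that the file holds a JSON list of objects are hypothetical and only illustrate the pattern.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class GtPoseRow:
    """Hypothetical scenario-local record for one ground-truth frame."""

    frame_idx: int
    lat_deg: float
    lon_deg: float


def project_gt_rows(payload: object) -> list[GtPoseRow]:
    """Project an already-parsed JSON payload into typed rows.

    Assumes the payload is a list of objects; the real fixture layout
    is not shown in this commit.
    """
    if not isinstance(payload, list):
        raise ValueError("gt_per_frame payload must be a JSON list")
    rows: list[GtPoseRow] = []
    for index, entry in enumerate(payload):
        try:
            rows.append(
                GtPoseRow(
                    frame_idx=int(entry["frame_idx"]),
                    lat_deg=float(entry["lat_deg"]),
                    lon_deg=float(entry["lon_deg"]),
                )
            )
        except (KeyError, TypeError, ValueError) as exc:
            # Field validation happens at the call site, not in the loader.
            raise ValueError(f"gt_per_frame entry {index} is malformed: {entry!r}") from exc
    return rows
```

In a rewired stub, the `payload` argument would be the return value of `load_replay_json("gt_per_frame.json")`.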
+
+## Acceptance Criteria
+
+**AC-1**: `NullFrameSink.write_frame` and `NullFcInboundEmitter.emit`
+are pure counters.
+
+**AC-2**: `load_replay_json` raises `FileNotFoundError` (env unset or
+file missing) and `ValueError` (malformed JSON with file pointer).
+
+**AC-3**: `resolve_replay_subdir` raises `FileNotFoundError` (env
+unset or subdir missing).
+
+**AC-4**: `default_frame_period_ms()` returns 33.
+
+**AC-5**: All 13 scenarios have local `_resolve_*` / `_drive_*` /
+`_push_*` stubs deleted and import from `runner.helpers.replay_mode`.
+
+**AC-6**: ≥6 unit tests on `replay_mode.py`.
+
+**AC-7**: Full e2e unit-test suite passes (regression gate).
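
The error contract in AC-2 and AC-3 can be sketched as follows. The function names and the `E2E_SITL_REPLAY_DIR` variable come from this commit; the exact message wording, the UTF-8 encoding choice, and the treatment of whitespace-only values as unset are assumptions based on the batch review, not a copy of the shipped module.

```python
from __future__ import annotations

import json
import os
from pathlib import Path

ENV_VAR = "E2E_SITL_REPLAY_DIR"


def _resolve_replay_root_or_raise(reason: str) -> Path:
    """Resolve the replay root lazily; unset, empty, or blank counts as absent."""
    raw = os.environ.get(ENV_VAR, "").strip()
    if not raw:
        raise FileNotFoundError(f"{reason}: ${{{ENV_VAR}}} not set")
    return Path(raw)


def load_replay_json(filename: str) -> dict | list:
    """Read and parse <replay root>/<filename>."""
    root = _resolve_replay_root_or_raise(f"load_replay_json({filename!r})")
    path = root / filename
    if not path.is_file():
        raise FileNotFoundError(f"replay fixture {filename!r} not found at {path}")
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        # Keep the original error chained and point at the offending file.
        raise ValueError(f"malformed replay JSON at {path}: {exc}") from exc


def resolve_replay_subdir(name: str) -> Path:
    """Return <replay root>/<name>, which must be an existing directory."""
    root = _resolve_replay_root_or_raise(f"resolve_replay_subdir({name!r})")
    path = root / name
    if not path.is_dir():
        raise FileNotFoundError(f"replay subdir {name!r} not found at {path}")
    return path
```

Routing both loaders through one root resolver keeps the env-var semantics in a single place, and passing the caller's name as `reason` makes the failure message identify which loader fired.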
+
+## Out of Scope
+
+* The actual SITL replay fixture builder.
+* Live MAVLink router / pymavlink plumbing.
+
+## Files Touched
+
+* `e2e/runner/helpers/replay_mode.py` (new)
+* `e2e/_unit_tests/helpers/test_replay_mode.py` (new)
+* `e2e/_unit_tests/test_directory_layout.py` (register new module)
+* 13 scenario files under `e2e/tests/{positive,negative}/`
@@ -0,0 +1,96 @@
+# Batch 77 Report — replay_mode helpers + 13 scenario stub rewires (cycle 1, batch 11 of test phase)
+
+**Batch**: 77
+**Date**: 2026-05-17
+**Context**: Test implementation (greenfield Step 10 — Implement Tests)
+**Tasks**: AZ-597 (3 cp) — 1 task (scenario stub cleanup bundle)
+**Cycle**: 1
+**Verdict**: COMPLETE — PASS (self-reviewed; see `reviews/batch_77_review.md`)
+
+## Summary
+
+Closes the offline FDR-replay path that AZ-594 (b74), AZ-595 (b75),
+and AZ-596 (b76) opened. After those three batches, the only remaining
+`NotImplementedError` stubs in the scenario suite were a grab-bag of
+local `_resolve_*` / `_drive_*` / `_push_*` helpers duplicated across
+13 scenario files. They all reduced to the same FDR-replay pattern —
+either a no-op counter (frame sink, FC inbound emitter, IMU replay
+driver) or a JSON read from `${E2E_SITL_REPLAY_DIR}/` (per-frame GT,
+single-image observation, outage frames subdir).
+
+This batch bundles those into one shared `runner/helpers/replay_mode.py`
+module + rewires the 13 scenarios off their local stubs. After the
+batch:
+
+* A recursive grep for `raise NotImplementedError` under `e2e/tests/`
+  returns **zero** matches.
+* Once the SITL replay fixture builder lands (separate ticket), every
+  scenario becomes runnable end-to-end with no further per-scenario
+  edits.
+* Unit-test mode is unchanged — the b75 `sitl_replay_ready` skip
+  gate keeps the loaders unreached when `E2E_SITL_REPLAY_DIR` is unset.
+
+### AZ-597 — replay_mode helpers + 13 scenario rewires (3 cp)
+
+* **`runner/helpers/replay_mode.py`** (new):
+  * `NullFrameSink` — counter-only `FrameSink` (`frames_written: int`).
+  * `NullFcInboundEmitter` — counter-only `FcInboundEmitter`
+    (`samples_emitted: int`).
+  * `default_frame_period_ms() -> int` + `DEFAULT_FRAME_PERIOD_MS = 33`
+    (30 fps).
+  * `load_replay_json(filename)` — generic JSON loader. Raises
+    `FileNotFoundError` (env-unset / file-missing) or `ValueError`
+    (malformed, file pointer included).
+  * `resolve_replay_subdir(name)` — directory loader. Raises
+    `FileNotFoundError` (env-unset / subdir-missing).
+  * `imu_replay_noop(csv_path)` — explicit no-op; signature mirrors
+    `imu_replay.ImuReplayer.replay` for future live-mode parity.
+  * Single shared `_resolve_replay_root_or_raise(reason)` enforces
+    the `E2E_SITL_REPLAY_DIR` semantics exactly once.
+* **13 scenarios rewired** (all `_resolve_*` / `_drive_*` / `_push_*`
+  stubs deleted):
+  * `_resolve_frame_sink` → `NullFrameSink()` in: FT-P-01, FT-P-02,
+    FT-P-04, FT-P-05, FT-P-07, FT-P-08, FT-P-09-AP, FT-P-09-iNav,
+    FT-P-10, FT-P-11, FT-N-01, FT-N-02, FT-N-03, FT-N-04.
+  * `_resolve_fc_inbound_emitter` → `NullFcInboundEmitter()` in:
+    FT-P-02, FT-P-04, FT-P-10.
+  * `_drive_imu_replay` → `imu_replay_noop(...)` in: FT-P-07, FT-N-02.
+  * `_resolve_frame_period_ms` → `default_frame_period_ms()` in:
+    FT-N-03, FT-N-04.
+  * `_resolve_outage_injection_frames` → `resolve_replay_subdir("outage_frames")`
+    in: FT-N-03.
+  * `_resolve_gt_per_frame` → `load_replay_json("gt_per_frame.json")`
+    + dataclass projection in: FT-N-01.
+  * `_push_single_image_and_observe` → `load_replay_json("single_image_observation.json")`
+    + tuple projection in: FT-P-03/14.
+* **`e2e/_unit_tests/test_directory_layout.py`** — registers the new
+  `runner/helpers/replay_mode.py` path.
+
+## Out of scope (deferred)
+
+* The actual SITL replay fixture builder (separate ticket — will
+  populate `${E2E_SITL_REPLAY_DIR}/` with `gps_state.json`,
+  `gt_per_frame.json`, `single_image_observation.json`,
+  `outage_frames/`, `ekf_divergence_events.json`, etc.).
+* Live MAVLink router / pymavlink plumbing (separate live-mode
+  infrastructure ticket).
+
+## Test Results
+
+* New unit tests: **17** (2 null-sink, 2 null-emitter, 1
+  frame-period, 2 imu-replay-noop, 6 load_replay_json, 4
+  resolve_replay_subdir).
+* Full `e2e/_unit_tests` suite: **626 passed in 127 s** (up from the
+  previous cumulative 608: +18 net, made up of 17 new replay_mode
+  tests and 1 new directory-layout parametrize entry).
+* No new linter errors.
+* A recursive grep for `raise NotImplementedError` under `e2e/tests/`
+  returns **zero** matches.
+
+## State
+
+* Spec moved: `_docs/02_tasks/todo/AZ-597_scenario_stub_cleanup.md`
+  → `_docs/02_tasks/done/`.
+* `_docs/_autodev_state.md` advanced to `last_completed_batch: 77`.
+* `last_cumulative_review` remains `batches_73-75`; next K=3
+  cumulative review fires at the end of batch 78.
@@ -0,0 +1,152 @@
+# Code Review Report
+
+**Batch**: 77 — AZ-597 (replay_mode helpers + 13 scenario stub rewires)
+**Date**: 2026-05-17
+**Verdict**: PASS
+
+## Findings
+
+(none)
+
+## Findings Sweep
+
+### Phase 1 — Context Loading
+
+Read the AZ-597 task spec. Checked the `FrameSink` /
+`FcInboundEmitter` Protocol definitions in `frame_source_replay.py`
+and `imu_replay.py` (b74) to verify that the new `Null*`
+implementations match the exact method signatures. Also checked the
+`outlier_tolerance_evaluator.GtPose` dataclass shape consumed by
+FT-N-01 and the return tuple of FT-P-03/14's
+`_push_single_image_and_observe`. Re-read the b75
+`sitl_observer.replay_dir` env-var resolution pattern for symmetry
+(`E2E_SITL_REPLAY_DIR`, empty-string-as-None semantics).
+
+### Phase 2 — Spec Compliance
+
+| AC | Coverage | Status |
+|----|----------|--------|
+| AC-1 (`NullFrameSink.write_frame` / `NullFcInboundEmitter.emit` are pure counters) | `test_null_frame_sink_counts_writes`, `test_null_frame_sink_starts_at_zero`, `test_null_emitter_counts_emits`, `test_null_emitter_starts_at_zero` | Covered |
+| AC-2 (`load_replay_json` raises `FileNotFoundError` env-unset + file-missing; `ValueError` malformed; round-trips otherwise) | `test_load_replay_json_raises_when_env_unset`, `test_load_replay_json_raises_when_env_empty`, `test_load_replay_json_raises_when_file_missing`, `test_load_replay_json_raises_on_malformed_json`, `test_load_replay_json_round_trips_dict`, `test_load_replay_json_round_trips_list` | Covered |
+| AC-3 (`resolve_replay_subdir` raises `FileNotFoundError` env-unset + subdir-missing; returns Path otherwise) | `test_resolve_replay_subdir_raises_when_env_unset`, `test_resolve_replay_subdir_raises_when_subdir_missing`, `test_resolve_replay_subdir_returns_path_when_exists`, `test_resolve_replay_subdir_rejects_file_at_path` | Covered |
+| AC-4 (`default_frame_period_ms()` returns 33; documented) | `test_default_frame_period_ms_is_30_fps` (asserts both function + constant); module docstring documents 30 fps default | Covered |
+| AC-5 (13 scenarios have local `_resolve_*` / `_drive_*` / `_push_*` stubs deleted; import from `runner.helpers.replay_mode`) | Verified by `Grep raise NotImplementedError under e2e/tests` returning **no matches**. The 13 scenarios touch: FT-P-01/02/04/05/07/08/09-AP/09-iNav/10/11, FT-N-01/02/03/04, and FT-P-03/14. | Covered |
+| AC-6 (≥6 unit tests for `replay_mode.py`) | 17 tests total (2 null-sink, 2 null-emitter, 1 frame-period, 2 imu-replay-noop, 6 load_replay_json, 4 resolve_replay_subdir) | Covered (exceeds floor) |
+| AC-7 (full suite passes) | 626 passed (+18 from 608; +17 new replay_mode tests + 1 new directory-layout parametrize entry) | Covered |
+
+### Phase 3 — Code Quality
+
+* **Single responsibility**: each surface in `replay_mode.py` owns
+  exactly one concern.
+  * `NullFrameSink` / `NullFcInboundEmitter` — Protocol-compatible
+    counter sinks. No I/O, no JSON, no env-var reads. Pure data.
+  * `default_frame_period_ms` — constant lookup. Trivial; lives in
+    the same module as the constant it wraps so callers see the
+    rationale next to the value.
+  * `imu_replay_noop` — explicit no-op with a comment explaining
+    why the IMU CSV is ignored in replay mode. Signature mirrors
+    `imu_replay.ImuReplayer.replay` so a future live-mode driver
+    can be slotted in.
+  * `load_replay_json` / `resolve_replay_subdir` — two file-system
+    surfaces, distinct contracts ("file must exist + parse" vs
+    "directory must exist"). Both go through one shared
+    `_resolve_replay_root_or_raise` so env-var semantics are
+    enforced exactly once.
+* **No suppressed errors**:
+  * `load_replay_json` converts `json.JSONDecodeError` → `ValueError`
+    with the offending file path AND `raise … from exc` preserves
+    the original.
+  * `_resolve_replay_root_or_raise` includes the calling surface in
+    the error message (`"load_replay_json('foo.json'): ${ENV} not set"`)
+    so a test author seeing the failure knows exactly which scenario
+    fired which loader.
+  * No bare `except`, no `2>/dev/null`, no empty `pass`.
+* **AAA comment discipline**: all 17 new tests use
+  `# Arrange / # Act / # Assert`; sections omitted when not needed.
+* **Code comments**: only the module docstring narrates "why
+  replay-mode no-ops are correct". Per-function docstrings document
+  contracts and the live-mode follow-up. No line-narration.
+* **Public boundary**: `replay_mode.py` imports stdlib only
+  (`json`, `os`, `pathlib`). Zero `from gps_denied_onboard ...` imports.
+
+### Phase 4 — Security
+
+* **No new credentials, secrets, or network surfaces**. All work is
+  in-process counter state + file I/O over a controlled env-var-rooted
+  path.
+* **`E2E_SITL_REPLAY_DIR`** read consistently with b75/b76 (set →
+  use; unset / empty / whitespace → treated as absent). No
+  shell-injection surface — the path is fed straight into `Path`
+  arithmetic.
+* **No `eval`, `exec`, `pickle`, `subprocess`, or unsafe
+  `yaml.load`** in the new module.
+* **JSON parse is pure stdlib** with explicit error wrapping. No
+  schema validation — that's the caller's job (the scenarios that
+  consume the parsed payload validate field types at the call site).
+
+### Phase 5 — Performance
+
+* Every surface is O(1) except `load_replay_json`, which is O(N) in
+  the input JSON size (a single `json.loads` call). No file I/O at
+  module-import time.
+* `NullFrameSink` / `NullFcInboundEmitter` are constant-time per call:
+  a single integer increment and no allocations.
+
+### Phase 6 — Cross-Task Consistency
+
+* **Env-var pattern matches b75 (`sitl_observer.replay_dir`) and b76
+  (`fc_proxy_runtime._resolve_replay_dir`)**: same env var, same
+  "empty string → None" semantics, same lazy resolution per call.
+  The three modules deliberately do not share a helper — each owns
+  its own resolution so the import graph stays flat
+  (`replay_mode` ↛ `sitl_observer`, `replay_mode` ↛ `fc_proxy_runtime`).
+  The cost is ~12 lines of duplicate env-var code across three
+  modules; the benefit is no cross-dependency surface.
+* **Skip-gate interaction**: the b75 `sitl_replay_ready` fixture
+  still skips before any of these loaders fire in unit-test mode.
+  When the SITL replay fixture builder lands and the env var is set,
+  scenarios will reach the loaders — at which point the explicit
+  `FileNotFoundError` messages ("replay fixture 'gt_per_frame.json'
+  not found at …") provide a precise pointer to which fixture file
+  is missing.
+* **`FileNotFoundError` / `ValueError` discipline matches the rest
+  of `e2e/runner/helpers/`** (b73-b76 cumulative): missing inputs →
+  `FileNotFoundError`, malformed inputs → `ValueError` with a file
+  pointer.
+* **Scenario-side import convention**: every rewired stub imports
+  inside the function body, not at module top-level. This matches
+  the existing scenario convention (`from runner.helpers import …`
+  is deferred so that `pytest --collect-only` doesn't pay the import
+  cost). 13 scenarios, one pattern.
+* **`_push_single_image_and_observe` and `_resolve_gt_per_frame`
+  field-name discipline**: both load JSON and project into a
+  scenario-local dataclass / tuple. The JSON keys (`frame_idx`,
+  `lat_deg`, `lon_deg`, `record`, `source_label`) match exactly what
+  the evaluators / consumers downstream already expect — no schema
+  translation layer required.
+
+### Phase 7 — Architecture Compliance
+
+* **Module placement**: `e2e/runner/helpers/replay_mode.py` (new)
+  + `e2e/_unit_tests/helpers/test_replay_mode.py` (new). Both
+  registered in `e2e/_unit_tests/test_directory_layout.py`; the
+  layout invariant test still passes.
+* **No `src/gps_denied_onboard` imports** anywhere. Confirmed.
+* **No new top-level dependencies** — stdlib only. `requirements.txt`
+  untouched.
+* **Backwards-compatible scenario contract**: every `_resolve_*` /
+  `_drive_*` / `_push_*` keeps its original name + signature + return
+  type. The 13 rewires are body-only changes — no call site changes
+  in the scenario test functions themselves.
+
+## Test Results
+
+* New unit tests: **17** (2 null-sink + 2 null-emitter + 1
+  frame-period + 2 imu-replay-noop + 6 load_replay_json + 4
+  resolve_replay_subdir).
+* Full `e2e/_unit_tests` suite: **626 passed in 127 s** (up from the
+  previous cumulative 608: +18 net, made up of 17 new replay_mode
+  tests and 1 new directory-layout parametrize entry).
+* No new linter errors (`ReadLints` clean on `replay_mode.py`,
+  `test_replay_mode.py`, `test_directory_layout.py`, and all 13
+  rewired scenario files).
+* `Grep raise NotImplementedError` under `e2e/tests/` returns **no
+  matches** — confirming AC-5 (every scenario stub deleted).
@@ -12,9 +12,9 @@ sub_step:
 retry_count: 0
 cycle: 1
 tracker: jira
-last_completed_batch: 76
+last_completed_batch: 77
 last_cumulative_review: batches_73-75
-current_batch: 77
+current_batch: 78
 current_batch_tasks: ""
 last_step_outcomes:
  step_8: "Code is testable — no changes needed (testability_assessment.md committed; no list-of-changes, no source edits)"