[AZ-597] Batch 77: replay_mode helpers + 13 scenario stub rewires

Add `runner/helpers/replay_mode.py` (NullFrameSink, NullFcInboundEmitter,
default_frame_period_ms, load_replay_json, resolve_replay_subdir,
imu_replay_noop) and rewire all 13 scenarios off their local
`_resolve_*` / `_drive_*` / `_push_*` NotImplementedError stubs.

Closes the offline FDR-replay execution path. `grep raise
NotImplementedError` under `e2e/tests/` now returns zero matches. +18
unit tests (626 total, up from 608): 17 new replay_mode tests plus 1
new directory-layout parametrize entry. Unit-test behaviour unchanged
(scenarios still skip via the b75 sitl_replay_ready gate when
E2E_SITL_REPLAY_DIR is unset).

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-17 09:52:05 +03:00
parent 6554d568f1
commit f49d803252
22 changed files with 798 additions and 85 deletions
@@ -0,0 +1,102 @@
# Scenario stub cleanup (replay_mode helpers + 13 rewires)
**Task**: AZ-597_scenario_stub_cleanup
**Name**: Add `runner/helpers/replay_mode.py` + rewire 13 scenarios off local `_resolve_*` / `_drive_*` / `_push_*` stubs
**Description**: After AZ-594/595/596 landed the three core harness helpers, sitl_observer, and the fc_proxy_runtime driver, the last unimplemented layer is a grab-bag of local per-scenario stubs that all share the same FDR-replay no-op pattern. Bundle them into one shared `runner/helpers/replay_mode.py` module so the offline FDR-replay path is end-to-end executable once the SITL replay fixture builder lands.
**Complexity**: 3 points
**Dependencies**: AZ-594, AZ-595, AZ-596
**Component**: Blackbox Tests / Test Infrastructure (epic AZ-262)
**Tracker**: AZ-597
**Epic**: AZ-262 (E-BBT)
## Problem
Despite the AZ-594/595/596 arc, 13 scenarios still carry local
`_resolve_*` / `_drive_*` / `_push_*` `NotImplementedError` stubs:
| Stub | Scenarios |
|------|-----------|
| `_resolve_frame_sink()` | 13 (FT-P-01/02/04/05/07/08/09-AP/09-iNav/10/11, FT-N-01/02/03/04) |
| `_resolve_fc_inbound_emitter(fc_adapter[, host])` | 3 (FT-P-02/04/10) |
| `_drive_imu_replay(csv_path)` | 2 (FT-P-07, FT-N-02) |
| `_resolve_frame_period_ms()` | 2 (FT-N-03/04) |
| `_resolve_outage_injection_frames()` | 1 (FT-N-03) |
| `_resolve_gt_per_frame(report)` | 1 (FT-N-01) |
| `_push_single_image_and_observe(...)` | 1 (FT-P-03/14) |
These are unreachable today (the b75 `sitl_replay_ready` gate skips
before they're called) so this cleanup can land safely under the
unit-test regression gate. The value: once the SITL replay fixture
builder ships, scenarios become runnable with no further per-scenario
edits.
## Surfaces (`runner/helpers/replay_mode.py`)
* `NullFrameSink` — implements `FrameSink` protocol. `write_frame`
is a counter; exposes `frames_written: int`.
* `NullFcInboundEmitter` — implements `FcInboundEmitter` protocol.
`emit` is a counter; exposes `samples_emitted: int`.
* `DEFAULT_FRAME_PERIOD_MS = 33` + `default_frame_period_ms() -> int`.
* `load_replay_json(filename: str) -> dict | list` — reads
`${E2E_SITL_REPLAY_DIR}/<filename>`. Raises `FileNotFoundError`
when env var unset OR file missing; `ValueError` with file pointer
on malformed JSON.
* `resolve_replay_subdir(name: str) -> Path` — returns
`${E2E_SITL_REPLAY_DIR}/<name>/`. Raises `FileNotFoundError` when
env var unset OR directory missing.
* `imu_replay_noop(csv_path: Path) -> None` — no-op stand-in for the
per-scenario `_drive_imu_replay` (IMU is pre-baked into the FDR
archive in replay mode; the CSV path is preserved as a parameter
for diagnostic logging only).
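The counter-style surfaces listed above can be pictured with a minimal sketch. It is not the module's actual source: the `FrameSink` / `FcInboundEmitter` protocol shapes shown here (single-argument `write_frame` and `emit`) are assumptions made for illustration, since the real definitions live in the b74 harness helpers.

```python
from pathlib import Path
from typing import Any, Protocol


class FrameSink(Protocol):
    # Assumed protocol shape; the real one is defined in the harness helpers.
    def write_frame(self, frame: Any) -> None: ...


class FcInboundEmitter(Protocol):
    # Assumed protocol shape, as above.
    def emit(self, sample: Any) -> None: ...


class NullFrameSink:
    """Discards every frame and only counts how many were written."""

    def __init__(self) -> None:
        self.frames_written: int = 0

    def write_frame(self, frame: Any) -> None:
        self.frames_written += 1


class NullFcInboundEmitter:
    """Discards every sample and only counts how many were emitted."""

    def __init__(self) -> None:
        self.samples_emitted: int = 0

    def emit(self, sample: Any) -> None:
        self.samples_emitted += 1


DEFAULT_FRAME_PERIOD_MS = 33  # roughly 30 fps


def default_frame_period_ms() -> int:
    return DEFAULT_FRAME_PERIOD_MS


def imu_replay_noop(csv_path: Path) -> None:
    """No-op: in replay mode the IMU data is already baked into the FDR archive."""
    return None
```

Because both classes hold only an integer, a scenario can assert on `frames_written` or `samples_emitted` after a run without any I/O taking place.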
## Per-scenario rewire pattern
```python
# Before:
def _resolve_frame_sink():
    raise NotImplementedError(...)

# After:
from runner.helpers.replay_mode import NullFrameSink

def _resolve_frame_sink():
    return NullFrameSink()
```
Same shape for the other six helpers. The two stubs that need
scenario-specific JSON (`_resolve_gt_per_frame` and
`_push_single_image_and_observe`) call
`load_replay_json("gt_per_frame.json")` and
`load_replay_json("single_image_observation.json")` respectively, then
project the result into their scenario-local dataclass or tuple.
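As an illustration of the projection step, the sketch below turns an already-parsed `gt_per_frame.json` payload into typed records. The `GtFrame` dataclass and `project_gt_per_frame` function are hypothetical names, and the assumption that each entry carries `frame_idx`, `lat_deg`, and `lon_deg` keys is a guess at the fixture layout rather than a documented schema.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class GtFrame:
    # Hypothetical scenario-local record; field names mirror the assumed JSON keys.
    frame_idx: int
    lat_deg: float
    lon_deg: float


def project_gt_per_frame(payload: Any) -> list[GtFrame]:
    """Validate the parsed JSON payload and project it into GtFrame records."""
    if not isinstance(payload, list):
        raise ValueError(
            f"gt_per_frame.json: expected a list, got {type(payload).__name__}"
        )
    frames: list[GtFrame] = []
    for index, entry in enumerate(payload):
        try:
            frames.append(
                GtFrame(
                    frame_idx=int(entry["frame_idx"]),
                    lat_deg=float(entry["lat_deg"]),
                    lon_deg=float(entry["lon_deg"]),
                )
            )
        except (KeyError, TypeError, ValueError) as exc:
            # Keep the original error as the cause so the traceback stays useful.
            raise ValueError(
                f"gt_per_frame.json: bad entry at index {index}: {entry!r}"
            ) from exc
    return frames
```

Keeping field validation at the call site like this leaves the shared loader schema-agnostic, so each scenario owns its own payload shape.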
## Acceptance Criteria
**AC-1**: `NullFrameSink.write_frame` and `NullFcInboundEmitter.emit`
are pure counters.
**AC-2**: `load_replay_json` raises `FileNotFoundError` (env unset or
file missing) and `ValueError` (malformed JSON with file pointer).
**AC-3**: `resolve_replay_subdir` raises `FileNotFoundError` (env
unset or subdir missing).
**AC-4**: `default_frame_period_ms()` returns 33.
**AC-5**: All 13 scenarios have local `_resolve_*` / `_drive_*` /
`_push_*` stubs deleted and import from `runner.helpers.replay_mode`.
**AC-6**: ≥6 unit tests on `replay_mode.py`.
**AC-7**: Full e2e unit-test suite passes (regression gate).
## Out of Scope
* The actual SITL replay fixture builder.
* Live MAVLink router / pymavlink plumbing.
## Files Touched
* `e2e/runner/helpers/replay_mode.py` (new)
* `e2e/_unit_tests/helpers/test_replay_mode.py` (new)
* `e2e/_unit_tests/test_directory_layout.py` (register new module)
* 13 scenario files under `e2e/tests/{positive,negative}/`
@@ -0,0 +1,96 @@
# Batch 77 Report — replay_mode helpers + 13 scenario stub rewires (cycle 1, batch 11 of test phase)
**Batch**: 77
**Date**: 2026-05-17
**Context**: Test implementation (greenfield Step 10 — Implement Tests)
**Tasks**: AZ-597 (3 cp) — 1 task (scenario stub cleanup bundle)
**Cycle**: 1
**Verdict**: COMPLETE — PASS (self-reviewed; see `reviews/batch_77_review.md`)
## Summary
Closes the offline FDR-replay path that AZ-594 (b74), AZ-595 (b75),
and AZ-596 (b76) opened. After those three batches, the only remaining
`NotImplementedError` stubs in the scenario suite were a grab-bag of
local `_resolve_*` / `_drive_*` / `_push_*` helpers duplicated across
13 scenario files. They all reduced to the same FDR-replay pattern —
either a no-op counter (frame sink, FC inbound emitter, IMU replay
driver) or a JSON read from `${E2E_SITL_REPLAY_DIR}/` (per-frame GT,
single-image observation, outage frames subdir).
This batch bundles those into one shared `runner/helpers/replay_mode.py`
module + rewires the 13 scenarios off their local stubs. After the
batch:
* `grep raise NotImplementedError` under `e2e/tests/` returns **zero**
matches.
* Once the SITL replay fixture builder lands (separate ticket), every
scenario becomes runnable end-to-end with no further per-scenario
edits.
* Unit-test mode is unchanged — the b75 `sitl_replay_ready` skip
gate keeps the loaders unreached when `E2E_SITL_REPLAY_DIR` is unset.
### AZ-597 — replay_mode helpers + 13 scenario rewires (3 cp)
* **`runner/helpers/replay_mode.py`** (new):
  * `NullFrameSink` — counter-only `FrameSink` (`frames_written: int`).
  * `NullFcInboundEmitter` — counter-only `FcInboundEmitter`
    (`samples_emitted: int`).
  * `default_frame_period_ms() -> int` + `DEFAULT_FRAME_PERIOD_MS = 33`
    (30 fps).
  * `load_replay_json(filename)` — generic JSON loader. Raises
    `FileNotFoundError` (env-unset / file-missing) or `ValueError`
    (malformed, file pointer included).
  * `resolve_replay_subdir(name)` — directory loader. Raises
    `FileNotFoundError` (env-unset / subdir-missing).
  * `imu_replay_noop(csv_path)` — explicit no-op; signature mirrors
    `imu_replay.ImuReplayer.replay` for future live-mode parity.
  * Single shared `_resolve_replay_root_or_raise(reason)` enforces
    the `E2E_SITL_REPLAY_DIR` semantics exactly once.
* **13 scenarios rewired** (all `_resolve_*` / `_drive_*` / `_push_*`
  stubs deleted):
  * `_resolve_frame_sink` → `NullFrameSink()` in: FT-P-01, FT-P-02,
    FT-P-04, FT-P-05, FT-P-07, FT-P-08, FT-P-09-AP, FT-P-09-iNav,
    FT-P-10, FT-P-11, FT-N-01, FT-N-02, FT-N-03, FT-N-04.
  * `_resolve_fc_inbound_emitter` → `NullFcInboundEmitter()` in:
    FT-P-02, FT-P-04, FT-P-10.
  * `_drive_imu_replay` → `imu_replay_noop(...)` in: FT-P-07, FT-N-02.
  * `_resolve_frame_period_ms` → `default_frame_period_ms()` in:
    FT-N-03, FT-N-04.
  * `_resolve_outage_injection_frames` → `resolve_replay_subdir("outage_frames")`
    in: FT-N-03.
  * `_resolve_gt_per_frame` → `load_replay_json("gt_per_frame.json")`
    + dataclass projection in: FT-N-01.
  * `_push_single_image_and_observe` → `load_replay_json("single_image_observation.json")`
    + tuple projection in: FT-P-03/14.
* **`e2e/_unit_tests/test_directory_layout.py`** — registers the new
`runner/helpers/replay_mode.py` path.
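The shared root resolver and the directory loader described in this list might look roughly like the sketch below. It is an approximation under stated assumptions: the exact error wording, the `ENV_VAR` constant name, and the choice to treat a whitespace-only value as unset are inferred from the report, not copied from the module.

```python
import os
from pathlib import Path

ENV_VAR = "E2E_SITL_REPLAY_DIR"  # constant name is illustrative


def _resolve_replay_root_or_raise(reason: str) -> Path:
    """Return the replay root, or raise with the calling surface named in the message."""
    raw = os.environ.get(ENV_VAR, "").strip()
    if not raw:
        # Unset, empty, and whitespace-only values are all treated as absent.
        raise FileNotFoundError(f"{reason}: ${{{ENV_VAR}}} not set")
    return Path(raw)


def resolve_replay_subdir(name: str) -> Path:
    """Return <replay root>/<name>, requiring it to be an existing directory."""
    root = _resolve_replay_root_or_raise(f"resolve_replay_subdir({name!r})")
    subdir = root / name
    if not subdir.is_dir():
        # Also rejects the case where a regular file sits at that path.
        raise FileNotFoundError(
            f"replay fixture directory {name!r} not found at {subdir}"
        )
    return subdir
```

Passing the caller's name in as `reason` is what lets a failing scenario report which loader tripped, for example `resolve_replay_subdir('outage_frames'): ${E2E_SITL_REPLAY_DIR} not set`.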
## Out of scope (deferred)
* The actual SITL replay fixture builder (separate ticket — will
populate `${E2E_SITL_REPLAY_DIR}/` with `gps_state.json`,
`gt_per_frame.json`, `single_image_observation.json`,
`outage_frames/`, `ekf_divergence_events.json`, etc.).
* Live MAVLink router / pymavlink plumbing (separate live-mode
infrastructure ticket).
## Test Results
* New unit tests: **17** (2 null-sink, 2 null-emitter, 1
frame-period, 2 imu-replay-noop, 6 load_replay_json, 4
resolve_replay_subdir).
* Full `e2e/_unit_tests` suite: **626 passed in 127 s** (previous
cumulative: 608 → +18 net = +17 new replay_mode tests + 1 new
directory-layout parametrize entry).
* No new linter errors.
* `grep raise NotImplementedError` under `e2e/tests/` returns
**zero** matches.
## State
* Spec moved: `_docs/02_tasks/todo/AZ-597_scenario_stub_cleanup.md` →
`_docs/02_tasks/done/`.
* `_docs/_autodev_state.md` advanced to `last_completed_batch: 77`.
* `last_cumulative_review` remains `batches_73-75`; next K=3
cumulative review fires at the end of batch 78.
@@ -0,0 +1,152 @@
# Code Review Report
**Batch**: 77 — AZ-597 (replay_mode helpers + 13 scenario stub rewires)
**Date**: 2026-05-17
**Verdict**: PASS
## Findings
(none)
## Findings Sweep
### Phase 1 — Context Loading
Read the AZ-597 task spec, the `FrameSink` / `FcInboundEmitter`
Protocol definitions in `frame_source_replay.py` and `imu_replay.py`
(b74) to verify the new `Null*` implementations match the exact
method signatures, the `outlier_tolerance_evaluator.GtPose` dataclass
shape consumed by FT-N-01, and the surface used by FT-P-03/14's
`_push_single_image_and_observe` return tuple. Re-read the b75
`sitl_observer.replay_dir` env-var resolution pattern for symmetry
(`E2E_SITL_REPLAY_DIR`, empty-string-as-None semantics).
### Phase 2 — Spec Compliance
| AC | Coverage | Status |
|----|----------|--------|
| AC-1 (`NullFrameSink.write_frame` / `NullFcInboundEmitter.emit` are pure counters) | `test_null_frame_sink_counts_writes`, `test_null_frame_sink_starts_at_zero`, `test_null_emitter_counts_emits`, `test_null_emitter_starts_at_zero` | Covered |
| AC-2 (`load_replay_json` raises `FileNotFoundError` env-unset + file-missing; `ValueError` malformed; round-trips otherwise) | `test_load_replay_json_raises_when_env_unset`, `test_load_replay_json_raises_when_env_empty`, `test_load_replay_json_raises_when_file_missing`, `test_load_replay_json_raises_on_malformed_json`, `test_load_replay_json_round_trips_dict`, `test_load_replay_json_round_trips_list` | Covered |
| AC-3 (`resolve_replay_subdir` raises `FileNotFoundError` env-unset + subdir-missing; returns Path otherwise) | `test_resolve_replay_subdir_raises_when_env_unset`, `test_resolve_replay_subdir_raises_when_subdir_missing`, `test_resolve_replay_subdir_returns_path_when_exists`, `test_resolve_replay_subdir_rejects_file_at_path` | Covered |
| AC-4 (`default_frame_period_ms()` returns 33; documented) | `test_default_frame_period_ms_is_30_fps` (asserts both function + constant); module docstring documents 30 fps default | Covered |
| AC-5 (13 scenarios have local `_resolve_*` / `_drive_*` / `_push_*` stubs deleted; import from `runner.helpers.replay_mode`) | Verified by `Grep raise NotImplementedError under e2e/tests` returning **no matches**. The 13 scenarios touch: FT-P-01/02/04/05/07/08/09-AP/09-iNav/10/11, FT-N-01/02/03/04, and FT-P-03/14. | Covered |
| AC-6 (≥6 unit tests for `replay_mode.py`) | 17 tests total (2 null-sink, 2 null-emitter, 1 frame-period, 2 imu-replay-noop, 6 load_replay_json, 4 resolve_replay_subdir) | Covered (exceeds floor) |
| AC-7 (full suite passes) | 626 passed (+18 from 608; +17 new replay_mode tests + 1 new directory-layout parametrize entry) | Covered |
### Phase 3 — Code Quality
* **Single responsibility**: each surface in `replay_mode.py` owns
  exactly one concern.
  * `NullFrameSink` / `NullFcInboundEmitter` — Protocol-compatible
    counter sinks. No I/O, no JSON, no env-var reads. Pure data.
  * `default_frame_period_ms` — constant lookup. Trivial; lives in
    the same module as the constant it wraps so callers see the
    rationale next to the value.
  * `imu_replay_noop` — explicit no-op with a comment explaining
    why the IMU CSV is ignored in replay mode. Signature mirrors
    `imu_replay.ImuReplayer.replay` so a future live-mode driver
    can be slotted in.
  * `load_replay_json` / `resolve_replay_subdir` — two file-system
    surfaces, distinct contracts ("file must exist + parse" vs
    "directory must exist"). Both go through one shared
    `_resolve_replay_root_or_raise` so env-var semantics are
    enforced exactly once.
* **No suppressed errors**:
  * `load_replay_json` converts `json.JSONDecodeError` → `ValueError`
    with the offending file path AND `raise … from exc` preserves
    the original.
  * `_resolve_replay_root_or_raise` includes the calling surface in
    the error message (`"load_replay_json('foo.json'): ${ENV} not set"`)
    so a test author seeing the failure knows exactly which scenario
    fired which loader.
  * No bare `except`, no `2>/dev/null`, no empty `pass`.
* **AAA comment discipline**: all 17 new tests use
`# Arrange / # Act / # Assert`; sections omitted when not needed.
* **Code comments**: only the module docstring narrates "why
replay-mode no-ops are correct". Per-function docstrings document
contracts and the live-mode follow-up. No line-narration.
* **Public boundary**: `replay_mode.py` imports stdlib only
(`json`, `os`, `pathlib`). Zero `from gps_denied_onboard ...` imports.
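The error-wrapping and AAA points above can be illustrated together. The sketch below uses a hypothetical `parse_replay_json` helper (not the module's real function) and a plain-function test that avoids pytest fixtures so it runs on the standard library alone.

```python
import json
import tempfile
from pathlib import Path


def parse_replay_json(path: Path):
    """Hypothetical helper: parse a fixture file, attaching the path to decode errors."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        # `from exc` keeps the decoder's line/column details as __cause__.
        raise ValueError(f"malformed JSON in replay fixture {path}: {exc}") from exc


def test_parse_replay_json_raises_on_malformed_json() -> None:
    # Arrange
    with tempfile.TemporaryDirectory() as tmp:
        fixture = Path(tmp) / "gt_per_frame.json"
        fixture.write_text("{not json", encoding="utf-8")

        # Act
        try:
            parse_replay_json(fixture)
        except ValueError as exc:
            caught = exc
        else:
            raise AssertionError("expected ValueError")

    # Assert
    assert str(fixture) in str(caught)
    assert isinstance(caught.__cause__, json.JSONDecodeError)
```

Since `json.JSONDecodeError` is itself a subclass of `ValueError`, the wrap is less about changing the exception type and more about adding the file pointer; asserting on `__cause__` is one way to check the original error was not swallowed.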
### Phase 4 — Security
* **No new credentials, secrets, or network surfaces**. All work is
in-process counter state + file I/O over a controlled env-var-rooted
path.
* **`E2E_SITL_REPLAY_DIR`** read consistently with b75/b76 (set →
use; unset / empty / whitespace → treated as absent). No
shell-injection surface — the path is fed straight into `Path`
arithmetic.
* **No `eval`, `exec`, `pickle`, `subprocess`, or unsafe
`yaml.load`** in the new module.
* **JSON parse is pure stdlib** with explicit error wrapping. No
schema validation — that's the caller's job (the scenarios that
consume the parsed payload validate field types at the call site).
### Phase 5 — Performance
* All surfaces are O(1) or O(N) where N is the input JSON size
(single `json.loads` call). No file I/O at module-import time.
* `NullFrameSink` / `NullFcInboundEmitter` are constant-time per call
with single integer increment + no allocations.
### Phase 6 — Cross-Task Consistency
* **Env-var pattern matches b75 (`sitl_observer.replay_dir`) and b76
(`fc_proxy_runtime._resolve_replay_dir`)**: same env var, same
"empty string → None" semantics, same lazy resolution per call.
The three modules deliberately do not share a helper: each owns
its own resolution so the import graph stays flat
(`replay_mode` imports neither `sitl_observer` nor `fc_proxy_runtime`).
The cost is ~12 lines of duplicate env-var code across three
modules; the benefit is no cross-dependency surface.
* **Skip-gate interaction**: the b75 `sitl_replay_ready` fixture
still skips before any of these loaders fire in unit-test mode.
When the SITL replay fixture builder lands and the env var is set,
scenarios will reach the loaders — at which point the explicit
`FileNotFoundError` messages ("replay fixture 'gt_per_frame.json'
not found at …") provide a precise pointer to which fixture file
is missing.
* **`FileNotFoundError` / `ValueError` discipline matches the rest
of `e2e/runner/helpers/`** (b73-b76 cumulative): missing inputs →
`FileNotFoundError`, malformed inputs → `ValueError` with a file
pointer.
* **Scenario-side import convention**: every rewired stub imports
inside the function body, not at module top-level. This matches
the existing scenario convention (`from runner.helpers import …`
is deferred so that `pytest --collect-only` doesn't pay the import
cost). 13 scenarios, one pattern.
* **`_push_single_image_and_observe` and `_resolve_gt_per_frame`
field-name discipline**: both load JSON and project into a
scenario-local dataclass / tuple. The JSON keys (`frame_idx`,
`lat_deg`, `lon_deg`, `record`, `source_label`) match exactly what
the evaluators / consumers downstream already expect — no schema
translation layer required.
### Phase 7 — Architecture Compliance
* **Module placement**: `e2e/runner/helpers/replay_mode.py` (new)
+ `e2e/_unit_tests/helpers/test_replay_mode.py` (new). Both
registered in `e2e/_unit_tests/test_directory_layout.py`; the
layout invariant test still passes.
* **No `src/gps_denied_onboard` imports** anywhere. Confirmed.
* **No new top-level dependencies** — stdlib only. `requirements.txt`
untouched.
* **Backwards-compatible scenario contract**: every `_resolve_*` /
`_drive_*` / `_push_*` keeps its original name + signature + return
type. The 13 rewires are body-only changes — no call site changes
in the scenario test functions themselves.
## Test Results
* New unit tests: **17** (2 null-sink + 2 null-emitter + 1
frame-period + 2 imu-replay-noop + 6 load_replay_json + 4
resolve_replay_subdir).
* Full `e2e/_unit_tests` suite: **626 passed in 127 s** (previous
cumulative: 608 → +18 net = +17 new replay_mode tests + 1 new
directory-layout parametrize entry).
* No new linter errors (`ReadLints` clean on `replay_mode.py`,
`test_replay_mode.py`, `test_directory_layout.py`, and all 13
rewired scenario files).
* `Grep raise NotImplementedError` under `e2e/tests/` returns **no
matches** — confirming AC-5 (every scenario stub deleted).
+2 -2
@@ -12,9 +12,9 @@ sub_step:
retry_count: 0
cycle: 1
tracker: jira
-last_completed_batch: 76
+last_completed_batch: 77
last_cumulative_review: batches_73-75
-current_batch: 77
+current_batch: 78
current_batch_tasks: ""
last_step_outcomes:
step_8: "Code is testable — no changes needed (testability_assessment.md committed; no list-of-changes, no source edits)"