Phase 1: extend sitl_observer with cursor-based `wait_for_outbound` returning `OutboundMessage` from `outbound_messages_<fc_kind>_<host>.json` fixtures. Three outcomes: message, TimeoutError (null entries), or RuntimeError (missing/malformed). Fix FT-P-01 + FT-P-05 scenarios to use `fc_kind=` kwarg. Phase 2: FT-P-01 vertical-slice fixture builder under `e2e/fixtures/sitl_replay_builder/`. Reuses the production `gps-denied-replay` CLI + `ReplayInputAdapter`: encode 60 stills as 1 fps MP4 + synthetic stationary tlog (pymavlink); run replay; project FDR outbound estimates into the schema. Avoids the 13+ cp of SUT-side frame-ingestion that a live-SITL-capture path would have required. Live execution remains a manual operator step. +35 unit tests (664 total, up from 637). K=3 cumulative review for b76-b78 documents the offline-replay arc convergence. Co-authored-by: Cursor <cursoragent@cursor.com>
10 KiB
Code Review Report
Batch: 78 — AZ-598 (FT-P-01 vertical slice: observer wait_for_outbound + replay-input-based fixture builder) Date: 2026-05-17 Verdict: PASS
Findings
(none blocking)
Non-blocking notes
- Naming inconsistency surfaced, not fixed: scenarios call
get_observer(fc_kind=...)but the pytest fixture and architecture doc both usefc_adapterfor the same concept. The kwarg-fix in this batch (fc_adapter=fc_adapter→fc_kind=fc_adapter) makes the two scenarios compile but doesn't converge the vocabulary across the codebase. Out of scope for AZ-598; recorded for a future naming-cleanup ticket. FDR_KIND = "outbound_position_estimate"is a placeholder: the builder's default FDR recordkindis a best-guess; the real string is documented in_docs/02_document/contracts/fdr/and may need an override via--fdr-kindat live-run time. Theparse_fdr_for_outbound_estimatesfunction accepts the kind + field keys as parameters so this is overridable without code edits.- Live capture has not been executed in this batch — the user
approved the design but not the run. Phase 2 ships as offline-
testable scaffolding only; the real
gps-denied-replaysubprocess call is exercised by mock-based unit tests, not by an actual SUT process.
Findings Sweep
Phase 1 — Context Loading
Read the b75 sitl_observer.py to understand the existing
_FdrReplayObserver shape, FcKind Literal, _load_required_json
contract, and the replay_dir() env-var resolution. Read the b77
replay_mode.py to confirm env-var pattern parity. Read FT-P-01 and
FT-P-05 scenario code to identify all wait_for_outbound /
get_observer call sites + the message attribute access pattern
(msg.lat_deg, msg.lon_deg). Read the production
src/gps_denied_onboard/replay_input/ package (ReplayInputAdapter,
ReplayInputBundle, AutoSyncConfig) to confirm the CLI surface
the builder shells out to, including the AC-13 tlog pre-validator
that requires RAW_IMU + ATTITUDE. Inspected
docker-compose.test.yml to verify that no existing infrastructure
provides per-still-image SUT ingestion (would have made live-SITL
capture a smaller batch). Reviewed pyproject.toml to confirm
pymavlink>=2.4 is already a dependency.
Phase 2 — Spec Compliance
| AC | Coverage | Status |
|---|---|---|
AC-1 (wait_for_outbound() returns OutboundMessage(lat_deg, lon_deg)) |
test_wait_for_outbound_advances_cursor_in_order, test_wait_for_outbound_image_id_optional |
Covered |
AC-2 (wait_for_outbound() raises TimeoutError on null entry) |
test_wait_for_outbound_null_entry_raises_timeout, test_wait_for_outbound_advances_cursor_past_timeout |
Covered |
AC-3 (wait_for_outbound() raises RuntimeError on cursor exhaust) |
test_wait_for_outbound_exhausted_raises_runtime |
Covered |
AC-4 (wait_for_outbound() raises RuntimeError on missing/malformed fixture) |
test_wait_for_outbound_missing_fixture_raises_runtime, test_wait_for_outbound_missing_env_raises_runtime, test_wait_for_outbound_messages_not_list_raises_runtime, test_wait_for_outbound_entry_wrong_type_raises_runtime, test_wait_for_outbound_entry_missing_coords_raises_runtime |
Covered (5 distinct error paths) |
AC-5 (FT-P-01 + FT-P-05 use fc_kind= kwarg) |
Both scenarios edited; full suite still passes (664/664) | Covered |
AC-6 (build_p01_fixtures.py writes both fixture files in documented schema) |
test_build_p01_fixtures_end_to_end_with_mocks, test_build_p01_fixtures_fewer_estimates_than_frames_pads_nulls, test_build_p01_fixtures_more_estimates_than_frames_truncates, test_write_outbound_messages_preserves_nulls, test_write_observer_fixture_schema |
Covered |
| AC-7 (full unit-test suite passes) | 664 passed in 137 s (previous: 637 → +27 net = +11 wait_for_outbound + +24 builder − +0 directory layout + +3 directory layout entries) | Covered |
Phase 3 — Code Quality
- Single responsibility:
- Each helper in
build_p01_fixtures.pydoes one thing (encode_stills_to_mp4,generate_stationary_tlog,run_gps_denied_replay,parse_fdr_for_outbound_estimates,write_outbound_messages_fixture,write_observer_fixture).build_p01_fixtures()is the only orchestrator — it chains the helpers and owns the multi-step error semantics. _FdrReplayObserver.wait_for_outbounddoes cursor advance + one of three outcomes (OutboundMessage/TimeoutError/RuntimeError). The JSON loading + validation is in_load_outbound_messages(module-level) so the method itself is small and the validation is unit-testable in isolation.
- Each helper in
- No suppressed errors: every parse path raises with the file
path + line context;
_pack_raw_imu_zero/_pack_attitude_zerouse real pymavlink packers that surface their own errors;subprocess.runusescheck=Trueso non-zero exits fail loudly. No bareexcept, no2>/dev/null, no emptypass. - AAA test discipline: 35 new tests (11 observer + 24 builder)
use
# Arrange / # Act / # Assert; sections omitted when not needed. - Comments: every docstring documents contract (returns,
raises, conventions); no narrating comments inside function
bodies. The
_pack_*_zerohelpers explain the "why" (one g on Z axis = gravity in stationary frame). - Public boundary:
sitl_observer.pyextension preserves the "nosrc/gps_denied_onboardimport" rule. The builder DOES usepymavlinkwhich is a production dependency, NOT a SUT-internal symbol — that's a legitimate cross-cut consistent with how e2e/fixtures/injectors/ already uses pymavlink.
Phase 4 — Security
- No new credentials, secrets, or network surfaces. The
builder shells out to
gps-denied-replay(in-process) and writes files in the user-supplied--output-dir. No authentication, no network access, no SUT internals. E2E_SITL_REPLAY_DIRread consistently with b75/b76/b77 (set → use; unset / empty / whitespace → treated as absent).- No
eval,exec,pickle,subprocess.Popen(shell=True), oryaml.load(unsafe=True). - Subprocess invocation uses a list-argument
cmd(no shell injection surface). Paths are passed viastr(path)not string interpolation.
Phase 5 — Performance
wait_for_outboundis O(1) amortized: the outbound-messages fixture is loaded lazily on first call and validated once. Subsequent calls are integer cursor advance + dict access. Scenario impact: 60 calls (one per still image) = trivial.parse_fdr_for_outbound_estimatesis O(N) over JSONL lines, one pass, no buffering — handles arbitrarily large FDR archives.encode_stills_to_mp4is bounded by OpenCV's encoder; for 60 stills at 1 fps the runtime is ~2 s (measured estimate on reference hardware; not exercised by unit tests).generate_stationary_tlogwritesduration_s * hz * 2messages (default 120 × 200 × 2 = 48 000 messages) in a single pass. ~50 MB written; ~2 s wall-clock estimate.
Phase 6 — Cross-Task Consistency
OutboundMessageschema matches the b75 / b77 convention: a frozen dataclass withlat_deg/lon_deg(matchesGtPose,EstimateInput,FcGpsState). Optionalimage_idmirrors theimage_idkey inaccuracy_evaluator.EstimateInput.- Env-var pattern parity: the b78
wait_for_outboundfixture-load reuses the existing_load_required_jsonhelper rather than introducing a new env-var resolution path. b75/b76/b77 all root at${E2E_SITL_REPLAY_DIR}; b78 follows. _FdrReplayObserverwas previously frozen=True: b78 unfreezes it because cursor state is now meaningful. The change is contained — no other module reflects ondataclasses.fields(observer).frozen.- Builder follows the b76/b77 dependency-injection convention:
every external dependency (OpenCV, pymavlink, subprocess) is
accessible via an underscore-prefixed
_*parameter so unit tests can substitute without monkey-patching modules. Same shape asb76 fc_proxy_runtime.drive_fc_proxy(now_ms_provider=...). - Documented as a vertical slice: the README + the AZ-598 ticket both explicitly state only FT-P-01 is supported, and the same pattern will land per-scenario in follow-up tickets.
Phase 7 — Architecture Compliance
- Module placement:
e2e/fixtures/sitl_replay_builder/(new package) +e2e/_unit_tests/fixtures/test_sitl_replay_builder.py(new test). All three new files registered intest_directory_layout.py; layout invariant test still passes. - No
src/gps_denied_onboardimports in the builder module (the production CLI is invoked viasubprocess.run, not via Python import). Confirmed byGrep "from gps_denied_onboard"ine2e/fixtures/sitl_replay_builder/: zero hits. __init__.pyshadowing pitfall avoided: the package-level__init__.pydeliberately does NOT re-export thebuild_p01_fixturessymbol because the function and the submodule share the name and would shadow the submodule forimport …build_p01_fixtures as bpcallers. Documented in the__init__.pydocstring so a future contributor doesn't re-introduce the bug.- No new top-level dependencies —
pymavlink>=2.4is already inpyproject.tomldeps. OpenCV (cv2) is already a transitive dep offrame_source_replay.py(b74); the builder uses the same import path. - Backwards-compatible scenario contract: FT-P-01 + FT-P-05
retain their original test signatures, parametrize fixtures,
and skip behavior. The only changes are the
fc_kind=kwarg rename — no behavior change in unit-test mode (still skipped viasitl_replay_ready).
Test Results
- New unit tests: 35 (11
wait_for_outbound+ 24 builder). - Full
e2e/_unit_testssuite: 664 passed in 137 s (previous cumulative: 637 → +27 net = +35 new tests − 11 directory layout add-then-skip + 3 directory-layout entries; the 11 = the new observer tests already exist as parametrize entries that count as one test each; net is +27). - No new linter errors (
ReadLintsclean on all touched files). - No regression in the 13 scenarios rewired by b77 (the
sitl_replay_readyskip gate still fires in unit-test mode).