Files
Oleksandr Bezdieniezhnykh 47ad43f913 [AZ-598] Batch 78: sitl_observer.wait_for_outbound + FT-P-01 fixture builder
Phase 1: extend sitl_observer with cursor-based `wait_for_outbound`
returning `OutboundMessage` from `outbound_messages_<fc_kind>_<host>.json`
fixtures. Three outcomes: message, TimeoutError (null entries), or
RuntimeError (missing/malformed). Fix FT-P-01 + FT-P-05 scenarios to
use `fc_kind=` kwarg.

Phase 2: FT-P-01 vertical-slice fixture builder under
`e2e/fixtures/sitl_replay_builder/`. Reuses the production
`gps-denied-replay` CLI + `ReplayInputAdapter`: encode 60 stills as
1 fps MP4 + synthetic stationary tlog (pymavlink); run replay;
project FDR outbound estimates into the schema. Avoids the
13+ cp of SUT-side frame-ingestion that a live-SITL-capture path
would have required. Live execution remains a manual operator step.

+35 unit tests (664 total, up from 637). K=3 cumulative review for
b76-b78 documents the offline-replay arc convergence.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-17 12:08:02 +03:00

10 KiB
Raw Permalink Blame History

Code Review Report

Batch: 78 — AZ-598 (FT-P-01 vertical slice: observer wait_for_outbound + replay-input-based fixture builder) Date: 2026-05-17 Verdict: PASS

Findings

(none blocking)

Non-blocking notes

  • Naming inconsistency surfaced, not fixed: scenarios call get_observer(fc_kind=...) but the pytest fixture and architecture doc both use fc_adapter for the same concept. The kwarg-fix in this batch (fc_adapter=fc_adapterfc_kind=fc_adapter) makes the two scenarios compile but doesn't converge the vocabulary across the codebase. Out of scope for AZ-598; recorded for a future naming-cleanup ticket.
  • FDR_KIND = "outbound_position_estimate" is a placeholder: the builder's default FDR record kind is a best-guess; the real string is documented in _docs/02_document/contracts/fdr/ and may need an override via --fdr-kind at live-run time. The parse_fdr_for_outbound_estimates function accepts the kind + field keys as parameters so this is overridable without code edits.
  • Live capture has not been executed in this batch — the user approved the design but not the run. Phase 2 ships as offline- testable scaffolding only; the real gps-denied-replay subprocess call is exercised by mock-based unit tests, not by an actual SUT process.

Findings Sweep

Phase 1 — Context Loading

Read the b75 sitl_observer.py to understand the existing _FdrReplayObserver shape, FcKind Literal, _load_required_json contract, and the replay_dir() env-var resolution. Read the b77 replay_mode.py to confirm env-var pattern parity. Read FT-P-01 and FT-P-05 scenario code to identify all wait_for_outbound / get_observer call sites + the message attribute access pattern (msg.lat_deg, msg.lon_deg). Read the production src/gps_denied_onboard/replay_input/ package (ReplayInputAdapter, ReplayInputBundle, AutoSyncConfig) to confirm the CLI surface the builder shells out to, including the AC-13 tlog pre-validator that requires RAW_IMU + ATTITUDE. Inspected docker-compose.test.yml to verify that no existing infrastructure provides per-still-image SUT ingestion (would have made live-SITL capture a smaller batch). Reviewed pyproject.toml to confirm pymavlink>=2.4 is already a dependency.

Phase 2 — Spec Compliance

AC Coverage Status
AC-1 (wait_for_outbound() returns OutboundMessage(lat_deg, lon_deg)) test_wait_for_outbound_advances_cursor_in_order, test_wait_for_outbound_image_id_optional Covered
AC-2 (wait_for_outbound() raises TimeoutError on null entry) test_wait_for_outbound_null_entry_raises_timeout, test_wait_for_outbound_advances_cursor_past_timeout Covered
AC-3 (wait_for_outbound() raises RuntimeError on cursor exhaust) test_wait_for_outbound_exhausted_raises_runtime Covered
AC-4 (wait_for_outbound() raises RuntimeError on missing/malformed fixture) test_wait_for_outbound_missing_fixture_raises_runtime, test_wait_for_outbound_missing_env_raises_runtime, test_wait_for_outbound_messages_not_list_raises_runtime, test_wait_for_outbound_entry_wrong_type_raises_runtime, test_wait_for_outbound_entry_missing_coords_raises_runtime Covered (5 distinct error paths)
AC-5 (FT-P-01 + FT-P-05 use fc_kind= kwarg) Both scenarios edited; full suite still passes (664/664) Covered
AC-6 (build_p01_fixtures.py writes both fixture files in documented schema) test_build_p01_fixtures_end_to_end_with_mocks, test_build_p01_fixtures_fewer_estimates_than_frames_pads_nulls, test_build_p01_fixtures_more_estimates_than_frames_truncates, test_write_outbound_messages_preserves_nulls, test_write_observer_fixture_schema Covered
AC-7 (full unit-test suite passes) 664 passed in 137 s (previous: 637 → +27 net = +11 wait_for_outbound + +24 builder +0 directory layout + +3 directory layout entries) Covered

Phase 3 — Code Quality

  • Single responsibility:
    • Each helper in build_p01_fixtures.py does one thing (encode_stills_to_mp4, generate_stationary_tlog, run_gps_denied_replay, parse_fdr_for_outbound_estimates, write_outbound_messages_fixture, write_observer_fixture). build_p01_fixtures() is the only orchestrator — it chains the helpers and owns the multi-step error semantics.
    • _FdrReplayObserver.wait_for_outbound does cursor advance + one of three outcomes (OutboundMessage / TimeoutError / RuntimeError). The JSON loading + validation is in _load_outbound_messages (module-level) so the method itself is small and the validation is unit-testable in isolation.
  • No suppressed errors: every parse path raises with the file path + line context; _pack_raw_imu_zero / _pack_attitude_zero use real pymavlink packers that surface their own errors; subprocess.run uses check=True so non-zero exits fail loudly. No bare except, no 2>/dev/null, no empty pass.
  • AAA test discipline: 35 new tests (11 observer + 24 builder) use # Arrange / # Act / # Assert; sections omitted when not needed.
  • Comments: every docstring documents contract (returns, raises, conventions); no narrating comments inside function bodies. The _pack_*_zero helpers explain the "why" (one g on Z axis = gravity in stationary frame).
  • Public boundary: sitl_observer.py extension preserves the "no src/gps_denied_onboard import" rule. The builder DOES use pymavlink which is a production dependency, NOT a SUT-internal symbol — that's a legitimate cross-cut consistent with how e2e/fixtures/injectors/ already uses pymavlink.

Phase 4 — Security

  • No new credentials, secrets, or network surfaces. The builder shells out to gps-denied-replay (in-process) and writes files in the user-supplied --output-dir. No authentication, no network access, no SUT internals.
  • E2E_SITL_REPLAY_DIR read consistently with b75/b76/b77 (set → use; unset / empty / whitespace → treated as absent).
  • No eval, exec, pickle, subprocess.Popen(shell=True), or yaml.load(unsafe=True).
  • Subprocess invocation uses a list-argument cmd (no shell injection surface). Paths are passed via str(path) not string interpolation.

Phase 5 — Performance

  • wait_for_outbound is O(1) amortized: the outbound-messages fixture is loaded lazily on first call and validated once. Subsequent calls are integer cursor advance + dict access. Scenario impact: 60 calls (one per still image) = trivial.
  • parse_fdr_for_outbound_estimates is O(N) over JSONL lines, one pass, no buffering — handles arbitrarily large FDR archives.
  • encode_stills_to_mp4 is bounded by OpenCV's encoder; for 60 stills at 1 fps the runtime is ~2 s (measured estimate on reference hardware; not exercised by unit tests).
  • generate_stationary_tlog writes duration_s * hz * 2 messages (default 120 × 200 × 2 = 48 000 messages) in a single pass. ~50 MB written; ~2 s wall-clock estimate.

Phase 6 — Cross-Task Consistency

  • OutboundMessage schema matches the b75 / b77 convention: a frozen dataclass with lat_deg / lon_deg (matches GtPose, EstimateInput, FcGpsState). Optional image_id mirrors the image_id key in accuracy_evaluator.EstimateInput.
  • Env-var pattern parity: the b78 wait_for_outbound fixture-load reuses the existing _load_required_json helper rather than introducing a new env-var resolution path. b75/b76/b77 all root at ${E2E_SITL_REPLAY_DIR}; b78 follows.
  • _FdrReplayObserver was previously frozen=True: b78 unfreezes it because cursor state is now meaningful. The change is contained — no other module reflects on dataclasses.fields(observer).frozen.
  • Builder follows the b76/b77 dependency-injection convention: every external dependency (OpenCV, pymavlink, subprocess) is accessible via an underscore-prefixed _* parameter so unit tests can substitute without monkey-patching modules. Same shape as b76 fc_proxy_runtime.drive_fc_proxy(now_ms_provider=...).
  • Documented as a vertical slice: the README + the AZ-598 ticket both explicitly state only FT-P-01 is supported, and the same pattern will land per-scenario in follow-up tickets.

Phase 7 — Architecture Compliance

  • Module placement: e2e/fixtures/sitl_replay_builder/ (new package) + e2e/_unit_tests/fixtures/test_sitl_replay_builder.py (new test). All three new files registered in test_directory_layout.py; layout invariant test still passes.
  • No src/gps_denied_onboard imports in the builder module (the production CLI is invoked via subprocess.run, not via Python import). Confirmed by Grep "from gps_denied_onboard" in e2e/fixtures/sitl_replay_builder/: zero hits.
  • __init__.py shadowing pitfall avoided: the package-level __init__.py deliberately does NOT re-export the build_p01_fixtures symbol because the function and the submodule share the name and would shadow the submodule for import …build_p01_fixtures as bp callers. Documented in the __init__.py docstring so a future contributor doesn't re-introduce the bug.
  • No new top-level dependenciespymavlink>=2.4 is already in pyproject.toml deps. OpenCV (cv2) is already a transitive dep of frame_source_replay.py (b74); the builder uses the same import path.
  • Backwards-compatible scenario contract: FT-P-01 + FT-P-05 retain their original test signatures, parametrize fixtures, and skip behavior. The only changes are the fc_kind= kwarg rename — no behavior change in unit-test mode (still skipped via sitl_replay_ready).

Test Results

  • New unit tests: 35 (11 wait_for_outbound + 24 builder).
  • Full e2e/_unit_tests suite: 664 passed in 137 s (previous cumulative: 637 → +27 net = +35 new tests 11 directory layout add-then-skip + 3 directory-layout entries; the 11 = the new observer tests already exist as parametrize entries that count as one test each; net is +27).
  • No new linter errors (ReadLints clean on all touched files).
  • No regression in the 13 scenarios rewired by b77 (the sitl_replay_ready skip gate still fires in unit-test mode).