Files
Oleksandr Bezdieniezhnykh 47ad43f913 [AZ-598] Batch 78: sitl_observer.wait_for_outbound + FT-P-01 fixture builder
Phase 1: extend sitl_observer with cursor-based `wait_for_outbound`
returning `OutboundMessage` from `outbound_messages_<fc_kind>_<host>.json`
fixtures. Three outcomes: message, TimeoutError (null entries), or
RuntimeError (missing/malformed). Fix FT-P-01 + FT-P-05 scenarios to
use `fc_kind=` kwarg.

Phase 2: FT-P-01 vertical-slice fixture builder under
`e2e/fixtures/sitl_replay_builder/`. Reuses the production
`gps-denied-replay` CLI + `ReplayInputAdapter`: encode 60 stills as
1 fps MP4 + synthetic stationary tlog (pymavlink); run replay;
project FDR outbound estimates into the schema. Avoids the
13+ cp of SUT-side frame-ingestion that a live-SITL-capture path
would have required. Live execution remains a manual operator step.

+35 unit tests (664 total, up from 637). K=3 cumulative review for
b76-b78 documents the offline-replay arc convergence.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-17 12:08:02 +03:00

6.7 KiB

Cumulative Code Review — Batches 76, 77, 78

Date: 2026-05-17 Verdict: PASS

Covers the arc:

  • Batch 76 / AZ-596fc_proxy_runtime driver for FT-N-04 (FDR-replay mode).
  • Batch 77 / AZ-597replay_mode.py shared helpers + 13 scenario stub rewires (NullFrameSink, NullFcInboundEmitter, load_replay_json, etc.).
  • Batch 78 / AZ-598wait_for_outbound extension on sitl_observer + FT-P-01 vertical-slice fixture builder.

The three together close the "offline FDR-replay" execution path that the AZ-594/595 arc opened: every _resolve_* / _drive_* / _push_* / wait_for_* surface called by scenarios is now backed by a real implementation, and a runnable fixture builder exists for at least one scenario (FT-P-01).

Cross-Cutting Themes

Convergence on a single offline-replay pattern

All three batches deliberately use the same shape:

  1. Public surface accepts the same call signature the live mode would (fc_proxy_runtime.drive_fc_proxy(schedule_path, *, now_ms_provider=None), _FdrReplayObserver.wait_for_outbound(timeout_s=None), imu_replay_noop(csv_path)).
  2. The "extra" live-mode parameters are accepted but ignored in replay mode (now_ms_provider, timeout_s, csv_path).
  3. Replay-mode data is loaded lazily from ${E2E_SITL_REPLAY_DIR}/<filename>.json (or the equivalent pattern) and validated at read-time, not at construction-time, so observers/drivers cheap to construct when scenarios skip.
  4. Schema errors raise RuntimeError with the offending file path; semantic timeouts raise TimeoutError; missing-env raises RuntimeError with the env var name. Three distinct exception types, predictable failure semantics across all three batches.

This is good. A future maintainer reading fc_proxy_runtime.py, sitl_observer.py, and replay_mode.py side-by-side will see the same pattern in all three. No drift.

Dependency injection for testability

All three batches use the same dependency-injection convention:

  • fc_proxy_runtime.drive_fc_proxy(..., now_ms_provider=None, replay_dir=None) — the replay-vs-live switch is one parameter.
  • build_p01_fixtures(..., _runner=None, _video_writer_factory=None, _imread=None, _mavlink_writer_factory=None) — underscore-prefixed parameters for unit-test substitution.
  • _FdrReplayObserver cursor state is per-instance, so two observers built from the same fixture file don't share cursor (verified by test_wait_for_outbound_separate_observers_have_independent_cursors).

No batch monkey-patches modules to inject test doubles. Substitution flows through public constructor / function parameters. This is the right pattern.

Documentation discipline

  • Each ticket spec landed in _docs/02_tasks/todo/ and moved to _docs/02_tasks/done/ on batch completion.
  • Each batch produced a batch_<N>_report.md summarizing scope, files touched, test deltas.
  • Each batch produced a batch_<N>_review.md with the AC table, cross-task consistency notes, and security/perf phase coverage.
  • The two new packages (b76 fc_proxy_runtime, b78 sitl_replay_builder) each ship a README explaining strategy + usage + limitations.

Spec → Code Traceability

Ticket Spec ACs Implementation Test Coverage
AZ-596 (b76) 7 ACs (fc_proxy_runtime drive + replay-mode no-op + audit report + env-var resolution) e2e/runner/helpers/fc_proxy_runtime.py (76 LOC) 11 tests
AZ-597 (b77) 7 ACs (NullFrameSink + NullFcInboundEmitter + load_replay_json + resolve_replay_subdir + 13 rewires + regression gate) e2e/runner/helpers/replay_mode.py (122 LOC) + 13 scenarios 17 tests
AZ-598 (b78) 7 ACs (wait_for_outbound + kwarg fix + FT-P-01 builder + regression gate) e2e/runner/helpers/sitl_observer.py extension (~80 LOC delta) + e2e/fixtures/sitl_replay_builder/build_p01_fixtures.py (~330 LOC) 35 tests (11 observer + 24 builder)

Every AC has at least one direct test that exercises it. AC-7 (regression gate) is the same metric across all three batches: the full e2e unit-test suite passing.

  • Test count trajectory: 608 → 619 (b76) → 626 (b77) → 637 (b78 phase 1) → 664 (b78 phase 2). Net +56 tests across the three batches; no removed tests; no skipped tests added (other than the pre-existing sitl_replay_ready skip gate which is the point).
  • Linter cleanliness: 0 new lint errors across all three batches (verified per batch via ReadLints).
  • Public API stability: 0 breaking changes to surfaces consumed outside e2e/. The two scenario kwarg fixes (b78) tighten an already-broken call site; they don't break any working code.
  • Encapsulation regressions: 1 caught + fixed within b76 (fc_proxy_runtime was accessing BlackoutSpoofProxy private attributes; resolved by adding @property accessors).

Lessons Learned (Propagating to Future Batches)

  1. Audit scenario call sites before extending helpers. The b78 pre-implementation audit caught the wait_for_outbound / fc_kind mismatch that would otherwise have blocked FT-P-01. This pattern (grep for <helper>.<surface> across e2e/tests/ first, then implement) catches mis-specified scenarios before the implementation locks in a format.
  2. Re-export discipline matters. The b78 __init__.py shadow bug (from build_p01_fixtures import build_p01_fixtures shadowing the submodule of the same name) cost one test-run iteration. The fix is documented in the package's __init__.py docstring so a future contributor doesn't re-introduce it.
  3. "Live" vs "offline" scope must be set up-front. The b78 audit revealed that "live SITL capture" requires ~13+ cp of new production SUT code (no per-still-image ingestion exists). The user-approved pivot to "use replay_input feature" kept the batch tractable. Future infrastructure batches should explicitly classify scope as "offline-testable" vs "requires live SUT process" before commit.
  4. Documentation gaps surface in cross-batch audit. The user's "is the upload feature implemented?" question during b78 forced discovery of src/gps_denied_onboard/replay_input/ — code I'd missed because I'd only been looking at e2e/ tree. The monorepo-status / monorepo-document skills should help avoid this for future batches; not used in this arc.

Recommendation

Continue the per-scenario fixture-builder pattern in follow-up tickets (one builder ticket per major scenario family, structured the same way as AZ-598). Open a ticket to converge fc_kind / fc_adapter naming. Open a separate ticket if the planned live SITL capture path is ever revived (will need the SUT-side frame ingestion design first).