[AZ-598] Batch 78: sitl_observer.wait_for_outbound + FT-P-01 fixture builder

Phase 1: extend sitl_observer with cursor-based `wait_for_outbound`
returning `OutboundMessage` from `outbound_messages_<fc_kind>_<host>.json`
fixtures. Three outcomes: message, TimeoutError (null entries), or
RuntimeError (missing/malformed). Fix FT-P-01 + FT-P-05 scenarios to
use `fc_kind=` kwarg.

Phase 2: FT-P-01 vertical-slice fixture builder under
`e2e/fixtures/sitl_replay_builder/`. Reuses the production
`gps-denied-replay` CLI + `ReplayInputAdapter`: encode 60 stills as
1 fps MP4 + synthetic stationary tlog (pymavlink); run replay;
project FDR outbound estimates into the schema. Avoids the
13+ cp of SUT-side frame-ingestion that a live-SITL-capture path
would have required. Live execution remains a manual operator step.

+35 unit tests (664 total, up from 637). K=3 cumulative review for
b76-b78 documents the offline-replay arc convergence.

Co-authored-by: Cursor <cursoragent@cursor.com>
This commit is contained in:
Oleksandr Bezdieniezhnykh
2026-05-17 12:08:02 +03:00
parent f49d803252
commit 47ad43f913
14 changed files with 1940 additions and 8 deletions
@@ -0,0 +1,138 @@
# Cumulative Code Review — Batches 76, 77, 78
**Date**: 2026-05-17
**Verdict**: PASS
Covers the arc:
* **Batch 76 / AZ-596** — `fc_proxy_runtime` driver for FT-N-04
(FDR-replay mode).
* **Batch 77 / AZ-597** — `replay_mode.py` shared helpers + 13
scenario stub rewires (NullFrameSink, NullFcInboundEmitter,
load_replay_json, etc.).
* **Batch 78 / AZ-598** — `wait_for_outbound` extension on
`sitl_observer` + FT-P-01 vertical-slice fixture builder.
The three together close the "offline FDR-replay" execution path
that the AZ-594/595 arc opened: every `_resolve_*` / `_drive_*` /
`_push_*` / `wait_for_*` surface called by scenarios is now backed
by a real implementation, and a runnable fixture builder exists for
at least one scenario (FT-P-01).
## Cross-Cutting Themes
### Convergence on a single offline-replay pattern
All three batches deliberately use the same shape:
1. Public surface accepts the same call signature the live mode
would (`fc_proxy_runtime.drive_fc_proxy(schedule_path, *, now_ms_provider=None)`,
`_FdrReplayObserver.wait_for_outbound(timeout_s=None)`,
`imu_replay_noop(csv_path)`).
2. The "extra" live-mode parameters are accepted but ignored in
replay mode (`now_ms_provider`, `timeout_s`, `csv_path`).
3. Replay-mode data is loaded lazily from
`${E2E_SITL_REPLAY_DIR}/<filename>.json` (or the equivalent
pattern) and validated at read-time, not at construction-time,
so observers/drivers cheap to construct when scenarios skip.
4. Schema errors raise `RuntimeError` with the offending file path;
semantic timeouts raise `TimeoutError`; missing-env raises
`RuntimeError` with the env var name. Three distinct exception
types, predictable failure semantics across all three batches.
This is good. A future maintainer reading `fc_proxy_runtime.py`,
`sitl_observer.py`, and `replay_mode.py` side-by-side will see the
same pattern in all three. No drift.
### Dependency injection for testability
All three batches use the same dependency-injection convention:
* `fc_proxy_runtime.drive_fc_proxy(..., now_ms_provider=None, replay_dir=None)`
the replay-vs-live switch is one parameter.
* `build_p01_fixtures(..., _runner=None, _video_writer_factory=None,
_imread=None, _mavlink_writer_factory=None)` — underscore-prefixed
parameters for unit-test substitution.
* `_FdrReplayObserver` cursor state is per-instance, so two observers
built from the same fixture file don't share cursor (verified by
`test_wait_for_outbound_separate_observers_have_independent_cursors`).
No batch monkey-patches modules to inject test doubles. Substitution
flows through public constructor / function parameters. This is the
right pattern.
### Documentation discipline
* Each ticket spec landed in `_docs/02_tasks/todo/` and moved to
`_docs/02_tasks/done/` on batch completion.
* Each batch produced a `batch_<N>_report.md` summarizing scope,
files touched, test deltas.
* Each batch produced a `batch_<N>_review.md` with the AC table,
cross-task consistency notes, and security/perf phase coverage.
* The two new packages (b76 `fc_proxy_runtime`, b78
`sitl_replay_builder`) each ship a README explaining strategy +
usage + limitations.
## Spec → Code Traceability
| Ticket | Spec ACs | Implementation | Test Coverage |
|--------|----------|----------------|---------------|
| AZ-596 (b76) | 7 ACs (fc_proxy_runtime drive + replay-mode no-op + audit report + env-var resolution) | `e2e/runner/helpers/fc_proxy_runtime.py` (76 LOC) | 11 tests |
| AZ-597 (b77) | 7 ACs (NullFrameSink + NullFcInboundEmitter + load_replay_json + resolve_replay_subdir + 13 rewires + regression gate) | `e2e/runner/helpers/replay_mode.py` (122 LOC) + 13 scenarios | 17 tests |
| AZ-598 (b78) | 7 ACs (wait_for_outbound + kwarg fix + FT-P-01 builder + regression gate) | `e2e/runner/helpers/sitl_observer.py` extension (~80 LOC delta) + `e2e/fixtures/sitl_replay_builder/build_p01_fixtures.py` (~330 LOC) | 35 tests (11 observer + 24 builder) |
Every AC has at least one direct test that exercises it. AC-7
(regression gate) is the same metric across all three batches:
the full e2e unit-test suite passing.
## Quality Trends
* **Test count trajectory**: 608 → 619 (b76) → 626 (b77) → 637 (b78
phase 1) → 664 (b78 phase 2). Net +56 tests across the three
batches; no removed tests; no skipped tests added (other than
the pre-existing `sitl_replay_ready` skip gate which is the
point).
* **Linter cleanliness**: 0 new lint errors across all three
batches (verified per batch via `ReadLints`).
* **Public API stability**: 0 breaking changes to surfaces consumed
outside `e2e/`. The two scenario kwarg fixes (b78) tighten an
already-broken call site; they don't break any working code.
* **Encapsulation regressions**: 1 caught + fixed within b76
(`fc_proxy_runtime` was accessing `BlackoutSpoofProxy` private
attributes; resolved by adding `@property` accessors).
## Lessons Learned (Propagating to Future Batches)
1. **Audit scenario call sites before extending helpers**. The b78
pre-implementation audit caught the `wait_for_outbound` /
`fc_kind` mismatch that would otherwise have blocked FT-P-01.
This pattern (grep for `<helper>.<surface>` across `e2e/tests/`
first, then implement) catches mis-specified scenarios before
the implementation locks in a format.
2. **Re-export discipline matters**. The b78 `__init__.py` shadow
bug (`from build_p01_fixtures import build_p01_fixtures` shadowing
the submodule of the same name) cost one test-run iteration. The
fix is documented in the package's `__init__.py` docstring so a
future contributor doesn't re-introduce it.
3. **"Live" vs "offline" scope must be set up-front**. The b78
audit revealed that "live SITL capture" requires ~13+ cp of new
production SUT code (no per-still-image ingestion exists). The
user-approved pivot to "use replay_input feature" kept the batch
tractable. Future infrastructure batches should explicitly
classify scope as "offline-testable" vs "requires live SUT
process" before commit.
4. **Documentation gaps surface in cross-batch audit**. The user's
"is the upload feature implemented?" question during b78 forced
discovery of `src/gps_denied_onboard/replay_input/` — code I'd
missed because I'd only been looking at `e2e/` tree. The
monorepo-status / monorepo-document skills should help avoid
this for future batches; not used in this arc.
## Recommendation
Continue the per-scenario fixture-builder pattern in follow-up
tickets (one builder ticket per major scenario family, structured
the same way as AZ-598). Open a ticket to converge `fc_kind` /
`fc_adapter` naming. Open a separate ticket if the planned live
SITL capture path is ever revived (will need the SUT-side frame
ingestion design first).