[AZ-398] Replay: FrameSource + Clock Protocols + Clock injection

Ship the two Layer-1 cross-cutting Protocols replay mode needs to leave
production C1-C5 components mode-agnostic (Invariant 1) and replay-
deterministic (Invariant 2). Live + replay binaries see the same
interfaces; only the strategy differs.

* Clock Protocol (monotonic_ns / time_ns / sleep_until_ns) +
  WallClock (live + REALTIME replay) + TlogDerivedClock (ASAP replay;
  advance-on-call; non-monotonic source → ClockOrderingError).
* FrameSource Protocol (next_frame -> NavCameraFrame | None / close)
  + LiveCameraFrameSource (cv2.VideoCapture device index) +
  VideoFileFrameSource (cv2.VideoCapture file).
* Build-flag gating: BUILD_VIDEO_FILE_FRAME_SOURCE,
  BUILD_LIVE_CAMERA_FRAME_SOURCE (constructor-time check; Tier-0 OFF
  refuses construction with FrameSourceConfigError).
* Composition-root factories: build_clock + build_frame_source.
* Injected Clock across every component that previously called
  time.monotonic_ns() / time.sleep() directly: c5_state (estimator,
  ESKF, fallback watcher, source-label SM, isam2 handle), c8_fc_adapter
  (inbound MAVLink + MSP2, AP outbound, iNav outbound, QGC GCS),
  c13_fdr writer, c12_operator_tooling httpx flights client. All
  constructors default to WallClock() so existing call sites keep
  live-binary behaviour without a wiring change.
* AC-4 CI guard (tests/_meta/test_no_direct_time_in_components.py)
  AST-scans components/**/*.py for direct time.monotonic_ns /
  time.time_ns / time.sleep references and fails loudly with file:line.
* Conformance + factory tests: tests/unit/clock + tests/unit/frame_source.
* Test fixture updates: FallbackWatcher / SourceLabelStateMachine
  clock_ns is now required (removed time.monotonic_ns default);
  test_az388 patches estimator._clock instead of a module-level time;
  test_az393 ardupilot adapter uses a _FixedClock test double.

Excluded per the task spec: TlogReplayFcAdapter (AZ-399), ReplaySink
(AZ-400), compose_replay (AZ-401), CLI (AZ-402), Docker/CI (AZ-403),
E2E fixture (AZ-404), IMU auto-sync (AZ-405).

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-12 05:10:01 +03:00
parent 6c7d24f7e0
commit 823c0f1b2e
32 changed files with 1575 additions and 105 deletions
@@ -1,104 +0,0 @@
# Replay — FrameSource + Clock Protocols + LiveCameraFrameSource retrofit + VideoFileFrameSource
**Task**: AZ-398_replay_frame_source_clock
**Name**: `FrameSource` Protocol + `Clock` Protocol + `LiveCameraFrameSource` retrofit + `VideoFileFrameSource`
**Description**: Define the cross-cutting `FrameSource` Protocol (PEP 544 `@runtime_checkable`) at `src/gps_denied_onboard/frame_source/interface.py` (Layer 1 Foundation per `module-layout.md`) and the `Clock` Protocol at `src/gps_denied_onboard/clock/interface.py`. Implement `WallClock` (default for live/research/operator binaries) and `TlogDerivedClock` (replay-only; advances per tlog timestamps). Retrofit the existing camera-ingest plumbing into `LiveCameraFrameSource` — a NO-OP behaviour change wrapping the current ingest thread in the new Protocol shape. Implement `VideoFileFrameSource` (replay-only; gated `BUILD_VIDEO_FILE_FRAME_SOURCE`) using OpenCV `cv2.VideoCapture` to stream-decode an MP4/MKV/AVI file frame-by-frame at the configured `frame_rate_hz`; assigns `monotonic_ns` from `Clock` per Invariant 6. NO tlog parsing, NO replay composition, NO sink.
**Complexity**: 3 points
**Dependencies**: AZ-263, AZ-269, AZ-270, AZ-266; AZ-272 (FDR for the camera-source open/close events)
**Component**: replay-cross-cutting (epic AZ-265 / E-DEMO-REPLAY) — touches `frame_source/`, `clock/`, and runs a no-op restructure on existing camera plumbing (audited in code review)
**Tracker**: AZ-398
**Epic**: AZ-265 (E-DEMO-REPLAY)
### Document Dependencies
- `_docs/02_document/contracts/replay/replay_protocol.md` — `FrameSource` + `Clock` Protocols.
- `_docs/02_document/components/01_c1_vio/description.md` — current camera ingest semantics (the LiveCamera retrofit must preserve them).
- `_docs/02_document/module-layout.md` — Layer 1 Foundation; `frame_source/` and `clock/` directories.
- `_docs/02_document/architecture.md` — ADR-001, ADR-002, ADR-009.
## Problem
Without this task, replay mode has no formalised camera-frame interface — every consumer (C1) reaches into the implicit ingest thread directly, defeating mode-agnosticism (Invariant 1). The `Clock` Protocol is similarly missing — components calling `time.monotonic_ns()` directly in production code prevent replay determinism per R-DEMO-4 / Invariant 2. This task ships the foundation; downstream replay tasks depend on it.
## Outcome
- `src/gps_denied_onboard/frame_source/interface.py` — `FrameSource` Protocol with `next_frame()` and `close()`.
- `src/gps_denied_onboard/frame_source/__init__.py` — re-exports `FrameSource`.
- `src/gps_denied_onboard/frame_source/live_camera.py` — `LiveCameraFrameSource` (retrofit; gated `BUILD_LIVE_CAMERA_FRAME_SOURCE=ON` for live/research/operator/replay; `OFF` only for unit tests).
- `src/gps_denied_onboard/frame_source/video_file.py` — `VideoFileFrameSource` (gated `BUILD_VIDEO_FILE_FRAME_SOURCE=ON` only for replay binary).
- `src/gps_denied_onboard/clock/interface.py` — `Clock` Protocol with `monotonic_ns()`, `time_ns()`, `sleep_until_ns()`.
- `src/gps_denied_onboard/clock/__init__.py` — re-exports `Clock`.
- `src/gps_denied_onboard/clock/wall_clock.py` — `WallClock` strategy.
- `src/gps_denied_onboard/clock/tlog_derived.py` — `TlogDerivedClock` strategy (advances on each emitted frame; documented as advance-on-call semantics).
- Existing camera-ingest call sites refactored to use `FrameSource` interface via constructor injection (NO behaviour change; AC-3 verifies).
- Existing `time.monotonic_ns()` / `time.time_ns()` call sites in components refactored to use injected `Clock` (NO behaviour change; AC-4 verifies via AST scan).
- Build-flag wiring for the three new flags.
- Unit tests: Protocol conformance, frame ordering invariant, end-of-stream None semantics, transient-failure raises, build-flag gating, Clock determinism (TlogDerivedClock with mocked tlog).
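A minimal sketch of how the `Clock` surface listed above could look, assuming a plain iterable of nanosecond timestamps stands in for the tlog source. The constructor argument `timestamps_ns`, the choice to hold the last value once the source is exhausted, and the non-advancing `time_ns()` are illustrative assumptions, not the shipped implementation.

```python
import time
from typing import Iterable, Iterator, Optional, Protocol, runtime_checkable


class ClockOrderingError(RuntimeError):
    """Raised when the timestamp source goes backwards."""


@runtime_checkable
class Clock(Protocol):
    def monotonic_ns(self) -> int: ...
    def time_ns(self) -> int: ...
    def sleep_until_ns(self, target_ns: int) -> None: ...


class WallClock:
    """Delegates to the standard library (live binaries)."""

    def monotonic_ns(self) -> int:
        return time.monotonic_ns()

    def time_ns(self) -> int:
        return time.time_ns()

    def sleep_until_ns(self, target_ns: int) -> None:
        remaining_ns = target_ns - time.monotonic_ns()
        if remaining_ns > 0:
            time.sleep(remaining_ns / 1e9)


class TlogDerivedClock:
    """Advance-on-call: each monotonic_ns() consumes one source timestamp."""

    def __init__(self, timestamps_ns: Iterable[int]) -> None:
        self._source: Iterator[int] = iter(timestamps_ns)
        self._last_ns: Optional[int] = None

    def monotonic_ns(self) -> int:
        nxt = next(self._source, None)
        if nxt is None:
            if self._last_ns is None:
                raise ClockOrderingError("timestamp source is empty")
            # Assumption: hold the last value once the source is exhausted.
            return self._last_ns
        if self._last_ns is not None and nxt < self._last_ns:
            raise ClockOrderingError(
                f"timestamp {nxt} ns precedes previous {self._last_ns} ns"
            )
        self._last_ns = nxt
        return nxt

    def time_ns(self) -> int:
        # Assumption: report the last replayed timestamp without advancing.
        return self._last_ns if self._last_ns is not None else 0

    def sleep_until_ns(self, target_ns: int) -> None:
        # ASAP pacing: never block.
        return None
```

Because both classes satisfy the Protocol structurally, a component that accepts a `Clock` in its constructor does not need to know which strategy it received.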
## Scope
### Included
- Both Protocols.
- 4 strategy classes (LiveCamera + VideoFile + WallClock + TlogDerived).
- Retrofit of existing camera-ingest into LiveCameraFrameSource (NO behaviour change).
- AST-scan / grep CI guard for direct `time.monotonic_ns` usage in `components/` (Invariant 2 enforcement).
- Build-flag gating + factory/composition extension.
- Unit tests including ordering + EOS semantics + Clock determinism.
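The CI guard in the list above can be sketched with the standard `ast` module. This is a hedged illustration only: the function name `find_direct_time_refs` is hypothetical, it takes source text rather than walking `components/**/*.py`, and it only recognises the plain `time.<name>` and `from time import <name>` spellings (an aliased `import time as t` would slip through).

```python
import ast
from typing import List

# Names that components must reach only through an injected Clock.
FORBIDDEN = {"monotonic_ns", "time_ns", "sleep"}


def find_direct_time_refs(source: str, filename: str = "<memory>") -> List[str]:
    """Return 'file:line' for every direct time.<forbidden> reference."""
    lines = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        # Matches attribute access such as time.monotonic_ns or time.sleep.
        if (
            isinstance(node, ast.Attribute)
            and node.attr in FORBIDDEN
            and isinstance(node.value, ast.Name)
            and node.value.id == "time"
        ):
            lines.append(node.lineno)
        # Matches 'from time import sleep' style imports.
        elif isinstance(node, ast.ImportFrom) and node.module == "time":
            if any(alias.name in FORBIDDEN for alias in node.names):
                lines.append(node.lineno)
    return [f"{filename}:{lineno}" for lineno in sorted(lines)]
```

A test built on this would iterate over the component files, collect the hits, and fail with the joined `file:line` list when it is non-empty.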
### Excluded
- `TlogReplayFcAdapter` — owned by next task.
- `ReplaySink` + `JsonlReplaySink` — owned by sink task.
- `compose_replay` — owned by composition task.
- CLI entrypoint — owned by CLI task.
- Component-internal acceptance tests — deferred to E-BBT.
## Acceptance Criteria
**AC-1: Protocol conformance** — `runtime_checkable` `isinstance` returns True for `LiveCameraFrameSource`, `VideoFileFrameSource` against `FrameSource`; for `WallClock`, `TlogDerivedClock` against `Clock`.
**AC-2: VideoFileFrameSource produces ordered frames** — synthetic 60-frame test video; assert `next_frame()` returns 60 frames in `monotonic_ns` non-decreasing order; 61st call returns `None`; subsequent calls also `None`; `close()` is idempotent.
**AC-3: LiveCameraFrameSource retrofit no-op** — bring up the existing live camera path through `LiveCameraFrameSource`; capture 100 frames; assert frame metadata (camera_id, intrinsics, exposure timestamp) match the pre-retrofit baseline byte-for-byte (golden test).
**AC-4: Component AST scan passes** — CI guard scans `src/gps_denied_onboard/components/**/*.py` and asserts NO direct references to `time.monotonic_ns`, `time.time_ns`, `time.sleep` (Invariant 2). Test failure prints the offending file:line.
**AC-5: WallClock parity** — `|WallClock.monotonic_ns() - time.monotonic_ns()|` ≤ 1 ms; `|WallClock.time_ns() - time.time_ns()|` ≤ 1 ms; `sleep_until_ns(now + 100 ms)` blocks for 100 ms ± 5 ms.
**AC-6: TlogDerivedClock advance-on-call** — construct with a tlog-timestamp source emitting 1, 2, 3 ms; call `monotonic_ns()` three times; assert returned values are 1e6, 2e6, 3e6 ns; out-of-order tlog timestamps raise `ClockOrderingError`.
**AC-7: VideoFileFrameSource transient-failure raises** — corrupt video file → `FrameSourceError("video decode failed at frame N")` (NOT silent None per Invariant 4).
**AC-8: Build-flag gating** — `BUILD_VIDEO_FILE_FRAME_SOURCE=OFF` → constructing `VideoFileFrameSource` raises `FrameSourceConfigError("BUILD_VIDEO_FILE_FRAME_SOURCE is OFF...")`. Same for `LiveCameraFrameSource` with its flag.
**AC-9: Public API re-exports** — `from gps_denied_onboard.frame_source import FrameSource` and `from gps_denied_onboard.clock import Clock` resolve; concrete strategies are NOT in `__all__` (composition root only).
**AC-10: Frame source close is idempotent** — `close()` called twice does not raise; the second call is a no-op + DEBUG log.
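The frame-source criteria above (conformance, ordered frames, end-of-stream `None`, constructor-time gating, idempotent `close()`) can be exercised without OpenCV. The sketch below is an assumption-laden stand-in: `InMemoryFrameSource`, its `enabled` argument (standing in for the build flag) and the two-field `NavCameraFrame` are hypothetical and exist only to show the Protocol shape.

```python
import logging
from dataclasses import dataclass
from typing import Optional, Protocol, runtime_checkable

log = logging.getLogger(__name__)


class FrameSourceConfigError(RuntimeError):
    """Raised when a frame source is constructed with its build flag OFF."""


@dataclass(frozen=True)
class NavCameraFrame:
    # Hypothetical stand-in; the real frame type carries image data too.
    index: int
    monotonic_ns: int


@runtime_checkable
class FrameSource(Protocol):
    def next_frame(self) -> Optional[NavCameraFrame]: ...
    def close(self) -> None: ...


class InMemoryFrameSource:
    """Synthetic source emitting frame_count frames at a fixed period."""

    def __init__(self, frame_count: int, period_ns: int, enabled: bool = True) -> None:
        if not enabled:
            # Constructor-time check, mirroring the build-flag gating in AC-8.
            raise FrameSourceConfigError("BUILD_VIDEO_FILE_FRAME_SOURCE is OFF")
        self._count = frame_count
        self._period_ns = period_ns
        self._next = 0
        self._closed = False

    def next_frame(self) -> Optional[NavCameraFrame]:
        # End of stream is signalled with None on every subsequent call.
        if self._closed or self._next >= self._count:
            return None
        frame = NavCameraFrame(self._next, self._next * self._period_ns)
        self._next += 1
        return frame

    def close(self) -> None:
        if self._closed:
            log.debug("close() called on an already-closed frame source")
            return
        self._closed = True
```

A real `VideoFileFrameSource` would differ in how frames are produced and in raising `FrameSourceError` on decode failure (AC-7), which this sketch does not model.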
## Non-Functional Requirements
- `VideoFileFrameSource.next_frame()` p99 ≤ 5 ms on Tier-1 hardware for 1080p (target throughput ≥ 5× real time per epic NFT).
- `WallClock.monotonic_ns()` p99 ≤ 1 µs.
## Constraints
- OpenCV (`cv2`) is the existing camera dependency — reused by `VideoFileFrameSource` to avoid adding a new decoder dep.
- The retrofit MUST NOT change live-camera behaviour — AC-3 enforces (golden test).
- The AST-scan CI guard is a NEW CI step; it must run on every PR touching `src/gps_denied_onboard/components/**`.
- `Clock.sleep_until_ns(target)` honours `pace=ASAP` by no-op'ing in `TlogDerivedClock`.
## Risks & Mitigation
- **Risk: Retrofit breaks subtle frame-ingest timing in C1** — *Mitigation*: AC-3 golden test against pre-retrofit baseline; revert if the C1 unit tests regress.
- **Risk: AST scan false-positives on legitimate `time.monotonic_ns` in non-component cross-cutting code** — *Mitigation*: scan path scoped to `components/**` only; cross-cutting `clock/` directory exempt.
- **Risk: TlogDerivedClock advance-on-call semantics are confusing for the AC-5.2 fallback timer** — *Mitigation*: documented in the contract; AC-5.2 fallback timer's behaviour in replay-ASAP is "fires immediately when the gap exceeds 3 s of tlog-derived time" (which is exactly what we want for deterministic regression tests).
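To make the last risk concrete, here is a hedged sketch of a fallback timer driven by an injected clock. `FallbackTimer`, `note_update`, `expired` and `ScriptedClock` are hypothetical names, not the C5 fallback watcher's actual API. Because the timer reads only the injected clock, a scripted timestamp sequence makes the 3 s expiry deterministic with no real sleeping.

```python
from typing import Iterable

THREE_SECONDS_NS = 3_000_000_000


class ScriptedClock:
    """Test double: each monotonic_ns() call returns the next scripted value."""

    def __init__(self, timestamps_ns: Iterable[int]) -> None:
        self._source = iter(timestamps_ns)

    def monotonic_ns(self) -> int:
        return next(self._source)


class FallbackTimer:
    """Expires when the gap since the last update exceeds timeout_ns."""

    def __init__(self, clock, timeout_ns: int = THREE_SECONDS_NS) -> None:
        self._clock = clock
        self._timeout_ns = timeout_ns
        self._last_update_ns = clock.monotonic_ns()

    def note_update(self) -> None:
        # Record the time of the most recent good update.
        self._last_update_ns = self._clock.monotonic_ns()

    def expired(self) -> bool:
        # Every check consumes one clock reading (advance-on-call).
        return self._clock.monotonic_ns() - self._last_update_ns > self._timeout_ns
```

Note that under advance-on-call semantics every `expired()` check itself moves replay time forward, which is the behaviour the contract has to document.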
## Runtime Completeness
- **Named capability**: cross-cutting `FrameSource` + `Clock` Protocols + 4 concrete strategies.
- **Production code**: real Protocol surfaces, real OpenCV-based VideoFileFrameSource, real LiveCameraFrameSource retrofit, real WallClock + TlogDerivedClock.
- **Allowed external stubs**: test fakes only.
- **Unacceptable substitutes**: keeping the existing camera-ingest as a free function (defeats Invariant 1); leaving direct `time.monotonic_ns()` in components (defeats Invariant 2).
## Contract
Implements `_docs/02_document/contracts/replay/replay_protocol.md` — `FrameSource` + `Clock` Protocols.