Decompose Step 6 snapshot: 140 task specs + contract docs

Closes out greenfield Step 6 (Decompose) for all 14 components
(C1-C13 + cross-cutting helpers/replay). Covers tasks AZ-266..AZ-446
plus the _dependencies_table.md and component contract documents.

State file updated to greenfield Step 7 (Implement), not_started.

Co-authored-by: Cursor <cursoragent@cursor.com>
This commit is contained in:
Oleksandr Bezdieniezhnykh
2026-05-11 00:39:48 +03:00
parent 8171fcb29e
commit 880eabcb3f
172 changed files with 22897 additions and 35 deletions
@@ -0,0 +1,161 @@
# Contract: Replay Mode (`FrameSource` + `ReplaySink` + `Clock` + replay composition)
**Owner**: replay (epic AZ-265 / E-DEMO-REPLAY) — strategies live inside existing components (`frame_source/`, `c8_fc_adapter/`); only the composition root and CLI are net-new top-level files.
**Producer task**: AZ-398 (`FrameSource` Protocol + `VideoFileFrameSource` + `LiveCameraFrameSource` retrofit + `Clock` Protocol)
**Consumer tasks**: AZ-399 (TlogReplayFcAdapter), AZ-400 (ReplaySink + JsonlReplaySink), AZ-401 (compose_replay + Clock injection), AZ-402 (gps-denied-replay CLI), AZ-403 (Dockerfile + CI matrix + SBOM diff), AZ-404 (E2E replay fixture test), AZ-405 (Auto-sync IMU take-off detection).
**Version**: 1.0.0
**Status**: draft
**Last Updated**: 2026-05-10
**Module-layout home**:
- `src/gps_denied_onboard/frame_source/interface.py`, `__init__.py``FrameSource` Protocol (Layer 1 cross-cutting per `module-layout.md`).
- `src/gps_denied_onboard/components/c8_fc_adapter/tlog_replay_adapter.py``TlogReplayFcAdapter` (gated `BUILD_TLOG_REPLAY_ADAPTER`).
- `src/gps_denied_onboard/components/c8_fc_adapter/replay_sink.py``ReplaySink` interface + `JsonlReplaySink` (gated `BUILD_REPLAY_SINK_JSONL`).
- `src/gps_denied_onboard/clock/interface.py`, `__init__.py``Clock` Protocol.
- `src/gps_denied_onboard/runtime_root/replay.py``compose_replay(config) -> ReplayRoot`.
## Purpose
Defines the public interfaces enabling **offline replay mode** per epic AZ-265: run the production C1C5 pipeline against historical inputs (12 min Derkachi-style clip + matching pymavlink `.tlog`) so the parent-suite UI demo has end-to-end fidelity equal to a live flight. Production C1C5 components MUST remain mode-agnostic — replay-aware logic lives ONLY in the composition root, the new strategies, and the CLI. The replay binary is a fourth Docker image (`gps-denied-replay-cli`) containing C1C5 + replay strategies but NOT C6/C10/C11/C12 (no operator-side workflows; tile cache is read pre-built).
This contract defines four Protocols and the replay composition surface:
- **`FrameSource`** — the formalised cross-cutting interface for camera-frame ingestion (previously implicit). Two strategies: `LiveCameraFrameSource` (retrofit; existing camera plumbing renamed and put behind the Protocol) and `VideoFileFrameSource` (replay-only, gated `BUILD_VIDEO_FILE_FRAME_SOURCE`).
- **`Clock`** — the wall-clock vs. tlog-derived time abstraction (R-DEMO-4 mitigation). Two strategies: `WallClock` (live/research/operator) and `TlogDerivedClock` (replay only).
- **`ReplaySink`** — the offline `EstimatorOutput` consumer interface. One strategy: `JsonlReplaySink` (one `EstimatorOutput` per JSONL line; gated `BUILD_REPLAY_SINK_JSONL`).
- **`TlogReplayFcAdapter`** — replay-only `FcAdapter` strategy (per AZ-261 `FcAdapter` Protocol from `_docs/02_document/contracts/c8_fc_adapter/fc_adapter_protocol.md`); parses pymavlink `.tlog` and emits `ImuWindow` / `AttitudeWindow` / `GpsHealth` / `FlightStateSignal` at tlog-timestamp cadence (or wall-clock-paced per `--pace`). Gated `BUILD_TLOG_REPLAY_ADAPTER`.
The shared `WgsConverter` (AZ-279) is constructor-injected into the tlog adapter for tlog-GPS → local-tangent-plane conversion.
## Public API
### Protocol: `FrameSource`
```python
@runtime_checkable
class FrameSource(Protocol):
def next_frame(self) -> NavCameraFrame | None: ... # None on end-of-stream
def close(self) -> None: ...
```
### Protocol: `Clock`
```python
@runtime_checkable
class Clock(Protocol):
def monotonic_ns(self) -> int: ...
def time_ns(self) -> int: ... # wall-clock (UTC) for log timestamps
def sleep_until_ns(self, target_ns: int) -> None: ... # honoured in --pace realtime; no-op in --pace asap
```
### Protocol: `ReplaySink`
```python
@runtime_checkable
class ReplaySink(Protocol):
def emit(self, output: EstimatorOutput) -> None: ...
def close(self) -> None: ...
```
### Concrete: `TlogReplayFcAdapter`
```python
class TlogReplayFcAdapter(FcAdapter):
def __init__(
self,
tlog_path: Path,
target_fc_dialect: FcKind, # ARDUPILOT_PLANE | INAV
clock: Clock,
wgs_converter: WgsConverter,
time_offset_ms: int = 0, # auto-detected by AZ-405 auto-sync task or set via --time-offset-ms
pace: ReplayPace = ReplayPace.ASAP, # REALTIME | ASAP
): ...
```
The `TlogReplayFcAdapter` implements the full `FcAdapter` Protocol from AZ-261. `emit_external_position` raises `FcEmitError("replay adapter does not emit to FC")` (replay is read-only on the FC side; downstream consumers use `ReplaySink` instead). `request_source_set_switch` raises `SourceSetSwitchNotSupportedError`. `subscribe_telemetry` is the primary surface — fans out IMU/attitude/GPS-health/flight-state from the tlog at the configured pace.
### CLI surface
```
gps-denied-replay
--video PATH
--tlog PATH
--output results.jsonl
--camera-calibration calib.json
--config config.yaml
[--pace {realtime,asap}] # default asap
[--time-offset-ms N] # overrides auto-sync
```
### Composition root extension
```python
def compose_replay(config: Config) -> ReplayRoot: ...
```
`ReplayRoot` is a dataclass holding all wired components plus the `FrameSource`, `TlogReplayFcAdapter`, `ReplaySink`, and `Clock` chosen for the replay run. The runtime loop is:
```
loop:
frame = frame_source.next_frame()
if frame is None: break
c1 = vio.process(frame) # C1
candidates = vpr.lookup(c1) # C2
reranked = rerank.rerank(candidates) # C2.5
matched = matcher.match(reranked) # C3
refined = refiner.refine_if_needed(matched) # C3.5
pose = pose_estimator.estimate(refined) # C4
state.add_pose_anchor(pose) # C5
state.add_vio(c1.vio_output) # C5
output = state.current_estimate()
replay_sink.emit(output)
replay_sink.close()
```
The tlog adapter's `subscribe_telemetry` callbacks are wired to C5's `add_fc_imu` and to C1's IMU prior on the same threads as in the live binary.
## Invariants
1. **Mode-agnostic C1C5**: production components MUST NOT contain `if replay_mode:` branches. Mode-specific behaviour lives in the strategy (Frame source / FC adapter / Sink / Clock). Verified by an explicit grep guard in CI.
2. **Single `Clock` per process**: the composition root resolves `Clock` exactly once at startup. All time-driven logic (AC-5.2 fallback timer, STATUSTEXT rate-limits, key rotation logging) consumes the injected `Clock` via constructor — never `time.monotonic_ns()` directly. Verified by an AST scan in CI for direct `time.monotonic_ns` / `time.time_ns` references in components.
3. **Frame source ordering**: `next_frame()` returns frames in monotonically non-decreasing `monotonic_ns` order. Out-of-order frames raise `FrameSourceError` (NOT silently dropped — replay must be deterministic).
4. **End-of-stream is None**: `next_frame()` returns `None` ONLY when the stream is permanently exhausted. Transient I/O failures raise `FrameSourceError`.
5. **TlogReplayFcAdapter emit-only-via-sink**: `emit_external_position` and `emit_status_text` raise `FcEmitError("replay adapter does not emit to FC")`. Downstream consumers MUST emit to `ReplaySink` instead.
6. **Pace mode honoured by Clock**: `pace=REALTIME``Clock.sleep_until_ns(target_ns)` blocks until wall-clock catches up; `pace=ASAP` → no-op. The pace flag is consumed ONLY by the `Clock` and the tlog adapter — components see only the `Clock` Protocol.
7. **JsonlReplaySink one-line-per-emit**: each `emit(output)` writes exactly one JSON object + newline; the file is fsync'd on `close()`. Schema matches `EstimatorOutput` (frozen dataclass serialised via `dataclasses.asdict` + `orjson.dumps`).
8. **Time-offset honoured**: when constructed with `time_offset_ms != 0`, the tlog adapter shifts every emitted timestamp by that offset before passing to subscribers. `time_offset_ms` is set ONCE at construction (no live re-tuning).
9. **Build-flag gating**: `VideoFileFrameSource`, `TlogReplayFcAdapter`, `JsonlReplaySink` MUST refuse construction when their respective `BUILD_*` flag is OFF (per ADR-002 — replay binary has them ON; airborne / research / operator have them OFF).
10. **Determinism**: same `(video, tlog, config, time_offset_ms, pace=ASAP)` input → same JSONL output within ≤ 1e-6 float drift in position fields (AC-5).
## Producer / Consumer Split
| Task ID | Scope |
|---------|-------|
| AZ-398 (Producer) | `FrameSource` Protocol; `Clock` Protocol; `VideoFileFrameSource` (gated `BUILD_VIDEO_FILE_FRAME_SOURCE`); `LiveCameraFrameSource` retrofit (rename existing camera-ingest plumbing into the Protocol shape — no behaviour change); `WallClock` + `TlogDerivedClock` strategies; composition wiring in the existing `compose_root`/`compose_operator` (Clock = WallClock there). NO tlog parsing, NO sink, NO replay composition. |
| AZ-399 (Consumer 1) | `TlogReplayFcAdapter`: pymavlink stream-parser (DO NOT materialise; R-DEMO-2 throughput floor); maps tlog message types → `FcTelemetryFrame`; supports both AP and iNav dialects; `subscribe_telemetry` fan-out at the configured pace; respects `time_offset_ms`; honours `Clock` for pacing; fail-fast at startup if required message types absent (R-DEMO-3). |
| AZ-400 (Consumer 2) | `ReplaySink` Protocol + `JsonlReplaySink` (one JSON object per line; orjson serialiser; `close()` fsyncs). |
| AZ-401 (Consumer 3) | `compose_replay(config) -> ReplayRoot`: full strategy resolution for the replay binary; `Clock` strategy selection (TlogDerivedClock for ASAP, WallClock for REALTIME; documented per R-DEMO-4); `FrameSource` = `VideoFileFrameSource`; `FcAdapter` = `TlogReplayFcAdapter`; `Sink` = `JsonlReplaySink`; ALL of C1C5 wired with the same Public API as the live binary. NO C6/C10/C11/C12. Configuration loading + camera-calibration loading. |
| AZ-402 (Consumer 4) | `gps-denied-replay` CLI entrypoint: argparse, config + calibration loader, runtime loop (the loop body documented in this contract above), structured-error exit codes (0=success, 2=AC-8 sync-impossible, 1=any other error). |
| AZ-403 (Consumer 5) | `gps-denied-replay-cli` Dockerfile (multi-stage; Python + C1C5 + cpp/* + replay strategies; NO C6/C10/C11/C12; NO HTTP server) + GitHub Actions matrix entry + SBOM diff CI step verifying absence of excluded components per AC-4. |
| AZ-404 (Consumer 6) | E2E replay fixture test: `tests/e2e/replay/test_derkachi_1min.py` — runs the CLI against a 12 min Derkachi clip + matching tlog; asserts AC-3 (≤ 100 m for ≥ 80 % of ticks); gated by `RUN_REPLAY_E2E=1` in CI. |
| AZ-405 (Consumer 7) | Auto-sync of video ↔ tlog via IMU take-off detection (AC-7 / AC-8). Take-off pattern: sustained vertical accel > 0.5 g + change in attitude rate > 1 rad/s lasting ≥ 0.5 s (typical quadcopter signature). Confidence-scored; falls back to WARN + best-guess if < 80 %; `--time-offset-ms` always overrides; AC-8 hard-fail (exit 2) if neither auto-detect nor manual offset produces > 95 % frame-window match. |
## Constraints
- `@runtime_checkable` on all Protocols; DTOs `frozen=True, slots=True`.
- Lazy-import per ADR-002 with the new `BUILD_VIDEO_FILE_FRAME_SOURCE`, `BUILD_TLOG_REPLAY_ADAPTER`, `BUILD_REPLAY_SINK_JSONL` flags.
- C1C5 components MUST remain mode-agnostic (Invariant 1).
- All time-driven logic in components MUST consume the injected `Clock` (Invariant 2).
- No HTTP server in the replay binary (parent-suite UI shells out to the CLI; defer until subprocess shape is proven insufficient).
- pymavlink bundled unmodified per D-C8-3.
- The tlog parser MUST stream-parse — never materialise the entire tlog into memory (R-DEMO-2; multi-GB tlogs).
## Risks / Mitigations
- **R-DEMO-1** (tlog ↔ video timestamp drift / unsynchronised recordings): auto-sync via IMU take-off detection (AC-7) + `--time-offset-ms` manual override. Fixed-wing hand-launch fallback documented.
- **R-DEMO-2** (pymavlink slow on multi-GB tlogs): stream-parse, never materialise. Throughput floor benchmarked + documented in CI.
- **R-DEMO-3** (demo footage missing required FC messages): `TlogReplayFcAdapter.open(...)` fails fast at startup, listing missing message types and the components that need them.
- **R-DEMO-4** (production C1C5 paths bake real-time-cadence assumptions): `Clock` injection (Invariants 1, 2). Documented as ADR amendment in next architecture-doc cycle.
## Notes for the Implementer
- The `LiveCameraFrameSource` retrofit is a no-op restructure: the existing camera-ingest thread becomes a class implementing `FrameSource`. Its behaviour is unchanged. This is what allows C1 to consume `FrameSource` via constructor without becoming replay-aware.
- The `TlogReplayFcAdapter`'s `subscribe_telemetry` fan-out runs on a dedicated thread (mirroring the live `PymavlinkArdupilotAdapter` decode-thread semantics). This way C1 and C5 see identical thread boundaries in live and replay.
- The `Clock` Protocol is the SAME interface in live and replay — only the strategy differs. This is the single Liskov-clean line that lets components consume `Clock` without knowing the mode.