[AZ-265] Replay as configuration of airborne binary (ADR-011)

Re-design replay mode per user direction: replay is no longer a fourth
Docker image with a reduced component set, but a `config.mode = "replay"`
branch of the single airborne binary. The pre-flight workflow (route in
suite UI -> C12 tile download via real satellite-provider -> C10
manifest+engines build) is identical between live and replay; only three
strategies swap at compose time:

  FrameSource:      Live <-> Video
  FcAdapter:        Pymavlink/MSP2 <-> TlogReplay
  MavlinkTransport: Serial <-> Noop

The C8 outbound MAVLink encoders run unchanged in both modes; their
bytes hit `NoopMavlinkTransport` in replay and disappear. A new
`JsonlReplaySink` taps C5's `EstimatorOutput` stream so the parent-suite
UI sees per-tick coordinates by tailing `results.jsonl`. MAVLink 2.0
signing key remains mandatory (operator supplies a dummy file).

A new `replay_input/` Layer-4 cross-cutting coordinator owns
`(video, tlog) -> (FrameSource, FcAdapter, Clock)` convergence; the
composition root sees only standard interfaces past `.open()`.

Docs:
- architecture.md: new ADR-011 with full rationale; ADR-002 binary
  narrative updated.
- contracts/replay/replay_protocol.md: bumped to v2.0.0; 12 invariants
  (notably mode-agnosticism + encoder byte-equality + signing key
  mandatory + real C6 cache in replay).
- module-layout.md: Build-Time Exclusion Map dropped from 4 to 3 binary
  columns; replay-mode `BUILD_*` flags default ON in airborne;
  `shared/replay_input` cross-cutting entry added.
- epics.md: E-DEMO-REPLAY scope reframed; story points 27-32 -> 19-24.

Task respecs:
- AZ-401: shrunk 3 -> 2 pts; `compose_root` mode branch + JSONL sink +
  NoopMavlinkTransport wiring; legacy `compose_replay` export deleted.
- AZ-402: console-script wrapper that mutates `config.mode = "replay"`
  and dispatches into the shared airborne main; `--mavlink-signing-key`
  mandatory.
- AZ-403: CANCELLED. Moved to done/ with banner; Jira transition deferred
  via `_docs/_process_leftovers/2026-05-14_az_403_cancellation_pending_tracker.md`.
- AZ-404: AC-4 reworded as mode-agnosticism AST scan + encoder
  byte-equality test; new AC-8 operator-workflow rehearsal.
- AZ-405: also owns the `replay_input/` module + `ReplayInputAdapter`.

_dependencies_table.md updated: AZ-401 gains AZ-405 dep; AZ-404 drops
AZ-403 dep; AZ-403 row marked CANCELLED.

Co-authored-by: Cursor <cursoragent@cursor.com>
This commit is contained in:
Oleksandr Bezdieniezhnykh
2026-05-14 09:01:04 +03:00
parent fa3742d582
commit 5adf3dd04f
13 changed files with 765 additions and 418 deletions
@@ -1,33 +1,52 @@
# Contract: Replay Mode (`FrameSource` + `ReplaySink` + `Clock` + replay composition)
# Contract: Replay Mode (`replay_input` module + `FrameSource` + `Clock` + `ReplaySink` + `NoopMavlinkTransport`)
**Owner**: replay (epic AZ-265 / E-DEMO-REPLAY) — strategies live inside existing components (`frame_source/`, `c8_fc_adapter/`); only the composition root and CLI are net-new top-level files.
**Owner**: replay (epic AZ-265 / E-DEMO-REPLAY) — strategies live inside existing components (`frame_source/`, `clock/`, `c8_fc_adapter/`); a small new `replay_input/` cross-cutting module converges `(video, tlog)` inputs into the standard `FrameSource` + `FcAdapter` boundaries the rest of the system already consumes.
**Producer task**: AZ-398 (`FrameSource` Protocol + `VideoFileFrameSource` + `LiveCameraFrameSource` retrofit + `Clock` Protocol)
**Consumer tasks**: AZ-399 (TlogReplayFcAdapter), AZ-400 (ReplaySink + JsonlReplaySink), AZ-401 (compose_replay + Clock injection), AZ-402 (gps-denied-replay CLI), AZ-403 (Dockerfile + CI matrix + SBOM diff), AZ-404 (E2E replay fixture test), AZ-405 (Auto-sync IMU take-off detection).
**Version**: 1.0.0
**Consumer tasks**: AZ-399 (TlogReplayFcAdapter), AZ-400 (ReplaySink + JsonlReplaySink + NoopMavlinkTransport), AZ-401 (replay-mode branch in `compose_root`), AZ-402 (gps-denied-replay CLI wrapper), AZ-404 (E2E replay fixture test), AZ-405 (Auto-sync IMU take-off detection inside `replay_input/`).
**Version**: 2.0.0 (replaces v1.0.0 — "replay is a fourth Docker image" design replaced by "replay is a configuration of the airborne binary"; see ADR-011)
**Status**: draft
**Last Updated**: 2026-05-10
**Last Updated**: 2026-05-14
**Module-layout home**:
- `src/gps_denied_onboard/frame_source/interface.py`, `__init__.py``FrameSource` Protocol (Layer 1 cross-cutting per `module-layout.md`).
- `src/gps_denied_onboard/components/c8_fc_adapter/tlog_replay_adapter.py``TlogReplayFcAdapter` (gated `BUILD_TLOG_REPLAY_ADAPTER`).
- `src/gps_denied_onboard/components/c8_fc_adapter/replay_sink.py``ReplaySink` interface + `JsonlReplaySink` (gated `BUILD_REPLAY_SINK_JSONL`).
- `src/gps_denied_onboard/clock/interface.py`, `__init__.py``Clock` Protocol.
- `src/gps_denied_onboard/runtime_root/replay.py``compose_replay(config) -> ReplayRoot`.
- `src/gps_denied_onboard/clock/interface.py`, `__init__.py``Clock` Protocol (Layer 1 cross-cutting).
- `src/gps_denied_onboard/components/c8_fc_adapter/tlog_replay_adapter.py``TlogReplayFcAdapter` strategy (gated `BUILD_TLOG_REPLAY_ADAPTER`; ON in the airborne binary).
- `src/gps_denied_onboard/components/c8_fc_adapter/replay_sink.py``ReplaySink` Protocol + `JsonlReplaySink` strategy (gated `BUILD_REPLAY_SINK_JSONL`; ON in the airborne binary).
- `src/gps_denied_onboard/components/c8_fc_adapter/noop_mavlink_transport.py``NoopMavlinkTransport` strategy (gated `BUILD_REPLAY_SINK_JSONL`; ON in the airborne binary; wraps the live MAVLink transport layer so C8 encoders are unchanged).
- `src/gps_denied_onboard/replay_input/` — new Layer-4 cross-cutting coordinator that owns `(video, tlog)``(FrameSource, FcAdapter, Clock)` convergence + auto-sync + time-offset application.
- `src/gps_denied_onboard/runtime_root/__init__.py``compose_root(config)` extended with a `config.mode = "live" | "replay"` branch (no separate `compose_replay` composition root; replay is a configuration of the single airborne composition root).
- `src/gps_denied_onboard/cli/replay.py``gps-denied-replay` console-script: builds a replay-mode `Config` and dispatches into the same companion entry point as live.
## Purpose
Defines the public interfaces enabling **offline replay mode** per epic AZ-265: run the production C1C5 pipeline against historical inputs (12 min Derkachi-style clip + matching pymavlink `.tlog`) so the parent-suite UI demo has end-to-end fidelity equal to a live flight. Production C1C5 components MUST remain mode-agnostic — replay-aware logic lives ONLY in the composition root, the new strategies, and the CLI. The replay binary is a fourth Docker image (`gps-denied-replay-cli`) containing C1C5 + replay strategies but NOT C6/C10/C11/C12 (no operator-side workflows; tile cache is read pre-built).
Defines the public interfaces enabling **offline replay mode** per epic AZ-265: run the production C1C5 pipeline (with the full C6 tile cache + the same C7 inference runtime + the same C13 FDR) against historical inputs (12 min Derkachi-style clip + matching pymavlink `.tlog`) so the parent-suite UI demo has end-to-end fidelity equal to a live flight.
This contract defines four Protocols and the replay composition surface:
- **`FrameSource`** — the formalised cross-cutting interface for camera-frame ingestion (previously implicit). Two strategies: `LiveCameraFrameSource` (retrofit; existing camera plumbing renamed and put behind the Protocol) and `VideoFileFrameSource` (replay-only, gated `BUILD_VIDEO_FILE_FRAME_SOURCE`).
- **`Clock`** — the wall-clock vs. tlog-derived time abstraction (R-DEMO-4 mitigation). Two strategies: `WallClock` (live/research/operator) and `TlogDerivedClock` (replay only).
- **`ReplaySink`** — the offline `EstimatorOutput` consumer interface. One strategy: `JsonlReplaySink` (one `EstimatorOutput` per JSONL line; gated `BUILD_REPLAY_SINK_JSONL`).
**Design (v2.0.0 — replaces v1.0.0)**: replay is a **configuration of the airborne binary**, not a separate Docker image. See ADR-011 for the full rationale. The same image, same components, same composition root, same pre-flight workflow as a live flight; only three strategies differ at runtime:
| Concern | Live strategy | Replay strategy |
|---|---|---|
| `FrameSource` | `LiveCameraFrameSource` | `VideoFileFrameSource` |
| `FcAdapter` (inbound IMU/attitude/GPS/flight-state) | `PymavlinkArdupilotAdapter` / `Msp2InavAdapter` | `TlogReplayFcAdapter` |
| `FcAdapter` outbound transport (the bytes that go onto the wire) | Real serial/UART link to ArduPilot Plane / iNav | `NoopMavlinkTransport` (sink; C8 encoders unchanged) |
| `Clock` | `WallClock` | `TlogDerivedClock` (pace=ASAP) or `WallClock` (pace=REALTIME) |
| Per-tick position observable to the UI | C8 outbound + GCS telemetry summary | Additional `JsonlReplaySink` tap on C5's `EstimatorOutput` stream |
Everything else is identical: C6 reads the same pre-built tile cache the operator built via the normal C10/C11/C12 pre-flight flow; C7 deserializes the same TensorRT engines; C13 writes a real FDR for the replay run (a real flight record, just driven by historical inputs). Production C1C5 components remain **mode-agnostic** — replay-aware logic lives ONLY in the composition root branch, the strategies named above, the `replay_input/` coordinator, and the CLI.
The user-visible result: a UI consumer tails the JSONL file and sees per-tick `(lat, lon, alt, horiz_accuracy)` exactly as the airborne binary would emit them in a real flight. Other MAVLink emits (FC GPS_INPUT, GCS STATUSTEXT, EKF source-set commands) are swallowed by `NoopMavlinkTransport` — the operator confirmed they don't need to be observable in replay (the contract above is the single source of truth for that decision).
This contract defines four Protocols, one coordinator class, and the replay-mode composition branch:
- **`FrameSource`** — formalised cross-cutting interface for camera-frame ingestion. Two strategies: `LiveCameraFrameSource` (live) and `VideoFileFrameSource` (replay; gated `BUILD_VIDEO_FILE_FRAME_SOURCE`).
- **`Clock`** — wall-clock vs. tlog-derived time abstraction (R-DEMO-4 mitigation). Two strategies: `WallClock` (live/research/operator/replay-realtime) and `TlogDerivedClock` (replay-asap).
- **`ReplaySink`** — offline `EstimatorOutput` consumer interface tapping C5's output stream. One strategy: `JsonlReplaySink` (one `EstimatorOutput` per JSONL line; gated `BUILD_REPLAY_SINK_JSONL`).
- **`TlogReplayFcAdapter`** — replay-only `FcAdapter` strategy (per AZ-261 `FcAdapter` Protocol from `_docs/02_document/contracts/c8_fc_adapter/fc_adapter_protocol.md`); parses pymavlink `.tlog` and emits `ImuWindow` / `AttitudeWindow` / `GpsHealth` / `FlightStateSignal` at tlog-timestamp cadence (or wall-clock-paced per `--pace`). Gated `BUILD_TLOG_REPLAY_ADAPTER`.
- **`NoopMavlinkTransport`** — replay-only outbound transport that swallows every byte the C8 encoders try to write. The C8 outbound encoder code path is **unchanged** between live and replay (Invariant 1); the transport layer is the only place the destination differs. Gated `BUILD_REPLAY_SINK_JSONL` (shares the build flag with `JsonlReplaySink` — both are "where does this binary send its outputs in replay" concerns).
- **`ReplayInputAdapter`** — Layer-4 coordinator class in `replay_input/` that owns `(video, tlog)` lifecycle, applies the time-offset (manual via `--time-offset-ms` or auto via AZ-405 IMU-take-off detection), instantiates `VideoFileFrameSource` + `TlogReplayFcAdapter` + chosen `Clock`, and hands the trio to the composition root. The composition root sees only standard `FrameSource` + `FcAdapter` + `Clock` after the coordinator is opened.
The shared `WgsConverter` (AZ-279) is constructor-injected into the tlog adapter for tlog-GPS → local-tangent-plane conversion.
The shared `WgsConverter` (AZ-279) is constructor-injected into the tlog adapter for tlog-GPS → local-tangent-plane conversion (unchanged from v1.0.0).
## Public API
### Protocol: `FrameSource`
### Protocol: `FrameSource` (unchanged from v1.0.0)
```python
@runtime_checkable
@@ -36,7 +55,7 @@ class FrameSource(Protocol):
def close(self) -> None: ...
```
### Protocol: `Clock`
### Protocol: `Clock` (unchanged from v1.0.0)
```python
@runtime_checkable
@@ -46,7 +65,7 @@ class Clock(Protocol):
def sleep_until_ns(self, target_ns: int) -> None: ... # honoured in --pace realtime; no-op in --pace asap
```
### Protocol: `ReplaySink`
### Protocol: `ReplaySink` (unchanged from v1.0.0)
```python
@runtime_checkable
@@ -55,7 +74,7 @@ class ReplaySink(Protocol):
def close(self) -> None: ...
```
### Concrete: `TlogReplayFcAdapter`
### Concrete: `TlogReplayFcAdapter` (unchanged from v1.0.0)
```python
class TlogReplayFcAdapter(FcAdapter):
@@ -65,12 +84,81 @@ class TlogReplayFcAdapter(FcAdapter):
target_fc_dialect: FcKind, # ARDUPILOT_PLANE | INAV
clock: Clock,
wgs_converter: WgsConverter,
time_offset_ms: int = 0, # auto-detected by AZ-405 auto-sync task or set via --time-offset-ms
time_offset_ms: int = 0, # set by ReplayInputAdapter (auto-sync or --time-offset-ms)
pace: ReplayPace = ReplayPace.ASAP, # REALTIME | ASAP
): ...
```
The `TlogReplayFcAdapter` implements the full `FcAdapter` Protocol from AZ-261. `emit_external_position` raises `FcEmitError("replay adapter does not emit to FC")` (replay is read-only on the FC side; downstream consumers use `ReplaySink` instead). `request_source_set_switch` raises `SourceSetSwitchNotSupportedError`. `subscribe_telemetry` is the primary surface — fans out IMU/attitude/GPS-health/flight-state from the tlog at the configured pace.
The `TlogReplayFcAdapter` implements the **full** `FcAdapter` Protocol from AZ-261. `subscribe_telemetry` fans out IMU/attitude/GPS-health/flight-state from the tlog at the configured pace. `emit_external_position`, `emit_status_text`, and `request_source_set_switch` are implemented as **no-ops that delegate to the underlying transport** — in replay mode the transport is `NoopMavlinkTransport` (see below), so the bytes go nowhere; in live mode the same encoders shape the same bytes for a real wire. The encoder code path is identical; only the transport differs.
### Concrete: `NoopMavlinkTransport`
```python
class NoopMavlinkTransport(MavlinkTransport):
"""Outbound transport sink for replay mode.
Accepts every `write(payload: bytes)` and `close()` call without I/O.
Counts bytes written for observability (FDR + INFO log at close).
"""
def write(self, payload: bytes) -> None: ... # silent drop
def close(self) -> None: ...
def bytes_written(self) -> int: ... # observability
```
The C8 outbound encoders (per the v1.0.0 `FcAdapter` protocol — `emit_external_position`, `emit_status_text`, `request_source_set_switch`, and the `QgcTelemetryAdapter` 12 Hz GCS summary) operate over a constructor-injected `MavlinkTransport` interface (a new tiny Protocol introduced by AZ-401 to make this swap clean). In live mode the transport is `SerialMavlinkTransport` writing to the UART; in replay mode it is `NoopMavlinkTransport`. **The encoders themselves are unchanged** — they produce the same byte streams, including the MAVLink 2.0 signing handshake and per-flight key rotation. The signing key is mandatory in both modes (the operator supplies a dummy key for replay; the contract does not constrain the key's provenance).
This is the single architectural point that lets us say "replay is exactly like live, only the destination differs" without baking `if replay_mode:` branches into C8.
### Concrete: `ReplayInputAdapter`
```python
@dataclass(frozen=True, slots=True)
class ReplayInputBundle:
frame_source: FrameSource
fc_adapter: FcAdapter
clock: Clock
resolved_time_offset_ms: int
auto_sync_result: AutoSyncDecision | None # None when --time-offset-ms is provided
class ReplayInputAdapter:
"""Converges (video, tlog) into the standard FrameSource + FcAdapter + Clock surfaces.
Owns the time-alignment between video frames and tlog IMU/attitude ticks
(manual via --time-offset-ms or automatic via AZ-405 IMU-take-off detection).
Instantiates VideoFileFrameSource, TlogReplayFcAdapter, and the chosen Clock.
The composition root, after calling .open(), sees no replay-specific types.
"""
def __init__(
self,
*,
video_path: Path,
tlog_path: Path,
camera_calibration: CameraCalibration,
target_fc_dialect: FcKind,
wgs_converter: WgsConverter,
pace: ReplayPace,
manual_time_offset_ms: int | None, # None → auto-sync runs (AZ-405)
auto_sync_config: AutoSyncConfig,
) -> None: ...
def open(self) -> ReplayInputBundle:
"""Resolve time-offset (auto-sync or manual), build the strategies, return the bundle.
Raises:
ReplayInputAdapterError("tlog missing required message types: ...")
— R-DEMO-3 fail-fast at startup.
ReplayInputAdapterError("auto-sync hard-fail: ...")
— AC-8 of the epic (≤ 95 % frame-window match).
ReplayInputAdapterError("video file unreadable / unsupported codec / ...")
— VideoFileFrameSource opening failure surfaced at coordinator scope.
"""
...
def close(self) -> None: ... # closes both inputs; idempotent
```
### CLI surface
@@ -80,82 +168,103 @@ gps-denied-replay
--tlog PATH
--output results.jsonl
--camera-calibration calib.json
--config config.yaml
[--pace {realtime,asap}] # default asap
[--time-offset-ms N] # overrides auto-sync
--config config.yaml # same config schema as the airborne binary
--mavlink-signing-key PATH # mandatory; operator provides a dummy key for replay
[--pace {realtime,asap}] # default asap
[--time-offset-ms N] # overrides AZ-405 auto-sync
```
The CLI is a thin **mode-config wrapper**: it loads `config.yaml`, sets `config.mode = "replay"` and the replay-specific paths/flags, and calls the **same** entry point the live binary uses. The shared entry point calls `compose_root(config)` which returns a wired runtime; the runtime's per-frame loop is unchanged between live and replay.
### Composition root extension
```python
def compose_replay(config: Config) -> ReplayRoot: ...
```
`runtime_root/__init__.py` exposes a single `compose_root(config) -> Runtime` (no separate `compose_replay`). When `config.mode == "replay"`:
1. Build a `ReplayInputAdapter` from `config.replay.{video_path, tlog_path, pace, time_offset_ms, …}` + the same `CameraCalibration` and `WgsConverter` the live path already uses.
2. Call `replay_input.open()``ReplayInputBundle(frame_source, fc_adapter, clock, …)`.
3. Pick the `MavlinkTransport` strategy: `NoopMavlinkTransport` (replay) vs. `SerialMavlinkTransport` (live), based on `config.mode`.
4. Add a `JsonlReplaySink` subscriber to C5's `EstimatorOutput` stream (replay only). The live binary already emits to C8 outbound + QGC telemetry adapter; the JSONL sink is an additional listener, not a replacement.
5. Wire C1C5 + C6 + C7 + C13 exactly as in the live composition (Invariant 1 — components see the same interfaces).
6. Return the wired `Runtime` whose per-frame loop is the existing one (single source of truth — no per-mode loop).
`ReplayRoot` is a dataclass holding all wired components plus the `FrameSource`, `TlogReplayFcAdapter`, `ReplaySink`, and `Clock` chosen for the replay run. The runtime loop is:
```
loop:
frame = frame_source.next_frame()
frame = frame_source.next_frame() # VideoFileFrameSource in replay
if frame is None: break
c1 = vio.process(frame) # C1
candidates = vpr.lookup(c1) # C2
reranked = rerank.rerank(candidates) # C2.5
matched = matcher.match(reranked) # C3
refined = refiner.refine_if_needed(matched) # C3.5
pose = pose_estimator.estimate(refined) # C4
state.add_pose_anchor(pose) # C5
state.add_vio(c1.vio_output) # C5
c1 = vio.process(frame) # C1
candidates = vpr.lookup(c1) # C2 (uses real C6 DescriptorIndex)
reranked = rerank.rerank(candidates) # C2.5
matched = matcher.match(reranked) # C3
refined = refiner.refine_if_needed(matched) # C3.5
pose = pose_estimator.estimate(refined) # C4
state.add_pose_anchor(pose) # C5
state.add_vio(c1.vio_output) # C5
output = state.current_estimate()
replay_sink.emit(output)
replay_sink.close()
# multiple listeners, all wired by the composition root:
fc_adapter.emit_external_position(output) # → NoopMavlinkTransport in replay; SerialMavlinkTransport live
fdr.write(output) # C13: ALWAYS, both modes
if replay_sink is not None: # replay only
replay_sink.emit(output) # JsonlReplaySink → JSONL file → UI tails it
```
The tlog adapter's `subscribe_telemetry` callbacks are wired to C5's `add_fc_imu` and to C1's IMU prior on the same threads as in the live binary.
Side notes:
- The tlog adapter's `subscribe_telemetry` callbacks are wired to C5's `add_fc_imu` and to C1's IMU prior on the same threads as in the live binary (Invariant 1 — same threads, same callbacks, different source).
- `set_takeoff_origin` (AZ-490 / ADR-010) is invoked identically in replay: the operator's pre-flight C10 Manifest is the source of truth in both modes. The tlog's first GPS fix is the **fallback**, gated through the same Principle #11 bounded-delta check.
- `BUILD_FAISS_INDEX` is ON in the airborne binary (live and replay alike). C2 in replay queries the **real** C6 `FaissDescriptorIndex`, populated by the pre-flight C10 build. This is the architectural change vs. v1.0.0 of this contract.
## Invariants
1. **Mode-agnostic C1C5**: production components MUST NOT contain `if replay_mode:` branches. Mode-specific behaviour lives in the strategy (Frame source / FC adapter / Sink / Clock). Verified by an explicit grep guard in CI.
2. **Single `Clock` per process**: the composition root resolves `Clock` exactly once at startup. All time-driven logic (AC-5.2 fallback timer, STATUSTEXT rate-limits, key rotation logging) consumes the injected `Clock` via constructor — never `time.monotonic_ns()` directly. Verified by an AST scan in CI for direct `time.monotonic_ns` / `time.time_ns` references in components.
1. **Mode-agnostic C1C7, C13**: production components MUST NOT contain `if config.mode == "replay":` branches. Mode-specific behaviour lives in the strategies (FrameSource / FcAdapter / MavlinkTransport / ReplaySink / Clock). Verified by an explicit grep guard in CI (the AZ-404 E2E test owns this assertion).
2. **Single `Clock` per process**: `compose_root` resolves `Clock` exactly once at startup. All time-driven logic (AC-5.2 fallback timer, STATUSTEXT rate-limits, key rotation logging) consumes the injected `Clock` via constructor — never `time.monotonic_ns()` directly. Verified by an AST scan in CI for direct `time.monotonic_ns` / `time.time_ns` references in `components/**/*.py`.
3. **Frame source ordering**: `next_frame()` returns frames in monotonically non-decreasing `monotonic_ns` order. Out-of-order frames raise `FrameSourceError` (NOT silently dropped — replay must be deterministic).
4. **End-of-stream is None**: `next_frame()` returns `None` ONLY when the stream is permanently exhausted. Transient I/O failures raise `FrameSourceError`.
5. **TlogReplayFcAdapter emit-only-via-sink**: `emit_external_position` and `emit_status_text` raise `FcEmitError("replay adapter does not emit to FC")`. Downstream consumers MUST emit to `ReplaySink` instead.
5. **Outbound MAVLink encoders are mode-agnostic**: the C8 outbound encoders for `GPS_INPUT` / `MSP2_SENSOR_GPS` / `STATUSTEXT` / `NAMED_VALUE_FLOAT` / `MAV_CMD_SET_EKF_SOURCE_SET` produce identical byte streams in both modes. Only the `MavlinkTransport` strategy differs (Serial vs. Noop). The MAVLink 2.0 signing handshake runs in replay too (the operator provides a dummy signing key); the signing bytes are produced and then dropped by `NoopMavlinkTransport`. Verified by a unit test that captures the encoder output in both modes and diffs the byte streams.
6. **Pace mode honoured by Clock**: `pace=REALTIME``Clock.sleep_until_ns(target_ns)` blocks until wall-clock catches up; `pace=ASAP` → no-op. The pace flag is consumed ONLY by the `Clock` and the tlog adapter — components see only the `Clock` Protocol.
7. **JsonlReplaySink one-line-per-emit**: each `emit(output)` writes exactly one JSON object + newline; the file is fsync'd on `close()`. Schema matches `EstimatorOutput` (frozen dataclass serialised via `dataclasses.asdict` + `orjson.dumps`).
8. **Time-offset honoured**: when constructed with `time_offset_ms != 0`, the tlog adapter shifts every emitted timestamp by that offset before passing to subscribers. `time_offset_ms` is set ONCE at construction (no live re-tuning).
9. **Build-flag gating**: `VideoFileFrameSource`, `TlogReplayFcAdapter`, `JsonlReplaySink` MUST refuse construction when their respective `BUILD_*` flag is OFF (per ADR-002 — replay binary has them ON; airborne / research / operator have them OFF).
8. **Time-offset resolved before composition**: the `ReplayInputAdapter` resolves `time_offset_ms` (auto-sync or manual) and locks it into the `TlogReplayFcAdapter` constructor before `compose_root` returns the wired runtime. No live re-tuning.
9. **Build-flag gating**: `VideoFileFrameSource`, `TlogReplayFcAdapter`, `JsonlReplaySink`, `NoopMavlinkTransport` MUST refuse construction when their respective `BUILD_*` flag is OFF (per ADR-002). In the airborne binary all four flags are ON by default; setting any of them OFF in airborne disables replay mode (the binary still runs live mode normally).
10. **Determinism**: same `(video, tlog, config, time_offset_ms, pace=ASAP)` input → same JSONL output within ≤ 1e-6 float drift in position fields (AC-5).
11. **MAVLink signing key required in replay**: the airborne binary refuses to run without `--mavlink-signing-key PATH` in both modes. In replay the operator supplies a dummy file (well-formed key bytes; no real channel to verify against). This preserves Invariant 5 — the encoders' signing code path runs identically in both modes.
12. **Real C6 cache in replay**: the airborne binary in replay mode reads the same pre-built C6 tile cache the operator built via the normal pre-flight C10/C11/C12 flow. There is no replay-specific cache shape. Verified by the AZ-404 E2E fixture, which runs the operator's pre-flight flow before invoking the replay CLI.
## Producer / Consumer Split
| Task ID | Scope |
|---------|-------|
| AZ-398 (Producer) | `FrameSource` Protocol; `Clock` Protocol; `VideoFileFrameSource` (gated `BUILD_VIDEO_FILE_FRAME_SOURCE`); `LiveCameraFrameSource` retrofit (rename existing camera-ingest plumbing into the Protocol shape — no behaviour change); `WallClock` + `TlogDerivedClock` strategies; composition wiring in the existing `compose_root`/`compose_operator` (Clock = WallClock there). NO tlog parsing, NO sink, NO replay composition. |
| AZ-399 (Consumer 1) | `TlogReplayFcAdapter`: pymavlink stream-parser (DO NOT materialise; R-DEMO-2 throughput floor); maps tlog message types → `FcTelemetryFrame`; supports both AP and iNav dialects; `subscribe_telemetry` fan-out at the configured pace; respects `time_offset_ms`; honours `Clock` for pacing; fail-fast at startup if required message types absent (R-DEMO-3). |
| AZ-400 (Consumer 2) | `ReplaySink` Protocol + `JsonlReplaySink` (one JSON object per line; orjson serialiser; `close()` fsyncs). |
| AZ-401 (Consumer 3) | `compose_replay(config) -> ReplayRoot`: full strategy resolution for the replay binary; `Clock` strategy selection (TlogDerivedClock for ASAP, WallClock for REALTIME; documented per R-DEMO-4); `FrameSource` = `VideoFileFrameSource`; `FcAdapter` = `TlogReplayFcAdapter`; `Sink` = `JsonlReplaySink`; ALL of C1C5 wired with the same Public API as the live binary. NO C6/C10/C11/C12. Configuration loading + camera-calibration loading. |
| AZ-402 (Consumer 4) | `gps-denied-replay` CLI entrypoint: argparse, config + calibration loader, runtime loop (the loop body documented in this contract above), structured-error exit codes (0=success, 2=AC-8 sync-impossible, 1=any other error). |
| AZ-403 (Consumer 5) | `gps-denied-replay-cli` Dockerfile (multi-stage; Python + C1C5 + cpp/* + replay strategies; NO C6/C10/C11/C12; NO HTTP server) + GitHub Actions matrix entry + SBOM diff CI step verifying absence of excluded components per AC-4. |
| AZ-404 (Consumer 6) | E2E replay fixture test: `tests/e2e/replay/test_derkachi_1min.py` — runs the CLI against a 12 min Derkachi clip + matching tlog; asserts AC-3 (≤ 100 m for ≥ 80 % of ticks); gated by `RUN_REPLAY_E2E=1` in CI. |
| AZ-405 (Consumer 7) | Auto-sync of video ↔ tlog via IMU take-off detection (AC-7 / AC-8). Take-off pattern: sustained vertical accel > 0.5 g + change in attitude rate > 1 rad/s lasting ≥ 0.5 s (typical quadcopter signature). Confidence-scored; falls back to WARN + best-guess if < 80 %; `--time-offset-ms` always overrides; AC-8 hard-fail (exit 2) if neither auto-detect nor manual offset produces > 95 % frame-window match. |
| AZ-398 (Producer) | `FrameSource` Protocol; `Clock` Protocol; `VideoFileFrameSource` (gated `BUILD_VIDEO_FILE_FRAME_SOURCE`); `LiveCameraFrameSource` retrofit (rename existing camera-ingest plumbing into the Protocol shape — no behaviour change); `WallClock` + `TlogDerivedClock` strategies; composition wiring in `compose_root` (Clock = WallClock in live, picked per-pace in replay). NO tlog parsing, NO sink, NO replay coordinator. |
| AZ-399 (Consumer 1) | `TlogReplayFcAdapter`: pymavlink stream-parser (DO NOT materialise; R-DEMO-2 throughput floor); maps tlog message types → `FcTelemetryFrame`; supports both AP and iNav dialects; `subscribe_telemetry` fan-out at the configured pace; respects `time_offset_ms`; honours `Clock` for pacing; outbound `emit_*` methods delegate to constructor-injected `MavlinkTransport` (Invariant 5); fail-fast at startup if required message types absent (R-DEMO-3). |
| AZ-400 (Consumer 2) | `ReplaySink` Protocol + `JsonlReplaySink` (one JSON object per line; orjson serialiser; `close()` fsyncs). **Also**: `MavlinkTransport` Protocol cut-out + `NoopMavlinkTransport` strategy + `SerialMavlinkTransport` retrofit (rename the existing C8 transport code into the Protocol shape — no behaviour change). |
| AZ-401 (Consumer 3) | Extend `compose_root(config)` with a `config.mode = "live" \| "replay"` branch: in replay mode, builds the `ReplayInputAdapter`, picks `NoopMavlinkTransport`, adds the `JsonlReplaySink` listener on C5's `EstimatorOutput` stream, and otherwise wires C1C7 + C13 identically to live. Build-flag check at startup. NO separate `compose_replay` function (replay is a configuration of the single composition root). |
| AZ-402 (Consumer 4) | `gps-denied-replay` CLI: argparse, config + calibration loader, sets `config.mode = "replay"`, dispatches into the same companion entry point as live; structured-error exit codes (0=success, 2=AC-8 sync-impossible from `ReplayInputAdapter.open()`, 1=any other error). |
| AZ-404 (Consumer 6) | E2E replay fixture test: `tests/e2e/replay/test_derkachi_1min.py` — runs the CLI against a 12 min Derkachi clip + matching tlog; asserts AC-3 (≤ 100 m for ≥ 80 % of ticks); gated by `RUN_REPLAY_E2E=1` in CI. Asserts Invariant 1 (no `if config.mode == "replay"` branches in components) via an AST scan. |
| AZ-405 (Consumer 7) | Auto-sync of video ↔ tlog via IMU take-off detection. Lives **inside `replay_input/`** (this task creates the module): take-off pattern (sustained vertical accel > 0.5 g + change in attitude rate > 1 rad/s lasting ≥ 0.5 s) + video motion-onset; confidence-scored; falls back to WARN + best-guess if < 80 %; `--time-offset-ms` always overrides; AC-8 hard-fail (exit 2) if neither auto-detect nor manual offset produces > 95 % frame-window match. The `ReplayInputAdapter` coordinator is also defined and implemented by this task (it is the natural home for the auto-sync logic — the coordinator owns the time-alignment concern, and auto-sync is one of the two ways the offset is resolved). |
**AZ-403 (formerly: replay-cli Dockerfile + SBOM diff CI step) is CANCELLED**: the replay-cli Docker image no longer exists under v2.0.0. The airborne Docker image IS the replay image; no SBOM diff is needed because there are no components to assert as absent. See `_docs/02_tasks/done/AZ-403_replay_dockerfile_ci.md` (cancellation banner) and the ADR-011 amendment in `architecture.md`.
## Constraints
- `@runtime_checkable` on all Protocols; DTOs `frozen=True, slots=True`.
- Lazy-import per ADR-002 with the new `BUILD_VIDEO_FILE_FRAME_SOURCE`, `BUILD_TLOG_REPLAY_ADAPTER`, `BUILD_REPLAY_SINK_JSONL` flags.
- C1C5 components MUST remain mode-agnostic (Invariant 1).
- Lazy-import per ADR-002 with the new `BUILD_VIDEO_FILE_FRAME_SOURCE`, `BUILD_TLOG_REPLAY_ADAPTER`, `BUILD_REPLAY_SINK_JSONL` flags. All three flags are ON in the airborne binary (production-default); OFF in the operator-orchestrator binary; the research binary mirrors airborne (ON).
- C1C7 + C13 components MUST remain mode-agnostic (Invariant 1).
- All time-driven logic in components MUST consume the injected `Clock` (Invariant 2).
- No HTTP server in the replay binary (parent-suite UI shells out to the CLI; defer until subprocess shape is proven insufficient).
- No HTTP server in the airborne binary regardless of mode (parent-suite UI shells out to the CLI and tails the JSONL file; defer until the subprocess shape is proven insufficient).
- pymavlink bundled unmodified per D-C8-3.
- The tlog parser MUST stream-parse — never materialise the entire tlog into memory (R-DEMO-2; multi-GB tlogs).
- MAVLink 2.0 signing key is mandatory in both modes (Invariant 11). The replay run reuses the live binary's per-flight key-load code path; the operator supplies a dummy key file.
## Risks / Mitigations
- **R-DEMO-1** (tlog ↔ video timestamp drift / unsynchronised recordings): auto-sync via IMU take-off detection (AC-7) + `--time-offset-ms` manual override. Fixed-wing hand-launch fallback documented.
- **R-DEMO-1** (tlog ↔ video timestamp drift / unsynchronised recordings): auto-sync via IMU take-off detection (AC-7) + `--time-offset-ms` manual override. Fixed-wing hand-launch fallback documented. Owned by `replay_input/` per AZ-405.
- **R-DEMO-2** (pymavlink slow on multi-GB tlogs): stream-parse, never materialise. Throughput floor benchmarked + documented in CI.
- **R-DEMO-3** (demo footage missing required FC messages): `TlogReplayFcAdapter.open(...)` fails fast at startup, listing missing message types and the components that need them.
- **R-DEMO-4** (production C1C5 paths bake real-time-cadence assumptions): `Clock` injection (Invariants 1, 2). Documented as ADR amendment in next architecture-doc cycle.
- **R-DEMO-3** (demo footage missing required FC messages): `ReplayInputAdapter.open(...)` fails fast at startup, listing missing message types and the components that need them.
- **R-DEMO-4** (production C1C5 paths bake real-time-cadence assumptions): `Clock` injection (Invariants 1, 2). Captured in ADR-011 (architecture.md).
- **R-DEMO-5 (new in v2.0.0)** (live and replay diverge silently because the modes share a composition root): mitigated by Invariant 1 (no mode-aware branches in components) + Invariant 5 (encoders are byte-identical) + the AZ-404 E2E test asserting both invariants on every PR. The single composition root is the single point of mode awareness.
## Notes for the Implementer
- The `LiveCameraFrameSource` retrofit is a no-op restructure: the existing camera-ingest thread becomes a class implementing `FrameSource`. Its behaviour is unchanged. This is what allows C1 to consume `FrameSource` via constructor without becoming replay-aware.
- The `TlogReplayFcAdapter`'s `subscribe_telemetry` fan-out runs on a dedicated thread (mirroring the live `PymavlinkArdupilotAdapter` decode-thread semantics). This way C1 and C5 see identical thread boundaries in live and replay.
- The `SerialMavlinkTransport` retrofit (introduced by AZ-400) is a no-op restructure: the existing pymavlink transport code becomes a class implementing the new tiny `MavlinkTransport` Protocol. Its behaviour is unchanged. This is what allows C8 outbound encoders to remain identical between live and replay.
- The `TlogReplayFcAdapter`'s `subscribe_telemetry` fan-out runs on a dedicated thread (mirroring the live `PymavlinkArdupilotAdapter` decode-thread semantics). C1 and C5 see identical thread boundaries in live and replay (Invariant 1).
- The `Clock` Protocol is the SAME interface in live and replay — only the strategy differs. This is the single Liskov-clean line that lets components consume `Clock` without knowing the mode.
- The `ReplayInputAdapter` lives at `src/gps_denied_onboard/replay_input/__init__.py` (public) + `tlog_video_adapter.py` (concrete) + `auto_sync.py` (AZ-405 logic). It is a Layer-4 module per `module-layout.md` (it imports from Layer 1 `frame_source/` and `clock/` interfaces, and instantiates Layer-4 strategies from `c8_fc_adapter/`). The composition root imports the **public API** of `replay_input/` only; it does not reach into the coordinator's internals.
- The parent-suite UI demo flow: operator plans a route in the suite UI → C12 builds the cache → operator runs `gps-denied-replay --video ... --tlog ... --output results.jsonl` → UI tails `results.jsonl` and renders per-tick `(lat, lon, alt, horiz_accuracy)`. The operator's pre-flight workflow is **identical** to a live flight up until the final "fly" step. This is the user-confirmed design intent.