gps-denied-onboard/_docs/02_document/contracts/replay/replay_protocol.md
Oleksandr Bezdieniezhnykh 880eabcb3f Decompose Step 6 snapshot: 140 task specs + contract docs
Closes out greenfield Step 6 (Decompose) for all 14 components
(C1-C13 + cross-cutting helpers/replay). Covers tasks AZ-266..AZ-446
plus the _dependencies_table.md and component contract documents.

State file updated to greenfield Step 7 (Implement), not_started.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-11 00:39:48 +03:00


Contract: Replay Mode (FrameSource + ReplaySink + Clock + replay composition)

Owner: replay (epic AZ-265 / E-DEMO-REPLAY). Strategies live inside existing components (frame_source/, c8_fc_adapter/); only the composition root and CLI are net-new top-level files.
Producer task: AZ-398 (FrameSource Protocol + VideoFileFrameSource + LiveCameraFrameSource retrofit + Clock Protocol)
Consumer tasks: AZ-399 (TlogReplayFcAdapter), AZ-400 (ReplaySink + JsonlReplaySink), AZ-401 (compose_replay + Clock injection), AZ-402 (gps-denied-replay CLI), AZ-403 (Dockerfile + CI matrix + SBOM diff), AZ-404 (E2E replay fixture test), AZ-405 (Auto-sync IMU take-off detection)
Version: 1.0.0
Status: draft
Last Updated: 2026-05-10
Module-layout home:

  • src/gps_denied_onboard/frame_source/interface.py, __init__.py: FrameSource Protocol (Layer 1 cross-cutting per module-layout.md).
  • src/gps_denied_onboard/components/c8_fc_adapter/tlog_replay_adapter.py: TlogReplayFcAdapter (gated BUILD_TLOG_REPLAY_ADAPTER).
  • src/gps_denied_onboard/components/c8_fc_adapter/replay_sink.py: ReplaySink interface + JsonlReplaySink (gated BUILD_REPLAY_SINK_JSONL).
  • src/gps_denied_onboard/clock/interface.py, __init__.py: Clock Protocol.
  • src/gps_denied_onboard/runtime_root/replay.py: compose_replay(config) -> ReplayRoot.

Purpose

Defines the public interfaces that enable offline replay mode per epic AZ-265: running the production C1–C5 pipeline against historical inputs (a 1–2 min Derkachi-style clip plus the matching pymavlink .tlog) so that the parent-suite UI demo has end-to-end fidelity equal to a live flight. Production C1–C5 components MUST remain mode-agnostic: replay-aware logic lives ONLY in the composition root, the new strategies, and the CLI. The replay binary ships as a fourth Docker image (gps-denied-replay-cli) containing C1–C5 plus the replay strategies but NOT C6/C10/C11/C12 (no operator-side workflows; the tile cache is read pre-built).

This contract defines three Protocols, one concrete FcAdapter strategy, and the replay composition surface:

  • FrameSource — the formalised cross-cutting interface for camera-frame ingestion (previously implicit). Two strategies: LiveCameraFrameSource (retrofit; existing camera plumbing renamed and put behind the Protocol) and VideoFileFrameSource (replay-only, gated BUILD_VIDEO_FILE_FRAME_SOURCE).
  • Clock — the wall-clock vs. tlog-derived time abstraction (R-DEMO-4 mitigation). Two strategies: WallClock (live/research/operator) and TlogDerivedClock (replay only).
  • ReplaySink — the offline EstimatorOutput consumer interface. One strategy: JsonlReplaySink (one EstimatorOutput per JSONL line; gated BUILD_REPLAY_SINK_JSONL).
  • TlogReplayFcAdapter — replay-only FcAdapter strategy (per AZ-261 FcAdapter Protocol from _docs/02_document/contracts/c8_fc_adapter/fc_adapter_protocol.md); parses pymavlink .tlog and emits ImuWindow / AttitudeWindow / GpsHealth / FlightStateSignal at tlog-timestamp cadence (or wall-clock-paced per --pace). Gated BUILD_TLOG_REPLAY_ADAPTER.

The shared WgsConverter (AZ-279) is constructor-injected into the tlog adapter for tlog-GPS → local-tangent-plane conversion.

Public API

Protocol: FrameSource

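```python
from __future__ import annotations  # NavCameraFrame is a DTO defined outside this contract

from typing import Protocol, runtime_checkable

@runtime_checkable
class FrameSource(Protocol):
    def next_frame(self) -> NavCameraFrame | None: ...   # None on end-of-stream
    def close(self) -> None: ...
```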

Protocol: Clock

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class Clock(Protocol):
    def monotonic_ns(self) -> int: ...
    def time_ns(self) -> int: ...                           # wall-clock (UTC) for log timestamps
    def sleep_until_ns(self, target_ns: int) -> None: ...   # honoured in --pace realtime; no-op in --pace asap
```
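The two Clock strategies can be sketched as below, following the AZ-401 selection (WallClock for REALTIME, TlogDerivedClock for ASAP). WallClock delegates to the standard library; TlogDerivedClock is modelled as a clock that only moves when the tlog reader advances it. The advance_to method and the choice to report tlog time from both monotonic_ns and time_ns are assumptions of this sketch, not part of the contract.

```python
import threading
import time


class WallClock:
    """Live/research/operator Clock; also the REALTIME replay Clock."""

    def monotonic_ns(self) -> int:
        return time.monotonic_ns()

    def time_ns(self) -> int:
        return time.time_ns()

    def sleep_until_ns(self, target_ns: int) -> None:
        # Block until the monotonic clock reaches target_ns (no-op if already past).
        while (remaining := target_ns - time.monotonic_ns()) > 0:
            time.sleep(remaining / 1e9)


class TlogDerivedClock:
    """ASAP replay Clock: 'now' is whatever tlog time has been consumed so far."""

    def __init__(self, start_ns: int = 0) -> None:
        self._now_ns = start_ns
        self._lock = threading.Lock()  # advanced from the adapter thread, read elsewhere

    def advance_to(self, tlog_ns: int) -> None:
        # Hypothetical hook called by the tlog adapter as it consumes messages.
        with self._lock:
            if tlog_ns < self._now_ns:
                raise ValueError("tlog time must not go backwards")
            self._now_ns = tlog_ns

    def monotonic_ns(self) -> int:
        with self._lock:
            return self._now_ns

    def time_ns(self) -> int:
        # Assumption: tlog timestamps are UTC epoch, so they double as log time.
        return self.monotonic_ns()

    def sleep_until_ns(self, target_ns: int) -> None:
        # ASAP: never block; the pipeline runs as fast as it can.
        return None
```

Since TlogDerivedClock never consults the host clock, two ASAP runs over the same inputs observe identical time values, which is what makes the pace=ASAP determinism guarantee (Invariant 10) achievable.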

Protocol: ReplaySink

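```python
from __future__ import annotations  # EstimatorOutput is a DTO defined outside this contract

from typing import Protocol, runtime_checkable

@runtime_checkable
class ReplaySink(Protocol):
    def emit(self, output: EstimatorOutput) -> None: ...
    def close(self) -> None: ...
```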

Concrete: TlogReplayFcAdapter

```python
from pathlib import Path

# FcAdapter, FcKind, Clock, WgsConverter and ReplayPace are contract types defined elsewhere.
class TlogReplayFcAdapter(FcAdapter):
    def __init__(
        self,
        tlog_path: Path,
        target_fc_dialect: FcKind,            # ARDUPILOT_PLANE | INAV
        clock: Clock,
        wgs_converter: WgsConverter,
        time_offset_ms: int = 0,              # auto-detected by AZ-405 auto-sync task or set via --time-offset-ms
        pace: ReplayPace = ReplayPace.ASAP,   # REALTIME | ASAP
    ): ...
```

The TlogReplayFcAdapter implements the full FcAdapter Protocol from AZ-261, but the FC side is read-only: emit_external_position and emit_status_text raise FcEmitError("replay adapter does not emit to FC") (downstream consumers use ReplaySink instead), and request_source_set_switch raises SourceSetSwitchNotSupportedError. subscribe_telemetry is the primary surface: it fans out IMU, attitude, GPS-health, and flight-state frames from the tlog at the configured pace.

CLI surface

```text
gps-denied-replay
  --video PATH
  --tlog PATH
  --output results.jsonl
  --camera-calibration calib.json
  --config config.yaml
  [--pace {realtime,asap}]      # default asap
  [--time-offset-ms N]          # overrides auto-sync
```
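An argparse definition matching this surface might look like the sketch below. Treating every non-bracketed option as required, and using None to mean "let the AZ-405 auto-sync decide", are assumptions; the asap default and the exit codes are taken from this contract (AZ-402).

```python
import argparse
from pathlib import Path

EXIT_OK = 0
EXIT_ERROR = 1
EXIT_SYNC_IMPOSSIBLE = 2  # AC-8: neither auto-sync nor manual offset matched


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="gps-denied-replay")
    p.add_argument("--video", type=Path, required=True)
    p.add_argument("--tlog", type=Path, required=True)
    p.add_argument("--output", type=Path, required=True)
    p.add_argument("--camera-calibration", type=Path, required=True)
    p.add_argument("--config", type=Path, required=True)
    p.add_argument("--pace", choices=["realtime", "asap"], default="asap")
    # None means "not given": fall back to IMU take-off auto-sync (AZ-405).
    p.add_argument("--time-offset-ms", type=int, default=None)
    return p
```

argparse derives the attribute names by replacing dashes, so the values arrive as `args.camera_calibration` and `args.time_offset_ms`.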

Composition root extension

```python
def compose_replay(config: Config) -> ReplayRoot: ...
```

ReplayRoot is a dataclass holding all wired components plus the FrameSource, TlogReplayFcAdapter, ReplaySink, and Clock chosen for the replay run. The runtime loop is:

```python
while True:
    frame = frame_source.next_frame()
    if frame is None:                            # end-of-stream (Invariant 4)
        break
    c1 = vio.process(frame)                      # C1
    candidates = vpr.lookup(c1)                  # C2
    reranked = rerank.rerank(candidates)         # C2.5
    matched = matcher.match(reranked)            # C3
    refined = refiner.refine_if_needed(matched)  # C3.5
    pose = pose_estimator.estimate(refined)      # C4
    state.add_pose_anchor(pose)                  # C5
    state.add_vio(c1.vio_output)                 # C5
    output = state.current_estimate()            # C5
    replay_sink.emit(output)
replay_sink.close()                              # fsyncs the JSONL file (Invariant 7)
```
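ReplayRoot itself can be a plain frozen dataclass. In the sketch below, field names mirror the runtime loop, typing them as Any stands in for the real C1–C5 types, and the try/finally that closes the sink and frame source even when a stage raises is a design suggestion, not something this contract mandates.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ReplayRoot:
    # Field names mirror the runtime loop; Any stands in for the real types.
    frame_source: Any
    fc_adapter: Any
    replay_sink: Any
    clock: Any
    vio: Any
    vpr: Any
    rerank: Any
    matcher: Any
    refiner: Any
    pose_estimator: Any
    state: Any


def run_replay(root: ReplayRoot) -> int:
    """Drive the pipeline until end-of-stream; return the number of outputs emitted."""
    emitted = 0
    try:
        while (frame := root.frame_source.next_frame()) is not None:
            c1 = root.vio.process(frame)
            matched = root.matcher.match(root.rerank.rerank(root.vpr.lookup(c1)))
            pose = root.pose_estimator.estimate(root.refiner.refine_if_needed(matched))
            root.state.add_pose_anchor(pose)
            root.state.add_vio(c1.vio_output)
            root.replay_sink.emit(root.state.current_estimate())
            emitted += 1
    finally:
        # Close even on error so partial output is still flushed and fsync'd.
        root.replay_sink.close()
        root.frame_source.close()
    return emitted
```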

The tlog adapter's subscribe_telemetry callbacks are wired to C5's add_fc_imu and to C1's IMU prior on the same threads as in the live binary.

Invariants

  1. Mode-agnostic C1–C5: production components MUST NOT contain if replay_mode: branches. Mode-specific behaviour lives in the strategy (Frame source / FC adapter / Sink / Clock). Verified by an explicit grep guard in CI.
  2. Single Clock per process: the composition root resolves Clock exactly once at startup. All time-driven logic (AC-5.2 fallback timer, STATUSTEXT rate-limits, key rotation logging) consumes the injected Clock via constructor — never time.monotonic_ns() directly. Verified by an AST scan in CI for direct time.monotonic_ns / time.time_ns references in components.
  3. Frame source ordering: next_frame() returns frames in monotonically non-decreasing monotonic_ns order. Out-of-order frames raise FrameSourceError (NOT silently dropped — replay must be deterministic).
  4. End-of-stream is None: next_frame() returns None ONLY when the stream is permanently exhausted. Transient I/O failures raise FrameSourceError.
  5. TlogReplayFcAdapter emit-only-via-sink: emit_external_position and emit_status_text raise FcEmitError("replay adapter does not emit to FC"). Downstream consumers MUST emit to ReplaySink instead.
  6. Pace mode honoured by Clock: pace=REALTIME → Clock.sleep_until_ns(target_ns) blocks until wall-clock catches up; pace=ASAP → no-op. The pace flag is consumed ONLY by the Clock and the tlog adapter; components see only the Clock Protocol.
  7. JsonlReplaySink one-line-per-emit: each emit(output) writes exactly one JSON object + newline; the file is fsync'd on close(). Schema matches EstimatorOutput (frozen dataclass serialised via dataclasses.asdict + orjson.dumps).
  8. Time-offset honoured: when constructed with time_offset_ms != 0, the tlog adapter shifts every emitted timestamp by that offset before passing it to subscribers. time_offset_ms is set ONCE at construction (no live re-tuning).
  9. Build-flag gating: VideoFileFrameSource, TlogReplayFcAdapter, JsonlReplaySink MUST refuse construction when their respective BUILD_* flag is OFF (per ADR-002 — replay binary has them ON; airborne / research / operator have them OFF).
  10. Determinism: same (video, tlog, config, time_offset_ms, pace=ASAP) input → same JSONL output within ≤ 1e-6 float drift in position fields (AC-5).
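Invariant 10 can be checked mechanically by comparing two JSONL runs line by line. The sketch assumes the position fields are named lat_deg and lon_deg (hypothetical; substitute the real EstimatorOutput field names), treats the 1e-6 bound as an absolute tolerance, and requires exact equality for every other field. It consumes both inputs as streams rather than loading them whole.

```python
import json
import math
from collections.abc import Iterable
from itertools import zip_longest

POSITION_FIELDS = ("lat_deg", "lon_deg")  # hypothetical field names
TOLERANCE = 1e-6  # AC-5 float drift budget for position fields


def runs_match(lines_a: Iterable[str], lines_b: Iterable[str]) -> bool:
    """True if two JSONL replay outputs agree per Invariant 10."""
    for raw_a, raw_b in zip_longest(lines_a, lines_b):
        if raw_a is None or raw_b is None:
            return False  # the runs emitted a different number of outputs
        a, b = json.loads(raw_a), json.loads(raw_b)
        if a.keys() != b.keys():
            return False
        for key in a:
            if key in POSITION_FIELDS:
                if not math.isclose(a[key], b[key], rel_tol=0.0, abs_tol=TOLERANCE):
                    return False
            elif a[key] != b[key]:
                return False
    return True
```

Open file objects can be passed directly, e.g. `with open(p1) as fa, open(p2) as fb: runs_match(fa, fb)`, since iterating a file yields one line at a time.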

Producer / Consumer Split

| Task ID | Scope |
|---|---|
| AZ-398 (Producer) | FrameSource Protocol; Clock Protocol; VideoFileFrameSource (gated BUILD_VIDEO_FILE_FRAME_SOURCE); LiveCameraFrameSource retrofit (rename the existing camera-ingest plumbing into the Protocol shape; no behaviour change); WallClock + TlogDerivedClock strategies; composition wiring in the existing compose_root/compose_operator (Clock = WallClock there). NO tlog parsing, NO sink, NO replay composition. |
| AZ-399 (Consumer 1) | TlogReplayFcAdapter: pymavlink stream-parser (DO NOT materialise; R-DEMO-2 throughput floor); maps tlog message types → FcTelemetryFrame; supports both AP and iNav dialects; subscribe_telemetry fan-out at the configured pace; respects time_offset_ms; honours Clock for pacing; fail-fast at startup if required message types are absent (R-DEMO-3). |
| AZ-400 (Consumer 2) | ReplaySink Protocol + JsonlReplaySink (one JSON object per line; orjson serialiser; close() fsyncs). |
| AZ-401 (Consumer 3) | compose_replay(config) -> ReplayRoot: full strategy resolution for the replay binary; Clock strategy selection (TlogDerivedClock for ASAP, WallClock for REALTIME; documented per R-DEMO-4); FrameSource = VideoFileFrameSource; FcAdapter = TlogReplayFcAdapter; Sink = JsonlReplaySink; ALL of C1–C5 wired with the same Public API as the live binary. NO C6/C10/C11/C12. Configuration loading + camera-calibration loading. |
| AZ-402 (Consumer 4) | gps-denied-replay CLI entrypoint: argparse, config + calibration loader, runtime loop (the loop body documented in this contract above), structured-error exit codes (0=success, 2=AC-8 sync-impossible, 1=any other error). |
| AZ-403 (Consumer 5) | gps-denied-replay-cli Dockerfile (multi-stage; Python + C1–C5 + cpp/* + replay strategies; NO C6/C10/C11/C12; NO HTTP server) + GitHub Actions matrix entry + SBOM diff CI step verifying absence of excluded components per AC-4. |
| AZ-404 (Consumer 6) | E2E replay fixture test: tests/e2e/replay/test_derkachi_1min.py runs the CLI against a 1–2 min Derkachi clip + matching tlog; asserts AC-3 (≤ 100 m for ≥ 80 % of ticks); gated by RUN_REPLAY_E2E=1 in CI. |
| AZ-405 (Consumer 7) | Auto-sync of video ↔ tlog via IMU take-off detection (AC-7 / AC-8). Take-off pattern: sustained vertical accel > 0.5 g + change in attitude rate > 1 rad/s lasting ≥ 0.5 s (typical quadcopter signature). Confidence-scored; falls back to WARN + best-guess if < 80 %; --time-offset-ms always overrides; AC-8 hard-fail (exit 2) if neither auto-detect nor manual offset produces > 95 % frame-window match. |
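The AZ-405 take-off signature can be sketched as a scan over IMU samples for the first window in which both thresholds hold continuously for at least 0.5 s. The sample layout (vertical acceleration in g with gravity already removed, and a precomputed attitude-rate change in rad/s) is an assumption, and the sketch omits the confidence scoring and frame-window matching that AZ-405 also requires.

```python
from __future__ import annotations

from collections.abc import Iterable
from typing import NamedTuple

ACCEL_THRESHOLD_G = 0.5
RATE_CHANGE_THRESHOLD_RAD_S = 1.0
MIN_DURATION_NS = 500_000_000  # 0.5 s


class ImuSample(NamedTuple):
    # Hypothetical sample layout for this sketch.
    timestamp_ns: int
    vertical_accel_g: float      # gravity already removed (assumption)
    attitude_rate_change: float  # rad/s, precomputed upstream (assumption)


def detect_takeoff_ns(samples: Iterable[ImuSample]) -> int | None:
    """Return the start timestamp of the first qualifying take-off window, else None."""
    window_start: int | None = None
    for s in samples:
        if (s.vertical_accel_g > ACCEL_THRESHOLD_G
                and s.attitude_rate_change > RATE_CHANGE_THRESHOLD_RAD_S):
            if window_start is None:
                window_start = s.timestamp_ns
            if s.timestamp_ns - window_start >= MIN_DURATION_NS:
                return window_start
        else:
            window_start = None  # signature broken; restart the window
    return None
```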

Constraints

  • @runtime_checkable on all Protocols; DTOs frozen=True, slots=True.
  • Lazy-import per ADR-002 with the new BUILD_VIDEO_FILE_FRAME_SOURCE, BUILD_TLOG_REPLAY_ADAPTER, BUILD_REPLAY_SINK_JSONL flags.
  • C1–C5 components MUST remain mode-agnostic (Invariant 1).
  • All time-driven logic in components MUST consume the injected Clock (Invariant 2).
  • No HTTP server in the replay binary: the parent-suite UI shells out to the CLI. Defer an HTTP API until the subprocess shape proves insufficient.
  • pymavlink bundled unmodified per D-C8-3.
  • The tlog parser MUST stream-parse — never materialise the entire tlog into memory (R-DEMO-2; multi-GB tlogs).
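Invariant 9's build-flag gating and the ADR-002 lazy-import rule can be combined in a small factory that refuses to load a gated class when its flag is OFF; the classes themselves could repeat the check in `__init__`. Reading the flags from environment variables, the accepted "ON" spellings, and the BuildFlagDisabledError name are all assumptions of this sketch; ADR-002 defines the real mechanism.

```python
import importlib
import os


class BuildFlagDisabledError(RuntimeError):
    """Hypothetical error raised when a gated strategy is requested in the wrong binary."""


def _flag_on(name: str) -> bool:
    # Assumption: flags are exposed to the process as environment variables.
    return os.environ.get(name, "OFF").upper() in {"ON", "1", "TRUE"}


def load_gated_class(flag: str, module_path: str, class_name: str) -> type:
    """Import a gated strategy only when its BUILD_* flag is ON (lazy import)."""
    if not _flag_on(flag):
        raise BuildFlagDisabledError(f"{class_name} requires {flag}=ON")
    module = importlib.import_module(module_path)  # deferred until actually needed
    return getattr(module, class_name)
```

In the replay composition root, a call such as `load_gated_class("BUILD_REPLAY_SINK_JSONL", "gps_denied_onboard.components.c8_fc_adapter.replay_sink", "JsonlReplaySink")` would then fail loudly in the airborne, research, and operator binaries instead of importing replay code there.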

Risks / Mitigations

  • R-DEMO-1 (tlog ↔ video timestamp drift / unsynchronised recordings): auto-sync via IMU take-off detection (AC-7) + --time-offset-ms manual override. Fixed-wing hand-launch fallback documented.
  • R-DEMO-2 (pymavlink slow on multi-GB tlogs): stream-parse, never materialise. Throughput floor benchmarked + documented in CI.
  • R-DEMO-3 (demo footage missing required FC messages): TlogReplayFcAdapter.open(...) fails fast at startup, listing missing message types and the components that need them.
  • R-DEMO-4 (production C1–C5 paths bake in real-time-cadence assumptions): mitigated by Clock injection (Invariants 1, 2). To be documented as an ADR amendment in the next architecture-doc cycle.
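The R-DEMO-2 and R-DEMO-3 mitigations can be combined in a reader that eagerly scans a bounded prefix for required message types and then streams the rest lazily. For a pymavlink tlog connection (`mavutil.mavlink_connection(path)`), its `recv_match` method, which returns None once the file is exhausted, should work as `recv`; here `recv` is injected so the sketch runs without pymavlink. The REQUIRED_TYPES set, the prefix limit, and the error name are illustrative assumptions.

```python
from __future__ import annotations

from collections.abc import Callable, Iterator
from typing import Any

REQUIRED_TYPES = frozenset({"ATTITUDE", "RAW_IMU", "GPS_RAW_INT"})  # illustrative only
PREFIX_LIMIT = 10_000  # assumption: a healthy tlog shows every required type early


class MissingTlogMessagesError(RuntimeError):
    """Hypothetical fail-fast error listing absent MAVLink message types."""


def open_stream(recv: Callable[[], Any]) -> Iterator[Any]:
    """Eagerly check a bounded prefix, then return a lazy message iterator."""
    prefix: list[Any] = []
    seen: set[str] = set()
    while len(prefix) < PREFIX_LIMIT and not REQUIRED_TYPES <= seen:
        msg = recv()
        if msg is None:  # end of file reached during the prefix scan
            break
        prefix.append(msg)
        seen.add(msg.get_type())
    missing = REQUIRED_TYPES - seen
    if missing:
        # Fail at startup (R-DEMO-3) instead of mid-replay.
        raise MissingTlogMessagesError(f"tlog lacks required messages: {sorted(missing)}")
    return _chain(prefix, recv)


def _chain(prefix: list[Any], recv: Callable[[], Any]) -> Iterator[Any]:
    # Replay the buffered prefix, then pull one message at a time;
    # the full tlog is never materialised (R-DEMO-2).
    yield from prefix
    while (msg := recv()) is not None:
        yield msg
```

The trade-off is that a tlog whose required types first appear after PREFIX_LIMIT messages is rejected; the limit bounds both startup latency and the memory held by the prefix buffer.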

Notes for the Implementer

  • The LiveCameraFrameSource retrofit is a no-op restructure: the existing camera-ingest thread becomes a class implementing FrameSource. Its behaviour is unchanged. This is what allows C1 to consume FrameSource via constructor without becoming replay-aware.
  • The TlogReplayFcAdapter's subscribe_telemetry fan-out runs on a dedicated thread (mirroring the live PymavlinkArdupilotAdapter decode-thread semantics). This way C1 and C5 see identical thread boundaries in live and replay.
  • The Clock Protocol is the SAME interface in live and replay — only the strategy differs. This is the single Liskov-clean line that lets components consume Clock without knowing the mode.
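The LiveCameraFrameSource retrofit can be pictured as wrapping the ingest thread's hand-off queue in the FrameSource shape. The queue, the end-of-stream sentinel, the push/finish hooks, and the blocking put are assumptions about the existing plumbing, not a description of it; the real retrofit keeps whatever back-pressure policy the current thread already has, and frames are left untyped here.

```python
from __future__ import annotations

import queue
import threading
from typing import Any

_END = object()  # sentinel marking permanent end-of-stream


class QueueFrameSource:
    """Hypothetical FrameSource over a producer thread's hand-off queue."""

    def __init__(self, maxsize: int = 8) -> None:
        self._q: queue.Queue[Any] = queue.Queue(maxsize=maxsize)
        self._done = False

    # Called from the camera-ingest (producer) thread.
    def push(self, frame: Any) -> None:
        self._q.put(frame)  # blocks when full: placeholder back-pressure policy

    def finish(self) -> None:
        self._q.put(_END)

    # Called from the pipeline thread; blocks until a frame or end-of-stream.
    def next_frame(self) -> Any | None:
        if self._done:
            return None
        item = self._q.get()
        if item is _END:
            self._done = True
            return None
        return item

    def close(self) -> None:
        self._done = True
```

C1 only ever calls next_frame() and close(), so it cannot tell this thread-fed source from a file-fed one, which is the point of the retrofit.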