gps-denied-onboard/_docs/02_document/contracts/replay/replay_protocol.md
Oleksandr Bezdieniezhnykh 42b1db6ace [AZ-842] Batch 04 cycle 4: AZ-835 docs + cycle-4 redesign narrative
Closes AZ-835 Epic C6 (docs) and folds the cycle-4 replay-input
redesign narrative (AZ-894 CSV adapter / AZ-895 auto-sync deprecation
/ AZ-896 format spec / AZ-897 UI follow-up) into the three
authoritative documents.

Modified:
- _docs/02_document/contracts/replay/replay_protocol.md: extend
  Invariant 12 with sub-invariants 12.c (route-driven supersedes
  bbox; ~100x tile efficiency + did-fly-vs-might-fly honesty) and
  12.d (fixture failure-handling: validation/terminal re-raise;
  transient -> C11 backoff x3). Add Invariant 14 with sub-
  invariants 14.a-14.d covering the single canonical clock model,
  the CSV-driven path, the tlog adapter's audit-only role, the
  auto-sync deprecation, and the AZ-897 UI follow-up pointer.
- _docs/02_document/architecture.md: add the AZ-777 Phase 3+
  superseded-by-Epic-AZ-835 supersession block + new "Replay input
  redesign (cycle 4)" sub-section with the cycle-4 ticket table.
- tests/e2e/replay/README.md: top section restructured for two
  distinct entry points (AZ-265/AZ-404 vs. AZ-835/AZ-840); add
  full AZ-835 orchestrator-test section (env vars, skip gates,
  expected runtime, verdict report path); add Imagery (c) Google
  attribution + dev-only caveat; add Epic AZ-835 ticket map.

Spec deviation: AC-1b says "new Invariant 13" but Invariant 13 is
already taken (C4<->C5 pairing, AZ-776 / ADR-012), and is referenced
by number in architecture.md, c4_pose description.md, and ADR-012
prose. Cycle-4 content shipped as Invariant 14 to preserve those
cross-references; renumbering would have cascaded to 3 files outside
AZ-842's ownership envelope. Documented in batch report.

Out-of-scope hygiene gap (NOT fixed in this batch):
BUILD_CSV_REPLAY_ADAPTER flag is not yet enumerated in
_docs/02_document/module-layout.md's Build-Time Exclusion Map.
Inherited from cycle-4 AZ-894. Suggested as a cycle-5+ hygiene PBI.

AZ-835 epic file stays in todo/ until AZ-841 (backlog) is resolved.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-29 11:13:33 +03:00


Contract: Replay Mode (replay_input module + FrameSource + Clock + ReplaySink + NoopMavlinkTransport)

Owner: replay (epic AZ-265 / E-DEMO-REPLAY). Strategies live inside existing components (frame_source/, clock/, c8_fc_adapter/); a small new replay_input/ cross-cutting module converges (video, tlog) inputs into the standard FrameSource + FcAdapter boundaries the rest of the system already consumes.

Producer task: AZ-398 (FrameSource Protocol + VideoFileFrameSource + LiveCameraFrameSource retrofit + Clock Protocol)

Consumer tasks: AZ-399 (TlogReplayFcAdapter), AZ-400 (ReplaySink + JsonlReplaySink + NoopMavlinkTransport), AZ-401 (replay-mode branch in compose_root), AZ-402 (gps-denied-replay CLI wrapper), AZ-404 (E2E replay fixture test), AZ-405 (Auto-sync IMU take-off detection inside replay_input/).

Version: 2.0.0 (replaces v1.0.0: "replay is a fourth Docker image" design replaced by "replay is a configuration of the airborne binary"; see ADR-011)

Status: draft

Last Updated: 2026-05-14

Module-layout home:

  • src/gps_denied_onboard/frame_source/interface.py, __init__.py: FrameSource Protocol (Layer 1 cross-cutting per module-layout.md).
  • src/gps_denied_onboard/clock/interface.py, __init__.py: Clock Protocol (Layer 1 cross-cutting).
  • src/gps_denied_onboard/components/c8_fc_adapter/tlog_replay_adapter.py: TlogReplayFcAdapter strategy (gated BUILD_TLOG_REPLAY_ADAPTER; ON in the airborne binary).
  • src/gps_denied_onboard/components/c8_fc_adapter/replay_sink.py: ReplaySink Protocol + JsonlReplaySink strategy (gated BUILD_REPLAY_SINK_JSONL; ON in the airborne binary).
  • src/gps_denied_onboard/components/c8_fc_adapter/noop_mavlink_transport.py: NoopMavlinkTransport strategy (gated BUILD_REPLAY_SINK_JSONL; ON in the airborne binary; wraps the live MAVLink transport layer so C8 encoders are unchanged).
  • src/gps_denied_onboard/replay_input/: new Layer-4 cross-cutting coordinator that owns (video, tlog) → (FrameSource, FcAdapter, Clock) convergence + auto-sync + time-offset application.
  • src/gps_denied_onboard/runtime_root/__init__.py: compose_root(config) extended with a config.mode = "live" | "replay" branch (no separate compose_replay composition root; replay is a configuration of the single airborne composition root).
  • src/gps_denied_onboard/cli/replay.py: gps-denied-replay console-script; builds a replay-mode Config and dispatches into the same companion entry point as live.

Purpose

Defines the public interfaces enabling offline replay mode per epic AZ-265: run the production C1-C5 pipeline (with the full C6 tile cache + the same C7 inference runtime + the same C13 FDR) against historical inputs (12 min Derkachi-style clip + matching pymavlink .tlog) so the parent-suite UI demo has end-to-end fidelity equal to a live flight.

Design (v2.0.0, replaces v1.0.0): replay is a configuration of the airborne binary, not a separate Docker image. See ADR-011 for the full rationale. The same image, same components, same composition root, same pre-flight workflow as a live flight; only the strategies listed below differ at runtime:

| Concern | Live strategy | Replay strategy |
|---|---|---|
| FrameSource | LiveCameraFrameSource | VideoFileFrameSource |
| FcAdapter (inbound IMU/attitude/GPS/flight-state) | PymavlinkArdupilotAdapter / Msp2InavAdapter | TlogReplayFcAdapter |
| FcAdapter outbound transport (the bytes that go onto the wire) | Real serial/UART link to ArduPilot Plane / iNav | NoopMavlinkTransport (sink; C8 encoders unchanged) |
| Clock | WallClock | TlogDerivedClock (pace=ASAP) or WallClock (pace=REALTIME) |
| Per-tick position observable to the UI | C8 outbound + GCS telemetry summary | Additional JsonlReplaySink tap on C5's EstimatorOutput stream |

Everything else is identical: C6 reads the same pre-built tile cache the operator built via the normal C10/C11/C12 pre-flight flow; C7 deserializes the same TensorRT engines; C13 writes a real FDR for the replay run (a real flight record, just driven by historical inputs). Production C1-C5 components remain mode-agnostic; replay-aware logic lives ONLY in the composition root branch, the strategies named above, the replay_input/ coordinator, and the CLI.

The user-visible result: a UI consumer tails the JSONL file and sees per-tick (lat, lon, alt, horiz_accuracy) exactly as the airborne binary would emit them in a real flight. Other MAVLink emits (FC GPS_INPUT, GCS STATUSTEXT, EKF source-set commands) are swallowed by NoopMavlinkTransport — the operator confirmed they don't need to be observable in replay (the contract above is the single source of truth for that decision).

This contract defines four Protocols, one coordinator class, and the replay-mode composition branch:

  • FrameSource — formalised cross-cutting interface for camera-frame ingestion. Two strategies: LiveCameraFrameSource (live) and VideoFileFrameSource (replay; gated BUILD_VIDEO_FILE_FRAME_SOURCE).
  • Clock — wall-clock vs. tlog-derived time abstraction (R-DEMO-4 mitigation). Two strategies: WallClock (live/research/operator/replay-realtime) and TlogDerivedClock (replay-asap).
  • ReplaySink — offline EstimatorOutput consumer interface tapping C5's output stream. One strategy: JsonlReplaySink (one EstimatorOutput per JSONL line; gated BUILD_REPLAY_SINK_JSONL).
  • TlogReplayFcAdapter — replay-only FcAdapter strategy (per AZ-261 FcAdapter Protocol from _docs/02_document/contracts/c8_fc_adapter/fc_adapter_protocol.md); parses pymavlink .tlog and emits ImuWindow / AttitudeWindow / GpsHealth / FlightStateSignal at tlog-timestamp cadence (or wall-clock-paced per --pace). Gated BUILD_TLOG_REPLAY_ADAPTER.
  • NoopMavlinkTransport — replay-only outbound transport that swallows every byte the C8 encoders try to write. The C8 outbound encoder code path is unchanged between live and replay (Invariant 1); the transport layer is the only place the destination differs. Gated BUILD_REPLAY_SINK_JSONL (shares the build flag with JsonlReplaySink — both are "where does this binary send its outputs in replay" concerns).
  • ReplayInputAdapter — Layer-4 coordinator class in replay_input/ that owns (video, tlog) lifecycle, applies the time-offset (manual via --time-offset-ms or auto via AZ-405 IMU-take-off detection), instantiates VideoFileFrameSource + TlogReplayFcAdapter + chosen Clock, and hands the trio to the composition root. The composition root sees only standard FrameSource + FcAdapter + Clock after the coordinator is opened.
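Several of the strategies above are gated by BUILD_* flags, and Invariant 9 requires each gated strategy to refuse construction when its flag is OFF. The following is a minimal sketch of that refusal. How the real binary reads its build flags is not specified in this contract, so the `flags` mapping, `BuildFlagDisabledError`, `require_build_flag`, and `GatedSinkStub` are illustrative assumptions rather than project code.

```python
class BuildFlagDisabledError(RuntimeError):
    """Hypothetical error type; the real project may use a different one."""


# Flag names taken from this contract. NoopMavlinkTransport shares
# BUILD_REPLAY_SINK_JSONL with JsonlReplaySink, so three names cover
# the four gated strategies.
REPLAY_FLAGS = (
    "BUILD_VIDEO_FILE_FRAME_SOURCE",
    "BUILD_TLOG_REPLAY_ADAPTER",
    "BUILD_REPLAY_SINK_JSONL",
)


def require_build_flag(flags: dict[str, bool], name: str) -> None:
    """Raise unless the named flag is present and ON (assumed dict lookup)."""
    if not flags.get(name, False):
        raise BuildFlagDisabledError(f"{name} is OFF; refusing construction")


class GatedSinkStub:
    """Stand-in for a gated strategy such as JsonlReplaySink."""

    BUILD_FLAG = "BUILD_REPLAY_SINK_JSONL"

    def __init__(self, flags: dict[str, bool]) -> None:
        # Refuse before acquiring any resource.
        require_build_flag(flags, self.BUILD_FLAG)


def replay_mode_available(flags: dict[str, bool]) -> bool:
    """Replay needs every replay flag ON; live mode does not depend on them."""
    return all(flags.get(name, False) for name in REPLAY_FLAGS)
```

Checking the flag in the constructor keeps the refusal at the point of construction, so a composition root can detect an unavailable replay mode at startup instead of part-way through a run.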

The shared WgsConverter (AZ-279) is constructor-injected into the tlog adapter for tlog-GPS → local-tangent-plane conversion (unchanged from v1.0.0).

Public API

Protocol: FrameSource (unchanged from v1.0.0)

@runtime_checkable
class FrameSource(Protocol):
    def next_frame(self) -> NavCameraFrame | None: ...   # None on end-of-stream
    def close(self) -> None: ...
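To illustrate the two behavioural rules attached to this Protocol (Invariant 3: out-of-order frames raise; Invariant 4: `None` only on permanent end-of-stream), here is a small in-memory sketch. `FrameStub` and `InMemoryFrameSource` are hypothetical stand-ins for `NavCameraFrame` and a real strategy; only the timestamp field is modelled.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


class FrameSourceError(Exception):
    """Stand-in for the project's FrameSourceError."""


@dataclass(frozen=True)
class FrameStub:
    """Hypothetical stand-in for NavCameraFrame; only the timestamp is modelled."""

    monotonic_ns: int


class InMemoryFrameSource:
    """Yields pre-built frames and enforces non-decreasing timestamps."""

    def __init__(self, frames: Iterable[FrameStub]) -> None:
        self._frames = iter(frames)
        self._last_ns: Optional[int] = None

    def next_frame(self) -> Optional[FrameStub]:
        try:
            frame = next(self._frames)
        except StopIteration:
            return None  # stream permanently exhausted
        if self._last_ns is not None and frame.monotonic_ns < self._last_ns:
            # Raise instead of dropping so replay stays deterministic.
            raise FrameSourceError(
                f"out-of-order frame: {frame.monotonic_ns} < {self._last_ns}"
            )
        self._last_ns = frame.monotonic_ns
        return frame

    def close(self) -> None:
        self._frames = iter(())  # later calls behave as end-of-stream
```

Equal timestamps are accepted because the invariant asks for non-decreasing, not strictly increasing, order.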

Protocol: Clock (unchanged from v1.0.0)

@runtime_checkable
class Clock(Protocol):
    def monotonic_ns(self) -> int: ...
    def time_ns(self) -> int: ...                        # wall-clock (UTC) for log timestamps
    def sleep_until_ns(self, target_ns: int) -> None: ...   # honoured in --pace realtime; no-op in --pace asap
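The difference between the two pace modes can be sketched with two small classes that satisfy the Protocol above structurally. `WallClock` here is a plausible shape, not the project's implementation, and `InputDerivedClockStub` (with its `advance_to` method and `utc_origin_ns` parameter) is a hypothetical stand-in for a clock such as TlogDerivedClock.

```python
import time


class WallClock:
    """Blocks in sleep_until_ns; suitable for pace=REALTIME."""

    def monotonic_ns(self) -> int:
        return time.monotonic_ns()

    def time_ns(self) -> int:
        return time.time_ns()

    def sleep_until_ns(self, target_ns: int) -> None:
        # Loop so the postcondition holds even if one sleep returns early.
        while True:
            remaining_ns = target_ns - time.monotonic_ns()
            if remaining_ns <= 0:
                return
            time.sleep(remaining_ns / 1e9)


class InputDerivedClockStub:
    """Hypothetical ASAP clock: time advances only when the input says so."""

    def __init__(self, utc_origin_ns: int = 0) -> None:
        self._now_ns = 0
        self._utc_origin_ns = utc_origin_ns

    def advance_to(self, ts_ns: int) -> None:
        self._now_ns = max(self._now_ns, ts_ns)  # never move backwards

    def monotonic_ns(self) -> int:
        return self._now_ns

    def time_ns(self) -> int:
        return self._utc_origin_ns + self._now_ns

    def sleep_until_ns(self, target_ns: int) -> None:
        return None  # pace=ASAP: never block
```

Because components only see the Protocol, swapping one class for the other changes pacing without touching component code.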

Protocol: ReplaySink (unchanged from v1.0.0)

@runtime_checkable
class ReplaySink(Protocol):
    def emit(self, output: EstimatorOutput) -> None: ...
    def close(self) -> None: ...
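A minimal sketch of a sink that follows the one-line-per-emit rule from Invariant 7 is shown below. It uses the standard-library `json` module in place of `orjson` to stay dependency-free, and `EstimatorOutputStub` is a hypothetical subset of the real `EstimatorOutput` (the field names come from this contract; the full schema is not reproduced here).

```python
import dataclasses
import json
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class EstimatorOutputStub:
    """Hypothetical subset of EstimatorOutput; the real schema is larger."""

    ts_ns: int
    lat: float
    lon: float
    alt: float
    horiz_accuracy: float


class JsonlSinkSketch:
    """Writes one JSON object per emit() and fsyncs on close()."""

    def __init__(self, path: str) -> None:
        self._fh = open(path, "w", encoding="utf-8")

    def emit(self, output: EstimatorOutputStub) -> None:
        # Exactly one JSON object plus a newline per call.
        self._fh.write(json.dumps(dataclasses.asdict(output)) + "\n")

    def close(self) -> None:
        if self._fh.closed:
            return  # idempotent
        self._fh.flush()
        os.fsync(self._fh.fileno())  # durability point named by Invariant 7
        self._fh.close()
```

A UI that tails the file can parse each line independently, which is the main reason for the one-object-per-line shape.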

Concrete: TlogReplayFcAdapter (unchanged from v1.0.0)

class TlogReplayFcAdapter(FcAdapter):
    def __init__(
        self,
        tlog_path: Path,
        target_fc_dialect: FcKind,            # ARDUPILOT_PLANE | INAV
        clock: Clock,
        wgs_converter: WgsConverter,
        time_offset_ms: int = 0,              # set by ReplayInputAdapter (auto-sync or --time-offset-ms)
        pace: ReplayPace = ReplayPace.ASAP,   # REALTIME | ASAP
    ): ...

The TlogReplayFcAdapter implements the full FcAdapter Protocol from AZ-261. subscribe_telemetry fans out IMU/attitude/GPS-health/flight-state from the tlog at the configured pace. emit_external_position, emit_status_text, and request_source_set_switch encode their payloads as usual and delegate to the underlying transport, so they are effectively no-ops in replay: the transport there is NoopMavlinkTransport (see below) and the bytes go nowhere. In live mode the same encoders shape the same bytes for a real wire. The encoder code path is identical; only the transport differs.

Concrete: NoopMavlinkTransport

class NoopMavlinkTransport(MavlinkTransport):
    """Outbound transport sink for replay mode.

    Accepts every `write(payload: bytes)` and `close()` call without I/O.
    Counts bytes written for observability (FDR + INFO log at close).
    """

    def write(self, payload: bytes) -> None: ...   # silent drop
    def close(self) -> None: ...
    def bytes_written(self) -> int: ...            # observability

The C8 outbound encoders (per the v1.0.0 FcAdapter protocol — emit_external_position, emit_status_text, request_source_set_switch, and the QgcTelemetryAdapter 12 Hz GCS summary) operate over a constructor-injected MavlinkTransport interface (a new tiny Protocol introduced by AZ-401 to make this swap clean). In live mode the transport is SerialMavlinkTransport writing to the UART; in replay mode it is NoopMavlinkTransport. The encoders themselves are unchanged — they produce the same byte streams, including the MAVLink 2.0 signing handshake and per-flight key rotation. The signing key is mandatory in both modes (the operator supplies a dummy key for replay; the contract does not constrain the key's provenance).

This is the single architectural point that lets us say "replay is exactly like live, only the destination differs" without baking if replay_mode: branches into C8.
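The transport swap can be sketched as follows. The `MavlinkTransport` Protocol shape is assumed from the `write` / `close` methods named above, and `emit_status_text` is a hypothetical stand-in for a C8 encoder: it writes plain ASCII, not real MAVLink framing or signing.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class MavlinkTransport(Protocol):
    """Assumed minimal shape of the outbound transport Protocol."""

    def write(self, payload: bytes) -> None: ...
    def close(self) -> None: ...


class NoopTransportSketch:
    """Drops every payload but keeps a byte count for observability."""

    def __init__(self) -> None:
        self._bytes_written = 0

    def write(self, payload: bytes) -> None:
        # No I/O: only account for what the encoder produced.
        self._bytes_written += len(payload)

    def close(self) -> None:
        # Nothing to release; a real implementation would log the total here.
        return None

    def bytes_written(self) -> int:
        return self._bytes_written


def emit_status_text(transport: MavlinkTransport, text: str) -> None:
    """Hypothetical encoder stand-in; it never inspects the transport type."""
    transport.write(text.encode("ascii"))
```

The encoder function takes any object that satisfies the Protocol, so it contains no mode check; choosing the destination is left entirely to whoever constructs the transport.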

Concrete: ReplayInputAdapter

@dataclass(frozen=True, slots=True)
class ReplayInputBundle:
    frame_source: FrameSource
    fc_adapter: FcAdapter
    clock: Clock
    resolved_time_offset_ms: int
    auto_sync_result: AutoSyncDecision | None    # None when --time-offset-ms is provided


class ReplayInputAdapter:
    """Converges (video, tlog) into the standard FrameSource + FcAdapter + Clock surfaces.

    Owns the time-alignment between video frames and tlog IMU/attitude ticks
    (manual via --time-offset-ms or automatic via AZ-405 IMU-take-off detection).
    Instantiates VideoFileFrameSource, TlogReplayFcAdapter, and the chosen Clock.
    The composition root, after calling .open(), sees no replay-specific types.
    """

    def __init__(
        self,
        *,
        video_path: Path,
        tlog_path: Path,
        camera_calibration: CameraCalibration,
        target_fc_dialect: FcKind,
        wgs_converter: WgsConverter,
        fdr_client: FdrClient,                # forwarded to TlogReplayFcAdapter + used for replay_input's own FDR records (auto-sync detected / low-confidence / AC-8 hard-fail)
        pace: ReplayPace,
        manual_time_offset_ms: int | None,    # None → auto-sync runs (AZ-405)
        auto_sync_config: AutoSyncConfig,
    ) -> None: ...

    def open(self) -> ReplayInputBundle:
        """Resolve time-offset (auto-sync or manual), build the strategies, return the bundle.

        Raises:
            ReplayInputAdapterError("tlog missing required message types: ...")
                — R-DEMO-3 fail-fast at startup.
            ReplayInputAdapterError("auto-sync hard-fail: ...")
                — AC-8 of the epic (≤ 95 % frame-window match).
            ReplayInputAdapterError("video file unreadable / unsupported codec / ...")
                — VideoFileFrameSource opening failure surfaced at coordinator scope.
        """
        ...

    def close(self) -> None: ...   # closes both inputs; idempotent

CLI surface

gps-denied-replay
  --video PATH
  --tlog PATH
  --output results.jsonl
  --camera-calibration calib.json
  --config config.yaml                    # same config schema as the airborne binary
  --mavlink-signing-key PATH              # mandatory; operator provides a dummy key for replay
  [--pace {realtime,asap}]                # default asap
  [--time-offset-ms N]                    # overrides AZ-405 auto-sync

The CLI is a thin mode-config wrapper: it loads config.yaml, sets config.mode = "replay" and the replay-specific paths/flags, and calls the same entry point the live binary uses. The shared entry point calls compose_root(config) which returns a wired runtime; the runtime's per-frame loop is unchanged between live and replay.
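A thin wrapper of this kind might look like the sketch below, built on `argparse`. The flag names mirror the CLI surface listed above; the dictionary returned by `to_replay_config` is a hypothetical stand-in for the real Config object (only `mode` and the `replay.{video_path, tlog_path, pace, time_offset_ms}` keys are named in this contract), and loading `config.yaml` is left out.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the gps-denied-replay flags listed in this contract."""
    parser = argparse.ArgumentParser(prog="gps-denied-replay")
    parser.add_argument("--video", type=Path, required=True)
    parser.add_argument("--tlog", type=Path, required=True)
    parser.add_argument("--output", type=Path, required=True)
    parser.add_argument("--camera-calibration", type=Path, required=True)
    parser.add_argument("--config", type=Path, required=True)
    parser.add_argument("--mavlink-signing-key", type=Path, required=True)
    parser.add_argument("--pace", choices=("realtime", "asap"), default="asap")
    parser.add_argument("--time-offset-ms", type=int, default=None)
    return parser


def to_replay_config(args: argparse.Namespace) -> dict:
    """Hypothetical config shape: a plain dict standing in for Config."""
    return {
        "mode": "replay",
        "replay": {
            "video_path": args.video,
            "tlog_path": args.tlog,
            "pace": args.pace,
            # None means no manual override was given.
            "time_offset_ms": args.time_offset_ms,
        },
    }
```

A call such as `to_replay_config(build_parser().parse_args(argv))` would then be handed to the shared entry point; nothing in the wrapper itself wires components.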

Composition root extension

runtime_root/__init__.py exposes a single compose_root(config) -> Runtime (no separate compose_replay). When config.mode == "replay":

  1. Build a ReplayInputAdapter from config.replay.{video_path, tlog_path, pace, time_offset_ms, …} + the same CameraCalibration and WgsConverter the live path already uses.
  2. Call replay_input.open() → ReplayInputBundle(frame_source, fc_adapter, clock, …).
  3. Pick the MavlinkTransport strategy: NoopMavlinkTransport (replay) vs. SerialMavlinkTransport (live), based on config.mode.
  4. Add a JsonlReplaySink subscriber to C5's EstimatorOutput stream (replay only). The live binary already emits to C8 outbound + QGC telemetry adapter; the JSONL sink is an additional listener, not a replacement.
  5. Wire C1-C5 + C6 + C7 + C13 exactly as in the live composition (Invariant 1: components see the same interfaces).
  6. Return the wired Runtime whose per-frame loop is the existing one (single source of truth — no per-mode loop).
loop:
  frame = frame_source.next_frame()           # VideoFileFrameSource in replay
  if frame is None: break
  c1 = vio.process(frame)                     # C1
  candidates = vpr.lookup(c1)                 # C2 (uses real C6 DescriptorIndex)
  reranked = rerank.rerank(candidates)        # C2.5
  matched = matcher.match(reranked)           # C3
  refined = refiner.refine_if_needed(matched) # C3.5
  pose = pose_estimator.estimate(refined)     # C4
  state.add_pose_anchor(pose)                 # C5
  state.add_vio(c1.vio_output)                # C5
  output = state.current_estimate()
  # multiple listeners, all wired by the composition root:
  fc_adapter.emit_external_position(output)   # → NoopMavlinkTransport in replay; SerialMavlinkTransport live
  fdr.write(output)                           # C13: ALWAYS, both modes
  if replay_sink is not None:                 # replay only
      replay_sink.emit(output)                # JsonlReplaySink → JSONL file → UI tails it

Side notes:

  • The tlog adapter's subscribe_telemetry callbacks are wired to C5's add_fc_imu and to C1's IMU prior on the same threads as in the live binary (Invariant 1 — same threads, same callbacks, different source).
  • set_takeoff_origin (AZ-490 / ADR-010) is invoked identically in replay: the operator's pre-flight C10 Manifest is the source of truth in both modes. The tlog's first GPS fix is the fallback, gated through the same Principle #11 bounded-delta check.
  • BUILD_FAISS_INDEX is ON in the airborne binary (live and replay alike). C2 in replay queries the real C6 FaissDescriptorIndex, populated by the pre-flight C10 build. This is the architectural change vs. v1.0.0 of this contract.

Composition profile: open-loop ESKF (AZ-776 / ADR-012)

The replay binary supports a second composition profile alongside the production GTSAM-iSAM2 path: open-loop ESKF. It is the Tier-2 smoke baseline for the AZ-265 replay flow and the only profile that can run end-to-end against the Derkachi clip today (the satellite-anchored profile waits on AZ-777's C6 reference tile cache).

The user-facing switch is one YAML field on the C4 block, paired with the matching C5 strategy:

c4_pose:
  enabled: false
c5_state:
  strategy: eskf

When c4_pose.enabled = false:

  • compose_root removes c4_pose from the component selection map before topological ordering; the C4 wrapper never runs, the OpenCVGtsamPoseEstimator is never instantiated, and pose_estimator.estimate(refined) in the per-frame loop above is a no-op (C5 receives IMU + VIO inputs only, no PnP anchor).
  • build_pre_constructed omits c5_isam2_graph_handle from the pre_constructed dict (no consumer requires it). The ESKF estimator is still pre-built and cached in the internal _c5_prebuilt_estimator slot exactly as in the gtsam_isam2 path; the C5 wrapper short-circuits onto the prebuilt instance per AZ-625.
  • The replay per-frame loop is otherwise unchanged — C1 VIO still runs, C5 still ingests, the JSONL sink still emits per video frame. Position drifts open-loop without satellite re-anchoring; AZ-777 closes that half of the loop by adding C2/C3/C4 against the C6 tile cache.

The 2×2 pairing matrix between c4_pose.enabled and c5_state.strategy is enforced at compose time (Invariant 13 below). The two valid cells are:

| c4_pose.enabled | c5_state.strategy | Profile | Status |
|---|---|---|---|
| true | gtsam_isam2 | Steady-state airborne (ADR-003 / ADR-009) | Production |
| false | eskf | Open-loop ESKF (this section) | Replay Tier-2 smoke baseline |

The two invalid cells (true + eskf and false + gtsam_isam2) raise CompositionError from compose_root with explicit error text naming both blocks. The Tier-2 replay fixture (tests/e2e/replay/conftest.py) writes the second valid cell. The first cell is the live + airborne default.
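The pairing check described above reduces to a set-membership test. The sketch below assumes the two config values have already been read into a bool and a string; `check_c4_c5_pairing` is a hypothetical helper name, and the `CompositionError` class is a local stand-in for the project's exception.

```python
class CompositionError(Exception):
    """Stand-in for the project's CompositionError."""


# The two valid cells of the (c4_pose.enabled, c5_state.strategy) matrix.
VALID_PAIRS = {(True, "gtsam_isam2"), (False, "eskf")}


def check_c4_c5_pairing(c4_enabled: bool, c5_strategy: str) -> None:
    """Raise CompositionError for any pairing outside the two valid cells."""
    if (c4_enabled, c5_strategy) not in VALID_PAIRS:
        # Name both blocks so the operator knows which fields to fix.
        raise CompositionError(
            f"invalid pairing: c4_pose.enabled={c4_enabled} with "
            f"c5_state.strategy={c5_strategy!r}; valid pairs are "
            "(true, gtsam_isam2) and (false, eskf)"
        )
```

As written, the sketch also rejects strategy names outside the matrix; whether the real `compose_root` treats unknown strategies the same way is not stated in this contract.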

Invariants

  1. Mode-agnostic C1-C7, C13: production components MUST NOT contain if config.mode == "replay": branches. Mode-specific behaviour lives in the strategies (FrameSource / FcAdapter / MavlinkTransport / ReplaySink / Clock). Verified by an explicit grep guard in CI (the AZ-404 E2E test owns this assertion).

  2. Single Clock per process: compose_root resolves Clock exactly once at startup. All time-driven logic (AC-5.2 fallback timer, STATUSTEXT rate-limits, key rotation logging) consumes the injected Clock via constructor — never time.monotonic_ns() directly. Verified by an AST scan in CI for direct time.monotonic_ns / time.time_ns references in components/**/*.py.
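An AST scan of the kind Invariant 2 mentions could be sketched with the standard-library `ast` module. The function name `find_direct_clock_calls` and the exact set of patterns it flags are assumptions; the project's CI check may cover more cases (for example, aliased imports of the `time` module).

```python
import ast

# Direct references that components are not allowed to make.
FORBIDDEN = {("time", "monotonic_ns"), ("time", "time_ns")}


def find_direct_clock_calls(source: str) -> list[tuple[int, str]]:
    """Return sorted (line, dotted_name) pairs for forbidden time.* references."""
    hits: list[tuple[int, str]] = []
    for node in ast.walk(ast.parse(source)):
        # Matches attribute access such as time.monotonic_ns
        if (
            isinstance(node, ast.Attribute)
            and isinstance(node.value, ast.Name)
            and (node.value.id, node.attr) in FORBIDDEN
        ):
            hits.append((node.lineno, f"{node.value.id}.{node.attr}"))
        # Matches "from time import monotonic_ns"
        elif isinstance(node, ast.ImportFrom) and node.module == "time":
            for alias in node.names:
                if ("time", alias.name) in FORBIDDEN:
                    hits.append((node.lineno, f"time.{alias.name}"))
    return sorted(hits)
```

Calls made through an injected object, such as `clock.monotonic_ns()`, are not flagged because the receiver name is not `time`.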

  3. Frame source ordering: next_frame() returns frames in monotonically non-decreasing monotonic_ns order. Out-of-order frames raise FrameSourceError (NOT silently dropped — replay must be deterministic).

  4. End-of-stream is None: next_frame() returns None ONLY when the stream is permanently exhausted. Transient I/O failures raise FrameSourceError.

  5. Outbound MAVLink encoders are mode-agnostic: the C8 outbound encoders for GPS_INPUT / MSP2_SENSOR_GPS / STATUSTEXT / NAMED_VALUE_FLOAT / MAV_CMD_SET_EKF_SOURCE_SET produce identical byte streams in both modes. Only the MavlinkTransport strategy differs (Serial vs. Noop). The MAVLink 2.0 signing handshake runs in replay too (the operator provides a dummy signing key); the signing bytes are produced and then dropped by NoopMavlinkTransport. Verified by a unit test that captures the encoder output in both modes and diffs the byte streams.

  6. Pace mode honoured by Clock: pace=REALTIME → Clock.sleep_until_ns(target_ns) blocks until wall-clock catches up; pace=ASAP → no-op. The pace flag is consumed ONLY by the Clock and the tlog adapter; components see only the Clock Protocol.

  7. JsonlReplaySink one-line-per-emit: each emit(output) writes exactly one JSON object + newline; the file is fsync'd on close(). Schema matches EstimatorOutput (frozen dataclass serialised via dataclasses.asdict + orjson.dumps).

  8. Time-offset resolved before composition: the ReplayInputAdapter resolves time_offset_ms (auto-sync or manual) and locks it into the TlogReplayFcAdapter constructor before compose_root returns the wired runtime. No live re-tuning.

  9. Build-flag gating: VideoFileFrameSource, TlogReplayFcAdapter, JsonlReplaySink, NoopMavlinkTransport MUST refuse construction when their respective BUILD_* flag is OFF (per ADR-002). In the airborne binary all four flags are ON by default; setting any of them OFF in airborne disables replay mode (the binary still runs live mode normally).

  10. Determinism: same (video, tlog, config, time_offset_ms, pace=ASAP) input → same JSONL output, with at most 1e-6 float drift in position fields (AC-5).
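A determinism check along these lines can be sketched as a line-by-line comparison of two JSONL outputs. The position field names and the reading of "1e-6" as an absolute per-field tolerance are assumptions; the contract does not fix either.

```python
import json
import math

# Assumed position field names; the real EstimatorOutput schema may differ.
POSITION_FIELDS = ("lat", "lon", "alt")


def jsonl_outputs_match(a_lines: list[str], b_lines: list[str],
                        tol: float = 1e-6) -> bool:
    """True when both runs have the same rows and positions agree within tol."""
    if len(a_lines) != len(b_lines):
        return False
    for line_a, line_b in zip(a_lines, b_lines):
        row_a, row_b = json.loads(line_a), json.loads(line_b)
        for field in POSITION_FIELDS:
            # Absolute tolerance only; rel_tol=0 disables relative scaling.
            if not math.isclose(row_a[field], row_b[field],
                                rel_tol=0.0, abs_tol=tol):
                return False
    return True
```

A row-count mismatch is treated as a failure because a dropped or duplicated tick is a determinism break even when every shared row agrees.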

  11. MAVLink signing key required in replay: the airborne binary refuses to run without --mavlink-signing-key PATH in both modes. In replay the operator supplies a dummy file (well-formed key bytes; no real channel to verify against). This preserves Invariant 5 — the encoders' signing code path runs identically in both modes.

  12. Real C6 cache in replay: the airborne binary in replay mode reads the same pre-built C6 tile cache the operator built via the normal pre-flight C10/C11/C12 flow. There is no replay-specific cache shape. Verified by the AZ-404 E2E fixture, which runs the operator's pre-flight flow before invoking the replay CLI.

    Sub-invariant 12.a (cycle 3 — AZ-839 / Epic AZ-835 C3): the e2e operator_pre_flight_setup fixture replaces the cycle-1 mkdir placeholder with a real driver that wires C1 (replay_input.tlog_route.extract_route_from_tlog — AZ-836) + C2 (c11_tile_manager.route_client.SatelliteProviderRouteClient.seed_route — AZ-838) + C11 (tile_downloader.HttpTileDownloader.download_for_bbox) + C10 (DescriptorBatcher) to populate C6 from a tlog-derived corridor. The fixture yields a PopulatedC6Cache dataclass (cache_root, tile_store_path, faiss_index_path, faiss_sidecar_sha256_path, faiss_sidecar_meta_path, route_spec, tile_count, elapsed_seconds). The cache is mounted into a named docker volume that survives across pytest sessions (cold first invocation populates; subsequent invocations within the same compose session reuse — warm cache). Cold-start budget: ≤ 5 min on Tier-2 Jetson; warm: ≤ 30 s. Sidecar triple-consistency (.index + .sha256 + .meta.json) per AZ-306 is verified at every fixture yield; mismatch raises IndexUnavailableError. The C12 production binding for the route-driven path is a future-cycle integration; production pre-flight still uses the bbox-driven download_tiles_for_area path today.

    Sub-invariant 12.c (cycle 3, Epic AZ-835: route-driven supersedes bbox): route-driven seeding (operator's tlog-derived RouteSpec → POST /api/satellite/route → corridor materialised by satellite-provider) supersedes the legacy AZ-777 bbox-driven approach (POST /api/satellite/request over a fixed lat/lon box) for the real-flight validation path. The supersedure rationale is twofold:

    • Tile efficiency (~100×): the AZ-777 bbox for a typical Derkachi-style flight produces ~11,400 z15-z18 tiles (~140 MB, 48 % over the C6 cache budget). A 10-point coarsened route with regionSizeMeters=500 per point produces ~50-100 unique tiles (~1.5 MB) for the same VPR descriptor lock area. The route-driven path is the only one that fits the AZ-696 reference-fixture budget on Jetson.
    • Pre-commitment honesty: a bbox pre-commits to where the operator might fly. A route pre-commits to where they did fly. For real-flight validation against ground-truth GPS, the latter is the right primitive — it ensures the FAISS index is populated with descriptors of the tiles the airborne pipeline will actually query, not a superset whose VPR misses are statistically indistinguishable from the AZ-696 AC-3 ≤ 100 m threshold violations.
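The "~100×" figure can be checked directly from the numbers quoted in the tile-efficiency bullet. The snippet below is plain arithmetic on those approximate figures; the implied cache budget on the last line is an inference from "48 % over", not a value stated in this contract.

```python
# Figures quoted in Sub-invariant 12.c (all approximate in the source).
BBOX_TILES = 11_400
ROUTE_TILES_LOW, ROUTE_TILES_HIGH = 50, 100
BBOX_MB, ROUTE_MB = 140.0, 1.5

tile_ratio_min = BBOX_TILES / ROUTE_TILES_HIGH  # route at its largest
tile_ratio_max = BBOX_TILES / ROUTE_TILES_LOW   # route at its smallest
size_ratio = BBOX_MB / ROUTE_MB

# Inference, not stated in the contract: "48 % over budget" at ~140 MB
# implies a budget of roughly 140 / 1.48 MB.
implied_budget_mb = BBOX_MB / 1.48

print(f"tiles: {tile_ratio_min:.0f}x to {tile_ratio_max:.0f}x fewer")
print(f"bytes: ~{size_ratio:.0f}x smaller")
print(f"implied C6 budget: ~{implied_budget_mb:.0f} MB")
```

The tile count falls by a factor of 114 to 228 and the byte size by roughly 93, which is consistent with the rounded "~100×" claim.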

    AZ-777 Phase 1 (e2e-runner wiring + C11 read-contract adaptation) is retained and reused by Epic AZ-835. AZ-777 Phases 3 and 5 are superseded by Epic AZ-835 children (AZ-839 for the operator-fixture rewrite, AZ-842 for the docs work). Phase 4 (un-xfail of AC-4/AC-5) was deferred to backlog after cycle-4 AZ-895 took the un-xfail target along a different path; it is not on the active epic.

    Sub-invariant 12.d (cycle 3 — AZ-839 / Epic AZ-835 C3: fixture failure-handling contract): the operator_pre_flight_setup fixture must distinguish three failure classes from SatelliteProviderRouteClient.seed_route / HttpTileDownloader.download_for_bbox and surface them honestly:

    | Class | Source | Fixture response |
    |---|---|---|
    | Validation | RouteValidationError (pre-emptive AZ-809 bound violation) or IndexUnavailableError (sidecar triple mismatch at yield-time) | Re-raise: operator/test author error, no remediation in the fixture |
    | Terminal | RouteTerminalFailureError (satellite-provider rejected the route id or status polling returned mapsReady=false past poll_max_attempts) | Re-raise: service-side state cannot be recovered by retry |
    | Transient | RouteTransientError or TileDownloadError with HTTP 5xx / network reset | Retry up to 3 attempts using C11's existing exponential backoff schedule (HttpTileDownloader.RETRY_* constants); re-raise on exhaustion |

    The fixture does NOT swallow transient failures silently — the third attempt's exception surfaces with the full retry history in the message so the test report can distinguish "fixture genuinely tried 3×" from "fixture short-circuited". Cold-start budget of ≤ 5 min on Tier-2 Jetson is measured wall-clock around the entire retry loop, not per-attempt.
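The three failure classes of Sub-invariant 12.d can be sketched as a retry wrapper that catches only the transient class. The exception classes below are local stand-ins for the project's types, `run_with_retry` is a hypothetical helper, and the doubling delay is an assumed schedule; the real values live in the `HttpTileDownloader.RETRY_*` constants, which are not reproduced here.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RouteValidationError(Exception):
    """Stand-in: operator or test-author error; never retried."""


class RouteTerminalFailureError(Exception):
    """Stand-in: service-side state that retry cannot recover."""


class RouteTransientError(Exception):
    """Stand-in: HTTP 5xx or network reset; retried."""


def run_with_retry(operation: Callable[[], T], *, max_attempts: int = 3,
                   base_delay_s: float = 1.0,
                   sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry transient failures; let validation and terminal errors propagate."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    history: list[str] = []
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except RouteTransientError as exc:
            history.append(f"attempt {attempt}: {exc}")
            if attempt == max_attempts:
                # Surface the full retry history, as 12.d requires.
                raise RouteTransientError(
                    f"exhausted {max_attempts} attempts; " + "; ".join(history)
                ) from exc
            sleep(base_delay_s * 2 ** (attempt - 1))  # assumed backoff shape
    raise AssertionError("unreachable")
```

Because only `RouteTransientError` is caught, the other two classes are re-raised on the first attempt without any special handling.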

    Sub-invariant 12.b (cycle 3 — AZ-840 / Epic AZ-835 C4): the E2E orchestrator test tests/e2e/replay/test_az835_e2e_real_flight.py takes only (tlog, video, calibration) and runs the full 7-step pipeline end-to-end on Tier-2 Jetson — no operator hand-curation between steps. The 7 steps are: (1) active flight cut + tlog/video sync via AZ-405; (2) on-fly frame + IMU extraction; (3) auto-create route via AZ-836; (4) POST route to satellite-provider via the C3 fixture's operator_pre_flight_setup (delegates to AZ-838); (5) build FAISS index (driven by C3); (6) run gps-denied airborne pipeline against the populated cache + tlog/video/calibration (reuses the airborne composition root path AZ-699 exercises); (7) compute horizontal-error distribution and emit the AZ-699 verdict report at _docs/06_metrics/real_flight_validation_<YYYY-MM-DD>.md. The verdict report is emitted ALWAYS, regardless of PASS / FAIL on the AZ-696 ≥ 80 % within 100 m gate — the success criterion is that the report exists with the honest distribution, not that the verdict is PASS. Gated by RUN_REPLAY_E2E=1 + @pytest.mark.tier2.
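Step 7 of the pipeline (horizontal-error distribution plus the "≥ 80 % within 100 m" gate) can be sketched as below. The haversine formula on a spherical Earth is a swapped-in approximation chosen for self-containment; the project presumably uses its WgsConverter. The function names and the report dictionary keys are hypothetical.

```python
import math
import statistics

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def horizontal_error_verdict(estimates: list[tuple[float, float]],
                             truths: list[tuple[float, float]],
                             threshold_m: float = 100.0,
                             required_fraction: float = 0.80) -> dict:
    """Summarise per-tick horizontal error and apply the gate."""
    if not estimates or len(estimates) != len(truths):
        raise ValueError("need equal-length, non-empty inputs")
    errors = [haversine_m(e[0], e[1], t[0], t[1])
              for e, t in zip(estimates, truths)]
    fraction = sum(1 for err in errors if err <= threshold_m) / len(errors)
    return {
        "count": len(errors),
        "median_m": statistics.median(errors),
        "max_m": max(errors),
        "fraction_within": fraction,
        "passed": fraction >= required_fraction,
    }
```

The function returns the distribution summary whether or not the gate passes, which matches the rule that the report is always emitted.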

  13. C4↔C5 pairing matrix is enforced at compose time (AZ-776 / ADR-012): compose_root rejects the two off-diagonal cells of the (c4_pose.enabled, c5_state.strategy) matrix with a CompositionError naming both blocks. enabled=False + gtsam_isam2 and enabled=True + eskf are forbidden. The two valid cells are enabled=True + gtsam_isam2 (production steady-state per ADR-003 / ADR-009) and enabled=False + eskf (open-loop ESKF — replay Tier-2 smoke baseline; satellite anchoring deferred to AZ-777). Verified by tests/unit/runtime_root/test_az776_open_loop_eskf_composition.py AC-3a and AC-3b.

  14. Single canonical clock & CSV-driven replay path (cycle 4 — AZ-894 / AZ-895 / AZ-896): production runs as a single edge process on a single device. There is exactly one wall/monotonic clock authoritative for timestamps that cross component boundaries — the clock at the C8 inbound boundary (FcAdapter) where IMU windows enter the system. Two-clock surfaces — for example a C1 VioOutput.emitted_at_ns derived from the Jetson monotonic_ns() paired against a C8 ImuWindow.ts_end_ns derived from FC-boot — produced the AZ-848 ESKF out-of-order regression observed in cycle 3 (Jetson clock advanced between IMU window arrival and VIO emission, so the VIO emission timestamp routinely landed before the IMU window's ts_end_ns when the two were compared as if on the same axis, and ESKF rejected its own VIO updates). All downstream timestamps (EstimatorOutput.ts_ns, JsonlReplaySink per-row t, FDR flight_event.ts_ns) MUST derive from a single canonical clock that produces deterministic per-record values for a given input. In live mode the canonical clock is the C8 inbound IMU window's FC-boot-relative timestamp; in replay mode it is the CSV row's Time column.

    Sub-invariant 14.a (CSV-driven replay path — AZ-894): the replay-mode operator input is (video, CSV). The CSV row's Time column is the canonical clock for the entire replay run: every IMU window emitted by the new csv_replay_input.CsvReplayInputAdapter (gated BUILD_CSV_REPLAY_ADAPTER=ON in the airborne and research binaries) carries ts_end_ns derived from the CSV Time column; the Clock strategy injected into the composition root is CsvDerivedClock which uses the same column. There is no auto-sync (see 14.c below). The CSV must satisfy the format spec at _docs/02_document/contracts/replay/csv_replay_format.md (AZ-896) — including the requirement that row 0's Time equals video frame 0 (t=0) so the airborne pipeline does not need to apply any per-stream offset.
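The row-0 alignment rule can be sketched as a small validator over the `Time` column. The unit (seconds), the non-decreasing check, and the names `load_time_axis_ns` and `CsvReplayFormatError` are assumptions made for illustration; the authoritative rules are in the AZ-896 format spec, which is not reproduced here.

```python
import csv
import io


class CsvReplayFormatError(ValueError):
    """Hypothetical error type for format violations."""


def load_time_axis_ns(csv_text: str) -> list[int]:
    """Return the Time column in nanoseconds, enforcing row 0 == 0."""
    times_ns: list[int] = []
    for index, row in enumerate(csv.DictReader(io.StringIO(csv_text))):
        # Assumption: Time is expressed in seconds.
        t_ns = round(float(row["Time"]) * 1e9)
        if index == 0 and t_ns != 0:
            raise CsvReplayFormatError(
                f"row 0 Time must be 0 (video frame 0), got {row['Time']}"
            )
        if times_ns and t_ns < times_ns[-1]:
            # Assumption: the canonical clock never runs backwards.
            raise CsvReplayFormatError(f"row {index} Time goes backwards")
        times_ns.append(t_ns)
    if not times_ns:
        raise CsvReplayFormatError("CSV has no data rows")
    return times_ns
```

Rejecting a misaligned file at load time keeps the pipeline free of per-stream offsets, since every later timestamp is read straight from this axis.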

    Sub-invariant 14.b (tlog adapter audit-only role, AZ-895): TlogReplayFcAdapter (the replay FcAdapter strategy of the prior cycles' design) is retained in source for two audit / migration paths and removed from the replay test/demo critical path:

    • FDR analysis: one-shot tlog parsing for incident review (e.g. AZ-848 timestamp investigation) — invoked from offline analysis scripts under tools/, not from the airborne composition root.
    • One-shot tlog → CSV export: a CLI utility (gps-denied-tlog-to-csv) that reads a pymavlink tlog and writes the canonical CSV per AZ-896. This is the migration ramp for users who only have legacy tlog inputs.

    The previous compose_root(config={"mode": "replay", "replay_input.adapter": "tlog"}) code path is preserved with a one-cycle deprecation warning on startup; removal is tracked in AZ-908 (cycle-5+ backlog). The CSV adapter (BUILD_CSV_REPLAY_ADAPTER=ON) is the default and the only path the e2e fixture suite exercises after cycle 4.

    Sub-invariant 14.c (auto-sync deprecation, AZ-895): the replay_input.auto_sync module (AZ-405) is reduced to a deprecated stub that raises ReplayInputAdapterError("auto-sync removed; supply --imu CSV instead") from every public entry point. The CLI flags --time-offset-ms, --skip-auto-sync, and --auto-trim are still accepted, but each emits a deprecation warning and its value is ignored. The justification: with a single canonical clock at the CSV row level (14.a), there is no second clock to align against. The operator authors the CSV with the correct row-0 alignment, and the fixture verifies row 0's Time == 0. Hard removal of the deprecated surface is tracked in AZ-908; this cycle ships only the stub + warnings to preserve source-compat for any downstream caller built against AZ-405's pre-deprecation shape.
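The "accepted with a warning and ignored" behaviour can be sketched with `argparse` and the `warnings` module. The flag names come from this sub-invariant and from the `--video X --imu Y` shape; treating `--skip-auto-sync` and `--auto-trim` as boolean switches, and the function name `parse_replay_args`, are assumptions.

```python
import argparse
import warnings


def parse_replay_args(argv: list[str]) -> argparse.Namespace:
    """Parse the cycle-4 CLI shape while tolerating deprecated flags."""
    parser = argparse.ArgumentParser(prog="gps-denied-replay")
    parser.add_argument("--video", required=True)
    parser.add_argument("--imu", required=True)
    # Deprecated surface: still parsed so old invocations do not fail.
    parser.add_argument("--time-offset-ms", type=int, default=None)
    parser.add_argument("--skip-auto-sync", action="store_true")
    parser.add_argument("--auto-trim", action="store_true")
    args = parser.parse_args(argv)

    deprecated_used = {
        "--time-offset-ms": args.time_offset_ms is not None,
        "--skip-auto-sync": args.skip_auto_sync,
        "--auto-trim": args.auto_trim,
    }
    for flag, used in deprecated_used.items():
        if used:
            warnings.warn(f"{flag} is deprecated and ignored",
                          DeprecationWarning, stacklevel=2)

    # Reset the values so nothing downstream can depend on them.
    args.time_offset_ms = None
    args.skip_auto_sync = False
    args.auto_trim = False
    return args
```

Resetting the parsed values after warning makes "ignored" a property of the returned namespace, not something each consumer has to remember.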

    Sub-invariant 14.d (operator-facing UI — AZ-897, future cycle): the cycle-4 deliverable is the headless gps-denied-replay --video X --imu Y shape. An operator-facing web UI (single-page React + Tailwind form that uploads a paired (video, CSV) and tails the verdict) is tracked separately in AZ-897 and is NOT on the critical path of the CSV redesign; this sub-invariant exists only to record that the format spec (AZ-896) and the CSV adapter (AZ-894) MUST stay UI-friendly (CSV example, format docs link, clear error messages on row-0-misalignment) so AZ-897 lands without contract drift.

Producer / Consumer Split

Task ID Scope
AZ-398 (Producer) FrameSource Protocol; Clock Protocol; VideoFileFrameSource (gated BUILD_VIDEO_FILE_FRAME_SOURCE); LiveCameraFrameSource retrofit (rename existing camera-ingest plumbing into the Protocol shape — no behaviour change); WallClock + TlogDerivedClock strategies; composition wiring in compose_root (Clock = WallClock in live, picked per-pace in replay). NO tlog parsing, NO sink, NO replay coordinator.
AZ-399 (Consumer 1) TlogReplayFcAdapter: pymavlink stream-parser (DO NOT materialise; R-DEMO-2 throughput floor); maps tlog message types → FcTelemetryFrame; supports both AP and iNav dialects; subscribe_telemetry fan-out at the configured pace; respects time_offset_ms; honours Clock for pacing; outbound emit_* methods delegate to constructor-injected MavlinkTransport (Invariant 5); fail-fast at startup if required message types absent (R-DEMO-3).
AZ-400 (Consumer 2) ReplaySink Protocol + JsonlReplaySink (one JSON object per line; orjson serialiser; close() fsyncs). Also: MavlinkTransport Protocol cut-out + NoopMavlinkTransport strategy + SerialMavlinkTransport retrofit (rename the existing C8 transport code into the Protocol shape — no behaviour change).
AZ-401 (Consumer 3) Extend compose_root(config) with a config.mode = "live" | "replay" branch: in replay mode, builds the ReplayInputAdapter, picks NoopMavlinkTransport, adds the JsonlReplaySink listener on C5's EstimatorOutput stream, and otherwise wires C1–C7 + C13 identically to live. Build-flag check at startup. NO separate compose_replay function (replay is a configuration of the single composition root).
AZ-402 (Consumer 4) gps-denied-replay CLI: argparse, config + calibration loader, sets config.mode = "replay", dispatches into the same companion entry point as live; structured-error exit codes (0=success, 2=AC-8 sync-impossible from ReplayInputAdapter.open(), 1=any other error).
AZ-404 (Consumer 6) E2E replay fixture test: tests/e2e/replay/test_derkachi_1min.py runs the CLI against a 1–2 min Derkachi clip + matching tlog; asserts AC-3 (≤ 100 m for ≥ 80 % of ticks); gated by RUN_REPLAY_E2E=1 in CI. Asserts Invariant 1 (no if config.mode == "replay" branches in components) via an AST scan.
AZ-405 (Consumer 7) Auto-sync of video ↔ tlog via IMU take-off detection. Lives inside replay_input/ (this task creates the module): take-off pattern (sustained vertical accel > 0.5 g + change in attitude rate > 1 rad/s lasting ≥ 0.5 s) + video motion-onset; confidence-scored; falls back to WARN + best-guess if < 80 %; --time-offset-ms always overrides; AC-8 hard-fail (exit 2) if neither auto-detect nor manual offset produces > 95 % frame-window match. The ReplayInputAdapter coordinator is also defined and implemented by this task (it is the natural home for the auto-sync logic — the coordinator owns the time-alignment concern, and auto-sync is one of the two ways the offset is resolved).
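
The AZ-400 row describes a sink that writes one JSON object per line and fsyncs on close(). A minimal sketch of that shape follows. The method name `write`, the class name `JsonlSinkSketch` and the record layout are assumptions, and the standard-library `json` module is used here in place of the orjson serialiser named above so that the example has no third-party dependency.

```python
import json
import os
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class ReplaySink(Protocol):
    """Assumed minimal surface of the sink Protocol (method names are illustrative)."""

    def write(self, record: dict[str, Any]) -> None: ...

    def close(self) -> None: ...


class JsonlSinkSketch:
    """Writes one compact JSON object per line; close() flushes and fsyncs."""

    def __init__(self, path: str) -> None:
        self._fh = open(path, "w", encoding="utf-8")

    def write(self, record: dict[str, Any]) -> None:
        # Compact separators keep each record on exactly one line.
        self._fh.write(json.dumps(record, separators=(",", ":")) + "\n")

    def close(self) -> None:
        if self._fh.closed:
            return  # idempotent: a second close() is a no-op
        self._fh.flush()
        os.fsync(self._fh.fileno())  # force the data to disk before closing
        self._fh.close()
```

A line-per-record file is what lets a separate process tail the output while the replay is still running, since every complete line is a complete record.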

AZ-403 (formerly: replay-cli Dockerfile + SBOM diff CI step) is CANCELLED: the replay-cli Docker image no longer exists under v2.0.0. The airborne Docker image IS the replay image; no SBOM diff is needed because there are no components to assert as absent. See _docs/02_tasks/done/AZ-403_replay_dockerfile_ci.md (cancellation banner) and the ADR-011 amendment in architecture.md.

Constraints

  • @runtime_checkable on all Protocols; DTOs frozen=True, slots=True.
  • Lazy-import per ADR-002 with the new BUILD_VIDEO_FILE_FRAME_SOURCE, BUILD_TLOG_REPLAY_ADAPTER, BUILD_REPLAY_SINK_JSONL flags. All three flags are ON in the airborne binary (production-default); OFF in the operator-orchestrator binary; the research binary mirrors airborne (ON).
  • C1–C7 + C13 components MUST remain mode-agnostic (Invariant 1).
  • All time-driven logic in components MUST consume the injected Clock (Invariant 2).
  • No HTTP server in the airborne binary regardless of mode (the parent-suite UI shells out to the CLI and tails the JSONL file; an HTTP server is deferred until the subprocess shape is proven insufficient).
  • pymavlink bundled unmodified per D-C8-3.
  • The tlog parser MUST stream-parse — never materialise the entire tlog into memory (R-DEMO-2; multi-GB tlogs).
  • MAVLink 2.0 signing key is mandatory in both modes (Invariant 11). The replay run reuses the live binary's per-flight key-load code path; the operator supplies a dummy key file.

Risks / Mitigations

  • R-DEMO-1 (tlog ↔ video timestamp drift / unsynchronised recordings): auto-sync via IMU take-off detection (AC-7) + --time-offset-ms manual override. Fixed-wing hand-launch fallback documented. Owned by replay_input/ per AZ-405.
  • R-DEMO-2 (pymavlink slow on multi-GB tlogs): stream-parse, never materialise. Throughput floor benchmarked + documented in CI.
  • R-DEMO-3 (demo footage missing required FC messages): ReplayInputAdapter.open(...) fails fast at startup, listing missing message types and the components that need them.
  • R-DEMO-4 (production C1–C5 paths bake real-time-cadence assumptions): Clock injection (Invariants 1, 2). Captured in ADR-011 (architecture.md).
  • R-DEMO-5 (new in v2.0.0) (live and replay diverge silently because the modes share a composition root): mitigated by Invariant 1 (no mode-aware branches in components) + Invariant 5 (encoders are byte-identical) + the AZ-404 E2E test asserting both invariants on every PR. The single composition root is the single point of mode awareness.
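
The R-DEMO-5 mitigation leans on an AST scan that rejects mode-aware branches in components. The actual AZ-404 scan is not reproduced in this document; the following is a minimal sketch of one way such a check could work, with `find_mode_branches` as a hypothetical helper name.

```python
import ast


def find_mode_branches(source: str, filename: str = "<component>") -> list[int]:
    """Return line numbers of if / ternary / while tests that read a `.mode` attribute."""
    hits: list[int] = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, (ast.If, ast.IfExp, ast.While)):
            # Look anywhere inside the condition, e.g. `config.mode == "replay"`.
            for sub in ast.walk(node.test):
                if isinstance(sub, ast.Attribute) and sub.attr == "mode":
                    hits.append(node.lineno)
                    break
    return sorted(hits)
```

This sketch is deliberately coarse: it flags any `.mode` attribute in a condition, including unrelated ones such as a file object's mode, so a real check would likely need an allow-list or a narrower match on the config object.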

Notes for the Implementer

  • The LiveCameraFrameSource retrofit is a no-op restructure: the existing camera-ingest thread becomes a class implementing FrameSource. Its behaviour is unchanged. This is what allows C1 to consume FrameSource via constructor without becoming replay-aware.
  • The SerialMavlinkTransport retrofit (introduced by AZ-400) is a no-op restructure: the existing pymavlink transport code becomes a class implementing the new tiny MavlinkTransport Protocol. Its behaviour is unchanged. This is what allows C8 outbound encoders to remain identical between live and replay.
  • The TlogReplayFcAdapter's subscribe_telemetry fan-out runs on a dedicated thread (mirroring the live PymavlinkArdupilotAdapter decode-thread semantics). C1 and C5 see identical thread boundaries in live and replay (Invariant 1).
  • The Clock Protocol is the SAME interface in live and replay — only the strategy differs. This is the single Liskov-clean line that lets components consume Clock without knowing the mode.
  • The ReplayInputAdapter lives at src/gps_denied_onboard/replay_input/__init__.py (public) + tlog_video_adapter.py (concrete) + auto_sync.py (AZ-405 logic). It is a Layer-4 module per module-layout.md (it imports from Layer 1 frame_source/ and clock/ interfaces, and instantiates Layer-4 strategies from c8_fc_adapter/). The composition root imports the public API of replay_input/ only; it does not reach into the coordinator's internals.
  • The parent-suite UI demo flow: operator plans a route in the suite UI → C12 builds the cache → operator runs gps-denied-replay --video ... --tlog ... --output results.jsonl → UI tails results.jsonl and renders per-tick (lat, lon, alt, horiz_accuracy). The operator's pre-flight workflow is identical to a live flight up until the final "fly" step. This is the user-confirmed design intent.