# Contract: Replay Mode (`replay_input` module + `FrameSource` + `Clock` + `ReplaySink` + `NoopMavlinkTransport`)

**Owner**: replay (epic AZ-265 / E-DEMO-REPLAY) — strategies live inside existing components (`frame_source/`, `clock/`, `c8_fc_adapter/`); a small new `replay_input/` cross-cutting module converges `(video, tlog)` inputs into the standard `FrameSource` + `FcAdapter` boundaries the rest of the system already consumes.

**Producer task**: AZ-398 (`FrameSource` Protocol + `VideoFileFrameSource` + `LiveCameraFrameSource` retrofit + `Clock` Protocol)

**Consumer tasks**: AZ-399 (TlogReplayFcAdapter), AZ-400 (ReplaySink + JsonlReplaySink + NoopMavlinkTransport), AZ-401 (replay-mode branch in `compose_root`), AZ-402 (gps-denied-replay CLI wrapper), AZ-404 (E2E replay fixture test), AZ-405 (Auto-sync IMU take-off detection inside `replay_input/`).

**Version**: 2.0.0 (replaces v1.0.0 — "replay is a fourth Docker image" design replaced by "replay is a configuration of the airborne binary"; see ADR-011)

**Status**: draft

**Last Updated**: 2026-05-14

**Module-layout home**:

- `src/gps_denied_onboard/frame_source/interface.py`, `__init__.py` — `FrameSource` Protocol (Layer 1 cross-cutting per `module-layout.md`).
- `src/gps_denied_onboard/clock/interface.py`, `__init__.py` — `Clock` Protocol (Layer 1 cross-cutting).
- `src/gps_denied_onboard/components/c8_fc_adapter/tlog_replay_adapter.py` — `TlogReplayFcAdapter` strategy (gated `BUILD_TLOG_REPLAY_ADAPTER`; ON in the airborne binary).
- `src/gps_denied_onboard/components/c8_fc_adapter/replay_sink.py` — `ReplaySink` Protocol + `JsonlReplaySink` strategy (gated `BUILD_REPLAY_SINK_JSONL`; ON in the airborne binary).
- `src/gps_denied_onboard/components/c8_fc_adapter/noop_mavlink_transport.py` — `NoopMavlinkTransport` strategy (gated `BUILD_REPLAY_SINK_JSONL`; ON in the airborne binary; wraps the live MAVLink transport layer so C8 encoders are unchanged).
- `src/gps_denied_onboard/replay_input/` — new Layer-4 cross-cutting coordinator that owns `(video, tlog)` → `(FrameSource, FcAdapter, Clock)` convergence + auto-sync + time-offset application.
- `src/gps_denied_onboard/runtime_root/__init__.py` — `compose_root(config)` extended with a `config.mode = "live" | "replay"` branch (no separate `compose_replay` composition root; replay is a configuration of the single airborne composition root).
- `src/gps_denied_onboard/cli/replay.py` — `gps-denied-replay` console-script: builds a replay-mode `Config` and dispatches into the same companion entry point as live.

## Purpose

Defines the public interfaces enabling **offline replay mode** per epic AZ-265: run the production C1–C5 pipeline (with the full C6 tile cache + the same C7 inference runtime + the same C13 FDR) against historical inputs (1–2 min Derkachi-style clip + matching pymavlink `.tlog`) so the parent-suite UI demo has end-to-end fidelity equal to a live flight.

**Design (v2.0.0 — replaces v1.0.0)**: replay is a **configuration of the airborne binary**, not a separate Docker image. See ADR-011 for the full rationale.
The same image, same components, same composition root, same pre-flight workflow as a live flight; only the following strategies differ at runtime:

| Concern | Live strategy | Replay strategy |
|---|---|---|
| `FrameSource` | `LiveCameraFrameSource` | `VideoFileFrameSource` |
| `FcAdapter` (inbound IMU/attitude/GPS/flight-state) | `PymavlinkArdupilotAdapter` / `Msp2InavAdapter` | `TlogReplayFcAdapter` |
| `FcAdapter` outbound transport (the bytes that go onto the wire) | Real serial/UART link to ArduPilot Plane / iNav | `NoopMavlinkTransport` (sink; C8 encoders unchanged) |
| `Clock` | `WallClock` | `TlogDerivedClock` (pace=ASAP) or `WallClock` (pace=REALTIME) |
| Per-tick position observable to the UI | C8 outbound + GCS telemetry summary | Additional `JsonlReplaySink` tap on C5's `EstimatorOutput` stream |

Everything else is identical: C6 reads the same pre-built tile cache the operator built via the normal C10/C11/C12 pre-flight flow; C7 deserializes the same TensorRT engines; C13 writes a real FDR for the replay run (a real flight record, just driven by historical inputs). Production C1–C5 components remain **mode-agnostic** — replay-aware logic lives ONLY in the composition root branch, the strategies named above, the `replay_input/` coordinator, and the CLI.

The user-visible result: a UI consumer tails the JSONL file and sees per-tick `(lat, lon, alt, horiz_accuracy)` exactly as the airborne binary would emit them in a real flight. Other MAVLink emits (FC GPS_INPUT, GCS STATUSTEXT, EKF source-set commands) are swallowed by `NoopMavlinkTransport` — the operator confirmed they don't need to be observable in replay (this contract is the single source of truth for that decision).

This contract defines four Protocols (`FrameSource`, `Clock`, `ReplaySink`, `MavlinkTransport`), the replay-only strategies built on them, one coordinator class, and the replay-mode composition branch:

- **`FrameSource`** — formalised cross-cutting interface for camera-frame ingestion. Two strategies: `LiveCameraFrameSource` (live) and `VideoFileFrameSource` (replay; gated `BUILD_VIDEO_FILE_FRAME_SOURCE`).
- **`Clock`** — wall-clock vs. tlog-derived time abstraction (R-DEMO-4 mitigation). Two strategies: `WallClock` (live/research/operator/replay-realtime) and `TlogDerivedClock` (replay-asap).
- **`ReplaySink`** — offline `EstimatorOutput` consumer interface tapping C5's output stream. One strategy: `JsonlReplaySink` (one `EstimatorOutput` per JSONL line; gated `BUILD_REPLAY_SINK_JSONL`).
- **`TlogReplayFcAdapter`** — replay-only `FcAdapter` strategy (per AZ-261 `FcAdapter` Protocol from `_docs/02_document/contracts/c8_fc_adapter/fc_adapter_protocol.md`); parses pymavlink `.tlog` and emits `ImuWindow` / `AttitudeWindow` / `GpsHealth` / `FlightStateSignal` at tlog-timestamp cadence (or wall-clock-paced per `--pace`). Gated `BUILD_TLOG_REPLAY_ADAPTER`.
- **`NoopMavlinkTransport`** — replay-only outbound transport that swallows every byte the C8 encoders try to write. The C8 outbound encoder code path is **unchanged** between live and replay (Invariant 1); the transport layer is the only place the destination differs. Gated `BUILD_REPLAY_SINK_JSONL` (shares the build flag with `JsonlReplaySink` — both are "where does this binary send its outputs in replay" concerns).
- **`ReplayInputAdapter`** — Layer-4 coordinator class in `replay_input/` that owns `(video, tlog)` lifecycle, applies the time-offset (manual via `--time-offset-ms` or auto via AZ-405 IMU-take-off detection), instantiates `VideoFileFrameSource` + `TlogReplayFcAdapter` + chosen `Clock`, and hands the trio to the composition root. The composition root sees only standard `FrameSource` + `FcAdapter` + `Clock` after the coordinator is opened.

The shared `WgsConverter` (AZ-279) is constructor-injected into the tlog adapter for tlog-GPS → local-tangent-plane conversion (unchanged from v1.0.0).
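As a rough illustration of the transport swap described in the list above, the sketch below runs one encoder against two transports and checks that the bytes are identical. It is a minimal sketch under stated assumptions, not the C8 implementation: `CapturingTransport` is a hypothetical test double standing in for the live serial transport, and `emit_status_text` uses a toy framing that is NOT real MAVLink encoding.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class MavlinkTransport(Protocol):
    def write(self, payload: bytes) -> None: ...
    def close(self) -> None: ...


class NoopMavlinkTransport:
    """Replay sink: drops every payload but counts bytes for observability."""

    def __init__(self) -> None:
        self._bytes_written = 0

    def write(self, payload: bytes) -> None:
        self._bytes_written += len(payload)  # counted, then dropped

    def close(self) -> None:
        pass

    def bytes_written(self) -> int:
        return self._bytes_written


class CapturingTransport:
    """Hypothetical test double standing in for the live serial transport."""

    def __init__(self) -> None:
        self.chunks: list[bytes] = []

    def write(self, payload: bytes) -> None:
        self.chunks.append(payload)

    def close(self) -> None:
        pass


def emit_status_text(transport: MavlinkTransport, text: str) -> bytes:
    """Toy encoder (assumed framing, not MAVLink): has no knowledge of the mode."""
    body = text.encode("ascii", errors="replace")[:50]
    payload = b"\xfd" + len(body).to_bytes(1, "big") + body
    transport.write(payload)
    return payload


# The same encoder call is made in both "modes"; only the transport differs.
noop, capture = NoopMavlinkTransport(), CapturingTransport()
replay_bytes = emit_status_text(noop, "GPS-denied fix OK")
live_bytes = emit_status_text(capture, "GPS-denied fix OK")
```

Because the encoder only sees the `MavlinkTransport` surface, a byte-for-byte comparison of `replay_bytes` and `live_bytes` is enough to show that no mode branch leaked into it.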
## Public API

### Protocol: `FrameSource` (unchanged from v1.0.0)

```python
@runtime_checkable
class FrameSource(Protocol):
    def next_frame(self) -> NavCameraFrame | None: ...  # None on end-of-stream
    def close(self) -> None: ...
```

### Protocol: `Clock` (unchanged from v1.0.0)

```python
@runtime_checkable
class Clock(Protocol):
    def monotonic_ns(self) -> int: ...
    def time_ns(self) -> int: ...  # wall-clock (UTC) for log timestamps
    def sleep_until_ns(self, target_ns: int) -> None: ...  # honoured in --pace realtime; no-op in --pace asap
```

### Protocol: `ReplaySink` (unchanged from v1.0.0)

```python
@runtime_checkable
class ReplaySink(Protocol):
    def emit(self, output: EstimatorOutput) -> None: ...
    def close(self) -> None: ...
```

### Concrete: `TlogReplayFcAdapter` (unchanged from v1.0.0)

```python
class TlogReplayFcAdapter(FcAdapter):
    def __init__(
        self,
        tlog_path: Path,
        target_fc_dialect: FcKind,  # ARDUPILOT_PLANE | INAV
        clock: Clock,
        wgs_converter: WgsConverter,
        time_offset_ms: int = 0,  # set by ReplayInputAdapter (auto-sync or --time-offset-ms)
        pace: ReplayPace = ReplayPace.ASAP,  # REALTIME | ASAP
    ): ...
```

The `TlogReplayFcAdapter` implements the **full** `FcAdapter` Protocol from AZ-261. `subscribe_telemetry` fans out IMU/attitude/GPS-health/flight-state from the tlog at the configured pace. `emit_external_position`, `emit_status_text`, and `request_source_set_switch` encode as usual and **delegate to the underlying transport**, which makes them effective no-ops in replay: the transport there is `NoopMavlinkTransport` (see below), so the bytes go nowhere; in live mode the same encoders shape the same bytes for a real wire. The encoder code path is identical; only the transport differs.

### Concrete: `NoopMavlinkTransport`

```python
class NoopMavlinkTransport(MavlinkTransport):
    """Outbound transport sink for replay mode.

    Accepts every `write(payload: bytes)` and `close()` call without I/O.
    Counts bytes written for observability (FDR + INFO log at close).
    """

    def write(self, payload: bytes) -> None: ...  # silent drop
    def close(self) -> None: ...
    def bytes_written(self) -> int: ...  # observability
```

The C8 outbound encoders (per the v1.0.0 `FcAdapter` protocol — `emit_external_position`, `emit_status_text`, `request_source_set_switch`, and the `QgcTelemetryAdapter` 1–2 Hz GCS summary) operate over a constructor-injected `MavlinkTransport` interface (a new tiny Protocol cut out by AZ-400 to make this swap clean). In live mode the transport is `SerialMavlinkTransport` writing to the UART; in replay mode it is `NoopMavlinkTransport`. **The encoders themselves are unchanged** — they produce the same byte streams, including the MAVLink 2.0 signing handshake and per-flight key rotation. The signing key is mandatory in both modes (the operator supplies a dummy key for replay; the contract does not constrain the key's provenance).

This is the single architectural point that lets us say "replay is exactly like live, only the destination differs" without baking `if replay_mode:` branches into C8.

### Concrete: `ReplayInputAdapter`

```python
@dataclass(frozen=True, slots=True)
class ReplayInputBundle:
    frame_source: FrameSource
    fc_adapter: FcAdapter
    clock: Clock
    resolved_time_offset_ms: int
    auto_sync_result: AutoSyncDecision | None  # None when --time-offset-ms is provided


class ReplayInputAdapter:
    """Converges (video, tlog) into the standard FrameSource + FcAdapter + Clock surfaces.

    Owns the time-alignment between video frames and tlog IMU/attitude ticks
    (manual via --time-offset-ms or automatic via AZ-405 IMU-take-off detection).
    Instantiates VideoFileFrameSource, TlogReplayFcAdapter, and the chosen Clock.
    The composition root, after calling .open(), sees no replay-specific types.
    """

    def __init__(
        self,
        *,
        video_path: Path,
        tlog_path: Path,
        camera_calibration: CameraCalibration,
        target_fc_dialect: FcKind,
        wgs_converter: WgsConverter,
        fdr_client: FdrClient,  # forwarded to TlogReplayFcAdapter + used for replay_input's own FDR records (auto-sync detected / low-confidence / AC-8 hard-fail)
        pace: ReplayPace,
        manual_time_offset_ms: int | None,  # None → auto-sync runs (AZ-405)
        auto_sync_config: AutoSyncConfig,
    ) -> None: ...

    def open(self) -> ReplayInputBundle:
        """Resolve time-offset (auto-sync or manual), build the strategies, return the bundle.

        Raises:
            ReplayInputAdapterError("tlog missing required message types: ...") — R-DEMO-3 fail-fast at startup.
            ReplayInputAdapterError("auto-sync hard-fail: ...") — AC-8 of the epic (≤ 95 % frame-window match).
            ReplayInputAdapterError("video file unreadable / unsupported codec / ...") — VideoFileFrameSource opening failure surfaced at coordinator scope.
        """
        ...

    def close(self) -> None: ...  # closes both inputs; idempotent
```

### CLI surface

```
gps-denied-replay --video PATH --tlog PATH --output results.jsonl
                  --camera-calibration calib.json
                  --config config.yaml            # same config schema as the airborne binary
                  --mavlink-signing-key PATH      # mandatory; operator provides a dummy key for replay
                  [--pace {realtime,asap}]        # default asap
                  [--time-offset-ms N]            # overrides AZ-405 auto-sync
```

The CLI is a thin **mode-config wrapper**: it loads `config.yaml`, sets `config.mode = "replay"` and the replay-specific paths/flags, and calls the **same** entry point the live binary uses. The shared entry point calls `compose_root(config)` which returns a wired runtime; the runtime's per-frame loop is unchanged between live and replay.

### Composition root extension

`runtime_root/__init__.py` exposes a single `compose_root(config) -> Runtime` (no separate `compose_replay`). When `config.mode == "replay"`:

1. Build a `ReplayInputAdapter` from `config.replay.{video_path, tlog_path, pace, time_offset_ms, …}` + the same `CameraCalibration` and `WgsConverter` the live path already uses.
2. Call `replay_input.open()` → `ReplayInputBundle(frame_source, fc_adapter, clock, …)`.
3. Pick the `MavlinkTransport` strategy: `NoopMavlinkTransport` (replay) vs. `SerialMavlinkTransport` (live), based on `config.mode`.
4. Add a `JsonlReplaySink` subscriber to C5's `EstimatorOutput` stream (replay only). The live binary already emits to C8 outbound + QGC telemetry adapter; the JSONL sink is an additional listener, not a replacement.
5. Wire C1–C5 + C6 + C7 + C13 exactly as in the live composition (Invariant 1 — components see the same interfaces).
6. Return the wired `Runtime` whose per-frame loop is the existing one (single source of truth — no per-mode loop).

```
loop:
    frame = frame_source.next_frame()              # VideoFileFrameSource in replay
    if frame is None: break
    c1 = vio.process(frame)                        # C1
    candidates = vpr.lookup(c1)                    # C2 (uses real C6 DescriptorIndex)
    reranked = rerank.rerank(candidates)           # C2.5
    matched = matcher.match(reranked)              # C3
    refined = refiner.refine_if_needed(matched)    # C3.5
    pose = pose_estimator.estimate(refined)        # C4
    state.add_pose_anchor(pose)                    # C5
    state.add_vio(c1.vio_output)                   # C5
    output = state.current_estimate()
    # multiple listeners, all wired by the composition root:
    fc_adapter.emit_external_position(output)      # → NoopMavlinkTransport in replay; SerialMavlinkTransport live
    fdr.write(output)                              # C13: ALWAYS, both modes
    if replay_sink is not None:                    # replay only
        replay_sink.emit(output)                   # JsonlReplaySink → JSONL file → UI tails it
```

Side notes:

- The tlog adapter's `subscribe_telemetry` callbacks are wired to C5's `add_fc_imu` and to C1's IMU prior on the same threads as in the live binary (Invariant 1 — same threads, same callbacks, different source).
- `set_takeoff_origin` (AZ-490 / ADR-010) is invoked identically in replay: the operator's pre-flight C10 Manifest is the source of truth in both modes. The tlog's first GPS fix is the **fallback**, gated through the same Principle #11 bounded-delta check.
- `BUILD_FAISS_INDEX` is ON in the airborne binary (live and replay alike). C2 in replay queries the **real** C6 `FaissDescriptorIndex`, populated by the pre-flight C10 build. This is the architectural change vs. v1.0.0 of this contract.

### Composition profile: open-loop ESKF (AZ-776 / ADR-012)

The replay binary supports a second composition profile alongside the production GTSAM-iSAM2 path: **open-loop ESKF**. It is the Tier-2 smoke baseline for the AZ-265 replay flow and the only profile that can run end-to-end against the Derkachi clip today (the satellite-anchored profile waits on AZ-777's C6 reference tile cache).

The user-facing switch is the `enabled` field on the C4 block, paired with the matching C5 strategy:

```
c4_pose:
  enabled: false
c5_state:
  strategy: eskf
```

When `c4_pose.enabled = false`:

- `compose_root` removes `c4_pose` from the component selection map before topological ordering; the C4 wrapper never runs, the `OpenCVGtsamPoseEstimator` is never instantiated, and `pose_estimator.estimate(refined)` in the per-frame loop above is a no-op (C5 receives IMU + VIO inputs only, no PnP anchor).
- `build_pre_constructed` omits `c5_isam2_graph_handle` from the `pre_constructed` dict (no consumer requires it). The ESKF estimator is still pre-built and cached in the internal `_c5_prebuilt_estimator` slot exactly as in the gtsam_isam2 path; the C5 wrapper short-circuits onto the prebuilt instance per AZ-625.
- The replay per-frame loop is otherwise unchanged — C1 VIO still runs, C5 still ingests, the JSONL sink still emits per video frame. Position drifts open-loop without satellite re-anchoring; AZ-777 closes that half of the loop by adding C2/C3/C4 against the C6 tile cache.
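A minimal sketch of how the profile switch above might be applied at compose time: a guard rejects mismatched `c4_pose.enabled` / `c5_state.strategy` pairs (Invariant 13), and `c4_pose` is dropped from the selection when disabled. The function names, the plain-dict config shape, and the component names in the usage lines are assumptions for illustration; the real `compose_root` internals are not specified here.

```python
class CompositionError(Exception):
    """Stand-in for the error type named in this contract."""


# The two valid (c4_pose.enabled, c5_state.strategy) cells.
VALID_PAIRINGS = {(True, "gtsam_isam2"), (False, "eskf")}


def validate_c4_c5_pairing(config: dict) -> None:
    """Raise CompositionError for the two off-diagonal cells (hypothetical helper)."""
    enabled = config["c4_pose"]["enabled"]
    strategy = config["c5_state"]["strategy"]
    if (enabled, strategy) not in VALID_PAIRINGS:
        raise CompositionError(
            f"invalid pairing: c4_pose.enabled={enabled!r} "
            f"with c5_state.strategy={strategy!r}"
        )


def select_components(config: dict, components: list) -> list:
    """Validate the pairing, then drop c4_pose when it is disabled."""
    validate_c4_c5_pairing(config)
    if not config["c4_pose"]["enabled"]:
        return [name for name in components if name != "c4_pose"]
    return list(components)


# Usage with assumed component names.
open_loop = {"c4_pose": {"enabled": False}, "c5_state": {"strategy": "eskf"}}
selected = select_components(open_loop, ["c1_vio", "c4_pose", "c5_state"])
```

Running the guard before any component is constructed keeps the failure at startup, where the error text can name both config blocks.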
The 2×2 pairing matrix between `c4_pose.enabled` and `c5_state.strategy` is enforced at compose time (Invariant 13 below). The two valid cells are:

| `c4_pose.enabled` | `c5_state.strategy` | Profile | Status |
|---|---|---|---|
| `true` | `gtsam_isam2` | Steady-state airborne (ADR-003 / ADR-009) | Production |
| `false` | `eskf` | Open-loop ESKF (this section) | Replay Tier-2 smoke baseline |

The two **invalid** cells (`true` + `eskf` and `false` + `gtsam_isam2`) raise `CompositionError` from `compose_root` with explicit error text naming both blocks. The Tier-2 replay fixture (`tests/e2e/replay/conftest.py`) writes the second valid cell. The first cell is the live + airborne default.

## Invariants

1. **Mode-agnostic C1–C7, C13**: production components MUST NOT contain `if config.mode == "replay":` branches. Mode-specific behaviour lives in the strategies (FrameSource / FcAdapter / MavlinkTransport / ReplaySink / Clock). Verified by an explicit grep guard in CI (the AZ-404 E2E test owns this assertion).
2. **Single `Clock` per process**: `compose_root` resolves `Clock` exactly once at startup. All time-driven logic (AC-5.2 fallback timer, STATUSTEXT rate-limits, key rotation logging) consumes the injected `Clock` via constructor — never `time.monotonic_ns()` directly. Verified by an AST scan in CI for direct `time.monotonic_ns` / `time.time_ns` references in `components/**/*.py`.
3. **Frame source ordering**: `next_frame()` returns frames in monotonically non-decreasing `monotonic_ns` order. Out-of-order frames raise `FrameSourceError` (NOT silently dropped — replay must be deterministic).
4. **End-of-stream is None**: `next_frame()` returns `None` ONLY when the stream is permanently exhausted. Transient I/O failures raise `FrameSourceError`.
5. **Outbound MAVLink encoders are mode-agnostic**: the C8 outbound encoders for `GPS_INPUT` / `MSP2_SENSOR_GPS` / `STATUSTEXT` / `NAMED_VALUE_FLOAT` / `MAV_CMD_SET_EKF_SOURCE_SET` produce identical byte streams in both modes. Only the `MavlinkTransport` strategy differs (Serial vs. Noop). The MAVLink 2.0 signing handshake runs in replay too (the operator provides a dummy signing key); the signing bytes are produced and then dropped by `NoopMavlinkTransport`. Verified by a unit test that captures the encoder output in both modes and diffs the byte streams.
6. **Pace mode honoured by Clock**: `pace=REALTIME` → `Clock.sleep_until_ns(target_ns)` blocks until wall-clock catches up; `pace=ASAP` → no-op. The pace flag is consumed ONLY by the `Clock` and the tlog adapter — components see only the `Clock` Protocol.
7. **JsonlReplaySink one-line-per-emit**: each `emit(output)` writes exactly one JSON object + newline; the file is fsync'd on `close()`. Schema matches `EstimatorOutput` (frozen dataclass serialised via `dataclasses.asdict` + `orjson.dumps`).
8. **Time-offset resolved before composition**: the `ReplayInputAdapter` resolves `time_offset_ms` (auto-sync or manual) and locks it into the `TlogReplayFcAdapter` constructor before `compose_root` returns the wired runtime. No live re-tuning.
9. **Build-flag gating**: `VideoFileFrameSource`, `TlogReplayFcAdapter`, `JsonlReplaySink`, `NoopMavlinkTransport` MUST refuse construction when their respective `BUILD_*` flag is OFF (per ADR-002). In the airborne binary all three flags (`BUILD_VIDEO_FILE_FRAME_SOURCE`, `BUILD_TLOG_REPLAY_ADAPTER`, `BUILD_REPLAY_SINK_JSONL`, the last shared by `JsonlReplaySink` and `NoopMavlinkTransport`) are ON by default; setting any of them OFF in airborne disables replay mode (the binary still runs live mode normally).
10. **Determinism**: same `(video, tlog, config, time_offset_ms, pace=ASAP)` input → same JSONL output within ≤ 1e-6 float drift in position fields (AC-5).
11. **MAVLink signing key required in replay**: the airborne binary refuses to run without `--mavlink-signing-key PATH` in both modes.
In replay the operator supplies a dummy file (well-formed key bytes; no real channel to verify against). This preserves Invariant 5 — the encoders' signing code path runs identically in both modes.
12. **Real C6 cache in replay**: the airborne binary in replay mode reads the same pre-built C6 tile cache the operator built via the normal pre-flight C10/C11/C12 flow. There is no replay-specific cache shape. Verified by the AZ-404 E2E fixture, which runs the operator's pre-flight flow before invoking the replay CLI.

**Sub-invariant 12.a (cycle 3 — AZ-839 / Epic AZ-835 C3)**: the e2e `operator_pre_flight_setup` fixture replaces the cycle-1 `mkdir` placeholder with a real driver that wires C1 (`replay_input.tlog_route.extract_route_from_tlog` — AZ-836) + C2 (`c11_tile_manager.route_client.SatelliteProviderRouteClient.seed_route` — AZ-838) + C11 (`tile_downloader.HttpTileDownloader.download_for_bbox`) + C10 (`DescriptorBatcher`) to populate C6 from a tlog-derived corridor. The fixture yields a `PopulatedC6Cache` dataclass (`cache_root`, `tile_store_path`, `faiss_index_path`, `faiss_sidecar_sha256_path`, `faiss_sidecar_meta_path`, `route_spec`, `tile_count`, `elapsed_seconds`). The cache is mounted into a named docker volume that survives across pytest sessions (cold first invocation populates; subsequent invocations within the same compose session reuse — warm cache). Cold-start budget: ≤ 5 min on Tier-2 Jetson; warm: ≤ 30 s. Sidecar triple-consistency (`.index` + `.sha256` + `.meta.json`) per AZ-306 is verified at every fixture yield; mismatch raises `IndexUnavailableError`. The C12 production binding for the route-driven path is a future-cycle integration; production pre-flight still uses the bbox-driven `download_tiles_for_area` path today.
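The sidecar triple-consistency check in Sub-invariant 12.a could look roughly like the sketch below. The sidecar formats are assumptions, since AZ-306 is not reproduced here: the `.sha256` file is assumed to hold the hex digest of the `.index` file as its first token, and `.meta.json` is assumed to be a JSON object. `verify_sidecar_triple` is a hypothetical helper name.

```python
import hashlib
import json
from pathlib import Path


class IndexUnavailableError(Exception):
    """Stand-in for the error type named in this contract."""


def verify_sidecar_triple(index_path: Path, sha256_path: Path, meta_path: Path) -> dict:
    """Check that all three files exist and agree; return the parsed metadata."""
    for path in (index_path, sha256_path, meta_path):
        if not path.is_file():
            raise IndexUnavailableError(f"missing sidecar member: {path}")

    # Hash the index in 1 MiB chunks so a large index is never fully in memory.
    digest = hashlib.sha256()
    with index_path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)

    # Assumed format: first whitespace-separated token is the hex digest.
    tokens = sha256_path.read_text().split()
    if not tokens or tokens[0].lower() != digest.hexdigest():
        raise IndexUnavailableError(f"sha256 mismatch for {index_path}")

    try:
        meta = json.loads(meta_path.read_text())
    except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
        raise IndexUnavailableError(f"unreadable metadata: {meta_path}") from exc
    if not isinstance(meta, dict):
        raise IndexUnavailableError(f"metadata is not a JSON object: {meta_path}")
    return meta
```

A fixture would call this once per yield with the three paths from `PopulatedC6Cache` and let the exception propagate unchanged.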
**Sub-invariant 12.c (cycle 3 — Epic AZ-835: route-driven supersedes bbox)**: route-driven seeding (operator's tlog-derived `RouteSpec` → `POST /api/satellite/route` → corridor materialised by `satellite-provider`) supersedes the legacy AZ-777 bbox-driven approach (`POST /api/satellite/request` over a fixed lat/lon box) for the real-flight validation path. The supersedure rationale is twofold:

- **Tile efficiency (~100×)**: the AZ-777 bbox for a typical Derkachi-style flight produces ~11,400 z15-z18 tiles (~140 MB, 48 % over the C6 cache budget). A 10-point coarsened route with `regionSizeMeters=500` per point produces ~50-100 unique tiles (~1.5 MB) for the same VPR descriptor lock area. The route-driven path is the only one that fits the AZ-696 reference-fixture budget on Jetson.
- **Pre-commitment honesty**: a bbox pre-commits to where the operator *might* fly. A route pre-commits to where they *did* fly. For real-flight validation against ground-truth GPS, the latter is the right primitive — it ensures the FAISS index is populated with descriptors of the tiles the airborne pipeline will actually query, not a superset whose VPR misses are statistically indistinguishable from the AZ-696 AC-3 ≤ 100 m threshold violations.

AZ-777 Phase 1 (e2e-runner wiring + C11 read-contract adaptation) is **retained and reused** by Epic AZ-835. AZ-777 Phases 3 and 5 are **superseded** by Epic AZ-835 children (AZ-839 for the operator-fixture rewrite, AZ-842 for the docs work). Phase 4 (un-xfail of AC-4/AC-5) was deferred to backlog after cycle-4 AZ-895 took the un-xfail target along a different path; it is not on the active epic.
**Sub-invariant 12.d (cycle 3 — AZ-839 / Epic AZ-835 C3: fixture failure-handling contract)**: the `operator_pre_flight_setup` fixture must distinguish three failure classes from `SatelliteProviderRouteClient.seed_route` / `HttpTileDownloader.download_for_bbox` and surface them honestly:

| Class | Source | Fixture response |
|-------|--------|------------------|
| Validation | `RouteValidationError` (pre-emptive AZ-809 bound violation) or `IndexUnavailableError` (sidecar triple mismatch at yield-time) | Re-raise — operator/test author error, no remediation in the fixture |
| Terminal | `RouteTerminalFailureError` (satellite-provider rejected the route id or status polling returned `mapsReady=false` past `poll_max_attempts`) | Re-raise — service-side state cannot be recovered by retry |
| Transient | `RouteTransientError` or `TileDownloadError` with HTTP 5xx / network reset | **Retry up to 3 attempts** using C11's existing exponential backoff schedule (`HttpTileDownloader.RETRY_*` constants); re-raise on exhaustion |

The fixture does NOT swallow transient failures silently — the third attempt's exception surfaces with the full retry history in the message so the test report can distinguish "fixture genuinely tried 3×" from "fixture short-circuited". Cold-start budget of ≤ 5 min on Tier-2 Jetson is measured wall-clock around the entire retry loop, not per-attempt.

**Sub-invariant 12.b (cycle 3 — AZ-840 / Epic AZ-835 C4)**: the E2E orchestrator test `tests/e2e/replay/test_az835_e2e_real_flight.py` takes only `(tlog, video, calibration)` and runs the full 7-step pipeline end-to-end on Tier-2 Jetson — no operator hand-curation between steps. The 7 steps are: (1) active flight cut + tlog/video sync via AZ-405; (2) on-fly frame + IMU extraction; (3) auto-create route via AZ-836; (4) POST route to satellite-provider via the C3 fixture's `operator_pre_flight_setup` (delegates to AZ-838); (5) build FAISS index (driven by C3); (6) run gps-denied airborne pipeline against the populated cache + tlog/video/calibration (reuses the airborne composition root path AZ-699 exercises); (7) compute horizontal-error distribution and emit the AZ-699 verdict report at `_docs/06_metrics/real_flight_validation_.md`. The verdict report is emitted ALWAYS, regardless of PASS / FAIL on the AZ-696 ≥ 80 % within 100 m gate — the success criterion is that the report exists with the honest distribution, not that the verdict is PASS. Gated by `RUN_REPLAY_E2E=1` + `@pytest.mark.tier2`.

13. **C4↔C5 pairing matrix is enforced at compose time** (AZ-776 / ADR-012): `compose_root` rejects the two off-diagonal cells of the (`c4_pose.enabled`, `c5_state.strategy`) matrix with a `CompositionError` naming both blocks. `enabled=False` + `gtsam_isam2` and `enabled=True` + `eskf` are forbidden. The two valid cells are `enabled=True` + `gtsam_isam2` (production steady-state per ADR-003 / ADR-009) and `enabled=False` + `eskf` (open-loop ESKF — replay Tier-2 smoke baseline; satellite anchoring deferred to AZ-777). Verified by `tests/unit/runtime_root/test_az776_open_loop_eskf_composition.py` AC-3a and AC-3b.
14. **Single canonical clock & CSV-driven replay path (cycle 4 — AZ-894 / AZ-895 / AZ-896)**: production runs as a single edge process on a single device. There is exactly **one** wall/monotonic clock authoritative for timestamps that cross component boundaries — the clock at the C8 inbound boundary (`FcAdapter`) where IMU windows enter the system.
Two-clock surfaces — for example a C1 `VioOutput.emitted_at_ns` derived from the Jetson `monotonic_ns()` paired against a C8 `ImuWindow.ts_end_ns` derived from FC-boot — produced the AZ-848 ESKF out-of-order regression observed in cycle 3 (Jetson clock advanced between IMU window arrival and VIO emission, so the VIO emission timestamp routinely landed *before* the IMU window's `ts_end_ns` when the two were compared as if on the same axis, and ESKF rejected its own VIO updates). All downstream timestamps (`EstimatorOutput.ts_ns`, `JsonlReplaySink` per-row `t`, FDR `flight_event.ts_ns`) MUST derive from a single canonical clock that produces deterministic per-record values for a given input. In live mode the canonical clock is the C8 inbound IMU window's FC-boot-relative timestamp; in replay mode it is the CSV row's `Time` column.

**Sub-invariant 14.a (CSV-driven replay path — AZ-894)**: the replay-mode operator input is `(video, CSV)`. The CSV row's `Time` column is the canonical clock for the entire replay run: every IMU window emitted by the new `csv_replay_input.CsvReplayInputAdapter` (gated `BUILD_CSV_REPLAY_ADAPTER=ON` in the airborne and research binaries) carries `ts_end_ns` derived from the CSV `Time` column; the `Clock` strategy injected into the composition root is `CsvDerivedClock` which uses the same column. There is no auto-sync (see 14.c below). The CSV must satisfy the format spec at `_docs/02_document/contracts/replay/csv_replay_format.md` (AZ-896) — including the requirement that row 0's `Time` equals video frame 0 (`t=0`) so the airborne pipeline does not need to apply any per-stream offset.

**Sub-invariant 14.b (tlog adapter audit-only role — AZ-895)**: `TlogReplayFcAdapter` (Sub-invariant 14 of the prior cycles' design) is retained in source for two audit / migration paths and removed from the replay test/demo critical path:

- **FDR analysis**: one-shot tlog parsing for incident review (e.g. AZ-848 timestamp investigation) — invoked from offline analysis scripts under `tools/`, not from the airborne composition root.
- **One-shot tlog → CSV export**: a CLI utility (`gps-denied-tlog-to-csv`) that reads a pymavlink tlog and writes the canonical CSV per AZ-896. This is the migration ramp for users who only have legacy tlog inputs.

The previous `compose_root(config={"mode": "replay", "replay_input.adapter": "tlog"})` code path is preserved with a one-cycle deprecation warning on startup; removal is tracked in AZ-908 (cycle-5+ backlog). The CSV adapter (`BUILD_CSV_REPLAY_ADAPTER=ON`) is the default and the only path the e2e fixture suite exercises after cycle 4.

**Sub-invariant 14.c (auto-sync deprecation — AZ-895)**: the `replay_input.auto_sync` module (AZ-405) is reduced to a deprecated no-op stub that raises `ReplayInputAdapterError("auto-sync removed; supply --imu CSV instead")` from every public entry point. The CLI flags `--time-offset-ms`, `--skip-auto-sync`, and `--auto-trim` are accepted with a deprecation warning and ignored. The justification: with a single canonical clock at the CSV row level (14.a), there is no second clock to align against — the operator authors the CSV with the correct row-0 alignment, and the fixture verifies row 0's `Time == 0`. Hard removal of the deprecated surface is tracked in AZ-908; this cycle ships only the stub + warnings to preserve source-compat for any downstream caller built against AZ-405's pre-deprecation shape.

**Sub-invariant 14.d (operator-facing UI — AZ-897, superseded by Invariant 15)**: retained for historical cycle-4 CSV-only upload spec. Default demo entry is now F11 / AZ-969.

15. **Operator demo replay path (cycle 5 — AZ-969 / F11)**: the default product demo accepts raw `(video, tlog, calibration)` from the suite UI. Alignment is operator-visible (dual timeline bars + explicit refine); the backend exports an AZ-896 CSV whose `Time` column is the single canonical replay clock (Invariant 14.a).
Steps: preview timelines (AZ-970) → coarse align + refine (AZ-897, AZ-971) → export CSV (AZ-972) → seed corridor cache from tlog GPS (AZ-974) → run `gps-denied-replay` (AZ-973) → map + verdict. The `(video, pre-authored CSV)` bypass (AZ-959) is optional, not default. E2E tests MUST use the same orchestration modules as production — no parallel test-only graph. AZ-908 (hard removal of alignment stubs) is deferred until AZ-971 ships.

## Producer / Consumer Split

| Task ID | Scope |
|---------|-------|
| AZ-398 (Producer) | `FrameSource` Protocol; `Clock` Protocol; `VideoFileFrameSource` (gated `BUILD_VIDEO_FILE_FRAME_SOURCE`); `LiveCameraFrameSource` retrofit (rename existing camera-ingest plumbing into the Protocol shape — no behaviour change); `WallClock` + `TlogDerivedClock` strategies; composition wiring in `compose_root` (Clock = WallClock in live, picked per-pace in replay). NO tlog parsing, NO sink, NO replay coordinator. |
| AZ-399 (Consumer 1) | `TlogReplayFcAdapter`: pymavlink stream-parser (DO NOT materialise; R-DEMO-2 throughput floor); maps tlog message types → `FcTelemetryFrame`; supports both AP and iNav dialects; `subscribe_telemetry` fan-out at the configured pace; respects `time_offset_ms`; honours `Clock` for pacing; outbound `emit_*` methods delegate to constructor-injected `MavlinkTransport` (Invariant 5); fail-fast at startup if required message types absent (R-DEMO-3). |
| AZ-400 (Consumer 2) | `ReplaySink` Protocol + `JsonlReplaySink` (one JSON object per line; orjson serialiser; `close()` fsyncs). **Also**: `MavlinkTransport` Protocol cut-out + `NoopMavlinkTransport` strategy + `SerialMavlinkTransport` retrofit (rename the existing C8 transport code into the Protocol shape — no behaviour change). |
| AZ-401 (Consumer 3) | Extend `compose_root(config)` with a `config.mode = "live" \| "replay"` branch: in replay mode, builds the `ReplayInputAdapter`, picks `NoopMavlinkTransport`, adds the `JsonlReplaySink` listener on C5's `EstimatorOutput` stream, and otherwise wires C1–C7 + C13 identically to live. Build-flag check at startup. NO separate `compose_replay` function (replay is a configuration of the single composition root). |
| AZ-402 (Consumer 4) | `gps-denied-replay` CLI: argparse, config + calibration loader, sets `config.mode = "replay"`, dispatches into the same companion entry point as live; structured-error exit codes (0=success, 2=AC-8 sync-impossible from `ReplayInputAdapter.open()`, 1=any other error). |
| AZ-404 (Consumer 6) | E2E replay fixture test: `tests/e2e/replay/test_derkachi_1min.py` — runs the CLI against a 1–2 min Derkachi clip + matching tlog; asserts AC-3 (≤ 100 m for ≥ 80 % of ticks); gated by `RUN_REPLAY_E2E=1` in CI. Asserts Invariant 1 (no `if config.mode == "replay"` branches in components) via an AST scan. |
| AZ-405 (Consumer 7) | Auto-sync of video ↔ tlog via IMU take-off detection. Lives **inside `replay_input/`** (this task creates the module): take-off pattern (sustained vertical accel > 0.5 g + change in attitude rate > 1 rad/s lasting ≥ 0.5 s) + video motion-onset; confidence-scored; falls back to WARN + best-guess if < 80 %; `--time-offset-ms` always overrides; AC-8 hard-fail (exit 2) if neither auto-detect nor manual offset produces > 95 % frame-window match. The `ReplayInputAdapter` coordinator is also defined and implemented by this task (it is the natural home for the auto-sync logic — the coordinator owns the time-alignment concern, and auto-sync is one of the two ways the offset is resolved). |

**AZ-403 (formerly: replay-cli Dockerfile + SBOM diff CI step) is CANCELLED**: the replay-cli Docker image no longer exists under v2.0.0.
The airborne Docker image IS the replay image; no SBOM diff is needed because there are no components to assert as absent. See `_docs/02_tasks/done/AZ-403_replay_dockerfile_ci.md` (cancellation banner) and the ADR-011 amendment in `architecture.md`.

## Constraints

- `@runtime_checkable` on all Protocols; DTOs `frozen=True, slots=True`.
- Lazy-import per ADR-002 with the new `BUILD_VIDEO_FILE_FRAME_SOURCE`, `BUILD_TLOG_REPLAY_ADAPTER`, `BUILD_REPLAY_SINK_JSONL` flags. All three flags are ON in the airborne binary (production-default); OFF in the operator-orchestrator binary; the research binary mirrors airborne (ON).
- C1–C7 + C13 components MUST remain mode-agnostic (Invariant 1).
- All time-driven logic in components MUST consume the injected `Clock` (Invariant 2).
- No HTTP server in the airborne binary regardless of mode (parent-suite UI shells out to the CLI and tails the JSONL file; defer until the subprocess shape is proven insufficient).
- pymavlink bundled unmodified per D-C8-3.
- The tlog parser MUST stream-parse — never materialise the entire tlog into memory (R-DEMO-2; multi-GB tlogs).
- MAVLink 2.0 signing key is mandatory in both modes (Invariant 11). The replay run reuses the live binary's per-flight key-load code path; the operator supplies a dummy key file.

## Risks / Mitigations

- **R-DEMO-1** (tlog ↔ video timestamp drift / unsynchronised recordings): auto-sync via IMU take-off detection (AC-7) + `--time-offset-ms` manual override. Fixed-wing hand-launch fallback documented. Owned by `replay_input/` per AZ-405.
- **R-DEMO-2** (pymavlink slow on multi-GB tlogs): stream-parse, never materialise. Throughput floor benchmarked + documented in CI.
- **R-DEMO-3** (demo footage missing required FC messages): `ReplayInputAdapter.open(...)` fails fast at startup, listing missing message types and the components that need them.
- **R-DEMO-4** (production C1–C5 paths bake real-time-cadence assumptions): `Clock` injection (Invariants 1, 2). Captured in ADR-011 (architecture.md).
- **R-DEMO-5 (new in v2.0.0)** (live and replay diverge silently because the modes share a composition root): mitigated by Invariant 1 (no mode-aware branches in components) + Invariant 5 (encoders are byte-identical) + the AZ-404 E2E test asserting both invariants on every PR. The single composition root is the single point of mode awareness.

## Notes for the Implementer

- The `LiveCameraFrameSource` retrofit is a no-op restructure: the existing camera-ingest thread becomes a class implementing `FrameSource`. Its behaviour is unchanged. This is what allows C1 to consume `FrameSource` via constructor without becoming replay-aware.
- The `SerialMavlinkTransport` retrofit (introduced by AZ-400) is a no-op restructure: the existing pymavlink transport code becomes a class implementing the new tiny `MavlinkTransport` Protocol. Its behaviour is unchanged. This is what allows C8 outbound encoders to remain identical between live and replay.
- The `TlogReplayFcAdapter`'s `subscribe_telemetry` fan-out runs on a dedicated thread (mirroring the live `PymavlinkArdupilotAdapter` decode-thread semantics). C1 and C5 see identical thread boundaries in live and replay (Invariant 1).
- The `Clock` Protocol is the SAME interface in live and replay — only the strategy differs. This is the single Liskov-clean line that lets components consume `Clock` without knowing the mode.
- The `ReplayInputAdapter` lives at `src/gps_denied_onboard/replay_input/__init__.py` (public) + `tlog_video_adapter.py` (concrete) + `auto_sync.py` (AZ-405 logic). It is a Layer-4 module per `module-layout.md` (it imports from Layer 1 `frame_source/` and `clock/` interfaces, and instantiates Layer-4 strategies from `c8_fc_adapter/`). The composition root imports the **public API** of `replay_input/` only; it does not reach into the coordinator's internals.
- The parent-suite UI demo flow: operator plans a route in the suite UI → C12 builds the cache → operator runs `gps-denied-replay --video ... --tlog ... --output results.jsonl` → UI tails `results.jsonl` and renders per-tick `(lat, lon, alt, horiz_accuracy)`. The operator's pre-flight workflow is **identical** to a live flight up until the final "fly" step. This is the user-confirmed design intent.
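
The hand-off in the last note (the CLI writes `results.jsonl`, the UI tails it) can be sketched as follows. This is a minimal illustration under stated assumptions, not the AZ-400 implementation: it uses the standard-library `json` module in place of orjson so the block is self-contained, and the names `SketchJsonlSink`, `emit`, `read_ticks`, and `demo`, as well as the record field names and sample values, are hypothetical. The contract itself only states that each line is one JSON object, that `close()` fsyncs, and that the UI renders `(lat, lon, alt, horiz_accuracy)` per tick.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Iterator


class SketchJsonlSink:
    """Hypothetical stand-in for a JSONL replay sink: one JSON object per line."""

    def __init__(self, path: Path) -> None:
        self._fh = open(path, "w", encoding="utf-8")

    def emit(self, record: dict) -> None:
        # json.dumps never emits raw newlines, so each record stays on one line.
        self._fh.write(json.dumps(record, separators=(",", ":")) + "\n")
        # Flush per record so a reader tailing the file sees ticks promptly.
        self._fh.flush()

    def close(self) -> None:
        # Flush Python-level buffers, then fsync so the bytes reach the disk.
        self._fh.flush()
        os.fsync(self._fh.fileno())
        self._fh.close()


def read_ticks(path: Path) -> Iterator[tuple[float, float, float, float]]:
    """Yield (lat, lon, alt, horiz_accuracy) for every complete line in the file."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if not line.endswith("\n"):
                # Last line is still being written; a tailing reader would retry later.
                break
            rec = json.loads(line)
            yield rec["lat"], rec["lon"], rec["alt"], rec["horiz_accuracy"]


def demo() -> list[tuple[float, float, float, float]]:
    # Sample values are arbitrary placeholders, not real flight data.
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "results.jsonl"
        sink = SketchJsonlSink(out)
        sink.emit({"lat": 50.1, "lon": 36.2, "alt": 120.0, "horiz_accuracy": 35.0})
        sink.emit({"lat": 50.2, "lon": 36.3, "alt": 121.5, "horiz_accuracy": 28.0})
        sink.close()
        return list(read_ticks(out))
```

Flushing after every record is what makes the file usable by a tailing subprocess consumer, while the fsync in `close()` covers durability at the end of the run. The reader skips a trailing line without a newline because a tailing consumer can observe a record mid-write.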