[AZ-399] [AZ-400] C8 TlogReplayFcAdapter + ReplaySink + JsonlReplaySink

Opens E-DEMO-REPLAY (AZ-265): the two C8 strategies that let the
upcoming compose_replay (AZ-401) and gps-denied-replay CLI (AZ-402)
run the production C1-C5 pipeline against a recorded (.tlog, video)
pair without touching live FC I/O.

AZ-400 lands the contract ReplaySink Protocol (emit + close per
replay_protocol.md v1.0.0) and JsonlReplaySink: orjson-serialised
JSONL, fsync-on-close, build-flag gated (BUILD_REPLAY_SINK_JSONL),
double-close idempotent, FDR mirror on open/close. The drifted
AZ-390 stub in interface.py is removed; the canonical Protocol now
lives in replay_sink.py per module-layout.md and is re-exported via
__init__.py. AZ-390 conformance test widened.

AZ-399 lands TlogReplayFcAdapter: full FcAdapter Protocol surface,
build-flag gated (BUILD_TLOG_REPLAY_ADAPTER), pymavlink stream-parse
with bounded pre-scan + fail-fast on missing required messages
(R-DEMO-3), dedicated decode thread feeding the existing AZ-391
SubscriptionBus. Outbound surface raises FcEmitError per Invariant 5;
request_source_set_switch raises SourceSetSwitchNotSupportedError.
Pacing honours Invariant 6 via Clock.sleep_until_ns. time_offset_ms
shifts every emitted received_at per Invariant 8. Non-monotonic
timestamps raise FcOpenError.

Test coverage: 188 c8_fc_adapter tests pass; 1 skipped (AZ-399 AC-1
500 MB tlog RSS bound, deferred to AZ-404 e2e behind RUN_REPLAY_E2E).
Code review: PASS_WITH_WARNINGS — 1 Medium (mapping logic duplicates
AZ-391 live decoder; intentional today, four behavioural deltas
documented), 2 Low.

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-14 05:33:20 +03:00
parent 4eac24f37a
commit fa3742d582
13 changed files with 2688 additions and 23 deletions
@@ -1,109 +0,0 @@
# Replay — TlogReplayFcAdapter (pymavlink stream parser → inbound DTOs)
**Task**: AZ-399_replay_tlog_adapter
**Name**: `TlogReplayFcAdapter` — replay-only `FcAdapter` strategy parsing pymavlink `.tlog`
**Description**: Implement `TlogReplayFcAdapter` (gated `BUILD_TLOG_REPLAY_ADAPTER`) at `src/gps_denied_onboard/components/c8_fc_adapter/tlog_replay_adapter.py`. The class implements the full `FcAdapter` Protocol from AZ-390. STREAM-PARSE the pymavlink `.tlog` (R-DEMO-2; never materialise; multi-GB tlogs); map AP/iNav message types → `FcTelemetryFrame` (RAW_IMU/SCALED_IMU2 → IMU_SAMPLE; ATTITUDE → ATTITUDE; GPS_RAW_INT/GPS2_RAW → GPS_HEALTH; HEARTBEAT.system_status → MAV_STATE / FlightStateSignal). `subscribe_telemetry` is the primary surface — fan out to all subscribers at the configured `pace`: REALTIME → use `Clock.sleep_until_ns(target_ns)` between frames; ASAP → no-op pace. `time_offset_ms` shifts every emitted timestamp at construction (Invariant 8). `target_fc_dialect` chooses pymavlink dialect at parse time. Fail fast at startup (R-DEMO-3): if any required message type is absent (RAW_IMU + ATTITUDE + GPS_RAW_INT/GPS2_RAW + HEARTBEAT), raise `FcOpenError("tlog missing required messages: <list>")` with the components that need them. `emit_external_position` and `emit_status_text` raise `FcEmitError("replay adapter does not emit to FC")` (Invariant 5). `request_source_set_switch` raises `SourceSetSwitchNotSupportedError`. `current_flight_state` returns the latest `FlightStateSignal` from the parsed stream. WgsConverter (AZ-279) constructor-injected for tlog GPS → local-tangent-plane.
**Complexity**: 5 points
**Dependencies**: AZ-398 (`Clock` Protocol), AZ-390 (`FcAdapter` Protocol from E-C8); AZ-391 (DTO surface; `FcTelemetryFrame`); AZ-279 (`WgsConverter`); AZ-273 (FDR); AZ-263, AZ-269, AZ-266, AZ-272
**Component**: c8_fc_adapter (epic AZ-265 / E-DEMO-REPLAY) — strategy lives in `c8_fc_adapter/tlog_replay_adapter.py` per `module-layout.md`
**Tracker**: AZ-399
**Epic**: AZ-265 (E-DEMO-REPLAY)
### Document Dependencies
- `_docs/02_document/contracts/replay/replay_protocol.md` — `TlogReplayFcAdapter` concrete shape; Invariants 5, 6, 8.
- `_docs/02_document/contracts/c8_fc_adapter/fc_adapter_protocol.md` — `FcAdapter` Protocol surface this strategy implements.
- `_docs/02_document/components/10_c8_fc_adapter/description.md` — § 1 (live AP/iNav adapter shape that the replay strategy mirrors).
- `_docs/02_document/architecture.md` — D-C8-3 (pymavlink bundled), R-DEMO-2 (stream-parse).
## Problem
Without this task, the replay binary has no FC inbound — there's no IMU/attitude/GPS-health/MAV_STATE feeding C1 + C5; the live `PymavlinkArdupilotAdapter` cannot be used because there's no FC, only a `.tlog` file. The `TlogReplayFcAdapter` is the strategy that lets C1-C5 run unchanged.
## Outcome
- `src/gps_denied_onboard/components/c8_fc_adapter/tlog_replay_adapter.py` — `TlogReplayFcAdapter` class implementing `FcAdapter`.
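- Constructor: `__init__(self, tlog_path, target_fc_dialect, clock, wgs_converter, *, time_offset_ms=0, pace=ReplayPace.ASAP, fdr_client)` (keyword-only after `*`, so the required `fdr_client` may legally follow the defaulted parameters).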
- `open(...)` — open + validate the tlog (fail-fast on missing message types); start the dedicated decode-thread (mirrors live AP adapter's decode-thread semantics per the contract notes).
- `subscribe_telemetry(callback)` — register against the multi-subscriber bus (re-uses the bus from AZ-391 inbound subscription).
- `emit_external_position` / `emit_status_text` — raise `FcEmitError("replay adapter does not emit to FC")` per Invariant 5.
- `request_source_set_switch` — raise `SourceSetSwitchNotSupportedError`.
- `current_flight_state` — return latest `FlightStateSignal` from the parsed stream.
- `close()` — stop the decode thread; close the tlog file.
- INFO log on `open(...)`: `kind="c8.tlog_replay.opened"` with `{tlog_path, target_fc_dialect, time_offset_ms, pace, message_counts: {RAW_IMU: N, ATTITUDE: M, ...}}`.
- ERROR log + raise on missing message types: `kind="c8.tlog_replay.missing_messages"` with the list of missing types.
- DEBUG log every 1000 frames: `kind="c8.tlog_replay.frame_progress"`.
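
The message-to-frame mapping from the task description, and the `message_counts` payload of the `c8.tlog_replay.opened` log, can be sketched as follows. This is a minimal illustration rather than the shipped adapter: `FrameKind`, `MESSAGE_TO_KIND`, `frame_kind_for`, and `count_messages` are hypothetical names, and the input is a plain iterable of message-type strings standing in for a real pymavlink stream.

```python
from collections import Counter
from enum import Enum
from typing import Dict, Iterable, Optional


class FrameKind(Enum):
    """Hypothetical stand-in for the FcTelemetryFrame kinds named in the task."""
    IMU_SAMPLE = "IMU_SAMPLE"
    ATTITUDE = "ATTITUDE"
    GPS_HEALTH = "GPS_HEALTH"
    MAV_STATE = "MAV_STATE"


# Pairs taken from the task description; the dict itself is illustrative.
MESSAGE_TO_KIND: Dict[str, FrameKind] = {
    "RAW_IMU": FrameKind.IMU_SAMPLE,
    "SCALED_IMU2": FrameKind.IMU_SAMPLE,
    "ATTITUDE": FrameKind.ATTITUDE,
    "GPS_RAW_INT": FrameKind.GPS_HEALTH,
    "GPS2_RAW": FrameKind.GPS_HEALTH,
    "HEARTBEAT": FrameKind.MAV_STATE,
}


def frame_kind_for(msg_type: str) -> Optional[FrameKind]:
    """Return the frame kind for a MAVLink message type, or None if unmapped."""
    return MESSAGE_TO_KIND.get(msg_type)


def count_messages(msg_types: Iterable[str]) -> Dict[str, int]:
    """Count mapped message types in one streaming pass; no message is retained."""
    counts: Counter = Counter()
    for msg_type in msg_types:
        if msg_type in MESSAGE_TO_KIND:
            counts[msg_type] += 1
    return dict(counts)
```

Because `count_messages` consumes an iterator and keeps only per-type counters, its memory use does not grow with the length of the stream, which is the property R-DEMO-2 asks for.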
## Scope
### Included
- `TlogReplayFcAdapter` class.
- pymavlink stream parser (no materialisation).
- AP + iNav dialect support.
- Multi-subscriber fan-out (re-uses AZ-391's bus implementation).
- Fail-fast on missing message types (R-DEMO-3).
- `time_offset_ms` shift.
- Pace honoured via injected `Clock`.
- Build-flag gating.
- Unit tests: tlog open + dialect detection, fail-fast missing messages, time_offset_ms applied, pace=REALTIME calls Clock.sleep_until_ns, pace=ASAP no-op, subscribers receive frames in tlog order, emit_external_position raises, source_set_switch unsupported, build-flag gating.
### Excluded
- `FrameSource` / `Clock` — owned by AZ-398.
- `ReplaySink` — owned by AZ-400.
- `compose_replay` — owned by AZ-401.
- CLI — owned by AZ-402.
- Auto-sync IMU take-off detection — owned by AZ-405 (this task accepts `time_offset_ms` as a constructor input; the auto-sync TASK computes it).
- E2E replay fixture test — owned by AZ-404.
## Acceptance Criteria
**AC-1: Tlog stream-parse memory bound** — open a 500 MB synthetic tlog; subscribe; assert peak RSS during `subscribe_telemetry` consumption stays within 100 MB above baseline (no materialisation per R-DEMO-2).
**AC-2: AP dialect frame mapping** — synthetic AP tlog with RAW_IMU + ATTITUDE + GPS_RAW_INT + HEARTBEAT; subscribe; assert four `FcTelemetryFrame` kinds (IMU_SAMPLE, ATTITUDE, GPS_HEALTH, MAV_STATE) emitted in tlog order with correct payload fields.
**AC-3: iNav dialect frame mapping** — synthetic iNav tlog (uses AP MAVLink dialect for telemetry per RESTRICT-COMM-2 secondary channel); same frame mapping.
**AC-4: Fail-fast missing messages** — tlog WITHOUT any RAW_IMU; `open(...)` → `FcOpenError("tlog missing required messages: ['RAW_IMU']; consumed by: [C1 VIO, C5 StateEstimator]")`. ERROR log + FDR record.
**AC-5: time_offset_ms shift** — open with `time_offset_ms=5000`; assert every emitted `received_at` is shifted by 5e9 ns relative to the raw tlog timestamp; verify with first + last + sample mid-stream frames.
**AC-6: Pace REALTIME calls Clock.sleep_until_ns** — open with `pace=ReplayPace.REALTIME` + a wall-clock-faking Clock; subscribe; assert `Clock.sleep_until_ns` called between every emitted frame with `target_ns = received_at`.
**AC-7: Pace ASAP no-op** — open with `pace=ReplayPace.ASAP`; assert `Clock.sleep_until_ns` NEVER called between frames; throughput proxy test: 1000 frames consumed in < 1 s on Tier-1 hardware.
**AC-8: emit_external_position raises** — call `emit_external_position(EstimatorOutput(...))` → `FcEmitError("replay adapter does not emit to FC")` per Invariant 5.
**AC-9: source_set_switch unsupported** — `request_source_set_switch()` → `SourceSetSwitchNotSupportedError`.
**AC-10: Build-flag gating** — `BUILD_TLOG_REPLAY_ADAPTER=OFF` → constructing the class raises `FcAdapterConfigError("BUILD_TLOG_REPLAY_ADAPTER is OFF...")`.
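
The fail-fast check in AC-4 can be sketched as a set comparison against the required message groups. This is an assumption-laden sketch: `REQUIRED_GROUPS`, `CONSUMERS`, `missing_required`, and `check_required` are hypothetical names, `FcOpenError` is redefined locally so the block runs on its own, and only the `RAW_IMU` consumer list comes from the AC text (other entries fall back to a placeholder).

```python
from typing import Dict, Iterable, List, Tuple


class FcOpenError(Exception):
    """Local stand-in for the project's FcOpenError."""


# Each tuple is a group of alternatives: one present type satisfies the group.
REQUIRED_GROUPS: Tuple[Tuple[str, ...], ...] = (
    ("RAW_IMU",),
    ("ATTITUDE",),
    ("GPS_RAW_INT", "GPS2_RAW"),
    ("HEARTBEAT",),
)

# Only the RAW_IMU consumers are stated in AC-4; other types use a placeholder.
CONSUMERS: Dict[str, List[str]] = {
    "RAW_IMU": ["C1 VIO", "C5 StateEstimator"],
}


def missing_required(seen_types: Iterable[str]) -> List[str]:
    """Return a label for every required group with no message present."""
    seen = set(seen_types)
    return ["/".join(group) for group in REQUIRED_GROUPS if not seen.intersection(group)]


def check_required(seen_types: Iterable[str]) -> None:
    """Raise FcOpenError in the AC-4 message format if any group is missing."""
    missing = missing_required(seen_types)
    if not missing:
        return
    consumers: List[str] = []
    for label in missing:
        for consumer in CONSUMERS.get(label, ["<unknown consumer>"]):
            if consumer not in consumers:
                consumers.append(consumer)
    raise FcOpenError(
        f"tlog missing required messages: {missing}; "
        f"consumed by: [{', '.join(consumers)}]"
    )
```

Treating `GPS_RAW_INT` and `GPS2_RAW` as alternatives follows the `GPS_RAW_INT/GPS2_RAW` wording in the task description; if both are actually required, the group would need to be split.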
## Non-Functional Requirements
- Throughput proxy: 1000 frames consumed in < 1 s on Tier-1 hardware (supports the ≥ 5× real-time epic NFT).
- Memory bound: peak RSS stays within 100 MB above baseline for tlogs up to 5 GB.
- `subscribe_telemetry` callback dispatch p99 ≤ 1 ms (parallel to live AP adapter).
## Constraints
- pymavlink bundled unmodified per D-C8-3.
- Stream-parse only — never materialise (R-DEMO-2).
- `time_offset_ms` set ONCE at construction (Invariant 8); no live re-tuning.
- The decode thread runs on the SAME thread-binding semantics as the live AP adapter (mirrors live behaviour for C1 + C5 consumers; per the contract notes).
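
The interaction between the one-time `time_offset_ms` shift (Invariant 8) and pacing (Invariant 6) can be sketched as below. The names `Frame`, `RecordingClock`, and `replay` are hypothetical, and the loop runs inline rather than on a decode thread. One further assumption: the sketch sleeps before every delivery, including the first, whereas AC-6 words it as "between every emitted frame".

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Callable, Iterable, List


class ReplayPace(Enum):
    REALTIME = "REALTIME"
    ASAP = "ASAP"


@dataclass(frozen=True)
class Frame:
    """Hypothetical minimal frame: only the timestamp matters here."""
    received_at: int  # nanoseconds


class RecordingClock:
    """Test fake that records sleep targets instead of sleeping."""

    def __init__(self) -> None:
        self.targets: List[int] = []

    def sleep_until_ns(self, target_ns: int) -> None:
        self.targets.append(target_ns)


def replay(
    frames: Iterable[Frame],
    clock: RecordingClock,
    pace: ReplayPace,
    time_offset_ms: int,
    deliver: Callable[[Frame], None],
) -> None:
    """Shift every timestamp by a fixed offset, pace if asked, then deliver."""
    offset_ns = time_offset_ms * 1_000_000  # computed once, never re-tuned
    for raw in frames:
        shifted = replace(raw, received_at=raw.received_at + offset_ns)
        if pace is ReplayPace.REALTIME:
            clock.sleep_until_ns(shifted.received_at)
        deliver(shifted)
```

With `time_offset_ms=5000` every `received_at` moves by 5 000 000 000 ns, which is the 5e9 ns shift AC-5 checks; under `ReplayPace.ASAP` the clock is never touched.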
## Risks & Mitigation
- **R-DEMO-2 (multi-GB tlogs)** — *Mitigation*: stream-parse; AC-1 enforces 100 MB bound.
- **R-DEMO-3 (missing message types)** — *Mitigation*: fail-fast in `open(...)`; AC-4 enforces; ERROR log lists the missing types AND the components that need them.
- **Risk: pymavlink dialect auto-detection wrong on a tlog** — *Mitigation*: `target_fc_dialect` is an explicit constructor input — operator (or CLI) MUST pass the correct value; CLI defaults to ARDUPILOT_PLANE per the most-common case.
- **Risk: tlog timestamps non-monotonic (rare)** — *Mitigation*: assert monotonic on read; non-monotonic frames raise `FcOpenError` (parallel to FrameSource Invariant 3).
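
The monotonic-timestamp mitigation can be sketched as a generator that validates while streaming. `assert_monotonic` is a hypothetical name and `FcOpenError` is redefined locally. The sketch assumes that equal consecutive timestamps are acceptable (non-decreasing rather than strictly increasing); the task text does not say which is intended.

```python
from typing import Iterable, Iterator, Optional


class FcOpenError(Exception):
    """Local stand-in for the project's FcOpenError."""


def assert_monotonic(timestamps_ns: Iterable[int]) -> Iterator[int]:
    """Yield timestamps unchanged; raise on the first one that goes backwards."""
    previous: Optional[int] = None
    for index, ts in enumerate(timestamps_ns):
        if previous is not None and ts < previous:
            raise FcOpenError(
                f"non-monotonic tlog timestamp at index {index}: {ts} < {previous}"
            )
        previous = ts
        yield ts
```

Because the check keeps only the previous timestamp, it adds no per-message memory and composes with the stream-parse constraint.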
## Runtime Completeness
- **Named capability**: replay-only `FcAdapter` strategy parsing pymavlink `.tlog`.
- **Production code**: real pymavlink stream-parser, real multi-subscriber fan-out, real Clock-paced subscription.
- **Allowed external stubs**: test fakes only.
- **Unacceptable substitutes**: a fake-IMU generator masquerading as a tlog adapter (defeats AC-2/AC-3 message-fidelity).
## Contract
Implements `_docs/02_document/contracts/replay/replay_protocol.md` — `TlogReplayFcAdapter` concrete shape; `_docs/02_document/contracts/c8_fc_adapter/fc_adapter_protocol.md` — `FcAdapter` Protocol surface.
@@ -1,98 +0,0 @@
# Replay — ReplaySink Protocol + JsonlReplaySink
**Task**: AZ-400_replay_jsonl_sink
**Name**: `ReplaySink` Protocol + `JsonlReplaySink` impl
**Description**: Define the `ReplaySink` Protocol (PEP 544 `@runtime_checkable`) at `src/gps_denied_onboard/components/c8_fc_adapter/replay_sink.py` (per `module-layout.md` placement; gated `BUILD_REPLAY_SINK_JSONL`). Implement `JsonlReplaySink`: open a writable file at `output_path`; `emit(EstimatorOutput)` writes exactly one JSON object per line via `orjson.dumps(dataclasses.asdict(output)) + b"\n"` (Invariant 7); `close()` flushes + fsyncs the file. Validate `output_path`'s parent directory exists at construction; raise `ReplaySinkError` if not. Bounded-write pre-allocation: open the file with `buffering=0` (unbuffered) so each `emit` flushes immediately — but the explicit fsync on `close()` is the durability guarantee. Frozen-DTO serialisation: `EstimatorOutput.covariance_6x6` (numpy array) → flat list of 36 floats per line; `EstimatorOutput.source_label` (enum) → string name; `EstimatorOutput.captured_at` (int monotonic_ns) → int.
**Complexity**: 3 points
**Dependencies**: AZ-263, AZ-269, AZ-270, AZ-381 (`EstimatorOutput` DTO), AZ-266; AZ-272 (FDR for sink-open/close events)
**Component**: c8_fc_adapter (epic AZ-265 / E-DEMO-REPLAY) — sink lives in `c8_fc_adapter/replay_sink.py` per `module-layout.md`
**Tracker**: AZ-400
**Epic**: AZ-265 (E-DEMO-REPLAY)
### Document Dependencies
- `_docs/02_document/contracts/replay/replay_protocol.md` — `ReplaySink` Protocol; Invariant 7.
- `_docs/02_document/contracts/c5_state/state_estimator_protocol.md` — `EstimatorOutput` DTO shape.
- `_docs/02_document/module-layout.md` — sink placement under `c8_fc_adapter/`.
## Problem
Without this task, the replay binary has nowhere to emit `EstimatorOutput` — the live binary emits to the FC via `PymavlinkArdupilotAdapter`, but in replay there is no FC. The `JsonlReplaySink` produces the JSONL file the parent-suite UI demo consumes (one estimate per line) — the deliverable artefact of every replay run.
## Outcome
- `src/gps_denied_onboard/components/c8_fc_adapter/replay_sink.py` — `ReplaySink` Protocol + `JsonlReplaySink` impl.
- Re-export of `ReplaySink` from `c8_fc_adapter/__init__.py` (already declared in module-layout.md Public API).
- `JsonlReplaySink.__init__(output_path: Path, fdr_client: FdrClient)`.
- `JsonlReplaySink.emit(EstimatorOutput)` — one orjson-serialised line.
- `JsonlReplaySink.close()` — fsync + close.
- INFO log on construction: `kind="replay.sink.opened"` with `{output_path}`.
- INFO log on close: `kind="replay.sink.closed"` with `{output_path, lines_written}`.
- DEBUG log every 1000 emits: `kind="replay.sink.emit_progress"`.
- Unit tests: Protocol conformance, one-line-per-emit, JSON valid + matches schema, numpy → flat list serialisation, enum → string, missing parent dir raises, double close idempotent, build-flag gating.
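
The write path described above (one JSON object per line, binary unbuffered file, fsync on close, idempotent double close) can be sketched with the standard library alone. Several things here are assumptions made so the block is self-contained: stdlib `json` stands in for orjson, a tuple of tuples stands in for the numpy `covariance_6x6`, `EstimatorOutput` is reduced to the three fields the task spells out, `SourceLabel` and `JsonlSinkSketch` are hypothetical names, and the FDR mirror, logging, and build-flag gate are omitted.

```python
import dataclasses
import json
import os
from enum import Enum
from pathlib import Path
from typing import Tuple


class ReplaySinkError(Exception):
    """Local stand-in for the project's ReplaySinkError."""


class SourceLabel(Enum):
    """Hypothetical enum; only SATELLITE_ANCHORED is named in the task."""
    SATELLITE_ANCHORED = 1


@dataclasses.dataclass(frozen=True)
class EstimatorOutput:
    """Reduced stand-in carrying only the three fields the task spells out."""
    covariance_6x6: Tuple[Tuple[float, ...], ...]
    source_label: SourceLabel
    captured_at: int


def to_jsonable(output: EstimatorOutput) -> dict:
    """Convert the DTO to JSON-friendly values (flat covariance, enum name)."""
    record = dataclasses.asdict(output)
    # Row-major flatten of the 6x6 matrix into 36 floats.
    record["covariance_6x6"] = [float(v) for row in output.covariance_6x6 for v in row]
    record["source_label"] = output.source_label.name  # name, not integer value
    return record


class JsonlSinkSketch:
    def __init__(self, output_path: Path) -> None:
        if not output_path.parent.is_dir():
            raise ReplaySinkError(
                f"output parent directory does not exist: {output_path.parent}"
            )
        # Binary + unbuffered: no newline translation, each write reaches the OS.
        self._file = open(output_path, "wb", buffering=0)
        self.lines_written = 0

    def emit(self, output: EstimatorOutput) -> None:
        line = json.dumps(to_jsonable(output)).encode("utf-8") + b"\n"
        self._file.write(line)
        self.lines_written += 1

    def close(self) -> None:
        if self._file.closed:
            return  # second close is a no-op
        os.fsync(self._file.fileno())  # durability point
        self._file.close()
```

Unbuffered writes hand each line to the operating system immediately, but only the `os.fsync` call in `close()` asks the OS to push the data to the storage device, which is why the task treats fsync-on-close as the durability guarantee.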
## Scope
### Included
- Protocol + impl + a module-level `create(...)` factory per project convention (the `compose_replay` wiring that calls it is done in the next task).
- orjson-based serialisation.
- Bounded-write semantics + fsync-on-close.
- Frozen DTO serialisation (numpy arrays + enums).
- Build-flag gating.
- Unit tests.
### Excluded
- `compose_replay` integration — owned by next task.
- CLI `--output` arg parsing — owned by CLI task.
- E2E replay fixture test — owned by E2E task.
## Acceptance Criteria
**AC-1: Protocol conformance** — `runtime_checkable` `isinstance(JsonlReplaySink(...), ReplaySink)` returns True.
**AC-2: One JSON per emit** — emit 100 `EstimatorOutput` records; assert the output file has exactly 100 lines; each line parses as a valid JSON object via `json.loads`; close + reopen the file to reread.
**AC-3: Schema match** — emit a known `EstimatorOutput`; assert the parsed JSON has keys matching `EstimatorOutput.__dataclass_fields__` (full coverage of all fields).
**AC-4: numpy → flat list** — emit an `EstimatorOutput` with `covariance_6x6 = np.eye(6)`; assert the parsed JSON's `covariance_6x6` is a list of 36 floats with `[1.0, 0.0, 0.0, ..., 0.0, 1.0]` per row-major flatten.
**AC-5: enum → string** — emit with `source_label = SATELLITE_ANCHORED`; assert the parsed JSON's `source_label` is the string `"SATELLITE_ANCHORED"` (NOT the integer enum value).
**AC-6: missing parent dir raises** — `JsonlReplaySink(Path("/nonexistent/dir/out.jsonl"))` → `ReplaySinkError("output parent directory does not exist: /nonexistent/dir")`.
**AC-7: close fsyncs** — emit 100 records; close; assert the file size on disk matches the in-memory expected size; (best-effort) assert the file's modified timestamp updates on close — a smoke check that fsync ran.
**AC-8: double close idempotent** — call `close()` twice; second call no-op'd + DEBUG log `kind="replay.sink.double_close"`.
**AC-9: lines_written reported on close** — close after 100 emits; INFO log carries `lines_written=100`.
**AC-10: Build-flag gating** — `BUILD_REPLAY_SINK_JSONL=OFF` → constructing `JsonlReplaySink` raises `ReplaySinkConfigError("BUILD_REPLAY_SINK_JSONL is OFF...")`.
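
The structural check behind AC-1 can be sketched as follows. The Protocol body mirrors the `emit` + `close` surface named in the task; the `Any` parameter type and the `ListSink` and `NotASink` classes are assumptions added for illustration, with `ListSink` playing the role of a test fake.

```python
from typing import Any, List, Protocol, runtime_checkable


@runtime_checkable
class ReplaySink(Protocol):
    def emit(self, output: Any) -> None: ...

    def close(self) -> None: ...


class ListSink:
    """Test fake that satisfies the Protocol structurally, without inheriting."""

    def __init__(self) -> None:
        self.items: List[Any] = []
        self.closed = False

    def emit(self, output: Any) -> None:
        self.items.append(output)

    def close(self) -> None:
        self.closed = True


class NotASink:
    """Has emit but no close, so it does not conform."""

    def emit(self, output: Any) -> None:
        pass
```

Note that `isinstance` against a `runtime_checkable` Protocol only verifies that the named methods exist; it does not compare signatures or return types, so a conformance test of this kind is a smoke check rather than a full contract check.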
## Non-Functional Requirements
- `emit` p99 ≤ 1 ms (orjson is microsecond-class; file write dominates).
- `close()` p99 ≤ 50 ms (fsync round-trip).
- Memory: bounded at write-buffer size (no in-memory record retention).
## Constraints
- orjson is the chosen serialiser (faster than stdlib `json`; deterministic key ordering per Invariant 10's determinism floor).
- File is opened with explicit binary mode (`"wb"`) to avoid platform-specific newline translation.
- Parent directory existence is validated at construction (fail-fast).
## Risks & Mitigation
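- **Risk: orjson serialiser doesn't handle numpy arrays natively** — *Mitigation*: pass `option=orjson.OPT_SERIALIZE_NUMPY` and flatten the array first (e.g. `ravel()`), since orjson emits a 2-D array as nested lists rather than the flat 36-float list AC-4 expects; verified in AC-4.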
- **Risk: fsync hangs on slow disks** — *Mitigation*: documented; close() is best-effort; the JSONL is written line-by-line so a hung close still produces a partially-readable file.
- **Risk: large covariance arrays inflate line size** — *Mitigation*: 36-float list per line is ~720 bytes; for a 60 s run at 5 Hz that's 300 records × ~1 KB = 300 KB total — trivial.
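
The file-size estimate in the last risk can be reproduced with a few lines of arithmetic. The helper names are hypothetical, and the per-record size is an input rather than a measurement; the risk text rounds it to roughly 1 KB.

```python
def replay_record_count(duration_s: float, rate_hz: float) -> int:
    """Number of records emitted over a run at a fixed output rate."""
    return int(duration_s * rate_hz)


def estimated_file_bytes(duration_s: float, rate_hz: float, bytes_per_record: int) -> int:
    """Total JSONL size, assuming every record has the same size."""
    return replay_record_count(duration_s, rate_hz) * bytes_per_record
```

For the 60 s, 5 Hz case this gives 300 records, and at about 1000 bytes per record roughly 300 KB in total, matching the figure quoted above.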
## Runtime Completeness
- **Named capability**: offline `EstimatorOutput` sink for replay.
- **Production code**: real orjson-based serialiser, real fsync-on-close.
- **Allowed external stubs**: test fakes only.
- **Unacceptable substitutes**: in-memory list returned at end-of-replay (defeats streaming + UI consumption).
## Contract
Implements `_docs/02_document/contracts/replay/replay_protocol.md` — `ReplaySink` Protocol + `JsonlReplaySink`; Invariant 7.