[AZ-265] Replay as configuration of airborne binary (ADR-011)

Re-design replay mode per user direction: replay is no longer a fourth
Docker image with a reduced component set, but a `config.mode = "replay"`
branch of the single airborne binary. The pre-flight workflow (route in
suite UI -> C12 tile download via real satellite-provider -> C10
manifest+engines build) is identical between live and replay; only three
strategies swap at compose time:

  FrameSource:      Live <-> Video
  FcAdapter:        Pymavlink/MSP2 <-> TlogReplay
  MavlinkTransport: Serial <-> Noop

The C8 outbound MAVLink encoders run unchanged in both modes; their
bytes hit `NoopMavlinkTransport` in replay and disappear. A new
`JsonlReplaySink` taps C5's `EstimatorOutput` stream so the parent-suite
UI sees per-tick coordinates by tailing `results.jsonl`. MAVLink 2.0
signing key remains mandatory (operator supplies a dummy file).

A new `replay_input/` Layer-4 cross-cutting coordinator owns
`(video, tlog) -> (FrameSource, FcAdapter, Clock)` convergence; the
composition root sees only standard interfaces past `.open()`.

Docs:
- architecture.md: new ADR-011 with full rationale; ADR-002 binary
  narrative updated.
- contracts/replay/replay_protocol.md: bumped to v2.0.0; 12 invariants
  (notably mode-agnosticism + encoder byte-equality + signing key
  mandatory + real C6 cache in replay).
- module-layout.md: Build-Time Exclusion Map dropped from 4 to 3 binary
  columns; replay-mode `BUILD_*` flags default ON in airborne;
  `shared/replay_input` cross-cutting entry added.
- epics.md: E-DEMO-REPLAY scope reframed; story points 27-32 -> 19-24.

Task respecs:
- AZ-401: shrunk 3 -> 2 pts; `compose_root` mode branch + JSONL sink +
  NoopMavlinkTransport wiring; legacy `compose_replay` export deleted.
- AZ-402: console-script wrapper that mutates `config.mode = "replay"`
  and dispatches into the shared airborne main; `--mavlink-signing-key`
  mandatory.
- AZ-403: CANCELLED. Moved to done/ with banner; Jira transition deferred
  via `_docs/_process_leftovers/2026-05-14_az_403_cancellation_pending_tracker.md`.
- AZ-404: AC-4 reworded as mode-agnosticism AST scan + encoder
  byte-equality test; new AC-8 operator-workflow rehearsal.
- AZ-405: also owns the `replay_input/` module + `ReplayInputAdapter`.

_dependencies_table.md updated: AZ-401 gains AZ-405 dep; AZ-404 drops
AZ-403 dep; AZ-403 row marked CANCELLED.

Co-authored-by: Cursor <cursoragent@cursor.com>
This commit is contained in:
Oleksandr Bezdieniezhnykh
2026-05-14 09:01:04 +03:00
parent fa3742d582
commit 5adf3dd04f
13 changed files with 765 additions and 418 deletions
+83 -44
View File
@@ -1,101 +1,140 @@
# Replay — gps-denied-replay CLI entrypoint + arg parser + calibration loader
# Replay — `gps-denied-replay` console-script wrapper (mode-config dispatcher)
**Task**: AZ-402_replay_cli
**Name**: `gps-denied-replay` CLI entrypoint + argparse + camera-calibration loader
**Description**: Implement the `gps-denied-replay` console script: `argparse`-based CLI accepting `--video PATH --tlog PATH --output results.jsonl --camera-calibration calib.json --config config.yaml [--pace {realtime,asap}] [--time-offset-ms N]`. Load and validate the camera-calibration JSON (project's standard pinhole + distortion-coefficients schema, reusable via the config loader from AZ-269/AZ-270 if practical, otherwise a small dedicated loader); construct the `Config` object; invoke `compose_replay(config) -> ReplayRoot`; call `replay_root.runtime_loop()`; map the returned exit code to the process exit code (0 = success per AC-1 of the epic; 2 = sync-impossible per AC-8; 1 = any other error). Set up structured logging (stdout JSON per project convention) and FDR client. Exit-code mapping documented inline. CLI registered as a console_script entrypoint in pyproject.toml under `[project.scripts]` (or equivalent build-config).
**Complexity**: 3 points
**Dependencies**: AZ-401 (`compose_replay` + `ReplayRoot.runtime_loop`); AZ-269 / AZ-270 (config); AZ-263; AZ-266; AZ-272 (FDR record schema); AZ-273 (`FdrClient`)
**Component**: replay-cli (epic AZ-265 / E-DEMO-REPLAY) — CLI entrypoint at `src/gps_denied_onboard/cli/replay.py`
**Name**: `gps-denied-replay` console-script — thin mode-config wrapper that builds a replay-mode `Config` and dispatches into the shared airborne entry point
**Description**: Implement the `gps-denied-replay` console-script in `src/gps_denied_onboard/cli/replay.py`. Per ADR-011, this is **not a standalone CLI** with its own composition root — it is a thin wrapper around the live airborne entry point that loads `config.yaml`, sets `config.mode = "replay"`, applies the replay-specific CLI args (`--video`, `--tlog`, `--output`, `--time-offset-ms`, `--pace`, `--mavlink-signing-key`), and calls the same `main()` function the live `gps-denied-onboard` binary calls. The shared main entry point calls `compose_root(config)` (which branches on `config.mode` per AZ-401) and runs the per-frame loop; the runtime loop is unchanged between live and replay.
CLI surface (argparse):
```
gps-denied-replay
--video PATH # required
--tlog PATH # required
--output results.jsonl # required
--camera-calibration calib.json # required
--config config.yaml # required (same schema as airborne)
--mavlink-signing-key PATH # required (operator supplies a dummy key for replay; signing handshake still runs)
[--pace {realtime,asap}] # default asap
[--time-offset-ms N] # overrides AZ-405 auto-sync inside replay_input/
```
The CLI:
1. Parses arguments + validates file existence (video, tlog, calib, config, signing key).
2. Loads `config.yaml` via the existing `config/` loader.
3. Loads the camera-calibration JSON (small dedicated loader; pinhole + distortion-coefficients schema).
4. Mutates the loaded config: `config.mode = "replay"`, `config.replay.video_path = ...`, `config.replay.tlog_path = ...`, `config.replay.output_path = ...`, `config.replay.pace = ...`, `config.replay.time_offset_ms = ...` (None if not provided — `ReplayInputAdapter` will auto-detect via AZ-405).
5. Calls the SAME `main(config, camera_calibration, signing_key_path)` function the live `gps-denied-onboard` binary already calls. The shared main wires everything via `compose_root(config)` and runs the per-frame loop.
6. Maps the runtime exit code to the process exit code (0 = success; 2 = `ReplayInputAdapter.open()` auto-sync hard-fail per AC-8 of the epic; 1 = any other error).
7. Top-level try/except logs the FULL traceback via `logger.exception` and exits 1 on any unhandled exception.
**Complexity**: 3 points (unchanged from v1.0.0 — the CLI shape is the same; what changed is that the CLI does NOT host the composition logic; it just builds a config and dispatches).
**Dependencies**: AZ-401 (`compose_root` extension with `config.mode == "replay"` branch + the `Config.mode` + `Config.replay` schema additions); AZ-269 / AZ-270 (config loader); AZ-263 (the shared airborne `main()` entry point); AZ-266 (logging); AZ-272 (FDR record schema); AZ-273 (`FdrClient`).
**Component**: replay-cli (epic AZ-265 / E-DEMO-REPLAY) — `src/gps_denied_onboard/cli/replay.py`.
**Tracker**: AZ-402
**Epic**: AZ-265 (E-DEMO-REPLAY)
### Document Dependencies
- `_docs/02_document/contracts/replay/replay_protocol.md` — CLI surface specification.
- `_docs/02_document/architecture.md` — § 5 (binary topology; replay-cli is the fourth Docker image).
- `_docs/02_document/contracts/replay/replay_protocol.md` (v2.0.0) — CLI surface specification + Invariant 11 (signing key mandatory in replay).
- `_docs/02_document/architecture.md` **ADR-011** (replay-as-configuration) + § 5 (binary topology; replay runs from the airborne image).
- `_docs/02_document/module-layout.md``cli/replay` cross-cutting entry (the console-script wrapper, not a standalone CLI).
## Problem
Without this task, the `compose_replay` composition root has no entrypoint — the parent-suite UI cannot shell out to a replay run. The CLI is the user-facing surface (and CI-test surface) of the replay binary.
Without this task, the operator has no entry point to invoke `config.mode == "replay"` against an arbitrary `(video, tlog)` pair — they would need to manually edit a config file with the replay-mode flag and the per-file paths, then invoke the airborne entry point. The CLI is the user-facing surface (and CI-test surface) for the replay mode.
## Outcome
- `src/gps_denied_onboard/cli/replay.py``main()` entrypoint:
- argparse setup with all 7 args + the 2 optional ones.
- calibration loader (small JSON loader; pinhole + distortion-coefficients schema) — module-internal.
- config loader invocation (re-use AZ-269 / AZ-270 plumbing).
- `compose_replay(config)` invocation.
- `replay_root.runtime_loop()` invocation; exit code propagated.
- Structured logging + FDR client setup.
- Top-level try/except: logs the error class + message + suggested next step before exiting 1.
- `pyproject.toml` (or equivalent) registers `gps-denied-replay = "gps_denied_onboard.cli.replay:main"`.
- INFO log at startup: `kind="replay.cli.started"` with all CLI args (sanitised — no key bytes per E-C8 signing invariants, but replay has no signing).
- INFO log at exit: `kind="replay.cli.exited"` with `{exit_code, frames_processed, lines_written}`.
- Unit tests: argparse defaults + overrides, calibration loader rejects malformed JSON, config loader passes-through to `compose_replay`, exit-code mapping on each known runtime_loop return value.
- argparse setup with all 6 required args + the 2 optional ones.
- File-existence validation for all required-file args (video, tlog, calib, config, signing key); fails fast with `ReplayCliError` + exit 1 on missing files.
- Calibration loader (small JSON loader; pinhole + distortion-coefficients schema) — module-internal helper.
- Config loader invocation (re-use AZ-269 / AZ-270 plumbing).
- Mode-config mutation: `config.mode = "replay"` + `config.replay.{video_path, tlog_path, output_path, pace, time_offset_ms}` populated from CLI args.
- Dispatch into the shared airborne `main(config, camera_calibration, signing_key_path)` entry point.
- Exit-code mapping: shared main returns 0 / 1 / 2 → CLI exits with the same code.
- Structured logging setup + FDR client setup happen inside the shared main (NOT duplicated here).
- Top-level try/except: logs the FULL traceback via `logger.exception` + exits 1 on any unhandled exception.
- `pyproject.toml` `[project.scripts]` registers `gps-denied-replay = "gps_denied_onboard.cli.replay:main"`.
- INFO log at CLI startup (BEFORE config load, since logging is not yet bootstrapped): a single `print(f"gps-denied-replay starting with args: {sanitised_args}")` via stderr; the shared main then bootstraps structured logging properly. `--mavlink-signing-key` value is replaced by `<redacted>` in the printed args.
- Unit tests:
- `test_argparse_all_args`: all 6 required + 2 optional args parsed correctly; defaults applied.
- `test_argparse_missing_video_exits_2`: argparse exits 2 when `--video` is omitted (stdlib argparse default).
- `test_argparse_missing_signing_key_exits_2`: same for `--mavlink-signing-key`.
- `test_calibration_loader_malformed`: corrupt calib.json → `ReplayCliError("camera-calibration JSON malformed: <details>")` + exit 1.
- `test_calibration_loader_schema`: calib.json missing `intrinsics``ReplayCliError("camera-calibration schema invalid: missing 'intrinsics'")`.
- `test_config_mode_set_to_replay`: parse args + invoke the CLI; capture the `Config` passed to the shared main; assert `config.mode == "replay"` + `config.replay.video_path` etc. populated.
- `test_dispatch_to_shared_main`: assert the shared main is called exactly once with the mutated config; assert no separate composition logic is invoked inside `cli/replay.py`.
- `test_exit_code_pass_through`: with a FakeMain returning 0 / 1 / 2, the CLI exits 0 / 1 / 2 respectively.
- `test_top_level_exception_logged_and_exits_1`: an unhandled exception inside the shared main is logged with full traceback (verified via `logger.exception` mock) and the CLI exits 1.
- `test_console_script_registered`: install the package in a fresh venv (via `tox` or `pytest-virtualenv`); assert `gps-denied-replay --help` runs and prints the argparse usage.
## Scope
### Included
- argparse + arg-validation (file existence, output-parent existence).
- camera-calibration JSON loader + schema validation.
- `compose_replay` invocation + runtime_loop dispatch.
- Exit-code mapping.
- Top-level error handling (catch + log + exit 1 on unexpected exception).
- argparse + arg-validation (file existence).
- camera-calibration JSON loader + schema validation (module-internal helper).
- Config-mode mutation (`config.mode = "replay"` + replay sub-config population).
- Dispatch into the shared airborne `main()` entry point.
- Exit-code mapping (pass-through).
- Top-level error handling.
- Console-script registration in pyproject.toml.
- Unit tests for argparse + calibration loader + exit-code mapping.
- All unit tests listed above.
### Excluded
- Auto-sync IMU take-off detection — owned by AZ-405 (this task accepts `--time-offset-ms` and forwards to config/replay; the auto-sync TASK computes the default value).
- Dockerfile + CI matrix — owned by Docker task.
- E2E replay fixture test — owned by E2E task.
- Auto-sync IMU take-off detection — owned by AZ-405 (the `ReplayInputAdapter` inside `replay_input/` consumes `--time-offset-ms` from config OR auto-detects when None).
- The `compose_root` branch + the JSONL sink + the NoopMavlinkTransport wiring — owned by AZ-401.
- E2E replay fixture test — owned by AZ-404.
- The shared airborne `main()` function itself — owned by AZ-263 / the existing airborne entry-point task. This task assumes the shared main exists and is callable with `(config, camera_calibration, signing_key_path)`.
## Acceptance Criteria
**AC-1: All required args parsed** — invoke with `--video v.mp4 --tlog t.tlog --output o.jsonl --camera-calibration c.json --config conf.yaml`; assert all five values reach `compose_replay`'s `Config` object.
**AC-1: All required args parsed** — invoke with `--video v.mp4 --tlog t.tlog --output o.jsonl --camera-calibration c.json --config conf.yaml --mavlink-signing-key key.bin`; assert all six values reach the shared main (or the `Config` mutation phase) intact.
**AC-2: --pace default ASAP** — invoke without `--pace`; assert config has `pace=ReplayPace.ASAP`.
**AC-2: `--pace` default ASAP** — invoke without `--pace`; assert `config.replay.pace == "asap"`.
**AC-3: --pace realtime** — invoke with `--pace realtime`; assert config has `pace=ReplayPace.REALTIME`.
**AC-3: `--pace realtime`** — invoke with `--pace realtime`; assert `config.replay.pace == "realtime"`.
**AC-4: --time-offset-ms forwarded** — invoke with `--time-offset-ms 5000`; assert config has `time_offset_ms=5000`.
**AC-4: `--time-offset-ms` forwarded** — invoke with `--time-offset-ms 5000`; assert `config.replay.time_offset_ms == 5000`. Without `--time-offset-ms`, assert `config.replay.time_offset_ms is None` (and `ReplayInputAdapter` will auto-detect).
**AC-5: Missing required arg → exit 2 + helpful message** — invoke without `--video`; assert exit code 2 (argparse default) + stderr message names the missing arg.
**AC-5: `--mavlink-signing-key` required** — invoke without `--mavlink-signing-key`; assert argparse exits 2 with stderr message naming the missing arg. Per replay protocol Invariant 11.
**AC-6: Calibration loader rejects malformed JSON** — pass a corrupt calib.json; assert `ReplayCliError("camera-calibration JSON malformed: <details>")` + exit 1.
**AC-7: Calibration schema validation** — pass a calib.json missing `intrinsics` key; assert `ReplayCliError("camera-calibration schema invalid: missing 'intrinsics'")`.
**AC-8: Output parent dir validation**`--output /nonexistent/out.jsonl``ReplayCliError("output parent directory does not exist")` + exit 1 (consistent with `JsonlReplaySink` behaviour).
**AC-8: Mode set to replay** — capture the `Config` object passed to the shared main; assert `config.mode == "replay"`.
**AC-9: Exit-code mapping** — wire a `FakeReplayRoot` whose `runtime_loop` returns 0 / 1 / 2; assert process exit code matches each.
**AC-9: Exit-code pass-through** — wire a FakeMain returning 0 / 1 / 2; assert the CLI exits 0 / 1 / 2 respectively. Exit code 2 is reserved for `ReplayInputAdapter.open()` auto-sync hard-fail (set by the shared main / `compose_root`), NOT for argparse missing-arg (which uses argparse's default exit 2 but with a distinguishable stderr message).
**AC-10: Console script registered** — install the package in a fresh venv; assert `gps-denied-replay --help` runs and prints the argparse usage.
## Non-Functional Requirements
- CLI startup p99 ≤ 5 s (cold-start NFT from the epic).
- argparse + calibration loading p99 ≤ 100 ms (excluding `compose_replay` itself).
- CLI startup p99 ≤ 5 s (cold-start NFT from the epic, including config + calibration loading).
- argparse + calibration loading p99 ≤ 100 ms (excluding `compose_root` itself).
## Constraints
- argparse (stdlib) — no new CLI framework.
- JSON for calibration (already the project convention).
- Exit codes: 0 = success; 2 = AC-8 sync-impossible (or argparse missing-arg); 1 = any other error.
- Exit codes: 0 = success; 2 = AC-8 sync-impossible (set by the shared main from `ReplayInputAdapter`) OR argparse missing-arg (stdlib default); 1 = any other error.
- Console-script registration in pyproject.toml `[project.scripts]`.
- The CLI MUST NOT call `compose_root` directly — it mutates the config and dispatches into the shared main, which calls `compose_root`. This keeps the live and replay code paths converged at the same entry point per ADR-011.
## Risks & Mitigation
- **Risk: argparse exit code 2 conflicts with epic AC-8 exit code 2** — *Mitigation*: documented; argparse exit 2 is for "missing-required-arg / arg-parse-error" — operator can distinguish via stderr; AC-8 exit 2 is for runtime-sync-impossible.
- **Risk: argparse exit code 2 conflicts with epic AC-8 exit code 2** — *Mitigation*: documented; the argparse path emits a `usage:` line + a "the following arguments are required: …" line to stderr (stdlib default), whereas the AC-8 path emits a `replay.auto_sync.ac8_validation_failed` structured log line with the auto-detected offset + match percentage. Operators distinguish via stderr inspection.
- **Risk: calibration JSON schema drift** — *Mitigation*: schema-validate at load time; AC-7 enforces.
- **Risk: top-level error swallowing makes debugging hard** — *Mitigation*: top-level except logs the FULL traceback (via `logger.exception`); the exit code is 1 but the operator sees the traceback in stderr.
- **Risk: the CLI accidentally re-implements composition logic** — *Mitigation*: AC-8 (`config.mode == "replay"` set) + dispatch-to-shared-main test together prevent any composition logic from sneaking into `cli/replay.py`. Code-review checklist on the PR.
## Runtime Completeness
- **Named capability**: `gps-denied-replay` CLI.
- **Production code**: real argparse, real calibration loader, real `compose_replay` dispatch, real exit-code propagation.
- **Named capability**: `gps-denied-replay` console-script that activates replay mode on the airborne binary.
- **Production code**: real argparse, real calibration loader, real config-mode mutation, real dispatch to the shared main, real exit-code pass-through.
- **Allowed external stubs**: test fakes only.
- **Unacceptable substitutes**: a click-based or typer-based CLI (adds a dependency for no gain over stdlib argparse).
- **Unacceptable substitutes**: a click-based or typer-based CLI (adds a dependency for no gain over stdlib argparse); calling `compose_root` directly from the CLI (bypasses the shared main and defeats ADR-011's "same entry point for both modes" property).
## Contract
Implements `_docs/02_document/contracts/replay/replay_protocol.md` — CLI surface.
Implements `_docs/02_document/contracts/replay/replay_protocol.md` (v2.0.0) — CLI surface + Invariant 11 (signing key mandatory). Operationalises ADR-011.