[AZ-334] C1 KLT/RANSAC strategy — engine-rule simple-baseline VIO

Implement KltRansacStrategy, the ADR-002 engine-rule mandatory
simple-baseline VioStrategy for E-C1. Pure-Python facade over
OpenCV's cv2.goodFeaturesToTrack / calcOpticalFlowPyrLK /
findEssentialMat / recoverPose pipeline — no C++/pybind11 binding
by design so a Tier-0 workstation runs the strategy with
`pip install opencv-python` and the BUILD_KLT_RANSAC=ON gate alone.
Constructor + state machine + FDR transition spine mirror
Okvis2Strategy + VinsMonoStrategy so the AZ-331 factory + IT-12
comparative harness treat all three as drop-in substitutable; the
duplication is the consolidation target now formally in scope for
the next cumulative review (batches 52-54).

AC coverage: AC-1..AC-11 + NFR-perf mapped to passing tests
(25 tests, 23 pass + 2 tier-2 skipped on dev/CI runners; all 25
pass under GPS_DENIED_TIER=2). Honest-covariance invariant (AC-9)
implemented as residual-scatter / (N_inliers - 5) with an inlier-
count penalty — no client-side floor or smoother; cov Frobenius
grows monotonically across DEGRADED. Camera-agnostic source
(AC-11) enforced by CI-grep gate that excludes docstring text.

Test-Run Cadence: focused suite tests/unit/c1_vio/ green (95 passed,
6 skipped); config-loader + compose-root suites green; full-suite
gate deferred to Step 16 per implement skill.

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-14 02:40:01 +03:00
parent 4815dd6aa1
commit ceb24b5a62
10 changed files with 2371 additions and 14 deletions
@@ -0,0 +1,220 @@
# C1 KLT/RANSAC Strategy — Mandatory Simple-Baseline VIO
**Task**: AZ-334_c1_klt_ransac_strategy
**Name**: C1 KLT/RANSAC Strategy
**Description**: Implement `KltRansacStrategy`, the mandatory simple-baseline `VioStrategy` that satisfies the ADR-002 engine rule (every component MUST ship a simple-baseline strategy alongside its production-default). The class is a pure-Python facade over OpenCV's pyramidal KLT optical-flow + RANSAC essential-matrix path. No C++/pybind11 — OpenCV's Python bindings provide everything needed, keeping the simple-baseline path camera-agnostic, dependency-light, and dead-easy to reason about. Bound by AC-2.1a (≥ 95 % tracked-frame ratio on the Derkachi normal segment) and AC-2.2 (MRE p95 < 1 px frame-to-frame; `tests.md` C1-IT-04 binds KltRansac alongside Okvis2). Build-time gated by `BUILD_KLT_RANSAC=ON` (airborne / research / replay-cli; operator-tooling does not need VIO).
**Complexity**: 5 points
**Dependencies**: AZ-331_c1_vio_strategy_protocol, AZ-263_initial_structure, AZ-269_config_loader, AZ-266_log_module, AZ-276_imu_preintegrator, AZ-277_se3_utils, AZ-282_ransac_filter, AZ-272_fdr_record_schema, AZ-273_fdr_client_ringbuf
**Component**: c1_vio (epic AZ-254 / E-C1)
**Tracker**: AZ-334
**Epic**: AZ-254 (E-C1)
### Document Dependencies
- `_docs/02_document/contracts/c1_vio/vio_strategy_protocol.md` — the Protocol this task implements; produced by AZ-331.
- `_docs/02_document/contracts/shared_helpers/imu_preintegrator.md` — IMU substrate (AZ-276); the simple-baseline still consumes the GTSAM `CombinedImuFactor` so C5 fusion sees a consistent `VioOutput` shape.
- `_docs/02_document/contracts/shared_helpers/se3_utils.md` — SE(3) ↔ pose-matrix utilities (AZ-277).
- `_docs/02_document/contracts/shared_helpers/ransac_filter.md` — generic RANSAC inlier filter (AZ-282); reused for the essential-matrix outlier rejection step.
- `_docs/02_document/components/01_c1_vio/description.md` — § 5 implementation details + § 6 helpers + § 7 caveats (sharp turns < 5 % frame overlap cause feature-track loss in all three strategies).
- `_docs/02_document/components/01_c1_vio/tests.md` — C1-IT-03 (≥ 95 % tracked-frame ratio); C1-IT-04 (MRE p95 < 1 px) — both bind this strategy.
- ADR-002 — the engine rule that mandates this baseline.
## Problem
Without `KltRansacStrategy`:
- The ADR-002 engine rule is violated; every airborne binary MUST link a simple-baseline strategy alongside the production-default.
- AC-2.1a (95 % tracked-frame ratio on normal segments — the engine rule's quantitative gate) has no producer; the Derkachi C1-IT-03 fixture cannot validate the rule.
- The deployment binary's defense-in-depth posture collapses: an OKVIS2 backend init failure leaves the runtime with no fallback path.
- The interpretability anchor is lost: when the production-default OKVIS2 produces a surprising covariance, the operator's debugging step is "compare to KLT/RANSAC on the same input", and without the simple-baseline there is no reference output to compare against.
- Tier-0 workstation development loses its dependency-light path (no OKVIS2 / VINS-Mono native libs); every developer would need the OKVIS2 native libs installed.
- IT-12 comparative-study cannot include the simple-baseline as a third data point.
This task delivers the engine-rule mandatory baseline. It is the lowest-complexity strategy in this epic by code volume but carries the highest test coverage ratio because the AC-2.1a / AC-2.2 bounds are the gate.
## Outcome
- A `KltRansacStrategy` class at `src/gps_denied_onboard/components/c1_vio/klt_ransac.py` conforming to the `VioStrategy` Protocol from AZ-331; `current_strategy_label() == "klt_ransac"`.
- No `_native/` binding — pure Python over OpenCV's `cv2.calcOpticalFlowPyrLK` + `cv2.goodFeaturesToTrack` + `cv2.findEssentialMat` + `cv2.recoverPose` path. The `helpers.ransac_filter` (AZ-282) is reused for outlier rejection of the per-frame correspondences before pose recovery.
- Constructor `__init__(self, *, calibration: CameraCalibration, preintegrator: ImuPreintegrator, ransac_filter: RansacFilter, fdr_client: FdrClient, logger: Logger, config: KltRansacConfig)` — all dependencies constructor-injected per ADR-009. `KltRansacConfig` (`@dataclass(frozen=True)`) carries the KLT-specific knobs (max corner count, KLT pyramid levels, KLT window size, RANSAC inlier ratio, essential-matrix RANSAC threshold, min-features-for-pose) loaded from `config.vio.klt_ransac.*` via AZ-269.
- `process_frame(frame, imu, calibration) -> VioOutput`:
1. Append IMU samples to the injected `ImuPreintegrator` (the IMU contributes to bias accumulation; KLT itself is vision-only, but the `VioOutput.imu_bias` field still gets populated from the helper's current bias estimate).
2. Convert the input `NavCameraFrame.pixels` to grayscale (OpenCV expects `uint8` single-channel for KLT).
3. If this is the first frame, run `cv2.goodFeaturesToTrack` to seed the feature track buffer; emit `VioOutput` with `state == INIT` (zero relative pose, conservative covariance).
4. Otherwise, call `cv2.calcOpticalFlowPyrLK` to track the prior frame's features into this frame; reject `status==0` correspondences.
5. Run `cv2.findEssentialMat` with `cv2.RANSAC` over the surviving correspondences using `calibration.K`; reject correspondences whose RANSAC mask is 0.
6. Recover the relative pose via `cv2.recoverPose`; convert to SE(3) via `helpers.se3_utils`.
7. Estimate per-frame covariance from the inlier-residual scatter (sample covariance of 2D reprojection residuals back-projected to a 6×6 pose-perturbation covariance via the camera Jacobian; standard textbook approach — see Risks for the honest-covariance constraint).
8. Compute `feature_quality.mre_px` from the inlier residuals; `tracked` / `new` / `lost` counts come from the KLT step.
9. Build and return `VioOutput` with `frame_id` echoed; emit per-frame DEBUG log if enabled.
- `reset_to_warm_start(hint)`: clears the prior-frame feature track buffer, seeds the IMU bias from `hint.bias` via the preintegrator's `reset_with_bias`, and resets the internal "first frame seen" flag so the next `process_frame` re-seeds features. The hint's `body_T_world` is recorded as the baseline for relative-pose chaining; KLT/RANSAC's per-frame relative pose is interpreted relative to this baseline by C5 fusion.
- `health_snapshot()` returns `VioHealth(state, consecutive_lost, bias_norm)`:
  - `INIT` for the first 1-2 frames (until KLT has a prior to track from);
- `TRACKING` when inlier count ≥ `config.vio.klt_ransac.min_features_for_pose`;
- `DEGRADED` when inlier count drops below threshold (covariance Frobenius norm grows in proportion);
- `LOST` after `config.vio.lost_frame_threshold` consecutive frames where pose recovery fails (e.g., RANSAC finds no consensus).
- The honest-covariance invariant is enforced behaviourally: the per-frame covariance grows monotonically as inlier count drops; no client-side floor or smoother is applied.
- Error envelope is closed: every OpenCV `cv2.error` is caught inside `process_frame` / `reset_to_warm_start` and rewrapped into the `VioError` family. `VioInitializingError` for the first frame; `VioFatalError` on sustained pose-recovery failure (RANSAC consensus < threshold for `lost_frame_threshold` frames).
- All FDR records emitted via the injected `FdrClient` use the `kind="vio.health"` schema.
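The health-state rules listed above can be sketched as a small pure function. This is a minimal illustration, not the shipped implementation: `VioState`, `classify_frame`, and its keyword arguments are hypothetical names, and reporting `DEGRADED` for a failed pose recovery that has not yet reached `lost_frame_threshold` is an assumption the spec does not state explicitly.

```python
from __future__ import annotations

from enum import Enum


class VioState(Enum):
    INIT = "INIT"
    TRACKING = "TRACKING"
    DEGRADED = "DEGRADED"
    LOST = "LOST"


def classify_frame(
    *,
    has_prior_frame: bool,
    pose_recovered: bool,
    inlier_count: int,
    consecutive_lost: int,
    min_features_for_pose: int,
    lost_frame_threshold: int,
) -> tuple[VioState, int]:
    """Return (state, updated consecutive_lost) for one processed frame."""
    if not has_prior_frame:
        # Nothing to track from yet: features were only seeded on this frame.
        return VioState.INIT, 0
    if not pose_recovered:
        consecutive_lost += 1
        if consecutive_lost >= lost_frame_threshold:
            return VioState.LOST, consecutive_lost
        # Assumption: a failed frame below the threshold is reported as DEGRADED.
        return VioState.DEGRADED, consecutive_lost
    if inlier_count >= min_features_for_pose:
        return VioState.TRACKING, 0
    return VioState.DEGRADED, 0
```

Keeping the classification free of OpenCV and I/O makes the transition table testable without image fixtures.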
## Scope
### Included
- `KltRansacStrategy` class + `KltRansacConfig` dataclass.
- The full `process_frame` / `reset_to_warm_start` / `health_snapshot` / `current_strategy_label` surface conforming to AZ-331's Protocol.
- IMU substrate via the constructor-injected `ImuPreintegrator` (AZ-276); the simple-baseline still calls into the helper for bias accumulation so `VioOutput.imu_bias` is consistent across all three strategies.
- RANSAC outlier rejection via the constructor-injected `RansacFilter` (AZ-282); KLT/RANSAC does not duplicate AZ-282's logic.
- Honest covariance estimation from the residual scatter — the formula and its limitations are documented in the spec; no smoothing on the read path.
- Per-frame DEBUG log gated by `config.vio.per_frame_debug_log` (default OFF).
- WARN / ERROR / INFO logging per description.md § 9.
- Health-state transitions emitted as FDR records.
- Composition-root wiring (entry to the AZ-331 `build_vio_strategy` factory's `klt_ransac` branch).
- KLT/RANSAC is the only strategy that builds for ALL of airborne / research / replay-cli; module-layout's Build-Time Exclusion Map shows `BUILD_KLT_RANSAC=ON` for those three binaries (operator-tooling does not need VIO).
- The KLT path MUST be camera-agnostic — no `adti20` / `adti26` specific branches; the calibration arrives via the per-call `CameraCalibration` argument.
### Excluded
- OKVIS2 strategy — separate task.
- VINS-Mono strategy — separate task.
- Warm-start hint persistence — separate task in this epic.
- C5 fusion of `VioOutput` — owned by E-C5.
- C13 FDR writer-thread — owned by E-C13.
- IMU preintegration — owned by AZ-276.
- The C1-IT-01..06 / C1-PT-01 tests themselves — deferred to Step 9 / E-BBT (KLT/RANSAC's AC-2.1a + AC-2.2 bindings will live there).
- Non-OpenCV optical-flow algorithms (Farnebäck dense flow, etc.) — out of scope; KLT pyramidal is the canonical simple baseline.
- Bundle-adjustment refinement of the recovered pose — out of scope this cycle; the engine rule's bound is per-frame relative pose, not refined keyframe pose.
## Acceptance Criteria
**AC-1: `current_strategy_label()` returns `"klt_ransac"`**
Given a `KltRansacStrategy` constructed via the AZ-331 factory with `config.vio.strategy = "klt_ransac"`
When `current_strategy_label()` is called
Then the returned string is exactly `"klt_ransac"`
**AC-2: First frame emits `VioOutput` with `state == INIT` and zero relative pose**
Given a freshly-constructed strategy and the very first `NavCameraFrame`
When `process_frame(frame, imu, calibration)` is called
Then a `VioOutput` is returned with `relative_pose_T` equal to the SE(3) identity (within numerical tolerance) and `pose_covariance_6x6` equal to the configured INIT-state conservative covariance; `health_snapshot().state == INIT`
**AC-3: Steady-state frame emits `VioOutput` with non-zero relative pose and SPD covariance**
Given a strategy with prior frames processed and a current frame with sufficient inliers
When `process_frame` is called
Then `relative_pose_T` is non-identity (the camera moved between frames in the fixture); `pose_covariance_6x6` is symmetric and positive-definite; `feature_quality.mre_px > 0`; `feature_quality.tracked > 0`
**AC-4: Pose recovery rewraps `cv2.error` into `VioError`**
Given a frame that triggers `cv2.findEssentialMat` or `cv2.recoverPose` to raise `cv2.error`
When `process_frame` is called
Then `VioFatalError` is raised; the original `cv2.error` is chained via `raise ... from`; no raw `cv2.error` leaks to the caller
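A minimal sketch of the rewrap AC-4 describes. To stay runnable without OpenCV installed, `FakeCvError` stands in for `cv2.error`; `guarded_call` is a hypothetical helper name, and the class bodies are placeholders for the real `VioError` family.

```python
class VioError(Exception):
    """Base of the closed error family (name from the spec; body is a stand-in)."""


class VioFatalError(VioError):
    """Raised when pose recovery fails in a way the caller cannot recover from."""


class FakeCvError(Exception):
    """Stand-in for cv2.error so the sketch runs without OpenCV."""


def guarded_call(func, *args, cv_error=FakeCvError):
    """Call an OpenCV-style function and rewrap its failure as VioFatalError.

    In production wiring, cv_error would be cv2.error.
    """
    try:
        return func(*args)
    except cv_error as exc:
        # `from exc` preserves the original error as __cause__ (AC-4).
        raise VioFatalError(f"pose recovery failed: {exc}") from exc
```

A test can then assert on `__cause__` to confirm the original exception was chained and not swallowed.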
**AC-5: `reset_to_warm_start` clears feature buffer and re-seeds bias**
Given a strategy with N processed frames and a non-default IMU bias
When `reset_to_warm_start(hint)` is called with a known `hint.bias`
Then the next `process_frame` call's first behaviour is `cv2.goodFeaturesToTrack` (verifiable via spy on the OpenCV call); `VioOutput.imu_bias` reflects `hint.bias` (within numerical tolerance); calling `reset_to_warm_start` again does not raise
**AC-6: Inlier loss → `DEGRADED` state with monotonically growing covariance**
Given a strategy in TRACKING state
When a frame is processed where the surviving inlier count is below `config.vio.klt_ransac.min_features_for_pose`
Then `VioOutput.pose_covariance_6x6` Frobenius norm is strictly greater than the prior frame's; `health_snapshot().state == DEGRADED`; `VioOutput` IS emitted (not raised)
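One way to read the residual-scatter estimator named in the commit message (residual scatter divided by `N_inliers - 5`, with an inlier-count penalty) is sketched below. The function names, the exact penalty form, and the isotropic 6×6 mapping are assumptions made for illustration; the spec back-projects through the camera Jacobian instead. The `- 5` is taken here to reflect the five degrees of freedom of the essential matrix.

```python
def residual_variance(residuals_px, min_features_for_pose):
    """Residual scatter over (N - 5), inflated when inliers are scarce."""
    n = len(residuals_px)
    dof = n - 5  # assumed: 5 DoF consumed by the essential-matrix fit
    if dof <= 0:
        raise ValueError("need more than 5 inliers to estimate variance")
    sigma2 = sum(r * r for r in residuals_px) / dof
    # Hypothetical penalty: scale up once the inlier count falls below the
    # tracking threshold; never scale down (no floor, no smoothing).
    penalty = max(1.0, min_features_for_pose / n)
    return sigma2 * penalty


def isotropic_pose_covariance(sigma2):
    """Simplified 6x6 covariance: sigma2 on the diagonal (stand-in for the
    Jacobian back-projection described in the spec)."""
    return [[sigma2 if i == j else 0.0 for j in range(6)] for i in range(6)]
```

With identical per-point residuals, fewer inliers yield a larger variance, which is the direction AC-6 requires: `residual_variance([0.5] * 10, 30)` exceeds `residual_variance([0.5] * 40, 30)`.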
**AC-7: Sustained pose-recovery failure raises `VioFatalError`**
Given a strategy in DEGRADED state
When `config.vio.lost_frame_threshold` consecutive frames fail pose recovery (e.g., RANSAC finds no consensus)
Then the next `process_frame` call raises `VioFatalError`; `health_snapshot().state == LOST`
**AC-8: `BUILD_KLT_RANSAC=OFF` does not import the strategy module**
Given the operator-tooling binary built with `BUILD_KLT_RANSAC=OFF`
When `gps_denied_onboard.components.c1_vio` is imported
Then `sys.modules` does NOT contain `gps_denied_onboard.components.c1_vio.klt_ransac`; AZ-331's factory raises `StrategyNotAvailableError("klt_ransac", missing_flag="BUILD_KLT_RANSAC")` if `klt_ransac` is requested
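A hedged sketch of how the factory branch might honour the build gate. How `BUILD_KLT_RANSAC` is surfaced at runtime is not stated in this spec; reading it from an environment-style mapping is an assumption, as are the helper names `load_klt_ransac_module` and `is_loaded`. The exception signature follows the call shown in AC-8.

```python
import importlib
import os
import sys

KLT_MODULE = "gps_denied_onboard.components.c1_vio.klt_ransac"


class StrategyNotAvailableError(RuntimeError):
    def __init__(self, strategy: str, *, missing_flag: str) -> None:
        super().__init__(f"strategy {strategy!r} requires {missing_flag}=ON")
        self.strategy = strategy
        self.missing_flag = missing_flag


def load_klt_ransac_module(env=os.environ):
    """Import the strategy module lazily, and only when the gate is ON."""
    if env.get("BUILD_KLT_RANSAC", "OFF") != "ON":
        raise StrategyNotAvailableError(
            "klt_ransac", missing_flag="BUILD_KLT_RANSAC"
        )
    return importlib.import_module(KLT_MODULE)


def is_loaded(module_name: str) -> bool:
    """True if the module has already been imported in this process."""
    return module_name in sys.modules
```

Because the import happens inside the gated branch, importing the package itself never pulls the strategy module into `sys.modules`.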
**AC-9: Honest covariance — no shrinkage during DEGRADED**
Given a controlled-degradation 60 s synthetic input
When `process_frame` runs through the degradation event
Then `||pose_covariance_6x6||_F` is monotonically non-decreasing from the moment `health_snapshot().state` first transitions to `DEGRADED` until `TRACKING` is restored or `LOST` is reached; the covariance estimator does NOT apply any client-side floor or smoother
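The monotonicity check in AC-9 could be expressed as a small test helper like the one below. `frobenius_norm` and `first_shrink_index` are hypothetical names; the input is assumed to be the per-frame covariances collected while the state is `DEGRADED`.

```python
import math
from typing import Optional, Sequence

Matrix = Sequence[Sequence[float]]


def frobenius_norm(matrix: Matrix) -> float:
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(v * v for row in matrix for v in row))


def first_shrink_index(covariances: Sequence[Matrix]) -> Optional[int]:
    """Index of the first frame whose norm is below its predecessor's.

    Returns None when the sequence is monotonically non-decreasing.
    """
    previous = None
    for index, cov in enumerate(covariances):
        norm = frobenius_norm(cov)
        if previous is not None and norm < previous:
            return index
        previous = norm
    return None
```

Returning the offending index instead of a bare boolean gives the failing test a frame number to report.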
**AC-10: FDR `vio.health` records emitted on every state transition**
Given the strategy is configured with a real `FdrClient` (or test double)
When `health_snapshot().state` transitions
Then exactly one FDR record with `kind="vio.health"` and the new state is emitted; no records on steady-state frames
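A minimal sketch of transition-only emission, using a recording test double. The `emit(kind, payload)` method, the payload shape, and the `HealthReporter` class are assumptions for illustration; the real `FdrClient` interface is defined by AZ-273. The sketch also assumes no record is written for the initial `INIT` state itself.

```python
class RecordingFdrClient:
    """Test double that stores every emitted record."""

    def __init__(self):
        self.records = []

    def emit(self, kind, payload):
        self.records.append((kind, payload))


class HealthReporter:
    """Emits one `vio.health` record per state change, none on steady state."""

    def __init__(self, fdr_client, initial_state="INIT"):
        self._fdr = fdr_client
        self._state = initial_state

    def update(self, new_state):
        if new_state == self._state:
            return  # steady-state frame: nothing to record
        self._state = new_state
        self._fdr.emit("vio.health", {"state": new_state})
```

Feeding the sequence `INIT, TRACKING, TRACKING, DEGRADED` through `update` leaves exactly two records in the double, one per transition.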
**AC-11: Camera-agnostic path**
Given two `CameraCalibration` instances representing different deployed cameras (test fixture `adti26` and a synthetic alternate calibration)
When the same `process_frame` code path is exercised against both calibrations
Then the strategy produces sensible `VioOutput` for both without any calibration-specific branch in the source code (verifiable via static-grep CI gate: no `adti20` / `adti26` literals in `klt_ransac.py`)
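The static-grep gate, including the docstring exclusion mentioned in the commit message, might look like the sketch below. It is an assumption about how the gate could be written, not the CI script itself; `docstring_lines` and `find_camera_literals` are hypothetical names. Note that comments are still scanned, since only docstrings are excluded.

```python
from __future__ import annotations

import ast
import re

FORBIDDEN = re.compile(r"adti(20|26)", re.IGNORECASE)
_DOC_OWNERS = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)


def docstring_lines(source: str) -> set[int]:
    """Line numbers (1-based) occupied by module/class/function docstrings."""
    lines: set[int] = set()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, _DOC_OWNERS) or not node.body:
            continue
        first = node.body[0]
        if (
            isinstance(first, ast.Expr)
            and isinstance(first.value, ast.Constant)
            and isinstance(first.value.value, str)
        ):
            lines.update(range(first.lineno, first.end_lineno + 1))
    return lines


def find_camera_literals(source: str) -> list[int]:
    """Lines outside docstrings that mention a camera-specific literal."""
    skip = docstring_lines(source)
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if number not in skip and FORBIDDEN.search(line)
    ]
```

A CI step would read `klt_ransac.py`, call `find_camera_literals`, and fail the build when the returned list is non-empty.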
## Non-Functional Requirements
**Performance**
- `process_frame` p95 ≤ 80 ms on Tier-2 (budget shared with OKVIS2 — KLT/RANSAC is typically faster but the budget is the wire boundary). Failure threshold 120 ms.
- Throughput ≥ 3 Hz sustained; failure threshold < 2.5 Hz.
- CPU ≤ 30 % of one core (OpenCV's `cv2.calcOpticalFlowPyrLK` is multi-threaded internally; bound at 30 % per ADR-002 budget partition).
- Memory ≤ 1.5 GB resident.
- AC-2.1a: ≥ 95 % tracked-frame ratio on Derkachi normal segment (C1-IT-03; deferred to Step 9 / E-BBT for the actual test, but this strategy MUST be capable of meeting it on the named fixture).
- AC-2.2: MRE p95 < 1 px frame-to-frame (C1-IT-04; this strategy IS bound by it per `tests.md`).
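The latency and throughput bounds above can be checked from recorded per-frame timings. The sketch below uses the nearest-rank percentile definition, which is an assumption (the C1-PT-01 microbench may define p95 differently); `percentile_nearest_rank` and `meets_perf_budget` are hypothetical helper names.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Nearest-rank percentile: smallest sample with >= pct% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def meets_perf_budget(latencies_ms, duration_s, p95_budget_ms=80.0, min_hz=3.0):
    """True when p95 latency and sustained throughput are both within budget."""
    p95 = percentile_nearest_rank(latencies_ms, 95)
    throughput_hz = len(latencies_ms) / duration_s
    return p95 <= p95_budget_ms and throughput_hz >= min_hz
```

For 100 frames processed in 30 s, the run passes only if at most five frames exceed 80 ms.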
**Compatibility**
- OpenCV ≥ 4.12.0 (CVE-2025-53644 mitigation per architecture § 2 dependency table).
- No additional third-party dependencies — OpenCV + numpy only (numpy already pinned).
**Reliability**
- Error envelope closed at the `VioError` family; no raw OpenCV / numpy exceptions cross the API surface.
- Single-threaded by Protocol contract; one instance per camera ingest thread.
- Pure-Python — no native-lib install requirement; works on Tier-0 workstation with `pip install opencv-python` only.
**Concurrency**
- One `KltRansacStrategy` instance per camera ingest thread.
## Unit Tests
| AC Ref | What to Test | Required Outcome |
|--------|-------------|-----------------|
| AC-1 | `current_strategy_label()` after factory build with `klt_ransac` config | Returns `"klt_ransac"` |
| AC-2 | First-frame `process_frame` on a fixture | `VioOutput` returned; `relative_pose_T` ≈ identity; `state == INIT` |
| AC-3 | Process N frames; inspect `VioOutput` shape | `relative_pose_T` non-identity; `pose_covariance_6x6` SPD; `feature_quality.mre_px > 0` |
| AC-4 | Inject a frame that triggers `cv2.error` (mock OpenCV call) | `VioFatalError` raised; `__cause__` is the `cv2.error` |
| AC-5 | `reset_to_warm_start` then `process_frame` | First call to OpenCV is `goodFeaturesToTrack`; `imu_bias` reflects hint |
| AC-6 | Feed degraded fixture (low inlier count) | Covariance Frobenius norm strictly increases; `state == DEGRADED`; `VioOutput` IS emitted |
| AC-7 | `lost_frame_threshold` consecutive failed-pose frames | `VioFatalError` on next `process_frame`; `state == LOST` |
| AC-8 | `BUILD_KLT_RANSAC=OFF` import + factory call | Module not in `sys.modules`; factory raises `StrategyNotAvailableError` |
| AC-9 | 60 s controlled-degradation synthetic | Covariance monotonically non-decreasing during DEGRADED |
| AC-10 | Real / fake `FdrClient` spy through state transitions | Exactly one `vio.health` record per transition |
| AC-11 | Static-grep gate + run with two calibrations | No `adti20` / `adti26` literals in source; both calibrations produce sensible output |
| NFR-perf | C1-PT-01 microbench against Derkachi normal segment (Tier-2) | p95 ≤ 80 ms; throughput ≥ 3 Hz |
| NFR-reliability-error-envelope | Raise `cv2.error` from each OpenCV call point | All caught and rewrapped to `VioError` family |
## Constraints
- This task implements (does NOT define) the AZ-331 Protocol.
- KLT/RANSAC is pure Python — NO `_native/` binding under this strategy.
- The strategy MUST consume IMU via the AZ-276 `ImuPreintegrator` for bias propagation, even though KLT itself is vision-only (keeps `VioOutput.imu_bias` consistent across all three strategies).
- The strategy MUST consume RANSAC via the AZ-282 `RansacFilter` for the inlier-rejection step (cross-cutting helper; do not duplicate locally).
- OpenCV ≥ 4.12.0 is the only third-party dependency added by this task (already pinned at the project level).
- No covariance floor / smoother on the read path — the residual-scatter covariance estimator is the canonical formula; document its limitations in the Risks section.
- Per-frame DEBUG defaults OFF.
- Camera-agnostic — no calibration-specific branches in source. CI grep gate enforces.
- The `KltRansacConfig` schema extension to AZ-269 is owned by this task.
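A sketch of what the `KltRansacConfig` schema could look like. Only `min_features_for_pose` and `essential_matrix_ransac_threshold` are names taken from this spec; the other field names and every default value are placeholders chosen for illustration. The lower bound of 5 on `min_features_for_pose` reflects the minimum correspondence count for an essential-matrix solve.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class KltRansacConfig:
    # Field names below are hypothetical unless noted; defaults are placeholders.
    max_corners: int = 500
    klt_pyramid_levels: int = 3
    klt_window_size: int = 21
    ransac_inlier_ratio: float = 0.5
    essential_matrix_ransac_threshold: float = 1.0  # name from the spec
    min_features_for_pose: int = 30  # name from the spec

    def __post_init__(self) -> None:
        if self.max_corners <= 0:
            raise ValueError("max_corners must be positive")
        if not 0.0 < self.ransac_inlier_ratio <= 1.0:
            raise ValueError("ransac_inlier_ratio must be in (0, 1]")
        if self.essential_matrix_ransac_threshold <= 0.0:
            raise ValueError("essential_matrix_ransac_threshold must be positive")
        if self.min_features_for_pose < 5:
            raise ValueError("min_features_for_pose must be at least 5")
```

Given a mapping loaded from `config.vio.klt_ransac.*`, the config could be built with `KltRansacConfig(**mapping)`; the frozen dataclass then rejects later mutation.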
## Risks & Mitigation
**Risk 1: Residual-scatter covariance under-reports during high-overlap straight flight**
- *Risk*: The standard residual-scatter formula assumes residual noise is uncorrelated with pose perturbation. In long straight-flight segments the assumption holds, but in low-parallax scenarios the formula can under-report covariance. This is not an "honest-covariance violation" in the AC-9 sense (that test only catches monotonicity); it is a quantitative under-report that C5 fusion will over-trust.
- *Mitigation*: D-CROSS-LATENCY-1 + AC-NEW-4 statistical headroom carry the residual risk; the C1-IT-12 comparative-study report (Step 9 / E-BBT) cross-validates KLT's covariance against OKVIS2's tightly-coupled output. The strategy spec documents the limitation; the deployed binary uses OKVIS2 by default and KLT only as a fallback / engine-rule baseline.
**Risk 2: KLT loses track on the first frame after take-off (no prior frame to track from)**
- *Risk*: AC-2 covers the INIT-state behaviour, but a misconfigured deployment that calls `process_frame` once and then crashes would leave C5 with no `VioOutput`; the AC-5.2 fallback is the right path but the diagnostic is harder.
- *Mitigation*: Per-frame DEBUG log (when enabled) records the INIT-state transition; the FDR `vio.health` record at INIT → TRACKING is emitted (AC-10) regardless of DEBUG state, so post-flight inspection always shows the warm-up.
**Risk 3: `cv2.findEssentialMat` is sensitive to RANSAC inlier-threshold tuning**
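- *Risk*: `cv2.findEssentialMat` interprets its RANSAC threshold in the units of the input points: pixels when raw pixel coordinates and the camera matrix are supplied, normalised image coordinates when the points are pre-normalised. A value tuned for the wrong unit makes the strategy either reject every correspondence or accept every outlier.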
- *Mitigation*: `config.vio.klt_ransac.essential_matrix_ransac_threshold` is documented + tested with a sensitivity sweep in the deferred Step 9 test. AZ-282 (`RansacFilter`) provides a generic RANSAC entry point that this strategy uses for the AZ-282-managed correspondence-rejection step (a separate stage from `cv2.findEssentialMat`'s internal RANSAC).
**Risk 4: Sharp turns < 5 % frame overlap (RESTRICT-UAV-3) cause feature-track loss**
- *Risk*: The architecture's RESTRICT-UAV-3 calls out this constraint; KLT/RANSAC will lose tracks faster than OKVIS2 in this regime.
- *Mitigation*: The strategy reports `DEGRADED` (AC-6) immediately when inlier count drops; F6 satellite re-localisation (E-C2 / E-C3 / E-C4 path) is the recovery; no work for this strategy beyond honest reporting.
## Runtime Completeness
- **Named capability**: KLT pyramidal optical-flow + RANSAC essential-matrix simple-baseline VIO; the ADR-002 engine-rule mandatory baseline (architecture / E-C1 / `solution.md` "KltRansac mandatory simple-baseline").
- **Production code that must exist**: real `KltRansacStrategy` class implementing the AZ-331 Protocol; real OpenCV ≥ 4.12.0 calls (`cv2.calcOpticalFlowPyrLK`, `cv2.goodFeaturesToTrack`, `cv2.findEssentialMat`, `cv2.recoverPose`); real residual-scatter covariance from the inlier residuals; real bias propagation through AZ-276; real RANSAC inlier rejection through AZ-282.
- **Allowed external stubs**: tests MAY mock `cv2.error` raises at specific call points (AC-4); production wiring uses real OpenCV.
- **Unacceptable substitutes**: a deterministic-fallback `VioOutput` that bypasses OpenCV (would defeat AC-2.1a's tracked-frame ratio); a covariance floor (would break AC-9); skipping the AZ-276 `ImuPreintegrator` (would break the `VioOutput.imu_bias` consistency invariant across strategies); duplicating RANSAC logic instead of reusing AZ-282 (cross-cutting violation per `coderule.mdc` and decompose Step 2 § 9).