[AZ-333] C1 VINS-Mono strategy — research-only comparative VIO

VinsMonoStrategy: Python facade conforming to AZ-331 Protocol; mirrors
the AZ-332 OKVIS2 facade so the AZ-331 factory + IT-12 comparative
harness can treat both as drop-in substitutable. Native binding is a
pybind11 skeleton compiled behind BUILD_VINS_MONO=ON (default OFF for
airborne / operator-tooling / replay-cli per module-layout.md
Build-Time Exclusion Map). Real vins_estimator wiring is the Tier-2
follow-up.

VinsMonoConfig added to c1_vio/config.py with sliding-window /
feature-tracker / marginalisation / opt-iteration knobs plus
__post_init__ validation; exported through the package __init__.

cpp/vins_mono/CMakeLists.txt replaces the AZ-263 placeholder with full
pybind11 wiring: Risk-1 mitigation forces VINS_MONO_USE_ROS=OFF;
Risk-2 mitigation links Eigen from the same cpp/_third_party/eigen pin
as OKVIS2; Risk-3 mitigation enforces BUILD_VINS_MONO=OFF in
deployment binaries via the gate at the top of the file.

Tests: 17 new in test_vins_mono_strategy.py (15 pass + 2 tier2 skip);
fake_vins_mono_binding fixture added to conftest.py mirroring the
fake_okvis2_binding pattern; test_protocol_conformance updated to drop
vins_mono from _STRATEGIES_WITHOUT_PY_MODULE so the existing
parametrised factory tests route through the new strategy.

Focused c1_vio suite: 72 passed, 4 skipped. Full suite: 1788 passed,
1 unrelated pre-existing flake (c12 cold-start perf, env-bound).

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-14 01:11:09 +03:00
parent 2ce300ddb1
commit 6a5954bdae
13 changed files with 2056 additions and 15 deletions
@@ -1,198 +0,0 @@
# C1 VINS-Mono Strategy — Research-Only Comparative VIO
**Task**: AZ-333_c1_vins_mono_strategy
**Name**: C1 VINS-Mono Strategy
**Description**: Implement `VinsMonoStrategy`, the research-only `VioStrategy` that participates in the IT-12 comparative-study build only. The class is a Python facade over the VINS-Mono C++ loosely-coupled VIO core (sliding-window optimizer with separate IMU pre-integration thread) accessed via a pybind11 wrapper around `cpp/vins_mono/`. Build-time gated by `BUILD_VINS_MONO`; not present in any deployment-bound binary (airborne / operator-tooling / replay-cli all OFF; only research is ON per `module-layout.md` Build-Time Exclusion Map). MRE p95 < 1 px frame-to-frame is **not** required of VINS-Mono per the C1 component's `tests.md` (only Okvis2 + KltRansac are bound by AC-2.2); VINS-Mono is exempt as research-only.
**Complexity**: 5 points
**Dependencies**: AZ-331_c1_vio_strategy_protocol, AZ-263_initial_structure, AZ-269_config_loader, AZ-266_log_module, AZ-276_imu_preintegrator, AZ-277_se3_utils, AZ-272_fdr_record_schema, AZ-273_fdr_client_ringbuf
**Component**: c1_vio (epic AZ-254 / E-C1)
**Tracker**: AZ-333
**Epic**: AZ-254 (E-C1)
### Document Dependencies
- `_docs/02_document/contracts/c1_vio/vio_strategy_protocol.md` — the Protocol this task implements; produced by AZ-331.
- `_docs/02_document/contracts/shared_helpers/imu_preintegrator.md` — IMU substrate (AZ-276).
- `_docs/02_document/contracts/shared_helpers/se3_utils.md` — SE(3) ↔ pose-matrix utilities (AZ-277).
- `_docs/02_document/components/01_c1_vio/description.md` — § 5 implementation details + § 6 helpers; § 7 caveats note VINS-Mono is research-only.
- `_docs/02_document/components/01_c1_vio/tests.md` — § Component-Internal Tests note "VinsMono is research-only and exempt from MRE bound (only IT-12 comparative-study coverage)" (C1-IT-04).
## Problem
Without `VinsMonoStrategy`:
- The IT-12 comparative-study scenario (Mode B FT-P-04 / FT-P-05) cannot run all three VIO backends side-by-side; the research binary's purpose collapses to OKVIS2 + KltRansac only.
- Independent verification that OKVIS2 outperforms a comparable open-source loosely-coupled VIO on the Derkachi fixture has no producer; future architectural decisions to swap the production-default VIO have no comparative basis.
- The composition root's three-strategy switch is asymmetric — adding a fourth strategy in a future cycle would require revisiting the factory pattern instead of simply adding a fourth lazy branch.
- Consumers of `VioOutput` (C5 fusion) would have to be re-validated against a smaller dataset of behaviours; cross-strategy contract tests (deferred to Step 9 / E-BBT) lose a third data point.
This task delivers the comparative-research third strategy. Production binaries do NOT link it; only the IT-12 research binary loads it via `BUILD_VINS_MONO=ON`.
## Outcome
- A `VinsMonoStrategy` class at `src/gps_denied_onboard/components/c1_vio/vins_mono.py` conforming to the `VioStrategy` Protocol from AZ-331; `current_strategy_label() == "vins_mono"`.
- A pybind11 wrapper at `src/gps_denied_onboard/components/c1_vio/_native/vins_mono_binding.cpp` exposing the VINS-Mono C++ estimator (`vins_estimator::Estimator` or equivalent in the pinned upstream HEAD) to Python. The wrapper is built by CMake under `cpp/vins_mono/` (build-time gated by `BUILD_VINS_MONO`); the resulting `.so` is imported lazily inside `vins_mono.py`.
- Constructor `__init__(self, *, calibration: CameraCalibration, preintegrator: ImuPreintegrator, fdr_client: FdrClient, logger: Logger, config: VinsMonoConfig)` — all dependencies constructor-injected per ADR-009. `VinsMonoConfig` (`@dataclass(frozen=True)`) carries the VINS-Mono-specific knobs (sliding-window size, feature tracker thresholds, marginalisation strategy, max optimisation iterations) loaded from `config.vio.vins_mono.*` via AZ-269.
- `process_frame(frame, imu, calibration) -> VioOutput`:
1. Append IMU samples to the injected `ImuPreintegrator` (strict-monotonic guarded; `ImuPreintegrationError` rewraps to `VioFatalError`).
2. Feed the nav-camera frame to VINS-Mono via the pybind11 `add_image` wrapper.
3. If VINS-Mono emits a new estimator update, extract the relative pose (SE(3) via `helpers.se3_utils`), the 6×6 covariance from VINS-Mono's marginalised information matrix, the latest IMU bias, and the feature-quality summary.
4. Build and return `VioOutput` with `frame_id` echoed.
- `reset_to_warm_start(hint)`: tears down the current VINS-Mono estimator instance, constructs a fresh one, seeds the IMU bias and initial pose from `hint`. The next `config.vio.warm_start_max_frames` frames are allowed to converge before the strategy reports `state == TRACKING`.
- `health_snapshot()` returns `VioHealth(state, consecutive_lost, bias_norm)` derived from VINS-Mono's internal initialiser flag and feature-tracker health: `INIT` until the SfM bootstrap succeeds, `TRACKING` while the optimisation converges, `DEGRADED` when feature count drops below `config.vio.vins_mono.degraded_feature_threshold` or the marginalised information matrix's smallest eigenvalue drops below threshold, `LOST` after `config.vio.lost_frame_threshold` consecutive failed updates.
- The honest-covariance invariant is enforced behaviourally as in OKVIS2: VINS-Mono's marginalised covariance is read directly with no client-side floor or smoother.
- Error envelope is closed: every VINS-Mono / pybind11 / Eigen / Ceres exception is caught and rewrapped into the `VioError` family.
- All FDR records emitted via the injected `FdrClient` use the `kind="vio.health"` schema from AZ-272.
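The `VinsMonoConfig` described in the constructor bullet above could look roughly like the following. This is a minimal sketch, not the shipped dataclass: only `degraded_feature_threshold` is named in this spec, and every other field name, default, and validation bound is a hypothetical stand-in for the listed knob categories.

```python
from dataclasses import dataclass

# Hypothetical set of accepted marginalisation strategy names.
_MARGINALISATION_CHOICES = ("oldest", "second_newest")


@dataclass(frozen=True)
class VinsMonoConfig:
    """Sketch of the VINS-Mono knobs; field names and defaults are assumptions."""

    window_size: int = 10                  # sliding-window size
    min_feature_distance_px: float = 30.0  # feature-tracker threshold
    degraded_feature_threshold: int = 20   # config.vio.vins_mono.degraded_feature_threshold
    marginalisation: str = "oldest"        # marginalisation strategy
    max_opt_iterations: int = 8            # max optimisation iterations

    def __post_init__(self) -> None:
        # Frozen dataclasses still run __post_init__, so read-only validation fits here.
        if self.window_size < 2:
            raise ValueError("window_size must be >= 2")
        if self.min_feature_distance_px <= 0.0:
            raise ValueError("min_feature_distance_px must be positive")
        if self.degraded_feature_threshold < 0:
            raise ValueError("degraded_feature_threshold must be >= 0")
        if self.marginalisation not in _MARGINALISATION_CHOICES:
            raise ValueError(f"marginalisation must be one of {_MARGINALISATION_CHOICES}")
        if self.max_opt_iterations < 1:
            raise ValueError("max_opt_iterations must be >= 1")
```

Validating in `__post_init__` means a bad `config.vio.vins_mono.*` value fails at construction time rather than inside the native estimator.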
## Scope
### Included
- `VinsMonoStrategy` class implementation + the `VinsMonoConfig` dataclass + the `_native/vins_mono_binding.cpp` pybind11 wrapper.
- CMake target under `cpp/vins_mono/` that links the VINS-Mono upstream pin (BSD-3-Clause-style ROS license) and produces the binding `.so`. Build flag `BUILD_VINS_MONO`; default OFF for airborne / operator-tooling / replay-cli.
- The full `process_frame` / `reset_to_warm_start` / `health_snapshot` / `current_strategy_label` surface conforming to AZ-331's Protocol.
- IMU substrate via the constructor-injected `ImuPreintegrator` (AZ-276).
- Honest-covariance reading from VINS-Mono's marginalised information matrix.
- Lazy import of the `_native` binding inside `vins_mono.py`.
- Per-frame DEBUG log gated by `config.vio.per_frame_debug_log` (default OFF).
- WARN / ERROR / INFO logging per description.md § 9.
- Health-state transitions emitted as FDR records.
- Composition-root wiring (entry to the AZ-331 `build_vio_strategy` factory's `vins_mono` branch).
- VINS-Mono upstream's ROS dependency (if any) MUST be stripped or vendored — VINS-Mono historically ships as a ROS package; this task uses an upstream pin that has been de-ROSified (e.g., the `vins-mono-no-ros` community port) OR vendors only the `vins_estimator` / `feature_tracker` cores. The decision (which upstream to pin) is recorded as an ADR addendum if not already covered by Plan-phase pin selection.
### Excluded
- OKVIS2 strategy — separate task in this epic.
- KLT/RANSAC simple-baseline strategy — separate task in this epic.
- Warm-start hint persistence — separate task in this epic.
- C5 fusion of `VioOutput` — owned by E-C5.
- C13 FDR writer-thread — owned by E-C13.
- IMU preintegration mathematics — owned by AZ-276.
- The C1-IT-01..06 / C1-PT-01 tests themselves — deferred to Step 9 (E-BBT). Note: VINS-Mono is exempt from the AC-2.2 MRE bound per `tests.md`.
- The IT-12 comparative-study harness — owned by suite-level test harness (Step 9 / E-BBT or test-spec extension).
- VINS-Mono upstream-source modifications beyond ROS-stripping — bug fixes upstream require a separate ADR.
- Multi-camera VINS-Mono — out of scope.
## Acceptance Criteria
**AC-1: `current_strategy_label()` returns `"vins_mono"`**
Given a `VinsMonoStrategy` constructed via the AZ-331 factory with `config.vio.strategy = "vins_mono"`
When `current_strategy_label()` is called
Then the returned string is exactly `"vins_mono"`
**AC-2: `process_frame` returns `VioOutput` with `frame_id` echoed**
Given a `NavCameraFrame` with `frame_id = "uuid-xyz"` and a populated `ImuWindow`
When `process_frame(frame, imu, calibration)` is called and reaches a successful estimator update
Then the returned `VioOutput.frame_id == "uuid-xyz"`; `pose_covariance_6x6` is symmetric and positive-definite; `imu_bias` is non-`None`
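One way a unit test might check the "symmetric and positive-definite" condition in AC-2 without extra dependencies is a Cholesky attempt. The sketch below assumes the covariance is available as a nested sequence of floats; the helper names are illustrative and are not part of the AZ-331 contract.

```python
import math
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def is_symmetric(m: Matrix, tol: float = 1e-9) -> bool:
    """True when m is square and m[i][j] matches m[j][i] within tol."""
    n = len(m)
    return all(len(row) == n for row in m) and all(
        abs(m[i][j] - m[j][i]) <= tol for i in range(n) for j in range(i + 1, n)
    )


def is_positive_definite(m: Matrix) -> bool:
    """Attempt a Cholesky factorisation; for a symmetric m it succeeds only if m is PD."""
    n = len(m)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False  # non-positive pivot: not positive-definite
                lower[i][j] = math.sqrt(s)
            else:
                lower[i][j] = s / lower[j][j]
    return True


def is_spd(m: Matrix) -> bool:
    return is_symmetric(m) and is_positive_definite(m)
```

A test for AC-2 could then assert `is_spd(output.pose_covariance_6x6)` after converting the matrix to nested lists.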
**AC-3: `process_frame` rewraps every backend exception into `VioError`**
Given a malformed input that triggers a VINS-Mono / pybind11 / Eigen / Ceres exception inside the backend
When `process_frame` is called
Then the raised exception is one of `VioInitializingError` / `VioDegradedError` / `VioFatalError`; the original exception is chained via `raise ... from`; no raw backend exception leaks
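The closed error envelope in AC-3 can be sketched as a guard around the binding call. The exception classes below are local stand-ins for the real `VioError` family, `FakeBinding` is a hypothetical test double, and mapping every unknown backend failure to `VioFatalError` is an assumption; the real strategy may classify some failures as `VioInitializingError` or `VioDegradedError`.

```python
class VioError(Exception):
    """Stand-in for the project's VioError base class."""


class VioInitializingError(VioError):
    pass


class VioDegradedError(VioError):
    pass


class VioFatalError(VioError):
    pass


class FakeBinding:
    """Hypothetical test double that fails like a native backend might."""

    def add_image(self, frame: object) -> object:
        # pybind11 surfaces a C++ std::runtime_error as a Python RuntimeError.
        raise RuntimeError("simulated Ceres failure")


def guarded_add_image(binding: object, frame: object) -> object:
    """Call the binding and rewrap anything outside the VioError family."""
    try:
        return binding.add_image(frame)
    except VioError:
        raise  # already inside the envelope; pass through unchanged
    except Exception as exc:
        # Chain the original so the backend traceback survives in __cause__.
        raise VioFatalError(f"VINS-Mono backend failure: {exc}") from exc
```

A test can then assert both the raised type and that `__cause__` holds the original backend exception.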
**AC-4: `reset_to_warm_start` clears state and seeds the hint**
Given a strategy with N processed frames
When `reset_to_warm_start(hint)` is called with a known `hint.bias` and `hint.body_T_world`
Then the next `process_frame` call's `VioOutput.imu_bias` reflects `hint.bias` (within numerical tolerance); calling `reset_to_warm_start` a second time without intervening frames does not raise
**AC-5: `health_snapshot()` reports `INIT` until SfM bootstrap completes**
Given a freshly-constructed strategy
When `health_snapshot()` is called before VINS-Mono's SfM bootstrap has succeeded
Then `state == INIT`; once bootstrap completes (typically 10–20 frames per VINS-Mono behaviour), the next `health_snapshot()` returns `state == TRACKING`
**AC-6: `health_snapshot()` reports `DEGRADED` on feature loss**
Given a strategy in TRACKING state
When `process_frame` is fed a frame with feature count below `config.vio.vins_mono.degraded_feature_threshold` or with the marginalised information matrix's smallest eigenvalue below threshold
Then the returned `VioOutput.pose_covariance_6x6` Frobenius norm is strictly greater than the prior frame's; the next `health_snapshot()` returns `state == DEGRADED`; the strategy MUST emit a `VioOutput` (not raise)
**AC-7: Sustained loss raises `VioFatalError`**
Given a strategy in DEGRADED state
When `config.vio.lost_frame_threshold` consecutive frames fail to update
Then the next `process_frame` call raises `VioFatalError`; subsequent `health_snapshot()` returns `state == LOST`
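The state transitions exercised by AC-5 through AC-7 can be summarised as a small state machine. This is a sketch under stated assumptions: `HealthTracker`, `min_info_eigenvalue`, and the `update` arguments are hypothetical names. It also assumes that a failed update below the loss threshold reports `DEGRADED`, and that `LOST` is sticky until the strategy is reset.

```python
from dataclasses import dataclass
from enum import Enum


class VioState(Enum):
    INIT = "INIT"
    TRACKING = "TRACKING"
    DEGRADED = "DEGRADED"
    LOST = "LOST"


@dataclass
class HealthTracker:
    degraded_feature_threshold: int
    lost_frame_threshold: int
    min_info_eigenvalue: float  # hypothetical name for the eigenvalue threshold
    state: VioState = VioState.INIT
    consecutive_lost: int = 0

    def update(self, *, bootstrapped: bool, update_ok: bool,
               feature_count: int, min_eigenvalue: float) -> VioState:
        if self.state is VioState.LOST:
            return self.state  # assumed sticky until reset_to_warm_start
        if not bootstrapped:
            self.state = VioState.INIT  # SfM bootstrap has not succeeded yet
            return self.state
        if not update_ok:
            self.consecutive_lost += 1
            if self.consecutive_lost >= self.lost_frame_threshold:
                self.state = VioState.LOST
            else:
                self.state = VioState.DEGRADED
            return self.state
        self.consecutive_lost = 0
        weak = (feature_count < self.degraded_feature_threshold
                or min_eigenvalue < self.min_info_eigenvalue)
        self.state = VioState.DEGRADED if weak else VioState.TRACKING
        return self.state
```

In this sketch, `process_frame` would raise `VioFatalError` once the tracker reports `LOST`; whether the raise lands on the threshold-crossing frame or the one after it is a detail the real implementation has to settle against AC-7.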
**AC-8: `BUILD_VINS_MONO=OFF` does not import VINS-Mono native libs**
Given the airborne / operator-tooling / replay-cli binary built with `BUILD_VINS_MONO=OFF`
When `gps_denied_onboard.components.c1_vio` is imported
Then `sys.modules` does NOT contain `gps_denied_onboard.components.c1_vio.vins_mono` or any `_native.vins_mono_binding` entry; AZ-331's factory raises `StrategyNotAvailableError("vins_mono", missing_flag="BUILD_VINS_MONO")` if `vins_mono` is requested
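The lazy-import behaviour behind AC-8 might be sketched as follows. The module path and the `StrategyNotAvailableError("vins_mono", missing_flag="BUILD_VINS_MONO")` call shape come from this spec. The error class body and the `load_vins_mono_module` helper are hypothetical, and the sketch assumes that importing the strategy module fails with `ImportError` whenever the binding was not built.

```python
import importlib
import sys
from types import ModuleType

_VINS_MONO_MODULE = "gps_denied_onboard.components.c1_vio.vins_mono"


class StrategyNotAvailableError(RuntimeError):
    """Stand-in for the AZ-331 factory error; the attribute names are assumptions."""

    def __init__(self, strategy: str, *, missing_flag: str) -> None:
        super().__init__(
            f"strategy {strategy!r} is unavailable; rebuild with {missing_flag}=ON"
        )
        self.strategy = strategy
        self.missing_flag = missing_flag


def load_vins_mono_module(module_name: str = _VINS_MONO_MODULE) -> ModuleType:
    """Import the strategy module only when the vins_mono branch is requested."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise StrategyNotAvailableError(
            "vins_mono", missing_flag="BUILD_VINS_MONO"
        ) from exc


def vins_mono_is_loaded() -> bool:
    """True if the strategy module has been imported into this process."""
    return _VINS_MONO_MODULE in sys.modules
```

Because nothing imports the module at package import time, `vins_mono_is_loaded()` stays `False` in an `OFF` build until something explicitly requests the strategy, and that request fails with the typed error.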
**AC-9: Honest covariance — no shrinkage during DEGRADED**
Given a controlled-degradation 60 s synthetic input
When `process_frame` runs through the degradation event
Then `||pose_covariance_6x6||_F` is monotonically non-decreasing from the moment `health_snapshot().state` first transitions to `DEGRADED` until either `TRACKING` is restored or `LOST` is reached
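The AC-9 check reduces to two steps: collect the Frobenius norms inside the DEGRADED window, then assert the sequence never decreases. The sketch below assumes each sample is a `(state, covariance)` pair with the state given as a plain string; both helper names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def frobenius_norm(m: Matrix) -> float:
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(v * v for row in m for v in row))


def degraded_window_norms(samples: Sequence[Tuple[str, Matrix]]) -> List[float]:
    """Norms from the first DEGRADED sample until TRACKING is restored or LOST is reached."""
    norms: List[float] = []
    in_window = False
    for state, cov in samples:
        if not in_window:
            if state != "DEGRADED":
                continue  # still before the degradation event
            in_window = True
        elif state in ("TRACKING", "LOST"):
            break  # window closed
        norms.append(frobenius_norm(cov))
    return norms


def is_non_decreasing(values: Sequence[float]) -> bool:
    return all(b >= a for a, b in zip(values, values[1:]))
```

A test over the 60 s synthetic input would assert `is_non_decreasing(degraded_window_norms(samples))`.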
**AC-10: FDR `vio.health` records emitted on every state transition**
Given the strategy is configured with a real `FdrClient` (or test double)
When `health_snapshot().state` transitions
Then exactly one FDR record with `kind="vio.health"` and the new state is emitted; no records on steady-state frames
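A transition-only emitter plus a spy client is one way to test AC-10. In this sketch the `emit` method name and the record's `state` field are assumptions; the real `FdrClient` API and the AZ-272 schema may differ. Only `kind="vio.health"` comes from this spec.

```python
from typing import Dict, List


class SpyFdrClient:
    """Hypothetical test double that records every emitted FDR record."""

    def __init__(self) -> None:
        self.records: List[Dict[str, str]] = []

    def emit(self, record: Dict[str, str]) -> None:
        self.records.append(record)


class HealthReporter:
    """Emits one vio.health record per state change and nothing on steady state."""

    def __init__(self, fdr_client: SpyFdrClient, initial_state: str = "INIT") -> None:
        self._fdr = fdr_client
        self._last_state = initial_state

    def observe(self, state: str) -> None:
        if state == self._last_state:
            return  # steady-state frame: no record
        self._last_state = state
        self._fdr.emit({"kind": "vio.health", "state": state})
```

Feeding the reporter `INIT, INIT, TRACKING, TRACKING, DEGRADED` should leave exactly two records in the spy, one for each transition.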
## Non-Functional Requirements
**Performance**
- Per-frame latency budget: VINS-Mono is research-only and is NOT bound by C1-PT-01's ≤ 80 ms p95 target. Document VINS-Mono's actual p95 in the Step 9 / E-BBT comparative-study report (no hard threshold).
- Throughput: best-effort; expected to operate at 3 Hz on Tier-2 in the research binary, with no failure threshold this cycle.
- CPU / memory: best-effort within the research binary's overall budget (research binary is not deployed; resource limits are looser).
**Compatibility**
- VINS-Mono upstream HEAD (de-ROSified port) pinned per Plan-phase. Upstream-source modifications beyond the ROS-strip require an explicit ADR addendum.
- pybind11 / Eigen / Ceres versions match the OKVIS2 build to avoid ABI conflicts inside the same research binary.
**Reliability**
- Error envelope closed at the `VioError` family; no raw VINS-Mono / Ceres / Eigen exceptions cross the Python boundary.
- Single-threaded by Protocol contract; one instance per camera ingest thread inside the research binary.
- VINS-Mono is **exempt** from the AC-2.2 MRE bound per `tests.md` C1-IT-04 because it is research-only; this task's tests make no per-frame MRE assertion.
**Concurrency**
- One `VinsMonoStrategy` instance per research-binary camera ingest thread.
## Unit Tests
| AC Ref | What to Test | Required Outcome |
|--------|-------------|-----------------|
| AC-1 | `current_strategy_label()` after factory build with `vins_mono` config | Returns `"vins_mono"` |
| AC-2 | `process_frame` with a fixture frame + IMU window | `VioOutput.frame_id` echoed; covariance SPD; `imu_bias` non-None |
| AC-3 | Inject a malformed frame that triggers a backend exception (mocked binding) | `VioError`-family exception raised; original chained via `__cause__` |
| AC-4 | `reset_to_warm_start` then `process_frame` × N | Bias reflects hint; second `reset_to_warm_start` does not raise |
| AC-5 | Cold construct → process N frames | `INIT` until SfM bootstrap completes; then `TRACKING` |
| AC-6 | Feed degraded fixture | Covariance Frobenius norm strictly increases; `health_snapshot` returns `DEGRADED`; `VioOutput` IS emitted |
| AC-7 | `lost_frame_threshold` consecutive failed frames | `VioFatalError` on next `process_frame`; `health_snapshot` returns `LOST` |
| AC-8 | `BUILD_VINS_MONO=OFF` import + factory call | Module not in `sys.modules`; factory raises `StrategyNotAvailableError` |
| AC-9 | 60 s controlled-degradation synthetic | Covariance Frobenius norm monotonically non-decreasing during DEGRADED window |
| AC-10 | Real / fake `FdrClient` spy through state transitions | Exactly one `vio.health` record per transition |
| NFR-reliability-error-envelope | Raise each backend exception type via mock | All caught and rewrapped to `VioError` family |
| NFR-perf-document | Microbench `process_frame` on Derkachi fixture (research binary) | Document p50/p95 in the Step 9 comparative-study report (no hard threshold) |
## Constraints
- This task implements (does NOT define) the AZ-331 Protocol; signature mismatch is a Spec-Gap finding (High) at code-review.
- The pybind11 binding lives under `_native/` per `module-layout.md`; lazy-imported inside `vins_mono.py`.
- VINS-Mono native source lives under `cpp/vins_mono/` per `module-layout.md` rule #4. The chosen upstream MUST be ROS-free at the source level (either upstream port or in-tree ROS-strip).
- The strategy MUST consume IMU via the AZ-276 `ImuPreintegrator` helper; constructing a second IMU integration path is forbidden.
- This task introduces no new third-party dependencies beyond VINS-Mono + pybind11 + Eigen + Ceres (the Ceres dependency is unique to VINS-Mono among the three strategies; it is pinned via `cpp/vins_mono/CMakeLists.txt` and excluded from airborne / operator-tooling / replay-cli builds because `BUILD_VINS_MONO=OFF` for those binaries).
- Per-frame DEBUG logging defaults OFF.
- The strategy MUST NOT apply a covariance floor or smoother on the read path.
- VINS-Mono is exempt from the AC-2.2 MRE bound per the C1 component's `tests.md`; the test task in Step 9 / E-BBT will configure C1-IT-04 to exclude VINS-Mono.
## Risks & Mitigation
**Risk 1: VINS-Mono upstream ships as a ROS package and ROS deps leak into the research binary**
- *Risk*: A naively vendored VINS-Mono pulls in `roscpp`, `rosbag`, etc., bloating the research binary and complicating its build.
- *Mitigation*: The chosen upstream pin is a de-ROSified community port (or in-tree ROS-strip applied during the CMake build under `cpp/vins_mono/`). If a clean port does not exist at Plan-phase pin time, this task's Plan-phase decision records the chosen approach; CI's research-binary SBOM step asserts no ROS package leaks.
**Risk 2: Ceres + Eigen ABI conflict with OKVIS2's Eigen pin**
- *Risk*: VINS-Mono uses Ceres for nonlinear optimisation, and Ceres is itself built on Eigen; OKVIS2 also uses Eigen heavily. If the two builds link different Eigen versions inside the same binary, the ABI mismatch can corrupt data silently.
- *Mitigation*: Both `cpp/okvis2/CMakeLists.txt` and `cpp/vins_mono/CMakeLists.txt` link the same Eigen pin from `cpp/_third_party/eigen/`. The research binary's CMake build is the only place both load simultaneously; CI's research build asserts the linked Eigen version with `ldd`-style introspection.
**Risk 3: `BUILD_VINS_MONO=ON` accidentally enabled in a deployment binary**
- *Risk*: A misconfigured build flag could ship VINS-Mono to a deployed Jetson, inflating the binary size and adding attack surface.
- *Mitigation*: `module-layout.md` Build-Time Exclusion Map locks `BUILD_VINS_MONO=OFF` for airborne / operator-tooling / replay-cli; CI's per-binary SBOM diff (`ci/sbom_diff.py`) fails if `vins_mono` appears in any non-research SBOM. The composition root validator additionally raises `ConfigurationError` at startup if `config.vio.strategy="vins_mono"` is requested in the airborne binary.
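The startup check mentioned in the Risk 3 mitigation could be as small as the sketch below. `ConfigurationError` here is a local stand-in, the function name is hypothetical, and rejecting `vins_mono` in every non-research binary (rather than only airborne) is an assumption that goes slightly beyond what the mitigation states.

```python
RESEARCH_ONLY_STRATEGIES = frozenset({"vins_mono"})
RESEARCH_BINARY = "research"


class ConfigurationError(ValueError):
    """Stand-in for the composition root's configuration error."""


def validate_strategy_for_binary(strategy: str, binary: str) -> None:
    """Reject research-only strategies outside the research binary."""
    if strategy in RESEARCH_ONLY_STRATEGIES and binary != RESEARCH_BINARY:
        raise ConfigurationError(
            f"config.vio.strategy={strategy!r} is research-only "
            f"and cannot be used in the {binary!r} binary"
        )
```

Running this before the factory is called gives a clear startup failure even if a stray native library were present on the target.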
**Risk 4: VINS-Mono's loosely-coupled covariance is over-confident vs OKVIS2's tightly-coupled**
- *Risk*: The IT-12 comparative study could mislead an architect into picking VINS-Mono if its covariance under-reports.
- *Mitigation*: AC-9 honest-covariance enforcement applies to VINS-Mono too; the IT-12 report (Step 9 / E-BBT) compares calibrated covariances side-by-side; D-C5-5 captures the architectural decision that production stays on OKVIS2.
## Runtime Completeness
- **Named capability**: VINS-Mono loosely-coupled VIO + sliding-window optimisation + marginalised information matrix → 6×6 covariance (architecture / E-C1 / `solution.md` "research-only IT-12 comparative-study" / D-C5-3 sliding window context).
- **Production code that must exist**: real `VinsMonoStrategy` class implementing the AZ-331 Protocol; real pybind11 binding to `cpp/vins_mono/` (real VINS-Mono upstream, de-ROSified); real per-frame estimator update; real covariance read from VINS-Mono's marginalised information matrix; real bias propagation through AZ-276.
- **Allowed external stubs**: tests MAY use a fake pybind11 binding that returns scripted `VioOutput` payloads (AC-3 / AC-6 / AC-7); production wiring (research binary only) uses the real VINS-Mono upstream.
- **Unacceptable substitutes**: a pure-Python VINS-Mono re-implementation (would defeat the whole point of comparative study); skipping the AZ-276 `ImuPreintegrator` (would break the single-IMU-truth invariant); a covariance floor on the read path; shipping VINS-Mono in a deployment binary by default.