# C1 OKVIS2 Strategy: Production-Default VIO

**Task**: AZ-332_c1_okvis2_strategy
**Name**: C1 OKVIS2 Strategy
**Description**: Implement `Okvis2Strategy`, the production-default `VioStrategy` for E-C1. The class is a Python facade over the OKVIS2 C++ tightly-coupled keyframe-based VIO core (sliding window of K=10–20 keyframes per D-C5-3) accessed via a pybind11 wrapper around `cpp/okvis2/`. The strategy owns the per-flight OKVIS2 estimator instance, feeds it nav-camera frames + IMU samples (via the AZ-276 `ImuPreintegrator` helper for the GTSAM `CombinedImuFactor` substrate that C5 also reads), and emits `VioOutput` with honest 6×6 covariance per AC-1.4 and per-frame `VioHealth`. Per `_docs/02_document/components/01_c1_vio/description.md` § 5: per-frame cost is dominated by feature extraction + matching, and sliding-window optimisation is `O(F·log K)`; per-frame p95 latency must stay ≤ 80 ms on Tier-2 with the C2 backbone running concurrently (C1-PT-01). Build-time gated by `BUILD_OKVIS2`.
**Complexity**: 5 points
**Dependencies**: AZ-331_c1_vio_strategy_protocol, AZ-263_initial_structure, AZ-269_config_loader, AZ-266_log_module, AZ-276_imu_preintegrator, AZ-277_se3_utils, AZ-272_fdr_record_schema, AZ-273_fdr_client_ringbuf
**Component**: c1_vio (epic AZ-254 / E-C1)
**Tracker**: AZ-332
**Epic**: AZ-254 (E-C1)

### Document Dependencies

- `_docs/02_document/contracts/c1_vio/vio_strategy_protocol.md`: the Protocol this task implements; produced by AZ-331.
- `_docs/02_document/contracts/shared_helpers/imu_preintegrator.md`: IMU substrate (AZ-276); consumer of the GTSAM `CombinedImuFactor` per-keyframe.
- `_docs/02_document/contracts/shared_helpers/se3_utils.md`: SE(3) ↔ pose-matrix conversion utilities (AZ-277).
- `_docs/02_document/components/01_c1_vio/description.md`: § 5 implementation details + § 6 helpers + § 7 caveats (Okvis2 latency spike behaviour under thermal throttle).
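The `BUILD_OKVIS2` gate in the description implies that the native binding should only be imported when the strategy is actually requested. The following is a minimal sketch of that lazy-import pattern, not the project's implementation: `load_native_binding` is a hypothetical helper, and the local `StrategyNotAvailableError` is a stand-in for the AZ-331 error whose constructor shape is assumed from the way this document quotes it.

```python
import importlib
from types import ModuleType


class StrategyNotAvailableError(RuntimeError):
    """Stand-in for the AZ-331 factory error (constructor shape is an assumption)."""

    def __init__(self, strategy: str, missing_flag: str) -> None:
        super().__init__(f"strategy {strategy!r} requires build flag {missing_flag}")
        self.strategy = strategy
        self.missing_flag = missing_flag


def load_native_binding(module_name: str, *, strategy: str, build_flag: str) -> ModuleType:
    """Import a native extension on first use instead of at package import time.

    `module_name` is whatever dotted path the build produces; it is a parameter
    here because the real import path is owned by the CMake target.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Keep the original ImportError reachable via __cause__ for diagnostics.
        raise StrategyNotAvailableError(strategy, missing_flag=build_flag) from exc
```

Calling the helper from inside the strategy constructor (rather than at module top level) keeps a build without the native library importable; the error only surfaces when the gated strategy is requested.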
## Problem

Without a production-default `Okvis2Strategy`:

- The default airborne binary cannot operate: only the KLT/RANSAC simple-baseline (mandatory engine-rule path) would be available, and the C1-PT-01 / AC-2.2 frame-to-frame MRE bounds were specified against OKVIS2.
- The honest 6×6 covariance contract (AC-1.4 / AC-NEW-4) loses its production producer; KLT/RANSAC's covariance is a documented degraded fallback, not the primary signal C5's iSAM2 graph fuses.
- D-CROSS-LATENCY-1's hybrid covariance auto-degrade decision in C4 has no `VioHealth` source-of-truth at production-quality numbers.
- The architecture's "tightly-coupled VIO with sliding-window optimisation" claim becomes documentation-only.
- Mode-B FT-P-04 / FT-P-05 suite-level scenarios cannot run against the production stack; FT-P-04 expects ≥ 95 % tracked-frame ratio on the Derkachi normal segment.

This task delivers the canonical production VIO. The other two strategies (VINS-Mono research-only, KLT/RANSAC simple-baseline) are separate tasks; the contract task (AZ-331) defines the boundary all three share.

## Outcome

- An `Okvis2Strategy` class at `src/gps_denied_onboard/components/c1_vio/okvis2.py` conforming to the `VioStrategy` Protocol from AZ-331; `current_strategy_label() == "okvis2"`.
- A pybind11 wrapper at `src/gps_denied_onboard/components/c1_vio/_native/okvis2_binding.cpp` exposing the OKVIS2 C++ estimator (`okvis::ThreadedKFVio` or equivalent in the pinned upstream HEAD) to Python. The wrapper is built by CMake under `cpp/okvis2/` (build-time gated by `BUILD_OKVIS2`); the resulting `.so` is imported lazily inside `okvis2.py`.
- Constructor `__init__(self, *, calibration: CameraCalibration, preintegrator: ImuPreintegrator, fdr_client: FdrClient, logger: Logger, config: Okvis2Config)`, with all dependencies constructor-injected per ADR-009. `Okvis2Config` (`@dataclass(frozen=True)`) carries the OKVIS2-specific knobs (sliding-window size K ∈ [10, 20], keyframe-decision parallax threshold, RANSAC inlier ratio, max optimisation iterations) loaded from `config.vio.okvis2.*` via AZ-269.
- `process_frame(frame, imu, calibration) -> VioOutput`:
  1. Append IMU samples to the injected `ImuPreintegrator` (guarded for strictly monotonic timestamps; `ImuPreintegrationError` is rewrapped as `VioFatalError`).
  2. Feed the nav-camera frame to OKVIS2 via the pybind11 `add_frame` wrapper.
  3. If OKVIS2 emits a new estimator update, extract the relative pose (SE(3) via `helpers.se3_utils`), the 6×6 covariance from OKVIS2's internal Hessian (or marginalised block per upstream API), the latest IMU bias, and the feature-quality summary (tracked / new / lost / mean parallax / per-frame MRE).
  4. Build and return `VioOutput` with `frame_id` echoed.
  5. Emit a per-frame DEBUG log (off by default) with backbone identity + elapsed milliseconds; emit a WARN log when degraded covariance is detected (per the `health_snapshot` heuristic); emit an ERROR log on `VioFatalError`.
- `reset_to_warm_start(hint)`: tears down the current OKVIS2 estimator instance (releases C++ resources), constructs a fresh estimator, seeds the IMU bias from `hint.bias`, seeds the initial body-to-world pose from `hint.body_T_world`, and seeds the velocity from `hint.velocity_b`. The next `config.vio.warm_start_max_frames` frames are allowed to converge before the strategy reports `state == TRACKING` (AC-5.1). `reset_to_warm_start` is idempotent across consecutive calls (the second call re-resets cleanly).
- `health_snapshot()` returns `VioHealth(state, consecutive_lost, bias_norm)` derived from OKVIS2's internal tracker state: `INIT` until enough keyframes are accumulated, `TRACKING` while the optimisation converges, `DEGRADED` when feature count drops below `config.vio.okvis2.degraded_feature_threshold` or covariance Frobenius norm exceeds 2× steady-state, `LOST` after `config.vio.lost_frame_threshold` consecutive frames without a successful update.
- The honest-covariance invariant (Protocol Invariant) is enforced behaviourally: the strategy MUST NOT shrink the reported covariance during a `DEGRADED` window (the OKVIS2 estimator's covariance is read directly; no smoothing or floor is applied that would mask degradation).
- Error envelope is closed: every OKVIS2 / pybind11 / Eigen exception is caught inside `process_frame` / `reset_to_warm_start` and rewrapped into the `VioError` family (`VioInitializingError` while INIT, `VioFatalError` on backend-init failure or sustained LOST).
- All FDR records emitted via the injected `FdrClient` use the `kind="vio.health"` schema from AZ-272; per-frame DEBUG goes to stdout/journald only (per description.md § 9 logging strategy).

## Scope

### Included

- `Okvis2Strategy` class implementation + the `Okvis2Config` dataclass + the `_native/okvis2_binding.cpp` pybind11 wrapper.
- CMake target under `cpp/okvis2/` that links the OKVIS2 upstream pin (BSD-3-Clause) and produces the binding `.so`. Build flag `BUILD_OKVIS2`.
- The full `process_frame` / `reset_to_warm_start` / `health_snapshot` / `current_strategy_label` surface conforming to AZ-331's Protocol.
- IMU substrate via the constructor-injected `ImuPreintegrator` (AZ-276); this strategy never imports GTSAM directly.
- Honest-covariance reading from OKVIS2's internal estimator state (no client-side smoothing).
- Lazy import of the `_native` binding inside `okvis2.py` so a Tier-0 build with `BUILD_OKVIS2=OFF` does not force the OKVIS2 native lib to be present.
- Per-frame DEBUG log gated by `config.vio.per_frame_debug_log` (default OFF).
- WARN / ERROR / INFO logging per description.md § 9.
- Health-state transitions emitted as FDR records via the `kind="vio.health"` schema.
- Composition-root wiring (entry to the AZ-331 `build_vio_strategy` factory's `okvis2` branch).
- Standalone microbench script `python -m gps_denied_onboard.components.c1_vio.bench.okvis2` for C1-PT-01 latency measurements (referenced by Step 9 / E-BBT perf tests; only the benchable surface is implemented here, not the test itself).

### Excluded

- VINS-Mono strategy: separate task in this epic.
- KLT/RANSAC simple-baseline strategy: separate task in this epic.
- Warm-start hint persistence (write at takeoff, read at F8 reboot): separate task in this epic; this strategy only consumes a constructed `WarmStartPose`.
- C5 fusion of `VioOutput`: owned by E-C5 (AZ-260).
- C13 FDR writer-thread / segment rotation: owned by E-C13 (AZ-248); this strategy only emits via the producer-side `FdrClient`.
- IMU preintegration mathematics: owned by AZ-276.
- The C1-IT-01..06 / C1-PT-01 tests themselves: deferred to Step 9 (E-BBT) per greenfield flow Step 6 rule.
- Honest-covariance contract test that sweeps all three strategies: that is a Step 9 / E-BBT cross-strategy test (epic child issue #7), not part of this single-strategy task.
- OKVIS2 upstream-source modifications: upstream HEAD is pinned per Plan-phase; deviations require an explicit ADR.
- Multi-camera OKVIS2: out of scope (single nav-camera per RESTRICT-UAV-3).
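The benchable surface listed under Included has to report p50, p95, and throughput for C1-PT-01. A minimal sketch of how such a harness could compute those numbers is shown here; `percentile` and `bench` are hypothetical helpers, the nearest-rank percentile definition is an assumption (the document does not say which estimator C1-PT-01 uses), and `step` stands for one `process_frame` call on a fixture frame.

```python
import time
from typing import Callable, Dict, List, Sequence


def percentile(samples: Sequence[float], q: int) -> float:
    """Nearest-rank percentile for an integer q in 1..100 (assumed definition)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer ceiling of q * n / 100, clamped to at least rank 1.
    rank = max(1, -(-q * len(ordered) // 100))
    return ordered[rank - 1]


def bench(step: Callable[[], object], iterations: int) -> Dict[str, float]:
    """Time `step` back to back and summarise per-call latency in milliseconds."""
    latencies_ms: List[float] = []
    start = time.perf_counter()
    for _ in range(iterations):
        t0 = time.perf_counter()
        step()
        latencies_ms.append((time.perf_counter() - t0) * 1e3)
    wall_s = max(time.perf_counter() - start, 1e-9)  # guard against a zero interval
    return {
        "p50_ms": percentile(latencies_ms, 50),
        "p95_ms": percentile(latencies_ms, 95),
        # Back-to-back call rate; not the same thing as the camera frame rate.
        "throughput_hz": iterations / wall_s,
    }
```

Because `step` is an opaque callable, the same harness can wrap a fake binding in unit tests and the real strategy on Tier-2 hardware.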
## Acceptance Criteria

**AC-1: `current_strategy_label()` returns `"okvis2"`**
Given an `Okvis2Strategy` constructed via the AZ-331 factory with `config.vio.strategy = "okvis2"`
When `current_strategy_label()` is called
Then the returned string is exactly `"okvis2"`

**AC-2: `process_frame` returns `VioOutput` with `frame_id` echoed**
Given a `NavCameraFrame` with `frame_id = "uuid-abc"` and a populated `ImuWindow`
When `process_frame(frame, imu, calibration)` is called and reaches a successful estimator update
Then the returned `VioOutput.frame_id == "uuid-abc"`; `pose_covariance_6x6` is symmetric and positive-definite; `imu_bias` is non-`None`

**AC-3: `process_frame` rewraps every backend exception into `VioError`**
Given a malformed input that triggers an OKVIS2 / pybind11 / Eigen exception inside the backend
When `process_frame` is called
Then the raised exception is one of `VioInitializingError` / `VioDegradedError` / `VioFatalError`; the original exception is chained via `raise ... from`; no raw `RuntimeError` / `ValueError` from the backend leaks to the caller

**AC-4: `reset_to_warm_start` clears state and seeds the hint**
Given a strategy with N processed frames and a non-default IMU bias
When `reset_to_warm_start(hint)` is called with a known `hint.bias` and `hint.body_T_world`
Then the next `process_frame` call's `VioOutput.imu_bias` reflects `hint.bias` (within numerical tolerance) and the resulting `relative_pose_T` is consistent with starting from `hint.body_T_world`; calling `reset_to_warm_start` a second time without intervening frames does not raise

**AC-5: `health_snapshot()` reports `INIT` initially and `TRACKING` after warm-up**
Given a freshly-constructed strategy
When `health_snapshot()` is called before any `process_frame` invocation
Then `state == INIT`; after `config.vio.warm_start_max_frames` (default 5) successful `process_frame` calls on a normal-segment fixture, the next `health_snapshot()` returns `state == TRACKING`

**AC-6: `health_snapshot()` reports `DEGRADED` on feature loss**
Given a strategy in TRACKING state
When `process_frame` is fed a frame with feature count below `config.vio.okvis2.degraded_feature_threshold`
Then the returned `VioOutput.pose_covariance_6x6` Frobenius norm is strictly greater than the prior frame's; the next `health_snapshot()` returns `state == DEGRADED`; the strategy MUST emit a `VioOutput` (not raise) so C5 can down-weight rather than fall back

**AC-7: Sustained loss raises `VioFatalError`**
Given a strategy in DEGRADED state
When `config.vio.lost_frame_threshold` (default 9) consecutive frames fail to update the estimator
Then the next `process_frame` call raises `VioFatalError`; subsequent `health_snapshot()` returns `state == LOST`; the AC-5.2 fallback path (FC IMU-only after 3 s) is the consumer's responsibility

**AC-8: `BUILD_OKVIS2=OFF` does not import OKVIS2 native libs**
Given the binary is built with `BUILD_OKVIS2=OFF`
When `gps_denied_onboard.components.c1_vio` is imported (NOT the `okvis2` submodule directly)
Then `sys.modules` does NOT contain `gps_denied_onboard.components.c1_vio.okvis2` or any `_native.okvis2_binding` entry; AZ-331's factory raises `StrategyNotAvailableError("okvis2", missing_flag="BUILD_OKVIS2")` if `okvis2` is requested

**AC-9: Honest covariance: no shrinkage during DEGRADED**
Given a controlled-degradation 60 s synthetic input (same source as the deferred C1-IT-01 test fixture)
When `process_frame` runs through the degradation event
Then `||pose_covariance_6x6||_F` is monotonically non-decreasing from the moment `health_snapshot().state` first transitions to `DEGRADED` until either `TRACKING` is restored or `LOST` is reached; this is enforced by reading OKVIS2's internal covariance directly without any client-side floor or smoother

**AC-10: FDR `vio.health` records emitted on every state transition**
Given the strategy is configured with a real `FdrClient` (or test double)
When `health_snapshot().state` transitions (`INIT → TRACKING`, `TRACKING → DEGRADED`, `DEGRADED → LOST`, etc.)
Then exactly one FDR record with `kind="vio.health"` and the new state is emitted via the `FdrClient.emit` API; no records are emitted on steady-state frames

## Non-Functional Requirements

**Performance**

- `process_frame` p95 ≤ 80 ms on Tier-2 with C2 backbone running concurrently (C1-PT-01 / NFT-PERF-01 component partition); failure threshold 120 ms.
- `process_frame` p50 ≤ 25 ms on Tier-2 (description.md C1-PT-01).
- Throughput ≥ 3 Hz sustained; failure threshold < 2.5 Hz.
- CPU ≤ 30 % of one core; memory ≤ 1.5 GB resident (description.md § 6 + epic NFR).

**Compatibility**

- OKVIS2 upstream HEAD pinned per Plan-phase. No upstream-source modifications.
- pybind11 version matches the OKVIS2 / VINS-Mono / GTSAM build (description.md § 5 dependency table).
- Eigen version matches OKVIS2 / GTSAM pin.

**Reliability**

- The error envelope is closed at the `VioError` family.
No raw OKVIS2 / pybind11 / Eigen exceptions cross the Python boundary.
- `process_frame` is idempotent w.r.t. state when it raises: a raised exception leaves the estimator in a recoverable state; the next valid frame integrates as if the bad one never came.
- The strategy is single-threaded by Protocol contract; the composition root binds one instance to the camera ingest thread.

**Concurrency**

- One `Okvis2Strategy` instance per camera ingest thread; concurrent calls to `process_frame` on the same instance are undefined behaviour (matches Protocol invariant).
- The injected `ImuPreintegrator` is also single-threaded; the same composition-root binding rule applies.

## Unit Tests

| AC Ref | What to Test | Required Outcome |
|--------|-------------|-----------------|
| AC-1 | `current_strategy_label()` after factory build with `okvis2` config | Returns `"okvis2"` |
| AC-2 | `process_frame` with a fixture frame + IMU window | `VioOutput.frame_id` echoed; covariance SPD; `imu_bias` non-None |
| AC-3 | Inject a malformed frame that triggers a backend exception (mocked binding) | `VioError`-family exception raised; original chained via `__cause__` |
| AC-4 | `reset_to_warm_start` then `process_frame` × N | Bias reflects hint; second `reset_to_warm_start` does not raise |
| AC-5 | Cold construct → `health_snapshot` × N | `INIT` initially; `TRACKING` after `warm_start_max_frames` |
| AC-6 | Feed degraded fixture | Covariance Frobenius norm strictly increases; `health_snapshot` returns `DEGRADED`; `VioOutput` IS emitted (not raised) |
| AC-7 | Feed `lost_frame_threshold` consecutive failed frames | `VioFatalError` on the next `process_frame`; `health_snapshot` returns `LOST` |
| AC-8 | `BUILD_OKVIS2=OFF` import + factory call | Module not in `sys.modules`; factory raises `StrategyNotAvailableError` |
| AC-9 | 60 s controlled-degradation synthetic | Covariance Frobenius norm monotonically non-decreasing during DEGRADED window |
| AC-10 | Real / fake `FdrClient` spy through state transitions | Exactly one `vio.health` record per transition; no spam on steady-state |
| NFR-perf | C1-PT-01 microbench against the Derkachi normal segment fixture (Tier-2) | p95 ≤ 80 ms; p50 ≤ 25 ms; throughput ≥ 3 Hz |
| NFR-reliability-error-envelope | Raise each backend exception type via mock binding; assert no leakage | All caught and rewrapped to `VioError` family |

## Constraints

- This task implements (does NOT define) the AZ-331 Protocol; any signature mismatch is a Spec-Gap finding (High) per code-review skill Phase 2.
- The pybind11 binding lives under `_native/` per `module-layout.md`; the `.so` import path is CMake-known and lazy-imported inside `okvis2.py`.
- OKVIS2 native source lives under `cpp/okvis2/` (parallel to `src/`, NOT nested inside the Python package), per `module-layout.md` rule #4.
- The strategy MUST consume IMU via the AZ-276 `ImuPreintegrator` helper; constructing a second IMU integration path is forbidden (defeats the "single source of IMU truth" invariant).
- This task introduces no new third-party dependencies beyond OKVIS2 + pybind11 + Eigen (already pinned).
- Per-frame DEBUG logging defaults OFF (would flood at 3 Hz); enabled only via `config.vio.per_frame_debug_log`.
- The strategy MUST NOT apply a covariance floor or smoother on the read path: honest covariance is the safety floor for AC-NEW-4; smoothing is a Risks-and-Mitigation discussion only.
- The `Okvis2Config` schema extension to AZ-269 is owned by this task; the field set is documented above.

## Risks & Mitigation

**Risk 1: OKVIS2 latency spike on thermally-throttled Jetson breaks AC-4.1**

- *Risk*: description.md § 7 notes OKVIS2's sliding-window optimisation can spike to 80–120 ms on a thermally-throttled Jetson; the C1-PT-01 p95 ≤ 80 ms budget is the wire boundary.
- *Mitigation*: D-CROSS-LATENCY-1 hybrid auto-degrades **C4** covariance recovery (not C1) under thermal throttle, freeing budget. This task does NOT implement thermal-aware behaviour; it only measures and reports latency, and the C4 task owns the degradation decision. AC-9 covers the honest-covariance side; the NFR-perf microbench measures the latency.

**Risk 2: pybind11 type marshalling overhead dominates the per-frame budget**

- *Risk*: Marshalling a 5472×3648×3 uint8 frame across the Python ↔ C++ boundary on every `process_frame` could add tens of milliseconds.
- *Mitigation*: The pybind11 binding accepts the frame as a `numpy.ndarray` with `py::array::c_style | py::array::forcecast`, so the data buffer is shared when the input is already C-contiguous with a matching dtype; any other input is silently converted, which means a copy. The composition root binds the camera ingest path to emit `c_style` buffers (handled in `frame_source/LiveCameraFrameSource`, AZ-265 cycle-1 deliverable). If the zero-copy path is broken, the NFR-perf microbench shows it immediately.

**Risk 3: OKVIS2 internal covariance is reported in a frame-convention C5 does not expect**

- *Risk*: OKVIS2 reports covariance in its own body-frame; C5 expects body-to-world. A frame-convention bug would silently produce wrong covariance to iSAM2.
- *Mitigation*: The strategy uses `helpers.se3_utils` (AZ-277) to convert OKVIS2's frame to the canonical body-to-world convention; the conversion is unit-tested at the helper level and asserted by AC-2 (covariance SPD) + the deferred C1-IT-02 (cross-strategy invariants test).

**Risk 4: OKVIS2 BSD-3-Clause license attribution missed**

- *Risk*: Failing to include OKVIS2's license notice in the airborne binary's NOTICE file violates BSD-3-Clause.
- *Mitigation*: The CMake target under `cpp/okvis2/` includes the upstream LICENSE file in the build artifact's NOTICE bundle; CI's SBOM step (existing infra) verifies presence. Tracked in the project NOTICE generation pipeline (out of scope here).
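For Risk 2, the zero-copy precondition can be checked on the Python side before a frame reaches the binding. The sketch below assumes the binding expects a C-contiguous `uint8` array, as the `c_style` flag in the mitigation suggests; `is_zero_copy_candidate` and `as_binding_input` are hypothetical helper names, not part of the binding.

```python
import numpy as np


def is_zero_copy_candidate(frame: np.ndarray) -> bool:
    """True when the buffer can be shared as-is: uint8 samples, C-contiguous layout."""
    return frame.dtype == np.uint8 and bool(frame.flags.c_contiguous)


def as_binding_input(frame: np.ndarray) -> np.ndarray:
    """Return a C-contiguous uint8 array, copying only when the input does not conform.

    np.ascontiguousarray hands back the same memory for conforming input and
    allocates a fresh buffer otherwise, which mirrors the copy a forcecast
    conversion would otherwise make inside the binding.
    """
    return np.ascontiguousarray(frame, dtype=np.uint8)
```

A frame source could assert `is_zero_copy_candidate(frame)` in debug builds so that a strided view or a dtype mismatch is caught at the producer instead of showing up later as a latency regression.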
## Runtime Completeness

- **Named capability**: OKVIS2 tightly-coupled keyframe-based VIO + sliding-window optimisation + honest 6×6 covariance via OKVIS2's internal Hessian (architecture / E-C1 / `solution.md` "Strategy: Okvis2 production-default" / D-C5-3).
- **Production code that must exist**: real `Okvis2Strategy` class implementing the AZ-331 Protocol; real pybind11 binding to `cpp/okvis2/` (real OKVIS2 upstream, not a mock); real per-frame OKVIS2 estimator update; real covariance read from OKVIS2's internal Hessian; real bias propagation through the AZ-276 `ImuPreintegrator`.
- **Allowed external stubs**: tests MAY use a fake pybind11 binding that returns scripted `VioOutput` payloads (AC-3 / AC-6 / AC-7 use this for backend-exception injection); production wiring uses the real OKVIS2 upstream pinned by Plan-phase.
- **Unacceptable substitutes**:
  - a Python-level "OKVIS2" wrapper that re-implements the optimisation loop in pure Python (would defeat C1-PT-01 ≤ 80 ms p95);
  - a covariance floor or smoother on the read path (would break the AC-9 honest-covariance contract);
  - skipping the AZ-276 `ImuPreintegrator` and integrating IMU samples internally (would break the single-IMU-truth invariant);
  - using a pre-built deterministic-fallback `VioOutput` while OKVIS2 is "compiled out" (would silently break C5 fusion at deploy time without the BUILD-flag gate firing first).
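The honest-covariance rule that a floor or smoother would violate can be stated as a small check over a recorded run. This is a sketch under assumptions: states are represented as plain strings, covariances as nested sequences, and `first_shrinkage` is a hypothetical test helper, not part of the strategy.

```python
import math
from typing import Optional, Sequence

Matrix = Sequence[Sequence[float]]


def frobenius_norm(matrix: Matrix) -> float:
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(v * v for row in matrix for v in row))


def first_shrinkage(
    states: Sequence[str], covariances: Sequence[Matrix], tol: float = 0.0
) -> Optional[int]:
    """Index of the first frame whose covariance norm drops inside a DEGRADED run.

    Returns None when every DEGRADED run is monotonically non-decreasing.
    A run ends as soon as the state is anything other than "DEGRADED".
    """
    prev_norm: Optional[float] = None
    for i, (state, cov) in enumerate(zip(states, covariances)):
        if state != "DEGRADED":
            prev_norm = None  # window closed (TRACKING restored or LOST reached)
            continue
        norm = frobenius_norm(cov)
        if prev_norm is not None and norm < prev_norm - tol:
            return i
        prev_norm = norm
    return None
```

A non-zero `tol` may be needed in practice to absorb floating-point noise in the norm; the document does not specify one, so the default here is strict.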