# AZ-332: Implementation plan (batch 23, cycle 1)

**Date created**: 2026-05-12 (carry-over from `/autodev` session 2026-05-12 morning)

**Owner**: next `/autodev` invocation starting from Step 7 Implement sub_step `compute-next-batch`

**Scope of this doc**: a concrete, in-order playbook for the next session. Reading this plus the task spec at `_docs/02_tasks/todo/AZ-332_c1_okvis2_strategy.md` is sufficient to resume; no other re-discovery is needed.

---

## Why this is its own plan doc

AZ-332 (C1 OKVIS2 production-default VIO) is the first task in this project to require a native C++ build chain (OKVIS2 + pybind11). The previous session researched paths, surfaced blockers, and landed on a decomposition that splits work across three build environments. That decomposition has to survive the session boundary, hence this file.

## Decisions locked in the previous session

1. **No ROS 2 layer.** A `colcon` build of OKVIS2 produces the same libraries as standalone CMake plus a ROS 2 node we do not need; ROS 2 runtime IPC was rejected at Plan time (`_docs/01_solution/solution.md` § D-C1-1-SUB-A: "Rejected (cost + latency budget conflict)"). Build with **standalone CMake**.
2. **No Python re-implementation of OKVIS2.** Forbidden by the task spec ("Unacceptable substitutes" section). Pure-Python VIO violates the C1-PT-01 ≤ 80 ms p95 budget by construction.
3. **No alternative VIO substitution.** Every C++ VIO candidate (OpenVINS, VINS-Mono, Kimera-VIO) has the same compile-on-macOS problem. The only Python-native candidates (DPVO, KLT+RANSAC) are mono-VO only, so they are not drop-ins for a VIO contract. AZ-332 stays OKVIS2.
4. **Three-environment dev split**:

   | Environment | What runs there | What it gates |
   |---|---|---|
   | macOS dev | Python facade + binding C++ editing; unit tests using the fake `_native.okvis2_binding` (task spec explicitly allows this for tests) | AC-1, AC-2, AC-3, AC-4, AC-5, AC-6, AC-7, AC-8, AC-10 |
   | Ubuntu CI runner (`ci.yml`) | Native CMake build of vendored OKVIS2 + binding `.so` | Build-passes gate; no AC validation here |
   | Self-hosted Jetson runner (`ci-tier2.yml`) | Real-OKVIS2 perf + honest-covariance tests | AC-9 (honest covariance monotonicity); NFR-perf p95 ≤ 80 ms |

This split honours the task spec ("real `Okvis2Strategy` calling real C7 `InferenceRuntime` with real TRT-compiled DISK engine") because the production binary IS the real binding compiled on Linux/Jetson; only the dev-side unit tests use the fake. The fake never ships to production.

## Concrete step-by-step for next session (in order; each step has a stop-and-verify gate)

### Step 0: re-entry sanity check (1 min)

- Read `_docs/_autodev_state.md`: confirm step 7 / sub_step `compute-next-batch` / detail points here.
- Read this doc fully.
- Read `_docs/02_tasks/todo/AZ-332_c1_okvis2_strategy.md` once.
- `git status --porcelain` must be empty (implement skill prerequisite).

### Step 1: vendor OKVIS2 and pybind11 as git submodules (5–10 min)

- `git submodule add --depth 1 https://github.com/smartroboticslab/okvis2.git cpp/okvis2/upstream`, followed by `git submodule update --init --recursive cpp/okvis2/upstream` to fetch OKVIS2's nested submodules (`--recurse-submodules` is a `git clone` flag; `git submodule add` does not accept it).
  - Note: submodule path is `cpp/okvis2/upstream/` (not `cpp/okvis2/` directly) so the existing `cpp/okvis2/CMakeLists.txt` keeps its project-owned role and `add_subdirectory(upstream)` pulls in OKVIS2.
- `git submodule add --depth 1 https://github.com/pybind/pybind11.git cpp/pybind11/upstream`
  - Same pattern: existing `cpp/pybind11/` directory keeps the project README; submodule lives at `cpp/pybind11/upstream/`.
- Delete the `.gitkeep` and placeholder `README.md` from `cpp/pybind11/` once the submodule is in place (or keep them; they're harmless either way, so pick one and stay consistent).
- Pin a known-good commit hash for OKVIS2 (record it in this doc under "Pinned upstream versions" once chosen). Recommendation: pin to the latest `main` HEAD at the time of submodule add and document the commit short-hash here.
- **Gate**: `git submodule status` shows both submodules with a SHA; `git status` clean except `.gitmodules` + submodule entries.

### Step 2: write CMake glue (15–30 min)

Files to write:

- `cpp/okvis2/CMakeLists.txt` (replace existing placeholder):
  - `if(NOT BUILD_OKVIS2) return() endif()`
  - `add_subdirectory(upstream EXCLUDE_FROM_ALL)` with OKVIS2's `USE_NN=OFF` to drop the LibTorch dep (per Fact #39: keyframe arch tolerates this).
  - `find_package` the native deps OKVIS2 needs (Eigen3, Boost, glog, gflags, SuiteSparse, Ceres, OpenCV; every one is an apt package on Ubuntu and a brew formula on macOS).
  - `add_subdirectory(${CMAKE_SOURCE_DIR}/cpp/pybind11/upstream pybind11_build)`.
  - `pybind11_add_module(okvis2_binding ${CMAKE_CURRENT_SOURCE_DIR}/../../src/gps_denied_onboard/components/c1_vio/_native/okvis2_binding.cpp)` (note the path back to the Python tree).
  - `target_link_libraries(okvis2_binding PRIVATE okvis::Estimator okvis::Common ...)` (exact target names come from OKVIS2's CMake exports; verify by running `cmake --build build --target help | grep okvis` once the submodule is in).
  - `install(TARGETS okvis2_binding DESTINATION ${CMAKE_INSTALL_LIBDIR}/gps_denied_onboard/components/c1_vio/_native/)`.
- `cpp/pybind11/CMakeLists.txt` (replace existing placeholder): can stay nearly empty, because pybind11 is included by `cpp/okvis2/CMakeLists.txt` via `add_subdirectory`.

The existing top-level `cpp/CMakeLists.txt` already has `add_subdirectory(okvis2)` gated on `BUILD_OKVIS2 OR BUILD_VINS_MONO OR BUILD_KLT_RANSAC`, so no change is needed there.

**Gate**: `cmake -S . -B build -DBUILD_OKVIS2=OFF` succeeds on macOS (no-op build with the flag off). The OFF path is what protects the rest of the build from any of this new wiring.

### Step 3: write the pybind11 binding C++ skeleton (1–2 h)

File: `src/gps_denied_onboard/components/c1_vio/_native/okvis2_binding.cpp`

Surface needed (mirrors the Python facade's needs, not the full OKVIS2 API):

- `Okvis2Backend` class with:
  - ctor from YAML config string + camera intrinsics dict;
  - `add_frame(frame_id: str, ns_ts: int, image: ndarray[uint8, H, W, C]) -> bool`;
  - `add_imu(ns_ts: int, accel: ndarray[float64, 3], gyro: ndarray[float64, 3]) -> None`;
  - `get_latest_output() -> dict | None` (returns frame_id + 4x4 pose matrix + 6x6 covariance + bias + feature_quality dict + emitted_at_ns);
  - `reset(body_T_world: ndarray[float64, 4, 4], velocity: ndarray[float64, 3], accel_bias: ndarray[float64, 3], gyro_bias: ndarray[float64, 3]) -> None`;
  - `health() -> dict` (returns `{state: str, consecutive_lost: int, bias_norm: float}`).
- Exceptions: every OKVIS / Eigen / `std::runtime_error` is caught inside binding methods and rethrown as one of a fixed set of Python exceptions registered via `py::register_exception`; the Python facade then catches those and rewraps them into the `VioError` family.
- Zero-copy pathway: `image` is `py::array_t` so DISK ingest avoids a copy.

This is a skeleton. Full OKVIS2 estimator wiring (`okvis::ThreadedKFVio` setup + callback plumbing) can be a follow-up commit if the skeleton + CI Linux build come back green first. Confirm the estimator class name against the pinned OKVIS2 headers: `ThreadedKFVio` is the estimator class of the original OKVIS, and OKVIS2 may expose it under a different name such as `okvis::ThreadedSlam`.

**Gate**: compiles inside the OKVIS2 CMake target. Tested on Ubuntu CI runner (not macOS).

### Step 4: write the Python facade `okvis2.py` (1–2 h)

File: `src/gps_denied_onboard/components/c1_vio/okvis2.py`

- `Okvis2Strategy` class implementing the `VioStrategy` Protocol from `interface.py`.
- Lazy import of `_native.okvis2_binding` inside a function body such as the constructor (NOT at module top level; that's the I-5 / Risk-2 mitigation, and AZ-331's `test_ac5_build_vio_strategy_flag_off_no_import` asserts this and MUST still pass).
- Constructor signature: `__init__(self, config: Config, *, fdr_client: FdrClient)`, matching the AZ-331 factory's call shape exactly. Inside the constructor: build the `ImuPreintegrator` from `helpers.imu_preintegrator.make_imu_preintegrator(calibration)`; build the `Okvis2Backend` from the binding; record the strategy label as `"okvis2"` (frozen per Protocol invariant).
- Map every backend exception (raised from the C++ binding's registered exception types) to the `VioError` family:
  - `OkvisInitException` → `VioInitializingError`
  - `OkvisFatalException` → `VioFatalError`
  - `OkvisOptimizationException` → `VioDegradedError` (only when transitioning to fatal; the normal degraded path returns a `VioOutput` with inflated covariance per AZ-331 v1.0.0)
- `process_frame`: feed IMU samples to the preintegrator, push the frame to the backend, read the latest output, and build the `VioOutput` DTO using a `gtsam.Pose3.matrix()` round-trip via `helpers.se3_utils` (AZ-277). Echo `frame_id`.
- `reset_to_warm_start`: tear down + reconstruct `Okvis2Backend` from the hint; the first call must not raise (idempotency invariant per AC-4); seed bias into the preintegrator via `preintegrator.reset_with_bias(hint.bias)`.
- `health_snapshot`: pull the `backend.health()` dict and wrap it as `VioHealth`. Track `consecutive_lost` Python-side because the binding returns "current state" only.
- `current_strategy_label`: return the frozen `"okvis2"`.
- FDR records on state transitions via the injected `fdr_client` using the `kind="vio.health"` schema (AZ-272).

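
The lazy-import and exception-mapping bullets above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the facade itself: the `VioError` classes are local stand-ins for the project's real ones, `Okvis2StrategySketch` and `load_binding` are hypothetical names, matching binding exceptions by class name is only one possible approach, and the "only when transitioning to fatal" condition on `VioDegradedError` is left out for brevity.

```python
import importlib

# Local stand-ins for the project's VioError family (assumption: the real
# classes live elsewhere in the C1 package and may carry more state).
class VioError(Exception): ...
class VioInitializingError(VioError): ...
class VioFatalError(VioError): ...
class VioDegradedError(VioError): ...

BINDING_MODULE = "gps_denied_onboard.components.c1_vio._native.okvis2_binding"

# Binding exception class name -> facade exception type, as listed in Step 4.
_EXCEPTION_MAP = {
    "OkvisInitException": VioInitializingError,
    "OkvisFatalException": VioFatalError,
    "OkvisOptimizationException": VioDegradedError,
}

def load_binding():
    """Import the native module on first use, never at module import time."""
    return importlib.import_module(BINDING_MODULE)

class Okvis2StrategySketch:
    """Hypothetical reduced facade: only construction and error translation."""

    def __init__(self, config_yaml):
        binding = load_binding()  # lazy: runs only when a strategy is built
        try:
            self._backend = binding.Okvis2Backend(config_yaml)
        except Exception as exc:
            target = _EXCEPTION_MAP.get(type(exc).__name__)
            if target is None:
                raise  # unknown errors propagate unchanged in this sketch
            raise target(str(exc)) from exc

    def current_strategy_label(self):
        return "okvis2"
```

Importing the module that holds this sketch does not touch the native binding; the import happens only inside `__init__`, which is the property the AZ-331 flag-off test relies on.
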
**Gate**: `mypy --strict` passes against the new file; `ruff check` passes; the isinstance check `isinstance(Okvis2Strategy(...), VioStrategy)` returns True without importing the native binding (i.e., it exercises the Protocol's structural conformance, not the construction itself). The isinstance check requires `VioStrategy` to be decorated with `@typing.runtime_checkable`.

### Step 5: write `Okvis2Config` (15 min)

File: `src/gps_denied_onboard/components/c1_vio/config.py` (extend existing; do not duplicate `C1VioConfig`).

- Add `@dataclass(frozen=True) class Okvis2Config` with fields:
  - `keyframe_window_size: int = 15` (∈ [10, 20] per D-C5-3);
  - `keyframe_parallax_threshold_px: float = 3.0`;
  - `ransac_inlier_ratio: float = 0.5`;
  - `max_optimization_iters: int = 4`;
  - `degraded_feature_threshold: int = 30`;
  - `per_frame_debug_log: bool = False`.
- `__post_init__` validates ranges and raises `ConfigError`.
- Register the block under `config.components['c1_vio'].okvis2` (sub-block); keep `C1VioConfig` as-is at the top level.

**Gate**: `Okvis2Config(keyframe_window_size=9)` raises `ConfigError`; `Okvis2Config()` defaults pass.

### Step 6: write unit tests with fake binding (1–2 h)

Files:

- `tests/unit/c1_vio/conftest.py`: a `fake_okvis2_binding` fixture that installs a `types.ModuleType` at `sys.modules['gps_denied_onboard.components.c1_vio._native.okvis2_binding']` with a scriptable `Okvis2Backend` test double. The test double exposes a `script()` method that pre-loads a queue of outputs / exceptions; `add_frame` pops from the queue. This is the "fake pybind11 binding that returns scripted `VioOutput` payloads" the task spec explicitly allows.
- `tests/unit/c1_vio/test_okvis2_strategy.py`: one test per AC (AC-1 through AC-8, AC-10). Use the fake binding fixture. AC-9 and the NFR-perf test are written here too but marked `@pytest.mark.tier2` so `pytest -m "not tier2"` (the macOS dev loop) skips them; `ci-tier2.yml` picks them up.

**Gate**: every unit test passes on macOS with `pytest -m "not tier2" tests/unit/c1_vio/`.
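
The scriptable test double from Step 6 could look something like the sketch below. It uses only the standard library; the names `FakeOkvis2Backend`, `install_fake_binding`, and `uninstall_fake_binding` are hypothetical, and scripted outputs are assumed to be plain dicts rather than the real binding's payload shape.

```python
import sys
import types
from collections import deque

BINDING_MODULE = "gps_denied_onboard.components.c1_vio._native.okvis2_binding"

class FakeOkvis2Backend:
    """Scriptable stand-in for the native backend (hypothetical test double)."""

    def __init__(self, *args, **kwargs):
        self._queue = deque()
        self._latest = None

    def script(self, *items):
        """Pre-load outputs (dicts) or exceptions to be consumed by add_frame."""
        self._queue.extend(items)

    def add_frame(self, frame_id, ns_ts, image):
        if not self._queue:
            return False  # nothing scripted: behave like a dropped frame
        item = self._queue.popleft()
        if isinstance(item, BaseException):
            raise item
        self._latest = {"frame_id": frame_id, **item}
        return True

    def get_latest_output(self):
        return self._latest

def install_fake_binding():
    """Register the fake module so a lazy import of the binding resolves to it."""
    module = types.ModuleType(BINDING_MODULE)
    module.Okvis2Backend = FakeOkvis2Backend
    sys.modules[BINDING_MODULE] = module
    return module

def uninstall_fake_binding():
    sys.modules.pop(BINDING_MODULE, None)
```

Inside `conftest.py` the install/uninstall pair would typically sit on either side of a fixture's `yield`; pytest's `monkeypatch.setitem(sys.modules, BINDING_MODULE, module)` achieves the same cleanup automatically.
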
Full sweep (`pytest tests/`) shows the existing 1093 passing + the new tests, with the tier2-marked ones skipped on macOS.

### Step 7: update `.github/workflows/ci.yml` to install OKVIS2's Linux deps (5–10 min)

- In the `build` matrix's `deployment` and `research` kinds, add a step BEFORE `cmake -S . -B build`:

  ```yaml
  - name: Install OKVIS2 native deps
    run: |
      sudo apt-get update
      sudo apt-get install -y --no-install-recommends \
        libeigen3-dev libboost-all-dev libgoogle-glog-dev libgflags-dev \
        libsuitesparse-dev libceres-dev libopencv-dev
  ```

- Toggle `BUILD_OKVIS2` to `ON` in the `deployment` kind's `cmake_flags` (default config in `solution.md` says OKVIS2 is the production-default; the deployment matrix kind should enforce this).
- The `research` kind already has `BUILD_VINS_MONO=ON`; leave `BUILD_OKVIS2=ON` there too.

**Gate**: push branch; GitHub Actions Ubuntu runner completes the `cmake --build build --parallel` step. If OKVIS2's CMake export targets have a different name than `okvis::Estimator` / `okvis::Common`, the failure surfaces here and Step 2's `target_link_libraries` is patched. This is the only build-system feedback loop we get pre-Jetson, so exploit it.

### Step 8: AC coverage verification + code review (15–30 min)

- Verify every AC of AZ-332 maps to at least one test (skipped-with-reason counts as covered per implement skill Step 8).
- Invoke `/code-review` skill on the batch's changed files. Expected verdict: PASS or PASS_WITH_WARNINGS. Auto-fix or escalate per implement skill Step 10.

### Step 9: commit (5 min)

- One commit per implement skill Step 11: `[AZ-332] C1 Okvis2Strategy: pybind11 binding skeleton + Python facade + fake-backend tests`.
- Body of commit message documents the three-environment split (macOS dev / Ubuntu CI / Jetson tier2) and notes that AC-9 + NFR-perf are tier2-gated.

### Step 10: tracker + archive + batch report (5 min)

- Jira: AZ-332 In Progress → In Testing.
- Move `_docs/02_tasks/todo/AZ-332_c1_okvis2_strategy.md` → `_docs/02_tasks/done/`.
- Write `_docs/03_implementation/batch_23_cycle1_report.md` with the standard report shape. Include the tier2-deferred AC-9 + NFR-perf items under "Deferred to tier2 CI".
- Update `_docs/_autodev_state.md`: sub_step → next batch detection.

## Files to be created / modified (summary)

Created:

- `cpp/okvis2/upstream/` (git submodule)
- `cpp/pybind11/upstream/` (git submodule)
- `src/gps_denied_onboard/components/c1_vio/_native/okvis2_binding.cpp`
- `src/gps_denied_onboard/components/c1_vio/okvis2.py`
- `tests/unit/c1_vio/conftest.py`
- `tests/unit/c1_vio/test_okvis2_strategy.py`
- `_docs/03_implementation/batch_23_cycle1_report.md`

Modified:

- `cpp/okvis2/CMakeLists.txt` (replace placeholder)
- `cpp/pybind11/CMakeLists.txt` (replace placeholder; can stay minimal)
- `src/gps_denied_onboard/components/c1_vio/config.py` (add `Okvis2Config`)
- `.github/workflows/ci.yml` (add apt-get step; flip `BUILD_OKVIS2=ON` in deployment kind)
- `.gitmodules` (auto-edited by submodule add)
- `_docs/_autodev_state.md`
- `_docs/02_tasks/todo/AZ-332_c1_okvis2_strategy.md` (moved to done/)

## Tier2 deliverables (NOT this session; explicit follow-up)

AC-9 (honest covariance monotonicity) and the NFR-perf test (`process_frame` p95 ≤ 80 ms on Tier-2) require real OKVIS2 + Derkachi-class fixture footage on the actual Jetson hardware. They are:

- Written in `test_okvis2_strategy.py` marked `@pytest.mark.tier2`.
- Skipped on macOS dev + GitHub Actions Linux runner.
- Picked up by `ci-tier2.yml` on push to `stage` or `main`.

A remediation task (`AZ-332_tier2_validation`) is OPTIONAL: it could be tracked separately or rolled into the deferred Jetson MVE phase that D-C1-2 already scheduled. Pick at session-start time.

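
For the tier2 NFR-perf check, the p95 ≤ 80 ms assertion might be computed along these lines. This is a sketch using a nearest-rank percentile (one of several percentile definitions; the task spec may prescribe another), and the helper names `p95_ms`, `measure_ms`, and `within_budget` are hypothetical.

```python
import time

BUDGET_MS = 80.0  # p95 budget quoted in this plan (C1-PT-01)

def p95_ms(samples_ms):
    """Nearest-rank 95th percentile of a non-empty list of latencies in ms."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    # ceil(0.95 * n) computed in integer arithmetic to avoid float rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]

def measure_ms(fn, repeats):
    """Call fn() `repeats` times; return per-call wall-clock latencies in ms."""
    latencies = []
    for _ in range(repeats):
        start = time.perf_counter_ns()
        fn()
        latencies.append((time.perf_counter_ns() - start) / 1e6)
    return latencies

def within_budget(samples_ms, budget_ms=BUDGET_MS):
    """True when the p95 latency does not exceed the budget."""
    return p95_ms(samples_ms) <= budget_ms
```

In the tier2 test, `fn` would wrap a real `process_frame` call on the Jetson runner, and the test function would carry `@pytest.mark.tier2` so the macOS loop skips it.
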
## Pinned upstream versions

Fill in once Step 1 is executed:

- `cpp/okvis2/upstream`: commit hash _TBD_; OKVIS2 `main` branch HEAD at _TBD_
- `cpp/pybind11/upstream`: commit hash _TBD_; pybind11 stable release tag _TBD_

## When this doc can be deleted

After AZ-332 lands and the next batch is in flight, this file is historical context. Move it to `_docs/_archive/` (or delete it if `_archive` doesn't exist) once Jetson tier2 CI has been green at least once on a real OKVIS2 run.