Handoff artifacts from the prior /autodev session that stopped at Step 7 sub_step compute-next-batch: - _docs/_autodev_state.md: pointer updated to batch 23, AZ-332 only (AZ-345 deferred — dep AZ-346 not yet in done/). - _docs/03_implementation/AZ-332_implementation_plan.md: locked-in decisions (no ROS 2, no Python re-impl, three-env split: macOS dev / Ubuntu CI / Jetson tier2) + step-by-step playbook for next session. Pre-batch chore commit per implement skill prereq #4 (clean tree required before AZ-332 commit so the batch diff stays focused). Co-authored-by: Cursor <cursoragent@cursor.com>
AZ-332 — Implementation plan (batch 23, cycle 1)
Date created: 2026-05-12 (carry-over from /autodev session 2026-05-12 morning)
Owner: next /autodev invocation starting from Step 7 Implement sub_step compute-next-batch
Scope of this doc: a concrete, in-order playbook for the next session. Reading this + the task spec at _docs/02_tasks/todo/AZ-332_c1_okvis2_strategy.md is sufficient to resume — no other re-discovery needed.
Why this is its own plan doc
AZ-332 (C1 OKVIS2 production-default VIO) is the first task in this project to require a native C++ build chain (OKVIS2 + pybind11). The previous session researched paths, surfaced blockers, and landed on a decomposition that splits work across three build environments. That decomposition has to survive the session boundary, hence this file.
Decisions locked in the previous session
- No ROS 2 layer. `colcon build` of OKVIS2 produces the same libraries as standalone CMake plus a ROS 2 node we do not need; ROS 2 runtime IPC was rejected at Plan time (`_docs/01_solution/solution.md` § D-C1-1-SUB-A: "Rejected (cost + latency budget conflict)"). Build with standalone CMake.
- No Python re-implementation of OKVIS2. Forbidden by the task spec ("Unacceptable substitutes" section). Pure-Python VIO violates the C1-PT-01 ≤ 80 ms p95 budget by construction.
- No alternative VIO substitution. Every C++ VIO candidate (OpenVINS, VINS-Mono, Kimera-VIO) has the same compile-on-macOS problem. The only Python-native candidates (DPVO, KLT+RANSAC) are mono-VO only — not drop-ins for a VIO contract. AZ-332 stays OKVIS2.
- Three-environment dev split:

| Environment | What runs there | What it gates |
|---|---|---|
| macOS dev | Python facade + binding C++ editing; unit tests using the fake `_native.okvis2_binding` (task spec explicitly allows this for tests) | AC-1, AC-2, AC-3, AC-4, AC-5, AC-6, AC-7, AC-8, AC-10 |
| Ubuntu CI runner (`ci.yml`) | Native CMake build of vendored OKVIS2 + binding `.so` | Build-passes gate; no AC validation here |
| Self-hosted Jetson runner (`ci-tier2.yml`) | Real-OKVIS2 perf + honest-covariance tests | AC-9 (honest covariance monotonicity); NFR-perf p95 ≤ 80 ms |
This split honours the task spec ("real Okvis2Strategy calling real C7 InferenceRuntime with real TRT-compiled DISK engine") because the production binary IS the real binding compiled on Linux/Jetson — only the dev-side unit tests use the fake. The fake never ships to production.
Concrete step-by-step for next session (in order; each step has a stop-and-verify gate)
Step 0 — re-entry sanity check (1 min)
- Read `_docs/_autodev_state.md`: confirm step 7 / sub_step `compute-next-batch` / detail points here.
- Read this doc fully.
- Read `_docs/02_tasks/todo/AZ-332_c1_okvis2_strategy.md` once.
- `git status --porcelain` must be empty (implement skill prerequisite).
Step 1 — vendor OKVIS2 and pybind11 as git submodules (5–10 min)
- `git submodule add --depth 1 https://github.com/smartroboticslab/okvis2.git cpp/okvis2/upstream`, then `git submodule update --init --recursive cpp/okvis2/upstream` to pull in any nested submodules of OKVIS2 (`git submodule add` does not accept `--recurse-submodules`).
  - Note: submodule path is `cpp/okvis2/upstream/` (not `cpp/okvis2/` directly) so the existing `cpp/okvis2/CMakeLists.txt` keeps its project-owned role and `add_subdirectory(upstream)` pulls in OKVIS2.
- `git submodule add --depth 1 https://github.com/pybind/pybind11.git cpp/pybind11/upstream`
  - Same pattern: existing `cpp/pybind11/` directory keeps the project README; submodule lives at `cpp/pybind11/upstream/`.
- Delete the `.gitkeep` and placeholder `README.md` from `cpp/pybind11/` once the submodule is in place (or keep them; they're harmless either way; pick one and stay consistent).
- Pin a known-good commit hash for OKVIS2 (record it in this doc under "Pinned upstream versions" once chosen). Recommendation: pin to the latest `main` HEAD at the time of submodule add and document the commit short-hash here.
- Gate: `git submodule status` shows both submodules with a SHA; `git status` clean except `.gitmodules` + submodule entries.
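The Step 1 gate can be checked mechanically. The following is a minimal sketch, assuming the documented `git submodule status` line format (a one-character state prefix, then the SHA, then the path). The helper names `parse_submodule_status` and `gate_passes` are hypothetical and not part of the project.

```python
from __future__ import annotations

from dataclasses import dataclass

# Paths taken from Step 1 of this plan.
REQUIRED = {"cpp/okvis2/upstream", "cpp/pybind11/upstream"}


@dataclass(frozen=True)
class SubmoduleEntry:
    state: str  # " " in sync, "-" not initialised, "+" SHA differs from index, "U" conflict
    sha: str
    path: str


def parse_submodule_status(output: str) -> list[SubmoduleEntry]:
    """Parse raw `git submodule status` output (do not strip it first:
    the leading character of every line carries the state)."""
    entries = []
    for line in output.splitlines():
        if not line.strip():
            continue
        fields = line[1:].split()
        entries.append(SubmoduleEntry(state=line[0], sha=fields[0], path=fields[1]))
    return entries


def gate_passes(output: str, required: set[str] = REQUIRED) -> bool:
    """True when every required submodule is present and in sync."""
    by_path = {entry.path: entry for entry in parse_submodule_status(output)}
    return all(path in by_path and by_path[path].state == " " for path in required)
```

In practice the `output` argument would come from `subprocess.run(["git", "submodule", "status"], capture_output=True, text=True).stdout`.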
Step 2 — write CMake glue (15–30 min)
Files to write:
- `cpp/okvis2/CMakeLists.txt` (replace existing placeholder):
  - `if(NOT BUILD_OKVIS2) return() endif()`
  - `add_subdirectory(upstream EXCLUDE_FROM_ALL)` with OKVIS2's `USE_NN=OFF` to drop the LibTorch dep (per Fact #39: keyframe arch tolerates this).
  - `find_package` the Linux deps OKVIS2 needs (Eigen3, Boost, glog, gflags, SuiteSparse, Ceres, OpenCV; every one is an apt package on Ubuntu, brew formula on macOS).
  - `add_subdirectory(${CMAKE_SOURCE_DIR}/cpp/pybind11/upstream pybind11_build)`.
  - `pybind11_add_module(okvis2_binding ${CMAKE_CURRENT_SOURCE_DIR}/../../src/gps_denied_onboard/components/c1_vio/_native/okvis2_binding.cpp)` (note the path back to the Python tree).
  - `target_link_libraries(okvis2_binding PRIVATE okvis::Estimator okvis::Common ...)` (exact target names come from OKVIS2's CMake exports; verify by running `cmake --build build --target help | grep okvis` once the submodule is in).
  - `install(TARGETS okvis2_binding DESTINATION ${CMAKE_INSTALL_LIBDIR}/gps_denied_onboard/components/c1_vio/_native/)`.
- `cpp/pybind11/CMakeLists.txt` (replace existing placeholder): can stay nearly empty, because pybind11 is included by `cpp/okvis2/CMakeLists.txt` via `add_subdirectory`.
The existing top-level cpp/CMakeLists.txt already has add_subdirectory(okvis2) gated on BUILD_OKVIS2 OR BUILD_VINS_MONO OR BUILD_KLT_RANSAC — no change needed there.
Gate: cmake -S . -B build -DBUILD_OKVIS2=OFF succeeds on macOS (no-op build with the flag off). The OFF path is what protects the rest of the build from any of this new wiring.
Step 3 — write the pybind11 binding C++ skeleton (1–2 h)
File: src/gps_denied_onboard/components/c1_vio/_native/okvis2_binding.cpp
Surface needed (mirrors the Python facade's needs — not the full OKVIS2 API):
- `Okvis2Backend` class with:
  - ctor from YAML config string + camera intrinsics dict;
  - `add_frame(frame_id: str, ns_ts: int, image: ndarray[uint8, H, W, C]) -> bool`;
  - `add_imu(ns_ts: int, accel: ndarray[float64, 3], gyro: ndarray[float64, 3]) -> None`;
  - `get_latest_output() -> dict | None` (returns frame_id + 4x4 pose matrix + 6x6 covariance + bias + feature_quality dict + emitted_at_ns);
  - `reset(body_T_world: ndarray[float64, 4, 4], velocity: ndarray[float64, 3], accel_bias: ndarray[float64, 3], gyro_bias: ndarray[float64, 3]) -> None`;
  - `health() -> dict` (returns `{state: str, consecutive_lost: int, bias_norm: float}`).
- Exceptions: every OKVIS / Eigen / `std::runtime_error` is caught inside binding methods and rethrown as one of a fixed set of Python exceptions registered via `py::register_exception`; the Python facade then catches those and rewraps them into the `VioError` family.
- Zero-copy pathway: `image` is `py::array_t<uint8_t, py::array::c_style | py::array::forcecast>` so DISK ingest avoids a copy. Note that `forcecast` makes pybind11 silently convert (and therefore copy) any input that is not already a C-contiguous `uint8` array, so the no-copy path holds only when callers pass frames in that layout.
This is a skeleton — full OKVIS2 estimator wiring (okvis::ThreadedKFVio setup + callback plumbing) can be a follow-up commit if the skeleton + CI Linux build come back green first.
Gate: compiles inside the OKVIS2 CMake target. Tested on Ubuntu CI runner (not macOS).
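The Step 3 surface can be written down on the Python side as a structural type, which gives the facade and the test double a shared contract before any C++ compiles. This is a sketch only: `Okvis2BackendLike` and `NullBackend` are hypothetical names, and `NDArray` is a placeholder alias so the block carries no third-party imports (real code would use `numpy.ndarray`).

```python
from __future__ import annotations

from typing import Any, Optional, Protocol, runtime_checkable

NDArray = Any  # placeholder for numpy.ndarray (assumption, keeps the sketch stdlib-only)


@runtime_checkable
class Okvis2BackendLike(Protocol):
    """Method surface listed in Step 3; signatures mirror the plan."""

    def add_frame(self, frame_id: str, ns_ts: int, image: NDArray) -> bool: ...

    def add_imu(self, ns_ts: int, accel: NDArray, gyro: NDArray) -> None: ...

    def get_latest_output(self) -> Optional[dict]: ...

    def reset(
        self,
        body_T_world: NDArray,
        velocity: NDArray,
        accel_bias: NDArray,
        gyro_bias: NDArray,
    ) -> None: ...

    def health(self) -> dict: ...


class NullBackend:
    """Do-nothing implementation, used here only to show structural conformance."""

    def __init__(self, config_yaml: str, intrinsics: dict) -> None:
        self._config_yaml = config_yaml
        self._intrinsics = intrinsics

    def add_frame(self, frame_id: str, ns_ts: int, image: NDArray) -> bool:
        return False  # no estimator behind it, so no frame is ever accepted

    def add_imu(self, ns_ts: int, accel: NDArray, gyro: NDArray) -> None:
        return None

    def get_latest_output(self) -> Optional[dict]:
        return None

    def reset(self, body_T_world, velocity, accel_bias, gyro_bias) -> None:
        return None

    def health(self) -> dict:
        # Keys follow the health() description in Step 3; the values are made up.
        return {"state": "initializing", "consecutive_lost": 0, "bias_norm": 0.0}
```

A `runtime_checkable` Protocol only checks that the methods exist, not their signatures, so `isinstance(backend, Okvis2BackendLike)` is a cheap smoke test rather than a full contract check.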
Step 4 — write the Python facade okvis2.py (1–2 h)
File: src/gps_denied_onboard/components/c1_vio/okvis2.py
- `Okvis2Strategy` class implementing the `VioStrategy` Protocol from `interface.py`.
- Lazy import of `_native.okvis2_binding` inside a function body such as the constructor (NOT at module top; that's the I-5 / Risk-2 mitigation; AZ-331's `test_ac5_build_vio_strategy_flag_off_no_import` asserts this and MUST still pass).
- Constructor signature: `__init__(self, config: Config, *, fdr_client: FdrClient)`; match the AZ-331 factory's call shape exactly. Inside the constructor: build the `ImuPreintegrator` from `helpers.imu_preintegrator.make_imu_preintegrator(calibration)`; build the `Okvis2Backend` from the binding; record the strategy label as `"okvis2"` (frozen per Protocol invariant).
- Map every backend exception (raised from the C++ binding's registered exception types) to the `VioError` family: `OkvisInitException → VioInitializingError`, `OkvisFatalException → VioFatalError`, `OkvisOptimizationException → VioDegradedError` (only when transitioning to fatal; the normal degraded path returns a `VioOutput` with inflated covariance per AZ-331 v1.0.0).
- `process_frame`: feed IMU samples to the preintegrator, push frame to backend, read latest output, build the `VioOutput` DTO using `gtsam.Pose3.matrix()` round-trip via `helpers.se3_utils` (AZ-277). Echo `frame_id`.
- `reset_to_warm_start`: tear down + reconstruct `Okvis2Backend` from the hint; first call must not raise (idempotency invariant per AC-4); seed bias into the preintegrator via `preintegrator.reset_with_bias(hint.bias)`.
- `health_snapshot`: pull `backend.health()` dict and wrap as `VioHealth`. Track `consecutive_lost` Python-side because the binding returns "current state" only.
- `current_strategy_label`: return the frozen `"okvis2"`.
- FDR records on state transitions via the injected `fdr_client` using the `kind="vio.health"` schema (AZ-272).
Gate: mypy --strict passes against the new file; ruff check passes; isinstance check isinstance(Okvis2Strategy(...), VioStrategy) returns True without importing the native binding (i.e., the Protocol's structural conformance, not the construction itself).
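The lazy import and the exception mapping from Step 4 might look roughly like this. It is a sketch under stated assumptions: the `VioError` classes below are local stand-ins for the project's real hierarchy, `Okvis2StrategySketch` is a hypothetical cut-down class, matching exceptions by class name is an illustrative choice, and treating unknown backend exceptions as fatal is an assumption the plan does not state.

```python
from __future__ import annotations

import importlib
from typing import Any

BINDING_MODULE = "gps_denied_onboard.components.c1_vio._native.okvis2_binding"


# Local stand-ins for the project's VioError family (assumption).
class VioError(Exception): ...
class VioInitializingError(VioError): ...
class VioFatalError(VioError): ...
class VioDegradedError(VioError): ...


# Backend exception class name -> facade exception, per the Step 4 mapping.
_EXCEPTION_MAP: dict[str, type[VioError]] = {
    "OkvisInitException": VioInitializingError,
    "OkvisFatalException": VioFatalError,
    "OkvisOptimizationException": VioDegradedError,
}


def load_binding() -> Any:
    """Import the native module on first use, never at module import time."""
    return importlib.import_module(BINDING_MODULE)


def translate(exc: Exception) -> VioError:
    """Rewrap a backend exception; unmapped types become fatal (assumption)."""
    target = _EXCEPTION_MAP.get(type(exc).__name__, VioFatalError)
    return target(str(exc))


class Okvis2StrategySketch:
    """Cut-down facade showing only the import and error-mapping behaviour."""

    def __init__(self, config_yaml: str, intrinsics: dict) -> None:
        binding = load_binding()  # deferred: importing this file stays cheap
        try:
            self._backend = binding.Okvis2Backend(config_yaml, intrinsics)
        except Exception as exc:
            raise translate(exc) from exc

    def current_strategy_label(self) -> str:
        return "okvis2"
```

Because the import happens inside `__init__`, importing the facade module leaves `sys.modules` free of the native binding, which is the property the AZ-331 flag-off test relies on. If the gate's `isinstance` check is kept, the `VioStrategy` Protocol needs `typing.runtime_checkable`, since plain Protocols reject `isinstance`.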
Step 5 — write Okvis2Config (15 min)
File: src/gps_denied_onboard/components/c1_vio/config.py (extend existing — do not duplicate C1VioConfig).
- Add `@dataclass(frozen=True) class Okvis2Config` with fields: `keyframe_window_size: int = 15` (∈ [10, 20] per D-C5-3); `keyframe_parallax_threshold_px: float = 3.0`; `ransac_inlier_ratio: float = 0.5`; `max_optimization_iters: int = 4`; `degraded_feature_threshold: int = 30`; `per_frame_debug_log: bool = False`.
- `__post_init__` validates ranges and raises `ConfigError`.
- Register the block under `config.components['c1_vio'].okvis2` (sub-block); keep `C1VioConfig` as-is at the top level.
Gate: Okvis2Config(keyframe_window_size=9) raises ConfigError; Okvis2Config() defaults pass.
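A minimal sketch of the Step 5 dataclass follows. Field names and defaults are taken from the list above. `ConfigError` here is a local stand-in for the project's own class. Only the `keyframe_window_size` range comes from the plan (D-C5-3); every other bound is an assumption chosen for illustration and should be replaced by whatever the task spec dictates.

```python
from dataclasses import dataclass


class ConfigError(ValueError):
    """Local stand-in for the project's ConfigError (assumption)."""


@dataclass(frozen=True)
class Okvis2Config:
    keyframe_window_size: int = 15
    keyframe_parallax_threshold_px: float = 3.0
    ransac_inlier_ratio: float = 0.5
    max_optimization_iters: int = 4
    degraded_feature_threshold: int = 30
    per_frame_debug_log: bool = False

    def __post_init__(self) -> None:
        # Range from the plan: window size must lie in [10, 20] per D-C5-3.
        if not 10 <= self.keyframe_window_size <= 20:
            raise ConfigError(
                f"keyframe_window_size={self.keyframe_window_size} outside [10, 20]"
            )
        # The bounds below are illustrative assumptions, not from the task spec.
        if self.keyframe_parallax_threshold_px <= 0.0:
            raise ConfigError("keyframe_parallax_threshold_px must be positive")
        if not 0.0 < self.ransac_inlier_ratio <= 1.0:
            raise ConfigError("ransac_inlier_ratio must be in (0, 1]")
        if self.max_optimization_iters < 1:
            raise ConfigError("max_optimization_iters must be at least 1")
        if self.degraded_feature_threshold < 0:
            raise ConfigError("degraded_feature_threshold must be non-negative")
```

Validation in `__post_init__` works with `frozen=True` because it only reads fields; it never assigns to them.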
Step 6 — write unit tests with fake binding (1–2 h)
Files:
- `tests/unit/c1_vio/conftest.py`: a `fake_okvis2_binding` fixture that installs a `types.ModuleType` at `sys.modules['gps_denied_onboard.components.c1_vio._native.okvis2_binding']` with a scriptable `Okvis2Backend` test double. The test double exposes a `script()` method that pre-loads a queue of outputs / exceptions; `add_frame` pops from the queue. This is the "fake pybind11 binding that returns scripted `VioOutput` payloads" the task spec explicitly allows.
- `tests/unit/c1_vio/test_okvis2_strategy.py`: one test per AC (AC-1 through AC-8, AC-10). Use the fake binding fixture. AC-9 and the NFR-perf test are written here too but marked `@pytest.mark.tier2` so `pytest -m "not tier2"` (the macOS dev loop) skips them; `ci-tier2.yml` picks them up.
Gate: every unit test passes on macOS with pytest -m "not tier2" tests/unit/c1_vio/. Full sweep (pytest tests/) shows the existing 1093 passing + the new tests, with the tier2-marked ones skipped on macOS.
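The scriptable test double described in Step 6 could be built along these lines. This is a sketch, not the project's fixture: `FakeOkvis2Backend`, `install_fake_binding`, and `uninstall_fake_binding` are hypothetical names, the payload shape is a plain dict, and returning `False` when the script queue is empty is an assumption.

```python
from __future__ import annotations

import sys
import types
from collections import deque
from typing import Any, Optional

BINDING_MODULE = "gps_denied_onboard.components.c1_vio._native.okvis2_binding"


class FakeOkvis2Backend:
    """Scriptable stand-in for the native Okvis2Backend."""

    def __init__(self, config_yaml: str = "", intrinsics: Optional[dict] = None) -> None:
        self._queue: deque[Any] = deque()
        self._latest: Optional[dict] = None

    def script(self, *items: Any) -> None:
        """Pre-load outputs (dicts) or exceptions, consumed one per add_frame call."""
        self._queue.extend(items)

    def add_frame(self, frame_id: str, ns_ts: int, image: Any) -> bool:
        if not self._queue:
            return False  # nothing scripted: treat the frame as rejected (assumption)
        item = self._queue.popleft()
        if isinstance(item, BaseException):
            raise item
        self._latest = {**item, "frame_id": frame_id}  # echo frame_id like the real path
        return True

    def add_imu(self, ns_ts: int, accel: Any, gyro: Any) -> None:
        return None

    def get_latest_output(self) -> Optional[dict]:
        return self._latest


def install_fake_binding() -> types.ModuleType:
    """Register the fake under the native module's import name."""
    module = types.ModuleType(BINDING_MODULE)
    module.Okvis2Backend = FakeOkvis2Backend  # type: ignore[attr-defined]
    sys.modules[BINDING_MODULE] = module
    return module


def uninstall_fake_binding() -> None:
    sys.modules.pop(BINDING_MODULE, None)
```

Inside `conftest.py` the install step would more likely go through pytest's `monkeypatch.setitem(sys.modules, BINDING_MODULE, module)` so that teardown is automatic; the explicit install/uninstall pair is used here only to keep the sketch free of a pytest dependency.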
Step 7 — update .github/workflows/ci.yml to install OKVIS2's Linux deps (5–10 min)
- In the `build` matrix's `deployment` and `research` kinds, add a step BEFORE `cmake -S . -B build`:

      - name: Install OKVIS2 native deps
        run: |
          sudo apt-get update
          sudo apt-get install -y --no-install-recommends \
            libeigen3-dev libboost-all-dev libgoogle-glog-dev libgflags-dev \
            libsuitesparse-dev libceres-dev libopencv-dev

- Toggle `BUILD_OKVIS2` to `ON` in the `deployment` kind's `cmake_flags` (default config in `solution.md` says OKVIS2 is the production-default; the deployment matrix kind should enforce this).
- The `research` kind already has `BUILD_VINS_MONO=ON`; leave `BUILD_OKVIS2=ON` there too.
Gate: push branch; GitHub Actions Ubuntu runner completes the cmake --build build --parallel step. If OKVIS2's CMake export targets have a different name than okvis::Estimator / okvis::Common, the failure surfaces here and Step 2's target_link_libraries is patched. This is the only build-system feedback loop we get pre-Jetson — exploit it.
Step 8 — AC coverage verification + code review (15–30 min)
- Verify every AC of AZ-332 maps to at least one test (skipped-with-reason counts as covered per implement skill Step 8).
- Invoke `/code-review` skill on the batch's changed files. Expected verdict: PASS or PASS_WITH_WARNINGS. Auto-fix or escalate per implement skill Step 10.
Step 9 — commit (5 min)
- One commit per implement skill Step 11: `[AZ-332] C1 Okvis2Strategy: pybind11 binding skeleton + Python facade + fake-backend tests`.
- Body of commit message documents the three-environment split (macOS dev / Ubuntu CI / Jetson tier2) and notes that AC-9 + NFR-perf are tier2-gated.
Step 10 — tracker + archive + batch report (5 min)
- Jira: AZ-332 In Progress → In Testing.
- Move `_docs/02_tasks/todo/AZ-332_c1_okvis2_strategy.md` → `_docs/02_tasks/done/`.
- Write `_docs/03_implementation/batch_23_cycle1_report.md` with the standard report shape. Include the tier2-deferred AC-9 + NFR-perf items under "Deferred to tier2 CI".
- Update `_docs/_autodev_state.md`: sub_step → next batch detection.
Files to be created / modified (summary)
Created:
- `cpp/okvis2/upstream/` (git submodule)
- `cpp/pybind11/upstream/` (git submodule)
- `src/gps_denied_onboard/components/c1_vio/_native/okvis2_binding.cpp`
- `src/gps_denied_onboard/components/c1_vio/okvis2.py`
- `tests/unit/c1_vio/conftest.py`
- `tests/unit/c1_vio/test_okvis2_strategy.py`
- `_docs/03_implementation/batch_23_cycle1_report.md`
Modified:
- `cpp/okvis2/CMakeLists.txt` (replace placeholder)
- `cpp/pybind11/CMakeLists.txt` (replace placeholder; can stay minimal)
- `src/gps_denied_onboard/components/c1_vio/config.py` (add `Okvis2Config`)
- `.github/workflows/ci.yml` (add apt-get step; flip `BUILD_OKVIS2=ON` in deployment kind)
- `.gitmodules` (auto-edited by submodule add)
- `_docs/_autodev_state.md`
- `_docs/02_tasks/todo/AZ-332_c1_okvis2_strategy.md` (moved to done/)
Tier2 deliverables (NOT this session — explicit follow-up)
AC-9 (honest covariance monotonicity) and the NFR-perf test (process_frame p95 ≤ 80 ms on Tier-2) require real OKVIS2 + Derkachi-class fixture footage on the actual Jetson hardware. They are:
- Written in `test_okvis2_strategy.py` marked `@pytest.mark.tier2`.
- Skipped on macOS dev + GitHub Actions Linux runner.
- Picked up by `ci-tier2.yml` on push to `stage` or `main`.
- A remediation task (`AZ-332_tier2_validation`) is OPTIONAL: it could be tracked separately or rolled into the deferred Jetson MVE phase that D-C1-2 already scheduled. Pick at session-start time.
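For the tier2 items, two small pieces are likely to be needed: registering the `tier2` marker so pytest does not warn about an unknown mark, and a helper that turns per-frame latencies into a p95 verdict. The sketch below assumes a nearest-rank percentile, which the plan does not specify, and the helper names `p95_ms` and `within_budget` are hypothetical.

```python
from __future__ import annotations

from typing import Sequence

P95_BUDGET_MS = 80.0  # NFR-perf budget quoted in this plan


def p95_ms(samples_ms: Sequence[float]) -> float:
    """Nearest-rank 95th percentile (the percentile method is an assumption)."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    # ceil(0.95 * n) in integer arithmetic, to avoid float rounding at the boundary.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def within_budget(samples_ms: Sequence[float], budget_ms: float = P95_BUDGET_MS) -> bool:
    return p95_ms(samples_ms) <= budget_ms


def pytest_configure(config) -> None:
    """conftest.py hook: declare the marker used by `pytest -m "not tier2"`."""
    config.addinivalue_line(
        "markers", "tier2: needs real OKVIS2 on the self-hosted Jetson runner"
    )
```

A tier2 test would time each `process_frame` call with `time.perf_counter()`, collect the durations in milliseconds, and assert `within_budget(durations)`.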
Pinned upstream versions
Fill in once Step 1 is executed:
- `cpp/okvis2/upstream`: commit hash TBD; OKVIS2 main branch HEAD at `<date>`
- `cpp/pybind11/upstream`: commit hash TBD; pybind11 stable release tag `<version>`
When this doc can be deleted
After AZ-332 lands and the next batch is in flight, this file is historical context. Move to _docs/_archive/ (or delete if _archive doesn't exist) once Jetson tier2 CI has been green at least once on a real OKVIS2 run.