Oleksandr Bezdieniezhnykh 9c35776bcb chore: pre-batch-23 carry-over (state + AZ-332 plan)
Handoff artifacts from the prior /autodev session that stopped at
Step 7 sub_step compute-next-batch:

- _docs/_autodev_state.md: pointer updated to batch 23, AZ-332 only
  (AZ-345 deferred — dep AZ-346 not yet in done/).
- _docs/03_implementation/AZ-332_implementation_plan.md: locked-in
  decisions (no ROS 2, no Python re-impl, three-env split: macOS dev /
  Ubuntu CI / Jetson tier2) + step-by-step playbook for next session.

Pre-batch chore commit per implement skill prereq #4 (clean tree
required before AZ-332 commit so the batch diff stays focused).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 09:18:20 +03:00


AZ-332 — Implementation plan (batch 23, cycle 1)

Date created: 2026-05-12 (carry-over from /autodev session 2026-05-12 morning)
Owner: next /autodev invocation starting from Step 7 Implement sub_step compute-next-batch
Scope of this doc: a concrete, in-order playbook for the next session. Reading this + the task spec at _docs/02_tasks/todo/AZ-332_c1_okvis2_strategy.md is sufficient to resume; no other re-discovery is needed.


Why this is its own plan doc

AZ-332 (C1 OKVIS2 production-default VIO) is the first task in this project to require a native C++ build chain (OKVIS2 + pybind11). The previous session researched paths, surfaced blockers, and landed on a decomposition that splits work across three build environments. That decomposition has to survive the session boundary, hence this file.

Decisions locked in the previous session

  1. No ROS 2 layer. colcon build of OKVIS2 produces the same libraries as standalone CMake plus a ROS 2 node we do not need; ROS 2 runtime IPC was rejected at Plan time (_docs/01_solution/solution.md § D-C1-1-SUB-A — "Rejected (cost + latency budget conflict)"). Build with standalone CMake.
  2. No Python re-implementation of OKVIS2. Forbidden by the task spec ("Unacceptable substitutes" section). Pure-Python VIO violates C1-PT-01 ≤ 80 ms p95 budget by construction.
  3. No alternative VIO substitution. Every C++ VIO candidate (OpenVINS, VINS-Mono, Kimera-VIO) has the same compile-on-macOS problem. The only Python-native candidates (DPVO, KLT+RANSAC) are mono-VO only — not drop-ins for a VIO contract. AZ-332 stays OKVIS2.
  4. Three-environment dev split:
    Environment | What runs there | What it gates
    macOS dev | Python facade + binding C++ editing; unit tests using the fake _native.okvis2_binding (task spec explicitly allows this for tests) | AC-1, AC-2, AC-3, AC-4, AC-5, AC-6, AC-7, AC-8, AC-10
    Ubuntu CI runner (ci.yml) | Native CMake build of vendored OKVIS2 + binding .so | Build-passes gate; no AC validation here
    Self-hosted Jetson runner (ci-tier2.yml) | Real-OKVIS2 perf + honest-covariance tests | AC-9 (honest covariance monotonicity); NFR-perf p95 ≤ 80 ms

This split honours the task spec ("real Okvis2Strategy calling real C7 InferenceRuntime with real TRT-compiled DISK engine") because the production binary IS the real binding compiled on Linux/Jetson — only the dev-side unit tests use the fake. The fake never ships to production.

Concrete step-by-step for next session (in order; each step has a stop-and-verify gate)

Step 0 — re-entry sanity check (1 min)

  • Read _docs/_autodev_state.md: confirm step 7 / sub_step compute-next-batch / detail points here.
  • Read this doc fully.
  • Read _docs/02_tasks/todo/AZ-332_c1_okvis2_strategy.md once.
  • git status --porcelain must be empty (implement skill prerequisite).

Step 1 — vendor OKVIS2 and pybind11 as git submodules (5-10 min)

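  • git submodule add --depth 1 https://github.com/smartroboticslab/okvis2.git cpp/okvis2/upstream, followed by git submodule update --init --recursive cpp/okvis2/upstream to pull OKVIS2's nested submodules (git submodule add itself has no --recurse-submodules option).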
    • Note: submodule path is cpp/okvis2/upstream/ (not cpp/okvis2/ directly) so the existing cpp/okvis2/CMakeLists.txt keeps its project-owned role and add_subdirectory(upstream) pulls in OKVIS2.
  • git submodule add --depth 1 https://github.com/pybind/pybind11.git cpp/pybind11/upstream
    • Same pattern: existing cpp/pybind11/ directory keeps the project README; submodule lives at cpp/pybind11/upstream/.
  • Delete the .gitkeep and placeholder README.md from cpp/pybind11/ once the submodule is in place (or keep them; they're harmless either way — pick one and stay consistent).
  • Pin a known-good commit hash for OKVIS2 (record it in this doc under "Pinned upstream versions" once chosen). Recommendation: pin to the latest main HEAD at the time of submodule add and document the commit short-hash here.
  • Gate: git submodule status shows both submodules with a SHA; git status clean except .gitmodules + submodule entries.
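
The Step 1 gate can be checked mechanically. The following is a minimal sketch, assuming the usual `git submodule status` line layout (one status character, the SHA, the path, then an optional describe string). The helper names `parse_submodule_status`, `missing_or_dirty` and `check_repo` are hypothetical and do not exist in the repo.

```python
from __future__ import annotations

import subprocess

# Submodule paths taken from this plan.
EXPECTED_PATHS = ("cpp/okvis2/upstream", "cpp/pybind11/upstream")


def parse_submodule_status(text: str) -> dict[str, tuple[str, str]]:
    """Map submodule path -> (status flag, SHA) from `git submodule status` output."""
    entries: dict[str, tuple[str, str]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # First column: ' ' in sync, '-' not initialized, '+' SHA mismatch, 'U' conflict.
        flag = line[0]
        sha, path = line[1:].split()[:2]
        entries[path] = (flag, sha)
    return entries


def missing_or_dirty(text: str, expected: tuple[str, ...] = EXPECTED_PATHS) -> list[str]:
    """Return the expected submodule paths that are absent or not in sync."""
    entries = parse_submodule_status(text)
    return [path for path in expected if entries.get(path, ("-", ""))[0] != " "]


def check_repo() -> list[str]:
    """Run git in the current directory and report offending submodules."""
    result = subprocess.run(
        ["git", "submodule", "status"],
        check=True, capture_output=True, text=True,
    )
    return missing_or_dirty(result.stdout)
```

An empty list from `check_repo()` would correspond to the gate passing; anything else names the submodule to fix before moving to Step 2.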

Step 2 — write CMake glue (15-30 min)

Files to write:

  • cpp/okvis2/CMakeLists.txt (replace existing placeholder):
    • if(NOT BUILD_OKVIS2) return() endif()
    • add_subdirectory(upstream EXCLUDE_FROM_ALL) with OKVIS2's USE_NN=OFF to drop the LibTorch dep (per Fact #39 — keyframe arch tolerates this).
    • Call find_package for each native dep OKVIS2 needs (Eigen3, Boost, glog, gflags, SuiteSparse, Ceres, OpenCV; each one is an apt package on Ubuntu and a brew formula on macOS).
    • add_subdirectory(${CMAKE_SOURCE_DIR}/cpp/pybind11/upstream pybind11_build).
    • pybind11_add_module(okvis2_binding ${CMAKE_CURRENT_SOURCE_DIR}/../../src/gps_denied_onboard/components/c1_vio/_native/okvis2_binding.cpp) — note path back to Python tree.
    • target_link_libraries(okvis2_binding PRIVATE okvis::Estimator okvis::Common ...) (exact target names from OKVIS2's CMake exports — verify by running cmake --build build --target help | grep okvis once submodule is in).
    • install(TARGETS okvis2_binding DESTINATION ${CMAKE_INSTALL_LIBDIR}/gps_denied_onboard/components/c1_vio/_native/).
  • cpp/pybind11/CMakeLists.txt (replace existing placeholder): can stay nearly empty — pybind11 is included by cpp/okvis2/CMakeLists.txt via add_subdirectory.

The existing top-level cpp/CMakeLists.txt already has add_subdirectory(okvis2) gated on BUILD_OKVIS2 OR BUILD_VINS_MONO OR BUILD_KLT_RANSAC — no change needed there.

Gate: cmake -S . -B build -DBUILD_OKVIS2=OFF succeeds on macOS (no-op build with the flag off). The OFF path is what protects the rest of the build from any of this new wiring.

Step 3 — write the pybind11 binding C++ skeleton (1-2 h)

File: src/gps_denied_onboard/components/c1_vio/_native/okvis2_binding.cpp

Surface needed (mirrors the Python facade's needs — not the full OKVIS2 API):

  • Okvis2Backend class with:
    • ctor from YAML config string + camera intrinsics dict;
    • add_frame(frame_id: str, ns_ts: int, image: ndarray[uint8, H, W, C]) -> bool;
    • add_imu(ns_ts: int, accel: ndarray[float64, 3], gyro: ndarray[float64, 3]) -> None;
    • get_latest_output() -> dict | None (returns frame_id + 4x4 pose matrix + 6x6 covariance + bias + feature_quality dict + emitted_at_ns);
    • reset(body_T_world: ndarray[float64, 4, 4], velocity: ndarray[float64, 3], accel_bias: ndarray[float64, 3], gyro_bias: ndarray[float64, 3]) -> None;
    • health() -> dict (returns {state: str, consecutive_lost: int, bias_norm: float}).
  • Exceptions: every OKVIS / Eigen / std::runtime_error caught inside binding methods and rethrown as a fixed set of Python exceptions registered via py::register_exception — the Python facade then catches those and rewraps into VioError family.
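  • Zero-copy pathway: image is py::array_t<uint8_t, py::array::c_style | py::array::forcecast>, so DISK ingest avoids a copy when the caller passes a C-contiguous uint8 array. With forcecast, any other dtype or memory layout is converted silently, and that conversion does copy; the facade should hand over C-contiguous uint8 frames to stay on the zero-copy path.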
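
The binding surface listed above can be mirrored on the Python side as a structural type, which lets the facade and the test double be type-checked against the same contract. This is a sketch only: the names `Okvis2BackendLike` and `NullBackend` are hypothetical, array arguments are typed as `Any` to keep the example free of a NumPy dependency, and the `"initializing"` state string is a placeholder rather than a value defined by the plan.

```python
from __future__ import annotations

from typing import Any, Optional, Protocol, runtime_checkable


@runtime_checkable
class Okvis2BackendLike(Protocol):
    """Python-side view of the methods the C++ Okvis2Backend is planned to expose."""

    def add_frame(self, frame_id: str, ns_ts: int, image: Any) -> bool: ...
    def add_imu(self, ns_ts: int, accel: Any, gyro: Any) -> None: ...
    def get_latest_output(self) -> Optional[dict]: ...
    def reset(self, body_T_world: Any, velocity: Any,
              accel_bias: Any, gyro_bias: Any) -> None: ...
    def health(self) -> dict: ...


class NullBackend:
    """Do-nothing stand-in that satisfies the surface without native code."""

    def add_frame(self, frame_id: str, ns_ts: int, image: Any) -> bool:
        return True

    def add_imu(self, ns_ts: int, accel: Any, gyro: Any) -> None:
        return None

    def get_latest_output(self) -> Optional[dict]:
        return None  # no estimate has been produced yet

    def reset(self, body_T_world: Any, velocity: Any,
              accel_bias: Any, gyro_bias: Any) -> None:
        return None

    def health(self) -> dict:
        # Keys follow the plan; the state value is a placeholder.
        return {"state": "initializing", "consecutive_lost": 0, "bias_norm": 0.0}
```

Note that `isinstance` against a runtime-checkable Protocol only verifies that the methods exist, not their signatures; mypy covers the signature side.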

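This is a skeleton: full OKVIS2 estimator wiring (estimator setup + callback plumbing) can be a follow-up commit if the skeleton + CI Linux build come back green first. Note that okvis::ThreadedKFVio is the OKVIS 1 class name; OKVIS2 exposes okvis::ThreadedSlam, so confirm the entry point against the pinned commit before wiring it.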

Gate: compiles inside the OKVIS2 CMake target. Tested on Ubuntu CI runner (not macOS).

Step 4 — write the Python facade okvis2.py (1-2 h)

File: src/gps_denied_onboard/components/c1_vio/okvis2.py

  • Okvis2Strategy class implementing the VioStrategy Protocol from interface.py.
  • Lazy import of _native.okvis2_binding inside the constructor body, NOT at module top level. This is the I-5 / Risk-2 mitigation; AZ-331's test_ac5_build_vio_strategy_flag_off_no_import asserts it and MUST still pass.
  • Constructor signature: __init__(self, config: Config, *, fdr_client: FdrClient) — match the AZ-331 factory's call shape exactly. Inside the constructor: build the ImuPreintegrator from helpers.imu_preintegrator.make_imu_preintegrator(calibration); build the Okvis2Backend from the binding; record the strategy label as "okvis2" (frozen per Protocol invariant).
  • Map every backend exception (raised from the C++ binding's registered exception types) to the VioError family — OkvisInitException → VioInitializingError, OkvisFatalException → VioFatalError, OkvisOptimizationException → VioDegradedError (only when transitioning to fatal — the normal degraded path returns a VioOutput with inflated covariance per AZ-331 v1.0.0).
  • process_frame: feed IMU samples to the preintegrator, push frame to backend, read latest output, build the VioOutput DTO using gtsam.Pose3.matrix() round-trip via helpers.se3_utils (AZ-277). Echo frame_id.
  • reset_to_warm_start: tear down + reconstruct Okvis2Backend from the hint; first call must not raise (idempotency invariant per AC-4); seed bias into the preintegrator via preintegrator.reset_with_bias(hint.bias).
  • health_snapshot: pull backend.health() dict and wrap as VioHealth. Track consecutive_lost Python-side because the binding returns "current state" only.
  • current_strategy_label: return the frozen "okvis2".
  • FDR records on state transitions via the injected fdr_client using the kind="vio.health" schema (AZ-272).

Gate: mypy --strict passes against the new file; ruff check passes; isinstance(Okvis2Strategy(...), VioStrategy) returns True without importing the native binding (the check covers the Protocol's structural conformance, not the construction itself, and only works if VioStrategy is decorated with @runtime_checkable).
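
The lazy-import and exception-mapping bullets of Step 4 can be sketched together. This is an illustration under assumptions: the `VioError` classes below are local stand-ins for the project's real family, `LazyOkvis2Strategy` is a hypothetical trimmed-down class, matching on the exception class name is a simplification of catching the binding's registered types, and the fallback to `VioFatalError` for unknown errors is not stated in the plan.

```python
import importlib
import sys

# Module path taken from the plan.
BINDING_MODULE = "gps_denied_onboard.components.c1_vio._native.okvis2_binding"


# Stand-ins for the project's VioError family (assumed shape).
class VioError(Exception):
    pass


class VioInitializingError(VioError):
    pass


class VioFatalError(VioError):
    pass


class VioDegradedError(VioError):
    pass


# Binding exception name -> facade exception, as listed in Step 4.
# The plan restricts the optimization mapping to the transition-to-fatal
# case; that condition is omitted here for brevity.
EXCEPTION_MAP = {
    "OkvisInitException": VioInitializingError,
    "OkvisFatalException": VioFatalError,
    "OkvisOptimizationException": VioDegradedError,
}


def translate_backend_error(exc: BaseException) -> VioError:
    """Rewrap a binding exception; unknown types become VioFatalError (assumption)."""
    target = EXCEPTION_MAP.get(type(exc).__name__, VioFatalError)
    return target(str(exc))


class LazyOkvis2Strategy:
    """Hypothetical facade fragment showing only the import and error path."""

    def __init__(self, yaml_config: str, intrinsics: dict) -> None:
        # The import runs at construction time, never when this module is imported.
        binding = importlib.import_module(BINDING_MODULE)
        try:
            self._backend = binding.Okvis2Backend(yaml_config, intrinsics)
        except Exception as exc:
            raise translate_backend_error(exc) from exc
```

Because the binding is resolved through `sys.modules` at construction time, a test can install a fake module under `BINDING_MODULE` before building the strategy, which is the hook the Step 6 fixture relies on.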

Step 5 — write Okvis2Config (15 min)

File: src/gps_denied_onboard/components/c1_vio/config.py (extend existing — do not duplicate C1VioConfig).

  • Add @dataclass(frozen=True) class Okvis2Config with fields:
    • keyframe_window_size: int = 15 (∈ [10, 20] per D-C5-3);
    • keyframe_parallax_threshold_px: float = 3.0;
    • ransac_inlier_ratio: float = 0.5;
    • max_optimization_iters: int = 4;
    • degraded_feature_threshold: int = 30;
    • per_frame_debug_log: bool = False.
  • __post_init__ validates ranges and raises ConfigError.
  • Register the block under config.components['c1_vio'].okvis2 (sub-block) — keep C1VioConfig as-is at the top level.

Gate: Okvis2Config(keyframe_window_size=9) raises ConfigError; Okvis2Config() defaults pass.
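
A minimal sketch of the config block and its gate follows. The field names and defaults come from the list above, and the [10, 20] window range is the one cited from D-C5-3. The remaining range checks are assumptions chosen for illustration, and `ConfigError` is a local stand-in for the project's own exception.

```python
from dataclasses import dataclass


class ConfigError(ValueError):
    """Stand-in for the project's ConfigError."""


@dataclass(frozen=True)
class Okvis2Config:
    keyframe_window_size: int = 15
    keyframe_parallax_threshold_px: float = 3.0
    ransac_inlier_ratio: float = 0.5
    max_optimization_iters: int = 4
    degraded_feature_threshold: int = 30
    per_frame_debug_log: bool = False

    def __post_init__(self) -> None:
        # Range from the plan (D-C5-3).
        if not 10 <= self.keyframe_window_size <= 20:
            raise ConfigError("keyframe_window_size must be in [10, 20]")
        # The checks below are assumed bounds, not taken from the task spec.
        if self.keyframe_parallax_threshold_px <= 0.0:
            raise ConfigError("keyframe_parallax_threshold_px must be positive")
        if not 0.0 < self.ransac_inlier_ratio <= 1.0:
            raise ConfigError("ransac_inlier_ratio must be in (0, 1]")
        if self.max_optimization_iters < 1:
            raise ConfigError("max_optimization_iters must be >= 1")
        if self.degraded_feature_threshold < 0:
            raise ConfigError("degraded_feature_threshold must be >= 0")
```

Validation in `__post_init__` only reads fields, so it works unchanged with `frozen=True`.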

Step 6 — write unit tests with fake binding (1-2 h)

Files:

  • tests/unit/c1_vio/conftest.py: a fake_okvis2_binding fixture that installs a types.ModuleType at sys.modules['gps_denied_onboard.components.c1_vio._native.okvis2_binding'] with a scriptable Okvis2Backend test double. The test double exposes a script() method that pre-loads a queue of outputs / exceptions; add_frame pops from the queue. This is the "fake pybind11 binding that returns scripted VioOutput payloads" the task spec explicitly allows.
  • tests/unit/c1_vio/test_okvis2_strategy.py: one test per AC (AC-1 through AC-8, AC-10). Use the fake binding fixture. AC-9 and the NFR-perf test are written here too but marked @pytest.mark.tier2 so pytest -m "not tier2" (the macOS dev loop) skips them; ci-tier2.yml picks them up.

Gate: every unit test passes on macOS with pytest -m "not tier2" tests/unit/c1_vio/. Full sweep (pytest tests/) shows the existing 1093 passing + the new tests, with the tier2-marked ones skipped on macOS.
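
The scriptable test double described for conftest.py could look roughly like this. It is a sketch, not the fixture itself: `FakeOkvis2Backend` and `install_fake_binding` are hypothetical names, returning False from `add_frame` on an empty queue is an assumption, and only the methods needed to show the queue mechanics are included.

```python
import sys
import types
from collections import deque

# Module path taken from the plan.
BINDING_MODULE = "gps_denied_onboard.components.c1_vio._native.okvis2_binding"


class FakeOkvis2Backend:
    """Test double whose add_frame replays a pre-loaded queue."""

    def __init__(self, yaml_config: str = "", intrinsics=None) -> None:
        self._queue = deque()
        self._latest = None

    def script(self, *items) -> None:
        """Queue output dicts and/or exception instances, in call order."""
        self._queue.extend(items)

    def add_frame(self, frame_id, ns_ts, image) -> bool:
        if not self._queue:
            return False  # nothing scripted: frame rejected (assumed behaviour)
        item = self._queue.popleft()
        if isinstance(item, Exception):
            raise item
        self._latest = dict(item, frame_id=frame_id)  # echo the frame_id
        return True

    def get_latest_output(self):
        return self._latest


def install_fake_binding(modules=sys.modules) -> types.ModuleType:
    """Register a fake binding module under the real import path."""
    fake = types.ModuleType(BINDING_MODULE)
    fake.Okvis2Backend = FakeOkvis2Backend
    modules[BINDING_MODULE] = fake
    return fake
```

Inside a pytest fixture, `monkeypatch.setitem(sys.modules, BINDING_MODULE, fake)` would be the safer way to register the module, since pytest then removes the entry after each test.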

Step 7 — update .github/workflows/ci.yml to install OKVIS2's Linux deps (5-10 min)

  • In the build matrix's deployment and research kinds, add a step BEFORE cmake -S . -B build:
    - name: Install OKVIS2 native deps
      run: |
        sudo apt-get update
        sudo apt-get install -y --no-install-recommends \
          libeigen3-dev libboost-all-dev libgoogle-glog-dev libgflags-dev \
          libsuitesparse-dev libceres-dev libopencv-dev
    
  • Toggle BUILD_OKVIS2 to ON in the deployment kind's cmake_flags (default config in solution.md says OKVIS2 is the production-default; the deployment matrix kind should enforce this).
  • The research kind already has BUILD_VINS_MONO=ON; leave BUILD_OKVIS2=ON there too.

Gate: push branch; GitHub Actions Ubuntu runner completes the cmake --build build --parallel step. If OKVIS2's CMake export targets have a different name than okvis::Estimator / okvis::Common, the failure surfaces here and Step 2's target_link_libraries is patched. This is the only build-system feedback loop we get pre-Jetson — exploit it.

Step 8 — AC coverage verification + code review (15-30 min)

  • Verify every AC of AZ-332 maps to at least one test (skipped-with-reason counts as covered per implement skill Step 8).
  • Invoke /code-review skill on the batch's changed files. Expected verdict: PASS or PASS_WITH_WARNINGS. Auto-fix or escalate per implement skill Step 10.
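
The AC-to-test mapping check can be approximated from test names alone. The sketch below assumes the new tests follow the `test_ac<N>_...` naming seen in AZ-331's test_ac5_build_vio_strategy_flag_off_no_import and that AZ-332 has ten ACs (AC-1 through AC-10). The function name `uncovered_acs` is hypothetical.

```python
import re

# Assumed naming convention: test_ac<N>_<description>.
AC_TEST_PATTERN = re.compile(r"test_ac(\d+)_")


def uncovered_acs(test_names, ac_count: int = 10) -> list:
    """Return the AC numbers in 1..ac_count that no test name refers to."""
    covered = set()
    for name in test_names:
        match = AC_TEST_PATTERN.match(name)
        if match:
            covered.add(int(match.group(1)))
    return sorted(set(range(1, ac_count + 1)) - covered)
```

Tier2-marked tests still count here because the check looks at names, not at whether the test ran, which matches the "skipped-with-reason counts as covered" rule.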

Step 9 — commit (5 min)

  • One commit per implement skill Step 11: [AZ-332] C1 Okvis2Strategy: pybind11 binding skeleton + Python facade + fake-backend tests.
  • Body of commit message documents the three-environment split (macOS dev / Ubuntu CI / Jetson tier2) and notes that AC-9 + NFR-perf are tier2-gated.

Step 10 — tracker + archive + batch report (5 min)

  • Jira: AZ-332 In Progress → In Testing.
  • Move _docs/02_tasks/todo/AZ-332_c1_okvis2_strategy.md → _docs/02_tasks/done/.
  • Write _docs/03_implementation/batch_23_cycle1_report.md with the standard report shape. Include the tier2-deferred AC-9 + NFR-perf items under "Deferred to tier2 CI".
  • Update _docs/_autodev_state.md: sub_step → next batch detection.

Files to be created / modified (summary)

Created:

  • cpp/okvis2/upstream/ (git submodule)
  • cpp/pybind11/upstream/ (git submodule)
  • src/gps_denied_onboard/components/c1_vio/_native/okvis2_binding.cpp
  • src/gps_denied_onboard/components/c1_vio/okvis2.py
  • tests/unit/c1_vio/conftest.py
  • tests/unit/c1_vio/test_okvis2_strategy.py
  • _docs/03_implementation/batch_23_cycle1_report.md

Modified:

  • cpp/okvis2/CMakeLists.txt (replace placeholder)
  • cpp/pybind11/CMakeLists.txt (replace placeholder; can stay minimal)
  • src/gps_denied_onboard/components/c1_vio/config.py (add Okvis2Config)
  • .github/workflows/ci.yml (add apt-get step; flip BUILD_OKVIS2=ON in deployment kind)
  • .gitmodules (auto-edited by submodule add)
  • _docs/_autodev_state.md
  • _docs/02_tasks/todo/AZ-332_c1_okvis2_strategy.md (moved to done/)

Tier2 deliverables (NOT this session — explicit follow-up)

AC-9 (honest covariance monotonicity) and the NFR-perf test (process_frame p95 ≤ 80 ms on Tier-2) require real OKVIS2 + Derkachi-class fixture footage on the actual Jetson hardware. They are:

  • Written in test_okvis2_strategy.py marked @pytest.mark.tier2.
  • Skipped on macOS dev + GitHub Actions Linux runner.
  • Picked up by ci-tier2.yml on push to stage or main.
  • A remediation task (AZ-332_tier2_validation) is OPTIONAL — could be tracked separately or rolled into the deferred Jetson MVE phase that D-C1-2 already scheduled. Pick at session-start time.

Pinned upstream versions

Fill in once Step 1 is executed:

  • cpp/okvis2/upstream — commit hash: TBD; OKVIS2 main branch HEAD at <date>
  • cpp/pybind11/upstream — commit hash: TBD; pybind11 stable release tag <version>

When this doc can be deleted

After AZ-332 lands and the next batch is in flight, this file is historical context. Move to _docs/_archive/ (or delete if _archive doesn't exist) once Jetson tier2 CI has been green at least once on a real OKVIS2 run.