gps-denied-onboard/_docs/02_tasks/todo/AZ-943_okvis2_threadedslam_binding.md
Oleksandr Bezdieniezhnykh 05fcacffa3 [AZ-943] [AZ-951] [AZ-952] Move OKVIS2 chain back to todo/ as next phase
Per user 2026-05-29 directive: "OKVIS2-related tasks needed to be
implemented after full e2e derkachi flight test would be finished
successfully. So maybe put it back to todo?"

Reasoning accepted. OKVIS2 chain is the planned NEXT phase after
the cycle-4 Derkachi demo lands, not a cycle-5+ deferral. The
2026-05-27 production-default pivot directive remains in force;
today's earlier "deferred to cycle-5+" framing was over-correction
after the AZ-943 spec-reality gap.

- AZ-943 stays HARD-BLOCKED on AZ-951 + AZ-952 (PAUSED preamble
  preserved). Cannot be worked on until both blockers land. Moving
  to todo/ signals "queued, next-after-blockers", not "actionable
  now".
- AZ-951 + AZ-952 are themselves NOT blocked. They ship the
  upstream patches that unblock AZ-943.

Implementation sequence (unchanged): finish cycle-4 demo (AZ-959
+ remaining CSV-replay path) → AZ-951 → AZ-952 → AZ-943 → AZ-944
→ AZ-945. Current implement-batch target stays AZ-959; this
commit is bookkeeping only, does not change what's next on deck.

Touches: 3 file moves (backlog/ → todo/), dep-table preamble
fourth bump narrative documenting the placement reversal.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-29 12:34:32 +03:00


C1 OKVIS2 Binding — Real ThreadedSlam Wiring (AZ-592 split 1/3)

STATUS (2026-05-29): PAUSED — BLOCKED on AZ-951 + AZ-952.

Implementation attempt on 2026-05-29 confirmed AC-4 is structurally unreachable without upstream OKVIS2 patches:

  • ThreadedSlam::estimator_ is private (not protected) → in-binding subclass workaround proposed in Implementation Notes "approach (a)" is impossible.
  • ViSlamBackend has no public accessor for 6×6 pose covariance, feature counts, mean parallax, or MRE.
  • TrackingState (callback arg) only carries id / isKeyframe / TrackingQuality enum / recognisedPlace / isFullGraphOptimising / currentKeyframeId — none of the AC-4 telemetry fields.

The "approach (b) upstream patch" fallback documented in this file + AZ-592 has been filed as two sibling tickets and linked as "is blocked by" against AZ-943:

  • AZ-951 (3 SP): upstream patch — expose 6×6 pose covariance accessor (+ ADR for pin deviation).
  • AZ-952 (3 SP): upstream patch — expose tracking-stats accessor (feature counts + parallax + MRE).

Jira AZ-943 reverted to To Do. This local file moved from todo/ → backlog/. The AC list + Implementation Notes below are PRESERVED unchanged for audit; once AZ-951 + AZ-952 land, AC-4 implementation will call backend().computeCovariance6x6(state.id) + backend().getLatestTrackingStats(state.id, ...) and the file moves back to todo/.

Audit reference: AZ-943 Jira comment "Implementation paused: spec gap discovered (2026-05-29)" — full root-cause + decision rationale.

Task: AZ-943_okvis2_threadedslam_binding
Name: OKVIS2 binding: replace AZ-332 skeleton with real okvis::ThreadedSlam wiring
Description: Sub-ticket 1 of 3 from the AZ-592 placeholder split (per state file 2026-05-27 split rationale). Replaces the AZ-332 skeleton in src/gps_denied_onboard/components/c1_vio/_native/okvis2_binding.cpp (_build_estimator() no-op, _drive_estimator() raises OkvisFatalException) with the real okvis::ThreadedSlam v2 pipeline: ViParametersReader(yaml).getParameters(...) → ThreadedSlam(parameters, dBowDir) → setOptimisedGraphCallback(...). Without this wiring, Okvis2Strategy (AZ-332) is the production-default per architecture but throws on first add_frame — the production VIO is unusable. CI build env + Jetson validation are tracked in sibling tickets AZ-944 (3pt, Linux CI + DBoW2 vocab + Tier-1 smoke) and AZ-945 (3pt, Jetson L4T + Tier-2 Derkachi e2e); the Blocks chain in Jira is AZ-943 → AZ-944 → AZ-945. This ticket touches ONLY the C++ binding and the Python facade fake-binding fixture; it does NOT flip BUILD_OKVIS2=ON in CI (that's AZ-944's deliverable).
Complexity: 5 points
Dependencies: AZ-332 (the AZ-332 skeleton this replaces; in done/), AZ-592 (parent umbrella placeholder; in backlog/)
Component: c1_vio (epic AZ-254 / E-C1)
Tracker: AZ-943 (https://denyspopov.atlassian.net/browse/AZ-943)
Epic: AZ-254 (E-C1)

Document Dependencies

  • _docs/02_document/contracts/c1_vio/vio_strategy_protocol.md — the Protocol the strategy implements (AZ-331).
  • _docs/02_document/components/01_c1_vio/description.md — § 5 implementation details (sliding-window K=10–20, per-frame cost), § 7 caveats (thermal throttle latency spikes).
  • ADR-002 (KLT/RANSAC mandatory baseline) — explains why this OKVIS2 wiring does NOT replace KLT/RANSAC; both ship.
  • cpp/okvis2/upstream/ — fully-populated v2 source tree the binding links against.

Problem

src/gps_denied_onboard/components/c1_vio/_native/okvis2_binding.cpp is the AZ-332 skeleton:

  • _build_estimator() (line ~251) sets estimator_built_ = false and does nothing else.
  • _drive_estimator() (line ~261) throws OkvisFatalException("OKVIS2 estimator not yet wired — this binding is the AZ-332 skeleton; tier2 follow-up wires okvis::ThreadedKFVio") on first frame.
  • Real OKVIS2 includes (#include <okvis/ThreadedKFVio.hpp> etc.) are commented out at lines ~48–50.

Without this wiring, Okvis2Strategy cannot produce any output — the Python facade is complete, the binding compiles and loads, but the first add_frame immediately raises. The production-default VIO is unusable.

API correction since AZ-332: OKVIS2 v2 upstream uses okvis::ThreadedSlam (NOT okvis::ThreadedKFVio as the AZ-332 spec referenced; that's the OKVIS v1 API). The wiring must follow v2 conventions:

okvis::ViParametersReader(yaml_path).getParameters(parameters);
auto estimator = std::make_unique<okvis::ThreadedSlam>(parameters, dBowDir);
estimator->setOptimisedGraphCallback([this](auto&& g, auto&& l, auto&& s) { ... });

Outcome

  • Okvis2Strategy.add_frame(...) produces a real VioOutput (pose + 6×6 covariance + biases + tracking-quality counts) on every keyframe the OKVIS2 backend optimises — no exceptions on the first frame.
  • Okvis2Strategy.reset(...) tears down the C++ estimator and rebuilds it with the supplied seed pose/velocity/bias.
  • Existing Python unit tests (tests/unit/c1_vio/test_okvis2_strategy.py) remain green against the unchanged fake-binding fixture (tests/unit/c1_vio/conftest.py).
  • This ticket alone does NOT light up the Tier-1 or Tier-2 e2e path against real OKVIS2 — that's AZ-944 / AZ-945. Tier-1 unit suite stays the only green-bar evidence here.

Scope

Included

  • Rewrite _build_estimator() to construct a real okvis::ThreadedSlam from yaml_config_ via okvis::ViParametersReader. The DBoW2 vocabulary directory comes from a CMake-defined preprocessor constant (vocab artifact provisioning is AZ-944's scope; this ticket only consumes the path).
  • Rewrite _drive_estimator() to convert py::array_t<uint8_t> → cv::Mat (zero-copy preferred) and call estimator_->addImages(stamp, {0: cv_mat}). Returns true iff the optimised-graph callback fired for this frame's keyframe.
  • Wire add_imu(ts_ns, accel, gyro) through estimator_->addImuMeasurement(stamp, alpha, omega). Keep the existing strict-monotonic guard on the binding side (line ~161).
  • Implement the setOptimisedGraphCallback(...) lambda: fill latest_output_ under output_mtx_ with pose_T_world_body (Eigen::Matrix4d), pose_covariance_6x6 (extracted from ViSlamBackend marginalised block — see Implementation Notes), accel_bias / gyro_bias, tracked / new / lost feature counts, mean_parallax, mre_px, emitted_at_ns.
  • Map okvis::TrackingQuality → HealthState: Good → Tracking, Marginal → Degraded, Lost → Lost. Update state_ inside the callback before latest_output_ is filled.
  • Rewrite reset() to release the existing estimator and reconstruct via _build_estimator(); apply the seed pose/velocity/bias to the new instance.
  • Catch all OKVIS2 / Eigen / std::runtime_error inside the binding and rethrow as OkvisInitException (during construction), OkvisOptimizationException (during operation), or OkvisFatalException (irrecoverable). No raw exceptions cross into Python.
  • Uncomment the OKVIS2 #include block (lines ~48–50) and verify the _build_estimator / _drive_estimator paths compile cleanly under BUILD_OKVIS2=ON on a developer machine that has the apt deps. CI green-bar is AZ-944, not this ticket.

Excluded

  • CI apt deps and BUILD_OKVIS2=ON flip in Dockerfile.test.jetson / Linux runners — that's AZ-944's deliverable. This ticket leaves the CI build off; the C++ change rides as compile-clean only on hosts that already provision the deps (or after AZ-944 lands).
  • Jetson L4T image build + Tier-2 Derkachi e2e (--vio-strategy okvis2) — that's AZ-945's deliverable.
  • DBoW2 small_voc artifact provisioning — sibling decision in AZ-944 (vendor in-tree vs. download-on-build vs. build-from-source). This ticket consumes whatever path the CMake constant resolves to.
  • AZ-332 skeleton's surface decisions — exception types, latest_output_ struct fields, py::dict shape — settled by AZ-332. This ticket does not change them.
  • Multi-camera support — single nav-camera per RESTRICT-UAV-3 / AZ-332.
  • OKVIS2 upstream source modifications — pin is fixed per AZ-332 Plan-phase; deviations require an ADR. The covariance side-channel approach (Implementation Notes) is intentionally chosen to avoid upstream patching.

Acceptance Criteria

AC-1: Real estimator construction
Given yaml_config_ is a valid OKVIS2 v2 YAML config and the DBoW2 vocab path resolves
When _build_estimator() runs
Then it constructs an okvis::ThreadedSlam instance via okvis::ViParametersReader and stores it in estimator_ (no longer nullptr); estimator_built_ is true; no exception thrown.

AC-2: Frame ingestion drives the estimator
Given _drive_estimator() receives a py::array_t<uint8_t> of shape (H, W) (mono camera per RESTRICT-UAV-3) with a valid stamp_ns
When the function runs
Then it converts the array to cv::Mat (zero-copy preferred) and calls estimator_->addImages(stamp, {0: cv_mat}). Returns true iff the optimised-graph callback fired for this frame's keyframe within the configured timeout.
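The "returns true iff the callback fired within the configured timeout" clause of AC-2 can be sketched with a condition variable. This is a minimal, standard-library-only illustration, not the binding's actual code: KeyframeSignal, notify and wait_for are hypothetical names, and treating a callback for a newer stamp as satisfying the wait is an assumption.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical helper (not part of OKVIS2 or the AZ-332 skeleton): records the
// newest stamp reported by the optimised-graph callback and lets the
// frame-ingestion path wait for it with a deadline.
class KeyframeSignal {
 public:
  // Called from the callback thread once a keyframe at `stamp_ns` is optimised.
  void notify(std::int64_t stamp_ns) {
    {
      std::lock_guard<std::mutex> lock(mtx_);
      if (stamp_ns > latest_stamp_ns_) latest_stamp_ns_ = stamp_ns;
    }
    cv_.notify_all();
  }

  // Returns true iff a callback for `stamp_ns` (or a newer stamp, by
  // assumption) arrived before `timeout` elapsed.
  bool wait_for(std::int64_t stamp_ns, std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mtx_);
    return cv_.wait_for(lock, timeout,
                        [&] { return latest_stamp_ns_ >= stamp_ns; });
  }

 private:
  std::mutex mtx_;
  std::condition_variable cv_;
  std::int64_t latest_stamp_ns_ = -1;  // -1 means "no callback seen yet"
};
```

In such a design, _drive_estimator() would call wait_for(stamp_ns, timeout) after addImages and return its result, while the callback lambda would call notify with the optimised keyframe's stamp.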

AC-3: IMU forwarding
Given add_imu(ts_ns, accel, gyro) is called with strictly-monotonic timestamps
When the function runs
Then it forwards (stamp, alpha, omega) to estimator_->addImuMeasurement(...). The existing strict-monotonic guard (binding-side, line ~161) is preserved.
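For readers unfamiliar with the guard AC-3 refers to, the check can be sketched as follows. This is only an illustration of a strict-monotonic timestamp guard: MonotonicImuGuard is a hypothetical name, and std::invalid_argument stands in for whatever exception type the existing binding raises.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for the binding-side guard; the real one lives in
// okvis2_binding.cpp and its exception type is settled by AZ-332.
class MonotonicImuGuard {
 public:
  // Accepts `ts_ns` only if it is strictly greater than the last accepted
  // timestamp; otherwise throws and leaves the stored timestamp unchanged.
  void check(std::int64_t ts_ns) {
    if (has_last_ && ts_ns <= last_ts_ns_) {
      throw std::invalid_argument("IMU timestamp is not strictly increasing");
    }
    last_ts_ns_ = ts_ns;
    has_last_ = true;
  }

 private:
  bool has_last_ = false;
  std::int64_t last_ts_ns_ = 0;
};
```

Running the check before forwarding to addImuMeasurement keeps a rejected sample from ever reaching the estimator.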

AC-4: Optimised-graph callback fills latest_output_
Given estimator_->setOptimisedGraphCallback(...) is wired with the binding's lambda
When the OKVIS2 backend optimises a keyframe
Then latest_output_ is filled under output_mtx_ with: pose_T_world_body (Eigen::Matrix4d), pose_covariance_6x6, accel_bias, gyro_bias, tracked_count / new_count / lost_count, mean_parallax, mre_px, emitted_at_ns. The 6×6 covariance is extracted from the ViSlamBackend marginalised block (see Implementation Notes for approach).

AC-5: Health-state mapping
Given okvis::TrackingQuality is one of {Good, Marginal, Lost}
When the callback fires
Then state_ updates to {Tracking, Degraded, Lost} respectively, BEFORE latest_output_ is filled, so a concurrent reader sees consistent state+output.
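The mapping and ordering in AC-5 can be illustrated with stand-in types. The enums below only mirror the enumerator names given in this ticket; OutputSlot, publish and read are hypothetical, the output is reduced to a single timestamp field, and the initial Lost state and the fallback for unknown values are assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <utility>

// Stand-in enums: the real ones are okvis::TrackingQuality and the binding's
// HealthState. Enumerator names mirror AC-5 only.
enum class TrackingQuality { Good, Marginal, Lost };
enum class HealthState { Tracking, Degraded, Lost };

inline HealthState to_health_state(TrackingQuality q) {
  switch (q) {
    case TrackingQuality::Good:     return HealthState::Tracking;
    case TrackingQuality::Marginal: return HealthState::Degraded;
    case TrackingQuality::Lost:     return HealthState::Lost;
  }
  return HealthState::Lost;  // Assumption: unknown values are treated as Lost.
}

// Hypothetical, trimmed-down output slot. State and output are written under
// one lock, state first, so a reader holding the same lock never observes a
// new output paired with a stale state.
class OutputSlot {
 public:
  void publish(TrackingQuality q, std::int64_t stamp_ns) {
    std::lock_guard<std::mutex> lock(mtx_);
    state_ = to_health_state(q);  // state is updated first
    emitted_at_ns_ = stamp_ns;    // then the output fields
  }

  std::pair<HealthState, std::int64_t> read() {
    std::lock_guard<std::mutex> lock(mtx_);
    return {state_, emitted_at_ns_};
  }

 private:
  std::mutex mtx_;
  HealthState state_ = HealthState::Lost;  // Assumption: Lost until first keyframe.
  std::int64_t emitted_at_ns_ = 0;
};
```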

AC-6: Reset rebuilds with seed
Given an active Okvis2Strategy with a built estimator
When reset(seed_pose, seed_velocity, seed_bias) is called
Then the existing estimator is released (C++ resources freed), _build_estimator() reconstructs a fresh instance, and the seed is applied via OKVIS2's setSeedFromPriors(...) (or equivalent) before the next add_frame.

AC-7: Exception translation
Given an OKVIS2-internal exception, an Eigen exception, or a std::runtime_error is raised inside the binding
When the binding catches it
Then it is rethrown as one of: OkvisInitException (if raised from _build_estimator), OkvisOptimizationException (if raised from _drive_estimator / add_imu), OkvisFatalException (if the backend signals irrecoverable failure). No raw C++ exception crosses the pybind11 boundary.
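One way to meet AC-7 without repeating try/catch in every entry point is a small wrapper. The sketch below is an assumption-laden illustration: the three exception structs are stand-ins for the AZ-332 types (whose real definitions live in the binding), and translate_exceptions is a hypothetical helper name.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>

// Stand-ins for the AZ-332 exception types.
struct OkvisInitException : std::runtime_error {
  using std::runtime_error::runtime_error;
};
struct OkvisOptimizationException : std::runtime_error {
  using std::runtime_error::runtime_error;
};
struct OkvisFatalException : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Runs `fn` and rethrows anything it throws as `Translated`. Exceptions that
// are already one of the binding types pass through unchanged, so a fatal
// error is not downgraded by an outer wrapper.
template <typename Translated, typename Fn>
auto translate_exceptions(Fn&& fn) -> decltype(fn()) {
  try {
    return std::forward<Fn>(fn)();
  } catch (const OkvisInitException&) {
    throw;
  } catch (const OkvisOptimizationException&) {
    throw;
  } catch (const OkvisFatalException&) {
    throw;
  } catch (const std::exception& e) {
    throw Translated(e.what());
  } catch (...) {
    throw Translated("unknown non-standard exception");
  }
}
```

Under this sketch, _build_estimator() would wrap its body in translate_exceptions<OkvisInitException>(...), and _drive_estimator() / add_imu would use OkvisOptimizationException.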

AC-8: Python unit tests stay green against the fake binding
Given the fake-binding fixture at tests/unit/c1_vio/conftest.py is unchanged
When pytest tests/unit/c1_vio/test_okvis2_strategy.py -v --tb=short runs (Tier-1)
Then all pre-existing unit tests pass with no behavioural change. The fake-binding contract is unchanged — only the real C++ side gets wired.

Implementation Notes

Headers needed

  • okvis/ThreadedSlam.hpp — v2 SLAM front-end + back-end coordinator (replaces v1's ThreadedKFVio).
  • okvis/ViParametersReader.hpp — YAML config loader.
  • okvis/Estimator.hpp — back-end (needed for the covariance side-channel access).
  • okvis/cameras/PinholeCamera.hpp — K-matrix → OKVIS camera-object conversion if the binding constructs cameras directly (otherwise the YAML carries them).

6×6 covariance extraction — the known unknown

The setOptimisedGraphCallback payload (ViGraph snapshot) does NOT carry the latest-pose covariance directly; covariance lives inside the Estimator's back-end. Two approaches:

  • (a) Side-channel accessor (preferred for first cut): inside the callback, take a non-const handle to estimator_->backend() (or equivalent) and read the marginalised 6×6 block for the latest pose state. Keep the read protected by output_mtx_. If OKVIS2 v2 marks the back-end accessor private, fall back to subclassing ThreadedSlam and exposing a thin protected getter — still in our binding, no upstream change.
  • (b) Tiny upstream patch: add a public latestPoseCovariance6x6() method to okvis::ViSlamBackend and submit upstream. Faster diff but requires a pin bump + ADR per AZ-332 Plan-phase. Defer to (b) only if (a) hits a hard private-field block.

Pick (a) for the first cut. If (a) requires a subclass-exposed getter, document the subclass in a code comment referencing this AC and AZ-943.

cpp/okvis2/CMakeLists.txt already declares the link targets at lines ~64–73: okvis_ceres, okvis_frontend, okvis_multisensor_processing, okvis_kinematics, okvis_cv, okvis_common, okvis_time, okvis_util. The _drive_estimator function needs okvis_cv for the cv::Mat integration. No new targets to add — verify the linker pulls them in cleanly under BUILD_OKVIS2=ON.

pybind11 surface — DO NOT change

The pybind11 module shape (lines ~296–318) is correct and the Python facade unit tests confirm it. Do NOT alter the surface — add_frame, add_imu, reset, and the result struct fields stay byte-compatible with the fake binding. Only the C++ implementations behind those symbols change.

DBoW2 vocab path

Define a CMake preprocessor constant (e.g. OKVIS2_DBOW2_VOCAB_DIR) that points to a path the runtime can resolve. AZ-944 will populate this path with the small-vocabulary artifact (decision: vendor in-tree vs. download-on-build vs. build-from-source). For this ticket: declare the constant, consume it, and document the expected file layout (e.g. ${OKVIS2_DBOW2_VOCAB_DIR}/small_voc.yml.gz or similar) in a code comment referencing AZ-944.
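Consuming such a constant on the C++ side might look like the sketch below. It assumes CMake injects OKVIS2_DBOW2_VOCAB_DIR as a quoted string (for example through target_compile_definitions); the fallback "/path/to/vocab" is a placeholder so the snippet compiles on its own, dbow2_vocab_file is a hypothetical helper, and the small_voc.yml.gz file name is only the example layout mentioned above, which AZ-944 may change.

```cpp
#include <cassert>
#include <string>

// Normally injected by the build system as a quoted string literal.
// The fallback below is a placeholder, not a real deployment path.
#ifndef OKVIS2_DBOW2_VOCAB_DIR
#define OKVIS2_DBOW2_VOCAB_DIR "/path/to/vocab"
#endif

// Expected layout (assumption, to be confirmed by AZ-944):
//   ${OKVIS2_DBOW2_VOCAB_DIR}/small_voc.yml.gz
inline std::string dbow2_vocab_file(
    const std::string& dir = OKVIS2_DBOW2_VOCAB_DIR) {
  const std::string file = "small_voc.yml.gz";
  // Avoid a doubled separator when the directory already ends in '/'.
  if (dir.empty() || dir.back() == '/') return dir + file;
  return dir + "/" + file;
}
```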

Build verification

Compile-clean evidence on a host with apt deps installed (developer Mac with brew install ... equivalents OR a Linux dev VM with apt deps):

BUILD_OKVIS2=ON cmake -S . -B build && cmake --build build --target c1_vio_okvis2_native

Should produce the .so. Capture the build log in the batch report. The _native/__init__.py Python-side import test then confirms the symbol is loadable (without running OKVIS2 — just loading the shared object).

Constraints

  • Pin: OKVIS2 v2 upstream pin from AZ-332 Plan-phase is fixed. Any deviation requires an ADR.
  • No upstream patches unless approach (a) for covariance fails and is documented in a comment + retro entry.
  • Single nav-camera per RESTRICT-UAV-3 — multi-camera ingestion is out of scope.
  • No CI flip: this ticket leaves BUILD_OKVIS2=OFF in Dockerfile.test.jetson / Linux CI runners. AZ-944 owns the flip.
  • Backward compatibility: Python facade fake-binding tests stay green with no fixture changes.

Unit Tests

| AC Ref | What to Test | Required Outcome |
|--------|--------------|------------------|
| AC-1 | C++ unit (gtest) — construct Okvis2Binding with a known-good YAML, assert estimator_built_ is true and no exception thrown | Pass on a host with apt deps installed |
| AC-2 | C++ unit — feed a synthetic cv::Mat via the C++ side, assert addImages is called once and the optimised-graph callback fires | Pass |
| AC-4 | C++ unit — drive a short EuRoC-like image+IMU sequence, assert latest_output_.pose_covariance_6x6 is non-zero finite SPD | Pass; eigvals all > 0 |
| AC-7 | C++ unit — feed a known-bad YAML; assert OkvisInitException propagates with non-empty what() | Pass |
| AC-8 | Python — pytest tests/unit/c1_vio/test_okvis2_strategy.py -v --tb=short | All pre-existing tests still pass (uses fake binding, no real OKVIS2) |
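The AC-4 row asks for a non-zero, finite, SPD covariance. A dependency-free way to check that in a test is a Cholesky factorisation, which succeeds exactly when a symmetric matrix is positive definite (equivalently, all eigenvalues are > 0). The sketch below uses std::array in place of the binding's actual matrix type; Cov6, is_finite_spd and the symmetry tolerance are illustrative assumptions. A real test would more likely use Eigen directly.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

using Cov6 = std::array<std::array<double, 6>, 6>;

// Returns true iff `c` is finite, symmetric within `tol`, and positive
// definite (checked by attempting a Cholesky factorisation c = L * L^T).
inline bool is_finite_spd(const Cov6& c, double tol = 1e-9) {
  for (std::size_t i = 0; i < 6; ++i) {
    for (std::size_t j = 0; j < 6; ++j) {
      if (!std::isfinite(c[i][j])) return false;
      if (std::fabs(c[i][j] - c[j][i]) > tol) return false;
    }
  }
  Cov6 l{};  // lower-triangular factor, zero-initialised
  for (std::size_t i = 0; i < 6; ++i) {
    for (std::size_t j = 0; j <= i; ++j) {
      double sum = c[i][j];
      for (std::size_t k = 0; k < j; ++k) sum -= l[i][k] * l[j][k];
      if (i == j) {
        if (sum <= 0.0) return false;  // non-positive pivot: not SPD
        l[i][i] = std::sqrt(sum);
      } else {
        l[i][j] = sum / l[j][j];
      }
    }
  }
  return true;
}
```

An all-zero matrix fails on the first pivot, so the "non-zero" part of the requirement is covered by the same check.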

C++ unit tests live under cpp/okvis2/tests/ (or wherever the existing OKVIS2 test layout sits — confirm during implementation; if no harness exists, add a minimal one and document in the batch report).

References

  • Jira ticket: AZ-943 (parent split AZ-592)
  • Sibling Jira tickets (Blocks chain AZ-943 → AZ-944 → AZ-945):
    • AZ-944 (3pt, Linux CI build env + DBoW2 vocab artifact + Tier-1 EuRoC mini smoke)
    • AZ-945 (3pt, Jetson L4T build + Tier-2 Derkachi --vio-strategy okvis2 e2e + perf baseline)
  • AZ-332 spec (the skeleton this replaces): _docs/02_tasks/done/AZ-332_c1_okvis2_strategy.md
  • ADR-002 (KLT/RANSAC mandatory baseline; OKVIS2 is the production-default architectural target)
  • cpp/okvis2/upstream/ (v2 source tree)
  • _docs/_autodev_state.md (resume context: Out-of-band bugfix cycle 94d2358 already committed; AZ-942 / AZ-923 parked; AZ-943→AZ-944→AZ-945 split rationale)