[AZ-943] Import OKVIS2 binding spec + dep table; autodev state trim

Imports AZ-943 (OKVIS2 binding: real ThreadedSlam wiring; AZ-592 split
1/3, 5pt) from Jira into a local task spec at
_docs/02_tasks/todo/AZ-943_okvis2_threadedslam_binding.md so the
implement skill batch loop has the input it needs.

Dependency table: +AZ-943 row, +preamble entry, totals 180→181 tasks /
570→575 SP. AZ-944 + AZ-945 stay Jira-only this session per the
AZ-943→AZ-944→AZ-945 Blocks chain (their local specs land when their
Implement turns come up).

State file trimmed from 52 lines to schema-compliant 13 lines per
.cursor/skills/autodev/state.md (sub_step.detail must be a one-line
pointer, not a logbook). Resume context lives in the new task spec +
git log of 94d2358 (AZ-918..AZ-922 baseline fixes).

Per AZ-942 + AZ-923 are parked (state file's "Open Items At Pause" is
recorded in git log via this commit's body; not retained in state file
going forward).

Co-authored-by: Cursor <cursoragent@cursor.com>
This commit is contained in:
Oleksandr Bezdieniezhnykh
2026-05-29 10:50:18 +03:00
parent 94d2358c8b
commit e367b07e3b
3 changed files with 184 additions and 6 deletions
File diff suppressed because one or more lines are too long
@@ -0,0 +1,177 @@
# C1 OKVIS2 Binding — Real ThreadedSlam Wiring (AZ-592 split 1/3)
**Task**: AZ-943_okvis2_threadedslam_binding
**Name**: OKVIS2 binding: replace AZ-332 skeleton with real `okvis::ThreadedSlam` wiring
**Description**: Sub-ticket 1 of 3 from the AZ-592 placeholder split (per state file 2026-05-27 split rationale). Replaces the AZ-332 skeleton in `src/gps_denied_onboard/components/c1_vio/_native/okvis2_binding.cpp` (`_build_estimator()` no-op, `_drive_estimator()` raises `OkvisFatalException`) with the real `okvis::ThreadedSlam` v2 pipeline: `ViParametersReader(yaml).getParameters(...)``ThreadedSlam(parameters, dBowDir)``setOptimisedGraphCallback(...)`. Without this wiring, `Okvis2Strategy` (AZ-332) is the production-default per architecture but throws on first `add_frame` — the production VIO is unusable. CI build env + Jetson validation are tracked in sibling tickets AZ-944 (3pt, Linux CI + DBoW2 vocab + Tier-1 smoke) and AZ-945 (3pt, Jetson L4T + Tier-2 Derkachi e2e); the Blocks chain in Jira is AZ-943 → AZ-944 → AZ-945. This ticket touches ONLY the C++ binding and the Python facade fake-binding fixture; it does NOT flip `BUILD_OKVIS2=ON` in CI (that's AZ-944's deliverable).
**Complexity**: 5 points
**Dependencies**: AZ-332 (the AZ-332 skeleton this replaces; in `done/`), AZ-592 (parent umbrella placeholder; in `backlog/`)
**Component**: c1_vio (epic AZ-254 / E-C1)
**Tracker**: AZ-943 (https://denyspopov.atlassian.net/browse/AZ-943)
**Epic**: AZ-254 (E-C1)
### Document Dependencies
- `_docs/02_document/contracts/c1_vio/vio_strategy_protocol.md` — the Protocol the strategy implements (AZ-331).
- `_docs/02_document/components/01_c1_vio/description.md` — § 5 implementation details (sliding-window K=1020, per-frame cost), § 7 caveats (thermal throttle latency spikes).
- ADR-002 (KLT/RANSAC mandatory baseline) — explains why this OKVIS2 wiring does NOT replace KLT/RANSAC; both ship.
- `cpp/okvis2/upstream/` — fully-populated v2 source tree the binding links against.
## Problem
`src/gps_denied_onboard/components/c1_vio/_native/okvis2_binding.cpp` is the AZ-332 skeleton:
- `_build_estimator()` (line ~251) sets `estimator_built_ = false` and does nothing else.
- `_drive_estimator()` (line ~261) throws `OkvisFatalException("OKVIS2 estimator not yet wired — this binding is the AZ-332 skeleton; tier2 follow-up wires okvis::ThreadedKFVio")` on first frame.
- Real OKVIS2 includes (`#include <okvis/ThreadedKFVio.hpp>` etc.) are commented out at lines ~4850.
Without this wiring, `Okvis2Strategy` cannot produce any output — the Python facade is complete, the binding compiles and loads, but the first `add_frame` immediately raises. The production-default VIO is unusable.
**API correction since AZ-332**: OKVIS2 v2 upstream uses `okvis::ThreadedSlam` (NOT `okvis::ThreadedKFVio` as the AZ-332 spec referenced; that's the OKVIS v1 API). The wiring must follow v2 conventions:
```
okvis::ViParametersReader(yaml_path).getParameters(parameters);
auto estimator = std::make_unique<okvis::ThreadedSlam>(parameters, dBowDir);
estimator->setOptimisedGraphCallback([this](auto&& g, auto&& l, auto&& s) { ... });
```
## Outcome
- `Okvis2Strategy.add_frame(...)` produces a real `VioOutput` (pose + 6×6 covariance + biases + tracking-quality counts) on every keyframe the OKVIS2 backend optimises — no exceptions on the first frame.
- `Okvis2Strategy.reset(...)` tears down the C++ estimator and rebuilds it with the supplied seed pose/velocity/bias.
- Existing Python unit tests (`tests/unit/c1_vio/test_okvis2_strategy.py`) remain green against the unchanged fake-binding fixture (`tests/unit/c1_vio/conftest.py`).
- This ticket alone does NOT light up the Tier-1 or Tier-2 e2e path against real OKVIS2 — that's AZ-944 / AZ-945. Tier-1 unit suite stays the only green-bar evidence here.
## Scope
### Included
- Rewrite `_build_estimator()` to construct a real `okvis::ThreadedSlam` from `yaml_config_` via `okvis::ViParametersReader`. The DBoW2 vocabulary directory comes from a CMake-defined preprocessor constant (vocab artifact provisioning is AZ-944's scope; this ticket only consumes the path).
- Rewrite `_drive_estimator()` to convert `py::array_t<uint8_t>``cv::Mat` (zero-copy preferred) and call `estimator_->addImages(stamp, {0: cv_mat})`. Returns `true` iff the optimised-graph callback fired for this frame's keyframe.
- Wire `add_imu(ts_ns, accel, gyro)` through `estimator_->addImuMeasurement(stamp, alpha, omega)`. Keep the existing strict-monotonic guard on the binding side (line ~161).
- Implement the `setOptimisedGraphCallback(...)` lambda: fill `latest_output_` under `output_mtx_` with pose_T_world_body (Eigen::Matrix4d), pose_covariance_6x6 (extracted from `ViSlamBackend` marginalised block — see Implementation Notes), accel_bias / gyro_bias, tracked / new / lost feature counts, mean_parallax, mre_px, emitted_at_ns.
- Map `okvis::TrackingQuality``HealthState`: `Good``Tracking`, `Marginal``Degraded`, `Lost``Lost`. Update `state_` inside the callback before `latest_output_` is filled.
- Rewrite `reset()` to release the existing estimator and reconstruct via `_build_estimator()`; apply the seed pose/velocity/bias to the new instance.
- Catch all OKVIS2 / Eigen / `std::runtime_error` inside the binding and rethrow as `OkvisInitException` (during construction), `OkvisOptimizationException` (during operation), or `OkvisFatalException` (irrecoverable). No raw exceptions cross into Python.
- Uncomment the OKVIS2 `#include` block (lines ~4850) and verify the `_build_estimator` / `_drive_estimator` paths compile cleanly under `BUILD_OKVIS2=ON` on a developer machine that has the apt deps. CI green-bar is AZ-944, not this ticket.
### Excluded
- **CI apt deps and `BUILD_OKVIS2=ON` flip in `Dockerfile.test.jetson` / Linux runners** — that's AZ-944's deliverable. This ticket leaves the CI build off; the C++ change rides as compile-clean only on hosts that already provision the deps (or after AZ-944 lands).
- **Jetson L4T image build + Tier-2 Derkachi e2e (`--vio-strategy okvis2`)** — that's AZ-945's deliverable.
- **DBoW2 small_voc artifact provisioning** — sibling decision in AZ-944 (vendor in-tree vs. download-on-build vs. build-from-source). This ticket consumes whatever path the CMake constant resolves to.
- **AZ-332 skeleton's surface decisions** — exception types, `latest_output_` struct fields, py::dict shape — settled by AZ-332. This ticket does not change them.
- **Multi-camera support** — single nav-camera per RESTRICT-UAV-3 / AZ-332.
- **OKVIS2 upstream source modifications** — pin is fixed per AZ-332 Plan-phase; deviations require an ADR. The covariance side-channel approach (Implementation Notes) is intentionally chosen to avoid upstream patching.
## Acceptance Criteria
**AC-1: Real estimator construction**
Given `yaml_config_` is a valid OKVIS2 v2 YAML config and the DBoW2 vocab path resolves
When `_build_estimator()` runs
Then it constructs an `okvis::ThreadedSlam` instance via `okvis::ViParametersReader` and stores it in `estimator_` (no longer `nullptr`); `estimator_built_` is `true`; no exception thrown.
**AC-2: Frame ingestion drives the estimator**
Given `_drive_estimator()` receives a `py::array_t<uint8_t>` of shape `(H, W)` (mono camera per RESTRICT-UAV-3) with a valid `stamp_ns`
When the function runs
Then it converts the array to `cv::Mat` (zero-copy preferred) and calls `estimator_->addImages(stamp, {0: cv_mat})`. Returns `true` iff the optimised-graph callback fired for this frame's keyframe within the configured timeout.
**AC-3: IMU forwarding**
Given `add_imu(ts_ns, accel, gyro)` is called with strictly-monotonic timestamps
When the function runs
Then it forwards `(stamp, alpha, omega)` to `estimator_->addImuMeasurement(...)`. The existing strict-monotonic guard (binding-side, line ~161) is preserved.
**AC-4: Optimised-graph callback fills `latest_output_`**
Given `estimator_->setOptimisedGraphCallback(...)` is wired with the binding's lambda
When the OKVIS2 backend optimises a keyframe
Then `latest_output_` is filled under `output_mtx_` with: `pose_T_world_body` (Eigen::Matrix4d), `pose_covariance_6x6`, `accel_bias`, `gyro_bias`, `tracked_count` / `new_count` / `lost_count`, `mean_parallax`, `mre_px`, `emitted_at_ns`. The 6×6 covariance is extracted from the `ViSlamBackend` marginalised block (see Implementation Notes for approach).
**AC-5: Health-state mapping**
Given `okvis::TrackingQuality` is one of `{Good, Marginal, Lost}`
When the callback fires
Then `state_` updates to `{Tracking, Degraded, Lost}` respectively, BEFORE `latest_output_` is filled, so a concurrent reader sees consistent state+output.
**AC-6: Reset rebuilds with seed**
Given an active `Okvis2Strategy` with a built estimator
When `reset(seed_pose, seed_velocity, seed_bias)` is called
Then the existing estimator is released (C++ resources freed), `_build_estimator()` reconstructs a fresh instance, and the seed is applied via OKVIS2's `setSeedFromPriors(...)` (or equivalent) before the next `add_frame`.
**AC-7: Exception translation**
Given an OKVIS2-internal exception, an Eigen exception, or a `std::runtime_error` is raised inside the binding
When the binding catches it
Then it is rethrown as one of: `OkvisInitException` (if raised from `_build_estimator`), `OkvisOptimizationException` (if raised from `_drive_estimator` / `add_imu`), `OkvisFatalException` (if the backend signals irrecoverable failure). No raw C++ exception crosses the pybind11 boundary.
**AC-8: Python unit tests stay green against the fake binding**
Given the fake-binding fixture at `tests/unit/c1_vio/conftest.py` is unchanged
When `pytest tests/unit/c1_vio/test_okvis2_strategy.py -v --tb=short` runs (Tier-1)
Then all pre-existing unit tests pass with no behavioural change. The fake-binding contract is unchanged — only the real C++ side gets wired.
## Implementation Notes
### Headers needed
- `okvis/ThreadedSlam.hpp` — v2 SLAM front-end + back-end coordinator (replaces v1's `ThreadedKFVio`).
- `okvis/ViParametersReader.hpp` — YAML config loader.
- `okvis/Estimator.hpp` — back-end (needed for the covariance side-channel access).
- `okvis/cameras/PinholeCamera.hpp` — K-matrix → OKVIS camera-object conversion if the binding constructs cameras directly (otherwise the YAML carries them).
### 6×6 covariance extraction — the known unknown
The `setOptimisedGraphCallback` payload (`ViGraph` snapshot) does NOT carry the latest-pose covariance directly; covariance lives inside the `Estimator`'s back-end. Two approaches:
- **(a) Side-channel accessor** (preferred for first cut): inside the callback, take a non-const handle to `estimator_->backend()` (or equivalent) and read the marginalised 6×6 block for the latest pose state. Keep the read protected by `output_mtx_`. If OKVIS2 v2 marks the back-end accessor private, fall back to subclassing `ThreadedSlam` and exposing a thin protected getter — still in our binding, no upstream change.
- **(b) Tiny upstream patch**: add a public `latestPoseCovariance6x6()` method to `okvis::ViSlamBackend` and submit upstream. Faster diff but requires a pin bump + ADR per AZ-332 Plan-phase. Defer to (b) only if (a) hits a hard private-field block.
Pick (a) for the first cut. If (a) requires a subclass-exposed getter, document the subclass in a code comment referencing this AC and AZ-943.
### CMake link targets
`cpp/okvis2/CMakeLists.txt` already declares the link targets at lines ~6473: `okvis_ceres`, `okvis_frontend`, `okvis_multisensor_processing`, `okvis_kinematics`, `okvis_cv`, `okvis_common`, `okvis_time`, `okvis_util`. The `_drive_estimator` function needs `okvis_cv` for the `cv::Mat` integration. No new targets to add — verify the linker pulls them in cleanly under `BUILD_OKVIS2=ON`.
### pybind11 surface — DO NOT change
The pybind11 module shape (lines ~296318) is correct and the Python facade unit tests confirm it. Do NOT alter the surface — `add_frame`, `add_imu`, `reset`, and the result struct fields stay byte-compatible with the fake binding. Only the C++ implementations behind those symbols change.
### DBoW2 vocab path
Define a CMake preprocessor constant (e.g. `OKVIS2_DBOW2_VOCAB_DIR`) that points to a path the runtime can resolve. AZ-944 will populate this path with the small-vocabulary artifact (decision: vendor in-tree vs. download-on-build vs. build-from-source). For this ticket: declare the constant, consume it, and document the expected file layout (e.g. `${OKVIS2_DBOW2_VOCAB_DIR}/small_voc.yml.gz` or similar) in a code comment referencing AZ-944.
### Build verification
Compile-clean evidence on a host with apt deps installed (developer Mac with `brew install ...` equivalents OR a Linux dev VM with apt deps):
```
BUILD_OKVIS2=ON cmake -S . -B build && cmake --build build --target c1_vio_okvis2_native
```
Should produce the `.so`. Capture the build log in the batch report. The `_native/__init__.py` Python-side import test then confirms the symbol is loadable (without running OKVIS2 — just loading the shared object).
## Constraints
- **Pin**: OKVIS2 v2 upstream pin from AZ-332 Plan-phase is fixed. Any deviation requires an ADR.
- **No upstream patches** unless approach (a) for covariance fails and is documented in a comment + retro entry.
- **Single nav-camera** per RESTRICT-UAV-3 — multi-camera ingestion is out of scope.
- **No CI flip**: this ticket leaves `BUILD_OKVIS2=OFF` in `Dockerfile.test.jetson` / Linux CI runners. AZ-944 owns the flip.
- **Backward compatibility**: Python facade fake-binding tests stay green with no fixture changes.
## Unit Tests
| AC Ref | What to Test | Required Outcome |
|--------|--------------|------------------|
| AC-1 | C++ unit (gtest) — construct `Okvis2Binding` with a known-good YAML, assert `estimator_built_` is `true` and no exception thrown | Pass on a host with apt deps installed |
| AC-2 | C++ unit — feed a synthetic `cv::Mat` via the C++ side, assert `addImages` is called once and the optimised-graph callback fires | Pass |
| AC-4 | C++ unit — drive a short EuRoC-like image+IMU sequence, assert `latest_output_.pose_covariance_6x6` is non-zero finite SPD | Pass; eigvals all > 0 |
| AC-7 | C++ unit — feed a known-bad YAML; assert `OkvisInitException` propagates with non-empty `what()` | Pass |
| AC-8 | Python — `pytest tests/unit/c1_vio/test_okvis2_strategy.py -v --tb=short` | All pre-existing tests still pass (uses fake binding, no real OKVIS2) |
C++ unit tests live under `cpp/okvis2/tests/` (or wherever the existing OKVIS2 test layout sits — confirm during implementation; if no harness exists, add a minimal one and document in the batch report).
## References
- Jira ticket: AZ-943 (parent split AZ-592)
- Sibling Jira tickets (Blocks chain AZ-943 → AZ-944 → AZ-945):
- AZ-944 (3pt, Linux CI build env + DBoW2 vocab artifact + Tier-1 EuRoC mini smoke)
- AZ-945 (3pt, Jetson L4T build + Tier-2 Derkachi `--vio-strategy okvis2` e2e + perf baseline)
- AZ-332 spec (the skeleton this replaces): `_docs/02_tasks/done/AZ-332_c1_okvis2_strategy.md`
- ADR-002 (KLT/RANSAC mandatory baseline; OKVIS2 is the production-default architectural target)
- `cpp/okvis2/upstream/` (v2 source tree)
- `_docs/_autodev_state.md` (resume context: Out-of-band bugfix cycle 94d2358 already committed; AZ-942 / AZ-923 parked; AZ-943→AZ-944→AZ-945 split rationale)