mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-22 23:51:12 +00:00
[AZ-943] [AZ-951] [AZ-952] Move OKVIS2 chain back to todo/ as next phase
Per user 2026-05-29 directive: "OKVIS2-related tasks needed to be implemented after full e2e derkachi flight test would be finished successfully. So maybe put it back to todo?"

Reasoning accepted. The OKVIS2 chain is the planned NEXT phase after the cycle-4 Derkachi demo lands, not a cycle-5+ deferral. The 2026-05-27 production-default pivot directive remains in force; today's earlier "deferred to cycle-5+" framing was an over-correction after the AZ-943 spec-reality gap.

- AZ-943 stays HARD-BLOCKED on AZ-951 + AZ-952 (PAUSED preamble preserved). It cannot be worked on until both blockers land. Moving it to todo/ signals "queued, next-after-blockers", not "actionable now".
- AZ-951 + AZ-952 are themselves NOT blocked. They ship the upstream patches that unblock AZ-943.

Implementation sequence (unchanged): finish cycle-4 demo (AZ-959 + remaining CSV-replay path) → AZ-951 → AZ-952 → AZ-943 → AZ-944 → AZ-945.

Current implement-batch target stays AZ-959; this commit is bookkeeping only and does not change what's next on deck.

Touches: 3 file moves (backlog/ → todo/), dep-table preamble fourth bump narrative documenting the placement reversal.

Co-authored-by: Cursor <cursoragent@cursor.com>
@@ -0,0 +1,194 @@
# C1 OKVIS2 Binding — Real ThreadedSlam Wiring (AZ-592 split 1/3)

> **STATUS (2026-05-29): PAUSED — BLOCKED on AZ-951 + AZ-952.**
>
> Implementation attempt on 2026-05-29 confirmed AC-4 is structurally unreachable without upstream OKVIS2 patches:
>
> - `ThreadedSlam::estimator_` is `private` (not `protected`) → in-binding subclass workaround proposed in Implementation Notes "approach (a)" is impossible.
> - `ViSlamBackend` has no public accessor for 6×6 pose covariance, feature counts, mean parallax, or MRE.
> - `TrackingState` (callback arg) only carries id / isKeyframe / TrackingQuality enum / recognisedPlace / isFullGraphOptimising / currentKeyframeId — none of the AC-4 telemetry fields.
>
> The "approach (b) upstream patch" fallback documented in this file + AZ-592 has been filed as two sibling tickets and linked as `is blocked by` against AZ-943:
>
> - **AZ-951** (3 SP): upstream patch — expose 6×6 pose covariance accessor (+ ADR for pin deviation).
> - **AZ-952** (3 SP): upstream patch — expose tracking-stats accessor (feature counts + parallax + MRE).
>
> Jira AZ-943 reverted to To Do. This local file moved from `todo/` → `backlog/`. The AC list + Implementation Notes below are PRESERVED unchanged for audit; once AZ-951 + AZ-952 land, AC-4 implementation will call `backend().computeCovariance6x6(state.id)` + `backend().getLatestTrackingStats(state.id, ...)` and the file moves back to `todo/`.
>
> Audit reference: AZ-943 Jira comment "Implementation paused: spec gap discovered (2026-05-29)" — full root-cause + decision rationale.

**Task**: AZ-943_okvis2_threadedslam_binding
**Name**: OKVIS2 binding: replace AZ-332 skeleton with real `okvis::ThreadedSlam` wiring
**Description**: Sub-ticket 1 of 3 from the AZ-592 placeholder split (per state file 2026-05-27 split rationale). Replaces the AZ-332 skeleton in `src/gps_denied_onboard/components/c1_vio/_native/okvis2_binding.cpp` (`_build_estimator()` no-op, `_drive_estimator()` raises `OkvisFatalException`) with the real `okvis::ThreadedSlam` v2 pipeline: `ViParametersReader(yaml).getParameters(...)` → `ThreadedSlam(parameters, dBowDir)` → `setOptimisedGraphCallback(...)`. Without this wiring, `Okvis2Strategy` (AZ-332) is the production-default per architecture but throws on first `add_frame` — the production VIO is unusable. CI build env + Jetson validation are tracked in sibling tickets AZ-944 (3pt, Linux CI + DBoW2 vocab + Tier-1 smoke) and AZ-945 (3pt, Jetson L4T + Tier-2 Derkachi e2e); the Blocks chain in Jira is AZ-943 → AZ-944 → AZ-945. This ticket touches ONLY the C++ binding and the Python facade fake-binding fixture; it does NOT flip `BUILD_OKVIS2=ON` in CI (that's AZ-944's deliverable).
**Complexity**: 5 points
**Dependencies**: AZ-332 (the AZ-332 skeleton this replaces; in `done/`), AZ-592 (parent umbrella placeholder; in `backlog/`)
**Component**: c1_vio (epic AZ-254 / E-C1)
**Tracker**: AZ-943 (https://denyspopov.atlassian.net/browse/AZ-943)
**Epic**: AZ-254 (E-C1)
### Document Dependencies

- `_docs/02_document/contracts/c1_vio/vio_strategy_protocol.md` — the Protocol the strategy implements (AZ-331).
- `_docs/02_document/components/01_c1_vio/description.md` — § 5 implementation details (sliding-window K=10–20, per-frame cost), § 7 caveats (thermal throttle latency spikes).
- ADR-002 (KLT/RANSAC mandatory baseline) — explains why this OKVIS2 wiring does NOT replace KLT/RANSAC; both ship.
- `cpp/okvis2/upstream/` — fully-populated v2 source tree the binding links against.
## Problem

`src/gps_denied_onboard/components/c1_vio/_native/okvis2_binding.cpp` is the AZ-332 skeleton:

- `_build_estimator()` (line ~251) sets `estimator_built_ = false` and does nothing else.
- `_drive_estimator()` (line ~261) throws `OkvisFatalException("OKVIS2 estimator not yet wired — this binding is the AZ-332 skeleton; tier2 follow-up wires okvis::ThreadedKFVio")` on first frame.
- Real OKVIS2 includes (`#include <okvis/ThreadedKFVio.hpp>` etc.) are commented out at lines ~48–50.

Without this wiring, `Okvis2Strategy` cannot produce any output — the Python facade is complete, the binding compiles and loads, but the first `add_frame` immediately raises. The production-default VIO is unusable.

**API correction since AZ-332**: OKVIS2 v2 upstream uses `okvis::ThreadedSlam` (NOT `okvis::ThreadedKFVio` as the AZ-332 spec referenced; that's the OKVIS v1 API). The wiring must follow v2 conventions:

```
okvis::ViParametersReader(yaml_path).getParameters(parameters);
auto estimator = std::make_unique<okvis::ThreadedSlam>(parameters, dBowDir);
estimator->setOptimisedGraphCallback([this](auto&& g, auto&& l, auto&& s) { ... });
```
## Outcome

- `Okvis2Strategy.add_frame(...)` produces a real `VioOutput` (pose + 6×6 covariance + biases + tracking-quality counts) on every keyframe the OKVIS2 backend optimises — no exceptions on the first frame.
- `Okvis2Strategy.reset(...)` tears down the C++ estimator and rebuilds it with the supplied seed pose/velocity/bias.
- Existing Python unit tests (`tests/unit/c1_vio/test_okvis2_strategy.py`) remain green against the unchanged fake-binding fixture (`tests/unit/c1_vio/conftest.py`).
- This ticket alone does NOT light up the Tier-1 or Tier-2 e2e path against real OKVIS2 — that's AZ-944 / AZ-945. Tier-1 unit suite stays the only green-bar evidence here.
## Scope

### Included

- Rewrite `_build_estimator()` to construct a real `okvis::ThreadedSlam` from `yaml_config_` via `okvis::ViParametersReader`. The DBoW2 vocabulary directory comes from a CMake-defined preprocessor constant (vocab artifact provisioning is AZ-944's scope; this ticket only consumes the path).
- Rewrite `_drive_estimator()` to convert `py::array_t<uint8_t>` → `cv::Mat` (zero-copy preferred) and call `estimator_->addImages(stamp, {0: cv_mat})`. Returns `true` iff the optimised-graph callback fired for this frame's keyframe.
- Wire `add_imu(ts_ns, accel, gyro)` through `estimator_->addImuMeasurement(stamp, alpha, omega)`. Keep the existing strict-monotonic guard on the binding side (line ~161).
- Implement the `setOptimisedGraphCallback(...)` lambda: fill `latest_output_` under `output_mtx_` with pose_T_world_body (Eigen::Matrix4d), pose_covariance_6x6 (extracted from `ViSlamBackend` marginalised block — see Implementation Notes), accel_bias / gyro_bias, tracked / new / lost feature counts, mean_parallax, mre_px, emitted_at_ns.
- Map `okvis::TrackingQuality` → `HealthState`: `Good`→`Tracking`, `Marginal`→`Degraded`, `Lost`→`Lost`. Update `state_` inside the callback before `latest_output_` is filled.
- Rewrite `reset()` to release the existing estimator and reconstruct via `_build_estimator()`; apply the seed pose/velocity/bias to the new instance.
- Catch all OKVIS2 / Eigen / `std::runtime_error` inside the binding and rethrow as `OkvisInitException` (during construction), `OkvisOptimizationException` (during operation), or `OkvisFatalException` (irrecoverable). No raw exceptions cross into Python.
- Uncomment the OKVIS2 `#include` block (lines ~48–50) and verify the `_build_estimator` / `_drive_estimator` paths compile cleanly under `BUILD_OKVIS2=ON` on a developer machine that has the apt deps. CI green-bar is AZ-944, not this ticket.
### Excluded

- **CI apt deps and `BUILD_OKVIS2=ON` flip in `Dockerfile.test.jetson` / Linux runners** — that's AZ-944's deliverable. This ticket leaves the CI build off; the C++ change rides as compile-clean only on hosts that already provision the deps (or after AZ-944 lands).
- **Jetson L4T image build + Tier-2 Derkachi e2e (`--vio-strategy okvis2`)** — that's AZ-945's deliverable.
- **DBoW2 small_voc artifact provisioning** — sibling decision in AZ-944 (vendor in-tree vs. download-on-build vs. build-from-source). This ticket consumes whatever path the CMake constant resolves to.
- **AZ-332 skeleton's surface decisions** — exception types, `latest_output_` struct fields, py::dict shape — settled by AZ-332. This ticket does not change them.
- **Multi-camera support** — single nav-camera per RESTRICT-UAV-3 / AZ-332.
- **OKVIS2 upstream source modifications** — pin is fixed per AZ-332 Plan-phase; deviations require an ADR. The covariance side-channel approach (Implementation Notes) is intentionally chosen to avoid upstream patching.
## Acceptance Criteria

**AC-1: Real estimator construction**
Given `yaml_config_` is a valid OKVIS2 v2 YAML config and the DBoW2 vocab path resolves
When `_build_estimator()` runs
Then it constructs an `okvis::ThreadedSlam` instance via `okvis::ViParametersReader` and stores it in `estimator_` (no longer `nullptr`); `estimator_built_` is `true`; no exception thrown.

**AC-2: Frame ingestion drives the estimator**
Given `_drive_estimator()` receives a `py::array_t<uint8_t>` of shape `(H, W)` (mono camera per RESTRICT-UAV-3) with a valid `stamp_ns`
When the function runs
Then it converts the array to `cv::Mat` (zero-copy preferred) and calls `estimator_->addImages(stamp, {0: cv_mat})`. Returns `true` iff the optimised-graph callback fired for this frame's keyframe within the configured timeout.
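The "returns `true` iff the callback fired within the configured timeout" rule in AC-2 can be sketched in Python with a condition variable. This is an illustration only, not the binding's C++ code: `KeyframeWaiter`, `on_optimised`, `wait_for`, and matching on exact timestamps are hypothetical choices made for the example.

```python
import threading


class KeyframeWaiter:
    """Records which frame stamps the optimised-graph callback has reported."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._optimised_ns = set()  # stamps seen by the callback so far

    def on_optimised(self, stamp_ns: int) -> None:
        # Would be invoked from the backend thread (assumption).
        with self._cond:
            self._optimised_ns.add(stamp_ns)
            self._cond.notify_all()

    def wait_for(self, stamp_ns: int, timeout_s: float) -> bool:
        # True iff the callback reported exactly this stamp before the timeout.
        with self._cond:
            return self._cond.wait_for(
                lambda: stamp_ns in self._optimised_ns, timeout=timeout_s
            )
```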

**AC-3: IMU forwarding**
Given `add_imu(ts_ns, accel, gyro)` is called with strictly-monotonic timestamps
When the function runs
Then it forwards `(stamp, alpha, omega)` to `estimator_->addImuMeasurement(...)`. The existing strict-monotonic guard (binding-side, line ~161) is preserved.
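A strict-monotonic guard of the kind AC-3 preserves might look like the Python sketch below. The spec does not say how the real guard reacts to a violation, so raising `ValueError` is an assumption, and `ImuMonotonicGuard` is a hypothetical name.

```python
class ImuMonotonicGuard:
    """Rejects IMU samples whose timestamp does not strictly increase."""

    def __init__(self) -> None:
        self._last_ts_ns = None  # no sample accepted yet

    def check(self, ts_ns: int) -> None:
        # Equal timestamps are rejected too: "strictly" monotonic.
        if self._last_ts_ns is not None and ts_ns <= self._last_ts_ns:
            raise ValueError(
                f"IMU timestamp {ts_ns} is not after {self._last_ts_ns}"
            )
        self._last_ts_ns = ts_ns
```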

**AC-4: Optimised-graph callback fills `latest_output_`**
Given `estimator_->setOptimisedGraphCallback(...)` is wired with the binding's lambda
When the OKVIS2 backend optimises a keyframe
Then `latest_output_` is filled under `output_mtx_` with: `pose_T_world_body` (Eigen::Matrix4d), `pose_covariance_6x6`, `accel_bias`, `gyro_bias`, `tracked_count` / `new_count` / `lost_count`, `mean_parallax`, `mre_px`, `emitted_at_ns`. The 6×6 covariance is extracted from the `ViSlamBackend` marginalised block (see Implementation Notes for approach).

**AC-5: Health-state mapping**
Given `okvis::TrackingQuality` is one of `{Good, Marginal, Lost}`
When the callback fires
Then `state_` updates to `{Tracking, Degraded, Lost}` respectively, BEFORE `latest_output_` is filled, so a concurrent reader sees consistent state+output.
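The AC-5 mapping and its ordering rule can be illustrated with a small Python model. The enum members mirror the names in this spec; `OutputSlot`, `snapshot`, the initial `LOST` state, and the use of a single lock for both fields are assumptions made for the sketch, not the binding's actual layout.

```python
import threading
from enum import Enum


class TrackingQuality(Enum):
    GOOD = "Good"
    MARGINAL = "Marginal"
    LOST = "Lost"


class HealthState(Enum):
    TRACKING = "Tracking"
    DEGRADED = "Degraded"
    LOST = "Lost"


QUALITY_TO_HEALTH = {
    TrackingQuality.GOOD: HealthState.TRACKING,
    TrackingQuality.MARGINAL: HealthState.DEGRADED,
    TrackingQuality.LOST: HealthState.LOST,
}


class OutputSlot:
    def __init__(self) -> None:
        self._mtx = threading.Lock()
        self._state = HealthState.LOST  # assumed initial state
        self._latest_output = None

    def on_optimised(self, quality: TrackingQuality, output: dict) -> None:
        with self._mtx:
            # State first, then output, both under one lock.
            self._state = QUALITY_TO_HEALTH[quality]
            self._latest_output = output

    def snapshot(self):
        # Readers take the same lock, so they never see a mixed pair.
        with self._mtx:
            return self._state, self._latest_output
```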

**AC-6: Reset rebuilds with seed**
Given an active `Okvis2Strategy` with a built estimator
When `reset(seed_pose, seed_velocity, seed_bias)` is called
Then the existing estimator is released (C++ resources freed), `_build_estimator()` reconstructs a fresh instance, and the seed is applied via OKVIS2's `setSeedFromPriors(...)` (or equivalent) before the next `add_frame`.

**AC-7: Exception translation**
Given an OKVIS2-internal exception, an Eigen exception, or a `std::runtime_error` is raised inside the binding
When the binding catches it
Then it is rethrown as one of: `OkvisInitException` (if raised from `_build_estimator`), `OkvisOptimizationException` (if raised from `_drive_estimator` / `add_imu`), `OkvisFatalException` (if the backend signals irrecoverable failure). No raw C++ exception crosses the pybind11 boundary.
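The phase-based translation in AC-7 happens in C++ at the pybind11 boundary. The Python sketch below only models the rule. The three exception names come from this spec, while the `OkvisError` base class, the `translate_errors` helper, and the `"build"` / `"drive"` phase labels are hypothetical.

```python
from contextlib import contextmanager


class OkvisError(Exception):
    """Hypothetical common base for the three translated types."""


class OkvisInitException(OkvisError):
    pass


class OkvisOptimizationException(OkvisError):
    pass


class OkvisFatalException(OkvisError):
    pass


@contextmanager
def translate_errors(phase: str):
    """Rethrow any untyped error as the type that matches the phase."""
    try:
        yield
    except OkvisError:
        raise  # already typed (e.g. a fatal signal), pass through as-is
    except Exception as exc:
        target = (
            OkvisInitException if phase == "build"
            else OkvisOptimizationException
        )
        raise target(f"{phase}: {exc}") from exc
```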

**AC-8: Python unit tests stay green against the fake binding**
Given the fake-binding fixture at `tests/unit/c1_vio/conftest.py` is unchanged
When `pytest tests/unit/c1_vio/test_okvis2_strategy.py -v --tb=short` runs (Tier-1)
Then all pre-existing unit tests pass with no behavioural change. The fake-binding contract is unchanged — only the real C++ side gets wired.
## Implementation Notes

### Headers needed

- `okvis/ThreadedSlam.hpp` — v2 SLAM front-end + back-end coordinator (replaces v1's `ThreadedKFVio`).
- `okvis/ViParametersReader.hpp` — YAML config loader.
- `okvis/Estimator.hpp` — back-end (needed for the covariance side-channel access).
- `okvis/cameras/PinholeCamera.hpp` — K-matrix → OKVIS camera-object conversion if the binding constructs cameras directly (otherwise the YAML carries them).
### 6×6 covariance extraction — the known unknown

The `setOptimisedGraphCallback` payload (`ViGraph` snapshot) does NOT carry the latest-pose covariance directly; covariance lives inside the `Estimator`'s back-end. Two approaches:

- **(a) Side-channel accessor** (preferred for first cut): inside the callback, take a non-const handle to `estimator_->backend()` (or equivalent) and read the marginalised 6×6 block for the latest pose state. Keep the read protected by `output_mtx_`. If OKVIS2 v2 marks the back-end accessor private, fall back to subclassing `ThreadedSlam` and exposing a thin protected getter — still in our binding, no upstream change.
- **(b) Tiny upstream patch**: add a public `latestPoseCovariance6x6()` method to `okvis::ViSlamBackend` and submit upstream. Faster diff but requires a pin bump + ADR per AZ-332 Plan-phase. Fall back to (b) only if (a) hits a hard private-field block.

Pick (a) for the first cut. If (a) requires a subclass-exposed getter, document the subclass in a code comment referencing this AC and AZ-943.
### CMake link targets

`cpp/okvis2/CMakeLists.txt` already declares the link targets at lines ~64–73: `okvis_ceres`, `okvis_frontend`, `okvis_multisensor_processing`, `okvis_kinematics`, `okvis_cv`, `okvis_common`, `okvis_time`, `okvis_util`. The `_drive_estimator` function needs `okvis_cv` for the `cv::Mat` integration. No new targets to add — verify the linker pulls them in cleanly under `BUILD_OKVIS2=ON`.

### pybind11 surface — DO NOT change

The pybind11 module shape (lines ~296–318) is correct and the Python facade unit tests confirm it. Do NOT alter the surface — `add_frame`, `add_imu`, `reset`, and the result struct fields stay byte-compatible with the fake binding. Only the C++ implementations behind those symbols change.

### DBoW2 vocab path

Define a CMake preprocessor constant (e.g. `OKVIS2_DBOW2_VOCAB_DIR`) that points to a path the runtime can resolve. AZ-944 will populate this path with the small-vocabulary artifact (decision: vendor in-tree vs. download-on-build vs. build-from-source). For this ticket: declare the constant, consume it, and document the expected file layout (e.g. `${OKVIS2_DBOW2_VOCAB_DIR}/small_voc.yml.gz` or similar) in a code comment referencing AZ-944.
### Build verification

Compile-clean evidence on a host with apt deps installed (developer Mac with `brew install ...` equivalents OR a Linux dev VM with apt deps):

```
BUILD_OKVIS2=ON cmake -S . -B build && cmake --build build --target c1_vio_okvis2_native
```

Should produce the `.so`. Capture the build log in the batch report. The `_native/__init__.py` Python-side import test then confirms the symbol is loadable (without running OKVIS2 — just loading the shared object).
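A load-only check of this kind could be as small as the sketch below. It assumes the built extension is importable under a dotted module name; the name used in the usage note is inferred from the source path in this spec and is not confirmed, and `native_module_loads` is a hypothetical helper.

```python
import importlib


def native_module_loads(module_name: str) -> bool:
    """Return True if the module (and any shared object behind it) imports."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        # Covers a missing module as well as a .so that fails to load.
        return False
    return True
```

For example, `native_module_loads("gps_denied_onboard.components.c1_vio._native")` would exercise only the import, never the estimator.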
## Constraints

- **Pin**: OKVIS2 v2 upstream pin from AZ-332 Plan-phase is fixed. Any deviation requires an ADR.
- **No upstream patches** unless approach (a) for covariance fails and is documented in a comment + retro entry.
- **Single nav-camera** per RESTRICT-UAV-3 — multi-camera ingestion is out of scope.
- **No CI flip**: this ticket leaves `BUILD_OKVIS2=OFF` in `Dockerfile.test.jetson` / Linux CI runners. AZ-944 owns the flip.
- **Backward compatibility**: Python facade fake-binding tests stay green with no fixture changes.
## Unit Tests

| AC Ref | What to Test | Required Outcome |
|--------|--------------|------------------|
| AC-1 | C++ unit (gtest) — construct `Okvis2Binding` with a known-good YAML, assert `estimator_built_` is `true` and no exception thrown | Pass on a host with apt deps installed |
| AC-2 | C++ unit — feed a synthetic `cv::Mat` via the C++ side, assert `addImages` is called once and the optimised-graph callback fires | Pass |
| AC-4 | C++ unit — drive a short EuRoC-like image+IMU sequence, assert `latest_output_.pose_covariance_6x6` is non-zero finite SPD | Pass; eigvals all > 0 |
| AC-7 | C++ unit — feed a known-bad YAML; assert `OkvisInitException` propagates with non-empty `what()` | Pass |
| AC-8 | Python — `pytest tests/unit/c1_vio/test_okvis2_strategy.py -v --tb=short` | All pre-existing tests still pass (uses fake binding, no real OKVIS2) |

C++ unit tests live under `cpp/okvis2/tests/` (or wherever the existing OKVIS2 test layout sits — confirm during implementation; if no harness exists, add a minimal one and document in the batch report).
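The AC-4 row asks for a "non-zero finite SPD" covariance. The spec places that test in C++ (gtest); as a language-neutral illustration of the check itself, here is a pure-Python sketch that uses a Cholesky factorisation instead of computing eigenvalues, since a symmetric matrix is positive definite exactly when the factorisation succeeds with positive pivots. `is_finite_spd` and the symmetry tolerances are hypothetical.

```python
import math


def is_finite_spd(m) -> bool:
    """True iff m is a square, finite, symmetric positive-definite matrix."""
    n = len(m)
    if n == 0 or any(len(row) != n for row in m):
        return False
    if not all(math.isfinite(v) for row in m for v in row):
        return False
    for i in range(n):
        for j in range(i):
            if not math.isclose(m[i][j], m[j][i], rel_tol=1e-9, abs_tol=1e-12):
                return False
    # Cholesky: fails (non-positive pivot) iff the matrix is not SPD.
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return True
```

An all-zero 6×6 matrix fails at the first pivot, so the "non-zero" requirement is covered by the same check.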
## References

- Jira ticket: AZ-943 (parent split AZ-592)
- Sibling Jira tickets (Blocks chain AZ-943 → AZ-944 → AZ-945):
  - AZ-944 (3pt, Linux CI build env + DBoW2 vocab artifact + Tier-1 EuRoC mini smoke)
  - AZ-945 (3pt, Jetson L4T build + Tier-2 Derkachi `--vio-strategy okvis2` e2e + perf baseline)
- AZ-332 spec (the skeleton this replaces): `_docs/02_tasks/done/AZ-332_c1_okvis2_strategy.md`
- ADR-002 (KLT/RANSAC mandatory baseline; OKVIS2 is the production-default architectural target)
- `cpp/okvis2/upstream/` (v2 source tree)
- `_docs/_autodev_state.md` (resume context: Out-of-band bugfix cycle 94d2358 already committed; AZ-942 / AZ-923 parked; AZ-943→AZ-944→AZ-945 split rationale)