[AZ-943] [AZ-951] [AZ-952] Pause AZ-943 on OKVIS2 telemetry gap

AZ-943 implementation attempt confirmed the C++ binding cannot satisfy
AC-4 without upstream OKVIS2 patches. The spec's "approach (a)
in-binding subclass workaround" is structurally impossible:
- ThreadedSlam::estimator_ is `private` (not `protected`)
- ViSlamBackend has no public covariance / counts / parallax / MRE
  accessor in the v2 upstream headers
- TrackingState carries only id / isKeyframe / TrackingQuality enum /
  recognisedPlace / isFullGraphOptimising / currentKeyframeId — none
  of the five tracking-stats fields the binding needs

Filed the spec-documented "approach (b)" fallback as two sibling
tickets, both linked Jira-side as `is blocked by` against AZ-943:

- AZ-951 (3 SP): upstream patch — expose 6x6 pose covariance accessor
  (+ ADR-XXX for the AZ-332 Plan-phase pin deviation)
- AZ-952 (3 SP): upstream patch — expose tracking-stats accessor
  (feature counts + parallax + MRE)

AZ-943 transitioned In Progress -> To Do in Jira, full audit comment
attached. Local AZ-943 spec moved todo/ -> backlog/ with PAUSED
preamble; original AC list preserved for the post-unblock turn.

Per user 2026-05-29 confirmation: cycle-4 Derkachi demo target stays
KLT/RANSAC (tests/e2e/replay/conftest.py line 159
c1_vio: strategy: klt_ransac), so AZ-951 + AZ-952 + AZ-943 chain is
correctly deferred. Pivoting next batch to AZ-897 (replay UI form).

Touches: _docs/02_tasks/_dependencies_table.md (preamble + table
rows for AZ-943 paused / AZ-951 / AZ-952 added; totals bumped to
142 product + 41 blackbox-test = 183, 448 product + 133 blackbox
= 581), _docs/_autodev_state.md (sub_step pivot to AZ-897).

Co-authored-by: Cursor <cursoragent@cursor.com>
This commit is contained in:
Oleksandr Bezdieniezhnykh
2026-05-29 11:48:09 +03:00
parent 42b1db6ace
commit e8caa29da6
5 changed files with 161 additions and 7 deletions
File diff suppressed because one or more lines are too long
@@ -1,5 +1,22 @@
# C1 OKVIS2 Binding — Real ThreadedSlam Wiring (AZ-592 split 1/3)
> **STATUS (2026-05-29): PAUSED — BLOCKED on AZ-951 + AZ-952.**
>
> Implementation attempt on 2026-05-29 confirmed AC-4 is structurally unreachable without upstream OKVIS2 patches:
>
> - `ThreadedSlam::estimator_` is `private` (not `protected`) → in-binding subclass workaround proposed in Implementation Notes "approach (a)" is impossible.
> - `ViSlamBackend` has no public accessor for 6×6 pose covariance, feature counts, mean parallax, or MRE.
> - `TrackingState` (callback arg) only carries id / isKeyframe / TrackingQuality enum / recognisedPlace / isFullGraphOptimising / currentKeyframeId — none of the AC-4 telemetry fields.
>
> The "approach (b) upstream patch" fallback documented in this file + AZ-592 has been filed as two sibling tickets and linked as `is blocked by` against AZ-943:
>
> - **AZ-951** (3 SP): upstream patch — expose 6×6 pose covariance accessor (+ ADR for pin deviation).
> - **AZ-952** (3 SP): upstream patch — expose tracking-stats accessor (feature counts + parallax + MRE).
>
> Jira AZ-943 reverted to To Do. This local file moved from `todo/``backlog/`. The AC list + Implementation Notes below are PRESERVED unchanged for audit; once AZ-951 + AZ-952 land, AC-4 implementation will call `backend().computeCovariance6x6(state.id)` + `backend().getLatestTrackingStats(state.id, ...)` and the file moves back to `todo/`.
>
> Audit reference: AZ-943 Jira comment "Implementation paused: spec gap discovered (2026-05-29)" — full root-cause + decision rationale.
**Task**: AZ-943_okvis2_threadedslam_binding
**Name**: OKVIS2 binding: replace AZ-332 skeleton with real `okvis::ThreadedSlam` wiring
**Description**: Sub-ticket 1 of 3 from the AZ-592 placeholder split (per state file 2026-05-27 split rationale). Replaces the AZ-332 skeleton in `src/gps_denied_onboard/components/c1_vio/_native/okvis2_binding.cpp` (`_build_estimator()` no-op, `_drive_estimator()` raises `OkvisFatalException`) with the real `okvis::ThreadedSlam` v2 pipeline: `ViParametersReader(yaml).getParameters(...)``ThreadedSlam(parameters, dBowDir)``setOptimisedGraphCallback(...)`. Without this wiring, `Okvis2Strategy` (AZ-332) is the production-default per architecture but throws on first `add_frame` — the production VIO is unusable. CI build env + Jetson validation are tracked in sibling tickets AZ-944 (3pt, Linux CI + DBoW2 vocab + Tier-1 smoke) and AZ-945 (3pt, Jetson L4T + Tier-2 Derkachi e2e); the Blocks chain in Jira is AZ-943 → AZ-944 → AZ-945. This ticket touches ONLY the C++ binding and the Python facade fake-binding fixture; it does NOT flip `BUILD_OKVIS2=ON` in CI (that's AZ-944's deliverable).
@@ -0,0 +1,65 @@
# OKVIS2 v2 upstream patch: expose 6×6 pose covariance accessor (+ ADR for pin deviation)
**Task**: AZ-951_okvis2_upstream_covariance_patch
**Name**: OKVIS2 v2 upstream patch — expose 6×6 pose covariance accessor (+ ADR for pin deviation)
**Description**: Land the documented "approach (b) upstream patch" escape hatch from AZ-592 (line 30) and AZ-943 (Implementation Notes "Tiny upstream patch"). AZ-943's implementation attempt on 2026-05-29 confirmed that the proposed "approach (a) in-binding subclass workaround" is structurally impossible: `ThreadedSlam::estimator_` is declared `private` (not `protected`), and `ViSlamBackend` has no public covariance accessor anywhere in the v2 upstream headers.
**Complexity**: 3 SP
**Dependencies**: AZ-332 (the AZ-332 Plan-phase pin this work deviates from), AZ-592 (parent placeholder that originally offered this approach as option (a) in its line 30-31)
**Component**: c1_vio (epic AZ-254 / E-C1); also touches `cpp/okvis2/` (upstream wrapper) and `_docs/03_implementation/architecture/decisions/` (ADR)
**Tracker**: AZ-951 (https://denyspopov.atlassian.net/browse/AZ-951)
**Parent Epic**: AZ-254 (E-C1)
**Blocks**: AZ-943 AC-4 (pose_covariance_6x6 field)
Jira AZ-951 is the authoritative spec; this file is the in-workspace mirror.
## Goal
Land the documented "approach (b) upstream patch" escape hatch. **Blocks AZ-943 AC-4** (`pose_covariance_6x6` field). The Python facade (`okvis2.py` `_build_vio_output`) shape-checks the 6×6 covariance; downstream EKF in C2 treats it as Kalman gain weight, so an identity placeholder would lie about VIO uncertainty (contradicts AZ-848 ESKF out-of-order analysis).
## Scope
1. **ADR-XXX (pin deviation rationale)** under `_docs/03_implementation/architecture/decisions/`:
- Title: "OKVIS2 v2 upstream patch — expose ViSlamBackend pose-covariance accessor for Okvis2Backend C++ binding"
- Decision: deviate from AZ-332 Plan-phase fixed pin to land a small, surgical, documented patch.
- Alternatives considered: (1) keep pin + ship placeholder covariance (violates meta-rule "Real Results, Not Simulated Ones"); (2) hard fork OKVIS2 (rejected — too much surface); (3) upstream the patch as a follow-up contribution to the OKVIS2 maintainers (recommended).
- Consequences: future upstream rebases must reapply; patch is small and self-contained to minimise that cost.
2. **Patch file** under `cpp/okvis2/patches/expose_covariance.patch`:
- Make `ThreadedSlam::estimator_` reachable from the binding: either add `public okvis::ViSlamBackend& backend()` to `ThreadedSlam` OR change `estimator_` from `private` to `protected`. Recommend the public accessor — cleaner API surface, less invasive.
- Add `Eigen::Matrix<double, 6, 6> ViSlamBackend::computeCovariance6x6(StateId id) const` — wraps `ceres::Covariance::Compute` over the `realtimeGraph_`'s ceres::Problem for the pose parameter block at `id`. Returns a documented failure-sentinel (identity * large scale + warning log) when the covariance computation is rank-deficient; binding then flags the output as low-confidence.
3. **CMake glue** in `cpp/okvis2/CMakeLists.txt`: apply the patch via the chosen mechanism (decided at scheduling — see Open Decisions).
4. **Verification path**: compile-clean evidence comes via AZ-944 (Linux CI BUILD_OKVIS2=ON flip). Local macOS gets no compile evidence (project policy).
## Acceptance Criteria
- **AC-1**: ADR-XXX exists under `_docs/03_implementation/architecture/decisions/`, follows the project's existing ADR template, and cites AZ-332 (Plan-phase pin), AZ-592 (parent placeholder), AZ-943 (the blocked binding ticket), and this ticket's Jira key.
- **AC-2**: `cpp/okvis2/patches/expose_covariance.patch` exists, applies cleanly to the vendored upstream at `cpp/okvis2/upstream/`, and the patch surface is ≤ 100 lines of diff (keeps future rebase cost low).
- **AC-3**: After patch application, `okvis::ThreadedSlam` has a public method that returns a reference / pointer to its `okvis::ViSlamBackend` member.
- **AC-4**: After patch application, `okvis::ViSlamBackend` has a public `Eigen::Matrix<double, 6, 6> computeCovariance6x6(StateId id) const` method backed by `ceres::Covariance::Compute`. Behaviour:
- Success: returns the 6×6 marginalised pose covariance.
- Failure (rank-deficient / non-converged / wrong state ID): returns a documented failure sentinel and emits a single warning log per occurrence — NO exception thrown into the binding (the binding layer decides whether to surface this as an OkvisOptimizationException).
- **AC-5**: Patch mechanism (in-place `git apply` vs vendored header overrides vs forked submodule) is chosen at scheduling and documented in the patch's commit message + ADR-XXX.
- **AC-6**: Local task spec for AZ-943 is updated to call `backend().computeCovariance6x6(state.id)` inside the `setOptimisedGraphCallback` lambda (the AZ-943 implementation, post-unblock).
## Open Decisions (resolve at scheduling)
1. **Patch application mechanism**: in-place `git apply` (simplest, but touches vendored source) vs vendored header overrides under `cpp/okvis2/include_overrides/` (most transparent in code review) vs forked submodule (heaviest, only if patch grows large). Default proposal: vendored header overrides.
2. **Covariance failure semantics**: silent identity sentinel + log (proposed default; binding flags output as low-confidence) vs raise an OKVIS2-side exception (then binding rethrows as `OkvisOptimizationException`).
## Out of scope
- Tracking-stats telemetry (tracked/new/lost feature counts, mean_parallax, mre_px) — separate sibling ticket (AZ-952); this one is covariance-only because the two pieces have different upstream surface area and risk profiles.
- Submitting the patch to upstream OKVIS2 maintainers (tracked as a follow-up issue on the upstream GitHub mirror after this lands locally).
- The downstream AZ-943 binding update — owned by AZ-943, which is currently blocked-by this ticket.
## References
- AZ-943 implementation attempt (2026-05-29): proved "approach (a) subclass workaround" infeasible — `ThreadedSlam::estimator_` is `private`, `ViSlamBackend` has no public covariance accessor.
- AZ-592 line 30-31: offered this exact approach as fallback when (a) fails.
- AZ-943 Implementation Notes "Tiny upstream patch": defers to this approach explicitly.
- `cpp/okvis2/upstream/okvis_multisensor_processing/include/okvis/ThreadedSlam.hpp` line 254: `private: okvis::ViSlamBackend estimator_;`
- `cpp/okvis2/upstream/okvis_ceres/include/okvis/ViSlamBackend.hpp`: no public covariance accessor anywhere.
- meta-rule.mdc "Real Results, Not Simulated Ones" — the constraint that forces this path over a placeholder.
- Sibling ticket: AZ-952 (tracking-stats accessor).
@@ -0,0 +1,70 @@
# OKVIS2 v2 upstream patch: expose tracking-stats accessor (feature counts + parallax + MRE)
**Task**: AZ-952_okvis2_upstream_tracking_stats_patch
**Name**: OKVIS2 v2 upstream patch — expose tracking-stats accessor (feature counts + parallax + MRE)
**Description**: Sibling to AZ-951 (covariance + ADR). AZ-943's implementation attempt on 2026-05-29 confirmed that the four tracking-stats fields the `Okvis2Backend` C++ binding must fill have no source in OKVIS2 v2's public `setOptimisedGraphCallback` arg list. `TrackingState` (`okvis/ViInterface.hpp` lines 167-174) carries only `id`, `isKeyframe`, `trackingQuality` (enum: Good/Marginal/Lost), `recognisedPlace`, `isFullGraphOptimising`, `currentKeyframeId` — NONE of the five tracking-stats fields the binding needs.
**Complexity**: 3 SP
**Dependencies**: AZ-951 (SOFT — same ADR, same patch-mechanism decision; can land in either order, but combining patches is easier if scheduled together), AZ-332 (the AZ-332 Plan-phase pin this work deviates from alongside AZ-951), AZ-592 (parent placeholder)
**Component**: c1_vio (epic AZ-254 / E-C1); also touches `cpp/okvis2/` (upstream wrapper)
**Tracker**: AZ-952 (https://denyspopov.atlassian.net/browse/AZ-952)
**Parent Epic**: AZ-254 (E-C1)
**Blocks**: AZ-943 AC-4 (tracked/new/lost feature counts + mean_parallax + mre_px fields)
Jira AZ-952 is the authoritative spec; this file is the in-workspace mirror.
## Goal
**Blocks AZ-943 AC-4** (the five tracking-stats fields). The Python facade (`okvis2.py` `_build_vio_output`) consumes all five (line 393-399: `FeatureQuality(tracked=..., new=..., lost=..., mean_parallax=..., mre_px=...)`). The `tracked` field also feeds the `_classify_state(vio_output.feature_quality)` DEGRADED-state classifier (line 241) — placeholders would systematically misclassify health.
Five fields with no source in the public callback surface:
- `tracked_features` (int) — not in callback args; computed inside `okvis::Frontend` during matching.
- `new_features` (int) — same.
- `lost_features` (int) — same.
- `mean_parallax` (double, px) — not in callback args; computed inside `okvis::Frontend` keyframe selection.
- `mre_px` (double, mean reprojection error) — not in callback args; ceres optimisation byproduct on the realtimeGraph.
## Scope
1. **Patch file** under `cpp/okvis2/patches/expose_tracking_stats.patch` (or merge into a single combined patch with AZ-951's covariance patch — scheduler decides):
- Add `void ViSlamBackend::getLatestTrackingStats(StateId id, int& tracked, int& newCount, int& lost, double& meanParallaxPx, double& mreReprojectionPx) const` — reads from the relevant private members (`frontend_` / `realtimeGraph_` / `multiFrames_`) via a single batched accessor.
- `tracked` = count of landmark observations for state `id` that were also observed in the most recent prior keyframe.
- `newCount` = count of landmark observations for state `id` that were NOT observed in any prior frame.
- `lost` = count of landmarks observed in the prior keyframe but absent from `id`.
- `meanParallaxPx` = mean keypoint pixel displacement between `id` and the most recent prior keyframe, over the `tracked` matched set.
- `mreReprojectionPx` = mean per-observation reprojection residual from the realtimeGraph optimisation, over all observations attached to `id`.
2. **CMake glue**: same mechanism as AZ-951's covariance patch (vendored header overrides vs in-place git apply vs forked submodule — decided at AZ-951 scheduling). If AZ-951 lands first, this ticket reuses that mechanism; if scheduled in parallel, mechanism is decided together.
3. **Verification path**: compile-clean evidence comes via AZ-944 (Linux CI BUILD_OKVIS2=ON flip). Local macOS gets no compile evidence (project policy).
## Acceptance Criteria
- **AC-1**: `cpp/okvis2/patches/expose_tracking_stats.patch` (or the combined patch with AZ-951's content) exists, applies cleanly to the vendored upstream at `cpp/okvis2/upstream/`, and the total patch surface across this + AZ-951 stays ≤ 200 lines of diff (keeps future rebase cost low).
- **AC-2**: After patch application, `okvis::ViSlamBackend::getLatestTrackingStats(...)` is a public method that fills the five out-params with finite values for any valid `StateId` of a state that has been through realtimeGraph optimisation. For state IDs that have not yet been optimised, all five are set to documented sentinel values (zeros + warning log).
- **AC-3**: All five values are computed from the realtimeGraph's actual matched-observation set; no placeholders, no defaults. Code comment in the patch explains the derivation for each field.
- **AC-4**: ADR-XXX from AZ-951 is updated to cite this ticket alongside the covariance accessor work, so the deviation-from-pin rationale documents the FULL telemetry exposure surface.
- **AC-5**: Local task spec for AZ-943 is updated to call `backend().getLatestTrackingStats(state.id, ...)` inside the `setOptimisedGraphCallback` lambda (the AZ-943 implementation, post-unblock).
## Open Decisions (resolve at scheduling)
1. **Combined vs separate patch file**: ship as `expose_telemetry.patch` (one file covering both AZ-951's covariance + this ticket's tracking-stats) or as two separate `.patch` files. Default proposal: one combined patch with two logical commits inside it.
2. **Sentinel semantics for unoptimised states**: zeros + warning log (proposed default) vs raise OKVIS2-side exception (binding rethrows as `OkvisOptimizationException`).
3. **Parallax denominator edge case**: when `tracked == 0` (no matches at all), `meanParallaxPx` is undefined. Proposed default: emit NaN + warning log; binding then short-circuits to DEGRADED health state. Scheduler may choose 0.0 instead.
## Out of scope
- 6×6 pose covariance accessor — covered by AZ-951 (sibling ticket; same ADR, same patch mechanism).
- The ADR creation itself — owned by AZ-951; this ticket extends the ADR's scope rather than creating a separate one.
- Submitting the patch to upstream OKVIS2 maintainers (tracked as a follow-up issue after both tickets land locally).
- The downstream AZ-943 binding update — owned by AZ-943, which is currently blocked-by this ticket.
## References
- AZ-943 implementation attempt (2026-05-29): proved the five tracking-stats fields have no source in OKVIS2 v2's public callback surface.
- AZ-592 line 24 (incorrect spec assumption): "derive tracked_features + mean_parallax from TrackingState" — superseded by this ticket; TrackingState does NOT carry these.
- `cpp/okvis2/upstream/okvis_common/include/okvis/ViInterface.hpp` lines 167-174: TrackingState definition (only 6 fields, none of them tracking-stats).
- `cpp/okvis2/upstream/okvis_ceres/include/okvis/ViSlamBackend.hpp`: no public tracking-stats accessor anywhere.
- AZ-848 (jetson_eskf_out_of_order_regression): downstream EKF assumes finite, computed VIO telemetry; placeholders would mislead its diagnostic surface.
- meta-rule.mdc "Real Results, Not Simulated Ones" — the constraint that forces this path over a placeholder.
- Sibling ticket: AZ-951 (covariance + ADR).
+3 -3
View File
@@ -6,9 +6,9 @@ step: 10
name: Implement
status: in_progress
sub_step:
phase: 7
name: batch-loop
detail: "batch 4 closed (AZ-842); next: AZ-897 or AZ-943"
phase: 6
name: implement-tasks
detail: "batch 5 of N: AZ-897 replay UI web form (pivoted from AZ-943 — paused, blocked on AZ-951+AZ-952 per 2026-05-29 spec-reality gap)"
retry_count: 0
cycle: 4
tracker: jira