# AZ-332: Implementation plan (batch 23, cycle 1)

**Date created**: 2026-05-12 (carry-over from `/autodev` session 2026-05-12 morning)
**Owner**: next `/autodev` invocation starting from Step 7 Implement sub_step `compute-next-batch`
**Scope of this doc**: a concrete, in-order playbook for the next session. Reading this plus the task spec at `_docs/02_tasks/todo/AZ-332_c1_okvis2_strategy.md` is sufficient to resume; no other re-discovery is needed.

---

## Why this is its own plan doc

AZ-332 (C1 OKVIS2 production-default VIO) is the first task in this project to require a native C++ build chain (OKVIS2 + pybind11). The previous session researched paths, surfaced blockers, and landed on a decomposition that splits work across three build environments. That decomposition has to survive the session boundary, hence this file.

## Decisions locked in the previous session

1. **No ROS 2 layer.** A `colcon` build of OKVIS2 produces the same libraries as standalone CMake plus a ROS 2 node we do not need; ROS 2 runtime IPC was rejected at Plan time (`_docs/01_solution/solution.md` § D-C1-1-SUB-A: "Rejected (cost + latency budget conflict)"). Build with **standalone CMake**.
2. **No Python re-implementation of OKVIS2.** Forbidden by the task spec ("Unacceptable substitutes" section). Pure-Python VIO violates the C1-PT-01 ≤ 80 ms p95 budget by construction.
3. **No alternative VIO substitution.** Every C++ VIO candidate (OpenVINS, VINS-Mono, Kimera-VIO) has the same compile-on-macOS problem. The only Python-native candidates (DPVO, KLT+RANSAC) are mono-VO only, so they are not drop-ins for a VIO contract. AZ-332 stays OKVIS2.
4. **Three-environment dev split**:

| Environment | What runs there | What it gates |
|---|---|---|
| macOS dev | Python facade + binding C++ editing; unit tests using the fake `_native.okvis2_binding` (task spec explicitly allows this for tests) | AC-1, AC-2, AC-3, AC-4, AC-5, AC-6, AC-7, AC-8, AC-10 |
| Ubuntu CI runner (`ci.yml`) | Native CMake build of vendored OKVIS2 + binding `.so` | Build-passes gate; no AC validation here |
| Self-hosted Jetson runner (`ci-tier2.yml`) | Real-OKVIS2 perf + honest-covariance tests | AC-9 (honest covariance monotonicity); NFR-perf p95 ≤ 80 ms |

This split honours the task spec ("real `Okvis2Strategy` calling real C7 `InferenceRuntime` with real TRT-compiled DISK engine") because the production binary IS the real binding compiled on Linux/Jetson; only the dev-side unit tests use the fake. The fake never ships to production.

## Concrete step-by-step for next session (in order; each step has a stop-and-verify gate)

### Step 0: re-entry sanity check (1 min)

- Read `_docs/_autodev_state.md`: confirm step 7 / sub_step `compute-next-batch` / detail points here.
- Read this doc fully.
- Read `_docs/02_tasks/todo/AZ-332_c1_okvis2_strategy.md` once.
- `git status --porcelain` must print nothing (implement skill prerequisite).

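The clean-tree prerequisite above can be checked from a script rather than by eye. The following is a minimal sketch; the helper names `porcelain_is_clean` and `working_tree_is_clean` are hypothetical and not part of the implement skill.

```python
import subprocess


def porcelain_is_clean(porcelain_output: str) -> bool:
    """Return True when `git status --porcelain` printed no entries."""
    return porcelain_output.strip() == ""


def working_tree_is_clean(repo_dir: str = ".") -> bool:
    """Run `git status --porcelain` in repo_dir and report whether it is empty."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,  # raise if git itself fails (for example, not a repository)
    )
    return porcelain_is_clean(result.stdout)
```

Keeping the parsing in a pure function means the check can be unit-tested without a repository on disk.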
### Step 1: vendor OKVIS2 and pybind11 as git submodules (5–10 min)

- `git submodule add --depth 1 https://github.com/smartroboticslab/okvis2.git cpp/okvis2/upstream`, followed by `git submodule update --init --recursive cpp/okvis2/upstream` to pull any nested submodules (`git submodule add` does not accept `--recurse-submodules`; that flag belongs to `git clone`).
  - Note: submodule path is `cpp/okvis2/upstream/` (not `cpp/okvis2/` directly) so the existing `cpp/okvis2/CMakeLists.txt` keeps its project-owned role and `add_subdirectory(upstream)` pulls in OKVIS2.
- `git submodule add --depth 1 https://github.com/pybind/pybind11.git cpp/pybind11/upstream`
  - Same pattern: existing `cpp/pybind11/` directory keeps the project README; submodule lives at `cpp/pybind11/upstream/`.
  - Delete the `.gitkeep` and placeholder `README.md` from `cpp/pybind11/` once the submodule is in place (or keep them; they're harmless either way; pick one and stay consistent).
- Pin a known-good commit hash for OKVIS2 (record it in this doc under "Pinned upstream versions" once chosen). Recommendation: pin to the latest `main` HEAD at the time of submodule add and document the commit short-hash here.
- **Gate**: `git submodule status` shows both submodules with a SHA; `git status` clean except `.gitmodules` + submodule entries.

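The Step 1 gate can be scripted by parsing `git submodule status`, whose lines start with a one-character flag (space for in sync, `-` for not initialized, `+` for a checked-out commit that differs from the index, `U` for merge conflicts), followed by the SHA and the path. This is a sketch only: the function names are hypothetical, and it assumes submodule paths contain no spaces.

```python
from typing import NamedTuple


class SubmoduleState(NamedTuple):
    flag: str  # ' ', '-', '+', or 'U' as printed by `git submodule status`
    sha: str
    path: str


def parse_submodule_status(output: str) -> list[SubmoduleState]:
    """Parse the text printed by `git submodule status`."""
    states = []
    for line in output.splitlines():
        if not line.strip():
            continue
        flag = line[0]
        fields = line[1:].split()  # sha, path, optional "(describe)"
        states.append(SubmoduleState(flag, fields[0], fields[1]))
    return states


def gate_passes(output: str, expected_paths: list[str]) -> bool:
    """True when every expected submodule is listed and flagged as in sync."""
    by_path = {state.path: state for state in parse_submodule_status(output)}
    return all(
        path in by_path and by_path[path].flag == " " for path in expected_paths
    )
```

For this plan the expected paths would be `cpp/okvis2/upstream` and `cpp/pybind11/upstream`.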
### Step 2: write CMake glue (15–30 min)

Files to write:

- `cpp/okvis2/CMakeLists.txt` (replace existing placeholder):
  - An early-exit guard at the top: `if(NOT BUILD_OKVIS2)`, `return()`, `endif()` (one command per line in the actual file; CMake does not accept several commands on one line).
  - `add_subdirectory(upstream EXCLUDE_FROM_ALL)` with OKVIS2's `USE_NN=OFF` to drop the LibTorch dep (per Fact #39: keyframe arch tolerates this).
  - `find_package` the native deps OKVIS2 needs (Eigen3, Boost, glog, gflags, SuiteSparse, Ceres, OpenCV; every one is an apt package on Ubuntu and a brew formula on macOS).
  - `add_subdirectory(${CMAKE_SOURCE_DIR}/cpp/pybind11/upstream pybind11_build)`.
  - `pybind11_add_module(okvis2_binding ${CMAKE_CURRENT_SOURCE_DIR}/../../src/gps_denied_onboard/components/c1_vio/_native/okvis2_binding.cpp)` (note the path back to the Python tree).
  - `target_link_libraries(okvis2_binding PRIVATE okvis::Estimator okvis::Common ...)` (the exact target names come from OKVIS2's CMake exports; verify by running `cmake --build build --target help | grep okvis` once the submodule is in).
  - `install(TARGETS okvis2_binding DESTINATION ${CMAKE_INSTALL_LIBDIR}/gps_denied_onboard/components/c1_vio/_native/)`.
- `cpp/pybind11/CMakeLists.txt` (replace existing placeholder): can stay nearly empty, since pybind11 is included by `cpp/okvis2/CMakeLists.txt` via `add_subdirectory`.

The existing top-level `cpp/CMakeLists.txt` already has `add_subdirectory(okvis2)` gated on `BUILD_OKVIS2 OR BUILD_VINS_MONO OR BUILD_KLT_RANSAC`, so no change is needed there.

**Gate**: `cmake -S . -B build -DBUILD_OKVIS2=OFF` succeeds on macOS (no-op build with the flag off). The OFF path is what protects the rest of the build from any of this new wiring.

### Step 3: write the pybind11 binding C++ skeleton (1–2 h)

File: `src/gps_denied_onboard/components/c1_vio/_native/okvis2_binding.cpp`

Surface needed (mirrors the Python facade's needs, not the full OKVIS2 API):

- `Okvis2Backend` class with:
  - ctor from YAML config string + camera intrinsics dict;
  - `add_frame(frame_id: str, ns_ts: int, image: ndarray[uint8, H, W, C]) -> bool`;
  - `add_imu(ns_ts: int, accel: ndarray[float64, 3], gyro: ndarray[float64, 3]) -> None`;
  - `get_latest_output() -> dict | None` (returns frame_id + 4x4 pose matrix + 6x6 covariance + bias + feature_quality dict + emitted_at_ns);
  - `reset(body_T_world: ndarray[float64, 4, 4], velocity: ndarray[float64, 3], accel_bias: ndarray[float64, 3], gyro_bias: ndarray[float64, 3]) -> None`;
  - `health() -> dict` (returns `{state: str, consecutive_lost: int, bias_norm: float}`).
- Exceptions: every OKVIS / Eigen / `std::runtime_error` is caught inside binding methods and rethrown as one of a fixed set of Python exceptions registered via `py::register_exception`; the Python facade then catches those and rewraps them into the `VioError` family.
- Zero-copy pathway: `image` is `py::array_t<uint8_t, py::array::c_style | py::array::forcecast>`, so an input that is already a C-contiguous `uint8` array reaches DISK ingest without a copy. With `forcecast`, any other dtype or layout is silently converted, which does copy.

This is a skeleton: full OKVIS2 estimator wiring (`okvis::ThreadedKFVio` setup + callback plumbing) can be a follow-up commit if the skeleton + CI Linux build come back green first. Note that `ThreadedKFVio` is the OKVIS 1 class name; the OKVIS2 equivalent appears to be `okvis::ThreadedSlam`, so confirm against the pinned upstream headers.

**Gate**: compiles inside the OKVIS2 CMake target. Tested on Ubuntu CI runner (not macOS).

### Step 4: write the Python facade `okvis2.py` (1–2 h)

File: `src/gps_denied_onboard/components/c1_vio/okvis2.py`

- `Okvis2Strategy` class implementing the `VioStrategy` Protocol from `interface.py`.
- Lazy import of `_native.okvis2_binding` inside a function or method body (NOT at module top level; that's the I-5 / Risk-2 mitigation, and AZ-331's `test_ac5_build_vio_strategy_flag_off_no_import` asserts this and MUST still pass).
- Constructor signature: `__init__(self, config: Config, *, fdr_client: FdrClient)`, matching the AZ-331 factory's call shape exactly. Inside the constructor: build the `ImuPreintegrator` from `helpers.imu_preintegrator.make_imu_preintegrator(calibration)`; build the `Okvis2Backend` from the binding; record the strategy label as `"okvis2"` (frozen per Protocol invariant).
- Map every backend exception (raised from the C++ binding's registered exception types) to the `VioError` family: `OkvisInitException → VioInitializingError`, `OkvisFatalException → VioFatalError`, `OkvisOptimizationException → VioDegradedError` (only when transitioning to fatal; the normal degraded path returns a `VioOutput` with inflated covariance per AZ-331 v1.0.0).
- `process_frame`: feed IMU samples to the preintegrator, push frame to backend, read latest output, build the `VioOutput` DTO using `gtsam.Pose3.matrix()` round-trip via `helpers.se3_utils` (AZ-277). Echo `frame_id`.
- `reset_to_warm_start`: tear down + reconstruct `Okvis2Backend` from the hint; first call must not raise (idempotency invariant per AC-4); seed bias into the preintegrator via `preintegrator.reset_with_bias(hint.bias)`.
- `health_snapshot`: pull `backend.health()` dict and wrap as `VioHealth`. Track `consecutive_lost` Python-side because the binding returns "current state" only.
- `current_strategy_label`: return the frozen `"okvis2"`.
- FDR records on state transitions via the injected `fdr_client` using the `kind="vio.health"` schema (AZ-272).

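The lazy-import and exception-mapping bullets above can be sketched as follows. This is an illustration under assumptions, not the facade itself: the class name `Okvis2StrategySketch`, the single-argument constructor, and the `push_frame` method are hypothetical simplifications, and the `VioError` classes here are local stand-ins for the project's real ones. Only the binding module path and the three exception names come from this plan.

```python
import importlib

BINDING_MODULE = "gps_denied_onboard.components.c1_vio._native.okvis2_binding"


class VioError(Exception):
    """Stand-in for the project's VioError base class."""


class VioInitializingError(VioError):
    pass


class VioDegradedError(VioError):
    pass


class VioFatalError(VioError):
    pass


# Binding exception name -> facade exception, as listed in this plan.
_EXCEPTION_NAMES = {
    "OkvisInitException": VioInitializingError,
    "OkvisOptimizationException": VioDegradedError,
    "OkvisFatalException": VioFatalError,
}


class Okvis2StrategySketch:
    def __init__(self, config_yaml: str) -> None:
        # Deferred import: runs at construction time, never at module import,
        # so importing this module does not load the native extension.
        binding = importlib.import_module(BINDING_MODULE)
        self._error_map = {
            getattr(binding, name): target
            for name, target in _EXCEPTION_NAMES.items()
        }
        self._backend = binding.Okvis2Backend(config_yaml)

    def push_frame(self, frame_id: str, ns_ts: int, image: object) -> bool:
        try:
            return self._backend.add_frame(frame_id, ns_ts, image)
        except Exception as exc:
            for source_type, target_type in self._error_map.items():
                if isinstance(exc, source_type):
                    # Chain the original so the native message is preserved.
                    raise target_type(str(exc)) from exc
            raise
```

Resolving the exception classes from the imported binding, rather than importing them at module top, keeps the no-import-when-flag-off property intact.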
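**Gate**: `mypy --strict` passes against the new file; `ruff check` passes; the isinstance check `isinstance(Okvis2Strategy(...), VioStrategy)` returns True without importing the native binding. Because constructing `Okvis2Strategy` builds the backend and therefore needs the binding, the point of this gate is the Protocol's structural conformance, not the construction itself. A runtime `isinstance` check against a Protocol also requires `VioStrategy` to be decorated with `@runtime_checkable`.
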
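One way to check structural conformance without constructing the strategy is `issubclass` against a `@runtime_checkable` Protocol, which works for Protocols that declare only methods. The sketch below uses a stand-in Protocol: the method names come from this plan, but the parameter lists and the assumption that `current_strategy_label` is a method (not a property) are guesses, not the real `interface.py`.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class VioStrategySketch(Protocol):
    """Stand-in for the VioStrategy Protocol; signatures are assumed."""

    def process_frame(self, frame: Any, imu_samples: Any) -> Any: ...

    def reset_to_warm_start(self, hint: Any) -> None: ...

    def health_snapshot(self) -> Any: ...

    def current_strategy_label(self) -> str: ...


class ConformingStrategy:
    def process_frame(self, frame: Any, imu_samples: Any) -> Any:
        return None

    def reset_to_warm_start(self, hint: Any) -> None:
        return None

    def health_snapshot(self) -> Any:
        return None

    def current_strategy_label(self) -> str:
        return "okvis2"


class MissingHealth:
    """Lacks health_snapshot, so it does not conform."""

    def process_frame(self, frame: Any, imu_samples: Any) -> Any:
        return None

    def reset_to_warm_start(self, hint: Any) -> None:
        return None

    def current_strategy_label(self) -> str:
        return "okvis2"


# Class-level check: no instance is built, so no constructor side effects run.
conforms = issubclass(ConformingStrategy, VioStrategySketch)
```

Runtime Protocol checks only verify that the named members exist; signature compatibility is still `mypy --strict`'s job.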
### Step 5: write `Okvis2Config` (15 min)

File: `src/gps_denied_onboard/components/c1_vio/config.py` (extend existing; do not duplicate `C1VioConfig`).

- Add `@dataclass(frozen=True) class Okvis2Config` with fields:
  - `keyframe_window_size: int = 15` (∈ [10, 20] per D-C5-3);
  - `keyframe_parallax_threshold_px: float = 3.0`;
  - `ransac_inlier_ratio: float = 0.5`;
  - `max_optimization_iters: int = 4`;
  - `degraded_feature_threshold: int = 30`;
  - `per_frame_debug_log: bool = False`.
- `__post_init__` validates ranges and raises `ConfigError`.
- Register the block under `config.components['c1_vio'].okvis2` (sub-block); keep `C1VioConfig` as-is at the top level.

**Gate**: `Okvis2Config(keyframe_window_size=9)` raises `ConfigError`; `Okvis2Config()` defaults pass.

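A minimal sketch of the dataclass described above, so the gate has something concrete to run against. The field names and defaults are taken from this plan. Only the `[10, 20]` window range is specified here; the other range checks are assumptions added for illustration, and `ConfigError` is a local stand-in for the project's class.

```python
from dataclasses import dataclass


class ConfigError(ValueError):
    """Stand-in for the project's ConfigError."""


@dataclass(frozen=True)
class Okvis2Config:
    keyframe_window_size: int = 15
    keyframe_parallax_threshold_px: float = 3.0
    ransac_inlier_ratio: float = 0.5
    max_optimization_iters: int = 4
    degraded_feature_threshold: int = 30
    per_frame_debug_log: bool = False

    def __post_init__(self) -> None:
        # Range stated in the plan (D-C5-3).
        if not 10 <= self.keyframe_window_size <= 20:
            raise ConfigError(
                f"keyframe_window_size must be in [10, 20], "
                f"got {self.keyframe_window_size}"
            )
        # The checks below are assumed ranges, not taken from the task spec.
        if self.keyframe_parallax_threshold_px <= 0.0:
            raise ConfigError("keyframe_parallax_threshold_px must be positive")
        if not 0.0 < self.ransac_inlier_ratio <= 1.0:
            raise ConfigError("ransac_inlier_ratio must be in (0, 1]")
        if self.max_optimization_iters < 1:
            raise ConfigError("max_optimization_iters must be at least 1")
        if self.degraded_feature_threshold < 0:
            raise ConfigError("degraded_feature_threshold must be non-negative")
```

Validation in `__post_init__` works with `frozen=True` because it only reads fields and never assigns to them.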
### Step 6: write unit tests with fake binding (1–2 h)

Files:

- `tests/unit/c1_vio/conftest.py`: a `fake_okvis2_binding` fixture that installs a `types.ModuleType` at `sys.modules['gps_denied_onboard.components.c1_vio._native.okvis2_binding']` with a scriptable `Okvis2Backend` test double. The test double exposes a `script()` method that pre-loads a queue of outputs / exceptions; `add_frame` pops from the queue. This is the "fake pybind11 binding that returns scripted `VioOutput` payloads" the task spec explicitly allows.
- `tests/unit/c1_vio/test_okvis2_strategy.py`: one test per AC (AC-1 through AC-8, AC-10). Use the fake binding fixture. AC-9 and the NFR-perf test are written here too but marked `@pytest.mark.tier2` so `pytest -m "not tier2"` (the macOS dev loop) deselects them; `ci-tier2.yml` picks them up.

**Gate**: every unit test passes on macOS with `pytest -m "not tier2" tests/unit/c1_vio/`. Full sweep (`pytest tests/`) shows the existing 1093 passing + the new tests, with the tier2-marked ones skipped on macOS.

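The scriptable test double described above might look like the following sketch. The `script()` / `add_frame` queue behaviour follows this plan; the class name `FakeOkvis2Backend`, the helper `install_fake_binding`, and the choice to store the popped item as the latest output are assumptions for illustration.

```python
import sys
import types
from collections import deque

BINDING_MODULE = "gps_denied_onboard.components.c1_vio._native.okvis2_binding"


class FakeOkvis2Backend:
    """Scriptable stand-in for the native Okvis2Backend."""

    def __init__(self, *args, **kwargs):
        self._queue = deque()
        self._latest = None

    def script(self, *items):
        """Pre-load outputs (dicts) or exceptions, consumed in order."""
        self._queue.extend(items)

    def add_frame(self, frame_id, ns_ts, image):
        # Raises IndexError if the test pushed more frames than it scripted.
        item = self._queue.popleft()
        if isinstance(item, BaseException):
            raise item
        self._latest = item
        return True

    def get_latest_output(self):
        return self._latest


def install_fake_binding(modules=sys.modules):
    """Register a module object exposing the fake backend under the binding name."""
    module = types.ModuleType(BINDING_MODULE)
    module.Okvis2Backend = FakeOkvis2Backend
    modules[BINDING_MODULE] = module
    return module


# In conftest.py, a pytest fixture would typically wrap this with
# monkeypatch.setitem(sys.modules, BINDING_MODULE, module) so the entry
# is removed again after each test.
```

Passing a scratch dict as `modules` lets the double itself be tested without touching the real `sys.modules`.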
### Step 7: update `.github/workflows/ci.yml` to install OKVIS2's Linux deps (5–10 min)

- In the `build` matrix's `deployment` and `research` kinds, add a step BEFORE `cmake -S . -B build`:

  ```yaml
  - name: Install OKVIS2 native deps
    run: |
      sudo apt-get update
      sudo apt-get install -y --no-install-recommends \
        libeigen3-dev libboost-all-dev libgoogle-glog-dev libgflags-dev \
        libsuitesparse-dev libceres-dev libopencv-dev
  ```

- Toggle `BUILD_OKVIS2` to `ON` in the `deployment` kind's `cmake_flags` (default config in `solution.md` says OKVIS2 is the production-default; the deployment matrix kind should enforce this).
- The `research` kind already has `BUILD_VINS_MONO=ON`; leave `BUILD_OKVIS2=ON` there too.

**Gate**: push branch; GitHub Actions Ubuntu runner completes the `cmake --build build --parallel` step. If OKVIS2's CMake export targets have a different name than `okvis::Estimator` / `okvis::Common`, the failure surfaces here and Step 2's `target_link_libraries` is patched. This is the only build-system feedback loop we get pre-Jetson, so exploit it.

### Step 8: AC coverage verification + code review (15–30 min)

- Verify every AC of AZ-332 maps to at least one test (skipped-with-reason counts as covered per implement skill Step 8).
- Invoke `/code-review` skill on the batch's changed files. Expected verdict: PASS or PASS_WITH_WARNINGS. Auto-fix or escalate per implement skill Step 10.

### Step 9: commit (5 min)

- One commit per implement skill Step 11: `[AZ-332] C1 Okvis2Strategy: pybind11 binding skeleton + Python facade + fake-backend tests`.
- Body of commit message documents the three-environment split (macOS dev / Ubuntu CI / Jetson tier2) and notes that AC-9 + NFR-perf are tier2-gated.

### Step 10: tracker + archive + batch report (5 min)

- Jira: AZ-332 In Progress → In Testing.
- Move `_docs/02_tasks/todo/AZ-332_c1_okvis2_strategy.md` → `_docs/02_tasks/done/`.
- Write `_docs/03_implementation/batch_23_cycle1_report.md` with the standard report shape. Include the tier2-deferred AC-9 + NFR-perf items under "Deferred to tier2 CI".
- Update `_docs/_autodev_state.md`: sub_step → next batch detection.

## Files to be created / modified (summary)

Created:

- `cpp/okvis2/upstream/` (git submodule)
- `cpp/pybind11/upstream/` (git submodule)
- `src/gps_denied_onboard/components/c1_vio/_native/okvis2_binding.cpp`
- `src/gps_denied_onboard/components/c1_vio/okvis2.py`
- `tests/unit/c1_vio/conftest.py`
- `tests/unit/c1_vio/test_okvis2_strategy.py`
- `_docs/03_implementation/batch_23_cycle1_report.md`

Modified:

- `cpp/okvis2/CMakeLists.txt` (replace placeholder)
- `cpp/pybind11/CMakeLists.txt` (replace placeholder; can stay minimal)
- `src/gps_denied_onboard/components/c1_vio/config.py` (add `Okvis2Config`)
- `.github/workflows/ci.yml` (add apt-get step; flip `BUILD_OKVIS2=ON` in deployment kind)
- `.gitmodules` (auto-edited by submodule add)
- `_docs/_autodev_state.md`
- `_docs/02_tasks/todo/AZ-332_c1_okvis2_strategy.md` (moved to done/)

## Tier2 deliverables (NOT this session; explicit follow-up)

AC-9 (honest covariance monotonicity) and the NFR-perf test (`process_frame` p95 ≤ 80 ms on Tier-2) require real OKVIS2 + Derkachi-class fixture footage on the actual Jetson hardware. They are:

- Written in `test_okvis2_strategy.py` marked `@pytest.mark.tier2`.
- Skipped on macOS dev + GitHub Actions Linux runner.
- Picked up by `ci-tier2.yml` on push to `stage` or `main`.

A remediation task (`AZ-332_tier2_validation`) is OPTIONAL: it could be tracked separately or rolled into the deferred Jetson MVE phase that D-C1-2 already scheduled. Decide at session-start time.

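The NFR-perf assertion reduces to a percentile over measured `process_frame` latencies. The sketch below uses the nearest-rank definition of p95, which is an assumption: the task spec may prescribe a different percentile method, and the helper names are hypothetical.

```python
def p95_nearest_rank(samples_ms):
    """Return the 95th percentile of samples_ms using the nearest-rank method."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    # ceil(0.95 * n) computed in integer arithmetic to avoid float rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def within_budget(samples_ms, budget_ms=80.0):
    """True when the p95 latency does not exceed the budget (80 ms per this plan)."""
    return p95_nearest_rank(samples_ms) <= budget_ms
```

A tier2 test would collect one latency per `process_frame` call on the Jetson and assert `within_budget(latencies)`.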
## Pinned upstream versions

Fill in once Step 1 is executed:

- `cpp/okvis2/upstream`: commit hash: _TBD_; OKVIS2 main branch HEAD at `<date>`
- `cpp/pybind11/upstream`: commit hash: _TBD_; pybind11 stable release tag `<version>`

## When this doc can be deleted

After AZ-332 lands and the next batch is in flight, this file is historical context. Move to `_docs/_archive/` (or delete if `_archive` doesn't exist) once Jetson tier2 CI has been green at least once on a real OKVIS2 run.