[AZ-331] C1 VioStrategy: Protocol + DTOs + factory + C5 migration

Freezes the c1_vio Public API per
_docs/02_document/contracts/c1_vio/vio_strategy_protocol.md v1.0.0:

- VioStrategy Protocol (4 methods: process_frame, reset_to_warm_start,
  health_snapshot, current_strategy_label) in
  components/c1_vio/interface.py.
- DTOs (VioOutput, VioHealth, FeatureQuality, WarmStartPose) + VioState
  enum in _types/nav.py — L1 placement so C5 + C13 consume them without
  crossing the components.* boundary (AZ-270 AC-6). The new VioOutput
  shape (frame_id: str, relative_pose_T: gtsam.Pose3,
  pose_covariance_6x6, imu_bias, feature_quality, emitted_at_ns)
  replaces the AZ-263 scaffolding in _types/vio.py, which is now
  deleted.
- VioError family (VioInitializingError / VioDegradedError /
  VioFatalError) in components/c1_vio/errors.py. Documented
  rationale: the degraded-operation path returns a VioOutput with
  inflated covariance + VioHealth.state=DEGRADED rather than raising
  VioDegradedError — the error type exists only for the rare
  degraded->fatal transition.
- C1VioConfig per-component config block (strategy enum,
  lost_frame_threshold default 9, warm_start_max_frames default 5)
  with constructor-time validation rejecting unknown strategy labels.
- StrategyNotAvailableError added to runtime_root/errors.py;
  composition-time error distinct from the VioError family.
- Composition-root factory build_vio_strategy in
  runtime_root/vio_factory.py with three BUILD_* gates (BUILD_OKVIS2,
  BUILD_VINS_MONO, BUILD_KLT_RANSAC). Concrete strategy modules are
  imported lazily via __import__ AFTER the flag check — Tier-0
  workstation builds with the flag OFF MUST NOT load the strategy
  module (Risk-2 / I-5; verifiable via sys.modules).
- 36 conformance tests cover all 9 ACs + NFR-perf-factory
  (p99 build under 200 ms x 1000 calls) + NFR-reliability-error-family.
  AC-8 introspects the contract file's Shape table and asserts method
  parity against the runtime Protocol; AC-9 asserts the frame_id
  annotation is 'str' (PEP-563 stringified).

C5 migration (consumers of the new VioOutput shape):
- gtsam_isam2_estimator.py + eskf_baseline.py: replaced
  vio.timestamp -> vio.emitted_at_ns (drops _datetime_to_ns on the
  VIO path), vio.pose_se3 -> vio.relative_pose_T (gtsam.Pose3 direct;
  drops _pose_se3_to_gtsam / _pose_se3_to_array), vio.covariance_6x6
  -> vio.pose_covariance_6x6 (rename).
- key_for_frame signature widened to UUID | int | str to accept the
  new str frame_id.
- 4 C5 test files migrated to the new VioOutput shape with helper
  fixtures producing ImuBias + FeatureQuality + str frame_id.
- c5_state/interface.py TYPE_CHECKING import path updated.

Bootstrap healthcheck + test_types_importable updated to drop the
deleted _types/vio module and pick up _types/inference (AZ-297) in
the same sweep.

Full unit-test sweep: 884 passed, 2 pre-existing environment skips
(cmake, actionlint).

Co-authored-by: Cursor <cursoragent@cursor.com>
This commit is contained in:
Oleksandr Bezdieniezhnykh
2026-05-12 04:44:31 +03:00
parent daff5d4d1c
commit 6c7d24f7e0
20 changed files with 1027 additions and 81 deletions
@@ -0,0 +1,182 @@
# C1 VioStrategy Protocol + Composition-Root Selection
**Task**: AZ-331_c1_vio_strategy_protocol
**Name**: C1 VioStrategy Protocol
**Description**: Define the `VioStrategy` Protocol, its DTOs (`WarmStartPose`, `VioOutput`, `VioHealth`, `FeatureQuality`, `VioState` enum), the runtime error taxonomy (`VioError` family), and the composition-root selection switch that wires exactly one of `Okvis2Strategy` / `VinsMonoStrategy` / `KltRansacStrategy` at startup based on ADR-001 (config) and ADR-002 (`BUILD_OKVIS2` / `BUILD_VINS_MONO` / `BUILD_KLT_RANSAC` flags). This is the foundational shared-API task for E-C1 — every other E-C1 strategy task implements this Protocol; C5 state estimator tasks (AZ-260) consume `VioOutput`; C13 FDR writer tasks (AZ-248) consume `VioHealth`; the warm-start + F8 reboot recovery wiring task (this same epic) invokes `reset_to_warm_start`.
**Complexity**: 3 points
**Dependencies**: AZ-263_initial_structure, AZ-269_config_loader, AZ-266_log_module, AZ-270_compose_root, AZ-272_fdr_record_schema, AZ-276_imu_preintegrator, AZ-277_se3_utils
**Component**: c1_vio (epic AZ-254 / E-C1)
**Tracker**: AZ-331
**Epic**: AZ-254 (E-C1)
### Document Dependencies
- `_docs/02_document/contracts/c1_vio/vio_strategy_protocol.md` — frozen public interface this task produces.
- `_docs/02_document/contracts/shared_helpers/imu_preintegrator.md` — referenced for the IMU substrate every strategy consumes (AZ-276).
- `_docs/02_document/contracts/shared_helpers/se3_utils.md``SE3` math primitives used by `WarmStartPose` and `VioOutput.relative_pose_T` (AZ-277).
- `_docs/02_document/contracts/shared_config/composition_root_protocol.md``Config` extension for the new `config.vio.strategy` enum.
- `_docs/02_document/components/01_c1_vio/description.md` — § 1 high-level overview and § 2 internal interfaces (the Protocol's source of truth).
## Problem
Three different concrete VIO backends (OKVIS2 production-default, VINS-Mono research-only, KLT/RANSAC mandatory simple-baseline per ADR-002 engine rule) plus three external consumers (C5 state estimator, C13 FDR writer, runtime_root composition) all need a single, frozen interface to per-frame VIO output. Without it:
- Each consumer would import a concrete OKVIS2 / VINS-Mono / KLT-RANSAC class directly, hard-coding the runtime choice and breaking ADR-001's runtime selectability.
- `BUILD_OKVIS2=OFF` (Tier-0 workstation without OKVIS2 native libs installed) would not import because consumers depend on OKVIS2-specific symbols.
- The composition root would have to know per-component which strategy is acceptable; today only ADR-001 (config) + ADR-002 (`BUILD_*` flags) decide.
- Error handling would diverge per backend; `VioFatalError` would have a different shape per implementation, making the AC-5.2 fallback path fragile (3 s no-estimate timer expects a uniform error).
- C5 fusion would have to handle a different `VioOutput` shape per backend; the iSAM2 graph's `BetweenFactorPose3` insertion path assumes a fixed 6×6 covariance shape.
- The honest-covariance contract — strategies MUST NOT tighten covariance during a degradation event — would have no canonical surface to enforce; AC-1.4 drift becomes silent.
This task delivers the typed boundary every consumer reads against and every strategy conforms to. It writes no per-frame VIO logic — the concrete backends are separate tasks in this epic.
## Outcome
- A `VioStrategy` Protocol (PEP 544 `typing.Protocol`, `runtime_checkable=True`) is exported from `src/gps_denied_onboard/components/c1_vio/interface.py` and re-exported from the component's `__init__.py`.
- The DTOs `WarmStartPose`, `VioOutput`, `VioHealth`, `FeatureQuality` are stdlib `@dataclass(frozen=True)`; `VioState` is a `str`-Enum. `VioOutput` and `VioHealth` are placed in `src/gps_denied_onboard/_types/nav.py` for cross-component access (C5, C13); `WarmStartPose`, `FeatureQuality`, `VioState`, and the `VioStrategy` Protocol live in the c1_vio component.
- The runtime error taxonomy is a single hierarchy under `gps_denied_onboard.components.c1_vio.errors`: `VioError` ← {`VioInitializingError`, `VioDegradedError`, `VioFatalError`}. Every strategy raises only these; consumers catch only these.
- The composition root has a `build_vio_strategy(config: Config, *, fdr_client: FdrClient) -> VioStrategy` factory function at `src/gps_denied_onboard/runtime_root/vio_factory.py` that selects the strategy by `config.vio.strategy` (`okvis2 | vins_mono | klt_ransac`) and respects compile-time `BUILD_*` gating: requesting a strategy whose `BUILD_*` flag is OFF raises `StrategyNotAvailableError` at composition time (NOT at first frame).
- Concrete strategy modules are lazy-imported inside the factory under `if BUILD_OKVIS2: from c1_vio.okvis2 import Okvis2Strategy` so a Tier-0 workstation build that lacks OKVIS2 native libs still composes successfully when only KLT/RANSAC is requested.
- Every strategy's `current_strategy_label()` returns the lowercase label matching the config value (`"okvis2"`, `"vins_mono"`, `"klt_ransac"`); this is the FDR-stamped label for AC-NEW-3 audit.
- A `ConfigSchemaError` extension to AZ-269's config loader for the new `config.vio.strategy` enum + `config.vio.lost_frame_threshold` (default 9; consecutive-LOST frames before `VioFatalError` is raised) + `config.vio.warm_start_max_frames` (default 5; convergence budget after `reset_to_warm_start`).
- A frozen contract file at `_docs/02_document/contracts/c1_vio/vio_strategy_protocol.md` (already drafted alongside this task) carries the full shape; consumers read that file, not this task spec.
## Scope
### Included
- `VioStrategy` Protocol with the four methods from `_docs/02_document/components/01_c1_vio/description.md` § 2: `process_frame`, `reset_to_warm_start`, `health_snapshot`, `current_strategy_label`.
- DTO dataclasses for `WarmStartPose`, `VioOutput`, `VioHealth`, `FeatureQuality`. All `frozen=True`. The `VioState` enum.
- Error hierarchy under `c1_vio.errors`: every error type the Protocol promises; all derived from a common `VioError` so consumers can catch the family.
- Placement of `VioOutput` and `VioHealth` in `src/gps_denied_onboard/_types/nav.py` (per `module-layout.md` cross-component DTO rule); the rest of the surface lives in `components/c1_vio/`.
- `build_vio_strategy(config, *, fdr_client) -> VioStrategy` composition-root factory in `src/gps_denied_onboard/runtime_root/vio_factory.py`. Imports the concrete strategy lazily — guarded by `if BUILD_OKVIS2: from c1_vio.okvis2 import Okvis2Strategy` so an OFF flag does not force a native-lib import.
- A `StrategyNotAvailableError` raised by the factory when the requested strategy is not built into this binary, with a message naming the missing `BUILD_*` flag.
- Config schema extension to AZ-269's loader for the three new fields (`config.vio.strategy`, `config.vio.lost_frame_threshold`, `config.vio.warm_start_max_frames`) with enum validation at load time.
- The contract file at `_docs/02_document/contracts/c1_vio/vio_strategy_protocol.md` filled per `decompose/templates/api-contract.md` with Shape, Invariants, Non-Goals, Versioning Rules, and at least three Test Cases (already drafted as part of this task).
- Type-only unit tests that verify each strategy module's class actually conforms to the Protocol via `runtime_checkable` + `isinstance` (catches drift at CI time, not deployment).
### Excluded
- `Okvis2Strategy` implementation — separate task in this epic.
- `VinsMonoStrategy` implementation — separate task in this epic.
- `KltRansacStrategy` implementation — separate task in this epic.
- Warm-start hint persistence (write to disk after takeoff, read after F8 reboot) — separate task in this epic; this task only defines the in-memory `WarmStartPose` DTO and the `reset_to_warm_start` Protocol method.
- IMU preintegration — owned by AZ-276 (`helpers.imu_preintegrator`); strategies feed `ImuWindow` to the helper, they do not implement preintegration here.
- C5 fusion wiring — owned by E-C5 (AZ-260).
- C13 FDR consumer wiring of `VioHealth` — owned by E-C13 (AZ-248).
## Acceptance Criteria
**AC-1: Protocol is conformance-checkable**
Given a class that implements all four Protocol methods with matching signatures
When `isinstance(impl, VioStrategy)` is evaluated under `runtime_checkable`
Then the result is `True`; for a class that omits any method or has a wrong signature, the result is `False`
**AC-2: Frozen DTOs reject mutation**
Given a constructed `VioOutput(...)`, `VioHealth(...)`, `WarmStartPose(...)`, or `FeatureQuality(...)` instance
When the test attempts to reassign any field
Then `dataclasses.FrozenInstanceError` is raised; the original value is preserved
**AC-3: Error hierarchy catchable as a single family**
Given any of the three documented error subtypes
When the consumer wraps a strategy call in `try: ... except c1_vio.errors.VioError`
Then every documented subtype is caught; an unrelated `Exception` (e.g., `ValueError`) is NOT caught (the Protocol's error envelope does not leak into general exception handling)
**AC-4: Composition-root factory honours config**
Given `config.vio.strategy = "okvis2"` and `BUILD_OKVIS2=ON`
When `build_vio_strategy(config, fdr_client=fake_client)` is called
Then an `Okvis2Strategy` instance is returned and `instance.current_strategy_label() == "okvis2"`
**AC-5: Composition-root factory honours BUILD flag gate**
Given `config.vio.strategy = "vins_mono"` and `BUILD_VINS_MONO=OFF`
When `build_vio_strategy(config, fdr_client=fake_client)` is called
Then `StrategyNotAvailableError` is raised at composition time with a message naming `"vins_mono"` AND the missing `BUILD_VINS_MONO` flag; no module-level import of `vins_mono` symbols has occurred (verifiable via `sys.modules` having no `gps_denied_onboard.components.c1_vio.vins_mono` entry)
**AC-6: Unknown strategy label rejected at config load**
Given `config.vio.strategy = "openvslam"` (not in the enum)
When the config is loaded via AZ-269's loader
Then `ConfigSchemaError` is raised at load time with a message listing the valid values (`okvis2 | vins_mono | klt_ransac`); `build_vio_strategy` is never reached
**AC-7: `current_strategy_label()` matches config value exactly**
Given any selectable strategy
When `instance.current_strategy_label()` is called
Then the returned string is one of `"okvis2"`, `"vins_mono"`, `"klt_ransac"` and equals `config.vio.strategy`; AC-NEW-3 audit relies on this exact-match property
**AC-8: Contract file matches Protocol shape**
Given the contract file at `_docs/02_document/contracts/c1_vio/vio_strategy_protocol.md`
When a contract-test parses the Shape section's method/field tables and compares against the runtime Protocol via introspection
Then every method, every field, every error type is present and consistent in both
**AC-9: `VioOutput.frame_id` echo invariant typed**
Given the DTO definitions
When inspecting `VioOutput.frame_id`'s type and the Protocol's docstring
Then `frame_id` is typed `str` and the docstring states "MUST equal `NavCameraFrame.frame_id` from the input frame"; AC-9 is enforced behaviourally by the strategy implementations and contract-tested in Step 9
## Non-Functional Requirements
**Compatibility**
- The Protocol is `typing.Protocol` (PEP 544 structural typing) so existing components that import a concrete VIO class today (none yet — this is greenfield) can be retrofitted without inheritance changes.
- All error types subclass `Exception` (not `BaseException`) so `except Exception:` in upstream layers continues to work as expected.
**Performance**
- The factory `build_vio_strategy` returns within 200 ms (it imports + constructs one strategy; native-library bring-up time inside OKVIS2 / VINS-Mono is amortised against this; the 200 ms is a sanity bound for the factory dispatch logic itself, not for OKVIS2 backend init).
- DTO construction (`VioOutput`, `VioHealth`, `WarmStartPose`) is dataclass-frozen; per-instance overhead is the bare-cost dataclass `__init__`.
**Reliability**
- The Protocol is the boundary of acceptable runtime errors. Strategies MUST NOT raise other types into consumers; if a third-party library (OpenCV, OKVIS2, VINS-Mono, GTSAM) raises something else, the strategy catches and rewraps into `VioError` family.
- Versioning: any breaking change to the Protocol or its DTOs MUST bump the contract file's `Version` and notify every consumer task listed in the contract header.
## Unit Tests
| AC Ref | What to Test | Required Outcome |
|--------|-------------|-----------------|
| AC-1 | `runtime_checkable` Protocol vs. a fully-implementing fake; vs. a fake missing one method | `isinstance` returns True for full, False for partial |
| AC-2 | Mutation attempt on each frozen DTO | `FrozenInstanceError` raised; original value preserved |
| AC-3 | Raise each of the three error subtypes; catch as `c1_vio.errors.VioError` | All caught; an unrelated `ValueError` is NOT caught by the same handler |
| AC-4 | `build_vio_strategy` with `okvis2` + flag ON → fake `Okvis2Strategy` | Returned instance is `Okvis2Strategy`; `current_strategy_label()` == `"okvis2"` |
| AC-5 | `build_vio_strategy` with `vins_mono` + flag OFF | `StrategyNotAvailableError`; `sys.modules` does NOT contain `c1_vio.vins_mono` |
| AC-6 | Config load with invalid `vio.strategy` value | `ConfigSchemaError`; valid values listed in message |
| AC-7 | `current_strategy_label()` for each strategy | Matches the config value used to construct it |
| AC-8 | Contract introspection vs. Protocol introspection | Shape parity test passes |
| AC-9 | Inspect `VioOutput.frame_id` type and docstring | `str` type; echo-invariant noted in docstring |
| NFR-perf-factory | Microbench `build_vio_strategy` × 1000 (with each strategy mocked) | p99 ≤ 200 ms (dominated by lazy import on first call; subsequent calls << 1 ms) |
| NFR-reliability-error-family | All three subtypes inherit from `c1_vio.errors.VioError` | Verified via `issubclass` for each |
## Constraints
- The Protocol uses `typing.Protocol` from stdlib; no third-party Protocol library is introduced.
- DTO dataclasses use stdlib `dataclasses` with `frozen=True`; no `pydantic` or `attrs` dependency.
- Lazy import of concrete strategies is mandatory. The factory's `if BUILD_OKVIS2: from c1_vio.okvis2 import Okvis2Strategy` block is not optional — it is the mechanism by which Tier-0 workstation builds compose without OKVIS2 native libs installed.
- The contract file at `_docs/02_document/contracts/c1_vio/vio_strategy_protocol.md` is the source of truth. If the Protocol shape changes here without the contract updating, that is a Spec-Gap finding (High) per code-review skill Phase 2.
- This task does NOT add new third-party dependencies — `typing.Protocol`, `dataclasses`, `enum` are stdlib.
- `VioOutput` and `VioHealth` MUST be placed in `src/gps_denied_onboard/_types/nav.py` per `module-layout.md` (cross-component DTOs live under `_types`); placing them inside `components/c1_vio/` would force C5 and C13 to import a Layer-3 module from another Layer-3 module, violating the layering table.
## Risks & Mitigation
**Risk 1: Protocol drift between contract and code**
- *Risk*: Strategy implementations diverge from the contract over time; consumers cannot tell which is canonical.
- *Mitigation*: AC-8 contract-introspection test runs in CI; any drift fails the test before merge. The contract file's `## Test Cases` section names this exact test.
**Risk 2: Lazy-import gating is bypassed by a transitively-imported module**
- *Risk*: A consumer imports `c1_vio` (the package) and the package's `__init__.py` eagerly imports a concrete strategy, triggering the OKVIS2 native-lib import even when `BUILD_OKVIS2=OFF`.
- *Mitigation*: The package `__init__.py` re-exports ONLY the Protocol and DTOs and errors — it does NOT import any concrete strategy. AC-5 verifies via `sys.modules` that no strategy module is loaded during a Tier-0 factory call.
**Risk 3: `VioDegradedError` is misused as a hard error**
- *Risk*: A strategy implementation interprets degraded operation as a raise condition and stops emitting `VioOutput`; C5 fusion thinks VIO is dead and falls back to FC IMU prematurely, missing AC-1.3 drift bound during recoverable degradations.
- *Mitigation*: Contract file Invariants section explicitly states degraded operation returns `VioOutput` with inflated covariance + `VioHealth.state = DEGRADED`; the error type exists only for the rare degraded-to-fatal transition. Strategy task specs (separate tasks in this epic) cite this in their Constraints.
**Risk 4: Error hierarchy widens silently**
- *Risk*: A future strategy adds a fourth error type without updating the contract or the family base class.
- *Mitigation*: The contract file lists the canonical three. Implementations MUST raise only members of `c1_vio.errors.VioError`; a strategy raising a non-family error is a Spec-Gap finding (High) at code-review time. AC-3's catch-as-family test catches the obvious case.
## Runtime Completeness
- **Named capability**: typed Protocol + DTOs + error envelope + composition-root selection (architecture / E-C1 / ADR-001 + ADR-002 + ADR-009).
- **Production code that must exist**: real Protocol declaration, real frozen DTOs, real error hierarchy, real composition-root factory with lazy-import gating, real config-loader extension for the strategy enum + thresholds.
- **Allowed external stubs**: tests MAY substitute fake strategy classes that conform to the Protocol; production wiring uses the real strategies from this epic's other tasks.
- **Unacceptable substitutes**: ABCs instead of `typing.Protocol` (would force inheritance changes downstream), `pydantic.BaseModel` instead of `@dataclass(frozen=True)` (would add a runtime validation layer this task does not need), eager imports of concrete strategies in `__init__.py` (would defeat `BUILD_*` gating), or a `strategy: str` config field without an enum (would lose the load-time validation in AC-6).
## Contract
This task produces/implements the contract at `_docs/02_document/contracts/c1_vio/vio_strategy_protocol.md`.
Consumers MUST read that file — not this task spec — to discover the interface.