[AZ-335] C1 warm-start hint persistence + F8 reboot recovery wiring

Adds JsonSidecarWarmStartHintStore (atomic JSON + SHA-256 sidecar via
AZ-280) inside c1_vio, plus the cross-strategy WarmStartWiredStrategy
wrapper + prime_warm_start_from_disk / prime_warm_start_from_fc hooks
at runtime_root. AC-7 post-reset covariance inflation and AC-8 "no
fake confidence" baseline floor are enforced at the wiring layer so
no strategy module needed edits. Adds three c1_vio config fields
(warm_start_store_dir, warm_start_save_period_frames,
post_reset_covariance_inflation_factor) and registers the new FDR
kind vio.warm_start. 34 unit tests cover all 10 ACs + 3 NFRs.

Verdict PASS_WITH_WARNINGS — see
_docs/03_implementation/reviews/batch_56_review.md for the four
non-blocking documentation findings (F1 cold-start log kind shorthand,
F2 strategy-frame pose semantics, F3 dev-hardware perf smoke, F4
runtime_root importing c1-internal _facade_spine for shared FDR
conventions).

Closes AZ-335; depends on AZ-528 (batch 55).

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-14 03:30:46 +03:00
parent f12789ebf0
commit 06f655d8fb
10 changed files with 2239 additions and 3 deletions
@@ -273,7 +273,27 @@ class C1VioConfig:
default 9 per ``vio_strategy_protocol.md`` v1.0.0.
``warm_start_max_frames`` is the convergence budget after
:meth:`VioStrategy.reset_to_warm_start`; default 5. The same
integer also drives the AZ-335 post-reset covariance-inflation
window (the runtime root inflates the strategy's emitted
covariance for exactly this many frames after every
``reset_to_warm_start``).
``warm_start_store_dir`` is the on-disk directory the AZ-335
warm-start hint store writes ``c1_warm_start.json`` into. Default
``/var/lib/gps_denied_onboard/warm_start/``. The operator's systemd
unit MUST point this at a writable mount on the airborne deployment.
``warm_start_save_period_frames`` throttles the per-frame
save hook — the wiring saves the hint only every Nth successful
``VioOutput`` to bound disk I/O at the 3 Hz frame rate. Default 5
(≈ 0.6 Hz).
``post_reset_covariance_inflation_factor`` multiplies the
strategy's emitted ``pose_covariance_6x6`` for the first
``warm_start_max_frames`` frames after every ``reset_to_warm_start``;
enforced at the wiring layer to defend AC-5.3's "no fake confidence"
invariant. Default 2.0; must be > 1.0 (1.0 would defeat AC-8).
``okvis2`` carries OKVIS2-specific knobs (AZ-332); consulted only
when ``strategy == "okvis2"``.
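The inflation rule described above for ``post_reset_covariance_inflation_factor`` can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the wiring's code: it uses plain nested lists instead of NumPy so it runs on the standard library alone, and the helper names ``frobenius_norm`` and ``inflate_covariance``, as well as the ``baseline_floor`` parameter (standing in for a saved pre-reboot covariance norm), are hypothetical.

```python
import math


def frobenius_norm(matrix):
    """Frobenius norm of a matrix given as nested lists of floats."""
    return math.sqrt(sum(v * v for row in matrix for v in row))


def inflate_covariance(cov, factor, baseline_floor=0.0):
    """Scale `cov` by `factor`; scale further if the result is below the floor.

    Both steps multiply by a positive scalar, so a symmetric
    positive-definite input stays symmetric positive-definite.
    """
    if factor <= 1.0:
        raise ValueError(f"factor must be > 1.0; got {factor}")
    inflated = [[v * factor for v in row] for row in cov]
    norm = frobenius_norm(inflated)
    # Hypothetical floor: only applied when one is installed (> 0.0).
    if baseline_floor > 0.0 and 0.0 < norm < baseline_floor:
        scale = baseline_floor / norm
        inflated = [[v * scale for v in row] for row in inflated]
    return inflated


# 6x6 diagonal covariance with 0.01 on the diagonal (illustrative values).
cov = [[0.01 if i == j else 0.0 for j in range(6)] for i in range(6)]
plain = inflate_covariance(cov, 2.0)
floored = inflate_covariance(cov, 2.0, baseline_floor=1.0)
```

With no floor, the norm simply doubles. With a floor above the inflated norm, the matrix is rescaled so its norm equals the floor, which keeps the reported confidence from exceeding the saved baseline.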
@@ -288,6 +308,9 @@ class C1VioConfig:
strategy: str = "klt_ransac"
lost_frame_threshold: int = 9
warm_start_max_frames: int = 5
warm_start_store_dir: str = "/var/lib/gps_denied_onboard/warm_start/"
warm_start_save_period_frames: int = 5
post_reset_covariance_inflation_factor: float = 2.0
okvis2: Okvis2Config = field(default_factory=Okvis2Config)
vins_mono: VinsMonoConfig = field(default_factory=VinsMonoConfig)
klt_ransac: KltRansacConfig = field(default_factory=KltRansacConfig)
@@ -305,3 +328,19 @@ class C1VioConfig:
raise ConfigError(
f"C1VioConfig.warm_start_max_frames must be >= 1; got {self.warm_start_max_frames}"
)
if not self.warm_start_store_dir:
raise ConfigError(
"C1VioConfig.warm_start_store_dir must be a non-empty path; "
f"got {self.warm_start_store_dir!r}"
)
if self.warm_start_save_period_frames < 1:
raise ConfigError(
"C1VioConfig.warm_start_save_period_frames must be >= 1; "
f"got {self.warm_start_save_period_frames}"
)
if self.post_reset_covariance_inflation_factor <= 1.0:
raise ConfigError(
"C1VioConfig.post_reset_covariance_inflation_factor must be > 1.0 "
"(1.0 would defeat AC-5.3's 'no fake confidence' floor); "
f"got {self.post_reset_covariance_inflation_factor}"
)
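The three new checks can be exercised in isolation with a small stand-in. The sketch below is hypothetical: ``WarmStartKnobs`` and the local ``ConfigError`` are illustrative substitutes for ``C1VioConfig`` and the project's own error type, and ``effective_save_rate_hz`` is an assumed helper that reproduces the "3 Hz / 5 ≈ 0.6 Hz" arithmetic from the docstring.

```python
from dataclasses import dataclass


class ConfigError(ValueError):
    """Hypothetical stand-in for the project's ConfigError."""


@dataclass(frozen=True)
class WarmStartKnobs:
    """Illustrative subset of the three AZ-335 config fields."""

    warm_start_store_dir: str = "/var/lib/gps_denied_onboard/warm_start/"
    warm_start_save_period_frames: int = 5
    post_reset_covariance_inflation_factor: float = 2.0

    def __post_init__(self) -> None:
        if not self.warm_start_store_dir:
            raise ConfigError("warm_start_store_dir must be a non-empty path")
        if self.warm_start_save_period_frames < 1:
            raise ConfigError("warm_start_save_period_frames must be >= 1")
        if self.post_reset_covariance_inflation_factor <= 1.0:
            raise ConfigError("post_reset_covariance_inflation_factor must be > 1.0")


def effective_save_rate_hz(frame_rate_hz: float, knobs: WarmStartKnobs) -> float:
    """Save rate when the hint is written only every Nth frame."""
    return frame_rate_hz / knobs.warm_start_save_period_frames


defaults = WarmStartKnobs()
rate = effective_save_rate_hz(3.0, defaults)  # 3 Hz frames, period 5
```

A factor of exactly 1.0 is rejected because it would leave the emitted covariance unchanged during the post-reset window.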
@@ -0,0 +1,439 @@
"""Warm-start hint persistence (AZ-335 / E-C1).
C1-internal storage layer for the warm-start + F8 reboot recovery
wiring. Defines:
- :class:`WarmStartHintStore` (PEP 544 Protocol) — the typed store
contract. Default impl is :class:`JsonSidecarWarmStartHintStore`;
a future operator-managed store (e.g. Redis-backed) can plug in via
the same Protocol without touching the wiring.
- :class:`LoadedWarmStartHint` (frozen dataclass) — what
:meth:`WarmStartHintStore.load` returns: the pose hint plus the
AC-5.3 baseline covariance norm captured at the same save.
- :class:`JsonSidecarWarmStartHintStore` — atomic-JSON-write +
SHA-256 sidecar persistence via :class:`Sha256Sidecar` (AZ-280).
- :class:`WarmStartFcSource` (PEP 544 Protocol) — the consumer-side
structural cut over the C8 ``FcAdapter`` family that
:func:`prime_warm_start_from_fc` consumes. Defined here (NOT
imported from c8) per AZ-507's cross-component rule: a c1 module
must not import from another component's module; consumer-side
Protocol cuts live with the consumer.
The on-disk schema (JSON) is owned by this module; ``version`` is
always ``1`` for this cycle. The schema layout is documented inline
in :func:`_serialise_envelope` / :func:`_deserialise_envelope` so
the round-trip contract stays close to the wire format.
The store is L2 component-internal (NOT in
``c1_vio/__init__.py``'s public surface); the runtime root pulls
the concrete class via this module path at composition time, the
same lazy-import pattern used by the AZ-331 vio_factory for
strategy modules.
"""
from __future__ import annotations
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Protocol, runtime_checkable
import numpy as np
from gps_denied_onboard._types.nav import ImuBias, WarmStartPose
from gps_denied_onboard.helpers.se3_utils import (
Se3InvalidMatrixError,
matrix_to_se3,
se3_to_matrix,
)
from gps_denied_onboard.helpers.sha256_sidecar import (
SIDECAR_SUFFIX,
Sha256Sidecar,
Sha256SidecarError,
)
from gps_denied_onboard.logging import get_logger
__all__ = [
"HINT_FILENAME",
"HINT_SCHEMA_VERSION",
"JsonSidecarWarmStartHintStore",
"LoadedWarmStartHint",
"WarmStartFcSource",
"WarmStartHintStore",
]
HINT_FILENAME: str = "c1_warm_start.json"
HINT_SCHEMA_VERSION: int = 1
_LOGGER_NAME: str = "components.c1_vio.warm_start_store"
_LOGGER_COMPONENT: str = "c1_vio"
@dataclass(frozen=True)
class LoadedWarmStartHint:
"""What :meth:`WarmStartHintStore.load` returns on success.
``pose`` is the persisted :class:`WarmStartPose` deep-equal to the
last saved hint. ``pre_reboot_covariance_norm`` is the Frobenius
norm of the strategy's last steady-state ``pose_covariance_6x6``
captured by the wiring at save time — the F8 reload path uses
this as the AC-5.3 / AC-8 "no fake confidence" floor.
``calibration_id`` is the camera-calibration identifier the hint
was produced under; the wiring rejects the hint if the current
calibration differs (Risk 2 mitigation).
"""
pose: WarmStartPose
pre_reboot_covariance_norm: float
calibration_id: str
@runtime_checkable
class WarmStartHintStore(Protocol):
"""Persistence contract for a single warm-start hint per c1_vio process.
Implementations MUST satisfy:
- :meth:`save` is atomic (no half-written file is ever loadable).
- :meth:`load` returns ``None`` on cold start (no prior hint),
on sidecar mismatch (corruption), and on calibration mismatch
(Risk 2). All three cases are observable via INFO/WARN logs.
- :meth:`clear` removes both the payload file and its sidecar
together (no half-cleared state).
"""
def save(
self,
hint: WarmStartPose,
*,
pre_reboot_covariance_norm: float,
) -> None: ...
def load(self) -> LoadedWarmStartHint | None: ...
def clear(self) -> None: ...
@runtime_checkable
class WarmStartFcSource(Protocol):
"""Consumer-side cut over the C8 ``FcAdapter`` family (AZ-507).
The F2 takeoff prime path calls :meth:`fetch_warm_start_pose` to
pull the FC EKF's last valid GPS + IMU-extrapolated pose. The
return is ``None`` when the FC has no valid GPS yet (the prime
path then degrades to cold-start with a WARN log; AC-NFR-no-crash).
The runtime-root composition wires a thin adapter from the
concrete C8 :class:`FcAdapter` to this Protocol; tests inject a
fake matching this surface directly. NEVER import a c8 concrete
adapter from inside c1_vio.
"""
def fetch_warm_start_pose(self) -> WarmStartPose | None: ...
def calibration_id(self) -> str: ...
def _serialise_envelope(
hint: WarmStartPose,
*,
pre_reboot_covariance_norm: float,
calibration_id: str,
) -> bytes:
"""Pack ``hint`` into the on-disk JSON envelope.
Schema v1 layout (top-level dict):
- ``version`` (int) — always :data:`HINT_SCHEMA_VERSION`.
- ``calibration_id`` (str) — see Risk 2 mitigation.
- ``pre_reboot_covariance_norm`` (float) — AC-5.3 / AC-8 baseline.
- ``pose`` (dict) — the :class:`WarmStartPose` flattened to
JSON-native types: ``body_T_world_4x4`` (4-list of 4-list of
float), ``velocity_b`` (3-list of float), ``bias`` (dict with
``accel_bias`` + ``gyro_bias`` 3-lists of float),
``captured_at_ns`` (int).
"""
matrix = se3_to_matrix(hint.body_T_world)
envelope: dict[str, Any] = {
"version": HINT_SCHEMA_VERSION,
"calibration_id": calibration_id,
"pre_reboot_covariance_norm": float(pre_reboot_covariance_norm),
"pose": {
"body_T_world_4x4": matrix.tolist(),
"velocity_b": [float(v) for v in hint.velocity_b],
"bias": {
"accel_bias": [float(v) for v in hint.bias.accel_bias],
"gyro_bias": [float(v) for v in hint.bias.gyro_bias],
},
"captured_at_ns": int(hint.captured_at_ns),
},
}
return json.dumps(envelope, sort_keys=True).encode("utf-8")
def _deserialise_envelope(
payload: bytes,
) -> tuple[WarmStartPose, float, str]:
"""Inverse of :func:`_serialise_envelope`.
Raises :class:`ValueError` (with context) on any structural
deviation from schema v1 — the calling :meth:`load` routes those
failures through the same WARN-and-return-None path as a sidecar
mismatch (the file is not loadable; cold-start is the right
fallback).
"""
try:
decoded = json.loads(payload.decode("utf-8"))
except (UnicodeDecodeError, json.JSONDecodeError) as exc:
raise ValueError(f"warm-start hint payload is not valid UTF-8 JSON: {exc}") from exc
if not isinstance(decoded, dict):
raise ValueError(
f"warm-start hint payload must decode to a dict; got {type(decoded).__name__}"
)
version = decoded.get("version")
if version != HINT_SCHEMA_VERSION:
raise ValueError(
f"warm-start hint version mismatch: expected {HINT_SCHEMA_VERSION}, got {version!r}"
)
calibration_id = decoded.get("calibration_id")
if not isinstance(calibration_id, str) or not calibration_id:
raise ValueError(
f"warm-start hint envelope missing non-empty calibration_id; got {calibration_id!r}"
)
pre_reboot_covariance_norm = decoded.get("pre_reboot_covariance_norm")
if not isinstance(pre_reboot_covariance_norm, (int, float)) or isinstance(
pre_reboot_covariance_norm, bool
):
raise ValueError(
"warm-start hint envelope.pre_reboot_covariance_norm must be a float; "
f"got {pre_reboot_covariance_norm!r}"
)
pose_dict = decoded.get("pose")
if not isinstance(pose_dict, dict):
raise ValueError(
f"warm-start hint envelope.pose must be a dict; got {type(pose_dict).__name__}"
)
matrix_list = pose_dict.get("body_T_world_4x4")
if not isinstance(matrix_list, list) or len(matrix_list) != 4:
raise ValueError("warm-start hint pose.body_T_world_4x4 must be a 4-list of rows")
try:
matrix = np.asarray(matrix_list, dtype=np.float64)
except (TypeError, ValueError) as exc:
raise ValueError(f"warm-start hint pose.body_T_world_4x4 not numeric: {exc}") from exc
try:
body_T_world = matrix_to_se3(matrix)
except Se3InvalidMatrixError as exc:
raise ValueError(f"warm-start hint pose.body_T_world_4x4 not a valid SE(3): {exc}") from exc
velocity_list = pose_dict.get("velocity_b")
if not isinstance(velocity_list, list) or len(velocity_list) != 3:
raise ValueError("warm-start hint pose.velocity_b must be a 3-list of floats")
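try:
velocity_b = (
float(velocity_list[0]),
float(velocity_list[1]),
float(velocity_list[2]),
)
except (TypeError, ValueError) as exc:
# float(None) / float([...]) raise TypeError; keep the ValueError-only contract.
raise ValueError(f"warm-start hint pose.velocity_b not numeric: {exc}") from exc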
bias_dict = pose_dict.get("bias")
if not isinstance(bias_dict, dict):
raise ValueError("warm-start hint pose.bias must be a dict")
accel_list = bias_dict.get("accel_bias")
gyro_list = bias_dict.get("gyro_bias")
if (
not isinstance(accel_list, list)
or len(accel_list) != 3
or not isinstance(gyro_list, list)
or len(gyro_list) != 3
):
raise ValueError(
"warm-start hint pose.bias must contain 3-list accel_bias and 3-list gyro_bias"
)
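try:
bias = ImuBias(
accel_bias=(float(accel_list[0]), float(accel_list[1]), float(accel_list[2])),
gyro_bias=(float(gyro_list[0]), float(gyro_list[1]), float(gyro_list[2])),
)
except (TypeError, ValueError) as exc:
raise ValueError(f"warm-start hint pose.bias not numeric: {exc}") from exc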
captured_at_ns = pose_dict.get("captured_at_ns")
if not isinstance(captured_at_ns, int) or isinstance(captured_at_ns, bool):
raise ValueError(
f"warm-start hint pose.captured_at_ns must be an int; got {captured_at_ns!r}"
)
pose = WarmStartPose(
body_T_world=body_T_world,
velocity_b=velocity_b,
bias=bias,
captured_at_ns=captured_at_ns,
)
return pose, float(pre_reboot_covariance_norm), calibration_id
class JsonSidecarWarmStartHintStore:
"""Default :class:`WarmStartHintStore` impl backed by JSON + SHA-256 sidecar.
``store_dir`` is the directory the hint file lives in; created on
first ``save`` if missing. ``calibration_id`` is bound at
construction time — the composition root reads
:class:`CameraCalibration.id` once and passes it here. A loaded
hint whose ``calibration_id`` differs from the constructor value
is rejected (returns ``None`` + WARN log) per Risk 2.
The atomic-write and sidecar-verify guarantees come from
:class:`Sha256Sidecar` (AZ-280); this class never opens the
payload file directly except through that helper. The class is
process-local (no cross-process locking) — by AZ-331 invariant
the c1_vio strategy is single-instanced per process and the
composition root owns this store.
"""
def __init__(self, store_dir: Path, *, calibration_id: str) -> None:
if not calibration_id:
raise ValueError(
"JsonSidecarWarmStartHintStore.calibration_id must be a non-empty string"
)
self._store_dir = Path(store_dir)
self._calibration_id = calibration_id
self._payload_path = self._store_dir / HINT_FILENAME
self._sidecar_path = Path(str(self._payload_path) + SIDECAR_SUFFIX)
self._log = get_logger(_LOGGER_NAME)
@property
def payload_path(self) -> Path:
"""The on-disk JSON file path (exposed for tests + forensics)."""
return self._payload_path
@property
def sidecar_path(self) -> Path:
"""The sidecar ``<payload>.sha256`` path (exposed for tests + forensics)."""
return self._sidecar_path
def save(
self,
hint: WarmStartPose,
*,
pre_reboot_covariance_norm: float,
) -> None:
"""Write the envelope atomically + sidecar.
Failures (write errors, parent-dir creation errors) propagate
as :class:`Sha256SidecarError` / :class:`OSError` so the
caller can route them through the wiring's no-crash policy
(the wiring catches these and emits an ERROR log per
AC-NFR-no-crash; the process keeps running and falls through
to cold-start on the next prime).
"""
self._store_dir.mkdir(parents=True, exist_ok=True)
payload = _serialise_envelope(
hint,
pre_reboot_covariance_norm=pre_reboot_covariance_norm,
calibration_id=self._calibration_id,
)
Sha256Sidecar.write_atomic_and_sidecar(self._payload_path, payload)
def load(self) -> LoadedWarmStartHint | None:
"""Return the persisted hint, or ``None`` on any non-loadable state.
Branches that emit ``None``:
- Payload file does not exist (cold start; no INFO log here, because
the prime path emits ``c1.warm_start.cold_start_no_hint``).
- Sidecar does not exist or is malformed (corruption — WARN
log ``c1.warm_start.corrupted`` with the offending path).
The file is NOT silently deleted (operator may want to
forensically inspect — AC-2).
- SHA-256 mismatch (corruption — same WARN log).
- JSON envelope structurally invalid (corruption — same WARN
log; the on-disk file is left intact).
- ``calibration_id`` mismatch (Risk 2 — WARN log
``c1.warm_start.calibration_mismatch``; not the same kind
as ``corrupted`` because the file IS valid, just stale).
"""
if not self._payload_path.exists():
return None
try:
verified = Sha256Sidecar.verify(self._payload_path)
except Sha256SidecarError as exc:
self._emit_corrupted_warning(reason=str(exc))
return None
if not verified:
self._emit_corrupted_warning(reason="sha256_mismatch")
return None
try:
payload = self._payload_path.read_bytes()
except OSError as exc:
self._emit_corrupted_warning(reason=f"oserror: {exc}")
return None
try:
pose, pre_reboot_norm, on_disk_calibration_id = _deserialise_envelope(payload)
except ValueError as exc:
self._emit_corrupted_warning(reason=str(exc))
return None
if on_disk_calibration_id != self._calibration_id:
self._log.warning(
"warm-start hint calibration mismatch",
extra={
"component": _LOGGER_COMPONENT,
"kind": "c1.warm_start.calibration_mismatch",
"kv": {
"path": str(self._payload_path),
"saved_calibration_id": on_disk_calibration_id,
"current_calibration_id": self._calibration_id,
},
},
)
return None
return LoadedWarmStartHint(
pose=pose,
pre_reboot_covariance_norm=pre_reboot_norm,
calibration_id=on_disk_calibration_id,
)
def _emit_corrupted_warning(self, *, reason: str) -> None:
"""Single emission point for the AC-2 ``c1.warm_start.corrupted`` WARN."""
self._log.warning(
"warm-start hint corrupted",
extra={
"component": _LOGGER_COMPONENT,
"kind": "c1.warm_start.corrupted",
"kv": {
"path": str(self._payload_path),
"reason": reason,
},
},
)
def clear(self) -> None:
"""Remove both the payload file and its sidecar.
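Idempotent: missing files are not an error. Emits ONE INFO
log on every successful invocation, regardless of whether a file
existed, so the operator log shows the explicit clear action.
An :class:`OSError` from ``unlink`` is logged at ERROR
(``c1.warm_start.clear_failed``) and re-raised.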
"""
for path in (self._payload_path, self._sidecar_path):
try:
path.unlink(missing_ok=True)
except OSError as exc:
self._log.error(
"warm-start hint clear failed",
extra={
"component": _LOGGER_COMPONENT,
"kind": "c1.warm_start.clear_failed",
"kv": {
"path": str(path),
"reason": str(exc),
},
},
)
raise
self._log.info(
"warm-start hint store cleared",
extra={
"component": _LOGGER_COMPONENT,
"kind": "c1.warm_start.cleared",
"kv": {
"store_dir": str(self._store_dir),
},
},
)
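The atomic-write plus sidecar-verify pattern that the store relies on can be illustrated with the standard library. This is a sketch under assumptions, not the AZ-280 ``Sha256Sidecar`` helper: the function names ``write_with_sidecar`` and ``verify_sidecar`` are hypothetical, the sidecar is assumed to hold the hex digest of the payload, and the ``.sha256`` suffix follows the ``<payload>.sha256`` naming mentioned in the ``sidecar_path`` docstring.

```python
import hashlib
import os
import tempfile
from pathlib import Path

SIDECAR_SUFFIX = ".sha256"  # assumption: mirrors the "<payload>.sha256" naming


def _atomic_write(path: Path, data: bytes) -> None:
    """Write to a temp file in the same directory, fsync, then rename over `path`."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_name, path)  # atomic within one filesystem
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def write_with_sidecar(path: Path, payload: bytes) -> None:
    """Write the payload, then a sidecar holding its SHA-256 hex digest."""
    path.parent.mkdir(parents=True, exist_ok=True)
    _atomic_write(path, payload)
    digest = hashlib.sha256(payload).hexdigest().encode("ascii")
    _atomic_write(Path(str(path) + SIDECAR_SUFFIX), digest)


def verify_sidecar(path: Path) -> bool:
    """True only if both files exist and the stored digest matches the payload."""
    sidecar = Path(str(path) + SIDECAR_SUFFIX)
    if not path.exists() or not sidecar.exists():
        return False
    expected = sidecar.read_bytes().strip()
    actual = hashlib.sha256(path.read_bytes()).hexdigest().encode("ascii")
    return expected == actual


with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "c1_warm_start.json"
    write_with_sidecar(target, b'{"version": 1}')
    ok_before = verify_sidecar(target)
    target.write_bytes(b'{"version": 2}')  # simulate on-disk corruption
    ok_after = verify_sidecar(target)
```

If the process dies between the two renames, the payload is left with a missing or stale sidecar; in this sketch ``verify_sidecar`` then returns ``False``, which corresponds to the "corrupted, fall back to cold start" branch of ``load``.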
@@ -47,6 +47,28 @@ KNOWN_PAYLOAD_KEYS: Final[dict[str, frozenset[str]]] = {
"vio.health": frozenset(
{"state", "consecutive_lost", "bias_norm", "strategy_label", "frame_id"}
),
# AZ-335 / E-C1: emitted by the warm-start wiring on every successful
# `prime_warm_start_*` invocation (F2 takeoff load, F8 reboot reload,
# cold-start fall-through). Exactly ONE record per prime call.
# `source` is one of "f2_takeoff_fc" | "f8_reboot_disk" |
# "cold_start_no_hint" — distinguishes the three runtime paths so
# post-flight forensics can answer "did this flight reuse a prior
# hint?". `bias_norm` is the L2 norm of the loaded hint's accel||gyro
# bias (None on cold start, since there is no hint). `staleness_ns`
# is the monotonic-ns delta between hint capture and prime time
# (None on cold start). `pre_reboot_covariance_norm` is the AC-8
# baseline carried alongside the hint on the F8 path (None on F2
# and cold start, since the wiring's covariance floor is only
# enforced on the F8 reload path).
"vio.warm_start": frozenset(
{
"source",
"strategy_label",
"bias_norm",
"staleness_ns",
"pre_reboot_covariance_norm",
}
),
"state.tick": frozenset({"frame_id", "fused_pose", "covariance_2x2", "estimator_label"}),
"tile_match": frozenset({"frame_id", "tile_id", "score", "match_count", "ransac_inliers"}),
"overrun": frozenset({"producer_id", "dropped_count"}),
@@ -0,0 +1,562 @@
"""C1 warm-start runtime wiring (AZ-335 / E-C1).
Cross-strategy orchestration for warm-start hint persistence + F2
takeoff load + F8 reboot recovery. The wiring lives at the
composition root because the concerns it implements span more than
the :class:`VioStrategy` Protocol surface:
- AC-5.1 / AC-5.3 require a hint flow ``FC EKF → strategy``
(F2 takeoff) and ``disk → strategy`` (F8 reboot) that no single
strategy can implement on its own.
- The post-reset covariance inflation + AC-5.3 "no fake confidence"
floor is enforced HERE, not inside any strategy — adding the
inflation to a strategy would double-inflate when the wiring also
inflates (Constraints, AZ-335 task spec).
- The per-frame save throttle keeps disk I/O bounded at the 3 Hz
steady-state frame rate.
Public surface:
- :class:`WarmStartWiredStrategy` — a :class:`VioStrategy` impl that
wraps any concrete :class:`VioStrategy` (OKVIS2 / VINS-Mono /
KLT-RANSAC) with the per-frame save + post-reset covariance
inflation + AC-8 baseline floor. Exposes the standard Protocol
methods PLUS :meth:`prime_post_reboot` which the F8 prime path
uses to install the loaded baseline.
- :func:`prime_warm_start_from_disk` — F8 reboot prime hook.
- :func:`prime_warm_start_from_fc` — F2 takeoff prime hook.
The composition root constructs a :class:`WarmStartWiredStrategy`
from ``runtime_root.vio_factory.build_vio_strategy(config,
fdr_client=...)`` and the per-binary :class:`WarmStartHintStore`,
then calls :func:`prime_warm_start_from_disk` once at process
startup before the first ``process_frame``. The F2 hook is invoked
on the FC's ``flight_state`` transition to ``IN_AIR`` (operator-side
or auto-detected; that wiring is owned by the composition root, not
this module).
"""
from __future__ import annotations
import time
from dataclasses import replace
from typing import TYPE_CHECKING, Any, Literal
import numpy as np
from gps_denied_onboard._types.nav import (
ImuWindow,
NavCameraFrame,
VioHealth,
VioOutput,
WarmStartPose,
)
from gps_denied_onboard.components.c1_vio._facade_spine import bias_norm, now_iso
from gps_denied_onboard.components.c1_vio.interface import VioStrategy
from gps_denied_onboard.components.c1_vio.warm_start_store import (
LoadedWarmStartHint,
WarmStartFcSource,
WarmStartHintStore,
)
from gps_denied_onboard.fdr_client.records import CURRENT_SCHEMA_VERSION, FdrRecord
from gps_denied_onboard.logging import get_logger
if TYPE_CHECKING:
from gps_denied_onboard._types.calibration import CameraCalibration
from gps_denied_onboard.fdr_client.client import FdrClient
__all__ = [
"WARM_START_PRODUCER_ID",
"WarmStartWiredStrategy",
"prime_warm_start_from_disk",
"prime_warm_start_from_fc",
]
WARM_START_PRODUCER_ID: str = "components.c1_vio.warm_start"
_LOGGER_NAME: str = "components.c1_vio.warm_start_wiring"
_LOGGER_COMPONENT: str = "c1_vio"
_SOURCE_F2_TAKEOFF: str = "f2_takeoff_fc"
_SOURCE_F8_REBOOT: str = "f8_reboot_disk"
_SOURCE_COLD_START: str = "cold_start_no_hint"
def _frobenius_norm(matrix: Any) -> float:
"""Frobenius norm of a 6×6 covariance, hardened against non-array inputs."""
arr = np.asarray(matrix, dtype=np.float64)
return float(np.linalg.norm(arr, ord="fro"))
class WarmStartWiredStrategy:
"""Facade around a concrete :class:`VioStrategy` with AZ-335 wiring.
Wraps an inner strategy so that:
1. The output of every ``warm_start_save_period_frames``-th
successful :meth:`process_frame` call is saved to the
:class:`WarmStartHintStore` (AC-6).
2. For the first ``warm_start_max_frames`` frames after every
:meth:`reset_to_warm_start` call, the emitted
``pose_covariance_6x6`` is multiplied by
``post_reset_covariance_inflation_factor`` (AC-7).
3. When a baseline floor was installed by
:meth:`prime_post_reboot`, post-reset frames are additionally
scaled up so their Frobenius norm is at least the saved
pre-reboot value (AC-8 — the "no fake confidence" invariant).
The wrapper is itself a :class:`VioStrategy` (PEP 544 structural
typing). ``runtime_checkable`` conformance is verified by the
AZ-335 unit tests; downstream consumers (C5 fusion, C13 FDR)
cannot tell the difference between the wrapped and the bare
strategy because the public Protocol shape is preserved.
Per-frame save errors do NOT crash the process — a
:class:`Sha256SidecarError` or :class:`OSError` raised by
:meth:`WarmStartHintStore.save` is logged at ERROR (kind
``c1.warm_start.save_failed``) and swallowed so the camera
ingest hot path keeps flowing (AC-NFR-no-crash).
"""
def __init__(
self,
inner: VioStrategy,
*,
store: WarmStartHintStore,
warm_start_max_frames: int,
post_reset_covariance_inflation_factor: float,
warm_start_save_period_frames: int,
) -> None:
if warm_start_max_frames < 1:
raise ValueError(
"warm_start_max_frames must be >= 1; "
f"got {warm_start_max_frames}"
)
if post_reset_covariance_inflation_factor <= 1.0:
raise ValueError(
"post_reset_covariance_inflation_factor must be > 1.0 "
"(1.0 would defeat AC-5.3 / AC-8 floor); "
f"got {post_reset_covariance_inflation_factor}"
)
if warm_start_save_period_frames < 1:
raise ValueError(
"warm_start_save_period_frames must be >= 1; "
f"got {warm_start_save_period_frames}"
)
self._inner = inner
self._store = store
self._max_frames = warm_start_max_frames
self._inflation_factor = float(post_reset_covariance_inflation_factor)
self._save_period = warm_start_save_period_frames
self._post_reset_remaining: int = 0
self._baseline_floor: float = 0.0
self._frames_since_save: int = 0
self._last_emitted_covariance_norm: float = 0.0
self._log = get_logger(_LOGGER_NAME)
@property
def post_reset_remaining(self) -> int:
"""Frames left in the active inflation window (0 in steady-state)."""
return self._post_reset_remaining
@property
def baseline_floor(self) -> float:
"""Currently installed AC-8 covariance floor (0.0 when no F8 prime)."""
return self._baseline_floor
@property
def last_emitted_covariance_norm(self) -> float:
"""Frobenius norm of the last :class:`VioOutput` returned to the consumer."""
return self._last_emitted_covariance_norm
def process_frame(
self,
frame: NavCameraFrame,
imu: ImuWindow,
calibration: "CameraCalibration",
) -> VioOutput:
"""Forward to inner strategy, then apply inflation + throttled save."""
out = self._inner.process_frame(frame, imu, calibration)
if self._post_reset_remaining > 0:
out = self._apply_post_reset_inflation(out)
self._post_reset_remaining -= 1
self._last_emitted_covariance_norm = _frobenius_norm(out.pose_covariance_6x6)
self._frames_since_save += 1
if self._frames_since_save >= self._save_period:
self._frames_since_save = 0
self._save_hint_from_output(out)
return out
def reset_to_warm_start(self, hint: WarmStartPose) -> None:
"""Protocol method: forward to inner, arm inflation window WITHOUT a floor.
Used by the F2 takeoff prime path — the FC EKF supplies a
fresh pose, so there is no pre-reboot baseline to defend
against. The ``_baseline_floor`` attribute is reset to
``0.0`` so the AC-8 floor check is skipped and only plain inflation applies.
"""
self._inner.reset_to_warm_start(hint)
self._post_reset_remaining = self._max_frames
self._baseline_floor = 0.0
self._frames_since_save = 0
def prime_post_reboot(self, loaded: LoadedWarmStartHint) -> None:
"""Wrapper extension: F8 reboot path, installs the AC-8 floor.
Forwards the loaded pose to the inner strategy via
:meth:`reset_to_warm_start`, then arms the inflation window
AND captures ``loaded.pre_reboot_covariance_norm`` as the
floor that subsequent :meth:`process_frame` calls must
respect for ``warm_start_max_frames`` frames.
NOT a Protocol method — the autodev-injected F8 path calls
this directly on a :class:`WarmStartWiredStrategy` instance.
"""
self._inner.reset_to_warm_start(loaded.pose)
self._post_reset_remaining = self._max_frames
self._baseline_floor = float(loaded.pre_reboot_covariance_norm)
self._frames_since_save = 0
def health_snapshot(self) -> VioHealth:
"""Forward unchanged — health is a strategy concern, not a wiring concern."""
return self._inner.health_snapshot()
def current_strategy_label(
self,
) -> Literal["okvis2", "vins_mono", "klt_ransac"]:
"""Forward unchanged so :class:`VioHealth.strategy_label` audit is honest."""
return self._inner.current_strategy_label()
def _apply_post_reset_inflation(self, out: VioOutput) -> VioOutput:
"""Inflate the emitted covariance by the configured factor + AC-8 floor.
AC-7: inflated norm = factor × strategy_emitted_norm. AC-8:
further scale up so inflated norm ≥ ``_baseline_floor``. Both
scalings preserve symmetry and positive-definiteness because
they are pure positive scalar multiplications of the SPD
matrix (eigenvalues stay strictly positive).
"""
original = np.asarray(out.pose_covariance_6x6, dtype=np.float64)
inflated = original * self._inflation_factor
inflated_norm = float(np.linalg.norm(inflated, ord="fro"))
if (
self._baseline_floor > 0.0
and inflated_norm > 0.0
and inflated_norm < self._baseline_floor
):
scale = self._baseline_floor / inflated_norm
inflated = inflated * scale
return replace(out, pose_covariance_6x6=inflated)
def _save_hint_from_output(self, out: VioOutput) -> None:
"""Construct a :class:`WarmStartPose` from the last emitted output and save.
``velocity_b`` is left at zero — the wrapper has no velocity
source on the per-frame save path (the strategy's
:class:`VioOutput` does not expose velocity, and chasing it
would require a numerical-differentiation sidecar that
belongs in a future cycle). On F8 reload the strategy
re-estimates velocity from its IMU integration, so a
zero-velocity hint is acceptable for the recovery path.
Per-frame save failures do NOT propagate — they are logged
at ERROR and swallowed (AC-NFR-no-crash). The hint store
will be in whatever state the failed atomic-write left it
(the AZ-280 contract guarantees no half-written file).
"""
hint = WarmStartPose(
body_T_world=out.relative_pose_T,
velocity_b=(0.0, 0.0, 0.0),
bias=out.imu_bias,
captured_at_ns=int(out.emitted_at_ns),
)
try:
self._store.save(
hint,
pre_reboot_covariance_norm=self._last_emitted_covariance_norm,
)
except (OSError, RuntimeError, ValueError) as exc:
self._log.error(
"warm-start hint save failed",
extra={
"component": _LOGGER_COMPONENT,
"kind": "c1.warm_start.save_failed",
"kv": {
"reason": str(exc),
"frame_id": out.frame_id,
},
},
)
def _emit_prime_fdr(
*,
fdr_client: "FdrClient",
source: str,
strategy_label: str,
bias_norm_value: float | None,
staleness_ns: int | None,
pre_reboot_covariance_norm: float | None,
) -> None:
"""Emit the single AZ-335 ``vio.warm_start`` FDR record."""
record = FdrRecord(
schema_version=CURRENT_SCHEMA_VERSION,
ts=now_iso(),
producer_id=WARM_START_PRODUCER_ID,
kind="vio.warm_start",
payload={
"source": source,
"strategy_label": strategy_label,
"bias_norm": bias_norm_value,
"staleness_ns": staleness_ns,
"pre_reboot_covariance_norm": pre_reboot_covariance_norm,
},
)
fdr_client.enqueue(record)
def _emit_prime_log(
*,
log: Any,
level: str,
msg: str,
source: str,
strategy_label: str,
extra_kv: dict[str, Any] | None = None,
) -> None:
"""Single emission point for prime-time INFO/WARN logs."""
kv: dict[str, Any] = {
"source": source,
"strategy_label": strategy_label,
}
if extra_kv:
kv.update(extra_kv)
record_extra = {
"component": _LOGGER_COMPONENT,
"kind": f"c1.warm_start.{source}",
"kv": kv,
}
if level == "warning":
log.warning(msg, extra=record_extra)
else:
log.info(msg, extra=record_extra)


def prime_warm_start_from_disk(
    strategy: WarmStartWiredStrategy,
    store: WarmStartHintStore,
    *,
    fdr_client: "FdrClient",
) -> bool:
    """F8 reboot prime hook — called at process startup before first ``process_frame``.

    Reads the persisted hint via ``store.load()``:

    - If a hint is loaded, calls :meth:`WarmStartWiredStrategy.prime_post_reboot`
      (which forwards to the inner strategy AND installs the AC-8 floor),
      emits one INFO log ``c1.warm_start.f8_reboot_disk``, and emits one
      FDR record ``vio.warm_start`` with ``source="f8_reboot_disk"``.
    - If ``store.load()`` returns ``None`` (cold start, corrupted file,
      calibration mismatch), emits one INFO log
      ``c1.warm_start.cold_start_no_hint`` and one FDR record with
      ``source="cold_start_no_hint"``. The strategy is left untouched
      and proceeds with its own INIT-state behaviour.

    Returns ``True`` iff a hint was loaded AND applied. Never raises:
    any exception from :meth:`WarmStartWiredStrategy.prime_post_reboot`
    (for example a :class:`VioFatalError` from the inner strategy's
    :meth:`reset_to_warm_start`) is caught, logged at ERROR
    (``c1.warm_start.reset_failed``), and the function returns
    ``False`` so the camera ingest can still start in cold-start mode.
    No FDR record is emitted on that failure path.
    """
    log = get_logger(_LOGGER_NAME)
    strategy_label = strategy.current_strategy_label()
    loaded = store.load()
    if loaded is None:
        _emit_prime_log(
            log=log,
            level="info",
            msg="warm-start cold start — no prior hint",
            source=_SOURCE_COLD_START,
            strategy_label=strategy_label,
        )
        _emit_prime_fdr(
            fdr_client=fdr_client,
            source=_SOURCE_COLD_START,
            strategy_label=strategy_label,
            bias_norm_value=None,
            staleness_ns=None,
            pre_reboot_covariance_norm=None,
        )
        return False
    try:
        strategy.prime_post_reboot(loaded)
    except Exception as exc:
        log.error(
            "warm-start prime_post_reboot failed",
            extra={
                "component": _LOGGER_COMPONENT,
                "kind": "c1.warm_start.reset_failed",
                "kv": {
                    "source": _SOURCE_F8_REBOOT,
                    "strategy_label": strategy_label,
                    "reason": str(exc),
                },
            },
        )
        return False
    # Clamp at 0: ``captured_at_ns`` can exceed the current monotonic reading
    # (for example if the monotonic clock restarted together with the system),
    # and a negative staleness would be meaningless downstream.
    staleness_ns = max(0, int(time.monotonic_ns()) - int(loaded.pose.captured_at_ns))
    _emit_prime_log(
        log=log,
        level="info",
        msg="warm-start F8 reboot — hint loaded from disk",
        source=_SOURCE_F8_REBOOT,
        strategy_label=strategy_label,
        extra_kv={
            "staleness_ns": staleness_ns,
            "pre_reboot_covariance_norm": loaded.pre_reboot_covariance_norm,
        },
    )
    _emit_prime_fdr(
        fdr_client=fdr_client,
        source=_SOURCE_F8_REBOOT,
        strategy_label=strategy_label,
        bias_norm_value=bias_norm(loaded.pose.bias),
        staleness_ns=staleness_ns,
        pre_reboot_covariance_norm=loaded.pre_reboot_covariance_norm,
    )
    return True


def prime_warm_start_from_fc(
    strategy: WarmStartWiredStrategy,
    source: WarmStartFcSource,
    store: WarmStartHintStore,
    *,
    fdr_client: "FdrClient",
) -> bool:
    """F2 takeoff prime hook — called once on the ``IN_AIR`` flight-state edge.

    Asks the consumer-side cut for the FC EKF's last valid pose:

    - If a hint is returned, calls :meth:`WarmStartWiredStrategy.reset_to_warm_start`
      (the inflation window arms WITHOUT an AC-8 floor — there is no
      pre-reboot baseline on the F2 path because the FC just provided
      a fresh pose), persists the same hint via ``store.save`` so the
      next F8 reboot can recover from it, and emits the INFO log +
      FDR record with ``source="f2_takeoff_fc"``.
    - If the source returns ``None`` or raises, emits one WARN log
      ``c1.warm_start.f2_takeoff_fc_unavailable`` and an FDR record
      with ``source="cold_start_no_hint"``; the strategy is left in
      its current state and the camera ingest proceeds (AC-NFR-no-crash).

    Returns ``True`` iff a hint was fetched, applied, AND persisted.
    Never raises.
    """
    log = get_logger(_LOGGER_NAME)
    strategy_label = strategy.current_strategy_label()
    try:
        hint = source.fetch_warm_start_pose()
    except Exception as exc:
        log.warning(
            "warm-start FC fetch raised",
            extra={
                "component": _LOGGER_COMPONENT,
                "kind": "c1.warm_start.f2_takeoff_fc_unavailable",
                "kv": {
                    "source": _SOURCE_F2_TAKEOFF,
                    "strategy_label": strategy_label,
                    "reason": str(exc),
                },
            },
        )
        _emit_prime_fdr(
            fdr_client=fdr_client,
            source=_SOURCE_COLD_START,
            strategy_label=strategy_label,
            bias_norm_value=None,
            staleness_ns=None,
            pre_reboot_covariance_norm=None,
        )
        return False
    if hint is None:
        log.warning(
            "warm-start FC has no valid pose yet",
            extra={
                "component": _LOGGER_COMPONENT,
                "kind": "c1.warm_start.f2_takeoff_fc_unavailable",
                "kv": {
                    "source": _SOURCE_F2_TAKEOFF,
                    "strategy_label": strategy_label,
                    "reason": "fc_returned_none",
                },
            },
        )
        _emit_prime_fdr(
            fdr_client=fdr_client,
            source=_SOURCE_COLD_START,
            strategy_label=strategy_label,
            bias_norm_value=None,
            staleness_ns=None,
            pre_reboot_covariance_norm=None,
        )
        return False
    try:
        strategy.reset_to_warm_start(hint)
    except Exception as exc:
        log.error(
            "warm-start F2 reset_to_warm_start failed",
            extra={
                "component": _LOGGER_COMPONENT,
                "kind": "c1.warm_start.reset_failed",
                "kv": {
                    "source": _SOURCE_F2_TAKEOFF,
                    "strategy_label": strategy_label,
                    "reason": str(exc),
                },
            },
        )
        return False
    try:
        store.save(hint, pre_reboot_covariance_norm=0.0)
    except (OSError, RuntimeError, ValueError) as exc:
        log.error(
            "warm-start F2 persist failed",
            extra={
                "component": _LOGGER_COMPONENT,
                "kind": "c1.warm_start.save_failed",
                "kv": {
                    "source": _SOURCE_F2_TAKEOFF,
                    "strategy_label": strategy_label,
                    "reason": str(exc),
                },
            },
        )
        # the strategy already accepted the hint; the FDR record
        # below still records the F2 prime for audit, but we return
        # False to indicate persistence did not complete. The next
        # successful per-frame save will restore the on-disk state.
        _emit_prime_fdr(
            fdr_client=fdr_client,
            source=_SOURCE_F2_TAKEOFF,
            strategy_label=strategy_label,
            bias_norm_value=bias_norm(hint.bias),
            staleness_ns=None,
            pre_reboot_covariance_norm=None,
        )
        return False
    _emit_prime_log(
        log=log,
        level="info",
        msg="warm-start F2 takeoff — hint primed from FC",
        source=_SOURCE_F2_TAKEOFF,
        strategy_label=strategy_label,
    )
    _emit_prime_fdr(
        fdr_client=fdr_client,
        source=_SOURCE_F2_TAKEOFF,
        strategy_label=strategy_label,
        bias_norm_value=bias_norm(hint.bias),
        staleness_ns=None,
        pre_reboot_covariance_norm=None,
    )
    return True