[AZ-335] C1 warm-start hint persistence + F8 reboot recovery wiring

Adds JsonSidecarWarmStartHintStore (atomic JSON + SHA-256 sidecar via
AZ-280) inside c1_vio, plus the cross-strategy WarmStartWiredStrategy
wrapper + prime_warm_start_from_disk / prime_warm_start_from_fc hooks
at runtime_root. AC-7 post-reset covariance inflation and AC-8 "no
fake confidence" baseline floor are enforced at the wiring layer so
no strategy module needed edits. Adds three c1_vio config fields
(warm_start_store_dir, warm_start_save_period_frames,
post_reset_covariance_inflation_factor) and registers the new FDR
kind vio.warm_start. 34 unit tests cover all 10 ACs + 3 NFRs.

Verdict PASS_WITH_WARNINGS — see
_docs/03_implementation/reviews/batch_56_review.md for the four
non-blocking documentation findings (F1 cold-start log kind shorthand,
F2 strategy-frame pose semantics, F3 dev-hardware perf smoke, F4
runtime_root importing c1-internal _facade_spine for shared FDR
conventions).

Closes AZ-335; depends on AZ-528 (batch 55).

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-14 03:30:46 +03:00
parent f12789ebf0
commit 06f655d8fb
10 changed files with 2239 additions and 3 deletions
@@ -273,7 +273,27 @@ class C1VioConfig:
default 9 per ``vio_strategy_protocol.md`` v1.0.0.
``warm_start_max_frames`` is the convergence budget after
:meth:`VioStrategy.reset_to_warm_start`; default 5. The same
integer also drives the AZ-335 post-reset covariance-inflation
window (the runtime root inflates the strategy's emitted
covariance for exactly this many frames after every
``reset_to_warm_start``).
``warm_start_store_dir`` is the on-disk directory the AZ-335
warm-start hint store writes ``c1_warm_start.json`` into. Default
``/var/lib/gps_denied_onboard/warm_start/``. The operator's systemd
unit MUST point this at a writable mount on the airborne deployment.
``warm_start_save_period_frames`` throttles the per-frame
save hook — the wiring saves the hint only every Nth successful
``VioOutput`` to bound disk I/O at the 3 Hz frame rate. Default 5
(≈ 0.6 Hz).
``post_reset_covariance_inflation_factor`` multiplies the
strategy's emitted ``pose_covariance_6x6`` for the first
``warm_start_max_frames`` frames after every ``reset_to_warm_start``;
enforced at the wiring layer to defend AC-5.3's "no fake confidence"
invariant. Default 2.0; must be > 1.0 (1.0 would defeat AC-8).
``okvis2`` carries OKVIS2-specific knobs (AZ-332); consulted only
when ``strategy == "okvis2"``.
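The two new tuning knobs described in the docstring above can be sketched as follows. This is a minimal illustration, not the wiring code from this commit: the function names `inflate_if_in_window` and `should_save` are hypothetical, and the sketch assumes the frame counter starts at 0 on the first frame after `reset_to_warm_start`.

```python
import numpy as np


def inflate_if_in_window(cov, frames_since_reset, window_frames=5, factor=2.0):
    """Scale a 6x6 covariance while still inside the post-reset window.

    Hypothetical helper: window_frames mirrors warm_start_max_frames and
    factor mirrors post_reset_covariance_inflation_factor.
    """
    if factor <= 1.0:
        raise ValueError("factor must be > 1.0")
    cov = np.asarray(cov, dtype=np.float64)
    if 0 <= frames_since_reset < window_frames:
        return cov * factor
    return cov


def should_save(successful_frame_count, period_frames=5):
    """True on every Nth successful output (N = period_frames)."""
    return successful_frame_count > 0 and successful_frame_count % period_frames == 0


cov = np.eye(6) * 0.01
# First five frames after the reset are inflated, later frames pass through.
inflated = [float(inflate_if_in_window(cov, i)[0, 0]) for i in range(7)]
# With the default period, frames 5, 10 and 15 trigger a save.
saves = [n for n in range(1, 16) if should_save(n)]
```

At the 3 Hz frame rate, saving every 5th successful output gives 3 / 5 = 0.6 Hz, which matches the figure quoted in the docstring.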
@@ -288,6 +308,9 @@ class C1VioConfig:
strategy: str = "klt_ransac"
lost_frame_threshold: int = 9
warm_start_max_frames: int = 5
warm_start_store_dir: str = "/var/lib/gps_denied_onboard/warm_start/"
warm_start_save_period_frames: int = 5
post_reset_covariance_inflation_factor: float = 2.0
okvis2: Okvis2Config = field(default_factory=Okvis2Config)
vins_mono: VinsMonoConfig = field(default_factory=VinsMonoConfig)
klt_ransac: KltRansacConfig = field(default_factory=KltRansacConfig)
@@ -305,3 +328,19 @@ class C1VioConfig:
raise ConfigError(
f"C1VioConfig.warm_start_max_frames must be >= 1; got {self.warm_start_max_frames}"
)
if not self.warm_start_store_dir:
raise ConfigError(
"C1VioConfig.warm_start_store_dir must be a non-empty path; "
f"got {self.warm_start_store_dir!r}"
)
if self.warm_start_save_period_frames < 1:
raise ConfigError(
"C1VioConfig.warm_start_save_period_frames must be >= 1; "
f"got {self.warm_start_save_period_frames}"
)
if self.post_reset_covariance_inflation_factor <= 1.0:
raise ConfigError(
"C1VioConfig.post_reset_covariance_inflation_factor must be > 1.0 "
"(1.0 would defeat AC-5.3's 'no fake confidence' floor); "
f"got {self.post_reset_covariance_inflation_factor}"
)
@@ -0,0 +1,439 @@
"""Warm-start hint persistence (AZ-335 / E-C1).
C1-internal storage layer for the warm-start + F8 reboot recovery
wiring. Defines:
- :class:`WarmStartHintStore` (PEP 544 Protocol) — the typed store
contract. Default impl is :class:`JsonSidecarWarmStartHintStore`;
a future operator-managed store (e.g. Redis-backed) can plug in via
the same Protocol without touching the wiring.
- :class:`LoadedWarmStartHint` (frozen dataclass) — what
:meth:`WarmStartHintStore.load` returns: the pose hint plus the
AC-5.3 baseline covariance norm captured at the same save.
- :class:`JsonSidecarWarmStartHintStore` — atomic-JSON-write +
SHA-256 sidecar persistence via :class:`Sha256Sidecar` (AZ-280).
- :class:`WarmStartFcSource` (PEP 544 Protocol) — the consumer-side
structural cut over the C8 ``FcAdapter`` family that
:func:`prime_warm_start_from_fc` consumes. Defined here (NOT
imported from c8) per AZ-507's cross-component rule: a c1 module
must not import from another component's module; consumer-side
Protocol cuts live with the consumer.
The on-disk schema (JSON) is owned by this module; ``version`` is
always ``1`` for this cycle. The schema layout is documented inline
in :func:`_serialise_envelope` / :func:`_deserialise_envelope` so
the round-trip contract stays close to the wire format.
The store is L2 component-internal (NOT in
``c1_vio/__init__.py``'s public surface); the runtime root pulls
the concrete class via this module path at composition time, the
same lazy-import pattern used by the AZ-331 vio_factory for
strategy modules.
"""
from __future__ import annotations
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Protocol, runtime_checkable
import numpy as np
from gps_denied_onboard._types.nav import ImuBias, WarmStartPose
from gps_denied_onboard.helpers.se3_utils import (
Se3InvalidMatrixError,
matrix_to_se3,
se3_to_matrix,
)
from gps_denied_onboard.helpers.sha256_sidecar import (
SIDECAR_SUFFIX,
Sha256Sidecar,
Sha256SidecarError,
)
from gps_denied_onboard.logging import get_logger
__all__ = [
"HINT_FILENAME",
"HINT_SCHEMA_VERSION",
"JsonSidecarWarmStartHintStore",
"LoadedWarmStartHint",
"WarmStartFcSource",
"WarmStartHintStore",
]
HINT_FILENAME: str = "c1_warm_start.json"
HINT_SCHEMA_VERSION: int = 1
_LOGGER_NAME: str = "components.c1_vio.warm_start_store"
_LOGGER_COMPONENT: str = "c1_vio"
@dataclass(frozen=True)
class LoadedWarmStartHint:
"""What :meth:`WarmStartHintStore.load` returns on success.
``pose`` is the persisted :class:`WarmStartPose` deep-equal to the
last saved hint. ``pre_reboot_covariance_norm`` is the Frobenius
norm of the strategy's last steady-state ``pose_covariance_6x6``
captured by the wiring at save time — the F8 reload path uses
this as the AC-5.3 / AC-8 "no fake confidence" floor.
``calibration_id`` is the camera-calibration identifier the hint
was produced under; the wiring rejects the hint if the current
calibration differs (Risk 2 mitigation).
"""
pose: WarmStartPose
pre_reboot_covariance_norm: float
calibration_id: str
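How the wiring consumes `pre_reboot_covariance_norm` is not part of this file, so the following is only a sketch under stated assumptions. It shows the Frobenius norm the docstring refers to and one plausible way to enforce a floor (uniform rescaling). Both `frobenius_norm` and `apply_norm_floor` are hypothetical names, and the rescaling rule itself is an assumption rather than the behaviour of the wiring layer.

```python
import numpy as np


def frobenius_norm(cov):
    """Frobenius norm of a covariance matrix as a plain float."""
    return float(np.linalg.norm(np.asarray(cov, dtype=np.float64), ord="fro"))


def apply_norm_floor(cov, floor_norm):
    """Return cov rescaled so its Frobenius norm is at least floor_norm.

    Assumption: a uniform scale factor is an acceptable way to lift the
    covariance; the real wiring may use a different rule.
    """
    cov = np.asarray(cov, dtype=np.float64)
    norm = frobenius_norm(cov)
    if norm >= floor_norm or norm == 0.0:
        # Already at or above the floor (an all-zero matrix cannot be rescaled).
        return cov
    return cov * (floor_norm / norm)


steady_state = np.eye(6) * 0.01
lifted = apply_norm_floor(steady_state, floor_norm=0.05)
untouched = apply_norm_floor(steady_state, floor_norm=0.01)
```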
@runtime_checkable
class WarmStartHintStore(Protocol):
"""Persistence contract for a single warm-start hint per c1_vio process.
Implementations MUST satisfy:
- :meth:`save` is atomic (no half-written file is ever loadable).
- :meth:`load` returns ``None`` on cold start (no prior hint),
on sidecar mismatch (corruption), and on calibration mismatch
(Risk 2). All three cases are observable via INFO/WARN logs.
- :meth:`clear` removes both the payload file and its sidecar
together (no half-cleared state).
"""
def save(
self,
hint: WarmStartPose,
*,
pre_reboot_covariance_norm: float,
) -> None: ...
def load(self) -> LoadedWarmStartHint | None: ...
def clear(self) -> None: ...
@runtime_checkable
class WarmStartFcSource(Protocol):
"""Consumer-side cut over the C8 ``FcAdapter`` family (AZ-507).
The F2 takeoff prime path calls :meth:`fetch_warm_start_pose` to
pull the FC EKF's last valid GPS + IMU-extrapolated pose. The
return is ``None`` when the FC has no valid GPS yet (the prime
path then degrades to cold-start with a WARN log; AC-NFR-no-crash).
The runtime-root composition wires a thin adapter from the
concrete C8 :class:`FcAdapter` to this Protocol; tests inject a
fake matching this surface directly. NEVER import a c8 concrete
adapter from inside c1_vio.
"""
def fetch_warm_start_pose(self) -> WarmStartPose | None: ...
def calibration_id(self) -> str: ...
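The "tests inject a fake matching this surface directly" remark relies on structural typing. The sketch below uses a stand-in Protocol (`PoseSource`, a hypothetical name with the same two methods) to show that a test double needs no inheritance, and that `runtime_checkable` only checks that the methods exist, not their signatures or return types.

```python
from typing import Optional, Protocol, runtime_checkable


@runtime_checkable
class PoseSource(Protocol):
    """Stand-in for WarmStartFcSource; the pose type is simplified to object."""

    def fetch_warm_start_pose(self) -> Optional[object]: ...
    def calibration_id(self) -> str: ...


class FakeFcSource:
    """Test double: satisfies the Protocol purely by having both methods."""

    def __init__(self, pose=None, calibration="cal-test"):
        self._pose = pose
        self._calibration = calibration

    def fetch_warm_start_pose(self):
        return self._pose

    def calibration_id(self):
        return self._calibration


class MissingCalibration:
    """Lacks calibration_id, so the isinstance check fails."""

    def fetch_warm_start_pose(self):
        return None


fake_ok = isinstance(FakeFcSource(), PoseSource)
incomplete_ok = isinstance(MissingCalibration(), PoseSource)
```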
def _serialise_envelope(
hint: WarmStartPose,
*,
pre_reboot_covariance_norm: float,
calibration_id: str,
) -> bytes:
"""Pack ``hint`` into the on-disk JSON envelope.
Schema v1 layout (top-level dict):
- ``version`` (int) — always :data:`HINT_SCHEMA_VERSION`.
- ``calibration_id`` (str) — see Risk 2 mitigation.
- ``pre_reboot_covariance_norm`` (float) — AC-5.3 / AC-8 baseline.
- ``pose`` (dict) — the :class:`WarmStartPose` flattened to
JSON-native types: ``body_T_world_4x4`` (4-list of 4-list of
float), ``velocity_b`` (3-list of float), ``bias`` (dict with
``accel_bias`` + ``gyro_bias`` 3-lists of float),
``captured_at_ns`` (int).
"""
matrix = se3_to_matrix(hint.body_T_world)
envelope: dict[str, Any] = {
"version": HINT_SCHEMA_VERSION,
"calibration_id": calibration_id,
"pre_reboot_covariance_norm": float(pre_reboot_covariance_norm),
"pose": {
"body_T_world_4x4": matrix.tolist(),
"velocity_b": [float(v) for v in hint.velocity_b],
"bias": {
"accel_bias": [float(v) for v in hint.bias.accel_bias],
"gyro_bias": [float(v) for v in hint.bias.gyro_bias],
},
"captured_at_ns": int(hint.captured_at_ns),
},
}
return json.dumps(envelope, sort_keys=True).encode("utf-8")
def _deserialise_envelope(
payload: bytes,
) -> tuple[WarmStartPose, float, str]:
"""Inverse of :func:`_serialise_envelope`.
Raises :class:`ValueError` (with context) on any structural
deviation from schema v1 — the calling :meth:`load` routes those
failures through the same WARN-and-return-None path as a sidecar
mismatch (the file is not loadable; cold-start is the right
fallback).
"""
try:
decoded = json.loads(payload.decode("utf-8"))
except (UnicodeDecodeError, json.JSONDecodeError) as exc:
raise ValueError(f"warm-start hint payload is not valid UTF-8 JSON: {exc}") from exc
if not isinstance(decoded, dict):
raise ValueError(
f"warm-start hint payload must decode to a dict; got {type(decoded).__name__}"
)
version = decoded.get("version")
if version != HINT_SCHEMA_VERSION:
raise ValueError(
f"warm-start hint version mismatch: expected {HINT_SCHEMA_VERSION}, got {version!r}"
)
calibration_id = decoded.get("calibration_id")
if not isinstance(calibration_id, str) or not calibration_id:
raise ValueError(
f"warm-start hint envelope missing non-empty calibration_id; got {calibration_id!r}"
)
pre_reboot_covariance_norm = decoded.get("pre_reboot_covariance_norm")
if not isinstance(pre_reboot_covariance_norm, (int, float)) or isinstance(
pre_reboot_covariance_norm, bool
):
raise ValueError(
"warm-start hint envelope.pre_reboot_covariance_norm must be a number "
"(int or float, not bool); "
f"got {pre_reboot_covariance_norm!r}"
)
pose_dict = decoded.get("pose")
if not isinstance(pose_dict, dict):
raise ValueError(
f"warm-start hint envelope.pose must be a dict; got {type(pose_dict).__name__}"
)
matrix_list = pose_dict.get("body_T_world_4x4")
if not isinstance(matrix_list, list) or len(matrix_list) != 4:
raise ValueError("warm-start hint pose.body_T_world_4x4 must be a 4-list of rows")
try:
matrix = np.asarray(matrix_list, dtype=np.float64)
except (TypeError, ValueError) as exc:
raise ValueError(f"warm-start hint pose.body_T_world_4x4 not numeric: {exc}") from exc
try:
body_T_world = matrix_to_se3(matrix)
except Se3InvalidMatrixError as exc:
raise ValueError(f"warm-start hint pose.body_T_world_4x4 not a valid SE(3): {exc}") from exc
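velocity_list = pose_dict.get("velocity_b")
if not isinstance(velocity_list, list) or len(velocity_list) != 3:
raise ValueError("warm-start hint pose.velocity_b must be a 3-list of floats")
try:
velocity_b = (
float(velocity_list[0]),
float(velocity_list[1]),
float(velocity_list[2]),
)
except (TypeError, ValueError, OverflowError) as exc:
# float(None) raises TypeError, which load() does not catch; normalise to ValueError.
raise ValueError(f"warm-start hint pose.velocity_b not numeric: {exc}") from exc
bias_dict = pose_dict.get("bias")
if not isinstance(bias_dict, dict):
raise ValueError("warm-start hint pose.bias must be a dict")
accel_list = bias_dict.get("accel_bias")
gyro_list = bias_dict.get("gyro_bias")
if (
not isinstance(accel_list, list)
or len(accel_list) != 3
or not isinstance(gyro_list, list)
or len(gyro_list) != 3
):
raise ValueError(
"warm-start hint pose.bias must contain 3-list accel_bias and 3-list gyro_bias"
)
try:
bias = ImuBias(
accel_bias=(float(accel_list[0]), float(accel_list[1]), float(accel_list[2])),
gyro_bias=(float(gyro_list[0]), float(gyro_list[1]), float(gyro_list[2])),
)
except (TypeError, ValueError, OverflowError) as exc:
# Same normalisation as velocity_b: non-numeric entries must surface as ValueError.
raise ValueError(f"warm-start hint pose.bias not numeric: {exc}") from exc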
captured_at_ns = pose_dict.get("captured_at_ns")
if not isinstance(captured_at_ns, int) or isinstance(captured_at_ns, bool):
raise ValueError(
f"warm-start hint pose.captured_at_ns must be an int; got {captured_at_ns!r}"
)
pose = WarmStartPose(
body_T_world=body_T_world,
velocity_b=velocity_b,
bias=bias,
captured_at_ns=captured_at_ns,
)
return pose, float(pre_reboot_covariance_norm), calibration_id
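For orientation, a schema v1 envelope as described in the two docstrings above might look like the following. The values are made up for illustration (the calibration id and timestamp are hypothetical), and the sketch uses only the standard library. It also shows why the deserialiser excludes `bool` explicitly: in Python, `bool` is a subclass of `int`, so a JSON `true` would otherwise pass an `isinstance(..., int)` check.

```python
import json

envelope = {
    "version": 1,
    "calibration_id": "cal-example",  # hypothetical identifier
    "pre_reboot_covariance_norm": 0.05,
    "pose": {
        "body_T_world_4x4": [
            [1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0],
        ],
        "velocity_b": [0.0, 0.0, 0.0],
        "bias": {"accel_bias": [0.0, 0.0, 0.0], "gyro_bias": [0.0, 0.0, 0.0]},
        "captured_at_ns": 1_700_000_000_000_000_000,
    },
}

# sort_keys=True makes the byte payload independent of dict insertion order,
# so the same hint always hashes to the same SHA-256 digest.
payload = json.dumps(envelope, sort_keys=True).encode("utf-8")
reordered = dict(reversed(list(envelope.items())))
payload_reordered = json.dumps(reordered, sort_keys=True).encode("utf-8")
decoded = json.loads(payload.decode("utf-8"))


def is_strict_int(value):
    """True for int values that are not bool (mirrors the captured_at_ns check)."""
    return isinstance(value, int) and not isinstance(value, bool)
```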
class JsonSidecarWarmStartHintStore:
"""Default :class:`WarmStartHintStore` impl backed by JSON + SHA-256 sidecar.
``store_dir`` is the directory the hint file lives in; created on
first ``save`` if missing. ``calibration_id`` is bound at
construction time — the composition root reads
:class:`CameraCalibration.id` once and passes it here. A loaded
hint whose ``calibration_id`` differs from the constructor value
is rejected (returns ``None`` + WARN log) per Risk 2.
The atomic-write and sidecar-verify guarantees come from
:class:`Sha256Sidecar` (AZ-280); this class never opens the
payload file directly except through that helper. The class is
process-local (no cross-process locking) — by AZ-331 invariant
the c1_vio strategy is single-instanced per process and the
composition root owns this store.
"""
def __init__(self, store_dir: Path, *, calibration_id: str) -> None:
if not calibration_id:
raise ValueError(
"JsonSidecarWarmStartHintStore.calibration_id must be a non-empty string"
)
self._store_dir = Path(store_dir)
self._calibration_id = calibration_id
self._payload_path = self._store_dir / HINT_FILENAME
self._sidecar_path = Path(str(self._payload_path) + SIDECAR_SUFFIX)
self._log = get_logger(_LOGGER_NAME)
@property
def payload_path(self) -> Path:
"""The on-disk JSON file path (exposed for tests + forensics)."""
return self._payload_path
@property
def sidecar_path(self) -> Path:
"""The sidecar ``<payload>.sha256`` path (exposed for tests + forensics)."""
return self._sidecar_path
def save(
self,
hint: WarmStartPose,
*,
pre_reboot_covariance_norm: float,
) -> None:
"""Write the envelope atomically + sidecar.
Failures (write errors, parent-dir creation errors) propagate
as :class:`Sha256SidecarError` / :class:`OSError` so the
caller can route them through the wiring's no-crash policy
(the wiring catches these and emits an ERROR log per
AC-NFR-no-crash; the process keeps running and falls through
to cold-start on the next prime).
"""
self._store_dir.mkdir(parents=True, exist_ok=True)
payload = _serialise_envelope(
hint,
pre_reboot_covariance_norm=pre_reboot_covariance_norm,
calibration_id=self._calibration_id,
)
Sha256Sidecar.write_atomic_and_sidecar(self._payload_path, payload)
def load(self) -> LoadedWarmStartHint | None:
"""Return the persisted hint, or ``None`` on any non-loadable state.
Branches that emit ``None``:
- Payload file does not exist (cold start; no INFO log here —
the prime path emits ``c1.warm_start.cold_start``).
- Sidecar does not exist or is malformed (corruption — WARN
log ``c1.warm_start.corrupted`` with the offending path).
The file is NOT silently deleted (operator may want to
forensically inspect — AC-2).
- SHA-256 mismatch (corruption — same WARN log).
- JSON envelope structurally invalid (corruption — same WARN
log; the on-disk file is left intact).
- ``calibration_id`` mismatch (Risk 2 — WARN log
``c1.warm_start.calibration_mismatch``; not the same kind
as ``corrupted`` because the file IS valid, just stale).
"""
if not self._payload_path.exists():
return None
try:
verified = Sha256Sidecar.verify(self._payload_path)
except Sha256SidecarError as exc:
self._emit_corrupted_warning(reason=str(exc))
return None
if not verified:
self._emit_corrupted_warning(reason="sha256_mismatch")
return None
try:
payload = self._payload_path.read_bytes()
except OSError as exc:
self._emit_corrupted_warning(reason=f"oserror: {exc}")
return None
try:
pose, pre_reboot_norm, on_disk_calibration_id = _deserialise_envelope(payload)
except ValueError as exc:
self._emit_corrupted_warning(reason=str(exc))
return None
if on_disk_calibration_id != self._calibration_id:
self._log.warning(
"warm-start hint calibration mismatch",
extra={
"component": _LOGGER_COMPONENT,
"kind": "c1.warm_start.calibration_mismatch",
"kv": {
"path": str(self._payload_path),
"saved_calibration_id": on_disk_calibration_id,
"current_calibration_id": self._calibration_id,
},
},
)
return None
return LoadedWarmStartHint(
pose=pose,
pre_reboot_covariance_norm=pre_reboot_norm,
calibration_id=on_disk_calibration_id,
)
def _emit_corrupted_warning(self, *, reason: str) -> None:
"""Single emission point for the AC-2 ``c1.warm_start.corrupted`` WARN."""
self._log.warning(
"warm-start hint corrupted",
extra={
"component": _LOGGER_COMPONENT,
"kind": "c1.warm_start.corrupted",
"kv": {
"path": str(self._payload_path),
"reason": reason,
},
},
)
def clear(self) -> None:
"""Remove both the payload file and its sidecar.
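Idempotent: missing files are not an error. Emits one INFO log on
every successful invocation, whether or not a file existed, so the
operator log shows the explicit clear action. If an unlink fails,
emits ERROR ``c1.warm_start.clear_failed`` and re-raises without the
INFO log; a sidecar left behind without its payload is harmless
because :meth:`load` returns ``None`` when the payload is missing.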
"""
for path in (self._payload_path, self._sidecar_path):
try:
path.unlink(missing_ok=True)
except OSError as exc:
self._log.error(
"warm-start hint clear failed",
extra={
"component": _LOGGER_COMPONENT,
"kind": "c1.warm_start.clear_failed",
"kv": {
"path": str(path),
"reason": str(exc),
},
},
)
raise
self._log.info(
"warm-start hint store cleared",
extra={
"component": _LOGGER_COMPONENT,
"kind": "c1.warm_start.cleared",
"kv": {
"store_dir": str(self._store_dir),
},
},
)
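The `Sha256Sidecar` helper (AZ-280) that provides the atomic-write and verify guarantees is not shown in this commit. The sketch below is a generic standard-library illustration of the technique, not that helper's implementation: `write_atomic_with_sidecar` and `verify` are hypothetical names, and the `.sha256` suffix is taken from the `sidecar_path` docstring above.

```python
import hashlib
import os
import tempfile
from pathlib import Path

SIDECAR_SUFFIX = ".sha256"  # assumption, based on the "<payload>.sha256" docstring


def write_atomic_with_sidecar(path: Path, payload: bytes) -> None:
    """Write payload and its SHA-256 sidecar, each via temp file + os.replace."""
    path.parent.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256(payload).hexdigest().encode("ascii")
    sidecar = Path(str(path) + SIDECAR_SUFFIX)
    # Payload first: a crash between the two replaces leaves a stale sidecar,
    # which verify() reports as a mismatch instead of loading bad data.
    for target, data in ((path, payload), (sidecar, digest)):
        fd, tmp_name = tempfile.mkstemp(dir=target.parent, prefix=target.name + ".")
        try:
            with os.fdopen(fd, "wb") as handle:
                handle.write(data)
                handle.flush()
                os.fsync(handle.fileno())
            # os.replace is atomic when source and target share a filesystem.
            os.replace(tmp_name, target)
        except BaseException:
            Path(tmp_name).unlink(missing_ok=True)
            raise


def verify(path: Path) -> bool:
    """True when the sidecar exists and matches the payload's digest."""
    sidecar = Path(str(path) + SIDECAR_SUFFIX)
    if not sidecar.exists():
        return False
    expected = sidecar.read_text(encoding="ascii").strip()
    return hashlib.sha256(path.read_bytes()).hexdigest() == expected


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "c1_warm_start.json"
    write_atomic_with_sidecar(target, b'{"version": 1}')
    ok_before = verify(target)
    target.write_bytes(b'{"version": 2}')  # simulate on-disk corruption
    ok_after = verify(target)
```

Writing to a temporary file in the same directory and then renaming it over the target is what keeps a half-written payload from ever being visible under the final name.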