[AZ-335] C1 warm-start hint persistence + F8 reboot recovery wiring

Adds JsonSidecarWarmStartHintStore (atomic JSON + SHA-256 sidecar via
AZ-280) inside c1_vio, plus the cross-strategy WarmStartWiredStrategy
wrapper + prime_warm_start_from_disk / prime_warm_start_from_fc hooks
at runtime_root. AC-7 post-reset covariance inflation and AC-8 "no
fake confidence" baseline floor are enforced at the wiring layer so
no strategy module needed edits. Adds three c1_vio config fields
(warm_start_store_dir, warm_start_save_period_frames,
post_reset_covariance_inflation_factor) and registers the new FDR
kind vio.warm_start. 34 unit tests cover all 10 ACs + 3 NFRs.

Verdict PASS_WITH_WARNINGS — see
_docs/03_implementation/reviews/batch_56_review.md for the four
non-blocking documentation findings (F1 cold-start log kind shorthand,
F2 strategy-frame pose semantics, F3 dev-hardware perf smoke, F4
runtime_root importing c1-internal _facade_spine for shared FDR
conventions).

Closes AZ-335; depends on AZ-528 (batch 55).

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-14 03:30:46 +03:00
parent f12789ebf0
commit 06f655d8fb
10 changed files with 2239 additions and 3 deletions
@@ -0,0 +1,562 @@
"""C1 warm-start runtime wiring (AZ-335 / E-C1).

Cross-strategy orchestration for warm-start hint persistence + F2
takeoff load + F8 reboot recovery. The wiring lives at the
composition root because the concerns it implements span more than
the :class:`VioStrategy` Protocol surface:

- AC-5.1 / AC-5.3 require a hint flow ``FC EKF → strategy``
  (F2 takeoff) and ``disk → strategy`` (F8 reboot) that no single
  strategy can implement on its own.
- The post-reset covariance inflation + AC-5.3 "no fake confidence"
  floor is enforced HERE, not inside any strategy — adding the
  inflation to a strategy would double-inflate when the wiring also
  inflates (Constraints, AZ-335 task spec).
- The per-frame save throttle keeps disk I/O bounded at the 3 Hz
  steady-state frame rate.

Public surface:

- :class:`WarmStartWiredStrategy` — a :class:`VioStrategy` impl that
  wraps any concrete :class:`VioStrategy` (OKVIS2 / VINS-Mono /
  KLT-RANSAC) with the per-frame save + post-reset covariance
  inflation + AC-8 baseline floor. Exposes the standard Protocol
  methods PLUS :meth:`prime_post_reboot` which the F8 prime path
  uses to install the loaded baseline.
- :func:`prime_warm_start_from_disk` — F8 reboot prime hook.
- :func:`prime_warm_start_from_fc` — F2 takeoff prime hook.

The composition root constructs a :class:`WarmStartWiredStrategy`
from ``runtime_root.vio_factory.build_vio_strategy(config,
fdr_client=...)`` and the per-binary :class:`WarmStartHintStore`,
then calls :func:`prime_warm_start_from_disk` once at process
startup before the first ``process_frame``. The F2 hook is invoked
on the FC's ``flight_state`` transition to ``IN_AIR`` (operator-side
or auto-detected; that wiring is owned by the composition root, not
this module).
"""

from __future__ import annotations

import time
from dataclasses import replace
from typing import TYPE_CHECKING, Any, Literal

import numpy as np

from gps_denied_onboard._types.nav import (
    ImuWindow,
    NavCameraFrame,
    VioHealth,
    VioOutput,
    WarmStartPose,
)
from gps_denied_onboard.components.c1_vio._facade_spine import bias_norm, now_iso
from gps_denied_onboard.components.c1_vio.interface import VioStrategy
from gps_denied_onboard.components.c1_vio.warm_start_store import (
    LoadedWarmStartHint,
    WarmStartFcSource,
    WarmStartHintStore,
)
from gps_denied_onboard.fdr_client.records import CURRENT_SCHEMA_VERSION, FdrRecord
from gps_denied_onboard.logging import get_logger

if TYPE_CHECKING:
    from gps_denied_onboard._types.calibration import CameraCalibration
    from gps_denied_onboard.fdr_client.client import FdrClient

__all__ = [
    "WARM_START_PRODUCER_ID",
    "WarmStartWiredStrategy",
    "prime_warm_start_from_disk",
    "prime_warm_start_from_fc",
]

WARM_START_PRODUCER_ID: str = "components.c1_vio.warm_start"
_LOGGER_NAME: str = "components.c1_vio.warm_start_wiring"
_LOGGER_COMPONENT: str = "c1_vio"
_SOURCE_F2_TAKEOFF: str = "f2_takeoff_fc"
_SOURCE_F8_REBOOT: str = "f8_reboot_disk"
_SOURCE_COLD_START: str = "cold_start_no_hint"


def _frobenius_norm(matrix: Any) -> float:
    """Frobenius norm of a 6×6 covariance, hardened against non-array inputs."""
    arr = np.asarray(matrix, dtype=np.float64)
    return float(np.linalg.norm(arr, ord="fro"))


class WarmStartWiredStrategy:
    """Facade around a concrete :class:`VioStrategy` with AZ-335 wiring.

    Wraps an inner strategy so that:

    1. The output of every ``warm_start_save_period_frames``-th
       successful :meth:`process_frame` call is persisted to the
       :class:`WarmStartHintStore` (AC-6).
    2. For the first ``warm_start_max_frames`` frames after every
       :meth:`reset_to_warm_start` or :meth:`prime_post_reboot` call,
       the emitted ``pose_covariance_6x6`` is multiplied by
       ``post_reset_covariance_inflation_factor`` (AC-7).
    3. When a baseline floor was installed by
       :meth:`prime_post_reboot`, the covariance of each post-reset
       frame is additionally scaled up so that its Frobenius norm is
       at least the saved pre-reboot value (AC-8, the "no fake
       confidence" invariant).

    The wrapper is itself a :class:`VioStrategy` (PEP 544 structural
    typing). ``runtime_checkable`` conformance is verified by the
    AZ-335 unit tests; downstream consumers (C5 fusion, C13 FDR)
    cannot tell the difference between the wrapped and the bare
    strategy because the public Protocol shape is preserved.

    Per-frame save errors do NOT crash the process: an
    :class:`OSError`, :class:`RuntimeError`, or :class:`ValueError`
    raised by :meth:`WarmStartHintStore.save` is logged at ERROR (kind
    ``c1.warm_start.save_failed``) and swallowed so the camera
    ingest hot path keeps flowing (AC-NFR-no-crash). A
    :class:`Sha256SidecarError` is covered only if it derives from
    one of those three types, which this wrapper assumes.
    """

    def __init__(
        self,
        inner: VioStrategy,
        *,
        store: WarmStartHintStore,
        warm_start_max_frames: int,
        post_reset_covariance_inflation_factor: float,
        warm_start_save_period_frames: int,
    ) -> None:
        if warm_start_max_frames < 1:
            raise ValueError(
                "warm_start_max_frames must be >= 1; "
                f"got {warm_start_max_frames}"
            )
        # NaN passes a plain ``<= 1.0`` comparison, and +inf would make
        # every inflated covariance infinite, so reject both explicitly.
        if not (
            np.isfinite(post_reset_covariance_inflation_factor)
            and post_reset_covariance_inflation_factor > 1.0
        ):
            raise ValueError(
                "post_reset_covariance_inflation_factor must be > 1.0 "
                "(1.0 would defeat AC-5.3 / AC-8 floor); "
                f"got {post_reset_covariance_inflation_factor}"
            )
        if warm_start_save_period_frames < 1:
            raise ValueError(
                "warm_start_save_period_frames must be >= 1; "
                f"got {warm_start_save_period_frames}"
            )
        self._inner = inner
        self._store = store
        self._max_frames = warm_start_max_frames
        self._inflation_factor = float(post_reset_covariance_inflation_factor)
        self._save_period = warm_start_save_period_frames
        self._post_reset_remaining: int = 0
        self._baseline_floor: float = 0.0
        self._frames_since_save: int = 0
        self._last_emitted_covariance_norm: float = 0.0
        self._log = get_logger(_LOGGER_NAME)

    @property
    def post_reset_remaining(self) -> int:
        """Frames left in the active inflation window (0 in steady-state)."""
        return self._post_reset_remaining

    @property
    def baseline_floor(self) -> float:
        """Currently installed AC-8 covariance floor (0.0 when no F8 prime)."""
        return self._baseline_floor

    @property
    def last_emitted_covariance_norm(self) -> float:
        """Frobenius norm of the last :class:`VioOutput` returned to the consumer."""
        return self._last_emitted_covariance_norm

    def process_frame(
        self,
        frame: NavCameraFrame,
        imu: ImuWindow,
        calibration: "CameraCalibration",
    ) -> VioOutput:
        """Forward to inner strategy, then apply inflation + throttled save."""
        out = self._inner.process_frame(frame, imu, calibration)
        if self._post_reset_remaining > 0:
            out = self._apply_post_reset_inflation(out)
            self._post_reset_remaining -= 1
        self._last_emitted_covariance_norm = _frobenius_norm(out.pose_covariance_6x6)
        self._frames_since_save += 1
        if self._frames_since_save >= self._save_period:
            self._frames_since_save = 0
            self._save_hint_from_output(out)
        return out

    def reset_to_warm_start(self, hint: WarmStartPose) -> None:
        """Protocol method: forward to inner, arm inflation window WITHOUT a floor.

        Used by the F2 takeoff prime path — the FC EKF supplies a
        fresh pose, so there is no pre-reboot baseline to defend
        against. The :attr:`_baseline_floor` attribute is reset to
        ``0.0`` so the AC-8 floor check is skipped and only the plain
        AC-7 inflation applies.
        """
        self._inner.reset_to_warm_start(hint)
        self._post_reset_remaining = self._max_frames
        self._baseline_floor = 0.0
        self._frames_since_save = 0

    def prime_post_reboot(self, loaded: LoadedWarmStartHint) -> None:
        """Wrapper extension: F8 reboot path, installs the AC-8 floor.

        Forwards the loaded pose to the inner strategy via
        :meth:`reset_to_warm_start`, then arms the inflation window
        AND captures ``loaded.pre_reboot_covariance_norm`` as the
        floor that subsequent :meth:`process_frame` calls must
        respect for ``warm_start_max_frames`` frames.

        NOT a Protocol method — the autodev-injected F8 path calls
        this directly on a :class:`WarmStartWiredStrategy` instance.
        """
        self._inner.reset_to_warm_start(loaded.pose)
        self._post_reset_remaining = self._max_frames
        self._baseline_floor = float(loaded.pre_reboot_covariance_norm)
        self._frames_since_save = 0

    def health_snapshot(self) -> VioHealth:
        """Forward unchanged — health is a strategy concern, not a wiring concern."""
        return self._inner.health_snapshot()

    def current_strategy_label(
        self,
    ) -> Literal["okvis2", "vins_mono", "klt_ransac"]:
        """Forward unchanged so :class:`VioHealth.strategy_label` audit is honest."""
        return self._inner.current_strategy_label()

    def _apply_post_reset_inflation(self, out: VioOutput) -> VioOutput:
        """Inflate the emitted covariance by the configured factor + AC-8 floor.

        AC-7: inflated norm = factor × strategy_emitted_norm. AC-8:
        further scale up so inflated norm ≥ ``_baseline_floor``. Both
        scalings preserve symmetry and positive-definiteness because
        they are pure positive scalar multiplications of the SPD
        matrix (eigenvalues stay strictly positive).
        """
        original = np.asarray(out.pose_covariance_6x6, dtype=np.float64)
        inflated = original * self._inflation_factor
        inflated_norm = float(np.linalg.norm(inflated, ord="fro"))
        if (
            self._baseline_floor > 0.0
            and inflated_norm > 0.0
            and inflated_norm < self._baseline_floor
        ):
            scale = self._baseline_floor / inflated_norm
            inflated = inflated * scale
        return replace(out, pose_covariance_6x6=inflated)

    def _save_hint_from_output(self, out: VioOutput) -> None:
        """Construct a :class:`WarmStartPose` from the last emitted output and save.

        ``velocity_b`` is left at zero — the wrapper has no velocity
        source on the per-frame save path (the strategy's
        :class:`VioOutput` does not expose velocity, and chasing it
        would require a numerical-differentiation sidecar that
        belongs in a future cycle). On F8 reload the strategy
        re-estimates velocity from its IMU integration, so a
        zero-velocity hint is acceptable for the recovery path.

        Per-frame save failures do NOT propagate — they are logged
        at ERROR and swallowed (AC-NFR-no-crash). The hint store
        will be in whatever state the failed atomic-write left it
        (the AZ-280 contract guarantees no half-written file).
        """
        hint = WarmStartPose(
            body_T_world=out.relative_pose_T,
            velocity_b=(0.0, 0.0, 0.0),
            bias=out.imu_bias,
            captured_at_ns=int(out.emitted_at_ns),
        )
        try:
            self._store.save(
                hint,
                pre_reboot_covariance_norm=self._last_emitted_covariance_norm,
            )
        except (OSError, RuntimeError, ValueError) as exc:
            self._log.error(
                "warm-start hint save failed",
                extra={
                    "component": _LOGGER_COMPONENT,
                    "kind": "c1.warm_start.save_failed",
                    "kv": {
                        "reason": str(exc),
                        "frame_id": out.frame_id,
                    },
                },
            )


def _emit_prime_fdr(
    *,
    fdr_client: "FdrClient",
    source: str,
    strategy_label: str,
    bias_norm_value: float | None,
    staleness_ns: int | None,
    pre_reboot_covariance_norm: float | None,
) -> None:
    """Emit the single AZ-335 ``vio.warm_start`` FDR record."""
    record = FdrRecord(
        schema_version=CURRENT_SCHEMA_VERSION,
        ts=now_iso(),
        producer_id=WARM_START_PRODUCER_ID,
        kind="vio.warm_start",
        payload={
            "source": source,
            "strategy_label": strategy_label,
            "bias_norm": bias_norm_value,
            "staleness_ns": staleness_ns,
            "pre_reboot_covariance_norm": pre_reboot_covariance_norm,
        },
    )
    fdr_client.enqueue(record)


def _emit_prime_log(
    *,
    log: Any,
    level: str,
    msg: str,
    source: str,
    strategy_label: str,
    extra_kv: dict[str, Any] | None = None,
) -> None:
    """Single emission point for prime-time INFO/WARN logs."""
    kv: dict[str, Any] = {
        "source": source,
        "strategy_label": strategy_label,
    }
    if extra_kv:
        kv.update(extra_kv)
    record_extra = {
        "component": _LOGGER_COMPONENT,
        "kind": f"c1.warm_start.{source}",
        "kv": kv,
    }
    if level == "warning":
        log.warning(msg, extra=record_extra)
    else:
        log.info(msg, extra=record_extra)


def prime_warm_start_from_disk(
    strategy: WarmStartWiredStrategy,
    store: WarmStartHintStore,
    *,
    fdr_client: "FdrClient",
) -> bool:
    """F8 reboot prime hook — called at process startup before first ``process_frame``.

    Reads the persisted hint via ``store.load()``:

    - If a hint is loaded, calls :meth:`WarmStartWiredStrategy.prime_post_reboot`
      (which forwards to the inner strategy AND installs the AC-8 floor),
      emits one INFO log ``c1.warm_start.f8_reboot_disk``, and emits one
      FDR record ``vio.warm_start`` with ``source="f8_reboot_disk"``.
    - If ``store.load()`` returns ``None`` (cold start, corrupted file,
      calibration mismatch), emits one INFO log
      ``c1.warm_start.cold_start_no_hint`` and one FDR record with
      ``source="cold_start_no_hint"``. The strategy is left untouched
      and proceeds with its own INIT-state behaviour.

    Returns ``True`` iff a hint was loaded AND applied. Never raises:
    any exception from :meth:`WarmStartWiredStrategy.prime_post_reboot`
    (for example a :class:`VioFatalError` from the inner strategy's
    :meth:`reset_to_warm_start`) is caught, logged at ERROR
    (``c1.warm_start.reset_failed``), and the function returns
    ``False`` so the camera ingest can still start in cold-start mode.
    No FDR record is emitted on that failure path.
    """
    log = get_logger(_LOGGER_NAME)
    strategy_label = strategy.current_strategy_label()
    loaded = store.load()
    if loaded is None:
        _emit_prime_log(
            log=log,
            level="info",
            msg="warm-start cold start — no prior hint",
            source=_SOURCE_COLD_START,
            strategy_label=strategy_label,
        )
        _emit_prime_fdr(
            fdr_client=fdr_client,
            source=_SOURCE_COLD_START,
            strategy_label=strategy_label,
            bias_norm_value=None,
            staleness_ns=None,
            pre_reboot_covariance_norm=None,
        )
        return False
    try:
        strategy.prime_post_reboot(loaded)
    except Exception as exc:
        log.error(
            "warm-start prime_post_reboot failed",
            extra={
                "component": _LOGGER_COMPONENT,
                "kind": "c1.warm_start.reset_failed",
                "kv": {
                    "source": _SOURCE_F8_REBOOT,
                    "strategy_label": strategy_label,
                    "reason": str(exc),
                },
            },
        )
        return False
    # NOTE: time.monotonic_ns() has an undefined reference point, so this
    # difference is only meaningful if ``captured_at_ns`` was stamped on the
    # same clock. A negative delta (clock mismatch or restart) clamps to 0.
    staleness_ns = max(0, int(time.monotonic_ns()) - int(loaded.pose.captured_at_ns))
    _emit_prime_log(
        log=log,
        level="info",
        msg="warm-start F8 reboot — hint loaded from disk",
        source=_SOURCE_F8_REBOOT,
        strategy_label=strategy_label,
        extra_kv={
            "staleness_ns": staleness_ns,
            "pre_reboot_covariance_norm": loaded.pre_reboot_covariance_norm,
        },
    )
    _emit_prime_fdr(
        fdr_client=fdr_client,
        source=_SOURCE_F8_REBOOT,
        strategy_label=strategy_label,
        bias_norm_value=bias_norm(loaded.pose.bias),
        staleness_ns=staleness_ns,
        pre_reboot_covariance_norm=loaded.pre_reboot_covariance_norm,
    )
    return True


def prime_warm_start_from_fc(
    strategy: WarmStartWiredStrategy,
    source: WarmStartFcSource,
    store: WarmStartHintStore,
    *,
    fdr_client: "FdrClient",
) -> bool:
    """F2 takeoff prime hook — called once on the ``IN_AIR`` flight-state edge.

    Asks the consumer-side cut for the FC EKF's last valid pose:

    - If a hint is returned, calls :meth:`WarmStartWiredStrategy.reset_to_warm_start`
      (the inflation window arms WITHOUT an AC-8 floor — there is no
      pre-reboot baseline on the F2 path because the FC just provided
      a fresh pose), persists the same hint via ``store.save`` so the
      next F8 reboot can recover from it, and emits the INFO log +
      FDR record with ``source="f2_takeoff_fc"``.
    - If the source returns ``None`` or raises, emits one WARN log
      ``c1.warm_start.f2_takeoff_fc_unavailable`` and an FDR record
      with ``source="cold_start_no_hint"``; the strategy is left in
      its current state and the camera ingest proceeds (AC-NFR-no-crash).
    - If the strategy rejects the hint, logs ERROR
      ``c1.warm_start.reset_failed`` and returns ``False`` without
      persisting the hint or emitting an FDR record.
    - If ``store.save`` fails after the strategy accepted the hint, logs
      ERROR ``c1.warm_start.save_failed``, still emits the FDR record
      with ``source="f2_takeoff_fc"`` for audit, and returns ``False``.

    Returns ``True`` iff a hint was fetched, applied, AND persisted.
    Never raises.
    """
    log = get_logger(_LOGGER_NAME)
    strategy_label = strategy.current_strategy_label()
    try:
        hint = source.fetch_warm_start_pose()
    except Exception as exc:
        log.warning(
            "warm-start FC fetch raised",
            extra={
                "component": _LOGGER_COMPONENT,
                "kind": "c1.warm_start.f2_takeoff_fc_unavailable",
                "kv": {
                    "source": _SOURCE_F2_TAKEOFF,
                    "strategy_label": strategy_label,
                    "reason": str(exc),
                },
            },
        )
        _emit_prime_fdr(
            fdr_client=fdr_client,
            source=_SOURCE_COLD_START,
            strategy_label=strategy_label,
            bias_norm_value=None,
            staleness_ns=None,
            pre_reboot_covariance_norm=None,
        )
        return False
    if hint is None:
        log.warning(
            "warm-start FC has no valid pose yet",
            extra={
                "component": _LOGGER_COMPONENT,
                "kind": "c1.warm_start.f2_takeoff_fc_unavailable",
                "kv": {
                    "source": _SOURCE_F2_TAKEOFF,
                    "strategy_label": strategy_label,
                    "reason": "fc_returned_none",
                },
            },
        )
        _emit_prime_fdr(
            fdr_client=fdr_client,
            source=_SOURCE_COLD_START,
            strategy_label=strategy_label,
            bias_norm_value=None,
            staleness_ns=None,
            pre_reboot_covariance_norm=None,
        )
        return False
    try:
        strategy.reset_to_warm_start(hint)
    except Exception as exc:
        log.error(
            "warm-start F2 reset_to_warm_start failed",
            extra={
                "component": _LOGGER_COMPONENT,
                "kind": "c1.warm_start.reset_failed",
                "kv": {
                    "source": _SOURCE_F2_TAKEOFF,
                    "strategy_label": strategy_label,
                    "reason": str(exc),
                },
            },
        )
        return False
    try:
        store.save(hint, pre_reboot_covariance_norm=0.0)
    except (OSError, RuntimeError, ValueError) as exc:
        log.error(
            "warm-start F2 persist failed",
            extra={
                "component": _LOGGER_COMPONENT,
                "kind": "c1.warm_start.save_failed",
                "kv": {
                    "source": _SOURCE_F2_TAKEOFF,
                    "strategy_label": strategy_label,
                    "reason": str(exc),
                },
            },
        )
        # the strategy already accepted the hint; the FDR record
        # below still records the F2 prime for audit, but we return
        # False to indicate persistence did not complete. The next
        # successful per-frame save will restore the on-disk state.
        _emit_prime_fdr(
            fdr_client=fdr_client,
            source=_SOURCE_F2_TAKEOFF,
            strategy_label=strategy_label,
            bias_norm_value=bias_norm(hint.bias),
            staleness_ns=None,
            pre_reboot_covariance_norm=None,
        )
        return False
    _emit_prime_log(
        log=log,
        level="info",
        msg="warm-start F2 takeoff — hint primed from FC",
        source=_SOURCE_F2_TAKEOFF,
        strategy_label=strategy_label,
    )
    _emit_prime_fdr(
        fdr_client=fdr_client,
        source=_SOURCE_F2_TAKEOFF,
        strategy_label=strategy_label,
        bias_norm_value=bias_norm(hint.bias),
        staleness_ns=None,
        pre_reboot_covariance_norm=None,
    )
    return True
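
The post-reset covariance handling in this module (AC-7 inflation followed by the AC-8 floor) can be sketched in isolation. This is a minimal illustration under stated assumptions, not the module's code: it uses plain nested lists instead of NumPy, and `frobenius_norm`, `inflate_with_floor`, and the sample values are hypothetical names and numbers introduced only for this example.

```python
import math


def frobenius_norm(matrix):
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(value * value for row in matrix for value in row))


def inflate_with_floor(covariance, factor, floor):
    """Scale `covariance` by `factor`, then scale up again if its norm is below `floor`.

    Both steps multiply by a positive scalar, so symmetry is preserved.
    A floor of 0.0 disables the second step (the F2 takeoff case).
    """
    if not (math.isfinite(factor) and factor > 1.0):
        raise ValueError(f"factor must be finite and > 1.0; got {factor}")
    inflated = [[value * factor for value in row] for row in covariance]
    norm = frobenius_norm(inflated)
    if floor > 0.0 and 0.0 < norm < floor:
        scale = floor / norm
        inflated = [[value * scale for value in row] for row in inflated]
    return inflated


# Hypothetical 6x6 diagonal covariance with 0.01 on the diagonal.
cov = [[0.01 if i == j else 0.0 for j in range(6)] for i in range(6)]

# F2-style reset: no floor, so only the inflation factor applies.
f2 = inflate_with_floor(cov, factor=4.0, floor=0.0)

# F8-style reboot: the saved pre-reboot norm (here 1.0) acts as a floor.
f8 = inflate_with_floor(cov, factor=4.0, floor=1.0)

print(frobenius_norm(cov), frobenius_norm(f2), frobenius_norm(f8))
```

In the F2-style call the norm grows by exactly the inflation factor. In the F8-style call the inflated norm is still below the floor, so the matrix is rescaled until its norm reaches the floor, which is the "no fake confidence" behaviour the docstrings describe.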