[AZ-297] C7 InferenceRuntime: Protocol + DTOs + factory

Freezes the c7_inference Public API per _docs/02_document/contracts/c7_inference/inference_runtime_protocol.md v1.0.0: - InferenceRuntime Protocol (6 methods: compile_engine, deserialize_engine, infer, release_engine, thermal_state, current_runtime_label) in components/c7_inference/interface.py. - DTOs (PrecisionMode enum, OptimizationProfile, BuildConfig, EngineCacheEntry, EngineHandle opaque marker) in _types/inference.py — placed at the L1 types layer so C10 can re-export EngineCacheEntry without crossing the components.* boundary (AZ-270 AC-6). - ThermalState DTO expanded in _types/thermal.py from the AZ-355 forward-declared stub to the AZ-297 contract shape (cpu/gpu temp, thermal_throttle_active, measured_clock_mhz, measured_at_ns, is_telemetry_available). Invariant I-6: when telemetry is unavailable, throttle is False. - Error family rooted at c7_inference.errors.RuntimeError (9 subtypes: EngineBuildError, EngineDeserializeError, EngineHashMismatchError, EngineSchemaMismatchError, EngineSidecarMissingError, CalibrationCacheError, InferenceError, OutOfMemoryError, TelemetryUnavailableError). RuntimeNotAvailableError stays in runtime_root/errors.py — composition-time, outside the family. - C7InferenceConfig per-component config block (runtime label, thermal_poll_hz, engine_cache_dir) with constructor-time validation rejecting unknown runtime labels. - Composition-root factory build_inference_runtime in runtime_root/inference_factory.py with three BUILD_* gates (BUILD_TENSORRT_RUNTIME, BUILD_ONNX_TRT_EP_RUNTIME, BUILD_PYTORCH_FP16_RUNTIME). Concrete strategy modules are imported lazily via __import__ AFTER the flag check, so a Tier-0 build with the flag OFF MUST NOT load the strategy module (AC-5 / I-5; verifiable via sys.modules). - 37 conformance tests cover all 8 ACs + NFR-perf-factory (p99 build under 200 ms × 1000 calls) + NFR-reliability-error-family. AC-8 introspects the contract file's Shape table and asserts method parity against the runtime Protocol; also asserts all 9 error subtypes are documented. Retired the AZ-263 scaffolding EngineCacheEntry from _types/manifests.py (replaced by the AZ-297 canonical shape in _types/inference.py); updated the LightGlue-flavoured EngineHandle Protocol docstring in _types/manifests.py to rationalize its intentional dual existence with the C7 opaque EngineHandle (same name, different consumer-side cut, mirroring the C4/C5 ISam2GraphHandle pattern). Stale ThermalState.throttle docstring references in c4_pose/config.py, c4_pose/interface.py, and _types/pose.py updated to thermal_throttle_active. Full unit-test sweep: 843 passed, 2 pre-existing environment skips (cmake, actionlint). Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-22 18:01:13 +00:00 · 2026-05-12 04:30:14 +03:00
parent f925af9de3
commit daff5d4d1c
15 changed files with 1089 additions and 60 deletions
@@ -0,0 +1,106 @@
+"""C7 inference runtime DTOs + enums (AZ-297) — L1 shared types.
+
+Hosted at the ``_types`` layer (not under ``components/c7_inference``)
+because C10 ``CacheProvisioner`` re-exports :class:`EngineCacheEntry`
+as part of its own Public API; per the architecture rule
+(``test_az270_compose_root.py`` AC-6) components MUST NOT import
+other components, so cross-component DTOs live in ``_types/``.
+
+C7's :class:`gps_denied_onboard.components.c7_inference` package
+re-exports these names for the canonical contract surface; consumers
+SHOULD import from ``c7_inference`` (the contract namespace) rather
+than from here. The contract at
+``_docs/02_document/contracts/c7_inference/inference_runtime_protocol.md``
+v1.0.0 is the authoritative shape; this module mirrors it 1:1.
+"""
+
+from __future__ import annotations
+
+from dataclasses import dataclass
+from enum import Enum
+from pathlib import Path
+
+__all__ = [
+    "BuildConfig",
+    "EngineCacheEntry",
+    "EngineHandle",
+    "OptimizationProfile",
+    "PrecisionMode",
+]
+
+
+class PrecisionMode(str, Enum):
+    """TensorRT / ORT / PyTorch precision the engine is compiled for."""
+
+    FP16 = "fp16"
+    INT8 = "int8"
+    MIXED = "mixed"
+
+
+@dataclass(frozen=True)
+class OptimizationProfile:
+    """One named TRT optimisation profile.
+
+    The runtime picks an actual shape inside ``[min_shape, max_shape]``
+    based on the input dict supplied to ``infer``; the Protocol does
+    NOT auto-batch (see ``inference_runtime_protocol.md`` Non-Goals).
+    """
+
+    input_name: str
+    min_shape: tuple[int, ...]
+    opt_shape: tuple[int, ...]
+    max_shape: tuple[int, ...]
+
+
+@dataclass(frozen=True)
+class BuildConfig:
+    """Engine-build hyperparameters passed to ``compile_engine``.
+
+    ``calibration_dataset`` is required when ``precision == INT8`` and
+    is ``None`` for FP16 / MIXED. ``use_trtexec`` is a TRT-only hint;
+    the ORT / PyTorch strategies ignore it.
+    """
+
+    precision: PrecisionMode
+    workspace_mb: int
+    calibration_dataset: Path | None
+    optimization_profiles: tuple[OptimizationProfile, ...]
+    use_trtexec: bool = False
+
+
+@dataclass(frozen=True)
+class EngineCacheEntry:
+    """Self-describing pointer to a compiled engine on disk.
+
+    The five-tuple ``(model, sm, jp, trt, precision)`` matches the
+    filename schema from AZ-281 (``engine_filename_schema.md``);
+    ``sha256_hex`` is produced via the AZ-280 sidecar pattern. For
+    PyTorch baselines the ``sm`` / ``jp`` / ``trt`` fields are ``None``
+    (the runtime is hardware-portable). ``extras`` carries
+    strategy-specific metadata (e.g., calibration cache path).
+    """
+
+    engine_path: Path
+    sha256_hex: str
+    sm: int | None
+    jp: str | None
+    trt: str | None
+    precision: PrecisionMode
+    extras: dict[str, str]
+
+
+class EngineHandle:
+    """Opaque token returned by C7 ``deserialize_engine``.
+
+    Consumers MUST pass the handle back to the same runtime
+    (``infer`` / ``release_engine``) and MUST NOT introspect fields —
+    Invariant I-4 of ``inference_runtime_protocol.md``. Each concrete
+    strategy subclasses this with implementation-private state.
+
+    A separate :class:`gps_denied_onboard._types.manifests.EngineHandle`
+    Protocol exists for the LightGlue consumer-side cut (descriptor_dim
+    + forward). That is a duck-typed structural cut, intentionally
+    distinct from this opaque marker; concrete C7 strategies subclass
+    THIS class and structurally satisfy the LightGlue Protocol when
+    paired with a matching engine binary.
+    """
@@ -22,19 +22,6 @@ class Manifest:
    metadata: dict[str, Any] = field(default_factory=dict)


-@dataclass(frozen=True)
-class EngineCacheEntry:
-    """TensorRT engine + calibration cache, keyed by SM/JP/TRT/precision (D-C10-7)."""
-
-    engine_path: str
-    sm_arch: str
-    jetpack_version: str
-    tensorrt_version: str
-    precision: str
-    content_hash: str
-    int8_calibration_path: str | None = None
-
-
@dataclass(frozen=True)
 class EngineCacheKey:
    """Parsed tuple of a self-describing `.engine` filename (D-C10-7, AZ-281).
@@ -53,12 +40,23 @@ class EngineCacheKey:

@runtime_checkable
 class EngineHandle(Protocol):
-    """Opaque Protocol for an inference engine handle (C7-owned implementation).
+    """Consumer-side structural cut of a C7 engine handle for the
+    LightGlue helper (AZ-278 / R14 fix).

-    The production handle is created by C7's
-    ``InferenceRuntime.deserialize_engine`` and injected by the
-    composition root into ``LightGlueRuntime``. The helper depends on
-    this Protocol from `_types` so Layer 1 never imports C7 (R14 fix).
+    Intentionally distinct from
+    :class:`gps_denied_onboard.components.c7_inference.EngineHandle`
+    (the AZ-297 opaque marker class): same name, different shape, by
+    design. The C7 Protocol says EngineHandle is opaque to general
+    consumers (Invariant I-4); LightGlueRuntime is a specific helper
+    that needs a narrower structural Protocol with
+    ``descriptor_dim`` + ``forward``. Concrete C7 strategies
+    (TensorrtRuntime / OnnxTrtEpRuntime / PytorchFp16Runtime)
+    subclass the c7_inference opaque marker AND structurally satisfy
+    this Protocol when paired with a matching engine binary.
+
+    The helper depends on THIS Protocol from ``_types`` so Layer 1
+    never imports C7 (R14 fix). See ADR for the dual-Protocol
+    design rationale (mirrors the C4/C5 ISam2GraphHandle split).
    """

    @property
@@ -54,7 +54,7 @@ class CovarianceMode(Enum):
      iSAM2 graph (ADR-003 substrate). The cost-accurate path.
    * ``JACOBIAN`` — D-CROSS-LATENCY-1 thermal-throttle fallback
      (ADR-006). Cheaper but ~5-10% less accurate; engaged per-frame
-      when ``ThermalState.throttle == True``.
+      when ``ThermalState.thermal_throttle_active == True``.

    AZ-355 owns the enum; AZ-358 (Marginals) + AZ-361 (Hybrid) own
    the producer paths.
@@ -1,14 +1,17 @@
-"""C7 ``ThermalState`` DTO stub (forward-declared for AZ-355).
+"""C7 ``ThermalState`` DTO (frozen at AZ-297 v1.0.0).

-AZ-355 (C4 PoseEstimator Protocol) needs a ``ThermalState`` type to
-annotate :meth:`PoseEstimator.estimate`. The full producer
-(``ThermalStatePublisher`` in C7) is owned by AZ-302; this module
-holds the minimal DTO surface C4 needs so the Protocol typechecks
-without a circular dependency or a ``TYPE_CHECKING`` workaround.
+Originally introduced as a forward-declared stub for AZ-355 (C4 needs
+the type before C7 lands); AZ-297 expanded the surface to the full
+contract shape (cpu/gpu temp, throttle, measured clock, monotonic
+timestamp, telemetry-availability bit) and AZ-302 will own the
+producer-side polling thread.

-When AZ-302 lands, it MAY add fields here (temperature reading,
-thermal-zone source, captured_at) but MUST keep the ``throttle``
-boolean — it is the only field the C4 Protocol contract pins.
+C4 consumers read ``thermal_throttle_active`` (the canonical name);
+the older docstring references to ``throttle`` are stale and now
+point at this field. Invariant I-6 of
+``_docs/02_document/contracts/c7_inference/inference_runtime_protocol.md``
+binds ``thermal_throttle_active == False`` whenever
+``is_telemetry_available == False``.
 """

 from __future__ import annotations
@@ -22,11 +25,22 @@ __all__ = ["ThermalState"]
 class ThermalState:
    """C7-reported thermal state consumed by C4 for the per-frame mode switch.

-    ``throttle == True`` triggers the Jacobian path (D-CROSS-LATENCY-1
-    / ADR-006). ``False`` keeps the production Marginals path.
+    ``thermal_throttle_active == True`` triggers the Jacobian path
+    (D-CROSS-LATENCY-1 / ADR-006); ``False`` keeps the production
+    Marginals path. ``is_telemetry_available`` is the default-safe
+    bit: when the source is hung or absent, consumers see
+    ``thermal_throttle_active == False`` regardless of any other
+    field's value.

-    The full C7 publisher (AZ-302) emits these on a fixed cadence;
-    C4 reads the latest value at every ``estimate`` call entry.
+    Temperatures are degrees Celsius; ``measured_clock_mhz`` is the
+    most-recent GPU clock report (None when unavailable);
+    ``measured_at_ns`` is :func:`time.monotonic_ns` at poll time so
+    consumers can age-gate the reading.
    """

-    throttle: bool
+    cpu_temp_c: float | None
+    gpu_temp_c: float | None
+    thermal_throttle_active: bool
+    measured_clock_mhz: int | None
+    measured_at_ns: int
+    is_telemetry_available: bool
@@ -1,6 +1,12 @@
-"""C10 Cache Provisioning component — Public API."""
+"""C10 Cache Provisioning component — Public API.

-from gps_denied_onboard._types.manifests import EngineCacheEntry, Manifest
+``EngineCacheEntry`` is the C7 canonical DTO (frozen at AZ-297) and
+lives at the L1 ``_types`` layer so C10 can re-export it without
+crossing the components.* boundary (architecture rule AC-6).
+"""
+
+from gps_denied_onboard._types.inference import EngineCacheEntry
+from gps_denied_onboard._types.manifests import Manifest
 from gps_denied_onboard.components.c10_provisioning.interface import CacheProvisioner

 __all__ = ["CacheProvisioner", "EngineCacheEntry", "Manifest"]
@@ -34,7 +34,7 @@ class C4PoseConfig:
    * ``ransac_reprojection_threshold_px`` — RANSAC inlier-distance
      threshold. Default 4.0 pixels per the contract.
    * ``thermal_throttle_threshold_celsius`` — informational only;
-      the actual ``ThermalState.throttle`` decision is owned by C7
+      the actual ``ThermalState.thermal_throttle_active`` decision is owned by C7
      (AZ-302). Default 75.0 °C.
    """

@@ -51,7 +51,7 @@ class PoseEstimator(Protocol):
    ) -> PoseEstimate:
        """Run PnP → factor add → covariance recovery.

-        Per-frame thermal decision: ``thermal_state.throttle == True``
+        Per-frame thermal decision: ``thermal_state.thermal_throttle_active == True``
        engages the Jacobian path (cheap, ~5-10 % accuracy loss);
        ``False`` engages the Marginals path (production default).

@@ -1,6 +1,68 @@
-"""C7 Inference Runtime component — Public API."""
+"""C7 Inference Runtime — Public API (AZ-297).

-from gps_denied_onboard._types.manifests import EngineCacheEntry
+Per ``inference_runtime_protocol.md`` v1.0.0 the public surface
+consists of:
+
+- :class:`InferenceRuntime` Protocol (6 methods).
+- DTOs: :class:`BuildConfig`, :class:`EngineCacheEntry`,
+  :class:`EngineHandle` (opaque marker), :class:`OptimizationProfile`,
+  :class:`PrecisionMode` enum.
+- :class:`ThermalState` re-exported from
+  :mod:`gps_denied_onboard._types.thermal` (its canonical home for
+  C4 forward-compatibility).
+- Error family rooted at :class:`RuntimeError`; nine documented subtypes.
+- Config block :class:`C7InferenceConfig` (registered on import).
+
+Concrete strategies (``TensorrtRuntime``, ``OnnxTrtEpRuntime``,
+``PytorchFp16Runtime``) live in sibling modules and are imported
+lazily by :mod:`gps_denied_onboard.runtime_root.inference_factory` —
+Risk-2 mitigation: this ``__init__.py`` MUST NOT import any concrete
+strategy module.
+"""
+
+from gps_denied_onboard._types.inference import (
+    BuildConfig,
+    EngineCacheEntry,
+    EngineHandle,
+    OptimizationProfile,
+    PrecisionMode,
+)
+from gps_denied_onboard._types.thermal import ThermalState
+from gps_denied_onboard.components.c7_inference.config import C7InferenceConfig
+from gps_denied_onboard.components.c7_inference.errors import (
+    CalibrationCacheError,
+    EngineBuildError,
+    EngineDeserializeError,
+    EngineHashMismatchError,
+    EngineSchemaMismatchError,
+    EngineSidecarMissingError,
+    InferenceError,
+    OutOfMemoryError,
+    RuntimeError,
+    TelemetryUnavailableError,
+)
 from gps_denied_onboard.components.c7_inference.interface import InferenceRuntime
+from gps_denied_onboard.config.schema import register_component_block

-__all__ = ["EngineCacheEntry", "InferenceRuntime"]
+register_component_block("c7_inference", C7InferenceConfig)
+
+__all__ = [
+    "BuildConfig",
+    "C7InferenceConfig",
+    "CalibrationCacheError",
+    "EngineBuildError",
+    "EngineCacheEntry",
+    "EngineDeserializeError",
+    "EngineHandle",
+    "EngineHashMismatchError",
+    "EngineSchemaMismatchError",
+    "EngineSidecarMissingError",
+    "InferenceError",
+    "InferenceRuntime",
+    "OptimizationProfile",
+    "OutOfMemoryError",
+    "PrecisionMode",
+    "RuntimeError",
+    "TelemetryUnavailableError",
+    "ThermalState",
+]
@@ -0,0 +1,63 @@
+"""C7 inference runtime config block (AZ-297).
+
+Registered into ``config.components['c7_inference']`` by the package
+``__init__.py``. The composition-root factory
+:func:`gps_denied_onboard.runtime_root.inference_factory.build_inference_runtime`
+reads this block to select the strategy and locate the engine cache.
+"""
+
+from __future__ import annotations
+
+from dataclasses import dataclass
+from typing import Final
+
+from gps_denied_onboard.config.schema import ConfigError
+
+__all__ = [
+    "C7InferenceConfig",
+    "KNOWN_RUNTIMES",
+]
+
+KNOWN_RUNTIMES: Final[frozenset[str]] = frozenset(
+    {"tensorrt", "onnx_trt_ep", "pytorch_fp16"}
+)
+
+
+@dataclass(frozen=True)
+class C7InferenceConfig:
+    """Per-component config for C7 inference.
+
+    ``runtime`` selects exactly one of the three strategies
+    (``tensorrt`` / ``onnx_trt_ep`` / ``pytorch_fp16``); the
+    composition-root factory respects compile-time ``BUILD_*`` gating
+    on top of this label (a runtime whose flag is OFF raises
+    :class:`RuntimeNotAvailableError` at composition time).
+
+    ``thermal_poll_hz`` is the cadence at which AZ-302's
+    ``ThermalStatePublisher`` polls ``tegrastats`` / ``jtop``;
+    default 1 Hz keeps the SDK telemetry source cool.
+
+    ``engine_cache_dir`` is the filesystem root where compiled
+    ``.engine`` binaries + ``.sha256`` sidecars live; the C10
+    pre-flight ``CacheProvisioner`` writes here.
+    """
+
+    runtime: str = "pytorch_fp16"
+    thermal_poll_hz: float = 1.0
+    engine_cache_dir: str = "/var/lib/gps-denied/engines"
+
+    def __post_init__(self) -> None:
+        if self.runtime not in KNOWN_RUNTIMES:
+            raise ConfigError(
+                f"C7InferenceConfig.runtime={self.runtime!r} not in "
+                f"{sorted(KNOWN_RUNTIMES)}"
+            )
+        if self.thermal_poll_hz <= 0:
+            raise ConfigError(
+                f"C7InferenceConfig.thermal_poll_hz must be > 0; "
+                f"got {self.thermal_poll_hz}"
+            )
+        if not self.engine_cache_dir:
+            raise ConfigError(
+                "C7InferenceConfig.engine_cache_dir must be non-empty"
+            )
@@ -0,0 +1,94 @@
+"""C7 inference runtime error taxonomy (AZ-297).
+
+All Protocol methods raise only members of :class:`RuntimeError` (the
+C7-local family — NOT stdlib :class:`builtins.RuntimeError`).
+Consumers catch the family with a single
+``except c7_inference.errors.RuntimeError``; implementations rewrap
+third-party library exceptions (TRT C++ exceptions, ORT internal
+errors, PyTorch CUDA errors) into one of these types.
+
+Two composition-time errors live OUTSIDE the family:
+- :class:`gps_denied_onboard.runtime_root.errors.RuntimeNotAvailableError`
+  is raised by the factory when the requested ``BUILD_*`` is OFF.
+- :class:`gps_denied_onboard.config.schema.ConfigError` is raised at
+  config-load time for unknown ``runtime`` labels.
+"""
+
+from __future__ import annotations
+
+__all__ = [
+    "CalibrationCacheError",
+    "EngineBuildError",
+    "EngineDeserializeError",
+    "EngineHashMismatchError",
+    "EngineSchemaMismatchError",
+    "EngineSidecarMissingError",
+    "InferenceError",
+    "OutOfMemoryError",
+    "RuntimeError",
+    "TelemetryUnavailableError",
+]
+
+
+class RuntimeError(Exception):  # noqa: A001  - the contract names this RuntimeError
+    """Base class for the C7 inference runtime error family.
+
+    Named ``RuntimeError`` per the contract (``inference_runtime_protocol.md``
+    Shape § Error hierarchy); this shadows the stdlib name only inside
+    ``c7_inference.errors``. Consumers MUST import from this module by
+    fully-qualified name to avoid collision.
+    """
+
+
+class EngineBuildError(RuntimeError):
+    """``compile_engine`` failed before producing a usable engine binary."""
+
+
+class EngineDeserializeError(RuntimeError):
+    """``deserialize_engine`` could not load the engine binary into memory."""
+
+
+class EngineHashMismatchError(RuntimeError):
+    """The sha256 of the engine file does not match
+    :attr:`EngineCacheEntry.sha256_hex` (D-C10-3 / cache poisoning gate)."""
+
+
+class EngineSchemaMismatchError(RuntimeError):
+    """The engine's IO schema (input / output tensor names + shapes) does
+    not match the runtime expectation (D-C10-7)."""
+
+
+class EngineSidecarMissingError(RuntimeError):
+    """The ``.sha256`` sidecar file expected next to the engine binary is
+    absent or unreadable."""
+
+
+class CalibrationCacheError(RuntimeError):
+    """``compile_engine`` with ``precision=INT8`` could not load or write
+    the calibration cache."""
+
+
+class InferenceError(RuntimeError):
+    """``infer`` failed during GPU execution.
+
+    The impl MUST rewrap CUDA / TRT C++ / ORT exceptions into this
+    type; raw exceptions MUST NOT escape the Protocol.
+    """
+
+
+class OutOfMemoryError(RuntimeError):  # noqa: A001  - the contract names this OutOfMemoryError
+    """GPU OOM during ``deserialize_engine`` or ``infer``.
+
+    Named ``OutOfMemoryError`` per the contract; this shadows the
+    stdlib name only inside ``c7_inference.errors``.
+    """
+
+
+class TelemetryUnavailableError(RuntimeError):
+    """``thermal_state`` cold-start could not bind to the
+    ``tegrastats`` / ``jtop`` source.
+
+    Steady-state telemetry absence is signalled by
+    ``ThermalState.is_telemetry_available == False`` (NOT this
+    exception); this error is the cold-start fail-fast.
+    """
@@ -1,18 +1,106 @@
-"""C7 `InferenceRuntime` Protocol.
+"""C7 ``InferenceRuntime`` Protocol (AZ-297).

-Concrete impls: `TensorrtRuntime` (production-default; TensorRT 10.3),
-`OnnxTrtEpRuntime` (ONNX Runtime + TensorRT EP), `PytorchFp16Runtime` (research
-baseline). See `_docs/02_document/components/09_c7_inference/`.
+PEP 544 ``typing.Protocol`` with ``runtime_checkable=True``; six
+methods that span engine lifecycle (compile / deserialize / release),
+per-call inference, thermal-state telemetry, and runtime-label
+self-report. Concrete impls — :class:`TensorrtRuntime` (AZ-298),
+:class:`OnnxTrtEpRuntime` (AZ-299), :class:`PytorchFp16Runtime`
+(AZ-300) — live in sibling modules and are imported lazily by
+:mod:`gps_denied_onboard.runtime_root.inference_factory`.
+
+The contract at
+``_docs/02_document/contracts/c7_inference/inference_runtime_protocol.md``
+v1.0.0 is the authoritative shape; this module mirrors it 1:1.
 """

 from __future__ import annotations

-from typing import Any, Protocol
+from pathlib import Path
+from typing import TYPE_CHECKING, Literal, Protocol, runtime_checkable
+
+from gps_denied_onboard._types.inference import (
+    BuildConfig,
+    EngineCacheEntry,
+    EngineHandle,
+)
+from gps_denied_onboard._types.thermal import ThermalState
+
+if TYPE_CHECKING:
+    import numpy as np
+
+__all__ = ["InferenceRuntime"]


+@runtime_checkable
 class InferenceRuntime(Protocol):
-    """Compiled-engine inference runtime."""
+    """On-Jetson inference runtime. See ``inference_runtime_protocol.md`` v1.0.0.

-    def infer(self, inputs: Any) -> Any: ...
+    Implementations: :class:`TensorrtRuntime` (production-default, TRT 10.3),
+    :class:`OnnxTrtEpRuntime` (ONNX Runtime + TensorRT EP fallback),
+    :class:`PytorchFp16Runtime` (research / Tier-0 baseline). Selection
+    is owned by the composition root.
+    """

-    def load(self, engine_path: str) -> None: ...
+    def compile_engine(
+        self, model_path: Path, build_config: BuildConfig
+    ) -> EngineCacheEntry:
+        """Offline (minutes for INT8) engine compilation.
+
+        Raises :class:`EngineBuildError` on compile failure;
+        :class:`CalibrationCacheError` when
+        ``build_config.precision == INT8`` and the calibration data
+        cannot be loaded.
+        """
+        ...
+
+    def deserialize_engine(self, entry: EngineCacheEntry) -> EngineHandle:
+        """Load a compiled engine into GPU memory.
+
+        Raises :class:`EngineDeserializeError` /
+        :class:`EngineHashMismatchError` /
+        :class:`EngineSchemaMismatchError` /
+        :class:`EngineSidecarMissingError` /
+        :class:`OutOfMemoryError` per the contract envelope.
+        """
+        ...
+
+    def infer(
+        self,
+        handle: EngineHandle,
+        inputs: "dict[str, np.ndarray]",
+    ) -> "dict[str, np.ndarray]":
+        """Synchronous request/response inference (Invariant I-8).
+
+        Returns AFTER the GPU stream has synchronised; output tensors
+        are host-resident :class:`numpy.ndarray` ready for consumer
+        use. Raises :class:`InferenceError` /
+        :class:`OutOfMemoryError`.
+        """
+        ...
+
+    def release_engine(self, handle: EngineHandle) -> None:
+        """Drop the engine + free GPU memory.
+
+        Idempotent (Invariant I-7) — second-and-later calls on the
+        same handle return silently.
+        """
+        ...
+
+    def thermal_state(self) -> ThermalState:
+        """Most-recent thermal reading from ``tegrastats`` / ``jtop``.
+
+        Raises :class:`TelemetryUnavailableError` only on cold-start
+        binding failure; steady-state telemetry absence is signalled
+        by ``ThermalState.is_telemetry_available == False`` (Invariant I-6).
+        """
+        ...
+
+    def current_runtime_label(
+        self,
+    ) -> Literal["tensorrt", "onnx_trt_ep", "pytorch_fp16"]:
+        """Identify which concrete strategy is wired here.
+
+        Returned string equals ``config.inference.runtime`` exactly
+        (Invariant I-1); AC-NEW-3 FDR audit relies on this property.
+        """
+        ...
@@ -0,0 +1,118 @@
+"""C7 inference-runtime composition-root factory (AZ-297).
+
+:func:`build_inference_runtime` selects exactly one strategy by
+``config.components['c7_inference'].runtime`` and respects compile-time
+``BUILD_*`` gating: requesting a strategy whose flag is OFF raises
+:class:`RuntimeNotAvailableError` at composition time (NOT at first
+inference call).
+
+The concrete strategy modules (``tensorrt_runtime``, ``onnx_trt_ep_runtime``,
+``pytorch_fp16_runtime``) are imported lazily — a Tier-0 workstation
+build with ``BUILD_TENSORRT_RUNTIME=OFF`` MUST NOT load
+``c7_inference.tensorrt_runtime`` (Invariant I-5; verifiable via
+``sys.modules``).
+"""
+
+from __future__ import annotations
+
+import os
+from typing import TYPE_CHECKING
+
+from gps_denied_onboard.runtime_root.errors import RuntimeNotAvailableError
+
+if TYPE_CHECKING:
+    from gps_denied_onboard.components.c7_inference import (
+        C7InferenceConfig,
+        InferenceRuntime,
+    )
+    from gps_denied_onboard.config.schema import Config
+
+__all__ = ["build_inference_runtime"]
+
+
+_RUNTIME_TO_BUILD_FLAG: dict[str, str] = {
+    "tensorrt": "BUILD_TENSORRT_RUNTIME",
+    "onnx_trt_ep": "BUILD_ONNX_TRT_EP_RUNTIME",
+    "pytorch_fp16": "BUILD_PYTORCH_FP16_RUNTIME",
+}
+
+_RUNTIME_TO_MODULE: dict[str, tuple[str, str]] = {
+    "tensorrt": (
+        "gps_denied_onboard.components.c7_inference.tensorrt_runtime",
+        "TensorrtRuntime",
+    ),
+    "onnx_trt_ep": (
+        "gps_denied_onboard.components.c7_inference.onnx_trt_ep_runtime",
+        "OnnxTrtEpRuntime",
+    ),
+    "pytorch_fp16": (
+        "gps_denied_onboard.components.c7_inference.pytorch_fp16_runtime",
+        "PytorchFp16Runtime",
+    ),
+}
+
+
+def _is_build_flag_on(flag_name: str) -> bool:
+    """Read a compile-time ``BUILD_*`` flag from the environment.
+
+    ``ON`` / ``1`` / ``true`` / ``yes`` (case-insensitive) → ``True``;
+    anything else (including unset) → ``False``. Defaults to OFF so
+    test environments must opt-in explicitly per strategy.
+    """
+    raw = os.environ.get(flag_name, "")
+    return raw.strip().lower() in {"on", "1", "true", "yes"}
+
+
+def _c7_config(config: "Config") -> "C7InferenceConfig":
+    """Pull the registered C7 config block.
+
+    ``c7_inference.__init__`` registers it on import; a missing
+    registration is a developer error and surfaces as ``KeyError``
+    rather than a silent fallback.
+    """
+    return config.components["c7_inference"]
+
+
+def build_inference_runtime(config: "Config") -> "InferenceRuntime":
+    """Construct the :class:`InferenceRuntime` impl selected by config.
+
+    The factory:
+
+    1. Reads ``config.components['c7_inference'].runtime``.
+    2. Checks the matching ``BUILD_*`` flag — if OFF, raises
+       :class:`RuntimeNotAvailableError` BEFORE any import.
+    3. Lazily imports the concrete strategy module.
+    4. Constructs and returns the strategy instance, passing ``config``.
+
+    Raises :class:`RuntimeNotAvailableError` when:
+
+    - The compile-time flag is OFF (the canonical Tier-0 path).
+    - The concrete strategy module has not been built yet (AZ-298 /
+      AZ-299 / AZ-300 are still pending) — the import fails and the
+      factory wraps :class:`ModuleNotFoundError`.
+    """
+    block = _c7_config(config)
+    runtime = block.runtime
+    flag_name = _RUNTIME_TO_BUILD_FLAG.get(runtime)
+    module_info = _RUNTIME_TO_MODULE.get(runtime)
+    if flag_name is None or module_info is None:
+        raise RuntimeNotAvailableError(
+            f"InferenceRuntime runtime {runtime!r} is not buildable in "
+            "this binary."
+        )
+    if not _is_build_flag_on(flag_name):
+        raise RuntimeNotAvailableError(
+            f"InferenceRuntime runtime {runtime!r} requires "
+            f"{flag_name}=ON in this binary; the flag is OFF."
+        )
+    module_name, class_name = module_info
+    try:
+        module = __import__(module_name, fromlist=[class_name])
+    except ModuleNotFoundError as exc:
+        raise RuntimeNotAvailableError(
+            f"InferenceRuntime runtime {runtime!r} is configured but its "
+            f"concrete impl module {module_name!r} has not been built into "
+            "this binary yet (AZ-298 / AZ-299 / AZ-300 pending)."
+        ) from exc
+    strategy_cls = getattr(module, class_name)
+    return strategy_cls(config)