# C7 InferenceRuntime Protocol + Composition-Root Selection **Task**: AZ-297_c7_runtime_protocol **Name**: C7 InferenceRuntime Protocol **Description**: Define the `InferenceRuntime` Protocol, its DTOs (`BuildConfig`, `EngineCacheEntry`, `EngineHandle`, `ThermalState`), the runtime error taxonomy, and the composition-root selection switch that wires exactly one of `TensorrtRuntime` / `OnnxTrtEpRuntime` / `PytorchFp16Runtime` at startup based on ADR-001 (config) and ADR-002 (`BUILD_*` flags). This is the foundational shared-API task for E-C7 — every other E-C7 task implements this Protocol, and five external components (C2, C2.5, C3, C3.5, C10) plus C4 (ThermalState consumer) depend on the contract this task freezes. **Complexity**: 3 points **Dependencies**: AZ-263_initial_structure, AZ-269_config_loader, AZ-266_log_module, AZ-280_sha256_sidecar, AZ-281_engine_filename_schema **Component**: c7_inference (epic AZ-249 / E-C7) **Tracker**: AZ-297 **Epic**: AZ-249 (E-C7) ### Document Dependencies - `_docs/02_document/contracts/shared_helpers/sha256_sidecar.md` — `EngineCacheEntry` carries the sha256 of the engine binary; this contract defines that representation. - `_docs/02_document/contracts/shared_helpers/engine_filename_schema.md` — `EngineCacheEntry` carries the parsed `(SM, JP, TRT, precision)` tuple from the filename schema. - `_docs/02_document/contracts/shared_config/composition_root_protocol.md` — runtime selection is a Config field; this contract defines the field and the runtime-label vocabulary. - `_docs/02_document/contracts/shared_logging/log_record_schema.md` — error events emitted by Protocol implementations use this log shape. ## Problem Five different components (C2 VPR backbone, C2.5 ReRanker, C3 CrossDomainMatcher, C3.5 AdHoP, C10 CacheProvisioner) and one consumer of the thermal-throttle telemetry feed (C4 Pose) all need a single, frozen interface to the on-Jetson inference runtime. Without it: - Each consumer would import a concrete TRT / ONNX-RT / PyTorch class directly, hard-coding the runtime choice and breaking ADR-001's runtime selectability. - `BUILD_TENSORRT_RUNTIME=OFF` (Tier-0 workstation) would not compile because consumers depend on TRT-specific symbols. - The composition root would have to know per-component which runtime is acceptable; today only ADR-001 (config) + ADR-002 (`BUILD_*` flags) decide. - Error handling would diverge per runtime; `EngineHashMismatchError` (D-C10-3) and `EngineSchemaMismatchError` (D-C10-7) would have different shapes per implementation, making the F2 takeoff abort path fragile. - The C4 hybrid covariance decision (D-CROSS-LATENCY-1) would have no canonical `ThermalState` shape to read. This task delivers the typed boundary every consumer reads against and every implementation conforms to. It writes no runtime logic — the concrete TRT / ONNX-RT / PyTorch strategies are AZ-298 / AZ-299 / AZ-300. ## Outcome - A `InferenceRuntime` Protocol (PEP 544 `typing.Protocol`) is exported from `src/gps_denied_onboard/components/c7_inference/interface.py` and re-exported from the component's `__init__.py`. - The DTOs `BuildConfig`, `EngineCacheEntry`, `EngineHandle`, `ThermalState` are dataclasses (frozen) at the same import path; field shape and invariants match the contract file. - The runtime error taxonomy is a single hierarchy under `c7_inference.errors`: `RuntimeError` ← {`EngineBuildError`, `EngineDeserializeError`, `EngineHashMismatchError`, `EngineSchemaMismatchError`, `EngineSidecarMissingError`, `CalibrationCacheError`, `InferenceError`, `OutOfMemoryError`, `TelemetryUnavailableError`}. Every implementation raises only these; consumers catch only these. - The composition root has a `build_inference_runtime(config: Config) -> InferenceRuntime` factory function that selects the strategy by `config.inference.runtime` (`tensorrt` | `onnx_trt_ep` | `pytorch_fp16`) and respects compile-time `BUILD_*` gating: requesting a strategy whose `BUILD_*` flag is OFF raises `RuntimeNotAvailableError` at composition time (NOT at first inference). - Every implementation's `current_runtime_label()` returns the lowercase label matching the config value (`"tensorrt"`, `"onnx_trt_ep"`, `"pytorch_fp16"`); this is the FDR-stamped label for AC-NEW-3 audit. - A frozen contract file at `_docs/02_document/contracts/c7_inference/inference_runtime_protocol.md` carries the full shape; consumers read that file, not this task spec. ## Scope ### Included - `InferenceRuntime` Protocol with the six methods from `_docs/02_document/components/09_c7_inference/description.md` § 2: `compile_engine`, `deserialize_engine`, `infer`, `release_engine`, `thermal_state`, `current_runtime_label`. - DTO dataclasses for `BuildConfig`, `EngineCacheEntry`, `EngineHandle` (opaque marker class), `ThermalState`. All frozen except `EngineHandle` (which is opaque to consumers — implementations subclass). - Error hierarchy under `c7_inference.errors`; every error type the Protocol promises; all are derived from a common `c7_inference.errors.RuntimeError` so consumers can catch the family. - `build_inference_runtime(config) -> InferenceRuntime` composition-root factory in `src/gps_denied_onboard/runtime_root/inference_factory.py`. Imports the concrete strategy lazily — guarded by `if BUILD_TENSORRT_RUNTIME: from c7_inference.tensorrt_runtime import TensorrtRuntime` so an OFF flag does not force an import. - A `RuntimeNotAvailableError` raised by the factory when the requested strategy is not built into this binary. - A `ConfigSchemaError` extension to AZ-269's config loader for the new `config.inference.runtime` enum + the optional `config.inference.thermal_poll_hz` (default 1.0) + `config.inference.engine_cache_dir` fields. - The contract file at `_docs/02_document/contracts/c7_inference/inference_runtime_protocol.md` filled per `decompose/templates/api-contract.md` with Shape, Invariants, Non-Goals, Versioning Rules, and at least three Test Cases. - Type-only unit tests that verify each concrete strategy module's class actually conforms to the Protocol via `runtime_checkable` + `isinstance` (catches drift at CI time, not deployment). ### Excluded - `TensorrtRuntime` implementation — AZ-298. - `OnnxTrtEpRuntime` implementation — AZ-299. - `PytorchFp16Runtime` implementation — AZ-300. - `EngineGate` validator — AZ-301 (this task defines the error types it raises, not the validator). - Background thermal-state polling loop — AZ-302 (this task defines the `ThermalState` DTO and the `thermal_state()` Protocol method, not the polling thread). - C4 hybrid covariance-mode consumer wiring — owned by E-C4. - C10 CacheProvisioner consumer wiring of `compile_engine` — owned by E-C10. ## Acceptance Criteria **AC-1: Protocol is conformance-checkable** Given a class that implements all six Protocol methods with matching signatures When `isinstance(impl, InferenceRuntime)` is evaluated under `runtime_checkable` Then the result is `True`; for a class that omits any method, the result is `False` **AC-2: Frozen DTOs reject mutation** Given a constructed `BuildConfig(precision=Fp16, ...)`, `EngineCacheEntry(...)`, or `ThermalState(...)` instance When the test attempts `instance.precision = Int8` (or any field reassignment) Then `dataclasses.FrozenInstanceError` is raised; the original value is preserved **AC-3: Error hierarchy catchable as a single family** Given any of the nine documented error subtypes When the consumer wraps an implementation call in `try: ... except c7_inference.errors.RuntimeError` Then every documented subtype is caught; an unrelated `Exception` is NOT caught (the Protocol's error envelope does not leak into general exception handling) **AC-4: Composition-root factory honours config** Given `config.inference.runtime = "tensorrt"` and `BUILD_TENSORRT_RUNTIME=ON` When `build_inference_runtime(config)` is called Then a `TensorrtRuntime` instance is returned and `instance.current_runtime_label() == "tensorrt"` **AC-5: Composition-root factory honours BUILD flag gate** Given `config.inference.runtime = "tensorrt"` and `BUILD_TENSORRT_RUNTIME=OFF` When `build_inference_runtime(config)` is called Then `RuntimeNotAvailableError` is raised at composition time with a message naming `"tensorrt"`; no module-level import of TRT symbols has occurred (verifiable via `sys.modules`) **AC-6: Unknown runtime label rejected at config load** Given `config.inference.runtime = "tensorflow_lite"` (not in the enum) When the config is loaded via AZ-269's loader Then `ConfigSchemaError` is raised at load time with a message listing the valid values; `build_inference_runtime` is never reached **AC-7: `current_runtime_label()` matches config value exactly** Given any selectable runtime When `instance.current_runtime_label()` is called Then the returned string is one of `"tensorrt"`, `"onnx_trt_ep"`, `"pytorch_fp16"` and equals `config.inference.runtime`; AC-NEW-3 audit relies on this exact-match property **AC-8: Contract file matches Protocol shape** Given the contract file at `_docs/02_document/contracts/c7_inference/inference_runtime_protocol.md` When a contract-test parses the Shape section's method/field tables and compares against the runtime Protocol via introspection Then every method, every field, every error type is present and consistent in both ## Non-Functional Requirements **Compatibility** - The Protocol is `typing.Protocol` (PEP 544 structural typing) so existing components that import the concrete TRT class today (none yet — this is greenfield) can be retrofitted without inheritance changes. - All error types subclass `Exception` (not `BaseException`) so `except Exception:` in upstream layers continues to work as expected. **Performance** - The factory `build_inference_runtime` returns within 200 ms (it imports + constructs one strategy; the heavy GPU work happens inside the strategy's own `compile_engine` / `deserialize_engine` calls — not the factory). - DTO construction (`BuildConfig`, `EngineCacheEntry`, `ThermalState`) is dataclass-frozen; per-instance overhead is the bare-cost dataclass `__init__`. **Reliability** - The Protocol is the boundary of acceptable runtime errors. Implementations MUST NOT raise other types into consumers; if a third-party library (TRT, ONNX-RT, PyTorch) raises something else, the implementation catches and rewraps into the documented family. - Versioning: any breaking change to the Protocol or its DTOs MUST bump the contract file's `Version` and notify every consumer task listed in the contract header. ## Unit Tests | AC Ref | What to Test | Required Outcome | |--------|-------------|-----------------| | AC-1 | `runtime_checkable` Protocol vs. a fully-implementing fake; vs. a fake missing one method | `isinstance` returns True for full, False for partial | | AC-2 | Mutation attempt on each frozen DTO | `FrozenInstanceError` raised; original value preserved | | AC-3 | Raise each of the nine error subtypes; catch as `c7_inference.errors.RuntimeError` | All caught; an unrelated `ValueError` is NOT caught by the same handler | | AC-4 | `build_inference_runtime` with `tensorrt` + flag ON → fake `TensorrtRuntime` | Returned instance is `TensorrtRuntime`; `current_runtime_label()` == `"tensorrt"` | | AC-5 | `build_inference_runtime` with `tensorrt` + flag OFF | `RuntimeNotAvailableError`; `sys.modules` does NOT contain `c7_inference.tensorrt_runtime` | | AC-6 | Config load with invalid `runtime` value | `ConfigSchemaError`; valid values listed in message | | AC-7 | `current_runtime_label()` for each strategy | Matches the config value used to construct it | | AC-8 | Contract introspection vs. Protocol introspection | Shape parity test passes | | NFR-perf-factory | Microbench `build_inference_runtime` × 1000 | p99 ≤ 200 ms (dominated by lazy import on first call; subsequent calls << 1 ms) | | NFR-reliability-error-family | All nine subtypes inherit from `c7_inference.errors.RuntimeError` | Verified via `issubclass` for each | ## Constraints - The Protocol uses `typing.Protocol` from stdlib; no third-party Protocol library is introduced. - DTO dataclasses use stdlib `dataclasses` with `frozen=True`; no `pydantic` or `attrs` dependency. - `EngineHandle` is an opaque marker class — consumers MUST NOT introspect its fields. Each strategy subclasses with implementation-specific state. The Protocol exposes `EngineHandle` as the type but consumers treat it as a token to pass back to the same strategy. - Lazy import of concrete strategies is mandatory. The factory's `if BUILD_TENSORRT_RUNTIME: from c7_inference.tensorrt_runtime import TensorrtRuntime` block is not optional — it is the mechanism by which Tier-0 workstation builds compile without TRT installed. - The contract file at `_docs/02_document/contracts/c7_inference/inference_runtime_protocol.md` is the source of truth. If the Protocol shape changes here without the contract updating, that is a Spec-Gap finding (High) per code-review skill Phase 2. - This task does NOT add new third-party dependencies — `typing.Protocol`, `dataclasses`, `enum` are stdlib. ## Risks & Mitigation **Risk 1: Protocol drift between contract and code** - *Risk*: Implementations diverge from the contract over time; consumers cannot tell which is canonical. - *Mitigation*: AC-8 contract-introspection test runs in CI; any drift fails the test before merge. The contract file's `## Test Cases` section names this exact test. **Risk 2: Lazy-import gating is bypassed by a transitively-imported module** - *Risk*: A consumer imports `c7_inference` (the package) and the package's `__init__.py` eagerly imports a concrete strategy, triggering the TRT import even when `BUILD_TENSORRT_RUNTIME=OFF`. - *Mitigation*: The package `__init__.py` re-exports ONLY the Protocol and DTOs and errors — it does NOT import any concrete strategy. AC-5 verifies via `sys.modules` that no strategy module is loaded during a Tier-0 factory call. **Risk 3: Error hierarchy widens silently** - *Risk*: A future strategy adds a tenth error type without updating the contract or the family base class. - *Mitigation*: The contract file lists the canonical nine. Implementations MUST raise only members of `c7_inference.errors.RuntimeError`; a strategy raising a non-family error is a Spec-Gap finding (High) at code-review time. AC-3's catch-as-family test catches the obvious case. ## Runtime Completeness - **Named capability**: typed Protocol + DTOs + error envelope + composition-root selection (architecture / E-C7 / ADR-001 + ADR-002 + ADR-009). - **Production code that must exist**: real Protocol declaration, real frozen DTOs, real error hierarchy, real composition-root factory with lazy-import gating, real config-loader extension for the runtime enum. - **Allowed external stubs**: tests MAY substitute fake strategy classes that conform to the Protocol; production wiring uses the real strategies from AZ-298 / AZ-299 / AZ-300. - **Unacceptable substitutes**: ABCs instead of `typing.Protocol` (would force inheritance changes downstream), `pydantic.BaseModel` instead of `@dataclass(frozen=True)` (would add a runtime validation layer this task does not need), eager imports of concrete strategies in `__init__.py` (would defeat `BUILD_*` gating), or a `runtime: str` config field without an enum (would lose the load-time validation in AC-6). ## Contract This task produces/implements the contract at `_docs/02_document/contracts/c7_inference/inference_runtime_protocol.md`. Consumers MUST read that file — not this task spec — to discover the interface.