[AZ-297] C7 InferenceRuntime: Protocol + DTOs + factory

Freezes the c7_inference Public API per
_docs/02_document/contracts/c7_inference/inference_runtime_protocol.md
v1.0.0:

- InferenceRuntime Protocol (6 methods: compile_engine,
  deserialize_engine, infer, release_engine, thermal_state,
  current_runtime_label) in components/c7_inference/interface.py.
- DTOs (PrecisionMode enum, OptimizationProfile, BuildConfig,
  EngineCacheEntry, EngineHandle opaque marker) in _types/inference.py
  — placed at the L1 types layer so C10 can re-export EngineCacheEntry
  without crossing the components.* boundary (AZ-270 AC-6).
- ThermalState DTO expanded in _types/thermal.py from the AZ-355
  forward-declared stub to the AZ-297 contract shape (cpu/gpu temp,
  thermal_throttle_active, measured_clock_mhz, measured_at_ns,
  is_telemetry_available). Invariant I-6: when telemetry is
  unavailable, throttle is False.
- Error family rooted at c7_inference.errors.RuntimeError (9 subtypes:
  EngineBuildError, EngineDeserializeError, EngineHashMismatchError,
  EngineSchemaMismatchError, EngineSidecarMissingError,
  CalibrationCacheError, InferenceError, OutOfMemoryError,
  TelemetryUnavailableError). RuntimeNotAvailableError stays in
  runtime_root/errors.py — composition-time, outside the family.
- C7InferenceConfig per-component config block (runtime label,
  thermal_poll_hz, engine_cache_dir) with constructor-time validation
  rejecting unknown runtime labels.
- Composition-root factory build_inference_runtime in
  runtime_root/inference_factory.py with three BUILD_* gates
  (BUILD_TENSORRT_RUNTIME, BUILD_ONNX_TRT_EP_RUNTIME,
  BUILD_PYTORCH_FP16_RUNTIME). Concrete strategy modules are imported
  lazily via __import__ AFTER the flag check, so a Tier-0 build with
  the flag OFF MUST NOT load the strategy module (AC-5 / I-5;
  verifiable via sys.modules).
- 37 conformance tests cover all 8 ACs + NFR-perf-factory
  (p99 build under 200 ms × 1000 calls) + NFR-reliability-error-family.
  AC-8 introspects the contract file's Shape table and asserts method
  parity against the runtime Protocol; also asserts all 9 error
  subtypes are documented.

Retired the AZ-263 scaffolding EngineCacheEntry from _types/manifests.py
(replaced by the AZ-297 canonical shape in _types/inference.py); updated
the LightGlue-flavoured EngineHandle Protocol docstring in
_types/manifests.py to rationalize its intentional dual existence
with the C7 opaque EngineHandle (same name, different consumer-side
cut, mirroring the C4/C5 ISam2GraphHandle pattern).

Stale ThermalState.throttle docstring references in c4_pose/config.py,
c4_pose/interface.py, and _types/pose.py updated to
thermal_throttle_active.

Full unit-test sweep: 843 passed, 2 pre-existing environment skips
(cmake, actionlint).

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-12 04:30:14 +03:00
parent f925af9de3
commit daff5d4d1c
15 changed files with 1089 additions and 60 deletions
@@ -1,167 +0,0 @@
# C7 InferenceRuntime Protocol + Composition-Root Selection
**Task**: AZ-297_c7_runtime_protocol
**Name**: C7 InferenceRuntime Protocol
**Description**: Define the `InferenceRuntime` Protocol, its DTOs (`BuildConfig`, `EngineCacheEntry`, `EngineHandle`, `ThermalState`), the runtime error taxonomy, and the composition-root selection switch that wires exactly one of `TensorrtRuntime` / `OnnxTrtEpRuntime` / `PytorchFp16Runtime` at startup based on ADR-001 (config) and ADR-002 (`BUILD_*` flags). This is the foundational shared-API task for E-C7 — every other E-C7 task implements this Protocol, and five external components (C2, C2.5, C3, C3.5, C10) plus C4 (ThermalState consumer) depend on the contract this task freezes.
**Complexity**: 3 points
**Dependencies**: AZ-263_initial_structure, AZ-269_config_loader, AZ-266_log_module, AZ-280_sha256_sidecar, AZ-281_engine_filename_schema
**Component**: c7_inference (epic AZ-249 / E-C7)
**Tracker**: AZ-297
**Epic**: AZ-249 (E-C7)
### Document Dependencies
- `_docs/02_document/contracts/shared_helpers/sha256_sidecar.md`: `EngineCacheEntry` carries the sha256 of the engine binary; this contract defines that representation.
- `_docs/02_document/contracts/shared_helpers/engine_filename_schema.md`: `EngineCacheEntry` carries the parsed `(SM, JP, TRT, precision)` tuple from the filename schema.
- `_docs/02_document/contracts/shared_config/composition_root_protocol.md`: runtime selection is a Config field; this contract defines the field and the runtime-label vocabulary.
- `_docs/02_document/contracts/shared_logging/log_record_schema.md`: error events emitted by Protocol implementations use this log shape.
## Problem
Five different components (C2 VPR backbone, C2.5 ReRanker, C3 CrossDomainMatcher, C3.5 AdHoP, C10 CacheProvisioner) and one consumer of the thermal-throttle telemetry feed (C4 Pose) all need a single, frozen interface to the on-Jetson inference runtime. Without it:
- Each consumer would import a concrete TRT / ONNX-RT / PyTorch class directly, hard-coding the runtime choice and breaking ADR-001's runtime selectability.
- `BUILD_TENSORRT_RUNTIME=OFF` (Tier-0 workstation) would not compile because consumers depend on TRT-specific symbols.
- The composition root would have to know per-component which runtime is acceptable; today only ADR-001 (config) + ADR-002 (`BUILD_*` flags) decide.
- Error handling would diverge per runtime; `EngineHashMismatchError` (D-C10-3) and `EngineSchemaMismatchError` (D-C10-7) would have different shapes per implementation, making the F2 takeoff abort path fragile.
- The C4 hybrid covariance decision (D-CROSS-LATENCY-1) would have no canonical `ThermalState` shape to read.
This task delivers the typed boundary every consumer reads against and every implementation conforms to. It writes no runtime logic — the concrete TRT / ONNX-RT / PyTorch strategies are AZ-298 / AZ-299 / AZ-300.
## Outcome
- An `InferenceRuntime` Protocol (PEP 544 `typing.Protocol`) is exported from `src/gps_denied_onboard/components/c7_inference/interface.py` and re-exported from the component's `__init__.py`.
- The DTOs `BuildConfig`, `EngineCacheEntry`, and `ThermalState` are frozen dataclasses at the same import path, and `EngineHandle` is an opaque marker class alongside them; field shape and invariants match the contract file.
- The runtime error taxonomy is a single hierarchy under `c7_inference.errors`: `RuntimeError` ← {`EngineBuildError`, `EngineDeserializeError`, `EngineHashMismatchError`, `EngineSchemaMismatchError`, `EngineSidecarMissingError`, `CalibrationCacheError`, `InferenceError`, `OutOfMemoryError`, `TelemetryUnavailableError`}. Every implementation raises only these; consumers catch only these.
- The composition root has a `build_inference_runtime(config: Config) -> InferenceRuntime` factory function that selects the strategy by `config.inference.runtime` (`tensorrt` | `onnx_trt_ep` | `pytorch_fp16`) and respects compile-time `BUILD_*` gating: requesting a strategy whose `BUILD_*` flag is OFF raises `RuntimeNotAvailableError` at composition time (NOT at first inference).
- Every implementation's `current_runtime_label()` returns the lowercase label matching the config value (`"tensorrt"`, `"onnx_trt_ep"`, `"pytorch_fp16"`); this is the FDR-stamped label for AC-NEW-3 audit.
- A frozen contract file at `_docs/02_document/contracts/c7_inference/inference_runtime_protocol.md` carries the full shape; consumers read that file, not this task spec.
## Scope
### Included
- `InferenceRuntime` Protocol with the six methods from `_docs/02_document/components/09_c7_inference/description.md` § 2: `compile_engine`, `deserialize_engine`, `infer`, `release_engine`, `thermal_state`, `current_runtime_label`.
- DTO dataclasses for `BuildConfig`, `EngineCacheEntry`, `EngineHandle` (opaque marker class), `ThermalState`. All are frozen except `EngineHandle`, which is opaque to consumers and subclassed by implementations.
- Error hierarchy under `c7_inference.errors`; every error type the Protocol promises; all are derived from a common `c7_inference.errors.RuntimeError` so consumers can catch the family.
- `build_inference_runtime(config) -> InferenceRuntime` composition-root factory in `src/gps_denied_onboard/runtime_root/inference_factory.py`. Imports the concrete strategy lazily — guarded by `if BUILD_TENSORRT_RUNTIME: from c7_inference.tensorrt_runtime import TensorrtRuntime` so an OFF flag does not force an import.
- A `RuntimeNotAvailableError` raised by the factory when the requested strategy is not built into this binary.
- An extension to AZ-269's config loader, raising `ConfigSchemaError` on invalid input, for the new `config.inference.runtime` enum plus the optional `config.inference.thermal_poll_hz` (default 1.0) and `config.inference.engine_cache_dir` fields.
- The contract file at `_docs/02_document/contracts/c7_inference/inference_runtime_protocol.md` filled per `decompose/templates/api-contract.md` with Shape, Invariants, Non-Goals, Versioning Rules, and at least three Test Cases.
- Type-only unit tests that verify each concrete strategy module's class actually conforms to the Protocol via `runtime_checkable` + `isinstance` (catches drift at CI time, not deployment).
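
The flag-then-import ordering described in the factory bullet above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's factory: the `BUILD_FLAGS` dict, the `_STRATEGIES` table, `InferenceConfig`, and the `demo_*` module names are hypothetical, and how the real `BUILD_*` flags reach Python is not specified here.

```python
import importlib
import sys
import types
from dataclasses import dataclass


class RuntimeNotAvailableError(Exception):
    """Raised at composition time when the requested strategy is not built in."""


# Hypothetical stand-ins for the BUILD_* flags.
BUILD_FLAGS = {"tensorrt": False, "pytorch_fp16": True}

# label -> (module name, class name); the module names are illustrative only.
_STRATEGIES = {
    "tensorrt": ("demo_tensorrt_runtime", "TensorrtRuntime"),
    "pytorch_fp16": ("demo_pytorch_fp16_runtime", "PytorchFp16Runtime"),
}


@dataclass(frozen=True)
class InferenceConfig:  # hypothetical config shape
    runtime: str


def build_inference_runtime(config: InferenceConfig):
    module_name, class_name = _STRATEGIES[config.runtime]
    # Check the flag BEFORE importing, so an OFF flag never loads the module.
    if not BUILD_FLAGS.get(config.runtime, False):
        raise RuntimeNotAvailableError(
            f"runtime {config.runtime!r} is not built into this binary"
        )
    module = importlib.import_module(module_name)
    return getattr(module, class_name)()


# Register a fake strategy module so the ON path is importable in this demo.
class PytorchFp16Runtime:
    def current_runtime_label(self) -> str:
        return "pytorch_fp16"


_fake = types.ModuleType("demo_pytorch_fp16_runtime")
_fake.PytorchFp16Runtime = PytorchFp16Runtime
sys.modules["demo_pytorch_fp16_runtime"] = _fake

runtime = build_inference_runtime(InferenceConfig(runtime="pytorch_fp16"))
print(runtime.current_runtime_label())  # pytorch_fp16

try:
    build_inference_runtime(InferenceConfig(runtime="tensorrt"))
except RuntimeNotAvailableError as exc:
    print(exc)

# The gated module was never imported.
print("demo_tensorrt_runtime" in sys.modules)  # False
```

Because the flag check precedes `importlib.import_module`, the OFF path fails without touching the strategy module, which is the property AC-5 inspects through `sys.modules`.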
### Excluded
- `TensorrtRuntime` implementation — AZ-298.
- `OnnxTrtEpRuntime` implementation — AZ-299.
- `PytorchFp16Runtime` implementation — AZ-300.
- `EngineGate` validator — AZ-301 (this task defines the error types it raises, not the validator).
- Background thermal-state polling loop — AZ-302 (this task defines the `ThermalState` DTO and the `thermal_state()` Protocol method, not the polling thread).
- C4 hybrid covariance-mode consumer wiring — owned by E-C4.
- C10 CacheProvisioner consumer wiring of `compile_engine` — owned by E-C10.
## Acceptance Criteria
**AC-1: Protocol is conformance-checkable**
Given a class that implements all six Protocol methods with matching signatures
When `isinstance(impl, InferenceRuntime)` is evaluated under `runtime_checkable`
Then the result is `True`; for a class that omits any method, the result is `False`
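
A minimal sketch of the AC-1 check, assuming placeholder signatures: the six method names come from this spec, but the parameter lists and the two fake classes are hypothetical.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class InferenceRuntime(Protocol):
    # Method names are from the spec; parameters are placeholders.
    def compile_engine(self, model_path: Any, build_config: Any) -> Any: ...
    def deserialize_engine(self, entry: Any) -> Any: ...
    def infer(self, handle: Any, inputs: Any) -> Any: ...
    def release_engine(self, handle: Any) -> None: ...
    def thermal_state(self) -> Any: ...
    def current_runtime_label(self) -> str: ...


class FullFake:
    def compile_engine(self, model_path, build_config):
        return object()

    def deserialize_engine(self, entry):
        return object()

    def infer(self, handle, inputs):
        return inputs

    def release_engine(self, handle):
        return None

    def thermal_state(self):
        return None

    def current_runtime_label(self):
        return "fake"


class PartialFake:
    # Deliberately implements only one of the six methods.
    def current_runtime_label(self):
        return "partial"


full_ok = isinstance(FullFake(), InferenceRuntime)
partial_ok = isinstance(PartialFake(), InferenceRuntime)
print(full_ok, partial_ok)  # True False
```

Note that `isinstance` against a `runtime_checkable` Protocol only checks that the members exist; it does not compare signatures, so the "matching signatures" half of AC-1 would need a static type checker or explicit `inspect.signature` comparisons.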
**AC-2: Frozen DTOs reject mutation**
Given a constructed `BuildConfig(precision=Fp16, ...)`, `EngineCacheEntry(...)`, or `ThermalState(...)` instance
When the test attempts `instance.precision = Int8` (or any field reassignment)
Then `dataclasses.FrozenInstanceError` is raised; the original value is preserved
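
The mutation check in AC-2 might look like the sketch below. The `PrecisionMode` member names and the `workspace_mb` field are assumptions made for illustration; the real field set lives in the contract file.

```python
import dataclasses
from dataclasses import dataclass
from enum import Enum


class PrecisionMode(Enum):
    FP16 = "fp16"  # member names are assumed
    INT8 = "int8"


@dataclass(frozen=True)
class BuildConfig:
    precision: PrecisionMode
    workspace_mb: int = 1024  # hypothetical field


cfg = BuildConfig(precision=PrecisionMode.FP16)

try:
    cfg.precision = PrecisionMode.INT8
except dataclasses.FrozenInstanceError:
    mutated = False
else:
    mutated = True

# The assignment was rejected and the original value is intact.
print(mutated, cfg.precision is PrecisionMode.FP16)  # False True
```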
**AC-3: Error hierarchy catchable as a single family**
Given any of the nine documented error subtypes
When the consumer wraps an implementation call in `try: ... except c7_inference.errors.RuntimeError`
Then every documented subtype is caught; an unrelated `Exception` is NOT caught (the Protocol's error envelope does not leak into general exception handling)
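
A small sketch of the family-catch behaviour in AC-3, showing three of the nine subtypes. It assumes the family base derives directly from `Exception` (the spec only says all error types subclass `Exception`), so the builtin `RuntimeError` is not part of the family even though the names collide.

```python
import builtins


class RuntimeError(Exception):
    """Family base; shadows the builtin name inside this module only."""


class EngineHashMismatchError(RuntimeError): ...
class InferenceError(RuntimeError): ...
class OutOfMemoryError(RuntimeError): ...


def classify(exc: Exception) -> str:
    """Raise exc and report which handler caught it."""
    try:
        raise exc
    except RuntimeError:  # the family base defined above
        return "family"
    except Exception:
        return "other"


results = [
    classify(EngineHashMismatchError("sha256 differs")),
    classify(OutOfMemoryError("allocation failed")),
    classify(ValueError("unrelated")),
    classify(builtins.RuntimeError("builtin, not in the family")),
]
print(results)  # ['family', 'family', 'other', 'other']
```

The last case is worth a test of its own: under this assumption, consumers that import the family base by its bare name will no longer catch the builtin `RuntimeError` in that module.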
**AC-4: Composition-root factory honours config**
Given `config.inference.runtime = "tensorrt"` and `BUILD_TENSORRT_RUNTIME=ON`
When `build_inference_runtime(config)` is called
Then a `TensorrtRuntime` instance is returned and `instance.current_runtime_label() == "tensorrt"`
**AC-5: Composition-root factory honours BUILD flag gate**
Given `config.inference.runtime = "tensorrt"` and `BUILD_TENSORRT_RUNTIME=OFF`
When `build_inference_runtime(config)` is called
Then `RuntimeNotAvailableError` is raised at composition time with a message naming `"tensorrt"`; no module-level import of TRT symbols has occurred (verifiable via `sys.modules`)
**AC-6: Unknown runtime label rejected at config load**
Given `config.inference.runtime = "tensorflow_lite"` (not in the enum)
When the config is loaded via AZ-269's loader
Then `ConfigSchemaError` is raised at load time with a message listing the valid values; `build_inference_runtime` is never reached
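
Load-time rejection of an unknown label, as in AC-6, can be sketched with a stdlib enum. The `RuntimeLabel` enum, the `load_inference_config` function, and the `engine_cache_dir` default are hypothetical; AZ-269's actual loader API is not shown in this spec.

```python
from dataclasses import dataclass
from enum import Enum


class ConfigSchemaError(Exception):
    pass


class RuntimeLabel(str, Enum):
    TENSORRT = "tensorrt"
    ONNX_TRT_EP = "onnx_trt_ep"
    PYTORCH_FP16 = "pytorch_fp16"


@dataclass(frozen=True)
class C7InferenceConfig:
    runtime: RuntimeLabel
    thermal_poll_hz: float = 1.0
    engine_cache_dir: str = "engines"  # hypothetical default


def load_inference_config(raw: dict) -> C7InferenceConfig:
    valid = [label.value for label in RuntimeLabel]
    value = raw.get("runtime")
    if value not in valid:
        # List the valid values so the operator can fix the config directly.
        raise ConfigSchemaError(
            f"inference.runtime={value!r} is invalid; "
            f"valid values: {', '.join(valid)}"
        )
    return C7InferenceConfig(
        runtime=RuntimeLabel(value),
        thermal_poll_hz=float(raw.get("thermal_poll_hz", 1.0)),
        engine_cache_dir=str(raw.get("engine_cache_dir", "engines")),
    )


ok = load_inference_config({"runtime": "onnx_trt_ep"})
print(ok.runtime.value, ok.thermal_poll_hz)  # onnx_trt_ep 1.0

try:
    load_inference_config({"runtime": "tensorflow_lite"})
except ConfigSchemaError as exc:
    message = str(exc)
    print(message)
```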
**AC-7: `current_runtime_label()` matches config value exactly**
Given any selectable runtime
When `instance.current_runtime_label()` is called
Then the returned string is one of `"tensorrt"`, `"onnx_trt_ep"`, `"pytorch_fp16"` and equals `config.inference.runtime`; AC-NEW-3 audit relies on this exact-match property
**AC-8: Contract file matches Protocol shape**
Given the contract file at `_docs/02_document/contracts/c7_inference/inference_runtime_protocol.md`
When a contract-test parses the Shape section's method/field tables and compares against the runtime Protocol via introspection
Then every method, every field, every error type is present and consistent in both
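
One way the AC-8 parity check could work is sketched below. The table layout in `CONTRACT_SHAPE` is an assumption (the real contract file's Shape section is not reproduced here), and the Protocol is trimmed to three methods with placeholder signatures to keep the example short.

```python
import re
from typing import Protocol

# Hypothetical excerpt of a Shape table; the real layout may differ.
CONTRACT_SHAPE = """
| Method | Purpose |
|--------|---------|
| `compile_engine` | build an engine |
| `infer` | run inference |
| `thermal_state` | read telemetry |
"""


class InferenceRuntime(Protocol):
    def compile_engine(self) -> None: ...
    def infer(self) -> None: ...
    def thermal_state(self) -> None: ...


def contract_methods(markdown: str) -> set[str]:
    """Collect back-ticked names from the first column of each table row."""
    names = set()
    for line in markdown.splitlines():
        match = re.match(r"\|\s*`(\w+)`", line)
        if match:
            names.add(match.group(1))
    return names


def protocol_methods(proto: type) -> set[str]:
    """Public callables declared directly on the Protocol class."""
    return {
        name
        for name, value in vars(proto).items()
        if callable(value) and not name.startswith("_")
    }


documented = contract_methods(CONTRACT_SHAPE)
declared = protocol_methods(InferenceRuntime)
missing_in_code = documented - declared
missing_in_doc = declared - documented
print(sorted(missing_in_code), sorted(missing_in_doc))  # [] []
```

Reporting the two set differences separately tells the reviewer which side drifted, which is more useful in a CI failure message than a bare inequality.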
## Non-Functional Requirements
**Compatibility**
- The Protocol is `typing.Protocol` (PEP 544 structural typing) so existing components that import the concrete TRT class today (none yet — this is greenfield) can be retrofitted without inheritance changes.
- All error types subclass `Exception` (not `BaseException`) so `except Exception:` in upstream layers continues to work as expected.
**Performance**
- The factory `build_inference_runtime` returns within 200 ms (it imports + constructs one strategy; the heavy GPU work happens inside the strategy's own `compile_engine` / `deserialize_engine` calls — not the factory).
- The DTOs (`BuildConfig`, `EngineCacheEntry`, `ThermalState`) are frozen dataclasses; per-instance construction overhead is limited to the generated dataclass `__init__`.
**Reliability**
- The Protocol is the boundary of acceptable runtime errors. Implementations MUST NOT raise other types into consumers; if a third-party library (TRT, ONNX-RT, PyTorch) raises something else, the implementation catches and rewraps into the documented family.
- Versioning: any breaking change to the Protocol or its DTOs MUST bump the contract file's `Version` and notify every consumer task listed in the contract header.
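
The catch-and-rewrap rule above could be applied with a decorator along these lines. This is a sketch under assumptions: the decorator name, the choice of `InferenceError` as the wrapper type, and `ThirdPartyFailure` are all hypothetical; a real strategy would map vendor exceptions to the most specific family member.

```python
import functools


class RuntimeError(Exception):
    """Family base; shadows the builtin name inside this module only."""


class InferenceError(RuntimeError): ...


def rewrap_into_family(func):
    """Convert any non-family exception into InferenceError, keeping the cause."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except RuntimeError:
            raise  # already a family member, pass through unchanged
        except Exception as exc:
            raise InferenceError(f"{type(exc).__name__}: {exc}") from exc
    return wrapper


class ThirdPartyFailure(Exception):
    """Stand-in for an exception raised by a vendor library."""


@rewrap_into_family
def infer(batch):
    if not batch:
        raise ThirdPartyFailure("empty binding")
    return [x * 2 for x in batch]


print(infer([1, 2]))  # [2, 4]

try:
    infer([])
except InferenceError as exc:
    caught = exc
    print(type(exc.__cause__).__name__)  # ThirdPartyFailure
```

Chaining with `from exc` preserves the original traceback for logging while keeping the consumer-facing type inside the family.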
## Unit Tests
| AC Ref | What to Test | Required Outcome |
|--------|-------------|-----------------|
| AC-1 | `runtime_checkable` Protocol vs. a fully-implementing fake; vs. a fake missing one method | `isinstance` returns True for full, False for partial |
| AC-2 | Mutation attempt on each frozen DTO | `FrozenInstanceError` raised; original value preserved |
| AC-3 | Raise each of the nine error subtypes; catch as `c7_inference.errors.RuntimeError` | All caught; an unrelated `ValueError` is NOT caught by the same handler |
| AC-4 | `build_inference_runtime` with `tensorrt` + flag ON → fake `TensorrtRuntime` | Returned instance is `TensorrtRuntime`; `current_runtime_label()` == `"tensorrt"` |
| AC-5 | `build_inference_runtime` with `tensorrt` + flag OFF | `RuntimeNotAvailableError`; `sys.modules` does NOT contain `c7_inference.tensorrt_runtime` |
| AC-6 | Config load with invalid `runtime` value | `ConfigSchemaError`; valid values listed in message |
| AC-7 | `current_runtime_label()` for each strategy | Matches the config value used to construct it |
| AC-8 | Contract introspection vs. Protocol introspection | Shape parity test passes |
| NFR-perf-factory | Microbench `build_inference_runtime` × 1000 | p99 ≤ 200 ms (dominated by lazy import on first call; subsequent calls << 1 ms) |
| NFR-reliability-error-family | All nine subtypes inherit from `c7_inference.errors.RuntimeError` | Verified via `issubclass` for each |
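
The NFR-perf-factory row can be measured with a nearest-rank percentile over per-call timings. In this sketch `fake_factory` is a stand-in for `build_inference_runtime(config)`, and the helper name `p99_ms` is hypothetical.

```python
import time


def p99_ms(func, calls: int = 1000) -> float:
    """Return the nearest-rank 99th percentile of func's latency in ms."""
    samples = []
    for _ in range(calls):
        start = time.perf_counter_ns()
        func()
        samples.append((time.perf_counter_ns() - start) / 1e6)
    samples.sort()
    # Nearest rank: ceil(0.99 * n), computed with integers to avoid float error.
    rank = (99 * len(samples) + 99) // 100
    return samples[rank - 1]


def fake_factory():
    return object()  # stand-in for build_inference_runtime(config)


latency = p99_ms(fake_factory)
assert latency <= 200.0, f"p99 {latency:.3f} ms exceeds the 200 ms budget"
print(f"p99 = {latency:.6f} ms")
```

With 1000 calls the rank is 990, so up to ten slow outliers (for example the first call, which pays for the lazy import) do not fail the budget; if the first-call cost matters on its own, it would need a separate assertion.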
## Constraints
- The Protocol uses `typing.Protocol` from stdlib; no third-party Protocol library is introduced.
- DTO dataclasses use stdlib `dataclasses` with `frozen=True`; no `pydantic` or `attrs` dependency.
- `EngineHandle` is an opaque marker class — consumers MUST NOT introspect its fields. Each strategy subclasses with implementation-specific state. The Protocol exposes `EngineHandle` as the type but consumers treat it as a token to pass back to the same strategy.
- Lazy import of concrete strategies is mandatory. The factory's `if BUILD_TENSORRT_RUNTIME: from c7_inference.tensorrt_runtime import TensorrtRuntime` block is not optional — it is the mechanism by which Tier-0 workstation builds compile without TRT installed.
- The contract file at `_docs/02_document/contracts/c7_inference/inference_runtime_protocol.md` is the source of truth. If the Protocol shape changes here without the contract updating, that is a Spec-Gap finding (High) per code-review skill Phase 2.
- This task does NOT add new third-party dependencies — `typing.Protocol`, `dataclasses`, `enum` are stdlib.
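
The opaque-handle constraint above can be illustrated as follows. `FakeEngineHandle`, its `_engine_id` field, and `FakeRuntime` are hypothetical, and `InferenceError` is a bare stand-in for the family member of the same name.

```python
class InferenceError(Exception):
    """Stand-in for the family error of the same name."""


class EngineHandle:
    """Opaque marker: consumers hold it and pass it back, nothing more."""
    __slots__ = ()


class FakeEngineHandle(EngineHandle):
    """Strategy-private subclass carrying implementation state."""
    __slots__ = ("_engine_id",)

    def __init__(self, engine_id: int) -> None:
        self._engine_id = engine_id


class FakeRuntime:
    def __init__(self) -> None:
        self._live: set[int] = set()
        self._next_id = 0

    def deserialize_engine(self) -> EngineHandle:
        self._next_id += 1
        self._live.add(self._next_id)
        return FakeEngineHandle(self._next_id)

    def release_engine(self, handle: EngineHandle) -> None:
        # Only the owning strategy looks inside the handle.
        if not isinstance(handle, FakeEngineHandle):
            raise InferenceError("handle was not issued by this runtime")
        self._live.discard(handle._engine_id)

    def live_count(self) -> int:
        return len(self._live)


rt = FakeRuntime()
token = rt.deserialize_engine()  # the consumer sees only an EngineHandle
print(rt.live_count())  # 1
rt.release_engine(token)
print(rt.live_count())  # 0
```

The consumer code never reads `token`'s attributes; it only returns the token to the runtime that issued it, which is what lets each strategy change its handle layout freely.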
## Risks & Mitigation
**Risk 1: Protocol drift between contract and code**
- *Risk*: Implementations diverge from the contract over time; consumers cannot tell which is canonical.
- *Mitigation*: AC-8 contract-introspection test runs in CI; any drift fails the test before merge. The contract file's `## Test Cases` section names this exact test.
**Risk 2: Lazy-import gating is bypassed by a transitively-imported module**
- *Risk*: A consumer imports `c7_inference` (the package) and the package's `__init__.py` eagerly imports a concrete strategy, triggering the TRT import even when `BUILD_TENSORRT_RUNTIME=OFF`.
- *Mitigation*: The package `__init__.py` re-exports ONLY the Protocol and DTOs and errors — it does NOT import any concrete strategy. AC-5 verifies via `sys.modules` that no strategy module is loaded during a Tier-0 factory call.
**Risk 3: Error hierarchy widens silently**
- *Risk*: A future strategy adds a tenth error type without updating the contract or the family base class.
- *Mitigation*: The contract file lists the canonical nine. Implementations MUST raise only subclasses of `c7_inference.errors.RuntimeError`; a strategy raising a non-family error is a Spec-Gap finding (High) at code-review time. AC-3's catch-as-family test catches the obvious case.
## Runtime Completeness
- **Named capability**: typed Protocol + DTOs + error envelope + composition-root selection (architecture / E-C7 / ADR-001 + ADR-002 + ADR-009).
- **Production code that must exist**: real Protocol declaration, real frozen DTOs, real error hierarchy, real composition-root factory with lazy-import gating, real config-loader extension for the runtime enum.
- **Allowed external stubs**: tests MAY substitute fake strategy classes that conform to the Protocol; production wiring uses the real strategies from AZ-298 / AZ-299 / AZ-300.
- **Unacceptable substitutes**: ABCs instead of `typing.Protocol` (would force inheritance changes downstream), `pydantic.BaseModel` instead of `@dataclass(frozen=True)` (would add a runtime validation layer this task does not need), eager imports of concrete strategies in `__init__.py` (would defeat `BUILD_*` gating), or a `runtime: str` config field without an enum (would lose the load-time validation in AC-6).
## Contract
This task produces/implements the contract at `_docs/02_document/contracts/c7_inference/inference_runtime_protocol.md`.
Consumers MUST read that file — not this task spec — to discover the interface.