Freezes the c7_inference Public API per _docs/02_document/contracts/c7_inference/inference_runtime_protocol.md v1.0.0:

- InferenceRuntime Protocol (6 methods: compile_engine, deserialize_engine, infer, release_engine, thermal_state, current_runtime_label) in components/c7_inference/interface.py.
- DTOs (PrecisionMode enum, OptimizationProfile, BuildConfig, EngineCacheEntry, EngineHandle opaque marker) in _types/inference.py, placed at the L1 types layer so C10 can re-export EngineCacheEntry without crossing the components.* boundary (AZ-270 AC-6).
- ThermalState DTO expanded in _types/thermal.py from the AZ-355 forward-declared stub to the AZ-297 contract shape (cpu/gpu temp, thermal_throttle_active, measured_clock_mhz, measured_at_ns, is_telemetry_available). Invariant I-6: when telemetry is unavailable, throttle is False.
- Error family rooted at c7_inference.errors.RuntimeError (9 subtypes: EngineBuildError, EngineDeserializeError, EngineHashMismatchError, EngineSchemaMismatchError, EngineSidecarMissingError, CalibrationCacheError, InferenceError, OutOfMemoryError, TelemetryUnavailableError). RuntimeNotAvailableError stays in runtime_root/errors.py (composition-time, outside the family).
- C7InferenceConfig per-component config block (runtime label, thermal_poll_hz, engine_cache_dir) with constructor-time validation rejecting unknown runtime labels.
- Composition-root factory build_inference_runtime in runtime_root/inference_factory.py with three BUILD_* gates (BUILD_TENSORRT_RUNTIME, BUILD_ONNX_TRT_EP_RUNTIME, BUILD_PYTORCH_FP16_RUNTIME). Concrete strategy modules are imported lazily via __import__ AFTER the flag check, so a Tier-0 build with the flag OFF MUST NOT load the strategy module (AC-5 / I-5; verifiable via sys.modules).
- 37 conformance tests cover all 8 ACs + NFR-perf-factory (p99 build under 200 ms × 1000 calls) + NFR-reliability-error-family. AC-8 introspects the contract file's Shape table and asserts method parity against the runtime Protocol; also asserts all 9 error subtypes are documented.

Retired the AZ-263 scaffolding EngineCacheEntry from _types/manifests.py (replaced by the AZ-297 canonical shape in _types/inference.py); updated the LightGlue-flavoured EngineHandle Protocol docstring in _types/manifests.py to rationalize its intentional dual existence with the C7 opaque EngineHandle (same name, different consumer-side cut, mirroring the C4/C5 ISam2GraphHandle pattern).

Stale ThermalState.throttle docstring references in c4_pose/config.py, c4_pose/interface.py, and _types/pose.py updated to thermal_throttle_active.

Full unit-test sweep: 843 passed, 2 pre-existing environment skips (cmake, actionlint).

Co-authored-by: Cursor <cursoragent@cursor.com>
C7 InferenceRuntime Protocol + Composition-Root Selection
Task: AZ-297_c7_runtime_protocol
Name: C7 InferenceRuntime Protocol
Description: Define the InferenceRuntime Protocol, its DTOs (BuildConfig, EngineCacheEntry, EngineHandle, ThermalState), the runtime error taxonomy, and the composition-root selection switch that wires exactly one of TensorrtRuntime / OnnxTrtEpRuntime / PytorchFp16Runtime at startup based on ADR-001 (config) and ADR-002 (BUILD_* flags). This is the foundational shared-API task for E-C7 — every other E-C7 task implements this Protocol, and five external components (C2, C2.5, C3, C3.5, C10) plus C4 (ThermalState consumer) depend on the contract this task freezes.
Complexity: 3 points
Dependencies: AZ-263_initial_structure, AZ-269_config_loader, AZ-266_log_module, AZ-280_sha256_sidecar, AZ-281_engine_filename_schema
Component: c7_inference (epic AZ-249 / E-C7)
Tracker: AZ-297
Epic: AZ-249 (E-C7)
Document Dependencies
- _docs/02_document/contracts/shared_helpers/sha256_sidecar.md: EngineCacheEntry carries the sha256 of the engine binary; this contract defines that representation.
- _docs/02_document/contracts/shared_helpers/engine_filename_schema.md: EngineCacheEntry carries the parsed (SM, JP, TRT, precision) tuple from the filename schema.
- _docs/02_document/contracts/shared_config/composition_root_protocol.md: runtime selection is a Config field; this contract defines the field and the runtime-label vocabulary.
- _docs/02_document/contracts/shared_logging/log_record_schema.md: error events emitted by Protocol implementations use this log shape.
Problem
Five different components (C2 VPR backbone, C2.5 ReRanker, C3 CrossDomainMatcher, C3.5 AdHoP, C10 CacheProvisioner) and one consumer of the thermal-throttle telemetry feed (C4 Pose) all need a single, frozen interface to the on-Jetson inference runtime. Without it:
- Each consumer would import a concrete TRT / ONNX-RT / PyTorch class directly, hard-coding the runtime choice and breaking ADR-001's runtime selectability.
- BUILD_TENSORRT_RUNTIME=OFF (Tier-0 workstation) would not compile because consumers depend on TRT-specific symbols.
- The composition root would have to know per-component which runtime is acceptable; today only ADR-001 (config) + ADR-002 (BUILD_* flags) decide.
- Error handling would diverge per runtime; EngineHashMismatchError (D-C10-3) and EngineSchemaMismatchError (D-C10-7) would have different shapes per implementation, making the F2 takeoff abort path fragile.
- The C4 hybrid covariance decision (D-CROSS-LATENCY-1) would have no canonical ThermalState shape to read.
This task delivers the typed boundary every consumer reads against and every implementation conforms to. It writes no runtime logic — the concrete TRT / ONNX-RT / PyTorch strategies are AZ-298 / AZ-299 / AZ-300.
Outcome
- An InferenceRuntime Protocol (PEP 544 typing.Protocol) is exported from src/gps_denied_onboard/components/c7_inference/interface.py and re-exported from the component's __init__.py.
- The DTOs BuildConfig, EngineCacheEntry, EngineHandle, ThermalState are dataclasses (frozen) at the same import path; field shape and invariants match the contract file.
- The runtime error taxonomy is a single hierarchy under c7_inference.errors: RuntimeError ← {EngineBuildError, EngineDeserializeError, EngineHashMismatchError, EngineSchemaMismatchError, EngineSidecarMissingError, CalibrationCacheError, InferenceError, OutOfMemoryError, TelemetryUnavailableError}. Every implementation raises only these; consumers catch only these.
- The composition root has a build_inference_runtime(config: Config) -> InferenceRuntime factory function that selects the strategy by config.inference.runtime (tensorrt | onnx_trt_ep | pytorch_fp16) and respects compile-time BUILD_* gating: requesting a strategy whose BUILD_* flag is OFF raises RuntimeNotAvailableError at composition time (NOT at first inference).
- Every implementation's current_runtime_label() returns the lowercase label matching the config value ("tensorrt", "onnx_trt_ep", "pytorch_fp16"); this is the FDR-stamped label for AC-NEW-3 audit.
- A frozen contract file at _docs/02_document/contracts/c7_inference/inference_runtime_protocol.md carries the full shape; consumers read that file, not this task spec.
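The commit summary at the top lists the ThermalState fields and invariant I-6 (telemetry unavailable implies throttle is False). A minimal sketch of a frozen dataclass enforcing that invariant at construction follows. The temperature field names (cpu_temp_c, gpu_temp_c), all field types, and the use of ValueError are assumptions; the contract file remains the authority on the real shape.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ThermalState:
    # Temperature field names and every type annotation here are assumed.
    cpu_temp_c: Optional[float]
    gpu_temp_c: Optional[float]
    thermal_throttle_active: bool
    measured_clock_mhz: Optional[int]
    measured_at_ns: int
    is_telemetry_available: bool

    def __post_init__(self) -> None:
        # Invariant I-6: without telemetry, throttle must be reported as False.
        if not self.is_telemetry_available and self.thermal_throttle_active:
            raise ValueError(
                "I-6 violated: thermal_throttle_active must be False "
                "when is_telemetry_available is False"
            )


# A reading taken while telemetry is unavailable.
no_telemetry = ThermalState(
    cpu_temp_c=None,
    gpu_temp_c=None,
    thermal_throttle_active=False,
    measured_clock_mhz=None,
    measured_at_ns=0,
    is_telemetry_available=False,
)
```

Checking the invariant in __post_init__ means a C4 consumer never has to re-validate the combination before reading thermal_throttle_active.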
Scope
Included
- InferenceRuntime Protocol with the six methods from _docs/02_document/components/09_c7_inference/description.md § 2: compile_engine, deserialize_engine, infer, release_engine, thermal_state, current_runtime_label.
- DTO dataclasses for BuildConfig, EngineCacheEntry, EngineHandle (opaque marker class), ThermalState. All frozen except EngineHandle (which is opaque to consumers; implementations subclass).
- Error hierarchy under c7_inference.errors; every error type the Protocol promises; all are derived from a common c7_inference.errors.RuntimeError so consumers can catch the family.
- build_inference_runtime(config) -> InferenceRuntime composition-root factory in src/gps_denied_onboard/runtime_root/inference_factory.py. Imports the concrete strategy lazily, guarded by if BUILD_TENSORRT_RUNTIME: from c7_inference.tensorrt_runtime import TensorrtRuntime, so an OFF flag does not force an import.
- A RuntimeNotAvailableError raised by the factory when the requested strategy is not built into this binary.
- A ConfigSchemaError extension to AZ-269's config loader for the new config.inference.runtime enum + the optional config.inference.thermal_poll_hz (default 1.0) + config.inference.engine_cache_dir fields.
- The contract file at _docs/02_document/contracts/c7_inference/inference_runtime_protocol.md filled per decompose/templates/api-contract.md with Shape, Invariants, Non-Goals, Versioning Rules, and at least three Test Cases.
- Type-only unit tests that verify each concrete strategy module's class actually conforms to the Protocol via runtime_checkable + isinstance (catches drift at CI time, not deployment).
Excluded
- TensorrtRuntime implementation: AZ-298.
- OnnxTrtEpRuntime implementation: AZ-299.
- PytorchFp16Runtime implementation: AZ-300.
- EngineGate validator: AZ-301 (this task defines the error types it raises, not the validator).
- Background thermal-state polling loop: AZ-302 (this task defines the ThermalState DTO and the thermal_state() Protocol method, not the polling thread).
- C4 hybrid covariance-mode consumer wiring: owned by E-C4.
- C10 CacheProvisioner consumer wiring of compile_engine: owned by E-C10.
Acceptance Criteria
AC-1: Protocol is conformance-checkable
Given a class that implements all six Protocol methods with matching signatures
When isinstance(impl, InferenceRuntime) is evaluated under runtime_checkable
Then the result is True; for a class that omits any method, the result is False
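A minimal sketch of the AC-1 check, assuming placeholder signatures: the six method names come from this spec, but every parameter and return type below is an assumption, and FullFake / PartialFake are hypothetical test doubles.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class InferenceRuntime(Protocol):
    # Method names are from the spec; parameters and return types are assumed.
    def compile_engine(self, model_path: str, build_config: Any) -> Any: ...
    def deserialize_engine(self, entry: Any) -> Any: ...
    def infer(self, handle: Any, inputs: Any) -> Any: ...
    def release_engine(self, handle: Any) -> None: ...
    def thermal_state(self) -> Any: ...
    def current_runtime_label(self) -> str: ...


class FullFake:
    """Test double implementing all six methods (no inheritance needed)."""

    def compile_engine(self, model_path, build_config): return "entry"
    def deserialize_engine(self, entry): return "handle"
    def infer(self, handle, inputs): return inputs
    def release_engine(self, handle): return None
    def thermal_state(self): return None
    def current_runtime_label(self): return "tensorrt"


class PartialFake:
    """Test double that omits thermal_state."""

    def compile_engine(self, model_path, build_config): return "entry"
    def deserialize_engine(self, entry): return "handle"
    def infer(self, handle, inputs): return inputs
    def release_engine(self, handle): return None
    def current_runtime_label(self): return "tensorrt"


full_conforms = isinstance(FullFake(), InferenceRuntime)
partial_conforms = isinstance(PartialFake(), InferenceRuntime)
```

Note that isinstance under runtime_checkable only checks that the members exist; it does not compare signatures, so signature drift still needs a static type checker.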
AC-2: Frozen DTOs reject mutation
Given a constructed BuildConfig(precision=Fp16, ...), EngineCacheEntry(...), or ThermalState(...) instance
When the test attempts instance.precision = Int8 (or any field reassignment)
Then dataclasses.FrozenInstanceError is raised; the original value is preserved
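A short sketch of the AC-2 mutation check, under stated assumptions: the PrecisionMode member names and values, and the workspace_mb field, are hypothetical; only the precision field and the Fp16 / Int8 modes appear in this spec.

```python
import dataclasses
from dataclasses import dataclass
from enum import Enum


class PrecisionMode(Enum):
    # Member spelling and values are assumed.
    FP16 = "fp16"
    INT8 = "int8"


@dataclass(frozen=True)
class BuildConfig:
    precision: PrecisionMode
    workspace_mb: int = 1024  # hypothetical field, for illustration only


def mutation_is_rejected(cfg: BuildConfig) -> bool:
    """Return True if reassigning a field raises FrozenInstanceError."""
    try:
        cfg.precision = PrecisionMode.INT8  # type: ignore[misc]
    except dataclasses.FrozenInstanceError:
        return True
    return False


cfg = BuildConfig(precision=PrecisionMode.FP16)
rejected = mutation_is_rejected(cfg)
```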
AC-3: Error hierarchy catchable as a single family
Given any of the nine documented error subtypes
When the consumer wraps an implementation call in try: ... except c7_inference.errors.RuntimeError
Then every documented subtype is caught; an unrelated Exception is NOT caught (the Protocol's error envelope does not leak into general exception handling)
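The catch-as-family property can be sketched as below. The root class is named InferenceRuntimeError here only so that a single-file example does not shadow Python's builtin RuntimeError; in the spec it is c7_inference.errors.RuntimeError. The helper caught_as_family is hypothetical.

```python
class InferenceRuntimeError(Exception):
    """Stand-in for c7_inference.errors.RuntimeError (renamed for this sketch)."""


class EngineBuildError(InferenceRuntimeError): pass
class EngineDeserializeError(InferenceRuntimeError): pass
class EngineHashMismatchError(InferenceRuntimeError): pass
class EngineSchemaMismatchError(InferenceRuntimeError): pass
class EngineSidecarMissingError(InferenceRuntimeError): pass
class CalibrationCacheError(InferenceRuntimeError): pass
class InferenceError(InferenceRuntimeError): pass
class OutOfMemoryError(InferenceRuntimeError): pass
class TelemetryUnavailableError(InferenceRuntimeError): pass


FAMILY = (
    EngineBuildError, EngineDeserializeError, EngineHashMismatchError,
    EngineSchemaMismatchError, EngineSidecarMissingError, CalibrationCacheError,
    InferenceError, OutOfMemoryError, TelemetryUnavailableError,
)


def caught_as_family(exc: Exception) -> bool:
    """Raise exc and report whether the family handler catches it."""
    try:
        raise exc
    except InferenceRuntimeError:
        return True
    except Exception:
        return False


all_family_caught = all(caught_as_family(cls("boom")) for cls in FAMILY)
unrelated_caught = caught_as_family(ValueError("boom"))
```

Because the family root derives from Exception directly in this sketch, the builtin RuntimeError is also outside the family, which is the non-leak property AC-3 asks for.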
AC-4: Composition-root factory honours config
Given config.inference.runtime = "tensorrt" and BUILD_TENSORRT_RUNTIME=ON
When build_inference_runtime(config) is called
Then a TensorrtRuntime instance is returned and instance.current_runtime_label() == "tensorrt"
AC-5: Composition-root factory honours BUILD flag gate
Given config.inference.runtime = "tensorrt" and BUILD_TENSORRT_RUNTIME=OFF
When build_inference_runtime(config) is called
Then RuntimeNotAvailableError is raised at composition time with a message naming "tensorrt"; no module-level import of TRT symbols has occurred (verifiable via sys.modules)
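A runnable sketch of the flag-then-import ordering behind AC-4 and AC-5. Several things here are assumptions made so the example runs standalone: the BUILD_FLAGS dict stands in for the real BUILD_* gating, the flat fake_* module names are hypothetical, the factory takes a label string rather than a Config, and importlib.import_module is used where the commit summary mentions __import__.

```python
import importlib
import sys
import types


class RuntimeNotAvailableError(Exception):
    """Stand-in for the composition-time error in runtime_root/errors.py."""


# Hypothetical flag table; the real values come from the BUILD_* gates.
BUILD_FLAGS = {"tensorrt": False, "onnx_trt_ep": False, "pytorch_fp16": True}

# label -> (module name, class name); module names are assumed for this sketch.
STRATEGIES = {
    "tensorrt": ("fake_tensorrt_runtime", "TensorrtRuntime"),
    "onnx_trt_ep": ("fake_onnx_trt_ep_runtime", "OnnxTrtEpRuntime"),
    "pytorch_fp16": ("fake_pytorch_fp16_runtime", "PytorchFp16Runtime"),
}


def build_inference_runtime(runtime_label: str):
    # Check the flag first, so an OFF strategy is never imported.
    if not BUILD_FLAGS.get(runtime_label, False):
        raise RuntimeNotAvailableError(
            f"runtime {runtime_label!r} is not built into this binary"
        )
    module_name, class_name = STRATEGIES[runtime_label]
    module = importlib.import_module(module_name)  # lazy import, after the gate
    return getattr(module, class_name)()


class PytorchFp16Runtime:
    """Fake strategy so the ON path runs without any GPU library."""

    def current_runtime_label(self) -> str:
        return "pytorch_fp16"


# Register the fake strategy module so import_module can find it.
_fake_module = types.ModuleType("fake_pytorch_fp16_runtime")
_fake_module.PytorchFp16Runtime = PytorchFp16Runtime
sys.modules["fake_pytorch_fp16_runtime"] = _fake_module

runtime = build_inference_runtime("pytorch_fp16")
```

With the tensorrt flag OFF, the call raises before any import statement runs, which is what makes the sys.modules assertion in AC-5 meaningful.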
AC-6: Unknown runtime label rejected at config load
Given config.inference.runtime = "tensorflow_lite" (not in the enum)
When the config is loaded via AZ-269's loader
Then ConfigSchemaError is raised at load time with a message listing the valid values; build_inference_runtime is never reached
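One way the load-time rejection could look, as a sketch only: RuntimeLabel, load_inference_config, and the dict-shaped raw input are hypothetical names and shapes, ConfigSchemaError is a stand-in for the AZ-269 loader's type, and treating engine_cache_dir as required is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class ConfigSchemaError(Exception):
    """Stand-in for the AZ-269 config loader's error type."""


class RuntimeLabel(str, Enum):  # hypothetical enum name
    TENSORRT = "tensorrt"
    ONNX_TRT_EP = "onnx_trt_ep"
    PYTORCH_FP16 = "pytorch_fp16"


@dataclass(frozen=True)
class C7InferenceConfig:
    runtime: RuntimeLabel
    engine_cache_dir: str
    thermal_poll_hz: float = 1.0  # default taken from the spec


def load_inference_config(raw: dict) -> C7InferenceConfig:
    """Validate the raw mapping; unknown runtime labels fail here, not later."""
    try:
        runtime = RuntimeLabel(raw.get("runtime"))
    except ValueError as exc:
        valid = ", ".join(member.value for member in RuntimeLabel)
        raise ConfigSchemaError(
            f"inference.runtime={raw.get('runtime')!r} is invalid; "
            f"valid values: {valid}"
        ) from exc
    return C7InferenceConfig(
        runtime=runtime,
        engine_cache_dir=raw["engine_cache_dir"],
        thermal_poll_hz=float(raw.get("thermal_poll_hz", 1.0)),
    )


good = load_inference_config({"runtime": "tensorrt", "engine_cache_dir": "/tmp/engines"})
```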
AC-7: current_runtime_label() matches config value exactly
Given any selectable runtime
When instance.current_runtime_label() is called
Then the returned string is one of "tensorrt", "onnx_trt_ep", "pytorch_fp16" and equals config.inference.runtime; AC-NEW-3 audit relies on this exact-match property
AC-8: Contract file matches Protocol shape
Given the contract file at _docs/02_document/contracts/c7_inference/inference_runtime_protocol.md
When a contract-test parses the Shape section's method/field tables and compares against the runtime Protocol via introspection
Then every method, every field, every error type is present and consistent in both
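A sketch of the parity idea in AC-8, covering methods only. The layout of the contract file's Shape table is not shown in this spec, so the two-column table below, the helper names, and the placeholder signatures are all assumptions.

```python
from typing import Any, Protocol

# Hypothetical excerpt of a Shape table; the real contract layout may differ.
CONTRACT_SHAPE = """\
| Method | Notes |
|---|---|
| compile_engine | - |
| deserialize_engine | - |
| infer | - |
| release_engine | - |
| thermal_state | - |
| current_runtime_label | - |
"""


class InferenceRuntime(Protocol):
    # Placeholder signatures; only the method names come from the spec.
    def compile_engine(self, model_path: str, build_config: Any) -> Any: ...
    def deserialize_engine(self, entry: Any) -> Any: ...
    def infer(self, handle: Any, inputs: Any) -> Any: ...
    def release_engine(self, handle: Any) -> None: ...
    def thermal_state(self) -> Any: ...
    def current_runtime_label(self) -> str: ...


def contract_methods(markdown: str) -> set:
    """Collect first-column cells, skipping the header and separator rows."""
    names = set()
    for line in markdown.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        first = cells[0]
        if first in ("", "Method") or set(first) <= {"-", ":"}:
            continue
        names.add(first.strip("`"))
    return names


def protocol_methods(proto: type) -> set:
    """Public callables declared directly on the Protocol class."""
    return {
        name for name, value in vars(proto).items()
        if callable(value) and not name.startswith("_")
    }


missing_in_code = contract_methods(CONTRACT_SHAPE) - protocol_methods(InferenceRuntime)
missing_in_contract = protocol_methods(InferenceRuntime) - contract_methods(CONTRACT_SHAPE)
```

Reporting the two set differences separately tells the reviewer which side drifted, rather than only that the shapes disagree.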
Non-Functional Requirements
Compatibility
- The Protocol is typing.Protocol (PEP 544 structural typing) so existing components that import the concrete TRT class today (none yet; this is greenfield) can be retrofitted without inheritance changes.
- All error types subclass Exception (not BaseException) so except Exception: in upstream layers continues to work as expected.
Performance
- The factory build_inference_runtime returns within 200 ms (it imports + constructs one strategy; the heavy GPU work happens inside the strategy's own compile_engine / deserialize_engine calls, not the factory).
- DTO construction (BuildConfig, EngineCacheEntry, ThermalState) is dataclass-frozen; per-instance overhead is the bare-cost dataclass __init__.
Reliability
- The Protocol is the boundary of acceptable runtime errors. Implementations MUST NOT raise other types into consumers; if a third-party library (TRT, ONNX-RT, PyTorch) raises something else, the implementation catches and rewraps into the documented family.
- Versioning: any breaking change to the Protocol or its DTOs MUST bump the contract file's Version and notify every consumer task listed in the contract header.
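The catch-and-rewrap rule above could be implemented with a small decorator, sketched here. The decorator name rewrap_as, the FakeStrategy class, and the simulated KeyError are hypothetical; InferenceRuntimeError again stands in for c7_inference.errors.RuntimeError.

```python
import functools


class InferenceRuntimeError(Exception):
    """Stand-in for c7_inference.errors.RuntimeError (renamed for this sketch)."""


class InferenceError(InferenceRuntimeError):
    pass


def rewrap_as(family_error):
    """Convert any non-family exception raised by fn into family_error."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            try:
                return fn(*args, **kwargs)
            except InferenceRuntimeError:
                raise  # already inside the documented family
            except Exception as exc:
                # Chain the original so the third-party cause stays in the traceback.
                raise family_error(f"{fn.__name__} failed: {exc}") from exc
        return wrapper
    return decorator


class FakeStrategy:
    @rewrap_as(InferenceError)
    def infer(self, handle, inputs):
        # Simulates a third-party library raising an undocumented type.
        raise KeyError("binding 'images' not found")


try:
    FakeStrategy().infer(object(), {})
    seen = None
except InferenceRuntimeError as exc:
    seen = exc
```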
Unit Tests
| AC Ref | What to Test | Required Outcome |
|---|---|---|
| AC-1 | runtime_checkable Protocol vs. a fully-implementing fake; vs. a fake missing one method | isinstance returns True for full, False for partial |
| AC-2 | Mutation attempt on each frozen DTO | FrozenInstanceError raised; original value preserved |
| AC-3 | Raise each of the nine error subtypes; catch as c7_inference.errors.RuntimeError | All caught; an unrelated ValueError is NOT caught by the same handler |
| AC-4 | build_inference_runtime with tensorrt + flag ON → fake TensorrtRuntime | Returned instance is TensorrtRuntime; current_runtime_label() == "tensorrt" |
| AC-5 | build_inference_runtime with tensorrt + flag OFF | RuntimeNotAvailableError; sys.modules does NOT contain c7_inference.tensorrt_runtime |
| AC-6 | Config load with invalid runtime value | ConfigSchemaError; valid values listed in message |
| AC-7 | current_runtime_label() for each strategy | Matches the config value used to construct it |
| AC-8 | Contract introspection vs. Protocol introspection | Shape parity test passes |
| NFR-perf-factory | Microbench build_inference_runtime × 1000 | p99 ≤ 200 ms (dominated by lazy import on first call; subsequent calls << 1 ms) |
| NFR-reliability-error-family | All nine subtypes inherit from c7_inference.errors.RuntimeError | Verified via issubclass for each |
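The NFR-perf-factory row could be measured along these lines. This is a sketch: p99_ms and fake_factory are hypothetical names, and fake_factory stands in for a call to build_inference_runtime.

```python
import statistics
import time


def p99_ms(fn, calls: int = 1000) -> float:
    """Time fn over `calls` invocations and return the 99th percentile in ms."""
    samples = []
    for _ in range(calls):
        start = time.perf_counter_ns()
        fn()
        samples.append((time.perf_counter_ns() - start) / 1e6)
    # quantiles(n=100) yields 99 cut points; index 98 is the 99th percentile.
    return statistics.quantiles(samples, n=100)[98]


def fake_factory():
    # Stand-in for build_inference_runtime(config); constructs one object.
    return object()


measured = p99_ms(fake_factory)
within_budget = measured <= 200.0
```

With 1000 samples, a single slow first call (the lazy import) sits above the 99th percentile, so the statistic reflects the warm path unless the slow calls are frequent.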
Constraints
- The Protocol uses typing.Protocol from stdlib; no third-party Protocol library is introduced.
- DTO dataclasses use stdlib dataclasses with frozen=True; no pydantic or attrs dependency.
- EngineHandle is an opaque marker class: consumers MUST NOT introspect its fields. Each strategy subclasses with implementation-specific state. The Protocol exposes EngineHandle as the type but consumers treat it as a token to pass back to the same strategy.
- Lazy import of concrete strategies is mandatory. The factory's if BUILD_TENSORRT_RUNTIME: from c7_inference.tensorrt_runtime import TensorrtRuntime block is not optional; it is the mechanism by which Tier-0 workstation builds compile without TRT installed.
- The contract file at _docs/02_document/contracts/c7_inference/inference_runtime_protocol.md is the source of truth. If the Protocol shape changes here without the contract updating, that is a Spec-Gap finding (High) per code-review skill Phase 2.
- This task does NOT add new third-party dependencies: typing.Protocol, dataclasses, enum are stdlib.
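The opaque-token constraint on EngineHandle can be illustrated as follows. Everything beyond the EngineHandle name is an assumption: _FakeHandle, FakeRuntime, the integer engine id, and the choice of InferenceError for a foreign handle are hypothetical.

```python
class InferenceError(Exception):
    """Stand-in for the family member c7_inference.errors.InferenceError."""


class EngineHandle:
    """Opaque marker: consumers hold it and pass it back, nothing more."""


class _FakeHandle(EngineHandle):
    """Strategy-private subclass carrying implementation-specific state."""

    def __init__(self, engine_id: int) -> None:
        self._engine_id = engine_id


class FakeRuntime:
    def __init__(self) -> None:
        self._live = {}
        self._next_id = 0

    def deserialize_engine(self, entry) -> EngineHandle:
        handle = _FakeHandle(self._next_id)
        self._live[self._next_id] = entry
        self._next_id += 1
        return handle  # the consumer only sees the EngineHandle type

    def release_engine(self, handle: EngineHandle) -> None:
        # Only this strategy may look inside the handle it issued.
        if not isinstance(handle, _FakeHandle):
            raise InferenceError("handle was not issued by this strategy")
        self._live.pop(handle._engine_id, None)


runtime = FakeRuntime()
token = runtime.deserialize_engine("entry-placeholder")
runtime.release_engine(token)
```

The consumer code in the last three lines never touches a field of the handle, which is what lets each strategy change its private state freely.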
Risks & Mitigation
Risk 1: Protocol drift between contract and code
- Risk: Implementations diverge from the contract over time; consumers cannot tell which is canonical.
- Mitigation: AC-8 contract-introspection test runs in CI; any drift fails the test before merge. The contract file's ## Test Cases section names this exact test.
Risk 2: Lazy-import gating is bypassed by a transitively-imported module
- Risk: A consumer imports c7_inference (the package) and the package's __init__.py eagerly imports a concrete strategy, triggering the TRT import even when BUILD_TENSORRT_RUNTIME=OFF.
- Mitigation: The package __init__.py re-exports ONLY the Protocol and DTOs and errors; it does NOT import any concrete strategy. AC-5 verifies via sys.modules that no strategy module is loaded during a Tier-0 factory call.
Risk 3: Error hierarchy widens silently
- Risk: A future strategy adds a tenth error type without updating the contract or the family base class.
- Mitigation: The contract file lists the canonical nine. Implementations MUST raise only members of c7_inference.errors.RuntimeError; a strategy raising a non-family error is a Spec-Gap finding (High) at code-review time. AC-3's catch-as-family test catches the obvious case.
Runtime Completeness
- Named capability: typed Protocol + DTOs + error envelope + composition-root selection (architecture / E-C7 / ADR-001 + ADR-002 + ADR-009).
- Production code that must exist: real Protocol declaration, real frozen DTOs, real error hierarchy, real composition-root factory with lazy-import gating, real config-loader extension for the runtime enum.
- Allowed external stubs: tests MAY substitute fake strategy classes that conform to the Protocol; production wiring uses the real strategies from AZ-298 / AZ-299 / AZ-300.
- Unacceptable substitutes: ABCs instead of typing.Protocol (would force inheritance changes downstream), pydantic.BaseModel instead of @dataclass(frozen=True) (would add a runtime validation layer this task does not need), eager imports of concrete strategies in __init__.py (would defeat BUILD_* gating), or a runtime: str config field without an enum (would lose the load-time validation in AC-6).
Contract
This task produces/implements the contract at _docs/02_document/contracts/c7_inference/inference_runtime_protocol.md.
Consumers MUST read that file — not this task spec — to discover the interface.