Oleksandr Bezdieniezhnykh daff5d4d1c [AZ-297] C7 InferenceRuntime: Protocol + DTOs + factory
Freezes the c7_inference Public API per
_docs/02_document/contracts/c7_inference/inference_runtime_protocol.md
v1.0.0:

- InferenceRuntime Protocol (6 methods: compile_engine,
  deserialize_engine, infer, release_engine, thermal_state,
  current_runtime_label) in components/c7_inference/interface.py.
- DTOs (PrecisionMode enum, OptimizationProfile, BuildConfig,
  EngineCacheEntry, EngineHandle opaque marker) in _types/inference.py
  — placed at the L1 types layer so C10 can re-export EngineCacheEntry
  without crossing the components.* boundary (AZ-270 AC-6).
- ThermalState DTO expanded in _types/thermal.py from the AZ-355
  forward-declared stub to the AZ-297 contract shape (cpu/gpu temp,
  thermal_throttle_active, measured_clock_mhz, measured_at_ns,
  is_telemetry_available). Invariant I-6: when telemetry is
  unavailable, throttle is False.
- Error family rooted at c7_inference.errors.RuntimeError (9 subtypes:
  EngineBuildError, EngineDeserializeError, EngineHashMismatchError,
  EngineSchemaMismatchError, EngineSidecarMissingError,
  CalibrationCacheError, InferenceError, OutOfMemoryError,
  TelemetryUnavailableError). RuntimeNotAvailableError stays in
  runtime_root/errors.py — composition-time, outside the family.
- C7InferenceConfig per-component config block (runtime label,
  thermal_poll_hz, engine_cache_dir) with constructor-time validation
  rejecting unknown runtime labels.
- Composition-root factory build_inference_runtime in
  runtime_root/inference_factory.py with three BUILD_* gates
  (BUILD_TENSORRT_RUNTIME, BUILD_ONNX_TRT_EP_RUNTIME,
  BUILD_PYTORCH_FP16_RUNTIME). Concrete strategy modules are imported
  lazily via __import__ AFTER the flag check, so a Tier-0 build with
  the flag OFF MUST NOT load the strategy module (AC-5 / I-5;
  verifiable via sys.modules).
- 37 conformance tests cover all 8 ACs + NFR-perf-factory
  (p99 build under 200 ms × 1000 calls) + NFR-reliability-error-family.
  AC-8 introspects the contract file's Shape table and asserts method
  parity against the runtime Protocol; also asserts all 9 error
  subtypes are documented.

Retired the AZ-263 scaffolding EngineCacheEntry from _types/manifests.py
(replaced by the AZ-297 canonical shape in _types/inference.py); updated
the LightGlue-flavoured EngineHandle Protocol docstring in
_types/manifests.py to rationalize its intentional dual existence
with the C7 opaque EngineHandle (same name, different consumer-side
cut, mirroring the C4/C5 ISam2GraphHandle pattern).

Stale ThermalState.throttle docstring references in c4_pose/config.py,
c4_pose/interface.py, and _types/pose.py updated to
thermal_throttle_active.

Full unit-test sweep: 843 passed, 2 pre-existing environment skips
(cmake, actionlint).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 04:30:14 +03:00


C7 InferenceRuntime Protocol + Composition-Root Selection

Task: AZ-297_c7_runtime_protocol
Name: C7 InferenceRuntime Protocol
Description: Define the InferenceRuntime Protocol, its DTOs (BuildConfig, EngineCacheEntry, EngineHandle, ThermalState), the runtime error taxonomy, and the composition-root selection switch that wires exactly one of TensorrtRuntime / OnnxTrtEpRuntime / PytorchFp16Runtime at startup based on ADR-001 (config) and ADR-002 (BUILD_* flags). This is the foundational shared-API task for E-C7: every other E-C7 task implements this Protocol, and five external components (C2, C2.5, C3, C3.5, C10) plus C4 (ThermalState consumer) depend on the contract this task freezes.
Complexity: 3 points
Dependencies: AZ-263_initial_structure, AZ-269_config_loader, AZ-266_log_module, AZ-280_sha256_sidecar, AZ-281_engine_filename_schema
Component: c7_inference (epic AZ-249 / E-C7)
Tracker: AZ-297
Epic: AZ-249 (E-C7)

Document Dependencies

  • _docs/02_document/contracts/shared_helpers/sha256_sidecar.md: EngineCacheEntry carries the sha256 of the engine binary; this contract defines that representation.
  • _docs/02_document/contracts/shared_helpers/engine_filename_schema.md: EngineCacheEntry carries the parsed (SM, JP, TRT, precision) tuple from the filename schema.
  • _docs/02_document/contracts/shared_config/composition_root_protocol.md: runtime selection is a Config field; this contract defines the field and the runtime-label vocabulary.
  • _docs/02_document/contracts/shared_logging/log_record_schema.md: error events emitted by Protocol implementations use this log shape.

Problem

Five different components (C2 VPR backbone, C2.5 ReRanker, C3 CrossDomainMatcher, C3.5 AdHoP, C10 CacheProvisioner) and one consumer of the thermal-throttle telemetry feed (C4 Pose) all need a single, frozen interface to the on-Jetson inference runtime. Without it:

  • Each consumer would import a concrete TRT / ONNX-RT / PyTorch class directly, hard-coding the runtime choice and breaking ADR-001's runtime selectability.
  • BUILD_TENSORRT_RUNTIME=OFF (Tier-0 workstation) would not compile because consumers depend on TRT-specific symbols.
  • The composition root would have to know per-component which runtime is acceptable; today only ADR-001 (config) + ADR-002 (BUILD_* flags) decide.
  • Error handling would diverge per runtime; EngineHashMismatchError (D-C10-3) and EngineSchemaMismatchError (D-C10-7) would have different shapes per implementation, making the F2 takeoff abort path fragile.
  • The C4 hybrid covariance decision (D-CROSS-LATENCY-1) would have no canonical ThermalState shape to read.

This task delivers the typed boundary every consumer reads against and every implementation conforms to. It writes no runtime logic — the concrete TRT / ONNX-RT / PyTorch strategies are AZ-298 / AZ-299 / AZ-300.

Outcome

  • An InferenceRuntime Protocol (PEP 544 typing.Protocol) is exported from src/gps_denied_onboard/components/c7_inference/interface.py and re-exported from the component's __init__.py.
  • The DTOs BuildConfig, EngineCacheEntry, and ThermalState are frozen dataclasses at the same import path, alongside the opaque EngineHandle marker class (not frozen, because implementations subclass it); field shape and invariants match the contract file.
  • The runtime error taxonomy is a single hierarchy under c7_inference.errors: RuntimeError ← {EngineBuildError, EngineDeserializeError, EngineHashMismatchError, EngineSchemaMismatchError, EngineSidecarMissingError, CalibrationCacheError, InferenceError, OutOfMemoryError, TelemetryUnavailableError}. Every implementation raises only these; consumers catch only these.
  • The composition root has a build_inference_runtime(config: Config) -> InferenceRuntime factory function that selects the strategy by config.inference.runtime (tensorrt | onnx_trt_ep | pytorch_fp16) and respects compile-time BUILD_* gating: requesting a strategy whose BUILD_* flag is OFF raises RuntimeNotAvailableError at composition time (NOT at first inference).
  • Every implementation's current_runtime_label() returns the lowercase label matching the config value ("tensorrt", "onnx_trt_ep", "pytorch_fp16"); this is the FDR-stamped label for AC-NEW-3 audit.
  • A frozen contract file at _docs/02_document/contracts/c7_inference/inference_runtime_protocol.md carries the full shape; consumers read that file, not this task spec.

Scope

Included

  • InferenceRuntime Protocol with the six methods from _docs/02_document/components/09_c7_inference/description.md § 2: compile_engine, deserialize_engine, infer, release_engine, thermal_state, current_runtime_label.
  • DTO dataclasses for BuildConfig, EngineCacheEntry, EngineHandle (opaque marker class), ThermalState. All frozen except EngineHandle (which is opaque to consumers — implementations subclass).
  • Error hierarchy under c7_inference.errors covering every error type the Protocol promises; all types derive from a common c7_inference.errors.RuntimeError so consumers can catch the family.
  • build_inference_runtime(config) -> InferenceRuntime composition-root factory in src/gps_denied_onboard/runtime_root/inference_factory.py. Imports the concrete strategy lazily — guarded by if BUILD_TENSORRT_RUNTIME: from c7_inference.tensorrt_runtime import TensorrtRuntime so an OFF flag does not force an import.
  • A RuntimeNotAvailableError raised by the factory when the requested strategy is not built into this binary.
  • A ConfigSchemaError extension to AZ-269's config loader for the new config.inference.runtime enum + the optional config.inference.thermal_poll_hz (default 1.0) + config.inference.engine_cache_dir fields.
  • The contract file at _docs/02_document/contracts/c7_inference/inference_runtime_protocol.md filled per decompose/templates/api-contract.md with Shape, Invariants, Non-Goals, Versioning Rules, and at least three Test Cases.
  • Type-only unit tests that verify each concrete strategy module's class actually conforms to the Protocol via runtime_checkable + isinstance (catches drift at CI time, not deployment).
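The lazy-import gating described in the factory bullet above can be sketched as follows. This is a minimal illustration under stated assumptions, not the shipped factory: the BUILD_FLAGS dict, the demo_* module names, the STRATEGIES table, and the make_config helper are all hypothetical, and importlib.import_module is used here as one way to defer the import until after the flag check.

```python
import importlib
import sys
import types


class RuntimeNotAvailableError(Exception):
    """Composition-time error: the requested strategy is not built in."""


# Hypothetical stand-in for the ADR-002 BUILD_* flags (Tier-0 style: TRT off).
BUILD_FLAGS = {
    "BUILD_TENSORRT_RUNTIME": False,
    "BUILD_ONNX_TRT_EP_RUNTIME": False,
    "BUILD_PYTORCH_FP16_RUNTIME": True,
}

# label -> (gating flag, module to import, class name). Module names are made up.
STRATEGIES = {
    "tensorrt": ("BUILD_TENSORRT_RUNTIME", "demo_tensorrt_runtime", "TensorrtRuntime"),
    "onnx_trt_ep": ("BUILD_ONNX_TRT_EP_RUNTIME", "demo_onnx_trt_ep_runtime", "OnnxTrtEpRuntime"),
    "pytorch_fp16": ("BUILD_PYTORCH_FP16_RUNTIME", "demo_pytorch_fp16_runtime", "PytorchFp16Runtime"),
}


def build_inference_runtime(config, flags=BUILD_FLAGS):
    """Select a strategy by label; import it only after its flag check passes."""
    label = config.inference.runtime
    flag_name, module_name, class_name = STRATEGIES[label]
    if not flags[flag_name]:
        raise RuntimeNotAvailableError(f"runtime {label!r} requires {flag_name}=ON")
    module = importlib.import_module(module_name)  # never reached when the flag is OFF
    return getattr(module, class_name)()


def make_config(label):
    """Hypothetical config object exposing config.inference.runtime."""
    return types.SimpleNamespace(inference=types.SimpleNamespace(runtime=label))


# Register a fake strategy module so the flag-ON path is importable in this demo.
class PytorchFp16Runtime:
    def current_runtime_label(self):
        return "pytorch_fp16"


_fake_module = types.ModuleType("demo_pytorch_fp16_runtime")
_fake_module.PytorchFp16Runtime = PytorchFp16Runtime
sys.modules["demo_pytorch_fp16_runtime"] = _fake_module

runtime = build_inference_runtime(make_config("pytorch_fp16"))
```

Because the import happens inside the function and only after the flag check, a request for a gated-off label fails before any strategy module is touched, which is the property AC-5 checks through sys.modules.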

Excluded

  • TensorrtRuntime implementation — AZ-298.
  • OnnxTrtEpRuntime implementation — AZ-299.
  • PytorchFp16Runtime implementation — AZ-300.
  • EngineGate validator — AZ-301 (this task defines the error types it raises, not the validator).
  • Background thermal-state polling loop — AZ-302 (this task defines the ThermalState DTO and the thermal_state() Protocol method, not the polling thread).
  • C4 hybrid covariance-mode consumer wiring — owned by E-C4.
  • C10 CacheProvisioner consumer wiring of compile_engine — owned by E-C10.

Acceptance Criteria

AC-1: Protocol is conformance-checkable
Given a class that implements all six Protocol methods with matching signatures
When isinstance(impl, InferenceRuntime) is evaluated under runtime_checkable
Then the result is True; for a class that omits any method, the result is False
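A minimal sketch of the AC-1 check, assuming placeholder signatures: the six method names come from this spec, but every parameter and return type below is a guess, and the contract file remains canonical. FullFake and PartialFake are hypothetical test doubles.

```python
from typing import Any, Protocol, runtime_checkable


class EngineHandle:
    """Opaque marker class; strategies subclass it."""


@runtime_checkable
class InferenceRuntime(Protocol):
    # Parameters and return types are placeholders, not the frozen contract.
    def compile_engine(self, build_config: Any) -> Any: ...
    def deserialize_engine(self, entry: Any) -> EngineHandle: ...
    def infer(self, handle: EngineHandle, inputs: Any) -> Any: ...
    def release_engine(self, handle: EngineHandle) -> None: ...
    def thermal_state(self) -> Any: ...
    def current_runtime_label(self) -> str: ...


class FullFake:
    """Implements all six methods."""

    def compile_engine(self, build_config): return None
    def deserialize_engine(self, entry): return EngineHandle()
    def infer(self, handle, inputs): return inputs
    def release_engine(self, handle): return None
    def thermal_state(self): return None
    def current_runtime_label(self): return "tensorrt"


class PartialFake:
    """Omits thermal_state on purpose."""

    def compile_engine(self, build_config): return None
    def deserialize_engine(self, entry): return EngineHandle()
    def infer(self, handle, inputs): return inputs
    def release_engine(self, handle): return None
    def current_runtime_label(self): return "tensorrt"


full_ok = isinstance(FullFake(), InferenceRuntime)
partial_ok = isinstance(PartialFake(), InferenceRuntime)
```

Note that isinstance against a runtime_checkable Protocol only checks that the members exist; it does not compare signatures. Verifying the "matching signatures" part of AC-1 would need a static type checker or an explicit inspect.signature comparison on top of this.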

AC-2: Frozen DTOs reject mutation
Given a constructed BuildConfig(precision=Fp16, ...), EngineCacheEntry(...), or ThermalState(...) instance
When the test attempts instance.precision = Int8 (or any field reassignment)
Then dataclasses.FrozenInstanceError is raised; the original value is preserved
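The mutation check in AC-2 could look roughly like this. The field lists are assumptions: workspace_mb is invented for illustration, the ThermalState temperature field names (cpu_temp_c, gpu_temp_c) are guesses at the "cpu/gpu temp" fields named in the commit summary, and enforcing invariant I-6 in __post_init__ is one possible choice that the spec does not confirm.

```python
import dataclasses
from dataclasses import dataclass
from enum import Enum


class PrecisionMode(Enum):
    Fp16 = "fp16"
    Int8 = "int8"


@dataclass(frozen=True)
class BuildConfig:
    precision: PrecisionMode
    workspace_mb: int = 1024  # hypothetical field, for illustration only


@dataclass(frozen=True)
class ThermalState:
    cpu_temp_c: float  # assumed field name
    gpu_temp_c: float  # assumed field name
    thermal_throttle_active: bool
    measured_clock_mhz: int
    measured_at_ns: int
    is_telemetry_available: bool

    def __post_init__(self):
        # Invariant I-6: no telemetry implies throttle is reported as False.
        if not self.is_telemetry_available and self.thermal_throttle_active:
            raise ValueError("throttle must be False when telemetry is unavailable")


def mutation_rejected(instance, field_name, value):
    """Return True if assigning to the field raises FrozenInstanceError."""
    try:
        setattr(instance, field_name, value)
    except dataclasses.FrozenInstanceError:
        return True
    return False


cfg = BuildConfig(precision=PrecisionMode.Fp16)
rejected = mutation_rejected(cfg, "precision", PrecisionMode.Int8)
```

dataclasses.replace(cfg, precision=PrecisionMode.Int8) remains available for callers that need a modified copy without touching the original instance.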

AC-3: Error hierarchy catchable as a single family
Given any of the nine documented error subtypes
When the consumer wraps an implementation call in try: ... except c7_inference.errors.RuntimeError
Then every documented subtype is caught; an unrelated Exception is NOT caught (the Protocol's error envelope does not leak into general exception handling)
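A sketch of the catch-as-family behaviour, assuming the family root derives directly from Exception. The root is named RuntimeError inside c7_inference.errors, so in the module below it shadows the built-in of the same name; whether the real root also subclasses the built-in RuntimeError is not stated in this spec, and the sketch assumes it does not. The subtypes are generated dynamically only to keep the example short.

```python
import builtins


class RuntimeError(Exception):  # shadows builtins.RuntimeError in this module
    """Root of the C7 error family (stand-in for c7_inference.errors.RuntimeError)."""


_SUBTYPE_NAMES = (
    "EngineBuildError",
    "EngineDeserializeError",
    "EngineHashMismatchError",
    "EngineSchemaMismatchError",
    "EngineSidecarMissingError",
    "CalibrationCacheError",
    "InferenceError",
    "OutOfMemoryError",
    "TelemetryUnavailableError",
)

# One empty subclass per documented name.
SUBTYPES = {name: type(name, (RuntimeError,), {}) for name in _SUBTYPE_NAMES}


def caught_as_family(exc):
    """Raise exc and report whether the family handler is the one that catches it."""
    try:
        raise exc
    except RuntimeError:  # the family root defined above, not the built-in
        return True
    except Exception:
        return False


family_results = [caught_as_family(cls("boom")) for cls in SUBTYPES.values()]
unrelated_caught = caught_as_family(ValueError("boom"))
builtin_caught = caught_as_family(builtins.RuntimeError("boom"))
```

The name clash matters for consumers: a bare `except RuntimeError` refers to the built-in unless the family root is imported explicitly, so importing the module and catching `errors.RuntimeError` by qualified name is the less ambiguous spelling.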

AC-4: Composition-root factory honours config
Given config.inference.runtime = "tensorrt" and BUILD_TENSORRT_RUNTIME=ON
When build_inference_runtime(config) is called
Then a TensorrtRuntime instance is returned and instance.current_runtime_label() == "tensorrt"

AC-5: Composition-root factory honours BUILD flag gate
Given config.inference.runtime = "tensorrt" and BUILD_TENSORRT_RUNTIME=OFF
When build_inference_runtime(config) is called
Then RuntimeNotAvailableError is raised at composition time with a message naming "tensorrt"; no module-level import of TRT symbols has occurred (verifiable via sys.modules)

AC-6: Unknown runtime label rejected at config load
Given config.inference.runtime = "tensorflow_lite" (not in the enum)
When the config is loaded via AZ-269's loader
Then ConfigSchemaError is raised at load time with a message listing the valid values; build_inference_runtime is never reached
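Load-time rejection of an unknown label can be sketched with a stdlib Enum. Everything here except the three label strings, the field names, and the 1.0 default is an assumption: RuntimeLabel, InferenceSection, and load_inference_section are hypothetical names, the real loader is AZ-269's, and treating engine_cache_dir as required is a guess.

```python
from dataclasses import dataclass
from enum import Enum


class ConfigSchemaError(Exception):
    """Stand-in for the loader's schema error."""


class RuntimeLabel(str, Enum):
    TENSORRT = "tensorrt"
    ONNX_TRT_EP = "onnx_trt_ep"
    PYTORCH_FP16 = "pytorch_fp16"


@dataclass(frozen=True)
class InferenceSection:
    runtime: RuntimeLabel
    engine_cache_dir: str
    thermal_poll_hz: float = 1.0


def load_inference_section(raw):
    """Validate a raw mapping; reject unknown runtime labels with the valid list."""
    valid = [label.value for label in RuntimeLabel]
    try:
        runtime = RuntimeLabel(raw["runtime"])  # ValueError on an unknown label
    except (KeyError, ValueError) as exc:
        raise ConfigSchemaError(
            f"inference.runtime must be one of {valid}, got {raw.get('runtime')!r}"
        ) from exc
    return InferenceSection(
        runtime=runtime,
        engine_cache_dir=raw["engine_cache_dir"],
        thermal_poll_hz=float(raw.get("thermal_poll_hz", 1.0)),
    )


section = load_inference_section({"runtime": "tensorrt", "engine_cache_dir": "/tmp/engines"})
```

Because the enum conversion happens in the loader, an invalid label never reaches the factory, which matches the "build_inference_runtime is never reached" clause.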

AC-7: current_runtime_label() matches config value exactly
Given any selectable runtime
When instance.current_runtime_label() is called
Then the returned string is one of "tensorrt", "onnx_trt_ep", "pytorch_fp16" and equals config.inference.runtime; AC-NEW-3 audit relies on this exact-match property

AC-8: Contract file matches Protocol shape
Given the contract file at _docs/02_document/contracts/c7_inference/inference_runtime_protocol.md
When a contract-test parses the Shape section's method/field tables and compares against the runtime Protocol via introspection
Then every method, every field, every error type is present and consistent in both
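One way the AC-8 parity check might work is to pull method names out of the contract's Shape table and compare them with the Protocol's public callables. The table layout below (a pipe table whose first cell is a backticked method name) is purely an assumption about the contract file's format, and the Protocol signatures are placeholders.

```python
import re
from typing import Protocol


class InferenceRuntime(Protocol):
    # Signatures are placeholders; only the method names come from the spec.
    def compile_engine(self): ...
    def deserialize_engine(self): ...
    def infer(self): ...
    def release_engine(self): ...
    def thermal_state(self): ...
    def current_runtime_label(self): ...


# Hypothetical excerpt of the contract file's Shape section.
CONTRACT_SHAPE = """\
| Method | Notes |
|--------|-------|
| `compile_engine` | builds an engine |
| `deserialize_engine` | loads a cached engine |
| `infer` | runs one inference |
| `release_engine` | frees an engine |
| `thermal_state` | latest telemetry snapshot |
| `current_runtime_label` | lowercase label |
"""


def methods_in_contract(markdown):
    """Collect backticked identifiers from the first cell of each table row."""
    return set(re.findall(r"^\|\s*`(\w+)`", markdown, flags=re.MULTILINE))


def methods_in_protocol(proto):
    """Collect public callables declared directly on the Protocol class."""
    return {
        name
        for name, value in vars(proto).items()
        if callable(value) and not name.startswith("_")
    }


contract_methods = methods_in_contract(CONTRACT_SHAPE)
protocol_methods = methods_in_protocol(InferenceRuntime)
missing_in_code = contract_methods - protocol_methods
missing_in_contract = protocol_methods - contract_methods
```

Reporting the two set differences separately makes a failing test say which side drifted. The same pattern would extend to DTO fields (via dataclasses.fields) and to the error subtypes.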

Non-Functional Requirements

Compatibility

  • The Protocol is typing.Protocol (PEP 544 structural typing) so existing components that import the concrete TRT class today (none yet — this is greenfield) can be retrofitted without inheritance changes.
  • All error types subclass Exception (not BaseException) so except Exception: in upstream layers continues to work as expected.

Performance

  • The factory build_inference_runtime returns within 200 ms at p99 (it imports and constructs one strategy; the heavy GPU work happens inside the strategy's own compile_engine / deserialize_engine calls, not in the factory).
  • DTO construction (BuildConfig, EngineCacheEntry, ThermalState) is dataclass-frozen; per-instance overhead is the bare-cost dataclass __init__.

Reliability

  • The Protocol is the boundary of acceptable runtime errors. Implementations MUST NOT raise other types into consumers; if a third-party library (TRT, ONNX-RT, PyTorch) raises something else, the implementation catches and rewraps into the documented family.
  • Versioning: any breaking change to the Protocol or its DTOs MUST bump the contract file's Version and notify every consumer task listed in the contract header.
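The catch-and-rewrap rule above could be implemented with a small decorator, sketched here under assumptions: C7RuntimeError stands in for c7_inference.errors.RuntimeError (renamed only to avoid shadowing the built-in in this snippet), rewrap_as and FakeStrategy are hypothetical, and mapping every foreign exception from infer to InferenceError is one possible policy rather than the documented one.

```python
import functools


class C7RuntimeError(Exception):
    """Stand-in for the family root c7_inference.errors.RuntimeError."""


class InferenceError(C7RuntimeError):
    """Family member raised for failures during infer()."""


def rewrap_as(error_type):
    """Decorator: let family errors through, rewrap anything else as error_type."""

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            try:
                return fn(*args, **kwargs)
            except C7RuntimeError:
                raise  # already inside the documented family
            except Exception as exc:
                # Chain the original so logs keep the third-party traceback.
                raise error_type(str(exc)) from exc

        return wrapper

    return decorator


class FakeStrategy:
    @rewrap_as(InferenceError)
    def infer(self, handle, inputs):
        # Simulates a third-party backend raising a non-family exception.
        raise ValueError("backend rejected input shape")


try:
    FakeStrategy().infer(handle=None, inputs=None)
except C7RuntimeError as exc:
    captured = exc
```

Using `raise ... from exc` keeps the original exception on `__cause__`, so the consumer sees only family types while diagnostics still show what the backend raised.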

Unit Tests

| AC Ref | What to Test | Required Outcome |
|--------|--------------|------------------|
| AC-1 | runtime_checkable Protocol vs. a fully-implementing fake; vs. a fake missing one method | isinstance returns True for full, False for partial |
| AC-2 | Mutation attempt on each frozen DTO | FrozenInstanceError raised; original value preserved |
| AC-3 | Raise each of the nine error subtypes; catch as c7_inference.errors.RuntimeError | All caught; an unrelated ValueError is NOT caught by the same handler |
| AC-4 | build_inference_runtime with tensorrt + flag ON → fake TensorrtRuntime | Returned instance is TensorrtRuntime; current_runtime_label() == "tensorrt" |
| AC-5 | build_inference_runtime with tensorrt + flag OFF | RuntimeNotAvailableError; sys.modules does NOT contain c7_inference.tensorrt_runtime |
| AC-6 | Config load with invalid runtime value | ConfigSchemaError; valid values listed in message |
| AC-7 | current_runtime_label() for each strategy | Matches the config value used to construct it |
| AC-8 | Contract introspection vs. Protocol introspection | Shape parity test passes |
| NFR-perf-factory | Microbench build_inference_runtime × 1000 | p99 ≤ 200 ms (dominated by lazy import on first call; subsequent calls << 1 ms) |
| NFR-reliability-error-family | All nine subtypes inherit from c7_inference.errors.RuntimeError | Verified via issubclass for each |
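For the NFR-perf-factory row, a microbenchmark might be structured as below. This is a sketch: the nearest-rank percentile definition is an assumption (the spec does not say which p99 definition applies), and a no-op callable stands in for build_inference_runtime.

```python
import math
import time


def p99(samples):
    """Nearest-rank 99th percentile of a non-empty sequence."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(99 * len(ordered) / 100))
    return ordered[rank - 1]


def bench(factory, calls=1000):
    """Time each call with a monotonic high-resolution clock; return seconds per call."""
    durations = []
    for _ in range(calls):
        start = time.perf_counter()
        factory()
        durations.append(time.perf_counter() - start)
    return durations


def fake_factory():
    """Placeholder for build_inference_runtime(config)."""
    return object()


durations = bench(fake_factory, calls=1000)
budget_met = p99(durations) <= 0.200  # 200 ms budget from the NFR
```

With 1000 samples the nearest-rank p99 is the 990th smallest duration, so a single slow first call (the lazy import) does not by itself breach the budget.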

Constraints

  • The Protocol uses typing.Protocol from stdlib; no third-party Protocol library is introduced.
  • DTO dataclasses use stdlib dataclasses with frozen=True; no pydantic or attrs dependency.
  • EngineHandle is an opaque marker class — consumers MUST NOT introspect its fields. Each strategy subclasses with implementation-specific state. The Protocol exposes EngineHandle as the type but consumers treat it as a token to pass back to the same strategy.
  • Lazy import of concrete strategies is mandatory. The factory's if BUILD_TENSORRT_RUNTIME: from c7_inference.tensorrt_runtime import TensorrtRuntime block is not optional — it is the mechanism by which Tier-0 workstation builds compile without TRT installed.
  • The contract file at _docs/02_document/contracts/c7_inference/inference_runtime_protocol.md is the source of truth. If the Protocol shape changes here without the contract updating, that is a Spec-Gap finding (High) per code-review skill Phase 2.
  • This task does NOT add new third-party dependencies — typing.Protocol, dataclasses, enum are stdlib.
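The opaque-token constraint on EngineHandle can be illustrated with a toy strategy. DemoStrategy, _DemoHandle, and the string-based "engines" are hypothetical; the point is only that the consumer stores the handle and passes it back, while the strategy alone reads its private state.

```python
import itertools


class EngineHandle:
    """Opaque marker: consumers hold it and pass it back, nothing more."""


class _DemoHandle(EngineHandle):
    """Strategy-private subclass carrying implementation-specific state."""

    def __init__(self, slot):
        self._slot = slot


class DemoStrategy:
    def __init__(self):
        self._engines = {}
        self._slots = itertools.count()  # never reuses a slot after release

    def deserialize_engine(self, name):
        handle = _DemoHandle(next(self._slots))
        self._engines[handle._slot] = name
        return handle  # typed as EngineHandle from the consumer's point of view

    def infer(self, handle, inputs):
        engine = self._engines[handle._slot]  # only the strategy looks inside
        return f"{engine}:{inputs}"

    def release_engine(self, handle):
        self._engines.pop(handle._slot, None)


# Consumer side: treat the handle as a token.
strategy = DemoStrategy()
handle = strategy.deserialize_engine("vpr_backbone")
result = strategy.infer(handle, "frame0")
strategy.release_engine(handle)
```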

Risks & Mitigation

Risk 1: Protocol drift between contract and code

  • Risk: Implementations diverge from the contract over time; consumers cannot tell which is canonical.
  • Mitigation: AC-8 contract-introspection test runs in CI; any drift fails the test before merge. The contract file's ## Test Cases section names this exact test.

Risk 2: Lazy-import gating is bypassed by a transitively-imported module

  • Risk: A consumer imports c7_inference (the package) and the package's __init__.py eagerly imports a concrete strategy, triggering the TRT import even when BUILD_TENSORRT_RUNTIME=OFF.
  • Mitigation: The package __init__.py re-exports ONLY the Protocol and DTOs and errors — it does NOT import any concrete strategy. AC-5 verifies via sys.modules that no strategy module is loaded during a Tier-0 factory call.

Risk 3: Error hierarchy widens silently

  • Risk: A future strategy adds a tenth error type without updating the contract or the family base class.
  • Mitigation: The contract file lists the canonical nine. Implementations MUST raise only members of the c7_inference.errors.RuntimeError family; a strategy raising a non-family error is a Spec-Gap finding (High) at code-review time. AC-3's catch-as-family test catches the obvious case.

Runtime Completeness

  • Named capability: typed Protocol + DTOs + error envelope + composition-root selection (architecture / E-C7 / ADR-001 + ADR-002 + ADR-009).
  • Production code that must exist: real Protocol declaration, real frozen DTOs, real error hierarchy, real composition-root factory with lazy-import gating, real config-loader extension for the runtime enum.
  • Allowed external stubs: tests MAY substitute fake strategy classes that conform to the Protocol; production wiring uses the real strategies from AZ-298 / AZ-299 / AZ-300.
  • Unacceptable substitutes: ABCs instead of typing.Protocol (would force inheritance changes downstream), pydantic.BaseModel instead of @dataclass(frozen=True) (would add a runtime validation layer this task does not need), eager imports of concrete strategies in __init__.py (would defeat BUILD_* gating), or a runtime: str config field without an enum (would lose the load-time validation in AC-6).

Contract

This task produces/implements the contract at _docs/02_document/contracts/c7_inference/inference_runtime_protocol.md. Consumers MUST read that file — not this task spec — to discover the interface.