mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-22 12:51:12 +00:00
[AZ-297] C7 InferenceRuntime: Protocol + DTOs + factory
Freezes the c7_inference Public API per _docs/02_document/contracts/c7_inference/inference_runtime_protocol.md v1.0.0:

- InferenceRuntime Protocol (6 methods: compile_engine, deserialize_engine, infer, release_engine, thermal_state, current_runtime_label) in components/c7_inference/interface.py.
- DTOs (PrecisionMode enum, OptimizationProfile, BuildConfig, EngineCacheEntry, EngineHandle opaque marker) in _types/inference.py, placed at the L1 types layer so C10 can re-export EngineCacheEntry without crossing the components.* boundary (AZ-270 AC-6).
- ThermalState DTO expanded in _types/thermal.py from the AZ-355 forward-declared stub to the AZ-297 contract shape (cpu/gpu temp, thermal_throttle_active, measured_clock_mhz, measured_at_ns, is_telemetry_available). Invariant I-6: when telemetry is unavailable, throttle is False.
- Error family rooted at c7_inference.errors.RuntimeError (9 subtypes: EngineBuildError, EngineDeserializeError, EngineHashMismatchError, EngineSchemaMismatchError, EngineSidecarMissingError, CalibrationCacheError, InferenceError, OutOfMemoryError, TelemetryUnavailableError). RuntimeNotAvailableError stays in runtime_root/errors.py (composition-time, outside the family).
- C7InferenceConfig per-component config block (runtime label, thermal_poll_hz, engine_cache_dir) with constructor-time validation rejecting unknown runtime labels.
- Composition-root factory build_inference_runtime in runtime_root/inference_factory.py with three BUILD_* gates (BUILD_TENSORRT_RUNTIME, BUILD_ONNX_TRT_EP_RUNTIME, BUILD_PYTORCH_FP16_RUNTIME). Concrete strategy modules are imported lazily via __import__ AFTER the flag check, so a Tier-0 build with the flag OFF MUST NOT load the strategy module (AC-5 / I-5; verifiable via sys.modules).
- 37 conformance tests cover all 8 ACs + NFR-perf-factory (p99 build under 200 ms × 1000 calls) + NFR-reliability-error-family. AC-8 introspects the contract file's Shape table and asserts method parity against the runtime Protocol; also asserts all 9 error subtypes are documented.

Retired the AZ-263 scaffolding EngineCacheEntry from _types/manifests.py (replaced by the AZ-297 canonical shape in _types/inference.py); updated the LightGlue-flavoured EngineHandle Protocol docstring in _types/manifests.py to rationalize its intentional dual existence with the C7 opaque EngineHandle (same name, different consumer-side cut, mirroring the C4/C5 ISam2GraphHandle pattern).

Stale ThermalState.throttle docstring references in c4_pose/config.py, c4_pose/interface.py, and _types/pose.py updated to thermal_throttle_active.

Full unit-test sweep: 843 passed, 2 pre-existing environment skips (cmake, actionlint).

Co-authored-by: Cursor <cursoragent@cursor.com>
@@ -1,167 +0,0 @@
# C7 InferenceRuntime Protocol + Composition-Root Selection

**Task**: AZ-297_c7_runtime_protocol
**Name**: C7 InferenceRuntime Protocol
**Description**: Define the `InferenceRuntime` Protocol, its DTOs (`BuildConfig`, `EngineCacheEntry`, `EngineHandle`, `ThermalState`), the runtime error taxonomy, and the composition-root selection switch that wires exactly one of `TensorrtRuntime` / `OnnxTrtEpRuntime` / `PytorchFp16Runtime` at startup based on ADR-001 (config) and ADR-002 (`BUILD_*` flags). This is the foundational shared-API task for E-C7: every other E-C7 task implements this Protocol, and five external components (C2, C2.5, C3, C3.5, C10) plus C4 (ThermalState consumer) depend on the contract this task freezes.
**Complexity**: 3 points
**Dependencies**: AZ-263_initial_structure, AZ-269_config_loader, AZ-266_log_module, AZ-280_sha256_sidecar, AZ-281_engine_filename_schema
**Component**: c7_inference (epic AZ-249 / E-C7)
**Tracker**: AZ-297
**Epic**: AZ-249 (E-C7)

### Document Dependencies

- `_docs/02_document/contracts/shared_helpers/sha256_sidecar.md`: `EngineCacheEntry` carries the sha256 of the engine binary; this contract defines that representation.
- `_docs/02_document/contracts/shared_helpers/engine_filename_schema.md`: `EngineCacheEntry` carries the parsed `(SM, JP, TRT, precision)` tuple from the filename schema.
- `_docs/02_document/contracts/shared_config/composition_root_protocol.md`: runtime selection is a Config field; this contract defines the field and the runtime-label vocabulary.
- `_docs/02_document/contracts/shared_logging/log_record_schema.md`: error events emitted by Protocol implementations use this log shape.

## Problem

Five different components (C2 VPR backbone, C2.5 ReRanker, C3 CrossDomainMatcher, C3.5 AdHoP, C10 CacheProvisioner) and one consumer of the thermal-throttle telemetry feed (C4 Pose) all need a single, frozen interface to the on-Jetson inference runtime. Without it:

- Each consumer would import a concrete TRT / ONNX-RT / PyTorch class directly, hard-coding the runtime choice and breaking ADR-001's runtime selectability.
- `BUILD_TENSORRT_RUNTIME=OFF` (Tier-0 workstation) would not compile because consumers depend on TRT-specific symbols.
- The composition root would have to know per-component which runtime is acceptable; today only ADR-001 (config) + ADR-002 (`BUILD_*` flags) decide.
- Error handling would diverge per runtime; `EngineHashMismatchError` (D-C10-3) and `EngineSchemaMismatchError` (D-C10-7) would have different shapes per implementation, making the F2 takeoff abort path fragile.
- The C4 hybrid covariance decision (D-CROSS-LATENCY-1) would have no canonical `ThermalState` shape to read.

This task delivers the typed boundary every consumer reads against and every implementation conforms to. It writes no runtime logic; the concrete TRT / ONNX-RT / PyTorch strategies are AZ-298 / AZ-299 / AZ-300.

## Outcome

- An `InferenceRuntime` Protocol (PEP 544 `typing.Protocol`) is exported from `src/gps_denied_onboard/components/c7_inference/interface.py` and re-exported from the component's `__init__.py`.
- The DTOs `BuildConfig`, `EngineCacheEntry`, `EngineHandle`, `ThermalState` are available at the same import path. `BuildConfig`, `EngineCacheEntry`, and `ThermalState` are frozen dataclasses; `EngineHandle` is an opaque marker class. Field shape and invariants match the contract file.
- The runtime error taxonomy is a single hierarchy under `c7_inference.errors`: `RuntimeError` ← {`EngineBuildError`, `EngineDeserializeError`, `EngineHashMismatchError`, `EngineSchemaMismatchError`, `EngineSidecarMissingError`, `CalibrationCacheError`, `InferenceError`, `OutOfMemoryError`, `TelemetryUnavailableError`}. Every implementation raises only these; consumers catch only these.
- The composition root has a `build_inference_runtime(config: Config) -> InferenceRuntime` factory function that selects the strategy by `config.inference.runtime` (`tensorrt` | `onnx_trt_ep` | `pytorch_fp16`) and respects compile-time `BUILD_*` gating: requesting a strategy whose `BUILD_*` flag is OFF raises `RuntimeNotAvailableError` at composition time (NOT at first inference).
- Every implementation's `current_runtime_label()` returns the lowercase label matching the config value (`"tensorrt"`, `"onnx_trt_ep"`, `"pytorch_fp16"`); this is the FDR-stamped label for AC-NEW-3 audit.
- A frozen contract file at `_docs/02_document/contracts/c7_inference/inference_runtime_protocol.md` carries the full shape; consumers read that file, not this task spec.
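
As a rough illustration of the boundary listed above, the Protocol could look like the sketch below. Only the class name, the six method names, and the DTO names come from this spec; every parameter list and return type is an assumption made for the example, and the authoritative shape is whatever the contract file says.

```python
from __future__ import annotations

from pathlib import Path
from typing import Any, Protocol, runtime_checkable


class BuildConfig: ...      # placeholder; a frozen dataclass in the spec
class ThermalState: ...     # placeholder; a frozen dataclass in the spec


class EngineHandle:
    """Opaque marker. Strategies subclass it with their own state."""


@runtime_checkable
class InferenceRuntime(Protocol):
    # The six method names are from the spec; the signatures are assumed.
    def compile_engine(self, onnx_path: Path, config: BuildConfig) -> Path: ...
    def deserialize_engine(self, engine_path: Path) -> EngineHandle: ...
    def infer(self, handle: EngineHandle, inputs: dict[str, Any]) -> dict[str, Any]: ...
    def release_engine(self, handle: EngineHandle) -> None: ...
    def thermal_state(self) -> ThermalState: ...
    def current_runtime_label(self) -> str: ...


# Public members declared on the Protocol class body.
PROTOCOL_METHODS = sorted(n for n in vars(InferenceRuntime) if not n.startswith("_"))
```

Because the Protocol is structural, a strategy class does not need to inherit from `InferenceRuntime`; it only needs to define the same members.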

## Scope

### Included

- `InferenceRuntime` Protocol with the six methods from `_docs/02_document/components/09_c7_inference/description.md` § 2: `compile_engine`, `deserialize_engine`, `infer`, `release_engine`, `thermal_state`, `current_runtime_label`.
- DTO dataclasses for `BuildConfig`, `EngineCacheEntry`, `EngineHandle` (opaque marker class), `ThermalState`. All frozen except `EngineHandle` (which is opaque to consumers; implementations subclass it).
- Error hierarchy under `c7_inference.errors` covering every error type the Protocol promises. All types derive from a common `c7_inference.errors.RuntimeError` so consumers can catch the family.
- `build_inference_runtime(config) -> InferenceRuntime` composition-root factory in `src/gps_denied_onboard/runtime_root/inference_factory.py`. The factory imports the concrete strategy lazily, guarded by `if BUILD_TENSORRT_RUNTIME: from c7_inference.tensorrt_runtime import TensorrtRuntime`, so an OFF flag does not force an import.
- A `RuntimeNotAvailableError` raised by the factory when the requested strategy is not built into this binary.
- An extension to AZ-269's config loader for the new `config.inference.runtime` enum, the optional `config.inference.thermal_poll_hz` (default 1.0), and the `config.inference.engine_cache_dir` field; invalid values raise `ConfigSchemaError`.
- The contract file at `_docs/02_document/contracts/c7_inference/inference_runtime_protocol.md` filled per `decompose/templates/api-contract.md` with Shape, Invariants, Non-Goals, Versioning Rules, and at least three Test Cases.
- Type-only unit tests that verify each concrete strategy module's class actually conforms to the Protocol via `runtime_checkable` + `isinstance` (catches drift at CI time, not deployment).

### Excluded

- `TensorrtRuntime` implementation: AZ-298.
- `OnnxTrtEpRuntime` implementation: AZ-299.
- `PytorchFp16Runtime` implementation: AZ-300.
- `EngineGate` validator: AZ-301 (this task defines the error types it raises, not the validator).
- Background thermal-state polling loop: AZ-302 (this task defines the `ThermalState` DTO and the `thermal_state()` Protocol method, not the polling thread).
- C4 hybrid covariance-mode consumer wiring: owned by E-C4.
- C10 CacheProvisioner consumer wiring of `compile_engine`: owned by E-C10.

## Acceptance Criteria

**AC-1: Protocol is conformance-checkable**
Given a class that implements all six Protocol methods with matching signatures
When `isinstance(impl, InferenceRuntime)` is evaluated under `runtime_checkable`
Then the result is `True`; for a class that omits any method, the result is `False`
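
A small sketch of the AC-1 check follows. To stay short it uses a reduced two-method stand-in Protocol (`LabelledRuntime`, a hypothetical name) rather than the full six-method `InferenceRuntime`, and the fake classes are illustrative. One caveat worth keeping in mind: `isinstance` against a `runtime_checkable` Protocol only checks that the members exist, not that their signatures match.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class LabelledRuntime(Protocol):
    """Reduced stand-in for InferenceRuntime (two of the six methods)."""

    def infer(self, handle: object, inputs: dict) -> dict: ...
    def current_runtime_label(self) -> str: ...


class FullFake:
    def infer(self, handle: object, inputs: dict) -> dict:
        return dict(inputs)

    def current_runtime_label(self) -> str:
        return "pytorch_fp16"


class PartialFake:
    # Omits infer(), so the structural check fails.
    def current_runtime_label(self) -> str:
        return "pytorch_fp16"


class WrongSignatureFake:
    # infer() has the wrong parameters but the name is present.
    def infer(self) -> None:
        return None

    def current_runtime_label(self) -> str:
        return "pytorch_fp16"


full_ok = isinstance(FullFake(), LabelledRuntime)                  # True
partial_ok = isinstance(PartialFake(), LabelledRuntime)            # False
wrong_sig_ok = isinstance(WrongSignatureFake(), LabelledRuntime)   # True
```

If signature parity matters for the conformance tests, it would need a separate check, for example comparing `inspect.signature` results or relying on a static type checker.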

**AC-2: Frozen DTOs reject mutation**
Given a constructed `BuildConfig(precision=Fp16, ...)`, `EngineCacheEntry(...)`, or `ThermalState(...)` instance
When the test attempts `instance.precision = Int8` (or any field reassignment)
Then `dataclasses.FrozenInstanceError` is raised; the original value is preserved
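
The mutation check in AC-2 can be sketched with stdlib dataclasses as below. The field lists are trimmed and partly assumed: `workspace_mb` is hypothetical, the `ThermalState` fields are a subset of those named in the commit message, and enforcing invariant I-6 by raising `ValueError` in `__post_init__` is one possible choice, not necessarily what the real DTO does.

```python
import dataclasses
from dataclasses import dataclass
from enum import Enum


class PrecisionMode(Enum):
    FP16 = "fp16"
    INT8 = "int8"


@dataclass(frozen=True)
class BuildConfig:
    precision: PrecisionMode
    workspace_mb: int = 1024  # hypothetical field, for illustration only


@dataclass(frozen=True)
class ThermalState:
    thermal_throttle_active: bool
    is_telemetry_available: bool
    measured_at_ns: int = 0

    def __post_init__(self) -> None:
        # I-6: without telemetry, throttle must be reported as False.
        if not self.is_telemetry_available and self.thermal_throttle_active:
            raise ValueError("throttle must be False when telemetry is unavailable")


cfg = BuildConfig(precision=PrecisionMode.FP16)
try:
    cfg.precision = PrecisionMode.INT8  # rejected by the frozen dataclass
    mutated = True
except dataclasses.FrozenInstanceError:
    mutated = False
```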

**AC-3: Error hierarchy catchable as a single family**
Given any of the nine documented error subtypes
When the consumer wraps an implementation call in `try: ... except c7_inference.errors.RuntimeError`
Then every documented subtype is caught; an unrelated `Exception` is NOT caught (the Protocol's error envelope does not leak into general exception handling)
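
A minimal sketch of the family, assuming the root derives directly from `Exception`. The spec only says every type subclasses `Exception`, so whether the real root also inherits from the builtin `RuntimeError` is not stated here. The module-level class below deliberately shadows the builtin name, as a module called `c7_inference.errors` would.

```python
import builtins


class RuntimeError(Exception):  # shadows the builtin inside this module
    """Root of the c7_inference error family (assumed to derive from Exception)."""


class EngineBuildError(RuntimeError): ...
class EngineDeserializeError(RuntimeError): ...
class EngineHashMismatchError(RuntimeError): ...
class EngineSchemaMismatchError(RuntimeError): ...
class EngineSidecarMissingError(RuntimeError): ...
class CalibrationCacheError(RuntimeError): ...
class InferenceError(RuntimeError): ...
class OutOfMemoryError(RuntimeError): ...
class TelemetryUnavailableError(RuntimeError): ...


SUBTYPES = (
    EngineBuildError, EngineDeserializeError, EngineHashMismatchError,
    EngineSchemaMismatchError, EngineSidecarMissingError, CalibrationCacheError,
    InferenceError, OutOfMemoryError, TelemetryUnavailableError,
)


def caught_by_family(exc: Exception) -> bool:
    """Return True when the family handler catches exc."""
    try:
        raise exc
    except RuntimeError:  # the family root above, not builtins.RuntimeError
        return True
    except Exception:
        return False
```

Under this assumption a builtin `RuntimeError` raised by a third-party library is not part of the family, so consumers importing the name need to be explicit about which `RuntimeError` they mean.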

**AC-4: Composition-root factory honours config**
Given `config.inference.runtime = "tensorrt"` and `BUILD_TENSORRT_RUNTIME=ON`
When `build_inference_runtime(config)` is called
Then a `TensorrtRuntime` instance is returned and `instance.current_runtime_label() == "tensorrt"`

**AC-5: Composition-root factory honours BUILD flag gate**
Given `config.inference.runtime = "tensorrt"` and `BUILD_TENSORRT_RUNTIME=OFF`
When `build_inference_runtime(config)` is called
Then `RuntimeNotAvailableError` is raised at composition time with a message naming `"tensorrt"`; no module-level import of TRT symbols has occurred (verifiable via `sys.modules`)
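
The gate-then-import order behind AC-4 and AC-5 might be sketched as follows. This is a simplification under stated assumptions: the factory here takes a label and a flag dictionary instead of the real `Config` object, it uses `importlib.import_module` rather than whatever import call the real factory uses, and the `demo_*` module names are stand-ins written to a temporary directory so the example runs anywhere.

```python
from __future__ import annotations

import importlib
import sys
import tempfile
from pathlib import Path


class RuntimeNotAvailableError(Exception):
    """Composition-time error, outside the c7 error family."""


# label -> (build flag, module name, class name). Module names are demo stand-ins.
STRATEGIES = {
    "tensorrt": ("BUILD_TENSORRT_RUNTIME", "demo_tensorrt_runtime", "TensorrtRuntime"),
    "pytorch_fp16": ("BUILD_PYTORCH_FP16_RUNTIME", "demo_pytorch_fp16_runtime", "PytorchFp16Runtime"),
}


def build_inference_runtime(label: str, build_flags: dict[str, bool]):
    flag, module_name, class_name = STRATEGIES[label]
    if not build_flags.get(flag, False):
        # The flag check comes first, so the strategy module is never imported.
        raise RuntimeNotAvailableError(
            f"runtime {label!r} is not built into this binary ({flag}=OFF)"
        )
    module = importlib.import_module(module_name)  # lazy import, after the gate
    return getattr(module, class_name)()


# Demo scaffolding: write tiny stand-in strategy modules to a temp directory.
_tmp = tempfile.mkdtemp()
for _label, (_flag, _mod, _cls) in STRATEGIES.items():
    Path(_tmp, f"{_mod}.py").write_text(
        f"class {_cls}:\n    def current_runtime_label(self):\n        return {_label!r}\n"
    )
sys.path.insert(0, _tmp)
importlib.invalidate_caches()

flags = {"BUILD_TENSORRT_RUNTIME": False, "BUILD_PYTORCH_FP16_RUNTIME": True}
runtime = build_inference_runtime("pytorch_fp16", flags)
```

After a call with the flag OFF, checking that the strategy's module name is absent from `sys.modules` is what makes the "no import happened" claim testable.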

**AC-6: Unknown runtime label rejected at config load**
Given `config.inference.runtime = "tensorflow_lite"` (not in the enum)
When the config is loaded via AZ-269's loader
Then `ConfigSchemaError` is raised at load time with a message listing the valid values; `build_inference_runtime` is never reached
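
One way the load-time rejection in AC-6 could work is an enum lookup that is translated into `ConfigSchemaError`. In the sketch below `ConfigSchemaError` is a local stand-in for the AZ-269 loader's type, and `RuntimeLabel` and `load_inference_config` are hypothetical names; only the three label strings, the field names, and the 1.0 default come from this spec.

```python
from dataclasses import dataclass
from enum import Enum


class ConfigSchemaError(Exception):
    """Stand-in for the AZ-269 loader's error type."""


class RuntimeLabel(str, Enum):
    TENSORRT = "tensorrt"
    ONNX_TRT_EP = "onnx_trt_ep"
    PYTORCH_FP16 = "pytorch_fp16"


@dataclass(frozen=True)
class C7InferenceConfig:
    runtime: RuntimeLabel
    engine_cache_dir: str
    thermal_poll_hz: float = 1.0


def load_inference_config(raw: dict) -> C7InferenceConfig:
    """Validate the raw `inference` block; hypothetical helper for illustration."""
    try:
        runtime = RuntimeLabel(raw["runtime"])
    except ValueError:
        valid = ", ".join(label.value for label in RuntimeLabel)
        raise ConfigSchemaError(
            f"inference.runtime={raw['runtime']!r} is invalid; valid values: {valid}"
        ) from None
    return C7InferenceConfig(
        runtime=runtime,
        engine_cache_dir=raw["engine_cache_dir"],
        thermal_poll_hz=float(raw.get("thermal_poll_hz", 1.0)),
    )


ok = load_inference_config({"runtime": "tensorrt", "engine_cache_dir": "/tmp/engines"})
```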

**AC-7: `current_runtime_label()` matches config value exactly**
Given any selectable runtime
When `instance.current_runtime_label()` is called
Then the returned string is one of `"tensorrt"`, `"onnx_trt_ep"`, `"pytorch_fp16"` and equals `config.inference.runtime`; AC-NEW-3 audit relies on this exact-match property

**AC-8: Contract file matches Protocol shape**
Given the contract file at `_docs/02_document/contracts/c7_inference/inference_runtime_protocol.md`
When a contract-test parses the Shape section's method/field tables and compares against the runtime Protocol via introspection
Then every method, every field, every error type is present and consistent in both
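
A parity test along the lines of AC-8 could compare backticked names in the contract's Shape table with the Protocol's public members. The layout of the real contract file is not shown in this spec, so the Markdown excerpt below is invented for the example, as are the helper names and the `*args` signatures.

```python
import re
from typing import Protocol


class InferenceRuntime(Protocol):
    # Signatures elided; only the member names matter for this check.
    def compile_engine(self, *args, **kwargs): ...
    def deserialize_engine(self, *args, **kwargs): ...
    def infer(self, *args, **kwargs): ...
    def release_engine(self, *args, **kwargs): ...
    def thermal_state(self, *args, **kwargs): ...
    def current_runtime_label(self, *args, **kwargs): ...


# Hypothetical excerpt; the real file would be read from disk.
CONTRACT_EXCERPT = """
## Shape

| Method | Purpose |
|--------|---------|
| `compile_engine` | build an engine |
| `deserialize_engine` | load a cached engine |
| `infer` | run one inference |
| `release_engine` | free an engine |
| `thermal_state` | read the latest telemetry |
| `current_runtime_label` | report the runtime label |
"""


def contract_methods(markdown: str) -> set:
    """Collect the first backticked identifier of each table row."""
    return set(re.findall(r"^\|\s*`(\w+)`", markdown, flags=re.MULTILINE))


def protocol_methods(proto: type) -> set:
    """Collect public callables declared on the Protocol class body."""
    return {
        name for name, member in vars(proto).items()
        if callable(member) and not name.startswith("_")
    }


in_parity = contract_methods(CONTRACT_EXCERPT) == protocol_methods(InferenceRuntime)
```

The same idea extends to DTO fields (via `dataclasses.fields`) and to the error subtypes, which this sketch leaves out.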

## Non-Functional Requirements

**Compatibility**
- The Protocol is `typing.Protocol` (PEP 544 structural typing) so existing components that import the concrete TRT class today (none yet; this is greenfield) can be retrofitted without inheritance changes.
- All error types subclass `Exception` (not `BaseException`) so `except Exception:` in upstream layers continues to work as expected.

**Performance**
- The factory `build_inference_runtime` returns within 200 ms (it imports + constructs one strategy; the heavy GPU work happens inside the strategy's own `compile_engine` / `deserialize_engine` calls, not in the factory).
- The DTOs (`BuildConfig`, `EngineCacheEntry`, `ThermalState`) are frozen dataclasses; per-instance construction overhead is that of the generated dataclass `__init__`, with no extra validation layer.

**Reliability**
- The Protocol is the boundary of acceptable runtime errors. Implementations MUST NOT raise other types into consumers; if a third-party library (TRT, ONNX-RT, PyTorch) raises something else, the implementation catches and rewraps into the documented family.
- Versioning: any breaking change to the Protocol or its DTOs MUST bump the contract file's `Version` and notify every consumer task listed in the contract header.
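
The catch-and-rewrap rule could be implemented in several ways; a decorator is one option, sketched below. `C7RuntimeError` stands in for `c7_inference.errors.RuntimeError`, and `rewrap_as` and `FakeStrategy` are hypothetical names. The `KeyError` simulates a third-party library raising something outside the family.

```python
import functools


class C7RuntimeError(Exception):
    """Stand-in for c7_inference.errors.RuntimeError."""


class InferenceError(C7RuntimeError):
    pass


def rewrap_as(family_type):
    """Convert any non-family exception raised by the wrapped call into family_type."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            try:
                return fn(*args, **kwargs)
            except C7RuntimeError:
                raise  # already in the family, pass through unchanged
            except Exception as exc:
                # Chain the original so logs keep the third-party traceback.
                raise family_type(f"{fn.__name__} failed: {exc!r}") from exc
        return wrapper
    return decorator


class FakeStrategy:
    @rewrap_as(InferenceError)
    def infer(self, inputs: dict) -> dict:
        # A missing "x" raises KeyError, standing in for a library error.
        return {"out": inputs["x"] * 2}
```

Chaining with `from exc` keeps the original exception reachable through `__cause__`, which may be useful for the log records emitted on the error path.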

## Unit Tests

| AC Ref | What to Test | Required Outcome |
|--------|-------------|-----------------|
| AC-1 | `runtime_checkable` Protocol vs. a fully-implementing fake; vs. a fake missing one method | `isinstance` returns True for full, False for partial |
| AC-2 | Mutation attempt on each frozen DTO | `FrozenInstanceError` raised; original value preserved |
| AC-3 | Raise each of the nine error subtypes; catch as `c7_inference.errors.RuntimeError` | All caught; an unrelated `ValueError` is NOT caught by the same handler |
| AC-4 | `build_inference_runtime` with `tensorrt` + flag ON → fake `TensorrtRuntime` | Returned instance is `TensorrtRuntime`; `current_runtime_label()` == `"tensorrt"` |
| AC-5 | `build_inference_runtime` with `tensorrt` + flag OFF | `RuntimeNotAvailableError`; `sys.modules` does NOT contain `c7_inference.tensorrt_runtime` |
| AC-6 | Config load with invalid `runtime` value | `ConfigSchemaError`; valid values listed in message |
| AC-7 | `current_runtime_label()` for each strategy | Matches the config value used to construct it |
| AC-8 | Contract introspection vs. Protocol introspection | Shape parity test passes |
| NFR-perf-factory | Microbench `build_inference_runtime` × 1000 | p99 ≤ 200 ms (dominated by lazy import on first call; subsequent calls << 1 ms) |
| NFR-reliability-error-family | All nine subtypes inherit from `c7_inference.errors.RuntimeError` | Verified via `issubclass` for each |
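
For the NFR-perf-factory row, a p99 measurement over 1000 calls might be taken with a helper like the one below. `p99_ms` is a hypothetical name, and the no-op lambda stands in for a call to `build_inference_runtime`.

```python
import statistics
import time


def p99_ms(fn, calls: int = 1000) -> float:
    """Time fn() `calls` times and return the 99th percentile in milliseconds."""
    samples = []
    for _ in range(calls):
        start = time.perf_counter_ns()
        fn()
        samples.append((time.perf_counter_ns() - start) / 1e6)
    # quantiles(n=100) returns 99 cut points; the last is the 99th percentile.
    return statistics.quantiles(samples, n=100)[-1]


p99 = p99_ms(lambda: None, calls=200)
```

With 1000 samples, a single slow first call (the cold lazy import) sits above the 99th percentile and so barely moves this figure. A test that also wants to bound the cold call would need to assert on the first sample or on the maximum as well.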

## Constraints

- The Protocol uses `typing.Protocol` from stdlib; no third-party Protocol library is introduced.
- DTO dataclasses use stdlib `dataclasses` with `frozen=True`; no `pydantic` or `attrs` dependency.
- `EngineHandle` is an opaque marker class; consumers MUST NOT introspect its fields. Each strategy subclasses with implementation-specific state. The Protocol exposes `EngineHandle` as the type but consumers treat it as a token to pass back to the same strategy.
- Lazy import of concrete strategies is mandatory. The factory's `if BUILD_TENSORRT_RUNTIME: from c7_inference.tensorrt_runtime import TensorrtRuntime` block is not optional; it is the mechanism by which Tier-0 workstation builds compile without TRT installed.
- The contract file at `_docs/02_document/contracts/c7_inference/inference_runtime_protocol.md` is the source of truth. If the Protocol shape changes here without the contract updating, that is a Spec-Gap finding (High) per code-review skill Phase 2.
- This task does NOT add new third-party dependencies: `typing.Protocol`, `dataclasses`, `enum` are stdlib.

## Risks & Mitigation

**Risk 1: Protocol drift between contract and code**
- *Risk*: Implementations diverge from the contract over time; consumers cannot tell which is canonical.
- *Mitigation*: AC-8 contract-introspection test runs in CI; any drift fails the test before merge. The contract file's `## Test Cases` section names this exact test.

**Risk 2: Lazy-import gating is bypassed by a transitively-imported module**
- *Risk*: A consumer imports `c7_inference` (the package) and the package's `__init__.py` eagerly imports a concrete strategy, triggering the TRT import even when `BUILD_TENSORRT_RUNTIME=OFF`.
- *Mitigation*: The package `__init__.py` re-exports ONLY the Protocol, DTOs, and errors; it does NOT import any concrete strategy. AC-5 verifies via `sys.modules` that no strategy module is loaded during a Tier-0 factory call.

**Risk 3: Error hierarchy widens silently**
- *Risk*: A future strategy adds a tenth error type without updating the contract or the family base class.
- *Mitigation*: The contract file lists the canonical nine. Implementations MUST raise only subclasses of `c7_inference.errors.RuntimeError`; a strategy raising a non-family error is a Spec-Gap finding (High) at code-review time. AC-3's catch-as-family test catches the obvious case.

## Runtime Completeness

- **Named capability**: typed Protocol + DTOs + error envelope + composition-root selection (architecture / E-C7 / ADR-001 + ADR-002 + ADR-009).
- **Production code that must exist**: real Protocol declaration, real frozen DTOs, real error hierarchy, real composition-root factory with lazy-import gating, real config-loader extension for the runtime enum.
- **Allowed external stubs**: tests MAY substitute fake strategy classes that conform to the Protocol; production wiring uses the real strategies from AZ-298 / AZ-299 / AZ-300.
- **Unacceptable substitutes**: ABCs instead of `typing.Protocol` (would force inheritance changes downstream), `pydantic.BaseModel` instead of `@dataclass(frozen=True)` (would add a runtime validation layer this task does not need), eager imports of concrete strategies in `__init__.py` (would defeat `BUILD_*` gating), or a `runtime: str` config field without an enum (would lose the load-time validation in AC-6).

## Contract

This task produces/implements the contract at `_docs/02_document/contracts/c7_inference/inference_runtime_protocol.md`.
Consumers MUST read that file, not this task spec, to discover the interface.