mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-22 09:41:13 +00:00
Decompose Step 6 snapshot: 140 task specs + contract docs
Closes out greenfield Step 6 (Decompose) for all 14 components (C1-C13 + cross-cutting helpers/replay). Covers tasks AZ-266..AZ-446 plus the _dependencies_table.md and component contract documents. State file updated to greenfield Step 7 (Implement), not_started.

Co-authored-by: Cursor <cursoragent@cursor.com>
@@ -0,0 +1,176 @@
# Contract: InferenceRuntime Protocol

**Component**: c7_inference
**Producer task**: AZ-297 (`_docs/02_tasks/todo/AZ-297_c7_runtime_protocol.md`)
**Consumer tasks**:
- AZ-298 (TensorrtRuntime, implements)
- AZ-299 (OnnxTrtEpRuntime, implements)
- AZ-300 (PytorchFp16Runtime, implements)
- AZ-301 (EngineGate, uses error types)
- AZ-302 (ThermalState publisher, extends `ThermalState` DTO with `is_telemetry_available`)
- TBD at decompose time: E-C2 (AZ-250), E-C2.5 (AZ-251), E-C3 (AZ-252), E-C3.5 (AZ-253), E-C4 (AZ-254, `ThermalState` consumer), E-C10 (AZ-257, `compile_engine` caller)
**Version**: 1.0.0
**Status**: draft
**Last Updated**: 2026-05-10

## Purpose

Defines the typed boundary between the on-Jetson inference runtime (engine compilation, deserialisation, per-call inference, GPU memory management, thermal-throttle telemetry) and every downstream component that depends on GPU inference. The Protocol is the single point of contact that lets ADR-001 select among three concrete strategies (TensorRT 10.3 production, ONNX Runtime + TRT EP fallback, PyTorch FP16 simple-baseline) at startup, so consumers do not need to know which one is wired.

## Shape

### Protocol surface

The Protocol is a `typing.Protocol` (PEP 544 structural typing) decorated with `@runtime_checkable`.

| Method | Signature | Throws / Errors | Blocking? |
|--------|-----------|-----------------|-----------|
| `compile_engine` | `(model_path: Path, build_config: BuildConfig) -> EngineCacheEntry` | `EngineBuildError`, `CalibrationCacheError` | sync (offline; minutes for INT8) |
| `deserialize_engine` | `(entry: EngineCacheEntry) -> EngineHandle` | `EngineDeserializeError`, `EngineHashMismatchError`, `EngineSchemaMismatchError`, `EngineSidecarMissingError`, `OutOfMemoryError` | sync |
| `infer` | `(handle: EngineHandle, inputs: dict[str, np.ndarray]) -> dict[str, np.ndarray]` | `InferenceError`, `OutOfMemoryError` | sync (GPU stream sync) |
| `release_engine` | `(handle: EngineHandle) -> None` | none (idempotent) | sync |
| `thermal_state` | `() -> ThermalState` | `TelemetryUnavailableError` (only on cold-start fail; steady-state defaults to `is_telemetry_available=False`) | sync |
| `current_runtime_label` | `() -> Literal["tensorrt", "onnx_trt_ep", "pytorch_fp16"]` | none | sync |
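
The table above could be declared as a runtime-checkable Protocol roughly as follows. This is a minimal sketch under stated assumptions, not the AZ-297 implementation: the four DTO classes are empty stand-ins, `Any` replaces `np.ndarray` so the block stays stdlib-only, and `FakeRuntime` and `MissingThermal` are hypothetical stubs that exist only to exercise the structural check.

```python
from __future__ import annotations

from pathlib import Path
from typing import Any, Literal, Protocol, runtime_checkable


# Empty stand-ins so the sketch runs on its own; the contract DTOs are richer.
class BuildConfig: ...
class EngineCacheEntry: ...
class EngineHandle: ...
class ThermalState: ...


RuntimeLabel = Literal["tensorrt", "onnx_trt_ep", "pytorch_fp16"]


@runtime_checkable
class InferenceRuntime(Protocol):
    def compile_engine(self, model_path: Path, build_config: BuildConfig) -> EngineCacheEntry: ...
    def deserialize_engine(self, entry: EngineCacheEntry) -> EngineHandle: ...
    # `Any` stands in for np.ndarray (assumption, to avoid a numpy import).
    def infer(self, handle: EngineHandle, inputs: dict[str, Any]) -> dict[str, Any]: ...
    def release_engine(self, handle: EngineHandle) -> None: ...
    def thermal_state(self) -> ThermalState: ...
    def current_runtime_label(self) -> RuntimeLabel: ...


class FakeRuntime:
    """Hypothetical stub: no inheritance, conforms by shape alone."""

    def compile_engine(self, model_path, build_config): return EngineCacheEntry()
    def deserialize_engine(self, entry): return EngineHandle()
    def infer(self, handle, inputs): return dict(inputs)
    def release_engine(self, handle): return None
    def thermal_state(self): return ThermalState()
    def current_runtime_label(self): return "pytorch_fp16"


class MissingThermal:
    """Hypothetical partial implementation: has no thermal_state method."""

    def compile_engine(self, model_path, build_config): return EngineCacheEntry()
    def deserialize_engine(self, entry): return EngineHandle()
    def infer(self, handle, inputs): return dict(inputs)
    def release_engine(self, handle): return None
    def current_runtime_label(self): return "pytorch_fp16"


full_ok = isinstance(FakeRuntime(), InferenceRuntime)
partial_ok = isinstance(MissingThermal(), InferenceRuntime)
```

Note that `isinstance` against a runtime-checkable Protocol only verifies that the named methods exist; it does not validate signatures or return types, so conformance tests still need to exercise each method.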

### DTOs

All DTOs are stdlib `@dataclass(frozen=True)` (`EngineHandle` is the exception: an opaque marker class).

```python
from dataclasses import dataclass
from enum import Enum
from pathlib import Path
from typing import Optional


class PrecisionMode(str, Enum):
    FP16 = "fp16"
    INT8 = "int8"
    MIXED = "mixed"


@dataclass(frozen=True)
class OptimizationProfile:
    input_name: str
    min_shape: tuple[int, ...]
    opt_shape: tuple[int, ...]
    max_shape: tuple[int, ...]


@dataclass(frozen=True)
class BuildConfig:
    precision: PrecisionMode
    workspace_mb: int
    calibration_dataset: Optional[Path]  # required for INT8; None for FP16/Mixed
    optimization_profiles: tuple[OptimizationProfile, ...]
    use_trtexec: bool = False  # TRT-only hint; ignored by ORT / PyTorch


@dataclass(frozen=True)
class EngineCacheEntry:
    engine_path: Path  # `.engine` for TRT/ORT; `.onnx` for ORT-direct; `.pt` for PyTorch
    sha256_hex: str  # canonical sha256 of engine_path
    sm: Optional[int]  # None for PyTorch (hardware-portable)
    jp: Optional[str]  # JetPack version, e.g. "6.2"
    trt: Optional[str]  # TensorRT version, e.g. "10.3"
    precision: PrecisionMode
    extras: dict[str, str]  # implementation-specific (e.g., calibration cache path)


class EngineHandle:
    """Opaque marker class. Consumers MUST NOT introspect; pass back to the same runtime."""
    pass


@dataclass(frozen=True)
class ThermalState:
    cpu_temp_c: Optional[float]
    gpu_temp_c: Optional[float]
    thermal_throttle_active: bool  # default False on telemetry unavailability
    measured_clock_mhz: Optional[int]
    measured_at_ns: int  # monotonic_ns of poll
    is_telemetry_available: bool  # False if the source is hung/absent (default-safe path)
```

### Error hierarchy

All errors live under `c7_inference.errors`:

```
RuntimeError (Exception subclass; NOT stdlib RuntimeError)
├── EngineBuildError
├── EngineDeserializeError
├── EngineHashMismatchError
├── EngineSchemaMismatchError
├── EngineSidecarMissingError
├── CalibrationCacheError
├── InferenceError
├── OutOfMemoryError
└── TelemetryUnavailableError

RuntimeNotAvailableError (composition-root only; NOT a Protocol family error)
ConfigSchemaError (config-load only; NOT a Protocol family error)
```

Consumers catch the family with `except c7_inference.errors.RuntimeError as e`. Implementations MUST raise only members of this family from Protocol methods; third-party library errors (TRT C++ exceptions, ORT internal errors, PyTorch CUDA errors) MUST be caught and rewrapped.
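
One way the catch-and-rewrap rule could be enforced is a small decorator on each Protocol method. The sketch below is illustrative only: `InferenceRuntimeError` is a stand-in name for `c7_inference.errors.RuntimeError` (renamed here so a single-file example does not shadow the builtin), and `rewrap` and the two demo functions are hypothetical helpers that the contract does not define.

```python
import functools


class InferenceRuntimeError(Exception):
    """Stand-in for c7_inference.errors.RuntimeError (assumed name for this sketch)."""


class InferenceError(InferenceRuntimeError):
    pass


class OutOfMemoryError(InferenceRuntimeError):
    pass


def rewrap(as_error):
    """Decorator: re-raise any non-family exception as `as_error`, chaining the cause."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            try:
                return fn(*args, **kwargs)
            except InferenceRuntimeError:
                raise  # already a family member; pass through unchanged
            except Exception as exc:
                raise as_error(f"{fn.__name__} failed: {exc}") from exc
        return wrapper
    return decorate


@rewrap(InferenceError)
def infer_with_backend_fault():
    # Simulates a third-party library raising its own exception type.
    raise ValueError("backend fault")


@rewrap(InferenceError)
def infer_with_oom():
    # A family member raised inside the method must not be rewrapped.
    raise OutOfMemoryError("GPU allocation failed")


def classify(fn):
    """Run `fn` and report which family member a consumer would observe."""
    try:
        fn()
    except InferenceRuntimeError as err:
        return type(err).__name__, type(err.__cause__).__name__
    return None
```

Chaining with `from exc` keeps the original third-party exception available on `__cause__` for logging, while consumers still only need the single family-level `except` clause.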

### Composition-root factory

Defined in `runtime_root/inference_factory.py` (NOT in `c7_inference` itself; the factory is the wiring layer):

```python
def build_inference_runtime(config: Config) -> InferenceRuntime:
    """
    Selects exactly one strategy by config.inference.runtime + BUILD_* flag gating.
    Raises RuntimeNotAvailableError if the requested strategy's BUILD_* flag is OFF.
    """
```
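
A possible body for this factory is sketched below, assuming the `BUILD_*` flags are available as a plain dict passed in by the caller (the contract does not say how they are exposed, so the extra `build_flags` parameter is an assumption). Only `BUILD_TENSORRT_RUNTIME` and `c7_inference.tensorrt_runtime` appear in this contract; the other flag and module names, the `Config` shape beyond `config.inference.runtime`, and the no-argument constructors are hypothetical.

```python
import importlib
import sys
from dataclasses import dataclass


class RuntimeNotAvailableError(Exception):
    """Composition-root error; deliberately outside the Protocol error family."""


@dataclass(frozen=True)
class InferenceConfig:
    runtime: str  # "tensorrt" | "onnx_trt_ep" | "pytorch_fp16"


@dataclass(frozen=True)
class Config:
    inference: InferenceConfig


# runtime label -> (build flag, module path, class name).
# Entries other than the TensorRT flag and module are assumed by analogy.
_STRATEGIES = {
    "tensorrt": ("BUILD_TENSORRT_RUNTIME", "c7_inference.tensorrt_runtime", "TensorrtRuntime"),
    "onnx_trt_ep": ("BUILD_ONNX_TRT_EP_RUNTIME", "c7_inference.onnx_trt_ep_runtime", "OnnxTrtEpRuntime"),
    "pytorch_fp16": ("BUILD_PYTORCH_FP16_RUNTIME", "c7_inference.pytorch_fp16_runtime", "PytorchFp16Runtime"),
}


def build_inference_runtime(config: Config, build_flags: dict[str, bool]):
    # The label is assumed to be validated already (ConfigSchemaError at config load).
    flag, module_path, class_name = _STRATEGIES[config.inference.runtime]
    if not build_flags.get(flag, False):
        # Raise before importing, so a gated-off strategy is never loaded.
        raise RuntimeNotAvailableError(
            f"{config.inference.runtime!r} requested but {flag} is OFF"
        )
    module = importlib.import_module(module_path)  # lazy import inside the gate
    return getattr(module, class_name)()  # constructor arguments are an assumption


def is_loaded(module_path: str) -> bool:
    """True if the module has been imported in this process."""
    return module_path in sys.modules


def try_build(config: Config, build_flags: dict[str, bool]) -> str:
    """Demo helper: report the outcome instead of raising."""
    try:
        build_inference_runtime(config, build_flags)
    except RuntimeNotAvailableError:
        return "not_available"
    return "built"
```

Checking the flag before calling `importlib.import_module` is what keeps a gated-off strategy out of `sys.modules`, which is the property the `factory-tensorrt-off` test case inspects.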

## Invariants

- **I-1 (single source of truth for runtime label):** `current_runtime_label()` returns a string equal to `config.inference.runtime`. AC-NEW-3 audit relies on this exact-match property.
- **I-2 (Protocol-family error envelope):** Every Protocol method raises only members of the `c7_inference.errors.RuntimeError` family or returns normally. Third-party exceptions are caught and rewrapped.
- **I-3 (frozen DTOs):** `BuildConfig`, `EngineCacheEntry`, `ThermalState`, and `OptimizationProfile` are `@dataclass(frozen=True)`. Mutation attempts raise `FrozenInstanceError`.
- **I-4 (opaque EngineHandle):** Consumers MUST NOT introspect `EngineHandle` fields. Implementations subclass with private state; the Protocol surface is unchanged.
- **I-5 (lazy-import gating):** Concrete strategies are imported only inside the factory's `if BUILD_*:` blocks. The package `__init__.py` exports only the Protocol, DTOs, and errors. A Tier-0 build with `BUILD_TENSORRT_RUNTIME=OFF` MUST NOT load `c7_inference.tensorrt_runtime` (verifiable via `sys.modules`).
- **I-6 (default-safe thermal):** When `ThermalState.is_telemetry_available == False`, `ThermalState.thermal_throttle_active == False` (the steady-state default; consumers may choose to ignore the throttle bit when telemetry is unavailable).
- **I-7 (idempotent release):** `release_engine(handle)` may be called more than once on the same handle; the second and later calls return silently.
- **I-8 (sync-stream `infer`):** `infer` returns only after the GPU stream has synchronised; the returned dict's tensors are host-resident (numpy arrays) and ready for consumer use.

## Non-Goals

- **Not covered: multi-stream concurrent inference.** One CUDA stream per Runtime instance this cycle. Future work if the F3 hot path becomes multi-threaded.
- **Not covered: cross-process engine cache reuse.** Engines are per-process; a separate process must deserialise from the on-disk cache.
- **Not covered: per-frame input/output type negotiation.** Inputs / outputs are numpy arrays in named dicts; type / dtype negotiation is per-strategy and per-engine.
- **Not covered: streaming / iterative inference.** `infer` is request/response; no callbacks, no chunked outputs.
- **Not covered: dynamic batch.** `OptimizationProfile` carries `min_shape / opt_shape / max_shape`, but the consumer is responsible for picking the actual runtime shape; the Protocol does not auto-batch.
- **Not covered: engine versioning / hot-reload.** Engines are loaded at takeoff (F2) and held for the flight; a new engine requires a process restart.
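
Because the consumer picks the runtime shape itself (the dynamic-batch non-goal above), a consumer-side guard might look like the following. This is a sketch, not part of the contract: `shape_fits` is a hypothetical helper, and the input name and shapes in the example profile are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OptimizationProfile:  # same fields as the contract DTO
    input_name: str
    min_shape: tuple[int, ...]
    opt_shape: tuple[int, ...]
    max_shape: tuple[int, ...]


def shape_fits(profile: OptimizationProfile, shape: tuple[int, ...]) -> bool:
    """True if `shape` has the profile's rank and every dim lies in [min, max]."""
    if len(shape) != len(profile.min_shape):
        return False
    return all(
        lo <= dim <= hi
        for lo, dim, hi in zip(profile.min_shape, shape, profile.max_shape)
    )


# Illustrative profile: batch dimension may vary from 1 to 8.
profile = OptimizationProfile(
    input_name="image",
    min_shape=(1, 3, 224, 224),
    opt_shape=(4, 3, 224, 224),
    max_shape=(8, 3, 224, 224),
)
```

Running such a check before `infer` lets the consumer reject an out-of-range batch early instead of relying on the runtime to surface an `InferenceError`.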

## Versioning Rules

- **Breaking changes** (renamed method, removed field, type change, required→optional flip, error-class removed from family) require a new major version (`2.0.0`) and a deprecation path for every consumer task listed in the contract header. The change log MUST list each consuming task that needs a coordinated update.
- **Non-breaking additions** (new optional method via Protocol structural compatibility, new optional field on a DTO with a default, new error variant added to the family) require a minor version bump (e.g., `1.1.0`).
- **Patch** (clarification only; no shape change) is documentation-only.

The current contract is `1.0.0`. It already includes `ThermalState.is_telemetry_available` from AZ-302, an extension that was anticipated as a `1.1.0` addition; because the field was added before the first freeze, the contract will still be `1.0.0` when it is first frozen.

## Test Cases

| Case | Input | Expected | Notes |
|------|-------|----------|-------|
| protocol-conformance-full | A class implementing all six methods | `isinstance(impl, InferenceRuntime) == True` | AZ-297 AC-1 |
| protocol-conformance-partial | A class missing `thermal_state` | `isinstance(impl, InferenceRuntime) == False` | AZ-297 AC-1 |
| frozen-dto-mutation | `BuildConfig(precision=PrecisionMode.FP16, ...).precision = PrecisionMode.INT8` | `FrozenInstanceError` | AZ-297 AC-2 / I-3 |
| error-family-catch-all | Raise each of the nine error subtypes | All caught by `except c7_inference.errors.RuntimeError` | AZ-297 AC-3 / I-2 |
| factory-tensorrt-on | `config.inference.runtime="tensorrt"` + `BUILD_TENSORRT_RUNTIME=ON` | Returns `TensorrtRuntime`; label `"tensorrt"` | AZ-297 AC-4 |
| factory-tensorrt-off | Same config + `BUILD_TENSORRT_RUNTIME=OFF` | `RuntimeNotAvailableError`; `sys.modules` does NOT contain `c7_inference.tensorrt_runtime` | AZ-297 AC-5 / I-5 |
| factory-unknown-runtime | `config.inference.runtime="tensorflow_lite"` | `ConfigSchemaError` at config-load time | AZ-297 AC-6 |
| label-exact-match | Runtime constructed for each of the three strategies | `current_runtime_label()` == `config.inference.runtime` | AZ-297 AC-7 / I-1 |
| contract-introspection-parity | Parse this file's Shape section vs. the runtime Protocol | All methods, fields, errors match | AZ-297 AC-8 |
| thermal-default-safe | `ThermalState(is_telemetry_available=False, thermal_throttle_active=True)` | Implementations MUST NOT construct this: invariant I-6 requires `thermal_throttle_active=False` whenever `is_telemetry_available=False`. A test asserts that the publisher's output respects this. | I-6 |
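
As an illustration of the `frozen-dto-mutation` case, the check could be written along these lines. This is a sketch only: `BuildConfigSketch` is a trimmed, hypothetical stand-in for `BuildConfig` (two fields are enough to show the behaviour), and `mutation_is_rejected` is not a name from the task spec.

```python
from dataclasses import FrozenInstanceError, dataclass
from enum import Enum


class PrecisionMode(str, Enum):
    FP16 = "fp16"
    INT8 = "int8"
    MIXED = "mixed"


@dataclass(frozen=True)
class BuildConfigSketch:
    """Trimmed stand-in for BuildConfig; the contract DTO has more fields."""

    precision: PrecisionMode
    workspace_mb: int


def mutation_is_rejected() -> bool:
    """Return True if assigning to a field of a frozen DTO raises FrozenInstanceError."""
    cfg = BuildConfigSketch(precision=PrecisionMode.FP16, workspace_mb=1024)
    try:
        cfg.precision = PrecisionMode.INT8  # frozen dataclasses block attribute assignment
    except FrozenInstanceError:
        return True
    return False
```

In a pytest suite the same check would more typically use `pytest.raises(FrozenInstanceError)`; the plain `try`/`except` form is used here only to keep the example stdlib-only.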

## Change Log

| Version | Date | Change | Author |
|---------|------|--------|--------|
| 1.0.0 | 2026-05-10 | Initial contract: Protocol + 4 DTOs + 9-error family + composition-root factory + lazy-import gating. Includes the `ThermalState.is_telemetry_available` field added by AZ-302 (no separate version bump because the field landed before first freeze). | autodev (AZ-297 / AZ-302 coordination) |