Decompose Step 6 snapshot: 140 task specs + contract docs

Closes out greenfield Step 6 (Decompose) for all 14 components (C1-C13 + cross-cutting helpers/replay). Covers tasks AZ-266..AZ-446 plus the _dependencies_table.md and component contract documents. State file updated to greenfield Step 7 (Implement), not_started.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-22 09:41:13 +00:00 · 2026-05-11 00:39:48 +03:00
parent 8171fcb29e
commit 880eabcb3f
172 changed files with 22897 additions and 35 deletions
@@ -0,0 +1,176 @@
+# Contract: InferenceRuntime Protocol
+
+**Component**: c7_inference
+**Producer task**: AZ-297 — `_docs/02_tasks/todo/AZ-297_c7_runtime_protocol.md`
+**Consumer tasks**:
+- AZ-298 (TensorrtRuntime — implements)
+- AZ-299 (OnnxTrtEpRuntime — implements)
+- AZ-300 (PytorchFp16Runtime — implements)
+- AZ-301 (EngineGate — uses error types)
+- AZ-302 (ThermalState publisher — extends `ThermalState` DTO with `is_telemetry_available`)
+- TBD at decompose time: E-C2 (AZ-250), E-C2.5 (AZ-251), E-C3 (AZ-252), E-C3.5 (AZ-253), E-C4 (AZ-254 — `ThermalState` consumer), E-C10 (AZ-257 — `compile_engine` caller)
+**Version**: 1.0.0
+**Status**: draft
+**Last Updated**: 2026-05-10
+
+## Purpose
+
+Defines the typed boundary between the on-Jetson inference runtime (engine compilation, deserialisation, per-call inference, GPU memory management, thermal-throttle telemetry) and every downstream component that depends on GPU inference. The Protocol is the single point of contact that lets ADR-001 select between three concrete strategies (TensorRT 10.3 production, ONNX Runtime + TRT EP fallback, PyTorch FP16 simple-baseline) at startup without consumers caring which is wired.
+
+## Shape
+
+### Protocol surface
+
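+The Protocol is a `typing.Protocol` (PEP 544 structural typing) decorated with `@typing.runtime_checkable`, so `isinstance` checks confirm that all six methods are present (member names only; signatures are not checked at runtime).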
+
+| Method | Signature | Throws / Errors | Blocking? |
+|--------|-----------|-----------------|-----------|
+| `compile_engine` | `(model_path: Path, build_config: BuildConfig) -> EngineCacheEntry` | `EngineBuildError`, `CalibrationCacheError` | sync (offline; minutes for INT8) |
+| `deserialize_engine` | `(entry: EngineCacheEntry) -> EngineHandle` | `EngineDeserializeError`, `EngineHashMismatchError`, `EngineSchemaMismatchError`, `EngineSidecarMissingError`, `OutOfMemoryError` | sync |
+| `infer` | `(handle: EngineHandle, inputs: dict[str, np.ndarray]) -> dict[str, np.ndarray]` | `InferenceError`, `OutOfMemoryError` | sync (GPU stream sync) |
+| `release_engine` | `(handle: EngineHandle) -> None` | — (idempotent) | sync |
+| `thermal_state` | `() -> ThermalState` | `TelemetryUnavailableError` (only on cold-start failure; in steady state a `ThermalState` with `is_telemetry_available=False` is returned instead) | sync |
+| `current_runtime_label` | `() -> Literal["tensorrt", "onnx_trt_ep", "pytorch_fp16"]` | — | sync |
+
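A minimal sketch of how the table above could translate into a `runtime_checkable` Protocol. The DTO names are aliased to `Any`, and `Array` stands in for `np.ndarray`, purely so the snippet runs on its own; `FullStub` and `PartialStub` are hypothetical classes that are not part of the contract.

```python
from pathlib import Path
from typing import Any, Literal, Protocol, runtime_checkable

# Stand-ins so the sketch is self-contained; the real DTOs live in the DTO section.
BuildConfig = EngineCacheEntry = EngineHandle = ThermalState = Any
Array = Any  # stands in for np.ndarray


@runtime_checkable
class InferenceRuntime(Protocol):
    def compile_engine(self, model_path: Path, build_config: BuildConfig) -> EngineCacheEntry: ...
    def deserialize_engine(self, entry: EngineCacheEntry) -> EngineHandle: ...
    def infer(self, handle: EngineHandle, inputs: dict[str, Array]) -> dict[str, Array]: ...
    def release_engine(self, handle: EngineHandle) -> None: ...
    def thermal_state(self) -> ThermalState: ...
    def current_runtime_label(self) -> Literal["tensorrt", "onnx_trt_ep", "pytorch_fp16"]: ...


class FullStub:
    """Hypothetical class providing all six methods."""
    def compile_engine(self, model_path, build_config): return None
    def deserialize_engine(self, entry): return None
    def infer(self, handle, inputs): return {}
    def release_engine(self, handle): return None
    def thermal_state(self): return None
    def current_runtime_label(self): return "pytorch_fp16"


class PartialStub:
    """Hypothetical class that omits thermal_state."""
    def compile_engine(self, model_path, build_config): return None
    def deserialize_engine(self, entry): return None
    def infer(self, handle, inputs): return {}
    def release_engine(self, handle): return None
    def current_runtime_label(self): return "pytorch_fp16"


# isinstance() on a runtime_checkable Protocol checks member presence only.
full_ok = isinstance(FullStub(), InferenceRuntime)
partial_ok = isinstance(PartialStub(), InferenceRuntime)
```

Because the runtime check looks only at member names, a static type checker is still needed to catch signature drift between an implementation and the Protocol.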
+### DTOs
+
+All DTOs are stdlib `@dataclass(frozen=True)` classes, with two exceptions: `PrecisionMode` is a `str`-backed `Enum`, and `EngineHandle` is an opaque marker class.
+
+```python
+from dataclasses import dataclass
+from enum import Enum
+from pathlib import Path
+from typing import Optional
+
+
+class PrecisionMode(str, Enum):
+    FP16 = "fp16"
+    INT8 = "int8"
+    MIXED = "mixed"
+
+
+@dataclass(frozen=True)
+class OptimizationProfile:
+    input_name: str
+    min_shape: tuple[int, ...]
+    opt_shape: tuple[int, ...]
+    max_shape: tuple[int, ...]
+
+
+@dataclass(frozen=True)
+class BuildConfig:
+    precision: PrecisionMode
+    workspace_mb: int
+    calibration_dataset: Optional[Path]   # required for INT8; None for FP16/Mixed
+    optimization_profiles: tuple[OptimizationProfile, ...]
+    use_trtexec: bool = False             # TRT-only hint; ignored by ORT / PyTorch
+
+
+@dataclass(frozen=True)
+class EngineCacheEntry:
+    engine_path: Path                     # `.engine` for TRT/ORT; `.onnx` for ORT-direct; `.pt` for PyTorch
+    sha256_hex: str                       # canonical sha256 of engine_path
+    sm: Optional[int]                     # None for PyTorch (hardware-portable)
+    jp: Optional[str]                     # JetPack version, e.g. "6.2"
+    trt: Optional[str]                    # TensorRT version, e.g. "10.3"
+    precision: PrecisionMode
+    extras: dict[str, str]                # implementation-specific (e.g., calibration cache path)
+
+
+class EngineHandle:
+    """Opaque marker class. Consumers MUST NOT introspect; pass back to the same runtime."""
+    pass
+
+
+@dataclass(frozen=True)
+class ThermalState:
+    cpu_temp_c: Optional[float]
+    gpu_temp_c: Optional[float]
+    thermal_throttle_active: bool         # default False on telemetry unavailability
+    measured_clock_mhz: Optional[int]
+    measured_at_ns: int                   # monotonic_ns of poll
+    is_telemetry_available: bool          # False if the source is hung/absent (default-safe path)
+```
+
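A short sketch of what frozen DTOs mean for a consumer, assuming the definitions above. `PrecisionMode`, `OptimizationProfile`, and `BuildConfig` are repeated so the snippet runs on its own; the `calib` path and the workspace size are made-up values.

```python
import dataclasses
from dataclasses import dataclass
from enum import Enum
from pathlib import Path
from typing import Optional


class PrecisionMode(str, Enum):
    FP16 = "fp16"
    INT8 = "int8"
    MIXED = "mixed"


@dataclass(frozen=True)
class OptimizationProfile:
    input_name: str
    min_shape: tuple[int, ...]
    opt_shape: tuple[int, ...]
    max_shape: tuple[int, ...]


@dataclass(frozen=True)
class BuildConfig:
    precision: PrecisionMode
    workspace_mb: int
    calibration_dataset: Optional[Path]
    optimization_profiles: tuple[OptimizationProfile, ...]
    use_trtexec: bool = False


cfg = BuildConfig(
    precision=PrecisionMode.FP16,
    workspace_mb=1024,               # illustrative value
    calibration_dataset=None,
    optimization_profiles=(),
)

# In-place mutation is rejected by the frozen dataclass.
try:
    cfg.precision = PrecisionMode.INT8
    mutation_blocked = False
except dataclasses.FrozenInstanceError:
    mutation_blocked = True

# A modified copy is the supported way to derive a variant.
int8_cfg = dataclasses.replace(
    cfg,
    precision=PrecisionMode.INT8,
    calibration_dataset=Path("calib"),  # hypothetical path
)
```

`dataclasses.replace` leaves the original instance untouched, which keeps a shared `BuildConfig` safe to pass between components.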
+### Error hierarchy
+
+All errors live under `c7_inference.errors`:
+
+```
+RuntimeError (Exception subclass — NOT stdlib RuntimeError)
+├── EngineBuildError
+├── EngineDeserializeError
+├── EngineHashMismatchError
+├── EngineSchemaMismatchError
+├── EngineSidecarMissingError
+├── CalibrationCacheError
+├── InferenceError
+├── OutOfMemoryError
+└── TelemetryUnavailableError
+
+RuntimeNotAvailableError (composition-root only; NOT a Protocol family error)
+ConfigSchemaError (config-load only; NOT a Protocol family error)
+```
+
+Consumers catch the family with `except c7_inference.errors.RuntimeError as e`. Implementations MUST raise only members of this family from Protocol methods; third-party library errors (TRT C++ exceptions, ORT internal errors, PyTorch CUDA errors) MUST be caught and rewrapped.
+
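One way the catch-and-rewrap rule could be applied is a small decorator, sketched below under assumptions. `C7RuntimeError` stands in for `c7_inference.errors.RuntimeError` (renamed here only to avoid shadowing the builtin within a single file); `rewrap`, `FakeBackendError`, and `infer_once` are hypothetical names that do not appear in the contract.

```python
import functools


class C7RuntimeError(Exception):
    """Stand-in for c7_inference.errors.RuntimeError (not the builtin)."""


class InferenceError(C7RuntimeError):
    pass


def rewrap(into):
    """Convert any non-family exception raised by fn into `into`."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            try:
                return fn(*args, **kwargs)
            except C7RuntimeError:
                raise                      # already in the family: pass through
            except Exception as exc:
                # Keep the original as __cause__ for debugging.
                raise into(str(exc)) from exc
        return wrapper
    return decorator


class FakeBackendError(Exception):
    """Hypothetical third-party error, e.g. from a GPU library binding."""


@rewrap(InferenceError)
def infer_once():
    raise FakeBackendError("kernel launch failed")


try:
    infer_once()
    caught = None
except C7RuntimeError as e:
    caught = e
```

Chaining with `from exc` preserves the backend's traceback while consumers only ever see family members.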
+### Composition-root factory
+
+Defined in `runtime_root/inference_factory.py` (NOT in `c7_inference` itself; the factory is the wiring layer):
+
+```python
+def build_inference_runtime(config: Config) -> InferenceRuntime:
+    """
+    Selects exactly one strategy by config.inference.runtime + BUILD_* flag gating.
+    Raises RuntimeNotAvailableError if the requested strategy's BUILD_* flag is OFF.
+    """
+```
+
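The gating behaviour described in the docstring can be sketched as follows. This is a simplification, not the contract's signature: instead of a `Config` object it takes the runtime name, a mapping of build flags, and a mapping of zero-argument loader callables, all of which are assumptions made so the example runs without the real strategy modules.

```python
from typing import Callable, Mapping


class RuntimeNotAvailableError(Exception):
    """Composition-root error: the requested strategy was not built in."""


def build_inference_runtime(
    runtime_name: str,
    build_flags: Mapping[str, bool],
    loaders: Mapping[str, Callable[[], object]],
) -> object:
    # Name validation is assumed to have happened at config-load time.
    if not build_flags.get(runtime_name, False):
        raise RuntimeNotAvailableError(
            f"{runtime_name!r} requested but its BUILD_* flag is OFF"
        )
    # Only the selected loader runs, so only that strategy is imported.
    return loaders[runtime_name]()


imported: list[str] = []


class _StubRuntime:
    def __init__(self, label: str) -> None:
        self._label = label

    def current_runtime_label(self) -> str:
        return self._label


def _make_loader(label: str) -> Callable[[], object]:
    def load() -> object:
        imported.append(label)  # real code would import the strategy module here
        return _StubRuntime(label)
    return load


LOADERS = {n: _make_loader(n) for n in ("tensorrt", "onnx_trt_ep", "pytorch_fp16")}
FLAGS = {"tensorrt": False, "onnx_trt_ep": False, "pytorch_fp16": True}

runtime = build_inference_runtime("pytorch_fp16", FLAGS, LOADERS)

try:
    build_inference_runtime("tensorrt", FLAGS, LOADERS)
    tensorrt_rejected = False
except RuntimeNotAvailableError:
    tensorrt_rejected = True
```

Checking the flag before touching the loader is what keeps a disabled strategy's module out of `sys.modules`.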
+## Invariants
+
+- **I-1 (single source of truth for runtime label):** `current_runtime_label()` returns a string equal to `config.inference.runtime`. AC-NEW-3 audit relies on this exact-match property.
+- **I-2 (Protocol-family error envelope):** Every Protocol method either returns normally or raises only members of the `c7_inference.errors.RuntimeError` family. Third-party exceptions are caught and rewrapped.
+- **I-3 (frozen DTOs):** `BuildConfig`, `EngineCacheEntry`, `ThermalState`, and `OptimizationProfile` are `@dataclass(frozen=True)`. Mutation attempts raise `FrozenInstanceError`.
+- **I-4 (opaque EngineHandle):** Consumers MUST NOT introspect `EngineHandle` fields. Implementations subclass with private state; the Protocol surface is unchanged.
+- **I-5 (lazy-import gating):** Concrete strategies are imported only inside the factory's `if BUILD_*:` blocks. The package `__init__.py` exports only the Protocol, DTOs, and errors. A Tier-0 build with `BUILD_TENSORRT_RUNTIME=OFF` MUST NOT load `c7_inference.tensorrt_runtime` (verifiable via `sys.modules`).
+- **I-6 (default-safe thermal):** When `ThermalState.is_telemetry_available == False`, `ThermalState.thermal_throttle_active` MUST be `False` (the steady-state default; consumers may choose to ignore the throttle bit when telemetry is unavailable).
+- **I-7 (idempotent release):** `release_engine(handle)` may be called more than once on the same handle; second-and-later calls return silently.
+- **I-8 (sync-stream `infer`):** `infer` returns only after the GPU stream has synchronised; the returned dict's tensors are host-resident (numpy arrays) and ready for consumer use.
+
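Invariants I-6 and I-7 lend themselves to small executable checks. The sketch below repeats `ThermalState` from the DTO section; `unavailable_thermal_state`, `satisfies_i6`, `_Handle`, and the free-standing `release_engine` are hypothetical helpers, and the `free_calls` counter merely stands in for freeing GPU resources.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ThermalState:
    cpu_temp_c: Optional[float]
    gpu_temp_c: Optional[float]
    thermal_throttle_active: bool
    measured_clock_mhz: Optional[int]
    measured_at_ns: int
    is_telemetry_available: bool


def unavailable_thermal_state() -> ThermalState:
    """Default-safe value for a hung or absent telemetry source (I-6)."""
    return ThermalState(
        cpu_temp_c=None,
        gpu_temp_c=None,
        thermal_throttle_active=False,
        measured_clock_mhz=None,
        measured_at_ns=time.monotonic_ns(),
        is_telemetry_available=False,
    )


def satisfies_i6(state: ThermalState) -> bool:
    # Throttle may only be reported when telemetry is actually available.
    return state.is_telemetry_available or not state.thermal_throttle_active


class _Handle:
    """Hypothetical private handle state; consumers treat it as opaque."""
    def __init__(self) -> None:
        self.released = False
        self.free_calls = 0


def release_engine(handle: _Handle) -> None:
    if handle.released:
        return                  # I-7: repeat calls return silently
    handle.free_calls += 1      # stands in for freeing GPU resources
    handle.released = True


bad_state = ThermalState(None, None, True, None, 0, False)
handle = _Handle()
release_engine(handle)
release_engine(handle)
```

A publisher test could run `satisfies_i6` over every emitted `ThermalState`, which mirrors the `thermal-default-safe` test case.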
+## Non-Goals
+
+- **Not covered: multi-stream concurrent inference.** One CUDA stream per Runtime instance this cycle. Future work if the F3 hot path becomes multi-threaded.
+- **Not covered: cross-process engine cache reuse.** Engines are per-process; a separate process must deserialise from the on-disk cache.
+- **Not covered: per-frame input/output type negotiation.** Inputs / outputs are numpy arrays in named dicts; type / dtype negotiation is per-strategy and per-engine.
+- **Not covered: streaming / iterative inference.** `infer` is request/response; no callbacks, no chunked outputs.
+- **Not covered: dynamic batch.** `OptimizationProfile` carries `min_shape / opt_shape / max_shape`, but the consumer is responsible for picking the actual runtime shape; the Protocol does not auto-batch.
+- **Not covered: engine versioning / hot-reload.** Engines are loaded at takeoff (F2) and held for the flight; a new engine requires a process restart.
+
+## Versioning Rules
+
+- **Breaking changes** (renamed method, removed field, type change, required→optional flip, error-class removed from family) require a new major version (`2.0.0`) and a deprecation path for every consumer task listed in the contract header. The change log MUST list each consuming task that needs a coordinated update.
+- **Non-breaking additions** (new optional method via Protocol structural compatibility, new optional field on a DTO with a default, new error variant added to the family) require a minor version bump (e.g., `1.1.0`).
+- **Patch** (clarification only; no shape change) is documentation-only.
+
+The current contract is `1.0.0`. It already includes `ThermalState.is_telemetry_available` from AZ-302, which was originally anticipated as a `1.1.0` extension; because the field was added before the first freeze, the contract will still be `1.0.0` when it is first frozen.
+
+## Test Cases
+
+| Case | Input | Expected | Notes |
+|------|-------|----------|-------|
+| protocol-conformance-full | A class implementing all six methods | `isinstance(impl, InferenceRuntime) == True` | AZ-297 AC-1 |
+| protocol-conformance-partial | A class missing `thermal_state` | `isinstance(impl, InferenceRuntime) == False` | AZ-297 AC-1 |
+| frozen-dto-mutation | `BuildConfig(precision=PrecisionMode.FP16, ...).precision = PrecisionMode.INT8` | `FrozenInstanceError` | AZ-297 AC-2 / I-3 |
+| error-family-catch-all | Raise each of the nine error subtypes | All caught by `except c7_inference.errors.RuntimeError` | AZ-297 AC-3 / I-2 |
+| factory-tensorrt-on | `config.inference.runtime="tensorrt"` + `BUILD_TENSORRT_RUNTIME=ON` | Returns `TensorrtRuntime`; label `"tensorrt"` | AZ-297 AC-4 |
+| factory-tensorrt-off | Same config + `BUILD_TENSORRT_RUNTIME=OFF` | `RuntimeNotAvailableError`; `sys.modules` does NOT contain `c7_inference.tensorrt_runtime` | AZ-297 AC-5 / I-5 |
+| factory-unknown-runtime | `config.inference.runtime="tensorflow_lite"` | `ConfigSchemaError` at config-load time | AZ-297 AC-6 |
+| label-exact-match | Runtime constructed for each of the three strategies | `current_runtime_label()` == `config.inference.runtime` | AZ-297 AC-7 / I-1 |
+| contract-introspection-parity | Parse this file's Shape section vs. the runtime Protocol | All methods, fields, errors match | AZ-297 AC-8 |
+| thermal-default-safe | `ThermalState(is_telemetry_available=False, thermal_throttle_active=True)` | Implementations MUST NOT construct this state: invariant I-6 requires `thermal_throttle_active=False` whenever `is_telemetry_available=False`. A test asserts that the publisher's output respects this. | I-6 |
+
+## Change Log
+
+| Version | Date | Change | Author |
+|---------|------|--------|--------|
+| 1.0.0 | 2026-05-10 | Initial contract — Protocol + 4 DTOs + 9-error family + composition-root factory + lazy-import gating. Includes the `ThermalState.is_telemetry_available` field added by AZ-302 (no separate version bump because the field landed before first freeze). | autodev (AZ-297 / AZ-302 coordination) |