[AZ-298] C7 TensorrtRuntime: TRT 10.3 + INT8 calib trust + GPU budget

Implement the production-default InferenceRuntime strategy on JetPack 6.2 + TensorRT 10.3 (per D-C7-9). The runtime owns the full TRT lifecycle: compile_engine via the Polygraphy + trtexec + IBuilderConfig hybrid (FP16 / INT8 / Mixed precision), deserialize_engine with EngineGate-first ordering and a pre-allocation GPU memory budget gate, infer via H2D -> enqueueV3 -> D2H -> stream sync on the owned CUDA stream, idempotent release_engine, and an injected ThermalStatePublisher delegation for thermal_state. INT8 calibration cache trust (D-C10-6, AC-2/3/4) is enforced by a .calib_cache.sha256 file-integrity sidecar (AZ-280) plus a new .calib_cache.dataset_sha256 sidecar that records the dataset content hash at compile time; reuse only when both agree, rebuild silently on dataset hash mismatch, raise CalibrationCacheError on corrupt sidecar (never silently overwritten). GPU memory budget (NFT-LIM-01, default 4 GiB) is checked BEFORE any TRT call beyond the gate (AC-6); a pre-allocation refusal raises OutOfMemoryError and leaves the resident state unchanged. TensorRT 10.3 / Polygraphy / PyCUDA are lazy-imported inside the methods that need them so the module loads cleanly on Tier-0 hosts. A standalone CLI entry (python -m gps_denied_onboard.components.c7_inference.tensorrt_runtime compile <onnx> <build_config.json>) is wired for C10 CacheProvisioner (AZ-321) to invoke pre-flight without holding a runtime instance. C7InferenceConfig gains gpu_memory_budget_bytes (default 4 GiB) and trtexec_timeout_s (default 600 s, Risk 4 mitigation), both validated in __post_init__. Tests: 26 active + 6 Tier-2-gated skips; AC-1 / AC-3 / AC-4 / AC-5 / AC-6 / AC-7 / AC-10 + NFR-reliability fully covered on Tier-1 via fake CUDA / TRT modules; AC-2 / AC-8 / AC-9 / NFR-perf-deserialize placeholders skip with prerequisite reason and live in the AZ-298 Tier-2 microbench harness. Code review verdict PASS_WITH_WARNINGS (1 Medium hot-path hoist fix auto-applied). Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-21 08:21:13 +00:00 · 2026-05-12 23:11:49 +03:00
parent 54942f3052
commit 18a69022b3
9 changed files with 2307 additions and 10 deletions
@@ -177,7 +177,7 @@ Bootstrap reference: `_docs/02_tasks/todo/AZ-263_initial_structure.md`. Architec
  - `manifest.py` (AZ-301; `DeploymentManifest` + `ManifestReader` for engine sidecar manifests)
  - `onnx_trt_runtime.py` (ONNX Runtime + TensorRT EP, pending)
  - `pytorch_fp16_runtime.py` (AZ-300; research-only / simple-baseline strategy)
-  - `tensorrt_runtime.py` (production-default; TensorRT 10.3, pending)
+  - `tensorrt_runtime.py` (AZ-298; production-default TensorRT 10.3 strategy + INT8 calibration cache trust + GPU memory budget enforcement + `python -m ...tensorrt_runtime compile ...` CLI)
  - `thermal_publisher.py` (AZ-302; 1 Hz background poller, jtop/NVML fallback)
 - **Owns**: `src/gps_denied_onboard/components/c7_inference/**`, `tests/unit/c7_inference/**`
 - **Imports from**: `_types`, `helpers.engine_filename_schema`, `helpers.sha256_sidecar`, `config`, `logging`, `fdr_client`