[AZ-298] C7 TensorrtRuntime: TRT 10.3 + INT8 calib trust + GPU budget

Implement the production-default InferenceRuntime strategy on JetPack 6.2 + TensorRT 10.3 (per D-C7-9). The runtime owns the full TRT lifecycle: compile_engine via the Polygraphy + trtexec + IBuilderConfig hybrid (FP16 / INT8 / Mixed precision), deserialize_engine with EngineGate-first ordering and a pre-allocation GPU memory budget gate, infer via H2D -> enqueueV3 -> D2H -> stream sync on the owned CUDA stream, idempotent release_engine, and an injected ThermalStatePublisher delegation for thermal_state. INT8 calibration cache trust (D-C10-6, AC-2/3/4) is enforced by a .calib_cache.sha256 file-integrity sidecar (AZ-280) plus a new .calib_cache.dataset_sha256 sidecar that records the dataset content hash at compile time; reuse only when both agree, rebuild silently on dataset hash mismatch, raise CalibrationCacheError on corrupt sidecar (never silently overwritten). GPU memory budget (NFT-LIM-01, default 4 GiB) is checked BEFORE any TRT call beyond the gate (AC-6); a pre-allocation refusal raises OutOfMemoryError and leaves the resident state unchanged. TensorRT 10.3 / Polygraphy / PyCUDA are lazy-imported inside the methods that need them so the module loads cleanly on Tier-0 hosts. A standalone CLI entry (python -m gps_denied_onboard.components.c7_inference.tensorrt_runtime compile <onnx> <build_config.json>) is wired for C10 CacheProvisioner (AZ-321) to invoke pre-flight without holding a runtime instance. C7InferenceConfig gains gpu_memory_budget_bytes (default 4 GiB) and trtexec_timeout_s (default 600 s, Risk 4 mitigation), both validated in __post_init__. Tests: 26 active + 6 Tier-2-gated skips; AC-1 / AC-3 / AC-4 / AC-5 / AC-6 / AC-7 / AC-10 + NFR-reliability fully covered on Tier-1 via fake CUDA / TRT modules; AC-2 / AC-8 / AC-9 / NFR-perf-deserialize placeholders skip with prerequisite reason and live in the AZ-298 Tier-2 microbench harness. Code review verdict PASS_WITH_WARNINGS (1 Medium hot-path hoist fix auto-applied). Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-22 22:01:14 +00:00 · 2026-05-12 23:11:49 +03:00
parent 54942f3052
commit 18a69022b3
9 changed files with 2307 additions and 10 deletions
@@ -0,0 +1,62 @@
+# Code Review Report — Batch 31 / Cycle 1
+
+**Batch**: 31
+**Tasks**: AZ-298 (C7 TensorrtRuntime)
+**Date**: 2026-05-12
+**Verdict**: PASS_WITH_WARNINGS
+
+## Findings
+
+| # | Severity | Category | File:Line | Title |
+|---|----------|----------|-----------|-------|
+| 1 | Medium | Performance | `src/gps_denied_onboard/components/c7_inference/tensorrt_runtime.py::infer` | `_load_trt()` called inside per-output loop |
+| 2 | Low | Maintainability | `tensorrt_runtime.py::_predicted_deserialize_bytes` | 256 MiB flat fallback when `extras["opt_buffer_bytes"]` is absent |
+| 3 | Low | Test-quality | `tests/unit/c7_inference/test_tensorrt_runtime.py::test_infer_orders_h2d_enqueue_d2h_sync` | Ordering verified via fake-module call counts, not a real CUDA event trace |
+| 4 | Low | Architecture | `tensorrt_runtime.py::infer` | Access to `TrtEngineHandle._input_buffers` / `_output_buffers` via slot names |
+| 5 | Low | Scope | `tensorrt_runtime.py::_safe_del` | `del resource` only drops a local reference; helper is mostly defensive log-warn |
+
+### Finding Details
+
+**F1: `_load_trt()` called inside per-output loop** (Medium / Performance)
+- Location: `src/gps_denied_onboard/components/c7_inference/tensorrt_runtime.py` — `TensorrtRuntime.infer`
+- Description: The original implementation called `self._load_trt()` inside the per-output binding for-loop. The lazy import is module-cached so subsequent calls are cheap, but the attribute lookup + the try/except inside a hot path adds avoidable overhead.
+- Suggestion: Hoist `trt = self._load_trt()` above the loops (alongside `cuda, _ = self._load_pycuda()`).
+- Task: AZ-298
+- Resolution: **AUTO-FIXED** in this batch.
+
+**F2: 256 MiB flat fallback in `_predicted_deserialize_bytes`** (Low / Maintainability)
+- Location: `tensorrt_runtime.py::_predicted_deserialize_bytes`
+- Description: When `EngineCacheEntry.extras["opt_buffer_bytes"]` is missing (engine produced by an older compile path), the budget gate uses a flat 256 MiB upper-bound. This is conservative for typical engines but can underestimate for engines with very large profiles.
+- Suggestion: `compile_engine` already stamps the field. Tighten the fallback only if an externally-produced engine appears in the cache; today the path is dormant.
+- Task: AZ-298
+- Resolution: Open (Low) — accepted as documented.
+
+**F3: Fake-module call-count ordering** (Low / Test-quality)
+- Location: `tests/unit/c7_inference/test_tensorrt_runtime.py::test_infer_orders_h2d_enqueue_d2h_sync`
+- Description: Verifies H2D → enqueueV3 → D2H → sync via fake CUDA/TRT modules counting calls and asserting on a single linear flow. Does not capture a real CUDA event trace.
+- Suggestion: The Tier-2 placeholder `test_ac7_real_infer_records_cuda_event_sequence` exists for the real event trace on Jetson; no change needed here.
+- Task: AZ-298
+- Resolution: Open (Low) — accepted as documented.
+
+**F4: Slot-name access in `infer`** (Low / Architecture)
+- Location: `tensorrt_runtime.py::TensorrtRuntime.infer`
+- Description: `infer` reads `handle._input_buffers`, `handle._output_buffers`, `handle._exec_context`, etc. via the slot names declared on `TrtEngineHandle`. Per Invariant I-4 those fields are private to `TensorrtRuntime`, so the access is intra-class and the test code stays inside the c7_inference component boundary.
+- Suggestion: None — the alternative (a getter method per field) would slow the hot path without contract gain.
+- Task: AZ-298
+- Resolution: Open (Low) — accepted as documented.
+
+**F5: `_safe_del` is mostly defensive** (Low / Scope)
+- Location: `tensorrt_runtime.py::_safe_del`
+- Description: The helper calls `del resource` which only drops a local reference inside the helper scope; the real teardown happens when the caller drops its own reference. The helper exists as a single explicit place to swallow + WARN-log unusual teardown errors.
+- Suggestion: Acceptable. The PyCUDA / TRT C++ shims hook destructors that fire when the last Python reference is released — `_safe_del` documents that contract in one place.
+- Task: AZ-298
+- Resolution: Open (Low) — accepted as documented.
+
+## Verdict Logic
+
+- 0 Critical
+- 0 High
+- 1 Medium (auto-fixed in this batch)
+- 4 Low
+
+→ **PASS_WITH_WARNINGS**: only Medium / Low findings; Medium was auto-fixed.