mirror of https://github.com/azaion/gps-denied-onboard.git synced 2026-06-21 18:31:13 +00:00

Files

T

Oleksandr Bezdieniezhnykh 18a69022b3 [AZ-298] C7 TensorrtRuntime: TRT 10.3 + INT8 calib trust + GPU budget

Implement the production-default InferenceRuntime strategy on JetPack
6.2 + TensorRT 10.3 (per D-C7-9). The runtime owns the full TRT
lifecycle: compile_engine via the Polygraphy + trtexec + IBuilderConfig
hybrid (FP16 / INT8 / Mixed precision), deserialize_engine with
EngineGate-first ordering and a pre-allocation GPU memory budget gate,
infer via H2D -> enqueueV3 -> D2H -> stream sync on the owned CUDA
stream, idempotent release_engine, and an injected
ThermalStatePublisher delegation for thermal_state.

INT8 calibration cache trust (D-C10-6, AC-2/3/4) is enforced by a
.calib_cache.sha256 file-integrity sidecar (AZ-280) plus a new
.calib_cache.dataset_sha256 sidecar that records the dataset content
hash at compile time; reuse only when both agree, rebuild silently on
dataset hash mismatch, raise CalibrationCacheError on corrupt sidecar
(never silently overwritten).

GPU memory budget (NFT-LIM-01, default 4 GiB) is checked BEFORE any
TRT call beyond the gate (AC-6); a pre-allocation refusal raises
OutOfMemoryError and leaves the resident state unchanged.

TensorRT 10.3 / Polygraphy / PyCUDA are lazy-imported inside the
methods that need them so the module loads cleanly on Tier-0 hosts.
A standalone CLI entry (python -m
gps_denied_onboard.components.c7_inference.tensorrt_runtime compile
<onnx> <build_config.json>) is wired for C10 CacheProvisioner
(AZ-321) to invoke pre-flight without holding a runtime instance.

C7InferenceConfig gains gpu_memory_budget_bytes (default 4 GiB) and
trtexec_timeout_s (default 600 s, Risk 4 mitigation), both validated
in __post_init__.

Tests: 26 active + 6 Tier-2-gated skips; AC-1 / AC-3 / AC-4 / AC-5
/ AC-6 / AC-7 / AC-10 + NFR-reliability fully covered on Tier-1
via fake CUDA / TRT modules; AC-2 / AC-8 / AC-9 / NFR-perf-deserialize
placeholders skip with prerequisite reason and live in the AZ-298
Tier-2 microbench harness. Code review verdict
PASS_WITH_WARNINGS (1 Medium hot-path hoist fix auto-applied).

Co-authored-by: Cursor <cursoragent@cursor.com>

2026-05-12 23:11:49 +03:00

4.2 KiB

Raw Blame History

Code Review Report — Batch 31 / Cycle 1

Batch: 31 Tasks: AZ-298 (C7 TensorrtRuntime) Date: 2026-05-12 Verdict: PASS_WITH_WARNINGS

Findings

#	Severity	Category	File:Line	Title
1	Medium	Performance	`src/gps_denied_onboard/components/c7_inference/tensorrt_runtime.py::infer`	`_load_trt()` called inside per-output loop
2	Low	Maintainability	`tensorrt_runtime.py::_predicted_deserialize_bytes`	256 MiB flat fallback when `extras["opt_buffer_bytes"]` is absent
3	Low	Test-quality	`tests/unit/c7_inference/test_tensorrt_runtime.py::test_infer_orders_h2d_enqueue_d2h_sync`	Ordering verified via fake-module call counts, not a real CUDA event trace
4	Low	Architecture	`tensorrt_runtime.py::infer`	Access to `TrtEngineHandle._input_buffers` / `_output_buffers` via slot names
5	Low	Scope	`tensorrt_runtime.py::_safe_del`	`del resource` only drops a local reference; helper is mostly defensive log-warn

Finding Details

F1: _load_trt() called inside per-output loop (Medium / Performance)

Location: src/gps_denied_onboard/components/c7_inference/tensorrt_runtime.py — TensorrtRuntime.infer
Description: The original implementation called self._load_trt() inside the per-output binding for-loop. The lazy import is module-cached so subsequent calls are cheap, but the attribute lookup + the try/except inside a hot path adds avoidable overhead.
Suggestion: Hoist trt = self._load_trt() above the loops (alongside cuda, _ = self._load_pycuda()).
Task: AZ-298
Resolution: AUTO-FIXED in this batch.

F2: 256 MiB flat fallback in _predicted_deserialize_bytes (Low / Maintainability)

Location: tensorrt_runtime.py::_predicted_deserialize_bytes
Description: When EngineCacheEntry.extras["opt_buffer_bytes"] is missing (engine produced by an older compile path), the budget gate uses a flat 256 MiB upper-bound. This is conservative for typical engines but can underestimate for engines with very large profiles.
Suggestion: compile_engine already stamps the field. Tighten the fallback only if an externally-produced engine appears in the cache; today the path is dormant.
Task: AZ-298
Resolution: Open (Low) — accepted as documented.

F3: Fake-module call-count ordering (Low / Test-quality)

Location: tests/unit/c7_inference/test_tensorrt_runtime.py::test_infer_orders_h2d_enqueue_d2h_sync
Description: Verifies H2D → enqueueV3 → D2H → sync via fake CUDA/TRT modules counting calls and asserting on a single linear flow. Does not capture a real CUDA event trace.
Suggestion: The Tier-2 placeholder test_ac7_real_infer_records_cuda_event_sequence exists for the real event trace on Jetson; no change needed here.
Task: AZ-298
Resolution: Open (Low) — accepted as documented.

F4: Slot-name access in infer (Low / Architecture)

Location: tensorrt_runtime.py::TensorrtRuntime.infer
Description: infer reads handle._input_buffers, handle._output_buffers, handle._exec_context, etc. via the slot names declared on TrtEngineHandle. Per Invariant I-4 those fields are private to TensorrtRuntime, so the access is intra-class and the test code stays inside the c7_inference component boundary.
Suggestion: None — the alternative (a getter method per field) would slow the hot path without contract gain.
Task: AZ-298
Resolution: Open (Low) — accepted as documented.

F5: _safe_del is mostly defensive (Low / Scope)

Location: tensorrt_runtime.py::_safe_del
Description: The helper calls del resource which only drops a local reference inside the helper scope; the real teardown happens when the caller drops its own reference. The helper exists as a single explicit place to swallow + WARN-log unusual teardown errors.
Suggestion: Acceptable. The PyCUDA / TRT C++ shims hook destructors that fire when the last Python reference is released — _safe_del documents that contract in one place.
Task: AZ-298
Resolution: Open (Low) — accepted as documented.

Verdict Logic

0 Critical
0 High
1 Medium (auto-fixed in this batch)
4 Low

→ PASS_WITH_WARNINGS: only Medium / Low findings; Medium was auto-fixed.

4.2 KiB Raw Blame History

Code Review Report — Batch 31 / Cycle 1

Findings

Finding Details

Verdict Logic

4.2 KiB

Raw Blame History