mirror of https://github.com/azaion/gps-denied-onboard.git synced 2026-06-21 19:51:12 +00:00

Oleksandr Bezdieniezhnykh 64542d32fc Update autodev state, architecture documentation, and glossary terms

Transitioned the autodev state to phase 21, reflecting the completion of Step 5 and the drafting of Step 6 epics. Revised the architecture documentation to clarify the roles of the Tile Manager and its components, ensuring accurate representation of the system's operational flow. Updated glossary entries for Flight State and Operator to incorporate recent changes and enhance clarity on component interactions and responsibilities.

2026-05-10 00:21:34 +03:00

8.6 KiB


C7 — On-Jetson Inference Runtime

1. High-Level Overview

Purpose: provide a single inference-runtime abstraction that all GPU-using components (C1 selectively, C2, C2.5, C3, C3.5) consume. Owns engine compilation (Polygraphy / trtexec / IBuilderConfig hybrid), engine deserialization at takeoff load, GPU memory management, INT8 calibration cache trust, and the thermal-throttle telemetry feed that drives the D-CROSS-LATENCY-1 hybrid in C4.

Architectural Pattern: Strategy — InferenceRuntime interface with three concrete implementations: TensorrtRuntime (production-default per D-C7-9 JetPack 6.2 + TensorRT 10.3 lock), OnnxTrtEpRuntime (fallback), PytorchFp16Runtime (mandatory simple-baseline). Selection at startup by config (ADR-001), build-time gating by BUILD_* flags (ADR-002), composition-root wired (ADR-009).
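A minimal sketch of the startup selection described above, assuming a string config key per runtime and a dict of build flags. The concrete flag names (`BUILD_TENSORRT`, `BUILD_ONNX_TRT_EP`, `BUILD_PYTORCH_FP16`), the config keys, and `select_runtime` are hypothetical; the source only states that selection is config-driven and gated by `BUILD_*` flags.

```python
from typing import Callable, Protocol


class InferenceRuntime(Protocol):
    """Subset of the C7 interface needed for this sketch."""

    def current_runtime_label(self) -> str: ...


class TensorrtRuntime:
    def current_runtime_label(self) -> str:
        return "tensorrt"


class OnnxTrtEpRuntime:
    def current_runtime_label(self) -> str:
        return "onnx_trt_ep"


class PytorchFp16Runtime:
    def current_runtime_label(self) -> str:
        return "pytorch_fp16"


# Config key -> (hypothetical BUILD_* flag name, constructor).
RUNTIMES: dict[str, tuple[str, Callable[[], InferenceRuntime]]] = {
    "tensorrt": ("BUILD_TENSORRT", TensorrtRuntime),
    "onnx_trt_ep": ("BUILD_ONNX_TRT_EP", OnnxTrtEpRuntime),
    "pytorch_fp16": ("BUILD_PYTORCH_FP16", PytorchFp16Runtime),
}


def select_runtime(configured: str, build_flags: dict[str, bool]) -> InferenceRuntime:
    """Return the configured runtime, or fail loudly if it was not built in."""
    if configured not in RUNTIMES:
        raise ValueError(f"unknown runtime: {configured!r}")
    flag, factory = RUNTIMES[configured]
    if not build_flags.get(flag, False):
        # No silent fallback between runtimes: the caller must pick another one.
        raise RuntimeError(f"runtime {configured!r} not compiled in ({flag} is off)")
    return factory()
```

The composition root would call `select_runtime` once and hand the result to every consumer, so no component chooses a runtime on its own.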

Upstream dependencies:

C10 CacheProvisioner → during F1 (after C11 TileDownloader has populated C6) triggers engine compilation when no cached engine matches the (SM, JP, TRT, precision) tuple.
F2 takeoff load → triggers `deserialize_engine` on the cached `EngineCacheEntry` of every model used by C1/C2/C2.5/C3/C3.5.
jetson-stats / NVML → thermal-throttle telemetry source.

Downstream consumers:

C2 VPR (backbone forward pass).
C2.5 ReRanker (LightGlue forward pass).
C3 CrossDomainMatcher (DISK / LightGlue / ALIKED / XFeat forward passes).
C3.5 AdHoP (conditional refinement backbone).
C1 (only the strategies that have a CUDA path; KltRansac is CPU-only).
C4 (consumes ThermalState for the D-CROSS-LATENCY-1 covariance-mode decision).

2. Internal Interfaces

Interface: `InferenceRuntime`

Method	Input	Output	Async	Error Types
`compile_engine`	`model_path: Path, build_config: BuildConfig`	`EngineCacheEntry`	No (offline)	`EngineBuildError`, `CalibrationCacheError`
`deserialize_engine`	`EngineCacheEntry`	`EngineHandle`	No	`EngineDeserializeError`
`infer`	`EngineHandle, inputs: dict[str, Tensor]`	`dict[str, Tensor]`	No (sync GPU stream)	`InferenceError`, `OutOfMemoryError`
`release_engine`	`EngineHandle`	`None`	No	—
`thermal_state`	`()`	`ThermalState`	No	`TelemetryUnavailableError`
`current_runtime_label`	`()`	`string`	No	—

Input/Output DTOs:

BuildConfig:
  precision:                   enum {fp16, int8, mixed}
  workspace_mb:                int
  calibration_dataset:         Path (required for int8)
  optimization_profiles:       list[(input_name, min_shape, opt_shape, max_shape)]

EngineCacheEntry:              see data_model.md
EngineHandle:                  opaque GPU-resident handle

ThermalState:
  cpu_temp_c:                  float
  gpu_temp_c:                  float
  thermal_throttle_active:     bool
  measured_clock_mhz:          int
  measured_at:                 monotonic_ns
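The DTOs above could be expressed as frozen dataclasses, sketched here under a few assumptions: shapes are tuples of ints, `measured_at` holds a `time.monotonic_ns()` reading, and only the "required for int8" rule is enforced (whether `mixed` also needs a calibration dataset is not stated in the source, so it is not checked).

```python
from dataclasses import dataclass
from enum import Enum
from pathlib import Path
from typing import Optional

Shape = tuple[int, ...]
# (input_name, min_shape, opt_shape, max_shape)
OptimizationProfile = tuple[str, Shape, Shape, Shape]


class Precision(Enum):
    FP16 = "fp16"
    INT8 = "int8"
    MIXED = "mixed"


@dataclass(frozen=True)
class BuildConfig:
    precision: Precision
    workspace_mb: int
    optimization_profiles: tuple[OptimizationProfile, ...] = ()
    calibration_dataset: Optional[Path] = None

    def __post_init__(self) -> None:
        # The spec marks calibration_dataset as required for int8.
        if self.precision is Precision.INT8 and self.calibration_dataset is None:
            raise ValueError("calibration_dataset is required for int8")
        if self.workspace_mb <= 0:
            raise ValueError("workspace_mb must be positive")


@dataclass(frozen=True)
class ThermalState:
    cpu_temp_c: float
    gpu_temp_c: float
    thermal_throttle_active: bool
    measured_clock_mhz: int
    measured_at: int  # assumed to be a time.monotonic_ns() reading
```

Freezing both types keeps them safe to share between the polling thread and the hot path without copying.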

3. External API Specification

Not applicable.

4. Data Access Patterns

Queries

Query	Frequency	Hot Path	Index Needed
`infer` for VPR backbone	3 Hz	Yes	n/a
`infer` for LightGlue (×10 in C2.5, ×3 in C3)	3 Hz × 13 = 39 Hz	Yes	n/a
`infer` for AdHoP (conditional)	<1 Hz typical	Yes (when invoked)	n/a
`thermal_state` poll	1 Hz from C4	No (sampled, not per-frame)	n/a
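The call rates in the table follow from a 3 Hz frame rate multiplied by the per-frame call counts. A short worked check, assuming one VPR backbone call per frame (implied by the 3 Hz row) and excluding the conditional AdHoP path; the dictionary keys are illustrative names only:

```python
FRAME_RATE_HZ = 3  # frame rate implied by the table rows

# infer calls issued per frame on the unconditional hot path.
CALLS_PER_FRAME = {
    "vpr_backbone": 1,
    "lightglue_c2_5": 10,
    "lightglue_c3": 3,
}


def calls_per_second(calls_per_frame: int, frame_rate_hz: int = FRAME_RATE_HZ) -> int:
    return calls_per_frame * frame_rate_hz


lightglue_hz = calls_per_second(
    CALLS_PER_FRAME["lightglue_c2_5"] + CALLS_PER_FRAME["lightglue_c3"]
)  # 3 * 13 = 39
total_hz = sum(calls_per_second(n) for n in CALLS_PER_FRAME.values())  # 3 + 30 + 9 = 42
```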

Caching Strategy

Data	Cache Type	TTL	Invalidation
Compiled `.engine` files	filesystem keyed by `(SM, JP, TRT, precision)` (D-C10-7)	bounded by JetPack/TRT version stability	manifest content-hash gate at takeoff (D-C10-3)
INT8 calibration cache	filesystem alongside `.engine` (D-C10-6)	bounded by calibration dataset version	rebuild when calibration dataset hash changes
Resident engine handles	GPU memory	flight lifetime	F8 reboot recovery re-deserialises

Storage Estimates

Table/Collection	Est. Row Count (1yr)	Row Size	Total Size	Growth Rate
`.engine` files	one per (model × precision × backbone)	50 MB – 500 MB / engine	up to ~1.5 GB across all backbones for a deployment binary	bounded by AC-8.3 carve-out
INT8 calibration caches	one per engine	1–10 MB	<50 MB	as above

Data Management

Seed data: pre-flight F1 provisioning compiles engines (or reuses cached). No mid-flight compilation.

Rollback: D-C10-7 self-describing filename schema (<model>__sm<SM>_jp<JP>_trt<TRT>_<precision>.engine) makes stale engines visually obvious; F2 takeoff load refuses to deserialize an engine whose metadata doesn't match the host's current (SM, JP, TRT) tuple.
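The filename schema and the takeoff-time tuple check can be sketched as follows. This is an illustration, not the project's `EngineFilenameSchema`: the `EngineKey` type, the function names, and the encoding of each field (for example whether versions contain dots) are assumptions, and the values in the usage note are illustrative.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class EngineKey:
    model: str
    sm: str
    jp: str
    trt: str
    precision: str


# <model>__sm<SM>_jp<JP>_trt<TRT>_<precision>.engine
_PATTERN = re.compile(
    r"^(?P<model>.+?)__sm(?P<sm>[^_]+)_jp(?P<jp>[^_]+)"
    r"_trt(?P<trt>[^_]+)_(?P<precision>[^_.]+)\.engine$"
)


def format_engine_filename(key: EngineKey) -> str:
    return f"{key.model}__sm{key.sm}_jp{key.jp}_trt{key.trt}_{key.precision}.engine"


def parse_engine_filename(name: str) -> EngineKey:
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a schema-conformant engine filename: {name!r}")
    return EngineKey(**match.groupdict())


def matches_host(key: EngineKey, host_sm: str, host_jp: str, host_trt: str) -> bool:
    """True only when the engine was built for this exact (SM, JP, TRT) tuple."""
    return (key.sm, key.jp, key.trt) == (host_sm, host_jp, host_trt)
```

For example, `parse_engine_filename("ultravpr__sm87_jp6.2_trt10.3_fp16.engine")` yields a key that `matches_host(key, "87", "6.2", "10.3")` accepts and that any other host tuple rejects, which is the condition under which F2 would refuse to deserialize.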

5. Implementation Details

Algorithmic Complexity: per-model forward pass cost is the design driver. Engine build time is dominated by the TensorRT optimizer's search over candidate kernels: minutes for INT8 with calibration, under a minute for FP16.

State Management:

Owns the CUDA stream(s) for the runtime; one stream per concurrent consumer (typically one stream because the F3 hot path is single-threaded).
Owns the resident engine handles for the duration of a flight.
Owns the polling loop for thermal-throttle telemetry (1 Hz background thread).

Key Dependencies:

Library	Version	Purpose
TensorRT (C++ + Python)	10.3 (JetPack 6.2 pin) per D-C7-9	Primary engine compile + deserialize + infer
Polygraphy	matches TensorRT	Engine build orchestration
trtexec	bundled with TensorRT	Alternate engine build path
ONNX Runtime + TRT EP	per project pin	Fallback runtime
PyTorch	per simple-baseline pin	FP16 baseline (mandatory)
jetson-stats / pynvml	latest	Thermal-throttle telemetry source

Error Handling Strategy:

EngineBuildError: surface to C10/operator pre-flight; takeoff blocked. Never silently fall back between runtimes — if the configured runtime can't build, the operator must explicitly switch.
EngineDeserializeError at takeoff: refuse takeoff with explicit (SM, JP, TRT, precision) mismatch detail.
InferenceError mid-flight (rare; e.g., transient CUDA fault): emit no result for that frame; the consumer (C2/C3) reports its own degraded path.
OutOfMemoryError: handled like InferenceError (no result for that frame; the consumer reports its own degraded path); additionally surfaced to C13 FDR and C12 operator-tooling for post-flight investigation.
TelemetryUnavailableError: jetson-stats hung or unavailable. Default to "thermal_throttle_active = false" (D-CROSS-LATENCY-1 stays on the steady-state path); log WARN.
CalibrationCacheError: per D-C10-6, calibration cache trust is critical; if the cache hash mismatches, refuse to use it and force a rebuild.
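Two of the policies above, sketched from the caller's side. Representing "no result for that frame" as `None` is an assumption, as are the helper names and the log wording; the exception classes are declared locally only to keep the example self-contained.

```python
import logging
from typing import Any, Callable, Optional


class InferenceError(RuntimeError):
    pass


class OutOfMemoryError(RuntimeError):
    pass


class TelemetryUnavailableError(RuntimeError):
    pass


def infer_or_none(infer: Callable[[], Any], log: logging.Logger, frame: int) -> Optional[Any]:
    """Run one inference; on a mid-flight fault emit no result for this frame."""
    try:
        return infer()
    except (InferenceError, OutOfMemoryError) as exc:
        log.error("C7 infer failed; frame=%d; error=%s", frame, type(exc).__name__)
        return None  # consumer (C2/C3) takes its own degraded path


def throttle_active_or_default(read_throttle: Callable[[], bool], log: logging.Logger) -> bool:
    """Return the throttle flag, defaulting to False when telemetry is unavailable."""
    try:
        return read_throttle()
    except TelemetryUnavailableError:
        log.warning("C7 thermal telemetry unavailable; assuming thermal_throttle_active=false")
        return False
```

Build-time and takeoff-time errors are deliberately absent from this sketch: per the list above they block takeoff rather than being absorbed.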

6. Extensions and Helpers

Helper	Purpose	Used By
`EngineFilenameSchema`	self-describing filename per D-C10-7	C7, C10
`Sha256Sidecar`	atomic write + content-hash sidecar pattern	C6, C7, C10
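One way the atomic-write plus content-hash sidecar pattern could look. This is a sketch, not the project's `Sha256Sidecar`: the `.sha256` suffix, the function names, and the write order (payload first, sidecar second) are assumptions.

```python
import contextlib
import hashlib
import os
import tempfile
from pathlib import Path


def sidecar_path(target: Path) -> Path:
    return target.with_name(target.name + ".sha256")


def _atomic_write(path: Path, data: bytes) -> None:
    # Write to a temp file in the same directory, then rename over the target.
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp, path)  # atomic rename within one filesystem
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp)
        raise


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_with_sidecar(target: Path, payload: bytes) -> str:
    digest = hashlib.sha256(payload).hexdigest()
    _atomic_write(target, payload)
    _atomic_write(sidecar_path(target), digest.encode("ascii"))
    return digest


def verify_sidecar(target: Path) -> bool:
    """True when the sidecar exists and matches the file's current content hash."""
    sidecar = sidecar_path(target)
    if not (target.exists() and sidecar.exists()):
        return False
    return sidecar.read_text(encoding="ascii").strip() == file_sha256(target)
```

Hashing in chunks keeps memory flat for engine files in the hundreds of megabytes.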

7. Caveats & Edge Cases

Known limitations:

TensorRT engines are NOT portable across (SM, JP, TRT, precision) tuples; Tier-1 (workstation Docker) cannot reuse Tier-2 (Jetson) engines. CI emits both tiers' engines as artifacts.
INT8 calibration cache trust is the lurking foot-gun; D-C10-6 manifest-hash gate is the only protection. Any deviation breaks NFT-PERF-01 / NFT-LIM-01.

Potential race conditions:

The thermal-throttle polling thread MUST NOT block, or race with, the F3 hot path's infer calls. Publish ThermalState as an immutable snapshot that thermal_state reads lock-free via a single atomic swap.
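A minimal Python sketch of the snapshot approach, assuming CPython, where rebinding an attribute to a new immutable object is a single reference store, so a reader sees either the old or the new snapshot and never a partially updated one. `ThermalMonitor` and `read_sensor` are hypothetical names, sensor error handling is omitted, and a C++ implementation would need an explicit atomic pointer or equivalent instead.

```python
import threading
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ThermalState:
    cpu_temp_c: float
    gpu_temp_c: float
    thermal_throttle_active: bool
    measured_clock_mhz: int
    measured_at: int


class ThermalMonitor:
    def __init__(self, read_sensor: Callable[[], ThermalState], period_s: float = 1.0):
        self._read_sensor = read_sensor  # assumed not to raise in this sketch
        self._period_s = period_s
        self._snapshot: Optional[ThermalState] = None
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def poll_once(self) -> None:
        # Build the new state fully, then publish it with one reference rebind.
        self._snapshot = self._read_sensor()

    def _run(self) -> None:
        while not self._stop.is_set():
            self.poll_once()
            self._stop.wait(self._period_s)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()

    def thermal_state(self) -> Optional[ThermalState]:
        # No lock: the hot path only reads the current reference.
        return self._snapshot
```

The hot path never waits on the poller; at worst it reads a snapshot that is one polling period old, which `measured_at` makes visible to the consumer.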

Performance bottlenecks:

Per-frame inference cost is the F3 hot path's largest contributor. NFT-PERF-01 partition is the source of truth.

8. Dependency Graph

Must be implemented after: nothing internal — C7 is foundational.

Can be implemented in parallel with: C6, C13.

Blocks: C1 (CUDA strategies), C2, C2.5, C3, C3.5, C4 (consumes ThermalState), C10, F1, F2, F3, F6, F8.

9. Logging Strategy

Log Level	When	Example
ERROR	`EngineBuildError`, `EngineDeserializeError`, `OutOfMemoryError`, `CalibrationCacheError`	`C7 OOM during infer; backbone=ultravpr; frame=12345`
WARN	thermal-throttle entered/exited; telemetry unavailable	`C7 thermal throttle active; gpu_temp=83C; clock=750mhz`
INFO	Strategy ready; engine deserialised; backbone resident	`C7 ready: runtime=tensorrt, engines=[ultravpr@fp16, lightglue@fp16, disk@fp16]`
DEBUG	per-frame infer timing per backbone	`C7 infer backbone=ultravpr frame=12345 took=37ms`

Log format: structured JSON. Log storage: stdout / journald / FDR via C13 (ERROR + WARN always; thermal-state transitions always to FDR).
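A structured-JSON rendering of the example lines above might look like the following sketch, built on the standard `logging` module. The field names (`level`, `component`, `event`, and the per-event keys) and the `fields` extra are assumptions; the source fixes only that the format is structured JSON.

```python
import json
import logging
import sys
from typing import TextIO


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "component": "C7",
            "event": record.getMessage(),
        }
        # Per-event key/value pairs passed via extra={"fields": {...}}.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, sort_keys=True)


def make_logger(stream: TextIO = sys.stdout, level: int = logging.DEBUG) -> logging.Logger:
    logger = logging.getLogger("c7")
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(level)
    logger.propagate = False
    return logger
```

With this in place, `logger.debug("infer", extra={"fields": {"backbone": "ultravpr", "frame": 12345, "took_ms": 37}})` emits a single JSON line carrying the same information as the DEBUG example in the table.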
