mirror of https://github.com/azaion/gps-denied-onboard.git synced 2026-06-21 18:51:12 +00:00

Oleksandr Bezdieniezhnykh 76f460c88a [autodev] Step 13 partial: c6/c7/c8 cycle-1 doc sync

Batch 3 of the cycle-1 component-doc sync. For each of C6
(tile_cache), C7 (inference), C8 (fc_adapter):

- Append "Cycle-1 operational reality" paragraph to § 1
  documenting the actual cycle-1 wiring path:
  - C6: infrastructure seeded via build_pre_constructed's
    c6_descriptor_index (BUILD_FAISS_INDEX-gated) and
    c6_tile_store slots; no _STRATEGY_REGISTRY slot;
    AZ-687 replay-mode guard skips both seeds when the
    minimal replay Config omits the c6_tile_cache block.
  - C7: single InferenceRuntime built once via
    _build_c7_inference, identity-shared as the engine
    source for c3_lightglue_runtime (AZ-622 phase D);
    C7_AIRBORNE_BUILD_FLAGS lists tensorrt (production-
    default) + pytorch_fp16 (Tier-0 fallback);
    onnx_trt_ep deliberately omitted from airborne flags;
    AZ-687 replay-mode guard cascades to c3_lightglue_runtime.
  - C8: composed via a SEPARATE registry path
    (runtime_root/fc_factory.py) with its own _FC_REGISTRY
    + _GCS_REGISTRY; per-binary bootstrap modules register
    concrete strategies under BUILD_FC_* / BUILD_GCS_*
    flags; bind_outbound_emit_thread enforces the
    single-writer outbound invariant (AC-6).

- Add "Cycle-1 Tier-2 follow-up dependencies" subsection
  in § 7 of C7 only: onnx_trt_ep is implemented and the
  inference_factory recognises BUILD_ONNX_TRT_EP_RUNTIME,
  but airborne config selecting it raises a clean
  AirborneBootstrapError pointing only at the two airborne
  options. C6 and C8 have no parked Tier-2 strategies for
  cycle-1.

None of c6/c7/c8 import cv2 directly, so no OpenCV pin
row is added to § 5 (D-CROSS-CVE-1 leftover stays as it
is; the relaxed pin is recorded against c2.5/c3/c3.5/c4/c5
where the imports actually live).

Also refresh the D-CROSS-CVE-1 leftover replay timestamp
(condition still upstream-gated: gtsam wheels remain
numpy<2) and bump the autodev state's sub_step.detail to
record "batch 3/~5 done (c6/c7/c8); 4 components + 8
helpers + tests/ remain".

Co-authored-by: Cursor <cursoragent@cursor.com>

2026-05-19 17:17:33 +03:00

C7 — On-Jetson Inference Runtime

1. High-Level Overview

Purpose: provide a single inference-runtime abstraction that all GPU-using components (C1 selectively, C2, C2.5, C3, C3.5) consume. Owns engine compilation (Polygraphy / trtexec / IBuilderConfig hybrid), engine deserialization at takeoff load, GPU memory management, INT8 calibration cache trust, and the thermal-throttle telemetry feed that drives the D-CROSS-LATENCY-1 hybrid in C4.

Architectural Pattern: Strategy — InferenceRuntime interface with three concrete implementations: TensorrtRuntime (production-default per D-C7-9 JetPack 6.2 + TensorRT 10.3 lock), OnnxTrtEpRuntime (fallback), PytorchFp16Runtime (mandatory simple-baseline). Selection at startup by config (ADR-001), build-time gating by BUILD_* flags (ADR-002), composition-root wired (ADR-009).

Cycle-1 operational reality: C7 is infrastructure shared across consumers; it does NOT have its own slot in the _STRATEGY_REGISTRY populated by register_airborne_strategies() (AZ-591). Instead the airborne binary builds the InferenceRuntime once via runtime_root/airborne_bootstrap.py::_build_c7_inference → inference_factory.build_inference_runtime, and seeds the single instance into pre_constructed["c7_inference"] (AZ-621 / Phase C). The same instance is reused as the engine source for the shared LightGlueRuntime load (AZ-622 / Phase D, _build_c3_lightglue_runtime), so the bootstrap never double-builds the runtime; downstream wrappers (c2_vpr / c3_matcher / c3_5_adhop, per AIRBORNE_REQUIRED_PRE_CONSTRUCTED_KEYS) then receive the identity-shared runtime via compose_root's constructor injection.

Airborne-buildable runtimes are gated by C7_AIRBORNE_BUILD_FLAGS = (("tensorrt", "BUILD_TENSORRT_RUNTIME"), ("pytorch_fp16", "BUILD_PYTORCH_FP16_RUNTIME")): tensorrt is the production-default, pytorch_fp16 is the Tier-0 / workstation fallback (and is the conservative C7InferenceConfig.runtime default so unconfigured test environments resolve to the Tier-0 baseline). onnx_trt_ep is deliberately omitted from the airborne flag matrix even though inference_factory._RUNTIME_TO_BUILD_FLAG recognises it; see § 7 Tier-2 follow-up.

When no airborne runtime is buildable (both BUILD_TENSORRT_RUNTIME and BUILD_PYTORCH_FP16_RUNTIME OFF, or the configured runtime's flag is OFF) and any configured consumer still requires c7_inference, _build_c7_inference surfaces the upstream RuntimeNotAvailableError as an AirborneBootstrapError (AC-621.2) naming the missing key, BOTH airborne BUILD_* flags + their runtimes, and the consuming component slug(s), narrowed to the configured consumers when available.

AZ-687 replay-mode guard: when config.mode == "replay" and the minimal replay Config omits the c7_inference block, build_pre_constructed skips both c7_inference AND the cascading c3_lightglue_runtime seed (the LightGlue runtime depends on the inference runtime); the c2_vpr / c3_matcher / c2_5_rerank / c3_5_adhop wrappers that would have consumed the runtime are likewise absent from the replay Config and therefore never look at the skipped slot.
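
The build-once, identity-shared wiring and the flag gate described above can be sketched as follows. This is a minimal illustration, not the code in runtime_root/airborne_bootstrap.py: only C7_AIRBORNE_BUILD_FLAGS, the slot names, and the AirborneBootstrapError name come from the text; the `sketch_*` functions, `FakeRuntime`, and the error message wording are hypothetical.

```python
from dataclasses import dataclass

# From the text above: the airborne flag matrix as (runtime name, build flag).
C7_AIRBORNE_BUILD_FLAGS = (
    ("tensorrt", "BUILD_TENSORRT_RUNTIME"),
    ("pytorch_fp16", "BUILD_PYTORCH_FP16_RUNTIME"),
)


class AirborneBootstrapError(RuntimeError):
    """Raised when a required pre-constructed slot cannot be built."""


@dataclass
class FakeRuntime:
    # Hypothetical stand-in for a concrete InferenceRuntime.
    label: str


def sketch_build_c7_inference(runtime, enabled_flags, consumers):
    """Return a runtime, or raise naming the key, both flags and the consumers."""
    flag = dict(C7_AIRBORNE_BUILD_FLAGS).get(runtime)
    if flag is None or flag not in enabled_flags:
        options = ", ".join(f"{name} ({f})" for name, f in C7_AIRBORNE_BUILD_FLAGS)
        raise AirborneBootstrapError(
            f"cannot build 'c7_inference' (runtime={runtime!r}); "
            f"airborne options: {options}; required by: {', '.join(consumers)}"
        )
    return FakeRuntime(label=runtime)


def sketch_build_pre_constructed(runtime, enabled_flags, consumers):
    """Build the runtime once and share the same object with the LightGlue slot."""
    pre = {}
    pre["c7_inference"] = sketch_build_c7_inference(runtime, enabled_flags, consumers)
    # Assumption: the LightGlue slot is modelled as a dict holding its engine source.
    pre["c3_lightglue_runtime"] = {"engine_source": pre["c7_inference"]}
    return pre
```

In this sketch a runtime that is absent from the airborne tuple (for example onnx_trt_ep) fails the same way as a runtime whose flag is OFF, so the error only ever lists the two airborne options.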

Upstream dependencies:

C10 CacheProvisioner → during F1 (after C11 TileDownloader has populated C6) triggers engine compilation when no cached engine matches the (SM, JP, TRT, precision) tuple.
F2 takeoff load → triggers deserialize_cached_engine for every model used by C1/C2/C2.5/C3/C3.5.
jetson-stats / NVML → thermal-throttle telemetry source.

Downstream consumers:

C2 VPR (backbone forward pass).
C2.5 ReRanker (LightGlue forward pass).
C3 CrossDomainMatcher (DISK / LightGlue / ALIKED / XFeat forward passes).
C3.5 AdHoP (conditional refinement backbone).
C1 (only the strategies that have a CUDA path; KltRansac is CPU-only).
C4 (consumes ThermalState for the D-CROSS-LATENCY-1 covariance-mode decision).

2. Internal Interfaces

Interface: `InferenceRuntime`

Method	Input	Output	Async	Error Types
`compile_engine`	`model_path: Path, build_config: BuildConfig`	`EngineCacheEntry`	No (offline)	`EngineBuildError`, `CalibrationCacheError`
`deserialize_engine`	`EngineCacheEntry`	`EngineHandle`	No	`EngineDeserializeError`
`infer`	`EngineHandle, inputs: dict[str, Tensor]`	`dict[str, Tensor]`	No (sync GPU stream)	`InferenceError`, `OutOfMemoryError`
`release_engine`	`EngineHandle`	`None`	No	—
`thermal_state`	`()`	`ThermalState`	No	`TelemetryUnavailableError`
`current_runtime_label`	`()`	`string`	No	—
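
One way to express the interface table in Python is a `typing.Protocol`. The method names and argument names below follow the table; the use of `Any` for the DTO types and the `EchoRuntime` test double are assumptions made for illustration, not the project's actual definitions.

```python
from pathlib import Path
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class InferenceRuntime(Protocol):
    """Structural interface mirroring the method table above."""

    def compile_engine(self, model_path: Path, build_config: Any) -> Any: ...
    def deserialize_engine(self, entry: Any) -> Any: ...
    def infer(self, handle: Any, inputs: dict[str, Any]) -> dict[str, Any]: ...
    def release_engine(self, handle: Any) -> None: ...
    def thermal_state(self) -> Any: ...
    def current_runtime_label(self) -> str: ...


class EchoRuntime:
    """Hypothetical test double: returns its inputs unchanged."""

    def compile_engine(self, model_path, build_config):
        return {"model_path": model_path, "build_config": build_config}

    def deserialize_engine(self, entry):
        return object()  # opaque handle

    def infer(self, handle, inputs):
        return dict(inputs)

    def release_engine(self, handle):
        return None

    def thermal_state(self):
        return None

    def current_runtime_label(self):
        return "echo"
```

Note that `isinstance(obj, InferenceRuntime)` with `runtime_checkable` only checks that the methods exist, not their signatures.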

Input/Output DTOs:

BuildConfig:
  precision:                   enum {fp16, int8, mixed}
  workspace_mb:                int
  calibration_dataset:         Path (required for int8)
  optimization_profiles:       list[(input_name, min_shape, opt_shape, max_shape)]

EngineCacheEntry:              see data_model.md
EngineHandle:                  opaque GPU-resident handle

ThermalState:
  cpu_temp_c:                  float
  gpu_temp_c:                  float
  thermal_throttle_active:     bool
  measured_clock_mhz:          int
  measured_at:                 monotonic_ns
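
The two fully specified DTOs map naturally onto frozen dataclasses. The sketch below takes field names and the "required for int8" rule from the listing above; the concrete Python types (for example shapes as integer tuples, `measured_at` as an `int` of nanoseconds) are assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from pathlib import Path
from typing import Optional


class Precision(Enum):
    FP16 = "fp16"
    INT8 = "int8"
    MIXED = "mixed"


Shape = tuple[int, ...]
# (input_name, min_shape, opt_shape, max_shape)
Profile = tuple[str, Shape, Shape, Shape]


@dataclass(frozen=True)
class BuildConfig:
    precision: Precision
    workspace_mb: int
    calibration_dataset: Optional[Path] = None
    optimization_profiles: tuple[Profile, ...] = ()

    def __post_init__(self):
        # The listing marks calibration_dataset as required for int8.
        if self.precision is Precision.INT8 and self.calibration_dataset is None:
            raise ValueError("calibration_dataset is required for int8")


@dataclass(frozen=True)
class ThermalState:
    cpu_temp_c: float
    gpu_temp_c: float
    thermal_throttle_active: bool
    measured_clock_mhz: int
    measured_at: int  # monotonic nanoseconds, e.g. time.monotonic_ns()
```

Freezing both types keeps them safe to hand across threads, which matters for ThermalState because it is produced by a background poller.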

3. External API Specification

Not applicable.

4. Data Access Patterns

Queries

Query	Frequency	Hot Path	Index Needed
`infer` for VPR backbone	3 Hz	Yes	n/a
`infer` for LightGlue (×10 in C2.5, ×3 in C3)	3 Hz × 13 = 39 Hz	Yes	n/a
`infer` for AdHoP (conditional)	<1 Hz typical	Yes (when invoked)	n/a
`thermal_state` poll	1 Hz from C4	No (sampled, not per-frame)	n/a
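
The LightGlue call rate in the table follows from the per-frame call counts. The short calculation below reproduces it; the final figure is only a derived upper bound on the time available per call if the calls ran strictly back-to-back and nothing else used the frame period, not a budget stated by this document.

```python
FRAME_RATE_HZ = 3           # from the table
LIGHTGLUE_CALLS_C2_5 = 10   # from the table (x10 in C2.5)
LIGHTGLUE_CALLS_C3 = 3      # from the table (x3 in C3)

calls_per_frame = LIGHTGLUE_CALLS_C2_5 + LIGHTGLUE_CALLS_C3
lightglue_rate_hz = FRAME_RATE_HZ * calls_per_frame

# Illustrative upper bound only: wall time per call if calls were serialized.
max_slot_ms = round(1000 / lightglue_rate_hz, 1)

print(calls_per_frame, lightglue_rate_hz)  # 13 39
```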

Caching Strategy

Data	Cache Type	TTL	Invalidation
Compiled `.engine` files	filesystem keyed by `(SM, JP, TRT, precision)` (D-C10-7)	bounded by JetPack/TRT version stability	manifest content-hash gate at takeoff (D-C10-3)
INT8 calibration cache	filesystem alongside `.engine` (D-C10-6)	bounded by calibration dataset version	rebuild when calibration dataset hash changes
Resident engine handles	GPU memory	flight lifetime	F8 reboot recovery re-deserialises

Storage Estimates

Table/Collection	Est. Row Count (1yr)	Row Size	Total Size	Growth Rate
`.engine` files	one per (model × precision × backbone)	50 MB – 500 MB / engine	up to ~1.5 GB across all backbones for a deployment binary	bounded by AC-8.3 carve-out
INT8 calibration caches	one per engine	1–10 MB	<50 MB	as above

Data Management

Seed data: pre-flight F1 provisioning compiles engines (or reuses cached). No mid-flight compilation.

Rollback: D-C10-7 self-describing filename schema (<model>__sm<SM>_jp<JP>_trt<TRT>_<precision>.engine) makes stale engines visually obvious; F2 takeoff load refuses to deserialize an engine whose metadata doesn't match the host's current (SM, JP, TRT) tuple.
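
A minimal sketch of formatting and parsing the self-describing filename, plus the host-tuple check that F2 performs before deserializing. The function names are hypothetical and are not the project's EngineFilenameSchema helper; the sketch also assumes that the SM, JP, TRT and precision values contain no underscore and that the model name does not itself contain `__sm`.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class EngineKey:
    model: str
    sm: str
    jp: str
    trt: str
    precision: str


_ENGINE_RE = re.compile(
    r"^(?P<model>.+?)__sm(?P<sm>[^_]+)_jp(?P<jp>[^_]+)"
    r"_trt(?P<trt>[^_]+)_(?P<precision>[^_.]+)\.engine$"
)


def format_engine_filename(key: EngineKey) -> str:
    return f"{key.model}__sm{key.sm}_jp{key.jp}_trt{key.trt}_{key.precision}.engine"


def parse_engine_filename(name: str) -> EngineKey:
    match = _ENGINE_RE.match(name)
    if match is None:
        raise ValueError(f"not a schema-conformant engine filename: {name!r}")
    return EngineKey(**match.groupdict())


def matches_host(key: EngineKey, host_sm: str, host_jp: str, host_trt: str) -> bool:
    """True only when the engine was built for this exact (SM, JP, TRT) tuple."""
    return (key.sm, key.jp, key.trt) == (host_sm, host_jp, host_trt)
```

For example, with illustrative values, `parse_engine_filename("lightglue__sm87_jp6.2_trt10.3_fp16.engine")` yields a key that `matches_host(key, "87", "6.2", "10.3")` accepts and that any other host tuple rejects.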

5. Implementation Details

Algorithmic Complexity: per-model forward pass cost is the design driver. Engine build time is governed by the builder's optimizer search rather than by a closed-form bound: minutes for INT8 with calibration, sub-minute for FP16.

State Management:

Owns the CUDA stream(s) for the runtime; one stream per concurrent consumer (typically one stream because the F3 hot path is single-threaded).
Owns the resident engine handles for the duration of a flight.
Owns the polling loop for thermal-throttle telemetry (1 Hz background thread).

Key Dependencies:

Library	Version	Purpose
TensorRT (C++ + Python)	10.3 (JetPack 6.2 pin) per D-C7-9	Primary engine compile + deserialize + infer
Polygraphy	matches TensorRT	Engine build orchestration
trtexec	bundled with TensorRT	Alternate engine build path
ONNX Runtime + TRT EP	per project pin	Fallback runtime
PyTorch	per simple-baseline pin	FP16 baseline (mandatory)
jetson-stats / pynvml	latest	Thermal-throttle telemetry source

Error Handling Strategy:

EngineBuildError: surface to C10/operator pre-flight; takeoff blocked. Never silently fall back between runtimes — if the configured runtime can't build, the operator must explicitly switch.
EngineDeserializeError at takeoff: refuse takeoff with explicit (SM, JP, TRT, precision) mismatch detail.
InferenceError mid-flight (rare; e.g., transient CUDA fault): emit no result for that frame; the consumer (C2/C3) reports its own degraded path.
OutOfMemoryError: same as above; surface to C13 FDR and C12 operator-tooling for post-flight investigation.
TelemetryUnavailableError: jetson-stats hung or unavailable. Default to "thermal_throttle_active = false" (D-CROSS-LATENCY-1 stays on the steady-state path); log WARN.
CalibrationCacheError: per D-C10-6, calibration cache trust is critical; if the cache hash mismatches, refuse to use it and force a rebuild.
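
The TelemetryUnavailableError rule above (default to no throttle, log WARN) can be sketched as a small wrapper. The wrapper name, the `read_telemetry` callable, and the placeholder values for the temperature and clock fields are assumptions; only the `thermal_throttle_active = false` default and the WARN log come from the text.

```python
import logging
import math
import time
from dataclasses import dataclass

log = logging.getLogger("c7")


class TelemetryUnavailableError(RuntimeError):
    """Telemetry source hung or unavailable."""


@dataclass(frozen=True)
class ThermalState:
    cpu_temp_c: float
    gpu_temp_c: float
    thermal_throttle_active: bool
    measured_clock_mhz: int
    measured_at: int


def thermal_state_or_default(read_telemetry) -> ThermalState:
    """Return live telemetry, or a 'not throttled' default when it is unavailable."""
    try:
        return read_telemetry()
    except TelemetryUnavailableError as exc:
        log.warning("C7 telemetry unavailable; assuming no throttle: %s", exc)
        return ThermalState(
            cpu_temp_c=math.nan,            # assumption: NaN marks "unknown"
            gpu_temp_c=math.nan,
            thermal_throttle_active=False,  # from the text: steady-state path
            measured_clock_mhz=0,           # assumption: 0 marks "unknown"
            measured_at=time.monotonic_ns(),
        )
```

Other exception types are deliberately not caught, so unexpected faults still surface instead of being mapped to the steady-state default.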

6. Extensions and Helpers

Helper	Purpose	Used By
`EngineFilenameSchema`	self-describing filename per D-C10-7	C7, C10
`Sha256Sidecar`	atomic write + content-hash sidecar pattern	C6, C7, C10
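
The "atomic write + content-hash sidecar" pattern named in the table can be sketched with the standard library alone. The function names and the `.sha256` sidecar suffix are assumptions for illustration, not the project's Sha256Sidecar API.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def _atomic_write(path: Path, data: bytes) -> None:
    """Write to a temp file in the same directory, fsync, then rename over path."""
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomic rename on the same filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def _sidecar_path(path: Path) -> Path:
    # Assumption: the sidecar sits next to the payload as <name>.sha256
    return path.with_name(path.name + ".sha256")


def write_with_sidecar(path: Path, data: bytes) -> str:
    """Write the payload, then its SHA-256 hex digest as a sidecar file."""
    digest = hashlib.sha256(data).hexdigest()
    _atomic_write(path, data)
    _atomic_write(_sidecar_path(path), (digest + "\n").encode("ascii"))
    return digest


def verify_sidecar(path: Path) -> bool:
    """True when both files exist and the payload hash matches the sidecar."""
    sidecar = _sidecar_path(path)
    if not (path.exists() and sidecar.exists()):
        return False
    expected = sidecar.read_text(encoding="ascii").strip()
    return hashlib.sha256(path.read_bytes()).hexdigest() == expected
```

Writing the payload before the sidecar means a crash between the two steps leaves a payload whose sidecar is missing or stale, which `verify_sidecar` rejects rather than trusting.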

7. Caveats & Edge Cases

Known limitations:

TensorRT engines are NOT portable across (SM, JP, TRT, precision) tuples; Tier-1 (workstation Docker) cannot reuse Tier-2 (Jetson) engines. CI emits both tiers' engines as artifacts.
INT8 calibration cache trust is the lurking foot-gun; D-C10-6 manifest-hash gate is the only protection. Any deviation breaks NFT-PERF-01 / NFT-LIM-01.

Potential race conditions:

The thermal-throttle polling thread MUST NOT block or race with the F3 hot path's infer calls. Publish thermal_state as a lock-free atomic snapshot so that readers never observe a partially updated value.
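
A minimal Python sketch of the snapshot idea: the poller builds a new immutable object and publishes it with a single reference assignment, and readers only ever dereference that one attribute. This relies on attribute rebinding being effectively atomic in CPython; a C++ implementation would use an explicit atomic instead. The class and field names are hypothetical.

```python
import threading
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ThermalSnapshot:
    gpu_temp_c: float
    thermal_throttle_active: bool
    measured_at: int


class ThermalMonitor:
    """Background poller that publishes immutable snapshots without a lock."""

    def __init__(self, read_sensor: Callable[[], ThermalSnapshot], period_s: float = 1.0):
        self._read_sensor = read_sensor
        self._period_s = period_s
        self._latest: Optional[ThermalSnapshot] = None
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()

    def _run(self) -> None:
        while not self._stop.is_set():
            # Build the whole snapshot first, then swap the reference once.
            self._latest = self._read_sensor()
            self._stop.wait(self._period_s)

    def snapshot(self) -> Optional[ThermalSnapshot]:
        # Hot-path read: one attribute load, never blocks on the poller.
        return self._latest
```

Because each snapshot is frozen and replaced wholesale, a reader sees either the previous state or the new one, never a mix of fields from both.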

Performance bottlenecks:

Per-frame inference cost is the F3 hot path's largest contributor. NFT-PERF-01 partition is the source of truth.

Cycle-1 Tier-2 follow-up dependencies:

OnnxTrtEpRuntime — the module + class are implemented and the lower-level inference_factory._RUNTIME_TO_BUILD_FLAG maps "onnx_trt_ep" → "BUILD_ONNX_TRT_EP_RUNTIME", but the airborne C7_AIRBORNE_BUILD_FLAGS tuple in runtime_root/airborne_bootstrap.py deliberately omits it (research-only per the AZ-621 task spec). Setting config.components['c7_inference'].runtime = "onnx_trt_ep" on an airborne binary raises AirborneBootstrapError from _build_c7_inference whose message lists ONLY the two airborne flag options (tensorrt / pytorch_fp16) — operators see a clean recovery path instead of a research-build escape hatch. Tier-2 follow-up: extend C7_AIRBORNE_BUILD_FLAGS (and gate it on BUILD_ONNX_TRT_EP_RUNTIME=ON) only if a future deployment scenario justifies the ORT-TRT-EP path on a flight binary; until then the runtime is exercised via unit-test composition and ad-hoc workstation runs only.

8. Dependency Graph

Must be implemented after: nothing internal — C7 is foundational.

Can be implemented in parallel with: C6, C13.

Blocks: C1 (CUDA strategies), C2, C2.5, C3, C3.5, C4 (consumes ThermalState), C10, F1, F2, F3, F6, F8.

9. Logging Strategy

Log Level	When	Example
ERROR	`EngineBuildError`, `EngineDeserializeError`, `OutOfMemoryError`, `CalibrationCacheError`	`C7 OOM during infer; backbone=ultravpr; frame=12345`
WARN	thermal-throttle entered/exited; telemetry unavailable	`C7 thermal throttle active; gpu_temp=83C; clock=750mhz`
INFO	Strategy ready; engine deserialised; backbone resident	`C7 ready: runtime=tensorrt, engines=[ultravpr@fp16, lightglue@fp16, disk@fp16]`
DEBUG	per-frame infer timing per backbone	`C7 infer backbone=ultravpr frame=12345 took=37ms`

Log format: structured JSON. Log storage: stdout / journald / FDR via C13 (ERROR + WARN always; thermal-state transitions always to FDR).
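
A structured-JSON log line like those in the table could be produced with a small `logging.Formatter`. The field names (`level`, `component`, `message`) and the convention of passing extra fields under `extra={"fields": ...}` are assumptions, since the document does not define the JSON schema.

```python
import json
import logging

# Map Python's level names onto the labels used in the table above.
_LEVEL_LABELS = {"WARNING": "WARN"}


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": _LEVEL_LABELS.get(record.levelname, record.levelname),
            "component": "C7",
            "message": record.getMessage(),
        }
        # Hypothetical convention: structured fields arrive via extra={"fields": {...}}.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, sort_keys=True)
```

A handler configured with this formatter would then accept calls such as `log.warning("thermal throttle active", extra={"fields": {"gpu_temp_c": 83, "clock_mhz": 750}})`.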
