mirror of https://github.com/azaion/gps-denied-onboard.git synced 2026-06-21 22:31:13 +00:00

Oleksandr Bezdieniezhnykh 0dfe7c5301 [AZ-321] C10 EngineCompiler: hardware-tied TRT compile + cache reuse

Land the C10 per-model engine compile + cache-reuse orchestrator.
`EngineCompiler.compile_engines_for_corpus(request)` walks the
corpus, computes the canonical engine filename via AZ-281
`EngineFilenameSchema.build`, and either reuses the cached binary
(cache hit, AZ-280 `Sha256Sidecar.verify` returns True) or delegates
to the AZ-297 `compile_engine` on the injected runtime (cache miss;
the runtime owns the write path). Returns one `EngineCompileResult`
per backbone carrying the canonical `EngineCacheEntry`, outcome
(BUILT / REUSED), and `compile_duration_s` (None on reuse).
Hardware-tied reuse (D-C10-6 / D-C10-7) falls out of the filename
schema — a host change rebuilds at the new path and leaves the old
files untouched (AC-4).

Design corrections vs. the task spec body:
- The spec proposed a c10-local `EngineCacheEntry` carrying outcome
  and duration; that name is already taken by the AZ-297 canonical
  DTO. The wrapper is renamed `EngineCompileResult`; the canonical
  shape wins.
- The spec called `InferenceRuntime.host_info()`, which is not in
  the AZ-297 Protocol. `HostCapabilities` is threaded through
  `EngineCompileRequest` instead so the composition root owns host
  probing and the compiler stays decoupled.
- The c10 layer cannot import `components.c7_inference` (arch rule
  `test_az270_compose_root.test_ac6`). `engine_compiler.py` defines
  `CompileEngineCallable` — a structural Protocol cut of
  `InferenceRuntime` exposing only `compile_engine` — and catches
  broad `Exception` (re-raising preserves the original type;
  `error_class` is recorded in the ERROR log payload).

Production
- engine_compiler.py: `CompileOutcome` enum, `BackboneSpec`,
  `EngineCompileRequest`, `EngineCompileResult`,
  `EngineCompileSummary` DTOs; `CompileEngineCallable` Protocol;
  `EngineCompiler` with the single public method.
- config.py: `BackboneConfig` + `C10ProvisioningConfig`
  (`workspace_mb` default 4 GiB to match C7 NFT-LIM-01); validate
  positive shape dims and duplicate model_name detection in
  `__post_init__`.
- runtime_root/c10_factory.py: `build_engine_compiler(config)` wires
  the existing `build_inference_runtime` factory through;
  `build_backbone_specs(config)` materialises the `BackboneSpec`
  tuple from the config block.
- components/c10_provisioning/__init__.py: re-exports the AZ-321
  surface and registers the new config block.

Tests
- test_engine_compiler.py: covers AC-1..AC-10 + missing-sidecar
  sibling case for AC-5. Tier-1 via fake runtime that writes through
  the REAL `Sha256Sidecar.write_atomic_and_sidecar`. Tier-2
  placeholders for the cache-hit p99 NFR (200 MB engine sweep) and
  kill-during-compile atomic-write NFR.

Docs
- module-layout.md: c10_provisioning Per-Component Mapping lists the
  new internal modules (engine_compiler.py, config.py), the
  composition-root c10_factory.py, the AZ-321 public re-export
  surface, and the registered config block.
- batch_33_cycle1_report.md + reviews/batch_33_review.md:
  PASS_WITH_WARNINGS (4 Low findings accepted).

Tests run: c10_provisioning 13 passing + 2 Tier-2 skips; combined
unit suite (excluding pending components) 543 passing, 21
env-skipped.

Co-authored-by: Cursor <cursoragent@cursor.com>

2026-05-13 00:09:53 +03:00

C10 Engine Compiler — Per-Model TRT Compile + Hardware-Tied Cache Reuse

Task: AZ-321_c10_engine_compiler
Name: C10 Engine Compiler
Description: Implement EngineCompiler, the C10-internal phase that compiles or re-uses TensorRT engines for every backbone the corpus needs (DINOv2 reduced for VPR, LightGlue, ALIKED descriptor head, plus any C7-runtime-required model). For each backbone, computes the AZ-281 self-describing filename {model}_{sm}_{jp}_{trt}_{precision}.engine, looks for an existing engine + sidecar at that path, and either re-uses it (cache hit, D-C10-6) or invokes AZ-298's TensorRT runtime to compile from the ONNX source + calibration cache. Writes each new engine via AZ-280's Sha256Sidecar for the takeoff content-hash gate. Returns a list[EngineCacheEntry] recording the per-backbone outcome (built / reused) plus the cache hit ratio. The compile is hardware-tied: SM, Jetpack, TRT version, and precision flags are baked into the filename so re-running on a different device produces a cache miss (correct behaviour, not a bug).
Complexity: 5 points
Dependencies: AZ-263_initial_structure, AZ-269_config_loader, AZ-266_log_module, AZ-280_sha256_sidecar, AZ-281_engine_filename_schema, AZ-298_c7_tensorrt_runtime
Component: c10_provisioning (epic AZ-252 / E-C10)
Tracker: AZ-321
Epic: AZ-252 (E-C10)

Document Dependencies

_docs/02_document/contracts/shared_helpers/engine_filename_schema.md — filename shape + parser (AZ-281).
_docs/02_document/contracts/shared_helpers/sha256_sidecar.md — atomic write + sidecar pattern (AZ-280).
_docs/02_document/contracts/c7_inference/inference_runtime_protocol.md — engine compile API (AZ-298).
_docs/02_document/components/11_c10_provisioning/description.md — § 5 error handling, § 7 caveats (D-C10-6 hardware-tied).

Problem

Without a real engine compiler:

AC-NEW-1 (no engine deserialization at takeoff before manifest verify) collapses on the build side — F1 cannot produce the .engine artifacts the airborne C7 deserialise step expects.
D-C10-6 (calibration cache reuse on identical hardware) is unobservable — every build re-compiles from scratch, blowing the C10-PT-01 ≤ 12 min cold target on warm runs.
D-C10-7 (self-describing engine filename) has no producer — without {model}_{sm}_{jp}_{trt}_{precision}.engine, hardware mismatches between operator workstation and Jetson airborne would silently load wrong-arch engines.
The C10-PT-01 warm idempotent re-run target (≤ 1 min) cannot be hit; engines dominate build time.
C10-IT-05 (Tier-2 build produces SM 87 / JP 6.2 / TRT 10.3 / FP16 engines) has no implementation.
Operators have no way to inspect which engines came from cache vs. were rebuilt — a critical signal for diagnosing GPU-OOM or calibration regressions.

This task delivers the per-model compile + cache-reuse logic. It does NOT own the orchestration (T5 owns build_cache_artifacts), the descriptor batching (T2), or the manifest writing (T3).

Outcome

An EngineCompiler class at src/gps_denied_onboard/components/c10_provisioning/engine_compiler.py:
- Constructor: __init__(self, *, inference_runtime: InferenceRuntime, sidecar: Sha256Sidecar, filename_schema: EngineFilenameSchema, logger: Logger).
- Public method: compile_engines_for_corpus(request: EngineCompileRequest) -> list[EngineCacheEntry].
- EngineCompileRequest (@dataclass(frozen=True)): backbones: tuple[BackboneSpec, ...], calibration_path: Path, cache_root: Path, precision: enum {fp16, int8}.
- BackboneSpec (@dataclass(frozen=True)): model_name: str, onnx_path: Path, expected_input_shape: tuple[int, ...].
- EngineCacheEntry (@dataclass(frozen=True)): model_name: str, engine_path: Path, sidecar_path: Path, outcome: enum {built, reused}, compile_duration_s: float | None, engine_sha256_hex: str.
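The three DTOs listed above could be declared roughly as follows. This is a minimal sketch, not the project's actual module: the enum class name `Precision` is an assumption (the spec only gives the member values), `CompileOutcome` borrows the name used in the commit message, and the example input shape in the usage note is purely illustrative.

```python
from __future__ import annotations

from dataclasses import FrozenInstanceError, dataclass
from enum import Enum
from pathlib import Path


class Precision(Enum):  # hypothetical name; the spec lists only {fp16, int8}
    FP16 = "fp16"
    INT8 = "int8"


class CompileOutcome(Enum):  # spec lists only {built, reused}
    BUILT = "built"
    REUSED = "reused"


@dataclass(frozen=True)
class BackboneSpec:
    model_name: str
    onnx_path: Path
    expected_input_shape: tuple[int, ...]


@dataclass(frozen=True)
class EngineCompileRequest:
    backbones: tuple[BackboneSpec, ...]
    calibration_path: Path
    cache_root: Path
    precision: Precision


@dataclass(frozen=True)
class EngineCacheEntry:
    model_name: str
    engine_path: Path
    sidecar_path: Path
    outcome: CompileOutcome
    compile_duration_s: float | None  # None when the engine was reused
    engine_sha256_hex: str
```

Because every DTO is frozen, an attempt such as `spec.model_name = "other"` raises `FrozenInstanceError`, which keeps request and result objects safe to share between the compiler and its caller.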
Method flow:
1. For each BackboneSpec:
   a. Detect runtime hardware (SM, JP, TRT version) via inference_runtime.host_info().
   b. Compute the target filename via filename_schema.format(...): {model}_{sm}_{jp}_{trt}_{precision}.engine.
   c. Compute the target path: {cache_root}/engines/{filename}.
   d. If target_path.exists() AND sidecar.verify(target_path) returns True:
      - Outcome = reused; emit INFO log kind="c10.engine.cache.hit"; append EngineCacheEntry; continue.
   e. Else (cache miss):
      - Emit WARN log kind="c10.engine.cache.miss" with {model_name, target_filename}.
      - Call inference_runtime.compile_engine(onnx_path, calibration_path, precision, expected_input_shape) -> bytes (raises EngineBuildError or CalibrationCacheError on failure; propagate).
      - Write the engine bytes via sidecar.write_with_sidecar(target_path, engine_bytes) (atomic write + SHA-256 sidecar at {target_path}.sha256).
      - Outcome = built; record compile_duration_s from time.monotonic() deltas; append EngineCacheEntry.
2. Return the list. Aggregate count: engines_built, engines_reused, total cache hit ratio. INFO log kind="c10.engine.compile.summary" with the totals.
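The method flow above can be sketched end to end with in-memory stand-ins. Everything here is an assumption made for illustration: `FakeRuntime`, the tuple-based results, the simplified `compile_engine(model)` signature, the model name strings, and the choice to report a hit ratio of 0.0 for an empty run. The stand-in writer is deliberately not atomic; production code would go through the AZ-280 helper instead.

```python
from __future__ import annotations

import hashlib
import tempfile
import time
from pathlib import Path

MODELS = ["dinov2_vpr", "lightglue", "aliked"]  # illustrative model names


def _sidecar(engine: Path) -> Path:
    return engine.with_name(engine.name + ".sha256")


def verify(engine: Path) -> bool:
    """Stand-in for sidecar.verify: the file digest must equal the sidecar value."""
    side = _sidecar(engine)
    if not side.exists():
        return False
    return hashlib.sha256(engine.read_bytes()).hexdigest() == side.read_text().strip()


def write_with_sidecar(engine: Path, data: bytes) -> None:
    """Stand-in for the AZ-280 helper. NOT atomic; for illustration only."""
    engine.parent.mkdir(parents=True, exist_ok=True)
    engine.write_bytes(data)
    _sidecar(engine).write_text(hashlib.sha256(data).hexdigest())


class FakeRuntime:
    """Scripted runtime: counts compile calls and returns fixed bytes."""

    def __init__(self, sm: int = 87) -> None:
        self.sm = sm
        self.calls = 0

    def host_info(self) -> dict:
        return {"sm": self.sm, "jp": "62", "trt": "103"}

    def compile_engine(self, model: str) -> bytes:
        self.calls += 1
        return b"engine:" + model.encode()


def compile_engines(runtime, models, cache_root: Path, precision: str = "fp16"):
    results = []
    for model in models:
        host = runtime.host_info()
        filename = f"{model}_sm{host['sm']}_jp{host['jp']}_trt{host['trt']}_{precision}.engine"
        target = cache_root / "engines" / filename
        if target.exists() and verify(target):
            results.append((model, "reused", None))  # cache hit: no duration
            continue
        start = time.monotonic()
        write_with_sidecar(target, runtime.compile_engine(model))
        results.append((model, "built", time.monotonic() - start))
    built = sum(1 for _, outcome, _ in results if outcome == "built")
    reused = len(results) - built
    ratio = reused / len(results) if results else 0.0  # assumed convention for empty runs
    return results, {"engines_built": built, "engines_reused": reused, "hit_ratio": ratio}


with tempfile.TemporaryDirectory() as tmp:
    runtime = FakeRuntime()
    _, cold = compile_engines(runtime, MODELS, Path(tmp))
    _, warm = compile_engines(runtime, MODELS, Path(tmp))
    print(cold, warm, runtime.calls)
```

In this sketch the cold pass builds all three engines and the warm pass reuses all three without any further `compile_engine` call, which mirrors AC-1 and AC-2. Constructing `FakeRuntime(sm=89)` against the same directory changes every filename and therefore misses on every backbone, as AC-4 expects.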
The composition root constructs EngineCompiler and injects it into the T5 CacheProvisioner. Factory: build_engine_compiler(config) -> EngineCompiler.
A BackboneSpec registry at src/gps_denied_onboard/runtime_root/c10_factory.py enumerates the project's backbones (initially DINOv2-VPR + LightGlue + ALIKED, cross-referenced against E-C2/E-C2.5/E-C3 component descriptions). The list is config-driven via config.c10.backbones: list[BackboneSpec] so a future model addition does not require a code change.
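A config-driven registry of this kind might be materialised along the following lines. This is a hedged sketch: the raw config layout (a list of dicts with `model_name`, `onnx_path`, `expected_input_shape`) is an assumption, and the validation rules (positive shape dimensions, no duplicate model names) follow the checks described in the commit message rather than a published schema.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class BackboneSpec:
    model_name: str
    onnx_path: Path
    expected_input_shape: tuple[int, ...]


def build_backbone_specs(raw_backbones: list[dict]) -> tuple[BackboneSpec, ...]:
    """Turn the (assumed) config.c10.backbones entries into validated specs."""
    seen: set[str] = set()
    specs = []
    for entry in raw_backbones:
        name = entry["model_name"]
        if name in seen:
            raise ValueError(f"duplicate model_name: {name}")
        seen.add(name)
        shape = tuple(int(dim) for dim in entry["expected_input_shape"])
        if not shape or any(dim <= 0 for dim in shape):
            raise ValueError(f"{name}: shape dims must be positive, got {shape}")
        specs.append(BackboneSpec(name, Path(entry["onnx_path"]), shape))
    return tuple(specs)
```

Adding a model then means appending one more entry to the config list; the function body stays unchanged.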
INFO log on every cache hit; WARN on every cache miss; ERROR on EngineBuildError / CalibrationCacheError with the offending model name.

Scope

Included

EngineCompiler class with the single public method.
The 3 DTOs (EngineCompileRequest, BackboneSpec, EngineCacheEntry) plus their enum types.
Hardware-tied filename construction via AZ-281's schema.
Cache-hit detection via sidecar.verify (sha256 sidecar matches).
Cache-miss compile via AZ-298's InferenceRuntime.compile_engine.
Atomic engine write + sidecar via AZ-280.
Composition-root factory.
Conformance test: a fake InferenceRuntime returns scripted engine bytes; the test asserts cache hit / miss outcomes for the documented matrix.
Per-cache-entry timing instrumentation.
config.c10.backbones schema extension on AZ-269's loader.

Excluded

The orchestration of when to compile (T5 owns build_cache_artifacts).
Descriptor generation (T2 owns).
Manifest writing (T3 owns).
TensorRT internals — owned by AZ-298 (the compile_engine impl); this task only consumes the protocol.
Engine deserialization at takeoff — owned by AZ-298 (load side) + the C7 component runtime self-check.
Engine version compatibility checks across deployments are out of scope; the filename schema (AZ-281) carries enough signal that mismatches surface as a cache miss.
Multi-GPU compile — operator workstation is single-GPU per RESTRICT-OPS-2.
A re-build-now CLI flag — operator workflow goes through T5; force-rebuild is achieved by deleting the engine cache directory.

Acceptance Criteria

AC-1: Cold cache compiles every backbone
Given an empty cache_root/engines/ and 3 backbones in BackboneSpec[]
When compile_engines_for_corpus(request) is called
Then 3 EngineCacheEntry are returned, all with outcome = built; 3 .engine files + 3 .sha256 sidecars are present at cache_root/engines/; ONE WARN log per backbone (c10.engine.cache.miss); ONE INFO log summary with engines_built=3, engines_reused=0

AC-2: Warm cache reuses every backbone
Given the same cache_root/engines/ populated by a prior cold run
When compile_engines_for_corpus(request) is called with an identical request
Then 3 EngineCacheEntry are returned, all outcome = reused; ZERO calls to inference_runtime.compile_engine (verifiable via spy); ONE INFO log per backbone (c10.engine.cache.hit); summary log shows engines_reused=3

AC-3: Mixed cache (1 hit + 2 miss)
Given the cache contains only the DINOv2 engine; LightGlue and ALIKED are missing
When compile_engines_for_corpus(request) is called
Then DINOv2 → reused, LightGlue + ALIKED → built; the report shows engines_built=2, engines_reused=1

AC-4: Hardware change invalidates cache
Given a cache populated for (sm=87, jp=6.2, trt=10.3, fp16) and the runtime now reports (sm=89, jp=6.3, trt=10.5, fp16)
When compile_engines_for_corpus(request) is called
Then ALL backbones have outcome = built (the filename differs, so the existing engines are not even consulted); the existing engines remain on disk (this task does NOT delete stale engines; that's the orchestrator's call)

AC-5: Tampered sidecar invalidates that one engine
Given a .engine file that previously matched its sidecar, where a bit has since been flipped in the sidecar (for example by a malicious actor) or the engine bytes have drifted
When compile_engines_for_corpus(request) is called
Then sidecar.verify returns False for that entry; that backbone is recompiled (outcome = built); ONE WARN log kind="c10.engine.sidecar.mismatch" with the offending path
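The verification that AC-5 relies on can be illustrated with a small stand-in. This is a sketch under assumptions, not the AZ-280 implementation: the function names are hypothetical, the sidecar is assumed to hold only the 64-character hex digest, and a missing sidecar is treated as a failed verification.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so large engines are never fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(engine: Path) -> bool:
    """Return True only if the engine bytes hash to the value in <engine>.sha256."""
    sidecar = engine.with_name(engine.name + ".sha256")
    if not engine.is_file() or not sidecar.is_file():
        return False
    return file_sha256(engine) == sidecar.read_text().strip().lower()
```

Flipping a single byte in either the engine or the sidecar makes `verify` return False, which is what routes that one backbone back onto the cache-miss path while leaving the others untouched.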

AC-6: EngineBuildError propagates without partial state
Given inference_runtime.compile_engine raises EngineBuildError("CUDA OOM") on the second of 3 backbones
When compile_engines_for_corpus(request) is called
Then EngineBuildError is raised; the first backbone's engine + sidecar ARE present (already written and available for cache reuse on later runs); the second backbone's engine is NOT half-written (atomic write); the third backbone is NOT attempted; ONE ERROR log with the model name

AC-7: CalibrationCacheError propagates with diagnostic
Given inference_runtime.compile_engine raises CalibrationCacheError("calibration table missing for INT8")
When the compiler hits the failing backbone
Then the error propagates; ONE ERROR log with {model_name, calibration_path}; partial state is consistent (atomic writes guarantee no half-engine on disk)
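The propagation behaviour in AC-6 and AC-7 (log once, re-raise the original error, stop before later backbones) can be sketched as below. The names here are illustrative assumptions: `EngineBuildError` is a local stand-in for the AZ-298 type, `compile_all` is a hypothetical helper, and the log kind `c10.engine.compile.error` is invented for the example because the spec does not name the ERROR kind.

```python
import logging

logger = logging.getLogger("c10.engine_compiler")


class EngineBuildError(RuntimeError):
    """Local stand-in for the AZ-298 error type."""


def compile_all(models, compile_one):
    """Compile models in order; on failure log one ERROR and re-raise unchanged."""
    built = []
    for model in models:
        try:
            compile_one(model)
        except Exception as exc:
            # Bare `raise` preserves the original exception type and traceback.
            logger.error(
                "c10.engine.compile.error",  # hypothetical log kind
                extra={"model_name": model, "error_class": type(exc).__name__},
            )
            raise
        built.append(model)
    return built


attempted = []


def flaky_compile(model):
    """Scripted fake: fails on the second backbone, like AC-6."""
    attempted.append(model)
    if model == "lightglue":
        raise EngineBuildError("CUDA OOM")


try:
    compile_all(["dinov2_vpr", "lightglue", "aliked"], flaky_compile)
except EngineBuildError as exc:
    print("propagated:", exc, "attempted:", attempted)
```

Because the loop never swallows the exception, the third backbone is not attempted, and whatever the first backbone wrote stays on disk for the orchestrator to reason about.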

AC-8: Filename schema + sidecar layout matches spec
Given a freshly-built DINOv2 engine on Tier-2 hardware (SM 87, JP 6.2, TRT 10.3, FP16)
When inspecting cache_root/engines/
Then the file is named dinov2_vpr_sm87_jp62_trt103_fp16.engine; the sidecar at dinov2_vpr_sm87_jp62_trt103_fp16.engine.sha256 contains the 64-char hex digest; both match EngineFilenameSchema.parse and Sha256Sidecar.verify
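To make the AC-8 naming concrete, here is a hedged sketch of a builder and parser that reproduce the example filename. It is not the AZ-281 `EngineFilenameSchema`: the function names, the `EngineName` tuple, and the rule of simply dropping dots from version strings are assumptions inferred from the single example above.

```python
import re
from typing import NamedTuple


class EngineName(NamedTuple):
    model: str
    sm: str
    jp: str
    trt: str
    precision: str


_PATTERN = re.compile(
    r"^(?P<model>.+)_sm(?P<sm>\d+)_jp(?P<jp>\d+)_trt(?P<trt>\d+)"
    r"_(?P<precision>fp16|int8)\.engine$"
)


def build_name(model: str, sm: int, jetpack: str, trt: str, precision: str) -> str:
    """Assumed encoding: version dots are dropped, e.g. '6.2' -> '62'."""
    jp_token = jetpack.replace(".", "")
    trt_token = trt.replace(".", "")
    return f"{model}_sm{sm}_jp{jp_token}_trt{trt_token}_{precision}.engine"


def parse_name(filename: str) -> EngineName:
    """Split a filename back into its tokens; reject anything off-schema."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError(f"not a schema-conformant engine filename: {filename}")
    return EngineName(**match.groupdict())


name = build_name("dinov2_vpr", 87, "6.2", "10.3", "fp16")
print(name)
print(parse_name(name))
```

Dropping dots is lossy in general, so the real schema may encode versions differently; the point of the sketch is only that any change in SM, JetPack, TRT, or precision changes the filename and therefore the cache key.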

AC-9: compile_duration_s recorded for built; None for reused
Given a mix of hits and misses
When inspecting EngineCacheEntry
Then compile_duration_s is not None for every built entry; compile_duration_s is None for every reused entry; built durations are positive floats

AC-10: Empty backbones list returns empty result
Given request.backbones == ()
When compile_engines_for_corpus(request) is called
Then [] is returned; ZERO calls to inference_runtime.compile_engine; ZERO files written; ONE INFO log summary with all-zero counts

Non-Functional Requirements

Performance

Cache-hit path per backbone ≤ 100 ms (one filename construction + one Path.exists + one sidecar verify dominated by SHA-256 of the engine file, which is bounded by disk read bandwidth). For a 200 MB engine, this is ~1 s on NVMe — measure and document.
Cold compile is dominated by AZ-298's TensorRT runtime; this task imposes no additional time budget beyond AZ-298's.
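A micro-benchmark for the cache-hit budget could compute its p99 along these lines. This is only a sketch: the helper names are hypothetical, the nearest-rank percentile definition is one common choice among several, and hashing an in-memory payload measures SHA-256 throughput only, not the disk read that dominates the real path.

```python
import hashlib
import math
import time


def percentile_nearest_rank(samples, pct: int) -> float:
    """Nearest-rank percentile: the value at rank ceil(pct/100 * n), 1-based."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def time_hash(payload: bytes, runs: int) -> list:
    """Time `runs` SHA-256 passes over an in-memory payload."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        hashlib.sha256(payload).hexdigest()
        durations.append(time.perf_counter() - start)
    return durations


# Small payload so the example runs quickly; a real sweep would read a
# 200 MB engine file from disk on each iteration.
samples = time_hash(b"\x00" * (1 << 20), runs=20)
print(f"p99 = {percentile_nearest_rank(samples, 99):.6f} s")
```

Reporting p99 rather than the mean keeps occasional slow reads visible, which matters when the budget is stated as an upper bound.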

Compatibility

AZ-281 (EngineFilenameSchema) and AZ-280 (Sha256Sidecar) are the schema and atomic-write helpers; this task introduces NO new third-party dependencies.

Reliability

Atomic writes via AZ-280 guarantee no half-engine on disk after a process kill.
Cache-miss recompile is idempotent — running the same compile twice produces identical bytes (TRT engine determinism is owned by AZ-298; this task assumes it).
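The usual way to get the "no half-engine on disk" guarantee is to write to a temporary file in the same directory and rename it into place. The sketch below illustrates that pattern under assumptions: it is not the AZ-280 `Sha256Sidecar` code, the function name is hypothetical, and it relies on `os.replace` being atomic when source and destination are on the same filesystem.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def write_atomic_with_sidecar(target: Path, data: bytes) -> str:
    """Write `data` and its SHA-256 sidecar via temp file + rename; return the digest."""
    target.parent.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256(data).hexdigest()
    sidecar = target.with_name(target.name + ".sha256")
    # Engine first, sidecar second: a kill in between leaves an engine with no
    # sidecar, which a verify step would treat as a cache miss.
    for path, payload in ((target, data), (sidecar, digest.encode("ascii"))):
        fd, tmp_name = tempfile.mkstemp(
            dir=str(path.parent), prefix=path.name, suffix=".tmp"
        )
        try:
            with os.fdopen(fd, "wb") as handle:
                handle.write(payload)
                handle.flush()
                os.fsync(handle.fileno())  # push bytes to disk before the rename
            os.replace(tmp_name, path)  # atomic swap on the same filesystem
        except BaseException:
            if os.path.exists(tmp_name):
                os.unlink(tmp_name)
            raise
    return digest
```

A reader of the final path therefore sees either the previous complete file or the new complete file, never a truncated one; leftover `.tmp` files from a hard kill are the only residue to clean up.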

Unit Tests

| AC Ref | What to Test | Required Outcome |
|---|---|---|
| AC-1 | Empty cache_root + 3 backbones | All `built`; sidecars present |
| AC-2 | Warm cache + identical request | All `reused`; zero `compile_engine` calls |
| AC-3 | Cache populated for 1 of 3 backbones | 1 reused + 2 built |
| AC-4 | Hardware change (different SM in fake runtime) | All `built`; old engines untouched |
| AC-5 | Tampered sidecar (flip 1 byte) | That engine rebuilds; WARN log |
| AC-6 | Fake runtime raises `EngineBuildError` mid-run | Error propagates; partial state consistent |
| AC-7 | Fake runtime raises `CalibrationCacheError` | Error propagates with diagnostic |
| AC-8 | Inspect filename + sidecar layout | Matches schema; both verify |
| AC-9 | `compile_duration_s` recorded | Set on `built`, None on `reused` |
| AC-10 | Empty backbones | Empty result; zero side effects |
| NFR-perf-cache-hit | Microbench cache-hit path × 100 with 200 MB engine | p99 ≤ 1.5 s (mostly SHA-256 read) |
| NFR-reliability-atomic-write | Kill process mid-`compile_engine` | No half-engine on disk after restart |

Constraints

The filename schema is canonical via AZ-281; this task does NOT invent its own (per coderule.mdc "follow established project patterns").
The atomic-write + sidecar pattern is canonical via AZ-280; this task does NOT use open(...).write() or naked pathlib.Path.write_bytes().
Cache hit is decided by sidecar.verify (file SHA-256 matches sidecar value); filename match alone is NOT sufficient (defends against bit-rot or bit-flip).
The BackboneSpec registry is config-driven; adding a new model is a config change, not a code change.
This task does NOT clean up stale engines (the orchestrator T5 may emit ManifestCoverageError on orphan files; cleanup is the operator's call).
This task introduces no new third-party dependencies.

Risks & Mitigation

Risk 1: SHA-256 verification of large engines is slow on warm path

Risk: 200 MB engine × 5 backbones = 1 GB of SHA-256 hashing per warm idempotent run; on slow disks, this alone can exceed C10-PT-01's 1 min warm budget.
Mitigation: AZ-280's Sha256Sidecar.verify uses sendfile / mmap paths where available; benchmark documented in AZ-280. If still too slow, a future task adds an mtime + size quick-check fallback (out of scope this cycle).

Risk 2: Partial cache after EngineBuildError on backbone N

Risk: Backbones 1..N-1 are built and on disk; the N-th fails; backbones N+1..M are never attempted. The cache is "partially valid" — the orchestrator (T5) sees inconsistent state.
Mitigation: T5's coverage check + ManifestCoverageError surface this. The compiler does NOT delete the partial state; T5 decides whether to retry, fail, or roll back per the operator's request mode.

Risk 3: TensorRT engine determinism not guaranteed across builds

Risk: Two compiles of the same ONNX + calibration produce different bytes; cache-hit detection via SHA-256 fails post-rebuild.
Mitigation: TRT engine determinism is AZ-298's contract obligation; if it fails, this task's cache-hit ratio drops to 0 and operators see WARN logs. AZ-298's tests assert determinism; this task assumes it.

Risk 4: Operator manually edits engine file but not sidecar

Risk: Hand-debugging or manual tuning leaves an engine file whose bytes don't match its sidecar; AC-5 covers detection.
Mitigation: AC-5 + WARN log c10.engine.sidecar.mismatch surface the case immediately on next compile run; operators should re-generate via the build command.

Runtime Completeness

Named capability: TRT engine compile + hardware-tied cache reuse per D-C10-6 + D-C10-7 (description.md § 5; epic § Acceptance C10-IT-05; AC-NEW-1).
Production code that must exist: real EngineCompiler orchestrating real AZ-298 compile_engine + real AZ-280 atomic write/verify + real AZ-281 filename construction; real config-driven BackboneSpec registry.
Allowed external stubs: tests MAY use a fake InferenceRuntime that returns scripted bytes + a fake host_info() for hardware variation; production wiring uses the real AZ-298 runtime + real Sha256Sidecar.
Unacceptable substitutes: a Python-level pickle of a "fake engine" object (TRT engines are opaque CUDA blobs; faking them in production breaks AC-NEW-1's takeoff verify); skipping the sidecar (loses bit-rot detection); inventing a new filename scheme inside this task (defeats D-C10-7); Path.write_bytes() instead of AZ-280 (no atomicity guarantee).
