mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-22 05:21:12 +00:00
[AZ-321] C10 EngineCompiler: hardware-tied TRT compile + cache reuse
Land the C10 per-model engine compile + cache-reuse orchestrator. `EngineCompiler.compile_engines_for_corpus(request)` walks the corpus, computes the canonical engine filename via AZ-281 `EngineFilenameSchema.build`, and either reuses the cached binary (cache hit, AZ-280 `Sha256Sidecar.verify` returns True) or delegates to the AZ-297 `compile_engine` on the injected runtime (cache miss; the runtime owns the write path). Returns one `EngineCompileResult` per backbone carrying the canonical `EngineCacheEntry`, outcome (BUILT / REUSED), and `compile_duration_s` (None on reuse). Hardware-tied reuse (D-C10-6 / D-C10-7) falls out of the filename schema: a host change rebuilds at the new path and leaves the old files untouched (AC-4).

Design corrections vs. the task spec body:
- The spec proposed a c10-local `EngineCacheEntry` carrying outcome and duration; that name is already taken by the AZ-297 canonical DTO. The wrapper is renamed `EngineCompileResult`; the canonical shape wins.
- The spec called `InferenceRuntime.host_info()`, which is not in the AZ-297 Protocol. `HostCapabilities` is threaded through `EngineCompileRequest` instead, so the composition root owns host probing and the compiler stays decoupled.
- The c10 layer cannot import `components.c7_inference` (arch rule `test_az270_compose_root.test_ac6`). `engine_compiler.py` defines `CompileEngineCallable`, a structural Protocol cut of `InferenceRuntime` exposing only `compile_engine`, and catches broad `Exception` (re-raising preserves the original type; `error_class` is recorded in the ERROR log payload).

Production
- engine_compiler.py: `CompileOutcome` enum, `BackboneSpec`, `EngineCompileRequest`, `EngineCompileResult`, `EngineCompileSummary` DTOs; `CompileEngineCallable` Protocol; `EngineCompiler` with the single public method.
- config.py: `BackboneConfig` + `C10ProvisioningConfig` (`workspace_mb` default 4 GiB to match C7 NFT-LIM-01); validate positive shape dims and duplicate model_name detection in `__post_init__`.
- runtime_root/c10_factory.py: `build_engine_compiler(config)` wires the existing `build_inference_runtime` factory through; `build_backbone_specs(config)` materialises the `BackboneSpec` tuple from the config block.
- components/c10_provisioning/__init__.py: re-exports the AZ-321 surface and registers the new config block.

Tests
- test_engine_compiler.py: covers AC-1..AC-10 + missing-sidecar sibling case for AC-5. Tier-1 via fake runtime that writes through the REAL `Sha256Sidecar.write_atomic_and_sidecar`. Tier-2 placeholders for the cache-hit p99 NFR (200 MB engine sweep) and kill-during-compile atomic-write NFR.

Docs
- module-layout.md: c10_provisioning Per-Component Mapping lists the new internal modules (engine_compiler.py, config.py), the composition-root c10_factory.py, the AZ-321 public re-export surface, and the registered config block.
- batch_33_cycle1_report.md + reviews/batch_33_review.md: PASS_WITH_WARNINGS (4 Low findings accepted).

Tests run: c10_provisioning 13 passing + 2 Tier-2 skips; combined unit suite (excluding pending components) 543 passing, 21 env-skipped.

Co-authored-by: Cursor <cursoragent@cursor.com>
# C10 Engine Compiler: Per-Model TRT Compile + Hardware-Tied Cache Reuse

**Task**: AZ-321_c10_engine_compiler

**Name**: C10 Engine Compiler

**Description**: Implement `EngineCompiler`, the C10-internal phase that compiles or reuses TensorRT engines for every backbone the corpus needs. These are DINOv2 reduced for VPR, LightGlue, the ALIKED descriptor head, and any other model the C7 runtime requires. For each backbone, it computes the AZ-281 self-describing filename `{model}_{sm}_{jp}_{trt}_{precision}.engine` and looks for an existing engine + sidecar at that path. It then either reuses it (cache hit, D-C10-6) or invokes AZ-298's TensorRT runtime to compile it from the ONNX source + calibration cache. Each new engine is written via AZ-280's `Sha256Sidecar` for the takeoff content-hash gate. Returns a `list[EngineCacheEntry]` recording the per-backbone outcome (built / reused); the aggregate cache hit ratio is reported in the summary log. The compile is hardware-tied: SM, JetPack, TRT version, and precision flags are baked into the filename. Re-running on a different device therefore produces a cache miss (correct behaviour, not a bug).

**Complexity**: 5 points

**Dependencies**: AZ-263_initial_structure, AZ-269_config_loader, AZ-266_log_module, AZ-280_sha256_sidecar, AZ-281_engine_filename_schema, AZ-298_c7_tensorrt_runtime

**Component**: c10_provisioning (epic AZ-252 / E-C10)

**Tracker**: AZ-321

**Epic**: AZ-252 (E-C10)
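The self-describing filename can be sketched as a build/parse round trip. This minimal sketch assumes the dot-stripped tokens shown in AC-8 (`sm87`, `jp62`, `trt103`). `build_engine_filename`, `parse_engine_filename`, and `EngineKey` are hypothetical names for illustration, not the AZ-281 `EngineFilenameSchema` API. Parsing splits from the right because model names such as `dinov2_vpr` contain underscores.

```python
from dataclasses import dataclass

ENGINE_SUFFIX = ".engine"
PRECISIONS = frozenset({"fp16", "int8"})


@dataclass(frozen=True)
class EngineKey:
    """Fields recovered from a self-describing engine filename (hypothetical type)."""
    model: str
    sm: str         # compute capability token, e.g. "87"
    jetpack: str    # dot-less JetPack token, e.g. "62"
    trt: str        # dot-less TensorRT token, e.g. "103"
    precision: str  # "fp16" or "int8"


def _compact(version: str) -> str:
    # "6.2" -> "62", "10.3" -> "103": dots are dropped to keep one token per field.
    return version.replace(".", "")


def build_engine_filename(model: str, sm: int, jetpack: str, trt: str, precision: str) -> str:
    if precision not in PRECISIONS:
        raise ValueError(f"unsupported precision: {precision!r}")
    return f"{model}_sm{sm}_jp{_compact(jetpack)}_trt{_compact(trt)}_{precision}{ENGINE_SUFFIX}"


def parse_engine_filename(filename: str) -> EngineKey:
    if not filename.endswith(ENGINE_SUFFIX):
        raise ValueError(f"not an engine file: {filename!r}")
    # Split from the right: model names such as "dinov2_vpr" contain underscores.
    parts = filename[: -len(ENGINE_SUFFIX)].rsplit("_", 4)
    if len(parts) != 5 or not parts[0]:
        raise ValueError(f"malformed engine filename: {filename!r}")
    model, sm, jp, trt, precision = parts
    for token, prefix in ((sm, "sm"), (jp, "jp"), (trt, "trt")):
        if not token.startswith(prefix) or not token[len(prefix):].isdigit():
            raise ValueError(f"bad {prefix!r} token in {filename!r}")
    if precision not in PRECISIONS:
        raise ValueError(f"unsupported precision in {filename!r}")
    return EngineKey(model, sm[2:], jp[2:], trt[3:], precision)
```

Stripping the dots is one-way (`10.3` and `1.03` would both become `103`), so the parser returns the compact tokens rather than reconstructing dotted versions. Because the hardware tuple is part of the name, two different hosts never resolve to the same path.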
### Document Dependencies

- `_docs/02_document/contracts/shared_helpers/engine_filename_schema.md`: filename shape + parser (AZ-281).
- `_docs/02_document/contracts/shared_helpers/sha256_sidecar.md`: atomic write + sidecar pattern (AZ-280).
- `_docs/02_document/contracts/c7_inference/inference_runtime_protocol.md`: engine compile API (AZ-298).
- `_docs/02_document/components/11_c10_provisioning/description.md`: § 5 error handling, § 7 caveats (D-C10-6 hardware-tied).
## Problem

Without a real engine compiler:

- AC-NEW-1 (no engine deserialization at takeoff before manifest verify) collapses on the build side: F1 cannot produce the `.engine` artifacts that the airborne C7 deserialize step expects.
- D-C10-6 (calibration cache reuse on identical hardware) is unobservable: every build recompiles from scratch, blowing the C10-PT-01 ≤ 12 min cold target on warm runs.
- D-C10-7 (self-describing engine filename) has no producer. Without `{model}_{sm}_{jp}_{trt}_{precision}.engine`, a hardware mismatch between the operator workstation and the airborne Jetson would silently load wrong-arch engines.
- The C10-PT-01 warm idempotent re-run target (≤ 1 min) cannot be hit; engines dominate build time.
- C10-IT-05 (Tier-2 build produces SM 87 / JP 6.2 / TRT 10.3 / FP16 engines) has no implementation.
- Operators have no way to see which engines came from cache and which were rebuilt. That signal is critical for diagnosing GPU-OOM or calibration regressions.

This task delivers the per-model compile + cache-reuse logic. It does NOT own the orchestration (T5 owns `build_cache_artifacts`), the descriptor batching (T2), or the manifest writing (T3).
## Outcome

- An `EngineCompiler` class at `src/gps_denied_onboard/components/c10_provisioning/engine_compiler.py`:
  - Constructor: `__init__(self, *, inference_runtime: InferenceRuntime, sidecar: Sha256Sidecar, filename_schema: EngineFilenameSchema, logger: Logger)`.
  - Public method: `compile_engines_for_corpus(request: EngineCompileRequest) -> list[EngineCacheEntry]`.
  - `EngineCompileRequest` (`@dataclass(frozen=True)`): `backbones: tuple[BackboneSpec, ...]`, `calibration_path: Path`, `cache_root: Path`, `precision: enum {fp16, int8}`.
  - `BackboneSpec` (`@dataclass(frozen=True)`): `model_name: str`, `onnx_path: Path`, `expected_input_shape: tuple[int, ...]`.
  - `EngineCacheEntry` (`@dataclass(frozen=True)`): `model_name: str`, `engine_path: Path`, `sidecar_path: Path`, `outcome: enum {built, reused}`, `compile_duration_s: float | None`, `engine_sha256_hex: str`.
  - Method flow:
    1. For each `BackboneSpec`:
       a. Detect the runtime hardware (SM, JetPack, TRT version) via `inference_runtime.host_info()`.
       b. Compute the target filename via `filename_schema.format(...)`: `{model}_{sm}_{jp}_{trt}_{precision}.engine`.
       c. Compute the target path: `{cache_root}/engines/{filename}`.
       d. If `target_path.exists()` AND `sidecar.verify(target_path)` returns `True` (cache hit):
          - Outcome = `reused`; emit INFO log `kind="c10.engine.cache.hit"`; append an `EngineCacheEntry`; continue with the next backbone.
       e. Else (cache miss):
          - Emit WARN log `kind="c10.engine.cache.miss"` with `{model_name, target_filename}`. The mismatch case (engine file present, `sidecar.verify` returns `False`) is reported with WARN log `kind="c10.engine.sidecar.mismatch"` and the offending path (AC-5).
          - Call `inference_runtime.compile_engine(onnx_path, calibration_path, precision, expected_input_shape) -> bytes`. It raises `EngineBuildError` or `CalibrationCacheError` on failure; propagate the error.
          - Write the engine bytes via `sidecar.write_with_sidecar(target_path, engine_bytes)` (atomic write + SHA-256 sidecar at `{target_path}.sha256`).
          - Outcome = `built`; record `compile_duration_s` from `time.monotonic()` deltas; append an `EngineCacheEntry`.
    2. Emit one INFO log `kind="c10.engine.compile.summary"` with the aggregate totals (`engines_built`, `engines_reused`, and the overall cache hit ratio), then return the list.
- The composition root constructs `EngineCompiler` and injects it into the T5 CacheProvisioner. Factory: `build_engine_compiler(config) -> EngineCompiler`.
- A `BackboneSpec` registry at `src/gps_denied_onboard/runtime_root/c10_factory.py` enumerates the project's backbones. Initially these are DINOv2-VPR + LightGlue + ALIKED, cross-referenced against the E-C2/E-C2.5/E-C3 component descriptions. The list is config-driven via `config.c10.backbones: list[BackboneSpec]`, so adding a model later does not require a code change.
- INFO log on every cache hit; WARN on every cache miss; ERROR on `EngineBuildError` / `CalibrationCacheError` with the offending model name.
## Scope

### Included

- `EngineCompiler` class with the single public method.
- The 3 DTOs (`EngineCompileRequest`, `BackboneSpec`, `EngineCacheEntry`) plus their enum types.
- Hardware-tied filename construction via AZ-281's schema.
- Cache-hit detection via `sidecar.verify` (the engine file's SHA-256 matches the sidecar value).
- Cache-miss compile via AZ-298's `InferenceRuntime.compile_engine`.
- Atomic engine write + sidecar via AZ-280.
- Composition-root factory.
- Conformance test: a fake `InferenceRuntime` returns scripted engine bytes; the test asserts cache hit / miss outcomes for the documented matrix.
- Per-cache-entry timing instrumentation.
- `config.c10.backbones` schema extension on AZ-269's loader.
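The `config.c10.backbones` extension can be sketched with the validation the commit message describes: positive shape dims, duplicate `model_name` detection in `__post_init__`, and `workspace_mb` defaulting to 4 GiB. The raw-mapping layout and `load_c10_config` are hypothetical; the real loader is AZ-269's.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any

DEFAULT_WORKSPACE_MB = 4096  # 4 GiB


@dataclass(frozen=True)
class BackboneConfig:
    model_name: str
    onnx_path: Path
    expected_input_shape: tuple[int, ...]

    def __post_init__(self) -> None:
        if not self.model_name:
            raise ValueError("model_name must be non-empty")
        if not self.expected_input_shape or any(d <= 0 for d in self.expected_input_shape):
            raise ValueError(
                f"{self.model_name}: shape dims must be positive, got {self.expected_input_shape}")


@dataclass(frozen=True)
class C10ProvisioningConfig:
    backbones: tuple[BackboneConfig, ...] = ()
    workspace_mb: int = DEFAULT_WORKSPACE_MB

    def __post_init__(self) -> None:
        seen: set[str] = set()
        for backbone in self.backbones:
            if backbone.model_name in seen:
                raise ValueError(f"duplicate model_name: {backbone.model_name!r}")
            seen.add(backbone.model_name)


def load_c10_config(raw: dict[str, Any]) -> C10ProvisioningConfig:
    """Build the typed block from an already-parsed `c10` mapping (hypothetical layout)."""
    backbones = tuple(
        BackboneConfig(
            model_name=item["model_name"],
            onnx_path=Path(item["onnx_path"]),
            expected_input_shape=tuple(int(d) for d in item["expected_input_shape"]),
        )
        for item in raw.get("backbones", [])
    )
    return C10ProvisioningConfig(backbones=backbones,
                                 workspace_mb=int(raw.get("workspace_mb", DEFAULT_WORKSPACE_MB)))
```

With this shape, adding a backbone is one more entry in the config file. Validation fails at load time, before any compile is attempted.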
### Excluded

- The orchestration of when to compile (T5 owns `build_cache_artifacts`).
- Descriptor generation (T2 owns it).
- Manifest writing (T3 owns it).
- TensorRT internals: owned by AZ-298 (the `compile_engine` implementation); this task only consumes the protocol.
- Engine deserialization at takeoff: owned by AZ-298 (load side) + the C7 component runtime self-check.
- Engine version compatibility checks across deployments: out of scope. The filename schema (AZ-281) carries enough signal that mismatches surface as cache misses.
- Multi-GPU compile: the operator workstation is single-GPU per RESTRICT-OPS-2.
- A rebuild-now CLI flag: the operator workflow goes through T5; force-rebuild is achieved by deleting the engine cache directory.
## Acceptance Criteria

**AC-1: Cold cache compiles every backbone**
Given an empty `cache_root/engines/` and 3 backbones in `BackboneSpec[]`
When `compile_engines_for_corpus(request)` is called
Then 3 `EngineCacheEntry` are returned, all with `outcome = built`; 3 `.engine` files + 3 `.sha256` sidecars are present at `cache_root/engines/`; ONE WARN log per backbone (`c10.engine.cache.miss`); ONE INFO log summary with `engines_built=3, engines_reused=0`

**AC-2: Warm cache reuses every backbone**
Given the same `cache_root/engines/` populated by a prior cold run
When `compile_engines_for_corpus(request)` is called with an identical request
Then 3 `EngineCacheEntry` are returned, all with `outcome = reused`; ZERO calls to `inference_runtime.compile_engine` (verifiable via a spy); ONE INFO log per backbone (`c10.engine.cache.hit`); the summary log shows `engines_reused=3`

**AC-3: Mixed cache (1 hit + 2 misses)**
Given the cache contains only the DINOv2 engine; LightGlue and ALIKED are missing
When `compile_engines_for_corpus(request)` is called
Then DINOv2 → reused, LightGlue + ALIKED → built; the report shows `engines_built=2, engines_reused=1`

**AC-4: Hardware change invalidates cache**
Given a cache populated for `(sm=87, jp=6.2, trt=10.3, fp16)` and the runtime now reports `(sm=89, jp=6.3, trt=10.5, fp16)`
When `compile_engines_for_corpus(request)` is called
Then ALL backbones have `outcome = built` (the filename differs, so the existing engines are not even consulted); the existing engines remain on disk (this task does NOT delete stale engines; that is the orchestrator's call)

**AC-5: Tampered sidecar invalidates that one engine**
Given a `.engine` file that matched its sidecar, after which a malicious actor flipped a bit in the sidecar (or the engine bytes drifted)
When `compile_engines_for_corpus(request)` is called
Then `sidecar.verify` returns `False` for that entry; that backbone is recompiled (`outcome = built`); ONE WARN log `kind="c10.engine.sidecar.mismatch"` with the offending path

**AC-6: `EngineBuildError` propagates without partial state**
Given `inference_runtime.compile_engine` raises `EngineBuildError("CUDA OOM")` on the second of 3 backbones
When `compile_engines_for_corpus(request)` is called
Then `EngineBuildError` is raised; the first backbone's engine + sidecar ARE present (already written, so a later run can reuse them); the second backbone's engine is NOT half-written (atomic write); the third backbone is NOT attempted; ONE ERROR log with the model name

**AC-7: `CalibrationCacheError` propagates with diagnostic**
Given `inference_runtime.compile_engine` raises `CalibrationCacheError("calibration table missing for INT8")`
When the compiler hits the failing backbone
Then the error propagates; ONE ERROR log with `{model_name, calibration_path}`; partial state is consistent (atomic writes guarantee no half-engine on disk)

**AC-8: Filename schema + sidecar layout matches spec**
Given a freshly built DINOv2 engine on Tier-2 hardware (SM 87, JP 6.2, TRT 10.3, FP16)
When inspecting `cache_root/engines/`
Then the file is named `dinov2_vpr_sm87_jp62_trt103_fp16.engine`; the sidecar at `dinov2_vpr_sm87_jp62_trt103_fp16.engine.sha256` contains the 64-char hex digest; both match `EngineFilenameSchema.parse` and `Sha256Sidecar.verify`

**AC-9: `compile_duration_s` recorded for built; None for reused**
Given a mix of hits and misses
When inspecting each `EngineCacheEntry`
Then `compile_duration_s is not None` for every `built` entry; `compile_duration_s is None` for every `reused` entry; built durations are positive floats

**AC-10: Empty `backbones` list returns empty result**
Given `request.backbones == ()`
When `compile_engines_for_corpus(request)` is called
Then `[]` is returned; ZERO calls to `inference_runtime.compile_engine`; ZERO files written; ONE INFO log summary with all-zero counts
## Non-Functional Requirements

**Performance**
- Cache-hit path per backbone ≤ 100 ms: one filename construction + one `Path.exists` + one sidecar verify. The verify is dominated by SHA-256 over the engine file, so it is bounded by disk read bandwidth and grows with engine size. For a 200 MB engine it is ~1 s on NVMe, well above the 100 ms figure; measure and document.
- Cold compile is dominated by AZ-298's TensorRT runtime; this task imposes no additional time budget beyond AZ-298's.

**Compatibility**
- AZ-281 (`EngineFilenameSchema`) and AZ-280 (`Sha256Sidecar`) are the schema and atomic-write helpers; this task introduces NO new third-party dependencies.

**Reliability**
- Atomic writes via AZ-280 guarantee no half-engine on disk after a process kill.
- Cache-miss recompile is idempotent: running the same compile twice produces identical bytes (TRT engine determinism is owned by AZ-298; this task assumes it).
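The Reliability guarantee rests on the write-to-temp-then-rename pattern. This minimal sketch assumes a bare-hex sidecar at `{engine}.sha256`, as in AC-8. `write_engine_with_sidecar` is a hypothetical stand-in for AZ-280's helper, not its actual implementation.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def _atomic_write(path: Path, data: bytes) -> None:
    # Temp file in the same directory, so os.replace is a same-filesystem rename.
    fd, tmp = tempfile.mkstemp(dir=str(path.parent), prefix=f".{path.name}.", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # bytes reach the disk before the rename publishes them
        os.replace(tmp, path)  # readers see the old file or the new one, never half of it
    except BaseException:
        # Remove the temp file on any failure, including KeyboardInterrupt, then re-raise.
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise


def write_engine_with_sidecar(path: Path, data: bytes) -> str:
    """Hypothetical stand-in for AZ-280: publish the engine first, the sidecar second."""
    path.parent.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256(data).hexdigest()
    _atomic_write(path, data)
    _atomic_write(path.with_name(path.name + ".sha256"), (digest + "\n").encode("ascii"))
    return digest
```

Publishing the engine before the sidecar means a kill between the two renames leaves an engine whose sidecar is absent or stale. `verify` then fails and the next run recompiles instead of reusing a bad engine. Python documents `os.replace` as atomic on POSIX when it succeeds, which is why the temp file lives in the target directory.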
## Unit Tests

| AC Ref | What to Test | Required Outcome |
|--------|-------------|-----------------|
| AC-1 | Empty `cache_root` + 3 backbones | All `built`; sidecars present |
| AC-2 | Warm cache + identical request | All `reused`; zero `compile_engine` calls |
| AC-3 | Cache populated for 1 of 3 backbones | 1 reused + 2 built |
| AC-4 | Hardware change (different SM in fake runtime) | All `built`; old engines untouched |
| AC-5 | Tampered sidecar (flip 1 byte) | That engine rebuilds; WARN log |
| AC-6 | Fake runtime raises `EngineBuildError` mid-run | Error propagates; partial state consistent |
| AC-7 | Fake runtime raises `CalibrationCacheError` | Error propagates with diagnostic |
| AC-8 | Inspect filename + sidecar layout | Matches schema; both verify |
| AC-9 | `compile_duration_s` recorded | Set on `built`, None on `reused` |
| AC-10 | Empty backbones | Empty result; zero side effects |
| NFR-perf-cache-hit | Microbench cache-hit path × 100 with 200 MB engine | p99 ≤ 1.5 s (mostly SHA-256 read) |
| NFR-reliability-atomic-write | Kill process mid-`compile_engine` | No half-engine on disk after restart |
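The NFR-perf-cache-hit row could be measured along these lines: time `exists()` plus a full-file SHA-256 check N times and report the p99. `sha256_file` and `bench_cache_hit` are hypothetical helpers; the real verify goes through AZ-280.

```python
import hashlib
import statistics
import time
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):  # stream in 1 MiB chunks; never load 200 MB at once
            h.update(chunk)
    return h.hexdigest()


def bench_cache_hit(path: Path, expected_hex: str, runs: int = 100) -> float:
    """Return the p99 wall time (seconds) of exists() + a full-file SHA-256 verify."""
    samples: list[float] = []
    for _ in range(runs):
        t0 = time.perf_counter()
        ok = path.exists() and sha256_file(path) == expected_hex
        samples.append(time.perf_counter() - t0)
        if not ok:
            raise RuntimeError(f"verify failed for {path}")
    # quantiles(n=100) returns 99 cut points; index 98 is the 99th percentile.
    return statistics.quantiles(samples, n=100)[98]
```

After the first iteration the file is usually served from the OS page cache. This loop therefore measures hashing throughput more than NVMe read bandwidth; for a cold-read figure, drop caches between runs.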
## Constraints

- The filename schema is canonical via AZ-281; this task does NOT invent its own (per `coderule.mdc` "follow established project patterns").
- The atomic-write + sidecar pattern is canonical via AZ-280; this task does NOT use `open(...).write()` or naked `pathlib.Path.write_bytes()`.
- Cache hit is decided by `sidecar.verify` (the file's SHA-256 matches the sidecar value); a filename match alone is NOT sufficient (defends against bit-rot or bit-flip).
- The `BackboneSpec` registry is config-driven; adding a new model is a config change, not a code change.
- This task does NOT clean up stale engines (the orchestrator T5 may emit `ManifestCoverageError` on orphan files; cleanup is the operator's call).
- This task introduces no new third-party dependencies.
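Here is the cache-hit rule from the Constraints above as a minimal sketch: a filename match is not enough, and the bytes must hash to the sidecar value. It assumes the sidecar holds only the 64-char lowercase hex digest (AC-8). `is_cache_hit` is a hypothetical function, not AZ-280's `Sha256Sidecar.verify`. A missing, unreadable, or malformed sidecar counts as a miss, never a hit.

```python
import hashlib
import re
from pathlib import Path

_HEX64 = re.compile(r"[0-9a-f]{64}")


def _sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def is_cache_hit(engine_path: Path) -> bool:
    """Hit only if the engine exists AND its bytes hash to the sidecar's digest."""
    sidecar = engine_path.with_name(engine_path.name + ".sha256")
    if not engine_path.is_file() or not sidecar.is_file():
        return False  # missing engine or missing sidecar: miss
    # errors="replace" keeps a bit-flipped, non-ASCII sidecar from raising.
    recorded = sidecar.read_text(encoding="ascii", errors="replace").strip()
    if not _HEX64.fullmatch(recorded):
        return False  # corrupt sidecar: miss
    return _sha256_file(engine_path) == recorded
```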
## Risks & Mitigation

**Risk 1: SHA-256 verification of large engines is slow on the warm path**
- *Risk*: 200 MB engine × 5 backbones = 1 GB of SHA-256 per warm idempotent run; on slow disks, this alone can exceed C10-PT-01's 1 min budget.
- *Mitigation*: AZ-280's `Sha256Sidecar.verify` uses `sendfile` / `mmap` paths where available; the benchmark is documented in AZ-280. If still too slow, a future task adds an `mtime + size` quick-check fallback (out of scope this cycle).

**Risk 2: Partial cache after `EngineBuildError` on backbone N**
- *Risk*: Backbones 1..N-1 are `built` and on disk; the N-th fails; backbones N+1..M are never attempted. The cache is "partially valid", so the orchestrator (T5) sees inconsistent state.
- *Mitigation*: T5's coverage check + `ManifestCoverageError` surface this. The compiler does NOT delete the partial state; T5 decides whether to retry, fail, or roll back per the operator's request mode.

**Risk 3: TensorRT engine determinism not guaranteed across builds**
- *Risk*: Two compiles of the same ONNX + calibration produce different bytes; cache-hit detection via SHA-256 fails post-rebuild.
- *Mitigation*: TRT engine determinism is AZ-298's contract obligation; if it fails, this task's cache-hit ratio drops to 0 and operators see WARN logs. AZ-298's tests assert determinism; this task assumes it.

**Risk 4: Operator manually edits engine file but not sidecar**
- *Risk*: Hand-debugging or manual tuning leaves an engine file whose bytes don't match its sidecar; AC-5 covers detection.
- *Mitigation*: AC-5 + WARN log `c10.engine.sidecar.mismatch` surface the case immediately on the next compile run; operators should regenerate the engine via the build command.
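Risk 1's budget arithmetic works out as follows, using the figures from this spec: 5 backbones × 200 MB, the 1 min warm target, and ~1 s per 200 MB on NVMe from the Performance NFR.

$$
N \cdot S = 5 \times 200\ \text{MB} = 1000\ \text{MB} \approx 1\ \text{GB}
$$

$$
\text{throughput}_{\min} = \frac{1000\ \text{MB}}{60\ \text{s}} \approx 16.7\ \text{MB/s},
\qquad
t_{\text{NVMe}} \approx 5 \times 1\ \text{s} = 5\ \text{s}
$$

Hashing alone breaks the 1 min budget only if the combined read-and-hash rate falls below about 16.7 MB/s. At the NVMe figure, it consumes roughly 5 s of that budget.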
## Runtime Completeness

- **Named capability**: TRT engine compile + hardware-tied cache reuse per D-C10-6 + D-C10-7 (description.md § 5; epic § Acceptance C10-IT-05; AC-NEW-1).
- **Production code that must exist**: real `EngineCompiler` orchestrating real AZ-298 `compile_engine` + real AZ-280 atomic write/verify + real AZ-281 filename construction; real config-driven `BackboneSpec` registry.
- **Allowed external stubs**: tests MAY use a fake `InferenceRuntime` that returns scripted bytes + a fake `host_info()` for hardware variation; production wiring uses the real AZ-298 runtime + real Sha256Sidecar.
- **Unacceptable substitutes**: a Python-level `pickle` of a "fake engine" object (TRT engines are opaque CUDA blobs; faking them in production breaks AC-NEW-1's takeoff verify); skipping the sidecar (loses bit-rot detection); inventing a new filename scheme inside this task (defeats D-C10-7); `Path.write_bytes()` instead of AZ-280 (no atomicity guarantee).