# Batch 33 / Cycle 1: Implementation Report

**Date**: 2026-05-12
**Tasks**: AZ-321 (C10 EngineCompiler: per-model TRT compile + hardware-tied cache reuse + AZ-281 filename schema + AZ-280 sidecar gate)
**Story points landed**: 5
**Status**: complete (AZ-321 → In Testing)

## Scope summary

Single-task batch landing the C10 per-model engine compile + cache-reuse orchestrator. `EngineCompiler.compile_engines_for_corpus(req)` walks the corpus, computes the canonical engine filename via AZ-281 `EngineFilenameSchema.build(...)`, and either reuses the cached binary (cache hit; AZ-280 `Sha256Sidecar.verify` returns True) or delegates to the AZ-297 `compile_engine` method on the injected runtime (cache miss; the runtime owns the write path and the sidecar emission). The orchestrator returns one `EngineCompileResult` per backbone carrying the canonical `EngineCacheEntry`, the `CompileOutcome.{BUILT,REUSED}` label, and the `compile_duration_s` (None on reuse).

Hardware-tied cache reuse (D-C10-6 / D-C10-7) falls out naturally from the filename schema: an engine compiled on `(sm=87, jp=6.2, trt=10.3, fp16)` lives at a different path than one compiled on `(sm=89, jp=6.3, trt=10.5, fp16)`, so a hardware change produces cache misses for the new device and leaves the old files untouched (AC-4).

Two design corrections vs. the task spec body:

- **`EngineCacheEntry` shape**: the task spec proposed a c10-local `EngineCacheEntry` with `outcome` and `compile_duration_s` fields. That clashes with the canonical AZ-297 `_types.inference.EngineCacheEntry` already re-exported from `components.c10_provisioning`. The canonical shape wins; the AZ-321 wrapper is renamed `EngineCompileResult` and carries `{entry, outcome, compile_duration_s}` cleanly.
- **`InferenceRuntime.host_info()`**: the task spec calls a hypothetical `host_info()` method on the runtime to retrieve `(sm, jp, trt)`. The AZ-297 Protocol does NOT expose host info. Rather than expand the frozen Protocol mid-cycle, we accept a `HostCapabilities` field on `EngineCompileRequest` so the composition root threads the host from its own probe (Tier-2 device introspection or Tier-1 test fixture). The compiler stays decoupled from any runtime-side introspection surface.

The C10 layer is also forbidden by `test_az270_compose_root.test_ac6` from importing from `components.c7_inference` directly; that rule applies across all `components/*/*.py` files regardless of what the prose-level `module-layout.md` declares the "Imports from" list to be. The lint test wins. To respect it, `engine_compiler.py` defines `CompileEngineCallable`, a structural Protocol cut of `InferenceRuntime` exposing only the single `compile_engine` method the compiler actually uses, and catches the broader `Exception` class. The AZ-297 C7 error family stays the runtime's contract; the compiler records `type(exc).__name__` in its ERROR log payload and re-raises so the original exception type propagates to the caller intact.

## Files added / modified

### New (production)

- `src/gps_denied_onboard/components/c10_provisioning/engine_compiler.py`: `CompileOutcome` enum (`BUILT` / `REUSED`), `BackboneSpec` DTO, `EngineCompileRequest` DTO, `EngineCompileResult` DTO, `EngineCompileSummary` DTO, `CompileEngineCallable` structural Protocol, and the `EngineCompiler` class with the single public `compile_engines_for_corpus` method. Helpers: `_build_config_for_backbone` (synthesises one `OptimizationProfile` with `min == opt == max == expected_input_shape` from the backbone spec; richer dynamic-shape ranges are out of scope for AZ-321), `_summarise` (aggregate counts for the `c10.engine.compile.summary` log).
- `src/gps_denied_onboard/components/c10_provisioning/config.py`: `BackboneConfig` DTO (`model_name`, `onnx_path`, `expected_input_shape`, `input_name` with `"input"` default) + `C10ProvisioningConfig` (`backbones` tuple, `workspace_mb` default 4096 to match C7 NFT-LIM-01). Both validate in `__post_init__` (non-empty strings, positive shape dims, duplicate model_name detection).
- `src/gps_denied_onboard/runtime_root/c10_factory.py`: `build_engine_compiler(config)` wires the existing `build_inference_runtime` factory through to a new `EngineCompiler` instance with a c10-scoped structured logger; `build_backbone_specs(config)` materialises the `BackboneSpec` tuple from `config.components['c10_provisioning'].backbones`.

### Modified (production)

- `src/gps_denied_onboard/components/c10_provisioning/__init__.py`: re-exports the AZ-321 public surface (`EngineCompiler`, `BackboneSpec`, `EngineCompileRequest`, `EngineCompileResult`, `CompileOutcome`, `EngineCompileSummary`, `CompileEngineCallable`, `BackboneConfig`, `C10ProvisioningConfig`) and registers the new config block via `register_component_block("c10_provisioning", C10ProvisioningConfig)`. `CacheProvisioner` / `Manifest` / `EngineCacheEntry` re-exports unchanged.

### New (tests)

- `tests/unit/c10_provisioning/test_engine_compiler.py`: **NEW** Tier-1 suite covering every AC + the 2 Tier-2 NFR placeholders:
  - **AC-1** cold cache + 3 backbones → all `BUILT`; 3 `.engine` + 3 `.sha256` files on disk; 3 `c10.engine.cache.miss` WARN logs; 1 `c10.engine.compile.summary` INFO log with `engines_built=3`.
  - **AC-2** warm cache + identical request → all `REUSED`; `compile_duration_s is None` for every result; ZERO calls to the fake runtime; 3 `c10.engine.cache.hit` INFO logs.
  - **AC-3** mixed (1 hit + 2 miss): DINOv2 reused, LightGlue + ALIKED built; 2 calls to the fake runtime.
  - **AC-4** hardware change (sm 87→89, jp 6.2→6.3, trt 10.3→10.5): every backbone rebuilt at the new filename; old files at the old filename untouched on disk.
  - **AC-5** tampered sidecar (overwrite LightGlue's `.sha256` with `0`×64): LightGlue rebuilt; DINOv2 + ALIKED still reused; 1 `c10.engine.sidecar.mismatch` WARN log with `model_name=lightglue` and `reason=digest_mismatch`. Plus a sibling case where the sidecar file is deleted entirely (`Sha256Sidecar.verify` raises), with the same WARN-then-rebuild outcome.
  - **AC-6** `EngineBuildError` mid-corpus (backbone 2 of 3 fails): error propagates; backbone 1 (pre-cached, reused) untouched on disk; backbone 2's would-be engine NOT on disk (atomic-write guarantee from the fake mirrors AZ-298's real behaviour); backbone 3 never attempted (single call recorded for backbone 2).
  - **AC-7** `CalibrationCacheError` propagates with the `c10.engine.compile.error` ERROR log carrying `model_name`, `calibration_path`, `error_class=CalibrationCacheError`.
  - **AC-8** filename is exactly `dinov2_vpr__sm87_jp6.2_trt10.3_fp16.engine` (per AZ-281 canonical schema with the `__` separator between model and `sm`); sidecar at `*.engine.sha256` with 64-hex digest; `EngineFilenameSchema.parse` round-trip + `Sha256Sidecar.verify` both pass.
  - **AC-9** `compile_duration_s` is a positive float for every `BUILT` result, `None` for every `REUSED` result.
  - **AC-10** empty `backbones` tuple → empty result; ZERO runtime calls; ZERO files written; 1 summary log with all-zero counts.
  - **NFR-perf-cache-hit** Tier-2 placeholder skip (200 MB engine sweep belongs in the AZ-321 microbench harness on Jetson).
  - **NFR-reliability-atomic-write** Tier-2 placeholder skip (kill-during-compile scenario lives in the microbench harness; the atomicity contract itself is owned by AZ-280's tests).
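
The cache behaviour these cases exercise (cold build, warm reuse, tampered or missing sidecar, hardware change) can be illustrated with a minimal, self-contained sketch. It is not the project's implementation: `engine_filename`, `sidecar_matches`, `fake_compile` and `compile_or_reuse` are hypothetical stand-ins for `EngineFilenameSchema.build`, `Sha256Sidecar.verify`, the runtime's `compile_engine` and the orchestrator. The filename layout is inferred only from the AC-8 example, and the stand-in gate returns `False` for a missing sidecar, whereas the report notes that the real `verify` raises.

```python
from __future__ import annotations

import hashlib
import tempfile
from enum import Enum
from pathlib import Path


class CompileOutcome(Enum):
    BUILT = "built"
    REUSED = "reused"


def engine_filename(model: str, sm: int, jp: str, trt: str, precision: str) -> str:
    # Hypothetical stand-in for EngineFilenameSchema.build(...); the layout
    # is inferred from the AC-8 example filename only.
    return f"{model}__sm{sm}_jp{jp}_trt{trt}_{precision}.engine"


def sidecar_path(engine_path: Path) -> Path:
    # Sidecar lives next to the engine as "<name>.engine.sha256".
    return engine_path.with_name(engine_path.name + ".sha256")


def sidecar_matches(engine_path: Path) -> bool:
    # Hypothetical stand-in for the sidecar gate: True only when both files
    # exist and the recorded digest equals the engine's SHA-256.
    sidecar = sidecar_path(engine_path)
    if not engine_path.is_file() or not sidecar.is_file():
        return False
    recorded = sidecar.read_text().strip()
    return recorded == hashlib.sha256(engine_path.read_bytes()).hexdigest()


def fake_compile(engine_path: Path, payload: bytes) -> None:
    # Stand-in for the runtime's compile step. A plain write is used here;
    # the atomic-write guarantee described in the report is NOT reproduced.
    engine_path.write_bytes(payload)
    sidecar_path(engine_path).write_text(hashlib.sha256(payload).hexdigest())


def compile_or_reuse(cache_root: Path, model: str, host: tuple) -> tuple[Path, CompileOutcome]:
    # host is an assumed (sm, jp, trt, precision) tuple.
    sm, jp, trt, precision = host
    path = cache_root / engine_filename(model, sm, jp, trt, precision)
    if sidecar_matches(path):
        return path, CompileOutcome.REUSED
    fake_compile(path, f"engine-bytes-for-{path.name}".encode())
    return path, CompileOutcome.BUILT


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for host in [(87, "6.2", "10.3", "fp16"), (87, "6.2", "10.3", "fp16"),
                     (89, "6.3", "10.5", "fp16")]:
            path, outcome = compile_or_reuse(root, "dinov2_vpr", host)
            print(path.name, outcome.name)
```

Because the host tuple is baked into the filename, the third call in the demo loop misses the cache without any explicit invalidation step, and the engine written for the first host stays on disk.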

The Tier-1 tests use a `_FakeRuntime` that satisfies `CompileEngineCallable` and writes deterministic engine bytes via the REAL `Sha256Sidecar.write_atomic_and_sidecar`, so the cache-hit / cache-miss / tampered-sidecar paths run against the same helper the production wiring uses. Only the C7-runtime-specific compile internals (TRT engine bytes, calibration cache, GPU memory) are mocked.

### Modified (docs)

- `_docs/02_document/module-layout.md`: c10_provisioning Per-Component Mapping now lists the new internal modules (`engine_compiler.py`, `config.py`) and the composition-root `c10_factory.py`; the Public API re-export list is extended with the AZ-321 surface; the `Config block` line is added (registered on import). `default_provisioner.py` row marked `pending` until the AZ-325 task lands.

## Acceptance criteria coverage

| AC | Test | Status |
|----|------|--------|
| AC-1 Cold cache → all built | `test_ac1_cold_cache_compiles_every_backbone` | passing |
| AC-2 Warm cache → all reused, zero compile calls | `test_ac2_warm_cache_reuses_every_backbone` | passing |
| AC-3 Mixed cache | `test_ac3_mixed_cache_hits_and_misses` | passing |
| AC-4 Hardware change invalidates filename | `test_ac4_hardware_change_invalidates_cache` | passing |
| AC-5 Tampered sidecar + missing sidecar paths | `test_ac5_tampered_sidecar_invalidates_that_engine` + `test_missing_sidecar_treated_as_cache_miss` | passing |
| AC-6 `EngineBuildError` propagates, partial state consistent | `test_ac6_engine_build_error_propagates_and_third_backbone_untouched` | passing |
| AC-7 `CalibrationCacheError` propagates with diagnostic log | `test_ac7_calibration_cache_error_propagates` | passing |
| AC-8 Filename + sidecar layout matches AZ-281 schema | `test_ac8_filename_and_sidecar_layout` | passing |
| AC-9 `compile_duration_s` recorded for built only | `test_ac9_compile_duration_recorded_for_built_only` | passing |
| AC-10 Empty backbones → empty result, no side effects | `test_ac10_empty_backbones_returns_empty` | passing |
| NFR-perf-cache-hit p99 ≤ 1.5 s for 200 MB engine | `test_nfr_perf_cache_hit_p99_under_1500ms_for_200mb_engine` (Tier-2) | Tier-2 skipped |
| NFR-reliability atomic-write no half-engine after kill | `test_nfr_reliability_atomic_write_no_half_engine_after_kill` (Tier-2) | Tier-2 skipped |

## AC Test Coverage: 10 of 10 covered (+ 2 NFRs)

## Code Review Verdict: PASS_WITH_WARNINGS (4 Low accepted; see Findings)

## Auto-Fix Attempts: 0

## Stuck Agents: None

## Findings (self-review)

| # | Severity | Category | Location | Note | Resolution |
|---|----------|----------|----------|------|------------|
| 1 | Low | Architecture | `engine_compiler.py::_compile_one` | Catches the broad `Exception` (not the specific AZ-297 `RuntimeError` family) because the c10 layer cannot import `components.c7_inference` (architecture rule `test_az270_compose_root.test_ac6`). The C7 contract scopes its runtime exceptions to its own family; ANY exception bubbling out of `compile_engine` is treated as a compile failure here. Re-raise preserves the original type. Inline comment documents the rule. | Open (Low): accepted; architecture rule wins. |
| 2 | Low | Maintainability | `engine_compiler.py::CompileEngineCallable` | Duplicates the `compile_engine` method shape from the C7 `InferenceRuntime` Protocol. Mirrors the LightGlue dual-Protocol pattern already in `_types/manifests.py` (consumer-side structural cut vs. producer-side opaque marker). | Open (Low): accepted; matches established pattern. |
| 3 | Low | Architecture | `engine_compiler.py` ↔ `C7InferenceConfig` | `EngineCompileRequest.cache_root` MUST equal the directory the C7 runtime writes to (`C7InferenceConfig.engine_cache_dir`). The composition root (`build_engine_compiler` + the C10 corpus driver T5 in AZ-325) is responsible for keeping the two in sync; the compiler itself trusts the request. If the two diverge, the compiler looks in a directory the runtime never writes to, so every lookup is a cache miss. | Open (Low): flagged for AZ-325 to enforce. |
| 4 | Low | Scope | `engine_compiler.py::_build_config_for_backbone` | Synthesises exactly one `OptimizationProfile` with `min == opt == max == expected_input_shape`. Backbones requiring dynamic input ranges would need a richer `BackboneSpec` carrying explicit `OptimizationProfile` tuples. None of the AZ-321 corpus backbones (DINOv2-VPR, LightGlue, ALIKED) need dynamic shapes today, but the limitation is real. | Open (Low): accepted; future extension. |

## Tracker

- AZ-321 transitioned to **In Progress** at session start; will move to **In Testing** post-commit per `protocols.md`.

## Test suite

- `tests/unit/c10_provisioning/`: 13 passing, 2 Tier-2 skips (cache-hit p99 NFR + atomic-write kill scenario).
- Combined unit suite excluding pending components (c1, c2, c2.5, c3, c3.5, c4, c5, c8, c11, c12) and the c6 collection blocker on this host (missing `psycopg_pool` is a known dev-machine env issue, pre-existing): 543 passing, 21 environment-skipped, 1 warning (pre-existing `pynvml` FutureWarning unrelated to AZ-321).

## Next batch

Cycle 1 advances per the greenfield queue: autodev re-detects the next AZ ticket in the Step 7 batch loop. AZ-321 unblocks AZ-322 (C10 Descriptor Batcher), AZ-337 (C2 UltraVPR), AZ-345 / AZ-346 / AZ-347 (C3 matchers), and AZ-349 (C3.5 refiner) at the topological level; the next ready batch is computed by `compute-next-batch`. A cumulative review (batches 31–33) will fire at the next sub-skill phase boundary per Step 14.5's K=3 trigger.