Land the C10 per-model engine compile + cache-reuse orchestrator.

`EngineCompiler.compile_engines_for_corpus(request)` walks the corpus, computes the canonical engine filename via AZ-281 `EngineFilenameSchema.build`, and either reuses the cached binary (cache hit, AZ-280 `Sha256Sidecar.verify` returns True) or delegates to the AZ-297 `compile_engine` on the injected runtime (cache miss; the runtime owns the write path). Returns one `EngineCompileResult` per backbone carrying the canonical `EngineCacheEntry`, outcome (BUILT / REUSED), and `compile_duration_s` (None on reuse). Hardware-tied reuse (D-C10-6 / D-C10-7) falls out of the filename schema: a host change rebuilds at the new path and leaves the old files untouched (AC-4).

Design corrections vs. the task spec body:

- The spec proposed a c10-local `EngineCacheEntry` carrying outcome and duration; that name is already taken by the AZ-297 canonical DTO. The wrapper is renamed `EngineCompileResult`; the canonical shape wins.
- The spec called `InferenceRuntime.host_info()`, which is not in the AZ-297 Protocol. `HostCapabilities` is threaded through `EngineCompileRequest` instead so the composition root owns host probing and the compiler stays decoupled.
- The c10 layer cannot import `components.c7_inference` (arch rule `test_az270_compose_root.test_ac6`). `engine_compiler.py` defines `CompileEngineCallable`, a structural Protocol cut of `InferenceRuntime` exposing only `compile_engine`, and catches broad `Exception` (re-raising preserves the original type; `error_class` is recorded in the ERROR log payload).

Production

- engine_compiler.py: `CompileOutcome` enum, `BackboneSpec`, `EngineCompileRequest`, `EngineCompileResult`, `EngineCompileSummary` DTOs; `CompileEngineCallable` Protocol; `EngineCompiler` with the single public method.
- config.py: `BackboneConfig` + `C10ProvisioningConfig` (`workspace_mb` default 4 GiB to match C7 NFT-LIM-01); validate positive shape dims and duplicate model_name detection in `__post_init__`.
- runtime_root/c10_factory.py: `build_engine_compiler(config)` wires the existing `build_inference_runtime` factory through; `build_backbone_specs(config)` materialises the `BackboneSpec` tuple from the config block.
- components/c10_provisioning/__init__.py: re-exports the AZ-321 surface and registers the new config block.

Tests

- test_engine_compiler.py: covers AC-1..AC-10 + missing-sidecar sibling case for AC-5. Tier-1 via fake runtime that writes through the REAL `Sha256Sidecar.write_atomic_and_sidecar`. Tier-2 placeholders for the cache-hit p99 NFR (200 MB engine sweep) and kill-during-compile atomic-write NFR.

Docs

- module-layout.md: c10_provisioning Per-Component Mapping lists the new internal modules (engine_compiler.py, config.py), the composition-root c10_factory.py, the AZ-321 public re-export surface, and the registered config block.
- batch_33_cycle1_report.md + reviews/batch_33_review.md: PASS_WITH_WARNINGS (4 Low findings accepted).

Tests run: c10_provisioning 13 passing + 2 Tier-2 skips; combined unit suite (excluding pending components) 543 passing, 21 env-skipped.

Co-authored-by: Cursor <cursoragent@cursor.com>
Batch 33 / Cycle 1 — Implementation Report
Date: 2026-05-12
Tasks: AZ-321 (C10 EngineCompiler: per-model TRT compile + hardware-tied cache reuse + AZ-281 filename schema + AZ-280 sidecar gate)
Story points landed: 5
Status: complete (AZ-321 → In Testing)
Scope summary
Single-task batch landing the C10 per-model engine compile + cache-
reuse orchestrator. EngineCompiler.compile_engines_for_corpus(req)
walks the corpus, computes the canonical engine filename via AZ-281
EngineFilenameSchema.build(...), and either reuses the cached
binary (cache hit; AZ-280 Sha256Sidecar.verify returns True) or
delegates to the AZ-297 compile_engine method on the injected
runtime (cache miss; the runtime owns the write path and the sidecar
emission). The orchestrator returns one EngineCompileResult per
backbone carrying the canonical EngineCacheEntry, the
CompileOutcome.{BUILT,REUSED} label, and the compile_duration_s
(None on reuse). Hardware-tied cache reuse (D-C10-6 / D-C10-7) falls
out naturally from the filename schema: an engine compiled on
(sm=87, jp=6.2, trt=10.3, fp16) lives at a different path than one
compiled on (sm=89, jp=6.3, trt=10.5, fp16), so a hardware change
produces cache misses for the new device and leaves the old files
untouched (AC-4).
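The hit/miss walk and the hardware-tied filename can be sketched as follows. This is a minimal illustration, not the AZ-321 code: `Host`, `engine_filename`, `sidecar_ok`, `fake_compile`, and `compile_corpus` are hypothetical stand-ins for `HostCapabilities`, `EngineFilenameSchema.build`, `Sha256Sidecar.verify`, the runtime's `compile_engine`, and `compile_engines_for_corpus`. Only the filename layout is taken from the AC-8 example.

```python
import hashlib
import tempfile
import time
from dataclasses import dataclass
from enum import Enum
from pathlib import Path


class CompileOutcome(Enum):
    BUILT = "built"
    REUSED = "reused"


@dataclass(frozen=True)
class Host:
    """Hypothetical stand-in for HostCapabilities."""
    sm: int
    jp: str
    trt: str


def engine_filename(model: str, host: Host, precision: str = "fp16") -> str:
    # Layout copied from the AC-8 example; the real builder is EngineFilenameSchema.build.
    return f"{model}__sm{host.sm}_jp{host.jp}_trt{host.trt}_{precision}.engine"


def sidecar_ok(engine: Path) -> bool:
    # Simplified stand-in for Sha256Sidecar.verify; here a missing file is just a miss.
    sidecar = engine.with_name(engine.name + ".sha256")
    if not (engine.is_file() and sidecar.is_file()):
        return False
    digest = hashlib.sha256(engine.read_bytes()).hexdigest()
    return digest == sidecar.read_text().strip()


def fake_compile(model: str, engine: Path) -> None:
    # Plays the runtime's role: it owns the engine write and the sidecar emission.
    data = f"engine:{model}".encode()
    engine.write_bytes(data)
    engine.with_name(engine.name + ".sha256").write_text(hashlib.sha256(data).hexdigest())


def compile_corpus(models, host, cache_root, compile_engine=fake_compile):
    results = []
    for model in models:
        engine = cache_root / engine_filename(model, host)
        if sidecar_ok(engine):
            # Cache hit: no runtime call, no duration.
            results.append((model, CompileOutcome.REUSED, None))
            continue
        started = time.perf_counter()
        compile_engine(model, engine)
        results.append((model, CompileOutcome.BUILT, time.perf_counter() - started))
    return results


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        old_host, new_host = Host(87, "6.2", "10.3"), Host(89, "6.3", "10.5")
        cold = compile_corpus(["dinov2_vpr"], old_host, root)
        warm = compile_corpus(["dinov2_vpr"], old_host, root)
        moved = compile_corpus(["dinov2_vpr"], new_host, root)
        names = sorted(p.name for p in root.glob("*.engine"))
        return cold[0][1], warm[0][1], warm[0][2], moved[0][1], names
```

In `demo()`, the second call on the same host reuses the engine, while the call with the new host builds a second file next to the first one, which is the AC-4 behaviour in miniature.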
Two design corrections vs. the task spec body:
- `EngineCacheEntry` shape: the task spec proposed a c10-local `EngineCacheEntry` with `outcome` and `compile_duration_s` fields. That clashes with the canonical AZ-297 `_types.inference.EngineCacheEntry` already re-exported from `components.c10_provisioning`. The canonical shape wins; the AZ-321 wrapper is renamed `EngineCompileResult` and carries `{entry, outcome, compile_duration_s}` cleanly.
- `InferenceRuntime.host_info()`: the task spec calls a hypothetical `host_info()` method on the runtime to retrieve `(sm, jp, trt)`. The AZ-297 Protocol does NOT expose host info. Rather than expand the frozen Protocol mid-cycle, we accept a `HostCapabilities` field on `EngineCompileRequest` so the composition root threads the host from its own probe (Tier-2 device introspection or Tier-1 test fixture). The compiler stays decoupled from any runtime-side introspection surface.
The C10 layer is also forbidden by test_az270_compose_root.test_ac6
from importing from components.c7_inference directly — that rule
applies across all components/*/*.py files regardless of what the
prose-level module-layout.md declares the "Imports from" list to be.
The lint test wins. To respect it, engine_compiler.py defines
CompileEngineCallable, a structural Protocol cut of
InferenceRuntime exposing only the single compile_engine method
the compiler actually uses, and catches the broad Exception
class. The AZ-297 C7 error family stays the runtime's contract; the
compiler records type(exc).__name__ as error_class in its ERROR log
payload and re-raises, so the original exception type propagates to
the caller intact.
Files added / modified
New (production)
- `src/gps_denied_onboard/components/c10_provisioning/engine_compiler.py`: `CompileOutcome` enum (`BUILT` / `REUSED`), `BackboneSpec` DTO, `EngineCompileRequest` DTO, `EngineCompileResult` DTO, `EngineCompileSummary` DTO, `CompileEngineCallable` structural Protocol, and the `EngineCompiler` class with the single public `compile_engines_for_corpus` method. Helpers: `_build_config_for_backbone` (synthesises one `OptimizationProfile` with `min == opt == max == expected_input_shape` from the backbone spec; richer dynamic-shape ranges are out of scope for AZ-321), `_summarise` (aggregate counts for the `c10.engine.compile.summary` log).
- `src/gps_denied_onboard/components/c10_provisioning/config.py`: `BackboneConfig` DTO (`model_name`, `onnx_path`, `expected_input_shape`, `input_name` with `"input"` default) + `C10ProvisioningConfig` (`backbones` tuple, `workspace_mb` default 4096 to match C7 NFT-LIM-01). Both validate in `__post_init__` (non-empty strings, positive shape dims, duplicate model_name detection).
- `src/gps_denied_onboard/runtime_root/c10_factory.py`: `build_engine_compiler(config)` wires the existing `build_inference_runtime` factory through to a new `EngineCompiler` instance with a c10-scoped structured logger; `build_backbone_specs(config)` materialises the `BackboneSpec` tuple from `config.components['c10_provisioning'].backbones`.
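The `__post_init__` validation described for `config.py` might look roughly like this. It is a sketch under assumptions: the field types, the use of frozen dataclasses, and the error messages are guesses; only the field names, the `"input"` and 4096 defaults, and the three validation rules come from the description above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BackboneConfig:
    model_name: str
    onnx_path: str  # assumed to be a plain string path
    expected_input_shape: Tuple[int, ...]
    input_name: str = "input"

    def __post_init__(self) -> None:
        for field_name in ("model_name", "onnx_path", "input_name"):
            if not getattr(self, field_name):
                raise ValueError(f"{field_name} must be a non-empty string")
        if any(dim <= 0 for dim in self.expected_input_shape):
            raise ValueError("expected_input_shape dims must be positive")


@dataclass(frozen=True)
class C10ProvisioningConfig:
    backbones: Tuple[BackboneConfig, ...] = ()
    workspace_mb: int = 4096  # 4 GiB

    def __post_init__(self) -> None:
        names = [backbone.model_name for backbone in self.backbones]
        duplicates = sorted({name for name in names if names.count(name) > 1})
        if duplicates:
            raise ValueError(f"duplicate model_name: {duplicates}")


def rejects(factory) -> bool:
    """Helper for the usage below: True when construction raises ValueError."""
    try:
        factory()
    except ValueError:
        return True
    return False
```

For example, `rejects(lambda: BackboneConfig("dinov2_vpr", "dinov2.onnx", (1, 3, 0, 224)))` is true because of the zero dimension, and two backbones sharing a `model_name` are rejected when the enclosing `C10ProvisioningConfig` is built.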
Modified (production)
- `src/gps_denied_onboard/components/c10_provisioning/__init__.py`: re-exports the AZ-321 public surface (`EngineCompiler`, `BackboneSpec`, `EngineCompileRequest`, `EngineCompileResult`, `CompileOutcome`, `EngineCompileSummary`, `CompileEngineCallable`, `BackboneConfig`, `C10ProvisioningConfig`) and registers the new config block via `register_component_block("c10_provisioning", C10ProvisioningConfig)`. `CacheProvisioner` / `Manifest` / `EngineCacheEntry` re-exports unchanged.
New (tests)
- `tests/unit/c10_provisioning/test_engine_compiler.py`: NEW Tier-1 suite covering every AC + the 2 Tier-2 NFR placeholders:
  - AC-1 cold cache + 3 backbones → all `BUILT`; 3 `.engine` + 3 `.sha256` files on disk; 3 `c10.engine.cache.miss` WARN logs; 1 `c10.engine.compile.summary` INFO log with `engines_built=3`.
  - AC-2 warm cache + identical request → all `REUSED`; `compile_duration_s is None` for every result; ZERO calls to the fake runtime; 3 `c10.engine.cache.hit` INFO logs.
  - AC-3 mixed (1 hit + 2 miss): DINOv2 reused, LightGlue + ALIKED built; 2 calls to the fake runtime.
  - AC-4 hardware change (sm 87→89, jp 6.2→6.3, trt 10.3→10.5): every backbone rebuilt at the new filename; old files at the old filename untouched on disk.
  - AC-5 tampered sidecar (overwrite LightGlue's `.sha256` with `0`×64): LightGlue rebuilt; DINOv2 + ALIKED still reused; 1 `c10.engine.sidecar.mismatch` WARN log with `model_name=lightglue` and `reason=digest_mismatch`. Plus a sibling case where the sidecar file is deleted entirely (`Sha256Sidecar.verify` raises), with the same WARN-then-rebuild outcome.
  - AC-6 `EngineBuildError` mid-corpus (backbone 2 of 3 fails): error propagates; backbone 1 (pre-cached, reused) untouched on disk; backbone 2's would-be engine NOT on disk (atomic-write guarantee from the fake mirrors AZ-298's real behaviour); backbone 3 never attempted (single call recorded for backbone 2).
  - AC-7 `CalibrationCacheError` propagates with the `c10.engine.compile.error` ERROR log carrying `model_name`, `calibration_path`, `error_class=CalibrationCacheError`.
  - AC-8 filename is exactly `dinov2_vpr__sm87_jp6.2_trt10.3_fp16.engine` (per AZ-281 canonical schema with the `__` separator between model and `sm`); sidecar at `*.engine.sha256` with 64-hex digest; `EngineFilenameSchema.parse` round-trip + `Sha256Sidecar.verify` both pass.
  - AC-9 `compile_duration_s` is a positive float for every `BUILT` result, `None` for every `REUSED` result.
  - AC-10 empty `backbones` tuple → empty result; ZERO runtime calls; ZERO files written; 1 summary log with all-zero counts.
  - NFR-perf-cache-hit Tier-2 placeholder skip (200 MB engine sweep belongs in the AZ-321 microbench harness on Jetson).
  - NFR-reliability-atomic-write Tier-2 placeholder skip (kill-during-compile scenario lives in the microbench harness; the atomicity contract itself is owned by AZ-280's tests).
The Tier-1 tests use a _FakeRuntime that satisfies
CompileEngineCallable and writes deterministic engine bytes via
the REAL Sha256Sidecar.write_atomic_and_sidecar — so the cache-hit
/ cache-miss / tampered-sidecar paths run against the same helper
the production wiring uses. Only the C7-runtime-specific compile
internals (TRT engine bytes, calibration cache, GPU memory) are
mocked.
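To make the tampered-sidecar and missing-sidecar cases concrete, here is a simplified model of a write-through-sidecar helper and its verifier. It is not the AZ-280 `Sha256Sidecar` implementation: the temp-file naming, the `os.replace` step, and the `FileNotFoundError` raised for a missing sidecar are assumptions about how such a helper could behave.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def _sidecar_path(engine: Path) -> Path:
    # Sidecar lives next to the engine as <name>.engine.sha256.
    return engine.with_name(engine.name + ".sha256")


def write_atomic_and_sidecar(engine: Path, data: bytes) -> None:
    # Write to a temp name first, then rename, so a half-written engine is never visible.
    tmp = engine.with_name(engine.name + ".tmp")
    tmp.write_bytes(data)
    os.replace(tmp, engine)
    _sidecar_path(engine).write_text(hashlib.sha256(data).hexdigest())


def verify(engine: Path) -> bool:
    # read_text raises FileNotFoundError when the sidecar is missing.
    expected = _sidecar_path(engine).read_text().strip()
    return hashlib.sha256(engine.read_bytes()).hexdigest() == expected


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        engine = Path(tmp) / "lightglue.engine"
        write_atomic_and_sidecar(engine, b"deterministic engine bytes")
        intact = verify(engine)

        _sidecar_path(engine).write_text("0" * 64)  # the AC-5 tamper
        tampered = verify(engine)

        _sidecar_path(engine).unlink()  # the missing-sidecar sibling case
        try:
            verify(engine)
            missing_raises = False
        except FileNotFoundError:
            missing_raises = True
        return intact, tampered, missing_raises
```

A caller in the compiler's position would treat both a `False` result and a raised error as a cache miss and rebuild, which is the WARN-then-rebuild outcome the AC-5 tests assert.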
Modified (docs)
- `_docs/02_document/module-layout.md`: c10_provisioning Per-Component Mapping now lists the new internal modules (`engine_compiler.py`, `config.py`) and the composition-root `c10_factory.py`; the Public API re-export list is extended with the AZ-321 surface; the `Config block` line is added (registered on import). `default_provisioner.py` row marked `pending` until the AZ-325 task lands.
Acceptance criteria coverage
| AC | Test | Status |
|---|---|---|
| AC-1 Cold cache → all built | `test_ac1_cold_cache_compiles_every_backbone` | passing |
| AC-2 Warm cache → all reused, zero compile calls | `test_ac2_warm_cache_reuses_every_backbone` | passing |
| AC-3 Mixed cache | `test_ac3_mixed_cache_hits_and_misses` | passing |
| AC-4 Hardware change invalidates filename | `test_ac4_hardware_change_invalidates_cache` | passing |
| AC-5 Tampered sidecar + missing sidecar paths | `test_ac5_tampered_sidecar_invalidates_that_engine` + `test_missing_sidecar_treated_as_cache_miss` | passing |
| AC-6 `EngineBuildError` propagates, partial state consistent | `test_ac6_engine_build_error_propagates_and_third_backbone_untouched` | passing |
| AC-7 `CalibrationCacheError` propagates with diagnostic log | `test_ac7_calibration_cache_error_propagates` | passing |
| AC-8 Filename + sidecar layout matches AZ-281 schema | `test_ac8_filename_and_sidecar_layout` | passing |
| AC-9 `compile_duration_s` recorded for built only | `test_ac9_compile_duration_recorded_for_built_only` | passing |
| AC-10 Empty backbones → empty result, no side effects | `test_ac10_empty_backbones_returns_empty` | passing |
| NFR-perf-cache-hit p99 ≤ 1.5 s for 200 MB engine | `test_nfr_perf_cache_hit_p99_under_1500ms_for_200mb_engine` (Tier-2) | Tier-2 skipped |
| NFR-reliability atomic-write no half-engine after kill | `test_nfr_reliability_atomic_write_no_half_engine_after_kill` (Tier-2) | Tier-2 skipped |
AC Test Coverage: 10 of 10 covered (+ 2 NFRs)
Code Review Verdict: PASS_WITH_WARNINGS (4 Low accepted; see Findings)
Auto-Fix Attempts: 0
Stuck Agents: None
Findings (self-review)
| # | Severity | Category | Location | Note | Resolution |
|---|---|---|---|---|---|
| 1 | Low | Architecture | `engine_compiler.py::_compile_one` | Catches the broad `Exception` (not the specific AZ-297 `RuntimeError` family) because the c10 layer cannot import `components.c7_inference` (architecture rule `test_az270_compose_root.test_ac6`). The C7 contract scopes its runtime exceptions to its own family; ANY exception bubbling out of `compile_engine` is treated as a compile failure here. Re-raise preserves the original type. Inline comment documents the rule. | Open (Low): accepted; architecture rule wins. |
| 2 | Low | Maintainability | `engine_compiler.py::CompileEngineCallable` | Duplicates the `compile_engine` method shape from the C7 `InferenceRuntime` Protocol. Mirrors the LightGlue dual-Protocol pattern already in `_types/manifests.py` (consumer-side structural cut vs. producer-side opaque marker). | Open (Low): accepted; matches established pattern. |
| 3 | Low | Architecture | `engine_compiler.py` ↔ `C7InferenceConfig` | `EngineCompileRequest.cache_root` MUST equal the directory the C7 runtime writes to (`C7InferenceConfig.engine_cache_dir`). The composition root (`build_engine_compiler` + the C10 corpus driver T5 in AZ-325) is responsible for keeping the two in sync; the compiler itself trusts the request. A divergence would make every cache lookup miss. | Open (Low): flagged for AZ-325 to enforce. |
| 4 | Low | Scope | `engine_compiler.py::_build_config_for_backbone` | Synthesises exactly one `OptimizationProfile` with `min == opt == max == expected_input_shape`. Backbones requiring dynamic input ranges would need a richer `BackboneSpec` carrying explicit `OptimizationProfile` tuples. None of the AZ-321 corpus backbones (DINOv2-VPR, LightGlue, ALIKED) need dynamic shapes today, but the limitation is real. | Open (Low): accepted; future extension. |
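The static-shape limitation in finding 4 can be pictured with a small sketch. The `OptimizationProfile` field names and the `static_profile` helper here are hypothetical; the real AZ-297 DTO layout is not shown in this report.

```python
from dataclasses import dataclass
from typing import Tuple

Shape = Tuple[int, ...]


@dataclass(frozen=True)
class OptimizationProfile:
    """Hypothetical field layout, for illustration only."""
    input_name: str
    min_shape: Shape
    opt_shape: Shape
    max_shape: Shape


def static_profile(input_name: str, expected_input_shape: Shape) -> OptimizationProfile:
    # One profile pinned to a single shape: min == opt == max.
    return OptimizationProfile(
        input_name=input_name,
        min_shape=expected_input_shape,
        opt_shape=expected_input_shape,
        max_shape=expected_input_shape,
    )


profile = static_profile("input", (1, 3, 224, 224))
```

A backbone with a dynamic batch or resolution range would need distinct `min_shape` and `max_shape` values, which is the richer `BackboneSpec` the finding defers.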
Tracker
- AZ-321 transitioned to In Progress at session start; will move
to In Testing post-commit per
protocols.md.
Test suite
- `tests/unit/c10_provisioning/`: 13 passing, 2 Tier-2 skips (cache-hit p99 NFR + atomic-write kill scenario).
- Combined unit suite excluding pending components (c1, c2, c2.5, c3, c3.5, c4, c5, c8, c11, c12) and the c6 collection blocker on this host (missing `psycopg_pool` is a known dev-machine env issue, pre-existing): 543 passing, 21 environment-skipped, 1 warning (pre-existing `pynvml` FutureWarning unrelated to AZ-321).
Next batch
Cycle 1 advances per the greenfield queue — autodev re-detects the
next AZ ticket in the Step 7 batch loop. AZ-321 unblocks AZ-322
(C10 Descriptor Batcher), AZ-337 (C2 UltraVPR), AZ-345 / AZ-346 /
AZ-347 (C3 matchers), and AZ-349 (C3.5 refiner) at the topological
level; the next ready batch is computed by compute-next-batch.
A cumulative review (batches 31–33) will fire at the next sub-skill phase boundary per Step 14.5's K=3 trigger.