mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-22 11:31:13 +00:00
[AZ-321] C10 EngineCompiler: hardware-tied TRT compile + cache reuse
Land the C10 per-model engine compile + cache-reuse orchestrator. `EngineCompiler.compile_engines_for_corpus(request)` walks the corpus, computes the canonical engine filename via AZ-281 `EngineFilenameSchema.build`, and either reuses the cached binary (cache hit, AZ-280 `Sha256Sidecar.verify` returns True) or delegates to the AZ-297 `compile_engine` on the injected runtime (cache miss; the runtime owns the write path). Returns one `EngineCompileResult` per backbone carrying the canonical `EngineCacheEntry`, outcome (BUILT / REUSED), and `compile_duration_s` (None on reuse). Hardware-tied reuse (D-C10-6 / D-C10-7) falls out of the filename schema — a host change rebuilds at the new path and leaves the old files untouched (AC-4). Design corrections vs. the task spec body: - The spec proposed a c10-local `EngineCacheEntry` carrying outcome and duration; that name is already taken by the AZ-297 canonical DTO. The wrapper is renamed `EngineCompileResult`; the canonical shape wins. - The spec called `InferenceRuntime.host_info()`, which is not in the AZ-297 Protocol. `HostCapabilities` is threaded through `EngineCompileRequest` instead so the composition root owns host probing and the compiler stays decoupled. - The c10 layer cannot import `components.c7_inference` (arch rule `test_az270_compose_root.test_ac6`). `engine_compiler.py` defines `CompileEngineCallable` — a structural Protocol cut of `InferenceRuntime` exposing only `compile_engine` — and catches broad `Exception` (re-raising preserves the original type; `error_class` is recorded in the ERROR log payload). Production - engine_compiler.py: `CompileOutcome` enum, `BackboneSpec`, `EngineCompileRequest`, `EngineCompileResult`, `EngineCompileSummary` DTOs; `CompileEngineCallable` Protocol; `EngineCompiler` with the single public method. - config.py: `BackboneConfig` + `C10ProvisioningConfig` (`workspace_mb` default 4 GiB to match C7 NFT-LIM-01); validate positive shape dims and duplicate model_name detection in `__post_init__`. - runtime_root/c10_factory.py: `build_engine_compiler(config)` wires the existing `build_inference_runtime` factory through; `build_backbone_specs(config)` materialises the `BackboneSpec` tuple from the config block. - components/c10_provisioning/__init__.py: re-exports the AZ-321 surface and registers the new config block. Tests - test_engine_compiler.py: covers AC-1..AC-10 + missing-sidecar sibling case for AC-5. Tier-1 via fake runtime that writes through the REAL `Sha256Sidecar.write_atomic_and_sidecar`. Tier-2 placeholders for the cache-hit p99 NFR (200 MB engine sweep) and kill-during-compile atomic-write NFR. Docs - module-layout.md: c10_provisioning Per-Component Mapping lists the new internal modules (engine_compiler.py, config.py), the composition-root c10_factory.py, the AZ-321 public re-export surface, and the registered config block. - batch_33_cycle1_report.md + reviews/batch_33_review.md: PASS_WITH_WARNINGS (4 Low findings accepted). Tests run: c10_provisioning 13 passing + 2 Tier-2 skips; combined unit suite (excluding pending components) 543 passing, 21 env-skipped. Co-authored-by: Cursor <cursoragent@cursor.com>
This commit is contained in:
@@ -0,0 +1,54 @@
|
||||
# Code Review Report — Batch 33 / Cycle 1
|
||||
|
||||
**Batch**: 33
|
||||
**Tasks**: AZ-321 (C10 EngineCompiler)
|
||||
**Date**: 2026-05-12
|
||||
**Verdict**: PASS_WITH_WARNINGS
|
||||
|
||||
## Findings
|
||||
|
||||
| # | Severity | Category | File:Line | Title |
|
||||
|---|----------|----------|-----------|-------|
|
||||
| 1 | Low | Architecture | `engine_compiler.py::_compile_one` | Broad `except Exception` rather than the AZ-297 `RuntimeError` family |
|
||||
| 2 | Low | Maintainability | `engine_compiler.py::CompileEngineCallable` | Duplicates the `compile_engine` signature from the C7 `InferenceRuntime` Protocol |
|
||||
| 3 | Low | Architecture | `engine_compiler.py` ↔ C7InferenceConfig | `EngineCompileRequest.cache_root` MUST mirror `C7InferenceConfig.engine_cache_dir` — invariant enforced by the composition root, not the compiler |
|
||||
| 4 | Low | Scope | `engine_compiler.py::_build_config_for_backbone` | Single static `OptimizationProfile` synthesised per backbone; dynamic shape ranges out of scope |
|
||||
|
||||
### Finding Details
|
||||
|
||||
**F1: Broad `except Exception` on `compile_engine`** (Low / Architecture)
|
||||
- Location: `src/gps_denied_onboard/components/c10_provisioning/engine_compiler.py::EngineCompiler._compile_one`
|
||||
- Description: The C7 contract scopes `InferenceRuntime.compile_engine` exceptions to the C7-local `RuntimeError` family (`EngineBuildError`, `CalibrationCacheError`, ...). The c10 layer is forbidden from importing `components.c7_inference` (architecture rule `test_az270_compose_root.test_ac6` walks all `components/*/*.py` files and flags any cross-component import — including TYPE_CHECKING-guarded ones). Catching the broader `Exception` and dispatching by `type(exc).__name__` in the log payload is the cheapest fix that respects the rule. Re-raise preserves the original exception type for the caller.
|
||||
- Suggestion: A longer-term cleanup would either (a) hoist the C7 error envelope to `_types/inference.py` (parallels the `EngineCacheEntry` move) or (b) extend the architecture-lint to allow Public-API-only imports from sibling components. Both are bigger scope than AZ-321.
|
||||
- Task: AZ-321
|
||||
- Resolution: Open (Low) — accepted as documented; inline comment in source.
|
||||
|
||||
**F2: `CompileEngineCallable` shadow Protocol** (Low / Maintainability)
|
||||
- Location: `engine_compiler.py::CompileEngineCallable`
|
||||
- Description: Defines a structural Protocol carrying only the single `compile_engine` method shape — the c10 compiler's narrow consumer cut of the AZ-297 `InferenceRuntime` Protocol. Mirrors the LightGlue dual-Protocol pattern already documented in `_types/manifests.py` (`EngineHandle` consumer-cut Protocol vs `c7_inference.EngineHandle` opaque marker class).
|
||||
- Suggestion: None — same pattern already accepted across the codebase. A future "Public-API cross-component import allowlist" lint update could collapse this dual.
|
||||
- Task: AZ-321
|
||||
- Resolution: Open (Low) — accepted; matches existing pattern.
|
||||
|
||||
**F3: `cache_root` / `engine_cache_dir` invariant** (Low / Architecture)
|
||||
- Location: `engine_compiler.py::EngineCompileRequest.cache_root` vs `C7InferenceConfig.engine_cache_dir`
|
||||
- Description: The compiler's cache-hit detection writes nothing — it just checks `target_path.exists()` + `Sha256Sidecar.verify`. The C7 runtime owns the `.engine` write path and writes to `C7InferenceConfig.engine_cache_dir`. If the composition root passes a different `cache_root` to the compiler than the C7 runtime is configured for, cache hits will never fire (the compiler will look at the wrong directory). The compiler itself trusts the request; the wiring invariant lives in `build_engine_compiler` and (later) the AZ-325 `CacheProvisioner` driver.
|
||||
- Suggestion: AZ-325 (C10 Cache Provisioner — the orchestrator T5 that drives the compiler) should pull both paths from the same config field or assert their equality at construction time. The compiler stays scope-bound to one model at a time.
|
||||
- Task: AZ-321 (flag for AZ-325 follow-up)
|
||||
- Resolution: Open (Low) — accepted; flagged for AZ-325.
|
||||
|
||||
**F4: Single static `OptimizationProfile` per backbone** (Low / Scope)
|
||||
- Location: `engine_compiler.py::_build_config_for_backbone`
|
||||
- Description: The synthesised `BuildConfig` carries exactly one `OptimizationProfile` with `min == opt == max == expected_input_shape`. Backbones requiring dynamic input ranges (variable batch size, variable image resolution) would need a richer `BackboneSpec` carrying explicit `OptimizationProfile` tuples. AZ-321's named corpus (DINOv2-VPR, LightGlue, ALIKED) uses fixed shapes; this is OK today but a real limitation.
|
||||
- Suggestion: When the first dynamic-shape backbone arrives, extend `BackboneSpec` with a `dynamic_profiles: tuple[OptimizationProfile, ...]` field and prefer it over the synthetic single-profile fallback inside `_build_config_for_backbone`.
|
||||
- Task: AZ-321
|
||||
- Resolution: Open (Low) — accepted; future extension.
|
||||
|
||||
## Verdict Logic
|
||||
|
||||
- 0 Critical
|
||||
- 0 High
|
||||
- 0 Medium
|
||||
- 4 Low
|
||||
|
||||
→ **PASS_WITH_WARNINGS**: only Low findings; all accepted as documented.
|
||||
Reference in New Issue
Block a user