mirror of https://github.com/azaion/gps-denied-onboard.git synced 2026-06-21 22:11:12 +00:00

Files

T

Oleksandr Bezdieniezhnykh 0dfe7c5301 [AZ-321] C10 EngineCompiler: hardware-tied TRT compile + cache reuse

Land the C10 per-model engine compile + cache-reuse orchestrator.
`EngineCompiler.compile_engines_for_corpus(request)` walks the
corpus, computes the canonical engine filename via AZ-281
`EngineFilenameSchema.build`, and either reuses the cached binary
(cache hit, AZ-280 `Sha256Sidecar.verify` returns True) or delegates
to the AZ-297 `compile_engine` on the injected runtime (cache miss;
the runtime owns the write path). Returns one `EngineCompileResult`
per backbone carrying the canonical `EngineCacheEntry`, outcome
(BUILT / REUSED), and `compile_duration_s` (None on reuse).
Hardware-tied reuse (D-C10-6 / D-C10-7) falls out of the filename
schema — a host change rebuilds at the new path and leaves the old
files untouched (AC-4).

Design corrections vs. the task spec body:
- The spec proposed a c10-local `EngineCacheEntry` carrying outcome
  and duration; that name is already taken by the AZ-297 canonical
  DTO. The wrapper is renamed `EngineCompileResult`; the canonical
  shape wins.
- The spec called `InferenceRuntime.host_info()`, which is not in
  the AZ-297 Protocol. `HostCapabilities` is threaded through
  `EngineCompileRequest` instead so the composition root owns host
  probing and the compiler stays decoupled.
- The c10 layer cannot import `components.c7_inference` (arch rule
  `test_az270_compose_root.test_ac6`). `engine_compiler.py` defines
  `CompileEngineCallable` — a structural Protocol cut of
  `InferenceRuntime` exposing only `compile_engine` — and catches
  broad `Exception` (re-raising preserves the original type;
  `error_class` is recorded in the ERROR log payload).

Production
- engine_compiler.py: `CompileOutcome` enum, `BackboneSpec`,
  `EngineCompileRequest`, `EngineCompileResult`,
  `EngineCompileSummary` DTOs; `CompileEngineCallable` Protocol;
  `EngineCompiler` with the single public method.
- config.py: `BackboneConfig` + `C10ProvisioningConfig`
  (`workspace_mb` default 4 GiB to match C7 NFT-LIM-01); validate
  positive shape dims and duplicate model_name detection in
  `__post_init__`.
- runtime_root/c10_factory.py: `build_engine_compiler(config)` wires
  the existing `build_inference_runtime` factory through;
  `build_backbone_specs(config)` materialises the `BackboneSpec`
  tuple from the config block.
- components/c10_provisioning/__init__.py: re-exports the AZ-321
  surface and registers the new config block.

Tests
- test_engine_compiler.py: covers AC-1..AC-10 + missing-sidecar
  sibling case for AC-5. Tier-1 via fake runtime that writes through
  the REAL `Sha256Sidecar.write_atomic_and_sidecar`. Tier-2
  placeholders for the cache-hit p99 NFR (200 MB engine sweep) and
  kill-during-compile atomic-write NFR.

Docs
- module-layout.md: c10_provisioning Per-Component Mapping lists the
  new internal modules (engine_compiler.py, config.py), the
  composition-root c10_factory.py, the AZ-321 public re-export
  surface, and the registered config block.
- batch_33_cycle1_report.md + reviews/batch_33_review.md:
  PASS_WITH_WARNINGS (4 Low findings accepted).

Tests run: c10_provisioning 13 passing + 2 Tier-2 skips; combined
unit suite (excluding pending components) 543 passing, 21
env-skipped.

Co-authored-by: Cursor <cursoragent@cursor.com>

2026-05-13 00:09:53 +03:00

4.9 KiB

Raw Blame History

Code Review Report — Batch 33 / Cycle 1

Batch: 33 Tasks: AZ-321 (C10 EngineCompiler) Date: 2026-05-12 Verdict: PASS_WITH_WARNINGS

Findings

#	Severity	Category	File:Line	Title
1	Low	Architecture	`engine_compiler.py::_compile_one`	Broad `except Exception` rather than the AZ-297 `RuntimeError` family
2	Low	Maintainability	`engine_compiler.py::CompileEngineCallable`	Duplicates the `compile_engine` signature from the C7 `InferenceRuntime` Protocol
3	Low	Architecture	`engine_compiler.py` ↔ C7InferenceConfig	`EngineCompileRequest.cache_root` MUST mirror `C7InferenceConfig.engine_cache_dir` — invariant enforced by the composition root, not the compiler
4	Low	Scope	`engine_compiler.py::_build_config_for_backbone`	Single static `OptimizationProfile` synthesised per backbone; dynamic shape ranges out of scope

Finding Details

F1: Broad except Exception on compile_engine (Low / Architecture)

Location: src/gps_denied_onboard/components/c10_provisioning/engine_compiler.py::EngineCompiler._compile_one
Description: The C7 contract scopes InferenceRuntime.compile_engine exceptions to the C7-local RuntimeError family (EngineBuildError, CalibrationCacheError, ...). The c10 layer is forbidden from importing components.c7_inference (architecture rule test_az270_compose_root.test_ac6 walks all components/*/*.py files and flags any cross-component import — including TYPE_CHECKING-guarded ones). Catching the broader Exception and dispatching by type(exc).__name__ in the log payload is the cheapest fix that respects the rule. Re-raise preserves the original exception type for the caller.
Suggestion: A longer-term cleanup would either (a) hoist the C7 error envelope to _types/inference.py (parallels the EngineCacheEntry move) or (b) extend the architecture-lint to allow Public-API-only imports from sibling components. Both are bigger scope than AZ-321.
Task: AZ-321
Resolution: Open (Low) — accepted as documented; inline comment in source.

F2: CompileEngineCallable shadow Protocol (Low / Maintainability)

Location: engine_compiler.py::CompileEngineCallable
Description: Defines a structural Protocol carrying only the single compile_engine method shape — the c10 compiler's narrow consumer cut of the AZ-297 InferenceRuntime Protocol. Mirrors the LightGlue dual-Protocol pattern already documented in _types/manifests.py (EngineHandle consumer-cut Protocol vs c7_inference.EngineHandle opaque marker class).
Suggestion: None — same pattern already accepted across the codebase. A future "Public-API cross-component import allowlist" lint update could collapse this dual.
Task: AZ-321
Resolution: Open (Low) — accepted; matches existing pattern.

F3: cache_root / engine_cache_dir invariant (Low / Architecture)

Location: engine_compiler.py::EngineCompileRequest.cache_root vs C7InferenceConfig.engine_cache_dir
Description: The compiler's cache-hit detection writes nothing — it just checks target_path.exists() + Sha256Sidecar.verify. The C7 runtime owns the .engine write path and writes to C7InferenceConfig.engine_cache_dir. If the composition root passes a different cache_root to the compiler than the C7 runtime is configured for, cache hits will never fire (the compiler will look at the wrong directory). The compiler itself trusts the request; the wiring invariant lives in build_engine_compiler and (later) the AZ-325 CacheProvisioner driver.
Suggestion: AZ-325 (C10 Cache Provisioner — the orchestrator T5 that drives the compiler) should pull both paths from the same config field or assert their equality at construction time. The compiler stays scope-bound to one model at a time.
Task: AZ-321 (flag for AZ-325 follow-up)
Resolution: Open (Low) — accepted; flagged for AZ-325.

F4: Single static OptimizationProfile per backbone (Low / Scope)

Location: engine_compiler.py::_build_config_for_backbone
Description: The synthesised BuildConfig carries exactly one OptimizationProfile with min == opt == max == expected_input_shape. Backbones requiring dynamic input ranges (variable batch size, variable image resolution) would need a richer BackboneSpec carrying explicit OptimizationProfile tuples. AZ-321's named corpus (DINOv2-VPR, LightGlue, ALIKED) uses fixed shapes; this is OK today but a real limitation.
Suggestion: When the first dynamic-shape backbone arrives, extend BackboneSpec with a dynamic_profiles: tuple[OptimizationProfile, ...] field and prefer it over the synthetic single-profile fallback inside _build_config_for_backbone.
Task: AZ-321
Resolution: Open (Low) — accepted; future extension.

Verdict Logic

0 Critical
0 High
0 Medium
4 Low

→ PASS_WITH_WARNINGS: only Low findings; all accepted as documented.

4.9 KiB Raw Blame History

Code Review Report — Batch 33 / Cycle 1

Findings

Finding Details

Verdict Logic

4.9 KiB

Raw Blame History