diff --git a/_docs/03_implementation/cumulative_review_batches_31-33_cycle1_report.md b/_docs/03_implementation/cumulative_review_batches_31-33_cycle1_report.md new file mode 100644 index 0000000..8b81463 --- /dev/null +++ b/_docs/03_implementation/cumulative_review_batches_31-33_cycle1_report.md @@ -0,0 +1,173 @@ +# Cumulative Code Review — Batches 31–33 / Cycle 1 + +**Date**: 2026-05-12 +**Mode**: Cumulative (all 7 phases, emphasis on Phases 6 + 7) +**Batches covered**: 31, 32, 33 +**Tasks covered**: AZ-298 (C7 TensorrtRuntime), AZ-299 (C7 OnnxTrtEpRuntime fallback), AZ-321 (C10 EngineCompiler) +**Changed files in scope**: 16 (production + tests + docs) + +| Domain | Files (changed since cumulative_review_batches_28-30_cycle1_report.md) | +|--------|-----------------------------------------------------------------------| +| c7_inference (production) | `tensorrt_runtime.py` (new, AZ-298), `onnx_trt_ep_runtime.py` (new, AZ-299), `config.py` (extended by both) | +| c10_provisioning (production) | `engine_compiler.py` (new, AZ-321), `config.py` (new, AZ-321), `__init__.py` (extended, AZ-321) | +| Cross-cutting (production) | `fdr_client/records.py` (two new C7 kinds, AZ-299), `runtime_root/c10_factory.py` (new, AZ-321) | +| Tests | `tests/unit/c7_inference/test_tensorrt_runtime.py` (new), `test_onnx_trt_ep_runtime.py` (new), `test_protocol_conformance.py` (parametrize updated twice), `tests/unit/c10_provisioning/test_engine_compiler.py` (new), `tests/unit/test_az272_fdr_record_schema.py` (two new fixture branches) | +| Docs | `_docs/02_document/module-layout.md` (c7_inference + c10_provisioning rows refreshed each batch) | + +**Verdict**: **PASS_WITH_WARNINGS** + +## Summary + +No Critical or High findings. Three findings total — one Medium +(Architecture, the doc-vs-lint contradiction surfaced by AZ-321), two +Low (Maintainability — duplicated `_iso_ts_now` recurrence pattern, +and Maintainability — `CompileEngineCallable` dual-Protocol pattern +recurring). No cross-component Public API bypasses, no new cyclic +module dependencies, and the C7 strategies cleanly converge on the +shared `EngineCacheEntry` + `BuildConfig` DTOs at the `_types` +layer. + +Across the three batches the dominant architectural success is the +**`_types`-anchored cross-component contract surface**: C10's +`EngineCompiler` consumes C7's compile-side surface through canonical +DTOs in `_types/inference.py` (`EngineCacheEntry`, `BuildConfig`, +`OptimizationProfile`, `PrecisionMode`) and a c10-local structural +Protocol cut (`CompileEngineCallable`); the c10 code itself never +imports `components.c7_inference` directly, so the AZ-270 cross- +component import lint (`test_az270_compose_root.test_ac6`) stays +green. The composition root remains the only seam where the c7 +concrete strategy meets the c10 orchestrator. + +Phase 6 (Cross-Task Consistency) verifies: + +- `TensorrtRuntime.compile_engine` → `OnnxTrtEpRuntime.compile_engine` + → `PytorchFp16Runtime.compile_engine` share the same return type + (canonical `EngineCacheEntry`) and the same error envelope + (`EngineBuildError` / `CalibrationCacheError`) — so + `EngineCompiler._compile_one`'s broad `except Exception` is a + superset that captures all three runtimes' failure modes uniformly. +- The FDR record kinds added by AZ-299 (`c7.fallback_to_onnx_trt_ep`, + `c7.cpu_fallback`) are wired into `tests/unit/test_az272_fdr_record_schema.py`'s + per-kind fixture builder, so all AZ-272 schema tests + (round-trip / schema-version / unknown-kind) automatically cover + them — no orphaned kinds. +- AZ-298's `C7InferenceConfig` extensions + (`gpu_memory_budget_bytes`, `trtexec_timeout_s`) and AZ-299's + extensions (`ort_trt_cache_dir`, `ort_disallow_cpu_fallback`) layer + cleanly on the same `@dataclass(frozen=True)` with sequential + `__post_init__` validations. + +Phase 7 (Architecture Compliance): + +- Layer direction respected: c10 → `_types`, `helpers`, `config`, + `logging`, `fdr_client` — all Layer 1. +- Public API respect: c10 does NOT import from `components.c7_inference`. + The `runtime_root/c10_factory.py` is the only place that touches + both — and it lives in `runtime_root`, the composition-root + namespace explicitly exempted from the lint. +- No new module-level cycles. +- Duplicate symbols across components: `_iso_ts_now` recurs in + `c7_inference/tensorrt_runtime.py` and `c7_inference/onnx_trt_ep_runtime.py` + (both AZ-batch local). Combined with the three copies inside + `c6_tile_cache/` flagged in the 28-30 cumulative review (F3), + the codebase-wide pattern is now 5 component-local copies of the + same single-line ISO-timestamp helper. + +## Findings + +| # | Severity | Category | File:Line | Title | +|---|----------|----------|-----------|-------| +| 1 | Medium | Architecture | `_docs/02_document/module-layout.md:216` ↔ `tests/unit/test_az270_compose_root.py::test_ac6` | `module-layout.md` declares c10_provisioning "Imports from … `components.c7_inference` (Public API)" but the AZ-270 lint test forbids ANY cross-component import | +| 2 | Low | Maintainability | `c7_inference/{tensorrt_runtime,onnx_trt_ep_runtime}.py` + `c6_tile_cache/{postgres_filesystem_store,freshness_gate,cache_budget_enforcer}.py` | Five identical private `_iso_ts_now()` helpers across two components | +| 3 | Low | Maintainability | `c10_provisioning/engine_compiler.py::CompileEngineCallable` (+ `_types/manifests.py::EngineHandle` Protocol) | "Consumer-side structural Protocol cut" pattern recurring — first the LightGlue `EngineHandle` cut, now C10's `CompileEngineCallable`. Worth a documented L1 home before #3 appears. | + +### Finding Details + +**F1: `module-layout.md` allows c10 → c7 Public API import that the AZ-270 lint test forbids** (Medium / Architecture) + +- Location: `_docs/02_document/module-layout.md:216` (c10_provisioning's "Imports from" line) vs `tests/unit/test_az270_compose_root.py::test_ac6_only_compose_root_imports_concrete_strategies` +- Description: The doc says + *"Imports from: `_types`, `helpers.sha256_sidecar`, `helpers.engine_filename_schema`, `helpers.wgs_converter`, `components.c6_tile_cache` (Public API), `components.c7_inference` (Public API: engine compile surface), `config`, `logging`, `fdr_client`"*. The AZ-270 lint test walks every `.py` under `src/gps_denied_onboard/components/` and FAILS the build on any `from gps_denied_onboard.components.X import …` where X is not the importer's own component — regardless of whether X is a Public API surface or an internal module. The lint is strictly stricter than the doc. + This collision surfaced immediately on AZ-321: `engine_compiler.py` initially imported `from components.c7_inference import InferenceRuntime, EngineBuildError, CalibrationCacheError` per the documented allowance; the lint failed; the implementation pivoted to a c10-local `CompileEngineCallable` Protocol cut + broad-`Exception` catch. The pivot was correct (see F3 below), but the doc-vs-lint contradiction will trip every future c10 / c11 / c12 task that needs C7 Public API symbols. +- Suggestion: Pick ONE of two reconciliations and apply it project-wide: + - **(a) Tighten the doc to match the lint.** Rewrite every "Imports from" line that names `components.X (Public API)` to instead point at `_types` (where the cross-component DTOs already live) and demand that helper-class exceptions (`EngineBuildError`, `CalibrationCacheError`, …) be hoisted to `_types/inference_errors.py` so consumers don't need to widen the catch to `Exception`. + - **(b) Loosen the lint to allow Public-API-only imports across components.** Extend `test_az270_compose_root.test_ac6` to read `module-layout.md`'s per-component Public API file list and skip imports that target only those files. This requires the doc to keep an authoritative `Public API` enumeration per component. + - Option (a) is simpler and matches the established `EngineCacheEntry` move to `_types/inference.py`. Option (b) is more flexible but adds parser fragility to the lint. Defer the choice to the architecture owner. +- Task: AZ-321 (drift exposed by) — but the resolution is cross-cutting and should be opened as a hygiene task (`remediate_c10_c7_import_rule_alignment`) rather than re-opening AZ-321. + +**F2: Five identical private `_iso_ts_now()` helpers across c7 + c6** (Low / Maintainability) + +- Location: + - `c7_inference/tensorrt_runtime.py` + - `c7_inference/onnx_trt_ep_runtime.py` + - `c6_tile_cache/postgres_filesystem_store.py` + - `c6_tile_cache/freshness_gate.py` + - `c6_tile_cache/cache_budget_enforcer.py` +- Description: Each module defines a private one-liner returning + `datetime.now(timezone.utc).isoformat(...)` (or the equivalent UTC- + with-microseconds string) for FDR record timestamps. The 28-30 + cumulative review (F3) flagged the three c6 copies; batches 31-32 + added two more in c7. The function is mechanical, deterministic, + and not component-specific. Consolidating into + `helpers/iso_timestamps.py` (or `_types/iso.py`) costs no layering + — every consumer is at Layer 1+ — and eliminates the recurring + drift pattern. +- Suggestion: Open a small hygiene PBI (2 points): create + `src/gps_denied_onboard/helpers/iso_timestamps.py` exposing + `iso_ts_now() -> str`; update the five call sites; delete the + local definitions. Defer to the next "tools / hygiene" autodev + cycle so it doesn't gate AZ-322 / AZ-337 from starting. +- Task: AZ-298 + AZ-299 (re-introduced the pattern); already flagged + for c6 in the 28-30 review. Not blocking. + +**F3: Consumer-side structural Protocol cut pattern recurring** (Low / Maintainability) + +- Location: + - `c10_provisioning/engine_compiler.py::CompileEngineCallable` (AZ-321, batch 33) + - `_types/manifests.py::EngineHandle` Protocol (LightGlue consumer cut, pre-existing) +- Description: Both define a structural Protocol with a single + method that mirrors a method on a Producer-side Protocol or class + living in another component. The intent is identical: keep the + consumer free of cross-component imports while letting the real + producer satisfy the Protocol via duck typing. Today the pattern + appears twice; if AZ-322 (C10 DescriptorBatcher) or AZ-337 (C2 + UltraVPR) need similar narrow cuts of `InferenceRuntime.infer` or + `EngineCacheEntry`-aware helpers, the count grows. +- Suggestion: Add a `## Consumer-Side Protocol Cuts` section to + `_docs/02_document/architecture.md` that codifies the pattern: + consumer-side narrow Protocol cuts live alongside the consumer + (component-local) when used by exactly one consumer; they migrate + to `_types/.py` only when a SECOND consumer needs the + same cut. This is consistent with the `EngineHandle` cut already + hosted in `_types/manifests.py` (used by both LightGlue and + ALIKED matchers eventually). +- Task: AZ-321 (surfaces the pattern) — architecture-doc update is + scope-creep for this PBI; defer to a separate "architecture + hygiene" PBI. + +## Baseline Delta + +`_docs/02_document/architecture_compliance_baseline.md` does not +exist (greenfield project). The Baseline Delta section is omitted +per `code-review/SKILL.md` "Baseline delta". + +## Verdict Logic + +- 0 Critical +- 0 High +- 1 Medium (Architecture — doc-vs-lint contradiction; non-blocking, + flagged for hygiene PBI) +- 2 Low (Maintainability) + +→ **PASS_WITH_WARNINGS**: only Medium / Low findings; the Medium +finding is a documentation alignment, not a code defect, and is +resolved out-of-band via a hygiene PBI rather than blocking batch 34. + +## Test Suite + +- c7_inference: 139 passing, 17 Tier-2 skips. +- c10_provisioning: 13 passing, 2 Tier-2 skips. +- Combined unit suite excluding pending components and the c6 + collection blocker on this dev host (missing `psycopg_pool`): + 543 passing, 21 environment-skipped, 1 warning (pre-existing + `pynvml` FutureWarning unrelated to the batches under review). diff --git a/_docs/_autodev_state.md b/_docs/_autodev_state.md index 8cd47ce..c7aee86 100644 --- a/_docs/_autodev_state.md +++ b/_docs/_autodev_state.md @@ -6,10 +6,11 @@ step: 7 name: Implement status: in_progress sub_step: - phase: 14 - name: loop-to-next-batch + phase: 3 + name: compute-next-batch detail: "" retry_count: 0 cycle: 1 tracker: jira last_completed_batch: 33 +last_cumulative_review: batches_31-33