chore: record cumulative review batches 31-33 + state

Cumulative review covering AZ-298 / AZ-299 / AZ-321:
PASS_WITH_WARNINGS. 0 Critical, 0 High, 1 Medium, 2 Low.

Medium: `module-layout.md` declares c10 may import from c7
Public API but `test_az270_compose_root.test_ac6` forbids ANY
cross-component import — doc-vs-lint mismatch surfaced by
AZ-321; refactor pivoted to `CompileEngineCallable` local
Protocol cut. Flagged for hygiene PBI; not blocking.

Low: `_iso_ts_now` now duplicated five times across c7+c6;
consumer-side Protocol cut pattern recurring (LightGlue
`EngineHandle` + `CompileEngineCallable`). Both deferred to
the next hygiene cycle.

State advances to phase 3 (compute-next-batch) with
last_cumulative_review=batches_31-33 so the next /autodev
invocation enters at the right point.

Co-authored-by: Cursor <cursoragent@cursor.com>
This commit is contained in:
Oleksandr Bezdieniezhnykh
2026-05-13 00:12:30 +03:00
parent 0dfe7c5301
commit defe80dc75
2 changed files with 176 additions and 2 deletions
@@ -0,0 +1,173 @@
# Cumulative Code Review — Batches 3133 / Cycle 1
**Date**: 2026-05-12
**Mode**: Cumulative (all 7 phases, emphasis on Phases 6 + 7)
**Batches covered**: 31, 32, 33
**Tasks covered**: AZ-298 (C7 TensorrtRuntime), AZ-299 (C7 OnnxTrtEpRuntime fallback), AZ-321 (C10 EngineCompiler)
**Changed files in scope**: 16 (production + tests + docs)
| Domain | Files (changed since cumulative_review_batches_28-30_cycle1_report.md) |
|--------|-----------------------------------------------------------------------|
| c7_inference (production) | `tensorrt_runtime.py` (new, AZ-298), `onnx_trt_ep_runtime.py` (new, AZ-299), `config.py` (extended by both) |
| c10_provisioning (production) | `engine_compiler.py` (new, AZ-321), `config.py` (new, AZ-321), `__init__.py` (extended, AZ-321) |
| Cross-cutting (production) | `fdr_client/records.py` (two new C7 kinds, AZ-299), `runtime_root/c10_factory.py` (new, AZ-321) |
| Tests | `tests/unit/c7_inference/test_tensorrt_runtime.py` (new), `test_onnx_trt_ep_runtime.py` (new), `test_protocol_conformance.py` (parametrize updated twice), `tests/unit/c10_provisioning/test_engine_compiler.py` (new), `tests/unit/test_az272_fdr_record_schema.py` (two new fixture branches) |
| Docs | `_docs/02_document/module-layout.md` (c7_inference + c10_provisioning rows refreshed each batch) |
**Verdict**: **PASS_WITH_WARNINGS**
## Summary
No Critical or High findings. Three findings total — one Medium
(Architecture, the doc-vs-lint contradiction surfaced by AZ-321), two
Low (Maintainability — duplicated `_iso_ts_now` recurrence pattern,
and Maintainability — `CompileEngineCallable` dual-Protocol pattern
recurring). No cross-component Public API bypasses, no new cyclic
module dependencies, and the C7 strategies cleanly converge on the
shared `EngineCacheEntry` + `BuildConfig` DTOs at the `_types`
layer.
Across the three batches the dominant architectural success is the
**`_types`-anchored cross-component contract surface**: C10's
`EngineCompiler` consumes C7's compile-side surface through canonical
DTOs in `_types/inference.py` (`EngineCacheEntry`, `BuildConfig`,
`OptimizationProfile`, `PrecisionMode`) and a c10-local structural
Protocol cut (`CompileEngineCallable`); the c10 code itself never
imports `components.c7_inference` directly, so the AZ-270 cross-
component import lint (`test_az270_compose_root.test_ac6`) stays
green. The composition root remains the only seam where the c7
concrete strategy meets the c10 orchestrator.
Phase 6 (Cross-Task Consistency) verifies:
- `TensorrtRuntime.compile_engine``OnnxTrtEpRuntime.compile_engine`
`PytorchFp16Runtime.compile_engine` share the same return type
(canonical `EngineCacheEntry`) and the same error envelope
(`EngineBuildError` / `CalibrationCacheError`) — so
`EngineCompiler._compile_one`'s broad `except Exception` is a
superset that captures all three runtimes' failure modes uniformly.
- The FDR record kinds added by AZ-299 (`c7.fallback_to_onnx_trt_ep`,
`c7.cpu_fallback`) are wired into `tests/unit/test_az272_fdr_record_schema.py`'s
per-kind fixture builder, so all AZ-272 schema tests
(round-trip / schema-version / unknown-kind) automatically cover
them — no orphaned kinds.
- AZ-298's `C7InferenceConfig` extensions
(`gpu_memory_budget_bytes`, `trtexec_timeout_s`) and AZ-299's
extensions (`ort_trt_cache_dir`, `ort_disallow_cpu_fallback`) layer
cleanly on the same `@dataclass(frozen=True)` with sequential
`__post_init__` validations.
Phase 7 (Architecture Compliance):
- Layer direction respected: c10 → `_types`, `helpers`, `config`,
`logging`, `fdr_client` — all Layer 1.
- Public API respect: c10 does NOT import from `components.c7_inference`.
The `runtime_root/c10_factory.py` is the only place that touches
both — and it lives in `runtime_root`, the composition-root
namespace explicitly exempted from the lint.
- No new module-level cycles.
- Duplicate symbols across components: `_iso_ts_now` recurs in
`c7_inference/tensorrt_runtime.py` and `c7_inference/onnx_trt_ep_runtime.py`
(both AZ-batch local). Combined with the three copies inside
`c6_tile_cache/` flagged in the 28-30 cumulative review (F3),
the codebase-wide pattern is now 5 component-local copies of the
same single-line ISO-timestamp helper.
## Findings
| # | Severity | Category | File:Line | Title |
|---|----------|----------|-----------|-------|
| 1 | Medium | Architecture | `_docs/02_document/module-layout.md:216``tests/unit/test_az270_compose_root.py::test_ac6` | `module-layout.md` declares c10_provisioning "Imports from … `components.c7_inference` (Public API)" but the AZ-270 lint test forbids ANY cross-component import |
| 2 | Low | Maintainability | `c7_inference/{tensorrt_runtime,onnx_trt_ep_runtime}.py` + `c6_tile_cache/{postgres_filesystem_store,freshness_gate,cache_budget_enforcer}.py` | Five identical private `_iso_ts_now()` helpers across two components |
| 3 | Low | Maintainability | `c10_provisioning/engine_compiler.py::CompileEngineCallable` (+ `_types/manifests.py::EngineHandle` Protocol) | "Consumer-side structural Protocol cut" pattern recurring — first the LightGlue `EngineHandle` cut, now C10's `CompileEngineCallable`. Worth a documented L1 home before #3 appears. |
### Finding Details
**F1: `module-layout.md` allows c10 → c7 Public API import that the AZ-270 lint test forbids** (Medium / Architecture)
- Location: `_docs/02_document/module-layout.md:216` (c10_provisioning's "Imports from" line) vs `tests/unit/test_az270_compose_root.py::test_ac6_only_compose_root_imports_concrete_strategies`
- Description: The doc says
*"Imports from: `_types`, `helpers.sha256_sidecar`, `helpers.engine_filename_schema`, `helpers.wgs_converter`, `components.c6_tile_cache` (Public API), `components.c7_inference` (Public API: engine compile surface), `config`, `logging`, `fdr_client`"*. The AZ-270 lint test walks every `.py` under `src/gps_denied_onboard/components/` and FAILS the build on any `from gps_denied_onboard.components.X import …` where X is not the importer's own component — regardless of whether X is a Public API surface or an internal module. The lint is strictly stricter than the doc.
This collision surfaced immediately on AZ-321: `engine_compiler.py` initially imported `from components.c7_inference import InferenceRuntime, EngineBuildError, CalibrationCacheError` per the documented allowance; the lint failed; the implementation pivoted to a c10-local `CompileEngineCallable` Protocol cut + broad-`Exception` catch. The pivot was correct (see F3 below), but the doc-vs-lint contradiction will trip every future c10 / c11 / c12 task that needs C7 Public API symbols.
- Suggestion: Pick ONE of two reconciliations and apply it project-wide:
- **(a) Tighten the doc to match the lint.** Rewrite every "Imports from" line that names `components.X (Public API)` to instead point at `_types` (where the cross-component DTOs already live) and demand that helper-class exceptions (`EngineBuildError`, `CalibrationCacheError`, …) be hoisted to `_types/inference_errors.py` so consumers don't need to widen the catch to `Exception`.
- **(b) Loosen the lint to allow Public-API-only imports across components.** Extend `test_az270_compose_root.test_ac6` to read `module-layout.md`'s per-component Public API file list and skip imports that target only those files. This requires the doc to keep an authoritative `Public API` enumeration per component.
- Option (a) is simpler and matches the established `EngineCacheEntry` move to `_types/inference.py`. Option (b) is more flexible but adds parser fragility to the lint. Defer the choice to the architecture owner.
- Task: AZ-321 (drift exposed by) — but the resolution is cross-cutting and should be opened as a hygiene task (`remediate_c10_c7_import_rule_alignment`) rather than re-opening AZ-321.
**F2: Five identical private `_iso_ts_now()` helpers across c7 + c6** (Low / Maintainability)
- Location:
- `c7_inference/tensorrt_runtime.py`
- `c7_inference/onnx_trt_ep_runtime.py`
- `c6_tile_cache/postgres_filesystem_store.py`
- `c6_tile_cache/freshness_gate.py`
- `c6_tile_cache/cache_budget_enforcer.py`
- Description: Each module defines a private one-liner returning
`datetime.now(timezone.utc).isoformat(...)` (or the equivalent UTC-
with-microseconds string) for FDR record timestamps. The 28-30
cumulative review (F3) flagged the three c6 copies; batches 31-32
added two more in c7. The function is mechanical, deterministic,
and not component-specific. Consolidating into
`helpers/iso_timestamps.py` (or `_types/iso.py`) costs no layering
— every consumer is at Layer 1+ — and eliminates the recurring
drift pattern.
- Suggestion: Open a small hygiene PBI (2 points): create
`src/gps_denied_onboard/helpers/iso_timestamps.py` exposing
`iso_ts_now() -> str`; update the five call sites; delete the
local definitions. Defer to the next "tools / hygiene" autodev
cycle so it doesn't gate AZ-322 / AZ-337 from starting.
- Task: AZ-298 + AZ-299 (re-introduced the pattern); already flagged
for c6 in the 28-30 review. Not blocking.
**F3: Consumer-side structural Protocol cut pattern recurring** (Low / Maintainability)
- Location:
- `c10_provisioning/engine_compiler.py::CompileEngineCallable` (AZ-321, batch 33)
- `_types/manifests.py::EngineHandle` Protocol (LightGlue consumer cut, pre-existing)
- Description: Both define a structural Protocol with a single
method that mirrors a method on a Producer-side Protocol or class
living in another component. The intent is identical: keep the
consumer free of cross-component imports while letting the real
producer satisfy the Protocol via duck typing. Today the pattern
appears twice; if AZ-322 (C10 DescriptorBatcher) or AZ-337 (C2
UltraVPR) need similar narrow cuts of `InferenceRuntime.infer` or
`EngineCacheEntry`-aware helpers, the count grows.
- Suggestion: Add a `## Consumer-Side Protocol Cuts` section to
`_docs/02_document/architecture.md` that codifies the pattern:
consumer-side narrow Protocol cuts live alongside the consumer
(component-local) when used by exactly one consumer; they migrate
to `_types/<concern>.py` only when a SECOND consumer needs the
same cut. This is consistent with the `EngineHandle` cut already
hosted in `_types/manifests.py` (used by both LightGlue and
ALIKED matchers eventually).
- Task: AZ-321 (surfaces the pattern) — architecture-doc update is
scope-creep for this PBI; defer to a separate "architecture
hygiene" PBI.
## Baseline Delta
`_docs/02_document/architecture_compliance_baseline.md` does not
exist (greenfield project). The Baseline Delta section is omitted
per `code-review/SKILL.md` "Baseline delta".
## Verdict Logic
- 0 Critical
- 0 High
- 1 Medium (Architecture — doc-vs-lint contradiction; non-blocking,
flagged for hygiene PBI)
- 2 Low (Maintainability)
**PASS_WITH_WARNINGS**: only Medium / Low findings; the Medium
finding is a documentation alignment, not a code defect, and is
resolved out-of-band via a hygiene PBI rather than blocking batch 34.
## Test Suite
- c7_inference: 139 passing, 17 Tier-2 skips.
- c10_provisioning: 13 passing, 2 Tier-2 skips.
- Combined unit suite excluding pending components and the c6
collection blocker on this dev host (missing `psycopg_pool`):
543 passing, 21 environment-skipped, 1 warning (pre-existing
`pynvml` FutureWarning unrelated to the batches under review).
+3 -2
View File
@@ -6,10 +6,11 @@ step: 7
name: Implement
status: in_progress
sub_step:
phase: 14
name: loop-to-next-batch
phase: 3
name: compute-next-batch
detail: ""
retry_count: 0
cycle: 1
tracker: jira
last_completed_batch: 33
last_cumulative_review: batches_31-33