mirror of
https://github.com/azaion/gps-denied-onboard.git
synced 2026-06-21 08:41:12 +00:00
chore: record cumulative review batches 31-33 + state
Cumulative review covering AZ-298 / AZ-299 / AZ-321: PASS_WITH_WARNINGS. 0 Critical, 0 High, 1 Medium, 2 Low. Medium: `module-layout.md` declares c10 may import from c7 Public API but `test_az270_compose_root.test_ac6` forbids ANY cross-component import — doc-vs-lint mismatch surfaced by AZ-321; refactor pivoted to `CompileEngineCallable` local Protocol cut. Flagged for hygiene PBI; not blocking. Low: `_iso_ts_now` now duplicated five times across c7+c6; consumer-side Protocol cut pattern recurring (LightGlue `EngineHandle` + `CompileEngineCallable`). Both deferred to the next hygiene cycle. State advances to phase 3 (compute-next-batch) with last_cumulative_review=batches_31-33 so the next /autodev invocation enters at the right point. Co-authored-by: Cursor <cursoragent@cursor.com>
This commit is contained in:
@@ -0,0 +1,173 @@
|
||||
# Cumulative Code Review — Batches 31–33 / Cycle 1
|
||||
|
||||
**Date**: 2026-05-12
|
||||
**Mode**: Cumulative (all 7 phases, emphasis on Phases 6 + 7)
|
||||
**Batches covered**: 31, 32, 33
|
||||
**Tasks covered**: AZ-298 (C7 TensorrtRuntime), AZ-299 (C7 OnnxTrtEpRuntime fallback), AZ-321 (C10 EngineCompiler)
|
||||
**Changed files in scope**: 16 (production + tests + docs)
|
||||
|
||||
| Domain | Files (changed since cumulative_review_batches_28-30_cycle1_report.md) |
|
||||
|--------|-----------------------------------------------------------------------|
|
||||
| c7_inference (production) | `tensorrt_runtime.py` (new, AZ-298), `onnx_trt_ep_runtime.py` (new, AZ-299), `config.py` (extended by both) |
|
||||
| c10_provisioning (production) | `engine_compiler.py` (new, AZ-321), `config.py` (new, AZ-321), `__init__.py` (extended, AZ-321) |
|
||||
| Cross-cutting (production) | `fdr_client/records.py` (two new C7 kinds, AZ-299), `runtime_root/c10_factory.py` (new, AZ-321) |
|
||||
| Tests | `tests/unit/c7_inference/test_tensorrt_runtime.py` (new), `test_onnx_trt_ep_runtime.py` (new), `test_protocol_conformance.py` (parametrize updated twice), `tests/unit/c10_provisioning/test_engine_compiler.py` (new), `tests/unit/test_az272_fdr_record_schema.py` (two new fixture branches) |
|
||||
| Docs | `_docs/02_document/module-layout.md` (c7_inference + c10_provisioning rows refreshed each batch) |
|
||||
|
||||
**Verdict**: **PASS_WITH_WARNINGS**
|
||||
|
||||
## Summary
|
||||
|
||||
No Critical or High findings. Three findings total — one Medium
|
||||
(Architecture, the doc-vs-lint contradiction surfaced by AZ-321), two
|
||||
Low (Maintainability — duplicated `_iso_ts_now` recurrence pattern,
|
||||
and Maintainability — `CompileEngineCallable` dual-Protocol pattern
|
||||
recurring). No cross-component Public API bypasses, no new cyclic
|
||||
module dependencies, and the C7 strategies cleanly converge on the
|
||||
shared `EngineCacheEntry` + `BuildConfig` DTOs at the `_types`
|
||||
layer.
|
||||
|
||||
Across the three batches the dominant architectural success is the
|
||||
**`_types`-anchored cross-component contract surface**: C10's
|
||||
`EngineCompiler` consumes C7's compile-side surface through canonical
|
||||
DTOs in `_types/inference.py` (`EngineCacheEntry`, `BuildConfig`,
|
||||
`OptimizationProfile`, `PrecisionMode`) and a c10-local structural
|
||||
Protocol cut (`CompileEngineCallable`); the c10 code itself never
|
||||
imports `components.c7_inference` directly, so the AZ-270 cross-
|
||||
component import lint (`test_az270_compose_root.test_ac6`) stays
|
||||
green. The composition root remains the only seam where the c7
|
||||
concrete strategy meets the c10 orchestrator.
|
||||
|
||||
Phase 6 (Cross-Task Consistency) verifies:
|
||||
|
||||
- `TensorrtRuntime.compile_engine` → `OnnxTrtEpRuntime.compile_engine`
|
||||
→ `PytorchFp16Runtime.compile_engine` share the same return type
|
||||
(canonical `EngineCacheEntry`) and the same error envelope
|
||||
(`EngineBuildError` / `CalibrationCacheError`) — so
|
||||
`EngineCompiler._compile_one`'s broad `except Exception` is a
|
||||
superset that captures all three runtimes' failure modes uniformly.
|
||||
- The FDR record kinds added by AZ-299 (`c7.fallback_to_onnx_trt_ep`,
|
||||
`c7.cpu_fallback`) are wired into `tests/unit/test_az272_fdr_record_schema.py`'s
|
||||
per-kind fixture builder, so all AZ-272 schema tests
|
||||
(round-trip / schema-version / unknown-kind) automatically cover
|
||||
them — no orphaned kinds.
|
||||
- AZ-298's `C7InferenceConfig` extensions
|
||||
(`gpu_memory_budget_bytes`, `trtexec_timeout_s`) and AZ-299's
|
||||
extensions (`ort_trt_cache_dir`, `ort_disallow_cpu_fallback`) layer
|
||||
cleanly on the same `@dataclass(frozen=True)` with sequential
|
||||
`__post_init__` validations.
|
||||
|
||||
Phase 7 (Architecture Compliance):
|
||||
|
||||
- Layer direction respected: c10 → `_types`, `helpers`, `config`,
|
||||
`logging`, `fdr_client` — all Layer 1.
|
||||
- Public API respect: c10 does NOT import from `components.c7_inference`.
|
||||
The `runtime_root/c10_factory.py` is the only place that touches
|
||||
both — and it lives in `runtime_root`, the composition-root
|
||||
namespace explicitly exempted from the lint.
|
||||
- No new module-level cycles.
|
||||
- Duplicate symbols across components: `_iso_ts_now` recurs in
|
||||
`c7_inference/tensorrt_runtime.py` and `c7_inference/onnx_trt_ep_runtime.py`
|
||||
(both AZ-batch local). Combined with the three copies inside
|
||||
`c6_tile_cache/` flagged in the 28-30 cumulative review (F3),
|
||||
the codebase-wide pattern is now 5 component-local copies of the
|
||||
same single-line ISO-timestamp helper.
|
||||
|
||||
## Findings
|
||||
|
||||
| # | Severity | Category | File:Line | Title |
|
||||
|---|----------|----------|-----------|-------|
|
||||
| 1 | Medium | Architecture | `_docs/02_document/module-layout.md:216` ↔ `tests/unit/test_az270_compose_root.py::test_ac6` | `module-layout.md` declares c10_provisioning "Imports from … `components.c7_inference` (Public API)" but the AZ-270 lint test forbids ANY cross-component import |
|
||||
| 2 | Low | Maintainability | `c7_inference/{tensorrt_runtime,onnx_trt_ep_runtime}.py` + `c6_tile_cache/{postgres_filesystem_store,freshness_gate,cache_budget_enforcer}.py` | Five identical private `_iso_ts_now()` helpers across two components |
|
||||
| 3 | Low | Maintainability | `c10_provisioning/engine_compiler.py::CompileEngineCallable` (+ `_types/manifests.py::EngineHandle` Protocol) | "Consumer-side structural Protocol cut" pattern recurring — first the LightGlue `EngineHandle` cut, now C10's `CompileEngineCallable`. Worth a documented L1 home before #3 appears. |
|
||||
|
||||
### Finding Details
|
||||
|
||||
**F1: `module-layout.md` allows c10 → c7 Public API import that the AZ-270 lint test forbids** (Medium / Architecture)
|
||||
|
||||
- Location: `_docs/02_document/module-layout.md:216` (c10_provisioning's "Imports from" line) vs `tests/unit/test_az270_compose_root.py::test_ac6_only_compose_root_imports_concrete_strategies`
|
||||
- Description: The doc says
|
||||
*"Imports from: `_types`, `helpers.sha256_sidecar`, `helpers.engine_filename_schema`, `helpers.wgs_converter`, `components.c6_tile_cache` (Public API), `components.c7_inference` (Public API: engine compile surface), `config`, `logging`, `fdr_client`"*. The AZ-270 lint test walks every `.py` under `src/gps_denied_onboard/components/` and FAILS the build on any `from gps_denied_onboard.components.X import …` where X is not the importer's own component — regardless of whether X is a Public API surface or an internal module. The lint is strictly stricter than the doc.
|
||||
This collision surfaced immediately on AZ-321: `engine_compiler.py` initially imported `from components.c7_inference import InferenceRuntime, EngineBuildError, CalibrationCacheError` per the documented allowance; the lint failed; the implementation pivoted to a c10-local `CompileEngineCallable` Protocol cut + broad-`Exception` catch. The pivot was correct (see F3 below), but the doc-vs-lint contradiction will trip every future c10 / c11 / c12 task that needs C7 Public API symbols.
|
||||
- Suggestion: Pick ONE of two reconciliations and apply it project-wide:
|
||||
- **(a) Tighten the doc to match the lint.** Rewrite every "Imports from" line that names `components.X (Public API)` to instead point at `_types` (where the cross-component DTOs already live) and demand that helper-class exceptions (`EngineBuildError`, `CalibrationCacheError`, …) be hoisted to `_types/inference_errors.py` so consumers don't need to widen the catch to `Exception`.
|
||||
- **(b) Loosen the lint to allow Public-API-only imports across components.** Extend `test_az270_compose_root.test_ac6` to read `module-layout.md`'s per-component Public API file list and skip imports that target only those files. This requires the doc to keep an authoritative `Public API` enumeration per component.
|
||||
- Option (a) is simpler and matches the established `EngineCacheEntry` move to `_types/inference.py`. Option (b) is more flexible but adds parser fragility to the lint. Defer the choice to the architecture owner.
|
||||
- Task: AZ-321 (drift exposed by) — but the resolution is cross-cutting and should be opened as a hygiene task (`remediate_c10_c7_import_rule_alignment`) rather than re-opening AZ-321.
|
||||
|
||||
**F2: Five identical private `_iso_ts_now()` helpers across c7 + c6** (Low / Maintainability)
|
||||
|
||||
- Location:
|
||||
- `c7_inference/tensorrt_runtime.py`
|
||||
- `c7_inference/onnx_trt_ep_runtime.py`
|
||||
- `c6_tile_cache/postgres_filesystem_store.py`
|
||||
- `c6_tile_cache/freshness_gate.py`
|
||||
- `c6_tile_cache/cache_budget_enforcer.py`
|
||||
- Description: Each module defines a private one-liner returning
|
||||
`datetime.now(timezone.utc).isoformat(...)` (or the equivalent UTC-
|
||||
with-microseconds string) for FDR record timestamps. The 28-30
|
||||
cumulative review (F3) flagged the three c6 copies; batches 31-32
|
||||
added two more in c7. The function is mechanical, deterministic,
|
||||
and not component-specific. Consolidating into
|
||||
`helpers/iso_timestamps.py` (or `_types/iso.py`) costs no layering
|
||||
— every consumer is at Layer 1+ — and eliminates the recurring
|
||||
drift pattern.
|
||||
- Suggestion: Open a small hygiene PBI (2 points): create
|
||||
`src/gps_denied_onboard/helpers/iso_timestamps.py` exposing
|
||||
`iso_ts_now() -> str`; update the five call sites; delete the
|
||||
local definitions. Defer to the next "tools / hygiene" autodev
|
||||
cycle so it doesn't gate AZ-322 / AZ-337 from starting.
|
||||
- Task: AZ-298 + AZ-299 (re-introduced the pattern); already flagged
|
||||
for c6 in the 28-30 review. Not blocking.
|
||||
|
||||
**F3: Consumer-side structural Protocol cut pattern recurring** (Low / Maintainability)
|
||||
|
||||
- Location:
|
||||
- `c10_provisioning/engine_compiler.py::CompileEngineCallable` (AZ-321, batch 33)
|
||||
- `_types/manifests.py::EngineHandle` Protocol (LightGlue consumer cut, pre-existing)
|
||||
- Description: Both define a structural Protocol with a single
|
||||
method that mirrors a method on a Producer-side Protocol or class
|
||||
living in another component. The intent is identical: keep the
|
||||
consumer free of cross-component imports while letting the real
|
||||
producer satisfy the Protocol via duck typing. Today the pattern
|
||||
appears twice; if AZ-322 (C10 DescriptorBatcher) or AZ-337 (C2
|
||||
UltraVPR) need similar narrow cuts of `InferenceRuntime.infer` or
|
||||
`EngineCacheEntry`-aware helpers, the count grows.
|
||||
- Suggestion: Add a `## Consumer-Side Protocol Cuts` section to
|
||||
`_docs/02_document/architecture.md` that codifies the pattern:
|
||||
consumer-side narrow Protocol cuts live alongside the consumer
|
||||
(component-local) when used by exactly one consumer; they migrate
|
||||
to `_types/<concern>.py` only when a SECOND consumer needs the
|
||||
same cut. This is consistent with the `EngineHandle` cut already
|
||||
hosted in `_types/manifests.py` (used by both LightGlue and
|
||||
ALIKED matchers eventually).
|
||||
- Task: AZ-321 (surfaces the pattern) — architecture-doc update is
|
||||
scope-creep for this PBI; defer to a separate "architecture
|
||||
hygiene" PBI.
|
||||
|
||||
## Baseline Delta
|
||||
|
||||
`_docs/02_document/architecture_compliance_baseline.md` does not
|
||||
exist (greenfield project). The Baseline Delta section is omitted
|
||||
per `code-review/SKILL.md` "Baseline delta".
|
||||
|
||||
## Verdict Logic
|
||||
|
||||
- 0 Critical
|
||||
- 0 High
|
||||
- 1 Medium (Architecture — doc-vs-lint contradiction; non-blocking,
|
||||
flagged for hygiene PBI)
|
||||
- 2 Low (Maintainability)
|
||||
|
||||
→ **PASS_WITH_WARNINGS**: only Medium / Low findings; the Medium
|
||||
finding is a documentation alignment, not a code defect, and is
|
||||
resolved out-of-band via a hygiene PBI rather than blocking batch 34.
|
||||
|
||||
## Test Suite
|
||||
|
||||
- c7_inference: 139 passing, 17 Tier-2 skips.
|
||||
- c10_provisioning: 13 passing, 2 Tier-2 skips.
|
||||
- Combined unit suite excluding pending components and the c6
|
||||
collection blocker on this dev host (missing `psycopg_pool`):
|
||||
543 passing, 21 environment-skipped, 1 warning (pre-existing
|
||||
`pynvml` FutureWarning unrelated to the batches under review).
|
||||
@@ -6,10 +6,11 @@ step: 7
|
||||
name: Implement
|
||||
status: in_progress
|
||||
sub_step:
|
||||
phase: 14
|
||||
name: loop-to-next-batch
|
||||
phase: 3
|
||||
name: compute-next-batch
|
||||
detail: ""
|
||||
retry_count: 0
|
||||
cycle: 1
|
||||
tracker: jira
|
||||
last_completed_batch: 33
|
||||
last_cumulative_review: batches_31-33
|
||||
|
||||
Reference in New Issue
Block a user