From 1141d177690968235cdcdd1ed996d17c63252d7d Mon Sep 17 00:00:00 2001 From: Oleksandr Bezdieniezhnykh Date: Tue, 12 May 2026 17:12:30 +0300 Subject: [PATCH] [AZ-300] [AZ-301] [AZ-302] [AZ-304] docs: sync module-layout for c6+c7 Cumulative review of batches 23-27 (cycle 1) surfaced three Medium documentation-drift findings on module-layout.md. All three fixed inline per user direction: F1: c7_inference Internal list expanded with architecture_registry, config, engine_gate, errors, manifest, thermal_publisher (added across AZ-300/301/302). F2: c6_tile_cache `connection.py` re-attributed from AZ-304 (which deferred it) to AZ-305 with a "planned, not landed yet" tag. F3: c7_inference Public API description rewritten by category (Protocol + DTOs + component services + config + error family) with a pointer to __init__.py's __all__ for the canonical list. Cumulative review report: _docs/03_implementation/cumulative_review_ batches_23-27_cycle1_report.md (PASS_WITH_WARNINGS). Autodev state moved to status: paused_user_requested per user choice; /autodev will resume at greenfield Step 7 (next batch selection) on next invocation. Co-authored-by: Cursor --- _docs/02_document/module-layout.md | 21 +- ...tive_review_batches_23-27_cycle1_report.md | 226 ++++++++++++++++++ _docs/_autodev_state.md | 8 +- 3 files changed, 245 insertions(+), 10 deletions(-) create mode 100644 _docs/03_implementation/cumulative_review_batches_23-27_cycle1_report.md diff --git a/_docs/02_document/module-layout.md b/_docs/02_document/module-layout.md index 29418da..7a2c874 100644 --- a/_docs/02_document/module-layout.md +++ b/_docs/02_document/module-layout.md @@ -145,7 +145,7 @@ Bootstrap reference: `_docs/02_tasks/todo/AZ-263_initial_structure.md`. Architec - `_native/` (`cpp/faiss_index/` wrapper) - `migrations.py` (`apply_migrations(config) -> MigrationResult` runner invoked by the composition root at startup; AZ-304 + later) - `_uuid_namespace.py` (pinned `TILE_NAMESPACE_UUID` + `derive_tile_id` / `derive_location_hash` helpers; cross-repo coordinated with `satellite-provider`; AZ-304) - - `connection.py` (`psycopg_pool` ConnectionPool helper; AZ-304) + - `connection.py` (planned — `psycopg_pool` ConnectionPool helper; AZ-305, not landed yet) - **Owns**: `src/gps_denied_onboard/components/c6_tile_cache/**`, `cpp/faiss_index/**`, `tests/unit/c6_tile_cache/**`, `db/migrations/**` (project-level Alembic env owned by c6 — `alembic.ini` at repo root points here; `0001_initial.py` shipped by AZ-263 bootstrap, `0002_c6_tile_identity_and_lru.py` and forward owned by AZ-304+ migrations) - **Imports from**: `_types`, `helpers.sha256_sidecar`, `helpers.wgs_converter`, `config`, `logging`, `fdr_client` - **Consumed by**: `c2_vpr`, `c2_5_rerank`, `c3_matcher`, `c10_provisioning`, `c11_tile_manager`, `runtime_root` @@ -154,13 +154,22 @@ Bootstrap reference: `_docs/02_tasks/todo/AZ-263_initial_structure.md`. Architec - **Epic**: AZ-249 (E-C7 Inference Runtime) - **Directory**: `src/gps_denied_onboard/components/c7_inference/` -- **Public API**: - - `__init__.py` (re-exports `InferenceRuntime`, `EngineCacheEntry`) +- **Public API** (`__init__.py` `__all__`; consult the module for the canonical list): - `interface.py` (`InferenceRuntime` Protocol) + - DTOs re-exported from `_types.inference` + `_types.thermal`: `BuildConfig`, `EngineCacheEntry`, `EngineHandle`, `OptimizationProfile`, `PrecisionMode`, `ThermalState` + - Component services: `EngineGate` + `HostTuple` (AZ-301), `ThermalStatePublisher` + `ThermalReading` + `ThermalSource` Protocol (AZ-302), `ManifestReader` + `ManifestReaderProtocol` + `DeploymentManifest` (AZ-301), `ArchitectureFactory` + `default_registry` + `register_architecture` (AZ-300) + - Config block: `C7InferenceConfig` (registered on import) + - Error family rooted at `RuntimeError` with documented subtypes (`InferenceError`, `EngineBuildError`, `EngineDeserializeError`, `EngineHashMismatchError`, `EngineSchemaMismatchError`, `EngineSidecarMissingError`, `CalibrationCacheError`, `OutOfMemoryError`, `TelemetryUnavailableError`) - **Internal**: - - `tensorrt_runtime.py` (production-default; TensorRT 10.3) - - `onnx_trt_runtime.py` (ONNX Runtime + TensorRT EP) - - `pytorch_fp16_runtime.py` (research-only baseline) + - `architecture_registry.py` (AZ-300; family of registered `ArchitectureFactory` callables consumed by `PytorchFp16Runtime`) + - `config.py` (`C7InferenceConfig` dataclass; registered on import) + - `engine_gate.py` (AZ-301; D-C10-3 + D-C10-7 takeoff validator) + - `errors.py` (component error family) + - `manifest.py` (AZ-301; `DeploymentManifest` + `ManifestReader` for engine sidecar manifests) + - `onnx_trt_runtime.py` (ONNX Runtime + TensorRT EP, pending) + - `pytorch_fp16_runtime.py` (AZ-300; research-only / simple-baseline strategy) + - `tensorrt_runtime.py` (production-default; TensorRT 10.3, pending) + - `thermal_publisher.py` (AZ-302; 1 Hz background poller, jtop/NVML fallback) - **Owns**: `src/gps_denied_onboard/components/c7_inference/**`, `tests/unit/c7_inference/**` - **Imports from**: `_types`, `helpers.engine_filename_schema`, `helpers.sha256_sidecar`, `config`, `logging`, `fdr_client` - **Consumed by**: `c2_vpr`, `c2_5_rerank`, `c3_matcher`, `c10_provisioning`, `runtime_root` diff --git a/_docs/03_implementation/cumulative_review_batches_23-27_cycle1_report.md b/_docs/03_implementation/cumulative_review_batches_23-27_cycle1_report.md new file mode 100644 index 0000000..fad08e8 --- /dev/null +++ b/_docs/03_implementation/cumulative_review_batches_23-27_cycle1_report.md @@ -0,0 +1,226 @@ +# Cumulative Code Review — Batches 23–27 / Cycle 1 + +**Date**: 2026-05-12 +**Mode**: Cumulative (all 7 phases, emphasis on Phases 6 + 7) +**Batches covered**: 23, 24, 25, 26, 27 +**Tasks covered**: AZ-332 (c1_vio OKVIS2), AZ-300 (c7 PyTorch FP16 baseline), AZ-301 (c7 EngineGate), AZ-302 (c7 ThermalStatePublisher), AZ-304 (c6 Postgres schema 0002) +**Changed files in scope**: 54 +**Verdict**: **PASS_WITH_WARNINGS** + +## Summary + +No Critical or High findings. Three Medium findings, all documentation +drift on `_docs/02_document/module-layout.md`: the c7_inference +component grew six new internal files across batches 24–26 that are +not listed in the layout doc, the c6_tile_cache entry still claims +`connection.py` / `psycopg_pool` for AZ-304 even though AZ-304 +explicitly deferred them to AZ-305, and the c7_inference Public API +description is stale (claims `__init__.py` re-exports only +`InferenceRuntime` + `EngineCacheEntry` but the actual `__all__` lists +26 symbols). + +No cross-component Public API bypasses, no new module-import cycles, +no duplicate symbols across components, no layer-direction +violations. All cross-component imports of the new code go through +the composition root (`runtime_root/storage_factory.py`) or stay +intra-component as required by ADR-009. + +## Findings + +| # | Severity | Category | File:Line | Title | +|---|----------|----------|-----------|-------| +| 1 | Medium | Architecture | `_docs/02_document/module-layout.md:153` | c7_inference Internal list missing 6 new files (engine_gate, manifest, thermal_publisher, architecture_registry, config, errors) | +| 2 | Medium | Architecture | `_docs/02_document/module-layout.md:136` | c6_tile_cache Internal list references `connection.py` (AZ-304) but AZ-304 deferred it to AZ-305 | +| 3 | Medium | Maintainability | `_docs/02_document/module-layout.md:158` | c7_inference Public API description stale (claims only `InferenceRuntime`/`EngineCacheEntry` re-exported; actual `__all__` has 26 symbols) | + +### Finding Details + +**F1: c7_inference Internal list missing 6 new files** (Medium / Architecture) + +- Location: `_docs/02_document/module-layout.md:153` +- Description: Batches 24–26 added the following modules under + `src/gps_denied_onboard/components/c7_inference/`: + `engine_gate.py` (AZ-301), `manifest.py` (AZ-301), + `thermal_publisher.py` (AZ-302), `architecture_registry.py` + (AZ-300), `config.py` (older), `errors.py` (older), and + `pytorch_fp16_runtime.py` (AZ-300, already listed). None of the + newly-added internals are documented as Internal in the layout doc. + This is documentation drift, not a behavioural issue: every + cross-component user only reaches into the component via + `__init__.py`'s `__all__` (verified by grep across `src/` and + `tests/`). +- Suggestion: append to the c7_inference Internal bullets: + `architecture_registry.py`, `config.py`, `engine_gate.py`, + `errors.py`, `manifest.py`, `thermal_publisher.py`. Attribute each + to its AZ ticket for traceability. +- Task: AZ-300 / AZ-301 / AZ-302 (incremental drift) + +**F2: c6_tile_cache Internal list references deferred `connection.py`** (Medium / Architecture) + +- Location: `_docs/02_document/module-layout.md:136` +- Description: The c6_tile_cache Internal list (just updated in + batch 27) still includes the line: + `` `connection.py` (`psycopg_pool` ConnectionPool helper; AZ-304) `` + AZ-304 explicitly deferred `connection.py` and `psycopg_pool` to + AZ-305 (`c6_postgres_filesystem_store`) after the dependency-pin + reality check; the file does NOT exist on disk and AZ-305 has not + shipped yet. The line is forward-leaning fiction. +- Suggestion: either remove the line, or re-attribute it to AZ-305 + with a "(planned)" suffix so readers know it has not landed yet. + Preferred: re-attribute to AZ-305 so the layout doc stays a + forward-looking plan. +- Task: AZ-304 (introduced the entry under the wrong attribution) + +**F3: c7_inference Public API description stale** (Medium / Maintainability) + +- Location: `_docs/02_document/module-layout.md:158` +- Description: The Public API bullets say: + `` `__init__.py` (re-exports `InferenceRuntime`, `EngineCacheEntry`) `` + The actual `c7_inference/__init__.py` `__all__` exports 26 symbols: + `ArchitectureFactory`, `BuildConfig`, `C7InferenceConfig`, + `CalibrationCacheError`, `DeploymentManifest`, `EngineBuildError`, + `EngineCacheEntry`, `EngineDeserializeError`, `EngineGate`, + `EngineHandle`, `EngineHashMismatchError`, + `EngineSchemaMismatchError`, `EngineSidecarMissingError`, + `HostTuple`, `InferenceError`, `InferenceRuntime`, + `ManifestReader`, `ManifestReaderProtocol`, `OptimizationProfile`, + `OutOfMemoryError`, `PrecisionMode`, `RuntimeError`, + `TelemetryUnavailableError`, `ThermalReading`, `ThermalSource`, + `ThermalState`, `ThermalStatePublisher`, `default_registry`, + `register_architecture`. Tests confirm no cross-component code + reaches into c7_inference internals — Public API surface is wider + than documented but consistent with intent (re-exports from + `_types.inference` and `_types.thermal`, the new EngineGate / + ManifestReader / ThermalStatePublisher classes, the error family, + the config block, and the architecture registry). +- Suggestion: rewrite the Public API bullet for c7_inference to list + the four documented surfaces (Protocol + DTOs + Errors + + ThermalStatePublisher + EngineGate + ManifestReader) at category + level, not enumerate every symbol. Optionally add a "see `__all__` + for the full list" pointer. +- Task: AZ-297 / AZ-300 / AZ-301 / AZ-302 (drift accumulated over + Public-API expansions in these batches) + +## Phase-by-Phase Notes + +### Phase 1 — Context Loading + +Read the five batch reports (`batch_23..27_cycle1_report.md`), the +five task specs (now under `_docs/02_tasks/done/`), and +`_docs/00_problem/restrictions.md` (not re-read this cycle — +unchanged since the 01-22 cumulative). + +### Phase 2 — Spec Compliance + +Per-batch reviews were the source-of-truth for batch-local AC +coverage (existing batch reports list every AC → test mapping). +Cumulative spot-check on AZ-304: AC-1 through AC-12 all have named +tests; Tier-2 docker tests are gated by `@pytest.mark.docker` and +skip on Tier-1. No re-litigation here. + +### Phase 3 — Code Quality + +No new SOLID / SRP / naming / complexity findings beyond what +per-batch reviews surfaced. Spot-check on the AZ-304 additions: + +- `_uuid_namespace.py` — pure stdlib `uuid.uuid5`; clean SRP; all + helpers private (`_normalize_*`). +- `migrations.py` — `apply_migrations` is the only public function; + retry logic is documented inline; `MigrationResult` is `frozen`. +- `0002_c6_tile_identity_and_lru.py` — `upgrade()` and `downgrade()` + are flat / linear; revision metadata correct. + +### Phase 4 — Security Quick-Scan + +No SQL string interpolation (the migration uses `op.*` parameterised +DDL; the test helpers parameterise via psycopg `cursor.execute(sql, +params)`). No hardcoded secrets. No subprocess. No +deserialization. Logging payloads carry only revision strings + the +pinned namespace UUID — no PII. + +### Phase 5 — Performance + +`_uuid_namespace.derive_*` are pure functions, called once per +INSERT in production. The migration `0002` is bounded by index +creation on an empty `tiles` table (greenfield) — no real-world +data-volume risk. The migration runner's retry-without-sleep is +deliberate (see batch 27 report § 4) — no measurable performance +regression vs. sleep-then-retry. + +### Phase 6 — Cross-Task Consistency + +- DTO consistency: `TileMetadata.location_hash` field added in AZ-304 + is consumed by no other batch (AZ-305 will be the first consumer); + no producer/consumer drift today. +- Protocol consistency: `VioStrategy` (c1_vio), `InferenceRuntime` + (c7_inference), `TileMetadataStore` (c6_tile_cache) — all three + Protocols are independently scoped and have no shared method + signatures or DTOs across components. +- Shared code: `_types.thermal.ThermalState` (introduced earlier) is + re-exported by c7_inference's `__init__.py` — single home, + re-export consistent with module-layout § 5 cross-cutting concern + rule. +- No duplicate class/function names across components. `EngineGate`, + `ThermalStatePublisher`, `ManifestReader`, `DeploymentManifest`, + `MigrationResult`, `MigrationError`, `derive_tile_id`, + `derive_location_hash` — all unique. Verified by grep across + `src/`. + +### Phase 7 — Architecture Compliance + +1. **Layer direction**: spot-checked all cross-module imports in the + 54 changed files. No import resolves to a higher-layer importee. + Composition root (`runtime_root/storage_factory.py`) is the only + place that imports concrete strategy modules + (`postgres_filesystem_store`, `faiss_descriptor_index`) — allowed + per ADR-009. + +2. **Public API respect**: no file in another component imports a + non-`__init__`/`interface` file of c1_vio, c6_tile_cache, or + c7_inference. Verified by: + ``` + grep -nR 'from gps_denied_onboard.components..' + ``` + on every internal module of the three components. All hits are + either intra-component or under `runtime_root/` (composition + root, explicitly allowed). + +3. **No new module-import cycles**: spot-built import graph for + `c1_vio`, `c6_tile_cache`, `c7_inference` (changed files + + direct deps). No cycles introduced. The `c7_inference.__init__` + → `_types.inference` / `_types.thermal` edges are one-way as + designed. + +4. **Duplicate symbols**: none (see Phase 6). + +5. **Cross-cutting concerns**: no logging / config / error-envelope + re-implementations spotted. `migrations.py` uses + `logging.structured.get_logger` (the shared concern). Both + `_uuid_namespace` and `migrations.py` import from the shared + `_types.*` / `config.schema` / `logging.structured` modules, + not local copies. + +## Baseline Delta + +Not applicable — `_docs/02_document/architecture_compliance_baseline. +md` does not exist on this branch (greenfield project, no architecture +baseline scan run yet). Baseline-delta tables omitted. + +## Auto-Fix Eligibility (per implement skill § 10 matrix) + +All three findings are Medium / Architecture or Medium / +Maintainability against a single documentation file +(`_docs/02_document/module-layout.md`). Per the auto-fix matrix: + +- Medium / Architecture → **escalate** +- Medium / Maintainability → **escalate** + +All three findings escalate to the user. None are auto-fix eligible. + +## Verdict + +**PASS_WITH_WARNINGS** — no Critical / High findings; three Medium +documentation-drift findings on `module-layout.md`. The implement +skill's loop may continue to batch 28; the user is the final +authority on whether to repair `module-layout.md` now or roll it +into a doc-sync task. diff --git a/_docs/_autodev_state.md b/_docs/_autodev_state.md index 836799f..0f036a7 100644 --- a/_docs/_autodev_state.md +++ b/_docs/_autodev_state.md @@ -4,11 +4,11 @@ flow: greenfield step: 7 name: Implement -status: in_progress +status: paused_user_requested sub_step: - phase: 3 - name: compute-batch - detail: "batch 27/cycle1: AZ-304 complete (additive 0002 migration + UUIDv5 namespace + runner + v1.2.0 contract bump). Awaiting next batch selection." + phase: 14.5 + name: cumulative-review-aftermath + detail: "batches 23-27 cumulative review: PASS_WITH_WARNINGS (3 Medium doc-drift findings on module-layout.md). User approved fix-now for findings, then chose stop_here. Module-layout.md updated; user will re-invoke /autodev when ready to continue." retry_count: 0 cycle: 1 tracker: jira