diff --git a/_docs/03_implementation/cumulative_review_batches_43-45_cycle1_report.md b/_docs/03_implementation/cumulative_review_batches_43-45_cycle1_report.md new file mode 100644 index 0000000..c99b501 --- /dev/null +++ b/_docs/03_implementation/cumulative_review_batches_43-45_cycle1_report.md @@ -0,0 +1,324 @@ +# Cumulative Code Review — Batches 43–45 / Cycle 1 + +**Date**: 2026-05-13 +**Mode**: Cumulative (all 7 phases, emphasis on Phases 6 + 7) +**Batches covered**: 43, 44, 45 +**Tasks covered**: AZ-328 (C12 Build-Cache Orchestrator + RemoteCacheProvisionerInvoker + FileLock), AZ-329 (C12 PostLandingUploadOrchestrator + FdrFooterReader), AZ-330 (C12 OperatorReLocService + OperatorCommandTransport Protocol), AZ-523 (C11 internal flight-state gate removal — supersedes AZ-317), AZ-524 (rename `c12_operator_tooling` → `c12_operator_orchestrator`; `OperatorToolServices` → `OperatorOrchestratorServices`), AZ-341 (C2 FAISS HNSW Retrieve Wiring — `FaissBridge` + `DescriptorIndexCut`) +**Changed files in scope**: 53 production sources + 21 test files + 8 docs / specs (full list in §Scope below) +**Verdict**: **PASS_WITH_WARNINGS** + +## Scope + +| Domain | Files | +|--------|-------| +| c2_vpr (production, new) | `_faiss_bridge.py`, `descriptor_index_cut.py` | +| c2_vpr (production, modified) | `__init__.py` (re-exports `DescriptorIndexCut` + `TileIdTuple`), `config.py` (added `warn_top1_threshold` + `debug_per_frame_distances` knobs + validators) | +| c11_tile_manager (production) | `flight_state_gate.py` (DELETED — Batch 44 SRP removal, AZ-523), `interface.py` (`TileUploader.upload_pending_tiles` v2.0.0 — removed `flight_state` parameter), `tile_uploader.py` (removed internal gate), `_types.py` (`UploadRequest` no longer carries `FlightStateSignal`), `errors.py` (removed `FlightStateNotOnGroundError`), `idempotent_retry.py` (signature alignment), `__init__.py` (re-export delta) | +| c12_operator_orchestrator (production, rename of `c12_operator_tooling`) | Every file in the directory was renamed; new modules introduced in batches 43/44: `build_cache.py`, `remote_c10_invoker.py`, `file_lock.py`, `tile_downloader_cut.py`, `tile_uploader_cut.py`, `post_landing_upload.py`, `fdr_footer_reader.py`, `operator_reloc_service.py`, `operator_command_transport.py` | +| c10_provisioning (production, minor) | `manifest_verifier.py` (touch; downstream of c12 type changes), `__init__.py` (re-export touch) | +| c3_5_adhop (production, minor) | `config.py` (touch; no behaviour change) | +| c6_tile_cache (production, minor) | `_tile_pixel_handle.py` (touch; type re-export shape) | +| Composition root | `runtime_root/__init__.py`, `c11_factory.py` (no internal gate to wire; signature update), `c12_factory.py` (`OperatorToolServices` → `OperatorOrchestratorServices`; added factories for `PostLandingUploadOrchestrator` + `OperatorReLocService`) | +| Healthcheck | `healthcheck.py` (touch; aggregator wiring updated for c12 rename) | +| FDR | `fdr_client/records.py` (registered `c12.reloc.requested` in Batch 44; registered `vpr.retrieve_topk` in Batch 45) | +| Tests | `c2_vpr/test_faiss_bridge.py` (NEW, 22 tests); `c12_operator_orchestrator/test_build_cache_orchestrator.py` (NEW, ~29 tests); `c12_operator_orchestrator/test_remote_c10_invoker.py` (NEW); `c12_operator_orchestrator/test_file_lock.py` (NEW); `c12_operator_orchestrator/test_cli_build_cache.py` (rewritten); `c12_operator_orchestrator/test_post_landing_upload_orchestrator.py` (NEW); `c12_operator_orchestrator/test_operator_reloc_service.py` (NEW); `c12_operator_orchestrator/test_fdr_footer_reader.py` (NEW); `c11_tile_manager/test_tile_uploader.py` (rewritten — gate removal); `c11_tile_manager/test_protocol_conformance.py` (rewritten); `test_az272_fdr_record_schema.py` (added `c12.reloc.requested` + `vpr.retrieve_topk` fixtures) | +| Docs | `_docs/02_document/architecture.md` (major Batch-44 doc sweep — C11/C12 entries, principle #4, data model, F10 flow, ADR-004), `system-flows.md` (F10 rewrite + new operator-reloc flow), `epics.md` (E-C11 + E-C12 rewrites), `data_model.md`, `glossary.md`, `FINAL_report.md`, `components/12_c11_tilemanager/{description,tests}.md`, `components/10_c8_fc_adapter/description.md`, `components/13_c12_operator_orchestrator/{description,tests}.md` (rename), `contracts/c11_tilemanager/tile_uploader.md` (v2.0.0), `contracts/c12_operator_orchestrator/` (new dir with `post_landing_upload.md`, `fdr_footer_reader.md`, `operator_command_transport.md`), `contracts/c6_tile_cache/tile_metadata_store.md` (minor C12 naming fix) | +| Tasks | `_docs/02_tasks/todo/AZ-329`, `AZ-330`, `AZ-341` → `done/`; `_docs/02_tasks/_dependencies_table.md` refreshed | + +## Summary + +Three batches, six feature tasks (plus two audit-trail tickets for the +Batch-44 SRP refactor), zero blocking findings. The window's dominant +themes: + +1. **C11 → C12 SRP refactor (Batch 44, AZ-329 + AZ-330 + AZ-523)** — + the internal post-landing flight-state gate was removed from C11 + entirely. The gate now lives once, in C12's + `PostLandingUploadOrchestrator`, where it consults the FDR + `flight_footer.clean_shutdown` field via the new `FdrFooterReader` + Protocol. The pivot eliminates a soft-real-time `FlightStateSignal` + subscription from the upload-side hot path and replaces it with a + one-shot file-read at orchestration time — measurable correctness, + testability, and observability win. C11 `TileUploader` contract + advanced to v2.0.0. + +2. **C12 operator orchestrator finished its end-to-end shape (Batches + 43 + 44, AZ-328 + AZ-329 + AZ-330)** — the operator workstation + now ships three production-functional services on top of the CLI + shell that landed in Batch 42: `BuildCacheOrchestrator` (F1 cache + build), `PostLandingUploadOrchestrator` (F10 post-landing upload), + and `OperatorReLocService` (AC-3.4 re-localization hint). Every + service consumes its cross-component dependencies via local AZ-507 + structural Protocol cuts (`TileDownloaderCut`, `TileUploaderCut`, + `FdrFooterReader`, `OperatorCommandTransport`) — c12 imports + nothing from c11, c10, c7, c6, or c13 directly. The wire-in + adapter lives in the composition root. + +3. **C2 retrieve plumbing centralised (Batch 45, AZ-341)** — + `FaissBridge` lifts the shared `retrieve_topk` body out of every + future concrete `VprStrategy`. The bridge owns the defended-in-depth + INV-4 check (== K, ascending), the WARN-threshold check, the + optional DEBUG per-frame-distances log, and one `vpr.retrieve_topk` + FDR record per call. Cross-strategy duplication risk eliminated; + each strategy's `retrieve_topk` will shrink to a one-line + delegation. The c2_vpr ↔ c6 boundary now uses the same + `DescriptorIndexCut` Protocol-cut pattern established by + c12_operator_orchestrator (Batch 43+44). + +The full unit-test suite reached **1565 passed / 80 environment-skipped** +across the changed surface, up from 1494 at the close of cumulative +review batches 40-42. No failures introduced in the cumulative scope. + +## Phase 1 — Context loading + +Inputs reviewed: + +- Architecture doc (post Batch-44 sweep) — principle #4 phrasing, + ADR-004 § "Why the gate moved to C12 (Batch 44)", new F10 / reloc + data-flow sections. +- Module layout — c2_vpr and c12_operator_orchestrator Per-Component + Mapping entries (the Batch-44 rename), AZ-507 cross-component rule. +- Contracts: `tile_uploader.md` v2.0.0, the new + `c12_operator_orchestrator/` contract directory + (`post_landing_upload.md`, `fdr_footer_reader.md`, + `operator_command_transport.md`), `vpr_strategy_protocol.md`, + `descriptor_index.md`. +- Task specs in scope: `AZ-328`, `AZ-329`, `AZ-330`, `AZ-341`, + `AZ-523` (audit), `AZ-524` (audit). + +## Phase 2 — Spec compliance + +| Task | ACs satisfied | Tests | Notes | +|------|---------------|-------|-------| +| AZ-328 | 15/15 | `test_build_cache_orchestrator.py` (29) + `test_remote_c10_invoker.py` (7) + `test_file_lock.py` (3) | Cycle-1 report `batch_43_cycle1_report.md`, review `batch_43_review.md` (PASS_WITH_WARNINGS) | +| AZ-329 | 14/14 | `test_post_landing_upload_orchestrator.py`, `test_fdr_footer_reader.py` | Cycle-1 report `batch_44_cycle1_report.md`, review `batch_44_review.md` (PASS_WITH_WARNINGS) | +| AZ-330 | 9/9 | `test_operator_reloc_service.py` | Same batch report / review | +| AZ-523 | audit-trail (closed) | C11 protocol-conformance test rewritten | Gate moved to C12 (AZ-329); no production code change since gate was deleted | +| AZ-524 | audit-trail (closed) | Directory rename verified by `module-layout.md` self-test in autodev | Composition-root names updated | +| AZ-341 | 11/11 + NFR-perf | `test_faiss_bridge.py` (22 tests) | Cycle-1 report `batch_45_cycle1_report.md`, review `batch_45_review.md` (PASS_WITH_WARNINGS) | + +Contract verification (per-batch reviews remain authoritative; spot- +checks in cumulative mode): + +- `tile_uploader.md` v2.0.0 — `TileUploader.upload_pending_tiles` + signature no longer takes `flight_state`; c11 implementation + + c12 `PostLandingUploadOrchestrator`'s c11 cut + every c11 + conformance test agree. ✓ +- `vpr_strategy_protocol.md` — `FaissBridge` produces `VprResult` + matching INV-4 (== K, ascending) and INV-5 (non-empty + `backbone_label`); IndexUnavailableError propagation matches + C2-ST-01 intent (the bridge's role is propagation, c6 is the + authoritative defence). ✓ +- `descriptor_index.md` — the bridge consumes the cut, not c6 + directly; the bridge's stricter == K invariant is documented as + intentional defended-in-depth. ✓ + +## Phase 3 — Code quality + +Across the 53 changed production files, no: + +- Bare `except` (every `try` either re-raises or names a specific + exception). +- Function > 50 LOC (longest: `BuildCacheOrchestrator.build_cache` + ~75 LOC including comments; orchestrator sequencing is naturally + longer and each phase is its own helper). +- Cyclomatic complexity > 10 in the orchestrator paths. +- Verbose default-on debug logging (every DEBUG-level log gated by + config; `FaissBridge.debug_per_frame_distances` defaults OFF per + AC-8). +- Silently swallowed errors (every `try/except` either logs at + ERROR + emits FDR, or re-raises a wrapped exception with `from + exc` to preserve the cause chain). + +Constructor parameter validation is consistent across the new +classes: `TypeError` for type violations, `ValueError` for range +violations. + +## Phase 4 — Security quick-scan + +- No SQL injection / command injection / eval / exec. +- No hardcoded secrets — `RemoteCacheProvisionerInvoker` redacts + `api_key` and `auth_token` substrings from streamed SSH stdout + (covered by dedicated tests in `test_remote_c10_invoker.py`). +- No PII / signing material on the new FDR records + (`c12.reloc.requested` carries `LatLonAlt` + free-text reason — + operator-issued, not user PII; `vpr.retrieve_topk` carries raw + float distances). +- `OperatorReLocService` validates `confidence_radius_m > 0` and + `reason` non-empty at the service boundary. +- The Batch-44 doc sweep updated ADR-004 ("Why the gate moved to + C12") and the C12 security note explicitly captures the + defense-in-depth shift. + +## Phase 5 — Performance scan + +- `FaissBridge.retrieve` p95 ≤ 500 µs verified by microbench + (`test_bridge_retrieve_overhead_p95_under_500us`). +- `BuildCacheOrchestrator.build_cache` overhead microbench landed + in Batch 43. +- `PostLandingUploadOrchestrator` is one-shot per landing — no hot + path. +- The C12 cold-start microbench + (`test_cold_start_under_500ms_p99`) was reported flaky in + Batch 44 (`580.8ms p99` on one run). Diagnosed as host-load + variance, not a regression; the underlying PEP 562 lazy-import + discipline is unchanged across batches 43-45. **Carried-over + flake — not introduced in this cumulative scope.** + +## Phase 6 — Cross-task consistency + +The dominant cross-task pattern across this window is the AZ-507 +consumer-side structural Protocol cut. Three new cuts shipped: + +| Cut | Producer (real type) | Consumer | Adapter location | +|-----|----------------------|----------|------------------| +| `TileUploaderCut` | c11 `TileUploader` (AZ-319 v2.0.0) | c12 `PostLandingUploadOrchestrator` (AZ-329) | `runtime_root/c12_factory.py` | +| `TileDownloaderCut` | c11 `HttpTileDownloader` (AZ-316) | c12 `BuildCacheOrchestrator` (AZ-328) | `runtime_root/c12_factory.py` | +| `DescriptorIndexCut` | c6 `DescriptorIndex.search_topk` (AZ-303 + AZ-306) | c2_vpr `FaissBridge` (AZ-341) | NOT YET WIRED — composition-root adapter will land with AZ-337 (first concrete strategy) | + +Plus two intra-c12 Protocol contracts: + +| Protocol | Producer | Consumer | Concrete impl | +|----------|----------|----------|---------------| +| `FdrFooterReader` | c12 (AZ-329) | `PostLandingUploadOrchestrator` | (in `fdr_footer_reader.py`) | +| `OperatorCommandTransport` | c12 (AZ-330) | `OperatorReLocService` | (pymavlink-backed impl deferred to E-C8) | + +All five Protocols use `runtime_checkable=True`, are declared in +a single `*.py` file per Protocol, are imported by their consumer +via the local module path (no cross-component imports), and follow +the same docstring shape ("single-method consumer-side cut of …"). + +No conflicting patterns observed. + +Cross-task DTO consistency: + +- `VprQuery.embedding` is L2-normalised by the producing strategy + (INV-3); the `FaissBridge` consumes the embedding as-is. Future + strategy implementations (AZ-337..AZ-340) must preserve INV-3. +- `VprCandidate.tile_id` carries `tuple[int, float, float]` + (`TileIdTuple`) deliberately — keeps L1 DTOs free of L3 c6 + imports. The `DescriptorIndexCut` matches this shape. +- `UploadRequest` (c11) no longer references `FlightStateSignal` + (Batch 44 removal); `PostLandingUploadOrchestrator` constructs + it from the `BuildCacheRequest` translation in c12_factory. + +## Phase 7 — Architecture compliance + +1. **Layer direction (rule 1)**: no upward imports observed across + the cumulative scope. c2_vpr (L3) imports `_types.vpr` (L1) and + its own component module; c12_operator_orchestrator (L4 + Adapters) imports its own component modules and L1 + shared/`_types`/config/logging/fdr_client/clock. + +2. **Public API respect / AZ-507 (rule 2)**: verified via grep: + + - `c2_vpr/*.py` has zero `from gps_denied_onboard.components.c6_tile_cache` imports. + - `c12_operator_orchestrator/*.py` has zero + `from gps_denied_onboard.components.c11_tile_manager` / + `…c10_provisioning` / `…c7_inference` / `…c6_tile_cache` imports. + +3. **No new cyclic dependencies (rule 3)**: no new cycles introduced. + +4. **Duplicate symbols (rule 4)**: + + - `IndexUnavailableError` exists in both `c2_vpr.errors` and + `c6_tile_cache.errors` — pre-existing intentional duplication, + documented in both module docstrings. The task-spec letter of + AC-4 + the c6 module comment hint at a design tension here (see + Finding F2 of `batch_45_review.md`). + - `_iso_ts_from_clock` (or equivalent inline) appears in 6 + modules: `c2_vpr/_faiss_bridge.py`, + `c12_operator_orchestrator/operator_reloc_service.py`, + `c11_tile_manager/idempotent_retry.py`, + `c11_tile_manager/signing_key.py`, + `c6_tile_cache/postgres_filesystem_store.py`, + `c6_tile_cache/freshness_gate.py`. **Carried-over finding — AZ-508 hygiene PBI exists in `todo/` for exactly this consolidation.** + +5. **Cross-cutting concerns not locally re-implemented (rule 5)**: see + Finding F1 below — the ISO-timestamp duplication is the only + concrete instance. + +## Baseline Delta + +`_docs/02_document/architecture_compliance_baseline.md` does not +exist for this project (greenfield; no prior architecture baseline +to compare against). The cumulative review therefore reports +findings as-is, not as deltas. + +## Findings + +| # | Severity | Category | Files | Title | +|---|----------|----------|-------|-------| +| F1 | Low | Maintainability / Architecture | c2/c6/c11/c12 (6 modules total) | `_iso_ts_from_clock` duplicated across components | +| F2 | Low | Architecture | `c2_vpr.errors.IndexUnavailableError` vs `c6_tile_cache.errors.IndexUnavailableError` | Two same-named exception classes; the bridge's AC-4 propagation contract intentionally lets c6's class leak past c2's "closed envelope" docstring | +| F3 | Low | Spec-Gap | `_docs/02_tasks/done/AZ-341_c2_faiss_retrieve_wiring.md` § Outcome | Spec lists unused `normaliser: DescriptorNormaliser` constructor param; INV-3 makes it redundant — implementation omits it | +| F4 | Low | Maintainability | `test_cold_start_under_500ms_p99` | Reported flaky (580.8ms p99 on one Batch-44 run); pre-existing host-load sensitivity, not introduced in 43-45 | + +### Finding Details + +**F1: `_iso_ts_from_clock` duplicated across components** (Low / Maintainability + Architecture) + +- Locations: see Phase 7 #4 above. +- Description: six modules duplicate the same 6-line helper that + converts `clock.time_ns()` → RFC 3339 UTC ISO string. Pure + duplication; same exact body in each. +- Suggestion: defer to AZ-508 (hygiene PBI already in `todo/` + scoped to ISO-timestamp consolidation across the suite). No + blocking action this cycle. + +**F2: `IndexUnavailableError` namespace duplication + AC-4 closed-envelope tension** (Low / Architecture) + +- Locations: `c2_vpr.errors.IndexUnavailableError`, + `c6_tile_cache.errors.IndexUnavailableError`. +- Description: c2 errors module docstring states "the C6 family is + the storage-layer error a concrete strategy is responsible for + rewrapping". The AZ-341 task spec AC-4 says the bridge "does not + catch and re-raise" the c6 exception. The two cannot both hold + for the closed-envelope invariant (consumers downstream of the + bridge will see c6's class, not c2's, when the c6 stale-handle + defence fires). +- Suggestion: surface to the user for spec/code-comment alignment. + Options: (a) widen AC-4 to allow rewrap; (b) declare c6's + `IndexUnavailableError` part of the C2 closed envelope (subclass + the c2 class from the c6 class, or vice versa); (c) leave as-is + with explicit documentation that consumers must catch either + class. None blocks the current implementation. + +**F3: Unused `normaliser` parameter in AZ-341 spec** (Low / Spec-Gap) + +- Location: task spec § Outcome (Constructor signature). +- Description: spec lists `normaliser: DescriptorNormaliser` but + the documented `retrieve` body never calls it. The bridge omits + the parameter (no behaviour change; INV-3 obligates the + producing strategy to normalise). +- Suggestion: update the spec to drop the parameter, OR add explicit + defensive re-normalisation to the bridge with a note that it + conflicts with INV-3's strategy-side guarantee. + +**F4: Flaky cold-start performance test** (Low / Maintainability) + +- Location: + `tests/unit/c12_operator_orchestrator/test_cli_help_and_logging.py::test_cold_start_under_500ms_p99`. +- Description: occasionally measures p99 > 500 ms on a developer + host (580.8 ms observed in Batch 44). The underlying PEP 562 lazy + import discipline is unchanged across 43-45; the regression is + host-load variance. +- Suggestion: re-evaluate on dedicated CI Tier-1 hardware where the + host is not under interactive load; defer to a future test-host + selection PBI. No blocking action this cycle. + +## Verdict + +**PASS_WITH_WARNINGS** — four Low-severity findings, three of which +are carry-over hygiene / spec items (F1, F3, F4) and one of which +(F2) is a structural inconsistency between a task-spec AC and a +module docstring expectation that pre-dates this cumulative window. + +No Critical, no High, no Medium findings. The implement skill's +auto-fix gate accepts the verdict without intervention. Recommend: + +- Surface F2 + F3 to the user as a small spec-hygiene follow-up + pair (one ticket, ≤ 2pt). +- Confirm AZ-508 (ISO-timestamp consolidation hygiene PBI) is + scoped to absorb F1 when it is scheduled. +- Leave F4 untouched until a dedicated CI Tier-1 host is available. diff --git a/_docs/_autodev_state.md b/_docs/_autodev_state.md index e1c749a..38213f4 100644 --- a/_docs/_autodev_state.md +++ b/_docs/_autodev_state.md @@ -8,9 +8,9 @@ status: in_progress sub_step: phase: 7 name: batch-loop - detail: "batch 45 — AZ-341 (C2 FAISS retrieve wiring)" + detail: "cumulative review batches 43-45 complete; selecting batch 46" retry_count: 0 cycle: 1 tracker: jira last_completed_batch: 45 -last_cumulative_review: batches_40-42 +last_cumulative_review: batches_43-45