chore: record cumulative review batches 28-30 + state

PASS_WITH_WARNINGS verdict for batches 28-30 (AZ-305, AZ-307, AZ-308);
five findings, all Medium/Low — module-layout drift + cross-batch DRY.
No Critical/High, no auto-fix gate; per implement Step 14.5,
PASS_WITH_WARNINGS continues to the next batch.

Co-authored-by: Cursor <cursoragent@cursor.com>
This commit is contained in:
Oleksandr Bezdieniezhnykh
2026-05-12 21:47:40 +03:00
parent d571ca25f9
commit afe42f451c
2 changed files with 137 additions and 1 deletions
@@ -0,0 +1,136 @@
# Cumulative Code Review — Batches 2830 / Cycle 1
**Date**: 2026-05-12
**Mode**: Cumulative (all 7 phases, emphasis on Phases 6 + 7)
**Batches covered**: 28, 29, 30
**Tasks covered**: AZ-305 (C6 PostgresFilesystemStore), AZ-307 (C6 FreshnessGate), AZ-308 (C6 CacheBudgetEnforcer + BudgetEnforcedTileStore)
**Changed files in scope**: 17 (production + tests + contracts)
**Verdict**: **PASS_WITH_WARNINGS**
## Summary
No Critical or High findings. Five findings total — two Medium (Architecture), three Low (Maintainability / Architecture). The dominant drift is the same family of issues the 2327 review flagged for c7_inference: `_docs/02_document/module-layout.md` did not absorb the new files / new public surface introduced by batches 2830, and the c6 package `__init__.py` re-export set has fallen behind the contract files updated in this window.
No cross-component Public API bypasses, no new module-import cycles, no duplicate symbols leaking across components, no layer-direction violations. Inside the c6 component, the three new modules consistently use the AZ-272 FDR record family, the `make_fdr_client(producer_id, config)` helper, and `get_logger(producer_id)` — producer-id strings are unique per concern (`c6_tile_cache.store`, `c6_tile_cache.freshness`, `c6_tile_cache.budget`).
The decorator topology (BudgetEnforcedTileStore wrapping PostgresFilesystemStore + injected CacheBudgetEnforcer + optional FreshnessGate inside the store) is clean and lets each policy layer be unit-tested in isolation. The asymmetry between `build_tile_store` and `build_tile_metadata_store` deserves a note before more callers wire both (see F4).
## Findings
| # | Severity | Category | File:Line | Title |
|---|----------|----------|-----------|-------|
| 1 | Medium | Architecture | `src/gps_denied_onboard/components/c6_tile_cache/__init__.py:41-86` | `CacheBudgetExhaustedError` added in AZ-308 but not re-exported in package `__all__` |
| 2 | Medium | Architecture | `_docs/02_document/module-layout.md:141-150` | c6_tile_cache section stale: missing `freshness_gate.py` / `cache_budget_enforcer.py` / `tools.py`; Public API description lists non-existent symbols (`Tile`, `TileRecord`); `connection.py` "not landed yet" annotation outdated |
| 3 | Low | Maintainability | `c6_tile_cache/{postgres_filesystem_store,freshness_gate,cache_budget_enforcer}.py` | Three identical private `_iso_ts_now()` helpers inside one component |
| 4 | Low | Architecture | `src/gps_denied_onboard/runtime_root/storage_factory.py:65-141` | `build_tile_store` and `build_tile_metadata_store` each call `PostgresFilesystemStore.from_config` — two independent pools + freshness gates if both are invoked |
| 5 | Low | Architecture | `c6_tile_cache/cache_budget_enforcer.py:316-355` + `tile_store.md` v1.1.0 | Budget enforcement decorates only `TileStore.write_tile`; `TileMetadataStore.insert_metadata` bypasses the budget — intentional for pre-flight provisioning but not documented as an explicit exclusion |
### Finding Details
**F1: `CacheBudgetExhaustedError` added in AZ-308 but not re-exported in package `__all__`** (Medium / Architecture)
- Location: `src/gps_denied_onboard/components/c6_tile_cache/__init__.py:41-86`
- Description: AZ-308 added `CacheBudgetExhaustedError` to `errors.py` `__all__` and v1.1.0 of the `tile_store.md` contract documents it as part of the `TileCacheError` family ("New error variant added to `TileCacheError`"). The package-level `__init__.py` was not updated — it still imports and re-exports 8 errors and omits `CacheBudgetExhaustedError`. A future consumer (C11 TileUploader, C5 state estimator) that wants `except CacheBudgetExhaustedError as exc:` must either widen the catch to the family base `TileCacheError` (loses the diagnostic fields `needed_bytes` / `available_bytes` / `evicted_count`) or import from the Internal `errors` module — which violates the "Public API only" cross-component import rule documented in `module-layout.md`.
- Suggestion: in `c6_tile_cache/__init__.py`, add `CacheBudgetExhaustedError` to the `errors` import block (alphabetical position between `ContentHashMismatchError` and `FreshnessRejectionError`) and to `__all__`. Re-run `tests/unit/c6_tile_cache/test_protocol_conformance.py` to confirm no test regressed (the existing tests catch `TileCacheError`).
- Task: AZ-308 (drift introduced in batch 30)
**F2: c6_tile_cache module-layout section stale** (Medium / Architecture)
- Location: `_docs/02_document/module-layout.md:141-150`
- Description: three drift problems compound at this section.
1. **Public API line stale**: line 141 says `__init__.py` re-exports `TileStore, Tile, TileQualityMetadata, TileRecord, SectorClassification`. Actual `__all__` exposes 26 symbols including all three Protocols (`TileStore`, `TileMetadataStore`, `DescriptorIndex`), DTOs (`TileId`, `TileMetadata`, `TileMetadataPersistent`, `Bbox`, `SectorBoundary`, `HnswParams`, `IndexMetadata`), enums (`TileSource`, `FreshnessLabel`, `VotingStatus`, `SectorClassification`), the ABC `TilePixelHandle`, the config block, and the error family. Additionally, `Tile` and `TileRecord` are not symbols in the package — they correspond to `TileMetadata` / `TileMetadataPersistent`. This is the same Public API drift pattern that F3 of the 2327 review flagged for c7_inference; the fix in commit `1141d17` synced c7 but the c6 entry never got the equivalent refresh.
2. **Missing Internal files**: batches 2830 added `tools.py` (AZ-305, batch 28 operator CLI), `freshness_gate.py` (AZ-307, batch 29), and `cache_budget_enforcer.py` (AZ-308, batch 30). None are in the Internal bullets. `errors.py` (which existed and grew new entries in batches 28 + 30) is also not listed explicitly; either it should be added or the doc should call out that error files are implicitly part of every component.
3. **`connection.py` annotation outdated**: line 148 still says `connection.py (planned — psycopg_pool ConnectionPool helper; AZ-305, not landed yet)`. AZ-305 landed in batch 28 and uses `psycopg_pool.ConnectionPool` directly inside `PostgresFilesystemStore.from_config`; no dedicated `connection.py` was needed. The line should be removed.
- Suggestion: rewrite the Internal bullets to mirror the actual files in `src/gps_denied_onboard/components/c6_tile_cache/`:
- `_native/` (FAISS wrapper, planned)
- `migrations.py` (AZ-304 migration runner)
- `_uuid_namespace.py` (AZ-304 derive helpers)
- `postgres_filesystem_store.py` (AZ-305 TileStore + TileMetadataStore impl)
- `freshness_gate.py` (AZ-307 active_conflict reject + stable_rear downgrade)
- `cache_budget_enforcer.py` (AZ-308 RESTRICT-SAT-2 10 GiB hard cap + `BudgetEnforcedTileStore` decorator)
- `tools.py` (operator dump CLI, AZ-305)
- `errors.py`, `config.py`, `_types.py`, `_tile_pixel_handle.py`, `interface.py` (component plumbing)
And rewrite the Public API line to reference `__init__.py`'s `__all__` as the canonical list, mirroring the wording the 2327 review applied to c7_inference (line 157 of module-layout.md).
- Task: AZ-305 / AZ-307 / AZ-308 (incremental drift)
**F3: Three identical `_iso_ts_now()` helpers inside one component** (Low / Maintainability)
- Location: `c6_tile_cache/postgres_filesystem_store.py:106-108`, `c6_tile_cache/freshness_gate.py:119-126`, `c6_tile_cache/cache_budget_enforcer.py:83-90`
- Description: each module privately defines:
```python
def _iso_ts_now() -> str:
return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")
```
Same body, same purpose (FDR record envelope `ts` field). If the FDR record envelope timestamp format ever changes (e.g., to add nanoseconds, or to drop the microseconds), three files have to move in lockstep. The duplication appeared because each batch was written self-contained.
- Suggestion: extract into a single intra-component module (e.g., `c6_tile_cache/_timestamp.py` with `iso_ts_now()`) or, since this is a cross-cutting concern, into `gps_denied_onboard/fdr_client/timestamp.py` next to the records module — the FDR envelope is the only thing that needs it. Drop the leading underscore once it's promoted to a single intra-package symbol. Tests do not need to change because they assert on FDR-record shape, not on this helper.
- Task: AZ-305 / AZ-307 / AZ-308 (cross-batch DRY)
**F4: Two independent pools + freshness gates if both factories are invoked** (Low / Architecture)
- Location: `src/gps_denied_onboard/runtime_root/storage_factory.py:65-141`
- Description: `build_tile_store` constructs a `PostgresFilesystemStore.from_config(config)` (which opens a `ConnectionPool` + constructs a `FreshnessGate` against a second `make_fdr_client` and a `WallClock`), then wraps it in `BudgetEnforcedTileStore`. `build_tile_metadata_store` separately constructs another `PostgresFilesystemStore.from_config(config)`, which opens a fresh `ConnectionPool` + fresh `FreshnessGate`. Today only `tests/unit/c6_tile_cache/test_protocol_conformance.py` calls both factories (with stubbed module patching, so no real pool is opened). Once a production composition root wires both — e.g., C11 `TileUploader` wants `pending_uploads()` (a `TileMetadataStore` method) while F1 / F4 producers use `write_tile` (a `TileStore` method) — the system will run with two pools, two freshness gates, two LRU clocks, and two `c6.store.construct` startup log lines.
The factory docstring (`build_tile_metadata_store`) already names this as deliberate ("future SQLite Tier-0 variant can swap one without the other"), but the duplication latency is a real cost: each `from_config` does a synchronous orphan-reconciliation scan + multiple `tile_freshness_rules` / `sector_classifications` reads at construction time.
- Suggestion: when the first production composition root wires both factories (likely the C11 epic), revisit one of:
- **Option A**: have `storage_factory` memoise the `PostgresFilesystemStore` instance by `(root_dir, dsn)` and return the same object from both factories — preserves the protocol-narrow return types while sharing state.
- **Option B**: introduce a `build_tile_cache(config)` super-factory that returns a single object exposing both Protocols; deprecate the two per-Protocol factories. Aligns with the AZ-305 design decision to collapse both surfaces onto one class.
No code change required today; record this in the C6 component description as a known-latent issue so the next caller does not silently double the resource cost.
- Task: AZ-305 / AZ-308 (latent — exposed when a second caller wires the metadata-store factory)
**F5: `insert_metadata` is not budget-checked** (Low / Architecture)
- Location: `c6_tile_cache/cache_budget_enforcer.py:316-355` + `_docs/02_document/contracts/c6_tile_cache/tile_store.md` v1.1.0
- Description: `BudgetEnforcedTileStore` decorates the four `TileStore` methods (`read_tile_pixels`, `write_tile`, `tile_exists`, `delete_tile`). `TileMetadataStore.insert_metadata` (which AZ-305 implements on the same `PostgresFilesystemStore` instance, but reached via the `build_tile_metadata_store` factory — F4 above) is NOT budget-gated, even though it grows `total_disk_bytes` (the row reports the on-disk file's bytes). Looking at the AZ-308 spec rationale ("the enforcer is the SOLE eviction path during a flight" — flight scope), this is plausibly intentional: `insert_metadata` is documented for the F1 pre-flight TileDownloader path where the operator is provisioning a cache for a planned sortie and the budget concept does not apply yet. But the omission is not stated explicitly in `tile_store.md` v1.1.0.
- Suggestion: add a one-line invariant or exclusion note to `tile_store.md` v1.1.0 — for example: "I-X: `insert_metadata` is intentionally outside the `BudgetEnforcedTileStore` envelope. Pre-flight tile provisioning (C10) is responsible for staying under the budget by construction; in-flight ingest goes through `write_tile` and is budget-gated." This converts an implicit design decision into a documented one and prevents a future code-reviewer from "fixing" it.
- Task: AZ-308 / AZ-316 (contract clarification — no code change required)
## Phase Coverage
| Phase | Outcome |
|-------|---------|
| 1. Context loading (task specs + restrictions + solution) | Loaded AZ-305 / AZ-307 / AZ-308 specs, batch reports, `tile_store.md` v1.0.0 → v1.1.0, `fdr_record_schema.md` v1.0.0 → v1.3.0, `architecture.md` (c6 sections), `module-layout.md` c6 section |
| 2. Spec compliance | Re-verified Acceptance Criteria mapping against tests for all three tasks; every per-batch review verdict was already PASS. No spec gaps surfaced cumulatively. |
| 3. Code quality (SOLID / DRY / error handling / naming) | F3 (DRY `_iso_ts_now()`) — Low. Otherwise clean: every public method validates inputs at the boundary, every third-party exception is rewrapped into the `TileCacheError` family, naming is consistent (`c6.<concern>.<event>` for log/FDR kinds, `_<helper>` for private functions). |
| 4. Security quick-scan | All SQL is parameterized via `cur.execute(sql, params_tuple)` — no f-string interpolation. No shell, no eval, no deserialization of untrusted data. `tools.py` and the two CLIs read from operator-supplied args + `os.environ`; no secrets in logs (the `c6.store.construct` log emits only DB-side counts, not the DSN). Clean. |
| 5. Performance scan | `_reconcile_orphans_at_construction` does a synchronous filesystem walk + DB summary at startup; O(N) in row count, runs once per process. `FreshnessGate._classify` is sub-microsecond (rtree point-in-rect lookup). `CacheBudgetEnforcer.reserve_headroom` no-evict path is one indexed SELECT; the eviction-path SQL plan (`ORDER BY accessed_at ASC, id ASC LIMIT %s`) relies on the `tiles(accessed_at)` index which `0002_c6_tile_identity_and_lru.py` (AZ-304, batch 27) provides. No N+1 patterns, no unbounded fetches (all selects are LIMIT-capped or summary aggregations). |
| 6. Cross-task consistency | The three batches consistently use the AZ-272 FDR record family with v1.1.0 → v1.3.0 schema additions, the `make_fdr_client(producer_id, config)` + `get_logger(producer_id)` factory pair, and `WallClock()` for production timestamping with injectable Clock for tests. Producer-id strings are unique (`c6_tile_cache.store` / `c6_tile_cache.freshness` / `c6_tile_cache.budget`) — no namespace collisions. F3 (duplicate `_iso_ts_now`) is the only consistency drift. |
| 7. Architecture compliance (incl. duplicate-symbol + cycle scans) | F1, F2, F4, F5 as described. No new module-import cycles (verified by walking `from gps_denied_onboard....` imports in the four c6 production files added/modified — every cross-component import goes through a Public API surface; intra-component imports are flat). No duplicate symbols across components (`CacheBudgetEnforcer`, `FreshnessGate`, `BudgetEnforcedTileStore`, `MmapTilePixelHandle`, `_iso_ts_now`, `_PRODUCER_ID` are all c6-scoped). |
## Baseline Delta
`_docs/02_document/architecture_compliance_baseline.md` does not exist (this is a greenfield project that never ran a baseline scan), so no Carried-over / Resolved / Newly-introduced partitioning applies. Phase 7 findings are emitted as cumulative-only.
## Cross-batch Test Health
Each batch report flagged `tests/unit/c6_tile_cache/test_ac13_read_tile_pixels_warm_latency_p95` as a heterogeneous-host flake under the combined c6 suite (passing 3-of-3 standalone, occasionally over the 5 ms p95 ceiling when stacked behind the AZ-307 / AZ-308 suites). Verified by `git stash -u` round-trip in batches 28 + 29 + 30 that the regression is shared (i.e., not caused by 29 or 30) and standalone re-runs pass. This is an existing infrastructure note, not a new finding — tracked as a known flake outside this report.
The full Tier-2 suite ran at the end of batch 30: 1267 passed, 8 environment-skipped (CUDA-only, cmake, actionlint), 1 deselected (pynvml). No regressions surfaced by cumulative review.
## Recommended Follow-up Tasks
These belong in the next implement batch (cumulatively or interleaved):
1. **Doc-sync**: address F1 + F2 in a single docs commit — update `c6_tile_cache/__init__.py` re-exports and `module-layout.md` c6 Internal + Public API description. Estimated size: 1 story point. Owner: cross-cutting docs hygiene.
2. **DRY**: address F3 — extract one `_iso_ts_now()` and update the three callers. Estimated size: 1 story point.
3. **Contract clarification**: F5 — single-line addition to `tile_store.md` v1.1.0 (already v1.1.0; this is a minor wording addition under the same version, or bump to v1.1.1 if any contract reader treats this as observable). Estimated size: 1 story point.
4. **Latent F4** — no code change today; track in the C6 component description as a "known cost when first dual-factory caller lands".
Items 13 are small enough to fold into the next code-hygiene mini-batch alongside the next product task. Item 4 is documented here and in the C6 component description; no immediate action.
## Verdict
PASS_WITH_WARNINGS — five findings, all Medium or Low, no Critical or High. Auto-fix gate per `implement/SKILL.md` Step 10:
| Finding | Severity | Category | Auto-fix? |
|---------|----------|----------|-----------|
| F1 | Medium | Architecture | escalate (per matrix) |
| F2 | Medium | Architecture | escalate (per matrix) |
| F3 | Low | Maintainability | yes |
| F4 | Low | Architecture | escalate (per matrix, even though Low — Architecture category is the gate) |
| F5 | Low | Architecture | escalate |
PASS_WITH_WARNINGS does NOT trigger the FAIL auto-fix loop. The findings are surfaced as info and the implement skill continues to the next batch per Step 14.5 ("PASS or PASS_WITH_WARNINGS → continue to next batch"). Items 13 are good candidates to bundle into a small docs/hygiene batch before or alongside the next product task; F4 + F5 remain on the cumulative-record backlog.
+1 -1
View File
@@ -8,7 +8,7 @@ status: in_progress
sub_step:
phase: 14
name: batch-loop
detail: ""
detail: "next: docs-hygiene mini-batch (F1+F2+F3 of cumulative_review_batches_28-30)"
retry_count: 0
cycle: 1
tracker: jira