# Batch 30 / Cycle 1: Implementation Report

**Date**: 2026-05-12
**Tasks**: AZ-308 (C6 Cache Budget Eviction: 10 GB hard cap with LRU sweep)
**Story points landed**: 3
**Status**: complete (AZ-308 → In Testing)

## Scope summary

Single-task batch landing the production `CacheBudgetEnforcer`, the policy layer that converts AZ-303's `total_disk_bytes` / `lru_candidates` / `delete_tile` / `record_lru_access` primitives into RESTRICT-SAT-2's **10 GB hard cap**. The enforcer runs **synchronously inside `write_tile`** via the new `BudgetEnforcedTileStore` decorator: every write first asks `reserve_headroom(len(tile_blob))`; if head-room is sufficient the call is a single `total_disk_bytes()` SELECT and returns immediately, otherwise the enforcer iterates `lru_candidates(max_count=eviction_batch_size)` in 32-row batches, deletes the oldest tiles via `delete_tile`, and stops as soon as the freed bytes meet the shortfall. If the candidate list is exhausted without meeting the budget, `CacheBudgetExhaustedError` is raised **after** the full sweep (per AC-5: partial eviction beats no eviction, so the operator's recovery has maximum head-room).

Eviction is observable end-to-end: one INFO log per evicted tile (`kind="c6.evicted"`, payload `{tile_id, disk_bytes, accessed_at, evicted_at}`), one FDR record per eviction batch (`kind="c6.eviction_batch"`, payload `{trigger_tile_id, freed_bytes, evicted_count, evicted_tile_ids[:5]}`, capped to 5 ids to keep the record bounded), and one construction-time INFO log (`kind="c6.budget.loaded"`) so the operator sees `(budget_bytes, current_disk_bytes, headroom_bytes)` at process start (with a WARN if the prior flight ended over-budget).

The AZ-305 LRU-clock hook is now wired: `PostgresFilesystemStore` accepts an optional `lru_clock: Clock | None = None` ctor argument, and when set, every `read_tile_pixels` call invokes `record_lru_access(tile_id, now)` after the row/file existence check.
The unit-test path (AZ-305's existing fixtures) can still construct the store with `lru_clock=None`, preserving the AZ-305 contract. Production wiring in `storage_factory.build_tile_store` always injects `WallClock()` into the inner store and wraps the result in `BudgetEnforcedTileStore`.

The decorator pattern is mandatory per the spec § Constraints: making budget enforcement a wrapper keeps the policy layer separable from the store impl (single-responsibility), and a future voting-tier-aware policy can replace the enforcer without changing `PostgresFilesystemStore`.

## Files added / modified

### New (production)

- `src/gps_denied_onboard/components/c6_tile_cache/cache_budget_enforcer.py` adds:
  - `EvictionResult` frozen dataclass;
  - `_iso_ts_now` UTC helper;
  - `CacheBudgetEnforcer` class with one public method `reserve_headroom(needed_bytes) -> EvictionResult` doing the no-evict fast-path → LRU-sweep escalation flow, emitting one INFO log per eviction and one FDR record per batch, plus the AC-12 construction-time `c6.budget.loaded` INFO log (with optional WARN on over-budget startup);
  - `BudgetEnforcedTileStore` decorator implementing the `TileStore` Protocol by delegating `read_tile_pixels` / `tile_exists` / `delete_tile` straight through and calling `enforcer.reserve_headroom(len(tile_blob))` before delegating `write_tile`;
  - an operator CLI (`python -m gps_denied_onboard.components.c6_tile_cache.cache_budget_enforcer dry-run --pretend-needed-bytes N`) that loads config via `load_config(os.environ)` and prints what WOULD be evicted without performing the eviction (no `delete_tile` call, no FDR write, no INFO log).

### Modified (production)

- `src/gps_denied_onboard/components/c6_tile_cache/errors.py`: adds `CacheBudgetExhaustedError` to the `TileCacheError` family with diagnostic fields `needed_bytes`, `available_bytes`, `evicted_count` (all keyword-only, all default to `None` so the parameter set is forward-compatible with future tightening).
- `src/gps_denied_onboard/components/c6_tile_cache/config.py`: adds the `eviction_batch_size: int = 32` config knob (default per spec § Constraints, validated `> 0` in `__post_init__`); the existing `lru_eviction_threshold_bytes` already provides the budget.
- `src/gps_denied_onboard/components/c6_tile_cache/postgres_filesystem_store.py`: adds optional `lru_clock: Clock | None = None` ctor arg; when present, `read_tile_pixels` calls `self.record_lru_access(tile_id, now_dt)` after row/file existence checks succeed, where `now_dt = datetime.fromtimestamp(self._lru_clock.time_ns() / 1e9, tz=UTC)`. `from_config` now injects `WallClock()` so the production path always updates the LRU clock; AZ-305's unit tests that construct the store directly with no clock keep the pass-through behaviour (the LRU UPDATE is guarded by `if self._lru_clock is not None`).
- `src/gps_denied_onboard/fdr_client/records.py`: adds `c6.eviction_batch` (payload `{trigger_tile_id, freed_bytes, evicted_count, evicted_tile_ids}` capped to 5 ids per AC-11) to `KNOWN_PAYLOAD_KEYS`. The per-tile `c6.evicted` event is INFO log only (it is high-frequency under load and would dilute the FDR ring-buffer; aggregated batch counts go to FDR).
- `src/gps_denied_onboard/runtime_root/storage_factory.py`: `build_tile_store` now constructs a `PostgresFilesystemStore` and a `CacheBudgetEnforcer` wired to a producer-local `FdrClient` (`producer_id="c6_tile_cache.budget"`) and the C6 logger, with `budget_bytes = config.tile_cache.lru_eviction_threshold_bytes` and `eviction_batch_size = config.tile_cache.eviction_batch_size`, then wraps the store in a `BudgetEnforcedTileStore` and returns the decorator. `build_tile_metadata_store` is unchanged (the decorator only intercepts `TileStore`, never the metadata store).
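
The fast-path → LRU-sweep flow and the write-time decorator described above can be illustrated with a minimal, self-contained sketch. It is not the production code: the in-memory `InMemoryStore`, the `(tile_id, disk_bytes)` candidate shape, the `EvictionResult.evicted` field layout, the `write_tile(tile_id, tile_blob)` signature, and the meaning given to `available_bytes` are all assumptions made for illustration, and logging and FDR emission are left out.

```python
from collections import OrderedDict
from dataclasses import dataclass


class CacheBudgetExhaustedError(Exception):
    """Stand-in for the error in errors.py; keyword-only diagnostic fields."""

    def __init__(self, msg, *, needed_bytes=None, available_bytes=None, evicted_count=None):
        super().__init__(msg)
        self.needed_bytes = needed_bytes
        self.available_bytes = available_bytes
        self.evicted_count = evicted_count


@dataclass(frozen=True)
class EvictionResult:
    # Assumed layout: (tile_id, disk_bytes) pairs in eviction order.
    evicted: tuple = ()

    @property
    def total_freed_bytes(self):
        return sum(size for _, size in self.evicted)


class InMemoryStore:
    """Toy store: insertion order doubles as LRU order (oldest first)."""

    def __init__(self):
        self.tiles = OrderedDict()

    def total_disk_bytes(self):
        return sum(len(blob) for blob in self.tiles.values())

    def lru_candidates(self, max_count):
        return [(tid, len(blob)) for tid, blob in list(self.tiles.items())[:max_count]]

    def delete_tile(self, tile_id):
        return self.tiles.pop(tile_id, None) is not None

    def write_tile(self, tile_id, tile_blob):
        self.tiles[tile_id] = tile_blob


class CacheBudgetEnforcer:
    def __init__(self, store, budget_bytes, eviction_batch_size=32):
        self._store = store
        self._budget = budget_bytes
        self._batch = eviction_batch_size

    def reserve_headroom(self, needed_bytes):
        current = self._store.total_disk_bytes()
        shortfall = current + needed_bytes - self._budget
        if shortfall <= 0:
            return EvictionResult()  # fast path: one size query, no eviction
        evicted, freed = [], 0
        while freed < shortfall:
            deleted_any = False
            for tile_id, size in self._store.lru_candidates(max_count=self._batch):
                if self._store.delete_tile(tile_id):  # False: tile already gone
                    evicted.append((tile_id, size))
                    freed += size
                    deleted_any = True
                if freed >= shortfall:
                    break
            if not deleted_any:
                break  # candidate list exhausted
        if freed < shortfall:
            # Raised only after the sweep, so the partial eviction is kept.
            raise CacheBudgetExhaustedError(
                "cache budget exhausted",
                needed_bytes=needed_bytes,
                available_bytes=self._budget - (current - freed),
                evicted_count=len(evicted),
            )
        return EvictionResult(tuple(evicted))


class BudgetEnforcedTileStore:
    """Decorator: reserve head-room first, then delegate the write."""

    def __init__(self, wrapped, enforcer):
        self._wrapped = wrapped
        self._enforcer = enforcer

    def write_tile(self, tile_id, tile_blob):
        self._enforcer.reserve_headroom(len(tile_blob))
        return self._wrapped.write_tile(tile_id, tile_blob)


inner = InMemoryStore()
enforcer = CacheBudgetEnforcer(inner, budget_bytes=100)
store = BudgetEnforcedTileStore(inner, enforcer)
for i in range(5):
    store.write_tile(f"t{i}", b"x" * 40)  # each write past 80 bytes evicts the oldest tile
```

With a 100-byte budget and 40-byte tiles, the toy store never holds more than two tiles at once, which mirrors the "stop as soon as the freed bytes meet the shortfall" behaviour at small scale.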
### Modified (tests)

- `tests/unit/c6_tile_cache/test_cache_budget_enforcer.py`: **NEW** suite of 18 tests:
  - 4 non-docker unit tests for `CacheBudgetEnforcer` against an in-memory `_FakeStore` covering AC-1 (no-eviction fast path), AC-2 (single-tile sweep), AC-3 (multi-tile until shortfall met), AC-4 (batch-size-respecting `lru_candidates` calls).
  - 3 non-docker tests for the error-handling envelope: AC-5 (sweep exhausted → `CacheBudgetExhaustedError` AFTER all candidates deleted), AC-7 (decorator does NOT rewrap a `ContentHashMismatchError` from the inner store), AC-9 (SELECT-count tally for no-evict vs evict paths).
  - 4 non-docker tests for FDR + log payloads: AC-11 (`evicted_tile_ids` truncated to 5 even when 100 evictions occurred), AC-12 (construction-time `c6.budget.loaded` INFO log + WARN-on-over-budget), and the NFR-reliability "candidate gone mid-sweep" case where `delete_tile` returns False.
  - 1 non-docker NFR test (`reserve_headroom × 10000` no-evict path with a strict p99 ≤ 5 ms ceiling).
  - 3 `@pytest.mark.docker` Tier-2 tests against a real Postgres (composition-root smoke): AC-6 (decorator + `write_tile` end-to-end with near-cap state), AC-8 (real `read_tile_pixels` bumps the LRU clock and changes `lru_candidates` ordering), and AC-10 (synthetic-fill test: 50 MB of writes under a deliberately tight 50 MB pre-eviction headroom; verifies eviction kicks in and disk usage never exceeds the cap).
  - 3 protocol-shape sanity tests (`EvictionResult` is frozen and `total_freed_bytes` derives correctly, the wrapper exposes the underlying store as `_wrapped`, and the decorator passes `tile_exists` / `delete_tile` straight through).
- `tests/unit/c6_tile_cache/test_protocol_conformance.py`: adjusted `_install_fake_postgres_store_module` to provide a working `total_disk_bytes() -> 0` (the prior `NotImplementedError` stub would break `CacheBudgetEnforcer.__init__`, which reads the value for AC-12); and rewrote `test_ac4_build_tile_store_returns_protocol_impl` to recognise the AZ-308 wrapper (`isinstance(store, BudgetEnforcedTileStore)`, `isinstance(store, TileStore)`, `isinstance(store._wrapped, fake_cls)`). No new fakes; the change is local to one shared helper + one test.
- `tests/unit/test_az272_fdr_record_schema.py`: adds a fixture payload for the new `c6.eviction_batch` kind so the AZ-272 per-kind round-trip test covers it.

### Modified (docs)

- `_docs/02_document/contracts/shared_fdr_client/fdr_record_schema.md`: bumped to v1.3.0; added a row for `c6.eviction_batch` (producer `c6_tile_cache.budget`, payload shape, cap-to-5 note) in the v1.0.0 closed-enum table and a change-log entry.
- `_docs/02_document/contracts/c6_tile_cache/tile_store.md`: bumped to v1.1.0 (additive); `CacheBudgetExhaustedError` joins the `TileCacheError` family diagram + change-log entry per the Versioning Rules § "new error variant added to `TileCacheError`".
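
As a rough illustration of the `c6.eviction_batch` payload shape and its cap-to-5 rule, the sketch below builds the four documented keys from a list of evictions. The helper name `build_eviction_batch_payload`, its `max_ids` parameter, and the `(tile_id, disk_bytes)` input shape are hypothetical; only the key names and the cap of 5 come from the report.

```python
def build_eviction_batch_payload(trigger_tile_id, evicted, max_ids=5):
    """Build a bounded batch payload.

    `evicted` is assumed to be a list of (tile_id, disk_bytes) pairs in
    eviction order. The counters cover the whole batch; only the id list
    is truncated, so the record size stays bounded.
    """
    return {
        "trigger_tile_id": trigger_tile_id,
        "freed_bytes": sum(size for _, size in evicted),
        "evicted_count": len(evicted),
        "evicted_tile_ids": [tile_id for tile_id, _ in evicted[:max_ids]],
    }


# 100 evictions of 10 bytes each: counters reflect all 100, ids are capped.
payload = build_eviction_batch_payload(
    "tile-new", [(f"tile-{i}", 10) for i in range(100)]
)
```

Keeping `evicted_count` and `freed_bytes` untruncated means the full batch size can still be read from the record even though only the first five ids are listed.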
## Acceptance criteria coverage

| AC | Test | Status |
|----|------|--------|
| AC-1 No-eviction fast path | `test_ac1_no_eviction_fast_path` | passing |
| AC-2 Single-tile eviction frees enough | `test_ac2_single_tile_eviction_frees_enough` | passing |
| AC-3 Multi-tile eviction iterates LRU candidates | `test_ac3_multi_tile_eviction_iterates_until_target` | passing |
| AC-4 Eviction batches respect `eviction_batch_size` | `test_ac4_eviction_batches_respect_batch_size` | passing |
| AC-5 Insufficient candidates raise `CacheBudgetExhaustedError` | `test_ac5_insufficient_candidates_raise_after_full_sweep` | passing |
| AC-6 `BudgetEnforcedTileStore` decorator integrates with `write_tile` | `test_ac6_decorator_write_tile_triggers_eviction` (Docker) | passing |
| AC-7 Decorator propagates `TileCacheError` unchanged | `test_ac7_decorator_propagates_tilecacheerror_unchanged` | passing |
| AC-8 `read_tile_pixels` updates the LRU clock | `test_ac8_read_tile_pixels_updates_lru_clock` (Docker) | passing |
| AC-9 No-evict path = 1 SELECT; evict path = 1 + N + N | `test_ac9_no_evict_path_uses_single_select` | passing |
| AC-10 10 GB budget enforcement under synthetic load | `test_ac10_synthetic_load_stays_under_cap` (Docker) | passing |
| AC-11 FDR `evicted_tile_ids` capped to 5 | `test_ac11_fdr_evicted_tile_ids_capped_at_five` | passing |
| AC-12 Construction-time disk-bytes report | `test_ac12_construction_emits_budget_loaded_info` + `test_ac12_construction_warns_when_over_budget` | passing |
| NFR-perf no-evict p99 ≤ 5 ms | `test_nfr_perf_no_evict_path_p99_under_5ms` | passing |
| NFR-reliability candidate-gone mid-sweep | `test_nfr_reliability_delete_returns_false_no_op` | passing |

## AC Test Coverage: 12 of 12 covered (+ 2 NFRs + 1 frozen-dataclass shape test)

## Code Review Verdict: PASS

## Auto-Fix Attempts: 1 (ruff `format` + `check`; 8 cosmetic findings auto-resolved: 4 ambiguous `×` characters in comments, 3 unused `noqa: ARG002` directives, 1 unescaped-metacharacter regex in `pytest.raises(match=...)`)

## Stuck Agents: None

## Findings (self-review)

| # | Severity | Category | Location | Note | Resolution |
|---|----------|----------|----------|------|------------|
| 1 | Low | Maintainability | `CacheBudgetEnforcer.__init__` | The ctor runs `self._store.total_disk_bytes()` synchronously to emit the AC-12 startup INFO log. If the metadata store's pool is contended at process start, this blocks the composition-root path. Accepted because the enforcer is constructed once per process and the cost is one indexed SELECT. | Open (Low), accepted as-is. |
| 2 | Low | Test-quality | `test_ac10_synthetic_load_stays_under_cap` | Uses a 50 MB synthetic budget (not the 10 GB production cap) to keep the test reasonable on a dev laptop. The cap-enforcement logic is the same shape; the test verifies the loop terminates correctly and disk usage never exceeds the cap. | Open (Low), accepted as-is. |
| 3 | Low | Test-quality | `test_ac8_read_tile_pixels_updates_lru_clock` | Wall-clock skew between the host (Python) and the Postgres container is sub-second on macOS/Colima, so a real `record_lru_access` UPDATE with the host wall clock can lose to `GREATEST(accessed_at, %s)` against the DB's `DEFAULT now()`. The test pins the LRU clock to a far-future timestamp (`2099-01-01`) via a fixture-local `_FakeClock`; production wiring (`storage_factory`) still injects `WallClock()`. | Open (Low), accepted as-is. |
| 4 | Low | Adjacent-Hygiene | `tests/unit/c6_tile_cache/test_protocol_conformance.py::_FakePostgresFilesystemStore` | The AZ-303 protocol-conformance fake inherits `total_disk_bytes` from `_FullTileMetadataStore`, which raises `NotImplementedError`. Once `build_tile_store` started constructing a `CacheBudgetEnforcer` (which calls `total_disk_bytes` at construction), this stub broke the test. Overrode `total_disk_bytes` on the AZ-308 path to return 0: a minimal change, and no other test using the shared helper changed semantically. | **FIXED** in this batch. |
| 5 | Low | Maintainability | `BudgetEnforcedTileStore._wrapped` | The wrapper exposes the inner store via a private `_wrapped` attribute so tests + future debugging can introspect it. This is documented in the AC-4 protocol-conformance test comment; not part of the public Protocol contract (the Protocol only requires the four `TileStore` methods, which the wrapper provides). | Open (Low), accepted as documented. |

## Tracker

- AZ-308 transitioned to **In Progress** on session start; will be moved to **In Testing** post-commit per `protocols.md`.

## Test suite

- `tests/unit/c6_tile_cache/test_cache_budget_enforcer.py` (18 tests): passing standalone (Tier-2 + Docker Postgres) and as part of the combined c6 suite (193 / 194 passed in the combined run; see below).
- `tests/unit/c6_tile_cache/` (194 tests): 193 passing; the same `test_ac13_read_tile_pixels_warm_latency_p95` flake noted in the AZ-307 batch 29 report (Finding 3 of the AZ-305 batch 28 report) surfaces under combined load. Verified non-regression by `git stash -u` round-trip: with my AZ-308 changes stashed, the same test still fails (`p95 ≈ 8 ms` vs the 5 ms ceiling) in the combined run, and passes 3-of-3 standalone. Not a blocker for AZ-308.
- `tests/unit/test_az272_fdr_record_schema.py`: passing with the new `c6.eviction_batch` kind fixtured.
- Full unit suite (excluding `tests/integration/` and the unrelated, pre-existing c7 `test_ac8_read_host_tuple_on_jetson` that requires `pynvml`): 1267 passed, 8 environment-skipped (CUDA-only, cmake, actionlint), 1 deselected (pynvml).

## Next batch

Cycle 1 advances per the greenfield queue: autodev re-detects the next AZ ticket in the Step 7 batch loop and continues.