[AZ-305] c6 PostgresFilesystemStore: TileStore + TileMetadataStore impl

Adds the production PostgresFilesystemStore implementing both protocols
in a single class: filesystem-backed JPEG I/O (atomic write with SHA-256
sidecar, read-only mmap) plus Postgres-backed metadata (spatial bbox, LRU,
voting, upload bookkeeping). Wires composition via a `from_config`
classmethod.

Key behaviors:
- AC-3 strict reading: the INSERT runs first inside an open transaction,
  so a duplicate-key collision raises `TileMetadataError` BEFORE any byte
  is written, leaving the original file + sidecar byte-identical. The
  atomic sidecar write runs only after the INSERT succeeds, while the
  transaction is still open; the commit then closes it. The compensating
  delete remains as a safety net for the rare commit-after-write failure
  path.
- AC-2 content-hash gate runs before any I/O.
- Construction performs an orphan-file reconciliation scan and emits an
  INFO `c6.store.construct` log with steady-state stats.
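
The AC-3 insert-first ordering can be illustrated with a minimal sketch. The sketch makes three substitutions:
- The standard-library `sqlite3` module stands in for Postgres.
- A plain `write_bytes` call stands in for the atomic JPEG + sidecar writer.
- `write_tile_sketch` and the `tiles` table layout are hypothetical, not the store's real code.

```python
from __future__ import annotations

import sqlite3
from pathlib import Path


class TileMetadataError(Exception):
    """Local stand-in for the project's TileMetadataError."""


def write_tile_sketch(conn: sqlite3.Connection, root: Path, tile_id: str, blob: bytes) -> None:
    path = root / f"{tile_id}.jpg"
    try:
        # 1. INSERT first; sqlite3 opens a transaction implicitly before it.
        #    A duplicate key fails here, before the canonical file is touched.
        conn.execute(
            "INSERT INTO tiles (tile_id, disk_bytes) VALUES (?, ?)",
            (tile_id, len(blob)),
        )
    except sqlite3.IntegrityError as exc:
        conn.rollback()
        raise TileMetadataError(f"duplicate tile_id {tile_id!r}") from exc
    try:
        # 2. Only now touch disk (stand-in for the atomic JPEG + sidecar write).
        path.write_bytes(blob)
        # 3. The commit closes the transaction.
        conn.commit()
    except BaseException:
        # 4. Safety net: roll back and remove the file if the write or commit fails.
        conn.rollback()
        path.unlink(missing_ok=True)
        raise
```

Calling it twice with the same `tile_id` raises `TileMetadataError` on the second call. The first file's bytes stay in place, which is the strict AC-3 invariant.
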

Adds `c6.write` and `c6.write_failed` FDR record kinds (schema v1.1.0,
forward-compatible) and a thin operator CLI at
`c6_tile_cache.tools dump` for inspection.

Dependencies: adds `psycopg-pool>=3.2,<4.0` for the connection pool used
on the F3 read-hot path.

Tests: 25 new tests for c6_tile_cache cover AC-1..AC-15 plus
MmapTilePixelHandle + helper round-trips. Full Tier-2 unit suite: 1215
passed, 8 skipped, 1 pre-existing unrelated failure
(`test_ac8_read_host_tuple_on_jetson`, a Jetson-only test that fails on
macOS because `pynvml` is missing).

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-12 18:01:50 +03:00
parent bf33b94260
commit d1c1cd9ab4
14 changed files with 2382 additions and 18 deletions
@@ -0,0 +1,151 @@
# Batch 28 / Cycle 1 — Implementation Report
**Date**: 2026-05-12
**Tasks**: AZ-305 (C6 PostgresFilesystemStore — TileStore + TileMetadataStore production impl)
**Story points landed**: 5
**Status**: complete (AZ-305 → In Testing)
## Scope summary
Single-task batch that lands the production `PostgresFilesystemStore`, one
class that satisfies BOTH `TileStore` (filesystem-backed JPEG I/O
byte-identical to `satellite-provider`) and `TileMetadataStore`
(Postgres-backed spatial / LRU / voting state). Owns the full insert
path (content-hash gate, row insert inside an open transaction, atomic
write + SHA-256 sidecar via AZ-280, commit, compensating delete on
failure), the read
path (`MmapTilePixelHandle` read-only mmap, btree-indexed bbox query,
LRU access stamp), and bookkeeping (`mark_uploaded`,
`update_voting_status`, `lru_candidates`, `total_disk_bytes`). Wires the
freshness-gate call site (pass-through hook for AZ-307 to replace) and
exposes the LRU primitives AZ-308 will consume.
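
Below is a minimal sketch of the content-hash gate followed by an atomic write with a SHA-256 sidecar. It assumes the sidecar holds the hex digest at `<file>.sha256`. The real AZ-280 helper's naming, format and function names may differ. `write_with_sidecar` and `ContentHashMismatchError` are hypothetical.

```python
from __future__ import annotations

import hashlib
import os
import tempfile
from pathlib import Path


class ContentHashMismatchError(ValueError):
    """Hypothetical stand-in for the store's hash-gate error."""


def _atomic_write(path: Path, data: bytes) -> None:
    # Write to a temp file in the same directory, fsync it, then rename it over
    # the target, so readers see the old file or the new one, never a torn write.
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=f".{path.name}.")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)
    except BaseException:
        Path(tmp).unlink(missing_ok=True)
        raise


def write_with_sidecar(path: Path, data: bytes, expected_sha256: str) -> None:
    # Content-hash gate: runs before any filesystem I/O.
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_sha256:
        raise ContentHashMismatchError(f"expected {expected_sha256}, got {actual}")
    _atomic_write(path, data)
    # Sidecar stores the hex digest; the ".sha256" suffix is an assumption.
    _atomic_write(path.with_name(path.name + ".sha256"), actual.encode("ascii"))
```

Because the hash check runs first, a rejected blob leaves no files behind. The temp-file-then-`os.replace` pattern means no reader can observe a half-written tile.
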
The class is invoked from `storage_factory` via a new `from_config`
classmethod that resolves the `psycopg_pool.ConnectionPool`, the
producer-local `FdrClient` (via `make_fdr_client`), and the project
logger. `__init__` itself takes explicit injected dependencies so unit
tests can substitute the `FakeFdrSink`, a `tmp_path` root, and a
test-managed pool without touching the composition root.
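
The split between an injection-only `__init__` and a `from_config` composition entry point might look like the sketch below. All names here are hypothetical:
- `StoreSketch`, `StoreConfigSketch`, `_build_pool` and `_build_fdr_client` are illustrative placeholders.
- A real build would create a `psycopg_pool.ConnectionPool` and call the project's `make_fdr_client`.
- The logger name is assumed.

```python
from __future__ import annotations

import logging
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Protocol


class FdrSink(Protocol):
    def emit(self, kind: str, payload: dict[str, Any]) -> None: ...


class ListFdrSink:
    """In-memory sink, in the spirit of the test suite's FakeFdrSink."""

    def __init__(self) -> None:
        self.records: list[tuple[str, dict[str, Any]]] = []

    def emit(self, kind: str, payload: dict[str, Any]) -> None:
        self.records.append((kind, payload))


@dataclass(frozen=True)
class StoreConfigSketch:
    tiles_dir: Path
    postgres_pool_size: int = 4


def _build_pool(size: int) -> Any:
    # Placeholder: a real build would return
    # psycopg_pool.ConnectionPool(conninfo, max_size=size).
    return {"max_size": size}


def _build_fdr_client() -> FdrSink:
    # Placeholder for the project's make_fdr_client().
    return ListFdrSink()


class StoreSketch:
    def __init__(self, *, pool: Any, fdr: FdrSink, logger: logging.Logger, tiles_dir: Path) -> None:
        # Every dependency arrives explicitly, so tests can pass fakes.
        self.pool = pool
        self.fdr = fdr
        self.logger = logger
        self.tiles_dir = tiles_dir

    @classmethod
    def from_config(cls, config: StoreConfigSketch) -> StoreSketch:
        # Composition-root entry point: real dependencies are resolved once, here.
        return cls(
            pool=_build_pool(config.postgres_pool_size),
            fdr=_build_fdr_client(),
            logger=logging.getLogger("c6_tile_cache"),
            tiles_dir=config.tiles_dir,
        )
```

Unit tests construct `StoreSketch(...)` directly with an in-memory sink. Only the runtime root goes through `from_config`.
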
## Files added / modified
### New (production)
- `src/gps_denied_onboard/components/c6_tile_cache/postgres_filesystem_store.py`:
`MmapTilePixelHandle` (read-only `PROT_READ` mmap returning a
`.toreadonly()` `memoryview`); `PostgresFilesystemStore` with explicit
dependency-injected `__init__` and a `from_config` classmethod for the
composition root. Implements `read_tile_pixels`, `write_tile`,
`tile_exists`, `delete_tile`, `query_by_bbox`, `insert_metadata`,
`update_voting_status`, `mark_uploaded`, `pending_uploads`,
`record_lru_access`, `lru_candidates`, `total_disk_bytes`,
`get_by_id`. All third-party exceptions (`psycopg.Error`, `OSError`,
`Sha256SidecarError`) are rewrapped into the `TileCacheError` family.
Construction runs an O(N) orphan-file reconciliation scan against the
`tiles_dir` and emits an INFO `c6.store.construct` log with the
steady-state row count and disk bytes.
- `src/gps_denied_onboard/components/c6_tile_cache/tools.py`:
operator-side CLI (`python -m gps_denied_onboard.components.c6_tile_cache.tools dump --zoom Z --lat LAT --lon LON [-o PATH]`)
that opens the production store via `load_config()` +
`PostgresFilesystemStore.from_config()`, reads the tile via the mmap
handle, and writes the JPEG body to stdout or the supplied file.
Intentionally no formal contract — thin shell over `read_tile_pixels`.
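
A read-only mmap handle of the kind described for `MmapTilePixelHandle` could be sketched as below. This is an illustration, not the module's code:
- `MmapTilePixelHandleSketch` is a hypothetical name.
- `TileFsError` is a local stand-in.
- It uses `mmap.ACCESS_READ`, the portable equivalent of a `PROT_READ` mapping, rather than the Unix-only `prot=` argument.

```python
from __future__ import annotations

import mmap
from pathlib import Path


class TileFsError(Exception):
    """Local stand-in for the project's TileFsError."""


class MmapTilePixelHandleSketch:
    """Read-only mmap of a tile file, exposed as a read-only memoryview."""

    def __init__(self, path: Path) -> None:
        try:
            with open(path, "rb") as f:
                # The mapping stays valid after the file object is closed.
                self._mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        except FileNotFoundError as exc:
            raise TileFsError(f"tile file missing: {path}") from exc
        except ValueError as exc:  # mmap rejects zero-length files
            raise TileFsError(f"tile file empty: {path}") from exc
        self._base = memoryview(self._mm)
        self.view = self._base.toreadonly()

    def close(self) -> None:
        # Release the exported views first, or mmap.close() raises BufferError.
        self.view.release()
        self._base.release()
        self._mm.close()

    def __enter__(self) -> memoryview:
        return self.view

    def __exit__(self, *exc_info: object) -> None:
        self.close()
```

Release the views before calling `mmap.close()`. While a buffer is still exported, CPython refuses to close the mapping.
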
### Modified (production)
- `src/gps_denied_onboard/components/c6_tile_cache/config.py` — added
`postgres_pool_size: int = 4` to `C6TileCacheConfig` with `> 0`
validation per AZ-305 scope.
- `src/gps_denied_onboard/fdr_client/records.py` — added
`c6.write` (`tile_id, source, disk_bytes, content_sha256`) and
`c6.write_failed` (`tile_id, source, reason, error_class, message`)
entries to `KNOWN_PAYLOAD_KEYS`. The parser is forward-compatible
by design (unknown kinds parse opaquely), so v1.0 readers do not
break — but the new entries put the new kinds on the validated /
monitored hot path.
- `src/gps_denied_onboard/runtime_root/storage_factory.py`:
`build_tile_store` and `build_tile_metadata_store` now dispatch via
`PostgresFilesystemStore.from_config(config)` so the runtime root no
longer needs to know about pool / FdrClient / logger wiring.
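
The sketch below shows how a forward-compatible parser can validate known kinds while passing unknown ones through opaquely. It rests on these assumptions:
- `parse_record`, `ParsedRecord` and `RecordSchemaError` are hypothetical names.
- The real parser may handle extra or missing keys differently.
- Only the two key sets come from the entries listed above.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Mapping

# Key sets taken from the c6.write / c6.write_failed entries.
KNOWN_PAYLOAD_KEYS_SKETCH: dict[str, frozenset[str]] = {
    "c6.write": frozenset({"tile_id", "source", "disk_bytes", "content_sha256"}),
    "c6.write_failed": frozenset({"tile_id", "source", "reason", "error_class", "message"}),
}


class RecordSchemaError(ValueError):
    """Hypothetical error for a known kind with missing payload keys."""


@dataclass(frozen=True)
class ParsedRecord:
    kind: str
    payload: dict[str, Any]
    validated: bool  # False: unknown kind, parsed opaquely


def parse_record(kind: str, payload: Mapping[str, Any]) -> ParsedRecord:
    expected = KNOWN_PAYLOAD_KEYS_SKETCH.get(kind)
    if expected is None:
        # Forward compatibility: keep unknown kinds instead of rejecting them,
        # so an older reader does not break on records from a newer writer.
        return ParsedRecord(kind, dict(payload), validated=False)
    missing = sorted(expected.difference(payload))
    if missing:
        raise RecordSchemaError(f"{kind}: missing keys {missing}")
    return ParsedRecord(kind, dict(payload), validated=True)
```
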
### Modified (tests)
- `tests/unit/c6_tile_cache/test_postgres_filesystem_store.py`:
**NEW** suite of 25 tests:
  - 5 non-docker unit tests: 3 for `MmapTilePixelHandle` (read-only view,
    missing-file `TileFsError`, empty-file `TileFsError`), plus a
    `_quality_to_dict` round-trip and the `_row_to_metadata` NULL-voting →
    `TRUSTED` normalisation.
- 15 `@pytest.mark.docker` tests covering AC-1..AC-15 against a
real Postgres + `tmp_path` filesystem.
- 5 bonus tests covering `insert_metadata` validation, `get_by_id`
absence, and per-flight separation via different `flight_id`s.
- `tests/unit/c6_tile_cache/test_protocol_conformance.py` — the AZ-303
fake `PostgresFilesystemStore` now exposes a `from_config` classmethod
so the factory dispatch keeps working; the AC-5 "module missing"
branch is now exercised by patching the lazy import site to raise
`ModuleNotFoundError`.
- `tests/unit/test_az272_fdr_record_schema.py` — added fixture payloads
for the new `c6.write` and `c6.write_failed` kinds so the per-kind
round-trip test (AC-1 of AZ-272) covers them.
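
The "module missing" branch can be exercised by patching the lazy import site, as the conformance test does. This is a hypothetical sketch under these assumptions:
- The factory is assumed to resolve the module through `importlib.import_module`; the real `storage_factory` may import differently.
- `build_tile_store_sketch` and `StoreUnavailableError` are illustrative names.

```python
import importlib
from unittest import mock

STORE_MODULE = "gps_denied_onboard.components.c6_tile_cache.postgres_filesystem_store"


class StoreUnavailableError(RuntimeError):
    """Hypothetical error the factory raises when the impl module is absent."""


def build_tile_store_sketch(config: object):
    # Lazy import: the module is only resolved when the factory is called.
    try:
        module = importlib.import_module(STORE_MODULE)
    except ModuleNotFoundError as exc:
        raise StoreUnavailableError(f"{STORE_MODULE} is not importable") from exc
    return module.PostgresFilesystemStore.from_config(config)


def test_missing_module_branch() -> None:
    # Patch the import site so the branch fires even when the module exists.
    boom = ModuleNotFoundError(f"No module named {STORE_MODULE!r}")
    with mock.patch.object(importlib, "import_module", side_effect=boom):
        try:
            build_tile_store_sketch(config=None)
        except StoreUnavailableError as exc:
            assert exc.__cause__ is boom
            return
    raise AssertionError("expected StoreUnavailableError")
```
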
### Modified (docs)
- `_docs/02_document/contracts/shared_fdr_client/fdr_record_schema.md`:
bumped to v1.1.0 (non-breaking, forward-compat); added rows for the
two new kinds and a change-log entry.
### Modified (build)
- `pyproject.toml` — added `psycopg-pool>=3.2,<4.0` to dependencies
(previously only `psycopg[binary]` was pinned; the impl needs the
pool to amortise checkout latency on the F3 read path per Risk 3 of
the AZ-305 spec).
## Acceptance criteria coverage
| AC | Test | Status |
|----|------|--------|
| AC-1 round-trip byte-identical | `test_ac1_write_read_round_trip_byte_identical` | passing |
| AC-2 hash mismatch rejected before I/O | `test_ac2_content_hash_mismatch_rejects_before_io` | passing |
| AC-3 duplicate key + compensating delete | `test_ac3_duplicate_key_raises_metadata_error_with_compensating_delete` | passing |
| AC-4 row without file fails fast | `test_ac4_row_without_file_raises_metadata_error` | passing |
| AC-5 bbox deterministic order | `test_ac5_query_by_bbox_returns_deterministic_results` | passing |
| AC-6 bbox filters | `test_ac6_query_by_bbox_honours_filters` | passing |
| AC-7 voting forward transitions | `test_ac7_update_voting_status_enforces_forward_transitions` | passing |
| AC-8 mark_uploaded + pending_uploads | `test_ac8_mark_uploaded_removes_from_pending` | passing |
| AC-9 LRU monotonic | `test_ac9_record_lru_access_is_monotonic` | passing |
| AC-10 disk bytes excludes rejected | `test_ac10_total_disk_bytes_excludes_rejected` | passing |
| AC-11 delete_tile idempotent | `test_ac11_delete_tile_is_idempotent` | passing |
| AC-12 third-party errors rewrapped | `test_ac12_third_party_exceptions_rewrapped` | passing |
| AC-13 warm read p95 budget | `test_ac13_read_tile_pixels_warm_latency_p95` | passing |
| AC-14 5 Hz write burst | `test_ac14_write_tile_sustains_burst_without_drops` | passing |
| AC-15 FDR record on success/failure | `test_ac15_fdr_record_on_write_success_and_failure` | passing |
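
The LRU and disk-accounting invariants behind AC-9 and AC-10 can be expressed in SQL. This sketch uses `sqlite3` so it runs without a server. In Postgres, the two-argument `MAX` would be `GREATEST`. The table layout and the `'TRUSTED'` / `'REJECTED'` status strings are assumptions, not the store's schema.

```python
from __future__ import annotations

import sqlite3

SCHEMA = """
CREATE TABLE tiles (
    tile_id       TEXT PRIMARY KEY,
    disk_bytes    INTEGER NOT NULL,
    voting_status TEXT,                 -- NULL is read as TRUSTED
    last_access   REAL NOT NULL DEFAULT 0
)
"""


def record_lru_access(conn: sqlite3.Connection, tile_id: str, now: float) -> None:
    # Two-argument MAX() keeps the stamp monotonic: a stale clock reading never
    # moves it backwards.
    conn.execute(
        "UPDATE tiles SET last_access = MAX(last_access, ?) WHERE tile_id = ?",
        (now, tile_id),
    )


def total_disk_bytes(conn: sqlite3.Connection) -> int:
    # Rejected tiles are excluded from the disk budget; NULL voting counts as TRUSTED.
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(disk_bytes), 0) FROM tiles "
        "WHERE COALESCE(voting_status, 'TRUSTED') <> 'REJECTED'"
    ).fetchone()
    return total


def lru_candidates(conn: sqlite3.Connection, limit: int) -> list[str]:
    # Oldest access first; tile_id breaks ties so the order is deterministic.
    rows = conn.execute(
        "SELECT tile_id FROM tiles ORDER BY last_access ASC, tile_id ASC LIMIT ?",
        (limit,),
    )
    return [tile_id for (tile_id,) in rows]
```
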
## AC Test Coverage: 15 of 15 covered
## Code Review Verdict: PASS
## Auto-Fix Attempts: 1 (ruff `--fix`; 22 of 22 findings auto-resolved) + 1 user-requested fix (AC-3 strict-reading)
## Stuck Agents: None
## Findings (self-review)
| # | Severity | Category | Location | Note | Resolution |
|---|----------|----------|----------|------|------------|
| 1 | Medium | Spec-Gap | `postgres_filesystem_store.py::_write_tile_impl` | AC-3's strictest reading required the original row + file to be byte-identical after a duplicate-key collision. Original impl wrote the sidecar BEFORE the row insert, so a duplicate fired the comp-delete on the freshly overwritten file. | **FIXED** in this batch (user chose `fix_now`): `_write_tile_impl` was reordered — INSERT now runs first inside an open transaction; only on success does the atomic sidecar write touch the canonical path; the commit then closes the transaction. Duplicate-key collisions now raise `TileMetadataError` BEFORE any byte hits disk, leaving the original file untouched. Comp-delete is retained for the (extremely rare) commit-after-write-failure path. AC-3 test asserts the strict invariant: original file bytes + sidecar are byte-identical, and `read_tile_pixels` still returns the original `blob_a`. |
| 2 | Low | Maintainability | `postgres_filesystem_store.py::_emit_write_failed` | The failure path calls `self._tile_xy()` to derive the canonical UUID for the FDR record. If `_tile_xy()` itself ever raises (it shouldn't — `TileId.__post_init__` validates lat/lon at construction), the FDR record would be lost and the exception would mask the original write-time error. Pre-validation in `TileId` keeps this safe today; revisit when `WgsConverter` gains a per-call failure mode. | Open (Low) — accepted as-is. |
| 3 | Low | Test-quality | `test_ac13_read_tile_pixels_warm_latency_p95` | The spec quotes a 0.5 ms p95 target with a 5 ms failure threshold. The test asserts only the failure threshold so it stays useful on a heterogeneous CI host; the soft 0.5 ms goal is tracked outside of this test (e.g., performance dashboards). | Open (Low) — accepted as-is. |
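
The p95 check from finding 3 is sketched below. It asserts only the 5 ms hard threshold and leaves the 0.5 ms soft target as a constant for reporting. The warm-up count, iteration count and helper names are assumptions. This report does not show the real test's sampling strategy.

```python
from __future__ import annotations

import statistics
import time
from typing import Callable

FAIL_THRESHOLD_MS = 5.0  # hard failure threshold quoted in the spec
SOFT_TARGET_MS = 0.5     # soft goal, tracked outside the test


def p95(samples_ms: list[float]) -> float:
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    return statistics.quantiles(samples_ms, n=100)[94]


def warm_latencies_ms(read: Callable[[], object], iterations: int = 200, warmup: int = 20) -> list[float]:
    for _ in range(warmup):  # warm the page cache before timing
        read()
    samples: list[float] = []
    for _ in range(iterations):
        start = time.perf_counter()
        read()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def assert_warm_read_budget(read: Callable[[], object]) -> float:
    observed = p95(warm_latencies_ms(read))
    # Only the hard threshold is asserted, so heterogeneous CI hosts stay green.
    assert observed < FAIL_THRESHOLD_MS, f"p95 {observed:.3f} ms >= {FAIL_THRESHOLD_MS} ms"
    return observed
```
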
## Tracker
- AZ-305 transitioned to **In Progress** on session start; will be moved to **In Testing** post-commit per `protocols.md`.
## Test suite
- `tests/unit/c6_tile_cache/` (128 tests) — passing at Tier-2.
- Full Tier-2 suite (`pytest tests/unit`): 1215 passed, 8 skipped, 1 pre-existing failure (`test_ac8_read_host_tuple_on_jetson` — needs `pynvml`, Jetson-only, unrelated to AZ-305 — confirmed pre-existing on `bf33b94` by `git stash` round-trip).
## Next batch
All AZ-305 work is complete. Cycle 1 has no remaining batches in the
greenfield queue, so autodev advances to the cycle-end gate (Step 7's
batch-loop exit → Step 15 Product Implementation Completeness Gate, or
the next sub-step the active flow defines).