Oleksandr Bezdieniezhnykh d1c1cd9ab4 [AZ-305] c6 PostgresFilesystemStore: TileStore + TileMetadataStore impl
Adds the production PostgresFilesystemStore implementing both protocols
in a single class. Filesystem-backed JPEG I/O (atomic sidecar write,
read-only mmap) + Postgres-backed metadata (spatial bbox, LRU, voting,
upload bookkeeping). Wires composition via `from_config` classmethod.

Key behaviors:
- AC-3 strict reading: INSERT runs first inside an open transaction;
  duplicate-key collisions raise `TileMetadataError` BEFORE any byte is
  written, leaving the original file + sidecar byte-identical. Atomic
  sidecar write happens inside the same transaction; commit closes it.
  Comp-delete remains as a safety net for the rare commit-after-write
  failure path.
- AC-2 content-hash gate runs before any I/O.
- Construction performs an orphan-file reconciliation scan and emits an
  INFO `c6.store.construct` log with steady-state stats.

Adds `c6.write` and `c6.write_failed` FDR record kinds (schema v1.1.0,
forward-compatible) and a thin operator CLI at
`c6_tile_cache.tools dump` for inspection.

Dependencies: adds `psycopg-pool>=3.2,<4.0` for the connection pool used
on the F3 read-hot path.

Tests: 25 new tests for c6_tile_cache cover AC-1..AC-15 plus
MmapTilePixelHandle + helper round-trips. Full Tier-2 unit suite passes
(1215 passed, 8 skipped, 1 pre-existing unrelated failure
`test_ac8_read_host_tuple_on_jetson` — missing `pynvml` on macOS,
Jetson-only).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 18:01:50 +03:00


Batch 28 / Cycle 1 — Implementation Report

Date: 2026-05-12
Tasks: AZ-305 (C6 PostgresFilesystemStore — TileStore + TileMetadataStore production impl)
Story points landed: 5
Status: complete (AZ-305 → In Testing)

Scope summary

Single-task batch landing the production PostgresFilesystemStore — the single class that satisfies BOTH TileStore (filesystem-backed JPEG I/O byte-identical to satellite-provider) and TileMetadataStore (Postgres-backed spatial / LRU / voting state). Owns the full insert path (atomic-write + SHA-256 sidecar via AZ-280, content-hash gate, single-transaction row insert, compensating delete on failure), the read path (MmapTilePixelHandle read-only mmap, btree-indexed bbox query, LRU access stamp), and bookkeeping (mark_uploaded, update_voting_status, lru_candidates, total_disk_bytes). Wires the freshness-gate call site (pass-through hook for AZ-307 to replace) and exposes the LRU primitives AZ-308 will consume.
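The ordering of the insert path described above (hash gate, then row insert, then atomic file write, then commit) can be sketched as follows. This is a minimal illustration under stated assumptions, not the production code: `sqlite3` stands in for Postgres so the example is self-contained, the SHA-256 sidecar is omitted, and `write_tile`, `atomic_write`, and the `tiles` table layout are hypothetical names chosen for the sketch.

```python
import hashlib
import os
import sqlite3
import tempfile
from pathlib import Path


class TileMetadataError(Exception):
    """Stand-in for the store's metadata error type."""


def atomic_write(path: Path, data: bytes) -> None:
    # Write to a temp file in the same directory, then rename over the target.
    fd, tmp = tempfile.mkstemp(dir=path.parent)
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def write_tile(conn: sqlite3.Connection, root: Path, tile_id: str,
               blob: bytes, expected_sha256: str) -> None:
    # 1. Content-hash gate: reject before touching the database or the disk.
    if hashlib.sha256(blob).hexdigest() != expected_sha256:
        raise ValueError("content hash mismatch")
    path = root / f"{tile_id}.jpg"
    wrote = False
    try:
        # 2. INSERT first; a duplicate key aborts before any byte is written.
        conn.execute(
            "INSERT INTO tiles (tile_id, sha256) VALUES (?, ?)",
            (tile_id, expected_sha256),
        )
        # 3. Only after a successful insert does the canonical path change.
        atomic_write(path, blob)
        wrote = True
        # 4. The commit closes the transaction.
        conn.commit()
    except sqlite3.IntegrityError as exc:
        conn.rollback()
        raise TileMetadataError(f"duplicate tile {tile_id}") from exc
    except Exception:
        conn.rollback()
        if wrote:
            # Compensating delete for the commit-after-write failure path.
            path.unlink(missing_ok=True)
        raise
```

With this ordering, a second write for an existing key raises before step 3, so the file stored by the first write is never touched.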

The class is invoked from storage_factory via a new from_config classmethod that resolves the psycopg_pool.ConnectionPool, the producer-local FdrClient (via make_fdr_client), and the project logger. __init__ itself takes explicit injected dependencies so unit tests can substitute the FakeFdrSink, a tmp_path root, and a test-managed pool without touching the composition root.
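The split between an injected `__init__` and a `from_config` classmethod might look roughly like the sketch below. `StoreConfig`, `Store`, and the `pool_factory` / `fdr_factory` parameters are hypothetical names used only for illustration; the real `from_config` resolves the pool and FDR client itself, whereas here the factories are passed in so the example runs without Postgres.

```python
import logging
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class StoreConfig:
    """Hypothetical stand-in for the real config object."""
    tiles_dir: str
    postgres_dsn: str
    postgres_pool_size: int = 4


class Store:
    def __init__(self, tiles_dir: str, pool: Any, fdr_client: Any,
                 logger: logging.Logger) -> None:
        # Explicit dependencies: tests can pass fakes for every collaborator.
        self.tiles_dir = tiles_dir
        self.pool = pool
        self.fdr_client = fdr_client
        self.logger = logger

    @classmethod
    def from_config(cls, config: StoreConfig, *,
                    pool_factory: Callable[[str, int], Any],
                    fdr_factory: Callable[[], Any]) -> "Store":
        # Composition-root entry point: resolve collaborators, then delegate
        # to the plain constructor so there is a single construction path.
        pool = pool_factory(config.postgres_dsn, config.postgres_pool_size)
        return cls(config.tiles_dir, pool, fdr_factory(),
                   logging.getLogger("c6.store"))
```

A unit test would call `Store(...)` directly with fakes, while the runtime root would only ever call `Store.from_config(config, ...)`.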

Files added / modified

New (production)

  • src/gps_denied_onboard/components/c6_tile_cache/postgres_filesystem_store.py — MmapTilePixelHandle (read-only PROT_READ mmap returning a .toreadonly() memoryview); PostgresFilesystemStore with explicit dependency-injected __init__ and a from_config classmethod for the composition root. Implements read_tile_pixels, write_tile, tile_exists, delete_tile, query_by_bbox, insert_metadata, update_voting_status, mark_uploaded, pending_uploads, record_lru_access, lru_candidates, total_disk_bytes, get_by_id. All third-party exceptions (psycopg.Error, OSError, Sha256SidecarError) are rewrapped into the TileCacheError family. Construction runs an O(N) orphan-file reconciliation scan against the tiles_dir and emits an INFO c6.store.construct log with the steady-state row count and disk bytes.
  • src/gps_denied_onboard/components/c6_tile_cache/tools.py — operator-side CLI (python -m gps_denied_onboard.components.c6_tile_cache.tools dump --zoom Z --lat LAT --lon LON [-o PATH]) that opens the production store via load_config() + PostgresFilesystemStore.from_config(), reads the tile via the mmap handle, and writes the JPEG body to stdout or the supplied file. Intentionally no formal contract — thin shell over read_tile_pixels.
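A read-only mmap handle of the kind described for MmapTilePixelHandle could be sketched as below. This is an illustrative assumption about its shape, not the actual class: `ReadOnlyTileHandle` is a hypothetical name, and the sketch uses `mmap.ACCESS_READ`, the portable spelling of a read-only mapping, rather than passing `PROT_READ` directly.

```python
import mmap
import os
from pathlib import Path


class TileFsError(Exception):
    """Stand-in for the store's filesystem error type."""


class ReadOnlyTileHandle:
    """Hypothetical handle: maps a tile file read-only and exposes a memoryview."""

    def __init__(self, path: Path) -> None:
        try:
            fh = open(path, "rb")
        except OSError as exc:
            raise TileFsError(f"cannot open {path}") from exc
        try:
            # mmap refuses zero-length files, so report that case explicitly.
            if os.fstat(fh.fileno()).st_size == 0:
                raise TileFsError(f"empty tile file {path}")
            self._map = mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ)
        except OSError as exc:
            raise TileFsError(f"cannot map {path}") from exc
        finally:
            fh.close()  # the mapping stays valid after the descriptor is closed
        self.view = memoryview(self._map).toreadonly()

    def close(self) -> None:
        # Release the exported buffer first, otherwise closing the map fails.
        self.view.release()
        self._map.close()

    def __enter__(self) -> "ReadOnlyTileHandle":
        return self

    def __exit__(self, *exc_info: object) -> None:
        self.close()
```

Callers read through `handle.view` without copying the file into Python memory; any attempt to assign into the view raises `TypeError`.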

Modified (production)

  • src/gps_denied_onboard/components/c6_tile_cache/config.py — added postgres_pool_size: int = 4 to C6TileCacheConfig with > 0 validation per AZ-305 scope.
  • src/gps_denied_onboard/fdr_client/records.py — added c6.write (tile_id, source, disk_bytes, content_sha256) and c6.write_failed (tile_id, source, reason, error_class, message) entries to KNOWN_PAYLOAD_KEYS. The parser is forward-compatible by design (unknown kinds parse opaquely), so v1.0 readers do not break — but the new entries put the new kinds on the validated / monitored hot path.
  • src/gps_denied_onboard/runtime_root/storage_factory.py — build_tile_store and build_tile_metadata_store now dispatch via PostgresFilesystemStore.from_config(config) so the runtime root no longer needs to know about pool / FdrClient / logger wiring.
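The relationship between registered payload keys and forward-compatible parsing can be illustrated with a small sketch. The key sets below are taken from the records.py entry above; the `validate_payload` function and its return convention are assumptions made for illustration and may not match the real parser.

```python
from typing import Any, Mapping

# Key sets as listed for the two new record kinds.
KNOWN_PAYLOAD_KEYS = {
    "c6.write": frozenset({"tile_id", "source", "disk_bytes", "content_sha256"}),
    "c6.write_failed": frozenset(
        {"tile_id", "source", "reason", "error_class", "message"}
    ),
}


def validate_payload(kind: str, payload: Mapping[str, Any]) -> bool:
    """Return True if the payload was validated, False if the kind is unknown.

    Unknown kinds are accepted opaquely (forward compatibility); known kinds
    must carry every registered key.
    """
    expected = KNOWN_PAYLOAD_KEYS.get(kind)
    if expected is None:
        return False  # opaque: an older reader would take this branch
    missing = expected - set(payload)
    if missing:
        raise ValueError(f"{kind} payload missing keys: {sorted(missing)}")
    return True
```

Under this model, registering a kind only moves it from the opaque branch to the validated one, which is why adding entries does not break older readers.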

Modified (tests)

  • tests/unit/c6_tile_cache/test_postgres_filesystem_store.py — NEW suite of 25 tests:
    • 5 non-docker unit tests for MmapTilePixelHandle (read-only view, missing-file TileFsError, empty-file TileFsError), _quality_to_dict round-trip, and _row_to_metadata NULL-voting → TRUSTED normalisation.
    • 15 @pytest.mark.docker tests covering AC-1..AC-15 against a real Postgres + tmp_path filesystem.
    • 5 bonus tests covering insert_metadata validation, get_by_id absence, and per-flight separation via different flight_ids.
  • tests/unit/c6_tile_cache/test_protocol_conformance.py — the AZ-303 fake PostgresFilesystemStore now exposes a from_config classmethod so the factory dispatch keeps working; the AC-5 "module missing" branch is now exercised by patching the lazy import site to raise ModuleNotFoundError.
  • tests/unit/test_az272_fdr_record_schema.py — added fixture payloads for the new c6.write and c6.write_failed kinds so the per-kind round-trip test (AC-1 of AZ-272) covers them.
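Exercising a "module missing" branch by patching a lazy import site, as mentioned for test_protocol_conformance.py, can be sketched as follows. Everything here is hypothetical: the module name `hypothetical_store_module`, the `StoreUnavailableError` type, and the behaviour of `build_tile_store` on a missing module are assumptions for illustration, since the report does not state what the real factory does in that branch.

```python
import importlib
from unittest import mock


class StoreUnavailableError(RuntimeError):
    """Hypothetical error raised when the store module cannot be imported."""


def build_tile_store(config):
    try:
        # Lazy import site: resolved at call time, so tests can patch it.
        module = importlib.import_module("hypothetical_store_module")
    except ModuleNotFoundError as exc:
        raise StoreUnavailableError("store module not installed") from exc
    return module.PostgresFilesystemStore.from_config(config)


def check_module_missing_branch() -> bool:
    # Force the import site to fail, then confirm the factory reports it.
    with mock.patch.object(
        importlib, "import_module", side_effect=ModuleNotFoundError("boom")
    ):
        try:
            build_tile_store(config={})
        except StoreUnavailableError:
            return True
    return False
```

Patching the import function rather than uninstalling the package keeps the test hermetic and leaves the interpreter's module cache untouched.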

Modified (docs)

  • _docs/02_document/contracts/shared_fdr_client/fdr_record_schema.md — bumped to v1.1.0 (non-breaking, forward-compat); added rows for the two new kinds and a change-log entry.

Modified (build)

  • pyproject.toml — added psycopg-pool>=3.2,<4.0 to dependencies (previously only psycopg[binary] was pinned; the impl needs the pool to amortise checkout latency on the F3 read path per Risk 3 of the AZ-305 spec).

Acceptance criteria coverage

AC | Test | Status
AC-1 round-trip byte-identical | test_ac1_write_read_round_trip_byte_identical | passing
AC-2 hash mismatch rejected before I/O | test_ac2_content_hash_mismatch_rejects_before_io | passing
AC-3 duplicate key + compensating delete | test_ac3_duplicate_key_raises_metadata_error_with_compensating_delete | passing
AC-4 row without file fails fast | test_ac4_row_without_file_raises_metadata_error | passing
AC-5 bbox deterministic order | test_ac5_query_by_bbox_returns_deterministic_results | passing
AC-6 bbox filters | test_ac6_query_by_bbox_honours_filters | passing
AC-7 voting forward transitions | test_ac7_update_voting_status_enforces_forward_transitions | passing
AC-8 mark_uploaded + pending_uploads | test_ac8_mark_uploaded_removes_from_pending | passing
AC-9 LRU monotonic | test_ac9_record_lru_access_is_monotonic | passing
AC-10 disk bytes excludes rejected | test_ac10_total_disk_bytes_excludes_rejected | passing
AC-11 delete_tile idempotent | test_ac11_delete_tile_is_idempotent | passing
AC-12 third-party errors rewrapped | test_ac12_third_party_exceptions_rewrapped | passing
AC-13 warm read p95 budget | test_ac13_read_tile_pixels_warm_latency_p95 | passing
AC-14 5 Hz write burst | test_ac14_write_tile_sustains_burst_without_drops | passing
AC-15 FDR record on success/failure | test_ac15_fdr_record_on_write_success_and_failure | passing

AC Test Coverage: 15 of 15 covered

Code Review Verdict: PASS

Auto-Fix Attempts: 1 (ruff --fix; 22 of 22 findings auto-resolved) + 1 user-requested fix (AC-3 strict-reading)

Stuck Agents: None

Findings (self-review)

# | Severity | Category | Location | Note | Resolution
1 | Medium | Spec-Gap | postgres_filesystem_store.py::_write_tile_impl | AC-3's strictest reading required the original row + file to be byte-identical after a duplicate-key collision. Original impl wrote the sidecar BEFORE the row insert, so a duplicate fired the comp-delete on the freshly overwritten file. | FIXED in this batch (user chose fix_now): _write_tile_impl was reordered — INSERT now runs first inside an open transaction; only on success does the atomic sidecar write touch the canonical path; the commit then closes the transaction. Duplicate-key collisions now raise TileMetadataError BEFORE any byte hits disk, leaving the original file untouched. Comp-delete is retained for the (extremely rare) commit-after-write-failure path. AC-3 test asserts the strict invariant: original file bytes + sidecar are byte-identical, and read_tile_pixels still returns the original blob_a.
2 | Low | Maintainability | postgres_filesystem_store.py::_emit_write_failed | The failure path calls self._tile_xy() to derive the canonical UUID for the FDR record. If _tile_xy() itself ever raises (it shouldn't — TileId.__post_init__ validates lat/lon at construction), the FDR record would be lost and the exception would mask the original write-time error. Pre-validation in TileId keeps this safe today; revisit when WgsConverter gains a per-call failure mode. | Open (Low) — accepted as-is.
3 | Low | Test-quality | test_ac13_read_tile_pixels_warm_latency_p95 | The spec quotes a 0.5 ms p95 target with a 5 ms failure threshold. The test asserts only the failure threshold so it stays useful on a heterogeneous CI host; the soft 0.5 ms goal is tracked outside of this test (e.g., performance dashboards). | Open (Low) — accepted as-is.
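The approach in finding 3 (assert only the hard failure threshold, treat the tighter target as informational) can be sketched with a small p95 helper. The 0.5 ms and 5 ms figures come from the finding; the `p95_ms` helper, its sample counts, and the warm-up loop are assumptions for illustration and are not taken from the actual test.

```python
import statistics
import time
from typing import Callable

TARGET_P95_MS = 0.5   # soft goal, tracked elsewhere
FAILURE_P95_MS = 5.0  # hard threshold the test asserts


def p95_ms(fn: Callable[[], object], samples: int = 200, warmup: int = 20) -> float:
    """Return the 95th-percentile wall-clock latency of fn in milliseconds."""
    for _ in range(warmup):
        fn()  # warm caches so the measurement reflects the warm path
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        fn()
        timings.append((time.perf_counter() - start) * 1000.0)
    # quantiles(n=20) yields 19 cut points; the last one is the 95th percentile.
    return statistics.quantiles(timings, n=20)[-1]


def check_latency(fn: Callable[[], object]) -> float:
    observed = p95_ms(fn)
    # Only the failure threshold gates the result on a shared CI host.
    assert observed < FAILURE_P95_MS, f"p95 {observed:.3f} ms over budget"
    return observed
```

Returning the observed value lets a caller log it against `TARGET_P95_MS` without turning the soft goal into a flaky assertion.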

Tracker

  • AZ-305 transitioned to In Progress on session start; will be moved to In Testing post-commit per protocols.md.

Test suite

  • tests/unit/c6_tile_cache/ (128 tests) — passing at Tier-2.
  • Full Tier-2 suite (pytest tests/unit): 1215 passed, 8 skipped, 1 pre-existing failure (test_ac8_read_host_tuple_on_jetson — needs pynvml, Jetson-only, unrelated to AZ-305 — confirmed pre-existing on bf33b94 by git stash round-trip).

Next batch

All AZ-305 work is complete. Cycle 1 has no remaining batches in the greenfield queue, so autodev advances to the cycle-end gate (Step 7's batch-loop exit → Step 15 Product Implementation Completeness Gate, or the next sub-step the active flow defines).