[AZ-303] C6 storage interfaces: Protocols + DTOs + factories

Freezes the c6_tile_cache Public API per
_docs/02_document/contracts/c6_tile_cache/{tile_store,tile_metadata_store,
descriptor_index}.md v1.0.0:

- Three runtime_checkable Protocols (TileStore 4-method, TileMetadataStore
  9-method, DescriptorIndex 5-method) in components/c6_tile_cache/interface.py.
- Frozen DTOs + enums (TileId, TileMetadata, TileMetadataPersistent,
  TileQualityMetadata, Bbox, SectorBoundary, HnswParams, IndexMetadata,
  TileSource, FreshnessLabel, VotingStatus, SectorClassification) in
  components/c6_tile_cache/_types.py. Constructor-time validation rejects
  out-of-range zoom_level / lat / lon and inverted Bbox.
- TilePixelHandle ABC for read-only mmap access (Invariant I-4).
- TileCacheError family (6 subtypes) + IndexBuildError (deliberately
  outside the family) in components/c6_tile_cache/errors.py.
- C6TileCacheConfig per-component config block, registered on package
  import; validates known runtime labels at construction time.
- Composition-root factories build_tile_store / build_tile_metadata_store /
  build_descriptor_index in runtime_root/storage_factory.py, with lazy
  concrete-impl imports gated by BUILD_FAISS_INDEX (AC-5 / Risk 2:
  no module-level FAISS import when the flag is OFF).
- RuntimeNotAvailableError defined in runtime_root/errors.py to be shared
  with AZ-297 (composition-time error, distinct from per-component
  runtime errors).

51 conformance tests cover all 10 ACs + NFR-perf-factory (p99 build_*
under 50 ms across 1000 calls) + NFR-reliability-error-family. AC-9
introspects each contract file's Shape table and asserts method
parity against the runtime Protocol.

Retired the AZ-263 scaffolding SectorClassification (dataclass) and
TileQualityMetadata from _types/tile.py since their canonical home is
now c6_tile_cache._types; Tile and TileRecord remain in _types/tile.py
until c3_matcher (AZ-344) and c11_tile_manager (AZ-316/319) retire
their interface stubs.

Full unit-test sweep: 791 passed, 2 pre-existing environment skips.

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-12 04:21:44 +03:00
parent 48281db9e9
commit f925af9de3
12 changed files with 1539 additions and 63 deletions
@@ -1,211 +0,0 @@
# C6 Storage Interfaces — Protocols + DTOs + Composition-Root Factories
**Task**: AZ-303_c6_storage_interfaces
**Name**: C6 Storage Interfaces
**Description**: Define the three `c6_tile_cache` Protocols (`TileStore`, `TileMetadataStore`, `DescriptorIndex`), their shared DTOs (`TileId`, `TileMetadata`, `TileQualityMetadata`, `TilePixelHandle`, `Bbox`, `SectorBoundary`, `HnswParams`, `IndexMetadata`), the `TileSource` / `FreshnessLabel` / `VotingStatus` / `SectorClassification` enums, the runtime error taxonomy (`TileCacheError` family + `IndexBuildError`), and the composition-root factory triple `build_tile_store / build_tile_metadata_store / build_descriptor_index`. This is the foundational shared-API task for E-C6 — five external components (C2, C2.5, C3, C10, C11) plus C12 operator tooling depend on the contracts this task freezes.
**Complexity**: 3 points
**Dependencies**: AZ-263_initial_structure, AZ-269_config_loader, AZ-266_log_module, AZ-280_sha256_sidecar
**Component**: c6_tile_cache (epic AZ-250 / E-C6)
**Tracker**: AZ-303
**Epic**: AZ-250 (E-C6)
### Document Dependencies
- `_docs/02_document/contracts/c6_tile_cache/tile_store.md` — produced by this task.
- `_docs/02_document/contracts/c6_tile_cache/tile_metadata_store.md` — produced by this task.
- `_docs/02_document/contracts/c6_tile_cache/descriptor_index.md` — produced by this task.
- `_docs/02_document/contracts/shared_helpers/sha256_sidecar.md`: `TileMetadata.content_sha256_hex`, `IndexMetadata.sidecar_sha256_hex`, and the atomic-write/sidecar pattern.
- `_docs/02_document/contracts/shared_config/composition_root_protocol.md` — adds `config.tile_cache.{store_runtime, metadata_runtime, descriptor_index_runtime, root_dir, postgres_dsn, lru_eviction_threshold_bytes}` fields.
- `_docs/02_document/contracts/shared_logging/log_record_schema.md` — error events emitted by Protocol implementations use this log shape.
## Problem
Five different components (C2 VPR, C2.5 ReRanker, C3 CrossDomainMatcher, C10 CacheProvisioner, C11 TileDownloader/Uploader) and one operator-side consumer (C12) all need a single, frozen interface to the persistent imagery store. Without it:
- Each consumer would import the concrete `PostgresFilesystemStore` / `FaissDescriptorIndex` directly, hard-coding the storage choice and breaking ADR-009 interface-first DI.
- A future swap of the descriptor index (FAISS → ScaNN) or the metadata store (Postgres → SQLite for Tier-0 dev) would ripple across every consumer.
- Error handling would diverge per consumer; `FreshnessRejectionError`, `ContentHashMismatchError`, and `IndexUnavailableError` would have different shapes per impl, making the F2 takeoff abort path and the F4 mid-flight insert path fragile.
- The composition root would have to know, per component, which storage runtime is acceptable; today that decision rests solely on ADR-001 (config) and ADR-009 (interface DI).
- No canonical place to declare the `TileMetadata` shape — every consumer would re-derive `flight_id`, `companion_id`, `quality_metadata`, `voting_status` field names, leading to drift.
This task delivers the typed boundary every consumer reads against and every implementation conforms to. It writes no storage logic — concrete `PostgresFilesystemStore` is owned by the postgres-filesystem-store task; concrete `FaissDescriptorIndex` is owned by the faiss-descriptor-index task; the freshness gate logic is its own task; the LRU eviction is its own task.
## Outcome
- Three Protocols at `src/gps_denied_onboard/components/c6_tile_cache/interface.py` (re-exported from `__init__.py`):
  - `TileStore`: `read_tile_pixels`, `write_tile`, `tile_exists`, `delete_tile`.
  - `TileMetadataStore`: `query_by_bbox`, `insert_metadata`, `update_voting_status`, `mark_uploaded`, `pending_uploads`, `record_lru_access`, `lru_candidates`, `total_disk_bytes`, `get_by_id`.
  - `DescriptorIndex`: `search_topk`, `descriptor_dim`, `mmap_handle`, `rebuild_from_descriptors`, `index_metadata`.
- All three Protocols are `typing.Protocol` classes decorated with `@typing.runtime_checkable`.
- DTOs at `src/gps_denied_onboard/components/c6_tile_cache/_types.py` (re-exported from `__init__.py`): `TileId`, `TileMetadata`, `TileMetadataPersistent`, `TileQualityMetadata`, `Bbox`, `SectorBoundary`, `HnswParams`, `IndexMetadata`, plus the enums `TileSource`, `FreshnessLabel`, `VotingStatus`, `SectorClassification`. All `@dataclass(frozen=True)` except `TilePixelHandle` (opaque context-manager class).
- A `TilePixelHandle` ABC at the same path that the concrete impl subclasses; consumers use `with handle as memview:` and treat the underlying bytes as read-only.
- The runtime error hierarchy under `c6_tile_cache.errors`:
  - `TileCacheError` ← {`TileNotFoundError`, `TileFsError`, `TileMetadataError`, `ContentHashMismatchError`, `FreshnessRejectionError`, `IndexUnavailableError`}.
  - `IndexBuildError` (NOT a subclass of `TileCacheError`: offline build envelope only; raised by `rebuild_from_descriptors`).
- Composition-root factories at `src/gps_denied_onboard/runtime_root/storage_factory.py`:
  - `build_tile_store(config) -> TileStore`
  - `build_tile_metadata_store(config) -> TileMetadataStore`
  - `build_descriptor_index(config) -> DescriptorIndex`
  - Each respects compile-time `BUILD_*` gating (today only `BUILD_FAISS_INDEX` for `DescriptorIndex`; the metadata + filesystem store has no build flag).
  - Requesting an impl whose flag is OFF raises `RuntimeNotAvailableError` (reused from AZ-297) at composition time, NOT at first call.
- A `ConfigSchemaError` extension to AZ-269's config loader for the new `config.tile_cache.{store_runtime, metadata_runtime, descriptor_index_runtime, root_dir, postgres_dsn, lru_eviction_threshold_bytes}` fields.
- Three frozen contract files at `_docs/02_document/contracts/c6_tile_cache/{tile_store, tile_metadata_store, descriptor_index}.md` carry the full shapes; consumers read those files, not this task spec.
- Type-only unit tests verify each future concrete impl module's class actually conforms to the Protocol via `runtime_checkable` + `isinstance` (catches drift at CI time, not deployment).
## Scope
### Included
- All three Protocols, all DTOs, all enums, the error taxonomy, the composition-root factory triple, and the config-loader extension.
- Three contract files (already drafted alongside this task); the producer task is responsible for keeping them in sync with the code.
- Type-only conformance tests at `tests/unit/c6_tile_cache/test_protocol_conformance.py` that import each concrete impl class and assert `isinstance(impl, ProtocolClass)`. The tests stand up no Postgres / FAISS — they only exercise structural typing.
- `RuntimeNotAvailableError` reuse from AZ-297 (do NOT define a new error type).
- `TilePixelHandle` ABC (so the concrete impl can subclass; tests can substitute a fake handle that wraps a `bytes` buffer).
- DTO field validation at construction time: e.g., `TileId(zoom_level=22)` (out-of-range) raises `ValueError`; `Bbox` with `min_lat > max_lat` raises `ValueError`. These are NOT in `TileCacheError` — they are stdlib `ValueError` for bad caller input.
- The `FreshnessRejectionError`, `ContentHashMismatchError`, and `IndexBuildError` types (defined here even though only the impl tasks raise them — keeps the family / taxonomy in one place).
### Excluded
- `PostgresFilesystemStore` implementation — separate task (`c6_postgres_filesystem_store`).
- `FaissDescriptorIndex` implementation — separate task (`c6_faiss_descriptor_index`).
- Postgres schema migration (`_alembic/0001_initial.sql`) — separate task (`c6_postgres_schema`).
- Freshness gate logic (active_conflict reject / stable_rear downgrade) — separate task (`c6_freshness_gate`); this task only declares `FreshnessRejectionError` and the `freshness_label` field.
- 10 GB LRU cache eviction — separate task (`c6_cache_budget_eviction`); this task only declares `lru_candidates` / `record_lru_access` / `total_disk_bytes` Protocol methods.
- C10 CacheProvisioner consumer wiring of `rebuild_from_descriptors` — owned by E-C10.
- C11 `TileUploader` consumer wiring of `pending_uploads` / `mark_uploaded` — owned by E-C11.
- C2 / C2.5 / C3 consumer wiring of read paths — owned by their respective epics.
- Sector boundary CRUD — owned by C12 operator tooling. This task only declares the read-side `SectorBoundary` DTO.
- Test infrastructure (Postgres test container, FAISS test fixtures) — owned by E-BBT (test infrastructure task).
## Acceptance Criteria
**AC-1: Three Protocols are conformance-checkable**
Given a class that implements every method on `TileStore` (or `TileMetadataStore`, or `DescriptorIndex`) with matching signatures
When `isinstance(impl, TileStore)` is evaluated under `runtime_checkable`
Then the result is `True`; for a class that omits any method, the result is `False` for that Protocol
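A minimal sketch of the AC-1 check, assuming placeholder parameter and return types (the frozen signatures live in the contract files, not here). `FullFake` and `PartialFake` are hypothetical test doubles. Note that a `runtime_checkable` `isinstance` check only verifies that the methods exist; it does not compare signatures, so signature parity needs a separate test such as AC-9.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class TileStore(Protocol):
    # Method names come from the Outcome section; the parameter and
    # return types are placeholders, not the frozen contract signatures.
    def read_tile_pixels(self, tile_id: object) -> object: ...
    def write_tile(self, tile_id: object, pixels: bytes) -> None: ...
    def tile_exists(self, tile_id: object) -> bool: ...
    def delete_tile(self, tile_id: object) -> None: ...


class FullFake:
    """Hypothetical fake that defines all four methods."""

    def read_tile_pixels(self, tile_id):
        return b""

    def write_tile(self, tile_id, pixels):
        return None

    def tile_exists(self, tile_id):
        return False

    def delete_tile(self, tile_id):
        return None


class PartialFake:
    """Hypothetical fake that omits delete_tile."""

    def read_tile_pixels(self, tile_id):
        return b""

    def write_tile(self, tile_id, pixels):
        return None

    def tile_exists(self, tile_id):
        return False


full_conforms = isinstance(FullFake(), TileStore)
partial_conforms = isinstance(PartialFake(), TileStore)
```

Neither fake inherits from `TileStore`; conformance is purely structural, which is what lets tests substitute fakes without touching the impl class hierarchy.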
**AC-2: Frozen DTOs reject mutation**
Given a constructed `TileId(...)`, `TileMetadata(...)`, `Bbox(...)`, or `HnswParams(...)` instance
When the test attempts any field reassignment
Then `dataclasses.FrozenInstanceError` is raised; the original value is preserved
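The mutation check in AC-2 can be sketched as follows. The `HnswParams` field names and defaults are illustrative assumptions, and `try_mutate` is a hypothetical helper; only the `frozen=True` behaviour is the point.

```python
import dataclasses
from dataclasses import dataclass


@dataclass(frozen=True)
class HnswParams:
    # Field names and defaults are illustrative assumptions.
    m: int = 16
    ef_construction: int = 200


def try_mutate(params: HnswParams) -> bool:
    """Return True if the assignment was blocked by the frozen dataclass."""
    try:
        params.m = 32  # type: ignore[misc]
    except dataclasses.FrozenInstanceError:
        return True
    return False


params = HnswParams()
blocked = try_mutate(params)

# A changed copy is made with dataclasses.replace; the original is untouched.
wider = dataclasses.replace(params, m=32)
```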
**AC-3: Error hierarchy catchable as a single family**
Given any of the six `TileCacheError` subtypes
When the consumer wraps a Protocol method call in `try: ... except c6_tile_cache.errors.TileCacheError`
Then every documented subtype is caught; an unrelated `Exception` is NOT caught; `IndexBuildError` is also NOT caught (it is intentionally out of the runtime-read envelope)
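A sketch of the hierarchy AC-3 describes, using the class names listed in the Outcome section. The empty class bodies and the `caught_by_family` helper are assumptions made for illustration.

```python
class TileCacheError(Exception):
    """Base of the runtime error family."""


class TileNotFoundError(TileCacheError): ...
class TileFsError(TileCacheError): ...
class TileMetadataError(TileCacheError): ...
class ContentHashMismatchError(TileCacheError): ...
class FreshnessRejectionError(TileCacheError): ...
class IndexUnavailableError(TileCacheError): ...


class IndexBuildError(Exception):
    """Deliberately outside the family: offline build envelope only."""


FAMILY = (
    TileNotFoundError,
    TileFsError,
    TileMetadataError,
    ContentHashMismatchError,
    FreshnessRejectionError,
    IndexUnavailableError,
)


def caught_by_family(exc: Exception) -> bool:
    """Hypothetical helper: does `except TileCacheError` catch this error?"""
    try:
        raise exc
    except TileCacheError:
        return True
    except Exception:
        return False
```

A consumer on the runtime read path therefore needs one `except TileCacheError` clause, while a build-path caller has to handle `IndexBuildError` explicitly.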
**AC-4: Composition-root factory honours config**
Given `config.tile_cache.descriptor_index_runtime = "faiss_hnsw"` and `BUILD_FAISS_INDEX=ON`
When `build_descriptor_index(config)` is called
Then a `FaissDescriptorIndex` instance is returned (the test substitutes a fake satisfying the Protocol; production wiring is the same call site)
**AC-5: Composition-root factory honours BUILD flag gate**
Given `config.tile_cache.descriptor_index_runtime = "faiss_hnsw"` and `BUILD_FAISS_INDEX=OFF`
When `build_descriptor_index(config)` is called
Then `RuntimeNotAvailableError` is raised at composition time with a message naming `"faiss_hnsw"`, and no module-level import of FAISS symbols has occurred (verifiable by asserting that `sys.modules` does NOT contain `c6_tile_cache.faiss_descriptor_index`)
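One way the flag gate and lazy import could fit together is sketched below. Several things are assumptions: the spec's factory takes only `config`, so the explicit `build_flags` mapping is a stand-in for however `BUILD_FAISS_INDEX` is actually surfaced to Python; `TileCacheConfig`, the fully qualified module path, and `probe_flag_off` are hypothetical.

```python
import importlib
import sys
from dataclasses import dataclass
from typing import Mapping


class RuntimeNotAvailableError(Exception):
    """Stand-in for the shared composition-time error."""


@dataclass(frozen=True)
class TileCacheConfig:
    """Hypothetical slice of config.tile_cache."""

    descriptor_index_runtime: str = "faiss_hnsw"


# Assumed module path for the concrete impl; it is never imported at load time.
FAISS_IMPL_MODULE = (
    "gps_denied_onboard.components.c6_tile_cache.faiss_descriptor_index"
)


def build_descriptor_index(config: TileCacheConfig, build_flags: Mapping[str, bool]):
    label = config.descriptor_index_runtime
    if label != "faiss_hnsw":
        raise RuntimeNotAvailableError(f"no descriptor index runtime named {label!r}")
    # Check the flag BEFORE importing, so a disabled build never loads FAISS.
    if not build_flags.get("BUILD_FAISS_INDEX", False):
        raise RuntimeNotAvailableError(
            f"runtime {label!r} requires BUILD_FAISS_INDEX=ON"
        )
    module = importlib.import_module(FAISS_IMPL_MODULE)  # deferred import
    return module.FaissDescriptorIndex(config)


def probe_flag_off() -> tuple:
    """Return (error message, whether the impl module was imported)."""
    try:
        build_descriptor_index(TileCacheConfig(), {"BUILD_FAISS_INDEX": False})
    except RuntimeNotAvailableError as exc:
        return str(exc), FAISS_IMPL_MODULE in sys.modules
    return "", FAISS_IMPL_MODULE in sys.modules
```

Because the import sits inside the function body and after the flag check, the error surfaces at composition time without any FAISS symbol being loaded.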
**AC-6: Unknown runtime label rejected at config load**
Given `config.tile_cache.descriptor_index_runtime = "scann"` (not in the enum)
When the config is loaded via AZ-269's loader
Then `ConfigSchemaError` is raised at load time with a message listing the valid values; `build_descriptor_index` is never reached
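The load-time rejection in AC-6 follows naturally from typing the field as an enum. In this sketch `ConfigSchemaError` is a local stand-in for the AZ-269 error, and `DescriptorIndexRuntime`, `parse_descriptor_index_runtime`, and `load_error` are hypothetical names; `"faiss_hnsw"` is the only label the spec mentions.

```python
from enum import Enum
from typing import Optional


class ConfigSchemaError(Exception):
    """Local stand-in for the error raised by the AZ-269 config loader."""


class DescriptorIndexRuntime(str, Enum):
    # Only "faiss_hnsw" appears in the spec; further members are not assumed.
    FAISS_HNSW = "faiss_hnsw"


def parse_descriptor_index_runtime(raw: str) -> DescriptorIndexRuntime:
    """Convert a raw config string to the enum, listing valid values on failure."""
    try:
        return DescriptorIndexRuntime(raw)
    except ValueError:
        valid = ", ".join(member.value for member in DescriptorIndexRuntime)
        raise ConfigSchemaError(
            f"descriptor_index_runtime={raw!r} is not valid; valid values: {valid}"
        ) from None


def load_error(raw: str) -> Optional[str]:
    """Hypothetical helper: the rejection message, or None if the value loads."""
    try:
        parse_descriptor_index_runtime(raw)
    except ConfigSchemaError as exc:
        return str(exc)
    return None
```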
**AC-7: Constructor-time validation rejects bad input**
Given `TileId(zoom_level=22, lat=0.0, lon=0.0)` (out-of-range zoom) or `Bbox(min_lat=10, min_lon=0, max_lat=5, max_lon=10)` (inverted box)
When the DTO is constructed
Then `ValueError` is raised with a message naming the offending field; no DTO instance is produced
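Constructor-time validation on a frozen dataclass is usually done in `__post_init__`. The sketch below assumes a maximum zoom of 21 (the spec only states that 22 is out of range) and the conventional latitude and longitude ranges; `rejection_message` is a hypothetical test helper. Only the latitude inversion named in AC-7 is checked on `Bbox`.

```python
from dataclasses import dataclass
from typing import Optional

MAX_ZOOM = 21  # assumption: the spec only says zoom_level=22 is out of range


@dataclass(frozen=True)
class TileId:
    zoom_level: int
    lat: float
    lon: float

    def __post_init__(self) -> None:
        # Validation only reads fields, so it is compatible with frozen=True.
        if not 0 <= self.zoom_level <= MAX_ZOOM:
            raise ValueError(f"zoom_level out of range: {self.zoom_level}")
        if not -90.0 <= self.lat <= 90.0:
            raise ValueError(f"lat out of range: {self.lat}")
        if not -180.0 <= self.lon <= 180.0:
            raise ValueError(f"lon out of range: {self.lon}")


@dataclass(frozen=True)
class Bbox:
    min_lat: float
    min_lon: float
    max_lat: float
    max_lon: float

    def __post_init__(self) -> None:
        if self.min_lat > self.max_lat:
            raise ValueError(
                f"min_lat {self.min_lat} exceeds max_lat {self.max_lat}"
            )


def rejection_message(cls, **kwargs) -> Optional[str]:
    """Hypothetical helper: the ValueError text, or None if construction works."""
    try:
        cls(**kwargs)
    except ValueError as exc:
        return str(exc)
    return None
```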
**AC-8: TilePixelHandle is read-only by contract**
Given a concrete `TilePixelHandle` subclass that exposes `memoryview` over mmap'd bytes
When `with handle as memview: memview[0] = 0xff`
Then `TypeError: cannot modify read-only memory` is raised; the underlying file is not mutated
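A sketch of how a read-only handle could behave, assuming the ABC exposes the context-manager methods directly. `MmapTilePixelHandle` and `demo_write_is_rejected` are hypothetical; mapping the file with `mmap.ACCESS_READ` is what makes the resulting `memoryview` read-only, so Python itself rejects the write with a `TypeError`.

```python
import mmap
import os
import tempfile
from abc import ABC, abstractmethod


class TilePixelHandle(ABC):
    """Assumed shape of the ABC: a context manager yielding a memoryview."""

    @abstractmethod
    def __enter__(self) -> memoryview: ...

    @abstractmethod
    def __exit__(self, exc_type, exc, tb) -> None: ...


class MmapTilePixelHandle(TilePixelHandle):
    """Hypothetical concrete handle backed by a read-only memory map."""

    def __init__(self, path: str) -> None:
        self._path = path

    def __enter__(self) -> memoryview:
        self._file = open(self._path, "rb")
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        self._view = memoryview(self._map)
        return self._view

    def __exit__(self, exc_type, exc, tb) -> None:
        # Release the view first; closing the map with a live export fails.
        self._view.release()
        self._map.close()
        self._file.close()


def demo_write_is_rejected() -> tuple:
    """Return (write was rejected, file contents after the attempt)."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(b"\x01\x02\x03")
        rejected = False
        with MmapTilePixelHandle(path) as memview:
            readonly = memview.readonly
            try:
                memview[0] = 0xFF
            except TypeError:
                rejected = True
        with open(path, "rb") as check:
            return rejected and readonly, check.read()
    finally:
        os.remove(path)
```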
**AC-9: Contract files match Protocol shapes**
Given the three contract files at `_docs/02_document/contracts/c6_tile_cache/`
When a contract-test parses each file's Shape section's method/field tables and compares against the runtime Protocol via introspection
Then every method, every field, every error type is present and consistent in both directions
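The parity check in AC-9 might look roughly like this. The Markdown table layout in `CONTRACT_SHAPE` is a hypothetical excerpt (the real Shape section format is defined by the contract files), and `contract_methods` and `protocol_methods` are illustrative helpers covering method names only, not fields or error types.

```python
import re
from typing import Protocol, runtime_checkable


@runtime_checkable
class TileStore(Protocol):
    # Placeholder signatures; only the method names matter for this check.
    def read_tile_pixels(self, tile_id): ...
    def write_tile(self, tile_id, pixels): ...
    def tile_exists(self, tile_id): ...
    def delete_tile(self, tile_id): ...


# Hypothetical excerpt of a contract file's Shape table.
CONTRACT_SHAPE = """\
| Method | Purpose |
|--------|---------|
| `read_tile_pixels` | read pixels for a tile |
| `write_tile` | persist a tile |
| `tile_exists` | existence check |
| `delete_tile` | remove a tile |
"""


def contract_methods(markdown: str) -> set:
    """Collect backticked names from the first column of each table row."""
    return set(re.findall(r"^\|\s*`(\w+)`", markdown, flags=re.MULTILINE))


def protocol_methods(proto: type) -> set:
    """Collect public callables declared directly on the Protocol class."""
    return {
        name
        for name, value in vars(proto).items()
        if callable(value) and not name.startswith("_")
    }


missing_in_code = contract_methods(CONTRACT_SHAPE) - protocol_methods(TileStore)
missing_in_contract = protocol_methods(TileStore) - contract_methods(CONTRACT_SHAPE)
```

Checking both set differences is what makes the comparison hold "in both directions": a method added to either side without the other shows up as a non-empty set.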
**AC-10: VotingStatus transitions are policy-aware (declared, not enforced)**
Given the `VotingStatus` enum
When the consumer test asserts the documented forward-transitions table (`PENDING → TRUSTED`, `PENDING → REJECTED`, `TRUSTED → REJECTED`)
Then the table matches the contract; the actual enforcement lives in `update_voting_status` impl (NOT this task), so the test only verifies the enum exposes the three documented states (`PENDING`, `TRUSTED`, `REJECTED`)
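The declared-but-not-enforced split can be illustrated with an enum plus a separate transitions table. The member string values and the `is_forward_transition` helper are assumptions; the three states and the three forward transitions come from AC-10.

```python
from enum import Enum


class VotingStatus(Enum):
    # Member values are assumptions; only the names are specified.
    PENDING = "pending"
    TRUSTED = "trusted"
    REJECTED = "rejected"


# Documented forward transitions; enforcement belongs to the impl task.
FORWARD_TRANSITIONS = frozenset(
    {
        (VotingStatus.PENDING, VotingStatus.TRUSTED),
        (VotingStatus.PENDING, VotingStatus.REJECTED),
        (VotingStatus.TRUSTED, VotingStatus.REJECTED),
    }
)


def is_forward_transition(old: VotingStatus, new: VotingStatus) -> bool:
    """Hypothetical helper: is (old, new) in the documented table?"""
    return (old, new) in FORWARD_TRANSITIONS


state_names = {member.name for member in VotingStatus}
```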
## Non-Functional Requirements
**Compatibility**
- Protocols use stdlib `typing.Protocol` (PEP 544); no third-party Protocol library is introduced.
- DTOs use stdlib `dataclasses` with `frozen=True`; no `pydantic` or `attrs` dependency.
- Errors subclass `Exception` (not `BaseException`); upstream `except Exception:` continues to work.
**Performance**
- The factory triple `build_*` returns within 50 ms each (lazy-imports the concrete impl on first call; subsequent calls << 1 ms).
- DTO construction is the bare-cost dataclass `__init__` plus the constructor-time validation (AC-7).
**Reliability**
- Implementations MUST raise only `c6_tile_cache.errors.TileCacheError` subtypes from runtime Protocol methods; third-party library exceptions (psycopg / FAISS C++ exceptions / OS errors from filesystem syscalls) MUST be caught and rewrapped.
- Versioning: any breaking change to a Protocol or DTO MUST bump the corresponding contract file's `Version` and notify every consumer task listed in the contract header.
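The rewrapping rule can be sketched as a decorator. Mapping `OSError` to `TileFsError` is an assumption based on the error's name, and `rewrap_os_errors`, `read_tile_bytes`, and `probe_missing_file` are hypothetical; the actual mapping is up to the impl tasks.

```python
import functools
import os
import tempfile


class TileCacheError(Exception):
    """Base of the runtime error family."""


class TileFsError(TileCacheError):
    """Filesystem-level failure, rewrapped from OSError (assumed mapping)."""


def rewrap_os_errors(func):
    """Convert OSError raised by func into TileFsError, keeping the cause."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except OSError as exc:
            raise TileFsError(str(exc)) from exc

    return wrapper


@rewrap_os_errors
def read_tile_bytes(path: str) -> bytes:
    with open(path, "rb") as handle:
        return handle.read()


def probe_missing_file() -> tuple:
    """Return (raised type name, original cause type name) for a missing file."""
    with tempfile.TemporaryDirectory() as root:
        try:
            read_tile_bytes(os.path.join(root, "missing.tile"))
        except TileCacheError as exc:
            return type(exc).__name__, type(exc.__cause__).__name__
    return "", ""
```

Using `raise ... from exc` keeps the original exception on `__cause__`, so the log record still shows the underlying failure while consumers only ever see the family type.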
## Unit Tests
| AC Ref | What to Test | Required Outcome |
|--------|-------------|-----------------|
| AC-1 | `runtime_checkable` Protocol vs. fully-implementing fakes for each of the three Protocols; vs. fakes missing one method | `isinstance` returns True for full, False for partial |
| AC-2 | Mutation attempt on each frozen DTO | `FrozenInstanceError`; original value preserved |
| AC-3 | Raise each of the six error subtypes; catch as `c6_tile_cache.errors.TileCacheError` | All caught; unrelated `ValueError` is NOT caught; `IndexBuildError` is NOT caught by the family handler |
| AC-4 | `build_descriptor_index` with `faiss_hnsw` + flag ON → fake `FaissDescriptorIndex` | Returned instance satisfies the Protocol |
| AC-5 | `build_descriptor_index` with `faiss_hnsw` + flag OFF | `RuntimeNotAvailableError`; `sys.modules` does NOT contain `c6_tile_cache.faiss_descriptor_index` |
| AC-6 | Config load with invalid `descriptor_index_runtime` value | `ConfigSchemaError`; valid values listed in message |
| AC-7 | `TileId(zoom_level=22, ...)`, `Bbox(min_lat > max_lat, ...)` | `ValueError` with offending field named |
| AC-8 | `TilePixelHandle` write attempt through `memoryview` | `TypeError`; underlying file unchanged |
| AC-9 | Contract introspection vs. Protocol introspection for each of the three contracts | Shape parity test passes for all three |
| AC-10 | `VotingStatus` enum surface | `{PENDING, TRUSTED, REJECTED}` exactly |
| NFR-perf-factory | Microbench `build_*` × 1000 | p99 ≤ 50 ms each |
| NFR-reliability-error-family | All six subtypes inherit from `c6_tile_cache.errors.TileCacheError` | Verified via `issubclass` for each |
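For the NFR-perf-factory row, a microbenchmark along the following lines would yield a p99 figure to compare against the 50 ms budget. The nearest-rank percentile definition is an assumption (the spec does not say which one is used), and `fake_build_tile_store` is a hypothetical stand-in for a real factory call.

```python
import math
import time


def percentile_nearest_rank(samples, q: int):
    """Nearest-rank percentile: the smallest value covering q% of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def bench_ms(factory, calls: int = 1000) -> list:
    """Time each call to factory and return the durations in milliseconds."""
    durations = []
    for _ in range(calls):
        start = time.perf_counter()
        factory()
        durations.append((time.perf_counter() - start) * 1000.0)
    return durations


def fake_build_tile_store() -> object:
    """Hypothetical stand-in for build_tile_store(config)."""
    return object()


p99_ms = percentile_nearest_rank(bench_ms(fake_build_tile_store), 99)
```

The first call to a real factory pays the lazy-import cost, so with 1000 samples that single slow call falls above the 99th percentile and does not dominate the p99 value.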
## Constraints
- The Protocols are stdlib `typing.Protocol`; no third-party Protocol library is introduced.
- DTOs are stdlib `@dataclass(frozen=True)`; no `pydantic` / `attrs`.
- `TilePixelHandle` is an ABC: concrete impls subclass it with mmap-backed state; consumers MUST treat the bytes as read-only (enforced by handing out a read-only `memoryview`, i.e. one whose `readonly` attribute is `True`).
- The error hierarchy is the boundary of acceptable runtime errors. Implementations rewrap third-party exceptions; consumers catch the family.
- Lazy import of concrete impls is mandatory in the composition-root factory triple. The package `__init__.py` re-exports ONLY the Protocols, DTOs, enums, errors, and `TilePixelHandle` ABC — no concrete impl module is imported at package load time.
- The three contract files at `_docs/02_document/contracts/c6_tile_cache/` are the source of truth for shape; if the Protocol changes here without the contract updating, that is a Spec-Gap finding (High) per code-review skill Phase 2.
- This task introduces no new third-party dependencies — `typing.Protocol`, `dataclasses`, `enum`, `pathlib`, `numpy` (already pinned for the project) are all that's used.
- `numpy` arrays in the `DescriptorIndex` Protocol surface MUST be C-contiguous `float32`; the impl validates this at runtime (raises `IndexUnavailableError` on mismatch per the contract). This task only declares the type annotations; validation logic lives in the impl task.
## Risks & Mitigation
**Risk 1: Protocol drift between contract and code**
- *Risk*: Implementations diverge from the contract over time; consumers cannot tell which is canonical.
- *Mitigation*: AC-9 contract-introspection test runs in CI; any drift fails the test before merge. Each contract's `## Test Cases` section names this exact test.
**Risk 2: Lazy-import gating bypassed by transitively-imported module**
- *Risk*: A consumer imports `c6_tile_cache` (the package) and the package's `__init__.py` eagerly imports the concrete impl, triggering FAISS load even when `BUILD_FAISS_INDEX=OFF`.
- *Mitigation*: The package `__init__.py` re-exports ONLY the Protocols, DTOs, enums, errors, and `TilePixelHandle` ABC — it does NOT import any concrete impl. AC-5 verifies via `sys.modules`.
**Risk 3: Three Protocols cluttering the public surface**
- *Risk*: A consumer that needs only `TileStore` is forced to import the whole `c6_tile_cache` package; if the package eagerly evaluates the other two Protocols' DTOs, the import cost is wasteful.
- *Mitigation*: Stdlib dataclasses + `typing.Protocol` evaluation is essentially free (one class statement each); the AC-5 sys-modules test covers the only meaningful cost (concrete impls). No further mitigation needed.
**Risk 4: TileMetadata field set drifts as new sources or quality fields are added**
- *Risk*: Adding a field to `TileMetadata` is a contract change rippling to every consumer.
- *Mitigation*: Versioning rules in `tile_store.md` § Versioning Rules require a minor bump for new optional fields with defaults; consumers tolerate. A required-field addition is a major bump and triggers the user-Choose-format coordination per `decompose/templates/api-contract.md`.
**Risk 5: `IndexBuildError` outside the family confuses catchers**
- *Risk*: A consumer doing `except TileCacheError` MIGHT expect to catch a build-time corruption; instead the error escapes.
- *Mitigation*: Documented as Non-Goal in `descriptor_index.md` and as a separate test in AC-3. The build path lives in C10 pre-flight; in flight the only descriptor-index errors are read-side (`IndexUnavailableError`, which IS in the family).
## Runtime Completeness
- **Named capability**: typed Protocols + DTOs + error envelope + composition-root selection for `c6_tile_cache` (architecture / E-C6 / ADR-001 + ADR-009).
- **Production code that must exist**: real Protocol declarations, real frozen DTOs, real error hierarchy, real composition-root factory triple with lazy-import gating, real config-loader extension for the runtime enum, real constructor-time DTO validation (AC-7), real `TilePixelHandle` ABC.
- **Allowed external stubs**: tests MAY substitute fake impl classes that conform to the Protocols; production wiring uses the real impls from the postgres-filesystem-store and faiss-descriptor-index tasks.
- **Unacceptable substitutes**: ABCs instead of `typing.Protocol` (would force inheritance changes downstream), `pydantic.BaseModel` instead of `@dataclass(frozen=True)` (adds a runtime validation layer this task does not need), eager imports of concrete impls in `__init__.py` (would defeat `BUILD_FAISS_INDEX` gating), or a `descriptor_index_runtime: str` config field without an enum (would lose the load-time validation in AC-6).
## Contract
This task produces/implements the contracts at:
- `_docs/02_document/contracts/c6_tile_cache/tile_store.md`
- `_docs/02_document/contracts/c6_tile_cache/tile_metadata_store.md`
- `_docs/02_document/contracts/c6_tile_cache/descriptor_index.md`
Consumers MUST read those files — not this task spec — to discover the interfaces.