Oleksandr Bezdieniezhnykh 909f69cb3a [AZ-505] Tile inventory endpoint + HTTP/2 + Leaflet covering index
Production code:
- POST /api/satellite/tiles/inventory (XOR body, 5000-cap,
  most-recent-per-location_hash select, present/absent shaping).
- Kestrel HttpProtocols.Http1AndHttp2 on every listener (AC-5).
- Migration 015 creates tiles_leaflet_path covering index over
  (location_hash, captured_at DESC, updated_at DESC, id DESC)
  INCLUDE (file_path, source); drops superseded idx_tiles_location_hash.
- TileRepository.GetByTileCoordinatesAsync rewired to filter by
  location_hash (Index Only Scan via tiles_leaflet_path).
- TileRepository.GetTilesByLocationHashesAsync added with Npgsql-
  direct ANY($1::uuid[]) binding (Dapper IEnumerable expansion is
  incompatible with the array form).
- Uuidv5.LocationHashForTile centralises the UUIDv5(TileNamespace,
  "{z}/{x}/{y}") formula — single source of truth for the cross-repo
  invariant (gps-denied-onboard parity).

Contracts:
- New: contracts/api/tile-inventory.md v1.0.0.
- Bumped: contracts/data-access/tile-storage.md to v2.0.0 (joint
  ownership by AZ-503-foundation + AZ-505: schema + covering index +
  GetByTileCoordinatesAsync rewrite).

Tests:
- TileInventoryTests covers AC-1, AC-2 (DB-level), AC-4, AC-6.
- Http2MultiplexingTests covers AC-5 (20 concurrent multiplexed GETs
  over h2c via SocketsHttpHandler + AppContext Http2Unencrypted switch).
- LeafletPathIndexOnlyTests covers AC-3 (EXPLAIN (ANALYZE, BUFFERS)
  asserts Index Only Scan over tiles_leaflet_path with heap_blocks=0).

Docs:
- architecture.md, system-flows.md, data_model.md, module-layout.md,
  glossary.md, modules/api_program.md, modules/dataaccess_tile_repository.md,
  components/02_data_access/description.md all updated to reference the
  v2.0.0 tile-storage contract + new tile-inventory contract + AC-7.

Reports:
- batch_01_cycle6_report.md, batch_01_cycle6_review.md,
  implementation_completeness_cycle6_report.md (PASS),
  implementation_report_tile_inventory_cycle6.md.

Task spec moved todo/ -> done/.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 21:16:37 +03:00


Contract: tile-storage

Component: DataAccess
Producer task: AZ-484 (v1.0.0: multi-source schema) + AZ-503-foundation (v2.0.0: tile identity columns) + AZ-505 (v2.0.0 freeze: covering index + location_hash-keyed reads + bulk inventory)
Consumer tasks: AZ-485 (UAV upload endpoint, cycle 5; uav-tile-upload.md v1.1.0), AZ-505 (inventory endpoint, this cycle; tile-inventory.md v1.0.0), future SatAR / additional-source tasks
Version: 2.0.0
Status: frozen
Last Updated: 2026-05-12

Purpose

Defines how satellite imagery tiles are persisted in the tiles table when more than one acquisition source (and multiple UAV flights per source) can write to the same geographic cell, and how the table is indexed for the access patterns it serves: producer writes and two distinct consumer read patterns.

  1. Producer writes (POST /api/satellite/upload, GoogleMapsDownloaderV2) — per-source, per-flight UPSERTs keyed by integer slippy coords.
  2. Consumer reads:
    • Leaflet hot path — GET /tiles/{z}/{x}/{y} returns the most-recent variant by location_hash.
    • Bulk inventory — POST /api/satellite/tiles/inventory returns one row per location_hash across many cells in one round trip.

Producers must agree on the source enum, captured_at semantics, flight_id semantics, and the per-(source × flight) UPSERT contract. Readers must use the location_hash-keyed selection rule and tolerate the multi-source / multi-flight row layout.

Shape

Schema (PostgreSQL tiles table — relevant columns only)

-- Pre-existing columns (unchanged since AZ-484)
id                  UUID            PRIMARY KEY
tile_zoom           INT             NOT NULL
tile_x              INT             NOT NULL
tile_y              INT             NOT NULL
latitude            DOUBLE PRECISION NOT NULL
longitude           DOUBLE PRECISION NOT NULL
tile_size_meters    DOUBLE PRECISION NOT NULL
tile_size_pixels    INT             NOT NULL
image_type          VARCHAR(10)     NOT NULL
file_path           VARCHAR(500)    NOT NULL
created_at          TIMESTAMP       NOT NULL DEFAULT CURRENT_TIMESTAMP
updated_at          TIMESTAMP       NOT NULL DEFAULT CURRENT_TIMESTAMP

-- AZ-484 (v1.0.0)
source              VARCHAR(32)     NOT NULL    -- enum-stored: 'google_maps' | 'uav'
captured_at         TIMESTAMP       NOT NULL    -- UTC; producer-supplied semantics, see below

-- AZ-503-foundation (this contract — v2.0.0)
flight_id           UUID            NULL        -- per-UAV-flight identifier; NULL for google_maps + legacy uav
location_hash       UUID            NOT NULL    -- UUIDv5(TileNamespace, "{tile_zoom}/{tile_x}/{tile_y}")
content_sha256      BYTEA           NULL        -- SHA-256 of JPEG body at insert time; NULL for legacy rows only
legacy_id           UUID            NULL        -- pre-AZ-503 random `id` preserved for one deprecation cycle

-- Vestigial columns (preserved per coderule.mdc; readers MUST NOT depend on them)
maps_version        VARCHAR(50)     NULL
version             INT             NULL

Field reference (v2.0.0)

| Field | Type | Required | Description | Constraints |
|---|---|---|---|---|
| source | enum (TileSource) stored as VARCHAR(32) | yes | Producer of the tile | 'google_maps' or 'uav'. New values require a contract version bump. |
| captured_at | TIMESTAMP UTC | yes | Producer-defined "moment the imagery represents" | For google_maps: DateTime.UtcNow at download time (provider does not expose original imagery date). For uav: the UAV capture timestamp supplied by the upload client. Must be UTC; non-UTC must be converted before write. |
| flight_id | UUID | no (NULL for google_maps + legacy uav) | Per-flight identifier supplied by the UAV upload endpoint | When source = 'uav' AND tile is AZ-503+ era → NOT NULL. When source = 'google_maps' → MUST be NULL. Pre-AZ-503 uav rows may have NULL. UPSERT collapses NULL via COALESCE(flight_id, '00000000-…'::uuid). |
| location_hash | UUID (v5) | yes | Deterministic cell identifier | UUIDv5(Uuidv5.TileNamespace, "{tile_zoom}/{tile_x}/{tile_y}"). Cross-repo invariant: TileNamespace = 5b8d0c2e-7f1a-4d3b-9c5e-1f3a8e7d2b6c. Identical byte-for-byte with gps-denied-onboard/components/c6_tile_cache/_uuid.py:TILE_NAMESPACE. |
| content_sha256 | BYTEA (32) | yes for AZ-503+ writes (NULL only for pre-AZ-503 rows the migration could not re-hash) | SHA-256 of the JPEG body at insert time | Application invariant: enforced NOT NULL on new writes via TileEntity.ContentSha256. Migration 014 left the column nullable because it could not safely re-open tile files on disk during schema migration. |
| legacy_id | UUID | no | Pre-AZ-503 random id preserved for one deprecation cycle | NULL for AZ-503+ rows; populated for rows that pre-date migration 014. Will be dropped in a follow-up migration once external references are confirmed flushed. |
| (tile_zoom, tile_x, tile_y, tile_size_meters, source, COALESCE(flight_id, '00000000-…'::uuid)) | composite | yes | Per-source-per-flight uniqueness | Enforced via UNIQUE INDEX idx_tiles_unique_identity. Replaces the AZ-484 lat/lon-keyed uniqueness from idx_tiles_unique_location_source. |
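
The two derived identity fields, location_hash and content_sha256, can be sketched in Python, the language of the gps-denied-onboard counterpart. This is an illustration only: the helper names are hypothetical, and it assumes the name string is hashed as UTF-8 (which is what Python's uuid5 does for str input). Whether the C# and Postgres implementations agree byte-for-byte is exactly what the location-hash-stability test case is meant to verify.

```python
import hashlib
import uuid

# Namespace constant quoted in the field reference (cross-repo invariant).
TILE_NAMESPACE = uuid.UUID("5b8d0c2e-7f1a-4d3b-9c5e-1f3a8e7d2b6c")


def location_hash_for_tile(zoom: int, x: int, y: int) -> uuid.UUID:
    """Hypothetical helper: UUIDv5(TileNamespace, "{zoom}/{x}/{y}")."""
    return uuid.uuid5(TILE_NAMESPACE, f"{zoom}/{x}/{y}")


def content_sha256(jpeg_body: bytes) -> bytes:
    """Hypothetical helper: raw 32-byte SHA-256 digest of the JPEG body."""
    return hashlib.sha256(jpeg_body).digest()
```

Because the hash depends only on (zoom, x, y), every source and every flight writing to the same cell produces the same location_hash, which is what lets the read paths key on it.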

Indexes (v2.0.0)

-- Per-source-per-flight uniqueness. NULL-safe via COALESCE.
CREATE UNIQUE INDEX idx_tiles_unique_identity
    ON tiles (
        tile_zoom, tile_x, tile_y, tile_size_meters, source,
        COALESCE(flight_id, '00000000-0000-0000-0000-000000000000'::uuid)
    );

-- Leaflet hot-path covering index. AZ-505.
CREATE INDEX tiles_leaflet_path
    ON tiles (location_hash, captured_at DESC, updated_at DESC, id DESC)
    INCLUDE (file_path, source);

Indexes dropped in v2.0.0:

  • idx_tiles_unique_location_source (AZ-484 lat/lon-keyed uniqueness) — dropped by migration 014; superseded by idx_tiles_unique_identity.
  • idx_tiles_unique_location (pre-AZ-484 4-column uniqueness) — dropped by migration 013; included here for completeness.
  • idx_tiles_location_hash (lightweight lookup added by migration 014): dropped by migration 015; superseded because equality lookups by location_hash use the leading column of tiles_leaflet_path.

Producer write API

| Operation | Repository method | Conflict semantics |
|---|---|---|
| Insert / replace same-(source, flight) row for a cell | ITileRepository.InsertAsync(TileEntity) | ON CONFLICT (tile_zoom, tile_x, tile_y, tile_size_meters, source, COALESCE(flight_id, '00000000-…'::uuid)) DO UPDATE SET file_path, latitude, longitude, captured_at, updated_at, content_sha256. Producers MUST set Source, CapturedAt, LocationHash, ContentSha256. Producers MUST set FlightId when source = uav. |
| Update by primary key | ITileRepository.UpdateAsync(TileEntity) | Updates by id only. Caller's responsibility not to violate the unique index. |
| Delete by primary key | ITileRepository.DeleteAsync(Guid) | Removes a single row by id; no cascade. |
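
The conflict semantics of InsertAsync can be modelled in memory to show how the NULL-coalesced flight_id behaves. The sketch below is not the repository code: identity_key and upsert are hypothetical names, and rows are plain dicts that use the column names from the schema.

```python
import uuid
from typing import Dict, Optional, Tuple

# Stand-in for COALESCE(flight_id, '00000000-0000-0000-0000-000000000000'::uuid).
ZERO_UUID = uuid.UUID(int=0)

# Columns listed in the DO UPDATE SET clause of the write contract.
UPDATED_ON_CONFLICT = ("file_path", "latitude", "longitude",
                       "captured_at", "updated_at", "content_sha256")

IdentityKey = Tuple[int, int, int, float, str, uuid.UUID]


def identity_key(row: dict) -> IdentityKey:
    """Hypothetical mirror of the idx_tiles_unique_identity key."""
    flight_id: Optional[uuid.UUID] = row.get("flight_id")
    return (row["tile_zoom"], row["tile_x"], row["tile_y"],
            row["tile_size_meters"], row["source"],
            flight_id if flight_id is not None else ZERO_UUID)


def upsert(table: Dict[IdentityKey, dict], row: dict) -> None:
    """Insert the row, or overwrite the updatable columns on a key conflict."""
    key = identity_key(row)
    existing = table.get(key)
    if existing is None:
        table[key] = dict(row)
        return
    # Conflict: the existing row keeps its id; only the listed columns change.
    for column in UPDATED_ON_CONFLICT:
        existing[column] = row[column]
```

Under this model, two google_maps writes to one cell collapse into a single row, while two uav writes with different flight_id values coexist.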

Consumer read API and selection rule

| Operation | Repository method | Selection rule |
|---|---|---|
| Read by id | ITileRepository.GetByIdAsync(Guid) | Returns the row identified by id (no source/flight filter). |
| Read most-recent for a cell by slippy coordinates | ITileRepository.GetByTileCoordinatesAsync(zoom, x, y) | Computes location_hash = UUIDv5(TileNamespace, "{zoom}/{x}/{y}") and returns the row with the highest (captured_at, updated_at, id) tuple for that hash across all sources/flights. At most one row. |
| Read region | ITileRepository.GetTilesByRegionAsync(lat, lon, sizeMeters, zoomLevel) | Returns at most one row per (tile_zoom, tile_x, tile_y, tile_size_meters) group, selected by the same most-recent rule. |
| Bulk inventory lookup | ITileRepository.GetTilesByLocationHashesAsync(IReadOnlyList<Guid> locationHashes) | Returns at most one row per requested location_hash, selected by DISTINCT ON (location_hash) ... ORDER BY location_hash, captured_at DESC, updated_at DESC, id DESC. Used by the AZ-505 inventory endpoint. |

The selection rule is most-recent across all sources and flights ordered by captured_at DESC, with (updated_at DESC, id DESC) as deterministic tie-breakers. No voting / trust-promotion filter is applied at this layer.
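
As an in-memory illustration of this rule, the sketch below picks one winner per location_hash by comparing the (captured_at, updated_at, id) tuple. TileRow and most_recent_per_hash are hypothetical names. The id tie-break assumes Python's integer ordering of UUIDs matches the ordering the database applies, which should be confirmed against Postgres before anything relies on it.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class TileRow:
    """Hypothetical projection of the columns the selection rule needs."""
    id: uuid.UUID
    location_hash: uuid.UUID
    captured_at: datetime
    updated_at: datetime
    source: str
    file_path: str


def _recency(row: TileRow) -> Tuple[datetime, datetime, uuid.UUID]:
    # Same ordering as ORDER BY captured_at DESC, updated_at DESC, id DESC:
    # the largest tuple wins.
    return (row.captured_at, row.updated_at, row.id)


def most_recent_per_hash(rows: Iterable[TileRow]) -> Dict[uuid.UUID, TileRow]:
    """Keep one row per location_hash, ignoring source and flight."""
    best: Dict[uuid.UUID, TileRow] = {}
    for row in rows:
        current = best.get(row.location_hash)
        if current is None or _recency(row) > _recency(current):
            best[row.location_hash] = row
    return best
```

The result does not depend on input order, which is the property the time-tiebreak test case checks at the database level.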

Invariants

  • Inv-1: Every row has a non-null source whose string value is a member of TileSource. Rows with unknown source values are a contract violation.
  • Inv-2: Every row has a non-null captured_at in UTC.
  • Inv-3: At most one row exists per (tile_zoom, tile_x, tile_y, tile_size_meters, source, COALESCE(flight_id, '00000000-…'::uuid)). NULL-coalesced flight_id is what makes google_maps rows (where flight_id is always NULL) deduplicate to one row per cell-and-size-and-source.
  • Inv-4: For any cell with one or more rows, the row returned by GetByTileCoordinatesAsync and the per-cell row returned by GetTilesByRegionAsync are identical.
  • Inv-5: The source column value space is closed: only the snake_case wire values defined in SatelliteProvider.Common.Enums.TileSourceConverter ("google_maps", "uav") are valid. Adding a new producer requires a new TileSource enum member, a corresponding wire value in TileSourceConverter, AND a contract version bump (minor). Note: TileEntity.Source is stored as the wire string (not the C# enum) because Dapper's TypeHandler<T> for enum types is bypassed during read deserialization (Dapper issue #259); TileSourceConverter.{ToWireValue,FromWireValue} is the documented bridge.
  • Inv-6: captured_at semantics are producer-defined per the Field Reference table above; consumers MUST NOT reinterpret it (e.g., consumers MUST NOT assume captured_at from google_maps reflects original imagery date).
  • Inv-7 (new in v2.0.0): location_hash is functionally determined by (tile_zoom, tile_x, tile_y) — every row with the same slippy coords has the same hash, and that hash equals UUIDv5(Uuidv5.TileNamespace, "{tile_zoom}/{tile_x}/{tile_y}"). The namespace constant is a cross-repository invariant that must NOT be changed unilaterally — see coderule.mdc "cross-repo invariants" and the AZ-503 migration header.
  • Inv-8 (new in v2.0.0): When source = 'google_maps', flight_id MUST be NULL. When source = 'uav' AND the row is AZ-503+ era (created after migration 014), flight_id SHOULD be non-NULL; legacy uav rows with NULL flight_id are tolerated for one deprecation cycle.
  • Inv-9 (new in v2.0.0): GetByTileCoordinatesAsync filters by location_hash, not by (tile_zoom, tile_x, tile_y) directly. Callers that pass the same (z, x, y) tuple get byte-identical results to v1.0.0 because the hash is deterministic; this is a behavior-preserving rewrite that exists to make the leaflet hot path index-only against tiles_leaflet_path.
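
A producer-side pre-write check for Inv-1, Inv-2 and Inv-8 might look like the following sketch. It is not the actual TileSourceConverter: the Python enum and function names are hypothetical, and rejecting timezone-naive timestamps is an assumption of this sketch, not something the contract states.

```python
import uuid
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class TileSource(Enum):
    """Closed set of wire values from Inv-5."""
    GOOGLE_MAPS = "google_maps"
    UAV = "uav"


def from_wire_value(value: str) -> TileSource:
    """Inv-1: surface unknown source values instead of swallowing them."""
    try:
        return TileSource(value)
    except ValueError:
        raise ValueError(f"Inv-1 violation: unknown tile source {value!r}") from None


def to_utc(captured_at: datetime) -> datetime:
    """Inv-2: convert to UTC before write (naive input rejected by assumption)."""
    if captured_at.tzinfo is None:
        raise ValueError("captured_at must carry a timezone before conversion")
    return captured_at.astimezone(timezone.utc)


def check_flight_id(source: TileSource, flight_id: Optional[uuid.UUID]) -> None:
    """Inv-8: google_maps rows must not carry a flight_id."""
    if source is TileSource.GOOGLE_MAPS and flight_id is not None:
        raise ValueError("Inv-8 violation: google_maps rows require flight_id = NULL")
```

The check deliberately does not reject uav rows with a missing flight_id, because Inv-8 tolerates legacy uav rows for one deprecation cycle.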

Non-Goals

  • Not covered: Per-source / per-flight historical revision retention. Same-(source, flight) uploads to the same cell overwrite the previous row by design — this is not a versioned table. Consumers wanting season selection or rollback must propose a v3 schema.
  • Not covered: Cross-source / cross-flight merging or compositing at read time. Reads return exactly one row per cell.
  • Not covered: Quality scoring, threshold gating, or voting / trust-promotion at this layer. Voting is owned by gps-denied-onboard Design Task #2 and consumes flight_id from this contract.
  • Not covered: Backwards-compatible reads against the v1.0.0 unique index. Migration 014 is mandatory before any consumer of v2.0.0 runs.
  • Not covered: The vestigial maps_version and version columns. Consumers MUST NOT read them; producers MUST NOT write them in v1.0.0+.
  • Not covered: content_sha256 integrity verification on read. The column is populated for new writes; downstream verification is a future-task concern.

Versioning Rules

  • Patch (2.0.x): Documentation clarifications, additional invariants that do not change runtime behavior, expanded test cases.
  • Minor (2.x.0): Adding a new TileSource enum member; adding optional columns that consumers may safely ignore; adding new repository read methods; widening the tiles_leaflet_path INCLUDE list to remove heap fetches from inventory.
  • Major (3.0.0): Removing or renaming a column; changing the unique index columns; changing the selection rule (e.g., adding source priority or voting filter); changing captured_at from required to optional or vice versa; introducing per-(source, flight) historical revisions; changing the Uuidv5.TileNamespace constant (would also break sibling repos and require coordinated cross-repo work).

Each version bump requires updating the Change Log below and notifying every consumer listed in the header. If consumers' tasks have not yet been written, the producer task is responsible for surfacing the change to the user before merging.

Test Cases

| Case | Input | Expected | Notes |
|---|---|---|---|
| valid-google-only | Insert source='google_maps' captured_at=T1 flight_id=NULL for a fresh cell | Single row returned by region read; source='google_maps', captured_at=T1. | v1.0.0 baseline regression case. |
| valid-multi-source | Insert google_maps captured_at=T1, then uav captured_at=T2 > T1 flight_id=F1 for same cell | Both rows persisted; GetByTileCoordinatesAsync returns the uav row. | AC-1 + AC-2 of AZ-484. |
| valid-multi-flight | Insert two uav rows with distinct flight_ids for same (z, x, y), captured_at=T1 and T2 > T1 | Both rows persisted under idx_tiles_unique_identity; most-recent rule returns the T2 row. | New in v2.0.0; was a unique-index violation under v1.0.0. |
| same-source-same-flight-upsert | Insert uav captured_at=T1 flight_id=F1, then uav captured_at=T2 > T1 flight_id=F1 for same cell | Exactly one uav/F1 row remains, with captured_at=T2 and updated file_path. | AZ-484 AC-3, preserved through AZ-503 schema rewrite. |
| time-tiebreak | Insert google_maps captured_at=T, then uav captured_at=T flight_id=F1 (identical timestamps) for same cell | Selection deterministic by (updated_at DESC, id DESC) tie-break; result must be reproducible across two test runs with the same seed. | Inv-4 enforcement. |
| location-hash-stability | Compute UUIDv5 for (z=18, x=154321, y=95812) both in C# (Uuidv5.Create) and in Postgres (migration 014 helper) | Identical 16 bytes. Both equal gps-denied-onboard's Python uuid5(TILE_NAMESPACE, "18/154321/95812"). | Inv-7 cross-repo invariant. |
| leaflet-index-only | Seed ≥ 100k rows, VACUUM ANALYZE tiles, then EXPLAIN (ANALYZE, BUFFERS) SELECT file_path FROM tiles WHERE location_hash = $1 ORDER BY captured_at DESC, updated_at DESC, id DESC LIMIT 1 | Plan contains Index Only Scan using tiles_leaflet_path; Heap Fetches ≤ 1. | AZ-505 AC-3. |
| bulk-inventory-ordering | GetTilesByLocationHashesAsync with 2500 hashes (mix of present + absent) | Result is one-row-per-distinct-hash, most-recent across (source, flight). Order is hash-keyed; caller re-aligns to request order. | AZ-505 AC-1 / AC-4. |
| backfill-completeness | Migration 013 against a snapshot DB with N pre-existing rows | Post-migration row count is N; every row has source='google_maps' and captured_at = created_at. | AZ-484 AC-4. |
| location-hash-backfill | Migration 014 against a snapshot DB after AZ-484 has applied | Every row has non-NULL location_hash matching the application-side UUIDv5 for that row's (tile_zoom, tile_x, tile_y). | AZ-503-foundation guarantee. |
| invalid-source | Direct SQL insert with source='satar' (not in enum) | Repository read either rejects deserialization or raises a contract violation; behavior MUST surface the violation, not swallow it. | Inv-1 + coderule.mdc "never suppress errors silently". |
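
The bulk-inventory-ordering case notes that the repository result is hash-keyed and the caller re-aligns it to request order. One way to picture that step is sketched below. The function name and the output keys (present, file_path) are hypothetical; the real response shape is defined in tile-inventory.md, not here.

```python
import uuid
from typing import List, Mapping, Sequence


def align_to_request(requested: Sequence[uuid.UUID],
                     found: Mapping[uuid.UUID, dict]) -> List[dict]:
    """Emit one entry per requested hash, in request order.

    `found` maps location_hash to the single most-recent row returned by
    the bulk lookup; hashes with no stored tile are reported as absent.
    """
    shaped = []
    for location_hash in requested:
        row = found.get(location_hash)
        shaped.append({
            "location_hash": location_hash,
            "present": row is not None,
            "file_path": row["file_path"] if row is not None else None,
        })
    return shaped
```

Indexing the returned rows by hash first keeps the re-alignment linear in the number of requested hashes.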

Change Log

| Version | Date | Change | Author |
|---|---|---|---|
| 1.0.0 | 2026-05-11 | Initial contract: multi-source schema (source, captured_at), 5-column unique key, most-recent-across-sources read rule. Produced by AZ-484. | autodev (Step 9) |
| 2.0.0 | 2026-05-12 | MAJOR. Identity columns + covering-index freeze. Added columns: flight_id (per-UAV-flight, nullable), location_hash (UUIDv5, NOT NULL), content_sha256 (BYTEA, app-NOT-NULL), legacy_id (pre-AZ-503 random id preserved one cycle). Replaced AZ-484 idx_tiles_unique_location_source (lat/lon-keyed) with idx_tiles_unique_identity (integer slippy + per-flight, NULL-coalesced) in migration 014. Added covering index tiles_leaflet_path (location_hash, captured_at DESC, updated_at DESC, id DESC) INCLUDE (file_path, source) in migration 015. Rewrote GetByTileCoordinatesAsync to filter on location_hash (behavior-preserving, since the same deterministic UUIDv5 is computed on both ends) to enable index-only scan on the leaflet hot path. Added GetTilesByLocationHashesAsync for the AZ-505 bulk inventory endpoint. Introduced Inv-7 / Inv-8 / Inv-9. Produced jointly by AZ-503-foundation (cycle 5, columns + identity index) and AZ-505 (cycle 6, covering index + location_hash-keyed reads + bulk inventory). Consumers reviewed at bump time: AZ-485 (uav-tile-upload.md v1.1.0), already aligned in cycle 5; AZ-505 (tile-inventory.md v1.0.0), produced jointly. | autodev (Step 10, cycle 6) |