# Tile schema — usage-scenario analysis & cross-workspace design decision

**Date**: 2026-05-12

**Trigger**: `AZ-304` (C6 Postgres schema) found contradictory inputs (AZ-263 bootstrap migration vs. AZ-303 contract vs. AZ-304 spec). Midway through the Implement step, the user requested a research pass over all tile-utilisation scenarios to verify whether the proposed schema is sufficient before committing.

**Type**: design / research artefact. Replays at next `/autodev` invocation as input to AZ-304 implementation. Spawns ONE cross-workspace task in `satellite-provider`.

**Related leftover**: `2026-05-09_satellite-provider-design-tasks.md` (D-PROJ-2 inbound ingest + voting layer — STILL OPEN). This document supersedes the "tile identity" portion of that leftover and re-uses the same multi-flight trust framing.

---

## 0. TL;DR

The proposed two-hash schema (`id = uuidv5(z, x, y, source, flight_id)` + `location_hash = uuidv5(z, x, y)` + integer-only UPSERT key + `content_sha256`) is **sufficient and strictly better than current** for all eight scenarios analysed. Two changes are required outside the schema itself:

1. **`satellite-provider`** — replace `Guid.NewGuid()` with UUIDv5, replace the float-based UPSERT conflict key with the integer slippy-tile key extended by `flight_id`, add `content_sha256`. Tracked as a new todo task in that workspace (see § 5 hand-off).
2. **`satellite-provider`** — add the `enumerate_remote_coverage` GET surface that the onboard `TileDownloader` (AZ-316) already references (`GET /api/satellite/tiles?bbox=...&zoom=...&list-only=true`). This endpoint does NOT exist in `SatelliteProvider.Api/Program.cs` today. Folded into the same satellite-provider todo.

No change is needed to AZ-304's already-drafted schema for the onboard side — the new columns are additive.

---

## 1. Scenario inventory & access patterns

Researched in parallel across `gps-denied-onboard`, `ui/`, `ui/mission-planner/`, and `satellite-provider/`.

| # | Scenario | Driver | Access pattern | Where it lives |
|---|----------|--------|----------------|----------------|
| 1 | UI tile display (Leaflet) | Operator pans the map | `GET /tiles/{z}/{x}/{y}` — one tile per request, no filters | `ui/src/features/flights/{FlightMap,MiniMap}.tsx`, `mission-planner/src/flightPlanning/MapView.tsx`. Backed by `satellite-provider/.../Program.cs` MapGet `/tiles/{z:int}/{x:int}/{y:int}` → `TileService.GetOrDownloadTileAsync` → `TileRepository.GetByTileCoordinatesAsync` (AZ-484 "most-recent across sources, deterministic tie-break") |
| 2 | Pre-flight cache provisioning (operator) | Operator submits flight plan JSON (geofences + waypoints), C11 downloads required tiles, C10 builds engines/manifest | bbox + zoom_levels → enumerate (z, x, y) → bulk fetch from satellite-provider, write to local C6 | UI side: `mission-planner/.../flightPlan.tsx` exports `{ geofences.polygons[].{northWest, southEast}, action_points[] }` JSON. Onboard: `C11 TileDownloader.enumerate_remote_coverage` + `download_tiles_for_area` (AZ-316); `C10 CacheProvisioner.build_cache_artifacts` (AZ-325). Persistence: AZ-303 + AZ-304/305 (this repo). |
| 3 | Post-flight UAV tile upload | UAV captured mid-flight tiles after landing | Onboard: enumerate pending tiles by `(source='onboard_ingest', voting_status='pending')` → POST multipart → satellite-provider stores with `source='uav'` | Onboard: `C13 FDR tile_snapshot_sink.py` (capture) + `C11 TileUploader` (AZ-319 task spec). Satellite-provider: `/api/satellite/upload` → `UavTileUploadHandler` → `TileRepository.InsertAsync` (currently UPSERT on `(latitude, longitude, tile_zoom, tile_size_meters, source)`). |
| 4 | Onboard nav-time tile fetch | C2 SuperGlue / C2.5 rerank needs a local tile under the UAV's current pose estimate | `(zoom, lat, lon)` neighborhood lookup OR pre-computed `(zoom, tile_x, tile_y)` — local-only, no network | C6 in this repo. Hot path. |
| 5 | Freshness gate / eviction | AZ-307 freshness rules + AZ-308 budget enforcement | Range scans on `captured_at`, `sector_class`, `last_accessed_at`, `byte_count` | C6 in this repo. |
| 6 | Multi-flight trust promotion (future) | D-PROJ-2 voting layer in satellite-provider | "All `source='uav'` tiles for this `(z, x, y)` grouped by flight_id, run voting" | Satellite-provider future work — currently NO implementation. See `2026-05-09_satellite-provider-design-tasks.md` Design Task #2. |
| 7 | Bulk replay / re-grade | Operator re-runs quality grading or manifest rebuild | Stream by `id` or `captured_at` | Either repo. |
| 8 | Per-flight tile inspection UI | Operator wants to see what flight X uploaded | `WHERE flight_id = ? ORDER BY captured_at` | Satellite-provider read path; future UI page. |

---

## 2. Schema sufficiency per scenario

Proposed schema columns:

```
id                uuid PRIMARY KEY  = uuidv5(NAMESPACE, "${z}/${x}/${y}/${source}/${flight_id or 'none'}")
location_hash     uuid NOT NULL     = uuidv5(NAMESPACE, "${z}/${x}/${y}")
zoom_level        smallint NOT NULL
tile_x            integer NOT NULL
tile_y            integer NOT NULL
latitude          double precision NOT NULL   -- center, derived, advisory
longitude         double precision NOT NULL   -- center, derived, advisory
tile_size_meters  double precision NOT NULL
source            text NOT NULL CHECK (source IN ('google_maps', 'uav', 'onboard_ingest'))
flight_id         uuid NULL
content_sha256    bytea NOT NULL              -- 32 bytes; tile JPEG digest
captured_at       timestamptz NOT NULL
...

-- The key contains a COALESCE expression, so in Postgres it has to be a
-- unique index rather than a UNIQUE table constraint.
UNIQUE INDEX (zoom_level, tile_x, tile_y, tile_size_meters, source, COALESCE(flight_id, '00000000-0000-0000-0000-000000000000'::uuid))
INDEX btree (location_hash)                   -- scenario 1, 6
INDEX btree (zoom_level, tile_x, tile_y)      -- scenario 2, 4
INDEX btree (flight_id, captured_at)          -- scenario 8
INDEX btree (sector_class, captured_at)       -- scenario 5
```

### Scenario 1 — UI tile display ✅

- Current: `WHERE tile_zoom=? AND tile_x=? AND tile_y=? ORDER BY captured_at DESC, ... LIMIT 1` — 3-column compound predicate.
- Proposed: `WHERE location_hash=? ORDER BY captured_at DESC LIMIT 1` — single equality probe on a uuid column (hash-index-friendly).
- Modest performance win; major correctness win: the cell-bag "give me everything for this (z, x, y)" becomes natural for future season-toggle UI.

### Scenario 2 — Pre-flight provisioning ✅

- Mission-planner submits `{ northWest: {lat, lon}, southEast: {lat, lon} }` rectangles. Onboard converts to slippy-tile ranges via `helpers/wgs_converter.py.latlon_to_tile_xy` (same formula as C# `GeoUtils.WorldToTilePos`, verified deterministic in the prior turn).
- The enumerated `(z, x, y)` list goes to satellite-provider's bulk-fetch endpoint, which today **does not exist**. The closest is `GET /api/satellite/tiles/latlon` (single tile by lat/lon) and there's a private `GetTilesByRegionAsync` (uses double-comparison `latitude BETWEEN ... AND longitude BETWEEN ...` — also imprecise on floats and not exposed over HTTP).
- The schema supports this fine via `(zoom_level, tile_x, tile_y)` btree. The missing piece is the HTTP surface — that's part of the new satellite-provider task.

### Scenario 3 — UAV tile upload ⚠️ requires UPSERT-key change

- **Current bug**: UPSERT conflict key is `(latitude, longitude, tile_zoom, tile_size_meters, source)`. The first two are `double precision`. Two POSTs of the "same" tile with independently computed `tile_center_latitude` values hit the conflict key only when those values compare equal under Postgres `double precision` semantics, that is, when they are bit-identical, which is fragile. AZ-484 partially papered over this with `DISTINCT ON` + ORDER BY tie-break on read, but the duplicate row still exists.
- **Bigger issue**: even if floats were stable, the UPSERT collapses multi-flight `uav` rows for the same cell into ONE row, with `DO UPDATE` overwriting `file_path` + `tile_x` + `tile_y` + `captured_at`. **This destroys per-flight evidence that D-PROJ-2's voting layer (Design Task #2 from the prior leftover) needs.** Flight A's tile is lost the moment Flight B uploads the same cell.
- **Fix** (part of new satellite-provider task): switch UPSERT to integer-only key `(zoom_level, tile_x, tile_y, tile_size_meters, source, COALESCE(flight_id, '00000000-0000-0000-0000-000000000000'::uuid))`. Now Flight A and Flight B each get their own row; voting can see both.

### Scenario 4 — Onboard nav-time fetch ✅

- C6-local: `(zoom_level, tile_x, tile_y)` btree gives O(log N) point lookup.
- A small neighborhood query (e.g. fetch 3×3 tiles around current UAV pose) is a range scan on the same index. **Schema sufficient.**

### Scenario 5 — Freshness gate & eviction ✅

- Btree on `(sector_class, captured_at)` covers AZ-307's "active_conflict ≥30d → reject / DOWNGRADE" rule.
- Btree on `(last_accessed_at)` covers AZ-308's LRU eviction.
- Schema sufficient (these indices are already in `0001_initial.py`).

### Scenario 6 — Multi-flight trust promotion ✅

- Voting query: `SELECT flight_id, file_path, quality_metadata FROM tiles WHERE location_hash=? AND source='uav'`.
- A single index probe on `location_hash` returns the candidate set; aggregation happens in app code. **Schema sufficient.** Note: the trust layer is owned by satellite-provider (Design Task #2 — still open).

### Scenario 7 — Bulk replay ✅ — standard.

### Scenario 8 — Per-flight inspection ✅ — `(flight_id, captured_at)` btree.

---

## 3. Gaps surfaced by the analysis (NOT schema gaps — surrounding code/API gaps)

| # | Gap | Owner | Effort |
|---|-----|-------|--------|
| G1 | satellite-provider uses `Guid.NewGuid()` for tile id (non-deterministic). | satellite-provider | small (replace with UUIDv5 generator) |
| G2 | satellite-provider UPSERT key uses doubles + omits `flight_id`. | satellite-provider | small (rewrite SQL + migration) |
| G3 | satellite-provider lacks `content_sha256` for content-addressable dedup. | satellite-provider | small (column + compute on insert) |
| G4 | satellite-provider lacks `GET /api/satellite/tiles?bbox=...&zoom=...&list-only=true` — referenced by onboard `TileDownloader` (AZ-316) but not implemented. | satellite-provider | medium (new MapGet + service method + tile-coord enumeration shared with onboard) |
| G5 | satellite-provider lacks the inbound ingest + voting endpoints from D-PROJ-2 leftover. | satellite-provider | large — already filed in `2026-05-09_satellite-provider-design-tasks.md` |
| G6 | Cross-workspace WGS↔tile coordinate math is duplicated (`helpers/wgs_converter.py` + `GeoUtils.cs`) with identical formulas. | suite | low priority — both sides match within 1 ULP. Document equivalence; do not refactor. |

G1–G4 form ONE coherent satellite-provider task (the schema/UPSERT change + the bulk-list endpoint). G5 stays as its own task per the prior leftover.

---

## 4. Onboard impact

For this repo (`gps-denied-onboard`), the conclusion is:

- **AZ-304 still goes forward with the proposed schema** — it's strictly compatible with all scenarios above.
- **AZ-304 should add `location_hash` and `content_sha256` columns** to the `tiles` table — additive migration on top of `0001_initial.py`. This was already in the proposed Option C from the prior turn; this analysis confirms it.
- **No `flight_id` column changes** required onboard — already present.
- **AZ-303 contract update**: the `TileMetadataStore` Protocol's DTO must learn `location_hash: uuid.UUID` and `content_sha256: bytes` fields. Marked as a non-breaking minor bump (per the contract's versioning rules).
- **AZ-316 TileDownloader** (not yet implemented) will continue to call `enumerate_remote_coverage` against satellite-provider's planned `GET /api/satellite/tiles?bbox=...&list-only=true`. Until that endpoint exists, the onboard downloader must fall back to per-tile GETs via `/tiles/{z}/{x}/{y}` — slower but functional. Add this caveat to AZ-316.

---

## 5. Hand-off — what happens next

### Inside `gps-denied-onboard` (this repo)

- This document remains in `_docs/_process_leftovers/` until AZ-304 implementation incorporates it. Next `/autodev` invocation should:
  1. Read this file.
  2. Confirm the proposed schema (Option C from the prior turn + `location_hash`/`content_sha256`) with the user.
  3. Execute AZ-304 as an **additive** `0002` migration on top of `0001_initial.py`.
- The autodev state file is updated to reflect parking AZ-304 implementation pending user confirmation of the design.

### Inside `satellite-provider` (cross-workspace)

A new todo task spec is written at: `/Users/obezdienie001/dev/azaion/suite/satellite-provider/_docs/02_tasks/todo/AZ-TBD_tile_identity_uuidv5_bulk_list.md`

Next `/autodev` in that workspace must:

1. Claim a real `AZ-NNN` Jira ID for it.
2. Pull G1–G4 from § 3 above into a single PBI (estimated 3pt — moderate scope, low risk, additive migration).
3. Coordinate the migration via a feature flag: dual-write both `Guid.NewGuid()` and `uuidv5(...)` for a deprecation window, OR a one-time backfill that recomputes `id` and `location_hash` for all existing rows.

---

## 6. Retrieval efficiency & on-disk layout (added in the same session)

### 6.1 Why "multiple versions per cell" stays cheap

The realistic row count per `location_hash` is bounded: **1 `google_maps` + ~1–5 `uav` rows from prior flights = 2–6 rows total**. That bound is what keeps "find best" cheap; a materialised pointer table is NOT needed at this bounded scale.
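
The bound above rests on the two identifiers from § 2: every row for a cell shares one `location_hash`, while `id` differs per source and flight. A minimal Python sketch of that derivation follows. The namespace value is a placeholder (this document does not state the real `NAMESPACE`), and the helper names are illustrative, not the repo's actual API.

```python
import hashlib
import uuid
from typing import Optional

# ASSUMPTION: placeholder namespace; the real NAMESPACE is not given in this document.
TILE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.invalid/tiles")


def location_hash(z: int, x: int, y: int) -> uuid.UUID:
    """Cell identity: identical for every source and flight covering (z, x, y)."""
    return uuid.uuid5(TILE_NAMESPACE, f"{z}/{x}/{y}")


def tile_id(z: int, x: int, y: int, source: str,
            flight_id: Optional[uuid.UUID]) -> uuid.UUID:
    """Row identity: one deterministic id per (cell, source, flight)."""
    flight_part = str(flight_id) if flight_id is not None else "none"
    return uuid.uuid5(TILE_NAMESPACE, f"{z}/{x}/{y}/{source}/{flight_part}")


def content_sha256(jpeg_bytes: bytes) -> bytes:
    """32-byte digest of the tile payload, suitable for a bytea column."""
    return hashlib.sha256(jpeg_bytes).digest()
```

Two flights uploading the same cell would produce the same `location_hash` but different `id` values, which is what lets the voting layer see both rows.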
**Q1 — Leaflet `/tiles/{z}/{x}/{y}` hot path** — required covering index:

```sql
CREATE INDEX tiles_leaflet_path
    ON tiles (location_hash, captured_at DESC, updated_at DESC, id DESC)
    INCLUDE (file_path, content_type, etag, voting_status);
```

Query:

```sql
-- Note: `voting_status IN ('trusted', NULL)` would never match NULL rows,
-- because a comparison with NULL yields NULL; test for NULL explicitly.
SELECT file_path, content_type, etag
FROM tiles
WHERE location_hash = $1
  AND (voting_status = 'trusted' OR voting_status IS NULL)
ORDER BY captured_at DESC, updated_at DESC, id DESC
LIMIT 1;
```

Layers above the DB do most of the work: `MemoryCache` keyed `tile_{z}_{x}_{y}` (L1, µs) → OS page cache (L2, sub-ms) → DB only on cold path. Index-only scan at N=6 ≈ 6 µs.

**Q2 — Pre-flight provisioning** — replace the proposed `GET /api/satellite/tiles?bbox=...&list-only=true` with a POST inventory endpoint. The onboard side already has `helpers/wgs_converter.py.latlon_to_tile_xy` (deterministic, identical to C# `GeoUtils.WorldToTilePos`), so it can enumerate `(z, x, y)` locally and ask the server only "what do you have for THIS list?":

```
POST /api/satellite/tiles/inventory
Body:     { "tiles": [{"z":18,"x":12345,"y":23456}, ...] }
Response: per-tile { id, location_hash, captured_at, resolution_m_per_px, estimated_bytes, source } | null
```

Server query (one round-trip, indexed via `location_hash` btree):

```sql
SELECT DISTINCT ON (location_hash)
       location_hash, id, captured_at, resolution_m_per_px, estimated_bytes, source
FROM tiles
WHERE location_hash = ANY($1::uuid[])
  AND (voting_status = 'trusted' OR voting_status IS NULL)
ORDER BY location_hash, captured_at DESC, updated_at DESC, id DESC;
```

At 2500 cells × N=6 = 15k rows scanned, `DISTINCT ON` collapses to 2500. Postgres handles this in roughly 100–200 ms cold and ~30 ms warm, well inside the 500 ms AC-9 budget.

**Defer the materialised `tile_current` pointer table** until production profiling demands it. Premature optimisation is rejected.
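
As an illustration of the client side of Q2, the sketch below enumerates the `(z, x, y)` cells of one mission-planner rectangle and builds the inventory request body. It uses the standard Web Mercator slippy-tile formula as a stand-in for the repo's `latlon_to_tile_xy`. `enumerate_tiles` and `inventory_body` are hypothetical helper names, and antimeridian-crossing rectangles are assumed not to occur.

```python
import json
import math
from typing import List, Tuple

TileCoord = Tuple[int, int, int]  # (z, x, y)


def latlon_to_tile_xy(lat_deg: float, lon_deg: float, zoom: int) -> Tuple[int, int]:
    """Standard slippy-tile (Web Mercator) index of the tile containing a point."""
    n = 1 << zoom
    x = math.floor((lon_deg + 180.0) / 360.0 * n)
    y = math.floor((1.0 - math.asinh(math.tan(math.radians(lat_deg))) / math.pi) / 2.0 * n)
    # Clamp so points on the map edge stay inside the valid tile range.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def enumerate_tiles(north_west: Tuple[float, float],
                    south_east: Tuple[float, float],
                    zoom: int) -> List[TileCoord]:
    """All tiles intersecting a (lat, lon) rectangle; assumes no antimeridian crossing."""
    x_min, y_min = latlon_to_tile_xy(north_west[0], north_west[1], zoom)
    x_max, y_max = latlon_to_tile_xy(south_east[0], south_east[1], zoom)
    return [(zoom, x, y)
            for x in range(x_min, x_max + 1)
            for y in range(y_min, y_max + 1)]


def inventory_body(tiles: List[TileCoord]) -> str:
    """JSON body in the shape shown for POST /api/satellite/tiles/inventory."""
    return json.dumps({"tiles": [{"z": z, "x": x, "y": y} for z, x, y in tiles]})
```

Because the enumeration is pure integer math after the initial projection, client and server can agree on the cell list without exchanging any floating-point coordinates.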
### 6.2 On-disk layout — source-segregated, flight-keyed

Current state (already partially correct):

- `./tiles/{z}/{x}/{y}/...jpg` — Google Maps
- `./tiles/uav/{z}/{x}/{y}.jpg` — UAV (per `uav-tile-upload.md` v1.0.0)

Target layout:

```
./tiles/
  google_maps/{z}/{x}/{y}.jpg                  # 1 file per cell, no flight_id
  uav/{flight_id}/{z}/{x}/{y}.jpg              # 1 file per flight per cell
  onboard_ingest/{flight_id}/{z}/{x}/{y}.jpg   # D-PROJ-2 future
```

Properties:

- The DB `file_path` column is authoritative; FS layout is informational/predictable but never parsed by code.
- Natural sharding on `x` keeps leaf directories bounded.
- `flight_id` in the path lets ops nuke "everything from flight X" with `rm -r ./tiles/uav/{flight_id}/` — the same primitive D-PROJ-2 voting needs for adversarial rollback.
- **No content-addressable / blob storage.** At our scale CAS adds complexity without measurable benefit. `content_sha256` gives dedup *detection* without forcing dedup *storage*.
- Page cache + `MemoryCache` carry the hot path; SSDs make adjacent-file locality irrelevant.

### 6.3 Bulk retrieval — HTTP/2 multiplexing, NOT application-level batching

The conventional "bulk fetch" instinct (one POST returns N tiles in a multipart body) is rejected. The right "bulk" mechanism is **HTTP/2 multiplexing** for both scenarios.

#### Leaflet path (`/tiles/{z}/{x}/{y}`)

Custom-TileLayer batching is rejected. Every per-tile property we want to keep — ETag-based 304s, browser cache, CDN/nginx edge cache, independent retry, display-as-arrives latency, conventional `image/jpeg` content-type — is lost the moment you bundle. HTTP/2 multiplexing gives the same wire-level efficiency (one TCP connection, header compression, response interleaving) while preserving every property above. This is what every production tile server (Mapbox, OSM, ArcGIS) does.

**Action**: enable HTTP/2 (and HTTP/3 when feasible) at the nginx + Kestrel boundary in `satellite-provider`. Zero application code changes.
Current state: neither side has HTTP/2 enabled today. With the combined `Http1AndHttp2` setting, Kestrel negotiates HTTP/2 only over TLS (via ALPN) and serves HTTP/1.1 on cleartext endpoints, so set `EndpointDefaults.Protocols = HttpProtocols.Http1AndHttp2` (or `Http1AndHttp2AndHttp3` over TLS) explicitly on a TLS endpoint. One-line change at the host config layer.

#### UAV provisioning path (no Leaflet constraint)

Multipart / tar / zip bundle responses are also rejected. Their downsides:

- Server holds N file handles open and streams serially within one response → less parallelism than HTTP/2 multiplexed streams.
- Resume granularity collapses to "the whole bundle".
- Cache key becomes the bundle, not the tile.
- Client needs a parser.

Per-tile GET over HTTP/2 wins on every axis, and the journal-based resume already designed in AZ-316 keeps its semantics intact.

PMTiles is excellent for STATIC pre-built tile sets (Cloudflare/Protomaps); our DB is dynamic (UAV uploads invalidate any pre-built archive). Defer PMTiles until profiling demands it.

The real efficiency win on this path is the **inventory pre-filter step** (already designed in § 6.1). Refined `TileDownloader` (AZ-316) flow:

1. C11 computes the `(z, x, y)` list locally (deterministic slippy math — verified in the prior turn).
2. `POST /api/satellite/tiles/inventory` with the coord list → metadata-only response (`captured_at`, `estimated_bytes`, `resolution_m_per_px`).
3. Run AZ-307 freshness gate + RESTRICT-SAT-4 resolution gate + journal-skip against metadata → produce a "keep list".
4. Issue per-tile `GET /tiles/{z}/{x}/{y}` for the keep list over an HTTP/2 multiplexed connection: `httpx.Client(http2=True, limits=httpx.Limits(max_connections=10, max_keepalive_connections=10))`.
5. Append journal per successful write; per-tile retry on transient failure; resume on partial-success.

In active-conflict zones where many tiles are stale or sub-spec, the inventory step lets C11 skip downloading bytes for 30–50% of the candidate set. That dwarfs any wire-level bulk-format gain.
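
Step 3 of the flow above can be sketched as a pure function over the inventory metadata. This is an illustration only: `InventoryEntry`, `build_keep_list`, and both thresholds are hypothetical, and the actual AZ-307 and RESTRICT-SAT-4 rules are defined in their own specs rather than here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional, Set, Tuple

TileCoord = Tuple[int, int, int]  # (z, x, y)


@dataclass(frozen=True)
class InventoryEntry:
    """Metadata-only fields returned per tile by the inventory endpoint."""
    captured_at: datetime
    estimated_bytes: int
    resolution_m_per_px: float


def build_keep_list(inventory: Dict[TileCoord, Optional[InventoryEntry]],
                    journal_done: Set[TileCoord],
                    now: datetime,
                    max_age: timedelta,
                    max_resolution_m_per_px: float) -> List[TileCoord]:
    """Return the tiles worth downloading, in deterministic coordinate order."""
    keep: List[TileCoord] = []
    for coord in sorted(inventory):
        entry = inventory[coord]
        if entry is None:
            continue  # server has no tile for this cell
        if coord in journal_done:
            continue  # journal-skip: already written by an earlier run
        if now - entry.captured_at > max_age:
            continue  # freshness gate (threshold is an assumed parameter)
        if entry.resolution_m_per_px > max_resolution_m_per_px:
            continue  # resolution gate (threshold is an assumed parameter)
        keep.append(coord)
    return keep
```

Keeping this step free of I/O makes it straightforward to unit-test the gates separately from the HTTP/2 download loop in step 4.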
### 6.4 Existing test that needs updating in the new satellite-provider task

`SatelliteProvider.Tests/UavTileFilePathTests.cs:23` — current expectation:

```
"UAV file paths follow `./tiles/uav/{zoom}/{x}/{y}.jpg` per `uav-tile-upload.md` v1.0.0"
```

becomes:

```
"UAV file paths follow `./tiles/uav/{flight_id}/{zoom}/{x}/{y}.jpg`"
```

The migration moves existing files into per-flight directories during the same backfill that adds `location_hash`/`content_sha256`. Mechanical operation; DB `file_path` column rewritten in the same transaction.

---

## 7. Why this is parked as a leftover (not blocking)

- The user explicitly asked for the analysis before deciding the AZ-304 reconciliation option. Decision is now informed, but the user did not yet pick A/B/C/D. The session is being closed for token reasons.
- Next `/autodev` will re-enter at the same decision point with this document and the prior turn's recommendation as input.