# Tile schema — usage-scenario analysis & cross-workspace design decision
**Date**: 2026-05-12
**Trigger**: `AZ-304` (C6 Postgres schema) found contradictory inputs (AZ-263 bootstrap migration vs. AZ-303 contract vs. AZ-304 spec). Mid-Implement step the user requested a research pass over all tile-utilisation scenarios to verify whether the proposed schema is sufficient before committing.
**Type**: design / research artefact. Replays at next `/autodev` invocation as input to AZ-304 implementation. Spawns ONE cross-workspace task in `satellite-provider`.
**Related leftover**: `2026-05-09_satellite-provider-design-tasks.md` (D-PROJ-2 inbound ingest + voting layer — STILL OPEN). This document supersedes the "tile identity" portion of that leftover and re-uses the same multi-flight trust framing.
---
## 0. TL;DR
The proposed two-hash schema (`id = uuidv5(z, x, y, source, flight_id)` + `location_hash = uuidv5(z, x, y)` + integer-only UPSERT key + `content_sha256`) is **sufficient and strictly better than current** for all eight scenarios analysed. Two changes are required outside the schema itself:
1. **`satellite-provider`** — replace `Guid.NewGuid()` with UUIDv5, replace the float-based UPSERT conflict key with the integer slippy-tile key extended by `flight_id`, add `content_sha256`. Tracked as a new todo task in that workspace (see § 5 hand-off).
2. **`satellite-provider`** — add the `enumerate_remote_coverage` GET surface that the onboard `TileDownloader` (AZ-316) already references (`GET /api/satellite/tiles?bbox=...&zoom=...&list-only=true`). This endpoint does NOT exist in `SatelliteProvider.Api/Program.cs` today. Folded into the same satellite-provider todo.
No change needed to AZ-304's already-drafted schema for the onboard side — the new columns are additive.
---
## 1. Scenario inventory & access patterns
Researched in parallel across `gps-denied-onboard`, `ui/`, `ui/mission-planner/`, and `satellite-provider/`.
| # | Scenario | Driver | Access pattern | Where it lives |
|---|----------|--------|----------------|----------------|
| 1 | UI tile display (Leaflet) | Operator pans the map | `GET /tiles/{z}/{x}/{y}`: one tile per request, no filters | `ui/src/features/flights/{FlightMap,MiniMap}.tsx`, `mission-planner/src/flightPlanning/MapView.tsx`. Backed by `satellite-provider/.../Program.cs` MapGet `/tiles/{z:int}/{x:int}/{y:int}` → `TileService.GetOrDownloadTileAsync` → `TileRepository.GetByTileCoordinatesAsync` (AZ-484 "most-recent across sources, deterministic tie-break") |
| 2 | Pre-flight cache provisioning (operator) | Operator submits flight plan JSON (geofences + waypoints), C11 downloads required tiles, C10 builds engines/manifest | bbox + zoom_levels → enumerate (z, x, y) → bulk fetch from satellite-provider, write to local C6 | UI side: `mission-planner/.../flightPlan.tsx` exports `{ geofences.polygons[].{northWest, southEast}, action_points[] }` JSON. Onboard: `C11 TileDownloader.enumerate_remote_coverage` + `download_tiles_for_area` (AZ-316); `C10 CacheProvisioner.build_cache_artifacts` (AZ-325). Persistence: AZ-303 + AZ-304/305 (this repo). |
| 3 | Post-flight UAV tile upload | After landing, the UAV uploads tiles it captured mid-flight | Onboard: enumerate pending tiles by `(source='onboard_ingest', voting_status='pending')` → POST multipart → satellite-provider stores with `source='uav'` | Onboard: `C13 FDR tile_snapshot_sink.py` (capture) + `C11 TileUploader` (AZ-319 task spec). Satellite-provider: `/api/satellite/upload` → `UavTileUploadHandler` → `TileRepository.InsertAsync` (currently UPSERT on `(latitude, longitude, tile_zoom, tile_size_meters, source)`). |
| 4 | Onboard nav-time tile fetch | C2 SuperGlue / C2.5 rerank needs a local tile under the UAV's current pose estimate | `(zoom, lat, lon)` neighborhood lookup OR pre-computed `(zoom, tile_x, tile_y)` — local-only, no network | C6 in this repo. Hot path. |
| 5 | Freshness gate / eviction | AZ-307 freshness rules + AZ-308 budget enforcement | Range scans on `captured_at`, `sector_class`, `last_accessed_at`, `byte_count` | C6 in this repo. |
| 6 | Multi-flight trust promotion (future) | D-PROJ-2 voting layer in satellite-provider | "All `source='uav'` tiles for this `(z, x, y)` grouped by flight_id, run voting" | Satellite-provider future work — currently NO implementation. See `2026-05-09_satellite-provider-design-tasks.md` Design Task #2. |
| 7 | Bulk replay / re-grade | Operator re-runs quality grading or manifest rebuild | Stream by `id` or `captured_at` | Either repo. |
| 8 | Per-flight tile inspection UI | Operator wants to see what flight X uploaded | `WHERE flight_id = ? ORDER BY captured_at` | Satellite-provider read path; future UI page. |
---
## 2. Schema sufficiency per scenario
Proposed schema columns:
```
id uuid PRIMARY KEY = uuidv5(NAMESPACE, "${z}/${x}/${y}/${source}/${flight_id or 'none'}")
location_hash uuid NOT NULL = uuidv5(NAMESPACE, "${z}/${x}/${y}")
zoom_level smallint NOT NULL
tile_x integer NOT NULL
tile_y integer NOT NULL
latitude double precision NOT NULL -- center, derived, advisory
longitude double precision NOT NULL -- center, derived, advisory
tile_size_meters double precision NOT NULL
source text NOT NULL CHECK (source IN ('google_maps', 'uav', 'onboard_ingest'))
flight_id uuid NULL
content_sha256 bytea NOT NULL -- 32 bytes; tile JPEG digest
captured_at timestamptz NOT NULL
...
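UNIQUE INDEX (zoom_level, tile_x, tile_y, tile_size_meters, source, COALESCE(flight_id, '00000000-0000-0000-0000-000000000000'::uuid)) -- contains an expression, so it must be a unique index rather than a table-level UNIQUE constraint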
INDEX btree (location_hash) -- scenario 1, 6
INDEX btree (zoom_level, tile_x, tile_y) -- scenario 2, 4
INDEX btree (flight_id, captured_at) -- scenario 8
INDEX btree (sector_class, captured_at) -- scenario 5
```
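The two hashes above can be sketched in Python with the standard `uuid` module. This is a minimal sketch, not the agreed implementation: `TILE_NAMESPACE` is a hypothetical placeholder (the real NAMESPACE value is not given in this document and must be shared by both workspaces), and the function names are illustrative.

```python
import uuid
from typing import Optional

# ASSUMPTION: placeholder namespace. The real NAMESPACE constant must be
# identical in gps-denied-onboard and satellite-provider.
TILE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.invalid/tiles")


def location_hash(z: int, x: int, y: int) -> uuid.UUID:
    """Cell identity: identical for every source and flight covering (z, x, y)."""
    return uuid.uuid5(TILE_NAMESPACE, f"{z}/{x}/{y}")


def tile_id(z: int, x: int, y: int, source: str,
            flight_id: Optional[uuid.UUID]) -> uuid.UUID:
    """Row identity: one id per (cell, source, flight)."""
    # str(UUID) yields the lowercase hyphenated form; the other workspace
    # must format flight_id the same way for the ids to match.
    flight_part = str(flight_id) if flight_id is not None else "none"
    return uuid.uuid5(TILE_NAMESPACE, f"{z}/{x}/{y}/{source}/{flight_part}")


flight_a = uuid.UUID("11111111-1111-1111-1111-111111111111")
flight_b = uuid.UUID("22222222-2222-2222-2222-222222222222")
id_a = tile_id(18, 12345, 23456, "uav", flight_a)
id_b = tile_id(18, 12345, 23456, "uav", flight_b)
```

Here `id_a` and `id_b` differ while both rows share one `location_hash`, which is the property scenarios 1 and 6 rely on.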
### Scenario 1 — UI tile display ✅
- Current: `WHERE tile_zoom=? AND tile_x=? AND tile_y=? ORDER BY captured_at DESC, ... LIMIT 1` — 3-column compound predicate.
- Proposed: `WHERE location_hash=? ORDER BY captured_at DESC LIMIT 1`, a single equality probe on a uuid column, served by the `location_hash` btree.
- Modest performance win; major correctness win: the cell-bag "give me everything for this (z, x, y)" becomes natural for future season-toggle UI.
### Scenario 2 — Pre-flight provisioning ✅
- Mission-planner submits `{ northWest: {lat, lon}, southEast: {lat, lon} }` rectangles. Onboard converts to slippy-tile ranges via `helpers/wgs_converter.py.latlon_to_tile_xy` (same formula as C# `GeoUtils.WorldToTilePos`, verified deterministic in the prior turn).
- The enumerated `(z, x, y)` list goes to satellite-provider's bulk-fetch endpoint, which today **does not exist**. The closest is `GET /api/satellite/tiles/latlon` (single tile by lat/lon) and there's a private `GetTilesByRegionAsync` (uses double-comparison `latitude BETWEEN ... AND longitude BETWEEN ...` — also imprecise on floats and not exposed over HTTP).
- The schema supports this fine via `(zoom_level, tile_x, tile_y)` btree. The missing piece is the HTTP surface — that's part of the new satellite-provider task.
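The rectangle-to-tile enumeration described above can be sketched as follows. It uses the standard Web Mercator slippy-tile formula and is only assumed to match `latlon_to_tile_xy` and `GeoUtils.WorldToTilePos`; the function names here are hypothetical, and rectangles crossing the antimeridian are not handled.

```python
import math
from typing import Iterator, Tuple


def latlon_to_tile(lat_deg: float, lon_deg: float, zoom: int) -> Tuple[int, int]:
    """Standard slippy-tile formula (Web Mercator), clamped to the valid range."""
    n = 1 << zoom
    lat_rad = math.radians(lat_deg)
    x = math.floor((lon_deg + 180.0) / 360.0 * n)
    y = math.floor((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def enumerate_tiles(north_west: Tuple[float, float],
                    south_east: Tuple[float, float],
                    zoom: int) -> Iterator[Tuple[int, int, int]]:
    """Yield every (z, x, y) covering a (lat, lon) rectangle.

    ASSUMPTION: the rectangle does not cross the antimeridian.
    """
    x0, y0 = latlon_to_tile(north_west[0], north_west[1], zoom)
    x1, y1 = latlon_to_tile(south_east[0], south_east[1], zoom)
    for x in range(x0, x1 + 1):
        for y in range(y0, y1 + 1):  # tile y grows southwards
            yield zoom, x, y


small_area = list(enumerate_tiles((50.01, 30.0), (50.0, 30.01), 10))
whole_world = list(enumerate_tiles((85.0, -180.0), (-85.0, 179.9), 1))
```

Because the output is integer `(z, x, y)` triples, the list can be matched against the `(zoom_level, tile_x, tile_y)` btree without any float comparison.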
### Scenario 3 — UAV tile upload ⚠️ requires UPSERT-key change
- **Current bug**: UPSERT conflict key is `(latitude, longitude, tile_zoom, tile_size_meters, source)`. The first two are `double precision`, so two POSTs of the "same" tile conflict only when their independently computed `tile_center_latitude` values are bit-identical, which is fragile: any rounding difference produces a duplicate row instead of an update. AZ-484 partially papered over this with `DISTINCT ON` + ORDER BY tie-break on read, but the duplicate row still exists.
- **Bigger issue**: even if floats were stable, the UPSERT collapses multi-flight `uav` rows for the same cell into ONE row, with `DO UPDATE` overwriting `file_path` + `tile_x` + `tile_y` + `captured_at`. **This destroys per-flight evidence that D-PROJ-2's voting layer (Design Task #2 from the prior leftover) needs.** Flight A's tile is lost the moment Flight B uploads the same cell.
- **Fix** (part of new satellite-provider task): switch UPSERT to integer-only key `(zoom_level, tile_x, tile_y, tile_size_meters, source, COALESCE(flight_id, '00000000-0000-0000-0000-000000000000'::uuid))`. Now Flight A and Flight B each get their own row; voting can see both.
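The difference between the two conflict keys can be illustrated with an in-memory stand-in for the table. This is a sketch only: a Python dict plays the role of the UPSERT target, and the row shape and helper names are hypothetical.

```python
import uuid
from typing import Any, Callable, Dict, Tuple

NIL_FLIGHT = uuid.UUID(int=0)  # same sentinel as the COALESCE default
Row = Dict[str, Any]


def old_key(row: Row) -> Tuple:
    """Current conflict key: float-based, no flight_id."""
    return (row["latitude"], row["longitude"], row["zoom_level"],
            row["tile_size_meters"], row["source"])


def new_key(row: Row) -> Tuple:
    """Proposed conflict key: slippy-tile integers plus flight_id."""
    flight = row["flight_id"] if row["flight_id"] is not None else NIL_FLIGHT
    return (row["zoom_level"], row["tile_x"], row["tile_y"],
            row["tile_size_meters"], row["source"], flight)


def upsert(table: Dict[Tuple, Row], key_fn: Callable[[Row], Tuple], row: Row) -> None:
    """Dict assignment mimics INSERT ... ON CONFLICT (key) DO UPDATE."""
    table[key_fn(row)] = row


def make_row(flight_id: uuid.UUID, file_path: str) -> Row:
    return {"latitude": 50.0, "longitude": 30.0, "zoom_level": 18,
            "tile_x": 12345, "tile_y": 23456, "tile_size_meters": 100.0,
            "source": "uav", "flight_id": flight_id, "file_path": file_path}


flight_a, flight_b = uuid.uuid4(), uuid.uuid4()
old_table: Dict[Tuple, Row] = {}
new_table: Dict[Tuple, Row] = {}
for row in (make_row(flight_a, "a.jpg"), make_row(flight_b, "b.jpg")):
    upsert(old_table, old_key, row)
    upsert(new_table, new_key, row)
```

With the old key, Flight B overwrites Flight A and `old_table` holds one row. With the new key, `new_table` holds both, so per-flight evidence survives.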
### Scenario 4 — Onboard nav-time fetch ✅
- C6-local: `(zoom_level, tile_x, tile_y)` btree gives O(log N) point lookup.
- A small neighborhood query (e.g. fetch 3×3 tiles around current UAV pose) is a range scan on the same index. **Schema sufficient.**
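A neighborhood lookup like the 3×3 case above reduces to enumerating integer tile coordinates around the current cell. The sketch below is an assumption about how the caller might build that list; the function name is hypothetical, and wrapping `x` across the antimeridian is a design choice rather than something this document specifies.

```python
from typing import List, Tuple


def neighborhood(z: int, x: int, y: int, radius: int = 1) -> List[Tuple[int, int, int]]:
    """Return the (2*radius+1)^2 tiles around (x, y) at zoom z.

    x wraps around the antimeridian; rows above or below the map are dropped.
    """
    n = 1 << z
    tiles = []
    for dy in range(-radius, radius + 1):
        ty = y + dy
        if not 0 <= ty < n:
            continue  # outside the Web Mercator square
        for dx in range(-radius, radius + 1):
            tiles.append((z, (x + dx) % n, ty))
    # At very low zoom, wrapping can produce repeats; keep the first of each.
    return list(dict.fromkeys(tiles))


around_pose = neighborhood(18, 12345, 23456)
at_corner = neighborhood(2, 0, 0)
```

Each returned triple is an exact-match probe on the `(zoom_level, tile_x, tile_y)` btree, so the hot path never compares floats.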
### Scenario 5 — Freshness gate & eviction ✅
- Btree on `(sector_class, captured_at)` covers AZ-307's "active_conflict ≥30d → reject / DOWNGRADE" rule.
- Btree on `(last_accessed_at)` covers AZ-308's LRU eviction.
- Schema sufficient (these indices already in `0001_initial.py`).
### Scenario 6 — Multi-flight trust promotion ✅
- Voting query: `SELECT flight_id, file_path, quality_metadata FROM tiles WHERE location_hash=? AND source='uav'`.
- A single index probe on `location_hash` returns the candidate set; aggregation happens in app code. **Schema sufficient.** Note: the trust layer is owned by satellite-provider (Design Task #2, still open).
### Scenario 7 — Bulk replay ✅ — standard.
### Scenario 8 — Per-flight inspection ✅ — `(flight_id, captured_at)` btree.
---
## 3. Gaps surfaced by the analysis (NOT schema gaps — surrounding code/API gaps)
| # | Gap | Owner | Effort |
|---|-----|-------|--------|
| G1 | satellite-provider uses `Guid.NewGuid()` for tile id (non-deterministic). | satellite-provider | small (replace with UUIDv5 generator) |
| G2 | satellite-provider UPSERT key uses doubles + omits `flight_id`. | satellite-provider | small (rewrite SQL + migration) |
| G3 | satellite-provider lacks `content_sha256` for content-addressable dedup. | satellite-provider | small (column + compute on insert) |
| G4 | satellite-provider lacks `GET /api/satellite/tiles?bbox=...&zoom=...&list-only=true` — referenced by onboard `TileDownloader` (AZ-316) but not implemented. | satellite-provider | medium (new MapGet + service method + tile-coord enumeration shared with onboard) |
| G5 | satellite-provider lacks the inbound ingest + voting endpoints from D-PROJ-2 leftover. | satellite-provider | large — already filed in `2026-05-09_satellite-provider-design-tasks.md` |
| G6 | Cross-workspace WGS↔tile coordinate math is duplicated (`helpers/wgs_converter.py` + `GeoUtils.cs`) with identical formulas. | suite | low priority — both sides match within 1 ULP. Document equivalence; do not refactor. |
G1–G4 form ONE coherent satellite-provider task (the schema/UPSERT change + the bulk-list endpoint).
G5 stays as its own task per the prior leftover.
---
## 4. Onboard impact
For this repo (`gps-denied-onboard`), the conclusion is:
- **AZ-304 still goes forward with the proposed schema** — it's strictly compatible with all scenarios above.
- **AZ-304 should add `location_hash` and `content_sha256` columns** to the `tiles` table — additive migration on top of `0001_initial.py`. This was already in the proposed Option C from the prior turn; this analysis confirms it.
- **No `flight_id` column changes** required onboard — already present.
- **AZ-303 contract update**: the `TileMetadataStore` Protocol's DTO must learn `location_hash: uuid.UUID` and `content_sha256: bytes` fields. Marked as a non-breaking minor bump (per the contract's versioning rules).
- **AZ-316 TileDownloader** (not yet implemented) will continue to call `enumerate_remote_coverage` against satellite-provider's planned `GET /api/satellite/tiles?bbox=...&list-only=true`. Until that endpoint exists, the onboard downloader must use per-tile GETs via `/tiles/{z}/{x}/{y}`, which is slower but functional. Add this caveat to AZ-316.
---
## 5. Hand-off — what happens next
### Inside `gps-denied-onboard` (this repo)
- This document remains in `_docs/_process_leftovers/` until AZ-304 implementation incorporates it. Next `/autodev` invocation should:
1. Read this file.
2. Confirm the proposed schema (Option C from the prior turn + `location_hash`/`content_sha256`) with the user.
3. Execute AZ-304 as an **additive** `0002` migration on top of `0001_initial.py`.
- The autodev state file is updated to reflect parking AZ-304 implementation pending user confirmation of the design.
### Inside `satellite-provider` (cross-workspace)
A new todo task spec is written at:
`/Users/obezdienie001/dev/azaion/suite/satellite-provider/_docs/02_tasks/todo/AZ-TBD_tile_identity_uuidv5_bulk_list.md`
Next `/autodev` in that workspace must:
1. Claim a real `AZ-NNN` Jira ID for it.
2. Pull G1–G4 from § 3 above into a single PBI (estimated 3pt: moderate scope, low risk, additive migration).
3. Coordinate the migration via a feature flag: dual-write both `Guid.NewGuid()` and `uuidv5(...)` for a deprecation window, OR a one-time backfill that recomputes `id` and `location_hash` for all existing rows.
---
## 6. Retrieval efficiency & on-disk layout (added in the same session)
### 6.1 Why "multiple versions per cell" stays cheap
The realistic row count per `location_hash` is bounded: **1 `google_maps` + ~1–5 `uav` rows from prior flights = 2–6 rows total**. That bound is what keeps "find best" cheap; a materialised pointer table is NOT needed at the bounded scale.
**Q1 — Leaflet `/tiles/{z}/{x}/{y}` hot path** — required covering index:
```sql
CREATE INDEX tiles_leaflet_path
ON tiles (location_hash, captured_at DESC, updated_at DESC, id DESC)
INCLUDE (file_path, content_type, etag, voting_status);
```
Query:
```sql
SELECT file_path, content_type, etag FROM tiles
WHERE location_hash = $1
AND (voting_status = 'trusted' OR voting_status IS NULL)  -- IN (..., NULL) never matches NULL rows
ORDER BY captured_at DESC, updated_at DESC, id DESC
LIMIT 1;
```
Layers above the DB do most of the work: `MemoryCache` keyed `tile_{z}_{x}_{y}` (L1, µs) → OS page cache (L2, sub-ms) → DB only on cold path. Index-only scan at N=6 ≈ 6 µs.
**Q2 — Pre-flight provisioning** — replace the proposed `GET /api/satellite/tiles?bbox=...&list-only=true` with a POST inventory endpoint. The onboard side already has `helpers/wgs_converter.py.latlon_to_tile_xy` (deterministic, identical to C# `GeoUtils.WorldToTilePos`), so it can enumerate `(z, x, y)` locally and ask the server only "what do you have for THIS list?":
```
POST /api/satellite/tiles/inventory
Body: { "tiles": [{"z":18,"x":12345,"y":23456}, ...] }
Response: per-tile { id, location_hash, captured_at, resolution_m_per_px, estimated_bytes, source } | null
```
Server query (one round-trip, indexed via `location_hash` btree):
```sql
SELECT DISTINCT ON (location_hash)
location_hash, id, captured_at, resolution_m_per_px, estimated_bytes, source
FROM tiles
WHERE location_hash = ANY($1::uuid[])
AND (voting_status = 'trusted' OR voting_status IS NULL)  -- IN (..., NULL) never matches NULL rows
ORDER BY location_hash, captured_at DESC, updated_at DESC, id DESC;
```
At 2500 cells × N=6 = 15k rows scanned, `DISTINCT ON` collapses to 2500. Postgres handles this in ~100–200 ms cold, ~30 ms warm, well inside the 500 ms AC-9 budget.
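For readers less familiar with `DISTINCT ON`, the selection rule can be restated in plain Python: keep rows whose `voting_status` is `'trusted'` or NULL, then pick the newest row per `location_hash` with the same tie-break order. This is an illustrative sketch with a hypothetical `TileRow` type, not server code. Comparing ids as strings is an assumption that only matters on exact timestamp ties.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class TileRow:
    location_hash: str
    id: str
    captured_at: datetime
    updated_at: datetime
    voting_status: Optional[str]  # None models SQL NULL


def pick_current(rows: Iterable[TileRow]) -> Dict[str, TileRow]:
    """One winner per location_hash: latest captured_at, updated_at, then id."""
    best: Dict[str, TileRow] = {}
    for row in rows:
        if row.voting_status not in ("trusted", None):
            continue  # pending or rejected rows are never served
        current = best.get(row.location_hash)
        rank = (row.captured_at, row.updated_at, row.id)
        if current is None or rank > (current.captured_at, current.updated_at, current.id):
            best[row.location_hash] = row
    return best


def day(d: int) -> datetime:
    return datetime(2026, 5, d, tzinfo=timezone.utc)


rows = [
    TileRow("cell-a", "1", day(1), day(1), None),
    TileRow("cell-a", "2", day(5), day(5), "trusted"),
    TileRow("cell-a", "3", day(9), day(9), "pending"),
    TileRow("cell-b", "4", day(2), day(2), "pending"),
]
current = pick_current(rows)
```

In this example `cell-a` resolves to row `2` (the newest servable row), and `cell-b` has no entry, which corresponds to a `null` in the inventory response.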
**Defer the materialised `tile_current` pointer table** until production profiling demands it. Pre-optimisation is rejected.
### 6.2 On-disk layout — source-segregated, flight-keyed
Current state (already partially correct):
- `./tiles/{z}/{x}/{y}/...jpg` — Google Maps
- `./tiles/uav/{z}/{x}/{y}.jpg` — UAV (per `uav-tile-upload.md` v1.0.0)
Target layout:
```
./tiles/
google_maps/{z}/{x}/{y}.jpg # 1 file per cell, no flight_id
uav/{flight_id}/{z}/{x}/{y}.jpg # 1 file per flight per cell
onboard_ingest/{flight_id}/{z}/{x}/{y}.jpg # D-PROJ-2 future
```
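A path builder for the target layout could look like the sketch below. The function name, the `root` parameter, and the validation rules are assumptions for illustration; the value would be written to `file_path`, which stays authoritative.

```python
import uuid
from typing import Optional

FLIGHT_KEYED_SOURCES = {"uav", "onboard_ingest"}


def tile_path(source: str, z: int, x: int, y: int,
              flight_id: Optional[uuid.UUID] = None,
              root: str = "./tiles") -> str:
    """Build the on-disk path for a tile under the target layout."""
    if source == "google_maps":
        if flight_id is not None:
            raise ValueError("google_maps tiles are not flight-keyed")
        return f"{root}/google_maps/{z}/{x}/{y}.jpg"
    if source in FLIGHT_KEYED_SOURCES:
        if flight_id is None:
            raise ValueError(f"{source} tiles require a flight_id")
        return f"{root}/{source}/{flight_id}/{z}/{x}/{y}.jpg"
    raise ValueError(f"unknown source: {source}")


flight = uuid.UUID("11111111-1111-1111-1111-111111111111")
uav_path = tile_path("uav", 18, 12345, 23456, flight)
google_path = tile_path("google_maps", 18, 12345, 23456)
```

Keeping this logic in one function means the layout can change later without touching readers, since they only ever follow `file_path` from the DB.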
Properties:
- The DB `file_path` column is authoritative; FS layout is informational/predictable but never parsed by code.
- Natural sharding on `x` keeps leaf directories bounded.
- `flight_id` in the path lets ops nuke "everything from flight X" with `rm -r ./tiles/uav/{flight_id}/` — the same primitive D-PROJ-2 voting needs for adversarial rollback.
- **No content-addressable / blob storage.** At our scale CAS adds complexity without measurable benefit. `content_sha256` gives dedup *detection* without forcing dedup *storage*.
- Page cache + `MemoryCache` carry the hot path; SSDs make adjacent-file locality irrelevant.
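The "dedup detection without dedup storage" point can be sketched with `hashlib`: each tile keeps its own file, and the digest only reveals which files carry identical bytes. The helper names and the sample byte strings are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def content_sha256(jpeg_bytes: bytes) -> bytes:
    """32-byte digest, matching the `content_sha256 bytea` column."""
    return hashlib.sha256(jpeg_bytes).digest()


def find_duplicates(tiles: Iterable[Tuple[str, bytes]]) -> Dict[bytes, List[str]]:
    """Group file paths by digest and return only groups with more than one path."""
    groups: Dict[bytes, List[str]] = defaultdict(list)
    for path, data in tiles:
        groups[content_sha256(data)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}


sample = [
    ("./tiles/uav/flight-a/18/1/2.jpg", b"same-bytes"),
    ("./tiles/uav/flight-b/18/1/2.jpg", b"same-bytes"),
    ("./tiles/uav/flight-b/18/1/3.jpg", b"other-bytes"),
]
duplicates = find_duplicates(sample)
```

Both copies remain on disk, so removing one flight's directory never affects another flight's tiles.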
### 6.3 Bulk retrieval — HTTP/2 multiplexing, NOT application-level batching
The conventional "bulk fetch" instinct (one POST returns N tiles in a multipart body) is rejected. The right "bulk" mechanism is **HTTP/2 multiplexing** for both scenarios.
#### Leaflet path (`/tiles/{z}/{x}/{y}`)
Custom-TileLayer batching is rejected. Every per-tile property we want to keep — ETag-based 304s, browser cache, CDN/nginx edge cache, independent retry, display-as-arrives latency, conventional `image/jpeg` content-type — is lost the moment you bundle. HTTP/2 multiplexing gives the same wire-level efficiency (one TCP connection, header compression, response interleaving) while preserving every property above. This is what every production tile server (Mapbox, OSM, ArcGIS) does.
**Action**: enable HTTP/2 (and HTTP/3 when feasible) at the nginx + Kestrel boundary in `satellite-provider`. Zero application code changes.
Current state: neither side has HTTP/2 enabled today. ASP.NET Kestrel defaults to HTTP/1.1; flip `EndpointDefaults.Protocols = HttpProtocols.Http1AndHttp2` (or `Http1AndHttp2AndHttp3` over TLS). One-line change at the host config layer.
#### UAV provisioning path (no Leaflet constraint)
Multipart / tar / zip bundle responses are also rejected. Their downsides:
- Server holds N file handles open and streams serially within one response → less parallelism than HTTP/2 multistream.
- Resume granularity collapses to "the whole bundle".
- Cache key becomes the bundle, not the tile.
- Client needs a parser.
Per-tile GET over HTTP/2 wins on every axis, and the journal-based resume already designed in AZ-316 keeps its semantics intact.
PMTiles is excellent for STATIC pre-built tile sets (Cloudflare/Protomaps); our DB is dynamic (UAV uploads invalidate any pre-built archive). Defer PMTiles until profiling demands it.
The real efficiency win on this path is the **inventory pre-filter step** (already designed in § 6.1). Refined `TileDownloader` (AZ-316) flow:
1. C11 computes the `(z, x, y)` list locally (deterministic slippy math — verified in the prior turn).
2. `POST /api/satellite/tiles/inventory` with the coord list → metadata-only response (`captured_at`, `estimated_bytes`, `resolution_m_per_px`).
3. Run AZ-307 freshness gate + RESTRICT-SAT-4 resolution gate + journal-skip against metadata → produce a "keep list".
4. Issue per-tile `GET /tiles/{z}/{x}/{y}` for the keep list over an HTTP/2 multiplexed connection: `httpx.Client(http2=True, limits=httpx.Limits(max_connections=10, max_keepalive_connections=10))`.
5. Append journal per successful write; per-tile retry on transient failure; resume on partial-success.
In active-conflict zones where many tiles are stale or sub-spec, the inventory step lets C11 skip downloading bytes for 30–50% of the candidate set. That dwarfs any wire-level bulk-format gain.
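Step 3 of the flow above (turning inventory metadata into a keep list) can be sketched without any network code. Everything here is illustrative: the `InventoryEntry` type, the function name, and both thresholds are assumptions, and the real AZ-307 and RESTRICT-SAT-4 rules may be more involved.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional, Set, Tuple

Coord = Tuple[int, int, int]  # (z, x, y)


@dataclass(frozen=True)
class InventoryEntry:
    captured_at: datetime
    resolution_m_per_px: float
    estimated_bytes: int


def build_keep_list(inventory: Dict[Coord, Optional[InventoryEntry]],
                    journal_done: Set[Coord],
                    now: datetime,
                    max_age: timedelta,
                    max_m_per_px: float) -> List[Coord]:
    """Return the coords worth downloading, in inventory order."""
    keep: List[Coord] = []
    for coord, entry in inventory.items():
        if entry is None:
            continue  # server has nothing for this cell
        if coord in journal_done:
            continue  # already written in a previous, interrupted run
        if now - entry.captured_at > max_age:
            continue  # freshness gate (hypothetical threshold)
        if entry.resolution_m_per_px > max_m_per_px:
            continue  # resolution gate (hypothetical threshold)
        keep.append(coord)
    return keep


now = datetime(2026, 5, 12, tzinfo=timezone.utc)
fresh, stale = now - timedelta(days=3), now - timedelta(days=40)
inventory = {
    (18, 1, 1): InventoryEntry(fresh, 0.5, 40_000),
    (18, 1, 2): None,
    (18, 1, 3): InventoryEntry(stale, 0.5, 40_000),
    (18, 1, 4): InventoryEntry(fresh, 2.0, 40_000),
    (18, 1, 5): InventoryEntry(fresh, 0.5, 40_000),
}
keep = build_keep_list(inventory, journal_done={(18, 1, 5)}, now=now,
                       max_age=timedelta(days=30), max_m_per_px=1.0)
```

Only the surviving coords proceed to the per-tile GETs in step 4, so the bytes for the four rejected cells are never requested.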
### 6.4 Existing test that needs updating in the new satellite-provider task
`SatelliteProvider.Tests/UavTileFilePathTests.cs:23` — current expectation:
```
"UAV file paths follow `./tiles/uav/{zoom}/{x}/{y}.jpg` per `uav-tile-upload.md` v1.0.0"
```
becomes:
```
"UAV file paths follow `./tiles/uav/{flight_id}/{zoom}/{x}/{y}.jpg`"
```
The migration moves existing files into per-flight directories during the same backfill that adds `location_hash`/`content_sha256`. Mechanical operation; DB `file_path` column rewritten in the same transaction.
---
## 7. Why this is parked as a leftover (not blocking)
- The user explicitly asked for the analysis before deciding the AZ-304 reconciliation option. The decision is now informed, but the user has not yet picked A/B/C/D. The session is being closed for token reasons.
- Next `/autodev` will re-enter at the same decision point with this document and the prior turn's recommendation as input.