diff --git a/_docs/_autodev_state.md b/_docs/_autodev_state.md index 0f036a7..c87e9f9 100644 --- a/_docs/_autodev_state.md +++ b/_docs/_autodev_state.md @@ -4,11 +4,11 @@ flow: greenfield step: 7 name: Implement -status: paused_user_requested +status: in_progress sub_step: - phase: 14.5 - name: cumulative-review-aftermath - detail: "batches 23-27 cumulative review: PASS_WITH_WARNINGS (3 Medium doc-drift findings on module-layout.md). User approved fix-now for findings, then chose stop_here. Module-layout.md updated; user will re-invoke /autodev when ready to continue." + phase: 14 + name: batch-loop + detail: "resumed after 23-27 cumulative review (PASS_WITH_WARNINGS, fixes landed in 1141d17); 27/~? batches done; 84 tasks remain in todo" retry_count: 0 cycle: 1 tracker: jira diff --git a/_docs/_process_leftovers/2026-05-12_tile-schema-scenario-analysis.md b/_docs/_process_leftovers/2026-05-12_tile-schema-scenario-analysis.md deleted file mode 100644 index 92bb542..0000000 --- a/_docs/_process_leftovers/2026-05-12_tile-schema-scenario-analysis.md +++ /dev/null @@ -1,272 +0,0 @@ -# Tile schema — usage-scenario analysis & cross-workspace design decision - -**Date**: 2026-05-12 -**Trigger**: `AZ-304` (C6 Postgres schema) found contradictory inputs (AZ-263 bootstrap migration vs. AZ-303 contract vs. AZ-304 spec). Mid-Implement step the user requested a research pass over all tile-utilisation scenarios to verify whether the proposed schema is sufficient before committing. -**Type**: design / research artefact. Replays at next `/autodev` invocation as input to AZ-304 implementation. Spawns ONE cross-workspace task in `satellite-provider`. -**Related leftover**: `2026-05-09_satellite-provider-design-tasks.md` (D-PROJ-2 inbound ingest + voting layer — STILL OPEN). This document supersedes the "tile identity" portion of that leftover and re-uses the same multi-flight trust framing. - ---- - -## 0. 
TL;DR - -The proposed two-hash schema (`id = uuidv5(z, x, y, source, flight_id)` + `location_hash = uuidv5(z, x, y)` + integer-only UPSERT key + `content_sha256`) is **sufficient and strictly better than current** for all eight scenarios analysed. Two changes are required outside the schema itself: - -1. **`satellite-provider`** — replace `Guid.NewGuid()` with UUIDv5, replace the float-based UPSERT conflict key with the integer slippy-tile key extended by `flight_id`, add `content_sha256`. Tracked as a new todo task in that workspace (see § 5 hand-off). -2. **`satellite-provider`** — add the `enumerate_remote_coverage` GET surface that the onboard `TileDownloader` (AZ-316) already references (`GET /api/satellite/tiles?bbox=...&zoom=...&list-only=true`). This endpoint does NOT exist in `SatelliteProvider.Api/Program.cs` today. Folded into the same satellite-provider todo. - -No change needed to AZ-304's already-drafted schema for the onboard side — the new columns are additive. - ---- - -## 1. Scenario inventory & access patterns - -Researched in parallel across `gps-denied-onboard`, `ui/`, `ui/mission-planner/`, and `satellite-provider/`. - -| # | Scenario | Driver | Access pattern | Where it lives | -|---|----------|--------|----------------|----------------| -| 1 | UI tile display (Leaflet) | Operator pans the map | `GET /tiles/{z}/{x}/{y}` — one tile per request, no filters | `ui/src/features/flights/{FlightMap,MiniMap}.tsx`, `mission-planner/src/flightPlanning/MapView.tsx`. 
Backed by `satellite-provider/.../Program.cs` MapGet `/tiles/{z:int}/{x:int}/{y:int}` → `TileService.GetOrDownloadTileAsync` → `TileRepository.GetByTileCoordinatesAsync` (AZ-484 "most-recent across sources, deterministic tie-break") | -| 2 | Pre-flight cache provisioning (operator) | Operator submits flight plan JSON (geofences + waypoints), C11 downloads required tiles, C10 builds engines/manifest | bbox + zoom_levels → enumerate (z, x, y) → bulk fetch from satellite-provider, write to local C6 | UI side: `mission-planner/.../flightPlan.tsx` exports `{ geofences.polygons[].{northWest, southEast}, action_points[] }` JSON. Onboard: `C11 TileDownloader.enumerate_remote_coverage` + `download_tiles_for_area` (AZ-316); `C10 CacheProvisioner.build_cache_artifacts` (AZ-325). Persistence: AZ-303 + AZ-304/305 (this repo). | -| 3 | Post-flight UAV tile upload | UAV captured mid-flight tiles after landing | Onboard: enumerate pending tiles by `(source='onboard_ingest', voting_status='pending')` → POST multipart → satellite-provider stores with `source='uav'` | Onboard: `C13 FDR tile_snapshot_sink.py` (capture) + `C11 TileUploader` (AZ-319 task spec). Satellite-provider: `/api/satellite/upload` → `UavTileUploadHandler` → `TileRepository.InsertAsync` (currently UPSERT on `(latitude, longitude, tile_zoom, tile_size_meters, source)`). | -| 4 | Onboard nav-time tile fetch | C2 SuperGlue / C2.5 rerank needs a local tile under the UAV's current pose estimate | `(zoom, lat, lon)` neighborhood lookup OR pre-computed `(zoom, tile_x, tile_y)` — local-only, no network | C6 in this repo. Hot path. | -| 5 | Freshness gate / eviction | AZ-307 freshness rules + AZ-308 budget enforcement | Range scans on `captured_at`, `sector_class`, `last_accessed_at`, `byte_count` | C6 in this repo. 
| -| 6 | Multi-flight trust promotion (future) | D-PROJ-2 voting layer in satellite-provider | "All `source='uav'` tiles for this `(z, x, y)` grouped by flight_id, run voting" | Satellite-provider future work — currently NO implementation. See `2026-05-09_satellite-provider-design-tasks.md` Design Task #2. | -| 7 | Bulk replay / re-grade | Operator re-runs quality grading or manifest rebuild | Stream by `id` or `captured_at` | Either repo. | -| 8 | Per-flight tile inspection UI | Operator wants to see what flight X uploaded | `WHERE flight_id = ? ORDER BY captured_at` | Satellite-provider read path; future UI page. | - ---- - -## 2. Schema sufficiency per scenario - -Proposed schema columns: - -``` -id uuid PRIMARY KEY = uuidv5(NAMESPACE, "${z}/${x}/${y}/${source}/${flight_id or 'none'}") -location_hash uuid NOT NULL = uuidv5(NAMESPACE, "${z}/${x}/${y}") -zoom_level smallint NOT NULL -tile_x integer NOT NULL -tile_y integer NOT NULL -latitude double precision NOT NULL -- center, derived, advisory -longitude double precision NOT NULL -- center, derived, advisory -tile_size_meters double precision NOT NULL -source text NOT NULL CHECK (source IN ('google_maps', 'uav', 'onboard_ingest')) -flight_id uuid NULL -content_sha256 bytea NOT NULL -- 32 bytes; tile JPEG digest -captured_at timestamptz NOT NULL -... -UNIQUE (zoom_level, tile_x, tile_y, tile_size_meters, source, COALESCE(flight_id, '00000000-0000-0000-0000-000000000000'::uuid)) -INDEX btree (location_hash) -- scenario 1, 6 -INDEX btree (zoom_level, tile_x, tile_y) -- scenario 2, 4 -INDEX btree (flight_id, captured_at) -- scenario 8 -INDEX btree (sector_class, captured_at) -- scenario 5 -``` - -### Scenario 1 — UI tile display ✅ -- Current: `WHERE tile_zoom=? AND tile_x=? AND tile_y=? ORDER BY captured_at DESC, ... LIMIT 1` — 3-column compound predicate. -- Proposed: `WHERE location_hash=? ORDER BY captured_at DESC LIMIT 1` — single equality probe on a uuid column (hash-index-friendly). 
-- Modest performance win; major correctness win: the cell-bag "give me everything for this (z, x, y)" becomes natural for future season-toggle UI. - -### Scenario 2 — Pre-flight provisioning ✅ -- Mission-planner submits `{ northWest: {lat, lon}, southEast: {lat, lon} }` rectangles. Onboard converts to slippy-tile ranges via `helpers/wgs_converter.py.latlon_to_tile_xy` (same formula as C# `GeoUtils.WorldToTilePos`, verified deterministic in the prior turn). -- The enumerated `(z, x, y)` list goes to satellite-provider's bulk-fetch endpoint, which today **does not exist**. The closest is `GET /api/satellite/tiles/latlon` (single tile by lat/lon) and there's a private `GetTilesByRegionAsync` (uses double-comparison `latitude BETWEEN ... AND longitude BETWEEN ...` — also imprecise on floats and not exposed over HTTP). -- The schema supports this fine via `(zoom_level, tile_x, tile_y)` btree. The missing piece is the HTTP surface — that's part of the new satellite-provider task. - -### Scenario 3 — UAV tile upload ⚠️ requires UPSERT-key change -- **Current bug**: UPSERT conflict key is `(latitude, longitude, tile_zoom, tile_size_meters, source)`. The first two are `double precision`. Two POSTs of the "same" tile with independently computed `tile_center_latitude` values hit the conflict key only when the doubles are bit-identical, which is fragile: any rounding difference inserts a duplicate row instead of updating the existing one. AZ-484 partially papered over this with `DISTINCT ON` + ORDER BY tie-break on read, but the duplicate row still exists. -- **Bigger issue**: even if floats were stable, the UPSERT collapses multi-flight `uav` rows for the same cell into ONE row, with `DO UPDATE` overwriting `file_path` + `tile_x` + `tile_y` + `captured_at`. **This destroys per-flight evidence that D-PROJ-2's voting layer (Design Task #2 from the prior leftover) needs.** Flight A's tile is lost the moment Flight B uploads the same cell.
-- **Fix** (part of new satellite-provider task): switch UPSERT to integer-only key `(zoom_level, tile_x, tile_y, tile_size_meters, source, COALESCE(flight_id, '00000000-0000-0000-0000-000000000000'::uuid))`. Now Flight A and Flight B each get their own row; voting can see both. - -### Scenario 4 — Onboard nav-time fetch ✅ -- C6-local: `(zoom_level, tile_x, tile_y)` btree gives O(log N) point lookup. -- A small neighborhood query (e.g. fetch 3×3 tiles around current UAV pose) is a range scan on the same index. **Schema sufficient.** - -### Scenario 5 — Freshness gate & eviction ✅ -- Btree on `(sector_class, captured_at)` covers AZ-307's "active_conflict ≥30d → reject / DOWNGRADE" rule. -- Btree on `(last_accessed_at)` covers AZ-308's LRU eviction. -- Schema sufficient (these indices already in `0001_initial.py`). - -### Scenario 6 — Multi-flight trust promotion ✅ -- Voting query: `SELECT flight_id, file_path, quality_metadata FROM tiles WHERE location_hash=? AND source='uav'`. -- Single hash-index hit on `location_hash` returns the candidate set; aggregation in app code. **Schema sufficient.** Note: the trust layer is owned by satellite-provider (Design Task #2 — still open). - -### Scenario 7 — Bulk replay ✅ — standard. -### Scenario 8 — Per-flight inspection ✅ — `(flight_id, captured_at)` btree. - ---- - -## 3. Gaps surfaced by the analysis (NOT schema gaps — surrounding code/API gaps) - -| # | Gap | Owner | Effort | -|---|-----|-------|--------| -| G1 | satellite-provider uses `Guid.NewGuid()` for tile id (non-deterministic). | satellite-provider | small (replace with UUIDv5 generator) | -| G2 | satellite-provider UPSERT key uses doubles + omits `flight_id`. | satellite-provider | small (rewrite SQL + migration) | -| G3 | satellite-provider lacks `content_sha256` for content-addressable dedup. 
| satellite-provider | small (column + compute on insert) | -| G4 | satellite-provider lacks `GET /api/satellite/tiles?bbox=...&zoom=...&list-only=true` — referenced by onboard `TileDownloader` (AZ-316) but not implemented. | satellite-provider | medium (new MapGet + service method + tile-coord enumeration shared with onboard) | -| G5 | satellite-provider lacks the inbound ingest + voting endpoints from D-PROJ-2 leftover. | satellite-provider | large — already filed in `2026-05-09_satellite-provider-design-tasks.md` | -| G6 | Cross-workspace WGS↔tile coordinate math is duplicated (`helpers/wgs_converter.py` + `GeoUtils.cs`) with identical formulas. | suite | low priority — both sides match within 1 ULP. Document equivalence; do not refactor. | - -G1–G4 form ONE coherent satellite-provider task (the schema/UPSERT change + the bulk-list endpoint). -G5 stays as its own task per the prior leftover. - ---- - -## 4. Onboard impact - -For this repo (`gps-denied-onboard`), the conclusion is: - -- **AZ-304 still goes forward with the proposed schema** — it's strictly compatible with all scenarios above. -- **AZ-304 should add `location_hash` and `content_sha256` columns** to the `tiles` table — additive migration on top of `0001_initial.py`. This was already in the proposed Option C from the prior turn; this analysis confirms it. -- **No `flight_id` column changes** required onboard — already present. -- **AZ-303 contract update**: the `TileMetadataStore` Protocol's DTO must gain `location_hash: uuid.UUID` and `content_sha256: bytes` fields. Marked as a non-breaking minor bump (per the contract's versioning rules). -- **AZ-316 TileDownloader** (not yet implemented) will continue to call `enumerate_remote_coverage` against satellite-provider's planned `GET /api/satellite/tiles?bbox=...&list-only=true`. Until that endpoint exists, the onboard downloader must use per-tile GETs via `/tiles/{z}/{x}/{y}` — slower but functional. Add this caveat to AZ-316. - ---- - -## 5.
Hand-off — what happens next - -### Inside `gps-denied-onboard` (this repo) -- This document remains in `_docs/_process_leftovers/` until AZ-304 implementation incorporates it. Next `/autodev` invocation should: - 1. Read this file. - 2. Confirm the proposed schema (Option C from the prior turn + `location_hash`/`content_sha256`) with the user. - 3. Execute AZ-304 as an **additive** `0002` migration on top of `0001_initial.py`. -- The autodev state file is updated to reflect parking AZ-304 implementation pending user confirmation of the design. - -### Inside `satellite-provider` (cross-workspace) -A new todo task spec is written at: - -`/Users/obezdienie001/dev/azaion/suite/satellite-provider/_docs/02_tasks/todo/AZ-TBD_tile_identity_uuidv5_bulk_list.md` - -Next `/autodev` in that workspace must: -1. Claim a real `AZ-NNN` Jira ID for it. -2. Pull G1–G4 from § 3 above into a single PBI (estimated 3pt — moderate scope, low risk, additive migration). -3. Coordinate the migration via a feature flag: dual-write both `Guid.NewGuid()` and `uuidv5(...)` for a deprecation window, OR a one-time backfill that recomputes `id` and `location_hash` for all existing rows. - ---- - -## 6. Retrieval efficiency & on-disk layout (added in the same session) - -### 6.1 Why "multiple versions per cell" stays cheap - -The realistic row count per `location_hash` is bounded: **1 `google_maps` + ~1–5 `uav` rows from prior flights = 2–6 rows total**. That bound is what keeps "find best" cheap; a materialised pointer table is NOT needed at the bounded scale. 
- -**Q1 — Leaflet `/tiles/{z}/{x}/{y}` hot path** — required covering index: - -```sql -CREATE INDEX tiles_leaflet_path -ON tiles (location_hash, captured_at DESC, updated_at DESC, id DESC) -INCLUDE (file_path, content_type, etag, voting_status); -``` - -Query: - -```sql -SELECT file_path, content_type, etag FROM tiles -WHERE location_hash = $1 - AND (voting_status = 'trusted' OR voting_status IS NULL) -ORDER BY captured_at DESC, updated_at DESC, id DESC -LIMIT 1; -``` - -Layers above the DB do most of the work: `MemoryCache` keyed `tile_{z}_{x}_{y}` (L1, µs) → OS page cache (L2, sub-ms) → DB only on cold path. Index-only scan at N=6 ≈ 6 µs. - -**Q2 — Pre-flight provisioning** — replace the proposed `GET /api/satellite/tiles?bbox=...&list-only=true` with a POST inventory endpoint. The onboard side already has `helpers/wgs_converter.py.latlon_to_tile_xy` (deterministic, identical to C# `GeoUtils.WorldToTilePos`), so it can enumerate `(z, x, y)` locally and ask the server only "what do you have for THIS list?": - -``` -POST /api/satellite/tiles/inventory -Body: { "tiles": [{"z":18,"x":12345,"y":23456}, ...] } -Response: per-tile { id, location_hash, captured_at, resolution_m_per_px, estimated_bytes, source } | null -``` - -Server query (one round-trip, indexed via `location_hash` btree): - -```sql -SELECT DISTINCT ON (location_hash) - location_hash, id, captured_at, resolution_m_per_px, estimated_bytes, source -FROM tiles -WHERE location_hash = ANY($1::uuid[]) - AND (voting_status = 'trusted' OR voting_status IS NULL) -ORDER BY location_hash, captured_at DESC, updated_at DESC, id DESC; -``` - -At 2500 cells × N=6 = 15k rows scanned, `DISTINCT ON` collapses to 2500. Postgres handles this ~100–200 ms cold, ~30 ms warm — well inside the 500 ms AC-9 budget. - -**Defer the materialised `tile_current` pointer table** until production profiling demands it. Premature optimisation is rejected.
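The inventory query above matches on `location_hash`, so any client that wants to talk in hashes has to derive the same UUIDv5 the server stores. A minimal Python sketch of the two hashes from § 2 follows. It assumes a placeholder namespace (the real `NAMESPACE` constant is not given in this document), hypothetical function names, and that both workspaces render `flight_id` in the same lowercase hyphenated form.

```python
import uuid
from typing import Optional

# ASSUMPTION: placeholder namespace. The project's real NAMESPACE value is not
# stated in this document and must be identical in both workspaces.
TILE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "example.invalid/tiles")


def location_hash(z: int, x: int, y: int) -> uuid.UUID:
    """Cell identity: the same for every source and flight covering (z, x, y)."""
    return uuid.uuid5(TILE_NAMESPACE, f"{z}/{x}/{y}")


def tile_id(z: int, x: int, y: int, source: str,
            flight_id: Optional[uuid.UUID]) -> uuid.UUID:
    """Row identity: one per (cell, source, flight); 'none' when flight_id is NULL."""
    flight_part = str(flight_id) if flight_id is not None else "none"
    return uuid.uuid5(TILE_NAMESPACE, f"{z}/{x}/{y}/{source}/{flight_part}")
```

Under these assumptions two flights over the same cell share one `location_hash` but get distinct `id` values, which is the property the voting query in Scenario 6 relies on.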
- -### 6.2 On-disk layout — source-segregated, flight-keyed - -Current state (already partially correct): -- `./tiles/{z}/{x}/{y}/...jpg` — Google Maps -- `./tiles/uav/{z}/{x}/{y}.jpg` — UAV (per `uav-tile-upload.md` v1.0.0) - -Target layout: - -``` -./tiles/ - google_maps/{z}/{x}/{y}.jpg # 1 file per cell, no flight_id - uav/{flight_id}/{z}/{x}/{y}.jpg # 1 file per flight per cell - onboard_ingest/{flight_id}/{z}/{x}/{y}.jpg # D-PROJ-2 future -``` - -Properties: -- The DB `file_path` column is authoritative; FS layout is informational/predictable but never parsed by code. -- Natural sharding on `x` keeps leaf directories bounded. -- `flight_id` in the path lets ops nuke "everything from flight X" with `rm -r ./tiles/uav/{flight_id}/` — the same primitive D-PROJ-2 voting needs for adversarial rollback. -- **No content-addressable / blob storage.** At our scale CAS adds complexity without measurable benefit. `content_sha256` gives dedup *detection* without forcing dedup *storage*. -- Page cache + `MemoryCache` carry the hot path; SSDs make adjacent-file locality irrelevant. - -### 6.3 Bulk retrieval — HTTP/2 multiplexing, NOT application-level batching - -The conventional "bulk fetch" instinct (one POST returns N tiles in a multipart body) is rejected. The right "bulk" mechanism is **HTTP/2 multiplexing** for both scenarios. - -#### Leaflet path (`/tiles/{z}/{x}/{y}`) - -Custom-TileLayer batching is rejected. Every per-tile property we want to keep — ETag-based 304s, browser cache, CDN/nginx edge cache, independent retry, display-as-arrives latency, conventional `image/jpeg` content-type — is lost the moment you bundle. HTTP/2 multiplexing gives the same wire-level efficiency (one TCP connection, header compression, response interleaving) while preserving every property above. This is what every production tile server (Mapbox, OSM, ArcGIS) does. - -**Action**: enable HTTP/2 (and HTTP/3 when feasible) at the nginx + Kestrel boundary in `satellite-provider`. 
Zero application code changes. - -Current state: neither side has HTTP/2 enabled today. ASP.NET Kestrel defaults to HTTP/1.1; flip `EndpointDefaults.Protocols = HttpProtocols.Http1AndHttp2` (or `Http1AndHttp2AndHttp3` over TLS). One-line change at the host config layer. - -#### UAV provisioning path (no Leaflet constraint) - -Multipart / tar / zip bundle responses are also rejected. Their downsides: - -- Server holds N file handles open and streams serially within one response → less parallelism than HTTP/2 multistream. -- Resume granularity collapses to "the whole bundle". -- Cache key becomes the bundle, not the tile. -- Client needs a parser. - -Per-tile GET over HTTP/2 wins on every axis, and the journal-based resume already designed in AZ-316 keeps its semantics intact. - -PMTiles is excellent for STATIC pre-built tile sets (Cloudflare/Protomaps); our DB is dynamic (UAV uploads invalidate any pre-built archive). Defer PMTiles until profiling demands it. - -The real efficiency win on this path is the **inventory pre-filter step** (already designed in § 6.1). Refined `TileDownloader` (AZ-316) flow: - -1. C11 computes the `(z, x, y)` list locally (deterministic slippy math — verified in the prior turn). -2. `POST /api/satellite/tiles/inventory` with the coord list → metadata-only response (`captured_at`, `estimated_bytes`, `resolution_m_per_px`). -3. Run AZ-307 freshness gate + RESTRICT-SAT-4 resolution gate + journal-skip against metadata → produce a "keep list". -4. Issue per-tile `GET /tiles/{z}/{x}/{y}` for the keep list over an HTTP/2 multiplexed connection: `httpx.Client(http2=True, limits=httpx.Limits(max_connections=10, max_keepalive_connections=10))`. -5. Append journal per successful write; per-tile retry on transient failure; resume on partial-success. - -In active-conflict zones where many tiles are stale or sub-spec, the inventory step lets C11 skip downloading bytes for 30–50% of the candidate set. That dwarfs any wire-level bulk-format gain. 
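Steps 1 to 3 of the refined flow above can be sketched in Python without any network code. This is an illustration under stated assumptions, not the AZ-316 implementation: `latlon_to_tile_xy` here is a stand-in using the standard slippy-map formula (the project helper's real signature is not shown in this document), the inventory response is assumed to be a list aligned with the request order with `None` for missing tiles, `captured_at` is assumed to be already parsed into a `datetime`, and only the freshness part of step 3 is shown (the resolution gate and journal-skip are omitted).

```python
import math
from datetime import datetime, timedelta
from typing import Dict, Iterator, List, Optional, Tuple

Tile = Tuple[int, int, int]  # (z, x, y)


def latlon_to_tile_xy(lat: float, lon: float, zoom: int) -> Tuple[int, int]:
    """Standard slippy-map tile index; hypothetical stand-in for the project helper."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so lon = 180 or extreme latitudes stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_in_bbox(north: float, west: float, south: float, east: float,
                  zoom: int) -> Iterator[Tile]:
    """Step 1: enumerate every (z, x, y) covered by a northWest/southEast rectangle."""
    x_min, y_min = latlon_to_tile_xy(north, west, zoom)
    x_max, y_max = latlon_to_tile_xy(south, east, zoom)
    for x in range(x_min, x_max + 1):
        for y in range(y_min, y_max + 1):
            yield (zoom, x, y)


def inventory_body(tiles: List[Tile]) -> Dict[str, list]:
    """Step 2: request body in the shape shown for POST /api/satellite/tiles/inventory."""
    return {"tiles": [{"z": z, "x": x, "y": y} for z, x, y in tiles]}


def keep_list(tiles: List[Tile], inventory: List[Optional[dict]],
              max_age: timedelta, now: datetime) -> List[Tile]:
    """Step 3 (freshness only): keep tiles the server has that are younger than max_age."""
    keep = []
    for tile, meta in zip(tiles, inventory):
        if meta is None:
            continue  # server has nothing for this cell
        if now - meta["captured_at"] < max_age:
            keep.append(tile)
    return keep
```

The keep list is what step 4 would then fetch tile by tile over the HTTP/2 connection; the age threshold is left as a parameter because the exact AZ-307 rules depend on `sector_class`.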
- -### 6.4 Existing test that needs updating in the new satellite-provider task - -`SatelliteProvider.Tests/UavTileFilePathTests.cs:23` — current expectation: - -``` -"UAV file paths follow `./tiles/uav/{zoom}/{x}/{y}.jpg` per `uav-tile-upload.md` v1.0.0" -``` - -becomes: - -``` -"UAV file paths follow `./tiles/uav/{flight_id}/{zoom}/{x}/{y}.jpg`" -``` - -The migration moves existing files into per-flight directories during the same backfill that adds `location_hash`/`content_sha256`. Mechanical operation; DB `file_path` column rewritten in the same transaction. - ---- - -## 7. Why this is parked as a leftover (not blocking) - -- The user explicitly asked for the analysis before deciding the AZ-304 reconciliation option. Decision is now informed, but the user did not yet pick A/B/C/D. The session is being closed for token reasons. -- Next `/autodev` will re-enter at the same decision point with this document and the prior turn's recommendation as input.