satellite-provider/_docs/02_document/data_model.md
Oleksandr Bezdieniezhnykh 909f69cb3a [AZ-505] Tile inventory endpoint + HTTP/2 + Leaflet covering index
Production code:
- POST /api/satellite/tiles/inventory (XOR body, 5000-cap,
  most-recent-per-location_hash select, present/absent shaping).
- Kestrel HttpProtocols.Http1AndHttp2 on every listener (AC-5).
- Migration 015 creates tiles_leaflet_path covering index over
  (location_hash, captured_at DESC, updated_at DESC, id DESC)
  INCLUDE (file_path, source); drops superseded idx_tiles_location_hash.
- TileRepository.GetByTileCoordinatesAsync rewired to filter by
  location_hash (Index Only Scan via tiles_leaflet_path).
- TileRepository.GetTilesByLocationHashesAsync added with Npgsql-
  direct ANY($1::uuid[]) binding (Dapper IEnumerable expansion is
  incompatible with the array form).
- Uuidv5.LocationHashForTile centralises the UUIDv5(TileNamespace,
  "{z}/{x}/{y}") formula — single source of truth for the cross-repo
  invariant (gps-denied-onboard parity).

Contracts:
- New: contracts/api/tile-inventory.md v1.0.0.
- Bumped: contracts/data-access/tile-storage.md to v2.0.0 (joint
  ownership by AZ-503-foundation + AZ-505: schema + covering index +
  GetByTileCoordinatesAsync rewrite).

Tests:
- TileInventoryTests covers AC-1, AC-2 (DB-level), AC-4, AC-6.
- Http2MultiplexingTests covers AC-5 (20 concurrent multiplexed GETs
  over h2c via SocketsHttpHandler + AppContext Http2Unencrypted switch).
- LeafletPathIndexOnlyTests covers AC-3 (EXPLAIN (ANALYZE, BUFFERS)
  asserts Index Only Scan over tiles_leaflet_path with heap_blocks=0).

Docs:
- architecture.md, system-flows.md, data_model.md, module-layout.md,
  glossary.md, modules/api_program.md, modules/dataaccess_tile_repository.md,
  components/02_data_access/description.md all updated to reference the
  v2.0.0 tile-storage contract + new tile-inventory contract + AC-7.

Reports:
- batch_01_cycle6_report.md, batch_01_cycle6_review.md,
  implementation_completeness_cycle6_report.md (PASS),
  implementation_report_tile_inventory_cycle6.md.

Task spec moved todo/ -> done/.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 21:16:37 +03:00


# Satellite Provider — Data Model
## Entity-Relationship Diagram
```mermaid
erDiagram
TILES {
uuid id PK "AZ-503: deterministic UUIDv5"
int tile_zoom
float latitude
float longitude
float tile_size_meters
int tile_size_pixels
varchar image_type
varchar maps_version
int version
varchar source
timestamp captured_at
uuid flight_id "AZ-503 nullable"
uuid location_hash "AZ-503 NOT NULL"
bytea content_sha256 "AZ-503 nullable (app-NOT-NULL for new)"
uuid legacy_id "AZ-503 nullable, pre-migration id"
varchar file_path
int tile_x
int tile_y
timestamp created_at
timestamp updated_at
}
REGIONS {
uuid id PK
float latitude
float longitude
float size_meters
int zoom_level
varchar status
bool stitch_tiles
varchar csv_file_path
varchar summary_file_path
int tiles_downloaded
int tiles_reused
timestamp created_at
timestamp updated_at
}
ROUTES {
uuid id PK
varchar name
text description
float region_size_meters
int zoom_level
float total_distance_meters
int total_points
bool request_maps
bool maps_ready
bool create_tiles_zip
varchar tiles_zip_path
varchar csv_file_path
varchar summary_file_path
varchar stitched_image_path
timestamp created_at
timestamp updated_at
}
ROUTE_POINTS {
uuid id PK
uuid route_id FK
int sequence_number
float latitude
float longitude
varchar point_type
int segment_index
float distance_from_previous
timestamp created_at
}
ROUTE_REGIONS {
uuid route_id FK
uuid region_id FK
bool is_geofence
int geofence_polygon_index
timestamp created_at
}
ROUTES ||--o{ ROUTE_POINTS : "has many"
ROUTES ||--o{ ROUTE_REGIONS : "has many"
REGIONS ||--o{ ROUTE_REGIONS : "linked via"
```
## Tables
### tiles
Stores metadata for downloaded satellite imagery tiles. Each tile is a single image at a specific geographic coordinate and zoom level.
| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| id | UUID | PK | AZ-503: deterministic UUIDv5 of `{tile_zoom}/{tile_x}/{tile_y}/{source}/{flight_id or zero-uuid}` under `Uuidv5.TileNamespace`. Stable across re-ingests; preserved on UPSERT conflict (AC-2 idempotence). Pre-AZ-503 rows have their original random Guid; migration 014 also copies that value into `legacy_id` for one-cycle forensics. |
| tile_zoom | INT | NOT NULL | Google Maps zoom level (1-20) |
| latitude | DOUBLE PRECISION | NOT NULL | Center latitude |
| longitude | DOUBLE PRECISION | NOT NULL | Center longitude |
| tile_size_meters | DOUBLE PRECISION | NOT NULL | Ground coverage in meters |
| tile_size_pixels | INT | NOT NULL | Image dimension in pixels |
| image_type | VARCHAR(10) | NOT NULL | Image format (e.g., "jpg") |
| maps_version | VARCHAR(50) | | Legacy free-form provider tag; post-AZ-373 new rows write NULL. Vestigial post-AZ-484 (column retained for forensics on pre-existing rows; no longer part of any index) |
| version | INT | NOT NULL, DEFAULT 2025 | Year-based versioning for cache invalidation. Vestigial post-AZ-484 — removed from the unique key by migration 012 (preparation for AZ-484); column retained nullable for backward compatibility |
| source | VARCHAR(32) | NOT NULL, DEFAULT 'google_maps' | AZ-484: producer of the imagery (`'google_maps'`, `'uav'`). Closed value set — see `tile-storage` v1.0.0 contract Inv-5 and `Common.Enums.TileSourceConverter`. Backfilled to `'google_maps'` for all pre-AZ-484 rows by migration 013 |
| captured_at | TIMESTAMP | NOT NULL | AZ-484: imagery acquisition timestamp (UTC). Drives most-recent-across-sources selection. Backfilled to `created_at` for pre-AZ-484 rows by migration 013 |
| file_path | VARCHAR(500) | NOT NULL | Relative path to stored image. **AZ-503 per-flight UAV layout** (supersedes AZ-488): `source='google_maps'` rows keep the legacy bucketed/timestamped path emitted by `StorageConfig.GetTileFilePath` (`{TilesDirectory}/{zoom}/{x_bucket}/{y_bucket}/tile_{zoom}_{x}_{y}_{ts}.jpg`). `source='uav'` rows live under `{TilesDirectory}/uav/{flight_id or 'none'}/{zoom}/{x}/{y}.jpg` — so `rm -rf ./tiles/uav/{flight_id}/` cleanly removes one flight's evidence without disturbing other flights at overlapping cells. The authoritative source marker is the `source` column; the per-source / per-flight path is implementation detail that keeps both producers' bytes individually addressable. |
| tile_x | INT | NOT NULL | Tile X coordinate (Slippy Map) |
| tile_y | INT | NOT NULL | Tile Y coordinate (Slippy Map) |
| flight_id | UUID | NULL | AZ-503: optional flight identifier. `NULL` for Google Maps tiles and anonymous UAV uploads; populated from `UavTileMetadata.FlightId` when present. Part of the UPSERT conflict key via `COALESCE(flight_id, '00000000-0000-0000-0000-000000000000'::uuid)`, so two flights uploading the same `(z, x, y)` cell produce two separate rows. |
| location_hash | UUID | NOT NULL | AZ-503: deterministic UUIDv5 of `{tile_zoom}/{tile_x}/{tile_y}` under `Uuidv5.TileNamespace`. Identical across flights and sources for the same cell. Backfilled in migration 014 via a `pg_temp.uuidv5` PL/pgSQL function. AZ-505 made this the keyed read column for `GetByTileCoordinatesAsync` (leaflet hot path) and the bulk lookup column for `GetTilesByLocationHashesAsync` (`POST /api/satellite/tiles/inventory`); covered by the `tiles_leaflet_path` index. |
| content_sha256 | BYTEA | NULL | AZ-503: SHA-256 digest of the JPEG body. Application-layer NOT NULL for new writes (enforced in `TileService.BuildTileEntity` + `UavTileUploadHandler.PersistAsync`); DB column is NULLABLE because legacy pre-migration rows cannot be backfilled reliably from disk. See `batch_02_cycle5_report.md` "Low maintainability finding" for the rationale. |
| legacy_id | UUID | NULL | AZ-503: pre-migration `id` value, copied by migration 014 for one-cycle forensics. To be dropped in a future migration once the cross-repo cutover settles. |
| created_at | TIMESTAMP | NOT NULL, DEFAULT NOW | |
| updated_at | TIMESTAMP | NOT NULL, DEFAULT NOW | |
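
The `id` and `location_hash` derivations described in the table above can be sketched in Python as follows. This is an illustration only: the actual value of `Uuidv5.TileNamespace` is not stated in this document, so `TILE_NAMESPACE` below is a hypothetical placeholder, and rendering `flight_id` in canonical lowercase hyphenated form is likewise an assumption.

```python
import uuid
from typing import Optional

# Hypothetical stand-in: the real Uuidv5.TileNamespace value is not given here.
TILE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "example:tile-namespace")
ZERO_UUID = uuid.UUID(int=0)


def location_hash(zoom: int, x: int, y: int) -> uuid.UUID:
    """UUIDv5 of "{z}/{x}/{y}": identical across flights and sources."""
    return uuid.uuid5(TILE_NAMESPACE, f"{zoom}/{x}/{y}")


def tile_id(zoom: int, x: int, y: int, source: str,
            flight_id: Optional[uuid.UUID] = None) -> uuid.UUID:
    """UUIDv5 of "{z}/{x}/{y}/{source}/{flight_id or zero-uuid}"."""
    flight = flight_id if flight_id is not None else ZERO_UUID
    # Assumption: the flight id is formatted in canonical hyphenated form.
    return uuid.uuid5(TILE_NAMESPACE, f"{zoom}/{x}/{y}/{source}/{flight}")
```

Under these assumptions, two flights over the same cell share one `location_hash` but receive distinct `id` values, which is what allows their rows to coexist.
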

**Indexes** (post-AZ-503):
- `idx_tiles_unique_identity` UNIQUE (tile_zoom, tile_x, tile_y, tile_size_meters, source, COALESCE(flight_id, '00000000-0000-0000-0000-000000000000'::uuid)): created by migration 014; replaces the AZ-484 `idx_tiles_unique_location_source` (5-col float-based). Keying on the integer `tile_x`/`tile_y` coordinates instead of the float `latitude`/`longitude` eliminates float-rounding collisions; the `COALESCE` lets per-flight rows coexist while keeping single-row semantics for anonymous and `google_maps` rows.
- `tiles_leaflet_path` (location_hash, captured_at DESC, updated_at DESC, id DESC) INCLUDE (file_path, source) — created by AZ-505 migration 015. Drives `GET /tiles/{z}/{x}/{y}` (`Index Only Scan` for the leaflet hot path) and the `POST /api/satellite/tiles/inventory` bulk lookup (leading column matches the `WHERE location_hash = ANY($1::uuid[])` predicate). The lightweight `idx_tiles_location_hash` from migration 014 is dropped by migration 015 — equality lookups by `location_hash` use the leading column of the covering index, making the lookup-only index redundant.
- `idx_tiles_coordinates` (tile_zoom, tile_x, tile_y, version)
- `idx_tiles_zoom` (tile_zoom)

**Selection rule** (tie-break unchanged from AZ-484): `GetByTileCoordinatesAsync` and `GetTilesByRegionAsync` return the most-recent row across sources for any `(latitude, longitude, tile_zoom, tile_size_meters)` cell; since AZ-505, `GetByTileCoordinatesAsync` locates that cell by filtering on `location_hash`. Rows are ordered by `captured_at DESC, updated_at DESC, id DESC` and the first row wins. Region read uses `DISTINCT ON` to enforce one-row-per-cell at the SQL layer.
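
The tie-break can be illustrated with a small in-memory sketch. It is not the repository's SQL: `TileRow` and `most_recent_per_cell` are hypothetical names, and the sketch groups rows by `location_hash` (as the inventory endpoint's most-recent-per-`location_hash` select does) rather than by the float cell tuple.

```python
import uuid
from datetime import datetime
from typing import Dict, Iterable, NamedTuple


class TileRow(NamedTuple):
    """Hypothetical projection of the columns the selection rule reads."""
    id: uuid.UUID
    location_hash: uuid.UUID
    source: str
    captured_at: datetime
    updated_at: datetime


def most_recent_per_cell(rows: Iterable[TileRow]) -> Dict[uuid.UUID, TileRow]:
    """Keep one row per cell: greatest (captured_at, updated_at, id) wins."""
    best: Dict[uuid.UUID, TileRow] = {}
    for row in rows:
        current = best.get(row.location_hash)
        candidate = (row.captured_at, row.updated_at, row.id)
        if current is None or candidate > (
            current.captured_at, current.updated_at, current.id
        ):
            best[row.location_hash] = row
    return best
```

Taking the maximum of the tuple is equivalent to sorting by all three columns descending and keeping the first row, so a newer UAV capture supersedes an older `google_maps` row for the same cell regardless of input order.
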

**UPSERT contract** (AZ-503): `INSERT … ON CONFLICT (tile_zoom, tile_x, tile_y, tile_size_meters, source, COALESCE(flight_id, '00000000-...'::uuid)) DO UPDATE` refreshes `file_path, latitude, longitude, captured_at, location_hash, content_sha256, updated_at`. `id` is intentionally NOT overwritten on conflict, preserving AC-2 idempotence (same inputs ⇒ same id). Two sources (or two flights of the same source) at the same cell coexist as separate rows.
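
The conflict semantics above can be modelled without a database. The following is a minimal sketch, assuming rows are plain dictionaries keyed by column name; `conflict_key` and `upsert` are hypothetical helpers, not the repository's API.

```python
import uuid

ZERO_UUID = uuid.UUID(int=0)

# Columns the UPSERT contract lists as refreshed on conflict. Everything
# else, including id, keeps the value from the first insert.
REFRESHED_COLUMNS = (
    "file_path", "latitude", "longitude", "captured_at",
    "location_hash", "content_sha256", "updated_at",
)


def conflict_key(row: dict) -> tuple:
    """Mirror the unique index: a NULL flight_id collapses to the zero UUID."""
    flight = row["flight_id"] if row["flight_id"] is not None else ZERO_UUID
    return (row["tile_zoom"], row["tile_x"], row["tile_y"],
            row["tile_size_meters"], row["source"], flight)


def upsert(table: dict, row: dict) -> dict:
    """Insert the row, or refresh the mutable columns of the existing one."""
    key = conflict_key(row)
    existing = table.get(key)
    if existing is None:
        table[key] = dict(row)
    else:
        for column in REFRESHED_COLUMNS:
            existing[column] = row[column]
    return table[key]
```

Re-ingesting the same tile updates `file_path` and the other refreshed columns in place while the stored `id` stays fixed; a different `source` or `flight_id` produces a different key and therefore a separate row.
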
### regions
Tracks region download requests and their processing status.
| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| id | UUID | PK | Region request identifier |
| latitude | DOUBLE PRECISION | NOT NULL | Center latitude |
| longitude | DOUBLE PRECISION | NOT NULL | Center longitude |
| size_meters | DOUBLE PRECISION | NOT NULL | Square region side length |
| zoom_level | INT | NOT NULL | Zoom level for tiles |
| status | VARCHAR(20) | NOT NULL | pending / processing / completed / failed |
| stitch_tiles | BOOLEAN | NOT NULL, DEFAULT false | Whether to produce stitched image |
| csv_file_path | VARCHAR(500) | | Path to tile manifest CSV |
| summary_file_path | VARCHAR(500) | | Path to summary text |
| tiles_downloaded | INT | DEFAULT 0 | Count of newly downloaded tiles |
| tiles_reused | INT | DEFAULT 0 | Count of cache-hit tiles |
| created_at | TIMESTAMP | NOT NULL, DEFAULT NOW | |
| updated_at | TIMESTAMP | NOT NULL, DEFAULT NOW | |

**Indexes**:
- `idx_regions_status` (status)
### routes
Defines route paths with configuration for map tile generation.
| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| id | UUID | PK | Route identifier |
| name | VARCHAR(200) | NOT NULL | Human-readable name |
| description | TEXT | | Optional description |
| region_size_meters | DOUBLE PRECISION | NOT NULL | Size of region per point |
| zoom_level | INT | NOT NULL | Zoom level for regions |
| total_distance_meters | DOUBLE PRECISION | NOT NULL | Total route length |
| total_points | INT | NOT NULL | Total point count (original + interpolated) |
| request_maps | BOOLEAN | NOT NULL, DEFAULT false | Whether to generate map tiles |
| maps_ready | BOOLEAN | NOT NULL, DEFAULT false | Whether map generation is complete |
| create_tiles_zip | BOOLEAN | NOT NULL, DEFAULT false | Whether to produce ZIP archive |
| tiles_zip_path | VARCHAR(500) | | Path to output ZIP |
| csv_file_path | VARCHAR(500) | | Route-level CSV |
| summary_file_path | VARCHAR(500) | | Route-level summary |
| stitched_image_path | VARCHAR(500) | | Route-level stitched image |
| created_at | TIMESTAMP | NOT NULL, DEFAULT NOW | |
| updated_at | TIMESTAMP | NOT NULL, DEFAULT NOW | |
### route_points
Stores all points along a route (both original waypoints and interpolated intermediate points).
| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| id | UUID | PK | Point identifier |
| route_id | UUID | FK → routes.id, CASCADE | Parent route |
| sequence_number | INT | NOT NULL, UNIQUE(route_id, sequence_number) | Order along route |
| latitude | DOUBLE PRECISION | NOT NULL | Point latitude |
| longitude | DOUBLE PRECISION | NOT NULL | Point longitude |
| point_type | VARCHAR(20) | NOT NULL | "original" or "intermediate" |
| segment_index | INT | NOT NULL | Which segment (between original points) |
| distance_from_previous | DOUBLE PRECISION | | Meters from previous point |
| created_at | TIMESTAMP | NOT NULL, DEFAULT NOW | |

**Indexes**:
- `idx_route_points_route` (route_id, sequence_number)
- `idx_route_points_coords` (latitude, longitude)
### route_regions
Junction table linking routes to their generated region requests, with geofence metadata.
| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| route_id | UUID | FK → routes.id, CASCADE, PK | |
| region_id | UUID | FK → regions.id, CASCADE, PK | |
| is_geofence | BOOLEAN | NOT NULL, DEFAULT false | Whether point is inside a geofence |
| geofence_polygon_index | INTEGER | | Which polygon (0-based) the point is in |
| created_at | TIMESTAMP | NOT NULL, DEFAULT NOW | |

**Indexes**:
- `idx_route_regions_route` (route_id)
- `idx_route_regions_region` (region_id)
## Migration Strategy
- **Tool**: DbUp (embedded SQL scripts)
- **Execution**: Automatic on application startup (`DatabaseMigrator.Migrate()`)
- **Naming**: `NNN_DescriptiveName.sql` (sequential numbering)
- **Storage**: Embedded resources in `SatelliteProvider.DataAccess` assembly
- **Tracking**: DbUp's internal `schemaversions` table records which scripts have run
- **Rollback**: Not supported — forward-only migrations
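
The forward-only, run-once behaviour listed above can be sketched in a few lines. This is not DbUp: it is a simplified stand-in using Python's `sqlite3`, the `migrate` function is hypothetical, and the `schemaversions` column names are assumptions made for the example. Each script is assumed to hold a single SQL statement.

```python
import sqlite3
from typing import Dict, List


def migrate(conn: sqlite3.Connection, scripts: Dict[str, str]) -> List[str]:
    """Apply not-yet-recorded scripts in name order; return the ones that ran."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schemaversions ("
        "scriptname TEXT PRIMARY KEY, "
        "applied TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )
    applied = {name for (name,) in conn.execute(
        "SELECT scriptname FROM schemaversions")}
    ran = []
    # Zero-padded NNN_ prefixes make lexical order equal to numeric order.
    for name in sorted(scripts):
        if name in applied:
            continue
        conn.execute("BEGIN")
        try:
            conn.execute(scripts[name])
            conn.execute(
                "INSERT INTO schemaversions (scriptname) VALUES (?)", (name,))
            conn.execute("COMMIT")
        except Exception:
            # The script and its tracking row roll back together.
            conn.execute("ROLLBACK")
            raise
        ran.append(name)
    return ran
```

Opening the connection with `sqlite3.connect(":memory:", isolation_level=None)` keeps transaction control explicit. Calling `migrate` a second time with the same scripts returns an empty list, which is the property that makes running it on every application startup safe.
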
## Migration History
| # | Migration | Purpose |
|---|-----------|---------|
| 001 | CreateTilesTable | Base tiles table |
| 002 | CreateRegionsTable | Region request tracking |
| 003 | CreateIndexes | Performance indexes |
| 004 | AddVersionColumn | Year-based tile versioning + dedup |
| 005 | CreateRoutesTables | Routes, route_points, route_regions |
| 006 | AddStitchTilesToRegions | Stitch flag on regions |
| 007 | AddRouteMapFields | request_maps, maps_ready, file paths on routes |
| 008 | AddGeofenceFlagToRouteRegions | is_geofence flag |
| 009 | AddGeofencePolygonIndex | Polygon index tracking |
| 010 | AddTilesZipToRoutes | ZIP generation fields |
| 011 | AddTileCoordinates | Slippy map X/Y + rename zoom_level → tile_zoom |
| 012 | DropTileVersionConstraint | Drops legacy 5-col `(…, version)` unique index; replaces with 4-col `idx_tiles_unique_location` (preparation for AZ-484) |
| 013 | AddTileSourceAndCapturedAt | AZ-484: adds `source` (default `'google_maps'`) + `captured_at` columns; backfills both for pre-existing rows; replaces 4-col unique with 5-col `idx_tiles_unique_location_source`. Transactional; idempotent against partial replays |
| 014 | AddTileIdentityColumns | AZ-503: adds `flight_id` (NULL), `location_hash` (NOT NULL after backfill), `content_sha256` (NULL), `legacy_id` (NULL); backfills `location_hash` via `pg_temp.uuidv5(TILE_NAMESPACE, "{tile_zoom}/{tile_x}/{tile_y}")` and copies `id → legacy_id` for every pre-existing row; drops `idx_tiles_unique_location_source` (AZ-484) and creates `idx_tiles_unique_identity` (integer + flight-aware) + `idx_tiles_location_hash`. Enables `pgcrypto` for the in-migration SHA-1 digest. Transactional; safe to replay (column adds are `IF NOT EXISTS`-equivalent, backfill is idempotent on `location_hash` because UUIDv5 is deterministic) |
| 015 | AddTilesLeafletPathIndex | AZ-505: creates `tiles_leaflet_path (location_hash, captured_at DESC, updated_at DESC, id DESC) INCLUDE (file_path, source)` covering index for the leaflet hot path; drops the superseded `idx_tiles_location_hash` from migration 014 (equality lookups by `location_hash` now use the leading column of the covering index). Transactional; runs inside DbUp's per-script transaction (incompatible with `CREATE INDEX CONCURRENTLY`) — schedule deploys to a low-traffic window on populated tables. INCLUDE columns intentionally narrow (`file_path, source`); inventory queries that need more columns trigger a bounded heap fetch (per AZ-505 NFR-Perf-2 budget). |