mirror of
https://github.com/azaion/satellite-provider.git
synced 2026-06-21 09:01:16 +00:00
[AZ-503] [AZ-504] cycle 5 new-task: tile identity + perf-script-fix
- AZ-503 (3 SP, epic AZ-483) — Tile identity → UUIDv5 deterministic id; integer-only UPSERT with COALESCE(flight_id) per-flight separation; content_sha256 column; POST /api/satellite/tiles/inventory bulk-list endpoint; HTTP/2 at Kestrel edge. Cross-workspace handoff from gps-denied-onboard (AZ-304 / AZ-316 counterpart). Supersedes the AZ-484 UPSERT-conflict-key portion. - AZ-504 (1 SP, epic AZ-483) — Fix scripts/run-performance-tests.sh lines 416-417: grep -o | wc -l + set -o pipefail kills PT-08 when rejected=0. Closes the replay obligation for the cycle-3 perf-harness leftover (leftover deletion gated on green full perf run, AC-4). Updates _dependencies_table.md with cycle 5 entries and records replay attempt #4 against the perf-cycle3 leftover (PBI opened — leftover still stays until AZ-504 lands and full perf run is green). State advanced to Step 10 (Implement). Co-authored-by: Cursor <cursoragent@cursor.com>
This commit is contained in:
@@ -92,6 +92,15 @@ Source: cycle-3 perf-harness leftover replay surfaced the host SDK / project SDK
|
|||||||
|------|-------|-----------|--------|--------|
|
|------|-------|-----------|--------|--------|
|
||||||
| AZ-500 | .NET 8 → .NET 10 migration (TFM + SDK pin + Docker images + CI images + Microsoft.AspNetCore.* + Microsoft.Extensions.* + Serilog.AspNetCore) | — | 5 | Done (In Testing) |
|
| AZ-500 | .NET 8 → .NET 10 migration (TFM + SDK pin + Docker images + CI images + Microsoft.AspNetCore.* + Microsoft.Extensions.* + Serilog.AspNetCore) | — | 5 | Done (In Testing) |
|
||||||
|
|
||||||
|
### Step 9 cycle 5 — New Task: Tile identity foundation + perf-harness fix (AZ-483 epic)
|
||||||
|
|
||||||
|
Source: cross-workspace handoff from `gps-denied-onboard` (tile-schema scenario analysis) for AZ-503; cycle-3 perf-harness leftover replay-obligation closure for AZ-504. Both attach to epic AZ-483 (Multi-source tile storage + UAV upload, Layer 2) — AZ-503 supersedes the AZ-484 UPSERT-conflict-key portion, AZ-504 unblocks PT-08 measurement.
|
||||||
|
|
||||||
|
| Task | Title | Depends On | Points | Status |
|
||||||
|
|------|-------|-----------|--------|--------|
|
||||||
|
| AZ-503 | Tile identity → UUIDv5 + integer UPSERT + bulk-list endpoint | AZ-484 (supersedes UPSERT-conflict-key portion of AZ-484 selection rule) | 3 | To Do |
|
||||||
|
| AZ-504 | Perf script: fix grep \| wc -l pipefail crash in PT-08 | — (independent; references AZ-488 PT-08 threshold) | 1 | To Do |
|
||||||
|
|
||||||
## Execution Order
|
## Execution Order
|
||||||
|
|
||||||
### Step 6
|
### Step 6
|
||||||
@@ -135,6 +144,13 @@ Single task; coordinated cross-cutting bump.
|
|||||||
|
|
||||||
1. AZ-500 (5 SP) — .NET 8 → .NET 10 migration. Self-contained but touches every csproj, both Dockerfiles, run-tests.sh, .woodpecker/01-test.yml, global.json, and two docs files.
|
1. AZ-500 (5 SP) — .NET 8 → .NET 10 migration. Self-contained but touches every csproj, both Dockerfiles, run-tests.sh, .woodpecker/01-test.yml, global.json, and two docs files.
|
||||||
|
|
||||||
|
### Step 9 cycle 5
|
||||||
|
|
||||||
|
Independent tracks — both can run in parallel; no ordering constraint between them. AZ-504 is a prerequisite for the cycle's Step 15 Performance Test to deliver a green PT-08 reading (and therefore for deleting the perf-cycle3 leftover); AZ-503 is the cycle's main feature.
|
||||||
|
|
||||||
|
1. AZ-504 (1 SP) — cheapest unblocker; lands first to clear PT-08 reporting for the cycle.
|
||||||
|
2. AZ-503 (3 SP) — main feature; data-model + API; cross-workspace alignment with `gps-denied-onboard` AZ-304 / AZ-316.
|
||||||
|
|
||||||
## Total Effort
|
## Total Effort
|
||||||
|
|
||||||
Step 6: 6 tasks, 17 story points
|
Step 6: 6 tasks, 17 story points
|
||||||
@@ -144,6 +160,7 @@ Step 9 cycle 1: 1 task created (AZ-484, 5 pts)
|
|||||||
Step 9 cycle 2: 2 tasks created (AZ-487 = 2 pts, AZ-488 = 8 pts over-cap user-accepted) — total 10 pts
|
Step 9 cycle 2: 2 tasks created (AZ-487 = 2 pts, AZ-488 = 8 pts over-cap user-accepted) — total 10 pts
|
||||||
Step 9 cycle 3: 6 tasks created (AZ-491 = 3 pts, AZ-492 = 3 pts, AZ-493 = 2 pts, AZ-494 = 2 pts, AZ-495 = 1 pt, AZ-496 = 2 pts) — total 13 pts
|
Step 9 cycle 3: 6 tasks created (AZ-491 = 3 pts, AZ-492 = 3 pts, AZ-493 = 2 pts, AZ-494 = 2 pts, AZ-495 = 1 pt, AZ-496 = 2 pts) — total 13 pts
|
||||||
Step 9 cycle 4: 1 task created (AZ-500 = 5 pts)
|
Step 9 cycle 4: 1 task created (AZ-500 = 5 pts)
|
||||||
|
Step 9 cycle 5: 2 tasks created (AZ-503 = 3 pts, AZ-504 = 1 pt) — total 4 pts
|
||||||
|
|
||||||
## Coverage Verification
|
## Coverage Verification
|
||||||
|
|
||||||
|
|||||||
@@ -0,0 +1,206 @@
|
|||||||
|
# Tile identity → UUIDv5 + integer UPSERT + bulk-list endpoint
|
||||||
|
|
||||||
|
**Task**: AZ-503_tile_identity_uuidv5_bulk_list
|
||||||
|
**Name**: Tile identity → UUIDv5 + integer UPSERT + bulk-list endpoint
|
||||||
|
**Description**: Tile identity in the `tiles` table is currently random (`Guid.NewGuid()`), and the UPSERT conflict key uses `double precision` `latitude`/`longitude` and omits `flight_id`, which (a) makes idempotent re-insert fragile against float rounding and (b) destroys per-flight evidence required by the D-PROJ-2 multi-flight voting layer when two UAVs upload the same `(z, x, y)` cell. This task migrates tile identity to deterministic UUIDv5 (`id = uuidv5(NAMESPACE, "{z}/{x}/{y}/{source}/{flight_id or 'none'}")`), adds a `location_hash` UUIDv5 (`uuidv5(..., "{z}/{x}/{y}")`) for efficient cell-bag queries (UI Leaflet path + future voting), switches the UPSERT conflict key to integer-only `(zoom_level, tile_x, tile_y, tile_size_meters, source, COALESCE(flight_id, '00000000-0000-0000-0000-000000000000'::uuid))`, adds a `content_sha256 bytea NOT NULL` column for content-addressable dedup, and adds the `POST /api/satellite/tiles/inventory` endpoint that the onboard `TileDownloader` (`gps-denied-onboard` AZ-316) needs for bbox→tile enumeration during pre-flight provisioning.
|
||||||
|
**Complexity**: 3 points
|
||||||
|
**Dependencies**: AZ-484 (UPSERT-per-source + AZ-484 selection rule — done; this task supersedes the UPSERT conflict-key portion)
|
||||||
|
**Component**: SatelliteProvider.DataAccess + SatelliteProvider.Services.TileDownloader + SatelliteProvider.Api
|
||||||
|
**Tracker**: AZ-503
|
||||||
|
**Epic**: AZ-483 — Multi-source tile storage + UAV upload (Layer 2)
|
||||||
|
|
||||||
|
## Origin
|
||||||
|
|
||||||
|
Cross-workspace surface from `gps-denied-onboard` `_docs/_process_leftovers/2026-05-12_tile-schema-scenario-analysis.md`. The onboard repo's `AZ-304` C6 Postgres schema is being designed with `location_hash` + `content_sha256` columns and a deterministic `id`; this satellite-provider task is the parent-suite counterpart so both sides of the wire agree on tile identity semantics.
|
||||||
|
|
||||||
|
Related: `gps-denied-onboard` `_docs/_process_leftovers/2026-05-09_satellite-provider-design-tasks.md` Design Task #2 (multi-flight trust / voting layer) is the downstream consumer of this task's `flight_id`-aware UPSERT and `content_sha256`.
|
||||||
|
|
||||||
|
## Problem
|
||||||
|
|
||||||
|
Three concrete issues in the current code:
|
||||||
|
|
||||||
|
1. **Random tile id** — `SatelliteProvider.Services.TileDownloader/TileService.cs:149` and `UavTileUploadHandler.cs:160` use `Id = Guid.NewGuid()`. The `id` is opaque, non-deterministic, and useless as a content/location handle. Onboard cannot pre-compute or compare ids before round-tripping to the DB.
|
||||||
|
|
||||||
|
2. **Float-based UPSERT collapses multi-flight evidence** — `TileRepository.InsertAsync` line 124: `ON CONFLICT (latitude, longitude, tile_zoom, tile_size_meters, source) DO UPDATE SET file_path = EXCLUDED.file_path, ...`. Problems:
|
||||||
|
- `latitude` / `longitude` are `double precision`; Postgres conflict detection requires bit-identical floats. Re-uploads of the same tile computed from independently-rounded center coords can miss the conflict and create duplicate rows. AZ-484's `DISTINCT ON` read-side fix papers over the duplicate but does not prevent it.
|
||||||
|
- The conflict key omits `flight_id`. When two flights upload `source='uav'` for the same cell, `DO UPDATE` overwrites Flight A with Flight B. **D-PROJ-2 voting (Design Task #2 from the 2026-05-09 leftover) needs both rows alive.**
|
||||||
|
|
||||||
|
3. **No bulk-list endpoint for pre-flight provisioning** — onboard `TileDownloader` (`gps-denied-onboard` AZ-316) calls `GET /api/satellite/tiles?bbox=...&zoom=...&list-only=true` to size and enumerate a pre-flight cache build. **This endpoint does not exist** in `SatelliteProvider.Api/Program.cs`. The closest is `GetTilesByRegionAsync` (private, lat/lon-meters input, no HTTP surface) and `GET /api/satellite/tiles/latlon` (single tile). Operators today cannot pre-size a cache build over the bbox the mission planner produces.
|
||||||
|
|
||||||
|
4. **No content digest** — there is no `content_sha256` column. Same JPEG re-uploaded under a different `source` or `flight_id` is indistinguishable from a re-encode.
|
||||||
|
|
||||||
|
## Outcome
|
||||||
|
|
||||||
|
- Tile id becomes deterministic: `id = uuidv5(TILE_NAMESPACE, "{z}/{x}/{y}/{source}/{flight_id or '00000000-0000-0000-0000-000000000000'}")`. Same inputs always produce the same id; idempotent inserts no longer require a "did this row exist?" pre-check.
|
||||||
|
- `location_hash uuid NOT NULL` column: `uuidv5(TILE_NAMESPACE, "{z}/{x}/{y}")`. Drives Scenario 1 (UI Leaflet `/tiles/{z}/{x}/{y}`) lookup as a single hash-index probe, and Scenario 6 (voting) as a single cell-bag fetch.
|
||||||
|
- `content_sha256 bytea NOT NULL` column: SHA-256 of the JPEG body, computed at insert time. Enables dedup detection ("Flight B uploaded a byte-identical tile to Flight A — flag for inspection") and integrity checks on the read path.
|
||||||
|
- UPSERT conflict key becomes:
|
||||||
|
```
|
||||||
|
ON CONFLICT (zoom_level, tile_x, tile_y, tile_size_meters, source,
|
||||||
|
COALESCE(flight_id, '00000000-0000-0000-0000-000000000000'::uuid))
|
||||||
|
DO UPDATE SET file_path = EXCLUDED.file_path, captured_at = EXCLUDED.captured_at, updated_at = EXCLUDED.updated_at
|
||||||
|
```
|
||||||
|
Integer-only equality, per-flight separation. (Note: requires `zoom_level` column — the current schema uses `tile_zoom`; either rename or use `tile_zoom` consistently — this task keeps the existing column name `tile_zoom` and adjusts the onboard-side spec to match.)
|
||||||
|
- **New endpoint: `POST /api/satellite/tiles/inventory`** (preferred over the originally-proposed `GET /api/satellite/tiles?bbox=...`). Body is a list of `(z, x, y)` coords (or pre-computed `location_hash` UUIDs); response is one entry per input. Justification: the onboard side already has the deterministic slippy-tile math (`helpers/wgs_converter.py.latlon_to_tile_xy`, identical to C# `GeoUtils.WorldToTilePos`); making the server re-enumerate the bbox is wasted work. POST inventory is also the natural batched-existence-check shape — single round-trip, indexed lookup per row.
|
||||||
|
```
|
||||||
|
POST /api/satellite/tiles/inventory
|
||||||
|
Body: { "tiles": [ { "z": 18, "x": 12345, "y": 23456 }, ... ] }
|
||||||
|
// OR { "location_hashes": [ "uuid-v5", ... ] }
|
||||||
|
Response: {
|
||||||
|
"results": [
|
||||||
|
{
|
||||||
|
"tile_x": 12345, "tile_y": 23456, "tile_zoom": 18, "location_hash": "uuid-v5...",
|
||||||
|
"present": true,
|
||||||
|
"id": "uuid-v5...", "captured_at": "...", "resolution_m_per_px": 0.3,
|
||||||
|
"estimated_bytes": 42017, "source": "google_maps", "flight_id": null
|
||||||
|
},
|
||||||
|
{ "tile_x": 12346, "tile_y": 23456, "tile_zoom": 18, "location_hash": "uuid-v5...", "present": false }
|
||||||
|
]
|
||||||
|
}
|
||||||
|
```
|
||||||
|
Server-side query (single round-trip):
|
||||||
|
```sql
|
||||||
|
SELECT DISTINCT ON (location_hash)
|
||||||
|
location_hash, id, captured_at, resolution_m_per_px, estimated_bytes, source, flight_id
|
||||||
|
FROM tiles
|
||||||
|
WHERE location_hash = ANY($1::uuid[])
|
||||||
|
AND voting_status IN ('trusted', NULL)
|
||||||
|
ORDER BY location_hash, captured_at DESC, updated_at DESC, id DESC;
|
||||||
|
```
|
||||||
|
- Required covering index for Leaflet hot path:
|
||||||
|
```sql
|
||||||
|
CREATE INDEX tiles_leaflet_path
|
||||||
|
ON tiles (location_hash, captured_at DESC, updated_at DESC, id DESC)
|
||||||
|
INCLUDE (file_path, content_type, etag, voting_status);
|
||||||
|
```
|
||||||
|
Leaflet `/tiles/{z}/{x}/{y}` becomes an index-only scan (no heap fetch when `voting_status='trusted'`):
|
||||||
|
```sql
|
||||||
|
SELECT file_path FROM tiles
|
||||||
|
WHERE location_hash = $1 AND voting_status IN ('trusted', NULL)
|
||||||
|
ORDER BY captured_at DESC, updated_at DESC, id DESC LIMIT 1;
|
||||||
|
```
|
||||||
|
- Migration: additive — add `location_hash`, `content_sha256` as nullable, backfill in a single `UPDATE`, then set `NOT NULL`. Drop the old `UNIQUE (latitude, longitude, tile_zoom, tile_size_meters, source)` constraint and add the new integer-keyed one. Backfill `id` IS NOT trivial — see Risk 1.
|
||||||
|
- The UUIDv5 generator is implemented inline in `SatelliteProvider.Common/Utils/Uuidv5.cs` (RFC 9562 algorithm — 60 lines of C#, MD5 dropped, SHA-1 only). .NET 10 has no native `Guid.CreateVersion5` (only `CreateVersion7`); we do NOT add a 3rd-party dep for this.
|
||||||
|
|
||||||
|
## Scope
|
||||||
|
|
||||||
|
### Included
|
||||||
|
- `SatelliteProvider.Common/Utils/Uuidv5.cs` — pure-C# RFC 9562 UUIDv5 implementation, unit-tested against the Python `uuid.uuid5` reference vectors (the onboard side uses Python `uuid.uuid5`; both must produce byte-identical output for the same name + namespace).
|
||||||
|
- `SatelliteProvider.DataAccess` — Dapper SQL changes: new columns, new UPSERT, new SELECT shapes. `TileRepository.GetByLocationHashAsync` and `TileRepository.InventoryAsync(uuid[])` added; `GetByTileCoordinatesAsync` rewritten to use `location_hash`. Existing `tiles_leaflet_path` covering index added.
|
||||||
|
- `SatelliteProvider.Services.TileDownloader` — `BuildTileEntity` no longer calls `Guid.NewGuid()`; it computes the UUIDv5 and the `location_hash` from the deterministic inputs. Same change in `UavTileUploadHandler`.
|
||||||
|
- `SatelliteProvider.Api/Program.cs` — new MapPost route `/api/satellite/tiles/inventory`; existing `/tiles/{z}/{x}/{y}` Leaflet path migrated to use `location_hash`-keyed query against the covering index.
|
||||||
|
- Migration script in the existing migrations tool (whichever the repo uses — Flyway/EFCore/handwritten SQL; this task uses whatever is already established).
|
||||||
|
- **On-disk layout migration**: UAV tiles move from `./tiles/uav/{zoom}/{x}/{y}.jpg` to `./tiles/uav/{flight_id}/{zoom}/{x}/{y}.jpg`. Google Maps tiles stay at `./tiles/{zoom}/{x}/{y}/...jpg` (or normalise to `./tiles/google_maps/{zoom}/{x}/{y}.jpg` if the cleanup is cheap). The DB `file_path` column is rewritten in the same backfill that populates `location_hash`/`content_sha256`. Test `SatelliteProvider.Tests/UavTileFilePathTests.cs:23` is updated to assert the new path shape.
|
||||||
|
- OpenAPI annotations for the new endpoint.
|
||||||
|
- Unit tests for `Uuidv5` against Python reference vectors.
|
||||||
|
- Integration tests for the new POST `/api/satellite/tiles/inventory` surface (use existing `docker-compose.tests.yml` fixture).
|
||||||
|
- Integration test for multi-flight upload — confirms two `source='uav'` rows for the same `(z, x, y)` from different `flight_id`s coexist on disk (different paths) and in DB (different rows, same `location_hash`).
|
||||||
|
- **Enable HTTP/2 (and HTTP/3 over TLS where feasible)** at the Kestrel endpoint boundary: `EndpointDefaults.Protocols = HttpProtocols.Http1AndHttp2AndHttp3`. Verify the dev `docker-compose` nginx reverse proxy also has `http2 on;` in the relevant `listen` directive. This is the bulk-retrieval mechanism for BOTH Leaflet (browser opens one TCP connection, multiplexes 30+ tile streams, HPACK compresses repeated headers) and UAV provisioning (`httpx.Client(http2=True)` on the onboard side). No application-level batching is added.
|
||||||
|
- **No materialised `tile_current` pointer table** — deferred until production profiling demands it. Pre-optimisation rejected.
|
||||||
|
- **No content-addressable / blob storage layout** — `content_sha256` is for dedup *detection* (and integrity), not dedup *storage*. CAS adds complexity without measurable benefit at our scale.
|
||||||
|
- **No multipart / tar / zip bundle endpoint** for UAV provisioning — rejected in favour of inventory POST + per-tile GET over HTTP/2 multiplex. The bundle approach collapses resume granularity, loses per-tile cacheability, and gives no throughput win over HTTP/2 multistream. PMTiles archive is excellent for STATIC tile sets (Cloudflare/Protomaps) but our DB is dynamic — UAV uploads invalidate any pre-built archive. Defer PMTiles until profiling demands it.
|
||||||
|
|
||||||
|
### Excluded
|
||||||
|
- The voting / trust-promotion layer (Design Task #2 from 2026-05-09 leftover) — separate task. This task makes voting POSSIBLE by keeping per-flight rows; it does NOT implement voting.
|
||||||
|
- Onboard companion auth (mTLS / signed payloads) — already covered by D-PROJ-2 Design Task #1.
|
||||||
|
- Renaming the `tile_zoom` column to `zoom_level` (rule: never rename columns without explicit confirmation — see `coderule.mdc`).
|
||||||
|
- Per-flight key management (already covered by gps-denied-onboard AZ-318).
|
||||||
|
- Removing the existing `latitude`/`longitude` columns. They stay as advisory center-of-tile data.
|
||||||
|
|
||||||
|
## Acceptance Criteria
|
||||||
|
|
||||||
|
**AC-1: UUIDv5 reference vectors match Python**
|
||||||
|
Given the test vector `namespace = TILE_NAMESPACE` and `name = "18/12345/23456/google_maps/00000000-0000-0000-0000-000000000000"`
|
||||||
|
When `Uuidv5.Create(TILE_NAMESPACE, name)` runs
|
||||||
|
Then the resulting `Guid` is byte-identical to Python `uuid.uuid5(TILE_NAMESPACE, "18/12345/23456/google_maps/00000000-0000-0000-0000-000000000000")` for ≥10 randomly-generated test cases.
|
||||||
|
|
||||||
|
**AC-2: Insert is idempotent on identical inputs**
|
||||||
|
Given a tile is inserted with `(tile_zoom=18, tile_x=A, tile_y=B, tile_size_meters=S, source='google_maps', flight_id=NULL)` returning `id=X`
|
||||||
|
When the same insert is repeated
|
||||||
|
Then exactly ONE row exists in `tiles`; the returned `id == X`; the `id` column is not regenerated; `updated_at` IS refreshed but `created_at` is NOT.
|
||||||
|
|
||||||
|
**AC-3: Multi-flight UAV uploads coexist**
|
||||||
|
Given two `source='uav'` inserts for the same `(tile_zoom, tile_x, tile_y, tile_size_meters)` with `flight_id=F1` and `flight_id=F2` (F1 ≠ F2)
|
||||||
|
When both inserts complete
|
||||||
|
Then TWO rows exist in `tiles`; each has its own `id`; both rows share the same `location_hash`.
|
||||||
|
|
||||||
|
**AC-4: Float rounding does not break idempotence**
|
||||||
|
Given an insert with `latitude=47.123456789012345` and another insert recomputed from `tile_center = TileToWorldPos(x, y, z)` (slightly different float representation)
|
||||||
|
When both inserts target the same `(tile_zoom, tile_x, tile_y, tile_size_meters, source, flight_id)`
|
||||||
|
Then exactly ONE row results; the conflict triggers despite float differences (because the new UPSERT key does not include `latitude`/`longitude`).
|
||||||
|
|
||||||
|
**AC-5: Inventory endpoint returns one entry per requested coord**
|
||||||
|
Given a POST body of 25 `(z, x, y)` coords at zoom 18, with 12 already in the DB and 13 absent
|
||||||
|
When `POST /api/satellite/tiles/inventory` is called
|
||||||
|
Then `results` contains 25 entries in the SAME ORDER as the input; 12 entries have `present=true` with `id`/`location_hash`/`captured_at` populated, 13 entries have `present=false` with `location_hash` populated (computed via UUIDv5) and `id=null`; per-tile `estimated_bytes` is `null|int`.
|
||||||
|
|
||||||
|
**AC-6: Leaflet path returns most-recent variant via location_hash**
|
||||||
|
Given multiple rows for `(z, x, y)` from different sources/flights
|
||||||
|
When `GET /tiles/{z}/{x}/{y}` is called
|
||||||
|
Then ONE tile body is returned, selected by `WHERE location_hash = $1 ORDER BY captured_at DESC, updated_at DESC, id DESC LIMIT 1` (semantically identical to AZ-484's prior rule, now using `location_hash`).
|
||||||
|
|
||||||
|
**AC-7: content_sha256 is computed and persisted**
|
||||||
|
Given a UAV upload of a JPEG with known SHA-256
|
||||||
|
When the insert lands
|
||||||
|
Then `content_sha256` matches the externally-computed digest; a follow-up insert of a byte-identical body produces the SAME `content_sha256` value.
|
||||||
|
|
||||||
|
**AC-8: Migration is reversible (best-effort)**
|
||||||
|
Given the migration runs forward on a populated `tiles` table
|
||||||
|
When the back-migration runs
|
||||||
|
Then the table is restored to the pre-migration shape; data loss is limited to the new columns (`location_hash`, `content_sha256`). (Best-effort because UPSERT key changes are awkward to reverse cleanly.)
|
||||||
|
|
||||||
|
**AC-9: Performance — inventory endpoint ≤ 500 ms for 2500 tiles**
|
||||||
|
Given a POST body listing 2500 `(z, x, y)` coords at zoom 18 against a populated DB (average ~3 versions per cell across `google_maps` + `uav` sources)
|
||||||
|
When `POST /api/satellite/tiles/inventory` is called
|
||||||
|
Then the response arrives within 500 ms (95th percentile over 20 calls). Index-only scan via `tiles_leaflet_path` is the expected plan.
|
||||||
|
|
||||||
|
**AC-10: Leaflet hot path is index-only**
|
||||||
|
Given the `tiles_leaflet_path` covering index exists and the table has ≥ 100k rows
|
||||||
|
When `EXPLAIN (ANALYZE, BUFFERS) SELECT file_path FROM tiles WHERE location_hash = $1 AND voting_status IN ('trusted', NULL) ORDER BY captured_at DESC LIMIT 1` is run
|
||||||
|
Then the plan is `Index Only Scan using tiles_leaflet_path`; `Heap Fetches = 0` (visibility map fully built); total time < 0.5 ms.
|
||||||
|
|
||||||
|
**AC-12: HTTP/2 multiplexed responses**
|
||||||
|
Given Kestrel is configured with `Http1AndHttp2AndHttp3` (or `Http1AndHttp2` over plain TLS without QUIC support)
|
||||||
|
When a single `httpx.Client(http2=True)` issues 20 concurrent `GET /tiles/{z}/{x}/{y}` requests
|
||||||
|
Then the responses arrive over ONE TCP connection (verifiable via packet capture / `httpx.Response.http_version == 'HTTP/2'`); all 20 responses interleave on the wire; total wall-clock time < 2× single-tile latency (vs. 20× for HTTP/1.1 without pipelining); per-tile ETags + `Cache-Control` headers are preserved unchanged.
|
||||||
|
|
||||||
|
**AC-11: Per-flight on-disk separation**
|
||||||
|
Given two UAV uploads of the same `(z, x, y)` from `flight_id=F1` and `flight_id=F2`
|
||||||
|
When both inserts complete and the backing JPEGs are persisted
|
||||||
|
Then two distinct files exist at `./tiles/uav/{F1}/{z}/{x}/{y}.jpg` and `./tiles/uav/{F2}/{z}/{x}/{y}.jpg`; `rm -rf ./tiles/uav/{F1}/` removes ONLY Flight F1's evidence (Flight F2's file is untouched); the DB `file_path` columns reflect the per-flight paths.
|
||||||
|
|
||||||
|
## Constraints
|
||||||
|
|
||||||
|
- **No column renames**: keep `tile_zoom`, `tile_x`, `tile_y`, `latitude`, `longitude` exactly as named today. The onboard side (`AZ-304`) is responsible for matching column names on its own table.
|
||||||
|
- **UUID namespace MUST be agreed cross-repo**: pick a fixed UUID (e.g. `urn:uuid:5b8d0c2e-...`) and pin it in BOTH `SatelliteProvider.Common/Utils/Uuidv5.cs` and the onboard `gps_denied_onboard/components/c6_tile_cache/_uuid.py`. Document the chosen value in this task's review.
|
||||||
|
- **No third-party UUIDv5 dependency**: pure-C# RFC 9562 implementation, ≤80 LoC.
|
||||||
|
- **Migration must run online**: the `tiles` table is the busiest table in the service. Adding the new columns must be a non-blocking `ALTER TABLE` followed by a backfill in batches.
|
||||||
|
- **Existing AZ-484 selection rule is preserved**: the read-side `ORDER BY captured_at DESC, updated_at DESC, id DESC` tie-break stays; the only change is the WHERE clause uses `location_hash` instead of `(tile_zoom, tile_x, tile_y)`.
|
||||||
|
|
||||||
|
## Risks & Mitigation
|
||||||
|
|
||||||
|
**Risk 1: Backfilling `id` on existing rows is irreversible**
|
||||||
|
- *Risk*: existing rows have random `Guid` ids. If we overwrite them with computed UUIDv5 values, any cached external reference to the old id (e.g. operator UI bookmarks, audit log entries) becomes stale.
|
||||||
|
- *Mitigation*: add a `legacy_id uuid NULL` column populated from the existing `id` before the backfill. The old id is preserved for diagnostic queries. The new UUIDv5 `id` takes over. After a deprecation window (one cycle), drop `legacy_id`.
|
||||||
|
|
||||||
|
**Risk 2: UUIDv5 namespace divergence between Python and C#**
|
||||||
|
- *Risk*: subtle bug in the C# SHA-1 impl produces ids that differ from Python's. Cross-repo lookups fail silently.
|
||||||
|
- *Mitigation*: AC-1 requires byte-identical output across ≥10 vectors. The vectors are generated by Python and pasted into the C# test fixture as fixed-string expectations.
|
||||||
|
|
||||||
|
**Risk 3: Migration deadlock under load**
|
||||||
|
- *Risk*: `ALTER TABLE ADD COLUMN ... NOT NULL DEFAULT '...'` on a 10M+ row table locks the table for minutes.
|
||||||
|
- *Mitigation*: 3-step migration — (a) ADD COLUMN nullable; (b) UPDATE in 1000-row batches with `pg_sleep(0.1)` between batches; (c) SET NOT NULL after backfill. Documented in the migration file.
|
||||||
|
|
||||||
|
**Risk 4: Onboard `TileDownloader` (AZ-316) writes against this endpoint before it exists**
|
||||||
|
- *Risk*: ordering — onboard AZ-316 might be implemented before this task lands. Production calls hit 404.
|
||||||
|
- *Mitigation*: the onboard `TileDownloader` already has a fallback path (per-tile GET via `/tiles/{z}/{x}/{y}`); document that fallback in AZ-316's caveats and gate the `list-only=true` path behind a feature flag `c11.use_bulk_list_endpoint` (default `false` until this satellite-provider task is in production).
|
||||||
|
|
||||||
|
## References
|
||||||
|
|
||||||
|
- `gps-denied-onboard/_docs/_process_leftovers/2026-05-12_tile-schema-scenario-analysis.md` — origin + scenario analysis.
|
||||||
|
- `gps-denied-onboard/_docs/_process_leftovers/2026-05-09_satellite-provider-design-tasks.md` — sibling design tasks (inbound ingest, voting layer).
|
||||||
|
- `gps-denied-onboard/_docs/02_tasks/todo/AZ-304_c6_postgres_schema.md` — onboard counterpart schema.
|
||||||
|
- `gps-denied-onboard/_docs/02_tasks/todo/AZ-316_c11_tile_downloader.md` — onboard consumer of the bulk-list endpoint.
|
||||||
|
- `SatelliteProvider.DataAccess/Repositories/TileRepository.cs` — current `Guid.NewGuid()` + float UPSERT.
|
||||||
|
- `SatelliteProvider.Services.TileDownloader/TileService.cs` — current `BuildTileEntity`.
|
||||||
|
- `SatelliteProvider.Api/Program.cs` — endpoint surface.
|
||||||
@@ -0,0 +1,94 @@
|
|||||||
|
# Perf script: fix grep | wc -l pipefail crash in PT-08
|
||||||
|
|
||||||
|
**Task**: AZ-504_perf_script_grep_pipefail_fix
|
||||||
|
**Name**: Perf script: fix grep | wc -l pipefail crash in PT-08
|
||||||
|
**Description**: `scripts/run-performance-tests.sh:416-417` uses `grep -o ... | wc -l` to count `"status":"rejected"` and `"status":"accepted"` markers in the PT-08 batch response. On the happy path (`rejected=0`), `grep -o` exits 1 (no matches), and because the script has `set -o pipefail` + `set -e`, the assignment kills the script silently right after `rejected=0`. The cycle-3 perf-harness leftover (`_docs/_process_leftovers/2026-05-12_perf-cycle3-harness-execution.md`) has reproduced this crash twice (replay #2 + #3, post-AZ-500); PT-01..PT-07 stay green. Until this one-line fix lands, PT-08 cannot be measured against its 2000 ms p95 threshold and the leftover stays open.
|
||||||
|
**Complexity**: 1 points
|
||||||
|
**Dependencies**: None
|
||||||
|
**Component**: scripts/run-performance-tests.sh
|
||||||
|
**Tracker**: AZ-504
|
||||||
|
**Epic**: AZ-483 — Multi-source tile storage + UAV upload (Layer 2)
|
||||||
|
|
||||||
|
## Problem
|
||||||
|
|
||||||
|
`scripts/run-performance-tests.sh` lines 416-417 currently read:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
accepted=$(grep -o '"status":"accepted"' "$PERF_TMP_DIR/pt08_resp.json" | wc -l | tr -d ' ')
|
||||||
|
rejected=$(grep -o '"status":"rejected"' "$PERF_TMP_DIR/pt08_resp.json" | wc -l | tr -d ' ')
|
||||||
|
```
|
||||||
|
|
||||||
|
With `set -o pipefail` (line 16) and `set -e`, when `grep -o` matches zero times it exits 1; the pipeline returns 1; the assignment kills the script. This is masked when both counts are ≥1 — the bug only surfaces on the happy path where `rejected=0`. Two consecutive full-harness replays (post-AZ-500 .NET 10 migration) reproduced the crash at the exact same line; PT-01..PT-07 are unaffected because they don't pipe potentially-empty `grep` output through `wc -l`.
|
||||||
|
|
||||||
|
The leftover documents that the actual perf-relevant data PT-08 captured before crashing was healthy: HTTP 200, batch latency 99 ms (well under the 2000 ms threshold), accepted=2, rejected=0. The harness is buggy, not the production code path.
|
||||||
|
|
||||||
|
## Outcome
|
||||||
|
|
||||||
|
- `scripts/run-performance-tests.sh` runs PT-08 to completion when `rejected=0` (and when `accepted=0`, defensively).
|
||||||
|
- A full default-parameter perf run (`PERF_REPEAT_COUNT=20 PERF_UAV_BATCH_SIZE=10`) prints the PT-08 summary line with batch p50/p95, accepted total, rejected total, and exits 0 against an api built from `dev`.
|
||||||
|
- The cycle-3 perf-harness leftover (`_docs/_process_leftovers/2026-05-12_perf-cycle3-harness-execution.md`) is deleted once a full perf run is green (per AZ-500 Constraint).
|
||||||
|
|
||||||
|
## Scope
|
||||||
|
|
||||||
|
### Included
|
||||||
|
- `scripts/run-performance-tests.sh` lines 416-417 replaced with pipefail-tolerant counting (`grep -c ... || true`).
|
||||||
|
- A short shellcheck pass on the surrounding `pt08_summary` block so no similar empty-match-counting bug remains in the same scope.
|
||||||
|
- Delete `_docs/_process_leftovers/2026-05-12_perf-cycle3-harness-execution.md` after the green full perf run completes (post-implementation).
|
||||||
|
|
||||||
|
### Excluded
|
||||||
|
- Changes to PT-01..PT-07 scenarios or any production code path — bug is harness-only.
|
||||||
|
- Adding new perf scenarios — out of scope.
|
||||||
|
- Renaming or restructuring `scripts/run-performance-tests.sh`.
|
||||||
|
- Server-side per-call `UavTileQualityGate.Validate` timing instrumentation (already deferred in cycle 3 AZ-492 — out of scope here too).
|
||||||
|
|
||||||
|
## Acceptance Criteria
|
||||||
|
|
||||||
|
**AC-1: PT-08 completes on zero-rejected response**
|
||||||
|
Given the api is up against `docker-compose up -d --build` and a UAV batch upload returns `accepted ≥ 1, rejected = 0`
|
||||||
|
When the harness runs PT-08 batch counting
|
||||||
|
Then the assignment to `rejected` succeeds (no script exit); `rejected` is `0`; the loop continues to the next iteration.
|
||||||
|
|
||||||
|
**AC-2: PT-08 completes on zero-accepted response (defensive)**
|
||||||
|
Given a UAV batch upload returns `accepted = 0, rejected ≥ 1` (e.g., quality gate rejects every item)
|
||||||
|
When the harness runs PT-08 batch counting
|
||||||
|
Then the assignment to `accepted` succeeds (no script exit); `accepted` is `0`; the loop continues.
|
||||||
|
|
||||||
|
**AC-3: PT-08 summary line prints in full run**
|
||||||
|
Given a full default-parameter perf run (`PERF_REPEAT_COUNT=20 PERF_UAV_BATCH_SIZE=10`) against a healthy api
|
||||||
|
When the harness reaches the PT-08 reporting block
|
||||||
|
Then a summary line `PT-08 UAV batch upload: PASS p95=Xms / 2000ms (accepted=A, rejected=R, N=20)` is printed; the script exits 0.
|
||||||
|
|
||||||
|
**AC-4: Leftover deletion on green full run**
|
||||||
|
Given AC-1 + AC-2 + AC-3 hold and the full perf run is green
|
||||||
|
When the implementer verifies the run
|
||||||
|
Then `_docs/_process_leftovers/2026-05-12_perf-cycle3-harness-execution.md` is deleted in the same commit (per AZ-500 Constraint "leftover file is deleted ONLY when the full perf script runs cleanly").
|
||||||
|
|
||||||
|
## Non-Functional Requirements
|
||||||
|
|
||||||
|
**Compatibility**
|
||||||
|
- The fix must work under `bash` (script's `#!/bin/bash`). No POSIX-sh constraint.
|
||||||
|
- The fix must not silently swallow other errors in the surrounding block — only the empty-match case (`grep` exit 1) is tolerated; any other failure (file unreadable, pipe broken) must still propagate (`coderule.mdc` "never suppress errors silently").
|
||||||
|
|
||||||
|
## Constraints
|
||||||
|
|
||||||
|
- One-file edit. No new scripts, no helper extraction.
|
||||||
|
- Preserve `set -o pipefail` and `set -e` semantics globally — only neutralise grep's exit-1-on-no-match locally.
|
||||||
|
- Do not change PT-08's threshold (2000 ms p95, established by AZ-488).
|
||||||
|
- Do not rename, move, or restructure `scripts/run-performance-tests.sh`.
|
||||||
|
|
||||||
|
## Risks & Mitigation
|
||||||
|
|
||||||
|
**Risk 1: `grep -c` counts lines containing the pattern, not occurrences**
|
||||||
|
- *Risk*: `grep -c` counts matching LINES, not total occurrences. If the PT-08 response packs multiple `"status":"..."` markers on one line (compact JSON), the count diverges from `grep -o | wc -l`.
|
||||||
|
- *Mitigation*: inspect a sample `pt08_resp.json` from a recent run; if the JSON is one-per-line the `grep -c` swap is exact, otherwise use `grep -o ... | wc -l` wrapped in `|| echo 0` to preserve occurrence-count semantics while neutralising the empty-match exit.
|
||||||
|
|
||||||
|
**Risk 2: Pipefail bug exists elsewhere in the same script**
|
||||||
|
- *Risk*: similar `grep -o ... | wc -l` patterns elsewhere will crash on different happy paths.
|
||||||
|
- *Mitigation*: in-scope shellcheck pass over the file flags every `grep -o ... | wc -l` / `grep ... | wc -l` site for review; same defensive treatment applied as needed.
|
||||||
|
|
||||||
|
## References
|
||||||
|
|
||||||
|
- `_docs/_process_leftovers/2026-05-12_perf-cycle3-harness-execution.md` — replay #2 + #3 evidence; one-line fix proposed at lines 56-67 of the leftover.
|
||||||
|
- `scripts/run-performance-tests.sh:416-417` — exact lines to fix.
|
||||||
|
- `AZ-488` — PT-08 scenario owner; threshold context.
|
||||||
|
- `AZ-500` Constraint — "leftover file is deleted ONLY when the full perf script runs cleanly".
|
||||||
@@ -2,9 +2,9 @@
|
|||||||
|
|
||||||
## Current Step
|
## Current Step
|
||||||
flow: existing-code
|
flow: existing-code
|
||||||
step: 9
|
step: 10
|
||||||
name: New Task
|
name: Implement
|
||||||
status: not_started
|
status: in_progress
|
||||||
sub_step:
|
sub_step:
|
||||||
phase: 0
|
phase: 0
|
||||||
name: awaiting-invocation
|
name: awaiting-invocation
|
||||||
|
|||||||
@@ -122,3 +122,11 @@ User picked A at the Step 15 (Performance Test) gate of cycle 4. Full default-pa
|
|||||||
## Replay obligation
|
## Replay obligation
|
||||||
|
|
||||||
Open a new follow-up PBI for the `scripts/run-performance-tests.sh:416-417` grep fix (estimated 1 SP). Once that lands and a full perf run is green, delete this file. Until then, this leftover stays.
|
Open a new follow-up PBI for the `scripts/run-performance-tests.sh:416-417` grep fix (estimated 1 SP). Once that lands and a full perf run is green, delete this file. Until then, this leftover stays.
|
||||||
|
|
||||||
|
## Replay attempt #4 — 2026-05-12T13:00:00Z (cycle 5 /autodev Step 9 New Task)
|
||||||
|
|
||||||
|
PBI opened: **AZ-504 — "Perf script: fix grep | wc -l pipefail crash in PT-08"** (1 SP, parent epic AZ-483 — same as PT-08 scenario owner AZ-488). Spec landed at `_docs/02_tasks/todo/AZ-504_perf_script_grep_pipefail_fix.md`. The spec captures the AC-4 obligation that THIS leftover file is deleted in the same commit as the green full perf run.
|
||||||
|
|
||||||
|
The "open the PBI" half of the Replay obligation is now done. The "full perf run is green" half remains outstanding — this leftover stays open until AZ-504 lands AND a default-parameter `./scripts/run-performance-tests.sh` (`PERF_REPEAT_COUNT=20 PERF_UAV_BATCH_SIZE=10`) exits 0 against an api built from `dev`.
|
||||||
|
|
||||||
|
Next-cycle /autodev should NOT attempt replay #5 (open another PBI) — AZ-504 is the canonical replay vehicle. The next replay action is implementing AZ-504 itself (cycle 5 Step 10).
|
||||||
|
|||||||
Reference in New Issue
Block a user