[AZ-503] [AZ-504] cycle 5 new-task: tile identity + perf-script-fix

- AZ-503 (3 SP, epic AZ-483) — Tile identity → UUIDv5 deterministic id;
  integer-only UPSERT with COALESCE(flight_id) per-flight separation;
  content_sha256 column; POST /api/satellite/tiles/inventory bulk-list
  endpoint; HTTP/2 at Kestrel edge. Cross-workspace handoff from
  gps-denied-onboard (AZ-304 / AZ-316 counterpart). Supersedes the
  AZ-484 UPSERT-conflict-key portion.

- AZ-504 (1 SP, epic AZ-483) — Fix scripts/run-performance-tests.sh
  lines 416-417: grep -o | wc -l + set -o pipefail kills PT-08 when
  rejected=0. Closes the replay obligation for the cycle-3 perf-harness
  leftover (leftover deletion gated on green full perf run, AC-4).

Updates _dependencies_table.md with cycle 5 entries and records
replay attempt #4 against the perf-cycle3 leftover (PBI opened —
leftover still stays until AZ-504 lands and full perf run is green).

State advanced to Step 10 (Implement).

Co-authored-by: Cursor <cursoragent@cursor.com>
This commit is contained in:
Oleksandr Bezdieniezhnykh
2026-05-12 16:27:40 +03:00
parent e31f59211d
commit 8e509b550c
5 changed files with 328 additions and 3 deletions
+17
View File
@@ -92,6 +92,15 @@ Source: cycle-3 perf-harness leftover replay surfaced the host SDK / project SDK
|------|-------|-----------|--------|--------|
| AZ-500 | .NET 8 → .NET 10 migration (TFM + SDK pin + Docker images + CI images + Microsoft.AspNetCore.* + Microsoft.Extensions.* + Serilog.AspNetCore) | — | 5 | Done (In Testing) |
### Step 9 cycle 5 — New Task: Tile identity foundation + perf-harness fix (AZ-483 epic)
Source: cross-workspace handoff from `gps-denied-onboard` (tile-schema scenario analysis) for AZ-503; cycle-3 perf-harness leftover replay-obligation closure for AZ-504. Both attach to epic AZ-483 (Multi-source tile storage + UAV upload, Layer 2) — AZ-503 supersedes the AZ-484 UPSERT-conflict-key portion, AZ-504 unblocks PT-08 measurement.
| Task | Title | Depends On | Points | Status |
|------|-------|-----------|--------|--------|
| AZ-503 | Tile identity → UUIDv5 + integer UPSERT + bulk-list endpoint | AZ-484 (supersedes UPSERT-conflict-key portion of AZ-484 selection rule) | 3 | To Do |
| AZ-504 | Perf script: fix grep \| wc -l pipefail crash in PT-08 | — (independent; references AZ-488 PT-08 threshold) | 1 | To Do |
## Execution Order
### Step 6
@@ -135,6 +144,13 @@ Single task; coordinated cross-cutting bump.
1. AZ-500 (5 SP) — .NET 8 → .NET 10 migration. Self-contained but touches every csproj, both Dockerfiles, run-tests.sh, .woodpecker/01-test.yml, global.json, and two docs files.
### Step 9 cycle 5
Independent tracks — both can run in parallel; no ordering constraint between them. AZ-504 is a prerequisite for the cycle's Step 15 Performance Test to deliver a green PT-08 reading (and therefore for deleting the perf-cycle3 leftover); AZ-503 is the cycle's main feature.
1. AZ-504 (1 SP) — cheapest unblocker; lands first to clear PT-08 reporting for the cycle.
2. AZ-503 (3 SP) — main feature; data-model + API; cross-workspace alignment with `gps-denied-onboard` AZ-304 / AZ-316.
## Total Effort
Step 6: 6 tasks, 17 story points
@@ -144,6 +160,7 @@ Step 9 cycle 1: 1 task created (AZ-484, 5 pts)
Step 9 cycle 2: 2 tasks created (AZ-487 = 2 pts, AZ-488 = 8 pts over-cap user-accepted) — total 10 pts
Step 9 cycle 3: 6 tasks created (AZ-491 = 3 pts, AZ-492 = 3 pts, AZ-493 = 2 pts, AZ-494 = 2 pts, AZ-495 = 1 pt, AZ-496 = 2 pts) — total 13 pts
Step 9 cycle 4: 1 task created (AZ-500 = 5 pts)
Step 9 cycle 5: 2 tasks created (AZ-503 = 3 pts, AZ-504 = 1 pt) — total 4 pts
## Coverage Verification
@@ -0,0 +1,206 @@
# Tile identity → UUIDv5 + integer UPSERT + bulk-list endpoint
**Task**: AZ-503_tile_identity_uuidv5_bulk_list
**Name**: Tile identity → UUIDv5 + integer UPSERT + bulk-list endpoint
**Description**: Tile identity in the `tiles` table is currently random (`Guid.NewGuid()`), and the UPSERT conflict key uses `double precision` `latitude`/`longitude` and omits `flight_id`, which (a) makes idempotent re-insert fragile against float rounding and (b) destroys per-flight evidence required by the D-PROJ-2 multi-flight voting layer when two UAVs upload the same `(z, x, y)` cell. This task migrates tile identity to deterministic UUIDv5 (`id = uuidv5(NAMESPACE, "{z}/{x}/{y}/{source}/{flight_id or 'none'}")`), adds a `location_hash` UUIDv5 (`uuidv5(..., "{z}/{x}/{y}")`) for efficient cell-bag queries (UI Leaflet path + future voting), switches the UPSERT conflict key to integer-only `(zoom_level, tile_x, tile_y, tile_size_meters, source, COALESCE(flight_id, '00000000-0000-0000-0000-000000000000'::uuid))`, adds a `content_sha256 bytea NOT NULL` column for content-addressable dedup, and adds the `POST /api/satellite/tiles/inventory` endpoint that the onboard `TileDownloader` (`gps-denied-onboard` AZ-316) needs for bbox→tile enumeration during pre-flight provisioning.
**Complexity**: 3 points
**Dependencies**: AZ-484 (UPSERT-per-source + AZ-484 selection rule — done; this task supersedes the UPSERT conflict-key portion)
**Component**: SatelliteProvider.DataAccess + SatelliteProvider.Services.TileDownloader + SatelliteProvider.Api
**Tracker**: AZ-503
**Epic**: AZ-483 — Multi-source tile storage + UAV upload (Layer 2)
## Origin
Cross-workspace surface from `gps-denied-onboard` `_docs/_process_leftovers/2026-05-12_tile-schema-scenario-analysis.md`. The onboard repo's `AZ-304` C6 Postgres schema is being designed with `location_hash` + `content_sha256` columns and a deterministic `id`; this satellite-provider task is the parent-suite counterpart so both sides of the wire agree on tile identity semantics.
Related: `gps-denied-onboard` `_docs/_process_leftovers/2026-05-09_satellite-provider-design-tasks.md` Design Task #2 (multi-flight trust / voting layer) is the downstream consumer of this task's `flight_id`-aware UPSERT and `content_sha256`.
## Problem
Three concrete issues in the current code:
1. **Random tile id**`SatelliteProvider.Services.TileDownloader/TileService.cs:149` and `UavTileUploadHandler.cs:160` use `Id = Guid.NewGuid()`. The `id` is opaque, non-deterministic, and useless as a content/location handle. Onboard cannot pre-compute or compare ids before round-tripping to the DB.
2. **Float-based UPSERT collapses multi-flight evidence**`TileRepository.InsertAsync` line 124: `ON CONFLICT (latitude, longitude, tile_zoom, tile_size_meters, source) DO UPDATE SET file_path = EXCLUDED.file_path, ...`. Problems:
- `latitude` / `longitude` are `double precision`; Postgres conflict detection requires bit-identical floats. Re-uploads of the same tile computed from independently-rounded center coords can miss the conflict and create duplicate rows. AZ-484's `DISTINCT ON` read-side fix papers over the duplicate but does not prevent it.
- The conflict key omits `flight_id`. When two flights upload `source='uav'` for the same cell, `DO UPDATE` overwrites Flight A with Flight B. **D-PROJ-2 voting (Design Task #2 from the 2026-05-09 leftover) needs both rows alive.**
3. **No bulk-list endpoint for pre-flight provisioning** — onboard `TileDownloader` (`gps-denied-onboard` AZ-316) calls `GET /api/satellite/tiles?bbox=...&zoom=...&list-only=true` to size and enumerate a pre-flight cache build. **This endpoint does not exist** in `SatelliteProvider.Api/Program.cs`. The closest is `GetTilesByRegionAsync` (private, lat/lon-meters input, no HTTP surface) and `GET /api/satellite/tiles/latlon` (single tile). Operators today cannot pre-size a cache build over the bbox the mission planner produces.
4. **No content digest** — there is no `content_sha256` column. Same JPEG re-uploaded under a different `source` or `flight_id` is indistinguishable from a re-encode.
## Outcome
- Tile id becomes deterministic: `id = uuidv5(TILE_NAMESPACE, "{z}/{x}/{y}/{source}/{flight_id or '00000000-0000-0000-0000-000000000000'}")`. Same inputs always produce the same id; idempotent inserts no longer require a "did this row exist?" pre-check.
- `location_hash uuid NOT NULL` column: `uuidv5(TILE_NAMESPACE, "{z}/{x}/{y}")`. Drives Scenario 1 (UI Leaflet `/tiles/{z}/{x}/{y}`) lookup as a single hash-index probe, and Scenario 6 (voting) as a single cell-bag fetch.
- `content_sha256 bytea NOT NULL` column: SHA-256 of the JPEG body, computed at insert time. Enables dedup detection ("Flight B uploaded a byte-identical tile to Flight A — flag for inspection") and integrity checks on the read path.
- UPSERT conflict key becomes:
```
ON CONFLICT (zoom_level, tile_x, tile_y, tile_size_meters, source,
COALESCE(flight_id, '00000000-0000-0000-0000-000000000000'::uuid))
DO UPDATE SET file_path = EXCLUDED.file_path, captured_at = EXCLUDED.captured_at, updated_at = EXCLUDED.updated_at
```
Integer-only equality, per-flight separation. (Note: requires `zoom_level` column — the current schema uses `tile_zoom`; either rename or use `tile_zoom` consistently — this task keeps the existing column name `tile_zoom` and adjusts the onboard-side spec to match.)
- **New endpoint: `POST /api/satellite/tiles/inventory`** (preferred over the originally-proposed `GET /api/satellite/tiles?bbox=...`). Body is a list of `(z, x, y)` coords (or pre-computed `location_hash` UUIDs); response is one entry per input. Justification: the onboard side already has the deterministic slippy-tile math (`helpers/wgs_converter.py.latlon_to_tile_xy`, identical to C# `GeoUtils.WorldToTilePos`); making the server re-enumerate the bbox is wasted work. POST inventory is also the natural batched-existence-check shape — single round-trip, indexed lookup per row.
```
POST /api/satellite/tiles/inventory
Body: { "tiles": [ { "z": 18, "x": 12345, "y": 23456 }, ... ] }
// OR { "location_hashes": [ "uuid-v5", ... ] }
Response: {
"results": [
{
"tile_x": 12345, "tile_y": 23456, "tile_zoom": 18, "location_hash": "uuid-v5...",
"present": true,
"id": "uuid-v5...", "captured_at": "...", "resolution_m_per_px": 0.3,
"estimated_bytes": 42017, "source": "google_maps", "flight_id": null
},
{ "tile_x": 12346, "tile_y": 23456, "tile_zoom": 18, "location_hash": "uuid-v5...", "present": false }
]
}
```
Server-side query (single round-trip):
```sql
SELECT DISTINCT ON (location_hash)
location_hash, id, captured_at, resolution_m_per_px, estimated_bytes, source, flight_id
FROM tiles
WHERE location_hash = ANY($1::uuid[])
AND voting_status IN ('trusted', NULL)
ORDER BY location_hash, captured_at DESC, updated_at DESC, id DESC;
```
- Required covering index for Leaflet hot path:
```sql
CREATE INDEX tiles_leaflet_path
ON tiles (location_hash, captured_at DESC, updated_at DESC, id DESC)
INCLUDE (file_path, content_type, etag, voting_status);
```
Leaflet `/tiles/{z}/{x}/{y}` becomes an index-only scan (no heap fetch when `voting_status='trusted'`):
```sql
SELECT file_path FROM tiles
WHERE location_hash = $1 AND voting_status IN ('trusted', NULL)
ORDER BY captured_at DESC, updated_at DESC, id DESC LIMIT 1;
```
- Migration: additive — add `location_hash`, `content_sha256` as nullable, backfill in a single `UPDATE`, then set `NOT NULL`. Drop the old `UNIQUE (latitude, longitude, tile_zoom, tile_size_meters, source)` constraint and add the new integer-keyed one. Backfill `id` IS NOT trivial — see Risk 1.
- The UUIDv5 generator is implemented inline in `SatelliteProvider.Common/Utils/Uuidv5.cs` (RFC 9562 algorithm — 60 lines of C#, MD5 dropped, SHA-1 only). .NET 10 has no native `Guid.CreateVersion5` (only `CreateVersion7`); we do NOT add a 3rd-party dep for this.
## Scope
### Included
- `SatelliteProvider.Common/Utils/Uuidv5.cs` — pure-C# RFC 9562 UUIDv5 implementation, unit-tested against the Python `uuid.uuid5` reference vectors (the onboard side uses Python `uuid.uuid5`; both must produce byte-identical output for the same name + namespace).
- `SatelliteProvider.DataAccess` — Dapper SQL changes: new columns, new UPSERT, new SELECT shapes. `TileRepository.GetByLocationHashAsync` and `TileRepository.InventoryAsync(uuid[])` added; `GetByTileCoordinatesAsync` rewritten to use `location_hash`. Existing `tiles_leaflet_path` covering index added.
- `SatelliteProvider.Services.TileDownloader` — `BuildTileEntity` no longer calls `Guid.NewGuid()`; it computes the UUIDv5 and the `location_hash` from the deterministic inputs. Same change in `UavTileUploadHandler`.
- `SatelliteProvider.Api/Program.cs` — new MapPost route `/api/satellite/tiles/inventory`; existing `/tiles/{z}/{x}/{y}` Leaflet path migrated to use `location_hash`-keyed query against the covering index.
- Migration script in the existing migrations tool (whichever the repo uses — Flyway/EFCore/handwritten SQL; this task uses whatever is already established).
- **On-disk layout migration**: UAV tiles move from `./tiles/uav/{zoom}/{x}/{y}.jpg` to `./tiles/uav/{flight_id}/{zoom}/{x}/{y}.jpg`. Google Maps tiles stay at `./tiles/{zoom}/{x}/{y}/...jpg` (or normalise to `./tiles/google_maps/{zoom}/{x}/{y}.jpg` if the cleanup is cheap). The DB `file_path` column is rewritten in the same backfill that populates `location_hash`/`content_sha256`. Test `SatelliteProvider.Tests/UavTileFilePathTests.cs:23` is updated to assert the new path shape.
- OpenAPI annotations for the new endpoint.
- Unit tests for `Uuidv5` against Python reference vectors.
- Integration tests for the new POST `/api/satellite/tiles/inventory` surface (use existing `docker-compose.tests.yml` fixture).
- Integration test for multi-flight upload — confirms two `source='uav'` rows for the same `(z, x, y)` from different `flight_id`s coexist on disk (different paths) and in DB (different rows, same `location_hash`).
- **Enable HTTP/2 (and HTTP/3 over TLS where feasible)** at the Kestrel endpoint boundary: `EndpointDefaults.Protocols = HttpProtocols.Http1AndHttp2AndHttp3`. Verify the dev `docker-compose` nginx reverse proxy also has `http2 on;` in the relevant `listen` directive. This is the bulk-retrieval mechanism for BOTH Leaflet (browser opens one TCP connection, multiplexes 30+ tile streams, HPACK compresses repeated headers) and UAV provisioning (`httpx.Client(http2=True)` on the onboard side). No application-level batching is added.
- **No materialised `tile_current` pointer table** — deferred until production profiling demands it. Pre-optimisation rejected.
- **No content-addressable / blob storage layout** — `content_sha256` is for dedup *detection* (and integrity), not dedup *storage*. CAS adds complexity without measurable benefit at our scale.
- **No multipart / tar / zip bundle endpoint** for UAV provisioning — rejected in favour of inventory POST + per-tile GET over HTTP/2 multiplex. The bundle approach collapses resume granularity, loses per-tile cacheability, and gives no throughput win over HTTP/2 multistream. PMTiles archive is excellent for STATIC tile sets (Cloudflare/Protomaps) but our DB is dynamic — UAV uploads invalidate any pre-built archive. Defer PMTiles until profiling demands it.
### Excluded
- The voting / trust-promotion layer (Design Task #2 from 2026-05-09 leftover) — separate task. This task makes voting POSSIBLE by keeping per-flight rows; it does NOT implement voting.
- Onboard companion auth (mTLS / signed payloads) — already covered by D-PROJ-2 Design Task #1.
- Renaming the `tile_zoom` column to `zoom_level` (rule: never rename columns without explicit confirmation — see `coderule.mdc`).
- Per-flight key management (already covered by gps-denied-onboard AZ-318).
- Removing the existing `latitude`/`longitude` columns. They stay as advisory center-of-tile data.
## Acceptance Criteria
**AC-1: UUIDv5 reference vectors match Python**
Given the test vector `namespace = TILE_NAMESPACE` and `name = "18/12345/23456/google_maps/00000000-0000-0000-0000-000000000000"`
When `Uuidv5.Create(TILE_NAMESPACE, name)` runs
Then the resulting `Guid` is byte-identical to Python `uuid.uuid5(TILE_NAMESPACE, "18/12345/23456/google_maps/00000000-0000-0000-0000-000000000000")` for ≥10 randomly-generated test cases.
**AC-2: Insert is idempotent on identical inputs**
Given a tile is inserted with `(tile_zoom=18, tile_x=A, tile_y=B, tile_size_meters=S, source='google_maps', flight_id=NULL)` returning `id=X`
When the same insert is repeated
Then exactly ONE row exists in `tiles`; the returned `id == X`; the `id` column is not regenerated; `updated_at` IS refreshed but `created_at` is NOT.
**AC-3: Multi-flight UAV uploads coexist**
Given two `source='uav'` inserts for the same `(tile_zoom, tile_x, tile_y, tile_size_meters)` with `flight_id=F1` and `flight_id=F2` (F1 ≠ F2)
When both inserts complete
Then TWO rows exist in `tiles`; each has its own `id`; both rows share the same `location_hash`.
**AC-4: Float rounding does not break idempotence**
Given an insert with `latitude=47.123456789012345` and another insert recomputed from `tile_center = TileToWorldPos(x, y, z)` (slightly different float representation)
When both inserts target the same `(tile_zoom, tile_x, tile_y, tile_size_meters, source, flight_id)`
Then exactly ONE row results; the conflict triggers despite float differences (because the new UPSERT key does not include `latitude`/`longitude`).
**AC-5: Inventory endpoint returns one entry per requested coord**
Given a POST body of 25 `(z, x, y)` coords at zoom 18, with 12 already in the DB and 13 absent
When `POST /api/satellite/tiles/inventory` is called
Then `results` contains 25 entries in the SAME ORDER as the input; 12 entries have `present=true` with `id`/`location_hash`/`captured_at` populated, 13 entries have `present=false` with `location_hash` populated (computed via UUIDv5) and `id=null`; per-tile `estimated_bytes` is `null|int`.
**AC-6: Leaflet path returns most-recent variant via location_hash**
Given multiple rows for `(z, x, y)` from different sources/flights
When `GET /tiles/{z}/{x}/{y}` is called
Then ONE tile body is returned, selected by `WHERE location_hash = $1 ORDER BY captured_at DESC, updated_at DESC, id DESC LIMIT 1` (semantically identical to AZ-484's prior rule, now using `location_hash`).
**AC-7: content_sha256 is computed and persisted**
Given a UAV upload of a JPEG with known SHA-256
When the insert lands
Then `content_sha256` matches the externally-computed digest; a follow-up insert of a byte-identical body produces the SAME `content_sha256` value.
**AC-8: Migration is reversible (best-effort)**
Given the migration runs forward on a populated `tiles` table
When the back-migration runs
Then the table is restored to the pre-migration shape; data loss is limited to the new columns (`location_hash`, `content_sha256`). (Best-effort because UPSERT key changes are awkward to reverse cleanly.)
**AC-9: Performance — inventory endpoint ≤ 500 ms for 2500 tiles**
Given a POST body listing 2500 `(z, x, y)` coords at zoom 18 against a populated DB (average ~3 versions per cell across `google_maps` + `uav` sources)
When `POST /api/satellite/tiles/inventory` is called
Then the response arrives within 500 ms (95th percentile over 20 calls). Index-only scan via `tiles_leaflet_path` is the expected plan.
**AC-10: Leaflet hot path is index-only**
Given the `tiles_leaflet_path` covering index exists and the table has ≥ 100k rows
When `EXPLAIN (ANALYZE, BUFFERS) SELECT file_path FROM tiles WHERE location_hash = $1 AND voting_status IN ('trusted', NULL) ORDER BY captured_at DESC LIMIT 1` is run
Then the plan is `Index Only Scan using tiles_leaflet_path`; `Heap Fetches = 0` (visibility map fully built); total time < 0.5 ms.
**AC-12: HTTP/2 multiplexed responses**
Given Kestrel is configured with `Http1AndHttp2AndHttp3` (or `Http1AndHttp2` over plain TLS without QUIC support)
When a single `httpx.Client(http2=True)` issues 20 concurrent `GET /tiles/{z}/{x}/{y}` requests
Then the responses arrive over ONE TCP connection (verifiable via packet capture / `httpx.Response.http_version == 'HTTP/2'`); all 20 responses interleave on the wire; total wall-clock time < 2× single-tile latency (vs. 20× for HTTP/1.1 without pipelining); per-tile ETags + `Cache-Control` headers are preserved unchanged.
**AC-11: Per-flight on-disk separation**
Given two UAV uploads of the same `(z, x, y)` from `flight_id=F1` and `flight_id=F2`
When both inserts complete and the backing JPEGs are persisted
Then two distinct files exist at `./tiles/uav/{F1}/{z}/{x}/{y}.jpg` and `./tiles/uav/{F2}/{z}/{x}/{y}.jpg`; `rm -rf ./tiles/uav/{F1}/` removes ONLY Flight F1's evidence (Flight F2's file is untouched); the DB `file_path` columns reflect the per-flight paths.
## Constraints
- **No column renames**: keep `tile_zoom`, `tile_x`, `tile_y`, `latitude`, `longitude` exactly as named today. The onboard side (`AZ-304`) is responsible for matching column names on its own table.
- **UUID namespace MUST be agreed cross-repo**: pick a fixed UUID (e.g. `urn:uuid:5b8d0c2e-...`) and pin it in BOTH `SatelliteProvider.Common/Utils/Uuidv5.cs` and the onboard `gps_denied_onboard/components/c6_tile_cache/_uuid.py`. Document the chosen value in this task's review.
- **No third-party UUIDv5 dependency**: pure-C# RFC 9562 implementation, ≤80 LoC.
- **Migration must run online**: the `tiles` table is the busiest table in the service. Adding the new columns must be a non-blocking `ALTER TABLE` followed by a backfill in batches.
- **Existing AZ-484 selection rule is preserved**: the read-side `ORDER BY captured_at DESC, updated_at DESC, id DESC` tie-break stays; the only change is the WHERE clause uses `location_hash` instead of `(tile_zoom, tile_x, tile_y)`.
## Risks & Mitigation
**Risk 1: Backfilling `id` on existing rows is irreversible**
- *Risk*: existing rows have random `Guid` ids. If we overwrite them with computed UUIDv5 values, any cached external reference to the old id (e.g. operator UI bookmarks, audit log entries) becomes stale.
- *Mitigation*: add a `legacy_id uuid NULL` column populated from the existing `id` before the backfill. The old id is preserved for diagnostic queries. The new UUIDv5 `id` takes over. After a deprecation window (one cycle), drop `legacy_id`.
**Risk 2: UUIDv5 namespace divergence between Python and C#**
- *Risk*: subtle bug in the C# SHA-1 impl produces ids that differ from Python's. Cross-repo lookups fail silently.
- *Mitigation*: AC-1 requires byte-identical output across ≥10 vectors. The vectors are generated by Python and pasted into the C# test fixture as fixed-string expectations.
**Risk 3: Migration deadlock under load**
- *Risk*: `ALTER TABLE ADD COLUMN ... NOT NULL DEFAULT '...'` on a 10M+ row table locks the table for minutes.
- *Mitigation*: 3-step migration — (a) ADD COLUMN nullable; (b) UPDATE in 1000-row batches with `pg_sleep(0.1)` between batches; (c) SET NOT NULL after backfill. Documented in the migration file.
**Risk 4: Onboard `TileDownloader` (AZ-316) writes against this endpoint before it exists**
- *Risk*: ordering — onboard AZ-316 might be implemented before this task lands. Production calls hit 404.
- *Mitigation*: the onboard `TileDownloader` already has a fallback path (per-tile GET via `/tiles/{z}/{x}/{y}`); document that fallback in AZ-316's caveats and gate the `list-only=true` path behind a feature flag `c11.use_bulk_list_endpoint` (default `false` until this satellite-provider task is in production).
## References
- `gps-denied-onboard/_docs/_process_leftovers/2026-05-12_tile-schema-scenario-analysis.md` — origin + scenario analysis.
- `gps-denied-onboard/_docs/_process_leftovers/2026-05-09_satellite-provider-design-tasks.md` — sibling design tasks (inbound ingest, voting layer).
- `gps-denied-onboard/_docs/02_tasks/todo/AZ-304_c6_postgres_schema.md` — onboard counterpart schema.
- `gps-denied-onboard/_docs/02_tasks/todo/AZ-316_c11_tile_downloader.md` — onboard consumer of the bulk-list endpoint.
- `SatelliteProvider.DataAccess/Repositories/TileRepository.cs` — current `Guid.NewGuid()` + float UPSERT.
- `SatelliteProvider.Services.TileDownloader/TileService.cs` — current `BuildTileEntity`.
- `SatelliteProvider.Api/Program.cs` — endpoint surface.
@@ -0,0 +1,94 @@
# Perf script: fix grep | wc -l pipefail crash in PT-08
**Task**: AZ-504_perf_script_grep_pipefail_fix
**Name**: Perf script: fix grep | wc -l pipefail crash in PT-08
**Description**: `scripts/run-performance-tests.sh:416-417` uses `grep -o ... | wc -l` to count `"status":"rejected"` and `"status":"accepted"` markers in the PT-08 batch response. On the happy path (`rejected=0`), `grep -o` exits 1 (no matches), and because the script has `set -o pipefail` + `set -e`, the assignment kills the script silently right after `rejected=0`. The cycle-3 perf-harness leftover (`_docs/_process_leftovers/2026-05-12_perf-cycle3-harness-execution.md`) has reproduced this crash twice (replay #2 + #3, post-AZ-500); PT-01..PT-07 stay green. Until this one-line fix lands, PT-08 cannot be measured against its 2000 ms p95 threshold and the leftover stays open.
**Complexity**: 1 points
**Dependencies**: None
**Component**: scripts/run-performance-tests.sh
**Tracker**: AZ-504
**Epic**: AZ-483 — Multi-source tile storage + UAV upload (Layer 2)
## Problem
`scripts/run-performance-tests.sh` lines 416-417 currently read:
```bash
accepted=$(grep -o '"status":"accepted"' "$PERF_TMP_DIR/pt08_resp.json" | wc -l | tr -d ' ')
rejected=$(grep -o '"status":"rejected"' "$PERF_TMP_DIR/pt08_resp.json" | wc -l | tr -d ' ')
```
With `set -o pipefail` (line 16) and `set -e`, when `grep -o` matches zero times it exits 1; the pipeline returns 1; the assignment kills the script. This is masked when both counts are ≥1 — the bug only surfaces on the happy path where `rejected=0`. Two consecutive full-harness replays (post-AZ-500 .NET 10 migration) reproduced the crash at the exact same line; PT-01..PT-07 are unaffected because they don't pipe potentially-empty `grep` output through `wc -l`.
The leftover documents that the actual perf-relevant data PT-08 captured before crashing was healthy: HTTP 200, batch latency 99 ms (well under the 2000 ms threshold), accepted=2, rejected=0. The harness is buggy, not the production code path.
## Outcome
- `scripts/run-performance-tests.sh` runs PT-08 to completion when `rejected=0` (and when `accepted=0`, defensively).
- A full default-parameter perf run (`PERF_REPEAT_COUNT=20 PERF_UAV_BATCH_SIZE=10`) prints the PT-08 summary line with batch p50/p95, accepted total, rejected total, and exits 0 against an api built from `dev`.
- The cycle-3 perf-harness leftover (`_docs/_process_leftovers/2026-05-12_perf-cycle3-harness-execution.md`) is deleted once a full perf run is green (per AZ-500 Constraint).
## Scope
### Included
- `scripts/run-performance-tests.sh` lines 416-417 replaced with pipefail-tolerant counting (`grep -c ... || true`).
- A short shellcheck pass on the surrounding `pt08_summary` block so no similar empty-match-counting bug remains in the same scope.
- Delete `_docs/_process_leftovers/2026-05-12_perf-cycle3-harness-execution.md` after the green full perf run completes (post-implementation).
### Excluded
- Changes to PT-01..PT-07 scenarios or any production code path — bug is harness-only.
- Adding new perf scenarios — out of scope.
- Renaming or restructuring `scripts/run-performance-tests.sh`.
- Server-side per-call `UavTileQualityGate.Validate` timing instrumentation (already deferred in cycle 3 AZ-492 — out of scope here too).
## Acceptance Criteria
**AC-1: PT-08 completes on zero-rejected response**
Given the api is up against `docker-compose up -d --build` and a UAV batch upload returns `accepted ≥ 1, rejected = 0`
When the harness runs PT-08 batch counting
Then the assignment to `rejected` succeeds (no script exit); `rejected` is `0`; the loop continues to the next iteration.
**AC-2: PT-08 completes on zero-accepted response (defensive)**
Given a UAV batch upload returns `accepted = 0, rejected ≥ 1` (e.g., quality gate rejects every item)
When the harness runs PT-08 batch counting
Then the assignment to `accepted` succeeds (no script exit); `accepted` is `0`; the loop continues.
**AC-3: PT-08 summary line prints in full run**
Given a full default-parameter perf run (`PERF_REPEAT_COUNT=20 PERF_UAV_BATCH_SIZE=10`) against a healthy api
When the harness reaches the PT-08 reporting block
Then a summary line `PT-08 UAV batch upload: PASS p95=Xms / 2000ms (accepted=A, rejected=R, N=20)` is printed; the script exits 0.
**AC-4: Leftover deletion on green full run**
Given AC-1 + AC-2 + AC-3 hold and the full perf run is green
When the implementer verifies the run
Then `_docs/_process_leftovers/2026-05-12_perf-cycle3-harness-execution.md` is deleted in the same commit (per AZ-500 Constraint "leftover file is deleted ONLY when the full perf script runs cleanly").
## Non-Functional Requirements
**Compatibility**
- The fix must work under `bash` (script's `#!/bin/bash`). No POSIX-sh constraint.
- The fix must not silently swallow other errors in the surrounding block — only the empty-match case (`grep` exit 1) is tolerated; any other failure (file unreadable, pipe broken) must still propagate (`coderule.mdc` "never suppress errors silently").
## Constraints
- One-file edit. No new scripts, no helper extraction.
- Preserve `set -o pipefail` and `set -e` semantics globally — only neutralise grep's exit-1-on-no-match locally.
- Do not change PT-08's threshold (2000 ms p95, established by AZ-488).
- Do not rename, move, or restructure `scripts/run-performance-tests.sh`.
## Risks & Mitigation
**Risk 1: `grep -c` counts lines containing the pattern, not occurrences**
- *Risk*: `grep -c` counts matching LINES, not total occurrences. If the PT-08 response packs multiple `"status":"..."` markers on one line (compact JSON), the count diverges from `grep -o | wc -l`.
- *Mitigation*: inspect a sample `pt08_resp.json` from a recent run; if the JSON is one-per-line the `grep -c` swap is exact, otherwise use `grep -o ... | wc -l` wrapped in `|| echo 0` to preserve occurrence-count semantics while neutralising the empty-match exit.
**Risk 2: Pipefail bug exists elsewhere in the same script**
- *Risk*: similar `grep -o ... | wc -l` patterns elsewhere will crash on different happy paths.
- *Mitigation*: in-scope shellcheck pass over the file flags every `grep -o ... | wc -l` / `grep ... | wc -l` site for review; same defensive treatment applied as needed.
## References
- `_docs/_process_leftovers/2026-05-12_perf-cycle3-harness-execution.md` — replay #2 + #3 evidence; one-line fix proposed at lines 56-67 of the leftover.
- `scripts/run-performance-tests.sh:416-417` — exact lines to fix.
- `AZ-488` — PT-08 scenario owner; threshold context.
- `AZ-500` Constraint — "leftover file is deleted ONLY when the full perf script runs cleanly".