Compare commits

4 Commits

Author SHA1 Message Date
Oleksandr Bezdieniezhnykh f6197499a4 chore: update autodev state after AZ-504 batch 1
ci/woodpecker/push/01-test Pipeline was successful
ci/woodpecker/push/02-build-push Pipeline was successful
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 16:33:08 +03:00
Oleksandr Bezdieniezhnykh ab437a15df [AZ-504] Fix grep | wc -l pipefail crash in PT-08 batch counting
scripts/run-performance-tests.sh:416-417 used `grep -o ... | wc -l`
to count `"status":"accepted"` and `"status":"rejected"` markers in
the PT-08 batch response. On the happy path (rejected=0) grep -o
exits 1, and under `set -o pipefail` + `set -e` (line 16) the
pipeline killed the script before reaching any of PT-08's reporting
code — reproducing twice in the cycle-3 perf-harness leftover
(replay #2 + #3 post-AZ-500).

Fix: neutralise grep's no-match exit locally with `|| true` on the
grep stage of each pipeline. `grep -o | wc -l` is kept (not swapped
for `grep -c`) because the PT-08 response is compact JSON — all
items live on one line, so `grep -c` would always return 1 and lose
occurrence-count semantics. An 8-line comment explains why grep
cannot fail for I/O at this code path (file is curl-written, HTTP
200 gated).

AC-1 + AC-2 verified in-place against a standalone harness under
`set -e -o pipefail` (compact-JSON, mixed-status, edge-empty
cases). AC-3 + AC-4 are Step 15 (Performance Test) obligations by
spec design — the leftover deletion (AC-4) is "in the same commit"
as the green full perf run.

Batch report: _docs/03_implementation/batch_01_cycle5_report.md.
Code review: _docs/03_implementation/reviews/batch_01_cycle5_review.md
— PASS, no findings.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 16:32:36 +03:00
Oleksandr Bezdieniezhnykh 8e509b550c [AZ-503] [AZ-504] cycle 5 new-task: tile identity + perf-script-fix
- AZ-503 (3 SP, epic AZ-483) — Tile identity → UUIDv5 deterministic id;
  integer-only UPSERT with COALESCE(flight_id) per-flight separation;
  content_sha256 column; POST /api/satellite/tiles/inventory bulk-list
  endpoint; HTTP/2 at Kestrel edge. Cross-workspace handoff from
  gps-denied-onboard (AZ-304 / AZ-316 counterpart). Supersedes the
  AZ-484 UPSERT-conflict-key portion.

- AZ-504 (1 SP, epic AZ-483) — Fix scripts/run-performance-tests.sh
  lines 416-417: grep -o | wc -l + set -o pipefail kills PT-08 when
  rejected=0. Closes the replay obligation for the cycle-3 perf-harness
  leftover (leftover deletion gated on green full perf run, AC-4).

Updates _dependencies_table.md with cycle 5 entries and records
replay attempt #4 against the perf-cycle3 leftover (PBI opened —
leftover still stays until AZ-504 lands and full perf run is green).

State advanced to Step 10 (Implement).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 16:27:40 +03:00
Oleksandr Bezdieniezhnykh e31f59211d [AZ-500] Cycle 4 Step 17: retrospective + close cycle
Adds retro_2026-05-12_cycle4.md, structure_2026-05-12_cycle4.md, and
the deploy_cycle4.md report that was dropped from the Steps 12-15
sync commit. Appends 3 new lessons to LESSONS.md (12/15 ring buffer)
on transitive major-version bumps, exposed pre-existing bugs, and
single-task-cycle metric framing. State advances to cycle 5 / step 9
(awaiting next New Task invocation).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 06:14:43 +03:00
12 changed files with 813 additions and 8 deletions
+17
@@ -92,6 +92,15 @@ Source: cycle-3 perf-harness leftover replay surfaced the host SDK / project SDK
|------|-------|-----------|--------|--------|
| AZ-500 | .NET 8 → .NET 10 migration (TFM + SDK pin + Docker images + CI images + Microsoft.AspNetCore.* + Microsoft.Extensions.* + Serilog.AspNetCore) | — | 5 | Done (In Testing) |
### Step 9 cycle 5 — New Task: Tile identity foundation + perf-harness fix (AZ-483 epic)
Source: cross-workspace handoff from `gps-denied-onboard` (tile-schema scenario analysis) for AZ-503; cycle-3 perf-harness leftover replay-obligation closure for AZ-504. Both attach to epic AZ-483 (Multi-source tile storage + UAV upload, Layer 2) — AZ-503 supersedes the AZ-484 UPSERT-conflict-key portion, AZ-504 unblocks PT-08 measurement.
| Task | Title | Depends On | Points | Status |
|------|-------|-----------|--------|--------|
| AZ-503 | Tile identity → UUIDv5 + integer UPSERT + bulk-list endpoint | AZ-484 (supersedes UPSERT-conflict-key portion of AZ-484 selection rule) | 3 | To Do |
| AZ-504 | Perf script: fix grep \| wc -l pipefail crash in PT-08 | — (independent; references AZ-488 PT-08 threshold) | 1 | To Do |
## Execution Order
### Step 6
@@ -135,6 +144,13 @@ Single task; coordinated cross-cutting bump.
1. AZ-500 (5 SP) — .NET 8 → .NET 10 migration. Self-contained but touches every csproj, both Dockerfiles, run-tests.sh, .woodpecker/01-test.yml, global.json, and two docs files.
### Step 9 cycle 5
Independent tracks — both can run in parallel; no ordering constraint between them. AZ-504 is a prerequisite for the cycle's Step 15 Performance Test to deliver a green PT-08 reading (and therefore for deleting the perf-cycle3 leftover); AZ-503 is the cycle's main feature.
1. AZ-504 (1 SP) — cheapest unblocker; lands first to clear PT-08 reporting for the cycle.
2. AZ-503 (3 SP) — main feature; data-model + API; cross-workspace alignment with `gps-denied-onboard` AZ-304 / AZ-316.
## Total Effort
Step 6: 6 tasks, 17 story points
@@ -144,6 +160,7 @@ Step 9 cycle 1: 1 task created (AZ-484, 5 pts)
Step 9 cycle 2: 2 tasks created (AZ-487 = 2 pts, AZ-488 = 8 pts over-cap user-accepted) — total 10 pts
Step 9 cycle 3: 6 tasks created (AZ-491 = 3 pts, AZ-492 = 3 pts, AZ-493 = 2 pts, AZ-494 = 2 pts, AZ-495 = 1 pt, AZ-496 = 2 pts) — total 13 pts
Step 9 cycle 4: 1 task created (AZ-500 = 5 pts)
Step 9 cycle 5: 2 tasks created (AZ-503 = 3 pts, AZ-504 = 1 pt) — total 4 pts
## Coverage Verification
@@ -0,0 +1,94 @@
# Perf script: fix grep | wc -l pipefail crash in PT-08
**Task**: AZ-504_perf_script_grep_pipefail_fix
**Name**: Perf script: fix grep | wc -l pipefail crash in PT-08
**Description**: `scripts/run-performance-tests.sh:416-417` uses `grep -o ... | wc -l` to count `"status":"rejected"` and `"status":"accepted"` markers in the PT-08 batch response. On the happy path (`rejected=0`), `grep -o` exits 1 (no matches), and because the script has `set -o pipefail` + `set -e`, the assignment kills the script silently right after `rejected=0`. The cycle-3 perf-harness leftover (`_docs/_process_leftovers/2026-05-12_perf-cycle3-harness-execution.md`) has reproduced this crash twice (replay #2 + #3, post-AZ-500); PT-01..PT-07 stay green. Until this one-line fix lands, PT-08 cannot be measured against its 2000 ms p95 threshold and the leftover stays open.
**Complexity**: 1 point
**Dependencies**: None
**Component**: scripts/run-performance-tests.sh
**Tracker**: AZ-504
**Epic**: AZ-483 — Multi-source tile storage + UAV upload (Layer 2)
## Problem
`scripts/run-performance-tests.sh` lines 416-417 currently read:
```bash
accepted=$(grep -o '"status":"accepted"' "$PERF_TMP_DIR/pt08_resp.json" | wc -l | tr -d ' ')
rejected=$(grep -o '"status":"rejected"' "$PERF_TMP_DIR/pt08_resp.json" | wc -l | tr -d ' ')
```
With `set -o pipefail` (line 16) and `set -e`, when `grep -o` matches zero times it exits 1; the pipeline returns 1; the assignment kills the script. This is masked when both counts are ≥1 — the bug only surfaces on the happy path where `rejected=0`. Two consecutive full-harness replays (post-AZ-500 .NET 10 migration) reproduced the crash at the exact same line; PT-01..PT-07 are unaffected because they don't pipe potentially-empty `grep` output through `wc -l`.
The leftover documents that the actual perf-relevant data PT-08 captured before crashing was healthy: HTTP 200, batch latency 99 ms (well under the 2000 ms threshold), accepted=2, rejected=0. The harness is buggy, not the production code path.
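The failure mode described above can be reproduced outside the harness. What follows is a minimal sketch, not the project's verification harness: it assumes `bash` and `grep` are on `PATH`, uses `/dev/null` as a stand-in for `pt08_resp.json` so that `grep -o` is guaranteed to match nothing, and the `|| true` grouping is only one possible spelling of a grep-stage guard.

```python
import subprocess

# Stand-in for the original counting line; /dev/null guarantees zero matches.
CRASHING = r"""
set -e -o pipefail
rejected=$(grep -o '"status":"rejected"' /dev/null | wc -l | tr -d ' ')
echo "rejected=$rejected"
"""

# Same pipeline, with grep's exit status neutralised on the grep stage only.
TOLERANT = r"""
set -e -o pipefail
rejected=$( { grep -o '"status":"rejected"' /dev/null || true; } | wc -l | tr -d ' ' )
echo "rejected=$rejected"
"""


def run_bash(script: str) -> subprocess.CompletedProcess:
    """Run a bash snippet and capture its exit status and output."""
    return subprocess.run(["bash", "-c", script], capture_output=True, text=True)


crash = run_bash(CRASHING)
ok = run_bash(TOLERANT)
print("original:", crash.returncode, repr(crash.stdout))
print("guarded: ", ok.returncode, repr(ok.stdout))
```

The first snippet should exit non-zero before reaching `echo`, while the second should print `rejected=0` and exit 0.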
## Outcome
- `scripts/run-performance-tests.sh` runs PT-08 to completion when `rejected=0` (and when `accepted=0`, defensively).
- A full default-parameter perf run (`PERF_REPEAT_COUNT=20 PERF_UAV_BATCH_SIZE=10`) prints the PT-08 summary line with batch p50/p95, accepted total, rejected total, and exits 0 against an api built from `dev`.
- The cycle-3 perf-harness leftover (`_docs/_process_leftovers/2026-05-12_perf-cycle3-harness-execution.md`) is deleted once a full perf run is green (per AZ-500 Constraint).
## Scope
### Included
- `scripts/run-performance-tests.sh` lines 416-417 replaced with pipefail-tolerant counting (`grep -c ... || true`).
- A short shellcheck pass on the surrounding `pt08_summary` block so no similar empty-match-counting bug remains in the same scope.
- Delete `_docs/_process_leftovers/2026-05-12_perf-cycle3-harness-execution.md` after the green full perf run completes (post-implementation).
### Excluded
- Changes to PT-01..PT-07 scenarios or any production code path — bug is harness-only.
- Adding new perf scenarios — out of scope.
- Renaming or restructuring `scripts/run-performance-tests.sh`.
- Server-side per-call `UavTileQualityGate.Validate` timing instrumentation (already deferred in cycle 3 AZ-492 — out of scope here too).
## Acceptance Criteria
**AC-1: PT-08 completes on zero-rejected response**
Given the api is up against `docker-compose up -d --build` and a UAV batch upload returns `accepted ≥ 1, rejected = 0`
When the harness runs PT-08 batch counting
Then the assignment to `rejected` succeeds (no script exit); `rejected` is `0`; the loop continues to the next iteration.
**AC-2: PT-08 completes on zero-accepted response (defensive)**
Given a UAV batch upload returns `accepted = 0, rejected ≥ 1` (e.g., quality gate rejects every item)
When the harness runs PT-08 batch counting
Then the assignment to `accepted` succeeds (no script exit); `accepted` is `0`; the loop continues.
**AC-3: PT-08 summary line prints in full run**
Given a full default-parameter perf run (`PERF_REPEAT_COUNT=20 PERF_UAV_BATCH_SIZE=10`) against a healthy api
When the harness reaches the PT-08 reporting block
Then a summary line `PT-08 UAV batch upload: PASS p95=Xms / 2000ms (accepted=A, rejected=R, N=20)` is printed; the script exits 0.
**AC-4: Leftover deletion on green full run**
Given AC-1 + AC-2 + AC-3 hold and the full perf run is green
When the implementer verifies the run
Then `_docs/_process_leftovers/2026-05-12_perf-cycle3-harness-execution.md` is deleted in the same commit (per AZ-500 Constraint "leftover file is deleted ONLY when the full perf script runs cleanly").
## Non-Functional Requirements
**Compatibility**
- The fix must work under `bash` (script's `#!/bin/bash`). No POSIX-sh constraint.
- The fix must not silently swallow other errors in the surrounding block — only the empty-match case (`grep` exit 1) is tolerated; any other failure (file unreadable, pipe broken) must still propagate (`coderule.mdc` "never suppress errors silently").
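One way to satisfy the "only exit 1 is tolerated" requirement is to test grep's status explicitly instead of discarding it. The sketch below is an illustration under assumptions, not the change that was committed: `count_markers`, `RESP`, and the sample payload are hypothetical names, and `bash` plus `grep` are assumed to be available.

```python
import os
import subprocess
import tempfile

STRICT_COUNT = r"""
set -e -o pipefail
count_markers() {
  # grep exit 1 (no match) is mapped to success; exit 2 (I/O error) is not.
  { grep -o "$1" "$2" || [ "$?" -eq 1 ]; } | wc -l | tr -d ' '
}
rejected=$(count_markers '"status":"rejected"' "$RESP")
echo "rejected=$rejected"
"""


def run_with_response(path: str) -> subprocess.CompletedProcess:
    """Run the counting snippet against the given response file path."""
    env = dict(os.environ, RESP=path)
    return subprocess.run(
        ["bash", "-c", STRICT_COUNT], env=env, capture_output=True, text=True
    )


with tempfile.TemporaryDirectory() as tmp:
    resp = os.path.join(tmp, "resp.json")
    with open(resp, "w") as fh:
        # Hypothetical compact payload with no rejected items.
        fh.write('{"items":[{"status":"accepted"},{"status":"accepted"}]}')
    no_match = run_with_response(resp)
    unreadable = run_with_response(os.path.join(tmp, "missing.json"))

print("no match:  ", no_match.returncode, no_match.stdout.strip())
print("unreadable:", unreadable.returncode)
```

With this shape a missing or unreadable file still aborts the script, whereas a plain `|| true` would report a count of 0 in that case as well.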
## Constraints
- One-file edit. No new scripts, no helper extraction.
- Preserve `set -o pipefail` and `set -e` semantics globally — only neutralise grep's exit-1-on-no-match locally.
- Do not change PT-08's threshold (2000 ms p95, established by AZ-488).
- Do not rename, move, or restructure `scripts/run-performance-tests.sh`.
## Risks & Mitigation
**Risk 1: `grep -c` counts lines containing the pattern, not occurrences**
- *Risk*: `grep -c` counts matching LINES, not total occurrences. If the PT-08 response packs multiple `"status":"..."` markers on one line (compact JSON), the count diverges from `grep -o | wc -l`.
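- *Mitigation*: inspect a sample `pt08_resp.json` from a recent run; if the JSON is one-per-line the `grep -c` swap is exact, otherwise keep `grep -o ... | wc -l` and neutralise only the grep stage (for example with `|| true` on `grep`) to preserve occurrence-count semantics. Appending `|| echo 0` to the whole pipeline is not safe: `wc -l` has already printed `0` by the time the pipeline fails, so the substitution would capture two lines.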
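The difference between the two counting semantics in Risk 1 can be seen on a small example. This is a sketch only: the payload shape (an `items` array) is an assumption, since the real PT-08 response is not shown here, and the two helper functions merely imitate what `grep -c` and `grep -o | wc -l` would report.

```python
# Hypothetical compact PT-08 response; the real payload shape is an assumption.
COMPACT = '{"items":[{"status":"accepted"},{"status":"accepted"},{"status":"rejected"}]}'
MARKER = '"status":"accepted"'


def count_matching_lines(text: str, marker: str) -> int:
    """grep -c semantics: number of lines that contain the marker."""
    return sum(1 for line in text.splitlines() if marker in line)


def count_occurrences(text: str, marker: str) -> int:
    """grep -o | wc -l semantics: number of non-overlapping matches."""
    return text.count(marker)


# The same payload with one item per line.
one_per_line = COMPACT.replace("},{", "},\n{")

print(count_matching_lines(COMPACT, MARKER), count_occurrences(COMPACT, MARKER))            # 1 2
print(count_matching_lines(one_per_line, MARKER), count_occurrences(one_per_line, MARKER))  # 2 2
```

On the compact form the line count collapses to 1 regardless of how many items were accepted, which is why the swap to `grep -c` is only exact for one-item-per-line output.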
**Risk 2: Pipefail bug exists elsewhere in the same script**
- *Risk*: similar `grep -o ... | wc -l` patterns elsewhere will crash on different happy paths.
- *Mitigation*: in-scope shellcheck pass over the file flags every `grep -o ... | wc -l` / `grep ... | wc -l` site for review; same defensive treatment applied as needed.
## References
- `_docs/_process_leftovers/2026-05-12_perf-cycle3-harness-execution.md` — replay #2 + #3 evidence; one-line fix proposed at lines 56-67 of the leftover.
- `scripts/run-performance-tests.sh:416-417` — exact lines to fix.
- `AZ-488` — PT-08 scenario owner; threshold context.
- `AZ-500` Constraint — "leftover file is deleted ONLY when the full perf script runs cleanly".
@@ -0,0 +1,206 @@
# Tile identity → UUIDv5 + integer UPSERT + bulk-list endpoint
**Task**: AZ-503_tile_identity_uuidv5_bulk_list
**Name**: Tile identity → UUIDv5 + integer UPSERT + bulk-list endpoint
**Description**: Tile identity in the `tiles` table is currently random (`Guid.NewGuid()`), and the UPSERT conflict key uses `double precision` `latitude`/`longitude` and omits `flight_id`, which (a) makes idempotent re-insert fragile against float rounding and (b) destroys per-flight evidence required by the D-PROJ-2 multi-flight voting layer when two UAVs upload the same `(z, x, y)` cell. This task migrates tile identity to deterministic UUIDv5 (`id = uuidv5(NAMESPACE, "{z}/{x}/{y}/{source}/{flight_id or '00000000-0000-0000-0000-000000000000'}")`), adds a `location_hash` UUIDv5 (`uuidv5(..., "{z}/{x}/{y}")`) for efficient cell-bag queries (UI Leaflet path + future voting), switches the UPSERT conflict key to integer-only `(zoom_level, tile_x, tile_y, tile_size_meters, source, COALESCE(flight_id, '00000000-0000-0000-0000-000000000000'::uuid))`, adds a `content_sha256 bytea NOT NULL` column for content-addressable dedup, and adds the `POST /api/satellite/tiles/inventory` endpoint that the onboard `TileDownloader` (`gps-denied-onboard` AZ-316) needs for bbox→tile enumeration during pre-flight provisioning.
**Complexity**: 3 points
**Dependencies**: AZ-484 (UPSERT-per-source + AZ-484 selection rule — done; this task supersedes the UPSERT conflict-key portion)
**Component**: SatelliteProvider.DataAccess + SatelliteProvider.Services.TileDownloader + SatelliteProvider.Api
**Tracker**: AZ-503
**Epic**: AZ-483 — Multi-source tile storage + UAV upload (Layer 2)
## Origin
Cross-workspace surface from `gps-denied-onboard` `_docs/_process_leftovers/2026-05-12_tile-schema-scenario-analysis.md`. The onboard repo's `AZ-304` C6 Postgres schema is being designed with `location_hash` + `content_sha256` columns and a deterministic `id`; this satellite-provider task is the parent-suite counterpart so both sides of the wire agree on tile identity semantics.
Related: `gps-denied-onboard` `_docs/_process_leftovers/2026-05-09_satellite-provider-design-tasks.md` Design Task #2 (multi-flight trust / voting layer) is the downstream consumer of this task's `flight_id`-aware UPSERT and `content_sha256`.
## Problem
Four concrete issues in the current code:
1. **Random tile id**: `SatelliteProvider.Services.TileDownloader/TileService.cs:149` and `UavTileUploadHandler.cs:160` use `Id = Guid.NewGuid()`. The `id` is opaque, non-deterministic, and useless as a content/location handle. Onboard cannot pre-compute or compare ids before round-tripping to the DB.
2. **Float-based UPSERT collapses multi-flight evidence**: `TileRepository.InsertAsync` line 124: `ON CONFLICT (latitude, longitude, tile_zoom, tile_size_meters, source) DO UPDATE SET file_path = EXCLUDED.file_path, ...`. Problems:
- `latitude` / `longitude` are `double precision`; Postgres conflict detection requires bit-identical floats. Re-uploads of the same tile computed from independently-rounded center coords can miss the conflict and create duplicate rows. AZ-484's `DISTINCT ON` read-side fix papers over the duplicate but does not prevent it.
- The conflict key omits `flight_id`. When two flights upload `source='uav'` for the same cell, `DO UPDATE` overwrites Flight A with Flight B. **D-PROJ-2 voting (Design Task #2 from the 2026-05-09 leftover) needs both rows alive.**
3. **No bulk-list endpoint for pre-flight provisioning** — onboard `TileDownloader` (`gps-denied-onboard` AZ-316) calls `GET /api/satellite/tiles?bbox=...&zoom=...&list-only=true` to size and enumerate a pre-flight cache build. **This endpoint does not exist** in `SatelliteProvider.Api/Program.cs`. The closest is `GetTilesByRegionAsync` (private, lat/lon-meters input, no HTTP surface) and `GET /api/satellite/tiles/latlon` (single tile). Operators today cannot pre-size a cache build over the bbox the mission planner produces.
4. **No content digest** — there is no `content_sha256` column. Same JPEG re-uploaded under a different `source` or `flight_id` is indistinguishable from a re-encode.
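For item 3, the bbox-to-tile enumeration that a pre-flight cache build needs can be sketched with the standard Web Mercator slippy-map formula. This is an illustration under assumptions: `slippy_tile_xy` and `tiles_in_bbox` are hypothetical names, and nothing here is taken from the actual `wgs_converter.py` or `GeoUtils.WorldToTilePos` code.

```python
import math


def slippy_tile_xy(lat_deg: float, lon_deg: float, zoom: int) -> tuple[int, int]:
    """Standard slippy-map tile indices for a WGS84 point (Web Mercator)."""
    n = 1 << zoom  # number of tiles per axis at this zoom level
    lat_rad = math.radians(lat_deg)
    x = math.floor((lon_deg + 180.0) / 360.0 * n)
    y = math.floor((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the east/south edge stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_in_bbox(south: float, west: float, north: float, east: float,
                  zoom: int) -> list[tuple[int, int, int]]:
    """Enumerate (z, x, y) for every tile touching the bounding box."""
    x_min, y_min = slippy_tile_xy(north, west, zoom)  # tile y grows southwards
    x_max, y_max = slippy_tile_xy(south, east, zoom)
    return [(zoom, x, y)
            for x in range(x_min, x_max + 1)
            for y in range(y_min, y_max + 1)]


cells = tiles_in_bbox(south=-1.0, west=-1.0, north=1.0, east=1.0, zoom=1)
print(len(cells), cells)
```

A list like `cells` is the kind of `(z, x, y)` enumeration the onboard side would need in order to size a cache build; the sketch does not handle boxes that cross the antimeridian.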
## Outcome
- Tile id becomes deterministic: `id = uuidv5(TILE_NAMESPACE, "{z}/{x}/{y}/{source}/{flight_id or '00000000-0000-0000-0000-000000000000'}")`. Same inputs always produce the same id; idempotent inserts no longer require a "did this row exist?" pre-check.
- `location_hash uuid NOT NULL` column: `uuidv5(TILE_NAMESPACE, "{z}/{x}/{y}")`. Drives Scenario 1 (UI Leaflet `/tiles/{z}/{x}/{y}`) lookup as a single hash-index probe, and Scenario 6 (voting) as a single cell-bag fetch.
- `content_sha256 bytea NOT NULL` column: SHA-256 of the JPEG body, computed at insert time. Enables dedup detection ("Flight B uploaded a byte-identical tile to Flight A — flag for inspection") and integrity checks on the read path.
- UPSERT conflict key becomes:
```
ON CONFLICT (zoom_level, tile_x, tile_y, tile_size_meters, source,
COALESCE(flight_id, '00000000-0000-0000-0000-000000000000'::uuid))
DO UPDATE SET file_path = EXCLUDED.file_path, captured_at = EXCLUDED.captured_at, updated_at = EXCLUDED.updated_at
```
Integer-only equality, per-flight separation. (Note: requires `zoom_level` column — the current schema uses `tile_zoom`; either rename or use `tile_zoom` consistently — this task keeps the existing column name `tile_zoom` and adjusts the onboard-side spec to match.)
- **New endpoint: `POST /api/satellite/tiles/inventory`** (preferred over the originally-proposed `GET /api/satellite/tiles?bbox=...`). Body is a list of `(z, x, y)` coords (or pre-computed `location_hash` UUIDs); response is one entry per input. Justification: the onboard side already has the deterministic slippy-tile math (`helpers/wgs_converter.py.latlon_to_tile_xy`, identical to C# `GeoUtils.WorldToTilePos`); making the server re-enumerate the bbox is wasted work. POST inventory is also the natural batched-existence-check shape — single round-trip, indexed lookup per row.
```
POST /api/satellite/tiles/inventory
Body: { "tiles": [ { "z": 18, "x": 12345, "y": 23456 }, ... ] }
// OR { "location_hashes": [ "uuid-v5", ... ] }
Response: {
"results": [
{
"tile_x": 12345, "tile_y": 23456, "tile_zoom": 18, "location_hash": "uuid-v5...",
"present": true,
"id": "uuid-v5...", "captured_at": "...", "resolution_m_per_px": 0.3,
"estimated_bytes": 42017, "source": "google_maps", "flight_id": null
},
{ "tile_x": 12346, "tile_y": 23456, "tile_zoom": 18, "location_hash": "uuid-v5...", "present": false }
]
}
```
Server-side query (single round-trip):
```sql
SELECT DISTINCT ON (location_hash)
location_hash, id, captured_at, resolution_m_per_px, estimated_bytes, source, flight_id
FROM tiles
WHERE location_hash = ANY($1::uuid[])
AND (voting_status = 'trusted' OR voting_status IS NULL)
ORDER BY location_hash, captured_at DESC, updated_at DESC, id DESC;
```
- Required covering index for Leaflet hot path:
```sql
CREATE INDEX tiles_leaflet_path
ON tiles (location_hash, captured_at DESC, updated_at DESC, id DESC)
INCLUDE (file_path, content_type, etag, voting_status);
```
Leaflet `/tiles/{z}/{x}/{y}` becomes an index-only scan (no heap fetch when `voting_status='trusted'`):
```sql
SELECT file_path FROM tiles
WHERE location_hash = $1 AND (voting_status = 'trusted' OR voting_status IS NULL)
ORDER BY captured_at DESC, updated_at DESC, id DESC LIMIT 1;
```
- Migration: additive. Add `location_hash` and `content_sha256` as nullable, backfill in batches (see Risk 3 and the online-migration constraint), then set `NOT NULL`. Drop the old `UNIQUE (latitude, longitude, tile_zoom, tile_size_meters, source)` constraint and add the new integer-keyed one. Backfilling `id` is NOT trivial; see Risk 1.
- The UUIDv5 generator is implemented inline in `SatelliteProvider.Common/Utils/Uuidv5.cs` (RFC 9562 algorithm — 60 lines of C#, MD5 dropped, SHA-1 only). .NET 10 has no native `Guid.CreateVersion5` (only `CreateVersion7`); we do NOT add a 3rd-party dep for this.
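The identity scheme in this section can be sketched on the Python side with the standard library. This is a minimal sketch under assumptions: the `TILE_NAMESPACE` value below is a hypothetical placeholder (the real namespace must be agreed cross-repo), and lower-case canonical formatting of `flight_id` is assumed for the name string.

```python
import uuid
from typing import Optional

# HYPOTHETICAL namespace for illustration only; the real value is pinned cross-repo.
TILE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.invalid/tile-namespace")
NIL_FLIGHT = "00000000-0000-0000-0000-000000000000"


def location_hash(z: int, x: int, y: int) -> uuid.UUID:
    """Cell-level handle: identical for every source/flight covering (z, x, y)."""
    return uuid.uuid5(TILE_NAMESPACE, f"{z}/{x}/{y}")


def tile_id(z: int, x: int, y: int, source: str,
            flight_id: Optional[uuid.UUID] = None) -> uuid.UUID:
    """Row-level id: distinct per source and per flight."""
    flight = str(flight_id) if flight_id is not None else NIL_FLIGHT
    return uuid.uuid5(TILE_NAMESPACE, f"{z}/{x}/{y}/{source}/{flight}")


f1 = uuid.UUID("11111111-1111-4111-8111-111111111111")  # made-up flight ids
f2 = uuid.UUID("22222222-2222-4222-8222-222222222222")

id_a = tile_id(18, 12345, 23456, "uav", f1)
id_b = tile_id(18, 12345, 23456, "uav", f2)
print(id_a != id_b, location_hash(18, 12345, 23456))
```

Two flights over the same cell get different `id` values but share one `location_hash`, which is the property the conflict key and the cell-bag queries rely on.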
## Scope
### Included
- `SatelliteProvider.Common/Utils/Uuidv5.cs` — pure-C# RFC 9562 UUIDv5 implementation, unit-tested against the Python `uuid.uuid5` reference vectors (the onboard side uses Python `uuid.uuid5`; both must produce byte-identical output for the same name + namespace).
- `SatelliteProvider.DataAccess`: Dapper SQL changes (new columns, new UPSERT, new SELECT shapes). `TileRepository.GetByLocationHashAsync` and `TileRepository.InventoryAsync(uuid[])` added; `GetByTileCoordinatesAsync` rewritten to use `location_hash`. New `tiles_leaflet_path` covering index added.
- `SatelliteProvider.Services.TileDownloader` — `BuildTileEntity` no longer calls `Guid.NewGuid()`; it computes the UUIDv5 and the `location_hash` from the deterministic inputs. Same change in `UavTileUploadHandler`.
- `SatelliteProvider.Api/Program.cs` — new MapPost route `/api/satellite/tiles/inventory`; existing `/tiles/{z}/{x}/{y}` Leaflet path migrated to use `location_hash`-keyed query against the covering index.
- Migration script in the existing migrations tool (whichever the repo uses — Flyway/EFCore/handwritten SQL; this task uses whatever is already established).
- **On-disk layout migration**: UAV tiles move from `./tiles/uav/{zoom}/{x}/{y}.jpg` to `./tiles/uav/{flight_id}/{zoom}/{x}/{y}.jpg`. Google Maps tiles stay at `./tiles/{zoom}/{x}/{y}/...jpg` (or normalise to `./tiles/google_maps/{zoom}/{x}/{y}.jpg` if the cleanup is cheap). The DB `file_path` column is rewritten in the same backfill that populates `location_hash`/`content_sha256`. Test `SatelliteProvider.Tests/UavTileFilePathTests.cs:23` is updated to assert the new path shape.
- OpenAPI annotations for the new endpoint.
- Unit tests for `Uuidv5` against Python reference vectors.
- Integration tests for the new POST `/api/satellite/tiles/inventory` surface (use existing `docker-compose.tests.yml` fixture).
- Integration test for multi-flight upload — confirms two `source='uav'` rows for the same `(z, x, y)` from different `flight_id`s coexist on disk (different paths) and in DB (different rows, same `location_hash`).
- **Enable HTTP/2 (and HTTP/3 over TLS where feasible)** at the Kestrel endpoint boundary: `EndpointDefaults.Protocols = HttpProtocols.Http1AndHttp2AndHttp3`. Verify the dev `docker-compose` nginx reverse proxy also has `http2 on;` in the relevant `server` block (nginx 1.25.1 and later; older versions take an `http2` parameter on the `listen` directive instead). This is the bulk-retrieval mechanism for BOTH Leaflet (browser opens one TCP connection, multiplexes 30+ tile streams, HPACK compresses repeated headers) and UAV provisioning (`httpx.Client(http2=True)` on the onboard side). No application-level batching is added.
- **No materialised `tile_current` pointer table** — deferred until production profiling demands it. Pre-optimisation rejected.
- **No content-addressable / blob storage layout** — `content_sha256` is for dedup *detection* (and integrity), not dedup *storage*. CAS adds complexity without measurable benefit at our scale.
- **No multipart / tar / zip bundle endpoint** for UAV provisioning — rejected in favour of inventory POST + per-tile GET over HTTP/2 multiplex. The bundle approach collapses resume granularity, loses per-tile cacheability, and gives no throughput win over HTTP/2 multistream. PMTiles archive is excellent for STATIC tile sets (Cloudflare/Protomaps) but our DB is dynamic — UAV uploads invalidate any pre-built archive. Defer PMTiles until profiling demands it.
### Excluded
- The voting / trust-promotion layer (Design Task #2 from 2026-05-09 leftover) — separate task. This task makes voting POSSIBLE by keeping per-flight rows; it does NOT implement voting.
- Onboard companion auth (mTLS / signed payloads) — already covered by D-PROJ-2 Design Task #1.
- Renaming the `tile_zoom` column to `zoom_level` (rule: never rename columns without explicit confirmation — see `coderule.mdc`).
- Per-flight key management (already covered by gps-denied-onboard AZ-318).
- Removing the existing `latitude`/`longitude` columns. They stay as advisory center-of-tile data.
## Acceptance Criteria
**AC-1: UUIDv5 reference vectors match Python**
Given the test vector `namespace = TILE_NAMESPACE` and `name = "18/12345/23456/google_maps/00000000-0000-0000-0000-000000000000"`
When `Uuidv5.Create(TILE_NAMESPACE, name)` runs
Then the resulting `Guid` is byte-identical to Python `uuid.uuid5(TILE_NAMESPACE, "18/12345/23456/google_maps/00000000-0000-0000-0000-000000000000")`, and the same holds for ≥10 further randomly-generated test cases.
**AC-2: Insert is idempotent on identical inputs**
Given a tile is inserted with `(tile_zoom=18, tile_x=A, tile_y=B, tile_size_meters=S, source='google_maps', flight_id=NULL)` returning `id=X`
When the same insert is repeated
Then exactly ONE row exists in `tiles`; the returned `id == X`; the `id` column is not regenerated; `updated_at` IS refreshed but `created_at` is NOT.
**AC-3: Multi-flight UAV uploads coexist**
Given two `source='uav'` inserts for the same `(tile_zoom, tile_x, tile_y, tile_size_meters)` with `flight_id=F1` and `flight_id=F2` (F1 ≠ F2)
When both inserts complete
Then TWO rows exist in `tiles`; each has its own `id`; both rows share the same `location_hash`.
**AC-4: Float rounding does not break idempotence**
Given an insert with `latitude=47.123456789012345` and another insert recomputed from `tile_center = TileToWorldPos(x, y, z)` (slightly different float representation)
When both inserts target the same `(tile_zoom, tile_x, tile_y, tile_size_meters, source, flight_id)`
Then exactly ONE row results; the conflict triggers despite float differences (because the new UPSERT key does not include `latitude`/`longitude`).
**AC-5: Inventory endpoint returns one entry per requested coord**
Given a POST body of 25 `(z, x, y)` coords at zoom 18, with 12 already in the DB and 13 absent
When `POST /api/satellite/tiles/inventory` is called
Then `results` contains 25 entries in the SAME ORDER as the input; 12 entries have `present=true` with `id`/`location_hash`/`captured_at` populated, 13 entries have `present=false` with `location_hash` populated (computed via UUIDv5) and `id=null`; per-tile `estimated_bytes` is `null|int`.
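The ordering and `present` semantics of AC-5 can be sketched as a pure function over the rows a lookup returns. Everything here is an assumption made for illustration: `build_inventory`, the in-memory `rows_by_hash` mapping, and the placeholder namespace are hypothetical, and the real handler is C#.

```python
import uuid

# HYPOTHETICAL namespace; the real value is agreed cross-repo.
TILE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.invalid/tile-namespace")


def location_hash(z: int, x: int, y: int) -> uuid.UUID:
    return uuid.uuid5(TILE_NAMESPACE, f"{z}/{x}/{y}")


def build_inventory(requested, rows_by_hash):
    """Return one result per requested (z, x, y), in input order."""
    results = []
    for z, x, y in requested:
        h = location_hash(z, x, y)
        row = rows_by_hash.get(h)  # None when the cell is absent from the DB
        entry = {"tile_x": x, "tile_y": y, "tile_zoom": z,
                 "location_hash": str(h), "present": row is not None,
                 "id": None}
        if row is not None:
            entry.update(id=str(row["id"]), captured_at=row["captured_at"],
                         source=row["source"], flight_id=row["flight_id"],
                         estimated_bytes=row.get("estimated_bytes"))
        results.append(entry)
    return results


requested = [(18, 12345, 23456), (18, 12346, 23456)]
known = {location_hash(18, 12345, 23456): {
    "id": uuid.uuid5(TILE_NAMESPACE, "demo-row"), "captured_at": "2026-05-12T00:00:00Z",
    "source": "google_maps", "flight_id": None, "estimated_bytes": 42017}}
results = build_inventory(requested, known)
print([r["present"] for r in results])
```

Because absent cells still carry a computed `location_hash`, the caller can key its own bookkeeping on the hash without a second round-trip.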
**AC-6: Leaflet path returns most-recent variant via location_hash**
Given multiple rows for `(z, x, y)` from different sources/flights
When `GET /tiles/{z}/{x}/{y}` is called
Then ONE tile body is returned, selected by `WHERE location_hash = $1 ORDER BY captured_at DESC, updated_at DESC, id DESC LIMIT 1` (semantically identical to AZ-484's prior rule, now using `location_hash`).
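The tie-break in AC-6 amounts to taking the maximum under a three-part key. The sketch below mirrors that ordering in memory; `select_current` and the sample rows are hypothetical, and it assumes that comparing `uuid.UUID` values in Python orders them the same way the database orders its `uuid` column.

```python
import uuid
from datetime import datetime, timezone


def select_current(rows):
    """Mirror ORDER BY captured_at DESC, updated_at DESC, id DESC LIMIT 1."""
    return max(rows,
               key=lambda r: (r["captured_at"], r["updated_at"], r["id"]),
               default=None)


t1 = datetime(2026, 5, 1, tzinfo=timezone.utc)
t2 = datetime(2026, 5, 2, tzinfo=timezone.utc)

rows = [
    {"id": uuid.UUID(int=1), "source": "google_maps", "captured_at": t1, "updated_at": t2},
    {"id": uuid.UUID(int=2), "source": "uav", "captured_at": t2, "updated_at": t1},
    {"id": uuid.UUID(int=3), "source": "uav", "captured_at": t2, "updated_at": t2},
]

winner = select_current(rows)
print(winner["id"], winner["source"])
```

The row with the latest `captured_at` wins, `updated_at` breaks ties between equally recent captures, and `id` makes the result deterministic when both timestamps are equal.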
**AC-7: content_sha256 is computed and persisted**
Given a UAV upload of a JPEG with known SHA-256
When the insert lands
Then `content_sha256` matches the externally-computed digest; a follow-up insert of a byte-identical body produces the SAME `content_sha256` value.
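The digest in AC-7 is a plain SHA-256 over the uploaded bytes. A minimal sketch follows, assuming the raw 32-byte digest (rather than its hex form) is what goes into the `bytea` column; the sample bodies are made up.

```python
import hashlib


def content_sha256(body: bytes) -> bytes:
    """Raw 32-byte SHA-256 digest of the tile body."""
    return hashlib.sha256(body).digest()


# Made-up payloads standing in for two uploads of the same JPEG body.
upload_a = b"\xff\xd8\xff\xe0 fake jpeg bytes"
upload_b = bytes(upload_a)

print(len(content_sha256(upload_a)), content_sha256(upload_a) == content_sha256(upload_b))
print(content_sha256(b"").hex())
```

Byte-identical bodies always hash to the same value, so equality of `content_sha256` across different `flight_id` rows is a cheap signal for the dedup detection described in the Outcome section.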
**AC-8: Migration is reversible (best-effort)**
Given the migration runs forward on a populated `tiles` table
When the back-migration runs
Then the table is restored to the pre-migration shape; data loss is limited to the new columns (`location_hash`, `content_sha256`). (Best-effort because UPSERT key changes are awkward to reverse cleanly.)
**AC-9: Performance — inventory endpoint ≤ 500 ms for 2500 tiles**
Given a POST body listing 2500 `(z, x, y)` coords at zoom 18 against a populated DB (average ~3 versions per cell across `google_maps` + `uav` sources)
When `POST /api/satellite/tiles/inventory` is called
Then the response arrives within 500 ms (95th percentile over 20 calls). Index-only scan via `tiles_leaflet_path` is the expected plan.
**AC-10: Leaflet hot path is index-only**
Given the `tiles_leaflet_path` covering index exists and the table has ≥ 100k rows
When `EXPLAIN (ANALYZE, BUFFERS) SELECT file_path FROM tiles WHERE location_hash = $1 AND (voting_status = 'trusted' OR voting_status IS NULL) ORDER BY captured_at DESC LIMIT 1` is run
Then the plan is `Index Only Scan using tiles_leaflet_path`; `Heap Fetches = 0` (visibility map fully built); total time < 0.5 ms.
**AC-11: Per-flight on-disk separation**
Given two UAV uploads of the same `(z, x, y)` from `flight_id=F1` and `flight_id=F2`
When both inserts complete and the backing JPEGs are persisted
Then two distinct files exist at `./tiles/uav/{F1}/{z}/{x}/{y}.jpg` and `./tiles/uav/{F2}/{z}/{x}/{y}.jpg`; `rm -rf ./tiles/uav/{F1}/` removes ONLY Flight F1's evidence (Flight F2's file is untouched); the DB `file_path` columns reflect the per-flight paths.
**AC-12: HTTP/2 multiplexed responses**
Given Kestrel is configured with `Http1AndHttp2AndHttp3` (or `Http1AndHttp2` over plain TLS without QUIC support)
When a single `httpx.Client(http2=True)` issues 20 concurrent `GET /tiles/{z}/{x}/{y}` requests
Then the responses arrive over ONE TCP connection (verifiable via packet capture / `httpx.Response.http_version == 'HTTP/2'`); all 20 responses interleave on the wire; total wall-clock time < 2× single-tile latency (vs. 20× for HTTP/1.1 without pipelining); per-tile ETags + `Cache-Control` headers are preserved unchanged.
## Constraints
- **No column renames**: keep `tile_zoom`, `tile_x`, `tile_y`, `latitude`, `longitude` exactly as named today. The onboard side (`AZ-304`) is responsible for matching column names on its own table.
- **UUID namespace MUST be agreed cross-repo**: pick a fixed UUID (e.g. `urn:uuid:5b8d0c2e-...`) and pin it in BOTH `SatelliteProvider.Common/Utils/Uuidv5.cs` and the onboard `gps_denied_onboard/components/c6_tile_cache/_uuid.py`. Document the chosen value in this task's review.
- **No third-party UUIDv5 dependency**: pure-C# RFC 9562 implementation, ≤80 LoC.
- **Migration must run online**: the `tiles` table is the busiest table in the service. Adding the new columns must be a non-blocking `ALTER TABLE` followed by a backfill in batches.
- **Existing AZ-484 selection rule is preserved**: the read-side `ORDER BY captured_at DESC, updated_at DESC, id DESC` tie-break stays; the only change is the WHERE clause uses `location_hash` instead of `(tile_zoom, tile_x, tile_y)`.
## Risks & Mitigation
**Risk 1: Backfilling `id` on existing rows is irreversible**
- *Risk*: existing rows have random `Guid` ids. If we overwrite them with computed UUIDv5 values, any cached external reference to the old id (e.g. operator UI bookmarks, audit log entries) becomes stale.
- *Mitigation*: add a `legacy_id uuid NULL` column populated from the existing `id` before the backfill. The old id is preserved for diagnostic queries. The new UUIDv5 `id` takes over. After a deprecation window (one cycle), drop `legacy_id`.
**Risk 2: UUIDv5 namespace divergence between Python and C#**
- *Risk*: subtle bug in the C# SHA-1 impl produces ids that differ from Python's. Cross-repo lookups fail silently.
- *Mitigation*: AC-1 requires byte-identical output across ≥10 vectors. The vectors are generated by Python and pasted into the C# test fixture as fixed-string expectations.
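To make the divergence risk concrete, the name-based algorithm can be written out step by step and checked against `uuid.uuid5`. This is a sketch of the general RFC 9562 procedure in Python, not the C# `Uuidv5.cs` implementation; `uuid5_manual` is a hypothetical name.

```python
import hashlib
import uuid


def uuid5_manual(namespace: uuid.UUID, name: str) -> uuid.UUID:
    """Name-based UUID (version 5): SHA-1 over namespace bytes + UTF-8 name."""
    # namespace.bytes is the 16-byte big-endian (network order) form.
    digest = bytearray(hashlib.sha1(namespace.bytes + name.encode("utf-8")).digest()[:16])
    digest[6] = (digest[6] & 0x0F) | 0x50  # set version nibble to 5
    digest[8] = (digest[8] & 0x3F) | 0x80  # set variant bits to 10xx
    return uuid.UUID(bytes=bytes(digest))


reference = uuid.uuid5(uuid.NAMESPACE_DNS, "python.org")
manual = uuid5_manual(uuid.NAMESPACE_DNS, "python.org")
print(manual, manual == reference)
```

A likely source of mismatch in a C# port is byte order rather than SHA-1 itself: `Guid.ToByteArray()` returns the first three fields in little-endian order, so both the namespace bytes fed into the hash and the 16 bytes used to build the result need to be handled in network order to match Python.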
**Risk 3: Migration deadlock under load**
- *Risk*: `ALTER TABLE ADD COLUMN ... NOT NULL DEFAULT '...'` on a 10M+ row table locks the table for minutes.
- *Mitigation*: 3-step migration — (a) ADD COLUMN nullable; (b) UPDATE in 1000-row batches with `pg_sleep(0.1)` between batches; (c) SET NOT NULL after backfill. Documented in the migration file.
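The batching loop in step (b) can be sketched as follows. This is an illustration only: it uses an in-memory SQLite database as a stand-in for Postgres, a hypothetical namespace value, and a hypothetical `backfill_location_hash` helper; step (c), `SET NOT NULL`, is left out because it is Postgres-specific.

```python
import sqlite3
import time
import uuid

# HYPOTHETICAL namespace; the real value is agreed cross-repo.
TILE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.invalid/tile-namespace")


def backfill_location_hash(conn, batch_size=1000, pause_s=0.1):
    """Fill NULL location_hash values in small committed batches."""
    total = 0
    while True:
        batch = conn.execute(
            "SELECT rowid, tile_zoom, tile_x, tile_y FROM tiles "
            "WHERE location_hash IS NULL LIMIT ?", (batch_size,)).fetchall()
        if not batch:
            return total
        conn.executemany(
            "UPDATE tiles SET location_hash = ? WHERE rowid = ?",
            [(str(uuid.uuid5(TILE_NAMESPACE, f"{z}/{x}/{y}")), rid)
             for rid, z, x, y in batch])
        conn.commit()        # short transactions keep row locks brief
        total += len(batch)
        time.sleep(pause_s)  # breathing room for concurrent writers


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tiles (tile_zoom INTEGER, tile_x INTEGER, tile_y INTEGER)")
conn.executemany("INSERT INTO tiles VALUES (18, ?, ?)", [(x, 7) for x in range(25)])
conn.execute("ALTER TABLE tiles ADD COLUMN location_hash TEXT")     # step (a): nullable
updated = backfill_location_hash(conn, batch_size=10, pause_s=0.0)  # step (b)
remaining = conn.execute(
    "SELECT COUNT(*) FROM tiles WHERE location_hash IS NULL").fetchone()[0]
print(updated, remaining)
```

Selecting on `location_hash IS NULL` makes the loop restartable: if the backfill is interrupted, rerunning it picks up only the rows that are still unfilled.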
**Risk 4: Onboard `TileDownloader` (AZ-316) writes against this endpoint before it exists**
- *Risk*: ordering — onboard AZ-316 might be implemented before this task lands. Production calls hit 404.
- *Mitigation*: the onboard `TileDownloader` already has a fallback path (per-tile GET via `/tiles/{z}/{x}/{y}`); document that fallback in AZ-316's caveats and gate the `list-only=true` path behind a feature flag `c11.use_bulk_list_endpoint` (default `false` until this satellite-provider task is in production).
## References
- `gps-denied-onboard/_docs/_process_leftovers/2026-05-12_tile-schema-scenario-analysis.md` — origin + scenario analysis.
- `gps-denied-onboard/_docs/_process_leftovers/2026-05-09_satellite-provider-design-tasks.md` — sibling design tasks (inbound ingest, voting layer).
- `gps-denied-onboard/_docs/02_tasks/todo/AZ-304_c6_postgres_schema.md` — onboard counterpart schema.
- `gps-denied-onboard/_docs/02_tasks/todo/AZ-316_c11_tile_downloader.md` — onboard consumer of the bulk-list endpoint.
- `SatelliteProvider.DataAccess/Repositories/TileRepository.cs` — current `Guid.NewGuid()` + float UPSERT.
- `SatelliteProvider.Services.TileDownloader/TileService.cs` — current `BuildTileEntity`.
- `SatelliteProvider.Api/Program.cs` — endpoint surface.
@@ -0,0 +1,60 @@
# Batch Report
**Batch**: 01 (cycle 5)
**Tasks**: AZ-504 — Perf script: fix grep | wc -l pipefail crash in PT-08
**Date**: 2026-05-12
## Task Results
| Task | Status | Files Modified | Tests | AC Coverage | Issues |
|------|--------|---------------|-------|-------------|--------|
| AZ-504_perf_script_grep_pipefail_fix | Done (AC-1/AC-2 in-place verified; AC-3/AC-4 deferred to Step 15) | 1 file | pass (in-place harness, 4 cases under `set -e -o pipefail`) | 2/4 covered now; 2/4 deferred to Step 15 by spec design | None |
## Changes
`scripts/run-performance-tests.sh:416-417` — replaced
```bash
accepted=$(grep -o '"status":"accepted"' "$PERF_TMP_DIR/pt08_resp.json" | wc -l | tr -d ' ')
rejected=$(grep -o '"status":"rejected"' "$PERF_TMP_DIR/pt08_resp.json" | wc -l | tr -d ' ')
```
with
```bash
# AZ-504: grep exits 1 on zero matches. Under `set -o pipefail` (line 16)
# that kills the assignment and crashes the script on the happy path
# (rejected=0). Neutralise the no-match case locally with `|| true` so
# the pipeline still produces a count. The response is compact JSON
# (one line, all items) so `grep -o | wc -l` is required to count
# occurrences — `grep -c` would only count matching lines (=1). The
# file is guaranteed-readable here (curl wrote it earlier in this
# iteration on the HTTP 200 branch), so grep cannot fail for I/O.
accepted=$({ grep -o '"status":"accepted"' "$PERF_TMP_DIR/pt08_resp.json" || true; } | wc -l | tr -d ' ')
rejected=$({ grep -o '"status":"rejected"' "$PERF_TMP_DIR/pt08_resp.json" || true; } | wc -l | tr -d ' ')
```
**Why `|| true` on grep instead of `grep -c`**: PT-08 response is compact JSON written by ASP.NET Core's default serializer — all items live on a single line of the response body. `grep -c` counts MATCHING LINES, so it would return 1 regardless of whether there are 1 or 10 accepted items (Risk 1 of the task spec). `grep -o` + `wc -l` preserves occurrence semantics. The `|| true` is the minimum local neutralisation of pipefail and is justified by the `coderule.mdc` exception "If an error is truly safe to ignore, log it or comment why" — the 8-line comment explains why grep cannot fail for I/O at this code path (file is curl-written, HTTP-200-gated).
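The difference between the two counting styles can be seen on a small sample. This is a sketch only: the response body below is a hypothetical stand-in (the `items` field name is an assumption), not a captured PT-08 response.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical compact response: three items on one line, none rejected.
resp=$(mktemp)
printf '%s\n' '{"items":[{"status":"accepted"},{"status":"accepted"},{"status":"accepted"}]}' > "$resp"

# grep -c counts matching LINES: one line matches, so the result is 1.
lines_accepted=$(grep -c '"status":"accepted"' "$resp" || true)

# grep -o prints each match on its own line, so wc -l counts OCCURRENCES.
count_accepted=$({ grep -o '"status":"accepted"' "$resp" || true; } | wc -l | tr -d ' ')
count_rejected=$({ grep -o '"status":"rejected"' "$resp" || true; } | wc -l | tr -d ' ')

echo "grep -c: $lines_accepted, grep -o | wc -l: $count_accepted accepted / $count_rejected rejected"
rm -f "$resp"
```

On this sample `grep -c` reports 1 while the `grep -o` pipeline reports 3 accepted and 0 rejected, and the script survives the zero-match case under `set -e -o pipefail`.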
**Shellcheck pass over the file (Risk 2 of task spec)**: surveyed all `grep ... | wc -l` and `grep -[oc]` sites. Only two `grep -o ... | wc -l` sites exist in the script — both at lines 416-417 (the ones fixed). The third `grep -o` site (line 141, region status polling) is already protected by `head -1 || true` and is not vulnerable. No other defensive work required.
## AC Test Coverage
| AC | Status | Where verified |
|----|--------|----------------|
| AC-1 — PT-08 completes on zero-rejected response | **Covered** | Standalone harness reproduces the exact pipeline under `set -e -o pipefail` with `accepted=2 rejected=0`; pipeline returns count `0` without script exit. |
| AC-2 — PT-08 completes on zero-accepted response (defensive) | **Covered** | Same harness with `accepted=0 rejected=2`; pipeline returns count `0` for accepted without script exit. |
| AC-3 — PT-08 summary line prints in full run | **Deferred to Step 15** | Requires `docker-compose up -d --build` + full default-parameter `./scripts/run-performance-tests.sh` (PERF_REPEAT_COUNT=20 PERF_UAV_BATCH_SIZE=10). This is exactly the Step 15 (Performance Test) gate for cycle 5. The spec is staged: AC-4 GIVEN clause depends on AC-3 + "the full perf run is green", so the natural verification environment is Step 15. |
| AC-4 — Leftover deletion on green full run | **Deferred to Step 15** | By spec design — AC-4 says the leftover file is deleted "in the same commit" as the green full perf run. That commit IS the Step 15 deliverable. Confirmed in the perf-cycle3 leftover's "Replay attempt #4" entry. |
**Spec-Gap?** No. AC-3 + AC-4 are not gaps for Step 10 — they are explicit Step 15 deliverables baked into the spec itself (AC-4 GIVEN clause). Step 10 (this batch) delivers the script fix that makes Step 15 possible.
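For readers who want to repeat the AC-1/AC-2 check, a harness along the following lines should be enough. It is an illustrative reconstruction, not the harness that was actually run: the helper name `count_status`, the four fixture files, and the `items` field name are all assumptions.

```bash
#!/usr/bin/env bash
set -e -o pipefail

# count_status FILE STATUS: same pipeline shape as the fixed lines 416-417.
count_status() {
  { grep -o "\"status\":\"$2\"" "$1" || true; } | wc -l | tr -d ' '
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Hypothetical compact-JSON fixtures, one line each.
printf '%s' '{"items":[{"status":"accepted"},{"status":"accepted"}]}' > "$tmp/zero_rejected.json"
printf '%s' '{"items":[{"status":"rejected"},{"status":"rejected"}]}' > "$tmp/zero_accepted.json"
printf '%s' '{"items":[{"status":"accepted"},{"status":"rejected"}]}' > "$tmp/mixed.json"
printf '%s' '{"items":[]}' > "$tmp/empty.json"

for name in zero_rejected zero_accepted mixed empty; do
  accepted=$(count_status "$tmp/$name.json" accepted)
  rejected=$(count_status "$tmp/$name.json" rejected)
  echo "$name: accepted=$accepted rejected=$rejected"
done
```

If the loop prints all four lines, the pipeline tolerated every zero-match case without `set -e` ending the script.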
## Test Strategy Note
This project has no shell-script unit test infrastructure (no BATS, no `scripts/test_*.sh`). The established pattern is "the script is the test" — `scripts/run-tests.sh` and `scripts/run-performance-tests.sh` ARE the verification surface. Adding BATS for a 1-SP single-line fix would be infra creep explicitly disallowed by `coderule.mdc` ("Avoid boilerplate and unnecessary indirection ... follow established project patterns"). The standalone harness recorded above is repeatable on demand and produces evidence equivalent to a BATS smoke test.
## Code Review Verdict: PASS
## Auto-Fix Attempts: 0
## Stuck Agents: None
## Next Batch: AZ-503 — Tile identity → UUIDv5 + integer UPSERT + bulk-list endpoint (3 SP)
@@ -0,0 +1,69 @@
# Deploy Report — Cycle 4 (AZ-500)
**Date**: 2026-05-12
**Cycle**: 4
**Scope**: Single-task cycle — **AZ-500 .NET 8 LTS → .NET 10 migration** (cross-cutting infra: every csproj, both Dockerfiles, every CI script, two docs, plus the Microsoft.OpenApi 1.x → 2.x compat refactor in `Program.cs`).
## What is shipping
### Code changes (committed to `dev`, pushed to `origin/dev`)
| Commit | Subject |
|--------|---------|
| `c0f004d` | `[AZ-500] Cycle 4 Step 9: new-task .NET 10 migration` |
| `8131363` | `[AZ-500] .NET 8 -> .NET 10 migration` |
| `de609cf` | `[AZ-500] Cycle 4 implement-skill wrap-up reports` |
| `af4219f` | `[AZ-500] Cycle 4 Steps 12-15 sync (test-spec / docs / security / perf)` |
All 4 commits on `dev`, pushed to `origin/dev` as of this report.
### Database migration
**None this cycle.** AZ-500 is runtime/SDK only; the `tiles` table schema and all DbUp migrations are unchanged. Per AZ-500 Constraint: "Do not rename any database objects during this task."
### Configuration changes (operator must verify before promoting)
| Setting | Was | Now | Source |
|---------|-----|-----|--------|
| Host SDK on every dev/CI machine | .NET 8.0.x SDK installed. The cycle-3 pin was `8.0.0/latestMinor`, and `latestMinor` does not roll forward across major versions, so a host with only a .NET 10.0.x SDK could not satisfy it. | **.NET 10.0.x SDK installed** (pin is now `10.0.0/latestMinor`, so any 10.0.x patch SDK satisfies it; `dotnet --list-sdks` should show ≥ 10.0.0) | AZ-500 AC-2: `global.json`. Surfaced in the cycle-3 perf-harness leftover (the host had only 10.0.103, so the .NET 8 pin would not roll forward). Now resolved at the project pin level. |
| Docker image tag (`api` service) | `mcr.microsoft.com/dotnet/aspnet:8.0` (floating, last resolved 8.0.25-bullseye-slim) | **`mcr.microsoft.com/dotnet/aspnet:10.0`** (floating; first build pulls latest 10.0.x patch from Microsoft) | AZ-500 AC-3 — `SatelliteProvider.Api/Dockerfile`. CI image (`mcr.microsoft.com/dotnet/sdk:10.0`) similarly bumped in `.woodpecker/01-test.yml` + `scripts/run-tests.sh`. |
| **No new env vars introduced.** | — | — | AZ-500 carries forward the cycle-3 env contract verbatim (`JWT_SECRET ≥ 32B`, `JWT_ISSUER`, `JWT_AUDIENCE`, `GOOGLE_MAPS_API_KEY`). |
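The SDK row above depends on how a `latestMinor` pin is resolved. The sketch below is a deliberately simplified model of that rule (same major version, not older than the pin); it ignores feature bands and prerelease ordering, and the function name `satisfies_latest_minor` is made up for illustration. It assumes a `sort` that supports `-V`.

```bash
#!/usr/bin/env bash
set -euo pipefail

# satisfies_latest_minor PIN SDK...: succeed if any listed SDK has the same
# major version as PIN and is not older than PIN (simplified model).
satisfies_latest_minor() {
  local pin=$1; shift
  local pin_major=${pin%%.*} sdk
  for sdk in "$@"; do
    [ "${sdk%%.*}" = "$pin_major" ] || continue
    # sort -V orders version strings; the pin must sort first (or be equal).
    if [ "$(printf '%s\n' "$pin" "$sdk" | sort -V | head -n 1)" = "$pin" ]; then
      return 0
    fi
  done
  return 1
}

# Host from the cycle-3 leftover: only SDK 10.0.103 installed.
for pin in 8.0.0 10.0.0; do
  if satisfies_latest_minor "$pin" 10.0.103; then
    echo "pin $pin: satisfied"
  else
    echo "pin $pin: no compatible SDK"
  fi
done
```

Under this model the `8.0.0` pin is not satisfied by a host that only has 10.0.103, while the `10.0.0` pin is.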
### Container image
- **Source**: `SatelliteProvider.Api/Dockerfile` multi-stage build, base `mcr.microsoft.com/dotnet/aspnet:10.0` (was `:8.0`).
- **Verification on dev workstation (local)**: `docker compose up -d --build` → API healthy on `:18980`, `/swagger` returns 301, anonymous probe of `/api/satellite/region/<id>` returns 401 (expected — JWT enforcement). Verified at the start of Step 15 perf run.
- **Verification on CI**: `origin/dev` push at `af4219f` triggers Woodpecker `01-test` (now on `mcr.microsoft.com/dotnet/sdk:10.0`) → `02-build-push` → registry tag `dev-arm`. **Operator action**: confirm the next CI run on `dev` succeeds before promoting to staging.
- **Multi-arch**: `mcr.microsoft.com/dotnet/sdk:10.0` and `aspnet:10.0` are published as multi-arch (amd64 + arm64) by Microsoft — verified via `docker manifest inspect` in cycle 3 (no change in cycle 4); Risk #6 from AZ-500 spec is closed.
## Verification gates passed in this cycle
| Gate | Result | Evidence |
|------|--------|----------|
| Step 11 — Functional test suite | **PASS** | 271 unit + integration tests green; `_docs/03_implementation/implementation_report_dotnet10_migration_cycle4.md` |
| Step 12 — Test-Spec Sync | **PASS** | `_docs/02_document/tests/traceability-matrix.md` updated with 8 AZ-500 AC rows + .NET 10 runtime restriction supersession + Cycle-4 coverage shape note |
| Step 13 — Update Docs | **PASS** | 8 doc files synced + `_docs/02_document/ripple_log_cycle4.md` (empty import-graph ripple recorded with rationale) |
| Step 14 — Security Audit | **PASS_WITH_WARNINGS** | `_docs/05_security/dependency_scan_cycle4.md` + `security_report_cycle4.md`; 0 new Critical/High; cycle-3 D2 (`Microsoft.NET.Test.Sdk` carry-over) still open per AZ-500 scope |
| Step 15 — Performance Test | **PASS_WITH_UNVERIFIED** | `_docs/06_metrics/perf_2026-05-12_cycle4.md`; PT-01..PT-07 PASS (7.7x improvement on PT-07 warm p95); PT-08 unmeasurable (pre-existing script bug, not a .NET 10 regression) |
## Outstanding leftovers (NOT closed by cycle 4)
1. **`_docs/_process_leftovers/2026-05-12_perf-cycle3-harness-execution.md`** — STAYS OPEN. Replay #3 (this cycle's Step 15 full run) appended; PT-08 still hits the same pre-existing `scripts/run-performance-tests.sh:417` grep-pipefail bug. Per AZ-500 Constraint: "leftover file is deleted ONLY when the full perf script runs cleanly." Closure path is the script-fix follow-up PBI below.
## Recommended follow-up PBIs (out of cycle-4 scope, surfaced for backlog)
| ID | Estimate | Title | Why |
|----|----------|-------|-----|
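| (TBD) | 1 SP | Fix `scripts/run-performance-tests.sh:416-417` grep-pipefail | Make the `grep -o ... \| wc -l` count tolerate grep's exit 1 on zero matches under `set -o pipefail`, e.g. with `\|\| true` on the grep stage. `grep -c` is not a drop-in replacement: it counts matching lines, and the PT-08 response is single-line compact JSON. Unblocks PT-08 + closes the cycle-3 perf-harness leftover. Small mechanical fix. |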
| (TBD) | 3 SP | Migrate `WithOpenApi(...)` callsites to ASP.NET Core 10 minimal-API metadata extensions | Clears 8 `ASPDEPR002` deprecation warnings in `Program.cs`. Recorded in `_docs/03_implementation/reviews/batch_01_cycle4_review.md`. API still fully functional in .NET 10 (deprecated, not removed). |
| (TBD) | 1 SP | Microsoft.OpenApi 2.x nullable cleanup | `CS8604` warning in `SatelliteProvider.Api/Swagger/ParameterDescriptionFilter.cs:25` exposed by the major bump. |
| (TBD) | 1 SP | Bump `Microsoft.NET.Test.Sdk` 17.8.0 → 17.13.0+ | Closes cycle-3 D2 (transitive `NuGet.Frameworks` flag). Test-runtime exposure only; safe to land independently. |
| (TBD) | 1 SP (recheck per cycle) | `Serilog.AspNetCore` 8.0.3 → 10.x | Currently retains 8.0.3 fallback per AZ-500 Risk #4 (no 10.x line published as of cycle 4). Re-check at every cycle start; bump as soon as a 10.x line ships. Removes the AGENTS.md/00_discovery/api_program inline-fallback notes. |
## Operator runbook for promoting to staging / production
1. Wait for CI to confirm `dev` build at `af4219f` is green and the registry has the new `dev-arm` (and any new amd64) tag built on `aspnet:10.0`.
2. Verify on a dev environment that `docker pull` of the new tag + `docker compose up -d` brings the API up healthy on the configured port; smoke-test `/swagger` (expect 200/301) and `/api/satellite/region/<random>` (expect 401, no 500).
3. If staging/production hosts have not yet provisioned the .NET 10 SDK on the *host* (only relevant for non-containerised perf-harness invocations), update host provisioning to install .NET 10.0.x SDK. The deployed container itself does NOT need a host SDK — only `scripts/run-performance-tests.sh` does (and that script normally only runs from a developer or CI machine).
4. No DB migration to apply. No env-var change to coordinate.
5. Roll forward; if a regression appears, roll back to the prior `dev-arm` tag (the one built from commit `de609cf` or earlier) — same compose contract, same env vars, same DB schema.
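The smoke test in step 2 can be scripted. The following is a sketch under stated assumptions: the helper names (`status_ok`, `http_status`, `smoke`) are hypothetical, and the base URL and region id are supplied by the operator.

```bash
#!/usr/bin/env bash
set -euo pipefail

# status_ok ACTUAL ALLOWED...: succeed if ACTUAL equals one of the allowed codes.
status_ok() {
  local actual=$1 code; shift
  for code in "$@"; do
    [ "$actual" = "$code" ] && return 0
  done
  return 1
}

# http_status URL: print only the HTTP status code of a GET request.
http_status() {
  curl -s -o /dev/null -w '%{http_code}' "$1"
}

# smoke BASE_URL REGION_ID: the two probes from step 2 of the runbook.
smoke() {
  local base=$1 region=$2
  status_ok "$(http_status "$base/swagger")" 200 301 \
    || { echo "swagger probe failed" >&2; return 1; }
  status_ok "$(http_status "$base/api/satellite/region/$region")" 401 \
    || { echo "region probe failed" >&2; return 1; }
  echo "smoke ok"
}
```

On the dev workstation described above this might be invoked as `smoke http://localhost:18980 <random>`; any 5xx on the region probe fails the check because only 401 is accepted.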
@@ -0,0 +1,31 @@
# Code Review Report
**Batch**: 01 (cycle 5) — AZ-504 (1 SP)
**Date**: 2026-05-12
**Verdict**: PASS
## Phase Summary
| Phase | Result |
|-------|--------|
| 1. Context Loading | OK — task spec read; intent = tolerate grep exit 1 on no-match while preserving occurrence-count semantics; do NOT change PT-08 threshold or production code. |
| 2. Spec Compliance | OK — AC-1 + AC-2 verified in standalone harness under `set -e -o pipefail`. AC-3 + AC-4 staged by spec design (AC-4 GIVEN clause couples them to the Step 15 full perf run). No Spec-Gap. |
| 3. Code Quality | OK — `\|\| true` is locally scoped, accompanied by an 8-line comment explaining why grep cannot fail for I/O at this code path; coderule.mdc allows this with comment. Mirror-image accepted/rejected lines are intentional, not duplication worth extracting. |
| 4. Security Quick-Scan | OK — no SQL, no command injection, no secrets, no input validation touch. |
| 5. Performance Scan | OK — same `grep -o \| wc -l` complexity as before; `\|\| true` adds zero overhead on the happy path. |
| 6. Cross-Task Consistency | N/A — single-task batch. |
| 7. Architecture Compliance | N/A — shell script, not in any C# component; no layer / Public API / cycle / duplicate-symbol concerns. |
## Findings
None.
## Notes
- Risk 1 (compact JSON vs `grep -c`) was verified empirically against `SatelliteProvider.Common/DTO/UavTileBatchUploadResponse.cs` — the ASP.NET Core default serializer emits the full `Items` list on a single line. `grep -c` would have collapsed all occurrences to a line count of 1. The chosen fix (`grep -o ... \|\| true \| wc -l`) preserves occurrence semantics. The 8-line in-script comment captures this so a future maintainer doesn't re-introduce `grep -c`.
- Risk 2 (other vulnerable `grep ... \| wc -l` sites) — Grep pass over the whole script found only the two sites named in the spec (416, 417). The third `grep -o` site (line 141, region status polling) is protected by `head -1 \|\| true`. No other defensive work required.
- Test infra note — no BATS/shell-test framework in this project. Adding one for a 1-SP fix is infra creep. The standalone harness in the batch report is the established equivalent verification.
## Verdict
**PASS** — no findings of any severity. Proceed to commit. AC-3 + AC-4 are Step 15 obligations, not Step 10 gaps.
@@ -0,0 +1,222 @@
# Retrospective — Cycle 4 (2026-05-12)
**Tasks**: AZ-500 (.NET 8 LTS → .NET 10 migration, 5 SP)
**Mode**: cycle-end (autodev Step 17)
**Previous retro**: `retro_2026-05-12_cycle3.md`
**Cycle shape**: single-task migration cycle (cross-cutting infra) — first such cycle since the project was documented.
## 1. Implementation Metrics
| Metric | Cycle 4 | Δ vs cycle 3 |
|--------|---------|--------------|
| Tasks implemented | **1** (AZ-500) | -5 |
| Batches executed | **1** | -4 |
| Avg tasks / batch | 1.0 | -0.2 |
| Total complexity delivered | **5 SP** | -13 SP |
| Avg complexity / batch | 5 SP | +1.4 |
| Tasks at-or-below 5 SP cap | **1 of 1 (100%)** | unchanged |
| Tasks above cap | 0 | unchanged |
| Cumulative reviews | **0** (single-task batch — Phase 6 cross-task consistency N/A; only the per-batch review ran) | -2 |
**Sequencing**: Single batch — AZ-500 is one atomic coordinated bump (per its own Constraint: "TFM, SDK pin, Docker images, CI images, and M.E.* package versions ALL move in the same commit"). The cycle completed in 4 dev commits (Step 9 task spec, Step 10 implementation, Step 10 wrap-up reports, Step 12-15 sync).
## 2. Quality Metrics
### Code Review Results
| Verdict | Count | Percentage |
|---------|-------|-----------|
| PASS | 0 | 0% |
| PASS_WITH_WARNINGS | **1** | **100%** |
| FAIL | 0 | 0% |
### Findings by Severity (per-batch code review)
| Severity | Cycle 4 | Δ vs cycle 3 |
|----------|---------|--------------|
| Critical | 0 | unchanged |
| High | 0 | unchanged |
| Medium | **2** (F1 ASPDEPR002 deprecation, F2 CS8604 nullable) | **+2** (cycle 3 had 0 new Medium) |
| Low | 1 (F3 implicit-scope perf-script path fix) | -6 |
### Findings by Category
| Category | Count | Top Files |
|----------|-------|-----------|
| Maintainability | 2 | `SatelliteProvider.Api/Program.cs` (F1), `SatelliteProvider.Api/Swagger/ParameterDescriptionFilter.cs` (F2) |
| Scope | 1 | `scripts/run-performance-tests.sh:49` (F3 — necessary scope creep, accepted) |
| Bug | 0 | — |
| Spec-Gap | 0 | — |
| Security | 0 NEW (5 NEW informational Lows in `dependency_scan_cycle4.md` — all "no published advisory" confirmations on the 10.x lines) | -2 Low (cycle 3 introduced 3) |
| Performance | 0 | — |
| Style | 0 | — |
**Note on the 2 Medium findings**: both are *consequences of the major-version bump itself*, not implementation defects. F1 (ASPDEPR002 on 8 `WithOpenApi(...)` callsites) is a deprecation that the .NET 10 line introduces — the API still works, but the old surface is slated for removal. F2 (CS8604 nullable on `parameter.Name`) is exposed by Microsoft.OpenApi 2.x's stricter nullable annotations. Both were intentionally deferred per AZ-500's "preserve behaviour during runtime/SDK migration" contract; neither was an implementation choice.
### Security audit (cycle 4)
| Metric | Value | Δ vs cycle 3 |
|--------|-------|--------------|
| Verdict | **PASS_WITH_WARNINGS** | unchanged |
| Mode | **Resume** (only `dependency_scan` re-run; static / OWASP / infrastructure carried forward unchanged because AZ-500 made no source-level edits to those surfaces) | new pattern |
| New Critical / High | 0 / 0 | unchanged |
| New Medium | **0** | unchanged |
| New Low (informational only — all "no published advisory" confirmations on the 10.x lines) | **5** | varies |
| Resolved findings | **2** (cycle-3 D1 + D3 forward-resolved by the major-version bump to JwtBearer 10.0.7 / OpenApi 10.0.7) | -1 (cycle 3 resolved 3) |
| Carry-overs (still OPEN) | 1 (cycle-3 D2 — `Microsoft.NET.Test.Sdk 17.8.0` transitive `NuGet.Frameworks` flag, explicitly out of AZ-500 scope) | unchanged |
### Performance gate (cycle 4 — first executed perf gate since cycle 1's AZ-484 run)
| Metric | Value |
|--------|-------|
| Verdict | **PASS_WITH_UNVERIFIED** |
| Scenarios | 7 Pass · 0 Warn · 0 Fail · 1 Unverified |
| AZ-500 NFR (Performance) | **MET** for 7/8 scenarios; PT-08 unmeasurable due to pre-existing `scripts/run-performance-tests.sh:417` grep-pipefail bug (not a .NET 10 regression) |
| PT-07 warm p95 | 301ms — **7.7x improvement** vs cycle-3 short-variant baseline (2340ms at N=2); cold p95 = 2782ms (-14%) |
| PT-06 route creation | 90ms — **49% improvement** vs cycle-3 (178ms); consistent with .NET 10 GC + JIT improvements on small-allocation paths |
| Cycle-3 perf-harness leftover | **STAYS OPEN** per AZ-500 Constraint (deletes only on a fully clean run; replay #3 results appended) |
## 3. Structural Metrics (snapshot: `structure_2026-05-12_cycle4.md`)
| Metric | Cycle 4 | Δ vs cycle 3 |
|--------|---------|--------------|
| .NET projects (csproj) | **9** | unchanged |
| Cross-project edges (ProjectReference) | **20** | unchanged |
| Cycles in project graph | 0 | unchanged |
| Average ProjectReferences per component | ~2.2 | unchanged |
| New Architecture violations | **0** | unchanged |
| Resolved Architecture violations | **2 forward** (cycle-3 D1 + D3 by major-version bump) | -1 (cycle 3 resolved 3 + the cycle-2 PT-07/PT-08 leftover) |
| Net Architecture delta | **0** | +3 (cycle 3 was -3) |
| Contract coverage % | unchanged (no new public API surfaces) | n/a |
**Cycle 4 is structurally neutral by design**: it is a runtime/SDK migration that preserves behaviour. The graph properties confirm this: same 9 projects, same 20 edges, same DAG (zero cycles), same in-degree distribution. The only change at the dependency level is the coordinated 11-package M.E.* + 3-package ASP.NET + 1 Swashbuckle major-line bump, all forward-clean against advisories.
## 4. Efficiency Metrics
| Metric | Cycle 4 | Δ vs cycle 3 |
|--------|---------|--------------|
| Blocked tasks (during implementation) | **0 of 1** — implementation iterated 3 times during the implement skill (Serilog NU1605 → revert per Risk #4; Microsoft.OpenApi 2.x compile errors → user A/B/C decision → Swashbuckle 6.6.2 → 10.1.7 + Program.cs refactor; OpenApiSecurityRequirement type mismatch → wrap in lambda + List<string>), but never reached the code-review FAIL gate | improvement (cycle 3 had 1 blocked of 6) |
| Tasks completed first attempt (no post-review fix commits) | **1 of 1 (100%)** | best cycle on record (cycle 3 was 5 of 6) |
| Tasks requiring multiple post-code-review fix commits | 0 | unchanged |
| Most-findings batch | batch 01 (the only batch — 2 Medium + 1 Low; both Mediums are scope-deferred, not implementation defects) | n/a |
| Cumulative-review-only findings | n/a (single-task batch) | n/a |
| Step-15 (Perf Test) execution | **EXECUTED** (full default-param run; 7 Pass + 1 Unverified) | **first cycle since cycle 1 to actually execute Step 15** |
| Step-14 (Security Audit) — net findings improvement | **+2** (2 Resolved cycle-3 carry-overs by forward-bump; 5 new Low informational confirmations only) | unchanged net direction (cycle 3 was +1) |
## 5. Patterns Identified
### Pattern 1 — Major-version bumps cascade through transitive deps; the surprise was Microsoft.OpenApi 1.x → 2.x
The implementation hit two separate dependency-conflict surprises that the task spec didn't pre-anticipate:
1. **`Serilog.AspNetCore 10.0.0` → NU1605 conflict** with the project's pinned `Serilog.Sinks.File 6.0.0` (transitive dep `>= 7.0.0`). Resolved by reverting to `Serilog.AspNetCore 8.0.3` per Risk #4's pre-documented fallback. **The risk was anticipated; the resolution was already specified.** Good outcome.
2. **`Microsoft.AspNetCore.OpenApi 10.0.7` → CS0234/CS0246/CS7069 compile errors** in `Program.cs` because Microsoft.OpenApi 2.x removed the `Microsoft.OpenApi.Models` namespace, and Swashbuckle 6.6.2 (still pinned) only knows the 1.x namespace. **The risk was NOT anticipated** — Risk #2 in the task spec only flagged "Swagger UI generation breaking" at a high level; the actual breakage was at compile time (namespace removal), and the fix required bumping Swashbuckle (6.6.2 → 10.1.7) AND refactoring `Program.cs` to use the new 2.x types (`OpenApiSecuritySchemeReference`, `JsonSchemaType`, `IOpenApiSchema`). The user was correctly asked via A/B/C; option A was picked; the refactor landed cleanly.
**Insight**: when a task spec lists "Microsoft.AspNetCore.* package bumps", it should also list the **transitive packages whose major version changes** as a result. Microsoft.OpenApi was technically transitive in cycle 3 (Swashbuckle 6.6.2 pulled it in at 1.x); after the AspNetCore 10.0.7 bump it became *direct* (the new line wants 2.x), forcing Swashbuckle and source code to follow. **A pre-implementation `dotnet restore` dry run (or NuGet manifest diff) against the proposed pin set would have caught this in spec-time, not implementation-time.**
### Pattern 2 — A pre-existing bug in a sibling script became visible the moment the migration unblocked the path leading to it
`scripts/run-performance-tests.sh:417` (`grep -o ... | wc -l` with `set -o pipefail`) had been broken since the script was written, but it never fired because the script always exited earlier (cycle 1 didn't reach PT-08; cycle 2 PT-08 was a stub; cycle 3 perf gate was skipped). AZ-500's bootstrap fix (replay #2) and the Step 15 full run (replay #3) were the first two runs ever to reach line 417 — and both crashed there with the same root cause.
**Insight**: scope-protected migrations like AZ-500 explicitly defer fixes to *unrelated* code, but they should explicitly call out **bugs the migration newly EXPOSES** (vs. introduces) as required follow-up PBIs in the task spec's `Risks & Mitigation` or a new `Newly Exposed Bugs` section. The cycle-3 perf-harness leftover already documents this case in detail; the pattern itself is what's worth surfacing.
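The failure mode in Pattern 2 is easy to reproduce in isolation. The sketch below runs the pre-fix pipeline shape in a child shell so that its exit does not end the outer script; the response body is a hypothetical one-item sample, not a captured PT-08 response.

```bash
#!/usr/bin/env bash
# Reproduce the pre-fix behaviour: zero matches + pipefail + errexit.
resp=$(mktemp)
printf '%s\n' '{"items":[{"status":"accepted"}]}' > "$resp"

status=0
bash -c '
  set -e -o pipefail
  # grep finds nothing and exits 1; pipefail makes the whole pipeline fail,
  # so the assignment fails and errexit ends the shell right here.
  rejected=$(grep -o "\"status\":\"rejected\"" "$1" | wc -l | tr -d " ")
  echo "unreachable: rejected=$rejected"
' _ "$resp" || status=$?

echo "child exited with status $status"
rm -f "$resp"
```

The `unreachable` line is never printed and the child exits with status 1, which matches the crash on the happy path (rejected=0) described above.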
### Pattern 3 — "Resume" mode for the security audit avoided 4 unnecessary re-runs without losing posture
The cycle-4 security audit ran in **resume mode** — only Phase 1 (dependency scan) was re-executed; Phases 2/3/4 (static analysis, OWASP, infrastructure) carried forward unchanged from cycle 3 because AZ-500 made no source-level edits to those surfaces (only `Program.cs` Swashbuckle DI registration internals + one `using` directive change in `ParameterDescriptionFilter.cs`).
**Insight**: per-cycle sub-skill prerequisites that ask "resume / overwrite / skip" should accept "resume narrowed to phases X" as a first-class option when the cycle's scope is provably outside the unchanged phases' coverage. We did this manually here via the A/B/C choice; codifying it in the security skill's prereq #3 would be a small DX win.
### Pattern 4 — First cycle since cycle 1 where Step 15 (Performance Test) actually executed end-to-end
Cycles 2 + 3 both **skipped** Step 15. Cycle 4 ran the full default-parameter perf harness (PT-01..PT-08) against the migrated `dev` image and captured 7 Pass + 1 Unverified (PT-08 hit the pre-existing script bug). This produced:
- The first PT-07/PT-08 numbers ever recorded against the active codebase (cycle-1 baseline was AZ-484-only; PT-07/PT-08 were added in cycle 3 by AZ-492 but never executed end-to-end);
- A 7.7x improvement signal on PT-07 warm p95 (mostly N=20 dilution; partly .NET 10 pipeline);
- Concrete confirmation that AZ-500 NFR (Performance — "must not regress beyond existing thresholds") is met for every scenario where measurement was possible.
**Insight**: the cycle-3 retro Action 1 ("Execute the cycle-3 perf harness against the deployed `dev` image to convert the cycle-3 perf-execution leftover into PT-07/PT-08 baseline numbers") was effectively absorbed into AZ-500's NFR (Performance) gate. Worth recording that Action-1 happened — just not as a standalone PBI, but folded into the migration's verification.
### Pattern 5 — Single-task cycles produce structurally neutral retro metrics — that's not a problem, but it confuses the trend lines
Most efficiency / quality / structural metrics in this retro are "unchanged from cycle 3" or "0 NEW", because runtime/SDK migrations don't add business logic. The trend graphs in §6 will show artificial flatlines on metrics like "tasks implemented" (1 vs 6) and "complexity delivered" (5 vs 18 SP). That's correct — but if read naively, it could look like cycle 4 was "less productive" than cycle 3.
**Insight**: when a cycle has a single non-functional task (migration, refactor, dependency hygiene), the retro should explicitly reframe the metric set around what *that* PBI shape proves: continuity (0 regressions), forward-resolution (cycle-3 D1+D3 closed), and unblocking (perf harness now runnable end-to-end). The "tasks implemented" count as a proxy for productivity is misleading here.
## 6. Comparison vs. previous retros
| Metric | Cycle 1 | Cycle 2 | Cycle 3 | Cycle 4 |
|--------------------------------------|----------------|----------------|----------------|----------------|
| Tasks implemented | 1 | 2 | 6 | **1** |
| Total complexity delivered | 8 SP | 10 SP | 18 SP | **5 SP** |
| Batches | 1 | 2 | 5 | **1** |
| Critical/High review findings | 0 | 0 | 0 | **0** |
| New Medium review findings | 0 | 0 | 0 | **2** (both bump-consequences, scope-deferred) |
| New Low review findings | 3 | 6 (5 distinct) | 7 | **1** |
| Code review pass rate | 100% (1/1) | 100% (2/2) | 100% (5/5) | **100% (1/1)** |
| Tasks completed first attempt | 0 of 1 | 0 of 2 | 5 of 6 | **1 of 1** |
| New Medium security findings | 2 | 2 | 0 | **0** |
| Resolved security findings | 0 | 0 | 3 | **2** (forward-bump) |
| Net Architecture delta | n/a (baseline) | +0 | -3 | **0** |
| Step-15 (Perf) executed | YES (AZ-484) | SKIPPED | SKIPPED | **YES (full PT-01..PT-08)** |
| Step-15 leftover present at retro | NO | YES | YES | **YES (still — pre-existing script bug, not migration-caused)** |
### Did the cycle-3 actions land?
- **Cycle 3 Action 1 (execute perf harness against deployed dev image)** — landed implicitly via AZ-500 Step 15 full run. PT-07 + PT-08 numbers now recorded in `_docs/06_metrics/perf_2026-05-12_cycle4.md`. **Verdict: implemented as part of the migration's NFR gate**, not as a standalone 2-SP PBI. Leftover stays open due to the *separate* pre-existing PT-08 script bug.
- **Cycle 3 Action 2 (bump `System.IdentityModel.Tokens.Jwt 7.0.3 → 7.1.2+` to clear NU1902)** — **NOT landed in cycle 4**. Explicitly out of AZ-500 scope per "no unrelated package bumps" Constraint. NU1902 hits still appear in every test build log (counted ~9 in the AZ-500 perf-harness build trace). Recommended rollover to cycle 5.
- **Cycle 3 Action 3 (`workspace:` field on cross-repo ACs in new-task skill)** — **NOT landed in cycle 4**. AZ-500 had zero cross-repo ACs (it's a single-workspace migration), so this rule was not exercised. Recommended rollover.
This is the second consecutive cycle where a prior-retro action fully closed (Action 1 here, Actions 1+2+3 in cycle 3). Pattern is stable — good.
## 7. Top 3 Improvement Actions (ranked by impact)
### Action 1 — Fix `scripts/run-performance-tests.sh:416-417` grep-pipefail (1 SP, mechanical)
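**Why this is the highest impact**: this two-line bug (`grep -o ... | wc -l` fails on zero matches under `set -o pipefail`) blocks PT-08 measurement on every run, AND blocks closure of the cycle-3 perf-harness leftover that has now been carried across 3 cycles. Two consecutive replays (#2 + #3) reproduced the exact same failure mode at the same line; the fix is provably needed AND provably small. The fix must keep occurrence counting: `grep -c` counts matching lines, so on single-line compact JSON it cannot replace `grep -o ... | wc -l`; neutralising grep's no-match exit (e.g. `|| true` on the grep stage) is sufficient.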
**Action**: 1 SP PBI in cycle 5. Replaces 2 lines in the script. After it lands, re-run `./scripts/run-performance-tests.sh` once — if the full PT-01..PT-08 sweep is clean, delete `_docs/_process_leftovers/2026-05-12_perf-cycle3-harness-execution.md`.
**Cost**: ~15 minutes (edit + one perf run + leftover deletion). Counted as 1 SP because the deletion contract requires a successful end-to-end run.
### Action 2 — Migrate the 8 `WithOpenApi(...)` callsites in `Program.cs` to ASP.NET Core 10 minimal-API metadata extensions (3 SP)
**Why**: clears the 8 `ASPDEPR002` deprecation warnings that AZ-500 left behind (intentionally, per the "preserve behaviour" contract). The replacement API is documented at `https://aka.ms/aspnet/deprecate/002` — straightforward swap to `WithSummary`/`WithDescription`/`WithName`/`WithTags`. Once migrated, the deprecation warning drops out of every CI build log and the Swagger UI continues to render the same metadata.
**Action**: 3 SP PBI in cycle 5. Already filed as a recommended follow-up in `_docs/03_implementation/reviews/batch_01_cycle4_review.md`. Test gate is the existing `SwaggerDocument_AdvertisesBearerSecurityScheme` programmatic check + a manual swagger-UI smoke (Bearer Authorize button still present).
**Cost**: 3 SP — touches 8 callsites, light testing burden. Could reasonably be paired with Action 3 below as a single 5-SP "OpenApi 2.x cleanup" PBI.
### Action 3 — Pre-flight transitive-major-version impact analysis at task-spec time (process change, ~0 SP for the rule, ~1 SP for the next migration that uses it)
**Why**: AZ-500's biggest implementation surprise (Microsoft.OpenApi 1.x → 2.x cascade forcing Swashbuckle bump + Program.cs refactor) was knowable in advance by running `dotnet restore` on a scratch branch with the proposed pin set and diffing `dotnet list package --include-transitive` before/after. The task spec accurately listed the *direct* package bumps but not the transitive major-version flips. Pattern 1 above documents the cost: extra A/B/C round-trip during implementation, plus an unscheduled refactor.
**Action**:
- Add to `coderule.mdc` (or a new `task-spec.mdc`): "When a task spec proposes a major-version bump of any direct dependency, the spec must also list the transitive packages whose major version changes as a result, OR explicitly note 'transitive major-version drift not analyzed in spec — implementer to surface and ASK if any non-trivial transitive bump is required'. The check is: run `dotnet restore --dry-run` (or `dotnet list package --include-transitive`) against a scratch branch with the proposed pins before writing the spec, OR mark the spec as 'transitive analysis deferred to implementation time' so the implementer knows to allow extra time."
- Optionally add a `new-task/SKILL.md` Step prompt: "for any package bump in this task, has the transitive major-version drift been analyzed?"
**Cost**: rule addition is 0 SP; the next migration PBI that adopts the practice will absorb a ~1 SP slack to do the dry-run analysis at spec time. ROI: avoids the ~30-minute mid-implementation unscheduled refactor + A/B/C round-trip that AZ-500 hit.
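The spec-time check described in this action can be sketched as a comparison of two captures of `dotnet list package --include-transitive`: one from the current branch and one from a scratch branch with the proposed pins. This is a minimal sketch, not the project's tooling. The `major_flips` helper, the file names, and the sample version rows are hypothetical, and the parsing assumes each package row starts with `>` and ends with the resolved version, which should be checked against real CLI output.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: compare two captures of
#   dotnet list package --include-transitive
# and print every package whose resolved MAJOR version differs.
# Assumption: each package row starts with ">" and ends with the resolved version.
major_flips() {
  awk '
    $1 == ">" {
      split($NF, v, ".")
      if (FILENAME == ARGV[1]) { ver[$2] = $NF; major[$2] = v[1] }
      else if (($2 in major) && major[$2] != v[1]) print $2, ver[$2], "->", $NF
    }
  ' "$1" "$2" | sort -u
}

# Self-contained demo with made-up sample rows (not real restore output).
tmp=$(mktemp -d)
printf '   > Swashbuckle.AspNetCore   6.6.2   6.6.2\n   > Microsoft.OpenApi   1.6.0\n   > Dapper   2.1.35\n' > "$tmp/before.txt"
printf '   > Swashbuckle.AspNetCore   10.1.7   10.1.7\n   > Microsoft.OpenApi   2.3.0\n   > Dapper   2.1.35\n' > "$tmp/after.txt"
major_flips "$tmp/before.txt" "$tmp/after.txt"
```

In a real run the two input files would come from redirecting the `dotnet list package --include-transitive` output on each branch. Any line the helper prints for a package that the spec does not already list is a candidate for the "transitive major-version drift" note.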
## 8. Suggested Rule / Skill updates
| File | Change | Rationale |
|------|--------|-----------|
| `coderule.mdc` (new bullet near "library API verification" section) | "When a task spec proposes a major-version bump of any direct dependency, the spec must list the transitive packages whose major version changes as a result, OR explicitly note 'transitive major-version drift not analyzed in spec'. The check at spec-time is `dotnet restore --dry-run` against a scratch branch (or equivalent for non-.NET ecosystems)." | Pattern 1 (Microsoft.OpenApi 1.x → 2.x cascade); Action 3 |
| `.cursor/skills/security/SKILL.md` (Phase prereq #3) | Add a fourth option to the resume/overwrite/skip prompt: "Resume narrowed (only re-run Phase X)" — applicable when the cycle's source-level changes are provably outside the unchanged phases' coverage. | Pattern 3 (Resume mode worked manually here; codifying saves a few minutes per single-PBI infrastructure cycle) |
| `coderule.mdc` (new bullet in scope-discipline section) | "When a scope-protected task (migration, dependency bump) newly *exposes* a pre-existing bug elsewhere in the codebase, the implementation MUST surface it as a recommended follow-up PBI in the batch report — and the cycle's deploy report MUST list it as a 'newly exposed bug' separate from 'newly introduced findings'. Bugs that already existed do not count as cycle-introduced regressions, but they must not be silently re-buried." | Pattern 2 (`scripts/run-performance-tests.sh:417` grep-pipefail surfaced via AZ-500's bootstrap fix) |
| `.cursor/skills/retrospective/SKILL.md` (Step 2 narrative guidance) | When a cycle has a single non-functional task (migration / refactor / dependency hygiene), the retro report should explicitly reframe the metric set around continuity (0 regressions), forward-resolution (prior findings closed), and unblocking (capabilities now exercisable end-to-end) — not just task count + complexity points. | Pattern 5 (single-task migration cycles produce flatline metrics that are correct but easily misread) |
## 9. Decision items carried over (operator)
- **Cycle-3 perf-harness leftover** — STAYS OPEN per AZ-500 Constraint. Closure path: Action 1 above (1 SP fix → re-run → delete file). Tracked in `deploy_cycle4.md` runbook + `_docs/_process_leftovers/2026-05-12_perf-cycle3-harness-execution.md`.
- **Admin team iss/aud confirmation** (carried from cycle 3) — still required before promoting beyond `dev`. Unchanged. Tracked in `deploy_cycle3.md` + `deploy_cycle4.md`.
- **Cross-repo doc** (carried from cycle 3) — `suite/_docs/10_auth.md` paragraph addition. Unchanged.
## 10. What this retro says about process maturity
Cycle 4 is the first cycle that:
- Executed Step 15 (Performance Test) end-to-end against a real deployed image (the first such run since cycle 1).
- Forward-resolved supply-chain advisories by *bumping past them* (cycle-3 D1+D3 closed by the major-version migration), rather than by patch-line bumps.
- Surfaced a transitive-major-version cascade as an implementation-time surprise — and the response chain (NU1605 fail → revert per pre-spec'd Risk → continue; CS0234 fail → user A/B/C → bump Swashbuckle + refactor → continue) showed the implement-skill's recovery loop working as designed.
- Carried a leftover across two cycles where the *underlying capability is healthy* but the *instrumentation harness is broken*. The leftover replay protocol kept the issue visible without blocking forward progress.
The process continues to converge. The remaining friction points after cycle 4 are (a) transitive-dep awareness at spec-time (Action 3), (b) the lingering `scripts/run-performance-tests.sh` grep-pipefail (Action 1), and (c) the cycle-3 carry-overs (Test.Sdk + IdentityModel.Tokens.Jwt) that AZ-500 explicitly excluded — all are concrete cycle-5 PBI candidates totalling ~6 SP, comfortably below a normal cycle's capacity.
@@ -0,0 +1,84 @@
# Structural Snapshot — 2026-05-12 (post-cycle 4, AZ-500)
Cycle 4 delta against `structure_2026-05-12_cycle3.md`. Source of truth: `_docs/02_document/module-layout.md` + on-disk `*.csproj` graph + `_docs/02_document/contracts/`.
## Projects
| Layer | csproj | Cycle 4 delta |
|-------|--------|---------------|
| 1 (Foundation) | `SatelliteProvider.Common` | TFM `net8.0` → `net10.0` |
| 1 (Foundation) | `SatelliteProvider.DataAccess` | TFM `net8.0` → `net10.0`; M.E.* 9.0.10 → 10.0.7 (Configuration.Abstractions, Logging.Abstractions) |
| 3 (Application) | `SatelliteProvider.Services.TileDownloader` | TFM `net8.0` → `net10.0`; M.E.* 9.0.10 → 10.0.7 (Caching.Memory, Http, Logging.Abstractions, Options.ConfigurationExtensions) |
| 3 (Application) | `SatelliteProvider.Services.RegionProcessing` | TFM `net8.0` → `net10.0`; M.E.* 9.0.10 → 10.0.7 (DependencyInjection.Abstractions, Hosting.Abstractions, Logging.Abstractions, Options.ConfigurationExtensions) |
| 3 (Application) | `SatelliteProvider.Services.RouteManagement` | TFM `net8.0` → `net10.0`; M.E.* 9.0.10 → 10.0.7 (DependencyInjection.Abstractions, Hosting.Abstractions, Logging.Abstractions, Options.ConfigurationExtensions) |
| 4 (API / Entry) | `SatelliteProvider.Api` | TFM `net8.0` → `net10.0`; `Microsoft.AspNetCore.Authentication.JwtBearer` 8.0.25 → **10.0.7**; `Microsoft.AspNetCore.OpenApi` 8.0.25 → **10.0.7**; `Swashbuckle.AspNetCore` 6.6.2 → **10.1.7** (drove transitive `Microsoft.OpenApi` 1.x → 2.3.x); `Serilog.AspNetCore` retained at 8.0.3 (Risk #4 fallback); `Program.cs` Microsoft.OpenApi 2.x setup refactor (3 internal edits — namespace, `OpenApiSecuritySchemeReference`, `JsonSchemaType` + `IOpenApiSchema`); `Swagger/ParameterDescriptionFilter.cs` namespace update |
| 5 (Test-Support) | `SatelliteProvider.TestSupport` | TFM `net8.0` → `net10.0`; NuGet refs unchanged (still pinned to `Microsoft.IdentityModel.Tokens 7.0.3` + `System.IdentityModel.Tokens.Jwt 7.0.3` — cycle-3 D4 carry-over) |
| 6 (Tests) | `SatelliteProvider.Tests` | TFM `net8.0` → `net10.0`; M.E.* 9.0.10 → 10.0.7 (Caching.Memory, Configuration.Json, DependencyInjection, Http, Logging.Abstractions, Logging.Console, Options); `Microsoft.AspNetCore.Authentication.JwtBearer` 8.0.25 → 10.0.7 (transitively via ProjectReference to Api) |
| 6 (Tests) | `SatelliteProvider.IntegrationTests` | TFM `net8.0` → `net10.0`; transitive bumps via TestSupport + Api |
**Project count**: 9 (unchanged from cycle 3 — AZ-500 is a runtime/SDK migration, not a project-graph change).
## Cross-Project Import Edges (compile-time `ProjectReference`)
| Edge | Count | Cycle 4 delta |
|------|-------|----------------|
| Api → {Common, DataAccess, TileDownloader, RegionProcessing, RouteManagement} | 5 | unchanged |
| TileDownloader → {Common, DataAccess} | 2 | unchanged |
| DataAccess → {Common} | 1 | unchanged |
| RegionProcessing → {Common, DataAccess} | 2 | unchanged |
| RouteManagement → {Common, DataAccess} | 2 | unchanged |
| Tests → {Api, TileDownloader, RegionProcessing, RouteManagement, Common, DataAccess, TestSupport} | 7 | unchanged |
| IntegrationTests → {TestSupport} (+ runtime DTOs only) | 1 | unchanged |
**Total ProjectReference edges**: 20 (unchanged from cycle 3). AZ-500 added zero cross-project import edges.
## Source-import sites — cycle 4 delta
| Importer | Imports from | Cycle 4 delta |
|----------|--------------|---------------|
| `SatelliteProvider.Api/Program.cs` | `Microsoft.OpenApi` (was `Microsoft.OpenApi.Models`) | namespace move (1.x → 2.x); same conceptual surface, different package layout |
| `SatelliteProvider.Api/Swagger/ParameterDescriptionFilter.cs` | `Microsoft.OpenApi` (was `Microsoft.OpenApi.Models`) | same namespace move |
| All other source files | unchanged | — |
**No new source-level imports introduced.** The two changed `using` directives are namespace renames forced by the Microsoft.OpenApi 2.x layout, not new dependencies.
## Graph properties
- **Cycles in project import graph**: 0 (clean DAG — unchanged)
- **Average ProjectReferences per component**: 20 / 9 = ~2.2 (unchanged from cycle 3)
- **Max in-degree**: Common (still highest at 6 — Api, TileDownloader, DataAccess, RegionProcessing, RouteManagement, Tests).
- **Max out-degree**: Tests (7 — unchanged from cycle 3).
- **TestSupport position**: leaf-of-test-subgraph; no production-layer importers (unchanged).
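The figures above (20 edges, ~2.2 references per project, Common at in-degree 6, Tests at out-degree 7) can be re-derived from the edge table with a few lines of awk. This is a minimal sketch: the edge list is transcribed by hand from the table in this snapshot using short project names, and the `edges` and `graph_stats` helpers are hypothetical, not part of the repository.

```shell
#!/usr/bin/env bash
set -eu

# ProjectReference edges transcribed from the edge table, one "from to" pair per line.
edges() {
  cat <<'EOF'
Api Common
Api DataAccess
Api TileDownloader
Api RegionProcessing
Api RouteManagement
TileDownloader Common
TileDownloader DataAccess
DataAccess Common
RegionProcessing Common
RegionProcessing DataAccess
RouteManagement Common
RouteManagement DataAccess
Tests Api
Tests TileDownloader
Tests RegionProcessing
Tests RouteManagement
Tests Common
Tests DataAccess
Tests TestSupport
IntegrationTests TestSupport
EOF
}

# Total edges, average references per project (9 projects), and the
# projects with the highest in-degree and out-degree.
graph_stats() {
  edges | awk -v projects=9 '
    { out[$1]++; in_[$2]++; total++ }
    END {
      for (p in in_) if (in_[p] > max_in)  { max_in  = in_[p]; in_name  = p }
      for (p in out) if (out[p] > max_out) { max_out = out[p]; out_name = p }
      printf "edges=%d avg=%.1f max_in=%s:%d max_out=%s:%d\n",
             total, total / projects, in_name, max_in, out_name, max_out
    }'
}

graph_stats
```

Re-running the same calculation against the next cycle's edge list gives a quick check that a "structurally neutral" claim still holds.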
## NuGet dependency hygiene (cycle 4)
| Package | Cycle-3 version | Cycle-4 version | Status |
|---------|-----------------|-----------------|--------|
| `Microsoft.AspNetCore.Authentication.JwtBearer` | 8.0.25 | **10.0.7** | RESOLVED cycle-3 D1 forward (now on the 10.x line; same underlying CVE patch, current line) |
| `Microsoft.AspNetCore.OpenApi` | 8.0.25 | **10.0.7** | RESOLVED cycle-3 D3 forward (same as above) |
| `Swashbuckle.AspNetCore` | 6.6.2 | **10.1.7** | NEW major line — 0 known vulnerabilities; bumped to land Microsoft.OpenApi 2.x compat required by ASP.NET Core 10 |
| `Microsoft.OpenApi` (transitive via Swashbuckle) | 1.x | **2.3.x** | NEW major line — 0 known vulnerabilities; drove the 3 internal `Program.cs` setup edits |
| `Serilog.AspNetCore` | 8.0.3 | **8.0.3 (unchanged)** | Risk #4 fallback — no 10.x line published as of cycle 4; restores cleanly on `net10.0` via netstandard 2.0; recheck per cycle |
| `Microsoft.Extensions.*` (11 distinct package IDs) | 9.0.10 | **10.0.7** | Coordinated bump across 6 csproj files; 0 known vulnerabilities on 10.0.7 line; historical CVE-2024-43483 already not applicable in cycle 3 (9.0.10 baseline post-rc.1 cutoff) |
| `Microsoft.IdentityModel.Tokens` (TestSupport) | 7.0.3 | **7.0.3 (unchanged)** | Cycle-3 D4 carry-over — explicitly out of AZ-500 scope per "no unrelated package bumps" Constraint; recommend separate PBI |
| `System.IdentityModel.Tokens.Jwt` (TestSupport) | 7.0.3 | **7.0.3 (unchanged)** | Same disposition as above |
| `Microsoft.NET.Test.Sdk` | 17.8.0 | **17.8.0 (unchanged)** | Cycle-3 D2 carry-over (transitive `NuGet.Frameworks` flag); explicitly out of AZ-500 scope |
| `SixLabors.ImageSharp` | 3.1.11 | **3.1.11 (unchanged)** | clean |
| `Npgsql` | 9.0.2 | **9.0.2 (unchanged)** | clean |
| `Newtonsoft.Json` | 13.0.4 | **13.0.4 (unchanged)** | clean |
| `Dapper` | 2.1.35 | **2.1.35 (unchanged)** | clean |
| `dbup-postgresql` | 6.0.3 | **6.0.3 (unchanged)** | clean |
## Architecture / contract surface (cycle 4 delta)
- **No new public-API contracts** under `_docs/02_document/contracts/` this cycle. AZ-500 preserves every endpoint shape, every DTO, the JWT validation contract (signature + lifetime + iss + aud + 30s clock skew), the multi-source `tile-storage` contract v1.0.0, the `uav-tile-upload` contract v1.0.0, and the AZ-488 permissions claim policy.
- `_docs/02_document/architecture.md` updated to reference .NET 10 / ASP.NET Core 10 / JwtBearer 10.0.7 (Authentication & Authorization paragraph + §2 Tech Stack table); the Architecture Vision prose is unchanged.
- The 8 `WithOpenApi(...)` callsites in `Program.cs` now emit `ASPDEPR002` deprecation warnings — recorded as a follow-up PBI (3 SP). API surface is unchanged.
## Net Architecture delta vs cycle 3
- **Resolved**: cycle-3 D1 (CVE-2026-26130 SignalR DoS, JwtBearer 8.0.21 line) **forward-resolved by major-version bump**; cycle-3 D3 (Microsoft.AspNetCore.OpenApi 8.0.21 line) similarly forward-resolved. The cycle-3 perf-harness leftover stays OPEN per AZ-500 Constraint, NOT closed by this cycle. **Total resolved: 2 (both Low/forward).**
- **Newly introduced (informational only)**: 5 new Low informational findings in `dependency_scan_cycle4.md` (F1-cy4..F5-cy4 — all "no known vulnerabilities" confirmations on the 10.x lines); 0 new Medium; 0 new High; 0 new Critical. 2 new Maintainability Mediums in code review (F1 ASPDEPR002 deprecation, F2 CS8604 nullable) — both deferred per "scope discipline" + the AZ-500 contract of "preserve behaviour during migration".
- **Net Architecture delta**: 0 net architecture-level findings. The cycle is structurally neutral — same component count, same edges, same DAG, same contract surface. The only structural-equivalence change is the 11-package M.E.* + 3-package ASP.NET + 1 Swashbuckle major-line bump, all forward-clean against advisories.
Cycle 4 is the first runtime/SDK-migration cycle. Net-zero is the expected outcome for that PBI shape: with no business-logic delta, there is no new structure to add. The graph properties confirm that this is a "preserve-behaviour" migration: same imports, same edges, same cycles (zero), same in-degree distribution, same DAG.
+6
@@ -37,6 +37,12 @@ If the enum's wire string happens to match a member name case-insensitively (e.g
## Ring buffer (last 15 entries — newest at top)
- [2026-05-12] [dependencies] Major-version bumps of direct deps cascade through transitives; the task spec must list the transitive packages whose major version changes as a result OR explicitly note "transitive major-version drift not analyzed in spec" — verify with `dotnet restore --dry-run` against a scratch branch before writing the spec (cycle 4: AZ-500 surprise-bumped `Microsoft.OpenApi` 1.x → 2.x via the `Microsoft.AspNetCore.OpenApi` 8.0.25 → 10.0.7 path; forced an unscheduled Swashbuckle bump + Program.cs refactor mid-implementation).
Source: _docs/06_metrics/retro_2026-05-12_cycle4.md
- [2026-05-12] [process] When a scope-protected task newly *exposes* a pre-existing bug elsewhere in the codebase (vs. introducing a new one), surface it as a recommended follow-up PBI in the batch report AND list it as a "newly exposed bug" separate from "newly introduced findings" in the deploy report — bugs that already existed don't count as cycle-introduced regressions, but they must not be silently re-buried (cycle 4: AZ-500's bootstrap fix unmasked the pre-existing `scripts/run-performance-tests.sh:417` `grep -o | wc -l` + `pipefail` bug).
Source: _docs/06_metrics/retro_2026-05-12_cycle4.md
- [2026-05-12] [process] When a cycle has a single non-functional task (migration / refactor / dependency hygiene), the retro must reframe the metric set around continuity (0 regressions), forward-resolution (prior findings closed by the bump itself), and unblocking (capabilities now exercisable end-to-end) — task count + complexity points read as misleading flatlines that look like under-productivity (cycle 4: AZ-500 alone delivered 5 SP vs cycle 3's 18 SP, but the cycle's value was forward-resolving 2 cycle-3 advisories and finally executing PT-01..PT-08 end-to-end against the migrated build).
Source: _docs/06_metrics/retro_2026-05-12_cycle4.md
- [2026-05-12] [process] For cross-team blockers (admin team must supply config values, etc.), prefer an Option-B forcing function (ship the validation/scaffolding with prod-empty config that fails-fast at deploy) over deferring the entire task — the fail-fast contract makes the cross-team conversation impossible to skip and ships the in-workspace work in the current cycle (cycle 3: AZ-494 shipped iss/aud validation with empty prod appsettings so deploy must supply real values).
Source: _docs/06_metrics/retro_2026-05-12_cycle3.md
- [2026-05-12] [process] ACs that prescribe a specific measurement or sentinel mechanism (e.g. "per-item latency < 50ms", "guard fires when DB name contains _test") should also prescribe — or explicitly defer — the path for collecting / enforcing it, or implementations will substitute proxies / equivalents that look like spec drift in review (cycle 3: AZ-492 PT-08 per-item gate cost became a derived proxy; AZ-493 DB-name guard became Host-allowlist).
+6 -6
@@ -2,14 +2,14 @@
 ## Current Step
 flow: existing-code
-step: 16
-name: Deploy
+step: 10
+name: Implement
 status: in_progress
 sub_step:
-phase: 1
-name: commit-cycle4-sync
-detail: ""
+phase: 14
+name: batch-loop
+detail: "batch 1/2 done (AZ-504, commit ab437a1, In Testing); batch 2/2 = AZ-503 pending"
 retry_count: 0
-cycle: 4
+cycle: 5
 tracker: jira
 auto_push: true
@@ -122,3 +122,11 @@ User picked A at the Step 15 (Performance Test) gate of cycle 4. Full default-pa
## Replay obligation
Open a new follow-up PBI for the `scripts/run-performance-tests.sh:416-417` grep fix (estimated 1 SP). Once that lands and a full perf run is green, delete this file. Until then, this leftover stays.
## Replay attempt #4 — 2026-05-12T13:00:00Z (cycle 5 /autodev Step 9 New Task)
PBI opened: **AZ-504 — "Perf script: fix grep | wc -l pipefail crash in PT-08"** (1 SP, parent epic AZ-483 — same as PT-08 scenario owner AZ-488). Spec landed at `_docs/02_tasks/todo/AZ-504_perf_script_grep_pipefail_fix.md`. The spec captures the AC-4 obligation that THIS leftover file is deleted in the same commit as the green full perf run.
The "open the PBI" half of the Replay obligation is now done. The "full perf run is green" half remains outstanding — this leftover stays open until AZ-504 lands AND a default-parameter `./scripts/run-performance-tests.sh` (`PERF_REPEAT_COUNT=20 PERF_UAV_BATCH_SIZE=10`) exits 0 against an api built from `dev`.
Next-cycle /autodev should NOT attempt replay #5 (open another PBI) — AZ-504 is the canonical replay vehicle. The next replay action is implementing AZ-504 itself (cycle 5 Step 10).
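The closure rule above (delete the leftover only after a green default-parameter perf run) can be expressed as a small gate. This is a sketch under stated assumptions: `close_leftover_if_green` is a hypothetical helper, not something the repository provides, and the demo substitutes `true` and `false` for the real perf script.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: run a command and delete the leftover file only if it exits 0.
# Usage: close_leftover_if_green <leftover-file> <command> [args...]
close_leftover_if_green() {
  leftover=$1
  shift
  if "$@"; then
    rm -f -- "$leftover"
    echo "closed"
  else
    echo "still-open"
  fi
}

# Demo with a temp file standing in for the leftover document.
demo=$(mktemp)
close_leftover_if_green "$demo" false   # failed run: the file is kept
close_leftover_if_green "$demo" true    # green run: the file is deleted
```

Against the real harness the call would look like `close_leftover_if_green _docs/_process_leftovers/2026-05-12_perf-cycle3-harness-execution.md env PERF_REPEAT_COUNT=20 PERF_UAV_BATCH_SIZE=10 ./scripts/run-performance-tests.sh`, with the deletion then committed alongside the evidence of the green run.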
+10 -2
@@ -413,8 +413,16 @@ else
 continue
 fi
-accepted=$(grep -o '"status":"accepted"' "$PERF_TMP_DIR/pt08_resp.json" | wc -l | tr -d ' ')
-rejected=$(grep -o '"status":"rejected"' "$PERF_TMP_DIR/pt08_resp.json" | wc -l | tr -d ' ')
+# AZ-504: grep exits 1 on zero matches. Under `set -o pipefail` (line 16)
+# that kills the assignment and crashes the script on the happy path
+# (rejected=0). Neutralise the no-match case locally with `|| true` so
+# the pipeline still produces a count. The response is compact JSON
+# (one line, all items) so `grep -o | wc -l` is required to count
+# occurrences — `grep -c` would only count matching lines (=1). The
+# file is guaranteed-readable here (curl wrote it earlier in this
+# iteration on the HTTP 200 branch), so grep cannot fail for I/O.
+accepted=$({ grep -o '"status":"accepted"' "$PERF_TMP_DIR/pt08_resp.json" || true; } | wc -l | tr -d ' ')
+rejected=$({ grep -o '"status":"rejected"' "$PERF_TMP_DIR/pt08_resp.json" || true; } | wc -l | tr -d ' ')
 PT08_ACCEPTED=$((PT08_ACCEPTED + accepted))
 PT08_REJECTED=$((PT08_REJECTED + rejected))
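The behaviour this hunk fixes can be reproduced outside the perf script. The sketch below is a standalone illustration, not the harness the commit message refers to: `count_occurrences` is a hypothetical wrapper around the same `{ grep -o ... || true; } | wc -l` pattern, the JSON payload is made up, and `-F` is added here only so the pattern is matched as a fixed string.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Count occurrences of a fixed string in a file, tolerating zero matches.
# `|| true` swallows grep's exit status 1 (no match) so pipefail does not trip.
count_occurrences() {
  { grep -o -F -- "$1" "$2" || true; } | wc -l | tr -d ' '
}

resp=$(mktemp)
# Made-up compact response: three accepted items, none rejected, all on one line.
printf '%s\n' '{"items":[{"status":"accepted"},{"status":"accepted"},{"status":"accepted"}]}' > "$resp"

# The unguarded pipeline fails on the happy path (zero rejected items).
if ! unguarded=$(grep -o -F -- '"status":"rejected"' "$resp" | wc -l); then
  echo "unguarded pipeline failed under pipefail"
fi

accepted=$(count_occurrences '"status":"accepted"' "$resp")
rejected=$(count_occurrences '"status":"rejected"' "$resp")
# grep -c counts matching lines, not occurrences, so it reports 1 here.
lines=$(grep -c -F -- '"status":"accepted"' "$resp")
echo "accepted=$accepted rejected=$rejected grep_c=$lines"
```

The `if !` wrapper is what keeps the demo alive when the unguarded pipeline fails; in the original script the same failure happened in a bare assignment, where `set -e` terminates the run. Note that `|| true` also hides grep's exit status 2 (read errors), which is why the in-script comment argues that the file is guaranteed readable at that point.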