[AZ-492] Cycle 3 batch 4: perf harness PT-07 + PT-08 + JWT-attach

Drains all three deferred perf-harness items in one batch:

- PT-01..PT-06 now carry Authorization: Bearer minted via the canonical SatelliteProvider.TestSupport.JwtTokenFactory (AZ-491); no third copy of JWT logic in the shell.
- PT-07 implemented as cold + warm dual-pass distribution (N=20 each), reports p50/p95 for both passes and fails if warm p95 >= cold p95.
- PT-08 implemented as 20-batch upload distribution with batch p95 gated at the AZ-488 2000 ms target; per-item gate cost reported as derived proxy (batch_p95 / batch_size).

New SatelliteProvider.IntegrationTests/PerfBootstrap.cs adds two CLI short-circuit subcommands (--mint-only and --gen-uav-fixture <path>) invoked by the shell so the perf script never inlines the JWT or JPEG-fixture logic. The dispatch sits at the top of Program.cs Main and runs before any HTTP / DB / readiness setup.

performance-tests.md PT-07 + PT-08 flip from Deferred to Implemented. traceability-matrix.md PT-07 + PT-08 rows move from recorded to covered (PT-08 partial due to per-item proxy; flagged Low in batch-4 review).

_docs/_process_leftovers/2026-05-11_perf-pt07-harness.md deleted; the leftovers directory is now empty.

Closes cycle-2 retro Action 2; LESSONS.md [process] rule about Deferred NFRs remains in force as a guardrail.

Also includes the previously-uncommitted cumulative review report for cycle-3 batches 01-03 (generated at the end of batch 3 but not staged).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-21 19:01:15 +00:00 · 2026-05-12 01:52:25 +03:00
parent 745f4840e6
commit 080441db5d
14 changed files with 715 additions and 76 deletions
@@ -0,0 +1,131 @@
+# Performance harness: PT-07 + PT-08 + JWT-attach in run-performance-tests.sh
+
+**Task**: AZ-492_perf_harness_pt07_pt08_jwt_attach
+**Name**: Perf harness drains all 3 deferred items
+**Description**: Promote the deferred PT-07 (route-tile fetch warm cache) and PT-08 (UAV tile batch upload latency) NFRs into actual runnable scenarios in `scripts/run-performance-tests.sh`, AND fix the script so PT-01..PT-06 stop returning 401 against the post-AZ-487 build by attaching a Bearer token to every request.
+**Complexity**: 3 points
+**Dependencies**: AZ-487 (Bearer-token attach depends on the JWT_SECRET / token-mint surface introduced in AZ-487); soft dependency on AZ-491 (Option B token-mint reuse)
+**Component**: Test infrastructure (`scripts/run-performance-tests.sh`) + perf NFR coverage (`_docs/02_document/tests/performance-tests.md`)
+**Tracker**: AZ-492
+**Epic**: none (cycle-3 test-infrastructure hardening)
+
+## Problem
+
+The performance test gate has now been 0-of-N for two cycles running, and the perf-side rot is actively masking real regressions:
+
+1. **Cycle 1 (AZ-484)**: PT-07 (route-tile fetch warm cache) was added to `performance-tests.md` and the traceability matrix, but the runner-script implementation was not. Recorded as `Deferred — harness work tracked in _docs/_process_leftovers/2026-05-11_perf-pt07-harness.md`. Step 15 Performance Test gate marked all scenarios `Unverified`. Cycle 1 retrospective Action 2 introduced the "Deferred-status NFRs are allowed at most once" rule (LESSONS.md `[process]`).
+2. **Cycle 2 (AZ-488)**: PT-08 (UAV tile batch upload latency) was added to `performance-tests.md`, again as Deferred under the cycle-1-sanctioned escape hatch. The leftover file was updated with a PT-08 follow-on instruction.
+3. **Cycle 2 (AZ-487 side effect)**: `scripts/run-performance-tests.sh` does not attach an `Authorization: Bearer …` header to its outbound requests. After AZ-487 applied `RequireAuthorization()` to every endpoint, PT-01..PT-06 return 401 for every call. The Step 15 Performance Test gate at cycle 2 had to be skipped because of this script rot. Recorded as a third item in `_docs/_process_leftovers/2026-05-11_perf-pt07-harness.md`.
+
+Cycle 2 retrospective Improvement Action 2 (`_docs/06_metrics/retro_2026-05-11_cycle2.md` § 7) promoted "schedule PT-07 + PT-08 + JWT-attach as actual feature work" to a top-3 action. Per the cycle-2 LESSONS.md `[process]` rule, any new Deferred-status NFR after this point requires this PBI to land first.
+
+## Outcome
+
+- `scripts/run-performance-tests.sh` mints (or reads from `.env`) a valid HS256 JWT signed with `JWT_SECRET` and attaches `Authorization: Bearer <token>` to every probe request.
+- PT-01..PT-06 return real HTTP 200 responses with measurements written to `_docs/06_metrics/perf_<date>.md` — not 401, not `Unverified`.
+- PT-07 (route-tile fetch warm cache) is implemented as a runnable scenario in the script. Its row in `performance-tests.md` moves from `Status: Deferred` to `Status: Implemented` with the measurement target documented.
+- PT-08 (UAV tile batch upload latency) is implemented as a runnable scenario in the script. Same status transition.
+- The leftover file `_docs/_process_leftovers/2026-05-11_perf-pt07-harness.md` is deleted after this PBI lands (or trimmed to genuinely-future items only), because all three items are resolved.
+- A regression test in CI runs the perf script in smoke mode (single iteration per scenario) to keep the script honest going forward.
+
+## Scope
+
+### Included
+
+- Add Bearer-token attach logic to `scripts/run-performance-tests.sh`. Two viable shapes (implementer's choice):
+  - **Option A**: script accepts `PERF_JWT_TOKEN` env var (operator pre-mints) and attaches via `curl -H "Authorization: Bearer $PERF_JWT_TOKEN"`.
+  - **Option B**: script invokes a small `dotnet run --project SatelliteProvider.TestSupport` (or equivalent) to mint the token from `JWT_SECRET` on the fly, then attaches it. Reuses the consolidated factory from `01_consolidate_jwt_test_helpers` if that PBI ships first.
+- Implement PT-07 scenario per its existing spec in `_docs/02_document/tests/performance-tests.md` (cold + warm region request, measure `tile_lookup_ms` vs `total_ms`).
+- Implement PT-08 scenario per its existing spec (UAV batch upload of N tiles, measure end-to-end latency + per-item gate cost).
+- Update `performance-tests.md`: move PT-07 and PT-08 from `Status: Deferred` to `Status: Implemented`; document measurement targets and acceptable variance.
+- Update `_docs/02_document/tests/traceability-matrix.md` — refresh PT-07 + PT-08 coverage notes from `Unverified` to actual scenario IDs.
+- Add a CI smoke run of the perf script, or document why none is added (e.g., it needs a full DB seed); in that case the Step 15 gate stays manual, but autodev still verifies the script each cycle.
+- Delete `_docs/_process_leftovers/2026-05-11_perf-pt07-harness.md` (or trim it to genuinely-future items not addressed here).
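Note on Option A: a minimal sketch of the Bearer attach, assuming the operator has pre-minted a token into `PERF_JWT_TOKEN`. The function names `bearer_header` and `probe` are hypothetical and do not come from the script; only the env var name and the `curl -H` header shape are taken from the task text.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating Option A; not the actual script contents.

# Print the Authorization header for the pre-minted token, or fail loudly.
bearer_header() {
  if [ -z "${PERF_JWT_TOKEN:-}" ]; then
    echo "ERROR: PERF_JWT_TOKEN is not set; pre-mint a token before running" >&2
    return 1
  fi
  printf 'Authorization: Bearer %s' "$PERF_JWT_TOKEN"
}

# Issue one authenticated probe and print "<http_status> <total_seconds>".
probe() {
  local url header
  url=$1
  header=$(bearer_header) || return 1
  curl -sS -o /dev/null -w '%{http_code} %{time_total}\n' -H "$header" "$url"
}
```

Keeping the header construction in one function means every scenario attaches the token the same way, and a missing token stops the run instead of producing a column of 401s.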
+
+### Excluded
+
+- Any new perf scenarios beyond PT-07 + PT-08 + the 401 fix.
+- Any change to production code; this is harness-only work.
+- Any threshold-based PASS/FAIL gating in autodev Step 15 — the existing skill handles threshold comparison; this PBI only makes the scenarios runnable.
+- Replacing `curl`-based probes with a structured load tool (k6, JMeter, etc.) — that is a separate decision.
+
+## Acceptance Criteria
+
+**AC-1: PT-01..PT-06 no longer 401**
+Given the post-AZ-487 build is running with a valid `JWT_SECRET`
+When `scripts/run-performance-tests.sh` runs the existing PT-01..PT-06 scenarios
+Then every probe request receives an HTTP 2xx response (or the scenario's documented expected status), AND zero 401 responses appear in the perf log output.
+
+**AC-2: PT-07 runs to completion**
+Given a clean tile cache state
+When the script runs PT-07 (cold then warm region request)
+Then `tile_lookup_ms` and `total_ms` are measured for both cold and warm requests, AND the warm `tile_lookup_ms` is documented as less than the cold value (no specific threshold required — just measurable).
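Note on PT-07: the commit message describes the implemented scenario as a cold + warm distribution reporting p50/p95 and failing when warm p95 >= cold p95, which is stricter than this AC. The sketch below shows one way such a summary and comparison could be computed in shell. The function names are hypothetical, and the nearest-rank percentile method is an assumption; the source does not say which method the script uses.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; nearest-rank percentile is an assumed method.

# percentile P: read one numeric sample per line on stdin and print the
# nearest-rank P-th percentile (P is an integer from 1 to 100).
percentile() {
  sort -n | awk -v p="$1" '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      rank = int((p * NR + 99) / 100)   # ceil(p * NR / 100) for integer p
      if (rank < 1) rank = 1
      print v[rank]
    }'
}

# warm_beats_cold COLD_P95 WARM_P95: succeed only when the warm p95 is
# strictly below the cold p95.
warm_beats_cold() {
  awk -v cold="$1" -v warm="$2" 'BEGIN { exit !(warm + 0 < cold + 0) }'
}
```

With N=20 samples per pass, the nearest-rank p95 is the 19th smallest sample and p50 is the 10th, so a single outlier in either pass does not decide the comparison.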
+
+**AC-3: PT-08 runs to completion**
+Given a valid JWT token with the `GPS` permission
+When the script runs PT-08 (UAV batch upload of N tiles, N = parameterized, default 10)
+Then the script reports per-item gate cost and end-to-end batch latency, AND all N items return either `accepted` or a documented reject reason (no `STORAGE_FAILURE` from harness misconfiguration).
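Note on PT-08: the commit message reports the per-item gate cost as a derived proxy, batch_p95 / batch_size, with the batch p95 gated at the 2000 ms target. A minimal sketch of that arithmetic follows. The function names are hypothetical, and treating exactly 2000 ms as a pass is an assumption, since the source does not say whether the comparison is strict.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the inclusive (<=) comparison is an assumption.

# per_item_proxy BATCH_P95_MS BATCH_SIZE: print batch_p95 / batch_size in ms,
# rounded to one decimal place.
per_item_proxy() {
  awk -v p95="$1" -v n="$2" 'BEGIN {
    if (n <= 0) exit 1
    printf "%.1f\n", p95 / n
  }'
}

# batch_within_target BATCH_P95_MS [TARGET_MS]: succeed when the batch p95 is
# at or below the target (default 2000 ms).
batch_within_target() {
  awk -v p95="$1" -v target="${2:-2000}" 'BEGIN { exit !(p95 + 0 <= target + 0) }'
}
```

The proxy is an average share of the batch latency, not a measured per-item cost, which is why the traceability matrix marks PT-08 coverage as partial.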
+
+**AC-4: Spec status reflects implementation**
+Given the post-PBI repository state
+When `_docs/02_document/tests/performance-tests.md` is read
+Then PT-07 and PT-08 are marked `Status: Implemented` (or equivalent active state), AND the section no longer claims "harness work tracked in <leftover>".
+
+**AC-5: Leftover drained**
+Given `_docs/_process_leftovers/2026-05-11_perf-pt07-harness.md`
+When this PBI lands
+Then the file is either deleted or trimmed to genuinely-future items only, with the cycle-1 and cycle-2 follow-on instructions removed because they are now resolved.
+
+**AC-6: Token-mint surface reused, not duplicated**
+Given either the consolidated JWT factory (after `01_consolidate_jwt_test_helpers`) OR the existing per-project mint helpers are available
+When the perf script mints its token
+Then it reuses the canonical surface from `01_consolidate_jwt_test_helpers` (if that PBI has landed) rather than inlining a third mint implementation in the shell script.
+
+## Non-Functional Requirements
+
+**Reliability**
+- Script must fail loudly (non-zero exit code, clear message) if `JWT_SECRET` is unset or shorter than 32 bytes — same contract as `scripts/run-tests.sh`.
+- Script must not silently skip a scenario; if a scenario cannot run, exit non-zero with an explicit reason.
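Note on the reliability contract: a minimal sketch of the fail-loud `JWT_SECRET` check, assuming the script validates the secret once before any scenario runs. The function name `require_jwt_secret` and the message wording are hypothetical; the 32-byte minimum and the non-zero exit come from the requirement above.

```shell
#!/usr/bin/env bash
# Hypothetical guard; only the 32-byte minimum comes from the task text.

# Fail with a clear message unless JWT_SECRET is set and at least 32 bytes long.
require_jwt_secret() {
  if [ -z "${JWT_SECRET:-}" ]; then
    echo "ERROR: JWT_SECRET is not set" >&2
    return 1
  fi
  local bytes
  # wc -c counts bytes, not characters; tr strips padding some platforms add.
  bytes=$(printf '%s' "$JWT_SECRET" | wc -c | tr -d ' ')
  if [ "$bytes" -lt 32 ]; then
    echo "ERROR: JWT_SECRET is $bytes bytes; at least 32 are required" >&2
    return 1
  fi
}
```

A caller would typically run `require_jwt_secret || exit 1` at the top of the script so that no probe is sent with an unusable secret.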
+
+**Compatibility**
+- Bash 4+ compatible. The script already uses bash; do not introduce Python or other runtime dependencies for token minting unless they reuse an existing project, as in Option B above.
+
+## Unit Tests
+
+| AC Ref | What to Test | Required Outcome |
+|--------|-------------|-----------------|
+| AC-1   | The Bearer-attach logic itself (if a helper function is introduced) | Returns a valid `Authorization: Bearer <token>` header value when invoked with a known secret |
+| AC-6   | Token mint reuses canonical surface (after `01_consolidate_jwt_test_helpers` lands) | A grep across `scripts/` and test projects shows no new `new JwtSecurityToken(` constructor calls |
+
+## Blackbox Tests
+
+| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
+|--------|------------------------|-------------|-------------------|----------------|
+| AC-1   | API container running with auth; valid `JWT_SECRET` exported | Run `scripts/run-performance-tests.sh` PT-01..PT-06 | All requests return non-401 status codes | Reliability |
+| AC-2   | Clean tile cache; valid Bearer token attached | Run PT-07 cold + warm region request | Cold and warm `tile_lookup_ms` reported separately; warm < cold (no threshold required) | — |
+| AC-3   | Valid Bearer token with `GPS` permission attached | Run PT-08 UAV batch upload of 10 tiles | All 10 items return `accepted` or documented reject; per-item gate cost reported | — |
+| AC-4   | Post-PBI repo state | Read `performance-tests.md` | PT-07 + PT-08 status `Implemented`; no "Deferred — harness work" language | — |
+
+## Constraints
+
+- Must not regress the existing `scripts/run-tests.sh` flow.
+- Must not bake a Bearer token into git-tracked files. The token is minted at runtime or read from a git-ignored env var.
+- If the consolidated test-support library from `01_consolidate_jwt_test_helpers` exists, this PBI MUST reuse it for token minting — no third copy of the mint logic.
+
+## Risks & Mitigation
+
+**Risk 1: Dependency on `01_consolidate_jwt_test_helpers`**
+- *Risk*: This PBI is cleaner if Option B (mint via `SatelliteProvider.TestSupport`) is taken AND that project exists. If `01_consolidate_jwt_test_helpers` chose its own Option B (no new library), Option B here is unavailable.
+- *Mitigation*: Implementer picks between Option A (pre-minted token via env var) and Option B based on the state of `01_consolidate_jwt_test_helpers` at start of work. Option A always works regardless.
+
+**Risk 2: Perf measurements are unstable on dev hardware**
+- *Risk*: PT-07 + PT-08 are timing-sensitive. Running on a developer laptop will produce noisy results that look like regressions.
+- *Mitigation*: The script reports measurements but does NOT gate on them. Autodev Step 15 already has an A/B/C gate on threshold failures handled by the test-run skill in perf mode. This PBI's scope is "make scenarios runnable", not "set thresholds".
+
+**Risk 3: Token expiry mid-test**
+- *Risk*: A 1-hour-lifetime token may expire during a long perf run.
+- *Mitigation*: Mint with a generous lifetime (e.g., 4h) when starting the script. Document the lifetime choice in the script header.
+
+**Risk 4: CI smoke run becomes a maintenance burden**
+- *Risk*: A CI-side perf smoke run that runs on every commit may be flaky and become a source of noise.
+- *Mitigation*: Document explicitly whether a CI smoke run is added. If added, run only on `dev` push (not every PR commit) and only with a tight per-scenario timeout to catch script-rot, not perf regressions.