[AZ-492] Cycle 3 batch 4: perf harness PT-07 + PT-08 + JWT-attach
ci/woodpecker/push/01-test Pipeline was successful
ci/woodpecker/push/02-build-push Pipeline was successful

Drains all three deferred perf-harness items in one batch:
- PT-01..PT-06 now carry Authorization: Bearer minted via the canonical
  SatelliteProvider.TestSupport.JwtTokenFactory (AZ-491) — no third copy
  of JWT logic in the shell.
- PT-07 implemented as cold + warm dual-pass distribution (N=20 each),
  reports p50/p95 for both passes and fails if warm p95 >= cold p95.
- PT-08 implemented as 20-batch upload distribution with batch p95 gated
  at the AZ-488 2000 ms target; per-item gate cost reported as derived
  proxy (batch_p95 / batch_size).

New SatelliteProvider.IntegrationTests/PerfBootstrap.cs adds two CLI
short-circuit subcommands (--mint-only and --gen-uav-fixture <path>)
invoked by the shell so the perf script never inlines the JWT or
JPEG-fixture logic. The dispatch sits at the top of Program.cs Main
and runs before any HTTP / DB / readiness setup.

performance-tests.md PT-07 + PT-08 flip from Deferred to Implemented.
traceability-matrix.md PT-07 + PT-08 rows move from recorded to covered
(PT-08 partial due to per-item proxy — flagged Low in batch-4 review).
_docs/_process_leftovers/2026-05-11_perf-pt07-harness.md deleted; the
leftovers directory is now empty.

Closes cycle-2 retro Action 2; LESSONS.md [process] rule about Deferred
NFRs remains in force as a guardrail.

Also includes the previously-uncommitted cumulative review report for
cycle-3 batches 01-03 (generated at the end of batch 3 but not staged).

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-12 01:52:25 +03:00
parent 745f4840e6
commit 080441db5d
14 changed files with 715 additions and 76 deletions
@@ -1,131 +0,0 @@
# Performance harness: PT-07 + PT-08 + JWT-attach in run-performance-tests.sh
**Task**: AZ-492_perf_harness_pt07_pt08_jwt_attach
**Name**: Perf harness drains all 3 deferred items
**Description**: Promote the deferred PT-07 (route-tile fetch warm cache) and PT-08 (UAV tile batch upload latency) NFRs into actual runnable scenarios in `scripts/run-performance-tests.sh`, AND fix the script so PT-01..PT-06 stop returning 401 against the post-AZ-487 build by attaching a Bearer token to every request.
**Complexity**: 3 points
**Dependencies**: AZ-487 (Bearer-token attach depends on the JWT_SECRET / token-mint surface introduced in AZ-487); soft dependency on AZ-491 (Option B token-mint reuse)
**Component**: Test infrastructure (`scripts/run-performance-tests.sh`) + perf NFR coverage (`_docs/02_document/tests/performance-tests.md`)
**Tracker**: AZ-492
**Epic**: none (cycle-3 test-infrastructure hardening)
## Problem
The performance test gate has now been 0-of-N for two cycles running, and the perf-side rot is actively masking real regressions:
1. **Cycle 1 (AZ-484)**: PT-07 (route-tile fetch warm cache) was added to `performance-tests.md` and the traceability matrix, but the runner-script implementation was not. Recorded as `Deferred — harness work tracked in _docs/_process_leftovers/2026-05-11_perf-pt07-harness.md`. Step 15 Performance Test gate marked all scenarios `Unverified`. Cycle 1 retrospective Action 2 introduced the "Deferred-status NFRs are allowed at most once" rule (LESSONS.md `[process]`).
2. **Cycle 2 (AZ-488)**: PT-08 (UAV tile batch upload latency) was added to `performance-tests.md`, again as Deferred under the cycle-1-sanctioned escape hatch. The leftover file was updated with a PT-08 follow-on instruction.
3. **Cycle 2 (AZ-487 side effect)**: `scripts/run-performance-tests.sh` does not attach an `Authorization: Bearer …` header to its outbound requests. After AZ-487 made every endpoint `RequireAuthorization()`, PT-01..PT-06 now return 401 for every call. Step 15 Performance Test gate at cycle 2 had to be skipped because of this script rot. Recorded as a third item in `_docs/_process_leftovers/2026-05-11_perf-pt07-harness.md`.
Cycle 2 retrospective Improvement Action 2 (`_docs/06_metrics/retro_2026-05-11_cycle2.md` § 7) promoted "schedule PT-07 + PT-08 + JWT-attach as actual feature work" to a top-3 action. Per the cycle-2 LESSONS.md `[process]` rule, any new Deferred-status NFR after this point requires this PBI to land first.
## Outcome
- `scripts/run-performance-tests.sh` mints (or reads from `.env`) a valid HS256 JWT signed with `JWT_SECRET` and attaches `Authorization: Bearer <token>` to every probe request.
- PT-01..PT-06 return real HTTP 200 responses with measurements written to `_docs/06_metrics/perf_<date>.md` — not 401, not `Unverified`.
- PT-07 (route-tile fetch warm cache) is implemented as a runnable scenario in the script. Its row in `performance-tests.md` moves from `Status: Deferred` to `Status: Implemented` with the measurement target documented.
- PT-08 (UAV tile batch upload latency) is implemented as a runnable scenario in the script. Same status transition.
- The leftover file `_docs/_process_leftovers/2026-05-11_perf-pt07-harness.md` is deleted (or empty enough to delete) after this PBI lands — all three items resolved.
- A regression test in CI runs the perf script in smoke mode (single iteration per scenario) to keep the script honest going forward.
## Scope
### Included
- Add Bearer-token attach logic to `scripts/run-performance-tests.sh`. Two viable shapes (implementer's choice):
  - **Option A**: script accepts `PERF_JWT_TOKEN` env var (operator pre-mints) and attaches via `curl -H "Authorization: Bearer $PERF_JWT_TOKEN"`.
  - **Option B**: script invokes a small `dotnet run --project SatelliteProvider.TestSupport` (or equivalent) to mint the token from `JWT_SECRET` on the fly, then attaches it. Reuses the consolidated factory from `01_consolidate_jwt_test_helpers` if that PBI ships first.
- Implement PT-07 scenario per its existing spec in `_docs/02_document/tests/performance-tests.md` (cold + warm region request, measure `tile_lookup_ms` vs `total_ms`).
- Implement PT-08 scenario per its existing spec (UAV batch upload of N tiles, measure end-to-end latency + per-item gate cost).
- Update `performance-tests.md`: move PT-07 and PT-08 from `Status: Deferred` to `Status: Implemented`; document measurement targets and acceptable variance.
- Update `_docs/02_document/tests/traceability-matrix.md` — refresh PT-07 + PT-08 coverage notes from `Unverified` to actual scenario IDs.
- Add a CI smoke run of the perf script (or document why none — e.g., needs full DB seed; in that case, gate at Step 15 stays manual but the script is verified per cycle by autodev).
- Delete `_docs/_process_leftovers/2026-05-11_perf-pt07-harness.md` (or trim to only any genuinely-future items not addressed here).
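The Option A shape of the Bearer-token attach could look roughly like the sketch below. It assumes the operator has exported `PERF_JWT_TOKEN`; the helper names `auth_header` and `probe` are hypothetical and not taken from the real script.

```shell
#!/usr/bin/env bash
# Sketch only: PERF_JWT_TOKEN is the Option A env var; auth_header and probe
# are hypothetical helper names, not functions from the real script.

# Print the Authorization header, or fail loudly when no token was supplied.
auth_header() {
  if [ -z "${PERF_JWT_TOKEN:-}" ]; then
    echo "ERROR: PERF_JWT_TOKEN is not set; pre-mint a token first" >&2
    return 1
  fi
  printf 'Authorization: Bearer %s' "$PERF_JWT_TOKEN"
}

# Issue one probe and print "<http_status> <total_seconds>".
probe() {
  local url=$1 header
  header=$(auth_header) || return 1
  curl -s -o /dev/null -w '%{http_code} %{time_total}\n' -H "$header" "$url"
}
```

Keeping the header construction in one function means every scenario attaches the token the same way, and a missing token surfaces as an explicit error rather than as a stream of 401 responses.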
### Excluded
- Any new perf scenarios beyond PT-07 + PT-08 + the 401 fix.
- Any change to production code; this is harness-only work.
- Any threshold-based PASS/FAIL gating in autodev Step 15 — the existing skill handles threshold comparison; this PBI only makes the scenarios runnable.
- Replacing `curl`-based probes with a structured load tool (k6, JMeter, etc.) — that is a separate decision.
## Acceptance Criteria
**AC-1: PT-01..PT-06 no longer 401**
Given the post-AZ-487 build is running with a valid `JWT_SECRET`
When `scripts/run-performance-tests.sh` runs the existing PT-01..PT-06 scenarios
Then every probe request receives an HTTP 2xx response (or the scenario's documented expected status), AND zero 401 responses appear in the perf log output.
**AC-2: PT-07 runs to completion**
Given a clean tile cache state
When the script runs PT-07 (cold then warm region request)
Then `tile_lookup_ms` and `total_ms` are measured for both cold and warm requests, AND the warm `tile_lookup_ms` is documented as less than the cold value (no specific threshold required — just measurable).
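A minimal sketch of how the AC-2 result might be reported once the four numbers are in hand. How `tile_lookup_ms` and `total_ms` are obtained from the service is not specified here, so the function simply takes them as integer arguments; `report_pt07` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch only: report_pt07 is a hypothetical helper. The caller supplies the
# measured values in whole milliseconds.

# Print cold and warm measurements, then state whether warm lookup beat cold.
# Returns 1 when the warm lookup was not strictly faster, so the caller can
# decide whether that is fatal.
report_pt07() {
  local cold_lookup=$1 cold_total=$2 warm_lookup=$3 warm_total=$4
  printf 'PT-07 cold: tile_lookup_ms=%s total_ms=%s\n' "$cold_lookup" "$cold_total"
  printf 'PT-07 warm: tile_lookup_ms=%s total_ms=%s\n' "$warm_lookup" "$warm_total"
  if [ "$warm_lookup" -lt "$cold_lookup" ]; then
    echo 'PT-07 warm tile_lookup_ms < cold: yes'
  else
    echo 'PT-07 warm tile_lookup_ms < cold: no'
    return 1
  fi
}
```

For example, `report_pt07 80 200 12 90` prints both measurement lines followed by the `yes` verdict.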
**AC-3: PT-08 runs to completion**
Given a valid JWT token with the `GPS` permission
When the script runs PT-08 (UAV batch upload of N tiles, N = parameterized, default 10)
Then the script reports per-item gate cost and end-to-end batch latency, AND all N items return either `accepted` or a documented reject reason (no `STORAGE_FAILURE` from harness misconfiguration).
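The commit message for this change describes PT-08 as a distribution over 20 batches, with the per-item gate cost reported as a derived proxy (`batch_p95 / batch_size`). That arithmetic can be sketched as follows, assuming a nearest-rank percentile; the helper names `percentile` and `per_item_ms` are hypothetical, and the real script may compute p95 differently.

```shell
#!/usr/bin/env bash
# Sketch only: percentile and per_item_ms are hypothetical helper names.

# Nearest-rank percentile (integer p, 1..100) over one number per line on stdin.
percentile() {
  local p=$1
  sort -n | awk -v p="$p" '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      rank = int((p * NR + 99) / 100)   # ceil(p * NR / 100) for integer p
      print v[rank]
    }'
}

# Derived per-item proxy: batch latency divided by batch size.
per_item_ms() {
  awk -v batch_ms="$1" -v size="$2" 'BEGIN { printf "%.1f\n", batch_ms / size }'
}
```

With 20 batch latencies, the nearest-rank p95 is the 19th smallest value. Because the per-item figure is derived from the batch figure, it is a proxy and not a direct measurement of each item's gate cost.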
**AC-4: Spec status reflects implementation**
Given the post-PBI repository state
When `_docs/02_document/tests/performance-tests.md` is read
Then PT-07 and PT-08 are marked `Status: Implemented` (or equivalent active state), AND the section is reformatted to no longer claim "harness work tracked in <leftover>".
**AC-5: Leftover drained**
Given `_docs/_process_leftovers/2026-05-11_perf-pt07-harness.md`
When this PBI lands
Then the file is either deleted or trimmed to genuinely-future items only, with the cycle-1 + cycle-2 follow-on instructions removed because they are now resolved.
**AC-6: Token-mint surface reused, not duplicated**
Given the consolidated JWT factory exists (after `01_consolidate_jwt_test_helpers`) OR the existing per-project mint helpers
When the perf script mints its token
Then it reuses the canonical surface from `01_consolidate_jwt_test_helpers` (if that PBI has landed) rather than inlining a third mint implementation in the shell script.
## Non-Functional Requirements
**Reliability**
- Script must fail loudly (non-zero exit code, clear message) if `JWT_SECRET` is unset or shorter than 32 bytes — same contract as `scripts/run-tests.sh`.
- Script must not silently skip a scenario; if a scenario cannot run, exit non-zero with an explicit reason.
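The `JWT_SECRET` precondition above could be enforced with a guard along these lines. This is a sketch, not the contract as implemented in `scripts/run-tests.sh`; `require_jwt_secret` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch only: require_jwt_secret is a hypothetical helper name.

# Fail with a clear message when JWT_SECRET is unset or shorter than 32 bytes.
require_jwt_secret() {
  local secret=${JWT_SECRET:-} bytes
  # wc -c counts bytes, not characters; the arithmetic expansion strips the
  # padding some wc implementations add around the number.
  bytes=$(( $(printf '%s' "$secret" | wc -c) ))
  if [ "$bytes" -lt 32 ]; then
    echo "ERROR: JWT_SECRET must be set and at least 32 bytes (got ${bytes})" >&2
    return 1
  fi
}
```

A script would call it once near the top, for example `require_jwt_secret || exit 1`, so that a bad secret stops the run before any scenario starts.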
**Compatibility**
- Bash 4+ compatible (the script already uses bash; do not introduce Python or other runtime dependencies for token minting unless they leverage an existing project — Option B above).
## Unit Tests
| AC Ref | What to Test | Required Outcome |
|--------|-------------|-----------------|
| AC-1 | The Bearer-attach logic itself (if a helper function is introduced) | Returns a valid `Authorization: Bearer <token>` header value when invoked with a known secret |
| AC-6 | Token mint reuses canonical surface (after `01_consolidate_jwt_test_helpers` lands) | A grep across `scripts/` and test projects shows no new `new JwtSecurityToken(` constructor calls |
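The AC-6 grep check might be scripted roughly as below. The sketch assumes the canonical factory lives under a path containing `SatelliteProvider.TestSupport/` and exempts it; that exemption, the `ALLOWED_PATH` variable, and the `check_no_inline_mint` name are all assumptions.

```shell
#!/usr/bin/env bash
# Sketch only: check_no_inline_mint and ALLOWED_PATH are hypothetical names.

# Search the given paths for inline JwtSecurityToken construction, ignoring
# the one location where the canonical factory is allowed to live.
check_no_inline_mint() {
  local allowed=${ALLOWED_PATH:-SatelliteProvider.TestSupport/} hits
  hits=$(grep -rnF --include='*.cs' --include='*.sh' \
           'new JwtSecurityToken(' "$@" 2>/dev/null | grep -vF "$allowed")
  if [ -n "$hits" ]; then
    printf 'Inline JWT mint found:\n%s\n' "$hits" >&2
    return 1
  fi
}
```

It would be invoked with the directories to scan, for example `check_no_inline_mint scripts SatelliteProvider.IntegrationTests`.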
## Blackbox Tests
| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|------------------------|-------------|-------------------|----------------|
| AC-1 | API container running with auth; valid `JWT_SECRET` exported | Run `scripts/run-performance-tests.sh` PT-01..PT-06 | All requests return non-401 status codes | Reliability |
| AC-2 | Clean tile cache; valid Bearer token attached | Run PT-07 cold + warm region request | Cold and warm `tile_lookup_ms` reported separately; warm < cold (no threshold required) | — |
| AC-3 | Valid Bearer token with `GPS` permission attached | Run PT-08 UAV batch upload of 10 tiles | All 10 items return `accepted` or documented reject; per-item gate cost reported | — |
| AC-4 | Post-PBI repo state | Read `performance-tests.md` | PT-07 + PT-08 status `Implemented`; no "Deferred — harness work" language | — |
## Constraints
- Must not regress the existing `scripts/run-tests.sh` flow.
- Must not bake a Bearer token into git-tracked files. The token is minted at runtime or read from a git-ignored env var.
- If the consolidated test-support library from `01_consolidate_jwt_test_helpers` exists, this PBI MUST reuse it for token minting — no third copy of the mint logic.
## Risks & Mitigation
**Risk 1: Dependency on `01_consolidate_jwt_test_helpers`**
- *Risk*: This PBI is cleaner if Option B (mint via `SatelliteProvider.TestSupport`) is taken AND that project exists. If `01_consolidate_jwt_test_helpers` took its own Option B (no new library), that project does not exist and Option B here is unavailable.
- *Mitigation*: Implementer picks between Option A (pre-minted token via env var) and Option B based on the state of `01_consolidate_jwt_test_helpers` at start of work. Option A always works regardless.
**Risk 2: Perf measurements are unstable on dev hardware**
- *Risk*: PT-07 + PT-08 are timing-sensitive. Running on a developer laptop will produce noisy results that look like regressions.
- *Mitigation*: The script reports measurements but does NOT gate on them. Autodev Step 15 already has an A/B/C gate on threshold failures handled by the test-run skill in perf mode. This PBI's scope is "make scenarios runnable", not "set thresholds".
**Risk 3: Token expiry mid-test**
- *Risk*: A 1-hour-lifetime token may expire during a long perf run.
- *Mitigation*: Mint with a generous lifetime (e.g., 4h) when starting the script. Document the lifetime choice in the script header.
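One way to guard against mid-run expiry is to check the token's remaining lifetime before the first scenario starts. The sketch below reads the standard `exp` claim from the JWT payload without verifying the signature. It assumes a `base64` binary that accepts `-d`; `jwt_exp` and `token_outlives` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch only: jwt_exp and token_outlives are hypothetical helper names.
# The signature is NOT verified; this only inspects the exp claim.

# Print the exp claim (Unix seconds) from a JWT's payload segment.
jwt_exp() {
  local payload
  # Take the middle segment and map base64url characters back to base64.
  payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # Restore the padding that base64url encoding strips.
  while [ $(( ${#payload} % 4 )) -ne 0 ]; do payload="${payload}="; done
  printf '%s' "$payload" | base64 -d 2>/dev/null \
    | sed -n 's/.*"exp"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Succeed when the token stays valid for at least min_seconds from now.
token_outlives() {
  local token=$1 min_seconds=$2 exp now
  exp=$(jwt_exp "$token")
  [ -n "$exp" ] || return 1
  now=$(date +%s)
  [ $(( exp - now )) -ge "$min_seconds" ]
}
```

A run expected to take up to an hour could start with `token_outlives "$PERF_JWT_TOKEN" 3600 || exit 1`.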
**Risk 4: CI smoke run becomes a maintenance burden**
- *Risk*: A CI-side perf smoke run that runs on every commit may be flaky and become a source of noise.
- *Mitigation*: Document explicitly whether a CI smoke run is added. If added, run only on `dev` push (not every PR commit) and only with a tight per-scenario timeout to catch script-rot, not perf regressions.