Drains all three deferred perf-harness items in one batch:

- PT-01..PT-06 now carry Authorization: Bearer minted via the canonical SatelliteProvider.TestSupport.JwtTokenFactory (AZ-491); no third copy of JWT logic in the shell.
- PT-07 implemented as cold + warm dual-pass distribution (N=20 each), reports p50/p95 for both passes and fails if warm p95 >= cold p95.
- PT-08 implemented as 20-batch upload distribution with batch p95 gated at the AZ-488 2000 ms target; per-item gate cost reported as derived proxy (batch_p95 / batch_size).

New SatelliteProvider.IntegrationTests/PerfBootstrap.cs adds two CLI short-circuit subcommands (--mint-only and --gen-uav-fixture <path>) invoked by the shell so the perf script never inlines the JWT or JPEG-fixture logic. The dispatch sits at the top of Program.cs Main and runs before any HTTP / DB / readiness setup.

performance-tests.md PT-07 + PT-08 flip from Deferred to Implemented. traceability-matrix.md PT-07 + PT-08 rows move from recorded to covered (PT-08 partial due to per-item proxy, flagged Low in batch-4 review).

_docs/_process_leftovers/2026-05-11_perf-pt07-harness.md deleted; the leftovers directory is now empty.

Closes cycle-2 retro Action 2; LESSONS.md [process] rule about Deferred NFRs remains in force as a guardrail.

Also includes the previously-uncommitted cumulative review report for cycle-3 batches 01-03 (generated at the end of batch 3 but not staged).

Co-authored-by: Cursor <cursoragent@cursor.com>
Batch Report — Batch 04 cycle 3
Batch: 04 (cycle 3)
Tasks: AZ-492 (Perf harness: PT-07 + PT-08 + JWT-attach in run-performance-tests.sh)
Date: 2026-05-12
Task Results
| Task | Status | Files Modified | Tests | AC Coverage | Issues |
|---|---|---|---|---|---|
| AZ-492_perf_harness_pt07_pt08_jwt_attach | Done | 1 added (PerfBootstrap.cs) + 6 modified | Existing JwtTokenFactory unit tests cover the delegated mint path; AC-6 verified by repo-wide grep (only one `new JwtSecurityToken(` site in source). Live perf-script execution deferred to Step 16. | 6/6 ACs addressed in the harness; AC-2 & AC-3 fully verifiable only at runtime (live perf run); AC-1 / AC-4 / AC-5 / AC-6 statically verifiable. | 0 blockers; 2 Low findings (see Review). |
AC Test Coverage: All addressed (6 of 6) — runtime verification at Step 16
Code Review Verdict: pending (this batch report precedes per-batch review)
Auto-Fix Attempts: 0
Stuck Agents: None
What was implemented
The perf harness drains all three deferred items in a single batch:
- PT-01..PT-06 stop returning 401: every probe carries an `Authorization: Bearer <token>` header minted from `JWT_SECRET` via the canonical `SatelliteProvider.TestSupport.JwtTokenFactory.Create` surface (AZ-491). No third copy of the JWT-mint logic ships in the shell script.
- PT-07 is now a runnable two-pass scenario (cold N requests at distinct coordinates, then warm N requests against the same coordinates). The harness reports p50/p95 for both passes and fails the scenario if warm p95 is not below cold p95.
- PT-08 is now a runnable scenario (N batch uploads of `PERF_UAV_BATCH_SIZE` 256×256 JPEGs each). The harness reports batch p50/p95, a per-item proxy `batch_p95 / batch_size`, and accepted/rejected/failed item counts. Batch p95 is gated at the AZ-488 target of 2000 ms.
Added
- `SatelliteProvider.IntegrationTests/PerfBootstrap.cs`: static helper with two short-circuit subcommands invoked by the shell:
  - `MintToken()`: reads `JWT_SECRET` via `JwtTestHelpers.ResolveSecretOrThrow`, mints a 4-hour HS256 token with subject `perf-tests` and claim `permissions: GPS` via `JwtTokenFactory.Create`, and writes the token to stdout. The 4-hour lifetime is sized for the longest possible PT-01..PT-08 combined run with margin (per AZ-492 § Risk 3 mitigation).
  - `GenerateUavFixture(args)`: writes a 256×256 random-noise JPEG via `SixLabors.ImageSharp` to the path passed as the second CLI argument. The pixel pattern is identical to `UavUploadTests.CreateValidJpeg`, so the perf harness exercises the same quality-gate path the integration tests already validate.
Modified
- `SatelliteProvider.IntegrationTests/Program.cs`: added a 13-line dispatch block at the top of `Main` that recognises `--mint-only` / `--gen-uav-fixture` and delegates to `PerfBootstrap` before any HTTP / DB / readiness logic runs. Both subcommands therefore work on any host that has the .NET SDK installed, with no live API / Postgres dependency.
- `scripts/run-performance-tests.sh`: rewritten:
  - Loads `JWT_SECRET` from `.env` if unset (mirrors `scripts/run-tests.sh` pattern; AC-1 reliability).
  - Pre-builds `SatelliteProvider.IntegrationTests` in Release once so the `dotnet <dll>` invocations of `--mint-only` / `--gen-uav-fixture` produce clean stdout (no Restore/Build chatter).
  - Mints a token via `dotnet <SatelliteProvider.IntegrationTests.dll> --mint-only` unless the operator pre-mints via `PERF_JWT_TOKEN` (per AZ-492 Option A / Option B in the spec; both paths supported).
  - Attaches `-H "$AUTH_HEADER"` to every `curl` in PT-01..PT-06 + the `wait_region_completed` polling helper (8 attach sites; verified via repo grep, see Review § Static checks).
  - Adds PT-07 (cold + warm 20-request distributions; p50/p95 reported per pass).
  - Adds PT-08 (20 batches of 10 items each at distinct coordinates; batch p50/p95 + per-item proxy + accepted/rejected/failed counts).
  - Adds a `percentile()` awk helper. Adds `PERF_REPEAT_COUNT` (default 20) and `PERF_UAV_BATCH_SIZE` (default 10) env-var knobs so the run can be tuned without editing the script.
  - Adds a `mktemp -d` tmpdir for the UAV fixture JPEG + per-batch response captures; tmpdir is unlinked in `cleanup`.
- `_docs/02_document/tests/performance-tests.md`: PT-07 entry rewritten: Status flipped from "Deferred (Note: active enforcement deferred…)" to Implemented (AZ-492), trigger text updated to describe the cold+warm dual-pass design, pass criterion now references the cold-vs-warm relative comparison. PT-08 entry rewritten: Status flipped from "Deferred — harness work tracked in " to Implemented (AZ-492), trigger text updated to describe the on-demand `--gen-uav-fixture` path, pass criterion now matches what the harness actually gates (batch p95 at 2000 ms + per-item proxy; true per-call gate timing remains a follow-up since it requires server-side instrumentation).
- `_docs/02_document/tests/traceability-matrix.md`: PT-07 row moved from `◐ recorded` to `✓` with text updated to describe the cold+warm distribution. PT-08 row moved from `◐ recorded (Deferred)` to `✓ (batch p95) / ◐ (per-item proxy only)`, reflecting the partial-coverage shape. The "Coverage shape notes" paragraph at the bottom of the Cycle 2 section updated to summarise the AZ-492 transition.
- `_docs/02_document/modules/tests_integration.md`: the `### Supporting Classes` entry for `Program.cs` now mentions the AZ-492 perf-bootstrap subcommands. A new bullet documents `PerfBootstrap.cs` (purpose, public API, dependency notes, invocation example).
- `_docs/02_document/module-layout.md`: the TestSupport "Runner-side concerns NOT in TestSupport" paragraph extended to document why `PerfBootstrap.cs` sits in IntegrationTests rather than TestSupport (it pulls in ImageSharp; the JWT-mint delegation is the only TestSupport touchpoint).
- `_docs/06_metrics/retro_2026-05-11_cycle2.md` § Action 2: heading suffixed with `**RESOLVED in cycle 3 (AZ-492)**`; closing paragraph added that summarises which items landed and which lessons remain in force.
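The `.env` fallback and the pre-minted versus freshly minted token paths described for `scripts/run-performance-tests.sh` could look roughly like the sketch below. The function names `load_env_var_if_unset` and `resolve_perf_token`, and the simple `KEY=value` parsing rules, are assumptions made for illustration; only `JWT_SECRET`, `PERF_JWT_TOKEN`, `AUTH_HEADER`, and the `--mint-only` subcommand come from this report.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and .env parsing rules are assumptions,
# not the actual contents of scripts/run-performance-tests.sh.

# Export KEY from an env file, but only when KEY is unset or empty.
load_env_var_if_unset() {
  local key="$1" env_file="${2:-.env}" current line value
  eval "current=\${$key:-}"
  # Respect a value the operator already exported.
  if [ -n "$current" ]; then
    return 0
  fi
  [ -f "$env_file" ] || return 1
  # Take the last KEY=... line; comments and other keys are ignored.
  line="$(grep -E "^${key}=" "$env_file" | tail -n 1)"
  [ -n "$line" ] || return 1
  value="${line#*=}"
  # Strip one pair of surrounding double quotes, if present.
  value="${value%\"}"
  value="${value#\"}"
  export "$key=$value"
}

# Prefer an operator-supplied PERF_JWT_TOKEN; otherwise mint one via the
# pre-built runner DLL, whose path is passed as the first argument.
resolve_perf_token() {
  local dll="$1"
  if [ -z "${PERF_JWT_TOKEN:-}" ]; then
    PERF_JWT_TOKEN="$(dotnet "$dll" --mint-only)" || return 1
  fi
  AUTH_HEADER="Authorization: Bearer ${PERF_JWT_TOKEN}"
}
```

Keeping both steps as functions means the pre-mint path never touches `dotnet`, which matches the report's note that both spec options are supported.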
Removed
- `_docs/_process_leftovers/2026-05-11_perf-pt07-harness.md`: deleted (per AC-5). The leftovers directory is now empty.
Verification
AC-1 — PT-01..PT-06 no longer 401
Static: every curl invocation in scripts/run-performance-tests.sh carries -H "$AUTH_HEADER" where $AUTH_HEADER is Authorization: Bearer $PERF_JWT_TOKEN. Verified via rg 'curl ' scripts/run-performance-tests.sh — 10 curl sites, every one passes the auth header (including the wait_region_completed polling helper and the multipart upload curl_args array used in PT-08).
Runtime: deferred to Step 16. Per the AZ-492 task spec § Risk 2 mitigation, the perf script does not gate on absolute thresholds for the new scenarios, so a Step-16 run is expected to either PASS or surface real signal (not script-rot 401s).
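The static half of the AC-1 check can be approximated mechanically. The sketch below uses a hypothetical helper, `check_curl_auth`, that is not part of the repo. It assumes each `curl` call mentions `$AUTH_HEADER` on the same line as the word `curl`, so multi-line invocations and the `curl_args` array used in PT-08 would still need a manual look.

```shell
#!/usr/bin/env bash
# Sketch only: check_curl_auth is a hypothetical helper, not part of the repo.
# Assumption: the header variable appears on the same line as "curl".

check_curl_auth() {
  local script="$1" missing
  # List curl lines, then keep only those that never mention $AUTH_HEADER.
  missing="$(grep -n 'curl ' "$script" | grep -v -F '$AUTH_HEADER' || true)"
  if [ -n "$missing" ]; then
    printf 'curl without auth header:\n%s\n' "$missing" >&2
    return 1
  fi
  return 0
}

# Example (run from the repo root):
#   check_curl_auth scripts/run-performance-tests.sh
```

A non-zero exit lists the offending line numbers, which makes the check usable as a pre-commit guard against a new probe silently reintroducing 401s.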
AC-2 — PT-07 runs to completion
Statically: the script emits two timing arrays (PT07_COLD_MS and PT07_WARM_MS), computes p50/p95 via the new percentile() awk helper, and prints both distributions. The pass condition is PT07_WARM_P95 < PT07_COLD_P95 per AZ-492 spec ("warm < cold, no specific threshold required"). The cold/warm passes use the SAME coordinates so the warm pass exercises the cached path.
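A percentile helper and the warm-versus-cold gate described above might be sketched as follows. The report does not say which percentile definition the real `percentile()` uses; this version assumes the nearest-rank method with an integer percentile, and `pt07_gate` is a hypothetical name for the pass condition.

```shell
#!/usr/bin/env bash
# Sketch only: nearest-rank percentile is an assumption about the method;
# pt07_gate is a hypothetical name for the warm-vs-cold pass condition.

# percentile P v1 v2 ... : nearest-rank percentile (integer P) of the values.
percentile() {
  local p="$1"
  shift
  [ "$#" -gt 0 ] || return 1
  printf '%s\n' "$@" | sort -n | awk -v p="$p" '
    { v[NR] = $1 }
    END {
      rank = int((p * NR + 99) / 100)   # ceil(p * NR / 100) for integer p
      if (rank < 1) rank = 1
      print v[rank]
    }'
}

# Succeeds only when warm p95 is strictly below cold p95.
pt07_gate() {
  local cold_p95="$1" warm_p95="$2"
  awk -v c="$cold_p95" -v w="$warm_p95" 'BEGIN { exit !(w < c) }'
}
```

In a bash script holding the timings in arrays, the calls would be along the lines of `percentile 95 "${PT07_COLD_MS[@]}"` and `pt07_gate "$PT07_COLD_P95" "$PT07_WARM_P95"`. With N=20 samples, nearest-rank p95 is the 19th smallest value, so a single outlier does not decide the gate.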
AC-3 — PT-08 runs to completion
Statically: --gen-uav-fixture is invoked once at the top of PT-08 to produce a deterministic 256×256 random-noise JPEG (the same shape that UavUploadTests.MixedBatch_ReturnsPerItemResults already validates passes the quality gate). Each batch posts PERF_UAV_BATCH_SIZE copies of the fixture at distinct coordinates (PT08_COORD_STRIDE is large enough to fall into distinct tile cells). The script reports accepted=/rejected=/failed= counts so a non-zero rejected count surfaces with a documented reason rather than being silently masked.
AC-4 — Spec status reflects implementation
Verified by reading _docs/02_document/tests/performance-tests.md — both PT-07 and PT-08 carry **Status**: **Implemented (AZ-492).** headings and the "Deferred — harness work tracked in " language is gone.
AC-5 — Leftover drained
Verified: _docs/_process_leftovers/2026-05-11_perf-pt07-harness.md deleted; ls _docs/_process_leftovers/ shows no entries.
AC-6 — Token-mint surface reused, not duplicated
Verified by repo-wide grep: rg 'new JwtSecurityToken\(' matches exactly one source-code site (SatelliteProvider.TestSupport/JwtTokenFactory.cs); the other two matches are inside _docs/02_tasks/ text describing the pattern. PerfBootstrap.MintToken() delegates to JwtTokenFactory.Create(secret, "perf-tests", TimeSpan.FromHours(4), new[] { new Claim("permissions", "GPS") }) — single call, no inlining.
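The single-mint-site check can be reproduced without `rg` by restricting the search to C# sources, which also excludes the `_docs/` matches mentioned above. The helper name `count_mint_sites` is hypothetical, and counting matching lines rather than individual occurrences is a simplifying assumption.

```shell
#!/usr/bin/env bash
# Sketch only: count_mint_sites is a hypothetical helper, not part of the repo.

# Count lines in *.cs files under ROOT that construct a JwtSecurityToken.
count_mint_sites() {
  local root="$1"
  # -F treats the pattern as a fixed string, so "(" needs no escaping;
  # -h drops file names so wc counts matching lines only.
  find "$root" -type f -name '*.cs' \
    -exec grep -hF 'new JwtSecurityToken(' {} + | wc -l | tr -d ' '
}

# Example (run from the repo root); the report expects the result to be 1:
#   count_mint_sites .
```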
Spec-vs-reality
Per-item gate cost — proxy not direct measurement. AZ-492 AC-3 ("script reports per-item gate cost") is satisfied by a derived value batch_p95 / batch_size rather than the true per-call UavTileQualityGate.Validate timing. The true value would require server-side instrumentation (UavTileUploadHandler would need to record per-item validate timings and expose them in the response envelope or via a metrics endpoint). That instrumentation is out of scope for AZ-492 (which is harness-only per the spec § Excluded: "Any change to production code; this is harness-only work"). The proxy is documented as such in both performance-tests.md and the script comments, and traceability-matrix.md flags the row as ✓ (batch p95) / ◐ (per-item proxy only).
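To make the proxy concrete, the arithmetic is a single division. The sketch below uses a hypothetical helper name and illustrative numbers (a 1800 ms batch p95 with the default batch size of 10), not measured results.

```shell
#!/usr/bin/env bash
# Sketch only: per_item_proxy_ms is a hypothetical helper; the inputs below
# are illustrative values, not results from a perf run.

# Derived per-item proxy: batch p95 (ms) divided by items per batch.
per_item_proxy_ms() {
  local batch_p95_ms="$1" batch_size="$2"
  awk -v t="$batch_p95_ms" -v n="$batch_size" 'BEGIN {
    if (n <= 0) exit 1
    printf "%.1f\n", t / n
  }'
}

per_item_proxy_ms 1800 10   # prints 180.0
```

Because the numerator is the whole batch round trip, the proxy folds upload and request handling time into every item, which is why it is reported as a proxy rather than as `UavTileQualityGate.Validate` timing.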
No CI smoke run added. AZ-492 Risk 4 left the CI smoke decision as "Document explicitly whether a CI smoke run is added". The smoke is NOT added in this batch because (a) the perf script depends on a running API + Postgres + populated tile cache, which is more than a CI per-commit run can warm up cheaply, and (b) Step 16 already runs the perf script per cycle. If the cycle gate proves insufficient, a dev-push-only workflow can be added in a future PBI.
Outstanding follow-ups
- Server-side gate timing instrumentation: would let PT-08 report a true per-item p95 instead of the `batch_p95 / batch_size` proxy. Estimate: 2 SP. Sequence: after the next perf-gate result to see whether the proxy is actually misleading.
- Image-fixture factory consolidation: `UavUploadTests.CreateValidJpeg` (integration) + `UavTileImageFactory.CreateRandomJpeg` (unit) + `PerfBootstrap.CreateValidJpeg` (perf bootstrap) all produce essentially the same noise JPEG with slight signature differences. AZ-491 set the precedent for moving cross-project test helpers into `SatelliteProvider.TestSupport`; the JPEG factory is a natural follow-up. Estimate: 1–2 SP. The same applies to the `Claim("permissions", "GPS")` literal, which appears in `UavUploadTests`, `PerfBootstrap`, and several other places.
- Database name alignment with the AZ-493 guard intent: the AZ-493 Spec-vs-reality note (batch 03 report) about renaming `satelliteprovider` → `satelliteprovider_test` is unrelated to AZ-492 but should be re-evaluated as part of the cycle 3 retrospective alongside the recurring "task-spec accuracy" pattern noted in the cumulative review.
Tests Run
Unit tests were not re-run as part of this batch (no unit-test code modified). Integration tests were not re-run either: the only modification to existing integration-test code is the Program.cs short-circuit, which runs ahead of the pre-existing code and leaves the --smoke / --full paths unchanged. The final --full run at Step 16 will exercise the integration suite end-to-end, and the perf script will be invoked there.
Cumulative review trigger
This is batch 4. Cumulative review triggers at every K=3 batches (per .cursor/skills/implement/SKILL.md). The next cumulative review covers batches 4–6 — i.e. AZ-492 + AZ-494 + the final test run. Not triggered in this batch.
Auto-fix attempts: 0
No build / test failures observed. bash -n scripts/run-performance-tests.sh is clean. The C# changes were checked by reading the files against the existing project structure; dotnet build was not executed, per the project's AGENTS.md "do not run dotnet build via terminal tools" guidance, so compilation is not confirmed by this batch.