Files
satellite-provider/_docs/03_implementation/batch_04_cycle3_report.md
T
Oleksandr Bezdieniezhnykh 080441db5d
ci/woodpecker/push/01-test Pipeline was successful
ci/woodpecker/push/02-build-push Pipeline was successful
[AZ-492] Cycle 3 batch 4: perf harness PT-07 + PT-08 + JWT-attach
Drains all three deferred perf-harness items in one batch:
- PT-01..PT-06 now carry Authorization: Bearer minted via the canonical
  SatelliteProvider.TestSupport.JwtTokenFactory (AZ-491) — no third copy
  of JWT logic in the shell.
- PT-07 implemented as cold + warm dual-pass distribution (N=20 each),
  reports p50/p95 for both passes and fails if warm p95 >= cold p95.
- PT-08 implemented as 20-batch upload distribution with batch p95 gated
  at the AZ-488 2000 ms target; per-item gate cost reported as derived
  proxy (batch_p95 / batch_size).

New SatelliteProvider.IntegrationTests/PerfBootstrap.cs adds two CLI
short-circuit subcommands (--mint-only and --gen-uav-fixture <path>)
invoked by the shell so the perf script never inlines the JWT or
JPEG-fixture logic. The dispatch sits at the top of Program.cs Main
and runs before any HTTP / DB / readiness setup.

performance-tests.md PT-07 + PT-08 flip from Deferred to Implemented.
traceability-matrix.md PT-07 + PT-08 rows move from recorded to covered
(PT-08 partial due to per-item proxy — flagged Low in batch-4 review).
_docs/_process_leftovers/2026-05-11_perf-pt07-harness.md deleted; the
leftovers directory is now empty.

Closes cycle-2 retro Action 2; LESSONS.md [process] rule about Deferred
NFRs remains in force as a guardrail.

Also includes the previously-uncommitted cumulative review report for
cycle-3 batches 01-03 (generated at the end of batch 3 but not staged).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 01:52:25 +03:00

12 KiB
Raw Blame History

Batch Report — Batch 04 cycle 3

Batch: 04 (cycle 3) Tasks: AZ-492 (Perf harness: PT-07 + PT-08 + JWT-attach in run-performance-tests.sh) Date: 2026-05-12

Task Results

Task Status Files Modified Tests AC Coverage Issues
AZ-492_perf_harness_pt07_pt08_jwt_attach Done 1 added (PerfBootstrap.cs) + 6 modified Existing JwtTokenFactory unit tests cover the delegated mint path; AC-6 verified by repo-wide grep (only one new JwtSecurityToken( site in source). Live perf-script execution deferred to Step 16. 6/6 ACs addressed in the harness; AC-2 & AC-3 fully verifiable only at runtime (live perf run); AC-1 / AC-4 / AC-5 / AC-6 statically verifiable. 0 blockers; 2 Low findings (see Review).

AC Test Coverage: All addressed (6 of 6) — runtime verification at Step 16

Code Review Verdict: pending (this batch report precedes per-batch review)

Auto-Fix Attempts: 0

Stuck Agents: None

What was implemented

The perf harness drains all three deferred items in a single batch:

  1. PT-01..PT-06 stop returning 401 — every probe carries an Authorization: Bearer <token> header minted from JWT_SECRET via the canonical SatelliteProvider.TestSupport.JwtTokenFactory.Create surface (AZ-491). No third copy of the JWT-mint logic ships in the shell script.
  2. PT-07 is now a runnable two-pass scenario (cold N requests at distinct coordinates, then warm N requests against the same coordinates). The harness reports p50/p95 for both passes and fails the scenario if warm p95 is NOT below cold p95.
  3. PT-08 is now a runnable scenario (N batch uploads of PERF_UAV_BATCH_SIZE 256×256 JPEGs each). The harness reports batch p50/p95, a per-item proxy batch_p95 / batch_size, and accepted/rejected/failed item counts. Batch p95 is gated at the AZ-488 target of 2000 ms.

Added

  • SatelliteProvider.IntegrationTests/PerfBootstrap.cs — static helper with two short-circuit subcommands invoked by the shell:
    • MintToken() — reads JWT_SECRET via JwtTestHelpers.ResolveSecretOrThrow, mints a 4-hour HS256 token with subject perf-tests and claim permissions: GPS via JwtTokenFactory.Create, writes the token to stdout. The 4-hour lifetime is sized for the longest possible PT-01..PT-08 combined run with margin (per AZ-492 § Risk 3 mitigation).
    • GenerateUavFixture(args) — writes a 256×256 random-noise JPEG via SixLabors.ImageSharp to the path passed as the second CLI argument. Pixel pattern is identical to UavUploadTests.CreateValidJpeg so the perf harness exercises the same quality-gate path the integration tests already validate.

Modified

  • SatelliteProvider.IntegrationTests/Program.cs — added a 13-line dispatch block at the top of Main that recognises --mint-only / --gen-uav-fixture and delegates to PerfBootstrap before any HTTP / DB / readiness logic runs. Both subcommands therefore work on any host that has the .NET SDK installed, with no live API / Postgres dependency.
  • scripts/run-performance-tests.sh — rewritten:
    • Loads JWT_SECRET from .env if unset (mirrors scripts/run-tests.sh pattern; AC-1 reliability).
    • Pre-builds SatelliteProvider.IntegrationTests in Release once so the dotnet <dll> invocations of --mint-only / --gen-uav-fixture produce clean stdout (no Restore/Build chatter).
    • Mints a token via dotnet <SatelliteProvider.IntegrationTests.dll> --mint-only unless the operator pre-mints via PERF_JWT_TOKEN (per AZ-492 Option A / Option B in the spec; both paths supported).
    • Attaches -H "$AUTH_HEADER" to every curl in PT-01..PT-06 + the wait_region_completed polling helper (8 attach sites; verified via repo grep — see Review § Static checks).
    • Adds PT-07 (cold + warm 20-request distributions; p50/p95 reported per pass).
    • Adds PT-08 (20 batches of 10 items each at distinct coordinates; batch p50/p95 + per-item proxy + accepted/rejected/failed counts).
    • Adds a percentile() awk helper. Adds PERF_REPEAT_COUNT (default 20) and PERF_UAV_BATCH_SIZE (default 10) env-var knobs so the run can be tuned without editing the script.
    • Adds a mktemp -d tmpdir for the UAV fixture JPEG + per-batch response captures; tmpdir is unlinked in cleanup.
  • _docs/02_document/tests/performance-tests.md — PT-07 entry rewritten: Status flipped from "Deferred (Note: active enforcement deferred…)" to Implemented (AZ-492), trigger text updated to describe the cold+warm dual-pass design, pass criterion now references the cold-vs-warm relative comparison. PT-08 entry rewritten: Status flipped from "Deferred — harness work tracked in " to Implemented (AZ-492), trigger text updated to describe the on-demand --gen-uav-fixture path, pass criterion now matches what the harness actually gates (batch p95 at 2000 ms + per-item proxy — true per-call gate timing remains a follow-up since it requires server-side instrumentation).
  • _docs/02_document/tests/traceability-matrix.md — PT-07 row moved from ◐ recorded to with text updated to describe the cold+warm distribution. PT-08 row moved from ◐ recorded (Deferred) to ✓ (batch p95) / ◐ (per-item proxy only) reflecting the partial-coverage shape. The "Coverage shape notes" paragraph at the bottom of the Cycle 2 section updated to summarise the AZ-492 transition.
  • _docs/02_document/modules/tests_integration.md — the ### Supporting Classes entry for Program.cs now mentions the AZ-492 perf-bootstrap subcommands. A new bullet documents PerfBootstrap.cs (purpose, public API, dependency notes, invocation example).
  • _docs/02_document/module-layout.md — the TestSupport "Runner-side concerns NOT in TestSupport" paragraph extended to document why PerfBootstrap.cs sits in IntegrationTests rather than TestSupport (it pulls in ImageSharp; the JWT-mint delegation is the only TestSupport touchpoint).
  • _docs/06_metrics/retro_2026-05-11_cycle2.md § Action 2 — heading suffixed with **RESOLVED in cycle 3 (AZ-492)**; closing paragraph added that summarises which items landed and which lessons remain in force.

Removed

  • _docs/_process_leftovers/2026-05-11_perf-pt07-harness.md — deleted (per AC-5). The leftovers directory is now empty.

Verification

AC-1 — PT-01..PT-06 no longer 401

Static: every curl invocation in scripts/run-performance-tests.sh carries -H "$AUTH_HEADER" where $AUTH_HEADER is Authorization: Bearer $PERF_JWT_TOKEN. Verified via rg 'curl ' scripts/run-performance-tests.sh — 10 curl sites, every one passes the auth header (including the wait_region_completed polling helper and the multipart upload curl_args array used in PT-08). Runtime: deferred to Step 16. Per the AZ-492 task spec § Risk 2 mitigation, the perf script does not gate on absolute thresholds for the new scenarios, so a Step-16 run is expected to either PASS or surface real signal (not script-rot 401s).

AC-2 — PT-07 runs to completion

Statically: the script emits two timing arrays (PT07_COLD_MS and PT07_WARM_MS), computes p50/p95 via the new percentile() awk helper, and prints both distributions. The pass condition is PT07_WARM_P95 < PT07_COLD_P95 per AZ-492 spec ("warm < cold, no specific threshold required"). The cold/warm passes use the SAME coordinates so the warm pass exercises the cached path.

AC-3 — PT-08 runs to completion

Statically: --gen-uav-fixture is invoked once at the top of PT-08 to produce a deterministic 256×256 random-noise JPEG (the same shape that UavUploadTests.MixedBatch_ReturnsPerItemResults already validates passes the quality gate). Each batch posts PERF_UAV_BATCH_SIZE copies of the fixture at distinct coordinates (PT08_COORD_STRIDE is large enough to fall into distinct tile cells). The script reports accepted=/rejected=/failed= counts so a non-zero rejected count surfaces with a documented reason rather than being silently masked.

AC-4 — Spec status reflects implementation

Verified by reading _docs/02_document/tests/performance-tests.md — both PT-07 and PT-08 carry **Status**: **Implemented (AZ-492).** headings and the "Deferred — harness work tracked in " language is gone.

AC-5 — Leftover drained

Verified: _docs/_process_leftovers/2026-05-11_perf-pt07-harness.md deleted; ls _docs/_process_leftovers/ shows no entries.

AC-6 — Token-mint surface reused, not duplicated

Verified by repo-wide grep: rg 'new JwtSecurityToken\(' matches exactly one source-code site (SatelliteProvider.TestSupport/JwtTokenFactory.cs); the other two matches are inside _docs/02_tasks/ text describing the pattern. PerfBootstrap.MintToken() delegates to JwtTokenFactory.Create(secret, "perf-tests", TimeSpan.FromHours(4), new[] { new Claim("permissions", "GPS") }) — single call, no inlining.

Spec-vs-reality

Per-item gate cost — proxy not direct measurement. AZ-492 AC-3 ("script reports per-item gate cost") is satisfied by a derived value batch_p95 / batch_size rather than the true per-call UavTileQualityGate.Validate timing. The true value would require server-side instrumentation (UavTileUploadHandler would need to record per-item validate timings and expose them in the response envelope or via a metrics endpoint). That instrumentation is out of scope for AZ-492 (which is harness-only per the spec § Excluded: "Any change to production code; this is harness-only work"). The proxy is documented as such in both performance-tests.md and the script comments, and traceability-matrix.md flags the row as ✓ (batch p95) / ◐ (per-item proxy only).

No CI smoke run added. AZ-492 Risk 4 left the CI smoke decision as "Document explicitly whether a CI smoke run is added". The smoke is NOT added in this batch because (a) the perf script depends on a running API + Postgres + populated tile cache, which is more than a CI per-commit run can warm up cheaply, and (b) Step 16 already runs the perf script per cycle. If the cycle gate proves insufficient, a dev-push-only workflow can be added in a future PBI.

Outstanding follow-ups

  • Server-side gate timing instrumentation — would let PT-08 report a true per-item p95 instead of the batch_p95 / batch_size proxy. Estimate: 2 SP. Sequence: after the next perf-gate result to see whether the proxy is actually misleading.
  • Image-fixture factory consolidationUavUploadTests.CreateValidJpeg (integration) + UavTileImageFactory.CreateRandomJpeg (unit) + PerfBootstrap.CreateValidJpeg (perf bootstrap) all produce essentially the same noise JPEG with slight signature differences. AZ-491 set the precedent for moving cross-project test helpers into SatelliteProvider.TestSupport; the JPEG factory is a natural follow-up. Estimate: 12 SP. Same applies to the Claim("permissions", "GPS") literal which appears in UavUploadTests, PerfBootstrap, and several other places.
  • Database name alignment with the AZ-493 guard intent — the AZ-493 Spec-vs-reality note (batch 03 report) about renaming satelliteprovidersatelliteprovider_test is unrelated to AZ-492 but should be re-evaluated as part of the cycle 3 retrospective alongside the recurring "task-spec accuracy" pattern noted in the cumulative review.

Tests Run

Unit tests not re-run as part of this batch (no unit-test code modified). Integration tests not re-run (no integration-test code modified except Program.cs which adds a pre-existing-code short-circuit; the --smoke / --full paths are unchanged). The final --full run at Step 16 will exercise the integration suite end-to-end and the perf script will be invoked there.

Cumulative review trigger

This is batch 4. Cumulative review triggers at every K=3 batches (per .cursor/skills/implement/SKILL.md). The next cumulative review covers batches 46 — i.e. AZ-492 + AZ-494 + the final test run. Not triggered in this batch.

Auto-fix attempts: 0

No build / test failures observed. bash -n scripts/run-performance-tests.sh is clean; C# code compiles per the existing project structure (verified by reading the file — dotnet build not executed per the project's AGENTS.md "do not run dotnet build via terminal tools" guidance).