[AZ-492] Cycle 3 batch 4: perf harness PT-07 + PT-08 + JWT-attach

Drains all three deferred perf-harness items in one batch: - PT-01..PT-06 now carry Authorization: Bearer minted via the canonical SatelliteProvider.TestSupport.JwtTokenFactory (AZ-491) — no third copy of JWT logic in the shell. - PT-07 implemented as cold + warm dual-pass distribution (N=20 each), reports p50/p95 for both passes and fails if warm p95 >= cold p95. - PT-08 implemented as 20-batch upload distribution with batch p95 gated at the AZ-488 2000 ms target; per-item gate cost reported as derived proxy (batch_p95 / batch_size). New SatelliteProvider.IntegrationTests/PerfBootstrap.cs adds two CLI short-circuit subcommands (--mint-only and --gen-uav-fixture <path>) invoked by the shell so the perf script never inlines the JWT or JPEG-fixture logic. The dispatch sits at the top of Program.cs Main and runs before any HTTP / DB / readiness setup. performance-tests.md PT-07 + PT-08 flip from Deferred to Implemented. traceability-matrix.md PT-07 + PT-08 rows move from recorded to covered (PT-08 partial due to per-item proxy — flagged Low in batch-4 review). _docs/_process_leftovers/2026-05-11_perf-pt07-harness.md deleted; the leftovers directory is now empty. Closes cycle-2 retro Action 2; LESSONS.md [process] rule about Deferred NFRs remains in force as a guardrail. Also includes the previously-uncommitted cumulative review report for cycle-3 batches 01-03 (generated at the end of batch 3 but not staged). Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-21 15:51:14 +00:00 · 2026-05-12 01:52:25 +03:00
parent 745f4840e6
commit 080441db5d
14 changed files with 715 additions and 76 deletions
@@ -0,0 +1,134 @@
+# Code Review — Batch 04 cycle 3
+
+**Tasks reviewed**: AZ-492 (Perf harness PT-07 + PT-08 + JWT-attach)
+**Date**: 2026-05-12
+**Verdict**: **PASS_WITH_WARNINGS** (2 Low findings; 0 Critical/High/Medium)
+
+## Phase 1: Context
+
+Spec inputs read: `_docs/02_tasks/todo/AZ-492_perf_harness_pt07_pt08_jwt_attach.md`; project restrictions / solution overview from prior batches still in force; cycle-2 retrospective Action 2 explicitly promoted this work; LESSONS.md `[process]` rule on Deferred-status NFRs is the governing guardrail. Changed files mapped to the AZ-492 ACs:
+
+- `SatelliteProvider.IntegrationTests/PerfBootstrap.cs` (new) → AC-1 (mint surface) + AC-3 (UAV fixture) + AC-6 (no duplicate mint)
+- `SatelliteProvider.IntegrationTests/Program.cs` (modified) → dispatch for AC-1 / AC-3 subcommands
+- `scripts/run-performance-tests.sh` (rewritten) → AC-1 / AC-2 / AC-3
+- `_docs/02_document/tests/performance-tests.md` → AC-4
+- `_docs/02_document/tests/traceability-matrix.md` → AC-4 (status visibility)
+- `_docs/_process_leftovers/2026-05-11_perf-pt07-harness.md` (deleted) → AC-5
+- Module-layout + tests_integration doc updates → architectural documentation
+- `_docs/06_metrics/retro_2026-05-11_cycle2.md` § Action 2 → process-resolution back-reference
+
+## Phase 2: Spec Compliance
+
+| AC | Status | Evidence |
+|----|--------|----------|
+| AC-1: PT-01..PT-06 no longer 401 | **Covered (static)** | 10 `curl` sites in `scripts/run-performance-tests.sh` all carry `-H "$AUTH_HEADER"`; `AUTH_HEADER="Authorization: Bearer $PERF_JWT_TOKEN"`. The `wait_region_completed` polling helper also carries it (line 131). The PT-08 `curl_args` array carries it (line 389). Runtime verification deferred to Step 16. |
+| AC-2: PT-07 runs to completion | **Covered (static)** | Two timing arrays (`PT07_COLD_MS`, `PT07_WARM_MS`) populated by 20 cold + 20 warm requests at the same coordinate set. `percentile()` awk helper computes p50 and p95 for both. Pass/fail asserts `warm_p95 < cold_p95`. Same-coordinate design guarantees the warm pass hits the cache (otherwise the cold pass would not have populated it). |
+| AC-3: PT-08 runs to completion | **Covered (static)** | `--gen-uav-fixture` produces the JPEG once; 20 batches of 10 distinct-coordinate items uploaded. Per-batch accepted/rejected/failed counts surfaced. Batch p95 gated at 2000 ms. Per-item gate cost is a derived proxy (see Spec-vs-reality below). |
+| AC-4: Spec status reflects implementation | **Covered** | `performance-tests.md` PT-07 and PT-08 both carry `**Status**: **Implemented (AZ-492).**`; "Deferred — harness work tracked in <leftover>" language gone. `traceability-matrix.md` rows moved from `◐ recorded` / `◐ recorded (Deferred)` to `✓`. |
+| AC-5: Leftover drained | **Covered** | `_docs/_process_leftovers/2026-05-11_perf-pt07-harness.md` deleted; `ls _docs/_process_leftovers/` shows no files. |
+| AC-6: Token-mint surface reused, not duplicated | **Covered** | `rg 'new JwtSecurityToken\('` matches one source-code site only (`SatelliteProvider.TestSupport/JwtTokenFactory.cs`). `PerfBootstrap.MintToken()` is a 6-line delegation to `JwtTokenFactory.Create`. The shell script does not inline JWT logic — it shells out to `dotnet <dll> --mint-only`. |
+
+**Contract verification**: AZ-492 has no `## Contract` section; not applicable.
+
+**Consumer-side contract verification**: AZ-492 depends on AZ-487 (`JWT_SECRET` surface) and AZ-491 (canonical `JwtTokenFactory`). Both consumed correctly — the new `PerfBootstrap.cs` uses `JwtTestHelpers.ResolveSecretOrThrow` (the AZ-487-introduced env-var surface) and `JwtTokenFactory.Create` (the AZ-491 canonical factory). No drift.
+
+**Scope creep check**: Implementation stayed within harness-only scope per spec § Excluded ("Any change to production code; this is harness-only work"). Documentation updates to module-layout / tests_integration / traceability-matrix are explicitly in scope for the doc-sync part of the spec. Retro back-reference is process hygiene, not scope creep.
+
+## Phase 3: Code Quality
+
+**SOLID**:
+- `PerfBootstrap` has a clear single responsibility: short-circuit perf-harness subcommands. Two methods, both pure-CLI dispatch + delegate. Good.
+- `Program.cs` dispatch block is 13 lines — minimal coupling between perf-bootstrap and the rest of the runner. The decision to put it FIRST (before any HTTP / DB / env read) is correct — these subcommands should work even on a host where the API and Postgres are not running.
+
+**Error handling**:
+- `PerfBootstrap.MintToken` catches `InvalidOperationException` from `JwtTestHelpers.ResolveSecretOrThrow` and writes to stderr, returning exit code 1. Good — the shell can detect the failure cleanly.
+- `PerfBootstrap.GenerateUavFixture` validates `args.Length >= 2` and emits a usage hint on stderr with exit code 2. Good.
+- Shell script: every `curl` capture checks the HTTP code and adds to `FAIL` on mismatch. The `percentile()` helper guards against `NR == 0` to avoid div-by-zero. The `dotnet --mint-only` capture checks `[[ -z "$PERF_JWT_TOKEN" ]]` after the call to catch an empty-output failure.
+
+**Naming**: `PerfSubject`, `PerfTokenLifetime`, `PerfBootstrap` — clear, consistent prefix. Shell variables follow the existing convention (`PT07_*`, `PT08_*`, `PERF_*`). `AUTH_HEADER` is uppercase per shell-script convention.
+
+**Complexity**:
+- `PerfBootstrap.MintToken` — 14 lines, no branches except the try/catch.
+- `PerfBootstrap.GenerateUavFixture` — 12 lines.
+- `PerfBootstrap.CreateValidJpeg` — 20 lines, single loop.
+- `scripts/run-performance-tests.sh` — 430 lines total. Each PT-NN scenario is ~25–60 lines and is independently readable. The PT-08 batch loop is the longest single block (~70 lines including metadata construction); it could be extracted into a function but the inlining keeps the data flow obvious. Acceptable.
+
+**DRY**:
+- The same JPEG-creation pattern now exists in three places: `UavTileImageFactory.CreateRandomJpeg` (unit), `UavUploadTests.CreateValidJpeg` (integration), and `PerfBootstrap.CreateValidJpeg` (perf bootstrap). This is **flagged as L2 below** with a recommended consolidation path.
+
+**Test quality**: AZ-492 spec's "Unit Tests" table notes that AC-1 (Bearer-attach helper) is only testable if a helper function is introduced — the shell script inlines the header directly, so no shell-side helper to test. AC-6 is tested by the repo-wide grep (one source site → not new). Both AC verifications above use the static-check approach the spec authorises. The mint logic itself is unit-tested in `SatelliteProvider.Tests/TestSupport/JwtTokenFactoryTests` (AZ-491); the `PerfBootstrap.MintToken` wrapper is a 6-line delegation and is exercised end-to-end by Step 16's actual perf run.
+
+**Dead code**: None added. The deleted leftover file removes ~50 lines of stale process documentation.
+
+## Phase 4: Security Quick-Scan
+
+- **Token lifetime**: 4 hours, mitigation for AZ-492 Risk 3 (token expiry mid-test). Tokens are minted on each script run and never persisted; the perf script's tmpdir is wiped on exit (trap-based cleanup). No JWT material ends up on disk.
+- **Secret handling**: `JWT_SECRET` is read from env or `.env` (gitignored). Never echoed. The byte-length check fails fast for under-32-byte secrets — same contract as `scripts/run-tests.sh`. Good.
+- **`PERF_JWT_TOKEN` env var**: documented as the operator-supplied alternative to in-script minting. If the operator pre-mints, the script does not echo the token value (only its byte length). Good.
+- **Subject value**: `perf-tests` — distinct from the integration test runner subject (`integration-tests`) so audit logs can disambiguate the source. Good.
+- **`permissions: GPS` claim**: required by AZ-488's UAV upload endpoint. Granted to the perf token so PT-08 can exercise the AC-3 path. No other permissions are minted — least-privilege ish.
+- **Input validation on `--gen-uav-fixture`**: path is treated as a literal filesystem path. The script writes to `$PERF_TMP_DIR/uav_fixture.jpg` which is a freshly-created `mktemp -d` directory. No path-traversal risk in the current call site; if a future consumer passes an untrusted path it would write to that path — documented behaviour for an internal test helper.
+
+No new attack surface introduced; no secret material touches version control; no new endpoints exposed.
+
+## Phase 5: Performance Scan
+
+The harness IS the performance scan. Observations on the harness itself:
+
+- `dotnet build` runs once per script invocation (or skipped if the DLL already exists). Build time ~5–10 s on dev hardware — acceptable for a perf run.
+- `dotnet <dll> --mint-only` startup is ~1.5–2 s (CLR cold start + a tiny token mint). Acceptable for a one-time bootstrap.
+- The cold + warm passes in PT-07 do 40 total region requests at ~200m zoom 18 — about ~40 × (5–15 s per request) = 3.5–10 minutes for PT-07 alone. PT-08 adds another 20 × (200–2000 ms) = 4–40 s. The full script run on dev hardware is in the 8–15 minute range; not fast, not glacial.
+- The `awk` percentile helper sorts the input array — O(N log N) per call. With N=20 this is trivial.
+
+No performance concerns in the harness code itself.
+
+## Phase 6: Cross-Task Consistency
+
+- **Naming alignment with AZ-491**: `PerfBootstrap.PerfSubject = "perf-tests"` mirrors the AZ-491 pattern of `JwtTestHelpers.DefaultSubject = "integration-tests"`. Consistent. `PermissionsClaimType = "permissions"` matches the value `UavUploadTests` and the API use.
+- **Architecture alignment with AZ-491 + AZ-493**: pure / stateless logic stays in `TestSupport` (`JwtTokenFactory`, `IntegrationTestResetGuard`); side-effectful / dependency-bearing logic stays in the consumer (`IntegrationTestDatabaseReset` for Npgsql, `PerfBootstrap` for ImageSharp). Same boundary as batches 2 and 3. Good.
+- **Documentation pattern**: AZ-492 follows the AZ-491 / AZ-493 doc pattern — `tests_integration.md` gets the runtime surface, `module-layout.md` gets the boundary rationale, `performance-tests.md` gets the test-spec status flip. Consistent.
+- **Duplicate test-helper detection (the Phase 6 rule added by AZ-491 review)**: the JPEG factory triple is flagged below as L2 — the rule fires.
+
+## Findings
+
+### L1 (Low / Spec accuracy) — AZ-492 AC-3 "per-item gate cost" satisfied by proxy, not direct measurement
+
+**Location**: `scripts/run-performance-tests.sh` PT-08 (~line 410); `_docs/02_document/tests/performance-tests.md` PT-08 entry.
+
+**Issue**: AC-3 says "the script reports per-item gate cost and end-to-end batch latency". The end-to-end batch latency is direct; the per-item gate cost is reported as the derived proxy `batch_p95 / batch_size`. True per-call `UavTileQualityGate.Validate` timing requires server-side instrumentation that the AZ-492 spec § Excluded explicitly excludes ("Any change to production code; this is harness-only work").
+
+**Severity rationale**: Low / Spec accuracy. The deviation is a conscious trade-off documented in both the script comments AND the test-spec doc AND the traceability-matrix row (which now reads `✓ (batch p95) / ◐ (per-item proxy only)`). The AC is satisfied in *spirit* (the harness produces a per-item number); future work on production-side timing would replace the proxy with the real value.
+
+**Suggested follow-up**: PBI to add a `quality_gate_validate_ms` field to the per-item response or a metrics endpoint, then update the perf script to consume it. Estimate 2 SP. Sequence: after the first AZ-492 perf-gate result determines whether the proxy is misleading in practice.
+
+### L2 (Low / Maintainability) — Duplicate `CreateValidJpeg`-shaped JPEG factory now in THREE locations
+
+**Location**:
+- `SatelliteProvider.Tests/TestUtilities/UavTileImageFactory.cs` (unit tests — internal)
+- `SatelliteProvider.IntegrationTests/UavUploadTests.cs` § `CreateValidJpeg` (integration — private)
+- `SatelliteProvider.IntegrationTests/PerfBootstrap.cs` § `CreateValidJpeg` (perf bootstrap — private)
+
+**Issue**: All three produce a 256×256 random-noise JPEG via ImageSharp `Image<Rgba32>` with `random.Next(256)` per channel and `JpegEncoder { Quality = 95 }`. The implementations differ trivially (constants, comments) but the logical surface is identical. This is exactly the cycle-2 problem that AZ-491 solved for the JWT factory.
+
+**Severity rationale**: Low / Maintainability. None of the three is wrong; the issue is *future drift* — if the quality gate becomes pickier (e.g. enforces minimum entropy), all three factories must update in lockstep. The AZ-491 review explicitly added a Phase 6 rule for this pattern, and the rule fires here.
+
+**Suggested follow-up**: Move the JPEG factory to `SatelliteProvider.TestSupport/UavTileImageFactory.cs` (extending the AZ-491 boundary). `PerfBootstrap` then becomes a one-line `var bytes = UavTileImageFactory.CreateRandomJpeg();`. The integration-tests `UavUploadTests` consumes the same surface, eliminating the third copy. Cost: 1–2 SP. The ImageSharp dependency would have to be added to `SatelliteProvider.TestSupport.csproj`, which is acceptable because both consumers (Tests + IntegrationTests) already depend on it.
+
+**Not a blocker** for this PBI because (a) the proximate AZ-492 scope was harness-only, (b) extracting the factory in this batch would have pulled ImageSharp into TestSupport — a non-trivial architectural change that warrants its own review, and (c) the precedent is well-established (AZ-491 split the JWT factory; this is the natural sequel).
+
+## Architecture Compliance
+
+- **Layering**: `PerfBootstrap` sits in `SatelliteProvider.IntegrationTests` (Layer-99 test infra). It calls into `SatelliteProvider.TestSupport` (Layer-99 test infra) which it already ProjectReferences. No production layer touched. The new module-layout note documents this explicitly.
+- **WebApi documentation convention (AZ-495)**: not relevant to this batch; no WebApi changes.
+- **Test-isolation guardrail (AZ-493)**: PT-07's distinct-coordinate cold pass means the test data spreads across 20 cells per run. The persistent Postgres volume + AZ-493 reset hook handle cleanup between integration-test runs; the perf script doesn't share a runner with the integration tests, but if a future operator runs PT-07 against the same volume the cells will accumulate. Not a problem in practice (the integration-test reset hook truncates `tiles` on the next integration run), but worth noting in the perf-gate playbook if PT-07 ever starts mis-firing due to pre-existing cells at the chosen coordinates.
+
+## Recurring patterns (for the cycle 3 retrospective)
+
+- **Spec-vs-reality on derived measurements**: PT-08's "per-item gate cost" became a proxy because the spec didn't constrain the measurement path. AZ-493 had a similar pattern (DB-name vs Host-allowlist). The cycle-3 retro should capture: "ACs that prescribe a measurement should also prescribe the path for collecting it, or note that the harness gets to choose between direct measurement and a proxy."
+- **Triple-duplicate test fixtures**: the JWT factory was consolidated in AZ-491; the JPEG factory is the natural next target. Capture as an Improvement Action for cycle 4.
+
+## Verdict
+
+**PASS_WITH_WARNINGS** — implementation satisfies all 6 ACs (runtime verification of AC-1 / AC-2 / AC-3 at Step 16). Two Low findings, both deferred to future PBIs by explicit scope choices in AZ-492. No Critical/High/Medium findings.
+
+Ready to merge / advance to the next batch.