Files
Oleksandr Bezdieniezhnykh 080441db5d
ci/woodpecker/push/01-test Pipeline was successful
ci/woodpecker/push/02-build-push Pipeline was successful
[AZ-492] Cycle 3 batch 4: perf harness PT-07 + PT-08 + JWT-attach
Drains all three deferred perf-harness items in one batch:
- PT-01..PT-06 now carry Authorization: Bearer minted via the canonical
  SatelliteProvider.TestSupport.JwtTokenFactory (AZ-491) — no third copy
  of JWT logic in the shell.
- PT-07 implemented as cold + warm dual-pass distribution (N=20 each),
  reports p50/p95 for both passes and fails if warm p95 >= cold p95.
- PT-08 implemented as 20-batch upload distribution with batch p95 gated
  at the AZ-488 2000 ms target; per-item gate cost reported as derived
  proxy (batch_p95 / batch_size).

New SatelliteProvider.IntegrationTests/PerfBootstrap.cs adds two CLI
short-circuit subcommands (--mint-only and --gen-uav-fixture <path>)
invoked by the shell so the perf script never inlines the JWT or
JPEG-fixture logic. The dispatch sits at the top of Program.cs Main
and runs before any HTTP / DB / readiness setup.

performance-tests.md PT-07 + PT-08 flip from Deferred to Implemented.
traceability-matrix.md PT-07 + PT-08 rows move from recorded to covered
(PT-08 partial due to per-item proxy — flagged Low in batch-4 review).
_docs/_process_leftovers/2026-05-11_perf-pt07-harness.md deleted; the
leftovers directory is now empty.

Closes cycle-2 retro Action 2; LESSONS.md [process] rule about Deferred
NFRs remains in force as a guardrail.

Also includes the previously-uncommitted cumulative review report for
cycle-3 batches 01-03 (generated at the end of batch 3 but not staged).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 01:52:25 +03:00

15 KiB
Raw Permalink Blame History

Code Review — Batch 04 cycle 3

Tasks reviewed: AZ-492 (Perf harness PT-07 + PT-08 + JWT-attach) Date: 2026-05-12 Verdict: PASS_WITH_WARNINGS (2 Low findings; 0 Critical/High/Medium)

Phase 1: Context

Spec inputs read: _docs/02_tasks/todo/AZ-492_perf_harness_pt07_pt08_jwt_attach.md; project restrictions / solution overview from prior batches still in force; cycle-2 retrospective Action 2 explicitly promoted this work; LESSONS.md [process] rule on Deferred-status NFRs is the governing guardrail. Changed files mapped to the AZ-492 ACs:

  • SatelliteProvider.IntegrationTests/PerfBootstrap.cs (new) → AC-1 (mint surface) + AC-3 (UAV fixture) + AC-6 (no duplicate mint)
  • SatelliteProvider.IntegrationTests/Program.cs (modified) → dispatch for AC-1 / AC-3 subcommands
  • scripts/run-performance-tests.sh (rewritten) → AC-1 / AC-2 / AC-3
  • _docs/02_document/tests/performance-tests.md → AC-4
  • _docs/02_document/tests/traceability-matrix.md → AC-4 (status visibility)
  • _docs/_process_leftovers/2026-05-11_perf-pt07-harness.md (deleted) → AC-5
  • Module-layout + tests_integration doc updates → architectural documentation
  • _docs/06_metrics/retro_2026-05-11_cycle2.md § Action 2 → process-resolution back-reference

Phase 2: Spec Compliance

AC Status Evidence
AC-1: PT-01..PT-06 no longer 401 Covered (static) 10 curl sites in scripts/run-performance-tests.sh all carry -H "$AUTH_HEADER"; AUTH_HEADER="Authorization: Bearer $PERF_JWT_TOKEN". The wait_region_completed polling helper also carries it (line 131). The PT-08 curl_args array carries it (line 389). Runtime verification deferred to Step 16.
AC-2: PT-07 runs to completion Covered (static) Two timing arrays (PT07_COLD_MS, PT07_WARM_MS) populated by 20 cold + 20 warm requests at the same coordinate set. percentile() awk helper computes p50 and p95 for both. Pass/fail asserts warm_p95 < cold_p95. Same-coordinate design guarantees the warm pass hits the cache (otherwise the cold pass would not have populated it).
AC-3: PT-08 runs to completion Covered (static) --gen-uav-fixture produces the JPEG once; 20 batches of 10 distinct-coordinate items uploaded. Per-batch accepted/rejected/failed counts surfaced. Batch p95 gated at 2000 ms. Per-item gate cost is a derived proxy (see Spec-vs-reality below).
AC-4: Spec status reflects implementation Covered performance-tests.md PT-07 and PT-08 both carry **Status**: **Implemented (AZ-492).**; "Deferred — harness work tracked in " language gone. traceability-matrix.md rows moved from ◐ recorded / ◐ recorded (Deferred) to .
AC-5: Leftover drained Covered _docs/_process_leftovers/2026-05-11_perf-pt07-harness.md deleted; ls _docs/_process_leftovers/ shows no files.
AC-6: Token-mint surface reused, not duplicated Covered rg 'new JwtSecurityToken\(' matches one source-code site only (SatelliteProvider.TestSupport/JwtTokenFactory.cs). PerfBootstrap.MintToken() is a 6-line delegation to JwtTokenFactory.Create. The shell script does not inline JWT logic — it shells out to dotnet <dll> --mint-only.

Contract verification: AZ-492 has no ## Contract section; not applicable.

Consumer-side contract verification: AZ-492 depends on AZ-487 (JWT_SECRET surface) and AZ-491 (canonical JwtTokenFactory). Both consumed correctly — the new PerfBootstrap.cs uses JwtTestHelpers.ResolveSecretOrThrow (the AZ-487-introduced env-var surface) and JwtTokenFactory.Create (the AZ-491 canonical factory). No drift.

Scope creep check: Implementation stayed within harness-only scope per spec § Excluded ("Any change to production code; this is harness-only work"). Documentation updates to module-layout / tests_integration / traceability-matrix are explicitly in scope for the doc-sync part of the spec. Retro back-reference is process hygiene, not scope creep.

Phase 3: Code Quality

SOLID:

  • PerfBootstrap has a clear single responsibility: short-circuit perf-harness subcommands. Two methods, both pure-CLI dispatch + delegate. Good.
  • Program.cs dispatch block is 13 lines — minimal coupling between perf-bootstrap and the rest of the runner. The decision to put it FIRST (before any HTTP / DB / env read) is correct — these subcommands should work even on a host where the API and Postgres are not running.

Error handling:

  • PerfBootstrap.MintToken catches InvalidOperationException from JwtTestHelpers.ResolveSecretOrThrow and writes to stderr, returning exit code 1. Good — the shell can detect the failure cleanly.
  • PerfBootstrap.GenerateUavFixture validates args.Length >= 2 and emits a usage hint on stderr with exit code 2. Good.
  • Shell script: every curl capture checks the HTTP code and adds to FAIL on mismatch. The percentile() helper guards against NR == 0 to avoid div-by-zero. The dotnet --mint-only capture checks [[ -z "$PERF_JWT_TOKEN" ]] after the call to catch an empty-output failure.

Naming: PerfSubject, PerfTokenLifetime, PerfBootstrap — clear, consistent prefix. Shell variables follow the existing convention (PT07_*, PT08_*, PERF_*). AUTH_HEADER is uppercase per shell-script convention.

Complexity:

  • PerfBootstrap.MintToken — 14 lines, no branches except the try/catch.
  • PerfBootstrap.GenerateUavFixture — 12 lines.
  • PerfBootstrap.CreateValidJpeg — 20 lines, single loop.
  • scripts/run-performance-tests.sh — 430 lines total. Each PT-NN scenario is ~2560 lines and is independently readable. The PT-08 batch loop is the longest single block (~70 lines including metadata construction); it could be extracted into a function but the inlining keeps the data flow obvious. Acceptable.

DRY:

  • The same JPEG-creation pattern now exists in three places: UavTileImageFactory.CreateRandomJpeg (unit), UavUploadTests.CreateValidJpeg (integration), and PerfBootstrap.CreateValidJpeg (perf bootstrap). This is flagged as L2 below with a recommended consolidation path.

Test quality: AZ-492 spec's "Unit Tests" table notes that AC-1 (Bearer-attach helper) is only testable if a helper function is introduced — the shell script inlines the header directly, so no shell-side helper to test. AC-6 is tested by the repo-wide grep (one source site → not new). Both AC verifications above use the static-check approach the spec authorises. The mint logic itself is unit-tested in SatelliteProvider.Tests/TestSupport/JwtTokenFactoryTests (AZ-491); the PerfBootstrap.MintToken wrapper is a 6-line delegation and is exercised end-to-end by Step 16's actual perf run.

Dead code: None added. The deleted leftover file removes ~50 lines of stale process documentation.

Phase 4: Security Quick-Scan

  • Token lifetime: 4 hours, mitigation for AZ-492 Risk 3 (token expiry mid-test). Tokens are minted on each script run and never persisted; the perf script's tmpdir is wiped on exit (trap-based cleanup). No JWT material ends up on disk.
  • Secret handling: JWT_SECRET is read from env or .env (gitignored). Never echoed. The byte-length check fails fast for under-32-byte secrets — same contract as scripts/run-tests.sh. Good.
  • PERF_JWT_TOKEN env var: documented as the operator-supplied alternative to in-script minting. If the operator pre-mints, the script does not echo the token value (only its byte length). Good.
  • Subject value: perf-tests — distinct from the integration test runner subject (integration-tests) so audit logs can disambiguate the source. Good.
  • permissions: GPS claim: required by AZ-488's UAV upload endpoint. Granted to the perf token so PT-08 can exercise the AC-3 path. No other permissions are minted — least-privilege ish.
  • Input validation on --gen-uav-fixture: path is treated as a literal filesystem path. The script writes to $PERF_TMP_DIR/uav_fixture.jpg which is a freshly-created mktemp -d directory. No path-traversal risk in the current call site; if a future consumer passes an untrusted path it would write to that path — documented behaviour for an internal test helper.

No new attack surface introduced; no secret material touches version control; no new endpoints exposed.

Phase 5: Performance Scan

The harness IS the performance scan. Observations on the harness itself:

  • dotnet build runs once per script invocation (or skipped if the DLL already exists). Build time ~510 s on dev hardware — acceptable for a perf run.
  • dotnet <dll> --mint-only startup is ~1.52 s (CLR cold start + a tiny token mint). Acceptable for a one-time bootstrap.
  • The cold + warm passes in PT-07 do 40 total region requests at ~200m zoom 18 — about ~40 × (515 s per request) = 3.510 minutes for PT-07 alone. PT-08 adds another 20 × (2002000 ms) = 440 s. The full script run on dev hardware is in the 815 minute range; not fast, not glacial.
  • The awk percentile helper sorts the input array — O(N log N) per call. With N=20 this is trivial.

No performance concerns in the harness code itself.

Phase 6: Cross-Task Consistency

  • Naming alignment with AZ-491: PerfBootstrap.PerfSubject = "perf-tests" mirrors the AZ-491 pattern of JwtTestHelpers.DefaultSubject = "integration-tests". Consistent. PermissionsClaimType = "permissions" matches the value UavUploadTests and the API use.
  • Architecture alignment with AZ-491 + AZ-493: pure / stateless logic stays in TestSupport (JwtTokenFactory, IntegrationTestResetGuard); side-effectful / dependency-bearing logic stays in the consumer (IntegrationTestDatabaseReset for Npgsql, PerfBootstrap for ImageSharp). Same boundary as batches 2 and 3. Good.
  • Documentation pattern: AZ-492 follows the AZ-491 / AZ-493 doc pattern — tests_integration.md gets the runtime surface, module-layout.md gets the boundary rationale, performance-tests.md gets the test-spec status flip. Consistent.
  • Duplicate test-helper detection (the Phase 6 rule added by AZ-491 review): the JPEG factory triple is flagged below as L2 — the rule fires.

Findings

L1 (Low / Spec accuracy) — AZ-492 AC-3 "per-item gate cost" satisfied by proxy, not direct measurement

Location: scripts/run-performance-tests.sh PT-08 (~line 410); _docs/02_document/tests/performance-tests.md PT-08 entry.

Issue: AC-3 says "the script reports per-item gate cost and end-to-end batch latency". The end-to-end batch latency is direct; the per-item gate cost is reported as the derived proxy batch_p95 / batch_size. True per-call UavTileQualityGate.Validate timing requires server-side instrumentation that the AZ-492 spec § Excluded explicitly excludes ("Any change to production code; this is harness-only work").

Severity rationale: Low / Spec accuracy. The deviation is a conscious trade-off documented in both the script comments AND the test-spec doc AND the traceability-matrix row (which now reads ✓ (batch p95) / ◐ (per-item proxy only)). The AC is satisfied in spirit (the harness produces a per-item number); future work on production-side timing would replace the proxy with the real value.

Suggested follow-up: PBI to add a quality_gate_validate_ms field to the per-item response or a metrics endpoint, then update the perf script to consume it. Estimate 2 SP. Sequence: after the first AZ-492 perf-gate result determines whether the proxy is misleading in practice.

L2 (Low / Maintainability) — Duplicate CreateValidJpeg-shaped JPEG factory now in THREE locations

Location:

  • SatelliteProvider.Tests/TestUtilities/UavTileImageFactory.cs (unit tests — internal)
  • SatelliteProvider.IntegrationTests/UavUploadTests.cs § CreateValidJpeg (integration — private)
  • SatelliteProvider.IntegrationTests/PerfBootstrap.cs § CreateValidJpeg (perf bootstrap — private)

Issue: All three produce a 256×256 random-noise JPEG via ImageSharp Image<Rgba32> with random.Next(256) per channel and JpegEncoder { Quality = 95 }. The implementations differ trivially (constants, comments) but the logical surface is identical. This is exactly the cycle-2 problem that AZ-491 solved for the JWT factory.

Severity rationale: Low / Maintainability. None of the three is wrong; the issue is future drift — if the quality gate becomes pickier (e.g. enforces minimum entropy), all three factories must update in lockstep. The AZ-491 review explicitly added a Phase 6 rule for this pattern, and the rule fires here.

Suggested follow-up: Move the JPEG factory to SatelliteProvider.TestSupport/UavTileImageFactory.cs (extending the AZ-491 boundary). PerfBootstrap then becomes a one-line var bytes = UavTileImageFactory.CreateRandomJpeg();. The integration-tests UavUploadTests consumes the same surface, eliminating the third copy. Cost: 12 SP. The ImageSharp dependency would have to be added to SatelliteProvider.TestSupport.csproj, which is acceptable because both consumers (Tests + IntegrationTests) already depend on it.

Not a blocker for this PBI because (a) the proximate AZ-492 scope was harness-only, (b) extracting the factory in this batch would have pulled ImageSharp into TestSupport — a non-trivial architectural change that warrants its own review, and (c) the precedent is well-established (AZ-491 split the JWT factory; this is the natural sequel).

Architecture Compliance

  • Layering: PerfBootstrap sits in SatelliteProvider.IntegrationTests (Layer-99 test infra). It calls into SatelliteProvider.TestSupport (Layer-99 test infra) which it already ProjectReferences. No production layer touched. The new module-layout note documents this explicitly.
  • WebApi documentation convention (AZ-495): not relevant to this batch; no WebApi changes.
  • Test-isolation guardrail (AZ-493): PT-07's distinct-coordinate cold pass means the test data spreads across 20 cells per run. The persistent Postgres volume + AZ-493 reset hook handle cleanup between integration-test runs; the perf script doesn't share a runner with the integration tests, but if a future operator runs PT-07 against the same volume the cells will accumulate. Not a problem in practice (the integration-test reset hook truncates tiles on the next integration run), but worth noting in the perf-gate playbook if PT-07 ever starts mis-firing due to pre-existing cells at the chosen coordinates.

Recurring patterns (for the cycle 3 retrospective)

  • Spec-vs-reality on derived measurements: PT-08's "per-item gate cost" became a proxy because the spec didn't constrain the measurement path. AZ-493 had a similar pattern (DB-name vs Host-allowlist). The cycle-3 retro should capture: "ACs that prescribe a measurement should also prescribe the path for collecting it, or note that the harness gets to choose between direct measurement and a proxy."
  • Triple-duplicate test fixtures: the JWT factory was consolidated in AZ-491; the JPEG factory is the natural next target. Capture as an Improvement Action for cycle 4.

Verdict

PASS_WITH_WARNINGS — implementation satisfies all 6 ACs (runtime verification of AC-1 / AC-2 / AC-3 at Step 16). Two Low findings, both deferred to future PBIs by explicit scope choices in AZ-492. No Critical/High/Medium findings.

Ready to merge / advance to the next batch.