Files
satellite-provider/_docs/06_metrics/perf_2026-05-23_cycle8.md
T
Oleksandr Bezdieniezhnykh 32bc5c1e48
ci/woodpecker/push/01-test Pipeline was successful
ci/woodpecker/push/02-build-push Pipeline was successful
[AZ-808] [AZ-809] [AZ-810] [AZ-811] [AZ-812] Cycle 8 perf run
8/8 scenarios PASS within threshold. Cycle-8 strict-validation
overhead is below percentile resolution on every measured
endpoint.

PT-06 (route creation) required one in-cycle perf-script fix:
add requestMaps=false + createTilesZip=false to the body to
satisfy AZ-809's no-defaulting rule. The script had already
been updated for AZ-812's wire rename during cycle 8 but missed
AZ-809's newly required fields. Production code is correct; only
the perf probe was stale.

Report: _docs/06_metrics/perf_2026-05-23_cycle8.md. Trend vs
cycle 7 is flat within noise band on every scenario.

Known harness quirks (pre-existing, not cycle-8 regressions)
surfaced and documented for cleanup:
- PT-07 cross-run cache pollution (hard-coded base coords)
- PT-01 "cold" misnomer (tile cached on disk since cycle 5)
- PT-03 cached-by-PT-02 side effect (cycle-7 note carried forward)

Auto-chains to Step 16 (Deploy).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-23 16:10:12 +03:00

76 lines
10 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Perf Run — Cycle 8 (AZ-808 + AZ-809 + AZ-810 + AZ-811 + AZ-812)
**Date**: 2026-05-23T12:50Z (first run) + 2026-05-23T12:53Z (re-run after PT-06 script fix)
**Run label**: cycle8 — full default-parameter run after the cycle-8 strict-validation sweep (AZ-808 region POST + AZ-809 route POST + AZ-810 UAV upload metadata + AZ-811 GET tile lat/lon + AZ-812 region OSM rename) and the in-cycle F-AZ809-1 polygon-cap follow-up (commit `8fca6e0`).
**Trigger**: autodev existing-code Step 15 (Performance Test gate). Cycle 8 goal: confirm that adding FluentValidation + `JsonUnmappedMemberHandling.Disallow` to four endpoints — and the new 50-polygon cap on `POST /api/satellite/route` — introduced no measurable regression on existing perf scenarios.
**Runner**: `scripts/run-performance-tests.sh` (default params: `PERF_REPEAT_COUNT=20`, `PERF_UAV_BATCH_SIZE=10`). Two runs were needed; see "PT-06 script fix" below.
**System under test**: `docker compose up -d --build` against `mcr.microsoft.com/dotnet/aspnet:10.0`; api healthy on `https://localhost:18980` (TLS+ALPN, dev cert `./certs/api.crt` trusted via `--cacert`). Postgres on `localhost:5433`. Single docker-compose stack lifecycle across both runs (no restart between them).
**Build**: `SatelliteProvider.IntegrationTests` Release built inside the SDK; 0 errors / 15 warnings (carried-over NU1902 IdentityModel — already tracked as D-AZ795-1 / Low / Hardening — plus CA2227 setter-on-collection — already noted, design choice for DTO mutability).
**JWT**: minted by `SatelliteProvider.IntegrationTests --mint-only` (canonical `JwtTokenFactory` surface per AZ-491); 4 h lifetime, 341 bytes.
## Results
| # | Scenario | Verdict | Observed | Threshold | Source of measurement | Source of threshold |
|---|----------|---------|----------|-----------|-----------------------|---------------------|
| PT-01 | Tile download (cold) | **PASS** | 885 ms | ≤ 30000 ms | run 1 | `_docs/02_document/tests/performance-tests.md` |
| PT-02 | Cached tile retrieval | **PASS** | 244 ms | ≤ 500 ms | run 1 | `_docs/02_document/tests/performance-tests.md` |
| PT-03 | Region 200 m / z18 | **PASS** | 99 ms | ≤ 60000 ms | run 1 | `_docs/02_document/tests/performance-tests.md` |
| PT-04 | Region 500 m / z18 + stitch | **PASS** | 2128 ms | ≤ 120000 ms | run 1 | `_docs/02_document/tests/performance-tests.md` |
| PT-05 | 5 concurrent regions | **PASS** | 2663 ms | ≤ 300000 ms | run 1 | `_docs/02_document/tests/performance-tests.md` |
| PT-06 | Route creation (2 points) | **PASS** (after fix) | 83 ms | ≤ 5000 ms | run 2 (post-script-fix; see below) | `_docs/02_document/tests/performance-tests.md` |
| PT-07 | Region request distribution (N=20, cold + warm) | **PASS** | cold p50=2113 ms, p95=2274 ms · warm p50=52 ms, p95=108 ms | warm p95 < cold p95 (delta 2166 ms) | run 1 — only run with a true cold cache for the PT-07 coord band; see "Known harness quirks" below | AZ-484 / AZ-492 |
| PT-08 | UAV batch upload (batch=10, N=20) | **PASS** | batch p50=106 ms, p95=379 ms; per-item proxy p95=37 ms; accepted=200, rejected=0, failed=0 | batch p95 ≤ 2000 ms (AZ-488) | run 1 | `_docs/02_document/tests/performance-tests.md` |
**Raw verdict: 8 Pass · 0 Warn · 0 Fail · 0 Unverified** (after the PT-06 script-fix re-run).
## PT-06 script fix (AZ-809 contract-sync follow-up)
Run 1 reported `PT-06: HTTP 400 (expected 200)`. Root cause: the perf script's `POST /api/satellite/route` body omitted `requestMaps` and `createTilesZip`. AZ-809 (cycle 8) made both fields required — no defaulting — under FluentValidation's `RuleFor(...).NotNull()` chain (see `BT-29` rule 10 in `_docs/02_document/tests/blackbox-tests.md` and `route-creation.md` v1.0.1 Inv-10 / Rule 10). The perf script had been updated for AZ-812's lat/lon wire rename during cycle 8 but missed AZ-809's newly required fields — a contract-sync miss, not a production regression.
Fix: one line in `scripts/run-performance-tests.sh` PT-06 body — added `"requestMaps":false,"createTilesZip":false`. Re-run confirmed `PT-06: 83 ms` (well under the 5000 ms threshold, comparable to cycle-7's 161 ms). Both fields are `false` so the rule-12 cross-field (`createTilesZip=true requires requestMaps=true`) is not exercised — the scenario's intent is route-creation latency, not the cross-field gate.
## AZ-808 + AZ-809 + AZ-810 + AZ-811 + AZ-812 NFR verification
Cycle 8 added per-endpoint strict validation (FluentValidation + `JsonUnmappedMemberHandling.Disallow`) to four endpoints. The directly relevant scenarios are PT-03 / PT-04 / PT-05 (region POST — exercises AZ-808 validator on every accept), PT-06 (route POST — exercises AZ-809 validator), PT-07 (region POST distribution — exercises AZ-808 validator 40 times), and PT-08 (UAV upload — exercises AZ-810 validator 200 times via the 20×10 batch). PT-01 / PT-02 exercise AZ-811 (GET tile lat/lon) on every call.
**Validator cost**: invisible at the percentile resolutions reported. Each cycle-8 validator iterates O(N) bounds checks on a small input (typical N ≤ 20 for routes / regions / UAV batches), each check is constant-time, and the FluentValidation rule cache amortises the rule-tree across requests. The measured p95 deltas vs cycle 7 are all within noise band (see "Trend vs cycle 7" below).
**F-AZ809-1 polygon-cap fix (commit `8fca6e0`)**: the new `MaxPolygons = 50` check is a single `Count <= 50` comparison on the chained `RuleFor(...).Must(...)`; it runs in O(1) and is invisible at the PT-06 resolution (83 ms). No new perf scenario was added for the cap because the cap's intent is allocation-bounding under adversarial load, not steady-state latency — that load profile would belong in a separate adversarial/abuse scenario, deferred (see "Follow-ups").
**Auth-before-validation ordering**: confirmed in `Program.cs` (`UseAuthentication()` / `UseAuthorization()` run before any `WithValidation()` endpoint filter and before `UavUploadValidationFilter`). PT-06 / PT-07 / PT-08 calls carry the Bearer token; an unauthenticated probe would 401 before any validator runs, so the validator cost is bounded by authenticated traffic only.
## Trend comparison vs cycle 7
| Scenario | Cycle 7 | Cycle 8 | Δ | Cause |
|----------|---------|---------|---|-------|
| PT-01 cold | 998 ms | 885 ms | -113 ms | noise band (Google Maps DNS / cold-network variance) |
| PT-02 cached | 269 ms | 244 ms | -25 ms | noise |
| PT-03 region 200 m | 139 ms | 99 ms | -40 ms | noise (both runs warm-cache hits on the PT-02 / PT-03 shared coord) |
| PT-04 region 500 m + stitch | 2110 ms | 2128 ms | +18 ms | noise |
| PT-05 5 concurrent | 3145 ms | 2663 ms | -482 ms | noise band (queue scheduler variance under 5-way concurrency) |
| PT-06 route create | 161 ms | 83 ms | -78 ms | noise band (TLS connection state); post-fix value |
| PT-07 cold p95 / warm p95 | 2608 ms / 76 ms | 2274 ms / 108 ms | cold -334 ms / warm +32 ms | warm uptick is noise (≈40 % of the cycle-7 warm-p95 absolute number — both within sub-200 ms noise band on dev hardware); cold improved as the network state stabilised. Delta cold→warm remains > 2000 ms — same order of magnitude as cycle 7 (2532 ms). |
| PT-08 batch p95 | 284 ms | 379 ms | +95 ms | noise band (single-client p95 over 20 batches on shared dev hardware; the cycle 7 number was on the low end of the historical distribution — cycle 5 measured 350 ms, cycle 6 measured 544 ms, cycle 8 sits in the middle of that band). |
**Zero scenarios show a meaningful regression attributable to cycle 8.** All deltas are within the historical noise band for dev hardware (PT-08 has the widest historical band: 284544 ms across cycles 5/6/7/8).
## Known harness quirks (pre-existing — not cycle-8 regressions)
These surfaced during this run but are NOT caused by cycle 8. Each is documented here for trend-tracking visibility; remediation is a perf-harness-cleanup track, not a cycle-8 deliverable.
- **PT-07 cross-run cache pollution**: `PT07_BASE_LAT` / `PT07_BASE_LON` are hard-coded constants (47.471747, 37.657063) with deterministic per-iteration offsets. Back-to-back perf runs against the same docker-compose stack lifecycle reuse the same coord band, so the second run's "cold pass" is actually a warm-cache hit (p95 dropped from 2274 ms in run 1 to 70 ms in run 2; warm p95 also 70 ms; the script's strict `<` check then fails because they're equal). Cycle-7's report (§ "Trend comparison") implicitly noted the same family of issue for PT-03. Fix candidate: parameterise the base coord by a per-run nonce or `PERF_RUN_LABEL` env so each cycle's PT-07 starts cold.
- **PT-01 "cold" misnomer**: `PT01_LAT` / `PT01_LON` (47.461347, 37.646663) are hard-coded; the tile has been cached on the host filesystem since cycle 5 (`./tiles/18/...`). The reported number is "first-request latency on a stack-lifecycle-fresh API process," not a true Google-Maps round-trip. The 885 ms is comfortably under the 30 s threshold because the threshold was set for the genuine cold case; the measurement nonetheless under-reports cold-path latency. Fix candidate same as PT-07: parameterise by nonce.
- **PT-03 cached-by-PT-02 side effect** (cycle 7's note): PT-03 reuses the (47.461747, 37.647063) coord PT-02 already populated; this is by design (PT-03's threshold is 60 s end-to-end, so even a fully cold region would pass, and the test's intent is region-orchestration overhead, not tile-fetch latency).
## Follow-ups (not blocking deploy)
1. **Perf-harness cleanup** — parameterise `PT07_BASE_*` and `PT01_*` coords by a run-nonce env (e.g., `PERF_RUN_LABEL`) so back-to-back runs and trend comparisons are not corrupted by cross-run cache state. Tracker entry candidate (sized ~2 SP).
2. **F-AZ809-1 cap adversarial scenario** — add an explicit PT-NN that posts 50 polygons (the cap) and 51 polygons (one over), measures validator latency and 400-response time. This converts the cap's intent (DoS-bound) into a measurable regression-gate. Deferred to cycle 9 (sized ~3 SP — adds one perf scenario + threshold + report row).
3. **PT-09 promotion**`tile-inventory.md` already notes the option to promote `TileInventoryTests.PerformanceBudget_AC4` from full-suite to `scripts/run-performance-tests.sh § PT-09` if the budget tightens. Not needed at the cycle-8 budget level; track for cycle 10+.
## Verdict (Step 15)
**PASS** — 8/8 scenarios within threshold (after the trivial AZ-809 contract-sync fix to PT-06's body). Zero meaningful regressions attributable to cycle 8. Cycle-8 strict-validation overhead is below percentile resolution on every measured endpoint.
Cleared to auto-chain to Step 16 (Deploy).