# Performance Tests

**Task**: AZ-586_test_performance
**Name**: Performance tests (NFT-PERF-01..04)
**Description**: Implement xUnit blackbox tests for the four performance scenarios:
- F3 cascade-delete P50 ≤ 50ms on a 1-waypoint mission.
- F3 cascade-delete P50 ≤ 200ms on the full chain (provisional baseline; lock after the first green run).
- `GET /health` P50 ≤ 10ms.
- `GET /missions?page=1&pageSize=20` P95 ≤ 100ms against a 1000-mission seed (provisional baseline).

Every test runs 5 warm-up calls followed by the documented N measured calls. Cold-start passes are excluded.
**Complexity**: 3 points
**Dependencies**: AZ-576_test_infrastructure
**Component**: Blackbox Tests
**Tracker**: AZ-586
**Epic**: AZ-575

## Problem

Three latency thresholds are documented:
- AC-3.6: P50 ≤ 50ms for the minimal cascade.
- AC-7.3: P50 ≤ 10ms for health.
- AC-2.3: implicit list latency.

A fourth (NFT-PERF-02, full-chain cascade) is a baseline that subsequent runs must not regress by more than 50%. Without these tests, an unintentional N+1 query, a missing index, or accidental serialization-layer overhead could silently inflate response time 10× before the next manual perf benchmark catches it.

The full-chain cascade test is especially load-bearing because the F3 cascade walks 5 dependency tables. A future indexing regression or an added transaction wrap would show up here first.

## Outcome

- All four NFT-PERF-01..04 scenarios run and pass against the dockerised `missions` service.
- Each test produces a CSV row with `Category=Perf`, `Traces=AC-3.6` / `AC-3.1` / `AC-7.3` / `AC-2.3`, and `Result=pass`. It ALSO records the P50 and P95 numeric values in the `Traces` column (e.g., `P50_MS=23.4, P95_MS=41.8`).
- 5 warm-up calls precede every measured set; cold-start passes are excluded from the percentile computation.
- All tests run sequentially against a single client (no concurrent connections). This keeps HTTP/1.1 connection reuse and JIT warm-up deterministic.
- Tests run only when the `[Trait("Category","Perf")]` filter is active. The default test-suite filter excludes performance tests to keep the standard CI gate ≤ 15 min; a separate `scripts/run-performance-tests.sh` invocation runs them.

## Scope

### Included

- NFT-PERF-01 F3 minimal cascade: `DELETE /missions/{id}` on 1-waypoint missions; P50 ≤ 50ms over 100 sequential calls.
- NFT-PERF-02 F3 full cascade: `DELETE /missions/{id}` on `fixture_cascade_F3`-shaped missions; P50 ≤ 200ms over 50 sequential calls (provisional baseline).
- NFT-PERF-03 health endpoint: `GET /health` P50 ≤ 10ms over 100 sequential calls.
- NFT-PERF-04 list pagination: `GET /missions?page=1&pageSize=20` P95 ≤ 100ms over 100 sequential calls against a 1000-mission seed (provisional baseline).
- Recording P50/P95 to the CSV `Traces` column for trend tracking, even when a value is not gated.
- The performance suite is gated behind the `[Trait("Category","Perf")]` filter; the standard CI gate excludes these tests.

### Excluded

- Concurrency / contention tests (race scenarios): these live in Task 17 (NFT-RES-08).
- Resource consumption (RSS, FDs, connections): lives in Task 18 (NFT-RES-LIM).
- Production-hardware (Jetson Orin) latency baselines: documented as a follow-up in `restrictions.md` H8; test-environment baselines stand in.
- Concurrent-client throughput / RPS: not in scope today; documented in the Refactor Backlog.
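The spec fixes the warm-up count and the P50/P95 gates but not the percentile definition. The sketch below is written in Python for brevity; the real tests are xUnit. It assumes the nearest-rank method and works in three steps:
- Drop the first 5 samples as warm-ups.
- Compute P50/P95 over the measured set.
- Format the numeric fragment for the CSV `Traces` column as `P50_MS=<value>, P95_MS=<value>`.

The helper names `nearest_rank`, `summarize`, and `format_traces` are hypothetical, not part of AZ-576.

```python
from typing import List, Sequence, Tuple

# Assumption: the first 5 samples of every run are warm-ups (the spec's warm-up count).
WARMUP_CALLS = 5


def nearest_rank(sorted_ms: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples <= it.

    Assumes sorted_ms is sorted ascending and 0 < pct <= 100.
    """
    if not sorted_ms:
        raise ValueError("no samples")
    n = len(sorted_ms)
    rank = (pct * n + 99) // 100  # integer ceil(pct * n / 100), 1-based
    return sorted_ms[max(rank, 1) - 1]


def summarize(latencies_ms: Sequence[float], warmup: int = WARMUP_CALLS) -> Tuple[float, float]:
    """Discard the warm-up samples, then return (P50, P95) of the measured set."""
    if len(latencies_ms) <= warmup:
        raise ValueError("need at least one measured sample after warm-up")
    measured: List[float] = sorted(latencies_ms[warmup:])
    return nearest_rank(measured, 50), nearest_rank(measured, 95)


def format_traces(p50: float, p95: float) -> str:
    """Render the numeric part of the Traces column, e.g. 'P50_MS=23.4, P95_MS=41.8'."""
    return f"P50_MS={p50:.1f}, P95_MS={p95:.1f}"


if __name__ == "__main__":
    # 5 slow cold-start samples followed by 100 measured samples of 1..100 ms.
    samples = [900.0] * WARMUP_CALLS + [float(ms) for ms in range(1, 101)]
    p50, p95 = summarize(samples)
    print(format_traces(p50, p95))  # P50_MS=50.0, P95_MS=95.0
```

With nearest-rank, the P50 of an even-sized set is the lower of the two middle samples. An interpolating median (averaging samples 50 and 51 of 100) can differ slightly, so whichever definition the C# tests use should be fixed once and kept alongside the locked baselines. Either way, a single outlier cannot move P50.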
## Acceptance Criteria

**AC-1: NFT-PERF-01 F3 minimal cascade P50 ≤ 50ms**
Given `missions` + `postgres-test` colocated on the same Docker network, `seed_one_default_vehicle` + 100 minimal missions (each with 1 waypoint and no media/annotations/detection/map_objects rows), AND 5 warm-up `DELETE` calls have completed on missions outside the measured set
When the consumer issues 100 sequential `DELETE /missions/{id_i}` calls (one per seeded mission, 1 ≤ i ≤ 100) and records per-call wall-clock latency
Then the P50 (median) of the 100 latencies is `≤ 50ms`
And P50 + P95 are recorded to the CSV `Traces` column as `P50_MS=<value>, P95_MS=<value>`

**AC-2: NFT-PERF-02 F3 full-chain cascade P50 ≤ 200ms**
Given 50 missions, each with the `fixture_cascade_F3` chain (3 map_objects, 2 waypoints, 2 media, 2 annotations, 2 detection rows), AND 5 warm-up calls on additional fixtures outside the measured set
When the consumer issues 50 sequential `DELETE /missions/{id_i}` calls and records per-call wall-clock latency
Then P50 ≤ 200ms (provisional baseline; to be locked at `measured + 50%` on the first green run)
And P50 + P95 are recorded to CSV

**AC-3: NFT-PERF-03 health endpoint P50 ≤ 10ms**
Given `missions` running with no special seed, AND 5 warm-up `GET /health` calls
When the consumer issues 100 sequential `GET /health` calls (no `Authorization` header) and records per-call wall-clock latency
Then P50 ≤ 10ms
And P50 + P95 are recorded to CSV

**AC-4: NFT-PERF-04 list pagination P95 ≤ 100ms (provisional)**
Given `seed_one_default_vehicle` + 1000 missions referencing it, AND 5 warm-up `GET /missions?page=1&pageSize=20` calls
When the consumer issues 100 sequential `GET /missions?page=1&pageSize=20` calls and records per-call wall-clock latency
Then P95 ≤ 100ms (provisional baseline; to be locked at `measured + 50%` on the first green run)
And P50 + P95 are recorded to CSV

## Non-Functional Requirements

**Performance**
- NFT-PERF-01: ≤ 30s wall-clock (100 calls × ≤ 50ms each + measurement overhead).
  Declared via `[Trait("max_ms","30000")]`. Stock xUnit traits are filter metadata only, so the 30 000 ms timeout must be enforced by the AZ-576 infrastructure or an explicit `[Fact(Timeout = 30000)]`.
- NFT-PERF-02: ≤ 60s wall-clock.
- NFT-PERF-03: ≤ 5s wall-clock.
- NFT-PERF-04: ≤ 30s wall-clock.

**Reliability**
- All tests SKIP if the runner cannot allocate ≥ 2 CPU cores and ≥ 2 GB of free RAM (per `performance-tests.md` Notes).
  - A skip records `Result=skip` and `ErrorMessage=insufficient CPU/RAM`.
  - The default CI runner spec must meet this threshold, but degraded runners must not produce false-fail noise.
- All tests assume `missions` and `postgres-test` are colocated on the same Docker network (no inter-host link). The fixture verifies this by checking that `docker inspect missions-sut --format '{{.NetworkSettings.Networks.testnet.IPAddress}}'` returns a non-empty value.

## Blackbox Tests

| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|------------------------|-------------|-------------------|----------------|
| AC-1 | 100 minimal missions + 5 warm-ups | 100 sequential `DELETE /missions/{id}` | P50 ≤ 50ms; record P50/P95 | AC-3.6 |
| AC-2 | 50 F3-fixture missions + 5 warm-ups | 50 sequential `DELETE /missions/{id}` | P50 ≤ 200ms (provisional); record P50/P95 | AC-3.1, AC-3.6 |
| AC-3 | Warm runtime + 5 warm-ups | 100 sequential `GET /health` | P50 ≤ 10ms; record P50/P95 | AC-7.3 |
| AC-4 | 1000 missions + 5 warm-ups | 100 sequential `GET /missions?page=1&pageSize=20` | P95 ≤ 100ms (provisional); record P50/P95 | AC-2.3 |

## Constraints

- Tests live in `Tests/Performance/` and are tagged `[Trait("Category","Perf")]` so the default CI gate excludes them.
- A separate `scripts/run-performance-tests.sh` (created by AZ-576) invokes only this category; the standard `scripts/run-tests.sh` skips them.
- Execution is sequential and single-client: no `Parallel.For` or `Task.WhenAll`, and each call awaits the previous response.
- Warm-up calls are NOT included in the percentile computation. As marked by the `// Warmup` comment block in the test, the first 5 calls go to fixtures created specifically for warm-up (not the measured set).
- The `Stopwatch`-based timing measures `HttpClient.SendAsync` wall-clock time, so server-side serialization and response transfer are INCLUDED (this is what end users observe).
  - With the default `HttpCompletionOption.ResponseContentRead`, `SendAsync` completes only after the response body has been buffered.
  - Client-side deserialization counts only if the test performs it inside the timed region.
- Provisional gates (NFT-PERF-02, NFT-PERF-04) are marked in source with `// PROVISIONAL — lock at measured + 50% on first green run` and `[Trait("provisional","yes")]`.
- Tests follow the AAA pattern:
  - `// Arrange`: seed + warm-up.
  - `// Act`: measured calls + percentile computation.
  - `// Assert`: gate + CSV record.

## Risks & Mitigation

**Risk 1: CI variance breaks the tight P50 ≤ 10ms gate (NFT-PERF-03)**
- *Risk*: On a noisy-neighbour CI runner, even a static `/health` route can hiccup. A single hiccup cannot move the median, but sustained contention that slows half or more of the 100 calls pushes P50 past 10ms.
- *Mitigation*: P50 is robust to single outliers (the median sits at position 50 of 100). If the test still flakes, lock the gate at `measured P50 + 50%` after the first green run.

**Risk 2: NFT-PERF-04's 1000-mission seed leaks into other tests' DB state**
- *Risk*: Seeding 1000 missions affects pagination, list-shape, and date-filter tests. If NFT-PERF-04 runs before them in the same SUT lifetime, their results drift.
- *Mitigation*: NFT-PERF-04 lives in `[Collection("Perf1k")]` and uses an `IClassFixture`. The fixture TRUNCATEs all rows before its seed AND restores `seed_empty` afterwards. Functional tests' fixtures handle their own seed, so there is no cross-pollination.

**Risk 3: Provisional gates accepted as locked gates**
- *Risk*: Same as NFT-RES-LIM Risk 3. If the first run measures 80ms and the test passes, future engineers treat the 100ms gate as the standard.
- *Mitigation*: CI dashboards flag a `measured / gate` ratio > 0.8 for re-tuning. The lock-in workflow is documented in `performance-tests.md`.

## System Under Test Boundary

- Tests drive the product through the public HTTP surface (`http://missions:8080`), plus an Npgsql side channel for seed setup.
  - Bearer tokens (NFT-PERF-01, 02, 04) are minted via `https://jwks-mock:8443/sign`.
  - NFT-PERF-03 sends no `Authorization` header.
  Expected outputs are the documented latency thresholds from `_docs/02_document/tests/performance-tests.md`.
- Stubs are allowed ONLY for:
  - the external `admin` JWT issuer (`jwks-mock` container);
  - the DB-only stub tables for `media`, `annotations`, `detection`, and `map_objects`.
- Stubs, fakes, deterministic fallbacks, monkeypatches, or direct imports are NOT allowed for any internal product module. This includes the controllers, service classes, `AppDataConnection`, and any layer affecting response time.
- If any of these modules is not implemented, the test MUST fail/block as a missing product implementation. It must not pass by replacing the module with a test stub.
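The Constraints require a sequential, single-client protocol: warm-ups run on dedicated fixtures, then each measured call is awaited before the next starts. The loop below sketches that protocol in Python; it is an illustration under assumptions, not the xUnit implementation.
- `measure_sequential` is a hypothetical helper.
- `call` stands in for one request through the public HTTP surface (for example one `DELETE /missions/{id}`).
- The clock is injectable so the loop can be exercised without a running SUT.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def measure_sequential(
    call: Callable[[T], object],
    warmup_targets: Iterable[T],
    measured_targets: Iterable[T],
    clock: Callable[[], float] = time.perf_counter,
) -> List[float]:
    """Run warm-up calls, then time each measured call strictly one after another.

    Returns per-call latencies in milliseconds for the measured set only;
    warm-up calls are executed but never timed.
    """
    for target in warmup_targets:
        call(target)  # warm-up: primes connections/JIT; timing discarded

    latencies_ms: List[float] = []
    for target in measured_targets:
        start = clock()
        call(target)  # must block until the full response is received (no overlap)
        latencies_ms.append((clock() - start) * 1000.0)
    return latencies_ms
```

Because `call` returns only once its response is complete, no two requests overlap, which mirrors the "no `Parallel.For` / `Task.WhenAll`" rule. The default `time.perf_counter` is Python's high-resolution clock for short durations, playing the role `Stopwatch` plays in the C# tests. Feeding the returned list (after any warm-up handling) into a percentile helper yields the P50/P95 values recorded in `Traces`.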