# Performance Tests

**Task**: AZ-586_test_performance
**Name**: Performance tests (NFT-PERF-01..04)
**Description**: Implement xUnit blackbox tests for the 4 performance scenarios — F3 cascade-delete P50 ≤ 50ms on a 1-waypoint mission, F3 cascade-delete P50 ≤ 200ms on the full chain (provisional baseline; lock after first green run), `GET /health` P50 ≤ 10ms, and `GET /missions?page=1&pageSize=20` P95 ≤ 100ms against a 1000-mission seed (provisional baseline). Every test runs 5 warm-up calls + the documented N measured calls; cold-start passes excluded.
**Complexity**: 3 points
**Dependencies**: AZ-576_test_infrastructure
**Component**: Blackbox Tests
**Tracker**: AZ-586
**Epic**: AZ-575

## Problem

Three latency thresholds are documented (AC-3.6 P50 ≤ 50ms for minimal cascade, AC-7.3 P50 ≤ 10ms for health, AC-2.3 implicit list latency), and one (NFT-PERF-02 full-chain cascade) is a baseline that subsequent runs must not regress by more than 50%. Without these tests, an unintentional N+1 query, a missing index, or accidental serialization overhead could silently increase response time tenfold before the next manual perf benchmark catches it. The full-chain cascade test is especially load-bearing because the F3 cascade walks 5 dependency tables, so a future indexing regression or an added transaction wrapper would show up here first.

## Outcome

- All four NFT-PERF-01..04 scenarios run and pass against the dockerised `missions` service.
- Each test produces a CSV row with `Category=Perf`, `Traces=AC-3.6` / `AC-3.1` / `AC-7.3` / `AC-2.3`, and `Result=pass`, and also records the P50 and P95 numeric values in the `Traces` column alongside the AC reference (e.g., `P50_MS=23.4, P95_MS=41.8`).
- 5 warm-up calls precede every measured set; cold-start passes are excluded from the percentile computation.
- All tests run sequentially against a single client (no concurrent connections) so HTTP/1.1 connection-reuse and JIT warm-up are deterministic.
- Tests run only when the `Category=Perf` trait filter is active. They are tagged `[Trait("Category","Perf")]`, and the default test suite filter excludes that category to keep the standard CI gate ≤ 15 min; a separate `scripts/run-performance-tests.sh` invocation runs them.

## Scope

### Included

- NFT-PERF-01 F3 minimal cascade — `DELETE /missions/{id}` on 1-waypoint missions; P50 ≤ 50ms over 100 sequential calls.
- NFT-PERF-02 F3 full cascade — `DELETE /missions/{id}` on `fixture_cascade_F3`-shaped missions; P50 ≤ 200ms over 50 sequential calls (provisional baseline).
- NFT-PERF-03 Health endpoint — `GET /health` P50 ≤ 10ms over 100 sequential calls.
- NFT-PERF-04 List pagination — `GET /missions?page=1&pageSize=20` P95 ≤ 100ms over 100 sequential calls against a 1000-mission seed (provisional baseline).
- Recording P50/P95 to CSV `Traces` column for trend tracking even when not gated.
- Performance suite is gated behind the `[Trait("Category","Perf")]` filter; standard CI gate excludes these.

### Excluded

- Concurrency / contention tests (race scenarios) live in Task 17 (NFT-RES-08).
- Resource consumption (RSS, FDs, connections) lives in Task 18 (NFT-RES-LIM).
- Production-hardware (Jetson Orin) latency baselines — documented as a follow-up in `restrictions.md` H8; test environment baselines stand in.
- Concurrent-client throughput / RPS — not in scope today; documented as Refactor Backlog.

## Acceptance Criteria

**AC-1: NFT-PERF-01 F3 minimal cascade P50 ≤ 50ms**
Given `missions` + `postgres-test` colocated on the same Docker network, `seed_one_default_vehicle` + 100 minimal missions (each with 1 waypoint, no media/annotations/detection/map_objects rows), AND 5 warm-up `DELETE` calls have completed on missions outside the measured set
When the consumer issues 100 sequential `DELETE /missions/{id_i}` calls (one per seeded mission, 1 ≤ i ≤ 100) and records per-call wall-clock latency
Then the P50 (median) of the 100 latencies is `≤ 50ms`
And P50 + P95 are recorded to the CSV `Traces` column as `P50_MS=<v1>, P95_MS=<v2>`

**AC-2: NFT-PERF-02 F3 full-chain cascade P50 ≤ 200ms**
Given 50 missions each with the `fixture_cascade_F3` chain (3 map_objects, 2 waypoints, 2 media, 2 annotations, 2 detection rows) AND 5 warm-up calls on additional fixtures outside the measured set
When the consumer issues 50 sequential `DELETE /missions/{id_i}` calls and records per-call wall-clock latency
Then P50 ≤ 200ms (provisional baseline — to be locked at `measured + 50%` on first green run)
And P50 + P95 recorded to CSV

**AC-3: NFT-PERF-03 health endpoint P50 ≤ 10ms**
Given `missions` running, no special seed, AND 5 warm-up `GET /health` calls
When the consumer issues 100 sequential `GET /health` calls (no `Authorization` header) and records per-call wall-clock latency
Then P50 ≤ 10ms
And P50 + P95 recorded to CSV

**AC-4: NFT-PERF-04 list pagination P95 ≤ 100ms (provisional)**
Given `seed_one_default_vehicle` + 1000 missions referencing it, AND 5 warm-up `GET /missions?page=1&pageSize=20` calls
When the consumer issues 100 sequential `GET /missions?page=1&pageSize=20` calls and records per-call wall-clock latency
Then P95 ≤ 100ms (provisional baseline — to be locked at `measured + 50%` on first green run)
And P50 + P95 recorded to CSV
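
All four criteria gate on a percentile and record `P50_MS` / `P95_MS`, but this task does not say which percentile definition to use. The following is a minimal sketch, assuming the nearest-rank definition (for 100 samples, P50 is the 50th sorted value and P95 the 95th). The names `LatencyStats`, `Percentile`, and `ToTraceString` are hypothetical, and the one-decimal formatting is inferred from the `P50_MS=23.4` example rather than specified.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical helper; not part of the documented test infrastructure.
public static class LatencyStats
{
    // Nearest-rank percentile (an assumption; the spec names no method):
    // the smallest sample such that at least p percent of all samples
    // are less than or equal to it.
    public static double Percentile(double[] samplesMs, double p)
    {
        if (samplesMs.Length == 0)
            throw new ArgumentException("no samples", nameof(samplesMs));
        if (p <= 0 || p > 100)
            throw new ArgumentOutOfRangeException(nameof(p));

        double[] sorted = samplesMs.OrderBy(x => x).ToArray();
        // 1-based rank; multiply before dividing to keep whole ranks exact.
        int rank = (int)Math.Ceiling(p * sorted.Length / 100.0);
        return sorted[rank - 1];
    }

    // Formats the trend values in the shape shown in the Outcome section,
    // e.g. "P50_MS=23.4, P95_MS=41.8". InvariantCulture keeps the decimal
    // point stable regardless of the runner's locale.
    public static string ToTraceString(double[] samplesMs)
    {
        double p50 = Percentile(samplesMs, 50);
        double p95 = Percentile(samplesMs, 95);
        return string.Format(
            CultureInfo.InvariantCulture,
            "P50_MS={0:F1}, P95_MS={1:F1}", p50, p95);
    }
}
```

Under this definition a 50-sample run (NFT-PERF-02) takes the 25th sorted value as P50 and the 48th as P95. If the project standardises on an interpolating percentile instead, only `Percentile` would need to change.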

## Non-Functional Requirements

**Performance**
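- NFT-PERF-01: ≤ 30s wall-clock (100 calls at roughly 50ms each is about 5s; the rest of the budget is headroom for slower calls, warm-up, and measurement overhead). The budget is declared as `[Trait("max_ms","30000")]`. xUnit traits are metadata only, so the limit must be enforced by the harness; xUnit's built-in per-test timeout is the `Timeout` property on `[Fact]`.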
- NFT-PERF-02: ≤ 60s wall-clock.
- NFT-PERF-03: ≤ 5s wall-clock.
- NFT-PERF-04: ≤ 30s wall-clock.

**Reliability**
- All tests SKIP if the runner cannot allocate ≥ 2 CPU cores and ≥ 2 GB free RAM (per `performance-tests.md` Notes). SKIP records `Result=skip` and `ErrorMessage=insufficient CPU/RAM`. Default CI runner spec must meet this — but degraded runners must not produce false-fail noise.
- All tests assume `missions` and `postgres-test` are colocated on the same Docker network (no inter-host link). The fixture verifies this by checking that `docker inspect missions-sut --format '{{.NetworkSettings.Networks.testnet.IPAddress}}'` returns a non-empty value.
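
The CPU/RAM precondition above could be checked along these lines. This is a sketch under stated assumptions: `RunnerCapacity` and its members are hypothetical names, "2 GB" is read as 2 GiB, and `GC.GetGCMemoryInfo().TotalAvailableMemoryBytes` is used as an approximation, since it reports memory available to the runtime (honouring container limits) rather than strictly free RAM. How the skip is surfaced to xUnit and written to the CSV is left out because this task does not describe it.

```csharp
using System;

// Hypothetical helper; not part of the documented test infrastructure.
public static class RunnerCapacity
{
    public const int MinCores = 2;
    public const long MinFreeBytes = 2L * 1024 * 1024 * 1024; // assumes 2 GB means 2 GiB
    public const string SkipMessage = "insufficient CPU/RAM";

    // Pure decision function so the rule can be unit-tested without
    // depending on the host the test happens to run on.
    public static bool ShouldSkip(int cores, long freeBytes) =>
        cores < MinCores || freeBytes < MinFreeBytes;

    // Applies the rule to the current host.
    public static bool ShouldSkipOnThisHost()
    {
        // Approximation: memory available to the runtime, not free RAM.
        long available = GC.GetGCMemoryInfo().TotalAvailableMemoryBytes;
        return ShouldSkip(Environment.ProcessorCount, available);
    }
}
```

Separating `ShouldSkip` from the host probe keeps the threshold logic deterministic; a more precise free-memory reading could replace the probe without touching the rule.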

## Blackbox Tests

| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|------------------------|-------------|-------------------|----------------|
| AC-1 | 100 minimal missions + 5 warm-ups | 100 sequential `DELETE /missions/{id}` | P50 ≤ 50ms; record P50/P95 | AC-3.6 |
| AC-2 | 50 F3-fixture missions + 5 warm-ups | 50 sequential `DELETE /missions/{id}` | P50 ≤ 200ms (provisional); record P50/P95 | AC-3.1, AC-3.6 |
| AC-3 | warm runtime + 5 warm-ups | 100 sequential `GET /health` | P50 ≤ 10ms; record P50/P95 | AC-7.3 |
| AC-4 | 1000 missions + 5 warm-ups | 100 sequential `GET /missions?page=1&pageSize=20` | P95 ≤ 100ms (provisional); record P50/P95 | AC-2.3 |

## Constraints

- Tests live in `Tests/Performance/` and are tagged `[Trait("Category","Perf")]` so the default CI gate excludes them.
- A separate `scripts/run-performance-tests.sh` (created by AZ-576) invokes only this category. The standard `scripts/run-tests.sh` skips them.
- Sequential single-client execution — no `Parallel.For` or `Task.WhenAll`; each call awaits the previous response.
- Warm-up calls are NOT included in the percentile computation. As marked by the `// Warmup` comment block in each test, the first 5 calls go to fixtures created specifically for warm-up (not the measured set).
- The `Stopwatch`-based timing measures `HttpClient.SendAsync` wall-clock; serialization/deserialization overhead is INCLUDED (this is what end-users observe).
- Provisional gates (NFT-PERF-02, NFT-PERF-04) are documented in source as `// PROVISIONAL — lock at measured + 50% on first green run` and `[Trait("provisional","yes")]`.
- AAA pattern with `// Arrange` (seed + warm-up), `// Act` (measured calls + percentile compute), `// Assert` (gate + CSV record).
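
The sequencing and timing constraints above can be sketched as a small helper. It is illustrative only: `SequentialTimer` and `MeasureAsync` are hypothetical names, and each call is passed in as a `Func<Task>` so the sketch stays independent of the real HTTP client setup.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical helper; not part of the documented test infrastructure.
public static class SequentialTimer
{
    // Runs the warm-up calls first (timings discarded), then the measured
    // calls strictly one at a time, returning per-call latency in ms.
    public static async Task<double[]> MeasureAsync(
        IReadOnlyList<Func<Task>> warmUpCalls,
        IReadOnlyList<Func<Task>> measuredCalls)
    {
        // Warmup: not included in the percentile computation.
        foreach (Func<Task> call in warmUpCalls)
        {
            await call();
        }

        var latenciesMs = new double[measuredCalls.Count];
        for (int i = 0; i < measuredCalls.Count; i++)
        {
            // Each call awaits the previous response before starting,
            // so no two requests are ever in flight together.
            Stopwatch sw = Stopwatch.StartNew();
            await measuredCalls[i]();
            sw.Stop();
            latenciesMs[i] = sw.Elapsed.TotalMilliseconds;
        }
        return latenciesMs;
    }
}
```

In a real test each delegate would wrap one request, for example `() => client.SendAsync(new HttpRequestMessage(HttpMethod.Delete, $"/missions/{id}"))`, where `client` and `id` come from the test's own arrange step. With the default completion option, `SendAsync` completes only after the response body is buffered, which matches the constraint that serialization overhead is included in the measurement.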

## Risks & Mitigation

**Risk 1: CI variance breaks tight P50 ≤ 10ms gate (NFT-PERF-03)**
- *Risk*: On a noisy-neighbour CI runner, even a static `/health` route can hiccup once per 100 calls; if the hiccup lands in the P50 region, the median exceeds 10ms.
- *Mitigation*: P50 is robust to single outliers (median position 50 of 100). If the test still flakes, lock the gate at `measured P50 + 50%` after the first green run.

**Risk 2: NFT-PERF-04 1000-mission seed overlaps with other tests' DB state**
- *Risk*: Seeding 1000 missions affects pagination tests, list-shape tests, and date-filter tests — if NFT-PERF-04 runs before them in the same SUT lifetime, results drift.
- *Mitigation*: NFT-PERF-04 lives in `[Collection("Perf1k")]` and uses `IClassFixture<DbResetFixture>` to TRUNCATE all rows before its seed AND restore `seed_empty` after. Functional tests' fixtures handle their own seed; no cross-pollination.

**Risk 3: Provisional gates accepted as locked gates**
- *Risk*: Same as NFT-RES-LIM Risk 3 — if first run measures 80ms and the test passes, future engineers see the 100ms gate as the standard.
- *Mitigation*: CI dashboards flag `measured / gate ratio > 0.8` for re-tuning. Lock-in workflow documented in `performance-tests.md`.
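
The arithmetic behind the lock-in rule and the dashboard flag can be made concrete. This is a sketch, assuming "measured + 50%" means 1.5 times the measured value; `ProvisionalGate`, `LockedGateMs`, and `NeedsRetune` are hypothetical names, and the actual workflow is the one documented in `performance-tests.md`.

```csharp
using System;

// Hypothetical helper; not part of the documented test infrastructure.
public static class ProvisionalGate
{
    // "measured + 50%" read as gate = measured * 1.5 (an assumption).
    public static double LockedGateMs(double measuredMs) => measuredMs * 1.5;

    // Dashboard rule from the mitigation: flag when measured / gate > 0.8.
    public static bool NeedsRetune(double measuredMs, double gateMs) =>
        measuredMs / gateMs > 0.8;
}
```

Using the figures from Risk 3: a first run of 80ms against the 100ms provisional gate gives a ratio of exactly 0.8, which sits on the threshold and is not flagged by a strict `> 0.8` comparison, while 81ms would be. Locking at 80ms yields a 120ms gate and a ratio of about 0.67.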

## System Under Test Boundary

- Tests drive the product through the public HTTP surface (`http://missions:8080`) plus Npgsql side-channel for seed setup. Bearer tokens (NFT-PERF-01, 02, 04) minted via `https://jwks-mock:8443/sign`; NFT-PERF-03 sends no Authorization header. Expected outputs are the documented latency thresholds from `_docs/02_document/tests/performance-tests.md`.
- Stubs are allowed ONLY for: the external `admin` JWT issuer (`jwks-mock` container) and the DB-only stub tables for `media`, `annotations`, `detection`, `map_objects`.
- Stubs, fakes, deterministic fallbacks, monkeypatches, or direct imports are NOT allowed for any internal product module — including the controllers, service classes, `AppDataConnection`, or any layer affecting response time. If any of these is not implemented, the test MUST fail/block as missing product implementation — it must not pass by replacing the module with a test stub.