Decompose Step 5 (tests-only mode) produced the test-task ladder for the Blackbox Tests epic. Test infrastructure (AZ-576) blocks the rest; all 10 blackbox child tasks fan out from it.

Tasks (epic AZ-575):
- AZ-576 test_infrastructure (5 SP)
- AZ-577 test_vehicles_positive (5 SP)
- AZ-578 test_missions_positive (5 SP)
- AZ-579 test_waypoints_health_positive (5 SP)
- AZ-580 test_validation_authz_negative (3 SP)
- AZ-581 test_security_auth_claims (5 SP)
- AZ-582 test_security_alg_rotation_cors (5 SP)
- AZ-583 test_resilience_cascade_migrator (3 SP)
- AZ-584 test_resilience_config_db_rotation_race (5 SP)
- AZ-585 test_resource_limits (3 SP)
- AZ-586 test_performance (3 SP)

Total: 45 SP across 11 tasks. Coverage verified against blackbox/security/resilience/resource-limit/performance test specs (56 scenarios). _docs/_autodev_state.md advanced to Step 6 (Implement Tests).

Co-authored-by: Cursor <cursoragent@cursor.com>
Performance Tests
Task: AZ-586_test_performance
Name: Performance tests (NFT-PERF-01..04)
Description: Implement xUnit blackbox tests for the 4 performance scenarios — F3 cascade-delete P50 ≤ 50ms on a 1-waypoint mission, F3 cascade-delete P50 ≤ 200ms on the full chain (provisional baseline; lock after first green run), GET /health P50 ≤ 10ms, and GET /missions?page=1&pageSize=20 P95 ≤ 100ms against a 1000-mission seed (provisional baseline). Every test runs 5 warm-up calls + the documented N measured calls; cold-start passes excluded.
Complexity: 3 points
Dependencies: AZ-576_test_infrastructure
Component: Blackbox Tests
Tracker: AZ-586
Epic: AZ-575
Problem
Three latency thresholds are documented (AC-3.6 P50 ≤ 50ms for minimal cascade, AC-7.3 P50 ≤ 10ms for health, AC-2.3 implicit list latency), and one (NFT-PERF-02 full-chain cascade) is a baseline that subsequent runs must not regress by more than 50%. Without these tests, an unintentional N+1 query, a missing index, or accidental serialization-layer overhead could silently increase response time tenfold before the next manual perf benchmark catches it. The full-chain cascade test is especially load-bearing because the F3 cascade walks 5 dependency tables: a future indexing regression or transaction-wrap addition would show up here first.
Outcome
- All four NFT-PERF-01..04 scenarios run and pass against the dockerised missions service.
- Each test produces a CSV row with Category=Perf, Traces=AC-3.6/AC-3.1/AC-7.3/AC-2.3, Result=pass, AND records P50 and P95 numeric values in the Traces column (e.g., P50_MS=23.4, P95_MS=41.8).
- 5 warm-up calls precede every measured set; cold-start passes are excluded from the percentile computation.
- All tests run sequentially against a single client (no concurrent connections) so HTTP/1.1 connection-reuse and JIT warm-up are deterministic.
- Tests run only when the [Trait("Category","Perf")] filter is active (the default test suite filter excludes performance to keep the standard CI gate ≤ 15 min); a separate scripts/run-performance-tests.sh invocation runs them.
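The percentile and Traces-column requirements above can be sketched as a small helper. This is a minimal sketch, not the project's implementation: the class name LatencyStats and both method names are hypothetical, and the nearest-rank percentile method is an assumption, since the task does not say which percentile definition the suite uses.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical helper; names are illustrative only.
public static class LatencyStats
{
    // Nearest-rank percentile (assumed method): the smallest sample such that
    // at least p percent of all samples are less than or equal to it.
    public static double Percentile(double[] samplesMs, double p)
    {
        if (samplesMs.Length == 0)
            throw new ArgumentException("no samples", nameof(samplesMs));
        if (p <= 0 || p > 100)
            throw new ArgumentOutOfRangeException(nameof(p));

        double[] sorted = samplesMs.OrderBy(x => x).ToArray();
        // 1-based rank; for 100 samples P50 is the 50th and P95 the 95th smallest.
        int rank = (int)Math.Ceiling(p * sorted.Length / 100.0);
        return sorted[rank - 1];
    }

    // Formats the pair in the shape of the task's example: "P50_MS=23.4, P95_MS=41.8".
    public static string FormatTraces(double p50Ms, double p95Ms) =>
        string.Format(CultureInfo.InvariantCulture,
                      "P50_MS={0:F1}, P95_MS={1:F1}", p50Ms, p95Ms);
}
```

Only the measured latencies would be passed to Percentile; warm-up latencies are never collected, which keeps cold-start passes out of the computation. The invariant culture keeps the decimal separator stable regardless of the runner's locale.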
Scope
Included
- NFT-PERF-01 F3 minimal cascade: DELETE /missions/{id} on 1-waypoint missions; P50 ≤ 50ms over 100 sequential calls.
- NFT-PERF-02 F3 full cascade: DELETE /missions/{id} on fixture_cascade_F3-shaped missions; P50 ≤ 200ms over 50 sequential calls (provisional baseline).
- NFT-PERF-03 Health endpoint: GET /health P50 ≤ 10ms over 100 sequential calls.
- NFT-PERF-04 List pagination: GET /missions?page=1&pageSize=20 P95 ≤ 100ms over 100 sequential calls against a 1000-mission seed (provisional baseline).
- Recording P50/P95 to the CSV Traces column for trend tracking even when not gated.
- Performance suite is gated behind the [Trait("Category","Perf")] filter; standard CI gate excludes these.
Excluded
- Concurrency / contention tests (race scenarios) live in Task 17 (NFT-RES-08).
- Resource consumption (RSS, FDs, connections) lives in Task 18 (NFT-RES-LIM).
- Production-hardware (Jetson Orin) latency baselines: documented as a follow-up in restrictions.md H8; test environment baselines stand in.
- Concurrent-client throughput / RPS: not in scope today; documented as Refactor Backlog.
Acceptance Criteria
AC-1: NFT-PERF-01 F3 minimal cascade P50 ≤ 50ms
Given missions + postgres-test colocated on the same Docker network, seed_one_default_vehicle + 100 minimal missions (each with 1 waypoint, no media/annotations/detection/map_objects rows), AND 5 warm-up DELETE calls have completed on missions outside the measured set
When the consumer issues 100 sequential DELETE /missions/{id_i} calls (one per seeded mission, 1 ≤ i ≤ 100) and records per-call wall-clock latency
Then the P50 (median) of the 100 latencies is ≤ 50ms
And P50 + P95 are recorded to the CSV Traces column as P50_MS=<v1>, P95_MS=<v2>
AC-2: NFT-PERF-02 F3 full-chain cascade P50 ≤ 200ms
Given 50 missions each with the fixture_cascade_F3 chain (3 map_objects, 2 waypoints, 2 media, 2 annotations, 2 detection rows) AND 5 warm-up calls on additional fixtures outside the measured set
When the consumer issues 50 sequential DELETE /missions/{id_i} calls and records per-call wall-clock latency
Then P50 ≤ 200ms (provisional baseline — to be locked at measured + 50% on first green run)
And P50 + P95 recorded to CSV
AC-3: NFT-PERF-03 health endpoint P50 ≤ 10ms
Given missions running, no special seed, AND 5 warm-up GET /health calls
When the consumer issues 100 sequential GET /health calls (no Authorization header) and records per-call wall-clock latency
Then P50 ≤ 10ms
And P50 + P95 recorded to CSV
AC-4: NFT-PERF-04 list pagination P95 ≤ 100ms (provisional)
Given seed_one_default_vehicle + 1000 missions referencing it, AND 5 warm-up GET /missions?page=1&pageSize=20 calls
When the consumer issues 100 sequential GET /missions?page=1&pageSize=20 calls and records per-call wall-clock latency
Then P95 ≤ 100ms (provisional baseline — to be locked at measured + 50% on first green run)
And P50 + P95 recorded to CSV
Non-Functional Requirements
Performance
- NFT-PERF-01: ≤ 30s wall-clock (100 calls × ≤ 50ms each + measurement overhead). Per [Trait("max_ms","30000")] xUnit timeout.
- NFT-PERF-02: ≤ 60s wall-clock.
- NFT-PERF-03: ≤ 5s wall-clock.
- NFT-PERF-04: ≤ 30s wall-clock.
Reliability
- All tests SKIP if the runner cannot allocate ≥ 2 CPU cores and ≥ 2 GB free RAM (per performance-tests.md Notes). SKIP records Result=skip and ErrorMessage=insufficient CPU/RAM. Default CI runner spec must meet this, but degraded runners must not produce false-fail noise.
- All tests assume missions and postgres-test are colocated on the same Docker network (no inter-host link). The fixture verifies this by checking that docker inspect missions-sut --format '{{.NetworkSettings.Networks.testnet.IPAddress}}' returns a non-empty value.
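The CPU/RAM precondition above could be checked along these lines. This is a sketch under stated assumptions: the class RunnerCapacity and its members are hypothetical, "2 GB" is read as 2 GiB, and GC.GetGCMemoryInfo().TotalAvailableMemoryBytes reports the memory ceiling visible to the .NET runtime (it honours container limits) rather than memory that is currently free, so it is only an approximation of the documented rule.

```csharp
using System;

// Hypothetical helper; names are illustrative only.
public static class RunnerCapacity
{
    public const int MinCpuCores = 2;
    public const long MinMemoryBytes = 2L * 1024 * 1024 * 1024; // assumed: 2 GiB
    public const string SkipMessage = "insufficient CPU/RAM";

    // Pure decision function, so the rule can be checked without touching the host.
    public static bool MeetsMinimum(int cpuCores, long availableBytes) =>
        cpuCores >= MinCpuCores && availableBytes >= MinMemoryBytes;

    // Reads the values the runtime exposes for the current process.
    public static bool HostMeetsMinimum()
    {
        // Approximation: memory available to the GC, not strictly free RAM.
        long available = GC.GetGCMemoryInfo().TotalAvailableMemoryBytes;
        return MeetsMinimum(Environment.ProcessorCount, available);
    }
}
```

When HostMeetsMinimum() returns false, the test would record Result=skip with SkipMessage as the ErrorMessage instead of failing. A stricter free-memory check would need an OS-specific read, which is left out here.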
Blackbox Tests
| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|---|---|---|---|---|
| AC-1 | 100 minimal missions + 5 warm-ups | 100 sequential DELETE /missions/{id} | P50 ≤ 50ms; record P50/P95 | AC-3.6 |
| AC-2 | 50 F3-fixture missions + 5 warm-ups | 50 sequential DELETE /missions/{id} | P50 ≤ 200ms (provisional); record P50/P95 | AC-3.1, AC-3.6 |
| AC-3 | warm runtime + 5 warm-ups | 100 sequential GET /health | P50 ≤ 10ms; record P50/P95 | AC-7.3 |
| AC-4 | 1000 missions + 5 warm-ups | 100 sequential GET /missions?page=1&pageSize=20 | P95 ≤ 100ms (provisional); record P50/P95 | AC-2.3 |
Constraints
- Tests live in Tests/Performance/ and are tagged [Trait("Category","Perf")] so the default CI gate excludes them.
- A separate scripts/run-performance-tests.sh (created by AZ-576) invokes only this category. The standard scripts/run-tests.sh skips them.
- Sequential single-client execution: no Parallel.For or Task.WhenAll; each call awaits the previous response.
- Warm-up calls are NOT included in the percentile computation. Per the // Warmup comment block in the test, the first 5 calls go to fixtures created specifically for warm-up (not the measured set).
- The Stopwatch-based timing measures HttpClient.SendAsync wall-clock; serialization/deserialization overhead is INCLUDED (this is what end-users observe).
- Provisional gates (NFT-PERF-02, NFT-PERF-04) are documented in source as // PROVISIONAL — lock at measured + 50% on first green run and [Trait("provisional","yes")].
- AAA pattern with // Arrange (seed + warm-up), // Act (measured calls + percentile compute), // Assert (gate + CSV record).
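The warm-up and sequential-timing constraints above can be sketched as one measurement loop. This is a minimal sketch rather than the suite's actual code: SequentialTimer and MeasureAsync are hypothetical names, and the calls are passed in as delegates so the loop can be exercised without a running service.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical helper; names are illustrative only.
public static class SequentialTimer
{
    // Runs the warm-up calls untimed, then times each measured call on its own.
    // Every call is awaited before the next one starts, so at most one request
    // is in flight at any time (no Parallel.For, no Task.WhenAll).
    public static async Task<double[]> MeasureAsync(
        IReadOnlyList<Func<Task>> warmUpCalls,
        IReadOnlyList<Func<Task>> measuredCalls)
    {
        // Warmup: results are discarded and never enter the latency array.
        foreach (Func<Task> call in warmUpCalls)
            await call();

        var latenciesMs = new double[measuredCalls.Count];
        for (int i = 0; i < measuredCalls.Count; i++)
        {
            var stopwatch = Stopwatch.StartNew();
            await measuredCalls[i]();
            stopwatch.Stop();
            latenciesMs[i] = stopwatch.Elapsed.TotalMilliseconds;
        }
        return latenciesMs;
    }
}
```

In a test, each delegate would wrap one HttpClient.SendAsync call on a shared client, for example a DELETE against /missions/{id} for one seeded mission, so the stopwatch covers the same wall-clock span the constraint describes. The returned array feeds the percentile computation in the Act step.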
Risks & Mitigation
Risk 1: CI variance breaks tight P50 ≤ 10ms gate (NFT-PERF-03)
- Risk: On a noisy-neighbour CI runner, even a static /health route can hiccup once per 100 calls; if the hiccup lands in the P50 region, the median exceeds 10ms.
- Mitigation: P50 is robust to single outliers (median position 50 of 100). If the test still flakes, lock the gate at measured P50 + 50% after the first green run.
Risk 2: NFT-PERF-04 1000-mission seed overlaps with other tests' DB state
- Risk: Seeding 1000 missions affects pagination tests, list-shape tests, and date-filter tests — if NFT-PERF-04 runs before them in the same SUT lifetime, results drift.
- Mitigation: NFT-PERF-04 lives in [Collection("Perf1k")] and uses IClassFixture<DbResetFixture> to TRUNCATE all rows before its seed AND restore seed_empty after. Functional tests' fixtures handle their own seed; no cross-pollination.
Risk 3: Provisional gates accepted as locked gates
- Risk: Same as NFT-RES-LIM Risk 3 — if first run measures 80ms and the test passes, future engineers see the 100ms gate as the standard.
- Mitigation: CI dashboards flag measured / gate ratio > 0.8 for re-tuning. Lock-in workflow documented in performance-tests.md.
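The lock-in arithmetic and the dashboard flag can be illustrated with a small sketch. ProvisionalGate and its method names are hypothetical; the 50% margin and the 0.8 ratio are taken from this task, while the actual workflow lives in performance-tests.md and may differ.

```csharp
using System;

// Hypothetical helper; names are illustrative only.
public static class ProvisionalGate
{
    // Gate to lock after the first green run: the measured value plus 50%.
    public static double LockedGateMs(double measuredMs) => measuredMs * 1.5;

    // True when the measured value exceeds 80% of the gate, which is the
    // condition the CI dashboards flag for re-tuning.
    public static bool NeedsRetune(double measuredMs, double gateMs) =>
        measuredMs / gateMs > 0.8;
}
```

For example, a first green run that measures 80ms would lock the gate at 120ms, and a later run measuring 90ms against a 100ms gate would be flagged for re-tuning.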
System Under Test Boundary
- Tests drive the product through the public HTTP surface (http://missions:8080) plus Npgsql side-channel for seed setup. Bearer tokens (NFT-PERF-01, 02, 04) minted via https://jwks-mock:8443/sign; NFT-PERF-03 sends no Authorization header. Expected outputs are the documented latency thresholds from _docs/02_document/tests/performance-tests.md.
- Stubs are allowed ONLY for: the external admin JWT issuer (jwks-mock container) and the DB-only stub tables for media, annotations, detection, map_objects.
- Stubs, fakes, deterministic fallbacks, monkeypatches, or direct imports are NOT allowed for any internal product module, including the controllers, service classes, AppDataConnection, or any layer affecting response time. If any of these is not implemented, the test MUST fail/block as missing product implementation; it must not pass by replacing the module with a test stub.