missions/_docs/tasks/todo/AZ-586_test_performance.md
Oleksandr Bezdieniezhnykh b0c7132889 [AZ-575] Add 11 blackbox test task specs from decompose Step 5
Decompose Step 5 (tests-only mode) produced the test-task ladder for
the Blackbox Tests epic. Test infrastructure (AZ-576) blocks the rest;
all 10 blackbox child tasks fan out from it.

Tasks (epic AZ-575):
- AZ-576 test_infrastructure (5 SP)
- AZ-577 test_vehicles_positive (5 SP)
- AZ-578 test_missions_positive (5 SP)
- AZ-579 test_waypoints_health_positive (5 SP)
- AZ-580 test_validation_authz_negative (3 SP)
- AZ-581 test_security_auth_claims (5 SP)
- AZ-582 test_security_alg_rotation_cors (5 SP)
- AZ-583 test_resilience_cascade_migrator (3 SP)
- AZ-584 test_resilience_config_db_rotation_race (5 SP)
- AZ-585 test_resource_limits (3 SP)
- AZ-586 test_performance (3 SP)

Total: 45 SP across 11 tasks. Coverage verified against
blackbox/security/resilience/resource-limit/performance test specs
(56 scenarios). _docs/_autodev_state.md advanced to Step 6 (Implement
Tests).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-15 06:37:00 +03:00


Performance Tests

Task: AZ-586_test_performance
Name: Performance tests (NFT-PERF-01..04)
Description: Implement xUnit blackbox tests for the 4 performance scenarios: F3 cascade-delete P50 ≤ 50ms on a 1-waypoint mission, F3 cascade-delete P50 ≤ 200ms on the full chain (provisional baseline; lock after first green run), GET /health P50 ≤ 10ms, and GET /missions?page=1&pageSize=20 P95 ≤ 100ms against a 1000-mission seed (provisional baseline). Every test runs 5 warm-up calls + the documented N measured calls; cold-start passes excluded.
Complexity: 3 points
Dependencies: AZ-576_test_infrastructure
Component: Blackbox Tests
Tracker: AZ-586
Epic: AZ-575

Problem

Three latency thresholds are documented (AC-3.6: P50 ≤ 50ms for the minimal cascade; AC-7.3: P50 ≤ 10ms for health; AC-2.3: implicit list latency), and one (NFT-PERF-02, the full-chain cascade) is a baseline that subsequent runs must not regress by more than 50%. Without these tests, an unintentional N+1 query, a missing index, or accidental serialization overhead could silently increase response time tenfold before the next manual benchmark catches it. The full-chain cascade test matters most because the F3 cascade walks 5 dependency tables, so a future indexing regression or an added transaction wrapper would show up here first.

Outcome

  • All four NFT-PERF-01..04 scenarios run and pass against the dockerised missions service.
  • Each test produces a CSV row with Category=Perf, Traces set to the matching reference(s) (AC-3.6 / AC-3.1 / AC-7.3 / AC-2.3), and Result=pass, and also records its numeric P50 and P95 values in the Traces column (e.g., P50_MS=23.4, P95_MS=41.8).
  • 5 warm-up calls precede every measured set; cold-start passes are excluded from the percentile computation.
  • All tests run sequentially against a single client (no concurrent connections) so HTTP/1.1 connection-reuse and JIT warm-up are deterministic.
  • Tests run only when the [Trait("Category","Perf")] filter is active (the default test suite filter excludes performance tests to keep the standard CI gate ≤ 15 min); a separate scripts/run-performance-tests.sh invocation runs them.
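
The warm-up exclusion and percentile reporting described in this list can be sketched as follows. This is a minimal illustration in Python, not the xUnit implementation; it assumes the nearest-rank percentile definition, which the spec does not name, and `percentile` and `summarize` are hypothetical helper names.

```python
import math


def percentile(samples_ms, pct):
    """Nearest-rank percentile (assumed method): the smallest sample such that
    at least pct percent of all samples are less than or equal to it."""
    if not samples_ms:
        raise ValueError("no samples to summarize")
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(all_latencies_ms, warmup_count=5):
    """Drop the leading warm-up calls, then report P50 and P95 of the rest."""
    measured = all_latencies_ms[warmup_count:]
    return {
        "P50_MS": percentile(measured, 50),
        "P95_MS": percentile(measured, 95),
    }
```

Under this definition, a set of 100 measured calls takes the 50th and 95th sorted values, and a set of 50 measured calls (as in NFT-PERF-02) takes the 25th and 48th.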

Scope

Included

  • NFT-PERF-01 F3 minimal cascade — DELETE /missions/{id} on 1-waypoint missions; P50 ≤ 50ms over 100 sequential calls.
  • NFT-PERF-02 F3 full cascade — DELETE /missions/{id} on fixture_cascade_F3-shaped missions; P50 ≤ 200ms over 50 sequential calls (provisional baseline).
  • NFT-PERF-03 Health endpoint — GET /health P50 ≤ 10ms over 100 sequential calls.
  • NFT-PERF-04 List pagination — GET /missions?page=1&pageSize=20 P95 ≤ 100ms over 100 sequential calls against a 1000-mission seed (provisional baseline).
  • Recording P50/P95 to CSV Traces column for trend tracking even when not gated.
  • Performance suite is gated behind the [Trait("Category","Perf")] filter; standard CI gate excludes these.

Excluded

  • Concurrency / contention tests (race scenarios) live in Task 17 (NFT-RES-08).
  • Resource consumption (RSS, FDs, connections) lives in Task 18 (NFT-RES-LIM).
  • Production-hardware (Jetson Orin) latency baselines — documented as a follow-up in restrictions.md H8; test environment baselines stand in.
  • Concurrent-client throughput / RPS — not in scope today; documented as Refactor Backlog.

Acceptance Criteria

AC-1: NFT-PERF-01 F3 minimal cascade P50 ≤ 50ms
Given missions + postgres-test colocated on the same Docker network, seed_one_default_vehicle + 100 minimal missions (each with 1 waypoint, no media/annotations/detection/map_objects rows), AND 5 warm-up DELETE calls have completed on missions outside the measured set
When the consumer issues 100 sequential DELETE /missions/{id_i} calls (one per seeded mission, 1 ≤ i ≤ 100) and records per-call wall-clock latency
Then the P50 (median) of the 100 latencies is ≤ 50ms
And P50 + P95 are recorded to the CSV Traces column as P50_MS=<v1>, P95_MS=<v2>

AC-2: NFT-PERF-02 F3 full-chain cascade P50 ≤ 200ms
Given 50 missions each with the fixture_cascade_F3 chain (3 map_objects, 2 waypoints, 2 media, 2 annotations, 2 detection rows) AND 5 warm-up calls on additional fixtures outside the measured set
When the consumer issues 50 sequential DELETE /missions/{id_i} calls and records per-call wall-clock latency
Then P50 ≤ 200ms (provisional baseline, to be locked at measured + 50% on first green run)
And P50 + P95 recorded to CSV

AC-3: NFT-PERF-03 health endpoint P50 ≤ 10ms
Given missions running, no special seed, AND 5 warm-up GET /health calls
When the consumer issues 100 sequential GET /health calls (no Authorization header) and records per-call wall-clock latency
Then P50 ≤ 10ms
And P50 + P95 recorded to CSV

AC-4: NFT-PERF-04 list pagination P95 ≤ 100ms (provisional)
Given seed_one_default_vehicle + 1000 missions referencing it, AND 5 warm-up GET /missions?page=1&pageSize=20 calls
When the consumer issues 100 sequential GET /missions?page=1&pageSize=20 calls and records per-call wall-clock latency
Then P95 ≤ 100ms (provisional baseline, to be locked at measured + 50% on first green run)
And P50 + P95 recorded to CSV
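
The "lock at measured + 50%" rule for the two provisional gates (AC-2 and AC-4) is simple arithmetic. It is shown here as a hedged Python sketch. The function names and the 120ms measurement are hypothetical, chosen only to make the calculation concrete.

```python
def locked_gate_ms(measured_ms, headroom=0.5):
    """Gate to lock in after the first green run: the measured value plus 50%."""
    if measured_ms <= 0:
        raise ValueError("measured latency must be positive")
    return measured_ms * (1 + headroom)


def exceeds_gate(current_ms, gate_ms):
    """A later run fails only when it is slower than the locked gate."""
    return current_ms > gate_ms


# Hypothetical first green run for NFT-PERF-02: P50 measured at 120ms,
# so the provisional 200ms gate would be replaced by a locked 180ms gate.
gate = locked_gate_ms(120)
```

With that hypothetical lock, a later run at 170ms still passes, while a run at 190ms fails even though it is under the original provisional 200ms.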

Non-Functional Requirements

Performance

  • NFT-PERF-01: ≤ 30s wall-clock (100 calls at a median of ≤ 50ms is roughly 5s, leaving headroom for slow outliers and measurement overhead). Per the [Trait("max_ms","30000")] xUnit timeout.
  • NFT-PERF-02: ≤ 60s wall-clock.
  • NFT-PERF-03: ≤ 5s wall-clock.
  • NFT-PERF-04: ≤ 30s wall-clock.

Reliability

  • All tests SKIP if the runner cannot allocate ≥ 2 CPU cores and ≥ 2 GB free RAM (per performance-tests.md Notes). SKIP records Result=skip and ErrorMessage=insufficient CPU/RAM. Default CI runner spec must meet this — but degraded runners must not produce false-fail noise.
  • All tests assume missions and postgres-test are colocated on the same Docker network (no inter-host link). The fixture verifies this by checking that docker inspect missions-sut --format '{{.NetworkSettings.Networks.testnet.IPAddress}}' returns a non-empty value.
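
The SKIP rule for under-resourced runners can be sketched as a small guard. This is an illustrative Python version under stated assumptions: "2 GB" is read as 2 GiB, free RAM is taken from the Linux `MemAvailable` field, and `skip_reason` and `probe_runner` are hypothetical names rather than part of the test infrastructure.

```python
import os

MIN_CORES = 2
MIN_FREE_RAM_BYTES = 2 * 1024**3  # "2 GB" interpreted as 2 GiB (assumption)


def skip_reason(cpu_cores, free_ram_bytes):
    """Return the ErrorMessage for a Result=skip row, or None to run the test."""
    if cpu_cores < MIN_CORES or free_ram_bytes < MIN_FREE_RAM_BYTES:
        return "insufficient CPU/RAM"
    return None


def probe_runner(meminfo_path="/proc/meminfo"):
    """Linux-only probe returning (core count, available RAM in bytes).

    os.cpu_count() reports host CPUs and does not account for container
    CPU limits, so a real fixture may need a stricter check.
    """
    cores = os.cpu_count() or 0
    with open(meminfo_path) as meminfo:
        for line in meminfo:
            if line.startswith("MemAvailable:"):
                return cores, int(line.split()[1]) * 1024  # value is in kB
    return cores, 0
```

Keeping the decision in a pure function separate from the probe lets the skip logic be checked without depending on the machine it runs on.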

Blackbox Tests

| AC Ref | Initial Data/Conditions | What to Test | Expected Behavior | NFR References |
|--------|-------------------------|--------------|-------------------|----------------|
| AC-1 | 100 minimal missions + 5 warm-ups | 100 sequential DELETE /missions/{id} | P50 ≤ 50ms; record P50/P95 | AC-3.6 |
| AC-2 | 50 F3-fixture missions + 5 warm-ups | 50 sequential DELETE /missions/{id} | P50 ≤ 200ms (provisional); record P50/P95 | AC-3.1, AC-3.6 |
| AC-3 | warm runtime + 5 warm-ups | 100 sequential GET /health | P50 ≤ 10ms; record P50/P95 | AC-7.3 |
| AC-4 | 1000 missions + 5 warm-ups | 100 sequential GET /missions?page=1&pageSize=20 | P95 ≤ 100ms (provisional); record P50/P95 | AC-2.3 |

Constraints

  • Tests live in Tests/Performance/ and are tagged [Trait("Category","Perf")] so the default CI gate excludes them.
  • A separate scripts/run-performance-tests.sh (created by AZ-576) invokes only this category. The standard scripts/run-tests.sh skips them.
  • Sequential single-client execution — no Parallel.For or Task.WhenAll; each call awaits the previous response.
  • Warm-up calls are NOT included in the percentile computation. As marked by the // Warmup comment block in the test, the first 5 calls go to fixtures created specifically for warm-up (not the measured set).
  • The Stopwatch-based timing measures HttpClient.SendAsync wall-clock; serialization/deserialization overhead is INCLUDED (this is what end-users observe).
  • Provisional gates (NFT-PERF-02, NFT-PERF-04) are documented in source as // PROVISIONAL — lock at measured + 50% on first green run and [Trait("provisional","yes")].
  • AAA pattern with // Arrange (seed + warm-up), // Act (measured calls + percentile compute), // Assert (gate + CSV record).
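
The sequential, single-client measurement loop implied by these constraints can be sketched as below. This is a Python illustration of the control flow only. `send` is a hypothetical blocking callable standing in for an awaited HttpClient.SendAsync, and `time.perf_counter` stands in for Stopwatch.

```python
import time


def measure_sequential(send, warmup_targets, measured_targets):
    """Run warm-up calls untimed, then time each measured call one at a time.

    Each call completes before the next one starts, so there is never more
    than one request in flight.
    """
    # Warm-up: separate targets, never part of the measured set.
    for target in warmup_targets:
        send(target)

    latencies_ms = []
    for target in measured_targets:
        start = time.perf_counter()
        send(target)  # includes whatever the client does to send and receive
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms
```

The returned list contains only the measured calls, so it can be passed directly to the percentile computation without any further trimming.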

Risks & Mitigation

Risk 1: CI variance breaks tight P50 ≤ 10ms gate (NFT-PERF-03)

  • Risk: On a noisy-neighbour CI runner, even a static /health route can hiccup once per 100 calls; if the hiccup lands in the P50 region, the median exceeds 10ms.
  • Mitigation: P50 is robust to single outliers (median position 50 of 100). If the test still flakes, lock the gate at measured P50 + 50% after the first green run.
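
The robustness claim can be checked numerically. The sketch below uses made-up latencies (4ms for a normal call, 250ms for a hiccup) purely for illustration, and Python's `statistics.median`, which averages the two middle values of an even-sized sample.

```python
import statistics

NORMAL_MS = 4    # hypothetical steady-state /health latency
HICCUP_MS = 250  # hypothetical noisy-neighbour stall


def p50_with_hiccups(hiccup_count, total_calls=100):
    """Median of a run in which hiccup_count calls stall and the rest are normal."""
    samples = [NORMAL_MS] * (total_calls - hiccup_count) + [HICCUP_MS] * hiccup_count
    return statistics.median(samples)


one_hiccup = p50_with_hiccups(1)     # a single stall per 100 calls
many_hiccups = p50_with_hiccups(49)  # still a minority of the calls
half_hiccups = p50_with_hiccups(50)  # half of the calls stall
```

In this model the median stays at the normal latency until half of the calls stall, so a flaky P50 gate points to a broadly slow runner rather than to isolated hiccups.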

Risk 2: NFT-PERF-04 1000-mission seed overlaps with other tests' DB state

  • Risk: Seeding 1000 missions affects pagination tests, list-shape tests, and date-filter tests — if NFT-PERF-04 runs before them in the same SUT lifetime, results drift.
  • Mitigation: NFT-PERF-04 lives in [Collection("Perf1k")] and uses IClassFixture<DbResetFixture> to TRUNCATE all rows before its seed AND restore seed_empty after. Functional tests' fixtures handle their own seed; no cross-pollination.

Risk 3: Provisional gates accepted as locked gates

  • Risk: Same as NFT-RES-LIM Risk 3 — if first run measures 80ms and the test passes, future engineers see the 100ms gate as the standard.
  • Mitigation: CI dashboards flag measured / gate ratio > 0.8 for re-tuning. Lock-in workflow documented in performance-tests.md.
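
The dashboard rule from the mitigation reduces to one comparison. It is sketched here in Python with a hypothetical function name, using the strict "> 0.8" stated in the text.

```python
def needs_retune(measured_ms, gate_ms, threshold=0.8):
    """Flag a gate for re-tuning when the measured value uses more than
    80% of the gate's budget."""
    if gate_ms <= 0:
        raise ValueError("gate must be positive")
    return measured_ms / gate_ms > threshold
```

With the strict comparison, the 80ms-against-100ms example from the risk description sits exactly on the boundary and is not flagged, so the comparison may need to be inclusive if that case is meant to trigger a review.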

System Under Test Boundary

  • Tests drive the product through the public HTTP surface (http://missions:8080) plus Npgsql side-channel for seed setup. Bearer tokens (NFT-PERF-01, 02, 04) minted via https://jwks-mock:8443/sign; NFT-PERF-03 sends no Authorization header. Expected outputs are the documented latency thresholds from _docs/02_document/tests/performance-tests.md.
  • Stubs are allowed ONLY for: the external admin JWT issuer (jwks-mock container) and the DB-only stub tables for media, annotations, detection, map_objects.
  • Stubs, fakes, deterministic fallbacks, monkeypatches, or direct imports are NOT allowed for any internal product module — including the controllers, service classes, AppDataConnection, or any layer affecting response time. If any of these is not implemented, the test MUST fail/block as missing product implementation — it must not pass by replacing the module with a test stub.