annotations/_docs/02_tasks/todo/AZ-574_test_performance.md
Oleksandr Bezdieniezhnykh cf632d9e2e [AZ-563] Decompose blackbox tests into AZ-564..574 task specs
Step 5 of autodev existing-code flow. Epic AZ-563 plus 11 atomic
tasks covering all 67 test scenarios from
_docs/02_document/tests/* exactly once:

- AZ-564 test infrastructure (xUnit + Docker + mock JWKS + dataseed)
- AZ-565..568 functional positive (FT-P-01..22)
- AZ-569..570 functional negative (FT-N-01..16)
- AZ-571 security (NFT-SEC-01..10)
- AZ-572 resilience (NFT-RES-01..06)
- AZ-573 resource limits (NFT-RES-LIM-01..06)
- AZ-574 performance (NFT-PERF-*)

_dependencies_table.md records the cross-check vs traceability
matrix (22 + 16 + 29 = 67 scenarios, no overlaps, no gaps; deferred
items remain deferred per matrix). All task headers carry their
Jira IDs (tracker: jira). Autodev state advanced to Step 6
(Implement Tests).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-14 21:13:53 +03:00


Performance tests (NFT-PERF-*)

Task: AZ-574
Name: Performance tests
Description: Implement xUnit tests for the 7 performance scenarios: annotation create p95 latency (small + large), sustained writes throughput, FailsafeProducer drain rate, SSE delivery latency under fan-out, annotation listing at scale, dataset class distribution at scale.
Complexity: 3 points
Dependencies: AZ-564 (test infrastructure; depends on dataseed populating the 10k/50k bulk rows for the "at scale" tests)
Component: Blackbox Tests → Performance
Tracker: jira
Epic: AZ-563

Scenarios Covered

| Test ID | Source | What it asserts |
|---|---|---|
| NFT-PERF-LATENCY-01 | _docs/02_document/tests/performance-tests.md | POST /annotations p95 latency, small image (image_small.jpg): ≤ documented threshold (≤ 1500 ms per spec). |
| NFT-PERF-LATENCY-02 | same | POST /annotations p95 latency, large image (image_large.JPG, ~7 MB): ≤ documented threshold. |
| NFT-PERF-THROUGHPUT-01 | same | Sustained writes throughput: RPS over a 60-s window meets the documented threshold. |
| NFT-PERF-OUTBOX-DRAIN-01 | same | FailsafeProducer drain rate: outbox depth converges to 0 within the documented window after a burst. |
| NFT-PERF-SSE-FANOUT-01 | same | SSE delivery latency under modest fan-out (N=20 subscribers): p95 latency ≤ documented threshold. |
| NFT-PERF-LIST-01 | same | GET /annotations listing on populated DB (10k rows): p95 latency ≤ documented threshold. |
| NFT-PERF-DATASET-01 | same | Dataset class distribution at scale (50k detections): p95 latency ≤ documented threshold. |
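
The convergence check behind NFT-PERF-OUTBOX-DRAIN-01 can be sketched as a polling loop with a deadline. This is a minimal sketch in Python rather than xUnit, under stated assumptions: `wait_for_drain` and its parameters are hypothetical names, and `read_depth` stands in for whatever HTTP-visible signal the real test uses to observe outbox depth, which this spec does not define.

```python
import time


def wait_for_drain(read_depth, window_s, poll_interval_s=0.5,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll outbox depth until it reaches 0 or the window expires.

    Returns the elapsed seconds at which depth 0 was first observed,
    or None if the depth was still non-zero when the window ran out.
    `read_depth` is a hypothetical callable returning the current depth.
    """
    start = clock()
    while True:
        depth = read_depth()
        elapsed = clock() - start
        if depth == 0:
            return elapsed
        if elapsed >= window_s:
            return None
        sleep(poll_interval_s)
```

Injecting `clock` and `sleep` keeps the helper itself testable without real waiting; the perf test would pass the documented window from the threshold fixture as `window_s`.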

System Under Test Boundary

  • HTTP only.
  • p95 computed by the test from a sample of N requests (per-scenario sample size in the spec).
  • NFT-PERF-LIST-01 / NFT-PERF-DATASET-01 require dataseed to have populated the bulk rows (AZ-564 covers this).
  • Profile gate: E2E_RUN_PROFILE=performance enables the full-length tests; the standard functional profile skips the full runs (they are too long for the merge gate) and executes only the smoke variant described in AC-2.
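
The p95 computation mentioned in the list above can be sketched as follows. This is a minimal sketch in Python, not the xUnit implementation. It assumes the nearest-rank definition of a percentile, which the spec does not name, and `percentile` is a hypothetical helper.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Multiply before dividing so exact cases (e.g. 95 * 20 / 100) stay exact.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]
```

Under this definition a 10-sample run has rank 10 for p95, so the reported p95 is simply the slowest request. That is one reason a small sample is only useful as a sanity check.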

Acceptance Criteria

AC-1: Every perf scenario passes its threshold under the performance profile.

AC-2: Smoke variant runs in the standard profile
Given E2E_RUN_PROFILE=functional,
When the test runs,
Then a short smoke variant (e.g., 10 requests instead of 1000) executes and asserts only that p95 < 2× the threshold (a sanity check, not a perf gate).
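
The profile switch in AC-2 can be sketched as a small planning step that picks the sample size and the bound to assert. This is a minimal sketch in Python under stated assumptions: `PerfPlan`, `plan_for`, and `current_profile` are hypothetical names, the 1000/10 sample sizes are the example values from AC-2 rather than per-scenario values from the spec, and treating an unset or unknown profile as the smoke run is an assumption.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class PerfPlan:
    sample_size: int
    limit_ms: float
    is_perf_gate: bool

    def passes(self, p95_ms):
        # Full runs accept p95 equal to the threshold (<=);
        # smoke runs use the strict bound from AC-2 (< 2x threshold).
        if self.is_perf_gate:
            return p95_ms <= self.limit_ms
        return p95_ms < self.limit_ms


def plan_for(profile, threshold_ms, full_samples=1000, smoke_samples=10):
    """Choose the run shape for a scenario from the active profile."""
    if profile == "performance":
        return PerfPlan(full_samples, threshold_ms, True)
    # Assumption: any other profile gets the short sanity run.
    return PerfPlan(smoke_samples, 2 * threshold_ms, False)


def current_profile(environ=os.environ):
    # Assumption: an unset variable means the standard functional profile.
    return environ.get("E2E_RUN_PROFILE", "functional")
```

For example, `plan_for(current_profile(), 1500)` yields a 10-request run bounded by 3000 ms under the functional profile, and a 1000-request run bounded by 1500 ms under the performance profile.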

AC-3: Measurement uncertainty acknowledged
Given p95 is computed from a finite sample,
When the test reports its result,
Then the result includes the sample size, the measured p95, and the documented threshold. On failure, the test also writes a JSON report file to e2e-results/perf-<test_id>.json.
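
The report required by AC-3 can be sketched as below. This is a minimal sketch in Python under stated assumptions: only the output path pattern and the three reported values come from this spec, while the JSON field names and the helpers `build_report` and `write_failure_report` are hypothetical.

```python
import json
from pathlib import Path


def build_report(test_id, sample_size, p95_ms, threshold_ms):
    """Collect the values AC-3 asks every result to include."""
    return {
        "test_id": test_id,
        "sample_size": sample_size,
        "p95_ms": p95_ms,
        "threshold_ms": threshold_ms,
        "passed": p95_ms <= threshold_ms,
    }


def write_failure_report(report, results_dir="e2e-results"):
    """Write perf-<test_id>.json for a failed run; return its path.

    Returns None without touching the disk when the run passed.
    """
    if report["passed"]:
        return None
    out_dir = Path(results_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"perf-{report['test_id']}.json"
    path.write_text(json.dumps(report, indent=2), encoding="utf-8")
    return path
```

Building the report separately from writing it lets the same dictionary feed the test's assertion message, so the sample size and measured p95 appear in the test output whether or not a file is written.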

Constraints

  • AAA pattern.
  • [Trait("traces_to", "AC-F-10, AC-N-01")] plus per-test specific traces.
  • Perf tests run in their own xUnit collection so they don't block functional tests during interactive runs.
  • Performance thresholds come from results_report.md. Tests must not hard-code numbers; they read them from a fixture.
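
The last constraint can be sketched as a fixture that resolves thresholds by test ID and fails loudly when one is missing. This is a minimal sketch in Python under stated assumptions: the spec defines neither the layout of results_report.md nor the fixture's storage format, so the JSON shape, the `p95_ms` key, and the `ThresholdFixture` class are all hypothetical. The only value shown is the 1500 ms figure this spec quotes for NFT-PERF-LATENCY-01.

```python
import json


class ThresholdFixture:
    """Look up documented thresholds by test ID instead of hard-coding them."""

    def __init__(self, raw_json):
        # Assumed shape: {"<test id>": {"p95_ms": <number>}, ...}
        self._by_test_id = json.loads(raw_json)

    def p95_ms(self, test_id):
        entry = self._by_test_id.get(test_id)
        if entry is None or "p95_ms" not in entry:
            # A missing threshold is a test-setup error, not a silent pass.
            raise LookupError(f"no p95 threshold documented for {test_id}")
        return float(entry["p95_ms"])


# Illustrative fixture content; real values would be derived from results_report.md.
EXAMPLE_FIXTURE = '{"NFT-PERF-LATENCY-01": {"p95_ms": 1500}}'
```

Raising on a missing entry means a scenario cannot pass by accident when its threshold has not been documented yet.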