Mirror of https://github.com/azaion/annotations.git (synced 2026-06-21 10:21:07 +00:00)
[AZ-563] Decompose blackbox tests into AZ-564..574 task specs

Step 5 of autodev existing-code flow. Epic AZ-563 plus 11 atomic tasks covering all 67 test scenarios from _docs/02_document/tests/* exactly once:

- AZ-564 test infrastructure (xUnit + Docker + mock JWKS + dataseed)
- AZ-565..568 functional positive (FT-P-01..22)
- AZ-569..570 functional negative (FT-N-01..16)
- AZ-571 security (NFT-SEC-01..10)
- AZ-572 resilience (NFT-RES-01..06)
- AZ-573 resource limits (NFT-RES-LIM-01..06)
- AZ-574 performance (NFT-PERF-*)

_dependencies_table.md records the cross-check vs traceability matrix (22 + 16 + 29 = 67 scenarios, no overlaps, no gaps; deferred items remain deferred per matrix). All task headers carry their Jira IDs (tracker: jira). Autodev state advanced to Step 6 (Implement Tests).

Co-authored-by: Cursor <cursoragent@cursor.com>
# Performance tests (NFT-PERF-*)

**Task**: AZ-574
**Name**: Performance tests
**Description**: Implement xUnit tests for the 7 performance scenarios: annotation-create p95 latency (small and large image), sustained write throughput, FailsafeProducer drain rate, SSE delivery latency under fan-out, annotation listing at scale, and dataset class distribution at scale.
**Complexity**: 3 points
**Dependencies**: AZ-564 (test infrastructure, including the `dataseed` step that populates the 10k/50k bulk rows required by the "at scale" tests)
**Component**: Blackbox Tests → Performance
**Tracker**: jira
**Epic**: AZ-563

## Scenarios Covered

| Test ID | Source | What it asserts |
|---------|--------|-----------------|
| NFT-PERF-LATENCY-01 | `_docs/02_document/tests/performance-tests.md` | `POST /annotations` p95 latency for a small image (`image_small.jpg`) is within the documented threshold (≤ 1500 ms per spec). |
| NFT-PERF-LATENCY-02 | same | `POST /annotations` p95 latency for a large image (`image_large.JPG`, ~7 MB) is within the documented threshold. |
| NFT-PERF-THROUGHPUT-01 | same | Sustained write throughput (requests per second over a 60 s window) meets the documented threshold. |
| NFT-PERF-OUTBOX-DRAIN-01 | same | FailsafeProducer drain rate: after a burst, outbox depth converges to 0 within the documented window. |
| NFT-PERF-SSE-FANOUT-01 | same | SSE delivery latency under modest fan-out (N = 20 subscribers): p95 latency ≤ documented threshold. |
| NFT-PERF-LIST-01 | same | `GET /annotations` listing on a populated DB (10k rows). p95 latency ≤ documented threshold. |
| NFT-PERF-DATASET-01 | same | Dataset class distribution at scale (50k detections). p95 latency ≤ documented threshold. |
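NFT-PERF-OUTBOX-DRAIN-01 comes down to polling the outbox depth until it reaches 0 and failing if the documented window runs out first. The Python sketch below is only a language-neutral illustration, because the real tests are xUnit. `wait_for_drain` and the `read_depth` probe are hypothetical names, and this task does not specify how outbox depth is exposed over HTTP. The clock and sleep functions are injectable so the loop can be exercised without real waiting.

```python
import time
from typing import Callable


def wait_for_drain(
    read_depth: Callable[[], int],
    window_s: float,
    poll_interval_s: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Poll until the outbox depth is 0 and return the elapsed seconds.

    read_depth is a hypothetical probe returning the current outbox depth.
    Raises TimeoutError if the depth is still non-zero once window_s has elapsed.
    """
    start = clock()
    while True:
        depth = read_depth()
        elapsed = clock() - start
        if depth == 0:
            return elapsed  # drained within the window
        if elapsed >= window_s:
            raise TimeoutError(
                f"outbox depth still {depth} after {elapsed:.1f} s (window {window_s} s)"
            )
        sleep(poll_interval_s)  # wait before probing again
```

Polling granularity limits precision. With a 0.5 s interval, the measured drain time can overshoot the true one by up to roughly one interval.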
## System Under Test Boundary

- HTTP only: tests exercise the system solely through its HTTP interface.
- p95 is computed by the test from a sample of N requests; the per-scenario sample size N is defined in the spec.
- NFT-PERF-LIST-01 and NFT-PERF-DATASET-01 require `dataseed` to have populated the bulk rows (covered by AZ-564).
- Profile gate: `E2E_RUN_PROFILE=performance` enables the full performance runs. The standard `functional` profile skips them because they are too long for the merge gate, and runs only the smoke variant described in AC-2.
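The p95 computation and the profile gate, together with the smoke variant in AC-2, can be sketched in Python. This is an illustration, not the xUnit implementation, and it rests on several assumptions. It uses the nearest-rank percentile definition, although the spec may prescribe another. It treats an unset `E2E_RUN_PROFILE` as `functional`. It takes the smoke sample size of 10 from the example in AC-2. `p95`, `RunPlan`, and `plan_for` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Sequence


def p95(samples_ms: Sequence[float]) -> float:
    """Nearest-rank 95th percentile: the ceil(0.95 * n)-th smallest sample."""
    n = len(samples_ms)
    if n == 0:
        raise ValueError("p95 needs at least one sample")
    rank = (95 * n + 99) // 100  # integer ceil(95n / 100), avoids float rounding
    return sorted(samples_ms)[rank - 1]


@dataclass(frozen=True)
class RunPlan:
    profile: str
    sample_size: int
    bound_ms: float

    def passes(self, observed_p95_ms: float) -> bool:
        if self.profile == "performance":
            return observed_p95_ms <= self.bound_ms  # perf gate: p95 <= threshold
        return observed_p95_ms < self.bound_ms  # smoke sanity check: p95 < 2x threshold


SMOKE_SAMPLE_SIZE = 10  # assumption, taken from the example in AC-2


def plan_for(threshold_ms: float, full_sample_size: int,
             env: Mapping[str, str] = os.environ) -> RunPlan:
    profile = env.get("E2E_RUN_PROFILE", "functional")  # assumed default when unset
    if profile == "performance":
        return RunPlan(profile, full_sample_size, threshold_ms)
    if profile == "functional":
        return RunPlan(profile, SMOKE_SAMPLE_SIZE, 2 * threshold_ms)
    raise ValueError(f"unsupported E2E_RUN_PROFILE: {profile!r}")
```

With only 10 smoke samples, nearest-rank p95 is simply the slowest request. That is one reason a smoke run works as a sanity check rather than a perf gate.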
## Acceptance Criteria

**AC-1: Every perf scenario meets its documented threshold under the `performance` profile.**

**AC-2: Smoke variant runs in the standard profile**
Given `E2E_RUN_PROFILE=functional`,
When the test runs,
Then a short smoke variant (e.g., 10 requests instead of 1000) executes and only asserts p95 < 2× the threshold (a sanity check, not a perf gate).

**AC-3: Measurement uncertainty acknowledged**
Given p95 is computed from a finite sample,
When the test reports its result,
Then the result includes the sample size, the actual p95, and the documented threshold; on failure, the test also writes a JSON report to `e2e-results/perf-<test_id>.json`.
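The reporting rule in AC-3 might look like the following Python sketch. The real tests are xUnit. The function name `assert_p95_within` and the JSON field names are hypothetical. Only two things come from this task: the values to report (sample size, actual p95, documented threshold) and the `e2e-results/perf-<test_id>.json` path used on failure.

```python
import json
from pathlib import Path


def assert_p95_within(test_id: str, sample_size: int, p95_ms: float,
                      threshold_ms: float, out_dir: Path = Path("e2e-results")) -> str:
    """Return a one-line summary; on failure, write a JSON report and raise."""
    summary = (f"{test_id}: p95={p95_ms:.1f} ms over n={sample_size} "
               f"(threshold {threshold_ms:.1f} ms)")
    if p95_ms <= threshold_ms:
        return summary
    out_dir.mkdir(parents=True, exist_ok=True)
    report = {
        "test_id": test_id,
        "sample_size": sample_size,
        "p95_ms": p95_ms,
        "threshold_ms": threshold_ms,
    }
    report_path = out_dir / f"perf-{test_id}.json"
    report_path.write_text(json.dumps(report, indent=2), encoding="utf-8")
    raise AssertionError(summary)
```

The summary string is the same whether the test passes or fails, so the sample size, p95, and threshold appear in the test log in both cases.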
## Constraints

- AAA (Arrange-Act-Assert) pattern.
- Each test carries `[Trait("traces_to", "AC-F-10, AC-N-01")]` plus its own scenario-specific traces.
- Perf tests run in their own xUnit collection so they don't block functional tests during interactive runs.
- Performance thresholds come from `results_report.md`; tests must not hard-code numbers and instead read them from a fixture.
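One way to satisfy the no-hard-coded-numbers rule is to load thresholds from a fixture file keyed by test ID. The Python sketch below is illustrative only. It assumes a hypothetical JSON fixture whose entries were transcribed from `results_report.md`. Each entry carries a value and a unit, because the scenarios mix latency, throughput, and drain-window thresholds. The file shape, units, and function names are all assumptions.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Dict


@dataclass(frozen=True)
class Threshold:
    value: float
    unit: str  # e.g. "ms" or "rps"; the unit vocabulary is an assumption


def load_thresholds(path: Path) -> Dict[str, Threshold]:
    """Parse {"<test id>": {"value": <number>, "unit": "<unit>"}, ...}."""
    raw = json.loads(path.read_text(encoding="utf-8"))
    thresholds: Dict[str, Threshold] = {}
    for test_id, entry in raw.items():
        value = entry.get("value")
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)) or value <= 0:
            raise ValueError(f"invalid threshold for {test_id}: {entry!r}")
        thresholds[test_id] = Threshold(float(value), str(entry.get("unit", "")))
    return thresholds


def threshold_for(thresholds: Dict[str, Threshold], test_id: str) -> Threshold:
    if test_id not in thresholds:
        raise KeyError(f"no threshold for {test_id}; add it to the fixture from results_report.md")
    return thresholds[test_id]
```

Failing fast on a missing entry prevents a new scenario from silently running without a gate.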