[AZ-563] Decompose blackbox tests into AZ-564..574 task specs

Step 5 of autodev existing-code flow. Epic AZ-563 plus 11 atomic tasks covering all 67 test scenarios from _docs/02_document/tests/* exactly once:

- AZ-564 test infrastructure (xUnit + Docker + mock JWKS + dataseed)
- AZ-565..568 functional positive (FT-P-01..22)
- AZ-569..570 functional negative (FT-N-01..16)
- AZ-571 security (NFT-SEC-01..10)
- AZ-572 resilience (NFT-RES-01..06)
- AZ-573 resource limits (NFT-RES-LIM-01..06)
- AZ-574 performance (NFT-PERF-*)

_dependencies_table.md records the cross-check vs traceability matrix (22 + 16 + 29 = 67 scenarios, no overlaps, no gaps; deferred items remain deferred per matrix). All task headers carry their Jira IDs (tracker: jira). Autodev state advanced to Step 6 (Implement Tests).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-21 10:51:07 +00:00 · 2026-05-14 21:13:53 +03:00
parent 637f41c51c
commit cf632d9e2e
13 changed files with 703 additions and 2 deletions
@@ -11,6 +11,32 @@ Tracks ordering and inter-task dependencies for all task specs in `_docs/02_task

 Tasks AZ-561 and AZ-562 touch disjoint files and were implemented as a single batch.

+## Open — Step 5: Blackbox Tests (epic AZ-563)
+
+All test tasks below land their xUnit code in a single new test project rooted at `e2e/Azaion.Annotations.E2E/` (per `AZ-564`). The infrastructure task is a hard prerequisite for every other test task.
+
+| Task | Title | Scope | Scenarios | Complexity | Depends on |
+|------|-------|-------|-----------|-----------|------------|
+| [AZ-564](https://denyspopov.atlassian.net/browse/AZ-564) | Test infrastructure (Annotations e2e) | `e2e/Azaion.Annotations.E2E/`, `e2e/docker-compose.test.yml`, mock JWKS issuer, dataseed, runner script | n/a — bootstrap | 5 | None |
+| [AZ-565](https://denyspopov.atlassian.net/browse/AZ-565) | Annotations REST positive | `Tests/AnnotationsRest/` | FT-P-01..06 (6) | 5 | AZ-564 |
+| [AZ-566](https://denyspopov.atlassian.net/browse/AZ-566) | Realtime + outbox positive | `Tests/Realtime/`, `Tests/Outbox/` | FT-P-07,08,09 + FT-P-21,22 (skipped: RB-01) (5) | 5 | AZ-564 |
+| [AZ-567](https://denyspopov.atlassian.net/browse/AZ-567) | Media + Dataset positive | `Tests/Media/`, `Tests/Settings/`, `Tests/Dataset/` | FT-P-10,11,14,15,16,17,18 (7) | 5 | AZ-564 |
+| [AZ-568](https://denyspopov.atlassian.net/browse/AZ-568) | Auth + Health + Migrator positive | `Tests/Auth/`, `Tests/Health/`, `Tests/Migrator/` | FT-P-12,13,19,20 (4) | 2 | AZ-564 |
+| [AZ-569](https://denyspopov.atlassian.net/browse/AZ-569) | Validation + envelope negative | `Tests/Validation/` | FT-N-01,02,05,06,07,14,16 (7) | 3 | AZ-564 |
+| [AZ-570](https://denyspopov.atlassian.net/browse/AZ-570) | Authorization negative | `Tests/Authorization/` | FT-N-03,04,08,09,10,11,12,13,15 (9) | 3 | AZ-564 |
+| [AZ-571](https://denyspopov.atlassian.net/browse/AZ-571) | Security tests | `Tests/Security/` (+ Production-env xUnit collection) | NFT-SEC-01..10 (10) | 5 | AZ-564 |
+| [AZ-572](https://denyspopov.atlassian.net/browse/AZ-572) | Resilience tests | `Tests/Resilience/` (broker / DB outage fixtures) | NFT-RES-01..06 (6) | 5 | AZ-564 |
+| [AZ-573](https://denyspopov.atlassian.net/browse/AZ-573) | Resource-limit tests | `Tests/ResourceLimit/` (profile-gated nightly variants) | NFT-RES-LIM-01..06 (6) | 3 | AZ-564 |
+| [AZ-574](https://denyspopov.atlassian.net/browse/AZ-574) | Performance tests | `Tests/Performance/` (perf profile + dataseed-loaded DB) | NFT-PERF-* (7) | 3 | AZ-564 (incl. dataseed) |
+
+### Coverage cross-check vs `_docs/02_document/tests/traceability-matrix.md`
+
+- **Functional positive**: FT-P-01..22 = 22 scenarios → covered exactly once across AZ-565 (6) + AZ-566 (5) + AZ-567 (7) + AZ-568 (4).
+- **Functional negative**: FT-N-01..16 = 16 scenarios → covered exactly once across AZ-569 (7) + AZ-570 (9).
+- **Non-functional**: 10 + 6 + 6 + 7 = 29 scenarios → covered exactly once across AZ-571..574.
+- **Total decomposed**: 22 + 16 + 29 = **67 scenarios**, no overlaps, no gaps.
+- **Deferred items** (RB-01 gated FT-P-21/22, RB-02/06/08/09 follow-ups, AC-F-13, ENV-04/05, OP-02 multi-instance) remain marked deferred per the traceability matrix and will be re-decomposed in cycle-update once the gating refactor tasks land.
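
The tallies above can be re-derived mechanically. The following is a minimal Python sketch, not part of the repository: `ids` and `cross_check` are hypothetical helpers, the scenario IDs are expanded from the ranges in the task table, and the seven NFT-PERF entries are placeholders because the table lists them only as `NFT-PERF-*`.

```python
from collections import Counter


def ids(prefix, numbers):
    """Expand a prefix and a sequence of numbers into IDs such as FT-P-01."""
    return [f"{prefix}-{n:02d}" for n in numbers]


# Scenario assignment per task, transcribed from the dependency table.
ASSIGNMENTS = {
    "AZ-565": ids("FT-P", range(1, 7)),
    "AZ-566": ids("FT-P", [7, 8, 9, 21, 22]),
    "AZ-567": ids("FT-P", [10, 11, 14, 15, 16, 17, 18]),
    "AZ-568": ids("FT-P", [12, 13, 19, 20]),
    "AZ-569": ids("FT-N", [1, 2, 5, 6, 7, 14, 16]),
    "AZ-570": ids("FT-N", [3, 4, 8, 9, 10, 11, 12, 13, 15]),
    "AZ-571": ids("NFT-SEC", range(1, 11)),
    "AZ-572": ids("NFT-RES", range(1, 7)),
    "AZ-573": ids("NFT-RES-LIM", range(1, 7)),
    # Placeholders: the table only gives a count of 7 for NFT-PERF-*.
    "AZ-574": [f"NFT-PERF-placeholder-{n}" for n in range(1, 8)],
}

# Expected universe: FT-P-01..22, FT-N-01..16, plus the non-functional IDs.
EXPECTED = (
    set(ids("FT-P", range(1, 23)))
    | set(ids("FT-N", range(1, 17)))
    | set(ASSIGNMENTS["AZ-571"])
    | set(ASSIGNMENTS["AZ-572"])
    | set(ASSIGNMENTS["AZ-573"])
    | set(ASSIGNMENTS["AZ-574"])
)


def cross_check(assignments, expected):
    """Return (distinct scenario count, overlapping IDs, missing IDs)."""
    counts = Counter(s for scenarios in assignments.values() for s in scenarios)
    overlaps = sorted(s for s, c in counts.items() if c > 1)
    gaps = sorted(expected - set(counts))
    return len(counts), overlaps, gaps
```

Calling `cross_check(ASSIGNMENTS, EXPECTED)` returns the distinct scenario count together with any overlapping or missing IDs, so a non-empty second or third element points directly at the offending scenario.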
+
 ## Tracker Status

 `tracker: jira` (per `_docs/_autodev_state.md`). All task headers carry their Jira issue key. The deferred-write leftover at `_docs/_process_leftovers/2026-05-14_testability-tracker.md` was replayed on 2026-05-14 and removed.
@@ -0,0 +1,196 @@
+# Test Infrastructure
+
+**Task**: AZ-564
+**Name**: Test Infrastructure (Annotations e2e)
+**Description**: Scaffold the executable blackbox test project — xUnit runner, mock JWKS issuer, ES256 key-pair fixture, Docker test stack, fixture mounts, seed script, CSV reporting. After this task lands, every other test task can declare itself a child of this scaffold.
+**Complexity**: 5 points
+**Dependencies**: AZ-560 (testability refactor — already landed via AZ-561 and AZ-562)
+**Component**: Blackbox Tests
+**Tracker**: jira
+**Epic**: AZ-563 — `Blackbox Tests — Annotations`
+
+## Test Project Folder Layout
+
+```
+tests/
+├── Azaion.Annotations.E2E/
+│   ├── Azaion.Annotations.E2E.csproj
+│   ├── Dockerfile
+│   ├── TestBase.cs                       # base class with HttpClient, token helper
+│   ├── Fixtures/
+│   │   ├── DockerStackFixture.cs         # CollectionFixture — boot order check
+│   │   ├── CleanStateFixture.cs          # TRUNCATE between test classes
+│   │   ├── BrokerFixture.cs              # RabbitMQ stop/start helpers
+│   │   └── TokenMinter.cs                # ES256 token minting via the in-stack key
+│   ├── Domain/                           # one file per category (one task per file)
+│   │   ├── (populated by AZ-565 ... AZ-573)
+│   └── README.md
+└── harness/
+    ├── mock_issuer.py                    # ~40-line Python http.server (writes JWKS, mounts private key)
+    └── gen_keys.sh                       # one-shot ES256 keypair generator (invoked by mock_issuer at boot)
+
+e2e/
+├── docker-compose.test.yml               # already produced in autodev Step 3; this task wires the new services into it
+├── seed/
+│   └── run.sh                            # already drafted in Step 3; this task adds bulk-insert SQL for NFT-PERF-LIST-01 and NFT-PERF-DATASET-01
+└── e2e-results/                          # output of test runs (gitignored)
+```
+
+### Layout Rationale
+
+- Tests live under `tests/Azaion.Annotations.E2E/` to mirror the .NET convention (sibling of `src/`).
+- The mock issuer lives in `tests/harness/` so it can be shared by smoke / debug stacks without polluting the test runner project.
+- Fixtures are separated from test classes to make the docker-stack boot pattern reusable.
+- All tests are xUnit (matches the SUT runtime; avoids a Python toolchain in CI).
+
+## Mock Services
+
+| Mock Service | Replaces | Endpoints | Behavior |
+|--------------|----------|-----------|----------|
+| `e2e-issuer` (Python `http.server`) | Admin's JWKS issuer | `GET /.well-known/jwks.json` (returns a 1-key JWKS for the in-stack ES256 public key) | Static for the lifetime of the docker-compose stack. Public key regenerates per `docker compose down -v` cycle. No test-time mutability needed — variant tokens (expired / wrong-iss / wrong-aud / `alg=HS256` forgery) are minted with overrides by the runner against the same private key (NFT-SEC-01..10 verifies the SUT rejects them). |
+
+There are no other mock services. All other infrastructure is real (Postgres 13, RabbitMQ 3.13 streams) — restrictions.md mandates "no mocking of internal services". External dependencies that *could* be mocked (admin sync worker, AI training consumer) are simply not run because the SUT does not initiate calls to them; it publishes to the stream and the stream is read by the test runner directly.
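
For orientation, a static JWKS endpoint of the kind described for `e2e-issuer` can be sketched with the Python standard library alone. This is an illustrative sketch under stated assumptions, not the actual `mock_issuer.py`: `build_jwks`, `make_handler`, and `serve` are hypothetical names, the public-key coordinates are assumed to be supplied as raw 32-byte values by whatever generates the keypair, and port 8080 is taken from the URL used in AC-2.

```python
import base64
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

JWKS_PATH = "/.well-known/jwks.json"


def b64url(data: bytes) -> str:
    """Base64url without padding, as JWK members require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def build_jwks(x: bytes, y: bytes, kid: str) -> dict:
    """Build a one-key JWKS for a P-256 (ES256) public key."""
    if len(x) != 32 or len(y) != 32:
        raise ValueError("P-256 coordinates must be 32 bytes each")
    key = {
        "kty": "EC",
        "crv": "P-256",
        "alg": "ES256",
        "use": "sig",
        "kid": kid,
        "x": b64url(x),
        "y": b64url(y),
    }
    return {"keys": [key]}


def make_handler(jwks: dict):
    """Create a request handler that serves a fixed JWKS document."""
    body = json.dumps(jwks).encode("utf-8")

    class JwksHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != JWKS_PATH:
                self.send_error(404)
                return
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep container logs quiet

    return JwksHandler


def serve(jwks: dict, host: str = "0.0.0.0", port: int = 8080):
    """Start the server on a background thread and return it."""
    server = ThreadingHTTPServer((host, port), make_handler(jwks))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Because the response body is encoded once when the handler class is created, the document stays static for the lifetime of the process, which matches the "no test-time mutability" behavior in the table.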
+
+### Mock Control API
+
+Not applicable for this suite. The mock issuer is static; behavior variation is performed by the runner minting different tokens. Broker / DB resilience is performed by `docker exec rabbitmq rabbitmqctl stop_app` and `docker restart postgres` invoked from the test runner — driven via .NET's `Process.Start` against the host docker socket bound into the runner container.
+
+## Docker Test Environment
+
+### docker-compose.test.yml structure
+
+| Service | Image / Build | Purpose | Depends On |
+|---------|---------------|---------|------------|
+| `postgres` | `postgres:13` | SUT's DB | — |
+| `rabbitmq` | `rabbitmq:3.13-management` + streams plugin | Stream broker | — |
+| `e2e-issuer` | `python:3.12-alpine` running `tests/harness/mock_issuer.py` | Mock JWKS issuer + key pair generator | — |
+| `annotations` | Built from `src/Dockerfile` | SUT | `postgres` (healthy), `rabbitmq` (healthy), `e2e-issuer` (healthy) |
+| `dataseed` | `postgres:13` (one-shot psql) | Loads `classes-baseline`, mission row, and the bulk rows for NFT-PERF-LIST-01 / NFT-PERF-DATASET-01 | `annotations` (healthy) |
+| `e2e-runner` | Built from `tests/Azaion.Annotations.E2E/Dockerfile` (.NET SDK 10.0) | Test runner (xUnit) | `dataseed` (completed_successfully) |
+
+### Networks and Volumes
+
+- **Network**: `e2e-net` (bridge, isolated). All services reach each other by service name.
+- **Volumes**:
+  - `pg-data` — Postgres durability across restart (resilience scenarios).
+  - `annotations-images`, `annotations-videos`, `annotations-deleted` — SUT file dirs.
+  - `jwt-keys` — ES256 keypair shared between `e2e-issuer` (writes public + serves JWKS) and `e2e-runner` (reads private key for token minting).
+- **Bind mount (read-only)**: `../detections/_docs/00_problem/input_data` → `/fixtures` in both the SUT and the runner.
+
+### Test runner host-docker access
+
+The runner needs to execute `docker exec rabbitmq rabbitmqctl stop_app` (NFT-RES-01..03) and `docker restart postgres` (NFT-RES-02..03). Solution: bind-mount the host docker socket into the runner (`/var/run/docker.sock:/var/run/docker.sock`) under a `RESILIENCE_DOCKER_SOCKET` env var; the `BrokerFixture` / `DbFixture` use it. This is gated to the test stack — the production SUT never mounts the docker socket.
+
+## Test Runner Configuration
+
+**Framework**: xUnit (matches SUT toolchain — .NET 10).
+**Plugins / NuGet refs**:
+- `Microsoft.NET.Test.Sdk` (xUnit discovery)
+- `xunit` + `xunit.runner.visualstudio`
+- `RabbitMQ.Stream.Client` (same version as `src/Azaion.Annotations.csproj`)
+- `MessagePack` (same version) — to decode stream messages for FT-P-09
+- `Microsoft.AspNetCore.SignalR.Client` is intentionally not referenced: SSE is plain HTTP `text/event-stream`, so the tests use `HttpClient` directly
+- `System.IdentityModel.Tokens.Jwt` — for ES256 minting
+- `Npgsql` — for direct DB introspection assertions (read-only, documented per test)
+- `coverlet.collector` — for coverage; not gated on this run but nice to have
+
+**Entry point**: `dotnet test --logger "trx;LogFileName=results.trx" --results-directory /results` — followed by a tiny CSV-converter post-step in `Dockerfile`'s ENTRYPOINT that produces `/results/report.csv` from `results.trx`.
+
+### Fixture Strategy
+
+| Fixture | Scope | Purpose |
+|---------|-------|---------|
+| `DockerStackFixture` | Collection (one per assembly) | Smoke-pings `/health` and waits for JWKS fetch on boot. Does NOT bring the stack up — that's `docker compose up`'s job. |
+| `CleanStateFixture` | Class (per test class) | `TRUNCATE annotations, media, detection, annotations_queue_records RESTART IDENTITY CASCADE` via direct Postgres. Run before first test, again after last. |
+| `TokenMinter` | Singleton (within fixture lifetime) | Holds the ES256 private key (read from `/keys` mount) and exposes `MintToken(claim, overrides?)`. |
+| `BrokerFixture` | Per-test (only for resilience tests) | `StopBroker()`, `StartBroker()` via `docker exec`. Asserts pre/post state. |
+| `StreamConsumerFixture` | Per-test (only for stream-consumer tests) | Creates a fresh consumer name, starts at offset `next`, decodes MessagePack + gzip into typed events. |
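
The `MintToken(claim, overrides?)` shape in the table amounts to a base claim set plus a per-variant override. The sketch below illustrates only that merge step, in Python and without signing. Everything specific in it is an assumption: the issuer and audience values, the `permissions` claim name, and the 300-second lifetime are hypothetical and do not come from the SUT's configuration.

```python
import time

# Hypothetical values; the real issuer / audience come from the stack config.
BASE_ISSUER = "https://issuer.example.test"
BASE_AUDIENCE = "annotations-e2e"


def build_claims(permission, overrides=None, now=None):
    """Return the claim set for a token, with optional per-test overrides."""
    now = int(time.time()) if now is None else now
    claims = {
        "iss": BASE_ISSUER,
        "aud": BASE_AUDIENCE,
        "iat": now,
        "exp": now + 300,
        "permissions": [permission],  # hypothetical claim name
    }
    claims.update(overrides or {})
    return claims


def expired(now):
    """Override that places the expiry one minute in the past."""
    return {"exp": now - 60}


# Overrides for the wrong-issuer and wrong-audience variants.
WRONG_ISSUER = {"iss": "https://not-the-issuer.example.test"}
WRONG_AUDIENCE = {"aud": "some-other-service"}
```

Keeping the variants as small override dictionaries means each negative test states only the one field it changes, while every other claim remains valid.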
+
+## Test Data Fixtures
+
+| Data Set | Source | Format | Used By |
+|----------|--------|--------|---------|
+| Image / video fixtures | Bind-mount `../detections/_docs/00_problem/input_data/` → `/fixtures` (read-only) | JPEG / MP4 binary | All FT-P-* and most FT-N-* |
+| `classes-baseline` (19 detection classes) | Auto-seeded by `DatabaseMigrator` on `annotations` first boot | DB rows | FT-P-14 (catalog read), every FT-P that references `class_num` |
+| `mission-test` GUID `00000000-0000-0000-0000-000000000aaa` | Inlined in request payloads | GUID | All annotation-create paths |
+| Synthetic JPEGs for NFT-RES-LIM-02 | Generated at test time by `LargePayloadFixture` (1, 10, 50, 100, 256, 512 MB) | binary | NFT-RES-LIM-02 |
+| Bulk rows for NFT-PERF-LIST-01 / NFT-PERF-DATASET-01 (10k annotations, 50k detections) | `dataseed/run.sh` SQL block | DB rows | NFT-PERF-LIST-01, NFT-PERF-DATASET-01 |
+| Per-test ES256 tokens | `TokenMinter` (in-process minting) | JWT | All FT-* requiring `Authorization` header and all NFT-SEC-* |
+
+### Data Isolation
+
+- **Per-class truncation** via `CleanStateFixture` (above).
+- **Per-test mission GUID** for SSE fan-out tests (FT-P-07, NFT-PERF-SSE-FANOUT-01).
+- **Per-test stream consumer name** for FT-P-09 and NFT-RES-06.
+- **Volume reset on `docker compose down -v`** — image / video dirs and the JWKS keypair regenerate.
+
+## Test Reporting
+
+**Format**: `.trx` (the format written by the `dotnet test` `trx` logger), converted to flat CSV by the runner.
+**CSV columns**: `test_id`, `test_name`, `category`, `traces_to`, `execution_time_ms`, `result`, `error_message`.
+**Output path**: `/results/report.csv` and `/results/results.trx` inside the runner; mounted to `./e2e-results/` on the host.
+
+`traces_to` is populated from a `[Trait("traces_to", "AC-F-01, HW-02")]` attribute on each test method — the converter reads the attribute and writes a comma-separated cell. This makes the resulting CSV self-describing for the traceability-matrix check at autodev Step 7 (Run Tests).
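
The TRX-to-CSV step can be illustrated with a short Python sketch. It is not the converter this task ships, whose implementation language is not stated here. `trx_to_csv` and `duration_to_ms` are hypothetical names, and the `traits` mapping (test name to `test_id` / `category` / `traces_to`) is an assumption standing in for however the real converter obtains the `[Trait]` values.

```python
import csv
import io
import xml.etree.ElementTree as ET

# XML namespace used by TRX result files.
TRX_NS = {"t": "http://microsoft.com/schemas/VisualStudio/TeamTest/2010"}

COLUMNS = [
    "test_id", "test_name", "category", "traces_to",
    "execution_time_ms", "result", "error_message",
]


def duration_to_ms(duration):
    """Convert a TRX duration such as '00:00:01.5000000' to whole milliseconds."""
    if not duration:
        return 0
    hours, minutes, seconds = duration.split(":")
    total_seconds = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
    return round(total_seconds * 1000)


def trx_to_csv(trx_xml, traits):
    """Render the UnitTestResult entries of a TRX document as CSV text.

    `traits` maps a test name to a dict with optional keys
    'test_id', 'category' and 'traces_to' (an assumption of this sketch).
    """
    root = ET.fromstring(trx_xml)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for result in root.iterfind(".//t:UnitTestResult", TRX_NS):
        name = result.get("testName", "")
        meta = traits.get(name, {})
        message = result.findtext(
            "t:Output/t:ErrorInfo/t:Message", default="", namespaces=TRX_NS
        )
        writer.writerow({
            "test_id": meta.get("test_id", ""),
            "test_name": name,
            "category": meta.get("category", ""),
            "traces_to": meta.get("traces_to", ""),
            "execution_time_ms": duration_to_ms(result.get("duration")),
            "result": result.get("outcome", ""),
            "error_message": message.strip(),
        })
    return out.getvalue()
```

When reading a real `results.trx`, passing the file's bytes to `trx_to_csv` avoids problems with the XML encoding declaration. A multi-value `traces_to` cell is quoted automatically by the `csv` module, so the column count stays at seven.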
+
+## Acceptance Criteria
+
+**AC-1: Test environment starts**
+Given a clean clone of the repo on a host with Docker installed,
+When `./scripts/run-tests.sh` is executed (or equivalent `docker compose -f e2e/docker-compose.test.yml up`),
+Then `postgres`, `rabbitmq`, `e2e-issuer`, `annotations`, `dataseed`, and `e2e-runner` all start in dependency order, the `annotations` service reaches `healthy`, and the test runner begins discovery.
+
+**AC-2: Mock JWKS responds with the in-stack public key**
+Given the test environment is up,
+When `wget http://e2e-issuer:8080/.well-known/jwks.json` is executed from the `annotations` container,
+Then the response is a valid JWKS with exactly one ES256 key whose `kid` matches the private key shared with `e2e-runner`.
+
+**AC-3: Token minter mints a valid token end-to-end**
+Given the test environment is up and `TokenMinter.MintToken("ANN")` is invoked,
+When the resulting token is presented as `Authorization: Bearer <token>` on `POST /annotations` with a fixture payload,
+Then the SUT returns HTTP 200 (token validates against the JWKS-published public key).
+
+**AC-4: Truncation fixture isolates classes**
+Given two test classes that each create one annotation row,
+When both run within the same test session,
+Then each class observes an empty `annotations` table at start and the SUT keeps no cross-class state.
+
+**AC-5: CSV report generated with required columns**
+Given a test session has completed,
+When the runner exits,
+Then `./e2e-results/report.csv` exists on the host and contains the columns: `test_id`, `test_name`, `category`, `traces_to`, `execution_time_ms`, `result`, `error_message`.
+
+**AC-6: Resilience helpers work**
+Given the test environment is up,
+When `BrokerFixture.StopBroker()` is invoked from a test,
+Then `docker exec rabbitmq rabbitmqctl stop_app` succeeds and `BrokerFixture.StartBroker()` reverses it within 5 s; the SUT recovers (subsequent `POST /annotations` returns 200) within the documented backoff window.
+
+## Constraints
+
+- `restrictions.md` SW-01: .NET 10 toolchain only — test runner pins `Microsoft.NET.Test.Sdk` to the version compatible with .NET 10.
+- `restrictions.md` HW-01: ARM64-only — the e2e-runner Dockerfile uses `mcr.microsoft.com/dotnet/sdk:10.0` which is multi-arch.
+- `restrictions.md` ENV-02: no in-image TLS — the test stack uses plain HTTP; the JWKS HTTPS gate (AZ-561) is satisfied by `ASPNETCORE_ENVIRONMENT=E2ETest`.
+- Every test must use the Arrange / Act / Assert pattern with `// Arrange`, `// Act`, `// Assert` comments (per `coderule.mdc`).
+- No mocks for internal services (`AnnotationService`, `FailsafeProducer`, etc.) — every test exercises the real public surface.
+- No direct writes to the SUT's tables from the runner. Read-only DB access is allowed only for blackbox-documented assertions (outbox row count, queue depth) and must be marked with a `[Trait("db_access", "read-only")]` attribute.
+
+## Risks & Mitigation
+
+**Risk 1: Docker socket bind exposes too much**
+- *Risk*: Mounting `/var/run/docker.sock` into the runner gives it root-equivalent access to the host. Acceptable in CI runners; risky on developer laptops.
+- *Mitigation*: The socket bind is in `docker-compose.test.yml`'s `e2e-runner` block only (not the SUT). Document that the test stack assumes a CI-like or isolated dev environment. `restrictions.md` does not forbid this.
+
+**Risk 2: JWKS keypair freshness**
+- *Risk*: A stale keypair lingering in the `jwt-keys` volume could cause cryptic JWKS failures between test runs.
+- *Mitigation*: `mock_issuer.py` regenerates the keypair on every container start if `gen_keys.sh` has not been run in the current container lifetime. `docker compose down -v` between full runs guarantees a fresh key.
+
+**Risk 3: Bulk seed slows boot**
+- *Risk*: 10k annotation rows + 50k detection rows in `dataseed` could push boot from ~5 s to ~30 s.
+- *Mitigation*: Bulk insert uses `CROSS JOIN generate_series` and a single `COPY FROM STDIN` so the seed completes in <10 s on local hardware. NFT-PERF tests already document a separate boot allowance; functional tests do not depend on the perf seed and run independently if the seed is split into a profile-gated step.
+
+## Self-Verification
+
+- [x] Every external dependency from `environment.md` has a mock service defined OR an explicit "real service used" justification (real Postgres, real Rabbit, mock issuer only).
+- [x] Docker Compose structure covers all services from `environment.md`.
+- [x] Test data fixtures cover all seed data sets from `test-data.md` (tokens-test, mission-test, classes-baseline, clean-state, runtime-generated big payloads, bulk-perf rows).
+- [x] Test runner configuration matches SUT tech stack (.NET 10, xUnit, RabbitMQ.Stream.Client at the same NuGet version).
+- [x] Data isolation strategy is defined (per-class truncate, per-test mission/consumer/token).
@@ -0,0 +1,53 @@
+# Annotations REST positive tests
+
+**Task**: AZ-565
+**Name**: Annotations REST positive flow tests
+**Description**: Implement xUnit tests for FT-P-01..06 — annotation create (small / empty / dense), idempotency on identical re-POST, paginated listing, detail-by-id. The core happy-path surface of the annotations REST API.
+**Complexity**: 5 points
+**Dependencies**: AZ-564 (test infrastructure)
+**Component**: Blackbox Tests → Annotations REST
+**Tracker**: jira
+**Epic**: AZ-563
+
+## Scenarios Covered
+
+| Test ID | Source | What it asserts |
+|---------|--------|-----------------|
+| FT-P-01 | `_docs/02_document/tests/blackbox-tests.md` | Annotation create — single detection, small image. HTTP 200, AnnotationDto, 32-char hex id, label file on disk. |
+| FT-P-02 | same | Idempotency on identical re-POST. Same id, no new DB row. |
+| FT-P-03 | same | Empty scene, 0 detections. HTTP 200; no label file or zero-line label file (per Spec). |
+| FT-P-04 | same | Dense scene, 5 mixed-class detections. HTTP 200; YOLO label file has 5 lines, class numbers from `classes-baseline`. |
+| FT-P-05 | same | Paginated listing — `GET /annotations?missionId=…&offset=&limit=`. PaginatedResponse envelope; ordering deterministic. |
+| FT-P-06 | same | Detail by id. `GET /annotations/{id}`. Full DTO including detections. |
+
+## System Under Test Boundary
+
+- Tests MUST drive the system through `http://annotations:8080/annotations` HTTP only. No in-process imports of `Azaion.Annotations.*`.
+- Stubs are NOT allowed for `AnnotationService`, `MediaService`, `PathResolver`, the hash function, the migrator, or the SUT's DB schema. The test exercises the real production code path end to end.
+- Read-only DB introspection is allowed only for asserting that the label file row exists in the `annotations` table (FT-P-01 step 4). Marked with `[Trait("db_access", "read-only")]`. No writes.
+- Read-only filesystem introspection on `annotations-images` volume is allowed only for asserting the label file contents (FT-P-01, FT-P-04). The test mounts the volume read-only.
+- Outputs are compared against `_docs/00_problem/input_data/expected_results/results_report.md` rows F1-001 through F1-005 (regex on id, exact on detections.length, file_content on label files).
+
+## Acceptance Criteria
+
+**AC-1: Every scenario passes per its spec**
+Given the e2e stack is up and clean,
+When the runner executes each FT-P-01..06 test exactly as documented,
+Then each test reports PASS against the comparison method and tolerance in `results_report.md`.
+
+**AC-2: Tests are deterministic across re-runs**
+Given two consecutive runs of FT-P-01..06,
+When both complete successfully,
+Then the assertion outcomes are identical (same ids, same response shapes, same DB rows / label files).
+
+## Non-Functional Requirements
+
+- **Performance**: Each test in this batch completes in ≤5 s on the documented hardware; total batch runs in ≤30 s (no perf gates here; those are in AZ-574).
+- **Reliability**: Tests use `CleanStateFixture` to isolate state; no carry-over between tests in the class.
+
+## Constraints
+
+- Use AAA pattern with `// Arrange`, `// Act`, `// Assert` comments per `coderule.mdc`.
+- Token minting via `TokenMinter.MintToken("ANN")` for every test in this batch.
+- `[Trait("traces_to", "AC-F-01, AC-F-02, AC-F-03, AC-F-04, HW-02")]` (or the per-test subset) on every test method.
+- One xUnit test class per scenario file or per closely-related group (e.g., `AnnotationCreateTests`, `AnnotationListingTests`).
@@ -0,0 +1,51 @@
+# Realtime + outbox positive tests
+
+**Task**: AZ-566
+**Name**: Realtime + outbox positive tests
+**Description**: Implement xUnit tests for FT-P-07..09 (SSE delivery, outbox row, stream message round-trip) plus the two RB-01-gated lifecycle tests FT-P-21/FT-P-22 (authored as skipped tests, i.e. xUnit `[Fact(Skip = "awaiting RB-01")]`, per the test-spec convention).
+**Complexity**: 5 points
+**Dependencies**: AZ-564 (test infrastructure)
+**Component**: Blackbox Tests → Realtime + Outbox
+**Tracker**: jira
+**Epic**: AZ-563
+
+## Scenarios Covered
+
+| Test ID | Source | What it asserts |
+|---------|--------|-----------------|
+| FT-P-07 | `_docs/02_document/tests/blackbox-tests.md` | SSE event for new annotation. Latency ≤ 1 s. No backfill (assertion in step 2). |
+| FT-P-08 | same | Outbox row inserted on annotation create. Direct DB SELECT. |
+| FT-P-09 | same | Stream message round-trip. Decode MessagePack + gzip; assert payload schema. |
+| FT-P-21 | same `[after RB-01]` | Lifecycle event on annotation update (Skipped — awaiting RB-01). |
+| FT-P-22 | same `[after RB-01]` | Lifecycle event on delete + soft-delete file relocation (Skipped — awaiting RB-01). |
+
+## System Under Test Boundary
+
+- SSE: connect via `HttpClient` with `Accept: text/event-stream` and `Authorization: Bearer …`. No stubbing of `AnnotationEventService` or its `Channel<T>`.
+- Outbox: read-only DB query on `annotations_queue_records` table after the create call. `[Trait("db_access", "read-only")]`.
+- Stream: connect to `rabbitmq:5552` via `RabbitMQ.Stream.Client` with a fresh consumer name starting at offset `next`. Decode payload using the same MessagePack + gzip pipeline the SUT uses. No stubbing of `FailsafeProducer` or `RabbitMqConfig`.
+- Compare against `results_report.md` row F3-001 (latency_threshold_max), F4-001 (outbox row content), F4-002 (decoded stream message).
+
+## Acceptance Criteria
+
+**AC-1: Every active scenario passes per its spec.**
+Given the e2e stack is up,
+When FT-P-07, FT-P-08, FT-P-09 are executed,
+Then each reports PASS within its tolerance (FT-P-07 ≤ 1 s latency; FT-P-08 row exists with expected `operation=Created`; FT-P-09 MessagePack payload matches expected schema).
+
+**AC-2: FT-P-21 and FT-P-22 are authored as skipped tests with the documented reason.**
+Given the test discovery,
+When the runner enumerates tests,
+Then FT-P-21 and FT-P-22 appear in the report with `result=Skipped` and `error_message="awaiting RB-01"` (or equivalent). They auto-enable when the cycle-update test-spec mode flips them to active.
+
+**AC-3: SSE no-backfill assertion**
+Given a subscriber that connects AFTER an annotation has been created,
+When the subscriber waits 1 s,
+Then no event is received for the pre-connection annotation. (FT-P-07 step 2; also satisfies AC-F-11.)
+
+## Constraints
+
+- AAA pattern with `// Arrange`, `// Act`, `// Assert` per `coderule.mdc`.
+- `[Trait("traces_to", "AC-F-05, AC-F-10, AC-F-11, AC-F-12, SW-03, SW-04")]` (per-test subset).
+- Stream consumer must be torn down after each test to avoid offset leakage.
+- SSE client must be cancelled cleanly after each test (no zombie connections).
@@ -0,0 +1,49 @@
+# Media + Dataset positive tests
+
+**Task**: AZ-567
+**Name**: Media + Dataset positive tests
+**Description**: Implement xUnit tests for media single/batch upload (FT-P-10, FT-P-11), classes catalog read (FT-P-14), directory settings invariant (FT-P-15), dataset filter / class distribution / bulk status (FT-P-16, FT-P-17, FT-P-18).
+**Complexity**: 5 points
+**Dependencies**: AZ-564 (test infrastructure)
+**Component**: Blackbox Tests → Media, Settings, Dataset
+**Tracker**: jira
+**Epic**: AZ-563
+
+## Scenarios Covered
+
+| Test ID | Source | What it asserts |
+|---------|--------|-----------------|
+| FT-P-10 | `_docs/02_document/tests/blackbox-tests.md` | Single media upload. 200 + DTO; file lives in `images_dir`. |
+| FT-P-11 | same | Batch media upload. All rows accepted; correct ids returned. |
+| FT-P-14 | same | `GET /classes` returns 19 rows from `classes-baseline`. |
+| FT-P-15 | same | `PUT /settings/directories` triggers `PathResolver.Reset()`. Subsequent uploads land in the new root. |
+| FT-P-16 | same | `GET /dataset?status=…` filter. Result set matches DB state. |
+| FT-P-17 | same | Dataset class distribution. Sums match raw counts. |
+| FT-P-18 | same | `POST /dataset/status/bulk`. Transitions exactly the listed ids; non-listed ids untouched. |
+
+## System Under Test Boundary
+
+- Drive via HTTP. No imports.
+- No stubbing of `MediaService`, `DatasetService`, `SettingsService`, `ClassesController`, `PathResolver`.
+- FT-P-15 requires direct write to the SUT's `images_dir` volume only to seed pre-existing files for the post-Reset assertion. Marked with `[Trait("fs_access", "write-to-image-dir")]` and only allowed for this specific test per the System Under Test Boundary rule.
+- Compare against `results_report.md` rows F5-001, F5-002, F7-001, F6-002 etc.
+
+## Acceptance Criteria
+
+**AC-1: Every scenario passes per its spec.** Given the stack is up, when each of the FT-P-10, FT-P-11 and FT-P-14..18 tests runs, then each reports PASS within tolerance.
+
+**AC-2: FT-P-15 invariant holds across PUT**
+Given an annotation was created under the original `images_dir`,
+When `PUT /settings/directories` changes the root and a new annotation is created,
+Then the new annotation's label file lives under the new root and the previous file is untouched (no migration).
+
+## Constraints
+
+- AAA pattern, `// Arrange`/`// Act`/`// Assert` comments.
+- Token policy varies per endpoint:
+  - `POST /media` → `ANN`
+  - `GET /classes` → any authenticated
+  - `PUT /settings/*` → `ADM`
+  - `GET /dataset` → `DATASET`
+  - `POST /dataset/status/bulk` → `DATASET`
+- `[Trait("traces_to", "AC-F-20, AC-F-21, AC-F-30, AC-F-31, AC-F-40, AC-F-41, HW-02")]` (per-test subset).
@@ -0,0 +1,35 @@
+# Auth + Health + Migrator positive tests
+
+**Task**: AZ-568
+**Name**: Auth + Health + Migrator positive tests
+**Description**: Implement xUnit tests for the bearer-token happy path (FT-P-12), alg pinning happy path (FT-P-13), health endpoint (FT-P-19), and migrator idempotence (FT-P-20).
+**Complexity**: 2 points
+**Dependencies**: AZ-564 (test infrastructure)
+**Component**: Blackbox Tests → Auth + Health + Migrator
+**Tracker**: jira
+**Epic**: AZ-563
+
+## Scenarios Covered
+
+| Test ID | Source | What it asserts |
+|---------|--------|-----------------|
+| FT-P-12 | `_docs/02_document/tests/blackbox-tests.md` | Bearer token happy path. ES256 token with valid `iss`/`aud`/`exp` + `ANN` claim is accepted. |
+| FT-P-13 | same | Alg pinning happy path — token signed with ES256 and `alg=ES256` header is accepted. (Negative variant `alg=HS256` is covered by NFT-SEC-10.) |
+| FT-P-19 | same | `GET /health` returns 200 OK without auth. Anonymous. |
+| FT-P-20 | same | Migrator idempotence — drop the database, recreate it twice via `docker restart annotations` and assert no errors. |
+
+## System Under Test Boundary
+
+- HTTP only. No imports.
+- FT-P-20 requires `docker restart annotations` from the runner (uses `BrokerFixture` pattern but renamed `SutRestartFixture`). DB state preserved across restart (via `pg-data` volume).
+- Compare against `results_report.md` row F8-001 (health), F8-002 (token validation succeeds).
+
+## Acceptance Criteria
+
+**AC-1: Every scenario passes per its spec.** Given the stack is up, when each FT-P-12, FT-P-13, FT-P-19, FT-P-20 test runs, then each reports PASS within tolerance.
+
+## Constraints
+
+- AAA pattern.
+- FT-P-19 sends no auth header (verifies that `/health` is anonymous, i.e. the auth pipeline is not enforced for this endpoint).
+- `[Trait("traces_to", "AC-F-50, AC-F-54, AC-N-01, AC-N-02, SW-05, OP-05")]` (per-test subset).
@@ -0,0 +1,43 @@
+# Validation + error-envelope negative tests
+
+**Task**: AZ-569
+**Name**: Validation + error-envelope negative tests
+**Description**: Implement xUnit tests for FT-N-01, FT-N-02, FT-N-05, FT-N-06, FT-N-07, FT-N-14, FT-N-16. Cover input validation failures, lenient-bbox behaviour, unknown ids, unknown missions, missing waypoint, empty bulk list. Each test asserts the documented error envelope shape.
+**Complexity**: 3 points
+**Dependencies**: AZ-564 (test infrastructure)
+**Component**: Blackbox Tests → Validation
+**Tracker**: jira
+**Epic**: AZ-563
+
+## Scenarios Covered
+
+| Test ID | Source | What it asserts |
+|---------|--------|-----------------|
+| FT-N-01 | `_docs/02_document/tests/blackbox-tests.md` | `POST /annotations` without `image_bytes`. HTTP 400/422; error envelope. |
+| FT-N-02 | same | `POST /annotations` without `mediaType`. HTTP 400/422; error envelope. |
+| FT-N-05 | same | Out-of-range bbox value — lenient behavior today (HTTP 200). Test pins that observed behavior; flagged as SEC-05 in security-tests.md. |
+| FT-N-06 | same | `GET /annotations/{nonexistent_id}`. HTTP 404; error envelope. |
+| FT-N-07 | same | Filter by unknown mission — returns empty page (not 404). |
+| FT-N-14 | same | Media upload missing `waypoint_id`. HTTP 400/422. |
+| FT-N-16 | same | `POST /dataset/status/bulk` with empty list. HTTP 400; error envelope. |
+
+## System Under Test Boundary
+
+- HTTP only.
+- No stubbing.
+- Every test asserts the error envelope shape against the contract in `_docs/02_document/common-helpers/01_http-error-envelope.md` and the global invariant AC-F-53.
+
+## Acceptance Criteria
+
+**AC-1: Every scenario produces the documented HTTP status + error envelope.**
+
+**AC-2: FT-N-05 pins the current lenient behavior and is tagged as SEC-05 follow-up.**
+Given an annotation with a bbox value of `1.5` or `-0.1`,
+When `POST /annotations` is called,
+Then HTTP 200 is returned today (lenient). The test is tagged `[Trait("known_lenient", "true")]`. When SEC-05 lands, the test flips to expect 400; the flip is handled by the test-spec cycle-update.
+
+## Constraints
+
+- AAA pattern.
+- `[Trait("traces_to", "AC-F-04, AC-F-53")]` plus per-test specific traces.
+- Token policy: most tests use `ANN`; FT-N-16 uses `DATASET`.
@@ -0,0 +1,44 @@
+# Authorization negative tests
+
+**Task**: AZ-570
+**Name**: Authorization negative tests
+**Description**: Implement xUnit tests for the 9 authorization-failure scenarios in `blackbox-tests.md` — wrong policy, missing token, expired token, wrong issuer, wrong audience, SSE without auth, settings without ADM, directories without ADM, media without ANN.
+**Complexity**: 3 points
+**Dependencies**: AZ-564 (test infrastructure)
+**Component**: Blackbox Tests → Authorization
+**Tracker**: jira
+**Epic**: AZ-563
+
+## Scenarios Covered
+
+| Test ID | Source | What it asserts |
+|---------|--------|-----------------|
+| FT-N-03 | `_docs/02_document/tests/blackbox-tests.md` | `POST /annotations` without `ANN` policy. HTTP 403. |
+| FT-N-04 | same | `POST /annotations` unauthenticated. HTTP 401. |
+| FT-N-08 | same | `GET /annotations/events` (SSE) without auth. HTTP 401. |
+| FT-N-09 | same | Bearer token expired. HTTP 401. |
+| FT-N-10 | same | Bearer token wrong issuer. HTTP 401. |
+| FT-N-11 | same | Bearer token wrong audience. HTTP 401. |
+| FT-N-12 | same | Mutating settings without `ADM`. HTTP 403. |
+| FT-N-13 | same | `PUT /settings/directories` without `ADM`. HTTP 403. |
+| FT-N-15 | same | Media upload without `ANN`. HTTP 403. |
+
+## System Under Test Boundary
+
+- HTTP only.
+- Token variants minted via `TokenMinter.MintToken(claim, overrides)` with `overrides` covering: expired, wrong-iss, wrong-aud.
+- Cross-policy tests use a token minted with a different claim than the endpoint requires (e.g., a `DATASET` token on `POST /annotations`).
+
+## Acceptance Criteria
+
+**AC-1: Every scenario produces the documented HTTP status + error envelope.**
+
+**AC-2: 401 vs 403 distinction is preserved.**
+- Missing / invalid token → 401 (authentication failed).
+- Valid token, wrong policy → 403 (authorization failed).
+
+## Constraints
+
+- AAA pattern.
+- `[Trait("traces_to", "AC-F-50, AC-F-52, SW-05")]` plus per-test specific traces.
+- Tests must not retry on 401/403 — single request, single assertion.
@@ -0,0 +1,52 @@
+# Security tests (NFT-SEC-01..10)
+
+**Task**: AZ-571
+**Name**: Security tests
+**Description**: Implement xUnit tests for all 10 security scenarios: JWT signature mismatch, expired, cross-policy DATASET/ANN, anonymous-access denials, error envelope no-stack-leak, path traversal in image/thumbnail GETs, token claim tampering, CORS preflight, alg-confusion `alg=HS256` forgery.
+**Complexity**: 5 points
+**Dependencies**: AZ-564 (test infrastructure)
+**Component**: Blackbox Tests → Security
+**Tracker**: jira
+**Epic**: AZ-563
+
+## Scenarios Covered
+
+| Test ID | Source | What it asserts |
+|---------|--------|-----------------|
+| NFT-SEC-01 | `_docs/02_document/tests/security-tests.md` | JWT signed with key NOT in JWKS. HTTP 401. |
+| NFT-SEC-02 | same | JWT expired. HTTP 401. |
+| NFT-SEC-03 | same | DATASET token → `POST /annotations`. HTTP 403. |
+| NFT-SEC-04 | same | ANN token → `PUT /settings/*`. HTTP 403. |
+| NFT-SEC-05 | same | Anonymous access to non-public endpoints. Only `/health` is anonymous; everything else returns 401 without auth. |
+| NFT-SEC-06 | same | Error envelope under Production env mode does NOT leak stack traces. |
+| NFT-SEC-07 | same | Path traversal in image / thumbnail GET routes. `../etc/passwd` style payloads return 400/404, never 200 with foreign content. |
+| NFT-SEC-08 | same | Token claim modification (signature breaks). HTTP 401. |
+| NFT-SEC-09 | same | CORS preflight respects `CorsConfig:AllowedOrigins` allow-list under Production. |
+| NFT-SEC-10 | same | Algorithm confusion — token forged with `alg=HS256` using the published ES256 public key as the HMAC secret. HTTP 401. |
+
+## System Under Test Boundary
+
+- HTTP only.
+- Token variants minted via `TokenMinter.MintToken(claim, overrides)`.
+- NFT-SEC-06 requires the SUT to be re-booted with `ASPNETCORE_ENVIRONMENT=Production` (and a Production-safe CORS config). This is a separate compose profile or test class with its own `SutRestartFixture`.
+- NFT-SEC-09 requires a second SUT boot under Production with `CorsConfig__AllowedOrigins__0: https://app.azaion.local`. Asserts ACAO is exactly that one origin.
+
+## Acceptance Criteria
+
+**AC-1: Every scenario passes per its spec.**
+
+**AC-2: NFT-SEC-10 explicitly verifies algorithm pinning**
+Given a token forged with `alg=HS256` and the published ES256 public key as the HMAC secret,
+When the runner presents it to `POST /annotations`,
+Then HTTP 401 is returned and the `WWW-Authenticate` response header contains `Bearer error="invalid_token"`.
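+
+A minimal sketch of how the forged token for this criterion could be built. It assumes the "published ES256 public key" is used in PEM form as the HMAC secret and that the forgery is done outside `TokenMinter`. The class name `AlgConfusionForgery`, the `sub` and `permissions` claims, and the throwaway key pair are illustrative placeholders, not part of the spec.
+
+```csharp
+using System;
+using System.Security.Cryptography;
+using System.Text;
+
+// Hypothetical helper: not part of the spec or of TokenMinter.
+public static class AlgConfusionForgery
+{
+    // Base64url without padding, as used for JWT segments.
+    private static string B64Url(byte[] bytes) =>
+        Convert.ToBase64String(bytes).TrimEnd('=').Replace('+', '-').Replace('/', '_');
+
+    // Builds an HS256 JWT whose HMAC key is the UTF-8 bytes of the public key PEM.
+    public static string Forge(string publicKeyPem, string payloadJson)
+    {
+        string header = B64Url(Encoding.UTF8.GetBytes("{\"alg\":\"HS256\",\"typ\":\"JWT\"}"));
+        string payload = B64Url(Encoding.UTF8.GetBytes(payloadJson));
+        string signingInput = header + "." + payload;
+
+        using var hmac = new HMACSHA256(Encoding.UTF8.GetBytes(publicKeyPem));
+        byte[] signature = hmac.ComputeHash(Encoding.ASCII.GetBytes(signingInput));
+        return signingInput + "." + B64Url(signature);
+    }
+
+    public static void Main()
+    {
+        // Throwaway P-256 key standing in for the key the mock JWKS issuer publishes.
+        using ECDsa key = ECDsa.Create(ECCurve.NamedCurves.nistP256);
+        string pem = key.ExportSubjectPublicKeyInfoPem(); // requires .NET 7 or later
+
+        // Claim names below are assumptions for illustration only.
+        string token = Forge(pem, "{\"sub\":\"attacker\",\"permissions\":[\"ANN\"]}");
+        Console.WriteLine(token.Split('.').Length); // header.payload.signature
+    }
+}
+```
+
+A SUT that pins the accepted algorithm to ES256 rejects this token regardless of the HMAC value, because the `alg` header does not match; a SUT that trusts the header would verify the HMAC with the public key and accept it.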
+
+**AC-3: NFT-SEC-06 verifies no stack leak**
+Given `ASPNETCORE_ENVIRONMENT=Production`,
+When a request triggers a 500-class error,
+Then the response body's error envelope contains only the safe error code and message — no `stackTrace`, no `innerException`, no file paths.
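+
+The check above could be sketched as a small heuristic scanner over the response body. This is an illustration under assumptions: the envelope field names `code` and `message`, the class name `EnvelopeLeakCheck`, and the regular expression for stack frames and source paths are all hypothetical and would need aligning with the real envelope contract.
+
+```csharp
+using System;
+using System.Linq;
+using System.Text.Json;
+using System.Text.RegularExpressions;
+
+// Hypothetical helper: a heuristic, not the documented envelope validator.
+public static class EnvelopeLeakCheck
+{
+    private static readonly string[] ForbiddenKeys = { "stackTrace", "innerException" };
+
+    // Matches typical .NET stack frames ("at Ns.Type.Method(") and paths ending in ".cs".
+    private static readonly Regex LeakPattern =
+        new Regex(@"(\bat\s+[\w.<>]+\()|([\\/][\w.\-]+\.cs\b)", RegexOptions.Compiled);
+
+    public static bool LooksLeaky(string responseBody)
+    {
+        using JsonDocument doc = JsonDocument.Parse(responseBody);
+        return Walk(doc.RootElement);
+    }
+
+    // Recursively inspects property names and string values.
+    private static bool Walk(JsonElement element)
+    {
+        switch (element.ValueKind)
+        {
+            case JsonValueKind.Object:
+                foreach (JsonProperty p in element.EnumerateObject())
+                {
+                    if (ForbiddenKeys.Contains(p.Name, StringComparer.OrdinalIgnoreCase)) return true;
+                    if (Walk(p.Value)) return true;
+                }
+                return false;
+            case JsonValueKind.Array:
+                foreach (JsonElement item in element.EnumerateArray())
+                {
+                    if (Walk(item)) return true;
+                }
+                return false;
+            case JsonValueKind.String:
+                return LeakPattern.IsMatch(element.GetString() ?? string.Empty);
+            default:
+                return false;
+        }
+    }
+
+    public static void Main()
+    {
+        // A safe envelope, then one that carries a stackTrace property.
+        Console.WriteLine(LooksLeaky("{\"code\":\"internal_error\",\"message\":\"Unexpected error\"}"));
+        Console.WriteLine(LooksLeaky("{\"code\":\"internal_error\",\"stackTrace\":\"at Foo.Bar()\"}"));
+    }
+}
+```
+
+In a test, the assertion would be that `LooksLeaky` returns `false` for the body of the 500-class response. The pattern can produce false positives on ordinary messages containing "at name(", so it is a sanity net rather than proof of absence.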
+
+## Constraints
+
+- AAA pattern.
+- `[Trait("traces_to", "AC-F-50, AC-F-51, AC-F-52, SW-05, ENV-06")]` plus per-test specific traces.
+- Production-env tests run in a dedicated test class with its own fixture (no leak between Production and E2ETest boots).
@@ -0,0 +1,49 @@
+# Resilience tests (NFT-RES-01..06)
+
+**Task**: AZ-572
+**Name**: Resilience tests
+**Description**: Implement xUnit tests for the 6 resilience scenarios: RabbitMQ outage during create, Postgres restart, Postgres unreachable, SSE subscriber disconnect mid-stream, `FailsafeProducer` empty-catch path, stream consumer reconnect.
+**Complexity**: 5 points
+**Dependencies**: AZ-564 (test infrastructure)
+**Component**: Blackbox Tests → Resilience
+**Tracker**: jira
+**Epic**: AZ-563
+
+## Scenarios Covered
+
+| Test ID | Source | What it asserts |
+|---------|--------|-----------------|
+| NFT-RES-01 | `_docs/02_document/tests/resilience-tests.md` | RabbitMQ outage during create. `POST /annotations` returns 200; outbox row stays; on broker recovery, message is delivered. |
+| NFT-RES-02 | same | Postgres restart between writes. `POST /annotations` after restart succeeds without errors. |
+| NFT-RES-03 | same | Postgres unreachable during create. `POST /annotations` returns 5xx; error envelope; no partial state. |
+| NFT-RES-04 | same | SSE subscriber disconnects mid-stream. Server tears down channel cleanly; no zombie subscriptions; per-instance state cleanup. |
+| NFT-RES-05 | same | Repeated `FailsafeProducer` empty-catch path (`catch {}` swallowing `IOException`). Drain loop survives missing image file; no crash. |
+| NFT-RES-06 | same | Stream consumer reconnect. After broker restart, consumer resumes from offset and reads the same messages. |
+
+## System Under Test Boundary
+
+- HTTP only for SUT invocations.
+- `BrokerFixture.StopBroker()` / `StartBroker()` for NFT-RES-01, NFT-RES-06.
+- `docker exec postgres pg_ctl stop` / `start` (or `docker restart postgres`) for NFT-RES-02, NFT-RES-03.
+- NFT-RES-05 deliberately removes a specific image file from `annotations-images` volume (out-of-band, runner-only) to trigger the empty-catch path. Marked with `[Trait("fs_access", "delete-image-file")]`.
+- Long-running scenarios (NFT-RES-06 reconnect window) use a `[Fact(Timeout = 60000)]` cap.
+
+## Acceptance Criteria
+
+**AC-1: Every scenario passes per its spec.**
+
+**AC-2: SUT never crashes during the test class**
+Given any resilience test in the class is running,
+When the runner asserts the test's post-condition,
+Then the SUT's `/health` endpoint still returns 200 (the SUT survives every external failure).
+
+**AC-3: NFT-RES-01 verifies stream delivery on recovery**
+Given the broker was stopped before a `POST /annotations` and restarted after,
+When `BrokerFixture.StartBroker()` returns,
+Then the stream consumer reads the queued message within 30 s of broker recovery.
+
+## Constraints
+
+- AAA pattern.
+- `[Trait("traces_to", "AC-F-04, AC-F-12, AC-N-04, SW-03")]` plus per-test specific traces.
+- Resilience tests are long; group them in their own xUnit collection to avoid blocking the fast suite.
@@ -0,0 +1,49 @@
+# Resource-limit tests (NFT-RES-LIM-01..06)
+
+**Task**: AZ-573
+**Name**: Resource-limit tests
+**Description**: Implement xUnit tests for the 6 resource-limit scenarios: sustained-load process memory, single-file upload boundary, outbox depth under broker outage, disk usage by `images_dir`, concurrent SSE subscribers, migration on cold-start cost.
+**Complexity**: 3 points
+**Dependencies**: AZ-564 (test infrastructure)
+**Component**: Blackbox Tests → Resource Limits
+**Tracker**: jira
+**Epic**: AZ-563
+
+## Scenarios Covered
+
+| Test ID | Source | What it asserts |
+|---------|--------|-----------------|
+| NFT-RES-LIM-01 | `_docs/02_document/tests/resource-limit-tests.md` | Sustained-load process memory stays within configured envelope. |
+| NFT-RES-LIM-02 | same | Single-file upload boundary — 1, 10, 50, 100, 256, 512 MB. Uses `LargePayloadFixture` synthetic JPEGs. |
+| NFT-RES-LIM-03 | same | Outbox queue depth bounded under broker outage. Depth never exceeds the documented ceiling over a ≥ 30 min run. |
+| NFT-RES-LIM-04 | same | Disk usage by `images_dir` over many distinct uploads. Stays under documented HW-02 budget. |
+| NFT-RES-LIM-05 | same | Concurrent SSE subscribers — process-memory boundary. N concurrent subscribers don't push memory past envelope. |
+| NFT-RES-LIM-06 | same | Migration on cold-start cost. Time-to-`/health=200` from cold start within the documented boot budget. |
+
+## System Under Test Boundary
+
+- HTTP only.
+- Memory + disk metrics read from `docker stats` (out-of-band, runner-only). Marked `[Trait("docker_stats", "true")]`.
+- NFT-RES-LIM-02 uses `LargePayloadFixture` to generate synthetic JPEGs at runtime; never committed to repo.
+- NFT-RES-LIM-03 is long-running (30 min full run): the nightly profile (`E2E_RUN_PROFILE=performance`) runs the full 30 min; the standard profile (`functional`) runs a 5-min smoke proxy.
+- NFT-RES-LIM-05 spawns N parallel SSE subscribers via `Parallel.For` + per-subscriber `HttpClient`.
+
+## Acceptance Criteria
+
+**AC-1: Every scenario passes per its spec.**
+
+**AC-2: Smoke vs nightly profile distinction**
+Given a profile environment variable `E2E_RUN_PROFILE=functional` (default),
+When NFT-RES-LIM-03 runs,
+Then it executes a 5-min smoke proxy (not the 30-min full run); under `E2E_RUN_PROFILE=performance`, it runs the full 30 min.
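+
+The profile switch above can be sketched as a pure function from the environment variable to a run duration. The class and method names (`RunProfile`, `OutboxSoakDuration`) are hypothetical; only `E2E_RUN_PROFILE` and its two values come from this spec. Treating unset or unrecognised values as `functional` is an assumption based on that profile being the default.
+
+```csharp
+using System;
+
+// Hypothetical helper: names are illustrative, not defined by AZ-564.
+public static class RunProfile
+{
+    // Unset or unrecognised values fall back to the default "functional" profile.
+    public static TimeSpan OutboxSoakDuration(string? profile) =>
+        string.Equals(profile, "performance", StringComparison.OrdinalIgnoreCase)
+            ? TimeSpan.FromMinutes(30)  // full run
+            : TimeSpan.FromMinutes(5);  // smoke proxy
+
+    public static void Main()
+    {
+        string? profile = Environment.GetEnvironmentVariable("E2E_RUN_PROFILE");
+        Console.WriteLine(OutboxSoakDuration(profile));
+    }
+}
+```
+
+Keeping the decision in one function lets the test body stay identical across profiles; only the loop deadline and the `[Fact(Timeout = ...)]` cap need to track the chosen duration.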
+
+**AC-3: Memory + disk readings have measurement uncertainty noted**
+Given `docker stats` is the measurement source,
+When the test records a memory or disk reading,
+Then the result includes a tolerance margin (e.g., ± 50 MB for memory, ± 100 MB for disk) per the documented `results_report.md` tolerance.
+
+## Constraints
+
+- AAA pattern.
+- `[Trait("traces_to", "AC-N-03, AC-N-05, HW-02, HW-03")]` plus per-test specific traces.
+- Long-running tests set `[Fact(Timeout = ?)]` per the documented duration so that they never hang the runner.
@@ -0,0 +1,50 @@
+# Performance tests (NFT-PERF-*)
+
+**Task**: AZ-574
+**Name**: Performance tests
+**Description**: Implement xUnit tests for the 7 performance scenarios: annotation create p95 latency (small + large), sustained writes throughput, FailsafeProducer drain rate, SSE delivery latency under fan-out, annotation listing at scale, dataset class distribution at scale.
+**Complexity**: 3 points
+**Dependencies**: AZ-564 (test infrastructure; depends on dataseed populating the 10k/50k bulk rows for the "at scale" tests)
+**Component**: Blackbox Tests → Performance
+**Tracker**: jira
+**Epic**: AZ-563
+
+## Scenarios Covered
+
+| Test ID | Source | What it asserts |
+|---------|--------|-----------------|
+| NFT-PERF-LATENCY-01 | `_docs/02_document/tests/performance-tests.md` | `POST /annotations` p95 latency — small image (image_small.jpg) ≤ documented threshold (≤ 1500 ms per spec). |
+| NFT-PERF-LATENCY-02 | same | `POST /annotations` p95 latency — large image (image_large.JPG, ~7 MB) ≤ documented threshold. |
+| NFT-PERF-THROUGHPUT-01 | same | Sustained writes throughput — RPS over a 60-s window meets the documented threshold. |
+| NFT-PERF-OUTBOX-DRAIN-01 | same | FailsafeProducer drain rate — outbox depth converges to 0 within the documented window after a burst. |
+| NFT-PERF-SSE-FANOUT-01 | same | SSE delivery latency under modest fan-out (N=20 subscribers) — p95 latency ≤ documented threshold. |
+| NFT-PERF-LIST-01 | same | `GET /annotations` listing on populated DB (10k rows). p95 latency ≤ documented threshold. |
+| NFT-PERF-DATASET-01 | same | Dataset class distribution at scale (50k detections). p95 latency ≤ documented threshold. |
+
+## System Under Test Boundary
+
+- HTTP only.
+- p95 computed by the test from a sample of N requests (per-scenario sample size in the spec).
+- NFT-PERF-LIST-01 / NFT-PERF-DATASET-01 require `dataseed` to have populated the bulk rows (AZ-564 covers this).
+- Profile gate: `E2E_RUN_PROFILE=performance` enables the full runs; the standard `functional` profile executes only the short smoke variant described in AC-2 (the full runs are too long for the merge gate).
+
+## Acceptance Criteria
+
+**AC-1: Every perf scenario passes its threshold under the `performance` profile.**
+
+**AC-2: Smoke variant runs in the standard profile**
+Given `E2E_RUN_PROFILE=functional`,
+When the test runs,
+Then a short smoke variant (e.g., 10 requests instead of 1000) executes and only asserts p95 < 2× the threshold (a sanity check, not a perf gate).
+
+**AC-3: Measurement uncertainty acknowledged**
+Given p95 is computed from a finite sample,
+When the test reports its result,
+Then the result includes the sample size, the actual p95, and the documented threshold. Failures include a JSON report file at `e2e-results/perf-<test_id>.json`.
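+
+A minimal sketch of the p95 computation and the report payload, assuming the nearest-rank percentile definition (the spec does not say which definition applies). The class name `PerfStats` and the JSON field names are hypothetical; the threshold is passed in rather than hard-coded, in line with the fixture-based thresholds this task requires.
+
+```csharp
+using System;
+using System.Linq;
+using System.Text.Json;
+
+// Hypothetical helper: percentile method and report fields are assumptions.
+public static class PerfStats
+{
+    // Nearest-rank percentile: the smallest sample with at least p% of samples <= it.
+    public static double Percentile(double[] samplesMs, double p)
+    {
+        if (samplesMs.Length == 0) throw new ArgumentException("No samples.", nameof(samplesMs));
+        double[] sorted = samplesMs.OrderBy(x => x).ToArray();
+        int rank = (int)Math.Ceiling(p * sorted.Length / 100.0); // 1-based rank
+        return sorted[Math.Max(rank, 1) - 1];
+    }
+
+    // Serialises the values AC-3 asks for: sample size, actual p95, threshold.
+    public static string BuildReport(string testId, double[] samplesMs, double thresholdMs)
+    {
+        double p95 = Percentile(samplesMs, 95);
+        return JsonSerializer.Serialize(new
+        {
+            test_id = testId,
+            sample_size = samplesMs.Length,
+            p95_ms = p95,
+            threshold_ms = thresholdMs,
+            passed = p95 <= thresholdMs
+        });
+    }
+
+    public static void Main()
+    {
+        // Synthetic latencies 1..20 ms; the threshold would come from the fixture.
+        double[] samples = Enumerable.Range(1, 20).Select(i => (double)i).ToArray();
+        Console.WriteLine(BuildReport("NFT-PERF-LATENCY-01", samples, 1500));
+    }
+}
+```
+
+With 20 samples the nearest-rank p95 is the 19th smallest value, so small samples such as the 10-request smoke variant effectively report the maximum. That is one reason the smoke variant is only a sanity check.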
+
+## Constraints
+
+- AAA pattern.
+- `[Trait("traces_to", "AC-F-10, AC-N-01")]` plus per-test specific traces.
+- Perf tests run in their own xUnit collection so they don't block functional tests during interactive runs.
+- Performance thresholds come from `results_report.md`; tests must not hard-code numbers — they read them from a fixture.