[AZ-563] Decompose blackbox tests into AZ-564..574 task specs

Step 5 of autodev existing-code flow. Epic AZ-563 plus 11 atomic tasks
covering all 67 test scenarios from _docs/02_document/tests/* exactly once:

- AZ-564 test infrastructure (xUnit + Docker + mock JWKS + dataseed)
- AZ-565..568 functional positive (FT-P-01..22)
- AZ-569..570 functional negative (FT-N-01..16)
- AZ-571 security (NFT-SEC-01..10)
- AZ-572 resilience (NFT-RES-01..06)
- AZ-573 resource limits (NFT-RES-LIM-01..06)
- AZ-574 performance (NFT-PERF-*)

_dependencies_table.md records the cross-check vs traceability matrix
(22 + 16 + 29 = 67 scenarios, no overlaps, no gaps; deferred items remain
deferred per matrix). All task headers carry their Jira IDs (tracker: jira).
Autodev state advanced to Step 6 (Implement Tests).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-22 10:31:06 +00:00 · 2026-05-14 21:13:53 +03:00
parent 637f41c51c
commit cf632d9e2e
13 changed files with 703 additions and 2 deletions
@@ -0,0 +1,196 @@
+# Test Infrastructure
+
+**Task**: AZ-564
+**Name**: Test Infrastructure (Annotations e2e)
+**Description**: Scaffold the executable blackbox test project — xUnit runner, mock JWKS issuer, ES256 key-pair fixture, Docker test stack, fixture mounts, seed script, CSV reporting. After this task lands, every other test task can declare itself a child of this scaffold.
+**Complexity**: 5 points
+**Dependencies**: AZ-560 (testability refactor — already landed via AZ-561 and AZ-562)
+**Component**: Blackbox Tests
+**Tracker**: jira
+**Epic**: AZ-563 — `Blackbox Tests — Annotations`
+
+## Test Project Folder Layout
+
+```
+tests/
+├── Azaion.Annotations.E2E/
+│   ├── Azaion.Annotations.E2E.csproj
+│   ├── Dockerfile
+│   ├── TestBase.cs                       # base class with HttpClient, token helper
+│   ├── Fixtures/
+│   │   ├── DockerStackFixture.cs         # CollectionFixture — boot order check
+│   │   ├── CleanStateFixture.cs          # TRUNCATE between test classes
+│   │   ├── BrokerFixture.cs              # RabbitMQ stop/start helpers
+│   │   └── TokenMinter.cs                # ES256 token minting via the in-stack key
+│   ├── Domain/                           # one file per category (one task per file)
+│   │   ├── (populated by AZ-565 ... AZ-573)
+│   └── README.md
+└── harness/
+    ├── mock_issuer.py                    # ~40-line Python http.server (writes JWKS, mounts private key)
+    └── gen_keys.sh                       # one-shot ES256 keypair generator (invoked by mock_issuer at boot)
+
+e2e/
+├── docker-compose.test.yml               # already produced in autodev Step 3; this task wires the new services into it
+├── seed/
+│   └── run.sh                            # already drafted in Step 3; this task adds bulk-insert SQL for NFT-PERF-LIST-01 and NFT-PERF-DATASET-01
+└── e2e-results/                          # output of test runs (gitignored)
+```
+
+### Layout Rationale
+
+- Tests live under `tests/Azaion.Annotations.E2E/` to mirror the .NET convention (sibling of `src/`).
+- The mock issuer lives in `tests/harness/` so it can be shared by smoke / debug stacks without polluting the test runner project.
+- Fixtures are separated from test classes to make the docker-stack boot pattern reusable.
+- All tests are xUnit (matches the SUT runtime; avoids a Python toolchain in CI).
+
+## Mock Services
+
+| Mock Service | Replaces | Endpoints | Behavior |
+|--------------|----------|-----------|----------|
+| `e2e-issuer` (Python `http.server`) | Admin's JWKS issuer | `GET /.well-known/jwks.json` (returns a 1-key JWKS for the in-stack ES256 public key) | Static for the lifetime of the docker-compose stack. Public key regenerates per `docker compose down -v` cycle. No test-time mutability needed — variant tokens (expired / wrong-iss / wrong-aud / `alg=HS256` forgery) are minted with overrides by the runner against the same private key (NFT-SEC-01..10 verifies the SUT rejects them). |
+
+There are no other mock services. All other infrastructure is real (Postgres 13, RabbitMQ 3.13 streams) because `restrictions.md` mandates "no mocking of internal services". External dependencies that *could* be mocked (admin sync worker, AI training consumer) are simply not run: the SUT never initiates calls to them. It only publishes to the stream, which the test runner reads directly.
+
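As a rough illustration of the `e2e-issuer` row above, the sketch below serves a one-key JWKS at `/.well-known/jwks.json` using only the Python standard library. It is not the actual `mock_issuer.py`: the key coordinates are a demonstration point (the P-256 base point), the `kid` is assumed to be an RFC 7638 thumbprint, and `build_jwks`, `JwksHandler`, and `serve` are hypothetical names. Port 8080 follows the URL used in AC-2.

```python
import base64
import hashlib
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Demonstration point only: the P-256 base point G (public key for private scalar 1).
# A real issuer would load the coordinates of the key produced by gen_keys.sh.
DEMO_X = 0x6B17D1F2E12C4247F8BCE6E563A440F277037D812DEB33A0F4A13945D898C296
DEMO_Y = 0x4FE342E2FE1A7F9B8EE7EB4A7C0F9E162BCE33576B315ECECBB6406837BF51F5


def b64url(data: bytes) -> str:
    """Base64url without padding, as JWK and JWT require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def build_jwks(x: int, y: int) -> dict:
    """Return a one-key JWKS for a P-256 public key given as integer coordinates."""
    jwk = {
        "kty": "EC",
        "crv": "P-256",
        "x": b64url(x.to_bytes(32, "big")),
        "y": b64url(y.to_bytes(32, "big")),
    }
    # Assumed kid scheme: RFC 7638 thumbprint (SHA-256 over the required
    # members in lexicographic order, serialized without whitespace).
    canonical = json.dumps(jwk, sort_keys=True, separators=(",", ":")).encode("ascii")
    jwk.update({"kid": b64url(hashlib.sha256(canonical).digest()), "alg": "ES256", "use": "sig"})
    return {"keys": [jwk]}


class JwksHandler(BaseHTTPRequestHandler):
    # Computed once: the JWKS is static for the lifetime of the process.
    jwks_body = json.dumps(build_jwks(DEMO_X, DEMO_Y)).encode("utf-8")

    def do_GET(self):
        if self.path != "/.well-known/jwks.json":
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(self.jwks_body)))
        self.end_headers()
        self.wfile.write(self.jwks_body)

    def log_message(self, format, *args):
        pass  # keep container logs quiet


def serve(port: int = 8080) -> None:
    """Block forever serving the JWKS on all interfaces."""
    HTTPServer(("0.0.0.0", port), JwksHandler).serve_forever()
```

Calling `serve()` in the container entry point would start the listener. Because the body is built once at class-definition time, the response stays identical until the process restarts, which matches the "static for the lifetime of the stack" behavior described in the table.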
+### Mock Control API
+
+Not applicable for this suite. The mock issuer is static; behavior variation comes from the runner minting different tokens. Broker and DB outages are induced with `docker exec rabbitmq rabbitmqctl stop_app` and `docker restart postgres`, which the test runner launches through .NET's `Process.Start` against the host docker socket bind-mounted into the runner container.
+
+## Docker Test Environment
+
+### docker-compose.test.yml structure
+
+| Service | Image / Build | Purpose | Depends On |
+|---------|---------------|---------|------------|
+| `postgres` | `postgres:13` | SUT's DB | — |
+| `rabbitmq` | `rabbitmq:3.13-management` + streams plugin | Stream broker | — |
+| `e2e-issuer` | `python:3.12-alpine` running `tests/harness/mock_issuer.py` | Mock JWKS issuer + key pair generator | — |
+| `annotations` | Built from `src/Dockerfile` | SUT | `postgres` (healthy), `rabbitmq` (healthy), `e2e-issuer` (healthy) |
+| `dataseed` | `postgres:13` (one-shot psql) | Loads `classes-baseline`, mission row, and the bulk rows for NFT-PERF-LIST-01 / NFT-PERF-DATASET-01 | `annotations` (healthy) |
+| `e2e-runner` | Built from `tests/Azaion.Annotations.E2E/Dockerfile` (.NET SDK 10.0) | Test runner (xUnit) | `dataseed` (completed_successfully) |
+
+### Networks and Volumes
+
+- **Network**: `e2e-net` (bridge, isolated). All services reach each other by service name.
+- **Volumes**:
+  - `pg-data` — Postgres durability across restart (resilience scenarios).
+  - `annotations-images`, `annotations-videos`, `annotations-deleted` — SUT file dirs.
+  - `jwt-keys` — ES256 keypair shared between `e2e-issuer` (writes public + serves JWKS) and `e2e-runner` (reads private key for token minting).
+- **Bind mount (read-only)**: `../detections/_docs/00_problem/input_data` → `/fixtures` in both the SUT and the runner.
+
+### Test runner host-docker access
+
+The runner needs to execute `docker exec rabbitmq rabbitmqctl stop_app` (NFT-RES-01..03) and `docker restart postgres` (NFT-RES-02..03). Solution: bind-mount the host docker socket into the runner (`/var/run/docker.sock:/var/run/docker.sock`) and point the `RESILIENCE_DOCKER_SOCKET` env var at it; the `BrokerFixture` / `DbFixture` use it. The mount is confined to the test stack: the production SUT never mounts the docker socket.
+
+## Test Runner Configuration
+
+**Framework**: xUnit (matches SUT toolchain — .NET 10).
+**Plugins / NuGet refs**:
+- `Microsoft.NET.Test.Sdk` (test host required by `dotnet test`)
+- `xunit` + `xunit.runner.visualstudio` (framework plus the VSTest adapter that performs discovery)
+- `RabbitMQ.Stream.Client` (same version as `src/Azaion.Annotations.csproj`)
+- `MessagePack` (same version) — to decode stream messages for FT-P-09
+- Not `Microsoft.AspNetCore.SignalR.Client`: SSE is plain HTTP `text/event-stream`, so the tests read it with `HttpClient` directly
+- `System.IdentityModel.Tokens.Jwt` — for ES256 minting
+- `Npgsql` — for direct DB introspection assertions (read-only, documented per test)
+- `coverlet.collector` for coverage (optional; coverage is not a gate for this run)
+
+**Entry point**: `dotnet test --logger "trx;LogFileName=results.trx" --results-directory /results` — followed by a tiny CSV-converter post-step in `Dockerfile`'s ENTRYPOINT that produces `/results/report.csv` from `results.trx`.
+
+### Fixture Strategy
+
+| Fixture | Scope | Purpose |
+|---------|-------|---------|
+| `DockerStackFixture` | Collection (one per assembly) | Smoke-pings `/health` and waits for JWKS fetch on boot. Does NOT bring the stack up — that's `docker compose up`'s job. |
+| `CleanStateFixture` | Class (per test class) | `TRUNCATE annotations, media, detection, annotations_queue_records RESTART IDENTITY CASCADE` via direct Postgres. Run before first test, again after last. |
+| `TokenMinter` | Singleton (within fixture lifetime) | Holds the ES256 private key (read from `/keys` mount) and exposes `MintToken(claim, overrides?)`. |
+| `BrokerFixture` | Per-test (only for resilience tests) | `StopBroker()`, `StartBroker()` via `docker exec`. Asserts pre/post state. |
+| `StreamConsumerFixture` | Per-test (only for stream-consumer tests) | Creates a fresh consumer name, starts at offset `next`, decodes MessagePack + gzip into typed events. |
+
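The `MintToken(claim, overrides?)` shape in the table can be pictured as a baseline claim set plus per-test overrides. The Python sketch below is illustrative only (the real `TokenMinter` is a C# fixture signing with ES256): issuer and audience values are placeholders, all function names are hypothetical, and the `alg=HS256` variant is assumed to be the classic algorithm-confusion forgery, where the public key bytes serve as the HMAC secret. ES256 signing is omitted because it needs a crypto library outside the standard library.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64url without padding, as used for JWT segments."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def base_claims(now: int) -> dict:
    """Baseline claims for a valid token; iss and aud are placeholder values."""
    return {"iss": "http://e2e-issuer:8080", "aud": "annotations", "iat": now, "exp": now + 300}


def variant_claims(now: int, **overrides) -> dict:
    """Baseline claims with selected fields replaced for a negative test."""
    claims = base_claims(now)
    claims.update(overrides)
    return claims


def negative_variants(now: int) -> dict:
    """Claim sets the SUT is expected to reject."""
    return {
        "expired": variant_claims(now, exp=now - 60),
        "wrong-iss": variant_claims(now, iss="http://rogue-issuer.invalid"),
        "wrong-aud": variant_claims(now, aud="some-other-service"),
    }


def forge_hs256(claims: dict, public_key_bytes: bytes) -> str:
    """Build an HS256 token whose HMAC secret is the (public) key material."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(public_key_bytes, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(signature)}"
```

A verifier that pins the accepted algorithm to ES256 rejects the forged token regardless of the signature, which is the property the `alg=HS256` security scenario would exercise.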
+## Test Data Fixtures
+
+| Data Set | Source | Format | Used By |
+|----------|--------|--------|---------|
+| Image / video fixtures | Bind-mount `../detections/_docs/00_problem/input_data/` → `/fixtures` (read-only) | JPEG / MP4 binary | All FT-P-* and most FT-N-* |
+| `classes-baseline` (19 detection classes) | Auto-seeded by `DatabaseMigrator` on `annotations` first boot | DB rows | FT-P-14 (catalog read), every FT-P that references `class_num` |
+| `mission-test` GUID `00000000-0000-0000-0000-000000000aaa` | Inlined in request payloads | GUID | All annotation-create paths |
+| Synthetic JPEGs for NFT-RES-LIM-02 | Generated at test time by `LargePayloadFixture` (1, 10, 50, 100, 256, 512 MB) | binary | NFT-RES-LIM-02 |
+| Bulk rows for NFT-PERF-LIST-01 / NFT-PERF-DATASET-01 (10k annotations, 50k detections) | `e2e/seed/run.sh` SQL block (executed by the `dataseed` service) | DB rows | NFT-PERF-LIST-01, NFT-PERF-DATASET-01 |
+| Per-test ES256 tokens | `TokenMinter` (in-process minting) | JWT | All FT-* requiring `Authorization` header and all NFT-SEC-* |
+
+### Data Isolation
+
+- **Per-class truncation** via `CleanStateFixture` (above).
+- **Per-test mission GUID** for SSE fan-out tests (FT-P-07, NFT-PERF-SSE-FANOUT-01).
+- **Per-test stream consumer name** for FT-P-09 and NFT-RES-06.
+- **Volume reset on `docker compose down -v`** — image / video dirs and the JWKS keypair regenerate.
+
+## Test Reporting
+
+**Format**: `.trx` (output of the VSTest `trx` logger used by `dotnet test`), converted to flat CSV by the runner.
+**CSV columns**: `test_id`, `test_name`, `category`, `traces_to`, `execution_time_ms`, `result`, `error_message`.
+**Output path**: `/results/report.csv` and `/results/results.trx` inside the runner; mounted to `./e2e-results/` on the host.
+
+`traces_to` is populated from a `[Trait("traces_to", "AC-F-01, HW-02")]` attribute on each test method — the converter reads the attribute and writes a comma-separated cell. This makes the resulting CSV self-describing for the traceability-matrix check at autodev Step 7 (Run Tests).
+
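A minimal sketch of the TRX-to-CSV step, written in Python purely for illustration (the runner image is .NET, so the real converter may well be a small .NET tool). It assumes the standard TRX layout (`UnitTestResult` elements with `testId`, `testName`, `outcome`, and `duration` attributes), uses the TRX `testId` as `test_id`, and takes trait values from a caller-supplied mapping, since how the converter obtains `category` and `traces_to` is not specified here. `trx_to_csv` and `duration_to_ms` are hypothetical names.

```python
import csv
import io
import xml.etree.ElementTree as ET

# XML namespace used by TRX result files.
TRX_NS = {"t": "http://microsoft.com/schemas/VisualStudio/TeamTest/2010"}
COLUMNS = ["test_id", "test_name", "category", "traces_to",
           "execution_time_ms", "result", "error_message"]


def duration_to_ms(duration: str) -> int:
    """Convert a TRX duration such as '00:00:01.2500000' to whole milliseconds."""
    hours, minutes, seconds = duration.split(":")
    return round((int(hours) * 3600 + int(minutes) * 60 + float(seconds)) * 1000)


def trx_to_csv(trx_xml: str, traits: dict) -> str:
    """Flatten every UnitTestResult into one CSV row and return the CSV text.

    `traits` maps a test name to {"category": ..., "traces_to": ...}.
    """
    root = ET.fromstring(trx_xml)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for result in root.findall("t:Results/t:UnitTestResult", TRX_NS):
        name = result.get("testName", "")
        # Failed tests carry their message under Output/ErrorInfo/Message.
        message = result.findtext("t:Output/t:ErrorInfo/t:Message",
                                  default="", namespaces=TRX_NS)
        meta = traits.get(name, {})
        writer.writerow({
            "test_id": result.get("testId", ""),
            "test_name": name,
            "category": meta.get("category", ""),
            "traces_to": meta.get("traces_to", ""),
            "execution_time_ms": duration_to_ms(result.get("duration", "00:00:00")),
            "result": result.get("outcome", ""),
            "error_message": message.strip(),
        })
    return out.getvalue()
```

Because `traces_to` values contain commas, the `csv` module quotes that cell, so the flat file still parses back into exactly the seven columns listed above.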
+## Acceptance Criteria
+
+**AC-1: Test environment starts**
+Given a clean clone of the repo on a host with Docker installed,
+When `./scripts/run-tests.sh` is executed (or equivalent `docker compose -f e2e/docker-compose.test.yml up`),
+Then `postgres`, `rabbitmq`, `e2e-issuer`, `annotations`, `dataseed`, and `e2e-runner` all start in dependency order, the `annotations` service reaches `healthy`, and the test runner begins discovery.
+
+**AC-2: Mock JWKS responds with the in-stack public key**
+Given the test environment is up,
+When `wget http://e2e-issuer:8080/.well-known/jwks.json` is executed from the `annotations` container,
+Then the response is a valid JWKS with exactly one ES256 key whose `kid` matches the private key shared with `e2e-runner`.
+
+**AC-3: Token minter mints a valid token end-to-end**
+Given the test environment is up and `TokenMinter.MintToken("ANN")` is invoked,
+When the resulting token is presented as `Authorization: Bearer <token>` on `POST /annotations` with a fixture payload,
+Then the SUT returns HTTP 200 (token validates against the JWKS-published public key).
+
+**AC-4: Truncation fixture isolates classes**
+Given two test classes that each create one annotation row,
+When both run within the same test session,
+Then each class observes an empty `annotations` table at start and the SUT keeps no cross-class state.
+
+**AC-5: CSV report generated with required columns**
+Given a test session has completed,
+When the runner exits,
+Then `./e2e-results/report.csv` exists on the host and contains the columns: `test_id`, `test_name`, `category`, `traces_to`, `execution_time_ms`, `result`, `error_message`.
+
+**AC-6: Resilience helpers work**
+Given the test environment is up,
+When `BrokerFixture.StopBroker()` is invoked from a test,
+Then `docker exec rabbitmq rabbitmqctl stop_app` succeeds and `BrokerFixture.StartBroker()` reverses it within 5 s; the SUT recovers (subsequent `POST /annotations` returns 200) within the documented backoff window.
+
+## Constraints
+
+- `restrictions.md` SW-01: .NET 10 toolchain only — test runner pins `Microsoft.NET.Test.Sdk` to the version compatible with .NET 10.
+- `restrictions.md` HW-01: ARM64-only — the e2e-runner Dockerfile uses `mcr.microsoft.com/dotnet/sdk:10.0` which is multi-arch.
+- `restrictions.md` ENV-02: no in-image TLS — the test stack uses plain HTTP; the JWKS HTTPS gate (AZ-561) is satisfied by `ASPNETCORE_ENVIRONMENT=E2ETest`.
+- Every test must use the Arrange / Act / Assert pattern with `// Arrange`, `// Act`, `// Assert` comments (per `coderule.mdc`).
+- No mocks for internal services (`AnnotationService`, `FailsafeProducer`, etc.) — every test exercises the real public surface.
+- No direct writes to the SUT's tables from test code; the only write the runner issues is the `CleanStateFixture` `TRUNCATE` between classes. Read-only DB access is allowed only for blackbox-documented assertions (outbox row count, queue depth) and must be marked with a `[Trait("db_access", "read-only")]` attribute.
+
+## Risks & Mitigation
+
+**Risk 1: Docker socket bind exposes too much**
+- *Risk*: Mounting `/var/run/docker.sock` into the runner gives it root-equivalent access to the host. Acceptable in CI runners; risky on developer laptops.
+- *Mitigation*: The socket bind is in `docker-compose.test.yml`'s `e2e-runner` block only (not the SUT). Document that the test stack assumes a CI-like or isolated dev environment. `restrictions.md` does not forbid this.
+
+**Risk 2: JWKS keypair freshness**
+- *Risk*: A stale keypair lingering in the `jwt-keys` volume could cause cryptic JWKS failures between test runs.
+- *Mitigation*: `mock_issuer.py` runs `gen_keys.sh` once per container lifetime, at start, so every container start yields a new keypair. `docker compose down -v` between full runs guarantees a fresh key.
+
+**Risk 3: Bulk seed slows boot**
+- *Risk*: 10k annotation rows + 50k detection rows in `dataseed` could push boot from ~5 s to ~30 s.
+- *Mitigation*: Bulk insert uses `CROSS JOIN generate_series` and a single `COPY FROM STDIN` so the seed completes in <10 s on local hardware. NFT-PERF tests already document a separate boot allowance; functional tests do not depend on the perf seed and can run without it if the seed is split into a profile-gated step.
+
+## Self-Verification
+
+- [x] Every external dependency from `environment.md` has a mock service defined OR an explicit "real service used" justification (real Postgres, real RabbitMQ, mock issuer only).
+- [x] Docker Compose structure covers all services from `environment.md`.
+- [x] Test data fixtures cover all seed data sets from `test-data.md` (tokens-test, mission-test, classes-baseline, clean-state, runtime-generated big payloads, bulk-perf rows).
+- [x] Test runner configuration matches SUT tech stack (.NET 10, xUnit, RabbitMQ.Stream.Client at the same NuGet version).
+- [x] Data isolation strategy is defined (per-class truncate, per-test mission/consumer/token).