# Test Infrastructure

**Task**: AZ-564
**Name**: Test Infrastructure (Annotations e2e)
**Description**: Scaffold the executable blackbox test project: xUnit runner, mock JWKS issuer, ES256 key-pair fixture, Docker test stack, fixture mounts, seed script, CSV reporting. After this task lands, every other test task can declare itself a child of this scaffold.
**Complexity**: 5 points
**Dependencies**: AZ-560 (testability refactor, already landed via AZ-561 and AZ-562)
**Component**: Blackbox Tests
**Tracker**: jira
**Epic**: AZ-563 (`Blackbox Tests — Annotations`)

## Test Project Folder Layout

```
tests/
├── Azaion.Annotations.E2E/
│   ├── Azaion.Annotations.E2E.csproj
│   ├── Dockerfile
│   ├── TestBase.cs                  # base class with HttpClient, token helper
│   ├── Fixtures/
│   │   ├── DockerStackFixture.cs    # CollectionFixture: boot order check
│   │   ├── CleanStateFixture.cs     # TRUNCATE between test classes
│   │   ├── BrokerFixture.cs         # RabbitMQ stop/start helpers
│   │   └── TokenMinter.cs           # ES256 token minting via the in-stack key
│   ├── Domain/                      # one file per category (one task per file)
│   │   ├── (populated by AZ-565 ... AZ-573)
│   └── README.md
└── harness/
    ├── mock_issuer.py               # ~40-line Python http.server (writes JWKS, mounts private key)
    └── gen_keys.sh                  # one-shot ES256 keypair generator (invoked by mock_issuer at boot)

e2e/
├── docker-compose.test.yml          # already produced in autodev Step 3; this task wires the new services into it
├── seed/
│   └── run.sh                       # already drafted in Step 3; this task adds bulk-insert SQL for NFT-PERF-LIST-01 and NFT-PERF-DATASET-01
└── e2e-results/                     # output of test runs (gitignored)
```

### Layout Rationale

- Tests live under `tests/Azaion.Annotations.E2E/` to mirror the .NET convention (sibling of `src/`).
- The mock issuer lives in `tests/harness/` so it can be shared by smoke / debug stacks without polluting the test runner project.
- Fixtures are separated from test classes to make the docker-stack boot pattern reusable.
- All tests are xUnit (matches the SUT runtime; avoids a Python toolchain in CI).

## Mock Services

| Mock Service | Replaces | Endpoints | Behavior |
|--------------|----------|-----------|----------|
| `e2e-issuer` (Python `http.server`) | Admin's JWKS issuer | `GET /.well-known/jwks.json` (returns a 1-key JWKS for the in-stack ES256 public key) | Static for the lifetime of the docker-compose stack. Public key regenerates per `docker compose down -v` cycle. No test-time mutability needed: variant tokens (expired / wrong-iss / wrong-aud / `alg=HS256` forgery) are minted with overrides by the runner against the same private key (NFT-SEC-01..10 verifies the SUT rejects them). |

There are no other mock services. All other infrastructure is real (Postgres 13, RabbitMQ 3.13 streams); restrictions.md mandates "no mocking of internal services". External dependencies that *could* be mocked (admin sync worker, AI training consumer) are simply not run because the SUT does not initiate calls to them; it publishes to the stream and the stream is read by the test runner directly.

### Mock Control API

Not applicable for this suite. The mock issuer is static; behavior variation is performed by the runner minting different tokens. Broker / DB resilience is performed by `docker exec rabbitmq rabbitmqctl stop_app` and `docker restart postgres` invoked from the test runner, driven via .NET's `Process.Start` against the host docker socket bound into the runner container.
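As a rough illustration of the static issuer described above, the sketch below serves a JWKS file over Python's `http.server`. The names `respond`, `make_handler`, and `serve` are hypothetical helpers, and key generation is omitted entirely; this is not the actual `mock_issuer.py`, only a minimal shape it could take. Port 8080 is taken from AC-2.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path

JWKS_PATH = "/.well-known/jwks.json"


def respond(path: str, jwks_file: Path) -> tuple[int, bytes]:
    """Return (status, body) for a GET on `path` (hypothetical helper)."""
    if path == JWKS_PATH:
        # Read from disk on every request; the file is written once at boot.
        return 200, jwks_file.read_bytes()
    return 404, json.dumps({"error": "not found"}).encode()


def make_handler(jwks_file: Path):
    """Build a request handler class bound to one JWKS file."""

    class JwksHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            status, body = respond(self.path, jwks_file)
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, fmt, *args):
            pass  # keep container logs quiet

    return JwksHandler


def serve(jwks_file: Path, port: int = 8080) -> None:
    """Serve the JWKS on all interfaces until the container stops."""
    HTTPServer(("0.0.0.0", port), make_handler(jwks_file)).serve_forever()
```

Under these assumptions the container entry point would call something like `serve(Path("/keys/jwks.json"))`; that file location is a guess and would need to match wherever the real script writes the JWKS.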
## Docker Test Environment

### docker-compose.test.yml structure

| Service | Image / Build | Purpose | Depends On |
|---------|---------------|---------|------------|
| `postgres` | `postgres:13` | SUT's DB | none |
| `rabbitmq` | `rabbitmq:3.13-management` + streams plugin | Stream broker | none |
| `e2e-issuer` | `python:3.12-alpine` running `tests/harness/mock_issuer.py` | Mock JWKS issuer + key pair generator | none |
| `annotations` | Built from `src/Dockerfile` | SUT | `postgres` (healthy), `rabbitmq` (healthy), `e2e-issuer` (healthy) |
| `dataseed` | `postgres:13` (one-shot psql) | Loads `classes-baseline`, mission row, and the bulk rows for NFT-PERF-LIST-01 / NFT-PERF-DATASET-01 | `annotations` (healthy) |
| `e2e-runner` | Built from `tests/Azaion.Annotations.E2E/Dockerfile` (.NET SDK 10.0) | Test runner (xUnit) | `dataseed` (completed_successfully) |

### Networks and Volumes

- **Network**: `e2e-net` (bridge, isolated). All services reach each other by service name.
- **Volumes**:
  - `pg-data`: Postgres durability across restart (resilience scenarios).
  - `annotations-images`, `annotations-videos`, `annotations-deleted`: SUT file dirs.
  - `jwt-keys`: ES256 keypair shared between `e2e-issuer` (writes public + serves JWKS) and `e2e-runner` (reads private key for token minting).
- **Bind mount (read-only)**: `../detections/_docs/00_problem/input_data` → `/fixtures` in both the SUT and the runner.

### Test runner host-docker access

The runner needs to execute `docker exec rabbitmq rabbitmqctl stop_app` (NFT-RES-01..03) and `docker restart postgres` (NFT-RES-02..03). Solution: bind-mount the host docker socket into the runner (`/var/run/docker.sock:/var/run/docker.sock`) under a `RESILIENCE_DOCKER_SOCKET` env var; the `BrokerFixture` / `DbFixture` use it. This is gated to the test stack: the production SUT never mounts the docker socket.

## Test Runner Configuration

**Framework**: xUnit (matches SUT toolchain: .NET 10).
**Plugins / NuGet refs**:

- `Microsoft.NET.Test.Sdk` (xUnit discovery)
- `xunit` + `xunit.runner.visualstudio`
- `RabbitMQ.Stream.Client` (same version as `src/Azaion.Annotations.csproj`)
- `MessagePack` (same version), to decode stream messages for FT-P-09
- `Microsoft.AspNetCore.SignalR.Client`: not referenced. SSE is plain HTTP `text/event-stream`, so we use `HttpClient` directly
- `System.IdentityModel.Tokens.Jwt`, for ES256 minting
- `Npgsql`, for direct DB introspection assertions (read-only, documented per test)
- `coverlet.collector`, for coverage; optional, since this run does not gate on coverage

**Entry point**: `dotnet test --logger "trx;LogFileName=results.trx" --results-directory /results`, followed by a small CSV-converter post-step in the `Dockerfile`'s ENTRYPOINT that produces `/results/report.csv` from `results.trx`.

### Fixture Strategy

| Fixture | Scope | Purpose |
|---------|-------|---------|
| `DockerStackFixture` | Collection (one per assembly) | Smoke-pings `/health` and waits for JWKS fetch on boot. Does NOT bring the stack up; that is `docker compose up`'s job. |
| `CleanStateFixture` | Class (per test class) | `TRUNCATE annotations, media, detection, annotations_queue_records RESTART IDENTITY CASCADE` via direct Postgres. Run before first test, again after last. |
| `TokenMinter` | Singleton (within fixture lifetime) | Holds the ES256 private key (read from `/keys` mount) and exposes `MintToken(claim, overrides?)`. |
| `BrokerFixture` | Per-test (only for resilience tests) | `StopBroker()`, `StartBroker()` via `docker exec`. Asserts pre/post state. |
| `StreamConsumerFixture` | Per-test (only for stream-consumer tests) | Creates a fresh consumer name, starts at offset `next`, decodes MessagePack + gzip into typed events. |

## Test Data Fixtures

| Data Set | Source | Format | Used By |
|----------|--------|--------|---------|
| Image / video fixtures | Bind-mount `../detections/_docs/00_problem/input_data/` → `/fixtures` (read-only) | JPEG / MP4 binary | All FT-P-* and most FT-N-* |
| `classes-baseline` (19 detection classes) | Auto-seeded by `DatabaseMigrator` on `annotations` first boot | DB rows | FT-P-14 (catalog read), every FT-P that references `class_num` |
| `mission-test` GUID `00000000-0000-0000-0000-000000000aaa` | Inlined in request payloads | GUID | All annotation-create paths |
| Synthetic JPEGs for NFT-RES-LIM-02 | Generated at test time by `LargePayloadFixture` (1, 10, 50, 100, 256, 512 MB) | binary | NFT-RES-LIM-02 |
| Bulk rows for NFT-PERF-LIST-01 / NFT-PERF-DATASET-01 (10k annotations, 50k detections) | `dataseed/run.sh` SQL block | DB rows | NFT-PERF-LIST-01, NFT-PERF-DATASET-01 |
| Per-test ES256 tokens | `TokenMinter` (in-process minting) | JWT | All FT-* requiring `Authorization` header and all NFT-SEC-* |

### Data Isolation

- **Per-class truncation** via `CleanStateFixture` (above).
- **Per-test mission GUID** for SSE fan-out tests (FT-P-07, NFT-PERF-SSE-FANOUT-01).
- **Per-test stream consumer name** for FT-P-09 and NFT-RES-06.
- **Volume reset on `docker compose down -v`**: image / video dirs and the JWKS keypair regenerate.

## Test Reporting

**Format**: `.trx` (the VSTest `trx` logger output of `dotnet test`), converted to flat CSV by the runner.
**CSV columns**: `test_id`, `test_name`, `category`, `traces_to`, `execution_time_ms`, `result`, `error_message`.
**Output path**: `/results/report.csv` and `/results/results.trx` inside the runner; mounted to `./e2e-results/` on the host.

`traces_to` is populated from a `[Trait("traces_to", "AC-F-01, HW-02")]` attribute on each test method; the converter reads the attribute and writes a comma-separated cell. This makes the resulting CSV self-describing for the traceability-matrix check at autodev Step 7 (Run Tests).
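The converter's implementation language is not specified in this task. Purely to illustrate the TRX-to-CSV column mapping, the sketch below uses Python's standard library. It assumes the usual VSTest TRX layout (`UnitTestResult` elements carrying `testId`, `testName`, `outcome`, and `duration` attributes, with failures under `Output/ErrorInfo/Message`). How trait values reach the converter is not described here, so `category` and `traces_to` come from a hypothetical `traits` lookup keyed by test name.

```python
import csv
import io
import xml.etree.ElementTree as ET

TRX_NS = {"t": "http://microsoft.com/schemas/VisualStudio/TeamTest/2010"}
COLUMNS = ["test_id", "test_name", "category", "traces_to",
           "execution_time_ms", "result", "error_message"]


def duration_to_ms(duration: str) -> int:
    """Convert a TRX duration such as '00:00:01.5000000' to whole milliseconds."""
    hours, minutes, seconds = duration.split(":")
    total_seconds = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
    return round(total_seconds * 1000)


def trx_to_csv(trx_xml: str, traits: dict) -> str:
    """Flatten TRX results into CSV text with the report's seven columns."""
    root = ET.fromstring(trx_xml)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for result in root.iterfind(".//t:UnitTestResult", TRX_NS):
        name = result.get("testName", "")
        message = result.find("t:Output/t:ErrorInfo/t:Message", TRX_NS)
        test_traits = traits.get(name, {})  # hypothetical trait lookup
        writer.writerow({
            "test_id": result.get("testId", ""),
            "test_name": name,
            "category": test_traits.get("category", ""),
            "traces_to": test_traits.get("traces_to", ""),
            "execution_time_ms": duration_to_ms(result.get("duration", "00:00:00")),
            "result": result.get("outcome", ""),
            "error_message": (message.text or "").strip() if message is not None else "",
        })
    return out.getvalue()
```

Because `traces_to` values contain commas, the CSV writer quotes that cell, so a standard CSV reader still sees seven columns per row.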
## Acceptance Criteria

**AC-1: Test environment starts**
Given a clean clone of the repo on a host with Docker installed, When `./scripts/run-tests.sh` is executed (or equivalent `docker compose -f e2e/docker-compose.test.yml up`), Then `postgres`, `rabbitmq`, `e2e-issuer`, `annotations`, `dataseed`, and `e2e-runner` all start in dependency order, the `annotations` service reaches `healthy`, and the test runner begins discovery.

**AC-2: Mock JWKS responds with the in-stack public key**
Given the test environment is up, When `wget http://e2e-issuer:8080/.well-known/jwks.json` is executed from the `annotations` container, Then the response is a valid JWKS with exactly one ES256 key whose `kid` matches the private key shared with `e2e-runner`.

**AC-3: Token minter mints a valid token end-to-end**
Given the test environment is up and `TokenMinter.MintToken("ANN")` is invoked, When the resulting token is presented as `Authorization: Bearer <token>` on `POST /annotations` with a fixture payload, Then the SUT returns HTTP 200 (token validates against the JWKS-published public key).

**AC-4: Truncation fixture isolates classes**
Given two test classes that each create one annotation row, When both run within the same test session, Then each class observes an empty `annotations` table at start and the SUT keeps no cross-class state.

**AC-5: CSV report generated with required columns**
Given a test session has completed, When the runner exits, Then `./e2e-results/report.csv` exists on the host and contains the columns: `test_id`, `test_name`, `category`, `traces_to`, `execution_time_ms`, `result`, `error_message`.

**AC-6: Resilience helpers work**
Given the test environment is up, When `BrokerFixture.StopBroker()` is invoked from a test, Then `docker exec rabbitmq rabbitmqctl stop_app` succeeds and `BrokerFixture.StartBroker()` reverses it within 5 s; the SUT recovers (subsequent `POST /annotations` returns 200) within the documented backoff window.
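The structural part of AC-2 can be expressed as a small check. The sketch below is an illustration in Python rather than the runner's own code: `check_jwks` and `b64url_decode` are hypothetical helpers, and the rules follow the standard JWK encoding of an ES256 key (`kty` of `EC`, `crv` of `P-256`, and 32-byte `x` / `y` coordinates in unpadded base64url). It does not verify that the key actually matches the runner's private key beyond comparing `kid`.

```python
import base64


def b64url_decode(value: str) -> bytes:
    """Decode unpadded base64url, as used for JWK members."""
    return base64.urlsafe_b64decode(value + "=" * (-len(value) % 4))


def check_jwks(jwks: dict, expected_kid: str) -> list:
    """Return a list of problems; an empty list means the document passes."""
    keys = jwks.get("keys", [])
    if len(keys) != 1:
        return [f"expected exactly 1 key, found {len(keys)}"]
    key = keys[0]
    problems = []
    if key.get("kty") != "EC" or key.get("crv") != "P-256":
        problems.append("key is not an EC P-256 key")
    # `alg` is optional in a JWK; only reject it when present and different.
    if key.get("alg", "ES256") != "ES256":
        problems.append("alg is present but not ES256")
    if key.get("kid") != expected_kid:
        problems.append("kid does not match the expected value")
    for coord in ("x", "y"):
        try:
            length = len(b64url_decode(key.get(coord, "")))
        except ValueError:
            length = -1  # not decodable as base64url
        if length != 32:
            problems.append(f"{coord} is not a 32-byte coordinate")
    return problems
```

Returning a list of problems instead of raising on the first one keeps a failing assertion message informative when several fields are wrong at once.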
## Constraints

- `restrictions.md` SW-01: .NET 10 toolchain only. The test runner pins `Microsoft.NET.Test.Sdk` to the version compatible with .NET 10.
- `restrictions.md` HW-01: ARM64-only. The e2e-runner Dockerfile uses `mcr.microsoft.com/dotnet/sdk:10.0`, which is multi-arch.
- `restrictions.md` ENV-02: no in-image TLS. The test stack uses plain HTTP; the JWKS HTTPS gate (AZ-561) is satisfied by `ASPNETCORE_ENVIRONMENT=E2ETest`.
- Every test must use the Arrange / Act / Assert pattern with `// Arrange`, `// Act`, `// Assert` comments (per `coderule.mdc`).
- No mocks for internal services (`AnnotationService`, `FailsafeProducer`, etc.); every test exercises the real public surface.
- No direct writes to the SUT's tables from the runner. Read-only DB access is allowed only for blackbox-documented assertions (outbox row count, queue depth) and must be marked with a `[Trait("db_access", "read-only")]` attribute.

## Risks & Mitigation

**Risk 1: Docker socket bind exposes too much**

- *Risk*: Mounting `/var/run/docker.sock` into the runner gives it root-equivalent access to the host. Acceptable in CI runners; risky on developer laptops.
- *Mitigation*: The socket bind is in `docker-compose.test.yml`'s `e2e-runner` block only (not the SUT). Document that the test stack assumes a CI-like or isolated dev environment. `restrictions.md` does not forbid this.

**Risk 2: JWKS keypair freshness**

- *Risk*: A stale keypair lingering in the `jwt-keys` volume could cause cryptic JWKS failures between test runs.
- *Mitigation*: `mock_issuer.py` regenerates the keypair on every container start if `gen_keys.sh` has not been run in the current container lifetime. `docker compose down -v` between full runs guarantees a fresh key.

**Risk 3: Bulk seed slows boot**

- *Risk*: 10k annotation rows + 50k detection rows in `dataseed` could push boot from ~5 s to ~30 s.
- *Mitigation*: Bulk insert uses `CROSS JOIN generate_series` and a single `COPY FROM STDIN` so the seed completes in <10 s on local hardware. NFT-PERF tests already document a separate boot allowance; functional tests do not depend on the perf seed and run independently if the seed is split into a profile-gated step.

## Self-Verification

- [x] Every external dependency from `environment.md` has a mock service defined OR an explicit "real service used" justification (real Postgres, real Rabbit, mock issuer only).
- [x] Docker Compose structure covers all services from `environment.md`.
- [x] Test data fixtures cover all seed data sets from `test-data.md` (tokens-test, mission-test, classes-baseline, clean-state, runtime-generated big payloads, bulk-perf rows).
- [x] Test runner configuration matches SUT tech stack (.NET 10, xUnit, RabbitMQ.Stream.Client at the same NuGet version).
- [x] Data isolation strategy is defined (per-class truncate, per-test mission/consumer/token).