# Test Infrastructure

**Task**: AZ-564
**Name**: Test Infrastructure (Annotations e2e)
**Description**: Scaffold the executable blackbox test project — xUnit runner, mock JWKS issuer, ES256 key-pair fixture, Docker test stack, fixture mounts, seed script, CSV reporting. After this task lands, every other test task can declare itself a child of this scaffold.
**Complexity**: 5 points
**Dependencies**: AZ-560 (testability refactor — already landed via AZ-561 and AZ-562)
**Component**: Blackbox Tests
**Tracker**: jira
**Epic**: AZ-563 — `Blackbox Tests — Annotations`

## Test Project Folder Layout

```
tests/
├── Azaion.Annotations.E2E/
│   ├── Azaion.Annotations.E2E.csproj
│   ├── Dockerfile
│   ├── TestBase.cs                       # base class with HttpClient, token helper
│   ├── Fixtures/
│   │   ├── DockerStackFixture.cs         # CollectionFixture — boot order check
│   │   ├── CleanStateFixture.cs          # TRUNCATE between test classes
│   │   ├── BrokerFixture.cs              # RabbitMQ stop/start helpers
│   │   └── TokenMinter.cs                # ES256 token minting via the in-stack key
│   ├── Domain/                           # one file per category (one task per file)
│   │   └── (populated by AZ-565 ... AZ-573)
│   └── README.md
└── harness/
    ├── mock_issuer.py                    # ~40-line Python http.server (serves the JWKS, writes the keypair to the jwt-keys volume)
    └── gen_keys.sh                       # one-shot ES256 keypair generator (invoked by mock_issuer at boot)

e2e/
├── docker-compose.test.yml               # already produced in autodev Step 3; this task wires the new services into it
├── seed/
│   └── run.sh                            # already drafted in Step 3; this task adds bulk-insert SQL for NFT-PERF-LIST-01 and NFT-PERF-DATASET-01
└── e2e-results/                          # output of test runs (gitignored)
```

### Layout Rationale

- Tests live under `tests/Azaion.Annotations.E2E/` to mirror the .NET convention (sibling of `src/`).
- The mock issuer lives in `tests/harness/` so it can be shared by smoke / debug stacks without polluting the test runner project.
- Fixtures are separated from test classes to make the docker-stack boot pattern reusable.
- All tests are xUnit (matches the SUT runtime; avoids a Python toolchain in CI).

## Mock Services

| Mock Service | Replaces | Endpoints | Behavior |
|--------------|----------|-----------|----------|
| `e2e-issuer` (Python `http.server`) | Admin's JWKS issuer | `GET /.well-known/jwks.json` (returns a 1-key JWKS for the in-stack ES256 public key) | Static for the lifetime of the docker-compose stack. Public key regenerates per `docker compose down -v` cycle. No test-time mutability needed — variant tokens (expired / wrong-iss / wrong-aud / `alg=HS256` forgery) are minted with overrides by the runner against the same private key (NFT-SEC-01..10 verifies the SUT rejects them). |

There are no other mock services. All other infrastructure is real (Postgres 13, RabbitMQ 3.13 streams), because `restrictions.md` mandates "no mocking of internal services". External dependencies that *could* be mocked (admin sync worker, AI training consumer) are not run at all: the SUT never initiates calls to them, it only publishes to the stream, and the test runner reads that stream directly.
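
The `e2e-issuer` row above could be sketched roughly as follows. This is a minimal illustration and not the actual `mock_issuer.py`: the `kid`, the `x` / `y` values, and the helper names `respond`, `make_handler`, and `serve` are placeholders, and in the real stack the JWKS would be derived from the keypair that `gen_keys.sh` produces.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

JWKS_PATH = "/.well-known/jwks.json"

# Placeholder JWKS: kid, x and y are illustrative, not a real ES256 public key.
PLACEHOLDER_JWKS = {
    "keys": [
        {"kty": "EC", "crv": "P-256", "alg": "ES256", "use": "sig",
         "kid": "e2e-key-1", "x": "placeholder-x", "y": "placeholder-y"}
    ]
}


def respond(path: str, jwks_body: bytes) -> tuple[int, bytes]:
    """Route one GET request: the JWKS path returns the document, anything else is 404."""
    if path == JWKS_PATH:
        return 200, jwks_body
    return 404, b""


def make_handler(jwks: dict):
    """Build a handler class bound to one fixed JWKS document (static for the stack lifetime)."""
    jwks_body = json.dumps(jwks).encode("utf-8")

    class JwksHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            status, body = respond(self.path, jwks_body)
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, fmt, *args):
            pass  # keep container logs quiet

    return JwksHandler


def serve(jwks: dict, host: str = "0.0.0.0", port: int = 8080) -> ThreadingHTTPServer:
    """Start the issuer on a background thread and return the server so callers can shut it down."""
    server = ThreadingHTTPServer((host, port), make_handler(jwks))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Calling `serve(PLACEHOLDER_JWKS)` would expose the document on port 8080, the port AC-2 probes. Keeping the routing in a pure function (`respond`) makes the static behavior easy to check without opening a socket.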

### Mock Control API

Not applicable for this suite. The mock issuer is static; behavior variation comes from the runner minting different tokens. Broker and DB resilience scenarios are driven by `docker exec rabbitmq rabbitmqctl stop_app` and `docker restart postgres`, which the test runner invokes through .NET's `Process.Start` against the host Docker socket bound into the runner container.

## Docker Test Environment

### docker-compose.test.yml structure

| Service | Image / Build | Purpose | Depends On |
|---------|---------------|---------|------------|
| `postgres` | `postgres:13` | SUT's DB | — |
| `rabbitmq` | `rabbitmq:3.13-management` + streams plugin | Stream broker | — |
| `e2e-issuer` | `python:3.12-alpine` running `tests/harness/mock_issuer.py` | Mock JWKS issuer + key pair generator | — |
| `annotations` | Built from `src/Dockerfile` | SUT | `postgres` (healthy), `rabbitmq` (healthy), `e2e-issuer` (healthy) |
| `dataseed` | `postgres:13` (one-shot psql) | Loads `classes-baseline`, mission row, and the bulk rows for NFT-PERF-LIST-01 / NFT-PERF-DATASET-01 | `annotations` (healthy) |
| `e2e-runner` | Built from `tests/Azaion.Annotations.E2E/Dockerfile` (.NET SDK 10.0) | Test runner (xUnit) | `dataseed` (completed_successfully) |
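
The "Depends On" column implies a start order. As a rough illustration (Docker Compose resolves this itself; nothing here is part of the stack), the table can be fed to Python's `graphlib` to see which services may start together. The `DEPENDS_ON` mapping is transcribed from the table above; `boot_waves` is a hypothetical helper name.

```python
from graphlib import TopologicalSorter

# Service -> services it waits for, transcribed from the "Depends On" column.
DEPENDS_ON = {
    "postgres": [],
    "rabbitmq": [],
    "e2e-issuer": [],
    "annotations": ["postgres", "rabbitmq", "e2e-issuer"],
    "dataseed": ["annotations"],
    "e2e-runner": ["dataseed"],
}


def boot_waves(depends_on: dict) -> list:
    """Group services into waves; every service in a wave has all its dependencies in earlier waves."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if the table contains a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(boot_waves(DEPENDS_ON), start=1):
        print(number, ", ".join(wave))
```

With the table as written this yields four waves: the three infrastructure services, then `annotations`, then `dataseed`, then `e2e-runner`.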

### Networks and Volumes

- **Network**: `e2e-net` (bridge, isolated). All services reach each other by service name.
- **Volumes**:
  - `pg-data` — Postgres durability across restart (resilience scenarios).
  - `annotations-images`, `annotations-videos`, `annotations-deleted` — SUT file dirs.
  - `jwt-keys`: ES256 keypair shared between `e2e-issuer` (generates the keypair and serves the public key as JWKS) and `e2e-runner` (reads the private key for token minting).
- **Bind mount (read-only)**: `../detections/_docs/00_problem/input_data` → `/fixtures` in both the SUT and the runner.

### Test runner host-docker access

The runner needs to execute `docker exec rabbitmq rabbitmqctl stop_app` (NFT-RES-01..03) and `docker restart postgres` (NFT-RES-02..03). Solution: bind-mount the host Docker socket into the runner (`/var/run/docker.sock:/var/run/docker.sock`) and expose it through a `RESILIENCE_DOCKER_SOCKET` env var, which `BrokerFixture` / `DbFixture` read. The mount is confined to the test stack: the production SUT never mounts the Docker socket.

## Test Runner Configuration

**Framework**: xUnit (matches SUT toolchain — .NET 10).
**Plugins / NuGet refs**:
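- `Microsoft.NET.Test.Sdk` (test host integration for `dotnet test`)
- `xunit` + `xunit.runner.visualstudio` (the framework plus the VSTest adapter that performs discovery and execution)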
- `RabbitMQ.Stream.Client` (same version as `src/Azaion.Annotations.csproj`)
- `MessagePack` (same version) — to decode stream messages for FT-P-09
- No `Microsoft.AspNetCore.SignalR.Client`: SSE is plain HTTP `text/event-stream`, so the tests consume it with `HttpClient` directly
- `System.IdentityModel.Tokens.Jwt` — for ES256 minting
- `Npgsql` — for direct DB introspection assertions (read-only, documented per test)
- `coverlet.collector` (coverage collection; optional, not a gate for this task)

**Entry point**: `dotnet test --logger "trx;LogFileName=results.trx" --results-directory /results` — followed by a tiny CSV-converter post-step in `Dockerfile`'s ENTRYPOINT that produces `/results/report.csv` from `results.trx`.

### Fixture Strategy

| Fixture | Scope | Purpose |
|---------|-------|---------|
| `DockerStackFixture` | Collection (one per assembly) | Smoke-pings `/health` and waits for JWKS fetch on boot. Does NOT bring the stack up — that's `docker compose up`'s job. |
| `CleanStateFixture` | Class (per test class) | `TRUNCATE annotations, media, detection, annotations_queue_records RESTART IDENTITY CASCADE` via direct Postgres. Run before first test, again after last. |
| `TokenMinter` | Singleton (within fixture lifetime) | Holds the ES256 private key (read from `/keys` mount) and exposes `MintToken(claim, overrides?)`. |
| `BrokerFixture` | Per-test (only for resilience tests) | `StopBroker()`, `StartBroker()` via `docker exec`. Asserts pre/post state. |
| `StreamConsumerFixture` | Per-test (only for stream-consumer tests) | Creates a fresh consumer name, starts at offset `next`, decodes MessagePack + gzip into typed events. |
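
The `MintToken(claim, overrides?)` shape can be illustrated with a small sketch of how defaults and overrides might combine to produce the variant tokens (expired, wrong-iss, wrong-aud, `alg=HS256`). It is written in Python for brevity, while the actual `TokenMinter` is C#. The issuer URL, audience, claim names, and TTL below are assumptions, since this document does not specify them, and signing is deliberately left out.

```python
import time

# Hypothetical defaults: the real issuer, audience, claim names and TTL are not given here.
DEFAULT_ISS = "http://e2e-issuer:8080"
DEFAULT_AUD = "annotations"
DEFAULT_TTL_S = 300

HEADER_FIELDS = ("alg", "kid", "typ")


def build_token_parts(role, overrides=None, now=None):
    """Return (header, claims) for a token; overrides replace defaults field by field."""
    issued_at = int(time.time()) if now is None else now
    header = {"alg": "ES256", "typ": "JWT"}
    claims = {
        "iss": DEFAULT_ISS,
        "aud": DEFAULT_AUD,
        "role": role,
        "iat": issued_at,
        "exp": issued_at + DEFAULT_TTL_S,
    }
    for key, value in (overrides or {}).items():
        # Header-level overrides (for example the alg forgery) go to the header, the rest to the claims.
        target = header if key in HEADER_FIELDS else claims
        target[key] = value
    return header, claims


if __name__ == "__main__":
    print(build_token_parts("ANN"))
    print(build_token_parts("ANN", {"exp": int(time.time()) - 60}))  # expired variant
    print(build_token_parts("ANN", {"iss": "http://wrong-issuer"}))  # wrong-iss variant
```

Keeping the variants as plain overrides on one code path means each NFT-SEC case differs from the valid token in exactly the field under test.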

## Test Data Fixtures

| Data Set | Source | Format | Used By |
|----------|--------|--------|---------|
| Image / video fixtures | Bind-mount `../detections/_docs/00_problem/input_data/` → `/fixtures` (read-only) | JPEG / MP4 binary | All FT-P-* and most FT-N-* |
| `classes-baseline` (19 detection classes) | Auto-seeded by `DatabaseMigrator` on `annotations` first boot | DB rows | FT-P-14 (catalog read), every FT-P that references `class_num` |
| `mission-test` GUID `00000000-0000-0000-0000-000000000aaa` | Inlined in request payloads | GUID | All annotation-create paths |
| Synthetic JPEGs for NFT-RES-LIM-02 | Generated at test time by `LargePayloadFixture` (1, 10, 50, 100, 256, 512 MB) | binary | NFT-RES-LIM-02 |
| Bulk rows for NFT-PERF-LIST-01 / NFT-PERF-DATASET-01 (10k annotations, 50k detections) | `e2e/seed/run.sh` SQL block (run by the `dataseed` service) | DB rows | NFT-PERF-LIST-01, NFT-PERF-DATASET-01 |
| Per-test ES256 tokens | `TokenMinter` (in-process minting) | JWT | All FT-* requiring `Authorization` header and all NFT-SEC-* |

### Data Isolation

- **Per-class truncation** via `CleanStateFixture` (above).
- **Per-test mission GUID** for SSE fan-out tests (FT-P-07, NFT-PERF-SSE-FANOUT-01).
- **Per-test stream consumer name** for FT-P-09 and NFT-RES-06.
- **Volume reset on `docker compose down -v`** — image / video dirs and the JWKS keypair regenerate.

## Test Reporting

**Format**: `.trx` (VSTest results format, produced by the `trx` logger), converted to flat CSV by the runner.
**CSV columns**: `test_id`, `test_name`, `category`, `traces_to`, `execution_time_ms`, `result`, `error_message`.
**Output path**: `/results/report.csv` and `/results/results.trx` inside the runner; mounted to `./e2e-results/` on the host.

`traces_to` is populated from a `[Trait("traces_to", "AC-F-01, HW-02")]` attribute on each test method — the converter reads the attribute and writes a comma-separated cell. This makes the resulting CSV self-describing for the traceability-matrix check at autodev Step 7 (Run Tests).
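
The TRX-to-CSV mapping could look roughly like the sketch below. It is written in Python purely to illustrate the column mapping; the converter in the runner image may be implemented differently. It relies on the `UnitTestResult` attributes `testName`, `duration`, and `outcome` from the TRX schema. How trait values (`test_id`, `category`, `traces_to`) reach the converter is an assumption: here they arrive as a separate `traits` dictionary keyed by test name.

```python
import csv
import io
import xml.etree.ElementTree as ET

TRX_NS = {"t": "http://microsoft.com/schemas/VisualStudio/TeamTest/2010"}
COLUMNS = ["test_id", "test_name", "category", "traces_to",
           "execution_time_ms", "result", "error_message"]


def duration_to_ms(duration: str) -> int:
    """Convert a TRX duration such as '00:00:01.2500000' to whole milliseconds."""
    hours, minutes, seconds = duration.split(":")
    return round((int(hours) * 3600 + int(minutes) * 60 + float(seconds)) * 1000)


def trx_to_csv(trx_xml: str, traits: dict) -> str:
    """Render one CSV row per UnitTestResult; `traits` maps test name -> trait values (assumed input)."""
    root = ET.fromstring(trx_xml)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for result in root.iterfind(".//t:UnitTestResult", TRX_NS):
        name = result.get("testName", "")
        test_traits = traits.get(name, {})
        message = result.find("t:Output/t:ErrorInfo/t:Message", TRX_NS)
        writer.writerow({
            "test_id": test_traits.get("test_id", ""),
            "test_name": name,
            "category": test_traits.get("category", ""),
            "traces_to": test_traits.get("traces_to", ""),
            "execution_time_ms": duration_to_ms(result.get("duration", "00:00:00")),
            "result": result.get("outcome", ""),
            "error_message": "" if message is None else (message.text or "").strip(),
        })
    return out.getvalue()
```

Because `traces_to` cells contain commas, the `csv` module quotes them, so a value like `AC-F-01, HW-02` survives as a single cell.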

## Acceptance Criteria

**AC-1: Test environment starts**
Given a clean clone of the repo on a host with Docker installed,
When `./scripts/run-tests.sh` is executed (or equivalent `docker compose -f e2e/docker-compose.test.yml up`),
Then `postgres`, `rabbitmq`, `e2e-issuer`, `annotations`, `dataseed`, and `e2e-runner` all start in dependency order, the `annotations` service reaches `healthy`, and the test runner begins discovery.

**AC-2: Mock JWKS responds with the in-stack public key**
Given the test environment is up,
When `wget http://e2e-issuer:8080/.well-known/jwks.json` is executed from the `annotations` container,
Then the response is a valid JWKS with exactly one ES256 key whose `kid` matches the private key shared with `e2e-runner`.
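
The AC-2 check can be sketched as a small validator. This is illustrative only: `check_jwks` is a hypothetical helper, and how the expected `kid` is obtained from the shared private key is left open because this document does not define it. The `kty` / `crv` values follow the standard JWK representation of an ES256 (P-256) key.

```python
def check_jwks(jwks: dict, expected_kid: str) -> list:
    """Return a list of problems; an empty list means the JWKS satisfies AC-2."""
    keys = jwks.get("keys", [])
    if len(keys) != 1:
        return [f"expected exactly 1 key, found {len(keys)}"]
    key = keys[0]
    problems = []
    if key.get("kty") != "EC":
        problems.append("kty must be EC")
    if key.get("crv") != "P-256":
        problems.append("crv must be P-256")
    if key.get("alg", "ES256") != "ES256":  # alg is optional in a JWK
        problems.append("alg must be ES256 when present")
    if key.get("kid") != expected_kid:
        problems.append("kid does not match the runner's key")
    if "d" in key:
        problems.append("private parameter d must not be published")
    return problems
```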

**AC-3: Token minter mints a valid token end-to-end**
Given the test environment is up and `TokenMinter.MintToken("ANN")` is invoked,
When the resulting token is presented as `Authorization: Bearer <token>` on `POST /annotations` with a fixture payload,
Then the SUT returns HTTP 200 (token validates against the JWKS-published public key).

**AC-4: Truncation fixture isolates classes**
Given two test classes that each create one annotation row,
When both run within the same test session,
Then each class observes an empty `annotations` table at start and the SUT keeps no cross-class state.

**AC-5: CSV report generated with required columns**
Given a test session has completed,
When the runner exits,
Then `./e2e-results/report.csv` exists on the host and contains the columns: `test_id`, `test_name`, `category`, `traces_to`, `execution_time_ms`, `result`, `error_message`.

**AC-6: Resilience helpers work**
Given the test environment is up,
When `BrokerFixture.StopBroker()` is invoked from a test,
Then `docker exec rabbitmq rabbitmqctl stop_app` succeeds and `BrokerFixture.StartBroker()` reverses it within 5 s; the SUT recovers (subsequent `POST /annotations` returns 200) within the documented backoff window.
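
The "within 5 s" and "within the documented backoff window" conditions both reduce to polling a probe against a deadline. A minimal sketch of that pattern follows, in Python for brevity; the fixtures themselves are C#, and `wait_until` with its parameters is a hypothetical helper, not an existing API.

```python
import time


def wait_until(probe, timeout_s, interval_s=0.25, clock=time.monotonic, sleep=time.sleep):
    """Poll `probe` until it returns True or `timeout_s` elapses; return whether it succeeded."""
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A resilience test might call `wait_until(broker_is_up, timeout_s=5.0)` after `StartBroker()`, where `broker_is_up` is whatever health probe the fixture uses. Injecting `clock` and `sleep` keeps the helper testable without real waiting.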

## Constraints

- `restrictions.md` SW-01: .NET 10 toolchain only — test runner pins `Microsoft.NET.Test.Sdk` to the version compatible with .NET 10.
- `restrictions.md` HW-01: ARM64-only. The e2e-runner Dockerfile uses `mcr.microsoft.com/dotnet/sdk:10.0`, a multi-arch image that includes an arm64 variant.
- `restrictions.md` ENV-02: no in-image TLS — the test stack uses plain HTTP; the JWKS HTTPS gate (AZ-561) is satisfied by `ASPNETCORE_ENVIRONMENT=E2ETest`.
- Every test must use the Arrange / Act / Assert pattern with `// Arrange`, `// Act`, `// Assert` comments (per `coderule.mdc`).
- No mocks for internal services (`AnnotationService`, `FailsafeProducer`, etc.) — every test exercises the real public surface.
- No direct writes to the SUT's tables from test code; the only sanctioned write is the `CleanStateFixture` truncation between test classes. Read-only DB access is allowed only for blackbox-documented assertions (outbox row count, queue depth) and must be marked with a `[Trait("db_access", "read-only")]` attribute.

## Risks & Mitigation

**Risk 1: Docker socket bind exposes too much**
- *Risk*: Mounting `/var/run/docker.sock` into the runner gives it root-equivalent access to the host. Acceptable in CI runners; risky on developer laptops.
- *Mitigation*: The socket bind is in `docker-compose.test.yml`'s `e2e-runner` block only (not the SUT). Document that the test stack assumes a CI-like or isolated dev environment. `restrictions.md` does not forbid this.

**Risk 2: JWKS keypair freshness**
- *Risk*: A stale keypair lingering in the `jwt-keys` volume could cause cryptic JWKS failures between test runs.
- *Mitigation*: `mock_issuer.py` invokes `gen_keys.sh` once per container start, so the keypair is regenerated whenever the `e2e-issuer` container boots and stays fixed for that container's lifetime. `docker compose down -v` between full runs guarantees a fresh key.

**Risk 3: Bulk seed slows boot**
- *Risk*: 10k annotation rows + 50k detection rows in `dataseed` could push boot from ~5 s to ~30 s.
- *Mitigation*: Bulk insert uses `CROSS JOIN generate_series` and a single `COPY FROM STDIN` so the seed completes in <10 s on local hardware. NFT-PERF tests already document a separate boot allowance. Functional tests do not depend on the perf seed, so they can run without it if the bulk insert is moved into a profile-gated step.

## Self-Verification

- [x] Every external dependency from `environment.md` has a mock service defined OR an explicit "real service used" justification (real Postgres, real RabbitMQ, mock issuer only).
- [x] Docker Compose structure covers all services from `environment.md`.
- [x] Test data fixtures cover all seed data sets from `test-data.md` (tokens-test, mission-test, classes-baseline, clean-state, runtime-generated big payloads, bulk-perf rows).
- [x] Test runner configuration matches SUT tech stack (.NET 10, xUnit, RabbitMQ.Stream.Client at the same NuGet version).
- [x] Data isolation strategy is defined (per-class truncate, per-test mission/consumer/token).