annotations/_docs/02_tasks/todo/AZ-564_test_infrastructure.md
Oleksandr Bezdieniezhnykh cf632d9e2e [AZ-563] Decompose blackbox tests into AZ-564..574 task specs
Step 5 of autodev existing-code flow. Epic AZ-563 plus 11 atomic
tasks covering all 67 test scenarios from
_docs/02_document/tests/* exactly once:

- AZ-564 test infrastructure (xUnit + Docker + mock JWKS + dataseed)
- AZ-565..568 functional positive (FT-P-01..22)
- AZ-569..570 functional negative (FT-N-01..16)
- AZ-571 security (NFT-SEC-01..10)
- AZ-572 resilience (NFT-RES-01..06)
- AZ-573 resource limits (NFT-RES-LIM-01..06)
- AZ-574 performance (NFT-PERF-*)

_dependencies_table.md records the cross-check vs traceability
matrix (22 + 16 + 29 = 67 scenarios, no overlaps, no gaps; deferred
items remain deferred per matrix). All task headers carry their
Jira IDs (tracker: jira). Autodev state advanced to Step 6
(Implement Tests).

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-14 21:13:53 +03:00


Test Infrastructure

Task: AZ-564
Name: Test Infrastructure (Annotations e2e)
Description: Scaffold the executable blackbox test project: xUnit runner, mock JWKS issuer, ES256 key-pair fixture, Docker test stack, fixture mounts, seed script, CSV reporting. After this task lands, every other test task can declare itself a child of this scaffold.
Complexity: 5 points
Dependencies: AZ-560 (testability refactor, already landed via AZ-561 and AZ-562)
Component: Blackbox Tests
Tracker: jira
Epic: AZ-563 - Blackbox Tests - Annotations

Test Project Folder Layout

tests/
├── Azaion.Annotations.E2E/
│   ├── Azaion.Annotations.E2E.csproj
│   ├── Dockerfile
│   ├── TestBase.cs                       # base class with HttpClient, token helper
│   ├── Fixtures/
│   │   ├── DockerStackFixture.cs         # CollectionFixture — boot order check
│   │   ├── CleanStateFixture.cs          # TRUNCATE between test classes
│   │   ├── BrokerFixture.cs              # RabbitMQ stop/start helpers
│   │   └── TokenMinter.cs                # ES256 token minting via the in-stack key
│   ├── Domain/                           # one file per category (one task per file)
│   │   ├── (populated by AZ-565 ... AZ-573)
│   └── README.md
└── harness/
    ├── mock_issuer.py                    # ~40-line Python http.server (writes JWKS, mounts private key)
    └── gen_keys.sh                       # one-shot ES256 keypair generator (invoked by mock_issuer at boot)

e2e/
├── docker-compose.test.yml               # already produced in autodev Step 3; this task wires the new services into it
├── seed/
│   └── run.sh                            # already drafted in Step 3; this task adds bulk-insert SQL for NFT-PERF-LIST-01 and NFT-PERF-DATASET-01
└── e2e-results/                          # output of test runs (gitignored)

Layout Rationale

  • Tests live under tests/Azaion.Annotations.E2E/ to mirror the .NET convention (sibling of src/).
  • The mock issuer lives in tests/harness/ so it can be shared by smoke / debug stacks without polluting the test runner project.
  • Fixtures are separated from test classes to make the docker-stack boot pattern reusable.
  • All tests are xUnit (matches the SUT runtime; avoids a Python toolchain in CI).

Mock Services

| Mock Service | Replaces | Endpoints | Behavior |
|---|---|---|---|
| e2e-issuer (Python http.server) | Admin's JWKS issuer | GET /.well-known/jwks.json (returns a 1-key JWKS for the in-stack ES256 public key) | Static for the lifetime of the docker-compose stack. Public key regenerates per docker compose down -v cycle. No test-time mutability needed: variant tokens (expired / wrong-iss / wrong-aud / alg=HS256 forgery) are minted with overrides by the runner against the same private key (NFT-SEC-01..10 verifies the SUT rejects them). |
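A minimal sketch of what tests/harness/mock_issuer.py could look like, using only the Python standard library. It assumes the P-256 public point (x, y) and the kid have already been extracted from the key that gen_keys.sh generates; the helper names b64url_uint, build_jwks, make_handler, and serve are hypothetical, and port 8080 is taken from AC-2.

```python
import base64
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def b64url_uint(value: int, length: int = 32) -> str:
    """Encode an unsigned integer as fixed-length big-endian base64url, no padding."""
    raw = value.to_bytes(length, "big")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def build_jwks(x: int, y: int, kid: str) -> dict:
    """Wrap one P-256 public point in a single-key JWKS document."""
    jwk = {
        "kty": "EC",
        "crv": "P-256",
        "alg": "ES256",
        "use": "sig",
        "kid": kid,
        "x": b64url_uint(x),
        "y": b64url_uint(y),
    }
    return {"keys": [jwk]}


def make_handler(jwks: dict):
    """Build a request handler that serves the given JWKS as a static document."""
    body = json.dumps(jwks).encode("utf-8")

    class JwksHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/.well-known/jwks.json":
                self.send_error(404)
                return
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, fmt, *args):
            pass  # keep container logs quiet

    return JwksHandler


def serve(jwks: dict, port: int = 8080) -> None:
    """Serve the JWKS until the container stops (blocking call)."""
    HTTPServer(("0.0.0.0", port), make_handler(jwks)).serve_forever()
```

Because the response body is computed once at start-up, the issuer stays static for the lifetime of the stack, which matches the behavior described in the table.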

There are no other mock services. All other infrastructure is real (Postgres 13, RabbitMQ 3.13 streams) — restrictions.md mandates "no mocking of internal services". External dependencies that could be mocked (admin sync worker, AI training consumer) are simply not run because the SUT does not initiate calls to them; it publishes to the stream and the stream is read by the test runner directly.

Mock Control API

Not applicable for this suite. The mock issuer is static; behavior variation is performed by the runner minting different tokens. Broker and DB faults are injected with docker exec rabbitmq rabbitmqctl stop_app and docker restart postgres, which the test runner invokes through .NET's Process.Start against the host docker socket bound into the runner container.

Docker Test Environment

docker-compose.test.yml structure

| Service | Image / Build | Purpose | Depends On |
|---|---|---|---|
| postgres | postgres:13 | SUT's DB | |
| rabbitmq | rabbitmq:3.13-management + streams plugin | Stream broker | |
| e2e-issuer | python:3.12-alpine running tests/harness/mock_issuer.py | Mock JWKS issuer + key pair generator | |
| annotations | Built from src/Dockerfile | SUT | postgres (healthy), rabbitmq (healthy), e2e-issuer (healthy) |
| dataseed | postgres:13 (one-shot psql) | Loads classes-baseline, mission row, and the bulk rows for NFT-PERF-LIST-01 / NFT-PERF-DATASET-01 | annotations (healthy) |
| e2e-runner | Built from tests/Azaion.Annotations.E2E/Dockerfile (.NET SDK 10.0) | Test runner (xUnit) | dataseed (completed_successfully) |

Networks and Volumes

  • Network: e2e-net (bridge, isolated). All services reach each other by service name.
  • Volumes:
    • pg-data — Postgres durability across restart (resilience scenarios).
    • annotations-images, annotations-videos, annotations-deleted — SUT file dirs.
    • jwt-keys — ES256 keypair shared between e2e-issuer (writes public + serves JWKS) and e2e-runner (reads private key for token minting).
  • Bind mount (read-only): ../detections/_docs/00_problem/input_data/fixtures in both the SUT and the runner.

Test runner host-docker access

The runner needs to execute docker exec rabbitmq rabbitmqctl stop_app (NFT-RES-01..03) and docker restart postgres (NFT-RES-02..03). Solution: bind-mount the host docker socket into the runner (/var/run/docker.sock:/var/run/docker.sock) under a RESILIENCE_DOCKER_SOCKET env var; the BrokerFixture / DbFixture use it. This is gated to the test stack — the production SUT never mounts the docker socket.
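The command shapes behind these helpers can be sketched as follows. The real BrokerFixture is C# and shells out via Process.Start; this Python version only illustrates the logic. The container names come from the commands quoted above, while rabbitmqctl start_app as the reverse of stop_app, the BrokerControl class, and the injectable runner are assumptions made for illustration.

```python
import subprocess
from typing import Callable, List, Sequence

Runner = Callable[[Sequence[str]], int]


def run_cli(cmd: Sequence[str]) -> int:
    """Default runner: execute the docker CLI and return its exit code."""
    return subprocess.run(list(cmd), check=False).returncode


def broker_cmd(action: str, container: str = "rabbitmq") -> List[str]:
    """Build the docker exec command that stops or starts the RabbitMQ app."""
    if action not in ("stop_app", "start_app"):
        raise ValueError(f"unsupported broker action: {action}")
    return ["docker", "exec", container, "rabbitmqctl", action]


def restart_db_cmd(container: str = "postgres") -> List[str]:
    """Build the docker restart command for the database container."""
    return ["docker", "restart", container]


class BrokerControl:
    """Hypothetical stand-in for the StopBroker / StartBroker helpers."""

    def __init__(self, run: Runner = run_cli):
        self._run = run

    def stop(self) -> None:
        self._checked(broker_cmd("stop_app"))

    def start(self) -> None:
        self._checked(broker_cmd("start_app"))

    def _checked(self, cmd: List[str]) -> None:
        # Fail loudly so a broken fault injection cannot pass as a green test.
        code = self._run(cmd)
        if code != 0:
            raise RuntimeError(f"{' '.join(cmd)} exited with {code}")
```

Injecting the runner keeps the command construction unit-testable without a docker socket; only the default runner touches the host.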

Test Runner Configuration

Framework: xUnit (matches SUT toolchain, .NET 10).

Plugins / NuGet refs:

  • Microsoft.NET.Test.Sdk (xUnit discovery)
  • xunit + xunit.runner.visualstudio
  • RabbitMQ.Stream.Client (same version as src/Azaion.Annotations.csproj)
  • MessagePack (same version) — to decode stream messages for FT-P-09
  • Not referenced: Microsoft.AspNetCore.SignalR.Client. SSE is plain HTTP text/event-stream, so the tests use HttpClient directly.
  • System.IdentityModel.Tokens.Jwt — for ES256 minting
  • Npgsql — for direct DB introspection assertions (read-only, documented per test)
  • coverlet.collector — for coverage; not gated on this run but nice to have

Entry point: dotnet test --logger "trx;LogFileName=results.trx" --results-directory /results — followed by a tiny CSV-converter post-step in Dockerfile's ENTRYPOINT that produces /results/report.csv from results.trx.

Fixture Strategy

| Fixture | Scope | Purpose |
|---|---|---|
| DockerStackFixture | Collection (one per assembly) | Smoke-pings /health and waits for JWKS fetch on boot. Does NOT bring the stack up; that's docker compose up's job. |
| CleanStateFixture | Class (per test class) | TRUNCATE annotations, media, detection, annotations_queue_records RESTART IDENTITY CASCADE via direct Postgres. Run before first test, again after last. |
| TokenMinter | Singleton (within fixture lifetime) | Holds the ES256 private key (read from /keys mount) and exposes MintToken(claim, overrides?). |
| BrokerFixture | Per-test (only for resilience tests) | StopBroker(), StartBroker() via docker exec. Asserts pre/post state. |
| StreamConsumerFixture | Per-test (only for stream-consumer tests) | Creates a fresh consumer name, starts at offset next, decodes MessagePack + gzip into typed events. |
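The smoke-ping that DockerStackFixture performs amounts to polling with a deadline. A rough sketch of that loop follows, in Python for illustration only since the fixture itself is C#. The names http_ok and wait_until, the 60 s default deadline, and the 0.5 s interval are assumptions; the source does not state the SUT's port or how the JWKS fetch is detected, so neither is modeled here.

```python
import http.client
import time
import urllib.request
from typing import Callable


def http_ok(url: str, timeout_s: float = 2.0) -> bool:
    """Return True if a GET on the URL answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, DNS failure, timeout, or a non-2xx status.
        return False


def wait_until(
    probe: Callable[[], bool],
    deadline_s: float = 60.0,
    interval_s: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe repeatedly until it returns True or the deadline passes."""
    end = clock() + deadline_s
    while True:
        if probe():
            return True
        if clock() >= end:
            return False
        sleep(interval_s)


# Usage (base_url is a placeholder for the SUT address inside e2e-net):
# ready = wait_until(lambda: http_ok(base_url + "/health"))
```

The probe always runs at least once, so a stack that is already healthy returns immediately even with a zero deadline.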

Test Data Fixtures

| Data Set | Source | Format | Used By |
|---|---|---|---|
| Image / video fixtures | Bind-mount ../detections/_docs/00_problem/input_data//fixtures (read-only) | JPEG / MP4 binary | All FT-P-* and most FT-N-* |
| classes-baseline (19 detection classes) | Auto-seeded by DatabaseMigrator on annotations first boot | DB rows | FT-P-14 (catalog read), every FT-P that references class_num |
| mission-test GUID 00000000-0000-0000-0000-000000000aaa | Inlined in request payloads | GUID | All annotation-create paths |
| Synthetic JPEGs for NFT-RES-LIM-02 | Generated at test time by LargePayloadFixture (1, 10, 50, 100, 256, 512 MB) | binary | NFT-RES-LIM-02 |
| Bulk rows for NFT-PERF-LIST-01 / NFT-PERF-DATASET-01 (10k annotations, 50k detections) | dataseed/run.sh SQL block | DB rows | NFT-PERF-LIST-01, NFT-PERF-DATASET-01 |
| Per-test ES256 tokens | TokenMinter (in-process minting) | JWT | All FT-* requiring Authorization header and all NFT-SEC-* |
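To illustrate how token variants could be derived from one baseline, here is a hedged Python sketch (the real TokenMinter is C# and signs with ES256, which the standard library cannot do). It shows claim overrides plus one negative variant: an HS256 token signed with the public key bytes as the HMAC secret, which is the classic algorithm-confusion forgery. Treating "alg=HS256 forgery" as that attack is an assumption, as are the function names and the sample claim values.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWS compact serialization requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def variant_claims(base: dict, overrides: Optional[dict] = None) -> dict:
    """Copy the baseline claims and apply per-test overrides (e.g. exp, iss, aud)."""
    claims = dict(base)
    claims.update(overrides or {})
    return claims


def mint_hs256_forgery(claims: dict, public_key_pem: bytes, kid: str) -> str:
    """Build a JWT whose header claims HS256 and whose MAC key is the public key.

    A verifier that trusts the header's alg and feeds the published public key
    into HMAC would accept this token; a correct verifier must reject it.
    """
    header = {"alg": "HS256", "typ": "JWT", "kid": kid}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    mac = hmac.new(public_key_pem, signing_input.encode("ascii"), hashlib.sha256)
    return signing_input + "." + b64url(mac.digest())
```

Keeping the overrides as a plain dictionary merge means the expired, wrong-iss, and wrong-aud variants differ from the valid token in exactly one claim each, which makes a rejection easy to attribute.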

Data Isolation

  • Per-class truncation via CleanStateFixture (above).
  • Per-test mission GUID for SSE fan-out tests (FT-P-07, NFT-PERF-SSE-FANOUT-01).
  • Per-test stream consumer name for FT-P-09 and NFT-RES-06.
  • Volume reset on docker compose down -v — image / video dirs and the JWKS keypair regenerate.

Test Reporting

Format: .trx (the VSTest trx logger output requested by the dotnet test entry point), converted to flat CSV by the runner.
CSV columns: test_id, test_name, category, traces_to, execution_time_ms, result, error_message.
Output path: /results/report.csv and /results/results.trx inside the runner; mounted to ./e2e-results/ on the host.

traces_to is populated from a [Trait("traces_to", "AC-F-01, HW-02")] attribute on each test method — the converter reads the attribute and writes a comma-separated cell. This makes the resulting CSV self-describing for the traceability-matrix check at autodev Step 7 (Run Tests).
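A possible shape for the converter, sketched in Python with the standard library only. The source does not say what language the post-step uses or how it reads the trait values, so this version takes them as a pre-built mapping keyed by test name; that mapping, the function names, and the fallback of test_id to the TRX testId attribute are all assumptions.

```python
import csv
import io
import xml.etree.ElementTree as ET

# XML namespace used by TRX result files.
NS = {"t": "http://microsoft.com/schemas/VisualStudio/TeamTest/2010"}
COLUMNS = [
    "test_id", "test_name", "category", "traces_to",
    "execution_time_ms", "result", "error_message",
]


def duration_ms(text: str) -> int:
    """Convert a TRX duration such as '00:00:01.2500000' to whole milliseconds."""
    hours, minutes, seconds = text.split(":")
    total = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
    return round(total * 1000)


def trx_to_csv(trx_xml: str, traits: dict) -> str:
    """Flatten every UnitTestResult in a TRX document into one CSV row."""
    root = ET.fromstring(trx_xml)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for res in root.iterfind(".//t:UnitTestResult", NS):
        name = res.get("testName", "")
        meta = traits.get(name, {})  # hypothetical per-test trait lookup
        message = res.findtext(
            "t:Output/t:ErrorInfo/t:Message", default="", namespaces=NS
        )
        writer.writerow({
            "test_id": meta.get("test_id", res.get("testId", "")),
            "test_name": name,
            "category": meta.get("category", ""),
            "traces_to": meta.get("traces_to", ""),
            "execution_time_ms": duration_ms(res.get("duration", "00:00:00")),
            "result": res.get("outcome", ""),
            "error_message": message.strip(),
        })
    return out.getvalue()
```

The csv module quotes any cell that contains a comma, so a multi-value traces_to such as "AC-F-01, HW-02" stays in a single column.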

Acceptance Criteria

AC-1: Test environment starts
Given a clean clone of the repo on a host with Docker installed,
When ./scripts/run-tests.sh is executed (or equivalent docker compose -f e2e/docker-compose.test.yml up),
Then postgres, rabbitmq, e2e-issuer, annotations, dataseed, and e2e-runner all start in dependency order, the annotations service reaches healthy, and the test runner begins discovery.

AC-2: Mock JWKS responds with the in-stack public key
Given the test environment is up,
When wget http://e2e-issuer:8080/.well-known/jwks.json is executed from the annotations container,
Then the response is a valid JWKS with exactly one ES256 key whose kid matches the private key shared with e2e-runner.

AC-3: Token minter mints a valid token end-to-end
Given the test environment is up and TokenMinter.MintToken("ANN") is invoked,
When the resulting token is presented as Authorization: Bearer <token> on POST /annotations with a fixture payload,
Then the SUT returns HTTP 200 (token validates against the JWKS-published public key).

AC-4: Truncation fixture isolates classes
Given two test classes that each create one annotation row,
When both run within the same test session,
Then each class observes an empty annotations table at start and the SUT keeps no cross-class state.

AC-5: CSV report generated with required columns
Given a test session has completed,
When the runner exits,
Then ./e2e-results/report.csv exists on the host and contains the columns: test_id, test_name, category, traces_to, execution_time_ms, result, error_message.

AC-6: Resilience helpers work
Given the test environment is up,
When BrokerFixture.StopBroker() is invoked from a test,
Then docker exec rabbitmq rabbitmqctl stop_app succeeds and BrokerFixture.StartBroker() reverses it within 5 s; the SUT recovers (subsequent POST /annotations returns 200) within the documented backoff window.

Constraints

  • restrictions.md SW-01: .NET 10 toolchain only — test runner pins Microsoft.NET.Test.Sdk to the version compatible with .NET 10.
  • restrictions.md HW-01: ARM64-only — the e2e-runner Dockerfile uses mcr.microsoft.com/dotnet/sdk:10.0 which is multi-arch.
  • restrictions.md ENV-02: no in-image TLS — the test stack uses plain HTTP; the JWKS HTTPS gate (AZ-561) is satisfied by ASPNETCORE_ENVIRONMENT=E2ETest.
  • Every test must use the Arrange / Act / Assert pattern with // Arrange, // Act, // Assert comments (per coderule.mdc).
  • No mocks for internal services (AnnotationService, FailsafeProducer, etc.) — every test exercises the real public surface.
  • No direct writes to the SUT's tables from the runner. Read-only DB access is allowed only for blackbox-documented assertions (outbox row count, queue depth) and must be marked with a [Trait("db_access", "read-only")] attribute.

Risks & Mitigation

Risk 1: Docker socket bind exposes too much

  • Risk: Mounting /var/run/docker.sock into the runner gives it root-equivalent access to the host. Acceptable in CI runners; risky on developer laptops.
  • Mitigation: The socket bind is in docker-compose.test.yml's e2e-runner block only (not the SUT). Document that the test stack assumes a CI-like or isolated dev environment. restrictions.md does not forbid this.

Risk 2: JWKS keypair freshness

  • Risk: A stale keypair lingering in the jwt-keys volume could cause cryptic JWKS failures between test runs.
  • Mitigation: mock_issuer.py regenerates the keypair on every container start if gen_keys.sh has not been run in the current container lifetime. docker compose down -v between full runs guarantees a fresh key.

Risk 3: Bulk seed slows boot

  • Risk: 10k annotation rows + 50k detection rows in dataseed could push boot from ~5 s to ~30 s.
  • Mitigation: Bulk insert uses CROSS JOIN generate_series and a single COPY FROM STDIN so the seed completes in <10 s on local hardware. NFT-PERF tests already document a separate boot allowance; functional tests do not depend on the perf seed and run independently if the seed is split into a profile-gated step.

Self-Verification

  • Every external dependency from environment.md has a mock service defined OR an explicit "real service used" justification (real Postgres, real Rabbit, mock issuer only).
  • Docker Compose structure covers all services from environment.md.
  • Test data fixtures cover all seed data sets from test-data.md (tokens-test, mission-test, classes-baseline, clean-state, runtime-generated big payloads, bulk-perf rows).
  • Test runner configuration matches SUT tech stack (.NET 10, xUnit, RabbitMQ.Stream.Client at the same NuGet version).
  • Data isolation strategy is defined (per-class truncate, per-test mission/consumer/token).