Test Infrastructure
Task: AZ-564
Name: Test Infrastructure (Annotations e2e)
Description: Scaffold the executable blackbox test project — xUnit runner, mock JWKS issuer, ES256 key-pair fixture, Docker test stack, fixture mounts, seed script, CSV reporting. After this task lands, every other test task can declare itself a child of this scaffold.
Complexity: 5 points
Dependencies: AZ-560 (testability refactor — already landed via AZ-561 and AZ-562)
Component: Blackbox Tests
Tracker: jira
Epic: AZ-563 — Blackbox Tests — Annotations
Test Project Folder Layout
tests/
├── Azaion.Annotations.E2E/
│   ├── Azaion.Annotations.E2E.csproj
│   ├── Dockerfile
│   ├── TestBase.cs                  # base class with HttpClient, token helper
│   ├── Fixtures/
│   │   ├── DockerStackFixture.cs    # CollectionFixture: boot order check
│   │   ├── CleanStateFixture.cs     # TRUNCATE between test classes
│   │   ├── BrokerFixture.cs         # RabbitMQ stop/start helpers
│   │   └── TokenMinter.cs           # ES256 token minting via the in-stack key
│   ├── Domain/                      # one file per category (one task per file)
│   │   └── (populated by AZ-565 ... AZ-573)
│   └── README.md
└── harness/
    ├── mock_issuer.py               # ~40-line Python http.server (writes JWKS, mounts private key)
    └── gen_keys.sh                  # one-shot ES256 keypair generator (invoked by mock_issuer at boot)
e2e/
├── docker-compose.test.yml          # already produced in autodev Step 3; this task wires the new services into it
├── seed/
│   └── run.sh                       # already drafted in Step 3; this task adds bulk-insert SQL for NFT-PERF-LIST-01 and NFT-PERF-DATASET-01
└── e2e-results/                     # output of test runs (gitignored)
Layout Rationale
- Tests live under tests/Azaion.Annotations.E2E/ to mirror the .NET convention (sibling of src/).
- The mock issuer lives in tests/harness/ so it can be shared by smoke / debug stacks without polluting the test runner project.
- Fixtures are separated from test classes to make the docker-stack boot pattern reusable.
- All tests are xUnit (matches the SUT runtime; avoids a Python toolchain in CI).
Mock Services
| Mock Service | Replaces | Endpoints | Behavior |
|---|---|---|---|
| e2e-issuer (Python http.server) | Admin's JWKS issuer | GET /.well-known/jwks.json (returns a 1-key JWKS for the in-stack ES256 public key) | Static for the lifetime of the docker-compose stack. The public key regenerates per docker compose down -v cycle. No test-time mutability is needed: variant tokens (expired / wrong-iss / wrong-aud / alg=HS256 forgery) are minted with overrides by the runner against the same private key (NFT-SEC-01..10 verifies the SUT rejects them). |
There are no other mock services. All other infrastructure is real (Postgres 13, RabbitMQ 3.13 streams) — restrictions.md mandates "no mocking of internal services". External dependencies that could be mocked (admin sync worker, AI training consumer) are simply not run because the SUT does not initiate calls to them; it publishes to the stream and the stream is read by the test runner directly.
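As a rough illustration of what a static single-key JWKS issuer can look like, here is a minimal sketch using only the Python standard library. It is not the actual tests/harness/mock_issuer.py: the function names are hypothetical, and the sketch assumes the raw P-256 public point (x, y) and the kid have already been obtained from the generated keypair by some other step.

```python
import base64
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def b64url(raw: bytes) -> str:
    """Base64url without padding, the encoding JWK members use."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def build_jwks(x: bytes, y: bytes, kid: str) -> dict:
    """Wrap one P-256 public point in a single-key JWKS document."""
    key = {"kty": "EC", "crv": "P-256", "alg": "ES256", "use": "sig",
           "kid": kid, "x": b64url(x), "y": b64url(y)}
    return {"keys": [key]}


def make_handler(jwks: dict):
    """Build a request handler that serves one fixed JWKS body."""
    body = json.dumps(jwks).encode("utf-8")

    class JwksHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/.well-known/jwks.json":
                self.send_error(404)
                return
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, fmt, *args):
            pass  # keep container logs quiet

    return JwksHandler


def serve(jwks: dict, host: str = "0.0.0.0", port: int = 8080) -> ThreadingHTTPServer:
    """Start the issuer on a background thread and return the server."""
    server = ThreadingHTTPServer((host, port), make_handler(jwks))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Because the body is computed once at startup, the response stays static for the lifetime of the process, which matches the "no test-time mutability" behavior described above.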
Mock Control API
Not applicable for this suite. The mock issuer is static; behavior variation is performed by the runner minting different tokens. Broker and DB outages are induced with docker exec rabbitmq rabbitmqctl stop_app and docker restart postgres, invoked from the test runner via .NET's Process.Start against the host docker socket bound into the runner container.
Docker Test Environment
docker-compose.test.yml structure
| Service | Image / Build | Purpose | Depends On |
|---|---|---|---|
| postgres | postgres:13 | SUT's DB | none |
| rabbitmq | rabbitmq:3.13-management + streams plugin | Stream broker | none |
| e2e-issuer | python:3.12-alpine running tests/harness/mock_issuer.py | Mock JWKS issuer + key pair generator | none |
| annotations | Built from src/Dockerfile | SUT | postgres (healthy), rabbitmq (healthy), e2e-issuer (healthy) |
| dataseed | postgres:13 (one-shot psql) | Loads classes-baseline, mission row, and the bulk rows for NFT-PERF-LIST-01 / NFT-PERF-DATASET-01 | annotations (healthy) |
| e2e-runner | Built from tests/Azaion.Annotations.E2E/Dockerfile (.NET SDK 10.0) | Test runner (xUnit) | dataseed (completed_successfully) |
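The "Depends On" column implies a boot order. The sketch below derives it with a topological sort from the Python standard library (graphlib, Python 3.9+). The dictionary is transcribed from the table; the helper name boot_waves is hypothetical and is not part of the test stack.

```python
from graphlib import TopologicalSorter

# Service -> services it waits for, transcribed from the table above.
DEPENDS_ON = {
    "postgres": [],
    "rabbitmq": [],
    "e2e-issuer": [],
    "annotations": ["postgres", "rabbitmq", "e2e-issuer"],
    "dataseed": ["annotations"],
    "e2e-runner": ["dataseed"],
}


def boot_waves(depends_on):
    """Group services into waves; each wave can start once the previous one is ready."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises CycleError if the dependencies are circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(boot_waves(DEPENDS_ON), start=1):
    print(number, ", ".join(wave))
```

The three infrastructure services form the first wave, followed by annotations, dataseed, and e2e-runner one at a time, so the runner only starts after the seed has completed.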
Networks and Volumes
- Network: e2e-net (bridge, isolated). All services reach each other by service name.
- Volumes:
  - pg-data: Postgres durability across restart (resilience scenarios).
  - annotations-images, annotations-videos, annotations-deleted: SUT file dirs.
  - jwt-keys: ES256 keypair shared between e2e-issuer (writes public + serves JWKS) and e2e-runner (reads private key for token minting).
- Bind mount (read-only): ../detections/_docs/00_problem/input_data → /fixtures in both the SUT and the runner.
Test runner host-docker access
The runner needs to execute docker exec rabbitmq rabbitmqctl stop_app (NFT-RES-01..03) and docker restart postgres (NFT-RES-02..03). Solution: bind-mount the host docker socket into the runner (/var/run/docker.sock:/var/run/docker.sock) under a RESILIENCE_DOCKER_SOCKET env var; the BrokerFixture / DbFixture use it. This is gated to the test stack — the production SUT never mounts the docker socket.
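The real fixtures are C# and shell out via Process.Start; purely to illustrate the command construction, here is a small Python sketch. The class name BrokerControl and the injectable run callable are hypothetical, introduced so the logic can be exercised without a docker daemon. rabbitmqctl stop_app and start_app are the standard RabbitMQ CLI commands.

```python
import subprocess
from typing import Callable, List, Sequence


def docker_run(argv: Sequence[str]) -> int:
    """Run a docker CLI command on the host socket and return its exit code."""
    return subprocess.run(list(argv), check=False).returncode


class BrokerControl:
    """Stops and starts the RabbitMQ application inside its container."""

    def __init__(self, run: Callable[[Sequence[str]], int] = docker_run,
                 container: str = "rabbitmq"):
        self._run = run
        self._container = container

    def _ctl(self, action: str) -> None:
        argv: List[str] = ["docker", "exec", self._container, "rabbitmqctl", action]
        code = self._run(argv)
        if code != 0:
            raise RuntimeError("%s failed with exit code %d" % (" ".join(argv), code))

    def stop_broker(self) -> None:
        self._ctl("stop_app")

    def start_broker(self) -> None:
        self._ctl("start_app")
```

Injecting the runner keeps the docker dependency at the edge, so the command strings can be checked in isolation while the default still shells out to the docker CLI.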
Test Runner Configuration
Framework: xUnit (matches the SUT toolchain, .NET 10).
Plugins / NuGet refs:
- Microsoft.NET.Test.Sdk (xUnit discovery)
- xunit + xunit.runner.visualstudio
- RabbitMQ.Stream.Client (same version as src/Azaion.Annotations.csproj)
- MessagePack (same version), to decode stream messages for FT-P-09
- Microsoft.AspNetCore.SignalR.Client: not referenced; SSE is plain HTTP text/event-stream, so the runner uses HttpClient directly
- System.IdentityModel.Tokens.Jwt, for ES256 minting
- Npgsql, for direct DB introspection assertions (read-only, documented per test)
- coverlet.collector, for coverage; not gated on this run but nice to have
Entry point: dotnet test --logger "trx;LogFileName=results.trx" --results-directory /results — followed by a tiny CSV-converter post-step in Dockerfile's ENTRYPOINT that produces /results/report.csv from results.trx.
Fixture Strategy
| Fixture | Scope | Purpose |
|---|---|---|
| DockerStackFixture | Collection (one per assembly) | Smoke-pings /health and waits for JWKS fetch on boot. Does NOT bring the stack up; that is docker compose up's job. |
| CleanStateFixture | Class (per test class) | TRUNCATE annotations, media, detection, annotations_queue_records RESTART IDENTITY CASCADE via direct Postgres. Run before first test, again after last. |
| TokenMinter | Singleton (within fixture lifetime) | Holds the ES256 private key (read from /keys mount) and exposes MintToken(claim, overrides?). |
| BrokerFixture | Per-test (only for resilience tests) | StopBroker(), StartBroker() via docker exec. Asserts pre/post state. |
| StreamConsumerFixture | Per-test (only for stream-consumer tests) | Creates a fresh consumer name, starts at offset next, decodes MessagePack + gzip into typed events. |
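To make the MintToken(claim, overrides?) idea concrete, the sketch below shows how default claims can be overridden to produce the negative-test variants, and what an alg=HS256 forgery looks like on the wire. It is an illustration only: the real TokenMinter is C# and signs with ES256, which the Python standard library cannot do, so only the HS256 forgery is actually signed here. The claim names, issuer and audience values, and the 300-second lifetime are all assumptions.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

# Hypothetical defaults; the real issuer and audience come from the SUT config.
DEFAULT_ISS = "http://e2e-issuer:8080"
DEFAULT_AUD = "annotations"


def b64url(raw: bytes) -> str:
    """Base64url without padding, as used by JWT segments."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def build_claims(role: str, now: int, overrides: Optional[dict] = None) -> dict:
    """Start from a valid claim set, then apply per-test overrides."""
    claims = {"iss": DEFAULT_ISS, "aud": DEFAULT_AUD, "iat": now,
              "exp": now + 300, "role": role}
    claims.update(overrides or {})
    return claims


def forge_hs256(claims: dict, secret: bytes) -> str:
    """Produce a token whose header claims HS256 instead of ES256."""
    header = {"alg": "HS256", "typ": "JWT"}
    segments = [b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
                for part in (header, claims)]
    signing_input = ".".join(segments)
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url(signature)


expired = build_claims("ANN", now=1_700_000_000, overrides={"exp": 1_699_999_000})
wrong_aud = build_claims("ANN", now=1_700_000_000, overrides={"aud": "someone-else"})
forged = forge_hs256(build_claims("ANN", now=1_700_000_000), secret=b"not-the-real-key")
```

Keeping the overrides as a plain dictionary merge means each NFT-SEC variant differs from the valid token by exactly one field, which makes a rejection easy to attribute.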
Test Data Fixtures
| Data Set | Source | Format | Used By |
|---|---|---|---|
| Image / video fixtures | Bind-mount ../detections/_docs/00_problem/input_data/ → /fixtures (read-only) | JPEG / MP4 binary | All FT-P-* and most FT-N-* |
| classes-baseline (19 detection classes) | Auto-seeded by DatabaseMigrator on annotations first boot | DB rows | FT-P-14 (catalog read), every FT-P that references class_num |
| mission-test GUID 00000000-0000-0000-0000-000000000aaa | Inlined in request payloads | GUID | All annotation-create paths |
| Synthetic JPEGs for NFT-RES-LIM-02 | Generated at test time by LargePayloadFixture (1, 10, 50, 100, 256, 512 MB) | binary | NFT-RES-LIM-02 |
| Bulk rows for NFT-PERF-LIST-01 / NFT-PERF-DATASET-01 (10k annotations, 50k detections) | dataseed/run.sh SQL block | DB rows | NFT-PERF-LIST-01, NFT-PERF-DATASET-01 |
| Per-test ES256 tokens | TokenMinter (in-process minting) | JWT | All FT-* requiring Authorization header and all NFT-SEC-* |
Data Isolation
- Per-class truncation via CleanStateFixture (above).
- Per-test mission GUID for SSE fan-out tests (FT-P-07, NFT-PERF-SSE-FANOUT-01).
- Per-test stream consumer name for FT-P-09 and NFT-RES-06.
- Volume reset on docker compose down -v: image / video dirs and the JWKS keypair regenerate.
Test Reporting
Format: .trx (xUnit native), converted to flat CSV by the runner.
CSV columns: test_id, test_name, category, traces_to, execution_time_ms, result, error_message.
Output path: /results/report.csv and /results/results.trx inside the runner; mounted to ./e2e-results/ on the host.
traces_to is populated from a [Trait("traces_to", "AC-F-01, HW-02")] attribute on each test method — the converter reads the attribute and writes a comma-separated cell. This makes the resulting CSV self-describing for the traceability-matrix check at autodev Step 7 (Run Tests).
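A minimal sketch of the TRX-to-CSV step follows, written in Python for illustration; the source does not say what language the converter uses. It reads UnitTestResult elements from the TRX XML and emits the seven columns listed above. How the converter obtains category and traces_to is not specified beyond "reads the attribute", so this sketch takes them from a hypothetical traits lookup keyed by test name, and falls back to the TRX testId when no test_id is supplied.

```python
import csv
import io
import xml.etree.ElementTree as ET

TRX_NS = {"t": "http://microsoft.com/schemas/VisualStudio/TeamTest/2010"}
COLUMNS = ["test_id", "test_name", "category", "traces_to",
           "execution_time_ms", "result", "error_message"]


def duration_ms(text: str) -> int:
    """Convert a TRX duration such as 00:00:01.2500000 to whole milliseconds."""
    hours, minutes, seconds = text.split(":")
    return round((int(hours) * 3600 + int(minutes) * 60 + float(seconds)) * 1000)


def trx_to_csv(trx_xml: str, traits: dict) -> str:
    """Flatten TRX results into CSV text; traits maps test name -> metadata."""
    root = ET.fromstring(trx_xml)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for result in root.iterfind(".//t:UnitTestResult", TRX_NS):
        name = result.get("testName", "")
        meta = traits.get(name, {})
        message = result.findtext("t:Output/t:ErrorInfo/t:Message",
                                  default="", namespaces=TRX_NS)
        writer.writerow({
            "test_id": meta.get("test_id", result.get("testId", "")),
            "test_name": name,
            "category": meta.get("category", ""),
            "traces_to": meta.get("traces_to", ""),
            "execution_time_ms": duration_ms(result.get("duration", "00:00:00")),
            "result": result.get("outcome", ""),
            "error_message": message.strip(),
        })
    return out.getvalue()
```

Using csv.DictWriter means a traces_to value such as "AC-F-01, HW-02" is quoted automatically, so the embedded comma does not split the cell.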
Acceptance Criteria
AC-1: Test environment starts
Given a clean clone of the repo on a host with Docker installed,
When ./scripts/run-tests.sh is executed (or equivalent docker compose -f e2e/docker-compose.test.yml up),
Then postgres, rabbitmq, e2e-issuer, annotations, dataseed, and e2e-runner all start in dependency order, the annotations service reaches healthy, and the test runner begins discovery.
AC-2: Mock JWKS responds with the in-stack public key
Given the test environment is up,
When wget http://e2e-issuer:8080/.well-known/jwks.json is executed from the annotations container,
Then the response is a valid JWKS with exactly one ES256 key whose kid matches the private key shared with e2e-runner.
AC-3: Token minter mints a valid token end-to-end
Given the test environment is up and TokenMinter.MintToken("ANN") is invoked,
When the resulting token is presented as Authorization: Bearer <token> on POST /annotations with a fixture payload,
Then the SUT returns HTTP 200 (token validates against the JWKS-published public key).
AC-4: Truncation fixture isolates classes
Given two test classes that each create one annotation row,
When both run within the same test session,
Then each class observes an empty annotations table at start and the SUT keeps no cross-class state.
AC-5: CSV report generated with required columns
Given a test session has completed,
When the runner exits,
Then ./e2e-results/report.csv exists on the host and contains the columns: test_id, test_name, category, traces_to, execution_time_ms, result, error_message.
AC-6: Resilience helpers work
Given the test environment is up,
When BrokerFixture.StopBroker() is invoked from a test,
Then docker exec rabbitmq rabbitmqctl stop_app succeeds and BrokerFixture.StartBroker() reverses it within 5 s; the SUT recovers (subsequent POST /annotations returns 200) within the documented backoff window.
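Checks of the form "recovers within N seconds" are usually written as a poll against a deadline. A minimal sketch of such a helper is shown below, in Python for illustration; the name wait_until and its parameters are hypothetical, and the real fixtures would implement the equivalent in C#.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout_s: float,
               interval_s: float = 0.25,
               clock: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll predicate until it returns True or timeout_s elapses.

    The predicate is always evaluated at least once, and one final time
    is not attempted after the deadline has passed.
    """
    deadline = clock() + timeout_s
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

For AC-6, the predicate could wrap a broker status probe with timeout_s=5, and a second call with the documented backoff window could wrap the POST /annotations check. A monotonic clock is used so that wall-clock adjustments on the host do not shorten or extend the wait.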
Constraints
- restrictions.md SW-01: .NET 10 toolchain only. The test runner pins Microsoft.NET.Test.Sdk to the version compatible with .NET 10.
- restrictions.md HW-01: ARM64-only. The e2e-runner Dockerfile uses mcr.microsoft.com/dotnet/sdk:10.0, which is multi-arch.
- restrictions.md ENV-02: no in-image TLS. The test stack uses plain HTTP; the JWKS HTTPS gate (AZ-561) is satisfied by ASPNETCORE_ENVIRONMENT=E2ETest.
- Every test must use the Arrange / Act / Assert pattern with // Arrange, // Act, // Assert comments (per coderule.mdc).
- No mocks for internal services (AnnotationService, FailsafeProducer, etc.): every test exercises the real public surface.
- No direct writes to the SUT's tables from the runner. Read-only DB access is allowed only for blackbox-documented assertions (outbox row count, queue depth) and must be marked with a [Trait("db_access", "read-only")] attribute.
Risks & Mitigation
Risk 1: Docker socket bind exposes too much
- Risk: Mounting /var/run/docker.sock into the runner gives it root-equivalent access to the host. Acceptable in CI runners; risky on developer laptops.
- Mitigation: The socket bind is in docker-compose.test.yml's e2e-runner block only (not the SUT). Document that the test stack assumes a CI-like or isolated dev environment. restrictions.md does not forbid this.
Risk 2: JWKS keypair freshness
- Risk: A stale keypair lingering in the jwt-keys volume could cause cryptic JWKS failures between test runs.
- Mitigation: mock_issuer.py regenerates the keypair on every container start if gen_keys.sh has not been run in the current container lifetime. docker compose down -v between full runs guarantees a fresh key.
Risk 3: Bulk seed slows boot
- Risk: 10k annotation rows + 50k detection rows in dataseed could push boot from ~5 s to ~30 s.
- Mitigation: Bulk insert uses CROSS JOIN generate_series and a single COPY FROM STDIN so the seed completes in <10 s on local hardware. NFT-PERF tests already document a separate boot allowance; functional tests do not depend on the perf seed and run independently if the seed is split into a profile-gated step.
Self-Verification
- Every external dependency from environment.md has a mock service defined OR an explicit "real service used" justification (real Postgres, real Rabbit, mock issuer only).
- Docker Compose structure covers all services from environment.md.
- Test data fixtures cover all seed data sets from test-data.md (tokens-test, mission-test, classes-baseline, clean-state, runtime-generated big payloads, bulk-perf rows).
- Test runner configuration matches SUT tech stack (.NET 10, xUnit, RabbitMQ.Stream.Client at the same NuGet version).
- Data isolation strategy is defined (per-class truncate, per-test mission/consumer/token).