Mirror of https://github.com/azaion/annotations.git, synced 2026-06-22 10:31:06 +00:00
[AZ-563] Decompose blackbox tests into AZ-564..574 task specs
Step 5 of autodev existing-code flow. Epic AZ-563 plus 11 atomic tasks covering all 67 test scenarios from _docs/02_document/tests/* exactly once:

- AZ-564 test infrastructure (xUnit + Docker + mock JWKS + dataseed)
- AZ-565..568 functional positive (FT-P-01..22)
- AZ-569..570 functional negative (FT-N-01..16)
- AZ-571 security (NFT-SEC-01..10)
- AZ-572 resilience (NFT-RES-01..06)
- AZ-573 resource limits (NFT-RES-LIM-01..06)
- AZ-574 performance (NFT-PERF-*)

_dependencies_table.md records the cross-check vs traceability matrix (22 + 16 + 29 = 67 scenarios, no overlaps, no gaps; deferred items remain deferred per matrix).

All task headers carry their Jira IDs (tracker: jira). Autodev state advanced to Step 6 (Implement Tests).

Co-authored-by: Cursor <cursoragent@cursor.com>
@@ -0,0 +1,196 @@
# Test Infrastructure

**Task**: AZ-564
**Name**: Test Infrastructure (Annotations e2e)
**Description**: Scaffold the executable blackbox test project: xUnit runner, mock JWKS issuer, ES256 key-pair fixture, Docker test stack, fixture mounts, seed script, CSV reporting. After this task lands, every other test task can declare itself a child of this scaffold.
**Complexity**: 5 points
**Dependencies**: AZ-560 (testability refactor, already landed via AZ-561 and AZ-562)
**Component**: Blackbox Tests
**Tracker**: jira
**Epic**: AZ-563 (`Blackbox Tests — Annotations`)

## Test Project Folder Layout

```
tests/
├── Azaion.Annotations.E2E/
│   ├── Azaion.Annotations.E2E.csproj
│   ├── Dockerfile
│   ├── TestBase.cs                  # base class with HttpClient, token helper
│   ├── Fixtures/
│   │   ├── DockerStackFixture.cs    # CollectionFixture: boot order check
│   │   ├── CleanStateFixture.cs     # TRUNCATE between test classes
│   │   ├── BrokerFixture.cs         # RabbitMQ stop/start helpers
│   │   └── TokenMinter.cs           # ES256 token minting via the in-stack key
│   ├── Domain/                      # one file per category (one task per file)
│   │   └── (populated by AZ-565 ... AZ-573)
│   └── README.md
└── harness/
    ├── mock_issuer.py               # ~40-line Python http.server (writes JWKS, mounts private key)
    └── gen_keys.sh                  # one-shot ES256 keypair generator (invoked by mock_issuer at boot)

e2e/
├── docker-compose.test.yml          # already produced in autodev Step 3; this task wires the new services into it
├── seed/
│   └── run.sh                       # already drafted in Step 3; this task adds bulk-insert SQL for NFT-PERF-LIST-01 and NFT-PERF-DATASET-01
└── e2e-results/                     # output of test runs (gitignored)
```

### Layout Rationale

- Tests live under `tests/Azaion.Annotations.E2E/` to mirror the .NET convention (sibling of `src/`).
- The mock issuer lives in `tests/harness/` so it can be shared by smoke / debug stacks without polluting the test runner project.
- Fixtures are separated from test classes to make the docker-stack boot pattern reusable.
- All tests are xUnit (matches the SUT runtime; avoids a Python toolchain in CI).

## Mock Services

| Mock Service | Replaces | Endpoints | Behavior |
|--------------|----------|-----------|----------|
| `e2e-issuer` (Python `http.server`) | Admin's JWKS issuer | `GET /.well-known/jwks.json` (returns a 1-key JWKS for the in-stack ES256 public key) | Static for the lifetime of the docker-compose stack. Public key regenerates per `docker compose down -v` cycle. No test-time mutability needed: variant tokens (expired / wrong-iss / wrong-aud / `alg=HS256` forgery) are minted with overrides by the runner against the same private key (NFT-SEC-01..10 verify that the SUT rejects them). |

There are no other mock services. All other infrastructure is real (Postgres 13, RabbitMQ 3.13 streams); `restrictions.md` mandates "no mocking of internal services". External dependencies that *could* be mocked (admin sync worker, AI training consumer) are simply not run, because the SUT does not initiate calls to them: it publishes to the stream, and the test runner reads the stream directly.

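
The JWKS endpoint described in the table can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the actual `mock_issuer.py`: the helper names (`build_jwks`, `make_handler`) are hypothetical, the key is passed in as raw P-256 coordinates rather than loaded from the `jwt-keys` volume, and deriving `kid` as an RFC 7638 thumbprint is an assumption (the spec above only requires that the `kid` match the key shared with the runner).

```python
import base64
import hashlib
import json
from http.server import BaseHTTPRequestHandler


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWK members require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def build_jwks(x: bytes, y: bytes) -> dict:
    """Build a one-key JWKS from the raw coordinates of a P-256 public key."""
    if len(x) != 32 or len(y) != 32:
        raise ValueError("P-256 coordinates must be 32 bytes each")
    jwk = {"crv": "P-256", "kty": "EC", "x": b64url(x), "y": b64url(y)}
    # Assumption: kid is the RFC 7638 thumbprint, i.e. SHA-256 over the
    # required members in lexicographic order with no whitespace.
    canonical = json.dumps(jwk, sort_keys=True, separators=(",", ":"))
    kid = b64url(hashlib.sha256(canonical.encode("utf-8")).digest())
    return {"keys": [{**jwk, "kid": kid, "alg": "ES256", "use": "sig"}]}


def make_handler(jwks: dict):
    """Return a request handler class that serves the given JWKS statically."""
    body = json.dumps(jwks).encode("utf-8")

    class JwksHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/.well-known/jwks.json":
                self.send_error(404)
                return
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # keep container logs quiet

    return JwksHandler
```

Under these assumptions the issuer would be started with something like `HTTPServer(("0.0.0.0", 8080), make_handler(jwks)).serve_forever()`. Because the body is computed once, the response stays static for the lifetime of the process, which matches the "no test-time mutability" behavior in the table.
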
### Mock Control API

Not applicable for this suite. The mock issuer is static; behavior variation is performed by the runner minting different tokens. Broker / DB resilience is exercised by running `docker exec rabbitmq rabbitmqctl stop_app` and `docker restart postgres` from the test runner, driven via .NET's `Process.Start` against the host docker socket that is bind-mounted into the runner container.

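
To make "behavior variation by minting different tokens" concrete, here is a small Python sketch of two variants: an expired token and an `alg=HS256` forgery. It is illustrative only. The real `TokenMinter` is a C# fixture that signs with ES256, and everything named here (`claims_with`, `forge_hs256`, the issuer and audience values, the `permissions` claim) is a placeholder. The forgery shown is the common algorithm-confusion form, where the public key bytes are used as the HMAC secret; whether NFT-SEC uses exactly this form is an assumption.

```python
import base64
import hashlib
import hmac
import json
import time


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWS segments require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def claims_with(overrides=None, now=None) -> dict:
    """Baseline claims plus per-test overrides (names and values are placeholders)."""
    now = int(time.time()) if now is None else now
    claims = {
        "iss": "http://e2e-issuer:8080",  # assumed issuer value
        "aud": "annotations",             # assumed audience value
        "permissions": ["ANN"],           # assumed claim name
        "iat": now,
        "exp": now + 300,
    }
    claims.update(overrides or {})
    return claims


def forge_hs256(claims: dict, public_key_pem: bytes) -> str:
    """Build an HS256 token whose HMAC secret is the issuer's public key."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(
        public_key_pem, signing_input.encode("ascii"), hashlib.sha256
    ).digest()
    return f"{signing_input}.{b64url(signature)}"
```

For example, `claims_with({"exp": 0})` yields an already expired claim set and `claims_with({"aud": "someone-else"})` a wrong-audience one. A verifier that pins the algorithm to ES256 should reject the output of `forge_hs256` regardless of its claims.
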
## Docker Test Environment

### docker-compose.test.yml structure

| Service | Image / Build | Purpose | Depends On |
|---------|---------------|---------|------------|
| `postgres` | `postgres:13` | SUT's DB | none |
| `rabbitmq` | `rabbitmq:3.13-management` + streams plugin | Stream broker | none |
| `e2e-issuer` | `python:3.12-alpine` running `tests/harness/mock_issuer.py` | Mock JWKS issuer + key pair generator | none |
| `annotations` | Built from `src/Dockerfile` | SUT | `postgres` (healthy), `rabbitmq` (healthy), `e2e-issuer` (healthy) |
| `dataseed` | `postgres:13` (one-shot psql) | Loads `classes-baseline`, mission row, and the bulk rows for NFT-PERF-LIST-01 / NFT-PERF-DATASET-01 | `annotations` (healthy) |
| `e2e-runner` | Built from `tests/Azaion.Annotations.E2E/Dockerfile` (.NET SDK 10.0) | Test runner (xUnit) | `dataseed` (completed_successfully) |

### Networks and Volumes

- **Network**: `e2e-net` (bridge, isolated). All services reach each other by service name.
- **Volumes**:
  - `pg-data`: Postgres durability across restart (resilience scenarios).
  - `annotations-images`, `annotations-videos`, `annotations-deleted`: SUT file dirs.
  - `jwt-keys`: ES256 keypair shared between `e2e-issuer` (writes public + serves JWKS) and `e2e-runner` (reads private key for token minting).
- **Bind mount (read-only)**: `../detections/_docs/00_problem/input_data` → `/fixtures` in both the SUT and the runner.

### Test runner host-docker access

The runner needs to execute `docker exec rabbitmq rabbitmqctl stop_app` (NFT-RES-01..03) and `docker restart postgres` (NFT-RES-02..03). Solution: bind-mount the host docker socket into the runner (`/var/run/docker.sock:/var/run/docker.sock`) and expose its path through a `RESILIENCE_DOCKER_SOCKET` env var, which `BrokerFixture` / `DbFixture` read. This is gated to the test stack: the production SUT never mounts the docker socket.

## Test Runner Configuration

**Framework**: xUnit (matches SUT toolchain, .NET 10).
**Plugins / NuGet refs**:
- `Microsoft.NET.Test.Sdk` (test host for `dotnet test`)
- `xunit` + `xunit.runner.visualstudio` (framework and test discovery adapter)
- `RabbitMQ.Stream.Client` (same version as `src/Azaion.Annotations.csproj`)
- `MessagePack` (same version), to decode stream messages for FT-P-09
- `System.IdentityModel.Tokens.Jwt`, for ES256 minting
- `Npgsql`, for direct DB introspection assertions (read-only, documented per test)
- `coverlet.collector`, for coverage; not gated on this run but nice to have
- Not referenced: `Microsoft.AspNetCore.SignalR.Client`. SSE is plain HTTP `text/event-stream`, so the tests use `HttpClient` directly.

**Entry point**: `dotnet test --logger "trx;LogFileName=results.trx" --results-directory /results`, followed by a tiny CSV-converter post-step in the `Dockerfile`'s ENTRYPOINT that produces `/results/report.csv` from `results.trx`.

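
Since the SSE tests read `text/event-stream` over plain HTTP, the only client-side logic they need is frame parsing. The sketch below shows that parsing in Python as a language-neutral illustration (the real tests would do the equivalent in C# on top of `HttpClient`). `SseEvent` and `parse_sse` are hypothetical names, and the event name used in the usage note is a placeholder, not the SUT's actual event type.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class SseEvent:
    event: str
    data: str
    id: Optional[str]


def parse_sse(lines: Iterable[str]) -> Iterator[SseEvent]:
    """Turn the lines of a text/event-stream body into dispatched events."""
    event, data, last_id = "message", [], None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event, if it has any data.
            if data:
                yield SseEvent(event, "\n".join(data), last_id)
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)
        elif field == "id":
            last_id = value
```

Feeding it `["event: annotation-created", "data: {\"n\": 1}", ""]` yields one event whose `event` is `annotation-created` and whose `data` is the JSON text. Multiple `data:` lines within one frame are joined with newlines, and frames without data are dropped.
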
### Fixture Strategy

| Fixture | Scope | Purpose |
|---------|-------|---------|
| `DockerStackFixture` | Collection (one per assembly) | Smoke-pings `/health` and waits for JWKS fetch on boot. Does NOT bring the stack up; that is `docker compose up`'s job. |
| `CleanStateFixture` | Class (per test class) | `TRUNCATE annotations, media, detection, annotations_queue_records RESTART IDENTITY CASCADE` via direct Postgres. Run before first test, again after last. |
| `TokenMinter` | Singleton (within fixture lifetime) | Holds the ES256 private key (read from `/keys` mount) and exposes `MintToken(claim, overrides?)`. |
| `BrokerFixture` | Per-test (only for resilience tests) | `StopBroker()`, `StartBroker()` via `docker exec`. Asserts pre/post state. |
| `StreamConsumerFixture` | Per-test (only for stream-consumer tests) | Creates a fresh consumer name, starts at offset `next`, decodes MessagePack + gzip into typed events. |

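
The "smoke-ping and wait" behavior of `DockerStackFixture` comes down to polling a probe until it succeeds or a deadline passes. The following Python sketch shows that loop in a language-neutral way; the actual fixture is C#. The names `wait_until` and `http_ok`, the default interval, and the injected `clock` / `sleep` parameters are all assumptions made for illustration and testability.

```python
import time
import urllib.request
from typing import Callable


def wait_until(
    probe: Callable[[], bool],
    timeout_s: float,
    interval_s: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll probe() until it returns True or timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        try:
            if probe():
                return True
        except OSError:
            pass  # connection refused / reset while the service is still booting
        if clock() >= deadline:
            return False
        sleep(interval_s)


def http_ok(url: str) -> Callable[[], bool]:
    """Return a probe that is True when a GET on url answers with HTTP 200."""
    def probe() -> bool:
        # urlopen raises URLError / HTTPError (both OSError subclasses) on failure.
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    return probe
```

A fixture built this way might call `wait_until(http_ok("http://annotations:8080/health"), timeout_s=30)` and fail the whole collection if it returns `False`. The host, port, and timeout in that call are placeholders.
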
## Test Data Fixtures

| Data Set | Source | Format | Used By |
|----------|--------|--------|---------|
| Image / video fixtures | Bind-mount `../detections/_docs/00_problem/input_data/` → `/fixtures` (read-only) | JPEG / MP4 binary | All FT-P-* and most FT-N-* |
| `classes-baseline` (19 detection classes) | Auto-seeded by `DatabaseMigrator` on `annotations` first boot | DB rows | FT-P-14 (catalog read), every FT-P that references `class_num` |
| `mission-test` GUID `00000000-0000-0000-0000-000000000aaa` | Inlined in request payloads | GUID | All annotation-create paths |
| Synthetic JPEGs for NFT-RES-LIM-02 | Generated at test time by `LargePayloadFixture` (1, 10, 50, 100, 256, 512 MB) | Binary | NFT-RES-LIM-02 |
| Bulk rows for NFT-PERF-LIST-01 / NFT-PERF-DATASET-01 (10k annotations, 50k detections) | SQL block in `e2e/seed/run.sh` (run by the `dataseed` service) | DB rows | NFT-PERF-LIST-01, NFT-PERF-DATASET-01 |
| Per-test ES256 tokens | `TokenMinter` (in-process minting) | JWT | All FT-* requiring `Authorization` header and all NFT-SEC-* |

### Data Isolation

- **Per-class truncation** via `CleanStateFixture` (above).
- **Per-test mission GUID** for SSE fan-out tests (FT-P-07, NFT-PERF-SSE-FANOUT-01).
- **Per-test stream consumer name** for FT-P-09 and NFT-RES-06.
- **Volume reset on `docker compose down -v`**: image / video dirs and the JWKS keypair regenerate.

## Test Reporting

**Format**: `.trx` (output of the VSTest `trx` logger), converted to flat CSV by the runner.
**CSV columns**: `test_id`, `test_name`, `category`, `traces_to`, `execution_time_ms`, `result`, `error_message`.
**Output path**: `/results/report.csv` and `/results/results.trx` inside the runner; mounted to `./e2e-results/` on the host.

`traces_to` is populated from a `[Trait("traces_to", "AC-F-01, HW-02")]` attribute on each test method: the converter reads the attribute and writes a comma-separated cell. This makes the resulting CSV self-describing for the traceability-matrix check at autodev Step 7 (Run Tests).

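
As a rough illustration of the TRX-to-CSV step, the Python sketch below flattens the `UnitTestResult` entries of a TRX file into the seven columns listed above. It is not the runner's actual converter, whose language is not specified here. `trx_to_csv` and `duration_ms` are hypothetical names, and because how the converter obtains the `[Trait]` values is not detailed above, the sketch takes them from a plain lookup keyed by test name.

```python
import csv
import io
import xml.etree.ElementTree as ET

TRX_NS = {"t": "http://microsoft.com/schemas/VisualStudio/TeamTest/2010"}
COLUMNS = ["test_id", "test_name", "category", "traces_to",
           "execution_time_ms", "result", "error_message"]


def duration_ms(text: str) -> int:
    """Convert a TRX duration such as 00:00:01.2500000 to whole milliseconds."""
    hours, minutes, seconds = text.split(":")
    return round((int(hours) * 3600 + int(minutes) * 60 + float(seconds)) * 1000)


def trx_to_csv(trx_xml: str, traits: dict) -> str:
    """Flatten TRX results into CSV text.

    traits maps a test name to a dict with test_id, category and traces_to
    (assumed to be collected elsewhere from the [Trait] attributes).
    """
    root = ET.fromstring(trx_xml)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for res in root.iterfind("t:Results/t:UnitTestResult", TRX_NS):
        name = res.get("testName", "")
        meta = traits.get(name, {})
        message = res.findtext(
            "t:Output/t:ErrorInfo/t:Message", default="", namespaces=TRX_NS
        )
        writer.writerow({
            "test_id": meta.get("test_id", ""),
            "test_name": name,
            "category": meta.get("category", ""),
            "traces_to": meta.get("traces_to", ""),
            "execution_time_ms": duration_ms(res.get("duration", "00:00:00")),
            "result": res.get("outcome", ""),
            "error_message": message.strip(),
        })
    return out.getvalue()
```

The `csv` module quotes cells that contain commas, so a `traces_to` value such as `AC-F-01, HW-02` stays in a single column when the file is read back.
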
## Acceptance Criteria

**AC-1: Test environment starts**
Given a clean clone of the repo on a host with Docker installed,
When `./scripts/run-tests.sh` is executed (or equivalent `docker compose -f e2e/docker-compose.test.yml up`),
Then `postgres`, `rabbitmq`, `e2e-issuer`, `annotations`, `dataseed`, and `e2e-runner` all start in dependency order, the `annotations` service reaches `healthy`, and the test runner begins discovery.

**AC-2: Mock JWKS responds with the in-stack public key**
Given the test environment is up,
When `wget http://e2e-issuer:8080/.well-known/jwks.json` is executed from the `annotations` container,
Then the response is a valid JWKS with exactly one ES256 key whose `kid` matches the private key shared with `e2e-runner`.

**AC-3: Token minter mints a valid token end-to-end**
Given the test environment is up and `TokenMinter.MintToken("ANN")` is invoked,
When the resulting token is presented as `Authorization: Bearer <token>` on `POST /annotations` with a fixture payload,
Then the SUT returns HTTP 200 (token validates against the JWKS-published public key).

**AC-4: Truncation fixture isolates classes**
Given two test classes that each create one annotation row,
When both run within the same test session,
Then each class observes an empty `annotations` table at start and the SUT keeps no cross-class state.

**AC-5: CSV report generated with required columns**
Given a test session has completed,
When the runner exits,
Then `./e2e-results/report.csv` exists on the host and contains the columns: `test_id`, `test_name`, `category`, `traces_to`, `execution_time_ms`, `result`, `error_message`.

**AC-6: Resilience helpers work**
Given the test environment is up,
When `BrokerFixture.StopBroker()` is invoked from a test,
Then `docker exec rabbitmq rabbitmqctl stop_app` succeeds and `BrokerFixture.StartBroker()` reverses it within 5 s; the SUT recovers (subsequent `POST /annotations` returns 200) within the documented backoff window.

## Constraints

- `restrictions.md` SW-01: .NET 10 toolchain only. The test runner pins `Microsoft.NET.Test.Sdk` to the version compatible with .NET 10.
- `restrictions.md` HW-01: ARM64-only. The e2e-runner Dockerfile uses `mcr.microsoft.com/dotnet/sdk:10.0`, which is multi-arch.
- `restrictions.md` ENV-02: no in-image TLS. The test stack uses plain HTTP; the JWKS HTTPS gate (AZ-561) is satisfied by `ASPNETCORE_ENVIRONMENT=E2ETest`.
- Every test must use the Arrange / Act / Assert pattern with `// Arrange`, `// Act`, `// Assert` comments (per `coderule.mdc`).
- No mocks for internal services (`AnnotationService`, `FailsafeProducer`, etc.): every test exercises the real public surface.
- No direct writes to the SUT's tables from the runner. Read-only DB access is allowed only for blackbox-documented assertions (outbox row count, queue depth) and must be marked with a `[Trait("db_access", "read-only")]` attribute.

## Risks & Mitigation

**Risk 1: Docker socket bind exposes too much**
- *Risk*: Mounting `/var/run/docker.sock` into the runner gives it root-equivalent access to the host. Acceptable in CI runners; risky on developer laptops.
- *Mitigation*: The socket bind is in `docker-compose.test.yml`'s `e2e-runner` block only (not the SUT). Document that the test stack assumes a CI-like or isolated dev environment. `restrictions.md` does not forbid this.

**Risk 2: JWKS keypair freshness**
- *Risk*: A stale keypair lingering in the `jwt-keys` volume could cause cryptic JWKS failures between test runs.
- *Mitigation*: On container start, `mock_issuer.py` runs `gen_keys.sh` unless it has already run in the current container lifetime, so the keypair is regenerated once per issuer container start. `docker compose down -v` between full runs guarantees a fresh key.

**Risk 3: Bulk seed slows boot**
- *Risk*: 10k annotation rows + 50k detection rows in `dataseed` could push boot from ~5 s to ~30 s.
- *Mitigation*: Bulk insert uses `CROSS JOIN generate_series` and a single `COPY FROM STDIN` so the seed completes in <10 s on local hardware. NFT-PERF tests already document a separate boot allowance; functional tests do not depend on the perf seed and run independently if the seed is split into a profile-gated step.

## Self-Verification

- [x] Every external dependency from `environment.md` has a mock service defined OR an explicit "real service used" justification (real Postgres, real Rabbit, mock issuer only).
- [x] Docker Compose structure covers all services from `environment.md`.
- [x] Test data fixtures cover all seed data sets from `test-data.md` (tokens-test, mission-test, classes-baseline, clean-state, runtime-generated big payloads, bulk-perf rows).
- [x] Test runner configuration matches SUT tech stack (.NET 10, xUnit, RabbitMQ.Stream.Client at the same NuGet version).
- [x] Data isolation strategy is defined (per-class truncate, per-test mission/consumer/token).