Phase 6 smoke (Docker, _docs/04_refactoring/01-testability-refactoring/
smoke-compose.yml):
- Annotations app boots clean under ASPNETCORE_ENVIRONMENT=E2ETest.
- /health 200 OK; /annotations with bearer returns 401 with the
JWT library's own malformed-token rejection.
- 0 IDX20108 occurrences in logs (C01 verified).
- 0 IPAddress.Parse FormatException occurrences; FailsafeProducer
reaches the broker via Docker DNS (C02 verified).
- Full smoke report in verification.md.
Phase 7 docs:
- architecture.md: retire Open Risks §6 (testability blocker
resolved). Update the constraints block to describe the
ASPNETCORE_ENVIRONMENT-gated RequireHttps behavior.
- components/06_platform/description.md: one-liner on JwtExtensions
JWKS gating.
- components/02_annotations-realtime-sync/description.md: one-liner
on FailsafeProducer host resolution accepting literal IP or DNS.
- tests/test-data.md: refresh the JWKS URL configuration section to
point at the resolved implementation instead of the open risk.
Task housekeeping:
- _docs/02_tasks/todo/01_*.md -> done/
- _docs/02_tasks/todo/02_*.md -> done/
- _docs/_autodev_state.md: advance to Step 5 (Refactor Backlog Triage).
Tracker IDs remain placeholders pending Atlassian MCP availability —
real IDs to be assigned per
_docs/_process_leftovers/2026-05-14_testability-tracker.md.
Co-authored-by: Cursor <cursoragent@cursor.com>
Test Data Management
Seed Data Sets
| Data Set | Description | Used by Tests | How Loaded | Cleanup |
|---|---|---|---|---|
| tokens-test | 3 ES256 access tokens minted on demand by the runner: ann-token (claim ANN), dataset-token (DATASET), adm-token (ADM). All carry iss=$JWT_ISSUER, aud=$JWT_AUDIENCE, exp=now+5m, and a deterministic sub GUID per role. | F1-N-003, F1-N-004, F5-004, F6-001..006, F7-004, F8-*, NFT-SEC-01..10, FT-N-10..12 | The harness runs a mock JWKS issuer (Python script tests/harness/mock_issuer.py or the equivalent .NET fixture) that publishes the public ES256 key at JWT_JWKS_URL. The runner imports the matching private key as a fixture and mints tokens per test. | Tokens are short-lived (5m) and never persisted; key pair regenerates on docker compose down -v |
| mission-test | One canonical waypoint id 00000000-0000-0000-0000-000000000aaa used as WaypointId / MissionId in every annotation create. | All F1, F2, F3, F4, F5, F8 | Implicit: no FK enforcement; the GUID is just a column value. | N/A |
| classes-baseline | The 19 detection classes seeded by DatabaseMigrator (ids 0–18, names per data_parameters.md). | F7-001 (catalog read), F1-* (class_num references) | Auto, by the SUT's boot-time migrator. | N/A (schema-managed) |
| clean-state | Empty annotations, media, detection, annotations_queue_records tables at the start of each test class. | Every test class that asserts on count / depth | xUnit class fixture: TRUNCATE annotations, media, detection, annotations_queue_records RESTART IDENTITY CASCADE; via direct DB connection (out-of-band, runner-only). | Fixture's Dispose() truncates again |
Data Isolation Strategy
- Per-class truncation: each xUnit test class declares an IClassFixture<CleanStateFixture> that truncates the four mutable tables before the first test in the class and again after the last.
- Per-test token: every test mints its own ES256 token via the mock issuer fixture (see "Bearer token harness" below); tokens never cross test boundaries.
- Per-test mission id: tests that need fan-out isolation (e.g., F3 SSE subscribers) generate a fresh WaypointId GUID per test so concurrent test runs don't leak events into each other.
- Per-test stream consumer: F4 stream-consumer scenarios use a fresh consumer name per test and start at offset next (current end of stream). They consume only messages produced after the test starts.
- Filesystem isolation: annotations-images, annotations-videos, annotations-deleted volumes are recreated by docker compose down -v between full runs. Per-test cleanup removes only files the test wrote (matching <id> patterns).
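The isolation rules above can be sketched in a few lines. This is only an illustration in Python, not the xUnit fixture itself: the helper names and the consumer-name scheme are hypothetical, and only the table list and the TRUNCATE options come from the clean-state data set.

```python
import uuid

# The four mutable tables named in the clean-state data set.
MUTABLE_TABLES = ("annotations", "media", "detection", "annotations_queue_records")


def truncate_statement(tables=MUTABLE_TABLES):
    """Build the single TRUNCATE statement a class fixture would issue."""
    return f"TRUNCATE {', '.join(tables)} RESTART IDENTITY CASCADE;"


def fresh_waypoint_id():
    """Random GUID so concurrent fan-out tests never share a mission id."""
    return str(uuid.uuid4())


def fresh_consumer_name(test_name):
    """Hypothetical naming scheme: test name plus a short random suffix."""
    return f"{test_name}-{uuid.uuid4().hex[:8]}"
```

Issuing one statement for all four tables keeps the reset atomic from the test's point of view, and the random suffixes avoid coordination between parallel runs.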
Input Data Mapping
| Input Data File | Source Location | Description | Covers Scenarios |
|---|---|---|---|
| image_small.jpg | <fixtures>/image_small.jpg | 1280×720 frame, ~1.5 MB | F1-001, F1-002, F1-N-003..005, F2-001/002, F3-001/002, F4-001/002, F5-001/002, F8-* |
| image_dense01.jpg | <fixtures>/image_dense01.jpg | small dense frame (~230 KB) | F1-004, F5-002, F8-002 |
| image_dense02.jpg | <fixtures>/image_dense02.jpg | larger dense frame (~2.8 MB) | F5-002 |
| image_different_types.jpg | <fixtures>/image_different_types.jpg | multi-class scene (900×1600) | F8-002 (class filter) |
| image_empty_scene.jpg | <fixtures>/image_empty_scene.jpg | 1920×1080 empty scene | F1-003 (zero detections), NFT-PERF-* warmup |
| image_large.JPG | <fixtures>/image_large.JPG | 6252×4168, ~7 MB | F1-005 (large payload), NFT-PERF-LATENCY |
| video_short01.mp4 | <fixtures>/video_short01.mp4 | ~150 MB video | F1-006 (video annotation), F1-007 |
| video_short02.mp4 | <fixtures>/video_short02.mp4 | distinct-bytes second video | F1-007 (distinct bytes → distinct ids) |
<fixtures> resolves to /fixtures inside the test runner / SUT container, bound to ../detections/_docs/00_problem/input_data/ per _docs/00_problem/input_data/fixtures.md.
Synthetic request payloads
JSON request bodies for POST /annotations, PUT /annotations/{id}, POST /dataset/status/bulk, and the auth flows live under _docs/00_problem/input_data/requests/. Each test references a request file by id (F1_001_request.json). Class numbers in detections come from the seeded detection_classes (ids 0–18); coordinates are normalized 0..1 floats.
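Because the SUT does not currently range-check class numbers or coordinates, a request fixture with a typo would pass silently. A runner-side lint could catch that before the test runs. The sketch below is a hypothetical helper, not part of the harness; the field names follow the detections[] fields listed under Data Validation Rules, and the 0–18 range comes from classes-baseline.

```python
import math

SEEDED_CLASS_IDS = range(0, 19)  # ids 0-18, per the classes-baseline data set
COORD_FIELDS = ("centerX", "centerY", "width", "height")


def lint_detections(payload):
    """Return a list of problems; an empty list means the fixture is clean."""
    problems = []
    for i, det in enumerate(payload.get("detections", [])):
        class_num = det.get("class_num")
        if class_num not in SEEDED_CLASS_IDS:
            problems.append(f"detections[{i}].class_num={class_num!r} not in 0..18")
        for field in COORD_FIELDS:
            value = det.get(field)
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            # NaN fails the range comparison too, but the explicit check documents intent.
            if not is_number or math.isnan(value) or not 0.0 <= value <= 1.0:
                problems.append(f"detections[{i}].{field}={value!r} not a normalized 0..1 float")
    return problems
```

Negative-path fixtures (for example the SEC-05 gap cases) would be expected to fail this lint on purpose, so a runner would apply it only to fixtures meant to be valid.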
Expected Results Mapping
(Full table is _docs/00_problem/input_data/expected_results/results_report.md — 44 rows. Selected entries here for cross-reference.)
| Test Scenario ID | Input Data | Expected Result | Comparison Method | Tolerance | Source |
|---|---|---|---|---|---|
| FT-P-01 (=F1-001) | image_small.jpg + F1_001_request.json | HTTP 200 + AnnotationDto; id =~ /^[0-9a-f]{32}$/; detections.length == 1 | exact, schema_match, regex | N/A | expected_results/F1_001_response.json |
| FT-P-02 (=F1-002) | Same input, second POST | Same id as FT-P-01; no duplicate row | exact | N/A | inline |
| FT-P-04 (=F1-004) | image_dense01.jpg + F1_004_request.json | HTTP 200; detections.length == 5; YOLO label file with 5 lines | exact, file_content | N/A | expected_results/F1_004_response.json |
| FT-P-10 (=F3-001) | F1-001 fires, SSE subscriber connected | event with operation == "Created", latency ≤ 1000ms | exact, threshold_max | ± 200ms | inline |
| FT-N-04 (=F1-N-004) | F1-001 with no Authorization header | HTTP 401 + error envelope | exact, schema_match | N/A | inline |
| NFT-PERF-LATENCY-01 | image_small.jpg × 50 sequential calls | p95 latency ≤ 1500ms | threshold_max | N/A | inline |
| NFT-RES-01 | RabbitMQ stopped, F1-001 fires | HTTP 200 returned to caller; outbox row stays; SUT stays alive | exact | N/A | inline |
| NFT-SEC-01 | F1-001 with JWT signed by wrong key | HTTP 401 | exact | N/A | inline |
| NFT-RES-LIM-01 | F4 outbox under sustained load | queue depth ≤ 10× steady-state for ≥ 30 min | threshold_max | N/A | inline |
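The threshold_max rows in the table reduce to two small computations: a percentile over the collected samples and an upper-bound check. The sketch below makes two assumptions the table does not state: p95 is computed with the nearest-rank method, and a tolerance widens only the upper limit. Function names are hypothetical.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Nearest-rank percentile: the smallest sample covering pct% of the data."""
    if not samples:
        raise ValueError("no samples collected")
    ordered = sorted(samples)
    rank = max(math.ceil(pct / 100 * len(ordered)), 1)
    return ordered[rank - 1]


def threshold_max(actual, limit, tolerance=0.0):
    """Pass when the measured value does not exceed limit + tolerance."""
    return actual <= limit + tolerance
```

For NFT-PERF-LATENCY-01, a runner would collect 50 latencies in milliseconds and assert threshold_max(percentile_nearest_rank(latencies, 95), 1500).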
External Dependency Mocks
| External Service | Mock/Stub | How Provided | Behavior |
|---|---|---|---|
| RabbitMQ Stream broker | Real rabbitmq:3.13-management with the streams plugin | Docker service in e2e-net | Real broker; resilience tests (NFT-RES-01..03) restart it mid-test using docker exec rabbitmq rabbitmqctl stop_app followed by docker exec rabbitmq rabbitmqctl start_app |
| Postgres | Real postgres:13 | Docker service | Real DB; resilience tests (NFT-RES-04) crash and restart it |
| Detections service | Not run | N/A | The annotations service does not call the detections service; tests bypass it by hand-authoring synthetic detections[] payloads in requests/. |
| Suite-level reverse proxy / TLS terminator | Not run | N/A | Tests speak directly to http://annotations:8080. SEC-tests for HTTPS / HSTS therefore explicitly skip with reason "out-of-process for SUT". |
Data Validation Rules
| Data Type | Validation | Invalid Examples | Expected System Behavior |
|---|---|---|---|
| image_bytes (POST /annotations) | non-null, non-empty byte array | empty array [], missing field | HTTP 400/422; error envelope |
| mediaType (POST /annotations) | enum Image=10 or Video=20 | 5, 100, missing | HTTP 400/422; error envelope |
| detections[].class_num | int, no range validator today | -1, 999 | HTTP 200 today (lenient); flagged as gap (SEC-05) |
| detections[].centerX/Y/width/height | float, no range validator today | 1.5, -0.1, NaN | HTTP 200 today (lenient); flagged as gap (SEC-05) |
| Authorization header | bearer ES256 JWT issued by the mock issuer; validated for issuer / audience / signature / expiry, with alg pinned to ES256 | missing, wrong issuer, wrong audience, wrong signature, expired, alg=HS256 forgery | HTTP 401; error envelope |
| Caller policy | ANN, DATASET, or ADM per endpoint | mismatched policy | HTTP 403; error envelope |
| WaypointId (POST /annotations, /media) | GUID format | not a GUID | HTTP 400/422 from model binder |
| File-upload size (POST /media) | no explicit limit visible at controller; underlying ASP.NET form-options apply | >256 MB single file | likely HTTP 400 from form-options; verify in NFT-RES-LIM-02 |
Runtime-generated test data
Two scenario groups consume synthetic test data generated by the runner at execution time rather than static files on disk. This is intentional and explicitly allowed by templates/expected-results.md ("Test data may be generated programmatically — note this in test-data.md"):
| Scenario | Generated data | How |
|---|---|---|
| NFT-RES-LIM-02 (single-file upload boundary) | Synthetic JPEG-prefixed binary blobs at sizes 1, 10, 50, 100, 256, 512 MB | Runner xUnit fixture writes a temp file: 4-byte JPEG magic header + pseudo-random bytes filling to the target size; uploaded once, deleted after. Files NOT committed to the repo. |
| NFT-PERF-LIST-01, NFT-PERF-DATASET-01 | 10,000 annotations rows + 50,000 detection rows in the test DB | dataseed job runs a parameterised SQL script that bulk-inserts rows with media_id referencing 100 distinct seeded media rows; uses CROSS JOIN generate_series for speed. Cleared by clean-state truncation between test classes. |
The generated data still satisfies Phase 3 quantifiability: every generated input has a deterministic shape (size, count) AND a quantifiable expected result (HTTP code, latency threshold, returned row count).
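The NFT-RES-LIM-02 blobs can be produced with a short helper. The following is a minimal sketch in Python rather than the xUnit fixture the table describes. It assumes the 4-byte header is FF D8 FF E0 (SOI plus an APP0 marker), which the source does not spell out, and it uses a seeded generator so every run produces the same bytes; the function name and chunk size are hypothetical.

```python
import os
import random
import tempfile

JPEG_MAGIC = bytes([0xFF, 0xD8, 0xFF, 0xE0])  # assumption: SOI + APP0 marker
CHUNK = 1024 * 1024  # write in 1 MiB pieces to keep memory flat


def write_synthetic_blob(size_bytes, seed=0):
    """Write a JPEG-prefixed pseudo-random file of exactly size_bytes; return its path."""
    if size_bytes < len(JPEG_MAGIC):
        raise ValueError("target size is smaller than the magic header")
    rng = random.Random(seed)
    fd, path = tempfile.mkstemp(suffix=".jpg")
    with os.fdopen(fd, "wb") as out:
        out.write(JPEG_MAGIC)
        remaining = size_bytes - len(JPEG_MAGIC)
        while remaining > 0:
            n = min(CHUNK, remaining)
            out.write(rng.getrandbits(8 * n).to_bytes(n, "little"))
            remaining -= n
    return path
```

The caller is responsible for deleting the file after the upload, matching the "uploaded once, deleted after" rule in the table.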
Bearer token harness
Annotations is verifier-only — there is no /auth/login to call from a test. The harness reproduces the production model in miniature:
- Key pair: a fresh ES256 key pair is generated when the test stack starts (docker compose up). The private key is mounted into the runner container; the public key is mounted into a tiny mock issuer sidecar that serves /.well-known/jwks.json over HTTP inside the docker-compose network.
- JWKS URL configuration: the SUT is started with JWT_ISSUER=https://e2e-issuer.test, JWT_AUDIENCE=annotations-e2e, and JWT_JWKS_URL=http://e2e-issuer:8080/.well-known/jwks.json. The HTTPS-only constraint of HttpDocumentRetriever.RequireHttps is relaxed in source: JwtExtensions.AddJwtAuth sets RequireHttps = false if and only if ASPNETCORE_ENVIRONMENT == "E2ETest" (case-insensitive). Any other value (including unset, Development, Staging, Production) keeps HTTPS required. This is the resolved form of architecture.md Open Risks §6 (see also _docs/04_refactoring/01-testability-refactoring/testability_changes_summary.md item C01).
- Token minting: the runner exposes a per-test helper mintToken(claim: "ANN" | "DATASET" | "ADM", overrides?) that builds an ES256 JWT from the in-process private key with the configured iss / aud, exp = now + 5m, a per-role deterministic sub GUID, and the requested policy claim. overrides lets a test produce expired / wrong-iss / wrong-aud / forged-alg=HS256 variants for the security suite.
- No persisted users: there is no users table in this service. Each test mints exactly the token it needs.
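Two pieces of the harness logic are easy to state in code: the environment gate on RequireHttps and the claim set a minted token carries. The sketch below mirrors both in Python for illustration only. It is not the JwtExtensions source or the runner's mintToken helper; the claim name "role", the uuid5-based sub derivation, and the function names are assumptions, and ES256 signing is left out because it needs a crypto library.

```python
import time
import uuid

ROLES = ("ANN", "DATASET", "ADM")


def require_https(environment):
    """HTTPS stays required unless the environment is E2ETest (case-insensitive)."""
    return (environment or "").lower() != "e2etest"


def build_claims(role, issuer, audience, now=None, overrides=None):
    """Assemble the unsigned claim set for one test token."""
    if role not in ROLES:
        raise ValueError(f"unknown role: {role!r}")
    issued_at = int(time.time() if now is None else now)
    claims = {
        "iss": issuer,
        "aud": audience,
        "iat": issued_at,
        "exp": issued_at + 5 * 60,  # 5-minute lifetime, per the harness
        # Assumption: a name-based UUID gives the same sub for a role on every run.
        "sub": str(uuid.uuid5(uuid.NAMESPACE_DNS, f"e2e-{role}")),
        "role": role,  # assumption: the real policy claim name may differ
    }
    claims.update(overrides or {})  # e.g. {"exp": issued_at - 1} for an expired token
    return claims
```

Applying overrides last lets a security test replace any single field, such as iss or exp, while every other claim stays valid, so a 401 can be attributed to that one field.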
Notes for the runner
- Boot order: postgres → rabbitmq → e2e-issuer (mock JWKS) → annotations (waits for postgres, rabbitmq, and a successful JWKS fetch) → dataseed → e2e-runner.
- Fresh-state vs. carry-over: the suite truncates per class, so test ordering inside a class matters; ordering across classes does not.
- Stream consumption: every test that reads from azaion-annotations records the offset before the test acts, then consumes from start_offset = recorded_offset + 1 to ignore historical messages.
- Conditional probes: tests that depend on SUT behavior decisions (e.g., specific 4xx code on a corner case) include a fixture step that probes the SUT once at class-init, records the actual behavior, then asserts that branch consistently within the test class. Mismatch on a subsequent run flags as a behavior-drift test failure.
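The stream-consumption and conditional-probe rules can be modelled without a broker. The sketch below is a simplified, in-memory illustration: the stream is a list of (offset, body) pairs, and the BehaviorProbe class is a hypothetical stand-in for the class-init fixture step.

```python
def messages_after(recorded_offset, stream):
    """Keep only messages produced after the offset recorded before the test acted."""
    start_offset = recorded_offset + 1
    return [body for offset, body in stream if offset >= start_offset]


class BehaviorProbe:
    """Records the SUT's first observed status and checks later ones against it."""

    def __init__(self):
        self.recorded = None

    def observe(self, status_code):
        # The first observation (class-init probe) fixes the expected branch.
        if self.recorded is None:
            self.recorded = status_code
        return status_code == self.recorded
```

A False return from observe corresponds to the behavior-drift failure described in the last bullet.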