autopilot/_docs/02_document/tests/test-data.md

# Test Data Management

Authored by `/test-spec` Phase 2 (2026-05-19). Owns the **mapping** from fixtures to tests, mock data shapes, isolation strategy, and the deferred-fixture inventory bridge.

- Per-row input-to-expected-result binding lives in `_docs/00_problem/input_data/expected_results/results_report.md` — this file references it but never duplicates it.
- Fixture manifest (SHA-pinned files + provenance) lives in `_docs/00_problem/input_data/fixtures/README.md`.
- Per-service mock catalogue (what shape each mock returns) lives in `_docs/00_problem/input_data/services.md`.
- Deferred fixture inventory + replay obligation lives in `_docs/_process_leftovers/2026-05-19_autopilot_test_fixtures.md`.

## Seed data sets

| Data Set | Description | Used by Tests | How Loaded | Cleanup |
|---|---|---|---|---|
| `image-set-existing` | `fixtures/images/{4d6e1830d211ad50,54f6459dbddb93d8,6dd601b7d2dc1b30,805bcf1e9f271a58,f997d0934726b555}.jpg` — 5 aerial frames | FT-P-Tier1Contract, NFT-PERF-L1, NFT-PERF-L2, FT-P-DetectExisting | mounted read-only via `fixtures-ro:/fixtures` on `rtsp-loopback` (encoded to `.mp4` clip) and on `detections-mock` (paired with `expected_detections.json` per frame) | volume detached on container teardown |
| `video-recon` | `fixtures/videos/94d42580bd1ad6ff.mp4` | NFT-PERF-T3 | mounted read-only on `rtsp-loopback`; consumer requests stream at 30 fps then throttles decode + drops frames per scenario script | as above |
| `video-movement` | `fixtures/movement/video0[1-4].mp4` (4 wide-area clips) | FT-P-MoveStarter (visual reference only), FT-P-MoveBenchmark (deferred) | mounted on `rtsp-loopback`; played at 30 fps; consumer schedules which clip per scenario | as above |
| `image-semantic-starter` | `fixtures/semantic/semantic0[1-4].png` (1 winter + 3 unmarked season) | FT-P-ConcealStarter, FT-P-FootpathStarter (visual reference only; assertion semantics deferred) | mounted on `detections-mock` and `rtsp-loopback` as a single-frame loop | as above |
| `schemas-detection` | `fixtures/schemas/expected_detections.{json,schema.json}` | FT-P-Tier1Contract, FT-P-NormalisedBoxes (D6) | mounted on `e2e-consumer:/expected:ro` | as above |
| `sql-init-suite` | `fixtures/sql/init.sql` | NOT USED by autopilot tests (suite-only artefact; recorded here for traceability) | n/a | n/a |
| `mission-suite-fixture` | `<DEFERRED: missions_fixtures/mission_30x30km.json + mapobjects_10k.json; services.md §2>` | FT-P-MissionStart, FT-P-MapPull (Mp1), FT-P-MapPush (Mp3), NFT-RES-Mp2, NFT-RES-Mp4 | mounted on `missions-mock` once acquired | as above |
| `mavlink-sitl-scripts` | scripted `ardupilot/sitl` scenarios (waypoint upload, geofence in/out, RTL on link loss, RTL on battery floor) | FT-P-WaypointInsert (O8), NFT-RES-R4, NFT-RES-R5, NFT-RES-R6, NFT-RES-R7, NFT-RES-R9 | run in `mavlink-sitl` via `--script` argument per scenario | SITL container restarted per scenario |
| `operator-session-scripts` | scripted `(t, event)` traces — nominal, drop+reconnect, lost-link 30 s, sustained lost-link | FT-P-DecisionWindow (O1–O3, O4), FT-P-OperatorDecline (O5), FT-P-OperatorIgnoredSuppress (O6), FT-P-OperatorTimeout (O7), FT-P-OperatorConfirm (O8), NFT-RES-R4 | replayed by `operator-replay` per scenario | per-scenario |
| `operator-envelopes` | `<DEFERRED: operator_envelopes/{valid,replayed,malformed,unsigned,expired}.bin; services.md §8 (Q9-blocked)>` | NFT-SEC-O9, NFT-SEC-O10, FT-P-OperatorConfirm (O8 happy path uses a default placeholder envelope) | replayed by `operator-replay` | per-scenario |
| `vlm-io-pairs` | `<DEFERRED: vlm_io_pairs/{roi,prompt,response}.* + schema-violation cases; services.md §7>` | NFT-PERF-L3, FT-P-DeepAnalysisHold (S5), NFT-SEC-VlmSchemaViolation | mounted on `vlm-mock` | per-scenario |
| `gimbal-csv-pairs` | `<DEFERRED: gimbal_csv/video0[1-4].csv paired with movement videos at zoomed-in band + threshold-edge cluster; services.md §6>` | FT-P-EgoMotion (M1), FT-P-MoveDuringHold (M2), FT-P-ThresholdEdge (M3), FT-P-MoveBenchmark (M4), NFT-PERF-L6, NFT-PERF-L7 | replayed by `gimbal-mock` synchronised to RTSP frame timestamps | per-scenario |
| `tier1-replay-streams` | `<DEFERRED: tier1_replay/*.replay; services.md §1>` | FT-P-Tier1ContractIsolated (Tier B variant); Tier-E uses live `../detections` | served by `detections-mock` | per-scenario |
| `time-drift-scripts` | scripted clock offsets (50 ms ramp, 250 ms jump, NTP loss, GPS unlock) | NFT-RES-R8 | injected by `time-injector` via faketime LD_PRELOAD shim | per-scenario |
| `synthetic-poi-feeds` | inline-authorable: confidence={0.39, 0.40, 0.70, 1.00}, ordering-test feed, sustained-rate feed >5 POI/min | FT-P-DecisionWindow (O1–O4), FT-P-POIOrdering (S4), NFT-PERF-T1 | authored in Rust under `e2e/consumer/fixtures/synthetic_poi/`; pumped into the SUT by injecting recorded `Detections` into `detections-mock` | n/a (in-memory) |
| `bit-scenarios` | inline-authorable: every-dep-green, tier1-unreachable, storage-95pct-full | FT-P-BitPass (R1), NFT-RES-R2, NFT-RES-R3 | manipulated by toggling mock services up/down + `autopilot-state` volume seed file | volume seed file removed |

## Data isolation strategy

- **Per scenario, fresh containers.** Each scenario starts with `docker compose down -v && docker compose up -d` (the `e2e-consumer` orchestrates this via `testcontainers-rs`). No state leaks between scenarios.
- **`autopilot-state` volume** is named per `(test_id, run_id)` so parallel scenario runs do not collide.
- **Deterministic seeds.** Every randomness source in the SUT (POI age-factor tie-breaking, retry jitter, replay-window nonce window) is configured to a per-scenario seed via env vars (`AUTOPILOT_RNG_SEED=<test_id>`). The seed is captured in the CSV report.
- **Wall-clock control.** Scenarios that depend on absolute time (NFT-RES-R8, NFT-RES-R4 grace window, FT-P-DecisionWindow timeouts) use `time-injector` (faketime LD_PRELOAD). The SUT's `time.now()` calls are intercepted; GPS-source state is set via the `mavlink-sitl` GLOBAL_POSITION_INT message stream.
- **Network determinism.** All inter-service traffic stays on the `autopilot-e2e` Docker network (no internet egress). Latency injection (for L9 modem RTT exclusion checks) uses `tc qdisc` inside the `operator-replay` container.
- **No shared mocks between scenarios.** Even when two scenarios use the same fixture, each gets its own mock container instance — this avoids stale state in `missions-mock`'s POST-buffer or `gimbal-mock`'s last-command cache.

## Input data mapping (fixtures → scenarios)

This is the **fixture-side index**; the scenario-side index is in each `*-tests.md` file's `Input data` field.

| Input data file | Source location | Description | Covers scenarios |
|---|---|---|---|
| `fixtures/images/4d6e1830d211ad50.jpg` | `_docs/00_problem/input_data/fixtures/images/` | Aerial frame, 1280 px input | FT-P-Tier1Contract (D6), NFT-PERF-L1, NFT-PERF-L2 |
| `fixtures/images/{54f6...,6dd6...,805b...,f997...}.jpg` | same dir | 4 additional aerial frames for existing-class regression | FT-P-DetectExisting (D2) |
| `fixtures/videos/94d42580bd1ad6ff.mp4` | same dir | Reconnaissance clip, 30 fps; consumer throttles to drop below 10 fps for ≥5 s | NFT-PERF-T3 |
| `fixtures/movement/video01.mp4` | same dir | Wide-area movement clip (visual reference only) | FT-P-EgoMotion (M1) [DEFERRED — needs gimbal.csv] |
| `fixtures/movement/video02.mp4` | same dir | Wide-area movement clip (visual reference only) | FT-P-MoveDuringHold (M2) [DEFERRED — needs zoomed-in gimbal.csv] |
| `fixtures/movement/video03.mp4` | same dir | Wide-area movement clip (visual reference only) | FT-P-ThresholdEdge (M3) [DEFERRED — needs threshold-edge gimbal.csv] |
| `fixtures/movement/video04.mp4` | same dir | Wide-area movement clip (visual reference only) | FT-P-MoveBenchmark (M4) [DEFERRED — needs zoom-band benchmark CSV] |
| `fixtures/semantic/semantic01.png` | same dir | Winter concealed-position reference (starter only) | FT-P-ConcealStarter (D3, D4), FT-P-FootpathStarter (D5) [DEFERRED — needs annotated multi-season set] |
| `fixtures/semantic/semantic0[2-4].png` | same dir | 3 unmarked-season concealed-position references | as above |
| `fixtures/schemas/expected_detections.json` | same dir | Reference output for D6 | FT-P-Tier1Contract (D6), FT-P-NormalisedBoxes |
| `fixtures/schemas/expected_detections.schema.json` | same dir | Schema for normalised-box output | FT-P-NormalisedBoxes, NFT-SEC-Tier1SchemaViolation |
| `fixtures/sql/init.sql` | same dir | (suite-only — recorded for traceability) | none |

## Expected results mapping (scenario → comparison row)

Every scenario in `*-tests.md` traces to a row id in `_docs/00_problem/input_data/expected_results/results_report.md`. The comparison method + tolerance is owned by that row — this table is the **scenario-side index** so a reader can navigate from a test to its assertion contract.

| Scenario ID | Input data | Expected result row | Comparison method | Tolerance | Source |
|---|---|---|---|---|---|
| FT-P-Tier1Contract | `image-set-existing` (1 frame) | `D6` | `schema_match` + `range` | each coord ∈ [0,1] | `fixtures/schemas/expected_detections.schema.json` |
| FT-P-DetectExisting | `image-set-existing` (5 frames) | `D2` | `numeric_tolerance` | ± 0.02 (P, R) | `<DEFERRED: expected_results/existing_classes_baseline.json>` |
| FT-P-DetectNew | `<DEFERRED: new-class eval set>` | `D1` | `threshold_min` | P ≥ 0.80 AND R ≥ 0.80 | `<DEFERRED: expected_results/new_classes_pr.json>` |
| FT-P-ConcealRecall | `image-semantic-starter` + `<DEFERRED: full set>` | `D3` | `threshold_min` | recall ≥ 0.60 | `<DEFERRED: expected_results/concealed_positions.json>` |
| FT-P-ConcealPrecision | same | `D4` | `threshold_min` | precision ≥ 0.20 | same |
| FT-P-FootpathRecall | `image-semantic-starter` + `<DEFERRED>` | `D5` | `threshold_min` | recall ≥ 0.70 | `<DEFERRED: expected_results/footpaths.json>` |
| NFT-PERF-L1 | `image-set-existing` (1 frame) | `L1` | `threshold_max` | ≤ 100 ms | inline |
| NFT-PERF-L2 | derived ROI from same | `L2` | `threshold_max` | ≤ 200 ms | inline |
| NFT-PERF-L3 | `vlm-io-pairs` | `L3` | `threshold_max` | ≤ 5000 ms | inline |
| NFT-PERF-L4 | `<DEFERRED: SITL or HW zoom-cmd capture>` | `L4` | `threshold_max` | ≤ 2000 ms | inline |
| NFT-PERF-L5 | `<DEFERRED: scripted scan→movement>` | `L5` | `threshold_max` | ≤ 500 ms | inline |
| NFT-PERF-L6 | `video-movement` (visual ref) + `<DEFERRED gimbal.csv>` | `L6` | `threshold_max` | ≤ 1000 ms | inline |
| NFT-PERF-L7 | `video-movement` + `<DEFERRED zoomed-in gimbal.csv>` | `L7` | `threshold_max` | ≤ 1500 ms | inline |
| NFT-PERF-L8 | `<DEFERRED: sweep→zoomed transition capture>` | `L8` | `threshold_max` | ≤ 2000 ms | inline |
| NFT-PERF-L9 | `<DEFERRED: operator-click → outbound>` | `L9` | `threshold_max` | ≤ 500 ms | inline |
| NFT-PERF-T1 | `synthetic-poi-feeds` (sustained > cap) | `T1` | `threshold_max` | ≤ 5 / min | inline |
| NFT-PERF-T2 | `<DEFERRED: MAVLink replay 60 s>` | `T2` | `range` | 1 Hz ≤ r ≤ 10 Hz | inline |
| NFT-PERF-T3 | `video-recon` (throttled) | `T3` | `exact` × 2 | suppression bool + health=yellow | inline |
| FT-P-EgoMotion (M1) | `video-movement/video01.mp4` + `<DEFERRED gimbal.csv + telemetry.csv>` | `M1` | `set_contains` | candidate set == {vehicle}; ∉ tree row | inline |
| FT-P-MoveDuringHold (M2) | `video02.mp4` + `<DEFERRED zoomed-in CSV pair>` | `M2` | `exact` | 1 candidate; preempt per priority rule | inline |
| FT-P-ThresholdEdge (M3) | `video03.mp4` + `<DEFERRED threshold-edge CSV>` | `M3` | `exact` | count == 0 | inline |
| FT-P-MoveBenchmark (M4) | `video04.mp4` + `<DEFERRED benchmark suite>` | `M4` | `threshold_max` | per-zoom-band FP rate budget | `<DEFERRED: expected_results/movement_benchmark_caps.json>` |
| FT-P-SweepToZoom (S1) | `<DEFERRED scripted mission + POI>` | `S1` | `exact` × 3 | transition + ROI + queue+=1 | inline |
| FT-P-FootpathPan (S2) | `<DEFERRED hold + footpath polyline>` | `S2` | `numeric_tolerance` | centre offset ≤ 25% per frame | inline |
| FT-P-TargetFollow (S3) | `<DEFERRED confirmed target>` | `S3` | `threshold_max` | per-frame |dx,dy| ≤ 0.125 | inline |
| FT-P-POIOrdering (S4) | `synthetic-poi-feeds` (ordering test) | `S4` | `exact (order)` | ordering matches `conf × prox × age` | inline |
| FT-P-DeepAnalysisHold (S5) | `<DEFERRED VLM-enabled hold>` | `S5` | `exact` | hold = min(5 s, vlm_complete) | inline |
| FT-P-DecisionWindow30s (O1) | `synthetic-poi-feeds` (conf=0.40) | `O1` | `exact` | window = 30 s | inline |
| FT-P-DecisionWindow120s (O2) | conf=1.00 | `O2` | `exact` | window = 120 s | inline |
| FT-P-DecisionWindow75s (O3) | conf=0.70 | `O3` | `numeric_tolerance` | window ≈ 75 s ± 0.5 s | inline |
| FT-N-BelowThreshold (O4) | conf=0.39 | `O4` | `exact` | not surfaced | inline |
| FT-P-OperatorDecline (O5) | `operator-session-scripts` (nominal + decline) | `O5` | `exact (count Δ+1)` + `schema_match` | ignored-item appended | inline |
| FT-P-IgnoredSuppress (O6) | matching MGRS + class_group | `O6` | `exact` | not surfaced | inline |
| FT-P-OperatorTimeout (O7) | no-response + > window | `O7` | `exact` × 2 | queue −1; ignored unchanged | inline |
| FT-P-OperatorConfirm (O8) | `operator-envelopes` (valid happy path) | `O8` | `exact (HTTP 200)` + `exact (mode)` | mission POST + target-follow | inline |
| NFT-SEC-O9 | `operator-envelopes` (replayed) | `O9` | `exact` + `substring` | state unchanged; log contains "replay" | inline |
| NFT-SEC-O10 | `operator-envelopes` (malformed/unsigned) | `O10` | `exact` + `substring` | state unchanged; log contains "invalid" | inline |
| FT-P-BitPass (R1) | `bit-scenarios` (every dep green) | `R1` | `exact` × 2 | takeoff permitted + health all green | inline |
| FT-N-BitDetectionDown (R2) | tier1 unreachable | `R2` | `exact` | takeoff inhibited + detection red | inline |
| FT-N-BitStorageFull (R3) | storage ≥ 95 % | `R3` | `exact` | takeoff inhibited + storage red | inline |
| NFT-RES-R4 | `operator-session-scripts` (sustained lost-link) | `R4` | `exact (RTL at 30 s ± 1 s)` | RTL command + operator-link red | inline |
| NFT-RES-R5 | `mavlink-sitl-scripts` (battery at RTL-floor) | `R5` | `exact` × 2 | RTL + health yellow | inline |
| NFT-RES-R6 | battery at hard-floor | `R6` | `exact` | land-now | inline |
| NFT-RES-R7 | `mavlink-sitl-scripts` (no-response retry exhaustion) | `R7` | `exact` | health red after max-retry | inline |
| NFT-RES-R8 | `time-drift-scripts` (250 ms drift) | `R8` | `exact` | time-source yellow + clock_source/last_sync_at updated | inline |
| NFT-RES-R9 | `mavlink-sitl-scripts` (EXCLUSION cross) | `R9` | `exact` × 2 | waypoint rejected + RTL | inline |
| NFT-RES-LIM-Re1 | `<DEFERRED long-running RSS harness>` | `Re1` | `threshold_max` | combined RSS ≤ 6 GB | inline |
| NFT-RES-LIM-Re2 | Re1 + concurrent Tier-1 traffic | `Re2` | `numeric_tolerance` | Tier-1 ms/frame Δ ± 5 ms | inline |
| FT-P-MapPull (Mp1) | `<DEFERRED 30×30 km area + ~10k mapobjects>` | `Mp1` | `threshold_max` | ≤ 30 s | inline |
| NFT-RES-Mp2 | mock unreachable | `Mp2` | `exact` × 2 | cached_fallback + BIT requires ack | inline |
| FT-P-MapPush (Mp3) | `<DEFERRED 60 min diff>` | `Mp3` | `threshold_max` | ≤ 120 s | inline |
| NFT-RES-Mp4 | POST returns 5xx | `Mp4` | `exact` × 2 + `threshold_max` | file exists + warning + retries ≤ cap | inline |
| FT-P-MapConflict (Mp5) | `<DEFERRED conflict pair>` | `Mp5` | `json_diff` | conflict resolution per Q8 | `<DEFERRED: expected_results/mapobjects_conflict_resolution.json>` |

## External dependency mocks

(Index-only; per-mock acquisition status owned by `services.md`.)

| External service | Mock/stub | How provided | Behavior |
|---|---|---|---|
| `../detections` Tier-1 RPC | `detections-mock` (gRPC bi-stream) | Docker container; serves `.replay` files | Returns recorded `Detections` byte-stream for the input frame's hash; serves a 19-class catalogue (0..18) deterministically; supports schema-violation injection for NFT-SEC tests |
| `missions` API | `missions-mock` (HTTPS FastAPI) | Docker container; TLS via self-signed test cert | Static JSON for `GET /missions/{id}`, `GET /missions/{id}/mapobjects`; records POST bodies for assertion; can be configured to return 5xx for NFT-RES-Mp4 |
| ViewPro A40 RTSP | `rtsp-loopback` (mediamtx) | Docker container | Plays back `.mp4` at scheduled fps with frame-drop injection (T3) |
| ViewPro A40 gimbal | `gimbal-mock` (Rust UDP) | Docker container | Replays `gimbal.csv` synchronised to RTSP frame timestamps; echoes received commands with bounded latency budget |
| ArduPilot | `mavlink-sitl` (official ardupilot/ardupilot-sitl image) | Docker container | Deterministic SITL run from a scripted mission file |
| Ground Station modem | `operator-replay` (Python) | Docker container | Replays `(t, event)` script per scenario; signs envelopes per Q9 once resolved |
| Local VLM | `vlm-mock` (Python over UDS) | Docker container; UDS shared via `/tmp` volume | Returns paired `VlmAssessment` JSON; can return schema-violation responses for NFT-SEC tests |
| Wall-clock / GPS / NTP | `time-injector` (Rust) | LD_PRELOAD faketime shim into the SUT container at start | Scripted offset/jump/source-loss |

## Data validation rules

| Data Type | Validation | Invalid Examples | Expected System Behaviour |
|---|---|---|---|
| Mission JSON | `mission-schema` (shared with `missions` repo) | missing required field; coord out of [-180, 180]; unknown enum value | system refuses; mission-state stays at last-known; health flips mission-config-source = yellow; structured-log at WARN with `schema_violation_field` |
| Map-object record | suite-level mapobjects schema | non-finite coordinate; class_group not in catalogue; missing MGRS | record dropped; counter `mapobjects_rejected_total` increments; structured-log at WARN |
| Tier-1 `Detections` stream | `expected_detections.schema.json` (normalised-box) | bbox coord ∉ [0, 1]; confidence ∉ [0, 1]; class_id ∉ {0..18} | frame's detections dropped (not partially used); `tier1_invalid_frame_total` increments; per AC D6 the system must surface a structured WARN |
| MAVLink message | MAVLink v2 dialect (per ArduPilot) | unknown MSG_ID; CRC mismatch; (if Q6 resolves to "signing on") missing signature | message dropped; if signing required and missing → security WARN; airframe-link health unaffected for individual drops |
| Operator command envelope | Q9 scheme (TBD) | replay (sequence_id seen recently); signature invalid; timestamp outside replay window | rejected at the boundary; no state mutation; security WARN with reason code; counters `operator_cmd_rejected_replay_total`, `..._signature_total`, `..._expired_total` |
| VLM `VlmAssessment` response | structured assessment schema | missing required field; wrong type; truncated JSON | fail-closed: assessment discarded; POI does NOT get the deep-analysis upgrade; structured WARN |
| RTSP frame | container-level decode | malformed H.264/265 NAL; oversized SPS | frame dropped; `frame_decode_error_total` increments; if rate falls below 10 fps for ≥5 s → T3 path triggers (zoom-in suppressed + health yellow) |
| Camera frame size | bounded crop policy (security_approach §Bounded input) | crop > configured max bytes; format not in allow-list | rejected at boundary; security WARN |
| Time source | wall-clock binding | GPS unlocked AND no NTP sync at boot | clock_source = `none`; health red until either source available |

## Deferred-fixture bridge (replay obligation)

Every `<DEFERRED:>` row above maps 1-to-1 to an entry in `_docs/_process_leftovers/2026-05-19_autopilot_test_fixtures.md → "What is needed before /autodev can resume"` table. On every `/autodev` invocation, the leftovers step must re-evaluate whether any deferred fixture has landed; once landed, the corresponding scenario(s) become unblocked and their `Test status` line in the matching `*-tests.md` file moves from `DEFERRED — input fixture not yet acquired` to `READY`.

Inline-authorable categories (10 and 11 in the leftover) — `synthetic-poi-feeds`, `time-drift-scripts`, `operator-session-scripts`, `bit-scenarios` — are NOT marked `<DEFERRED:>` in this file because they have no external dependency. They are authored by Phase 4's `e2e/consumer/fixtures/` generators when the runner scripts come online.