# Test Data Management Authored by `/test-spec` Phase 2 (2026-05-19). Owns the **mapping** from fixtures to tests, mock data shapes, isolation strategy, and the deferred-fixture inventory bridge. - Per-row input-to-expected-result binding lives in `_docs/00_problem/input_data/expected_results/results_report.md` — this file references it but never duplicates it. - Fixture manifest (SHA-pinned files + provenance) lives in `_docs/00_problem/input_data/fixtures/README.md`. - Per-service mock catalogue (what shape each mock returns) lives in `_docs/00_problem/input_data/services.md`. - Deferred fixture inventory + replay obligation lives in `_docs/_process_leftovers/2026-05-19_autopilot_test_fixtures.md`. ## Seed data sets | Data Set | Description | Used by Tests | How Loaded | Cleanup | |---|---|---|---|---| | `image-set-existing` | `fixtures/images/{4d6e1830d211ad50,54f6459dbddb93d8,6dd601b7d2dc1b30,805bcf1e9f271a58,f997d0934726b555}.jpg` — 5 aerial frames | FT-P-Tier1Contract, NFT-PERF-L1, NFT-PERF-L2, FT-P-DetectExisting | mounted read-only via `fixtures-ro:/fixtures` on `rtsp-loopback` (encoded to `.mp4` clip) and on `detections-mock` (paired with `expected_detections.json` per frame) | volume detached on container teardown | | `video-recon` | `fixtures/videos/94d42580bd1ad6ff.mp4` | NFT-PERF-T3 | mounted read-only on `rtsp-loopback`; consumer requests stream at 30 fps then throttles decode + drops frames per scenario script | as above | | `video-movement` | `fixtures/movement/video0[1-4].mp4` (4 wide-area clips) | FT-P-MoveStarter (visual reference only), FT-P-MoveBenchmark (deferred) | mounted on `rtsp-loopback`; played at 30 fps; consumer schedules which clip per scenario | as above | | `image-semantic-starter` | `fixtures/semantic/semantic0[1-4].png` (1 winter + 3 unmarked season) | FT-P-ConcealStarter, FT-P-FootpathStarter (visual reference only; assertion semantics deferred) | mounted on `detections-mock` and `rtsp-loopback` as a single-frame loop | as above | | `schemas-detection` | `fixtures/schemas/expected_detections.{json,schema.json}` | FT-P-Tier1Contract, FT-P-NormalisedBoxes (D6) | mounted on `e2e-consumer:/expected:ro` | as above | | `sql-init-suite` | `fixtures/sql/init.sql` | NOT USED by autopilot tests (suite-only artefact; recorded here for traceability) | n/a | n/a | | `mission-suite-fixture` | `` | FT-P-MissionStart, FT-P-MapPull (Mp1), FT-P-MapPush (Mp3), NFT-RES-Mp2, NFT-RES-Mp4 | mounted on `missions-mock` once acquired | as above | | `mavlink-sitl-scripts` | scripted `ardupilot/sitl` scenarios (waypoint upload, geofence in/out, RTL on link loss, RTL on battery floor) | FT-P-WaypointInsert (O8), NFT-RES-R4, NFT-RES-R5, NFT-RES-R6, NFT-RES-R7, NFT-RES-R9 | run in `mavlink-sitl` via `--script` argument per scenario | SITL container restarted per scenario | | `operator-session-scripts` | scripted `(t, event)` traces — nominal, drop+reconnect, lost-link 30 s, sustained lost-link | FT-P-DecisionWindow (O1–O3, O4), FT-P-OperatorDecline (O5), FT-P-OperatorIgnoredSuppress (O6), FT-P-OperatorTimeout (O7), FT-P-OperatorConfirm (O8), NFT-RES-R4 | replayed by `operator-replay` per scenario | per-scenario | | `operator-envelopes` | `` | NFT-SEC-O9, NFT-SEC-O10, FT-P-OperatorConfirm (O8 happy path uses a default placeholder envelope) | replayed by `operator-replay` | per-scenario | | `vlm-io-pairs` | `` | NFT-PERF-L3, FT-P-DeepAnalysisHold (S5), NFT-SEC-VlmSchemaViolation | mounted on `vlm-mock` | per-scenario | | `gimbal-csv-pairs` | `` | FT-P-EgoMotion (M1), FT-P-MoveDuringHold (M2), FT-P-ThresholdEdge (M3), FT-P-MoveBenchmark (M4), NFT-PERF-L6, NFT-PERF-L7 | replayed by `gimbal-mock` synchronised to RTSP frame timestamps | per-scenario | | `tier1-replay-streams` | `` | FT-P-Tier1ContractIsolated (Tier B variant); Tier-E uses live `../detections` | served by `detections-mock` | per-scenario | | `time-drift-scripts` | scripted clock offsets (50 ms ramp, 250 ms jump, NTP loss, GPS unlock) | NFT-RES-R8 | injected by `time-injector` via faketime LD_PRELOAD shim | per-scenario | | `synthetic-poi-feeds` | inline-authorable: confidence={0.39, 0.40, 0.70, 1.00}, ordering-test feed, sustained-rate feed >5 POI/min | FT-P-DecisionWindow (O1–O4), FT-P-POIOrdering (S4), NFT-PERF-T1 | authored in Rust under `e2e/consumer/fixtures/synthetic_poi/`; pumped into the SUT by injecting recorded `Detections` into `detections-mock` | n/a (in-memory) | | `bit-scenarios` | inline-authorable: every-dep-green, tier1-unreachable, storage-95pct-full | FT-P-BitPass (R1), NFT-RES-R2, NFT-RES-R3 | manipulated by toggling mock services up/down + `autopilot-state` volume seed file | volume seed file removed | ## Data isolation strategy - **Per scenario, fresh containers.** Each scenario starts with `docker compose down -v && docker compose up -d` (the `e2e-consumer` orchestrates this via `testcontainers-rs`). No state leaks between scenarios. - **`autopilot-state` volume** is named per `(test_id, run_id)` so parallel scenario runs do not collide. - **Deterministic seeds.** Every randomness source in the SUT (POI age-factor tie-breaking, retry jitter, replay-window nonce window) is configured to a per-scenario seed via env vars (`AUTOPILOT_RNG_SEED=`). The seed is captured in the CSV report. - **Wall-clock control.** Scenarios that depend on absolute time (NFT-RES-R8, NFT-RES-R4 grace window, FT-P-DecisionWindow timeouts) use `time-injector` (faketime LD_PRELOAD). The SUT's `time.now()` calls are intercepted; GPS-source state is set via the `mavlink-sitl` GLOBAL_POSITION_INT message stream. - **Network determinism.** All inter-service traffic stays on the `autopilot-e2e` Docker network (no internet egress). Latency injection (for L9 modem RTT exclusion checks) uses `tc qdisc` inside the `operator-replay` container. - **No shared mocks between scenarios.** Even when two scenarios use the same fixture, each gets its own mock container instance — this avoids stale state in `missions-mock`'s POST-buffer or `gimbal-mock`'s last-command cache. ## Input data mapping (fixtures → scenarios) This is the **fixture-side index**; the scenario-side index is in each `*-tests.md` file's `Input data` field. | Input data file | Source location | Description | Covers scenarios | |---|---|---|---| | `fixtures/images/4d6e1830d211ad50.jpg` | `_docs/00_problem/input_data/fixtures/images/` | Aerial frame, 1280 px input | FT-P-Tier1Contract (D6), NFT-PERF-L1, NFT-PERF-L2 | | `fixtures/images/{54f6...,6dd6...,805b...,f997...}.jpg` | same dir | 4 additional aerial frames for existing-class regression | FT-P-DetectExisting (D2) | | `fixtures/videos/94d42580bd1ad6ff.mp4` | same dir | Reconnaissance clip, 30 fps; consumer throttles to drop below 10 fps for ≥5 s | NFT-PERF-T3 | | `fixtures/movement/video01.mp4` | same dir | Wide-area movement clip (visual reference only) | FT-P-EgoMotion (M1) [DEFERRED — needs gimbal.csv] | | `fixtures/movement/video02.mp4` | same dir | Wide-area movement clip (visual reference only) | FT-P-MoveDuringHold (M2) [DEFERRED — needs zoomed-in gimbal.csv] | | `fixtures/movement/video03.mp4` | same dir | Wide-area movement clip (visual reference only) | FT-P-ThresholdEdge (M3) [DEFERRED — needs threshold-edge gimbal.csv] | | `fixtures/movement/video04.mp4` | same dir | Wide-area movement clip (visual reference only) | FT-P-MoveBenchmark (M4) [DEFERRED — needs zoom-band benchmark CSV] | | `fixtures/semantic/semantic01.png` | same dir | Winter concealed-position reference (starter only) | FT-P-ConcealStarter (D3, D4), FT-P-FootpathStarter (D5) [DEFERRED — needs annotated multi-season set] | | `fixtures/semantic/semantic0[2-4].png` | same dir | 3 unmarked-season concealed-position references | as above | | `fixtures/schemas/expected_detections.json` | same dir | Reference output for D6 | FT-P-Tier1Contract (D6), FT-P-NormalisedBoxes | | `fixtures/schemas/expected_detections.schema.json` | same dir | Schema for normalised-box output | FT-P-NormalisedBoxes, NFT-SEC-Tier1SchemaViolation | | `fixtures/sql/init.sql` | same dir | (suite-only — recorded for traceability) | none | ## Expected results mapping (scenario → comparison row) Every scenario in `*-tests.md` traces to a row id in `_docs/00_problem/input_data/expected_results/results_report.md`. The comparison method + tolerance is owned by that row — this table is the **scenario-side index** so a reader can navigate from a test to its assertion contract. | Scenario ID | Input data | Expected result row | Comparison method | Tolerance | Source | |---|---|---|---|---|---| | FT-P-Tier1Contract | `image-set-existing` (1 frame) | `D6` | `schema_match` + `range` | each coord ∈ [0,1] | `fixtures/schemas/expected_detections.schema.json` | | FT-P-DetectExisting | `image-set-existing` (5 frames) | `D2` | `numeric_tolerance` | ± 0.02 (P, R) | `` | | FT-P-DetectNew | `` | `D1` | `threshold_min` | P ≥ 0.80 AND R ≥ 0.80 | `` | | FT-P-ConcealRecall | `image-semantic-starter` + `` | `D3` | `threshold_min` | recall ≥ 0.60 | `` | | FT-P-ConcealPrecision | same | `D4` | `threshold_min` | precision ≥ 0.20 | same | | FT-P-FootpathRecall | `image-semantic-starter` + `` | `D5` | `threshold_min` | recall ≥ 0.70 | `` | | NFT-PERF-L1 | `image-set-existing` (1 frame) | `L1` | `threshold_max` | ≤ 100 ms | inline | | NFT-PERF-L2 | derived ROI from same | `L2` | `threshold_max` | ≤ 200 ms | inline | | NFT-PERF-L3 | `vlm-io-pairs` | `L3` | `threshold_max` | ≤ 5000 ms | inline | | NFT-PERF-L4 | `` | `L4` | `threshold_max` | ≤ 2000 ms | inline | | NFT-PERF-L5 | `` | `L5` | `threshold_max` | ≤ 500 ms | inline | | NFT-PERF-L6 | `video-movement` (visual ref) + `` | `L6` | `threshold_max` | ≤ 1000 ms | inline | | NFT-PERF-L7 | `video-movement` + `` | `L7` | `threshold_max` | ≤ 1500 ms | inline | | NFT-PERF-L8 | `` | `L8` | `threshold_max` | ≤ 2000 ms | inline | | NFT-PERF-L9 | `` | `L9` | `threshold_max` | ≤ 500 ms | inline | | NFT-PERF-T1 | `synthetic-poi-feeds` (sustained > cap) | `T1` | `threshold_max` | ≤ 5 / min | inline | | NFT-PERF-T2 | `` | `T2` | `range` | 1 Hz ≤ r ≤ 10 Hz | inline | | NFT-PERF-T3 | `video-recon` (throttled) | `T3` | `exact` × 2 | suppression bool + health=yellow | inline | | FT-P-EgoMotion (M1) | `video-movement/video01.mp4` + `` | `M1` | `set_contains` | candidate set == {vehicle}; ∉ tree row | inline | | FT-P-MoveDuringHold (M2) | `video02.mp4` + `` | `M2` | `exact` | 1 candidate; preempt per priority rule | inline | | FT-P-ThresholdEdge (M3) | `video03.mp4` + `` | `M3` | `exact` | count == 0 | inline | | FT-P-MoveBenchmark (M4) | `video04.mp4` + `` | `M4` | `threshold_max` | per-zoom-band FP rate budget | `` | | FT-P-SweepToZoom (S1) | `` | `S1` | `exact` × 3 | transition + ROI + queue+=1 | inline | | FT-P-FootpathPan (S2) | `` | `S2` | `numeric_tolerance` | centre offset ≤ 25% per frame | inline | | FT-P-TargetFollow (S3) | `` | `S3` | `threshold_max` | per-frame |dx,dy| ≤ 0.125 | inline | | FT-P-POIOrdering (S4) | `synthetic-poi-feeds` (ordering test) | `S4` | `exact (order)` | ordering matches `conf × prox × age` | inline | | FT-P-DeepAnalysisHold (S5) | `` | `S5` | `exact` | hold = min(5 s, vlm_complete) | inline | | FT-P-DecisionWindow30s (O1) | `synthetic-poi-feeds` (conf=0.40) | `O1` | `exact` | window = 30 s | inline | | FT-P-DecisionWindow120s (O2) | conf=1.00 | `O2` | `exact` | window = 120 s | inline | | FT-P-DecisionWindow75s (O3) | conf=0.70 | `O3` | `numeric_tolerance` | window ≈ 75 s ± 0.5 s | inline | | FT-N-BelowThreshold (O4) | conf=0.39 | `O4` | `exact` | not surfaced | inline | | FT-P-OperatorDecline (O5) | `operator-session-scripts` (nominal + decline) | `O5` | `exact (count Δ+1)` + `schema_match` | ignored-item appended | inline | | FT-P-IgnoredSuppress (O6) | matching MGRS + class_group | `O6` | `exact` | not surfaced | inline | | FT-P-OperatorTimeout (O7) | no-response + > window | `O7` | `exact` × 2 | queue −1; ignored unchanged | inline | | FT-P-OperatorConfirm (O8) | `operator-envelopes` (valid happy path) | `O8` | `exact (HTTP 200)` + `exact (mode)` | mission POST + target-follow | inline | | NFT-SEC-O9 | `operator-envelopes` (replayed) | `O9` | `exact` + `substring` | state unchanged; log contains "replay" | inline | | NFT-SEC-O10 | `operator-envelopes` (malformed/unsigned) | `O10` | `exact` + `substring` | state unchanged; log contains "invalid" | inline | | FT-P-BitPass (R1) | `bit-scenarios` (every dep green) | `R1` | `exact` × 2 | takeoff permitted + health all green | inline | | FT-N-BitDetectionDown (R2) | tier1 unreachable | `R2` | `exact` | takeoff inhibited + detection red | inline | | FT-N-BitStorageFull (R3) | storage ≥ 95 % | `R3` | `exact` | takeoff inhibited + storage red | inline | | NFT-RES-R4 | `operator-session-scripts` (sustained lost-link) | `R4` | `exact (RTL at 30 s ± 1 s)` | RTL command + operator-link red | inline | | NFT-RES-R5 | `mavlink-sitl-scripts` (battery at RTL-floor) | `R5` | `exact` × 2 | RTL + health yellow | inline | | NFT-RES-R6 | battery at hard-floor | `R6` | `exact` | land-now | inline | | NFT-RES-R7 | `mavlink-sitl-scripts` (no-response retry exhaustion) | `R7` | `exact` | health red after max-retry | inline | | NFT-RES-R8 | `time-drift-scripts` (250 ms drift) | `R8` | `exact` | time-source yellow + clock_source/last_sync_at updated | inline | | NFT-RES-R9 | `mavlink-sitl-scripts` (EXCLUSION cross) | `R9` | `exact` × 2 | waypoint rejected + RTL | inline | | NFT-RES-LIM-Re1 | `` | `Re1` | `threshold_max` | combined RSS ≤ 6 GB | inline | | NFT-RES-LIM-Re2 | Re1 + concurrent Tier-1 traffic | `Re2` | `numeric_tolerance` | Tier-1 ms/frame Δ ± 5 ms | inline | | FT-P-MapPull (Mp1) | `` | `Mp1` | `threshold_max` | ≤ 30 s | inline | | NFT-RES-Mp2 | mock unreachable | `Mp2` | `exact` × 2 | cached_fallback + BIT requires ack | inline | | FT-P-MapPush (Mp3) | `` | `Mp3` | `threshold_max` | ≤ 120 s | inline | | NFT-RES-Mp4 | POST returns 5xx | `Mp4` | `exact` × 2 + `threshold_max` | file exists + warning + retries ≤ cap | inline | | FT-P-MapConflict (Mp5) | `` | `Mp5` | `json_diff` | conflict resolution per Q8 | `` | ## External dependency mocks (Index-only; per-mock acquisition status owned by `services.md`.) | External service | Mock/stub | How provided | Behavior | |---|---|---|---| | `../detections` Tier-1 RPC | `detections-mock` (gRPC bi-stream) | Docker container; serves `.replay` files | Returns recorded `Detections` byte-stream for the input frame's hash; serves a 19-class catalogue (0..18) deterministically; supports schema-violation injection for NFT-SEC tests | | `missions` API | `missions-mock` (HTTPS FastAPI) | Docker container; TLS via self-signed test cert | Static JSON for `GET /missions/{id}`, `GET /missions/{id}/mapobjects`; records POST bodies for assertion; can be configured to return 5xx for NFT-RES-Mp4 | | ViewPro A40 RTSP | `rtsp-loopback` (mediamtx) | Docker container | Plays back `.mp4` at scheduled fps with frame-drop injection (T3) | | ViewPro A40 gimbal | `gimbal-mock` (Rust UDP) | Docker container | Replays `gimbal.csv` synchronised to RTSP frame timestamps; echoes received commands with bounded latency budget | | ArduPilot | `mavlink-sitl` (official ardupilot/ardupilot-sitl image) | Docker container | Deterministic SITL run from a scripted mission file | | Ground Station modem | `operator-replay` (Python) | Docker container | Replays `(t, event)` script per scenario; signs envelopes per Q9 once resolved | | Local VLM | `vlm-mock` (Python over UDS) | Docker container; UDS shared via `/tmp` volume | Returns paired `VlmAssessment` JSON; can return schema-violation responses for NFT-SEC tests | | Wall-clock / GPS / NTP | `time-injector` (Rust) | LD_PRELOAD faketime shim into the SUT container at start | Scripted offset/jump/source-loss | ## Data validation rules | Data Type | Validation | Invalid Examples | Expected System Behaviour | |---|---|---|---| | Mission JSON | `mission-schema` (shared with `missions` repo) | missing required field; coord out of [-180, 180]; unknown enum value | system refuses; mission-state stays at last-known; health flips mission-config-source = yellow; structured-log at WARN with `schema_violation_field` | | Map-object record | suite-level mapobjects schema | non-finite coordinate; class_group not in catalogue; missing MGRS | record dropped; counter `mapobjects_rejected_total` increments; structured-log at WARN | | Tier-1 `Detections` stream | `expected_detections.schema.json` (normalised-box) | bbox coord ∉ [0, 1]; confidence ∉ [0, 1]; class_id ∉ {0..18} | frame's detections dropped (not partially used); `tier1_invalid_frame_total` increments; per AC D6 the system must surface a structured WARN | | MAVLink message | MAVLink v2 dialect (per ArduPilot) | unknown MSG_ID; CRC mismatch; (if Q6 resolves to "signing on") missing signature | message dropped; if signing required and missing → security WARN; airframe-link health unaffected for individual drops | | Operator command envelope | Q9 scheme (TBD) | replay (sequence_id seen recently); signature invalid; timestamp outside replay window | rejected at the boundary; no state mutation; security WARN with reason code; counters `operator_cmd_rejected_replay_total`, `..._signature_total`, `..._expired_total` | | VLM `VlmAssessment` response | structured assessment schema | missing required field; wrong type; truncated JSON | fail-closed: assessment discarded; POI does NOT get the deep-analysis upgrade; structured WARN | | RTSP frame | container-level decode | malformed H.264/265 NAL; oversized SPS | frame dropped; `frame_decode_error_total` increments; if rate falls below 10 fps for ≥5 s → T3 path triggers (zoom-in suppressed + health yellow) | | Camera frame size | bounded crop policy (security_approach §Bounded input) | crop > configured max bytes; format not in allow-list | rejected at boundary; security WARN | | Time source | wall-clock binding | GPS unlocked AND no NTP sync at boot | clock_source = `none`; health red until either source available | ## Deferred-fixture bridge (replay obligation) Every `` row above maps 1-to-1 to an entry in `_docs/_process_leftovers/2026-05-19_autopilot_test_fixtures.md → "What is needed before /autodev can resume"` table. On every `/autodev` invocation, the leftovers step must re-evaluate whether any deferred fixture has landed; once landed, the corresponding scenario(s) become unblocked and their `Test status` line in the matching `*-tests.md` file moves from `DEFERRED — input fixture not yet acquired` to `READY`. Inline-authorable categories (10 and 11 in the leftover) — `synthetic-poi-feeds`, `time-drift-scripts`, `operator-session-scripts`, `bit-scenarios` — are NOT marked `` in this file because they have no external dependency. They are authored by Phase 4's `e2e/consumer/fixtures/` generators when the runner scripts come online.