Test Data Management
Authored by /test-spec Phase 2 (2026-05-19). Owns the mapping from fixtures to tests, mock data shapes, isolation strategy, and the deferred-fixture inventory bridge.
- Per-row input-to-expected-result binding lives in _docs/00_problem/input_data/expected_results/results_report.md; this file references it but never duplicates it.
- Fixture manifest (SHA-pinned files + provenance) lives in _docs/00_problem/input_data/fixtures/README.md.
- Per-service mock catalogue (what shape each mock returns) lives in _docs/00_problem/input_data/services.md.
- Deferred fixture inventory + replay obligation lives in _docs/_process_leftovers/2026-05-19_autopilot_test_fixtures.md.
Seed data sets
| Data Set | Description | Used by Tests | How Loaded | Cleanup |
|---|---|---|---|---|
| image-set-existing | fixtures/images/{4d6e1830d211ad50,54f6459dbddb93d8,6dd601b7d2dc1b30,805bcf1e9f271a58,f997d0934726b555}.jpg (5 aerial frames) | FT-P-Tier1Contract, NFT-PERF-L1, NFT-PERF-L2, FT-P-DetectExisting | mounted read-only via fixtures-ro:/fixtures on rtsp-loopback (encoded to .mp4 clip) and on detections-mock (paired with expected_detections.json per frame) | volume detached on container teardown |
| video-recon | fixtures/videos/94d42580bd1ad6ff.mp4 | NFT-PERF-T3 | mounted read-only on rtsp-loopback; consumer requests stream at 30 fps then throttles decode + drops frames per scenario script | as above |
| video-movement | fixtures/movement/video0[1-4].mp4 (4 wide-area clips) | FT-P-MoveStarter (visual reference only), FT-P-MoveBenchmark (deferred) | mounted on rtsp-loopback; played at 30 fps; consumer schedules which clip per scenario | as above |
| image-semantic-starter | fixtures/semantic/semantic0[1-4].png (1 winter + 3 unmarked season) | FT-P-ConcealStarter, FT-P-FootpathStarter (visual reference only; assertion semantics deferred) | mounted on detections-mock and rtsp-loopback as a single-frame loop | as above |
| schemas-detection | fixtures/schemas/expected_detections.{json,schema.json} | FT-P-Tier1Contract, FT-P-NormalisedBoxes (D6) | mounted on e2e-consumer:/expected:ro | as above |
| sql-init-suite | fixtures/sql/init.sql | NOT USED by autopilot tests (suite-only artefact; recorded here for traceability) | n/a | n/a |
| mission-suite-fixture | <DEFERRED: missions_fixtures/mission_30x30km.json + mapobjects_10k.json; services.md §2> | FT-P-MissionStart, FT-P-MapPull (Mp1), FT-P-MapPush (Mp3), NFT-RES-Mp2, NFT-RES-Mp4 | mounted on missions-mock once acquired | as above |
| mavlink-sitl-scripts | scripted ardupilot/sitl scenarios (waypoint upload, geofence in/out, RTL on link loss, RTL on battery floor) | FT-P-WaypointInsert (O8), NFT-RES-R4, NFT-RES-R5, NFT-RES-R6, NFT-RES-R7, NFT-RES-R9 | run in mavlink-sitl via --script argument per scenario | SITL container restarted per scenario |
| operator-session-scripts | scripted (t, event) traces: nominal, drop+reconnect, lost-link 30 s, sustained lost-link | FT-P-DecisionWindow (O1–O3, O4), FT-P-OperatorDecline (O5), FT-P-OperatorIgnoredSuppress (O6), FT-P-OperatorTimeout (O7), FT-P-OperatorConfirm (O8), NFT-RES-R4 | replayed by operator-replay per scenario | per-scenario |
| operator-envelopes | <DEFERRED: operator_envelopes/{valid,replayed,malformed,unsigned,expired}.bin; services.md §8 (Q9-blocked)> | NFT-SEC-O9, NFT-SEC-O10, FT-P-OperatorConfirm (O8 happy path uses a default placeholder envelope) | replayed by operator-replay | per-scenario |
| vlm-io-pairs | <DEFERRED: vlm_io_pairs/{roi,prompt,response}.* + schema-violation cases; services.md §7> | NFT-PERF-L3, FT-P-DeepAnalysisHold (S5), NFT-SEC-VlmSchemaViolation | mounted on vlm-mock | per-scenario |
| gimbal-csv-pairs | <DEFERRED: gimbal_csv/video0[1-4].csv paired with movement videos at zoomed-in band + threshold-edge cluster; services.md §6> | FT-P-EgoMotion (M1), FT-P-MoveDuringHold (M2), FT-P-ThresholdEdge (M3), FT-P-MoveBenchmark (M4), NFT-PERF-L6, NFT-PERF-L7 | replayed by gimbal-mock synchronised to RTSP frame timestamps | per-scenario |
| tier1-replay-streams | <DEFERRED: tier1_replay/*.replay; services.md §1> | FT-P-Tier1ContractIsolated (Tier B variant); Tier-E uses live ../detections | served by detections-mock | per-scenario |
| time-drift-scripts | scripted clock offsets (50 ms ramp, 250 ms jump, NTP loss, GPS unlock) | NFT-RES-R8 | injected by time-injector via faketime LD_PRELOAD shim | per-scenario |
| synthetic-poi-feeds | inline-authorable: confidence={0.39, 0.40, 0.70, 1.00}, ordering-test feed, sustained-rate feed >5 POI/min | FT-P-DecisionWindow (O1–O4), FT-P-POIOrdering (S4), NFT-PERF-T1 | authored in Rust under e2e/consumer/fixtures/synthetic_poi/; pumped into the SUT by injecting recorded Detections into detections-mock | n/a (in-memory) |
| bit-scenarios | inline-authorable: every-dep-green, tier1-unreachable, storage-95pct-full | FT-P-BitPass (R1), NFT-RES-R2, NFT-RES-R3 | manipulated by toggling mock services up/down + autopilot-state volume seed file | volume seed file removed |
Data isolation strategy
- Per scenario, fresh containers. Each scenario starts with docker compose down -v && docker compose up -d (the e2e-consumer orchestrates this via testcontainers-rs). No state leaks between scenarios. The autopilot-state volume is named per (test_id, run_id) so parallel scenario runs do not collide.
- Deterministic seeds. Every randomness source in the SUT (POI age-factor tie-breaking, retry jitter, replay-window nonce window) is configured to a per-scenario seed via env vars (AUTOPILOT_RNG_SEED=<test_id>). The seed is captured in the CSV report.
- Wall-clock control. Scenarios that depend on absolute time (NFT-RES-R8, NFT-RES-R4 grace window, FT-P-DecisionWindow timeouts) use time-injector (faketime LD_PRELOAD). The SUT's time.now() calls are intercepted; GPS-source state is set via the mavlink-sitl GLOBAL_POSITION_INT message stream.
- Network determinism. All inter-service traffic stays on the autopilot-e2e Docker network (no internet egress). Latency injection (for L9 modem RTT exclusion checks) uses tc qdisc inside the operator-replay container.
- No shared mocks between scenarios. Even when two scenarios use the same fixture, each gets its own mock container instance; this avoids stale state in missions-mock's POST-buffer or gimbal-mock's last-command cache.
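
The naming and seeding rules above can be sketched in a few lines of Rust. This is a minimal illustration, not the harness's actual code: the function names, the autopilot-state- prefix layout, and the sanitising rule are assumptions. Only the AUTOPILOT_RNG_SEED variable and the (test_id, run_id) keying come from this document.

```rust
use std::collections::BTreeMap;

/// Keeps ASCII alphanumerics plus '_', '.', '-' and maps every other
/// character to '_'. Hypothetical rule, chosen to stay inside the
/// character set commonly accepted for Docker volume names.
fn sanitise(part: &str) -> String {
    part.chars()
        .map(|c| if c.is_ascii_alphanumeric() || matches!(c, '_' | '.' | '-') { c } else { '_' })
        .collect()
}

/// Builds a volume name keyed by (test_id, run_id) so that two runs of the
/// same test get different volumes. The prefix layout is an assumption.
pub fn state_volume_name(test_id: &str, run_id: &str) -> String {
    format!("autopilot-state-{}-{}", sanitise(test_id), sanitise(run_id))
}

/// Environment handed to the SUT container for one scenario.
/// The seed value is the test id itself, as in AUTOPILOT_RNG_SEED=<test_id>.
pub fn scenario_env(test_id: &str) -> BTreeMap<String, String> {
    let mut env = BTreeMap::new();
    env.insert("AUTOPILOT_RNG_SEED".to_string(), test_id.to_string());
    env
}
```

Because the seed is a pure function of the test id, re-running a scenario reproduces the same tie-breaking and jitter, while the run id only affects which volume is used.
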
Input data mapping (fixtures → scenarios)
This is the fixture-side index; the scenario-side index is in each *-tests.md file's Input data field.
| Input data file | Source location | Description | Covers scenarios |
|---|---|---|---|
| fixtures/images/4d6e1830d211ad50.jpg | _docs/00_problem/input_data/fixtures/images/ | Aerial frame, 1280 px input | FT-P-Tier1Contract (D6), NFT-PERF-L1, NFT-PERF-L2 |
| fixtures/images/{54f6...,6dd6...,805b...,f997...}.jpg | same dir | 4 additional aerial frames for existing-class regression | FT-P-DetectExisting (D2) |
| fixtures/videos/94d42580bd1ad6ff.mp4 | same dir | Reconnaissance clip, 30 fps; consumer throttles to drop below 10 fps for ≥5 s | NFT-PERF-T3 |
| fixtures/movement/video01.mp4 | same dir | Wide-area movement clip (visual reference only) | FT-P-EgoMotion (M1) [DEFERRED: needs gimbal.csv] |
| fixtures/movement/video02.mp4 | same dir | Wide-area movement clip (visual reference only) | FT-P-MoveDuringHold (M2) [DEFERRED: needs zoomed-in gimbal.csv] |
| fixtures/movement/video03.mp4 | same dir | Wide-area movement clip (visual reference only) | FT-P-ThresholdEdge (M3) [DEFERRED: needs threshold-edge gimbal.csv] |
| fixtures/movement/video04.mp4 | same dir | Wide-area movement clip (visual reference only) | FT-P-MoveBenchmark (M4) [DEFERRED: needs zoom-band benchmark CSV] |
| fixtures/semantic/semantic01.png | same dir | Winter concealed-position reference (starter only) | FT-P-ConcealStarter (D3, D4), FT-P-FootpathStarter (D5) [DEFERRED: needs annotated multi-season set] |
| fixtures/semantic/semantic0[2-4].png | same dir | 3 unmarked-season concealed-position references | as above |
| fixtures/schemas/expected_detections.json | same dir | Reference output for D6 | FT-P-Tier1Contract (D6), FT-P-NormalisedBoxes |
| fixtures/schemas/expected_detections.schema.json | same dir | Schema for normalised-box output | FT-P-NormalisedBoxes, NFT-SEC-Tier1SchemaViolation |
| fixtures/sql/init.sql | same dir | (suite-only; recorded for traceability) | none |
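
The NFT-PERF-T3 row above relies on the consumer throttling the reconnaissance clip below 10 fps for at least 5 s. A harness might want to confirm that a throttled frame trace actually meets that precondition before asserting on the SUT. The sketch below is one way to do so, assuming the rate is measured over whole one-second buckets; this document does not say how the rate is measured, and both function names are hypothetical.

```rust
/// Returns true when at least `min_secs` consecutive one-second buckets
/// each contain fewer than `min_fps` frames.
/// ASSUMPTION: tumbling one-second buckets; timestamps are milliseconds
/// from the start of the clip.
pub fn sustained_low_rate(
    frame_ts_ms: &[u64],
    duration_ms: u64,
    min_fps: usize,
    min_secs: usize,
) -> bool {
    let buckets = (duration_ms / 1000) as usize;
    let mut counts = vec![0usize; buckets];
    for &t in frame_ts_ms {
        let b = (t / 1000) as usize;
        if b < buckets {
            counts[b] += 1;
        }
    }
    let mut run = 0usize;
    for &c in &counts {
        if c < min_fps {
            run += 1;
            if run >= min_secs {
                return true;
            }
        } else {
            run = 0;
        }
    }
    false
}

/// Hypothetical helper: evenly spaced timestamps at roughly `fps` frames per
/// second over [start_ms, end_ms). `fps` must be between 1 and 1000.
pub fn frames_at(fps: u64, start_ms: u64, end_ms: u64) -> Vec<u64> {
    let step = (1000 / fps) as usize; // integer milliseconds between frames
    (start_ms..end_ms).step_by(step).collect()
}
```

A trace that plays at 30 fps, drops to 8 fps for six seconds, then recovers satisfies the check with min_fps = 10 and min_secs = 5; the same trace with only four slow seconds does not.
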
Expected results mapping (scenario → comparison row)
Every scenario in *-tests.md traces to a row id in _docs/00_problem/input_data/expected_results/results_report.md. The comparison method + tolerance is owned by that row — this table is the scenario-side index so a reader can navigate from a test to its assertion contract.
| Scenario ID | Input data | Expected result row | Comparison method | Tolerance | Source |
|---|---|---|---|---|---|
| FT-P-Tier1Contract | image-set-existing (1 frame) | D6 | schema_match + range | each coord ∈ [0,1] | fixtures/schemas/expected_detections.schema.json |
| FT-P-DetectExisting | image-set-existing (5 frames) | D2 | numeric_tolerance | ± 0.02 (P, R) | <DEFERRED: expected_results/existing_classes_baseline.json> |
| FT-P-DetectNew | <DEFERRED: new-class eval set> | D1 | threshold_min | P ≥ 0.80 AND R ≥ 0.80 | <DEFERRED: expected_results/new_classes_pr.json> |
| FT-P-ConcealRecall | image-semantic-starter + <DEFERRED: full set> | D3 | threshold_min | recall ≥ 0.60 | <DEFERRED: expected_results/concealed_positions.json> |
| FT-P-ConcealPrecision | same | D4 | threshold_min | precision ≥ 0.20 | same |
| FT-P-FootpathRecall | image-semantic-starter + <DEFERRED> | D5 | threshold_min | recall ≥ 0.70 | <DEFERRED: expected_results/footpaths.json> |
| NFT-PERF-L1 | image-set-existing (1 frame) | L1 | threshold_max | ≤ 100 ms | inline |
| NFT-PERF-L2 | derived ROI from same | L2 | threshold_max | ≤ 200 ms | inline |
| NFT-PERF-L3 | vlm-io-pairs | L3 | threshold_max | ≤ 5000 ms | inline |
| NFT-PERF-L4 | <DEFERRED: SITL or HW zoom-cmd capture> | L4 | threshold_max | ≤ 2000 ms | inline |
| NFT-PERF-L5 | <DEFERRED: scripted scan→movement> | L5 | threshold_max | ≤ 500 ms | inline |
| NFT-PERF-L6 | video-movement (visual ref) + <DEFERRED gimbal.csv> | L6 | threshold_max | ≤ 1000 ms | inline |
| NFT-PERF-L7 | video-movement + <DEFERRED zoomed-in gimbal.csv> | L7 | threshold_max | ≤ 1500 ms | inline |
| NFT-PERF-L8 | <DEFERRED: sweep→zoomed transition capture> | L8 | threshold_max | ≤ 2000 ms | inline |
| NFT-PERF-L9 | <DEFERRED: operator-click → outbound> | L9 | threshold_max | ≤ 500 ms | inline |
| NFT-PERF-T1 | synthetic-poi-feeds (sustained > cap) | T1 | threshold_max | ≤ 5 / min | inline |
| NFT-PERF-T2 | <DEFERRED: MAVLink replay 60 s> | T2 | range | 1 Hz ≤ r ≤ 10 Hz | inline |
| NFT-PERF-T3 | video-recon (throttled) | T3 | exact × 2 | suppression bool + health=yellow | inline |
| FT-P-EgoMotion (M1) | video-movement/video01.mp4 + <DEFERRED gimbal.csv + telemetry.csv> | M1 | set_contains | candidate set == {vehicle}; ∉ tree row | inline |
| FT-P-MoveDuringHold (M2) | video02.mp4 + <DEFERRED zoomed-in CSV pair> | M2 | exact | 1 candidate; preempt per priority rule | inline |
| FT-P-ThresholdEdge (M3) | video03.mp4 + <DEFERRED threshold-edge CSV> | M3 | exact | count == 0 | inline |
| FT-P-MoveBenchmark (M4) | video04.mp4 + <DEFERRED benchmark suite> | M4 | threshold_max | per-zoom-band FP rate budget | <DEFERRED: expected_results/movement_benchmark_caps.json> |
| FT-P-SweepToZoom (S1) | <DEFERRED scripted mission + POI> | S1 | exact × 3 | transition + ROI + queue+=1 | inline |
| FT-P-FootpathPan (S2) | <DEFERRED hold + footpath polyline> | S2 | numeric_tolerance | centre offset ≤ 25% per frame | inline |
| FT-P-TargetFollow (S3) | <DEFERRED confirmed target> | S3 | threshold_max | per-frame | dx,dy |
| FT-P-POIOrdering (S4) | synthetic-poi-feeds (ordering test) | S4 | exact (order) | ordering matches conf × prox × age | inline |
| FT-P-DeepAnalysisHold (S5) | <DEFERRED VLM-enabled hold> | S5 | exact | hold = min(5 s, vlm_complete) | inline |
| FT-P-DecisionWindow30s (O1) | synthetic-poi-feeds (conf=0.40) | O1 | exact | window = 30 s | inline |
| FT-P-DecisionWindow120s (O2) | conf=1.00 | O2 | exact | window = 120 s | inline |
| FT-P-DecisionWindow75s (O3) | conf=0.70 | O3 | numeric_tolerance | window ≈ 75 s ± 0.5 s | inline |
| FT-N-BelowThreshold (O4) | conf=0.39 | O4 | exact | not surfaced | inline |
| FT-P-OperatorDecline (O5) | operator-session-scripts (nominal + decline) | O5 | exact (count Δ+1) + schema_match | ignored-item appended | inline |
| FT-P-IgnoredSuppress (O6) | matching MGRS + class_group | O6 | exact | not surfaced | inline |
| FT-P-OperatorTimeout (O7) | no-response + > window | O7 | exact × 2 | queue −1; ignored unchanged | inline |
| FT-P-OperatorConfirm (O8) | operator-envelopes (valid happy path) | O8 | exact (HTTP 200) + exact (mode) | mission POST + target-follow | inline |
| NFT-SEC-O9 | operator-envelopes (replayed) | O9 | exact + substring | state unchanged; log contains "replay" | inline |
| NFT-SEC-O10 | operator-envelopes (malformed/unsigned) | O10 | exact + substring | state unchanged; log contains "invalid" | inline |
| FT-P-BitPass (R1) | bit-scenarios (every dep green) | R1 | exact × 2 | takeoff permitted + health all green | inline |
| FT-N-BitDetectionDown (R2) | tier1 unreachable | R2 | exact | takeoff inhibited + detection red | inline |
| FT-N-BitStorageFull (R3) | storage ≥ 95 % | R3 | exact | takeoff inhibited + storage red | inline |
| NFT-RES-R4 | operator-session-scripts (sustained lost-link) | R4 | exact (RTL at 30 s ± 1 s) | RTL command + operator-link red | inline |
| NFT-RES-R5 | mavlink-sitl-scripts (battery at RTL-floor) | R5 | exact × 2 | RTL + health yellow | inline |
| NFT-RES-R6 | battery at hard-floor | R6 | exact | land-now | inline |
| NFT-RES-R7 | mavlink-sitl-scripts (no-response retry exhaustion) | R7 | exact | health red after max-retry | inline |
| NFT-RES-R8 | time-drift-scripts (250 ms drift) | R8 | exact | time-source yellow + clock_source/last_sync_at updated | inline |
| NFT-RES-R9 | mavlink-sitl-scripts (EXCLUSION cross) | R9 | exact × 2 | waypoint rejected + RTL | inline |
| NFT-RES-LIM-Re1 | <DEFERRED long-running RSS harness> | Re1 | threshold_max | combined RSS ≤ 6 GB | inline |
| NFT-RES-LIM-Re2 | Re1 + concurrent Tier-1 traffic | Re2 | numeric_tolerance | Tier-1 ms/frame Δ ± 5 ms | inline |
| FT-P-MapPull (Mp1) | <DEFERRED 30×30 km area + ~10k mapobjects> | Mp1 | threshold_max | ≤ 30 s | inline |
| NFT-RES-Mp2 | mock unreachable | Mp2 | exact × 2 | cached_fallback + BIT requires ack | inline |
| FT-P-MapPush (Mp3) | <DEFERRED 60 min diff> | Mp3 | threshold_max | ≤ 120 s | inline |
| NFT-RES-Mp4 | POST returns 5xx | Mp4 | exact × 2 + threshold_max | file exists + warning + retries ≤ cap | inline |
| FT-P-MapConflict (Mp5) | <DEFERRED conflict pair> | Mp5 | json_diff | conflict resolution per Q8 | <DEFERRED: expected_results/mapobjects_conflict_resolution.json> |
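
The numeric comparison methods named in the table (exact, threshold_max, threshold_min, numeric_tolerance, range) could be modelled as a small enum with a single check. This is a sketch under assumptions: the variant payloads and the passes method are hypothetical, and the non-numeric methods (schema_match, json_diff, set_contains, substring) are deliberately left out. The authoritative definitions remain in results_report.md.

```rust
/// Numeric comparison methods, named after the "Comparison method" column.
/// The payload layout is an assumption made for illustration.
#[derive(Debug, Clone, PartialEq)]
pub enum Comparison {
    /// Observed value must equal the expected value exactly.
    Exact(f64),
    /// Observed value must be at most the bound (e.g. a latency budget).
    ThresholdMax(f64),
    /// Observed value must be at least the bound (e.g. a recall floor).
    ThresholdMin(f64),
    /// Observed value must lie within `tol` of `expected`.
    NumericTolerance { expected: f64, tol: f64 },
    /// Observed value must lie in the closed interval [lo, hi].
    Range { lo: f64, hi: f64 },
}

impl Comparison {
    /// Returns true when `observed` satisfies this comparison.
    /// A NaN observation fails every variant.
    pub fn passes(&self, observed: f64) -> bool {
        match *self {
            Comparison::Exact(expected) => observed == expected,
            Comparison::ThresholdMax(max) => observed <= max,
            Comparison::ThresholdMin(min) => observed >= min,
            Comparison::NumericTolerance { expected, tol } => {
                (observed - expected).abs() <= tol
            }
            Comparison::Range { lo, hi } => lo <= observed && observed <= hi,
        }
    }
}
```

For example, the NFT-PERF-L1 row would be ThresholdMax(100.0) applied to a latency in milliseconds, and the O3 row would be NumericTolerance { expected: 75.0, tol: 0.5 } applied to a window in seconds.
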
External dependency mocks
(Index-only; per-mock acquisition status owned by services.md.)
| External service | Mock/stub | How provided | Behavior |
|---|---|---|---|
| ../detections Tier-1 RPC | detections-mock (gRPC bi-stream) | Docker container; serves .replay files | Returns recorded Detections byte-stream for the input frame's hash; serves a 19-class catalogue (0..18) deterministically; supports schema-violation injection for NFT-SEC tests |
| missions API | missions-mock (HTTPS FastAPI) | Docker container; TLS via self-signed test cert | Static JSON for GET /missions/{id}, GET /missions/{id}/mapobjects; records POST bodies for assertion; can be configured to return 5xx for NFT-RES-Mp4 |
| ViewPro A40 RTSP | rtsp-loopback (mediamtx) | Docker container | Plays back .mp4 at scheduled fps with frame-drop injection (T3) |
| ViewPro A40 gimbal | gimbal-mock (Rust UDP) | Docker container | Replays gimbal.csv synchronised to RTSP frame timestamps; echoes received commands with bounded latency budget |
| ArduPilot | mavlink-sitl (official ardupilot/ardupilot-sitl image) | Docker container | Deterministic SITL run from a scripted mission file |
| Ground Station modem | operator-replay (Python) | Docker container | Replays (t, event) script per scenario; signs envelopes per Q9 once resolved |
| Local VLM | vlm-mock (Python over UDS) | Docker container; UDS shared via /tmp volume | Returns paired VlmAssessment JSON; can return schema-violation responses for NFT-SEC tests |
| Wall-clock / GPS / NTP | time-injector (Rust) | LD_PRELOAD faketime shim into the SUT container at start | Scripted offset/jump/source-loss |
Data validation rules
| Data Type | Validation | Invalid Examples | Expected System Behaviour |
|---|---|---|---|
| Mission JSON | mission-schema (shared with missions repo) | missing required field; coord out of [-180, 180]; unknown enum value | system refuses; mission-state stays at last-known; health flips mission-config-source = yellow; structured-log at WARN with schema_violation_field |
| Map-object record | suite-level mapobjects schema | non-finite coordinate; class_group not in catalogue; missing MGRS | record dropped; counter mapobjects_rejected_total increments; structured-log at WARN |
| Tier-1 Detections stream | expected_detections.schema.json (normalised-box) | bbox coord ∉ [0, 1]; confidence ∉ [0, 1]; class_id ∉ {0..18} | frame's detections dropped (not partially used); tier1_invalid_frame_total increments; per AC D6 the system must surface a structured WARN |
| MAVLink message | MAVLink v2 dialect (per ArduPilot) | unknown MSG_ID; CRC mismatch; (if Q6 resolves to "signing on") missing signature | message dropped; if signing required and missing → security WARN; airframe-link health unaffected for individual drops |
| Operator command envelope | Q9 scheme (TBD) | replay (sequence_id seen recently); signature invalid; timestamp outside replay window | rejected at the boundary; no state mutation; security WARN with reason code; counters operator_cmd_rejected_replay_total, ..._signature_total, ..._expired_total |
| VLM VlmAssessment response | structured assessment schema | missing required field; wrong type; truncated JSON | fail-closed: assessment discarded; POI does NOT get the deep-analysis upgrade; structured WARN |
| RTSP frame | container-level decode | malformed H.264/265 NAL; oversized SPS | frame dropped; frame_decode_error_total increments; if rate falls below 10 fps for ≥5 s → T3 path triggers (zoom-in suppressed + health yellow) |
| Camera frame size | bounded crop policy (security_approach §Bounded input) | crop > configured max bytes; format not in allow-list | rejected at boundary; security WARN |
| Time source | wall-clock binding | GPS unlocked AND no NTP sync at boot | clock_source = none; health red until either source available |
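
The Tier-1 Detections row is an all-or-nothing rule: one invalid detection drops the whole frame and bumps tier1_invalid_frame_total. A minimal sketch of that gate follows. The Detection struct, its four-element bbox layout, and the Tier1Gate type are assumptions for illustration. The real shape is defined by expected_detections.schema.json, and the structured WARN is omitted here.

```rust
/// Hypothetical in-memory shape of one detection.
#[derive(Debug, Clone, PartialEq)]
pub struct Detection {
    pub class_id: u32,
    pub confidence: f32,
    /// Normalised box coordinates; the four-element layout is an assumption.
    pub bbox: [f32; 4],
}

/// True when `v` lies in [0, 1]. NaN is rejected because every
/// comparison against NaN is false.
fn in_unit(v: f32) -> bool {
    (0.0..=1.0).contains(&v)
}

/// Applies the three checks from the validation table:
/// class_id in 0..=18, confidence in [0, 1], every bbox coord in [0, 1].
pub fn detection_is_valid(d: &Detection) -> bool {
    d.class_id <= 18 && in_unit(d.confidence) && d.bbox.iter().all(|&c| in_unit(c))
}

/// Frame-level gate with a counter named after tier1_invalid_frame_total.
#[derive(Debug, Default)]
pub struct Tier1Gate {
    pub invalid_frame_total: u64,
}

impl Tier1Gate {
    /// Returns the detections only if all of them are valid. Otherwise the
    /// whole frame is dropped (never partially used) and the counter grows.
    pub fn admit(&mut self, detections: Vec<Detection>) -> Option<Vec<Detection>> {
        if detections.iter().all(detection_is_valid) {
            Some(detections)
        } else {
            self.invalid_frame_total += 1;
            None
        }
    }
}
```

Returning Option rather than a filtered list keeps the "not partially used" rule visible in the type: a caller cannot accidentally consume the valid half of a bad frame.
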
Deferred-fixture bridge (replay obligation)
Every <DEFERRED:> row above maps 1-to-1 to an entry in _docs/_process_leftovers/2026-05-19_autopilot_test_fixtures.md → "What is needed before /autodev can resume" table. On every /autodev invocation, the leftovers step must re-evaluate whether any deferred fixture has landed; once landed, the corresponding scenario(s) become unblocked and their Test status line in the matching *-tests.md file moves from DEFERRED — input fixture not yet acquired to READY.
Inline-authorable categories (10 and 11 in the leftover) — synthetic-poi-feeds, time-drift-scripts, operator-session-scripts, bit-scenarios — are NOT marked <DEFERRED:> in this file because they have no external dependency. They are authored by Phase 4's e2e/consumer/fixtures/ generators when the runner scripts come online.
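
As an illustration of what an inline-authorable synthetic-poi-feeds generator might contain, the sketch below pairs each synthetic confidence with the decision window the O1 to O4 rows expect, and produces a sustained-rate feed for NFT-PERF-T1. The linear interpolation between (0.40, 30 s) and (1.00, 120 s) is an assumption: it reproduces the three windows listed in the expected-results table (30 s, 75 s, 120 s) but is not stated anywhere in this document. Both function names are hypothetical.

```rust
/// Expected decision window in seconds for a POI confidence, or None when
/// the POI should not be surfaced.
/// ASSUMPTION: linear interpolation between (0.40, 30 s) and (1.00, 120 s);
/// confidences outside [0.40, 1.00] are treated as not surfaced.
pub fn decision_window_secs(confidence: f64) -> Option<f64> {
    const MIN_CONF: f64 = 0.40;
    const MAX_CONF: f64 = 1.00;
    if !(MIN_CONF..=MAX_CONF).contains(&confidence) {
        return None;
    }
    let t = (confidence - MIN_CONF) / (MAX_CONF - MIN_CONF);
    Some(30.0 + t * (120.0 - 30.0))
}

/// Emission times in milliseconds for a feed that produces `per_min` POIs
/// every minute, evenly spaced, for `minutes` minutes.
/// `per_min` must be at least 1.
pub fn sustained_rate_feed(per_min: u64, minutes: u64) -> Vec<u64> {
    let step_ms = 60_000 / per_min;
    (0..per_min * minutes).map(|i| i * step_ms).collect()
}
```

The mid-range case is computed in floating point, so a check against 75 s is better done with the ± 0.5 s tolerance from the O3 row than with exact equality. A feed built with per_min = 6 exceeds the 5 POI/min cap that NFT-PERF-T1 asserts on.
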