Oleksandr Bezdieniezhnykh bc40ea7300 [AZ-626] Decompose complete: 47 tasks + docs + module layout
Greenfield Steps 1-6 baseline for the autopilot rewrite from legacy
Qt/C++ to a Rust workspace.

- Remove legacy Qt/C++ tree (ai_controller, drone_controller,
  misc/camera, python_scaffold, root Dockerfile, autopilot.pro,
  legacy main.py / requirements.txt).
- Add _docs/00_problem (problem, restrictions, acceptance criteria,
  security approach, input data + fixtures).
- Add _docs/01_solution/solution_draft01.
- Add _docs/02_document (architecture, system-flows, data_model,
  glossary, decision-rationale, deployment, 13 component descriptions,
  tests/ specs, FINAL_report, module-layout).
- Add _docs/02_tasks/todo with 47 task specs (AZ-640..AZ-686, one
  bootstrap + 46 component tasks) and _dependencies_table.md.
- Add .cursor/rules/artifact-srp.mdc (single-responsibility rule for
  canonical _docs artifacts).
- Track autodev state in _docs/_autodev_state.md (Step 6 completed,
  ready for Step 7 Implement).

Jira: bootstrap AZ-626; component epics AZ-627..AZ-639; tasks
AZ-640..AZ-686. Total complexity 173 points across 12 epics.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-19 11:02:01 +03:00


Test Data Management

Authored by /test-spec Phase 2 (2026-05-19). Owns the mapping from fixtures to tests, mock data shapes, isolation strategy, and the deferred-fixture inventory bridge.

  • Per-row input-to-expected-result binding lives in _docs/00_problem/input_data/expected_results/results_report.md — this file references it but never duplicates it.
  • Fixture manifest (SHA-pinned files + provenance) lives in _docs/00_problem/input_data/fixtures/README.md.
  • Per-service mock catalogue (what shape each mock returns) lives in _docs/00_problem/input_data/services.md.
  • Deferred fixture inventory + replay obligation lives in _docs/_process_leftovers/2026-05-19_autopilot_test_fixtures.md.

Seed data sets

| Data Set | Description | Used by Tests | How Loaded | Cleanup |
|---|---|---|---|---|
| image-set-existing | fixtures/images/{4d6e1830d211ad50,54f6459dbddb93d8,6dd601b7d2dc1b30,805bcf1e9f271a58,f997d0934726b555}.jpg (5 aerial frames) | FT-P-Tier1Contract, NFT-PERF-L1, NFT-PERF-L2, FT-P-DetectExisting | mounted read-only via fixtures-ro:/fixtures on rtsp-loopback (encoded to .mp4 clip) and on detections-mock (paired with expected_detections.json per frame) | volume detached on container teardown |
| video-recon | fixtures/videos/94d42580bd1ad6ff.mp4 | NFT-PERF-T3 | mounted read-only on rtsp-loopback; consumer requests stream at 30 fps then throttles decode + drops frames per scenario script | as above |
| video-movement | fixtures/movement/video0[1-4].mp4 (4 wide-area clips) | FT-P-MoveStarter (visual reference only), FT-P-MoveBenchmark (deferred) | mounted on rtsp-loopback; played at 30 fps; consumer schedules which clip per scenario | as above |
| image-semantic-starter | fixtures/semantic/semantic0[1-4].png (1 winter + 3 unmarked season) | FT-P-ConcealStarter, FT-P-FootpathStarter (visual reference only; assertion semantics deferred) | mounted on detections-mock and rtsp-loopback as a single-frame loop | as above |
| schemas-detection | fixtures/schemas/expected_detections.{json,schema.json} | FT-P-Tier1Contract, FT-P-NormalisedBoxes (D6) | mounted on e2e-consumer:/expected:ro | as above |
| sql-init-suite | fixtures/sql/init.sql | NOT USED by autopilot tests (suite-only artefact; recorded here for traceability) | n/a | n/a |
| mission-suite-fixture | <DEFERRED: missions_fixtures/mission_30x30km.json + mapobjects_10k.json; services.md §2> | FT-P-MissionStart, FT-P-MapPull (Mp1), FT-P-MapPush (Mp3), NFT-RES-Mp2, NFT-RES-Mp4 | mounted on missions-mock once acquired | as above |
| mavlink-sitl-scripts | scripted ardupilot/sitl scenarios (waypoint upload, geofence in/out, RTL on link loss, RTL on battery floor) | FT-P-WaypointInsert (O8), NFT-RES-R4, NFT-RES-R5, NFT-RES-R6, NFT-RES-R7, NFT-RES-R9 | run in mavlink-sitl via --script argument per scenario | SITL container restarted per scenario |
| operator-session-scripts | scripted (t, event) traces: nominal, drop+reconnect, lost-link 30 s, sustained lost-link | FT-P-DecisionWindow (O1–O3, O4), FT-P-OperatorDecline (O5), FT-P-OperatorIgnoredSuppress (O6), FT-P-OperatorTimeout (O7), FT-P-OperatorConfirm (O8), NFT-RES-R4 | replayed by operator-replay per scenario | per-scenario |
| operator-envelopes | <DEFERRED: operator_envelopes/{valid,replayed,malformed,unsigned,expired}.bin; services.md §8 (Q9-blocked)> | NFT-SEC-O9, NFT-SEC-O10, FT-P-OperatorConfirm (O8 happy path uses a default placeholder envelope) | replayed by operator-replay | per-scenario |
| vlm-io-pairs | <DEFERRED: vlm_io_pairs/{roi,prompt,response}.* + schema-violation cases; services.md §7> | NFT-PERF-L3, FT-P-DeepAnalysisHold (S5), NFT-SEC-VlmSchemaViolation | mounted on vlm-mock | per-scenario |
| gimbal-csv-pairs | <DEFERRED: gimbal_csv/video0[1-4].csv paired with movement videos at zoomed-in band + threshold-edge cluster; services.md §6> | FT-P-EgoMotion (M1), FT-P-MoveDuringHold (M2), FT-P-ThresholdEdge (M3), FT-P-MoveBenchmark (M4), NFT-PERF-L6, NFT-PERF-L7 | replayed by gimbal-mock synchronised to RTSP frame timestamps | per-scenario |
| tier1-replay-streams | <DEFERRED: tier1_replay/*.replay; services.md §1> | FT-P-Tier1ContractIsolated (Tier B variant); Tier-E uses live ../detections | served by detections-mock | per-scenario |
| time-drift-scripts | scripted clock offsets (50 ms ramp, 250 ms jump, NTP loss, GPS unlock) | NFT-RES-R8 | injected by time-injector via faketime LD_PRELOAD shim | per-scenario |
| synthetic-poi-feeds | inline-authorable: confidence={0.39, 0.40, 0.70, 1.00}, ordering-test feed, sustained-rate feed >5 POI/min | FT-P-DecisionWindow (O1–O4), FT-P-POIOrdering (S4), NFT-PERF-T1 | authored in Rust under e2e/consumer/fixtures/synthetic_poi/; pumped into the SUT by injecting recorded Detections into detections-mock | n/a (in-memory) |
| bit-scenarios | inline-authorable: every-dep-green, tier1-unreachable, storage-95pct-full | FT-P-BitPass (R1), NFT-RES-R2, NFT-RES-R3 | manipulated by toggling mock services up/down + autopilot-state volume seed file | volume seed file removed |

Data isolation strategy

  • Per scenario, fresh containers. Each scenario starts with docker compose down -v && docker compose up -d (the e2e-consumer orchestrates this via testcontainers-rs). No state leaks between scenarios.
  • autopilot-state volume is named per (test_id, run_id) so parallel scenario runs do not collide.
  • Deterministic seeds. Every randomness source in the SUT (POI age-factor tie-breaking, retry jitter, replay-window nonce window) is configured to a per-scenario seed via env vars (AUTOPILOT_RNG_SEED=<test_id>). The seed is captured in the CSV report.
  • Wall-clock control. Scenarios that depend on absolute time (NFT-RES-R8, NFT-RES-R4 grace window, FT-P-DecisionWindow timeouts) use time-injector (faketime LD_PRELOAD). The SUT's time.now() calls are intercepted; GPS-source state is set via the mavlink-sitl GLOBAL_POSITION_INT message stream.
  • Network determinism. All inter-service traffic stays on the autopilot-e2e Docker network (no internet egress). Latency injection (for L9 modem RTT exclusion checks) uses tc qdisc inside the operator-replay container.
  • No shared mocks between scenarios. Even when two scenarios use the same fixture, each gets its own mock container instance — this avoids stale state in missions-mock's POST-buffer or gimbal-mock's last-command cache.
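The per-scenario naming and seeding rules above can be sketched as a small helper. This is an illustration only, written in Python for brevity even though the e2e consumer is described as Rust: the function name `scenario_isolation`, the `autopilot-state-` prefix format and the character filter are assumptions, not part of the spec. Only the `(test_id, run_id)` keying and `AUTOPILOT_RNG_SEED=<test_id>` come from the text.

```python
import re


def scenario_isolation(test_id: str, run_id: str) -> dict:
    """Derive per-scenario names so parallel runs never share state."""
    # Keep the volume name to a conservative character set (assumed filter).
    slug = re.sub(r"[^A-Za-z0-9_.-]", "_", f"{test_id}-{run_id}")
    return {
        # Hypothetical naming scheme: one state volume per (test_id, run_id).
        "volume": f"autopilot-state-{slug}",
        # Per the isolation rules, the RNG seed is the test id.
        "env": {"AUTOPILOT_RNG_SEED": test_id},
    }


# Two runs of the same scenario get distinct volumes but the same seed.
first = scenario_isolation("NFT-RES-R8", "run-1")
second = scenario_isolation("NFT-RES-R8", "run-2")
```

Keying the volume on both identifiers while keying the seed on the test id alone keeps repeated runs reproducible without letting them collide on storage.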

Input data mapping (fixtures → scenarios)

This is the fixture-side index; the scenario-side index is in each *-tests.md file's Input data field.

| Input data file | Source location | Description | Covers scenarios |
|---|---|---|---|
| fixtures/images/4d6e1830d211ad50.jpg | _docs/00_problem/input_data/fixtures/images/ | Aerial frame, 1280 px input | FT-P-Tier1Contract (D6), NFT-PERF-L1, NFT-PERF-L2 |
| fixtures/images/{54f6...,6dd6...,805b...,f997...}.jpg | same dir | 4 additional aerial frames for existing-class regression | FT-P-DetectExisting (D2) |
| fixtures/videos/94d42580bd1ad6ff.mp4 | same dir | Reconnaissance clip, 30 fps; consumer throttles to drop below 10 fps for ≥5 s | NFT-PERF-T3 |
| fixtures/movement/video01.mp4 | same dir | Wide-area movement clip (visual reference only) | FT-P-EgoMotion (M1) [DEFERRED: needs gimbal.csv] |
| fixtures/movement/video02.mp4 | same dir | Wide-area movement clip (visual reference only) | FT-P-MoveDuringHold (M2) [DEFERRED: needs zoomed-in gimbal.csv] |
| fixtures/movement/video03.mp4 | same dir | Wide-area movement clip (visual reference only) | FT-P-ThresholdEdge (M3) [DEFERRED: needs threshold-edge gimbal.csv] |
| fixtures/movement/video04.mp4 | same dir | Wide-area movement clip (visual reference only) | FT-P-MoveBenchmark (M4) [DEFERRED: needs zoom-band benchmark CSV] |
| fixtures/semantic/semantic01.png | same dir | Winter concealed-position reference (starter only) | FT-P-ConcealStarter (D3, D4), FT-P-FootpathStarter (D5) [DEFERRED: needs annotated multi-season set] |
| fixtures/semantic/semantic0[2-4].png | same dir | 3 unmarked-season concealed-position references | as above |
| fixtures/schemas/expected_detections.json | same dir | Reference output for D6 | FT-P-Tier1Contract (D6), FT-P-NormalisedBoxes |
| fixtures/schemas/expected_detections.schema.json | same dir | Schema for normalised-box output | FT-P-NormalisedBoxes, NFT-SEC-Tier1SchemaViolation |
| fixtures/sql/init.sql | same dir | (suite-only; recorded for traceability) | none |
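The T3 condition attached to the reconnaissance clip (frame rate below 10 fps for at least 5 s) could be detected roughly as follows. This is a sketch under assumptions: the function name `t3_triggered` is hypothetical, and treating "rate below 10 fps" as "every inter-frame gap longer than 100 ms" is one possible reading; the SUT may well use a windowed average instead.

```python
def t3_triggered(frame_times_s, min_fps=10.0, hold_s=5.0):
    """Return True once frames arrive slower than min_fps for at least hold_s seconds.

    frame_times_s is an ascending list of frame arrival timestamps in seconds.
    """
    max_gap = 1.0 / min_fps          # longest gap still counted as "fast enough"
    slow_since = None                # start of the current run of slow gaps
    for prev, cur in zip(frame_times_s, frame_times_s[1:]):
        if cur - prev > max_gap:
            if slow_since is None:
                slow_since = prev
            if cur - slow_since >= hold_s:
                return True
        else:
            slow_since = None        # a fast gap resets the run
    return False


# One frame every 0.5 s (2 fps) for 6 s trips the condition.
slow_feed = [i * 0.5 for i in range(13)]
# One frame every 0.05 s (20 fps) never does.
fast_feed = [i * 0.05 for i in range(200)]
```

A scenario script that throttles the consumer would be expected to produce timestamps like `slow_feed`; a short dip that recovers before 5 s elapse resets the run and does not trigger.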

Expected results mapping (scenario → comparison row)

Every scenario in *-tests.md traces to a row id in _docs/00_problem/input_data/expected_results/results_report.md. The comparison method + tolerance is owned by that row — this table is the scenario-side index so a reader can navigate from a test to its assertion contract.
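To make the comparison-method names used in this index concrete, here is a minimal sketch of how a runner might dispatch on them. It is not the runner's actual code: the `check` function, the `COMPARATORS` table and the argument order are assumptions, and only the simple scalar methods (`threshold_max`, `threshold_min`, `numeric_tolerance`, `range`, `exact`) are covered. Methods such as `schema_match`, `set_contains` and `json_diff` are left out.

```python
import math


def threshold_max(actual, limit):
    return actual <= limit


def threshold_min(actual, limit):
    return actual >= limit


def numeric_tolerance(actual, expected, tol):
    # Absolute tolerance only, matching rows written as "± tol".
    return math.isclose(actual, expected, rel_tol=0.0, abs_tol=tol)


def in_range(actual, low, high):
    return low <= actual <= high


def exact(actual, expected):
    return actual == expected


# Hypothetical dispatch table keyed by the method names used in the index.
COMPARATORS = {
    "threshold_max": threshold_max,
    "threshold_min": threshold_min,
    "numeric_tolerance": numeric_tolerance,
    "range": in_range,
    "exact": exact,
}


def check(method, actual, *params):
    """Apply the named comparison method to a measured value."""
    return COMPARATORS[method](actual, *params)
```

For example, `check("threshold_max", 95.0, 100.0)` corresponds to an L1-style latency row (≤ 100 ms), and `check("range", 5.0, 1.0, 10.0)` to the T2 rate row (1 Hz ≤ r ≤ 10 Hz).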

| Scenario ID | Input data | Expected result row | Comparison method | Tolerance | Source |
|---|---|---|---|---|---|
| FT-P-Tier1Contract | image-set-existing (1 frame) | D6 | schema_match + range | each coord ∈ [0,1] | fixtures/schemas/expected_detections.schema.json |
| FT-P-DetectExisting | image-set-existing (5 frames) | D2 | numeric_tolerance | ± 0.02 (P, R) | <DEFERRED: expected_results/existing_classes_baseline.json> |
| FT-P-DetectNew | <DEFERRED: new-class eval set> | D1 | threshold_min | P ≥ 0.80 AND R ≥ 0.80 | <DEFERRED: expected_results/new_classes_pr.json> |
| FT-P-ConcealRecall | image-semantic-starter + <DEFERRED: full set> | D3 | threshold_min | recall ≥ 0.60 | <DEFERRED: expected_results/concealed_positions.json> |
| FT-P-ConcealPrecision | same | D4 | threshold_min | precision ≥ 0.20 | same |
| FT-P-FootpathRecall | image-semantic-starter + <DEFERRED> | D5 | threshold_min | recall ≥ 0.70 | <DEFERRED: expected_results/footpaths.json> |
| NFT-PERF-L1 | image-set-existing (1 frame) | L1 | threshold_max | ≤ 100 ms | inline |
| NFT-PERF-L2 | derived ROI from same | L2 | threshold_max | ≤ 200 ms | inline |
| NFT-PERF-L3 | vlm-io-pairs | L3 | threshold_max | ≤ 5000 ms | inline |
| NFT-PERF-L4 | <DEFERRED: SITL or HW zoom-cmd capture> | L4 | threshold_max | ≤ 2000 ms | inline |
| NFT-PERF-L5 | <DEFERRED: scripted scan→movement> | L5 | threshold_max | ≤ 500 ms | inline |
| NFT-PERF-L6 | video-movement (visual ref) + <DEFERRED gimbal.csv> | L6 | threshold_max | ≤ 1000 ms | inline |
| NFT-PERF-L7 | video-movement + <DEFERRED zoomed-in gimbal.csv> | L7 | threshold_max | ≤ 1500 ms | inline |
| NFT-PERF-L8 | <DEFERRED: sweep→zoomed transition capture> | L8 | threshold_max | ≤ 2000 ms | inline |
| NFT-PERF-L9 | <DEFERRED: operator-click → outbound> | L9 | threshold_max | ≤ 500 ms | inline |
| NFT-PERF-T1 | synthetic-poi-feeds (sustained > cap) | T1 | threshold_max | ≤ 5 / min | inline |
| NFT-PERF-T2 | <DEFERRED: MAVLink replay 60 s> | T2 | range | 1 Hz ≤ r ≤ 10 Hz | inline |
| NFT-PERF-T3 | video-recon (throttled) | T3 | exact × 2 | suppression bool + health=yellow | inline |
| FT-P-EgoMotion (M1) | video-movement/video01.mp4 + <DEFERRED gimbal.csv + telemetry.csv> | M1 | set_contains | candidate set == {vehicle}; ∉ tree row | inline |
| FT-P-MoveDuringHold (M2) | video02.mp4 + <DEFERRED zoomed-in CSV pair> | M2 | exact | 1 candidate; preempt per priority rule | inline |
| FT-P-ThresholdEdge (M3) | video03.mp4 + <DEFERRED threshold-edge CSV> | M3 | exact | count == 0 | inline |
| FT-P-MoveBenchmark (M4) | video04.mp4 + <DEFERRED benchmark suite> | M4 | threshold_max | per-zoom-band FP rate budget | <DEFERRED: expected_results/movement_benchmark_caps.json> |
| FT-P-SweepToZoom (S1) | <DEFERRED scripted mission + POI> | S1 | exact × 3 | transition + ROI + queue+=1 | inline |
| FT-P-FootpathPan (S2) | <DEFERRED hold + footpath polyline> | S2 | numeric_tolerance | centre offset ≤ 25% per frame | inline |
| FT-P-TargetFollow (S3) | <DEFERRED confirmed target> | S3 | threshold_max | per-frame dx,dy | |
| FT-P-POIOrdering (S4) | synthetic-poi-feeds (ordering test) | S4 | exact (order) | ordering matches conf × prox × age | inline |
| FT-P-DeepAnalysisHold (S5) | <DEFERRED VLM-enabled hold> | S5 | exact | hold = min(5 s, vlm_complete) | inline |
| FT-P-DecisionWindow30s (O1) | synthetic-poi-feeds (conf=0.40) | O1 | exact | window = 30 s | inline |
| FT-P-DecisionWindow120s (O2) | conf=1.00 | O2 | exact | window = 120 s | inline |
| FT-P-DecisionWindow75s (O3) | conf=0.70 | O3 | numeric_tolerance | window ≈ 75 s ± 0.5 s | inline |
| FT-N-BelowThreshold (O4) | conf=0.39 | O4 | exact | not surfaced | inline |
| FT-P-OperatorDecline (O5) | operator-session-scripts (nominal + decline) | O5 | exact (count Δ+1) + schema_match | ignored-item appended | inline |
| FT-P-IgnoredSuppress (O6) | matching MGRS + class_group | O6 | exact | not surfaced | inline |
| FT-P-OperatorTimeout (O7) | no-response + > window | O7 | exact × 2 | queue 1; ignored unchanged | inline |
| FT-P-OperatorConfirm (O8) | operator-envelopes (valid happy path) | O8 | exact (HTTP 200) + exact (mode) | mission POST + target-follow | inline |
| NFT-SEC-O9 | operator-envelopes (replayed) | O9 | exact + substring | state unchanged; log contains "replay" | inline |
| NFT-SEC-O10 | operator-envelopes (malformed/unsigned) | O10 | exact + substring | state unchanged; log contains "invalid" | inline |
| FT-P-BitPass (R1) | bit-scenarios (every dep green) | R1 | exact × 2 | takeoff permitted + health all green | inline |
| FT-N-BitDetectionDown (R2) | tier1 unreachable | R2 | exact | takeoff inhibited + detection red | inline |
| FT-N-BitStorageFull (R3) | storage ≥ 95 % | R3 | exact | takeoff inhibited + storage red | inline |
| NFT-RES-R4 | operator-session-scripts (sustained lost-link) | R4 | exact (RTL at 30 s ± 1 s) | RTL command + operator-link red | inline |
| NFT-RES-R5 | mavlink-sitl-scripts (battery at RTL-floor) | R5 | exact × 2 | RTL + health yellow | inline |
| NFT-RES-R6 | battery at hard-floor | R6 | exact | land-now | inline |
| NFT-RES-R7 | mavlink-sitl-scripts (no-response retry exhaustion) | R7 | exact | health red after max-retry | inline |
| NFT-RES-R8 | time-drift-scripts (250 ms drift) | R8 | exact | time-source yellow + clock_source/last_sync_at updated | inline |
| NFT-RES-R9 | mavlink-sitl-scripts (EXCLUSION cross) | R9 | exact × 2 | waypoint rejected + RTL | inline |
| NFT-RES-LIM-Re1 | <DEFERRED long-running RSS harness> | Re1 | threshold_max | combined RSS ≤ 6 GB | inline |
| NFT-RES-LIM-Re2 | Re1 + concurrent Tier-1 traffic | Re2 | numeric_tolerance | Tier-1 ms/frame Δ ± 5 ms | inline |
| FT-P-MapPull (Mp1) | <DEFERRED 30×30 km area + ~10k mapobjects> | Mp1 | threshold_max | ≤ 30 s | inline |
| NFT-RES-Mp2 | mock unreachable | Mp2 | exact × 2 | cached_fallback + BIT requires ack | inline |
| FT-P-MapPush (Mp3) | <DEFERRED 60 min diff> | Mp3 | threshold_max | ≤ 120 s | inline |
| NFT-RES-Mp4 | POST returns 5xx | Mp4 | exact × 2 + threshold_max | file exists + warning + retries ≤ cap | inline |
| FT-P-MapConflict (Mp5) | <DEFERRED conflict pair> | Mp5 | json_diff | conflict resolution per Q8 | <DEFERRED: expected_results/mapobjects_conflict_resolution.json> |
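The O1 to O4 rows give three points on the confidence-to-window mapping (0.40 → 30 s, 0.70 → 75 s, 1.00 → 120 s) plus a cut-off below 0.40. A straight line through the two end points also passes through the middle one, so a linear interpolation is a plausible reading. The sketch below assumes exactly that; the authoritative rule is owned by results_report.md, and the function name and defaults are illustrative.

```python
def decision_window_s(conf, floor=0.40, min_s=30.0, max_s=120.0):
    """Return the operator decision window in seconds, or None if not surfaced.

    Assumes a linear mapping from [floor, 1.0] onto [min_s, max_s], which is
    consistent with rows O1-O3 but is not stated by the spec itself.
    """
    if conf < floor:
        return None                      # O4: below threshold, not surfaced
    conf = min(conf, 1.0)                # clamp defensively at full confidence
    return min_s + (max_s - min_s) * (conf - floor) / (1.0 - floor)


# The four synthetic confidences from the seed data set.
windows = {c: decision_window_s(c) for c in (0.39, 0.40, 0.70, 1.00)}
```

Under this assumption the arithmetic for O3 is 30 + 90 × (0.70 − 0.40) / 0.60 = 75 s. Floating-point rounding can leave the computed value a hair off 75, which is one reason a `numeric_tolerance` comparison suits that row better than `exact`.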

External dependency mocks

(Index-only; per-mock acquisition status owned by services.md.)

| External service | Mock/stub | How provided | Behavior |
|---|---|---|---|
| ../detections Tier-1 RPC | detections-mock (gRPC bi-stream) | Docker container; serves .replay files | Returns recorded Detections byte-stream for the input frame's hash; serves a 19-class catalogue (0..18) deterministically; supports schema-violation injection for NFT-SEC tests |
| missions API | missions-mock (HTTPS FastAPI) | Docker container; TLS via self-signed test cert | Static JSON for GET /missions/{id}, GET /missions/{id}/mapobjects; records POST bodies for assertion; can be configured to return 5xx for NFT-RES-Mp4 |
| ViewPro A40 RTSP | rtsp-loopback (mediamtx) | Docker container | Plays back .mp4 at scheduled fps with frame-drop injection (T3) |
| ViewPro A40 gimbal | gimbal-mock (Rust UDP) | Docker container | Replays gimbal.csv synchronised to RTSP frame timestamps; echoes received commands with bounded latency budget |
| ArduPilot | mavlink-sitl (official ardupilot/ardupilot-sitl image) | Docker container | Deterministic SITL run from a scripted mission file |
| Ground Station modem | operator-replay (Python) | Docker container | Replays (t, event) script per scenario; signs envelopes per Q9 once resolved |
| Local VLM | vlm-mock (Python over UDS) | Docker container; UDS shared via /tmp volume | Returns paired VlmAssessment JSON; can return schema-violation responses for NFT-SEC tests |
| Wall-clock / GPS / NTP | time-injector (Rust) | LD_PRELOAD faketime shim into the SUT container at start | Scripted offset/jump/source-loss |

Data validation rules

| Data Type | Validation | Invalid Examples | Expected System Behaviour |
|---|---|---|---|
| Mission JSON | mission-schema (shared with missions repo) | missing required field; coord out of [-180, 180]; unknown enum value | system refuses; mission-state stays at last-known; health flips mission-config-source = yellow; structured-log at WARN with schema_violation_field |
| Map-object record | suite-level mapobjects schema | non-finite coordinate; class_group not in catalogue; missing MGRS | record dropped; counter mapobjects_rejected_total increments; structured-log at WARN |
| Tier-1 Detections stream | expected_detections.schema.json (normalised-box) | bbox coord ∉ [0, 1]; confidence ∉ [0, 1]; class_id ∉ {0..18} | frame's detections dropped (not partially used); tier1_invalid_frame_total increments; per AC D6 the system must surface a structured WARN |
| MAVLink message | MAVLink v2 dialect (per ArduPilot) | unknown MSG_ID; CRC mismatch; (if Q6 resolves to "signing on") missing signature | message dropped; if signing required and missing → security WARN; airframe-link health unaffected for individual drops |
| Operator command envelope | Q9 scheme (TBD) | replay (sequence_id seen recently); signature invalid; timestamp outside replay window | rejected at the boundary; no state mutation; security WARN with reason code; counters operator_cmd_rejected_replay_total, ..._signature_total, ..._expired_total |
| VLM VlmAssessment response | structured assessment schema | missing required field; wrong type; truncated JSON | fail-closed: assessment discarded; POI does NOT get the deep-analysis upgrade; structured WARN |
| RTSP frame | container-level decode | malformed H.264/265 NAL; oversized SPS | frame dropped; frame_decode_error_total increments; if rate falls below 10 fps for ≥5 s → T3 path triggers (zoom-in suppressed + health yellow) |
| Camera frame size | bounded crop policy (security_approach §Bounded input) | crop > configured max bytes; format not in allow-list | rejected at boundary; security WARN |
| Time source | wall-clock binding | GPS unlocked AND no NTP sync at boot | clock_source = none; health red until either source available |
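As an illustration of the Tier-1 Detections rule (any out-of-range value drops the whole frame rather than just the offending detection), a validator might look like the sketch below. The dict layout (`bbox` as four floats, `confidence`, `class_id`) and the function names are assumptions made for the example; the real shape is defined by expected_detections.schema.json, and structural schema checks are assumed to have passed already.

```python
import math


def frame_is_valid(detections, num_classes=19):
    """Check every detection in a frame against the normalised-box ranges."""
    for det in detections:
        unit_values = list(det["bbox"]) + [det["confidence"]]
        for v in unit_values:
            # Each coordinate and the confidence must be finite and within [0, 1].
            if not (isinstance(v, (int, float)) and math.isfinite(v) and 0.0 <= v <= 1.0):
                return False
        # class_id must be one of 0..num_classes-1 (0..18 for the 19-class catalogue).
        if det["class_id"] not in range(num_classes):
            return False
    return True


def accept_frame(detections, counters):
    """Return the detections unchanged, or drop the whole frame and count it."""
    if frame_is_valid(detections):
        return detections
    counters["tier1_invalid_frame_total"] = counters.get("tier1_invalid_frame_total", 0) + 1
    return []  # dropped as a unit, never partially used
```

A frame with one good box and one box whose coordinate is 1.2 therefore yields no detections at all and increments `tier1_invalid_frame_total` once. The structured WARN required by AC D6 is omitted from the sketch.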

Deferred-fixture bridge (replay obligation)

Every <DEFERRED:> row above maps 1-to-1 to an entry in _docs/_process_leftovers/2026-05-19_autopilot_test_fixtures.md → "What is needed before /autodev can resume" table. On every /autodev invocation, the leftovers step must re-evaluate whether any deferred fixture has landed; once landed, the corresponding scenario(s) become unblocked and their Test status line in the matching *-tests.md file moves from DEFERRED — input fixture not yet acquired to READY.
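The re-evaluation step described here amounts to checking, per scenario, whether every fixture it waits on now exists. A minimal sketch, assuming a hypothetical mapping from scenario id to fixture paths relative to a fixtures root (the real inventory lives in the leftovers file, and the short `READY` / `DEFERRED` labels stand in for the full status lines):

```python
from pathlib import Path


def reevaluate(deferred, root):
    """Map each scenario id to READY once all of its fixture paths exist under root."""
    root = Path(root)
    statuses = {}
    for scenario, paths in deferred.items():
        # A scenario with no listed fixtures stays deferred rather than
        # being unblocked by an empty list.
        landed = bool(paths) and all((root / p).exists() for p in paths)
        statuses[scenario] = "READY" if landed else "DEFERRED"
    return statuses


# Illustrative inventory only; paths are taken from the seed data set rows.
example_inventory = {
    "FT-P-EgoMotion": ["gimbal_csv/video01.csv"],
    "FT-P-MissionStart": [
        "missions_fixtures/mission_30x30km.json",
        "missions_fixtures/mapobjects_10k.json",
    ],
}
```

Calling `reevaluate(example_inventory, fixtures_root)` on each /autodev invocation would give the set of scenarios whose Test status line can move to READY. Existence is only a first gate; the SHA pinning in the fixture manifest would still need to be checked separately.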

Inline-authorable categories (10 and 11 in the leftover) — synthetic-poi-feeds, time-drift-scripts, operator-session-scripts, bit-scenarios — are NOT marked <DEFERRED:> in this file because they have no external dependency. They are authored by Phase 4's e2e/consumer/fixtures/ generators when the runner scripts come online.