mirror of https://github.com/azaion/autopilot.git synced 2026-06-21 15:21:11 +00:00

Oleksandr Bezdieniezhnykh bc40ea7300 [AZ-626] Decompose complete: 47 tasks + docs + module layout

Greenfield Steps 1-6 baseline for the autopilot rewrite from legacy
Qt/C++ to a Rust workspace.

- Remove legacy Qt/C++ tree (ai_controller, drone_controller,
  misc/camera, python_scaffold, root Dockerfile, autopilot.pro,
  legacy main.py / requirements.txt).
- Add _docs/00_problem (problem, restrictions, acceptance criteria,
  security approach, input data + fixtures).
- Add _docs/01_solution/solution_draft01.
- Add _docs/02_document (architecture, system-flows, data_model,
  glossary, decision-rationale, deployment, 13 component descriptions,
  tests/ specs, FINAL_report, module-layout).
- Add _docs/02_tasks/todo with 47 task specs (AZ-640..AZ-686, one
  bootstrap + 46 component tasks) and _dependencies_table.md.
- Add .cursor/rules/artifact-srp.mdc (single-responsibility rule for
  canonical _docs artifacts).
- Track autodev state in _docs/_autodev_state.md (Step 6 completed,
  ready for Step 7 Implement).

Jira: bootstrap AZ-626; component epics AZ-627..AZ-639; tasks
AZ-640..AZ-686. Total complexity 173 points across 12 epics.

Co-authored-by: Cursor <cursoragent@cursor.com>

2026-05-19 11:02:01 +03:00


Test Data Management

Authored by /test-spec Phase 2 (2026-05-19). Owns the mapping from fixtures to tests, mock data shapes, isolation strategy, and the deferred-fixture inventory bridge.

Per-row input-to-expected-result binding lives in _docs/00_problem/input_data/expected_results/results_report.md — this file references it but never duplicates it.
Fixture manifest (SHA-pinned files + provenance) lives in _docs/00_problem/input_data/fixtures/README.md.
Per-service mock catalogue (what shape each mock returns) lives in _docs/00_problem/input_data/services.md.
Deferred fixture inventory + replay obligation lives in _docs/_process_leftovers/2026-05-19_autopilot_test_fixtures.md.

Seed data sets

| Data Set | Description | Used by Tests | How Loaded | Cleanup |
| --- | --- | --- | --- | --- |
| `image-set-existing` | `fixtures/images/{4d6e1830d211ad50,54f6459dbddb93d8,6dd601b7d2dc1b30,805bcf1e9f271a58,f997d0934726b555}.jpg` — 5 aerial frames | FT-P-Tier1Contract, NFT-PERF-L1, NFT-PERF-L2, FT-P-DetectExisting | mounted read-only via `fixtures-ro:/fixtures` on `rtsp-loopback` (encoded to `.mp4` clip) and on `detections-mock` (paired with `expected_detections.json` per frame) | volume detached on container teardown |
| `video-recon` | `fixtures/videos/94d42580bd1ad6ff.mp4` | NFT-PERF-T3 | mounted read-only on `rtsp-loopback`; consumer requests stream at 30 fps then throttles decode + drops frames per scenario script | as above |
| `video-movement` | `fixtures/movement/video0[1-4].mp4` (4 wide-area clips) | FT-P-MoveStarter (visual reference only), FT-P-MoveBenchmark (deferred) | mounted on `rtsp-loopback`; played at 30 fps; consumer schedules which clip per scenario | as above |
| `image-semantic-starter` | `fixtures/semantic/semantic0[1-4].png` (1 winter + 3 unmarked season) | FT-P-ConcealStarter, FT-P-FootpathStarter (visual reference only; assertion semantics deferred) | mounted on `detections-mock` and `rtsp-loopback` as a single-frame loop | as above |
| `schemas-detection` | `fixtures/schemas/expected_detections.{json,schema.json}` | FT-P-Tier1Contract, FT-P-NormalisedBoxes (D6) | mounted on `e2e-consumer:/expected:ro` | as above |
| `sql-init-suite` | `fixtures/sql/init.sql` | NOT USED by autopilot tests (suite-only artefact; recorded here for traceability) | n/a | n/a |
| `mission-suite-fixture` | `<DEFERRED: missions_fixtures/mission_30x30km.json + mapobjects_10k.json; services.md §2>` | FT-P-MissionStart, FT-P-MapPull (Mp1), FT-P-MapPush (Mp3), NFT-RES-Mp2, NFT-RES-Mp4 | mounted on `missions-mock` once acquired | as above |
| `mavlink-sitl-scripts` | scripted `ardupilot/sitl` scenarios (waypoint upload, geofence in/out, RTL on link loss, RTL on battery floor) | FT-P-WaypointInsert (O8), NFT-RES-R4, NFT-RES-R5, NFT-RES-R6, NFT-RES-R7, NFT-RES-R9 | run in `mavlink-sitl` via `--script` argument per scenario | SITL container restarted per scenario |
| `operator-session-scripts` | scripted `(t, event)` traces — nominal, drop+reconnect, lost-link 30 s, sustained lost-link | FT-P-DecisionWindow (O1–O3, O4), FT-P-OperatorDecline (O5), FT-P-OperatorIgnoredSuppress (O6), FT-P-OperatorTimeout (O7), FT-P-OperatorConfirm (O8), NFT-RES-R4 | replayed by `operator-replay` per scenario | per-scenario |
| `operator-envelopes` | `<DEFERRED: operator_envelopes/{valid,replayed,malformed,unsigned,expired}.bin; services.md §8 (Q9-blocked)>` | NFT-SEC-O9, NFT-SEC-O10, FT-P-OperatorConfirm (O8 happy path uses a default placeholder envelope) | replayed by `operator-replay` | per-scenario |
| `vlm-io-pairs` | `<DEFERRED: vlm_io_pairs/{roi,prompt,response}.* + schema-violation cases; services.md §7>` | NFT-PERF-L3, FT-P-DeepAnalysisHold (S5), NFT-SEC-VlmSchemaViolation | mounted on `vlm-mock` | per-scenario |
| `gimbal-csv-pairs` | `<DEFERRED: gimbal_csv/video0[1-4].csv paired with movement videos at zoomed-in band + threshold-edge cluster; services.md §6>` | FT-P-EgoMotion (M1), FT-P-MoveDuringHold (M2), FT-P-ThresholdEdge (M3), FT-P-MoveBenchmark (M4), NFT-PERF-L6, NFT-PERF-L7 | replayed by `gimbal-mock` synchronised to RTSP frame timestamps | per-scenario |
| `tier1-replay-streams` | `<DEFERRED: tier1_replay/*.replay; services.md §1>` | FT-P-Tier1ContractIsolated (Tier B variant); Tier-E uses live `../detections` | served by `detections-mock` | per-scenario |
| `time-drift-scripts` | scripted clock offsets (50 ms ramp, 250 ms jump, NTP loss, GPS unlock) | NFT-RES-R8 | injected by `time-injector` via faketime LD_PRELOAD shim | per-scenario |
| `synthetic-poi-feeds` | inline-authorable: confidence={0.39, 0.40, 0.70, 1.00}, ordering-test feed, sustained-rate feed >5 POI/min | FT-P-DecisionWindow (O1–O4), FT-P-POIOrdering (S4), NFT-PERF-T1 | authored in Rust under `e2e/consumer/fixtures/synthetic_poi/`; pumped into the SUT by injecting recorded `Detections` into `detections-mock` | n/a (in-memory) |
| `bit-scenarios` | inline-authorable: every-dep-green, tier1-unreachable, storage-95pct-full | FT-P-BitPass (R1), NFT-RES-R2, NFT-RES-R3 | manipulated by toggling mock services up/down + `autopilot-state` volume seed file | volume seed file removed |

Data isolation strategy

Per scenario, fresh containers. Each scenario starts with docker compose down -v && docker compose up -d (the e2e-consumer orchestrates this via testcontainers-rs). No state leaks between scenarios.
autopilot-state volume is named per (test_id, run_id) so parallel scenario runs do not collide.
Deterministic seeds. Every randomness source in the SUT (POI age-factor tie-breaking, retry jitter, replay-window nonces) is seeded per scenario through the environment variable AUTOPILOT_RNG_SEED=<test_id>. The seed is recorded in the CSV report.
Wall-clock control. Scenarios that depend on absolute time (NFT-RES-R8, NFT-RES-R4 grace window, FT-P-DecisionWindow timeouts) use time-injector (faketime LD_PRELOAD). The SUT's time.now() calls are intercepted; GPS-source state is set via the mavlink-sitl GLOBAL_POSITION_INT message stream.
Network determinism. All inter-service traffic stays on the autopilot-e2e Docker network (no internet egress). Latency injection (for L9 modem RTT exclusion checks) uses tc qdisc inside the operator-replay container.
No shared mocks between scenarios. Even when two scenarios use the same fixture, each gets its own mock container instance — this avoids stale state in missions-mock's POST-buffer or gimbal-mock's last-command cache.
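
A minimal Rust sketch of the per-scenario naming described above. Only `AUTOPILOT_RNG_SEED` and the `autopilot-state` volume come from this document; the `autopilot-state-<test_id>-<run_id>` pattern, the `ScenarioIsolation` type, and the `isolation_for` helper are hypothetical, and the character filter is a conservative assumption about what a Docker volume name accepts.

```rust
use std::collections::BTreeMap;

/// Isolation settings derived from (test_id, run_id). Hypothetical type.
pub struct ScenarioIsolation {
    /// Name of the per-scenario `autopilot-state` volume.
    pub state_volume: String,
    /// Environment variables handed to the SUT container.
    pub env: BTreeMap<String, String>,
}

/// Keep ASCII alphanumerics, '_', '.' and '-'; map everything else to '-'.
/// (Assumed to be a safe subset for Docker volume names.)
fn sanitise(part: &str) -> String {
    part.chars()
        .map(|c| {
            if c.is_ascii_alphanumeric() || c == '_' || c == '.' || c == '-' {
                c
            } else {
                '-'
            }
        })
        .collect()
}

/// Build the volume name and seed environment for one scenario run.
pub fn isolation_for(test_id: &str, run_id: &str) -> ScenarioIsolation {
    // Assumed naming pattern; the document only says the name is per (test_id, run_id).
    let state_volume = format!(
        "autopilot-state-{}-{}",
        sanitise(test_id),
        sanitise(run_id)
    );
    let mut env = BTreeMap::new();
    // The document sets the seed to the test id.
    env.insert("AUTOPILOT_RNG_SEED".to_string(), test_id.to_string());
    ScenarioIsolation { state_volume, env }
}
```

Because the volume name includes both identifiers, two parallel runs of the same scenario get distinct volumes while still sharing the same seed.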

Input data mapping (fixtures → scenarios)

This is the fixture-side index; the scenario-side index is in each *-tests.md file's Input data field.

| Input data file | Source location | Description | Covers scenarios |
| --- | --- | --- | --- |
| `fixtures/images/4d6e1830d211ad50.jpg` | `_docs/00_problem/input_data/fixtures/images/` | Aerial frame, 1280 px input | FT-P-Tier1Contract (D6), NFT-PERF-L1, NFT-PERF-L2 |
| `fixtures/images/{54f6...,6dd6...,805b...,f997...}.jpg` | same dir | 4 additional aerial frames for existing-class regression | FT-P-DetectExisting (D2) |
| `fixtures/videos/94d42580bd1ad6ff.mp4` | same dir | Reconnaissance clip, 30 fps; consumer throttles to drop below 10 fps for ≥5 s | NFT-PERF-T3 |
| `fixtures/movement/video01.mp4` | same dir | Wide-area movement clip (visual reference only) | FT-P-EgoMotion (M1) [DEFERRED — needs gimbal.csv] |
| `fixtures/movement/video02.mp4` | same dir | Wide-area movement clip (visual reference only) | FT-P-MoveDuringHold (M2) [DEFERRED — needs zoomed-in gimbal.csv] |
| `fixtures/movement/video03.mp4` | same dir | Wide-area movement clip (visual reference only) | FT-P-ThresholdEdge (M3) [DEFERRED — needs threshold-edge gimbal.csv] |
| `fixtures/movement/video04.mp4` | same dir | Wide-area movement clip (visual reference only) | FT-P-MoveBenchmark (M4) [DEFERRED — needs zoom-band benchmark CSV] |
| `fixtures/semantic/semantic01.png` | same dir | Winter concealed-position reference (starter only) | FT-P-ConcealStarter (D3, D4), FT-P-FootpathStarter (D5) [DEFERRED — needs annotated multi-season set] |
| `fixtures/semantic/semantic0[2-4].png` | same dir | 3 unmarked-season concealed-position references | as above |
| `fixtures/schemas/expected_detections.json` | same dir | Reference output for D6 | FT-P-Tier1Contract (D6), FT-P-NormalisedBoxes |
| `fixtures/schemas/expected_detections.schema.json` | same dir | Schema for normalised-box output | FT-P-NormalisedBoxes, NFT-SEC-Tier1SchemaViolation |
| `fixtures/sql/init.sql` | same dir | (suite-only — recorded for traceability) | none |
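
The NFT-PERF-T3 row relies on the stream dropping below 10 fps for at least 5 s. As a rough illustration of how a consumer could confirm that its throttling actually produced such a span, the sketch below counts frames per whole second and looks for a long enough run of low-rate seconds. The function name, the per-second bucketing, and the millisecond timestamp input are all assumptions; the document does not say how the rate is measured.

```rust
/// Returns true if at least `min_secs` consecutive whole seconds each
/// contain fewer than `min_fps` frames.
///
/// `frame_ts_ms` holds frame arrival times in milliseconds since stream
/// start; `duration_ms` is the observed stream length. Only complete
/// one-second buckets are evaluated.
pub fn sustained_low_fps(
    frame_ts_ms: &[u64],
    duration_ms: u64,
    min_fps: u32,
    min_secs: u32,
) -> bool {
    let buckets = (duration_ms / 1000) as usize;
    let mut counts = vec![0u32; buckets];
    for &t in frame_ts_ms {
        let i = (t / 1000) as usize;
        if i < buckets {
            counts[i] += 1;
        }
    }

    // Scan for a run of consecutive low-rate seconds.
    let mut run = 0u32;
    for c in counts {
        if c < min_fps {
            run += 1;
            if run >= min_secs {
                return true;
            }
        } else {
            run = 0;
        }
    }
    false
}
```

With the thresholds from this document the call would be `sustained_low_fps(&timestamps, duration_ms, 10, 5)`.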

Expected results mapping (scenario → comparison row)

Every scenario in *-tests.md traces to a row id in _docs/00_problem/input_data/expected_results/results_report.md. The comparison method + tolerance is owned by that row — this table is the scenario-side index so a reader can navigate from a test to its assertion contract.
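
The numeric comparison methods named in the table (`threshold_max`, `threshold_min`, `range`, `numeric_tolerance`) can be read as simple predicates over an observed value. The sketch below is one possible encoding, assuming inclusive bounds throughout; the `Comparison` enum and its `passes` method are hypothetical and are not taken from results_report.md.

```rust
/// One comparison rule applied to a single observed number.
#[derive(Debug, Clone, PartialEq)]
pub enum Comparison {
    /// Pass when observed <= limit (e.g. a latency budget in ms).
    ThresholdMax(f64),
    /// Pass when observed >= limit (e.g. a minimum recall).
    ThresholdMin(f64),
    /// Pass when min <= observed <= max.
    Range { min: f64, max: f64 },
    /// Pass when |observed - expected| <= tolerance.
    NumericTolerance { expected: f64, tolerance: f64 },
}

impl Comparison {
    /// Evaluate the rule; a NaN observation fails every rule.
    pub fn passes(&self, observed: f64) -> bool {
        match *self {
            Comparison::ThresholdMax(limit) => observed <= limit,
            Comparison::ThresholdMin(limit) => observed >= limit,
            Comparison::Range { min, max } => min <= observed && observed <= max,
            Comparison::NumericTolerance { expected, tolerance } => {
                (observed - expected).abs() <= tolerance
            }
        }
    }
}
```

For example, the L1 row would map to `Comparison::ThresholdMax(100.0)` and the O3 row to `Comparison::NumericTolerance { expected: 75.0, tolerance: 0.5 }`.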

| Scenario ID | Input data | Expected result row | Comparison method | Tolerance | Source |
| --- | --- | --- | --- | --- | --- |
| FT-P-Tier1Contract | `image-set-existing` (1 frame) | `D6` | `schema_match` + `range` | each coord ∈ [0,1] | `fixtures/schemas/expected_detections.schema.json` |
| FT-P-DetectExisting | `image-set-existing` (5 frames) | `D2` | `numeric_tolerance` | ± 0.02 (P, R) | `<DEFERRED: expected_results/existing_classes_baseline.json>` |
| FT-P-DetectNew | `<DEFERRED: new-class eval set>` | `D1` | `threshold_min` | P ≥ 0.80 AND R ≥ 0.80 | `<DEFERRED: expected_results/new_classes_pr.json>` |
| FT-P-ConcealRecall | `image-semantic-starter` + `<DEFERRED: full set>` | `D3` | `threshold_min` | recall ≥ 0.60 | `<DEFERRED: expected_results/concealed_positions.json>` |
| FT-P-ConcealPrecision | same | `D4` | `threshold_min` | precision ≥ 0.20 | same |
| FT-P-FootpathRecall | `image-semantic-starter` + `<DEFERRED>` | `D5` | `threshold_min` | recall ≥ 0.70 | `<DEFERRED: expected_results/footpaths.json>` |
| NFT-PERF-L1 | `image-set-existing` (1 frame) | `L1` | `threshold_max` | ≤ 100 ms | inline |
| NFT-PERF-L2 | derived ROI from same | `L2` | `threshold_max` | ≤ 200 ms | inline |
| NFT-PERF-L3 | `vlm-io-pairs` | `L3` | `threshold_max` | ≤ 5000 ms | inline |
| NFT-PERF-L4 | `<DEFERRED: SITL or HW zoom-cmd capture>` | `L4` | `threshold_max` | ≤ 2000 ms | inline |
| NFT-PERF-L5 | `<DEFERRED: scripted scan→movement>` | `L5` | `threshold_max` | ≤ 500 ms | inline |
| NFT-PERF-L6 | `video-movement` (visual ref) + `<DEFERRED gimbal.csv>` | `L6` | `threshold_max` | ≤ 1000 ms | inline |
| NFT-PERF-L7 | `video-movement` + `<DEFERRED zoomed-in gimbal.csv>` | `L7` | `threshold_max` | ≤ 1500 ms | inline |
| NFT-PERF-L8 | `<DEFERRED: sweep→zoomed transition capture>` | `L8` | `threshold_max` | ≤ 2000 ms | inline |
| NFT-PERF-L9 | `<DEFERRED: operator-click → outbound>` | `L9` | `threshold_max` | ≤ 500 ms | inline |
| NFT-PERF-T1 | `synthetic-poi-feeds` (sustained > cap) | `T1` | `threshold_max` | ≤ 5 / min | inline |
| NFT-PERF-T2 | `<DEFERRED: MAVLink replay 60 s>` | `T2` | `range` | 1 Hz ≤ r ≤ 10 Hz | inline |
| NFT-PERF-T3 | `video-recon` (throttled) | `T3` | `exact` × 2 | suppression bool + health=yellow | inline |
| FT-P-EgoMotion (M1) | `video-movement/video01.mp4` + `<DEFERRED gimbal.csv + telemetry.csv>` | `M1` | `set_contains` | candidate set == {vehicle}; ∉ tree row | inline |
| FT-P-MoveDuringHold (M2) | `video02.mp4` + `<DEFERRED zoomed-in CSV pair>` | `M2` | `exact` | 1 candidate; preempt per priority rule | inline |
| FT-P-ThresholdEdge (M3) | `video03.mp4` + `<DEFERRED threshold-edge CSV>` | `M3` | `exact` | count == 0 | inline |
| FT-P-MoveBenchmark (M4) | `video04.mp4` + `<DEFERRED benchmark suite>` | `M4` | `threshold_max` | per-zoom-band FP rate budget | `<DEFERRED: expected_results/movement_benchmark_caps.json>` |
| FT-P-SweepToZoom (S1) | `<DEFERRED scripted mission + POI>` | `S1` | `exact` × 3 | transition + ROI + queue+=1 | inline |
| FT-P-FootpathPan (S2) | `<DEFERRED hold + footpath polyline>` | `S2` | `numeric_tolerance` | centre offset ≤ 25% per frame | inline |
| FT-P-TargetFollow (S3) | `<DEFERRED confirmed target>` | `S3` | `threshold_max` | per-frame dx, dy | |
| FT-P-POIOrdering (S4) | `synthetic-poi-feeds` (ordering test) | `S4` | `exact (order)` | ordering matches `conf × prox × age` | inline |
| FT-P-DeepAnalysisHold (S5) | `<DEFERRED VLM-enabled hold>` | `S5` | `exact` | hold = min(5 s, vlm_complete) | inline |
| FT-P-DecisionWindow30s (O1) | `synthetic-poi-feeds` (conf=0.40) | `O1` | `exact` | window = 30 s | inline |
| FT-P-DecisionWindow120s (O2) | conf=1.00 | `O2` | `exact` | window = 120 s | inline |
| FT-P-DecisionWindow75s (O3) | conf=0.70 | `O3` | `numeric_tolerance` | window ≈ 75 s ± 0.5 s | inline |
| FT-N-BelowThreshold (O4) | conf=0.39 | `O4` | `exact` | not surfaced | inline |
| FT-P-OperatorDecline (O5) | `operator-session-scripts` (nominal + decline) | `O5` | `exact (count Δ+1)` + `schema_match` | ignored-item appended | inline |
| FT-P-IgnoredSuppress (O6) | matching MGRS + class_group | `O6` | `exact` | not surfaced | inline |
| FT-P-OperatorTimeout (O7) | no-response + > window | `O7` | `exact` × 2 | queue −1; ignored unchanged | inline |
| FT-P-OperatorConfirm (O8) | `operator-envelopes` (valid happy path) | `O8` | `exact (HTTP 200)` + `exact (mode)` | mission POST + target-follow | inline |
| NFT-SEC-O9 | `operator-envelopes` (replayed) | `O9` | `exact` + `substring` | state unchanged; log contains "replay" | inline |
| NFT-SEC-O10 | `operator-envelopes` (malformed/unsigned) | `O10` | `exact` + `substring` | state unchanged; log contains "invalid" | inline |
| FT-P-BitPass (R1) | `bit-scenarios` (every dep green) | `R1` | `exact` × 2 | takeoff permitted + health all green | inline |
| FT-N-BitDetectionDown (R2) | tier1 unreachable | `R2` | `exact` | takeoff inhibited + detection red | inline |
| FT-N-BitStorageFull (R3) | storage ≥ 95 % | `R3` | `exact` | takeoff inhibited + storage red | inline |
| NFT-RES-R4 | `operator-session-scripts` (sustained lost-link) | `R4` | `exact (RTL at 30 s ± 1 s)` | RTL command + operator-link red | inline |
| NFT-RES-R5 | `mavlink-sitl-scripts` (battery at RTL-floor) | `R5` | `exact` × 2 | RTL + health yellow | inline |
| NFT-RES-R6 | battery at hard-floor | `R6` | `exact` | land-now | inline |
| NFT-RES-R7 | `mavlink-sitl-scripts` (no-response retry exhaustion) | `R7` | `exact` | health red after max-retry | inline |
| NFT-RES-R8 | `time-drift-scripts` (250 ms drift) | `R8` | `exact` | time-source yellow + clock_source/last_sync_at updated | inline |
| NFT-RES-R9 | `mavlink-sitl-scripts` (EXCLUSION cross) | `R9` | `exact` × 2 | waypoint rejected + RTL | inline |
| NFT-RES-LIM-Re1 | `<DEFERRED long-running RSS harness>` | `Re1` | `threshold_max` | combined RSS ≤ 6 GB | inline |
| NFT-RES-LIM-Re2 | Re1 + concurrent Tier-1 traffic | `Re2` | `numeric_tolerance` | Tier-1 ms/frame Δ ± 5 ms | inline |
| FT-P-MapPull (Mp1) | `<DEFERRED 30×30 km area + ~10k mapobjects>` | `Mp1` | `threshold_max` | ≤ 30 s | inline |
| NFT-RES-Mp2 | mock unreachable | `Mp2` | `exact` × 2 | cached_fallback + BIT requires ack | inline |
| FT-P-MapPush (Mp3) | `<DEFERRED 60 min diff>` | `Mp3` | `threshold_max` | ≤ 120 s | inline |
| NFT-RES-Mp4 | POST returns 5xx | `Mp4` | `exact` × 2 + `threshold_max` | file exists + warning + retries ≤ cap | inline |
| FT-P-MapConflict (Mp5) | `<DEFERRED conflict pair>` | `Mp5` | `json_diff` | conflict resolution per Q8 | `<DEFERRED: expected_results/mapobjects_conflict_resolution.json>` |
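
The three decision-window rows (O1: conf 0.40 gives 30 s, O3: conf 0.70 gives about 75 s, O2: conf 1.00 gives 120 s) and the O4 row (conf 0.39 is not surfaced) are consistent with a linear interpolation between 30 s and 120 s over the confidence band [0.40, 1.00]. That reading is an inference from these four data points, not a rule stated in this file; the constants and the `decision_window_secs` function below are hypothetical names used only to show the arithmetic a synthetic-feed generator could check against.

```rust
/// Lowest confidence that is surfaced to the operator (row O4 boundary).
const MIN_SURFACE_CONF: f64 = 0.40;
/// Window at the lowest surfaced confidence (row O1).
const MIN_WINDOW_S: f64 = 30.0;
/// Window at confidence 1.00 (row O2).
const MAX_WINDOW_S: f64 = 120.0;

/// Decision window in seconds, or `None` when the POI is not surfaced.
/// Assumes a straight line through the O1 and O2 points.
pub fn decision_window_secs(confidence: f64) -> Option<f64> {
    // Out-of-band values (including NaN) are treated as not surfaced.
    if !(MIN_SURFACE_CONF..=1.0).contains(&confidence) {
        return None;
    }
    let t = (confidence - MIN_SURFACE_CONF) / (1.0 - MIN_SURFACE_CONF);
    Some(MIN_WINDOW_S + t * (MAX_WINDOW_S - MIN_WINDOW_S))
}
```

At conf 0.70 the interpolation factor is 0.5, which lands on the midpoint of 30 s and 120 s; floating-point rounding is why O3 is compared with a tolerance rather than exactly.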

External dependency mocks

(Index-only; per-mock acquisition status owned by services.md.)

| External service | Mock/stub | How provided | Behavior |
| --- | --- | --- | --- |
| `../detections` Tier-1 RPC | `detections-mock` (gRPC bi-stream) | Docker container; serves `.replay` files | Returns recorded `Detections` byte-stream for the input frame's hash; serves a 19-class catalogue (0..18) deterministically; supports schema-violation injection for NFT-SEC tests |
| `missions` API | `missions-mock` (HTTPS FastAPI) | Docker container; TLS via self-signed test cert | Static JSON for `GET /missions/{id}`, `GET /missions/{id}/mapobjects`; records POST bodies for assertion; can be configured to return 5xx for NFT-RES-Mp4 |
| ViewPro A40 RTSP | `rtsp-loopback` (mediamtx) | Docker container | Plays back `.mp4` at scheduled fps with frame-drop injection (T3) |
| ViewPro A40 gimbal | `gimbal-mock` (Rust UDP) | Docker container | Replays `gimbal.csv` synchronised to RTSP frame timestamps; echoes received commands with bounded latency budget |
| ArduPilot | `mavlink-sitl` (official ardupilot/ardupilot-sitl image) | Docker container | Deterministic SITL run from a scripted mission file |
| Ground Station modem | `operator-replay` (Python) | Docker container | Replays `(t, event)` script per scenario; signs envelopes per Q9 once resolved |
| Local VLM | `vlm-mock` (Python over UDS) | Docker container; UDS shared via `/tmp` volume | Returns paired `VlmAssessment` JSON; can return schema-violation responses for NFT-SEC tests |
| Wall-clock / GPS / NTP | `time-injector` (Rust) | LD_PRELOAD faketime shim into the SUT container at start | Scripted offset/jump/source-loss |

Data validation rules

| Data Type | Validation | Invalid Examples | Expected System Behaviour |
| --- | --- | --- | --- |
| Mission JSON | `mission-schema` (shared with `missions` repo) | missing required field; coord out of [-180, 180]; unknown enum value | system refuses; mission-state stays at last-known; health flips mission-config-source = yellow; structured-log at WARN with `schema_violation_field` |
| Map-object record | suite-level mapobjects schema | non-finite coordinate; class_group not in catalogue; missing MGRS | record dropped; counter `mapobjects_rejected_total` increments; structured-log at WARN |
| Tier-1 `Detections` stream | `expected_detections.schema.json` (normalised-box) | bbox coord ∉ [0, 1]; confidence ∉ [0, 1]; class_id ∉ {0..18} | frame's detections dropped (not partially used); `tier1_invalid_frame_total` increments; per AC D6 the system must surface a structured WARN |
| MAVLink message | MAVLink v2 dialect (per ArduPilot) | unknown MSG_ID; CRC mismatch; (if Q6 resolves to "signing on") missing signature | message dropped; if signing required and missing → security WARN; airframe-link health unaffected for individual drops |
| Operator command envelope | Q9 scheme (TBD) | replay (sequence_id seen recently); signature invalid; timestamp outside replay window | rejected at the boundary; no state mutation; security WARN with reason code; counters `operator_cmd_rejected_replay_total`, `..._signature_total`, `..._expired_total` |
| VLM `VlmAssessment` response | structured assessment schema | missing required field; wrong type; truncated JSON | fail-closed: assessment discarded; POI does NOT get the deep-analysis upgrade; structured WARN |
| RTSP frame | container-level decode | malformed H.264/265 NAL; oversized SPS | frame dropped; `frame_decode_error_total` increments; if rate falls below 10 fps for ≥5 s → T3 path triggers (zoom-in suppressed + health yellow) |
| Camera frame size | bounded crop policy (security_approach §Bounded input) | crop > configured max bytes; format not in allow-list | rejected at boundary; security WARN |
| Time source | wall-clock binding | GPS unlocked AND no NTP sync at boot | clock_source = `none`; health red until either source available |
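
The Tier-1 `Detections` row describes an all-or-nothing rule: one invalid detection causes the whole frame's detections to be dropped and `tier1_invalid_frame_total` to increment. The sketch below illustrates that rule under assumptions: the `Detection` struct, its field names, the four-element bbox layout, and the plain `u64` counter are hypothetical stand-ins, not the schema in `expected_detections.schema.json`.

```rust
/// Hypothetical in-memory form of one normalised-box detection.
pub struct Detection {
    pub class_id: u32,
    pub confidence: f32,
    /// Four normalised box coordinates; the exact layout is an assumption.
    pub bbox: [f32; 4],
}

/// Highest class id in the 19-class catalogue (0..18).
const MAX_CLASS_ID: u32 = 18;

/// True when `v` lies in [0, 1]; NaN is rejected.
fn in_unit(v: f32) -> bool {
    (0.0..=1.0).contains(&v)
}

/// Checks the three invalid examples from the table for one detection.
pub fn is_valid(d: &Detection) -> bool {
    d.class_id <= MAX_CLASS_ID && in_unit(d.confidence) && d.bbox.iter().all(|&c| in_unit(c))
}

/// Accept the frame only if every detection is valid. On any violation the
/// whole frame is dropped (never partially used) and the counter that stands
/// in for `tier1_invalid_frame_total` is incremented.
pub fn accept_frame(frame: Vec<Detection>, invalid_frames: &mut u64) -> Option<Vec<Detection>> {
    if frame.iter().all(is_valid) {
        Some(frame)
    } else {
        *invalid_frames += 1;
        None
    }
}
```

Returning `Option` rather than a filtered list makes the "not partially used" requirement hard to violate by accident at the call site.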

Deferred-fixture bridge (replay obligation)

Every <DEFERRED:> row above maps 1-to-1 to an entry in the "What is needed before /autodev can resume" table of _docs/_process_leftovers/2026-05-19_autopilot_test_fixtures.md. On every /autodev invocation, the leftovers step must re-evaluate whether any deferred fixture has landed. Once a fixture lands, the corresponding scenarios become unblocked, and the Test status line in the matching *-tests.md file moves from "DEFERRED — input fixture not yet acquired" to "READY".
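
The re-evaluation step can be pictured as a check over each scenario's required fixture paths. The sketch below assumes that "landed" simply means the file exists on disk; the `TestStatus` enum and the `status_for` function are hypothetical, and a real check would presumably also verify the SHA pin recorded in the fixture manifest.

```rust
use std::path::Path;

/// Status values mirroring the two states named in the text.
#[derive(Debug, PartialEq, Eq)]
pub enum TestStatus {
    /// At least one required fixture has not been acquired yet.
    Deferred,
    /// Every required fixture is present.
    Ready,
}

/// A scenario is READY only when all of its required fixtures exist.
/// Note: a scenario with no required fixtures is reported as READY.
pub fn status_for(required_fixtures: &[&Path]) -> TestStatus {
    if required_fixtures.iter().all(|p| p.exists()) {
        TestStatus::Ready
    } else {
        TestStatus::Deferred
    }
}
```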

The inline-authorable categories (10 and 11 in the leftovers file), namely synthetic-poi-feeds, time-drift-scripts, operator-session-scripts, and bit-scenarios, are NOT marked <DEFERRED:> in this file because they have no external dependency. Phase 4's e2e/consumer/fixtures/ generators author them when the runner scripts come online.
