mirror of https://github.com/azaion/autopilot.git synced 2026-06-21 11:01:10 +00:00

Oleksandr Bezdieniezhnykh bc40ea7300 [AZ-626] Decompose complete: 47 tasks + docs + module layout

Greenfield Steps 1-6 baseline for the autopilot rewrite from legacy
Qt/C++ to a Rust workspace.

- Remove legacy Qt/C++ tree (ai_controller, drone_controller,
  misc/camera, python_scaffold, root Dockerfile, autopilot.pro,
  legacy main.py / requirements.txt).
- Add _docs/00_problem (problem, restrictions, acceptance criteria,
  security approach, input data + fixtures).
- Add _docs/01_solution/solution_draft01.
- Add _docs/02_document (architecture, system-flows, data_model,
  glossary, decision-rationale, deployment, 13 component descriptions,
  tests/ specs, FINAL_report, module-layout).
- Add _docs/02_tasks/todo with 47 task specs (AZ-640..AZ-686, one
  bootstrap + 46 component tasks) and _dependencies_table.md.
- Add .cursor/rules/artifact-srp.mdc (single-responsibility rule for
  canonical _docs artifacts).
- Track autodev state in _docs/_autodev_state.md (Step 6 completed,
  ready for Step 7 Implement).

Jira: bootstrap AZ-626; component epics AZ-627..AZ-639; tasks
AZ-640..AZ-686. Total complexity 173 points across 12 epics.

Co-authored-by: Cursor <cursoragent@cursor.com>

2026-05-19 11:02:01 +03:00

Blackbox Tests

Authored by /test-spec Phase 2 (2026-05-19). Every scenario observes the SUT only through public surfaces (RTSP, gRPC, MAVLink, REST, operator stream, gimbal UDP, VLM IPC, health endpoint, structured logs). No scenario imports internal modules or peeks at on-device state directly.

Each scenario header records:

Summary — one-line behaviour validated.
Traces to — AC ID(s) and any RESTRICT ID.
Tier — execution tier required (U / I / B / E / HW).
Test status — READY or DEFERRED — <reason> (per the 2026-05-19 override, deferred scenarios are kept as release-gate items).

The Expected result field gives the inline pass/fail criterion; the authoritative comparison lives in _docs/00_problem/input_data/expected_results/results_report.md (referenced by row id).

Positive Scenarios — Detection Quality (functional)

FT-P-001: Tier-1 normalised-box contract conformance

Summary: Given an input frame, the autopilot must consume the Tier-1 detection stream and re-emit it in conformance with the suite's normalised-box schema (class ids 0..18, coords ∈ [0,1]). Traces to: AC Detection Quality / D6, RESTRICT Suite-level architectural splits — Tier 1 lives in ../detections. Tier: B (mock detector) + E (live ../detections). Test status: READY.

Preconditions:

SUT started; detections-mock serving recorded Tier-1 stream for image-set-existing.
e2e-consumer subscribed to the SUT's outbound normalised-box stream (observable via the operator-stream channel and via the internal-test-only /debug/detections socket IF exposed in test build; otherwise via operator-stream only).

Input data: fixtures/images/4d6e1830d211ad50.jpg (1280 px aerial frame) → encoded into rtsp-loopback as a 1-second loop.

Step	Consumer Action	Expected System Response
1	Begin RTSP playback of the frame loop	SUT consumes the frame; emits a normalised-box detection record on the operator-stream channel
2	Capture one emitted detection record	Record validates against `fixtures/schemas/expected_detections.schema.json`; every bbox coord ∈ [0,1]; class_id ∈ {0..18}

Expected outcome: D6 — schema-match + range comparison passes. Max execution time: 10 s.
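
The range part of the D6 check can be sketched as below. This is a minimal illustration, not the suite's validator: the `Detection` struct and its field names are assumptions made for the example (the authoritative shape is whatever `fixtures/schemas/expected_detections.schema.json` defines), and schema validation itself is not shown.

```rust
/// Hypothetical shape of one normalised-box detection record.
/// Field names are illustrative and not taken from the schema file.
#[derive(Debug, Clone, Copy)]
pub struct Detection {
    pub class_id: u32,
    pub x_min: f64,
    pub y_min: f64,
    pub x_max: f64,
    pub y_max: f64,
}

/// Returns Ok(()) when the record satisfies the range checks:
/// class_id in 0..=18 and every bbox coordinate in [0, 1].
pub fn check_ranges(d: &Detection) -> Result<(), String> {
    if d.class_id > 18 {
        return Err(format!("class_id {} outside 0..=18", d.class_id));
    }
    let coords = [
        ("x_min", d.x_min),
        ("y_min", d.y_min),
        ("x_max", d.x_max),
        ("y_max", d.y_max),
    ];
    for (name, value) in coords {
        // NaN fails the range test, so it is rejected as well.
        if !(0.0..=1.0).contains(&value) {
            return Err(format!("{} = {} outside [0, 1]", name, value));
        }
    }
    Ok(())
}
```

Returning a `Result` with a message rather than a bare `bool` keeps the failing field visible in the test report.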

FT-P-002: Existing-class regression vs documented baseline

Summary: Per-class precision and recall must stay within ±2 percentage points (0.02 absolute) of the pinned baseline (P=0.816, R=0.852). Traces to: AC Detection Quality — Existing-class regression / D2. Tier: E + HW (HW required for the project-level Acceptance Gate). Test status: DEFERRED — expected_results baseline JSON not yet recorded (<DEFERRED: expected_results/existing_classes_baseline.json>). Visual fixtures (5 frames) are on disk; baseline numbers depend on a recording against the currently pinned ../detections model.

Preconditions:

Baseline JSON recorded against pinned ../detections model (DEFERRED).
SUT + live ../detections running (Tier E) or HW Jetson (HW).

Input data: fixtures/images/{4d6e1830...,54f6459...,6dd601b7...,805bcf1e...,f997d093...}.jpg (5 frames).

Step	Consumer Action	Expected System Response
1	Stream each frame through RTSP	SUT emits detections per frame
2	Compare aggregated per-class P/R to baseline	each per-class P, R within ± 0.02 absolute of baseline

Expected outcome: D2 — numeric_tolerance passes. Max execution time: 60 s.
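
A minimal sketch of the D2 numeric_tolerance comparison follows. The `PrecisionRecall` type and `within_tolerance` function are hypothetical names introduced for illustration; the real baseline values come from the deferred baseline JSON.

```rust
/// Precision/recall pair for one class.
#[derive(Debug, Clone, Copy)]
pub struct PrecisionRecall {
    pub precision: f64,
    pub recall: f64,
}

/// True when both metrics are within `tol` (absolute) of the baseline.
/// The check is two-sided, following the literal "±" in the scenario.
pub fn within_tolerance(measured: PrecisionRecall, baseline: PrecisionRecall, tol: f64) -> bool {
    (measured.precision - baseline.precision).abs() <= tol
        && (measured.recall - baseline.recall).abs() <= tol
}
```

Read literally, a two-sided check also fails when a metric improves by more than 0.02. Whether improvements should pass is a decision for the expected-results definition, not something this sketch settles.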

FT-P-003: New-class precision and recall ≥80%

Summary: New target classes (black entrances, branch piles, footpaths, roads, trees, tree blocks) reach precision ≥0.80 AND recall ≥0.80 per class. Traces to: AC Detection Quality — New target classes / D1. Tier: E + HW. Test status: DEFERRED — multi-season annotated new-class eval set not acquired; annotation campaign owned by ../ai-training repo. <DEFERRED: expected_results/new_classes_pr.json>.

Preconditions:

Multi-season annotated new-class eval set acquired.
Tier-1 model updated to include the new classes.

Input data: <DEFERRED: new-class eval set across all four seasons>.

Step	Consumer Action	Expected System Response
1	Stream eval-set frames through RTSP	SUT emits detections including new-class items
2	Compute per-class P, R	each ≥ 0.80

Expected outcome: D1 — threshold_min passes for every new class. Max execution time: 120 s.
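
The threshold_min gates in this section all reduce to computing precision and recall from match counts. The sketch below assumes detections have already been matched against ground truth (the matching rule, e.g. an IoU threshold, is not specified here); `Counts` and `meets_gate` are illustrative names.

```rust
/// Per-class match counts after comparing detections with ground truth.
#[derive(Debug, Clone, Copy)]
pub struct Counts {
    pub tp: u32,
    pub fp: u32,
    pub fn_: u32,
}

/// Precision = TP / (TP + FP); None when there were no detections.
pub fn precision(c: Counts) -> Option<f64> {
    let denom = c.tp + c.fp;
    if denom == 0 { None } else { Some(f64::from(c.tp) / f64::from(denom)) }
}

/// Recall = TP / (TP + FN); None when there was no ground truth.
pub fn recall(c: Counts) -> Option<f64> {
    let denom = c.tp + c.fn_;
    if denom == 0 { None } else { Some(f64::from(c.tp) / f64::from(denom)) }
}

/// True when both metrics are defined and meet their minimums.
pub fn meets_gate(c: Counts, min_precision: f64, min_recall: f64) -> bool {
    match (precision(c), recall(c)) {
        (Some(p), Some(r)) => p >= min_precision && r >= min_recall,
        _ => false,
    }
}
```

For the new-class gate the call would be `meets_gate(counts, 0.80, 0.80)` per class. Treating an undefined metric as a failure is an assumption of this sketch.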

FT-P-004: Concealed-position recall ≥60% (initial gate)

Summary: System surfaces concealed positions (FPV hideouts, dugouts) with recall ≥0.60, accepting a high false-positive rate because operators filter the results. Traces to: AC Detection Quality — Concealed-position recall / D3. Tier: E + HW. Test status: DEFERRED — only 4 starter PNGs on disk; full multi-season annotated set required.

Input data: fixtures/semantic/semantic0[1-4].png (starter) + <DEFERRED full set>.

Step	Consumer Action	Expected System Response
1	Stream concealed-position frames	SUT emits concealed-structure POIs
2	Compute aggregate recall against ground truth	recall ≥ 0.60

Expected outcome: D3 — threshold_min passes. Max execution time: 120 s.

FT-P-005: Concealed-position precision ≥20% (initial gate)

Summary: Concealed-position precision ≥0.20 (operators filter; high-FP-accepting gate). Traces to: AC Detection Quality — Concealed-position precision / D4. Tier: E + HW. Test status: DEFERRED — same dataset gap as FT-P-004.

Input data: same as FT-P-004.

Step	Consumer Action	Expected System Response
1	Stream concealed-position frames	SUT emits POIs
2	Compute aggregate precision against ground truth	precision ≥ 0.20

Expected outcome: D4 — threshold_min passes.

FT-P-006: Footpath recall ≥70%

Summary: Footpath recall ≥0.70 across multi-season polyline-annotated eval set. Traces to: AC Detection Quality — Footpath recall / D5. Tier: E + HW. Test status: DEFERRED — <DEFERRED: footpath sequences (fresh + stale, all seasons), polyline-annotated>.

Input data: fixtures/semantic/semantic0[1-4].png (starter; 4 frames feature footpaths leading to concealment) + <DEFERRED full multi-season set>.

Step	Consumer Action	Expected System Response
1	Stream footpath-bearing frames	SUT emits footpath polyline annotations
2	Compute recall against polyline ground truth	recall ≥ 0.70

Expected outcome: D5 — threshold_min passes.

Positive Scenarios — Movement Detection Behaviour

FT-P-007: Ego-motion compensation rejects stable scene elements

Summary: With the camera platform itself moving, stable elements (tree rows, houses, roads) MUST NOT generate movement candidates; only the actual mover does. Traces to: AC Movement Detection — Stable objects MUST NOT be treated as moving / M1, RESTRICT Operational — moving camera platform. Tier: B (with paired CSVs) + E. Test status: DEFERRED — <DEFERRED: paired gimbal.csv + telemetry.csv for video01.mp4; scene must contain 1 stable tree row + 1 moving vehicle>.

Preconditions:

rtsp-loopback plays fixtures/movement/video01.mp4 at 30 fps.
gimbal-mock replays paired gimbal.csv synchronised to RTSP frame timestamps.
mavlink-sitl replays paired telemetry (position + attitude) for the same duration.

Step	Consumer Action	Expected System Response
1	Begin synchronised playback (video + gimbal + telemetry)	SUT begins consuming frames and ego-motion compensating
2	Capture every movement candidate emitted on operator-stream for the clip duration	exactly 1 candidate (the vehicle); tree-row position is NOT among candidates

Expected outcome: M1 — set_contains passes; candidate set == {vehicle}; tree-row position ∉ candidates. Max execution time: clip_duration + 10 s.

FT-P-008: Movement detection continues during zoomed-in hold

Summary: While the camera is in a zoomed-in hold on a confirmed POI, a new mover appearing in the ROI is still detected and enqueued; current ROI is preempted only if the new candidate's priority exceeds it. Traces to: AC Movement Detection — MUST continue during the zoomed-in inspection / M2. Tier: B + E. Test status: DEFERRED — <DEFERRED zoomed-in gimbal.csv + telemetry.csv pair; 1 small mover>.

Input data: fixtures/movement/video02.mp4 + DEFERRED CSV pair.

Step	Consumer Action	Expected System Response
1	Drive SUT into ZoomedIn hold via prior FT-P-016 setup	SUT in `ZoomedIn { roi, hold_started_at }`
2	Begin playback of the zoomed-in scene with the small mover	Movement candidate enqueued within ≤ 1.5 s (timing checked by NFT-PERF-L7)
3	Observe ROI lifecycle	ROI is preempted only if new candidate's `confidence × proximity × age_factor` exceeds the held ROI's; otherwise the held ROI completes

Expected outcome: M2 — exact passes; 1 candidate enqueued; ROI preempt decision matches the documented priority rule.

FT-P-009: Per-zoom-band threshold honoured (no false candidate at edge)

Summary: When a movement-cluster persists for one frame BELOW the configured per-zoom-band threshold, no candidate is emitted. Traces to: AC Movement Detection — configurable per-zoom-band false-positive budget MUST be honoured / M3. Tier: B. Test status: DEFERRED — <DEFERRED gimbal.csv simulating threshold edge>.

Input data: fixtures/movement/video03.mp4 + DEFERRED CSV.

Step	Consumer Action	Expected System Response
1	Replay scene at the threshold edge	SUT processes frames
2	Observe candidate count over the clip duration	count == 0

Expected outcome: M3 — exact passes.

FT-P-010: Movement zoomed-in benchmark FP-rate budget

Summary: Across the zoom-out + zoomed-in benchmark suite, false-positive rate per zoom band stays within the configurable per-zoom-band budget (Q14 fallback trigger). Traces to: AC Q-tagged criteria — Movement detection FP rate at zoomed-in inspection / M4 (depends on Q14). Tier: B + E. Test status: DEFERRED — <DEFERRED: zoom-out + zoomed-in benchmark suite + expected_results/movement_benchmark_caps.json; Q14>.

Input data: fixtures/movement/video04.mp4 (visual ref) + DEFERRED benchmark suite.

Step	Consumer Action	Expected System Response
1	Replay the benchmark suite end-to-end	SUT processes all frames
2	Aggregate FP candidates per zoom band	rate per band ≤ configured cap (default ≤ 0.5 / min at zoomed-in)

Expected outcome: M4 — threshold_max passes per zoom band.
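
The per-band budget check can be sketched as a rate computation. This assumes the cap is expressed in false positives per minute of footage in that zoom band, which matches the quoted default (≤ 0.5 / min) but is otherwise an interpretation; the function names are hypothetical.

```rust
/// False-positive rate in candidates per minute, or None for an empty clip.
pub fn fp_rate_per_min(fp_count: u32, duration_s: f64) -> Option<f64> {
    if duration_s > 0.0 {
        Some(f64::from(fp_count) * 60.0 / duration_s)
    } else {
        None
    }
}

/// True when the band's measured rate is at or below its configured cap.
/// A band with no footage cannot be assessed and is reported as failing.
pub fn within_budget(fp_count: u32, duration_s: f64, cap_per_min: f64) -> bool {
    match fp_rate_per_min(fp_count, duration_s) {
        Some(rate) => rate <= cap_per_min,
        None => false,
    }
}
```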

Positive Scenarios — Scan & Camera Control

FT-P-011: Sweep → zoomed-inspection transition + POI enqueue

Summary: A POI detected mid-sweep triggers a transition into zoomed-inspection within 2 s (timing: NFT-PERF-L8) AND the POI is enqueued correctly. Traces to: AC Scan & Camera Control — Transition from sweep to detailed inspection / S1. Tier: B + E. Test status: DEFERRED — <DEFERRED: scripted mission with planned route + simulated POI detected mid-sweep>.

Input data: scripted MAVLink mission + scripted Tier-1 detection injection at known frame index.

Step	Consumer Action	Expected System Response
1	Start SUT with scripted mission; begin RTSP playback	SUT enters `ZoomedOut`, performs sweep
2	Inject Tier-1 detection of a high-confidence target at frame N	SUT transitions to `ZoomedIn { roi, hold_started_at }`; ROI bbox matches the injected detection's bbox; POI queue length increments by 1

Expected outcome: S1 — exact (transition) + exact (ROI matches POI bbox) + exact (queue Δ+1).

FT-P-012: Footpath-pan during zoomed-in hold

Summary: During a zoomed-in hold on a footpath ROI, the camera pans along the footpath while the airframe continues to fly. The footpath stays in the centre 50% of frame for the duration of the hold. Traces to: AC Scan & Camera Control — pan to keep features visible / S2. Tier: B + E. Test status: DEFERRED — <DEFERRED: zoomed-inspection scenario with footpath polyline overlapping the ROI>.

Step	Consumer Action	Expected System Response
1	Drive SUT into ZoomedIn hold on a footpath ROI	SUT in `ZoomedIn { roi, hold_started_at }`
2	Continue airframe flight; observe gimbal commands stream	SUT issues pan commands to track the footpath; per-frame centre offset ≤ 25% of the frame dimension

Expected outcome: S2 — numeric_tolerance passes; per-frame centre offset ≤ 0.25 × frame_dim.
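
A per-frame centre-window check might look like the sketch below. It assumes the tracked feature is reduced to one representative pixel position per frame (for a footpath, for example, a point on the polyline inside the ROI) and that the offset limit applies per axis; both are assumptions, and the names are hypothetical.

```rust
/// Offset of a point from the frame centre on each axis,
/// expressed as a fraction of the frame size on that axis.
pub fn centre_offset(x_px: f64, y_px: f64, width_px: f64, height_px: f64) -> (f64, f64) {
    (
        (x_px - width_px / 2.0).abs() / width_px,
        (y_px - height_px / 2.0).abs() / height_px,
    )
}

/// True when the point lies within `max_offset` of the centre on both axes.
/// max_offset = 0.25 corresponds to the centre 50% of the frame.
pub fn within_centre_window(
    x_px: f64,
    y_px: f64,
    width_px: f64,
    height_px: f64,
    max_offset: f64,
) -> bool {
    let (dx, dy) = centre_offset(x_px, y_px, width_px, height_px);
    dx <= max_offset && dy <= max_offset
}
```

The same helper would cover a tighter window by passing a smaller `max_offset`, for instance 0.125 for a centre-25% window.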

FT-P-013: Target-follow centre-window

Summary: After operator confirmation, target-follow mode keeps the target within the centre 25% of frame while visible. Traces to: AC Scan & Camera Control — target-follow mode / S3. Tier: B + E. Test status: DEFERRED — <DEFERRED: operator-confirmed target + 60 s follow window>.

Step	Consumer Action	Expected System Response
1	Drive SUT into `TargetFollow { target_id, started_at }` via prior FT-P-016	mode == target-follow
2	Observe gimbal commands + per-frame target position for 60 s	per-frame target position stays within the centre 25% of frame

Expected outcome: S3 — threshold_max passes per frame.

FT-P-014: POI queue ordering by `confidence × proximity × age_factor`

Summary: With 3 POIs varying in confidence × proximity × age_factor, the system pops them in the documented relative order. Traces to: AC Scan & Camera Control — POI queue MUST be ordered by … / S4. Tier: B. Test status: READY (synthetic-poi-feeds inline-authorable).

Input data: synthetic-poi-feeds ordering test — 3 POIs with confidence ∈ {0.50, 0.80, 0.60}, proximity ∈ {near, mid, far}, age_factor ∈ {fresh, fresh, stale} chosen to produce a known relative ordering.

Step	Consumer Action	Expected System Response
1	Inject the 3 POIs as Tier-1 detections	all 3 enter the queue
2	Observe ZoomedIn transitions over the next N seconds	SUT inspects POIs in the documented relative order

Expected outcome: S4 — exact (order) passes.
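
The ordering rule can be illustrated with a small sort on the product `confidence × proximity × age_factor`. The spec gives proximity and age as categories (near/mid/far, fresh/stale), so the numeric factors used with this sketch are assumptions, and `Poi`, `inspection_order` and `should_preempt` are hypothetical names.

```rust
/// One queued POI with already-numeric priority factors.
/// How near/mid/far and fresh/stale map to numbers is an assumption.
#[derive(Debug, Clone)]
pub struct Poi {
    pub id: u32,
    pub confidence: f64,
    pub proximity: f64,
    pub age_factor: f64,
}

impl Poi {
    /// Priority is the plain product of the three factors.
    pub fn priority(&self) -> f64 {
        self.confidence * self.proximity * self.age_factor
    }
}

/// Returns POI ids in inspection order, highest priority first.
pub fn inspection_order(pois: &[Poi]) -> Vec<u32> {
    let mut sorted: Vec<&Poi> = pois.iter().collect();
    // total_cmp gives a total order on f64, so the sort cannot panic on NaN.
    sorted.sort_by(|a, b| b.priority().total_cmp(&a.priority()));
    sorted.iter().map(|p| p.id).collect()
}

/// Preemption rule: a new candidate preempts the held ROI only when
/// its priority is strictly higher.
pub fn should_preempt(candidate: &Poi, held: &Poi) -> bool {
    candidate.priority() > held.priority()
}
```

With illustrative factors near = 1.0, mid = 0.6, far = 0.3, fresh = 1.0 and stale = 0.5, the three POIs from the input data score 0.50, 0.48 and 0.09. The expected order asserted by the real test must come from the documented rule, not from these placeholder weights.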

FT-P-015: Zoomed-in hold cap interacts with deep-analysis

Summary: Zoomed-in hold defaults to 5 s/POI; with deep-analysis enabled, the hold ends when the analysis completes or at the 2 s deep-analysis cap, whichever comes first, i.e. hold duration = min(2 s, deep_analysis_complete_at); with deep-analysis disabled, the hold runs to the 5 s per-POI timeout. Traces to: AC Scan & Camera Control — hold endpoints up to 2 s for deep analysis … per-POI timeout (default 5 s/POI) / S5. Tier: B + E. Test status: DEFERRED — <DEFERRED: VLM-enabled hold scenario with vlm_io_pair returning within 2 s>.

Step	Consumer Action	Expected System Response
1	Drive SUT into ZoomedIn hold; enable deep-analysis	SUT begins VLM IPC call on enter
2a	Case A: VLM returns at 1.5 s	hold ends at 1.5 s (deep_analysis_complete)
2b	Case B: VLM returns at 3.0 s	hold ends at 2.0 s (deep-analysis cap)
2c	Case C: deep-analysis disabled	hold ends at 5.0 s (per-POI timeout)

Expected outcome: S5 — exact passes for each case.
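
The three cases in the table can be captured by one small function. This is a sketch using the default values quoted in the scenario; the enum and constant names are hypothetical, and in the real system both limits may be configurable.

```rust
/// Default per-POI hold timeout, in milliseconds.
pub const PER_POI_TIMEOUT_MS: u64 = 5_000;
/// Cap on how long a hold waits for deep analysis, in milliseconds.
pub const DEEP_ANALYSIS_CAP_MS: u64 = 2_000;

/// Deep-analysis state for one hold.
#[derive(Debug, Clone, Copy)]
pub enum DeepAnalysis {
    /// Deep analysis is switched off for this hold.
    Disabled,
    /// Deep analysis is on and the VLM reply arrives this many
    /// milliseconds after the hold starts.
    ReturnsAtMs(u64),
}

/// Expected hold duration in milliseconds.
pub fn hold_duration_ms(deep_analysis: DeepAnalysis) -> u64 {
    match deep_analysis {
        // Case C: no deep analysis, so the hold runs to the per-POI timeout.
        DeepAnalysis::Disabled => PER_POI_TIMEOUT_MS,
        // Cases A and B: end at the reply or at the cap, whichever is first,
        // and never beyond the per-POI timeout.
        DeepAnalysis::ReturnsAtMs(t) => t.min(DEEP_ANALYSIS_CAP_MS).min(PER_POI_TIMEOUT_MS),
    }
}
```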

Positive Scenarios — Operator Workflow

FT-P-016: Operator confirm → middle waypoint inserted + target-follow

Summary: Valid + signed operator-confirm command results in a middle waypoint POSTed to missions AND a transition into target-follow mode. Traces to: AC Operator Workflow — Operator confirmation MUST result in … / O8. Tier: B + E. Test status: READY for happy path (default placeholder envelope until Q9 resolves; envelope replaced when Q9 ships).

Input data: operator-envelopes (valid happy path) + mission-suite-fixture (DEFERRED full version) + operator-session-scripts (nominal session).

Step	Consumer Action	Expected System Response
1	SUT in ZoomedIn hold on a POI surfaced to the operator	mode == ZoomedIn
2	Replay operator-confirm envelope on the return path	SUT validates envelope; commits decision
3	Observe HTTPS POST to `missions-mock`	`POST /missions/{id}` with a middle waypoint at the POI MGRS; HTTP 200
4	Observe scan-mode state	mode == `TargetFollow { target_id, started_at }`

Expected outcome: O8 — exact (HTTP 200) + exact (mode == TargetFollow).

FT-P-017: Decision window = 30 s at conf = 0.40

Summary: At confidence = 0.40 the decision window surfaced to the operator MUST equal 30 s (lower-bound anchor of the linear scale). Traces to: AC Operator Workflow — decision window … 40% confidence → 30 s / O1. Tier: B. Test status: READY.

Step	Consumer Action	Expected System Response
1	Inject a synthetic POI at conf=0.40	POI surfaced on operator-stream with `decision_window_seconds: 30`

Expected outcome: O1 — exact (window == 30 s).

FT-P-018: Decision window = 120 s at conf = 1.00

Summary: At confidence = 1.00 the decision window MUST equal 120 s (upper-bound anchor). Traces to: AC O2. Tier: B. Test status: READY.

Step	Consumer Action	Expected System Response
1	Inject a synthetic POI at conf=1.00	window == 120 s

Expected outcome: O2 — exact.

FT-P-019: Decision window linear interpolation at conf = 0.70

Summary: At conf=0.70 the window is interpolated linearly between (0.40, 30 s) and (1.00, 120 s) → 75 s ± 0.5 s. Traces to: AC O3. Tier: B. Test status: READY.

Step	Consumer Action	Expected System Response
1	Inject a synthetic POI at conf=0.70	window ≈ 75 s ± 0.5 s

Expected outcome: O3 — numeric_tolerance ± 0.5 s.
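
The three anchor scenarios above describe one linear scale, which can be sketched as follows. The function and constant names are hypothetical; returning `None` below 0.40 mirrors the rule that such POIs are not surfaced, while returning `None` above 1.00 is an assumption of this sketch.

```rust
/// Lower anchor of the scale: confidence 0.40 maps to a 30 s window.
pub const CONF_MIN: f64 = 0.40;
pub const WINDOW_MIN_S: f64 = 30.0;
/// Upper anchor of the scale: confidence 1.00 maps to a 120 s window.
pub const CONF_MAX: f64 = 1.00;
pub const WINDOW_MAX_S: f64 = 120.0;

/// Decision window in seconds for a POI, or None when the confidence
/// falls outside [0.40, 1.00] and no window applies.
pub fn decision_window_seconds(confidence: f64) -> Option<f64> {
    if !(CONF_MIN..=CONF_MAX).contains(&confidence) {
        return None;
    }
    // Fraction of the way from the lower anchor to the upper anchor.
    let t = (confidence - CONF_MIN) / (CONF_MAX - CONF_MIN);
    Some(WINDOW_MIN_S + t * (WINDOW_MAX_S - WINDOW_MIN_S))
}
```

At conf = 0.70 the fraction is 0.5, giving 30 + 0.5 × 90 = 75 s. Floating-point rounding is why that scenario uses a ± 0.5 s tolerance rather than an exact match.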

FT-P-020: Operator decline → persistent ignored-item

Summary: Operator-decline on a surfaced POI MUST persist an ignored-item entry keyed by (MGRS cell, class_group). Traces to: AC Operator Workflow — Operator-decline MUST result in a persistent ignored-item entry / O5. Tier: B + E. Test status: READY (operator-session-scripts inline-authorable; envelope uses default placeholder until Q9 resolves).

Step	Consumer Action	Expected System Response
1	Surface a POI to the operator	POI on operator-stream
2	Replay operator-decline envelope	SUT validates; ignored-item count via health endpoint increments by 1; new item has `(MGRS, class_group)` matching the declined POI

Expected outcome: O5 — exact (count Δ+1) + schema_match (ignored-item record shape).

FT-P-021: Ignored-item suppresses future matching detections

Summary: A new detection whose (MGRS, class_group) matches an existing ignored-item MUST NOT be surfaced to the operator. Traces to: AC Operator Workflow — A new detection whose (MGRS, class_group) matches an existing ignored-item MUST NOT be surfaced / O6. Tier: B + E. Test status: READY.

Step	Consumer Action	Expected System Response
1	Seed an ignored-item for `(MGRS=X, class_group=Y)` via FT-P-020	ignored-item present
2	Inject a new detection at `(MGRS=X, class_group=Y)`	operator-stream emits NO POI for this detection; counter `pois_suppressed_by_ignored_total` increments

Expected outcome: O6 — exact (count surfaced == 0).

FT-P-022: Operator timeout = forget (no ignored-item)

Summary: If the decision window expires with no operator response, the POI is removed from the queue but NO ignored-item is created (forget, do not blacklist). Traces to: AC Operator Workflow — Timeout (no operator response within the window) MUST NOT create an ignored-item entry / O7. Tier: B + E. Test status: READY.

Step	Consumer Action	Expected System Response
1	Surface a POI at conf=0.40 (30 s window)	POI on operator-stream
2	Wait > 30 s with no response	POI removed from queue; ignored-item count UNCHANGED

Expected outcome: O7 — exact (queue −1) + exact (ignored-item count unchanged).
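
The decline, suppress and timeout behaviour across FT-P-020 to FT-P-022 can be summarised in a small in-memory model. This is only a sketch of the observable contract: all type and method names are hypothetical, and real persistence across restarts is not modelled.

```rust
use std::collections::HashSet;

/// Key under which a declined POI is remembered.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct IgnoredKey {
    pub mgrs_cell: String,
    pub class_group: String,
}

/// How an operator decision window ended.
#[derive(Debug, Clone, Copy)]
pub enum OperatorOutcome {
    Confirm,
    Decline,
    Timeout,
}

/// In-memory stand-in for the persistent ignored-item store.
#[derive(Debug, Default)]
pub struct IgnoredItems {
    keys: HashSet<IgnoredKey>,
}

impl IgnoredItems {
    /// Only an explicit decline creates an ignored-item entry;
    /// confirm and timeout leave the store unchanged.
    pub fn record(&mut self, key: IgnoredKey, outcome: OperatorOutcome) {
        if let OperatorOutcome::Decline = outcome {
            self.keys.insert(key);
        }
    }

    /// A detection is surfaced only when its key is not ignored.
    pub fn should_surface(&self, key: &IgnoredKey) -> bool {
        !self.keys.contains(key)
    }

    /// Number of ignored-item entries.
    pub fn len(&self) -> usize {
        self.keys.len()
    }
}
```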

Positive Scenarios — Pre-flight & Map Reconciliation

FT-P-023: BIT pre-flight pass with every dependency healthy

Summary: When every external dependency is reachable + healthy AND on-device storage < 95 % full AND wall-clock is bound, BIT passes and takeoff is permitted. Traces to: AC Reliability & Safety — Pre-flight self-test MUST pass / R1, RESTRICT Reliability & Safety obligations — Pre-flight self-test (BIT) MUST gate takeoff. Tier: B + E. Test status: READY (bit-scenarios inline-authorable).

Step	Consumer Action	Expected System Response
1	Bring up all mocks healthy + clean autopilot-state volume	every dependency green
2	Trigger BIT via the BIT-arm operator command (or scripted in `operator-session-scripts`)	health endpoint returns `{ "ok": true, "deps": { ...all green }, "takeoff_permitted": true }`

Expected outcome: R1 — exact (takeoff_permitted == true) + exact (health.all == "green").

FT-P-024: Pre-flight map pull ≤ 30 s for a 30×30 km region

Summary: Pulling the area-level map of previously-detected objects for a 30 km × 30 km mission area MUST complete within 30 s wall-clock. Traces to: AC Map Reconciliation — Pre-flight map pull / Mp1. Tier: B + E. Test status: DEFERRED — <DEFERRED: mock central area-map service with ~10000 map objects for the 30 km × 30 km region>.

Step	Consumer Action	Expected System Response
1	Configure `missions-mock` with the 30×30 km mapobjects fixture	mock ready
2	Trigger BIT (which pulls the map)	SUT issues `GET /missions/{id}/mapobjects`; local copy hydrated within 30 s
3	Confirm BIT proceeds normally afterwards	takeoff permitted

Expected outcome: Mp1 — threshold_max passes (NFT-PERF measures the latency; this scenario asserts the functional pathway).

FT-P-025: Post-flight map diff push for a 60-minute mission

Summary: Pushing the post-flight pass diff (~17 500 records: NEW + MOVED + REMOVED + CONFIRMED-EXISTING) for a 60-minute mission MUST complete within 120 s wall-clock. Traces to: AC Map Reconciliation — Post-flight pass diff push / Mp3. Tier: B + E. Test status: DEFERRED — <DEFERRED: 60-minute mission pass diff fixture>.

Step	Consumer Action	Expected System Response
1	Land the SUT after a 60-minute mission (scripted)	SUT enters post-flight reconciliation
2	Observe HTTPS POST to `missions-mock`	`POST /missions/{id}/mapobjects` with the diff; HTTP 200 within 120 s

Expected outcome: Mp3 — threshold_max passes (NFT-PERF measures latency).

FT-P-026: MapObjects conflict resolution (append-only + projection)

Summary: When two map updates conflict for the same (spatial-cell, class_group), the SUT records both observations append-only AND computes the current view per the documented resolution rule. Traces to: AC Q-tagged — MapObjects conflict resolution / Mp5 (depends on Q8). Tier: B + E. Test status: DEFERRED — <DEFERRED: conflict pair fixture + expected_results/mapobjects_conflict_resolution.json; Q8>.

Step	Consumer Action	Expected System Response
1	Seed the local mapobjects store via map pull	local store hydrated
2	Trigger two conflicting observations for `(cell=X, class=Y)`	both appended to the observation log
3	Observe the projected current view (via the operator-stream map-overlay channel or health debug)	current view matches the resolution rule (Q8)

Expected outcome: Mp5 — json_diff passes against the reference.

Negative Scenarios

FT-N-001: BIT inhibits takeoff when Tier-1 detection is unreachable

Summary: When ../detections is unreachable at BIT, takeoff MUST be inhibited and the detection dependency MUST report red. Traces to: AC Reliability & Safety — Pre-flight self-test MUST pass / R2, RESTRICT Suite-level architectural splits — Tier 1 lives in ../detections. Tier: B + E. Test status: READY.

Step	Consumer Action	Expected System Response
1	Stop `detections-mock`	dependency unreachable
2	Trigger BIT	health endpoint returns `takeoff_permitted: false`; `deps.detection == "red"`; operator-stream surfaces a BIT-failure event with category `detection`
3	Attempt to issue a takeoff MAVLink command (scripted)	SUT refuses; no MAVLink takeoff command observed on `mavlink-sitl`

Expected outcome: R2 — exact (takeoff inhibited).

FT-N-002: BIT inhibits takeoff when persistent storage ≥ 95 % full

Summary: When the on-device persistent store is ≥ 95 % full at BIT, takeoff MUST be inhibited. Traces to: AC Reliability & Safety — Pre-flight self-test MUST pass / R3, RESTRICT On-device storage MUST be bounded. Tier: B. Test status: READY.

Step	Consumer Action	Expected System Response
1	Pre-fill `autopilot-state` volume to ≥ 95 % via seed file	storage threshold tripped
2	Trigger BIT	`takeoff_permitted: false`; `deps.storage == "red"`

Expected outcome: R3 — exact (takeoff inhibited).

FT-N-003: Cache-fallback on map-pull timeout requires operator acknowledgement

Summary: When the pre-flight map pull times out, the SUT falls back to last-known cached MapObjects, reports map_sync == "cached_fallback", AND MUST require explicit operator acknowledgement before takeoff is permitted. Traces to: AC Map Reconciliation — Cache-fallback on timeout is acceptable only with explicit operator acknowledgement / Mp2. Tier: B + E. Test status: READY (operator-session-scripts inline-authorable; cached state seeded from prior pull).

Step	Consumer Action	Expected System Response
1	Seed `autopilot-state` with a known prior MapObjects snapshot	cached map present
2	Configure `missions-mock` to time out on `GET /missions/{id}/mapobjects`	mock returns 504 / silent timeout
3	Trigger BIT	SUT falls back to cached; `map_sync == "cached_fallback"`; BIT reports `takeoff_permitted: false, awaiting_ack: ["map_cache_fallback"]`
4	Replay operator-ack envelope for `map_cache_fallback`	BIT now reports `takeoff_permitted: true`; one structured-log entry at WARN with `map_cache_fallback_acked_by_operator`
5	In a separate run, repeat steps 1-3 and attempt takeoff WITHOUT replaying the ack	takeoff remains inhibited

Expected outcome: Mp2 — exact (cached_fallback) + exact (BIT requires explicit ack).
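
The BIT gate exercised by FT-P-023 and FT-N-001 to FT-N-003 can be sketched as a pure decision function. The input and result types are hypothetical simplifications of the health endpoint payload: dependency health is collapsed to one flag, and the `"map_cache_fallback"` token is copied from the expected response in FT-N-003.

```rust
/// State of the pre-flight map pull.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MapSync {
    Fresh,
    CachedFallback,
}

/// Simplified BIT inputs; field names are illustrative.
#[derive(Debug, Clone, Copy)]
pub struct BitInput {
    pub all_deps_green: bool,
    pub storage_used_fraction: f64,
    pub wall_clock_bound: bool,
    pub map_sync: MapSync,
    pub map_fallback_acked: bool,
}

/// Simplified BIT verdict.
#[derive(Debug, PartialEq, Eq)]
pub struct BitResult {
    pub takeoff_permitted: bool,
    pub awaiting_ack: Vec<&'static str>,
}

pub fn evaluate_bit(input: &BitInput) -> BitResult {
    let mut awaiting_ack = Vec::new();
    // A cached map is acceptable only after explicit operator acknowledgement.
    if input.map_sync == MapSync::CachedFallback && !input.map_fallback_acked {
        awaiting_ack.push("map_cache_fallback");
    }
    // Storage at or above 95 % full inhibits takeoff.
    let healthy = input.all_deps_green
        && input.wall_clock_bound
        && input.storage_used_fraction < 0.95;
    BitResult {
        takeoff_permitted: healthy && awaiting_ack.is_empty(),
        awaiting_ack,
    }
}
```

Keeping the gate as a pure function of its inputs makes each negative scenario a single changed field against the healthy baseline.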

FT-N-004: Below-threshold POI suppression (conf < 40 %)

Summary: A POI at confidence < 0.40 MUST NOT be surfaced to the operator at all. Traces to: AC Operator Workflow — Below 40% confidence, the POI MUST NOT be surfaced at all / O4. Tier: B. Test status: READY.

Step	Consumer Action	Expected System Response
1	Inject a synthetic POI at conf=0.39	POI does NOT appear on operator-stream; counter `pois_below_threshold_total` increments by 1

Expected outcome: O4 — exact (count surfaced == 0).

Notes for downstream skills

Decompose: every READY scenario above maps to at least one blackbox test task. DEFERRED scenarios MUST still produce a task spec (so the implementation has a placeholder), but the task spec's Acceptance section will reference the leftover entry that gates the fixture.
Implement Tests: per-scenario assertion helpers (RTSP playback orchestration, MAVLink observer, operator-stream observer) are likely shared across scenarios — Phase 4's runner scripts will assume a thin e2e/consumer/lib/ module that all scenarios depend on.
Test-Spec Sync (cycle-update mode): post-implementation, scenarios may be split (e.g. FT-P-015's three sub-cases may become FT-P-015a/b/c) or merged. The traceability-matrix is the source of truth — every scenario MUST trace to at least one AC or RESTRICT.
