Oleksandr Bezdieniezhnykh bc40ea7300 [AZ-626] Decompose complete: 47 tasks + docs + module layout
Greenfield Steps 1-6 baseline for the autopilot rewrite from legacy
Qt/C++ to a Rust workspace.

- Remove legacy Qt/C++ tree (ai_controller, drone_controller,
  misc/camera, python_scaffold, root Dockerfile, autopilot.pro,
  legacy main.py / requirements.txt).
- Add _docs/00_problem (problem, restrictions, acceptance criteria,
  security approach, input data + fixtures).
- Add _docs/01_solution/solution_draft01.
- Add _docs/02_document (architecture, system-flows, data_model,
  glossary, decision-rationale, deployment, 13 component descriptions,
  tests/ specs, FINAL_report, module-layout).
- Add _docs/02_tasks/todo with 47 task specs (AZ-640..AZ-686, one
  bootstrap + 46 component tasks) and _dependencies_table.md.
- Add .cursor/rules/artifact-srp.mdc (single-responsibility rule for
  canonical _docs artifacts).
- Track autodev state in _docs/_autodev_state.md (Step 6 completed,
  ready for Step 7 Implement).

Jira: bootstrap AZ-626; component epics AZ-627..AZ-639; tasks
AZ-640..AZ-686. Total complexity 173 points across 12 epics.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-19 11:02:01 +03:00

Blackbox Tests

Authored by /test-spec Phase 2 (2026-05-19). Every scenario observes the SUT only through public surfaces (RTSP, gRPC, MAVLink, REST, operator stream, gimbal UDP, VLM IPC, health endpoint, structured logs). No scenario imports internal modules or peeks at on-device state directly.

Each scenario header records:

  • Summary: one-line behaviour validated.
  • Traces to: AC ID(s) and any RESTRICT ID.
  • Tier: execution tier required (U / I / B / E / HW).
  • Test status: READY, or DEFERRED — <reason> (per the 2026-05-19 override, DEFERRED scenarios are kept as release-gate items).

The Expected outcome field gives the inline pass/fail criterion; the authoritative comparison lives in _docs/00_problem/input_data/expected_results/results_report.md (referenced by row id).


Positive Scenarios — Detection Quality (functional)

FT-P-001: Tier-1 normalised-box contract conformance

Summary: Frame in → autopilot must consume and re-emit the Tier-1 detection stream conforming to the suite's normalised-box schema (class ids 0..18, coords ∈ [0,1]). Traces to: AC Detection Quality / D6, RESTRICT Suite-level architectural splits — Tier 1 lives in ../detections. Tier: B (mock detector) + E (live ../detections). Test status: READY.

Preconditions:

  • SUT started; detections-mock serving recorded Tier-1 stream for image-set-existing.
  • e2e-consumer subscribed to the SUT's outbound normalised-box stream (observable via the operator-stream channel and via the internal-test-only /debug/detections socket IF exposed in test build; otherwise via operator-stream only).

Input data: fixtures/images/4d6e1830d211ad50.jpg (1280 px aerial frame) → encoded into rtsp-loopback as a 1-second loop.

Step | Consumer Action | Expected System Response
1 | Begin RTSP playback of the frame loop | SUT consumes the frame; emits a normalised-box detection record on the operator-stream channel
2 | Capture one emitted detection record | Record validates against fixtures/schemas/expected_detections.schema.json; every bbox coord ∈ [0,1]; class_id ∈ {0..18}

Expected outcome: D6 — schema-match + range comparison passes. Max execution time: 10 s.
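
The range half of the D6 comparison can be sketched in Python as follows. This is a minimal sketch, not the e2e consumer's actual helper: the record shape `{"class_id": int, "bbox": [x_min, y_min, x_max, y_max]}` and the function name `range_violations` are hypothetical, and the schema-match half is left to a JSON-Schema validator run against fixtures/schemas/expected_detections.schema.json.

```python
from typing import Any

CLASS_IDS = range(0, 19)  # class ids 0..18 inclusive, per the FT-P-001 contract


def range_violations(record: dict[str, Any]) -> list[str]:
    """Return human-readable range violations for one normalised-box record.

    Assumes a hypothetical shape {"class_id": int, "bbox": [x_min, y_min, x_max, y_max]};
    the real field names come from expected_detections.schema.json.
    """
    problems: list[str] = []
    class_id = record.get("class_id")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(class_id, int) or isinstance(class_id, bool) or class_id not in CLASS_IDS:
        problems.append(f"class_id {class_id!r} not in 0..18")
    bbox = record.get("bbox")
    if not isinstance(bbox, list) or len(bbox) != 4:
        problems.append(f"bbox {bbox!r} is not a 4-element list")
        return problems
    for i, coord in enumerate(bbox):
        # Every normalised coordinate must lie in the closed interval [0, 1].
        if not isinstance(coord, (int, float)) or not 0.0 <= coord <= 1.0:
            problems.append(f"bbox[{i}] = {coord!r} outside [0, 1]")
    return problems
```

Returning a list of violations rather than a bool keeps the failure message useful when one record out of many breaks the contract.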


FT-P-002: Existing-class regression vs documented baseline

Summary: Per-class precision and recall must stay within ±2 percentage points (0.02 absolute) of the pinned baseline (P=0.816, R=0.852). Traces to: AC Detection Quality — Existing-class regression / D2. Tier: E + HW (HW required for the project-level Acceptance Gate). Test status: DEFERRED — expected_results baseline JSON not yet recorded (<DEFERRED: expected_results/existing_classes_baseline.json>). Visual fixtures (5 frames) are on disk; baseline numbers depend on a recording against the currently pinned ../detections model.

Preconditions:

  • Baseline JSON recorded against pinned ../detections model (DEFERRED).
  • SUT + live ../detections running (Tier E) or HW Jetson (HW).

Input data: fixtures/images/{4d6e1830...,54f6459...,6dd601b7...,805bcf1e...,f997d093...}.jpg (5 frames).

Step | Consumer Action | Expected System Response
1 | Stream each frame through RTSP | SUT emits detections per frame
2 | Compare aggregated per-class P/R to baseline | each per-class P, R within ± 0.02 absolute of baseline

Expected outcome: D2 — numeric_tolerance passes. Max execution time: 60 s.
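
Once the baseline JSON is recorded, the D2 numeric_tolerance check reduces to a per-class, per-metric absolute comparison. A minimal sketch, assuming a hypothetical `{class_name: {"precision": float, "recall": float}}` shape for both the measured results and existing_classes_baseline.json; `regressed_classes` is a hypothetical helper name.

```python
TOLERANCE = 0.02  # ± 2 percentage points, absolute
EPS = 1e-9        # absorbs float noise so a diff of exactly 0.02 still passes


def regressed_classes(measured, baseline, tol=TOLERANCE):
    """Return (class, metric, measured, baseline) tuples that fall outside ± tol.

    measured / baseline: {class_name: {"precision": float, "recall": float}}.
    A class missing from `measured` counts as a failure for both metrics.
    """
    failures = []
    for cls, ref in sorted(baseline.items()):
        got = measured.get(cls, {})
        for metric in ("precision", "recall"):
            value = got.get(metric)
            if value is None or abs(value - ref[metric]) > tol + EPS:
                failures.append((cls, metric, value, ref[metric]))
    return failures
```

The check is two-sided, matching step 2 of the table: an unexpected jump in precision or recall also flags a mismatch with the pinned model.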


FT-P-003: New-class precision and recall ≥80%

Summary: New target classes (black entrances, branch piles, footpaths, roads, trees, tree blocks) reach precision ≥0.80 AND recall ≥0.80 per class. Traces to: AC Detection Quality — New target classes / D1. Tier: E + HW. Test status: DEFERRED — multi-season annotated new-class eval set not acquired; annotation campaign owned by ../ai-training repo. <DEFERRED: expected_results/new_classes_pr.json>.

Preconditions:

  • Multi-season annotated new-class eval set acquired.
  • Tier-1 model updated to include the new target classes listed in the Summary.

Input data: <DEFERRED: new-class eval set across all four seasons>.

Step | Consumer Action | Expected System Response
1 | Stream eval-set frames through RTSP | SUT emits detections including new-class items
2 | Compute per-class P, R | each ≥ 0.80

Expected outcome: D1 — threshold_min passes for every new class. Max execution time: 120 s.
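
The threshold_min gates in FT-P-003 through FT-P-006 all start from true-positive, false-positive and false-negative counts. The sketch below assumes those counts were already produced by some matching step (the box or polyline matching criterion is not specified here and is left out); `Counts` and `failing_classes` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    tp: int  # detections matched to a ground-truth object
    fp: int  # detections with no ground-truth match
    fn: int  # ground-truth objects with no matching detection


def precision(c: Counts) -> float:
    return c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0


def recall(c: Counts) -> float:
    return c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0


def failing_classes(per_class: dict[str, Counts], p_min: float = 0.80, r_min: float = 0.80) -> list[str]:
    """Classes that miss either threshold (threshold_min: value >= min passes)."""
    return sorted(
        cls for cls, c in per_class.items() if precision(c) < p_min or recall(c) < r_min
    )
```

The same helpers could cover the aggregate concealed-position gates by summing the counts into a single entry and passing, for example, `p_min=0.0, r_min=0.60` for D3 or `p_min=0.20, r_min=0.0` for D4.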


FT-P-004: Concealed-position recall ≥60% (initial gate)

Summary: System surfaces concealed positions (FPV hideouts, dugouts) with recall ≥0.60, accepting high false-positive rate as operators filter. Traces to: AC Detection Quality — Concealed-position recall / D3. Tier: E + HW. Test status: DEFERRED — only 4 starter PNGs on disk; full multi-season annotated set required.

Input data: fixtures/semantic/semantic0[1-4].png (starter) + <DEFERRED full set>.

Step | Consumer Action | Expected System Response
1 | Stream concealed-position frames | SUT emits concealed-structure POIs
2 | Compute aggregate recall against ground truth | recall ≥ 0.60

Expected outcome: D3 — threshold_min passes. Max execution time: 120 s.


FT-P-005: Concealed-position precision ≥20% (initial gate)

Summary: Concealed-position precision ≥0.20 (operators filter; high-FP-accepting gate). Traces to: AC Detection Quality — Concealed-position precision / D4. Tier: E + HW. Test status: DEFERRED — same dataset gap as FT-P-004.

Input data: same as FT-P-004.

Step | Consumer Action | Expected System Response
1 | Stream concealed-position frames | SUT emits POIs
2 | Compute aggregate precision against ground truth | precision ≥ 0.20

Expected outcome: D4 — threshold_min passes.


FT-P-006: Footpath recall ≥70%

Summary: Footpath recall ≥0.70 across multi-season polyline-annotated eval set. Traces to: AC Detection Quality — Footpath recall / D5. Tier: E + HW. Test status: DEFERRED — <DEFERRED: footpath sequences (fresh + stale, all seasons), polyline-annotated>.

Input data: fixtures/semantic/semantic0[1-4].png (starter; 4 frames feature footpaths leading to concealment) + <DEFERRED full multi-season set>.

Step | Consumer Action | Expected System Response
1 | Stream footpath-bearing frames | SUT emits footpath polyline annotations
2 | Compute recall against polyline ground truth | recall ≥ 0.70

Expected outcome: D5 — threshold_min passes.


Positive Scenarios — Movement Detection Behaviour

FT-P-007: Ego-motion compensation rejects stable scene elements

Summary: With the camera platform itself moving, stable elements (tree rows, houses, roads) MUST NOT generate movement candidates; only the actual mover does. Traces to: AC Movement Detection — Stable objects MUST NOT be treated as moving / M1, RESTRICT Operational — moving camera platform. Tier: B (with paired CSVs) + E. Test status: DEFERRED — <DEFERRED: paired gimbal.csv + telemetry.csv for video01.mp4; scene must contain 1 stable tree row + 1 moving vehicle>.

Preconditions:

  • rtsp-loopback plays fixtures/movement/video01.mp4 at 30 fps.
  • gimbal-mock replays paired gimbal.csv synchronised to RTSP frame timestamps.
  • mavlink-sitl replays paired telemetry (position + attitude) for the same duration.

Step | Consumer Action | Expected System Response
1 | Begin synchronised playback (video + gimbal + telemetry) | SUT begins consuming frames and applying ego-motion compensation
2 | Capture every movement candidate emitted on operator-stream for the clip duration | exactly 1 candidate (the vehicle); tree-row position is NOT among candidates

Expected outcome: M1 — set_contains passes; candidate set == {vehicle}; tree-row position ∉ candidates. Max execution time: clip_duration + 10 s.


FT-P-008: Movement detection continues during zoomed-in hold

Summary: While the camera is in a zoomed-in hold on a confirmed POI, a new mover appearing in the ROI is still detected and enqueued; current ROI is preempted only if the new candidate's priority exceeds it. Traces to: AC Movement Detection — MUST continue during the zoomed-in inspection / M2. Tier: B + E. Test status: DEFERRED — <DEFERRED zoomed-in gimbal.csv + telemetry.csv pair; 1 small mover>.

Input data: fixtures/movement/video02.mp4 + DEFERRED CSV pair.

Step | Consumer Action | Expected System Response
1 | Drive SUT into ZoomedIn hold via prior FT-P-016 setup | SUT in ZoomedIn { roi, hold_started_at }
2 | Begin playback of the zoomed-in scene with the small mover | movement candidate enqueued within 1.5 s (timing checked by NFT-PERF-L7)
3 | Observe ROI lifecycle | ROI is preempted only if the new candidate's confidence × proximity × age_factor exceeds the held ROI's; otherwise the held ROI completes

Expected outcome: M2 — exact passes; 1 candidate enqueued; ROI preempt decision matches the documented priority rule.
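
The preemption decision in step 3 compares two products. A minimal sketch, assuming each factor is already a number (the mapping from raw distance and POI age to `proximity` and `age_factor` is not specified here and is hypothetical, as are the class and function names); preemption is strict because the rule says the newcomer must exceed the held ROI's priority.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    confidence: float  # detector confidence in [0, 1]
    proximity: float   # hypothetical normalised proximity score in (0, 1]
    age_factor: float  # hypothetical freshness weight in (0, 1]

    @property
    def priority(self) -> float:
        return self.confidence * self.proximity * self.age_factor


def should_preempt(held: Candidate, new: Candidate) -> bool:
    """Preempt the held ROI only when the newcomer's priority is strictly higher."""
    return new.priority > held.priority
```

A tie keeps the held ROI, so the inspection already in progress is never interrupted by an equally ranked newcomer.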


FT-P-009: Per-zoom-band threshold honoured (no false candidate at edge)

Summary: When a movement-cluster persists for one frame BELOW the configured per-zoom-band threshold, no candidate is emitted. Traces to: AC Movement Detection — configurable per-zoom-band false-positive budget MUST be honoured / M3. Tier: B. Test status: DEFERRED — <DEFERRED gimbal.csv simulating threshold edge>.

Input data: fixtures/movement/video03.mp4 + DEFERRED CSV.

Step | Consumer Action | Expected System Response
1 | Replay scene at the threshold edge | SUT processes frames
2 | Observe candidate count over the clip duration | count == 0

Expected outcome: M3 — exact passes.


FT-P-010: Movement zoomed-in benchmark FP-rate budget

Summary: Across the zoom-out + zoomed-in benchmark suite, false-positive rate per zoom band stays within the configurable per-zoom-band budget (Q14 fallback trigger). Traces to: AC Q-tagged criteria — Movement detection FP rate at zoomed-in inspection / M4 (depends on Q14). Tier: B + E. Test status: DEFERRED — <DEFERRED: zoom-out + zoomed-in benchmark suite + expected_results/movement_benchmark_caps.json; Q14>.

Input data: fixtures/movement/video04.mp4 (visual ref) + DEFERRED benchmark suite.

Step | Consumer Action | Expected System Response
1 | Replay the benchmark suite end-to-end | SUT processes all frames
2 | Aggregate FP candidates per zoom band | rate per band ≤ configured cap (default ≤ 0.5 / min at zoomed-in)

Expected outcome: M4 — threshold_max passes per zoom band.
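
The M4 aggregation can be expressed as false positives per minute of footage in each zoom band, compared against a per-band cap. A sketch under assumptions: FP events arrive already labelled by zoom band, band names such as `zoomed_in` and both function names are hypothetical, and only the zoomed-in default of 0.5 / min comes from this spec; every other cap would come from movement_benchmark_caps.json.

```python
from collections import Counter


def fp_rate_per_minute(fp_events, band_minutes):
    """Compute false positives per minute for each zoom band.

    fp_events: iterable of zoom-band labels, one per false-positive candidate.
    band_minutes: {band: minutes of benchmark footage replayed in that band}.
    """
    counts = Counter(fp_events)
    return {band: counts.get(band, 0) / minutes for band, minutes in band_minutes.items()}


def bands_over_cap(rates, caps):
    """Return the bands whose FP rate exceeds their cap (threshold_max: rate <= cap passes)."""
    return sorted(band for band, rate in rates.items() if rate > caps[band])
```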


Positive Scenarios — Scan & Camera Control

FT-P-011: Sweep → zoomed-inspection transition + POI enqueue

Summary: A POI detected mid-sweep triggers a transition into zoomed-inspection within 2 s (timing: NFT-PERF-L8) AND the POI is enqueued correctly. Traces to: AC Scan & Camera Control — Transition from sweep to detailed inspection / S1. Tier: B + E. Test status: DEFERRED — <DEFERRED: scripted mission with planned route + simulated POI detected mid-sweep>.

Input data: scripted MAVLink mission + scripted Tier-1 detection injection at known frame index.

Step | Consumer Action | Expected System Response
1 | Start SUT with scripted mission; begin RTSP playback | SUT enters ZoomedOut, performs sweep
2 | Inject Tier-1 detection of a high-confidence target at frame N | SUT transitions to ZoomedIn { roi, hold_started_at }; ROI bbox matches the injected detection's bbox; POI queue length increments by 1

Expected outcome: S1 — exact (transition) + exact (ROI matches POI bbox) + exact (queue Δ+1).


FT-P-012: Footpath-pan during zoomed-in hold

Summary: During a zoomed-in hold on a footpath ROI, the camera pans along the footpath while the airframe continues to fly. The footpath stays in the centre 50% of frame for the duration of the hold. Traces to: AC Scan & Camera Control — pan to keep features visible / S2. Tier: B + E. Test status: DEFERRED — <DEFERRED: zoomed-inspection scenario with footpath polyline overlapping the ROI>.

Step | Consumer Action | Expected System Response
1 | Drive SUT into ZoomedIn hold on a footpath ROI | SUT in ZoomedIn { roi, hold_started_at }
2 | Continue airframe flight; observe gimbal command stream | SUT issues pan commands to track the footpath; observed centre offset ≤ 25% of frame dimension on every frame

Expected outcome: S2 — numeric_tolerance passes; per-frame centre offset ≤ 0.25 × frame_dim.
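
One reading of the S2 criterion, with the offset measured per axis from the frame centre, is sketched below. The per-axis interpretation, the input of one feature centre per frame in pixels, and the function names are assumptions; a bound of 0.25 × frame_dim on each axis keeps the feature inside the centre 50% window described in the Summary.

```python
def centre_offset_ok(centre, frame_w, frame_h, max_frac=0.25):
    """True when the feature centre lies within max_frac * frame_dim of the frame centre on both axes."""
    cx, cy = centre
    return (
        abs(cx - frame_w / 2) <= max_frac * frame_w
        and abs(cy - frame_h / 2) <= max_frac * frame_h
    )


def frames_outside_window(centres, frame_w, frame_h, max_frac=0.25):
    """Indices of frames whose feature centre breaches the window (empty list == pass)."""
    return [
        i for i, c in enumerate(centres) if not centre_offset_ok(c, frame_w, frame_h, max_frac)
    ]
```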


FT-P-013: Target-follow centre-window

Summary: After operator confirmation, target-follow mode keeps the target within the centre 25% of frame while visible. Traces to: AC Scan & Camera Control — target-follow mode / S3. Tier: B + E. Test status: DEFERRED — <DEFERRED: operator-confirmed target + 60 s follow window>.

Step | Consumer Action | Expected System Response
1 | Drive SUT into TargetFollow { target_id, started_at } via prior FT-P-016 | mode == TargetFollow
2 | Observe gimbal commands + per-frame target position for 60 s | target stays within the centre 25% of the frame on every frame where it is visible

Expected outcome: S3 — threshold_max passes per frame.


FT-P-014: POI queue ordering by confidence × proximity × age_factor

Summary: With 3 POIs varying in confidence × proximity × age_factor, the system pops them in the documented relative order. Traces to: AC Scan & Camera Control — POI queue MUST be ordered by … / S4. Tier: B. Test status: READY (synthetic-poi-feeds inline-authorable).

Input data: synthetic-poi-feeds ordering test — 3 POIs with confidence ∈ {0.50, 0.80, 0.60}, proximity ∈ {near, mid, far}, age_factor ∈ {fresh, fresh, stale} chosen to produce a known relative ordering.

Step | Consumer Action | Expected System Response
1 | Inject the 3 POIs as Tier-1 detections | all 3 enter the queue
2 | Observe ZoomedIn transitions over the next N seconds | SUT inspects POIs in the documented relative order

Expected outcome: S4 — exact (order) passes.
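
A max-priority queue on confidence × proximity × age_factor is one way to produce the ordering under test. The numeric weights for the categorical labels below are hypothetical placeholders (the spec gives only the labels), the POIs are paired positionally with the three confidence, proximity and age values listed in the input data, and `PoiQueue` is a hypothetical name.

```python
import heapq
import itertools

# Hypothetical numeric weights for the categorical labels in synthetic-poi-feeds;
# the real mapping lives in the SUT's configuration.
PROXIMITY = {"near": 1.0, "mid": 0.6, "far": 0.3}
AGE_FACTOR = {"fresh": 1.0, "stale": 0.5}


class PoiQueue:
    """Max-priority queue keyed on confidence x proximity x age_factor."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # FIFO tie-break for equal priorities

    def push(self, poi_id, confidence, proximity, age):
        priority = confidence * PROXIMITY[proximity] * AGE_FACTOR[age]
        # heapq is a min-heap, so store the negated priority.
        heapq.heappush(self._heap, (-priority, next(self._seq), poi_id))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```

With these placeholder weights the 0.50/near/fresh POI (priority 0.50) is inspected before the 0.80/mid/fresh one (0.48), which shows why confidence alone does not decide the order; the 0.60/far/stale POI (0.09) comes last.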


FT-P-015: Zoomed-in hold cap interacts with deep-analysis

Summary: Zoomed-in hold defaults to 5 s/POI but caps deep-analysis interactions at 2 s; with deep analysis enabled the actual hold duration = min(deep_analysis_complete_at, 2 s), and with it disabled the hold runs to the 5 s per-POI timeout. Traces to: AC Scan & Camera Control — hold endpoints up to 2 s for deep analysis … per-POI timeout (default 5 s/POI) / S5. Tier: B + E. Test status: DEFERRED — <DEFERRED: VLM-enabled hold scenario with vlm_io_pair returning within 2 s>.

Step | Consumer Action | Expected System Response
1 | Drive SUT into ZoomedIn hold; enable deep-analysis | SUT begins VLM IPC call on entering the hold
2a | Case A: VLM returns at 1.5 s | hold ends at 1.5 s (deep_analysis_complete)
2b | Case B: VLM returns at 3.0 s | hold ends at 2.0 s (deep-analysis cap)
2c | Case C: deep-analysis disabled | hold ends at 5.0 s (per-POI timeout)

Expected outcome: S5 — exact passes for each case.
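
The three cases fit a single rule: with deep analysis enabled the hold ends at whichever comes first, the VLM result or the 2 s cap; with it disabled only the 5 s per-POI timeout applies. A minimal sketch of that reading (the function and constant names are hypothetical):

```python
PER_POI_TIMEOUT_S = 5.0    # default per-POI hold timeout
DEEP_ANALYSIS_CAP_S = 2.0  # cap on how long the hold waits for deep analysis (VLM)


def hold_end_s(deep_analysis_enabled, vlm_return_s=None):
    """Seconds after entering ZoomedIn at which the hold ends.

    vlm_return_s: when the VLM IPC call returns; None means it never returns.
    """
    if not deep_analysis_enabled:
        return PER_POI_TIMEOUT_S
    completes_at = float("inf") if vlm_return_s is None else vlm_return_s
    return min(completes_at, DEEP_ANALYSIS_CAP_S, PER_POI_TIMEOUT_S)
```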


Positive Scenarios — Operator Workflow

FT-P-016: Operator confirm → middle waypoint inserted + target-follow

Summary: Valid + signed operator-confirm command results in a middle waypoint POSTed to missions AND a transition into target-follow mode. Traces to: AC Operator Workflow — Operator confirmation MUST result in … / O8. Tier: B + E. Test status: READY for happy path (default placeholder envelope until Q9 resolves; envelope replaced when Q9 ships).

Input data: operator-envelopes (valid happy path) + mission-suite-fixture (DEFERRED full version) + operator-session-scripts (nominal session).

Step | Consumer Action | Expected System Response
1 | SUT in ZoomedIn hold on a POI surfaced to the operator | mode == ZoomedIn
2 | Replay operator-confirm envelope on the return path | SUT validates envelope; commits decision
3 | Observe HTTPS POST to missions-mock | POST /missions/{id} with a middle waypoint at the POI MGRS; HTTP 200
4 | Observe scan-mode state | mode == TargetFollow { target_id, started_at }

Expected outcome: O8 — exact (HTTP 200) + exact (mode == TargetFollow).


FT-P-017: Decision window = 30 s at conf = 0.40

Summary: At confidence = 0.40 the decision window surfaced to the operator MUST equal 30 s (lower-bound anchor of the linear scale). Traces to: AC Operator Workflow — decision window … 40% confidence → 30 s / O1. Tier: B. Test status: READY.

Step | Consumer Action | Expected System Response
1 | Inject a synthetic POI at conf=0.40 | POI surfaced on operator-stream with decision_window_seconds: 30

Expected outcome: O1 — exact (window == 30 s).


FT-P-018: Decision window = 120 s at conf = 1.00

Summary: At confidence = 1.00 the decision window MUST equal 120 s (upper-bound anchor). Traces to: AC O2. Tier: B. Test status: READY.

Step | Consumer Action | Expected System Response
1 | Inject a synthetic POI at conf=1.00 | window == 120 s

Expected outcome: O2 — exact.


FT-P-019: Decision window linear interpolation at conf = 0.70

Summary: At conf=0.70 the window is interpolated linearly between (0.40, 30 s) and (1.00, 120 s) → 75 s ± 0.5 s. Traces to: AC O3. Tier: B. Test status: READY.

Step | Consumer Action | Expected System Response
1 | Inject a synthetic POI at conf=0.70 | window ≈ 75 s ± 0.5 s

Expected outcome: O3 — numeric_tolerance ± 0.5 s.
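
O1 to O4 together describe a linear map from confidence to decision window: window = 30 s + (conf − 0.40) / 0.60 × 90 s, so conf = 0.70 gives 30 + 0.5 × 90 = 75 s, and anything below 0.40 is not surfaced. A sketch (names are hypothetical; clamping confidences above 1.00 is an added assumption):

```python
from typing import Optional

CONF_MIN, CONF_MAX = 0.40, 1.00        # anchors from O1 / O2
WINDOW_MIN_S, WINDOW_MAX_S = 30.0, 120.0


def decision_window_s(confidence: float) -> Optional[float]:
    """Linear decision window in seconds; None means the POI is not surfaced at all (O4)."""
    if confidence < CONF_MIN:
        return None
    c = min(confidence, CONF_MAX)  # assumption: clamp anything above 1.00
    frac = (c - CONF_MIN) / (CONF_MAX - CONF_MIN)
    return WINDOW_MIN_S + frac * (WINDOW_MAX_S - WINDOW_MIN_S)
```

Floating-point rounding may leave the 0.70 case a hair off 75 s, which the ± 0.5 s numeric_tolerance of O3 easily absorbs.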


FT-P-020: Operator decline → persistent ignored-item

Summary: Operator-decline on a surfaced POI MUST persist an ignored-item entry keyed by (MGRS cell, class_group). Traces to: AC Operator Workflow — Operator-decline MUST result in a persistent ignored-item entry / O5. Tier: B + E. Test status: READY (operator-session-scripts inline-authorable; envelope uses default placeholder until Q9 resolves).

Step | Consumer Action | Expected System Response
1 | Surface a POI to the operator | POI on operator-stream
2 | Replay operator-decline envelope | SUT validates; ignored-item count via health endpoint increments by 1; new item has (MGRS, class_group) matching the declined POI

Expected outcome: O5 — exact (count Δ+1) + schema_match (ignored-item record shape).


FT-P-021: Ignored-item suppresses future matching detections

Summary: A new detection whose (MGRS, class_group) matches an existing ignored-item MUST NOT be surfaced to the operator. Traces to: AC Operator Workflow — A new detection whose (MGRS, class_group) matches an existing ignored-item MUST NOT be surfaced / O6. Tier: B + E. Test status: READY.

Step | Consumer Action | Expected System Response
1 | Seed an ignored-item for (MGRS=X, class_group=Y) via FT-P-020 | ignored-item present
2 | Inject a new detection at (MGRS=X, class_group=Y) | operator-stream emits NO POI for this detection; counter pois_suppressed_by_ignored_total increments

Expected outcome: O6 — exact (count surfaced == 0).


FT-P-022: Operator timeout = forget (no ignored-item)

Summary: If the decision window expires with no operator response, the POI is removed from the queue but NO ignored-item is created (forget, do not blacklist). Traces to: AC Operator Workflow — Timeout (no operator response within the window) MUST NOT create an ignored-item entry / O7. Tier: B + E. Test status: READY.

Step | Consumer Action | Expected System Response
1 | Surface a POI at conf=0.40 (30 s window) | POI on operator-stream
2 | Wait > 30 s with no response | POI removed from queue; ignored-item count UNCHANGED

Expected outcome: O7 — exact (queue Δ−1) + exact (ignored-item count unchanged).
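
The decline, suppression and timeout rules of O5 to O7 can be modelled as a small state machine keyed on (MGRS cell, class_group). This sketch keeps everything in memory for clarity, whereas the SUT must persist ignored-items; the class and method names are hypothetical, and `suppressed_total` only mirrors the pois_suppressed_by_ignored_total counter.

```python
from dataclasses import dataclass, field


@dataclass
class OperatorDecisions:
    """In-memory sketch of O5-O7; the key is (MGRS cell, class_group)."""

    queue: dict = field(default_factory=dict)  # poi_id -> (mgrs, class_group)
    ignored: set = field(default_factory=set)  # ignored-items (persisted by the real SUT)
    suppressed_total: int = 0                  # mirrors pois_suppressed_by_ignored_total

    def surface(self, poi_id, mgrs, class_group) -> bool:
        """Return True if the POI is surfaced to the operator (O6 suppression otherwise)."""
        if (mgrs, class_group) in self.ignored:
            self.suppressed_total += 1
            return False
        self.queue[poi_id] = (mgrs, class_group)
        return True

    def decline(self, poi_id):
        # O5: decline removes the POI from the queue and records an ignored-item.
        self.ignored.add(self.queue.pop(poi_id))

    def timeout(self, poi_id):
        # O7: timeout forgets the POI but does NOT create an ignored-item.
        self.queue.pop(poi_id)
```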


Positive Scenarios — Pre-flight & Map Reconciliation

FT-P-023: BIT pre-flight pass with every dependency healthy

Summary: When every external dependency is reachable + healthy AND on-device storage < 95 % full AND wall-clock is bound, BIT passes and takeoff is permitted. Traces to: AC Reliability & Safety — Pre-flight self-test MUST pass / R1, RESTRICT Reliability & Safety obligations — Pre-flight self-test (BIT) MUST gate takeoff. Tier: B + E. Test status: READY (bit-scenarios inline-authorable).

Step | Consumer Action | Expected System Response
1 | Bring up all mocks healthy + clean autopilot-state volume | every dependency green
2 | Trigger BIT via the BIT-arm operator command (or scripted in operator-session-scripts) | health endpoint returns { "ok": true, "deps": { ...all green }, "takeoff_permitted": true }

Expected outcome: R1 — exact (takeoff_permitted == true) + exact (health.all == "green").


FT-P-024: Pre-flight map pull ≤ 30 s for a 30×30 km region

Summary: Pulling the area-level map of previously-detected objects for a 30 km × 30 km mission area MUST complete within 30 s wall-clock. Traces to: AC Map Reconciliation — Pre-flight map pull / Mp1. Tier: B + E. Test status: DEFERRED — <DEFERRED: mock central area-map service with ~10000 map objects for the 30 km × 30 km region>.

Step | Consumer Action | Expected System Response
1 | Configure missions-mock with the 30×30 km mapobjects fixture | mock ready
2 | Trigger BIT (which pulls the map) | SUT issues GET /missions/{id}/mapobjects; local copy hydrated within 30 s
3 | Confirm BIT proceeds normally afterwards | takeoff permitted

Expected outcome: Mp1 — threshold_max passes (NFT-PERF measures the latency; this scenario asserts the functional pathway).


FT-P-025: Post-flight map diff push for a 60-minute mission

Summary: Pushing the post-flight pass diff (~17 500 records: NEW + MOVED + REMOVED + CONFIRMED-EXISTING) for a 60-minute mission MUST complete within 120 s wall-clock. Traces to: AC Map Reconciliation — Post-flight pass diff push / Mp3. Tier: B + E. Test status: DEFERRED — <DEFERRED: 60-minute mission pass diff fixture>.

Step | Consumer Action | Expected System Response
1 | Land the SUT after a 60-minute mission (scripted) | SUT enters post-flight reconciliation
2 | Observe HTTPS POST to missions-mock | POST /missions/{id}/mapobjects with the diff; HTTP 200 within 120 s

Expected outcome: Mp3 — threshold_max passes (NFT-PERF measures latency).


FT-P-026: MapObjects conflict resolution (append-only + projection)

Summary: When two map updates conflict for the same (spatial-cell, class_group), the SUT records both observations append-only AND computes the current view per the documented resolution rule. Traces to: AC Q-tagged — MapObjects conflict resolution / Mp5 (depends on Q8). Tier: B + E. Test status: DEFERRED — <DEFERRED: conflict pair fixture + expected_results/mapobjects_conflict_resolution.json; Q8>.

Step | Consumer Action | Expected System Response
1 | Seed the local mapobjects store via map pull | local store hydrated
2 | Trigger two conflicting observations for (cell=X, class=Y) | both appended to the observation log
3 | Observe the projected current view (via the operator-stream map-overlay channel or health debug) | current view matches the resolution rule (Q8)

Expected outcome: Mp5 — json_diff passes against the reference.
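
The append-only-plus-projection structure can be sketched independently of how Q8 resolves. The resolution rule in `current_view` (latest observation wins, ties go to the later log entry) is a placeholder chosen for illustration only; Q8 decides the real rule, and all names here are hypothetical. The status values reuse the diff categories from FT-P-025.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    cell: str           # spatial cell id (e.g. an MGRS cell)
    class_group: str
    status: str         # NEW / MOVED / REMOVED / CONFIRMED-EXISTING
    observed_at: float  # seconds since epoch


class MapObjectLog:
    def __init__(self):
        self._log: list[Observation] = []  # append-only: entries are never edited or dropped

    def append(self, obs: Observation) -> None:
        self._log.append(obs)

    def __len__(self):
        return len(self._log)

    def current_view(self) -> dict:
        """Project the log into one status per (cell, class_group).

        PLACEHOLDER rule: latest observed_at wins; ties go to the later log entry.
        """
        view, latest = {}, {}
        for obs in self._log:
            key = (obs.cell, obs.class_group)
            if key not in latest or obs.observed_at >= latest[key]:
                latest[key] = obs.observed_at
                view[key] = obs.status
        return view
```

Keeping the raw log separate from the projection means a later change to the Q8 rule only changes `current_view`, not the stored observations.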


Negative Scenarios

FT-N-001: BIT inhibits takeoff when Tier-1 detection is unreachable

Summary: When ../detections is unreachable at BIT, takeoff MUST be inhibited and the detection dependency MUST report red. Traces to: AC Reliability & Safety — Pre-flight self-test MUST pass / R2, RESTRICT Suite-level architectural splits — Tier 1 lives in ../detections. Tier: B + E. Test status: READY.

Step | Consumer Action | Expected System Response
1 | Stop detections-mock | dependency unreachable
2 | Trigger BIT | health endpoint returns takeoff_permitted: false; deps.detection == "red"; operator-stream surfaces a BIT-failure event with category detection
3 | Attempt to issue a takeoff MAVLink command (scripted) | SUT refuses; no MAVLink takeoff command observed on mavlink-sitl

Expected outcome: R2 — exact (takeoff inhibited).


FT-N-002: BIT inhibits takeoff when persistent storage ≥ 95 % full

Summary: When the on-device persistent store is ≥ 95 % full at BIT, takeoff MUST be inhibited. Traces to: AC Reliability & Safety — Pre-flight self-test MUST pass / R3, RESTRICT On-device storage MUST be bounded. Tier: B. Test status: READY.

Step | Consumer Action | Expected System Response
1 | Pre-fill autopilot-state volume to ≥ 95 % via seed file | storage threshold tripped
2 | Trigger BIT | takeoff_permitted: false; deps.storage == "red"

Expected outcome: R3 — exact (takeoff inhibited).


FT-N-003: Cache-fallback on map-pull timeout requires operator acknowledgement

Summary: When the pre-flight map pull times out, the SUT falls back to last-known cached MapObjects, reports map_sync == "cached_fallback", AND MUST require explicit operator acknowledgement before takeoff is permitted. Traces to: AC Map Reconciliation — Cache-fallback on timeout is acceptable only with explicit operator acknowledgement / Mp2. Tier: B + E. Test status: READY (operator-session-scripts inline-authorable; cached state seeded from prior pull).

Step | Consumer Action | Expected System Response
1 | Seed autopilot-state with a known prior MapObjects snapshot | cached map present
2 | Configure missions-mock to time out on GET /missions/{id}/mapobjects | mock returns 504 or times out silently
3 | Trigger BIT | SUT falls back to the cached map; map_sync == "cached_fallback"; BIT reports takeoff_permitted: false, awaiting_ack: ["map_cache_fallback"]
4 | Replay operator-ack envelope for map_cache_fallback | BIT now reports takeoff_permitted: true; one structured-log entry at WARN with map_cache_fallback_acked_by_operator
5 | In a separate run, repeat steps 1-3 and attempt takeoff WITHOUT replaying the ack | takeoff remains inhibited

Expected outcome: Mp2 — exact (cached_fallback) + exact (BIT requires explicit ack).
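
The BIT gating exercised by FT-P-023 and FT-N-001 to FT-N-003 can be summarised as a pure function from observed inputs to the health report. A minimal sketch: the report keys mirror those named in the scenarios, but the `BitInputs` structure, the `"fresh"` map_sync value and treating `ok` as "all dependencies green" are assumptions; wall-clock binding is left out and would be one more entry in `deps`.

```python
from dataclasses import dataclass, field

STORAGE_LIMIT = 0.95  # takeoff inhibited at >= 95 % full (R3)


@dataclass
class BitInputs:
    deps: dict                    # dependency name -> "green" / "red"
    storage_used_fraction: float  # 0.0 .. 1.0
    map_sync: str = "fresh"       # hypothetical default; "cached_fallback" after a map-pull timeout
    acks: set = field(default_factory=set)  # operator acknowledgements received


def bit_report(inp: BitInputs) -> dict:
    deps = dict(inp.deps)
    deps["storage"] = "red" if inp.storage_used_fraction >= STORAGE_LIMIT else "green"
    awaiting_ack = []
    # Mp2: a cache fallback is acceptable only after explicit operator acknowledgement.
    if inp.map_sync == "cached_fallback" and "map_cache_fallback" not in inp.acks:
        awaiting_ack.append("map_cache_fallback")
    all_green = all(state == "green" for state in deps.values())
    return {
        "ok": all_green,
        "deps": deps,
        "awaiting_ack": awaiting_ack,
        "takeoff_permitted": all_green and not awaiting_ack,
    }
```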


FT-N-004: Below-threshold POI suppression (conf < 40 %)

Summary: A POI at confidence < 0.40 MUST NOT be surfaced to the operator at all. Traces to: AC Operator Workflow — Below 40% confidence, the POI MUST NOT be surfaced at all / O4. Tier: B. Test status: READY.

Step | Consumer Action | Expected System Response
1 | Inject a synthetic POI at conf=0.39 | POI does NOT appear on operator-stream; counter pois_below_threshold_total increments by 1

Expected outcome: O4 — exact (count surfaced == 0).


Notes for downstream skills

  • Decompose: every READY scenario above maps to at least one blackbox test task. DEFERRED scenarios MUST still produce a task spec (so the implementation has a placeholder), but the task spec's Acceptance section will reference the leftover entry that gates the fixture.
  • Implement Tests: per-scenario assertion helpers (RTSP playback orchestration, MAVLink observer, operator-stream observer) are likely shared across scenarios — Phase 4's runner scripts will assume a thin e2e/consumer/lib/ module that all scenarios depend on.
  • Test-Spec Sync (cycle-update mode): post-implementation, scenarios may be split (e.g. FT-P-015's three sub-cases may become FT-P-015a/b/c) or merged. The traceability-matrix is the source of truth — every scenario MUST trace to at least one AC or RESTRICT.