autopilot/_docs/02_document/tests/blackbox-tests.md

# Blackbox Tests

Authored by `/test-spec` Phase 2 (2026-05-19). Every scenario observes the SUT only through public surfaces (RTSP, gRPC, MAVLink, REST, operator stream, gimbal UDP, VLM IPC, health endpoint, structured logs). No scenario imports internal modules or peeks at on-device state directly.

Each scenario header records:

- **Summary** — one-line behaviour validated.
- **Traces to** — AC ID(s) and any RESTRICT ID.
- **Tier** — execution tier required (U / I / B / E / HW).
- **Test status** — `READY` or `DEFERRED — <reason>`. Per the 2026-05-19 override, deferred scenarios are kept in this document as release-gate items.

The `Expected outcome` field gives the inline pass/fail criterion; the authoritative comparison lives in `_docs/00_problem/input_data/expected_results/results_report.md` (referenced by row id).

---

## Positive Scenarios — Detection Quality (functional)

### FT-P-001: Tier-1 normalised-box contract conformance
**Summary**: Frame in → autopilot must consume and re-emit the Tier-1 detection stream conforming to the suite's normalised-box schema (class ids 0..18, coords ∈ [0,1]).
**Traces to**: AC `Detection Quality / D6`, RESTRICT `Suite-level architectural splits — Tier 1 lives in ../detections`.
**Tier**: B (mock detector) + E (live `../detections`).
**Test status**: READY.

**Preconditions**:
- SUT started; `detections-mock` serving recorded Tier-1 stream for `image-set-existing`.
- `e2e-consumer` subscribed to the SUT's outbound normalised-box stream, observed via the operator-stream channel and, if the test build exposes it, via the test-only `/debug/detections` socket (otherwise via operator-stream only).

**Input data**: `fixtures/images/4d6e1830d211ad50.jpg` (1280 px aerial frame) → encoded into `rtsp-loopback` as a 1-second loop.

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Begin RTSP playback of the frame loop | SUT consumes the frame; emits a normalised-box detection record on the operator-stream channel |
| 2 | Capture one emitted detection record | Record validates against `fixtures/schemas/expected_detections.schema.json`; every bbox coord ∈ [0,1]; class_id ∈ {0..18} |

**Expected outcome**: D6 — schema-match + range comparison passes.
**Max execution time**: 10 s.
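
A minimal sketch of the range half of the D6 check, assuming a hypothetical record shape `{"detections": [{"class_id": ..., "bbox": [x_min, y_min, x_max, y_max]}]}`. The real field names come from `expected_detections.schema.json`; the schema-match half would normally be delegated to a JSON Schema validator and is not shown.

```python
from typing import Any

# Hypothetical record shape; the authoritative one is fixtures/schemas/expected_detections.schema.json.
NUM_CLASSES = 19  # class ids 0..18


def range_violations(record: dict[str, Any]) -> list[str]:
    """Return human-readable D6 range violations; an empty list means the range check passes."""
    problems = []
    for i, det in enumerate(record.get("detections", [])):
        cid = det.get("class_id")
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(cid, int) or isinstance(cid, bool) or not 0 <= cid < NUM_CLASSES:
            problems.append(f"detection {i}: class_id {cid!r} outside 0..18")
        bbox = det.get("bbox", [])
        if len(bbox) != 4:
            problems.append(f"detection {i}: bbox must have 4 coordinates, got {len(bbox)}")
        for coord in bbox:
            if not isinstance(coord, (int, float)) or not 0.0 <= coord <= 1.0:
                problems.append(f"detection {i}: coord {coord!r} outside [0, 1]")
    return problems
```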

---

### FT-P-002: Existing-class regression vs documented baseline
**Summary**: Per-class precision and recall must stay within ±2 percentage points (0.02 absolute) of the pinned baseline (P=0.816, R=0.852).
**Traces to**: AC `Detection Quality — Existing-class regression / D2`.
**Tier**: E + HW (HW required for the project-level Acceptance Gate).
**Test status**: DEFERRED — expected_results baseline JSON not yet recorded (`<DEFERRED: expected_results/existing_classes_baseline.json>`). Visual fixtures (5 frames) are on disk; baseline numbers depend on a recording against the currently pinned `../detections` model.

**Preconditions**:
- Baseline JSON recorded against pinned `../detections` model (DEFERRED).
- SUT + live `../detections` running (Tier E) or HW Jetson (HW).

**Input data**: `fixtures/images/{4d6e1830...,54f6459...,6dd601b7...,805bcf1e...,f997d093...}.jpg` (5 frames).

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Stream each frame through RTSP | SUT emits detections per frame |
| 2 | Compare aggregated per-class P/R to baseline | each per-class P, R within ± 0.02 absolute of baseline |

**Expected outcome**: D2 — `numeric_tolerance` passes.
**Max execution time**: 60 s.
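
The step-2 comparison can be sketched as a two-sided absolute tolerance check per class and metric. This assumes a hypothetical baseline JSON shape of `{"<class name>": {"P": float, "R": float}}`; the real layout of `existing_classes_baseline.json` is not yet recorded.

```python
# Hypothetical baseline shape: {"<class name>": {"P": float, "R": float}, ...}
TOLERANCE = 0.02  # ±2 percentage points, absolute
EPS = 1e-9        # absorbs float noise so a delta of exactly 0.02 still passes


def tolerance_failures(observed: dict, baseline: dict, tol: float = TOLERANCE) -> list:
    """Return (class, metric, observed, baseline) tuples outside ±tol; [] means D2 passes."""
    failures = []
    for cls, base in sorted(baseline.items()):
        obs = observed.get(cls)
        if obs is None:
            # A baseline class with no observed metrics counts as a failure.
            failures.append((cls, "missing", None, None))
            continue
        for metric in ("P", "R"):
            if abs(obs[metric] - base[metric]) > tol + EPS:
                failures.append((cls, metric, obs[metric], base[metric]))
    return failures
```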

---

### FT-P-003: New-class precision and recall ≥80%
**Summary**: New target classes (black entrances, branch piles, footpaths, roads, trees, tree blocks) reach precision ≥0.80 AND recall ≥0.80 per class.
**Traces to**: AC `Detection Quality — New target classes / D1`.
**Tier**: E + HW.
**Test status**: DEFERRED — multi-season annotated new-class eval set not acquired; annotation campaign owned by `../ai-training` repo. `<DEFERRED: expected_results/new_classes_pr.json>`.

**Preconditions**:
- Multi-season annotated new-class eval set acquired.
- Tier-1 model updated to include the 5 new classes.

**Input data**: `<DEFERRED: new-class eval set across all four seasons>`.

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Stream eval-set frames through RTSP | SUT emits detections including new-class items |
| 2 | Compute per-class P, R | each ≥ 0.80 |

**Expected outcome**: D1 — `threshold_min` passes for every new class.
**Max execution time**: 120 s.
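
A sketch of the per-class `threshold_min` gate, assuming true-positive / false-positive / false-negative counts have already been produced by a ground-truth matching step (not shown; the matching rule is an assumption of the harness, not specified here). With other thresholds the same helper covers FT-P-004 (`p_min=0.0, r_min=0.60`), FT-P-005 (`p_min=0.20, r_min=0.0`) and FT-P-006 (`p_min=0.0, r_min=0.70`).

```python
from dataclasses import dataclass


@dataclass
class Counts:
    tp: int
    fp: int
    fn: int


def precision_recall(c: Counts) -> tuple:
    # Undefined ratios (no predictions or no ground truth) become 0.0 so they fail the gate.
    p = c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0
    r = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0
    return p, r


def classes_below_gate(per_class: dict, p_min: float = 0.80, r_min: float = 0.80) -> list:
    """Return the sorted names of classes whose precision or recall is below the gate."""
    failing = []
    for name, counts in sorted(per_class.items()):
        p, r = precision_recall(counts)
        if p < p_min or r < r_min:
            failing.append(name)
    return failing
```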

---

### FT-P-004: Concealed-position recall ≥60% (initial gate)
**Summary**: The system surfaces concealed positions (FPV hideouts, dugouts) with recall ≥0.60, accepting a high false-positive rate because operators filter the surfaced candidates.
**Traces to**: AC `Detection Quality — Concealed-position recall / D3`.
**Tier**: E + HW.
**Test status**: DEFERRED — only 4 starter PNGs on disk; full multi-season annotated set required.

**Input data**: `fixtures/semantic/semantic0[1-4].png` (starter) + `<DEFERRED full set>`.

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Stream concealed-position frames | SUT emits concealed-structure POIs |
| 2 | Compute aggregate recall against ground truth | recall ≥ 0.60 |

**Expected outcome**: D3 — `threshold_min` passes.
**Max execution time**: 120 s.

---

### FT-P-005: Concealed-position precision ≥20% (initial gate)
**Summary**: Concealed-position precision ≥0.20 (operators filter; high-FP-accepting gate).
**Traces to**: AC `Detection Quality — Concealed-position precision / D4`.
**Tier**: E + HW.
**Test status**: DEFERRED — same dataset gap as FT-P-004.

**Input data**: same as FT-P-004.

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Stream concealed-position frames | SUT emits POIs |
| 2 | Compute aggregate precision against ground truth | precision ≥ 0.20 |

**Expected outcome**: D4 — `threshold_min` passes.

---

### FT-P-006: Footpath recall ≥70%
**Summary**: Footpath recall ≥0.70 across multi-season polyline-annotated eval set.
**Traces to**: AC `Detection Quality — Footpath recall / D5`.
**Tier**: E + HW.
**Test status**: DEFERRED — `<DEFERRED: footpath sequences (fresh + stale, all seasons), polyline-annotated>`.

**Input data**: `fixtures/semantic/semantic0[1-4].png` (starter; 4 frames feature footpaths leading to concealment) + `<DEFERRED full multi-season set>`.

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Stream footpath-bearing frames | SUT emits footpath polyline annotations |
| 2 | Compute recall against polyline ground truth | recall ≥ 0.70 |

**Expected outcome**: D5 — `threshold_min` passes.

---

## Positive Scenarios — Movement Detection Behaviour

### FT-P-007: Ego-motion compensation rejects stable scene elements
**Summary**: With the camera platform itself moving, stable elements (tree rows, houses, roads) MUST NOT generate movement candidates; only the actual mover does.
**Traces to**: AC `Movement Detection — Stable objects MUST NOT be treated as moving / M1`, RESTRICT `Operational — moving camera platform`.
**Tier**: B (with paired CSVs) + E.
**Test status**: DEFERRED — `<DEFERRED: paired gimbal.csv + telemetry.csv for video01.mp4; scene must contain 1 stable tree row + 1 moving vehicle>`.

**Preconditions**:
- `rtsp-loopback` plays `fixtures/movement/video01.mp4` at 30 fps.
- `gimbal-mock` replays paired gimbal.csv synchronised to RTSP frame timestamps.
- `mavlink-sitl` replays paired telemetry (position + attitude) for the same duration.

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Begin synchronised playback (video + gimbal + telemetry) | SUT begins consuming frames and ego-motion compensating |
| 2 | Capture every movement candidate emitted on operator-stream for the clip duration | exactly 1 candidate (the vehicle); tree-row position is NOT among candidates |

**Expected outcome**: M1 — `set_contains` passes; candidate set == {vehicle}; tree-row position ∉ candidates.
**Max execution time**: clip_duration + 10 s.

---

### FT-P-008: Movement detection continues during zoomed-in hold
**Summary**: While the camera is in a zoomed-in hold on a confirmed POI, a new mover appearing in the ROI is still detected and enqueued; the current ROI is preempted only if the new candidate's priority exceeds the held ROI's.
**Traces to**: AC `Movement Detection — MUST continue during the zoomed-in inspection / M2`.
**Tier**: B + E.
**Test status**: DEFERRED — `<DEFERRED zoomed-in gimbal.csv + telemetry.csv pair; 1 small mover>`.

**Input data**: `fixtures/movement/video02.mp4` + DEFERRED CSV pair.

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Drive SUT into ZoomedIn hold via prior FT-P-011 setup | SUT in `ZoomedIn { roi, hold_started_at }` |
| 2 | Begin playback of the zoomed-in scene with the small mover | Movement candidate enqueued within ≤ 1.5 s (timing checked by NFT-PERF-L7) |
| 3 | Observe ROI lifecycle | ROI is preempted only if new candidate's `confidence × proximity × age_factor` exceeds the held ROI's; otherwise the held ROI completes |

**Expected outcome**: M2 — `exact` passes; 1 candidate enqueued; ROI preempt decision matches the documented priority rule.
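
The step-3 preemption rule can be expressed as a strict priority comparison. This is a sketch assuming `proximity` and `age_factor` are already normalised to numbers (a hypothetical encoding; the spec only names the product `confidence × proximity × age_factor`).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    confidence: float  # 0..1
    proximity: float   # hypothetical normalisation: 1.0 = nearest
    age_factor: float  # hypothetical normalisation: 1.0 = fresh

    def priority(self) -> float:
        return self.confidence * self.proximity * self.age_factor


def should_preempt(held: Candidate, new: Candidate) -> bool:
    # Strictly greater: a tie leaves the held ROI in place so it can complete.
    return new.priority() > held.priority()
```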

---

### FT-P-009: Per-zoom-band threshold honoured (no false candidate at edge)
**Summary**: When a movement-cluster persists for one frame BELOW the configured per-zoom-band threshold, no candidate is emitted.
**Traces to**: AC `Movement Detection — configurable per-zoom-band false-positive budget MUST be honoured / M3`.
**Tier**: B.
**Test status**: DEFERRED — `<DEFERRED gimbal.csv simulating threshold edge>`.

**Input data**: `fixtures/movement/video03.mp4` + DEFERRED CSV.

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Replay scene at the threshold edge | SUT processes frames |
| 2 | Observe candidate count over the clip duration | count == 0 |

**Expected outcome**: M3 — `exact` passes.

---

### FT-P-010: Movement zoomed-in benchmark FP-rate budget
**Summary**: Across the zoom-out + zoomed-in benchmark suite, false-positive rate per zoom band stays within the configurable per-zoom-band budget (Q14 fallback trigger).
**Traces to**: AC `Q-tagged criteria — Movement detection FP rate at zoomed-in inspection / M4` (depends on Q14).
**Tier**: B + E.
**Test status**: DEFERRED — `<DEFERRED: zoom-out + zoomed-in benchmark suite + expected_results/movement_benchmark_caps.json; Q14>`.

**Input data**: `fixtures/movement/video04.mp4` (visual ref) + DEFERRED benchmark suite.

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Replay the benchmark suite end-to-end | SUT processes all frames |
| 2 | Aggregate FP candidates per zoom band | rate per band ≤ configured cap (default ≤ 0.5 / min at zoomed-in) |

**Expected outcome**: M4 — `threshold_max` passes per zoom band.
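
A sketch of the step-2 aggregation: count false-positive candidates per zoom band, divide by that band's footage duration, and compare against the cap. The band names and the `zoomed_out` cap below are hypothetical placeholders; only the `zoomed_in` default (0.5 / min) comes from the table.

```python
from collections import Counter

# zoomed_in default is from the table above; the zoomed_out value is a placeholder.
EXAMPLE_CAPS = {"zoomed_in": 0.5, "zoomed_out": 2.0}


def fp_rate_violations(fp_bands, minutes_per_band, caps_per_min):
    """Return {band: observed FP/min} for every band over its cap; {} means M4 passes.

    fp_bands: one zoom-band name per false-positive candidate seen in the benchmark.
    minutes_per_band: benchmark footage duration per band, in minutes.
    caps_per_min: configured cap per band, in false positives per minute.
    """
    counts = Counter(fp_bands)
    over = {}
    for band, minutes in minutes_per_band.items():
        rate = counts[band] / minutes
        if rate > caps_per_min[band]:
            over[band] = rate
    return over
```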

---

## Positive Scenarios — Scan & Camera Control

### FT-P-011: Sweep → zoomed-inspection transition + POI enqueue
**Summary**: A POI detected mid-sweep triggers a transition into zoomed-inspection within 2 s (timing: NFT-PERF-L8) AND the POI is enqueued correctly.
**Traces to**: AC `Scan & Camera Control — Transition from sweep to detailed inspection / S1`.
**Tier**: B + E.
**Test status**: DEFERRED — `<DEFERRED: scripted mission with planned route + simulated POI detected mid-sweep>`.

**Input data**: scripted MAVLink mission + scripted Tier-1 detection injection at known frame index.

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Start SUT with scripted mission; begin RTSP playback | SUT enters `ZoomedOut`, performs sweep |
| 2 | Inject Tier-1 detection of a high-confidence target at frame N | SUT transitions to `ZoomedIn { roi, hold_started_at }`; ROI bbox matches the injected detection's bbox; POI queue length increments by 1 |

**Expected outcome**: S1 — `exact (transition)` + `exact (ROI matches POI bbox)` + `exact (queue Δ+1)`.

---

### FT-P-012: Footpath-pan during zoomed-in hold
**Summary**: During a zoomed-in hold on a footpath ROI, the camera pans along the footpath while the airframe continues to fly. The footpath stays in the centre 50% of the frame for the duration of the hold.
**Traces to**: AC `Scan & Camera Control — pan to keep features visible / S2`.
**Tier**: B + E.
**Test status**: DEFERRED — `<DEFERRED: zoomed-inspection scenario with footpath polyline overlapping the ROI>`.

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Drive SUT into ZoomedIn hold on a footpath ROI | SUT in `ZoomedIn { roi, hold_started_at }` |
| 2 | Continue airframe flight; observe gimbal commands stream | SUT issues pan commands to track the footpath; observed centre offset ≤ 25% per frame |

**Expected outcome**: S2 — `numeric_tolerance` passes; per-frame centre offset ≤ 0.25 × frame_dim.

---

### FT-P-013: Target-follow centre-window
**Summary**: After operator confirmation, target-follow mode keeps the target within the centre 25% of the frame while it is visible.
**Traces to**: AC `Scan & Camera Control — target-follow mode / S3`.
**Tier**: B + E.
**Test status**: DEFERRED — `<DEFERRED: operator-confirmed target + 60 s follow window>`.

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Drive SUT into `TargetFollow { target_id, started_at }` via prior FT-P-016 | mode == target-follow |
| 2 | Observe gimbal commands + per-frame target position for 60 s | per frame, \|dx\| ≤ 0.125 × frame width and \|dy\| ≤ 0.125 × frame height |

**Expected outcome**: S3 — `threshold_max` passes per frame.
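
Both centre-window checks (FT-P-012 at 50%, FT-P-013 at 25%) reduce to the same per-axis offset test: a window covering fraction `f` of each dimension allows an offset of at most `f / 2` of that dimension from the frame centre. A sketch assuming the observed target position arrives in pixel coordinates (a hypothetical input format):

```python
def within_centre_window(target_xy, frame_wh, window_fraction):
    """True if the target centre lies inside the centred window covering
    window_fraction of each frame dimension (0.50 -> offset <= 0.25 x dim, 0.25 -> 0.125 x dim)."""
    (x, y), (w, h) = target_xy, frame_wh
    half = window_fraction / 2
    dx, dy = abs(x - w / 2), abs(y - h / 2)
    return dx <= half * w and dy <= half * h


def all_frames_within(track, frame_wh, window_fraction):
    """The threshold passes only if every observed frame is inside the window."""
    return all(within_centre_window(p, frame_wh, window_fraction) for p in track)
```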

---

### FT-P-014: POI queue ordering by `confidence × proximity × age_factor`
**Summary**: With 3 POIs varying in confidence × proximity × age_factor, the system pops them in the documented relative order.
**Traces to**: AC `Scan & Camera Control — POI queue MUST be ordered by … / S4`.
**Tier**: B.
**Test status**: READY (synthetic-poi-feeds inline-authorable).

**Input data**: `synthetic-poi-feeds` ordering test — 3 POIs with confidence ∈ {0.50, 0.80, 0.60}, proximity ∈ {near, mid, far}, age_factor ∈ {fresh, fresh, stale} chosen to produce a known relative ordering.

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Inject the 3 POIs as Tier-1 detections | all 3 enter the queue |
| 2 | Observe ZoomedIn transitions over the next N seconds | SUT inspects POIs in the documented relative order |

**Expected outcome**: S4 — `exact (order)` passes.
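
A sketch of how the harness could compute the expected relative order for the three injected POIs. The numeric mappings for `proximity` and `age_factor` are hypothetical (the fixture only names them categorically), and FIFO tie-breaking between equal priorities is an assumption, not part of the S4 rule.

```python
import heapq
import itertools

# Hypothetical numeric encodings for the categorical fixture values.
PROXIMITY = {"near": 1.0, "mid": 0.7, "far": 0.4}
AGE = {"fresh": 1.0, "stale": 0.5}


class PoiQueue:
    """Max-priority queue keyed by confidence x proximity x age_factor (FIFO on ties)."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()

    def push(self, poi_id, confidence, proximity, age_factor):
        priority = confidence * proximity * age_factor
        # heapq is a min-heap, so store the negated priority; the sequence number breaks ties.
        heapq.heappush(self._heap, (-priority, next(self._seq), poi_id))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```

With these encodings the fixture values give priorities 0.50 (A), 0.56 (B) and 0.12 (C), so the expected inspection order is B, A, C.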

---

### FT-P-015: Zoomed-in hold cap interacts with deep-analysis
**Summary**: Zoomed-in hold defaults to 5 s/POI; with deep analysis enabled, the hold is capped at 2 s. Actual hold duration = min(deep_analysis_complete_at, 2 s) when deep analysis is enabled, and 5 s (the per-POI timeout) when it is disabled.
**Traces to**: AC `Scan & Camera Control — hold endpoints up to 2 s for deep analysis … per-POI timeout (default 5 s/POI) / S5`.
**Tier**: B + E.
**Test status**: DEFERRED — `<DEFERRED: VLM-enabled hold scenario with vlm_io_pair returning within 2 s>`.

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Drive SUT into ZoomedIn hold; enable deep-analysis | SUT begins VLM IPC call on enter |
| 2a | Case A: VLM returns at 1.5 s | hold ends at 1.5 s (deep_analysis_complete) |
| 2b | Case B: VLM returns at 3.0 s | hold ends at 2.0 s (deep-analysis cap) |
| 2c | Case C: deep-analysis disabled | hold ends at 5.0 s (per-POI timeout) |

**Expected outcome**: S5 — `exact` passes for each case.
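
The three cases can be captured by one small function that the harness uses to compute the expected hold end. Treating a VLM call that never returns as a late reply (cut off at the 2 s cap) is an assumption; the table only covers replies at 1.5 s and 3.0 s.

```python
import math
from typing import Optional

PER_POI_TIMEOUT_S = 5.0    # default per-POI hold (S5)
DEEP_ANALYSIS_CAP_S = 2.0  # maximum hold while waiting on deep analysis


def hold_duration_s(deep_analysis_enabled: bool, vlm_return_s: Optional[float] = None) -> float:
    """Expected hold length in seconds for one POI."""
    if not deep_analysis_enabled:
        return PER_POI_TIMEOUT_S
    # Assumption: a missing VLM reply behaves like a late one and is cut off by the cap.
    t = math.inf if vlm_return_s is None else vlm_return_s
    return min(t, DEEP_ANALYSIS_CAP_S, PER_POI_TIMEOUT_S)
```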

---

## Positive Scenarios — Operator Workflow

### FT-P-016: Operator confirm → middle waypoint inserted + target-follow
**Summary**: A valid, signed operator-confirm command results in a middle waypoint POSTed to `missions` AND a transition into target-follow mode.
**Traces to**: AC `Operator Workflow — Operator confirmation MUST result in … / O8`.
**Tier**: B + E.
**Test status**: READY for happy path (default placeholder envelope until Q9 resolves; envelope replaced when Q9 ships).

**Input data**: `operator-envelopes` (valid happy path) + `mission-suite-fixture` (DEFERRED full version) + `operator-session-scripts` (nominal session).

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | SUT in ZoomedIn hold on a POI surfaced to the operator | mode == ZoomedIn |
| 2 | Replay operator-confirm envelope on the return path | SUT validates envelope; commits decision |
| 3 | Observe HTTPS POST to `missions-mock` | `POST /missions/{id}` with a middle waypoint at the POI MGRS; HTTP 200 |
| 4 | Observe scan-mode state | mode == `TargetFollow { target_id, started_at }` |

**Expected outcome**: O8 — `exact (HTTP 200)` + `exact (mode == TargetFollow)`.

---

### FT-P-017: Decision window = 30 s at conf = 0.40
**Summary**: At confidence = 0.40 the decision window surfaced to the operator MUST equal 30 s (lower-bound anchor of the linear scale).
**Traces to**: AC `Operator Workflow — decision window … 40% confidence → 30 s / O1`.
**Tier**: B.
**Test status**: READY.

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Inject a synthetic POI at conf=0.40 | POI surfaced on operator-stream with `decision_window_seconds: 30` |

**Expected outcome**: O1 — `exact (window == 30 s)`.

---

### FT-P-018: Decision window = 120 s at conf = 1.00
**Summary**: At confidence = 1.00 the decision window MUST equal 120 s (upper-bound anchor).
**Traces to**: AC `O2`.
**Tier**: B.
**Test status**: READY.

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Inject a synthetic POI at conf=1.00 | window == 120 s |

**Expected outcome**: O2 — `exact`.

---

### FT-P-019: Decision window linear interpolation at conf = 0.70
**Summary**: At conf=0.70 the window is interpolated linearly between (0.40, 30 s) and (1.00, 120 s) → 75 s ± 0.5 s.
**Traces to**: AC `O3`.
**Tier**: B.
**Test status**: READY.

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Inject a synthetic POI at conf=0.70 | window ≈ 75 s ± 0.5 s |

**Expected outcome**: O3 — `numeric_tolerance ± 0.5 s`.
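
FT-P-017 to FT-P-019 and FT-N-004 all follow from one linear mapping between the two anchors, window(c) = 30 + (c − 0.40) / 0.60 × 90 s, so c = 0.70 gives 30 + 0.5 × 90 = 75 s. A sketch of the expected-value function the harness could use; clamping confidences above 1.00 is an assumption.

```python
from typing import Optional

CONF_MIN, WINDOW_MIN_S = 0.40, 30.0   # O1 anchor
CONF_MAX, WINDOW_MAX_S = 1.00, 120.0  # O2 anchor


def decision_window_s(confidence: float) -> Optional[float]:
    """Operator decision window in seconds, or None if the POI must not be surfaced (O4)."""
    if confidence < CONF_MIN:
        return None
    c = min(confidence, CONF_MAX)  # assumption: confidences above 1.0 are clamped
    fraction = (c - CONF_MIN) / (CONF_MAX - CONF_MIN)
    return WINDOW_MIN_S + fraction * (WINDOW_MAX_S - WINDOW_MIN_S)
```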

---

### FT-P-020: Operator decline → persistent ignored-item
**Summary**: Operator-decline on a surfaced POI MUST persist an ignored-item entry keyed by `(MGRS cell, class_group)`.
**Traces to**: AC `Operator Workflow — Operator-decline MUST result in a persistent ignored-item entry / O5`.
**Tier**: B + E.
**Test status**: READY (operator-session-scripts inline-authorable; envelope uses default placeholder until Q9 resolves).

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Surface a POI to the operator | POI on operator-stream |
| 2 | Replay operator-decline envelope | SUT validates the envelope; the ignored-item count reported by the health endpoint increments by 1; the new item's `(MGRS, class_group)` matches the declined POI |

**Expected outcome**: O5 — `exact (count Δ+1)` + `schema_match` (ignored-item record shape).

---

### FT-P-021: Ignored-item suppresses future matching detections
**Summary**: A new detection whose `(MGRS, class_group)` matches an existing ignored-item MUST NOT be surfaced to the operator.
**Traces to**: AC `Operator Workflow — A new detection whose (MGRS, class_group) matches an existing ignored-item MUST NOT be surfaced / O6`.
**Tier**: B + E.
**Test status**: READY.

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Seed an ignored-item for `(MGRS=X, class_group=Y)` via FT-P-020 | ignored-item present |
| 2 | Inject a new detection at `(MGRS=X, class_group=Y)` | operator-stream emits NO POI for this detection; counter `pois_suppressed_by_ignored_total` increments |

**Expected outcome**: O6 — `exact (count surfaced == 0)`.

---

### FT-P-022: Operator timeout = forget (no ignored-item)
**Summary**: If the decision window expires with no operator response, the POI is removed from the queue but NO ignored-item is created (forget, do not blacklist).
**Traces to**: AC `Operator Workflow — Timeout (no operator response within the window) MUST NOT create an ignored-item entry / O7`.
**Tier**: B + E.
**Test status**: READY.

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Surface a POI at conf=0.40 (30 s window) | POI on operator-stream |
| 2 | Wait > 30 s with no response | POI removed from queue; ignored-item count UNCHANGED |

**Expected outcome**: O7 — `exact (queue −1)` + `exact (ignored-item count unchanged)`.
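
The decline / suppress / timeout rules (O5, O6, O7) can be modelled as a reference oracle for the three scenarios above. This is a hypothetical in-memory model: it does not persist anything, whereas the SUT must persist ignored items on the `autopilot-state` volume.

```python
from dataclasses import dataclass, field


@dataclass
class OperatorDecisions:
    """Hypothetical reference model of O5/O6/O7, keyed by (MGRS cell, class_group)."""
    ignored: set = field(default_factory=set)
    queue: dict = field(default_factory=dict)  # poi_id -> (mgrs_cell, class_group)
    suppressed_total: int = 0

    def offer(self, poi_id, mgrs_cell, class_group) -> bool:
        """O6: surface a new POI unless its key matches an ignored item."""
        key = (mgrs_cell, class_group)
        if key in self.ignored:
            self.suppressed_total += 1
            return False
        self.queue[poi_id] = key
        return True

    def decline(self, poi_id):
        """O5: a decline removes the POI and records an ignored item."""
        self.ignored.add(self.queue.pop(poi_id))

    def timeout(self, poi_id):
        """O7: a timeout forgets the POI without blacklisting its key."""
        self.queue.pop(poi_id)
```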

---

## Positive Scenarios — Pre-flight & Map Reconciliation

### FT-P-023: BIT pre-flight pass with every dependency healthy
**Summary**: When every external dependency is reachable + healthy AND on-device storage < 95 % full AND wall-clock is bound, BIT passes and takeoff is permitted.
**Traces to**: AC `Reliability & Safety — Pre-flight self-test MUST pass / R1`, RESTRICT `Reliability & Safety obligations — Pre-flight self-test (BIT) MUST gate takeoff`.
**Tier**: B + E.
**Test status**: READY (bit-scenarios inline-authorable).

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Bring up all mocks healthy + clean autopilot-state volume | every dependency green |
| 2 | Trigger BIT via the BIT-arm operator command (or scripted in `operator-session-scripts`) | health endpoint returns `{ "ok": true, "deps": { ...all green }, "takeoff_permitted": true }` |

**Expected outcome**: R1 — `exact (takeoff_permitted == true)` + `exact (health.all == "green")`.

---

### FT-P-024: Pre-flight map pull ≤ 30 s for a 30×30 km region
**Summary**: Pulling the area-level map of previously-detected objects for a 30 km × 30 km mission area MUST complete within 30 s wall-clock.
**Traces to**: AC `Map Reconciliation — Pre-flight map pull / Mp1`.
**Tier**: B + E.
**Test status**: DEFERRED — `<DEFERRED: mock central area-map service with ~10000 map objects for the 30 km × 30 km region>`.

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Configure `missions-mock` with the 30×30 km mapobjects fixture | mock ready |
| 2 | Trigger BIT (which pulls the map) | SUT issues `GET /missions/{id}/mapobjects`; local copy hydrated within 30 s |
| 3 | Confirm BIT proceeds normally afterwards | takeoff permitted |

**Expected outcome**: Mp1 — `threshold_max` passes (NFT-PERF measures the latency; this scenario asserts the functional pathway).

---

### FT-P-025: Post-flight map diff push for a 60-minute mission
**Summary**: Pushing the post-flight pass diff (~17 500 records: NEW + MOVED + REMOVED + CONFIRMED-EXISTING) for a 60-minute mission MUST complete within 120 s wall-clock.
**Traces to**: AC `Map Reconciliation — Post-flight pass diff push / Mp3`.
**Tier**: B + E.
**Test status**: DEFERRED — `<DEFERRED: 60-minute mission pass diff fixture>`.

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Land the SUT after a 60-minute mission (scripted) | SUT enters post-flight reconciliation |
| 2 | Observe HTTPS POST to `missions-mock` | `POST /missions/{id}/mapobjects` with the diff; HTTP 200 within 120 s |

**Expected outcome**: Mp3 — `threshold_max` passes (NFT-PERF measures latency).

---

### FT-P-026: MapObjects conflict resolution (append-only + projection)
**Summary**: When two map updates conflict for the same `(spatial-cell, class_group)`, the SUT records both observations append-only AND computes the current view per the documented resolution rule.
**Traces to**: AC `Q-tagged — MapObjects conflict resolution / Mp5` (depends on Q8).
**Tier**: B + E.
**Test status**: DEFERRED — `<DEFERRED: conflict pair fixture + expected_results/mapobjects_conflict_resolution.json; Q8>`.

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Seed the local mapobjects store via map pull | local store hydrated |
| 2 | Trigger two conflicting observations for `(cell=X, class=Y)` | both appended to the observation log |
| 3 | Observe the projected current view (via the operator-stream map-overlay channel or health debug) | current view matches the resolution rule (Q8) |

**Expected outcome**: Mp5 — `json_diff` passes against the reference.

---

## Negative Scenarios

### FT-N-001: BIT inhibits takeoff when Tier-1 detection is unreachable
**Summary**: When `../detections` is unreachable at BIT, takeoff MUST be inhibited and the detection dependency MUST report red.
**Traces to**: AC `Reliability & Safety — Pre-flight self-test MUST pass / R2`, RESTRICT `Suite-level architectural splits — Tier 1 lives in ../detections`.
**Tier**: B + E.
**Test status**: READY.

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Stop `detections-mock` | dependency unreachable |
| 2 | Trigger BIT | health endpoint returns `takeoff_permitted: false`; `deps.detection == "red"`; operator-stream surfaces a BIT-failure event with category `detection` |
| 3 | Attempt to issue a takeoff MAVLink command (scripted) | SUT refuses; no MAVLink takeoff command observed on `mavlink-sitl` |

**Expected outcome**: R2 — `exact (takeoff inhibited)`.

---

### FT-N-002: BIT inhibits takeoff when persistent storage ≥ 95 % full
**Summary**: When the on-device persistent store is ≥ 95 % full at BIT, takeoff MUST be inhibited.
**Traces to**: AC `Reliability & Safety — Pre-flight self-test MUST pass / R3`, RESTRICT `On-device storage MUST be bounded`.
**Tier**: B.
**Test status**: READY.

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Pre-fill `autopilot-state` volume to ≥ 95 % via seed file | storage threshold tripped |
| 2 | Trigger BIT | `takeoff_permitted: false`; `deps.storage == "red"` |

**Expected outcome**: R3 — `exact (takeoff inhibited)`.

---

### FT-N-003: Cache-fallback on map-pull timeout requires operator acknowledgement
**Summary**: When the pre-flight map pull times out, the SUT falls back to last-known cached MapObjects, reports `map_sync == "cached_fallback"`, AND MUST require explicit operator acknowledgement before takeoff is permitted.
**Traces to**: AC `Map Reconciliation — Cache-fallback on timeout is acceptable only with explicit operator acknowledgement / Mp2`.
**Tier**: B + E.
**Test status**: READY (operator-session-scripts inline-authorable; cached state seeded from prior pull).

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Seed `autopilot-state` with a known prior MapObjects snapshot | cached map present |
| 2 | Configure `missions-mock` to timeout on `GET /missions/{id}/mapobjects` | mock returns 504 / silent timeout |
| 3 | Trigger BIT | SUT falls back to cached; `map_sync == "cached_fallback"`; BIT reports `takeoff_permitted: false, awaiting_ack: ["map_cache_fallback"]` |
| 4 | Replay operator-ack envelope for `map_cache_fallback` | BIT now reports `takeoff_permitted: true`; one structured-log entry at WARN with `map_cache_fallback_acked_by_operator` |
| 5 | In a separate run, repeat steps 1–3 and attempt takeoff (scripted) WITHOUT replaying the ack | takeoff remains inhibited |

**Expected outcome**: Mp2 — `exact (cached_fallback)` + `exact (BIT requires explicit ack)`.
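
FT-P-023, FT-N-001, FT-N-002 and FT-N-003 exercise one gate. A sketch of the expected BIT verdict, mirroring the `deps`, `awaiting_ack` and `takeoff_permitted` fields shown in those scenarios; the input structure, the `"fresh"` sync value and dependency names other than `detection` are hypothetical, and the wall-clock check and top-level `ok` field are omitted.

```python
from dataclasses import dataclass, field

STORAGE_LIMIT = 0.95  # takeoff inhibited at >= 95 % full (R3)


@dataclass
class BitInputs:
    deps: dict           # dependency name -> reachable and healthy (bool)
    storage_used: float  # fraction of the persistent store in use, 0..1
    map_sync: str        # "cached_fallback" or a hypothetical "fresh"
    acks: set = field(default_factory=set)  # operator acknowledgements received


def evaluate_bit(inp: BitInputs) -> dict:
    status = {name: ("green" if ok else "red") for name, ok in inp.deps.items()}
    status["storage"] = "green" if inp.storage_used < STORAGE_LIMIT else "red"
    awaiting = []
    # Mp2: a cached-fallback map is acceptable only after explicit operator acknowledgement.
    if inp.map_sync == "cached_fallback" and "map_cache_fallback" not in inp.acks:
        awaiting.append("map_cache_fallback")
    permitted = all(v == "green" for v in status.values()) and not awaiting
    return {"deps": status, "awaiting_ack": awaiting, "takeoff_permitted": permitted}
```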

---

### FT-N-004: Below-threshold POI suppression (conf < 40 %)
**Summary**: A POI at confidence < 0.40 MUST NOT be surfaced to the operator at all.
**Traces to**: AC `Operator Workflow — Below 40% confidence, the POI MUST NOT be surfaced at all / O4`.
**Tier**: B.
**Test status**: READY.

| Step | Consumer Action | Expected System Response |
|---|---|---|
| 1 | Inject a synthetic POI at conf=0.39 | POI does NOT appear on operator-stream; counter `pois_below_threshold_total` increments by 1 |

**Expected outcome**: O4 — `exact (count surfaced == 0)`.

---

## Notes for downstream skills

- Decompose: every `READY` scenario above maps to at least one blackbox test task. DEFERRED scenarios MUST still produce a task spec (so the implementation has a placeholder), but the task spec's `Acceptance` section will reference the leftover entry that gates the fixture.
- Implement Tests: per-scenario assertion helpers (RTSP playback orchestration, MAVLink observer, operator-stream observer) are likely shared across scenarios — Phase 4's runner scripts will assume a thin `e2e/consumer/lib/` module that all scenarios depend on.
- Test-Spec Sync (cycle-update mode): post-implementation, scenarios may be split (e.g. FT-P-015's three sub-cases may become FT-P-015a/b/c) or merged. The traceability matrix is the source of truth; every scenario MUST trace to at least one AC or RESTRICT.