# autopilot — System Flows

This document traces the main runtime flows through the component graph. Each flow has the same structure: entry point, narrative, sequence diagram, error scenarios, and (for non-trivial flows) a data-flow table. F4 is the full behaviour-tree spec for `scan_controller`.

## Flow Index

| # | Flow | Notes |
|---|---|---|
| F1 | Frame pipeline (RTSP → bboxes → fan-out) | 5–7 participants; the system's main data plane. |
| F2 | Movement detection (zoom-out + zoom-in) | Telemetry-synchronised ego-motion compensation; per-zoom-band thresholds; mandatory. |
| F3 | VLM confirmation (optional) | Explicit `VlmAssessment` loopback + fail-closed branches + "VLM disabled" alt. |
| F4 | Scan controller behaviour tree | Full BT spec (text + ASCII tree + YAML + 15 fixed-wing rules + tick scenarios). |
| F5 | Operator round trip | Always-on stream + POI confirm / decline / follow / timeout. |
| F6 | Mission lifecycle | `mission_client` → `mission_executor` → `mavlink_layer` → ArduPilot; multirotor + fixed-wing variants; lost-link failsafe. |
| F7 | MapObjects + ignored-items (in-flight) | H3 lookup, k-ring, class-group, distance/move thresholds, REMOVED diff, ignored-items append. |
| F8 | MapObjects sync (central DB) | Pre-flight pull / post-flight push against `/missions/{id}/mapobjects`; batched only for MVP. |
| F9 | Pre-flight self-test (BIT) | Gates takeoff on every dependency in §5 plus mission load + MapObjects pre-flight pull. |
| F10 | Lost-link failsafe ladder | `LinkOk → LinkDegraded → LinkLost → LinkLostInFollow` with grace windows; default RTL after 30 s. |

---

## F1. Frame pipeline (RTSP → bboxes → fan-out)

**Entry**: `frame_ingest` opens an RTSP session against the ViewPro A40 and pushes frames at the platform-supported rate.

**Narrative**:

1. `frame_ingest` decodes RTSP frames. Each decoded frame carries a monotonic timestamp, the gimbal's pan/tilt/zoom snapshot, and the UAV motion sample, all synchronised within the configured skew tolerance (out-of-tolerance frames are dropped or downgraded — see F2).
2. Each frame is forwarded to `detection_client`, which streams it on a bidirectional gRPC channel to the external `../detections` service. `../detections` returns a list of normalised bboxes (class, confidence, geometry).
3. The bbox set fans out, in parallel, to:
   - `telemetry_stream` — for operator-side overlay rendering (always-on; not gated on detection content).
   - `scan_controller` — for the zoom-out sweep / zoom-in decision, the POI queue, and the ≤5 POIs/min cap.
   - `semantic_analyzer` — only when `scan_controller` is in `ZoomedIn`; it consumes the cropped ROI and returns Tier 2 evidence (path freshness, endpoint scoring, concealment score).
   - `vlm_client` — only when `scan_controller` requests zoom-in confirmation **and** the VLM optionality flag is enabled; see F3.
4. `scan_controller` emits a single coherent decision per tick: continue zoom-out, transition to zoom-in on a queued POI, hold endpoint for VLM, or enter target-follow on operator confirmation. The decision-to-movement latency budget is ≤500 ms.
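The routing rule in step 3 can be read as a pure function of the controller state and the VLM flag. The sketch below is illustrative only: `ScanState`, `fanout_targets`, and the string consumer names are hypothetical stand-ins, not the `detection_client` API.

```python
from enum import Enum, auto
from typing import List


class ScanState(Enum):
    """Top-level scan_controller states named in this document."""
    ZOOMED_OUT = auto()
    ZOOMED_IN = auto()
    TARGET_FOLLOW = auto()


def fanout_targets(state: ScanState, vlm_enabled: bool,
                   confirmation_requested: bool) -> List[str]:
    """Return the consumers that receive one bbox set (F1, step 3)."""
    # Always-on consumers: operator overlay and the POI / decision logic.
    targets = ["telemetry_stream", "scan_controller"]
    if state is ScanState.ZOOMED_IN:
        # Tier 2 evidence is only computed on zoomed-in ROI crops.
        targets.append("semantic_analyzer")
    if confirmation_requested and vlm_enabled:
        # Optional VLM path, gated by the runtime flag (see F3).
        targets.append("vlm_client")
    return targets
```

Keeping the rule a pure function makes the "never gated on detection content" property of the overlay path easy to unit-test.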
**Sequence diagram**:

```mermaid
sequenceDiagram
    participant CAM as ViewPro A40 (RTSP)
    participant FI as frame_ingest
    participant DC as detection_client
    participant DETS as ../detections
    participant SC as scan_controller
    participant SA as semantic_analyzer
    participant TS as telemetry_stream

    CAM-->>FI: RTSP frames
    FI->>DC: frame + telemetry snapshot
    DC->>DETS: bidir gRPC frame
    DETS-->>DC: bboxes (class, conf, geometry)
    par fan-out
        DC->>TS: bboxes for operator overlay
    and
        DC->>SC: bboxes for POI queue
    and
        DC->>SA: ROI crops (zoom-in only)
    end
    SA-->>SC: Tier 2 evidence
    SC->>SC: tick decision (ZoomedOut / ZoomedIn / TargetFollow)
```

**Error scenarios**:

- **gRPC stream disconnect** to `../detections` → `detection_client` reconnects with backoff; in-flight frames are dropped, **not** retried (per-frame freshness matters more than completeness). The health endpoint reflects the dependency state.
- **Telemetry skew exceeds tolerance** → `frame_ingest` flags the frame; `movement_detector` (F2) refuses to consume it; `scan_controller` still receives bboxes but with the skew flag set.
- **Bbox payload schema-invalid** → `detection_client` rejects the message and logs a structured error; nothing is silently swallowed.
- **Tier 2 ROI inference fails** → `semantic_analyzer` returns a typed error; `scan_controller` treats the ROI as inconclusive and proceeds per the configured Tier-2-failure policy (default: hold for VLM if enabled, else release POI with `inconclusive`).
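A minimal sketch of the reconnect behaviour in the first error scenario, assuming capped exponential backoff and an in-flight set that is discarded rather than replayed on disconnect. The base, factor, cap, and attempt count are placeholder values, and `backoff_delays` / `InFlightFrames` are hypothetical names.

```python
from typing import Deque, Iterator, List
from collections import deque


def backoff_delays(base_s: float = 0.2, factor: float = 2.0,
                   cap_s: float = 5.0, attempts: int = 6) -> Iterator[float]:
    """Yield capped exponential reconnect delays (placeholder defaults)."""
    delay = base_s
    for _ in range(attempts):
        yield min(delay, cap_s)
        delay *= factor


class InFlightFrames:
    """Frames sent to ../detections that have not received bboxes yet."""

    def __init__(self) -> None:
        self._pending: Deque[int] = deque()

    def sent(self, frame_id: int) -> None:
        self._pending.append(frame_id)

    def answered(self, frame_id: int) -> None:
        self._pending.remove(frame_id)

    def on_disconnect(self) -> List[int]:
        """Drop every in-flight frame (never resend); return IDs for logging."""
        dropped = list(self._pending)
        self._pending.clear()
        return dropped
```

Dropping instead of replaying means the first frame after reconnection is always the freshest one the camera has, which is the property the scenario prioritises over completeness.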
**Data-flow table**:

| From → To | Payload | Channel | Lifecycle |
|---|---|---|---|
| Camera → frame_ingest | H.264/265 RTSP | RTSP/RTP | per-frame |
| frame_ingest → detection_client | frame + telemetry snapshot | in-process | per-frame |
| detection_client ↔ ../detections | bidi gRPC frame ↔ bboxes | gRPC | per-frame |
| detection_client → telemetry_stream | bboxes for overlay | in-process | per-frame |
| detection_client → scan_controller | bboxes for POI queue | in-process | per-frame |
| detection_client → semantic_analyzer | ROI crops (zoom-in only) | in-process | per-ROI |
| semantic_analyzer → scan_controller | Tier 2 evidence | in-process | per-ROI |

---

## F2. Movement detection (zoom-out + zoom-in)

**Entry**: `movement_detector` subscribes to frames + synchronised telemetry from `frame_ingest` whenever `scan_controller` is in `ZoomedOut` **or** `ZoomedIn`. It is suppressed only during `TargetFollow` (the gimbal is dominated by tracking commands).

**Narrative**:

1. `movement_detector` consumes only frames whose telemetry skew is within tolerance (frame timestamp ↔ gimbal pan/tilt/zoom ↔ UAV motion). The skew tolerance is **per zoom band** — tighter at zoom-in (gimbal slewing dominates the residual signal at narrow FOV).
2. It computes ego-motion compensation using OpenCV optical flow / global-motion estimation, fused with the gimbal angle / zoom and UAV velocity. Stable objects (trees, houses, terrain) must not be reported as moving solely because the platform moves.
3. Residual motion clusters that survive ego-motion subtraction are emitted as **movement candidates**, each with a normalised box, a confidence proxy, and a `source_zoom_band` enum (`zoomed_out` | `zoomed_in`).
4. The cluster-persistence threshold and residual-velocity floor are configured **per zoom band**. The pixel-to-metre ratio differs by ~10×, so the same residual pixel motion implies very different physical motion; the configuration normalises this.
5. Movement candidates are enqueued by `scan_controller`. The enqueue-latency budget is ≤1 s for zoom-out candidates and ≤1.5 s for zoom-in candidates (allowing a brief gimbal-stability window). They share the POI queue (and the ≤5 POIs/min cap) with semantic POIs.
6. **At zoom-in**, the typical lifecycle starts with a candidate appearing mid-hold:
   - if it remains within the current ROI, the ROI's confidence is bumped (no new POI);
   - if it appears outside the current ROI but within the broader zoomed FOV, it becomes a candidate-POI for the queue;
   - if the gimbal needs to retarget, `scan_controller` decides whether to interrupt the current hold (only for higher-priority candidates).
7. After zoom-in confirmation, the system attempts semantic / YOLO confirmation of the candidate as a vehicle, person, or other relevant target. Target-follow only starts after operator confirmation.

**Adequacy at zoom-in (research item, see `architecture.md §8 Q14`).** Classical OpenCV optical flow / global-motion estimation is well-validated at zoom-out but degrades when the gimbal is actively path-following at narrow FOV. The benchmark gate measures the false-positive rate at zoom-in **independently** from zoom-out. If classical CV exceeds the zoom-in false-positive cap, the implementation falls back to a learned optical-flow / CNN motion-segmentation module behind a feature flag, while keeping the same `Frame + telemetry → Vec` interface contract.
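The per-zoom-band gating in steps 1 to 4 can be sketched as follows. This is a simplification under stated assumptions: the median of sparse flow vectors stands in for the OpenCV global-motion estimate and for the gimbal / UAV-velocity fusion, and every number in the hypothetical `BANDS` table is a placeholder chosen only so that the two bands differ by the ~10× pixel-to-metre ratio from step 4.

```python
from dataclasses import dataclass
from statistics import median
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ZoomBandConfig:
    """Per-zoom-band thresholds; all values below are placeholders."""
    max_skew_ms: float        # telemetry skew tolerance (tighter at zoom-in)
    metres_per_px: float      # ground distance covered by one pixel
    min_residual_mps: float   # residual-velocity floor in physical units
    min_persist_frames: int   # cluster-persistence threshold


BANDS: Dict[str, ZoomBandConfig] = {
    "zoomed_out": ZoomBandConfig(40.0, 0.50, 1.0, 3),
    "zoomed_in": ZoomBandConfig(15.0, 0.05, 1.0, 5),
}


def residual_flow(flows_px: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Subtract a global-motion estimate (median flow) from each point's flow."""
    gx = median(dx for dx, _ in flows_px)
    gy = median(dy for _, dy in flows_px)
    return [(dx - gx, dy - gy) for dx, dy in flows_px]


def is_candidate(residual_px_per_s: float, persisted_frames: int,
                 skew_ms: float, band: str) -> bool:
    cfg = BANDS[band]
    if skew_ms > cfg.max_skew_ms:
        return False  # unsynchronised frame: never emit a candidate
    speed_mps = residual_px_per_s * cfg.metres_per_px  # normalise px to metres
    return (speed_mps >= cfg.min_residual_mps
            and persisted_frames >= cfg.min_persist_frames)
```

With these placeholder values, the same 4 px/s residual is 2 m/s at zoom-out but only 0.2 m/s at zoom-in, which is why thresholds expressed in pixels cannot be shared between bands.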
**Sequence diagram**:

```mermaid
sequenceDiagram
    participant FI as frame_ingest
    participant MD as movement_detector
    participant SC as scan_controller
    participant GC as gimbal_controller

    FI->>MD: frame + telemetry (skew OK, zoom band tagged)
    MD->>MD: ego-motion compensation (per-zoom-band)
    alt residual motion cluster
        MD->>SC: movement candidate (bbox, conf, source_zoom_band)
        alt source_zoom_band == zoomed_out
            SC->>SC: enqueue as zoom-in POI (≤1 s)
            SC->>GC: zoom + centre on candidate
        else source_zoom_band == zoomed_in
            alt within current ROI
                SC->>SC: bump current ROI confidence
            else outside current ROI
                SC->>SC: enqueue as candidate-POI (≤1.5 s)
                Note over SC: interrupt only if higher priority<br/>than current hold
            end
        end
    else stable scene
        MD-->>MD: drop (do not emit)
    end
```

**Error scenarios**:

- **Telemetry skew out of tolerance** → `movement_detector` skips the frame, logs the skew, and does **not** emit candidates from unsynchronised data. Skew tolerance is per zoom band; zoom-in is stricter.
- **Optical-flow failure on degenerate frames** (low texture, motion blur) → returns no candidates; not an error. At zoom-in this is more frequent (narrow FOV, gimbal slewing); an aggregate "no-candidate" rate above a threshold is surfaced as health → yellow.
- **POI queue saturated at ≤5 POIs/min cap** → the newest candidate is held with aging; the oldest unprocessed candidates may age out per the queue policy. Zoom-in candidates inherit the same cap (no separate budget).
- **Zoom-in confirmation rejects the candidate** (no semantic / YOLO match) → POI released; `scan_controller` returns to zoom-out (or to the next queued POI).
- **Sustained zoom-in false-positive flood** (per-zoom-band threshold exceeded across the running window) → zoom-in movement detection is automatically suppressed and health is surfaced as yellow; this is the classical-CV failure mode tracked for Q14 follow-up.

**Data-flow table**:

| From → To | Payload | Channel | Lifecycle |
|---|---|---|---|
| frame_ingest → movement_detector | frame + telemetry snapshot | in-process | per-frame (both zoom bands; suppressed only in TargetFollow) |
| movement_detector → scan_controller | movement candidate (bbox + conf + source_zoom_band) | in-process | per-candidate |
| scan_controller → gimbal_controller | zoom + centre command | in-process | per-POI transition |

---

## F3. VLM confirmation (optional)

**Entry**: `scan_controller` is in `ZoomedIn`, holding on a POI endpoint, and the runtime configuration flag `vlm_enabled` is `true` (which is itself gated by the benchmark-gate result; see `architecture.md §7.6 Local VLM confirmation`).

**Narrative**:

1. `scan_controller` requests confirmation for one bounded ROI crop.
   It hands the ROI plus context (POI class group, confidence, prior Tier 2 evidence) to `vlm_client`.
2. `vlm_client` validates the request payload (size limit, format allow-list, peer credential check), then pushes one bounded ROI crop + a short prompt over a Unix-domain socket to a local NanoLLM/VILA1.5-3B process.
3. The local VLM responds; `vlm_client` validates the answer against the structured `VlmAssessment` schema (label enum, confidence, evidence spans, reason, status). Free-form text is not a downstream API.
4. **Success path**: `vlm_client` emits the validated `VlmAssessment` back to `scan_controller`. `scan_controller` integrates it with Tier 2 evidence and decides whether to surface the POI to the operator (via `operator_bridge`), release the POI, or hold target-follow.
5. **Fail-closed paths** — if any of the following occur, `vlm_client` returns a `VlmAssessment` with `status` set accordingly and `label = inconclusive`:
   - **Schema-invalid output** → `status: schema_invalid`.
   - **Timeout** (response exceeds the 5 s per-ROI budget) → `status: timeout`.
   - **IPC error** (socket closed, peer-cred check failed, oversized payload, decode failure) → `status: ipc_error`.

   In every fail-closed case, `scan_controller` MUST NOT promote the POI to a confirmed target on VLM evidence; it falls back to Tier 2 evidence + operator review. Nothing is silently swallowed; the failure is logged with the originating POI ID.
6. **VLM-disabled alt path**: when `vlm_enabled == false` (benchmark gate failed, or build-time feature module is absent), `scan_controller` skips the VLM step entirely. The Level-2 hold proceeds on Tier 2 evidence alone; the operator timeout still applies; the POI is surfaced to the operator with `vlm_status: disabled` so the UI can render the source-of-evidence indicator correctly.
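The fail-closed contract in step 5 can be sketched as a parser that always returns a `VlmAssessment`, whatever the VLM sent. Assumptions: the response arrives as JSON, the label set `{confirmed, rejected, inconclusive}` is hypothetical (the document names only `confirmed` and `inconclusive`), and `parse_vlm_response` / `fail_closed` are illustrative names rather than the `vlm_client` API.

```python
import json
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical label enum; only `confirmed` and `inconclusive` appear in the spec.
LABELS = frozenset({"confirmed", "rejected", "inconclusive"})


@dataclass(frozen=True)
class VlmAssessment:
    label: str
    confidence: float
    evidence_spans: Tuple[str, ...]
    reason: str
    status: str  # "ok" | "schema_invalid" | "timeout" | "ipc_error"


def fail_closed(status: str, reason: str) -> VlmAssessment:
    """Every failure maps to label=inconclusive, so it can never promote a POI."""
    return VlmAssessment("inconclusive", 0.0, (), reason, status)


def parse_vlm_response(raw: Optional[str], timed_out: bool = False,
                       ipc_error: Optional[str] = None) -> VlmAssessment:
    if ipc_error is not None:
        return fail_closed("ipc_error", ipc_error)
    if timed_out or raw is None:
        return fail_closed("timeout", "no response within the 5 s per-ROI budget")
    try:
        obj = json.loads(raw)
        label = str(obj["label"])
        confidence = float(obj["confidence"])
        spans = tuple(str(s) for s in obj["evidence_spans"])
        reason = str(obj["reason"])
    except (ValueError, KeyError, TypeError):
        # Free-form text, missing fields, or wrong types are never passed on.
        return fail_closed("schema_invalid", "response does not match VlmAssessment")
    if label not in LABELS or not 0.0 <= confidence <= 1.0:
        return fail_closed("schema_invalid", "label or confidence out of range")
    return VlmAssessment(label, confidence, spans, reason, "ok")
```

Because the function has no code path that raises or returns `None`, `scan_controller` can rely on always receiving a status it must inspect.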
**Sequence diagram**:

```mermaid
sequenceDiagram
    participant SC as scan_controller
    participant VC as vlm_client
    participant VLM as NanoLLM/VILA1.5-3B (local IPC)
    participant OB as operator_bridge

    alt vlm_enabled == true
        SC->>VC: ROI crop + context (ZoomedIn hold)
        VC->>VC: validate payload (size/format/peer-cred)
        VC->>VLM: bounded crop + short prompt (UDS)
        alt VLM responds within ≤5 s
            VLM-->>VC: response text
            VC->>VC: schema validation
            alt schema OK
                VC-->>SC: VlmAssessment {label, conf, status: ok}
                SC->>OB: surface POI with VLM evidence
            else schema_invalid
                VC-->>SC: VlmAssessment {label: inconclusive, status: schema_invalid}
                SC->>OB: surface POI WITHOUT VLM evidence (fail-closed)
            end
        else timeout / IPC error
            VC-->>SC: VlmAssessment {label: inconclusive, status: timeout|ipc_error}
            SC->>OB: surface POI WITHOUT VLM evidence (fail-closed)
        end
    else vlm_enabled == false
        SC->>OB: surface POI with vlm_status: disabled
    end
```

**Error scenarios**:

- **Sequential GPU contention** — VLM and YOLO share GPU memory. `scan_controller` enforces no concurrent execution; any concurrency violation is a logic bug and must be alarmed, not silently retried.
- **Repeated `ipc_error` in short window** → `vlm_client` raises a structured health alert (does not silently disable VLM). The operator-facing health surface reflects degraded VLM availability; `scan_controller` continues operating with VLM treated as unavailable until recovery.
- **Oversized ROI payload** → rejected at validation step; never sent to VLM. Logged.
- **Free-form text outside the schema** → `status: schema_invalid`; never converted to a `confirmed` label.
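The second error scenario (repeated `ipc_error` raises a health alert without silently disabling the VLM) can be sketched as a sliding-window counter. The window length, the error limit, and the `IpcErrorMonitor` name are assumptions, not values from the spec.

```python
from collections import deque
from typing import Deque


class IpcErrorMonitor:
    """Report VLM as degraded when ipc_error count in a sliding window hits a limit."""

    def __init__(self, window_s: float = 60.0, max_errors: int = 3) -> None:
        self.window_s = window_s
        self.max_errors = max_errors
        self._error_times: Deque[float] = deque()

    def record(self, status: str, now_s: float) -> bool:
        """Record one VlmAssessment status; return True while VLM is degraded."""
        if status == "ipc_error":
            self._error_times.append(now_s)
        # Evict errors that fell out of the window so the alert can clear.
        while self._error_times and now_s - self._error_times[0] > self.window_s:
            self._error_times.popleft()
        return len(self._error_times) >= self.max_errors
```

Because old errors age out of the window, the degraded flag clears on its own once IPC recovers, which matches "treated as unavailable until recovery" without a separate re-enable step.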
**Data-flow table**:

| From → To | Payload | Channel | Lifecycle |
|---|---|---|---|
| scan_controller → vlm_client | ROI crop + context | in-process | per-Level-2 hold |
| vlm_client ↔ NanoLLM | bounded crop + short prompt ↔ structured response | Unix-domain socket | per-request |
| vlm_client → scan_controller | `VlmAssessment` (always returned, status reflects outcome) | in-process | per-request |
| scan_controller → operator_bridge | POI + evidence (with `vlm_status` field) | in-process | per-POI surface |

---

## F4. Scan controller behaviour tree

**Entry**: `scan_controller` ticks at a fixed rate (10 Hz) from process start. This flow is the full spec for `scan_controller`.

The rewrite uses a deterministic typed state machine inspired by the behaviour-tree (BT) model below — every tick re-evaluates the root, so a high-priority safety condition immediately preempts lower-priority mission work. The BT below is the canonical decomposition; the implementation may flatten it into a state machine as long as the priorities, preemption, blackboard semantics, and tick scenarios are preserved.

### What is a Behaviour Tree

A behaviour tree (BT) is a hierarchical model that controls decision-making. The tree is **ticked** (evaluated) from the root every cycle. Each node returns one of three statuses: **Success**, **Failure**, or **Running**.

#### Core node types

| Node | Symbol | Behaviour |
|---|---|---|
| **Selector** (fallback) | `?` | Tries children left-to-right. Returns Success (or Running) as soon as a child succeeds (or is still running). Fails only if all children fail. |
| **Sequence** | `→` | Runs children left-to-right. Returns Failure (or Running) as soon as a child fails (or is still running). Succeeds only if all children succeed. |
| **Condition** | `◇` | Checks a boolean predicate. Returns Success or Failure instantly (no Running). |
| **Action** | `▢` | Executes a command. Can return Running while in progress. |
| **Decorator** | `δ` | Wraps a single child and modifies its result (invert, repeat, timeout, etc.). |

#### Why it works for UAVs

- Priority is structural: nodes higher and to the left are checked first.
- Reactivity is built in: every tick re-evaluates from the root, so a high-priority safety condition immediately preempts lower-priority mission work.
- Modularity: subtrees can be developed, tested, and swapped independently.

#### Tick cycle visualised

```text
Every tick (e.g. 10 Hz):

Root Selector
├─ [1] Safety checks       ← evaluated FIRST every tick
├─ [2] Target engagement   ← only if safety passes
└─ [3] Search pattern      ← only if no target active
```

If the UAV drifts outside the boundary during search, the next tick hits the safety branch first and triggers recovery before search continues.

### UAV mission context

- **Platform**: fixed-wing surveillance UAV.
- **Target priority**: 1) Artillery, 2) Tanks, 3) Trucks and cars.
- **Constraints**: operational boundary geofence, dynamic battery RTB, lost-link protocol.

### Sequence diagram (one tick)

```mermaid
sequenceDiagram
    participant TICK as Tick (10 Hz)
    participant SC as scan_controller (root selector)
    participant SAFE as Safety subtree
    participant TENG as Target Engagement subtree
    participant SRCH as Search subtree
    participant BB as Blackboard

    TICK->>SC: tick()
    SC->>SAFE: evaluate
    alt safety triggered
        SAFE->>BB: read state (battery, geofence, comms, weather, health)
        SAFE-->>SC: Running (action in progress) | Success
        Note over SC: preempt lower priorities
    else safety passes
        SC->>TENG: evaluate
        alt target detected
            TENG->>BB: read target_*_detected
            TENG-->>SC: Running (orbit/track) | Success
        else no target
            SC->>SRCH: evaluate
            SRCH->>BB: read tree_row_detected / trench_detected
            SRCH-->>SC: Running (fly leg / investigate)
        end
    end
```

### Full behaviour tree structure

```text
Root [Selector]
├── Safety [Selector]
│   ├── Boundary Recovery [Sequence]
│   │   ├── [Condition] outside_boundary?
│   │   └── [Action] recover_to_closest_boundary_point
│   │
│   ├── Low Battery RTB [Sequence]
│   │   ├── [Condition] battery ≤ energy_to_exit + reserve?
│   │   └── [Action] return_to_exit_point
│   │
│   ├── Lost Link [Sequence]
│   │   ├── [Condition] comms_lost > timeout?
│   │   └── [Action] execute_lost_link_route
│   │
│   ├── No-Fly Zone [Sequence]
│   │   ├── [Condition] approaching_nfz?
│   │   └── [Action] divert_around_nfz
│   │
│   ├── Weather Abort [Sequence]
│   │   ├── [Condition] wind > max_safe_wind?
│   │   └── [Action] return_to_exit_point
│   │
│   └── Emergency Health [Sequence]
│       ├── [Condition] critical_failure_detected?
│       └── [Action] emergency_rtb
│
├── Target Engagement [Selector]
│   ├── Artillery Track [Sequence]
│   │   ├── [Condition] artillery_detected?
│   │   ├── [Action] classify_confirm_artillery
│   │   └── [Action] orbit_and_track_artillery
│   │
│   ├── Tank Track [Sequence]
│   │   ├── [Condition] tank_detected?
│   │   ├── [Action] classify_confirm_tank
│   │   └── [Action] orbit_and_track_tank
│   │
│   └── Vehicle Track [Sequence]
│       ├── [Condition] truck_or_car_detected?
│       ├── [Action] classify_confirm_vehicle
│       └── [Action] orbit_and_track_vehicle
│
└── Search Pattern [Sequence]
    ├── [Action] fly_search_legs
    └── Area Investigation [Selector]
        ├── Tree Row Investigation [Sequence]
        │   ├── [Condition] tree_row_detected?
        │   └── [SubTree] investigate_tree_row
        │
        ├── Trench Investigation [Sequence]
        │   ├── [Condition] trench_detected?
        │   └── [SubTree] investigate_trench
        │
        └── [Action] continue_to_next_search_leg
```

### Key subtrees explained

#### 1. Safety — Boundary Recovery

```text
Boundary Recovery [Sequence]
├── [Condition] NOT inside_boundary
└── [Action] recover_to_closest_boundary_point
```

Runs every tick. The action computes the closest point on the planned route that lies inside the geofence and commands a turn toward it, respecting the minimum turn radius and airspeed constraints of the fixed-wing platform.

#### 2. Safety — Battery RTB

The threshold is **dynamic**, not a fixed percentage.
```text
battery_remaining_wh ≤ energy_to_exit_wh + reserve_wh
```

Where:

- `energy_to_exit_wh` = f(distance_to_exit, ground_speed, headwind, altitude_delta, turn_penalties).
- `reserve_wh` = contingency_reserve + landing_reserve.

This is recalculated every tick from the current position, wind, and flight profile. As the UAV moves farther from the exit, the threshold rises; when the UAV is close to the exit, more mission time is available.

#### 3. Target Engagement — Priority

The selector tries artillery first. If no artillery, it tries tanks. If no tanks, it tries trucks/cars. This is structural priority — no scoring logic needed; the tree position defines it.

Each target sequence:

1. Condition: detection model flags target type with confidence above threshold.
2. Classify and confirm: zoom camera, run secondary classifier, verify.
3. Orbit and track: enter loiter pattern around confirmed target, transmit coordinates and video.

A **target lock timeout** decorator wraps tracking actions. If the target is lost for T seconds (`target_lock_timeout_sec`), the action fails and the tree falls through to search.

#### 4. Search Pattern — Area Sweep

The search subtree handles the default behaviour when no target is actively tracked.

```text
Search Pattern [Sequence]
├── [Action] fly_search_legs                   ← follow planned lawnmower/sector pattern
└── Area Investigation [Selector]              ← react to features found during sweep
    ├── Tree Row Investigation ...
    ├── Trench Investigation ...
    └── [Action] continue_to_next_search_leg   ← nothing interesting, keep sweeping
```

### Search Pattern: Tree Row and Trench Investigation

This is a multi-phase investigation pattern that runs across both system zoom levels.

The UAV flies a standard search pattern (lawnmower, expanding square, or sector scan) while in `ZoomedOut`. While flying, the detection model continuously analyses the camera feed. When it spots a feature of interest, the BT enters an investigation subtree which transitions to `ZoomedIn` and then deepens within it.
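The Selector / Sequence / Condition / Action semantics from the Core node types table, which drive both the search sweep and these investigation subtrees, can be sketched as a minimal reactive interpreter that re-evaluates from the root on every tick. The helper names (`condition`, `sequence`, `selector`, `long_running`) are hypothetical, and the demo tree is a three-branch simplification of the full tree (search is reduced to a single `fly_search_legs` action).

```python
from enum import Enum
from typing import Callable, Dict, List


class Status(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    RUNNING = "running"


Blackboard = Dict[str, object]
Node = Callable[[Blackboard], Status]


def condition(pred: Callable[[Blackboard], bool]) -> Node:
    """Condition node: SUCCESS or FAILURE, never RUNNING."""
    return lambda bb: Status.SUCCESS if pred(bb) else Status.FAILURE


def sequence(*children: Node) -> Node:
    """Sequence node: stop at the first child that does not succeed."""
    def tick(bb: Blackboard) -> Status:
        for child in children:
            status = child(bb)
            if status is not Status.SUCCESS:
                return status  # FAILURE or RUNNING propagates upward
        return Status.SUCCESS
    return tick


def selector(*children: Node) -> Node:
    """Selector node: stop at the first child that does not fail."""
    def tick(bb: Blackboard) -> Status:
        for child in children:
            status = child(bb)
            if status is not Status.FAILURE:
                return status  # SUCCESS or RUNNING wins; later children skipped
        return Status.FAILURE
    return tick


trace: List[str] = []


def long_running(name: str) -> Node:
    """Action stub that records its name and reports RUNNING."""
    def act(bb: Blackboard) -> Status:
        trace.append(name)
        return Status.RUNNING
    return act


# Simplified root: Safety (boundary only), then Target Engagement (tank only), then Search.
root = selector(
    sequence(condition(lambda bb: not bb["inside_boundary"]),
             long_running("recover_to_closest_boundary_point")),
    sequence(condition(lambda bb: bool(bb["target_tank_detected"])),
             long_running("orbit_and_track")),
    long_running("fly_search_legs"),
)
```

Flipping blackboard values between ticks reproduces the pattern of the tick scenarios later in this section: the leftmost non-failing branch wins, so safety preempts tracking, which preempts search, without any explicit priority logic.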
> **Note on nomenclature.** The "Phase 1/2/3" below are investigation phases **inside** the BT, NOT the system-level zoom states. Phase 1 happens in `ZoomedOut`; Phases 2 and 3 happen in `ZoomedIn` (at medium and max zoom respectively).

#### Phase 1 — Feature Detection (zoom-out wide-area scan)

During normal search legs the detection model looks for:

- **Tree rows** (linear vegetation features that can conceal equipment).
- **Trenches** (linear earthwork features).

If detected, the corresponding investigation subtree activates.

#### Phase 2 — Tree Row Investigation (zoom-in, medium zoom)

```text
investigate_tree_row [Sequence]
├── [Action] adjust_altitude_or_zoom_for_tree_row
├── [Action] fly_along_tree_row
└── Feature Scan [Selector]
    ├── Car Entrance Found [Sequence]
    │   ├── [Condition] car_entrance_detected?
    │   └── [SubTree] detailed_inspection
    │
    ├── Tracks Found [Sequence]
    │   ├── [Condition] car_or_truck_tracks_detected?
    │   └── [SubTree] detailed_inspection
    │
    ├── Caponier Found [Sequence]
    │   ├── [Condition] caponier_detected?
    │   └── [SubTree] detailed_inspection
    │
    ├── Trash Found [Sequence]
    │   ├── [Condition] trash_detected?
    │   └── [SubTree] detailed_inspection
    │
    ├── Military Vehicle Found [Sequence]
    │   ├── [Condition] military_vehicle_detected?
    │   └── [SubTree] detailed_inspection
    │
    ├── Truck Found [Sequence]
    │   ├── [Condition] truck_detected?
    │   └── [SubTree] detailed_inspection
    │
    ├── Car Found [Sequence]
    │   ├── [Condition] car_detected?
    │   └── [SubTree] detailed_inspection
    │
    └── [Action] mark_tree_row_clear_and_resume
```

The UAV adjusts camera zoom (or reduces altitude within safe bounds) and flies along the tree row length. The detection model scans for indicators of concealed activity.
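Because Feature Scan is a Selector, only the leftmost triggered indicator opens a detailed inspection on a given tick, and `mark_tree_row_clear_and_resume` runs only when every indicator is false. A minimal sketch of that ordering, assuming the blackboard flags from the YAML representation; `feature_scan` and `FEATURE_SCAN_ORDER` are hypothetical names.

```python
from typing import Mapping, Optional, Tuple

# Indicator flags in the same left-to-right order as the Feature Scan selector.
FEATURE_SCAN_ORDER = (
    "car_entrance_detected",
    "car_or_truck_tracks_detected",
    "caponier_detected",
    "trash_detected",
    "military_vehicle_detected",
    "truck_detected",
    "car_detected",
)


def feature_scan(bb: Mapping[str, bool]) -> Tuple[str, Optional[str]]:
    """Return (next step, triggering flag) for one evaluation of Feature Scan."""
    for flag in FEATURE_SCAN_ORDER:
        if bb.get(flag, False):
            return "detailed_inspection", flag  # first match wins
    return "mark_tree_row_clear_and_resume", None  # fallback leaf
```

If the ordering matters operationally (for example, inspecting a military vehicle before trash), it has to be expressed by moving the branch left, since the selector has no scoring step.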
#### Phase 3 — Detailed Inspection (zoom-in, max zoom)

```text
detailed_inspection [Sequence]
├── [Action] zoom_to_max_or_descend
├── [Action] loiter_over_point_of_interest
├── [Action] run_high_res_classifier
├── Vehicle Classification [Selector]
│   ├── [Sequence]
│   │   ├── [Condition] is_tank_or_artillery?
│   │   └── [Action] flag_high_priority_target   → feeds back to Target Engagement
│   │
│   ├── [Sequence]
│   │   ├── [Condition] is_truck?
│   │   └── [Action] flag_medium_priority_target
│   │
│   ├── [Sequence]
│   │   ├── [Condition] is_car?
│   │   └── [Action] flag_low_priority_target
│   │
│   └── [Action] log_evidence_and_resume
├── [Action] capture_snapshot_and_coordinates
└── [Action] transmit_report
```

When a high-priority target is confirmed at Phase 3, it is written to the blackboard. On the next tick, the root-level Target Engagement selector picks it up and takes over with orbit-and-track behaviour.

#### Investigation flow summary

```text
Phase 1: Wide-area scan (ZoomedOut, search legs)
    │
    ▼  tree row or trench detected
Phase 2: fly along feature (ZoomedIn, medium zoom)
    │
    ▼  car entrance / tracks / caponier / trash / vehicle detected
Phase 3: loiter over point (ZoomedIn, max zoom), classify
    │
    ▼  confirmed military target
Target Engagement takes over (orbit + track + report)
```

### Additional Rules for Fixed-Wing Surveillance UAV

#### Already in the tree above

1. **Boundary geofence enforcement** — continuous, highest priority.
2. **Dynamic battery RTB** — position-aware energy calculation.
3. **Lost-link protocol** — predefined safe route after comms timeout.
4. **No-fly zone avoidance** — hard constraint, same tier as boundary.
5. **Weather/wind abort** — return if wind exceeds platform limits.
6. **Emergency health monitor** — critical failure triggers immediate RTB.

#### Additional considerations

7. **Airspace deconfliction** — if ADS-B or transponder data shows traffic, execute avoidance manoeuvre before resuming mission.
8. **Sensor confidence gating** — only promote a detection to target tracking when classification confidence exceeds a configurable threshold, reducing false positives.
9. **Target lock timeout** — if a tracked target is lost from the sensor for T seconds, downgrade and return to search instead of orbiting indefinitely.
10. **Energy-aware search pattern** — as battery depletes, shrink search sectors toward the exit point direction so RTB distance remains short.
11. **Minimum altitude floor** — fixed-wing must maintain safe AGL altitude; the tree should prevent descent commands that violate this.
12. **Stall speed protection** — if airspeed drops near stall threshold (e.g., strong headwind + slow manoeuvre), override current action and increase throttle / reduce bank angle.
13. **Camera gimbal limits** — if the target moves beyond gimbal range, reposition the aircraft rather than losing tracking.
14. **Duplicate target suppression** — if a target at location X was already reported and confirmed, do not re-enter full investigation; mark as known and continue search.
15. **Mission time limit** — even if battery allows, enforce maximum mission duration for operational reasons (crew rotation, replanning windows).
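Rule 2 (dynamic battery RTB) reduces to the inequality under Safety — Battery RTB. The sketch below evaluates it with a deliberately simple, hypothetical energy model: constant cruise power, ground speed taken as airspeed minus headwind, climb energy from potential energy over an assumed efficiency, and a fixed per-turn penalty. `Airframe`, `energy_to_exit_wh`, `must_rtb`, and every coefficient are assumptions, not platform data.

```python
from dataclasses import dataclass

G_MS2 = 9.81  # gravitational acceleration


@dataclass(frozen=True)
class Airframe:
    """Hypothetical return-leg energy coefficients."""
    cruise_power_w: float    # electrical power in level cruise
    airspeed_ms: float       # commanded airspeed on the return leg
    mass_kg: float
    climb_efficiency: float  # fraction of battery energy converted to altitude
    turn_penalty_wh: float   # extra energy per planned turn


def energy_to_exit_wh(distance_m: float, headwind_ms: float, climb_m: float,
                      turns: int, af: Airframe) -> float:
    ground_speed_ms = af.airspeed_ms - headwind_ms
    if ground_speed_ms <= 0.0:
        return float("inf")  # no progress toward the exit is possible
    cruise_wh = af.cruise_power_w * (distance_m / ground_speed_ms) / 3600.0
    climb_wh = max(climb_m, 0.0) * af.mass_kg * G_MS2 / af.climb_efficiency / 3600.0
    return cruise_wh + climb_wh + turns * af.turn_penalty_wh


def must_rtb(battery_remaining_wh: float, distance_m: float, headwind_ms: float,
             climb_m: float, turns: int, af: Airframe,
             contingency_wh: float, landing_wh: float) -> bool:
    """battery_remaining_wh <= energy_to_exit_wh + reserve_wh, re-evaluated each tick."""
    reserve_wh = contingency_wh + landing_wh
    needed_wh = energy_to_exit_wh(distance_m, headwind_ms, climb_m, turns, af)
    return battery_remaining_wh <= needed_wh + reserve_wh
```

Under this model a 10 m/s headwind on a 20 m/s airframe doubles the flight time and therefore the cruise term, so the same battery level can flip `must_rtb` to true without the UAV moving, which is why the threshold is recomputed every tick.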
### YAML representation ```yaml tree_id: uav_surveillance_v2 version: 2 root: root_selector tick_rate_hz: 10 blackboard_schema: battery_remaining_wh: float energy_to_exit_wh: float reserve_wh: float inside_boundary: bool comms_active: bool comms_lost_duration_sec: float wind_speed_ms: float critical_failure: bool approaching_nfz: bool target_artillery_detected: bool target_tank_detected: bool target_vehicle_detected: bool tree_row_detected: bool trench_detected: bool car_entrance_detected: bool car_or_truck_tracks_detected: bool caponier_detected: bool trash_detected: bool military_vehicle_detected: bool truck_detected: bool car_detected: bool current_altitude_agl_m: float airspeed_ms: float parameters: comms_lost_timeout_sec: 10 target_lock_timeout_sec: 15 max_wind_speed_ms: 18 min_altitude_agl_m: 80 stall_speed_ms: 22 classification_confidence_threshold: 0.75 nodes: root_selector: type: Selector children: [safety_selector, target_engagement_selector, search_sequence] # --- Safety --- safety_selector: type: Selector children: - boundary_recovery_seq - low_battery_seq - lost_link_seq - nfz_seq - weather_seq - emergency_seq boundary_recovery_seq: type: Sequence children: [cond_outside_boundary, act_recover_to_boundary] low_battery_seq: type: Sequence children: [cond_low_battery, act_return_to_exit] lost_link_seq: type: Sequence children: [cond_comms_lost, act_lost_link_route] nfz_seq: type: Sequence children: [cond_approaching_nfz, act_divert_nfz] weather_seq: type: Sequence children: [cond_high_wind, act_return_to_exit] emergency_seq: type: Sequence children: [cond_critical_failure, act_emergency_rtb] # --- Target Engagement --- target_engagement_selector: type: Selector children: [artillery_seq, tank_seq, vehicle_seq] artillery_seq: type: Sequence children: [cond_artillery, act_classify_artillery, act_track_artillery] tank_seq: type: Sequence children: [cond_tank, act_classify_tank, act_track_tank] vehicle_seq: type: Sequence children: [cond_vehicle, 
act_classify_vehicle, act_track_vehicle] # --- Search Pattern --- search_sequence: type: Sequence children: [act_fly_search_legs, area_investigation_selector] area_investigation_selector: type: Selector children: [tree_row_seq, trench_seq, act_continue_next_leg] tree_row_seq: type: Sequence children: [cond_tree_row, subtree_investigate_tree_row] trench_seq: type: Sequence children: [cond_trench, subtree_investigate_trench] # subtree_investigate_trench mirrors subtree_investigate_tree_row structure # (adjust_zoom → fly_along_trench → feature_scan_selector → detailed_inspection) # --- Tree Row Investigation (Phase 2 — ZoomedIn, medium) --- subtree_investigate_tree_row: type: Sequence children: - act_zoom_medium - act_fly_along_tree_row - tree_row_feature_selector tree_row_feature_selector: type: Selector children: - car_entrance_seq - tracks_seq - caponier_seq - trash_seq - mil_vehicle_seq - truck_seq - car_seq - act_mark_clear car_entrance_seq: type: Sequence children: [cond_car_entrance, subtree_detailed_inspection] tracks_seq: type: Sequence children: [cond_tracks, subtree_detailed_inspection] caponier_seq: type: Sequence children: [cond_caponier, subtree_detailed_inspection] trash_seq: type: Sequence children: [cond_trash, subtree_detailed_inspection] mil_vehicle_seq: type: Sequence children: [cond_mil_vehicle, subtree_detailed_inspection] truck_seq: type: Sequence children: [cond_truck_in_row, subtree_detailed_inspection] car_seq: type: Sequence children: [cond_car_in_row, subtree_detailed_inspection] # --- Detailed Inspection (Level 3) --- subtree_detailed_inspection: type: Sequence children: - act_zoom_max - act_loiter_over_poi - act_run_hires_classifier - vehicle_class_selector - act_capture_snapshot - act_transmit_report vehicle_class_selector: type: Selector children: - high_priority_seq - medium_priority_seq - low_priority_seq - act_log_evidence high_priority_seq: type: Sequence children: [cond_is_tank_or_artillery, act_flag_high_priority] 
medium_priority_seq:
  type: Sequence
  children: [cond_is_truck, act_flag_medium_priority]
low_priority_seq:
  type: Sequence
  children: [cond_is_car, act_flag_low_priority]

# --- Conditions ---
cond_outside_boundary:
  type: Condition
  eval: "not inside_boundary"
cond_low_battery:
  type: Condition
  eval: "battery_remaining_wh <= (energy_to_exit_wh + reserve_wh)"
cond_comms_lost:
  type: Condition
  eval: "comms_lost_duration_sec > comms_lost_timeout_sec"
cond_approaching_nfz:
  type: Condition
  eval: "approaching_nfz"
cond_high_wind:
  type: Condition
  eval: "wind_speed_ms > max_wind_speed_ms"
cond_critical_failure:
  type: Condition
  eval: "critical_failure"
cond_artillery:
  type: Condition
  eval: "target_artillery_detected"
cond_tank:
  type: Condition
  eval: "target_tank_detected"
cond_vehicle:
  type: Condition
  eval: "target_vehicle_detected"
cond_tree_row:
  type: Condition
  eval: "tree_row_detected"
cond_trench:
  type: Condition
  eval: "trench_detected"
cond_car_entrance:
  type: Condition
  eval: "car_entrance_detected"
cond_tracks:
  type: Condition
  eval: "car_or_truck_tracks_detected"
cond_caponier:
  type: Condition
  eval: "caponier_detected"
cond_trash:
  type: Condition
  eval: "trash_detected"
cond_mil_vehicle:
  type: Condition
  eval: "military_vehicle_detected"
cond_truck_in_row:
  type: Condition
  eval: "truck_detected"
cond_car_in_row:
  type: Condition
  eval: "car_detected"
cond_is_tank_or_artillery:
  type: Condition
  eval: "classified_type in ['tank', 'artillery']"
cond_is_truck:
  type: Condition
  eval: "classified_type == 'truck'"
cond_is_car:
  type: Condition
  eval: "classified_type == 'car'"

# --- Actions ---
act_recover_to_boundary:
  type: Action
  call: recover_to_closest_boundary_point
act_return_to_exit:
  type: Action
  call: return_to_exit_point
act_lost_link_route:
  type: Action
  call: execute_lost_link_route
act_divert_nfz:
  type: Action
  call: divert_around_nfz
act_emergency_rtb:
  type: Action
  call: emergency_rtb
act_classify_artillery:
  type: Action
  call: classify_confirm_artillery
act_track_artillery:
  type: Action
  call: orbit_and_track
  params: { target_type: artillery }
act_classify_tank:
  type: Action
  call: classify_confirm_tank
act_track_tank:
  type: Action
  call: orbit_and_track
  params: { target_type: tank }
act_classify_vehicle:
  type: Action
  call: classify_confirm_vehicle
act_track_vehicle:
  type: Action
  call: orbit_and_track
  params: { target_type: vehicle }
act_fly_search_legs:
  type: Action
  call: fly_search_legs
act_continue_next_leg:
  type: Action
  call: continue_to_next_search_leg
act_zoom_medium:
  type: Action
  call: adjust_zoom_or_altitude
  params: { level: medium }
act_fly_along_tree_row:
  type: Action
  call: fly_along_feature
act_mark_clear:
  type: Action
  call: mark_area_clear_resume_search
act_zoom_max:
  type: Action
  call: adjust_zoom_or_altitude
  params: { level: max }
act_loiter_over_poi:
  type: Action
  call: loiter_over_point_of_interest
act_run_hires_classifier:
  type: Action
  call: run_high_resolution_classifier
act_flag_high_priority:
  type: Action
  call: flag_target
  params: { priority: high }
act_flag_medium_priority:
  type: Action
  call: flag_target
  params: { priority: medium }
act_flag_low_priority:
  type: Action
  call: flag_target
  params: { priority: low }
act_log_evidence:
  type: Action
  call: log_evidence_and_resume
act_capture_snapshot:
  type: Action
  call: capture_snapshot_and_coordinates
act_transmit_report:
  type: Action
  call: transmit_report
```

### How the tick cycle plays out — example scenarios

#### Scenario A: Normal search, nothing found

```text
Tick 1: Safety → all pass → Target → none → Search → fly leg 3 → no features → continue
Tick 2: Safety → all pass → Target → none → Search → fly leg 3 → no features → continue
...
```

#### Scenario B: Tree row detected, tracks found, confirmed tank

```text
Tick 40: Safety ✓ → Target: none → Search: tree_row_detected=true → zoom medium → fly along tree row
Tick 41: Safety ✓ → Target: none → Search: car_or_truck_tracks_detected=true → zoom max → loiter → run classifier → classified_type=tank → flag_high_priority → writes target_tank_detected to blackboard
Tick 42: Safety ✓ → Target: target_tank_detected=true → classify_confirm_tank → orbit_and_track (tank)
```

The tree-row investigation escalates into target tracking through the blackboard: the Search subtree writes `target_tank_detected` at tick 41, and the higher-priority Target subtree picks it up at tick 42.

#### Scenario C: Battery drops during tracking

```text
Tick 100: Safety: battery_remaining ≤ energy_to_exit + reserve → TRUE → return_to_exit_point (overrides active tank tracking)
```

Safety always wins: the UAV breaks off tracking and returns to the exit point.

#### Scenario D: Boundary breach during search

```text
Tick 55: Safety: outside_boundary=true → recover_to_closest_boundary_point
Tick 56: Safety: outside_boundary=true → still recovering (Running)
Tick 57: Safety: inside_boundary=true → pass → resume mission
```

### Error scenarios

- **Stale blackboard data** — if the perception layer fails to update the `*_detected` flags, conditions read stale values. `scan_controller` must stamp every blackboard write with a freshness timestamp and treat a flag as `false` once it is older than a configurable TTL.
- **Action returning `Running` indefinitely** — long-running actions are wrapped in a timeout decorator (`target_lock_timeout_sec`, `comms_lost_timeout_sec`, etc.). On timeout the decorator returns `Failure`, and the tree falls through to the next lower-priority subtree.
- **Conflicting safety triggers** — the Safety subtree is a Selector, so only the highest-priority active branch runs. Lower-priority conditions are re-evaluated on the next tick, so preemption is implicit.
- **Concurrency violation between Tier 2 and VLM** — `scan_controller` enforces sequential GPU use; any concurrent invocation is a logic bug and must raise an alarm.
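The first two error scenarios can be made concrete with a small sketch. It assumes the caller passes monotonic timestamps in milliseconds; the names `Blackboard`, `FlagEntry`, `Timeout`, and `Status` are hypothetical and not taken from the `scan_controller` codebase. Only the behaviour (stale flags read as `false`, a stuck `Running` becomes `Failure`) comes from the spec above.

```rust
use std::collections::HashMap;

/// Result of ticking a behaviour-tree node.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Status {
    Success,
    Failure,
    Running,
}

/// A boolean blackboard flag stamped with the monotonic time (ms) of its last write.
#[derive(Clone, Copy, Debug)]
struct FlagEntry {
    value: bool,
    written_at_ms: u64,
}

/// Hypothetical blackboard: every write carries a freshness timestamp,
/// and entries older than `ttl_ms` are read as `false`.
struct Blackboard {
    flags: HashMap<String, FlagEntry>,
    ttl_ms: u64,
}

impl Blackboard {
    fn new(ttl_ms: u64) -> Self {
        Blackboard { flags: HashMap::new(), ttl_ms }
    }

    fn write(&mut self, key: &str, value: bool, now_ms: u64) {
        self.flags
            .insert(key.to_string(), FlagEntry { value, written_at_ms: now_ms });
    }

    /// Missing or stale flags read as `false`.
    fn read(&self, key: &str, now_ms: u64) -> bool {
        match self.flags.get(key) {
            Some(e) if now_ms.saturating_sub(e.written_at_ms) <= self.ttl_ms => e.value,
            _ => false,
        }
    }
}

/// Timeout decorator: turns a child that stays `Running` for longer than
/// `limit_ms` into `Failure`, so the tree can fall through to a lower-priority subtree.
struct Timeout {
    limit_ms: u64,
    started_ms: Option<u64>,
}

impl Timeout {
    fn new(limit_ms: u64) -> Self {
        Timeout { limit_ms, started_ms: None }
    }

    /// `child` is the status the wrapped action returned on this tick.
    fn tick(&mut self, child: Status, now_ms: u64) -> Status {
        match child {
            Status::Running => {
                let start = *self.started_ms.get_or_insert(now_ms);
                if now_ms.saturating_sub(start) > self.limit_ms {
                    // Reset so a later re-entry starts a fresh timeout window.
                    self.started_ms = None;
                    Status::Failure
                } else {
                    Status::Running
                }
            }
            done => {
                self.started_ms = None;
                done
            }
        }
    }
}
```

Passing `now_ms` explicitly instead of reading the clock inside these types keeps the sketch deterministic; a real tick loop would read the monotonic clock once per tick and hand the same value to every node.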
### Data-flow table

| From → To | Payload | Channel | Lifecycle |
|---|---|---|---|
| perception layer (F1/F2/F3) → scan_controller | bboxes, motion candidates, Tier 2 evidence, `VlmAssessment` | in-process | per-tick |
| scan_controller ↔ blackboard | typed state (battery, geofence, comms, target/feature flags) | in-process | per-tick |
| scan_controller → gimbal_controller | pan / tilt / zoom commands | in-process | per-action |
| scan_controller → mission_executor | route hints / middle-waypoint requests | in-process | per-decision |
| scan_controller → operator_bridge | POI surface / target-follow start-stop | in-process | per-event |

---

## F5. Operator round trip

**Entry**: two parallel sub-flows: (a) the always-on data plane (camera + telemetry stream), which runs through `telemetry_stream`, and (b) the POI surface + operator response loop, which runs through `operator_bridge`.

**Narrative**:

1. **Always-on stream** (a). `telemetry_stream` continuously pushes the camera feed and telemetry (UAV position, gimbal state, bbox overlay metadata) to the Ground Station API over the modem link. This stream is **not** detection-gated: the operator always sees the live feed, even when no POI is queued.
2. **POI surface** (b). When `scan_controller` decides to surface a POI, `operator_bridge` packages it (class group, confidence, MGRS coordinate, snapshot URL, evidence including the optional `VlmAssessment`) and pushes it to the Ground Station as a typed event.
3. The Ground Station renders the POI in the operator UI alongside the live overlay. The operator chooses one of: `confirm` → target-follow / middle-waypoint, `decline` → ignored-items, `start-follow` / `release-follow`, or no action (timeout).
4. **Operator timeout scales with confidence**: linearly from 30 s at 40 % confidence to 120 s at 100 % (e.g. 70 % → 75 s). On timeout the POI is forgotten (not added to ignored-items). On decline the POI is appended to ignored-items via `mapobjects_store` (see F7) so the same scene does not re-surface.
5. **Confirm path** branches on intent:
   - **Confirm as target** → `scan_controller` enters target-follow mode (the gimbal keeps the target within the central 25 % of the frame); see F6 for middle-waypoint propagation.
   - **Start-follow / release-follow** → `scan_controller` toggles target-follow; the UAV continues its mission per F6.
6. The operator response travels back to autopilot over the modem; `operator_bridge` validates the payload (POI ID match, type-safe action enum) before forwarding it to `scan_controller`.

**Sequence diagram**:

```mermaid
sequenceDiagram
    participant SC as scan_controller
    participant OB as operator_bridge
    participant TS as telemetry_stream
    participant GS as Ground Station API
    participant OP as Operator browser
    participant MO as mapobjects_store
    par always-on stream
        TS->>GS: camera + telemetry + bbox overlay (modem)
        GS-->>OP: live feed render
    and POI round trip
        SC->>OB: surface POI (class, MGRS, conf, evidence)
        OB->>GS: typed POI event
        GS-->>OP: POI prompt
        alt operator confirms
            OP->>GS: confirm | start-follow | release-follow
            GS->>OB: response
            OB->>SC: action (validated)
        else operator declines
            OP->>GS: decline
            GS->>OB: response
            OB->>MO: append to ignored-items (F7)
            OB->>SC: declined
        else timeout (confidence-scaled)
            SC->>SC: forget POI (no ignored-items append)
        end
    end
```

**Error scenarios**:

- **Modem outage** — `telemetry_stream` buffers a bounded window of the most recent stream data; on reconnect the live feed resumes from the current frame. Backlog beyond the buffer is dropped (a live feed is worth more to the operator than completeness). The health surface reflects the outage.
- **POI response with mismatched POI ID** — `operator_bridge` rejects and logs it; no state change.
- **Repeat-decline of the same scene** — ignored-items deduplicates by `MGRS + class`; subsequent identical declines are no-ops.
- **Operator no-response** — the confidence-scaled timeout fires and the POI is forgotten (not declined). This distinguishes "operator chose to ignore" (decline) from "operator never saw it" (timeout).
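A minimal sketch of three F5 rules: the confidence-scaled timeout (30 s at 40 %, 120 s at 100 %, linear in between), the `operator_bridge` response check (POI ID match plus a type-safe action enum), and ignored-items deduplication by `MGRS + class`. The function and type names are hypothetical, the action strings are taken from the sequence diagram, and clamping confidence into [0.40, 1.00] is an assumption the text does not state. The sketch stores only the dedup key; a real ignored-items entry also carries `decline_time` and `operator_id` (F7).

```rust
use std::collections::HashSet;

/// Operator timeout in seconds, linear from 30 s at 0.40 to 120 s at 1.00.
/// Clamping out-of-range confidence is an assumption of this sketch.
fn operator_timeout_secs(confidence: f64) -> f64 {
    let c = confidence.clamp(0.40, 1.00);
    30.0 + (c - 0.40) / 0.60 * 90.0
}

/// Type-safe action enum mirroring the wire strings in the F5 diagram.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum OperatorAction {
    Confirm,
    Decline,
    StartFollow,
    ReleaseFollow,
}

#[derive(Debug, PartialEq, Eq)]
enum ResponseError {
    PoiIdMismatch { expected: u64, got: u64 },
    UnknownAction(String),
}

/// Hypothetical operator_bridge check: the POI ID must match the pending POI
/// and the action must parse into the enum; anything else is rejected.
fn validate_response(
    pending_poi_id: u64,
    poi_id: u64,
    action: &str,
) -> Result<OperatorAction, ResponseError> {
    if poi_id != pending_poi_id {
        return Err(ResponseError::PoiIdMismatch { expected: pending_poi_id, got: poi_id });
    }
    match action {
        "confirm" => Ok(OperatorAction::Confirm),
        "decline" => Ok(OperatorAction::Decline),
        "start-follow" => Ok(OperatorAction::StartFollow),
        "release-follow" => Ok(OperatorAction::ReleaseFollow),
        other => Err(ResponseError::UnknownAction(other.to_string())),
    }
}

/// Ignored-items keyed by (MGRS, class); repeat declines are no-ops.
#[derive(Default)]
struct IgnoredItems {
    keys: HashSet<(String, String)>,
}

impl IgnoredItems {
    /// Returns `true` only when the (MGRS, class) pair was not already present.
    fn append(&mut self, mgrs: &str, class: &str) -> bool {
        self.keys.insert((mgrs.to_string(), class.to_string()))
    }
}
```

Rejecting unknown action strings at the boundary means `scan_controller` only ever sees the four enum variants, which is the "type-safe action enum" property step 6 asks for.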
**Data-flow table**:

| From → To | Payload | Channel | Lifecycle |
|---|---|---|---|
| telemetry_stream → Ground Station | camera + telemetry + overlay | modem stream | continuous |
| operator_bridge → Ground Station | typed POI events | modem (event channel) | per-POI |
| Ground Station → operator_bridge | operator response | modem (response channel) | per-response |
| operator_bridge → scan_controller | validated action | in-process | per-response |
| operator_bridge → mapobjects_store | ignored-items append | in-process | per-decline |

---

## F6. Mission lifecycle

**Entry**: `mission_client` pulls a mission from the external `missions` API at process start (and on a configurable refresh interval).

**Narrative**:

1. `mission_client` issues `GET /missions/{id}` (or the equivalent canonical endpoint per `../_docs/02_missions.md`). It receives a `Mission` payload conforming to the shared `mission-schema` artefact (Mission / Waypoint / Vehicle).
2. `mission_executor` receives the mission and selects the variant: **multirotor** or **fixed-wing**. The variant exclusively owns its state table (per `architecture.md §7.7`); the base coordinator contains no variant-specific concepts.
3. `mission_executor` translates the `MissionItem`s into `MissionWaypoint`s usable by `mavlink_layer` (the wire-level MAVLink contract). The translation contract is documented in `data_model.md §Rewrite Entities > MissionItem vs MissionWaypoint`.
4. `mavlink_layer` uploads the mission to the autopilot (ArduPilot/PX4) via the hand-rolled MAVLink subset (~10–15 commands). Arming and takeoff use variant-specific paths:
   - **Multirotor** — `arm` → `takeoff` → `start_mission`.
   - **Fixed-wing (plane)** — `upload_mission` → `wait_for_AUTO_mode` → `start_mission` (arming and takeoff happen via RC AUTO mode in ArduPilot).
5. **Middle-waypoint POST on confirm**. When the operator confirms a POI as a target (F5), `mission_executor` asks `mission_client` to POST a middle-waypoint insert to the missions API; on acceptance, `mission_executor` updates the local mission and re-sends the affected `MissionWaypoint`s to ArduPilot via `mavlink_layer`.
6. On the `mission_finished` callback from ArduPilot, `mission_executor` issues the variant-appropriate landing sequence.

**Sequence diagram**:

```mermaid
sequenceDiagram
    participant MC as mission_client
    participant MIS as missions API
    participant ME as mission_executor (multirotor | fixed-wing)
    participant ML as mavlink_layer
    participant AP as ArduPilot / PX4
    MC->>MIS: GET /missions/{id}
    MIS-->>MC: Mission payload
    MC->>ME: deliver Mission
    ME->>ME: translate to MissionWaypoint[]
    ME->>ML: upload_mission
    ML->>AP: MAVLink upload
    alt multirotor
        ME->>ML: arm / takeoff
        ML->>AP: MAVLink arm / takeoff
    else fixed-wing
        ME->>ML: wait for AUTO mode
        AP-->>ML: AUTO mode active
    end
    ME->>ML: start_mission
    ML->>AP: MAVLink start
    AP-->>ML: telemetry, mission progress
    Note over ME: F5 confirm → middle-waypoint POST
    ME->>MC: insert waypoint
    MC->>MIS: POST middle-waypoint
    MIS-->>MC: ack
    MC->>ME: confirmed
    ME->>ML: re-send affected waypoints
    AP-->>ML: mission_finished
    ML-->>ME: finished
    ME->>ML: land
```

**Error scenarios**:

- **`missions` API unreachable** at startup → `mission_client` retries with bounded backoff and surfaces health degradation; `mission_executor` is not started until a mission is loaded. There is no silent default-mission fallback.
- **Mission schema mismatch** → `mission_client` rejects the payload with a structured error and refuses to start.
- **MAVLink upload partial failure** → `mavlink_layer` returns a typed error; `mission_executor` retries per its variant policy or escalates.
- **Middle-waypoint POST rejected** by the missions API (validation, conflict) → `mission_executor` does NOT modify the local mission and surfaces the error to the operator via `operator_bridge`. Target-follow is independent of the POST and continues regardless.
- **`mission_finished` never arrives** → a bounded watchdog fires; the variant policy decides whether to RTB or land in place.

**Data-flow table**:

| From → To | Payload | Channel | Lifecycle |
|---|---|---|---|
| missions API ↔ mission_client | `Mission` payload (mission-schema) | HTTP REST | per-mission load + per-confirm POST |
| mission_client → mission_executor | `Mission` | in-process | per-load |
| mission_executor → mavlink_layer | `MissionWaypoint[]`, control commands | in-process | per-upload + per-event |
| mavlink_layer ↔ ArduPilot | MAVLink v2 | UDP / serial | streaming |
| ArduPilot → mission_executor | telemetry + `mission_finished` | callback chain | per-event |

---

## F7. MapObjects + ignored-items

**Entry**: triggered by one of: (a) a new detection from F1, (b) an operator decline from F5, or (c) a region-sweep-complete signal from `scan_controller`.

**Narrative**:

1. **On each new detection** (gps, class, confidence, size):
   - Compute the H3 cell index at the chosen resolution (default `res 10`, ≈ 66 m average edge).
   - Build the composite key `H3_cell + class`.
   - Query `grid_disk(H3_cell, k=2)` to fetch all neighbouring cells (this handles the H3 cell-boundary discontinuity).
   - For each neighbouring cell, look up objects in the same **class group** (configurable: e.g. `{military_vehicle, tank, artillery}` collapse together).
   - Decision:
     - Match within `distance_threshold` (default 50 m) **and** position delta < `move_threshold` (default 10 m) → `EXISTING` (no update).
     - Match within `distance_threshold` **and** position delta ≥ `move_threshold` → `MOVED` (update position + last_seen).
     - No match → `NEW` (insert with H3 hash + MGRS key).
2. **On full region-sweep complete**, `mapobjects_store` diffs the previously known set against the set re-observed in the scanned cells. Known entries that were not re-observed become `REMOVED` candidates and are surfaced (typically to the operator for confirmation rather than auto-purged, to avoid losing real but missed objects).
3. **On operator decline of a POI** (from F5): append `(MGRS, class, decline_time, operator_id)` to the **ignored-items** list. `scan_controller` consults ignored-items before promoting any future detection that hits the same `MGRS + class` key, so declined scenes do not re-surface.
4. The 30 km broad-radius pre-filter runs at a coarser H3 resolution (e.g. `res 4`, ≈ 22 km edge) before the fine-grained k-ring query.

**Sequence diagram**:

```mermaid
sequenceDiagram
    participant SC as scan_controller
    participant DC as detection_client
    participant MO as mapobjects_store
    DC->>SC: bbox (class, conf, geometry, GPS)
    SC->>MO: classify(detection)
    MO->>MO: compute H3 cell
    MO->>MO: grid_disk(k=2) lookup, class-group filter
    alt match within distance_threshold AND delta < move_threshold
        MO-->>SC: EXISTING (no update)
    else match within distance_threshold AND delta ≥ move_threshold
        MO->>MO: update position, last_seen
        MO-->>SC: MOVED
    else no match
        MO->>MO: insert (H3 + MGRS + class)
        MO-->>SC: NEW
    end
    Note over SC,MO: at region-end:
    SC->>MO: region_sweep_complete(scanned_cells)
    MO->>MO: diff observed vs known
    MO-->>SC: REMOVED candidates
    Note over SC,MO: on operator decline (F5):
    SC->>MO: ignored_items_append(MGRS, class)
```

**Error scenarios**:

- **Stale GPS / inaccurate MGRS** → `mapobjects_store` may classify an object that already exists as `NEW` (or vice versa). This is mitigated by the k-ring widening and the configurable thresholds; persistent mismatches show up as duplicate-detection bursts and indicate a GPS-quality issue rather than a map bug.
- **Class confusion between similar primitives** (e.g. `tree_block` vs `tree_row`) → the class-group configuration determines whether they collapse. Misconfiguration shows up as ping-pong between `MOVED` and `NEW` for the same scene.
- **H3 cell-boundary discontinuity** → already mitigated by the k-ring query.
- **Region-sweep `REMOVED` over-trigger** when the operator re-routes mid-region → the diff considers only cells that were fully scanned; partially scanned cells are excluded from `REMOVED` candidates.
- **Ignored-items unbounded growth** → bounded by a configurable retention policy (e.g. expire entries older than the mission, or scope them per mission). Auto-purging by hard count is out of MVP scope.

**Data-flow table**:

| From → To | Payload | Channel | Lifecycle |
|---|---|---|---|
| scan_controller → mapobjects_store | detection (gps, class, conf, size) | in-process | per-detection |
| mapobjects_store → scan_controller | classification (`EXISTING` / `MOVED` / `NEW` / `REMOVED`) | in-process | per-detection / per-region-end |
| scan_controller → mapobjects_store | `region_sweep_complete(scanned_cells)` | in-process | per-region-end |
| operator_bridge → mapobjects_store | ignored-items append (MGRS, class) | in-process | per-decline |

---

## F8. MapObjects sync (central DB, mission-bracketing)

**Entry**: this flow has two trigger points: pre-flight (after `mission_client` fetches the mission and before BIT can complete) and post-flight (after landing, RTL, or mission abort).

**Narrative**:

1. **Pre-flight pull** (a). After `GET /missions/{id}` succeeds, `mission_client` issues `GET /missions/{id}/mapobjects` against the same `missions` API. The response is the central map state for the mission's bounding box (per `architecture.md §7.13`).
2. `mission_client` hands the response to `mapobjects_store`, which hydrates `current_state` (keyed by `(h3_cell, class_group)`) and `pending_ignored` (any union-merged ignored items from prior missions in the same area).
3. `mapobjects_store` reports `sync_state = synced` (or `cached_fallback` if the central API was unreachable and the operator acknowledged continuing on cache, or `degraded` if the cache was stale beyond the configurable freshness window).
4. **In-flight** (b). `mapobjects_store` is authoritative; every NEW / MOVED / EXISTING / REMOVED-CANDIDATE classification from F7 and every IgnoredItem append from F5 is written to `pending_observations` / `pending_ignored` with `pending_upload = true`. **No central writes during flight** (Frozen choice 6: batched only for MVP).
5. **Post-flight push** (c). After `mission_executor` reaches a terminal state (landed, RTL completed, or aborted), it triggers `mission_client` to `POST /missions/{id}/mapobjects` with the full pass diff and `POST /missions/{id}/mapobjects/ignored` with any new declines.
6. The central `missions` API merges per its conflict-resolution rules (§7.13: append-only observation log + computed current view). Acknowledgement clears `pending_upload`.
7. **Push-failure persistence**. If the central API is unreachable post-flight, the pending diff is kept on disk and a bounded retry runs on a timer. After the maximum number of retries, a warning is surfaced to the operator; the data is preserved for manual replay.
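One possible reading of the pre-flight `sync_state` rules and the per-endpoint post-flight retry, sketched in Rust. The types, the function names, and the `freshness_window_s` parameter are hypothetical; treating "unreachable with no local cache" as `degraded` is an assumption, since the text does not cover that case. Per F9, `degraded` fails BIT.

```rust
/// Outcome of `GET /missions/{id}/mapobjects` (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum PullOutcome {
    Ok,
    Unreachable,      // network error or timeout
    ClientError(u16), // 4xx: mission ID / auth problem
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum SyncState {
    Synced,
    CachedFallback,
    Degraded,
}

/// Reasons BIT fails outright without reaching a sync_state.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum BitFail {
    OperatorAborted,
    Rejected(u16),
}

/// `cache_age_s` is `None` when no local cache exists.
fn pre_flight_sync_state(
    outcome: PullOutcome,
    cache_age_s: Option<u64>,
    freshness_window_s: u64,
    operator_ack: bool,
) -> Result<SyncState, BitFail> {
    match outcome {
        PullOutcome::Ok => Ok(SyncState::Synced),
        PullOutcome::ClientError(code) => Err(BitFail::Rejected(code)),
        PullOutcome::Unreachable => match cache_age_s {
            // Assumption: no cache at all is treated like a stale cache.
            None => Ok(SyncState::Degraded),
            Some(age) if age > freshness_window_s => Ok(SyncState::Degraded),
            Some(_) if operator_ack => Ok(SyncState::CachedFallback),
            Some(_) => Err(BitFail::OperatorAborted),
        },
    }
}

/// Post-flight: each endpoint keeps its own pending flag, so a partial
/// success is never rolled back and only the failed POST is retried.
#[derive(Debug)]
struct PendingUpload {
    mapobjects: bool,
    ignored: bool,
}

impl PendingUpload {
    fn new() -> Self {
        PendingUpload { mapobjects: true, ignored: true }
    }

    /// Clears the flag of every endpoint that acknowledged; failures stay pending.
    fn record(&mut self, mapobjects_acked: bool, ignored_acked: bool) {
        if mapobjects_acked {
            self.mapobjects = false;
        }
        if ignored_acked {
            self.ignored = false;
        }
    }

    fn is_done(&self) -> bool {
        !self.mapobjects && !self.ignored
    }
}
```

Keeping one flag per endpoint is what makes "independent retry per endpoint; do not roll back the successful one" fall out naturally: a retry timer only re-POSTs whichever flag is still set.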
**Sequence diagram**:

```mermaid
sequenceDiagram
    participant ME as mission_executor
    participant MC as mission_client
    participant MIS as missions API (central)
    participant MO as mapobjects_store
    participant SC as scan_controller
    participant OP as operator (via operator_bridge)
    Note over ME,MO: pre-flight (after mission GET)
    MC->>MIS: GET /missions/{id}/mapobjects
    alt 200 OK
        MIS-->>MC: central map state
        MC->>MO: hydrate
        MO-->>SC: sync_state = synced
    else unreachable / timeout
        MC->>OP: surface BIT degradation
        alt operator acknowledges cached fallback
            MO-->>SC: sync_state = cached_fallback
        else operator aborts
            ME->>ME: BIT fail, do not arm
        end
    else 4xx
        MC->>OP: surface error (mission ID / auth)
        ME->>ME: BIT fail, do not arm
    end
    Note over MO: in-flight: pending_observations + pending_ignored grow
    Note over MO: NO central writes during flight
    Note over ME,MO: post-flight (terminal state reached)
    ME->>MC: trigger upload
    MC->>MIS: POST /missions/{id}/mapobjects (pass diff)
    MC->>MIS: POST /missions/{id}/mapobjects/ignored (declines)
    alt 200 OK
        MIS-->>MC: ack
        MC->>MO: clear pending_upload
        MO-->>SC: sync_state = synced
    else unreachable / 5xx
        MC->>MC: persist pending diff on disk
        MC->>MC: bounded retry (timer)
        MC->>OP: surface warning after max retries
    else 4xx
        MC->>OP: surface rejection (full payload logged)
    end
```

**Error scenarios**:

- **Pre-flight pull unreachable** → BIT degrades; the operator must acknowledge the cached fallback or abort. Never silent.
- **Only a stale local cache is available** → surface the cache freshness in the operator-acknowledgement prompt.
- **In-flight crash** before the post-flight push → on next boot, `mapobjects_store` finds non-empty `pending_observations` for a mission that has terminated; `mission_client` runs the post-flight push at startup, before BIT completes for any new mission.
- **Post-flight push partial success** (mapobjects 200 but ignored 5xx, or vice versa) → independent retry per endpoint; do not roll back the successful one.
- **Mission deleted centrally between pre-flight pull and post-flight push** (`DELETE /missions/{id}` cascade applied while the UAV was airborne) → the post-flight POST returns 404; the on-device pending diff is logged as orphaned and retained for forensic review (the operator decides whether to discard it).
- **Conflict at the central store** (two UAVs reporting incompatible state) → not surfaced to autopilot as an error; the central API resolves it per the §7.13 conflict rules and returns 200 regardless.

**Data-flow table**:

| From → To | Payload | Channel | Lifecycle |
|---|---|---|---|
| mission_client → missions API | `GET /missions/{id}/mapobjects` | HTTP REST | once per mission, pre-flight |
| missions API → mission_client | central map state (mapobjects + ignored) | HTTP REST | once per mission, pre-flight |
| mapobjects_store → scan_controller | `sync_state` (synced / cached_fallback / degraded) | in-process | per-event |
| scan_controller → mapobjects_store | NEW / MOVED / EXISTING / REMOVED-CANDIDATE / IgnoredItem | in-process | per-detection / per-decline (in-flight) |
| mission_client → missions API | `POST /missions/{id}/mapobjects` (pass diff) | HTTP REST | once per mission, post-flight (with retry) |
| mission_client → missions API | `POST /missions/{id}/mapobjects/ignored` | HTTP REST | once per mission, post-flight (with retry) |

---

## F9. Pre-flight self-test (BIT)

**Entry**: `mission_executor` enters the BIT phase after `mission_client` completes the pre-flight pull (F8), before `ARMED` (multirotor) or `WAIT_AUTO` (fixed-wing).

**Narrative**:

1. `mission_executor` evaluates a fixed checklist, one item per dependency. Each item resolves to one of three results: `OK`, `DEGRADED` (the operator may acknowledge it to continue), or `FAIL` (must be resolved; BIT cannot pass).
2. **Checklist items** (in evaluation order):
   - GPS lock (with the configured minimum satellite count and accuracy).
   - Camera RTSP healthy (frames flowing within timeout; resolution + framerate as configured).
   - Gimbal homed (yaw / pitch / zoom feedback within tolerance of the last commanded values).
   - `../detections` reachable + warmed (at least one round-trip frame succeeded).
   - VLM warm if `vlm_enabled` (at least one structured `VlmAssessment` returned during warmup).
   - `mission_client` mission loaded + schema-validated.
   - `mapobjects_store` pre-flight pull complete (`synced`, or `cached_fallback` after operator acknowledgement; `degraded` is FAIL).
   - Persistent-store free space ≥ configured floor.
   - Wall clock bound to GPS or NTP within tolerance.
   - MAVLink heartbeat + airframe health (battery, sensor health) within thresholds.
3. The aggregate status flows to `operator_bridge` → Ground Station → operator UI as a structured BIT report. The operator may acknowledge `DEGRADED` items individually (each acknowledgement is recorded with operator ID + timestamp).
4. When every item is `OK` or an acknowledged `DEGRADED`, `mission_executor` transitions to `ARMED` / `WAIT_AUTO`. On any `FAIL`, the transition is blocked until the operator resolves it.

**Sequence diagram**:

```mermaid
sequenceDiagram
    participant ME as mission_executor
    participant H as health aggregator
    participant DEPS as dependencies (frame_ingest, gimbal, mavlink, ../detections, vlm, mapobjects, mission_client, ...)
    participant OB as operator_bridge
    participant OP as operator UI
    ME->>H: run_BIT()
    H->>DEPS: snapshot health + functional probes
    DEPS-->>H: per-dep status
    H-->>ME: BIT report (item × {OK | DEGRADED | FAIL})
    ME->>OB: surface BIT report
    OB->>OP: BIT prompt
    alt all OK
        ME->>ME: transition to ARMED / WAIT_AUTO
    else some DEGRADED
        OP->>OB: acknowledge degraded items (signed)
        OB->>ME: acknowledgement
        ME->>ME: transition to ARMED / WAIT_AUTO
    else any FAIL
        ME->>ME: hold, no transition
    end
```

**Error scenarios**:

- **BIT report contains a FAIL** → no transition; the operator must investigate.
- **Operator attempts to acknowledge a `FAIL` item** → `operator_bridge` validation only accepts acknowledgements for items reported as `DEGRADED`; any such request is rejected.
- **Health flips during BIT** → BIT is re-run from the start; partial acknowledgements are invalidated.

---

## F10. Lost-link failsafe ladder

**Entry**: `mission_executor` continuously evaluates the operator/Ground-Station modem link against the ladder defined in `architecture.md §7.7`.

**Narrative**:

1. Every tick, `mission_executor` reads `last_operator_heartbeat_ts` and computes the current rung: `LinkOk`, `LinkDegraded`, `LinkLost`, or `LinkLostInFollow`.
2. **`LinkOk`** (≤ 5 s since the last heartbeat) — no behavioural change; the mission continues as planned.
3. **`LinkDegraded`** (more than 5 s and at most 30 s since the last heartbeat) — surface health → yellow; **queue all POI surface-events for replay on recovery** (do not drop them; the operator may still see them on reconnect within the grace window).
4. **`LinkLost`** (> 30 s since the last heartbeat, target-follow inactive) — trigger RTL via `MAV_CMD_NAV_RETURN_TO_LAUNCH`; log the mission abort with its reason; keep logging the mission diff to `mapobjects_store` so the post-flight push (F8) can succeed when the link recovers (or eventually, after landing, over cellular/Wi-Fi at the home base).
5. **`LinkLostInFollow`** (> 30 s since the last heartbeat, target-follow active) — extend the grace period by 30 s (the operator may have lost the link momentarily during a confirmed engagement); when the grace period expires, fall through to `LinkLost`.
6. **MAVLink link loss to ArduPilot/PX4** is a separate, more severe event: `mission_executor` cannot command the airframe at all. Health flips to red and the airframe's own MAVLink failsafe (configured in ArduPilot/PX4) takes over. We do NOT override the airframe failsafe.
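The rung selection in steps 2–5 reduces to a pure function of the monotonic time since the last operator heartbeat. The sketch below assumes the default thresholds above (5 s, 30 s) and reads the 30 s follow grace as measured from the same heartbeat, so follow mode holds the UAV until 60 s; `link_rung`, `should_command_rtl`, and the constants are hypothetical names.

```rust
use std::time::Duration;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum LinkRung {
    LinkOk,
    LinkDegraded,
    LinkLost,
    LinkLostInFollow,
}

// Default thresholds from the narrative above.
const OK_MAX: Duration = Duration::from_secs(5);
const DEGRADED_MAX: Duration = Duration::from_secs(30);
const FOLLOW_GRACE: Duration = Duration::from_secs(30);

/// `since_heartbeat` must come from a monotonic clock, e.g.
/// `Instant::now().duration_since(last_heartbeat)`, never from wall-clock time.
fn link_rung(since_heartbeat: Duration, in_target_follow: bool) -> LinkRung {
    if since_heartbeat <= OK_MAX {
        LinkRung::LinkOk
    } else if since_heartbeat <= DEGRADED_MAX {
        LinkRung::LinkDegraded
    } else if in_target_follow && since_heartbeat <= DEGRADED_MAX + FOLLOW_GRACE {
        // Grace extension during a confirmed engagement.
        LinkRung::LinkLostInFollow
    } else {
        // No follow, or follow grace expired: fall through to LinkLost.
        LinkRung::LinkLost
    }
}

/// In this sketch only `LinkLost` commands RTL (MAV_CMD_NAV_RETURN_TO_LAUNCH).
fn should_command_rtl(rung: LinkRung) -> bool {
    rung == LinkRung::LinkLost
}
```

Keeping the rung a pure function of `Duration` makes the ladder trivially testable and enforces the "monotonic timing only" rule from the error scenarios: wall-clock drift cannot move a rung boundary.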
**Sequence diagram**:

```mermaid
sequenceDiagram
    participant TICK as tick (10 Hz)
    participant ME as mission_executor
    participant ML as mavlink_layer
    participant AP as ArduPilot / PX4
    participant OB as operator_bridge
    participant SC as scan_controller
    loop every tick
        TICK->>ME: tick
        ME->>OB: read last_operator_heartbeat_ts
        alt LinkOk (≤ 5 s)
            ME->>ME: continue mission
        else LinkDegraded (5–30 s)
            ME->>ME: surface health → yellow
            ME->>OB: queue POI surface-events
        else LinkLost (> 30 s, no follow)
            ME->>ML: MAV_CMD_NAV_RETURN_TO_LAUNCH
            ML->>AP: send
            AP-->>ML: ack
            ME->>SC: notify mission abort (reason: lost_link)
        else LinkLostInFollow (> 30 s, in follow)
            ME->>ME: extend grace 30 s
            alt grace expires
                ME->>ML: MAV_CMD_NAV_RETURN_TO_LAUNCH
                ME->>SC: exit target-follow, mission abort
            end
        end
    end
    Note over ME,AP: MAVLink link loss is separate
    alt MAVLink heartbeat lost > timeout
        ME->>ME: health → red
        Note over AP: ArduPilot's own failsafe takes over
    end
```

**Error scenarios**:

- **Heartbeat clock skew** → governed by the wall-clock policy in `architecture.md §7.3 Reliability and safety`. Drift > 200 ms surfaces health yellow; the lost-link ladder uses monotonic timing only.
- **Operator acks a POI during `LinkDegraded`** → the ack is processed normally on receipt, and the queue replays any unsent events. Sequence numbers prevent reordering.
- **RTL refused by ArduPilot** (mode lock, geofence anomaly) → bounded retry; if RTL stays blocked, escalate to land-now. Health → red.
- **Operator deliberately suppresses lost-link RTL** (signed override) → permitted only via a signed command (Q9); recorded in the audit log with operator ID and rationale.

---

## Cross-flow notes

- **Single Rust process, single BT.** All ten flows run inside the same autopilot binary. Cross-flow state transitions go through `scan_controller`'s behaviour tree (F4); there is no per-flow private state machine that can drift out of sync with the others.
- **Evidence → decision → action.** F1 (frames + Tier 1), F2 (movement candidates at zoom-out and zoom-in), and F3 (VLM) produce evidence. F4 consumes the evidence and decides. F5 (operator round trip), F6 (mission lifecycle), F7 (MapObjects + ignored items in-flight), F8 (MapObjects sync at mission boundaries), F9 (pre-flight BIT), and F10 (lost-link ladder) are the action-side consequences of F4 decisions and the surrounding mission lifecycle.
- **POI lifecycle is end-to-end.** A POI is born in F1/F2, optionally scored by F3, queued and surfaced through F4 → F5, and ends in F6 (middle-waypoint insert on confirm), F7 (ignored-item append on decline), or simple expiry (timeout). F8 ensures the F7 outcomes survive across missions (central observation log + ignored-items merge). The behaviour tree in F4 is the single owner of the POI's transitions.
- **Telemetry plane is parallel, not serial.** `telemetry_stream` (referenced from every other flow) runs continuously and is independent of detection state. The operator always has the live feed; F5 only adds POI-specific overlays on top.
- **Hard cap is enforced once.** The ≤5 POIs/min operator-review cap lives only in F4. Other flows produce as many candidates as they can; F4 decides which of them ever surface to F5.
- **Mission lifecycle bookends.** F9 (pre-flight BIT) gates entry into the operational state machine. The F8 pre-flight pull is a BIT input; the F8 post-flight push is part of post-mission cleanup (whether the mission completed normally, was RTL'd by F10, or was aborted by the operator). The integrity of a mission's data in the central DB depends on F8 eventually succeeding; even for crashed UAVs, the on-device pending diff is durable, so a recovered airframe can replay it.
- **Movement detection is dual-zoom in MVP.** F2 covers both zoom-out (well-validated, classical OpenCV) and zoom-in (benchmark-gated; see Q14). Zoom-in enlarges the set of POIs the system can produce; the ≤5 POIs/min cap in F4 absorbs the additional load by deprioritising candidates rather than dropping them.
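The single-owner cap can be sketched as a sliding-window limiter in front of a priority queue: at most five POIs surface in any 60 s window, and anything beyond that waits in the queue instead of being discarded. The sliding (rather than fixed) window, the integer priority, and the names `PoiGate` and `Candidate` are assumptions for illustration, not the `scan_controller` design.

```rust
use std::cmp::Ordering;
use std::collections::{BinaryHeap, VecDeque};

/// A POI candidate; higher `priority` surfaces first, ties go to the lower (older) id.
#[derive(Debug, PartialEq, Eq)]
struct Candidate {
    priority: u32,
    poi_id: u64,
}

impl Ord for Candidate {
    fn cmp(&self, other: &Self) -> Ordering {
        self.priority
            .cmp(&other.priority)
            .then_with(|| other.poi_id.cmp(&self.poi_id))
    }
}

impl PartialOrd for Candidate {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

/// Hypothetical gate: at most `max_per_window` surfaces in any sliding
/// `window_ms`; everything else stays queued (deprioritised, not dropped).
struct PoiGate {
    max_per_window: usize,
    window_ms: u64,
    surfaced_at: VecDeque<u64>,
    queue: BinaryHeap<Candidate>,
}

impl PoiGate {
    fn new(max_per_window: usize, window_ms: u64) -> Self {
        PoiGate {
            max_per_window,
            window_ms,
            surfaced_at: VecDeque::new(),
            queue: BinaryHeap::new(),
        }
    }

    fn offer(&mut self, c: Candidate) {
        self.queue.push(c);
    }

    /// Pops the highest-priority candidate if the window still has room.
    fn try_surface(&mut self, now_ms: u64) -> Option<Candidate> {
        // Forget surfaces that have aged out of the sliding window.
        while let Some(&t) = self.surfaced_at.front() {
            if now_ms.saturating_sub(t) >= self.window_ms {
                self.surfaced_at.pop_front();
            } else {
                break;
            }
        }
        if self.surfaced_at.len() >= self.max_per_window {
            return None;
        }
        let c = self.queue.pop()?;
        self.surfaced_at.push_back(now_ms);
        Some(c)
    }

    fn queued(&self) -> usize {
        self.queue.len()
    }
}
```

For example, `PoiGate::new(5, 60_000)` fed seven candidates at once surfaces the five highest-priority ones immediately and holds the other two until the oldest surface ages out of the window.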