autopilot/_docs/02_document/architecture.md
Oleksandr Bezdieniezhnykh bc40ea7300 [AZ-626] Decompose complete: 47 tasks + docs + module layout
Greenfield Steps 1-6 baseline for the autopilot rewrite from legacy
Qt/C++ to a Rust workspace.

- Remove legacy Qt/C++ tree (ai_controller, drone_controller,
  misc/camera, python_scaffold, root Dockerfile, autopilot.pro,
  legacy main.py / requirements.txt).
- Add _docs/00_problem (problem, restrictions, acceptance criteria,
  security approach, input data + fixtures).
- Add _docs/01_solution/solution_draft01.
- Add _docs/02_document (architecture, system-flows, data_model,
  glossary, decision-rationale, deployment, 13 component descriptions,
  tests/ specs, FINAL_report, module-layout).
- Add _docs/02_tasks/todo with 47 task specs (AZ-640..AZ-686, one
  bootstrap + 46 component tasks) and _dependencies_table.md.
- Add .cursor/rules/artifact-srp.mdc (single-responsibility rule for
  canonical _docs artifacts).
- Track autodev state in _docs/_autodev_state.md (Step 6 completed,
  ready for Step 7 Implement).

Jira: bootstrap AZ-626; component epics AZ-627..AZ-639; tasks
AZ-640..AZ-686. Total complexity 173 points across 12 epics.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-19 11:02:01 +03:00


autopilot — Architecture

Status: forward-looking design (Rust). The implementation is in flight; the system described here is the target architecture, not what runs today. Confirmed by user 2026-05-17.

Synopsis

autopilot is the onboard mission executor for a reconnaissance winged UAV. It runs as a single Rust process on an aarch64 Jetson Orin Nano edge device. It pulls a mission from the external missions API, controls the UAV through a hand-rolled MAVLink layer (~10–15 commands; no third-party SDK), drives a ViewPro A40 gimbal in a two-level scan-and-zoom loop (zoom-out wide sweep + zoom-in on POI), streams camera frames + telemetry continuously over modem to an external Ground Station API so the operator watches in a browser, and uses bi-directional gRPC to delegate primitive object detection to the external ../detections API. Semantic-vision reasoning (Tier 2 ROI analysis + an optional local VLM), a POI scheduler with an operator-review rate cap, and a target-follow mode after operator confirmation all run inside autopilot. The dominant pattern is a deterministic typed state machine (zoom-out / zoom-in / target-follow) coordinating a small set of async actors.


1. System Context

Autopilot integrates with six external systems. The local VLM is optional (benchmark-gated); everything else is mandatory.

flowchart LR
    cam["ViewPro A40<br/>RTSP camera + gimbal"]
    det["../detections<br/>Tier 1 YOLO service"]
    vlm["NanoLLM VILA1.5-3B<br/>(optional, local IPC)"]
    miss["missions API"]
    gs["Ground Station<br/>operator UI"]
    ap["ArduPilot / PX4"]
    autopilot["autopilot<br/>onboard mission + scan + perception"]
    cam <-->|RTSP frames / UDP gimbal control| autopilot
    autopilot <-->|bidir gRPC| det
    autopilot <-.->|Unix-domain socket IPC| vlm
    autopilot <-->|REST GET / POST| miss
    autopilot <-->|stream over modem| gs
    autopilot <-->|MAVLink v2| ap

Per-edge protocol details:

| Edge | Protocol | Direction | Purpose |
| --- | --- | --- | --- |
| ViewPro A40 (camera) | RTSP/RTP over TCP/UDP | inbound | live H.264/265 1080p video to frame_ingest. |
| ViewPro A40 (gimbal) | UDP, vendor control protocol | bidirectional | yaw / pitch / zoom commands + status; driven by gimbal_controller. |
| ../detections | bi-directional gRPC | bidirectional | frames out, bounding boxes back; driven by detection_client. |
| NanoLLM VILA1.5-3B | Unix-domain socket IPC (peer-cred check) | bidirectional | bounded ROI + short prompt → structured VlmAssessment; optional. |
| missions API | HTTPS REST (GET / POST) | bidirectional | mission pull on start; middle-waypoint POST on operator confirmation; MapObjects pre-flight pull + post-flight push (/missions/{id}/mapobjects, see §7.13). |
| Ground Station API | continuous push over modem (protocol per ../_docs/04_system_design_clarifications.md) | bidirectional | always-on camera feed + telemetry + bbox overlay; operator confirm / decline / target-follow. |
| ArduPilot / PX4 | MAVLink v2 over UDP or serial | bidirectional | the small command surface in §7.7. |

2. Component Layering

Three internal layers (Perception → Decision + Memory → Action) plus an always-on Telemetry plane that runs parallel to the decision loop.

flowchart TB
    subgraph autopilot ["autopilot"]
        subgraph perception ["Perception (data plane in)"]
            fi[frame_ingest]
            dc[detection_client]
            md[movement_detector]
            sa[semantic_analyzer]
            vc["vlm_client (opt)"]
        end
        subgraph brain ["Decision + Memory"]
            sc[scan_controller]
            mo[mapobjects_store]
        end
        subgraph action ["Action (data plane out)"]
            gc[gimbal_controller]
            ob[operator_bridge]
            me[mission_executor]
            ml[mavlink_layer]
            msc[mission_client]
        end
        subgraph tplane ["Telemetry plane (always-on, parallel)"]
            ts[telemetry_stream]
        end
    end
    perception ==>|"inputs (bboxes, motion, Tier 2, VlmAssessment)"| brain
    brain ==>|"commands + POI updates + middle-waypoint hints"| action
    perception -.->|"frames + bboxes"| tplane
    action -.->|"telemetry"| tplane

Per-flow component-to-component sequence diagrams live in system-flows.md.


3. Components

| Component | Layer | Responsibility |
| --- | --- | --- |
| frame_ingest | Perception | Pull RTSP from ViewPro A40; decode; timestamp; hand frames to detection_client, movement_detector, and telemetry_stream (zero-copy where possible). |
| detection_client | Perception | Bi-directional gRPC to ../detections; streams frames out, receives bounding boxes back; same bboxes are reused for Tier 2 ROI selection and for operator overlay. Versioned against the ../_docs/03_detections.md contract. |
| movement_detector | Perception | Active in both zoom-out and zoom-in levels (skipped only during target-follow). OpenCV optical-flow / global-motion estimation fused with timestamped gimbal angle, zoom state, and UAV motion telemetry. Emits residual-motion clusters as POI candidates. Ego-motion compensation is mandatory; naive frame-differencing is rejected. Zoom-in adequacy of classical CV is benchmark-gated (see §7.6 Movement detector and Open Question Q14). |
| semantic_analyzer | Perception | Tier 2. Primitive graph + lightweight ROI CNN over zoom-in crops. Owns path-freshness scoring, endpoint scoring, branch choice at intersections, and concealment-POI scoring. |
| vlm_client | Perception (optional) | Local-IPC client to a NanoLLM/VILA1.5-3B process. Validates ROI payload size/format, calls the VLM with a bounded crop and short prompt, validates the response against a structured VlmAssessment schema. No cloud egress. Optional behind a vlm_enabled flag and a feature module (see §7.6 Local VLM Confirmation). |
| scan_controller | Decision + Memory | Central deterministic typed state machine with states ZoomedOut, ZoomedIn, TargetFollow. Owns the POI queue, timeouts, ≤5 POIs/min cap, confidence-scaled operator-decision window, and gimbal-command issuance. Full behaviour-tree spec in system-flows.md §F4. |
| mapobjects_store | Decision + Memory | On-device H3-indexed map of detected objects + ignored-items list. Pre-flight pull of the mission-area map from the central missions API; in-flight on-device authoritative; post-flight push of the mission diff back to central. Computes new / moved / existing / removed diffs across passes (§7.10, §7.11, §7.12). Read/written directly by scan_controller; sync pulls/pushes are handled via mission_client. |
| gimbal_controller | Action | ViewPro A40 control protocol (yaw / pitch / zoom). Honours ≤2 s zoom transition budget and ≤500 ms decision-to-movement latency. Owns the smooth-pan path-tracking primitive used in zoom-in level. |
| operator_bridge | Action | Surfaces POIs and target-follow lifecycle events through telemetry_stream to the Ground Station; receives confirm / decline / target-follow start-release back. On decline, appends an IgnoredItem via mapobjects_store. On confirm, hands a middle-waypoint hint to mission_executor. |
| mission_executor | Action | Multirotor and fixed-wing variants of the platform state machine: takeoff / climb / cruise / land for multirotor; upload-and-await-AUTO for fixed-wing. Owns geofence enforcement (both INCLUSION and EXCLUSION). Issues MAVLink commands through mavlink_layer; consumes mission_client mission state. Inserts middle waypoints on operator-confirmed targets. |
| mavlink_layer | Action | Hand-rolled MAVLink v2 transport (UDP or serial) implementing only the ~10–15 commands this codebase needs. See §7.7 for the command surface. No third-party SDK. |
| mission_client | Action | Pulls mission JSON from the missions API on start; validates against mission-schema; handles mid-flight middle-waypoint inserts (POST). Survives transient connection loss with bounded retry. |
| telemetry_stream | Telemetry plane | Continuous push of camera frames + flight telemetry + bbox overlay to the Ground Station API over modem. Always-on; not detection-gated. Carries operator commands (confirm / decline / target-follow start-release) on the return path. |

The system is intentionally a small set of well-named components rather than 30+ files. Everything in frame_ingest, detection_client, movement_detector, semantic_analyzer, and vlm_client runs on the input data plane — no UAV control, no operator surface. Everything in gimbal_controller, mission_executor, mavlink_layer, mission_client, and operator_bridge runs on the output control plane — UAV motion + operator interaction. scan_controller and mapobjects_store are the brain in between. telemetry_stream is parallel; it never sits in the decision path.

Per-component design specs (purpose, inputs, outputs, state, failure modes, NFRs) live in components/<name>/description.md.
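
The scan_controller state machine named in the table above can be sketched in Rust as follows. This is a minimal illustration, not the behaviour tree from system-flows.md §F4: the `Roi` layout, the `ScanEvent` variants, the `next_state` function, and the choice to return to `ZoomedOut` after a follow ends are all assumptions introduced here. Only the three state names and their fields (`roi`, `hold_started_at`, `target_id`, `started_at`) come from this document.

```rust
use std::time::Instant;

/// Region of interest in normalised image coordinates (assumed layout).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Roi {
    pub x: f32,
    pub y: f32,
    pub w: f32,
    pub h: f32,
}

/// The three scan states; all per-state data lives inside the variant,
/// so there are no side-channel booleans.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ScanState {
    ZoomedOut,
    ZoomedIn { roi: Roi, hold_started_at: Instant },
    TargetFollow { target_id: u64, started_at: Instant },
}

/// Hypothetical event set; the real inputs are specified in system-flows.md.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ScanEvent {
    PoiPopped { roi: Roi },
    HoldFinished, // analysis done or hold timeout
    OperatorConfirmed { target_id: u64 },
    FollowEnded, // operator release or tracking lost
}

/// Pure transition function: same (state, event, now) always gives the same result.
pub fn next_state(state: ScanState, event: ScanEvent, now: Instant) -> ScanState {
    use ScanEvent::*;
    use ScanState::*;
    match (state, event) {
        (ZoomedOut, PoiPopped { roi }) => ZoomedIn { roi, hold_started_at: now },
        (ZoomedIn { .. }, HoldFinished) => ZoomedOut,
        (ZoomedOut | ZoomedIn { .. }, OperatorConfirmed { target_id }) => {
            TargetFollow { target_id, started_at: now }
        }
        // Assumption: the sweep resumes at zoom-out once a follow ends.
        (TargetFollow { .. }, FollowEnded) => ZoomedOut,
        // Any other pairing is ignored and leaves the state untouched.
        (unchanged, _) => unchanged,
    }
}
```

Keeping the transition function free of I/O means the same table of transitions can be replayed in tests against recorded event sequences.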


4. Major Data Flows

  1. Frame pipeline. ViewPro A40 RTSP → frame_ingest → detection_client (bi-dir gRPC to ../detections) → bboxes back → movement_detector (active at both zoom-out and zoom-in; residual-motion clusters) → scan_controller POI queue. The same bboxes also flow into telemetry_stream for operator overlay. (system-flows.md §F1)
  2. Zoom-in + confirmation. scan_controller pops a POI → gimbal_controller zooms ViewPro A40 → semantic_analyzer runs Tier 2 over the ROI → optionally vlm_client runs Tier 3 → scan_controller decides. Movement candidates emerging during the zoom-in hold are still consumed (subject to telemetry-skew tolerance and the per-zoom-band thresholds). (system-flows.md §F2, §F3)
  3. Operator round trip. telemetry_stream pushes camera + telemetry + bbox overlay → Ground Station → operator browser → confirm / decline / target-follow start-release → modem → operator_bridge → mapobjects_store (decline) or mission_executor (confirm) or scan_controller (target-follow). Always-on, not detection-gated. Operator commands are authenticated, signed, and replay-protected (§5; scheme TBD per Q9). (system-flows.md §F5)
  4. Mission lifecycle. mission_client pulls from missions API → mission_executor issues MAVLink waypoints via mavlink_layer → gimbal_controller runs the zoom-out sweep along the route. On operator confirmation, mission_executor inserts a middle waypoint and resumes after target-follow ends. (system-flows.md §F6)
  5. MapObjects + ignored items. New detections compute an H3 cell, query the k-ring of neighbours, classify as new / moved / existing / removed (§7.12), and check for an IgnoredItem match before surfacing to the operator. (system-flows.md §F7)
  6. MapObjects sync (mission-bracketing). Pre-flight: mission_client pulls the last-known map state for the mission area from the missions API and hydrates mapobjects_store. Post-flight: mission_client pushes the mission's full pass diff (NEW / MOVED / REMOVED / CONFIRMED-EXISTING) back. In-flight sync is batched only for MVP — no streaming over modem (§7.13; system-flows.md §F8).

5. Architectural Principles / Non-Negotiables

  • Detection-as-a-service. Primitive (Tier 1) detection lives in ../detections, not in autopilot. Autopilot owns Tier 2 (semantic) and Tier 3 (VLM, optional) only.
  • Hand-rolled MAVLink. No third-party SDK. The MAVLink command surface is small enough to hand-implement; eliminates the largest current dependency-risk item.
  • Deterministic typed state machine for scan control. States are ZoomedOut | ZoomedIn { roi, hold_started_at } | TargetFollow { target_id, started_at }. No ad-hoc booleans, no shared mutable flags. The full behaviour-tree spec lives in system-flows.md §F4.
  • Ego-motion compensation is mandatory for movement detection. Naive frame-differencing is rejected outright. Movement detection runs at both zoom-out and zoom-in (skipped only during target-follow); zoom-in adequacy of classical CV is benchmark-gated (§7.6, Q14).
  • Operator workload cap of ≤5 POIs/minute is hard, not soft. scan_controller enforces it.
  • Operator timeout scales with confidence — 40 % → 30 s, 100 % → 120 s, linear; below 40 % the target is not surfaced. Timeout = forget; decline = IgnoredItem entry.
  • Operator commands are authenticated, signed, and replay-protected. Modem-link encryption alone is not sufficient — every confirm / decline / target-follow / abort command MUST carry a session-bound, replay-resistant signature that operator_bridge validates before dispatch. Exact scheme TBD (§8 Q9).
  • Local VLM with structured VlmAssessment schema. Free-form VLM text is not a downstream API. No cloud egress.
  • Always-on camera + telemetry stream to Ground Station is part of the mission contract — operator always sees the live feed, not just on detection.
  • Lost-link failsafe is explicit. Loss of the operator/Ground-Station modem link triggers a typed failsafe ladder in mission_executor (§7.7). The ladder is deterministic; default action is RTL after a configured grace window.
  • Pre-flight self-test (BIT) gates takeoff. Every dependency listed in §5 plus mission load + MapObjects pre-flight pull (cached fallback acknowledged) must pass before mission_executor enters ARMED (multirotor) or WAIT_AUTO (fixed-wing). Health endpoint distinguishes pre-flight vs in-flight readiness.
  • autopilot and missions are separate repos with a shared mission-schema artefact. The same missions API also hosts the central MapObjects endpoints (§7.13).
  • MapObjects are mission-bracketed and centrally synchronised. Pre-flight pull on start; on-device authoritative in-flight; full pass diff pushed at mission end. The on-device store is a working copy of the central state for the mission's bounding box, not a private database.
  • No silent error swallowing anywhere in the pipeline. Health endpoint reflects every dependency: frame_ingest, detection_client, movement_detector, semantic_analyzer, vlm_client (if enabled), scan_controller, gimbal_controller, mavlink_layer, mission_client, mission_executor, operator_bridge, telemetry_stream, mapobjects_store, plus mapobjects_sync (pre-flight pull / post-flight push status).
  • Geofence enforcement is symmetric. Both INCLUSION and EXCLUSION polygons are honoured. (Earlier C++ behaviour silently ignored EXCLUSION; the rewrite explicitly enforces both.)
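
The confidence-scaled operator timeout in the list above (40 % → 30 s, 100 % → 120 s, linear, nothing surfaced below 40 %) reduces to a single interpolation. The sketch below assumes confidence is expressed as a fraction in [0, 1] and that values above 1.0 are clamped; the function and constant names are hypothetical.

```rust
use std::time::Duration;

/// Below this confidence the target is not surfaced to the operator.
pub const MIN_SURFACED_CONFIDENCE: f64 = 0.40;
const MIN_WINDOW_S: f64 = 30.0;
const MAX_WINDOW_S: f64 = 120.0;

/// Returns the operator-decision window, or `None` when the target
/// should not be surfaced at all (confidence below 40 % or NaN).
pub fn operator_decision_window(confidence: f64) -> Option<Duration> {
    if confidence.is_nan() || confidence < MIN_SURFACED_CONFIDENCE {
        return None;
    }
    // Assumption: confidences above 1.0 are clamped rather than rejected.
    let c = confidence.min(1.0);
    // Linear interpolation: 0.40 maps to 30 s, 1.00 maps to 120 s.
    let t = (c - MIN_SURFACED_CONFIDENCE) / (1.0 - MIN_SURFACED_CONFIDENCE);
    Some(Duration::from_secs_f64(
        MIN_WINDOW_S + t * (MAX_WINDOW_S - MIN_WINDOW_S),
    ))
}
```

Under this mapping a 70 % candidate gets roughly 75 s, the midpoint of the range.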

6. Non-Functional Targets

| Concern | Target | Owner |
| --- | --- | --- |
| Tier 1 latency | ≤100 ms / frame (end-to-end at 1280 px, FP16, batch 1) | ../detections (autopilot's call budget respects it) |
| Tier 2 latency | ≤200 ms / ROI | semantic_analyzer |
| Tier 3 (VLM) latency | ≤5 s / ROI | vlm_client |
| ViewPro A40 zoom transition | ≤2 s (medium → high) | gimbal_controller |
| Decision-to-movement latency | ≤500 ms | gimbal_controller |
| POI rate to operator | ≤5 POIs / min (hard cap) | scan_controller |
| Concealed-position recall | ≥60 % | semantic_analyzer |
| Concealed-position precision | ≥20 % (operators filter) | semantic_analyzer |
| New per-class P / R | ≥80 % | ../detections |
| Footpath detection recall | ≥70 % | semantic_analyzer |
| Movement-candidate enqueue latency | ≤1 s from detection (zoom-out); ≤1.5 s (zoom-in, accommodating gimbal slew) | movement_detector |
| Zoom-out → zoom-in transition | ≤2 s including physical zoom | scan_controller + gimbal_controller |
| Telemetry rate (position) | 1 Hz min, 10 Hz target | mavlink_layer |
| Memory budget (semantic + movement + VLM) | ≤6 GB on Jetson Orin Nano (8 GB total, ~2 GB reserved for YOLO) | system-wide |
| Watchdog / retry on MAVLink failures | bounded retry with exponential backoff; explicit max-retry; health flips to red | mission_executor |
| Operator command → action latency | ≤500 ms operator-click → outbound MAVLink / gimbal command (excludes modem RTT) | operator_bridge + downstream |
| Sustained frame-rate floor | ≥10 fps; below this scan_controller suppresses zoom-in transitions and surfaces health → yellow | frame_ingest + scan_controller |
| MapObjects pre-flight pull | ≤30 s for a 30 km × 30 km mission area; cache-fallback acceptable on timeout | mission_client + mapobjects_store |
| MapObjects post-flight push | ≤2 min for a 60 min mission's pass diff; bounded retry; persisted on disk if push fails | mission_client + mapobjects_store |
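
The "bounded retry with exponential backoff; explicit max-retry; health flips to red" target for MAVLink failures could look roughly like the sketch below. The `RetryPolicy` fields, the `run_with_retry` helper, and the injected `sleep` callback are assumptions made for illustration; the document does not specify base delays or retry counts.

```rust
use std::time::Duration;

/// Hypothetical policy: doubling delay, capped, with an explicit retry limit.
#[derive(Debug, Clone, Copy)]
pub struct RetryPolicy {
    pub base: Duration,
    pub max_delay: Duration,
    pub max_retries: u32,
}

impl RetryPolicy {
    /// Delay before retry number `attempt` (0-based), or `None` once exhausted.
    pub fn delay_for(&self, attempt: u32) -> Option<Duration> {
        if attempt >= self.max_retries {
            return None;
        }
        // base * 2^attempt, saturating instead of overflowing.
        let factor = 1u32.checked_shl(attempt).unwrap_or(u32::MAX);
        Some(self.base.saturating_mul(factor).min(self.max_delay))
    }
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Health {
    Green,
    Red,
}

/// Runs `op` until it succeeds or the policy is exhausted.
/// `sleep` is injected so the loop can be tested without real waiting.
pub fn run_with_retry<T, E>(
    policy: &RetryPolicy,
    mut op: impl FnMut() -> Result<T, E>,
    mut sleep: impl FnMut(Duration),
) -> (Result<T, E>, Health) {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return (Ok(value), Health::Green),
            Err(err) => match policy.delay_for(attempt) {
                Some(delay) => {
                    sleep(delay);
                    attempt += 1;
                }
                // Retries exhausted: surface the error, never swallow it.
                None => return (Err(err), Health::Red),
            },
        }
    }
}
```

With `max_retries = 3` the operation is attempted four times in total (one initial call plus three retries) before health flips to red.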

7. Detailed Design

This section covers the rewrite-time problem narrative, suite-level concerns (mission regions, MapObjects, MGRS sync, new-vs-existing object detection), constraints, acceptance criteria, the chosen solution architecture, the MAVLink command surface, and the tech stack.

7.1 Problem

The reconnaissance winged UAV detects vehicles and military equipment with YOLO, but current high-value targets are camouflaged positions: FPV operator hideouts, hidden artillery emplacements, and dugouts masked by branches. These cannot be found by visual similarity to known object classes alone.

The new approach has three cooperating search engines:

  • Camera sweep — follow the UAV route at wide or light/medium zoom with left-right gimbal movement to cover terrain and queue POIs.
  • Movement detection — runs in both zoom-out and zoom-in levels (skipped only during target-follow). Per-zoom-band thresholds keep false-positive rate below the operator-review cap; classical OpenCV adequacy at zoom-in is benchmark-gated (Q14).
  • Semantic zoom search — detect primitives such as black entrances, branch piles, footpaths, roads, trees, and tree blocks, then reason over scene context to find concealed positions.

The system controls a two-level scan:

  • Zoom-out level (wide-area sweep) — the camera follows the UAV route at wide or light/medium zoom, sweeping left-right across the flight path while detecting primitives, buildings, vehicles, and small motion candidates. Footpath starts, suspicious branch piles, tree rows, movement candidates, and similar POIs are marked with GPS-denied coordinates and queued.
  • Zoom-in level (detailed scan) — the camera zooms into each queued POI or movement candidate for confirmation. It follows detected footpaths from origin to endpoint, keeps paths centered while the UAV moves, follows the freshest or most promising branch at intersections, holds on endpoints for VLM analysis of branch piles, dark entrances, dugouts, vehicles, or people, and slowly pans broader POIs such as tree rows or clearings. Movement detection continues, scaled for the higher pixel-to-metre ratio. After analysis or timeout, it returns to zoom-out and continues the queue or route.

When an operator confirms a target, the system switches to target-follow mode: keep the target centered with gimbal control while the UAV moves, until the operator releases it or tracking is lost.

7.2 Mission Regions and Reconnaissance Flow

Mission directions can be vague. Waypoints define a route that passes through multiple regions:

Start → Point1 → Point2 → Point3 → Point4 → Point5 → Point6 → Finish
                    ╔═══════════════╗
                    ║   Region 1    ║
                    ╚═══════════════╝
         ╔══════════════════╗
         ║    Region 2      ║
         ╚══════════════════╝
  ╔══════════════╗
  ║   Region 3   ║
  ╚══════════════╝

The autopilot decides the route within each region (1, 2, and 3).

Alternative scenario — region-only search. The user selects only a region for the search (no explicit waypoints inside). The autopilot plans its own route within the region.

Start ──┐
        │    ╔═══════════════╗
        ├───►║    Region     ║  (contains Points)
        │    ╚═══════════════╝
Finish◄─┘

Reconnaissance flow. The reconnaissance UAV:

  1. Searches within the region and finds potential targets.
  2. Sends images to the relay (retranslation) UAV.
  3. The relay UAV forwards them to the human operator.
  4. The human operator makes a decision regarding the target using the behaviour-tree-driven scan_controller logic (system-flows.md §F4).

Scanning strategy.

  • Zoom-out level — wide-area scan. Camera points along the UAV route with left-right swing. The detections service continuously recognises specific patterns as POIs. This initial scan runs at medium zoom while moving between targets. POI types: tree rows (potential caponiers, entrances concealed by tree rows); polygons (areas where military vehicles could be hidden); houses with vehicles or traces; roads and routes on snow or terrain, inside the forest, or near houses.
  • Zoom-in level — detailed scan. When the camera finds a POI or movement candidate, it zooms in and performs a detailed scan. During detailed scan it searches for trees, caponiers, military vehicles, and so on. Movement detection continues during the zoom-in hold (subject to the per-zoom-band thresholds) so a moving small target found mid-detail-scan is not lost.

7.3 Restrictions

Hardware and camera.

  • Jetson Orin Nano Super: 67 TOPS INT8, 8 GB shared LPDDR5; YOLO uses ~2 GB RAM, leaving ~6 GB for semantic detection, movement detection, and VLM.
  • All models use FP16 precision (frozen choice: keep FP16-only for all models).
  • Primary camera: ViewPro A40, 1080p (1920×1080), 40× optical zoom, f=4.25–170 mm, Sony 1/2.8" CMOS (IMX462LQR), HDMI or IP output at 1080p 30/60 fps.
  • Alternative camera: ViewPro Z40K at higher cost.
  • Thermal sensor (640×512, NETD ≤50 mK) is available only as a future enhancement, not a core requirement.

Operational.

  • Flight altitude: 600–1000 m.
  • Support all seasons and terrain types: winter snow, spring mud, summer vegetation, autumn; forest, open field, urban edges, and mixed terrain. (Frozen choice: MVP must cover all seasons, not winter-first only.)
  • ViewPro A40 40× optical zoom traversal takes 1–2 s; zoom-out → zoom-in transition must complete within ≤2 s including physical zoom.
  • Movement detection runs at both zoom-out and zoom-in levels, compensates for UAV/gimbal motion, and queues candidates for zoom confirmation; target following starts only after operator confirmation. Per-zoom-band thresholds (cluster persistence, residual-velocity floor, telemetry-skew tolerance) are configurable.

Software.

  • Inference: TensorRT on Jetson, ONNX Runtime fallback, 1280 px model input, tile splitting for large images.
  • VLM must run locally on Jetson with no cloud dependency, as a separate IPC process — not compiled into the autopilot binary.
  • YOLO and VLM inference run sequentially because they share GPU memory; no concurrent execution.

Reliability and safety.

  • Lost-link failsafe is mandatory. Loss of the operator/Ground-Station modem link triggers a deterministic ladder in mission_executor (default RTL after a 30 s grace; configurable per mission). Loss of the airframe MAVLink link itself triggers immediate health → red and degrades to whatever ArduPilot/PX4's own failsafe dictates.
  • Pre-flight self-test (BIT) gates takeoff. GPS lock, camera RTSP healthy, gimbal homed (yaw/pitch/zoom feedback within tolerance), ../detections reachable + warmed, mission loaded + validated, MapObjects pre-flight pull complete (or cached fallback acknowledged with operator confirm), VLM warm (if vlm_enabled), persistent-store space ≥ configured floor.
  • Battery / fuel thresholds enforced. mission_executor triggers RTL at battery ≤ configured RTL-floor (e.g. 25 %); land-now at hard-floor (e.g. 15 %); ignored only on operator override. Surfaces health → yellow / red accordingly. Threshold values are mission-configurable.
  • Sustained frame-rate floor. When the sustained frame rate drops below 10 fps, scan_controller suppresses zoom-in transitions (only Tier 1 + operator overlay continue) and surfaces health → yellow.
  • Wall-clock time source. Monotonic clock is authoritative for telemetry-skew compensation and tick budgets. Wall-clock is bound to GPS time once GPS is locked (preferred) or NTP-set at boot if reachable; both are recorded with clock_source and last_sync_at. Drift > 200 ms surfaces health → yellow.
  • On-device storage is bounded. mapobjects_store retention + log buffer have configured caps; on cap-hit, oldest pre-current-mission data is evicted; persistent-store-full pre-flight is a BIT failure.
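
The battery-threshold rule in the list above can be sketched as a pure decision function. The example percentages (25 % and 15 %) are the ones the text gives as illustrations. The mapping of the RTL floor to yellow and the hard floor to red, the behaviour of the operator override (suppress the action, keep the health colour), and all type names are assumptions made here.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum BatteryAction {
    Continue,
    ReturnToLaunch,
    LandNow,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Health {
    Green,
    Yellow,
    Red,
}

/// Mission-configurable thresholds, in percent of remaining charge.
pub struct BatteryThresholds {
    pub rtl_floor_pct: f32,
    pub hard_floor_pct: f32,
}

/// Decide what mission_executor should do at the current battery level.
/// Assumption: an operator override suppresses the action but not the health signal.
pub fn battery_action(
    level_pct: f32,
    thresholds: &BatteryThresholds,
    operator_override: bool,
) -> (BatteryAction, Health) {
    let (action, health) = if level_pct <= thresholds.hard_floor_pct {
        (BatteryAction::LandNow, Health::Red)
    } else if level_pct <= thresholds.rtl_floor_pct {
        (BatteryAction::ReturnToLaunch, Health::Yellow)
    } else {
        (BatteryAction::Continue, Health::Green)
    };
    if operator_override {
        (BatteryAction::Continue, health)
    } else {
        (action, health)
    }
}
```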

Integration and scope.

  • The ../detections service is FastAPI + Cython + TensorRT in a Docker container on Jetson; consumed via bi-directional gRPC.
  • Consume YOLO boxes with class, confidence, and normalised coordinates; output boxes in the same format for operator display.
  • Movement candidates and confirmed followed targets use the same normalised box format for operator display.
  • GPS coordinates come from the GPS-denied service (../_docs/11_gps_denied.md) and are out of scope for autopilot's own implementation.
  • MapObjects sync uses the central missions API extension /missions/{id}/mapobjects (pre-flight GET, post-flight POST). Schema in §7.13.
  • Annotation tooling, training pipeline, and data-collection automation are separate repositories and out of scope.
  • GPS-denied navigation is a separate project; mission planning and route selection inside a region remain in autopilot.

Frozen choices (2026-05-06, updated 2026-05-18). Gating decisions for downstream design:

  1. Tier 1 remains FP16-only for all models. INT8 is rejected for MVP.
  2. MVP acceptance requires all seasons, not winter-first only.
  3. Operator-review cap is ≤5 POIs/minute (moderate cap chosen).
  4. Movement detection assumes timestamped video, gimbal angle/zoom, and UAV motion telemetry for MVP. Naive frame-differencing is rejected. Movement detection runs at both zoom-out and zoom-in; classical OpenCV adequacy at zoom-in is benchmark-gated (Q14).
  5. Local VLM is required for MVP if and only if the exact model satisfies ≤5 s/ROI and the memory budget; otherwise VLM is disabled for MVP and scan_controller operates without it.
  6. MapObjects are mission-bracketed and centrally synchronised via the missions API. In-flight sync is batched only for MVP (no streaming over modem).
  7. Operator commands are authenticated, signed, and replay-protected. Modem-link encryption alone is not sufficient.

7.4 Acceptance Criteria

Latency.

| Tier | Target | Hardware |
| --- | --- | --- |
| Tier 1 fast probe (YOLO26 + YOLOE-26) | ≤100 ms/frame | Jetson Orin Nano Super |
| Tier 2 fast confirmation (custom CNN) | ≤200 ms/ROI | Jetson Orin Nano Super |
| Tier 3 optional deep analysis (VLM) | ≤5 s/ROI | Jetson Orin Nano Super |

YOLO object detection.

  • Add classes: black entrances of various sizes, branch piles, footpaths, roads, trees, and tree blocks.
  • New classes target: P ≥80 %, R ≥80 %; existing class performance must not degrade.
  • Baseline reference: current YOLO achieves P=81.6 %, R=85.2 % on non-masked objects.

Semantic detection.

  • Initial concealed-position recall: ≥60 %, accepting high false positives for later reduction.
  • Initial concealed-position precision: ≥20 %, with operators filtering candidates.
  • Footpath detection recall: ≥70 %.
  • Pipeline consumes YOLO primitives (footpaths, roads, branch piles, entrances, trees), assesses path freshness, traces paths to endpoints, identifies concealed structures, and follows the freshest or most promising branch at intersections.

Movement detection.

  • During the zoom-out sweep, detect small moving point/cluster candidates that are not yet classifiable and enqueue them for zoom confirmation within 1 s.
  • During the zoom-in hold, continue movement detection (independent residual-motion clustering, scaled for the zoomed pixel-to-metre ratio) so a moving small target appearing inside a held POI is not lost; enqueue within 1.5 s.
  • Account for UAV and gimbal motion: stable objects (trees, houses, roads, terrain) must not be treated as moving only because the camera platform moves.
  • Movement candidates become zoom-in POIs; after zoom, the system attempts semantic / YOLO confirmation as vehicle, people, or other relevant target.
  • Zoom-in adequacy of classical OpenCV optical-flow / global-motion estimation is benchmark-gated. If the false-positive rate at zoom-in exceeds the per-zoom-band budget, fall back to a learned optical-flow / CNN-based motion module behind a feature flag (Q14).
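
The idea behind ego-motion compensation (static scenery must not count as moving just because the camera platform moves) can be shown with a deliberately simplified sketch. It is not the estimator described in this document: it assumes a pure-translation camera model, uses the component-wise median of per-feature flow vectors as the global motion estimate, and ignores gimbal, zoom, and UAV telemetry entirely. `Flow`, `residual_motion_indices`, and `residual_floor_px` are hypothetical names.

```rust
/// Per-feature optical-flow vector in pixels per frame: (dx, dy).
pub type Flow = (f32, f32);

/// Upper median of a non-empty list of values.
fn median(mut values: Vec<f32>) -> f32 {
    values.sort_by(|a, b| a.total_cmp(b));
    values[values.len() / 2]
}

/// Returns the indices of features whose motion differs from the estimated
/// global (camera-induced) motion by more than `residual_floor_px`.
pub fn residual_motion_indices(flows: &[Flow], residual_floor_px: f32) -> Vec<usize> {
    if flows.is_empty() {
        return Vec::new();
    }
    // Assumption: most features belong to static background, so the median
    // flow approximates the apparent motion caused by the platform itself.
    let global_dx = median(flows.iter().map(|f| f.0).collect());
    let global_dy = median(flows.iter().map(|f| f.1).collect());
    flows
        .iter()
        .enumerate()
        .filter(|(_, f)| {
            let rx = f.0 - global_dx;
            let ry = f.1 - global_dy;
            (rx * rx + ry * ry).sqrt() > residual_floor_px
        })
        .map(|(index, _)| index)
        .collect()
}
```

A naive frame-differencing approach would flag every feature in a panning frame; subtracting the global estimate first leaves only the features that move relative to the background.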

Scan and camera control.

  • Zoom-out level covers the planned route with a wide or light/medium-zoom left-right sweep; POIs include footpaths, tree rows, branch piles, black entrances, movement candidates, houses with vehicles or traces, and roads on snow / terrain / forest.
  • Transition zoom-out → zoom-in within 2 s of POI detection, including physical zoom from medium to high.
  • Zoom-in level keeps camera lock while the UAV flies, compensates for aircraft motion, pans along footpaths or movement candidates so they stay visible and centered, holds endpoints for VLM analysis up to 2 s, and returns to zoom-out after analysis or configurable timeout (default 5 s/POI).
  • After operator confirmation, target-follow mode keeps the target in the centre 25 % of frame while visible, until operator release, target loss, or timeout.
  • Gimbal module commands ViewPro A40 pan/tilt/zoom with ≤500 ms decision-to-movement latency, smooth transitions, and footpaths/moving targets kept centered during pan.
  • Maintain an ordered POI queue prioritised by confidence and proximity to current camera position.
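
An ordered POI queue prioritised by confidence and proximity, as required above, can be sketched with a binary heap. The scoring formula (confidence minus a weighted angular distance), the `Poi` fields, and the `PoiQueue` API are assumptions made for illustration. The sketch scores a POI once at insertion, so it does not model aging or re-scoring as the camera moves.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

#[derive(Debug, Clone, PartialEq)]
pub struct Poi {
    pub id: u32,
    pub confidence: f32,
    /// Angular distance from the current camera pointing direction (assumed metric).
    pub angular_distance_deg: f32,
}

/// Heap entry ordered by score only; f32 is compared with total_cmp.
struct Ranked {
    score: f32,
    poi: Poi,
}

impl PartialEq for Ranked {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}
impl Eq for Ranked {}
impl PartialOrd for Ranked {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl Ord for Ranked {
    fn cmp(&self, other: &Self) -> Ordering {
        self.score.total_cmp(&other.score)
    }
}

pub struct PoiQueue {
    heap: BinaryHeap<Ranked>,
    proximity_weight: f32,
}

impl PoiQueue {
    pub fn new(proximity_weight: f32) -> Self {
        Self { heap: BinaryHeap::new(), proximity_weight }
    }

    /// Hypothetical score: higher confidence and closer POIs rank first.
    pub fn push(&mut self, poi: Poi) {
        let score = poi.confidence - self.proximity_weight * poi.angular_distance_deg;
        self.heap.push(Ranked { score, poi });
    }

    /// Pops the highest-scoring POI, if any.
    pub fn pop(&mut self) -> Option<Poi> {
        self.heap.pop().map(|ranked| ranked.poi)
    }
}
```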

Resources and data.

  • Semantic module + movement module + VLM RAM: ≤6 GB on Jetson Orin Nano Super.
  • Must coexist with the running YOLO pipeline without degrading YOLO performance.
  • Training data: hundreds to thousands of annotated images/sequences across all seasons and terrain types.
  • Dedicated annotation needed for black entrances, branch piles, footpaths, roads, trees, and tree blocks; available dataset assembly effort is 1.5 months at 5 hours/day.

7.5 Training Data

Source.

  • Aerial imagery from reconnaissance winged UAVs at 600–1000 m altitude.
  • ViewPro A40 camera, 1080p resolution, various zoom levels.
  • Extracted from video frames and still images.
  • Movement detection requires frame sequences, not still images only; include camera/gimbal telemetry where available to separate target motion from UAV motion.

Target classes.

  • Footpaths / trails (linear features on snow, mud, forest floor).
  • Fresh footpaths (distinct edges, undisturbed surroundings, recent track marks).
  • Stale footpaths (partially covered by snow / vegetation, faded edges).
  • Concealed structures: branch-pile hideouts, dugout entrances, squared / circular openings.
  • Tree rows (potential concealment lines).
  • Open clearings connected to paths (FPV launch points).
  • Moving point/cluster candidates across the full zoom range (wide, light/medium, full zoom-in) — sequences must include both zoom-out and zoom-in examples to support per-zoom-band threshold tuning.

YOLO primitive classes (new).

  • Black entrances to hideouts (various sizes).
  • Piles of tree branches.
  • Footpaths.
  • Roads.
  • Trees, tree blocks.

Annotation format.

  • Managed by existing annotation tooling in a separate repository.
  • Expected: bounding boxes and/or segmentation masks depending on model architecture.
  • Footpaths may require polyline or segmentation annotation rather than bounding boxes.

Seasonal coverage.

  • Winter: snow-covered terrain (footpaths as dark lines on white).
  • Spring: mud season (footpaths as compressed/disturbed soil).
  • Summer: full vegetation (paths through grass/undergrowth).
  • Autumn: mixed leaf cover, partial snow.

Volume.

  • Target: hundreds to thousands of annotated images/sequences.
  • Available effort: 1.5 months at 5 hours/day.
  • Potential for annotation-process automation.

7.6 Solution Architecture

A two-level onboard scan system (zoom-out wide sweep + zoom-in confirmation). The system delegates Tier 1 detection to the existing FastAPI / Cython / TensorRT YOLO service (../detections), adds a central scan/perception scheduler (scan_controller), compensates motion using synchronised video / gimbal / UAV telemetry (movement detection runs at both zoom levels), controls the ViewPro A40 through a deterministic state machine, and invokes a secured local VLM process only for bounded zoom-in confirmation.

Before implementation decomposition, the project must pass a benchmark gate on target hardware: Tier 1 latency, Tier 2 ROI latency, VLM latency / memory, A40 zoom timing, movement-replay false-positive rate, and all-season dataset readiness.

Video frames + timestamped gimbal/zoom/UAV telemetry
        |
        v
Input validation + telemetry synchronisation
        |
        v
Central scan/perception scheduler (scan_controller)
        |
        +---> Existing FastAPI/Cython TensorRT service (../detections)
        |       YOLO26 + YOLOE-26 fixed-class FP16 engines
        |
        +---> Movement detector (active in ZoomedOut and ZoomedIn)
        |       OpenCV ego-motion compensation + residual clusters,
        |       per-zoom-band thresholds; learned-CV fallback Q14
        |
        +---> Tier 2 semantic analyzer
        |       primitive graph + lightweight ROI CNN (zoom-in only)
        |
        v
POI queue (confidence + proximity + aging + <=5 POIs/min cap)
        |
        +---> ViewPro A40 state-machine controller
        |
        +---> Secured local VLM IPC (optional, benchmark-gated)
                NanoLLM VILA1.5-3B, structured VlmAssessment output

Benchmark gate

The first implementation milestone is a proof suite, not product code. It validates:

  • YOLO26 + YOLOE-26 FP16 TensorRT, fixed 1280 px, batch 1, end-to-end ≤100 ms/frame.
  • Tier 2 primitive graph + lightweight CNN ≤200 ms/ROI.
  • NanoLLM VILA1.5-3B local VLM ≤5 s/ROI and within remaining memory budget while the YOLO container is present.
  • ViewPro A40 medium-to-high zoom transition and command-to-movement latency.
  • Movement replay false-positive rate measured independently at zoom-out and zoom-in, under the ≤5 POIs/minute operator-review cap. If zoom-in exceeds the per-zoom-band cap with classical CV, the learned-CV fallback (Q14) becomes a benchmark-gate prerequisite for the zoom-in scope.
  • All-season dataset readiness and hard-negative coverage.

Tier 1 primitive detector

Use custom-trained fixed-class YOLO26 and YOLOE-26 TensorRT FP16 engines, owned by ../detections. Runtime open-vocabulary prompt mutation is not part of MVP; fixed project classes or pre-baked embeddings are required. Outputs remain normalised boxes for operator display, with optional masks or path geometry passed as POI metadata.

Tier 2 semantic analyzer

Use a primitive graph plus a lightweight ROI CNN to reason over paths, branch piles, dark entrances, roads, trees, tree blocks, clearings, vehicles, people, and endpoint context. This layer owns path freshness, endpoint scoring, branch choice at intersections, and concealment-POI scoring. Active in the zoom-in level only.

Movement detector

Active at both zoom-out and zoom-in (skipped only during target-follow). Use OpenCV optical flow / global-motion estimation fused with timestamped gimbal angle, zoom state, and UAV motion telemetry. Naive frame differencing is rejected because it cannot distinguish target motion from platform motion. A telemetry synchronisation contract specifies maximum tolerated frame ↔ gimbal ↔ zoom ↔ UAV timestamp skew before motion compensation; out-of-tolerance samples must be rejected or downgraded.
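The skew contract above can be sketched as a small check that runs before motion compensation. This is a minimal illustration, not the project's implementation: `SampleTimes`, `SyncVerdict`, `check_sync`, and the two-threshold split (`tolerance` for accept, `hard_limit` for downgrade versus reject) are hypothetical names and assumptions; the source only requires that out-of-tolerance samples be rejected or downgraded.

```rust
use std::time::Duration;

/// Timestamps of the samples fused for one frame, measured from a shared
/// monotonic epoch (assumption: all sources are already on one clock).
#[derive(Debug, Clone, Copy)]
pub struct SampleTimes {
    pub frame: Duration,
    pub gimbal: Duration,
    pub zoom: Duration,
    pub uav: Duration,
}

#[derive(Debug, PartialEq, Eq)]
pub enum SyncVerdict {
    Accept,
    Downgrade,
    Reject,
}

/// Largest absolute offset between the frame and any telemetry sample.
pub fn max_skew(t: &SampleTimes) -> Duration {
    [t.gimbal, t.zoom, t.uav]
        .iter()
        // Duration subtraction panics on underflow, so order the operands.
        .map(|s| if *s > t.frame { *s - t.frame } else { t.frame - *s })
        .max()
        .unwrap_or(Duration::ZERO)
}

/// Hypothetical policy: accept within `tolerance`, downgrade up to
/// `hard_limit`, reject beyond it.
pub fn check_sync(t: &SampleTimes, tolerance: Duration, hard_limit: Duration) -> SyncVerdict {
    let skew = max_skew(t);
    if skew <= tolerance {
        SyncVerdict::Accept
    } else if skew <= hard_limit {
        SyncVerdict::Downgrade
    } else {
        SyncVerdict::Reject
    }
}
```

In this sketch the thresholds are plain arguments, so the per-zoom-band configuration described next could pass a different pair for each band.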

Per-zoom-band tuning. Cluster persistence threshold, residual-velocity floor, and telemetry-skew tolerance are configured per zoom band (zoom-out, zoom-in). The pixel-to-metre ratio differs by ~10× between bands, so identical residual pixel motion implies very different physical motion; thresholds must scale.

Adequacy at zoom-in (research item, Q14). Classical optical flow / global-motion estimation is well-validated at zoom-out (UAV cruising, gimbal sweeping, large FOV, ego-motion is the dominant signal and easily fitted). At zoom-in the gimbal is actively path-following, the FOV is narrow, motion blur from any small command is large, and the homography model degrades. The benchmark gate (above) MUST measure the false-positive rate at zoom-in independently from zoom-out; if it exceeds the per-zoom-band cap, the implementation falls back to a learned optical-flow module (e.g. RAFT-derived) or a CNN-based motion-segmentation module behind a feature flag, while keeping the same input/output contract.

Scan controller and POI queue

Use a deterministic typed state machine with ZoomedOut, ZoomedIn { roi, hold_started_at }, and TargetFollow { target_id, started_at } states. The queue is ordered by confidence, proximity, and aging while enforcing the ≤5 POIs/minute operator-review cap. The controller handles timeouts, target loss, VLM waits, return-to-zoom-out, and target-follow centre-window behaviour. The full behaviour-tree spec — including tick scenarios and the 15 fixed-wing rules — lives in system-flows.md §F4.
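A minimal sketch of how the three states could be typed in Rust. The state names and their fields come from the text above; the `Roi` layout, the `ScanEvent` set, the `Timeouts` struct, and the specific transitions are illustrative assumptions. The authoritative behaviour is the spec in system-flows.md §F4.

```rust
use std::time::{Duration, Instant};

/// Normalised region of interest (hypothetical layout).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Roi { pub x: f32, pub y: f32, pub w: f32, pub h: f32 }

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ScanState {
    ZoomedOut,
    ZoomedIn { roi: Roi, hold_started_at: Instant },
    TargetFollow { target_id: u64, started_at: Instant },
}

/// Illustrative event set; the real controller handles more cases.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ScanEvent {
    PoiDequeued(Roi),
    OperatorConfirmed { target_id: u64 },
    OperatorDeclined,
    TargetLost,
    Tick,
}

pub struct Timeouts { pub zoom_in_hold: Duration, pub follow_max: Duration }

/// Pure transition function: same inputs always give the same next state.
pub fn step(state: ScanState, event: ScanEvent, now: Instant, t: &Timeouts) -> ScanState {
    use ScanEvent::*;
    use ScanState::*;
    match (state, event) {
        (ZoomedOut, PoiDequeued(roi)) => ZoomedIn { roi, hold_started_at: now },
        (ZoomedIn { .. }, OperatorConfirmed { target_id }) => {
            TargetFollow { target_id, started_at: now }
        }
        (ZoomedIn { .. }, OperatorDeclined) => ZoomedOut,
        // Hold timeout: return to zoom-out.
        (ZoomedIn { hold_started_at, .. }, Tick)
            if now.duration_since(hold_started_at) >= t.zoom_in_hold => ZoomedOut,
        (TargetFollow { .. }, TargetLost) => ZoomedOut,
        (TargetFollow { started_at, .. }, Tick)
            if now.duration_since(started_at) >= t.follow_max => ZoomedOut,
        // Every other (state, event) pair leaves the state unchanged.
        (s, _) => s,
    }
}
```

Keeping `step` free of I/O and passing `now` in explicitly is one way to make the timeouts replayable in tests.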

Local VLM confirmation

Run NanoLLM with VILA1.5-3B through a separate local IPC process if the benchmark gate passes. Use one bounded ROI crop, short prompt, short answer, and a validated VlmAssessment schema. Free-form VLM text is not a downstream API. The IPC channel uses Unix-domain socket permissions and peer-credential checks where available.

Optionality model. VLM is the only optional Tier in the system. Two complementary mechanisms model this:

  1. Runtime configuration flag (vlm_enabled), gated by the benchmark-gate result. When the flag is false, scan_controller skips the VLM-confirmation step and proceeds with Tier 2 evidence alone for the zoom-in hold; the operator timeout still applies.
  2. Build-time feature module. The vlm_client component is a separate module behind a feature flag; the binary must build, link, and run identically when the module is absent. scan_controller MUST NOT contain a hard dependency on vlm_client's presence — it depends only on a VlmAssessment provider trait whose default implementation returns status: vlm_disabled.

The implementation may use either mechanism or both; in every case the observable behaviour must be the same: the system functions correctly with VLM absent, losing only the zoom-in confirmation step.
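The provider-trait idea can be sketched as follows. Only `VlmAssessment` and the `vlm_disabled` status are named in the text; the trait name `VlmAssessmentProvider`, the `RoiCrop` and `Evidence` types, and the `zoom_in_evidence` helper are hypothetical, and the real `VlmAssessment` schema certainly carries more fields.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum VlmStatus { Ok, Timeout, Error, VlmDisabled }

/// Reduced stand-in for the validated VlmAssessment schema.
#[derive(Debug, Clone, PartialEq)]
pub struct VlmAssessment {
    pub status: VlmStatus,
    pub confidence: Option<f32>,
}

/// One bounded ROI crop (hypothetical shape).
pub struct RoiCrop<'a> { pub jpeg: &'a [u8] }

pub trait VlmAssessmentProvider {
    /// Default implementation: VLM absent, report `vlm_disabled`.
    fn assess(&self, _roi: &RoiCrop<'_>) -> VlmAssessment {
        VlmAssessment { status: VlmStatus::VlmDisabled, confidence: None }
    }
}

/// Used when the flag is false or the vlm_client module is not built in.
pub struct DisabledVlm;
impl VlmAssessmentProvider for DisabledVlm {}

#[derive(Debug, PartialEq)]
pub enum Evidence {
    Tier2Only(f32),
    Tier2PlusVlm { tier2: f32, vlm: VlmAssessment },
}

/// The scan controller depends only on the trait, never on vlm_client.
pub fn zoom_in_evidence(
    provider: &dyn VlmAssessmentProvider,
    roi: &RoiCrop<'_>,
    tier2_score: f32,
) -> Evidence {
    let vlm = provider.assess(roi);
    if vlm.status == VlmStatus::VlmDisabled {
        Evidence::Tier2Only(tier2_score)
    } else {
        Evidence::Tier2PlusVlm { tier2: tier2_score, vlm }
    }
}
```

With this shape, both optionality mechanisms reduce to choosing which provider is handed to the controller at startup.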

Integration and reliability

Preserve the normalised-box contract while adding POI metadata. A central scheduler (scan_controller) owns GPU-heavy work and enforces no concurrent YOLO/VLM execution. No silent exception swallowing; health must reflect every dependency listed in §5.

Security and operational controls

  • Validate image / ROI payload size and format before decoding or inference.
  • Use patched OpenCV versions and an image-format allow-list.
  • Enforce local IPC authorisation and payload limits for the VLM process (Unix-domain socket permissions plus peer-credential checks).
  • Log POI creation reasons, source detections, queue decisions, gimbal commands, VLM requests, operator confirmations, and failure states.
  • Keep VLM local with no cloud egress.

mavlink_layer is a hand-rolled MAVLink v2 transport. There is no third-party SDK dependency. The layer owns serialisation / deserialisation, heartbeat, sequence numbers, retry, and a single connection abstraction (UDP or serial, picked at startup from CLI / env).

Command surface (~10–15 commands). Only what the system actually needs:

| MAVLink message | Direction | Used by | Purpose |
|---|---|---|---|
| HEARTBEAT | bidirectional | mavlink_layer | liveness + GCS-vs-companion identification |
| COMMAND_LONG (subset) | out | mission_executor | arm / disarm, takeoff, set-mode, change-speed, change-alt, land, RTL |
| COMMAND_ACK | in | mavlink_layer | command-result demux, retry trigger |
| MISSION_COUNT | out | mission_executor | pre-upload count |
| MISSION_REQUEST_INT | in | mission_executor | pull-side mission upload |
| MISSION_ITEM_INT | out | mission_executor | per-waypoint upload |
| MISSION_ACK | in | mission_executor | upload completion |
| MISSION_SET_CURRENT | out | mission_executor | start at item 0 |
| MISSION_CURRENT | in | mission_executor | progress |
| MISSION_ITEM_REACHED | in | mission_executor | progress |
| MISSION_CLEAR_ALL | out | mission_executor | reset before re-upload (e.g., middle waypoint) |
| GLOBAL_POSITION_INT | in | telemetry_stream, mission_executor | live position |
| ATTITUDE | in | telemetry_stream | attitude for operator overlay |
| SYS_STATUS / EXTENDED_SYS_STATE | in | health aggregator | mode, battery, sensor health |
| STATUSTEXT | in | logger | autopilot diagnostic lines |
| SET_MODE (or COMMAND_LONG MAV_CMD_DO_SET_MODE) | out | mission_executor | flight-mode transitions for fixed-wing |

If the autopilot link supports MAVLink-2 message signing it is enabled; otherwise the link is treated as trusted (it is point-to-point on a closed serial / UDP path on the airframe).

Piloting variants. mission_executor runs one of two state machines depending on the airframe declared at startup:

  • Multirotor variant: DISCONNECTED → CONNECTED → HEALTH_OK → ARMED → TAKE_OFF → MISSION_UPLOADED → FLY_MISSION → LAND. The executor arms, takes off to a configured altitude, and only then uploads + starts the mission. Bounded retry with exponential backoff at every transition; explicit max-retry; on exceeding it, health flips to red and the executor surfaces the failure via the operator bridge.
  • Fixed-wing variant: DISCONNECTED → CONNECTED → HEALTH_OK → MISSION_UPLOADED → WAIT_AUTO → FLY_MISSION → LAND. The executor skips arm/takeoff (the airframe is assumed already airborne under RC control), uploads the mission, and waits for the operator to switch the airframe into AUTO mode via RC. Same retry policy.

Geofence enforcement. mission_executor honours both INCLUSION and EXCLUSION polygons declared in the mission. INCLUSION violations halt forward progress and trigger return-to-launch (RTL); EXCLUSION violations trigger the same. The earlier C++ implementation parsed but silently ignored EXCLUSION; the new design rejects that behaviour explicitly.
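A minimal sketch of the geofence rule, assuming polygons are given as (lat, lon) vertex lists and are small enough to treat as planar. The names `Polygon`, `contains`, and `violates_geofence` are hypothetical, and the source does not say how a mission with no INCLUSION polygon is handled; here an empty INCLUSION list is assumed to mean "no inclusion constraint".

```rust
/// A polygon as (lat, lon) vertices, treated as planar (assumption).
pub type Polygon = Vec<(f64, f64)>;

/// Ray-casting point-in-polygon test. Points exactly on an edge are not
/// handled specially in this sketch.
pub fn contains(poly: &[(f64, f64)], p: (f64, f64)) -> bool {
    if poly.len() < 3 {
        return false;
    }
    let mut inside = false;
    let mut j = poly.len() - 1;
    for i in 0..poly.len() {
        let (xi, yi) = poly[i];
        let (xj, yj) = poly[j];
        // Does the edge (j -> i) straddle the ray's y, to the right of p?
        let crosses = (yi > p.1) != (yj > p.1)
            && p.0 < (xj - xi) * (p.1 - yi) / (yj - yi) + xi;
        if crosses {
            inside = !inside;
        }
        j = i;
    }
    inside
}

/// True when the position should halt progress and trigger RTL:
/// outside every INCLUSION polygon, or inside any EXCLUSION polygon.
pub fn violates_geofence(inclusion: &[Polygon], exclusion: &[Polygon], p: (f64, f64)) -> bool {
    let outside_inclusion =
        !inclusion.is_empty() && !inclusion.iter().any(|poly| contains(poly, p));
    let inside_exclusion = exclusion.iter().any(|poly| contains(poly, p));
    outside_inclusion || inside_exclusion
}
```

Both polygon kinds feed the same boolean, which mirrors the requirement that EXCLUSION is enforced rather than parsed and ignored.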

Mission uploads and middle-waypoint inserts. When the operator confirms a target, operator_bridge hands a middle-waypoint hint to mission_executor. The executor recomputes the mission (current-position → middle-waypoint → resume original route), clears the existing autopilot mission via MISSION_CLEAR_ALL, re-uploads the new mission via the standard MISSION_COUNT / MISSION_ITEM_INT / MISSION_ACK sequence, and resumes flight. After target-follow ends (operator release, target loss, or timeout), the same sequence reverts to the original mission.

Lost-link failsafe (operator/Ground-Station modem link). A typed failsafe ladder runs in mission_executor, evaluated each tick:

| Stage | Trigger | Action |
|---|---|---|
| LinkOk | last operator heartbeat ≤ 5 s | continue mission; no behavioural change |
| LinkDegraded | 5 s < last heartbeat ≤ 30 s | continue mission; surface health → yellow; queue all POI surface-events for replay-on-recovery |
| LinkLost | last heartbeat > 30 s and target-follow inactive | trigger RTL via MAV_CMD_NAV_RETURN_TO_LAUNCH; log mission abort with reason; continue logging the mission diff for post-flight upload via mapobjects_store |
| LinkLostInFollow | last heartbeat > 30 s and in target-follow | hold target-follow for an additional 30 s grace (operator may have momentarily lost link); thereafter fall through to LinkLost |

The grace windows (5 s, 30 s, 30 s) are mission-configurable. MAVLink-link loss to ArduPilot/PX4 itself is not the same event — it triggers immediate health → red and falls through to whatever the airframe autopilot's own failsafe does (we do NOT override it).
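The ladder can be sketched as a pure classification over the time since the last operator heartbeat. The stage names and default windows come from the table; `LinkWindows`, `classify_link`, and the reading of the follow grace as "30 s counted from the moment the 30 s threshold is crossed" are assumptions of this sketch.

```rust
use std::time::Duration;

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum LinkStage { LinkOk, LinkDegraded, LinkLost, LinkLostInFollow }

/// Mission-configurable windows (hypothetical struct).
pub struct LinkWindows {
    pub ok: Duration,
    pub degraded: Duration,
    pub follow_grace: Duration,
}

impl Default for LinkWindows {
    fn default() -> Self {
        Self {
            ok: Duration::from_secs(5),
            degraded: Duration::from_secs(30),
            follow_grace: Duration::from_secs(30),
        }
    }
}

/// Evaluated each tick from the age of the last operator heartbeat.
pub fn classify_link(since_heartbeat: Duration, in_target_follow: bool, w: &LinkWindows) -> LinkStage {
    if since_heartbeat <= w.ok {
        LinkStage::LinkOk
    } else if since_heartbeat <= w.degraded {
        LinkStage::LinkDegraded
    } else if in_target_follow && since_heartbeat <= w.degraded + w.follow_grace {
        // Assumption: the grace period starts when `degraded` is exceeded.
        LinkStage::LinkLostInFollow
    } else {
        LinkStage::LinkLost
    }
}
```

The side effects in the Action column (RTL command, health colour, event queueing) would hang off the returned stage rather than live inside the classifier.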

Battery / fuel thresholds. mission_executor reads SYS_STATUS / EXTENDED_SYS_STATE and enforces:

  • battery ≤ rtl_threshold (default 25 %) → trigger RTL, log reason, continue post-mission upload.
  • battery ≤ hard_floor (default 15 %) → land-now via MAV_CMD_NAV_LAND at safest reachable point; surface health → red.

Operator override is permitted via a signed command (per Q9); without it, the thresholds are hard.
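A minimal sketch of the two thresholds. The default percentages come from the bullets above; `BatteryAction`, `BatteryThresholds`, and `battery_action` are hypothetical names. The effect of a verified operator override is not specified in this section, so treating it as "suppress both actions" is purely an assumption here.

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum BatteryAction { Continue, ReturnToLaunch, LandNow }

#[derive(Debug, Clone, Copy)]
pub struct BatteryThresholds {
    pub rtl_pct: u8,
    pub hard_floor_pct: u8,
}

impl Default for BatteryThresholds {
    fn default() -> Self {
        Self { rtl_pct: 25, hard_floor_pct: 15 }
    }
}

/// `override_verified` must only be true after the signed operator command
/// has been authenticated (scheme open per Q9).
pub fn battery_action(remaining_pct: u8, t: BatteryThresholds, override_verified: bool) -> BatteryAction {
    if override_verified {
        // Assumption: a verified override suppresses both thresholds.
        return BatteryAction::Continue;
    }
    if remaining_pct <= t.hard_floor_pct {
        BatteryAction::LandNow
    } else if remaining_pct <= t.rtl_pct {
        BatteryAction::ReturnToLaunch
    } else {
        BatteryAction::Continue
    }
}
```

The hard floor is checked first so that a reading below both thresholds always resolves to the more urgent action.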

Connection configuration. A single connection URI at startup: udp://... or serial:///dev/.... No runtime URI swap.
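A sketch of startup URI parsing for the two schemes named above. The `Connection` enum and `parse_connection` are hypothetical, as are the address and device path used in the usage note; serial parameters such as baud rate are not described in this section and are left out.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Connection {
    Udp { addr: String },
    Serial { device: String },
}

/// Parses `udp://host:port` or `serial:///dev/...`; anything else is an error.
pub fn parse_connection(uri: &str) -> Result<Connection, String> {
    if let Some(rest) = uri.strip_prefix("udp://") {
        if rest.is_empty() {
            return Err("udp:// URI has no address".to_string());
        }
        Ok(Connection::Udp { addr: rest.to_string() })
    } else if let Some(rest) = uri.strip_prefix("serial://") {
        // After the scheme, an absolute device path is expected.
        if !rest.starts_with('/') || rest.len() < 2 {
            return Err("serial:// URI needs an absolute device path".to_string());
        }
        Ok(Connection::Serial { device: rest.to_string() })
    } else {
        Err(format!("unsupported connection URI: {uri}"))
    }
}
```

For example, `parse_connection("serial:///dev/ttyTHS1")` yields a `Serial` variant with device `/dev/ttyTHS1`. Because the value is parsed once and returned by value, nothing in this shape invites a runtime swap.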

Frames and altitudes. All waypoints in the mission API use MAV_FRAME_GLOBAL_RELATIVE_ALT. Terrain-following frames are not used (no SRTM database on the airframe).

7.8 Detection Classes

These classes extend the default seed set used by the detections service.

| Class | Local Name (UA) | Notes |
|---|---|---|
| Rows of trees | Посадка | Linear vegetation cover |
| Trenches/Ditches | Рів | Linear earthwork features |
| Trash piles | Сміття | Indicators of activity |
| Tire tracks | Сліди від шин | Signs of movement |

Plus the new YOLO primitive classes from §7.5 Training Data: black entrances of various sizes, branch piles, footpaths, roads, trees, and tree blocks.

7.9 MapObjects (H3 spatial index)

MapObjects are created and managed internally by autopilot. There are no REST API endpoints for MapObjects — autopilot reads/writes them directly in the on-device store (mapobjects_store). The only external reference is the delete cascade in DELETE /missions/{id} (per the suite-level missions API).

Autopilot needs to store objects on a 2D map efficiently in order to find differences fast:

  • New objects (new pile of trash, new tire tracks).
  • Changed objects.
  • Removed objects.

Each object on the map is described by:

  • gps(lat, lon) — geographic position.
  • size(width, height) — bounding area.

Spatial indexing. Use a hexagonal spatial index to efficiently store and query objects by location.

Approach: H3 library (by Uber) — hierarchical hexagonal geospatial indexing system.

| Aspect | Detail |
|---|---|
| Library | H3 (h3rs crate for Rust) |
| Algorithm basis | 3D icosahedron → 2D hexagonal tessellation |
| Key advantage | Uniform area cells, good neighbour queries |
| Open question | Optimal tile/resolution size |
| Known issue | Discontinuity problem at cell boundaries |

The hexagonal grid avoids the distortion problems of square grids and provides consistent neighbour relationships, making it suitable for fast spatial diff operations (detecting new, changed, and removed objects).

7.10 Drone ⇄ Operator Sync Message Format

Detection data is synced between drone and operator using a compact message format. MGRS (Military Grid Reference System) is used as the primary coordinate encoding — compact, standardised, and directly usable on military maps.

Drone → Operator (detection report):

missionId :: MGRS(encoded) :: class :: confidence :: size_width_m :: size_length_m :: photo_metadata :: flags

Operator → Drone (command/acknowledgment):

missionId :: Encoded(GroundMGRS :: Time) :: ... :: missionId2

Wire-level field semantics live in data_model.md §MGRS sync message.

7.11 Target Relocation / Movement Analysis

The system maintains a live map of objects and detects changes between survey passes.

Map update types.

| Type | Meaning |
|---|---|
| New | Object not seen before in this area |
| Moved | Object of same class appeared nearby |
| Removed | Previously recorded object no longer present |

Map hashtable. Objects are stored in a hashtable keyed by MGRS grid reference:

MGRS1  -> Object1
MGRS2  -> Object5
MGRS12 -> Object2
MGRSN  -> ObjectM

7.12 New vs Existing / Moved / Removed Object Detection

When a detection occurs, the system must determine whether the object is new, moved, or already known. This must be done efficiently in real time. This is the implementation of scan_controller's map-diff responsibilities; it lives in mapobjects_store.

Algorithm.

On each detection(gps, class, confidence, size):

1. Compute H3 cell index at chosen resolution (e.g. res 10, ~66 m average edge).
2. Build composite key = H3_cell + class.
3. Query k-ring(H3_cell, k=2) -> get all neighbouring cells.
4. For each neighbouring cell, lookup objects with same or similar class:
     similar_classes = {military_vehicle, tank, artillery}  (configurable groups)
5. Compare:
     - If matching object found within distance_threshold (config, e.g. 50m)
       AND same class group -> EXISTING (or MOVED if position delta > move_threshold).
     - If no match -> NEW -> insert into map with H3 hash key.
6. After full sweep: objects in the region that were NOT re-observed -> REMOVED candidates.
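Step 5 of the algorithm above can be sketched without any spatial-index dependency by assuming the candidate objects have already been gathered from the neighbouring cells (steps 1 to 4). `MapObject`, `MatchKind`, `haversine_m`, and `classify_detection` are hypothetical names, class groups are reduced to a plain string, and the REMOVED sweep in step 6 is not covered.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct MapObject {
    pub lat: f64,
    pub lon: f64,
    pub class_group: String,
}

#[derive(Debug, PartialEq, Eq)]
pub enum MatchKind { New, Existing, Moved }

/// Great-circle distance in metres on a spherical Earth (R = 6 371 km).
pub fn haversine_m(lat1: f64, lon1: f64, lat2: f64, lon2: f64) -> f64 {
    const R: f64 = 6_371_000.0;
    let (p1, p2) = (lat1.to_radians(), lat2.to_radians());
    let dphi = (lat2 - lat1).to_radians();
    let dlambda = (lon2 - lon1).to_radians();
    let a = (dphi / 2.0).sin().powi(2) + p1.cos() * p2.cos() * (dlambda / 2.0).sin().powi(2);
    2.0 * R * a.sqrt().asin()
}

/// `neighbours` is assumed to hold every stored object from the k-ring cells.
pub fn classify_detection(
    det: &MapObject,
    neighbours: &[MapObject],
    distance_threshold_m: f64,
    move_threshold_m: f64,
) -> MatchKind {
    // Nearest stored object of the same class group within the threshold.
    let nearest = neighbours
        .iter()
        .filter(|o| o.class_group == det.class_group)
        .map(|o| haversine_m(det.lat, det.lon, o.lat, o.lon))
        .filter(|d| *d <= distance_threshold_m)
        .fold(None, |best: Option<f64>, d| Some(best.map_or(d, |b| b.min(d))));
    match nearest {
        None => MatchKind::New,
        Some(d) if d > move_threshold_m => MatchKind::Moved,
        Some(_) => MatchKind::Existing,
    }
}
```

Matching against the nearest candidate, rather than the first one found, is a design choice of this sketch; the pseudocode above does not say which match wins when several objects fall inside the threshold.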

Why H3 + MGRS.

| Step | Mechanism | Complexity |
|---|---|---|
| Spatial cell lookup | H3 latlng_to_cell | O(1) |
| Neighbour query | H3 grid_disk(k=2) | O(1) |
| Object lookup per cell | Hashtable by MGRS+class | O(1) |
| Total per detection | ~constant time | O(k²) |

Configurable parameters.

| Parameter | Example Value | Purpose |
|---|---|---|
| search_radius_km | 30 | Max radius to search for previously known objects |
| distance_threshold_m | 50 | Max distance to consider same object |
| move_threshold_m | 10 | Min displacement to flag as "moved" |
| h3_resolution | 10 | ~66 m average edge length, good for vehicle-sized objects |
| similar_classes | per config | Class groups treated as equivalent for matching |

Notes.

  • The 30 km radius is for the broad initial query ("get all previously stored objects within 30 km"). H3 grid_disk at resolution 10 with k=2 returns 19 cells (~66 m average edge each) and reaches roughly 200 m from the centre cell, which comfortably covers the 50 m distance_threshold and handles fine-grained matching. For the broad query, use a coarser H3 resolution (e.g. res 4, ~22 km edge) as a pre-filter.
  • MGRS+class is the composite key for the hashtable so that lookups are partitioned by both location and object type.
  • The discontinuity problem at H3 cell boundaries is solved by always querying the k-ring (centre cell + neighbours), ensuring objects near an edge are still matched.

7.13 MapObjects Sync (central DB)

mapobjects_store is not a private on-device database. It is the working copy of a centrally maintained map of detected objects, scoped to the mission's bounding box, synchronised on a per-mission basis.

Mirror of the GPS-Denied satellite-tile pattern. Pre-flight, autopilot pulls the relevant central state into the on-device store; in-flight the on-device store is authoritative; post-flight, autopilot pushes the mission's full pass diff back to the central store. The central store is the source of truth across missions and across UAVs; the on-device store is the source of truth during the active mission.

Endpoint hosting (frozen 2026-05-18). The endpoints are an extension of the existing missions API. There is no separate mapobjects service.

| Endpoint | Method | Purpose |
|---|---|---|
| /missions/{id}/mapobjects | GET | Pre-flight: returns the central map state for the mission's bounding box (last-known objects + ignored items). |
| /missions/{id}/mapobjects | POST | Post-flight: uploads the mission's full pass diff (NEW / MOVED / REMOVED-CANDIDATE / CONFIRMED-EXISTING) for central merge. |
| /missions/{id}/mapobjects/ignored | GET | Pre-flight: returns the central ignored-items list scoped to the mission area. |
| /missions/{id}/mapobjects/ignored | POST | Post-flight: uploads ignored-items appended during the mission. |
| /missions/{id} (existing) | DELETE | Cascade: drops mission-scoped MapObjects and IgnoredItems centrally as well as on-device. |

In-flight sync is batched only for MVP — no streaming over modem. Cross-UAV awareness lags by mission length; this is an explicit MVP trade-off (Frozen choice 6 in §7.3).

Sync lifecycle (per mission).

  1. Pre-flight pull: mission_client calls GET /missions/{id}/mapobjects after fetching the mission itself. Response hydrates mapobjects_store. Failure modes:
    • Reachable + 200: hydrate; record pull_completed_at. Sync state = synced.
    • Reachable + 4xx: fail BIT; surface error; operator must investigate (likely mission-id mismatch or unauthorised UAV).
    • Unreachable / timeout: BIT degrades. Operator may acknowledge to continue with last-cached state for this mission area (sync state = cached_fallback); the BIT failure is recorded for post-mission audit.
    • Empty response: sync state = synced, store empty (legitimate first-flight in this area).
  2. In-flight — store is authoritative. All NEW / MOVED / EXISTING / IgnoredItem appends accumulate in the on-device store with pending_upload = true. No central writes.
  3. Post-flight push: mission_client calls POST /missions/{id}/mapobjects with the mission's full pass diff after landing or RTL. Conflict resolution is server-side per §7.13 conflict rules. Failure modes:
    • Reachable + 200: clear pending_upload; record push_completed_at. Sync state = synced.
    • Unreachable / timeout / 5xx: persist the pending diff on disk, retry with backoff. After max retries (configurable, default 24 h), surface as a warning; operator may manually trigger replay or accept loss.
    • 4xx (rejected): log full payload, surface to operator; do not silently discard — the mission's results are at risk.
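The "retry with backoff" behaviour for a failed post-flight push can be sketched as a schedule generator. The 24 h default budget comes from the text; the doubling policy, the `base` and `cap` parameters, and the name `backoff_delays` are assumptions of this sketch, not the chosen policy.

```rust
use std::time::Duration;

/// Exponential backoff schedule: delays double from `base` up to `cap`,
/// and the schedule stops once the next delay would exceed `budget`.
pub fn backoff_delays(base: Duration, cap: Duration, budget: Duration) -> Vec<Duration> {
    let mut out = Vec::new();
    if base.is_zero() {
        // A zero base would never consume the budget; treat it as "no retries".
        return out;
    }
    let mut delay = base.min(cap);
    let mut elapsed = Duration::ZERO;
    while elapsed + delay <= budget {
        out.push(delay);
        elapsed += delay;
        delay = (delay * 2).min(cap);
    }
    out
}
```

With a 1-minute base, a 1-hour cap, and the 24 h default budget, this produces 28 attempts. The pending diff stays on disk throughout, so exhausting the schedule leads to the operator-visible warning rather than to data loss.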

Conflict resolution at the central store (open question Q8 — proposed). When two missions report contradicting state for the same (h3_cell, class_group):

  • Both observations are appended to the per-(h3_cell, class_group) observation log (no destructive overwrite).
  • The "current view" surfaced to operator UI is computed from the observation log: most recent confirmed-existing observation wins; older REMOVED claims expire after a configurable age; class-group ambiguities surface as multi-class candidates.
  • IgnoredItems are union-merged (any operator-decline at any UAV propagates to all future missions in the same area, until explicit clear).
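One possible reading of the proposed rules, sketched in memory. The exact projection rules are still open (Q8), so this is only an illustration: it keeps the latest non-expired observation per (h3_cell, class_group) and union-merges ignored keys. `Observation`, `DiffKind`, `current_view`, and `merge_ignored` are hypothetical names, timestamps are plain Unix seconds, and multi-class disambiguation is not modelled.

```rust
use std::collections::{BTreeSet, HashMap};

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum DiffKind { New, Moved, Existing, RemovedCandidate }

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Observation {
    pub h3_cell: u64,
    pub class_group: String,
    pub observed_at: i64, // Unix seconds (assumption)
    pub kind: DiffKind,
}

pub type Key = (u64, String);

/// Latest observation per key, skipping REMOVED claims older than
/// `removed_max_age_s`. The log itself is never modified.
pub fn current_view(log: &[Observation], now: i64, removed_max_age_s: i64) -> HashMap<Key, Observation> {
    let mut view: HashMap<Key, Observation> = HashMap::new();
    for o in log {
        let expired = o.kind == DiffKind::RemovedCandidate
            && now - o.observed_at > removed_max_age_s;
        if expired {
            continue;
        }
        let key = (o.h3_cell, o.class_group.clone());
        let newer = view.get(&key).map_or(true, |cur| o.observed_at > cur.observed_at);
        if newer {
            view.insert(key, o.clone());
        }
    }
    view
}

/// Ignored items are union-merged across missions and UAVs.
pub fn merge_ignored(a: &BTreeSet<Key>, b: &BTreeSet<Key>) -> BTreeSet<Key> {
    a.union(b).cloned().collect()
}
```

Because the view is a pure function of the append-only log, it can be recomputed at any time, which matches the materialised-view approach in the central schema.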

Central-side schema (SQL, indicative).

-- Observations: every detection ever reported by any UAV/mission, never overwritten.
CREATE TABLE map_object_observations (
    id              UUID PRIMARY KEY,
    h3_cell         BIGINT NOT NULL,
    class           TEXT NOT NULL,
    class_group     TEXT NOT NULL,
    mission_id      UUID NOT NULL REFERENCES missions(id) ON DELETE CASCADE,
    uav_id          UUID NOT NULL,
    observed_at     TIMESTAMPTZ NOT NULL,
    gps_lat         DOUBLE PRECISION NOT NULL,
    gps_lon         DOUBLE PRECISION NOT NULL,
    mgrs            TEXT NOT NULL,
    size_width_m    REAL,
    size_length_m   REAL,
    confidence      REAL NOT NULL,
    diff_kind       TEXT NOT NULL CHECK (diff_kind IN ('NEW','MOVED','EXISTING','REMOVED_CANDIDATE')),
    photo_ref       TEXT,
    raw_evidence    JSONB
);
CREATE INDEX ON map_object_observations (h3_cell, class_group);
CREATE INDEX ON map_object_observations (mission_id);
CREATE INDEX ON map_object_observations (observed_at DESC);

-- IgnoredItems: per-area operator declines, union-merged across missions.
CREATE TABLE map_object_ignored (
    id              UUID PRIMARY KEY,
    h3_cell         BIGINT NOT NULL,
    mgrs            TEXT NOT NULL,
    class_group     TEXT NOT NULL,
    declined_at     TIMESTAMPTZ NOT NULL,
    operator_id     UUID,
    mission_id      UUID REFERENCES missions(id) ON DELETE SET NULL,
    retention_scope TEXT NOT NULL CHECK (retention_scope IN ('mission','session','until_expiry')),
    expires_at      TIMESTAMPTZ
);
CREATE INDEX ON map_object_ignored (h3_cell, class_group);
CREATE INDEX ON map_object_ignored (expires_at) WHERE retention_scope = 'until_expiry';

-- Materialised "current view" derived from observations + ignored.
-- Recomputed nightly or on POST. Exact projection rules per §7.13 conflict resolution.
CREATE MATERIALIZED VIEW map_objects_current AS ...;

On-device-side schema (engine TBD per §8 Q3 — indicative shape).

mapobjects_store/
  current_state            -- key = (h3_cell, class_group); value = MapObject record
  pending_observations     -- ordered log of unflushed observations for post-flight POST
  pending_ignored          -- unflushed IgnoredItem appends
  sync_state               -- {pull_completed_at, push_completed_at, last_error, kind}

The on-device shape is intentionally narrower than the central schema — the on-device store does not need full observation history beyond the active mission; older history is only ever consulted via the central pull.

Bounding-box pull strategy. The central API uses the mission's geofence INCLUSION polygon (or a generous AABB if no INCLUSION is set) to scope the response. Pulled records are filtered by retention age (default ≤30 days); operator can override to "all". The 30 km / k-ring numbers in §7.12 apply to on-device spatial queries; the pull radius is mission-defined.

7.14 Tech Stack

Requirements.

| Area | Requirement |
|---|---|
| Runtime hardware | Jetson Orin Nano Super 8 GB, locked JetPack/power mode, ViewPro A40. |
| Inference (Tier 1) | FP16 only, TensorRT primary, ONNX Runtime fallback, 1280 px model input. Lives in ../detections. |
| Service integration | Bi-directional gRPC client to the existing FastAPI + Cython + TensorRT detections service. |
| VLM | Local-only, separate IPC process, sequential with YOLO, ≤5 s/ROI if used for MVP. |
| Movement | Active at zoom-out and zoom-in, moving-camera compensation with timestamped video / gimbal / UAV telemetry; per-zoom-band thresholds; learned-CV fallback per Q14. |
| MapObjects sync | Mission-bracketed: pre-flight GET + post-flight POST against /missions/{id}/mapobjects. Batched only for MVP. |
| Output | Existing normalised-box format plus POI metadata for queue / reasoning. |
| Proof gates | Hardware/replay benchmark suite before implementation decomposition; movement zoom-in benchmark independent of zoom-out. |

Selected stack.

| Layer | Selection | Rationale |
|---|---|---|
| Language (autopilot) | Rust | Memory safety, performance, single-binary deployment, strong type system for the deterministic state machine. |
| Language (../detections) | Python + Cython | Existing service; we consume it, not rewrite it. |
| Tier 1 detector | YOLO26 + YOLOE-26 fixed-class FP16 TensorRT | Best fit with acceptance criteria and export docs. Owned by ../detections. |
| Tier 2 analyzer | Primitive graph + lightweight CNN | Fast, explainable, data-efficient. |
| Movement | OpenCV optical flow + telemetry | Directly addresses moving-camera constraint. |
| VLM runtime | NanoLLM / VILA1.5-3B (with fallback benchmark path) | Documented local-multimodal path; matches no-cloud requirement. |
| Scan controller | Deterministic typed state machine (Rust) | Simpler and easier to test for a fixed ZoomedOut / ZoomedIn / TargetFollow lifecycle. |
| MAVLink transport | Hand-rolled in autopilot (Rust) | Eliminates the largest current dependency-risk item; small command surface (§7.7). |
| Gimbal protocol | ViewPro A40 vendor protocol over UDP | Matches the deployed camera. |
| mapobjects_store engine | TBD (SQLite + H3 extension / KV / in-memory + snapshot) | Open question; see §8. |
| Inter-component IPC (in-process) | Tokio channels / actors | Idiomatic Rust async. |
| External IPC (VLM) | Unix-domain socket with peer-credential check | Local-only authorisation. |
| VLM output | Validated structured VlmAssessment schema | Makes VLM output a stable API contract. |
| Input security | Content / size allow-list + patched OpenCV | Reduces crafted-input and resource-exhaustion risk. |
| Observability | tracing + JSON logs to stdout, scraped by the deployment's log-shipping stack | See deployment/observability.md. |
| Build | cargo cross-compile for aarch64-unknown-linux-gnu | See deployment/ci_cd_pipeline.md. |

Risk register.

| Risk | Impact | Mitigation |
|---|---|---|
| Tier 1 misses ≤100 ms/frame | Blocks acceptance | Fixed-shape FP16 engines, batch 1, benchmark before implementation decomposition. |
| VLM misses ≤5 s/ROI or memory budget | Blocks VLM-required MVP policy | Benchmark NanoLLM / VILA first; fall back to smaller VLM only if it passes the same gates; otherwise disable VLM via vlm_enabled=false. |
| All-season MVP data is insufficient | Blocks detection-quality targets | Per-season dataset gates and hard-negative mining. |
| Movement false positives exceed ≤5 POIs/min | Operator overload | Telemetry-aided compensation, replay tests, queue cap, per-zoom-band thresholds. |
| Classical OpenCV optical flow inadequate at zoom-in | Loss of zoom-in movement detection | Benchmark gate measures zoom-in independently; fallback to learned-CV / CNN motion module behind feature flag (Q14). |
| Operator/Ground-Station modem link lost mid-flight | Uncontrolled UAV | Typed lost-link failsafe ladder in mission_executor (§7.7); RTL after 30 s grace; configurable. |
| Battery / fuel below threshold mid-mission | Forced landing or crash | Hard-coded RTL + land-now thresholds (§7.7); operator override only via signed command. |
| Operator command spoofing / replay over modem RF | Hostile hijack of operator commands | Authenticated, signed, replay-protected command envelope (§5; scheme TBD per Q9). |
| Pre-flight self-test (BIT) misses a degraded dependency | Mid-flight component failure | BIT covers every dependency in §5 plus mission load + MapObjects pre-flight pull; cached-fallback acknowledgement is explicit. |
| Wall-clock drift breaks operator-command timestamping | Forensic + audit failures | GPS-time-bound when GPS locked; NTP at boot; drift > 200 ms surfaces health → yellow. |
| MapObjects post-flight push fails | Loss of mission-diff data centrally | Persist pending diff on disk; bounded retry; operator-visible warning; manual replay supported. |
| A40 zoom transition exceeds ≤2 s | Breaks scan timing | Hardware-in-loop timing test; revise scan timeout / zoom range if needed. |
| Hand-rolled MAVLink misses an edge case | Mission failure or hard-to-debug protocol behaviour | Conformance test against ArduPilot SITL; replay-based regression tests. |
| Unstructured VLM output corrupts downstream decisions | Operator-facing false confidence | Schema validation, confidence enum, timeout / error state, fail-closed behaviour. |
| Telemetry skew breaks movement compensation | False motion candidates | Define maximum frame / gimbal / UAV timestamp skew; reject / degrade unsynchronised samples. |
| Untrusted image / ROI payloads exploit decoders or memory | Security and availability risk | Pin patched OpenCV, restrict formats, enforce size caps before decode. |

8. Open Questions

# Question Impact
Q1 Sweep pattern specification. Pattern shape (pendulum / raster / lawn-mower), FOV per zoom tier, dwell time per direction, and whether sweep runs continuously or only between specific mission waypoints. Blocks scan_controller zoom-out implementation.
Q2 Ground Station API contract. Stream protocol (WebRTC / WebSocket-H.264 / gRPC server-streaming?), session/auth model, and bbox-overlay rendering (server-side burn-in vs client-side render). Blocks telemetry_stream + operator_bridge design.
Q3 mapobjects_store engine. SQLite + H3 extension / KV / in-memory + snapshot. Blocks persistent-state design for ignored items + MapObjects.
Q4 Tier 1 contract evolution. How detection_client is versioned against an evolving ../detections schema. Blocks the gRPC contract definition.
Q5 mission-schema extraction location. _infra/ at suite root, or a small third repo. Blocks the mission_client / missions API contract sharing.
Q6 MAVLink-2 message signing. Whether the airframe link enables MAVLink-2 signing or treats the link as trusted. Affects mavlink_layer startup handshake.
Q7 Central MapObjects API contract. Endpoint hosting is frozen as an extension of the missions API (§7.13). The remaining contract concerns are: schema versioning, paging strategy for large mission areas, photo-reference upload mechanism (URL handoff vs inline), and observation-history retention policy. Blocks missions repo work + mission_client MapObjects sync code.
Q8 MapObjects conflict resolution. When two missions report contradicting state for the same (h3_cell, class_group), the proposed rule is "append-only observation log + computed current view" (§7.13). Open: exact projection rules, REMOVED-claim expiry window, multi-class disambiguation. Blocks central map_objects_current view definition.
Q9 Operator-command authentication scheme. The principle is committed (§5: signed, replay-protected). Scheme open: HMAC over (session_token, sequence_number, payload) vs JWT-style ed25519 vs MAVLink-2 signing extended to operator commands vs separate envelope. Blocks operator_bridge validation logic + Ground Station integration.
Q10 Software rollback policy on the airframe. Watchtower OTA is mentioned in ../_docs/00_top_level_architecture.md. Policy open: how a bad autopilot update is detected on the airframe (boot-time self-check, A/B partition, watchdog rollback) and rolled back without crew intervention. Affects deployment design + on-airframe service supervision.
Q11 Multi-operator session policy. When two operators connect (one at the primary station, one remote), which is authoritative for confirm/decline? Single active operator at a time, or quorum? How is operator_id recorded in IgnoredItem? Blocks operator_bridge session model.
Q12 Comms blackout during banking turns. Winged UAV banking can lose modem LOS to Ground Station. Policy: tolerate brief blackouts as LinkDegraded, or suppress lost-link failsafe during known turn arcs (computed from mission shape)? Affects lost-link failsafe ladder timing constants (§7.7).
Q13 All-season acceptance flight gates. Dataset gates (§7.4) are committed; flight-test gates are not. Open: minimum number of real flights per season before MVP acceptance, per-season acceptance pass criteria. Affects MVP sign-off scope.
Q14 Movement detection at zoom-in — fallback selection. If classical OpenCV optical flow / global-motion estimation does not meet the per-zoom-band false-positive cap at zoom-in, the fallback module choice is open: learned optical flow (RAFT / FlowNet derivative) vs CNN motion segmentation vs IMU-tighter-coupled classical CV. The interface contract (Frame + telemetry → Vec<MovementCandidate>) is fixed; the implementation is replaceable. Blocks movement_detector zoom-in scope if classical CV fails benchmark gate.
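
The fixed interface named in Q14 (Frame + telemetry → Vec<MovementCandidate>) could be expressed as a trait so that the classical-CV implementation and any fallback are interchangeable. The sketch below is illustrative only: the fields of `Frame`, `Telemetry`, and `MovementCandidate`, the trait name `MovementDetector`, and the toy `FrameDiffDetector` are all assumptions, not the project's actual types or algorithm.

```rust
/// Hypothetical stand-in for a camera frame (8-bit luma, row-major).
#[derive(Debug, Clone)]
pub struct Frame {
    pub width: u32,
    pub height: u32,
    pub luma: Vec<u8>,
}

/// Hypothetical stand-in for the telemetry sample paired with a frame.
#[derive(Debug, Clone, Copy)]
pub struct Telemetry {
    pub zoom_level: f32,
}

/// Hypothetical candidate: pixel position plus a normalized score.
#[derive(Debug, Clone, PartialEq)]
pub struct MovementCandidate {
    pub x: u32,
    pub y: u32,
    pub score: f32,
}

/// The replaceable seam: any implementation maps frame + telemetry to candidates.
pub trait MovementDetector {
    fn detect(&mut self, frame: &Frame, telemetry: &Telemetry) -> Vec<MovementCandidate>;
}

/// Toy implementation (assumption): per-pixel difference against the previous frame.
/// It ignores telemetry and does no global-motion compensation.
pub struct FrameDiffDetector {
    previous: Option<Frame>,
    threshold: u8,
}

impl FrameDiffDetector {
    pub fn new(threshold: u8) -> Self {
        Self { previous: None, threshold }
    }
}

impl MovementDetector for FrameDiffDetector {
    fn detect(&mut self, frame: &Frame, _telemetry: &Telemetry) -> Vec<MovementCandidate> {
        let mut out = Vec::new();
        if let Some(prev) = &self.previous {
            // Only compare frames of identical, non-degenerate geometry.
            let same_size = prev.width == frame.width && prev.height == frame.height;
            if same_size && frame.width > 0 {
                for (i, (a, b)) in prev.luma.iter().zip(frame.luma.iter()).enumerate() {
                    let diff = a.abs_diff(*b);
                    if diff > self.threshold {
                        out.push(MovementCandidate {
                            x: (i as u32) % frame.width,
                            y: (i as u32) / frame.width,
                            score: f32::from(diff) / 255.0,
                        });
                    }
                }
            }
        }
        // Remember this frame for the next call.
        self.previous = Some(frame.clone());
        out
    }
}
```

Callers that hold a `Box<dyn MovementDetector>` (or are generic over the trait) would not need to change if a learned optical-flow or CNN-based implementation replaced the classical one.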

9. Out of Scope

  • Multi-airframe coordination, fleet management, swarm logic.
  • Mission re-planning beyond middle-waypoint inserts.
  • Mission planning / route selection for arbitrary mission shapes (only intra-region routing is in scope).
  • GPS-denied navigation algorithms (delegated to the GPS-denied service, ../_docs/11_gps_denied.md).
  • Cloud-hosted VLM or any external inference dependency.
  • Encrypted transport beyond what MAVLink-2 message signing and modem-level link encryption already provide.
  • Annotation tooling, model training, dataset curation (separate ai-training repo).
  • Operator browser UI (Ground Station hosts it; autopilot only feeds it).

10. External Suite Documents

These suite-level documents live in the parent suite repo (../_docs/) and are consumed by autopilot but not owned by autopilot.

| Suite-level path | Owner / primary-for | What autopilot uses it for |
|---|---|---|
| ../_docs/00_top_level_architecture.md | suite (cross-cutting) | Suite topology, deployment tiers (edge), the flight-gate convention (/run/azaion/in-flight, written by autopilot and read by model-sync.service), Watchtower OTA model. Defines autopilot's place in the 11-component system. |
| ../_docs/02_missions.md | missions repo (.NET service) | Mission / Waypoint / Vehicle schemas. Autopilot consumes the missions API via mission_client. |
| ../_docs/03_detections.md | detections repo (Cython service) | Detections API spec. Autopilot consumes via bi-directional gRPC in detection_client. |
| ../_docs/04_system_design_clarifications.md | suite (cross-cutting) | REST patterns, stream-detection protocol, edge-device connection semantics. Defines the Ground Station push contract used by telemetry_stream. |
| ../_docs/11_gps_denied.md | gps-denied-onboard / gps-denied-desktop (shared primary) | GPS-Denied service architecture. Autopilot does NOT host any GPS-denied code; it consumes corrected GPS through the shared edge data path. |
| ../_docs/12_ai_training.md | ai-training repo | AI training pipeline. Autopilot consumes the resulting ONNX/TensorRT models via the rclone model-sync timer (flight-gate-aware). |
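
The flight-gate convention referenced above (a marker at /run/azaion/in-flight, written by autopilot and read by model-sync.service) could be handled on the autopilot side with a small RAII guard. This is a minimal sketch under stated assumptions: that the presence of the file means "in flight", that the file is removed when the flight ends, and that the names `FlightGate`, `raise`, and `in_flight` are hypothetical. The suite document remains the authority on the actual semantics.

```rust
use std::fs::{self, OpenOptions};
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical guard: the marker file exists for as long as this value is alive.
pub struct FlightGate {
    path: PathBuf,
}

impl FlightGate {
    /// Create (or truncate) the marker file, e.g. /run/azaion/in-flight.
    /// The path is a parameter so the sketch can be exercised outside /run.
    pub fn raise(path: impl AsRef<Path>) -> io::Result<Self> {
        let path = path.as_ref().to_path_buf();
        OpenOptions::new()
            .write(true)
            .create(true)
            .truncate(true)
            .open(&path)?;
        Ok(Self { path })
    }
}

impl Drop for FlightGate {
    fn drop(&mut self) {
        // Best effort: removal errors are ignored because Drop cannot return them.
        let _ = fs::remove_file(&self.path);
    }
}

/// Reader-side check (assumption: presence of the file means "in flight").
pub fn in_flight(path: impl AsRef<Path>) -> bool {
    path.as_ref().exists()
}
```

Tying the marker to a value's lifetime means an early return or a panic that unwinds still clears the gate. A hard crash or power loss would not, so a real deployment would probably also need cleanup at boot.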