
# autopilot — Architecture

**Status**: forward-looking design (Rust). The implementation is in flight; the system described here is the target architecture, not what runs today. Confirmed by user 2026-05-17.

## Synopsis

`autopilot` is the onboard mission executor for a reconnaissance winged UAV. It runs as a single Rust process on an aarch64 Jetson Orin Nano edge device. It pulls a mission from the external `missions` API, controls the UAV through a hand-rolled MAVLink layer (~10–15 commands; no third-party SDK), drives a ViewPro A40 gimbal in a two-level scan-and-zoom loop (zoom-out wide sweep + zoom-in on POI), streams camera frames + telemetry continuously over modem to an external Ground Station API so the operator watches in a browser, and uses bi-directional gRPC to delegate primitive object detection to the external `../detections` API. Semantic-vision reasoning (Tier 2 ROI analysis + an optional local VLM), a POI scheduler with an operator-review rate cap, and a target-follow mode after operator confirmation all run inside autopilot. The dominant pattern is a deterministic typed state machine (zoom-out / zoom-in / target-follow) coordinating a small set of async actors.

---

## 1. System Context

Autopilot integrates with six external systems. The local VLM is optional (benchmark-gated); everything else is mandatory.

```mermaid
flowchart LR
    cam["ViewPro A40<br/>RTSP camera + gimbal"]
    det["../detections<br/>Tier 1 YOLO service"]
    vlm["NanoLLM VILA1.5-3B<br/>(optional, local IPC)"]
    miss["missions API"]
    gs["Ground Station<br/>operator UI"]
    ap["ArduPilot / PX4"]
    autopilot["autopilot<br/>onboard mission + scan + perception"]
    cam <-->|RTSP frames / UDP gimbal control| autopilot
    autopilot <-->|bidir gRPC| det
    autopilot <-.->|Unix-domain socket IPC| vlm
    autopilot <-->|REST GET / POST| miss
    autopilot <-->|stream over modem| gs
    autopilot <-->|MAVLink v2| ap
```

Per-edge protocol details:

| Edge | Protocol | Direction | Purpose |
|---|---|---|---|
| ViewPro A40 (camera) | RTSP/RTP over TCP/UDP | inbound | live H.264/H.265 1080p video to `frame_ingest`. |
| ViewPro A40 (gimbal) | UDP, vendor control protocol | bidirectional | yaw / pitch / zoom commands + status; driven by `gimbal_controller`. |
| `../detections` | bi-directional gRPC | bidirectional | frames out, bounding boxes back; driven by `detection_client`. |
| NanoLLM VILA1.5-3B | Unix-domain socket IPC (peer-cred check) | bidirectional | bounded ROI + short prompt → structured `VlmAssessment`; optional. |
| `missions` API | HTTPS REST (GET / POST) | bidirectional | mission pull on start; middle-waypoint POST on operator confirmation; **MapObjects** pre-flight pull + post-flight push (`/missions/{id}/mapobjects`, see §7.13). |
| Ground Station API | continuous push over modem (protocol per `../_docs/04_system_design_clarifications.md`) | bidirectional | always-on camera feed + telemetry + bbox overlay; operator confirm / decline / target-follow. |
| ArduPilot / PX4 | MAVLink v2 over UDP or serial | bidirectional | the small command surface in §7.7. |

---

## 2. Component Layering

Three internal layers (Perception → Decision + Memory → Action) plus an always-on Telemetry plane that runs parallel to the decision loop.

```mermaid
flowchart TB
    subgraph autopilot ["autopilot"]
        subgraph perception ["Perception (data plane in)"]
            fi[frame_ingest]
            dc[detection_client]
            md[movement_detector]
            sa[semantic_analyzer]
            vc["vlm_client (opt)"]
        end
        subgraph brain ["Decision + Memory"]
            sc[scan_controller]
            mo[mapobjects_store]
        end
        subgraph action ["Action (data plane out)"]
            gc[gimbal_controller]
            ob[operator_bridge]
            me[mission_executor]
            ml[mavlink_layer]
            msc[mission_client]
        end
        subgraph tplane ["Telemetry plane (always-on, parallel)"]
            ts[telemetry_stream]
        end
    end
    perception ==>|"inputs (bboxes, motion, Tier 2, VlmAssessment)"| brain
    brain ==>|"commands + POI updates + middle-waypoint hints"| action
    perception -.->|"frames + bboxes"| tplane
    action -.->|"telemetry"| tplane
```

Per-flow component-to-component sequence diagrams live in `system-flows.md`.

---

## 3. Components

| Component | Layer | Responsibility |
|---|---|---|
| `frame_ingest` | Perception | Pull RTSP from ViewPro A40; decode; timestamp; hand frames to `detection_client`, `movement_detector`, and `telemetry_stream` (zero-copy where possible). |
| `detection_client` | Perception | Bi-directional gRPC to `../detections`; streams frames out, receives bounding boxes back; same bboxes are reused for Tier 2 ROI selection and for operator overlay. Versioned against the `../_docs/03_detections.md` contract. |
| `movement_detector` | Perception | Active in **both** zoom-out and zoom-in levels (skipped only during target-follow). OpenCV optical-flow / global-motion estimation fused with timestamped gimbal angle, zoom state, and UAV motion telemetry. Emits residual-motion clusters as POI candidates. Ego-motion compensation is mandatory; naive frame-differencing is rejected. Zoom-in adequacy of classical CV is benchmark-gated — see §7.6 Movement detector and Open Question Q14. |
| `semantic_analyzer` | Perception | Tier 2. Primitive graph + lightweight ROI CNN over zoom-in crops. Owns path-freshness scoring, endpoint scoring, branch choice at intersections, and concealment-POI scoring. |
| `vlm_client` | Perception (optional) | Local-IPC client to a NanoLLM/VILA1.5-3B process. Validates ROI payload size/format, calls the VLM with a bounded crop and short prompt, validates the response against a structured `VlmAssessment` schema. No cloud egress. Optional behind a `vlm_enabled` flag and a feature module (see §7.6 Local VLM Confirmation). |
| `scan_controller` | Decision + Memory | Central deterministic typed state machine — `ZoomedOut`, `ZoomedIn`, `TargetFollow`. Owns the POI queue, timeouts, ≤5 POIs/min cap, confidence-scaled operator-decision window, and gimbal-command issuance. Full behaviour-tree spec in `system-flows.md §F4`. |
| `mapobjects_store` | Decision + Memory | On-device H3-indexed map of detected objects + ignored-items list. Pre-flight pull of the mission-area map from the central `missions` API; in-flight on-device authoritative; post-flight push of the mission diff back to central. Computes new / moved / existing / removed diffs across passes (§7.10, §7.11, §7.12). Read/written directly by `scan_controller`; sync pulls/pushes are handled via `mission_client`. |
| `gimbal_controller` | Action | ViewPro A40 control protocol (yaw / pitch / zoom). Honours ≤2 s zoom transition budget and ≤500 ms decision-to-movement latency. Owns the smooth-pan path-tracking primitive used in zoom-in level. |
| `operator_bridge` | Action | Surfaces POIs and target-follow lifecycle events through `telemetry_stream` to the Ground Station; receives confirm / decline / target-follow start-release back. On decline, appends an `IgnoredItem` via `mapobjects_store`. On confirm, hands a middle-waypoint hint to `mission_executor`. |
| `mission_executor` | Action | Multirotor and fixed-wing variants of the platform state machine: takeoff / climb / cruise / land for multirotor; upload-and-await-AUTO for fixed-wing. Owns geofence enforcement (both INCLUSION and EXCLUSION). Issues MAVLink commands through `mavlink_layer`; consumes `mission_client` mission state. Inserts middle waypoints on operator-confirmed targets. |
| `mavlink_layer` | Action | Hand-rolled MAVLink v2 transport (UDP or serial) implementing only the ~10–15 commands this codebase needs. See §7.7 for the command surface. No third-party SDK. |
| `mission_client` | Action | Pulls mission JSON from the `missions` API on start; validates against `mission-schema`; handles mid-flight middle-waypoint inserts (POST). Survives transient connection loss with bounded retry. |
| `telemetry_stream` | Telemetry plane | Continuous push of camera frames + flight telemetry + bbox overlay to the Ground Station API over modem. Always-on; not detection-gated. Carries operator commands (confirm / decline / target-follow start-release) on the return path. |

The system is intentionally a small set of well-named components rather than 30+ files. Everything in `frame_ingest`, `detection_client`, `movement_detector`, `semantic_analyzer`, and `vlm_client` runs on the **input data plane** — no UAV control, no operator surface. Everything in `gimbal_controller`, `mission_executor`, `mavlink_layer`, `mission_client`, and `operator_bridge` runs on the **output control plane** — UAV motion + operator interaction. `scan_controller` and `mapobjects_store` are the **brain** in between. `telemetry_stream` is parallel; it never sits in the decision path.

Per-component design specs (purpose, inputs, outputs, state, failure modes, NFRs) live in `components/<name>/description.md`.

---

## 4. Major Data Flows

1. **Frame pipeline**. ViewPro A40 RTSP → `frame_ingest` → `detection_client` (bi-dir gRPC to `../detections`) → bboxes back → `movement_detector` (active at both zoom-out and zoom-in; residual-motion clusters) → `scan_controller` POI queue. The same bboxes also flow into `telemetry_stream` for operator overlay. (`system-flows.md §F1`)
2. **Zoom-in + confirmation**. `scan_controller` pops a POI → `gimbal_controller` zooms ViewPro A40 → `semantic_analyzer` runs Tier 2 over the ROI → optionally `vlm_client` runs Tier 3 → `scan_controller` decides. Movement candidates emerging during the zoom-in hold are still consumed (subject to telemetry-skew tolerance and the per-zoom-band thresholds). (`system-flows.md §F2`, `§F3`)
3. **Operator round trip**. `telemetry_stream` pushes camera + telemetry + bbox overlay → Ground Station → operator browser → confirm / decline / target-follow start-release → modem → `operator_bridge` → `mapobjects_store` (decline) or `mission_executor` (confirm) or `scan_controller` (target-follow). Always-on, not detection-gated. Operator commands are authenticated, signed, and replay-protected (§5; scheme TBD per Q9). (`system-flows.md §F5`)
4. **Mission lifecycle**. `mission_client` pulls from `missions` API → `mission_executor` issues MAVLink waypoints via `mavlink_layer` → `gimbal_controller` runs the zoom-out sweep along the route. On operator confirmation, `mission_executor` inserts a middle waypoint and resumes after target-follow ends. (`system-flows.md §F6`)
5. **MapObjects + ignored items**. New detections compute an H3 cell, query the k-ring of neighbours, classify as new / moved / existing / removed (§7.12), and check for an `IgnoredItem` match before surfacing to the operator. (`system-flows.md §F7`)
6. **MapObjects sync** (mission-bracketing). Pre-flight: `mission_client` pulls the last-known map state for the mission area from the `missions` API and hydrates `mapobjects_store`. Post-flight: `mission_client` pushes the mission's full pass diff (NEW / MOVED / REMOVED / CONFIRMED-EXISTING) back. In-flight sync is **batched only** for MVP — no streaming over modem (§7.13; `system-flows.md §F8`).
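
The new / moved / existing / removed classification in flow 5 can be illustrated with a minimal sketch. It is not the §7.12 algorithm: the H3 cell and k-ring lookup is swapped for a brute-force nearest-neighbour search over local metric coordinates so the example stays free of external crates. The struct names, the `same_place_m` and `search_radius_m` thresholds, and the same-class matching rule are all assumptions made for illustration.

```rust
/// Object already in the store (hypothetical shape; local east/north metres).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct MapObject {
    pub id: u64,
    pub class_id: u16,
    pub east_m: f64,
    pub north_m: f64,
}

/// Fresh detection from the current pass (hypothetical shape).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Detection {
    pub class_id: u16,
    pub east_m: f64,
    pub north_m: f64,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DiffKind {
    New,
    Existing { id: u64 },
    Moved { id: u64 },
}

/// Classifies one detection against known objects of the same class.
/// Brute-force distance stands in for the H3 k-ring neighbour query.
pub fn classify(
    known: &[MapObject],
    det: &Detection,
    same_place_m: f64,    // assumed threshold: closer than this means "existing"
    search_radius_m: f64, // assumed threshold: closer than this means "moved"
) -> DiffKind {
    let nearest = known
        .iter()
        .filter(|o| o.class_id == det.class_id)
        .map(|o| (o.id, (o.east_m - det.east_m).hypot(o.north_m - det.north_m)))
        .min_by(|a, b| a.1.total_cmp(&b.1));
    match nearest {
        Some((id, d)) if d <= same_place_m => DiffKind::Existing { id },
        Some((id, d)) if d <= search_radius_m => DiffKind::Moved { id },
        _ => DiffKind::New,
    }
}

/// Known objects that no detection matched during the pass count as removed.
pub fn removed(known: &[MapObject], matched_ids: &[u64]) -> Vec<u64> {
    known
        .iter()
        .map(|o| o.id)
        .filter(|id| !matched_ids.contains(id))
        .collect()
}
```

In the real store the candidate set would come from the detection's H3 cell and its neighbours rather than a full scan, and the `IgnoredItem` check would run before anything is surfaced to the operator.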

---

## 5. Architectural Principles / Non-Negotiables

- **Detection-as-a-service.** Primitive (Tier 1) detection lives in `../detections`, not in autopilot. Autopilot owns Tier 2 (semantic) and Tier 3 (VLM, optional) only.
- **Hand-rolled MAVLink.** No third-party SDK. The MAVLink command surface is small enough to hand-implement, and doing so eliminates the largest current dependency-risk item.
- **Deterministic typed state machine** for scan control. States are `ZoomedOut | ZoomedIn { roi, hold_started_at } | TargetFollow { target_id, started_at }`. No ad-hoc booleans, no shared mutable flags. The full behaviour-tree spec lives in `system-flows.md §F4`.
- **Ego-motion compensation is mandatory** for movement detection. Naive frame-differencing is rejected outright. Movement detection runs at **both** zoom-out and zoom-in (skipped only during target-follow); zoom-in adequacy of classical CV is benchmark-gated (§7.6, Q14).
- **Operator workload cap of ≤5 POIs/minute** is hard, not soft. `scan_controller` enforces it.
- **Operator timeout scales with confidence.** The decision window grows linearly from 30 s at 40 % confidence to 120 s at 100 %; below 40 % the target is not surfaced. On timeout the POI is forgotten; on decline an `IgnoredItem` entry is recorded.
- **Operator commands are authenticated, signed, and replay-protected.** Modem-link encryption alone is not sufficient — every confirm / decline / target-follow / abort command MUST carry a session-bound, replay-resistant signature that `operator_bridge` validates before dispatch. Exact scheme TBD (§8 Q9).
- **Local VLM with structured `VlmAssessment` schema.** Free-form VLM text is not a downstream API. No cloud egress.
- **Always-on camera + telemetry stream** to Ground Station is part of the mission contract — operator always sees the live feed, not just on detection.
- **Lost-link failsafe is explicit.** Loss of the operator/Ground-Station modem link triggers a typed failsafe ladder in `mission_executor` (§7.7). The ladder is deterministic; default action is RTL after a configured grace window.
- **Pre-flight self-test (BIT) gates takeoff.** Every dependency in the health-endpoint list below (§5), plus mission load and the MapObjects pre-flight pull (cached fallback acknowledged), must pass before `mission_executor` enters `ARMED` (multirotor) or `WAIT_AUTO` (fixed-wing). The health endpoint distinguishes pre-flight from in-flight readiness.
- **`autopilot` and `missions` are separate repos** with a shared `mission-schema` artefact. The same `missions` API also hosts the central MapObjects endpoints (§7.13).
- **MapObjects are mission-bracketed and centrally synchronised.** Pre-flight pull on start; on-device authoritative in-flight; full pass diff pushed at mission end. The on-device store is a working copy of the central state for the mission's bounding box, not a private database.
- **No silent error swallowing** anywhere in the pipeline. Health endpoint reflects every dependency: `frame_ingest`, `detection_client`, `movement_detector`, `semantic_analyzer`, `vlm_client` (if enabled), `scan_controller`, `gimbal_controller`, `mavlink_layer`, `mission_client`, `mission_executor`, `operator_bridge`, `telemetry_stream`, `mapobjects_store`, plus `mapobjects_sync` (pre-flight pull / post-flight push status).
- **Geofence enforcement is symmetric.** Both INCLUSION and EXCLUSION polygons are honoured. (Earlier C++ behaviour silently ignored EXCLUSION; the rewrite explicitly enforces both.)
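
The confidence-scaled operator timeout from the list above reduces to a linear interpolation between two fixed points. The following is a minimal sketch; the function and constant names are illustrative, and clamping confidence values above 100 % to the 120 s ceiling is an assumption the text does not state.

```rust
use std::time::Duration;

/// Confidence floor: below this a target is not surfaced to the operator.
const MIN_CONFIDENCE: f64 = 0.40;
/// Decision window at the confidence floor (40 % -> 30 s).
const MIN_WINDOW_S: f64 = 30.0;
/// Decision window at full confidence (100 % -> 120 s).
const MAX_WINDOW_S: f64 = 120.0;

/// Returns the operator decision window for a confidence in 0.0..=1.0,
/// or `None` when the target must not be surfaced at all.
pub fn decision_window(confidence: f64) -> Option<Duration> {
    if confidence.is_nan() || confidence < MIN_CONFIDENCE {
        return None;
    }
    // Assumption: values above 1.0 are clamped rather than rejected.
    let c = confidence.min(1.0);
    // Fraction of the way from the floor (0.0) to full confidence (1.0).
    let t = (c - MIN_CONFIDENCE) / (1.0 - MIN_CONFIDENCE);
    let secs = MIN_WINDOW_S + t * (MAX_WINDOW_S - MIN_WINDOW_S);
    Some(Duration::from_secs_f64(secs))
}
```

Under this interpolation a 70 % confidence POI sits halfway between the two endpoints and gets a window of about 75 s.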

---

## 6. Non-Functional Targets

| Concern | Target | Owner |
|---|---|---|
| Tier 1 latency | ≤100 ms / frame (end-to-end at 1280 px, FP16, batch 1) | `../detections` (autopilot's call budget respects it) |
| Tier 2 latency | ≤200 ms / ROI | `semantic_analyzer` |
| Tier 3 (VLM) latency | ≤5 s / ROI | `vlm_client` |
| ViewPro A40 zoom transition | ≤2 s (medium → high) | `gimbal_controller` |
| Decision-to-movement latency | ≤500 ms | `gimbal_controller` |
| POI rate to operator | ≤5 POIs / min (hard cap) | `scan_controller` |
| Concealed-position recall | ≥60 % | `semantic_analyzer` |
| Concealed-position precision | ≥20 % (operators filter) | `semantic_analyzer` |
| Precision / recall for new classes | ≥80 % each | `../detections` |
| Footpath detection recall | ≥70 % | `semantic_analyzer` |
| Movement-candidate enqueue latency | ≤1 s from detection (zoom-out); ≤1.5 s (zoom-in, accommodating gimbal slew) | `movement_detector` |
| Zoom-out → zoom-in transition | ≤2 s including physical zoom | `scan_controller` + `gimbal_controller` |
| Telemetry rate (position) | 1 Hz min, 10 Hz target | `mavlink_layer` |
| Memory budget (semantic + movement + VLM) | ≤6 GB on Jetson Orin Nano (8 GB total, ~2 GB reserved for YOLO) | system-wide |
| Watchdog / retry on MAVLink failures | bounded retry with exponential backoff; explicit max-retry; health flips to red | `mission_executor` |
| Operator command → action latency | ≤500 ms operator-click → outbound MAVLink / gimbal command (excludes modem RTT) | `operator_bridge` + downstream |
| Sustained frame-rate floor | ≥10 fps; below this `scan_controller` suppresses zoom-in transitions and surfaces health → yellow | `frame_ingest` + `scan_controller` |
| MapObjects pre-flight pull | ≤30 s for a 30 km × 30 km mission area; cache-fallback acceptable on timeout | `mission_client` + `mapobjects_store` |
| MapObjects post-flight push | ≤2 min for a 60 min mission's pass diff; bounded retry; persisted on disk if push fails | `mission_client` + `mapobjects_store` |
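
The "bounded retry with exponential backoff; explicit max-retry" target in the table above could look roughly like the sketch below. `RetryPolicy`, its fields, and `retry_with_backoff` are hypothetical names, and the doubling factor and delay cap are assumptions; the table only fixes that retries are bounded and that exhaustion turns health red.

```rust
use std::time::Duration;

/// Hypothetical retry policy; field names and values are illustrative.
#[derive(Debug, Clone, Copy)]
pub struct RetryPolicy {
    pub base: Duration,
    pub max_delay: Duration,
    pub max_retries: u32,
}

impl RetryPolicy {
    /// Delay before retry number `attempt` (0-based), or `None` once the
    /// retry budget is spent.
    pub fn delay_for(&self, attempt: u32) -> Option<Duration> {
        if attempt >= self.max_retries {
            return None;
        }
        // Double the base delay per attempt; saturate instead of overflowing.
        let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
        Some(self.base.saturating_mul(factor).min(self.max_delay))
    }
}

/// Runs `op` until it succeeds or the retry budget is exhausted.
/// `sleep` is injected so the caller decides how to wait (blocking or timer-based).
pub fn retry_with_backoff<T, E>(
    policy: &RetryPolicy,
    mut op: impl FnMut() -> Result<T, E>,
    mut sleep: impl FnMut(Duration),
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(err) => match policy.delay_for(attempt) {
                Some(delay) => {
                    sleep(delay);
                    attempt += 1;
                }
                // Budget exhausted: the caller is expected to flip health to red.
                None => return Err(err),
            },
        }
    }
}
```

With a 100 ms base, a 1 s cap, and five retries, the delays are 100, 200, 400, 800, and 1000 ms, after which the last error is returned to the caller.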

---

## 7. Detailed Design

This section covers the rewrite-time problem narrative, suite-level concerns (mission regions, MapObjects, MGRS sync, new-vs-existing object detection), constraints, acceptance criteria, the chosen solution architecture, the MAVLink command surface, and the tech stack.

### 7.1 Problem

The reconnaissance winged UAV detects vehicles and military equipment with YOLO, but current high-value targets are camouflaged positions: FPV operator hideouts, hidden artillery emplacements, and dugouts masked by branches. These cannot be found by visual similarity to known object classes alone.

The new approach has three cooperating search engines:

- **Camera sweep** — follow the UAV route at wide or light/medium zoom with left-right gimbal movement to cover terrain and queue POIs.
- **Movement detection** — runs in **both** zoom-out and zoom-in levels (skipped only during target-follow). Per-zoom-band thresholds keep false-positive rate below the operator-review cap; classical OpenCV adequacy at zoom-in is benchmark-gated (Q14).
- **Semantic zoom search** — detect primitives such as black entrances, branch piles, footpaths, roads, trees, and tree blocks, then reason over scene context to find concealed positions.

The system controls a two-level scan:

- **Zoom-out level (wide-area sweep)** — the camera follows the UAV route at wide or light/medium zoom, sweeping left-right across the flight path while detecting primitives, buildings, vehicles, and small motion candidates. Footpath starts, suspicious branch piles, tree rows, movement candidates, and similar POIs are marked with coordinates from the GPS-denied service and queued.
- **Zoom-in level (detailed scan)** — the camera zooms into each queued POI or movement candidate for confirmation. It follows detected footpaths from origin to endpoint, keeps paths centered while the UAV moves, follows the freshest or most promising branch at intersections, holds on endpoints for VLM analysis of branch piles, dark entrances, dugouts, vehicles, or people, and slowly pans broader POIs such as tree rows or clearings. Movement detection continues, scaled for the higher pixel-to-metre ratio. After analysis or timeout, it returns to zoom-out and continues the queue or route.

When an operator confirms a target, the system switches to **target-follow mode**: keep the target centered with gimbal control while the UAV moves, until the operator releases it or tracking is lost.
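
The two scan levels and target-follow mode map onto the typed states named in §5. The sketch below shows one way to express them as a Rust enum with a pure transition function. The state names and their fields come from this document; the `Roi` layout, the `ScanEvent` variants, and the specific transition rules (for example, that confirmation is accepted from either scanning state and that target-follow always returns to zoom-out) are assumptions. The authoritative behaviour lives in `system-flows.md §F4`.

```rust
use std::time::{Duration, Instant};

/// Region of interest in normalised image coordinates (hypothetical layout).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Roi {
    pub cx: f32,
    pub cy: f32,
    pub w: f32,
    pub h: f32,
}

/// The three scan states named in this document.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ScanState {
    ZoomedOut,
    ZoomedIn { roi: Roi, hold_started_at: Instant },
    TargetFollow { target_id: u64, started_at: Instant },
}

/// Events that drive transitions (illustrative names, not from the spec).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ScanEvent {
    PoiDequeued { roi: Roi },
    AnalysisDone,
    OperatorConfirmed { target_id: u64 },
    OperatorReleased,
    TrackingLost,
    Tick,
}

/// Pure transition function: the same (state, event, now) always yields the
/// same next state, which keeps the machine deterministic and easy to test.
pub fn step(state: ScanState, event: ScanEvent, now: Instant, hold_timeout: Duration) -> ScanState {
    use ScanEvent::*;
    use ScanState::*;
    match (state, event) {
        // Zoom in on the next queued POI.
        (ZoomedOut, PoiDequeued { roi }) => ZoomedIn { roi, hold_started_at: now },
        // Return to the sweep after analysis or when the hold times out.
        (ZoomedIn { .. }, AnalysisDone) => ZoomedOut,
        (ZoomedIn { hold_started_at, .. }, Tick)
            if now.duration_since(hold_started_at) >= hold_timeout => ZoomedOut,
        // Assumption: confirmation is accepted from either scanning state.
        (ZoomedOut | ZoomedIn { .. }, OperatorConfirmed { target_id }) => {
            TargetFollow { target_id, started_at: now }
        }
        // Target-follow ends on operator release or tracking loss.
        (TargetFollow { .. }, OperatorReleased | TrackingLost) => ZoomedOut,
        // Every other (state, event) pair leaves the state unchanged.
        (s, _) => s,
    }
}
```

Because each state carries its own data, a zoom-in hold without an ROI or a follow without a target id cannot be represented, which is the point of avoiding ad-hoc booleans.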

### 7.2 Mission Regions and Reconnaissance Flow

Mission directions can be vague. Waypoints define a route that passes through multiple regions:

```text
Start → Point1 → Point2 → Point3 → Point4 → Point5 → Point6 → Finish
                    ╔═══════════════╗
                    ║   Region 1    ║
                    ╚═══════════════╝
         ╔══════════════════╗
         ║    Region 2      ║
         ╚══════════════════╝
  ╔══════════════╗
  ║   Region 3   ║
  ╚══════════════╝
```

The autopilot decides the route within each region (1, 2, and 3).

**Alternative scenario — region-only search.** The user selects only a region for the search (no explicit waypoints inside). The autopilot plans its own route within the region.

```text
Start ──┐
        │    ╔═══════════════╗
        ├───►║    Region     ║  (contains Points)
        │    ╚═══════════════╝
Finish◄─┘
```

**Reconnaissance flow.**

1. The reconnaissance UAV searches within the region and finds potential targets.
2. It sends images to the relay (retranslation) UAV.
3. The relay UAV forwards them to the human operator.
4. The human operator decides on the target, and the behaviour-tree-driven `scan_controller` logic handles that decision (`system-flows.md §F4`).

**Scanning strategy.**

- **Zoom-out level — wide-area scan.** Camera points along the UAV route with left-right swing. The detections service continuously recognises specific patterns as POIs. This initial scan runs at medium zoom while moving between targets. POI types: tree rows (potential caponiers, entrances concealed by tree rows); polygons (areas where military vehicles could be hidden); houses with vehicles or traces; roads and routes on snow or terrain, inside the forest, or near houses.
- **Zoom-in level — detailed scan.** When the camera finds a POI or movement candidate, it zooms in and performs a detailed scan. During detailed scan it searches for trees, caponiers, military vehicles, and so on. Movement detection continues during the zoom-in hold (subject to the per-zoom-band thresholds) so a moving small target found mid-detail-scan is not lost.

### 7.3 Restrictions

**Hardware and camera.**

- Jetson Orin Nano Super: 67 TOPS INT8, 8 GB shared LPDDR5; YOLO uses ~2 GB RAM, leaving ~6 GB for semantic detection, movement detection, and VLM.
- All models use FP16 precision (frozen choice: keep FP16-only for all models).
- Primary camera: ViewPro A40, 1080p (1920×1080), 40× optical zoom, f=4.25–170 mm, Sony 1/2.8" CMOS (IMX462LQR), HDMI or IP output at 1080p 30/60 fps.
- Alternative camera: ViewPro Z40K at higher cost.
- Thermal sensor (640×512, NETD ≤50 mK) is available only as a future enhancement, not a core requirement.

**Operational.**

- Flight altitude: 600–1000 m.
- Support all seasons and terrain types: winter snow, spring mud, summer vegetation, autumn; forest, open field, urban edges, and mixed terrain. (Frozen choice: MVP must cover **all** seasons, not winter-first only.)
- ViewPro A40 40× optical zoom traversal takes 1–2 s; zoom-out → zoom-in transition must complete within ≤2 s including physical zoom.
- Movement detection runs at **both** zoom-out and zoom-in levels, compensates for UAV/gimbal motion, and queues candidates for zoom confirmation; target following starts only after operator confirmation. Per-zoom-band thresholds (cluster persistence, residual-velocity floor, telemetry-skew tolerance) are configurable.

**Software.**

- Inference: TensorRT on Jetson, ONNX Runtime fallback, 1280 px model input, tile splitting for large images.
- VLM must run locally on Jetson with no cloud dependency, as a separate IPC process — not compiled into the autopilot binary.
- YOLO and VLM inference run sequentially because they share GPU memory; no concurrent execution.

**Reliability and safety.**

- **Lost-link failsafe is mandatory.** Loss of the operator/Ground-Station modem link triggers a deterministic ladder in `mission_executor` (default RTL after a 30 s grace; configurable per mission). Loss of the airframe MAVLink link itself triggers immediate health → red and degrades to whatever ArduPilot/PX4's own failsafe dictates.
- **Pre-flight self-test (BIT) gates takeoff.** GPS lock, camera RTSP healthy, gimbal homed (yaw/pitch/zoom feedback within tolerance), `../detections` reachable + warmed, mission loaded + validated, MapObjects pre-flight pull complete (or cached fallback acknowledged with operator confirm), VLM warm (if `vlm_enabled`), persistent-store space ≥ configured floor.
- **Battery / fuel thresholds enforced.** `mission_executor` triggers RTL at battery ≤ configured RTL-floor (e.g. 25 %); land-now at hard-floor (e.g. 15 %); ignored only on operator override. Surfaces health → yellow / red accordingly. Threshold values are mission-configurable.
- **Sustained frame-rate floor.** When the sustained frame rate drops below 10 fps, `scan_controller` suppresses zoom-in transitions (only Tier 1 + operator overlay continue) and surfaces health → yellow.
- **Wall-clock time source.** Monotonic clock is authoritative for telemetry-skew compensation and tick budgets. Wall-clock is bound to GPS time once GPS is locked (preferred) or NTP-set at boot if reachable; both are recorded with `clock_source` and `last_sync_at`. Drift > 200 ms surfaces health → yellow.
- **On-device storage is bounded.** `mapobjects_store` retention and the log buffer have configured caps; when a cap is hit, the oldest data from before the current mission is evicted. A full persistent store at pre-flight is a BIT failure.
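
The lost-link and battery rules above can be combined into a single deterministic evaluation. The sketch below is a simplification under stated assumptions: the full failsafe ladder is specified in §7.7 and is collapsed here to one grace window, the type and field names are hypothetical, and resolving conflicts by taking the most severe action is an assumption rather than something the text fixes.

```rust
use std::time::Duration;

/// Ordered by severity so that `max` picks the stronger action.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum FailsafeAction {
    Continue,
    ReturnToLaunch,
    LandNow,
}

/// Mission-configurable thresholds (hypothetical struct).
#[derive(Debug, Clone, Copy)]
pub struct FailsafeConfig {
    pub link_grace: Duration,       // e.g. 30 s
    pub battery_rtl_floor_pct: u8,  // e.g. 25
    pub battery_hard_floor_pct: u8, // e.g. 15
}

/// Snapshot of the inputs the evaluation needs (hypothetical struct).
#[derive(Debug, Clone, Copy)]
pub struct FailsafeInputs {
    /// `None` while the operator link is healthy.
    pub link_lost_for: Option<Duration>,
    pub battery_pct: u8,
    pub operator_override: bool,
}

/// Pure function of config and inputs: the same snapshot gives the same action.
pub fn evaluate(cfg: &FailsafeConfig, inp: &FailsafeInputs) -> FailsafeAction {
    // Battery thresholds are ignored only on operator override.
    let battery = if inp.operator_override {
        FailsafeAction::Continue
    } else if inp.battery_pct <= cfg.battery_hard_floor_pct {
        FailsafeAction::LandNow
    } else if inp.battery_pct <= cfg.battery_rtl_floor_pct {
        FailsafeAction::ReturnToLaunch
    } else {
        FailsafeAction::Continue
    };
    // Link loss triggers RTL once the grace window has elapsed.
    let link = match inp.link_lost_for {
        Some(lost) if lost >= cfg.link_grace => FailsafeAction::ReturnToLaunch,
        _ => FailsafeAction::Continue,
    };
    // Assumption: when both rules fire, the more severe action wins.
    battery.max(link)
}
```

Keeping the evaluation free of side effects means `mission_executor` can run it on every tick and log the inputs alongside the chosen action.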

**Integration and scope.**

- The `../detections` service is FastAPI + Cython + TensorRT in a Docker container on Jetson; consumed via bi-directional gRPC.
- Consume YOLO boxes with class, confidence, and normalised coordinates; output boxes in the same format for operator display.
- Movement candidates and confirmed followed targets use the same normalised box format for operator display.
- GPS coordinates come from the GPS-denied service (`../_docs/11_gps_denied.md`) and are out of scope for autopilot's own implementation.
- **MapObjects sync** uses the central `missions` API extension `/missions/{id}/mapobjects` (pre-flight GET, post-flight POST). Schema in §7.13.
- Annotation tooling, training pipeline, and data-collection automation are separate repositories and out of scope.
- GPS-denied navigation is a separate project; mission planning and route selection inside a region remain in autopilot.

**Frozen choices (2026-05-06, updated 2026-05-18).** Gating decisions for downstream design:

1. **Tier 1 remains FP16-only** for all models. INT8 is rejected for MVP.
2. **MVP acceptance requires all seasons**, not winter-first only.
3. **Operator-review cap is ≤5 POIs/minute** (moderate cap chosen).
4. **Movement detection assumes timestamped video, gimbal angle/zoom, and UAV motion telemetry** for MVP. Naive frame-differencing is rejected. Movement detection runs at both zoom-out and zoom-in; classical OpenCV adequacy at zoom-in is benchmark-gated (Q14).
5. **Local VLM is required for MVP** if and only if the exact model satisfies ≤5 s/ROI and the memory budget; otherwise VLM is disabled for MVP and `scan_controller` operates without it.
6. **MapObjects are mission-bracketed and centrally synchronised** via the `missions` API. In-flight sync is **batched only** for MVP (no streaming over modem).
7. **Operator commands are authenticated, signed, and replay-protected.** Modem-link encryption alone is not sufficient.
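
Frozen choice 3 (≤5 POIs/minute) is a hard cap, which suggests a sliding-window limiter rather than a per-minute counter that resets on the boundary. A minimal sketch follows; `PoiRateCap` and `try_surface` are hypothetical names, and the sliding-window interpretation of "per minute" is an assumption.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Sliding-window limiter: at most `max_per_window` admissions in any
/// interval of length `window`.
pub struct PoiRateCap {
    window: Duration,
    max_per_window: usize,
    surfaced_at: VecDeque<Instant>,
}

impl PoiRateCap {
    pub fn new(max_per_window: usize, window: Duration) -> Self {
        Self { window, max_per_window, surfaced_at: VecDeque::new() }
    }

    /// Returns `true` and records the admission if a POI may be surfaced at
    /// `now`; returns `false` without side effects when the cap is reached.
    pub fn try_surface(&mut self, now: Instant) -> bool {
        // Drop admissions that have aged out of the window.
        while let Some(&oldest) = self.surfaced_at.front() {
            if now.duration_since(oldest) >= self.window {
                self.surfaced_at.pop_front();
            } else {
                break;
            }
        }
        if self.surfaced_at.len() < self.max_per_window {
            self.surfaced_at.push_back(now);
            true
        } else {
            false
        }
    }
}
```

Constructed as `PoiRateCap::new(5, Duration::from_secs(60))`, the limiter rejects a sixth POI until the oldest admission is at least 60 s old, so no 60 s interval ever contains more than five surfaced POIs.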

### 7.4 Acceptance Criteria

**Latency.**

| Tier | Target | Hardware |
|---|---|---|
| Tier 1 fast probe (YOLO26 + YOLOE-26) | ≤100 ms/frame | Jetson Orin Nano Super |
| Tier 2 fast confirmation (custom CNN) | ≤200 ms/ROI | Jetson Orin Nano Super |
| Tier 3 optional deep analysis (VLM) | ≤5 s/ROI | Jetson Orin Nano Super |

**YOLO object detection.**

- Add classes: black entrances of various sizes, branch piles, footpaths, roads, trees, and tree blocks.
- New classes target: P ≥80 %, R ≥80 %; existing class performance must not degrade.
- Baseline reference: current YOLO achieves P=81.6 %, R=85.2 % on non-masked objects.

**Semantic detection.**

- Initial concealed-position recall: ≥60 %, accepting high false positives for later reduction.
- Initial concealed-position precision: ≥20 %, with operators filtering candidates.
- Footpath detection recall: ≥70 %.
- Pipeline consumes YOLO primitives (footpaths, roads, branch piles, entrances, trees), assesses path freshness, traces paths to endpoints, identifies concealed structures, and follows the freshest or most promising branch at intersections.

**Movement detection.**

- During the zoom-out sweep, detect small moving point/cluster candidates that are not yet classifiable and enqueue them for zoom confirmation within 1 s.
- During the zoom-in hold, continue movement detection (independent residual-motion clustering, scaled for the zoomed pixel-to-metre ratio) so a moving small target appearing inside a held POI is not lost; enqueue within 1.5 s.
- Account for UAV and gimbal motion: stable objects (trees, houses, roads, terrain) must not be treated as moving only because the camera platform moves.
- Movement candidates become zoom-in POIs; after zoom, the system attempts semantic / YOLO confirmation as a vehicle, person, or other relevant target.
- Zoom-in adequacy of classical OpenCV optical-flow / global-motion estimation is benchmark-gated. If the false-positive rate at zoom-in exceeds the per-zoom-band budget, fall back to a learned optical-flow / CNN-based motion module behind a feature flag (Q14).

**Scan and camera control.**

- Zoom-out level covers the planned route with a wide or light/medium-zoom left-right sweep; POIs include footpaths, tree rows, branch piles, black entrances, movement candidates, houses with vehicles or traces, and roads on snow / terrain / forest.
- Transition zoom-out → zoom-in within 2 s of POI detection, including physical zoom from medium to high.
- Zoom-in level keeps camera lock while the UAV flies, compensates for aircraft motion, pans along footpaths or movement candidates so they stay visible and centered, holds endpoints for VLM analysis up to 2 s, and returns to zoom-out after analysis or configurable timeout (default 5 s/POI).
- After operator confirmation, target-follow mode keeps the target in the centre 25 % of frame while visible, until operator release, target loss, or timeout.
- Gimbal module commands ViewPro A40 pan/tilt/zoom with ≤500 ms decision-to-movement latency, smooth transitions, and footpaths/moving targets kept centered during pan.
- Maintain an ordered POI queue prioritised by confidence and proximity to current camera position.
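
The POI queue ordering in the last criterion depends on the current camera position, so scores cannot be fixed at insertion time. The sketch below recomputes them when the next POI is taken. The `Poi` fields, the use of gimbal pan/tilt angles as the proximity measure, and the `PROXIMITY_WEIGHT_PER_DEG` weighting are assumptions; aging, which the solution diagram also mentions, is left out for brevity.

```rust
/// Queued point of interest; pan/tilt are gimbal angles in degrees
/// (assumed representation of "camera position").
#[derive(Debug, Clone, PartialEq)]
pub struct Poi {
    pub id: u64,
    pub confidence: f32, // 0.0..=1.0
    pub pan_deg: f32,
    pub tilt_deg: f32,
}

/// Hypothetical weighting: each degree of gimbal slew costs this much score.
const PROXIMITY_WEIGHT_PER_DEG: f32 = 0.005;

/// Higher is better: confidence minus a penalty for angular distance.
fn score(poi: &Poi, cam_pan_deg: f32, cam_tilt_deg: f32) -> f32 {
    let d_pan = poi.pan_deg - cam_pan_deg;
    let d_tilt = poi.tilt_deg - cam_tilt_deg;
    let slew_deg = (d_pan * d_pan + d_tilt * d_tilt).sqrt();
    poi.confidence - PROXIMITY_WEIGHT_PER_DEG * slew_deg
}

#[derive(Default)]
pub struct PoiQueue {
    items: Vec<Poi>,
}

impl PoiQueue {
    pub fn push(&mut self, poi: Poi) {
        self.items.push(poi);
    }

    /// Removes and returns the best POI for the current camera pose.
    /// Scores are recomputed here because proximity changes as the gimbal moves.
    pub fn pop_best(&mut self, cam_pan_deg: f32, cam_tilt_deg: f32) -> Option<Poi> {
        let best = self
            .items
            .iter()
            .enumerate()
            .max_by(|(_, a), (_, b)| {
                score(a, cam_pan_deg, cam_tilt_deg)
                    .total_cmp(&score(b, cam_pan_deg, cam_tilt_deg))
            })
            .map(|(index, _)| index)?;
        Some(self.items.swap_remove(best))
    }
}
```

A linear scan per pop is a deliberate simplification: with the operator cap limiting throughput to a handful of POIs per minute, the queue is expected to stay small.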

**Resources and data.**

- Semantic module + movement module + VLM RAM: ≤6 GB on Jetson Orin Nano Super.
- Must coexist with the running YOLO pipeline without degrading YOLO performance.
- Training data: hundreds to thousands of annotated images/sequences across all seasons and terrain types.
- Dedicated annotation needed for black entrances, branch piles, footpaths, roads, trees, and tree blocks; available dataset assembly effort is 1.5 months at 5 hours/day.

### 7.5 Training Data

**Source.**

- Aerial imagery from reconnaissance winged UAVs at 600–1000 m altitude.
- ViewPro A40 camera, 1080p resolution, various zoom levels.
- Extracted from video frames and still images.
- Movement detection requires frame sequences, not still images only; include camera/gimbal telemetry where available to separate target motion from UAV motion.

**Target classes.**

- Footpaths / trails (linear features on snow, mud, forest floor).
- Fresh footpaths (distinct edges, undisturbed surroundings, recent track marks).
- Stale footpaths (partially covered by snow / vegetation, faded edges).
- Concealed structures: branch-pile hideouts, dugout entrances, squared / circular openings.
- Tree rows (potential concealment lines).
- Open clearings connected to paths (FPV launch points).
- Moving point/cluster candidates across the full zoom range (wide, light/medium, full zoom-in) — sequences must include both zoom-out and zoom-in examples to support per-zoom-band threshold tuning.

**YOLO primitive classes (new).**

- Black entrances to hideouts (various sizes).
- Piles of tree branches.
- Footpaths.
- Roads.
- Trees, tree blocks.

**Annotation format.**

- Managed by existing annotation tooling in a separate repository.
- Expected: bounding boxes and/or segmentation masks depending on model architecture.
- Footpaths may require polyline or segmentation annotation rather than bounding boxes.

**Seasonal coverage.**

- Winter: snow-covered terrain (footpaths as dark lines on white).
- Spring: mud season (footpaths as compressed/disturbed soil).
- Summer: full vegetation (paths through grass/undergrowth).
- Autumn: mixed leaf cover, partial snow.

**Volume.**

- Target: hundreds to thousands of annotated images/sequences.
- Available effort: 1.5 months at 5 hours/day.
- Potential for annotation-process automation.

### 7.6 Solution Architecture

A two-level onboard scan system (zoom-out wide sweep + zoom-in confirmation). The system delegates Tier 1 detection to the existing FastAPI / Cython / TensorRT YOLO service (`../detections`), adds a central scan/perception scheduler (`scan_controller`), compensates motion using synchronised video / gimbal / UAV telemetry (movement detection runs at both zoom levels), controls the ViewPro A40 through a deterministic state machine, and invokes a secured local VLM process only for bounded zoom-in confirmation.

Before implementation decomposition, the project must pass a **benchmark gate** on target hardware: Tier 1 latency, Tier 2 ROI latency, VLM latency / memory, A40 zoom timing, movement-replay false-positive rate, and all-season dataset readiness.

```text
Video frames + timestamped gimbal/zoom/UAV telemetry
        |
        v
Input validation + telemetry synchronisation
        |
        v
Central scan/perception scheduler (scan_controller)
        |
        +---> Existing FastAPI/Cython TensorRT service (../detections)
        |       YOLO26 + YOLOE-26 fixed-class FP16 engines
        |
        +---> Movement detector (active in ZoomedOut and ZoomedIn)
        |       OpenCV ego-motion compensation + residual clusters,
        |       per-zoom-band thresholds; learned-CV fallback Q14
        |
        +---> Tier 2 semantic analyzer
        |       primitive graph + lightweight ROI CNN (zoom-in only)
        |
        v
POI queue (confidence + proximity + aging + <=5 POIs/min cap)
        |
        +---> ViewPro A40 state-machine controller
        |
        +---> Secured local VLM IPC (optional, benchmark-gated)
                NanoLLM VILA1.5-3B, structured VlmAssessment output
```
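The `<=5 POIs/min cap` on the POI queue in the diagram above could be enforced with a sliding-window limiter. The following is a minimal sketch, not the committed design: the type name `ReviewRateCap` and the method `try_surface` are hypothetical, and only the cap value (5 per 60 s) comes from this document.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Sliding-window limiter for POIs surfaced to the operator (hypothetical type).
pub struct ReviewRateCap {
    max_per_window: usize,
    window: Duration,
    surfaced_at: VecDeque<Instant>,
}

impl ReviewRateCap {
    pub fn new(max_per_window: usize, window: Duration) -> Self {
        Self { max_per_window, window, surfaced_at: VecDeque::new() }
    }

    /// Returns true and records the event if a POI may be surfaced at `now`.
    pub fn try_surface(&mut self, now: Instant) -> bool {
        // Drop events that have aged out of the window.
        while let Some(&oldest) = self.surfaced_at.front() {
            if now.duration_since(oldest) >= self.window {
                self.surfaced_at.pop_front();
            } else {
                break;
            }
        }
        if self.surfaced_at.len() < self.max_per_window {
            self.surfaced_at.push_back(now);
            true
        } else {
            false
        }
    }
}
```

A POI that is refused here would stay in the queue and keep ageing; how ageing feeds back into ordering is left to the scheduler spec.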

#### Benchmark gate

The first implementation milestone is a proof suite, not product code. It validates:

- YOLO26 + YOLOE-26 FP16 TensorRT, fixed 1280 px, batch 1, end-to-end ≤100 ms/frame.
- Tier 2 primitive graph + lightweight CNN ≤200 ms/ROI.
- NanoLLM VILA1.5-3B local VLM ≤5 s/ROI and within remaining memory budget while the YOLO container is present.
- ViewPro A40 medium-to-high zoom transition (≤2 s budget, per the risk register in §7.14) and command-to-movement latency.
- Movement replay false-positive rate **measured independently** at zoom-out and zoom-in, under the ≤5 POIs/minute operator-review cap. If zoom-in exceeds the per-zoom-band cap with classical CV, the learned-CV fallback (Q14) becomes a benchmark-gate prerequisite for the zoom-in scope.
- All-season dataset readiness and hard-negative coverage.

#### Tier 1 primitive detector

Use custom-trained fixed-class YOLO26 and YOLOE-26 TensorRT FP16 engines, owned by `../detections`. Runtime open-vocabulary prompt mutation is **not** part of MVP; fixed project classes or pre-baked embeddings are required. Outputs remain normalised boxes for operator display, with optional masks or path geometry passed as POI metadata.

#### Tier 2 semantic analyzer

Use a primitive graph plus a lightweight ROI CNN to reason over paths, branch piles, dark entrances, roads, trees, tree blocks, clearings, vehicles, people, and endpoint context. This layer owns path freshness, endpoint scoring, branch choice at intersections, and concealment-POI scoring. Active in the zoom-in level only.

#### Movement detector

Active at **both** zoom-out and zoom-in (skipped only during target-follow). Use OpenCV optical flow / global-motion estimation fused with timestamped gimbal angle, zoom state, and UAV motion telemetry. Naive frame differencing is rejected because it cannot distinguish target motion from platform motion. A telemetry synchronisation contract specifies maximum tolerated frame ↔ gimbal ↔ zoom ↔ UAV timestamp skew before motion compensation; out-of-tolerance samples must be rejected or downgraded.

**Per-zoom-band tuning.** Cluster persistence threshold, residual-velocity floor, and telemetry-skew tolerance are configured per zoom band (zoom-out, zoom-in). The pixel-to-metre ratio differs by ~10× between bands, so identical residual pixel motion implies very different physical motion; thresholds must scale.
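As an illustration of per-zoom-band configuration and the skew contract, the sketch below keeps one threshold set per band and rejects samples whose four timestamps are spread too far apart. All type names, field names, and numeric values are assumptions for illustration; the document only fixes that the three tunables exist per band. The "downgrade" path is not modelled here.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ZoomBand { ZoomOut, ZoomIn }

/// Per-band tunables named in the text; units are assumed.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct BandThresholds {
    pub max_skew_ms: u64,
    pub residual_velocity_floor_px_s: f64,
    pub cluster_persistence_frames: u32,
}

pub struct MovementConfig {
    pub zoom_out: BandThresholds,
    pub zoom_in: BandThresholds,
}

impl MovementConfig {
    pub fn for_band(&self, band: ZoomBand) -> &BandThresholds {
        match band {
            ZoomBand::ZoomOut => &self.zoom_out,
            ZoomBand::ZoomIn => &self.zoom_in,
        }
    }
}

/// Timestamps of one fused sample, all on one shared clock (milliseconds).
pub struct SampleTimestamps {
    pub frame_ms: u64,
    pub gimbal_ms: u64,
    pub zoom_ms: u64,
    pub uav_ms: u64,
}

#[derive(Debug, PartialEq)]
pub enum SkewVerdict { Accept, Reject }

/// Accept the sample only if the spread between the oldest and newest
/// timestamp stays within the band's tolerance.
pub fn check_skew(ts: &SampleTimestamps, th: &BandThresholds) -> SkewVerdict {
    let all = [ts.frame_ms, ts.gimbal_ms, ts.zoom_ms, ts.uav_ms];
    let newest = all.iter().copied().max().unwrap_or(0);
    let oldest = all.iter().copied().min().unwrap_or(0);
    if newest - oldest <= th.max_skew_ms {
        SkewVerdict::Accept
    } else {
        SkewVerdict::Reject
    }
}
```

With a tighter tolerance at zoom-in, the same sample can pass at zoom-out and be rejected at zoom-in, which is the behaviour the per-band split is meant to allow.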

**Adequacy at zoom-in (research item, Q14).** Classical optical flow / global-motion estimation is well-validated at zoom-out (UAV cruising, gimbal sweeping, large FOV; ego-motion is the dominant signal and is easily fitted). At zoom-in the gimbal is actively path-following, the FOV is narrow, motion blur from any small command is large, and the homography model degrades. The benchmark gate (above) MUST measure the false-positive rate at zoom-in independently from zoom-out; if it exceeds the per-zoom-band cap, the implementation falls back to a learned optical-flow module (e.g. RAFT-derived) or a CNN-based motion-segmentation module behind a feature flag, while keeping the same input/output contract.

#### Scan controller and POI queue

Use a deterministic typed state machine with **`ZoomedOut`**, **`ZoomedIn { roi, hold_started_at }`**, and **`TargetFollow { target_id, started_at }`** states. The queue is ordered by confidence, proximity, and aging while enforcing the ≤5 POIs/minute operator-review cap. The controller handles timeouts, target loss, VLM waits, return-to-zoom-out, and target-follow centre-window behaviour. The full behaviour-tree spec — including tick scenarios and the 15 fixed-wing rules — lives in `system-flows.md §F4`.
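The three states can be written as a Rust enum with a pure transition function. This is a sketch under assumptions: the state names and fields follow the text above, while `Roi`, `ScanEvent`, its variants, and `step` are hypothetical. The authoritative tick rules remain in `system-flows.md §F4`.

```rust
use std::time::Instant;

/// Normalised ROI rectangle (hypothetical shape).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Roi { pub x: f32, pub y: f32, pub w: f32, pub h: f32 }

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ScanState {
    ZoomedOut,
    ZoomedIn { roi: Roi, hold_started_at: Instant },
    TargetFollow { target_id: u64, started_at: Instant },
}

/// Illustrative events only; not the full event set of the controller.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ScanEvent {
    PoiDequeued(Roi),
    OperatorConfirmed { target_id: u64 },
    TimedOut,
    TargetLost,
    OperatorReleased,
}

/// Pure transition function: same inputs always give the same next state.
pub fn step(state: ScanState, event: ScanEvent, now: Instant) -> ScanState {
    use ScanEvent::*;
    use ScanState::*;
    match (state, event) {
        (ZoomedOut, PoiDequeued(roi)) => ZoomedIn { roi, hold_started_at: now },
        (ZoomedIn { .. }, OperatorConfirmed { target_id }) => {
            TargetFollow { target_id, started_at: now }
        }
        (ZoomedIn { .. }, TimedOut) => ZoomedOut,
        (TargetFollow { .. }, TargetLost | OperatorReleased | TimedOut) => ZoomedOut,
        // Any event that does not apply to the current state is ignored.
        (s, _) => s,
    }
}
```

Keeping `step` free of I/O makes each transition unit-testable without a gimbal or a clock, since `now` is passed in.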

#### Local VLM confirmation

Run NanoLLM with VILA1.5-3B through a separate local IPC process **if** the benchmark gate passes. Use one bounded ROI crop, short prompt, short answer, and a validated `VlmAssessment` schema. Free-form VLM text is not a downstream API. The IPC channel uses Unix-domain socket permissions and peer-credential checks where available.

**Optionality model.** VLM is the only optional Tier in the system. Two complementary mechanisms model this:

1. **Runtime configuration flag (`vlm_enabled`)**, gated by the benchmark-gate result. When the flag is `false`, `scan_controller` skips the VLM-confirmation step and proceeds with Tier 2 evidence alone for the zoom-in hold; the operator timeout still applies.
2. **Build-time feature module.** The `vlm_client` component is a separate module behind a feature flag; the binary must build, link, and run identically when the module is absent. `scan_controller` MUST NOT contain a hard dependency on `vlm_client`'s presence — it depends only on a `VlmAssessment` provider trait whose default implementation returns `status: vlm_disabled`.

The implementation chooses one of these (or both); both must yield the same observable behaviour: the system functions correctly with VLM absent, only losing the zoom-in confirmation step.
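One way to express the provider trait is a default method that reports `vlm_disabled`, so a build without `vlm_client` still satisfies the interface. The sketch below is illustrative: the field set of `VlmAssessment`, the `RoiCrop` type, and the decision rule in `zoom_in_confirmed` are assumptions, not the validated schema.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum VlmStatus { Ok, VlmDisabled, Timeout, Error }

/// Minimal stand-in for the structured assessment (fields are assumed).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct VlmAssessment {
    pub status: VlmStatus,
    pub confidence: Option<f32>,
}

/// One bounded ROI crop (hypothetical representation).
pub struct RoiCrop<'a> { pub jpeg: &'a [u8] }

pub trait VlmAssessmentProvider {
    /// Default implementation: VLM absent or disabled.
    fn assess(&self, _roi: &RoiCrop<'_>) -> VlmAssessment {
        VlmAssessment { status: VlmStatus::VlmDisabled, confidence: None }
    }
}

/// Provider used when `vlm_client` is not compiled in or `vlm_enabled` is false.
pub struct NoVlm;
impl VlmAssessmentProvider for NoVlm {}

/// Hypothetical decision rule: with VLM disabled, Tier 2 evidence decides alone;
/// with VLM present, both must agree; any other status fails closed.
pub fn zoom_in_confirmed(
    provider: &dyn VlmAssessmentProvider,
    roi: &RoiCrop<'_>,
    tier2_score: f32,
    threshold: f32,
) -> bool {
    match provider.assess(roi) {
        VlmAssessment { status: VlmStatus::Ok, confidence: Some(c) } => {
            tier2_score >= threshold && c >= threshold
        }
        VlmAssessment { status: VlmStatus::VlmDisabled, .. } => tier2_score >= threshold,
        _ => false,
    }
}
```

Because `scan_controller` would only see `&dyn VlmAssessmentProvider`, the runtime flag and the build-time feature can both resolve to `NoVlm` and produce the same observable behaviour.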

#### Integration and reliability

Preserve the normalised-box contract while adding POI metadata. A central scheduler (`scan_controller`) owns GPU-heavy work and enforces no concurrent YOLO/VLM execution. Errors are never swallowed silently; health must reflect every dependency listed in §5.

#### Security and operational controls

- Validate image / ROI payload size and format before decoding or inference.
- Use patched OpenCV versions and an image-format allow-list.
- Enforce local IPC authorisation and payload limits for the VLM process (Unix-domain socket permissions plus peer-credential checks).
- Log POI creation reasons, source detections, queue decisions, gimbal commands, VLM requests, operator confirmations, and failure states.
- Keep VLM local with no cloud egress.

### 7.7 MAVLink and Piloting

`mavlink_layer` is a hand-rolled MAVLink v2 transport. There is no third-party SDK dependency. The layer owns serialisation / deserialisation, heartbeat, sequence numbers, retry, and a single connection abstraction (UDP or serial, picked at startup from CLI / env).
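To give a sense of how small the serialisation core is, the sketch below frames an unsigned MAVLink v2 packet: `0xFD` start byte, 10-byte header, payload with trailing zero bytes truncated, and the CRC-16/MCRF4XX checksum seeded with the per-message `CRC_EXTRA` byte. The function names are hypothetical and signing, payloads over 255 bytes, and parsing are out of scope.

```rust
/// One accumulate step of the MAVLink checksum (CRC-16/MCRF4XX, init 0xFFFF).
fn crc_accumulate(crc: u16, byte: u8) -> u16 {
    let mut tmp = byte ^ (crc as u8);
    tmp ^= tmp << 4;
    (crc >> 8) ^ ((tmp as u16) << 8) ^ ((tmp as u16) << 3) ^ ((tmp as u16) >> 4)
}

pub fn crc16_mcrf4xx(bytes: &[u8]) -> u16 {
    bytes.iter().fold(0xFFFF, |crc, &b| crc_accumulate(crc, b))
}

/// Encode an unsigned MAVLink v2 frame. `payload` must be at most 255 bytes.
pub fn encode_v2(
    seq: u8,
    sysid: u8,
    compid: u8,
    msgid: u32,
    crc_extra: u8,
    payload: &[u8],
) -> Vec<u8> {
    // MAVLink 2 drops trailing zero bytes but always keeps the first payload byte.
    let mut len = payload.len();
    while len > 1 && payload[len - 1] == 0 {
        len -= 1;
    }
    let mut frame = Vec::with_capacity(12 + len);
    frame.push(0xFD); // start-of-frame marker for v2
    frame.push(len as u8);
    frame.push(0); // incompat_flags (0 = unsigned)
    frame.push(0); // compat_flags
    frame.push(seq);
    frame.push(sysid);
    frame.push(compid);
    frame.extend_from_slice(&msgid.to_le_bytes()[..3]); // 24-bit message id
    frame.extend_from_slice(&payload[..len]);
    // Checksum covers everything after the start byte, then CRC_EXTRA.
    let crc = crc_accumulate(crc16_mcrf4xx(&frame[1..]), crc_extra);
    frame.extend_from_slice(&crc.to_le_bytes());
    frame
}
```

For example, a `HEARTBEAT` (message id 0, `CRC_EXTRA` 50) carries a 9-byte payload: `custom_mode` (u32), `type`, `autopilot`, `base_mode`, `system_status`, `mavlink_version`. Because `mavlink_version` is non-zero, nothing is truncated and the frame is 21 bytes long.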

**Command surface (~10–15 commands).** Only what the system actually needs:

| MAVLink message | Direction | Used by | Purpose |
|---|---|---|---|
| `HEARTBEAT` | bidirectional | `mavlink_layer` | liveness + GCS-vs-companion identification |
| `COMMAND_LONG` (subset) | out | `mission_executor` | arm / disarm, takeoff, set-mode, change-speed, change-alt, land, RTL |
| `COMMAND_ACK` | in | `mavlink_layer` | command-result demux, retry trigger |
| `MISSION_COUNT` | out | `mission_executor` | pre-upload count |
| `MISSION_REQUEST_INT` | in | `mission_executor` | pull-side mission upload |
| `MISSION_ITEM_INT` | out | `mission_executor` | per-waypoint upload |
| `MISSION_ACK` | in | `mission_executor` | upload completion |
| `MISSION_SET_CURRENT` | out | `mission_executor` | start at item 0 |
| `MISSION_CURRENT` | in | `mission_executor` | progress |
| `MISSION_ITEM_REACHED` | in | `mission_executor` | progress |
| `MISSION_CLEAR_ALL` | out | `mission_executor` | reset before re-upload (e.g., middle waypoint) |
| `GLOBAL_POSITION_INT` | in | `telemetry_stream`, `mission_executor` | live position |
| `ATTITUDE` | in | `telemetry_stream` | attitude for operator overlay |
| `SYS_STATUS` / `EXTENDED_SYS_STATE` | in | health aggregator | battery, sensor health, landed state (flight mode is read from `HEARTBEAT`) |
| `STATUSTEXT` | in | logger | flight-controller (ArduPilot / PX4) diagnostic lines |
| `SET_MODE` (or `COMMAND_LONG MAV_CMD_DO_SET_MODE`) | out | `mission_executor` | flight-mode transitions for fixed-wing |

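If the flight-controller link (ArduPilot / PX4) supports MAVLink-2 message signing, it is enabled; otherwise the link is treated as trusted (it is point-to-point on a closed serial / UDP path on the airframe). Whether signing is enabled on the target airframe remains open (Q6, §8).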

**Piloting variants.** `mission_executor` runs one of two state machines depending on the airframe declared at startup:

- **Multirotor variant**: `DISCONNECTED → CONNECTED → HEALTH_OK → ARMED → TAKE_OFF → MISSION_UPLOADED → FLY_MISSION → LAND`. The executor arms, takes off to a configured altitude, and only then uploads + starts the mission. Bounded retry with exponential backoff at every transition; explicit max-retry; on exceeding it, health flips to red and the executor surfaces the failure via the operator bridge.
- **Fixed-wing variant**: `DISCONNECTED → CONNECTED → HEALTH_OK → MISSION_UPLOADED → WAIT_AUTO → FLY_MISSION → LAND`. The executor skips arm/takeoff (the airframe is assumed already airborne under RC control), uploads the mission, and waits for the operator to switch the airframe into AUTO mode via RC. Same retry policy.

**Geofence enforcement.** `mission_executor` honours both INCLUSION and EXCLUSION polygons declared in the mission. INCLUSION violations halt forward progress and trigger return-to-launch (RTL); EXCLUSION violations trigger the same. The earlier C++ implementation parsed but silently ignored EXCLUSION; the new design rejects that behaviour explicitly.
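A geofence check of this kind can be sketched with even-odd ray casting. This is an illustrative choice, not a documented one: the code treats latitude/longitude as planar coordinates (reasonable only for small polygons away from the poles and the antimeridian), and it assumes a position violates the fence if it is outside any INCLUSION polygon or inside any EXCLUSION polygon. All type and function names are hypothetical.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct LatLon { pub lat: f64, pub lon: f64 }

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum FenceKind { Inclusion, Exclusion }

pub struct Fence {
    pub kind: FenceKind,
    pub vertices: Vec<LatLon>,
}

/// Even-odd ray casting: count polygon edges crossed by a ray cast from `p`
/// towards increasing longitude.
pub fn contains(poly: &[LatLon], p: LatLon) -> bool {
    let n = poly.len();
    if n == 0 {
        return false;
    }
    let mut inside = false;
    let mut j = n - 1;
    for i in 0..n {
        let (a, b) = (poly[i], poly[j]);
        // Edge straddles the ray's latitude?
        if (a.lat > p.lat) != (b.lat > p.lat) {
            let lon_at_ray = a.lon + (p.lat - a.lat) / (b.lat - a.lat) * (b.lon - a.lon);
            if p.lon < lon_at_ray {
                inside = !inside;
            }
        }
        j = i;
    }
    inside
}

/// True if the position breaks any fence; the caller would then trigger RTL.
pub fn violates(fences: &[Fence], p: LatLon) -> bool {
    fences.iter().any(|f| match f.kind {
        FenceKind::Inclusion => !contains(&f.vertices, p),
        FenceKind::Exclusion => contains(&f.vertices, p),
    })
}
```

Handling both kinds in one `match` makes it hard to parse a fence kind and then forget to enforce it, which is the failure mode described for the earlier implementation.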

**Mission uploads and middle-waypoint inserts.** When the operator confirms a target, `operator_bridge` hands a middle-waypoint hint to `mission_executor`. The executor recomputes the mission (current-position → middle-waypoint → resume original route), clears the existing autopilot mission via `MISSION_CLEAR_ALL`, re-uploads the new mission via the standard `MISSION_COUNT` / `MISSION_ITEM_INT` / `MISSION_ACK` sequence, and resumes flight. After target-follow ends (operator release, target loss, or timeout), the same sequence reverts to the original mission.

**Lost-link failsafe (operator/Ground-Station modem link).** A typed failsafe ladder runs in `mission_executor`, evaluated each tick:

| Stage | Trigger | Action |
|---|---|---|
| `LinkOk` | last operator heartbeat ≤ 5 s | continue mission; no behavioural change |
| `LinkDegraded` | 5 s < last heartbeat ≤ 30 s | continue mission; surface health → yellow; queue all POI surface-events for replay-on-recovery |
| `LinkLost` | last heartbeat > 30 s **and** target-follow inactive | trigger RTL via `MAV_CMD_NAV_RETURN_TO_LAUNCH`; log mission abort with reason; continue logging the mission diff for post-flight upload via `mapobjects_store` |
| `LinkLostInFollow` | last heartbeat > 30 s **and** in target-follow | hold target-follow for an additional 30 s grace (operator may have momentarily lost link); thereafter fall through to `LinkLost` |

The grace windows (5 s, 30 s, 30 s) are mission-configurable. **MAVLink-link loss to ArduPilot/PX4 itself** is not the same event — it triggers immediate health → red and falls through to whatever the airframe autopilot's own failsafe does (we do NOT override it).
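The ladder in the table above maps directly onto a small classification function. In this sketch the stage names and default windows come from the table; `LinkWindows` and `classify` are hypothetical names, and the follow grace is modelled as an extra window added on top of the 30 s loss threshold.

```rust
use std::time::Duration;

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum LinkStage { LinkOk, LinkDegraded, LinkLostInFollow, LinkLost }

/// Mission-configurable windows; defaults follow the table (5 s, 30 s, 30 s).
pub struct LinkWindows {
    pub ok: Duration,
    pub degraded: Duration,
    pub follow_grace: Duration,
}

impl Default for LinkWindows {
    fn default() -> Self {
        Self {
            ok: Duration::from_secs(5),
            degraded: Duration::from_secs(30),
            follow_grace: Duration::from_secs(30),
        }
    }
}

/// Evaluated each tick from the age of the last operator heartbeat.
pub fn classify(since_heartbeat: Duration, in_target_follow: bool, w: &LinkWindows) -> LinkStage {
    if since_heartbeat <= w.ok {
        LinkStage::LinkOk
    } else if since_heartbeat <= w.degraded {
        LinkStage::LinkDegraded
    } else if in_target_follow && since_heartbeat <= w.degraded + w.follow_grace {
        // Hold target-follow during the extra grace window.
        LinkStage::LinkLostInFollow
    } else {
        LinkStage::LinkLost
    }
}
```

With the defaults, a UAV in target-follow reaches `LinkLost` once the last heartbeat is more than 60 s old; outside target-follow the same stage is reached after 30 s.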

**Battery / fuel thresholds.** `mission_executor` reads `SYS_STATUS` / `EXTENDED_SYS_STATE` and enforces:

- `battery ≤ rtl_threshold` (default 25 %) → trigger RTL, log reason, and keep logging the mission diff for the post-flight upload (§7.13).
- `battery ≤ hard_floor` (default 15 %) → land-now via `MAV_CMD_NAV_LAND` at safest reachable point; surface health → red.

Operator override is permitted via a signed command (per Q9); without it, the thresholds are hard.
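The two thresholds and the override rule can be sketched as a single decision function. The names below are hypothetical, and two simplifications are assumed: the battery level is already a valid percentage (`SYS_STATUS.battery_remaining` reports -1 when unknown, which is not handled here), and `override_verified` is true only after the signed command has passed whatever validation Q9 settles on.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum BatteryAction { Continue, ReturnToLaunch, LandNow }

/// Defaults follow the text: RTL at 25 %, land-now at 15 %.
pub struct BatteryThresholds {
    pub rtl_pct: u8,
    pub hard_floor_pct: u8,
}

impl Default for BatteryThresholds {
    fn default() -> Self {
        Self { rtl_pct: 25, hard_floor_pct: 15 }
    }
}

/// Decide the action for the current battery level.
/// Without a verified operator override the thresholds are hard.
pub fn battery_action(
    remaining_pct: u8,
    th: &BatteryThresholds,
    override_verified: bool,
) -> BatteryAction {
    if override_verified {
        return BatteryAction::Continue;
    }
    if remaining_pct <= th.hard_floor_pct {
        BatteryAction::LandNow
    } else if remaining_pct <= th.rtl_pct {
        BatteryAction::ReturnToLaunch
    } else {
        BatteryAction::Continue
    }
}
```

Checking the hard floor first means a level of 15 % or less can never be reported as a plain RTL.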

**Connection configuration.** A single connection URI at startup: `udp://...` or `serial:///dev/...`. No runtime URI swap.

**Frames and altitudes.** All waypoints in the mission API use `MAV_FRAME_GLOBAL_RELATIVE_ALT`. Terrain-following frames are not used (no SRTM database on the airframe).

### 7.8 Detection Classes

These classes extend the default seed set used by the detections service.

| Class           | Local Name (UA) | Notes                      |
|-----------------|-----------------|----------------------------|
| Rows of trees   | Посадка         | Linear vegetation cover    |
| Trenches/Ditches| Рів             | Linear earthwork features  |
| Trash piles     | Сміття          | Indicators of activity     |
| Tire tracks     | Сліди від шин   | Signs of movement          |

Plus the new YOLO primitive classes from §7.5 Training Data: black entrances of various sizes, branch piles, footpaths, roads, trees, and tree blocks.

### 7.9 MapObjects (H3 spatial index)

`MapObjects` are created and managed internally by autopilot. There is **no** standalone MapObjects REST service, and autopilot makes no MapObjects API calls in flight: it reads and writes them directly in the on-device store (`mapobjects_store`). The only external touchpoints are the mission-bracketed sync endpoints under `/missions/{id}/mapobjects` (§7.13) and the delete cascade in `DELETE /missions/{id}` (per the suite-level missions API).

Autopilot needs to store objects on a 2D map efficiently in order to find differences fast:

- New objects (new pile of trash, new tire tracks).
- Changed objects.
- Removed objects.

Each object on the map is described by:

- `gps(lat, lon)` — geographic position.
- `size(width, height)` — bounding area.

**Spatial indexing.** Use a hexagonal spatial index to efficiently store and query objects by location.

**Approach:** H3 library (by Uber) — hierarchical hexagonal geospatial indexing system.

| Aspect              | Detail                                     |
|---------------------|--------------------------------------------|
| Library             | H3 (`h3rs` crate for Rust)                 |
| Algorithm basis     | 3D icosahedron → 2D hexagonal tessellation |
| Key advantage       | Near-uniform cells, good neighbour queries |
| Open question       | Optimal tile/resolution size               |
| Known issue         | Discontinuity problem at cell boundaries   |

The hexagonal grid avoids the distortion problems of square grids and provides consistent neighbour relationships, making it suitable for fast spatial diff operations (detecting new, changed, and removed objects).

### 7.10 Drone ⇄ Operator Sync Message Format

Detection data is synced between drone and operator using a compact message format. MGRS (Military Grid Reference System) is used as the primary coordinate encoding — compact, standardised, and directly usable on military maps.

**Drone → Operator (detection report):**

```text
missionId :: MGRS(encoded) :: class :: confidence :: size_width_m :: size_length_m :: photo_metadata :: flags
```

**Operator → Drone (command/acknowledgment):**

```text
missionId :: Encoded(GroundMGRS :: Time) :: ... :: missionId2
```

Wire-level field semantics live in `data_model.md §MGRS sync message`.

### 7.11 Target Relocation / Movement Analysis

The system maintains a live **map of objects** and detects changes between survey passes.

**Map update types.**

| Type    | Meaning                                      |
|---------|----------------------------------------------|
| New     | Object not seen before in this area          |
| Moved   | Object of same class appeared nearby         |
| Removed | Previously recorded object no longer present |

**Map hashtable.** Objects are stored in a hashtable keyed by MGRS grid reference:

```text
MGRS1  -> Object1
MGRS2  -> Object5
MGRS12 -> Object2
MGRSN  -> ObjectM
```

### 7.12 New vs Existing / Moved / Removed Object Detection

When a detection occurs, the system must determine whether the object is **new**, **moved**, or **already known**. This must be done efficiently in real time. This is the implementation of `scan_controller`'s map-diff responsibilities; it lives in `mapobjects_store`.

**Algorithm.**

```text
On each detection(gps, class, confidence, size):

1. Compute H3 cell index at chosen resolution (e.g. res 10, ~66 m average edge).
2. Build composite key = H3_cell + class.
3. Query k-ring(H3_cell, k=2) (`grid_disk` in H3 v4) -> centre cell plus all neighbouring cells.
4. For each neighbouring cell, lookup objects with same or similar class:
     similar_classes = {military_vehicle, tank, artillery}  (configurable groups)
5. Compare:
     - If matching object found within distance_threshold (config, e.g. 50m)
       AND same class group -> EXISTING (or MOVED if position delta > move_threshold).
     - If no match -> NEW -> insert into map with H3 hash key.
6. After full sweep: objects in the region that were NOT re-observed -> REMOVED candidates.
```
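Step 5 of the algorithm above (the NEW / EXISTING / MOVED decision) can be sketched independently of the H3 lookup. In this sketch `neighbours` stands for the objects already collected from the k-ring cells; the H3 call itself is left out because the crate choice is not settled here. The type names, the haversine distance, and the use of a plain `class_group` string are assumptions for illustration.

```rust
#[derive(Clone, Debug, PartialEq)]
pub struct MapObject {
    pub lat: f64,
    pub lon: f64,
    pub class_group: String,
}

#[derive(Debug, PartialEq)]
pub enum DiffKind { New, Existing, Moved }

pub struct DiffConfig {
    pub distance_threshold_m: f64,
    pub move_threshold_m: f64,
}

/// Great-circle distance in metres on a spherical Earth (R = 6 371 km).
pub fn haversine_m(lat1: f64, lon1: f64, lat2: f64, lon2: f64) -> f64 {
    const R: f64 = 6_371_000.0;
    let (p1, p2) = (lat1.to_radians(), lat2.to_radians());
    let dphi = (lat2 - lat1).to_radians();
    let dlam = (lon2 - lon1).to_radians();
    let a = (dphi / 2.0).sin().powi(2) + p1.cos() * p2.cos() * (dlam / 2.0).sin().powi(2);
    2.0 * R * a.sqrt().asin()
}

/// Classify a detection against objects found in the neighbouring cells.
pub fn classify(det: &MapObject, neighbours: &[MapObject], cfg: &DiffConfig) -> DiffKind {
    // Distance to the nearest known object of the same class group.
    let nearest = neighbours
        .iter()
        .filter(|o| o.class_group == det.class_group)
        .map(|o| haversine_m(det.lat, det.lon, o.lat, o.lon))
        .fold(f64::INFINITY, f64::min);
    if nearest > cfg.distance_threshold_m {
        DiffKind::New
    } else if nearest > cfg.move_threshold_m {
        DiffKind::Moved
    } else {
        DiffKind::Existing
    }
}
```

With the example parameters (50 m and 10 m), a known object about 11 m away yields `Moved`, one at the same position yields `Existing`, and one about 111 m away yields `New`. The REMOVED pass in step 6 needs sweep-level bookkeeping and is not shown.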

**Why H3 + MGRS.**

| Step                     | Mechanism                  | Complexity |
|--------------------------|----------------------------|------------|
| Spatial cell lookup      | H3 `latlng_to_cell`        | O(1)       |
| Neighbour query          | H3 `grid_disk(k=2)`        | O(1)       |
| Object lookup per cell   | Hashtable by `MGRS+class`  | O(1)       |
| Total per detection      | ~constant time             | O(k²)      |

**Configurable parameters.**

| Parameter            | Example Value | Purpose                                              |
|----------------------|---------------|------------------------------------------------------|
| search_radius_km     | 30            | Max radius to search for previously known objects    |
| distance_threshold_m | 50            | Max distance to consider same object                 |
| move_threshold_m     | 10            | Min displacement to flag as "moved"                  |
| h3_resolution        | 10            | ~66 m average edge length (H3 resolution table)      |
| similar_classes      | per config    | Class groups treated as equivalent for matching      |

**Notes.**

- The 30 km radius is for the broad initial query ("get all previously stored objects within 30 km"). At resolution 10 (~66 m average edge), H3 `grid_disk` with k=2 returns the centre cell plus two rings (19 cells), reaching roughly 200–300 m from the centre, which comfortably covers the 50 m `distance_threshold_m` used for fine-grained matching. For the broad query, use a coarser H3 resolution (e.g. res 4, ~22 km edge) as a pre-filter.
- `MGRS+class` is the composite key for the hashtable so that lookups are partitioned by both location and object type.
- The discontinuity problem at H3 cell boundaries is solved by always querying the k-ring (centre cell + neighbours), ensuring objects near an edge are still matched.

### 7.13 MapObjects Sync (central DB)

`mapobjects_store` is **not** a private on-device database. It is the working copy of a centrally maintained map of detected objects, scoped to the mission's bounding box, synchronised on a per-mission basis.

**Mirror of the GPS-Denied satellite-tile pattern.** Pre-flight, autopilot pulls the relevant central state into the on-device store; in-flight the on-device store is authoritative; post-flight, autopilot pushes the mission's full pass diff back to the central store. The central store is the source of truth across missions and across UAVs; the on-device store is the source of truth during the active mission.

**Endpoint hosting (frozen 2026-05-18).** The endpoints are an extension of the existing `missions` API. There is no separate `mapobjects` service.

| Endpoint | Method | Purpose |
|---|---|---|
| `/missions/{id}/mapobjects` | `GET` | Pre-flight: returns the central map state for the mission's bounding box (last-known objects + ignored items). |
| `/missions/{id}/mapobjects` | `POST` | Post-flight: uploads the mission's full pass diff (`NEW` / `MOVED` / `EXISTING` / `REMOVED_CANDIDATE`, matching `diff_kind` in the central schema below) for central merge. |
| `/missions/{id}/mapobjects/ignored` | `GET` | Pre-flight: returns the central ignored-items list scoped to the mission area. |
| `/missions/{id}/mapobjects/ignored` | `POST` | Post-flight: uploads ignored-items appended during the mission. |
| `/missions/{id}` | `DELETE` (existing) | Cascade: drops mission-scoped MapObjects and IgnoredItems centrally as well as on-device. |

In-flight sync is **batched only** for MVP — no streaming over modem. Cross-UAV awareness lags by mission length; this is an explicit MVP trade-off (Frozen choice 6 in §7.3).

**Sync lifecycle (per mission).**

1. **Pre-flight pull** — `mission_client` calls `GET /missions/{id}/mapobjects` after fetching the mission itself. Response hydrates `mapobjects_store`. Failure modes:
   - **Reachable + 200**: hydrate; record `pull_completed_at`. Sync state = `synced`.
   - **Reachable + 4xx**: fail BIT; surface error; operator must investigate (likely mission-id mismatch or unauthorised UAV).
   - **Unreachable / timeout**: BIT degrades. Operator may acknowledge to continue with **last-cached** state for this mission area (sync state = `cached_fallback`); the BIT failure is recorded for post-mission audit.
   - **Empty response**: sync state = `synced`, store empty (legitimate first flight in this area).
2. **In-flight** — store is authoritative. All NEW / MOVED / EXISTING / IgnoredItem appends accumulate in the on-device store with `pending_upload = true`. No central writes.
3. **Post-flight push** — `mission_client` calls `POST /missions/{id}/mapobjects` with the mission's full pass diff after landing or RTL. Conflict resolution is server-side per §7.13 conflict rules. Failure modes:
   - **Reachable + 200**: clear `pending_upload`; record `push_completed_at`. Sync state = `synced`.
   - **Unreachable / timeout / 5xx**: persist the pending diff on disk, retry with backoff. After the retry window expires (configurable, default 24 h), surface as a warning; operator may manually trigger replay or accept loss.
   - **4xx (rejected)**: log full payload, surface to operator; do not silently discard — the mission's results are at risk.
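The pre-flight failure modes in step 1 can be captured as a total function from pull outcome to BIT result and sync state. This is a sketch only: `synced` and `cached_fallback` come from the list above, while `Unsynced`, `BitResult`, `PullOutcome`, and `on_preflight_pull` are hypothetical names.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum SyncState { Synced, CachedFallback, Unsynced }

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum BitResult { Pass, Degraded, Fail }

/// Outcome of `GET /missions/{id}/mapobjects` as seen by `mission_client`.
pub enum PullOutcome {
    /// HTTP 200; `object_count` may be zero on a first flight in the area.
    Ok { object_count: usize },
    /// HTTP 4xx with the status code.
    ClientError(u16),
    /// No response or timeout.
    Unreachable,
}

/// Map a pull outcome (plus the operator's acknowledgement, if any)
/// to the BIT result and the store's sync state.
pub fn on_preflight_pull(outcome: &PullOutcome, operator_ack: bool) -> (BitResult, SyncState) {
    match outcome {
        // An empty 200 response is still a successful sync.
        PullOutcome::Ok { .. } => (BitResult::Pass, SyncState::Synced),
        PullOutcome::ClientError(_) => (BitResult::Fail, SyncState::Unsynced),
        PullOutcome::Unreachable if operator_ack => {
            (BitResult::Degraded, SyncState::CachedFallback)
        }
        PullOutcome::Unreachable => (BitResult::Degraded, SyncState::Unsynced),
    }
}
```

Since the `match` is exhaustive, adding a new outcome variant later forces an explicit decision about its BIT result at compile time.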

**Conflict resolution at the central store (open question Q8 — proposed).** When two missions report contradicting state for the same `(h3_cell, class_group)`:

- Both observations are **appended** to the per-`(h3_cell, class_group)` observation log (no destructive overwrite).
- The "current view" surfaced to operator UI is computed from the observation log: most recent confirmed-existing observation wins; older REMOVED claims expire after a configurable age; class-group ambiguities surface as multi-class candidates.
- IgnoredItems are union-merged (any operator-decline at any UAV propagates to all future missions in the same area, until explicit clear).

**Central-side schema (SQL, indicative).**

```sql
-- Observations: every detection ever reported by any UAV/mission, never overwritten.
CREATE TABLE map_object_observations (
    id              UUID PRIMARY KEY,
    h3_cell         BIGINT NOT NULL,
    class           TEXT NOT NULL,
    class_group     TEXT NOT NULL,
    mission_id      UUID NOT NULL REFERENCES missions(id) ON DELETE CASCADE,
    uav_id          UUID NOT NULL,
    observed_at     TIMESTAMPTZ NOT NULL,
    gps_lat         DOUBLE PRECISION NOT NULL,
    gps_lon         DOUBLE PRECISION NOT NULL,
    mgrs            TEXT NOT NULL,
    size_width_m    REAL,
    size_length_m   REAL,
    confidence      REAL NOT NULL,
    diff_kind       TEXT NOT NULL CHECK (diff_kind IN ('NEW','MOVED','EXISTING','REMOVED_CANDIDATE')),
    photo_ref       TEXT,
    raw_evidence    JSONB
);
CREATE INDEX ON map_object_observations (h3_cell, class_group);
CREATE INDEX ON map_object_observations (mission_id);
CREATE INDEX ON map_object_observations (observed_at DESC);

-- IgnoredItems: per-area operator declines, union-merged across missions.
CREATE TABLE map_object_ignored (
    id              UUID PRIMARY KEY,
    h3_cell         BIGINT NOT NULL,
    mgrs            TEXT NOT NULL,
    class_group     TEXT NOT NULL,
    declined_at     TIMESTAMPTZ NOT NULL,
    operator_id     UUID,
    mission_id      UUID REFERENCES missions(id) ON DELETE SET NULL,
    retention_scope TEXT NOT NULL CHECK (retention_scope IN ('mission','session','until_expiry')),
    expires_at      TIMESTAMPTZ
);
CREATE INDEX ON map_object_ignored (h3_cell, class_group);
CREATE INDEX ON map_object_ignored (expires_at) WHERE retention_scope = 'until_expiry';

-- Materialised "current view" derived from observations + ignored.
-- Recomputed nightly or on POST. Exact projection rules per §7.13 conflict resolution.
CREATE MATERIALIZED VIEW map_objects_current AS ...;
```

**On-device-side schema (engine TBD per §8 Q3 — indicative shape).**

```text
mapobjects_store/
  current_state            -- key = (h3_cell, class_group); value = MapObject record
  pending_observations     -- ordered log of unflushed observations for post-flight POST
  pending_ignored          -- unflushed IgnoredItem appends
  sync_state               -- {pull_completed_at, push_completed_at, last_error, kind}
```

The on-device shape is intentionally narrower than the central schema — the on-device store does not need full observation history beyond the active mission; older history is only ever consulted via the central pull.

**Bounding-box pull strategy.** The central API uses the mission's geofence INCLUSION polygon (or a generous AABB if no INCLUSION is set) to scope the response. Pulled records are filtered by retention age (default ≤30 days); operator can override to "all". The 30 km / k-ring numbers in §7.12 apply to **on-device** spatial queries; the pull radius is mission-defined.

### 7.14 Tech Stack

**Requirements.**

| Area | Requirement |
|---|---|
| Runtime hardware | Jetson Orin Nano Super 8 GB, locked JetPack/power mode, ViewPro A40. |
| Inference (Tier 1) | FP16 only, TensorRT primary, ONNX Runtime fallback, 1280 px model input. Lives in `../detections`. |
| Service integration | Bi-directional gRPC client to the existing FastAPI + Cython + TensorRT detections service. |
| VLM | Local-only, separate IPC process, sequential with YOLO, ≤5 s/ROI if used for MVP. |
| Movement | Active at zoom-out and zoom-in, moving-camera compensation with timestamped video / gimbal / UAV telemetry; per-zoom-band thresholds; learned-CV fallback per Q14. |
| MapObjects sync | Mission-bracketed: pre-flight `GET` + post-flight `POST` against `/missions/{id}/mapobjects`. Batched only for MVP. |
| Output | Existing normalised-box format plus POI metadata for queue / reasoning. |
| Proof gates | Hardware/replay benchmark suite before implementation decomposition; movement zoom-in benchmark independent of zoom-out. |

**Selected stack.**

| Layer | Selection | Rationale |
|---|---|---|
| Language (autopilot) | Rust | Memory safety, performance, single-binary deployment, strong type system for the deterministic state machine. |
| Language (`../detections`) | Python + Cython | Existing service; we consume it, not rewrite it. |
| Tier 1 detector | YOLO26 + YOLOE-26 fixed-class FP16 TensorRT | Best fit with acceptance criteria and export docs. Owned by `../detections`. |
| Tier 2 analyzer | Primitive graph + lightweight CNN | Fast, explainable, data-efficient. |
| Movement | OpenCV optical flow + telemetry | Directly addresses moving-camera constraint. |
| VLM runtime | NanoLLM / VILA1.5-3B (with fallback benchmark path) | Documented local-multimodal path; matches no-cloud requirement. |
| Scan controller | Deterministic typed state machine (Rust) | Simpler and easier to test for a fixed `ZoomedOut` / `ZoomedIn` / `TargetFollow` lifecycle. |
| MAVLink transport | Hand-rolled in autopilot (Rust) | Eliminates the largest current dependency-risk item; small command surface (§7.7). |
| Gimbal protocol | ViewPro A40 vendor protocol over UDP | Matches the deployed camera. |
| `mapobjects_store` engine | TBD (SQLite + H3 extension / KV / in-memory + snapshot) | Open question; see §8. |
| Inter-component IPC (in-process) | Tokio channels / actors | Idiomatic Rust async. |
| External IPC (VLM) | Unix-domain socket with peer-credential check | Local-only authorisation. |
| VLM output | Validated structured `VlmAssessment` schema | Makes VLM output a stable API contract. |
| Input security | Content / size allow-list + patched OpenCV | Reduces crafted-input and resource-exhaustion risk. |
| Observability | `tracing` + JSON logs to stdout, scraped by the deployment's log-shipping stack | See `deployment/observability.md`. |
| Build | `cargo` cross-compile for `aarch64-unknown-linux-gnu` | See `deployment/ci_cd_pipeline.md`. |

**Risk register.**

| Risk | Impact | Mitigation |
|---|---|---|
| Tier 1 misses ≤100 ms/frame | Blocks acceptance | Fixed-shape FP16 engines, batch 1, benchmark before implementation decomposition. |
| VLM misses ≤5 s/ROI or memory budget | Blocks VLM-required MVP policy | Benchmark NanoLLM / VILA first; fall back to smaller VLM only if it passes the same gates; otherwise disable VLM via `vlm_enabled=false`. |
| All-season MVP data is insufficient | Blocks detection-quality targets | Per-season dataset gates and hard-negative mining. |
| Movement false positives exceed the ≤5 POIs/min cap | Operator overload | Telemetry-aided compensation, replay tests, queue cap, per-zoom-band thresholds. |
| Classical OpenCV optical flow inadequate at zoom-in | Loss of zoom-in movement detection | Benchmark gate measures zoom-in independently; fallback to learned-CV / CNN motion module behind feature flag (Q14). |
| Operator/Ground-Station modem link lost mid-flight | Uncontrolled UAV | Typed lost-link failsafe ladder in `mission_executor` (§7.7); RTL after 30 s grace; configurable. |
| Battery / fuel below threshold mid-mission | Forced landing or crash | Hard RTL + land-now thresholds with configurable defaults (§7.7); operator override only via signed command. |
| Operator command spoofing / replay over modem RF | Hostile hijack of operator commands | Authenticated, signed, replay-protected command envelope (§5; scheme TBD per Q9). |
| Pre-flight self-test (BIT) misses a degraded dependency | Mid-flight component failure | BIT covers every dependency in §5 plus mission load + MapObjects pre-flight pull; cached-fallback acknowledgement is explicit. |
| Wall-clock drift breaks operator-command timestamping | Forensic + audit failures | GPS-time-bound when GPS locked; NTP at boot; drift > 200 ms surfaces health → yellow. |
| MapObjects post-flight push fails | Loss of mission-diff data centrally | Persist pending diff on disk; bounded retry; operator-visible warning; manual replay supported. |
| A40 zoom transition exceeds the ≤2 s budget | Breaks scan timing | Hardware-in-loop timing test; revise scan timeout / zoom range if needed. |
| Hand-rolled MAVLink misses an edge case | Mission failure or hard-to-debug protocol behaviour | Conformance test against ArduPilot SITL; replay-based regression tests. |
| Unstructured VLM output corrupts downstream decisions | Operator-facing false confidence | Schema validation, confidence enum, timeout / error state, fail-closed behaviour. |
| Telemetry skew breaks movement compensation | False motion candidates | Define maximum frame / gimbal / UAV timestamp skew; reject / degrade unsynchronised samples. |
| Untrusted image / ROI payloads exploit decoders or memory | Security and availability risk | Pin patched OpenCV, restrict formats, enforce size caps before decode. |

---

## 8. Open Questions

| # | Question | Impact |
|---|---|---|
| Q1 | **Sweep pattern specification.** Pattern shape (pendulum / raster / lawn-mower), FOV per zoom tier, dwell time per direction, and whether sweep runs continuously or only between specific mission waypoints. | Blocks `scan_controller` zoom-out implementation. |
| Q2 | **Ground Station API contract.** Stream protocol (WebRTC / WebSocket-H.264 / gRPC server-streaming), session/auth model, and bbox-overlay rendering (server-side burn-in vs client-side render). | Blocks `telemetry_stream` + `operator_bridge` design. |
| Q3 | **`mapobjects_store` engine.** SQLite + H3 extension / KV / in-memory + snapshot. | Blocks persistent-state design for ignored items + MapObjects. |
| Q4 | **Tier 1 contract evolution.** How `detection_client` is versioned against an evolving `../detections` schema. | Blocks the gRPC contract definition. |
| Q5 | **`mission-schema` extraction location.** `_infra/` at suite root, or a small third repo. | Blocks the `mission_client` / `missions` API contract sharing. |
| Q6 | **MAVLink-2 message signing.** Whether the airframe link enables MAVLink-2 signing or treats the link as trusted. | Affects `mavlink_layer` startup handshake. |
| Q7 | **Central MapObjects API contract.** Endpoint hosting is frozen as an extension of the `missions` API (§7.13). The remaining contract concerns are: schema versioning, paging strategy for large mission areas, photo-reference upload mechanism (URL handoff vs inline), and observation-history retention policy. | Blocks `missions` repo work + `mission_client` MapObjects sync code. |
| Q8 | **MapObjects conflict resolution.** When two missions report contradicting state for the same `(h3_cell, class_group)`, the proposed rule is "append-only observation log + computed current view" (§7.13). Open: exact projection rules, REMOVED-claim expiry window, multi-class disambiguation. | Blocks central `map_objects_current` view definition. |
| Q9 | **Operator-command authentication scheme.** The principle is committed (§5: signed, replay-protected). Scheme open: HMAC over (session_token, sequence_number, payload) vs JWT-style ed25519 vs MAVLink-2 signing extended to operator commands vs separate envelope. | Blocks `operator_bridge` validation logic + Ground Station integration. |
| Q10 | **Software rollback policy on the airframe.** Watchtower OTA is mentioned in `../_docs/00_top_level_architecture.md`. Policy open: how a bad autopilot update is detected on the airframe (boot-time self-check, A/B partition, watchdog rollback) and rolled back without crew intervention. | Affects deployment design + on-airframe service supervision. |
| Q11 | **Multi-operator session policy.** When two operators connect (one at the primary station, one remote), which is authoritative for confirm/decline? Single active operator at a time, or quorum? How is `operator_id` recorded in `IgnoredItem`? | Blocks `operator_bridge` session model. |
| Q12 | **Comms blackout during banking turns.** Winged UAV banking can lose modem LOS to Ground Station. Policy: tolerate brief blackouts as `LinkDegraded`, or suppress lost-link failsafe during known turn arcs (computed from mission shape)? | Affects lost-link failsafe ladder timing constants (§7.7). |
| Q13 | **All-season acceptance flight gates.** Dataset gates (§7.4) are committed; flight-test gates are not. Open: minimum number of real flights per season before MVP acceptance, per-season acceptance pass criteria. | Affects MVP sign-off scope. |
| Q14 | **Movement detection at zoom-in — fallback selection.** If classical OpenCV optical flow / global-motion estimation does not meet the per-zoom-band false-positive cap at zoom-in, the fallback module choice is open: learned optical flow (RAFT / FlowNet derivative) vs CNN motion segmentation vs IMU-tighter-coupled classical CV. The interface contract (`Frame + telemetry → Vec<MovementCandidate>`) is fixed; the implementation is replaceable. | Blocks `movement_detector` zoom-in scope if classical CV fails benchmark gate. |
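
Q14 fixes the interface (`Frame + telemetry → Vec<MovementCandidate>`) while leaving the implementation replaceable. One way to express that in Rust is a trait with interchangeable implementations chosen by a feature flag. The sketch below is an assumption-laden illustration: the field layouts of `Frame`, `Telemetry`, and `MovementCandidate`, the `name` method, the stub detectors, and `select_detector` are all hypothetical, and the stubs return no candidates.

```rust
/// Hypothetical frame handle; a real one would carry pixel data.
pub struct Frame {
    pub width: u32,
    pub height: u32,
    pub timestamp_ms: u64,
}

/// Hypothetical telemetry snapshot paired with the frame.
pub struct Telemetry {
    pub gimbal_yaw_deg: f32,
    pub gimbal_pitch_deg: f32,
    pub zoom_level: f32,
}

/// Hypothetical candidate: a bounding box in pixels plus a score.
#[derive(Debug, Clone, PartialEq)]
pub struct MovementCandidate {
    pub x: u32,
    pub y: u32,
    pub w: u32,
    pub h: u32,
    pub score: f32,
}

/// The fixed contract: Frame + telemetry in, candidates out.
pub trait MovementDetector {
    fn name(&self) -> &'static str;
    fn detect(&mut self, frame: &Frame, telemetry: &Telemetry) -> Vec<MovementCandidate>;
}

/// Stub standing in for the classical OpenCV optical-flow module.
pub struct ClassicalFlow;

/// Stub standing in for whichever fallback Q14 eventually selects.
pub struct LearnedFallback;

impl MovementDetector for ClassicalFlow {
    fn name(&self) -> &'static str {
        "classical-flow"
    }
    fn detect(&mut self, _frame: &Frame, _telemetry: &Telemetry) -> Vec<MovementCandidate> {
        Vec::new() // placeholder: no real detection logic
    }
}

impl MovementDetector for LearnedFallback {
    fn name(&self) -> &'static str {
        "learned-fallback"
    }
    fn detect(&mut self, _frame: &Frame, _telemetry: &Telemetry) -> Vec<MovementCandidate> {
        Vec::new() // placeholder: no real detection logic
    }
}

/// Pick the implementation from a (hypothetical) runtime feature flag.
pub fn select_detector(use_fallback: bool) -> Box<dyn MovementDetector> {
    if use_fallback {
        Box::new(LearnedFallback)
    } else {
        Box::new(ClassicalFlow)
    }
}
```

Because callers hold only a `Box<dyn MovementDetector>`, swapping the zoom-in module after the benchmark gate would not touch the rest of the pipeline. A Cargo compile-time feature would work equally well if dynamic dispatch is undesirable.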

---

## 9. Out of Scope

- Multi-airframe coordination, fleet management, swarm logic.
- Mission re-planning beyond middle-waypoint inserts.
- Mission planning / route selection for arbitrary mission shapes (only intra-region routing is in scope).
- GPS-denied navigation algorithms (delegated to the GPS-denied service, `../_docs/11_gps_denied.md`).
- Cloud-hosted VLM or any external inference dependency.
- Encrypted transport beyond what MAVLink-2 message signing and modem-level link encryption already provide.
- Annotation tooling, model training, dataset curation (separate `ai-training` repo).
- Operator browser UI (Ground Station hosts it; autopilot only feeds it).

---

## 10. External Suite Documents

These suite-level documents live in the parent suite repo (`../_docs/`) and are consumed by autopilot but **not owned** by autopilot.

| Suite-level path | Owner / primary-for | What autopilot uses it for |
|---|---|---|
| `../_docs/00_top_level_architecture.md` | suite (cross-cutting) | Suite topology, deployment tiers (`edge`), the **flight-gate convention** (`/run/azaion/in-flight` — written by autopilot, read by `model-sync.service`), Watchtower OTA model. Defines autopilot's place in the 11-component system. |
| `../_docs/02_missions.md` | `missions` repo (.NET service) | Mission / Waypoint / Vehicle schemas. Autopilot consumes the missions API via `mission_client`. |
| `../_docs/03_detections.md` | `detections` repo (Cython service) | Detections API spec. Autopilot consumes via bi-directional gRPC in `detection_client`. |
| `../_docs/04_system_design_clarifications.md` | suite (cross-cutting) | REST patterns, stream-detection protocol, edge-device connection semantics. Defines the Ground Station push contract used by `telemetry_stream`. |
| `../_docs/11_gps_denied.md` | `gps-denied-onboard` / `gps-denied-desktop` (shared primary) | GPS-Denied service architecture. Autopilot does NOT host any GPS-denied code; it consumes corrected GPS through the shared edge data path. |
| `../_docs/12_ai_training.md` | `ai-training` repo | AI training pipeline. Autopilot consumes the resulting ONNX/TensorRT models via the rclone model-sync timer (flight-gate-aware). |
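
The flight-gate convention in the first row (`/run/azaion/in-flight`, written by autopilot, read by `model-sync.service`) could be handled on the autopilot side with an RAII guard. This is a minimal sketch under stated assumptions: it assumes the gate is signalled purely by the file's presence, that its contents are irrelevant, and that the parent directory already exists; `FlightGate`, `raise`, and `is_raised` are hypothetical names. The authoritative semantics live in `../_docs/00_top_level_architecture.md`.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Holds the in-flight marker for as long as the value is alive
/// (hypothetical type; presence-only semantics are an assumption).
pub struct FlightGate {
    path: PathBuf,
}

impl FlightGate {
    /// Create (or truncate) the marker file and return a guard for it.
    pub fn raise(path: impl Into<PathBuf>) -> io::Result<Self> {
        let path = path.into();
        fs::write(&path, b"")?;
        Ok(Self { path })
    }

    /// Reader-side check, as a consumer of the gate might perform it.
    pub fn is_raised(path: &Path) -> bool {
        path.exists()
    }
}

impl Drop for FlightGate {
    fn drop(&mut self) {
        // Best effort: a failure to remove the marker is ignored here.
        let _ = fs::remove_file(&self.path);
    }
}
```

In use, the mission executor would call `FlightGate::raise("/run/azaion/in-flight")` at takeoff and keep the guard until landing. Note that `Drop` runs only on orderly scope exit or unwinding, not on an abort or power loss, so a stale marker after a crash would need separate handling.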