> Mirror of https://github.com/azaion/detections-semantic.git (synced 2026-04-22 21:46:37 +00:00)
# Semantic Detection System — Architecture

## 1. System Context

**Problem being solved**: Reconnaissance UAVs with YOLO-based object detection cannot identify camouflaged/concealed military positions (FPV operator hideouts, hidden artillery, dugouts masked by branches). A semantic detection layer is needed that detects footpaths, traces them to endpoints, and identifies concealed structures — controlling the camera gimbal through a two-level scan strategy (wide sweep + detailed investigation).

**System boundaries**:

- **Inside**: Semantic detection pipeline (Tier 1/2/3 inference), scan controller (L1/L2 Behavior Tree), gimbal driver (ViewLink serial), frame recorder, detection logger, system health monitor
- **Outside**: Existing YOLO detection pipeline, GPS-denied navigation, mission planning, annotation tooling, training pipelines, operator display

**External systems**:

| System | Integration Type | Direction | Purpose |
|--------|------------------|-----------|---------|
| Existing YOLO Pipeline | REST API (in-process or local HTTP) | Inbound | Provides scene-level detections (vehicles, roads, buildings) as context |
| ViewPro A40 Gimbal | UART serial (ViewLink protocol) | Outbound | Camera pan/tilt/zoom commands |
| GPS-Denied System | Shared memory / API | Inbound | Provides current GPS-denied coordinates for detection logging |
| Operator Display | REST API / shared detection output | Outbound | Delivers detection results (bounding boxes + metadata) |
| NVMe Storage | Filesystem | Both | Frame recording, detection logs, model files, config |

## 2. Technology Stack

| Layer | Technology | Version | Rationale |
|-------|-----------|---------|-----------|
| Language (core) | Cython / C | — | Extends existing detection codebase; maximum performance |
| Language (VLM) | Python 3.11 | 3.11 | NanoLLM and VLM libraries are Python-native |
| Language (tools) | Python 3.11 | 3.11 | Configuration, logging, frame recording utilities |
| Inference (Tier 1) | TensorRT FP16 | JetPack 6.2 bundled | Fastest inference on Jetson; FP16 is stable (INT8 deferred) |
| Inference (Tier 3) | NanoLLM (MLC/TVM) | 24.7+ | Purpose-built for Jetson VLM inference; Docker-based |
| Detection model | YOLOE (yoloe-11s-seg or yoloe-26s-seg) | Ultralytics 8.4.x (pinned) | Open-vocabulary segmentation; backbone selected empirically |
| VLM model | VILA1.5-3B (4-bit MLC) | — | Confirmed on Orin Nano; multimodal; stable via NanoLLM |
| Image processing | OpenCV + scikit-image | 4.x (OpenCV) | Skeletonization, morphology, frame quality assessment |
| Orchestration | py_trees | 2.4.0 | Behavior tree for scan controller; extensible, preemptive |
| Serial comm | pyserial + crcmod | — | ViewLink gimbal protocol with CRC-16 |
| IPC | Unix domain socket | — | Semantic process ↔ VLM process communication |
| Containerization | Docker | JetPack 6.2 container | VLM runs in NanoLLM Docker; main service in existing Docker |
| Configuration | YAML | — | All thresholds, class names, scan parameters, degradation levels |
| Platform | Jetson Orin Nano Super | JetPack 6.2 | 67 TOPS, 8GB LPDDR5, NVMe SSD boot |

**Key constraints from restrictions.md**:

- 8GB shared RAM: YOLO ~2GB, semantic+VLM must fit in ~6GB. Sequential GPU scheduling (no concurrent YOLO+VLM).
- Cython + TRT codebase: new modules must integrate with existing Cython build system
- Air-gapped: no cloud connectivity, all inference local. Updates via USB drive.
- ViewPro A40 zoom transition: the 1-2 second physical transition affects L1→L2 timing

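The constraints above feed directly into the YAML configuration layer. A hypothetical fragment is sketched below; every key name and value is illustrative, not the project's actual schema:

```yaml
# config.prod.yaml -- illustrative fragment only; key names are assumptions
memory:
  semantic_vlm_budget_mb: 6144   # YOLO keeps ~2 GB of the 8 GB shared RAM
  gpu_scheduling: sequential     # never run YOLO and the VLM concurrently
gimbal:
  port: /dev/ttyTHS1
  baud: 115200
  zoom_settle_s: 2.0             # ViewPro A40 zoom transition is 1-2 s
scan:
  l1_sweep_step_deg: 15
  l2_dwell_s: 4
degradation:
  vlm_timeout_s: 5
  fallback: yolo_only
```

Keeping the zoom settle time and degradation behavior in config (rather than hard-coded) lets field tuning happen via the USB update path described above.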
## 3. Deployment Model

**Environments**: Development (workstation with GPU), Production (Jetson Orin Nano Super on UAV)

**Infrastructure**:

- Production: Jetson Orin Nano Super with ruggedized carrier board (MILBOX-ORNX or similar), NVMe SSD, active cooling
- Development: x86 workstation with NVIDIA GPU (for model training and testing)
- No cloud, no staging environment — field-deployed edge device

**Environment-specific configuration**:

| Config | Development | Production |
|--------|-------------|------------|
| Inference engine | ONNX Runtime (CPU/GPU) or TRT on dev GPU | TensorRT FP16 on Jetson |
| Gimbal | Mock serial (TCP socket) | Real UART to ViewPro A40 |
| VLM | NanoLLM Docker or direct Python | NanoLLM Docker on Jetson |
| Storage | Local filesystem | NVMe SSD (industrial grade) |
| Logging | Console + file | JSON-lines to NVMe |
| Thermal monitor | Disabled | Active (tegrastats) |
| Power monitor | Disabled | Active (INA sensors) |
| Config file | config.dev.yaml | config.prod.yaml |

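One way to wire the `config.dev.yaml` / `config.prod.yaml` split is to autodetect the Jetson at startup. The helper below is a sketch under stated assumptions: the marker file, environment variable, and function name are illustrative, not the project's actual mechanism.

```python
import os


def select_config_path(config_dir: str = ".") -> str:
    """Pick the environment-specific config file.

    Assumption: JetPack installs /etc/nv_tegra_release on Jetson devices,
    so its presence distinguishes production from a dev workstation. An
    explicit SEMANTIC_ENV variable (hypothetical) overrides autodetection.
    """
    env = os.environ.get("SEMANTIC_ENV")
    if env is None:
        env = "prod" if os.path.exists("/etc/nv_tegra_release") else "dev"
    name = "config.prod.yaml" if env == "prod" else "config.dev.yaml"
    return os.path.join(config_dir, name)
```

An explicit override keeps the mock-gimbal development path reproducible even when testing on Jetson hardware.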
## 4. Data Model Overview

**Core entities**:

See `data_model.md` for full details. Summary:

**Runtime structs** (in-memory only): FrameContext, YoloDetection (external input), POI, GimbalState

**Persistent** (NVMe flat files): DetectionLogEntry (JSON-lines), HealthLogEntry (JSON-lines), RecordedFrames (JPEG), Config (YAML)

No database. Transient processing artifacts (segmentation masks, skeletons, endpoint crops) are created, consumed, and discarded within a single frame's processing cycle.

**Data flow summary**:

- Camera → Frame → YOLO (external) → detections → SemanticPipeline → detection log + operator
- SemanticPipeline → ScanController → GimbalDriver → ViewPro A40
- Frame → Recorder → NVMe (JPEG) + Logger → NVMe (JSON-lines)

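A JSON-lines append for DetectionLogEntry can be sketched as below. The field names are assumptions (the real schema lives in `data_model.md`); the null-coordinates behavior matches the GPS-Denied failure mode described in section 5.

```python
import json
import time


def append_detection(log_path: str, bbox, confidence: float, tier: int,
                     coords=None) -> dict:
    """Append one DetectionLogEntry as a single JSON line.

    Field names are illustrative, not the project's actual schema.
    `coords` is the GPS-denied position; it is logged as null (None)
    when the navigation system is unavailable.
    """
    entry = {
        "ts": time.time(),
        "bbox": list(bbox),      # [x, y, w, h] in frame pixels (assumed layout)
        "confidence": confidence,
        "tier": tier,            # 1, 2 or 3
        "coords": coords,        # None -> logged as JSON null
    }
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return entry
```

One-entry-per-line keeps writes append-only and crash-tolerant, which suits the circular-buffer retention policy in section 6.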
## 5. Integration Points

### Internal Communication

| From | To | Protocol | Pattern | Notes |
|------|----|----------|---------|-------|
| ScanController | Tier1Detector | Direct function call (Cython) | Sync pipeline | Same process, frame buffer shared |
| Tier1Detector | Tier2SpatialAnalyzer | Direct function call | Sync pipeline | Segmentation mask or detection list passed in memory |
| ScanController | VLMProcess | Unix domain socket (JSON) | Async request-response | VLM in separate Docker container; 5s timeout |
| ScanController | GimbalDriver | Direct function call | Command queue | Scan controller pushes target angles |
| GimbalDriver | ViewPro A40 | UART serial (ViewLink protocol) | Command-response | 115200 baud; use native ViewLink checksum if available, add CRC-16 only if protocol lacks integrity checks |
| ScanController | Logger/Recorder | Direct function call | Fire-and-forget (async write) | Non-blocking NVMe write; detection log + frame recording |
| ScanController | (inline) | health_check() at top of main loop | Capability flags | Reads tegrastats, gimbal heartbeat, VLM status — no separate thread |

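The ScanController → VLMProcess link (Unix domain socket, JSON, 5 s timeout) can be sketched as follows. Newline-delimited framing and the function name are assumptions; the table above only fixes the transport, encoding, and timeout.

```python
import json
import socket


def vlm_request(sock: socket.socket, payload: dict, timeout: float = 5.0) -> dict:
    """Send one JSON request over a connected Unix socket and wait for a
    single newline-terminated JSON reply.

    On socket.timeout the caller is expected to clear the `vlm` capability
    flag rather than crash. One-JSON-object-per-line framing is an
    assumption; the real wire format is defined by the VLM process.
    """
    sock.settimeout(timeout)
    sock.sendall(json.dumps(payload).encode() + b"\n")
    buf = b""
    while not buf.endswith(b"\n"):
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("VLM process closed the socket")
        buf += chunk
    return json.loads(buf)
```

Because the VLM container is loaded only during Level 2 (ADR-005), a `ConnectionError` here is an expected state, not a fault.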
### External Integrations

| External System | Protocol | Auth | Rate Limits | Failure Mode |
|----------------|----------|------|-------------|--------------|
| Existing YOLO Pipeline | In-process call or local HTTP (localhost) | None (same device) | Frame rate (10-30 FPS) | semantic_available=false → YOLO-only mode |
| GPS-Denied System | Shared memory or local API | None (same device) | Per-frame | Coordinates logged as null if unavailable |
| Operator Display | Detection output format (same as YOLO) | None (same device) | Per-detection | Detections queued if display unavailable |

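The failure modes above reduce to a small capability-flag lookup. A minimal sketch, assuming three boolean flags (`semantic`, `gimbal`, `vlm`) set by the inline `health_check()`; the mode names are illustrative, the document only requires that each component failure degrades rather than halts the system:

```python
def pipeline_mode(flags: dict) -> str:
    """Map capability flags to an operating mode (names are illustrative)."""
    if not flags.get("semantic", False):
        return "yolo_only"              # semantic_available=false fallback
    if not flags.get("gimbal", False):
        return "semantic_fixed_camera"  # detect, but no active scanning
    if not flags.get("vlm", False):
        return "tier12_only"            # skip Tier 3 deep analysis
    return "full"
```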
## 6. Non-Functional Requirements

| Requirement | Target | Measurement | Priority |
|------------|--------|-------------|----------|
| Tier 1 latency (p95) | ≤100ms per frame | TRT inference time on Jetson | High |
| Tier 2 latency (p95) | ≤200ms per ROI (V2 CNN) / ≤50ms (V1 heuristic) | Processing time from mask to classification | High |
| Tier 3 latency (p95) | ≤5s per ROI | VLM request-to-response via IPC | Medium |
| Memory (semantic+VLM) | ≤6GB peak | tegrastats monitoring | High |
| Thermal (sustained) | T_junction < 75°C | tegrastats, 60-min test | High |
| Throughput | ≥8 FPS sustained (Tier 1) | Frames processed per second | High |
| Availability | Capability-flag degradation (vlm, gimbal, semantic) | Continuous operation despite component failures | High |
| Cold start | ≤60s to first detection | Power-on to first result | Medium |
| Recording endurance | ≥2 hours at Level 2 rate | NVMe write, 256GB SSD | Medium |
| Data retention | Until NVMe full (circular buffer) | Oldest L1 frames overwritten first | Low |

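The latency targets are p95 values, so the measurement harness needs a percentile helper rather than a mean. A stdlib-only sketch (the function name is an assumption):

```python
import statistics


def p95_ms(samples_ms):
    """95th-percentile latency over a list of per-frame timings (ms).

    Uses the inclusive quantile definition; with fewer than ~20 samples
    the estimate is noisy, so NFR checks should gate on a minimum count.
    """
    if len(samples_ms) < 2:
        raise ValueError("need at least 2 samples")
    return statistics.quantiles(samples_ms, n=100, method="inclusive")[94]
```

Reporting p95 rather than the mean keeps occasional slow frames (e.g. during a zoom transition) from hiding in the average.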
## 7. Security Architecture

**Authentication**: None required — all components are local on the same Jetson device, air-gapped network.

**Authorization**: N/A — single-user system, operator interacts via separate display system.

**Data protection**:

- At rest: No encryption (performance priority on edge device; physical security assumed via UAV possession)
- In transit: N/A (all communication is local — UART, Unix socket, localhost)
- Secrets management: No secrets — no API keys, no cloud credentials. Model files are not sensitive (publicly available architectures).

**Audit logging**: Detection log (JSON-lines) records every detection with timestamp, coordinates, confidence, tier. Gimbal command log records every command sent. Both stored on NVMe. Retained until overwritten by circular buffer or manually extracted via USB.

## 8. Key Architectural Decisions

### ADR-001: Three-tier inference architecture

**Context**: Need both fast initial detection (≤100ms) and deep semantic analysis (≤5s). A single model cannot achieve both.

**Decision**: Three tiers — Tier 1 (YOLOE TRT, ≤100ms), Tier 2 (path tracing + heuristic/CNN, ≤200ms), Tier 3 (VLM, ≤5s, optional). Each tier runs only when the previous tier triggers it.

**Alternatives considered**:

1. Single VLM for all analysis — rejected: too slow for real-time scanning (>2s per frame)
2. YOLO + VLM only (no Tier 2) — rejected: VLM would be invoked too frequently, saturating the GPU

**Consequences**: More complex pipeline; three models to manage; but enables real-time scanning with deep analysis only when needed.

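The "each tier runs only when the previous tier triggers it" rule can be sketched as a cascade with injected callables. The stage signatures here are assumptions standing in for the real detectors:

```python
def run_cascade(frame, tier1, tier2, tier3=None, vlm_available=True):
    """Three-tier cascade from ADR-001 (stage signatures are illustrative).

    tier1: frame -> list of candidate detections   (fast YOLOE pass, <=100 ms)
    tier2: frame, t1 -> list of traced endpoints   (path tracing,    <=200 ms)
    tier3: frame, t2 -> semantic verdict           (VLM, optional,   <=5 s)
    Each stage runs only when the previous one produced work; Tier 3 is
    skipped entirely when the `vlm` capability flag is down.
    """
    results = {"tier1": tier1(frame)}
    if not results["tier1"]:
        return results                     # nothing to trace: stop early
    results["tier2"] = tier2(frame, results["tier1"])
    if results["tier2"] and tier3 is not None and vlm_available:
        results["tier3"] = tier3(frame, results["tier2"])
    return results
```

The early returns are what keep the GPU free for Tier 1 most of the time, which is the point of the decision.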
### ADR-002: NanoLLM instead of vLLM for VLM runtime

**Context**: VLM process needs stable inference on Jetson Orin Nano 8GB. vLLM has documented system freezes and crashes on this hardware.

**Decision**: Use NanoLLM (NVIDIA's Jetson-optimized library) with Docker containers and MLC/TVM quantization.

**Alternatives considered**:

1. vLLM — rejected: system freezes, reboots, installation crashes (multiple open GitHub issues)
2. llama.cpp — kept as fallback for GGUF models not supported by NanoLLM

**Consequences**: Limited model selection (VILA, LLaVA, Obsidian); UAV-VL-R1 only available via llama.cpp fallback.

### ADR-003: YOLOE backbone selection deferred to empirical benchmark

**Context**: YOLO26 has reported accuracy regression on custom datasets vs YOLO11. Both are supported by YOLOE.

**Decision**: Support both yoloe-11s-seg and yoloe-26s-seg as configurable backends. Sprint 1 benchmarks on real annotated data determine the winner.

**Alternatives considered**:

1. Commit to YOLO26 — rejected: reported regression risk
2. Commit to YOLO11 — rejected: YOLO26 has better NMS-free deployment and small-object features

**Consequences**: Must maintain two TRT engine files; config switch; slightly more build complexity.

### ADR-004: FP16 only, INT8 deferred

**Context**: TensorRT INT8 export crashes on Jetson Orin (JetPack 6, TRT 10.3.0) during calibration.

**Decision**: Use FP16 for all TRT engines in initial deployment. INT8 optimization deferred to Phase 3+.

**Alternatives considered**:

1. INT8 from day one — rejected: documented crashes, unstable tooling
2. Mixed precision (FP16 backbone, INT8 head) — rejected: adds complexity without proven stability

**Consequences**: ~2x slower than INT8 theoretical maximum; acceptable given FP16 already meets latency targets.

### ADR-005: VLM as separate Docker process with IPC

**Context**: VLM (NanoLLM) runs in a Docker container with specific CUDA/MLC dependencies. Cannot be compiled into Cython codebase.

**Decision**: VLM runs as a separate Docker container. Communication via Unix domain socket (JSON messages). Loaded dynamically during Level 2 only; unloaded to free GPU memory during Level 1.

**Alternatives considered**:

1. VLM compiled into main process — rejected: dependency incompatibility with Cython + TRT pipeline
2. VLM always loaded — rejected: consumes ~3GB GPU memory that's needed for YOLO during Level 1

**Consequences**: IPC latency overhead (~10ms); container management complexity; but clean separation and memory efficiency.

### ADR-006: NVMe SSD mandatory, no SD card

**Context**: Recurring SD card corruption documented on Jetson Orin Nano. Production module has no eMMC.

**Decision**: NVMe SSD for OS, models, recording, logging. Industrial-grade SSD with vibration-resistant mount.

**Alternatives considered**:

1. SD card — rejected: documented corruption issues across multiple brands
2. USB drive — rejected: slower, less reliable under vibration

**Consequences**: Additional hardware cost (~$40-80); requires NVMe-compatible carrier board.

### ADR-007: UART integrity for gimbal communication

**Context**: ViewPro documents EMI-induced random gimbal panning from antenna interference. UART communication needs error detection.

**Decision**: First, check whether ViewLink Serial Protocol V3.3.3 includes native checksums (read the full spec during implementation). If yes, use the native checksum and add retry logic on checksum failure. If not, add a CRC-16 (CRC-CCITT) wrapper. Either way: retry up to 3 times on integrity failure, log errors.

**Alternatives considered**:

1. No error detection — rejected: EMI is a documented real-world issue
2. Always add custom CRC regardless — rejected: may conflict with native protocol

**Consequences**: Final mechanism depends on reading the protocol spec; physical EMI mitigation (shielded cable, 35cm antenna separation) is still needed regardless.

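For the fallback branch (no native checksum), the CRC-16/CCITT wrapper can be sketched as below. The CRC is implemented inline for illustration; in the project it would come from the crcmod dependency listed in the stack table. The trailing big-endian framing is an assumption, not the ViewLink spec:

```python
def crc16_ccitt(data: bytes, crc: int = 0xFFFF) -> int:
    """CRC-16/CCITT-FALSE (polynomial 0x1021, initial value 0xFFFF)."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc


def frame_command(payload: bytes) -> bytes:
    """Append a big-endian CRC to a gimbal command (framing is assumed)."""
    return payload + crc16_ccitt(payload).to_bytes(2, "big")


def verify_frame(frame: bytes) -> bool:
    """Check the trailing CRC; on failure the driver retries up to 3 times."""
    payload, rx_crc = frame[:-2], int.from_bytes(frame[-2:], "big")
    return crc16_ccitt(payload) == rx_crc
```

A single bit flipped by EMI anywhere in the payload or CRC makes `verify_frame` fail, which is what triggers the retry-and-log path in the decision.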
### ADR-008: Behavior Tree for ScanController orchestration

**Context**: ScanController manages two scan levels (L1 sweep, L2 investigation), health preemption, POI queueing, and future extensions (spiral search, thermal scan). Need a pattern that handles preemption cleanly and is extensible.

**Decision**: Use py_trees (2.4.0) Behavior Tree. Root Selector tries HealthGuard → L2Investigation → L1Sweep → Idle. Leaf nodes are simple procedural calls into existing components. Shared state via py_trees Blackboard.

**Alternatives considered**:

1. Flat state machine — rejected: adding new scan modes requires rewiring transitions; preemption logic becomes tangled
2. Hierarchical state machine — viable but less standard for autonomous vehicles; less tooling support
3. Hybrid (BT + procedural leaves) — this is essentially what we chose; BT structure with procedural leaf logic

**Consequences**: Adds py_trees dependency (~150KB); tree tick overhead negligible (<1ms); ASCII tree rendering aids debugging; new scan behaviors added as subtrees without modifying existing ones.

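The root Selector's priority order can be illustrated with a dependency-free sketch of the selector ("fallback") tick semantics that py_trees provides; this is not the py_trees API, just the behavior the tree relies on. Children and statuses are illustrative:

```python
def tick_selector(children):
    """Minimal selector tick: try children in priority order and return the
    first non-FAILURE result, so a higher-priority child that succeeds or
    keeps running preempts everything below it.

    children: list of (name, fn) pairs; fn() returns 'SUCCESS', 'RUNNING'
    or 'FAILURE'. Names mirror the ADR-008 tree but are illustrative.
    """
    for name, fn in children:
        status = fn()
        if status != "FAILURE":
            return name, status
    return None, "FAILURE"


# Priority order from the decision: health preemption first, then L2, L1, Idle.
root = [
    ("HealthGuard",     lambda: "FAILURE"),   # healthy -> does not preempt
    ("L2Investigation", lambda: "FAILURE"),   # no POI queued
    ("L1Sweep",         lambda: "RUNNING"),   # sweep in progress
    ("Idle",            lambda: "SUCCESS"),   # never reached while sweeping
]
```

Because HealthGuard is first, a thermal or gimbal fault returning SUCCESS on a later tick preempts both scan levels without any explicit transition wiring, which is the extensibility argument above.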