detections-semantic/_docs/02_plans/architecture.md (Oleksandr Bezdieniezhnykh, initial commit, 2026-03-26)

# Semantic Detection System — Architecture

## 1. System Context

Problem being solved: Reconnaissance UAVs with YOLO-based object detection cannot identify camouflaged/concealed military positions (FPV operator hideouts, hidden artillery, dugouts masked by branches). A semantic detection layer is needed that detects footpaths, traces them to endpoints, and identifies concealed structures — controlling the camera gimbal through a two-level scan strategy (wide sweep + detailed investigation).

System boundaries:

- Inside: Semantic detection pipeline (Tier 1/2/3 inference), scan controller (L1/L2 Behavior Tree), gimbal driver (ViewLink serial), frame recorder, detection logger, system health monitor
- Outside: Existing YOLO detection pipeline, GPS-denied navigation, mission planning, annotation tooling, training pipelines, operator display

External systems:

| System | Integration Type | Direction | Purpose |
|---|---|---|---|
| Existing YOLO Pipeline | REST API (in-process or local HTTP) | Inbound | Provides scene-level detections (vehicles, roads, buildings) as context |
| ViewPro A40 Gimbal | UART serial (ViewLink protocol) | Outbound | Camera pan/tilt/zoom commands |
| GPS-Denied System | Shared memory / API | Inbound | Provides current GPS-denied coordinates for detection logging |
| Operator Display | REST API / shared detection output | Outbound | Delivers detection results (bounding boxes + metadata) |
| NVMe Storage | Filesystem | Both | Frame recording, detection logs, model files, config |

## 2. Technology Stack

| Layer | Technology | Version | Rationale |
|---|---|---|---|
| Language (core) | Cython / C | — | Extends existing detection codebase; maximum performance |
| Language (VLM) | Python | 3.11 | NanoLLM and VLM libraries are Python-native |
| Language (tools) | Python | 3.11 | Configuration, logging, frame recording utilities |
| Inference (Tier 1) | TensorRT FP16 | JetPack 6.2 bundled | Fastest inference on Jetson; FP16 is stable (INT8 deferred) |
| Inference (Tier 3) | NanoLLM (MLC/TVM) | 24.7+ | Purpose-built for Jetson VLM inference; Docker-based |
| Detection model | YOLOE (yoloe-11s-seg or yoloe-26s-seg) | Ultralytics 8.4.x (pinned) | Open-vocabulary segmentation; backbone selected empirically |
| VLM model | VILA1.5-3B (4-bit MLC) | — | Confirmed on Orin Nano; multimodal; stable via NanoLLM |
| Image processing | OpenCV + scikit-image | 4.x | Skeletonization, morphology, frame quality assessment |
| Orchestration | py_trees | 2.4.0 | Behavior tree for scan controller; extensible, preemptive |
| Serial comm | pyserial + crcmod | — | ViewLink gimbal protocol with CRC-16 |
| IPC | Unix domain socket | — | Semantic process ↔ VLM process communication |
| Containerization | Docker | JetPack 6.2 container | VLM runs in NanoLLM Docker; main service in existing Docker |
| Configuration | YAML | — | All thresholds, class names, scan parameters, degradation levels |
| Platform | Jetson Orin Nano Super | JetPack 6.2 | 67 TOPS, 8GB LPDDR5, NVMe SSD boot |

Key constraints from restrictions.md:

- 8GB shared RAM: YOLO takes ~2GB; semantic + VLM must fit in ~6GB. GPU workloads are scheduled sequentially (no concurrent YOLO + VLM).
- Cython + TRT codebase: new modules must integrate with the existing Cython build system
- Air-gapped: no cloud connectivity; all inference is local. Updates arrive via USB drive.
- ViewPro A40 zoom transition: zoom changes take 1-2 seconds, a physical constraint that affects L1→L2 timing

## 3. Deployment Model

Environments: Development (workstation with GPU), Production (Jetson Orin Nano Super on UAV)

Infrastructure:

- Production: Jetson Orin Nano Super with ruggedized carrier board (MILBOX-ORNX or similar), NVMe SSD, active cooling
- Development: x86 workstation with NVIDIA GPU (for model training and testing)
- No cloud, no staging environment — field-deployed edge device

Environment-specific configuration:

| Config | Development | Production |
|---|---|---|
| Inference engine | ONNX Runtime (CPU/GPU) or TRT on dev GPU | TensorRT FP16 on Jetson |
| Gimbal | Mock serial (TCP socket) | Real UART to ViewPro A40 |
| VLM | NanoLLM Docker or direct Python | NanoLLM Docker on Jetson |
| Storage | Local filesystem | NVMe SSD (industrial grade) |
| Logging | Console + file | JSON-lines to NVMe |
| Thermal monitor | Disabled | Active (tegrastats) |
| Power monitor | Disabled | Active (INA sensors) |
| Config file | config.dev.yaml | config.prod.yaml |
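A production config along these lines could capture the table above. The key names below are illustrative assumptions — the real schema lives in the project's `config.prod.yaml`:

```yaml
# config.prod.yaml — illustrative fragment; key names are assumptions
inference:
  engine: tensorrt
  precision: fp16
gimbal:
  port: /dev/ttyTHS1     # assumed UART device node
  baud: 115200
  retries: 3
vlm:
  enabled: true
  socket: /run/semantic/vlm.sock
  timeout_s: 5
monitoring:
  thermal: true          # tegrastats
  power: true            # INA sensors
logging:
  format: jsonl
```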

## 4. Data Model Overview

Core entities:

See data_model.md for full details. Summary:

- Runtime structs (in-memory only): `FrameContext`, `YoloDetection` (external input), `POI`, `GimbalState`
- Persistent (NVMe flat files): `DetectionLogEntry` (JSON-lines), `HealthLogEntry` (JSON-lines), `RecordedFrames` (JPEG), `Config` (YAML)

No database. Transient processing artifacts (segmentation masks, skeletons, endpoint crops) are created, consumed, and discarded within a single frame's processing cycle.
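The runtime structs might look roughly like the following. These shapes are illustrative only — field names are assumptions; data_model.md holds the authoritative definitions:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class GimbalState:
    pan_deg: float = 0.0
    tilt_deg: float = 0.0
    zoom_level: int = 1

@dataclass
class POI:
    """Point of interest produced by Tier 2, queued for L2 investigation."""
    x: float
    y: float
    confidence: float
    source_tier: int = 2
    gps_denied_coords: Optional[tuple] = None  # None if nav system unavailable

@dataclass
class FrameContext:
    """Per-frame working state; discarded after the processing cycle."""
    frame_id: int
    timestamp: float
    gimbal: GimbalState = field(default_factory=GimbalState)
    pois: list = field(default_factory=list)
```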

Data flow summary:

- Camera → Frame → YOLO (external) → detections → SemanticPipeline → detection log + operator
- SemanticPipeline → ScanController → GimbalDriver → ViewPro A40
- Frame → Recorder → NVMe (JPEG) + Logger → NVMe (JSON-lines)

## 5. Integration Points

### Internal Communication

| From | To | Protocol | Pattern | Notes |
|---|---|---|---|---|
| ScanController | Tier1Detector | Direct function call (Cython) | Sync pipeline | Same process, frame buffer shared |
| Tier1Detector | Tier2SpatialAnalyzer | Direct function call | Sync pipeline | Segmentation mask or detection list passed in memory |
| ScanController | VLMProcess | Unix domain socket (JSON) | Async request-response | VLM in separate Docker container; 5s timeout |
| ScanController | GimbalDriver | Direct function call | Command queue | Scan controller pushes target angles |
| GimbalDriver | ViewPro A40 | UART serial (ViewLink protocol) | Command-response | 115200 baud; use native ViewLink checksum if available, add CRC-16 only if protocol lacks integrity checks |
| ScanController | Health monitor (inline) | `health_check()` at top of main loop | Capability flags | Reads tegrastats, gimbal heartbeat, VLM status; no separate thread |

### External Integrations

| External System | Protocol | Auth | Rate Limits | Failure Mode |
|---|---|---|---|---|
| Existing YOLO Pipeline | In-process call or local HTTP (localhost) | None (same device) | Frame rate (10-30 FPS) | `semantic_available=false` → YOLO-only mode |
| GPS-Denied System | Shared memory or local API | None (same device) | Per-frame | Coordinates logged as null if unavailable |
| Operator Display | Detection output format (same as YOLO) | None (same device) | Per-detection | Detections queued if display unavailable |
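The inline `health_check()` row can be sketched as a function returning capability flags rather than raising, so the main loop degrades instead of stopping. The probe callables below are hypothetical stand-ins for the real tegrastats/heartbeat/IPC checks:

```python
# Hedged sketch of the inline health check; probe functions are placeholders.
def health_check(thermal_ok=lambda: True,
                 gimbal_heartbeat=lambda: True,
                 vlm_responsive=lambda: True) -> dict:
    """Run at the top of each main-loop iteration; no separate thread."""
    flags = {
        "semantic": True,               # this process is alive if we got here
        "gimbal": gimbal_heartbeat(),   # e.g. last serial heartbeat is recent
        "vlm": vlm_responsive(),        # e.g. last IPC ping succeeded
    }
    if not thermal_ok():                # e.g. tegrastats junction-temp check
        flags["vlm"] = False            # shed the heaviest GPU load first
    return flags
```

Consumers read the flags each tick; a dropped flag disables the corresponding tier or behavior rather than halting the pipeline.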

## 6. Non-Functional Requirements

| Requirement | Target | Measurement | Priority |
|---|---|---|---|
| Tier 1 latency (p95) | ≤100ms per frame | TRT inference time on Jetson | High |
| Tier 2 latency (p95) | ≤200ms per ROI (V2 CNN) / ≤50ms (V1 heuristic) | Processing time from mask to classification | High |
| Tier 3 latency (p95) | ≤5s per ROI | VLM request-to-response via IPC | Medium |
| Memory (semantic+VLM) | ≤6GB peak | tegrastats monitoring | High |
| Thermal (sustained) | T_junction < 75°C | tegrastats, 60-min test | High |
| Throughput | ≥8 FPS sustained (Tier 1) | Frames processed per second | High |
| Availability | Capability-flag degradation (vlm, gimbal, semantic) | Continuous operation despite component failures | High |
| Cold start | ≤60s to first detection | Power-on to first result | Medium |
| Recording endurance | ≥2 hours at Level 2 rate | NVMe write, 256GB SSD | Medium |
| Data retention | Until NVMe full (circular buffer) | Oldest L1 frames overwritten first | Low |
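The oldest-first retention policy amounts to deleting recorded frames until enough space is free. A minimal sketch, assuming a flat directory of JPEGs and an injectable free-space probe (the real recorder's layout and thresholds may differ):

```python
import os
from pathlib import Path

def enforce_retention(frame_dir, min_free_bytes, free_bytes=None):
    """Delete oldest recorded frames until enough space is free."""
    if free_bytes is None:
        def free_bytes():
            st = os.statvfs(frame_dir)          # POSIX free-space query
            return st.f_bavail * st.f_frsize
    deleted = 0
    for frame in sorted(Path(frame_dir).glob("*.jpg"),
                        key=lambda p: p.stat().st_mtime):  # oldest first
        if free_bytes() >= min_free_bytes:
            break
        frame.unlink()
        deleted += 1
    return deleted
```

Called periodically by the recorder, this keeps the newest frames while older L1 sweep frames age out first.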

## 7. Security Architecture

Authentication: None required — all components are local on the same Jetson device, air-gapped network.

Authorization: N/A — single-user system, operator interacts via separate display system.

Data protection:

- At rest: No encryption (performance priority on edge device; physical security assumed via UAV possession)
- In transit: N/A (all communication is local — UART, Unix socket, localhost)
- Secrets management: No secrets — no API keys, no cloud credentials. Model files are not sensitive (publicly available architectures).

Audit logging: Detection log (JSON-lines) records every detection with timestamp, coordinates, confidence, tier. Gimbal command log records every command sent. Both stored on NVMe. Retained until overwritten by circular buffer or manually extracted via USB.
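A JSON-lines detection log entry can be appended as a single line per detection. The field names here are illustrative assumptions consistent with the entities listed in section 4, not the committed schema:

```python
import json
import time

def log_detection(fh, *, cls, confidence, tier, bbox, coords=None):
    """Append one detection as a single JSON line (fire-and-forget write)."""
    entry = {
        "ts": time.time(),
        "class": cls,
        "confidence": confidence,
        "tier": tier,        # 1, 2, or 3
        "bbox": bbox,        # [x, y, w, h] in pixels
        "coords": coords,    # GPS-denied fix, or None if unavailable
    }
    fh.write(json.dumps(entry) + "\n")
    return entry
```

One line per record keeps writes append-only and crash-tolerant, and each line stays independently parseable when extracted via USB.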

## 8. Key Architectural Decisions

### ADR-001: Three-tier inference architecture

Context: Need both fast initial detection (≤100ms) and deep semantic analysis (≤5s). Single model cannot achieve both.

Decision: Three tiers — Tier 1 (YOLOE TRT, ≤100ms), Tier 2 (path tracing + heuristic/CNN, ≤200ms), Tier 3 (VLM, ≤5s, optional). Each tier runs only when the previous tier triggers it.

Alternatives considered:

  1. Single VLM for all analysis — rejected: too slow for real-time scanning (>2s per frame)
  2. YOLO + VLM only (no Tier 2) — rejected: VLM would be invoked too frequently, saturating GPU

Consequences: More complex pipeline; three models to manage; but enables real-time scanning with deep analysis only when needed.
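The gating rule — each tier runs only when the previous one triggers it — can be sketched as follows. The detector callables and return shapes are illustrative placeholders, not the real pipeline API:

```python
# Sketch of the ADR-001 tier cascade; budgets are from the NFR table.
def run_cascade(frame, tier1, tier2, tier3, vlm_available=True):
    """Return (result, deepest_tier_run) for one frame."""
    dets = tier1(frame)               # ≤100 ms: YOLOE TRT detections
    if not dets:
        return None, 1                # nothing found — stay in Tier 1
    pois = tier2(frame, dets)         # ≤200 ms: path tracing / heuristic-CNN
    if not pois or not vlm_available:
        return pois, 2                # no POIs, or VLM capability flag down
    verdicts = tier3(frame, pois)     # ≤5 s: VLM analysis (optional)
    return verdicts, 3
```

The `vlm_available` flag is where capability-flag degradation plugs in: Tier 3 is simply skipped when the VLM is unloaded or unhealthy.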

### ADR-002: NanoLLM instead of vLLM for VLM runtime

Context: VLM process needs stable inference on Jetson Orin Nano 8GB. vLLM has documented system freezes and crashes on this hardware.

Decision: Use NanoLLM (NVIDIA's Jetson-optimized library) with Docker containers and MLC/TVM quantization.

Alternatives considered:

  1. vLLM — rejected: system freezes, reboots, installation crashes (multiple open GitHub issues)
  2. llama.cpp — kept as fallback for GGUF models not supported by NanoLLM

Consequences: Limited model selection (VILA, LLaVA, Obsidian); UAV-VL-R1 only available via llama.cpp fallback.

### ADR-003: YOLOE backbone selection deferred to empirical benchmark

Context: YOLO26 has reported accuracy regression on custom datasets vs YOLO11. Both are supported by YOLOE.

Decision: Support both yoloe-11s-seg and yoloe-26s-seg as configurable backends. Sprint 1 benchmarks on real annotated data determine the winner.

Alternatives considered:

  1. Commit to YOLO26 — rejected: reported regression risk
  2. Commit to YOLO11 — rejected: YOLO26 has better NMS-free deployment and small-object features

Consequences: Must maintain two TRT engine files; config switch; slightly more build complexity.

### ADR-004: FP16 only, INT8 deferred

Context: TensorRT INT8 export crashes on Jetson Orin (JetPack 6, TRT 10.3.0) during calibration.

Decision: Use FP16 for all TRT engines in initial deployment. INT8 optimization deferred to Phase 3+.

Alternatives considered:

  1. INT8 from day one — rejected: documented crashes, unstable tooling
  2. Mixed precision (FP16 backbone, INT8 head) — rejected: adds complexity without proven stability

Consequences: ~2x slower than INT8 theoretical maximum; acceptable given FP16 already meets latency targets.

### ADR-005: VLM as separate Docker process with IPC

Context: VLM (NanoLLM) runs in a Docker container with specific CUDA/MLC dependencies. Cannot be compiled into Cython codebase.

Decision: VLM runs as a separate Docker container. Communication via Unix domain socket (JSON messages). Loaded dynamically during Level 2 only; unloaded to free GPU memory during Level 1.

Alternatives considered:

  1. VLM compiled into main process — rejected: dependency incompatibility with Cython + TRT pipeline
  2. VLM always loaded — rejected: consumes ~3GB GPU memory that's needed for YOLO during Level 1

Consequences: IPC latency overhead (~10ms); container management complexity; but clean separation and memory efficiency.
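JSON over a stream socket needs message framing. One common scheme — assumed here, since the document does not pin a wire format — is a 4-byte big-endian length prefix followed by the UTF-8 JSON payload:

```python
import json
import socket
import struct

# Hedged sketch of UDS message framing; the real wire format may differ.
def send_msg(sock: socket.socket, obj) -> None:
    payload = json.dumps(obj).encode("utf-8")
    sock.sendall(struct.pack(">I", len(payload)) + payload)

def recv_msg(sock: socket.socket):
    (length,) = struct.unpack(">I", _recv_exactly(sock, 4))
    return json.loads(_recv_exactly(sock, length))

def _recv_exactly(sock, n):
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed mid-message")
        buf += chunk
    return buf
```

On the client side, `sock.settimeout(5.0)` before `recv_msg` enforces the 5s timeout from the integration table; a `socket.timeout` then drops the `vlm` capability flag.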

### ADR-006: NVMe SSD mandatory, no SD card

Context: Recurring SD card corruption documented on Jetson Orin Nano. Production module has no eMMC.

Decision: NVMe SSD for OS, models, recording, logging. Industrial-grade SSD with vibration-resistant mount.

Alternatives considered:

  1. SD card — rejected: documented corruption issues across multiple brands
  2. USB drive — rejected: slower, less reliable under vibration

Consequences: Additional hardware cost (~$40-80); requires NVMe-compatible carrier board.

### ADR-007: UART integrity for gimbal communication

Context: ViewPro documents EMI-induced random gimbal panning from antenna interference. UART communication needs error detection.

Decision: First, check if ViewLink Serial Protocol V3.3.3 includes native checksums (read full spec during implementation). If yes, use the native checksum and add retry logic on checksum failure. If no native checksum exists, add CRC-16 (CRC-CCITT) wrapper. Either way: retry up to 3 times on integrity failure, log errors.

Alternatives considered:

  1. No error detection — rejected: EMI is a documented real-world issue
  2. Always add custom CRC regardless — rejected: may conflict with native protocol

Consequences: Depends on spec reading; physical EMI mitigation (shielded cable, 35cm antenna separation) still needed regardless.
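If the spec reading lands on the CRC-16 wrapper branch, the frame logic is small. In the codebase this would come from `crcmod`; the stdlib-only sketch below shows CRC-16/CCITT-FALSE, and the frame layout (payload followed by a 2-byte big-endian CRC) is an assumption:

```python
def crc16_ccitt(data: bytes, crc: int = 0xFFFF) -> int:
    """CRC-16/CCITT-FALSE: polynomial 0x1021, initial value 0xFFFF."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
        crc &= 0xFFFF
    return crc

def frame(payload: bytes) -> bytes:
    """Append the CRC to an outgoing gimbal command."""
    return payload + crc16_ccitt(payload).to_bytes(2, "big")

def verify(framed: bytes) -> bytes:
    """Check an incoming frame; caller retries up to 3 times and logs."""
    payload, received = framed[:-2], int.from_bytes(framed[-2:], "big")
    if crc16_ccitt(payload) != received:
        raise ValueError("CRC mismatch")
    return payload
```

A single bit flipped by EMI makes `verify` raise, triggering the retry-and-log path instead of a spurious gimbal command.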

### ADR-008: Behavior Tree for ScanController orchestration

Context: ScanController manages two scan levels (L1 sweep, L2 investigation), health preemption, POI queueing, and future extensions (spiral search, thermal scan). Need a pattern that handles preemption cleanly and is extensible.

Decision: Use py_trees (2.4.0) Behavior Tree. Root Selector tries HealthGuard → L2Investigation → L1Sweep → Idle. Leaf nodes are simple procedural calls into existing components. Shared state via py_trees Blackboard.

Alternatives considered:

  1. Flat state machine — rejected: adding new scan modes requires rewiring transitions; preemption logic becomes tangled
  2. Hierarchical state machine — viable but less standard for autonomous vehicles; less tooling support
  3. Hybrid (BT + procedural leaves) — this is essentially what we chose; BT structure with procedural leaf logic

Consequences: Adds py_trees dependency (~150KB); tree tick overhead negligible (<1ms); ASCII tree rendering aids debugging; new scan behaviors added as subtrees without modifying existing ones.
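py_trees supplies the real `Selector`, `Sequence`, and `Blackboard` classes; the dependency-free sketch below only illustrates the tick semantics of the root Selector described above (HealthGuard → L2Investigation → L1Sweep → Idle). The leaf behaviours are hypothetical stand-ins:

```python
# Priority selection: tick children in order; first non-FAILURE wins.
def selector_tick(children, blackboard):
    for name, behaviour in children:
        status = behaviour(blackboard)  # "SUCCESS" / "RUNNING" / "FAILURE"
        if status != "FAILURE":
            return name, status
    return None, "FAILURE"

# Hypothetical leaves: HealthGuard preempts when degradation is flagged,
# L2Investigation runs while POIs are queued, otherwise L1Sweep sweeps.
children = [
    ("HealthGuard", lambda bb: "RUNNING" if bb.get("degraded") else "FAILURE"),
    ("L2Investigation", lambda bb: "RUNNING" if bb.get("poi_queue") else "FAILURE"),
    ("L1Sweep", lambda bb: "RUNNING"),
    ("Idle", lambda bb: "SUCCESS"),
]
```

Because higher-priority children are re-evaluated on every tick, a health failure or a newly queued POI preempts the L1 sweep without any explicit transition wiring — the property that motivated choosing a BT over a flat state machine.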