Oleksandr Bezdieniezhnykh 8e2ecf50fd Initial commit
2026-03-26 00:20:30 +02:00


Solution Draft

Assessment Findings

| Old Component / Solution | Weak Point (functional/security/performance) | New Solution |
|---|---|---|
| YOLO26 as sole detection backbone | Accuracy regression on custom datasets: YOLO26s reported "much less accurate" than YOLO11s on identical training data (GitHub #23206). YOLO26 is 3 months old — less battle-tested than YOLO11. | Benchmark YOLO26 vs YOLO11 on initial annotated data before committing. YOLO11 as fallback. YOLOE supports both backbones (yoloe-11s-seg, yoloe-26s-seg). |
| YOLO26 TensorRT INT8 export | INT8 export fails on Jetson (TRT Error Code 2, OOM). Fix merged (PR #23928) but indicates fragile tooling. | Use FP16 only for initial deployment (confirmed stable). INT8 as future optimization after tooling matures. Pin Ultralytics version + JetPack version. |
| vLLM as VLM runtime | Unstable on Jetson Orin Nano: system freezes, reboots, installation crashes, excessive memory (multiple open issues). Not production-ready for 8GB devices. | Replace with NanoLLM/NanoVLM — purpose-built for Jetson by NVIDIA's Dusty-NV team. Docker containers for JetPack 5/6. Supports VILA, LLaVA. Stable. Or use llama.cpp with GGUF models (proven on Jetson). |
| No storage strategy | SD card corruption: recurring corruption documented across multiple Jetson Orin Nano users. SD cards unsuitable for production. | Mandatory NVMe SSD for OS + models + logging. No SD card in production. Ruggedized NVMe mount for vibration resistance. |
| No EMI protection on UART | ViewPro documents EMI issues: antennas cause random gimbal panning if within 35cm. Standard UART parity bit insufficient for noisy UAV environment. | Add CRC-16 checksum layer on gimbal commands. Enforce 35cm antenna separation in physical design. Consider shielded UART cable. Command retry on CRC failure (max 3 retries, then log error). |
| No environmental hardening addressed | UAV environment: vibration, temperature extremes (-20°C to +50°C), dust, EMI, power fluctuations. Dev kit form factor is not field-deployable. | Use ruggedized carrier board (MILBOX-ORNX or similar) with vibration dampening. Conformal coating on exposed connectors. External temperature sensor for environmental monitoring. |
| No logging or telemetry | No post-flight review capability: field system must log all detections with metadata for model iteration, operator review, and evidence collection. | Add detection logging: timestamp, GPS-denied coordinates, confidence score, detection class, JPEG thumbnail, tier that triggered, freshness metadata. Log to NVMe SSD. Export as structured format (JSON lines) after flight. |
| No frame recording for offline replay | Training data collection depends on field recording: without recording, there is no way to build a training dataset from real flights. | Record all camera frames to NVMe at configurable rate (1-5 FPS during Level 1, full rate during Level 2). Include detection overlay option. Post-flight: use recordings for annotation. |
| No power management | UAV power budget is finite: Jetson at 15W + gimbal + camera + radio. No monitoring of power draw or load shedding. | Monitor power consumption via Jetson's INA sensors. Power budget alert at 80% of allocated watts. Load shedding: disable VLM first, then reduce inference rate, then disable semantic detection. |
| YOLO26 not validated for this domain | No benchmark on aerial concealment detection: all YOLO26 numbers are on COCO/LVIS. Concealment detection may behave very differently. | First sprint deliverable: benchmark YOLOE (both 11 and 26 backbones) on semantic01-04.png with text/visual prompts. Report AP on initial annotated validation set before committing to backbone. |
| Freshness and path tracing are untested algorithms | No proven prior art: both freshness assessment and path-following via skeletonization are novel combinations. Risk of over-engineering before validation. | Implement minimal viable versions first. V1 path tracing: skeleton + endpoint only, no freshness, no junction following. Validate on real flight data before adding complexity. |

Product Solution Description

A three-tier semantic detection system for identifying concealed/camouflaged positions from reconnaissance UAV aerial imagery, running on Jetson Orin Nano Super with NVMe SSD storage, active cooling, and ruggedized carrier board, alongside the existing YOLO detection pipeline.

┌──────────────────────────────────────────────────────────────────────────┐
│          JETSON ORIN NANO SUPER (ruggedized carrier, NVMe, 15W)         │
│                                                                          │
│  ┌──────────┐    ┌──────────────┐    ┌──────────────┐    ┌───────────┐  │
│  │ ViewPro  │───▶│  Tier 1      │───▶│  Tier 2      │───▶│ Tier 3    │  │
│  │ A40      │    │  YOLOE       │    │  Path Trace  │    │ VLM       │  │
│  │ Camera   │    │  (11 or 26   │    │  + CNN       │    │ NanoLLM   │  │
│  │ + Frame  │    │   backbone)  │    │  ≤200ms      │    │ (L2 only) │  │
│  │ Quality  │    │  TRT FP16    │    │              │    │ ≤5s       │  │
│  │ Gate     │    │  ≤100ms      │    │              │    │           │  │
│  └────▲─────┘    └──────────────┘    └──────────────┘    └───────────┘  │
│       │                                                                  │
│  ┌────┴─────┐    ┌──────────────┐    ┌──────────────┐    ┌───────────┐  │
│  │ Gimbal   │◀───│  Scan        │    │  Watchdog    │    │ Recorder  │  │
│  │ Control  │    │  Controller  │    │  + Thermal   │    │ + Logger  │  │
│  │ + CRC    │    │  (L1/L2 FSM) │    │  + Power     │    │ (NVMe)    │  │
│  └──────────┘    └──────────────┘    └──────────────┘    └───────────┘  │
│                                                                          │
│  ┌──────────────────────────────┐                                       │
│  │  Existing YOLO Detection    │ (always running, scene context)        │
│  │  Cython + TRT               │                                        │
│  └──────────────────────────────┘                                       │
└──────────────────────────────────────────────────────────────────────────┘

Key changes from draft02:

  • YOLOE backbone is configurable (YOLO11 or YOLO26) — benchmark before committing
  • NanoLLM replaces vLLM as VLM runtime (purpose-built for Jetson, stable)
  • NVMe SSD mandatory — no SD card in production
  • CRC-16 on gimbal UART — EMI protection
  • Detection logger + frame recorder — post-flight review and training data collection
  • Ruggedized carrier board — vibration, temperature, dust protection
  • Power monitoring + load shedding — finite UAV power budget
  • FP16 only for initial deployment (INT8 export unstable on Jetson)
  • Minimal V1 for unproven components — path tracing and freshness start simple

Architecture

Component 1: Tier 1 — Real-Time Detection

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|---|---|---|---|---|---|---|---|
| YOLOE with configurable backbone (recommended) | yoloe-11s-seg.pt or yoloe-26s-seg.pt, set_classes() → TRT FP16 | Supports both YOLO11 and YOLO26 backbones. Benchmark on real data, pick winner. set_classes() bakes CLIP embeddings for zero overhead. | YOLO26 may regress on custom data vs YOLO11. Needs empirical comparison. | Ultralytics ≥8.4 (pinned version), TensorRT, JetPack 6.2 | Local only | YOLO11s TRT FP16: ~7ms (640px). YOLO26s: similar or slightly faster. | Best fit. Hedge against backbone risk. |

Version pinning strategy:

  • Pin ultralytics==8.4.X (specific patch version validated on Jetson)
  • Pin JetPack 6.2 + TensorRT version
  • Test every Ultralytics update in staging before deploying to production
  • Keep both yoloe-11s-seg and yoloe-26s-seg TRT engines on NVMe; switch via config

YOLO backbone selection process (Sprint 1):

  1. Annotate 200 frames from real flight footage (footpaths, branch piles, entrances)
  2. Fine-tune YOLOE-11s-seg and YOLOE-26s-seg on same dataset, same hyperparameters
  3. Evaluate on held-out validation set (50 frames)
  4. Pick backbone with higher mAP50
  5. If delta < 2%: pick YOLO26 (faster CPU inference, NMS-free deployment)
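The decision rule in steps 4-5 is mechanical, so it is worth encoding once and re-applying verbatim every time the benchmark is re-run. A minimal sketch; the 2% delta comes from step 5 and the returned names are the YOLOE checkpoint variants above:

```python
def pick_backbone(map50_11s: float, map50_26s: float, delta: float = 0.02) -> str:
    """Sprint 1 decision rule: higher mAP50 wins, except that a near-tie
    (gap below `delta`) goes to YOLO26 for its NMS-free deployment path."""
    if abs(map50_11s - map50_26s) < delta:
        return "yoloe-26s-seg"  # near-tie: prefer the newer, NMS-free backbone
    return "yoloe-11s-seg" if map50_11s > map50_26s else "yoloe-26s-seg"
```

For example, `pick_backbone(0.61, 0.54)` keeps YOLO11, while `pick_backbone(0.60, 0.59)` takes YOLO26 under the near-tie rule.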

Component 2: Tier 2 — Spatial Reasoning & CNN Confirmation

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|---|---|---|---|---|---|---|---|
| V1 minimal path tracing + heuristic classifier (recommended for initial release) | OpenCV, scikit-image | No training data needed. Skeleton + endpoint detection + simple heuristic: "dark mass at endpoint → flag." Fast to implement and validate. | Low accuracy. Many false positives. | OpenCV, scikit-image | Offline | ~30ms | V1: ship fast, validate on real data. |
| V2 trained CNN (after data collection) | MobileNetV3-Small, TensorRT FP16 | Higher accuracy after training. Dynamic ROI sizing. | Needs 300+ positive, 1000+ negative annotated ROI crops. | PyTorch, TRT export | Offline | ~5-10ms classification | V2: replace heuristic once data exists. |

V1 heuristic for endpoint analysis (no training data needed):

  1. Skeletonize footpath mask with branch pruning
  2. Find endpoints
  3. For each endpoint: extract ROI (dynamic size based on GSD)
  4. Compute: mean_darkness = mean intensity in ROI center 50%. contrast = (surrounding_mean - center_mean) / surrounding_mean. area_ratio = dark_pixel_count / total_pixels.
  5. If mean_darkness < threshold_dark AND contrast > threshold_contrast → flag as potential concealed position
  6. Thresholds: configurable, tuned per season. Start with winter values.
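Steps 2-5 can be sketched with NumPy alone once a binary footpath skeleton exists (step 1, e.g. from `skimage.morphology.skeletonize` with branch pruning). The ROI half-size and the darkness/contrast thresholds below are illustrative placeholders, not tuned values; in the real system the ROI size is derived from GSD per step 3:

```python
import numpy as np

def find_endpoints(skel: np.ndarray) -> list:
    """Step 2: endpoints of a binary skeleton are pixels with exactly one
    8-connected neighbour. `skel` is a 0/1 array."""
    pad = np.pad(skel.astype(np.uint8), 1)
    endpoints = []
    for r, c in zip(*np.nonzero(skel)):
        # 3x3 neighbourhood sum in the padded image, minus the pixel itself
        if pad[r:r + 3, c:c + 3].sum() - 1 == 1:
            endpoints.append((int(r), int(c)))
    return endpoints

def flag_endpoint(gray: np.ndarray, r: int, c: int, half: int = 32,
                  dark_thr: float = 60.0, contrast_thr: float = 0.25) -> bool:
    """Steps 3-5: extract an ROI around the endpoint, compare its central 50%
    against the surrounding ring, and flag a dark, high-contrast mass."""
    h, w = gray.shape
    patch = gray[max(r - half, 0):min(r + half, h),
                 max(c - half, 0):min(c + half, w)].astype(float)
    ph, pw = patch.shape
    center = patch[ph // 4:ph - ph // 4, pw // 4:pw - pw // 4]  # central 50%
    ring_n = patch.size - center.size
    surround_mean = (patch.sum() - center.sum()) / max(ring_n, 1)
    contrast = (surround_mean - center.mean()) / max(surround_mean, 1e-6)
    return bool(center.mean() < dark_thr and contrast > contrast_thr)
```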

V1 freshness (metadata only, not a filter):

  • contrast_ratio of path vs surrounding terrain
  • Report as: "high contrast" (likely fresh) / "low contrast" (likely stale)
  • No binary classification. Operator sees all detections with freshness tag.
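The freshness tag reduces to one ratio. A sketch, with the threshold as an assumed starting point to be tuned per season:

```python
def freshness_tag(path_mean: float, terrain_mean: float, thr: float = 0.3) -> str:
    """V1 freshness as metadata only: contrast ratio of the path against the
    surrounding terrain. The tag is attached to the detection for the
    operator, never used to filter it."""
    contrast = abs(terrain_mean - path_mean) / max(terrain_mean, 1e-6)
    return "high_contrast" if contrast >= thr else "low_contrast"
```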

Component 3: Tier 3 — VLM Deep Analysis

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|---|---|---|---|---|---|---|---|
| NanoLLM with VILA-2.7B or VILA1.5-3B (recommended) | NanoLLM Docker container, MLC/TVM quantization | Purpose-built for Jetson by NVIDIA team. Stable Docker containers. Optimized memory management. Supports VLMs natively. | Limited model selection (VILA, LLaVA, Obsidian). Not all VLMs available. | Docker, JetPack 6, NVMe for container storage | Local only, container isolation | ~15-25 tok/s on Orin Nano (4-bit MLC) | Most stable Jetson VLM option. |
| llama.cpp with GGUF VLM | llama.cpp, GGUF model files | Lightweight. No Docker needed. Proven stability on Jetson. Wide model support. | Manual build. Less optimized than NanoLLM for Jetson GPU. | llama.cpp build, GGUF weights | Local only | ~10-20 tok/s estimated | Fallback if NanoLLM doesn't support needed model. |
| vLLM | vLLM Docker | High throughput. | System freezes, reboots, installation crashes on Orin Nano. Multiple open bugs. Not production-ready. | N/A | N/A | N/A | Not recommended. |

Model selection for NanoLLM:

  • Primary: VILA1.5-3B (confirmed on Orin Nano, multimodal, 4-bit MLC)
  • If UAV-VL-R1 GGUF weights become available: use via llama.cpp (aerial-specialized)
  • Fallback: Obsidian-3B (mini VLM, lower accuracy but very fast)

Component 4: Camera Gimbal Control

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|---|---|---|---|---|---|---|---|
| ViewLink serial driver + CRC-16 + PID + watchdog (recommended) | pyserial, crcmod, PID library, threading | Robust communication. CRC catches EMI-corrupted commands. Retry logic. Watchdog. | ViewLink protocol implementation from spec. Physical EMI mitigation required. | ViewPro docs, UART, shielded cable, 35cm antenna separation | Physical only | <10ms command + CRC overhead negligible | Production-grade. |

UART reliability layer:

Packet format: [SOF(2)] [CMD(N)] [CRC16(2)]
- SOF: 0xAA 0x55 (start of frame)
- CMD: ViewLink command bytes per protocol spec
- CRC16: CRC-CCITT over CMD bytes
  • On send: compute CRC-16, append to ViewLink command packet
  • On receive (gimbal feedback): validate CRC-16. Discard corrupted frames.
  • On CRC failure (send): retry up to 3 times with 10ms delay. Log failure after 3 retries.
  • Note: Check if ViewLink protocol already includes checksums (read full spec first). If so, use native checksum; don't add redundant CRC.
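A minimal sketch of the framing and validation logic, assuming the CRC-16/CCITT-FALSE variant (poly 0x1021, init 0xFFFF); as the note above says, the gimbal's native checksum should replace this if the ViewLink spec already defines one:

```python
def crc16_ccitt(data: bytes, crc: int = 0xFFFF) -> int:
    """Bitwise CRC-16/CCITT-FALSE (poly 0x1021, init 0xFFFF). crcmod's
    predefined tables give the same result if a faster version is needed."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
    return crc

SOF = b"\xAA\x55"  # start of frame, per the packet format above

def frame_command(cmd: bytes) -> bytes:
    """Build [SOF(2)][CMD(N)][CRC16(2)]; CRC is computed over CMD bytes only."""
    return SOF + cmd + crc16_ccitt(cmd).to_bytes(2, "big")

def parse_frame(frame: bytes):
    """Validate SOF and CRC on gimbal feedback. None means discard the frame
    as EMI-corrupted; the sender side then falls into the retry path."""
    if len(frame) < 4 or frame[:2] != SOF:
        return None
    cmd, rx_crc = frame[2:-2], int.from_bytes(frame[-2:], "big")
    return cmd if crc16_ccitt(cmd) == rx_crc else None
```

The retry loop (3 attempts, 10ms delay, then log) wraps `frame_command` on the send side; a flipped bit anywhere in CMD or CRC makes `parse_frame` return None.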

Physical EMI mitigation checklist:

  • Gimbal UART cable: shielded, shortest possible run
  • Video/data transmitter antenna: ≥35cm from gimbal (ViewPro recommendation)
  • Independent power supply for gimbal (or filtered from main bus)
  • Ferrite beads on UART cable near Jetson connector

Component 5: Recording, Logging & Telemetry

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|---|---|---|---|---|---|---|---|
| NVMe-backed frame recorder + JSON-lines detection logger (recommended) | OpenCV VideoWriter / JPEG sequences, JSON lines, NVMe SSD | Post-flight review. Training data collection. Evidence. Detection audit trail. NVMe write bandwidth (~500 MB/s) more than sufficient. | Storage: ~2GB/min at 1080p 5FPS JPEG. | NVMe SSD ≥256GB | Physical access to NVMe | ~5ms per frame write (async) | Essential for field deployment. |

Detection log format (JSON lines, one per detection):

```json
{
  "ts": "2026-03-19T14:32:01.234Z",
  "frame_id": 12345,
  "gps_denied_lat": 48.123456,
  "gps_denied_lon": 37.654321,
  "tier": 1,
  "class": "footpath",
  "confidence": 0.72,
  "bbox": [0.12, 0.34, 0.45, 0.67],
  "freshness": "high_contrast",
  "tier2_result": "concealed_position",
  "tier2_confidence": 0.85,
  "tier3_used": false,
  "thumbnail_path": "frames/12345_det_0.jpg"
}
```
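A minimal writer for this format; the log path is illustrative. Line buffering plus one `json.dumps` per detection keeps each record on its own line, so a crash mid-flight costs at most the final record, not the file:

```python
import datetime
import json

class DetectionLogger:
    """Append-only JSON-lines detection log, one record per line on NVMe."""

    def __init__(self, path: str):
        self._fh = open(path, "a", buffering=1)  # line-buffered text mode

    def log(self, **fields) -> None:
        # UTC timestamp in the "ts" format shown above, then caller fields
        record = {"ts": datetime.datetime.now(datetime.timezone.utc)
                        .isoformat(timespec="milliseconds")
                        .replace("+00:00", "Z")}
        record.update(fields)
        self._fh.write(json.dumps(record) + "\n")

    def close(self) -> None:
        self._fh.close()
```

Since `class` is a Python keyword, that field is passed via dict unpacking: `logger.log(frame_id=12345, tier=1, confidence=0.72, **{"class": "footpath"})`.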

Frame recording strategy:

  • Level 1: record every 5th frame (1-2 FPS) — overview coverage
  • Level 2: record every frame (30 FPS) — detailed analysis footage
  • Storage budget: 256GB NVMe ≈ 2 hours at Level 2 full rate, or 10+ hours at Level 1 rate
  • Circular buffer: when storage >80% full, overwrite oldest Level 1 frames (keep Level 2)
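The circular-buffer rule in the last bullet can be sketched as follows. The `<root>/level1` and `<root>/level2` JPEG layout is an assumption of this sketch, not the recorder's actual directory scheme:

```python
import glob
import os

def prune_recordings(root: str, capacity_bytes: int, high_water: float = 0.80) -> list:
    """Once recordings pass `high_water` of capacity, delete the oldest
    Level 1 frames first; Level 2 footage is never touched."""
    level1 = sorted(glob.glob(os.path.join(root, "level1", "*.jpg")),
                    key=os.path.getmtime)  # oldest first
    level2 = glob.glob(os.path.join(root, "level2", "*.jpg"))
    used = sum(os.path.getsize(p) for p in level1 + level2)
    deleted = []
    while used > high_water * capacity_bytes and level1:
        victim = level1.pop(0)
        used -= os.path.getsize(victim)
        os.remove(victim)
        deleted.append(victim)
    return deleted
```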

Component 6: System Health & Resilience

Monitoring threads:

| Monitor | Check Interval | Threshold | Action |
|---|---|---|---|
| Thermal (T_junction) | 1s | >75°C | Degrade to Level 1 only |
| Thermal (T_junction) | 1s | >80°C | Disable semantic detection |
| Power (Jetson INA) | 2s | >80% budget | Disable VLM |
| Power (Jetson INA) | 2s | >90% budget | Reduce inference rate to 5 FPS |
| Gimbal heartbeat | 2s | No response | Force Level 1 sweep pattern |
| Semantic process | 5s | No heartbeat | Restart with 5s backoff, max 3 attempts |
| VLM process | 5s | No heartbeat | Mark Tier 3 unavailable, continue Tier 1+2 |
| NVMe free space | 60s | <20% free | Switch to Level 1 recording rate only |
| Frame quality | per frame | Laplacian var < threshold | Skip frame, use buffered good frame |
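Each row of the table maps to one read/condition/action triple, so a single polling loop can service them all. A sketch; the reader and action callables (e.g. a `read_t_junction` or `disable_semantic`) are hypothetical names wired in at startup:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Monitor:
    """One table row: a sensor read, a trip condition, and an action,
    polled at its own interval from a shared monitoring thread."""
    name: str
    interval_s: float
    read: Callable[[], float]
    tripped: Callable[[float], bool]
    action: Callable[[], None]
    next_due: float = 0.0

    def poll(self, now: float) -> None:
        if now >= self.next_due:
            self.next_due = now + self.interval_s
            if self.tripped(self.read()):
                self.action()
```

For example, the second thermal row becomes `Monitor("thermal_80", 1.0, read_t_junction, lambda t: t > 80.0, disable_semantic)`, with the loop calling `poll(time.monotonic())` on every monitor.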

Graceful degradation (4 levels, unchanged from draft02):

| Level | Condition | Capability |
|---|---|---|
| 0 — Full | All nominal, T < 70°C | Tier 1+2+3, Level 1+2, gimbal, recording |
| 1 — No VLM | VLM unavailable or T > 75°C or power > 80% | Tier 1+2, Level 1+2, gimbal, recording |
| 2 — No semantic | Semantic crashed 3x or T > 80°C | Existing YOLO only, Level 1 sweep, recording |
| 3 — No gimbal | Gimbal UART failed 3x | Existing YOLO only, fixed camera, recording |
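The level selection reduces to checking the worst satisfied condition first. A sketch with thresholds mirroring the table; note the table leaves the 70-75°C band unassigned, and this sketch assumes it still counts as Level 0:

```python
def degradation_level(t_junction_c: float, power_frac: float, vlm_ok: bool,
                      semantic_crashes: int, gimbal_failures: int) -> int:
    """Map current system state to the degradation table; checks run from
    Level 3 down so the worst satisfied condition wins."""
    if gimbal_failures >= 3:
        return 3
    if semantic_crashes >= 3 or t_junction_c > 80.0:
        return 2
    if not vlm_ok or t_junction_c > 75.0 or power_frac > 0.80:
        return 1
    return 0
```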

Component 7: Integration & Deployment

Hardware BOM additions (beyond existing system):

| Item | Purpose | Estimated Cost |
|---|---|---|
| NVMe SSD ≥256GB (industrial grade) | OS + models + recording + logging | $40-80 |
| Active cooling fan (30mm+) | Prevent thermal throttling | $10-20 |
| Ruggedized carrier board (e.g., MILBOX-ORNX or custom) | Vibration, temperature, dust protection | $200-500 |
| Shielded UART cable + ferrite beads | EMI protection for gimbal communication | $10-20 |
| Total additional hardware | | ~$260-620 |

Software deployment:

  • OS: JetPack 6.2 on NVMe SSD
  • YOLOE models: TRT FP16 engines on NVMe (both 11 and 26 backbone variants)
  • VLM: NanoLLM Docker container on NVMe
  • Existing YOLO: current Cython + TRT pipeline (unchanged)
  • New Cython modules: semantic detection, gimbal control, scan controller, recorder
  • VLM process: separate Docker container, IPC via Unix socket
  • Config: YAML file for all thresholds, class names, scan parameters, degradation thresholds

Version control & update strategy:

  • Pin all dependency versions (Ultralytics, TensorRT, NanoLLM, OpenCV)
  • Model updates: swap TRT engine files on NVMe, restart service
  • Config updates: edit YAML, restart service
  • No over-the-air updates (air-gapped system). USB drive for field updates.

Training & Data Strategy

Phase 0: Benchmark Sprint (Week 1-2)

  • Deploy YOLOE-26s-seg and YOLOE-11s-seg in open-vocab mode
  • Test text/visual prompts on semantic01-04.png + 50 additional frames
  • Record results. Pick backbone with better qualitative detection.
  • Deploy V1 heuristic endpoint analysis (no CNN, no training data needed)
  • First field test flight with recording enabled

Phase 1: Field validation & data collection (Week 2-6)

  • Deploy TRT FP16 engine with best backbone
  • Record all flights to NVMe
  • Operator marks detections as true/false positive in post-flight review
  • Build annotation backlog from recorded frames
  • Target: 500 annotated frames by week 6

Phase 2: Custom model training (Week 6-10)

  • Fine-tune YOLOE-Seg on custom dataset (linear probing → full fine-tune)
  • Train MobileNetV3-Small CNN on endpoint ROI crops
  • A/B test: custom model vs YOLOE zero-shot on validation set
  • Deploy winning model as new TRT engine

Phase 3: VLM & refinement (Week 8-14)

  • Deploy NanoLLM with VILA1.5-3B
  • Tune prompting on collected ambiguous cases
  • Train freshness classifier (if enough annotated freshness labels exist)
  • Target: 1500+ images per class

Phase 4: Seasonal expansion (Month 4+)

  • Spring/summer annotation campaigns
  • Re-train all models with multi-season data
  • Adjust heuristic thresholds per season (configurable via YAML)

Testing Strategy

Integration / Functional Tests

  • YOLOE text prompt detection on reference images (both 11 and 26 backbones)
  • TRT FP16 export on Jetson Orin Nano Super (verify no OOM, no crash)
  • V1 heuristic endpoint analysis on 20 synthetic masks (10 with hideouts, 10 without)
  • Frame quality gate: inject blurry frames, verify rejection
  • Gimbal CRC layer: inject corrupted commands, verify retry + log
  • Gimbal watchdog: simulate hang, verify forced Level 1 within 2.5s
  • NanoLLM VLM: load model, run inference on 10 aerial images, verify output + memory
  • VLM load/unload cycle: 10 cycles without memory leak
  • Detection logger: verify JSON-lines format, all fields populated
  • Frame recorder: verify NVMe write speed, no dropped frames at 30 FPS
  • Full pipeline end-to-end on recorded flight footage (offline replay)
  • Graceful degradation: simulate each failure mode, verify correct degradation level

Non-Functional Tests

  • Tier 1 latency on Jetson Orin Nano Super TRT FP16: ≤100ms (both backbones)
  • Tier 2 latency: ≤50ms (V1 heuristic), ≤200ms (V2 CNN)
  • Tier 3 latency (NanoLLM VLM): ≤5 seconds
  • Memory peak: all components loaded < 7GB
  • Thermal: 60-minute sustained inference, T_junction < 75°C with active cooling
  • NVMe endurance: continuous recording for 2 hours, verify no write errors
  • Power draw: measure at each degradation level, verify within UAV power budget
  • EMI test: operate near data transmitter antenna, verify no gimbal anomalies with CRC layer
  • Cold start: power on → first detection within 60 seconds (model load time)
  • Vibration: mount Jetson on vibration table, run inference, compare detection accuracy vs static

Technology Maturity Assessment

| Component | Technology | Maturity | Risk | Mitigation |
|---|---|---|---|---|
| Tier 1 Detection | YOLOE/YOLO26/YOLO11 | Medium — YOLO26 is 3 months old, reported regressions on custom data. YOLOE-26 even newer. | Medium | Benchmark both backbones. YOLO11 is battle-tested fallback. Pin versions. |
| TRT FP16 Export | TensorRT on JetPack 6.2 | High — FP16 is stable on Jetson. Well-documented. | Low | FP16 only. Avoid INT8 initially. |
| TRT INT8 Export | TensorRT on JetPack 6.2 | Low — Documented crashes (PR #23928). Calibration issues. | High | Defer to Phase 3+. FP16 sufficient for now. |
| VLM (NanoLLM) | NanoLLM + VILA-3B | Medium-High — Purpose-built for Jetson by NVIDIA team. Docker-based. Monthly releases. | Low | More stable than vLLM. Use Docker containers. |
| VLM (vLLM) | vLLM on Jetson | Low — System freezes, crashes, open bugs. | High | Do not use. NanoLLM instead. |
| Path Tracing | Skeletonization + OpenCV | High — Decades-old algorithms. Well-understood. | Low | Pruning needed for noisy inputs. |
| CNN Classifier | MobileNetV3-Small + TRT | High — Proven architecture. TRT FP16 stable. | Low | Standard transfer learning. |
| Gimbal Control | ViewLink Serial Protocol | Medium — Protocol documented. ArduPilot driver exists. | Medium | EMI mitigation critical. CRC layer. |
| Freshness Assessment | Novel heuristic | Low — No prior art. Experimental. | High | V1: metadata only, not a filter. Iterate with data. |
| NVMe Storage | Industrial NVMe on Jetson | High — Production standard. SD card alternative is unreliable. | Low | Use industrial-grade SSD. |
| Ruggedized Hardware | MILBOX-ORNX or custom | High — Established product. Designed for Jetson + UAV. | Low | Standard procurement. |

References