# Solution Draft

## Assessment Findings

| Old Component Solution | Weak Point (functional/security/performance) | New Solution |
|------------------------|----------------------------------------------|--------------|
| YOLO26 as sole detection backbone | **Accuracy regression on custom datasets**: YOLO26s reported "much less accurate" than YOLO11s on identical training data (GitHub #23206). YOLO26 is 3 months old — less battle-tested than YOLO11. | Benchmark YOLO26 vs YOLO11 on initial annotated data before committing. YOLO11 as fallback. YOLOE supports both backbones (yoloe-11s-seg, yoloe-26s-seg). |
| YOLO26 TensorRT INT8 export | **INT8 export fails on Jetson** (TRT Error Code 2, OOM). Fix merged (PR #23928) but indicates fragile tooling. | Use FP16 only for initial deployment (confirmed stable). INT8 as a future optimization after the tooling matures. Pin Ultralytics version + JetPack version. |
| vLLM as VLM runtime | **Unstable on Jetson Orin Nano**: system freezes, reboots, installation crashes, excessive memory use (multiple open issues). Not production-ready for 8GB devices. | **Replace with NanoLLM/NanoVLM** — purpose-built for Jetson by NVIDIA's Dusty-NV team. Docker containers for JetPack 5/6. Supports VILA, LLaVA. Stable. Or use llama.cpp with GGUF models (proven on Jetson). |
| No storage strategy | **SD card corruption**: recurring corruption documented across multiple Jetson Orin Nano users. SD cards are unsuitable for production. | **Mandatory NVMe SSD** for OS + models + logging. No SD card in production. Ruggedized NVMe mount for vibration resistance. |
| No EMI protection on UART | **ViewPro documents EMI issues**: antennas cause random gimbal panning if within 35cm. A standard UART parity bit is insufficient for a noisy UAV environment. | Add a CRC-16 checksum layer on gimbal commands. Enforce 35cm antenna separation in the physical design. Consider a shielded UART cable. Command retry on CRC failure (max 3 retries, then log error). |
| No environmental hardening addressed | **UAV environment**: vibration, temperature extremes (-20°C to +50°C), dust, EMI, power fluctuations. Dev kit form factor is not field-deployable. | Use a ruggedized carrier board (MILBOX-ORNX or similar) with vibration dampening. Conformal coating on exposed connectors. External temperature sensor for environmental monitoring. |
| No logging or telemetry | **No post-flight review capability**: a field system must log all detections with metadata for model iteration, operator review, and evidence collection. | Add detection logging: timestamp, GPS-denied coordinates, confidence score, detection class, JPEG thumbnail, tier that triggered, freshness metadata. Log to NVMe SSD. Export in a structured format (JSON lines) after flight. |
| No frame recording for offline replay | **Training data collection depends on field recording**: without recording, there is no way to build a training dataset from real flights. | Record all camera frames to NVMe at a configurable rate (1-5 FPS during Level 1, full rate during Level 2). Include a detection overlay option. Post-flight: use recordings for annotation. |
| No power management | **UAV power budget is finite**: Jetson at 15W + gimbal + camera + radio. No monitoring of power draw or load shedding. | Monitor power consumption via Jetson's INA sensors. Power budget alert at 80% of allocated watts. Load shedding: disable VLM first, then reduce inference rate, then disable semantic detection. |
| YOLO26 not validated for this domain | **No benchmark on aerial concealment detection**: all YOLO26 numbers are on COCO/LVIS. Concealment detection may behave very differently. | First sprint deliverable: benchmark YOLOE (both 11 and 26 backbones) on semantic01-04.png with text/visual prompts. Report AP on the initial annotated validation set before committing to a backbone. |
| Freshness and path tracing are untested algorithms | **No proven prior art**: both freshness assessment and path-following via skeletonization are novel combinations. Risk of over-engineering before validation. | Implement minimal viable versions first. V1 path tracing: skeleton + endpoints only, no freshness, no junction following. Validate on real flight data before adding complexity. |

## Product Solution Description

A three-tier semantic detection system for identifying concealed/camouflaged positions in reconnaissance UAV aerial imagery, running on a Jetson Orin Nano Super with NVMe SSD storage, active cooling, and a ruggedized carrier board, alongside the existing YOLO detection pipeline.

```
┌──────────────────────────────────────────────────────────────────────────┐
│  JETSON ORIN NANO SUPER (ruggedized carrier, NVMe, 15W)                  │
│                                                                          │
│  ┌──────────┐    ┌──────────────┐    ┌──────────────┐    ┌───────────┐   │
│  │ ViewPro  │───▶│    Tier 1    │───▶│    Tier 2    │───▶│  Tier 3   │   │
│  │ A40      │    │    YOLOE     │    │  Path Trace  │    │    VLM    │   │
│  │ Camera   │    │  (11 or 26   │    │    + CNN     │    │  NanoLLM  │   │
│  │ + Frame  │    │  backbone)   │    │    ≤200ms    │    │ (L2 only) │   │
│  │ Quality  │    │  TRT FP16    │    │              │    │    ≤5s    │   │
│  │ Gate     │    │    ≤100ms    │    │              │    │           │   │
│  └────▲─────┘    └──────────────┘    └──────────────┘    └───────────┘   │
│       │                                                                  │
│  ┌────┴─────┐    ┌──────────────┐    ┌──────────────┐    ┌───────────┐   │
│  │ Gimbal   │◀───│     Scan     │    │   Watchdog   │    │ Recorder  │   │
│  │ Control  │    │  Controller  │    │  + Thermal   │    │ + Logger  │   │
│  │ + CRC    │    │ (L1/L2 FSM)  │    │   + Power    │    │  (NVMe)   │   │
│  └──────────┘    └──────────────┘    └──────────────┘    └───────────┘   │
│                                                                          │
│  ┌──────────────────────────────┐                                        │
│  │ Existing YOLO Detection      │ (always running, scene context)        │
│  │ Cython + TRT                 │                                        │
│  └──────────────────────────────┘                                        │
└──────────────────────────────────────────────────────────────────────────┘
```

Key changes from draft02:

- **YOLOE backbone is configurable** (YOLO11 or YOLO26) — benchmark before committing
- **NanoLLM replaces vLLM** as VLM runtime (purpose-built for Jetson, stable)
- **NVMe SSD mandatory** — no SD card in production
- **CRC-16 on gimbal UART** — EMI protection
- **Detection logger + frame recorder** — post-flight review and training data collection
- **Ruggedized carrier board** — vibration, temperature, dust protection
- **Power monitoring + load shedding** — finite UAV power budget
- **FP16 only** for initial deployment (INT8 export unstable on Jetson)
- **Minimal V1 for unproven components** — path tracing and freshness start simple

## Architecture

### Component 1: Tier 1 — Real-Time Detection

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|----------|-------|------------|-------------|--------------|----------|-------------|-----|
| **YOLOE with configurable backbone (recommended)** | yoloe-11s-seg.pt or yoloe-26s-seg.pt, set_classes() → TRT FP16 | Supports both YOLO11 and YOLO26 backbones. Benchmark on real data, pick the winner. set_classes() bakes CLIP embeddings in for zero runtime overhead. | YOLO26 may regress on custom data vs YOLO11. Needs empirical comparison. | Ultralytics ≥8.4 (pinned version), TensorRT, JetPack 6.2 | Local only | YOLO11s TRT FP16: ~7ms (640px). YOLO26s: similar or slightly faster. | **Best fit. Hedges against backbone risk.** |

**Version pinning strategy**:

- Pin `ultralytics==8.4.X` (specific patch version validated on Jetson)
- Pin JetPack 6.2 + TensorRT version
- Test every Ultralytics update in staging before deploying to production
- Keep both yoloe-11s-seg and yoloe-26s-seg TRT engines on NVMe; switch via config

**YOLO backbone selection process (Sprint 1)**:

1. Annotate 200 frames from real flight footage (footpaths, branch piles, entrances)
2. Fine-tune YOLOE-11s-seg and YOLOE-26s-seg on the same dataset with the same hyperparameters
3. Evaluate on a held-out validation set (50 frames)
4. Pick the backbone with the higher mAP50
5. If the delta is < 2%: pick YOLO26 (faster CPU inference, NMS-free deployment)

### Component 2: Tier 2 — Spatial Reasoning & CNN Confirmation

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|----------|-------|------------|-------------|--------------|----------|-------------|-----|
| **V1 minimal path tracing + heuristic classifier (recommended for initial release)** | OpenCV, scikit-image | No training data needed. Skeleton + endpoint detection + simple heuristic: "dark mass at endpoint → flag." Fast to implement and validate. | Low accuracy. Many false positives. | OpenCV, scikit-image | Offline | ~30ms | **V1: ship fast, validate on real data.** |
| **V2 trained CNN (after data collection)** | MobileNetV3-Small, TensorRT FP16 | Higher accuracy after training. Dynamic ROI sizing. | Needs 300+ positive and 1000+ negative annotated ROI crops. | PyTorch, TRT export | Offline | ~5-10ms classification | **V2: replace the heuristic once data exists.** |

**V1 heuristic for endpoint analysis** (no training data needed):

1. Skeletonize the footpath mask with branch pruning
2. Find endpoints
3. For each endpoint: extract an ROI (size dynamic, based on GSD)
4. Compute: mean_darkness = mean intensity in the ROI center 50%; contrast = (surrounding_mean - center_mean) / surrounding_mean; area_ratio = dark_pixel_count / total_pixels
5. If mean_darkness < threshold_dark AND contrast > threshold_contrast → flag as a potential concealed position
6. Thresholds: configurable, tuned per season. Start with winter values.

**V1 freshness** (metadata only, not a filter):

- contrast_ratio of path vs surrounding terrain
- Reported as "high contrast" (likely fresh) / "low contrast" (likely stale)
- No binary classification. The operator sees all detections with a freshness tag.
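The endpoint ROI scoring above (steps 4-5 of the V1 heuristic) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the production implementation: the function name `score_endpoint_roi` and the default threshold values are hypothetical, and the skeletonization/endpoint extraction (steps 1-2) is assumed to happen upstream (e.g. via scikit-image).

```python
import numpy as np

def score_endpoint_roi(gray_roi, threshold_dark=60.0, threshold_contrast=0.25):
    """Score one endpoint ROI per the V1 heuristic (steps 4-5).

    gray_roi: 2-D uint8 grayscale crop centred on a skeleton endpoint.
    Default thresholds are illustrative starting values, to be tuned per
    season via config (winter values first, per the plan above).
    """
    h, w = gray_roi.shape
    # Centre 50% of the ROI vs. the surrounding ring.
    cy, cx = h // 4, w // 4
    center = gray_roi[cy:h - cy, cx:w - cx].astype(np.float64)
    ring = np.ones(gray_roi.shape, dtype=bool)
    ring[cy:h - cy, cx:w - cx] = False
    surround = gray_roi[ring].astype(np.float64)

    mean_darkness = float(center.mean())
    surround_mean = float(surround.mean())
    contrast = (surround_mean - mean_darkness) / max(surround_mean, 1e-6)
    area_ratio = float((gray_roi < threshold_dark).sum()) / gray_roi.size

    flagged = bool(mean_darkness < threshold_dark and contrast > threshold_contrast)
    return {"mean_darkness": mean_darkness, "contrast": contrast,
            "area_ratio": area_ratio, "flagged": flagged}

# Synthetic check: a dark 20x20 mass centred in a bright 40x40 ROI should flag.
roi = np.full((40, 40), 200, dtype=np.uint8)
roi[10:30, 10:30] = 30
print(score_endpoint_roi(roi)["flagged"])  # → True
```

The same contrast ratio doubles as the V1 freshness metadata: report it as "high contrast" / "low contrast" rather than thresholding it into a binary filter.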
### Component 3: Tier 3 — VLM Deep Analysis

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|----------|-------|------------|-------------|--------------|----------|-------------|-----|
| **NanoLLM with VILA-2.7B or VILA1.5-3B (recommended)** | NanoLLM Docker container, MLC/TVM quantization | Purpose-built for Jetson by the NVIDIA team. Stable Docker containers. Optimized memory management. Supports VLMs natively. | Limited model selection (VILA, LLaVA, Obsidian). Not all VLMs available. | Docker, JetPack 6, NVMe for container storage | Local only, container isolation | ~15-25 tok/s on Orin Nano (4-bit MLC) | **Most stable Jetson VLM option.** |
| llama.cpp with GGUF VLM | llama.cpp, GGUF model files | Lightweight. No Docker needed. Proven stability on Jetson. Wide model support. | Manual build. Less optimized than NanoLLM for the Jetson GPU. | llama.cpp build, GGUF weights | Local only | ~10-20 tok/s estimated | **Fallback if NanoLLM doesn't support the needed model.** |
| ~~vLLM~~ | ~~vLLM Docker~~ | ~~High throughput~~ | **System freezes, reboots, and installation crashes on Orin Nano. Multiple open bugs. Not production-ready.** | N/A | N/A | N/A | **Not recommended.** |

**Model selection for NanoLLM**:

- Primary: VILA1.5-3B (confirmed on Orin Nano, multimodal, 4-bit MLC)
- If UAV-VL-R1 GGUF weights become available: use via llama.cpp (aerial-specialized)
- Fallback: Obsidian-3B (mini VLM, lower accuracy but very fast)

### Component 4: Camera Gimbal Control

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|----------|-------|------------|-------------|--------------|----------|-------------|-----|
| **ViewLink serial driver + CRC-16 + PID + watchdog (recommended)** | pyserial, crcmod, PID library, threading | Robust communication. CRC catches EMI-corrupted commands. Retry logic. Watchdog. | ViewLink protocol must be implemented from the spec. Physical EMI mitigation required. | ViewPro docs, UART, shielded cable, 35cm antenna separation | Physical only | <10ms per command; CRC overhead negligible | **Production-grade.** |

**UART reliability layer**:

```
Packet format: [SOF(2)] [CMD(N)] [CRC16(2)]
- SOF: 0xAA 0x55 (start of frame)
- CMD: ViewLink command bytes per protocol spec
- CRC16: CRC-CCITT over CMD bytes
```

- On send: compute CRC-16 and append it to the ViewLink command packet
- On receive (gimbal feedback): validate CRC-16; discard corrupted frames
- On CRC failure (send): retry up to 3 times with a 10ms delay; log the failure after 3 retries
- Note: check whether the ViewLink protocol already includes a checksum (read the full spec first). If so, use the native checksum; don't add a redundant CRC.

**Physical EMI mitigation checklist**:

- [ ] Gimbal UART cable: shielded, shortest possible run
- [ ] Video/data transmitter antenna: ≥35cm from the gimbal (ViewPro recommendation)
- [ ] Independent power supply for the gimbal (or filtered from the main bus)
- [ ] Ferrite beads on the UART cable near the Jetson connector

### Component 5: Recording, Logging & Telemetry

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|----------|-------|------------|-------------|--------------|----------|-------------|-----|
| **NVMe-backed frame recorder + JSON-lines detection logger (recommended)** | OpenCV VideoWriter / JPEG sequences, JSON lines, NVMe SSD | Post-flight review. Training data collection. Evidence. Detection audit trail. | NVMe write bandwidth (~500 MB/s) is more than sufficient; storage capacity is the constraint: ~2GB/min at 1080p 5FPS JPEG. | NVMe SSD ≥256GB | Physical access to NVMe | ~5ms per frame write (async) | **Essential for field deployment.** |

**Detection log format** (JSON lines, one object per detection):

```json
{
  "ts": "2026-03-19T14:32:01.234Z",
  "frame_id": 12345,
  "gps_denied_lat": 48.123456,
  "gps_denied_lon": 37.654321,
  "tier": 1,
  "class": "footpath",
  "confidence": 0.72,
  "bbox": [0.12, 0.34, 0.45, 0.67],
  "freshness": "high_contrast",
  "tier2_result": "concealed_position",
  "tier2_confidence": 0.85,
  "tier3_used": false,
  "thumbnail_path": "frames/12345_det_0.jpg"
}
```

**Frame recording strategy**:

- Level 1: record every 5th frame (1-2 FPS) — overview coverage
- Level 2: record every frame (30 FPS) — detailed analysis footage
- Storage budget: 256GB NVMe ≈ 2 hours at Level 2 full rate, or 10+ hours at Level 1 rate
- Circular buffer: when storage is >80% full, overwrite the oldest Level 1 frames (keep Level 2)

### Component 6: System Health & Resilience

**Monitoring threads**:

| Monitor | Check Interval | Threshold | Action |
|---------|----------------|-----------|--------|
| Thermal (T_junction) | 1s | >75°C | Degrade to Level 1 only |
| Thermal (T_junction) | 1s | >80°C | Disable semantic detection |
| Power (Jetson INA) | 2s | >80% budget | Disable VLM |
| Power (Jetson INA) | 2s | >90% budget | Reduce inference rate to 5 FPS |
| Gimbal heartbeat | 2s | No response | Force Level 1 sweep pattern |
| Semantic process | 5s | No heartbeat | Restart with 5s backoff, max 3 attempts |
| VLM process | 5s | No heartbeat | Mark Tier 3 unavailable, continue Tier 1+2 |
| NVMe free space | 60s | <20% free | Switch to Level 1 recording rate only |
| Frame quality | per frame | Laplacian variance < threshold | Skip frame, use buffered good frame |

**Graceful degradation** (4 levels, unchanged from draft02):

| Level | Condition | Capability |
|-------|-----------|------------|
| 0 — Full | All nominal, T < 70°C | Tier 1+2+3, Level 1+2, gimbal, recording |
| 1 — No VLM | VLM unavailable, T > 75°C, or power > 80% | Tier 1+2, Level 1+2, gimbal, recording |
| 2 — No semantic | Semantic crashed 3x or T > 80°C | Existing YOLO only, Level 1 sweep, recording |
| 3 — No gimbal | Gimbal UART failed 3x | Existing YOLO only, fixed camera, recording |

### Component 7: Integration & Deployment

**Hardware BOM additions** (beyond the existing system):

| Item | Purpose | Estimated Cost |
|------|---------|----------------|
| NVMe SSD ≥256GB (industrial grade) | OS + models + recording + logging | $40-80 |
| Active cooling fan (30mm+) | Prevent thermal throttling | $10-20 |
| Ruggedized carrier board (e.g., MILBOX-ORNX or custom) | Vibration, temperature, dust protection | $200-500 |
| Shielded UART cable + ferrite beads | EMI protection for gimbal communication | $10-20 |
| **Total additional hardware** | | ~$260-620 |

**Software deployment**:

- OS: JetPack 6.2 on NVMe SSD
- YOLOE models: TRT FP16 engines on NVMe (both 11 and 26 backbone variants)
- VLM: NanoLLM Docker container on NVMe
- Existing YOLO: current Cython + TRT pipeline (unchanged)
- New Cython modules: semantic detection, gimbal control, scan controller, recorder
- VLM process: separate Docker container, IPC via Unix socket
- Config: YAML file for all thresholds, class names, scan parameters, degradation thresholds

**Version control & update strategy**:

- Pin all dependency versions (Ultralytics, TensorRT, NanoLLM, OpenCV)
- Model updates: swap TRT engine files on NVMe, restart the service
- Config updates: edit YAML, restart the service
- No over-the-air updates (air-gapped system). USB drive for field updates.

## Training & Data Strategy

### Phase 0: Benchmark Sprint (Weeks 1-2)

- Deploy YOLOE-26s-seg and YOLOE-11s-seg in open-vocabulary mode
- Test text/visual prompts on semantic01-04.png + 50 additional frames
- Record results. Pick the backbone with better qualitative detection.
- Deploy V1 heuristic endpoint analysis (no CNN, no training data needed)
- First field test flight with recording enabled

### Phase 1: Field Validation & Data Collection (Weeks 2-6)

- Deploy the TRT FP16 engine with the best backbone
- Record all flights to NVMe
- Operator marks detections as true/false positives in post-flight review
- Build an annotation backlog from recorded frames
- Target: 500 annotated frames by week 6

### Phase 2: Custom Model Training (Weeks 6-10)

- Fine-tune YOLOE-Seg on the custom dataset (linear probing → full fine-tune)
- Train the MobileNetV3-Small CNN on endpoint ROI crops
- A/B test: custom model vs YOLOE zero-shot on the validation set
- Deploy the winning model as a new TRT engine

### Phase 3: VLM & Refinement (Weeks 8-14)

- Deploy NanoLLM with VILA1.5-3B
- Tune prompting on collected ambiguous cases
- Train the freshness classifier (if enough annotated freshness labels exist)
- Target: 1500+ images per class

### Phase 4: Seasonal Expansion (Month 4+)

- Spring/summer annotation campaigns
- Re-train all models with multi-season data
- Adjust heuristic thresholds per season (configurable via YAML)

## Testing Strategy

### Integration / Functional Tests

- YOLOE text prompt detection on reference images (both 11 and 26 backbones)
- TRT FP16 export on Jetson Orin Nano Super (verify no OOM, no crash)
- V1 heuristic endpoint analysis on 20 synthetic masks (10 with hideouts, 10 without)
- Frame quality gate: inject blurry frames, verify rejection
- Gimbal CRC layer: inject corrupted commands, verify retry + log
- Gimbal watchdog: simulate a hang, verify forced Level 1 within 2.5s
- NanoLLM VLM: load the model, run inference on 10 aerial images, verify output + memory
- VLM load/unload cycle: 10 cycles without a memory leak
- Detection logger: verify JSON-lines format, all fields populated
- Frame recorder: verify NVMe write speed, no dropped frames at 30 FPS
- Full pipeline end-to-end on recorded flight footage (offline replay)
- Graceful degradation: simulate each failure mode, verify the correct degradation level

### Non-Functional Tests

- Tier 1 latency on Jetson Orin Nano Super TRT FP16: ≤100ms (both backbones)
- Tier 2 latency: ≤50ms (V1 heuristic), ≤200ms (V2 CNN)
- Tier 3 latency (NanoLLM VLM): ≤5 seconds
- Memory peak: all components loaded < 7GB
- Thermal: 60-minute sustained inference, T_junction < 75°C with active cooling
- NVMe endurance: continuous recording for 2 hours, verify no write errors
- Power draw: measure at each degradation level, verify within the UAV power budget
- EMI test: operate near the data transmitter antenna, verify no gimbal anomalies with the CRC layer
- Cold start: power on → first detection within 60 seconds (model load time)
- Vibration: mount the Jetson on a vibration table, run inference, compare detection accuracy vs static

## Technology Maturity Assessment

| Component | Technology | Maturity | Risk | Mitigation |
|-----------|------------|----------|------|------------|
| Tier 1 Detection | YOLOE/YOLO26/YOLO11 | **Medium** — YOLO26 is 3 months old, with reported regressions on custom data. YOLOE-26 is even newer. | Medium | Benchmark both backbones. YOLO11 is the battle-tested fallback. Pin versions. |
| TRT FP16 Export | TensorRT on JetPack 6.2 | **High** — FP16 is stable on Jetson. Well-documented. | Low | FP16 only. Avoid INT8 initially. |
| TRT INT8 Export | TensorRT on JetPack 6.2 | **Low** — Documented crashes (PR #23928). Calibration issues. | High | Defer to Phase 3+. FP16 is sufficient for now. |
| VLM (NanoLLM) | NanoLLM + VILA-3B | **Medium-High** — Purpose-built for Jetson by the NVIDIA team. Docker-based. Monthly releases. | Low | More stable than vLLM. Use Docker containers. |
| VLM (vLLM) | vLLM on Jetson | **Low** — System freezes, crashes, open bugs. | **High** | **Do not use.** NanoLLM instead. |
| Path Tracing | Skeletonization + OpenCV | **High** — Decades-old algorithms. Well understood. | Low | Pruning needed for noisy inputs. |
| CNN Classifier | MobileNetV3-Small + TRT | **High** — Proven architecture. TRT FP16 stable. | Low | Standard transfer learning. |
| Gimbal Control | ViewLink Serial Protocol | **Medium** — Protocol documented. ArduPilot driver exists. | Medium | EMI mitigation critical. CRC layer. |
| Freshness Assessment | Novel heuristic | **Low** — No prior art. Experimental. | High | V1: metadata only, not a filter. Iterate with data. |
| NVMe Storage | Industrial NVMe on Jetson | **High** — Production standard. The SD card alternative is unreliable. | Low | Use an industrial-grade SSD. |
| Ruggedized Hardware | MILBOX-ORNX or custom | **High** — Established product. Designed for Jetson + UAV. | Low | Standard procurement. |

## References

- YOLO26 docs: https://docs.ultralytics.com/models/yolo26/
- YOLOE docs: https://docs.ultralytics.com/models/yoloe/
- YOLO26 accuracy regression: https://github.com/ultralytics/ultralytics/issues/23206
- YOLO26 TRT INT8 crash: https://github.com/ultralytics/ultralytics/issues/23841
- YOLO26 TRT OOM fix: https://github.com/ultralytics/ultralytics/pull/23928
- vLLM Jetson freezes: https://github.com/dusty-nv/jetson-containers/issues/800
- vLLM Jetson install crash: https://github.com/vllm-project/vllm/issues/23376
- NanoLLM docs: https://dusty-nv.github.io/NanoLLM/
- NanoVLM: https://jetson-ai-lab.com/tutorial_nano-vlm.html
- MILBOX-ORNX: https://forecr.io/products/jetson-orin-nx-orin-nano-rugged-compact-pc-milbox-ornx
- Jetson SD card corruption: https://forums.developer.nvidia.com/t/corrupted-sd-cards/265418
- Jetson thermal throttling: https://www.alibaba.com/product-insights/how-to-run-private-llama-3-inference-on-a-200-jetson-orin-nano-without-thermal-throttling.html
- ViewPro EMI issues: https://www.viewprouav.com/help/gimbal/
- UART CRC reliability: https://link.springer.com/chapter/10.1007/978-981-19-8563-8_23
- Military edge AI thermal: https://www.mobilityengineeringtech.com/component/content/article/53967
- UAV-VL-R1: https://arxiv.org/pdf/2508.11196
- Qwen2-VL-2B-GPTQ-INT8: https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int8
- Ultralytics Jetson guide: https://docs.ultralytics.com/guides/nvidia-jetson
- Skelite: https://arxiv.org/html/2503.07369v1
- YOLO FP reduction: https://docs.ultralytics.com/yolov5/tutorials/tips_for_best_training_results