Solution Draft
Assessment Findings
| Old Component Solution | Weak Point (functional/security/performance) | New Solution |
|---|---|---|
| YOLO26 as sole detection backbone | Accuracy regression on custom datasets: Reported YOLO26s "much less accurate" than YOLO11s on identical training data (GitHub #23206). YOLO26 is 3 months old — less battle-tested than YOLO11. | Benchmark YOLO26 vs YOLO11 on initial annotated data before committing. YOLO11 as fallback. YOLOE supports both backbones (yoloe-11s-seg, yoloe-26s-seg). |
| YOLO26 TensorRT INT8 export | INT8 export fails on Jetson (TRT Error Code 2, OOM). Fix merged (PR #23928) but indicates fragile tooling. | Use FP16 only for initial deployment (confirmed stable). INT8 as future optimization after tooling matures. Pin Ultralytics version + JetPack version. |
| vLLM as VLM runtime | Unstable on Jetson Orin Nano: system freezes, reboots, installation crashes, excessive memory (multiple open issues). Not production-ready for 8GB devices. | Replace with NanoLLM/NanoVLM — purpose-built for Jetson by NVIDIA's Dusty-NV team. Docker containers for JetPack 5/6. Supports VILA, LLaVA. Stable. Or use llama.cpp with GGUF models (proven on Jetson). |
| No storage strategy | SD card corruption: Recurring corruption documented across multiple Jetson Orin Nano users. SD cards unsuitable for production. | Mandatory NVMe SSD for OS + models + logging. No SD card in production. Ruggedized NVMe mount for vibration resistance. |
| No EMI protection on UART | ViewPro documents EMI issues: antennas cause random gimbal panning if within 35cm. Standard UART parity bit insufficient for noisy UAV environment. | Add CRC-16 checksum layer on gimbal commands. Enforce 35cm antenna separation in physical design. Consider shielded UART cable. Command retry on CRC failure (max 3 retries, then log error). |
| No environmental hardening addressed | UAV environment: vibration, temperature extremes (-20°C to +50°C), dust, EMI, power fluctuations. Dev kit form factor is not field-deployable. | Use ruggedized carrier board (MILBOX-ORNX or similar) with vibration dampening. Conformal coating on exposed connectors. External temperature sensor for environmental monitoring. |
| No logging or telemetry | No post-flight review capability: field system must log all detections with metadata for model iteration, operator review, and evidence collection. | Add detection logging: timestamp, GPS-denied coordinates, confidence score, detection class, JPEG thumbnail, tier that triggered, freshness metadata. Log to NVMe SSD. Export as structured format (JSON lines) after flight. |
| No frame recording for offline replay | Training data collection depends on field recording: Without recording, no way to build training dataset from real flights. | Record all camera frames to NVMe at configurable rate (1-5 FPS during Level 1, full rate during Level 2). Include detection overlay option. Post-flight: use recordings for annotation. |
| No power management | UAV power budget is finite: Jetson at 15W + gimbal + camera + radio. No monitoring of power draw or load shedding. | Monitor power consumption via Jetson's INA sensors. Power budget alert at 80% of allocated watts. Load shedding: disable VLM first, then reduce inference rate, then disable semantic detection. |
| YOLO26 not validated for this domain | No benchmark on aerial concealment detection: All YOLO26 numbers are on COCO/LVIS. Concealment detection may behave very differently. | First sprint deliverable: benchmark YOLOE-26 (both 11 and 26 backbones) on semantic01-04.png with text/visual prompts. Report AP on initial annotated validation set before committing to backbone. |
| Freshness and path tracing are untested algorithms | No proven prior art: Both freshness assessment and path-following via skeletonization are novel combinations. Risk of over-engineering before validation. | Implement minimal viable versions first. V1 path tracing: skeleton + endpoint only, no freshness, no junction following. Validate on real flight data before adding complexity. |
Product Solution Description
A three-tier semantic detection system for identifying concealed/camouflaged positions from reconnaissance UAV aerial imagery, running on Jetson Orin Nano Super with NVMe SSD storage, active cooling, and ruggedized carrier board, alongside the existing YOLO detection pipeline.
┌──────────────────────────────────────────────────────────────────────────┐
│ JETSON ORIN NANO SUPER (ruggedized carrier, NVMe, 15W) │
│ │
│ ┌──────────┐ ┌──────────────┐ ┌──────────────┐ ┌───────────┐ │
│ │ ViewPro │───▶│ Tier 1 │───▶│ Tier 2 │───▶│ Tier 3 │ │
│ │ A40 │ │ YOLOE │ │ Path Trace │ │ VLM │ │
│ │ Camera │ │ (11 or 26 │ │ + CNN │ │ NanoLLM │ │
│ │ + Frame │ │ backbone) │ │ ≤200ms │ │ (L2 only) │ │
│ │ Quality │ │ TRT FP16 │ │ │ │ ≤5s │ │
│ │ Gate │ │ ≤100ms │ │ │ │ │ │
│ └────▲─────┘ └──────────────┘ └──────────────┘ └───────────┘ │
│ │ │
│ ┌────┴─────┐ ┌──────────────┐ ┌──────────────┐ ┌───────────┐ │
│ │ Gimbal │◀───│ Scan │ │ Watchdog │ │ Recorder │ │
│ │ Control │ │ Controller │ │ + Thermal │ │ + Logger │ │
│ │ + CRC │ │ (L1/L2 FSM) │ │ + Power │ │ (NVMe) │ │
│ └──────────┘ └──────────────┘ └──────────────┘ └───────────┘ │
│ │
│ ┌──────────────────────────────┐ │
│ │ Existing YOLO Detection │ (always running, scene context) │
│ │ Cython + TRT │ │
│ └──────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────────┘
Key changes from draft02:
- YOLOE backbone is configurable (YOLO11 or YOLO26) — benchmark before committing
- NanoLLM replaces vLLM as VLM runtime (purpose-built for Jetson, stable)
- NVMe SSD mandatory — no SD card in production
- CRC-16 on gimbal UART — EMI protection
- Detection logger + frame recorder — post-flight review and training data collection
- Ruggedized carrier board — vibration, temperature, dust protection
- Power monitoring + load shedding — finite UAV power budget
- FP16 only for initial deployment (INT8 export unstable on Jetson)
- Minimal V1 for unproven components — path tracing and freshness start simple
Architecture
Component 1: Tier 1 — Real-Time Detection
| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|---|---|---|---|---|---|---|---|
| YOLOE with configurable backbone (recommended) | yoloe-11s-seg.pt or yoloe-26s-seg.pt, set_classes() → TRT FP16 | Supports both YOLO11 and YOLO26 backbones. Benchmark on real data, pick winner. set_classes() bakes CLIP embeddings for zero overhead. | YOLO26 may regress on custom data vs YOLO11. Needs empirical comparison. | Ultralytics ≥8.4 (pinned version), TensorRT, JetPack 6.2 | Local only | YOLO11s TRT FP16: ~7ms (640px). YOLO26s: similar or slightly faster. | Best fit. Hedge against backbone risk. |
Version pinning strategy:
- Pin ultralytics==8.4.X (specific patch version validated on Jetson)
- Pin JetPack 6.2 + TensorRT version
- Test every Ultralytics update in staging before deploying to production
- Keep both yoloe-11s-seg and yoloe-26s-seg TRT engines on NVMe; switch via config
YOLO backbone selection process (Sprint 1):
- Annotate 200 frames from real flight footage (footpaths, branch piles, entrances)
- Fine-tune YOLOE-11s-seg and YOLOE-26s-seg on same dataset, same hyperparameters
- Evaluate on held-out validation set (50 frames)
- Pick backbone with higher mAP50
- If delta < 2%: pick YOLO26 (faster CPU inference, NMS-free deployment)
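The selection rule above can be captured as a small helper (function name and signature are illustrative; the 2% tie-break threshold is from the list, with mAP50 on a 0-1 scale):

```python
def select_backbone(map50_yolo11: float, map50_yolo26: float,
                    delta_threshold: float = 0.02) -> str:
    """Pick the deployment backbone from held-out mAP50 scores.

    Prefers YOLO26 (NMS-free deployment, faster CPU inference) when the
    gap is inside the delta threshold; otherwise the higher score wins.
    """
    if abs(map50_yolo11 - map50_yolo26) < delta_threshold:
        return "yoloe-26s-seg"  # near-tie: take the NMS-free backbone
    return "yoloe-11s-seg" if map50_yolo11 > map50_yolo26 else "yoloe-26s-seg"
```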
Component 2: Tier 2 — Spatial Reasoning & CNN Confirmation
| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|---|---|---|---|---|---|---|---|
| V1 minimal path tracing + heuristic classifier (recommended for initial release) | OpenCV, scikit-image | No training data needed. Skeleton + endpoint detection + simple heuristic: "dark mass at endpoint → flag." Fast to implement and validate. | Low accuracy. Many false positives. | OpenCV, scikit-image | Offline | ~30ms | V1: ship fast, validate on real data. |
| V2 trained CNN (after data collection) | MobileNetV3-Small, TensorRT FP16 | Higher accuracy after training. Dynamic ROI sizing. | Needs 300+ positive, 1000+ negative annotated ROI crops. | PyTorch, TRT export | Offline | ~5-10ms classification | V2: replace heuristic once data exists. |
V1 heuristic for endpoint analysis (no training data needed):
- Skeletonize footpath mask with branch pruning
- Find endpoints
- For each endpoint: extract ROI (dynamic size based on GSD)
- Compute: mean_darkness = mean intensity in ROI center 50%. contrast = (surrounding_mean - center_mean) / surrounding_mean. area_ratio = dark_pixel_count / total_pixels.
- If mean_darkness < threshold_dark AND contrast > threshold_contrast → flag as potential concealed position
- Thresholds: configurable, tuned per season. Start with winter values.
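A minimal NumPy sketch of the endpoint heuristic above (threshold values are untuned placeholders, not validated winter values):

```python
import numpy as np

def endpoint_flag(roi: np.ndarray,
                  threshold_dark: float = 60.0,
                  threshold_contrast: float = 0.25) -> bool:
    """V1 heuristic: flag a skeleton-endpoint ROI as a potential
    concealed position. `roi` is a grayscale patch (uint8); thresholds
    are placeholders, configurable and tuned per season."""
    h, w = roi.shape
    # Center 50% of the ROI (quarter margins on each side).
    cy0, cy1 = h // 4, h - h // 4
    cx0, cx1 = w // 4, w - w // 4
    center = roi[cy0:cy1, cx0:cx1].astype(np.float64)

    surround = roi.astype(np.float64).copy()
    surround[cy0:cy1, cx0:cx1] = np.nan  # mask out the center region
    mean_darkness = float(center.mean())
    surrounding_mean = float(np.nanmean(surround))
    contrast = (surrounding_mean - mean_darkness) / max(surrounding_mean, 1e-6)
    return mean_darkness < threshold_dark and contrast > threshold_contrast
```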
V1 freshness (metadata only, not a filter):
- contrast_ratio of path vs surrounding terrain
- Report as: "high contrast" (likely fresh) / "low contrast" (likely stale)
- No binary classification. Operator sees all detections with freshness tag.
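The freshness tag could be computed along these lines (the ratio threshold is an assumed placeholder; the tag strings match the detection log format):

```python
def freshness_tag(path_mean: float, surround_mean: float,
                  high_contrast_ratio: float = 1.3) -> str:
    """V1 freshness metadata: compare path brightness against the
    surrounding terrain. Reported as a tag on every detection,
    never used as a filter. Threshold tuned per season."""
    ratio = max(path_mean, surround_mean) / max(min(path_mean, surround_mean), 1e-6)
    return "high_contrast" if ratio >= high_contrast_ratio else "low_contrast"
```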
Component 3: Tier 3 — VLM Deep Analysis
| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|---|---|---|---|---|---|---|---|
| NanoLLM with VILA-2.7B or VILA1.5-3B (recommended) | NanoLLM Docker container, MLC/TVM quantization | Purpose-built for Jetson by NVIDIA team. Stable Docker containers. Optimized memory management. Supports VLMs natively. | Limited model selection (VILA, LLaVA, Obsidian). Not all VLMs available. | Docker, JetPack 6, NVMe for container storage | Local only, container isolation | ~15-25 tok/s on Orin Nano (4-bit MLC) | Most stable Jetson VLM option. |
| llama.cpp with GGUF VLM | llama.cpp, GGUF model files | Lightweight. No Docker needed. Proven stability on Jetson. Wide model support. | Manual build. Less optimized than NanoLLM for Jetson GPU. | llama.cpp build, GGUF weights | Local only | ~10-20 tok/s estimated | Fallback if NanoLLM doesn't support needed model. |
| vLLM (rejected) | vLLM on Jetson | N/A | System freezes, reboots, installation crashes on Orin Nano. Multiple open bugs. Not production-ready. | N/A | N/A | N/A | Not recommended. |
Model selection for NanoLLM:
- Primary: VILA1.5-3B (confirmed on Orin Nano, multimodal, 4-bit MLC)
- If UAV-VL-R1 GGUF weights become available: use via llama.cpp (aerial-specialized)
- Fallback: Obsidian-3B (mini VLM, lower accuracy but very fast)
Component 4: Camera Gimbal Control
| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|---|---|---|---|---|---|---|---|
| ViewLink serial driver + CRC-16 + PID + watchdog (recommended) | pyserial, crcmod, PID library, threading | Robust communication. CRC catches EMI-corrupted commands. Retry logic. Watchdog. | ViewLink protocol implementation from spec. Physical EMI mitigation required. | ViewPro docs, UART, shielded cable, 35cm antenna separation | Physical only | <10ms command + CRC overhead negligible | Production-grade. |
UART reliability layer:
Packet format: [SOF(2)] [CMD(N)] [CRC16(2)]
- SOF: 0xAA 0x55 (start of frame)
- CMD: ViewLink command bytes per protocol spec
- CRC16: CRC-CCITT over CMD bytes
- On send: compute CRC-16, append to ViewLink command packet
- On receive (gimbal feedback): validate CRC-16. Discard corrupted frames.
- On CRC failure (send): retry up to 3 times with 10ms delay. Log failure after 3 retries.
- Note: Check if ViewLink protocol already includes checksums (read full spec first). If so, use native checksum; don't add redundant CRC.
Physical EMI mitigation checklist:
- Gimbal UART cable: shielded, shortest possible run
- Video/data transmitter antenna: ≥35cm from gimbal (ViewPro recommendation)
- Independent power supply for gimbal (or filtered from main bus)
- Ferrite beads on UART cable near Jetson connector
Component 5: Recording, Logging & Telemetry
| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|---|---|---|---|---|---|---|---|
| NVMe-backed frame recorder + JSON-lines detection logger (recommended) | OpenCV VideoWriter / JPEG sequences, JSON lines, NVMe SSD | Post-flight review. Training data collection. Evidence. Detection audit trail. | Storage consumption: ~2 GB/min at Level 2 full rate (1080p 30 FPS JPEG). NVMe write bandwidth (~500 MB/s) is not the bottleneck. | NVMe SSD ≥256GB | Physical access to NVMe | ~5ms per frame write (async) | Essential for field deployment. |
Detection log format (JSON lines, one per detection):
{
"ts": "2026-03-19T14:32:01.234Z",
"frame_id": 12345,
"gps_denied_lat": 48.123456,
"gps_denied_lon": 37.654321,
"tier": 1,
"class": "footpath",
"confidence": 0.72,
"bbox": [0.12, 0.34, 0.45, 0.67],
"freshness": "high_contrast",
"tier2_result": "concealed_position",
"tier2_confidence": 0.85,
"tier3_used": false,
"thumbnail_path": "frames/12345_det_0.jpg"
}
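A minimal writer/reader pair for this format (helper names are illustrative; the production recorder writes asynchronously to the NVMe log):

```python
import json
from pathlib import Path

def append_detection(log_path: Path, record: dict) -> None:
    """Append one detection as a single JSON line (fields as in the
    format above)."""
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, separators=(",", ":")) + "\n")

def load_detections(log_path: Path) -> list[dict]:
    """Post-flight: read the full detection log back for operator
    review and structured export."""
    with log_path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```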
Frame recording strategy:
- Level 1: record every Nth frame to hit 1-2 FPS (e.g. every 15th frame from a 30 FPS source) — overview coverage
- Level 2: record every frame (30 FPS) — detailed analysis footage
- Storage budget: 256GB NVMe ≈ 2 hours at Level 2 full rate, or 10+ hours at Level 1 rate
- Circular buffer: when storage >80% full, overwrite oldest Level 1 frames (keep Level 2)
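The pruning policy could be sketched as follows (the `_L1`/`_L2` filename convention and the batch size are assumptions for illustration):

```python
from pathlib import Path

def prune_level1_frames(frame_dir: Path, usage_fraction: float,
                        high_water: float = 0.80, batch: int = 100) -> int:
    """Circular-buffer policy: once storage crosses the high-water
    mark, delete the oldest Level 1 frames in batches. Level 2 frames
    are never touched. Returns the number of frames removed."""
    if usage_fraction <= high_water:
        return 0
    level1 = sorted(frame_dir.glob("*_L1.jpg"),
                    key=lambda p: p.stat().st_mtime)  # oldest first
    removed = 0
    for p in level1[:batch]:
        p.unlink()
        removed += 1
    return removed
```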
Component 6: System Health & Resilience
Monitoring threads:
| Monitor | Check Interval | Threshold | Action |
|---|---|---|---|
| Thermal (T_junction) | 1s | >75°C | Degrade to Level 1 only |
| Thermal (T_junction) | 1s | >80°C | Disable semantic detection |
| Power (Jetson INA) | 2s | >80% budget | Disable VLM |
| Power (Jetson INA) | 2s | >90% budget | Reduce inference rate to 5 FPS |
| Gimbal heartbeat | 2s | No response | Force Level 1 sweep pattern |
| Semantic process | 5s | No heartbeat | Restart with 5s backoff, max 3 attempts |
| VLM process | 5s | No heartbeat | Mark Tier 3 unavailable, continue Tier 1+2 |
| NVMe free space | 60s | <20% free | Switch to Level 1 recording rate only |
| Frame quality | per frame | Laplacian var < threshold | Skip frame, use buffered good frame |
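The frame quality gate from the table can be sketched with a plain NumPy Laplacian (the variance threshold is a placeholder to be tuned on real footage):

```python
import numpy as np

def laplacian_variance(gray: np.ndarray) -> float:
    """Sharpness score: variance of the 3x3 Laplacian response.
    Low values indicate blur (motion blur, defocus)."""
    g = gray.astype(np.float64)
    lap = (-4.0 * g[1:-1, 1:-1]
           + g[:-2, 1:-1] + g[2:, 1:-1]
           + g[1:-1, :-2] + g[1:-1, 2:])
    return float(lap.var())

def frame_ok(gray: np.ndarray, threshold: float = 100.0) -> bool:
    """Quality gate: skip frames whose Laplacian variance falls below
    a configurable threshold; the pipeline then reuses the last
    buffered good frame. Threshold here is a placeholder."""
    return laplacian_variance(gray) >= threshold
```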
Graceful degradation (4 levels, unchanged from draft02):
| Level | Condition | Capability |
|---|---|---|
| 0 — Full | All nominal, T < 70°C | Tier 1+2+3, Level 1+2, gimbal, recording |
| 1 — No VLM | VLM unavailable or T > 75°C or power > 80% | Tier 1+2, Level 1+2, gimbal, recording |
| 2 — No semantic | Semantic crashed 3x or T > 80°C | Existing YOLO only, Level 1 sweep, recording |
| 3 — No gimbal | Gimbal UART failed 3x | Existing YOLO only, fixed camera, recording |
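The level selection can be expressed as a worst-condition-first mapping (thresholds taken from the tables above; exact values stay configurable via YAML):

```python
def degradation_level(t_junction_c: float, power_fraction: float,
                      vlm_alive: bool, semantic_crashes: int,
                      gimbal_failures: int) -> int:
    """Map system health to the four degradation levels, checked
    worst-first so the most severe condition wins."""
    if gimbal_failures >= 3:
        return 3  # no gimbal: existing YOLO only, fixed camera
    if semantic_crashes >= 3 or t_junction_c > 80.0:
        return 2  # no semantic: existing YOLO only, Level 1 sweep
    if not vlm_alive or t_junction_c > 75.0 or power_fraction > 0.80:
        return 1  # no VLM: Tier 1+2 only
    return 0      # full capability
```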
Component 7: Integration & Deployment
Hardware BOM additions (beyond existing system):
| Item | Purpose | Estimated Cost |
|---|---|---|
| NVMe SSD ≥256GB (industrial grade) | OS + models + recording + logging | $40-80 |
| Active cooling fan (30mm+) | Prevent thermal throttling | $10-20 |
| Ruggedized carrier board (e.g., MILBOX-ORNX or custom) | Vibration, temperature, dust protection | $200-500 |
| Shielded UART cable + ferrite beads | EMI protection for gimbal communication | $10-20 |
| Total additional hardware | | ~$260-620 |
Software deployment:
- OS: JetPack 6.2 on NVMe SSD
- YOLOE models: TRT FP16 engines on NVMe (both 11 and 26 backbone variants)
- VLM: NanoLLM Docker container on NVMe
- Existing YOLO: current Cython + TRT pipeline (unchanged)
- New Cython modules: semantic detection, gimbal control, scan controller, recorder
- VLM process: separate Docker container, IPC via Unix socket
- Config: YAML file for all thresholds, class names, scan parameters, degradation thresholds
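An illustrative shape for that YAML file (all keys and values here are examples, not a fixed schema):

```yaml
# config.yaml — illustrative structure only
detection:
  backbone: yoloe-26s-seg        # or yoloe-11s-seg; both engines kept on NVMe
  engine_precision: fp16
  classes: [footpath, branch_pile, entrance]
tier2:
  threshold_dark: 60             # V1 heuristic, tuned per season
  threshold_contrast: 0.25
  season_profile: winter
gimbal:
  crc_retries: 3
  retry_delay_ms: 10
recording:
  level1_fps: 2
  level2_fps: 30
  prune_high_water: 0.80
degradation:
  thermal_degrade_c: 75
  thermal_disable_c: 80
  power_alert_fraction: 0.80
```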
Version control & update strategy:
- Pin all dependency versions (Ultralytics, TensorRT, NanoLLM, OpenCV)
- Model updates: swap TRT engine files on NVMe, restart service
- Config updates: edit YAML, restart service
- No over-the-air updates (air-gapped system). USB drive for field updates.
Training & Data Strategy
Phase 0: Benchmark Sprint (Week 1-2)
- Deploy YOLOE-26s-seg and YOLOE-11s-seg in open-vocab mode
- Test text/visual prompts on semantic01-04.png + 50 additional frames
- Record results. Pick backbone with better qualitative detection.
- Deploy V1 heuristic endpoint analysis (no CNN, no training data needed)
- First field test flight with recording enabled
Phase 1: Field validation & data collection (Week 2-6)
- Deploy TRT FP16 engine with best backbone
- Record all flights to NVMe
- Operator marks detections as true/false positive in post-flight review
- Build annotation backlog from recorded frames
- Target: 500 annotated frames by week 6
Phase 2: Custom model training (Week 6-10)
- Fine-tune YOLOE-Seg on custom dataset (linear probing → full fine-tune)
- Train MobileNetV3-Small CNN on endpoint ROI crops
- A/B test: custom model vs YOLOE zero-shot on validation set
- Deploy winning model as new TRT engine
Phase 3: VLM & refinement (Week 8-14)
- Deploy NanoLLM with VILA1.5-3B
- Tune prompting on collected ambiguous cases
- Train freshness classifier (if enough annotated freshness labels exist)
- Target: 1500+ images per class
Phase 4: Seasonal expansion (Month 4+)
- Spring/summer annotation campaigns
- Re-train all models with multi-season data
- Adjust heuristic thresholds per season (configurable via YAML)
Testing Strategy
Integration / Functional Tests
- YOLOE text prompt detection on reference images (both 11 and 26 backbones)
- TRT FP16 export on Jetson Orin Nano Super (verify no OOM, no crash)
- V1 heuristic endpoint analysis on 20 synthetic masks (10 with hideouts, 10 without)
- Frame quality gate: inject blurry frames, verify rejection
- Gimbal CRC layer: inject corrupted commands, verify retry + log
- Gimbal watchdog: simulate hang, verify forced Level 1 within 2.5s
- NanoLLM VLM: load model, run inference on 10 aerial images, verify output + memory
- VLM load/unload cycle: 10 cycles without memory leak
- Detection logger: verify JSON-lines format, all fields populated
- Frame recorder: verify NVMe write speed, no dropped frames at 30 FPS
- Full pipeline end-to-end on recorded flight footage (offline replay)
- Graceful degradation: simulate each failure mode, verify correct degradation level
Non-Functional Tests
- Tier 1 latency on Jetson Orin Nano Super TRT FP16: ≤100ms (both backbones)
- Tier 2 latency (V1 heuristic): ≤50ms. (V2 CNN): ≤200ms
- Tier 3 latency (NanoLLM VLM): ≤5 seconds
- Memory peak: all components loaded < 7GB
- Thermal: 60-minute sustained inference, T_junction < 75°C with active cooling
- NVMe endurance: continuous recording for 2 hours, verify no write errors
- Power draw: measure at each degradation level, verify within UAV power budget
- EMI test: operate near data transmitter antenna, verify no gimbal anomalies with CRC layer
- Cold start: power on → first detection within 60 seconds (model load time)
- Vibration: mount Jetson on vibration table, run inference, compare detection accuracy vs static
Technology Maturity Assessment
| Component | Technology | Maturity | Risk | Mitigation |
|---|---|---|---|---|
| Tier 1 Detection | YOLOE/YOLO26/YOLO11 | Medium — YOLO26 is 3 months old, reported regressions on custom data. YOLOE-26 even newer. | Medium | Benchmark both backbones. YOLO11 is battle-tested fallback. Pin versions. |
| TRT FP16 Export | TensorRT on JetPack 6.2 | High — FP16 is stable on Jetson. Well-documented. | Low | FP16 only. Avoid INT8 initially. |
| TRT INT8 Export | TensorRT on JetPack 6.2 | Low — Documented crashes (PR #23928). Calibration issues. | High | Defer to Phase 3+. FP16 sufficient for now. |
| VLM (NanoLLM) | NanoLLM + VILA-3B | Medium-High — Purpose-built for Jetson by NVIDIA team. Docker-based. Monthly releases. | Low | More stable than vLLM. Use Docker containers. |
| VLM (vLLM) | vLLM on Jetson | Low — System freezes, crashes, open bugs. | High | Do not use. NanoLLM instead. |
| Path Tracing | Skeletonization + OpenCV | High — Decades-old algorithms. Well-understood. | Low | Pruning needed for noisy inputs. |
| CNN Classifier | MobileNetV3-Small + TRT | High — Proven architecture. TRT FP16 stable. | Low | Standard transfer learning. |
| Gimbal Control | ViewLink Serial Protocol | Medium — Protocol documented. ArduPilot driver exists. | Medium | EMI mitigation critical. CRC layer. |
| Freshness Assessment | Novel heuristic | Low — No prior art. Experimental. | High | V1: metadata only, not a filter. Iterate with data. |
| NVMe Storage | Industrial NVMe on Jetson | High — Production standard. SD card alternative is unreliable. | Low | Use industrial-grade SSD. |
| Ruggedized Hardware | MILBOX-ORNX or custom | High — Established product. Designed for Jetson + UAV. | Low | Standard procurement. |
References
- YOLO26 docs: https://docs.ultralytics.com/models/yolo26/
- YOLOE docs: https://docs.ultralytics.com/models/yoloe/
- YOLO26 accuracy regression: https://github.com/ultralytics/ultralytics/issues/23206
- YOLO26 TRT INT8 crash: https://github.com/ultralytics/ultralytics/issues/23841
- YOLO26 TRT OOM fix: https://github.com/ultralytics/ultralytics/pull/23928
- vLLM Jetson freezes: https://github.com/dusty-nv/jetson-containers/issues/800
- vLLM Jetson install crash: https://github.com/vllm-project/vllm/issues/23376
- NanoLLM docs: https://dusty-nv.github.io/NanoLLM/
- NanoVLM: https://jetson-ai-lab.com/tutorial_nano-vlm.html
- MILBOX-ORNX: https://forecr.io/products/jetson-orin-nx-orin-nano-rugged-compact-pc-milbox-ornx
- Jetson SD card corruption: https://forums.developer.nvidia.com/t/corrupted-sd-cards/265418
- Jetson thermal throttling: https://www.alibaba.com/product-insights/how-to-run-private-llama-3-inference-on-a-200-jetson-orin-nano-without-thermal-throttling.html
- ViewPro EMI issues: https://www.viewprouav.com/help/gimbal/
- UART CRC reliability: https://link.springer.com/chapter/10.1007/978-981-19-8563-8_23
- Military edge AI thermal: https://www.mobilityengineeringtech.com/component/content/article/53967
- UAV-VL-R1: https://arxiv.org/pdf/2508.11196
- Qwen2-VL-2B-GPTQ-INT8: https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int8
- Ultralytics Jetson guide: https://docs.ultralytics.com/guides/nvidia-jetson
- Skelite: https://arxiv.org/html/2503.07369v1
- YOLO FP reduction: https://docs.ultralytics.com/yolov5/tutorials/tips_for_best_training_results