Oleksandr Bezdieniezhnykh 8e2ecf50fd Initial commit
2026-03-26 00:20:30 +02:00


Solution Draft

Assessment Findings

| Old Component / Solution | Weak Point (functional/security/performance) | New Solution |
|---|---|---|
| YOLO26 as sole detection backbone | Accuracy regression on custom datasets: YOLO26s reported "much less accurate" than YOLO11s on identical training data (GitHub #23206). YOLO26 is 3 months old — less battle-tested than YOLO11. | Benchmark YOLO26 vs YOLO11 on initial annotated data before committing. YOLO11 as fallback. YOLOE supports both backbones (yoloe-11s-seg, yoloe-26s-seg). |
| YOLO26 TensorRT INT8 export | INT8 export fails on Jetson (TRT Error Code 2, OOM). Fix merged (PR #23928) but indicates fragile tooling. | Use FP16 only for initial deployment (confirmed stable). INT8 as future optimization after tooling matures. Pin Ultralytics version + JetPack version. |
| vLLM as VLM runtime | Unstable on Jetson Orin Nano: system freezes, reboots, installation crashes, excessive memory (multiple open issues). Not production-ready for 8GB devices. | Replace with NanoLLM/NanoVLM — purpose-built for Jetson by NVIDIA's Dusty-NV team. Docker containers for JetPack 5/6. Supports VILA, LLaVA. Stable. Or use llama.cpp with GGUF models (proven on Jetson). |
| No storage strategy | SD card corruption: recurring corruption documented across multiple Jetson Orin Nano users. SD cards unsuitable for production. | Mandatory NVMe SSD for OS + models + logging. No SD card in production. Ruggedized NVMe mount for vibration resistance. |
| No EMI protection on UART | ViewPro documents EMI issues: antennas cause random gimbal panning if within 35cm. Standard UART parity bit insufficient for noisy UAV environment. | Add CRC-16 checksum layer on gimbal commands. Enforce 35cm antenna separation in physical design. Consider shielded UART cable. Command retry on CRC failure (max 3 retries, then log error). |
| No environmental hardening addressed | UAV environment: vibration, temperature extremes (-20°C to +50°C), dust, EMI, power fluctuations. Dev kit form factor is not field-deployable. | Use ruggedized carrier board (MILBOX-ORNX or similar) with vibration dampening. Conformal coating on exposed connectors. External temperature sensor for environmental monitoring. |
| No logging or telemetry | No post-flight review capability: field system must log all detections with metadata for model iteration, operator review, and evidence collection. | Add detection logging: timestamp, GPS-denied coordinates, confidence score, detection class, JPEG thumbnail, tier that triggered, freshness metadata. Log to NVMe SSD. Export as structured format (JSON lines) after flight. |
| No frame recording for offline replay | Training data collection depends on field recording: without recording, there is no way to build a training dataset from real flights. | Record all camera frames to NVMe at configurable rate (1-5 FPS during Level 1, full rate during Level 2). Include detection overlay option. Post-flight: use recordings for annotation. |
| No power management | UAV power budget is finite: Jetson at 15W + gimbal + camera + radio. No monitoring of power draw or load shedding. | Monitor power consumption via Jetson's INA sensors. Power budget alert at 80% of allocated watts. Load shedding: disable VLM first, then reduce inference rate, then disable semantic detection. |
| YOLO26 not validated for this domain | No benchmark on aerial concealment detection: all YOLO26 numbers are on COCO/LVIS. Concealment detection may behave very differently. | First sprint deliverable: benchmark YOLOE (both 11 and 26 backbones) on semantic01-04.png with text/visual prompts. Report AP on initial annotated validation set before committing to backbone. |
| Freshness and path tracing are untested algorithms | No proven prior art: both freshness assessment and path-following via skeletonization are novel combinations. Risk of over-engineering before validation. | Implement minimal viable versions first. V1 path tracing: skeleton + endpoint only, no freshness, no junction following. Validate on real flight data before adding complexity. |

Product Solution Description

A three-tier semantic detection system for identifying concealed/camouflaged positions from reconnaissance UAV aerial imagery, running on Jetson Orin Nano Super with NVMe SSD storage, active cooling, and ruggedized carrier board, alongside the existing YOLO detection pipeline.

┌──────────────────────────────────────────────────────────────────────────┐
│          JETSON ORIN NANO SUPER (ruggedized carrier, NVMe, 15W)         │
│                                                                          │
│  ┌──────────┐    ┌──────────────┐    ┌──────────────┐    ┌───────────┐  │
│  │ ViewPro  │───▶│  Tier 1      │───▶│  Tier 2      │───▶│ Tier 3    │  │
│  │ A40      │    │  YOLOE       │    │  Path Trace  │    │ VLM       │  │
│  │ Camera   │    │  (11 or 26   │    │  + CNN       │    │ NanoLLM   │  │
│  │ + Frame  │    │   backbone)  │    │  ≤200ms      │    │ (L2 only) │  │
│  │ Quality  │    │  TRT FP16    │    │              │    │ ≤5s       │  │
│  │ Gate     │    │  ≤100ms      │    │              │    │           │  │
│  └────▲─────┘    └──────────────┘    └──────────────┘    └───────────┘  │
│       │                                                                  │
│  ┌────┴─────┐    ┌──────────────┐    ┌──────────────┐    ┌───────────┐  │
│  │ Gimbal   │◀───│  Scan        │    │  Watchdog    │    │ Recorder  │  │
│  │ Control  │    │  Controller  │    │  + Thermal   │    │ + Logger  │  │
│  │ + CRC    │    │  (L1/L2 FSM) │    │  + Power     │    │ (NVMe)    │  │
│  └──────────┘    └──────────────┘    └──────────────┘    └───────────┘  │
│                                                                          │
│  ┌──────────────────────────────┐                                       │
│  │  Existing YOLO Detection    │ (always running, scene context)        │
│  │  Cython + TRT               │                                        │
│  └──────────────────────────────┘                                       │
└──────────────────────────────────────────────────────────────────────────┘

Key changes from draft02:

  • YOLOE backbone is configurable (YOLO11 or YOLO26) — benchmark before committing
  • NanoLLM replaces vLLM as VLM runtime (purpose-built for Jetson, stable)
  • NVMe SSD mandatory — no SD card in production
  • CRC-16 on gimbal UART — EMI protection
  • Detection logger + frame recorder — post-flight review and training data collection
  • Ruggedized carrier board — vibration, temperature, dust protection
  • Power monitoring + load shedding — finite UAV power budget
  • FP16 only for initial deployment (INT8 export unstable on Jetson)
  • Minimal V1 for unproven components — path tracing and freshness start simple

Architecture

Component 1: Tier 1 — Real-Time Detection

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|---|---|---|---|---|---|---|---|
| YOLOE with configurable backbone (recommended) | yoloe-11s-seg.pt or yoloe-26s-seg.pt, set_classes() → TRT FP16 | Supports both YOLO11 and YOLO26 backbones. Benchmark on real data, pick winner. set_classes() bakes CLIP embeddings for zero overhead. | YOLO26 may regress on custom data vs YOLO11. Needs empirical comparison. | Ultralytics ≥8.4 (pinned version), TensorRT, JetPack 6.2 | Local only | YOLO11s TRT FP16: ~7ms (640px). YOLO26s: similar or slightly faster. | Best fit. Hedge against backbone risk. |

Version pinning strategy:

  • Pin ultralytics==8.4.X (specific patch version validated on Jetson)
  • Pin JetPack 6.2 + TensorRT version
  • Test every Ultralytics update in staging before deploying to production
  • Keep both yoloe-11s-seg and yoloe-26s-seg TRT engines on NVMe; switch via config

YOLO backbone selection process (Sprint 1):

  1. Annotate 200 frames from real flight footage (footpaths, branch piles, entrances)
  2. Fine-tune YOLOE-11s-seg and YOLOE-26s-seg on same dataset, same hyperparameters
  3. Evaluate on held-out validation set (50 frames)
  4. Pick backbone with higher mAP50
  5. If delta < 2%: pick YOLO26 (faster CPU inference, NMS-free deployment)
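The decision rule in steps 4-5 is mechanical, so it is worth encoding once and re-applying verbatim every time the benchmark is re-run. A minimal sketch; the 2% delta comes from step 5 and the returned names are the YOLOE checkpoint variants above:

```python
def pick_backbone(map50_11s: float, map50_26s: float, delta: float = 0.02) -> str:
    """Sprint 1 decision rule: higher mAP50 wins, except that a near-tie
    (gap below `delta`) goes to YOLO26 for its NMS-free deployment path."""
    if abs(map50_11s - map50_26s) < delta:
        return "yoloe-26s-seg"  # near-tie: prefer the newer, NMS-free backbone
    return "yoloe-11s-seg" if map50_11s > map50_26s else "yoloe-26s-seg"
```

For example, `pick_backbone(0.61, 0.54)` keeps YOLO11, while `pick_backbone(0.60, 0.59)` takes YOLO26 under the near-tie rule.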

Component 2: Tier 2 — Spatial Reasoning & CNN Confirmation

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|---|---|---|---|---|---|---|---|
| V1 minimal path tracing + heuristic classifier (recommended for initial release) | OpenCV, scikit-image | No training data needed. Skeleton + endpoint detection + simple heuristic: "dark mass at endpoint → flag." Fast to implement and validate. | Low accuracy. Many false positives. | OpenCV, scikit-image | Offline | ~30ms | V1: ship fast, validate on real data. |
| V2 trained CNN (after data collection) | MobileNetV3-Small, TensorRT FP16 | Higher accuracy after training. Dynamic ROI sizing. | Needs 300+ positive, 1000+ negative annotated ROI crops. | PyTorch, TRT export | Offline | ~5-10ms classification | V2: replace heuristic once data exists. |

V1 heuristic for endpoint analysis (no training data needed):

  1. Skeletonize footpath mask with branch pruning
  2. Find endpoints
  3. For each endpoint: extract ROI (dynamic size based on GSD)
  4. Compute: mean_darkness = mean intensity in ROI center 50%. contrast = (surrounding_mean - center_mean) / surrounding_mean. area_ratio = dark_pixel_count / total_pixels.
  5. If mean_darkness < threshold_dark AND contrast > threshold_contrast → flag as potential concealed position
  6. Thresholds: configurable, tuned per season. Start with winter values.
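Steps 2-5 can be sketched with NumPy alone once a binary footpath skeleton exists (step 1, e.g. from `skimage.morphology.skeletonize` with branch pruning). The ROI half-size and the darkness/contrast thresholds below are illustrative placeholders, not tuned values; in the real system the ROI size is derived from GSD per step 3:

```python
import numpy as np

def find_endpoints(skel: np.ndarray) -> list:
    """Step 2: endpoints of a binary skeleton are pixels with exactly one
    8-connected neighbour. `skel` is a 0/1 array."""
    pad = np.pad(skel.astype(np.uint8), 1)
    endpoints = []
    for r, c in zip(*np.nonzero(skel)):
        # 3x3 neighbourhood sum in the padded image, minus the pixel itself
        if pad[r:r + 3, c:c + 3].sum() - 1 == 1:
            endpoints.append((int(r), int(c)))
    return endpoints

def flag_endpoint(gray: np.ndarray, r: int, c: int, half: int = 32,
                  dark_thr: float = 60.0, contrast_thr: float = 0.25) -> bool:
    """Steps 3-5: extract an ROI around the endpoint, compare its central 50%
    against the surrounding ring, and flag a dark, high-contrast mass."""
    h, w = gray.shape
    patch = gray[max(r - half, 0):min(r + half, h),
                 max(c - half, 0):min(c + half, w)].astype(float)
    ph, pw = patch.shape
    center = patch[ph // 4:ph - ph // 4, pw // 4:pw - pw // 4]  # central 50%
    ring_n = patch.size - center.size
    surround_mean = (patch.sum() - center.sum()) / max(ring_n, 1)
    contrast = (surround_mean - center.mean()) / max(surround_mean, 1e-6)
    return bool(center.mean() < dark_thr and contrast > contrast_thr)
```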

V1 freshness (metadata only, not a filter):

  • contrast_ratio of path vs surrounding terrain
  • Report as: "high contrast" (likely fresh) / "low contrast" (likely stale)
  • No binary classification. Operator sees all detections with freshness tag.
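The freshness tag reduces to one ratio. A sketch, with the threshold as an assumed starting point to be tuned per season:

```python
def freshness_tag(path_mean: float, terrain_mean: float, thr: float = 0.3) -> str:
    """V1 freshness as metadata only: contrast ratio of the path against the
    surrounding terrain. The tag is attached to the detection for the
    operator, never used to filter it."""
    contrast = abs(terrain_mean - path_mean) / max(terrain_mean, 1e-6)
    return "high_contrast" if contrast >= thr else "low_contrast"
```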

Component 3: Tier 3 — VLM Deep Analysis

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|---|---|---|---|---|---|---|---|
| NanoLLM with VILA-2.7B or VILA1.5-3B (recommended) | NanoLLM Docker container, MLC/TVM quantization | Purpose-built for Jetson by NVIDIA team. Stable Docker containers. Optimized memory management. Supports VLMs natively. | Limited model selection (VILA, LLaVA, Obsidian). Not all VLMs available. | Docker, JetPack 6, NVMe for container storage | Local only, container isolation | ~15-25 tok/s on Orin Nano (4-bit MLC) | Most stable Jetson VLM option. |
| llama.cpp with GGUF VLM | llama.cpp, GGUF model files | Lightweight. No Docker needed. Proven stability on Jetson. Wide model support. | Manual build. Less optimized than NanoLLM for Jetson GPU. | llama.cpp build, GGUF weights | Local only | ~10-20 tok/s estimated | Fallback if NanoLLM doesn't support needed model. |
| vLLM | vLLM Docker | High throughput. | System freezes, reboots, installation crashes on Orin Nano. Multiple open bugs. Not production-ready. | N/A | N/A | N/A | Not recommended. |

Model selection for NanoLLM:

  • Primary: VILA1.5-3B (confirmed on Orin Nano, multimodal, 4-bit MLC)
  • If UAV-VL-R1 GGUF weights become available: use via llama.cpp (aerial-specialized)
  • Fallback: Obsidian-3B (mini VLM, lower accuracy but very fast)

Component 4: Camera Gimbal Control

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|---|---|---|---|---|---|---|---|
| ViewLink serial driver + CRC-16 + PID + watchdog (recommended) | pyserial, crcmod, PID library, threading | Robust communication. CRC catches EMI-corrupted commands. Retry logic. Watchdog. | ViewLink protocol implementation from spec. Physical EMI mitigation required. | ViewPro docs, UART, shielded cable, 35cm antenna separation | Physical only | <10ms command + CRC overhead negligible | Production-grade. |

UART reliability layer:

Packet format: [SOF(2)] [CMD(N)] [CRC16(2)]
- SOF: 0xAA 0x55 (start of frame)
- CMD: ViewLink command bytes per protocol spec
- CRC16: CRC-CCITT over CMD bytes
  • On send: compute CRC-16, append to ViewLink command packet
  • On receive (gimbal feedback): validate CRC-16. Discard corrupted frames.
  • On CRC failure (send): retry up to 3 times with 10ms delay. Log failure after 3 retries.
  • Note: Check if ViewLink protocol already includes checksums (read full spec first). If so, use native checksum; don't add redundant CRC.
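A minimal sketch of the framing and validation logic, assuming the CRC-16/CCITT-FALSE variant (poly 0x1021, init 0xFFFF); as the note above says, the gimbal's native checksum should replace this if the ViewLink spec already defines one:

```python
def crc16_ccitt(data: bytes, crc: int = 0xFFFF) -> int:
    """Bitwise CRC-16/CCITT-FALSE (poly 0x1021, init 0xFFFF). crcmod's
    predefined tables give the same result if a faster version is needed."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
    return crc

SOF = b"\xAA\x55"  # start of frame, per the packet format above

def frame_command(cmd: bytes) -> bytes:
    """Build [SOF(2)][CMD(N)][CRC16(2)]; CRC is computed over CMD bytes only."""
    return SOF + cmd + crc16_ccitt(cmd).to_bytes(2, "big")

def parse_frame(frame: bytes):
    """Validate SOF and CRC on gimbal feedback. None means discard the frame
    as EMI-corrupted; the sender side then falls into the retry path."""
    if len(frame) < 4 or frame[:2] != SOF:
        return None
    cmd, rx_crc = frame[2:-2], int.from_bytes(frame[-2:], "big")
    return cmd if crc16_ccitt(cmd) == rx_crc else None
```

The retry loop (3 attempts, 10ms delay, then log) wraps `frame_command` on the send side; a flipped bit anywhere in CMD or CRC makes `parse_frame` return None.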

Physical EMI mitigation checklist:

  • Gimbal UART cable: shielded, shortest possible run
  • Video/data transmitter antenna: ≥35cm from gimbal (ViewPro recommendation)
  • Independent power supply for gimbal (or filtered from main bus)
  • Ferrite beads on UART cable near Jetson connector

Component 5: Recording, Logging & Telemetry

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|---|---|---|---|---|---|---|---|
| NVMe-backed frame recorder + JSON-lines detection logger (recommended) | OpenCV VideoWriter / JPEG sequences, JSON lines, NVMe SSD | Post-flight review. Training data collection. Evidence. Detection audit trail. NVMe write bandwidth (~500 MB/s) more than sufficient. | Storage: ~2GB/min at 1080p 5FPS JPEG. | NVMe SSD ≥256GB | Physical access to NVMe | ~5ms per frame write (async) | Essential for field deployment. |

Detection log format (JSON lines, one per detection):

```json
{
  "ts": "2026-03-19T14:32:01.234Z",
  "frame_id": 12345,
  "gps_denied_lat": 48.123456,
  "gps_denied_lon": 37.654321,
  "tier": 1,
  "class": "footpath",
  "confidence": 0.72,
  "bbox": [0.12, 0.34, 0.45, 0.67],
  "freshness": "high_contrast",
  "tier2_result": "concealed_position",
  "tier2_confidence": 0.85,
  "tier3_used": false,
  "thumbnail_path": "frames/12345_det_0.jpg"
}
```
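A minimal writer for this format; the log path is illustrative. Line buffering plus one `json.dumps` per detection keeps each record on its own line, so a crash mid-flight costs at most the final record, not the file:

```python
import datetime
import json

class DetectionLogger:
    """Append-only JSON-lines detection log, one record per line on NVMe."""

    def __init__(self, path: str):
        self._fh = open(path, "a", buffering=1)  # line-buffered text mode

    def log(self, **fields) -> None:
        # UTC timestamp in the "ts" format shown above, then caller fields
        record = {"ts": datetime.datetime.now(datetime.timezone.utc)
                        .isoformat(timespec="milliseconds")
                        .replace("+00:00", "Z")}
        record.update(fields)
        self._fh.write(json.dumps(record) + "\n")

    def close(self) -> None:
        self._fh.close()
```

Since `class` is a Python keyword, that field is passed via dict unpacking: `logger.log(frame_id=12345, tier=1, confidence=0.72, **{"class": "footpath"})`.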

Frame recording strategy:

  • Level 1: record every 5th frame (1-2 FPS) — overview coverage
  • Level 2: record every frame (30 FPS) — detailed analysis footage
  • Storage budget: 256GB NVMe ≈ 2 hours at Level 2 full rate, or 10+ hours at Level 1 rate
  • Circular buffer: when storage >80% full, overwrite oldest Level 1 frames (keep Level 2)
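The circular-buffer rule in the last bullet can be sketched as follows. The `<root>/level1` and `<root>/level2` JPEG layout is an assumption of this sketch, not the recorder's actual directory scheme:

```python
import glob
import os

def prune_recordings(root: str, capacity_bytes: int, high_water: float = 0.80) -> list:
    """Once recordings pass `high_water` of capacity, delete the oldest
    Level 1 frames first; Level 2 footage is never touched."""
    level1 = sorted(glob.glob(os.path.join(root, "level1", "*.jpg")),
                    key=os.path.getmtime)  # oldest first
    level2 = glob.glob(os.path.join(root, "level2", "*.jpg"))
    used = sum(os.path.getsize(p) for p in level1 + level2)
    deleted = []
    while used > high_water * capacity_bytes and level1:
        victim = level1.pop(0)
        used -= os.path.getsize(victim)
        os.remove(victim)
        deleted.append(victim)
    return deleted
```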

Component 6: System Health & Resilience

Monitoring threads:

| Monitor | Check Interval | Threshold | Action |
|---|---|---|---|
| Thermal (T_junction) | 1s | >75°C | Degrade to Level 1 only |
| Thermal (T_junction) | 1s | >80°C | Disable semantic detection |
| Power (Jetson INA) | 2s | >80% budget | Disable VLM |
| Power (Jetson INA) | 2s | >90% budget | Reduce inference rate to 5 FPS |
| Gimbal heartbeat | 2s | No response | Force Level 1 sweep pattern |
| Semantic process | 5s | No heartbeat | Restart with 5s backoff, max 3 attempts |
| VLM process | 5s | No heartbeat | Mark Tier 3 unavailable, continue Tier 1+2 |
| NVMe free space | 60s | <20% free | Switch to Level 1 recording rate only |
| Frame quality | per frame | Laplacian var < threshold | Skip frame, use buffered good frame |
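Each row of the table maps to one read/condition/action triple, so a single polling loop can service them all. A sketch; the reader and action callables (e.g. a `read_t_junction` or `disable_semantic`) are hypothetical names wired in at startup:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Monitor:
    """One table row: a sensor read, a trip condition, and an action,
    polled at its own interval from a shared monitoring thread."""
    name: str
    interval_s: float
    read: Callable[[], float]
    tripped: Callable[[float], bool]
    action: Callable[[], None]
    next_due: float = 0.0

    def poll(self, now: float) -> None:
        if now >= self.next_due:
            self.next_due = now + self.interval_s
            if self.tripped(self.read()):
                self.action()
```

For example, the second thermal row becomes `Monitor("thermal_80", 1.0, read_t_junction, lambda t: t > 80.0, disable_semantic)`, with the loop calling `poll(time.monotonic())` on every monitor.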

Graceful degradation (4 levels, unchanged from draft02):

| Level | Condition | Capability |
|---|---|---|
| 0 — Full | All nominal, T < 70°C | Tier 1+2+3, Level 1+2, gimbal, recording |
| 1 — No VLM | VLM unavailable or T > 75°C or power > 80% | Tier 1+2, Level 1+2, gimbal, recording |
| 2 — No semantic | Semantic crashed 3x or T > 80°C | Existing YOLO only, Level 1 sweep, recording |
| 3 — No gimbal | Gimbal UART failed 3x | Existing YOLO only, fixed camera, recording |
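The level selection reduces to checking the worst satisfied condition first. A sketch with thresholds mirroring the table; note the table leaves the 70-75°C band unassigned, and this sketch assumes it still counts as Level 0:

```python
def degradation_level(t_junction_c: float, power_frac: float, vlm_ok: bool,
                      semantic_crashes: int, gimbal_failures: int) -> int:
    """Map current system state to the degradation table; checks run from
    Level 3 down so the worst satisfied condition wins."""
    if gimbal_failures >= 3:
        return 3
    if semantic_crashes >= 3 or t_junction_c > 80.0:
        return 2
    if not vlm_ok or t_junction_c > 75.0 or power_frac > 0.80:
        return 1
    return 0
```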

Component 7: Integration & Deployment

Hardware BOM additions (beyond existing system):

| Item | Purpose | Estimated Cost |
|---|---|---|
| NVMe SSD ≥256GB (industrial grade) | OS + models + recording + logging | $40-80 |
| Active cooling fan (30mm+) | Prevent thermal throttling | $10-20 |
| Ruggedized carrier board (e.g., MILBOX-ORNX or custom) | Vibration, temperature, dust protection | $200-500 |
| Shielded UART cable + ferrite beads | EMI protection for gimbal communication | $10-20 |
| Total additional hardware | | ~$260-620 |

Software deployment:

  • OS: JetPack 6.2 on NVMe SSD
  • YOLOE models: TRT FP16 engines on NVMe (both 11 and 26 backbone variants)
  • VLM: NanoLLM Docker container on NVMe
  • Existing YOLO: current Cython + TRT pipeline (unchanged)
  • New Cython modules: semantic detection, gimbal control, scan controller, recorder
  • VLM process: separate Docker container, IPC via Unix socket
  • Config: YAML file for all thresholds, class names, scan parameters, degradation thresholds

Version control & update strategy:

  • Pin all dependency versions (Ultralytics, TensorRT, NanoLLM, OpenCV)
  • Model updates: swap TRT engine files on NVMe, restart service
  • Config updates: edit YAML, restart service
  • No over-the-air updates (air-gapped system). USB drive for field updates.

Training & Data Strategy

Phase 0: Benchmark Sprint (Week 1-2)

  • Deploy YOLOE-26s-seg and YOLOE-11s-seg in open-vocab mode
  • Test text/visual prompts on semantic01-04.png + 50 additional frames
  • Record results. Pick backbone with better qualitative detection.
  • Deploy V1 heuristic endpoint analysis (no CNN, no training data needed)
  • First field test flight with recording enabled

Phase 1: Field validation & data collection (Week 2-6)

  • Deploy TRT FP16 engine with best backbone
  • Record all flights to NVMe
  • Operator marks detections as true/false positive in post-flight review
  • Build annotation backlog from recorded frames
  • Target: 500 annotated frames by week 6

Phase 2: Custom model training (Week 6-10)

  • Fine-tune YOLOE-Seg on custom dataset (linear probing → full fine-tune)
  • Train MobileNetV3-Small CNN on endpoint ROI crops
  • A/B test: custom model vs YOLOE zero-shot on validation set
  • Deploy winning model as new TRT engine

Phase 3: VLM & refinement (Week 8-14)

  • Deploy NanoLLM with VILA1.5-3B
  • Tune prompting on collected ambiguous cases
  • Train freshness classifier (if enough annotated freshness labels exist)
  • Target: 1500+ images per class

Phase 4: Seasonal expansion (Month 4+)

  • Spring/summer annotation campaigns
  • Re-train all models with multi-season data
  • Adjust heuristic thresholds per season (configurable via YAML)

Testing Strategy

Integration / Functional Tests

  • YOLOE text prompt detection on reference images (both 11 and 26 backbones)
  • TRT FP16 export on Jetson Orin Nano Super (verify no OOM, no crash)
  • V1 heuristic endpoint analysis on 20 synthetic masks (10 with hideouts, 10 without)
  • Frame quality gate: inject blurry frames, verify rejection
  • Gimbal CRC layer: inject corrupted commands, verify retry + log
  • Gimbal watchdog: simulate hang, verify forced Level 1 within 2.5s
  • NanoLLM VLM: load model, run inference on 10 aerial images, verify output + memory
  • VLM load/unload cycle: 10 cycles without memory leak
  • Detection logger: verify JSON-lines format, all fields populated
  • Frame recorder: verify NVMe write speed, no dropped frames at 30 FPS
  • Full pipeline end-to-end on recorded flight footage (offline replay)
  • Graceful degradation: simulate each failure mode, verify correct degradation level

Non-Functional Tests

  • Tier 1 latency on Jetson Orin Nano Super TRT FP16: ≤100ms (both backbones)
  • Tier 2 latency: ≤50ms (V1 heuristic), ≤200ms (V2 CNN)
  • Tier 3 latency (NanoLLM VLM): ≤5 seconds
  • Memory peak: all components loaded < 7GB
  • Thermal: 60-minute sustained inference, T_junction < 75°C with active cooling
  • NVMe endurance: continuous recording for 2 hours, verify no write errors
  • Power draw: measure at each degradation level, verify within UAV power budget
  • EMI test: operate near data transmitter antenna, verify no gimbal anomalies with CRC layer
  • Cold start: power on → first detection within 60 seconds (model load time)
  • Vibration: mount Jetson on vibration table, run inference, compare detection accuracy vs static

Technology Maturity Assessment

| Component | Technology | Maturity | Risk | Mitigation |
|---|---|---|---|---|
| Tier 1 Detection | YOLOE/YOLO26/YOLO11 | Medium — YOLO26 is 3 months old, reported regressions on custom data. YOLOE-26 even newer. | Medium | Benchmark both backbones. YOLO11 is battle-tested fallback. Pin versions. |
| TRT FP16 Export | TensorRT on JetPack 6.2 | High — FP16 is stable on Jetson. Well-documented. | Low | FP16 only. Avoid INT8 initially. |
| TRT INT8 Export | TensorRT on JetPack 6.2 | Low — Documented crashes (PR #23928). Calibration issues. | High | Defer to Phase 3+. FP16 sufficient for now. |
| VLM (NanoLLM) | NanoLLM + VILA-3B | Medium-High — Purpose-built for Jetson by NVIDIA team. Docker-based. Monthly releases. | Low | More stable than vLLM. Use Docker containers. |
| VLM (vLLM) | vLLM on Jetson | Low — System freezes, crashes, open bugs. | High | Do not use. NanoLLM instead. |
| Path Tracing | Skeletonization + OpenCV | High — Decades-old algorithms. Well-understood. | Low | Pruning needed for noisy inputs. |
| CNN Classifier | MobileNetV3-Small + TRT | High — Proven architecture. TRT FP16 stable. | Low | Standard transfer learning. |
| Gimbal Control | ViewLink Serial Protocol | Medium — Protocol documented. ArduPilot driver exists. | Medium | EMI mitigation critical. CRC layer. |
| Freshness Assessment | Novel heuristic | Low — No prior art. Experimental. | High | V1: metadata only, not a filter. Iterate with data. |
| NVMe Storage | Industrial NVMe on Jetson | High — Production standard. SD card alternative is unreliable. | Low | Use industrial-grade SSD. |
| Ruggedized Hardware | MILBOX-ORNX or custom | High — Established product. Designed for Jetson + UAV. | Low | Standard procurement. |

References