detections-semantic/_docs/01_solution/solution_draft03.md

# Solution Draft

## Assessment Findings

| Old Component Solution | Weak Point (functional/security/performance) | New Solution |
|------------------------|----------------------------------------------|-------------|
| YOLO26 as sole detection backbone | **Accuracy regression on custom datasets**: YOLO26s was reported as "much less accurate" than YOLO11s on identical training data (GitHub #23206). YOLO26 is 3 months old and less battle-tested than YOLO11. | Benchmark YOLO26 vs YOLO11 on initial annotated data before committing. YOLO11 as fallback. YOLOE supports both backbones (yoloe-11s-seg, yoloe-26s-seg). |
| YOLO26 TensorRT INT8 export | **INT8 export fails on Jetson** (TRT Error Code 2, OOM). Fix merged (PR #23928) but indicates fragile tooling. | Use FP16 only for initial deployment (confirmed stable). INT8 as future optimization after tooling matures. Pin Ultralytics version + JetPack version. |
| vLLM as VLM runtime | **Unstable on Jetson Orin Nano**: system freezes, reboots, installation crashes, excessive memory (multiple open issues). Not production-ready for 8GB devices. | **Replace with NanoLLM/NanoVLM** — purpose-built for Jetson by NVIDIA's Dusty-NV team. Docker containers for JetPack 5/6. Supports VILA, LLaVA. Stable. Or use llama.cpp with GGUF models (proven on Jetson). |
| No storage strategy | **SD card corruption**: Recurring corruption documented across multiple Jetson Orin Nano users. SD cards unsuitable for production. | **Mandatory NVMe SSD** for OS + models + logging. No SD card in production. Ruggedized NVMe mount for vibration resistance. |
| No EMI protection on UART | **ViewPro documents EMI issues**: antennas cause random gimbal panning if within 35cm. Standard UART parity bit insufficient for noisy UAV environment. | Add CRC-16 checksum layer on gimbal commands. Enforce 35cm antenna separation in physical design. Consider shielded UART cable. Command retry on CRC failure (max 3 retries, then log error). |
| No environmental hardening addressed | **UAV environment**: vibration, temperature extremes (-20°C to +50°C), dust, EMI, power fluctuations. Dev kit form factor is not field-deployable. | Use ruggedized carrier board (MILBOX-ORNX or similar) with vibration dampening. Conformal coating on exposed connectors. External temperature sensor for environmental monitoring. |
| No logging or telemetry | **No post-flight review capability**: field system must log all detections with metadata for model iteration, operator review, and evidence collection. | Add detection logging: timestamp, GPS-denied coordinates, confidence score, detection class, JPEG thumbnail, tier that triggered, freshness metadata. Log to NVMe SSD. Export as structured format (JSON lines) after flight. |
| No frame recording for offline replay | **Training data collection depends on field recording**: Without recording, no way to build training dataset from real flights. | Record all camera frames to NVMe at configurable rate (1-5 FPS during Level 1, full rate during Level 2). Include detection overlay option. Post-flight: use recordings for annotation. |
| No power management | **UAV power budget is finite**: Jetson at 15W + gimbal + camera + radio. No monitoring of power draw or load shedding. | Monitor power consumption via Jetson's INA sensors. Power budget alert at 80% of allocated watts. Load shedding: disable VLM first, then reduce inference rate, then disable semantic detection. |
| YOLO26 not validated for this domain | **No benchmark on aerial concealment detection**: All YOLO26 numbers are on COCO/LVIS. Concealment detection may behave very differently. | First sprint deliverable: benchmark YOLOE (both 11 and 26 backbones) on semantic01-04.png with text/visual prompts. Report AP on initial annotated validation set before committing to backbone. |
| Freshness and path tracing are untested algorithms | **No proven prior art**: Both freshness assessment and path-following via skeletonization are novel combinations. Risk of over-engineering before validation. | Implement minimal viable versions first. V1 path tracing: skeleton + endpoint only, no freshness, no junction following. Validate on real flight data before adding complexity. |

## Product Solution Description

A three-tier semantic detection system for identifying concealed/camouflaged positions from reconnaissance UAV aerial imagery, running on Jetson Orin Nano Super with NVMe SSD storage, active cooling, and ruggedized carrier board, alongside the existing YOLO detection pipeline.

```
┌──────────────────────────────────────────────────────────────────────────┐
│          JETSON ORIN NANO SUPER (ruggedized carrier, NVMe, 15W)         │
│                                                                          │
│  ┌──────────┐    ┌──────────────┐    ┌──────────────┐    ┌───────────┐  │
│  │ ViewPro  │───▶│  Tier 1      │───▶│  Tier 2      │───▶│ Tier 3    │  │
│  │ A40      │    │  YOLOE       │    │  Path Trace  │    │ VLM       │  │
│  │ Camera   │    │  (11 or 26   │    │  + CNN       │    │ NanoLLM   │  │
│  │ + Frame  │    │   backbone)  │    │  ≤200ms      │    │ (L2 only) │  │
│  │ Quality  │    │  TRT FP16    │    │              │    │ ≤5s       │  │
│  │ Gate     │    │  ≤100ms      │    │              │    │           │  │
│  └────▲─────┘    └──────────────┘    └──────────────┘    └───────────┘  │
│       │                                                                  │
│  ┌────┴─────┐    ┌──────────────┐    ┌──────────────┐    ┌───────────┐  │
│  │ Gimbal   │◀───│  Scan        │    │  Watchdog    │    │ Recorder  │  │
│  │ Control  │    │  Controller  │    │  + Thermal   │    │ + Logger  │  │
│  │ + CRC    │    │  (L1/L2 FSM) │    │  + Power     │    │ (NVMe)   │  │
│  └──────────┘    └──────────────┘    └──────────────┘    └───────────┘  │
│                                                                          │
│  ┌──────────────────────────────┐                                       │
│  │  Existing YOLO Detection    │ (always running, scene context)        │
│  │  Cython + TRT               │                                        │
│  └──────────────────────────────┘                                       │
└──────────────────────────────────────────────────────────────────────────┘
```

Key changes from draft02:
- **YOLOE backbone is configurable** (YOLO11 or YOLO26) — benchmark before committing
- **NanoLLM replaces vLLM** as VLM runtime (purpose-built for Jetson, stable)
- **NVMe SSD mandatory** — no SD card in production
- **CRC-16 on gimbal UART** — EMI protection
- **Detection logger + frame recorder** — post-flight review and training data collection
- **Ruggedized carrier board** — vibration, temperature, dust protection
- **Power monitoring + load shedding** — finite UAV power budget
- **FP16 only** for initial deployment (INT8 export unstable on Jetson)
- **Minimal V1 for unproven components** — path tracing and freshness start simple

## Architecture

### Component 1: Tier 1 — Real-Time Detection

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|----------|-------|-----------|-------------|-------------|----------|------------|-----|
| **YOLOE with configurable backbone (recommended)** | yoloe-11s-seg.pt or yoloe-26s-seg.pt, set_classes() → TRT FP16 | Supports both YOLO11 and YOLO26 backbones. Benchmark on real data, pick winner. set_classes() bakes CLIP embeddings for zero overhead. | YOLO26 may regress on custom data vs YOLO11. Needs empirical comparison. | Ultralytics ≥8.4 (pinned version), TensorRT, JetPack 6.2 | Local only | YOLO11s TRT FP16: ~7ms (640px). YOLO26s: similar or slightly faster. | **Best fit. Hedge against backbone risk.** |

**Version pinning strategy**:
- Pin `ultralytics==8.4.X` (specific patch version validated on Jetson)
- Pin JetPack 6.2 + TensorRT version
- Test every Ultralytics update in staging before deploying to production
- Keep both yoloe-11s-seg and yoloe-26s-seg TRT engines on NVMe; switch via config

**YOLO backbone selection process (Sprint 1)**:
1. Annotate 200 frames from real flight footage (footpaths, branch piles, entrances)
2. Fine-tune YOLOE-11s-seg and YOLOE-26s-seg on same dataset, same hyperparameters
3. Evaluate on held-out validation set (50 frames)
4. Pick the backbone with the higher mAP50 on the validation set
5. If the mAP50 difference is under 2%: pick YOLO26 regardless of which scored higher (faster CPU inference, NMS-free deployment)
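The selection rule in steps 4 and 5 can be written down as a small function. This is a minimal sketch, not the project's implementation: `BackboneResult` and `pick_backbone` are hypothetical names, and it assumes the 2% delta means an absolute mAP50 difference of 0.02.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackboneResult:
    name: str     # e.g. "yoloe-11s-seg" or "yoloe-26s-seg"
    map50: float  # mAP50 on the held-out validation set, in [0, 1]


def pick_backbone(yolo11: BackboneResult, yolo26: BackboneResult,
                  tie_margin: float = 0.02) -> BackboneResult:
    """Return the higher-mAP50 backbone; prefer YOLO26 inside the tie margin."""
    # Assumption: "delta < 2%" is an absolute mAP50 difference below 0.02.
    if abs(yolo11.map50 - yolo26.map50) < tie_margin:
        return yolo26
    return yolo11 if yolo11.map50 > yolo26.map50 else yolo26
```

Keeping the margin as a parameter lets the tie-break be revisited once the validation set grows beyond 50 frames.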

### Component 2: Tier 2 — Spatial Reasoning & CNN Confirmation

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|----------|-------|-----------|-------------|-------------|----------|------------|-----|
| **V1 minimal path tracing + heuristic classifier (recommended for initial release)** | OpenCV, scikit-image | No training data needed. Skeleton + endpoint detection + simple heuristic: "dark mass at endpoint → flag." Fast to implement and validate. | Low accuracy. Many false positives. | OpenCV, scikit-image | Offline | ~30ms | **V1: ship fast, validate on real data.** |
| **V2 trained CNN (after data collection)** | MobileNetV3-Small, TensorRT FP16 | Higher accuracy after training. Dynamic ROI sizing. | Needs 300+ positive, 1000+ negative annotated ROI crops. | PyTorch, TRT export | Offline | ~5-10ms classification | **V2: replace heuristic once data exists.** |

**V1 heuristic for endpoint analysis** (no training data needed):
1. Skeletonize footpath mask with branch pruning
2. Find endpoints
3. For each endpoint: extract ROI (dynamic size based on GSD)
4. Compute three features per ROI:
   - `mean_darkness` = mean intensity in ROI center 50%
   - `contrast` = `(surrounding_mean - center_mean) / surrounding_mean`
   - `area_ratio` = `dark_pixel_count / total_pixels`
5. If mean_darkness < threshold_dark AND contrast > threshold_contrast → flag as potential concealed position
6. Thresholds: configurable, tuned per season. Start with winter values.
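The feature computation and flagging rule in steps 4 and 5 could look roughly like the sketch below. It is illustrative only: the function names, the default thresholds, and the `dark_pixel_cutoff` parameter are placeholders, and it assumes "center 50%" means the middle half of each ROI axis. It works on a plain 2D list of grayscale values so it runs without OpenCV.

```python
from statistics import fmean


def endpoint_features(roi, dark_pixel_cutoff=60):
    """Compute V1 features for one grayscale ROI (2D list, values 0-255)."""
    h, w = len(roi), len(roi[0])
    if h < 4 or w < 4:
        raise ValueError("ROI must be at least 4x4 pixels")
    # Assumption: "center 50%" is the middle half of each axis.
    r0, r1 = h // 4, h - h // 4
    c0, c1 = w // 4, w - w // 4
    center, surround = [], []
    for r, row in enumerate(roi):
        for c, value in enumerate(row):
            inside = r0 <= r < r1 and c0 <= c < c1
            (center if inside else surround).append(value)
    center_mean = fmean(center)
    surround_mean = fmean(surround)
    contrast = ((surround_mean - center_mean) / surround_mean
                if surround_mean > 0 else 0.0)
    dark_pixels = sum(1 for row in roi for v in row if v < dark_pixel_cutoff)
    return {
        "mean_darkness": center_mean,
        "contrast": contrast,
        "area_ratio": dark_pixels / (h * w),
    }


def is_potential_concealed_position(roi, threshold_dark=60.0,
                                    threshold_contrast=0.4):
    """Apply the step 5 rule; thresholds here are placeholder values."""
    f = endpoint_features(roi)
    return f["mean_darkness"] < threshold_dark and f["contrast"] > threshold_contrast
```

`area_ratio` is returned for logging even though the step 5 rule does not use it.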

**V1 freshness** (metadata only, not a filter):
- contrast_ratio of path vs surrounding terrain
- Report as: "high contrast" (likely fresh) / "low contrast" (likely stale)
- No binary classification. Operator sees all detections with freshness tag.

### Component 3: Tier 3 — VLM Deep Analysis

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|----------|-------|-----------|-------------|-------------|----------|------------|-----|
| **NanoLLM with VILA-2.7B or VILA1.5-3B (recommended)** | NanoLLM Docker container, MLC/TVM quantization | Purpose-built for Jetson by NVIDIA team. Stable Docker containers. Optimized memory management. Supports VLMs natively. | Limited model selection (VILA, LLaVA, Obsidian). Not all VLMs available. | Docker, JetPack 6, NVMe for container storage | Local only, container isolation | ~15-25 tok/s on Orin Nano (4-bit MLC) | **Most stable Jetson VLM option.** |
| llama.cpp with GGUF VLM | llama.cpp, GGUF model files | Lightweight. No Docker needed. Proven stability on Jetson. Wide model support. | Manual build. Less optimized than NanoLLM for Jetson GPU. | llama.cpp build, GGUF weights | Local only | ~10-20 tok/s estimated | **Fallback if NanoLLM doesn't support needed model.** |
| ~~vLLM~~ | ~~vLLM Docker~~ | ~~High throughput~~ | **System freezes, reboots, installation crashes on Orin Nano. Multiple open bugs. Not production-ready.** | N/A | N/A | N/A | **Not recommended.** |

**Model selection for NanoLLM**:
- Primary: VILA1.5-3B (confirmed on Orin Nano, multimodal, 4-bit MLC)
- If UAV-VL-R1 GGUF weights become available: use via llama.cpp (aerial-specialized)
- Fallback: Obsidian-3B (mini VLM, lower accuracy but very fast)

### Component 4: Camera Gimbal Control

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|----------|-------|-----------|-------------|-------------|----------|------------|-----|
| **ViewLink serial driver + CRC-16 + PID + watchdog (recommended)** | pyserial, crcmod, PID library, threading | Robust communication. CRC catches EMI-corrupted commands. Retry logic. Watchdog. | ViewLink protocol implementation from spec. Physical EMI mitigation required. | ViewPro docs, UART, shielded cable, 35cm antenna separation | Physical only | <10ms command + CRC overhead negligible | **Production-grade.** |

**UART reliability layer**:
```
Packet format: [SOF(2)] [CMD(N)] [CRC16(2)]
- SOF: 0xAA 0x55 (start of frame)
- CMD: ViewLink command bytes per protocol spec
- CRC16: CRC-CCITT over CMD bytes
```
- On send: compute CRC-16, append to ViewLink command packet
- On receive (gimbal feedback): validate CRC-16. Discard corrupted frames.
- On CRC failure (send): retry up to 3 times with 10ms delay. Log failure after 3 retries.
- Note: Check if ViewLink protocol already includes checksums (read full spec first). If so, use native checksum; don't add redundant CRC.
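A minimal sketch of the framing, validation, and retry rules above, using only the Python standard library. Several details are assumptions rather than ViewLink facts: the CRC variant (CRC-16/CCITT-FALSE, initial value 0xFFFF), big-endian CRC byte order, and the hypothetical `transmit` callable, which is expected to write a frame and return `True` when the gimbal acknowledges it.

```python
import binascii
import logging
import time
from typing import Callable, Optional

SOF = b"\xAA\x55"  # start-of-frame marker from the packet format above


def crc16_ccitt(data: bytes) -> int:
    # Assumption: CRC-16/CCITT-FALSE (polynomial 0x1021, initial value 0xFFFF).
    return binascii.crc_hqx(data, 0xFFFF)


def build_frame(cmd: bytes) -> bytes:
    """Wrap ViewLink command bytes as SOF + CMD + CRC16 (big-endian, assumed)."""
    return SOF + cmd + crc16_ccitt(cmd).to_bytes(2, "big")


def parse_frame(frame: bytes) -> Optional[bytes]:
    """Return the command bytes, or None if the frame is malformed or corrupted."""
    if len(frame) < len(SOF) + 2 or not frame.startswith(SOF):
        return None
    cmd = frame[len(SOF):-2]
    received = int.from_bytes(frame[-2:], "big")
    return cmd if crc16_ccitt(cmd) == received else None


def send_with_retry(cmd: bytes, transmit: Callable[[bytes], bool],
                    max_retries: int = 3, delay_s: float = 0.010) -> bool:
    """Send once, then retry up to max_retries times with delay_s between tries."""
    frame = build_frame(cmd)
    for attempt in range(max_retries + 1):
        if transmit(frame):
            return True
        if attempt < max_retries:
            time.sleep(delay_s)
    logging.error("gimbal command failed after %d retries", max_retries)
    return False
```

If the ViewLink spec turns out to define its own checksum, only `crc16_ccitt` and the frame layout would need to change; the retry loop stays the same.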

**Physical EMI mitigation checklist**:
- [ ] Gimbal UART cable: shielded, shortest possible run
- [ ] Video/data transmitter antenna: ≥35cm from gimbal (ViewPro recommendation)
- [ ] Independent power supply for gimbal (or filtered from main bus)
- [ ] Ferrite beads on UART cable near Jetson connector

### Component 5: Recording, Logging & Telemetry

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|----------|-------|-----------|-------------|-------------|----------|------------|-----|
| **NVMe-backed frame recorder + JSON-lines detection logger (recommended)** | OpenCV VideoWriter / JPEG sequences, JSON lines, NVMe SSD | Post-flight review. Training data collection. Evidence. Detection audit trail. | NVMe write bandwidth (~500 MB/s) more than sufficient. Storage: ~2GB/min at 1080p 30 FPS JPEG (Level 2 full rate). | NVMe SSD ≥256GB | Physical access to NVMe | ~5ms per frame write (async) | **Essential for field deployment.** |

**Detection log format** (JSON lines, one per detection):
```json
{
  "ts": "2026-03-19T14:32:01.234Z",
  "frame_id": 12345,
  "gps_denied_lat": 48.123456,
  "gps_denied_lon": 37.654321,
  "tier": 1,
  "class": "footpath",
  "confidence": 0.72,
  "bbox": [0.12, 0.34, 0.45, 0.67],
  "freshness": "high_contrast",
  "tier2_result": "concealed_position",
  "tier2_confidence": 0.85,
  "tier3_used": false,
  "thumbnail_path": "frames/12345_det_0.jpg"
}
```
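One way to emit records in this format is sketched below. The function names and the field check are illustrative assumptions; the field list is taken from the example record above.

```python
import json
from typing import Any, Dict

# Field names copied from the example record above.
REQUIRED_FIELDS = (
    "ts", "frame_id", "gps_denied_lat", "gps_denied_lon", "tier", "class",
    "confidence", "bbox", "freshness", "tier2_result", "tier2_confidence",
    "tier3_used", "thumbnail_path",
)


def format_detection(record: Dict[str, Any]) -> str:
    """Serialize one detection as a single JSON line, rejecting partial records."""
    missing = [name for name in REQUIRED_FIELDS if name not in record]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    # json.dumps escapes newlines inside strings, so each record stays on one line.
    return json.dumps(record, separators=(",", ":")) + "\n"


def append_detection(log_path: str, record: Dict[str, Any]) -> None:
    """Append one record to the JSON-lines log file."""
    with open(log_path, "a", encoding="utf-8") as log_file:
        log_file.write(format_detection(record))
```

Appending one complete line per detection means a power loss mid-flight can truncate at most the last record, and the rest of the file remains parseable line by line.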

**Frame recording strategy**:
- Level 1: record at 1-2 FPS (every 15th to 30th frame of the 30 FPS stream) — overview coverage
- Level 2: record every frame (30 FPS) — detailed analysis footage
- Storage budget: 256GB NVMe ≈ 2 hours at Level 2 full rate, or 10+ hours at Level 1 rate
- Circular buffer: when storage >80% full, overwrite oldest Level 1 frames (keep Level 2)
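The storage budget can be sanity-checked with simple arithmetic. The sketch below assumes roughly 1.2 MB per recorded frame, a value back-solved from the "256GB ≈ 2 hours at 30 FPS" figure rather than measured; `recording_hours` is a hypothetical helper.

```python
def recording_hours(capacity_gb: float, fps: float, frame_mb: float) -> float:
    """Hours of recording that fit in capacity_gb at a given frame rate and size."""
    mb_per_hour = fps * 3600 * frame_mb
    return capacity_gb * 1000 / mb_per_hour


# Assumption: ~1.2 MB per frame, derived from the Level 2 budget above.
FRAME_MB = 1.2

level2_hours = recording_hours(256, fps=30, frame_mb=FRAME_MB)  # about 2 hours
level1_hours = recording_hours(256, fps=2, frame_mb=FRAME_MB)   # about 30 hours
```

Under this assumption Level 1 recording has comfortable headroom over the 10+ hour target, which leaves room for the OS, models, and logs on the same drive. Measured JPEG sizes from the first recorded flight should replace the assumed frame size.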

### Component 6: System Health & Resilience

**Monitoring threads**:

| Monitor | Check Interval | Threshold | Action |
|---------|---------------|-----------|--------|
| Thermal (T_junction) | 1s | >75°C | Enter degradation Level 1 (disable VLM) |
| Thermal (T_junction) | 1s | >80°C | Disable semantic detection |
| Power (Jetson INA) | 2s | >80% budget | Disable VLM |
| Power (Jetson INA) | 2s | >90% budget | Reduce inference rate to 5 FPS |
| Gimbal heartbeat | 2s | No response | Force Level 1 sweep pattern |
| Semantic process | 5s | No heartbeat | Restart with 5s backoff, max 3 attempts |
| VLM process | 5s | No heartbeat | Mark Tier 3 unavailable, continue Tier 1+2 |
| NVMe free space | 60s | <20% free | Switch to Level 1 recording rate only |
| Frame quality | per frame | Laplacian var < threshold | Skip frame, use buffered good frame |

**Graceful degradation** (4 levels, unchanged from draft02):

| Level | Condition | Capability |
|-------|-----------|-----------|
| 0 — Full | All nominal, T < 70°C | Tier 1+2+3, Level 1+2, gimbal, recording |
| 1 — No VLM | VLM unavailable or T > 75°C or power > 80% | Tier 1+2, Level 1+2, gimbal, recording |
| 2 — No semantic | Semantic crashed 3x or T > 80°C | Existing YOLO only, Level 1 sweep, recording |
| 3 — No gimbal | Gimbal UART failed 3x | Existing YOLO only, fixed camera, recording |
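The degradation table can be read as a priority-ordered rule set in which the most severe matching condition wins. The sketch below is one possible encoding: the names are hypothetical, `power_fraction` is the measured draw divided by the allocated budget, and the 70-75°C band, which the table leaves unspecified, is treated as Level 0 here.

```python
from enum import IntEnum


class DegradationLevel(IntEnum):
    FULL = 0         # Tier 1+2+3, Level 1+2 scan, gimbal, recording
    NO_VLM = 1       # Tier 1+2 only
    NO_SEMANTIC = 2  # existing YOLO only, Level 1 sweep
    NO_GIMBAL = 3    # existing YOLO only, fixed camera


def select_level(t_junction_c: float, power_fraction: float,
                 vlm_available: bool, semantic_crashes: int,
                 gimbal_failures: int) -> DegradationLevel:
    """Return the most severe degradation level whose condition holds."""
    if gimbal_failures >= 3:
        return DegradationLevel.NO_GIMBAL
    if semantic_crashes >= 3 or t_junction_c > 80.0:
        return DegradationLevel.NO_SEMANTIC
    if not vlm_available or t_junction_c > 75.0 or power_fraction > 0.80:
        return DegradationLevel.NO_VLM
    # Assumption: temperatures between 70 and 75 degrees C stay at full capability.
    return DegradationLevel.FULL
```

A stateful variant could add hysteresis, for example returning to Level 0 only after the temperature drops below 70°C, to avoid oscillating around the 75°C threshold.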

### Component 7: Integration & Deployment

**Hardware BOM additions** (beyond existing system):

| Item | Purpose | Estimated Cost |
|------|---------|----------------|
| NVMe SSD ≥256GB (industrial grade) | OS + models + recording + logging | $40-80 |
| Active cooling fan (30mm+) | Prevent thermal throttling | $10-20 |
| Ruggedized carrier board (e.g., MILBOX-ORNX or custom) | Vibration, temperature, dust protection | $200-500 |
| Shielded UART cable + ferrite beads | EMI protection for gimbal communication | $10-20 |
| Total additional hardware | | ~$260-620 |

**Software deployment**:
- OS: JetPack 6.2 on NVMe SSD
- YOLOE models: TRT FP16 engines on NVMe (both 11 and 26 backbone variants)
- VLM: NanoLLM Docker container on NVMe
- Existing YOLO: current Cython + TRT pipeline (unchanged)
- New Cython modules: semantic detection, gimbal control, scan controller, recorder
- VLM process: separate Docker container, IPC via Unix socket
- Config: YAML file for all thresholds, class names, scan parameters, degradation thresholds

**Version control & update strategy**:
- Pin all dependency versions (Ultralytics, TensorRT, NanoLLM, OpenCV)
- Model updates: swap TRT engine files on NVMe, restart service
- Config updates: edit YAML, restart service
- No over-the-air updates (air-gapped system). USB drive for field updates.

## Training & Data Strategy

### Phase 0: Benchmark Sprint (Week 1-2)
- Deploy YOLOE-26s-seg and YOLOE-11s-seg in open-vocab mode
- Test text/visual prompts on semantic01-04.png + 50 additional frames
- Record results. Pick backbone with better qualitative detection.
- Deploy V1 heuristic endpoint analysis (no CNN, no training data needed)
- First field test flight with recording enabled

### Phase 1: Field validation & data collection (Week 2-6)
- Deploy TRT FP16 engine with best backbone
- Record all flights to NVMe
- Operator marks detections as true/false positive in post-flight review
- Build annotation backlog from recorded frames
- Target: 500 annotated frames by week 6

### Phase 2: Custom model training (Week 6-10)
- Fine-tune YOLOE-Seg on custom dataset (linear probing → full fine-tune)
- Train MobileNetV3-Small CNN on endpoint ROI crops
- A/B test: custom model vs YOLOE zero-shot on validation set
- Deploy winning model as new TRT engine

### Phase 3: VLM & refinement (Week 8-14)
- Deploy NanoLLM with VILA1.5-3B
- Tune prompting on collected ambiguous cases
- Train freshness classifier (if enough annotated freshness labels exist)
- Target: 1500+ images per class

### Phase 4: Seasonal expansion (Month 4+)
- Spring/summer annotation campaigns
- Re-train all models with multi-season data
- Adjust heuristic thresholds per season (configurable via YAML)

## Testing Strategy

### Integration / Functional Tests
- YOLOE text prompt detection on reference images (both 11 and 26 backbones)
- TRT FP16 export on Jetson Orin Nano Super (verify no OOM, no crash)
- V1 heuristic endpoint analysis on 20 synthetic masks (10 with hideouts, 10 without)
- Frame quality gate: inject blurry frames, verify rejection
- Gimbal CRC layer: inject corrupted commands, verify retry + log
- Gimbal watchdog: simulate hang, verify forced Level 1 within 2.5s
- NanoLLM VLM: load model, run inference on 10 aerial images, verify output + memory
- VLM load/unload cycle: 10 cycles without memory leak
- Detection logger: verify JSON-lines format, all fields populated
- Frame recorder: verify NVMe write speed, no dropped frames at 30 FPS
- Full pipeline end-to-end on recorded flight footage (offline replay)
- Graceful degradation: simulate each failure mode, verify correct degradation level

### Non-Functional Tests
- Tier 1 latency on Jetson Orin Nano Super TRT FP16: ≤100ms (both backbones)
- Tier 2 latency: ≤50ms (V1 heuristic), ≤200ms (V2 CNN)
- Tier 3 latency (NanoLLM VLM): ≤5 seconds
- Memory peak: all components loaded < 7GB
- Thermal: 60-minute sustained inference, T_junction < 75°C with active cooling
- NVMe endurance: continuous recording for 2 hours, verify no write errors
- Power draw: measure at each degradation level, verify within UAV power budget
- EMI test: operate near data transmitter antenna, verify no gimbal anomalies with CRC layer
- Cold start: power on → first detection within 60 seconds (model load time)
- Vibration: mount Jetson on vibration table, run inference, compare detection accuracy vs static

## Technology Maturity Assessment

| Component | Technology | Maturity | Risk | Mitigation |
|-----------|-----------|----------|------|------------|
| Tier 1 Detection | YOLOE/YOLO26/YOLO11 | **Medium** — YOLO26 is 3 months old, reported regressions on custom data. YOLOE-26 even newer. | Medium | Benchmark both backbones. YOLO11 is battle-tested fallback. Pin versions. |
| TRT FP16 Export | TensorRT on JetPack 6.2 | **High** — FP16 is stable on Jetson. Well-documented. | Low | FP16 only. Avoid INT8 initially. |
| TRT INT8 Export | TensorRT on JetPack 6.2 | **Low** — Documented crashes (issue #23841; OOM fix merged in PR #23928). Calibration issues. | High | Defer to Phase 3+. FP16 sufficient for now. |
| VLM (NanoLLM) | NanoLLM + VILA1.5-3B | **Medium-High** — Purpose-built for Jetson by NVIDIA team. Docker-based. Monthly releases. | Low | More stable than vLLM. Use Docker containers. |
| VLM (vLLM) | vLLM on Jetson | **Low** — System freezes, crashes, open bugs. | **High** | **Do not use.** NanoLLM instead. |
| Path Tracing | Skeletonization + OpenCV | **High** — Decades-old algorithms. Well-understood. | Low | Pruning needed for noisy inputs. |
| CNN Classifier | MobileNetV3-Small + TRT | **High** — Proven architecture. TRT FP16 stable. | Low | Standard transfer learning. |
| Gimbal Control | ViewLink Serial Protocol | **Medium** — Protocol documented. ArduPilot driver exists. | Medium | EMI mitigation critical. CRC layer. |
| Freshness Assessment | Novel heuristic | **Low** — No prior art. Experimental. | High | V1: metadata only, not a filter. Iterate with data. |
| NVMe Storage | Industrial NVMe on Jetson | **High** — Production standard. SD card alternative is unreliable. | Low | Use industrial-grade SSD. |
| Ruggedized Hardware | MILBOX-ORNX or custom | **High** — Established product. Designed for Jetson + UAV. | Low | Standard procurement. |

## References

- YOLO26 docs: https://docs.ultralytics.com/models/yolo26/
- YOLOE docs: https://docs.ultralytics.com/models/yoloe/
- YOLO26 accuracy regression: https://github.com/ultralytics/ultralytics/issues/23206
- YOLO26 TRT INT8 crash: https://github.com/ultralytics/ultralytics/issues/23841
- YOLO26 TRT OOM fix: https://github.com/ultralytics/ultralytics/pull/23928
- vLLM Jetson freezes: https://github.com/dusty-nv/jetson-containers/issues/800
- vLLM Jetson install crash: https://github.com/vllm-project/vllm/issues/23376
- NanoLLM docs: https://dusty-nv.github.io/NanoLLM/
- NanoVLM: https://jetson-ai-lab.com/tutorial_nano-vlm.html
- MILBOX-ORNX: https://forecr.io/products/jetson-orin-nx-orin-nano-rugged-compact-pc-milbox-ornx
- Jetson SD card corruption: https://forums.developer.nvidia.com/t/corrupted-sd-cards/265418
- Jetson thermal throttling: https://www.alibaba.com/product-insights/how-to-run-private-llama-3-inference-on-a-200-jetson-orin-nano-without-thermal-throttling.html
- ViewPro EMI issues: https://www.viewprouav.com/help/gimbal/
- UART CRC reliability: https://link.springer.com/chapter/10.1007/978-981-19-8563-8_23
- Military edge AI thermal: https://www.mobilityengineeringtech.com/component/content/article/53967
- UAV-VL-R1: https://arxiv.org/pdf/2508.11196
- Qwen2-VL-2B-GPTQ-INT8: https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int8
- Ultralytics Jetson guide: https://docs.ultralytics.com/guides/nvidia-jetson
- Skelite: https://arxiv.org/html/2503.07369v1
- YOLO FP reduction: https://docs.ultralytics.com/yolov5/tutorials/tips_for_best_training_results