mirror of
https://github.com/azaion/detections-semantic.git
synced 2026-04-22 08:36:36 +00:00
# Solution Draft

## Assessment Findings

| Old Component Solution | Weak Point (functional/security/performance) | New Solution |
|------------------------|----------------------------------------------|--------------|
| YOLO26 as sole detection backbone | **Accuracy regression on custom datasets**: Reported YOLO26s "much less accurate" than YOLO11s on identical training data (GitHub #23206). YOLO26 is 3 months old — less battle-tested than YOLO11. | Benchmark YOLO26 vs YOLO11 on initial annotated data before committing. YOLO11 as fallback. YOLOE supports both backbones (yoloe-11s-seg, yoloe-26s-seg). |
| YOLO26 TensorRT INT8 export | **INT8 export fails on Jetson** (TRT Error Code 2, OOM). Fix merged (PR #23928) but indicates fragile tooling. | Use FP16 only for initial deployment (confirmed stable). INT8 as future optimization after tooling matures. Pin Ultralytics version + JetPack version. |
| vLLM as VLM runtime | **Unstable on Jetson Orin Nano**: system freezes, reboots, installation crashes, excessive memory (multiple open issues). Not production-ready for 8GB devices. | **Replace with NanoLLM/NanoVLM** — purpose-built for Jetson by NVIDIA's Dusty-NV team. Docker containers for JetPack 5/6. Supports VILA, LLaVA. Stable. Or use llama.cpp with GGUF models (proven on Jetson). |
| No storage strategy | **SD card corruption**: Recurring corruption documented across multiple Jetson Orin Nano users. SD cards unsuitable for production. | **Mandatory NVMe SSD** for OS + models + logging. No SD card in production. Ruggedized NVMe mount for vibration resistance. |
| No EMI protection on UART | **ViewPro documents EMI issues**: antennas cause random gimbal panning if within 35cm. Standard UART parity bit insufficient for noisy UAV environment. | Add CRC-16 checksum layer on gimbal commands. Enforce 35cm antenna separation in physical design. Consider shielded UART cable. Command retry on CRC failure (max 3 retries, then log error). |
| No environmental hardening addressed | **UAV environment**: vibration, temperature extremes (-20°C to +50°C), dust, EMI, power fluctuations. Dev kit form factor is not field-deployable. | Use ruggedized carrier board (MILBOX-ORNX or similar) with vibration dampening. Conformal coating on exposed connectors. External temperature sensor for environmental monitoring. |
| No logging or telemetry | **No post-flight review capability**: field system must log all detections with metadata for model iteration, operator review, and evidence collection. | Add detection logging: timestamp, GPS-denied coordinates, confidence score, detection class, JPEG thumbnail, tier that triggered, freshness metadata. Log to NVMe SSD. Export as structured format (JSON lines) after flight. |
| No frame recording for offline replay | **Training data collection depends on field recording**: Without recording, no way to build training dataset from real flights. | Record all camera frames to NVMe at configurable rate (1-5 FPS during Level 1, full rate during Level 2). Include detection overlay option. Post-flight: use recordings for annotation. |
| No power management | **UAV power budget is finite**: Jetson at 15W + gimbal + camera + radio. No monitoring of power draw or load shedding. | Monitor power consumption via Jetson's INA sensors. Power budget alert at 80% of allocated watts. Load shedding: disable VLM first, then reduce inference rate, then disable semantic detection. |
| YOLO26 not validated for this domain | **No benchmark on aerial concealment detection**: All YOLO26 numbers are on COCO/LVIS. Concealment detection may behave very differently. | First sprint deliverable: benchmark YOLOE-26 (both 11 and 26 backbones) on semantic01-04.png with text/visual prompts. Report AP on initial annotated validation set before committing to backbone. |
| Freshness and path tracing are untested algorithms | **No proven prior art**: Both freshness assessment and path-following via skeletonization are novel combinations. Risk of over-engineering before validation. | Implement minimal viable versions first. V1 path tracing: skeleton + endpoint only, no freshness, no junction following. Validate on real flight data before adding complexity. |

## Product Solution Description

A three-tier semantic detection system for identifying concealed/camouflaged positions from reconnaissance UAV aerial imagery, running on Jetson Orin Nano Super with NVMe SSD storage, active cooling, and ruggedized carrier board, alongside the existing YOLO detection pipeline.

```
┌────────────────────────────────────────────────────────────────────────────┐
│  JETSON ORIN NANO SUPER (ruggedized carrier, NVMe, 15W)                    │
│                                                                            │
│  ┌──────────┐    ┌──────────────┐    ┌──────────────┐    ┌───────────┐     │
│  │ ViewPro  │───▶│ Tier 1       │───▶│ Tier 2       │───▶│ Tier 3    │     │
│  │ A40      │    │ YOLOE        │    │ Path Trace   │    │ VLM       │     │
│  │ Camera   │    │ (11 or 26    │    │ + CNN        │    │ NanoLLM   │     │
│  │ + Frame  │    │  backbone)   │    │ ≤200ms       │    │ (L2 only) │     │
│  │ Quality  │    │ TRT FP16     │    │              │    │ ≤5s       │     │
│  │ Gate     │    │ ≤100ms       │    │              │    │           │     │
│  └────▲─────┘    └──────────────┘    └──────────────┘    └───────────┘     │
│       │                                                                    │
│  ┌────┴─────┐    ┌──────────────┐    ┌──────────────┐    ┌───────────┐     │
│  │ Gimbal   │◀───│ Scan         │    │ Watchdog     │    │ Recorder  │     │
│  │ Control  │    │ Controller   │    │ + Thermal    │    │ + Logger  │     │
│  │ + CRC    │    │ (L1/L2 FSM)  │    │ + Power      │    │ (NVMe)    │     │
│  └──────────┘    └──────────────┘    └──────────────┘    └───────────┘     │
│                                                                            │
│  ┌──────────────────────────────┐                                          │
│  │ Existing YOLO Detection      │  (always running, scene context)         │
│  │ Cython + TRT                 │                                          │
│  └──────────────────────────────┘                                          │
└────────────────────────────────────────────────────────────────────────────┘
```

Key changes from draft02:

- **YOLOE backbone is configurable** (YOLO11 or YOLO26) — benchmark before committing
- **NanoLLM replaces vLLM** as VLM runtime (purpose-built for Jetson, stable)
- **NVMe SSD mandatory** — no SD card in production
- **CRC-16 on gimbal UART** — EMI protection
- **Detection logger + frame recorder** — post-flight review and training data collection
- **Ruggedized carrier board** — vibration, temperature, dust protection
- **Power monitoring + load shedding** — finite UAV power budget
- **FP16 only** for initial deployment (INT8 export unstable on Jetson)
- **Minimal V1 for unproven components** — path tracing and freshness start simple

## Architecture

### Component 1: Tier 1 — Real-Time Detection

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|----------|-------|------------|-------------|--------------|----------|-------------|-----|
| **YOLOE with configurable backbone (recommended)** | yoloe-11s-seg.pt or yoloe-26s-seg.pt, set_classes() → TRT FP16 | Supports both YOLO11 and YOLO26 backbones: benchmark on real data, pick the winner. set_classes() bakes CLIP embeddings in for zero runtime overhead. | YOLO26 may regress on custom data vs YOLO11; needs empirical comparison. | Ultralytics ≥8.4 (pinned version), TensorRT, JetPack 6.2 | Local only | YOLO11s TRT FP16: ~7ms (640px). YOLO26s: similar or slightly faster. | **Best fit. Hedges against backbone risk.** |

**Version pinning strategy**:

- Pin `ultralytics==8.4.X` (a specific patch version validated on Jetson)
- Pin the JetPack 6.2 and TensorRT versions
- Test every Ultralytics update in staging before deploying to production
- Keep both yoloe-11s-seg and yoloe-26s-seg TRT engines on NVMe; switch via config
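The engine switch can live in the deployment YAML; a minimal sketch (key names and paths below are illustrative assumptions, not an existing schema):

```yaml
# Illustrative config fragment — keys and paths are placeholders
tier1:
  backbone: yolo26            # or yolo11; selects which TRT engine to load
  engines:
    yolo11: /nvme/models/yoloe-11s-seg-fp16.engine
    yolo26: /nvme/models/yoloe-26s-seg-fp16.engine
  conf_threshold: 0.15
```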

**YOLO backbone selection process (Sprint 1)**:

1. Annotate 200 frames from real flight footage (footpaths, branch piles, entrances)
2. Fine-tune YOLOE-11s-seg and YOLOE-26s-seg on the same dataset with the same hyperparameters
3. Evaluate on a held-out validation set (50 frames)
4. Pick the backbone with the higher mAP50
5. If the delta is < 2%: pick YOLO26 (faster CPU inference, NMS-free deployment)
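The tie-break rule in steps 4-5 is easy to get backwards in the field; a small helper pins it down (the function name is illustrative, and the mAP50 values come from whatever evaluation harness is used):

```python
def select_backbone(map50_yolo11: float, map50_yolo26: float,
                    tie_delta: float = 0.02) -> str:
    """Sprint 1 selection rule: the higher mAP50 wins, but YOLO26 is
    preferred when the gap is under the tie delta (it is NMS-free and
    faster on CPU)."""
    if abs(map50_yolo11 - map50_yolo26) < tie_delta:
        return "yolo26"  # near-tie: take the deployment-friendlier backbone
    return "yolo11" if map50_yolo11 > map50_yolo26 else "yolo26"
```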

### Component 2: Tier 2 — Spatial Reasoning & CNN Confirmation

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|----------|-------|------------|-------------|--------------|----------|-------------|-----|
| **V1 minimal path tracing + heuristic classifier (recommended for initial release)** | OpenCV, scikit-image | No training data needed. Skeleton + endpoint detection + simple heuristic: "dark mass at endpoint → flag." Fast to implement and validate. | Low accuracy. Many false positives. | OpenCV, scikit-image | Offline | ~30ms | **V1: ship fast, validate on real data.** |
| **V2 trained CNN (after data collection)** | MobileNetV3-Small, TensorRT FP16 | Higher accuracy after training. Dynamic ROI sizing. | Needs 300+ positive, 1000+ negative annotated ROI crops. | PyTorch, TRT export | Offline | ~5-10ms classification | **V2: replace heuristic once data exists.** |
**V1 heuristic for endpoint analysis** (no training data needed):

1. Skeletonize the footpath mask with branch pruning
2. Find endpoints
3. For each endpoint: extract an ROI (size chosen dynamically from the GSD)
4. Compute: `mean_darkness` = mean intensity in the central 50% of the ROI; `contrast` = (surrounding_mean - center_mean) / surrounding_mean; `area_ratio` = dark_pixel_count / total_pixels
5. If `mean_darkness` < threshold_dark AND `contrast` > threshold_contrast → flag as a potential concealed position
6. Thresholds are configurable and tuned per season; start with winter values
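Steps 3-5 amount to a few NumPy reductions. A sketch, assuming grayscale uint8 ROIs; the threshold values are placeholders to be tuned on winter footage per step 6:

```python
import numpy as np

def endpoint_heuristic(roi, threshold_dark=60.0, threshold_contrast=0.3):
    """Flag a potential concealed position at a path endpoint.
    roi: grayscale ROI (uint8) centered on the skeleton endpoint.
    Compares the central 50% of the ROI against the surrounding ring."""
    h, w = roi.shape
    cy, cx = h // 4, w // 4
    center = roi[cy:h - cy, cx:w - cx].astype(float)
    ring_mask = np.ones_like(roi, dtype=bool)
    ring_mask[cy:h - cy, cx:w - cx] = False
    surround = roi[ring_mask].astype(float)

    mean_darkness = center.mean()
    contrast = (surround.mean() - center.mean()) / max(surround.mean(), 1e-6)
    return bool(mean_darkness < threshold_dark and contrast > threshold_contrast)
```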

**V1 freshness** (metadata only, not a filter):

- contrast_ratio of the path vs the surrounding terrain
- Reported as "high contrast" (likely fresh) / "low contrast" (likely stale)
- No binary classification; the operator sees all detections with a freshness tag
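The tag can be derived from the same statistics; a sketch (the threshold is a placeholder, and the returned strings match the `freshness` field of the Component 5 detection log format):

```python
def freshness_tag(path_mean, surround_mean, contrast_threshold=0.25):
    """Metadata-only freshness tag: relative contrast of the path
    against the surrounding terrain. Threshold is tuned per season."""
    ratio = abs(surround_mean - path_mean) / max(surround_mean, 1e-6)
    return "high_contrast" if ratio > contrast_threshold else "low_contrast"
```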

### Component 3: Tier 3 — VLM Deep Analysis

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|----------|-------|------------|-------------|--------------|----------|-------------|-----|
| **NanoLLM with VILA-2.7B or VILA1.5-3B (recommended)** | NanoLLM Docker container, MLC/TVM quantization | Purpose-built for Jetson by NVIDIA team. Stable Docker containers. Optimized memory management. Supports VLMs natively. | Limited model selection (VILA, LLaVA, Obsidian). Not all VLMs available. | Docker, JetPack 6, NVMe for container storage | Local only, container isolation | ~15-25 tok/s on Orin Nano (4-bit MLC) | **Most stable Jetson VLM option.** |
| llama.cpp with GGUF VLM | llama.cpp, GGUF model files | Lightweight. No Docker needed. Proven stability on Jetson. Wide model support. | Manual build. Less optimized than NanoLLM for Jetson GPU. | llama.cpp build, GGUF weights | Local only | ~10-20 tok/s estimated | **Fallback if NanoLLM doesn't support needed model.** |
| ~~vLLM~~ | ~~vLLM Docker~~ | ~~High throughput~~ | **System freezes, reboots, installation crashes on Orin Nano. Multiple open bugs. Not production-ready.** | N/A | N/A | N/A | **Not recommended.** |

**Model selection for NanoLLM**:

- Primary: VILA1.5-3B (confirmed on Orin Nano, multimodal, 4-bit MLC)
- If UAV-VL-R1 GGUF weights become available: use via llama.cpp (aerial-specialized)
- Fallback: Obsidian-3B (mini VLM; lower accuracy but very fast)
### Component 4: Camera Gimbal Control

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|----------|-------|------------|-------------|--------------|----------|-------------|-----|
| **ViewLink serial driver + CRC-16 + PID + watchdog (recommended)** | pyserial, crcmod, PID library, threading | Robust communication. CRC catches EMI-corrupted commands. Retry logic. Watchdog. | ViewLink protocol must be implemented from the spec. Physical EMI mitigation still required. | ViewPro docs, UART, shielded cable, 35cm antenna separation | Physical only | <10ms per command; CRC overhead negligible | **Production-grade.** |

**UART reliability layer**:

```
Packet format: [SOF(2)] [CMD(N)] [CRC16(2)]
- SOF: 0xAA 0x55 (start of frame)
- CMD: ViewLink command bytes per the protocol spec
- CRC16: CRC-CCITT over the CMD bytes
```

- On send: compute the CRC-16 and append it to the ViewLink command packet
- On receive (gimbal feedback): validate the CRC-16; discard corrupted frames
- On CRC failure (send): retry up to 3 times with a 10ms delay; log the failure after 3 retries
- Note: check whether the ViewLink protocol already includes a checksum (read the full spec first). If it does, use the native checksum; don't add a redundant CRC.
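A pure-Python sketch of this framing (a bit-by-bit CRC-16/CCITT-FALSE is inlined for self-containment; a production build would more likely use `crcmod` as listed in the table above, and per the note the native ViewLink checksum may make this layer redundant):

```python
def crc16_ccitt(data: bytes, crc: int = 0xFFFF) -> int:
    """CRC-16/CCITT-FALSE: poly 0x1021, init 0xFFFF, no reflection."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

SOF = b"\xAA\x55"  # start of frame per the packet format above

def pack_frame(cmd: bytes) -> bytes:
    """[SOF(2)] [CMD(N)] [CRC16(2)], CRC over the CMD bytes only."""
    return SOF + cmd + crc16_ccitt(cmd).to_bytes(2, "big")

def validate_frame(frame: bytes) -> bool:
    """Check SOF and CRC on a received frame; corrupted frames are dropped."""
    if len(frame) < 4 or frame[:2] != SOF:
        return False
    cmd, rx_crc = frame[2:-2], int.from_bytes(frame[-2:], "big")
    return crc16_ccitt(cmd) == rx_crc
```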

**Physical EMI mitigation checklist**:

- [ ] Gimbal UART cable: shielded, shortest possible run
- [ ] Video/data transmitter antenna: ≥35cm from the gimbal (ViewPro recommendation)
- [ ] Independent power supply for the gimbal (or filtered from the main bus)
- [ ] Ferrite beads on the UART cable near the Jetson connector
### Component 5: Recording, Logging & Telemetry

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|----------|-------|------------|-------------|--------------|----------|-------------|-----|
| **NVMe-backed frame recorder + JSON-lines detection logger (recommended)** | OpenCV VideoWriter / JPEG sequences, JSON lines, NVMe SSD | Post-flight review. Training data collection. Evidence. Detection audit trail. | Storage consumption: ~2GB/min at 1080p 5FPS JPEG (NVMe write bandwidth of ~500 MB/s is not the bottleneck). | NVMe SSD ≥256GB | Physical access to NVMe | ~5ms per frame write (async) | **Essential for field deployment.** |

**Detection log format** (JSON lines, one object per detection):

```json
{
  "ts": "2026-03-19T14:32:01.234Z",
  "frame_id": 12345,
  "gps_denied_lat": 48.123456,
  "gps_denied_lon": 37.654321,
  "tier": 1,
  "class": "footpath",
  "confidence": 0.72,
  "bbox": [0.12, 0.34, 0.45, 0.67],
  "freshness": "high_contrast",
  "tier2_result": "concealed_position",
  "tier2_confidence": 0.85,
  "tier3_used": false,
  "thumbnail_path": "frames/12345_det_0.jpg"
}
```
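A minimal writer for this format (a synchronous sketch; the flight build would write from a background thread so logging never blocks the inference loop):

```python
import json
from pathlib import Path

class DetectionLogger:
    """Append-only JSON-lines detection log on the NVMe SSD."""

    def __init__(self, log_path):
        self.path = Path(log_path)

    def log(self, record: dict) -> None:
        # One JSON object per line, flushed so a power loss after a
        # detection loses at most the OS cache, not the whole flight.
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
```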

**Frame recording strategy**:

- Level 1: record every 5th frame (1-2 FPS) — overview coverage
- Level 2: record every frame (30 FPS) — detailed analysis footage
- Storage budget: 256GB NVMe ≈ 2 hours at Level 2 full rate, or 10+ hours at Level 1 rate
- Circular buffer: when storage is >80% full, overwrite the oldest Level 1 frames (keep Level 2)
### Component 6: System Health & Resilience

**Monitoring threads**:

| Monitor | Check Interval | Threshold | Action |
|---------|----------------|-----------|--------|
| Thermal (T_junction) | 1s | >75°C | Degrade to Level 1 only |
| Thermal (T_junction) | 1s | >80°C | Disable semantic detection |
| Power (Jetson INA) | 2s | >80% budget | Disable VLM |
| Power (Jetson INA) | 2s | >90% budget | Reduce inference rate to 5 FPS |
| Gimbal heartbeat | 2s | No response | Force Level 1 sweep pattern |
| Semantic process | 5s | No heartbeat | Restart with 5s backoff, max 3 attempts |
| VLM process | 5s | No heartbeat | Mark Tier 3 unavailable, continue Tier 1+2 |
| NVMe free space | 60s | <20% free | Switch to Level 1 recording rate only |
| Frame quality | per frame | Laplacian variance < threshold | Skip frame, use buffered good frame |
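The monitor thresholds and the graceful-degradation table reduce to one decision function; a sketch (the source leaves the 70-75°C band unspecified between Level 0 and Level 1, so this sketch treats it as Level 0):

```python
def degradation_level(t_junction_c, power_pct, vlm_alive,
                      semantic_crashes, gimbal_failures):
    """Map monitor readings to a degradation level (0-3).
    Conditions mirror the graceful degradation table; most severe wins."""
    if gimbal_failures >= 3:
        return 3  # no gimbal: existing YOLO only, fixed camera
    if semantic_crashes >= 3 or t_junction_c > 80:
        return 2  # no semantic: existing YOLO only, Level 1 sweep
    if (not vlm_alive) or t_junction_c > 75 or power_pct > 80:
        return 1  # no VLM: Tier 1+2 continue
    return 0      # full capability
```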

**Graceful degradation** (4 levels, unchanged from draft02):

| Level | Condition | Capability |
|-------|-----------|------------|
| 0 — Full | All nominal, T < 70°C | Tier 1+2+3, Level 1+2, gimbal, recording |
| 1 — No VLM | VLM unavailable or T > 75°C or power > 80% | Tier 1+2, Level 1+2, gimbal, recording |
| 2 — No semantic | Semantic crashed 3x or T > 80°C | Existing YOLO only, Level 1 sweep, recording |
| 3 — No gimbal | Gimbal UART failed 3x | Existing YOLO only, fixed camera, recording |
### Component 7: Integration & Deployment

**Hardware BOM additions** (beyond the existing system):

| Item | Purpose | Estimated Cost |
|------|---------|----------------|
| NVMe SSD ≥256GB (industrial grade) | OS + models + recording + logging | $40-80 |
| Active cooling fan (30mm+) | Prevent thermal throttling | $10-20 |
| Ruggedized carrier board (e.g., MILBOX-ORNX or custom) | Vibration, temperature, dust protection | $200-500 |
| Shielded UART cable + ferrite beads | EMI protection for gimbal communication | $10-20 |
| **Total additional hardware** | | **~$260-620** |

**Software deployment**:

- OS: JetPack 6.2 on NVMe SSD
- YOLOE models: TRT FP16 engines on NVMe (both 11 and 26 backbone variants)
- VLM: NanoLLM Docker container on NVMe
- Existing YOLO: current Cython + TRT pipeline (unchanged)
- New Cython modules: semantic detection, gimbal control, scan controller, recorder
- VLM process: separate Docker container, IPC via Unix socket
- Config: YAML file for all thresholds, class names, scan parameters, degradation thresholds

**Version control & update strategy**:

- Pin all dependency versions (Ultralytics, TensorRT, NanoLLM, OpenCV)
- Model updates: swap TRT engine files on NVMe, restart the service
- Config updates: edit the YAML, restart the service
- No over-the-air updates (air-gapped system); USB drive for field updates
## Training & Data Strategy

### Phase 0: Benchmark Sprint (Weeks 1-2)

- Deploy YOLOE-26s-seg and YOLOE-11s-seg in open-vocabulary mode
- Test text/visual prompts on semantic01-04.png + 50 additional frames
- Record results; pick the backbone with the better qualitative detection
- Deploy the V1 heuristic endpoint analysis (no CNN, no training data needed)
- First field test flight with recording enabled

### Phase 1: Field Validation & Data Collection (Weeks 2-6)

- Deploy the TRT FP16 engine with the best backbone
- Record all flights to NVMe
- Operator marks detections as true/false positives in post-flight review
- Build the annotation backlog from recorded frames
- Target: 500 annotated frames by week 6

### Phase 2: Custom Model Training (Weeks 6-10)

- Fine-tune YOLOE-Seg on the custom dataset (linear probing → full fine-tune)
- Train the MobileNetV3-Small CNN on endpoint ROI crops
- A/B test: custom model vs YOLOE zero-shot on the validation set
- Deploy the winning model as the new TRT engine

### Phase 3: VLM & Refinement (Weeks 8-14)

- Deploy NanoLLM with VILA1.5-3B
- Tune prompting on collected ambiguous cases
- Train the freshness classifier (if enough annotated freshness labels exist)
- Target: 1500+ images per class

### Phase 4: Seasonal Expansion (Month 4+)

- Spring/summer annotation campaigns
- Re-train all models with multi-season data
- Adjust heuristic thresholds per season (configurable via YAML)
## Testing Strategy

### Integration / Functional Tests

- YOLOE text prompt detection on reference images (both 11 and 26 backbones)
- TRT FP16 export on Jetson Orin Nano Super (verify no OOM, no crash)
- V1 heuristic endpoint analysis on 20 synthetic masks (10 with hideouts, 10 without)
- Frame quality gate: inject blurry frames, verify rejection
- Gimbal CRC layer: inject corrupted commands, verify retry + log
- Gimbal watchdog: simulate a hang, verify forced Level 1 within 2.5s
- NanoLLM VLM: load the model, run inference on 10 aerial images, verify output + memory
- VLM load/unload cycle: 10 cycles without a memory leak
- Detection logger: verify JSON-lines format, all fields populated
- Frame recorder: verify NVMe write speed, no dropped frames at 30 FPS
- Full pipeline end-to-end on recorded flight footage (offline replay)
- Graceful degradation: simulate each failure mode, verify the correct degradation level

### Non-Functional Tests

- Tier 1 latency on Jetson Orin Nano Super TRT FP16: ≤100ms (both backbones)
- Tier 2 latency: ≤50ms (V1 heuristic), ≤200ms (V2 CNN)
- Tier 3 latency (NanoLLM VLM): ≤5 seconds
- Memory peak: all components loaded < 7GB
- Thermal: 60-minute sustained inference, T_junction < 75°C with active cooling
- NVMe endurance: continuous recording for 2 hours, verify no write errors
- Power draw: measure at each degradation level, verify within the UAV power budget
- EMI test: operate near the data transmitter antenna, verify no gimbal anomalies with the CRC layer
- Cold start: power on → first detection within 60 seconds (model load time)
- Vibration: mount the Jetson on a vibration table, run inference, compare detection accuracy vs static
## Technology Maturity Assessment

| Component | Technology | Maturity | Risk | Mitigation |
|-----------|-----------|----------|------|------------|
| Tier 1 Detection | YOLOE/YOLO26/YOLO11 | **Medium** — YOLO26 is 3 months old, reported regressions on custom data. YOLOE-26 even newer. | Medium | Benchmark both backbones. YOLO11 is battle-tested fallback. Pin versions. |
| TRT FP16 Export | TensorRT on JetPack 6.2 | **High** — FP16 is stable on Jetson. Well-documented. | Low | FP16 only. Avoid INT8 initially. |
| TRT INT8 Export | TensorRT on JetPack 6.2 | **Low** — Documented crashes (PR #23928). Calibration issues. | High | Defer to Phase 3+. FP16 sufficient for now. |
| VLM (NanoLLM) | NanoLLM + VILA-3B | **Medium-High** — Purpose-built for Jetson by NVIDIA team. Docker-based. Monthly releases. | Low | More stable than vLLM. Use Docker containers. |
| VLM (vLLM) | vLLM on Jetson | **Low** — System freezes, crashes, open bugs. | **High** | **Do not use.** NanoLLM instead. |
| Path Tracing | Skeletonization + OpenCV | **High** — Decades-old algorithms. Well-understood. | Low | Pruning needed for noisy inputs. |
| CNN Classifier | MobileNetV3-Small + TRT | **High** — Proven architecture. TRT FP16 stable. | Low | Standard transfer learning. |
| Gimbal Control | ViewLink Serial Protocol | **Medium** — Protocol documented. ArduPilot driver exists. | Medium | EMI mitigation critical. CRC layer. |
| Freshness Assessment | Novel heuristic | **Low** — No prior art. Experimental. | High | V1: metadata only, not a filter. Iterate with data. |
| NVMe Storage | Industrial NVMe on Jetson | **High** — Production standard. SD card alternative is unreliable. | Low | Use industrial-grade SSD. |
| Ruggedized Hardware | MILBOX-ORNX or custom | **High** — Established product. Designed for Jetson + UAV. | Low | Standard procurement. |
## References

- YOLO26 docs: https://docs.ultralytics.com/models/yolo26/
- YOLOE docs: https://docs.ultralytics.com/models/yoloe/
- YOLO26 accuracy regression: https://github.com/ultralytics/ultralytics/issues/23206
- YOLO26 TRT INT8 crash: https://github.com/ultralytics/ultralytics/issues/23841
- YOLO26 TRT OOM fix: https://github.com/ultralytics/ultralytics/pull/23928
- vLLM Jetson freezes: https://github.com/dusty-nv/jetson-containers/issues/800
- vLLM Jetson install crash: https://github.com/vllm-project/vllm/issues/23376
- NanoLLM docs: https://dusty-nv.github.io/NanoLLM/
- NanoVLM: https://jetson-ai-lab.com/tutorial_nano-vlm.html
- MILBOX-ORNX: https://forecr.io/products/jetson-orin-nx-orin-nano-rugged-compact-pc-milbox-ornx
- Jetson SD card corruption: https://forums.developer.nvidia.com/t/corrupted-sd-cards/265418
- Jetson thermal throttling: https://www.alibaba.com/product-insights/how-to-run-private-llama-3-inference-on-a-200-jetson-orin-nano-without-thermal-throttling.html
- ViewPro EMI issues: https://www.viewprouav.com/help/gimbal/
- UART CRC reliability: https://link.springer.com/chapter/10.1007/978-981-19-8563-8_23
- Military edge AI thermal: https://www.mobilityengineeringtech.com/component/content/article/53967
- UAV-VL-R1: https://arxiv.org/pdf/2508.11196
- Qwen2-VL-2B-GPTQ-INT8: https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int8
- Ultralytics Jetson guide: https://docs.ultralytics.com/guides/nvidia-jetson
- Skelite: https://arxiv.org/html/2503.07369v1
- YOLO FP reduction: https://docs.ultralytics.com/yolov5/tutorials/tips_for_best_training_results
# Solution Draft (draft02)

## Product Solution Description

A three-tier semantic detection system for identifying concealed/camouflaged positions from reconnaissance UAV aerial imagery, running on Jetson Orin Nano Super alongside the existing YOLO detection pipeline.

```
┌─────────────────────────────────────────────────────────────────────────┐
│  JETSON ORIN NANO SUPER                                                 │
│                                                                         │
│  ┌──────────┐    ┌──────────────┐    ┌──────────────┐    ┌──────────┐   │
│  │ ViewPro  │───▶│ Tier 1       │───▶│ Tier 2       │───▶│ Tier 3   │   │
│  │ A40      │    │ YOLO26-Seg   │    │ Spatial      │    │ VLM      │   │
│  │ Camera   │    │ + YOLOE-26   │    │ Reasoning    │    │ UAV-VL   │   │
│  │          │    │ ≤100ms       │    │ + CNN        │    │ -R1      │   │
│  │          │    │              │    │ ≤200ms       │    │ ≤5s      │   │
│  └────▲─────┘    └──────────────┘    └──────────────┘    └──────────┘   │
│       │                                                                 │
│  ┌────┴─────┐    ┌──────────────┐                                       │
│  │ Gimbal   │◀───│ Scan         │                                       │
│  │ Control  │    │ Controller   │                                       │
│  │ Module   │    │ (L1/L2)      │                                       │
│  └──────────┘    └──────────────┘                                       │
│                                                                         │
│  ┌──────────────────────────────┐                                       │
│  │ Existing YOLO Detection      │  (separate service, provides context) │
│  │ Cython + TRT                 │                                       │
│  └──────────────────────────────┘                                       │
└─────────────────────────────────────────────────────────────────────────┘
```

The system operates in two scan levels:

- **Level 1 (Wide Sweep)**: Camera at medium zoom, swinging left-right. YOLOE-26 text/visual prompts detect POIs in real time. The existing YOLO provides scene context.
- **Level 2 (Detailed Scan)**: Camera zooms into a POI. Spatial reasoning traces footpaths and finds endpoints. A CNN classifies potential hideouts. An optional VLM provides deep analysis for ambiguous cases.
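The Level 1/Level 2 switching can be sketched as a tiny state machine (method names below are illustrative, not an existing API):

```python
class ScanController:
    """Minimal L1/L2 scan state machine: wide sweep until Tier 1 finds
    a POI, detailed scan until it completes or the watchdog intervenes."""

    LEVEL1_WIDE_SWEEP = 1
    LEVEL2_DETAILED_SCAN = 2

    def __init__(self):
        self.level = self.LEVEL1_WIDE_SWEEP  # start sweeping

    def on_poi_detected(self):
        # Tier 1 flagged a point of interest: zoom in
        if self.level == self.LEVEL1_WIDE_SWEEP:
            self.level = self.LEVEL2_DETAILED_SCAN

    def on_scan_complete(self):
        # Detailed scan finished (or timed out): resume the sweep
        self.level = self.LEVEL1_WIDE_SWEEP

    def on_gimbal_lost(self):
        # Watchdog path: force the Level 1 sweep pattern
        self.level = self.LEVEL1_WIDE_SWEEP
```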
Three submodules: (1) Semantic Detection AI, (2) Camera Gimbal Control, (3) Integration with the existing detections service.

## Existing/Competitor Solutions Analysis

No direct commercial or open-source competitor exists for this specific combination of requirements (concealed position detection from a UAV with edge inference). Related work:

| Solution | Approach | Limitations for This Use Case |
|----------|----------|-------------------------------|
| Standard YOLO object detection | Bounding-box classification of known object types | Cannot detect camouflaged/concealed targets without explicit visual features |
| CAMOUFLAGE-Net (YOLOv7-based) | Attention mechanisms + ELAN for camouflage detection | Designed for ground-level imagery, not aerial; academic datasets only |
| Open-Vocabulary Camouflaged Object Segmentation | VLM + SAM cascaded segmentation | Too slow for real-time edge inference; requires a cloud GPU |
| UAV-YOLO12 | Multi-scale road segmentation from UAV imagery | Roads only, no concealment reasoning |
| FootpathSeg (DINO-MC + UNet) | Footpath segmentation with self-supervised learning | Pedestrian context, not military aerial; no path-following logic |
| YOLO-World / YOLOE | Open-vocabulary detection | Closest fit — YOLOE-26 is our primary Tier 1 mechanism |

**Key insight**: No existing solution combines footpath detection + path tracing + endpoint analysis + concealment classification in a single pipeline. This requires a custom multi-stage system.
## Architecture

### Component 1: Tier 1 — Real-Time Detection (YOLOE-26 + YOLO26-Seg)

| Solution | Tools | Advantages | Limitations | Requirements | Security | Cost | Fit |
|----------|-------|-----------|-------------|--------------|----------|------|-----|
| **YOLOE-26 text/visual prompts (recommended)** | yoloe-26s-seg.pt, Ultralytics, TensorRT | Zero-shot detection from day 1. Text prompts: "footpath", "branch pile", "dark entrance". Visual prompts: reference images of hideouts. Zero overhead when re-parameterized. NMS-free. | Open-vocabulary accuracy lower than a custom-trained model. Text prompts may not capture all concealment patterns. | Ultralytics ≥8.4, TensorRT, JetPack 6.2 | Model weights stored locally, no cloud | Free (open source) | **Best fit for bootstrapping. Immediate capability.** |
| YOLO26-Seg custom-trained | yolo26s-seg.pt fine-tuned on custom dataset | Higher accuracy for known classes after training. Instance segmentation masks for footpaths. | Requires annotated training data (1500+ images/class). No zero-shot capability. | Custom dataset, GPU for training | Same | Free (open source) + annotation labor | **Best fit for production after data collection.** |
| UNet + MambaOut | PyTorch, TensorRT | Best published accuracy for trail segmentation from aerial photos. | Separate model, additional memory. No built-in detection head. | Custom integration | Same | Free (open source) | Backup option if YOLO26-Seg underperforms on trails |

**Recommended approach**: Start with YOLOE-26 text/visual prompts. A parallel annotation effort builds the custom dataset. Transition to fine-tuned YOLO26-Seg once data is sufficient. YOLOE-26's zero-shot capability provides immediate usability.
**YOLOE-26 configuration for this project**:

Text prompts for Level 1 detection:

- `"footpath"`, `"trail"`, `"path in snow"`, `"road"`, `"track"`
- `"pile of branches"`, `"tree branches"`, `"camouflage netting"`
- `"dark entrance"`, `"hole"`, `"dugout"`, `"dark opening"`
- `"tree row"`, `"tree line"`, `"group of trees"`
- `"clearing"`, `"open area near forest"`

Visual prompts: Annotated reference images (semantic01-04.png) as visual prompt sources. Crop ROIs around known hideouts and use them as references for SAVPE visual-prompted detection.
API usage:

```python
from ultralytics import YOLOE

model = YOLOE("yoloe-26s-seg.pt")
# Bake the prompt classes into the model via their text embeddings
names = ["footpath", "branch pile", "dark entrance", "tree row", "road"]
model.set_classes(names, model.get_text_pe(names))
results = model.predict(frame, conf=0.15)  # low threshold, high recall
```

Visual prompt usage:

```python
import numpy as np

# x1, y1, x2, y2: pixel coordinates of a known hideout in the
# reference image (example values)
x1, y1, x2, y2 = 120, 80, 260, 210
results = model.predict(
    frame,
    refer_image="reference_hideout.jpg",
    visual_prompts={"bboxes": np.array([[x1, y1, x2, y2]]), "cls": np.array([0])},
)
```
### Component 2: Tier 2 — Spatial Reasoning & CNN Confirmation

| Solution | Tools | Advantages | Limitations | Requirements | Security | Cost | Fit |
|----------|-------|-----------|-------------|--------------|----------|------|-----|
| **Path tracing + CNN classifier (recommended)** | OpenCV skeletonization, MobileNetV3-Small TensorRT | Fast (<200ms total). Path tracing from segmentation masks. Binary classifier: "concealed position yes/no" on ROI crop. Well-understood algorithms. | Requires custom annotated data for the CNN classifier. Path tracing quality depends on segmentation quality. | OpenCV, scikit-image, PyTorch → TRT | Offline inference | Free + annotation labor | **Best fit. Modular, fast, interpretable.** |
| Heuristic rules only | OpenCV, NumPy | No training data needed. Rule-based: "if footpath ends at dark mass → flag." | Brittle. Hard to tune. Cannot generalize across seasons/terrain. | None | Offline | Free | Baseline/fallback for initial version |
| End-to-end custom model | PyTorch, TensorRT | Single model handles everything. | Requires massive training data. Black box. Hard to debug. | Large annotated dataset | Offline | Free + GPU time | Not recommended for initial release |
**Path tracing algorithm**:

1. Take footpath segmentation mask from Tier 1
2. Skeletonize using Zhang-Suen algorithm (`skimage.morphology.skeletonize`)
3. Detect endpoints using hit-miss morphological operations (8 kernel patterns)
4. Detect junctions using branch-point kernels
5. Trace path segments between junctions/endpoints
6. For each endpoint: extract 128×128 ROI crop centered on endpoint from original image
7. Feed ROI crop to MobileNetV3-Small binary classifier: "concealed structure" vs "natural terminus"
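Steps 3-4 use hit-miss kernels in production; the same endpoint/junction classification can be sketched with a plain 8-neighbour count on the binary skeleton. This is a NumPy-only illustration, not the OpenCV/scikit-image code path named above:

```python
import numpy as np

def classify_skeleton_pixels(skel: np.ndarray):
    """Classify skeleton pixels by 8-neighbour count:
    exactly 1 neighbour -> endpoint, 3 or more -> junction."""
    s = (skel > 0).astype(np.uint8)
    p = np.pad(s, 1)
    # Sum of the 8 neighbours for every pixel (vectorised shift-and-add).
    nbrs = sum(
        p[1 + dy : 1 + dy + s.shape[0], 1 + dx : 1 + dx + s.shape[1]]
        for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)
    )
    endpoints = np.argwhere((s == 1) & (nbrs == 1))
    junctions = np.argwhere((s == 1) & (nbrs >= 3))
    return endpoints, junctions

# A tiny T-shaped skeleton: three endpoints meeting at one junction area.
skel = np.zeros((7, 7), dtype=np.uint8)
skel[3, 1:6] = 1      # horizontal bar
skel[4:6, 3] = 1      # vertical stem
eps, juncs = classify_skeleton_pixels(skel)
```

Note that with 8-connectivity, pixels adjacent to a branch point can also exceed the junction threshold; the production hit-miss kernels are stricter about which patterns count.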

**Freshness assessment approach**:

Visual features for fresh vs stale classification (binary classifier on path ROI):
- Edge sharpness (Laplacian variance on path boundary)
- Contrast ratio (path intensity vs surrounding terrain)
- Fill ratio (percentage of path area with snow/vegetation coverage)
- Path width consistency (fresh paths have more uniform width)

Implementation: Extract these features per path segment and feed them to a lightweight classifier (Random Forest or small CNN). The initial version can use hand-tuned thresholds, then train with annotated data.
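As an illustration of the hand-tuned starting point, a minimal NumPy sketch of three of the features. The function name is hypothetical, and the fill ratio here is a simplified brightness proxy rather than the snow/vegetation-coverage measure described above:

```python
import numpy as np

def freshness_features(img: np.ndarray, path_mask: np.ndarray) -> dict:
    """Hand-tuned freshness cues for one path ROI.
    img: grayscale float image; path_mask: boolean mask of the path."""
    img = img.astype(np.float64)
    # Edge sharpness: variance of a 4-neighbour Laplacian over the ROI interior.
    lap = (np.roll(img, 1, 0) + np.roll(img, -1, 0)
           + np.roll(img, 1, 1) + np.roll(img, -1, 1) - 4 * img)
    edge_sharpness = float(lap[1:-1, 1:-1].var())
    # Contrast ratio: mean path intensity vs mean surrounding intensity.
    path_mean = img[path_mask].mean()
    bg_mean = img[~path_mask].mean()
    contrast_ratio = float(path_mean / (bg_mean + 1e-9))
    # Simplified fill-ratio proxy: fraction of path pixels brighter than the
    # background mean (stand-in for measuring snow/vegetation creep).
    fill_ratio = float((img[path_mask] > bg_mean).mean())
    return {"edge_sharpness": edge_sharpness,
            "contrast_ratio": contrast_ratio,
            "fill_ratio": fill_ratio}

# Synthetic ROI: dark background (0.2) with a bright path stripe (0.9).
img = np.full((32, 32), 0.2)
mask = np.zeros((32, 32), dtype=bool)
mask[:, 14:18] = True
img[mask] = 0.9
feats = freshness_features(img, mask)
```

These scalars feed directly into the Random Forest once annotated data exists; path width consistency needs the skeleton and is omitted here.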

### Component 3: Tier 3 — VLM Deep Analysis (Optional)

| Solution | Tools | Advantages | Limitations | Requirements | Security | Cost | Fit |
|----------|-------|-----------|-------------|-------------|----------|------|-----|
| **UAV-VL-R1 (recommended)** | Qwen2-VL-2B fine-tuned, vLLM, INT8 quantization | Purpose-built for aerial reasoning. 48% better than generic Qwen2-VL-2B on UAV tasks. 2.5GB INT8. Open source. | 3-5s per analysis. Competes for GPU memory with YOLO (sequential scheduling). | vLLM or TRT-LLM, GPTQ-INT8 weights | Local inference only | Free (open source) | **Best fit for aerial VLM tasks.** |
| SmolVLM2-500M | HuggingFace Transformers, ONNX | Smallest memory (1.8GB). Fastest inference (~1-2s estimated). | Weakest reasoning. May lack nuance for concealment analysis. | ONNX Runtime or TRT | Local only | Free | Fallback if memory is tight |
| Moondream 2B | moondream API, PyTorch | Built-in detect()/point() APIs. Strong grounded detection (refcoco 91.1). | Not aerial-specialized. Same size class as UAV-VL-R1 but less relevant. | PyTorch or ONNX | Local only | Free | Alternative if UAV-VL-R1 underperforms |
| No VLM | N/A | Simpler system. Less memory. No latency for Tier 3. | No zero-shot capability for novel patterns. No operator explanations. | None | N/A | Free | Viable for production if Tier 1+2 accuracy is sufficient |

**VLM prompting strategy for concealment analysis**:

```
Analyze this aerial UAV image crop. A footpath was detected leading to this area.
Is there a concealed military position visible? Look for:
- Dark entrances or openings
- Piles of cut tree branches used as camouflage
- Dugout structures
- Signs of recent human activity

Answer: YES or NO, then one sentence explaining why.
```

**Integration**: The VLM runs as a separate Python process and communicates with the main Cython pipeline via a Unix domain socket or shared memory. It is triggered only when Tier 2 CNN confidence falls between 30% and 70% (ambiguous cases).
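A minimal sketch of the IPC framing such a split could use: length-prefixed JSON over a Unix domain socket. The message fields and helper names are illustrative assumptions, and a `socketpair` stands in for the real socket between the two processes:

```python
import json
import socket
import struct

def send_msg(sock: socket.socket, obj: dict) -> None:
    """Length-prefixed JSON framing: 4-byte big-endian length, then payload."""
    payload = json.dumps(obj).encode()
    sock.sendall(struct.pack(">I", len(payload)) + payload)

def _recv_exact(sock: socket.socket, n: int) -> bytes:
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed")
        buf += chunk
    return buf

def recv_msg(sock: socket.socket) -> dict:
    (length,) = struct.unpack(">I", _recv_exact(sock, 4))
    return json.loads(_recv_exact(sock, length))

# Demo: a socketpair standing in for the pipeline <-> VLM socket.
pipeline_end, vlm_end = socket.socketpair()
send_msg(pipeline_end, {"roi": "crop_0001.jpg", "cnn_conf": 0.52})
req = recv_msg(vlm_end)
send_msg(vlm_end, {"verdict": "NO", "reason": "natural terminus"})
resp = recv_msg(pipeline_end)
```

Explicit length prefixes avoid partial-read bugs on the stream socket; shared memory would only be needed if full frames, rather than ROI file paths, crossed the boundary.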

### Component 4: Camera Gimbal Control

| Solution | Tools | Advantages | Limitations | Requirements | Security | Cost | Fit |
|----------|-------|-----------|-------------|-------------|----------|------|-----|
| **Custom ViewLink serial driver + PID controller (recommended)** | Python serial, ViewLink Protocol V3.3.3, PID library | Direct hardware control. Closed-loop tracking with detection feedback. Low latency (<10ms serial command). | Must implement ViewLink protocol from spec. PID tuning needed. | ViewPro A40 documentation, pyserial | Physical hardware access only | Free + hardware | **Best fit. Direct control, no middleware.** |
| ArduPilot integration via MAVLink | ArduPilot, MAVLink, pymavlink | Battle-tested gimbal driver. Well-documented. | Requires ArduPilot flight controller. Additional latency through FC. May conflict with mission planner. | Pixhawk or similar FC running ArduPilot 4.5+ | MAVLink protocol | Pixhawk hardware ($50-200) | Alternative if ArduPilot is already used for flight control |

**Scan controller state machine**:

```
┌─────────────────┐
│    LEVEL 1      │
│   Wide Sweep    │◀──────────────────────────┐
│  Medium Zoom    │                           │
└────────┬────────┘                           │
         │ POI detected                       │ Analysis complete
         ▼                                    │ or timeout (5s)
┌─────────────────┐                           │
│     ZOOM        │                           │
│  TRANSITION     │                           │
│  1-2 seconds    │                           │
└────────┬────────┘                           │
         │ Zoom complete                      │
         ▼                                    │
┌─────────────────┐     ┌──────────────┐      │
│    LEVEL 2      │────▶│    PATH      │──────┤
│ Detailed Scan   │     │  FOLLOWING   │      │
│   High Zoom     │     │  Pan along   │      │
└────────┬────────┘     │  detected    │      │
         │              │  path        │      │
         │              └──────────────┘      │
         │ Endpoint found                     │
         ▼                                    │
┌─────────────────┐                           │
│    ENDPOINT     │                           │
│    ANALYSIS     │───────────────────────────┘
│   Hold + VLM    │
└─────────────────┘
```
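The diagram above can be restated as a transition skeleton. This is an illustrative sketch: only the 5 s analysis timeout comes from the diagram, and the event and method names are assumptions:

```python
import enum
import time

class ScanState(enum.Enum):
    LEVEL1_SWEEP = "level1_sweep"
    ZOOM_TRANSITION = "zoom_transition"
    LEVEL2_SCAN = "level2_scan"
    PATH_FOLLOWING = "path_following"
    ENDPOINT_ANALYSIS = "endpoint_analysis"

class ScanController:
    ANALYSIS_TIMEOUT_S = 5.0          # "Analysis complete or timeout (5s)"

    def __init__(self):
        self.state = ScanState.LEVEL1_SWEEP
        self._entered = time.monotonic()

    def _goto(self, state: ScanState) -> None:
        self.state = state
        self._entered = time.monotonic()

    def on_poi_detected(self):
        if self.state is ScanState.LEVEL1_SWEEP:
            self._goto(ScanState.ZOOM_TRANSITION)

    def on_zoom_complete(self):
        if self.state is ScanState.ZOOM_TRANSITION:
            self._goto(ScanState.LEVEL2_SCAN)

    def on_path_detected(self):
        if self.state is ScanState.LEVEL2_SCAN:
            self._goto(ScanState.PATH_FOLLOWING)

    def on_endpoint_found(self):
        if self.state is ScanState.PATH_FOLLOWING:
            self._goto(ScanState.ENDPOINT_ANALYSIS)

    def tick(self):
        # Analysis complete or timeout returns the camera to the wide sweep.
        if (self.state is ScanState.ENDPOINT_ANALYSIS
                and time.monotonic() - self._entered > self.ANALYSIS_TIMEOUT_S):
            self._goto(ScanState.LEVEL1_SWEEP)

sm = ScanController()
sm.on_poi_detected()
sm.on_zoom_complete()
sm.on_path_detected()
sm.on_endpoint_found()
```

Guarding every transition on the current state keeps stray detector events (e.g. a POI seen mid-zoom) from corrupting the scan cycle.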

**Level 1 sweep pattern**: Sinusoidal yaw oscillation centered on the flight heading. Amplitude: ±30° (configurable). Period: matched to ground speed so adjacent sweeps overlap by 20%. Pitch: slightly downward (configurable based on altitude and desired ground coverage).
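A sketch of how the period could be derived from ground speed and overlap. The `swath_width` parameter and the period formula are assumptions, since the text only fixes the ±30° amplitude and the 20% overlap:

```python
import math

def sweep_yaw(t: float, ground_speed: float, swath_width: float,
              amplitude_deg: float = 30.0, overlap: float = 0.2) -> float:
    """Yaw offset (deg) from the flight heading at time t.
    The period is chosen so each full oscillation advances the UAV by
    (1 - overlap) of the swath width, giving 20% overlap between sweeps."""
    period = swath_width * (1.0 - overlap) / max(ground_speed, 1e-6)
    return amplitude_deg * math.sin(2.0 * math.pi * t / period)
```

For example, at 10 m/s over a 40 m swath the period is 3.2 s; slower flight stretches the period automatically, keeping the overlap constant.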

**Path-following control loop**:
1. Tier 1 outputs footpath segmentation mask
2. Extract path centerline direction (from skeleton)
3. Compute error: path center vs frame center
4. PID controller adjusts gimbal yaw/pitch to minimize error
5. Update rate: tied to detection frame rate (10-30 FPS)
6. When path endpoint reached or path leaves frame: stop following, analyze endpoint
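Steps 3-4 reduce to a standard PID update. A minimal sketch with a toy gimbal model follows; the gains and the plant model are illustrative, and the references list servopilot as a maintained alternative:

```python
class PID:
    """Minimal PID for the gimbal loop (illustrative gains, output
    clamped to the gimbal's slew-rate limit)."""
    def __init__(self, kp: float, ki: float, kd: float, out_limit: float = 30.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.out_limit = out_limit
        self._integral = 0.0
        self._prev_err = None

    def update(self, error: float, dt: float) -> float:
        self._integral += error * dt
        deriv = 0.0 if self._prev_err is None else (error - self._prev_err) / dt
        self._prev_err = error
        out = self.kp * error + self.ki * self._integral + self.kd * deriv
        return max(-self.out_limit, min(self.out_limit, out))

# Drive the "path centre vs frame centre" error to zero on a toy plant
# where the commanded rate simply integrates into gimbal position.
pid = PID(kp=0.8, ki=0.1, kd=0.05)
pos = 0.4                                # path centre 40% right of frame centre
for _ in range(400):                     # 400 steps at dt = 0.05 s (20 FPS)
    rate = pid.update(-pos, dt=0.05)     # error = setpoint(0) - pos
    pos += rate * 0.05
```

In the real loop `pos` comes from step 3 (skeleton centreline vs frame centre) and `rate` becomes a ViewLink rate command; `dt` tracks the detection frame interval rather than a fixed constant.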

**POI queue management**: Priority queue sorted by: (1) detection confidence, (2) proximity to current camera position (minimize slew time), (3) recency. Max queue size: 20 POIs. Older/lower-confidence entries expire after 30 seconds.
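A sketch of that queue using `heapq`. The tuple ordering encodes confidence, then slew distance, then recency; the size limit and TTL follow the spec above, and everything else (names, newest-first recency) is an illustrative assumption:

```python
import heapq
import itertools
import time

class POIQueue:
    """Bounded priority queue for POIs: highest confidence first, then
    smallest slew angle, then most recent. Entries expire after TTL_S."""
    MAX_SIZE = 20
    TTL_S = 30.0

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()   # total order tie-breaker

    def push(self, conf: float, slew_deg: float, poi) -> None:
        now = time.monotonic()
        # Min-heap: negate confidence and timestamp so "bigger" pops first.
        item = (-conf, slew_deg, -now, next(self._counter), now, poi)
        heapq.heappush(self._heap, item)
        if len(self._heap) > self.MAX_SIZE:
            self._heap.remove(max(self._heap))  # drop the worst entry
            heapq.heapify(self._heap)

    def pop(self):
        now = time.monotonic()
        while self._heap:
            item = heapq.heappop(self._heap)
            if now - item[4] <= self.TTL_S:     # lazily skip expired entries
                return item[5]
        return None

q = POIQueue()
q.push(0.4, 12.0, "poi_low")
q.push(0.9, 50.0, "poi_high")
q.push(0.9, 5.0, "poi_high_near")
```

Expiry is handled lazily at pop time, which avoids a background sweep thread; with at most 20 entries the linear eviction in `push` is negligible.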

### Component 5: Integration with Existing Detections Service

| Solution | Tools | Advantages | Limitations | Requirements | Security | Cost | Fit |
|----------|-------|-----------|-------------|-------------|----------|------|-----|
| **Extend existing Cython codebase + separate VLM process (recommended)** | Cython, TensorRT, Unix socket IPC | Maintains existing architecture. YOLO26/YOLOE-26 fits naturally into TRT pipeline. VLM isolated in separate process. | VLM IPC adds small latency. Two processes to manage. | Cython extensions, process management | Process isolation | Free | **Best fit. Minimal disruption to existing system.** |
| Microservice architecture | FastAPI, Docker, gRPC | Clean separation. Independent scaling. | Overhead for single Jetson. Over-engineered for edge. | Docker, networking | Service mesh | Free | Over-engineered for single device |

**Integration architecture**:

```
┌─────────────────────────────────────────────────────────┐
│                Main Process (Cython + TRT)              │
│                                                         │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐   │
│  │ Existing     │  │ YOLOE-26     │  │ Scan         │   │
│  │ YOLO Det     │  │ Semantic     │  │ Controller   │   │
│  │ (TRT Engine) │  │ (TRT Engine) │  │ + Gimbal     │   │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘   │
│         │                 │                 │           │
│         └────────┬────────┘                 │           │
│                  ▼                          │           │
│           ┌──────────────┐                  │           │
│           │ Spatial      │◀─────────────────┘           │
│           │ Reasoning    │                              │
│           │ + CNN Class. │                              │
│           └──────┬───────┘                              │
│                  │ ambiguous cases                      │
│                  ▼                                      │
│           ┌──────────────┐                              │
│           │ IPC (Unix    │                              │
│           │ Socket)      │                              │
│           └──────┬───────┘                              │
└──────────────────┼──────────────────────────────────────┘
                   │
                   ▼
     ┌────────────────────────────┐
     │  VLM Process (Python)      │
     │  UAV-VL-R1 (vLLM/TRT-LLM)  │
     │  INT8 quantized            │
     └────────────────────────────┘
```

**Data flow**:
1. Camera frame → existing YOLO detection (scene context: vehicles, debris, structures)
2. Same frame → YOLOE-26 semantic detection (footpaths, branch piles, entrances)
3. YOLO context + YOLOE-26 detections → spatial reasoning module
4. Spatial reasoning: path tracing, endpoint analysis, CNN classification
5. High-confidence detections → operator notification (bounding box + coordinates)
6. Ambiguous detections → VLM process via IPC → response → operator notification
7. All detections → scan controller → gimbal commands (Level 1/2 transitions, path following)

**GPU scheduling**: YOLO and YOLOE-26 can share a single TRT engine (YOLOE-26 re-parameterized to standard YOLO26 weights for fixed classes). VLM inference is sequential: pause YOLO frames, run VLM, resume YOLO. VLM analysis typically lasts 3-5s, during which the camera holds position (endpoint analysis phase).

## Training & Data Strategy

### Phase 1: Zero-shot (Week 1-2)
- Deploy YOLOE-26 with text/visual prompts
- Use semantic01-04.png as visual prompt references
- Tune text prompt class names and confidence thresholds
- Collect false positive/negative data for annotation

### Phase 2: Annotation & Fine-tuning (Week 3-8)
- Annotate collected data using existing annotation tooling
- Priority order: footpaths (segmentation masks) → branch piles (bboxes) → entrances (bboxes) → roads (segmentation) → trees (bboxes)
- Use SAM (Segment Anything Model) for semi-automated segmentation mask generation
- Target: 500+ images per class by week 6, 1500+ by week 8
- Fine-tune YOLO26-Seg on custom dataset using linear probing first, then full fine-tuning

### Phase 3: Custom CNN classifier (Week 6-10)
- Train MobileNetV3-Small binary classifier on ROI crops from endpoint analysis
- Positive: annotated concealed positions. Negative: natural path termini, random terrain
- Target: 300+ positive, 1000+ negative samples
- Export to TensorRT FP16

### Phase 4: VLM integration (Week 8-12)
- Deploy UAV-VL-R1 INT8 as separate process
- Tune prompting strategy on collected ambiguous cases
- Optional: fine-tune UAV-VL-R1 on domain-specific data if base accuracy insufficient

### Phase 5: Seasonal expansion (Month 3+)
- Winter data → spring/summer annotation campaigns
- Re-train all models with multi-season data
- Expect accuracy degradation in summer (vegetation occlusion); mitigate with larger dataset

## Testing Strategy

### Integration / Functional Tests
- YOLOE-26 text prompt detection on reference images (semantic01-04.png) — verify footpath and hideout regions are flagged
- Path tracing on synthetic segmentation masks — verify skeleton, endpoint, junction detection
- CNN classifier on known positive/negative ROI crops — verify binary output correctness
- Gimbal control loop on simulated camera feed — verify PID convergence and path-following accuracy
- VLM IPC round-trip — verify request/response latency and correctness
- Full pipeline test: frame → YOLOE detection → path tracing → CNN → VLM → operator notification
- Scan controller state machine: verify Level 1 → Level 2 transitions, timeout, return to Level 1

### Non-Functional Tests
- Tier 1 latency: verify end-to-end YOLOE-26 inference on Jetson Orin Nano Super is ≤100ms
- Tier 2 latency: path tracing + CNN classification ≤200ms
- Tier 3 latency: VLM IPC + inference ≤5 seconds
- Memory profiling: total RAM usage under YOLO + YOLOE + CNN + VLM concurrent loading
- Thermal stress test: continuous inference for 30+ minutes without thermal throttling
- False positive rate measurement on known clean terrain
- False negative rate measurement on annotated concealed positions
- Gimbal response test: measure physical camera movement latency vs command

## References

- YOLO26 docs: https://docs.ultralytics.com/models/yolo26/
- YOLOE docs: https://docs.ultralytics.com/models/yoloe/
- YOLOE-26 paper: https://arxiv.org/abs/2602.00168
- YOLO26 Jetson benchmarks: https://docs.ultralytics.com/guides/nvidia-jetson
- UAV-VL-R1: https://arxiv.org/pdf/2508.11196, https://github.com/Leke-G/UAV-VL-R1
- SmolVLM2: https://huggingface.co/blog/smolervlm
- Moondream: https://moondream.ai/blog/introducing-moondream-0-5b
- ViewPro Protocol: https://www.viewprotech.com/index.php?ac=article&at=read&did=510
- ArduPilot ViewPro: https://ardupilot.org/copter/docs/common-viewpro-gimbal.html
- FootpathSeg: https://github.com/WennyXY/FootpathSeg
- UAV-YOLO12: https://www.mdpi.com/2072-4292/17/9/1539
- Trail segmentation (UNet+MambaOut): https://arxiv.org/pdf/2504.12121
- servopilot PID: https://pypi.org/project/servopilot/
- Camouflage detection (OVCOS): https://arxiv.org/html/2506.19300v1
- Jetson AI Lab benchmarks: https://www.jetson-ai-lab.com/tutorials/genai-benchmarking/
- Cosmos-Reason2-2B benchmarks: Embedl blog (Feb 2026)

## Related Artifacts
- AC assessment: `_docs/00_research/00_ac_assessment.md`
- Tech stack evaluation: `_docs/01_solution/tech_stack.md` (if Phase 3 was executed)
- Security analysis: `_docs/01_solution/security_analysis.md` (if Phase 4 was executed)

# Solution Draft

## Assessment Findings

| Old Component Solution | Weak Point (functional/security/performance) | New Solution |
|------------------------|----------------------------------------------|-------------|
| YOLOE-26-seg TRT engine | YOLO26 has confirmed TRT confidence misalignment and INT8 export crashes on Jetson (bugs #23841, Hackster.io report). YOLOE-26 inherits these bugs. | Use YOLOE-v8-seg for initial deployment (proven TRT stability). Transition to YOLOE-26 once Ultralytics fixes TRT issues. |
| Two separate TRT engines (existing YOLO + YOLOE-26) | Combined memory ~5-6GB exceeds usable 5.2GB VRAM. cuDNN overhead ~1GB per engine. | Single merged TRT engine: YOLOE-v8-seg re-parameterized with fixed classes merges into existing YOLO pipeline. One engine, one CUDA context. |
| UAV-VL-R1 (2B) via vLLM ≤5s | TRT-LLM does not support edge devices. 2B VLM: ~4.7 tok/s → 10-21s for a useful response. VLM (2.5GB) cannot fit alongside YOLO in memory. | Moondream 0.5B (816 MiB INT4) as primary VLM. Demand-loaded: unload YOLO → load VLM → analyze batch → unload → reload YOLO. Background mode, not real-time. |
| Text prompts for concealment classes | Military concealment classes are far OOD from LVIS/COCO training data. "dugout", "camouflage netting" unlikely to work. | Visual prompts (SAVPE) primary for concealment. Text prompts only for in-distribution classes (footpath, road, trail). Multimodal fusion (text+visual) for robustness. |
| Zhang-Suen skeletonization raw | Noise-sensitive: spurious branches from noisy aerial segmentation masks. | Add preprocessing pipeline: Gaussian blur → threshold → morphological closing → skeletonization → branch pruning (remove < 20px branches). Increase ROI to 256×256. |
| PID-only gimbal control | PID cannot compensate for UAV attitude drift and mounting errors during flight. | Kalman filter + PID cascade: Kalman estimates state from IMU → PID corrects error → gimbal actuates. |
| 1500 images/class in 8 weeks | Optimistic for military concealment data collection. Access constraints, annotation complexity. | 300-500 real + 1000+ synthetic (GenCAMO/CamouflageAnything) per class. Active learning loop from YOLOE zero-shot. |
| No security measures | Small edge YOLO models vulnerable to adversarial patches. Physical device capture risk. No data protection. | Three layers: PatchBlock adversarial defense, encrypted model weights at rest, auto-wipe on tamper. |

## Product Solution Description

A three-tier semantic detection system for identifying concealed/camouflaged positions from reconnaissance UAV aerial imagery, running on Jetson Orin Nano Super alongside the existing YOLO detection pipeline. Redesigned for the 5.2GB usable VRAM budget with a demand-loaded VLM.

```
┌─────────────────────────────────────────────────────────────────────────┐
│                        JETSON ORIN NANO SUPER                           │
│                        (5.2 GB usable VRAM)                             │
│                                                                         │
│  ┌──────────┐   ┌──────────────────────┐   ┌───────────────────────┐    │
│  │ ViewPro  │──▶│  Tier 1              │──▶│  Tier 2               │    │
│  │ A40      │   │  Merged TRT Engine   │   │  Path Preprocessing   │    │
│  │ Camera   │   │  YOLOE-v8-seg        │   │  + Skeletonization    │    │
│  │          │   │  + Existing YOLO     │   │  + MobileNetV3-Small  │    │
│  │          │   │  ≤15ms               │   │  ≤200ms               │    │
│  └────▲─────┘   └──────────────────────┘   └───────────┬───────────┘    │
│       │                                                │                │
│  ┌────┴─────┐   ┌──────────────┐                       │ ambiguous      │
│  │ Gimbal   │◀──│ Scan         │                       ▼                │
│  │ Kalman   │   │ Controller   │            ┌───────────────────┐       │
│  │ + PID    │   │ (L1/L2)      │            │  VLM Queue        │       │
│  └──────────┘   └──────────────┘            │  (batch when ≥3   │       │
│                                             │   or on demand)   │       │
│  ┌──────────────────────────────┐           └────────┬──────────┘       │
│  │  PatchBlock Adversarial      │                    │                  │
│  │  Defense (CPU preprocessing) │           [demand-load cycle]         │
│  └──────────────────────────────┘           ┌────────▼──────────┐       │
│                                             │  Tier 3           │       │
│                                             │  Moondream 0.5B   │       │
│                                             │  816 MiB INT4     │       │
│                                             │  ~5-10s per image │       │
│                                             └───────────────────┘       │
└─────────────────────────────────────────────────────────────────────────┘
```

The system operates in two scan levels:
- **Level 1 (Wide Sweep)**: Camera at medium zoom. Merged TRT engine runs YOLOE-v8-seg (visual + text prompts) and existing YOLO detection simultaneously. POIs queued by confidence.
- **Level 2 (Detailed Scan)**: Camera zooms into POI. Path preprocessing → skeletonization → endpoint CNN. High-confidence → immediate alert. Ambiguous → VLM queue.
- **VLM Batch Analysis**: When the queue reaches 3+ detections or the operator requests it: scan pauses, YOLO engine unloads, Moondream loads, batch analyzes, unloads, YOLO reloads. ~30-45s total cycle.

Three submodules: (1) Semantic Detection AI, (2) Camera Gimbal Control, (3) Integration with existing detections service.

### Memory Budget

| Component | Mode | GPU Memory | Notes |
|-----------|------|-----------|-------|
| OS + System | Always | ~2.4 GB | From 8GB total, leaves 5.2GB usable |
| Merged TRT Engine (YOLOE-v8-seg + YOLO) | Detection mode | ~2.8 GB | Single engine, shared CUDA context |
| MobileNetV3-Small TRT (FP16) | Detection mode | ~50 MB | Tiny binary classifier |
| OpenCV + NumPy buffers | Always | ~200 MB | Frame buffers, masks |
| PatchBlock defense | Always | ~50 MB | CPU-based, minimal GPU |
| **Total in Detection Mode** | | **~3.1 GB** | **~2.1 GB headroom** |
| Moondream 0.5B INT4 | VLM mode | ~816 MB | Demand-loaded |
| vLLM overhead + KV cache | VLM mode | ~500 MB | Minimal for 0.5B model |
| **Total in VLM Mode** | | **~1.6 GB** | **After unloading TRT engine; always-on buffers included** |

## Architecture

### Component 1: Tier 1 — Real-Time Detection (YOLOE-v8-seg, merged engine)

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|----------|-------|-----------|-------------|-------------|----------|------------|-----|
| **YOLOE-v8-seg re-parameterized (recommended)** | yoloe-v8s-seg.pt, Ultralytics, TensorRT FP16 | Proven TRT stability on Jetson. Zero inference overhead when re-parameterized. Visual+text multimodal fusion. Merges into existing YOLO engine. | Older architecture than YOLO26 (slightly lower base accuracy). | Ultralytics ≥8.4, TensorRT, JetPack 6.2 | PatchBlock CPU preprocessing | ~13ms FP16 (s-size) | **Best fit for stable deployment.** |
| YOLOE-26-seg (future upgrade) | yoloe-26s-seg.pt, TensorRT | Better accuracy (YOLO26 architecture). NMS-free. | Active TRT bugs on Jetson: confidence misalignment, INT8 crash. | Wait for Ultralytics fix | Same | ~7ms FP16 (estimated) | **Future upgrade when TRT bugs resolved.** |
| YOLO26-Seg custom-trained (production) | yolo26s-seg.pt fine-tuned | Highest accuracy for known classes. | Requires 1500+ annotated images/class. Same TRT bugs. | Custom dataset, GPU for training | Same | ~7ms FP16 | **Long-term production model.** |

**Prompt strategy (revised)**:

Text prompts (in-distribution classes only):
- `"footpath"`, `"trail"`, `"path"`, `"road"`, `"track"`
- `"tree row"`, `"tree line"`, `"clearing"`

Visual prompts (SAVPE, for concealment-specific detection):
- Reference images cropped from semantic01-04.png: branch piles, dark entrances, dugout structures
- Use multimodal fusion mode: `concat` (zero overhead)

```python
import numpy as np
from ultralytics import YOLOE

model = YOLOE("yoloe-v8s-seg.pt")

text_classes = ["footpath", "trail", "road", "tree row", "clearing"]
model.set_classes(text_classes)

# x1, y1, x2, y2: pixel coordinates of the reference bounding box
results = model.predict(
    frame,
    conf=0.15,
    refer_image="reference_hideout.jpg",
    visual_prompts={"bboxes": np.array([[x1, y1, x2, y2]]), "cls": np.array([0])},
    fusion_mode="concat"
)
```

**Re-parameterization for production**: Once classes are fixed after training, re-parameterize YOLOE-v8 to standard YOLOv8 weights. This eliminates the open-vocabulary overhead entirely, and the model becomes a regular YOLO inference engine. Merge with the existing YOLO detection into a single TRT engine using TensorRT's multi-model support or batch inference.

### Component 2: Tier 2 — Spatial Reasoning & CNN Confirmation

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|----------|-------|-----------|-------------|-------------|----------|------------|-----|
| **Robust path tracing + CNN classifier (recommended)** | OpenCV, scikit-image, MobileNetV3-Small TRT | Preprocessing removes noise. Branch pruning eliminates artifacts. 256×256 ROI for better context. | Still depends on segmentation quality. | OpenCV, scikit-image, PyTorch → TRT | Offline inference | ~150ms total | **Best fit. Robust against noisy masks.** |
| GraphMorph centerline extraction | PyTorch, custom model | Topology-aware. Reduces false positives. | Requires additional model in memory. More complex integration. | PyTorch, custom training | Offline | ~200ms estimated | Upgrade path if basic approach fails |
| Heuristic rules only | OpenCV, NumPy | No training data. Immediate. | Brittle. Cannot generalize. | None | Offline | ~50ms | Baseline/fallback for day-1 |

**Revised path tracing pipeline**:

1. Take footpath segmentation mask from Tier 1
2. **Preprocessing**: Gaussian blur (σ=1.5) → binary threshold (Otsu) → morphological closing (5×5 kernel, 2 iterations) → remove small connected components (< 100px area)
3. Skeletonize using Zhang-Suen algorithm
4. **Branch pruning**: Remove skeleton branches shorter than 20 pixels (noise artifacts)
5. Detect endpoints using hit-miss morphological operations (8 kernel patterns)
6. Detect junctions using branch-point kernels
7. Trace path segments between junctions/endpoints
8. For each endpoint: extract **256×256** ROI crop centered on endpoint from original image
9. Feed ROI crop to MobileNetV3-Small binary classifier
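Step 2's cleanup stage could look like the following sketch, with `scipy.ndimage` standing in for the OpenCV calls and the Otsu threshold simplified to a fixed 0.5 cut (the function name is hypothetical; parameters follow the spec above):

```python
import numpy as np
from scipy import ndimage

def clean_mask(soft_mask: np.ndarray, min_area: int = 100) -> np.ndarray:
    """Blur -> threshold -> closing -> drop components smaller than min_area px."""
    blurred = ndimage.gaussian_filter(soft_mask.astype(np.float64), sigma=1.5)
    binary = blurred > 0.5                    # Otsu in production; fixed cut here
    closed = ndimage.binary_closing(binary, structure=np.ones((5, 5)), iterations=2)
    labels, n = ndimage.label(closed)
    if n == 0:
        return closed
    areas = np.bincount(labels.ravel())
    keep = np.zeros_like(areas, dtype=bool)
    keep[1:] = areas[1:] >= min_area          # label 0 is background
    return keep[labels]

# Synthetic mask: one genuine path blob (20x20 = 400 px) plus speckle noise.
m = np.zeros((64, 64))
m[10:30, 10:30] = 1.0        # genuine footpath region
m[50, 50] = 1.0              # isolated noise pixel
cleaned = clean_mask(m)
```

The blur already attenuates single-pixel speckle below the threshold; the component filter then removes any surviving clusters before skeletonization, which is what keeps step 4's pruning workload small.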

**Freshness assessment** (unchanged from draft01, validated approach):
- Edge sharpness, contrast ratio, fill ratio, path width consistency
- Initial hand-tuned thresholds → Random Forest with annotated data

### Component 3: Tier 3 — VLM Deep Analysis (Background Batch Mode)

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|----------|-------|-----------|-------------|-------------|----------|------------|-----|
| **Moondream 0.5B INT4 demand-loaded (recommended)** | Moondream, ONNX/PyTorch, INT4 | 816 MiB memory. Built-in detect()/point() APIs. Runs on Raspberry Pi. | Weaker reasoning than 2B models. Not aerial-specialized. | ONNX Runtime or PyTorch | Local only | ~2-5s per image (0.5B) | **Best fit for memory-constrained edge.** |
| SmolVLM2-500M | HuggingFace, ONNX | 1.8GB. Small. ONNX support. | Less capable than Moondream for detection. No detect() API. | ONNX Runtime | Local only | ~3-7s estimated | Alternative if Moondream underperforms |
| UAV-VL-R1 (2B) demand-loaded | vLLM, W4A16 | Aerial-specialized. Best reasoning for UAV imagery. | 2.5GB INT8. ~10-21s per analysis. Tight memory fit. | vLLM, W4A16 weights | Local only | ~10-21s | **Upgrade path if Moondream insufficient.** |
| No VLM | N/A | Simplest. Most memory. Zero latency impact. | No fallback for ambiguous CNN outputs. No explanations. | None | N/A | N/A | **Viable MVP if Tier 1+2 accuracy is sufficient.** |

**Demand-loading protocol**:

```
1. VLM queue reaches threshold (≥3 detections or operator request)
2. Scan controller transitions to HOLD state (camera fixed position)
3. Signal main process to unload TRT engine
4. Wait for GPU memory release (~1s)
5. Launch VLM process: load Moondream 0.5B INT4
6. Process all queued detections sequentially (~2-5s each)
7. Collect results, send to operator
8. Unload VLM, release GPU memory
9. Reload TRT engine (~2s)
10. Resume scan from HOLD position

Total cycle: ~30-45s for 3-5 detections
```
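The protocol maps onto a small manager object. The hook names below are placeholders for the real TRT/Moondream load/unload calls; only the batching threshold and the ordering of steps come from the protocol above:

```python
import enum

class VLMState(enum.Enum):
    IDLE = "IDLE"
    ANALYZING = "ANALYZING"

class VLMManager:
    """Demand-load cycle sketch: queue detections, then swap the TRT engine
    out for the VLM, analyze the batch, and swap back."""
    BATCH_THRESHOLD = 3

    def __init__(self, unload_trt, load_vlm, analyze, unload_vlm, reload_trt):
        self.state = VLMState.IDLE
        self.queue = []
        self._hooks = (unload_trt, load_vlm, analyze, unload_vlm, reload_trt)

    def enqueue(self, detection, operator_request: bool = False):
        self.queue.append(detection)
        if operator_request or len(self.queue) >= self.BATCH_THRESHOLD:
            return self._run_batch()
        return None                        # below threshold: keep queueing

    def _run_batch(self):
        unload_trt, load_vlm, analyze, unload_vlm, reload_trt = self._hooks
        self.state = VLMState.ANALYZING
        unload_trt()                       # free GPU memory (~1 s in practice)
        load_vlm()                         # Moondream 0.5B INT4
        results = [analyze(d) for d in self.queue]
        self.queue.clear()
        unload_vlm()
        reload_trt()                       # ~2 s in practice
        self.state = VLMState.IDLE
        return results

calls = []
mgr = VLMManager(
    unload_trt=lambda: calls.append("unload_trt"),
    load_vlm=lambda: calls.append("load_vlm"),
    analyze=lambda d: f"verdict:{d}",
    unload_vlm=lambda: calls.append("unload_vlm"),
    reload_trt=lambda: calls.append("reload_trt"),
)
mgr.enqueue("det1")
mgr.enqueue("det2")
out = mgr.enqueue("det3")                  # third detection triggers the batch
```

Keeping the swap sequence inside one method guarantees the TRT engine is always reloaded before the manager reports IDLE, so the scan controller can key its HOLD/resume transitions off that state.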

**VLM prompting strategy** (adapted for Moondream's capabilities):

Using the detect() API for a fast binary check:
```python
model.detect(image, "concealed military position")
model.detect(image, "dugout covered with branches")
```

Using caption for detailed analysis:
```python
model.caption(image, length="normal")
```

### Component 4: Camera Gimbal Control

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|----------|-------|-----------|-------------|-------------|----------|------------|-----|
| **Kalman+PID cascade with ViewLink (recommended)** | pyserial, ViewLink V3.3.3, filterpy (Kalman), servopilot (PID) | Compensates UAV attitude drift. Proven in aerospace. Smooth path-following. | More complex than PID-only. Requires IMU data feed. | ViewPro A40, pyserial, IMU data access | Physical only | <10ms command latency | **Best fit. Flight-grade control.** |
| PID-only with ViewLink | pyserial, ViewLink V3.3.3, servopilot | Simple. Works for hovering UAV. | Drifts during flight. Cannot compensate mounting errors. | ViewPro A40, pyserial | Physical only | <10ms | Acceptable for testing only |

**Revised control architecture**:

```
UAV IMU Data ──▶ Kalman Filter ──▶ State Estimate (attitude, angular velocity)
                                           │
Camera Frame ──▶ Detection ──▶ Target Position ──▶ Error Calculation
                                                          │
                                  State Estimate ────────▶│
                                                          │
                                                   PID Controller
                                                          │
                                                   Gimbal Command
                                                          │
                                                   ViewLink Serial
```

- Kalman filter state vector: `[yaw, pitch, yaw_rate, pitch_rate]`
- Measurement inputs: IMU gyroscope (`yaw_rate`, `pitch_rate`), detection-derived angles
- Process model: constant angular velocity with noise
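The filterpy package named in the table provides a ready-made `KalmanFilter`; the same predict/update cycle for this state vector can be sketched directly in NumPy. The noise covariances below are illustrative guesses, not tuned values:

```python
import numpy as np

class GimbalKalman:
    """Constant-angular-velocity Kalman filter over
    x = [yaw, pitch, yaw_rate, pitch_rate]."""
    def __init__(self, dt: float):
        self.x = np.zeros(4)
        self.P = np.eye(4)
        self.F = np.eye(4)
        self.F[0, 2] = self.F[1, 3] = dt            # angle += rate * dt
        # Measure yaw/pitch from detections and rates from the IMU gyro.
        self.H = np.eye(4)
        self.Q = np.eye(4) * 1e-3                   # process noise (guess)
        self.R = np.diag([0.05, 0.05, 0.01, 0.01])  # measurement noise (guess)

    def step(self, z: np.ndarray) -> np.ndarray:
        # Predict
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        # Update
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (z - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x

# Track a constant 2 deg/s yaw drift through noisy measurements.
kf = GimbalKalman(dt=0.05)
rng = np.random.default_rng(0)
for i in range(100):
    t = i * 0.05
    z = np.array([2.0 * t, 0.0, 2.0, 0.0]) + rng.normal(0, [0.2, 0.2, 0.1, 0.1])
    est = kf.step(z)
```

The PID stage then consumes the filtered angles (target minus `est[0]`/`est[1]`) instead of raw detections, which is what allows the lower, less aggressive gains mentioned below.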

**Scan patterns** (unchanged from draft01): sinusoidal yaw oscillation, POI queue management.

**Path-following** (revised): The Kalman-filtered state estimate provides smoother tracking. PID gains can be lower (less aggressive) because the state estimate is already stabilized. Update rate: tied to detection frame rate.

### Component 5: Integration with Existing Detections Service

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|----------|-------|-----------|-------------|-------------|----------|------------|-----|
| **Single merged Cython+TRT process + demand-loaded VLM (recommended)** | Cython, TensorRT, ONNX Runtime | Single TRT engine. Minimal memory. VLM isolated. | VLM loading pauses detection (30-45s). | Cython extensions, process management | Process isolation + encryption | Minimal overhead | **Best fit for 5.2GB VRAM.** |

**Revised integration architecture**:

```
┌───────────────────────────────────────────────────────────────────┐
│                    Main Process (Cython + TRT)                    │
│                                                                   │
│  ┌──────────────────────────────────────────────┐                 │
│  │          Single Merged TRT Engine            │                 │
│  │  ├─ Existing YOLO Detection heads            │                 │
│  │  ├─ YOLOE-v8-seg (re-parameterized)          │                 │
│  │  └─ MobileNetV3-Small classifier             │                 │
│  └──────────────────────────────────────────────┘                 │
│                                                                   │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────────────┐ │
│  │ Path Tracing │  │ Scan         │  │  PatchBlock Defense      │ │
│  │ + Skeleton   │  │ Controller   │  │  (CPU parallel)          │ │
│  │ (CPU)        │  │ + Kalman+PID │  │                          │ │
│  └──────────────┘  └──────────────┘  └──────────────────────────┘ │
│                                                                   │
│  ┌──────────────────────────────────────────────┐                 │
│  │                VLM Manager                   │                 │
│  │  state: IDLE | LOADING | ANALYZING | UNLOAD  │                 │
│  │  queue: [detection_1, detection_2, ...]      │                 │
│  └──────────────────────────────────────────────┘                 │
└───────────────────────────────────────────────────────────────────┘

VLM mode (demand-loaded, replaces TRT engine temporarily):
┌───────────────────────────────────────────────────────────────────┐
│  ┌──────────────────────────────────────────────┐                 │
│  │           Moondream 0.5B INT4                │                 │
│  │        (ONNX Runtime or PyTorch)             │                 │
│  └──────────────────────────────────────────────┘                 │
│  Detection paused. Camera in HOLD state.                          │
└───────────────────────────────────────────────────────────────────┘
```
|
||||
|
||||
**Data flow** (revised):
1. PatchBlock preprocesses frame on CPU (parallel with GPU inference)
2. Cleaned frame → merged TRT engine → YOLO detections + YOLOE-v8 semantic detections
3. Semantic detections → path preprocessing → skeletonization → endpoint extraction → CNN
4. High-confidence → operator alert (coordinates + bounding box + confidence)
5. Ambiguous → VLM queue
6. VLM queue management: batch-process when queue ≥ 3 or operator triggers
7. During VLM mode: detection paused, camera holds, operator notified of pause

**GPU scheduling** (revised): No concurrent multi-model GPU sharing. Single TRT engine runs during detection mode. VLM demand-loaded exclusively during analysis mode. This eliminates the 10-40% latency jitter from GPU sharing.
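
The VLM Manager's states and batch trigger described above can be sketched as follows. This is a minimal sketch, not the production module: the load/unload callbacks are hypothetical stand-ins for the real TRT engine and Moondream load/unload routines.

```python
from enum import Enum, auto


class VLMState(Enum):
    IDLE = auto()
    LOADING = auto()
    ANALYZING = auto()
    UNLOADING = auto()


class VLMManager:
    """Demand-loads the VLM: release the TRT engine, run the VLM over the
    queued detections, then reload the TRT engine. Callbacks are injected
    so this sketch stays runtime-agnostic."""

    BATCH_THRESHOLD = 3  # batch-process when the queue reaches 3 detections

    def __init__(self, unload_trt, load_vlm, infer_vlm, unload_vlm, reload_trt):
        self.state = VLMState.IDLE
        self.queue = []
        self._unload_trt = unload_trt
        self._load_vlm = load_vlm
        self._infer_vlm = infer_vlm
        self._unload_vlm = unload_vlm
        self._reload_trt = reload_trt

    def enqueue(self, detection, operator_trigger=False):
        """Queue an ambiguous detection; run a batch when full or on demand."""
        self.queue.append(detection)
        if len(self.queue) >= self.BATCH_THRESHOLD or operator_trigger:
            return self.run_batch()
        return None

    def run_batch(self):
        # Detection is paused for the whole cycle; camera must be in HOLD.
        self.state = VLMState.LOADING
        self._unload_trt()
        self._load_vlm()
        self.state = VLMState.ANALYZING
        results = [self._infer_vlm(d) for d in self.queue]
        self.queue.clear()
        self.state = VLMState.UNLOADING
        self._unload_vlm()
        self._reload_trt()
        self.state = VLMState.IDLE
        return results
```

Because the callbacks are injected, the unload → load → infer → unload → reload ordering can be unit-tested without any GPU present.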

### Component 6: Security

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|----------|-------|-----------|-------------|-------------|----------|------------|-----|
| **Three-layer security (recommended)** | PatchBlock, LUKS/dm-crypt, tmpfs | Adversarial defense + model protection + data protection | Adds ~5ms CPU overhead for PatchBlock | PatchBlock library, Linux crypto | Full stack | Minimal GPU impact | **Required for military edge deployment.** |

**Layer 1: Adversarial Input Defense**
- PatchBlock CPU preprocessing on every frame before GPU inference
- Detects anomalous patches via outlier detection and dimensionality reduction
- Recovers up to 77% accuracy under adversarial attack
- Runs in parallel with GPU inference (no latency addition to pipeline)

**Layer 2: Model & Weight Protection**
- TRT engine files encrypted at rest using LUKS on a dedicated partition
- At boot: decrypt into tmpfs (RAM disk) — never written to persistent storage unencrypted
- Secure boot chain via Jetson's secure boot (fuse-based, hardware root of trust)
- If device is captured powered-off: encrypted models, no plaintext weights accessible

**Layer 3: Operational Data Protection**
- Captured imagery stored in encrypted circular buffer (last N minutes only)
- Detection logs (coordinates, confidence, timestamps) encrypted at rest
- Over datalink: transmit only coordinates + confidence + small thumbnail (not raw frames)
- Tamper detection: if enclosure opened or unauthorized boot detected → auto-wipe keys + detection logs

## Training & Data Strategy (Revised)

### Phase 1: Zero-shot (Week 1-2)
- Deploy YOLOE-v8-seg with multimodal prompts (text for paths, visual for concealment)
- Use semantic01-04.png as visual prompt references via SAVPE
- Tune confidence thresholds per class type
- Collect false positive/negative data for annotation
- **Benchmark YOLOE-v8-seg TRT on Jetson: confirm inference time, memory, stability**

### Phase 2: Annotation & Fine-tuning (Week 3-8)
- Annotate collected real data (target: 300-500 images/class)
- **Generate 1000+ synthetic images per class using GenCAMO/CamouflageAnything**
- Priority: footpaths (segmentation) → branch piles (bboxes) → entrances (bboxes)
- Active learning: YOLOE zero-shot flags candidates → human reviews → annotates
- Fine-tune YOLOv8-Seg (or YOLO26-Seg if TRT fixed) on real + synthetic dataset
- Use linear probing first, then full fine-tuning

### Phase 3: CNN classifier (Week 4-8, parallel with Phase 2)
- Train MobileNetV3-Small on ROI crops: 256×256 from endpoint analysis
- Positive: annotated concealed positions + synthetic. Negative: natural termini, random terrain
- Target: 200+ real positive + 500+ synthetic positive, 1000+ negative
- Export to TensorRT FP16

### Phase 4: VLM integration (Week 8-12)
- Deploy Moondream 0.5B INT4 in demand-loaded mode
- Test demand-load cycle timing: measure unload → load → infer → unload → reload
- Tune detect() prompts and caption prompts on collected ambiguous cases
- **If Moondream accuracy insufficient: test UAV-VL-R1 (2B) demand-loaded**
- **If YOLO26 TRT bugs fixed: test YOLOE-26-seg as Tier 1 upgrade**

### Phase 5: Seasonal expansion (Month 3+)
- Winter data → spring/summer annotation campaigns
- Re-train all models with multi-season data + seasonal synthetic augmentation

## Testing Strategy

### Integration / Functional Tests
- YOLOE-v8-seg multimodal prompt detection on reference images — verify text+visual fusion
- **TRT engine stability test**: 1000 consecutive inferences without confidence drift
- Path preprocessing pipeline on synthetic noisy masks — verify cleaning + skeletonization
- Branch pruning: verify short spurious branches removed, real path branches preserved
- CNN classifier on known positive/negative 256×256 ROI crops
- **Demand-load VLM cycle**: measure timing of unload TRT → load Moondream → infer → unload → reload TRT
- **Memory monitoring during demand-load**: confirm no memory leak across 10+ cycles
- Kalman+PID gimbal control with simulated IMU data — verify drift compensation
- Full pipeline: frame → PatchBlock → YOLOE-v8 → path tracing → CNN → (VLM) → alert
- Scan controller: Level 1 → Level 2 → HOLD (for VLM) → resume Level 1

### Non-Functional Tests
- Tier 1 latency: YOLOE-v8-seg TRT FP16 ≤15ms on Jetson Orin Nano Super
- Tier 2 latency: preprocessing + skeletonization + CNN ≤200ms
- **VLM demand-load cycle: ≤45s for 3 detections (including load/unload overhead)**
- **Memory profiling: peak detection mode ≤3.5GB GPU, peak VLM mode ≤2.0GB GPU**
- Thermal stress test: 30+ minutes continuous detection without thermal throttling
- PatchBlock adversarial test: inject test adversarial patches, measure accuracy recovery
- False positive/negative rate on annotated reference images
- Gimbal path-following accuracy with and without Kalman filter (measure improvement)
- **Demand-load memory leak test: 50+ VLM cycles without memory growth**

## References

- YOLOE-v8 docs: https://docs.ultralytics.com/models/yoloe/
- YOLOE-26 paper: https://arxiv.org/abs/2602.00168
- YOLO26 TRT confidence bug: https://www.hackster.io/qwe018931/pushing-limits-yolov8-vs-v26-on-jetson-orin-nano-b89267
- YOLO26 INT8 crash: https://github.com/ultralytics/ultralytics/issues/23841
- YOLOE multimodal fusion: https://github.com/ultralytics/ultralytics/pull/21966
- Jetson Orin Nano Super memory: https://forums.developer.nvidia.com/t/jetson-orin-nano-super-insufficient-gpu-memory/330777
- Multi-model survey on Orin Nano: https://dev.to/ankk98/multi-model-ai-resource-allocation-for-humanoid-robots-a-survey-on-jetson-orin-nano-super-310i
- TRT multiple engines: https://github.com/NVIDIA/TensorRT/issues/4358
- TRT memory on Jetson: https://github.com/ultralytics/ultralytics/issues/21562
- Moondream: https://moondream.ai/blog/introducing-moondream-0-5b
- Cosmos-Reason2-2B Jetson benchmark: https://www.thenextgentechinsider.com/pulse/cosmos-reason2-runs-on-jetson-orin-nano-super-with-w4a16-quantization
- Jetson AI Lab benchmarks: https://www.jetson-ai-lab.com/tutorials/genai-benchmarking/
- Jetson LLM bottleneck: https://ericxliu.me/posts/benchmarking-llms-on-jetson-orin-nano/
- vLLM on Jetson: https://learnopencv.com/deployment-on-edge-vllm-on-jetson/
- TRT-LLM no edge support: https://github.com/NVIDIA/TensorRT-LLM/issues/7978
- PatchBlock defense: https://arxiv.org/abs/2601.00367
- Adversarial patches on YOLO: https://link.springer.com/article/10.1007/s10207-025-01067-3
- GenCAMO synthetic data: https://arxiv.org/abs/2601.01181
- CamouflageAnything (CVPR 2025): https://openaccess.thecvf.com/content/CVPR2025/html/Das_Camouflage_Anything_...
- GraphMorph centerlines: https://arxiv.org/pdf/2502.11731
- Learnable skeleton + SAM: https://ui.adsabs.harvard.edu/abs/2025ITGRS..63S1458X
- Kalman filter gimbal: https://ieeexplore.ieee.org/ielx7/6287639/10005208/10160027.pdf
- UAV-VL-R1: https://arxiv.org/pdf/2508.11196
- ViewPro Protocol: https://www.viewprotech.com/index.php?ac=article&at=read&did=510
- servopilot PID: https://pypi.org/project/servopilot/

## Related Artifacts
- AC assessment: `_docs/00_research/00_ac_assessment.md`
- Previous draft: `_docs/01_solution/solution_draft01.md`
@@ -0,0 +1,319 @@
# Solution Draft

## Assessment Findings

| Old Component Solution | Weak Point (functional/security/performance) | New Solution |
|------------------------|----------------------------------------------|-------------|
| YOLO26 as sole detection backbone | **Accuracy regression on custom datasets**: Reported YOLO26s "much less accurate" than YOLO11s on identical training data (GitHub #23206). YOLO26 is 3 months old — less battle-tested than YOLO11. | Benchmark YOLO26 vs YOLO11 on initial annotated data before committing. YOLO11 as fallback. YOLOE supports both backbones (yoloe-11s-seg, yoloe-26s-seg). |
| YOLO26 TensorRT INT8 export | **INT8 export fails on Jetson** (TRT Error Code 2, OOM). Fix merged (PR #23928) but indicates fragile tooling. | Use FP16 only for initial deployment (confirmed stable). INT8 as future optimization after tooling matures. Pin Ultralytics version + JetPack version. |
| vLLM as VLM runtime | **Unstable on Jetson Orin Nano**: system freezes, reboots, installation crashes, excessive memory (multiple open issues). Not production-ready for 8GB devices. | **Replace with NanoLLM/NanoVLM** — purpose-built for Jetson by NVIDIA's Dusty-NV team. Docker containers for JetPack 5/6. Supports VILA, LLaVA. Stable. Or use llama.cpp with GGUF models (proven on Jetson). |
| No storage strategy | **SD card corruption**: Recurring corruption documented across multiple Jetson Orin Nano users. SD cards unsuitable for production. | **Mandatory NVMe SSD** for OS + models + logging. No SD card in production. Ruggedized NVMe mount for vibration resistance. |
| No EMI protection on UART | **ViewPro documents EMI issues**: antennas cause random gimbal panning if within 35cm. Standard UART parity bit insufficient for noisy UAV environment. | Add CRC-16 checksum layer on gimbal commands. Enforce 35cm antenna separation in physical design. Consider shielded UART cable. Command retry on CRC failure (max 3 retries, then log error). |
| No environmental hardening addressed | **UAV environment**: vibration, temperature extremes (-20°C to +50°C), dust, EMI, power fluctuations. Dev kit form factor is not field-deployable. | Use ruggedized carrier board (MILBOX-ORNX or similar) with vibration dampening. Conformal coating on exposed connectors. External temperature sensor for environmental monitoring. |
| No logging or telemetry | **No post-flight review capability**: field system must log all detections with metadata for model iteration, operator review, and evidence collection. | Add detection logging: timestamp, GPS-denied coordinates, confidence score, detection class, JPEG thumbnail, tier that triggered, freshness metadata. Log to NVMe SSD. Export as structured format (JSON lines) after flight. |
| No frame recording for offline replay | **Training data collection depends on field recording**: Without recording, no way to build training dataset from real flights. | Record all camera frames to NVMe at configurable rate (1-5 FPS during Level 1, full rate during Level 2). Include detection overlay option. Post-flight: use recordings for annotation. |
| No power management | **UAV power budget is finite**: Jetson at 15W + gimbal + camera + radio. No monitoring of power draw or load shedding. | Monitor power consumption via Jetson's INA sensors. Power budget alert at 80% of allocated watts. Load shedding: disable VLM first, then reduce inference rate, then disable semantic detection. |
| YOLO26 not validated for this domain | **No benchmark on aerial concealment detection**: All YOLO26 numbers are on COCO/LVIS. Concealment detection may behave very differently. | First sprint deliverable: benchmark YOLOE (both 11 and 26 backbones) on semantic01-04.png with text/visual prompts. Report AP on initial annotated validation set before committing to backbone. |
| Freshness and path tracing are untested algorithms | **No proven prior art**: Both freshness assessment and path-following via skeletonization are novel combinations. Risk of over-engineering before validation. | Implement minimal viable versions first. V1 path tracing: skeleton + endpoint only, no freshness, no junction following. Validate on real flight data before adding complexity. |

## Product Solution Description

A three-tier semantic detection system for identifying concealed/camouflaged positions from reconnaissance UAV aerial imagery, running on Jetson Orin Nano Super with NVMe SSD storage, active cooling, and ruggedized carrier board, alongside the existing YOLO detection pipeline.

```
┌──────────────────────────────────────────────────────────────────────────┐
│ JETSON ORIN NANO SUPER (ruggedized carrier, NVMe, 15W) │
│ │
│ ┌──────────┐ ┌──────────────┐ ┌──────────────┐ ┌───────────┐ │
│ │ ViewPro │───▶│ Tier 1 │───▶│ Tier 2 │───▶│ Tier 3 │ │
│ │ A40 │ │ YOLOE │ │ Path Trace │ │ VLM │ │
│ │ Camera │ │ (11 or 26 │ │ + CNN │ │ NanoLLM │ │
│ │ + Frame │ │ backbone) │ │ ≤200ms │ │ (L2 only) │ │
│ │ Quality │ │ TRT FP16 │ │ │ │ ≤5s │ │
│ │ Gate │ │ ≤100ms │ │ │ │ │ │
│ └────▲─────┘ └──────────────┘ └──────────────┘ └───────────┘ │
│ │ │
│ ┌────┴─────┐ ┌──────────────┐ ┌──────────────┐ ┌───────────┐ │
│ │ Gimbal │◀───│ Scan │ │ Watchdog │ │ Recorder │ │
│ │ Control │ │ Controller │ │ + Thermal │ │ + Logger │ │
│ │ + CRC │ │ (L1/L2 FSM) │ │ + Power │ │ (NVMe) │ │
│ └──────────┘ └──────────────┘ └──────────────┘ └───────────┘ │
│ │
│ ┌──────────────────────────────┐ │
│ │ Existing YOLO Detection │ (always running, scene context) │
│ │ Cython + TRT │ │
│ └──────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────────┘
```

Key changes from draft02:
- **YOLOE backbone is configurable** (YOLO11 or YOLO26) — benchmark before committing
- **NanoLLM replaces vLLM** as VLM runtime (purpose-built for Jetson, stable)
- **NVMe SSD mandatory** — no SD card in production
- **CRC-16 on gimbal UART** — EMI protection
- **Detection logger + frame recorder** — post-flight review and training data collection
- **Ruggedized carrier board** — vibration, temperature, dust protection
- **Power monitoring + load shedding** — finite UAV power budget
- **FP16 only** for initial deployment (INT8 export unstable on Jetson)
- **Minimal V1 for unproven components** — path tracing and freshness start simple

## Architecture

### Component 1: Tier 1 — Real-Time Detection

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|----------|-------|-----------|-------------|-------------|----------|------------|-----|
| **YOLOE with configurable backbone (recommended)** | yoloe-11s-seg.pt or yoloe-26s-seg.pt, set_classes() → TRT FP16 | Supports both YOLO11 and YOLO26 backbones. Benchmark on real data, pick winner. set_classes() bakes CLIP embeddings for zero overhead. | YOLO26 may regress on custom data vs YOLO11. Needs empirical comparison. | Ultralytics ≥8.4 (pinned version), TensorRT, JetPack 6.2 | Local only | YOLO11s TRT FP16: ~7ms (640px). YOLO26s: similar or slightly faster. | **Best fit. Hedge against backbone risk.** |

**Version pinning strategy**:
- Pin `ultralytics==8.4.X` (specific patch version validated on Jetson)
- Pin JetPack 6.2 + TensorRT version
- Test every Ultralytics update in staging before deploying to production
- Keep both yoloe-11s-seg and yoloe-26s-seg TRT engines on NVMe; switch via config

**YOLO backbone selection process (Sprint 1)**:
1. Annotate 200 frames from real flight footage (footpaths, branch piles, entrances)
2. Fine-tune YOLOE-11s-seg and YOLOE-26s-seg on same dataset, same hyperparameters
3. Evaluate on held-out validation set (50 frames)
4. Pick backbone with higher mAP50
5. If delta < 2%: pick YOLO26 (faster CPU inference, NMS-free deployment)

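
The decision rule in steps 4-5 can be captured as a small helper. A sketch under stated assumptions: the function name is hypothetical, and the "2% delta" is interpreted as absolute mAP50 points.

```python
def pick_backbone(map50_y11: float, map50_y26: float,
                  tie_margin: float = 0.02) -> str:
    """Sprint 1 backbone decision: higher mAP50 wins, but within the
    tie margin prefer YOLO26 (NMS-free deployment, faster CPU path).

    tie_margin is assumed to mean absolute mAP50 points (0.02 = 2%).
    """
    if abs(map50_y11 - map50_y26) < tie_margin:
        return "yoloe-26s-seg"
    return "yoloe-11s-seg" if map50_y11 > map50_y26 else "yoloe-26s-seg"
```

Encoding the rule keeps the selection auditable: the chosen engine name can go straight into the config that switches between the two TRT engines kept on NVMe.
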
### Component 2: Tier 2 — Spatial Reasoning & CNN Confirmation

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|----------|-------|-----------|-------------|-------------|----------|------------|-----|
| **V1 minimal path tracing + heuristic classifier (recommended for initial release)** | OpenCV, scikit-image | No training data needed. Skeleton + endpoint detection + simple heuristic: "dark mass at endpoint → flag." Fast to implement and validate. | Low accuracy. Many false positives. | OpenCV, scikit-image | Offline | ~30ms | **V1: ship fast, validate on real data.** |
| **V2 trained CNN (after data collection)** | MobileNetV3-Small, TensorRT FP16 | Higher accuracy after training. Dynamic ROI sizing. | Needs 300+ positive, 1000+ negative annotated ROI crops. | PyTorch, TRT export | Offline | ~5-10ms classification | **V2: replace heuristic once data exists.** |

**V1 heuristic for endpoint analysis** (no training data needed):
1. Skeletonize footpath mask with branch pruning
2. Find endpoints
3. For each endpoint: extract ROI (dynamic size based on GSD)
4. Compute:
   - `mean_darkness` = mean intensity in the ROI's central 50%
   - `contrast` = (surrounding_mean - center_mean) / surrounding_mean
   - `area_ratio` = dark_pixel_count / total_pixels
5. If mean_darkness < threshold_dark AND contrast > threshold_contrast → flag as potential concealed position
6. Thresholds: configurable, tuned per season. Start with winter values.
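
The feature computation and decision in steps 4-5 can be sketched in pure NumPy. This is a minimal sketch, assuming the ROI is already a grayscale crop centred on a skeleton endpoint; the threshold defaults are placeholders to be tuned per season, as step 6 requires.

```python
import numpy as np


def analyze_endpoint_roi(roi: np.ndarray,
                         threshold_dark: float = 60.0,
                         threshold_contrast: float = 0.3,
                         dark_value: float = 80.0):
    """V1 heuristic from steps 4-5. `roi` is a grayscale uint8 crop centred
    on a skeleton endpoint. All thresholds are placeholder values."""
    h, w = roi.shape
    # central 50% window of the ROI
    cy0, cy1 = h // 4, h - h // 4
    cx0, cx1 = w // 4, w - w // 4
    center = roi[cy0:cy1, cx0:cx1].astype(float)
    ring = roi.astype(float)
    ring[cy0:cy1, cx0:cx1] = np.nan          # mask out the centre
    mean_darkness = float(center.mean())
    surrounding_mean = float(np.nanmean(ring))
    contrast = (surrounding_mean - mean_darkness) / max(surrounding_mean, 1e-6)
    area_ratio = float((roi < dark_value).sum()) / roi.size
    flagged = mean_darkness < threshold_dark and contrast > threshold_contrast
    return flagged, {"mean_darkness": mean_darkness,
                     "contrast": contrast, "area_ratio": area_ratio}
```

Returning the raw features alongside the flag makes it easy to log them per detection and later reuse the same crops as labelled training data for the V2 CNN.
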

**V1 freshness** (metadata only, not a filter):
- contrast_ratio of path vs surrounding terrain
- Report as: "high contrast" (likely fresh) / "low contrast" (likely stale)
- No binary classification. Operator sees all detections with freshness tag.

### Component 3: Tier 3 — VLM Deep Analysis

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|----------|-------|-----------|-------------|-------------|----------|------------|-----|
| **NanoLLM with VILA-2.7B or VILA1.5-3B (recommended)** | NanoLLM Docker container, MLC/TVM quantization | Purpose-built for Jetson by NVIDIA team. Stable Docker containers. Optimized memory management. Supports VLMs natively. | Limited model selection (VILA, LLaVA, Obsidian). Not all VLMs available. | Docker, JetPack 6, NVMe for container storage | Local only, container isolation | ~15-25 tok/s on Orin Nano (4-bit MLC) | **Most stable Jetson VLM option.** |
| llama.cpp with GGUF VLM | llama.cpp, GGUF model files | Lightweight. No Docker needed. Proven stability on Jetson. Wide model support. | Manual build. Less optimized than NanoLLM for Jetson GPU. | llama.cpp build, GGUF weights | Local only | ~10-20 tok/s estimated | **Fallback if NanoLLM doesn't support needed model.** |
| ~~vLLM~~ | ~~vLLM Docker~~ | ~~High throughput~~ | **System freezes, reboots, installation crashes on Orin Nano. Multiple open bugs. Not production-ready.** | N/A | N/A | N/A | **Not recommended.** |

**Model selection for NanoLLM**:
- Primary: VILA1.5-3B (confirmed on Orin Nano, multimodal, 4-bit MLC)
- If UAV-VL-R1 GGUF weights become available: use via llama.cpp (aerial-specialized)
- Fallback: Obsidian-3B (mini VLM, lower accuracy but very fast)

### Component 4: Camera Gimbal Control

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|----------|-------|-----------|-------------|-------------|----------|------------|-----|
| **ViewLink serial driver + CRC-16 + PID + watchdog (recommended)** | pyserial, crcmod, PID library, threading | Robust communication. CRC catches EMI-corrupted commands. Retry logic. Watchdog. | ViewLink protocol implementation from spec. Physical EMI mitigation required. | ViewPro docs, UART, shielded cable, 35cm antenna separation | Physical only | <10ms command + CRC overhead negligible | **Production-grade.** |

**UART reliability layer**:
```
Packet format: [SOF(2)] [CMD(N)] [CRC16(2)]
- SOF: 0xAA 0x55 (start of frame)
- CMD: ViewLink command bytes per protocol spec
- CRC16: CRC-CCITT over CMD bytes
```
- On send: compute CRC-16, append to ViewLink command packet
- On receive (gimbal feedback): validate CRC-16. Discard corrupted frames.
- On CRC failure (send): retry up to 3 times with 10ms delay. Log failure after 3 retries.
- Note: Check if ViewLink protocol already includes checksums (read full spec first). If so, use native checksum; don't add redundant CRC.
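
The framing and checksum layer above can be sketched with a pure-Python CRC-16/CCITT-FALSE (poly 0x1021, init 0xFFFF); in production the same checksum is also available from the crcmod library via `crcmod.predefined.mkCrcFun('crc-ccitt-false')`. The function names here are illustrative, and the serial retry loop is omitted.

```python
SOF = b"\xAA\x55"  # start of frame, per the packet format above


def crc16_ccitt(data: bytes, crc: int = 0xFFFF) -> int:
    """CRC-16/CCITT-FALSE (poly 0x1021, init 0xFFFF), bitwise reference form."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021 if crc & 0x8000 else crc << 1) & 0xFFFF
    return crc


def frame_command(cmd: bytes) -> bytes:
    """Build [SOF(2)] [CMD(N)] [CRC16(2)]; CRC is computed over CMD only."""
    return SOF + cmd + crc16_ccitt(cmd).to_bytes(2, "big")


def parse_frame(frame: bytes):
    """Validate SOF and CRC on received feedback.

    Returns the CMD bytes, or None for a corrupted frame (caller discards
    it, and on the send side triggers the 3-retry logic described above).
    """
    if len(frame) < 4 or frame[:2] != SOF:
        return None
    cmd, rx_crc = frame[2:-2], int.from_bytes(frame[-2:], "big")
    return cmd if crc16_ccitt(cmd) == rx_crc else None
```

The standard check value for CRC-16/CCITT-FALSE over `b"123456789"` is 0x29B1, which gives a quick self-test that the implementation (or a crcmod replacement) is configured correctly.
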

**Physical EMI mitigation checklist**:
- [ ] Gimbal UART cable: shielded, shortest possible run
- [ ] Video/data transmitter antenna: ≥35cm from gimbal (ViewPro recommendation)
- [ ] Independent power supply for gimbal (or filtered from main bus)
- [ ] Ferrite beads on UART cable near Jetson connector

### Component 5: Recording, Logging & Telemetry

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|----------|-------|-----------|-------------|-------------|----------|------------|-----|
| **NVMe-backed frame recorder + JSON-lines detection logger (recommended)** | OpenCV VideoWriter / JPEG sequences, JSON lines, NVMe SSD | Post-flight review. Training data collection. Evidence. Detection audit trail. | NVMe write bandwidth (~500 MB/s) more than sufficient. Storage: ~2GB/min at full rate (1080p 30 FPS JPEG). | NVMe SSD ≥256GB | Physical access to NVMe | ~5ms per frame write (async) | **Essential for field deployment.** |

**Detection log format** (JSON lines, one per detection):
```json
{
  "ts": "2026-03-19T14:32:01.234Z",
  "frame_id": 12345,
  "gps_denied_lat": 48.123456,
  "gps_denied_lon": 37.654321,
  "tier": 1,
  "class": "footpath",
  "confidence": 0.72,
  "bbox": [0.12, 0.34, 0.45, 0.67],
  "freshness": "high_contrast",
  "tier2_result": "concealed_position",
  "tier2_confidence": 0.85,
  "tier3_used": false,
  "thumbnail_path": "frames/12345_det_0.jpg"
}
```
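
A minimal writer for this format might look as follows. The function name is hypothetical and the field handling is illustrative, not a fixed schema; the caller supplies the per-detection fields shown in the sample record.

```python
import io
import json
from datetime import datetime, timezone


def log_detection(fp, frame_id, det, tier=1):
    """Append one detection as a JSON line to an open text file handle.

    `fp` would be the NVMe-backed log file in production; `det` is a dict
    of per-detection fields (class, confidence, bbox, ...).
    """
    record = {
        "ts": datetime.now(timezone.utc).isoformat(timespec="milliseconds"),
        "frame_id": frame_id,
        "tier": tier,
        **det,
    }
    fp.write(json.dumps(record) + "\n")
    fp.flush()  # minimise loss on power cut; os.fsync would be stronger


# Usage with an in-memory buffer standing in for the NVMe-backed file:
buf = io.StringIO()
log_detection(buf, 12345, {"class": "footpath", "confidence": 0.72,
                           "bbox": [0.12, 0.34, 0.45, 0.67]})
```

One record per line means a truncated final line (e.g. from a power cut) corrupts at most one entry, and post-flight export is a straight file copy.
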

**Frame recording strategy**:
- Level 1: record every 5th frame (1-2 FPS) — overview coverage
- Level 2: record every frame (30 FPS) — detailed analysis footage
- Storage budget: 256GB NVMe ≈ 2 hours at Level 2 full rate, or 10+ hours at Level 1 rate
- Circular buffer: when storage >80% full, overwrite oldest Level 1 frames (keep Level 2)
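
The storage budget above can be sanity-checked with a small helper. A sketch under stated assumptions: the ~1.2 MB average JPEG size is an assumption implied by the 2-hour figure, not a measured value.

```python
def recording_hours(ssd_gb: float, fps: float, frame_mb: float,
                    reserve_frac: float = 0.0) -> float:
    """Rough recording endurance: hours of JPEG footage that fit on the SSD.

    frame_mb is the average compressed frame size (assumed ~1.2 MB for
    1080p); reserve_frac holds back space for OS, models, and logs.
    """
    usable_gb = ssd_gb * (1.0 - reserve_frac)
    gb_per_hour = fps * frame_mb * 3600 / 1024
    return usable_gb / gb_per_hour
```

With these assumptions, 256 GB at Level 2 (30 FPS) comes out near the quoted 2 hours, and Level 1 at 1-2 FPS comfortably exceeds the 10-hour figure.
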

### Component 6: System Health & Resilience

**Monitoring threads**:

| Monitor | Check Interval | Threshold | Action |
|---------|---------------|-----------|--------|
| Thermal (T_junction) | 1s | >75°C | Degrade to Level 1 only |
| Thermal (T_junction) | 1s | >80°C | Disable semantic detection |
| Power (Jetson INA) | 2s | >80% budget | Disable VLM |
| Power (Jetson INA) | 2s | >90% budget | Reduce inference rate to 5 FPS |
| Gimbal heartbeat | 2s | No response | Force Level 1 sweep pattern |
| Semantic process | 5s | No heartbeat | Restart with 5s backoff, max 3 attempts |
| VLM process | 5s | No heartbeat | Mark Tier 3 unavailable, continue Tier 1+2 |
| NVMe free space | 60s | <20% free | Switch to Level 1 recording rate only |
| Frame quality | per frame | Laplacian var < threshold | Skip frame, use buffered good frame |
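
The frame-quality row's Laplacian-variance check can be sketched in pure NumPy, equivalent in spirit to `cv2.Laplacian(gray, cv2.CV_64F).var()`; the threshold value is a placeholder to calibrate on real footage.

```python
import numpy as np


def laplacian_variance(gray: np.ndarray) -> float:
    """Variance of the 4-neighbour Laplacian over the interior pixels —
    the sharpness proxy used by the frame quality gate. Low variance
    means few edges, i.e. a blurry or defocused frame."""
    g = gray.astype(float)
    lap = (-4.0 * g[1:-1, 1:-1]
           + g[:-2, 1:-1] + g[2:, 1:-1]
           + g[1:-1, :-2] + g[1:-1, 2:])
    return float(lap.var())


def frame_passes_quality_gate(gray: np.ndarray, threshold: float = 100.0) -> bool:
    """Reject blurry frames; the caller then reuses the last buffered good
    frame, per the monitoring table. Threshold is a placeholder."""
    return laplacian_variance(gray) >= threshold
```

Running this on CPU before GPU inference is cheap (a few array slices per frame) and avoids wasting an inference slot on motion-blurred frames during gimbal slews.
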

**Graceful degradation** (4 levels, unchanged from draft02):

| Level | Condition | Capability |
|-------|-----------|-----------|
| 0 — Full | All nominal, T < 70°C | Tier 1+2+3, Level 1+2, gimbal, recording |
| 1 — No VLM | VLM unavailable or T > 75°C or power > 80% | Tier 1+2, Level 1+2, gimbal, recording |
| 2 — No semantic | Semantic crashed 3x or T > 80°C | Existing YOLO only, Level 1 sweep, recording |
| 3 — No gimbal | Gimbal UART failed 3x | Existing YOLO only, fixed camera, recording |
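
The table above maps directly onto a worst-condition-first selection function (hypothetical signature; the watchdog threads would feed it their latest readings):

```python
def degradation_level(t_junction_c: float, power_frac: float,
                      vlm_alive: bool, semantic_crashes: int,
                      gimbal_failures: int) -> int:
    """Map monitored conditions to the degradation level in the table,
    evaluated worst-first so the most severe condition dominates."""
    if gimbal_failures >= 3:
        return 3                       # no gimbal: fixed camera
    if semantic_crashes >= 3 or t_junction_c > 80.0:
        return 2                       # existing YOLO only
    if not vlm_alive or t_junction_c > 75.0 or power_frac > 0.80:
        return 1                       # Tier 1+2, no VLM
    return 0                           # full capability
```

Keeping the mapping in one pure function makes each row of the table a one-line unit test, so the degradation tests listed later can run without hardware.
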

### Component 7: Integration & Deployment

**Hardware BOM additions** (beyond existing system):

| Item | Purpose | Estimated Cost |
|------|---------|----------------|
| NVMe SSD ≥256GB (industrial grade) | OS + models + recording + logging | $40-80 |
| Active cooling fan (30mm+) | Prevent thermal throttling | $10-20 |
| Ruggedized carrier board (e.g., MILBOX-ORNX or custom) | Vibration, temperature, dust protection | $200-500 |
| Shielded UART cable + ferrite beads | EMI protection for gimbal communication | $10-20 |
| Total additional hardware | | ~$260-620 |

**Software deployment**:
- OS: JetPack 6.2 on NVMe SSD
- YOLOE models: TRT FP16 engines on NVMe (both 11 and 26 backbone variants)
- VLM: NanoLLM Docker container on NVMe
- Existing YOLO: current Cython + TRT pipeline (unchanged)
- New Cython modules: semantic detection, gimbal control, scan controller, recorder
- VLM process: separate Docker container, IPC via Unix socket
- Config: YAML file for all thresholds, class names, scan parameters, degradation thresholds

**Version control & update strategy**:
- Pin all dependency versions (Ultralytics, TensorRT, NanoLLM, OpenCV)
- Model updates: swap TRT engine files on NVMe, restart service
- Config updates: edit YAML, restart service
- No over-the-air updates (air-gapped system). USB drive for field updates.

## Training & Data Strategy

### Phase 0: Benchmark Sprint (Week 1-2)
- Deploy YOLOE-26s-seg and YOLOE-11s-seg in open-vocab mode
- Test text/visual prompts on semantic01-04.png + 50 additional frames
- Record results. Pick backbone with better qualitative detection.
- Deploy V1 heuristic endpoint analysis (no CNN, no training data needed)
- First field test flight with recording enabled

### Phase 1: Field validation & data collection (Week 2-6)
- Deploy TRT FP16 engine with best backbone
- Record all flights to NVMe
- Operator marks detections as true/false positive in post-flight review
- Build annotation backlog from recorded frames
- Target: 500 annotated frames by week 6

### Phase 2: Custom model training (Week 6-10)
- Fine-tune YOLOE-Seg on custom dataset (linear probing → full fine-tune)
- Train MobileNetV3-Small CNN on endpoint ROI crops
- A/B test: custom model vs YOLOE zero-shot on validation set
- Deploy winning model as new TRT engine

### Phase 3: VLM & refinement (Week 8-14)
- Deploy NanoLLM with VILA1.5-3B
- Tune prompting on collected ambiguous cases
- Train freshness classifier (if enough annotated freshness labels exist)
- Target: 1500+ images per class

### Phase 4: Seasonal expansion (Month 4+)
- Spring/summer annotation campaigns
- Re-train all models with multi-season data
- Adjust heuristic thresholds per season (configurable via YAML)

## Testing Strategy

### Integration / Functional Tests
- YOLOE text prompt detection on reference images (both 11 and 26 backbones)
- TRT FP16 export on Jetson Orin Nano Super (verify no OOM, no crash)
- V1 heuristic endpoint analysis on 20 synthetic masks (10 with hideouts, 10 without)
- Frame quality gate: inject blurry frames, verify rejection
- Gimbal CRC layer: inject corrupted commands, verify retry + log
- Gimbal watchdog: simulate hang, verify forced Level 1 within 2.5s
- NanoLLM VLM: load model, run inference on 10 aerial images, verify output + memory
- VLM load/unload cycle: 10 cycles without memory leak
- Detection logger: verify JSON-lines format, all fields populated
- Frame recorder: verify NVMe write speed, no dropped frames at 30 FPS
- Full pipeline end-to-end on recorded flight footage (offline replay)
- Graceful degradation: simulate each failure mode, verify correct degradation level

### Non-Functional Tests
- Tier 1 latency on Jetson Orin Nano Super TRT FP16: ≤100ms (both backbones)
- Tier 2 latency (V1 heuristic): ≤50ms. (V2 CNN): ≤200ms
- Tier 3 latency (NanoLLM VLM): ≤5 seconds
- Memory peak: all components loaded < 7GB
- Thermal: 60-minute sustained inference, T_junction < 75°C with active cooling
- NVMe endurance: continuous recording for 2 hours, verify no write errors
- Power draw: measure at each degradation level, verify within UAV power budget
- EMI test: operate near data transmitter antenna, verify no gimbal anomalies with CRC layer
- Cold start: power on → first detection within 60 seconds (model load time)
- Vibration: mount Jetson on vibration table, run inference, compare detection accuracy vs static

## Technology Maturity Assessment

| Component | Technology | Maturity | Risk | Mitigation |
|-----------|-----------|----------|------|------------|
| Tier 1 Detection | YOLOE/YOLO26/YOLO11 | **Medium** — YOLO26 is 3 months old, reported regressions on custom data. YOLOE-26 even newer. | Medium | Benchmark both backbones. YOLO11 is battle-tested fallback. Pin versions. |
| TRT FP16 Export | TensorRT on JetPack 6.2 | **High** — FP16 is stable on Jetson. Well-documented. | Low | FP16 only. Avoid INT8 initially. |
| TRT INT8 Export | TensorRT on JetPack 6.2 | **Low** — Documented crashes (issue #23841; fix merged in PR #23928). Calibration issues. | High | Defer to Phase 3+. FP16 sufficient for now. |
| VLM (NanoLLM) | NanoLLM + VILA-3B | **Medium-High** — Purpose-built for Jetson by NVIDIA team. Docker-based. Monthly releases. | Low | More stable than vLLM. Use Docker containers. |
| VLM (vLLM) | vLLM on Jetson | **Low** — System freezes, crashes, open bugs. | **High** | **Do not use.** NanoLLM instead. |
| Path Tracing | Skeletonization + OpenCV | **High** — Decades-old algorithms. Well-understood. | Low | Pruning needed for noisy inputs. |
| CNN Classifier | MobileNetV3-Small + TRT | **High** — Proven architecture. TRT FP16 stable. | Low | Standard transfer learning. |
| Gimbal Control | ViewLink Serial Protocol | **Medium** — Protocol documented. ArduPilot driver exists. | Medium | EMI mitigation critical. CRC layer. |
| Freshness Assessment | Novel heuristic | **Low** — No prior art. Experimental. | High | V1: metadata only, not a filter. Iterate with data. |
| NVMe Storage | Industrial NVMe on Jetson | **High** — Production standard. SD card alternative is unreliable. | Low | Use industrial-grade SSD. |
| Ruggedized Hardware | MILBOX-ORNX or custom | **High** — Established product. Designed for Jetson + UAV. | Low | Standard procurement. |
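The "Path Tracing" row pairs skeletonization with endpoint analysis: on an already-skeletonized binary mask, an endpoint is simply a skeleton pixel with exactly one 8-connected neighbour. A pure-Python sketch of that second step; in the real pipeline the mask would come from `skimage.morphology.skeletonize`, and the toy grid below merely stands in for it:

```python
# Endpoint analysis on a skeletonized path mask: a skeleton pixel with
# exactly one 8-connected neighbour is a path endpoint. In production
# the mask would come from skimage.morphology.skeletonize; this small
# hand-drawn grid is a stand-in.

def path_endpoints(skel):
    """Return (row, col) of skeleton pixels with exactly one neighbour."""
    rows, cols = len(skel), len(skel[0])
    ends = []
    for r in range(rows):
        for c in range(cols):
            if not skel[r][c]:
                continue
            neighbours = sum(
                skel[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
                if (rr, cc) != (r, c)
            )
            if neighbours == 1:
                ends.append((r, c))
    return ends

# A simple L-shaped footpath skeleton: endpoints at (0, 2) and (3, 0).
skeleton = [
    [0, 0, 1],
    [0, 1, 0],
    [1, 0, 0],
    [1, 0, 0],
]
```

The same neighbour count distinguishes junctions (three or more neighbours), which is where the pruning mentioned in the mitigation column would act on noisy spurs.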
## References

- YOLO26 docs: https://docs.ultralytics.com/models/yolo26/
- YOLOE docs: https://docs.ultralytics.com/models/yoloe/
- YOLO26 accuracy regression: https://github.com/ultralytics/ultralytics/issues/23206
- YOLO26 TRT INT8 crash: https://github.com/ultralytics/ultralytics/issues/23841
- YOLO26 TRT OOM fix: https://github.com/ultralytics/ultralytics/pull/23928
- vLLM Jetson freezes: https://github.com/dusty-nv/jetson-containers/issues/800
- vLLM Jetson install crash: https://github.com/vllm-project/vllm/issues/23376
- NanoLLM docs: https://dusty-nv.github.io/NanoLLM/
- NanoVLM: https://jetson-ai-lab.com/tutorial_nano-vlm.html
- MILBOX-ORNX: https://forecr.io/products/jetson-orin-nx-orin-nano-rugged-compact-pc-milbox-ornx
- Jetson SD card corruption: https://forums.developer.nvidia.com/t/corrupted-sd-cards/265418
- Jetson thermal throttling: https://www.alibaba.com/product-insights/how-to-run-private-llama-3-inference-on-a-200-jetson-orin-nano-without-thermal-throttling.html
- ViewPro EMI issues: https://www.viewprouav.com/help/gimbal/
- UART CRC reliability: https://link.springer.com/chapter/10.1007/978-981-19-8563-8_23
- Military edge AI thermal: https://www.mobilityengineeringtech.com/component/content/article/53967
- UAV-VL-R1: https://arxiv.org/pdf/2508.11196
- Qwen2-VL-2B-GPTQ-INT8: https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int8
- Ultralytics Jetson guide: https://docs.ultralytics.com/guides/nvidia-jetson
- Skelite: https://arxiv.org/html/2503.07369v1
- YOLO FP reduction: https://docs.ultralytics.com/yolov5/tutorials/tips_for_best_training_results
# Tech Stack Evaluation

## Requirements Analysis

### Functional Requirements

- Real-time open-vocabulary detection from UAV aerial imagery
- Footpath segmentation and path tracing with endpoint analysis
- Binary concealment classification on ROI crops
- On-demand VLM analysis for ambiguous detections
- Camera gimbal control with path-following
- Integration with existing Cython+TRT YOLO pipeline
### Non-Functional Requirements

- Tier 1 inference ≤15ms, Tier 2 ≤200ms
- 5.2GB usable VRAM budget (Jetson Orin Nano Super 8GB)
- Field-deployable: thermal resilience, tamper protection
- Offline operation (no cloud dependency)
### Constraints

- Jetson Orin Nano Super: 67 TOPS INT8, 8GB LPDDR5 unified, 102 GB/s bandwidth
- JetPack 6.2, CUDA 12.6, TensorRT 10.3
- Existing codebase: Cython + TensorRT (must extend, not replace)
- ViewPro A40 camera with ViewLink Serial Protocol V3.3.3
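The 5.2GB budget can be sanity-checked with a simple ledger before any model is committed. Only the Moondream figure (816 MiB) comes from the VLM evaluation later in this document; the other entries are illustrative planning estimates, not measurements:

```python
# Rough VRAM ledger against the 5.2 GB usable budget on the 8 GB Orin
# Nano Super. Only the Moondream figure (816 MiB) is taken from this
# document; the remaining entries are illustrative planning estimates.

BUDGET_MIB = 5.2 * 1024

resident = {
    "yoloe_v8_seg_trt_fp16": 800,   # estimate: detection engine + activations
    "mobilenetv3_small_trt": 50,    # estimate: classifier engine + activations
    "cuda_context_overhead": 600,   # estimate: CUDA/TRT runtime overhead
}
demand_loaded = {
    "moondream_0p5b_int4": 816,     # from the VLM evaluation table
}

# Worst case is all resident engines plus the demand-loaded VLM in
# memory at once, which is exactly the moment the budget must hold.
worst_case = sum(resident.values()) + sum(demand_loaded.values())
headroom = BUDGET_MIB - worst_case
```

Repeating this check with measured numbers after each model export is cheap insurance on a unified-memory device, where the GPU budget also competes with the OS.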
## Technology Evaluation

### Detection Framework

| Option | Fitness | Maturity | Security | Team Fit | Cost | Scalability | Score |
|--------|---------|----------|----------|----------|------|-------------|-------|
| **YOLOE-v8-seg (Ultralytics)** | 9/10 — open-vocab + segmentation | 9/10 — YOLOv8 TRT proven | 7/10 — PatchBlock compatible | 9/10 — existing Cython+TRT expertise | Free | 8/10 | **8.5** |
| YOLOE-26-seg (Ultralytics) | 10/10 — latest arch, NMS-free | 4/10 — TRT bugs on Jetson | 7/10 | 7/10 — new arch, less familiar | Free | 9/10 | **6.5** |
| YOLO-World v2 | 7/10 — open-vocab, no seg | 7/10 — stable but older | 7/10 | 8/10 | Free | 7/10 | **7.0** |

**Selected**: YOLOE-v8-seg. Upgrade path to YOLOE-26 when TRT issues resolved.
### CNN Classifier

| Option | Fitness | Maturity | Security | Team Fit | Cost | Scalability | Score |
|--------|---------|----------|----------|----------|------|-------------|-------|
| **MobileNetV3-Small** | 9/10 — binary classification, tiny | 10/10 — battle-tested | 8/10 | 9/10 | Free | 8/10 | **9.0** |
| EfficientNet-B0 | 8/10 — slightly more accurate | 10/10 | 8/10 | 8/10 | Free | 7/10 — larger | **8.0** |
| ResNet-18 | 7/10 — overkill for binary | 10/10 | 8/10 | 9/10 | Free | 6/10 — 44MB | **7.5** |

**Selected**: MobileNetV3-Small. ~5MB as a TRT FP16 engine (~2.5M params). Best size/accuracy trade-off.
### VLM

| Option | Fitness | Maturity | Security | Team Fit | Cost | Scalability | Score |
|--------|---------|----------|----------|----------|------|-------------|-------|
| **Moondream 0.5B INT4** | 7/10 — detect()/point() APIs | 7/10 — active development | 8/10 — local only | 7/10 — new, learning curve | Free | 9/10 — 816 MiB | **7.5** |
| SmolVLM2-500M | 6/10 — no detect API | 6/10 — newer | 8/10 | 6/10 | Free | 8/10 — 1.8GB | **6.5** |
| UAV-VL-R1 2B | 9/10 — aerial-specialized | 5/10 — not tested on Jetson | 8/10 | 5/10 | Free | 4/10 — 2.5GB | **5.5** |
| No VLM (MVP) | 5/10 — no fallback | 10/10 | 10/10 | 10/10 | Free | 10/10 | **8.0** |

**Selected**: Moondream 0.5B for VLM tier. "No VLM" as MVP fallback if Moondream insufficient.
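"On-demand VLM analysis for ambiguous detections" (the functional requirement above) implies a confidence gate between the classifier and the VLM tier: only mid-confidence results pay the VLM cost. A minimal sketch; the band edges are illustrative tuning parameters, not values from this document:

```python
# Routing gate for on-demand VLM analysis: only detections whose
# classifier confidence falls inside an ambiguous band are queued for
# the Moondream tier. Band edges are illustrative tuning parameters.

AMBIGUOUS_LOW, AMBIGUOUS_HIGH = 0.35, 0.75

def route(confidence: float) -> str:
    """Decide what to do with one classifier result."""
    if confidence >= AMBIGUOUS_HIGH:
        return "accept"   # confident positive: report directly
    if confidence <= AMBIGUOUS_LOW:
        return "reject"   # confident negative: drop
    return "vlm"          # ambiguous: send the ROI crop to the VLM tier
```

Widening the band trades Tier-2 latency for fewer missed concealments; the thresholds should be calibrated on the annotated validation set.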
### VLM Runtime

| Option | Fitness | Maturity | Security | Team Fit | Cost | Scalability | Score |
|--------|---------|----------|----------|----------|------|-------------|-------|
| **ONNX Runtime** | 8/10 — lightweight, cross-platform | 9/10 | 8/10 | 8/10 | Free | 9/10 | **8.5** |
| vLLM | 7/10 — server-oriented, overkill for 0.5B | 4/10 — documented freezes and install crashes on Jetson Orin | 7/10 | 6/10 — complex setup | Free | 7/10 | **6.0** |
| PyTorch direct | 7/10 — simplest integration | 10/10 | 8/10 | 9/10 | Free | 6/10 — no optimization | **7.5** |
| MLC-LLM | 6/10 — declining adoption | 5/10 | 7/10 | 5/10 | Free | 7/10 | **5.5** |

**Selected**: ONNX Runtime for Moondream 0.5B. Lightweight, no server overhead.
### Gimbal Control

| Option | Fitness | Maturity | Security | Team Fit | Cost | Scalability | Score |
|--------|---------|----------|----------|----------|------|-------------|-------|
| **filterpy (Kalman) + servopilot (PID)** | 9/10 — cascade control | 8/10 — proven libraries | 8/10 | 7/10 — Kalman is new | Free | 8/10 | **8.0** |
| Custom Kalman + PID from scratch | 8/10 | 5/10 — unproven | 8/10 | 6/10 | Free | 7/10 | **6.5** |
| PID only (servopilot) | 6/10 — no drift compensation | 9/10 | 8/10 | 9/10 | Free | 7/10 | **7.5** |

**Selected**: filterpy + servopilot cascade.
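The selected cascade works in two stages: a Kalman filter smooths the noisy pixel-error measurement of the tracked target, and an anti-windup PID converts the smoothed error into a gimbal rate command. A one-axis, pure-Python sketch of that structure, standing in for what `filterpy.kalman.KalmanFilter` and servopilot would provide; all gains and noise parameters are illustrative:

```python
# One-axis sketch of the selected cascade: a scalar Kalman filter
# (filterpy's role) smooths the measured pixel error, and a clamped
# anti-windup PID (servopilot's role) turns it into a gimbal rate
# command. Gains and noise parameters are illustrative.

class ScalarKalman:
    """1-D Kalman filter with a static state model: x' = x, z = x + noise."""
    def __init__(self, q=1e-3, r=0.25):
        self.x, self.p, self.q, self.r = 0.0, 1.0, q, r
    def update(self, z):
        self.p += self.q                   # predict: inflate variance
        k = self.p / (self.p + self.r)     # Kalman gain
        self.x += k * (z - self.x)         # correct toward measurement
        self.p *= (1.0 - k)
        return self.x

class PID:
    """PID with integral clamping (anti-windup) and output limiting."""
    def __init__(self, kp=0.8, ki=0.1, kd=0.05, limit=30.0):
        self.kp, self.ki, self.kd, self.limit = kp, ki, kd, limit
        self.i, self.prev = 0.0, 0.0
    def step(self, err, dt=0.02):
        self.i = max(-self.limit, min(self.limit, self.i + err * dt))
        d = (err - self.prev) / dt
        self.prev = err
        out = self.kp * err + self.ki * self.i + self.kd * d
        return max(-self.limit, min(self.limit, out))

kf, pid = ScalarKalman(), PID()
# Noisy, shrinking pixel-error samples for one axis; the rate command
# should decay as the target is centered.
commands = [pid.step(kf.update(z)) for z in (12.0, 9.5, 10.5, 6.0, 3.0, 1.0)]
```

The Kalman stage is what gives the cascade its drift compensation over the PID-only option: the PID acts on a filtered error rather than raw, EMI-prone measurements.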
### Adversarial Defense

| Option | Fitness | Maturity | Security | Team Fit | Cost | Scalability | Score |
|--------|---------|----------|----------|----------|------|-------------|-------|
| **PatchBlock** | 9/10 — designed for edge YOLO | 7/10 — 2026 paper | 9/10 | 7/10 | Free | 9/10 — CPU-based | **8.0** |
| Custom input validation | 5/10 — ad-hoc | 3/10 | 6/10 | 8/10 | Free | 7/10 | **5.5** |
| None | 0/10 | 10/10 | 0/10 | 10/10 | Free | 10/10 | **3.0** |

**Selected**: PatchBlock. Integrate as CPU preprocessing step.
### Synthetic Data Generation

| Option | Fitness | Maturity | Security | Team Fit | Cost | Scalability | Score |
|--------|---------|----------|----------|----------|------|-------------|-------|
| **CamouflageAnything** | 8/10 — CVPR 2025, camouflage-specific | 7/10 | 8/10 | 6/10 | Free | 8/10 | **7.5** |
| GenCAMO | 8/10 — environment-aware, 2026 | 6/10 — newer | 8/10 | 6/10 | Free | 8/10 | **7.0** |
| Cut-paste augmentation | 6/10 — simple but effective | 10/10 | 8/10 | 9/10 | Free | 7/10 | **7.5** |

**Selected**: CamouflageAnything (primary) + cut-paste (supplementary).
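The supplementary cut-paste augmentation is simple enough to sketch directly: a masked foreground cutout (e.g. a concealed-object crop with its segmentation mask) is pasted into a background frame at a chosen offset. A toy version on small integer grids; production code would do the same with OpenCV/NumPy on RGB arrays:

```python
# Toy sketch of cut-paste augmentation: paste a masked foreground
# patch (e.g. a camouflaged-object cutout) into a background image at
# a given offset. Real use would operate on OpenCV/NumPy RGB frames;
# small 2-D integer grids stand in for images here.

def cut_paste(background, patch, mask, top, left):
    """Return a copy of background with patch pasted where mask == 1."""
    out = [row[:] for row in background]       # leave the original intact
    for r, mask_row in enumerate(mask):
        for c, m in enumerate(mask_row):
            if m:
                out[top + r][left + c] = patch[r][c]
    return out

bg = [[0] * 5 for _ in range(4)]
patch = [[7, 7], [7, 7]]
mask = [[1, 0], [1, 1]]    # irregular cutout, as from a segmentation mask
aug = cut_paste(bg, patch, mask, top=1, left=2)
```

Randomising the offset, scale, and background per sample, and recording the pasted mask as the new ground-truth label, is what turns this into a training-data generator.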
## Tech Stack Summary

| Layer | Technology | Version | Justification |
|-------|-----------|---------|---------------|
| Hardware | Jetson Orin Nano Super | 8GB | Existing constraint |
| OS / SDK | JetPack | 6.2 | Latest for Orin Nano Super |
| GPU Runtime | TensorRT | 10.3 (FP16) | Existing pipeline, proven stability |
| Detection | YOLOE-v8-seg | Ultralytics ≥8.4 | Stable TRT, open-vocab + segmentation |
| Classifier | MobileNetV3-Small | torchvision → TRT FP16 | Tiny footprint, binary classification |
| VLM | Moondream 0.5B | INT4, ONNX | 816 MiB, detect()/point() APIs |
| VLM Runtime | ONNX Runtime | ≥1.17 | Lightweight, no server overhead |
| Path Tracing | OpenCV + scikit-image | OpenCV 4.x, skimage 0.22+ | Preprocessing + skeletonization |
| Gimbal Kalman | filterpy | ≥1.4 | Kalman filter state estimation |
| Gimbal PID | servopilot | latest | Anti-windup PID, dual-axis |
| Serial | pyserial | ≥3.5 | ViewLink protocol communication |
| Adversarial Defense | PatchBlock | 2026 release | CPU-based, edge-optimized |
| Synthetic Data | CamouflageAnything | CVPR 2025 | Camouflage-specific generation |
| Encryption | LUKS / dm-crypt | Linux kernel | Model weight encryption at rest |
| Core Language | Cython + Python | 3.10+ | Existing codebase extension |
## Risk Assessment

| Technology | Risk | Mitigation |
|-----------|------|------------|
| YOLOE-v8-seg | Lower accuracy than YOLOE-26 | Monitor YOLO26 TRT fix; upgrade when stable |
| Moondream 0.5B | Untested for aerial concealment | Empirical testing Week 8; fallback to no-VLM MVP |
| PatchBlock | New (2026), limited field testing | Can disable if causes false positives; low integration risk |
| filterpy Kalman | Team unfamiliar | Well-documented library; standard aerospace algorithm |
| CamouflageAnything | Synthetic-to-real domain gap | Supplement with real data; validate FP/FN rates |
| Demand-loaded VLM | 30-45s detection pause | Batch requests; operator-triggered only; async notification |
| ONNX Runtime on Jetson | Less optimized than TRT for vision models | For 0.5B model, ONNX overhead is acceptable |
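The mitigation for the demand-loaded VLM row (batch requests, operator-triggered, async notification) amounts to a small queueing component: ambiguous ROIs accumulate, and the expensive model load happens once per operator-triggered flush instead of once per detection. A minimal sketch, where `analyze` stands in for the real Moondream call:

```python
# Sketch of the demand-loaded VLM mitigation: ambiguous ROIs queue up
# and the expensive model load is paid once per operator-triggered
# flush, amortising the 30-45 s pause over the whole batch. analyze()
# is a stand-in for the real Moondream call.

class VlmBatcher:
    def __init__(self, analyze):
        self.analyze = analyze   # callable: roi -> result
        self.pending = []
        self.loads = 0           # how many times the VLM was (re)loaded

    def submit(self, roi):
        """Queue an ambiguous detection; returns current queue depth."""
        self.pending.append(roi)
        return len(self.pending)

    def flush(self):
        """Operator-triggered: load the model once, analyse the batch."""
        if not self.pending:
            return []
        self.loads += 1          # one load pause per batch, not per ROI
        results = [self.analyze(roi) for roi in self.pending]
        self.pending.clear()
        return results

batcher = VlmBatcher(analyze=lambda roi: f"analysed:{roi}")
batcher.submit("roi_a")
batcher.submit("roi_b")
batch_results = batcher.flush()
```

In deployment, `flush` would run on a background thread and push its results through the async notification path so Tier 1 detection never blocks on the VLM.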
## Learning Requirements

| Technology | Effort | Who |
|-----------|--------|-----|
| YOLOE visual prompts (SAVPE) | Low — API-based | Detection engineer |
| Moondream detect()/caption() | Low — simple API | ML engineer |
| filterpy Kalman filter | Medium — state estimation theory | Controls engineer |
| PatchBlock integration | Low — preprocessing module | Detection engineer |
| CamouflageAnything pipeline | Medium — generative model setup | Data engineer |
| LUKS encryption + secure boot | Medium — Linux security | DevOps / platform |