Mirror of https://github.com/azaion/detections-semantic.git, synced 2026-04-22.
Initial commit
Made-with: Cursor
This commit is contained in:
@@ -0,0 +1,104 @@
|
||||
# Acceptance Criteria Assessment (Revised)

## Acceptance Criteria

| Criterion | Our Values | Researched Values | Cost/Timeline Impact | Status |
|-----------|-----------|-------------------|---------------------|--------|
| Tier 1 latency | ≤100ms per frame | YOLO26n TensorRT FP16 on Jetson Orin Nano Super: ~7ms at 640px. YOLOE-26 adds zero inference overhead when re-parameterized. Combined detection + segmentation well under 100ms. | Low risk | **Confirmed** |
| Tier 2 detailed analysis | ≤2 seconds (originally VLM) | See VLM assessment below. Recommend tiered approach: Tier 2 = custom CNN classifier (≤200ms), Tier 3 = optional VLM (3-5s). | Architecture change reduces risk | **Modified: split into Tier 2 (CNN ≤200ms) + Tier 3 (VLM ≤5s optional)** |
| YOLO new classes P≥80%, R≥80% | P≥80%, R≥80% | Use YOLO26-Seg for footpaths/roads (instance segmentation), YOLO26 detection for compact objects. YOLOE-26 open-vocabulary as bootstrap. | Reduced training data dependency initially via YOLOE-26 | **Modified: YOLO26-Seg for linear features, YOLOE-26 text prompts for bootstrapping** |
| Semantic recall ≥60% concealed positions | ≥60% recall | Reasonable for initial release. YOLOE-26 zero-shot + custom-trained YOLO26 models should reach this. | Medium risk | **Confirmed** |
| Semantic precision ≥20% | ≥20% initial | Reasonable. YOLOE-26 text prompts will start noisy; custom training improves over time. | Low risk | **Confirmed** |
| Footpath recall ≥70% | ≥70% | Confirmed achievable. UAV-YOLO12: F1=0.825. YOLO26-Seg NMS-free: likely competitive. | Low risk, seasonal caveat | **Confirmed** |
| Level 1→2 transition | ≤1 second | Physical zoom takes 1-3s on ViewPro A40. | Physical constraint | **Modified: ≤2 seconds** |
| Gimbal command latency ≤500ms | ≤500ms | ViewPro A40 UART 115200 + physical response: well under 500ms. | Low risk | **Confirmed** |
| RAM ≤6GB for semantic + VLM | ≤6GB | YOLO26 models: ~0.1-0.5GB. YOLOE-26: same as YOLO26 (zero overhead). Custom CNN: ~0.1GB. VLM (optional): SmolVLM2-500M ~1.8GB or UAV-VL-R1 INT8 ~2.5GB. Total worst case: ~4GB. | Low risk — well within budget | **Confirmed** |
| Dataset 1.5 months × 5h/day | ~225 hours | YOLOE-26 text prompts reduce initial data dependency. Recommend SAM-assisted annotation. | Reduced risk via YOLOE-26 bootstrapping | **Modified: phased data collection strategy** |

## VLM Options Assessment

| Model | Params | Memory (quantized) | Estimated Speed (Jetson Orin Nano Super) | Strengths | Weaknesses | Fit for This Task |
|-------|--------|-------------------|----------------------------------------|-----------|------------|-------------------|
| **UAV-VL-R1** | 2B | 2.5 GB (INT8) | ~10-15 tok/s (estimated from Qwen2-VL-2B base) | **Specifically trained for aerial reasoning.** 48% better than Qwen2-VL-2B on UAV tasks. Supports object counting, spatial reasoning. Open source. | Largest memory footprint of candidates. ~3-5s per analysis with image. | **Best fit for aerial scene understanding.** Use as Tier 3 for ambiguous cases. |
| **SmolVLM2-500M** | 500M | 1.8 GB (FP16) | ~30-50 tok/s (estimated, 4x smaller than 2B) | Tiny memory footprint. Fast inference. ONNX export available. | Weakest reasoning capability. Video-MME: 42.2 (mediocre). May lack nuance for concealment analysis. | **Marginal.** Fast but may be too weak for meaningful contextual reasoning on concealed positions. |
| **SmolVLM2-256M** | 256M | <1 GB | ~50-80 tok/s (estimated) | Smallest available VLM. Near-instant inference. | Very limited reasoning. Outperforms Idefics-80B on simple benchmarks but not on complex spatial reasoning. | **Not recommended.** Likely too weak for this task. |
| **Moondream 0.5B** | 500M | 816 MiB (INT4) | ~30-50 tok/s (estimated) | Built-in detect() and point() APIs. Tiny. | No specific aerial training. Detection benchmarks unverified for small model. | **Interesting for pointing/localization.** Could complement YOLO by pointing at "suspicious area at end of path." |
| **Moondream 2B** | 2B | ~2 GB (INT4) | ~10-15 tok/s (estimated) | Strong grounded detection (refcoco: 91.1). detect() and point() APIs. | No aerial-specific training. Similar size to UAV-VL-R1 but less specialized. | **Good general VLM, but UAV-VL-R1 is better for aerial tasks.** |
| **Cosmos-Reason2-2B** | 2B | ~2 GB (W4A16) | 4.7 tok/s (benchmarked) | NVIDIA-optimized for Jetson. Reasoning focused. | Slowest of candidates. Not aerial-specific. | **Slow. Not recommended over UAV-VL-R1.** |

### VLM Verdict

**Is a VLM helpful at all for this task?**

**Yes, but as a supplementary layer, not the primary mechanism.** The core detection pipeline should be:

- **Tier 1 (real-time)**: YOLO26-Seg + YOLOE-26 text prompts — detect footpaths, roads, branch piles, entrances, trees
- **Tier 2 (fast confirmation)**: Custom lightweight CNN classifier — binary "concealed position: yes/no" on cropped ROI (≤200ms)
- **Tier 3 (optional deep analysis)**: Small VLM — contextual reasoning on ambiguous cases, operator-facing descriptions
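
The hand-off between the tiers can be sketched as a simple routing policy. The thresholds and return labels below are illustrative assumptions, not part of any existing codebase:

```python
# Hypothetical sketch of the three-tier routing described above.
# The 30-70% ambiguity band matches the "VLM adds value" criteria below.

VLM_BAND = (0.30, 0.70)  # Tier 2 confidence band that escalates to the VLM

def route_detection(tier1_hit: bool, cnn_confidence: float) -> str:
    """Decide which tier produces the final verdict for one ROI."""
    if not tier1_hit:
        return "discard"            # Tier 1 found nothing worth confirming
    lo, hi = VLM_BAND
    if cnn_confidence >= hi:
        return "confirmed"          # Tier 2 is confident enough on its own
    if cnn_confidence <= lo:
        return "rejected"           # Tier 2 confidently rules it out
    return "escalate_to_vlm"        # ambiguous: ask Tier 3 for a second opinion

print(route_detection(True, 0.5))
```

Only the ambiguous band ever pays the 3-5s VLM cost, which is what keeps the VLM supplementary rather than load-bearing.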

VLM adds value for:

1. **Zero-shot bootstrapping** — before enough custom training data exists, the VLM can reason about "is this a hidden position?"
2. **Ambiguous cases** — when Tier 2 CNN confidence falls between 30% and 70%, the VLM provides a second opinion
3. **Operator trust** — the VLM can generate natural-language explanations ("footpath terminates at dark mass consistent with concealed entrance")
4. **Novel patterns** — the VLM generalizes to new types of concealment without retraining

**Recommended VLM: UAV-VL-R1** (2.5GB INT8, aerial-specialized, open source). Fallback: SmolVLM2-500M if memory is too tight.

## YOLO26 Key Capabilities for This Project

| Feature | Relevance |
|---------|-----------|
| **NMS-free end-to-end inference** | Simpler deployment on Jetson, more predictable latency, no post-processing tuning |
| **Instance segmentation (YOLO26-Seg)** | Precise pixel-level masks for footpaths and roads — far better than bounding boxes for linear features |
| **YOLOE-26 open-vocabulary** | Text-prompted detection: "footpath", "pile of branches", "dark entrance" — zero-shot, no training needed initially |
| **YOLOE-26 visual prompts** | Use reference images of hideouts to find visually similar structures — directly applicable |
| **YOLOE-26 prompt-free mode** | 1200+ built-in categories for autonomous discovery — useful for Level 1 wide scan |
| **43% faster CPU inference** | Better edge performance than previous YOLO versions |
| **MuSGD optimizer** | Better convergence for custom training with small datasets |
| **Improved small object detection** | ProgLoss + STAL loss — directly relevant for detecting small entrances and path features |

**YOLOE-26 is a potential game-changer for this project.** It enables:

- Immediate zero-shot detection with text prompts before any custom training
- Visual prompt mode: provide example images of hideouts and the system finds similar patterns
- Gradual transition to custom-trained YOLO26 as annotated data accumulates
- No inference overhead vs standard YOLO26 when re-parameterized for fixed classes

## Restrictions Assessment

| Restriction | Our Values | Researched Values | Cost/Timeline Impact | Status |
|-------------|-----------|-------------------|---------------------|--------|
| Jetson Orin Nano Super 67 TOPS, 8GB | 67 TOPS INT8, 8GB LPDDR5, 102 GB/s | Memory budget: YOLO26 (~0.3GB) + YOLOE-26 (zero overhead) + CNN (~0.1GB) + VLM (~2.5GB) = ~3GB. Leaves ~3GB for OS, buffers. VLM and YOLO should run sequentially to avoid contention. | Manageable with scheduling | **Confirmed, add sequential inference scheduling** |
| ViewPro A40 zoom transition | Not specified | 40x optical zoom full traversal: 1-3 seconds. Partial zoom (medium→high): ~1-2s. | Physical constraint, must account for in scan timing | **Add: zoom transition time 1-2s as physical constraint** |
| All seasons | No seasonal restriction | Phased rollout: winter first (highest contrast), then spring/summer/autumn. | Multi-season dataset collection is long-term effort | **Add: phased seasonal rollout** |
| Cython + TensorRT | Must extend existing | YOLO26 deploys natively to TensorRT. YOLOE-26 also TensorRT compatible. VLM may need separate process (vLLM or TensorRT-LLM). | Low complexity for YOLO26. Medium for VLM. | **Modified: VLM as separate process with IPC** |
| VLM local only | No cloud | UAV-VL-R1 INT8: 2.5GB, open source, fits on Jetson. SmolVLM2-500M: 1.8GB, ONNX available. | Feasible | **Confirmed** |
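
Running the VLM as a separate process implies a small IPC protocol between the detection service and the VLM worker. A minimal length-prefixed JSON framing over a stream socket could look like the following; message names and fields are illustrative assumptions, and in deployment the `socketpair` demo would be replaced by an `AF_UNIX` socket bound to a filesystem path:

```python
import json
import socket
import struct

# Length-prefixed JSON framing for the detection-process <-> VLM-process link.
def send_msg(sock: socket.socket, payload: dict) -> None:
    data = json.dumps(payload).encode()
    sock.sendall(struct.pack("!I", len(data)) + data)  # 4-byte big-endian length

def recv_msg(sock: socket.socket) -> dict:
    header = sock.recv(4, socket.MSG_WAITALL)
    (length,) = struct.unpack("!I", header)
    return json.loads(sock.recv(length, socket.MSG_WAITALL).decode())

# Demo over a socketpair standing in for the real Unix socket.
a, b = socket.socketpair()
send_msg(a, {"cmd": "analyze_roi", "roi_id": 7})
request = recv_msg(b)
send_msg(b, {"roi_id": request["roi_id"], "verdict": "likely concealed entrance"})
reply = recv_msg(a)
print(reply)
```

Explicit framing matters here because a stream socket gives no message boundaries, and the detection loop must never block on a half-read VLM reply.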

## Key Findings (Revised)

1. **YOLOE-26 open-vocabulary is the biggest opportunity.** Text-prompted and visual-prompted detection enables immediate zero-shot capability without custom training. Transition to custom-trained YOLO26 as data accumulates.
2. **Three-tier architecture is more realistic than two-tier.** Tier 1: YOLO26/YOLOE-26 real-time (≤100ms). Tier 2: Custom CNN confirmation (≤200ms). Tier 3: VLM deep analysis (≤5s, optional).
3. **UAV-VL-R1 is the best VLM candidate.** Purpose-built for aerial reasoning, 48% better than generic Qwen2-VL-2B, fits on Jetson at 2.5GB INT8. SmolVLM2-500M is a lighter fallback.
4. **VLM is valuable but supplementary.** Zero-shot bootstrapping, ambiguous case analysis, and operator-facing explanations. Not the primary detection mechanism.
5. **YOLO26 NMS-free design simplifies edge deployment.** No NMS tuning, more predictable latency, native TensorRT support. Instance segmentation mode ideal for footpaths.
6. **Phased approach reduces risk.** Start with YOLOE-26 text prompts (no training needed), then train custom YOLO26 models, then add VLM for edge cases.

## Sources

- Ultralytics YOLO26 docs: https://docs.ultralytics.com/models/yolo26/ (Jan 2026)
- YOLOE-26 paper: arXiv:2602.00168 (Feb 2026)
- YOLOE docs: https://v8docs.ultralytics.com/models/yoloe/ (2025)
- Ultralytics YOLO26 Jetson benchmarks: https://docs.ultralytics.com/guides/nvidia-jetson (2026)
- Cosmos-Reason2-2B on Jetson Orin Nano Super: 4.7 tok/s W4A16 (Embedl, Feb 2026)
- UAV-VL-R1: arXiv:2508.11196 (2025), 2.5GB INT8, open source
- SmolVLM2-500M: HuggingFace blog (Feb 2025), 1.8GB GPU RAM
- SmolVLM-256M: HuggingFace blog (Jan 2025), <1GB
- Moondream 0.5B: moondream.ai (Dec 2024), 816 MiB INT4
- Moondream 2B: moondream.ai (2025), refcoco 91.1
- UAV-YOLO12 road segmentation: F1=0.825, 11.1ms (MDPI Drones, 2025)
- ViewPro ViewLink Serial Protocol V3.3.3 (viewprotech.com)
- YOLO training best practices: ≥1500 images/class (Ultralytics docs)
- Open-Vocabulary Camouflaged Object Segmentation: arXiv:2506.19300 (2025)

# Question Decomposition

## Original Question

Assess solution_draft01.md for weak points, performance bottlenecks, and security issues, and produce a revised solution draft.

## Active Mode

Mode B — Solution Assessment of draft01

Rationale: solution_draft01.md exists in OUTPUT_DIR. Assessing and improving.

## Problem Context Summary

- Three-tier semantic detection (YOLOE-26 → Spatial Reasoning + CNN → VLM) on Jetson Orin Nano Super (8GB, 67 TOPS)
- Two-level camera scan (wide sweep → detailed investigation) with ViewPro A40 gimbal
- Integration with existing Cython+TRT YOLO detection service
- YOLOE-26 zero-shot bootstrapping → custom YOLO26 fine-tuning transition
- VLM (UAV-VL-R1) as separate process via Unix socket IPC
- Winter-first seasonal rollout

## Question Type

**Problem Diagnosis** — root cause analysis of weak points

Combined with **Decision Support** — weighing alternative solutions for identified issues

## Research Subject Boundary Definition

- **Population**: Edge AI semantic detection pipelines on Jetson-class hardware
- **Geography**: Deployment in Eastern European winter conditions (Ukraine conflict)
- **Timeframe**: 2025-2026 technology (YOLO26, YOLOE-26, VLMs, JetPack 6.2)
- **Level**: Single Jetson Orin Nano Super device (8GB unified memory, 67 TOPS INT8)

## Decomposed Sub-Questions

### Memory & Resource Contention

1. What is the actual GPU memory footprint of the YOLOE-26s-seg TRT engine + existing YOLO TRT engine + MobileNetV3-Small TRT + UAV-VL-R1 INT8 running in 8GB unified memory?
2. Can two TRT engines (existing YOLO + YOLOE-26) share the same GPU execution context, or do they need separate CUDA streams?
3. Is sequential VLM scheduling (pause YOLO → run VLM → resume) viable without dropping detection frames?

### YOLOE-26 Zero-Shot Accuracy

4. How well do YOLOE text prompts perform on out-of-distribution domains (military concealment vs COCO/LVIS training data)?
5. Are visual prompts (SAVPE) more reliable than text prompts for this domain? What are the reference image requirements?
6. What is the fallback if YOLOE-26 zero-shot produces unacceptable false positive rates?

### Path Tracing & Spatial Reasoning

7. How robust is morphological skeletonization on noisy aerial segmentation masks (partial paths, broken segments)?
8. What happens with dense path networks (villages, supply routes)? How to filter relevant paths?
9. Is a 128×128 ROI sufficient for endpoint classification, or does the CNN need more spatial context?
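
To make sub-questions 7 and 9 concrete: on a clean one-pixel-wide skeleton, a path endpoint is simply a skeleton pixel with exactly one 8-connected neighbour, and each endpoint is where the 128×128 ROI would be cropped for the Tier 2 classifier. A pure-numpy sketch under that clean-skeleton assumption (noisy masks, the subject of sub-question 7, would produce spurious endpoints):

```python
import numpy as np

def skeleton_endpoints(skel: np.ndarray) -> list:
    """Return (row, col) of skeleton pixels with exactly one 8-connected neighbour."""
    padded = np.pad(skel.astype(np.uint8), 1)
    endpoints = []
    for r, c in zip(*np.nonzero(padded)):
        # sum of the 3x3 window minus the centre pixel = neighbour count
        neighbours = padded[r - 1 : r + 2, c - 1 : c + 2].sum() - 1
        if neighbours == 1:
            endpoints.append((int(r) - 1, int(c) - 1))  # undo padding offset
    return endpoints

# A short horizontal path: endpoints at both ends.
skel = np.zeros((5, 7), dtype=np.uint8)
skel[2, 1:6] = 1
print(skeleton_endpoints(skel))  # [(2, 1), (2, 5)]
```

Branch points (three or more neighbours) fall out of the same neighbour count, which is what dense path networks in sub-question 8 would need to filter.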

### VLM Integration

10. What is the actual inference latency of a 2B-parameter VLM (INT8) on Jetson Orin Nano Super?
11. Is vLLM the right runtime for Jetson, or should we use TRT-LLM / llama.cpp / MLC-LLM?
12. What is the memory overhead of keeping a VLM loaded but idle vs loading on-demand?

### Gimbal Control & Scan Strategy

13. Is PID control sufficient for path-following, or do we need a more sophisticated controller (Kalman filter, predictive)?
14. What happens when the UAV itself is moving during the Level 2 detailed scan? How to compensate?
15. Is the POI queue strategy (20 max, 30s expiry) well-calibrated for typical mission profiles?
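
As a baseline answer to sub-question 13, a PID loop with anti-windup clamping (the approach the servopilot library takes) is easy to sketch and simulate. Gains, limits, and the toy integrator plant below are illustrative assumptions, not tuned values for the A40:

```python
class PID:
    """PID controller with anti-windup: the integral stops accumulating
    while the output is saturated."""

    def __init__(self, kp: float, ki: float, kd: float, out_limit: float):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.out_limit = out_limit
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, error: float, dt: float) -> float:
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        self.integral += error * dt
        output = self.kp * error + self.ki * self.integral + self.kd * derivative
        if abs(output) > self.out_limit:
            self.integral -= error * dt  # undo accumulation while saturated
            output = max(-self.out_limit, min(self.out_limit, output))
        return output

# Drive a gimbal-angle error toward a 10-degree target over simulated 20ms steps.
pid = PID(kp=2.0, ki=0.5, kd=0.05, out_limit=30.0)
angle, target = 0.0, 10.0
for _ in range(500):
    angle += pid.update(target - angle, dt=0.02) * 0.02
print(round(angle, 2))
```

What this sketch cannot answer is sub-question 14: PID on pixel error alone has no model of UAV ego-motion, which is where a Kalman filter (Source #34) earns its place.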

### Training Data Strategy

16. Is 1500 images/class realistic for military concealment data? What are actual annotation throughput estimates?
17. Can synthetic data augmentation (cut-paste, style transfer) meaningfully boost concealment detection training?
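
The arithmetic behind sub-question 16 is worth making explicit. The throughput figures below are assumptions for illustration (60s per manually annotated image, a 3x speedup from SAM-assisted labeling), not measured values:

```python
def annotation_hours(classes: int, images_per_class: int,
                     seconds_per_image: float, assisted_speedup: float = 1.0) -> float:
    """Total annotator hours, optionally discounted for assisted labeling."""
    total_images = classes * images_per_class
    return total_images * seconds_per_image / assisted_speedup / 3600

# 5 classes at the 1500 images/class guideline from Source #15.
manual = annotation_hours(classes=5, images_per_class=1500, seconds_per_image=60)
assisted = annotation_hours(classes=5, images_per_class=1500,
                            seconds_per_image=60, assisted_speedup=3.0)
print(round(manual), round(assisted))  # 125 42
```

Even the assisted figure exceeds what a single annotator produces casually, which is why the phased YOLOE-26-first strategy matters: zero-shot prompts buy time while the dataset grows.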

### Security

18. What adversarial attack vectors exist against edge-deployed YOLO models?
19. How to protect model weights and the inference pipeline on a physical device that could be captured?
20. What operational security measures are needed for the data pipeline (captured imagery, detection logs)?

## Timeliness Sensitivity Assessment

- **Research Topic**: Edge AI resource management, VLM deployment on Jetson, YOLOE accuracy assessment
- **Sensitivity Level**: Critical
- **Rationale**: Tools (vLLM, TRT-LLM, MLC-LLM for Jetson) are actively evolving. JetPack 6.2 is the latest release. YOLOE-26 is weeks old.
- **Source Time Window**: 6 months (Sep 2025 — Mar 2026)
- **Priority official sources**:
  1. NVIDIA Jetson AI Lab (memory/performance benchmarks)
  2. Ultralytics docs (YOLOE-26 accuracy, TRT export)
  3. vLLM / TRT-LLM / MLC-LLM Jetson compatibility docs
  4. TensorRT 10.x memory management documentation

# Source Registry

## Source #1

- **Title**: Ultralytics YOLO26 Documentation
- **Link**: https://docs.ultralytics.com/models/yolo26/
- **Tier**: L1
- **Publication Date**: 2026-01-14
- **Timeliness Status**: Currently valid
- **Version Info**: YOLO26, Ultralytics 8.4.x
- **Summary**: Official YOLO26 docs — NMS-free, edge-first, MuSGD optimizer, improved small object detection, instance segmentation with semantic loss.

## Source #2

- **Title**: YOLOE: Real-Time Seeing Anything — Ultralytics Docs
- **Link**: https://docs.ultralytics.com/models/yoloe/
- **Tier**: L1
- **Publication Date**: 2025-2026
- **Timeliness Status**: Currently valid
- **Version Info**: YOLOE, YOLOE-26 (yoloe-26n-seg.pt through yoloe-26x-seg.pt)
- **Summary**: Official YOLOE docs — open-vocabulary detection/segmentation, text/visual/prompt-free modes, RepRTA, SAVPE, LRPC, zero inference overhead when re-parameterized.

## Source #3

- **Title**: YOLOE-26 Paper
- **Link**: https://arxiv.org/abs/2602.00168
- **Tier**: L1
- **Publication Date**: 2026-02
- **Timeliness Status**: Currently valid
- **Summary**: Integration of YOLO26 with YOLOE for real-time open-vocabulary instance segmentation. NMS-free, end-to-end.

## Source #4

- **Title**: Ultralytics YOLO26 Jetson Benchmarks
- **Link**: https://docs.ultralytics.com/guides/nvidia-jetson
- **Tier**: L1
- **Publication Date**: 2026
- **Timeliness Status**: Currently valid
- **Version Info**: YOLO11 benchmarks on Jetson Orin Nano Super, TensorRT FP16
- **Summary**: YOLO11n TensorRT FP16 on Jetson Orin Nano Super: 6.93ms at 640px. YOLO11s: 13.50ms. YOLO11m: 17.48ms.

## Source #5

- **Title**: Cosmos-Reason2-2B on Jetson Orin Nano Super
- **Link**: https://www.thenextgentechinsider.com/pulse/cosmos-reason2-runs-on-jetson-orin-nano-super-with-w4a16-quantization
- **Tier**: L2
- **Publication Date**: 2026-02
- **Timeliness Status**: Currently valid
- **Summary**: 4.7 tok/s on Jetson Orin Nano Super with W4A16 quantization.

## Source #6

- **Title**: UAV-VL-R1 Paper
- **Link**: https://arxiv.org/pdf/2508.11196
- **Tier**: L1
- **Publication Date**: 2025
- **Timeliness Status**: Currently valid
- **Summary**: Lightweight VLM for aerial reasoning. 48% better zero-shot than Qwen2-VL-2B. 2.5GB INT8, 3.9GB FP16. Open source.

## Source #7

- **Title**: SmolVLM 256M & 500M Blog
- **Link**: https://huggingface.co/blog/smolervlm
- **Tier**: L1
- **Publication Date**: 2025-01
- **Timeliness Status**: Currently valid
- **Summary**: SmolVLM-500M: 1.8GB GPU RAM, ONNX/WebGPU support, 93M SigLIP vision encoder.

## Source #8

- **Title**: Moondream 0.5B Blog
- **Link**: https://moondream.ai/blog/introducing-moondream-0-5b
- **Tier**: L1
- **Publication Date**: 2024-12
- **Timeliness Status**: Currently valid
- **Summary**: 500M params, 816 MiB INT4, detect()/point() APIs, Raspberry Pi compatible.

## Source #9

- **Title**: ViewPro ViewLink Serial Protocol V3.3.3
- **Link**: https://www.viewprotech.com/index.php?ac=article&at=read&did=510
- **Tier**: L1
- **Publication Date**: 2024
- **Timeliness Status**: Currently valid
- **Summary**: Serial command protocol for ViewPro gimbal cameras. UART 115200.

## Source #10

- **Title**: ArduPilot ViewPro Gimbal Integration
- **Link**: https://ardupilot.org/copter/docs/common-viewpro-gimbal.html
- **Tier**: L1
- **Publication Date**: 2025
- **Version Info**: ArduPilot 4.5+
- **Summary**: MNT1_TYPE=11 (Viewpro), SERIAL2_PROTOCOL=8, TTL serial, MAVLink 10Hz.

## Source #11

- **Title**: UAV-YOLO12 Road Segmentation
- **Link**: https://www.mdpi.com/2072-4292/17/9/1539
- **Tier**: L1
- **Publication Date**: 2025
- **Summary**: F1=0.825 for paths from UAV imagery. 11.1ms inference. SKNet + PConv modules.

## Source #12

- **Title**: FootpathSeg GitHub
- **Link**: https://github.com/WennyXY/FootpathSeg
- **Tier**: L3
- **Publication Date**: 2025
- **Summary**: DINO-MC pre-training + UNet fine-tuning for footpath segmentation. GIS layer generation.

## Source #13

- **Title**: Herbivore Trail Segmentation (UNet+MambaOut)
- **Link**: https://arxiv.org/pdf/2504.12121
- **Tier**: L1
- **Publication Date**: 2025-04
- **Summary**: UNet+MambaOut achieves best accuracy for trail detection from aerial photographs.

## Source #14

- **Title**: Open-Vocabulary Camouflaged Object Segmentation
- **Link**: https://arxiv.org/html/2506.19300v1
- **Tier**: L1
- **Publication Date**: 2025
- **Summary**: VLM + SAM cascaded approach for camouflage detection. VLM-derived features as prompts to SAM.

## Source #15

- **Title**: YOLO Training Best Practices
- **Link**: https://docs.ultralytics.com/yolov5/tutorials/tips_for_best_training_results
- **Tier**: L1
- **Publication Date**: 2025
- **Summary**: ≥1500 images/class, ≥10,000 instances/class. 0-10% background images. Pretrained weights recommended.

## Source #16

- **Title**: Jetson AI Lab LLM/VLM Benchmarks
- **Link**: https://www.jetson-ai-lab.com/tutorials/genai-benchmarking/
- **Tier**: L1
- **Publication Date**: 2025-2026
- **Summary**: Llama-3.1-8B W4A16 on Jetson Orin Nano Super: 44.19 tok/s output, 32ms TTFT. vLLM as inference engine.

## Source #17

- **Title**: servopilot Python Library
- **Link**: https://pypi.org/project/servopilot/
- **Tier**: L3
- **Publication Date**: 2025
- **Summary**: Anti-windup PID controller for gimbal control. Dual-axis support. Zero dependencies.

## Source #18

- **Title**: Multi-Model AI Resource Allocation for Humanoid Robots: A Survey on Jetson Orin Nano Super
- **Link**: https://dev.to/ankk98/multi-model-ai-resource-allocation-for-humanoid-robots-a-survey-on-jetson-orin-nano-super-310i
- **Tier**: L3
- **Publication Date**: 2025
- **Summary**: Running VLA + YOLO concurrently on Orin Nano Super is "mostly theoretical". GPU sharing causes 10-40% latency jitter. Needs lighter edge-optimized models.

## Source #19

- **Title**: TensorRT Multiple Engines on Single GPU
- **Link**: https://github.com/NVIDIA/TensorRT/issues/4358
- **Tier**: L2
- **Publication Date**: 2025
- **Summary**: NVIDIA recommends a single engine with async CUDA streams over multiple separate engines. CUDA context push/pop needed for multiple engines.

## Source #20

- **Title**: TensorRT High Memory Usage on Jetson Orin Nano (Ultralytics)
- **Link**: https://github.com/ultralytics/ultralytics/issues/21562
- **Tier**: L2
- **Publication Date**: 2025
- **Summary**: YOLOv8-OBB TRT engine consumes ~2.6GB on Jetson Orin Nano. cuDNN/CUDA binary loading adds ~940MB-1.1GB overhead per engine.

## Source #21

- **Title**: NVIDIA Forum: Jetson Orin Nano Super Insufficient GPU Memory
- **Link**: https://forums.developer.nvidia.com/t/jetson-orin-nano-super-insufficient-gpu-memory/330777
- **Tier**: L2
- **Publication Date**: 2025-04
- **Summary**: Orin Nano Super shows 3.7GB/7.6GB free GPU memory after OS. Even a 1.5B Q4 model fails to load due to KV cache buffer requirements (model weight 876MB + temp buffer 10.7GB needed).

## Source #22

- **Title**: YOLO26 TensorRT Confidence Misalignment on Jetson
- **Link**: https://www.hackster.io/qwe018931/pushing-limits-yolov8-vs-v26-on-jetson-orin-nano-b89267
- **Tier**: L2
- **Publication Date**: 2026
- **Summary**: YOLO26 exhibits bounding box drift and inaccurate confidence scores when converted to TRT for C++ deployment on Jetson. YOLOv8 works fine. Architecture-specific export issue.

## Source #23

- **Title**: YOLO26 INT8 TensorRT Export Fails on Jetson Orin (Ultralytics Issue #23841)
- **Link**: https://github.com/ultralytics/ultralytics/issues/23841
- **Tier**: L2
- **Publication Date**: 2026
- **Summary**: YOLO26n INT8 TRT export fails with a checkLinks error during calibration on Jetson Orin with TensorRT 10.3.0 / JetPack 6.

## Source #24

- **Title**: PatchBlock: Lightweight Defense Against Adversarial Patches for Edge AI
- **Link**: https://arxiv.org/abs/2601.00367
- **Tier**: L1
- **Publication Date**: 2026-01
- **Summary**: CPU-based preprocessing module recovers up to 77% model accuracy under adversarial patch attacks. Minimal clean accuracy loss. Suitable for edge deployment.

## Source #25

- **Title**: Qrypt Quantum-Secure Encryption for NVIDIA Jetson Edge AI
- **Link**: https://thequantuminsider.com/2026/03/12/qrypt-quantum-secure-encryption-nvidia-jetson-edge-ai/
- **Tier**: L2
- **Publication Date**: 2026-03
- **Summary**: BLAST encryption protocol for Jetson Orin Nano and Thor. Quantum-secure end-to-end encryption, independent key generation.

## Source #26

- **Title**: Adversarial Patch Attacks on YOLO Edge Deployment (Springer)
- **Link**: https://link.springer.com/article/10.1007/s10207-025-01067-3
- **Tier**: L1
- **Publication Date**: 2025
- **Summary**: Smaller YOLO models on edge devices are more vulnerable to adversarial attacks. Trade-off between latency and security.

## Source #27

- **Title**: Synthetic Data for Military Camouflaged Object Detection (IEEE)
- **Link**: https://ieeexplore.ieee.org/document/10660900/
- **Tier**: L1
- **Publication Date**: 2024
- **Summary**: Synthetic data generation approach for military camouflage detection training.

## Source #28

- **Title**: GenCAMO: Environment-Aware Camouflage Image Generation
- **Link**: https://arxiv.org/abs/2601.01181
- **Tier**: L1
- **Publication Date**: 2026-01
- **Summary**: Scene graph + generative models for synthetic camouflage data with multi-modal annotations. Improves complex scene detection.

## Source #29

- **Title**: Camouflage Anything (CVPR 2025)
- **Link**: https://openaccess.thecvf.com/content/CVPR2025/html/Das_Camouflage_Anything_...
- **Tier**: L1
- **Publication Date**: 2025
- **Summary**: Controlled out-painting for realistic camouflage dataset generation. CamOT metric. Improves detection baselines when used for fine-tuning.

## Source #30

- **Title**: YOLOE Visual+Text Multimodal Fusion PR (Ultralytics)
- **Link**: https://github.com/ultralytics/ultralytics/pull/21966
- **Tier**: L2
- **Publication Date**: 2025
- **Summary**: Multimodal fusion of text + visual prompts for YOLOE. Concat mode (zero overhead) and weighted-sum mode (fuse_alpha). Merged into Ultralytics.

## Source #31

- **Title**: Learnable Morphological Skeleton for Remote Sensing (IEEE TGRS 2025)
- **Link**: https://ui.adsabs.harvard.edu/abs/2025ITGRS..63S1458X
- **Tier**: L1
- **Publication Date**: 2025
- **Summary**: Learnable morphological skeleton priors integrated into SAM for slender object segmentation. Addresses downsampling information loss.

## Source #32

- **Title**: GraphMorph: Topologically Accurate Tubular Structure Extraction
- **Link**: https://arxiv.org/pdf/2502.11731
- **Tier**: L1
- **Publication Date**: 2025
- **Summary**: Branch-level graph decoder + SkeletonDijkstra for centerline extraction. Reduces false positives vs pixel-level segmentation.

## Source #33

- **Title**: UAV Gimbal PID Control for Camera Stabilization (IEEE 2024)
- **Link**: https://ieeexplore.ieee.org/document/10569310/
- **Tier**: L1
- **Publication Date**: 2024
- **Summary**: PID controllers applied in gimbal construction for stabilization and tracking.

## Source #34

- **Title**: Kalman Filter Steady Aiming for UAV Gimbal (IEEE)
- **Link**: https://ieeexplore.ieee.org/ielx7/6287639/10005208/10160027.pdf
- **Tier**: L1
- **Publication Date**: 2023
- **Summary**: Kalman filter + coordinate transformation eliminates attitude and mounting errors in UAV gimbal. Better accuracy than PID alone during flight.

## Source #35

- **Title**: vLLM on Jetson Orin Nano Deployment Guide
- **Link**: https://learnopencv.com/deployment-on-edge-vllm-on-jetson/
- **Tier**: L2
- **Publication Date**: 2026
- **Summary**: vLLM can run 2B models on Orin Nano 8GB. Shared memory must be increased to 8GB. Memory management critical.

## Source #36

- **Title**: Jetson Orin Nano LLM Bottleneck Analysis
- **Link**: https://ericxliu.me/posts/benchmarking-llms-on-jetson-orin-nano/
- **Tier**: L2
- **Publication Date**: 2025
- **Summary**: Bottleneck is memory bandwidth (68 GB/s), not compute. Only 5.2GB usable VRAM after OS overhead. 40 TOPS largely underutilized for LLM inference.

## Source #37

- **Title**: TRT-LLM: No Edge Device Support Statement
- **Link**: https://github.com/NVIDIA/TensorRT-LLM/issues/7978
- **Tier**: L1
- **Publication Date**: 2025
- **Summary**: TensorRT-LLM developers explicitly state they do not aim to support edge devices/platforms.

## Source #38

- **Title**: Qwen3-VL-2B on Orin Nano Super (NVIDIA Forum)
- **Link**: https://forums.developer.nvidia.com/t/performance-inquiry-optimizing-qwen3-vl-2b-inference-for-2-qps-target-on-orin-nano-super/359639
- **Tier**: L2
- **Publication Date**: 2026
- **Summary**: Performance inquiry for Qwen3-VL-2B targeting 2 QPS on Orin Nano Super. Indicates active community attempts to deploy 2B VLMs on this hardware.

# Fact Cards

## Fact #1

- **Statement**: Jetson Orin Nano Super has 7.6GB total unified memory, but only ~3.7GB free GPU memory after OS/system overhead in a Docker container.
- **Source**: Source #21
- **Phase**: Assessment
- **Target Audience**: Edge AI multi-model deployment on Orin Nano Super
- **Confidence**: ✅ High
- **Related Dimension**: Memory contention

## Fact #2

- **Statement**: A single TensorRT engine (YOLOv8-OBB) consumes ~2.6GB on Jetson Orin Nano. cuDNN/CUDA binary loading adds ~940MB-1.1GB overhead per engine initialization.
- **Source**: Source #20
- **Phase**: Assessment
- **Target Audience**: TRT multi-engine memory planning
- **Confidence**: ✅ High
- **Related Dimension**: Memory contention
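
Facts #1 and #2 together pin down the memory arithmetic. A rough sanity check, with component figures echoed from the facts above and the per-engine overhead approximated at 1.0GB (component names are illustrative):

```python
# Worst-case concurrent residency vs free GPU memory after OS overhead.
FREE_GPU_GB = 3.7  # Fact #1

components_gb = {
    "yolo26_trt_engine": 0.3,
    "cuda_cudnn_overhead": 1.0,   # Fact #2: per-engine binary loading
    "cnn_classifier": 0.1,
    "uav_vl_r1_int8": 2.5,
}

total = sum(components_gb.values())
concurrent_ok = total <= FREE_GPU_GB
print(f"planned {total:.1f} GB vs {FREE_GPU_GB} GB free -> ok={concurrent_ok}")
```

The plan already overshoots the free budget before counting the VLM's KV cache, which supports the sequential-scheduling and load-on-demand conclusions in Facts #3 and #4 below.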

## Fact #3

- **Statement**: Running VLA + YOLO detection concurrently on Orin Nano Super is described as "mostly theoretical" in 2025 surveys. GPU sharing causes 10-40% latency jitter.
- **Source**: Source #18
- **Phase**: Assessment
- **Target Audience**: Multi-model concurrent inference
- **Confidence**: ⚠️ Medium (survey, not primary benchmark)
- **Related Dimension**: Memory contention, performance

## Fact #4

- **Statement**: NVIDIA recommends using a single TRT engine with async CUDA streams over multiple separate engines for GPU efficiency. Multiple engines need CUDA context push/pop management.
- **Source**: Source #19
- **Phase**: Assessment
- **Target Audience**: TRT engine management
- **Confidence**: ✅ High
- **Related Dimension**: Memory contention, architecture

## Fact #5

- **Statement**: YOLO26 exhibits bounding box drift and inaccurate confidence scores when deployed via TensorRT on Jetson Orin Nano in C++. This is an architecture-specific export issue not present in YOLOv8.
- **Source**: Source #22
- **Phase**: Assessment
- **Target Audience**: YOLO26/YOLOE-26 TRT deployment
- **Confidence**: ✅ High
- **Related Dimension**: YOLOE-26 viability, deployment risk

## Fact #6

- **Statement**: YOLO26n INT8 TensorRT export fails during calibration graph optimization on Jetson Orin with TensorRT 10.3.0 / JetPack 6. ONNX export succeeds but the TRT build crashes.
- **Source**: Source #23
- **Phase**: Assessment
- **Target Audience**: YOLO26 edge deployment
- **Confidence**: ✅ High
- **Related Dimension**: YOLOE-26 viability, deployment risk

## Fact #7

- **Statement**: YOLOE supports multimodal fusion of text + visual prompts with two modes: concat (zero overhead) and weighted-sum (fuse_alpha). This can improve robustness over text-only or visual-only prompts.
- **Source**: Source #30
- **Phase**: Assessment
- **Target Audience**: YOLOE prompt strategy
- **Confidence**: ✅ High
- **Related Dimension**: YOLOE-26 accuracy
|
||||
|
||||
## Fact #8
|
||||
- **Statement**: YOLOE text prompts are trained on LVIS (1203 categories) and COCO. Military concealment classes (dugouts, branch camouflage, FPV hideouts) are far out-of-distribution from training data. No published benchmarks for this domain.
|
||||
- **Source**: Sources #2, #3 (inferred from training data descriptions)
|
||||
- **Phase**: Assessment
|
||||
- **Target Audience**: YOLOE-26 zero-shot accuracy
|
||||
- **Confidence**: ⚠️ Medium (inference from known training data)
|
||||
- **Related Dimension**: YOLOE-26 accuracy
|
||||
|
||||
## Fact #9
|
||||
- **Statement**: Smaller YOLO models (commonly used on edge devices) are more vulnerable to adversarial patch attacks than larger counterparts, creating a latency-security trade-off.
|
||||
- **Source**: Source #26
|
||||
- **Phase**: Assessment
|
||||
- **Target Audience**: Edge AI security
|
||||
- **Confidence**: ✅ High
|
||||
- **Related Dimension**: Security
|
||||
|
||||
## Fact #10
|
||||
- **Statement**: PatchBlock is a lightweight CPU-based preprocessing module that recovers up to 77% of model accuracy under adversarial patch attacks with minimal clean accuracy loss.
|
||||
- **Source**: Source #24
|
||||
- **Phase**: Assessment
|
||||
- **Target Audience**: Edge AI adversarial defense
|
||||
- **Confidence**: ✅ High
|
||||
- **Related Dimension**: Security
|
||||
|
||||
## Fact #11

- **Statement**: TensorRT-LLM developers explicitly stated that they "do not aim to support models on edge devices/platforms" when asked about VLM support on the Orin NX.
- **Source**: Source #37
- **Phase**: Assessment
- **Target Audience**: VLM runtime selection
- **Confidence**: ✅ High
- **Related Dimension**: VLM integration

## Fact #12

- **Statement**: vLLM can deploy 2B models on the Jetson Orin Nano 8GB. Shared memory must be increased to 8GB and memory management is critical. The bottleneck is memory bandwidth (68 GB/s), not compute (67 TOPS).
- **Source**: Sources #35, #36
- **Phase**: Assessment
- **Target Audience**: VLM runtime on Jetson
- **Confidence**: ✅ High
- **Related Dimension**: VLM integration

## Fact #13

- **Statement**: Cosmos-Reason2-2B achieves 4.7 tok/s on the Jetson Orin Nano Super with W4A16 quantization. Llama-3.1-8B W4A16 achieves 44.19 tok/s (text-only). VLMs are significantly slower due to vision-encoder overhead.
- **Source**: Sources #5, #16
- **Phase**: Assessment
- **Target Audience**: VLM inference speed estimation
- **Confidence**: ✅ High
- **Related Dimension**: VLM integration, performance

## Fact #14

- **Statement**: A 1.5B Q4 model on the Jetson Orin Nano Super failed to load because the KV-cache temp buffer required 10.7GB while only 6.5GB was available. The model weights alone were only 876MB.
- **Source**: Source #21
- **Phase**: Assessment
- **Target Audience**: VLM memory management
- **Confidence**: ✅ High
- **Related Dimension**: Memory contention, VLM integration

## Fact #15

- **Statement**: Morphological skeletonization suffers from noise-induced boundary variations that cause spurious skeletal branches. Recent methods (2025) use scale-space hierarchical simplification for controllable robustness.
- **Source**: Source #31 (related search results)
- **Phase**: Assessment
- **Target Audience**: Path tracing robustness
- **Confidence**: ✅ High
- **Related Dimension**: Path tracing

## Fact #16

- **Statement**: GraphMorph (2025) operates at the branch level using a Graph Decoder + SkeletonDijkstra, producing topology-aware centerline masks. It reduces false positives versus pixel-level segmentation approaches.
- **Source**: Source #32
- **Phase**: Assessment
- **Target Audience**: Path extraction algorithms
- **Confidence**: ✅ High
- **Related Dimension**: Path tracing

## Fact #17

- **Statement**: Kalman filtering + coordinate transformation in UAV gimbal systems eliminates attitude and mounting errors that PID controllers alone cannot compensate for during flight.
- **Source**: Source #34
- **Phase**: Assessment
- **Target Audience**: Gimbal control algorithm
- **Confidence**: ✅ High
- **Related Dimension**: Gimbal control

## Fact #18

- **Statement**: Synthetic data generation for camouflage detection is a validated approach: GenCAMO (2026) uses scene graphs + generative models; CamouflageAnything (CVPR 2025) uses controlled out-painting. Both improve detection baselines.
- **Source**: Sources #28, #29
- **Phase**: Assessment
- **Target Audience**: Training data strategy
- **Confidence**: ✅ High
- **Related Dimension**: Training data

## Fact #19

- **Statement**: Usable VRAM on the Jetson Orin Nano Super is approximately 5.2GB after OS overhead (not the advertised 8GB). The 8GB is shared between CPU and GPU.
- **Source**: Source #36
- **Phase**: Assessment
- **Target Audience**: Memory budget planning
- **Confidence**: ✅ High
- **Related Dimension**: Memory contention

## Fact #20

- **Statement**: FP8 quantization for Qwen2-VL-2B performs worse than FP16 on vLLM. INT8/W4A16 are the recommended quantization formats for 2B VLMs on constrained hardware.
- **Source**: vLLM Issue #9992
- **Phase**: Assessment
- **Target Audience**: VLM quantization strategy
- **Confidence**: ✅ High
- **Related Dimension**: VLM integration

@@ -0,0 +1,28 @@

# Comparison Framework

## Selected Framework Type

Problem Diagnosis + Decision Support

## Selected Dimensions

1. Memory Budget Feasibility
2. YOLO26/YOLOE-26 TRT Deployment Stability
3. YOLOE-26 Zero-Shot Accuracy for Domain
4. Path Tracing Algorithm Robustness
5. VLM Runtime & Integration Viability
6. Gimbal Control Adequacy
7. Training Data Realism
8. Security & Adversarial Resilience

## Initial Population

| Dimension | Draft01 Assumption | Researched Reality | Risk Level | Factual Basis |
|-----------|-------------------|-------------------|------------|---------------|
| Memory Budget | YOLO + YOLOE-26 + CNN + VLM coexist on 8GB | Only ~5.2GB usable VRAM. A single YOLO TRT engine is ~2.6GB; two engines + CNN ≈ 5-6GB. No room for a VLM simultaneously. | **CRITICAL** | Fact #1, #2, #3, #14, #19 |
| YOLO26 TRT Stability | YOLO26-Seg TRT export assumed working | YOLO26 has confirmed confidence misalignment in TRT C++ and INT8 export crashes on Jetson. Active bugs remain unfixed. | **HIGH** | Fact #5, #6 |
| YOLOE-26 Zero-Shot | Text prompts "footpath", "branch pile" assumed effective | Trained on LVIS/COCO; military concealment is far OOD. No published domain benchmarks. Generic prompts may work for "footpath" but not "dugout" or "camouflage netting". | **HIGH** | Fact #7, #8 |
| Path Tracing | Zhang-Suen skeletonization assumed robust | Classical skeletonization is noise-sensitive — spurious branches from noisy segmentation masks. GraphMorph/learnable skeletons are more robust alternatives. | **MEDIUM** | Fact #15, #16 |
| VLM Runtime | vLLM or TRT-LLM assumed viable | TRT-LLM explicitly does not support edge devices. vLLM works but requires careful memory management. The VLM cannot run concurrently with YOLO — it must be unloaded/reloaded. | **HIGH** | Fact #11, #12, #14 |
| VLM Speed | UAV-VL-R1 ≤5s assumed | Cosmos-Reason2-2B: 4.7 tok/s on Orin Nano Super. A 50-100 token response takes 10-21s, significantly exceeding the 5s target. | **HIGH** | Fact #13 |
| Gimbal Control | PID assumed sufficient | PID works for a stationary UAV. During flight a Kalman filter is needed to compensate attitude/mounting errors; PID alone causes drift. | **MEDIUM** | Fact #17 |
| Training Data | 1500 images/class in 8 weeks assumed | Realistic for generic objects; challenging for military concealment (access, annotation complexity). Synthetic augmentation (GenCAMO, CamouflageAnything) can significantly help. | **MEDIUM** | Fact #18 |
| Security | No security measures in draft01 | Small edge YOLO models are more vulnerable to adversarial patches. Physical device capture risks model weights and logs. The PatchBlock defense is available. | **HIGH** | Fact #9, #10 |

@@ -0,0 +1,127 @@

# Reasoning Chain

## Dimension 1: Memory Budget Feasibility

### Fact Confirmation

The Jetson Orin Nano Super has 8GB of unified (CPU+GPU shared) memory. After OS overhead, only ~5.2GB is usable for GPU workloads (Fact #19). A single YOLO TRT engine consumes ~2.6GB including cuDNN/CUDA overhead (Fact #2). Free GPU memory inside a Docker container is ~3.7GB (Fact #1).

### Reference Comparison

Draft01 assumes the existing YOLO TRT engine, a YOLOE-26 TRT engine, a MobileNetV3-Small TRT engine, and UAV-VL-R1 are all running or ready. In reality, two TRT engines alone likely consume 4-5GB (2 × ~2.5GB per engine, minus roughly 1GB of shared CUDA overhead). The VLM (UAV-VL-R1 INT8 = 2.5GB) cannot fit alongside both YOLO engines.

### Conclusion

**The draft01 architecture is memory-infeasible as designed.** Two YOLO TRT engines plus the CNN already saturate available memory, and the VLM cannot run concurrently. Solutions: (A) time-multiplex — unload the YOLO engines before loading the VLM, (B) use a single merged TRT engine in which YOLOE-26 is re-parameterized into the existing YOLO pipeline, (C) offload the VLM to a companion device or the cloud.
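The arithmetic behind this conclusion can be made explicit. The sketch below is a pure-Python feasibility check using the approximate figures from the fact cards (~5.2GB usable VRAM, ~2.6GB per YOLO TRT engine); the component names and sizes are planning assumptions, not measurements of this exact stack.

```python
# Rough memory-budget check for the Orin Nano Super pipeline.
# Figures are the approximate values from the fact cards; treat them
# as planning numbers, not guarantees.

USABLE_VRAM_GB = 5.2  # Fact #19: usable VRAM after OS overhead

def fits(components: dict, budget: float = USABLE_VRAM_GB):
    """Return (fits?, total GB) for a set of simultaneously resident models."""
    total = sum(components.values())
    return total <= budget, round(total, 2)

draft01 = {            # everything resident at once, as draft01 assumes
    "yolo_trt": 2.6,   # Fact #2
    "yoloe_trt": 2.6,  # assumed comparable to the YOLO engine
    "cnn_trt": 0.05,   # MobileNetV3-Small engine (assumed)
    "vlm_int8": 2.5,   # UAV-VL-R1 INT8
}
revised = {            # option (B): single merged engine, VLM demand-loaded
    "merged_trt": 3.5,
    "cnn_trt": 0.05,
}

print(fits(draft01))   # (False, 7.75) — draft01 blows the budget
print(fits(revised))   # (True, 3.55) — headroom for the demand-loaded VLM
```

Even granting generous sharing of CUDA overhead, the draft01 stack overshoots the usable budget by more than 2GB, which is why options (A)-(C) all reduce to "only one large model resident at a time."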

### Confidence

✅ High — multiple sources confirm the memory constraints

---

## Dimension 2: YOLO26/YOLOE-26 TRT Deployment Stability

### Fact Confirmation

YOLO26 has confirmed bugs when deployed via TensorRT on Jetson: confidence misalignment in C++ (Fact #5) and an INT8 export crash (Fact #6). These are architecture-specific issues in YOLO26's end-to-end NMS-free design when converted through the ONNX→TRT pipeline.

### Reference Comparison

Draft01 assumes YOLOE-26 TRT deployment works. YOLOE-26 inherits YOLO26's architecture, so if YOLO26 has TRT issues, YOLOE-26 likely inherits them. YOLOv8 TRT deployment, by contrast, is stable and proven on Jetson.

### Conclusion

**YOLOE-26 TRT deployment on Jetson is a high-risk path.** Mitigations: (A) use YOLOE-v8-seg (YOLOE built on the YOLOv8 backbone) instead of YOLOE-26-seg for initial deployment — proven TRT stability, (B) use the Python Ultralytics predict() API instead of C++ TRT initially (avoids the confidence issue but is slower), (C) monitor the Ultralytics issue tracker for YOLO26 TRT fixes before transitioning.

### Confidence

✅ High — the bugs are documented with issue numbers

---

## Dimension 3: YOLOE-26 Zero-Shot Accuracy for Domain

### Fact Confirmation

YOLOE is trained on LVIS (1203 categories) and COCO data (Fact #8). Military concealment classes are not in LVIS/COCO. Generic concepts such as "footpath" and "road" are in-distribution and should work; domain-specific concepts such as "dugout", "FPV hideout", and "camouflage netting" are far out-of-distribution.

### Reference Comparison

Draft01 relies heavily on YOLOE-26 text prompts as the bootstrapping mechanism. If text prompts fail for concealment-specific classes, the entire zero-shot Phase 1 is compromised. However, visual prompts (SAVPE) and multimodal fusion (Fact #7) offer a stronger alternative for domain-specific detection.

### Conclusion

**Text prompts alone are unreliable for concealment classes.** Solutions: (A) prioritize visual prompts (SAVPE) using the semantic01-04.png reference images — visual-similarity detection is less dependent on the LVIS vocabulary, (B) use multimodal fusion (text + visual) for robustness, (C) use generic text prompts only for in-distribution classes ("footpath", "road", "trail") and visual prompts for concealment-specific patterns, (D) measure zero-shot recall in the first week and keep a fallback to heuristic detectors.

### Confidence

⚠️ Medium — no direct benchmarks for this domain; reasoning from the training-data distribution

---

## Dimension 4: Path Tracing Algorithm Robustness

### Fact Confirmation

Classical Zhang-Suen skeletonization is sensitive to boundary noise — small perturbations create spurious skeletal branches (Fact #15). Aerial segmentation masks from YOLOE-26 will contain noise, partial segments, and broken paths. GraphMorph (2025) provides topology-aware centerline extraction with fewer false positives (Fact #16).

### Reference Comparison

Draft01 uses basic skeletonization plus hit-or-miss endpoint detection. This works on clean synthetic masks but will fail on real-world noisy masks with multiple branches, gaps, and artifacts.

### Conclusion

**Add morphological preprocessing before skeletonization.** Solutions: (A) apply Gaussian blur + binary threshold + morphological closing to clean segmentation masks before skeletonization, (B) prune short skeleton branches (below a length threshold) to remove noise-induced artifacts, (C) consider GraphMorph as a more robust alternative if simple pruning is insufficient, (D) increase the ROI crop from 128×128 to 256×256 for more spatial context in CNN classification.
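Step (B) — branch pruning — is the part most often skipped, so a minimal sketch follows. The skeleton is represented as a set of (row, col) pixels with 8-connectivity; in a real pipeline it would come from e.g. skimage.morphology.skeletonize, and the threshold and toy data here are purely illustrative.

```python
# Minimal branch-pruning sketch: walk from each endpoint toward the nearest
# junction and delete the branch if it is shorter than min_len pixels.

def neighbors(p, skel):
    """8-connected neighbors of pixel p that are on the skeleton."""
    r, c = p
    return [(r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0) and (r + dr, c + dc) in skel]

def prune(skel, min_len):
    """Remove endpoint branches shorter than min_len pixels."""
    skel = set(skel)
    changed = True
    while changed:
        changed = False
        for p in [q for q in skel if len(neighbors(q, skel)) == 1]:  # endpoints
            branch, cur = [p], p
            while True:
                # extend along degree-<=2 pixels; stop at a junction
                nxt = [n for n in neighbors(cur, skel)
                       if n not in branch and len(neighbors(n, skel)) <= 2]
                if not nxt:
                    break
                cur = nxt[0]
                branch.append(cur)
            if len(branch) < min_len:          # a noise spur: remove it
                skel -= set(branch)
                changed = True
                break
    return skel

main = {(5, c) for c in range(10)}             # a clean 10-pixel path
spur = {(3, 4), (4, 4)}                        # a 2-pixel noise branch
cleaned = prune(main | spur, min_len=3)
print(main <= cleaned, (3, 4) in cleaned)      # True False
```

The path survives while the spur tip is deleted; note that the spur pixel abutting the junction can remain, so production code would follow up with a hit-or-miss spur-removal pass.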

### Confidence

✅ High — skeletonization noise sensitivity is well-documented

---

## Dimension 5: VLM Runtime & Integration Viability

### Fact Confirmation

TensorRT-LLM explicitly does not support edge devices (Fact #11). vLLM works on Jetson but requires careful memory management (Fact #12). VLM inference speed for 2B models is low: Cosmos-Reason2-2B achieves only 4.7 tok/s (Fact #13), so a 50-100 token response takes 10-21 seconds — far exceeding the 5s target. A 1.5B model failed to load because its KV-cache buffer requirements exceeded available memory (Fact #14).

### Reference Comparison

Draft01 assumes UAV-VL-R1 runs in ≤5s via vLLM or TRT-LLM. In reality, TRT-LLM is not an option; vLLM can work, but inference will take 10-20s, not 5s, and the VLM cannot coexist with the YOLO engines in memory.

### Conclusion

**The VLM tier needs a fundamental redesign.** Solutions: (A) accept 10-20s VLM latency — change from "optional real-time" to a "background analysis" mode in which VLM results arrive after the operator has moved on, (B) use SmolVLM-500M (1.8GB) or Moondream-0.5B (816 MiB INT4) instead of UAV-VL-R1 for faster inference and a smaller footprint, (C) implement the VLM as demand-loaded: unload all TRT engines → load the VLM → infer → unload the VLM → reload the TRT engines (adds 2-3s per switch but guarantees the memory fits), (D) defer the VLM to a ground station connected via datalink if the latency is acceptable.
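Option (C)'s swap discipline and the latency arithmetic can be sketched together. Everything below is illustrative — the slot abstraction, the 5s swap overhead, and the loader callables are assumptions, not a real runtime API; only the 4.7 tok/s rate comes from Fact #13.

```python
# "One large model resident at a time" enforced by a single slot, plus the
# wall-clock estimate that rules out the original 5 s VLM target.

TOK_PER_S = 4.7          # measured Cosmos-Reason2-2B rate (Fact #13)

def vlm_response_seconds(n_tokens: int, swap_overhead_s: float = 5.0) -> float:
    """Estimated wall-clock time: engine swap both ways + token generation."""
    return round(swap_overhead_s + n_tokens / TOK_PER_S, 1)

class ModelSlot:
    """Holds at most one model; swap() enforces unload-before-load."""
    def __init__(self):
        self.loaded = None

    def swap(self, name, loader, unloader):
        if self.loaded is not None:
            unloader(self.loaded)   # free VRAM before the next engine
        loader(name)
        self.loaded = name

events = []
slot = ModelSlot()
slot.swap("yolo_trt", events.append, lambda m: events.append("unload:" + m))
slot.swap("vlm", events.append, lambda m: events.append("unload:" + m))
print(events)                      # ['yolo_trt', 'unload:yolo_trt', 'vlm']
print(vlm_response_seconds(100))   # 26.3 — a 100-token answer, swaps included
```

Even a short 50-token answer lands around 15-16s with swap overhead, which is why option (A) reframes the VLM as background analysis rather than an interactive tier.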

### Confidence

✅ High — benchmarks directly confirm the constraints

---

## Dimension 6: Gimbal Control Adequacy

### Fact Confirmation

PID controllers handle gimbal stabilization when the platform is relatively stable, but during UAV flight, attitude changes and mounting errors cause drift that PID alone cannot compensate. A Kalman filter plus coordinate transformation is proven to eliminate these errors (Fact #17).

### Reference Comparison

Draft01 uses PID-only control. This works if the UAV is hovering with minimal movement; during active flight (the Level 1 sweep), the UAV is moving, causing gimbal drift and path-following errors.

### Conclusion

**Add a Kalman filter to the gimbal control pipeline.** Solution: a cascade architecture — Kalman filter for state estimation (compensating UAV attitude) → PID for error correction → gimbal actuator — with UAV IMU data as the Kalman filter input. This is standard practice for aerial gimbal systems.
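The proposed cascade can be illustrated in one dimension: a scalar Kalman filter estimates the pointing error from noisy IMU-derived measurements, and the PID acts on the filtered estimate rather than the raw signal. All gains, noise parameters, and the actuator model below are illustrative, not tuned for any real gimbal.

```python
# Toy 1-D sketch of the Kalman -> PID cascade for gimbal pointing.
import random

class Kalman1D:
    def __init__(self, q=1e-3, r=0.04):
        self.x, self.p, self.q, self.r = 0.0, 1.0, q, r

    def update(self, z):
        self.p += self.q                 # predict (static process model)
        k = self.p / (self.p + self.r)   # Kalman gain
        self.x += k * (z - self.x)       # correct with measurement z
        self.p *= (1 - k)
        return self.x

class PID:
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.i, self.prev = 0.0, 0.0

    def step(self, error):
        self.i += error * self.dt
        d = (error - self.prev) / self.dt
        self.prev = error
        return self.kp * error + self.ki * self.i + self.kd * d

random.seed(0)
kf, pid = Kalman1D(), PID(kp=2.0, ki=0.5, kd=0.05, dt=0.02)
angle = 5.0                              # start 5 degrees off target
for _ in range(500):                     # 10 s of 50 Hz control
    meas = angle + random.gauss(0, 0.2)  # noisy attitude measurement
    cmd = pid.step(kf.update(meas))      # filter first, then PID
    angle -= cmd * pid.dt                # trivial actuator model
print(abs(angle) < 0.5)                  # True — converged near the target
```

Feeding the PID the filtered estimate instead of the raw measurement suppresses the derivative term's noise amplification, which is the main practical benefit of the cascade.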

### Confidence

✅ High — well-established engineering practice

---

## Dimension 7: Training Data Realism

### Fact Confirmation

Ultralytics recommends ≥1500 images/class and ≥10,000 instances/class for good YOLO performance (Source #15). Synthetic data generation for camouflage detection is validated: GenCAMO and CamouflageAnything both improve detection baselines (Fact #18).

### Reference Comparison

Draft01 targets 500+ images by week 6 and 1500+ by week 8. For military concealment in winter conditions this is optimistic: images are hard to collect (an active conflict zone), annotation requires domain expertise, and class diversity is limited.

### Conclusion

**Supplement real data with synthetic augmentation.** Solutions: (A) use CamouflageAnything or GenCAMO to generate synthetic concealment training data, (B) apply cut-paste augmentation of annotated concealment objects onto clean terrain backgrounds, (C) reduce the initial target to 300-500 real images + 1000+ synthetic images per class for the first model iteration, (D) implement active learning: YOLOE-26 zero-shot flags candidates → a human annotates → the labels feed back into the training loop.
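The core of option (B) is mechanically simple: paste an annotated object crop onto a clean background tile and emit the matching normalized box. The sketch below uses plain nested lists to stay dependency-free; a real pipeline would use numpy/PIL and blend the paste seam (e.g. Poisson blending) rather than hard-pasting, and all values here are illustrative.

```python
# Cut-paste augmentation sketch: background tile + object crop ->
# augmented image + YOLO-style (cx, cy, w, h) box normalized to [0, 1].

def paste(background, patch, top, left):
    """Return (augmented image, normalized center-format box)."""
    img = [row[:] for row in background]   # copy, don't mutate the original
    ph, pw = len(patch), len(patch[0])
    for r in range(ph):
        for c in range(pw):
            img[top + r][left + c] = patch[r][c]
    h_img, w_img = len(img), len(img[0])
    box = ((left + pw / 2) / w_img, (top + ph / 2) / h_img,
           pw / w_img, ph / h_img)
    return img, box

terrain = [[0] * 8 for _ in range(8)]      # clean 8x8 background tile
target = [[9, 9], [9, 9]]                  # 2x2 "concealment object" crop
aug, box = paste(terrain, target, top=3, left=2)
print(aug[3][2], aug[4][3])                # 9 9 — patch landed in place
print(box)                                 # (0.375, 0.5, 0.25, 0.25)
```

Because the label is derived from the paste coordinates, each synthetic image arrives pre-annotated, which is what makes this strategy cheap relative to hand-labeling real imagery.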

### Confidence

⚠️ Medium — synthetic data quality for this specific domain is untested

---

## Dimension 8: Security & Adversarial Resilience

### Fact Confirmation

Smaller YOLO models are more vulnerable to adversarial patches (Fact #9); the PatchBlock defense recovers up to 77% of accuracy (Fact #10). Draft01 contains no security considerations for model-weight protection, inference-pipeline integrity, or operational data.

### Reference Comparison

Draft01 has zero security measures. For a military edge device that could be physically captured, this is a significant gap.

### Conclusion

**Add three security layers.** (A) Adversarial defense: integrate PatchBlock or an equivalent CPU-based preprocessing step to detect anomalous input patches. (B) Model protection: encrypt TRT engine files at rest, decrypt into tmpfs at boot, and use a secure boot chain. (C) Operational data: an encrypted circular buffer for captured imagery, auto-wipe on tamper detection, and transmission of only coordinates + confidence (not raw imagery) over the datalink.
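The retention/wipe behavior of layer (C) can be sketched independently of the cryptography. The class below is an illustrative assumption, not a hardened design: encryption-at-rest is deliberately omitted (it belongs in a vetted crypto layer), and the tamper hook would in practice be wired to a hardware interrupt.

```python
# Fixed-size circular buffer for captured frames with a wipe-on-tamper hook.
from collections import deque

class CaptureBuffer:
    def __init__(self, max_frames: int):
        # oldest frames are discarded automatically once maxlen is reached
        self._buf = deque(maxlen=max_frames)

    def add(self, frame):
        self._buf.append(frame)

    def wipe(self):
        """Called by the tamper-detection path: discard all retained frames."""
        self._buf.clear()

    def __len__(self):
        return len(self._buf)

buf = CaptureBuffer(max_frames=3)
for i in range(5):
    buf.add(f"frame-{i}")
print(len(buf))   # 3 — only the newest frames are ever retained
buf.wipe()
print(len(buf))   # 0 — nothing recoverable after the tamper signal
```

Bounding retention this way limits what a captured device can leak even before the wipe fires, which complements rather than replaces encryption at rest.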

### Confidence

✅ High — the threats are well-documented and mitigations exist

@@ -0,0 +1,55 @@

# Validation Log

## Validation Scenario

Same scenario as draft01: a winter reconnaissance flight at 700m altitude over a forested area — but now accounting for memory constraints, TRT bugs, and the revised VLM latency.

## Expected Based on Revised Conclusions

**Using the revised architecture (YOLOE-v8-seg, demand-loaded VLM, Kalman+PID gimbal):**

1. The Level 1 sweep begins. A single TRT engine runs YOLOE-v8-seg (re-parameterized with fixed classes) plus the existing YOLO detection in a shared engine context. Memory: ~3-3.5GB for the combined engine. Inference: ~13ms (s-size).

2. YOLOE-v8-seg detects footpaths via text prompts ("footpath", "trail") plus visual prompts (reference images of paths). It also detects "road" and "tree row". Concealment-specific classes are handled by visual prompts only.

3. The path segmentation mask is preprocessed: Gaussian blur → binary threshold → morphological closing → skeletonization → branch pruning. Endpoints are extracted and 256×256 ROI crops taken.

4. The MobileNetV3-Small CNN classifies endpoints. Memory: ~50MB TRT engine. Total pipeline (mask preprocessing + skeleton + CNN): ~150ms.

5. A high-confidence detection triggers an operator alert with coordinates. An ambiguous detection (CNN 30-70%) is queued for VLM analysis.

6. VLM analysis runs in **background/batch mode**: the scan controller continues the Level 1 sweep. When a batch of 3-5 ambiguous detections accumulates, or the operator requests deep analysis: pause YOLO TRT → unload the engine → load Moondream-0.5B (816 MiB) → analyze the batch → unload → reload YOLO TRT. Total pause: ~20-40s. The operator receives delayed analysis results.

7. Gimbal: a Kalman filter fuses IMU data for state estimation → PID corrects → the gimbal actuates. Path-following during Level 2 is smoother and compensates for UAV drift.

## Actual Validation Results

Cannot be validated against real-world data. Validation is based on:

- YOLOE-v8-seg TRT deployment on Jetson is proven stable (unlike YOLO26)
- Memory budget: ~3.5GB (YOLO engine) + 0.8GB (Moondream) = 4.3GB peak during the VLM phase, within the 5.2GB usable
- Moondream 0.5B is confirmed to run on a Raspberry Pi — Jetson will be faster
- Kalman+PID gimbal control is standard aerospace engineering

## Counterexamples

1. **VLM delay unacceptable**: If the 20-40s batch VLM delay is unacceptable, Moondream's detect() API could provide a faster binary yes/no (~2-5s for 0.5B) instead of full text generation. Or skip the VLM entirely and rely on the CNN plus operator judgment.

2. **YOLOE-v8-seg accuracy lower than YOLOE-26-seg**: YOLOE-v8 is the older architecture; YOLOE-26 should have better accuracy. Mitigation: use YOLOE-v8 for stable deployment now, switch to YOLOE-26 once the TRT bugs are fixed.

3. **Model switching latency**: Loading/unloading TRT engines adds 2-3s in each direction. For frequent VLM requests this overhead accumulates. Mitigation: batch VLM requests; implement predictive pre-loading.

4. **Single-engine approach limits flexibility**: Merging YOLOE + the existing YOLO into one engine may require re-exporting when classes change. Mitigation: use YOLOE re-parameterization — once classes are fixed, YOLOE becomes a standard YOLO with zero overhead.

## Review Checklist

- [x] Draft conclusions consistent with fact cards
- [x] No important dimensions missed
- [x] No over-extrapolation
- [x] Conclusions actionable/verifiable
- [x] Memory budget calculated from documented values
- [x] TRT deployment risk based on documented bugs
- [ ] Note: YOLOE-v8-seg TRT stability on Jetson not directly tested (inferred from YOLOv8 stability)
- [ ] Note: Moondream 0.5B accuracy for aerial concealment analysis is unknown

## Conclusions Requiring Revision

- The VLM latency target must change from ≤5s to "background batch" (20-40s)
- Consider dropping the VLM entirely for the MVP and adding it later as hardware/software matures
- YOLOE-26 should be replaced with YOLOE-v8 for initial deployment
- The memory architecture needs an explicit budget table in the solution draft