Initial commit

Made-with: Cursor
Oleksandr Bezdieniezhnykh
2026-03-26 00:20:30 +02:00
commit 8e2ecf50fd
# Reasoning Chain
## Dimension 1: Memory Budget Feasibility
### Fact Confirmation
The Jetson Orin Nano Super has 8GB unified (CPU+GPU shared) memory. After OS overhead, only ~5.2GB is usable for GPU workloads (Fact #19). A single YOLO TRT engine consumes ~2.6GB including cuDNN/CUDA overhead (Fact #2). The free GPU memory in a Docker container is ~3.7GB (Fact #1).
### Reference Comparison
Draft01 assumes: existing YOLO TRT + YOLOE-26 TRT + MobileNetV3-Small TRT + UAV-VL-R1 all running or ready. Reality: the two YOLO TRT engines alone likely consume 4-5GB (2 × ~2.5GB per engine, minus ~1GB of shared CUDA overhead). The VLM (UAV-VL-R1 INT8, 2.5GB) cannot fit alongside both YOLO engines.
### Conclusion
**The draft01 architecture is memory-infeasible as designed.** Two YOLO TRT engines + CNN already saturate available memory. VLM cannot run concurrently. Solution: (A) time-multiplex — unload YOLO engines before loading VLM, (B) use a single merged TRT engine where YOLOE-26 is re-parameterized into the existing YOLO pipeline, (C) offload VLM to a companion device or cloud.
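The budget arithmetic can be sanity-checked with a quick back-of-envelope sketch. All figures are in GB and come from the facts cited above; the shared-overhead figure is a rough estimate, not a measurement:

```python
# Back-of-envelope memory feasibility check for the Jetson Orin Nano Super.
# All figures are rough estimates (GB) from the facts cited above.
TOTAL_UNIFIED = 8.0
USABLE_AFTER_OS = 5.2        # Fact #19: usable for GPU workloads after OS
FREE_IN_CONTAINER = 3.7      # Fact #1: free GPU memory inside Docker
YOLO_TRT_ENGINE = 2.6        # Fact #2: one engine incl. cuDNN/CUDA overhead
SHARED_CUDA_OVERHEAD = 1.0   # overhead counted once, not per engine (estimate)
VLM_INT8 = 2.5               # UAV-VL-R1 INT8 footprint

two_engines = 2 * YOLO_TRT_ENGINE - SHARED_CUDA_OVERHEAD
headroom = USABLE_AFTER_OS - two_engines
print(f"two YOLO engines: ~{two_engines:.1f} GB")
print(f"headroom for VLM: {headroom:.1f} GB (needs {VLM_INT8} GB) -> "
      f"{'fits' if headroom >= VLM_INT8 else 'does NOT fit'}")
```

The numbers confirm the conclusion: roughly 1 GB of headroom remains after two engines, far short of the 2.5 GB the VLM needs, which is what forces options (A)-(C).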
### Confidence
✅ High — multiple sources confirm memory constraints
---
## Dimension 2: YOLO26/YOLOE-26 TRT Deployment Stability
### Fact Confirmation
YOLO26 has confirmed bugs when deployed via TensorRT on Jetson: confidence misalignment in C++ (Fact #5), INT8 export crash (Fact #6). These are architecture-specific issues in YOLO26's end-to-end NMS-free design when converted through ONNX→TRT pipeline.
### Reference Comparison
Draft01 assumes YOLOE-26 TRT deployment works. YOLOE-26 inherits YOLO26's architecture. If YOLO26 has TRT issues, YOLOE-26 likely inherits them. YOLOv8 TRT deployment is stable and proven on Jetson.
### Conclusion
**YOLOE-26 TRT deployment on Jetson is a high-risk path.** Mitigations: (A) use YOLOE-v8-seg (YOLOE built on the YOLOv8 backbone) instead of YOLOE-26-seg for initial deployment — its TRT stability is proven, (B) use the Python Ultralytics predict() API instead of C++ TRT initially (avoids the confidence issue but is slower), (C) monitor the Ultralytics issue tracker for YOLO26 TRT fixes before transitioning.
### Confidence
✅ High — bugs are documented with issue numbers
---
## Dimension 3: YOLOE-26 Zero-Shot Accuracy for Domain
### Fact Confirmation
YOLOE is trained on LVIS (1203 categories) and COCO data (Fact #8). Military concealment classes are not in LVIS/COCO. Generic concepts like "footpath" and "road" are in-distribution and should work. Domain-specific concepts like "dugout", "FPV hideout", "camouflage netting" are far out-of-distribution.
### Reference Comparison
Draft01 relies heavily on YOLOE-26 text prompts as the bootstrapping mechanism. If text prompts fail for concealment-specific classes, the entire zero-shot Phase 1 is compromised. However, visual prompts (SAVPE) and multimodal fusion (Fact #7) offer a stronger alternative for domain-specific detection.
### Conclusion
**Text prompts alone are unreliable for concealment classes.** Solution: (A) prioritize visual prompts (SAVPE) using semantic01-04.png reference images — visually similar detection is less dependent on LVIS vocabulary, (B) use multimodal fusion (text + visual) for robustness, (C) use generic text prompts only for in-distribution classes ("footpath", "road", "trail") and visual prompts for concealment-specific patterns, (D) measure zero-shot recall in first week and have fallback to heuristic detectors.
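Option (C) reduces to a small per-class routing decision; a minimal sketch, where the `IN_DISTRIBUTION` set and the return-value labels are illustrative placeholders, not the actual LVIS category list or an Ultralytics API:

```python
# Sketch of option (C): route each target class to a prompting mode.
# IN_DISTRIBUTION is illustrative, not the real LVIS vocabulary.
IN_DISTRIBUTION = {"footpath", "road", "trail", "person", "car"}

def prompt_strategy(cls: str, has_reference_image: bool) -> str:
    """Choose text, visual (SAVPE), or fused prompting for one class."""
    if cls in IN_DISTRIBUTION:
        # Fuse text + visual when a reference crop exists (multimodal, option B).
        return "text+visual" if has_reference_image else "text"
    # Out-of-distribution concealment classes: text prompts are unreliable,
    # so require a visual reference (e.g. the semantic01-04.png crops).
    return "visual" if has_reference_image else "needs-reference"
```

For example, `prompt_strategy("road", False)` routes to pure text, while a class like `"dugout"` with a reference crop routes to visual prompting.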
### Confidence
⚠️ Medium — no direct benchmarks for this domain, reasoning from training data distribution
---
## Dimension 4: Path Tracing Algorithm Robustness
### Fact Confirmation
Classical Zhang-Suen skeletonization is sensitive to boundary noise — small perturbations create spurious skeletal branches (Fact #15). Aerial segmentation masks from YOLOE-26 will have noise, partial segments, and broken paths. GraphMorph (2025) provides topology-aware centerline extraction with fewer false positives (Fact #16).
### Reference Comparison
Draft01 uses basic skeletonization + hit-miss endpoint detection. This works on clean synthetic masks but will fail on real-world noisy masks with multiple branches, gaps, and artifacts.
### Conclusion
**Add morphological preprocessing before skeletonization.** Solution: (A) apply Gaussian blur + binary threshold + morphological closing to clean segmentation masks before skeletonization, (B) prune short skeleton branches (< threshold length) to remove noise-induced artifacts, (C) consider GraphMorph as a more robust alternative if simple pruning is insufficient, (D) increase ROI crop from 128×128 to 256×256 for more spatial context in CNN classification.
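The preprocessing in (A) is standard OpenCV (`cv2.GaussianBlur`, `cv2.threshold`, `cv2.morphologyEx` with `MORPH_CLOSE`); step (B) is less off-the-shelf, so here is a minimal pure-NumPy sketch of endpoint-driven branch pruning, assuming the skeleton has already been computed (e.g. with `skimage.morphology.skeletonize`):

```python
import numpy as np

def prune_short_branches(skel: np.ndarray, min_len: int) -> np.ndarray:
    """Erase skeleton branches shorter than min_len pixels.

    skel is a binary (0/1) skeleton image. Each endpoint (pixel with exactly
    one 8-neighbour) is walked toward the interior; if the walk hits a
    junction or dead end before reaching min_len pixels, the branch is erased.
    """
    skel = skel.astype(bool).copy()
    H, W = skel.shape

    def nbrs(p):
        y, x = p
        return [(y + dy, x + dx)
                for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                if (dy, dx) != (0, 0)
                and 0 <= y + dy < H and 0 <= x + dx < W
                and skel[y + dy, x + dx]]

    endpoints = [p for p in zip(*np.nonzero(skel)) if len(nbrs(p)) == 1]
    for ep in endpoints:
        if not skel[ep]:
            continue                    # already erased by a previous prune
        path, prev, cur = [ep], None, ep
        while len(path) < min_len:
            step = [p for p in nbrs(cur) if p != prev]
            if len(step) != 1:          # dead end or ambiguous pixel: stop
                break
            prev, cur = cur, step[0]
            if len(nbrs(cur)) > 2:      # reached a junction: branch ends
                break
            path.append(cur)
        if len(path) < min_len:         # too short: noise spur, erase it
            for p in path:
                skel[p] = False
    return skel.astype(np.uint8)
```

Pixels directly adjacent to a junction are conservatively kept, so the main centerline topology is never broken; GraphMorph (option C) remains the fallback if this simple pruning proves insufficient on real masks.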
### Confidence
✅ High — skeletonization noise sensitivity is well-documented
---
## Dimension 5: VLM Runtime & Integration Viability
### Fact Confirmation
TensorRT-LLM explicitly does not support edge devices (Fact #11). vLLM works on Jetson but requires careful memory management (Fact #12). VLM inference speed for 2B models: Cosmos-Reason2-2B achieves only 4.7 tok/s (Fact #13). For a 50-100 token response, this means 10-21 seconds — far exceeding the 5s target. A 1.5B model failed to load due to KV cache buffer requirements exceeding available memory (Fact #14).
### Reference Comparison
Draft01 assumes UAV-VL-R1 runs in ≤5s via vLLM or TRT-LLM. Reality: TRT-LLM is not an option. vLLM can work but inference will be 10-20s, not 5s. VLM cannot coexist with YOLO engines in memory.
### Conclusion
**VLM tier needs fundamental redesign.** Solutions: (A) accept 10-20s VLM latency — change from "optional real-time" to "background analysis" mode where VLM results arrive after operator has moved on, (B) use SmolVLM-500M (1.8GB) or Moondream-0.5B (816 MiB INT4) instead of UAV-VL-R1 for faster inference and smaller footprint, (C) implement VLM as demand-loaded: unload all TRT engines → load VLM → infer → unload VLM → reload TRT engines (adds 2-3s per switch but ensures memory fits), (D) defer VLM to a ground station connected via datalink if latency is acceptable.
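The latency claim is simple arithmetic worth writing down explicitly; throughput is from Fact #13, and the engine-switch cost from option (C) is an estimate, not a measurement:

```python
# Sanity check on the VLM latency claim (Cosmos-Reason2-2B decode speed,
# Fact #13). Switch cost is the 2-3 s estimate from option (C).
TOK_PER_S = 4.7                 # observed on-device decode throughput
RESPONSE_TOKENS = (50, 100)     # typical analysis response length
SWITCH_COST_S = 2.5             # one unload/reload of the TRT engines

lo, hi = (t / TOK_PER_S for t in RESPONSE_TOKENS)
print(f"generation alone: {lo:.1f}-{hi:.1f} s (real-time target: 5 s)")
# Demand-loading (option C) pays the switch cost twice per VLM query:
print(f"demand-loaded round trip: {lo + 2 * SWITCH_COST_S:.1f}-"
      f"{hi + 2 * SWITCH_COST_S:.1f} s")
```

Even the shortest response misses the 5 s target by a factor of two, and demand-loading pushes a full query toward ~16-26 s, which is why "background analysis" framing (option A) is the honest default.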
### Confidence
✅ High — benchmarks directly confirm constraints
---
## Dimension 6: Gimbal Control Adequacy
### Fact Confirmation
PID controllers work for gimbal stabilization when the platform is relatively stable (Fact #17). However, during UAV flight, attitude changes and mounting errors cause drift that PID alone cannot compensate for. A Kalman filter plus coordinate transformation is proven to eliminate these errors (Fact #17).
### Reference Comparison
Draft01 uses PID-only control. This works if the UAV is hovering with minimal movement. During active flight (Level 1 sweep), the UAV is moving, causing gimbal drift and path-following errors.
### Conclusion
**Add Kalman filter to gimbal control pipeline.** Solution: cascade architecture: Kalman filter for state estimation (compensates UAV attitude) → PID for error correction → gimbal actuator. Use UAV IMU data as Kalman filter input. This is standard practice for aerial gimbal systems.
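A toy 1-D version of the cascade: a scalar constant-state Kalman filter smooths the noisy (IMU-compensated) pointing-error measurement and feeds a PID, with a crude integrator plant standing in for the actuator. All gains and noise figures are illustrative, not tuned for real hardware:

```python
import random

class Kalman1D:
    """Scalar Kalman filter: state-estimation stage of the cascade."""
    def __init__(self, q=1e-3, r=0.05):
        self.x, self.p = 0.0, 1.0     # state estimate and its variance
        self.q, self.r = q, r         # process / measurement noise variances

    def update(self, z):
        self.p += self.q              # predict (constant-state model)
        k = self.p / (self.p + self.r)
        self.x += k * (z - self.x)    # correct with measurement z
        self.p *= 1.0 - k
        return self.x

class PID:
    """Error-correction stage driving the gimbal actuator."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.i, self.prev = 0.0, 0.0

    def step(self, err):
        self.i += err * self.dt
        d = (err - self.prev) / self.dt
        self.prev = err
        return self.kp * err + self.ki * self.i + self.kd * d

# Cascade: noisy measurement -> Kalman estimate -> PID command -> actuator.
random.seed(0)
kf, pid = Kalman1D(), PID(kp=2.0, ki=0.1, kd=0.05, dt=0.02)
angle = 0.3                                 # true pointing error, rad
for _ in range(200):                        # 4 s at 50 Hz
    z = angle + random.gauss(0.0, 0.05)     # attitude-compensated but noisy
    cmd = pid.step(kf.update(z))
    angle -= cmd * 0.02                     # toy integrator plant
print(f"residual pointing error: {abs(angle):.3f} rad")
```

The Kalman stage is what keeps the PID derivative term from amplifying raw measurement noise; in the real system the "measurement" would be built from the UAV IMU via the coordinate transformation described above.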
### Confidence
✅ High — well-established engineering practice
---
## Dimension 7: Training Data Realism
### Fact Confirmation
Ultralytics recommends ≥1500 images/class and ≥10,000 instances/class for good YOLO performance (Source #15). Synthetic data generation for camouflage detection is validated: GenCAMO and CamouflageAnything both improve detection baselines (Fact #18).
### Reference Comparison
Draft01 targets 500+ images by week 6, 1500+ by week 8. For military concealment in winter conditions, this is optimistic: images are hard to collect (active conflict zone), annotation requires domain expertise, and class diversity is limited.
### Conclusion
**Supplement real data with synthetic augmentation.** Solution: (A) use CamouflageAnything or GenCAMO to generate synthetic concealment training data, (B) apply cut-paste augmentation of annotated concealment objects onto clean terrain backgrounds, (C) reduce initial target to 300-500 real images + 1000+ synthetic images per class for first model iteration, (D) implement active learning: YOLOE-26 zero-shot flags candidates → human annotates → feeds into training loop.
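Option (B) is mechanically simple; a minimal NumPy sketch of cut-paste augmentation, where the returned pixel box would still need conversion to a normalized YOLO label in practice:

```python
import numpy as np

def cut_paste(background, obj, mask, rng):
    """Paste a masked object crop onto a terrain background at a random spot.

    background: HxWx3 uint8 terrain image
    obj, mask:  hxwx3 crop of an annotated concealment object and its
                hxw binary mask
    Returns the augmented image and the pasted box (x, y, w, h) in pixels,
    for generating the corresponding training label.
    """
    H, W = background.shape[:2]
    h, w = obj.shape[:2]
    x = int(rng.integers(0, W - w + 1))
    y = int(rng.integers(0, H - h + 1))
    out = background.copy()
    region = out[y:y + h, x:x + w]      # view into the output image
    m = mask.astype(bool)
    region[m] = obj[m]                  # copy only the masked object pixels
    return out, (x, y, w, h)

rng = np.random.default_rng(0)
bg = np.zeros((64, 64, 3), dtype=np.uint8)
obj = np.full((8, 8, 3), 255, dtype=np.uint8)
mask = np.ones((8, 8), dtype=np.uint8)
aug, box = cut_paste(bg, obj, mask, rng)
```

Real use would add blending at the mask boundary and photometric jitter so the pasted object's seam does not become the feature the model learns.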
### Confidence
⚠️ Medium — synthetic data quality for this specific domain is untested
---
## Dimension 8: Security & Adversarial Resilience
### Fact Confirmation
Smaller YOLO models are more vulnerable to adversarial patches (Fact #9). PatchBlock defense recovers up to 77% accuracy (Fact #10). Draft01 contains no security considerations for model-weight protection, inference-pipeline integrity, or operational data.
### Reference Comparison
Draft01 has zero security measures. For a military edge device that could be physically captured, this is a significant gap.
### Conclusion
**Add three security layers.** (A) Adversarial defense: integrate PatchBlock or equivalent CPU-based preprocessing to detect anomalous input patches, (B) Model protection: encrypt TRT engine files at rest, decrypt into tmpfs at boot, use secure boot chain, (C) Operational data: encrypted circular buffer for captured imagery, auto-wipe on tamper detection, transmit only coordinates + confidence (not raw imagery) over datalink.
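The buffer/wipe part of (C) can be sketched with the standard library alone; encryption and key zeroization are deliberately omitted here and would be required in a real deployment (e.g. AES-GCM with a hardware-backed key, wiped together with the buffer):

```python
from collections import deque

class CaptureBuffer:
    """Fixed-size circular buffer for captured frames with tamper auto-wipe.

    Sketch only: a real deployment would encrypt each frame before append
    and have wipe() also zeroize the encryption key, so buffered data is
    unrecoverable even if the flash is later imaged.
    """
    def __init__(self, max_frames: int):
        self._frames = deque(maxlen=max_frames)   # oldest frame auto-evicted

    def append(self, frame: bytes) -> None:
        self._frames.append(frame)

    def wipe(self) -> None:
        """Hook for the tamper-detection interrupt: drop all stored frames."""
        self._frames.clear()

    def __len__(self) -> int:
        return len(self._frames)

buf = CaptureBuffer(max_frames=3)
for i in range(5):
    buf.append(b"frame-%d" % i)   # only the 3 newest frames are retained
buf.wipe()                        # tamper detected: buffer emptied
```

The bounded `maxlen` also enforces the "transmit coordinates, not raw imagery" posture: only a short recent window of frames ever exists on the device.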
### Confidence
✅ High — threats are well-documented, mitigations exist