# Validation Log

## Validation Scenario

Same scenario as draft01: winter reconnaissance flight at 700 m altitude over forested area, but now accounting for memory constraints, TRT bugs, and the revised VLM latency.

## Expected Based on Revised Conclusions

**Using the revised architecture (YOLOE-v8-seg, demand-loaded VLM, Kalman+PID gimbal):**

1. Level 1 sweep begins. Single TRT engine running YOLOE-v8-seg (re-parameterized with fixed classes) + existing YOLO detection in a shared engine context. Memory: ~3-3.5 GB for the combined engine. Inference: ~13 ms (s-size).
2. YOLOE-v8-seg detects footpaths via text prompts ("footpath", "trail") + visual prompts (reference images of paths). Also detects "road" and "tree row". CNN-specific concealment classes are handled by visual prompts only.
3. Path segmentation mask preprocessed: Gaussian blur → binary threshold → morphological closing → skeletonization → branch pruning. Endpoints extracted; 256×256 ROI crops.
4. MobileNetV3-Small CNN classifies endpoints. Memory: ~50 MB TRT engine. Total pipeline (mask preprocessing + skeleton + CNN): ~150 ms.
5. High-confidence detection → operator alert with coordinates. Ambiguous detection (CNN 30-70%) → queued for VLM analysis.
6. VLM analysis is **background/batch mode**: the scan controller continues the Level 1 sweep. When a batch of 3-5 ambiguous detections accumulates or the operator requests deep analysis: pause YOLO TRT → unload engine → load Moondream-0.5B (816 MiB) → analyze batch → unload → reload YOLO TRT. Total pause: ~20-40 s. Operator receives delayed analysis results.
7. Gimbal: Kalman filter fuses IMU data for state estimation → PID corrects → gimbal actuates. Path-following during Level 2 is smoother and compensates for UAV drift.

## Actual Validation Results

Cannot validate against real-world data.
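The mask-preprocessing chain in step 3 above ends with skeleton endpoint extraction. A minimal sketch of that last step, assuming the mask has already been blurred, thresholded, closed, and skeletonized upstream (those stages, and the function name here, are illustrative):

```python
import numpy as np

def find_skeleton_endpoints(skel: np.ndarray) -> list[tuple[int, int]]:
    """Return (row, col) of skeleton pixels with exactly one 8-neighbour:
    these are the path endpoints handed to the 256x256 ROI cropper."""
    padded = np.pad(skel.astype(int), 1)
    h, w = skel.shape
    # Sum the 8 shifted copies of the mask to count neighbours per pixel.
    neighbours = sum(
        padded[1 + dr : 1 + dr + h, 1 + dc : 1 + dc + w]
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    endpoints = (skel > 0) & (neighbours == 1)
    return [tuple(int(v) for v in p) for p in np.argwhere(endpoints)]

# Toy skeleton: a straight one-pixel path has exactly two endpoints.
skel = np.zeros((7, 7), dtype=np.uint8)
skel[3, 1:6] = 1
print(find_skeleton_endpoints(skel))  # [(3, 1), (3, 5)]
```

Branch pruning would run before this step, so short spurs do not produce spurious endpoints.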
Validation based on:

- YOLOE-v8-seg TRT deployment on Jetson is proven stable (unlike YOLO26)
- Memory budget: ~3.5 GB (YOLO engine) + 0.8 GB (Moondream) = 4.3 GB peak during the VLM phase, within the 5.2 GB usable
- Moondream 0.5B is confirmed to run on a Raspberry Pi — Jetson will be faster
- Kalman+PID gimbal control is standard aerospace engineering

## Counterexamples

1. **VLM delay unacceptable**: If the 20-40 s batch VLM delay is unacceptable, Moondream's detect() API could give a faster binary yes/no (~2-5 s for 0.5B) instead of full text generation. Or skip the VLM entirely and rely on CNN + operator judgment.
2. **YOLOE-v8-seg accuracy lower than YOLOE-26-seg**: YOLOE-v8 is the older architecture; YOLOE-26 should have better accuracy. Mitigation: use YOLOE-v8 for stable deployment now, switch to YOLOE-26 once the TRT bugs are fixed.
3. **Model switching latency**: Loading/unloading TRT engines adds 2-3 s in each direction. For frequent VLM requests, this overhead accumulates. Mitigation: batch VLM requests; implement predictive pre-loading.
4. **Single-engine approach limits flexibility**: Merging YOLOE + the existing YOLO into one engine may require re-exporting when classes change. Mitigation: use YOLOE re-parameterization — when classes are fixed, YOLOE becomes a standard YOLO with zero overhead.
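The batching mitigation for the model-switching latency can be sketched as a small controller that queues ambiguous detections and spends one engine-swap cycle per batch. This is an illustrative sketch, not an existing API; the class and callback names are assumptions:

```python
from collections import deque

class VlmBatchController:
    """Queue ambiguous detections and analyze them in batches, so one
    costly swap cycle (unload YOLO TRT -> load Moondream -> analyze ->
    unload -> reload YOLO TRT) serves several detections at once."""

    def __init__(self, analyze_batch, batch_size=4):
        self.analyze_batch = analyze_batch  # callable: list of detections -> list of results
        self.batch_size = batch_size
        self.queue = deque()
        self.swap_cycles = 0  # engine swap cycles performed so far

    def submit(self, detection):
        """Queue one ambiguous detection; flush automatically when full."""
        self.queue.append(detection)
        if len(self.queue) >= self.batch_size:
            return self.flush()
        return []

    def flush(self):
        """Run one swap cycle over everything queued. Also callable
        directly on an explicit operator request for deep analysis."""
        if not self.queue:
            return []
        batch = list(self.queue)
        self.queue.clear()
        self.swap_cycles += 1  # the real system pauses the sweep here (~20-40 s)
        return self.analyze_batch(batch)

# Toy usage with a stand-in "VLM" that labels each detection.
ctrl = VlmBatchController(lambda batch: [f"analyzed:{d}" for d in batch], batch_size=3)
print(ctrl.submit("det-1"), ctrl.submit("det-2"))  # still queueing: two empty lists
print(ctrl.submit("det-3"))  # third submit triggers one swap cycle for all three
```

Predictive pre-loading would slot into `flush()` by starting the Moondream load as soon as the queue approaches `batch_size`.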
## Review Checklist

- [x] Draft conclusions consistent with fact cards
- [x] No important dimensions missed
- [x] No over-extrapolation
- [x] Conclusions actionable/verifiable
- [x] Memory budget calculated from documented values
- [x] TRT deployment risk based on documented bugs
- [ ] Note: YOLOE-v8-seg TRT stability on Jetson not directly tested (inferred from YOLOv8 stability)
- [ ] Note: Moondream 0.5B accuracy for aerial concealment analysis is unknown

## Conclusions Requiring Revision

- VLM latency target must change from ≤5 s to "background batch" (20-40 s)
- Consider dropping the VLM entirely for the MVP and adding it later when hardware/software matures
- YOLOE-26 should be replaced with YOLOE-v8 for initial deployment
- Memory architecture needs an explicit budget table in the solution draft
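The Kalman+PID gimbal loop from step 7 of the expected results splits into a state estimator and a controller. A minimal sketch of the PID half against a toy first-order gimbal plant (gains and the plant model are illustrative, not tuned for real hardware; the Kalman/IMU fusion stage is assumed upstream and not shown):

```python
class PID:
    """Textbook PID controller; gains below are illustrative only."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, error):
        """One control step: returns the commanded gimbal rate."""
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Toy plant: the gimbal angle simply integrates the commanded rate.
# In the real loop, `error` would come from the Kalman state estimate,
# which is what makes path-following robust to noisy IMU data and drift.
pid = PID(kp=4.0, ki=0.5, kd=0.2, dt=0.01)
angle, target = 0.0, 30.0   # degrees
for _ in range(2000):       # 20 s of simulated time at 100 Hz
    angle += pid.update(target - angle) * pid.dt
print(round(angle, 1))      # settles near the 30-degree setpoint
```

The integral term removes the steady-state offset a pure proportional controller would leave under constant UAV drift, which is the main reason the revised architecture pairs PID with Kalman fusion rather than using raw proportional correction.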