Validation Log
Validation Scenario
Same scenario as draft01: a winter reconnaissance flight at 700m altitude over a forested area, now accounting for memory constraints, TRT bugs, and revised VLM latency.
Expected Based on Revised Conclusions
Using the revised architecture (YOLOE-v8-seg, demand-loaded VLM, Kalman+PID gimbal):
- Level 1 sweep begins. A single TRT engine runs YOLOE-v8-seg (re-parameterized with fixed classes) together with the existing YOLO detection in a shared engine context. Memory: ~3-3.5GB for the combined engine. Inference: ~13ms (s-size).
- YOLOE-v8-seg detects footpaths via text prompts ("footpath", "trail") plus visual prompts (reference images of paths). It also detects "road" and "tree row". CNN-specific concealment classes are handled by visual prompts only.
- The path segmentation mask is preprocessed: Gaussian blur → binary threshold → morphological closing → skeletonization → branch pruning. Endpoints extracted. 256×256 ROI crops.
- A MobileNetV3-Small CNN classifies the endpoints. Memory: ~50MB TRT engine. Total pipeline (mask preprocessing + skeleton + CNN): ~150ms.
- High-confidence detection → operator alert with coordinates. Ambiguous detection (CNN confidence 30-70%) → queued for VLM analysis.
- VLM analysis runs in background/batch mode: the scan controller continues the Level 1 sweep. When a batch of 3-5 ambiguous detections accumulates or the operator requests deep analysis: pause YOLO TRT → unload engine → load Moondream-0.5B (816 MiB) → analyze batch → unload → reload YOLO TRT. Total pause: ~20-40s. The operator receives delayed analysis results.
- Gimbal: a Kalman filter fuses IMU data for state estimation → PID corrects → gimbal actuates. Path-following during Level 2 is smoother and compensates for UAV drift.
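The routing of CNN results and the batch trigger for VLM analysis described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the `Detection` and `Triage` names are hypothetical, the 30-70% band and the 3-5 batch size come from the scenario, and the handling of scores below 30% is an assumption because the log does not specify it.

```python
from dataclasses import dataclass, field

# Thresholds from the scenario: CNN confidence 30-70% counts as ambiguous.
AMBIGUOUS_LOW = 0.30
AMBIGUOUS_HIGH = 0.70
# The scenario gives a batch of 3-5; using the lower bound as the trigger
# and the upper bound as the cap is an assumption.
BATCH_TRIGGER = 3
BATCH_MAX = 5


@dataclass
class Detection:
    """Hypothetical container for one classified path endpoint."""
    lat: float
    lon: float
    confidence: float  # CNN score in [0, 1]


@dataclass
class Triage:
    alerts: list = field(default_factory=list)     # sent to the operator at once
    vlm_queue: list = field(default_factory=list)  # waiting for batch VLM analysis

    def route(self, det: Detection) -> str:
        """Route one detection; returns 'alert', 'queue', or 'discard'."""
        if det.confidence > AMBIGUOUS_HIGH:
            self.alerts.append(det)
            return "alert"
        if det.confidence >= AMBIGUOUS_LOW:
            self.vlm_queue.append(det)
            return "queue"
        # Assumption: the log does not say what happens below 30%.
        return "discard"

    def batch_ready(self, operator_request: bool = False) -> bool:
        """True when the YOLO -> VLM engine swap should be triggered."""
        if operator_request:
            return len(self.vlm_queue) > 0
        return len(self.vlm_queue) >= BATCH_TRIGGER

    def take_batch(self, max_size: int = BATCH_MAX) -> list:
        """Remove and return up to max_size queued detections."""
        batch = self.vlm_queue[:max_size]
        self.vlm_queue = self.vlm_queue[max_size:]
        return batch
```

Keeping the trigger and the cap separate lets the controller start a swap as soon as the smallest useful batch exists while bounding how long the YOLO engine stays unloaded.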
Actual Validation Results
No real-world data is available for validation. The assessment is instead based on:
- YOLOE-v8-seg TRT deployment on Jetson is expected to be stable (unlike YOLO26); this is inferred from YOLOv8 TRT stability, not directly tested
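- Memory budget: ~3.5GB (YOLO engine) + 0.8GB (Moondream) = 4.3GB if both are resident at once, within the 5.2GB usable. Unloading the YOLO engine before loading Moondream, as in the scenario above, only lowers the peak during the VLM phase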
- Moondream 0.5B is confirmed to run on Raspberry Pi, so a Jetson should be faster
- Kalman+PID gimbal control is standard aerospace engineering
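The memory arithmetic in the list above can be checked with a short script. This is a sketch under stated assumptions: the figures are the ones quoted in this log, "GB" is treated as GiB when converting the 816 MiB Moondream figure, and the ~50MB CNN engine is added as an extra line even though the log's own sum omits it.

```python
# All figures come from the log; treating "GB" as GiB is an assumption.
MIB_PER_GB = 1024
USABLE_GB = 5.2

components_gb = {
    "yolo_engine": 3.5,                  # upper end of the ~3-3.5GB estimate
    "moondream_0_5b": 816 / MIB_PER_GB,  # 816 MiB, rounded to 0.8GB in the log
}
# Not part of the log's sum; added for completeness (~50MB CNN TRT engine).
cnn_engine_gb = 50 / MIB_PER_GB


def peak_gb(parts):
    """Total footprint if every listed component is resident at once."""
    return sum(parts.values())


def headroom_gb(parts, usable=USABLE_GB):
    """Usable memory left after loading every listed component."""
    return usable - peak_gb(parts)


both_resident = peak_gb(components_gb)
with_cnn = both_resident + cnn_engine_gb

print(f"YOLO + Moondream: {both_resident:.2f} GB, "
      f"headroom {headroom_gb(components_gb):.2f} GB")
print(f"plus CNN engine:  {with_cnn:.2f} GB, "
      f"headroom {USABLE_GB - with_cnn:.2f} GB")
```

The script only confirms that the quoted numbers sum to less than the usable budget. It says nothing about runtime allocations such as activation buffers or the CUDA context, which would need measurement on the device.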
Counterexamples
- VLM delay unacceptable: If the 20-40s batch VLM delay is unacceptable, Moondream's detect() API could provide a faster binary yes/no (~2-5s for 0.5B) instead of full text generation. Alternatively, skip the VLM entirely and rely on CNN + operator judgment.
- YOLOE-v8-seg accuracy lower than YOLOE-26-seg: YOLOE-v8 is an older architecture, and YOLOE-26 should have better accuracy. Mitigation: use YOLOE-v8 for stable deployment now and switch to YOLOE-26 once the TRT bugs are fixed.
- Model switching latency: Loading/unloading TRT engines adds 2-3s in each direction. For frequent VLM requests, this overhead accumulates. Mitigation: batch VLM requests and implement predictive pre-loading.
- Single-engine approach limits flexibility: Merging YOLOE + existing YOLO into one engine may require re-exporting when classes change. Mitigation: use YOLOE re-parameterization; when classes are fixed, YOLOE becomes standard YOLO with zero overhead.
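To illustrate why batching mitigates the model switching latency noted above, the fixed swap cost can be amortized over the batch. This is a back-of-the-envelope sketch: the function name is hypothetical, the 2-3s per direction comes from this log, and it assumes exactly one swap in and one swap out per batch.

```python
def swap_overhead_per_detection(batch_size: int, swap_s_each_way: float = 3.0) -> float:
    """Seconds of engine-swap overhead attributable to each analyzed detection.

    Assumes one switch to the VLM and one switch back per batch, each
    costing swap_s_each_way seconds (2-3s per the log; 3s is the pessimistic end).
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return 2 * swap_s_each_way / batch_size


for n in (1, 3, 5):
    print(f"batch of {n}: {swap_overhead_per_detection(n):.1f}s swap overhead per detection")
```

Under these assumptions the per-detection overhead shrinks in inverse proportion to batch size, so larger batches trade operator-visible delay for less total time with the YOLO engine unloaded.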
Review Checklist
- Draft conclusions consistent with fact cards
- No important dimensions missed
- No over-extrapolation
- Conclusions actionable/verifiable
- Memory budget calculated from documented values
- TRT deployment risk based on documented bugs
- Note: YOLOE-v8-seg TRT stability on Jetson not directly tested (inferred from YOLOv8 stability)
- Note: Moondream 0.5B accuracy for aerial concealment analysis is unknown
Conclusions Requiring Revision
- VLM latency target must change from ≤5s to "background batch" (20-40s)
- Consider dropping VLM entirely for MVP and adding later when hardware/software matures
- YOLOE-26 should be replaced with YOLOE-v8 for initial deployment
- Memory architecture needs explicit budget table in solution draft