Initial commit

Made-with: Cursor
This commit is contained in:
Oleksandr Bezdieniezhnykh
2026-03-26 00:20:30 +02:00
commit 8e2ecf50fd
144 changed files with 19781 additions and 0 deletions
# Validation Log
## Validation Scenario
Same scenario as draft01: winter reconnaissance flight at 700m altitude over a forested area, now accounting for memory constraints, TRT bugs, and revised VLM latency.
## Expected Based on Revised Conclusions
**Using the revised architecture (YOLOE-v8-seg, demand-loaded VLM, Kalman+PID gimbal):**
1. Level 1 sweep begins. Single TRT engine running YOLOE-v8-seg (re-parameterized with fixed classes) + existing YOLO detection in shared engine context. Memory: ~3-3.5GB for combined engine. Inference: ~13ms (s-size).
2. YOLOE-v8-seg detects footpaths via text prompt ("footpath", "trail") + visual prompt (reference images of paths). Also detects "road", "tree row". CNN-specific concealment classes handled by visual prompts only.
3. Path segmentation mask preprocessed: Gaussian blur → binary threshold → morphological closing → skeletonization → branch pruning. Endpoints extracted. 256×256 ROI crops.
4. MobileNetV3-Small CNN classifies endpoints. Memory: ~50MB TRT engine. Total pipeline (mask preprocessing + skeleton + CNN): ~150ms.
5. High-confidence detection → operator alert with coordinates. Ambiguous detection (CNN 30-70%) → queued for VLM analysis.
6. VLM analysis is **background/batch mode**: Scan controller continues Level 1 sweep. When a batch of 3-5 ambiguous detections accumulates or operator requests deep analysis: pause YOLO TRT → unload engine → load Moondream-0.5B (816 MiB) → analyze batch → unload → reload YOLO TRT. Total pause: ~20-40s. Operator receives delayed analysis results.
7. Gimbal: Kalman filter fuses IMU data for state estimation → PID corrects → gimbal actuates. Path-following during Level 2 is smoother, compensates for UAV drift.
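The endpoint-extraction step of the pipeline (steps 3-4) can be sketched compactly. A minimal pure-NumPy sketch, assuming the mask has already been skeletonized (e.g. by `skimage.morphology.skeletonize`): an endpoint is a skeleton pixel with exactly one 8-connected neighbor, and each endpoint anchors a 256×256 ROI crop for the CNN. Function names are illustrative, not the pipeline's actual code.

```python
import numpy as np

def skeleton_endpoints(skel: np.ndarray) -> list[tuple[int, int]]:
    """Endpoints of a binary skeleton: pixels with exactly one 8-neighbor."""
    s = (skel > 0).astype(np.uint8)
    p = np.pad(s, 1)  # zero border so edge pixels are handled uniformly
    # Sum the 8 shifted copies around each pixel (center excluded).
    neighbors = (p[:-2, :-2] + p[:-2, 1:-1] + p[:-2, 2:]
               + p[1:-1, :-2]               + p[1:-1, 2:]
               + p[2:, :-2]  + p[2:, 1:-1]  + p[2:, 2:])
    ys, xs = np.where((s == 1) & (neighbors == 1))
    return list(zip(ys.tolist(), xs.tolist()))

def crop_roi(frame: np.ndarray, y: int, x: int, size: int = 256) -> np.ndarray:
    """size x size crop centred on an endpoint, clamped to the frame border."""
    h, w = frame.shape[:2]
    y0 = min(max(y - size // 2, 0), max(h - size, 0))
    x0 = min(max(x - size // 2, 0), max(w - size, 0))
    return frame[y0:y0 + size, x0:x0 + size]
```

Branch pruning before this step matters: without it, short spurs from skeletonization noise produce spurious endpoints that inflate the CNN workload.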
## Actual Validation Results
Cannot validate against real-world data. Validation based on:
- YOLOE-v8-seg TRT deployment on Jetson is assumed stable, inferred from YOLOv8's proven TRT stability (unlike YOLO26)
- Memory budget: ~3.5GB (YOLO engine) + 0.8GB (Moondream) = 4.3GB worst-case peak if both are briefly resident during the swap, within 5.2GB usable
- Moondream 0.5B is confirmed to run on Raspberry Pi; Jetson will be faster
- Kalman+PID gimbal control is standard aerospace engineering
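The Kalman+PID claim above is easy to make concrete. A minimal 1-D sketch (a real gimbal is multi-axis with a richer state model): a scalar Kalman filter smooths noisy IMU angle readings, and a PID computes the correction toward the path-following setpoint. Gains and noise parameters are illustrative placeholders, not tuned values.

```python
class ScalarKalman:
    """1-D Kalman filter for gimbal angle (constant-position model)."""
    def __init__(self, q: float = 1e-3, r: float = 1e-1):
        self.x, self.p = 0.0, 1.0   # state estimate and its variance
        self.q, self.r = q, r       # process and measurement noise
    def update(self, z: float) -> float:
        self.p += self.q                 # predict: variance grows
        k = self.p / (self.p + self.r)   # Kalman gain
        self.x += k * (z - self.x)       # correct toward measurement z
        self.p *= (1.0 - k)
        return self.x

class PID:
    """Textbook PID; output drives the gimbal actuator."""
    def __init__(self, kp: float, ki: float, kd: float, dt: float):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.i, self.prev = 0.0, 0.0
    def step(self, setpoint: float, measured: float) -> float:
        e = setpoint - measured
        self.i += e * self.dt
        d = (e - self.prev) / self.dt
        self.prev = e
        return self.kp * e + self.ki * self.i + self.kd * d
```

Per frame: `angle = kf.update(imu_reading)`, then `command = pid.step(target_angle, angle)`; the filtered estimate is what lets the PID compensate for UAV drift without chasing IMU noise.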
## Counterexamples
1. **VLM delay unacceptable**: If 20-40s batch VLM delay is unacceptable, could use Moondream's detect() API for faster binary yes/no (~2-5s for 0.5B) instead of full text generation. Or skip VLM entirely and rely on CNN + operator judgment.
2. **YOLOE-v8-seg accuracy lower than YOLOE-26-seg**: YOLOE-v8 is older architecture. YOLOE-26 should have better accuracy. Mitigation: use YOLOE-v8 for stable deployment now, switch to YOLOE-26 once TRT bugs are fixed.
3. **Model switching latency**: Loading/unloading TRT engines adds 2-3s each direction. For frequent VLM requests, this overhead accumulates. Mitigation: batch VLM requests, implement predictive pre-loading.
4. **Single-engine approach limits flexibility**: Merging YOLOE + existing YOLO into one engine may require re-exporting when classes change. Mitigation: use YOLOE re-parameterization — when classes are fixed, YOLOE becomes standard YOLO with zero overhead.
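The batching mitigation in counterexample 3 can be sketched as a queue plus a pause estimator. The timing constants are assumptions taken from the numbers above (2-3s per swap direction, several seconds per Moondream analysis), and the class and function names are hypothetical.

```python
from collections import deque

SWAP_COST_S = 2.5      # assumed TRT engine load/unload cost per direction (2-3s)
VLM_PER_ITEM_S = 5.0   # assumed per-detection Moondream analysis time

class VlmBatcher:
    """Queue ambiguous detections; trigger a VLM pass only when the batch is
    full (or the operator asks), so the two engine swaps are amortized."""
    def __init__(self, batch_size: int = 4):
        self.batch_size = batch_size
        self.queue: deque = deque()
    def add(self, detection) -> bool:
        """Returns True when the batch is full and a VLM pass should run."""
        self.queue.append(detection)
        return len(self.queue) >= self.batch_size
    def drain(self) -> list:
        batch = list(self.queue)
        self.queue.clear()
        return batch

def pause_estimate(n_items: int) -> float:
    """Estimated Level-1 pause: unload YOLO + analyze batch + reload YOLO."""
    return 2 * SWAP_COST_S + n_items * VLM_PER_ITEM_S
```

With these assumed constants, a batch of one costs 10s (half of it swap overhead), while a batch of four costs 25s with the same 5s of overhead spread across four detections, which is why batching rather than per-detection swapping keeps the 20-40s budget.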
## Review Checklist
- [x] Draft conclusions consistent with fact cards
- [x] No important dimensions missed
- [x] No over-extrapolation
- [x] Conclusions actionable/verifiable
- [x] Memory budget calculated from documented values
- [x] TRT deployment risk based on documented bugs
- [ ] Note: YOLOE-v8-seg TRT stability on Jetson not directly tested (inferred from YOLOv8 stability)
- [ ] Note: Moondream 0.5B accuracy for aerial concealment analysis is unknown
## Conclusions Requiring Revision
- VLM latency target must change from ≤5s to "background batch" (20-40s)
- Consider dropping VLM entirely for MVP and adding later when hardware/software matures
- YOLOE-26 should be replaced with YOLOE-v8 for initial deployment
- Memory architecture needs explicit budget table in solution draft
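Until the solution draft gets its explicit budget table, the figures already in this log can be checked mechanically. A minimal sketch using only values stated above, with the MobileNetV3 engine (~50MB) included and co-residency treated as the worst case; the variable names are illustrative.

```python
USABLE_GB = 5.2        # usable device memory stated above
YOLO_ENGINE_GB = 3.5   # combined YOLOE-v8-seg + detection engine (upper bound)
MOONDREAM_GB = 0.8     # Moondream-0.5B (816 MiB)
CNN_GB = 0.05          # MobileNetV3-Small TRT engine (~50MB)

# Worst case assumes the YOLO engine and Moondream are briefly co-resident
# during the swap; steady state is lower because one of them is unloaded.
peak_gb = YOLO_ENGINE_GB + MOONDREAM_GB + CNN_GB
headroom_gb = USABLE_GB - peak_gb
print(f"peak {peak_gb:.2f} GB, headroom {headroom_gb:.2f} GB")
```

The check still passes with the CNN engine counted, leaving roughly 0.85GB of headroom; any new resident model larger than that forces another demand-load scheme like the Moondream one.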