mirror of
https://github.com/azaion/detections-semantic.git
synced 2026-04-22 11:16:37 +00:00
Initial commit
Made-with: Cursor
# Validation Log
## Validation Scenario
Same scenario as draft01: a winter reconnaissance flight at 700 m altitude over a forested area, but now accounting for memory constraints, TRT bugs, and the revised VLM latency.
## Expected Based on Revised Conclusions
**Using the revised architecture (YOLOE-v8-seg, demand-loaded VLM, Kalman+PID gimbal):**
1. Level 1 sweep begins. Single TRT engine running YOLOE-v8-seg (re-parameterized with fixed classes) + existing YOLO detection in shared engine context. Memory: ~3-3.5GB for combined engine. Inference: ~13ms (s-size).
2. YOLOE-v8-seg detects footpaths via text prompt ("footpath", "trail") + visual prompt (reference images of paths). Also detects "road", "tree row". CNN-specific concealment classes handled by visual prompts only.
3. Path segmentation mask preprocessed: Gaussian blur → binary threshold → morphological closing → skeletonization → branch pruning. Endpoints extracted. 256×256 ROI crops.
4. MobileNetV3-Small CNN classifies endpoints. Memory: ~50MB TRT engine. Total pipeline (mask preprocessing + skeleton + CNN): ~150ms.
5. High-confidence detection → operator alert with coordinates. Ambiguous detection (CNN 30-70%) → queued for VLM analysis.
6. VLM analysis is **background/batch mode**: Scan controller continues Level 1 sweep. When a batch of 3-5 ambiguous detections accumulates or operator requests deep analysis: pause YOLO TRT → unload engine → load Moondream-0.5B (816 MiB) → analyze batch → unload → reload YOLO TRT. Total pause: ~20-40s. Operator receives delayed analysis results.
7. Gimbal: Kalman filter fuses IMU data for state estimation → PID corrects → gimbal actuates. Path-following during Level 2 is smoother, compensates for UAV drift.
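Step 3's endpoint extraction reduces to finding skeleton pixels with exactly one 8-connected neighbor. A minimal NumPy sketch (function name hypothetical; the real pipeline would run this on the pruned skeleton before cropping the 256×256 ROIs):

```python
import numpy as np

def skeleton_endpoints(skel: np.ndarray) -> list[tuple[int, int]]:
    """Return (row, col) of skeleton pixels with exactly one 8-connected
    neighbor, i.e. path endpoints. `skel` is a binary {0, 1} array."""
    skel = (skel > 0).astype(np.uint8)
    padded = np.pad(skel, 1)
    # Sum of the 8 neighbors of every pixel via shifted views of the pad.
    neighbors = sum(
        padded[1 + dr : padded.shape[0] - 1 + dr,
               1 + dc : padded.shape[1] - 1 + dc]
        for dr in (-1, 0, 1) for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    ys, xs = np.nonzero((skel == 1) & (neighbors == 1))
    return list(zip(ys.tolist(), xs.tolist()))
```

On a straight 3-pixel skeleton segment this returns its two end pixels; junction and interior pixels have two or more neighbors and are skipped.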
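The routing logic of steps 5 and 6 (alert / queue for VLM / discard, with a batch trigger) can be sketched as follows; the 0.30-0.70 ambiguity band and batch size of 3-5 come from the text, while the class and constant names are hypothetical:

```python
from dataclasses import dataclass, field

AMBIGUOUS_LO, AMBIGUOUS_HI = 0.30, 0.70   # CNN confidence band from the log
VLM_BATCH_SIZE = 4                        # mid-range of the 3-5 trigger

@dataclass
class DetectionRouter:
    vlm_queue: list = field(default_factory=list)

    def route(self, detection, confidence: float) -> str:
        if confidence >= AMBIGUOUS_HI:
            return "alert"        # high confidence: operator alert with coordinates
        if confidence >= AMBIGUOUS_LO:
            self.vlm_queue.append(detection)
            # Enough ambiguous detections accumulated: trigger a batch VLM pass.
            return "vlm_batch" if len(self.vlm_queue) >= VLM_BATCH_SIZE else "queued"
        return "discard"          # low confidence: drop silently
```

An operator "deep analysis" request would simply force the `vlm_batch` path regardless of queue length.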
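Step 7's PID stage, in isolation, is the textbook loop below. Gains, time step, and the toy integrator plant are illustrative, not tuned for any gimbal; the real controller would act on the Kalman state estimate rather than a raw angle:

```python
class PID:
    """Textbook PID controller; gains here are illustrative only."""
    def __init__(self, kp: float, ki: float, kd: float, dt: float):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint: float, measured: float) -> float:
        error = setpoint - measured
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Toy plant: the gimbal angle integrates the commanded rate.
pid = PID(kp=2.0, ki=0.5, kd=0.1, dt=0.02)
angle = 0.0
for _ in range(1000):                       # 20 s of simulated time
    angle += pid.update(10.0, angle) * 0.02  # drive angle toward 10 deg
```

With these gains the loop is overdamped and settles on the 10-degree setpoint within the simulated window.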
## Actual Validation Results
Cannot validate against real-world data; validation is instead based on:
- YOLOE-v8-seg TRT deployment on Jetson is proven stable (unlike YOLO26)
- Memory budget: ~3.5GB (YOLO engine) + 0.8GB (Moondream) = 4.3GB peak during VLM phase, within 5.2GB usable
- Moondream 0.5B is confirmed to run on a Raspberry Pi, so the Jetson will be faster
- Kalman+PID gimbal control is standard aerospace engineering
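The memory-budget bullet above can be checked directly. A worst-case sketch, assuming both large models are briefly resident at once during the swap, using the figures quoted in this log (all GB):

```python
# Figures from the log, in GB.
yolo_engine = 3.5     # combined YOLOE-v8-seg + detection TRT engine (upper bound)
moondream_05b = 0.8   # Moondream-0.5B weights (816 MiB)
usable = 5.2          # stated usable device memory

peak = yolo_engine + moondream_05b   # worst case during the engine swap
headroom = usable - peak             # margin left for activations, buffers, OS
```

The ~50 MB MobileNetV3-Small engine fits comfortably inside the remaining headroom.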
## Counterexamples
1. **VLM delay unacceptable**: If 20-40s batch VLM delay is unacceptable, could use Moondream's detect() API for faster binary yes/no (~2-5s for 0.5B) instead of full text generation. Or skip VLM entirely and rely on CNN + operator judgment.
2. **YOLOE-v8-seg accuracy lower than YOLOE-26-seg**: YOLOE-v8 is older architecture. YOLOE-26 should have better accuracy. Mitigation: use YOLOE-v8 for stable deployment now, switch to YOLOE-26 once TRT bugs are fixed.
3. **Model switching latency**: Loading/unloading TRT engines adds 2-3s each direction. For frequent VLM requests, this overhead accumulates. Mitigation: batch VLM requests, implement predictive pre-loading.
4. **Single-engine approach limits flexibility**: Merging YOLOE + existing YOLO into one engine may require re-exporting when classes change. Mitigation: use YOLOE re-parameterization — when classes are fixed, YOLOE becomes standard YOLO with zero overhead.
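The batching mitigation in counterexample 3 can be made concrete: the round-trip swap cost is amortized over the batch while per-item analysis time is not. The swap figure below is the midpoint of the stated 2-3 s per direction; the per-item analysis time is an assumption for illustration:

```python
def per_detection_overhead(batch_size: int, swap_s: float = 5.0,
                           analyze_s: float = 6.0) -> float:
    """Seconds of paused sweep time attributable to each ambiguous detection.
    swap_s: round-trip engine load/unload (~2.5 s each direction, assumed).
    analyze_s: per-item VLM analysis time (illustrative assumption)."""
    return swap_s / batch_size + analyze_s
```

At batch size 1 the swap dominates (11 s per detection with these numbers); at batch size 5 it shrinks to 7 s, which is why the design batches rather than swapping per detection.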
## Review Checklist
- [x] Draft conclusions consistent with fact cards
- [x] No important dimensions missed
- [x] No over-extrapolation
- [x] Conclusions actionable/verifiable
- [x] Memory budget calculated from documented values
- [x] TRT deployment risk based on documented bugs
- [ ] Note: YOLOE-v8-seg TRT stability on Jetson not directly tested (inferred from YOLOv8 stability)
- [ ] Note: Moondream 0.5B accuracy for aerial concealment analysis is unknown
## Conclusions Requiring Revision
- VLM latency target must change from ≤5s to "background batch" (20-40s)
- Consider dropping VLM entirely for MVP and adding later when hardware/software matures
- YOLOE-26 should be replaced with YOLOE-v8 for initial deployment
- Memory architecture needs explicit budget table in solution draft