mirror of https://github.com/azaion/detections-semantic.git synced 2026-04-22 22:36:38 +00:00

Oleksandr Bezdieniezhnykh 8e2ecf50fd Initial commit

Made-with: Cursor

2026-03-26 00:20:30 +02:00

Validation Log

Validation Scenario

Same scenario as draft01: a winter reconnaissance flight at 700 m altitude over a forested area, now accounting for memory constraints, TRT bugs, and the revised VLM latency.

Expected Based on Revised Conclusions

Using the revised architecture (YOLOE-v8-seg, demand-loaded VLM, Kalman+PID gimbal):

Level 1 sweep begins. Single TRT engine running YOLOE-v8-seg (re-parameterized with fixed classes) + existing YOLO detection in shared engine context. Memory: ~3-3.5GB for combined engine. Inference: ~13ms (s-size).
YOLOE-v8-seg detects footpaths via text prompt ("footpath", "trail") + visual prompt (reference images of paths). Also detects "road", "tree row". CNN-specific concealment classes handled by visual prompts only.
The path segmentation mask is preprocessed: Gaussian blur → binary threshold → morphological closing → skeletonization → branch pruning. Endpoints are then extracted and 256×256 ROI crops are taken.
MobileNetV3-Small CNN classifies endpoints. Memory: ~50MB TRT engine. Total pipeline (mask preprocessing + skeleton + CNN): ~150ms.
High-confidence detection → operator alert with coordinates. Ambiguous detection (CNN confidence 30-70%) → queued for VLM analysis.
VLM analysis runs in background/batch mode: the scan controller continues the Level 1 sweep. When a batch of 3-5 ambiguous detections accumulates, or the operator requests deep analysis: pause YOLO TRT → unload engine → load Moondream-0.5B (816 MiB) → analyze batch → unload → reload YOLO TRT. Total pause: ~20-40s. The operator receives the analysis results with that delay.
Gimbal: Kalman filter fuses IMU data for state estimation → PID corrects → gimbal actuates. Path-following during Level 2 is smoother and compensates for UAV drift.
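
The gimbal step above (Kalman estimate, then PID correction) can be sketched in one dimension. This is a minimal illustration, not the project's controller: the names `ScalarKalman`, `PID`, and `track`, the noise values, the gains, the 50 Hz loop rate, and the idealized rate-commanded gimbal plant are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScalarKalman:
    """1-D Kalman filter tracking a single gimbal angle in degrees."""
    x: float = 0.0   # angle estimate
    p: float = 1.0   # estimate variance
    q: float = 0.01  # process noise (assumed value)
    r: float = 0.5   # measurement noise (assumed value)

    def predict(self, gyro_rate: float, dt: float) -> None:
        # Integrate the angular rate; uncertainty grows by the process noise.
        self.x += gyro_rate * dt
        self.p += self.q

    def update(self, measured_angle: float) -> float:
        # Blend in an absolute angle measurement using the Kalman gain.
        k = self.p / (self.p + self.r)
        self.x += k * (measured_angle - self.x)
        self.p *= 1.0 - k
        return self.x


@dataclass
class PID:
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: Optional[float] = None

    def step(self, error: float, dt: float) -> float:
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0  # no derivative on the first sample
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def track(target: float, steps: int = 200, dt: float = 0.02) -> float:
    """Toy loop: the PID output is treated as a commanded angular rate."""
    kf = ScalarKalman()
    pid = PID(kp=4.0, ki=0.1, kd=0.0)  # kd left at 0 in this toy loop
    true_angle, rate = 0.0, 0.0
    for _ in range(steps):
        true_angle += rate * dt            # idealized gimbal plant
        kf.predict(rate, dt)
        estimate = kf.update(true_angle)   # noise-free measurement for the sketch
        rate = pid.step(target - estimate, dt)
    return true_angle
```

Calling `track(10.0)` drives the simulated angle toward 10 degrees. On real hardware the measurement would come from the IMU and be noisy, the filter would usually track rate and bias as well as angle, and the gains would need tuning against the actual gimbal.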

Actual Validation Results

No real-world data is available for validation. The assessment is instead based on:

YOLOE-v8-seg TRT deployment on Jetson is expected to be stable, inferred from YOLOv8 stability (unlike YOLO26, which has documented TRT bugs)
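Memory budget: ~3.5GB (YOLO engine) + 0.8GB (Moondream) = 4.3GB peak during the VLM phase, within the 5.2GB usable. This is a worst-case bound that assumes both are resident at once; the planned sequence unloads the YOLO engine before loading Moondream.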
Moondream 0.5B is confirmed to run on a Raspberry Pi, so a Jetson should be faster
Kalman+PID gimbal control is standard practice in aerospace engineering
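
The memory arithmetic in the list above can be checked with a small budget helper. This is a sketch under stated assumptions: the figures are the ones quoted in this log (upper end of the combined-engine estimate, Moondream rounded to 0.8GB, the ~50MB CNN engine), while the function names, the component keys, and the `reserve_gb` safety margin are hypothetical and introduced only for illustration.

```python
# Figures quoted in this log, in GB; the names and structure are illustrative.
USABLE_GB = 5.2
COMPONENTS_GB = {
    "yolo_engine": 3.5,      # combined YOLOE-v8-seg + YOLO engine, upper estimate
    "moondream_0.5b": 0.8,   # Moondream-0.5B, rounded as in the log
    "cnn_engine": 0.05,      # MobileNetV3-Small TRT engine (~50MB)
}


def peak_gb(resident):
    """Total memory of the components resident at the same time."""
    return sum(COMPONENTS_GB[name] for name in resident)


def headroom_gb(resident):
    """Usable memory left once the given components are loaded."""
    return USABLE_GB - peak_gb(resident)


def fits(resident, reserve_gb=0.0):
    """True if the components fit while keeping reserve_gb free (assumed margin)."""
    return headroom_gb(resident) >= reserve_gb


# Phases to compare: normal sweep, VLM phase after unloading YOLO,
# and the worst case where nothing is unloaded.
PHASES = {
    "level1_sweep": ["yolo_engine", "cnn_engine"],
    "vlm_after_unload": ["moondream_0.5b"],
    "vlm_worst_case": ["yolo_engine", "moondream_0.5b"],
}

for phase, resident in PHASES.items():
    print(f"{phase}: peak={peak_gb(resident):.2f}GB "
          f"headroom={headroom_gb(resident):.2f}GB")
```

Keeping the figures in one table like this also makes it easy to see how the headroom changes if a measured value replaces an estimate.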

Counterexamples

VLM delay unacceptable: If the 20-40s batch VLM delay is unacceptable, Moondream's detect() API could provide a faster binary yes/no answer (~2-5s for 0.5B) instead of full text generation. Alternatively, skip the VLM entirely and rely on CNN + operator judgment.
YOLOE-v8-seg accuracy lower than YOLOE-26-seg: YOLOE-v8 is an older architecture, and YOLOE-26 should have better accuracy. Mitigation: use YOLOE-v8 for stable deployment now, and switch to YOLOE-26 once the TRT bugs are fixed.
Model switching latency: Loading/unloading TRT engines adds 2-3s in each direction. For frequent VLM requests, this overhead accumulates. Mitigation: batch VLM requests and implement predictive pre-loading.
Single-engine approach limits flexibility: Merging YOLOE + existing YOLO into one engine may require re-exporting when classes change. Mitigation: use YOLOE re-parameterization — when classes are fixed, YOLOE becomes standard YOLO with zero overhead.
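
The batching mitigation for switching latency can be sketched as a confidence router plus a small queue. The 30-70% ambiguous band, the batch size of 3-5, and the 2-3s per-direction switch cost come from this log; the names `route`, `VlmBatcher`, and `swap_overhead_per_item`, the `"ignore"` outcome for detections below 30%, and the treatment of one VLM session as exactly two switches are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional

LOW, HIGH = 0.30, 0.70  # ambiguous CNN confidence band from the log


def route(confidence: float) -> str:
    """Decide what happens to one CNN-classified endpoint."""
    if confidence > HIGH:
        return "alert"       # operator alert with coordinates
    if confidence >= LOW:
        return "vlm_queue"   # ambiguous: defer to batched VLM analysis
    return "ignore"          # assumption: the log does not specify this case


@dataclass
class VlmBatcher:
    """Collects ambiguous detections until a batch is worth an engine swap."""
    batch_size: int = 3
    pending: List[str] = field(default_factory=list)

    def add(self, detection_id: str,
            operator_request: bool = False) -> Optional[List[str]]:
        # Returns the batch to analyze when it is time to swap engines,
        # otherwise None so the Level 1 sweep keeps running.
        self.pending.append(detection_id)
        if operator_request or len(self.pending) >= self.batch_size:
            batch, self.pending = self.pending, []
            return batch
        return None


def swap_overhead_per_item(batch_size: int, per_direction_s: float = 3.0) -> float:
    """Switch cost amortized over one batch, assuming two switches per session."""
    return 2 * per_direction_s / batch_size
```

Under these assumptions a batch of 3 carries about 2s of switching overhead per detection at the 3s upper bound, compared with 6s when every ambiguous detection triggers its own swap.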

Review Checklist

Draft conclusions consistent with fact cards
No important dimensions missed
No over-extrapolation
Conclusions actionable/verifiable
Memory budget calculated from documented values
TRT deployment risk based on documented bugs
Note: YOLOE-v8-seg TRT stability on Jetson not directly tested (inferred from YOLOv8 stability)
Note: Moondream 0.5B accuracy for aerial concealment analysis is unknown

Conclusions Requiring Revision

VLM latency target must change from ≤5s to "background batch" (20-40s)
Consider dropping VLM entirely for MVP and adding later when hardware/software matures
YOLOE-26 should be replaced with YOLOE-v8 for initial deployment
Memory architecture needs explicit budget table in solution draft
