detections-semantic/_docs/01_solution/solution_draft02.md
Oleksandr Bezdieniezhnykh · Initial commit · 2026-03-26 00:20:30 +02:00
# Solution Draft
## Assessment Findings
| Old Component Solution | Weak Point (functional/security/performance) | New Solution |
|------------------------|----------------------------------------------|-------------|
| YOLOE-26-seg TRT engine | YOLO26 has confirmed TRT confidence misalignment and INT8 export crashes on Jetson (bugs #23841, Hackster.io report). YOLOE-26 inherits these bugs. | Use YOLOE-v8-seg for initial deployment (proven TRT stability). Transition to YOLOE-26 once Ultralytics fixes TRT issues. |
| Two separate TRT engines (existing YOLO + YOLOE-26) | Combined memory ~5-6GB exceeds usable 5.2GB VRAM. cuDNN overhead ~1GB per engine. | Single merged TRT engine: YOLOE-v8-seg re-parameterized with fixed classes merges into existing YOLO pipeline. One engine, one CUDA context. |
| UAV-VL-R1 (2B) via vLLM ≤5s | TRT-LLM does not support edge. 2B VLM: ~4.7 tok/s → 10-21s for useful response. VLM (2.5GB) cannot fit alongside YOLO in memory. | Moondream 0.5B (816 MiB INT4) as primary VLM. Demand-loaded: unload YOLO → load VLM → analyze batch → unload → reload YOLO. Background mode, not real-time. |
| Text prompts for concealment classes | Military concealment classes are far OOD from LVIS/COCO training data. "dugout", "camouflage netting" unlikely to work. | Visual prompts (SAVPE) primary for concealment. Text prompts only for in-distribution classes (footpath, road, trail). Multimodal fusion (text+visual) for robustness. |
| Zhang-Suen skeletonization raw | Noise-sensitive: spurious branches from noisy aerial segmentation masks. | Add preprocessing pipeline: Gaussian blur → threshold → morphological closing → skeletonization → branch pruning (remove < 20px branches). Increase ROI to 256×256. |
| PID-only gimbal control | PID cannot compensate UAV attitude drift and mounting errors during flight. | Kalman filter + PID cascade: Kalman estimates state from IMU → PID corrects error → gimbal actuates. |
| 1500 images/class in 8 weeks | Optimistic for military concealment data collection. Access constraints, annotation complexity. | 300-500 real + 1000+ synthetic (GenCAMO/CamouflageAnything) per class. Active learning loop from YOLOE zero-shot. |
| No security measures | Small edge YOLO models vulnerable to adversarial patches. Physical device capture risk. No data protection. | Three layers: PatchBlock adversarial defense, encrypted model weights at rest, auto-wipe on tamper. |
## Product Solution Description
A three-tier semantic detection system for identifying concealed/camouflaged positions from reconnaissance UAV aerial imagery, running on Jetson Orin Nano Super alongside the existing YOLO detection pipeline. Redesigned for the 5.2GB usable VRAM budget with demand-loaded VLM.
```
┌─────────────────────────────────────────────────────────────────────────┐
│                         JETSON ORIN NANO SUPER                          │
│                          (5.2 GB usable VRAM)                           │
│                                                                         │
│ ┌──────────┐    ┌──────────────────────┐    ┌───────────────────────┐   │
│ │ ViewPro  │───▶│ Tier 1               │───▶│ Tier 2                │   │
│ │ A40      │    │ Merged TRT Engine    │    │ Path Preprocessing    │   │
│ │ Camera   │    │ YOLOE-v8-seg         │    │ + Skeletonization     │   │
│ │          │    │ + Existing YOLO      │    │ + MobileNetV3-Small   │   │
│ │          │    │ ≤15ms                │    │ ≤200ms                │   │
│ └────▲─────┘    └──────────────────────┘    └───────────┬───────────┘   │
│      │                                                  │               │
│ ┌────┴─────┐    ┌──────────────┐                        │ ambiguous     │
│ │ Gimbal   │◀───│ Scan         │                        ▼               │
│ │ Kalman   │    │ Controller   │               ┌───────────────────┐    │
│ │ + PID    │    │ (L1/L2)      │               │ VLM Queue         │    │
│ └──────────┘    └──────────────┘               │ (batch when ≥3    │    │
│                                                │  or on demand)    │    │
│ ┌──────────────────────────────┐               └────────┬──────────┘    │
│ │ PatchBlock Adversarial       │                        │               │
│ │ Defense (CPU preprocessing)  │          [demand-load cycle]           │
│ └──────────────────────────────┘               ┌────────▼──────────┐    │
│                                                │ Tier 3            │    │
│                                                │ Moondream 0.5B    │    │
│                                                │ 816 MiB INT4      │    │
│                                                │ ~2-5s per image   │    │
│                                                └───────────────────┘    │
└─────────────────────────────────────────────────────────────────────────┘
```
The system operates in two scan levels:
- **Level 1 (Wide Sweep)**: Camera at medium zoom. Merged TRT engine runs YOLOE-v8-seg (visual + text prompts) and existing YOLO detection simultaneously. POIs queued by confidence.
- **Level 2 (Detailed Scan)**: Camera zooms into POI. Path preprocessing → skeletonization → endpoint CNN. High-confidence → immediate alert. Ambiguous → VLM queue.
- **VLM Batch Analysis**: When queue reaches 3+ detections or operator requests: scan pauses, YOLO engine unloads, Moondream loads, batch analyzes, unloads, YOLO reloads. ~30-45s total cycle.
Three submodules: (1) Semantic Detection AI, (2) Camera Gimbal Control, (3) Integration with existing detections service.
### Memory Budget
| Component | Mode | GPU Memory | Notes |
|-----------|------|-----------|-------|
| OS + System | Always | ~2.8 GB | From 8 GB total, leaves 5.2 GB usable |
| Merged TRT Engine (YOLOE-v8-seg + YOLO) | Detection mode | ~2.8 GB | Single engine, shared CUDA context |
| MobileNetV3-Small TRT (FP16) | Detection mode | ~50 MB | Tiny binary classifier |
| OpenCV + NumPy buffers | Always | ~200 MB | Frame buffers, masks |
| PatchBlock defense | Always | ~50 MB | CPU-based, minimal GPU |
| **Total in Detection Mode** | | **~3.1 GB** | **2.1 GB headroom** |
| Moondream 0.5B INT4 | VLM mode | ~816 MB | Demand-loaded |
| vLLM overhead + KV cache | VLM mode | ~500 MB | Minimal for 0.5B model |
| **Total in VLM Mode** | | **~1.6 GB** | **After unloading TRT engine** |
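As a sanity check, the per-mode totals above can be recomputed from the component figures (GB; the "always" components count in both modes):

```python
# Per-mode GPU budget check (GB); component figures taken from the table above.
always = {"opencv_numpy_buffers": 0.20, "patchblock": 0.05}
detection = {"merged_trt_engine": 2.80, "mobilenet_trt": 0.05, **always}
vlm = {"moondream_0_5b_int4": 0.816, "vllm_kv_cache": 0.50, **always}

usable = 5.2  # GB usable VRAM on the 8 GB Orin Nano Super
det_total = sum(detection.values())
vlm_total = sum(vlm.values())
assert det_total <= usable and vlm_total <= usable
print(round(det_total, 2), round(vlm_total, 2))  # ≈ 3.1 and ≈ 1.57
```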
## Architecture
### Component 1: Tier 1 — Real-Time Detection (YOLOE-v8-seg, merged engine)
| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|----------|-------|-----------|-------------|-------------|----------|------------|-----|
| **YOLOE-v8-seg re-parameterized (recommended)** | yoloe-v8s-seg.pt, Ultralytics, TensorRT FP16 | Proven TRT stability on Jetson. Zero inference overhead when re-parameterized. Visual+text multimodal fusion. Merges into existing YOLO engine. | Older architecture than YOLO26 (slightly lower base accuracy). | Ultralytics ≥8.4, TensorRT, JetPack 6.2 | PatchBlock CPU preprocessing | ~13ms FP16 (s-size) | **Best fit for stable deployment.** |
| YOLOE-26-seg (future upgrade) | yoloe-26s-seg.pt, TensorRT | Better accuracy (YOLO26 architecture). NMS-free. | Active TRT bugs on Jetson: confidence misalignment, INT8 crash. | Wait for Ultralytics fix | Same | ~7ms FP16 (estimated) | **Future upgrade when TRT bugs resolved.** |
| YOLO26-Seg custom-trained (production) | yolo26s-seg.pt fine-tuned | Highest accuracy for known classes. | Requires 1500+ annotated images/class. Same TRT bugs. | Custom dataset, GPU for training | Same | ~7ms FP16 | **Long-term production model.** |
**Prompt strategy (revised)**:
Text prompts (in-distribution classes only):
- `"footpath"`, `"trail"`, `"path"`, `"road"`, `"track"`
- `"tree row"`, `"tree line"`, `"clearing"`
Visual prompts (SAVPE, for concealment-specific detection):
- Reference images cropped from semantic01-04.png: branch piles, dark entrances, dugout structures
- Use multimodal fusion mode: `concat` (zero overhead)
```python
from ultralytics import YOLOE
import numpy as np

model = YOLOE("yoloe-v8s-seg.pt")

# Text prompts: in-distribution classes only
text_classes = ["footpath", "trail", "road", "tree row", "clearing"]
model.set_classes(text_classes, model.get_text_pe(text_classes))

# Visual prompt (SAVPE): (x1, y1, x2, y2) is a hideout bbox in the reference image;
# frame is the BGR ndarray from the camera. Per Ultralytics docs, visual-prompt
# inference may additionally require predictor=YOLOEVPSegPredictor.
results = model.predict(
    frame,
    conf=0.15,
    refer_image="reference_hideout.jpg",
    visual_prompts={"bboxes": np.array([[x1, y1, x2, y2]]), "cls": np.array([0])},
    fusion_mode="concat",
)
```
**Re-parameterization for production**: Once classes are fixed after training, re-parameterize YOLOE-v8 to standard YOLOv8 weights. This eliminates the open-vocabulary overhead entirely, and the model becomes a regular YOLO inference engine. Merge with existing YOLO detection into a single TRT engine using TensorRT's multi-model support or batch inference.
### Component 2: Tier 2 — Spatial Reasoning & CNN Confirmation
| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|----------|-------|-----------|-------------|-------------|----------|------------|-----|
| **Robust path tracing + CNN classifier (recommended)** | OpenCV, scikit-image, MobileNetV3-Small TRT | Preprocessing removes noise. Branch pruning eliminates artifacts. 256×256 ROI for better context. | Still depends on segmentation quality. | OpenCV, scikit-image, PyTorch → TRT | Offline inference | ~150ms total | **Best fit. Robust against noisy masks.** |
| GraphMorph centerline extraction | PyTorch, custom model | Topology-aware. Reduces false positives. | Requires additional model in memory. More complex integration. | PyTorch, custom training | Offline | ~200ms estimated | Upgrade path if basic approach fails |
| Heuristic rules only | OpenCV, NumPy | No training data. Immediate. | Brittle. Cannot generalize. | None | Offline | ~50ms | Baseline/fallback for day-1 |
**Revised path tracing pipeline**:
1. Take footpath segmentation mask from Tier 1
2. **Preprocessing**: Gaussian blur (σ=1.5) → binary threshold (Otsu) → morphological closing (5×5 kernel, 2 iterations) → remove small connected components (< 100px area)
3. Skeletonize using Zhang-Suen algorithm
4. **Branch pruning**: Remove skeleton branches shorter than 20 pixels (noise artifacts)
5. Detect endpoints using hit-miss morphological operations (8 kernel patterns)
6. Detect junctions using branch-point kernels
7. Trace path segments between junctions/endpoints
8. For each endpoint: extract **256×256** ROI crop centered on endpoint from original image
9. Feed ROI crop to MobileNetV3-Small binary classifier
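Steps 4-5 above can be sketched in pure NumPy (a simplified endpoint-to-junction trace rather than the hit-miss kernel formulation; `prune_branches` and its threshold are illustrative):

```python
import numpy as np

def _neighbors(skel, y, x):
    """8-connected skeleton neighbours of (y, x)."""
    y0, x0 = max(y - 1, 0), max(x - 1, 0)
    ys, xs = np.nonzero(skel[y0:y + 2, x0:x + 2])
    return [(yy + y0, xx + x0) for yy, xx in zip(ys, xs) if (yy + y0, xx + x0) != (y, x)]

def prune_branches(skel, min_len=20):
    """Delete skeleton branches shorter than min_len px (endpoint -> junction trace)."""
    skel = skel.astype(bool).copy()
    # endpoints: exactly one 8-connected neighbour (cf. step 5 of the pipeline)
    endpoints = [(y, x) for y, x in zip(*np.nonzero(skel))
                 if len(_neighbors(skel, y, x)) == 1]
    for ep in endpoints:
        path, prev, cur = [ep], None, ep
        while True:
            nxt = [p for p in _neighbors(skel, *cur) if p != prev]
            if len(nxt) != 1:            # reached a junction (>1) or a dead end (0)
                break
            prev, cur = cur, nxt[0]
            path.append(cur)
            if len(path) > min_len:      # long enough to be a real path segment
                break
        if len(path) <= min_len and len(_neighbors(skel, *cur)) >= 3:
            for y, x in path[:-1]:       # spur: erase it, keep the junction pixel
                skel[y, x] = False
    return skel
```

A 5 px spur hanging off a 50 px skeleton line is removed while both real endpoints of the line survive, which is the behaviour step 4 needs before endpoint classification.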
**Freshness assessment** (unchanged from draft01, validated approach):
- Edge sharpness, contrast ratio, fill ratio, path width consistency
- Initial hand-tuned thresholds → Random Forest with annotated data
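A minimal sketch of the hand-tuned metric stage (feature names follow the bullet above; the exact definitions and any thresholds are assumptions to be replaced by the Random Forest):

```python
import numpy as np

def freshness_features(roi_gray, path_mask):
    """Freshness cues for a path ROI: edge sharpness, contrast, fill, width consistency."""
    roi = roi_gray.astype(float)
    gy, gx = np.gradient(roi)
    grad = np.hypot(gx, gy)
    on = path_mask > 0
    edge_sharpness = float(grad[on].mean()) if on.any() else 0.0
    inside = roi[on].mean() if on.any() else 0.0
    outside = roi[~on].mean() if (~on).any() else 1.0
    contrast_ratio = float(inside / max(outside, 1e-6))
    fill_ratio = float(on.mean())
    widths = on.sum(axis=0)                      # per-column path width in px
    w = widths[widths > 0]
    width_cv = float(w.std() / max(w.mean(), 1e-6)) if w.size else 0.0
    return {"edge_sharpness": edge_sharpness, "contrast_ratio": contrast_ratio,
            "fill_ratio": fill_ratio, "width_cv": width_cv}
```

Fresh, heavily used paths should score high on edge sharpness and contrast with low width variation; the dict feeds directly into the later Random Forest.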
### Component 3: Tier 3 — VLM Deep Analysis (Background Batch Mode)
| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|----------|-------|-----------|-------------|-------------|----------|------------|-----|
| **Moondream 0.5B INT4 demand-loaded (recommended)** | Moondream, ONNX/PyTorch, INT4 | 816 MiB memory. Built-in detect()/point() APIs. Runs on Raspberry Pi. | Weaker reasoning than 2B models. Not aerial-specialized. | ONNX Runtime or PyTorch | Local only | ~2-5s per image (0.5B) | **Best fit for memory-constrained edge.** |
| SmolVLM2-500M | HuggingFace, ONNX | 1.8GB. Small. ONNX support. | Less capable than Moondream for detection. No detect() API. | ONNX Runtime | Local only | ~3-7s estimated | Alternative if Moondream underperforms |
| UAV-VL-R1 (2B) demand-loaded | vLLM, W4A16 | Aerial-specialized. Best reasoning for UAV imagery. | 2.5GB INT8. ~10-21s per analysis. Tight memory fit. | vLLM, W4A16 weights | Local only | ~10-21s | **Upgrade path if Moondream insufficient.** |
| No VLM | N/A | Simplest. Most memory. Zero latency impact. | No fallback for ambiguous CNN outputs. No explanations. | None | N/A | N/A | **Viable MVP if Tier 1+2 accuracy is sufficient.** |
**Demand-loading protocol**:
```
1. VLM queue reaches threshold (≥3 detections or operator request)
2. Scan controller transitions to HOLD state (camera fixed position)
3. Signal main process to unload TRT engine
4. Wait for GPU memory release (~1s)
5. Launch VLM process: load Moondream 0.5B INT4
6. Process all queued detections sequentially (~2-5s each)
7. Collect results, send to operator
8. Unload VLM, release GPU memory
9. Reload TRT engine (~2s)
10. Resume scan from HOLD position
Total cycle: ~30-45s for 3-5 detections
```
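The protocol above can be expressed as a small state machine; the load/unload hooks below are placeholders for the real TRT and Moondream calls:

```python
from enum import Enum, auto

class VlmState(Enum):
    IDLE = auto()
    LOADING = auto()
    ANALYZING = auto()
    UNLOADING = auto()

class VlmManager:
    """Demand-load cycle as a state machine; hooks stand in for real engine calls."""
    def __init__(self, threshold=3):
        self.state = VlmState.IDLE
        self.queue = []
        self.threshold = threshold

    def enqueue(self, detection):
        """Queue an ambiguous detection; True when a batch cycle should start."""
        self.queue.append(detection)
        return self.state is VlmState.IDLE and len(self.queue) >= self.threshold

    def run_cycle(self, unload_trt, load_vlm, analyze, unload_vlm, reload_trt):
        """Steps 3-9 of the protocol: swap engines, analyze the batch, swap back."""
        self.state = VlmState.LOADING
        unload_trt()                     # free GPU memory held by the TRT engine
        vlm = load_vlm()                 # e.g. Moondream 0.5B INT4
        self.state = VlmState.ANALYZING
        results = [analyze(vlm, d) for d in self.queue]
        self.queue.clear()
        self.state = VlmState.UNLOADING
        unload_vlm(vlm)
        reload_trt()
        self.state = VlmState.IDLE
        return results
```

The scan controller's HOLD transition (step 2) and memory-release wait (step 4) hang off the LOADING state in the real implementation.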
**VLM prompting strategy** (adapted for Moondream's capabilities):
Using detect() API for fast binary check:
```python
model.detect(image, "concealed military position")
model.detect(image, "dugout covered with branches")
```
Using caption for detailed analysis:
```python
model.caption(image, length="normal")
```
### Component 4: Camera Gimbal Control
| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|----------|-------|-----------|-------------|-------------|----------|------------|-----|
| **Kalman+PID cascade with ViewLink (recommended)** | pyserial, ViewLink V3.3.3, filterpy (Kalman), servopilot (PID) | Compensates UAV attitude drift. Proven in aerospace. Smooth path-following. | More complex than PID-only. Requires IMU data feed. | ViewPro A40, pyserial, IMU data access | Physical only | <10ms command latency | **Best fit. Flight-grade control.** |
| PID-only with ViewLink | pyserial, ViewLink V3.3.3, servopilot | Simple. Works for hovering UAV. | Drifts during flight. Cannot compensate mounting errors. | ViewPro A40, pyserial | Physical only | <10ms | Acceptable for testing only |
**Revised control architecture**:
```
UAV IMU Data ──▶ Kalman Filter ──▶ State Estimate (attitude, angular velocity)
Camera Frame ──▶ Detection ──▶ Target Position ──▶ Error Calculation
                                                           │
State Estimate ───────────────────────────────────────────▶│
                                                           ▼
                                                    PID Controller
                                                           │
                                                           ▼
                                                    Gimbal Command
                                                           │
                                                           ▼
                                                    ViewLink Serial
```
Kalman filter state vector: [yaw, pitch, yaw_rate, pitch_rate]
Measurement inputs: IMU gyroscope (yaw_rate, pitch_rate), detection-derived angles
Process model: constant angular velocity with noise
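Under the constant-angular-velocity model above, the cascade reduces to a standard linear Kalman filter feeding a per-axis PID loop. A sketch (gains and noise covariances are placeholders to be tuned on hardware):

```python
import numpy as np

class GimbalKalman:
    """Linear KF over state [yaw, pitch, yaw_rate, pitch_rate]."""
    def __init__(self, dt, q=1e-3, r=1e-2):
        self.x = np.zeros(4)
        self.P = np.eye(4)
        self.F = np.eye(4)
        self.F[0, 2] = self.F[1, 3] = dt      # constant angular velocity model
        self.Q = q * np.eye(4)                # process noise
        self.R = r * np.eye(4)                # measurement noise (IMU + detection)

    def step(self, z):
        """z = [yaw, pitch] from detection + [yaw_rate, pitch_rate] from IMU gyro."""
        # predict
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        # update (H = I: full state measured, so the gain simplifies)
        K = self.P @ np.linalg.inv(self.P + self.R)
        self.x = self.x + K @ (z - self.x)
        self.P = (np.eye(4) - K) @ self.P
        return self.x

class AxisPID:
    """Per-axis PID acting on the Kalman-filtered pointing error."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_err = 0.0

    def step(self, err):
        self.integral += err * self.dt
        deriv = (err - self.prev_err) / self.dt
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv
```

Because the filter already suppresses measurement noise, the PID gains can be less aggressive than in the PID-only design, which is the point of the cascade.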
**Scan patterns** (unchanged from draft01): sinusoidal yaw oscillation, POI queue management.
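For reference, the Level-1 sweep from draft01 amounts to a sinusoidal yaw setpoint (amplitude and period here are illustrative defaults, not values from draft01):

```python
import math

def sweep_yaw(t, center_deg=0.0, amplitude_deg=45.0, period_s=8.0):
    """Yaw setpoint (degrees) for the Level-1 wide sweep at time t (seconds)."""
    return center_deg + amplitude_deg * math.sin(2.0 * math.pi * t / period_s)
```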
**Path-following** (revised): Kalman-filtered state estimate provides smoother tracking. PID gains can be lower (less aggressive) because state estimate is already stabilized. Update rate: tied to detection frame rate.
### Component 5: Integration with Existing Detections Service
| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|----------|-------|-----------|-------------|-------------|----------|------------|-----|
| **Single merged Cython+TRT process + demand-loaded VLM (recommended)** | Cython, TensorRT, ONNX Runtime | Single TRT engine. Minimal memory. VLM isolated. | VLM loading pauses detection (30-45s). | Cython extensions, process management | Process isolation + encryption | Minimal overhead | **Best fit for 5.2GB VRAM.** |
**Revised integration architecture**:
```
┌───────────────────────────────────────────────────────────────────┐
│                    Main Process (Cython + TRT)                    │
│                                                                   │
│  ┌──────────────────────────────────────────────┐                 │
│  │ Single Merged TRT Engine                     │                 │
│  │ ├─ Existing YOLO Detection heads             │                 │
│  │ ├─ YOLOE-v8-seg (re-parameterized)           │                 │
│  │ └─ MobileNetV3-Small classifier              │                 │
│  └──────────────────────────────────────────────┘                 │
│                                                                   │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────────────┐ │
│  │ Path Tracing │  │ Scan         │  │ PatchBlock Defense       │ │
│  │ + Skeleton   │  │ Controller   │  │ (CPU parallel)           │ │
│  │ (CPU)        │  │ + Kalman+PID │  │                          │ │
│  └──────────────┘  └──────────────┘  └──────────────────────────┘ │
│                                                                   │
│  ┌──────────────────────────────────────────────┐                 │
│  │ VLM Manager                                  │                 │
│  │ state: IDLE | LOADING | ANALYZING | UNLOAD   │                 │
│  │ queue: [detection_1, detection_2, ...]       │                 │
│  └──────────────────────────────────────────────┘                 │
└───────────────────────────────────────────────────────────────────┘

VLM mode (demand-loaded, replaces TRT engine temporarily):

┌───────────────────────────────────────────────────────────────────┐
│  ┌──────────────────────────────────────────────┐                 │
│  │ Moondream 0.5B INT4                          │                 │
│  │ (ONNX Runtime or PyTorch)                    │                 │
│  └──────────────────────────────────────────────┘                 │
│  Detection paused. Camera in HOLD state.                          │
└───────────────────────────────────────────────────────────────────┘
```
**Data flow** (revised):
1. PatchBlock preprocesses frame on CPU (parallel with GPU inference)
2. Cleaned frame → merged TRT engine → YOLO detections + YOLOE-v8 semantic detections
3. Semantic detections → path preprocessing → skeletonization → endpoint extraction → CNN
4. High-confidence → operator alert (coordinates + bounding box + confidence)
5. Ambiguous → VLM queue
6. VLM queue management: batch-process when queue ≥ 3 or operator triggers
7. During VLM mode: detection paused, camera holds, operator notified of pause
**GPU scheduling** (revised): No concurrent multi-model GPU sharing. Single TRT engine runs during detection mode. VLM demand-loaded exclusively during analysis mode. This eliminates the 10-40% latency jitter from GPU sharing.
### Component 6: Security
| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|----------|-------|-----------|-------------|-------------|----------|------------|-----|
| **Three-layer security (recommended)** | PatchBlock, LUKS/dm-crypt, tmpfs | Adversarial defense + model protection + data protection | Adds ~5ms CPU overhead for PatchBlock | PatchBlock library, Linux crypto | Full stack | Minimal GPU impact | **Required for military edge deployment.** |
**Layer 1: Adversarial Input Defense**
- PatchBlock CPU preprocessing on every frame before GPU inference
- Detects anomalous patches via outlier detection and dimensionality reduction
- Recovers up to 77% accuracy under adversarial attack
- Runs in parallel with GPU inference (no latency addition to pipeline)
**Layer 2: Model & Weight Protection**
- TRT engine files encrypted at rest using LUKS on a dedicated partition
- At boot: decrypt into tmpfs (RAM disk) — never written to persistent storage unencrypted
- Secure boot chain via Jetson's secure boot (fuse-based, hardware root of trust)
- If device is captured powered-off: encrypted models, no plaintext weights accessible
**Layer 3: Operational Data Protection**
- Captured imagery stored in encrypted circular buffer (last N minutes only)
- Detection logs (coordinates, confidence, timestamps) encrypted at rest
- Over datalink: transmit only coordinates + confidence + small thumbnail (not raw frames)
- Tamper detection: if enclosure opened or unauthorized boot detected → auto-wipe keys + detection logs
## Training & Data Strategy (Revised)
### Phase 1: Zero-shot (Week 1-2)
- Deploy YOLOE-v8-seg with multimodal prompts (text for paths, visual for concealment)
- Use semantic01-04.png as visual prompt references via SAVPE
- Tune confidence thresholds per class type
- Collect false positive/negative data for annotation
- **Benchmark YOLOE-v8-seg TRT on Jetson: confirm inference time, memory, stability**
### Phase 2: Annotation & Fine-tuning (Week 3-8)
- Annotate collected real data (target: 300-500 images/class)
- **Generate 1000+ synthetic images per class using GenCAMO/CamouflageAnything**
- Priority: footpaths (segmentation) → branch piles (bboxes) → entrances (bboxes)
- Active learning: YOLOE zero-shot flags candidates → human reviews → annotates
- Fine-tune YOLOv8-Seg (or YOLO26-Seg if TRT fixed) on real + synthetic dataset
- Use linear probing first, then full fine-tuning
### Phase 3: CNN classifier (Week 4-8, parallel with Phase 2)
- Train MobileNetV3-Small on ROI crops: 256×256 from endpoint analysis
- Positive: annotated concealed positions + synthetic. Negative: natural termini, random terrain
- Target: 200+ real positive + 500+ synthetic positive, 1000+ negative
- Export to TensorRT FP16
### Phase 4: VLM integration (Week 8-12)
- Deploy Moondream 0.5B INT4 in demand-loaded mode
- Test demand-load cycle timing: measure unload → load → infer → unload → reload
- Tune detect() prompts and caption prompts on collected ambiguous cases
- **If Moondream accuracy insufficient: test UAV-VL-R1 (2B) demand-loaded**
- **If YOLO26 TRT bugs fixed: test YOLOE-26-seg as Tier 1 upgrade**
### Phase 5: Seasonal expansion (Month 3+)
- Winter data → spring/summer annotation campaigns
- Re-train all models with multi-season data + seasonal synthetic augmentation
## Testing Strategy
### Integration / Functional Tests
- YOLOE-v8-seg multimodal prompt detection on reference images — verify text+visual fusion
- **TRT engine stability test**: 1000 consecutive inferences without confidence drift
- Path preprocessing pipeline on synthetic noisy masks — verify cleaning + skeletonization
- Branch pruning: verify short spurious branches removed, real path branches preserved
- CNN classifier on known positive/negative 256×256 ROI crops
- **Demand-load VLM cycle**: measure timing of unload TRT → load Moondream → infer → unload → reload TRT
- **Memory monitoring during demand-load**: confirm no memory leak across 10+ cycles
- Kalman+PID gimbal control with simulated IMU data — verify drift compensation
- Full pipeline: frame → PatchBlock → YOLOE-v8 → path tracing → CNN → (VLM) → alert
- Scan controller: Level 1 → Level 2 → HOLD (for VLM) → resume Level 1
### Non-Functional Tests
- Tier 1 latency: YOLOE-v8-seg TRT FP16 ≤15ms on Jetson Orin Nano Super
- Tier 2 latency: preprocessing + skeletonization + CNN ≤200ms
- **VLM demand-load cycle: ≤45s for 3 detections (including load/unload overhead)**
- **Memory profiling: peak detection mode ≤3.5GB GPU, peak VLM mode ≤2.0GB GPU**
- Thermal stress test: 30+ minutes continuous detection without thermal throttling
- PatchBlock adversarial test: inject test adversarial patches, measure accuracy recovery
- False positive/negative rate on annotated reference images
- Gimbal path-following accuracy with and without Kalman filter (measure improvement)
- **Demand-load memory leak test: 50+ VLM cycles without memory growth**
## References
- YOLOE-v8 docs: https://docs.ultralytics.com/models/yoloe/
- YOLOE-26 paper: https://arxiv.org/abs/2602.00168
- YOLO26 TRT confidence bug: https://www.hackster.io/qwe018931/pushing-limits-yolov8-vs-v26-on-jetson-orin-nano-b89267
- YOLO26 INT8 crash: https://github.com/ultralytics/ultralytics/issues/23841
- YOLOE multimodal fusion: https://github.com/ultralytics/ultralytics/pull/21966
- Jetson Orin Nano Super memory: https://forums.developer.nvidia.com/t/jetson-orin-nano-super-insufficient-gpu-memory/330777
- Multi-model survey on Orin Nano: https://dev.to/ankk98/multi-model-ai-resource-allocation-for-humanoid-robots-a-survey-on-jetson-orin-nano-super-310i
- TRT multiple engines: https://github.com/NVIDIA/TensorRT/issues/4358
- TRT memory on Jetson: https://github.com/ultralytics/ultralytics/issues/21562
- Moondream: https://moondream.ai/blog/introducing-moondream-0-5b
- Cosmos-Reason2-2B Jetson benchmark: https://www.thenextgentechinsider.com/pulse/cosmos-reason2-runs-on-jetson-orin-nano-super-with-w4a16-quantization
- Jetson AI Lab benchmarks: https://www.jetson-ai-lab.com/tutorials/genai-benchmarking/
- Jetson LLM bottleneck: https://ericxliu.me/posts/benchmarking-llms-on-jetson-orin-nano/
- vLLM on Jetson: https://learnopencv.com/deployment-on-edge-vllm-on-jetson/
- TRT-LLM no edge support: https://github.com/NVIDIA/TensorRT-LLM/issues/7978
- PatchBlock defense: https://arxiv.org/abs/2601.00367
- Adversarial patches on YOLO: https://link.springer.com/article/10.1007/s10207-025-01067-3
- GenCAMO synthetic data: https://arxiv.org/abs/2601.01181
- CamouflageAnything (CVPR 2025): https://openaccess.thecvf.com/content/CVPR2025/html/Das_Camouflage_Anything_...
- GraphMorph centerlines: https://arxiv.org/pdf/2502.11731
- Learnable skeleton + SAM: https://ui.adsabs.harvard.edu/abs/2025ITGRS..63S1458X
- Kalman filter gimbal: https://ieeexplore.ieee.org/ielx7/6287639/10005208/10160027.pdf
- UAV-VL-R1: https://arxiv.org/pdf/2508.11196
- ViewPro Protocol: https://www.viewprotech.com/index.php?ac=article&at=read&did=510
- servopilot PID: https://pypi.org/project/servopilot/
## Related Artifacts
- AC assessment: `_docs/00_research/00_ac_assessment.md`
- Previous draft: `_docs/01_solution/solution_draft01.md`