# Solution Draft

## Assessment Findings

| Old Component Solution | Weak Point (functional/security/performance) | New Solution |
|------------------------|----------------------------------------------|--------------|
| YOLOE-26-seg TRT engine | YOLO26 has confirmed TRT confidence misalignment and INT8 export crashes on Jetson (bug #23841, Hackster.io report). YOLOE-26 inherits these bugs. | Use YOLOE-v8-seg for initial deployment (proven TRT stability). Transition to YOLOE-26 once Ultralytics fixes the TRT issues. |
| Two separate TRT engines (existing YOLO + YOLOE-26) | Combined memory of ~5-6 GB exceeds the usable 5.2 GB VRAM; cuDNN overhead is ~1 GB per engine. | Single merged TRT engine: YOLOE-v8-seg re-parameterized with fixed classes merges into the existing YOLO pipeline. One engine, one CUDA context. |
| UAV-VL-R1 (2B) via vLLM ≤5 s | TRT-LLM does not support edge devices. A 2B VLM runs at ~4.7 tok/s → 10-21 s for a useful response, and the VLM (2.5 GB) cannot fit alongside YOLO in memory. | Moondream 0.5B (816 MiB INT4) as the primary VLM. Demand-loaded: unload YOLO → load VLM → analyze batch → unload → reload YOLO. Background mode, not real-time. |
| Text prompts for concealment classes | Military concealment classes are far out-of-distribution for LVIS/COCO training data; "dugout" and "camouflage netting" are unlikely to work. | Visual prompts (SAVPE) as primary for concealment. Text prompts only for in-distribution classes (footpath, road, trail). Multimodal fusion (text+visual) for robustness. |
| Raw Zhang-Suen skeletonization | Noise-sensitive: spurious branches from noisy aerial segmentation masks. | Add a preprocessing pipeline: Gaussian blur → threshold → morphological closing → skeletonization → branch pruning (remove branches < 20 px). Increase ROI to 256×256. |
| PID-only gimbal control | PID alone cannot compensate for UAV attitude drift and mounting errors during flight. | Kalman filter + PID cascade: the Kalman filter estimates state from the IMU → PID corrects the error → the gimbal actuates. |
| 1500 images/class in 8 weeks | Optimistic for military concealment data collection given access constraints and annotation complexity. | 300-500 real + 1000+ synthetic images (GenCAMO/CamouflageAnything) per class. Active learning loop seeded from YOLOE zero-shot. |
| No security measures | Small edge YOLO models are vulnerable to adversarial patches; physical device capture risk; no data protection. | Three layers: PatchBlock adversarial defense, model weights encrypted at rest, auto-wipe on tamper. |

## Product Solution Description

A three-tier semantic detection system for identifying concealed/camouflaged positions in reconnaissance UAV aerial imagery, running on a Jetson Orin Nano Super alongside the existing YOLO detection pipeline. Redesigned for the 5.2 GB usable VRAM budget with a demand-loaded VLM.

```
┌─────────────────────────────────────────────────────────────────────────┐
│                         JETSON ORIN NANO SUPER                          │
│                          (5.2 GB usable VRAM)                           │
│                                                                         │
│  ┌──────────┐    ┌──────────────────────┐    ┌───────────────────────┐  │
│  │ ViewPro  │───▶│ Tier 1               │───▶│ Tier 2                │  │
│  │ A40      │    │ Merged TRT Engine    │    │ Path Preprocessing    │  │
│  │ Camera   │    │ YOLOE-v8-seg         │    │ + Skeletonization     │  │
│  │          │    │ + Existing YOLO      │    │ + MobileNetV3-Small   │  │
│  │          │    │ ≤15 ms               │    │ ≤200 ms               │  │
│  └────▲─────┘    └──────────────────────┘    └───────────┬───────────┘  │
│       │                                                  │              │
│  ┌────┴─────┐    ┌──────────────┐                        │ ambiguous    │
│  │ Gimbal   │◀───│ Scan         │                        ▼              │
│  │ Kalman   │    │ Controller   │             ┌───────────────────┐     │
│  │ + PID    │    │ (L1/L2)      │             │ VLM Queue         │     │
│  └──────────┘    └──────────────┘             │ (batch when ≥3    │     │
│                                               │  or on demand)    │     │
│  ┌──────────────────────────────┐             └────────┬──────────┘     │
│  │ PatchBlock Adversarial       │                      │                │
│  │ Defense (CPU preprocessing)  │            [demand-load cycle]        │
│  └──────────────────────────────┘             ┌────────▼──────────┐     │
│                                               │ Tier 3            │     │
│                                               │ Moondream 0.5B    │     │
│                                               │ 816 MiB INT4      │     │
│                                               │ ~2-5 s per image  │     │
│                                               └───────────────────┘     │
└─────────────────────────────────────────────────────────────────────────┘
```

The system operates at two scan levels:

- **Level 1 (Wide Sweep)**: camera at medium zoom. The merged TRT engine runs YOLOE-v8-seg (visual + text prompts) and the existing YOLO detection simultaneously. POIs are queued by confidence.
- **Level 2 (Detailed Scan)**: camera zooms into a POI. Path preprocessing → skeletonization → endpoint CNN. High confidence → immediate alert; ambiguous → VLM queue.
- **VLM Batch Analysis**: when the queue reaches 3+ detections or the operator requests analysis: the scan pauses, the YOLO engine unloads, Moondream loads, the batch is analyzed, Moondream unloads, YOLO reloads. ~30-45 s total cycle.

Three submodules: (1) Semantic Detection AI, (2) Camera Gimbal Control, (3) Integration with the existing detections service.

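The confidence-ordered POI queue behind the Level 1 → Level 2 handoff can be sketched with the stdlib heap; the `POI` fields and the values below are illustrative, not from the actual codebase:

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class POI:
    sort_key: float = field(init=False, repr=False)  # heapq is a min-heap,
    confidence: float                                # so store -confidence
    yaw: float                                       # gimbal angles (deg)
    pitch: float

    def __post_init__(self):
        self.sort_key = -self.confidence

queue: list[POI] = []
heapq.heappush(queue, POI(0.42, yaw=10.0, pitch=-35.0))
heapq.heappush(queue, POI(0.81, yaw=-5.0, pitch=-30.0))
heapq.heappush(queue, POI(0.63, yaw=22.0, pitch=-40.0))

best = heapq.heappop(queue)  # highest-confidence POI is scanned first at Level 2
```

Negating the confidence in the first compared field turns Python's min-heap into the max-queue the scan controller needs.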
### Memory Budget

| Component | Mode | GPU Memory | Notes |
|-----------|------|------------|-------|
| OS + system | Always | ~2.4 GB | From 8 GB total; leaves 5.2 GB usable |
| Merged TRT engine (YOLOE-v8-seg + YOLO) | Detection mode | ~2.8 GB | Single engine, shared CUDA context |
| MobileNetV3-Small TRT (FP16) | Detection mode | ~50 MB | Tiny binary classifier |
| OpenCV + NumPy buffers | Always | ~200 MB | Frame buffers, masks |
| PatchBlock defense | Always | ~50 MB | CPU-based, minimal GPU |
| **Total in detection mode** | | **~3.1 GB** | **~2.1 GB headroom** |
| Moondream 0.5B INT4 | VLM mode | ~816 MB | Demand-loaded |
| vLLM overhead + KV cache | VLM mode | ~500 MB | Minimal for a 0.5B model |
| **Total in VLM mode** | | **~1.6 GB** | **After unloading the TRT engine** |

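As a sanity check, the table's totals can be reproduced with quick arithmetic (values in GB, copied from the rows above; the always-resident rows are counted in both modes):

```python
usable = 5.2  # usable VRAM per the table (8 GB total minus OS/system)

detection_mode = {
    "merged_trt_engine": 2.8,
    "mobilenetv3_trt": 0.05,
    "opencv_numpy_buffers": 0.2,
    "patchblock": 0.05,
}
vlm_mode = {
    "moondream_int4": 0.816,
    "vllm_overhead_kv_cache": 0.5,
    "opencv_numpy_buffers": 0.2,  # always-resident components stay loaded
    "patchblock": 0.05,
}

detection_total = sum(detection_mode.values())  # ~3.1 GB
vlm_total = sum(vlm_mode.values())              # ~1.6 GB
headroom = usable - detection_total             # ~2.1 GB in detection mode
```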
## Architecture

### Component 1: Tier 1 — Real-Time Detection (YOLOE-v8-seg, merged engine)

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|----------|-------|------------|-------------|--------------|----------|-------------|-----|
| **YOLOE-v8-seg re-parameterized (recommended)** | yoloe-v8s-seg.pt, Ultralytics, TensorRT FP16 | Proven TRT stability on Jetson. Zero inference overhead when re-parameterized. Visual+text multimodal fusion. Merges into the existing YOLO engine. | Older architecture than YOLO26 (slightly lower base accuracy). | Ultralytics ≥8.4, TensorRT, JetPack 6.2 | PatchBlock CPU preprocessing | ~13 ms FP16 (s-size) | **Best fit for stable deployment.** |
| YOLOE-26-seg (future upgrade) | yoloe-26s-seg.pt, TensorRT | Better accuracy (YOLO26 architecture). NMS-free. | Active TRT bugs on Jetson: confidence misalignment, INT8 crash. | Wait for Ultralytics fix | Same | ~7 ms FP16 (estimated) | **Future upgrade once the TRT bugs are resolved.** |
| YOLO26-Seg custom-trained (production) | yolo26s-seg.pt fine-tuned | Highest accuracy for known classes. | Requires 1500+ annotated images/class. Same TRT bugs. | Custom dataset, GPU for training | Same | ~7 ms FP16 | **Long-term production model.** |

**Prompt strategy (revised)**:

Text prompts (in-distribution classes only):

- `"footpath"`, `"trail"`, `"path"`, `"road"`, `"track"`
- `"tree row"`, `"tree line"`, `"clearing"`

Visual prompts (SAVPE, for concealment-specific detection):

- Reference images cropped from semantic01-04.png: branch piles, dark entrances, dugout structures
- Use multimodal fusion mode `concat` (zero overhead)

```python
import numpy as np
from ultralytics import YOLOE

model = YOLOE("yoloe-v8s-seg.pt")

# Text prompts: in-distribution classes only
text_classes = ["footpath", "trail", "road", "tree row", "clearing"]
model.set_classes(text_classes, model.get_text_pe(text_classes))

# Visual prompt (SAVPE): bbox of the reference object in reference_hideout.jpg
x1, y1, x2, y2 = 120, 80, 260, 210  # example coordinates

frame = "aerial_frame.jpg"  # path or numpy array of the current camera frame
results = model.predict(
    frame,
    conf=0.15,
    refer_image="reference_hideout.jpg",
    visual_prompts={"bboxes": np.array([[x1, y1, x2, y2]]), "cls": np.array([0])},
    fusion_mode="concat",  # text+visual fusion (Ultralytics PR #21966)
)
```

**Re-parameterization for production**: once the classes are fixed after training, re-parameterize YOLOE-v8 into standard YOLOv8 weights. This eliminates the open-vocabulary overhead entirely, and the model becomes a regular YOLO inference engine. Merge it with the existing YOLO detection into a single TRT engine using TensorRT's multi-model support or batch inference.

### Component 2: Tier 2 — Spatial Reasoning & CNN Confirmation

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|----------|-------|------------|-------------|--------------|----------|-------------|-----|
| **Robust path tracing + CNN classifier (recommended)** | OpenCV, scikit-image, MobileNetV3-Small TRT | Preprocessing removes noise. Branch pruning eliminates artifacts. 256×256 ROI for better context. | Still depends on segmentation quality. | OpenCV, scikit-image, PyTorch → TRT | Offline inference | ~150 ms total | **Best fit. Robust against noisy masks.** |
| GraphMorph centerline extraction | PyTorch, custom model | Topology-aware. Reduces false positives. | Requires an additional model in memory. More complex integration. | PyTorch, custom training | Offline | ~200 ms estimated | Upgrade path if the basic approach fails |
| Heuristic rules only | OpenCV, NumPy | No training data. Immediate. | Brittle. Cannot generalize. | None | Offline | ~50 ms | Baseline/fallback for day 1 |

**Revised path tracing pipeline**:

1. Take the footpath segmentation mask from Tier 1
2. **Preprocessing**: Gaussian blur (σ=1.5) → binary threshold (Otsu) → morphological closing (5×5 kernel, 2 iterations) → remove small connected components (< 100 px area)
3. Skeletonize using the Zhang-Suen algorithm
4. **Branch pruning**: remove skeleton branches shorter than 20 pixels (noise artifacts)
5. Detect endpoints using hit-or-miss morphological operations (8 kernel patterns)
6. Detect junctions using branch-point kernels
7. Trace path segments between junctions/endpoints
8. For each endpoint: extract a **256×256** ROI crop centered on the endpoint from the original image
9. Feed the ROI crop to the MobileNetV3-Small binary classifier

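Steps 5-6 reduce to 8-neighbor counting on the skeleton (the hit-or-miss kernels encode exactly these patterns). A numpy-only sketch on a hand-made toy skeleton, assuming the OpenCV/scikit-image preprocessing and skeletonization have already run:

```python
import numpy as np

def neighbor_count(skel: np.ndarray) -> np.ndarray:
    """8-neighbor count at every pixel, masked to the skeleton."""
    p = np.pad(skel.astype(np.uint8), 1)  # zero-padded borders
    h, w = skel.shape
    n = np.zeros((h, w), dtype=np.uint8)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy or dx:
                n += p[1 + dy : 1 + dy + h, 1 + dx : 1 + dx + w]
    return n * (skel > 0)

def endpoints(skel: np.ndarray) -> np.ndarray:
    return np.argwhere(neighbor_count(skel) == 1)   # exactly one neighbor

def junctions(skel: np.ndarray) -> np.ndarray:
    return np.argwhere(neighbor_count(skel) >= 3)   # branch points

# Toy skeleton: a horizontal path with a short spur (a noise branch)
skel = np.zeros((7, 9), dtype=np.uint8)
skel[3, 1:8] = 1              # main path
skel[2, 4] = skel[1, 4] = 1   # 2 px spur branching off at (3, 4)

ep = endpoints(skel)          # the two path ends and the spur tip
jc = junctions(skel)          # includes the branch point at (3, 4)
```

Note that naive neighbor counting also flags pixels diagonally adjacent to the true branch point; one common pruning approach for step 4 walks from each endpoint toward the nearest junction and deletes runs shorter than the 20 px threshold.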
**Freshness assessment** (unchanged from draft01, validated approach):

- Edge sharpness, contrast ratio, fill ratio, path width consistency
- Initial hand-tuned thresholds → Random Forest once annotated data is available

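Each cue can be computed from the ROI in a few lines of numpy; a sketch with illustrative feature definitions (the hand-tuned versions may differ):

```python
import numpy as np

def freshness_features(roi: np.ndarray, mask: np.ndarray) -> dict:
    """roi: grayscale image in [0, 1]; mask: binary path mask, same shape."""
    gy, gx = np.gradient(roi)
    grad_mag = np.hypot(gx, gy)
    path, ground = roi[mask == 1], roi[mask == 0]
    widths = mask.sum(axis=1).astype(float)  # crude per-row path width
    widths = widths[widths > 0]
    return {
        "edge_sharpness": float(grad_mag[mask == 1].mean()),
        "contrast_ratio": float(path.mean() / (ground.mean() + 1e-6)),
        "fill_ratio": float(mask.mean()),
        "width_consistency": float(widths.std() / (widths.mean() + 1e-6)),
    }

# Toy ROI: a bright, crisp, uniformly 4 px wide path on dark ground
roi = np.full((64, 64), 0.2)
mask = np.zeros((64, 64), dtype=np.uint8)
roi[30:34, :] = 0.8
mask[30:34, :] = 1
feats = freshness_features(roi, mask)
```

A fresh, heavily used path should score high on sharpness and contrast and low on width variance; these four numbers are the feature vector the Random Forest would consume.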
### Component 3: Tier 3 — VLM Deep Analysis (Background Batch Mode)

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|----------|-------|------------|-------------|--------------|----------|-------------|-----|
| **Moondream 0.5B INT4 demand-loaded (recommended)** | Moondream, ONNX/PyTorch, INT4 | 816 MiB memory. Built-in detect()/point() APIs. Runs on a Raspberry Pi. | Weaker reasoning than 2B models. Not aerial-specialized. | ONNX Runtime or PyTorch | Local only | ~2-5 s per image | **Best fit for memory-constrained edge.** |
| SmolVLM2-500M | HuggingFace, ONNX | 1.8 GB. Small. ONNX support. | Less capable than Moondream for detection. No detect() API. | ONNX Runtime | Local only | ~3-7 s estimated | Alternative if Moondream underperforms |
| UAV-VL-R1 (2B) demand-loaded | vLLM, W4A16 | Aerial-specialized. Best reasoning for UAV imagery. | 2.5 GB INT8. ~10-21 s per analysis. Tight memory fit. | vLLM, W4A16 weights | Local only | ~10-21 s | **Upgrade path if Moondream is insufficient.** |
| No VLM | N/A | Simplest. Most memory. Zero latency impact. | No fallback for ambiguous CNN outputs. No explanations. | None | N/A | N/A | **Viable MVP if Tier 1+2 accuracy is sufficient.** |

**Demand-loading protocol**:

```
1. VLM queue reaches threshold (≥3 detections or operator request)
2. Scan controller transitions to HOLD state (camera holds position)
3. Signal the main process to unload the TRT engine
4. Wait for GPU memory release (~1 s)
5. Launch the VLM process: load Moondream 0.5B INT4
6. Process all queued detections sequentially (~2-5 s each)
7. Collect results, send to operator
8. Unload the VLM, release GPU memory
9. Reload the TRT engine (~2 s)
10. Resume scan from the HOLD position

Total cycle: ~30-45 s for 3-5 detections
```

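The protocol above is a small state machine; a stdlib-only sketch with the GPU steps stubbed out as injected callbacks (all names here are hypothetical; the real manager would call into TensorRT and the Moondream runtime):

```python
from enum import Enum, auto

class VLMState(Enum):
    IDLE = auto()
    LOADING = auto()
    ANALYZING = auto()
    UNLOADING = auto()

class VLMManager:
    BATCH_THRESHOLD = 3  # step 1: batch when the queue reaches 3 detections

    def __init__(self, unload_trt, load_vlm, analyze, unload_vlm, reload_trt):
        self.state = VLMState.IDLE
        self.queue = []
        # Injected callbacks keep the cycle order testable without a GPU
        self._steps = (unload_trt, load_vlm, analyze, unload_vlm, reload_trt)

    def enqueue(self, detection, operator_request=False):
        self.queue.append(detection)
        if operator_request or len(self.queue) >= self.BATCH_THRESHOLD:
            return self.run_cycle()
        return None

    def run_cycle(self):
        unload_trt, load_vlm, analyze, unload_vlm, reload_trt = self._steps
        unload_trt()                                # step 3: free TRT memory
        self.state = VLMState.LOADING
        load_vlm()                                  # step 5: load Moondream
        self.state = VLMState.ANALYZING
        results = [analyze(d) for d in self.queue]  # step 6: sequential batch
        self.state = VLMState.UNLOADING
        unload_vlm()                                # step 8: release VLM memory
        reload_trt()                                # step 9: detection resumes
        self.state = VLMState.IDLE
        self.queue.clear()
        return results
```

Injecting the load/unload steps also makes the memory-leak and cycle-timing tests in the Testing Strategy straightforward to automate off-target.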
**VLM prompting strategy** (adapted to Moondream's capabilities):

Using the detect() API for a fast binary check:

```python
model.detect(image, "concealed military position")
model.detect(image, "dugout covered with branches")
```

Using caption for detailed analysis:

```python
model.caption(image, length="normal")
```

### Component 4: Camera Gimbal Control

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|----------|-------|------------|-------------|--------------|----------|-------------|-----|
| **Kalman+PID cascade with ViewLink (recommended)** | pyserial, ViewLink V3.3.3, filterpy (Kalman), servopilot (PID) | Compensates for UAV attitude drift. Proven in aerospace. Smooth path-following. | More complex than PID-only. Requires an IMU data feed. | ViewPro A40, pyserial, IMU data access | Physical only | <10 ms command latency | **Best fit. Flight-grade control.** |
| PID-only with ViewLink | pyserial, ViewLink V3.3.3, servopilot | Simple. Works for a hovering UAV. | Drifts during flight. Cannot compensate for mounting errors. | ViewPro A40, pyserial | Physical only | <10 ms | Acceptable for testing only |

**Revised control architecture**:

```
UAV IMU Data ──▶ Kalman Filter ──▶ State Estimate (attitude, angular velocity)
                                                      │
Camera Frame ──▶ Detection ──▶ Target Position ──▶ Error Calculation
                                                      │
                              State Estimate ────────▶│
                                                      ▼
                                               PID Controller
                                                      │
                                                      ▼
                                               Gimbal Command
                                                      │
                                                      ▼
                                              ViewLink Serial
```

- Kalman filter state vector: `[yaw, pitch, yaw_rate, pitch_rate]`
- Measurement inputs: IMU gyroscope (yaw_rate, pitch_rate), detection-derived angles
- Process model: constant angular velocity with noise

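The filter as specified above can be sketched in plain numpy (state `[yaw, pitch, yaw_rate, pitch_rate]`, constant-angular-velocity process). The noise magnitudes and the 30 FPS update rate are placeholder assumptions; a deployment would more likely use `filterpy`:

```python
import numpy as np

dt = 1 / 30.0  # update tied to the detection frame rate (assumed 30 FPS)

# State transition: angles integrate their rates, rates stay constant
F = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1,  0],
              [0, 0, 0,  1]], dtype=float)

# Measure all four components: detection-derived angles + IMU gyro rates
H = np.eye(4)
Q = np.eye(4) * 1e-4                    # process noise (placeholder)
R = np.diag([0.05, 0.05, 0.01, 0.01])   # measurement noise (placeholder)

x = np.zeros(4)   # [yaw, pitch, yaw_rate, pitch_rate]
P = np.eye(4)

def kf_step(x, P, z):
    # Predict
    x = F @ x
    P = F @ P @ F.T + Q
    # Update
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(4) - K @ H) @ P
    return x, P

# Feed a constant 3 deg/s yaw rate; the estimate converges onto the motion
for k in range(1, 200):
    z = np.array([3.0 * k * dt, 0.0, 3.0, 0.0])
    x, P = kf_step(x, P, z)
```

The stabilized `x` is what the PID stage consumes, which is why its gains can be less aggressive than in the PID-only design.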
**Scan patterns** (unchanged from draft01): sinusoidal yaw oscillation, POI queue management.

**Path-following** (revised): the Kalman-filtered state estimate provides smoother tracking; PID gains can be lower (less aggressive) because the state estimate is already stabilized. Update rate: tied to the detection frame rate.

### Component 5: Integration with Existing Detections Service

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|----------|-------|------------|-------------|--------------|----------|-------------|-----|
| **Single merged Cython+TRT process + demand-loaded VLM (recommended)** | Cython, TensorRT, ONNX Runtime | Single TRT engine. Minimal memory. VLM isolated. | VLM loading pauses detection (30-45 s). | Cython extensions, process management | Process isolation + encryption | Minimal overhead | **Best fit for 5.2 GB VRAM.** |

**Revised integration architecture**:

```
┌───────────────────────────────────────────────────────────────────┐
│                  Main Process (Cython + TRT)                      │
│                                                                   │
│   ┌──────────────────────────────────────────────┐                │
│   │ Single Merged TRT Engine                     │                │
│   │  ├─ Existing YOLO Detection heads            │                │
│   │  ├─ YOLOE-v8-seg (re-parameterized)          │                │
│   │  └─ MobileNetV3-Small classifier             │                │
│   └──────────────────────────────────────────────┘                │
│                                                                   │
│   ┌──────────────┐ ┌──────────────┐ ┌──────────────────────────┐  │
│   │ Path Tracing │ │ Scan         │ │ PatchBlock Defense       │  │
│   │ + Skeleton   │ │ Controller   │ │ (CPU parallel)           │  │
│   │ (CPU)        │ │ + Kalman+PID │ │                          │  │
│   └──────────────┘ └──────────────┘ └──────────────────────────┘  │
│                                                                   │
│   ┌──────────────────────────────────────────────┐                │
│   │ VLM Manager                                  │                │
│   │  state: IDLE | LOADING | ANALYZING | UNLOAD  │                │
│   │  queue: [detection_1, detection_2, ...]      │                │
│   └──────────────────────────────────────────────┘                │
└───────────────────────────────────────────────────────────────────┘

VLM mode (demand-loaded, temporarily replaces the TRT engine):
┌───────────────────────────────────────────────────────────────────┐
│   ┌──────────────────────────────────────────────┐                │
│   │ Moondream 0.5B INT4                          │                │
│   │ (ONNX Runtime or PyTorch)                    │                │
│   └──────────────────────────────────────────────┘                │
│   Detection paused. Camera in HOLD state.                         │
└───────────────────────────────────────────────────────────────────┘
```

**Data flow** (revised):

1. PatchBlock preprocesses the frame on the CPU (in parallel with GPU inference)
2. Cleaned frame → merged TRT engine → YOLO detections + YOLOE-v8 semantic detections
3. Semantic detections → path preprocessing → skeletonization → endpoint extraction → CNN
4. High confidence → operator alert (coordinates + bounding box + confidence)
5. Ambiguous → VLM queue
6. VLM queue management: batch-process when the queue reaches ≥3 or the operator triggers analysis
7. During VLM mode: detection paused, camera holds, operator notified of the pause

**GPU scheduling** (revised): no concurrent multi-model GPU sharing. The single TRT engine runs during detection mode; the VLM is demand-loaded exclusively during analysis mode. This eliminates the 10-40% latency jitter caused by GPU sharing.

### Component 6: Security

| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|----------|-------|------------|-------------|--------------|----------|-------------|-----|
| **Three-layer security (recommended)** | PatchBlock, LUKS/dm-crypt, tmpfs | Adversarial defense + model protection + data protection | Adds ~5 ms CPU overhead for PatchBlock | PatchBlock library, Linux crypto | Full stack | Minimal GPU impact | **Required for military edge deployment.** |

**Layer 1: Adversarial Input Defense**

- PatchBlock CPU preprocessing on every frame before GPU inference
- Detects anomalous patches via outlier detection and dimensionality reduction
- Recovers up to 77% accuracy under adversarial attack
- Runs in parallel with GPU inference (adds no latency to the pipeline)

**Layer 2: Model & Weight Protection**

- TRT engine files encrypted at rest using LUKS on a dedicated partition
- At boot: decrypt into tmpfs (RAM disk); never written unencrypted to persistent storage
- Secure boot chain via Jetson's secure boot (fuse-based, hardware root of trust)
- If the device is captured powered off: models stay encrypted, no plaintext weights accessible

**Layer 3: Operational Data Protection**

- Captured imagery stored in an encrypted circular buffer (last N minutes only)
- Detection logs (coordinates, confidence, timestamps) encrypted at rest
- Over the datalink: transmit only coordinates + confidence + a small thumbnail (not raw frames)
- Tamper detection: if the enclosure is opened or an unauthorized boot is detected → auto-wipe keys + detection logs

## Training & Data Strategy (Revised)

### Phase 1: Zero-shot (Weeks 1-2)

- Deploy YOLOE-v8-seg with multimodal prompts (text for paths, visual for concealment)
- Use semantic01-04.png as visual prompt references via SAVPE
- Tune confidence thresholds per class type
- Collect false positive/negative data for annotation
- **Benchmark YOLOE-v8-seg TRT on Jetson: confirm inference time, memory, stability**

### Phase 2: Annotation & Fine-tuning (Weeks 3-8)

- Annotate collected real data (target: 300-500 images/class)
- **Generate 1000+ synthetic images per class using GenCAMO/CamouflageAnything**
- Priority: footpaths (segmentation) → branch piles (bboxes) → entrances (bboxes)
- Active learning: YOLOE zero-shot flags candidates → human reviews → annotates
- Fine-tune YOLOv8-Seg (or YOLO26-Seg if TRT is fixed) on the real + synthetic dataset
- Use linear probing first, then full fine-tuning

### Phase 3: CNN classifier (Weeks 4-8, parallel with Phase 2)

- Train MobileNetV3-Small on 256×256 ROI crops from endpoint analysis
- Positives: annotated concealed positions + synthetic. Negatives: natural path termini, random terrain
- Target: 200+ real positives + 500+ synthetic positives, 1000+ negatives
- Export to TensorRT FP16

### Phase 4: VLM integration (Weeks 8-12)

- Deploy Moondream 0.5B INT4 in demand-loaded mode
- Test demand-load cycle timing: measure unload → load → infer → unload → reload
- Tune detect() and caption prompts on collected ambiguous cases
- **If Moondream accuracy is insufficient: test UAV-VL-R1 (2B) demand-loaded**
- **If the YOLO26 TRT bugs are fixed: test YOLOE-26-seg as a Tier 1 upgrade**

### Phase 5: Seasonal expansion (Month 3+)

- Winter data → spring/summer annotation campaigns
- Re-train all models with multi-season data + seasonal synthetic augmentation

## Testing Strategy

### Integration / Functional Tests

- YOLOE-v8-seg multimodal prompt detection on reference images: verify text+visual fusion
- **TRT engine stability test**: 1000 consecutive inferences without confidence drift
- Path preprocessing pipeline on synthetic noisy masks: verify cleaning + skeletonization
- Branch pruning: verify short spurious branches are removed while real path branches are preserved
- CNN classifier on known positive/negative 256×256 ROI crops
- **Demand-load VLM cycle**: measure timing of unload TRT → load Moondream → infer → unload → reload TRT
- **Memory monitoring during demand-load**: confirm no memory leak across 10+ cycles
- Kalman+PID gimbal control with simulated IMU data: verify drift compensation
- Full pipeline: frame → PatchBlock → YOLOE-v8 → path tracing → CNN → (VLM) → alert
- Scan controller: Level 1 → Level 2 → HOLD (for VLM) → resume Level 1

### Non-Functional Tests

- Tier 1 latency: YOLOE-v8-seg TRT FP16 ≤15 ms on Jetson Orin Nano Super
- Tier 2 latency: preprocessing + skeletonization + CNN ≤200 ms
- **VLM demand-load cycle: ≤45 s for 3 detections (including load/unload overhead)**
- **Memory profiling: peak detection mode ≤3.5 GB GPU, peak VLM mode ≤2.0 GB GPU**
- Thermal stress test: 30+ minutes of continuous detection without thermal throttling
- PatchBlock adversarial test: inject test adversarial patches, measure accuracy recovery
- False positive/negative rate on annotated reference images
- Gimbal path-following accuracy with and without the Kalman filter (measure the improvement)
- **Demand-load memory leak test: 50+ VLM cycles without memory growth**

## References

- YOLOE-v8 docs: https://docs.ultralytics.com/models/yoloe/
- YOLOE-26 paper: https://arxiv.org/abs/2602.00168
- YOLO26 TRT confidence bug: https://www.hackster.io/qwe018931/pushing-limits-yolov8-vs-v26-on-jetson-orin-nano-b89267
- YOLO26 INT8 crash: https://github.com/ultralytics/ultralytics/issues/23841
- YOLOE multimodal fusion: https://github.com/ultralytics/ultralytics/pull/21966
- Jetson Orin Nano Super memory: https://forums.developer.nvidia.com/t/jetson-orin-nano-super-insufficient-gpu-memory/330777
- Multi-model survey on Orin Nano: https://dev.to/ankk98/multi-model-ai-resource-allocation-for-humanoid-robots-a-survey-on-jetson-orin-nano-super-310i
- TRT multiple engines: https://github.com/NVIDIA/TensorRT/issues/4358
- TRT memory on Jetson: https://github.com/ultralytics/ultralytics/issues/21562
- Moondream: https://moondream.ai/blog/introducing-moondream-0-5b
- Cosmos-Reason2-2B Jetson benchmark: https://www.thenextgentechinsider.com/pulse/cosmos-reason2-runs-on-jetson-orin-nano-super-with-w4a16-quantization
- Jetson AI Lab benchmarks: https://www.jetson-ai-lab.com/tutorials/genai-benchmarking/
- Jetson LLM bottleneck: https://ericxliu.me/posts/benchmarking-llms-on-jetson-orin-nano/
- vLLM on Jetson: https://learnopencv.com/deployment-on-edge-vllm-on-jetson/
- TRT-LLM no edge support: https://github.com/NVIDIA/TensorRT-LLM/issues/7978
- PatchBlock defense: https://arxiv.org/abs/2601.00367
- Adversarial patches on YOLO: https://link.springer.com/article/10.1007/s10207-025-01067-3
- GenCAMO synthetic data: https://arxiv.org/abs/2601.01181
- CamouflageAnything (CVPR 2025): https://openaccess.thecvf.com/content/CVPR2025/html/Das_Camouflage_Anything_...
- GraphMorph centerlines: https://arxiv.org/pdf/2502.11731
- Learnable skeleton + SAM: https://ui.adsabs.harvard.edu/abs/2025ITGRS..63S1458X
- Kalman filter gimbal: https://ieeexplore.ieee.org/ielx7/6287639/10005208/10160027.pdf
- UAV-VL-R1: https://arxiv.org/pdf/2508.11196
- ViewPro Protocol: https://www.viewprotech.com/index.php?ac=article&at=read&did=510
- servopilot PID: https://pypi.org/project/servopilot/

## Related Artifacts

- AC assessment: `_docs/00_research/00_ac_assessment.md`
- Previous draft: `_docs/01_solution/solution_draft01.md`