mirror of
https://github.com/azaion/detections-semantic.git
synced 2026-04-22 22:16:37 +00:00
8e2ecf50fd
Made-with: Cursor
320 lines
24 KiB
Markdown
320 lines
24 KiB
Markdown
# Solution Draft
|
|
|
|
## Assessment Findings
|
|
|
|
| Old Component Solution | Weak Point (functional/security/performance) | New Solution |
|
|
|------------------------|----------------------------------------------|-------------|
|
|
| YOLO26 as sole detection backbone | **Accuracy regression on custom datasets**: Reported YOLO26s "much less accurate" than YOLO11s on identical training data (GitHub #23206). YOLO26 is 3 months old — less battle-tested than YOLO11. | Benchmark YOLO26 vs YOLO11 on initial annotated data before committing. YOLO11 as fallback. YOLOE supports both backbones (yoloe-11s-seg, yoloe-26s-seg). |
|
|
| YOLO26 TensorRT INT8 export | **INT8 export fails on Jetson** (TRT Error Code 2, OOM). Fix merged (PR #23928) but indicates fragile tooling. | Use FP16 only for initial deployment (confirmed stable). INT8 as future optimization after tooling matures. Pin Ultralytics version + JetPack version. |
|
|
| vLLM as VLM runtime | **Unstable on Jetson Orin Nano**: system freezes, reboots, installation crashes, excessive memory (multiple open issues). Not production-ready for 8GB devices. | **Replace with NanoLLM/NanoVLM** — purpose-built for Jetson by NVIDIA's Dusty-NV team. Docker containers for JetPack 5/6. Supports VILA, LLaVA. Stable. Or use llama.cpp with GGUF models (proven on Jetson). |
|
|
| No storage strategy | **SD card corruption**: Recurring corruption documented across multiple Jetson Orin Nano users. SD cards unsuitable for production. | **Mandatory NVMe SSD** for OS + models + logging. No SD card in production. Ruggedized NVMe mount for vibration resistance. |
|
|
| No EMI protection on UART | **ViewPro documents EMI issues**: antennas cause random gimbal panning if within 35cm. Standard UART parity bit insufficient for noisy UAV environment. | Add CRC-16 checksum layer on gimbal commands. Enforce 35cm antenna separation in physical design. Consider shielded UART cable. Command retry on CRC failure (max 3 retries, then log error). |
|
|
| No environmental hardening addressed | **UAV environment**: vibration, temperature extremes (-20°C to +50°C), dust, EMI, power fluctuations. Dev kit form factor is not field-deployable. | Use ruggedized carrier board (MILBOX-ORNX or similar) with vibration dampening. Conformal coating on exposed connectors. External temperature sensor for environmental monitoring. |
|
|
| No logging or telemetry | **No post-flight review capability**: field system must log all detections with metadata for model iteration, operator review, and evidence collection. | Add detection logging: timestamp, GPS-denied coordinates, confidence score, detection class, JPEG thumbnail, tier that triggered, freshness metadata. Log to NVMe SSD. Export as structured format (JSON lines) after flight. |
|
|
| No frame recording for offline replay | **Training data collection depends on field recording**: Without recording, no way to build training dataset from real flights. | Record all camera frames to NVMe at configurable rate (1-5 FPS during Level 1, full rate during Level 2). Include detection overlay option. Post-flight: use recordings for annotation. |
|
|
| No power management | **UAV power budget is finite**: Jetson at 15W + gimbal + camera + radio. No monitoring of power draw or load shedding. | Monitor power consumption via Jetson's INA sensors. Power budget alert at 80% of allocated watts. Load shedding: disable VLM first, then reduce inference rate, then disable semantic detection. |
|
|
| YOLO26 not validated for this domain | **No benchmark on aerial concealment detection**: All YOLO26 numbers are on COCO/LVIS. Concealment detection may behave very differently. | First sprint deliverable: benchmark YOLOE-26 (both 11 and 26 backbones) on semantic01-04.png with text/visual prompts. Report AP on initial annotated validation set before committing to backbone. |
|
|
| Freshness and path tracing are untested algorithms | **No proven prior art**: Both freshness assessment and path-following via skeletonization are novel combinations. Risk of over-engineering before validation. | Implement minimal viable versions first. V1 path tracing: skeleton + endpoint only, no freshness, no junction following. Validate on real flight data before adding complexity. |
|
|
|
|
## Product Solution Description
|
|
|
|
A three-tier semantic detection system for identifying concealed/camouflaged positions from reconnaissance UAV aerial imagery, running on Jetson Orin Nano Super with NVMe SSD storage, active cooling, and ruggedized carrier board, alongside the existing YOLO detection pipeline.
|
|
|
|
```
|
|
┌──────────────────────────────────────────────────────────────────────────┐
|
|
│ JETSON ORIN NANO SUPER (ruggedized carrier, NVMe, 15W) │
|
|
│ │
|
|
│ ┌──────────┐ ┌──────────────┐ ┌──────────────┐ ┌───────────┐ │
|
|
│ │ ViewPro │───▶│ Tier 1 │───▶│ Tier 2 │───▶│ Tier 3 │ │
|
|
│ │ A40 │ │ YOLOE │ │ Path Trace │ │ VLM │ │
|
|
│ │ Camera │ │ (11 or 26 │ │ + CNN │ │ NanoLLM │ │
|
|
│ │ + Frame │ │ backbone) │ │ ≤200ms │ │ (L2 only) │ │
|
|
│ │ Quality │ │ TRT FP16 │ │ │ │ ≤5s │ │
|
|
│ │ Gate │ │ ≤100ms │ │ │ │ │ │
|
|
│ └────▲─────┘ └──────────────┘ └──────────────┘ └───────────┘ │
|
|
│ │ │
|
|
│ ┌────┴─────┐ ┌──────────────┐ ┌──────────────┐ ┌───────────┐ │
|
|
│ │ Gimbal │◀───│ Scan │ │ Watchdog │ │ Recorder │ │
|
|
│ │ Control │ │ Controller │ │ + Thermal │ │ + Logger │ │
|
|
│ │ + CRC │ │ (L1/L2 FSM) │ │ + Power │ │ (NVMe) │ │
|
|
│ └──────────┘ └──────────────┘ └──────────────┘ └───────────┘ │
|
|
│ │
|
|
│ ┌──────────────────────────────┐ │
|
|
│ │ Existing YOLO Detection │ (always running, scene context) │
|
|
│ │ Cython + TRT │ │
|
|
│ └──────────────────────────────┘ │
|
|
└──────────────────────────────────────────────────────────────────────────┘
|
|
```
|
|
|
|
Key changes from draft02:
|
|
- **YOLOE backbone is configurable** (YOLO11 or YOLO26) — benchmark before committing
|
|
- **NanoLLM replaces vLLM** as VLM runtime (purpose-built for Jetson, stable)
|
|
- **NVMe SSD mandatory** — no SD card in production
|
|
- **CRC-16 on gimbal UART** — EMI protection
|
|
- **Detection logger + frame recorder** — post-flight review and training data collection
|
|
- **Ruggedized carrier board** — vibration, temperature, dust protection
|
|
- **Power monitoring + load shedding** — finite UAV power budget
|
|
- **FP16 only** for initial deployment (INT8 export unstable on Jetson)
|
|
- **Minimal V1 for unproven components** — path tracing and freshness start simple
|
|
|
|
## Architecture
|
|
|
|
### Component 1: Tier 1 — Real-Time Detection
|
|
|
|
| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|
|
|----------|-------|-----------|-------------|-------------|----------|------------|-----|
|
|
| **YOLOE with configurable backbone (recommended)** | yoloe-11s-seg.pt or yoloe-26s-seg.pt, set_classes() → TRT FP16 | Supports both YOLO11 and YOLO26 backbones. Benchmark on real data, pick winner. set_classes() bakes CLIP embeddings for zero overhead. | YOLO26 may regress on custom data vs YOLO11. Needs empirical comparison. | Ultralytics ≥8.4 (pinned version), TensorRT, JetPack 6.2 | Local only | YOLO11s TRT FP16: ~7ms (640px). YOLO26s: similar or slightly faster. | **Best fit. Hedge against backbone risk.** |
|
|
|
|
**Version pinning strategy**:
|
|
- Pin `ultralytics==8.4.X` (specific patch version validated on Jetson)
|
|
- Pin JetPack 6.2 + TensorRT version
|
|
- Test every Ultralytics update in staging before deploying to production
|
|
- Keep both yoloe-11s-seg and yoloe-26s-seg TRT engines on NVMe; switch via config
|
|
|
|
**YOLO backbone selection process (Sprint 1)**:
|
|
1. Annotate 200 frames from real flight footage (footpaths, branch piles, entrances)
|
|
2. Fine-tune YOLOE-11s-seg and YOLOE-26s-seg on same dataset, same hyperparameters
|
|
3. Evaluate on held-out validation set (50 frames)
|
|
4. Pick backbone with higher mAP50
|
|
5. If delta < 2%: pick YOLO26 (faster CPU inference, NMS-free deployment)
|
|
|
|
### Component 2: Tier 2 — Spatial Reasoning & CNN Confirmation
|
|
|
|
| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|
|
|----------|-------|-----------|-------------|-------------|----------|------------|-----|
|
|
| **V1 minimal path tracing + heuristic classifier (recommended for initial release)** | OpenCV, scikit-image | No training data needed. Skeleton + endpoint detection + simple heuristic: "dark mass at endpoint → flag." Fast to implement and validate. | Low accuracy. Many false positives. | OpenCV, scikit-image | Offline | ~30ms | **V1: ship fast, validate on real data.** |
|
|
| **V2 trained CNN (after data collection)** | MobileNetV3-Small, TensorRT FP16 | Higher accuracy after training. Dynamic ROI sizing. | Needs 300+ positive, 1000+ negative annotated ROI crops. | PyTorch, TRT export | Offline | ~5-10ms classification | **V2: replace heuristic once data exists.** |
|
|
|
|
**V1 heuristic for endpoint analysis** (no training data needed):
|
|
1. Skeletonize footpath mask with branch pruning
|
|
2. Find endpoints
|
|
3. For each endpoint: extract ROI (dynamic size based on GSD)
|
|
4. Compute: mean_darkness = mean intensity in ROI center 50%. contrast = (surrounding_mean - center_mean) / surrounding_mean. area_ratio = dark_pixel_count / total_pixels.
|
|
5. If mean_darkness < threshold_dark AND contrast > threshold_contrast → flag as potential concealed position
|
|
6. Thresholds: configurable, tuned per season. Start with winter values.
|
|
|
|
**V1 freshness** (metadata only, not a filter):
|
|
- contrast_ratio of path vs surrounding terrain
|
|
- Report as: "high contrast" (likely fresh) / "low contrast" (likely stale)
|
|
- No binary classification. Operator sees all detections with freshness tag.
|
|
|
|
### Component 3: Tier 3 — VLM Deep Analysis
|
|
|
|
| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|
|
|----------|-------|-----------|-------------|-------------|----------|------------|-----|
|
|
| **NanoLLM with VILA-2.7B or VILA1.5-3B (recommended)** | NanoLLM Docker container, MLC/TVM quantization | Purpose-built for Jetson by NVIDIA team. Stable Docker containers. Optimized memory management. Supports VLMs natively. | Limited model selection (VILA, LLaVA, Obsidian). Not all VLMs available. | Docker, JetPack 6, NVMe for container storage | Local only, container isolation | ~15-25 tok/s on Orin Nano (4-bit MLC) | **Most stable Jetson VLM option.** |
|
|
| llama.cpp with GGUF VLM | llama.cpp, GGUF model files | Lightweight. No Docker needed. Proven stability on Jetson. Wide model support. | Manual build. Less optimized than NanoLLM for Jetson GPU. | llama.cpp build, GGUF weights | Local only | ~10-20 tok/s estimated | **Fallback if NanoLLM doesn't support needed model.** |
|
|
| ~~vLLM~~ | ~~vLLM Docker~~ | ~~High throughput~~ | **System freezes, reboots, installation crashes on Orin Nano. Multiple open bugs. Not production-ready.** | N/A | N/A | N/A | **Not recommended.** |
|
|
|
|
**Model selection for NanoLLM**:
|
|
- Primary: VILA1.5-3B (confirmed on Orin Nano, multimodal, 4-bit MLC)
|
|
- If UAV-VL-R1 GGUF weights become available: use via llama.cpp (aerial-specialized)
|
|
- Fallback: Obsidian-3B (mini VLM, lower accuracy but very fast)
|
|
|
|
### Component 4: Camera Gimbal Control
|
|
|
|
| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|
|
|----------|-------|-----------|-------------|-------------|----------|------------|-----|
|
|
| **ViewLink serial driver + CRC-16 + PID + watchdog (recommended)** | pyserial, crcmod, PID library, threading | Robust communication. CRC catches EMI-corrupted commands. Retry logic. Watchdog. | ViewLink protocol implementation from spec. Physical EMI mitigation required. | ViewPro docs, UART, shielded cable, 35cm antenna separation | Physical only | <10ms command + CRC overhead negligible | **Production-grade.** |
|
|
|
|
**UART reliability layer**:
|
|
```
|
|
Packet format: [SOF(2)] [CMD(N)] [CRC16(2)]
|
|
- SOF: 0xAA 0x55 (start of frame)
|
|
- CMD: ViewLink command bytes per protocol spec
|
|
- CRC16: CRC-CCITT over CMD bytes
|
|
```
|
|
- On send: compute CRC-16, append to ViewLink command packet
|
|
- On receive (gimbal feedback): validate CRC-16. Discard corrupted frames.
|
|
- On CRC failure (send): retry up to 3 times with 10ms delay. Log failure after 3 retries.
|
|
- Note: Check if ViewLink protocol already includes checksums (read full spec first). If so, use native checksum; don't add redundant CRC.
|
|
|
|
**Physical EMI mitigation checklist**:
|
|
- [ ] Gimbal UART cable: shielded, shortest possible run
|
|
- [ ] Video/data transmitter antenna: ≥35cm from gimbal (ViewPro recommendation)
|
|
- [ ] Independent power supply for gimbal (or filtered from main bus)
|
|
- [ ] Ferrite beads on UART cable near Jetson connector
|
|
|
|
### Component 5: Recording, Logging & Telemetry
|
|
|
|
| Solution | Tools | Advantages | Limitations | Requirements | Security | Performance | Fit |
|
|
|----------|-------|-----------|-------------|-------------|----------|------------|-----|
|
|
| **NVMe-backed frame recorder + JSON-lines detection logger (recommended)** | OpenCV VideoWriter / JPEG sequences, JSON lines, NVMe SSD | Post-flight review. Training data collection. Evidence. Detection audit trail. | NVMe write bandwidth (~500 MB/s) more than sufficient. Storage: ~2GB/min at 1080p 5FPS JPEG. | NVMe SSD ≥256GB | Physical access to NVMe | ~5ms per frame write (async) | **Essential for field deployment.** |
|
|
|
|
**Detection log format** (JSON lines, one per detection):
|
|
```json
|
|
{
|
|
"ts": "2026-03-19T14:32:01.234Z",
|
|
"frame_id": 12345,
|
|
"gps_denied_lat": 48.123456,
|
|
"gps_denied_lon": 37.654321,
|
|
"tier": 1,
|
|
"class": "footpath",
|
|
"confidence": 0.72,
|
|
"bbox": [0.12, 0.34, 0.45, 0.67],
|
|
"freshness": "high_contrast",
|
|
"tier2_result": "concealed_position",
|
|
"tier2_confidence": 0.85,
|
|
"tier3_used": false,
|
|
"thumbnail_path": "frames/12345_det_0.jpg"
|
|
}
|
|
```
|
|
|
|
**Frame recording strategy**:
|
|
- Level 1: record every 5th frame (1-2 FPS) — overview coverage
|
|
- Level 2: record every frame (30 FPS) — detailed analysis footage
|
|
- Storage budget: 256GB NVMe ≈ 2 hours at Level 2 full rate, or 10+ hours at Level 1 rate
|
|
- Circular buffer: when storage >80% full, overwrite oldest Level 1 frames (keep Level 2)
|
|
|
|
### Component 6: System Health & Resilience
|
|
|
|
**Monitoring threads**:
|
|
|
|
| Monitor | Check Interval | Threshold | Action |
|
|
|---------|---------------|-----------|--------|
|
|
| Thermal (T_junction) | 1s | >75°C | Degrade to Level 1 only |
|
|
| Thermal (T_junction) | 1s | >80°C | Disable semantic detection |
|
|
| Power (Jetson INA) | 2s | >80% budget | Disable VLM |
|
|
| Power (Jetson INA) | 2s | >90% budget | Reduce inference rate to 5 FPS |
|
|
| Gimbal heartbeat | 2s | No response | Force Level 1 sweep pattern |
|
|
| Semantic process | 5s | No heartbeat | Restart with 5s backoff, max 3 attempts |
|
|
| VLM process | 5s | No heartbeat | Mark Tier 3 unavailable, continue Tier 1+2 |
|
|
| NVMe free space | 60s | <20% free | Switch to Level 1 recording rate only |
|
|
| Frame quality | per frame | Laplacian var < threshold | Skip frame, use buffered good frame |
|
|
|
|
**Graceful degradation** (4 levels, unchanged from draft02):
|
|
|
|
| Level | Condition | Capability |
|
|
|-------|-----------|-----------|
|
|
| 0 — Full | All nominal, T < 70°C | Tier 1+2+3, Level 1+2, gimbal, recording |
|
|
| 1 — No VLM | VLM unavailable or T > 75°C or power > 80% | Tier 1+2, Level 1+2, gimbal, recording |
|
|
| 2 — No semantic | Semantic crashed 3x or T > 80°C | Existing YOLO only, Level 1 sweep, recording |
|
|
| 3 — No gimbal | Gimbal UART failed 3x | Existing YOLO only, fixed camera, recording |
|
|
|
|
### Component 7: Integration & Deployment
|
|
|
|
**Hardware BOM additions** (beyond existing system):
|
|
|
|
| Item | Purpose | Estimated Cost |
|
|
|------|---------|----------------|
|
|
| NVMe SSD ≥256GB (industrial grade) | OS + models + recording + logging | $40-80 |
|
|
| Active cooling fan (30mm+) | Prevent thermal throttling | $10-20 |
|
|
| Ruggedized carrier board (e.g., MILBOX-ORNX or custom) | Vibration, temperature, dust protection | $200-500 |
|
|
| Shielded UART cable + ferrite beads | EMI protection for gimbal communication | $10-20 |
|
|
| Total additional hardware | | ~$260-620 |
|
|
|
|
**Software deployment**:
|
|
- OS: JetPack 6.2 on NVMe SSD
|
|
- YOLOE models: TRT FP16 engines on NVMe (both 11 and 26 backbone variants)
|
|
- VLM: NanoLLM Docker container on NVMe
|
|
- Existing YOLO: current Cython + TRT pipeline (unchanged)
|
|
- New Cython modules: semantic detection, gimbal control, scan controller, recorder
|
|
- VLM process: separate Docker container, IPC via Unix socket
|
|
- Config: YAML file for all thresholds, class names, scan parameters, degradation thresholds
|
|
|
|
**Version control & update strategy**:
|
|
- Pin all dependency versions (Ultralytics, TensorRT, NanoLLM, OpenCV)
|
|
- Model updates: swap TRT engine files on NVMe, restart service
|
|
- Config updates: edit YAML, restart service
|
|
- No over-the-air updates (air-gapped system). USB drive for field updates.
|
|
|
|
## Training & Data Strategy
|
|
|
|
### Phase 0: Benchmark Sprint (Week 1-2)
|
|
- Deploy YOLOE-26s-seg and YOLOE-11s-seg in open-vocab mode
|
|
- Test text/visual prompts on semantic01-04.png + 50 additional frames
|
|
- Record results. Pick backbone with better qualitative detection.
|
|
- Deploy V1 heuristic endpoint analysis (no CNN, no training data needed)
|
|
- First field test flight with recording enabled
|
|
|
|
### Phase 1: Field validation & data collection (Week 2-6)
|
|
- Deploy TRT FP16 engine with best backbone
|
|
- Record all flights to NVMe
|
|
- Operator marks detections as true/false positive in post-flight review
|
|
- Build annotation backlog from recorded frames
|
|
- Target: 500 annotated frames by week 6
|
|
|
|
### Phase 2: Custom model training (Week 6-10)
|
|
- Fine-tune YOLOE-Seg on custom dataset (linear probing → full fine-tune)
|
|
- Train MobileNetV3-Small CNN on endpoint ROI crops
|
|
- A/B test: custom model vs YOLOE zero-shot on validation set
|
|
- Deploy winning model as new TRT engine
|
|
|
|
### Phase 3: VLM & refinement (Week 8-14)
|
|
- Deploy NanoLLM with VILA1.5-3B
|
|
- Tune prompting on collected ambiguous cases
|
|
- Train freshness classifier (if enough annotated freshness labels exist)
|
|
- Target: 1500+ images per class
|
|
|
|
### Phase 4: Seasonal expansion (Month 4+)
|
|
- Spring/summer annotation campaigns
|
|
- Re-train all models with multi-season data
|
|
- Adjust heuristic thresholds per season (configurable via YAML)
|
|
|
|
## Testing Strategy
|
|
|
|
### Integration / Functional Tests
|
|
- YOLOE text prompt detection on reference images (both 11 and 26 backbones)
|
|
- TRT FP16 export on Jetson Orin Nano Super (verify no OOM, no crash)
|
|
- V1 heuristic endpoint analysis on 20 synthetic masks (10 with hideouts, 10 without)
|
|
- Frame quality gate: inject blurry frames, verify rejection
|
|
- Gimbal CRC layer: inject corrupted commands, verify retry + log
|
|
- Gimbal watchdog: simulate hang, verify forced Level 1 within 2.5s
|
|
- NanoLLM VLM: load model, run inference on 10 aerial images, verify output + memory
|
|
- VLM load/unload cycle: 10 cycles without memory leak
|
|
- Detection logger: verify JSON-lines format, all fields populated
|
|
- Frame recorder: verify NVMe write speed, no dropped frames at 30 FPS
|
|
- Full pipeline end-to-end on recorded flight footage (offline replay)
|
|
- Graceful degradation: simulate each failure mode, verify correct degradation level
|
|
|
|
### Non-Functional Tests
|
|
- Tier 1 latency on Jetson Orin Nano Super TRT FP16: ≤100ms (both backbones)
|
|
- Tier 2 latency (V1 heuristic): ≤50ms. (V2 CNN): ≤200ms
|
|
- Tier 3 latency (NanoLLM VLM): ≤5 seconds
|
|
- Memory peak: all components loaded < 7GB
|
|
- Thermal: 60-minute sustained inference, T_junction < 75°C with active cooling
|
|
- NVMe endurance: continuous recording for 2 hours, verify no write errors
|
|
- Power draw: measure at each degradation level, verify within UAV power budget
|
|
- EMI test: operate near data transmitter antenna, verify no gimbal anomalies with CRC layer
|
|
- Cold start: power on → first detection within 60 seconds (model load time)
|
|
- Vibration: mount Jetson on vibration table, run inference, compare detection accuracy vs static
|
|
|
|
## Technology Maturity Assessment
|
|
|
|
| Component | Technology | Maturity | Risk | Mitigation |
|
|
|-----------|-----------|----------|------|------------|
|
|
| Tier 1 Detection | YOLOE/YOLO26/YOLO11 | **Medium** — YOLO26 is 3 months old, reported regressions on custom data. YOLOE-26 even newer. | Medium | Benchmark both backbones. YOLO11 is battle-tested fallback. Pin versions. |
|
|
| TRT FP16 Export | TensorRT on JetPack 6.2 | **High** — FP16 is stable on Jetson. Well-documented. | Low | FP16 only. Avoid INT8 initially. |
|
|
| TRT INT8 Export | TensorRT on JetPack 6.2 | **Low** — Documented crashes (PR #23928). Calibration issues. | High | Defer to Phase 3+. FP16 sufficient for now. |
|
|
| VLM (NanoLLM) | NanoLLM + VILA-3B | **Medium-High** — Purpose-built for Jetson by NVIDIA team. Docker-based. Monthly releases. | Low | More stable than vLLM. Use Docker containers. |
|
|
| VLM (vLLM) | vLLM on Jetson | **Low** — System freezes, crashes, open bugs. | **High** | **Do not use.** NanoLLM instead. |
|
|
| Path Tracing | Skeletonization + OpenCV | **High** — Decades-old algorithms. Well-understood. | Low | Pruning needed for noisy inputs. |
|
|
| CNN Classifier | MobileNetV3-Small + TRT | **High** — Proven architecture. TRT FP16 stable. | Low | Standard transfer learning. |
|
|
| Gimbal Control | ViewLink Serial Protocol | **Medium** — Protocol documented. ArduPilot driver exists. | Medium | EMI mitigation critical. CRC layer. |
|
|
| Freshness Assessment | Novel heuristic | **Low** — No prior art. Experimental. | High | V1: metadata only, not a filter. Iterate with data. |
|
|
| NVMe Storage | Industrial NVMe on Jetson | **High** — Production standard. SD card alternative is unreliable. | Low | Use industrial-grade SSD. |
|
|
| Ruggedized Hardware | MILBOX-ORNX or custom | **High** — Established product. Designed for Jetson + UAV. | Low | Standard procurement. |
|
|
|
|
## References
|
|
|
|
- YOLO26 docs: https://docs.ultralytics.com/models/yolo26/
|
|
- YOLOE docs: https://docs.ultralytics.com/models/yoloe/
|
|
- YOLO26 accuracy regression: https://github.com/ultralytics/ultralytics/issues/23206
|
|
- YOLO26 TRT INT8 crash: https://github.com/ultralytics/ultralytics/issues/23841
|
|
- YOLO26 TRT OOM fix: https://github.com/ultralytics/ultralytics/pull/23928
|
|
- vLLM Jetson freezes: https://github.com/dusty-nv/jetson-containers/issues/800
|
|
- vLLM Jetson install crash: https://github.com/vllm-project/vllm/issues/23376
|
|
- NanoLLM docs: https://dusty-nv.github.io/NanoLLM/
|
|
- NanoVLM: https://jetson-ai-lab.com/tutorial_nano-vlm.html
|
|
- MILBOX-ORNX: https://forecr.io/products/jetson-orin-nx-orin-nano-rugged-compact-pc-milbox-ornx
|
|
- Jetson SD card corruption: https://forums.developer.nvidia.com/t/corrupted-sd-cards/265418
|
|
- Jetson thermal throttling: https://www.alibaba.com/product-insights/how-to-run-private-llama-3-inference-on-a-200-jetson-orin-nano-without-thermal-throttling.html
|
|
- ViewPro EMI issues: https://www.viewprouav.com/help/gimbal/
|
|
- UART CRC reliability: https://link.springer.com/chapter/10.1007/978-981-19-8563-8_23
|
|
- Military edge AI thermal: https://www.mobilityengineeringtech.com/component/content/article/53967
|
|
- UAV-VL-R1: https://arxiv.org/pdf/2508.11196
|
|
- Qwen2-VL-2B-GPTQ-INT8: https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int8
|
|
- Ultralytics Jetson guide: https://docs.ultralytics.com/guides/nvidia-jetson
|
|
- Skelite: https://arxiv.org/html/2503.07369v1
|
|
- YOLO FP reduction: https://docs.ultralytics.com/yolov5/tutorials/tips_for_best_training_results
|