Initial commit

Made-with: Cursor
2026-06-22 05:51:13 +00:00 · 2026-03-26 00:20:30 +02:00
commit 8e2ecf50fd
144 changed files with 19781 additions and 0 deletions
@@ -0,0 +1,98 @@
+# VLMClient
+
+## 1. High-Level Overview
+
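+**Purpose**: IPC client that communicates with the NanoLLM Docker container over a Unix domain socket. It sends an ROI crop (written to a temporary JPEG and passed by path) together with a text prompt, and receives the analysis text. It also manages the VLM lifecycle, loading the model on demand and unloading it to free GPU memory.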
+
+**Architectural Pattern**: Client adapter with lifecycle management.
+
+**Upstream dependencies**: Config helper (socket path, model name, timeout), Types helper
+
+**Downstream consumers**: ScanController
+
+## 2. Internal Interfaces
+
+### Interface: VLMClient
+
+| Method | Input | Output | Async | Error Types |
+|--------|-------|--------|-------|-------------|
+| `connect()` | — | bool | No | ConnectionError |
+| `disconnect()` | — | — | No | — |
+| `is_available()` | — | bool | No | — |
+| `analyze(image, prompt)` | numpy (H,W,3), str | VLMResponse | No (blocks up to 5s) | VLMTimeoutError, VLMError |
+| `load_model()` | — | — | No | ModelLoadError |
+| `unload_model()` | — | — | No | — |
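The interface above can be written down as a Python sketch, assuming the client is implemented in Python (the NumPy/OpenCV inputs suggest so). The exception names come from the table. Two things are assumptions: making `VLMTimeoutError` and `ModelLoadError` subclasses of `VLMError`, and the name `VLMClientProtocol`. `connect()` is left to raise the built-in `ConnectionError`.

```python
from typing import Any, Protocol, runtime_checkable


class VLMError(Exception):
    """Base class for VLM failures reported by the client (assumed hierarchy)."""


class VLMTimeoutError(VLMError):
    """analyze() got no reply within the timeout (5 s in this spec)."""


class ModelLoadError(VLMError):
    """The container failed to load the requested model."""


@runtime_checkable
class VLMClientProtocol(Protocol):
    # connect() may raise the built-in ConnectionError.
    def connect(self) -> bool: ...
    def disconnect(self) -> None: ...
    def is_available(self) -> bool: ...
    # image: NumPy uint8 array of shape (H, W, 3); returns a VLMResponse.
    def analyze(self, image: Any, prompt: str) -> Any: ...
    def load_model(self) -> None: ...
    def unload_model(self) -> None: ...
```

`runtime_checkable` lets tests check that a fake client exposes every method with a plain `isinstance`. It checks method presence only, not signatures.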
+
+**VLMResponse**:
+```
+text: str — VLM analysis text
+confidence: float (0-1) — extracted from response or heuristic
+latency_ms: float — round-trip time
+```
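`VLMResponse` could be represented as follows. The spec says confidence is "extracted from response or heuristic" but defines neither method. This sketch therefore makes two assumptions:

- It reads an optional `confidence` key from the reply if one is present. That key is not in the documented protocol.
- Otherwise it falls back to a hypothetical hedge-word heuristic.

`latency_ms` is measured client-side with `time.monotonic()`, which matches "round-trip time". The helper names are illustrative.

```python
import time
from dataclasses import dataclass

# Hypothetical hedge words; the spec does not define the confidence heuristic.
_HEDGES = ("maybe", "possibly", "might", "unclear", "cannot determine")


@dataclass(frozen=True)
class VLMResponse:
    text: str          # VLM analysis text
    confidence: float  # 0..1, extracted from the response or heuristic
    latency_ms: float  # client-measured round-trip time

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {self.confidence}")


def heuristic_confidence(text: str) -> float:
    """Hypothetical: start at 0.8, subtract 0.2 per hedge word, floor at 0.1."""
    lowered = text.lower()
    hits = sum(lowered.count(h) for h in _HEDGES)
    return max(0.1, 0.8 - 0.2 * hits)


def response_from_reply(reply: dict, sent_at: float) -> VLMResponse:
    """Build a VLMResponse from a {"type": "result", ...} reply.

    sent_at is a time.monotonic() timestamp taken just before sending.
    """
    text = reply["text"]
    conf = reply.get("confidence")  # not in the documented reply; assumed optional
    if conf is not None:
        confidence = min(1.0, max(0.0, float(conf)))  # clamp into 0..1
    else:
        confidence = heuristic_confidence(text)
    latency_ms = (time.monotonic() - sent_at) * 1000.0
    return VLMResponse(text=text, confidence=confidence, latency_ms=latency_ms)
```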
+
+**IPC Protocol** (Unix domain socket, JSON messages; the `//` lines below are annotations, not part of the wire format):
+```jsonc
+// Request
+{"type": "analyze", "image_path": "/tmp/roi_1234.jpg", "prompt": "..."}
+
+// Response
+{"type": "result", "text": "...", "tokens": 42, "latency_ms": 2100}
+
+// Load/unload requests
+{"type": "load_model", "model": "VILA1.5-3B"}
+{"type": "unload_model"}
+
+// Status message (model state and GPU memory use)
+{"type": "status", "loaded": true, "model": "VILA1.5-3B", "gpu_mb": 2800}
+```
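The protocol does not say how messages are delimited on the stream socket. This sketch assumes newline-delimited JSON, meaning one compact object per line. It applies the 5 s `analyze()` limit with `socket.settimeout`. The helper names are hypothetical, and `VLMTimeoutError` is redefined here so the block runs on its own.

```python
import json
import socket


class VLMTimeoutError(Exception):
    """Raised when the container does not reply in time (name from the spec)."""


def send_message(sock: socket.socket, msg: dict) -> None:
    # Assumed framing: one compact JSON object per line (newline-delimited JSON).
    sock.sendall(json.dumps(msg, separators=(",", ":")).encode("utf-8") + b"\n")


def recv_message(sock: socket.socket, buf: bytearray) -> dict:
    # Read until a full line is buffered; keep any bytes after it for the next call.
    while b"\n" not in buf:
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("VLM socket closed by peer")
        buf.extend(chunk)
    line, _, rest = bytes(buf).partition(b"\n")
    buf[:] = rest
    return json.loads(line)


def request(sock: socket.socket, msg: dict, buf: bytearray, timeout_s: float = 5.0) -> dict:
    """Send one request and block for its reply.

    settimeout() bounds each blocking call, not the whole exchange.
    """
    sock.settimeout(timeout_s)
    try:
        send_message(sock, msg)
        return recv_message(sock, buf)
    except socket.timeout as exc:
        raise VLMTimeoutError(f"VLM reply timeout after {timeout_s * 1000:.0f}ms") from exc
```

Because the timeout is per call, a reply that trickles in over several `recv` calls could exceed `timeout_s` in total. A loop that tracks an absolute deadline would tighten that.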
+
+## 5. Implementation Details
+
+**Lifecycle**:
+- L1 sweep: VLM unloaded (GPU memory freed for YOLOE)
+- L2 investigation: VLM loaded on demand when a Tier 2 result is ambiguous
+- Load time: ~5-10s (model loading + warmup)
+- ScanController decides when to load/unload
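ScanController's load/unload decision can be sketched as a pure function, which keeps the policy easy to unit-test. This is a hypothetical sketch: the enum and function names are illustrative. Keeping an already-loaded model warm across L2 targets is an assumed choice that the spec does not state.

```python
from enum import Enum
from typing import Optional


class ScanLevel(Enum):
    L1_SWEEP = "L1"
    L2_INVESTIGATION = "L2"


def vlm_action(level: ScanLevel, vlm_loaded: bool, tier2_ambiguous: bool) -> Optional[str]:
    """Return "load", "unload", or None (hypothetical ScanController policy).

    L1 keeps the VLM unloaded so YOLOE has the GPU. L2 loads it only when a
    Tier 2 result is ambiguous; otherwise the current state is left alone.
    """
    if level is ScanLevel.L1_SWEEP:
        return "unload" if vlm_loaded else None
    if tier2_ambiguous and not vlm_loaded:
        return "load"
    return None
```

Since a load costs ~5-10 s, the caller would issue the `"load"` action as soon as the ambiguity is known rather than at the moment the analysis is needed.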
+
+**Prompt template** (generic visual descriptors, not military jargon):
+```
+Analyze this aerial image crop. Describe what you see at the center of the image.
+Is there a structure, entrance, or covered area? Is there evidence of recent
+human activity (disturbed ground, fresh tracks, organized materials)?
+Answer briefly: what is the most likely explanation for the dark/dense area?
+```
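The template can be turned into an `analyze` request in the shape shown in the IPC protocol. `ROI_PROMPT` and `make_analyze_request` are hypothetical names.

```python
import json

# Prompt text from the template above; the constant name is hypothetical.
ROI_PROMPT = (
    "Analyze this aerial image crop. Describe what you see at the center of the image.\n"
    "Is there a structure, entrance, or covered area? Is there evidence of recent\n"
    "human activity (disturbed ground, fresh tracks, organized materials)?\n"
    "Answer briefly: what is the most likely explanation for the dark/dense area?"
)


def make_analyze_request(image_path: str, prompt: str = ROI_PROMPT) -> dict:
    """Build an "analyze" message matching the documented request shape."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    return {"type": "analyze", "image_path": image_path, "prompt": prompt}
```

`json.dumps` escapes the template's line breaks as `\n`, so the serialized request still fits on a single line. That matters if the socket framing is newline-based.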
+
+**Key Dependencies**:
+
+| Library | Version | Purpose |
+|---------|---------|---------|
+| socket (stdlib) | — | Unix domain socket client |
+| json (stdlib) | — | IPC message serialization |
+| OpenCV | 4.x | Save ROI crop as temporary JPEG for IPC |
+
+**Error Handling Strategy**:
+- Connection refused → VLM container not running → `is_available()` returns `False`
+- Timeout (>5s) → `VLMTimeoutError` → ScanController skips Tier 3
+- 3 consecutive errors → ScanController sets `vlm_available = False`
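The first and third rules can be sketched for a Python client using the stdlib `socket` module. A short connect probe covers both ways the container can be absent. A missing socket file raises `FileNotFoundError`, and a stale file with no listener raises `ConnectionRefusedError`; both are subclasses of `OSError`. `ErrorBudget` is a hypothetical helper for the 3-strikes rule; any success resets the count.

```python
import socket


def is_available(socket_path: str, timeout_s: float = 0.5) -> bool:
    """Probe the container socket; False if nothing is listening."""
    probe = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    probe.settimeout(timeout_s)
    try:
        probe.connect(socket_path)
        return True
    except OSError:  # FileNotFoundError, ConnectionRefusedError, timeout, ...
        return False
    finally:
        probe.close()


class ErrorBudget:
    """Hypothetical helper: mark the VLM unavailable after N consecutive failures."""

    def __init__(self, limit: int = 3) -> None:
        self.limit = limit
        self.consecutive = 0
        self.vlm_available = True

    def record(self, ok: bool) -> None:
        self.consecutive = 0 if ok else self.consecutive + 1
        if self.consecutive >= self.limit:
            self.vlm_available = False
```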
+
+## 7. Caveats & Edge Cases
+
+**Known limitations**:
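+- NanoLLM model selection is limited to VILA, LLaVA, and Obsidian models
+- Model load time (~5-10s) delays the first VLM analysis in L2
+- ROI crop is saved to /tmp as a JPEG for IPC (disk I/O, ~1ms); that path must be visible inside the container at the same location (for example, via a bind mount), or the server cannot open `image_path`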
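Because the crop goes through a file, something has to delete it afterwards. One option is a context manager that allocates a unique name with `tempfile.mkstemp` and removes the file once the request completes. In this sketch the `write` callback stands in for `cv2.imwrite`, which returns a bool success flag. `roi_tempfile` is a hypothetical name.

```python
import contextlib
import os
import tempfile
from typing import Callable, Iterator


@contextlib.contextmanager
def roi_tempfile(write: Callable[[str], bool], directory: str = "/tmp") -> Iterator[str]:
    """Yield a unique roi_*.jpg path filled by `write`, then delete it.

    In the client, `write` would be e.g. `lambda p: cv2.imwrite(p, roi)`.
    """
    fd, path = tempfile.mkstemp(prefix="roi_", suffix=".jpg", dir=directory)
    os.close(fd)  # the writer reopens the file by name
    try:
        if not write(path):
            raise OSError(f"failed to write ROI crop to {path}")
        yield path
    finally:
        # Remove the crop whether the request succeeded or raised.
        with contextlib.suppress(FileNotFoundError):
            os.remove(path)
```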
+
+**Potential race conditions**:
+- ScanController requests unload while analyze() is in progress → client must wait for response before unloading
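One way to meet that requirement is to route every message through a single lock, so only one request is in flight on the socket at a time. In this minimal sketch, `SerializedChannel` and the `transact` callback (send one message, return its reply) are hypothetical.

```python
import threading
from typing import Callable


class SerializedChannel:
    """Hypothetical wrapper: one request in flight on the VLM socket at a time.

    analyze() holds the lock until its reply arrives, so an unload_model()
    issued meanwhile blocks and is sent only after the result is read.
    """

    def __init__(self, transact: Callable[[dict], dict]) -> None:
        self._transact = transact  # sends one message and returns its reply
        self._lock = threading.Lock()

    def call(self, msg: dict) -> dict:
        with self._lock:
            return self._transact(msg)
```

Serializing all calls also keeps replies from interleaving on the single socket. The cost is that `unload_model()` may wait up to the full analyze timeout.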
+
+## 8. Dependency Graph
+
+**Must be implemented after**: Config helper, Types helper
+**Can be implemented in parallel with**: Tier1Detector, Tier2SpatialAnalyzer, GimbalDriver, OutputManager
+**Blocks**: ScanController (needs VLMClient for L2 Tier 3 analysis)
+
+## 9. Logging Strategy
+
+| Log Level | When | Example |
+|-----------|------|---------|
+| ERROR | Connection refused, model load failed | `VLM connection refused at /tmp/vlm.sock` |
+| WARN | Timeout, high latency | `VLM analyze timeout after 5000ms` |
+| INFO | Model loaded/unloaded, analysis result | `VLM loaded VILA1.5-3B (2800MB GPU). Analysis: "branch-covered structure"` |
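The table's messages can be emitted with the stdlib `logging` module. Python names the WARN level `WARNING`, and this sketch logs the INFO example as two separate calls. The logger name and the `SLOW_MS` high-latency threshold are hypothetical, since the spec gives no number.

```python
import logging

log = logging.getLogger("vlm_client")  # hypothetical logger name

# Hypothetical "high latency" threshold; the spec does not define one.
SLOW_MS = 3000.0


def log_connection_refused(socket_path: str) -> None:
    log.error("VLM connection refused at %s", socket_path)


def log_timeout(timeout_ms: int) -> None:
    log.warning("VLM analyze timeout after %dms", timeout_ms)


def log_loaded(model: str, gpu_mb: int) -> None:
    log.info("VLM loaded %s (%dMB GPU)", model, gpu_mb)


def log_result(text: str, latency_ms: float) -> None:
    # Warn on slow but successful calls, then record the result itself.
    if latency_ms > SLOW_MS:
        log.warning("VLM high latency: %.0fms", latency_ms)
    log.info('VLM analysis: "%s"', text)
```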