# VLMClient

## 1. High-Level Overview

**Purpose**: IPC client that communicates with the NanoLLM Docker container via Unix domain socket. Sends ROI image + text prompt, receives analysis text. Manages VLM lifecycle (load/unload to free GPU memory).

**Architectural Pattern**: Client adapter with lifecycle management.

**Upstream dependencies**: Config helper (socket path, model name, timeout), Types helper

**Downstream consumers**: ScanController

## 2. Internal Interfaces

### Interface: VLMClient

| Method | Input | Output | Async | Error Types |
|--------|-------|--------|-------|-------------|
| `connect()` | — | bool | No | ConnectionError |
| `disconnect()` | — | — | No | — |
| `is_available()` | — | bool | No | — |
| `analyze(image, prompt)` | numpy (H,W,3), str | VLMResponse | No (blocks up to 5s) | VLMTimeoutError, VLMError |
| `load_model()` | — | — | No | ModelLoadError |
| `unload_model()` | — | — | No | — |

**VLMResponse**:
```
text: str — VLM analysis text
confidence: float (0-1) — extracted from response or heuristic
latency_ms: float — round-trip time
```

**IPC Protocol** (Unix domain socket, JSON messages):
```json
// Request
{"type": "analyze", "image_path": "/tmp/roi_1234.jpg", "prompt": "..."}

// Response
{"type": "result", "text": "...", "tokens": 42, "latency_ms": 2100}

// Load/unload
{"type": "load_model", "model": "VILA1.5-3B"}
{"type": "unload_model"}
{"type": "status", "loaded": true, "model": "VILA1.5-3B", "gpu_mb": 2800}
```

## 5. Implementation Details

**Lifecycle**:
- L1 sweep: VLM unloaded (GPU memory freed for YOLOE)
- L2 investigation: VLM loaded on demand when Tier 2 result is ambiguous
- Load time: ~5-10s (model loading + warmup)
- ScanController decides when to load/unload

**Prompt template** (generic visual descriptors, not military jargon):
```
Analyze this aerial image crop. Describe what you see at the center of the image.
Is there a structure, entrance, or covered area?
Is there evidence of recent human activity (disturbed ground, fresh tracks, organized materials)?
Answer briefly: what is the most likely explanation for the dark/dense area?
```

**Key Dependencies**:

| Library | Version | Purpose |
|---------|---------|---------|
| socket (stdlib) | — | Unix domain socket client |
| json (stdlib) | — | IPC message serialization |
| OpenCV | 4.x | Save ROI crop as temporary JPEG for IPC |

**Error Handling Strategy**:
- Connection refused → VLM container not running → is_available()=false
- Timeout (>5s) → VLMTimeoutError → ScanController skips Tier 3
- 3 consecutive errors → ScanController sets vlm_available=false

## 7. Caveats & Edge Cases

**Known limitations**:
- NanoLLM model selection limited: VILA, LLaVA, Obsidian only
- Model load time (~5-10s) delays first L2 VLM analysis
- ROI crop saved to /tmp as JPEG for IPC (disk I/O, ~1ms)

**Potential race conditions**:
- ScanController requests unload while analyze() is in progress → client must wait for response before unloading

## 8. Dependency Graph

**Must be implemented after**: Config helper, Types helper

**Can be implemented in parallel with**: Tier1Detector, Tier2SpatialAnalyzer, GimbalDriver, OutputManager

**Blocks**: ScanController (needs VLMClient for L2 Tier 3 analysis)

## 9. Logging Strategy

| Log Level | When | Example |
|-----------|------|---------|
| ERROR | Connection refused, model load failed | `VLM connection refused at /tmp/vlm.sock` |
| WARN | Timeout, high latency | `VLM analyze timeout after 5000ms` |
| INFO | Model loaded/unloaded, analysis result | `VLM loaded VILA1.5-3B (2800MB GPU). Analysis: "branch-covered structure"` |
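
**Illustrative sketch** (not the actual implementation): the IPC protocol, the timeout handling, and the unload-versus-analyze race described in this document could be wired together roughly as follows. Several details are assumptions, because this spec does not define them: messages are framed as one JSON object per line, every request receives exactly one reply, and the names `VLMClientSketch`, `connect_unix`, and `_request` are hypothetical. The sketch takes an already-saved `image_path` and omits the OpenCV step that writes the ROI crop to disk, as well as the `confidence` field of `VLMResponse`.

```python
import json
import socket
import threading
import time


class VLMError(Exception):
    """IPC failure other than a timeout (error name taken from the interface table)."""


class VLMTimeoutError(Exception):
    """No complete reply arrived in time (error name taken from the interface table)."""


def connect_unix(path, timeout_s=5.0):
    """Hypothetical helper: open the Unix domain socket named by the Config helper."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout_s)
    sock.connect(path)
    return sock


class VLMClientSketch:
    """Minimal request/response client over an already connected stream socket."""

    def __init__(self, sock, timeout_s=5.0):
        self._sock = sock
        # Note: the timeout applies to each socket call, not to the whole round trip.
        self._sock.settimeout(timeout_s)
        # One lock for all requests: unload_model() cannot start until a
        # running analyze() has received its reply.
        self._lock = threading.Lock()

    def _request(self, message):
        # Assumption: newline-delimited JSON framing, one reply per request.
        payload = (json.dumps(message) + "\n").encode("utf-8")
        with self._lock:
            try:
                self._sock.sendall(payload)
                buf = bytearray()
                while not buf.endswith(b"\n"):
                    chunk = self._sock.recv(4096)
                    if not chunk:
                        raise VLMError("connection closed by VLM container")
                    buf.extend(chunk)
            except socket.timeout as exc:
                raise VLMTimeoutError("no reply within timeout") from exc
            except OSError as exc:
                raise VLMError(str(exc)) from exc
        return json.loads(buf.decode("utf-8"))

    def analyze(self, image_path, prompt):
        start = time.monotonic()
        reply = self._request(
            {"type": "analyze", "image_path": image_path, "prompt": prompt}
        )
        if reply.get("type") != "result":
            raise VLMError(f"unexpected reply: {reply!r}")
        # latency_ms here is the client-side round trip, not the server's value.
        latency_ms = (time.monotonic() - start) * 1000.0
        return {"text": reply["text"], "latency_ms": latency_ms}

    def unload_model(self):
        # Assumption: the container answers with a "status" message.
        return self._request({"type": "unload_model"})
```

Under these assumptions, a caller would create the client with `VLMClientSketch(connect_unix("/tmp/vlm.sock"))` and treat `VLMTimeoutError` as the signal to skip Tier 3. After a timeout, a late reply may still be sitting in the socket, so reconnecting before the next request is the safer choice in this design.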