# VLMClient

## 1. High-Level Overview

**Purpose**: IPC client that communicates with the NanoLLM Docker container over a Unix domain socket. It sends an ROI image and a text prompt, receives analysis text, and manages the VLM lifecycle (loading and unloading the model to free GPU memory).

**Architectural Pattern**: Client adapter with lifecycle management.

**Upstream dependencies**: Config helper (socket path, model name, timeout), Types helper

**Downstream consumers**: ScanController

## 2. Internal Interfaces

### Interface: VLMClient

| Method | Input | Output | Async | Error Types |
|--------|-------|--------|-------|-------------|
| `connect()` | — | bool | No | ConnectionError |
| `disconnect()` | — | — | No | — |
| `is_available()` | — | bool | No | — |
| `analyze(image, prompt)` | numpy.ndarray (H, W, 3), str | VLMResponse | No (blocks up to 5s) | VLMTimeoutError, VLMError |
| `load_model()` | — | — | No | ModelLoadError |
| `unload_model()` | — | — | No | — |

**VLMResponse**:
```
text: str — VLM analysis text
confidence: float (0-1) — extracted from response or heuristic
latency_ms: float — round-trip time
```
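
As an illustration, the fields above could be carried in a small immutable dataclass. This is a minimal sketch, not the project's actual type: the range checks in `__post_init__` are an assumption added here for the example, and the spec does not say how `confidence` is derived.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VLMResponse:
    """Result of one analyze() call; fields mirror the listing above."""

    text: str          # VLM analysis text
    confidence: float  # 0-1, extracted from the response or a heuristic
    latency_ms: float  # round-trip time in milliseconds

    def __post_init__(self) -> None:
        # Assumed validation rule (not in the spec): fail fast on bad values.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {self.confidence}")
        if self.latency_ms < 0:
            raise ValueError(f"negative latency: {self.latency_ms}")
```

Freezing the dataclass keeps a response from being mutated after it is handed to ScanController.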

**IPC Protocol** (Unix domain socket, JSON messages):
```json
// Request
{"type": "analyze", "image_path": "/tmp/roi_1234.jpg", "prompt": "..."}

// Response
{"type": "result", "text": "...", "tokens": 42, "latency_ms": 2100}

// Load/unload requests
{"type": "load_model", "model": "VILA1.5-3B"}
{"type": "unload_model"}

// Status message (reports whether a model is loaded)
{"type": "status", "loaded": true, "model": "VILA1.5-3B", "gpu_mb": 2800}
```
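
The request/response exchange above might be implemented roughly as follows. This is a sketch under stated assumptions: the spec does not define message framing, so the example assumes one JSON object per line terminated by `\n`, and `send_request` is a hypothetical helper name, not part of the interface table.

```python
import json
import socket


def send_request(sock: socket.socket, message: dict, timeout_s: float = 5.0) -> dict:
    """Send one JSON message and block until one JSON reply line arrives."""
    sock.settimeout(timeout_s)
    # Assumed framing (not in the spec): one JSON object per line.
    sock.sendall(json.dumps(message).encode("utf-8") + b"\n")
    buf = bytearray()
    while not buf.endswith(b"\n"):
        chunk = sock.recv(4096)  # raises a timeout error after timeout_s
        if not chunk:
            raise ConnectionError("socket closed before a full reply arrived")
        buf.extend(chunk)
    return json.loads(buf.decode("utf-8"))


def analyze_once(socket_path: str, image_path: str, prompt: str) -> dict:
    """Open a Unix domain socket, send one analyze request, return the reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        request = {"type": "analyze", "image_path": image_path, "prompt": prompt}
        return send_request(sock, request)
```

A real client would likely translate the socket timeout into `VLMTimeoutError` and other failures into `VLMError`, as listed in the interface table.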

## 5. Implementation Details

**Lifecycle**:
- L1 sweep: VLM unloaded (GPU memory freed for YOLOE)
- L2 investigation: VLM loaded on demand when Tier 2 result is ambiguous
- Load time: ~5-10s (model loading + warmup)
- ScanController decides when to load/unload

**Prompt template** (generic visual descriptors, not military jargon):
```
Analyze this aerial image crop. Describe what you see at the center of the image.
Is there a structure, entrance, or covered area? Is there evidence of recent
human activity (disturbed ground, fresh tracks, organized materials)?
Answer briefly: what is the most likely explanation for the dark/dense area?
```

**Key Dependencies**:

| Library | Version | Purpose |
|---------|---------|---------|
| socket (stdlib) | — | Unix domain socket client |
| json (stdlib) | — | IPC message serialization |
| OpenCV | 4.x | Save ROI crop as temporary JPEG for IPC |

**Error Handling Strategy**:
- Connection refused → VLM container not running → `is_available()` returns false
- Timeout (>5s) → VLMTimeoutError → ScanController skips Tier 3
- 3 consecutive errors → ScanController sets vlm_available=false
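
The three-strikes rule can be sketched as a small counter on the ScanController side. `VLMAvailabilityTracker` is a hypothetical name, and the example assumes that any successful call resets the count (implied by "consecutive" but not stated). How availability is restored afterwards is not specified, so the sketch leaves it out.

```python
class VLMAvailabilityTracker:
    """Counts consecutive analyze() failures (hypothetical helper)."""

    def __init__(self, max_consecutive_errors: int = 3) -> None:
        self.max_consecutive_errors = max_consecutive_errors
        self.consecutive_errors = 0
        self.vlm_available = True

    def record_success(self) -> None:
        # Assumption: a success breaks the streak of errors.
        self.consecutive_errors = 0

    def record_error(self) -> None:
        self.consecutive_errors += 1
        if self.consecutive_errors >= self.max_consecutive_errors:
            self.vlm_available = False
```

ScanController would call `record_error()` on each `VLMTimeoutError` or `VLMError` and skip Tier 3 once `vlm_available` is false.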

## 7. Caveats & Edge Cases

**Known limitations**:
- NanoLLM model selection limited: VILA, LLaVA, Obsidian only
- Model load time (~5-10s) delays first L2 VLM analysis
- ROI crop is saved to /tmp as a JPEG for IPC, which adds roughly 1ms of disk I/O per request

**Potential race conditions**:
- ScanController requests unload while analyze() is in progress → client must wait for response before unloading
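
One way to enforce this ordering is to guard both operations with the same lock, so an unload request blocks until any in-flight `analyze()` has returned. The wrapper below is a hypothetical sketch: it assumes ScanController and the analysis path may run on different threads, and it takes the underlying operations as plain callables.

```python
import threading


class SerializedVLM:
    """Wraps analyze/unload callables so they never overlap (hypothetical)."""

    def __init__(self, analyze_fn, unload_fn) -> None:
        self._analyze_fn = analyze_fn
        self._unload_fn = unload_fn
        self._lock = threading.Lock()

    def analyze(self, image, prompt):
        # Holds the lock for the full request/response round trip.
        with self._lock:
            return self._analyze_fn(image, prompt)

    def unload_model(self) -> None:
        # Blocks here until a running analyze() releases the lock.
        with self._lock:
            self._unload_fn()
```

Because `analyze()` blocks for up to 5s, an unload issued during analysis can be delayed by the same amount.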

## 8. Dependency Graph

- **Must be implemented after**: Config helper, Types helper
- **Can be implemented in parallel with**: Tier1Detector, Tier2SpatialAnalyzer, GimbalDriver, OutputManager
- **Blocks**: ScanController (needs VLMClient for L2 Tier 3 analysis)

## 9. Logging Strategy

| Log Level | When | Example |
|-----------|------|---------|
| ERROR | Connection refused, model load failed | `VLM connection refused at /tmp/vlm.sock` |
| WARN | Timeout, high latency | `VLM analyze timeout after 5000ms` |
| INFO | Model loaded/unloaded, analysis result | `VLM loaded VILA1.5-3B (2800MB GPU). Analysis: "branch-covered structure"` |