VLMClient
1. High-Level Overview
Purpose: IPC client that communicates with the NanoLLM Docker container over a Unix domain socket. It sends an ROI image (written to a temporary JPEG and referenced by path) together with a text prompt, receives the analysis text, and manages the VLM lifecycle (load/unload to free GPU memory).
Architectural Pattern: Client adapter with lifecycle management.
Upstream dependencies: Config helper (socket path, model name, timeout), Types helper
Downstream consumers: ScanController
2. Internal Interfaces
Interface: VLMClient
| Method | Input | Output | Async | Error Types |
|---|---|---|---|---|
| connect() | — | bool | No | ConnectionError |
| disconnect() | — | — | No | — |
| is_available() | — | bool | No | — |
| analyze(image, prompt) | numpy (H,W,3), str | VLMResponse | No (blocks up to 5s) | VLMTimeoutError, VLMError |
| load_model() | — | — | No | ModelLoadError |
| unload_model() | — | — | No | — |
VLMResponse:
text: str — VLM analysis text
confidence: float (0-1) — extracted from response or heuristic
latency_ms: float — round-trip time
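The response type and error types above could be declared roughly as follows. This is a minimal sketch, not the project's actual Types helper: the exception hierarchy (VLMTimeoutError and ModelLoadError deriving from VLMError) and the range check on confidence are assumptions made for illustration.

```python
from dataclasses import dataclass


class VLMError(Exception):
    """Base class for VLM client failures (hierarchy is an assumption)."""


class VLMTimeoutError(VLMError):
    """analyze() did not receive a response within the timeout."""


class ModelLoadError(VLMError):
    """The container reported that the model could not be loaded."""


@dataclass(frozen=True)
class VLMResponse:
    text: str          # VLM analysis text
    confidence: float  # 0-1, extracted from the response or a heuristic
    latency_ms: float  # round-trip time in milliseconds

    def __post_init__(self) -> None:
        # Assumed validation: reject values outside the documented 0-1 range.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {self.confidence}")
```

With a shared base class, ScanController can catch VLMError once and still distinguish timeouts when it needs to.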
IPC Protocol (Unix domain socket, JSON messages):
// Request
{"type": "analyze", "image_path": "/tmp/roi_1234.jpg", "prompt": "..."}
// Response
{"type": "result", "text": "...", "tokens": 42, "latency_ms": 2100}
// Load/unload
{"type": "load_model", "model": "VILA1.5-3B"}
{"type": "unload_model"}
{"type": "status", "loaded": true, "model": "VILA1.5-3B", "gpu_mb": 2800}
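The message shapes above do not say how messages are delimited on the socket. The sketch below assumes newline-delimited JSON (one message per line) and a strict request/response exchange; both are assumptions, and the helper names `send_message`, `recv_message`, and `build_analyze_request` are hypothetical.

```python
import json
import socket


def send_message(sock: socket.socket, message: dict) -> None:
    """Serialize one message as a single JSON line and send it."""
    # json.dumps escapes embedded newlines, so the trailing "\n" is unambiguous.
    sock.sendall(json.dumps(message).encode("utf-8") + b"\n")


def recv_message(sock: socket.socket, timeout_s: float = 5.0) -> dict:
    """Read until a newline arrives, then decode one JSON message.

    Assumes the peer sends exactly one message per request, so the buffer
    never holds more than one line.
    """
    sock.settimeout(timeout_s)
    buffer = bytearray()
    while not buffer.endswith(b"\n"):
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("peer closed the socket mid-message")
        buffer.extend(chunk)
    return json.loads(buffer.decode("utf-8"))


def build_analyze_request(image_path: str, prompt: str) -> dict:
    """Build the analyze request in the shape shown above."""
    return {"type": "analyze", "image_path": image_path, "prompt": prompt}
```

A real client would create the socket with `socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)` and connect to the configured path; `recv_message` raises a timeout exception from the socket layer if no full line arrives within `timeout_s`, which the client could translate into VLMTimeoutError.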
5. Implementation Details
Lifecycle:
- L1 sweep: VLM unloaded (GPU memory freed for YOLOE)
- L2 investigation: VLM loaded on demand when Tier 2 result is ambiguous
- Load time: ~5-10s (model loading + warmup)
- ScanController decides when to load/unload
Prompt template (generic visual descriptors, not military jargon):
Analyze this aerial image crop. Describe what you see at the center of the image.
Is there a structure, entrance, or covered area? Is there evidence of recent
human activity (disturbed ground, fresh tracks, organized materials)?
Answer briefly: what is the most likely explanation for the dark/dense area?
Key Dependencies:
| Library | Version | Purpose |
|---|---|---|
| socket (stdlib) | — | Unix domain socket client |
| json (stdlib) | — | IPC message serialization |
| OpenCV | 4.x | Save ROI crop as temporary JPEG for IPC |
Error Handling Strategy:
- Connection refused → VLM container not running → is_available()=false
- Timeout (>5s) → VLMTimeoutError → ScanController skips Tier 3
- 3 consecutive errors → ScanController sets vlm_available=false
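The three-strikes rule can be sketched as a small counter on the ScanController side. The class name `VLMAvailabilityTracker` is hypothetical, and two behaviours here are assumptions the spec does not state: a successful call resets the counter, and nothing in this sketch re-enables the VLM once it is marked unavailable.

```python
class VLMAvailabilityTracker:
    """Tracks consecutive VLM errors and flips vlm_available after a limit."""

    def __init__(self, max_consecutive_errors: int = 3) -> None:
        self.max_consecutive_errors = max_consecutive_errors
        self.consecutive_errors = 0
        self.vlm_available = True

    def record_success(self) -> None:
        # Assumption: any successful analysis breaks the error streak.
        self.consecutive_errors = 0

    def record_error(self) -> None:
        self.consecutive_errors += 1
        if self.consecutive_errors >= self.max_consecutive_errors:
            self.vlm_available = False
```

ScanController would call `record_error()` on VLMTimeoutError or VLMError, `record_success()` after a valid VLMResponse, and skip Tier 3 whenever `vlm_available` is false.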
7. Caveats & Edge Cases
Known limitations:
- NanoLLM model selection limited: VILA, LLaVA, Obsidian only
- Model load time (~5-10s) delays first L2 VLM analysis
- ROI crop saved to /tmp as JPEG for IPC (disk I/O, ~1ms)
Potential race conditions:
- ScanController requests unload while analyze() is in progress → client must wait for response before unloading
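One way to make unload wait for an in-flight analysis is to guard both calls with the same lock. The sketch below is illustrative only: `GuardedClient` is a hypothetical stand-in, the sleep replaces the blocking socket round trip, and the `events` list exists only to show ordering.

```python
import threading
import time


class GuardedClient:
    """Serializes analyze() and unload_model() with a single lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.analyze_started = threading.Event()
        self.events = []

    def analyze(self, duration_s: float) -> None:
        with self._lock:
            self.events.append("analyze:start")
            self.analyze_started.set()
            time.sleep(duration_s)  # stand-in for the blocking round trip
            self.events.append("analyze:end")

    def unload_model(self) -> None:
        # Blocks until any in-flight analyze() has released the lock.
        with self._lock:
            self.events.append("unload")
```

Because analyze() already blocks for up to the timeout, an unload request issued during analysis is delayed by at most that long.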
8. Dependency Graph
- Must be implemented after: Config helper, Types helper
- Can be implemented in parallel with: Tier1Detector, Tier2SpatialAnalyzer, GimbalDriver, OutputManager
- Blocks: ScanController (needs VLMClient for L2 Tier 3 analysis)
9. Logging Strategy
| Log Level | When | Example |
|---|---|---|
| ERROR | Connection refused, model load failed | VLM connection refused at /tmp/vlm.sock |
| WARN | Timeout, high latency | VLM analyze timeout after 5000ms |
| INFO | Model loaded/unloaded, analysis result | VLM loaded VILA1.5-3B (2800MB GPU). Analysis: "branch-covered structure" |