detections-semantic/_docs/02_plans/components/04_vlm_client/description.md
Oleksandr Bezdieniezhnykh 8e2ecf50fd Initial commit
Made-with: Cursor
2026-03-26 00:20:30 +02:00

VLMClient

1. High-Level Overview

Purpose: IPC client that communicates with the NanoLLM Docker container over a Unix domain socket. It sends the path of an ROI image crop plus a text prompt, receives the analysis text, and manages the VLM lifecycle (load/unload to free GPU memory).

Architectural Pattern: Client adapter with lifecycle management.

Upstream dependencies: Config helper (socket path, model name, timeout), Types helper

Downstream consumers: ScanController

2. Internal Interfaces

Interface: VLMClient

| Method | Input | Output | Async | Error Types |
|---|---|---|---|---|
| connect() | | bool | No | ConnectionError |
| disconnect() | | | No | |
| is_available() | | bool | No | |
| analyze(image, prompt) | numpy (H,W,3), str | VLMResponse | No (blocks up to 5s) | VLMTimeoutError, VLMError |
| load_model() | | | No | ModelLoadError |
| unload_model() | | | No | |

VLMResponse:

text: str — VLM analysis text
confidence: float (0-1) — extracted from response or heuristic
latency_ms: float — round-trip time
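The response type and the error types named in the interface table could be sketched as follows. This is a minimal sketch, not the project's actual Types helper: the exception hierarchy (everything deriving from `VLMError`), the frozen dataclass, and the range check on `confidence` are assumptions made for illustration.

```python
from dataclasses import dataclass


class VLMError(Exception):
    """Base class for VLM client failures (hierarchy is an assumption)."""


class VLMTimeoutError(VLMError):
    """Raised when analyze() exceeds its time budget."""


class ModelLoadError(VLMError):
    """Raised when the VLM container fails to load a model."""


@dataclass(frozen=True)
class VLMResponse:
    text: str          # VLM analysis text
    confidence: float  # 0-1, extracted from the response or a heuristic
    latency_ms: float  # round-trip time in milliseconds

    def __post_init__(self) -> None:
        # Assumed validation: reject values outside the documented 0-1 range.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {self.confidence}")
```

Deriving the specific errors from one base class lets ScanController catch `VLMError` once when it only needs to know that Tier 3 failed.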

IPC Protocol (Unix domain socket, JSON messages):

// Request
{"type": "analyze", "image_path": "/tmp/roi_1234.jpg", "prompt": "..."}

// Response
{"type": "result", "text": "...", "tokens": 42, "latency_ms": 2100}

// Load/unload requests
{"type": "load_model", "model": "VILA1.5-3B"}
{"type": "unload_model"}

// Status (reported by the container)
{"type": "status", "loaded": true, "model": "VILA1.5-3B", "gpu_mb": 2800}

5. Implementation Details

Lifecycle:

  • L1 sweep: VLM unloaded (GPU memory freed for YOLOE)
  • L2 investigation: VLM loaded on demand when Tier 2 result is ambiguous
  • Load time: ~5-10s (model loading + warmup)
  • ScanController decides when to load/unload
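The lifecycle rules above could be expressed as a small decision function on the ScanController side. This is an illustrative sketch only: `ScanLevel`, `next_lifecycle_message`, and its parameters are hypothetical names, and keeping the model loaded for the rest of an L2 investigation is an assumption chosen to avoid paying the load time repeatedly.

```python
from enum import Enum
from typing import Optional


class ScanLevel(Enum):
    L1_SWEEP = 1
    L2_INVESTIGATION = 2


def next_lifecycle_message(level: ScanLevel,
                           tier2_ambiguous: bool,
                           loaded: bool,
                           model: str = "VILA1.5-3B") -> Optional[dict]:
    """Return the IPC message to send, or None when no change is needed."""
    if level is ScanLevel.L1_SWEEP:
        # L1 sweep: free GPU memory for YOLOE.
        return {"type": "unload_model"} if loaded else None
    if tier2_ambiguous and not loaded:
        # L2 investigation: load on demand for an ambiguous Tier 2 result.
        return {"type": "load_model", "model": model}
    # Assumption: keep the current state for the rest of the L2 investigation.
    return None
```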

Prompt template (generic visual descriptors, not military jargon):

Analyze this aerial image crop. Describe what you see at the center of the image.
Is there a structure, entrance, or covered area? Is there evidence of recent
human activity (disturbed ground, fresh tracks, organized materials)?
Answer briefly: what is the most likely explanation for the dark/dense area?

Key Dependencies:

| Library | Version | Purpose |
|---|---|---|
| socket | (stdlib) | Unix domain socket client |
| json | (stdlib) | IPC message serialization |
| OpenCV | 4.x | Save ROI crop as temporary JPEG for IPC |

Error Handling Strategy:

  • Connection refused → VLM container not running → is_available()=false
  • Timeout (>5s) → VLMTimeoutError → ScanController skips Tier 3
  • 3 consecutive errors → ScanController sets vlm_available=false
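The "3 consecutive errors" rule can be sketched as a small counter. The class name `VLMAvailabilityTracker` and its methods are hypothetical. The document does not say how availability is restored, so this sketch leaves `vlm_available` false once it trips.

```python
class VLMAvailabilityTracker:
    """Counts consecutive VLM errors and flags the VLM as unavailable."""

    def __init__(self, max_consecutive_errors: int = 3) -> None:
        self.max_consecutive_errors = max_consecutive_errors
        self.consecutive_errors = 0
        self.vlm_available = True

    def record_success(self) -> None:
        # Any successful analysis breaks the error streak.
        self.consecutive_errors = 0

    def record_error(self) -> None:
        self.consecutive_errors += 1
        if self.consecutive_errors >= self.max_consecutive_errors:
            self.vlm_available = False
```

ScanController would call `record_error()` when it catches a timeout or other VLM error and `record_success()` after each completed analysis, so isolated timeouts do not disable Tier 3.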

7. Caveats & Edge Cases

Known limitations:

  • NanoLLM model selection limited: VILA, LLaVA, Obsidian only
  • Model load time (~5-10s) delays first L2 VLM analysis
  • ROI crop saved to /tmp as JPEG for IPC (disk I/O, ~1ms)

Potential race conditions:

  • ScanController requests unload while analyze() is in progress → client must wait for response before unloading
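One way to make unload wait for an in-flight analysis is to serialize both operations on a single lock. This is a sketch under assumptions, not the project's implementation: `GuardedVLM` is a hypothetical class, the `time.sleep` call stands in for the blocking IPC round trip, and the `events` list exists only to make the ordering observable.

```python
import threading
import time


class GuardedVLM:
    """Serializes analyze() and unload_model() on one lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.analyzing = threading.Event()  # set once analyze() holds the lock
        self.events = []

    def analyze(self, duration_s: float = 0.05) -> None:
        with self._lock:
            self.analyzing.set()
            self.events.append("analyze:start")
            time.sleep(duration_s)  # stand-in for the blocking IPC round trip
            self.events.append("analyze:end")

    def unload_model(self) -> None:
        with self._lock:  # blocks while analyze() is in progress
            self.events.append("unload")


def demo_unload_waits() -> list:
    vlm = GuardedVLM()
    worker = threading.Thread(target=vlm.analyze)
    worker.start()
    vlm.analyzing.wait()  # make sure analyze() owns the lock first
    vlm.unload_model()    # returns only after analyze() has finished
    worker.join()
    return vlm.events
```

Because analyze() blocks for at most the configured timeout, an unload request waits at most that long under this scheme.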

8. Dependency Graph

Must be implemented after: Config helper, Types helper

Can be implemented in parallel with: Tier1Detector, Tier2SpatialAnalyzer, GimbalDriver, OutputManager

Blocks: ScanController (needs VLMClient for L2 Tier 3 analysis)

9. Logging Strategy

| Log Level | When | Example |
|---|---|---|
| ERROR | Connection refused, model load failed | VLM connection refused at /tmp/vlm.sock |
| WARN | Timeout, high latency | VLM analyze timeout after 5000ms |
| INFO | Model loaded/unloaded, analysis result | VLM loaded VILA1.5-3B (2800MB GPU). Analysis: "branch-covered structure" |
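The log lines in the table could be emitted with the standard `logging` module, roughly as follows. The logger name `vlm_client` and the helper function names are hypothetical. Note that Python's `logging` names the WARN level `WARNING`.

```python
import logging

logger = logging.getLogger("vlm_client")  # logger name is an assumption


def log_connection_refused(socket_path: str) -> None:
    logger.error("VLM connection refused at %s", socket_path)


def log_timeout(timeout_ms: int) -> None:
    logger.warning("VLM analyze timeout after %dms", timeout_ms)


def log_model_loaded(model: str, gpu_mb: int) -> None:
    logger.info("VLM loaded %s (%dMB GPU)", model, gpu_mb)


def log_analysis(text: str) -> None:
    logger.info('Analysis: "%s"', text)
```

Passing values as arguments instead of pre-formatting the string lets `logging` skip the formatting work when a level is disabled.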