mirror of https://github.com/azaion/detections-semantic.git synced 2026-04-23 01:06:37 +00:00

Oleksandr Bezdieniezhnykh 8e2ecf50fd Initial commit

Made-with: Cursor

2026-03-26 00:20:30 +02:00

3.6 KiB

Raw Blame History

VLMClient

1. High-Level Overview

Purpose: IPC client that communicates with the NanoLLM Docker container over a Unix domain socket. It sends an ROI image and a text prompt, and receives analysis text. It also manages the VLM lifecycle (load/unload to free GPU memory).

Architectural Pattern: Client adapter with lifecycle management.

Upstream dependencies: Config helper (socket path, model name, timeout), Types helper

Downstream consumers: ScanController

2. Internal Interfaces

Interface: VLMClient

Method	Input	Output	Async	Error Types
`connect()`	—	bool	No	ConnectionError
`disconnect()`	—	—	No	—
`is_available()`	—	bool	No	—
`analyze(image, prompt)`	numpy.ndarray (H, W, 3), str	VLMResponse	No (blocks up to 5 s)	VLMTimeoutError, VLMError
`load_model()`	—	—	No	ModelLoadError
`unload_model()`	—	—	No	—

VLMResponse:

text: str — VLM analysis text
confidence: float (0-1) — extracted from response or heuristic
latency_ms: float — round-trip time
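
A minimal sketch of how the VLMResponse fields listed above could be expressed in Python. The use of a frozen dataclass and the range check on `confidence` are assumptions of this sketch, not something the source specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VLMResponse:
    """Result of one analyze() call; field names follow the list above."""

    text: str          # VLM analysis text
    confidence: float  # 0-1, extracted from the response or a heuristic
    latency_ms: float  # round-trip time in milliseconds

    def __post_init__(self) -> None:
        # ASSUMPTION: rejecting out-of-range values is a choice of this sketch.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {self.confidence}")
```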

IPC Protocol (Unix domain socket, JSON messages):

// Request
{"type": "analyze", "image_path": "/tmp/roi_1234.jpg", "prompt": "..."}

// Response
{"type": "result", "text": "...", "tokens": 42, "latency_ms": 2100}

// Model load/unload and status messages
{"type": "load_model", "model": "VILA1.5-3B"}
{"type": "unload_model"}
{"type": "status", "loaded": true, "model": "VILA1.5-3B", "gpu_mb": 2800}
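
The message exchange above could be sketched with the standard `socket` and `json` modules as follows. The source does not say how messages are framed on the wire, so this sketch assumes one JSON object per line (newline-delimited) and one response per request; `send_message`, `recv_message`, and `request` are hypothetical helper names.

```python
import json
import socket


def send_message(sock: socket.socket, message: dict) -> None:
    """Serialize one message as a single JSON line and send it."""
    # ASSUMPTION: newline-delimited JSON; the real framing is not documented.
    sock.sendall(json.dumps(message).encode("utf-8") + b"\n")


def recv_message(sock: socket.socket) -> dict:
    """Read until a trailing newline and decode exactly one JSON message."""
    buffer = bytearray()
    while not buffer.endswith(b"\n"):
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("socket closed before a full message arrived")
        buffer.extend(chunk)
    # Assumes a single outstanding response; pipelined messages would need
    # a proper line splitter instead.
    return json.loads(buffer.decode("utf-8"))


def request(socket_path: str, message: dict, timeout_s: float = 5.0) -> dict:
    """Open a Unix domain socket, send one request, return one response."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout_s)  # socket.timeout is raised on expiry
        sock.connect(socket_path)
        send_message(sock, message)
        return recv_message(sock)
```

For example, `request("/tmp/vlm.sock", {"type": "unload_model"})` would send the unload message and wait up to 5 s for a reply, assuming the container answers every request.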

5. Implementation Details

Lifecycle:

L1 sweep: VLM unloaded (GPU memory freed for YOLOE)
L2 investigation: VLM loaded on demand when Tier 2 result is ambiguous
Load time: ~5-10s (model loading + warmup)
ScanController decides when to load/unload
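
The lifecycle rules above could be sketched as a small decision helper on the ScanController side. `ScanLevel` and `apply_lifecycle` are hypothetical names, and keeping the model loaded for the rest of an L2 investigation (rather than unloading as soon as a result is unambiguous) is an assumption made here to avoid repeating the ~5-10 s load.

```python
from enum import Enum


class ScanLevel(Enum):
    L1_SWEEP = "L1"
    L2_INVESTIGATION = "L2"


def apply_lifecycle(client, loaded: bool, level: ScanLevel,
                    tier2_ambiguous: bool) -> bool:
    """Call load_model()/unload_model() as needed; return the new loaded state."""
    if level is ScanLevel.L1_SWEEP:
        if loaded:
            client.unload_model()  # free GPU memory for YOLOE
        return False
    if tier2_ambiguous and not loaded:
        client.load_model()        # on-demand load during L2
        return True
    # ASSUMPTION: stay in the current state for the remainder of L2.
    return loaded
```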

Prompt template (generic visual descriptors, not military jargon):

Analyze this aerial image crop. Describe what you see at the center of the image.
Is there a structure, entrance, or covered area? Is there evidence of recent
human activity (disturbed ground, fresh tracks, organized materials)?
Answer briefly: what is the most likely explanation for the dark/dense area?

Key Dependencies:

Library	Version	Purpose
socket (stdlib)	—	Unix domain socket client
json (stdlib)	—	IPC message serialization
OpenCV	4.x	Save ROI crop as temporary JPEG for IPC

Error Handling Strategy:

Connection refused → VLM container not running → is_available() returns false
Timeout (>5s) → VLMTimeoutError → ScanController skips Tier 3
3 consecutive errors → ScanController sets vlm_available=false
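
A minimal sketch of the error-handling rules above, as they might look on the ScanController side. The exception classes are stand-ins for the ones named in the interface table (making `VLMTimeoutError` a subclass of `VLMError` is an assumption), and `VLMHealthTracker` and `run_tier3` are hypothetical names.

```python
class VLMError(Exception):
    """Stand-in for the VLMError named in the interface table."""


class VLMTimeoutError(VLMError):
    """Stand-in; the subclass relationship is an assumption of this sketch."""


class VLMHealthTracker:
    """Counts consecutive failures and clears vlm_available at a threshold."""

    def __init__(self, max_consecutive_errors: int = 3) -> None:
        self.max_consecutive_errors = max_consecutive_errors
        self.consecutive_errors = 0
        self.vlm_available = True

    def record_success(self) -> None:
        self.consecutive_errors = 0  # any success resets the streak

    def record_error(self) -> None:
        self.consecutive_errors += 1
        if self.consecutive_errors >= self.max_consecutive_errors:
            self.vlm_available = False


def run_tier3(client, tracker: VLMHealthTracker, image, prompt):
    """Return the VLM response, or None when Tier 3 is skipped."""
    if not tracker.vlm_available:
        return None
    try:
        response = client.analyze(image, prompt)
    except VLMError:  # includes VLMTimeoutError in this sketch
        tracker.record_error()
        return None
    tracker.record_success()
    return response
```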

7. Caveats & Edge Cases

Known limitations:

NanoLLM model selection limited: VILA, LLaVA, Obsidian only
Model load time (~5-10s) delays first L2 VLM analysis
ROI crop saved to /tmp as JPEG for IPC (disk I/O, ~1ms)

Potential race conditions:

ScanController requests unload while analyze() is in progress → client must wait for response before unloading
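
One way to make an unload wait for an in-flight analyze() is to serialize both calls behind a single lock. This is a sketch of that idea only; `SerializedVLMClient` is a hypothetical wrapper, and the source does not say which synchronization mechanism the client uses.

```python
import threading


class SerializedVLMClient:
    """Wraps a client so unload_model() waits for an in-flight analyze()."""

    def __init__(self, inner) -> None:
        self._inner = inner
        self._lock = threading.Lock()

    def analyze(self, image, prompt):
        # Holding the lock for the whole round trip blocks a concurrent unload.
        with self._lock:
            return self._inner.analyze(image, prompt)

    def unload_model(self) -> None:
        # Blocks here until any running analyze() has received its response.
        with self._lock:
            self._inner.unload_model()
```

A single lock also serializes concurrent analyze() calls, which matches a blocking request/response exchange on one socket.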

8. Dependency Graph

Must be implemented after: Config helper, Types helper

Can be implemented in parallel with: Tier1Detector, Tier2SpatialAnalyzer, GimbalDriver, OutputManager

Blocks: ScanController (needs VLMClient for L2 Tier 3 analysis)

9. Logging Strategy

Log Level	When	Example
ERROR	Connection refused, model load failed	`VLM connection refused at /tmp/vlm.sock`
WARN	Timeout, high latency	`VLM analyze timeout after 5000ms`
INFO	Model loaded/unloaded, analysis result	`VLM loaded VILA1.5-3B (2800MB GPU). Analysis: "branch-covered structure"`
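
The log levels in the table could map onto Python's standard `logging` module roughly as below. The logger name and the helper function names are placeholders; note that Python reports the WARN level as `WARNING`.

```python
import logging

logger = logging.getLogger("vlm_client")  # placeholder logger name


def log_connection_refused(socket_path: str) -> None:
    """ERROR: the VLM container is not accepting connections."""
    logger.error("VLM connection refused at %s", socket_path)


def log_timeout(timeout_ms: int) -> None:
    """WARN: analyze() exceeded its time budget."""
    logger.warning("VLM analyze timeout after %dms", timeout_ms)


def log_model_loaded(model: str, gpu_mb: int) -> None:
    """INFO: model load finished, with reported GPU memory use."""
    logger.info("VLM loaded %s (%dMB GPU)", model, gpu_mb)
```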
