mirror of
https://github.com/azaion/detections-semantic.git
synced 2026-04-22 21:26:38 +00:00
Initial commit
Made-with: Cursor
# VLMClient
## 1. High-Level Overview
**Purpose**: IPC client that communicates with the NanoLLM Docker container via Unix domain socket. Sends ROI image + text prompt, receives analysis text. Manages VLM lifecycle (load/unload to free GPU memory).

**Architectural Pattern**: Client adapter with lifecycle management.

**Upstream dependencies**: Config helper (socket path, model name, timeout), Types helper

**Downstream consumers**: ScanController
## 2. Internal Interfaces
### Interface: VLMClient
| Method | Input | Output | Async | Error Types |
|--------|-------|--------|-------|-------------|
| `connect()` | — | bool | No | ConnectionError |
| `disconnect()` | — | — | No | — |
| `is_available()` | — | bool | No | — |
| `analyze(image, prompt)` | numpy (H,W,3), str | VLMResponse | No (blocks up to 5s) | VLMTimeoutError, VLMError |
| `load_model()` | — | — | No | ModelLoadError |
| `unload_model()` | — | — | No | — |

**VLMResponse**:
```
text: str — VLM analysis text
confidence: float (0-1) — extracted from response or heuristic
latency_ms: float — round-trip time
```
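
The response type and the error types named in the interface table could be declared roughly as follows. This is a minimal sketch: the field names come from the `VLMResponse` listing, while the exception hierarchy (everything deriving from `VLMError`) and the range check on `confidence` are assumptions, not part of the spec.

```python
from dataclasses import dataclass


class VLMError(Exception):
    """Base class for VLM failures (the hierarchy is an assumption)."""


class VLMTimeoutError(VLMError):
    """analyze() received no response within the timeout."""


class ModelLoadError(VLMError):
    """load_model() failed."""


@dataclass(frozen=True)
class VLMResponse:
    text: str          # VLM analysis text
    confidence: float  # 0-1, extracted from response or heuristic
    latency_ms: float  # round-trip time

    def __post_init__(self) -> None:
        # Assumed validation: reject values outside the documented 0-1 range.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence must be in [0, 1], got {self.confidence}")
```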

**IPC Protocol** (Unix domain socket, JSON messages):
```json
// Request
{"type": "analyze", "image_path": "/tmp/roi_1234.jpg", "prompt": "..."}

// Response
{"type": "result", "text": "...", "tokens": 42, "latency_ms": 2100}

// Load/unload
{"type": "load_model", "model": "VILA1.5-3B"}
{"type": "unload_model"}
{"type": "status", "loaded": true, "model": "VILA1.5-3B", "gpu_mb": 2800}
```
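
A request/response exchange over the socket might look like the sketch below. The spec does not say how messages are framed, so this assumes one JSON object per line (newline-terminated); `send_request` and `open_vlm_socket` are hypothetical helper names, and the 5-second default mirrors the `analyze()` timeout in the interface table.

```python
import json
import socket


def send_request(sock: socket.socket, message: dict, timeout_s: float = 5.0) -> dict:
    """Send one JSON message and block until one JSON reply arrives.

    Framing is an assumption: each message is a single line of JSON
    followed by a newline character.
    """
    sock.settimeout(timeout_s)
    sock.sendall(json.dumps(message).encode("utf-8") + b"\n")

    buf = bytearray()
    while not buf.endswith(b"\n"):
        chunk = sock.recv(4096)  # raises a timeout error if the server stalls
        if not chunk:
            raise ConnectionError("VLM server closed the connection")
        buf.extend(chunk)
    return json.loads(buf.decode("utf-8"))


def open_vlm_socket(path: str) -> socket.socket:
    """Connect to the container's Unix domain socket (path comes from config)."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(path)
    return sock
```

A caller would wrap the timeout raised by `recv` into `VLMTimeoutError` before it reaches ScanController.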
## 5. Implementation Details
**Lifecycle**:
- L1 sweep: VLM unloaded (GPU memory freed for YOLOE)
- L2 investigation: VLM loaded on demand when Tier 2 result is ambiguous
- Load time: ~5-10s (model loading + warmup)
- ScanController decides when to load/unload
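
The lifecycle rules above can be sketched as a small helper on the ScanController side. `VlmLifecycle` and its method names are hypothetical; the only thing it relies on from the interface is that the client exposes `load_model()` and `unload_model()`.

```python
class VlmLifecycle:
    """Tracks whether the VLM is loaded and loads/unloads it on level changes."""

    def __init__(self, client) -> None:
        self._client = client  # any object with load_model() / unload_model()
        self.loaded = False

    def enter_l1_sweep(self) -> None:
        # L1 sweep: unload so GPU memory is free for YOLOE.
        if self.loaded:
            self._client.unload_model()
            self.loaded = False

    def on_tier2_result(self, ambiguous: bool) -> bool:
        """Return True when Tier 3 (VLM) analysis should run."""
        # L2 investigation: load on demand, only for ambiguous Tier 2 results.
        if ambiguous and not self.loaded:
            self._client.load_model()  # blocks for roughly 5-10s
            self.loaded = True
        return ambiguous and self.loaded
```

Keeping the model loaded across consecutive ambiguous results avoids paying the load time more than once per L2 investigation.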

**Prompt template** (generic visual descriptors, not military jargon):
```
Analyze this aerial image crop. Describe what you see at the center of the image.
Is there a structure, entrance, or covered area? Is there evidence of recent
human activity (disturbed ground, fresh tracks, organized materials)?
Answer briefly: what is the most likely explanation for the dark/dense area?
```

**Key Dependencies**:

| Library | Version | Purpose |
|---------|---------|---------|
| socket (stdlib) | — | Unix domain socket client |
| json (stdlib) | — | IPC message serialization |
| OpenCV | 4.x | Save ROI crop as temporary JPEG for IPC |

**Error Handling Strategy**:
- Connection refused → VLM container not running → `is_available()` returns false
- Timeout (>5s) → `VLMTimeoutError` → ScanController skips Tier 3
- 3 consecutive errors → ScanController sets `vlm_available=false`
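
The "3 consecutive errors" rule can be tracked with a small counter on the ScanController side. This is a sketch under assumptions: `VlmHealth` is a hypothetical name, a successful call is assumed to reset the counter (which is what "consecutive" implies), and how `vlm_available` is restored later is left open because the spec does not say.

```python
class VlmHealth:
    """Counts consecutive VLM errors and flags the VLM as unavailable."""

    MAX_CONSECUTIVE_ERRORS = 3  # threshold from the error handling strategy

    def __init__(self) -> None:
        self.consecutive_errors = 0
        self.vlm_available = True

    def record_success(self) -> None:
        # Assumption: any success breaks the run of consecutive errors.
        self.consecutive_errors = 0

    def record_error(self) -> None:
        self.consecutive_errors += 1
        if self.consecutive_errors >= self.MAX_CONSECUTIVE_ERRORS:
            self.vlm_available = False
```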
## 7. Caveats & Edge Cases
**Known limitations**:
- NanoLLM model selection limited: VILA, LLaVA, Obsidian only
- Model load time (~5-10s) delays first L2 VLM analysis
- ROI crop saved to /tmp as JPEG for IPC (disk I/O, ~1ms)

**Potential race conditions**:
- ScanController requests unload while analyze() is in progress → client must wait for response before unloading
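
One way to make the unload wait for an in-flight `analyze()` is to serialize both calls behind a single lock. The wrapper below is a sketch of that idea, not the project's implementation; `SerializedVlm` is a hypothetical name and the wrapped client is any object with `analyze()` and `unload_model()`.

```python
import threading


class SerializedVlm:
    """Wraps a VLM client so unload_model() cannot overlap with analyze()."""

    def __init__(self, client) -> None:
        self._client = client
        self._lock = threading.Lock()

    def analyze(self, image, prompt):
        # Hold the lock for the whole request/response round trip.
        with self._lock:
            return self._client.analyze(image, prompt)

    def unload_model(self) -> None:
        # Blocks here until any in-flight analyze() has returned.
        with self._lock:
            self._client.unload_model()
```

Because `analyze()` blocks for up to 5 seconds, an unload request may wait that long in the worst case.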
## 8. Dependency Graph
**Must be implemented after**: Config helper, Types helper

**Can be implemented in parallel with**: Tier1Detector, Tier2SpatialAnalyzer, GimbalDriver, OutputManager

**Blocks**: ScanController (needs VLMClient for L2 Tier 3 analysis)
## 9. Logging Strategy
| Log Level | When | Example |
|-----------|------|---------|
| ERROR | Connection refused, model load failed | `VLM connection refused at /tmp/vlm.sock` |
| WARN | Timeout, high latency | `VLM analyze timeout after 5000ms` |
| INFO | Model loaded/unloaded, analysis result | `VLM loaded VILA1.5-3B (2800MB GPU). Analysis: "branch-covered structure"` |
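
With Python's standard `logging` module, the example messages in the table could be emitted as shown below. The logger name `vlm_client` and the helper function names are assumptions. Python names the WARN level `WARNING`.

```python
import logging

logger = logging.getLogger("vlm_client")  # logger name is an assumption


def log_connection_refused(socket_path: str) -> None:
    logger.error("VLM connection refused at %s", socket_path)


def log_timeout(timeout_ms: int) -> None:
    logger.warning("VLM analyze timeout after %dms", timeout_ms)


def log_model_loaded(model: str, gpu_mb: int) -> None:
    logger.info("VLM loaded %s (%dMB GPU)", model, gpu_mb)
```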