autopilot/_docs/02_document/components/vlm_client/description.md
Oleksandr Bezdieniezhnykh bc40ea7300 [AZ-626] Decompose complete: 47 tasks + docs + module layout
2026-05-19 11:02:01 +03:00


Component — vlm_client (optional)

Layer: Perception (data plane in)

Status: forward-looking design (Rust); optional behind a feature flag and a runtime config flag

1. Purpose

Tier 3 of the perception pipeline. Asks a local NanoLLM/VILA1.5-3B process to confirm a zoom-in endpoint POI using one bounded ROI crop and a short prompt. Returns a structured VlmAssessment. The free-form VLM text is not a downstream API contract — only the validated structured output is.

VLM is optional; the system MUST function correctly when VLM is disabled or absent.

2. Inputs

| Input | Source | Cadence | Notes |
|---|---|---|---|
| Zoom-in ROI crop + prompt | scan_controller | per zoom-in endpoint hold | One bounded crop, short prompt, short answer. |
| vlm_enabled runtime flag | startup config | once at start (re-readable on SIGHUP if implemented) | Gates whether scan_controller calls this component at all. |
| IPC socket path | startup config | once | Unix-domain socket to the NanoLLM process. |

3. Outputs

| Output | Consumer | Shape |
|---|---|---|
| VlmAssessment | scan_controller | { label, confidence, status: ok \| inconclusive \| timeout \| schema_invalid \| ipc_error \| disabled, source_roi_id, latency_ms, model_version } |
| Health metric | health aggregator | enabled, vlm_latency_p50/p99, errors_by_kind, peer_cred_check_pass_rate. |
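
The VlmAssessment shape above could be modelled in Rust roughly as follows. This is a minimal sketch, not the canonical definition (that belongs to data_model.md §VlmAssessment): the field types, the use of `Option` for fields that have no value on a failed call, and the helper `without_output` are all assumptions made for illustration.

```rust
/// Outcome categories, one per status value listed in the Outputs table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VlmStatus {
    Ok,
    Inconclusive,
    Timeout,
    SchemaInvalid,
    IpcError,
    Disabled,
}

impl VlmStatus {
    /// Spelling used in this document for each status.
    pub fn as_str(self) -> &'static str {
        match self {
            VlmStatus::Ok => "ok",
            VlmStatus::Inconclusive => "inconclusive",
            VlmStatus::Timeout => "timeout",
            VlmStatus::SchemaInvalid => "schema_invalid",
            VlmStatus::IpcError => "ipc_error",
            VlmStatus::Disabled => "disabled",
        }
    }
}

/// Structured result returned to scan_controller.
/// Field types are assumptions; only the field names come from the table.
#[derive(Debug, Clone, PartialEq)]
pub struct VlmAssessment {
    pub label: Option<String>,         // assumed absent when no model output exists
    pub confidence: Option<f32>,       // assumed range 0.0..=1.0
    pub status: VlmStatus,
    pub source_roi_id: u64,            // assumed numeric id
    pub latency_ms: u32,
    pub model_version: Option<String>, // assumed unknown until the process reports it
}

impl VlmAssessment {
    /// Hypothetical helper: an assessment for a call that produced no usable
    /// model output (timeout, ipc_error, schema_invalid, disabled).
    pub fn without_output(status: VlmStatus, source_roi_id: u64, latency_ms: u32) -> Self {
        VlmAssessment {
            label: None,
            confidence: None,
            status,
            source_roi_id,
            latency_ms,
            model_version: None,
        }
    }
}
```

Keeping the status as a closed enum means scan_controller can match on it exhaustively, so a newly added status becomes a compile error at every call site rather than a silently ignored string.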

4. Key Responsibilities

  • Validate the ROI payload (size, format) before sending it across the IPC channel.
  • Maintain the Unix-domain-socket connection to the NanoLLM process; perform a peer-credential check on connect (where supported by the platform).
  • Send one bounded ROI + short prompt; await one short response within the 5 s budget.
  • Validate the response against the VlmAssessment schema; on schema-invalid, return status: schema_invalid to scan_controller and surface to health.
  • Return status: disabled when the runtime flag is false; scan_controller treats this identically to "VLM not present" and proceeds with Tier 2 evidence alone.
  • Capture model_version (whatever the NanoLLM process reports for its loaded weights) on every assessment for forensic correlation; log the version on change.
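
The first responsibility, validating the ROI payload before it crosses the IPC channel, might look like the sketch below. The document does not name the accepted image formats or the size limit, so both are assumptions here: the sketch accepts JPEG and PNG (recognised by their standard magic bytes) and takes the limit as a caller-supplied parameter. `validate_roi`, `RoiFormat`, and `RoiError` are hypothetical names.

```rust
/// Image container formats accepted by this sketch (an assumption).
#[derive(Debug, PartialEq, Eq)]
pub enum RoiFormat {
    Jpeg,
    Png,
}

/// Reasons a payload is rejected before anything is sent.
#[derive(Debug, PartialEq, Eq)]
pub enum RoiError {
    Empty,
    TooLarge { len: usize, max: usize },
    UnknownFormat,
}

// Standard file signatures for the two formats.
const JPEG_MAGIC: [u8; 3] = [0xFF, 0xD8, 0xFF];
const PNG_MAGIC: [u8; 8] = [0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A];

/// Checks size first, then format, so an oversized payload is never inspected further.
pub fn validate_roi(bytes: &[u8], max_bytes: usize) -> Result<RoiFormat, RoiError> {
    if bytes.is_empty() {
        return Err(RoiError::Empty);
    }
    if bytes.len() > max_bytes {
        return Err(RoiError::TooLarge { len: bytes.len(), max: max_bytes });
    }
    if bytes.starts_with(&JPEG_MAGIC) {
        Ok(RoiFormat::Jpeg)
    } else if bytes.starts_with(&PNG_MAGIC) {
        Ok(RoiFormat::Png)
    } else {
        Err(RoiError::UnknownFormat)
    }
}
```

Under this sketch, any `Err` would be mapped by the caller to status: schema_invalid and returned synchronously, which matches the "never send" rule for oversized payloads in the failure-mode table.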

5. Internal State

  • IPC socket handle and peer-credential cache.
  • In-flight request map (request id → caller).

State is in-process only.

6. Failure Modes

| Failure | Detection | Behaviour |
|---|---|---|
| VLM process not reachable | connect / send error | Return status: ipc_error; bounded-backoff reconnect; health → yellow then red. |
| Peer-cred check fails | platform API | Hard-fail the connect; do not retry without operator intervention; health → red. |
| Response timeout (>5 s) | wall-clock | Return status: timeout; do not block scan_controller past the budget. |
| Schema-invalid response | response parser | Return status: schema_invalid; log the raw response (size-capped) for offline analysis. |
| ROI payload too large | pre-send size check | Return status: schema_invalid synchronously; never send. |
| Optional component absent at build time | feature flag off at compile | scan_controller depends only on the VlmAssessmentProvider trait; the default impl returns status: disabled. The binary builds and runs identically without vlm_client. |
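
The "bounded-backoff reconnect; health → yellow then red" behaviour for an unreachable VLM process can be sketched as a small policy object. The document fixes neither the delays nor the point at which health turns red, so `base`, `max`, and `red_after` are hypothetical parameters, and exponential doubling is one assumed choice of backoff curve. A failed peer-credential check is deliberately outside this policy, since that case must not retry at all.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Health {
    Green,
    Yellow,
    Red,
}

/// Hypothetical reconnect policy; all three values are assumptions.
pub struct ReconnectPolicy {
    pub base: Duration, // delay before the first reconnect attempt
    pub max: Duration,  // upper bound on any single delay
    pub red_after: u32, // consecutive failures at which health turns red
}

impl ReconnectPolicy {
    /// Delay before the `attempt`-th reconnect (1-based):
    /// base * 2^(attempt - 1), capped at `max`.
    pub fn delay(&self, attempt: u32) -> Duration {
        // Clamp the exponent so the shift cannot overflow.
        let exp = attempt.saturating_sub(1).min(16);
        let factor = 1u32 << exp;
        self.base
            .checked_mul(factor)
            .map_or(self.max, |d| d.min(self.max))
    }

    /// Health reported to the aggregator for a given failure streak.
    pub fn health(&self, consecutive_failures: u32) -> Health {
        match consecutive_failures {
            0 => Health::Green,
            n if n < self.red_after => Health::Yellow,
            _ => Health::Red,
        }
    }
}
```

With `base` = 100 ms and `max` = 5 s, the delays run 100 ms, 200 ms, 400 ms and so on until they hit the 5 s cap, so reconnect pressure on a dead socket stays bounded.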

7. Dependencies

In-process: scan_controller.

External: NanoLLM / VILA1.5-3B local process. IPC over Unix-domain socket. No network egress.
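
A single request/response exchange over the Unix-domain socket, bounded by a wall-clock budget, might be sketched as below using only the standard library. The wire framing is not specified in this document; newline-delimited messages are an assumption, as are the names `send_line`, `read_line_within`, and `Exchange`. The peer-credential check and response-size cap are omitted.

```rust
use std::io::{self, ErrorKind, Read, Write};
use std::os::unix::net::UnixStream;
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq, Eq)]
pub enum Exchange {
    Response(String),
    Timeout,
    IpcError,
}

/// Writes one newline-terminated message (assumed framing).
pub fn send_line(stream: &mut UnixStream, line: &str) -> io::Result<()> {
    stream.write_all(line.as_bytes())?;
    stream.write_all(b"\n")
}

/// Reads until the first newline or until `budget` has elapsed in total.
/// The read timeout is recomputed before every read so that a slowly
/// trickling peer cannot stretch the call past the overall deadline.
pub fn read_line_within(stream: &mut UnixStream, budget: Duration) -> Exchange {
    let deadline = Instant::now() + budget;
    let mut buf: Vec<u8> = Vec::new();
    let mut chunk = [0u8; 1024];
    loop {
        let remaining = deadline.saturating_duration_since(Instant::now());
        if remaining.is_zero() {
            return Exchange::Timeout;
        }
        if stream.set_read_timeout(Some(remaining)).is_err() {
            return Exchange::IpcError;
        }
        match stream.read(&mut chunk) {
            Ok(0) => return Exchange::IpcError, // peer closed the socket
            Ok(n) => {
                buf.extend_from_slice(&chunk[..n]);
                if let Some(end) = buf.iter().position(|&b| b == b'\n') {
                    let text = String::from_utf8_lossy(&buf[..end]).into_owned();
                    return Exchange::Response(text);
                }
            }
            Err(ref e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(ref e) if matches!(e.kind(), ErrorKind::WouldBlock | ErrorKind::TimedOut) => {
                return Exchange::Timeout;
            }
            Err(_) => return Exchange::IpcError,
        }
    }
}
```

In this sketch `Exchange::Timeout` and `Exchange::IpcError` correspond to status: timeout and status: ipc_error, while a `Response` still has to pass schema validation before it becomes status: ok.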

8. Non-Functional Targets

| Concern | Target |
|---|---|
| Per-ROI latency | ≤5 s p99 |
| Memory budget | within the 6 GB shared budget after Tier 1 + Tier 2 |
| Cloud egress | none (hard rule) |
| Failure mode | fail-closed: never surface a POI with VLM evidence on a degraded VLM call |
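
One way to read the fail-closed target is that VLM output counts as evidence only when the call completed with status ok, and is discarded in every other case. The sketch below encodes that reading. Treating inconclusive as "no evidence" is an assumption, and `admissible_evidence` is a hypothetical helper, not an API defined by this document.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VlmStatus {
    Ok,
    Inconclusive,
    Timeout,
    SchemaInvalid,
    IpcError,
    Disabled,
}

/// Trimmed-down assessment carrying only the fields this sketch needs.
#[derive(Debug, Clone, PartialEq)]
pub struct VlmAssessment {
    pub label: String,
    pub confidence: f32,
    pub status: VlmStatus,
}

/// Returns the VLM label and confidence only for a clean call.
/// Every degraded or absent outcome yields `None`, even if stale
/// label/confidence values happen to be present in the struct.
pub fn admissible_evidence(a: &VlmAssessment) -> Option<(&str, f32)> {
    match a.status {
        VlmStatus::Ok => Some((a.label.as_str(), a.confidence)),
        VlmStatus::Inconclusive
        | VlmStatus::Timeout
        | VlmStatus::SchemaInvalid
        | VlmStatus::IpcError
        | VlmStatus::Disabled => None,
    }
}
```

The match lists every non-ok status explicitly rather than using a wildcard, so adding a new status forces a deliberate decision about whether it may carry evidence.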

9. Optionality Model

Two complementary mechanisms; the implementation chooses one or both:

  1. Runtime flag (vlm_enabled) gated by the benchmark-gate result. When false, scan_controller skips VLM confirmation; the zoom-in hold proceeds with Tier 2 evidence alone.
  2. Build-time feature module. vlm_client is a separate Cargo feature; the binary builds, links, and runs identically when the feature is off. scan_controller depends on a VlmAssessmentProvider trait whose default impl returns status: disabled.

Both must yield the same observable behaviour: the system functions correctly with VLM absent, only losing the zoom-in confirmation step.
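
The two mechanisms can share one trait, as in the following sketch. Only the trait name VlmAssessmentProvider and the "default impl returns status: disabled" rule come from this document; the method name `assess`, the `RoiRequest` type, the `NoVlm` and `RuntimeGated` wrappers, and the `AlwaysOk` stub are hypothetical. In a real crate the concrete provider would presumably sit behind `#[cfg(feature = "...")]`; that attribute is left out so the sketch compiles on its own.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VlmStatus {
    Ok,
    Disabled, // remaining statuses omitted for brevity
}

/// Hypothetical request type: one bounded crop plus a short prompt.
pub struct RoiRequest<'a> {
    pub roi_id: u64,
    pub crop: &'a [u8],
    pub prompt: &'a str,
}

#[derive(Debug, PartialEq)]
pub struct VlmAssessment {
    pub status: VlmStatus,
    pub source_roi_id: u64,
}

pub trait VlmAssessmentProvider {
    /// Default behaviour: act as if no VLM is present.
    fn assess(&mut self, req: &RoiRequest<'_>) -> VlmAssessment {
        VlmAssessment { status: VlmStatus::Disabled, source_roi_id: req.roi_id }
    }
}

/// Build-time absence: inherits the default impl unchanged.
pub struct NoVlm;
impl VlmAssessmentProvider for NoVlm {}

/// Runtime flag: forwards to the inner provider only when enabled.
pub struct RuntimeGated<P> {
    pub enabled: bool,
    pub inner: P,
}

impl<P: VlmAssessmentProvider> VlmAssessmentProvider for RuntimeGated<P> {
    fn assess(&mut self, req: &RoiRequest<'_>) -> VlmAssessment {
        if self.enabled {
            self.inner.assess(req)
        } else {
            VlmAssessment { status: VlmStatus::Disabled, source_roi_id: req.roi_id }
        }
    }
}

/// Stand-in for the real IPC-backed provider, for illustration only.
pub struct AlwaysOk;
impl VlmAssessmentProvider for AlwaysOk {
    fn assess(&mut self, req: &RoiRequest<'_>) -> VlmAssessment {
        VlmAssessment { status: VlmStatus::Ok, source_roi_id: req.roi_id }
    }
}
```

Because both the compiled-out path and the runtime-disabled path return the same `Disabled` assessment, scan_controller needs a single code path for "VLM absent", which is what the same-observable-behaviour requirement asks for.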

10. References

  • architecture.md §5 Architectural Principles (no cloud egress, fail-closed), §7.6 Local VLM confirmation.
  • system-flows.md §F3 VLM confirmation (with explicit fail-closed and disabled branches).
  • data_model.md §VlmAssessment.