Component — vlm_client (optional)
Layer: Perception (data plane in)

Status: forward-looking design (Rust); optional behind a feature flag and a runtime config flag
1. Purpose
Tier 3 of the perception pipeline. Asks a local NanoLLM/VILA1.5-3B process to confirm a zoom-in endpoint POI using one bounded ROI crop and a short prompt. Returns a structured VlmAssessment. The free-form VLM text is not a downstream API contract — only the validated structured output is.
VLM is optional; the system MUST function correctly when VLM is disabled or absent.
2. Inputs
| Input | Source | Cadence | Notes |
|---|---|---|---|
| Zoom-in ROI crop + prompt | `scan_controller` | per zoom-in endpoint hold | One bounded crop, short prompt, short answer. |
| `vlm_enabled` runtime flag | startup config | once at start (re-readable on SIGHUP if implemented) | Gates whether `scan_controller` calls this component at all. |
| IPC socket path | startup config | once | Unix-domain socket to the NanoLLM process. |
3. Outputs
| Output | Consumer | Shape |
|---|---|---|
| `VlmAssessment` | `scan_controller` | `{ label, confidence, status: ok \| inconclusive \| timeout \| schema_invalid \| ipc_error \| disabled, source_roi_id, latency_ms, model_version }` |
| Health metric | health aggregator | enabled, vlm_latency_p50/p99, errors_by_kind, peer_cred_check_pass_rate. |
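
The `VlmAssessment` shape above could be modelled in Rust roughly as follows. This is a minimal sketch, not the canonical definition: the field types (`Option<String>`, `f32`, `u64`, `u32`) and the helper `without_evidence` are assumptions made for illustration, and `data_model.md` remains the authority on the real schema.

```rust
/// Outcome of one VLM confirmation call, mirroring the `status` values
/// listed in the Outputs table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VlmStatus {
    Ok,
    Inconclusive,
    Timeout,
    SchemaInvalid,
    IpcError,
    Disabled,
}

/// Structured result handed to `scan_controller`.
/// Field types are assumptions; the source only names the fields.
#[derive(Debug, Clone, PartialEq)]
pub struct VlmAssessment {
    pub label: Option<String>,
    pub confidence: Option<f32>,
    pub status: VlmStatus,
    pub source_roi_id: u64,
    pub latency_ms: u32,
    pub model_version: Option<String>,
}

impl VlmAssessment {
    /// Hypothetical helper: build an assessment that carries no label,
    /// confidence, or model version, for any status other than `Ok`.
    pub fn without_evidence(status: VlmStatus, source_roi_id: u64, latency_ms: u32) -> Self {
        VlmAssessment {
            label: None,
            confidence: None,
            status,
            source_roi_id,
            latency_ms,
            model_version: None,
        }
    }
}
```

Keeping `label` and `confidence` optional lets every non-`ok` status be represented without inventing placeholder values.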
4. Key Responsibilities
- Validate the ROI payload (size, format) before sending it across the IPC channel.
- Maintain the Unix-domain-socket connection to the NanoLLM process; perform a peer-credential check on connect (where supported by the platform).
- Send one bounded ROI + short prompt; await one short response within the 5 s budget.
- Validate the response against the `VlmAssessment` schema; on schema-invalid, return `status: schema_invalid` to `scan_controller` and surface to health.
- Return `status: disabled` when the runtime flag is `false`; `scan_controller` treats this identically to "VLM not present" and proceeds with Tier 2 evidence alone.
- Capture `model_version` (whatever the NanoLLM process reports for its loaded weights) on every assessment for forensic correlation; log the version on change.
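
The pre-send ROI check (size, format) from the first responsibility could look like the sketch below. The source does not state the size cap or the accepted image formats, so `MAX_ROI_BYTES`, the JPEG/PNG magic-byte test, and the `RoiRejection` type are all hypothetical choices made only to give the check a concrete form.

```rust
/// Why a ROI payload was refused before it reached the IPC channel.
/// Hypothetical type, not named in the source.
#[derive(Debug, PartialEq, Eq)]
pub enum RoiRejection {
    Empty,
    TooLarge { len: usize, max: usize },
    UnsupportedFormat,
}

/// Hypothetical cap; the real limit is not given in this document.
pub const MAX_ROI_BYTES: usize = 1024 * 1024;

/// Validate size and format of an encoded ROI crop before sending it.
/// Assumes the crop is JPEG or PNG, recognised by its leading magic bytes.
pub fn validate_roi(payload: &[u8], max_bytes: usize) -> Result<(), RoiRejection> {
    const JPEG_MAGIC: &[u8] = &[0xFF, 0xD8, 0xFF];
    const PNG_MAGIC: &[u8] = &[0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A];

    if payload.is_empty() {
        return Err(RoiRejection::Empty);
    }
    if payload.len() > max_bytes {
        return Err(RoiRejection::TooLarge { len: payload.len(), max: max_bytes });
    }
    if payload.starts_with(JPEG_MAGIC) || payload.starts_with(PNG_MAGIC) {
        Ok(())
    } else {
        Err(RoiRejection::UnsupportedFormat)
    }
}
```

A caller would run `validate_roi(&crop, MAX_ROI_BYTES)` first and map any `Err` to `status: schema_invalid` without touching the socket.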
5. Internal State
- IPC socket handle and peer-credential cache.
- In-flight request map (request id → caller).
State is in-process only.
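
One way to hold the in-flight request map is a `HashMap` from request id to a channel sender owned by the waiting caller. This is a sketch under assumptions: the `InFlight` type, the `u64` id, and the use of `std::sync::mpsc` are illustrative and not taken from the source.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Receiver, Sender};

/// In-process map of request id -> caller (hypothetical layout).
pub struct InFlight {
    next_id: u64,
    pending: HashMap<u64, Sender<String>>,
}

impl InFlight {
    pub fn new() -> Self {
        InFlight { next_id: 0, pending: HashMap::new() }
    }

    /// Register a caller. Returns the id to put on the wire and the
    /// receiver on which the caller waits for the raw reply.
    pub fn register(&mut self) -> (u64, Receiver<String>) {
        let id = self.next_id;
        self.next_id = self.next_id.wrapping_add(1);
        let (tx, rx) = mpsc::channel();
        self.pending.insert(id, tx);
        (id, rx)
    }

    /// Route a reply to its caller. Returns false if the id is unknown,
    /// already completed, or the caller has dropped its receiver.
    pub fn complete(&mut self, id: u64, raw: String) -> bool {
        match self.pending.remove(&id) {
            Some(tx) => tx.send(raw).is_ok(),
            None => false,
        }
    }

    /// Forget a request whose caller gave up, for example after a timeout.
    pub fn cancel(&mut self, id: u64) -> bool {
        self.pending.remove(&id).is_some()
    }

    pub fn len(&self) -> usize {
        self.pending.len()
    }
}
```

Removing the entry on both `complete` and `cancel` keeps the map bounded by the number of calls that are actually outstanding.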
6. Failure Modes
| Failure | Detection | Behaviour |
|---|---|---|
| VLM process not reachable | connect / send error | Return status: ipc_error; bounded-backoff reconnect; health → yellow then red. |
| Peer-cred check fails | platform API | Hard-fail the connect; do not retry without operator intervention; health → red. |
| Response timeout (>5 s) | wall-clock | Return status: timeout; do not block scan_controller past the budget. |
| Schema-invalid response | response parser | Return status: schema_invalid; log the raw response (size-capped) for offline analysis. |
| ROI payload too large | pre-send size check | Return status: schema_invalid synchronously; never send. |
| Optional component absent at build time | feature flag off at compile | scan_controller depends only on the VlmAssessmentProvider trait; the default impl returns status: disabled. The binary builds and runs identically without vlm_client. |
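
The timeout row can be illustrated with a wall-clock budget enforced on the caller's side. The sketch below runs the request on a worker thread and waits with `recv_timeout`; the `call_with_budget` function and the `CallOutcome` type are hypothetical, and a real client might instead set socket read/write timeouts or use an async runtime.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::{Duration, Instant};

/// Result of one bounded call, before schema validation (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum CallOutcome {
    Reply(String),
    Timeout,
    IpcError,
}

/// Run `request` on a worker thread and wait at most `budget` for its reply.
/// Returns the outcome and the elapsed wall-clock time (for `latency_ms`).
pub fn call_with_budget<F>(request: F, budget: Duration) -> (CallOutcome, Duration)
where
    F: FnOnce() -> Result<String, std::io::Error> + Send + 'static,
{
    let started = Instant::now();
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone if the budget expired; ignore that.
        let _ = tx.send(request());
    });
    let outcome = match rx.recv_timeout(budget) {
        Ok(Ok(raw)) => CallOutcome::Reply(raw),
        Ok(Err(_)) => CallOutcome::IpcError,
        Err(RecvTimeoutError::Timeout) => CallOutcome::Timeout,
        // The worker ended without sending, e.g. it panicked.
        Err(RecvTimeoutError::Disconnected) => CallOutcome::IpcError,
    };
    (outcome, started.elapsed())
}
```

With `budget` set to `Duration::from_secs(5)`, the caller regains control at the budget even if the VLM process stalls. The detached worker keeps running until its own I/O returns, so a production version would also want a timeout on the socket itself.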
7. Dependencies
In-process: scan_controller.
External: NanoLLM / VILA1.5-3B local process. IPC over Unix-domain socket. No network egress.
8. Non-Functional Targets
| Concern | Target |
|---|---|
| Per-ROI latency | ≤5 s p99 |
| Memory budget | within the 6 GB shared budget after Tier 1 + Tier 2 |
| Cloud egress | none (hard rule) |
| Failure mode | fail-closed — never surface a POI with VLM evidence on a degraded VLM call |
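
The fail-closed target can be expressed as a single gate on the consumer side: VLM evidence is attached only when the call is `ok` and its fields pass basic checks. The sketch assumes that a degraded call falls back to Tier 2 evidence in the same way the disabled case does, and that `confidence` lies in `[0, 1]`; both are assumptions, as are the `PoiEvidence` type and the `poi_evidence` function.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VlmStatus {
    Ok,
    Inconclusive,
    Timeout,
    SchemaInvalid,
    IpcError,
    Disabled,
}

/// Evidence attached to a POI (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub enum PoiEvidence {
    Tier2Only,
    Tier2PlusVlm { label: String, confidence: f32 },
}

/// Fail closed: anything other than a fully valid `Ok` result yields
/// Tier 2 evidence only. NaN confidence fails the range check.
pub fn poi_evidence(
    status: VlmStatus,
    label: Option<&str>,
    confidence: Option<f32>,
) -> PoiEvidence {
    match (status, label, confidence) {
        (VlmStatus::Ok, Some(l), Some(c)) if !l.is_empty() && (0.0..=1.0).contains(&c) => {
            PoiEvidence::Tier2PlusVlm { label: l.to_string(), confidence: c }
        }
        _ => PoiEvidence::Tier2Only,
    }
}
```

Because the fallback arm is the catch-all, any status added later is treated as degraded until it is explicitly allowed.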
9. Optionality Model
Two complementary mechanisms; the implementation chooses one or both:
- Runtime flag (`vlm_enabled`) gated by the benchmark-gate result. When `false`, `scan_controller` skips VLM confirmation; the zoom-in hold proceeds with Tier 2 evidence alone.
- Build-time feature module. `vlm_client` is a separate Cargo feature; the binary builds, links, and runs identically when the feature is off. `scan_controller` depends on a `VlmAssessmentProvider` trait whose default impl returns `status: disabled`.
Both must yield the same observable behaviour: the system functions correctly with VLM absent, only losing the zoom-in confirmation step.
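
Both mechanisms can share one trait, as in the sketch below. Only the trait name `VlmAssessmentProvider` and the `disabled` default come from this document; the method signature, the `NoVlm` and `RuntimeGated` types, and the trimmed two-field `VlmAssessment` are assumptions made to keep the example short.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VlmStatus {
    Ok,
    Inconclusive,
    Timeout,
    SchemaInvalid,
    IpcError,
    Disabled,
}

/// Trimmed-down assessment; the full shape has more fields.
#[derive(Debug, Clone, PartialEq)]
pub struct VlmAssessment {
    pub status: VlmStatus,
    pub source_roi_id: u64,
}

impl VlmAssessment {
    pub fn disabled(source_roi_id: u64) -> Self {
        VlmAssessment { status: VlmStatus::Disabled, source_roi_id }
    }
}

/// What `scan_controller` depends on. The default body is the
/// "VLM absent" behaviour, so an empty impl is a valid provider.
pub trait VlmAssessmentProvider {
    fn assess(&self, source_roi_id: u64, _roi: &[u8], _prompt: &str) -> VlmAssessment {
        VlmAssessment::disabled(source_roi_id)
    }
}

/// Provider used when the Cargo feature is off (hypothetical name).
/// In a real crate the feature-on provider would sit behind
/// `#[cfg(feature = "...")]`; the feature name is not assumed here.
pub struct NoVlm;
impl VlmAssessmentProvider for NoVlm {}

/// Runtime gate: wraps any provider and honours `vlm_enabled`.
pub struct RuntimeGated<P> {
    pub vlm_enabled: bool,
    pub inner: P,
}

impl<P: VlmAssessmentProvider> VlmAssessmentProvider for RuntimeGated<P> {
    fn assess(&self, source_roi_id: u64, roi: &[u8], prompt: &str) -> VlmAssessment {
        if self.vlm_enabled {
            self.inner.assess(source_roi_id, roi, prompt)
        } else {
            VlmAssessment::disabled(source_roi_id)
        }
    }
}
```

With this layout, `scan_controller` sees `status: disabled` from the same code path whether the flag is off at runtime or the client was never compiled in, which is the "same observable behaviour" requirement stated above.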
10. References
- `architecture.md` §5 Architectural Principles (no cloud egress, fail-closed), §7.6 Local VLM confirmation.
- `system-flows.md` §F3 VLM confirmation (with explicit fail-closed and disabled branches).
- `data_model.md` §VlmAssessment.