C2 — Visual Place Recognition
1. High-Level Overview
Purpose: given the current NavCameraFrame, retrieve the top-K=10 candidate satellite tiles from the pre-cached corpus by descriptor similarity. C2 owns the retrieval step; C2.5 narrows K=10 → N=3 via inlier-based re-rank.
Architectural Pattern: Strategy — VprStrategy interface; concrete implementations (UltraVPR primary, MegaLoc secondary, MixVPR / SelaVPR / EigenPlaces / NetVLAD / SALAD additional candidates) selected at startup by config (ADR-001); build-time gated per-implementation by BUILD_* flags (ADR-002); composition-root wired (ADR-009).
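The selection mechanism can be illustrated with a minimal sketch, assuming a hypothetical registry keyed by backbone label. The names `REGISTRY`, `register`, `select_strategy`, the `vpr_backbone` config key, and the stub classes are illustrative and not taken from the project. Build-time gating (ADR-002) is modelled here simply as whether an implementation registered itself.

```python
from typing import Callable, Dict, Protocol


class VprStrategy(Protocol):
    """Reduced stand-in for the VprStrategy interface (descriptor_dim only)."""

    def descriptor_dim(self) -> int: ...


# Hypothetical registry: only implementations that were built in appear here.
REGISTRY: Dict[str, Callable[[], VprStrategy]] = {}


def register(label: str) -> Callable[[type], type]:
    """Class decorator that stores a strategy class under a backbone label."""
    def wrap(cls: type) -> type:
        REGISTRY[label] = cls
        return cls
    return wrap


@register("ultravpr")
class UltraVprStub:
    def descriptor_dim(self) -> int:
        return 512  # illustrative value only


@register("megaloc")
class MegaLocStub:
    def descriptor_dim(self) -> int:
        return 1024  # illustrative value only


def select_strategy(config: dict) -> VprStrategy:
    """Pick the strategy named in config; fail fast if it was not built in."""
    label = config["vpr_backbone"]  # hypothetical config key
    try:
        factory = REGISTRY[label]
    except KeyError:
        raise ValueError(
            f"backbone {label!r} not linked in; available: {sorted(REGISTRY)}"
        ) from None
    return factory()


strategy = select_strategy({"vpr_backbone": "ultravpr"})
print(strategy.descriptor_dim())
```

Failing at startup when the configured backbone is absent keeps the error out of the per-frame hot path.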
Upstream dependencies:
- Camera ingest thread → NavCameraFrame (parallel fan-out with C1; same frame, distinct queue depth).
- C7 InferenceRuntime → backbone forward pass (TRT/ONNX/PyTorch per active runtime).
- C6 DescriptorIndex → FAISS HNSW lookup over pre-cached tile descriptors.
- Camera calibration artifact — for backbone input preprocessing (resize/crop/normalise).
Downstream consumers:
- C2.5 ReRanker (consumes VprResult).
2. Internal Interfaces
Interface: VprStrategy
| Method | Input | Output | Async | Error Types |
|---|---|---|---|---|
| embed_query | NavCameraFrame, CameraCalibration | VprQuery | No | VprBackboneError |
| retrieve_topk | VprQuery, k: int | VprResult | No | IndexUnavailableError, VprBackboneError |
| descriptor_dim | () | int | No | None |
Input DTOs:
NavCameraFrame: see C1 spec (same DTO)
VprQuery:
  frame_id: uuid (required)
  embedding: ndarray[D, dtype=float16|float32] (required); D depends on backbone
  produced_at: monotonic_ns
Output DTOs:
VprResult:
  frame_id: uuid
  candidates: list[VprCandidate] (length = k, ranked by descriptor distance ascending)
  retrieved_at: monotonic_ns
  backbone_label: string (for FDR provenance)
VprCandidate:
  tile_id: composite (zoomLevel, lat, lon)
  descriptor_distance: float; backbone-specific metric (cosine for L2-normalised embeddings)
  descriptor_dim: int
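The interface and DTOs above could be expressed in Python roughly as follows. This is a sketch under stated assumptions, not the project's actual definitions: the embedding is typed as a plain float sequence to keep the example standard-library only (the spec calls for an ndarray), `monotonic_ns` values are plain ints, `descriptor_dim` is placed on VprCandidate following the field order in the listing, and the ordering check in `__post_init__` is an illustrative addition.

```python
import time
import uuid
from dataclasses import dataclass
from typing import Any, Protocol, Sequence, Tuple


@dataclass(frozen=True)
class VprQuery:
    frame_id: uuid.UUID
    embedding: Sequence[float]  # stand-in for ndarray[D]; D depends on backbone
    produced_at: int            # e.g. time.monotonic_ns()


@dataclass(frozen=True)
class VprCandidate:
    tile_id: Tuple[int, float, float]  # (zoomLevel, lat, lon)
    descriptor_distance: float
    descriptor_dim: int


@dataclass(frozen=True)
class VprResult:
    frame_id: uuid.UUID
    candidates: Tuple[VprCandidate, ...]  # ranked by distance, ascending
    retrieved_at: int
    backbone_label: str

    def __post_init__(self) -> None:
        # Illustrative invariant: reject results that are not ranked ascending.
        distances = [c.descriptor_distance for c in self.candidates]
        if distances != sorted(distances):
            raise ValueError("candidates must be ranked by ascending distance")


class VprStrategy(Protocol):
    def embed_query(self, frame: Any, calibration: Any) -> VprQuery: ...
    def retrieve_topk(self, query: VprQuery, k: int) -> VprResult: ...
    def descriptor_dim(self) -> int: ...


result = VprResult(
    frame_id=uuid.uuid4(),
    candidates=(
        VprCandidate((18, 48.10, 11.50), 0.12, 512),
        VprCandidate((18, 48.11, 11.50), 0.14, 512),
    ),
    retrieved_at=time.monotonic_ns(),
    backbone_label="ultravpr",
)
```

Frozen dataclasses suit a stateless per-frame component because a DTO cannot be mutated after it crosses a queue boundary.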
3. External API Specification
Not applicable — internal-only component.
4. Data Access Patterns
Queries
| Query | Frequency | Hot Path | Index Needed |
|---|---|---|---|
| FAISS HNSW top-K=10 search | 3 Hz (per nav frame) | Yes | Yes — pre-built HNSW (C6) |
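The production path is a FAISS HNSW lookup through C6. The sketch below deliberately swaps in an exact brute-force scan using only the standard library, purely to illustrate the top-K contract: at most k results, ranked by ascending distance. The function name `topk_exact` and the toy corpus are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

TileId = Tuple[int, float, float]  # (zoomLevel, lat, lon)


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def topk_exact(
    query: Sequence[float],
    corpus: Dict[TileId, Sequence[float]],
    k: int = 10,
) -> List[Tuple[float, TileId]]:
    """Exact O(N) scan; an HNSW index approximates this ranking much faster."""
    scored = ((l2_distance(query, vec), tile) for tile, vec in corpus.items())
    return heapq.nsmallest(k, scored)  # sorted ascending by distance


# Toy corpus of three 2-D descriptors (real descriptors are much larger).
corpus = {
    (18, 48.10, 11.50): [1.0, 0.0],
    (18, 48.11, 11.50): [0.0, 1.0],
    (18, 48.12, 11.50): [0.6, 0.8],
}
hits = topk_exact([0.8, 0.6], corpus, k=2)
for distance, tile in hits:
    print(tile, round(distance, 3))
```

An exact scan like this is also a convenient reference when checking the recall of an approximate index offline.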
Caching Strategy
| Data | Cache Type | TTL | Invalidation |
|---|---|---|---|
| Backbone weights | TRT engine on disk + GPU resident | flight lifetime | Manifest content-hash gate (D-C10-3) at takeoff |
| FAISS HNSW index | mmap (C6 owns the file) | flight lifetime | Same as above |
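A content-hash gate of the kind named in the Invalidation column might look like the sketch below. The details of D-C10-3 are not given in this document, so everything here is an assumption: the manifest is modelled as a mapping of relative path to SHA-256 hex digest, and `sha256_of` and `verify_manifest` are hypothetical names.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large engine or index files are not read at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest: Dict[str, str], root: Path) -> List[str]:
    """Return the relative paths whose on-disk hash differs from the manifest."""
    return [
        rel for rel, expected in manifest.items()
        if sha256_of(root / rel) != expected
    ]
```

An empty return value would let the takeoff gate pass; any listed path means the cached artifact no longer matches what the manifest recorded.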
Storage Estimates
C2 itself stores no persistent data; it consumes C6's descriptor index. Sizing belongs in C6.
Data Management
C2 is read-only against C6 during F3/F4/F6. Pre-flight, F1 triggers C10 (after C11 TileDownloader has populated C6) to call embed_query on every staged tile to populate the descriptor matrix consumed by C6.
5. Implementation Details
Algorithmic Complexity: HNSW search scales approximately as O(log N) in corpus size N for fixed k=10 (an expected-case figure, not a worst-case bound); the backbone forward pass is O(1) per frame (GPU-bound).
State Management: stateless per-frame; the only persistent state is the loaded backbone weights and the FAISS index pointer (held by C6 and passed in via constructor).
Key Dependencies:
| Library | Version | Purpose |
|---|---|---|
| FAISS (Python + C++) | upstream HEAD pinned per Plan-phase | HNSW retrieval; consumed via C6 |
| TensorRT | 10.3 (JetPack 6.2 pin) | Primary inference backend; consumed via C7 |
| ONNX Runtime + TRT EP | matches C7 | Fallback backend |
| PyTorch | matches simple-baseline track | FP16 baseline (NetVLAD / MixVPR mandatory) |
| UltraVPR (research code drop) | upstream HEAD pinned per Plan-phase | Documentary Lead PRIMARY backbone |
| MegaLoc, MixVPR, SelaVPR, EigenPlaces, NetVLAD | upstream HEAD pinned per Plan-phase | Secondary + mandatory simple-baselines |
Error Handling Strategy:
- VprBackboneError: backbone forward pass failed (CUDA OOM, TRT engine deserialize mismatch). C2 emits no VprResult; C5 falls back to VIO-only with provenance label visual_propagated (AC-1.4).
- IndexUnavailableError: FAISS index handle invalid (e.g., post-F8 reboot before warm-up). Same fallback as above; F8 recovery flow re-mmaps the index.
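The fallback behaviour for both error types can be sketched as follows. For illustration this collapses the C2 and C5 roles into one hypothetical function, `run_vpr_step`; in the architecture described here C2 only withholds the VprResult and C5 assigns the provenance label. The `"vpr"` success label is invented for the example.

```python
from typing import Callable, Optional, Tuple


class VprBackboneError(RuntimeError):
    """Backbone forward pass failed (e.g. CUDA OOM, engine deserialize mismatch)."""


class IndexUnavailableError(RuntimeError):
    """FAISS index handle is invalid."""


def run_vpr_step(retrieve: Callable[[], object]) -> Tuple[Optional[object], str]:
    """Return (result, provenance); on either VPR error no result is emitted."""
    try:
        return retrieve(), "vpr"  # hypothetical label for the success path
    except (VprBackboneError, IndexUnavailableError):
        # No VprResult: downstream falls back to VIO-only propagation.
        return None, "visual_propagated"


def failing_retrieve() -> object:
    raise VprBackboneError("CUDA OOM")


print(run_vpr_step(failing_retrieve))
```

Only the two declared error types are caught, so unexpected exceptions still surface instead of being silently mapped to the fallback.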
6. Extensions and Helpers
| Helper | Purpose | Used By |
|---|---|---|
| BackbonePreprocessor | resize / crop / normalise per backbone's input contract | C2 only; keep inside the component, not a shared helper |
| DescriptorNormaliser | L2-normalise descriptors so that ranking by cosine similarity matches ranking by Euclidean distance | C2 (query side), C10 (corpus side at cache artifact build) |
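The reason L2 normalisation aligns the two metrics is the identity ||a - b||² = 2 - 2·cos(a, b) for unit vectors, so sorting by Euclidean distance and sorting by cosine similarity give the same order. A small standard-library check is sketched below; the helper names are hypothetical, and a real DescriptorNormaliser would presumably operate on arrays rather than Python lists.

```python
import math
from typing import List, Sequence


def l2_normalise(v: Sequence[float], eps: float = 1e-12) -> List[float]:
    """Scale v to unit length; eps guards against division by zero."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


a = l2_normalise([3.0, 4.0])  # becomes [0.6, 0.8]
b = l2_normalise([1.0, 0.0])

# For unit vectors: squared Euclidean distance equals 2 - 2 * cosine similarity.
assert math.isclose(squared_l2(a, b), 2.0 - 2.0 * cosine_similarity(a, b))
```

The identity only holds if both sides are normalised, which is why the helper is applied on the query side and on the corpus side.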
7. Caveats & Edge Cases
Known limitations:
- VPR is sensitive to scene change between cache build and flight time — AC-NEW-6 freshness gating is the project-level mitigation, not a C2 concern.
- Backbone choice is constrained by ADR-002: only the linked-in implementations are selectable at runtime.
Potential race conditions:
- Concurrent embed_query calls on a single strategy instance can race on the GPU stream. Bind one strategy instance to one ingest thread; the composition root enforces this.
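The one-instance-per-thread rule is enforced by wiring at the composition root. As a complementary illustration only, a runtime guard could make violations fail loudly; the wrapper below is hypothetical (`ThreadBoundStrategy` is not a project class) and binds the instance to whichever thread calls it first.

```python
import threading


class ThreadBoundStrategy:
    """Wraps a strategy and rejects calls from any thread but the first caller."""

    def __init__(self, inner):
        self._inner = inner
        self._owner = None
        self._lock = threading.Lock()

    def _check_owner(self) -> None:
        me = threading.get_ident()
        with self._lock:
            if self._owner is None:
                self._owner = me  # first caller becomes the owner
            elif self._owner != me:
                raise RuntimeError("strategy instance is bound to another thread")

    def embed_query(self, frame, calibration):
        self._check_owner()
        return self._inner.embed_query(frame, calibration)


class StubStrategy:
    """Placeholder backbone used only for this demonstration."""

    def embed_query(self, frame, calibration):
        return "query"


bound = ThreadBoundStrategy(StubStrategy())
print(bound.embed_query(None, None))  # main thread becomes the owner

errors = []


def worker() -> None:
    try:
        bound.embed_query(None, None)
    except RuntimeError as exc:
        errors.append(exc)


t = threading.Thread(target=worker)
t.start()
t.join()
print(len(errors))  # the second thread was rejected
```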
Performance bottlenecks:
- Backbone forward pass is the dominant cost (~30–80 ms on Jetson per backbone). FAISS HNSW search is sub-millisecond for 100k-tile corpora.
- D-CROSS-LATENCY-1 hybrid does not change C2 behaviour — C2's budget is fixed; the auto-degrade happens at C4.
8. Dependency Graph
Must be implemented after: C6 (descriptor index), C7 (inference runtime), C10 (descriptor population at cache artifact build).
Can be implemented in parallel with: C1, C8 — independent paths.
Blocks: C2.5 (no candidates without VprResult), F3 / F6.
9. Logging Strategy
| Log Level | When | Example |
|---|---|---|
| ERROR | VprBackboneError or IndexUnavailableError | VPR backbone OOM: backbone=ultravpr, frame=12345 |
| WARN | top-1 distance exceeds drift threshold (potential false-positive retrieval) | VPR top-1 distance 0.42 above warn threshold 0.30; backbone=ultravpr |
| INFO | Strategy ready; backbone loaded | VPR ready: backbone=ultravpr, dim=512, corpus_size=87654 |
| DEBUG | Per-frame top-K distances | VPR frame=12345 top10_distances=[0.12, 0.14, ...] |
Log format: structured JSON. Log storage: stdout / journald / FDR via C13 (ERROR + WARN only).
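A structured-JSON logger covering the DEBUG and WARN rows of the table could be sketched as below, using only the standard `logging` and `json` modules. The logger name `c2.vpr`, the `fields` convention passed through `extra`, and the helper names are assumptions; the 0.30 threshold is copied from the WARN example and should be treated as illustrative. Routing ERROR and WARN records to the FDR via C13 is not modelled.

```python
import json
import logging
import sys

WARN_THRESHOLD = 0.30  # from the WARN example row; illustrative only


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {"level": record.levelname, "msg": record.getMessage()}
        payload.update(getattr(record, "fields", {}))  # set via extra={"fields": ...}
        return json.dumps(payload, sort_keys=True)


def make_logger(stream=sys.stdout) -> logging.Logger:
    logger = logging.getLogger("c2.vpr")  # hypothetical logger name
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    return logger


def log_retrieval(logger: logging.Logger, frame: int, backbone: str, distances: list) -> None:
    """Emit per-frame distances at DEBUG and flag a suspicious top-1 at WARNING."""
    logger.debug(
        "VPR top-K distances",
        extra={"fields": {"frame": frame, "distances": distances}},
    )
    if distances and distances[0] > WARN_THRESHOLD:
        # Python names this level WARNING; it corresponds to WARN in the table.
        logger.warning(
            "VPR top-1 distance above warn threshold",
            extra={"fields": {
                "frame": frame,
                "backbone": backbone,
                "top1": distances[0],
                "threshold": WARN_THRESHOLD,
            }},
        )


log_retrieval(make_logger(), 12345, "ultravpr", [0.42, 0.47])
```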