Decompose Step 6 snapshot: 140 task specs + contract docs

Closes out greenfield Step 6 (Decompose) for all 14 components
(C1-C13 + cross-cutting helpers/replay). Covers tasks AZ-266..AZ-446
plus the _dependencies_table.md and component contract documents.

State file updated to greenfield Step 7 (Implement), not_started.

Co-authored-by: Cursor <cursoragent@cursor.com>
Oleksandr Bezdieniezhnykh
2026-05-11 00:39:48 +03:00
parent 8171fcb29e
commit 880eabcb3f
172 changed files with 22897 additions and 35 deletions
@@ -0,0 +1,214 @@
# Contract: `VprStrategy` Protocol + `BackbonePreprocessor` Protocol
**Owner**: c2_vpr (epic AZ-255 / E-C2)
**Producer task**: AZ-336 (`VprStrategy` Protocol + factory + composition)
**Consumer tasks**: AZ-337 (UltraVPR), AZ-338 (NetVLAD baseline), AZ-339 (MegaLoc + MixVPR), AZ-340 (SelaVPR + EigenPlaces + SALAD), AZ-341 (FAISS HNSW retrieve wiring), and downstream c2_5_rerank (AZ-256 / E-C2.5)
**Module-layout home**: `src/gps_denied_onboard/components/c2_vpr/interface.py` (Protocols), `src/gps_denied_onboard/components/c2_vpr/__init__.py` (re-exports), `src/gps_denied_onboard/runtime_root/vpr_factory.py` (factory)
**Status**: draft, awaiting AZ-336 implementation
## Purpose
Defines the public interface for every C2 VPR backbone strategy: `embed_query` produces a `VprQuery` from a `NavCameraFrame`, `retrieve_topk` runs the FAISS HNSW lookup against the C6-owned descriptor index, and `descriptor_dim` advertises the embedding dimensionality so the composition root can pre-validate index/strategy compatibility. Every concrete backbone (UltraVPR, NetVLAD, MegaLoc, MixVPR, SelaVPR, EigenPlaces, SALAD) implements this Protocol; the composition root selects exactly one at startup based on `config.vpr.strategy` and refuses to wire a strategy whose `BUILD_VPR_<variant>` flag is OFF (ADR-002 + ADR-009).
`BackbonePreprocessor` is the C2-internal helper Protocol for resize/crop/normalise per backbone's input contract. It lives next to the strategy (NOT in `helpers/`) because preprocessing parameters are tightly coupled to the backbone weights; sharing across backbones is forbidden — each strategy owns its own concrete preprocessor.
## Public API
### Protocol: `VprStrategy`
```python
from typing import Protocol, runtime_checkable

from gps_denied_onboard._types import NavCameraFrame, CameraCalibration, VprQuery, VprResult


@runtime_checkable
class VprStrategy(Protocol):
    """Single-camera visual place recognition strategy. Stateless per-frame; the only persistent state is the loaded backbone weights and the C6-owned FAISS index handle (passed in via constructor)."""

    def embed_query(
        self,
        frame: NavCameraFrame,
        calibration: CameraCalibration,
    ) -> VprQuery:
        """Run the backbone forward pass on the provided frame and return a `VprQuery` carrying the descriptor embedding.

        Calibration is consumed for input preprocessing (resize / crop / normalise per the backbone's input contract, owned by the strategy's internal `BackbonePreprocessor`).

        Raises:
            VprBackboneError: backbone forward pass failed (CUDA OOM, TRT engine deserialize mismatch, etc.).
        """
        ...

    def retrieve_topk(self, query: VprQuery, k: int) -> VprResult:
        """Run the FAISS HNSW top-K lookup against the corpus descriptor index.

        The strategy holds the FAISS index handle (constructor-injected from C6's `TileStore` Public API). Top-K candidates are returned in ascending `descriptor_distance` order.

        Raises:
            IndexUnavailableError: FAISS index handle invalid (e.g., post-F8 reboot before warm-up, or out-of-band file replacement caught by the underlying mmap defence).
            VprBackboneError: descriptor distance computation failed unexpectedly.
        """
        ...

    def descriptor_dim(self) -> int:
        """Backbone embedding dimensionality (e.g., 512 for UltraVPR, 4096 for NetVLAD-VGG16). Stable for the strategy's lifetime; consumed by the composition root to pre-validate index compatibility (the C6 index file declares its own dim in its sidecar; mismatch → `ConfigurationError` at startup, NOT at first frame)."""
        ...
```
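One consequence of `@runtime_checkable` is worth spelling out: an `isinstance` check against such a Protocol only confirms that the named methods exist, not that their signatures or return types match. The sketch below illustrates this with a stand-in Protocol that mirrors the three method names; `VprStrategyLike`, `StubStrategy`, `Incomplete`, and `WrongSignature` are hypothetical names used only for illustration, not part of the codebase.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class VprStrategyLike(Protocol):
    """Stand-in mirroring the method names of `VprStrategy` (types loosened to Any)."""

    def embed_query(self, frame: Any, calibration: Any) -> Any: ...
    def retrieve_topk(self, query: Any, k: int) -> Any: ...
    def descriptor_dim(self) -> int: ...


class StubStrategy:
    """Hypothetical test double: no inheritance, only matching method names."""

    def embed_query(self, frame, calibration):
        return {"frame_id": frame, "embedding": [1.0, 0.0]}

    def retrieve_topk(self, query, k):
        return []

    def descriptor_dim(self):
        return 2


class Incomplete:
    """Lacks `retrieve_topk` and `descriptor_dim`, so the check fails."""

    def embed_query(self, frame, calibration):
        return None


class WrongSignature:
    """Has all three names but the wrong arity; the check still passes."""

    def embed_query(self): ...
    def retrieve_topk(self): ...
    def descriptor_dim(self): ...


stub_ok = isinstance(StubStrategy(), VprStrategyLike)
incomplete_ok = isinstance(Incomplete(), VprStrategyLike)
wrong_sig_ok = isinstance(WrongSignature(), VprStrategyLike)
```

Because the structural check is this shallow, signature and behaviour conformance still has to come from static type checking and from the protocol conformance tests.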
**Invariants** (every implementation MUST guarantee):
1. **Single-threaded by contract** — each instance is bound to one ingest thread (composition root enforces; concurrent `embed_query` calls on a single instance race the GPU stream).
2. **Stateless per-frame** — no implicit dependency on prior frames; reordering `embed_query` calls (which the live path NEVER does, but tests do) MUST yield identical embeddings.
3. **L2-normalised embeddings** — the `VprQuery.embedding` MUST be L2-normalised (via `helpers.descriptor_normaliser`) so cosine similarity aligns with Euclidean distance for FAISS HNSW lookup. Strategies that produce raw embeddings (e.g., NetVLAD) MUST normalise before returning.
4. **`retrieve_topk` returns exactly `k` candidates, sorted ascending by `descriptor_distance`** — never fewer, never more, never unordered. If the corpus has fewer than `k` tiles, the strategy raises `IndexUnavailableError` (production deployments stage corpora with ≥1000 tiles; `k=10`).
5. **`backbone_label` is non-empty** — every `VprResult` carries the strategy's name (e.g., `"ultra_vpr"`, `"net_vlad"`) for FDR provenance. This MUST match the `BUILD_VPR_<variant>` flag's lowercase form.
6. **`embed_query` and `retrieve_topk` are deterministic** — given the same frame + calibration + corpus, identical embeddings and identical top-K candidates (in identical order). This is required for the C2-IT-02 invariant test and post-flight forensics.
7. **`descriptor_dim()` is stable for the strategy's lifetime** — never changes after construction; the value reflects the loaded weights' output dim, NOT a config knob.
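Invariant 3 rests on a standard identity: for unit vectors, `||a - b||^2 = 2 - 2 * cos(a, b)`, so ranking by Euclidean distance and ranking by cosine similarity give the same order. The following is a minimal, dependency-free sketch of that relationship. `l2_normalise` is a hypothetical stand-in for `helpers.descriptor_normaliser` (whose signature is not shown in this contract), and the vectors are made-up illustration data; production code would presumably operate on NumPy arrays.

```python
import math


def l2_normalise(vec: list[float]) -> list[float]:
    """Scale `vec` to unit L2 norm (hypothetical stand-in helper)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def squared_euclidean(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Made-up raw descriptors, normalised before any comparison.
query = l2_normalise([3.0, 4.0, 0.0])
tiles = {
    "near": l2_normalise([6.0, 8.5, 0.1]),
    "far": l2_normalise([0.0, 1.0, 5.0]),
}

query_norm = math.sqrt(sum(x * x for x in query))

# Gap between the two sides of ||a - b||^2 == 2 - 2 * cos(a, b), per tile.
identity_gaps = {
    name: abs(squared_euclidean(query, t) - (2.0 - 2.0 * cosine_similarity(query, t)))
    for name, t in tiles.items()
}

# Ascending distance and descending cosine similarity give the same ranking.
rank_by_distance = sorted(tiles, key=lambda n: squared_euclidean(query, tiles[n]))
rank_by_cosine = sorted(tiles, key=lambda n: -cosine_similarity(query, tiles[n]))
```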
### DTOs (in `_types/vpr.py`)
```python
from dataclasses import dataclass
from uuid import UUID

import numpy as np


@dataclass(frozen=True, slots=True)
class VprQuery:
    """Backbone embedding for a single nav-camera frame. Produced by `VprStrategy.embed_query`; consumed by `VprStrategy.retrieve_topk` (same instance) or, in the C10 corpus-build path, by `DescriptorIndexBuilder` to populate the corpus descriptor matrix."""

    frame_id: UUID
    embedding: np.ndarray  # shape (D,), dtype float16 or float32; L2-normalised
    produced_at: int  # monotonic_ns


@dataclass(frozen=True, slots=True)
class VprCandidate:
    """One retrieval candidate from the top-K result."""

    tile_id: tuple  # composite (zoomLevel, lat, lon); see C6 TileRecord
    descriptor_distance: float  # backbone-specific metric (cosine for L2-normalised; Euclidean for raw)
    descriptor_dim: int


@dataclass(frozen=True, slots=True)
class VprResult:
    """Top-K candidates from `VprStrategy.retrieve_topk`. Consumed by C2.5 ReRanker."""

    frame_id: UUID
    candidates: list[VprCandidate]  # length == k, sorted ascending by descriptor_distance
    retrieved_at: int  # monotonic_ns
    backbone_label: str  # non-empty; matches BUILD_VPR_<variant> lowercase
```
### Protocol: `BackbonePreprocessor` (C2-internal; lives in `c2_vpr/_preprocessor.py`)
```python
from typing import Protocol, runtime_checkable

import numpy as np

from gps_denied_onboard._types import NavCameraFrame, CameraCalibration


@runtime_checkable
class BackbonePreprocessor(Protocol):
    """Resize / crop / normalise per backbone's input contract. Each `VprStrategy` implementation owns its concrete preprocessor (NOT shared across backbones: preprocessing parameters are tightly coupled to weights)."""

    def preprocess(
        self,
        frame: NavCameraFrame,
        calibration: CameraCalibration,
    ) -> np.ndarray:
        """Return the preprocessed input tensor in the layout the backbone's forward pass expects (e.g., (1, 3, H, W) NCHW float16 for TRT).

        Raises:
            VprPreprocessError: input frame violates the backbone's contract (wrong colour channels, calibration mismatch).
        """
        ...

    def input_shape(self) -> tuple[int, ...]:
        """The (H, W) resize target the backbone expects. Stable for the preprocessor's lifetime; consumed by tests to assert preprocessing fidelity."""
        ...
```
### Error Hierarchy (in `c2_vpr/errors.py`)
```python
class VprError(Exception):
    """Base for all C2 VPR errors. Caught at the runtime root; downstream effect: C5 falls back to VIO-only with provenance `visual_propagated` (AC-1.4)."""


class VprBackboneError(VprError):
    """Backbone forward pass failed (CUDA OOM, TRT engine deserialize mismatch, ONNX runtime IO mismatch). Logged at ERROR; per-occurrence FDR record."""


class VprPreprocessError(VprError):
    """Input frame violates backbone's preprocessing contract (wrong colour channels, calibration mismatch). Logged at ERROR; per-occurrence FDR record."""


class IndexUnavailableError(VprError):
    """FAISS index handle invalid (post-F8 reboot before warm-up; out-of-band file replacement). Logged at ERROR; recovery: F8 reboot path re-mmaps the index. Per C2-ST-01 the strategy MUST raise this rather than return stale candidates."""
```
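The `VprError` docstring says the whole hierarchy is caught at the runtime root, with C5 falling back to VIO-only under provenance `visual_propagated`. A rough sketch of that catch boundary follows. Everything except the error class names and the `visual_propagated` label is an assumption made for illustration: `FrameOutcome`, `run_vpr_step`, and the `"vpr_matched"` label are hypothetical, and the error classes are redeclared locally only to keep the snippet self-contained.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class VprError(Exception):
    """Local mirror of the base class in `c2_vpr/errors.py`."""


class IndexUnavailableError(VprError):
    """Local mirror of the stale-index error."""


@dataclass(frozen=True)
class FrameOutcome:
    """Hypothetical per-frame summary handed to the fusion stage."""

    provenance: str
    error: Optional[str] = None


def run_vpr_step(step: Callable[[], object]) -> FrameOutcome:
    """Run one VPR step; degrade to VIO-only on any `VprError`.

    Only the `VprError` hierarchy is caught. Unrelated exceptions propagate,
    so genuine programming errors are not silently masked as degraded mode.
    """
    try:
        step()
    except VprError as exc:
        return FrameOutcome(provenance="visual_propagated", error=type(exc).__name__)
    return FrameOutcome(provenance="vpr_matched")  # hypothetical label


def stale_index() -> None:
    raise IndexUnavailableError("index handle invalid")


def healthy() -> str:
    return "ok"


def unrelated_bug() -> None:
    raise KeyError("not a VPR error")


degraded = run_vpr_step(stale_index)
nominal = run_vpr_step(healthy)
```

Catching the base class rather than each subclass means a newly added `VprError` subclass inherits the fallback behaviour without touching the runtime root.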
## Composition-Root Factory
```python
# src/gps_denied_onboard/runtime_root/vpr_factory.py
from typing import TYPE_CHECKING

from gps_denied_onboard.config import Config
from gps_denied_onboard.components.c2_vpr import VprStrategy
from gps_denied_onboard.components.c6_tile_cache import TileStore
from gps_denied_onboard.components.c7_inference import InferenceRuntime


def build_vpr_strategy(
    config: Config,
    tile_store: TileStore,
    inference_runtime: InferenceRuntime,
) -> VprStrategy:
    """Composition-root factory. Reads `config.vpr.strategy` and `config.vpr.backbone_weights_path`; lazy-imports the concrete strategy module gated by its CMake `BUILD_VPR_<variant>` flag; refuses to instantiate a strategy whose flag is OFF (raises `ConfigurationError` pointing at the offending strategy name + missing flag).

    Strategy resolution table:

    | config.vpr.strategy | Implementation      | Module                         | Build flag            |
    |---------------------|---------------------|--------------------------------|-----------------------|
    | "ultra_vpr"         | UltraVprStrategy    | components.c2_vpr.ultra_vpr    | BUILD_VPR_ULTRA_VPR   |
    | "net_vlad"          | NetVladStrategy     | components.c2_vpr.net_vlad     | BUILD_VPR_NETVLAD     |
    | "mega_loc"          | MegaLocStrategy     | components.c2_vpr.mega_loc     | BUILD_VPR_MEGALOC     |
    | "mix_vpr"           | MixVprStrategy      | components.c2_vpr.mix_vpr      | BUILD_VPR_MIXVPR      |
    | "sela_vpr"          | SelaVprStrategy     | components.c2_vpr.sela_vpr     | BUILD_VPR_SELAVPR     |
    | "eigen_places"      | EigenPlacesStrategy | components.c2_vpr.eigen_places | BUILD_VPR_EIGENPLACES |
    | "salad"             | SaladStrategy       | components.c2_vpr.salad        | BUILD_VPR_SALAD       |

    Pre-flight validation: after constructing the strategy, the factory queries `strategy.descriptor_dim()` and asserts it matches the C6 corpus index's declared `descriptor_dim` (read from the FAISS index sidecar). Mismatch → `ConfigurationError` at startup, NOT at first frame.

    Returns a fully-constructed strategy ready for `embed_query` / `retrieve_topk` invocation. The caller (runtime root) is responsible for binding the instance to one ingest thread.
    """
    ...
```
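The resolution and pre-flight steps described in the factory docstring could be sketched roughly as below. This is an illustration under stated assumptions, not the AZ-336 implementation: `REGISTRY`, `build_strategy`, `FakeStrategy`, and the `enabled_flags` set are hypothetical (the contract does not say how CMake flags are surfaced to Python), `ConfigurationError` is declared locally, and plain constructors stand in for the lazy module imports. The two registry rows and their dimensions are taken from the resolution table and the `descriptor_dim` docstring examples.

```python
from typing import Callable


class ConfigurationError(Exception):
    """Local stand-in for the project's startup configuration error."""


class FakeStrategy:
    """Hypothetical stub exposing only `descriptor_dim()`."""

    def __init__(self, dim: int) -> None:
        self._dim = dim

    def descriptor_dim(self) -> int:
        return self._dim


# strategy name -> (build flag, constructor)
REGISTRY: dict[str, tuple[str, Callable[[], FakeStrategy]]] = {
    "ultra_vpr": ("BUILD_VPR_ULTRA_VPR", lambda: FakeStrategy(512)),
    "net_vlad": ("BUILD_VPR_NETVLAD", lambda: FakeStrategy(4096)),
}


def build_strategy(name: str, enabled_flags: set[str], index_dim: int) -> FakeStrategy:
    """Resolve, gate on the build flag, construct, then pre-validate the dim."""
    if name not in REGISTRY:
        raise ConfigurationError(f"unknown VPR strategy {name!r}")
    flag, constructor = REGISTRY[name]
    if flag not in enabled_flags:
        # Fail before constructing anything: name the strategy and the flag.
        raise ConfigurationError(f"strategy {name!r} requires {flag}=ON")
    strategy = constructor()
    if strategy.descriptor_dim() != index_dim:
        raise ConfigurationError(
            f"strategy {name!r} emits dim {strategy.descriptor_dim()}, "
            f"but the index sidecar declares {index_dim}"
        )
    return strategy


strategy = build_strategy("ultra_vpr", {"BUILD_VPR_ULTRA_VPR"}, index_dim=512)
```

Checking the flag before construction keeps a disabled backbone's module from ever being loaded, and checking the dimension immediately after construction surfaces a corpus mismatch at startup rather than on the first frame.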
## Versioning
- The `VprStrategy` Protocol's method signatures are part of the cross-component public API. Any change (new method, removed method, parameter rename, return-type change) is a major bump and requires updating every concrete implementation in lockstep.
- DTO field additions are minor, provided each new field on the frozen dataclass is optional and defaults to `None`; field removals are major.
- `BackbonePreprocessor` is C2-internal; backwards-compat is per-strategy, not cross-strategy.
## Test Cases (protocol conformance — runs against every concrete strategy)
| AC Ref | What to Test | Required Outcome |
|--------|-------------|-----------------|
| INV-1 (single-thread) | Concurrent `embed_query` from 2 threads on one instance | Documented as forbidden in test docstring; test asserts composition root rejects multi-thread binding |
| INV-2 (stateless) | `embed_query(frame_A)` then `embed_query(frame_B)` then `embed_query(frame_A)` again | First and third calls return identical embeddings (bit-exact for float32 embeddings; ULP-tolerant for float16) |
| INV-3 (L2-normalised) | `||VprQuery.embedding||_2` after `embed_query` | Equal to 1.0 ± 1e-3 (tolerance for float16) |
| INV-4 (top-K size + order) | `retrieve_topk(query, k=10)` against a 100-tile fixture corpus | `len(candidates) == 10`; distances are non-decreasing (ascending, ties allowed) |
| INV-5 (backbone_label non-empty) | Every `VprResult` from `retrieve_topk` | `backbone_label` is a non-empty string and matches the strategy's `BUILD_VPR_<variant>` lowercase |
| INV-6 (deterministic) | `embed_query(same frame)` × 3 then `retrieve_topk(same query)` × 3 | All three pairs return bit-exact embeddings + identical top-K (tile_ids in same order) |
| INV-7 (descriptor_dim stable) | `descriptor_dim()` × 100 calls | Returns the same value every call |
| AC-2.1b (recall floor) | UltraVPR + NetVLAD on Derkachi normal-segment corpus | UltraVPR recall@10 ≥ 0.95; NetVLAD recall@10 ≥ 0.85 (engine rule check; AZ-338) |
| AC-NEW-7 (poisoned tile) | Top-1 distance to poisoned tile in NFT-SEC-01 corpus | Within AC-NEW-7 relaxed CI |
| C2-ST-01 (stale index) | Out-of-band corpus file replacement | `retrieve_topk` raises `IndexUnavailableError`; no candidates returned |
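Because the conformance suite runs against every concrete strategy, the INV-4 check is naturally written as a backbone-agnostic helper. A minimal sketch follows; `Candidate` is a trimmed stand-in for `VprCandidate`, `topk_violations` is a hypothetical helper name, and the tile IDs and distances are made-up fixture values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Trimmed stand-in for `VprCandidate`."""

    tile_id: tuple
    descriptor_distance: float


def topk_violations(candidates: list[Candidate], k: int) -> list[str]:
    """Return human-readable INV-4 violations; an empty list means conformant."""
    problems: list[str] = []
    if len(candidates) != k:
        problems.append(f"expected {k} candidates, got {len(candidates)}")
    distances = [c.descriptor_distance for c in candidates]
    # Ties are allowed, so only a strictly smaller successor is a violation.
    if any(later < earlier for earlier, later in zip(distances, distances[1:])):
        problems.append("distances are not in non-decreasing order")
    return problems


good = [
    Candidate((18, 50.00, 36.00), 0.10),
    Candidate((18, 50.00, 36.01), 0.10),  # tie with the previous entry
    Candidate((18, 50.01, 36.00), 0.25),
]
bad = [
    Candidate((18, 50.00, 36.00), 0.30),
    Candidate((18, 50.00, 36.01), 0.10),
]

good_report = topk_violations(good, k=3)
bad_report = topk_violations(bad, k=3)
```

Returning a list of violations rather than asserting on the first failure lets a single test run report both the size and the ordering problem for a misbehaving backbone.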
## Open Questions / Risks
- **Risk: backbone weights' descriptor_dim drifts across upstream code drops** (e.g., a new UltraVPR release changes embedding dim from 512 to 768). *Mitigation*: the factory's pre-flight check of `descriptor_dim()` against the C6 sidecar catches this at startup; the operator must rebuild the C6 corpus before the new weights can be used.
- **Risk: SALAD is mentioned in description.md but NOT in the original epic's child issues** — included here for completeness because module-layout.md `BUILD_VPR_<variant>` table lists SALAD. *Decision*: SALAD lives in AZ-340 (with SelaVPR + EigenPlaces). If the team decides SALAD is out of scope this cycle, that task drops one backbone with no other changes needed.