# C2 UltraVPR Primary Backbone **Task**: AZ-337_c2_ultra_vpr **Name**: C2 UltraVPR Primary Backbone (TRT) **Description**: Implement `UltraVprStrategy`, the production-default `VprStrategy` (per ADR-001 default selection). UltraVPR is the Documentary Lead's PRIMARY backbone selected at config time and ON in airborne / research / replay-cli binaries (per ADR-002 build-time exclusion map). Wraps the upstream UltraVPR research code drop, exposes its forward pass via the C7 `InferenceRuntime` (TensorRT 10.3 primary, ONNX-Runtime fallback), and produces L2-normalised float16 embeddings (D=512 typical) for FAISS HNSW retrieval. Includes the concrete `UltraVprBackbonePreprocessor` (resize / centre-crop / mean-std normalise per UltraVPR's input contract). The strategy MUST satisfy the AC-2.1b recall@10 ≥ 0.95 floor on the Derkachi normal segment and the C2-PT-01 latency budget (`embed_query` p95 ≤ 60 ms). **Complexity**: 5 points **Dependencies**: AZ-336_c2_vpr_strategy_protocol, AZ-263_initial_structure, AZ-269_config_loader, AZ-298_c7_tensorrt_runtime, AZ-303_c6_storage_interfaces, AZ-283_descriptor_normaliser, AZ-281_engine_filename_schema, AZ-321_c10_engine_compiler (engine compile path; UltraVPR engine is one of the engines C10 builds), AZ-266_log_module, AZ-272_fdr_record_schema **Component**: c2_vpr (epic AZ-255 / E-C2) **Tracker**: AZ-337 **Epic**: AZ-255 (E-C2) ### Document Dependencies - `_docs/02_document/contracts/c2_vpr/vpr_strategy_protocol.md` — Protocol contract this task implements (every invariant MUST be satisfied). - `_docs/02_document/components/02_c2_vpr/description.md` — § 1 PRIMARY backbone designation; § 2 interface; § 5 backbone weights ≤ 600 MB GPU; § 7 GPU stream race notes; § 9 logging. - `_docs/02_document/module-layout.md` — `c2_vpr` Per-Component Mapping (`ultra_vpr.py` Internal); `BUILD_VPR_ULTRA_VPR` row in build-time exclusion map (ON for airborne/research/replay-cli, OFF for operator-tooling). - `_docs/02_document/contracts/c7_inference/inference_runtime_protocol.md` — `InferenceRuntime` interface (engine load, forward pass, output extraction). - `_docs/02_document/contracts/shared_helpers/descriptor_normaliser.md` — L2 normalisation contract (UltraVPR raw embeddings are NOT L2-normalised; this task MUST normalise). - `_docs/02_document/contracts/shared_helpers/engine_filename_schema.md` — TensorRT engine filename → metadata extraction. - `_docs/02_document/components/02_c2_vpr/tests.md` — C2-IT-01 (recall@10 ≥ 0.95 on Derkachi); C2-IT-02 (`VprResult` invariants); C2-PT-01 (`embed_query` p95 ≤ 60 ms; combined ≤ 65 ms; ≤ 600 MB GPU; ≤ 200 MB sys mem). ## Problem UltraVPR is the production-default backbone (description.md § 1 "Documentary Lead PRIMARY backbone"). Without this task: - The composition root has no concrete strategy to wire when `config.vpr.strategy = "ultra_vpr"` (the default value); the airborne binary cannot start. - AC-2.1b (recall@10 ≥ 0.95 on Derkachi) — the highest-priority C2 acceptance criterion — has no producer; the suite-level FT-P-19 satellite re-loc test cannot pass. - AC-4.1 latency budget for VPR is allocated against UltraVPR specifically (60 ms `embed_query`); without the TRT-backed implementation, the budget is unconsumable and the E2E latency target (400 ms p95) cannot be validated. - The 600 MB GPU memory ceiling for backbone weights is enforced at the implementation layer; without it, no operator can validate the airborne deployment fits the Tier-1 Jetson Orin's GPU memory budget. - UltraVPR has a non-trivial input preprocessing contract (specific resize target, centre-crop, ImageNet mean/std normalisation, FP16 cast); without `UltraVprBackbonePreprocessor`, every consumer would re-derive the contract → silent recall regression. ## Outcome - `src/gps_denied_onboard/components/c2_vpr/ultra_vpr.py` defining: - `UltraVprStrategy` class implementing the `VprStrategy` Protocol (AZ-336). - Constructor signature: `__init__(self, runtime: InferenceRuntime, tile_store: TileStore, weights_path: Path, preprocessor: UltraVprBackbonePreprocessor, normaliser: DescriptorNormaliser, fdr_client: FdrClient)`. - `embed_query(frame, calibration)`: 1. `tensor = self._preprocessor.preprocess(frame, calibration)` (returns FP16 NCHW (1, 3, H, W)). 2. `raw = self._runtime.forward(self._engine_id, {"input": tensor})["embedding"]` (returns FP16 (1, 512)). 3. `embedding = self._normaliser.l2_normalise(raw[0])` (returns FP16 (512,) with `||embedding||_2 == 1.0 ± 1e-3`). 4. Return `VprQuery(frame_id, embedding, produced_at=monotonic_ns())`. 5. Catch RuntimeError / CudaError → wrap in `VprBackboneError`; emit ERROR log + FDR record `kind="vpr.backbone_error"`. - `retrieve_topk(query, k)`: 1. `distances, tile_ids = self._tile_store.faiss_topk(query.embedding, k)` (delegates to C6 TileStore Public API). 2. Build `[VprCandidate(tile_id, distance, descriptor_dim=512) for ...]`. 3. Return `VprResult(query.frame_id, candidates, retrieved_at=monotonic_ns(), backbone_label="ultra_vpr")`. 4. On `IndexUnavailableError` (raised by C6 TileStore on stale handle), re-raise unchanged. - `descriptor_dim() -> int`: returns 512 (the UltraVPR research code drop's published embedding dim; the value is asserted at engine-load time against the engine's output tensor shape; mismatch → `RuntimeError` at startup). - Module-level `create(config, tile_store, inference_runtime) -> VprStrategy`: 1. Resolve `weights_path = config.vpr.backbone_weights_path` (a TensorRT engine file produced by C10's engine compiler — AZ-321 — with the AZ-281 self-describing filename schema). 2. Construct `UltraVprBackbonePreprocessor(input_shape=(384, 384), mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))` (parameters from the upstream UltraVPR config, hard-coded here per CONST.SRP — these are weights-coupled, not config-knobs). 3. Construct `DescriptorNormaliser` (or fetch from helpers; AZ-283). 4. Load engine via `inference_runtime.load_engine(weights_path)` — the engine ID is captured for later `forward` calls. 5. Assert engine output shape == `(1, 512)` FP16; mismatch → `ConfigurationError`. 6. Construct and return `UltraVprStrategy(...)`. - `src/gps_denied_onboard/components/c2_vpr/_preprocessor_ultra_vpr.py` (or `_preprocessor.py` shared scaffolding + concrete `UltraVprBackbonePreprocessor`): - Implements `BackbonePreprocessor` Protocol from AZ-336. - `preprocess(frame, calibration)`: 1. Decode `frame.image_bytes` to RGB uint8 ndarray (H_in, W_in, 3) via OpenCV / Pillow. 2. Centre-crop to a square region of side `min(H_in, W_in)` using calibration's principal point if non-centre (otherwise geometric centre). Calibration is consumed here for principal-point alignment per the upstream UltraVPR contract; if calibration is absent, fall back to geometric centre with a WARN log. 3. Resize to `(384, 384)` via OpenCV `INTER_AREA` for downscale, `INTER_CUBIC` for upscale. 4. Normalise: `(pixel/255.0 - mean) / std` per channel; cast to FP16. 5. Transpose HWC → CHW; add batch dim → NCHW. 6. Return ndarray of shape `(1, 3, 384, 384)` dtype float16. - `input_shape() -> tuple[int, ...]`: returns `(384, 384)`. - On any preprocessing failure (corrupt image bytes, calibration mismatch), raise `VprPreprocessError` and emit ERROR log + FDR record `kind="vpr.preprocess_error"`. - Composition-root wiring: `runtime_root.compose_root` includes a path that, when `config.vpr.strategy == "ultra_vpr"`, calls `UltraVprStrategy.create(config, tile_store, inference_runtime)` via the AZ-336 factory. - Logging per description.md § 9: - INFO `kind="c2.vpr.ready"` with `{strategy: "ultra_vpr", descriptor_dim: 512, corpus_size: }` after engine load. - WARN `kind="c2.vpr.top1_distance_above_threshold"` if top-1 distance > `config.vpr.warn_top1_threshold` (default 0.30). - ERROR `kind="c2.vpr.backbone_error"` and `kind="c2.vpr.preprocess_error"` per error path. - DEBUG `kind="c2.vpr.frame_distances"` with top-K distances per frame (gated by config; off by default to avoid log volume at 3 Hz). - FDR records emitted: `kind="vpr.embed_query"` (per frame, with frame_id + backbone_label + bbox of distances), `kind="vpr.backbone_error"` and `kind="vpr.preprocess_error"` (per error). ## Scope ### Included - `UltraVprStrategy` class implementing the `VprStrategy` Protocol exactly per the AZ-336 contract. - `UltraVprBackbonePreprocessor` implementing `BackbonePreprocessor` Protocol with the upstream UltraVPR's published preprocessing parameters. - Module-level `create(config, tile_store, inference_runtime)` factory entry-point. - Engine-output-shape assertion at load time (`(1, 512)` FP16); mismatch → `ConfigurationError`. - L2-normalisation of every embedding via the AZ-283 `DescriptorNormaliser` helper. - Composition-root wiring path for `config.vpr.strategy == "ultra_vpr"`. - Logging per description.md § 9 (INFO ready, WARN top-1-above-threshold, ERROR error paths, DEBUG per-frame distances). - FDR record emission for embed-query and error paths. - Unit tests covering all 7 invariants (INV-1..INV-7), the engine-output-shape assertion, the preprocessing contract, the L2-normalisation post-condition, the composition-root wiring path. - `BUILD_VPR_ULTRA_VPR` CMake flag wiring (per ADR-002): the strategy module is excluded from the operator-tooling binary. ### Excluded - The `VprStrategy` Protocol + `BackbonePreprocessor` Protocol + DTOs + errors + factory — owned by AZ-336. - The `DescriptorNormaliser` helper — already AZ-283. - The C7 `InferenceRuntime` (engine load + forward pass) — owned by AZ-298 (TensorRT runtime). - The C6 `TileStore.faiss_topk` query — owned by AZ-303 / AZ-306; this task consumes the Public API. - Engine compile (`.onnx` → `.trt`) — owned by AZ-321 (`c10_engine_compiler`); this task consumes the produced `.trt` engine via `config.vpr.backbone_weights_path`. - Other backbones — AZ-338 (NetVLAD), AZ-339 (MegaLoc + MixVPR), AZ-340 (SelaVPR + EigenPlaces + SALAD). - FAISS HNSW wiring at the strategy level — `retrieve_topk` delegates to `tile_store.faiss_topk`; the FAISS index lifecycle (mmap, sidecar verify, handle invalidation) is owned by AZ-341. - Component-internal tests beyond Protocol + invariants + preprocessing-contract: C2-IT-01 (recall@10 acceptance test), C2-IT-03 (poisoned-tile), C2-IT-04 (scale-ratio), C2-PT-01 (latency NFR), C2-ST-01 (stale handle) are deferred to Step 9 / E-BBT. ## Acceptance Criteria **AC-1: Protocol conformance** Given a constructed `UltraVprStrategy` instance When `isinstance(strategy, VprStrategy)` is evaluated Then the result is `True`; the instance has `embed_query`, `retrieve_topk`, `descriptor_dim` **AC-2: `embed_query` produces L2-normalised FP16 (512,) embedding** Given a valid `NavCameraFrame` and `CameraCalibration` When `strategy.embed_query(frame, calibration)` is called Then a `VprQuery` is returned with `embedding.shape == (512,)`, `embedding.dtype == np.float16`, `||embedding||_2 == 1.0 ± 1e-3` **AC-3: `embed_query` is deterministic (INV-2 + INV-6)** Given the same frame + calibration When `embed_query` is called 3 times Then all three returns have bit-exact `embedding` arrays (ULP-tolerant for FP16); `frame_id` and `produced_at` differ across calls but `embedding` does not **AC-4: `retrieve_topk` returns exactly k candidates sorted ascending** Given a corpus of 100 tiles loaded into C6 TileStore + a constructed `VprQuery` When `strategy.retrieve_topk(query, k=10)` is called Then `len(candidates) == 10`; `[c.descriptor_distance for c in candidates]` is non-strictly-ascending; `backbone_label == "ultra_vpr"`; `candidates[0].descriptor_dim == 512` **AC-5: `descriptor_dim()` is stable and returns 512** Given a constructed `UltraVprStrategy` When `descriptor_dim()` is called 100 times Then every call returns `512` **AC-6: Engine output shape mismatch at load → `ConfigurationError`** Given a TRT engine whose output tensor shape is `(1, 256)` (not 512) When `UltraVprStrategy.create(config, tile_store, inference_runtime)` is called Then `ConfigurationError` is raised with message containing `"engine output shape mismatch: expected (1, 512), got (1, 256)"`; the strategy is NOT instantiated **AC-7: `VprBackboneError` on forward-pass failure** Given an `InferenceRuntime` test double that raises `RuntimeError` from `forward` When `strategy.embed_query(frame, calibration)` is called Then `VprBackboneError` is raised; ONE ERROR log `kind="c2.vpr.backbone_error"` is emitted; ONE FDR record `kind="vpr.backbone_error"` is emitted **AC-8: `VprPreprocessError` on corrupt image bytes** Given a `NavCameraFrame` with malformed `image_bytes` (not decodable) When `strategy.embed_query(frame, calibration)` is called Then `VprPreprocessError` is raised; ONE ERROR log `kind="c2.vpr.preprocess_error"` is emitted; ONE FDR record `kind="vpr.preprocess_error"` is emitted **AC-9: Calibration absent → centre-crop falls back to geometric centre + WARN log** Given a frame with `calibration = None` (or `calibration.principal_point` absent) When `embed_query(frame, calibration)` is called Then preprocessing succeeds with geometric-centre crop; ONE WARN log `kind="c2.vpr.calibration_missing"` is emitted; the embedding is L2-normalised (AC-2 still holds) **AC-10: `IndexUnavailableError` propagated unchanged from `retrieve_topk`** Given a C6 `TileStore` test double that raises `IndexUnavailableError` from `faiss_topk` When `strategy.retrieve_topk(query, k=10)` is called Then `IndexUnavailableError` is raised unchanged (NOT wrapped); no candidates returned **AC-11: Composition-root wiring — `config.vpr.strategy = "ultra_vpr"`** Given `config.vpr.strategy = "ultra_vpr"` AND a valid weights_path AND matching `descriptor_dim` in C6 sidecar When `compose_root(config)` runs Then a `UltraVprStrategy` instance is wired into the runtime root; the AZ-336 factory's pre-flight `descriptor_dim` validation passes; ONE INFO log `kind="c2.vpr.ready"` with `{strategy: "ultra_vpr", descriptor_dim: 512, corpus_size: }` is emitted **AC-12: WARN log on top-1 distance above threshold** Given `config.vpr.warn_top1_threshold = 0.30` AND a `VprResult` whose top-1 `descriptor_distance = 0.42` When `retrieve_topk` returns Then ONE WARN log `kind="c2.vpr.top1_distance_above_threshold"` with structured field `{distance: 0.42, threshold: 0.30}` is emitted ## Non-Functional Requirements **Performance** (deferred validation to C2-PT-01 / E-BBT; this task delivers the implementation): - `embed_query` p95 ≤ 60 ms on Tier-1 Jetson Orin with TensorRT 10.3 FP16 — bounded by the TRT engine forward-pass time + preprocessing overhead. The preprocessing path itself MUST be ≤ 5 ms p95 (so the TRT call has ~55 ms budget). - `retrieve_topk` p95 ≤ 2 ms — bounded by C6 FAISS HNSW; this task contributes only the Python wrapping overhead. - GPU memory: ≤ 600 MB resident for backbone weights (FP16 engine ~ 100-150 MB; remainder is workspace). - System memory: ≤ 200 MB for the mmap'd FAISS index handle (C6 owns this; this task consumes). **Compatibility** - The TRT engine file format is owned by C10 / C7; this task consumes the produced `.trt` engine via `config.vpr.backbone_weights_path`. Engine version mismatches surface via the AZ-281 self-describing filename schema; the C7 `load_engine` enforces compatibility. - The upstream UltraVPR research code drop is pinned per Plan-phase; weight-format changes between drops would require a new engine build (C10) and a re-run of C2-IT-01 to confirm recall@10 still passes. **Reliability** - Strategy is single-threaded by contract (INV-1, AZ-336); composition root binds to one ingest thread. - L2-normalisation is unconditional (INV-3); raw UltraVPR embeddings are not L2-normalised by the upstream forward pass. - `VprBackboneError` does not crash the process; downstream C5 falls back to VIO-only with provenance `visual_propagated` (AC-1.4). ## Unit Tests | AC Ref | What to Test | Required Outcome | |--------|-------------|-----------------| | AC-1 | `isinstance(UltraVprStrategy(...), VprStrategy)` | `True` | | AC-2 | `embed_query` output | shape (512,), dtype float16, L2-norm == 1.0 ± 1e-3 | | AC-3 | `embed_query` × 3 same frame | bit-exact embeddings (ULP-tolerant FP16) | | AC-4 | `retrieve_topk` against fixture corpus | `len == 10`, sorted ascending, `backbone_label == "ultra_vpr"`, `descriptor_dim == 512` | | AC-5 | `descriptor_dim()` × 100 | always 512 | | AC-6 | TRT engine with wrong output shape | `ConfigurationError` at create time | | AC-7 | `InferenceRuntime.forward` raises | `VprBackboneError`; ERROR log + FDR record | | AC-8 | malformed `image_bytes` | `VprPreprocessError`; ERROR log + FDR record | | AC-9 | `calibration = None` | preprocessing succeeds with geometric centre; WARN log | | AC-10 | `tile_store.faiss_topk` raises `IndexUnavailableError` | propagated unchanged | | AC-11 | `compose_root(config="ultra_vpr")` | wired; INFO log with `{strategy, descriptor_dim, corpus_size}` | | AC-12 | top-1 distance > threshold | WARN log emitted | | Preprocess-shape | `preprocessor.preprocess(frame)` output | shape `(1, 3, 384, 384)`, dtype float16 | | Preprocess-mean-std | preprocessing on a uniform-grey image | per-channel `(grey - mean) / std` matches expected to ULP | | Preprocess-input-shape | `preprocessor.input_shape()` | returns `(384, 384)` | ## Constraints - **The `BackbonePreprocessor` instance for UltraVPR lives next to the strategy, NOT in `helpers/`** — preprocessing parameters are weights-coupled (description.md § 6 "C2-internal helper, NOT a shared helper"). - **Preprocessing parameters are hard-coded** — `(384, 384)` resize target, `(0.485, 0.456, 0.406)` ImageNet mean, `(0.229, 0.224, 0.225)` ImageNet std. These are weights-coupled per the upstream UltraVPR contract; making them config-knobs would let an operator silently break the AC-2.1b recall floor. - **L2-normalisation is mandatory** even though some downstream code paths are robust to non-normalised embeddings — INV-3 from the contract is non-negotiable. - **Engine load happens at `create` time, NOT at first frame** — the engine-output-shape assertion (AC-6) MUST fire at startup. - **The strategy holds the engine ID returned by `inference_runtime.load_engine`, NOT the engine itself** — engine lifecycle is owned by C7. - **Constructor injection only** — no `import gps_denied_onboard.config` inside the strategy module; config is consumed via the `create` factory. - **No GPU operations outside `embed_query`** — `__init__` does the engine load (one-time cost), `embed_query` does the per-frame forward pass; nothing else touches the GPU stream. ## Risks & Mitigation **Risk 1: UltraVPR upstream code drop ships an unsupported ONNX op** - *Risk*: The TRT 10.3 ONNX importer doesn't support a custom op in UltraVPR's graph; engine compilation fails at C10 stage. - *Mitigation*: Engine compile is C10's responsibility (AZ-321). This task consumes the produced engine and assumes it's loadable. If C10 cannot build the engine, the strategy cannot be wired — a hard upstream blocker that surfaces during AZ-321 implementation, NOT here. **Risk 2: FP16 precision insufficient for AC-2.1b recall@10 ≥ 0.95** - *Risk*: FP16 quantisation degrades embedding fidelity below the recall floor on the Derkachi corpus. - *Mitigation*: C2-IT-01 (deferred to Step 9) is the validation gate. If FP16 fails, the operator can fall back to FP32 by rebuilding the engine via C10 with `precision=fp32` — this is a config-time decision, NOT a code change in this task. The strategy treats FP16 vs FP32 as transparent (the engine output dtype is asserted at load time; embedding dtype follows the engine). **Risk 3: Centre-crop with calibration's principal point introduces non-determinism if calibration changes mid-flight** - *Risk*: An operator hot-swaps calibration during flight; embeddings shift; recall drops silently. - *Mitigation*: Calibration changes mid-flight are forbidden by the broader F1 / F2 / F3 lifecycle (calibration is loaded once per flight at takeoff). If a future cycle adds hot-swap support, a separate task adds calibration-versioning to embeddings. **Risk 4: Per-frame DEBUG log volume at 3 Hz × 10 distances = 30 entries/sec** - *Risk*: Default-on DEBUG logging floods journald. - *Mitigation*: DEBUG `kind="c2.vpr.frame_distances"` is gated by `config.vpr.debug_per_frame_distances` (default `false`); operators enable it only for forensic investigation of a specific flight. **Risk 5: WARN-threshold default (0.30) needs calibration** - *Risk*: The 0.30 default threshold for top-1 distance WARN is a placeholder; production-tuned values come from FT-P-19 telemetry. - *Mitigation*: `config.vpr.warn_top1_threshold` is config-driven (default 0.30); a follow-up cycle will tune from real flight FDR data. The default is a conservative starting point that surfaces obvious false-positives without flooding logs. ## Runtime Completeness - **Named capability**: production-default `VprStrategy` for top-K retrieval against the C6 FAISS corpus (architecture / E-C2 / `solution.md` "UltraVPR primary backbone" / AC-2.1b + AC-4.1). - **Production code that must exist**: real `UltraVprStrategy` calling real C7 `InferenceRuntime.forward` with a real TRT-compiled UltraVPR engine; real `UltraVprBackbonePreprocessor` performing real OpenCV resize + ImageNet normalisation + FP16 cast; real L2-normalisation via real `DescriptorNormaliser`; real composition-root wiring in `runtime_root.compose_root` for the `ultra_vpr` strategy choice. - **Allowed external stubs**: tests MAY use `FakeInferenceRuntime` returning pre-computed embeddings (AC-2..AC-7), `FakeTileStore` (AC-4 / AC-10 / AC-11), `FakeFdrClient` (verifying FDR record emission), a synthetic frame fixture for preprocessing tests; production wiring uses the real C7 + C6 + UltraVPR engine. - **Unacceptable substitutes**: a Python-only NumPy implementation of UltraVPR's forward pass (would not satisfy C2-PT-01 latency at 60 ms p95; would defeat the GPU-bound architectural choice); skipping L2-normalisation (would break INV-3 and downstream cosine-similarity assumptions); making preprocessing parameters config-knobs (would let operators silently break AC-2.1b); engine load at first frame instead of `create` time (would defer the engine-output-shape assertion past startup, defeating fail-fast); per-strategy thread safety (the contract is single-thread; adding locks would mask the composition-root binding bug if it ever broke); a "demo mode" that returns dummy embeddings to bypass the TRT engine.