# C2 NetVLAD Mandatory Simple-Baseline

**Task**: AZ-338_c2_net_vlad
**Name**: C2 NetVLAD Mandatory Simple-Baseline
**Description**: Implement `NetVladStrategy`, the C2 mandatory simple-baseline `VprStrategy` (engine rule: every component MUST ship a comparative baseline alongside its production-default; description.md § 1 designates NetVLAD as the C2 baseline). NetVLAD has a much higher embedding dim than UltraVPR (D=4096 with NetVLAD-VGG16 default; can be reduced to D=512 via PCA-whitening per the upstream NetVLAD code drop) and uses PyTorch FP16 (NOT TensorRT) per the simple-baseline policy: "the baseline runs on the simplest available runtime" so a TRT engine compile bug doesn't simultaneously break baseline AND primary. Includes the concrete `NetVladBackbonePreprocessor` (different resize target + normalisation than UltraVPR). MUST satisfy AC-2.1b's relaxed engine-rule floor `recall@10 ≥ 0.85` on Derkachi normal segment.
**Complexity**: 3 points
**Dependencies**: AZ-336_c2_vpr_strategy_protocol, AZ-263_initial_structure, AZ-269_config_loader, AZ-300_c7_pytorch_baseline, AZ-303_c6_storage_interfaces, AZ-283_descriptor_normaliser, AZ-266_log_module, AZ-272_fdr_record_schema
**Component**: c2_vpr (epic AZ-255 / E-C2)
**Tracker**: AZ-338
**Epic**: AZ-255 (E-C2)

### Document Dependencies

- `_docs/02_document/contracts/c2_vpr/vpr_strategy_protocol.md` — Protocol contract; every invariant MUST be satisfied; INV-3 (L2-normalised) is critical because NetVLAD raw embeddings include intra-cluster residuals that must be globally L2-normalised after the VLAD aggregation.
- `_docs/02_document/components/02_c2_vpr/description.md` — § 1 NetVLAD designated as mandatory simple-baseline; § 5 PyTorch matches simple-baseline track; § 9 logging.
- `_docs/02_document/module-layout.md` — `c2_vpr.net_vlad` Internal entry; `BUILD_VPR_NETVLAD` row; `BUILD_PYTORCH_RUNTIME` row (NetVLAD requires PyTorch runtime ON which is OFF for airborne — NetVLAD is research/replay-only by build-flag combination).
- `_docs/02_document/components/02_c2_vpr/tests.md` — C2-IT-01 engine rule check `recall@10 ≥ 0.85` for NetVLAD on Derkachi normal segment.
- `_docs/02_document/contracts/c7_inference/inference_runtime_protocol.md` — `InferenceRuntime` interface; AZ-300 `pytorch_fp16_runtime` is the consumed concrete runtime.
- `_docs/02_document/contracts/shared_helpers/descriptor_normaliser.md` — L2 + intra-normalisation (NetVLAD's published preprocessing chain includes intra-cluster normalisation BEFORE the global L2 normalisation; the `DescriptorNormaliser` helper must support both).

## Problem

Without this task:

- The C2 component has no comparative baseline; the engine rule (every primary backbone has a baseline alongside it for FT-12 comparative-study and for risk reduction if the primary fails) is violated for C2 specifically — the project-wide policy goes unsatisfied for one of its largest backbone surfaces.
- AC-2.1b's relaxed-floor check (`recall@10 ≥ 0.85` for NetVLAD) has no producer; suite-level FT-P-19 cannot validate the engine rule.
- The research binary (which links every backbone for IT-12 comparative studies) cannot ship without a NetVLAD strategy; researchers cannot run the comparative study that informs whether the primary's engine choice is justified.
- A code drop / weights / engine compile bug in UltraVPR has no fallback at the strategy layer; the operator who notices a sudden drop in suite-level satellite re-loc accuracy would have no mechanism to A/B against the baseline.

## Outcome

- `src/gps_denied_onboard/components/c2_vpr/net_vlad.py` defining:
  - `NetVladStrategy` class implementing the `VprStrategy` Protocol.
  - Constructor signature: `__init__(self, runtime: InferenceRuntime, tile_store: TileStore, weights_path: Path, preprocessor: NetVladBackbonePreprocessor, normaliser: DescriptorNormaliser, fdr_client: FdrClient, descriptor_dim: int = 4096)`.
  - `embed_query(frame, calibration)`:
    1. `tensor = self._preprocessor.preprocess(frame, calibration)` (returns FP16 NCHW (1, 3, H, W); H=W=480 per the upstream NetVLAD-VGG16 default).
    2. `intermediate = self._runtime.forward(self._engine_id, {"input": tensor})["vlad_descriptor"]` (returns FP16 (1, descriptor_dim) post-VLAD aggregation).
    3. `intra_normalised = self._normaliser.intra_cluster_normalise(intermediate[0], num_clusters=64)` (per NetVLAD's published preprocessing: intra-cluster L2 first).
    4. `embedding = self._normaliser.l2_normalise(intra_normalised)` (then global L2).
    5. Return `VprQuery(frame_id, embedding, produced_at=monotonic_ns())`.
    6. Catch RuntimeError → wrap in `VprBackboneError`; emit ERROR log + FDR record.
  - `retrieve_topk(query, k)`: identical to UltraVPR — delegates to `tile_store.faiss_topk`; returns `VprResult` with `backbone_label="net_vlad"`.
  - `descriptor_dim() -> int`: returns the constructor-passed value (default 4096); asserted at engine-load time against the engine's output tensor shape; mismatch → `ConfigurationError` (the same load-time check as `create` step 6 and AC-7).
  - Module-level `create(config, tile_store, inference_runtime) -> VprStrategy`:
    1. Resolve `weights_path = config.vpr.backbone_weights_path` (a PyTorch state_dict file with the `.pth` extension; NetVLAD does NOT use the AZ-281 self-describing TRT filename schema — its own AZ-280 sidecar carries the PCA matrix + cluster centres).
    2. Resolve `descriptor_dim = config.vpr.netvlad_descriptor_dim` (default 4096; can be 512 if PCA-whitened weights are loaded).
    3. Construct `NetVladBackbonePreprocessor(input_shape=(480, 480), mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))`.
    4. Construct `DescriptorNormaliser` with `intra_cluster_normalise` capability.
    5. Load model via `inference_runtime.load_engine(weights_path)` (the PyTorch runtime accepts `.pth` files; AZ-300).
    6. Assert engine output shape == `(1, descriptor_dim)`; mismatch → `ConfigurationError`.
    7. Construct and return `NetVladStrategy(...)`.
- `src/gps_denied_onboard/components/c2_vpr/_preprocessor_net_vlad.py`:
  - Implements `BackbonePreprocessor` Protocol.
  - `preprocess(frame, calibration)`:
    1. Decode `frame.image_bytes` to RGB uint8 (H_in, W_in, 3).
    2. Centre-crop to a square region (same calibration-aware logic as UltraVPR — copied here, NOT shared, because the calibration handling is part of the preprocessor's contract).
    3. Resize to `(480, 480)` via OpenCV.
    4. Normalise: `(pixel/255.0 - mean) / std`; cast to FP16.
    5. Transpose HWC → CHW; add batch dim.
    6. Return ndarray of shape `(1, 3, 480, 480)` dtype float16.
  - `input_shape() -> tuple[int, ...]`: returns `(480, 480)`.
  - On failure: raise `VprPreprocessError`.
- Composition-root wiring path for `config.vpr.strategy == "net_vlad"`.
- Logging per description.md § 9: INFO `kind="c2.vpr.ready"` with `{strategy: "net_vlad", descriptor_dim: 4096}`; ERROR / WARN identical to UltraVPR.
- FDR records emitted: `kind="vpr.embed_query"`, `kind="vpr.backbone_error"`, `kind="vpr.preprocess_error"`.

## Scope

### Included

- `NetVladStrategy` implementing the Protocol; `NetVladBackbonePreprocessor` implementing `BackbonePreprocessor`.
- Module-level `create(config, tile_store, inference_runtime)` factory entry-point.
- Intra-cluster L2 normalisation BEFORE global L2 normalisation (NetVLAD's published preprocessing chain).
- Composition-root wiring for `config.vpr.strategy == "net_vlad"`.
- Engine output shape assertion at load time.
- Logging + FDR records identical to UltraVPR (the per-backbone label distinguishes the records).
- Unit tests covering all 7 invariants, the dual-stage normalisation, the preprocessing contract, the load-time shape assertion.
- `BUILD_VPR_NETVLAD` CMake flag wiring per ADR-002 (ON for research; OFF for airborne / operator-tooling because PyTorch runtime is excluded; ON-but-effectively-unused for replay-cli unless explicitly selected).

### Excluded

- The `VprStrategy` Protocol — owned by AZ-336.
- The `DescriptorNormaliser.l2_normalise` — already AZ-283. **Note**: AZ-283 ships `l2_normalise`; this task may need to extend AZ-283 to add `intra_cluster_normalise(vec, num_clusters)`. **Decision**: extending AZ-283 is in scope here as a small contract addition (the helper ships with `l2_normalise`; adding `intra_cluster_normalise` is a single function). If AZ-283 is already merged when this task starts, the addition is a backward-compatible function add; no breaking change.
- The C7 PyTorch runtime — owned by AZ-300; this task consumes the interface.
- Other backbones — owned by AZ-337 (UltraVPR), AZ-339 (MegaLoc + MixVPR), AZ-340 (SelaVPR + EigenPlaces + SALAD).
- FAISS retrieve wiring — owned by AZ-341.
- C2-IT-01's NetVLAD recall@10 ≥ 0.85 acceptance test — deferred to Step 9 / E-BBT.
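
For illustration, the `intra_cluster_normalise` addition and its ordering relative to `l2_normalise` might look like the NumPy sketch below. It assumes the flattened descriptor is laid out as `num_clusters` contiguous, equal-sized blocks and that normalisation runs in FP32 before the final FP16 cast; neither detail is fixed by this task, and the `eps` guard is an illustrative choice rather than part of the AZ-283 contract.

```python
import numpy as np


def intra_cluster_normalise(vec: np.ndarray, num_clusters: int, eps: float = 1e-12) -> np.ndarray:
    """L2-normalise each per-cluster block of a flattened VLAD descriptor.

    Assumption: `vec` holds `num_clusters` contiguous, equal-sized blocks.
    The real layout depends on the pinned NetVLAD weights.
    """
    if vec.ndim != 1 or vec.size % num_clusters != 0:
        raise ValueError(f"cannot split shape {vec.shape} into {num_clusters} clusters")
    blocks = vec.astype(np.float32).reshape(num_clusters, -1)
    norms = np.linalg.norm(blocks, axis=1, keepdims=True)
    # Guard against all-zero blocks so the division never produces NaN.
    return (blocks / np.maximum(norms, eps)).reshape(-1)


def l2_normalise(vec: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale the whole vector to unit L2 norm."""
    v = vec.astype(np.float32)
    return v / max(float(np.linalg.norm(v)), eps)


# Stand-in for the runtime's (descriptor_dim,) output; random data, not real VLAD.
rng = np.random.default_rng(0)
raw = rng.standard_normal(4096).astype(np.float16)

# Mandatory order: intra-cluster first, then global L2, then cast to FP16.
embedding = l2_normalise(intra_cluster_normalise(raw, num_clusters=64)).astype(np.float16)
```

With 64 unit-norm blocks, the global L2 step divides every block by √64 = 8, so each block ends at norm 0.125 while the full vector has norm 1.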
## Acceptance Criteria

**AC-1: Protocol conformance**
Given a constructed `NetVladStrategy` instance
When `isinstance(strategy, VprStrategy)` is evaluated
Then the result is `True`

**AC-2: `embed_query` produces L2-normalised FP16 (descriptor_dim,) embedding**
Given a valid `NavCameraFrame` and `CameraCalibration`
When `strategy.embed_query(frame, calibration)` is called
Then `embedding.shape == (4096,)` (or the configured `descriptor_dim`), `embedding.dtype == np.float16`, `||embedding||_2 == 1.0 ± 1e-3`

**AC-3: Dual-stage normalisation — intra-cluster THEN global L2**
Given a fake intermediate VLAD descriptor with non-zero per-cluster sub-vectors
When the embedding pipeline runs
Then `intra_cluster_normalise` is called BEFORE `l2_normalise` (verifiable via spy on the normaliser); the order is NEVER reversed; the output's per-cluster sub-vectors are unit-norm in the intra-cluster sense AND the full vector is unit-norm globally

**AC-4: `embed_query` is deterministic**
Given the same frame + calibration
When `embed_query` is called 3 times
Then all three returns have bit-exact `embedding` arrays (ULP-tolerant FP16)

**AC-5: `retrieve_topk` returns exactly k candidates with `backbone_label = "net_vlad"`**
Given a corpus of 100 tiles + a constructed `VprQuery` with D=4096
When `strategy.retrieve_topk(query, k=10)` is called
Then `len(candidates) == 10`; sorted ascending; `backbone_label == "net_vlad"`; `candidates[0].descriptor_dim == 4096`

**AC-6: `descriptor_dim()` is config-driven and stable**
Given construction with `descriptor_dim=4096`
When `descriptor_dim()` is called 100 times
Then every call returns 4096; constructing a second instance with `descriptor_dim=512` (PCA-whitened weights case) returns 512 from that instance's `descriptor_dim()`

**AC-7: Engine output shape mismatch at load → `ConfigurationError`**
Given a model whose output tensor shape is `(1, 2048)` while `config.vpr.netvlad_descriptor_dim = 4096`
When the module-level `create(...)` factory is called
Then `ConfigurationError` is raised with message containing `"engine output shape mismatch: expected (1, 4096), got (1, 2048)"`; the strategy is NOT instantiated

**AC-8: `VprBackboneError` on forward-pass failure**
Given an `InferenceRuntime` test double that raises `RuntimeError` from `forward`
When `embed_query` is called
Then `VprBackboneError` is raised; ERROR log + FDR record emitted

**AC-9: `VprPreprocessError` on corrupt image bytes**
Given a frame with malformed `image_bytes`
When `embed_query` is called
Then `VprPreprocessError` is raised; ERROR log + FDR record emitted

**AC-10: Composition-root wiring**
Given `config.vpr.strategy = "net_vlad"` AND valid weights AND matching `descriptor_dim`
When `compose_root(config)` runs
Then a `NetVladStrategy` is wired; AZ-336 factory's pre-flight `descriptor_dim` validation passes; INFO log `kind="c2.vpr.ready"` with `{strategy: "net_vlad", descriptor_dim: 4096}` emitted

**AC-11: Build-flag combination — NetVLAD requires PyTorch runtime**
Given `config.vpr.strategy = "net_vlad"` AND `BUILD_PYTORCH_RUNTIME=OFF` (airborne binary)
When the binary tries to load
Then `ConfigurationError` is raised at composition-root time with message containing `"NetVLAD requires BUILD_PYTORCH_RUNTIME=ON; this binary has BUILD_PYTORCH_RUNTIME=OFF"`; the binary refuses to start (fail-fast)

## Non-Functional Requirements

**Performance**

- `embed_query` p95 ≤ 80 ms on Tier-1 Jetson Orin with PyTorch FP16 — looser than UltraVPR's 60 ms because the simple-baseline runs on the simpler runtime; not on the production critical path.
- `retrieve_topk` p95 ≤ 4 ms — slightly looser than UltraVPR because the higher embedding dim (4096 vs 512) makes FAISS lookup ~ 8× more compute; still sub-frame at 3 Hz.
- GPU memory: ≤ 800 MB resident for backbone weights — looser than UltraVPR's 600 MB because NetVLAD's VGG16 backbone is larger.
- These NFRs are not enforced as engine-rule blockers; they're operator guidance for the research binary's resource budget.

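
The ~8× figure in the `retrieve_topk` budget follows from descriptor length alone. The sketch below works through the arithmetic, assuming FP16 storage (2 bytes per element, matching the `embed_query` output dtype) and one multiply-accumulate per dimension for each distance evaluation. The 100-tile corpus size is borrowed from the AC-5 fixture purely for illustration, and `descriptor_cost` is a hypothetical helper; the sketch does not model HNSW graph traversal or FAISS internals.

```python
FP16_BYTES = 2  # assumption: descriptors are held as float16


def descriptor_cost(dim: int, corpus_tiles: int) -> dict[str, int]:
    """Rough per-descriptor storage and per-comparison compute for a given dim."""
    return {
        "bytes_per_descriptor": dim * FP16_BYTES,
        "corpus_bytes": dim * FP16_BYTES * corpus_tiles,
        # One multiply-accumulate per dimension for a single distance evaluation.
        "macs_per_comparison": dim,
    }


netvlad = descriptor_cost(4096, corpus_tiles=100)
ultravpr = descriptor_cost(512, corpus_tiles=100)
ratio = netvlad["macs_per_comparison"] / ultravpr["macs_per_comparison"]

print(netvlad)   # 8192 bytes per descriptor, 819200 bytes for 100 tiles
print(ultravpr)  # 1024 bytes per descriptor, 102400 bytes for 100 tiles
print(ratio)     # 8.0
```

The same ratio applies to storage, which is why the PCA-whitened D=512 variant brings the baseline back to UltraVPR's per-comparison cost.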

**Compatibility**

- The PyTorch state_dict format is owned by C7's PyTorch runtime (AZ-300); this task consumes the produced model via `config.vpr.backbone_weights_path`.
- The upstream NetVLAD code drop is pinned per Plan-phase; PCA-whitening parameters change with weights → AZ-280 sidecar carries them.

**Reliability**

- Strategy is single-threaded by contract (INV-1).
- Dual-stage normalisation order (intra-cluster THEN global L2) is mandatory; reversing the order produces a different embedding subspace and silently breaks AC-2.1b (recall regression).
- `VprBackboneError` does not crash the process; downstream falls back to VIO-only.

## Unit Tests

| AC Ref | What to Test | Required Outcome |
|--------|-------------|-----------------|
| AC-1 | `isinstance(NetVladStrategy(...), VprStrategy)` | `True` |
| AC-2 | `embed_query` output | shape (4096,), dtype float16, L2-norm == 1.0 ± 1e-3 |
| AC-3 | Spy on normaliser methods | `intra_cluster_normalise` called BEFORE `l2_normalise` exactly once each per `embed_query` |
| AC-4 | `embed_query` × 3 same frame | bit-exact embeddings |
| AC-5 | `retrieve_topk` against fixture corpus | `len == 10`, sorted, `backbone_label == "net_vlad"`, `descriptor_dim == 4096` |
| AC-6 | `descriptor_dim()` × 100 (D=4096 instance) + a second D=512 instance | first instance always 4096; second always 512 |
| AC-7 | Model with wrong output shape | `ConfigurationError` at create time |
| AC-8 | `forward` raises | `VprBackboneError`; ERROR log + FDR |
| AC-9 | malformed `image_bytes` | `VprPreprocessError`; ERROR log + FDR |
| AC-10 | `compose_root(config="net_vlad")` | wired; INFO log with `{strategy: "net_vlad", descriptor_dim: 4096}` |
| AC-11 | airborne binary + `config.vpr.strategy = "net_vlad"` | `ConfigurationError` with PyTorch-OFF message; fail-fast |
| Preprocess-shape | `preprocessor.preprocess(frame)` output | shape `(1, 3, 480, 480)`, dtype float16 |
| Preprocess-input-shape | `preprocessor.input_shape()` | returns `(480, 480)` |

## Constraints

- **Dual-stage normalisation order is non-negotiable** — intra-cluster THEN global L2. Reversing is forbidden.
- **NetVLAD uses the PyTorch runtime, NOT TensorRT** — the simple-baseline policy isolates it from TRT engine compile risk. The research binary links both runtimes; airborne binary excludes the PyTorch runtime via `BUILD_PYTORCH_RUNTIME=OFF`, which makes NetVLAD effectively unselectable for airborne (AC-11).
- **Preprocessing parameters are weights-coupled** — `(480, 480)` resize, ImageNet mean/std. Hard-coded; not config-knobs.
- **`descriptor_dim` IS config-driven** (unlike UltraVPR which hard-codes 512) because NetVLAD ships in two flavours: full 4096-d and PCA-whitened 512-d. The choice is part of the operator's deployment, not a runtime decision.
- **Constructor injection only**; no `import gps_denied_onboard.config` inside the strategy module.
- **The strategy holds the engine ID, NOT the engine itself** — engine lifecycle is owned by C7.

## Risks & Mitigation

**Risk 1: NetVLAD embedding dim of 4096 is 8× larger than UltraVPR's 512; FAISS HNSW lookup is slower**

- *Risk*: `retrieve_topk` may exceed C2-PT-01's 2 ms budget for the lookup stage; the budget was set against UltraVPR's D=512.
- *Mitigation*: `retrieve_topk` p95 ≤ 4 ms is the looser baseline budget (acknowledged in NFRs); for the research binary this is acceptable since NetVLAD is comparison-only. If an operator wants the production-fast path with NetVLAD, they configure PCA-whitening (D=512) at corpus build time (C10).

**Risk 2: NetVLAD recall@10 ≥ 0.85 floor not achievable with FP16**

- *Risk*: FP16 quantisation degrades the VLAD aggregation precision below the relaxed engine-rule floor.
- *Mitigation*: C2-IT-01's NetVLAD assertion is the validation gate (deferred to Step 9). If FP16 fails, the operator can configure FP32 weights — the strategy does not hard-code dtype; it follows the runtime's loaded model.


**Risk 3: PyTorch FP16 runtime on Tier-1 Jetson is slower than expected**

- *Risk*: PyTorch FP16 inference on Jetson has known pipeline-stall issues compared to TRT.
- *Mitigation*: NetVLAD is research-only by build-flag combination (AC-11 enforces); the production critical path is UltraVPR. If a future cycle wants NetVLAD on the airborne binary, that's a separate task: convert NetVLAD to ONNX → TRT engine, then update this strategy to use the TRT runtime.

**Risk 4: Operator picks NetVLAD on airborne binary by mistake**

- *Risk*: A typo in the airborne config that selects `net_vlad` would silently fall back to VIO-only every flight if the runtime were missing.
- *Mitigation*: AC-11 makes this fail-fast at composition-root time with a clear error message. Operators learn at startup, not after takeoff.

**Risk 5: AZ-283 `DescriptorNormaliser` may not yet ship `intra_cluster_normalise`**

- *Risk*: The helper as defined in AZ-283 ships only `l2_normalise`; this task needs `intra_cluster_normalise` too.
- *Mitigation*: As noted in Scope/Excluded, extending AZ-283 to add `intra_cluster_normalise` is a backward-compatible function addition. If AZ-283 has already merged before this task starts, the addition is committed alongside this task with a one-line note in `_docs/02_document/contracts/shared_helpers/descriptor_normaliser.md`. If AZ-283 has not yet merged, coordinate the addition during AZ-283's implementation. Either way, no breaking change to existing consumers.

## Runtime Completeness

- **Named capability**: mandatory simple-baseline `VprStrategy` for engine-rule comparative validation against the production-default UltraVPR (architecture / E-C2 / `solution.md` "NetVLAD mandatory simple-baseline" / engine rule + AC-2.1b relaxed floor).
- **Production code that must exist**: real `NetVladStrategy` calling real C7 PyTorch `InferenceRuntime.forward` with a real loaded NetVLAD `.pth` model; real `NetVladBackbonePreprocessor` performing real OpenCV resize + ImageNet normalisation + FP16 cast; real dual-stage normalisation (intra-cluster THEN global L2); real composition-root wiring path.
- **Allowed external stubs**: tests MAY use `FakeInferenceRuntime` returning pre-computed VLAD descriptors; `FakeTileStore`; `FakeFdrClient`; `FakeDescriptorNormaliser` instrumented to verify call order (AC-3); production wiring uses the real C7 PyTorch runtime + real NetVLAD weights + real C6.
- **Unacceptable substitutes**: a NumPy-only NetVLAD forward pass (would not satisfy NFR-perf budget; would defeat the runtime-isolation strategy of using a different runtime than UltraVPR); skipping intra-cluster normalisation (would silently break AC-2.1b's recall floor); using TensorRT for NetVLAD (would defeat the simple-baseline policy of isolating runtime risk); making preprocessing parameters config-knobs (would let operators silently break the recall floor); selecting NetVLAD in an airborne binary (must fail-fast per AC-11); a single-stage L2-only normalisation (would deviate from NetVLAD's published preprocessing chain; recall regression risk).
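
A minimal sketch of the call-order spy that AC-3 relies on is shown here. `FakeDescriptorNormaliser` is the test double named in the allowed stubs, but its internals, the `calls` list, and the `run_normalisation_steps` helper are assumptions made for illustration; the helper only mirrors steps 3 and 4 of `embed_query` and is not the strategy itself.

```python
import numpy as np


class FakeDescriptorNormaliser:
    """Test double that records which normalisation methods ran, in order."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def intra_cluster_normalise(self, vec: np.ndarray, num_clusters: int) -> np.ndarray:
        self.calls.append("intra_cluster_normalise")
        return vec  # pass-through: only the call order matters for AC-3

    def l2_normalise(self, vec: np.ndarray) -> np.ndarray:
        self.calls.append("l2_normalise")
        return vec


def run_normalisation_steps(normaliser, intermediate: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for steps 3 and 4 of embed_query."""
    intra = normaliser.intra_cluster_normalise(intermediate[0], num_clusters=64)
    return normaliser.l2_normalise(intra)


spy = FakeDescriptorNormaliser()
fake_vlad = np.zeros((1, 4096), dtype=np.float16)  # shape of the runtime output
out = run_normalisation_steps(spy, fake_vlad)

# Each method ran exactly once, intra-cluster first.
assert spy.calls == ["intra_cluster_normalise", "l2_normalise"]
```

In a real test the spy would be injected through the `normaliser` constructor argument and the assertion made after a full `embed_query` call; a reversed order would show up as a reversed `calls` list.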