[AZ-338] [AZ-283] C2 NetVLAD mandatory simple-baseline VprStrategy

NetVLAD is the C2 comparative baseline per the engine rule (every production-default backbone ships with a simple-baseline alongside). Runs on the C7 PyTorch FP16 runtime (NOT TRT) so a TRT engine compile bug cannot simultaneously break NetVLAD AND UltraVPR.

Production changes:
- c2_vpr/net_vlad.py: NetVladStrategy + module-level create() factory. Constructor wires InferenceRuntimeCut + DescriptorIndexCut + NetVladBackbonePreprocessor + DescriptorNormaliser + FaissBridge. embed_query pipeline: preprocess -> runtime.infer -> dual-stage normalisation (intra-cluster THEN global L2) -> VprQuery. retrieve_topk delegates one-line to FaissBridge.
- c2_vpr/_net_vlad_architecture.py: Arandjelovic et al. 2016 NetVLAD layer over torchvision VGG16 features + optional Linear PCA projection to descriptor_dim (default 4096; published Pittsburgh reference uses K*D=64*512=32768 raw + Linear(32768, 4096) PCA).
- c2_vpr/_preprocessor_net_vlad.py: OpenCV-based image preprocessor: decode -> centre-crop square -> resize (480, 480) -> ImageNet normalisation -> FP16 NCHW. Calibration is not consumed (NetVLAD is calibration-agnostic per published preprocessing chain).
- c2_vpr/inference_runtime_cut.py: NEW AZ-507 consumer-side cut mirroring C7 InferenceRuntime; lets c2_vpr stay AZ-507-clean.
- c2_vpr/config.py: added netvlad_descriptor_dim: int = 4096 knob.
- helpers/descriptor_normaliser.py: added intra_cluster_normalise (DescriptorNormaliser v1.0.0 -> v1.1.0; backward-compatible add).
- runtime_root/vpr_factory.py: added _register_strategy_architecture helper that binds (MODEL_NAME, architecture_factory(descriptor_dim)) to C7's architecture registry before delegating to the strategy's create() factory. Keeps the c7 import at L4, preserves AZ-507.
- fdr_client/records.py: registered vpr.embed_query, vpr.backbone_error, vpr.preprocess_error record kinds.

Tests:
- tests/unit/c2_vpr/test_net_vlad.py: 31 tests covering all 11 ACs + preprocessor contract + architecture factory + constructor validation + FDR record emission.
- tests/unit/test_az283_descriptor_normaliser.py: +8 tests for the new intra_cluster_normalise.
- tests/unit/test_az272_fdr_record_schema.py: +3 fixture payloads.

Full unit suite: 1608 passed / 80 env-skipped (+43 new tests).

Per-batch code review (batch_46_review.md): PASS_WITH_WARNINGS (4 Low-severity hygiene findings; no Critical/High/Medium).

Architectural notes:
- The spec implied c2_vpr.net_vlad.create() registers the architecture with C7. That violates AZ-507 (no cross-component imports). Resolved by exposing MODEL_NAME + architecture_factory(descriptor_dim) on the strategy module and having the composition root perform the C7 bind.
- C7 PyTorch runtime API names in the spec (forward, load_engine) were outdated; aligned implementation with the live v1.0.0 Protocol (infer, compile_engine + deserialize_engine). Spec hygiene flagged in review F2.

Co-authored-by: Cursor <cursoragent@cursor.com>
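
The first architectural note (the composition root performs the C7 bind, the strategy module only exposes MODEL_NAME and architecture_factory) can be illustrated with a minimal, torch-free sketch. The dict-based `ARCHITECTURE_REGISTRY` and the `register_strategy_architecture` function are hypothetical stand-ins; the real C7 registry API and the `_register_strategy_architecture` helper are not shown in this excerpt and may differ.

```python
from typing import Callable, Dict

MODEL_NAME = "net_vlad"

# Hypothetical stand-in for C7's architecture registry: name -> zero-arg factory.
ARCHITECTURE_REGISTRY: Dict[str, Callable[[], dict]] = {}


def architecture_factory(descriptor_dim: int) -> Callable[[], dict]:
    """Bind descriptor_dim into a zero-arg closure, as the strategy module does."""
    if descriptor_dim < 1:
        raise ValueError(f"descriptor_dim must be >= 1; got {descriptor_dim}")

    def _factory() -> dict:
        # The real factory returns an nn.Module; a dict keeps this sketch torch-free.
        return {"model": MODEL_NAME, "descriptor_dim": descriptor_dim}

    return _factory


def register_strategy_architecture(descriptor_dim: int) -> None:
    """Composition-root step (assumed shape): the only code touching the registry."""
    ARCHITECTURE_REGISTRY[MODEL_NAME] = architecture_factory(descriptor_dim)


register_strategy_architecture(4096)
module = ARCHITECTURE_REGISTRY[MODEL_NAME]()  # what the runtime would build later
```

Under these assumptions the strategy package never needs to import the registry: it only hands out a closure, and the layer that is allowed to see both sides does the binding.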
2026-06-22 15:41:12 +00:00 · 2026-05-13 22:30:29 +03:00
parent dd2f1cbae6
commit af0dbe863a
15 changed files with 2200 additions and 17 deletions
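
The dual-stage normalisation named in the commit message (intra-cluster THEN global L2) can be sketched in plain Python. This is not the DescriptorNormaliser implementation, which is outside this excerpt; the function names mirror the ones in the diff, and the contiguous cluster-major chunk layout is an assumption based on the reshape in `_NetVladLayer.forward`.

```python
import math


def l2_normalise(vec, eps=1e-12):
    """Scale vec to unit L2 norm (eps guards against an all-zero input)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]


def intra_cluster_normalise(vec, num_clusters):
    """L2-normalise each of the num_clusters equal-length chunks of vec."""
    if len(vec) % num_clusters != 0:
        raise ValueError("descriptor length must be divisible by num_clusters")
    chunk = len(vec) // num_clusters
    out = []
    for k in range(num_clusters):
        out.extend(l2_normalise(vec[k * chunk:(k + 1) * chunk]))
    return out


# Toy descriptor: K=2 clusters, 2 values per cluster.
raw = [3.0, 4.0, 0.0, 10.0]
intra = intra_cluster_normalise(raw, num_clusters=2)  # each chunk now has norm 1
final = l2_normalise(intra)                           # whole vector now has norm 1
```

The first stage stops one high-energy cluster (here the second chunk) from dominating the descriptor; the second stage makes inner products between descriptors behave as cosine similarities.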
@@ -37,6 +37,9 @@ from gps_denied_onboard.components.c2_vpr.errors import (
    VprError,
    VprPreprocessError,
 )
+from gps_denied_onboard.components.c2_vpr.inference_runtime_cut import (
+    InferenceRuntimeCut,
+)
 from gps_denied_onboard.components.c2_vpr.interface import VprStrategy
 from gps_denied_onboard.config.schema import register_component_block

@@ -46,6 +49,7 @@ __all__ = [
    "C2VprConfig",
    "DescriptorIndexCut",
    "IndexUnavailableError",
+    "InferenceRuntimeCut",
    "TileIdTuple",
    "VprBackboneError",
    "VprCandidate",
@@ -0,0 +1,144 @@
+"""NetVLAD VGG16 architecture (AZ-338).
+
+Reference: Arandjelović, R., Gronat, P., Torii, A., Pajdla, T., & Sivic, J.
+"NetVLAD: CNN architecture for weakly supervised place recognition", CVPR
+2016. The architecture is the canonical VPR baseline.
+
+The architecture has three parts:
+
+1. **VGG16 trunk** — ``torchvision.models.vgg16`` feature extractor up to
+   the ``conv5_3`` layer (last conv before the classifier). Output is a
+   ``(B, encoder_dim=512, H', W')`` feature map.
+2. **NetVLAD pooling layer** — implements the differentiable VLAD
+   aggregation: soft cluster assignment (1x1 conv + softmax over K)
+   times residuals against K learned cluster centres, summed per cluster
+   to produce a ``(B, K, D)`` aggregated descriptor, then flattened to
+   ``(B, K*D)``.
+3. **PCA projection (optional)** — a learned ``nn.Linear(K*D, descriptor_dim)``
+   that whitens / reduces the raw VLAD descriptor to the deployment-pinned
+   output dim. Per the published Pittsburgh NetVLAD code drop, the default
+   pinned dim is ``4096`` (whitened from ``K*D = 64*512 = 32768``). When
+   ``descriptor_dim == K*D`` the PCA layer is omitted and the raw VLAD is
+   returned unchanged.
+
+The module is registered into the C7 ``architecture_registry`` (AZ-300)
+under the ``"net_vlad"`` key by the composition root (AZ-507 keeps the
+C7 import out of ``c2_vpr``). Registration time is composition time: the
+root registers the closure returned by ``net_vlad.architecture_factory``,
+which carries the config-driven ``descriptor_dim``. The C7 registry stays
+torch-free; torch / torchvision are imported lazily inside the factory.
+"""
+
+from __future__ import annotations
+
+from typing import TYPE_CHECKING, Final
+
+if TYPE_CHECKING:
+    from torch import nn
+
+__all__ = [
+    "DEFAULT_DESCRIPTOR_DIM",
+    "DEFAULT_ENCODER_DIM",
+    "DEFAULT_NUM_CLUSTERS",
+    "make_net_vlad_vgg16",
+]
+
+DEFAULT_ENCODER_DIM: Final[int] = 512
+DEFAULT_NUM_CLUSTERS: Final[int] = 64
+DEFAULT_DESCRIPTOR_DIM: Final[int] = 4096
+
+
+def make_net_vlad_vgg16(
+    *,
+    num_clusters: int = DEFAULT_NUM_CLUSTERS,
+    encoder_dim: int = DEFAULT_ENCODER_DIM,
+    descriptor_dim: int = DEFAULT_DESCRIPTOR_DIM,
+) -> nn.Module:
+    """Construct a fresh, randomly-initialised NetVLAD-VGG16 module.
+
+    ``descriptor_dim == num_clusters * encoder_dim`` skips the PCA
+    projection; any other value adds an ``nn.Linear(K*D, descriptor_dim)``
+    final layer (the published NetVLAD reference's "WPCA + L2" tail).
+
+    Torch / torchvision are imported here, not at module load — keeping
+    the c2_vpr package free of torch on Tier-0 builds. Callers seeking
+    deterministic weights MUST seed ``torch.manual_seed`` before
+    invocation.
+    """
+    if num_clusters < 1 or encoder_dim < 1 or descriptor_dim < 1:
+        raise ValueError(
+            f"make_net_vlad_vgg16: dimensions must be positive; "
+            f"got num_clusters={num_clusters}, encoder_dim={encoder_dim}, "
+            f"descriptor_dim={descriptor_dim}"
+        )
+
+    import torch
+    import torch.nn as nn
+    import torch.nn.functional as F
+    import torchvision
+
+    class _NetVladLayer(nn.Module):
+        """Differentiable VLAD aggregation (Arandjelović et al. 2016).
+
+        Soft assignment: ``soft_assign = softmax(conv1x1(x))`` over K
+        clusters. Residuals: ``r_ijk = x_ij - c_k``. Output:
+        ``v_k = sum_ij(soft_assign[k,i,j] * r_ijk)``. Flattened to a
+        single 1-D vector per batch.
+        """
+
+        def __init__(self, num_clusters: int, encoder_dim: int) -> None:
+            super().__init__()
+            self.num_clusters = num_clusters
+            self.encoder_dim = encoder_dim
+            self.conv = nn.Conv2d(
+                encoder_dim, num_clusters, kernel_size=(1, 1), bias=True
+            )
+            self.centroids = nn.Parameter(
+                torch.randn(num_clusters, encoder_dim) * 0.01
+            )
+
+        def forward(self, features: torch.Tensor) -> torch.Tensor:
+            n_batch, n_channels = features.shape[0], features.shape[1]
+            soft_assign = self.conv(features).view(n_batch, self.num_clusters, -1)
+            soft_assign = F.softmax(soft_assign, dim=1)
+            flat = features.view(n_batch, n_channels, -1)
+            soft_assign_t = soft_assign.transpose(1, 2)
+            assigned = torch.bmm(flat, soft_assign_t)
+            cluster_sum = soft_assign.sum(dim=2)
+            scaled_centroids = self.centroids.unsqueeze(0) * cluster_sum.unsqueeze(2)
+            vlad = assigned.transpose(1, 2) - scaled_centroids
+            return vlad.reshape(n_batch, -1)
+
+    class _NetVladVgg16(nn.Module):
+        def __init__(
+            self,
+            num_clusters: int,
+            encoder_dim: int,
+            descriptor_dim: int,
+        ) -> None:
+            super().__init__()
+            self._raw_dim = num_clusters * encoder_dim
+            vgg = torchvision.models.vgg16(weights=None)
+            self.encoder = nn.Sequential(*list(vgg.features.children())[:-2])
+            self.pool = _NetVladLayer(num_clusters, encoder_dim)
+            if descriptor_dim == self._raw_dim:
+                self.pca: nn.Module | None = None
+            else:
+                self.pca = nn.Linear(self._raw_dim, descriptor_dim, bias=True)
+
+        def forward(
+            self, input: torch.Tensor
+        ) -> dict[str, torch.Tensor]:
+            features = self.encoder(input)
+            vlad_raw = self.pool(features)
+            if self.pca is not None:
+                vlad_descriptor = self.pca(vlad_raw)
+            else:
+                vlad_descriptor = vlad_raw
+            return {"vlad_descriptor": vlad_descriptor}
+
+    return _NetVladVgg16(
+        num_clusters=num_clusters,
+        encoder_dim=encoder_dim,
+        descriptor_dim=descriptor_dim,
+    )
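
Review note on `_NetVladLayer.forward`: the layer never materialises the per-location residuals. It relies on the identity sum_n a_k(n) * (x_n - c_k) = sum_n a_k(n) * x_n - c_k * sum_n a_k(n). The following plain-Python sketch (toy sizes, made-up numbers, no torch) checks that the explicit and factored forms agree; the `logits` list is a stand-in for the 1x1 conv output.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def vlad_explicit(features, centroids, logits):
    """v_k = sum_n a_k(n) * (x_n - c_k), looping over every location n."""
    K, D = len(centroids), len(centroids[0])
    vlad = [[0.0] * D for _ in range(K)]
    for x, z in zip(features, logits):
        a = softmax(z)  # soft assignment over the K clusters
        for k in range(K):
            for d in range(D):
                vlad[k][d] += a[k] * (x[d] - centroids[k][d])
    return vlad


def vlad_factored(features, centroids, logits):
    """Same quantity as (sum_n a_k(n) x_n) - c_k * (sum_n a_k(n))."""
    K, D = len(centroids), len(centroids[0])
    assigned = [[0.0] * D for _ in range(K)]
    mass = [0.0] * K
    for x, z in zip(features, logits):
        a = softmax(z)
        for k in range(K):
            mass[k] += a[k]
            for d in range(D):
                assigned[k][d] += a[k] * x[d]
    return [
        [assigned[k][d] - centroids[k][d] * mass[k] for d in range(D)]
        for k in range(K)
    ]


features = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]  # N=3 locations, D=2
centroids = [[0.0, 1.0], [2.0, -0.5]]             # K=2 centres
logits = [[0.2, -0.3], [1.0, 0.1], [-0.5, 0.7]]   # stand-in for conv1x1 scores
explicit = vlad_explicit(features, centroids, logits)
factored = vlad_factored(features, centroids, logits)
flat = [v for row in factored for v in row]       # the (K*D,) descriptor
```

The factored form is what the `torch.bmm` plus `cluster_sum` lines compute, which avoids allocating a `(B, K, D, N)` residual tensor.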
@@ -0,0 +1,137 @@
+"""NetVLAD-VGG16 backbone preprocessor (AZ-338).
+
+Per AZ-338 § Outcome: NetVLAD's published preprocessing chain validates
+the nav-camera frame's image as RGB uint8, centre-crops it to a square
+(camera calibration is not consumed), resizes to ``(480, 480)``, applies
+ImageNet mean/std normalisation, casts to FP16, and reshapes to NCHW.
+
+This preprocessor is C2-internal and owned exclusively by
+:class:`NetVladStrategy` — UltraVPR and the other backbones each ship
+their own concrete preprocessor (description.md § 6 forbids sharing).
+
+This class structurally conforms to the :class:`BackbonePreprocessor`
+Protocol, which lives in :mod:`c2_vpr._preprocessor`; the strategy
+module imports the concrete preprocessor and constructs it in the
+``create(...)`` factory.
+"""
+
+from __future__ import annotations
+
+from typing import TYPE_CHECKING, Final
+
+import cv2
+import numpy as np
+
+from gps_denied_onboard.components.c2_vpr.errors import VprPreprocessError
+
+if TYPE_CHECKING:
+    from gps_denied_onboard._types.calibration import CameraCalibration
+    from gps_denied_onboard._types.nav import NavCameraFrame
+
+__all__ = [
+    "IMAGENET_MEAN",
+    "IMAGENET_STD",
+    "NETVLAD_INPUT_HW",
+    "NetVladBackbonePreprocessor",
+]
+
+NETVLAD_INPUT_HW: Final[tuple[int, int]] = (480, 480)
+IMAGENET_MEAN: Final[tuple[float, float, float]] = (0.485, 0.456, 0.406)
+IMAGENET_STD: Final[tuple[float, float, float]] = (0.229, 0.224, 0.225)
+
+
+class NetVladBackbonePreprocessor:
+    """Resize + ImageNet-normalise + FP16-NCHW for NetVLAD-VGG16."""
+
+    def __init__(
+        self,
+        *,
+        input_shape: tuple[int, int] = NETVLAD_INPUT_HW,
+        mean: tuple[float, float, float] = IMAGENET_MEAN,
+        std: tuple[float, float, float] = IMAGENET_STD,
+    ) -> None:
+        if (
+            not isinstance(input_shape, tuple)
+            or len(input_shape) != 2
+            or any(not isinstance(v, int) or v <= 0 for v in input_shape)
+        ):
+            raise ValueError(
+                f"NetVladBackbonePreprocessor.input_shape must be a (H, W) "
+                f"tuple of positive ints; got {input_shape!r}"
+            )
+        if len(mean) != 3 or len(std) != 3:
+            raise ValueError(
+                "NetVladBackbonePreprocessor.mean and std must each be "
+                "3-tuples (one per channel)"
+            )
+        if any(v <= 0 for v in std):
+            raise ValueError(
+                "NetVladBackbonePreprocessor.std components must be > 0"
+            )
+        self._input_shape: tuple[int, int] = input_shape
+        self._mean: np.ndarray = np.array(mean, dtype=np.float32).reshape(1, 1, 3)
+        self._std: np.ndarray = np.array(std, dtype=np.float32).reshape(1, 1, 3)
+
+    def preprocess(
+        self,
+        frame: NavCameraFrame,
+        calibration: CameraCalibration,
+    ) -> np.ndarray:
+        """Decode → centre-crop → resize → normalise → FP16 NCHW.
+
+        ``calibration`` is accepted for Protocol conformance but is not
+        consumed here — NetVLAD's published preprocessing chain does not
+        use principal-point or distortion correction (the backbone is
+        trained on ImageNet-style centre-cropped frames; calibration
+        differences are absorbed into the learned VLAD residuals).
+        UltraVPR's preprocessor uses calibration; this one does not.
+
+        Raises:
+            :class:`VprPreprocessError` on shape / dtype / decode
+            violations.
+        """
+        del calibration
+        image = self._coerce_to_rgb_uint8(frame.image)
+        cropped = self._centre_crop_square(image)
+        try:
+            resized = cv2.resize(
+                cropped, self._input_shape[::-1], interpolation=cv2.INTER_AREA
+            )
+        except cv2.error as exc:
+            raise VprPreprocessError(
+                f"cv2.resize failed: {type(exc).__name__}: {exc}"
+            ) from exc
+        as_f32 = resized.astype(np.float32) / 255.0
+        normalised = (as_f32 - self._mean) / self._std
+        chw = normalised.transpose(2, 0, 1)
+        return np.ascontiguousarray(chw[None, :, :, :], dtype=np.float16)
+
+    def input_shape(self) -> tuple[int, int]:
+        return self._input_shape
+
+    @staticmethod
+    def _coerce_to_rgb_uint8(image: object) -> np.ndarray:
+        if not isinstance(image, np.ndarray):
+            raise VprPreprocessError(
+                f"frame.image must be a numpy array; got {type(image).__name__}"
+            )
+        if image.dtype != np.uint8:
+            raise VprPreprocessError(
+                f"frame.image must be uint8 RGB; got dtype {image.dtype}"
+            )
+        if image.ndim == 2:
+            # Grayscale → 3-channel by repeating
+            return np.stack([image, image, image], axis=-1)
+        if image.ndim == 3 and image.shape[2] == 3:
+            return image
+        raise VprPreprocessError(
+            f"frame.image must be (H,W) or (H,W,3); got shape {image.shape}"
+        )
+
+    @staticmethod
+    def _centre_crop_square(image: np.ndarray) -> np.ndarray:
+        h, w = image.shape[:2]
+        side = min(h, w)
+        top = (h - side) // 2
+        left = (w - side) // 2
+        return image[top : top + side, left : left + side, :]
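
Review note on the preprocessor geometry: the crop bounds, the `(H, W)` to `(W, H)` flip for `cv2.resize`, and the per-pixel normalisation can be checked without OpenCV or NumPy. The sketch below is a stdlib-only restatement of the arithmetic; `centre_crop_bounds` and `normalise_pixel` are illustrative names, not functions in the diff, and the 720p frame size is an assumed example.

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)
NETVLAD_INPUT_HW = (480, 480)


def centre_crop_bounds(height, width):
    """Return (top, left, side) of the largest centred square."""
    side = min(height, width)
    top = (height - side) // 2
    left = (width - side) // 2
    return top, left, side


def normalise_pixel(rgb):
    """Scale a uint8 RGB triple to [0, 1], then apply ImageNet mean/std."""
    return tuple(
        (c / 255.0 - m) / s for c, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )


top, left, side = centre_crop_bounds(720, 1280)  # hypothetical 720p frame
dsize = NETVLAD_INPUT_HW[::-1]                   # cv2.resize expects (W, H)
white = normalise_pixel((255, 255, 255))
```

With a square target the flip is a no-op, but keeping it means a future non-square `input_shape` would still be passed to `cv2.resize` in the order it expects.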
@@ -21,8 +21,8 @@ from typing import Final
 from gps_denied_onboard.config.schema import ConfigError

 __all__ = [
-    "C2VprConfig",
    "KNOWN_STRATEGIES",
+    "C2VprConfig",
 ]

 KNOWN_STRATEGIES: Final[frozenset[str]] = frozenset(
@@ -69,6 +69,7 @@ class C2VprConfig:
    faiss_index_path: Path = field(default_factory=lambda: Path("/cache/vpr/index.faiss"))
    warn_top1_threshold: float = 0.30
    debug_per_frame_distances: bool = False
+    netvlad_descriptor_dim: int = 4096

    def __post_init__(self) -> None:
        if self.strategy not in KNOWN_STRATEGIES:
@@ -109,3 +110,15 @@ class C2VprConfig:
                f"C2VprConfig.debug_per_frame_distances must be a bool; "
                f"got {self.debug_per_frame_distances!r}"
            )
+        if not isinstance(self.netvlad_descriptor_dim, int) or isinstance(
+            self.netvlad_descriptor_dim, bool
+        ):
+            raise ConfigError(
+                f"C2VprConfig.netvlad_descriptor_dim must be a non-bool "
+                f"int; got {self.netvlad_descriptor_dim!r}"
+            )
+        if self.netvlad_descriptor_dim < 1:
+            raise ConfigError(
+                f"C2VprConfig.netvlad_descriptor_dim must be >= 1; "
+                f"got {self.netvlad_descriptor_dim}"
+            )
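
Review note on the explicit bool rejection: in Python `bool` is a subclass of `int`, so a plain `isinstance(value, int)` check would accept `True` as a descriptor dimension of 1. A minimal sketch of the same two-step check follows, with `ValueError` standing in for the project's `ConfigError`; `validate_descriptor_dim` is a hypothetical helper name.

```python
def validate_descriptor_dim(value):
    """Accept only positive, non-bool ints (ValueError stands in for ConfigError)."""
    # bool is a subclass of int, so it has to be excluded explicitly.
    if not isinstance(value, int) or isinstance(value, bool):
        raise ValueError(
            f"netvlad_descriptor_dim must be a non-bool int; got {value!r}"
        )
    if value < 1:
        raise ValueError(f"netvlad_descriptor_dim must be >= 1; got {value}")
    return value


def rejects(value):
    """True if validation refuses the value."""
    try:
        validate_descriptor_dim(value)
    except ValueError:
        return True
    return False


results = {repr(v): rejects(v) for v in (4096, True, 0, 4096.0)}
```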
@@ -0,0 +1,60 @@
+"""C2's structural cut of C7 ``InferenceRuntime`` (AZ-507).
+
+Concrete C2 ``VprStrategy`` impls call into C7's inference runtime to
+load engine handles and run forward passes. Per AZ-507, ``c2_vpr`` MUST
+NOT import ``components.c7_inference`` directly; the consumer-side cut
+declares the structural Protocol surface that c2 actually uses, and the
+composition root binds the c7 runtime as the concrete implementation.
+
+This Protocol mirrors the subset of
+:class:`gps_denied_onboard.components.c7_inference.InferenceRuntime`
+that the C2 strategies consume — ``compile_engine``,
+``deserialize_engine``, ``infer``, ``release_engine``, and
+``current_runtime_label``. The full Protocol (which adds
+``thermal_state``) is wider; the cut narrows to what C2 needs so
+``isinstance(runtime, InferenceRuntimeCut)`` can be enforced without
+demanding the wider surface.
+
+DTOs (``BuildConfig``, ``EngineHandle``, ``EngineCacheEntry``) live in
+:mod:`gps_denied_onboard._types.inference` (L1) and are imported here
+directly — they are L1 shared types, not cross-component imports.
+"""
+
+from __future__ import annotations
+
+from pathlib import Path
+from typing import TYPE_CHECKING, Literal, Protocol, runtime_checkable
+
+from gps_denied_onboard._types.inference import (
+    BuildConfig,
+    EngineCacheEntry,
+    EngineHandle,
+)
+
+if TYPE_CHECKING:
+    import numpy as np
+
+__all__ = ["InferenceRuntimeCut"]
+
+
+@runtime_checkable
+class InferenceRuntimeCut(Protocol):
+    """Subset of C7 ``InferenceRuntime`` consumed by C2 strategies."""
+
+    def compile_engine(
+        self, model_path: Path, build_config: BuildConfig
+    ) -> EngineCacheEntry: ...
+
+    def deserialize_engine(self, entry: EngineCacheEntry) -> EngineHandle: ...
+
+    def infer(
+        self,
+        handle: EngineHandle,
+        inputs: dict[str, np.ndarray],
+    ) -> dict[str, np.ndarray]: ...
+
+    def release_engine(self, handle: EngineHandle) -> None: ...
+
+    def current_runtime_label(
+        self,
+    ) -> Literal["tensorrt", "onnx_trt_ep", "pytorch_fp16"]: ...
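
Review note on `@runtime_checkable`: an `isinstance` check against a Protocol is structural and only verifies that the named methods exist; it does not compare signatures or return types. The sketch below uses a deliberately reduced two-method Protocol (`RuntimeCut`) and two made-up classes to show what the check does and does not catch.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class RuntimeCut(Protocol):
    """Reduced stand-in for InferenceRuntimeCut (two of its five methods)."""

    def infer(self, handle: object, inputs: dict) -> dict: ...

    def current_runtime_label(self) -> str: ...


class FakePyTorchRuntime:
    """Does not inherit from RuntimeCut; it only matches by method names."""

    def infer(self, handle, inputs):
        return {"vlad_descriptor": inputs["input"]}

    def current_runtime_label(self):
        return "pytorch_fp16"


class NotARuntime:
    """Missing current_runtime_label, so the structural check fails."""

    def infer(self, handle, inputs):
        return {}


ok = isinstance(FakePyTorchRuntime(), RuntimeCut)
missing = isinstance(NotARuntime(), RuntimeCut)
```

Because signatures are not checked at runtime, a static type checker is still needed to catch a runtime whose `infer` takes the wrong arguments.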
@@ -0,0 +1,521 @@
+"""``NetVladStrategy`` — C2 mandatory simple-baseline VprStrategy (AZ-338).
+
+NetVLAD is the C2 comparative baseline mandated by the engine rule (every
+production-default backbone ships with a simpler baseline alongside, so
+a code-drop / weights / engine compile bug in the primary has a
+fallback at the strategy layer). Per ``components/02_c2_vpr/description.md``
+§ 1, NetVLAD is paired with UltraVPR's primary path; per § 5, NetVLAD
+runs on the C7 PyTorch FP16 runtime (NOT TensorRT) so a TRT engine
+issue cannot simultaneously break both.
+
+The strategy delegates retrieval to :class:`FaissBridge` (AZ-341) and
+the c6 ``DescriptorIndex`` cut (AZ-507); see
+:mod:`gps_denied_onboard.components.c2_vpr._faiss_bridge`. Embedding
+goes through the c7 :class:`InferenceRuntime` Protocol; the composition
+root registers this module's ``architecture_factory(...)`` closure into
+c7's architecture registry before calling ``create(...)``.
+
+Architecture loading flow:
+
+1. ``create(config, descriptor_index, inference_runtime)`` is called by
+   the composition-root :func:`build_vpr_strategy`.
+2. Build-flag guard: confirms ``inference_runtime.current_runtime_label()
+   == "pytorch_fp16"`` — fails fast with :class:`ConfigError` otherwise
+   (AC-11: airborne binary has the PyTorch runtime excluded so this
+   surfaces at composition time, not at first frame).
+3. The composition root has already bound ``descriptor_dim`` from
+   ``config.c2_vpr.netvlad_descriptor_dim`` into a closure via
+   :func:`architecture_factory` and registered that closure with c7's
+   architecture registry under ``"net_vlad"``. The closure is the
+   zero-arg ``ArchitectureFactory`` callable shape the registry expects.
+4. ``inference_runtime.compile_engine(weights_path, build_config)`` is
+   called with ``BuildConfig(precision=FP16, ...)``; the PyTorch runtime
+   (AZ-300) returns an :class:`EngineCacheEntry` whose
+   ``extras["model_name"]`` is the checkpoint's file stem. The factory
+   forces ``model_name = "net_vlad"`` so the registered factory is
+   selected at :meth:`deserialize_engine` time.
+5. ``inference_runtime.deserialize_engine(entry)`` returns an
+   :class:`EngineHandle`. The factory then queries the architecture's
+   output shape via a single dry-run inference on a zero-init input,
+   compares against the configured ``descriptor_dim``, and raises
+   :class:`ConfigError` on mismatch (AC-7) BEFORE the strategy is
+   bound.
+6. ``NetVladStrategy`` is constructed with the resolved handle + the
+   :class:`FaissBridge` + :class:`NetVladBackbonePreprocessor` +
+   :class:`DescriptorNormaliser`.
+
+Per-frame :meth:`embed_query` pipeline:
+
+1. ``preprocessor.preprocess(frame, calibration)`` → ``(1, 3, 480, 480)``
+   FP16 NCHW ndarray.
+2. ``inference_runtime.infer(handle, {"input": tensor})`` →
+   ``{"vlad_descriptor": (1, descriptor_dim) FP16 ndarray}``.
+3. ``normaliser.intra_cluster_normalise(intermediate, num_clusters=64)``
+   → per-cluster L2 (the NetVLAD-canonical first stage).
+4. ``normaliser.l2_normalise(intra)`` → global L2 (the second stage).
+5. Return :class:`VprQuery` with ``frame_id``, normalised embedding,
+   produced_at monotonic ns.
+
+Error envelope: every method raises only members of :class:`VprError`.
+Any ``Exception`` from the backbone forward is rewrapped to
+:class:`VprBackboneError`; :class:`VprPreprocessError` from the
+preprocessor propagates unchanged.
+
+Retrieval is a single-line delegation to :class:`FaissBridge.retrieve`;
+see AZ-341 AC-10.
+"""
+
+from __future__ import annotations
+
+import logging
+from pathlib import Path
+from typing import TYPE_CHECKING, Final, Literal
+
+import numpy as np
+
+from gps_denied_onboard._types.inference import (
+    BuildConfig,
+    EngineHandle,
+    PrecisionMode,
+)
+from gps_denied_onboard._types.vpr import VprQuery, VprResult
+from gps_denied_onboard.clock import Clock
+from gps_denied_onboard.components.c2_vpr._faiss_bridge import FaissBridge
+from gps_denied_onboard.components.c2_vpr._net_vlad_architecture import (
+    DEFAULT_NUM_CLUSTERS,
+    make_net_vlad_vgg16,
+)
+from gps_denied_onboard.components.c2_vpr._preprocessor_net_vlad import (
+    NetVladBackbonePreprocessor,
+)
+from gps_denied_onboard.components.c2_vpr.descriptor_index_cut import (
+    DescriptorIndexCut,
+)
+from gps_denied_onboard.components.c2_vpr.errors import (
+    VprBackboneError,
+    VprPreprocessError,
+)
+from gps_denied_onboard.components.c2_vpr.inference_runtime_cut import (
+    InferenceRuntimeCut,
+)
+from gps_denied_onboard.config.schema import ConfigError
+from gps_denied_onboard.fdr_client import EnqueueResult, FdrClient
+from gps_denied_onboard.fdr_client.records import (
+    CURRENT_SCHEMA_VERSION,
+    FdrRecord,
+)
+from gps_denied_onboard.helpers.descriptor_normaliser import DescriptorNormaliser
+
+if TYPE_CHECKING:
+    from gps_denied_onboard._types.calibration import CameraCalibration
+    from gps_denied_onboard._types.nav import NavCameraFrame
+    from gps_denied_onboard.config.schema import Config
+
+__all__ = ["MODEL_NAME", "NetVladStrategy", "architecture_factory", "create"]
+
+
+MODEL_NAME: Final[str] = "net_vlad"
+
+
+def architecture_factory(
+    descriptor_dim: int,
+    *,
+    num_clusters: int = DEFAULT_NUM_CLUSTERS,
+):
+    """Zero-arg architecture factory closure for C7 registry binding.
+
+    The composition root calls this with the configured ``descriptor_dim``
+    and registers the returned closure under :data:`MODEL_NAME` in C7's
+    architecture registry. Keeping the registration step in the
+    composition root preserves the AZ-507 layering: ``c2_vpr`` MUST NOT
+    import ``c7_inference``.
+    """
+    if descriptor_dim < 1:
+        raise ValueError(
+            f"architecture_factory: descriptor_dim must be >= 1; "
+            f"got {descriptor_dim}"
+        )
+
+    def _factory():
+        return make_net_vlad_vgg16(
+            num_clusters=num_clusters, descriptor_dim=descriptor_dim
+        )
+
+    return _factory
+
+
+_BACKBONE_LABEL: Final[Literal["net_vlad"]] = "net_vlad"
+_COMPONENT: Final[str] = "c2_vpr"
+
+_LOG_KIND_READY: Final[str] = "c2.vpr.ready"
+_LOG_KIND_BACKBONE_ERROR: Final[str] = "c2.vpr.backbone_error"
+_LOG_KIND_PREPROCESS_ERROR: Final[str] = "c2.vpr.preprocess_error"
+_LOG_KIND_FDR_OVERRUN: Final[str] = "c2.vpr.fdr_overrun"
+
+_FDR_KIND_EMBED: Final[str] = "vpr.embed_query"
+_FDR_KIND_BACKBONE_ERROR: Final[str] = "vpr.backbone_error"
+_FDR_KIND_PREPROCESS_ERROR: Final[str] = "vpr.preprocess_error"
+
+
+class NetVladStrategy:
+    """C2 mandatory simple-baseline VprStrategy.
+
+    See module docstring for the architecture-loading + per-frame
+    pipeline. Stateless across frames (INV-2); single-threaded per
+    instance (INV-1).
+    """
+
+    def __init__(
+        self,
+        *,
+        inference_runtime: InferenceRuntimeCut,
+        engine_handle: EngineHandle,
+        descriptor_index: DescriptorIndexCut,
+        preprocessor: NetVladBackbonePreprocessor,
+        normaliser: DescriptorNormaliser,
+        faiss_bridge: FaissBridge,
+        fdr_client: FdrClient,
+        clock: Clock,
+        logger: logging.Logger,
+        descriptor_dim: int,
+        num_clusters: int = DEFAULT_NUM_CLUSTERS,
+    ) -> None:
+        if descriptor_dim < 1:
+            raise ValueError(
+                f"NetVladStrategy.descriptor_dim must be >= 1; "
+                f"got {descriptor_dim}"
+            )
+        if num_clusters < 1:
+            raise ValueError(
+                f"NetVladStrategy.num_clusters must be >= 1; "
+                f"got {num_clusters}"
+            )
+        if descriptor_dim % num_clusters != 0:
+            raise ValueError(
+                f"NetVladStrategy: descriptor_dim={descriptor_dim} must be "
+                f"divisible by num_clusters={num_clusters} for intra-cluster "
+                f"normalisation"
+            )
+        self._inference_runtime = inference_runtime
+        self._engine_handle = engine_handle
+        self._descriptor_index = descriptor_index
+        self._preprocessor = preprocessor
+        self._normaliser = normaliser
+        self._faiss_bridge = faiss_bridge
+        self._fdr_client = fdr_client
+        self._clock = clock
+        self._logger = logger
+        self._descriptor_dim = descriptor_dim
+        self._num_clusters = num_clusters
+
+    def embed_query(
+        self,
+        frame: NavCameraFrame,
+        calibration: CameraCalibration,
+    ) -> VprQuery:
+        try:
+            tensor = self._preprocessor.preprocess(frame, calibration)
+        except VprPreprocessError as exc:
+            self._emit_preprocess_error(frame, exc)
+            raise
+
+        ns_start = self._clock.monotonic_ns()
+        try:
+            outputs = self._inference_runtime.infer(
+                self._engine_handle, {"input": tensor}
+            )
+        except Exception as exc:
+            wrapped = self._wrap_backbone_error(frame, exc)
+            raise wrapped from exc
+        ns_end = self._clock.monotonic_ns()
+        latency_us = max(1, (ns_end - ns_start) // 1_000)
+
+        if "vlad_descriptor" not in outputs:
+            err = VprBackboneError(
+                f"NetVLAD forward returned no 'vlad_descriptor' key; "
+                f"got {sorted(outputs.keys())!r}"
+            )
+            self._emit_backbone_error(frame, err)
+            raise err
+
+        raw = np.asarray(outputs["vlad_descriptor"])
+        if raw.ndim != 2 or raw.shape[0] != 1 or raw.shape[1] != self._descriptor_dim:
+            err = VprBackboneError(
+                f"NetVLAD forward returned shape {raw.shape}; "
+                f"expected (1, {self._descriptor_dim})"
+            )
+            self._emit_backbone_error(frame, err)
+            raise err
+
+        flat = np.ascontiguousarray(raw[0], dtype=np.float16)
+        intra = self._normaliser.intra_cluster_normalise(
+            flat, num_clusters=self._num_clusters
+        )
+        normalised = self._normaliser.l2_normalise(intra)
+
+        self._emit_embed_record(
+            frame_id=int(frame.frame_id), latency_us=int(latency_us)
+        )
+
+        return VprQuery(
+            frame_id=int(frame.frame_id),
+            embedding=normalised,
+            produced_at=ns_end,
+        )
+
+    def retrieve_topk(self, query: VprQuery, k: int) -> VprResult:
+        return self._faiss_bridge.retrieve(
+            query, k, backbone_label=_BACKBONE_LABEL
+        )
+
+    def descriptor_dim(self) -> int:
+        return self._descriptor_dim
+
+    def _wrap_backbone_error(
+        self, frame: NavCameraFrame, exc: BaseException
+    ) -> VprBackboneError:
+        wrapped = VprBackboneError(
+            f"NetVLAD forward raised {type(exc).__name__}: {exc}"
+        )
+        self._emit_backbone_error(frame, wrapped)
+        return wrapped
+
+    def _emit_embed_record(self, *, frame_id: int, latency_us: int) -> None:
+        record = FdrRecord(
+            schema_version=CURRENT_SCHEMA_VERSION,
+            ts=_iso_ts_from_clock(self._clock),
+            producer_id=self._fdr_client.producer_id,
+            kind=_FDR_KIND_EMBED,
+            payload={
+                "frame_id": frame_id,
+                "backbone_label": _BACKBONE_LABEL,
+                "descriptor_dim": self._descriptor_dim,
+                "latency_us": latency_us,
+            },
+        )
+        result = self._fdr_client.enqueue(record)
+        if result == EnqueueResult.OVERRUN:
+            self._logger.warning(
+                "FDR enqueue dropped vpr.embed_query record (buffer overrun)",
+                extra={
+                    "component": _COMPONENT,
+                    "kind": _LOG_KIND_FDR_OVERRUN,
+                    "kv": {
+                        "frame_id": frame_id,
+                        "backbone_label": _BACKBONE_LABEL,
+                    },
+                },
+            )
+
+    def _emit_backbone_error(
+        self, frame: NavCameraFrame, error: BaseException
+    ) -> None:
+        frame_id = int(frame.frame_id)
+        msg = f"NetVLAD backbone error: {error}"
+        self._logger.error(
+            msg,
+            extra={
+                "component": _COMPONENT,
+                "kind": _LOG_KIND_BACKBONE_ERROR,
+                "kv": {
+                    "frame_id": frame_id,
+                    "backbone_label": _BACKBONE_LABEL,
+                    "error_type": type(error).__name__,
+                },
+            },
+        )
+        self._fdr_client.enqueue(
+            FdrRecord(
+                schema_version=CURRENT_SCHEMA_VERSION,
+                ts=_iso_ts_from_clock(self._clock),
+                producer_id=self._fdr_client.producer_id,
+                kind=_FDR_KIND_BACKBONE_ERROR,
+                payload={
+                    "frame_id": frame_id,
+                    "backbone_label": _BACKBONE_LABEL,
+                    "error_type": type(error).__name__,
+                    "error_message": str(error)[:512],
+                },
+            )
+        )
+
+    def _emit_preprocess_error(
+        self, frame: NavCameraFrame, error: BaseException
+    ) -> None:
+        frame_id = int(frame.frame_id)
+        msg = f"NetVLAD preprocess error: {error}"
+        self._logger.error(
+            msg,
+            extra={
+                "component": _COMPONENT,
+                "kind": _LOG_KIND_PREPROCESS_ERROR,
+                "kv": {
+                    "frame_id": frame_id,
+                    "backbone_label": _BACKBONE_LABEL,
+                    "error_type": type(error).__name__,
+                },
+            },
+        )
+        self._fdr_client.enqueue(
+            FdrRecord(
+                schema_version=CURRENT_SCHEMA_VERSION,
+                ts=_iso_ts_from_clock(self._clock),
+                producer_id=self._fdr_client.producer_id,
+                kind=_FDR_KIND_PREPROCESS_ERROR,
+                payload={
+                    "frame_id": frame_id,
+                    "backbone_label": _BACKBONE_LABEL,
+                    "error_type": type(error).__name__,
+                    "error_message": str(error)[:512],
+                },
+            )
+        )
+
+
+def _iso_ts_from_clock(clock: Clock) -> str:
+    # Same shape every component uses for FDR timestamps; AZ-508 will
+    # consolidate the duplicate helpers across c2/c11/c12/c6.
+    from datetime import datetime, timezone
+
+    ns = int(clock.time_ns())
+    seconds, fraction_ns = divmod(ns, 1_000_000_000)
+    dt = datetime.fromtimestamp(seconds, tz=timezone.utc)
+    return f"{dt.strftime('%Y-%m-%dT%H:%M:%S')}.{fraction_ns:09d}+00:00"
+
+
+def _build_pytorch_build_config(weights_path: Path) -> BuildConfig:
+    del weights_path  # unused: the config is weights-independent
+    return BuildConfig(
+        precision=PrecisionMode.FP16,
+        workspace_mb=0,
+        calibration_dataset=None,
+        optimization_profiles=(),
+    )
+
+
+def create(
+    config: Config,
+    *,
+    descriptor_index: DescriptorIndexCut,
+    inference_runtime: InferenceRuntimeCut,
+    fdr_client: FdrClient | None = None,
+    clock: Clock | None = None,
+    logger: logging.Logger | None = None,
+) -> NetVladStrategy:
+    """Module-level factory consumed by :func:`build_vpr_strategy`.
+
+    Prerequisite: the composition root MUST have already registered
+    :func:`architecture_factory` under :data:`MODEL_NAME` in C7's
+    architecture registry before calling this factory. The registration
+    step lives in the composition root to preserve AZ-507 — ``c2_vpr``
+    does not import ``c7_inference``.
+
+    ``clock`` and ``logger`` are optional keyword-only injection points
+    that keep tests deterministic. ``fdr_client`` is required despite
+    its ``None`` default; the composition root must always supply it.
+    """
+    runtime_label = inference_runtime.current_runtime_label()
+    if runtime_label != "pytorch_fp16":
+        raise ConfigError(
+            f"NetVLAD requires BUILD_PYTORCH_RUNTIME=ON; this binary "
+            f"has BUILD_PYTORCH_RUNTIME=OFF (current_runtime_label="
+            f"{runtime_label!r}). Per AZ-338 AC-11, NetVLAD is "
+            f"unselectable when the C7 PyTorch FP16 runtime is "
+            f"excluded."
+        )
+
+    block = config.components["c2_vpr"]
+    descriptor_dim = block.netvlad_descriptor_dim
+    weights_path = block.backbone_weights_path
+
+    if fdr_client is None:
+        raise ValueError(
+            "c2_vpr.net_vlad.create: fdr_client is required; the "
+            "composition root must inject the running FDR client."
+        )
+    if clock is None:
+        from gps_denied_onboard.clock.wall_clock import WallClock
+
+        clock = WallClock()
+    if logger is None:
+        logger = logging.getLogger("gps_denied_onboard.c2_vpr.net_vlad")
+
+    entry = inference_runtime.compile_engine(
+        weights_path, _build_pytorch_build_config(weights_path)
+    )
+    # Force the registry lookup to "net_vlad" regardless of the on-disk
+    # filename stem; the registered factory holds the descriptor_dim
+    # closure.
+    entry_for_deserialize = type(entry)(
+        engine_path=entry.engine_path,
+        sha256_hex=entry.sha256_hex,
+        sm=entry.sm,
+        jp=entry.jp,
+        trt=entry.trt,
+        precision=entry.precision,
+        extras={**entry.extras, "model_name": MODEL_NAME},
+    )
+    handle = inference_runtime.deserialize_engine(entry_for_deserialize)
+
+    preprocessor = NetVladBackbonePreprocessor()
+    normaliser = DescriptorNormaliser()
+    faiss_bridge = FaissBridge(
+        descriptor_index=descriptor_index,
+        descriptor_dim=descriptor_dim,
+        warn_top1_threshold=block.warn_top1_threshold,
+        debug_log_per_frame_distances=block.debug_per_frame_distances,
+        fdr_client=fdr_client,
+        logger=logger,
+        clock=clock,
+    )
+
+    _assert_engine_output_dim(
+        inference_runtime, handle, descriptor_dim, preprocessor
+    )
+
+    logger.info(
+        "C2 VPR strategy ready",
+        extra={
+            "component": _COMPONENT,
+            "kind": _LOG_KIND_READY,
+            "kv": {
+                "strategy": _BACKBONE_LABEL,
+                "descriptor_dim": descriptor_dim,
+            },
+        },
+    )
+
+    return NetVladStrategy(
+        inference_runtime=inference_runtime,
+        engine_handle=handle,
+        descriptor_index=descriptor_index,
+        preprocessor=preprocessor,
+        normaliser=normaliser,
+        faiss_bridge=faiss_bridge,
+        fdr_client=fdr_client,
+        clock=clock,
+        logger=logger,
+        descriptor_dim=descriptor_dim,
+    )
+
+
+def _assert_engine_output_dim(
+    inference_runtime: InferenceRuntimeCut,
+    handle: EngineHandle,
+    expected_dim: int,
+    preprocessor: NetVladBackbonePreprocessor,
+) -> None:
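+    # Push one zero-filled probe frame through the engine so a descriptor
+    # shape mismatch fails at construction time, not on the first frame.
+    h, w = preprocessor.input_shape()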
+    probe = np.zeros((1, 3, h, w), dtype=np.float16)
+    outputs = inference_runtime.infer(handle, {"input": probe})
+    if "vlad_descriptor" not in outputs:
+        raise ConfigError(
+            f"engine output shape mismatch: 'vlad_descriptor' key absent; "
+            f"got keys {sorted(outputs.keys())!r}"
+        )
+    actual = np.asarray(outputs["vlad_descriptor"])
+    if actual.shape != (1, expected_dim):
+        raise ConfigError(
+            f"engine output shape mismatch: expected (1, {expected_dim}), "
+            f"got {tuple(actual.shape)}"
+        )