gps-denied-onboard/_docs/02_tasks/todo/AZ-338_c2_net_vlad.md
Oleksandr Bezdieniezhnykh 880eabcb3f Decompose Step 6 snapshot: 140 task specs + contract docs
Closes out greenfield Step 6 (Decompose) for all 14 components
(C1-C13 + cross-cutting helpers/replay). Covers tasks AZ-266..AZ-446
plus the _dependencies_table.md and component contract documents.

State file updated to greenfield Step 7 (Implement), not_started.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-11 00:39:48 +03:00

C2 NetVLAD Mandatory Simple-Baseline

Task: AZ-338_c2_net_vlad
Name: C2 NetVLAD Mandatory Simple-Baseline
Description: Implement NetVladStrategy, the C2 mandatory simple-baseline VprStrategy (engine rule: every component MUST ship a comparative baseline alongside its production-default; description.md § 1 designates NetVLAD as the C2 baseline). NetVLAD has a much higher embedding dim than UltraVPR (D=4096 with NetVLAD-VGG16 default; can be reduced to D=512 via PCA-whitening per the upstream NetVLAD code drop) and uses PyTorch FP16 (NOT TensorRT) per the simple-baseline policy: "the baseline runs on the simplest available runtime" so a TRT engine compile bug doesn't simultaneously break baseline AND primary. Includes the concrete NetVladBackbonePreprocessor (different resize target + normalisation than UltraVPR). MUST satisfy AC-2.1b's relaxed engine-rule floor recall@10 ≥ 0.85 on Derkachi normal segment.
Complexity: 3 points
Dependencies: AZ-336_c2_vpr_strategy_protocol, AZ-263_initial_structure, AZ-269_config_loader, AZ-300_c7_pytorch_baseline, AZ-303_c6_storage_interfaces, AZ-283_descriptor_normaliser, AZ-266_log_module, AZ-272_fdr_record_schema
Component: c2_vpr (epic AZ-255 / E-C2)
Tracker: AZ-338
Epic: AZ-255 (E-C2)

Document Dependencies

  • _docs/02_document/contracts/c2_vpr/vpr_strategy_protocol.md — Protocol contract; every invariant MUST be satisfied; INV-3 (L2-normalised) is critical because NetVLAD raw embeddings include intra-cluster residuals that must be globally L2-normalised after the VLAD aggregation.
  • _docs/02_document/components/02_c2_vpr/description.md — § 1 NetVLAD designated as mandatory simple-baseline; § 5 PyTorch matches simple-baseline track; § 9 logging.
  • _docs/02_document/module-layout.md: c2_vpr.net_vlad Internal entry; BUILD_VPR_NETVLAD row; BUILD_PYTORCH_RUNTIME row (NetVLAD requires the PyTorch runtime to be ON, and it is OFF for airborne, so NetVLAD is research/replay-only by build-flag combination).
  • _docs/02_document/components/02_c2_vpr/tests.md — C2-IT-01 engine rule check recall@10 ≥ 0.85 for NetVLAD on Derkachi normal segment.
  • _docs/02_document/contracts/c7_inference/inference_runtime_protocol.md: InferenceRuntime interface; AZ-300 pytorch_fp16_runtime is the consumed concrete runtime.
  • _docs/02_document/contracts/shared_helpers/descriptor_normaliser.md — L2 + intra-normalisation (NetVLAD's published preprocessing chain includes intra-cluster normalisation BEFORE the global L2 normalisation; the DescriptorNormaliser helper must support both).

Problem

Without this task:

  • The C2 component has no comparative baseline; the engine rule (every primary backbone has a baseline alongside it for FT-12 comparative-study and for risk reduction if the primary fails) is violated for C2 specifically — the project-wide policy goes unsatisfied for one of its largest backbone surfaces.
  • AC-2.1b's relaxed-floor check (recall@10 ≥ 0.85 for NetVLAD) has no producer; suite-level FT-P-19 cannot validate the engine rule.
  • The research binary (which links every backbone for IT-12 comparative studies) cannot ship without a NetVLAD strategy; researchers cannot run the comparative study that informs whether the primary's engine choice is justified.
  • A code drop / weights / engine compile bug in UltraVPR has no fallback at the strategy layer; the operator who notices a sudden drop in suite-level satellite re-loc accuracy would have no mechanism to A/B against the baseline.

Outcome

  • src/gps_denied_onboard/components/c2_vpr/net_vlad.py defining:
    • NetVladStrategy class implementing the VprStrategy Protocol.
    • Constructor signature: __init__(self, runtime: InferenceRuntime, tile_store: TileStore, weights_path: Path, preprocessor: NetVladBackbonePreprocessor, normaliser: DescriptorNormaliser, fdr_client: FdrClient, descriptor_dim: int = 4096).
    • embed_query(frame, calibration):
      1. tensor = self._preprocessor.preprocess(frame, calibration) (returns FP16 NCHW (1, 3, H, W); H=W=480 per the upstream NetVLAD-VGG16 default).
      2. intermediate = self._runtime.forward(self._engine_id, {"input": tensor})["vlad_descriptor"] (returns FP16 (1, descriptor_dim) post-VLAD aggregation).
      3. intra_normalised = self._normaliser.intra_cluster_normalise(intermediate[0], num_clusters=64) (per NetVLAD's published preprocessing: intra-cluster L2 first).
      4. embedding = self._normaliser.l2_normalise(intra_normalised) (then global L2).
      5. Return VprQuery(frame_id, embedding, produced_at=monotonic_ns()).
      6. Catch RuntimeError → wrap in VprBackboneError; emit ERROR log + FDR record.
    • retrieve_topk(query, k): identical to UltraVPR — delegates to tile_store.faiss_topk; returns VprResult with backbone_label="net_vlad".
    • descriptor_dim() -> int: returns the constructor-passed value (default 4096); asserted at engine-load time against the engine's output tensor shape; mismatch → ConfigurationError (see create step 6 and AC-7).
    • Module-level create(config, tile_store, inference_runtime) -> VprStrategy:
      1. Resolve weights_path = config.vpr.backbone_weights_path (a PyTorch state_dict file with the .pth extension; NetVLAD does NOT use the AZ-281 self-describing TRT filename schema — its own AZ-280 sidecar carries the PCA matrix + cluster centres).
      2. Resolve descriptor_dim = config.vpr.netvlad_descriptor_dim (default 4096; can be 512 if PCA-whitened weights are loaded).
      3. Construct NetVladBackbonePreprocessor(input_shape=(480, 480), mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)).
      4. Construct DescriptorNormaliser with intra_cluster_normalise capability.
      5. Load model via inference_runtime.load_engine(weights_path) (the PyTorch runtime accepts .pth files; AZ-300).
      6. Assert engine output shape == (1, descriptor_dim); mismatch → ConfigurationError.
      7. Construct and return NetVladStrategy(...).
  • src/gps_denied_onboard/components/c2_vpr/_preprocessor_net_vlad.py:
    • Implements BackbonePreprocessor Protocol.
    • preprocess(frame, calibration):
      1. Decode frame.image_bytes to RGB uint8 (H_in, W_in, 3).
      2. Centre-crop to a square region (same calibration-aware logic as UltraVPR — copied here, NOT shared, because the calibration handling is part of the preprocessor's contract).
      3. Resize to (480, 480) via OpenCV.
      4. Normalise: (pixel/255.0 - mean) / std; cast to FP16.
      5. Transpose HWC → CHW; add batch dim.
      6. Return ndarray of shape (1, 3, 480, 480) dtype float16.
    • input_shape() -> tuple[int, ...]: returns (480, 480).
    • On failure: raise VprPreprocessError.
  • Composition-root wiring path for config.vpr.strategy == "net_vlad".
  • Logging per description.md § 9: INFO kind="c2.vpr.ready" with {strategy: "net_vlad", descriptor_dim: 4096}; ERROR / WARN identical to UltraVPR.
  • FDR records emitted: kind="vpr.embed_query", kind="vpr.backbone_error", kind="vpr.preprocess_error".
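The dual-stage normalisation in embed_query steps 3 and 4 can be sketched in NumPy as follows. This is a minimal illustration, not the AZ-283 helper itself: it assumes the flat VLAD vector is laid out cluster-major (all values of cluster 0, then cluster 1, and so on) and that descriptor_dim is divisible by num_clusters. The eps guard and the finalise_descriptor name are hypothetical.

```python
import numpy as np


def intra_cluster_normalise(vec: np.ndarray, num_clusters: int, eps: float = 1e-12) -> np.ndarray:
    """L2-normalise each per-cluster block of a flat VLAD vector (assumed cluster-major)."""
    if vec.ndim != 1 or vec.size % num_clusters != 0:
        raise ValueError("expected a 1-D vector whose length is divisible by num_clusters")
    # Work in FP32 so the norms are not computed in half precision.
    blocks = vec.astype(np.float32).reshape(num_clusters, -1)
    norms = np.linalg.norm(blocks, axis=1, keepdims=True)
    return (blocks / np.maximum(norms, eps)).reshape(-1)


def l2_normalise(vec: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale the whole vector to unit L2 norm."""
    v = vec.astype(np.float32)
    return v / max(float(np.linalg.norm(v)), eps)


def finalise_descriptor(raw: np.ndarray, num_clusters: int = 64) -> np.ndarray:
    """Intra-cluster stage first, then global L2, then cast back to FP16."""
    return l2_normalise(intra_cluster_normalise(raw, num_clusters)).astype(np.float16)


# Stand-in for the runtime's (descriptor_dim,) FP16 output.
raw = np.random.default_rng(0).standard_normal(4096).astype(np.float16)
embedding = finalise_descriptor(raw)
```

After both stages the full vector has unit norm, and each per-cluster block has norm 1/sqrt(num_clusters) (0.125 for 64 clusters), provided no block is all zeros.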

Scope

Included

  • NetVladStrategy implementing the Protocol; NetVladBackbonePreprocessor implementing BackbonePreprocessor.
  • Module-level create(config, tile_store, inference_runtime) factory entry-point.
  • Intra-cluster L2 normalisation BEFORE global L2 normalisation (NetVLAD's published preprocessing chain).
  • Composition-root wiring for config.vpr.strategy == "net_vlad".
  • Engine output shape assertion at load time.
  • Logging + FDR records identical to UltraVPR (the per-backbone label distinguishes the records).
  • Unit tests covering all 7 invariants, the dual-stage normalisation, the preprocessing contract, the load-time shape assertion.
  • BUILD_VPR_NETVLAD CMake flag wiring per ADR-002 (ON for research; OFF for airborne / operator-tooling because PyTorch runtime is excluded; ON-but-effectively-unused for replay-cli unless explicitly selected).

Excluded

  • The VprStrategy Protocol — owned by AZ-336.
  • DescriptorNormaliser.l2_normalise, which AZ-283 already ships. Exception: AZ-283 does not ship intra_cluster_normalise(vec, num_clusters), and this task needs it. Decision: adding that single function to the AZ-283 helper is in scope here as a small contract addition. If AZ-283 is already merged when this task starts, the addition is a backward-compatible function add; no breaking change.
  • The C7 PyTorch runtime — owned by AZ-300; this task consumes the interface.
  • Other backbones — owned by AZ-337 (UltraVPR), AZ-339 (MegaLoc + MixVPR), AZ-340 (SelaVPR + EigenPlaces + SALAD).
  • FAISS retrieve wiring — owned by AZ-341.
  • C2-IT-01's NetVLAD recall@10 ≥ 0.85 acceptance test — deferred to Step 9 / E-BBT.

Acceptance Criteria

AC-1: Protocol conformance
Given a constructed NetVladStrategy instance
When isinstance(strategy, VprStrategy) is evaluated
Then the result is True

AC-2: embed_query produces L2-normalised FP16 (descriptor_dim,) embedding
Given a valid NavCameraFrame and CameraCalibration
When strategy.embed_query(frame, calibration) is called
Then embedding.shape == (4096,) (or the configured descriptor_dim), embedding.dtype == np.float16, ||embedding||_2 == 1.0 ± 1e-3

AC-3: Dual-stage normalisation (intra-cluster THEN global L2)
Given a fake intermediate VLAD descriptor with non-zero per-cluster sub-vectors
When the embedding pipeline runs
Then intra_cluster_normalise is called BEFORE l2_normalise (verifiable via spy on the normaliser); the order is NEVER reversed; the per-cluster sub-vectors are unit-norm after the intra-cluster stage AND the full vector is unit-norm after the global stage
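One way to verify the call order with a spy is the mock_calls list from unittest.mock, which records method calls on a parent mock in the order they happen. A minimal sketch, assuming the normaliser is constructor-injected and that embed_query calls it roughly as the hypothetical run_normalisation helper does here:

```python
from unittest.mock import Mock, call


def run_normalisation(normaliser, intermediate, num_clusters=64):
    """Hypothetical stand-in for embed_query steps 3 and 4."""
    intra = normaliser.intra_cluster_normalise(intermediate, num_clusters=num_clusters)
    return normaliser.l2_normalise(intra)


# The Mock acts as the spy; return values are placeholders, not real descriptors.
normaliser = Mock()
normaliser.intra_cluster_normalise.return_value = "intra"
normaliser.l2_normalise.return_value = "embedding"

result = run_normalisation(normaliser, "vlad")

# mock_calls preserves order, so this fails if the stages are swapped,
# skipped, or called more than once.
assert normaliser.mock_calls == [
    call.intra_cluster_normalise("vlad", num_clusters=64),
    call.l2_normalise("intra"),
]
```

Comparing the whole list, rather than asserting each method was called, is what catches a reversed order.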

AC-4: embed_query is deterministic
Given the same frame + calibration
When embed_query is called 3 times
Then all three returns have bit-exact embedding arrays (ULP-tolerant FP16)

AC-5: retrieve_topk returns exactly k candidates with backbone_label = "net_vlad"
Given a corpus of 100 tiles + a constructed VprQuery with D=4096
When strategy.retrieve_topk(query, k=10) is called
Then len(candidates) == 10; sorted ascending; backbone_label == "net_vlad"; candidates[0].descriptor_dim == 4096

AC-6: descriptor_dim() is config-driven and stable
Given construction with descriptor_dim=4096
When descriptor_dim() is called 100 times
Then every call returns 4096; constructing a second instance with descriptor_dim=512 (PCA-whitened weights case) returns 512 from that instance's descriptor_dim()

AC-7: Engine output shape mismatch at load → ConfigurationError
Given a model whose output tensor shape is (1, 2048) while config.vpr.netvlad_descriptor_dim = 4096
When the module-level create(config, tile_store, inference_runtime) is called
Then ConfigurationError is raised with message containing "engine output shape mismatch: expected (1, 4096), got (1, 2048)"; the strategy is NOT instantiated
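The load-time check behind AC-7 can be sketched as a small guard that runs before the strategy is constructed. The ConfigurationError class below is a local stand-in (the project's real import path is not shown in this spec), and assert_engine_output_shape is a hypothetical helper name.

```python
class ConfigurationError(Exception):
    """Local stand-in for the project's ConfigurationError."""


def assert_engine_output_shape(actual: tuple[int, ...], descriptor_dim: int) -> None:
    """Raise if the loaded model's output shape is not (1, descriptor_dim)."""
    expected = (1, descriptor_dim)
    if tuple(actual) != expected:
        raise ConfigurationError(
            f"engine output shape mismatch: expected {expected}, got {tuple(actual)}"
        )


# A matching shape passes silently.
assert_engine_output_shape((1, 4096), descriptor_dim=4096)

# The AC-7 scenario: a 2048-d model loaded against a 4096-d config.
try:
    assert_engine_output_shape((1, 2048), descriptor_dim=4096)
except ConfigurationError as exc:
    mismatch_message = str(exc)
```

Because the guard raises before the constructor is reached, no partially built strategy can escape the factory.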

AC-8: VprBackboneError on forward-pass failure
Given an InferenceRuntime test double that raises RuntimeError from forward
When embed_query is called
Then VprBackboneError is raised; ERROR log + FDR record emitted
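The wrap-and-report behaviour in AC-8 can be sketched with a failing test double. Everything beyond the names in this spec is an assumption: forward_or_wrap, the log message text, the "detail" field, and passing the logger and FDR client as plain callables are all hypothetical simplifications.

```python
class VprBackboneError(Exception):
    """Local stand-in for the project's VprBackboneError."""


def forward_or_wrap(runtime, engine_id, tensor, log_error, emit_fdr):
    """Run the forward pass; translate RuntimeError into VprBackboneError."""
    try:
        return runtime.forward(engine_id, {"input": tensor})["vlad_descriptor"]
    except RuntimeError as exc:
        # Report first, then re-raise with the original error chained as the cause.
        log_error(f"c2.vpr backbone failure: {exc}")
        emit_fdr({"kind": "vpr.backbone_error", "detail": str(exc)})
        raise VprBackboneError(str(exc)) from exc


class FailingRuntime:
    """Test double whose forward always fails."""

    def forward(self, engine_id, inputs):
        raise RuntimeError("simulated forward failure")


logs, records = [], []
try:
    forward_or_wrap(FailingRuntime(), "engine-0", None, logs.append, records.append)
except VprBackboneError as err:
    caught = err
```

Chaining with `from exc` keeps the runtime's original traceback available to whoever inspects the FDR record or log later.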

AC-9: VprPreprocessError on corrupt image bytes
Given a frame with malformed image_bytes
When embed_query is called
Then VprPreprocessError is raised; ERROR log + FDR record emitted

AC-10: Composition-root wiring
Given config.vpr.strategy = "net_vlad" AND valid weights AND matching descriptor_dim
When compose_root(config) runs
Then a NetVladStrategy is wired; AZ-336 factory's pre-flight descriptor_dim validation passes; INFO log kind="c2.vpr.ready" with {strategy: "net_vlad", descriptor_dim: 4096} emitted

AC-11: Build-flag combination (NetVLAD requires PyTorch runtime)
Given config.vpr.strategy = "net_vlad" AND BUILD_PYTORCH_RUNTIME=OFF (airborne binary)
When the binary tries to load
Then ConfigurationError is raised at composition-root time with message containing "NetVLAD requires BUILD_PYTORCH_RUNTIME=ON; this binary has BUILD_PYTORCH_RUNTIME=OFF"; the binary refuses to start (fail-fast)
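A fail-fast guard for AC-11 might look like the sketch below. How the build flag is exposed to Python is not specified here, so it is passed in as a plain boolean; check_strategy_supported and the local ConfigurationError class are hypothetical.

```python
class ConfigurationError(Exception):
    """Local stand-in for the project's ConfigurationError."""


def check_strategy_supported(strategy: str, pytorch_runtime_enabled: bool) -> None:
    """Refuse to compose a NetVLAD strategy in a binary built without PyTorch."""
    if strategy == "net_vlad" and not pytorch_runtime_enabled:
        raise ConfigurationError(
            "NetVLAD requires BUILD_PYTORCH_RUNTIME=ON; "
            "this binary has BUILD_PYTORCH_RUNTIME=OFF"
        )


# Research binary: PyTorch runtime linked, NetVLAD is selectable.
check_strategy_supported("net_vlad", pytorch_runtime_enabled=True)

# Airborne binary: the same config must stop the process at composition time.
try:
    check_strategy_supported("net_vlad", pytorch_runtime_enabled=False)
except ConfigurationError as exc:
    airborne_message = str(exc)
```

Running the guard at composition-root time, before any engine is loaded, is what turns a config typo into a startup error rather than an in-flight fallback.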

Non-Functional Requirements

Performance

  • embed_query p95 ≤ 80 ms on Tier-1 Jetson Orin with PyTorch FP16 — looser than UltraVPR's 60 ms because the simple-baseline runs on the simpler runtime; not on the production critical path.
  • retrieve_topk p95 ≤ 4 ms — slightly looser than UltraVPR because the higher embedding dim (4096 vs 512) makes FAISS lookup ~ 8× more compute; still sub-frame at 3 Hz.
  • GPU memory: ≤ 800 MB resident for backbone weights — looser than UltraVPR's 600 MB because NetVLAD's VGG16 backbone is larger.
  • These NFRs are not enforced as engine-rule blockers; they're operator guidance for the research binary's resource budget.

Compatibility

  • The PyTorch state_dict format is owned by C7's PyTorch runtime (AZ-300); this task consumes the produced model via config.vpr.backbone_weights_path.
  • The upstream NetVLAD code drop is pinned per Plan-phase; PCA-whitening parameters change with weights → AZ-280 sidecar carries them.

Reliability

  • Strategy is single-threaded by contract (INV-1).
  • Dual-stage normalisation order (intra-cluster THEN global L2) is mandatory; reversing the order produces a different embedding subspace and silently breaks AC-2.1b (recall regression).
  • VprBackboneError does not crash the process; downstream falls back to VIO-only.

Unit Tests

| AC Ref | What to Test | Required Outcome |
|---|---|---|
| AC-1 | isinstance(NetVladStrategy(...), VprStrategy) | True |
| AC-2 | embed_query output | shape (4096,), dtype float16, L2-norm == 1.0 ± 1e-3 |
| AC-3 | Spy on normaliser methods | intra_cluster_normalise called BEFORE l2_normalise exactly once each per embed_query |
| AC-4 | embed_query × 3 same frame | bit-exact embeddings |
| AC-5 | retrieve_topk against fixture corpus | len == 10, sorted, backbone_label == "net_vlad", descriptor_dim == 4096 |
| AC-6 | descriptor_dim() × 100 (D=4096 instance) + a second D=512 instance | first instance always 4096; second always 512 |
| AC-7 | Model with wrong output shape | ConfigurationError at create time |
| AC-8 | forward raises | VprBackboneError; ERROR log + FDR |
| AC-9 | malformed image_bytes | VprPreprocessError; ERROR log + FDR |
| AC-10 | compose_root(config="net_vlad") | wired; INFO log with {strategy: "net_vlad", descriptor_dim: 4096} |
| AC-11 | airborne binary + config.vpr.strategy = "net_vlad" | ConfigurationError with PyTorch-OFF message; fail-fast |
| Preprocess-shape | preprocessor.preprocess(frame) output | shape (1, 3, 480, 480), dtype float16 |
| Preprocess-input-shape | preprocessor.input_shape() | returns (480, 480) |
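The Preprocess-shape row can be exercised against the tail of the preprocessing chain (normalise, cast, transpose, add batch dim). The sketch below covers only those steps and assumes the image has already been decoded, centre-cropped, and resized to 480×480 RGB uint8; to_model_input is a hypothetical name, and the decode, crop, and OpenCV resize steps are deliberately left out.

```python
import numpy as np

# ImageNet statistics, as passed to NetVladBackbonePreprocessor in the factory.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def to_model_input(rgb_u8: np.ndarray) -> np.ndarray:
    """Normalise a 480x480 RGB uint8 image and return FP16 NCHW of shape (1, 3, 480, 480)."""
    if rgb_u8.shape != (480, 480, 3) or rgb_u8.dtype != np.uint8:
        raise ValueError("expected a (480, 480, 3) uint8 RGB image")
    scaled = rgb_u8.astype(np.float32) / 255.0
    normalised = (scaled - MEAN) / STD          # broadcasts over the channel axis
    chw = np.transpose(normalised, (2, 0, 1))   # HWC -> CHW
    return chw[np.newaxis].astype(np.float16)   # add batch dim, cast last


tensor = to_model_input(np.zeros((480, 480, 3), dtype=np.uint8))
```

Casting to FP16 only at the end keeps the mean/std arithmetic in FP32, which avoids compounding rounding error before the tensor reaches the runtime.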

Constraints

  • Dual-stage normalisation order is non-negotiable — intra-cluster THEN global L2. Reversing is forbidden.
  • NetVLAD uses the PyTorch runtime, NOT TensorRT — the simple-baseline policy isolates it from TRT engine compile risk. The research binary links both runtimes; airborne binary excludes the PyTorch runtime via BUILD_PYTORCH_RUNTIME=OFF, which makes NetVLAD effectively unselectable for airborne (AC-11).
  • Preprocessing parameters are weights-coupled: (480, 480) resize, ImageNet mean/std. Hard-coded; not config-knobs.
  • descriptor_dim IS config-driven (unlike UltraVPR which hard-codes 512) because NetVLAD ships in two flavours: full 4096-d and PCA-whitened 512-d. The choice is part of the operator's deployment, not a runtime decision.
  • Constructor injection only; no import gps_denied_onboard.config inside the strategy module.
  • The strategy holds the engine ID, NOT the engine itself — engine lifecycle is owned by C7.

Risks & Mitigation

Risk 1: NetVLAD embedding dim of 4096 is 8× larger than UltraVPR's 512; FAISS HNSW lookup is slower

  • Risk: retrieve_topk may exceed C2-PT-01's 2 ms budget for the lookup stage; the budget was set against UltraVPR's D=512.
  • Mitigation: retrieve_topk p95 ≤ 4 ms is the looser baseline budget (acknowledged in NFRs); for the research binary this is acceptable since NetVLAD is comparison-only. If an operator wants the production-fast path with NetVLAD, they configure PCA-whitening (D=512) at corpus build time (C10).
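The 8× figure follows directly from the dimension ratio, since one distance evaluation between dense vectors costs one multiply-add per dimension. A small sketch of that arithmetic; descriptor_cost is a hypothetical helper, and the 2 bytes per value assumes FP16 storage, which the index itself may not use:

```python
def descriptor_cost(dim: int, bytes_per_value: int = 2) -> dict:
    """Per-descriptor storage and per-comparison multiply-add count for a dense vector."""
    return {"bytes": dim * bytes_per_value, "multiply_adds": dim}


full = descriptor_cost(4096)     # NetVLAD default
reduced = descriptor_cost(512)   # PCA-whitened NetVLAD, same dim as UltraVPR

ratio = full["multiply_adds"] // reduced["multiply_adds"]
```

This counts only the per-comparison cost; how many comparisons an HNSW search performs depends on the index parameters, which are outside this task.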

Risk 2: NetVLAD recall@10 ≥ 0.85 floor not achievable with FP16

  • Risk: FP16 quantisation degrades the VLAD aggregation precision below the relaxed engine-rule floor.
  • Mitigation: C2-IT-01's NetVLAD assertion is the validation gate (deferred to Step 9). If FP16 fails, the operator can configure FP32 weights — the strategy does not hard-code dtype; it follows the runtime's loaded model.

Risk 3: PyTorch FP16 runtime on Tier-1 Jetson is slower than expected

  • Risk: PyTorch FP16 inference on Jetson has known pipeline-stall issues compared to TRT.
  • Mitigation: NetVLAD is research-only by build-flag combination (AC-11 enforces); the production critical path is UltraVPR. If a future cycle wants NetVLAD on the airborne binary, that's a separate task: convert NetVLAD to ONNX → TRT engine, then update this strategy to use the TRT runtime.

Risk 4: Operator picks NetVLAD on airborne binary by mistake

  • Risk: A typo in the airborne config that selects net_vlad would silently fall back to VIO-only every flight if the runtime were missing.
  • Mitigation: AC-11 makes this fail-fast at composition-root time with a clear error message. Operators learn at startup, not after takeoff.

Risk 5: AZ-283 DescriptorNormaliser may not yet ship intra_cluster_normalise

  • Risk: The helper as defined in AZ-283 ships only l2_normalise; this task needs intra_cluster_normalise too.
  • Mitigation: As noted in Scope/Excluded, extending AZ-283 to add intra_cluster_normalise is a backward-compatible function addition. If AZ-283 has already merged before this task starts, the addition is committed alongside this task with a one-line note in _docs/02_document/contracts/shared_helpers/descriptor_normaliser.md. If AZ-283 has not yet merged, coordinate the addition during AZ-283's implementation. Either way, there is no breaking change to existing consumers.

Runtime Completeness

  • Named capability: mandatory simple-baseline VprStrategy for engine-rule comparative validation against the production-default UltraVPR (architecture / E-C2 / solution.md "NetVLAD mandatory simple-baseline" / engine rule + AC-2.1b relaxed floor).
  • Production code that must exist: real NetVladStrategy calling real C7 PyTorch InferenceRuntime.forward with a real loaded NetVLAD .pth model; real NetVladBackbonePreprocessor performing real OpenCV resize + ImageNet normalisation + FP16 cast; real dual-stage normalisation (intra-cluster THEN global L2); real composition-root wiring path.
  • Allowed external stubs: tests MAY use FakeInferenceRuntime returning pre-computed VLAD descriptors; FakeTileStore; FakeFdrClient; FakeDescriptorNormaliser instrumented to verify call order (AC-3); production wiring uses the real C7 PyTorch runtime + real NetVLAD weights + real C6.
  • Unacceptable substitutes: a NumPy-only NetVLAD forward pass (would not satisfy NFR-perf budget; would defeat the runtime-isolation strategy of using a different runtime than UltraVPR); skipping intra-cluster normalisation (would silently break AC-2.1b's recall floor); using TensorRT for NetVLAD (would defeat the simple-baseline policy of isolating runtime risk); making preprocessing parameters config-knobs (would let operators silently break the recall floor); selecting NetVLAD in an airborne binary (must fail-fast per AC-11); a single-stage L2-only normalisation (would deviate from NetVLAD's published preprocessing chain; recall regression risk).