gps-denied-onboard/_docs/02_tasks/done/AZ-339_c2_megaloc_mixvpr.md
Oleksandr Bezdieniezhnykh 0d65ff4705 [AZ-339] C2 MegaLoc + MixVPR secondary VPR backbones
Adds two research-only VprStrategy implementations for the IT-12
comparative-study matrix. MegaLocStrategy (D=2048, 322x322) and
MixVprStrategy (D=4096, 320x320), both via C7 TensorRT FP16 with
their own concrete BackbonePreprocessor. Single-stage global L2
normalisation; retrieval delegated to FaissBridge; FDR records +
structured logs identical to UltraVPR. BUILD_VPR_MEGALOC and
BUILD_VPR_MIXVPR ON for research/replay-cli only, OFF for airborne
and operator-tooling (fail-fast at composition root via existing
AZ-336 factory). Uses helpers.iso_ts_from_clock from day 1 — no
new timestamp helper duplicates introduced.

36 parametrised AC tests + 25 protocol-conformance + 18 helper
regression tests pass; 1690 / 1690 unit tests pass (excluding 1
pre-existing flaky cold-start subprocess test in c12). Verdict:
PASS_WITH_WARNINGS — one Medium follow-on (AZ-527 to consolidate
4-way _assert_engine_output_dim) + one Low AC wording drift.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-13 23:52:54 +03:00


C2 MegaLoc + MixVPR Secondary Backbones

Task: AZ-339_c2_megaloc_mixvpr
Name: C2 MegaLoc + MixVPR Secondary Backbones (Research-only)
Description: Implement MegaLocStrategy and MixVprStrategy, two secondary VprStrategy backbones used for IT-12 comparative-study purposes (research binary only). Both run on the C7 TensorRT runtime (same path as UltraVPR; FP16 engines compiled by C10) but are gated OFF for airborne and operator-tooling per ADR-002 — they're available only in the research binary and (selectable) replay-cli. Each strategy ships its own concrete BackbonePreprocessor (different resize target and normalisation per upstream code drop). Embeddings: MegaLoc D=2048, MixVPR D=4096. Both produce L2-normalised embeddings; both delegate retrieve_topk to the C6 TileStore Public API. Neither is on the production critical path; performance NFRs are looser than UltraVPR.
Complexity: 5 points
Dependencies: AZ-336_c2_vpr_strategy_protocol, AZ-263_initial_structure, AZ-269_config_loader, AZ-298_c7_tensorrt_runtime, AZ-303_c6_storage_interfaces, AZ-283_descriptor_normaliser, AZ-281_engine_filename_schema, AZ-321_c10_engine_compiler, AZ-266_log_module, AZ-272_fdr_record_schema
Component: c2_vpr (epic AZ-255 / E-C2)
Tracker: AZ-339
Epic: AZ-255 (E-C2)

Document Dependencies

  • _docs/02_document/contracts/c2_vpr/vpr_strategy_protocol.md — Protocol contract; both strategies satisfy every invariant.
  • _docs/02_document/components/02_c2_vpr/description.md — § 1 secondary backbones for IT-12 comparative study; § 5 backbone library list.
  • _docs/02_document/module-layout.md — c2_vpr.mega_loc and c2_vpr.mix_vpr Internal entries; BUILD_VPR_MEGALOC and BUILD_VPR_MIXVPR rows (both OFF for airborne/operator-tooling, ON for research; replay-cli inherits research selection at config time).
  • _docs/02_document/contracts/c7_inference/inference_runtime_protocol.md — InferenceRuntime interface (TRT runtime).
  • _docs/02_document/contracts/shared_helpers/descriptor_normaliser.md — L2 normalisation.

Problem

Without this task:

  • The IT-12 comparative study cannot enumerate MegaLoc and MixVPR alongside UltraVPR / NetVLAD; researchers cannot quantify whether UltraVPR's PRIMARY designation is justified against the broader VPR-backbone landscape.
  • The research binary's link surface is incomplete; the comparative-study CI matrix entry that asserts the research binary contains every secondary backbone fails.
  • A future cycle that wants to swap MegaLoc to PRIMARY (e.g., if UltraVPR's upstream code drop becomes unmaintained) would have no migration path — the strategy class would not yet exist.

Outcome

  • src/gps_denied_onboard/components/c2_vpr/mega_loc.py defining MegaLocStrategy (Protocol-conforming) + create(config, tile_store, inference_runtime) factory entry-point.
    • Constructor signature: __init__(self, runtime, tile_store, weights_path, preprocessor, normaliser, fdr_client).
    • embed_query: preprocess → TRT forward → L2 normalise → return VprQuery.
    • retrieve_topk: delegate to tile_store.faiss_topk; return VprResult with backbone_label="mega_loc", descriptor_dim=2048.
    • descriptor_dim() -> int: returns 2048; engine output shape asserted at load.
  • src/gps_denied_onboard/components/c2_vpr/_preprocessor_mega_loc.py defining MegaLocBackbonePreprocessor:
    • input_shape() -> (322, 322) per upstream MegaLoc default.
    • Normalisation: ImageNet mean/std (same as UltraVPR — common upstream convention; not a coupling, both happen to use ImageNet).
    • Centre-crop with calibration-aware logic (same pattern as UltraVPR / NetVLAD; copied not shared per description.md § 6).
    • Output dtype FP16, NCHW.
  • src/gps_denied_onboard/components/c2_vpr/mix_vpr.py defining MixVprStrategy (mirrors MegaLocStrategy structure):
    • backbone_label="mix_vpr", descriptor_dim=4096.
  • src/gps_denied_onboard/components/c2_vpr/_preprocessor_mix_vpr.py defining MixVprBackbonePreprocessor:
    • input_shape() -> (320, 320) per upstream MixVPR default.
    • Normalisation: ImageNet mean/std.
    • Output dtype FP16, NCHW.
  • Composition-root wiring paths for config.vpr.strategy in {"mega_loc", "mix_vpr"}.
  • BUILD_VPR_MEGALOC and BUILD_VPR_MIXVPR CMake flags wired per ADR-002.
  • Logging per description.md § 9 (INFO ready, WARN top-1-above-threshold, ERROR / FDR per error path).
  • Engine output shape assertion at load for both strategies.
  • Unit tests covering Protocol conformance, L2-normalisation, deterministic embeddings, top-K invariants, error paths — for BOTH strategies.
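
The embed_query flow listed above (preprocess → TRT forward → L2 normalise) can be sketched as follows. This is a minimal illustration under stated assumptions, not the shipped implementation: FakeRuntime, SketchStrategy, l2_normalise, and the forward(engine_id, tensor) signature are hypothetical names chosen for the example. Only the step order, the FP16 output, and the descriptor dimension come from this task.

```python
import numpy as np


class FakeRuntime:
    """Hypothetical stand-in for the C7 InferenceRuntime; the signature is assumed."""

    def __init__(self, dim: int) -> None:
        self._dim = dim

    def forward(self, engine_id: int, tensor: np.ndarray) -> np.ndarray:
        # Deterministic, un-normalised pseudo-embedding that depends on the input.
        base = np.arange(1, self._dim + 1, dtype=np.float32)
        return base * (1.0 + float(tensor.astype(np.float32).mean()))


def l2_normalise(vec: np.ndarray) -> np.ndarray:
    """Normalise in FP32, then cast to FP16 (illustrative; AZ-283 owns the real helper)."""
    v = vec.astype(np.float32)
    norm = float(np.linalg.norm(v))
    if norm == 0.0:
        raise ValueError("cannot L2-normalise a zero vector")
    return (v / norm).astype(np.float16)


class SketchStrategy:
    """Illustrative strategy: holds an engine ID, never the engine itself."""

    def __init__(self, runtime, engine_id: int, dim: int) -> None:
        self._runtime = runtime
        self._engine_id = engine_id
        self._dim = dim

    def descriptor_dim(self) -> int:
        return self._dim

    def embed_query(self, image_chw: np.ndarray) -> np.ndarray:
        # 1. Preprocess: here only the NCHW batch axis and the FP16 cast.
        tensor = image_chw[np.newaxis].astype(np.float16)
        # 2. Forward pass through the runtime.
        raw = self._runtime.forward(self._engine_id, tensor)
        if raw.shape != (self._dim,):
            raise ValueError(f"expected ({self._dim},), got {raw.shape}")
        # 3. Unconditional L2 normalisation.
        return l2_normalise(raw)


strategy = SketchStrategy(FakeRuntime(dim=2048), engine_id=0, dim=2048)
embedding = strategy.embed_query(np.full((3, 322, 322), 0.5, dtype=np.float32))
```

Normalising in FP32 before the FP16 cast keeps the rounding error of the norm well inside the 1e-3 tolerance that AC-2 allows.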

Scope

Included

  • Both MegaLocStrategy and MixVprStrategy classes implementing the Protocol.
  • Both concrete BackbonePreprocessor implementations (one per strategy; preprocessing parameters per upstream code drop).
  • Module-level create factory functions for both.
  • Composition-root wiring for both strategy choices.
  • Engine output shape assertion at load for both.
  • Logging + FDR records identical pattern to UltraVPR (per-backbone backbone_label).
  • Unit tests for both strategies covering invariants + error paths.
  • BUILD_VPR_MEGALOC and BUILD_VPR_MIXVPR CMake flag wiring.

Excluded

  • The VprStrategy Protocol — owned by AZ-336.
  • Shared DescriptorNormaliser — already AZ-283.
  • C7 TensorRT runtime — owned by AZ-298.
  • Engine compilation — owned by AZ-321.
  • Other backbones — AZ-337 (UltraVPR), AZ-338 (NetVLAD), AZ-340 (SelaVPR + EigenPlaces + SALAD).
  • FAISS retrieve wiring — owned by AZ-341.
  • Recall@10 acceptance tests for these secondary backbones — deferred to Step 9 / E-BBT (and the floors are looser per the engine rule — these are research-only, not engine-rule-binding).

Acceptance Criteria

AC-1 (per strategy): Protocol conformance
Given a constructed MegaLocStrategy AND a constructed MixVprStrategy
When isinstance(strategy, VprStrategy) is evaluated
Then both return True

AC-2 (per strategy): embed_query produces L2-normalised FP16 embedding of correct dim
Given a valid NavCameraFrame and CameraCalibration
When embed_query is called on each strategy
Then MegaLoc returns embedding.shape == (2048,), MixVPR returns embedding.shape == (4096,); both are dtype == np.float16; both have ||embedding||_2 == 1.0 ± 1e-3

AC-3 (per strategy): Deterministic embeddings
Given the same frame
When embed_query is called 3 times
Then bit-exact embeddings (ULP-tolerant FP16) for each strategy

AC-4 (per strategy): retrieve_topk returns exactly k candidates with correct backbone_label
Given a corpus of 100 tiles per strategy's descriptor_dim + a constructed VprQuery
When retrieve_topk(query, k=10) is called on each strategy
Then len(candidates) == 10, sorted ascending; backbone_label == "mega_loc" for MegaLoc; backbone_label == "mix_vpr" for MixVPR; descriptor_dim matches

AC-5 (per strategy): descriptor_dim() is stable
Given a constructed strategy
When descriptor_dim() is called 100 times
Then MegaLoc returns 2048 every call; MixVPR returns 4096 every call

AC-6 (per strategy): Engine output shape mismatch → ConfigurationError
Given a TRT engine whose output tensor shape does not match the strategy's expected descriptor_dim
When create(...) is called
Then ConfigurationError is raised; the strategy is NOT instantiated

AC-7 (per strategy): VprBackboneError on forward-pass failure
Given an InferenceRuntime test double that raises
When embed_query is called
Then VprBackboneError is raised; ERROR log + FDR record emitted

AC-8 (per strategy): VprPreprocessError on corrupt image bytes
Given a frame with malformed image_bytes
When embed_query is called
Then VprPreprocessError is raised; ERROR log + FDR record emitted

AC-9 (per strategy): Composition-root wiring
Given config.vpr.strategy = "mega_loc" (resp. "mix_vpr") AND valid weights AND matching descriptor_dim
When compose_root(config) runs
Then the corresponding strategy is wired; AZ-336 factory's pre-flight descriptor_dim validation passes; INFO log kind="c2.vpr.ready" with {strategy: "mega_loc", descriptor_dim: 2048} (resp. mix_vpr / 4096) is emitted

AC-10 (per strategy): Build-flag exclusion in airborne binary
Given config.vpr.strategy = "mega_loc" (resp. "mix_vpr") AND BUILD_VPR_MEGALOC=OFF (resp. BUILD_VPR_MIXVPR=OFF) — the airborne case
When the binary tries to load
Then ConfigurationError is raised at composition-root time with message containing the missing flag; the binary refuses to start (fail-fast per AZ-336 factory's lazy-import → ImportError → ConfigurationError mapping)

AC-11 (per strategy): Preprocessing input shape
Given the strategy's preprocessor instance
When input_shape() is called
Then MegaLoc returns (322, 322); MixVPR returns (320, 320)
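
The load-time check behind AC-6 can be illustrated with a small sketch. The commit message refers to an existing _assert_engine_output_dim helper, but its signature is not shown in this task, so check_engine_output_dim below is a hypothetical stand-in. The accepted layouts, (D,) or (1, D), are also an assumption of this example.

```python
from __future__ import annotations


class ConfigurationError(Exception):
    """Local stand-in for the project's ConfigurationError."""


def check_engine_output_dim(
    output_shape: tuple[int, ...], expected_dim: int, backbone_label: str
) -> None:
    """Raise ConfigurationError unless the engine emits one expected_dim-wide vector."""
    # Accept a bare vector (D,) or a single-item batch (1, D); both are assumptions.
    is_vector = len(output_shape) == 1 and output_shape[0] == expected_dim
    is_batch_of_one = (
        len(output_shape) == 2
        and output_shape[0] == 1
        and output_shape[1] == expected_dim
    )
    if not (is_vector or is_batch_of_one):
        raise ConfigurationError(
            f"{backbone_label}: engine output shape {output_shape} does not match "
            f"descriptor_dim={expected_dim}"
        )


# A matching engine passes silently; a factory would only then build the strategy.
check_engine_output_dim((1, 2048), 2048, "mega_loc")
check_engine_output_dim((4096,), 4096, "mix_vpr")
```

Running the check inside create(...), before the strategy object is constructed, is what keeps a mismatched engine from ever producing an instance.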

Non-Functional Requirements

Performance (looser than UltraVPR — research-only, not on production critical path):

  • MegaLoc embed_query p95 ≤ 80 ms on Tier-1 Jetson Orin (FP16 TRT).
  • MixVPR embed_query p95 ≤ 100 ms on Tier-1 Jetson Orin (FP16 TRT) — slightly higher because MixVPR's mix-net is ~30% larger than UltraVPR's backbone.
  • retrieve_topk p95: MegaLoc ≤ 3 ms, MixVPR ≤ 4 ms (4096-d FAISS HNSW slower than 512-d).
  • GPU memory per strategy: MegaLoc ≤ 700 MB; MixVPR ≤ 800 MB resident.
  • These NFRs are research-side guidance; not engine-rule blockers.
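
A p95 budget check for the embed_query figures above could look like the sketch below. The nearest-rank percentile method, the function names, and the budget table layout are assumptions for illustration; the task does not say how p95 is computed in C2-PT-01.

```python
import time


def p95(samples_ms):
    """Nearest-rank 95th percentile of a non-empty sequence of timings."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    # ceil(0.95 * n) in integer arithmetic, to avoid floating-point edge cases.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def measure_ms(fn, runs):
    """Time `runs` calls of fn and return the wall-clock durations in milliseconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append((time.perf_counter() - start) * 1000.0)
    return timings


# embed_query p95 budgets from the list above, in milliseconds.
EMBED_QUERY_P95_BUDGET_MS = {"mega_loc": 80.0, "mix_vpr": 100.0}


def within_budget(backbone_label, samples_ms):
    """True when the measured p95 does not exceed the backbone's budget."""
    return p95(samples_ms) <= EMBED_QUERY_P95_BUDGET_MS[backbone_label]
```

With 20 samples the nearest-rank p95 is the 19th smallest value, so a single slow outlier does not fail the check, while a consistently slow backbone does.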

Compatibility

  • Both consume TRT engines produced by AZ-321 with the AZ-281 self-describing filename schema.
  • Upstream code drops pinned per Plan-phase; weight-format changes between drops require engine rebuild.

Reliability

  • Both strategies single-threaded by contract.
  • Both use unconditional L2-normalisation (INV-3).
  • Errors do not crash the process; downstream falls back to VIO-only.

Unit Tests

| AC Ref | What to Test | Required Outcome |
|---|---|---|
| AC-1 (MegaLoc) | isinstance(MegaLocStrategy(...), VprStrategy) | True |
| AC-1 (MixVPR) | isinstance(MixVprStrategy(...), VprStrategy) | True |
| AC-2 (MegaLoc) | embed_query output | shape (2048,), dtype float16, L2-norm ≈ 1.0 |
| AC-2 (MixVPR) | embed_query output | shape (4096,), dtype float16, L2-norm ≈ 1.0 |
| AC-3 (each) | embed_query × 3 same frame | bit-exact embeddings (ULP-tolerant) |
| AC-4 (each) | retrieve_topk against fixture corpus | len == 10, sorted, correct backbone_label, correct descriptor_dim |
| AC-5 (each) | descriptor_dim() × 100 | always returns the correct dim |
| AC-6 (each) | TRT engine with wrong output shape | ConfigurationError at create time |
| AC-7 (each) | forward raises | VprBackboneError; ERROR log + FDR |
| AC-8 (each) | malformed image_bytes | VprPreprocessError; ERROR log + FDR |
| AC-9 (each) | compose_root(config=<strategy>) | wired; INFO log with correct backbone label and dim |
| AC-10 (each) | airborne binary + strategy chosen | ConfigurationError with missing-flag message; fail-fast |
| AC-11 (MegaLoc) | MegaLocBackbonePreprocessor.input_shape() | returns (322, 322) |
| AC-11 (MixVPR) | MixVprBackbonePreprocessor.input_shape() | returns (320, 320) |
| Preprocess-shape (each) | preprocess(frame) output | NCHW shape (1, 3, H, W), dtype float16 |
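
As an illustration of the AC-7 row, the sketch below maps a failing runtime call to VprBackboneError while emitting an ERROR log and an FDR record. RaisingRuntime, RecordingFdrClient, forward_or_raise, the record(kind, payload) method, and the "backbone_error" kind are all hypothetical; the real FDR client interface and record schema (AZ-272) are not shown in this task.

```python
import logging

logger = logging.getLogger("c2_vpr.sketch")


class VprBackboneError(Exception):
    """Local stand-in for the project's VprBackboneError."""


class RaisingRuntime:
    """Test double whose forward pass always fails."""

    def forward(self, engine_id, tensor):
        raise RuntimeError("simulated TRT failure")


class RecordingFdrClient:
    """Test double that keeps FDR records in memory (interface assumed)."""

    def __init__(self):
        self.records = []

    def record(self, kind, payload):
        self.records.append((kind, payload))


def forward_or_raise(runtime, fdr_client, engine_id, tensor, backbone_label):
    """Run the forward pass; on any failure log, record, and raise VprBackboneError."""
    try:
        return runtime.forward(engine_id, tensor)
    except Exception as exc:
        payload = {"backbone_label": backbone_label, "cause": repr(exc)}
        logger.error("forward pass failed: %s", payload)
        fdr_client.record("backbone_error", payload)
        # Chain the original exception so the root cause stays inspectable.
        raise VprBackboneError(f"{backbone_label}: forward pass failed") from exc
```

A test for this row would call forward_or_raise with RaisingRuntime, expect VprBackboneError, and then assert that exactly one record landed in RecordingFdrClient.records.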

Constraints

  • Each strategy ships its own concrete preprocessor — preprocessing parameters per upstream code drop (description.md § 6 "C2-internal helper, NOT a shared helper").
  • Preprocessing parameters are weights-coupled: (322, 322) for MegaLoc, (320, 320) for MixVPR; ImageNet mean/std for both. Hard-coded; not config knobs.
  • Centre-crop logic is duplicated, NOT shared — copying preprocessing between strategies is intentional per the contract; sharing would couple weights-versions across strategies and let one strategy's upgrade silently break another's preprocessing.
  • Both use TensorRT runtime (consistent with UltraVPR's path); the difference between secondary and primary is not the runtime but the build-flag ON/OFF in airborne.
  • No engine compilation in this task — the .trt engine files come from AZ-321; this task consumes them via config.vpr.backbone_weights_path.
  • Both strategies hold engine IDs returned by inference_runtime.load_engine, NOT engines themselves.
  • No GPU operations in __init__ beyond engine load — same constraint as UltraVPR.
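
The hard-coded preprocessing parameters above can be illustrated with a NumPy-only sketch. It is a simplification under stated assumptions: a nearest-neighbour resize stands in for the OpenCV resize the production code is required to use, the centre-crop is a plain square crop with no calibration-aware logic, and the function names are hypothetical. The ImageNet mean/std constants are the commonly used values.

```python
import numpy as np

# Widely used ImageNet channel statistics (RGB order, inputs scaled to [0, 1]).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def centre_crop_square(img: np.ndarray) -> np.ndarray:
    """Crop the largest centred square from an HWC image."""
    h, w = img.shape[:2]
    side = min(h, w)
    top, left = (h - side) // 2, (w - side) // 2
    return img[top:top + side, left:left + side]


def resize_nearest(img: np.ndarray, out_hw: tuple) -> np.ndarray:
    """Nearest-neighbour resize; a stand-in for an OpenCV resize."""
    out_h, out_w = out_hw
    rows = np.arange(out_h) * img.shape[0] // out_h
    cols = np.arange(out_w) * img.shape[1] // out_w
    return img[rows][:, cols]


def preprocess(img_rgb_u8: np.ndarray, out_hw: tuple) -> np.ndarray:
    """HWC uint8 RGB image -> (1, 3, H, W) FP16 tensor with ImageNet normalisation."""
    resized = resize_nearest(centre_crop_square(img_rgb_u8), out_hw)
    scaled = resized.astype(np.float32) / 255.0
    normalised = (scaled - IMAGENET_MEAN) / IMAGENET_STD
    chw = np.transpose(normalised, (2, 0, 1))
    return chw[np.newaxis].astype(np.float16)


# MegaLoc uses (322, 322); MixVPR would pass (320, 320).
mega_loc_batch = preprocess(np.zeros((480, 640, 3), dtype=np.uint8), (322, 322))
```

In the real code each strategy would carry its own copy of this logic with its resize target fixed as a constant, in line with the duplication constraint above.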

Risks & Mitigation

Risk 1: MegaLoc and MixVPR upstream code drops use different ONNX op sets that TRT 10.3 partially supports

  • Risk: Engine compilation succeeds but with fallback layers that don't run on GPU; embed_query p95 inflates.
  • Mitigation: AZ-321 (engine compile) is responsible for detecting fallback layers and reporting them. This task consumes the produced engine; if NFR-perf budgets are violated, AZ-321 escalates the upstream support gap.

Risk 2: Higher embedding dim (4096 for MixVPR) inflates corpus storage requirements

  • Risk: A research binary that switches between UltraVPR (D=512) and MixVPR (D=4096) needs to rebuild the FAISS corpus every swap; researchers may forget.
  • Mitigation: AZ-336 factory's pre-flight descriptor_dim validation catches the mismatch at startup with a clear error message. Researchers must rebuild the corpus (C10) before swapping; the helpful error tells them so.
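
A pre-flight check of this kind might look like the sketch below. The function name, its parameters, and the message wording are assumptions; the AZ-336 factory's actual validation code is not part of this task.

```python
class ConfigurationError(Exception):
    """Local stand-in for the project's ConfigurationError."""


def preflight_descriptor_dim(strategy_label, strategy_dim, corpus_dim):
    """Fail at startup when the strategy and the FAISS corpus disagree on D."""
    if strategy_dim != corpus_dim:
        raise ConfigurationError(
            f"strategy {strategy_label!r} emits {strategy_dim}-d descriptors but the "
            f"corpus holds {corpus_dim}-d descriptors; rebuild the corpus (C10) "
            f"for this backbone before swapping"
        )


# Matching dimensions pass silently.
preflight_descriptor_dim("mix_vpr", 4096, 4096)
```

Swapping to MixVPR against a corpus built for a 512-d backbone would raise immediately, with the rebuild instruction in the message.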

Risk 3: MegaLoc / MixVPR are research-only — operators may select them by mistake

  • Risk: A typo or copy-pasted research config selects MegaLoc / MixVPR on an airborne binary; cold start fails.
  • Mitigation: AC-10 ensures fail-fast at composition-root with a clear message. Operators learn at startup, not after takeoff.
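
The lazy-import → ImportError → ConfigurationError mapping that AC-10 relies on can be sketched as follows. The registry layout, the function name, and the injectable importer are assumptions made for this example; the module paths follow the file layout listed under Outcome.

```python
import importlib


class ConfigurationError(Exception):
    """Local stand-in for the project's ConfigurationError."""


# strategy name -> (module path, build flag); the registry shape is assumed.
_SECONDARY_BACKBONES = {
    "mega_loc": ("gps_denied_onboard.components.c2_vpr.mega_loc", "BUILD_VPR_MEGALOC"),
    "mix_vpr": ("gps_denied_onboard.components.c2_vpr.mix_vpr", "BUILD_VPR_MIXVPR"),
}


def load_strategy_module(name, importer=importlib.import_module):
    """Import the strategy module lazily; map a missing module to ConfigurationError."""
    try:
        module_path, build_flag = _SECONDARY_BACKBONES[name]
    except KeyError:
        raise ConfigurationError(f"unknown vpr.strategy {name!r}") from None
    try:
        return importer(module_path)
    except ImportError as exc:
        # The module is absent when the binary was built with the flag OFF.
        raise ConfigurationError(
            f"vpr.strategy={name!r} is not built into this binary; "
            f"it requires {build_flag}=ON"
        ) from exc
```

Because the import happens at composition-root time, a research config loaded by an airborne binary fails before any engine is loaded, and the message names the flag that would have to be ON.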

Risk 4: Test fixtures for MegaLoc / MixVPR engines don't exist in CI

  • Risk: Without TRT engines for these strategies, the unit tests cannot exercise the full embed_query path; they're stubbed via FakeInferenceRuntime.
  • Mitigation: This is fine — Step 9 / E-BBT validates the real engine path against C2-IT-01 and the C2-PT-01 NFR. The unit tests validate Protocol conformance + invariants; they don't need real engines.

Risk 5: Preprocessing duplication across strategies invites subtle bugs

  • Risk: A bug fix to UltraVPR's centre-crop logic doesn't propagate to MegaLoc / MixVPR.
  • Mitigation: This is the documented trade-off (description.md § 6). The duplication is intentional. If a bug fix is needed across strategies, each strategy's preprocessor is updated explicitly with a coordinated commit; cross-checking is part of code review.

Runtime Completeness

  • Named capability: secondary VprStrategy implementations for the IT-12 comparative study (architecture / E-C2 / solution.md "MegaLoc, MixVPR secondary backbones").
  • Production code that must exist: real MegaLocStrategy and MixVprStrategy classes calling real C7 TRT InferenceRuntime.forward with real loaded .trt engines; real concrete preprocessors with real OpenCV resize + ImageNet normalisation + FP16 cast; real L2-normalisation; real composition-root wiring paths.
  • Allowed external stubs: tests MAY use FakeInferenceRuntime returning pre-computed embeddings; FakeTileStore; FakeFdrClient; production wiring uses real C7 + real engines + real C6.
  • Unacceptable substitutes: NumPy-only forward passes (would not satisfy NFR budgets); skipping L2-normalisation (would break INV-3); shared preprocessors across strategies (would defeat description.md § 6 isolation); selecting these strategies in airborne binaries (must fail-fast per AC-10); engine load at first frame (would defer the engine-output-shape assertion past startup); per-strategy thread safety (the contract is single-thread).