C2 NetVLAD Mandatory Simple-Baseline
Task: AZ-338_c2_net_vlad
Name: C2 NetVLAD Mandatory Simple-Baseline
Description: Implement NetVladStrategy, the C2 mandatory simple-baseline VprStrategy (engine rule: every component MUST ship a comparative baseline alongside its production-default; description.md § 1 designates NetVLAD as the C2 baseline). NetVLAD has a much higher embedding dim than UltraVPR (D=4096 with NetVLAD-VGG16 default; can be reduced to D=512 via PCA-whitening per the upstream NetVLAD code drop) and uses PyTorch FP16 (NOT TensorRT) per the simple-baseline policy: "the baseline runs on the simplest available runtime" so a TRT engine compile bug doesn't simultaneously break baseline AND primary. Includes the concrete NetVladBackbonePreprocessor (different resize target + normalisation than UltraVPR). MUST satisfy AC-2.1b's relaxed engine-rule floor recall@10 ≥ 0.85 on Derkachi normal segment.
Complexity: 3 points
Dependencies: AZ-336_c2_vpr_strategy_protocol, AZ-263_initial_structure, AZ-269_config_loader, AZ-300_c7_pytorch_baseline, AZ-303_c6_storage_interfaces, AZ-283_descriptor_normaliser, AZ-266_log_module, AZ-272_fdr_record_schema
Component: c2_vpr (epic AZ-255 / E-C2)
Tracker: AZ-338
Epic: AZ-255 (E-C2)
Document Dependencies
- _docs/02_document/contracts/c2_vpr/vpr_strategy_protocol.md: Protocol contract; every invariant MUST be satisfied; INV-3 (L2-normalised) is critical because NetVLAD raw embeddings include intra-cluster residuals that must be globally L2-normalised after the VLAD aggregation.
- _docs/02_document/components/02_c2_vpr/description.md: § 1 NetVLAD designated as mandatory simple-baseline; § 5 PyTorch matches simple-baseline track; § 9 logging.
- _docs/02_document/module-layout.md: c2_vpr.net_vlad Internal entry; BUILD_VPR_NETVLAD row; BUILD_PYTORCH_RUNTIME row (NetVLAD requires PyTorch runtime ON, which is OFF for airborne, so NetVLAD is research/replay-only by build-flag combination).
- _docs/02_document/components/02_c2_vpr/tests.md: C2-IT-01 engine rule check recall@10 ≥ 0.85 for NetVLAD on Derkachi normal segment.
- _docs/02_document/contracts/c7_inference/inference_runtime_protocol.md: InferenceRuntime interface; AZ-300 pytorch_fp16_runtime is the consumed concrete runtime.
- _docs/02_document/contracts/shared_helpers/descriptor_normaliser.md: L2 + intra-normalisation (NetVLAD's published preprocessing chain includes intra-cluster normalisation BEFORE the global L2 normalisation; the DescriptorNormaliser helper must support both).
Problem
Without this task:
- The C2 component has no comparative baseline; the engine rule (every primary backbone has a baseline alongside it for FT-12 comparative-study and for risk reduction if the primary fails) is violated for C2 specifically — the project-wide policy goes unsatisfied for one of its largest backbone surfaces.
- AC-2.1b's relaxed-floor check (recall@10 ≥ 0.85 for NetVLAD) has no producer; suite-level FT-P-19 cannot validate the engine rule.
- The research binary (which links every backbone for IT-12 comparative studies) cannot ship without a NetVLAD strategy; researchers cannot run the comparative study that informs whether the primary's engine choice is justified.
- A code drop / weights / engine compile bug in UltraVPR has no fallback at the strategy layer; the operator who notices a sudden drop in suite-level satellite re-loc accuracy would have no mechanism to A/B against the baseline.
Outcome
- src/gps_denied_onboard/components/c2_vpr/net_vlad.py defining:
  - NetVladStrategy class implementing the VprStrategy Protocol.
  - Constructor signature: __init__(self, runtime: InferenceRuntime, tile_store: TileStore, weights_path: Path, preprocessor: NetVladBackbonePreprocessor, normaliser: DescriptorNormaliser, fdr_client: FdrClient, descriptor_dim: int = 4096).
  - embed_query(frame, calibration):
    - tensor = self._preprocessor.preprocess(frame, calibration) (returns FP16 NCHW (1, 3, H, W); H=W=480 per the upstream NetVLAD-VGG16 default).
    - intermediate = self._runtime.forward(self._engine_id, {"input": tensor})["vlad_descriptor"] (returns FP16 (1, descriptor_dim) post-VLAD aggregation).
    - intra_normalised = self._normaliser.intra_cluster_normalise(intermediate[0], num_clusters=64) (per NetVLAD's published preprocessing: intra-cluster L2 first).
    - embedding = self._normaliser.l2_normalise(intra_normalised) (then global L2).
    - Return VprQuery(frame_id, embedding, produced_at=monotonic_ns()).
    - Catch RuntimeError → wrap in VprBackboneError; emit ERROR log + FDR record.
  - retrieve_topk(query, k): identical to UltraVPR; delegates to tile_store.faiss_topk; returns VprResult with backbone_label="net_vlad".
  - descriptor_dim() -> int: returns the constructor-passed value (default 4096); asserted at engine-load time against the engine's output tensor shape; mismatch → ConfigurationError (consistent with the create() assertion and AC-7).
  - Module-level create(config, tile_store, inference_runtime) -> VprStrategy:
    - Resolve weights_path = config.vpr.backbone_weights_path (a PyTorch state_dict file with the .pth extension; NetVLAD does NOT use the AZ-281 self-describing TRT filename schema; its own AZ-280 sidecar carries the PCA matrix + cluster centres).
    - Resolve descriptor_dim = config.vpr.netvlad_descriptor_dim (default 4096; can be 512 if PCA-whitened weights are loaded).
    - Construct NetVladBackbonePreprocessor(input_shape=(480, 480), mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)).
    - Construct DescriptorNormaliser with intra_cluster_normalise capability.
    - Load model via inference_runtime.load_engine(weights_path) (the PyTorch runtime accepts .pth files; AZ-300).
    - Assert engine output shape == (1, descriptor_dim); mismatch → ConfigurationError.
    - Construct and return NetVladStrategy(...).
- src/gps_denied_onboard/components/c2_vpr/_preprocessor_net_vlad.py:
  - Implements the BackbonePreprocessor Protocol.
  - preprocess(frame, calibration):
    - Decode frame.image_bytes to RGB uint8 (H_in, W_in, 3).
    - Centre-crop to a square region (same calibration-aware logic as UltraVPR; copied here, NOT shared, because the calibration handling is part of the preprocessor's contract).
    - Resize to (480, 480) via OpenCV.
    - Normalise: (pixel/255.0 - mean) / std; cast to FP16.
    - Transpose HWC → CHW; add batch dim.
    - Return ndarray of shape (1, 3, 480, 480) dtype float16.
  - input_shape() -> tuple[int, ...]: returns (480, 480).
  - On failure: raise VprPreprocessError.
- Composition-root wiring path for config.vpr.strategy == "net_vlad".
- Logging per description.md § 9: INFO kind="c2.vpr.ready" with {strategy: "net_vlad", descriptor_dim: 4096}; ERROR / WARN identical to UltraVPR.
- FDR records emitted: kind="vpr.embed_query", kind="vpr.backbone_error", kind="vpr.preprocess_error".
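The normalise, transpose, and cast steps of the preprocessor can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the function names are hypothetical, the centre-crop is purely geometric (the calibration-aware logic is not reproduced), and decoding plus the OpenCV resize are left out so the sketch only needs NumPy.

```python
import numpy as np

# ImageNet statistics, as listed for NetVladBackbonePreprocessor above.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def centre_crop_square(rgb: np.ndarray) -> np.ndarray:
    """Geometric centre-crop to a square (assumption: no calibration offset)."""
    height, width = rgb.shape[:2]
    side = min(height, width)
    top = (height - side) // 2
    left = (width - side) // 2
    return rgb[top:top + side, left:left + side]


def normalise_to_nchw(rgb_480: np.ndarray) -> np.ndarray:
    """(pixel/255 - mean) / std, HWC -> CHW, add batch dim, cast to FP16."""
    if rgb_480.shape != (480, 480, 3) or rgb_480.dtype != np.uint8:
        raise ValueError(f"expected uint8 (480, 480, 3), got {rgb_480.dtype} {rgb_480.shape}")
    scaled = rgb_480.astype(np.float32) / 255.0
    normalised = (scaled - IMAGENET_MEAN) / IMAGENET_STD  # broadcasts over the channel axis
    chw = np.transpose(normalised, (2, 0, 1))
    return np.ascontiguousarray(chw[np.newaxis, ...], dtype=np.float16)


# A synthetic all-white 480x640 frame: the crop yields 480x480, so no resize is needed here.
frame = np.full((480, 640, 3), 255, dtype=np.uint8)
tensor = normalise_to_nchw(centre_crop_square(frame))
print(tensor.shape, tensor.dtype)
```

In the real preprocessor the crop result would be resized to (480, 480) before normalisation, and any failure would be re-raised as VprPreprocessError rather than ValueError.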
Scope
Included
- NetVladStrategy implementing the Protocol; NetVladBackbonePreprocessor implementing BackbonePreprocessor.
- Module-level create(config, tile_store, inference_runtime) factory entry-point.
- Intra-cluster L2 normalisation BEFORE global L2 normalisation (NetVLAD's published preprocessing chain).
- Composition-root wiring for config.vpr.strategy == "net_vlad".
- Engine output shape assertion at load time.
- Logging + FDR records identical to UltraVPR (the per-backbone label distinguishes the records).
- Unit tests covering all 7 invariants, the dual-stage normalisation, the preprocessing contract, the load-time shape assertion.
- BUILD_VPR_NETVLAD CMake flag wiring per ADR-002 (ON for research; OFF for airborne / operator-tooling because PyTorch runtime is excluded; ON-but-effectively-unused for replay-cli unless explicitly selected).
Excluded
- The VprStrategy Protocol: owned by AZ-336.
- The DescriptorNormaliser.l2_normalise: already AZ-283. Note: AZ-283 ships l2_normalise; this task may need to extend AZ-283 to add intra_cluster_normalise(vec, num_clusters). Decision: extending AZ-283 is in scope here as a small contract addition (the helper ships with l2_normalise; adding intra_cluster_normalise is a single function). If AZ-283 is already merged when this task starts, the addition is a backward-compatible function add; no breaking change.
- The C7 PyTorch runtime: owned by AZ-300; this task consumes the interface.
- Other backbones — owned by AZ-337 (UltraVPR), AZ-339 (MegaLoc + MixVPR), AZ-340 (SelaVPR + EigenPlaces + SALAD).
- FAISS retrieve wiring — owned by AZ-341.
- C2-IT-01's NetVLAD recall@10 ≥ 0.85 acceptance test — deferred to Step 9 / E-BBT.
Acceptance Criteria
AC-1: Protocol conformance
Given a constructed NetVladStrategy instance
When isinstance(strategy, VprStrategy) is evaluated
Then the result is True
AC-2: embed_query produces L2-normalised FP16 (descriptor_dim,) embedding
Given a valid NavCameraFrame and CameraCalibration
When strategy.embed_query(frame, calibration) is called
Then embedding.shape == (4096,) (or the configured descriptor_dim), embedding.dtype == np.float16, ||embedding||_2 == 1.0 ± 1e-3
AC-3: Dual-stage normalisation — intra-cluster THEN global L2
Given a fake intermediate VLAD descriptor with non-zero per-cluster sub-vectors
When the embedding pipeline runs
Then intra_cluster_normalise is called BEFORE l2_normalise (verifiable via spy on the normaliser); the order is NEVER reversed; the output's per-cluster sub-vectors are unit-norm in the intra-cluster sense AND the full vector is unit-norm globally
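One way to check the call order in AC-3 is a spy built on the standard library's unittest.mock. The sketch below is illustrative only: embed_tail is a hypothetical stand-in for the last two steps of embed_query, and string placeholders take the place of real descriptors.

```python
from unittest.mock import MagicMock, call


def embed_tail(normaliser, vlad_descriptor, num_clusters=64):
    """Hypothetical stand-in for the final two steps of embed_query."""
    intra = normaliser.intra_cluster_normalise(vlad_descriptor, num_clusters=num_clusters)
    return normaliser.l2_normalise(intra)


spy = MagicMock(name="DescriptorNormaliser")
spy.intra_cluster_normalise.return_value = "intra"
spy.l2_normalise.return_value = "embedding"

result = embed_tail(spy, "raw_vlad")

# A parent mock records calls to its child methods in invocation order,
# so a list comparison checks both the order and the exactly-once property.
observed = spy.mock_calls
expected = [
    call.intra_cluster_normalise("raw_vlad", num_clusters=64),
    call.l2_normalise("intra"),
]
print(observed == expected)
```

Comparing the whole mock_calls list, rather than asserting each method separately, is what makes a reversed order fail the test.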
AC-4: embed_query is deterministic
Given the same frame + calibration
When embed_query is called 3 times
Then all three returns have bit-exact embedding arrays (ULP-tolerant FP16)
AC-5: retrieve_topk returns exactly k candidates with backbone_label = "net_vlad"
Given a corpus of 100 tiles + a constructed VprQuery with D=4096
When strategy.retrieve_topk(query, k=10) is called
Then len(candidates) == 10; sorted ascending; backbone_label == "net_vlad"; candidates[0].descriptor_dim == 4096
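The delegation in retrieve_topk can be exercised against a fake tile store. The sketch below is a guess at the shapes involved, not the project's contract: the Candidate and VprResult fields, the faiss_topk(embedding, k) signature, and the brute-force search are all assumptions, and a 16-dimensional corpus stands in for D=4096 to keep it quick.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:  # hypothetical fields
    tile_id: int
    distance: float
    descriptor_dim: int


@dataclass(frozen=True)
class VprResult:  # hypothetical fields
    candidates: tuple
    backbone_label: str


class FakeTileStore:
    """Brute-force stand-in for the FAISS-backed tile store."""

    def __init__(self, descriptors):
        self._descriptors = descriptors

    def faiss_topk(self, embedding, k):
        scored = sorted(
            (math.dist(embedding, desc), tile_id)
            for tile_id, desc in enumerate(self._descriptors)
        )
        return [(tile_id, distance) for distance, tile_id in scored[:k]]


def retrieve_topk(tile_store, embedding, k):
    """Delegate the search, then attach the backbone label."""
    hits = tile_store.faiss_topk(embedding, k)
    candidates = tuple(Candidate(t, d, len(embedding)) for t, d in hits)
    return VprResult(candidates=candidates, backbone_label="net_vlad")


random.seed(0)
DIM = 16  # small stand-in for 4096
corpus = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(100)]
query = [random.gauss(0, 1) for _ in range(DIM)]

result = retrieve_topk(FakeTileStore(corpus), query, k=10)
distances = [c.distance for c in result.candidates]
print(len(result.candidates), result.backbone_label)
```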
AC-6: descriptor_dim() is config-driven and stable
Given construction with descriptor_dim=4096
When descriptor_dim() is called 100 times
Then every call returns 4096; constructing a second instance with descriptor_dim=512 (PCA-whitened weights case) returns 512 from that instance's descriptor_dim()
AC-7: Engine output shape mismatch at load → ConfigurationError
Given a model whose output tensor shape is (1, 2048) while config.vpr.netvlad_descriptor_dim = 4096
When NetVladStrategy.create(...) is called
Then ConfigurationError is raised with message containing "engine output shape mismatch: expected (1, 4096), got (1, 2048)"; the strategy is NOT instantiated
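The load-time check behind AC-7 could look roughly like this. It is a sketch under assumptions: ConfigurationError is redefined locally as a stand-in for the project's exception, and assert_engine_output_shape is a hypothetical helper name; only the message format is taken from the criterion above.

```python
class ConfigurationError(Exception):
    """Local stand-in for the project's ConfigurationError."""


def assert_engine_output_shape(actual_shape, descriptor_dim):
    """Raise before the strategy is constructed if the model output is the wrong size."""
    expected = (1, descriptor_dim)
    actual = tuple(actual_shape)
    if actual != expected:
        raise ConfigurationError(
            f"engine output shape mismatch: expected {expected}, got {actual}"
        )


try:
    assert_engine_output_shape((1, 2048), 4096)
    message = None
except ConfigurationError as exc:
    message = str(exc)

print(message)
```

Running the check inside create(), before NetVladStrategy(...) is called, is what guarantees the "strategy is NOT instantiated" part of the criterion.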
AC-8: VprBackboneError on forward-pass failure
Given an InferenceRuntime test double that raises RuntimeError from forward
When embed_query is called
Then VprBackboneError is raised; ERROR log + FDR record emitted
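The wrap-and-report behaviour in AC-8 might be sketched as below. Everything other than the exception name, the forward call shape, and the vpr.backbone_error record kind is an assumption: FakeFdrClient.emit, the logger name, and forward_or_wrap are hypothetical.

```python
import logging

logger = logging.getLogger("c2_vpr.net_vlad")  # hypothetical logger name


class VprBackboneError(Exception):
    """Local stand-in for the project's VprBackboneError."""


class FakeFdrClient:
    """Test double that stores records instead of writing them (emit is assumed)."""

    def __init__(self):
        self.records = []

    def emit(self, kind, **fields):
        self.records.append({"kind": kind, **fields})


class FailingRuntime:
    """InferenceRuntime test double whose forward pass always fails."""

    def forward(self, engine_id, inputs):
        raise RuntimeError("simulated forward-pass failure")


def forward_or_wrap(runtime, engine_id, tensor, fdr_client, frame_id):
    """Run the forward pass; on RuntimeError, log, record, and re-raise wrapped."""
    try:
        return runtime.forward(engine_id, {"input": tensor})["vlad_descriptor"]
    except RuntimeError as exc:
        logger.error("vpr.backbone_error frame_id=%s: %s", frame_id, exc)
        fdr_client.emit("vpr.backbone_error", frame_id=frame_id, reason=str(exc))
        raise VprBackboneError(str(exc)) from exc


fdr = FakeFdrClient()
try:
    forward_or_wrap(FailingRuntime(), "netvlad-0", tensor=None, fdr_client=fdr, frame_id=7)
    outcome = "no error"
except VprBackboneError as err:
    outcome = type(err.__cause__).__name__
```

Chaining with `from exc` keeps the original RuntimeError available on `__cause__`, which helps when the FDR record alone is not enough to diagnose the failure.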
AC-9: VprPreprocessError on corrupt image bytes
Given a frame with malformed image_bytes
When embed_query is called
Then VprPreprocessError is raised; ERROR log + FDR record emitted
AC-10: Composition-root wiring
Given config.vpr.strategy = "net_vlad" AND valid weights AND matching descriptor_dim
When compose_root(config) runs
Then a NetVladStrategy is wired; AZ-336 factory's pre-flight descriptor_dim validation passes; INFO log kind="c2.vpr.ready" with {strategy: "net_vlad", descriptor_dim: 4096} emitted
AC-11: Build-flag combination — NetVLAD requires PyTorch runtime
Given config.vpr.strategy = "net_vlad" AND BUILD_PYTORCH_RUNTIME=OFF (airborne binary)
When the binary tries to load
Then ConfigurationError is raised at composition-root time with message containing "NetVLAD requires BUILD_PYTORCH_RUNTIME=ON; this binary has BUILD_PYTORCH_RUNTIME=OFF"; the binary refuses to start (fail-fast)
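The fail-fast guard in AC-11 can be illustrated with a small pre-flight check. How the CMake flags are exposed to Python is not specified in this task, so the build_flags dictionary and the function name below are assumptions; the error message is copied from the criterion.

```python
class ConfigurationError(Exception):
    """Local stand-in for the project's ConfigurationError."""


def check_strategy_buildable(strategy: str, build_flags: dict) -> None:
    """Refuse to compose a NetVLAD strategy when the PyTorch runtime is compiled out."""
    if strategy == "net_vlad" and not build_flags.get("BUILD_PYTORCH_RUNTIME", False):
        raise ConfigurationError(
            "NetVLAD requires BUILD_PYTORCH_RUNTIME=ON; "
            "this binary has BUILD_PYTORCH_RUNTIME=OFF"
        )


airborne_flags = {"BUILD_PYTORCH_RUNTIME": False}  # hypothetical flag snapshot
research_flags = {"BUILD_PYTORCH_RUNTIME": True}

check_strategy_buildable("net_vlad", research_flags)  # passes silently

try:
    check_strategy_buildable("net_vlad", airborne_flags)
    airborne_error = None
except ConfigurationError as exc:
    airborne_error = str(exc)

print(airborne_error)
```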
Non-Functional Requirements
Performance
- embed_query p95 ≤ 80 ms on Tier-1 Jetson Orin with PyTorch FP16 (looser than UltraVPR's 60 ms because the simple-baseline runs on the simpler runtime; not on the production critical path).
- retrieve_topk p95 ≤ 4 ms (slightly looser than UltraVPR because the higher embedding dim, 4096 vs 512, makes the FAISS lookup roughly 8× more compute-intensive; still sub-frame at 3 Hz).
- GPU memory: ≤ 800 MB resident for backbone weights (looser than UltraVPR's 600 MB because NetVLAD's VGG16 backbone is larger).
- These NFRs are not enforced as engine-rule blockers; they're operator guidance for the research binary's resource budget.
Compatibility
- The PyTorch state_dict format is owned by C7's PyTorch runtime (AZ-300); this task consumes the produced model via config.vpr.backbone_weights_path.
- The upstream NetVLAD code drop is pinned per Plan-phase; PCA-whitening parameters change with weights → AZ-280 sidecar carries them.
Reliability
- Strategy is single-threaded by contract (INV-1).
- Dual-stage normalisation order (intra-cluster THEN global L2) is mandatory; reversing the order produces a different embedding subspace and silently breaks AC-2.1b (recall regression).
- VprBackboneError does not crash the process; downstream falls back to VIO-only.
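The dual-stage normalisation can be sketched numerically as follows. The helper names mirror the ones this task specifies, but the bodies are assumptions, as is the layout (the descriptor is treated as num_clusters contiguous blocks of equal length). In this sketch, swapping the two stages leaves the vector at norm √64 = 8 instead of 1, and dropping the intra-cluster stage changes the direction of the embedding.

```python
import numpy as np


def intra_cluster_normalise(vec: np.ndarray, num_clusters: int) -> np.ndarray:
    """L2-normalise each cluster block independently (assumed contiguous layout)."""
    blocks = vec.astype(np.float32).reshape(num_clusters, -1)
    norms = np.linalg.norm(blocks, axis=1, keepdims=True)
    return (blocks / np.maximum(norms, 1e-12)).reshape(-1)


def l2_normalise(vec: np.ndarray) -> np.ndarray:
    """Scale the whole vector to unit L2 norm."""
    vec = vec.astype(np.float32)
    return vec / max(float(np.linalg.norm(vec)), 1e-12)


def netvlad_normalise(raw: np.ndarray, num_clusters: int = 64) -> np.ndarray:
    """Intra-cluster first, then global L2; the FP16 cast comes last."""
    return l2_normalise(intra_cluster_normalise(raw, num_clusters)).astype(np.float16)


rng = np.random.default_rng(0)
# Give the 64 clusters very different magnitudes so the intra-cluster stage matters.
raw = rng.normal(size=4096) * np.repeat(rng.uniform(0.1, 10.0, size=64), 64)

correct = netvlad_normalise(raw)

# Reversed order: every block ends up unit-norm, so the full vector is not.
reversed_order = intra_cluster_normalise(l2_normalise(raw), 64)
reversed_norm = float(np.linalg.norm(reversed_order))

# L2-only: unit norm, but dominated by the high-magnitude clusters.
l2_only = l2_normalise(raw)
cosine_vs_l2_only = float(np.dot(correct.astype(np.float32), l2_only))

print(float(np.linalg.norm(correct.astype(np.float32))), reversed_norm, cosine_vs_l2_only)
```

The arithmetic runs in FP32 and casts to FP16 only at the end, which is a design choice of this sketch to keep the norm within the ± 1e-3 tolerance of AC-2.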
Unit Tests
| AC Ref | What to Test | Required Outcome |
|---|---|---|
| AC-1 | isinstance(NetVladStrategy(...), VprStrategy) | True |
| AC-2 | embed_query output | shape (4096,), dtype float16, L2-norm == 1.0 ± 1e-3 |
| AC-3 | Spy on normaliser methods | intra_cluster_normalise called BEFORE l2_normalise exactly once each per embed_query |
| AC-4 | embed_query × 3 same frame | bit-exact embeddings |
| AC-5 | retrieve_topk against fixture corpus | len == 10, sorted, backbone_label == "net_vlad", descriptor_dim == 4096 |
| AC-6 | descriptor_dim() × 100 (D=4096 instance) + a second D=512 instance | first instance always 4096; second always 512 |
| AC-7 | Model with wrong output shape | ConfigurationError at create time |
| AC-8 | forward raises | VprBackboneError; ERROR log + FDR |
| AC-9 | malformed image_bytes | VprPreprocessError; ERROR log + FDR |
| AC-10 | compose_root(config="net_vlad") | wired; INFO log with {strategy: "net_vlad", descriptor_dim: 4096} |
| AC-11 | airborne binary + config.vpr.strategy = "net_vlad" | ConfigurationError with PyTorch-OFF message; fail-fast |
| Preprocess-shape | preprocessor.preprocess(frame) output | shape (1, 3, 480, 480), dtype float16 |
| Preprocess-input-shape | preprocessor.input_shape() | returns (480, 480) |
Constraints
- Dual-stage normalisation order is non-negotiable — intra-cluster THEN global L2. Reversing is forbidden.
- NetVLAD uses the PyTorch runtime, NOT TensorRT: the simple-baseline policy isolates it from TRT engine compile risk. The research binary links both runtimes; airborne binary excludes the PyTorch runtime via BUILD_PYTORCH_RUNTIME=OFF, which makes NetVLAD effectively unselectable for airborne (AC-11).
- Preprocessing parameters are weights-coupled: (480, 480) resize, ImageNet mean/std. Hard-coded; not config-knobs.
- descriptor_dim IS config-driven (unlike UltraVPR which hard-codes 512) because NetVLAD ships in two flavours: full 4096-d and PCA-whitened 512-d. The choice is part of the operator's deployment, not a runtime decision.
- Constructor injection only; no import gps_denied_onboard.config inside the strategy module.
- The strategy holds the engine ID, NOT the engine itself; engine lifecycle is owned by C7.
Risks & Mitigation
Risk 1: NetVLAD embedding dim of 4096 is 8× larger than UltraVPR's 512; FAISS HNSW lookup is slower
- Risk: retrieve_topk may exceed C2-PT-01's 2 ms budget for the lookup stage; the budget was set against UltraVPR's D=512.
- Mitigation: retrieve_topk p95 ≤ 4 ms is the looser baseline budget (acknowledged in NFRs); for the research binary this is acceptable since NetVLAD is comparison-only. If an operator wants the production-fast path with NetVLAD, they configure PCA-whitening (D=512) at corpus build time (C10).
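The 8× figure follows from the dimension ratio alone, as the short calculation below shows. It assumes one multiply-accumulate per dimension for each distance evaluation and FP16 storage of two bytes per value; how the index actually stores descriptors and how many nodes an HNSW search visits are not covered here.

```python
FP16_BYTES = 2  # one half-precision value


def macs_per_comparison(dim: int) -> int:
    """Multiply-accumulates for one inner-product or L2 distance evaluation."""
    return dim


def descriptor_bytes(dim: int) -> int:
    """Size of one descriptor if stored as FP16 (an assumption of this sketch)."""
    return dim * FP16_BYTES


full_dim, reduced_dim = 4096, 512

compute_ratio = macs_per_comparison(full_dim) / macs_per_comparison(reduced_dim)
print(compute_ratio)  # 8.0
print(descriptor_bytes(full_dim), descriptor_bytes(reduced_dim))  # 8192 1024
```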
Risk 2: NetVLAD recall@10 ≥ 0.85 floor not achievable with FP16
- Risk: FP16 quantisation degrades the VLAD aggregation precision below the relaxed engine-rule floor.
- Mitigation: C2-IT-01's NetVLAD assertion is the validation gate (deferred to Step 9). If FP16 fails, the operator can configure FP32 weights — the strategy does not hard-code dtype; it follows the runtime's loaded model.
Risk 3: PyTorch FP16 runtime on Tier-1 Jetson is slower than expected
- Risk: PyTorch FP16 inference on Jetson has known pipeline-stall issues compared to TRT.
- Mitigation: NetVLAD is research-only by build-flag combination (AC-11 enforces); the production critical path is UltraVPR. If a future cycle wants NetVLAD on the airborne binary, that's a separate task: convert NetVLAD to ONNX → TRT engine, then update this strategy to use the TRT runtime.
Risk 4: Operator picks NetVLAD on airborne binary by mistake
- Risk: A typo in the airborne config that selects net_vlad would silently fall back to VIO-only every flight if the runtime were missing.
- Mitigation: AC-11 makes this fail-fast at composition-root time with a clear error message. Operators learn at startup, not after takeoff.
Risk 5: AZ-283 DescriptorNormaliser may not yet ship intra_cluster_normalise
- Risk: The helper as defined in AZ-283 ships only l2_normalise; this task needs intra_cluster_normalise too.
- Mitigation: As noted in Scope/Excluded, extending AZ-283 to add intra_cluster_normalise is a backward-compatible function addition. If AZ-283 has already merged before this task starts, the addition is committed alongside this task with a one-line note in _docs/02_document/contracts/shared_helpers/descriptor_normaliser.md. If AZ-283 has not yet merged, coordinate the addition during AZ-283's implementation. Either way, no breaking change to existing consumers.
Runtime Completeness
- Named capability: mandatory simple-baseline VprStrategy for engine-rule comparative validation against the production-default UltraVPR (architecture / E-C2 / solution.md "NetVLAD mandatory simple-baseline" / engine rule + AC-2.1b relaxed floor).
- Production code that must exist: real NetVladStrategy calling real C7 PyTorch InferenceRuntime.forward with a real loaded NetVLAD .pth model; real NetVladBackbonePreprocessor performing real OpenCV resize + ImageNet normalisation + FP16 cast; real dual-stage normalisation (intra-cluster THEN global L2); real composition-root wiring path.
- Allowed external stubs: tests MAY use FakeInferenceRuntime returning pre-computed VLAD descriptors; FakeTileStore; FakeFdrClient; FakeDescriptorNormaliser instrumented to verify call order (AC-3); production wiring uses the real C7 PyTorch runtime + real NetVLAD weights + real C6.
- Unacceptable substitutes: a NumPy-only NetVLAD forward pass (would not satisfy NFR-perf budget; would defeat the runtime-isolation strategy of using a different runtime than UltraVPR); skipping intra-cluster normalisation (would silently break AC-2.1b's recall floor); using TensorRT for NetVLAD (would defeat the simple-baseline policy of isolating runtime risk); making preprocessing parameters config-knobs (would let operators silently break the recall floor); selecting NetVLAD in an airborne binary (must fail-fast per AC-11); a single-stage L2-only normalisation (would deviate from NetVLAD's published preprocessing chain; recall regression risk).